Once upon a time
In November 2017, I was invited by Andy Pavlo to give a talk at CMU about the internals of QuasarDB.
During that talk, I briefly mentioned why we don’t use memory-mapped files and why it’s generally a mistake to do this.
To be clear, I only had a strong intuition, backed by my experience working on the memory managers of the Windows NT and FreeBSD kernels, that it was a poor fit. I didn’t have any hard numbers or facts to back that up.
Andy told me that he was super happy someone said that, and it was a pet peeve of his to see memory-mapped files used as the persistence layer of a DBMS.
“I want to work on a paper about this!”
Five years later, that paper is finally here, and the conclusions are unambiguous. I highly recommend reading it, and I’m not saying that because it confirms a bias I had (I promise).
Don’t write your persistence layer
You could see this post as a continuation of this one, where I explained why we didn’t write a custom persistence layer. That’s because it’s much, much more complicated than you think.
This specific passage is worth highlighting:
For example, you could decide to memory map tables and write them individually in separate files. However, you will quickly run into reliability and performance problems because the paging algorithms are unsuitable for database workloads (And the thing you think is stored on disk? Hope you like gambling).
Many databases use memory-mapped files to store data on disk, and this passage was seen as an unsubstantiated attack on those engines.
Nothing could be further from the truth. Although I can think of two very famous DBMS that use (or did use) this approach, my comment wasn’t targeted at anything in particular.
I was saying that you can (and should) do better than the operating system for I/O management because the needs of the OS are not the needs of a DBMS.
Memory-mapped files are not designed for database workloads. For many problems, they work great, but any database engine that primarily uses memory-mapped files as a persistence mechanism cannot be used as a reliable storage option.
As time goes by, the probability of losing or corrupting data on a DBMS using memory-mapped files converges to 1.
The paper goes into concrete details about why this is a bad idea.
TL;DR
I encourage you to read the paper, and I hope that this approximate summary will give you further motivation to do so:
- It’s tough to offer any transactional safety with memory-mapped files because you don’t know when dirty pages will be flushed to disk.
- When accessing data, you don’t know whether the data is on disk or in memory. This means any read access can trigger an I/O stall. When the database gets very busy, this may result in thrashing because you can’t optimize memory usage for a given query. Remember, as a DBMS gets busier, memory usage tends to soar, resulting in even more pressure on the VMM, to which you delegated your I/O management!
- You have no strong guarantee about when your data is going to be flushed to disk, so you can’t correctly manage I/O errors and give accurate feedback to the user about the state of their data. It may be persisted, or it may not be! FUGGEDABOUTIT! But wait, there’s more: errors in the underlying persistence layer surface as memory faults, which are much harder to handle than a typical I/O error and can crash the whole database when the storage medium faults.
- Memory-mapped implementations scale poorly with the number of threads. This is the most counter-intuitive part and the most crucial part of the paper. Memory-mapped files are often used in new DBMS engines based on the assumption that the mechanism used to manage the swap file is state of the art. It is, for the problem it is solving, i.e. managing the swap. A DBMS workload is hugely different. To make it work for a DBMS, you would have to change the way processors are made or redesign the OS memory manager.
In other words, if you do simple ingestion benchmarks, you can have the illusion that memory-mapped files perform well and reliably, but as soon as you add in the chaos and intensity of an actual production setup (multiple data sources, out-of-order updates, queries running at the same time), pain ensues.
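To make the points about flushing and error handling a bit more concrete, here is a minimal Python sketch. It is not taken from the paper or from Quasar; the function names and file names are made up for illustration. It contrasts an explicit write path, where every I/O error comes back from a call you made, with a memory-mapped path, where the store itself is just a memory write.

```python
import mmap
import os
import tempfile

# O_BINARY only exists on Windows; fall back to 0 elsewhere.
_O_BINARY = getattr(os, "O_BINARY", 0)


def write_explicit(path: str, payload: bytes) -> None:
    """Hypothetical helper: write payload, return only once fsync succeeded.

    Every failure (disk full, media error, ...) is raised as OSError from
    os.write or os.fsync, so the caller knows whether the data is durable.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _O_BINARY, 0o644)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # durability point chosen by the application
    finally:
        os.close(fd)


def write_mapped(path: str, payload: bytes) -> None:
    """Hypothetical helper: same result, through a memory-mapped file."""
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # the mapping cannot grow the file
        with mmap.mmap(f.fileno(), len(payload)) as m:
            # This assignment only dirties pages in memory. The OS is free
            # to write any of them back at any moment, in any order, before
            # the flush below, so the application does not control what is
            # on disk if it crashes in between.
            m[:] = payload
            # flush() asks for the dirty pages to be written back. Errors
            # from write-back the OS did on its own earlier are not
            # reported by the assignment above.
            m.flush()


if __name__ == "__main__":
    data = b"row-0001;" * 1024
    with tempfile.TemporaryDirectory() as tmp:
        explicit_path = os.path.join(tmp, "explicit.bin")
        mapped_path = os.path.join(tmp, "mapped.bin")
        write_explicit(explicit_path, data)
        write_mapped(mapped_path, data)
        with open(explicit_path, "rb") as a, open(mapped_path, "rb") as b:
            print(a.read() == b.read() == data)
```

On a healthy disk both functions produce the same file, which is exactly why simple benchmarks look fine. The difference only shows under failure or memory pressure: in the explicit version the application picks the durability point and gets an error code back, while in the mapped version it has given both decisions to the virtual memory manager.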
Don’t say we didn’t warn you!
Curious about what Quasar is? Learn more here!
Want to take Quasar for a spin? Try our free community edition!
