Introduction

Our most popular blog post ever is a 2020 post where I ramble about containers in C++. The only problem I have with that is that most of the post’s content is outdated, and there’s a vague sense of shame when I look at the jokes now, considering I have much better ones.

Impossible isn’t Quasar, and now it’s time to give this blog post an overdue makeover and increase the cringe by at least 50%.

Just use vector

The processor, the operating system, and the compiler all love it when the resources you need to access are next to each other. This is why, most of the time, when you need to “store a bunch of stuff”, you first put it in a vector that you reserved.

The “reserved” part is crucial as reallocation can be costly. If you can’t estimate a reasonable maximum size, and this value may be high, then you may want to skip the vector.

When and if performance becomes a problem, or there are some obvious semantic limitations to it (e.g., this makes your code unnecessarily complicated), you go on to use something else.

I could write a whole blog article on why vector is almost always the best choice, but I’m concerned it would dilute the message here. So, again, I’ll keep it simple: start with vector unless you can think of one obvious reason why it would be the wrong choice.

Summary

Use vector except where void or prohibited by law

Fast unordered access

The most obvious reason for ditching vector is when you need fast, unordered access to any item in your collection, and this access cannot be integer-based (because vector would work). A classic example is, “I want to find an employee structure by name.”

We will exclude the case where you know the content of your hash table at compile time, because in that case, you should be looking into perfect hashing.

My advice would be to then look at hash tables. C++ comes with a chained hash table that offers iterator stability, moderate rehashing costs, and efficient splicing. It’s not the fastest, but with a good hash function, it is reasonably quick for the immense majority of use cases.

Like vectors, reserving a hash table will significantly improve performance and reduce the costly rehashing that occurs as the table grows. In most cases, you can come up with a reasonable upper bound.

Hash tables give you O(1) lookup complexity, which means the number of items in your table doesn’t impact the lookup speed, which is kind of cool. If you have a tiny number of entries, putting them in a vector and performing a linear scan may still be faster; however, you may prefer the semantic simplicity of a full-fledged associative container.

Don’t default to the STL map for associative containers, as you’ll pay a price for ordering. If you don’t need the entries to be ordered, this is somewhat unnecessary. I don’t know about you, but when I’m in a restaurant, I’m not ordering food I can’t eat.

If you don’t care about iterator stability and entries are fairly small, consider using Swiss tables. I believe the best implementation to date is the one in Boost (which is the one we use at Quasar, and we have never been wrong about anything ever, I promise).

Careful though! Swiss tables are not always faster and can use significantly more memory. If the entries are large and rehashing is likely, you may actually end up with a slower structure! Which is why I would recommend first establishing a baseline with the STL implementation, then determining whether a Swiss table implementation is beneficial for you.

To conclude on hash tables, I want to emphasize the importance of the quality of the hash function. xxHash is an excellent choice for “complex input” (e.g., anything that doesn’t fit in two words), being incredibly fast with low collision. If, despite all your efforts, hashing shows up as a bottleneck, there’s still the option to cache the hash in the value.

Summary

Default to the STL unordered map for associative containers (or a set if you don’t need the mapping)
Always reserve a good upper bound
Update to a better hash for improved performance
Give a try to the Swiss table if performance is critical, but watch for memory usage and potential performance degradation
Consider caching the hash for the most red-hot spot paths

Ordered access

For ordered collection, and if you restrict yourself to the STL, there’s only one container available: map. The map is a red-black tree, which means the memory and cache efficiency are not optimal. Fortunately, you can specify your own allocator, which can help mitigate memory inefficiency to some extent.

My first piece of advice about ordered collections would be the same as I gave in the 2020 blog post: Are you sure you always need the entries to be fully ordered? Can’t you get away with a heap? Can you get away with sorting ad hoc? Sorting is expensive; ensure you really need it.

If you can get outside of the STL, there are a couple of alternative structures you can investigate:

B-Trees
Flat map (which is a 1-level B-Tree)
Skip lists

B-Tree allocates multiple entries per node. When the entries are small, this results in fewer indirections to find an item, thereby improving cache locality. However, as entries get larger, the benefit disappears. We strategically use Absl’s Btree in Quasar, with measurable speed gains.

Flat map is a sorted vector. Flat maps are widely used in Quasar and are particularly effective when updates are batched. We use the implementation from the Boost Container library, but you can roll your own if needed. Sorted vectors will either be awesome or terrible for your use case. No middle ground!

Lastly, skip lists are generally slower than B-trees, but they can be implemented to be concurrent, and this is where they can be potentially amazing. RocksDB is famous for its usage of skip lists.

Summary

Is there a way to do without a sorted container? Consider heaps and sorting on an ad hoc basis.
Default to the STL map, and don’t forget about splice!
Give a controlled try to B-Tree and flat map (vector)

Honorable mentions

This is where I condense all remaining STL containers into a single paragraph, as the marketing team instructed me to keep my blog posts concise.

To phrase it differently, are there cases where vector, map, or unordered map are not a good choice?

The answer is yes, but not as often as you think. Here’s my take on the remaining STL containers:

Deque – If there’s no way on Earth you can properly reserve your vector, deque can help. If you also need to insert efficiently at the front, a deque is an excellent enhancement over vector. Scanning a deque, however, is slower.
List – This makes sense only when you need stable node ownership or are using intrusive patterns. A good example of using a list is when implementing an LRU (Least Recently Used) cache. A forward list is a singly linked list with lower overhead.
“Small containers” – Containers that use the stack entirely (or partially) to store elements. Great when you know most of the time your vector will fit in the stack, and it saves a memory allocation. Useless otherwise.
Array – If you know the size at compilation time, using an array saves an allocation. Make sure you have enough space in your stack, though.

Summary

Don’t look into other containers unless you run into a performance or semantic issue (e.g., iterator stability)

When to go bespoke

Is there a time when you should write your own container? If you’re chasing cycles, maybe, but quite frankly, the effort/reward ratio is generally terrible. Even at Quasar, where we’re in the business of selling software that organizes data efficiently, we don’t do that.

We do have container adapters (think of the priority queue in the STL), but when the STL doesn’t suffice, we look to the vast array of high-quality C++ libraries, especially when you need lock-free or concurrent containers.

Of course, to quote the famous philosopher Niko Bellic, “Life is complicated.” I’m not saying you will never struggle to find a good implementation for your problem, but this is rare. On top of my head, I can think of: Trie variants, segment trees, compressed data structures… Chances are, however, if you need these structures, you are keenly aware of the gaps that need to be filled.

Lastly, writing your own hash table, vector, or anything else is a great learning exercise, and if that’s the angle you’re taking, please go for it!