Monday April 29, 2019
This post is part of a series about the challenges behind database performance and how to accurately assess it.
When we started selling QuasarDB, we focused on its performance advantages and touted how great they were. The logic behind that was obvious: we were very strong in this area; thus we should bring the battle to where we are strong!
Let me share with you the cold, harsh truth: customers don’t care about performance. Yes, even the high-frequency shops that ingest multiple terabytes of level II market data every day and want to backtest on a multiyear data set. They don’t care about performance for the sake of performance. It just needs to be fast enough.
So here we are, a software publisher that designed a brand-new database engine from the ground up to address performance bottlenecks for demanding time series use cases, telling you that, well, you probably don’t care about performance.
Thank you for reading. See you next week?
Well, let’s expand on what I mean.
A pure performance approach is too abstract and too far away from your business needs. What is performance exactly? And since every database vendor says they are the fastest, why should I care?
What we discovered is that there is a set of problems customers care about and the fact that you solve these problems through performance is often irrelevant to them. Very often, you can find a hammer to crush the problem.
If we go back to our hedge fund, here are some typical challenges:
For example, archiving the data (point 2) can be done using a distributed file system. Whatever volume you have, you can build a distributed storage system that can take it; it’s “just” a matter of budget.
If you go to a hedge fund that built that solution and talk about how fast you can ingest the data, they will answer, “We don’t have that problem, thanks”.
If they can already ingest and store the data, how would ingesting faster be helpful?
You lost the opportunity!
The most valuable thing we have is time. It’s very hard to buy time. Sometimes, it’s even impossible.
When you offer a new solution, your prospect is thinking about the time it’s going to take to test and migrate. Some solutions are easy to test and evaluate. Unfortunately, database management systems are amongst the longest to test and evaluate.
Plug’n’play doesn’t exist when it comes to infrastructure. A “simple” upgrade of a database management system to a new version can be risky and lengthy, so imagine switching to another system! And think about switching from files to a database! The horror!
You need to offer in value at least ten times the migration cost to generate interest. I don’t mean ten times the performance, I mean ten times the value.
For example, if you generate a report in 1 second instead of 10 seconds, you’re ten times faster, but that’s not very valuable. One second instead of 1 hour? Now we may be talking, because it means the customer may create new business practices.
You can work on both dimensions: migration cost and added value.
On the cost of migration side, we, for example, provided our customers with high-performance CSV loaders and sensible migration guidelines.
What can we do on the value side? What about saving time?
One defining moment was when we realized that although performance didn’t matter, what performance allows can matter a lot.
As always, the answer comes from happy customers:
“Perf is great, so we can abuse the database and not worry too much about it. We plug the streams into it and then query that directly. No ETL! Saved a lot of devops time for sure!”
In other words, the performance simplified the problem and thus saved time.
If your database is fast enough to centralize processes that were originally the responsibility of several intricate components, the benefits in terms of reliability and TCO are huge. Not to mention the headroom that performance can give you. Who wants to spend time redesigning their IT every six months?
Compare a distributed file system storing zipped files to a high-performance database management system. If the database delivers the same disk efficiency and the same speed but also handles fine-grained access control, server-side aggregations, and querying, you solve not only problems 1 and 2, but also 3, 4, 5, 6, 7, and maybe more!
In other words: do more with less.
What’s the number one cost of IT? Engineering time! What’s the hardest to hire? Competent software engineers! What did we just do? Reduce pain on both dimensions!
In future posts, we’ll continue digging into performance. We’ll also review the difference between a distributed file system and a database for storing time series.
Curious about QuasarDB? Why don’t you try out the community edition?