Friday December 10, 2021
That’s it! You’ve decided to run your DataBase Management System (DBMS) in the cloud, but now, you have more questions than before.
IaaS? PaaS? SaaS? DBaaS? Lil NaaS?
At Quasar, we support the full range of possible deployments, from databases embedded in ARM32 devices to on-premises installations running on high-performance bare-metal servers. As such, we are completely neutral when it comes to database deployment.
For context, Quasar is a comprehensive data engineering platform optimized for high volume event data (e.g. timeseries).
The #1 question we get when talking to a customer deciding on a cloud deployment is whether they should opt for DBaaS or PaaS.
In this short guide, we’re going to do our best to give you the most unbiased answer possible to the DBaaS vs PaaS debate. We hope you will find this guide useful, whether you are considering a Quasar deployment or any other kind of data platform.
DBaaS (Database as a service) means that a service provider gives you an endpoint attached to a database instance they host, configure, and manage.
DBaaS is a hands-off, turn-key solution: the service provider takes care of all the complexity of administrating the database and the servers.
All you have to do is configure your client to connect to the provided endpoint.
We probably speak for all software publishers when we say we work very hard to keep DBMS usage as simple as possible. However, at some point, the rubber needs to hit the road: to get good performance, you need to dive into the logic of the DBMS you are using.
Using DBaaS means delegating all that complexity to the service provider, including the SLA. It’s not just running a database instance in the cloud (that would be PaaS).
Examples of DBaaS (in no specific order and without any specific relation to each other): Redshift, Azure SQL, BigQuery, Snowflake, Rockset, and of course, Quasar Managed!
PaaS (Platform as a service) provides ready-to-use instances (usually virtual machines) that you can use at your discretion. You choose the computing capability, memory, and storage you need, and the provider takes care of provisioning and running the underlying infrastructure.
From a DBMS deployment point of view, a PaaS deployment means the database runs on one or several instances running in your cloud environment.
The software publisher may install, configure, and tune the database, but you will be responsible for monitoring the instances on which the databases are running.
PaaS gives you greater control over your infrastructure, which also means more responsibilities. It also gives you a wider range of support options: you decide how involved the publisher is with your instances.
Examples of PaaS (in no specific order and without any specific relation to each other): MySQL on EC2, SQL Server on Azure VMs, and of course Quasar on your favorite cloud provider!
From the descriptions above, you probably already have a grasp of the strengths and weaknesses of each solution. Let’s dig further into this DBaaS vs PaaS topic.
Convenience-wise, DBaaS is a clear winner since it removes all the difficulty of operating a database. How much better depends heavily on the database and the provider, but the whole raison d’être of DBaaS is to speed up deployment and shorten your “time to solution.”
One aspect which is often overlooked is that you may not have the time or resources to take care of database deployment. For example, if you’re building a quick PoC, why bother with the details?
However, this ease of use can hide a cost structure not always aligned with your business constraints.
DBaaS offerings are usually significantly more expensive than PaaS deployments, but only when you look solely at what the service provider charges you. A DBaaS deployment doesn’t always have a higher Total Cost of Ownership (TCO), since you save on human resources and time.
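To make the TCO argument concrete, here is a minimal sketch. Every figure in it (provider bills, admin hours, hourly staff cost) is a hypothetical placeholder, not a quote from any provider; the only point is that a higher provider bill can still produce a lower total once staff time is counted.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    provider_bill: float      # monthly amount charged by the provider
    admin_hours: float        # monthly staff hours spent operating the database
    hourly_staff_cost: float  # fully loaded cost of one staff hour

    def monthly_tco(self) -> float:
        """Provider bill plus the cost of the staff time spent on operations."""
        return self.provider_bill + self.admin_hours * self.hourly_staff_cost


# Hypothetical figures, for illustration only.
dbaas = Deployment(provider_bill=3000.0, admin_hours=2.0, hourly_staff_cost=80.0)
paas = Deployment(provider_bill=1200.0, admin_hours=30.0, hourly_staff_cost=80.0)

print(dbaas.monthly_tco())  # 3160.0
print(paas.monthly_tco())   # 3600.0
```

With these made-up inputs the DBaaS bill is 2.5 times higher, yet its TCO is lower. Change the admin hours to match your own team and the conclusion can flip.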
However, the DBaaS cost structure may be a problem for your business as it tends to scale linearly (if not more) with usage. What started as a $100 monthly bill can quickly become a $10,000 bill.
DBaaS have, in essence, two pricing models:
PaaS deployments usually increase sub-linearly: you pay for your software license (and subscription), and you can adjust the underlying infrastructure. In addition, for PaaS deployment you can choose to do all the administration yourself, or offload some of it to the publisher, giving you more flexibility. DBaaS is more “all or nothing”.
While PaaS still incurs resource-based costs, they are easier to plan and usually come with a lower price tag.
Almost all software publishers have degressive pricing (the unit price drops as volume grows), and some even have “all you can eat” pricing beyond a certain point.
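The difference between linear and sub-linear scaling can be sketched in a few lines. The prices and tier boundaries below are hypothetical and only model the usage or license part of the bill (PaaS infrastructure costs come on top); the last tier has a price of zero to mimic “all you can eat” pricing.

```python
def usage_based_cost(units: float, price_per_unit: float) -> float:
    """Linear model: every unit of usage costs the same."""
    return units * price_per_unit


def degressive_cost(units: float, tiers: list[tuple[float, float]]) -> float:
    """Tiered model: each tier is (upper_bound_in_units, price_per_unit).

    Units are billed at the price of the tier they fall into, so the
    average unit price drops as usage grows.
    """
    total = 0.0
    lower = 0.0
    for upper, price in tiers:
        if units <= lower:
            break
        billed = min(units, upper) - lower
        total += billed * price
        lower = upper
    return total


# Hypothetical tiers: full price up to 100 units, half price up to 1,000,
# then free ("all you can eat") beyond that.
TIERS = [(100.0, 1.0), (1000.0, 0.5), (float("inf"), 0.0)]

for units in (100, 1_000, 10_000):
    print(units, usage_based_cost(units, 1.0), degressive_cost(units, TIERS))
```

Under these assumptions, growing usage by 100x multiplies the linear bill by 100 (from 100 to 10,000) while the tiered bill goes from 100 to 550.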
When picking DBaaS, be prepared for a thorough discussion with your CISO! If you work in a smaller organization, that discussion may not take place, but authentication and connectivity will still be more involved than when accessing an instance running in your own environment.
DBaaS will create extra work for you if data confidentiality is critical to your operation.
If you don’t have any specific security requirements (which does not mean you don’t care about being hacked), DBaaS can actually be better, as the provider’s security standards may well be higher than what you can deliver on your own.
The performance comparison is trickier than it seems because PaaS generally gives you more power for the same budget. In addition, DBaaS has more variability since you never know exactly what happens behind the scenes. Is your instance running on a dedicated machine? Is the provider changing the way they organize instances? Is there any throttling in place?
Another crucial aspect is the connectivity between your application and the database. In a PaaS deployment, you fully control this; in a DBaaS, you need to ensure that the provider can give you a high-performance link.
It doesn’t mean you should rule out DBaaS if performance matters; it means you will need to spend more time studying what exactly you are getting for your budget and which guarantees you are getting.
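One simple way to start that study is to measure the network round trip between your application host and the database endpoint. The sketch below only times TCP connection setup, which is a rough proxy and says nothing about query performance; the function name and parameters are our own, and the demo connects to a throwaway local listener so it runs without any real endpoint.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 3.0) -> list[float]:
    """Time `samples` TCP connection setups to (host, port), in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Local stand-in for a database endpoint (assumption: replace host and
    # port with the endpoint your provider gives you).
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen()
    host, port = server.getsockname()

    results = tcp_connect_latency_ms(host, port)
    server.close()

    print(f"median connect latency: {statistics.median(results):.3f} ms")
```

Run it from the machine that will host your application, at different times of day, to get a feel for both the typical latency and its variability.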
So which should you pick? Here is the DBaaS vs PaaS decision diagram we use with our customers; we’re sure it applies to platforms other than Quasar!
What is the logic behind this diagram?
We start with the data volume as it’s highly correlated to the cost structure. The threshold we picked is relevant for the use cases we typically work with (remember, we usually compress data 10 to 20X at no performance cost); you may need to adjust it when considering other database technologies.
Query volume is a subjective term. By “medium-low,” we mean that you don’t have a constant need to query your data set (for example, for visualization) or that you can reduce the volume through caching because data freshness isn’t that relevant for your use case.
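To illustrate how caching lowers query volume when freshness is not critical, here is a minimal time-to-live (TTL) cache sketch. The `TTLCache` class and the `run_query` function are hypothetical stand-ins, not part of Quasar or any other database client; a real deployment would more likely rely on an existing caching layer.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve a cached result until it is older than `ttl_seconds`."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh enough: no database round trip
        value = self._fetch(key)  # stale or missing: query the database
        self._entries[key] = (now, value)
        return value


calls = []


def run_query(sql: str) -> str:
    """Hypothetical stand-in for a real database call."""
    calls.append(sql)
    return f"result of {sql}"


cache = TTLCache(run_query, ttl_seconds=60.0)
cache.get("SELECT 1")
cache.get("SELECT 1")  # served from the cache
print(len(calls))  # 1
```

The TTL is the knob that trades freshness for query volume: the staler the data you can tolerate, the fewer queries reach the database.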
We hope you found this short guide useful. Don’t hesitate to get in touch with us if you have questions regarding timeseries data engineering!