Redefining Quasar’s scope
Our goal at Quasar has always been to make managing huge volumes of numerical data effortless.
Today, with 3.14.2 released, we have full SQL support, distributed Python execution, and support for every possible ingestion mode. The result: instant queries on petabytes of data with highly flexible ingestion. Mission accomplished?
In recent years, we have focused on optimizing time-series workloads because most large-scale numerical datasets include a time dimension. Data with a time dimension also tends to come with retention requirements, higher-than-average density, and a need for sustained ingestion.
However, Quasar is also used for workloads where time is not the dominant dimension.
At the same time, Quasar’s performance model has historically exposed a tradeoff: stronger write guarantees meant lower throughput, and maximum throughput meant delayed visibility. In addition, data partitioning can sometimes feel like a chore when too much underlying logic is exposed.
In the next release, these two structural limits are removed. Partitioning is no longer bound to time and is much more flexible, and all writes are now immediately visible to queries, regardless of the mode.
Partitioning Beyond Time
What does arbitrary partitioning bring you?
Consider a manufacturing group operating 200 factories. Data is produced continuously, but queries are often scoped to a specific factory, production line, or machine. When everything is partitioned purely by time, a query for a single factory still has to scan the data of all 200 factories within each time slice. The system works, but it’s not aligned with how the data is used.
When partitioning becomes fully user-defined and multi-dimensional, this changes.
You can shard by factory, customer, region, symbol, or a combination of dimensions. Time can still be one of them, but it no longer dictates the structure.
This has two direct effects:
- Better pruning. Queries touch fewer shards because partitioning matches real query patterns.
- No manual workarounds. You no longer need to simulate multi-dimensional layouts at the application level or maintain separate tables to achieve isolation.
Instead of forcing your data into a time-first model, Quasar now lets the physical layout follow the shape of your workload.
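To make the pruning effect concrete, here is a minimal Python sketch of the factory example. It is a toy model and not Quasar's storage engine or API: the row layout, `build_shards`, and `query` are hypothetical names introduced only for illustration. It compares a time-only layout with a (factory, day) layout for a query scoped to one factory.

```python
from collections import defaultdict
from datetime import date

# Hypothetical data: one row per factory per day, as (factory_id, day, value).
ROWS = [(f, date(2024, 1, d), float(f * d))
        for f in range(1, 201)      # 200 factories
        for d in range(1, 8)]       # 7 days

def build_shards(rows, key_fn):
    """Group rows into shards according to a user-defined partition key."""
    shards = defaultdict(list)
    for row in rows:
        shards[key_fn(row)].append(row)
    return shards

def query(shards, key_matches, row_matches):
    """Scan only shards whose key matches; return (rows, rows_scanned)."""
    result, scanned = [], 0
    for key, rows in shards.items():
        if not key_matches(key):
            continue                # shard pruned: never read
        for row in rows:
            scanned += 1
            if row_matches(row):
                result.append(row)
    return result, scanned

# Layout 1: partitioned purely by time (one shard per day).
by_time = build_shards(ROWS, lambda r: (r[1],))
# Layout 2: partitioned by factory and day.
by_factory_time = build_shards(ROWS, lambda r: (r[0], r[1]))

target = 42
is_target_row = lambda r: r[0] == target

# The time-only key carries no factory information, so nothing can be pruned.
time_only = query(by_time, lambda key: True, is_target_row)
# The multi-dimensional key lets the query skip every other factory's shards.
multi_dim = query(by_factory_time, lambda key: key[0] == target, is_target_row)

print("time-only rows scanned:", time_only[1])   # 1400
print("multi-dim rows scanned:", multi_dim[1])   # 7
```

Both layouts return the same seven rows. The difference is in how much data is read to find them: the whole week for every factory in the first case, only the target factory's shards in the second.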
High Throughput. Zero Lag.
High-volume and, most importantly, sustained ingestion has been on Quasar’s “must have” list since day one, when we built a system to ingest market data for risk calculations.
One way to solve the problem is to leverage asynchronous I/O and buffer writes. This lets you push data at full speed while maintaining durability and correctness.
But asynchronous writes come with a tradeoff: delayed visibility. If you need immediate read-after-write semantics, you have to use one of the other write modes and accept potentially lower ingestion rates, or upgrade the underlying infrastructure.
That tradeoff is now gone.
In the next release, asynchronous writes become immediately visible to queries. Data can be ingested at full speed, and queries automatically include in-flight data.
Beyond performance, this simplifies the mental model. You no longer need to reason about visibility modes for the overwhelming majority of workloads. And for those who love fine-grained control, rest assured: Quasar stays true to its roots, and every setting remains under your control from start to finish.
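The change in visibility semantics can be pictured with a small Python toy model. This is a sketch of the general idea only, assuming a simple in-memory write buffer. It does not describe how Quasar implements zero-lag writes, and `BufferedTable`, `write_async`, `flush`, and `visible_in_flight` are hypothetical names.

```python
import threading

class BufferedTable:
    """Toy table with a write buffer. Illustrative only, not Quasar's design."""

    def __init__(self, visible_in_flight):
        self._lock = threading.Lock()
        self._stored = []     # rows already flushed to "storage"
        self._pending = []    # buffered rows not yet flushed (in flight)
        self._visible_in_flight = visible_in_flight

    def write_async(self, row):
        """Accept the row immediately; it is flushed later."""
        with self._lock:
            self._pending.append(row)

    def flush(self):
        """Move buffered rows to storage (would run in the background)."""
        with self._lock:
            self._stored.extend(self._pending)
            self._pending.clear()

    def query(self):
        """Return the rows a reader can see right now."""
        with self._lock:
            rows = list(self._stored)
            if self._visible_in_flight:
                rows.extend(self._pending)   # include in-flight data
            return rows

# Delayed visibility versus immediate visibility for the same write.
delayed = BufferedTable(visible_in_flight=False)
zero_lag = BufferedTable(visible_in_flight=True)
for table in (delayed, zero_lag):
    table.write_async(("SYM", 101.25))       # made-up sample row

delayed_before = delayed.query()     # [] until the buffer is flushed
zero_lag_before = zero_lag.query()   # [("SYM", 101.25)] right away

delayed.flush()
zero_lag.flush()
```

After the flush, both tables return the same row. The only difference is what a reader sees while the write is still buffered, which is exactly the window the old tradeoff forced you to reason about.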
Performance Improvements That Matter
Performance and compression remain top priorities, and we are still actively working on both.
Specifically, floating-point compression is under active development. We are studying how to adapt some of the strategies that worked for integers to doubles, and the preliminary results are promising.
Aggregation performance is also something we will always care about, particularly for high-cardinality group-bys and large distributed scans. Here, we are improving our micro-indexes without any noticeable impact on write speed.
Those two improvements, combined with extended partitioning and zero-lag writes, reinforce Quasar’s core objective: handling large-scale numerical data as if it were “nothing”.
Availability & What to Expect
So… When can you take these features for a spin?
Zero-lag writes are being rolled out to select customers while we validate their behavior under diverse, real-world workloads.
Extended partitioning and the remaining features will be introduced progressively over the coming months, with general availability targeted before summer.
If you are interested in early access or would like to participate in the rollout, don’t hesitate to reach out!
