Analyzing 500 Billion Rows Using Akumuli

It’s been a while since the Scylla team demonstrated that their highly capable database can read 1B (1,000,000,000) rows per second on an 83-node cluster. This is roughly...

Runtime Cost of Write-Ahead Logging

The recent major Akumuli release added support for write-ahead logging. This is a huge milestone for several reasons...

High-cardinality support

Every TSDB has to deal with vast data volumes. But it’s not only about writes per second. There is a different dimension to the problem that most TSDB vendors don’t tackle: dataset cardinality, the number of...

Scaling TSDB-specific operations

The point of using a specialized time-series database is to have an edge over conventional databases in time-series-specific operations. Most often, TSDBs are judged by their write speed. In my opinion, read performance is just as important, if not more so. Moreover, not only plain reads but...

Inverted Index

Tag support is very important for any modern time-series database. The world that time-series data comes from is complex. Time-series data is not just time-ordered values (measurements); these time-ordered values form individual series...
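
To make the idea concrete, here is a minimal sketch in Python of an inverted index that maps tag key=value pairs to series IDs, so a query like host=backend01 can locate matching series without scanning everything. The series names and helper functions are made up for illustration; this is not Akumuli's actual implementation.

    from collections import defaultdict

    # Inverted index: each tag key=value pair maps to the set of series IDs
    # that carry it. A series is identified by a metric name plus its tags.
    index = defaultdict(set)
    series_ids = {}

    def add_series(name):
        # Assign a numeric ID to the series and index each of its tags.
        sid = series_ids.setdefault(name, len(series_ids))
        metric, *tags = name.split()
        for tag in tags:
            index[tag].add(sid)
        return sid

    def find(*tags):
        # Intersect the posting lists of all requested tags.
        sets = [index[t] for t in tags]
        return set.intersection(*sets) if sets else set()

    add_series("cpu.user host=backend01 region=eu")
    add_series("cpu.user host=backend02 region=us")
    print(find("host=backend01"))   # {0}
    print(find("region=us"))        # {1}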

Storage engine design (part 2)

In the previous article, I wrote about the reasons that made me choose a B+tree-based data structure for Akumuli. In this article, I want to talk about other advantages of the B+tree compared to the LSM-tree.

Storage engine design

In time-series databases, the querying pattern differs from the write pattern. We usually write data in time order, updating many series every second. But querying is a different story. We usually want to read only a handful of series, leaving most of the data ...

Benchmarking Akumuli on 32-core machine

Recently I tested Akumuli on an m3.2xlarge EC2 instance. Write throughput was ...

Understanding Akumuli Performance

Akumuli was designed with performance in mind from the very beginning. I set the lower bound for the write throughput at the 1M writes/second level as one of the project goals. Every version so far delivers this performance, which is why this number is mentioned on the project page. But this is only a lower bound. It would be interesting to see what level of performance is achievable with today's hardware!

Time-series compression (part 2)

In the previous article I discussed timestamp compression; now it's time to talk about floating-point data compression. This problem is not new; there are some good papers about it, e.g. the Gorilla paper, and also this, and ...
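
For a taste of the idea behind the Gorilla approach: consecutive values in a series are usually close, so XORing each value's bit pattern with its predecessor's yields mostly zero bits, which can be stored in very few bits. A minimal Python sketch (the leading/trailing-zero bookkeeping of the real encoder is omitted):

    import struct

    def xor_deltas(values):
        # Reinterpret each double as a 64-bit integer and XOR it with the
        # previous one; similar consecutive values produce mostly zero bits.
        prev, out = 0, []
        for v in values:
            bits = struct.unpack("<Q", struct.pack("<d", v))[0]
            out.append(bits ^ prev)
            prev = bits
        return out

    # A repeated value XORs to zero; a small change leaves long zero runs.
    print([hex(d) for d in xor_deltas([100.0, 100.0, 100.5])])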

Akumuli Markedly Outperforms InfluxDB in Time-Series Data & Metrics Benchmark

Usually, I try to avoid comparisons with other databases. I can't create an objective benchmark because I can't unlearn everything I know about my own product. Most probably, I'd create a biased benchmark that works great with Akumuli. Fortunately, InfluxData released their own set of benchmarks and published a series of articles comparing their product with the competition. I decided to give it a try.

Time-series compression (part 1)

The most important component of a time-series database is its compression engine, because it defines the trade-offs, and the trade-offs shape the architecture of the entire system.

Next iteration

Recently there has been no news or blog updates from me. This doesn't mean that I stopped working on Akumuli. The project is moving forward, but in a slightly different direction.

Why is Akumuli a standalone database?

There are a lot of distributed time-series databases nowadays, but Akumuli takes a different approach. It’s a standalone solution, and there is some logic behind this design decision.

Progress report

I've decided to post progress reports periodically. The first one is going to be large.

Sorting, caching and concurrency

Akumuli is a time-series database whose input is supposed to be generated on different machines...

Time-series storage design (part three)

In my previous storage rant I sketched Akumuli's write algorithm. In this algorithm, two different memory regions must be updated on every chunk write. Each memory region is updated sequentially, but both are updated at the same time. In addition to that, the volume header must be updated too. One can argue that all of this results in a non-sequential write pattern and should cause a slowdown. This is not the case in reality, and I want to explain why.
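
The key observation is that each region individually sees only sequential appends, so the disk ends up serving two well-behaved streams plus an occasional small header write. A simplified Python sketch of that access pattern (the offsets and layout are hypothetical, not the real on-disk format):

    # Hypothetical volume layout: a fixed-size header, a data region, and an
    # index region placed further into the volume.
    HEADER_SIZE = 4096
    DATA_START = HEADER_SIZE
    INDEX_START = 1 << 30

    data_off, index_off = DATA_START, INDEX_START

    def write_chunk(volume, payload, metadata):
        global data_off, index_off
        # Both writes are appends within their own region: each offset only
        # moves forward, so each stream is sequential on its own.
        volume.seek(data_off);  volume.write(payload);  data_off  += len(payload)
        volume.seek(index_off); volume.write(metadata); index_off += len(metadata)
        # Small in-place header update recording the new tail offsets.
        volume.seek(0)
        volume.write(data_off.to_bytes(8, "little") + index_off.to_bytes(8, "little"))

    with open("volume.bin", "wb+") as vol:
        write_chunk(vol, b"chunk-0-payload", b"chunk-0-metadata")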

Why text-based serialization is awesome

One of the most important parts of Akumuli is its serialization mechanism and network protocol. It should allow us to sustain a steady message flow between client and server...
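
As a hint of what a text-based protocol buys you, here is a sketch of RESP-like framing for a single data point (the exact Akumuli wire format may differ; the series name and values are made up): each field sits on its own CRLF-terminated line, so the stream is human-readable and trivially debuggable with telnet.

    def encode_point(series, timestamp, value):
        # RESP-like framing: '+' prefixes a string, ':' an integer.
        return ("+" + series + "\r\n"
                + ":" + str(timestamp) + "\r\n"
                + "+" + str(value) + "\r\n")

    print(encode_point("cpu.user host=backend01", 1418224205000000000, 22.0), end="")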

Time-series storage design (part two)

Let's talk about compression algorithms. You probably know that a DBMS can be row-oriented or column-oriented. Both have strengths and weaknesses. Row-oriented storage has higher write throughput and ...
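
For intuition, a toy Python illustration of the two layouts: a row store appends one record per data point, while a column store keeps each field in its own array, which puts similar values next to each other and makes them far easier to compress.

    # Row-oriented: one append per data point; great for write throughput.
    rows = [("cpu.user", 1000, 22.0), ("cpu.user", 1001, 22.5)]

    # Column-oriented: each field lives in its own array, so timestamps sit
    # next to timestamps and values next to values, which compresses well
    # (e.g. delta encoding for timestamps, XOR encoding for values).
    columns = {"series": [], "timestamp": [], "value": []}
    for series, ts, val in rows:
        columns["series"].append(series)
        columns["timestamp"].append(ts)
        columns["value"].append(val)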

Time-series storage design (part one)

Time-series data can be difficult to store. It is often generated over long time periods and can be huge. This gives us the first two requirements...

Motivation

I don't like most open-source time-series databases that have been created recently; some of them lack compression, some are slow or focus on the wrong problems. In my opinion, a good time-series database should satisfy several requirements.