Akumuli

Mar 10, 2020

Analyzing 500 Billion Rows Using Akumuli

It’s been a while since the Scylla team demonstrated that their highly capable database can read 1B (1,000,000,000) rows per second on an 83-node cluster. This is roughly...

Mar 13, 2019

Runtime Cost of Write-Ahead Logging

The recent major Akumuli release added support for write-ahead logging. This is a huge milestone for several reasons...

Aug 3, 2018

High-cardinality support

Every TSDB has to deal with vast data volumes. But it’s not only about writes per second. There is a different dimension to the problem that most TSDB vendors do not tackle. This problem is dataset cardinality, the number of...

Apr 28, 2018

Scaling TSDB-specific operations

The point of using a specialized time-series database is to have an edge over conventional databases in time-series-specific operations. Most often, TSDBs are judged by their write speed. In my opinion, read performance is just as important, if not more so. Moreover, not only plain reads but...

Nov 17, 2017

Inverted Index

Tag support is very important for any modern time-series database. The world from which time-series data comes is complex. Time-series data is not just time-ordered values (measurements); these time-ordered values form individual series...

Aug 1, 2017

Storage engine design (part 2)

In the previous article, I wrote about the reasons that made me choose a B+tree-based data structure for Akumuli. In this article, I want to describe further advantages of the B+tree compared to the LSM-tree.

Apr 29, 2017

Storage engine design

In time-series databases the querying pattern differs from the write pattern. We usually write data in time order by updating many series every second. But querying is a different story. We usually want to read only a handful of series leaving most of the data ...

Mar 10, 2017

Benchmarking Akumuli on 32-core machine

Recently I tested Akumuli on an m3.2xlarge EC2 instance. Write throughput was ...

Feb 13, 2017

Understanding Akumuli Performance

Akumuli was designed with performance in mind from the very beginning. As one of the project goals, I set the lower bound for write throughput at 1M writes/second. Every version so far delivers this performance, which is why this number is mentioned on the project page. But this is only a lower bound. It would be interesting to see what level of performance is achievable with today's hardware!

Feb 5, 2017

Time-series compression (part 2)

In the previous article I discussed timestamp compression; now it's time to talk about floating-point data compression. This problem is not new; there are some good papers about it, e.g. the Gorilla paper, and also this, and ...

Jan 24, 2017

Akumuli Markedly Outperforms InfluxDB in Time-Series Data & Metrics Benchmark

I usually try to avoid comparisons with other databases. I can't create an objective benchmark because I can't unlearn everything that I know about my own product. Most probably, I'd create a biased benchmark that works great with Akumuli. Fortunately, InfluxData released their own set of benchmarks and published a series of articles comparing their product with the competition. I decided to give it a try.

Dec 30, 2016

Time-series compression (part 1)

The most important component of a time-series database is the compression engine, because it defines the trade-offs, and the trade-offs shape the architecture of the entire system.

Sep 14, 2016

Next iteration

There has been no news and no blog updates from me recently. This doesn't mean that I stopped working on Akumuli. The project is moving forward, but in a slightly different direction.

Oct 12, 2015

Why Akumuli is a standalone database?

There are a lot of distributed time-series databases nowadays, but Akumuli takes a different approach. It’s a standalone solution, and there is some logic behind this design decision.

Aug 18, 2015

Progress report

I've decided to post progress reports periodically. The first one is going to be large.

Mar 19, 2015

Sorting, caching and concurrency

Akumuli is a time-series database whose input is supposed to be generated on different machines...

Feb 13, 2015

Time-series storage design (part three)

In my previous storage rant I sketched Akumuli's write algorithm. In this algorithm, two different memory regions should be updated on every chunk write. Each memory region is updated sequentially, but both are updated at the same time. In addition to that, the volume header should be updated too. One can argue that all of this results in a non-sequential write pattern and should cause a slowdown. This is not the case in reality, and I want to explain why.

Feb 9, 2015