Next iteration

####Next iteration Recently there were no news or blog updates from me. This doesn’t mean that I stopped working on Akumuli. The project is moving forward but in a bit different direction. First of all I need to mention that Akumuli is working. It can have some bugs and not fully feature complete (e.g. aggregation is not implemented) but I think it’s more stable than some other products that I’ve used as early adopter. Nevertheless I don’t want to encourage anyone to use it for anything except learning and self education because Akumuli is about to change drastically. The reason for this is some inherent design flaws:

  • Storage system is based on flat files so random writes can’t be implemented. I’ve discovered that this is a showstopper for really large amount of people. Data import becomes hard and backfill is almost impossible.
  • It’s impossible to read one time-series without reading entire data range. Read amplification is huge. This makes Akumuli bad choice for interactive applications. If you’re curious why there is no Grafana integration yet - this is why. Akumuli can read and write data fast but it can’t draw graphs quickly.
  • I didn’t implement rollup aggregation in Akumuli because the idea is moot. With existing solutions you should preconfigure rollup aggregations (e.g. Graphite does this) or use some sort of materialized view (InfluxDB way). In both cases you should configure it beforehand and if you did it wrong - it’s hard to fix. In some cases you may not know what rollups will be needed.
  • Scalability/portability issues. Akumuli is standalone server/library and can work only on 64bit linux. Current storage engine makes clustering impossible. It’s also makes ARM port hard (because it relies on mmap heavily).
  • All hardware is parallel now (CPUs, SSDs) but current implementation can’t leverage parallelism. There is a single writer thread inside the storage engine. Because everything is log-structured and globally ordered it’s impossible to bypass this shortcoming.

The good thing is that I’ve already been working on the solution about six month. I’ve created new storage engine. It’s based on “Numeric B+tree” data structure that I’ve designed. It’s described in this technical document. New storage engine is mostly implemented but needs some polish and integration into the system.

Quick recap on its features:

  • Storage is column-oriented. Each series can be retrieved individually.
  • Log-structured append-only Numeric B+tree, multiversion concurrency control. All writes and reads are aligned and mmap is not used anymore.
  • Crash recovery mechanism that actually works and have been tested.
  • Fast aggregation without preconfigured rollups or materialized views.
  • Built-in compression for memory and disk resident data. It is pretty fast (about 1GB/s on my machine) and it is tested using AFL fuzzer.
  • Dedicated writer thread is gone. It is possible to write data in parallel from several threads.

After everything will be done I expect performance to become even better. The goal is to have about 2 million writes/second per core. At the same time aggregations should have sub millisecond latencies. My benchmarks shows that this already the case and I just need to preserve this results.