Recently I tested Akumuli on an m3.2xlarge EC2 instance. Write throughput was around 4.5 million elements/second. This number might look unrealistic at first, but in fact it amounts to less than 20MB/s of disk write throughput, because each data point is tiny (less than five bytes in that particular case) and all data is compressed in real time.
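To make that estimate concrete, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the 4-byte compressed size is an assumption (the text only says "less than five bytes").

```python
def disk_bandwidth_mb_s(points_per_second: float, bytes_per_point: float) -> float:
    """Disk write bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return points_per_second * bytes_per_point / 1e6


# 4.5 million points/second from the m3.2xlarge run.
# 4.0 bytes per compressed point is an assumed value, not a measurement.
estimate = disk_bandwidth_mb_s(4.5e6, 4.0)
print(f"{estimate} MB/s")
```

At the five-byte upper bound the same formula gives 22.5 MB/s, so staying under 20MB/s implies an average compressed size of a bit less than 4.5 bytes per point.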
The next step was to try it on a bigger machine. I chose a c3.8xlarge EC2 instance with 32 vCPUs (Intel Xeon E5-2680 v2) and SSD storage. This is what I did:
I set the TCP.pool_size parameter to 32, created the database, and started the network daemon.
I set up the run.sh script on each m3.xlarge box. I set the TARGET_HOST variable correctly and decreased the number of archives from 32 to 8 (each host sent a unique subset of the test data).
I started run.sh on all machines simultaneously using parallel-ssh.
During the test run, each m3.xlarge instance sent data in parallel using 8 threads (32 threads total). All this data was written to disk in parallel on the c3.8xlarge instance. This is how everything looked in htop:
[Screenshot: htop on the c3.8xlarge instance during the test run]
The dataset contained 2764832000 data points in 32000 series. All nodes finished sending data in less than 3 minutes, and write throughput was above 16 million elements/second. The resulting database size was 11GB. The SSD was underutilized: it can write far more than the 64MB/s that this test run demanded.
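These figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above. It assumes "11GB" means decimal gigabytes and treats "less than 3 minutes" as an upper bound of 180 seconds, so the computed rate is a lower bound rather than the measured value.

```python
DATA_POINTS = 2_764_832_000      # total data points in the dataset
SERIES = 32_000                  # number of series
DB_SIZE_BYTES = 11 * 10**9       # "11GB", assumed to be decimal gigabytes
MAX_SECONDS = 3 * 60             # "less than 3 minutes", used as an upper bound
REPORTED_RATE = 16e6             # "above 16 million elements/second"

# Lower bound on the write rate: all points were written in under 180 seconds.
min_rate = DATA_POINTS / MAX_SECONDS

# Average on-disk size of one compressed data point.
bytes_per_point = DB_SIZE_BYTES / DATA_POINTS

# Points per series (the division is exact for this dataset).
points_per_series = DATA_POINTS // SERIES

# Disk bandwidth implied by the reported rate and the average point size.
disk_mb_s = REPORTED_RATE * bytes_per_point / 1e6

print(f"minimum write rate: {min_rate / 1e6:.1f} million points/second")
print(f"bytes per point:    {bytes_per_point:.2f}")
print(f"points per series:  {points_per_series}")
print(f"disk bandwidth:     {disk_mb_s:.1f} MB/s")
```

With roughly four bytes per compressed point, 16 million points per second works out to about 64MB/s, which matches the disk write rate quoted above.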
This puts Akumuli in the same ballpark as BTrDB. As far as I know, these are the only two open source time-series databases that can write data in parallel. All other solutions are still limited to single-writer designs.