One of the most important parts of Akumuli is its serialization mechanism and network protocol. It should allow us to sustain a steady message flow between client and server. There are many serialization tools suited for this purpose: Cap’n Proto, Thrift, Protocol Buffers, MessagePack, etc. After weighing their strengths and weaknesses, I decided to use the text-based RESP (REdis Serialization Protocol) format in Akumuli.
RESP is very easy to implement, human-readable and fast to parse. More importantly, it does not require an additional build step (there is no schema compiler and no generated code), and it can be parsed very safely.
This is what an actual RESP-encoded message can look like:
+network.loadavg host=postgres\r\n
+2015-02-09T07:42:42Z\r\n
+24.3\r\n
Newlines (\r\n) delimit the fields of the message. The first field is an ID string, the second is a timestamp, and the last is a floating-point number encoded as a RESP string. In a text editor it looks like three lines of text, in contrast to Protocol Buffers, Thrift or any other binary serialization format.
This data is easy to parse. Let’s look at the integer parser as an example (a RESP integer is a colon followed by decimal digits and \r\n, e.g. :1000\r\n):
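#include <cstdint>

// InputStream, Byte and throw_exception come from the surrounding codebase;
// throw_exception is expected to throw and never return.
uint64_t _read_int_body(InputStream *stream) {
    uint64_t result = 0;
    const int MAX_DIGITS = 20;  // UINT64_MAX (18446744073709551615) has 20 decimal digits
    int ndigits = 0;
    while (true) {
        Byte c = stream->get();
        if (c == '\r') {
            c = stream->get();
            // Accept the terminator only after at least one digit
            if (c == '\n' && ndigits > 0) {
                return result;
            }
            throw_exception("Bad stream");
        }
        // c must be in [0x30:0x39] range ('0'..'9')
        if (c > 0x39 || c < 0x30) {
            throw_exception("can't parse integer (character value out of range)");
        }
        if (++ndigits > MAX_DIGITS) {
            throw_exception("integer is too long");
        }
        uint64_t digit = static_cast<uint64_t>(c & 0x0F);
        // Reject values that would overflow: result*10 + digit > UINT64_MAX
        if (result > (UINT64_MAX - digit) / 10) {
            throw_exception("integer is too large");
        }
        result = result*10 + digit;
    }
}

uint64_t read_int(InputStream *stream) {
    Byte c = stream->get();
    if (c != ':') {  // RESP integers start with ':'
        throw_exception("bad call");
    }
    return _read_int_body(stream);
}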
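Data in this format is easy to encode on the client side in any programming language, and some client-side code can be reused between Redis and Akumuli. This format is also very secure: the simple strings and integers used here contain no back-references or length prefixes, just a stream of bytes delimited by \r\n, so a malformed message cannot make the parser follow a bogus offset or allocate a buffer of an attacker-chosen size (contrary to many binary serialization formats).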
Everything has its downsides. RESP-encoded data is less compact and slower to parse than binary-encoded data. But in this case, decoding and encoding performance is an order of magnitude better than needed: Akumuli’s storage engine can handle several million writes per second, and the RESP parser decodes data fast enough to keep the storage engine busy. On AWS, where you pay for traffic, compressing the RESP stream could be a good option too; this is a subject for future improvements.
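To put a rough number on “less compact”, here is a back-of-the-envelope estimate. It assumes one million messages per second (an illustrative rate, not a benchmark) and compares the 62-byte example message above with a hypothetical fixed-size binary record of 24 bytes (8-byte series ID, 8-byte timestamp, 8-byte double):

$$
\begin{aligned}
\text{RESP: } & 62\ \text{B} \times 10^{6}\ \text{msg/s} = 62\ \text{MB/s} = 496\ \text{Mbit/s} \\
\text{binary: } & 24\ \text{B} \times 10^{6}\ \text{msg/s} = 24\ \text{MB/s} = 192\ \text{Mbit/s}
\end{aligned}
$$

Under these assumptions the text encoding needs roughly 2.6 times the bandwidth, which is the overhead that compression on top of RESP would target.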