I’m happy to report that there are a few users in the system! That’s pretty great, and I’m ahead of my own expectations.
Progress has been made on the high-priority blockers along with continued polish on the main infrastructure. Just the other day, I sorted out an interesting race condition where writes were landing out of sequence. I detected this while load testing the new storage engine. I’m feeling joyful as the new storage engine is shaping up to be better on a few dimensions, so let’s chat about the changes.
First, the gains from switching away from gRPC greatly lifted the number of potential streams and message rates, which revealed a few core issues with the data stack. I’ve got some special code in production that lets me test the new data services, and one of them is a simple write-ahead logger that shreds writes across various disks using the file system.
It greatly reduces the latency for a write to be flushed to disk, but it creates a wicked hangover when building the final files to be queried. The overhead of opening files, moving files, and just dealing with the file system is significant. However, it can be amortized such that overall latency is decent. It’s not half bad, but I can do better by building a log-structured merge tree. Before getting into my thoughts on investing more along this path, let’s checkpoint how recent investments change the equation.
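The core idea behind the write-ahead logger is that appending is cheap while forcing data to disk is expensive, so the flush is batched and its cost amortized across many writes. Here’s a minimal toy sketch of that idea (not Adama’s actual code; the class name and framing are illustrative):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;

// A toy write-ahead log: records are appended to a single file and
// forced (fsync'd) to disk in batches, so the expensive flush is
// amortized across many writes.
public class ToyWriteAheadLog implements AutoCloseable {
  private final FileChannel channel;
  private final ArrayList<byte[]> pending = new ArrayList<>();

  public ToyWriteAheadLog(Path file) throws IOException {
    this.channel = FileChannel.open(file,
        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
  }

  // Stage a write; it is durable only after the next flush().
  public void append(byte[] record) {
    pending.add(record);
  }

  // Length-prefix and write every staged record, then pay for one fsync.
  public void flush() throws IOException {
    for (byte[] record : pending) {
      ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
      buf.putInt(record.length).put(record).flip();
      while (buf.hasRemaining()) {
        channel.write(buf);
      }
    }
    pending.clear();
    channel.force(true); // the expensive part, paid once per batch
  }

  @Override
  public void close() throws IOException {
    flush();
    channel.close();
  }
}
```

The hangover mentioned above comes later: something still has to read these shredded logs back and compact them into the final queryable files.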
If we recall our launch analysis comparing Adama to AWS, we saw that even a “slow” and “expensive” Adama was 78% cheaper on the sample scenario. That scenario tested Adama with reasonable characteristics for board games: 1,600 players playing 400 games over an hour.
By using the local instance data with EBS, we reduced the computing overhead as we no longer need the db.m6g.large instance ($0.152/hr). This alone reduces Adama’s price by 66.5%. Furthermore, the capacity has dramatically changed as we can now host 4,000 players playing 1,000 games at half the latency. We can then throw these new numbers in to compare and contrast against using AWS Lambda, AWS Gateway, and AWS DynamoDB:
| Service | Rate | Usage | Cost |
| --- | --- | --- | --- |
| AWS Lambda | $0.20 per 1 million requests | 1,440,000 plays | $0.288 |
| AWS Lambda | $0.0000166667 per GB-second | 70,312.5 GB-sec | $1.1719 |
| AWS Gateway | $0.80 per billion messages | 0.0072 billion messages | $0.00576 |
| AWS Gateway | $0.25 per million connection-minutes | 0.24 million connection-minutes | $0.06 |
| AWS DynamoDB | $1.25 per million writes | 1.44 million writes | $1.44 |
| AWS DynamoDB | $0.25 per million reads | 1.44 million reads | $0.36 |
So, comparing the $0.0765 price of Adama to the $3.33 price of “serverless” shows Adama as a stark 97.7% cheaper. This isn’t much of a surprise to anyone who understands what a single machine can do versus a composition of services where every interaction is metered.
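For the curious, the totals above can be spot-checked by summing the line items in the table (the DynamoDB write line is taken at its tabulated cost):

```java
// Recompute the "serverless" bill from the rates and usage quoted above,
// then the relative savings versus Adama's $0.0765.
public class CostCheck {
  public static void main(String[] args) {
    double lambdaRequests = 0.20 * (1_440_000 / 1_000_000.0);   // $0.288
    double lambdaCompute  = 0.0000166667 * 70312.5;             // ~$1.1719
    double gatewayMsgs    = 0.80 * 0.0072;                      // $0.00576
    double gatewayMinutes = 0.25 * 0.24;                        // $0.06
    double dynamoWrites   = 1.44;                               // as tabulated
    double dynamoReads    = 0.25 * 1.44;                        // $0.36
    double serverless = lambdaRequests + lambdaCompute + gatewayMsgs
        + gatewayMinutes + dynamoWrites + dynamoReads;          // ~$3.33
    double savings = 1.0 - 0.0765 / serverless;                 // ~97.7%
    System.out.printf("serverless=$%.4f savings=%.1f%%%n", serverless, savings * 100);
  }
}
```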
However, given the benefits of Adama and the deployment model, this is kind of amazing. I believe I can push the limits further by switching to a leaner binary format rather than JSON. Not only can I make the format use less memory and storage, but reading and writing the new format would use less CPU, which would reduce latency as well.
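To make the size claim concrete, here’s a tiny illustration using a hypothetical record (a sequence number and a timestamp; not Adama’s actual schema): the JSON encoding spends bytes on field names, quotes, and decimal digits, while a fixed binary layout needs only the raw values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Compare the same hypothetical record as JSON text versus a fixed
// binary layout: a 4-byte int sequence number and an 8-byte long timestamp.
public class FormatSize {
  public static void main(String[] args) {
    String json = "{\"seq\":123456,\"at\":1620000000000}";
    byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8); // 33 bytes
    ByteBuffer binary = ByteBuffer.allocate(4 + 8);
    binary.putInt(123456).putLong(1620000000000L);            // 12 bytes
    System.out.println(jsonBytes.length + " bytes vs " + binary.capacity() + " bytes");
  }
}
```

The binary form also skips parsing text into numbers on every read, which is where the CPU savings come from.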
However, I am not close to having a new and shiny data store ready for production, as I still need to figure out how I want to think about scale, durability, back-ups, availability, and data integrity. The key challenge is that I have to figure out which trade-offs I want to live with.
Sadly, there is no ultimate solution available, and I have to sort out which properties feed into various markets. Currently, I’m targeting board games, which will be less sensitive than billion-dollar enterprises. I can, in the organic sense, start small and lean. Perhaps all I need to do is switch gears, focus on the scaling side, use Kinesis+S3 for durability, and then build a tailer to shred the stream into something like S3.
There is much fun to be had!