Scaling to the moon
August 15th, 2022
By Jeffrey M. Barber

Here I am with two servers and a service in early access, and I’m thinking about massive scale. And by massive scale, I mean the entire planet filled with billions of devices. This is the kind of scale I’ll probably never work at again now that I’ve left Meta, where I was the architect of how Meta does real-time. Have you read my former team’s SOSP ’21 paper on BladeRunner? It’s a fun read. Regardless, I was wondering whether Adama could achieve massive scale, so let’s think out loud.

For Adama, massive scale for reading documents personalized per viewer is relatively low-hanging fruit because Adama emits a delta log. Logs replicate easily, and I can introduce a new read-only connect operation called observe. This is much like a connection except that (a) documents have to allow it at a static policy level, and (b) viewers cannot send messages to the document (though viewers can still share view state).
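To make the observe idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: `Document`, `observe`, `emit`, `Observer`, `send`, and `update_view_state` are illustrative names, not Adama's actual API, and the static policy is modeled as a single boolean flag.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class ReadOnlyError(Exception):
    """Raised when an observer tries to send a message to the document."""


@dataclass
class Observer:
    viewer: str
    view_state: Dict[str, Any] = field(default_factory=dict)

    def send(self, channel: str, message: Dict[str, Any]) -> None:
        # Observers are read-only: messages to the document are rejected.
        raise ReadOnlyError("observers cannot send messages")

    def update_view_state(self, patch: Dict[str, Any]) -> None:
        # Viewers may still share view state (e.g. which page they are on).
        self.view_state.update(patch)


@dataclass
class Document:
    # Hypothetical stand-in for an Adama document, not the real API.
    allow_observe: bool  # static, document-level policy gate
    listeners: List[Callable[[Dict[str, Any]], None]] = field(default_factory=list)

    def observe(self, viewer: str, on_delta: Callable[[Dict[str, Any]], None]) -> Observer:
        if not self.allow_observe:
            raise PermissionError(f"document does not permit observers ({viewer})")
        self.listeners.append(on_delta)
        return Observer(viewer)

    def emit(self, delta: Dict[str, Any]) -> None:
        # Each delta appended to the log is pushed to every observer.
        for listener in self.listeners:
            listener(delta)
```

The policy check happens once, when the observer attaches, which is what makes the connection cheap to replicate: after that, an observer is only a sink for the delta log.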

Since the operation is read-only, massive scale is achievable through hierarchical delta replication, which scales as long as the amplification at each level fits within that level’s outbound network card capacity. Because every level is doing differentiation on a stream, the amplification factor can be surprisingly high.
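The amplification argument is just arithmetic, so here is a back-of-the-envelope sketch. It assumes every hop carries roughly the same delta stream and that a node's fanout is capped by its outbound NIC bandwidth divided by the stream's bitrate; `max_fanout` and `tiers_needed` are hypothetical helpers, and the numbers below are illustrative, not measurements.

```python
def max_fanout(nic_bits_per_sec: int, stream_bits_per_sec: int) -> int:
    """Children one node can feed before its outbound NIC saturates."""
    return nic_bits_per_sec // stream_bits_per_sec


def tiers_needed(viewers: int, fanout: int) -> int:
    """Smallest number of hops d such that fanout ** d >= viewers."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2 for the tree to grow")
    depth, reach = 0, 1
    while reach < viewers:
        # Each extra hop multiplies the audience by the per-node fanout.
        reach *= fanout
        depth += 1
    return depth
```

With an assumed 10 Gbit/s NIC and a 1 Mbit/s delta stream, `max_fanout(10_000_000_000, 1_000_000)` is 10,000, and `tiers_needed(1_000_000_000, 10_000)` is 3: three hops from the authoritative server are enough to reach a billion viewers. This ignores the per-viewer privacy work at the last mile, which adds CPU cost not captured here.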

Figure: replication in a tree

A document with worldwide attention would still have a finite number of participants who can write, so the authoritative server only needs to replicate to the various regions. Within a remote region, a proxy instance is elected to vend to multiple tiers (which may themselves just be proxies for even more massive fanout). The last mile runs a specialized version of the Adama document that contains only the privacy logic, which keeps the code footprint smaller than shipping all of the document logic.
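The tree can be sketched with two kinds of nodes: relays that copy each delta to their children, and last-mile leaves that hold only privacy rules and personalize each delta per viewer. This is a hypothetical model, not Adama's implementation: `Relay`, `LastMile`, and `can_see` are illustrative names, deltas are flat dictionaries, and privacy is a per-field predicate, which is far simpler than Adama's privacy policies.

```python
from typing import Any, Callable, Dict, List, Protocol

Delta = Dict[str, Any]


class Sink(Protocol):
    def push(self, delta: Delta) -> None: ...


class Relay:
    """Hypothetical proxy tier: forwards every delta unchanged to its children."""

    def __init__(self) -> None:
        self.children: List[Sink] = []

    def attach(self, child: Sink) -> None:
        self.children.append(child)

    def push(self, delta: Delta) -> None:
        # One inbound delta becomes len(children) outbound copies (the amplification).
        for child in self.children:
            child.push(delta)


class LastMile:
    """Hypothetical leaf: carries only privacy rules and personalizes per viewer."""

    def __init__(self, can_see: Callable[[str, str], bool]) -> None:
        self.can_see = can_see  # (viewer, field name) -> visible to this viewer?
        self.viewers: Dict[str, List[Delta]] = {}  # viewer -> deltas delivered

    def connect(self, viewer: str) -> None:
        self.viewers[viewer] = []

    def push(self, delta: Delta) -> None:
        for viewer, outbox in self.viewers.items():
            visible = {k: v for k, v in delta.items() if self.can_see(viewer, k)}
            if visible:  # skip deltas that change nothing this viewer may see
                outbox.append(visible)
```

For example, with `can_see = lambda viewer, key: key == "score" or key == "hand_" + viewer`, pushing `{"score": 3, "hand_alice": [1, 2]}` into the root relay delivers both fields to alice but only `score` to bob. Relays never look inside a delta, which is why the inner tiers can stay generic proxies.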

And, that’s it! But what about writes from a billion devices…

That’s a much harder game, but here is where the language approach shines yet again. Instead of defining a message handler, I could define a message reducer so that everyone connected to a rack has a portion of their voice heard, with their messages combined into a reduced payload. Once you have a reducer that turns a fire hose into a garden hose, you run the connections the other way, and the document gets a sustainable stream of feedback. This assumes there is no product-level way to introduce more documents into the picture, but that’s a conversation for a different day.
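A message reducer can be sketched as associative aggregation. Here I assume each device's message is a vote for an option, so the reducer is a count; `reduce_messages` and `merge` are hypothetical names, and a real product would pick whatever reduction suits its messages.

```python
from collections import Counter
from typing import Iterable


def reduce_messages(messages: Iterable[str]) -> Counter:
    """Collapse many per-device messages (here: votes for an option) into one payload."""
    return Counter(messages)


def merge(partials: Iterable[Counter]) -> Counter:
    """Combine reduced payloads from several racks or tiers into one."""
    total: Counter = Counter()
    for partial in partials:
        # Counter.update adds counts rather than replacing them.
        total.update(partial)
    return total
```

Because `merge` is associative and commutative, each rack can reduce its own connections and each tier above can merge the partial results in any order, so the payload reaching the document grows with the number of distinct options rather than the number of devices.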

For now, the question of priority is simple: massive scale is not a priority. However, I am a curious monk. The good news is that most of the required effort is already done, and I just need to connect the dots and plumb things together. The really hard problem, though, is provisioning capacity on demand so that the hierarchy emerges within a multi-tenant environment.

I do not have an answer yet, but I’m thinking about it…