Building WebSocket-First Infrastructure

I have built three WebSocket-based systems across my career, and the lesson I keep re-learning is this: "we also support WebSocket" and "WebSocket is the primary protocol" are two entirely different engineering games. The first is a feature checkbox. The second is an architectural commitment that reshapes your storage layer, your deployment strategy, your failure modes, and your sleep schedule.

This report documents what I learned building Adama -- a reactive document platform where every client connection is a persistent WebSocket streaming per-viewer state deltas. The hard-won parts. The parts where I was wrong. The parts where I replaced gRPC with a custom binary protocol and discovered that half my latency problems were actually compaction problems.

The case for WebSocket-first

The pitch is seductive. With a persistent connection, a small change on the server produces a small change on the wire. In Adama's benchmark of a board game with 802 user actions across 4 players, delta-based streaming over WebSocket consumed 1.17 MB of client bandwidth. The equivalent poll-and-fetch approach? 24 MB. Storage bandwidth dropped from 32 MB to 644 KB. Close to 95% of server-to-client updates fit within a single Ethernet frame (under 1500 bytes MTU).

Those numbers sound too good. They are real, but they come with a price list that nobody shows you upfront.

The failure modes that will eat you alive

Here is the fundamental problem: a request-response cycle lasts milliseconds. A WebSocket connection lasts minutes, hours, days. Stretch your model of state across that time horizon and the failure modes start multiplying.

Disconnect storms. When a WebSocket server has a bad moment and drops connections, every client reconnects simultaneously. The reconnect is not cheap -- it requires re-authentication, re-subscription, and a fresh state snapshot. If you lack spare capacity, the reconnect surge causes more failures, which causes more reconnects. This is a death spiral, and I have watched it happen in production more than once.

The invisible broken stream. A broken stream looks identical to an inactive stream. With request-response, a timeout tells you the truth. With a stream, silence could mean "nothing changed" or "your connection is dead and nobody knows." This is the single hardest operational problem in WebSocket infrastructure.

Split brain routing. In a distributed system, two nodes might simultaneously believe they own the same document. Each accumulates state. When the split resolves, you have divergent histories. The storage layer must reject writes from the stale node, which then needs to patch itself up and retry. I implemented this conflict detection using sequence-based compare-and-set against the durable log, but the latency spikes during resolution are real and user-visible.

Backpressure from slow clients. A mobile client on a congested network cannot consume updates as fast as the server produces them. Without flow control, you get an unbounded queue on the server that eventually eats all your memory. With flow control, you need a strategy for what happens to the updates the client missed.

[Figure: WebSocket connection lifecycle with failure handling. Happy path: connect, authenticate, subscribe, receive deltas, with bidirectional heartbeats every 5-10s. Failure zone: network drop, server deploy, silent timeout, backpressure. Recovery path: exponential backoff (1s, 2s, 4s ... capped at 30s), reconnect, re-auth, full resync from a fresh snapshot, resume deltas.]

Heartbeats and exponential backoff that actually work

You cannot trust the browser. You cannot trust the server. You cannot trust TCP. You need bidirectional proof of life.

I use heartbeat pings every 5-10 seconds in both directions. These heartbeats carry a small gossip-style payload: the current document sequence number and a hash of the client's known state. If the sequence numbers diverge beyond a threshold, something went wrong silently and a resync is triggered. This is how you distinguish "inactive stream" from "dead stream" without waiting for TCP's own timeout (which can take minutes).
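
A minimal sketch of that divergence check in TypeScript. The payload field names and the lag threshold are illustrative, not Adama's actual wire format:

```typescript
// Hypothetical heartbeat payload: the document sequence number plus a
// hash of the state the sender believes it has (names are illustrative).
interface Heartbeat {
  seq: number;        // current document sequence number
  stateHash: string;  // hash of the known state at that sequence
}

// Decide whether silent divergence warrants a full resync. A small lag is
// normal (deltas still in flight); a large gap, or a hash mismatch at the
// same sequence, means the stream broke without anyone noticing.
function needsResync(server: Heartbeat, client: Heartbeat, maxLag = 32): boolean {
  if (server.seq - client.seq > maxLag) return true; // client fell too far behind
  if (server.seq === client.seq && server.stateHash !== client.stateHash) {
    return true;                                     // same point, different state
  }
  return false;
}
```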

On reconnect, exponential backoff is non-negotiable. Start at 1 second. Double each attempt. Cap at 30 seconds. Add jitter (random factor of 0.5-1.5x) to prevent synchronized reconnect waves. The cap matters -- without it, clients that lost connectivity for a long period would wait unreasonably long to come back.
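
The policy above fits in a few lines. The `rng` parameter is injected here only so the jitter is testable:

```typescript
// Exponential backoff with jitter: start at 1s, double each attempt,
// cap at 30s, then multiply by a random factor in [0.5, 1.5) to break
// up synchronized reconnect waves.
function backoffMs(attempt: number, rng: () => number = Math.random): number {
  const base = Math.min(1000 * 2 ** attempt, 30_000); // 1s, 2s, 4s ... capped at 30s
  const jitter = 0.5 + rng();                         // uniform in [0.5, 1.5)
  return Math.round(base * jitter);
}
```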

But here is the subtlety people miss: exponential backoff is not just for the client. Internal services need it too. When the WebProxy loses its connection to an Adama host, it applies the same backoff strategy internally. An aggressive retry policy inside your own infrastructure is a self-inflicted DDoS.

Why gRPC got replaced with a custom binary protocol

The initial architecture used gRPC between the WebProxy tier and the Adama document hosts. It worked. It was also burning CPU.

When I replaced gRPC with vanilla Netty and code-generated binary codecs, the results were immediate:

Concurrent Streams | gRPC p95 Latency | Custom Protocol p95 | Improvement
400                | 65 ms            | 54 ms               | 17%
800                | 75 ms            | 67 ms               | 11%
1600               | 400 ms           | 260 ms              | 35%
3200               | 2100 ms          | 1200 ms             | 43%

CPU usage for networking dropped by 50% on the WebProxy. The improvement scaled super-linearly with load -- gRPC's overhead was disproportionately painful under contention. The custom protocol is simple: code-generated codecs with no reflection, no protobuf parsing, no HTTP/2 framing overhead. Just bytes on a Netty channel.
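
To make the "just bytes" point concrete, here is a hand-written miniature of the idea: a fixed positional layout with a length prefix, no reflection, no framing beyond what the message needs. The real codecs are code-generated for Netty on the JVM; this layout is invented purely for illustration:

```typescript
// Encode a message as: 4-byte correlation id, 4-byte payload length,
// then the payload bytes. No schema object, no reflection -- the layout
// IS the code.
function encode(id: number, payload: string): Uint8Array {
  const body = new TextEncoder().encode(payload);
  const buf = new Uint8Array(8 + body.length);
  const view = new DataView(buf.buffer);
  view.setUint32(0, id);          // correlation id
  view.setUint32(4, body.length); // payload length prefix
  buf.set(body, 8);
  return buf;
}

// Decode by reading the same positions back in order.
function decode(buf: Uint8Array): { id: number; payload: string } {
  const view = new DataView(buf.buffer, buf.byteOffset);
  const len = view.getUint32(4);
  return {
    id: view.getUint32(0),
    payload: new TextDecoder().decode(buf.subarray(8, 8 + len)),
  };
}
```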

(The irony is that the latency investigation exposed a deeper problem: compaction in the storage layer was blocking the data service thread. Replacing the protocol was the right call, but the biggest win came from fixing compaction scheduling. You often find the real bug while fixing the wrong thing.)

Request-response correlation over WebSocket

Every message on the Adama WebSocket uses a method + id pattern. The client sends {"method": "connection/create", "id": 7, ...} and every response references "id": 7. This is the trick that makes WebSocket feel like HTTP while keeping the connection alive.

For streaming responses (document connections that emit ongoing deltas), the callback is retained in a map and invoked repeatedly. For one-shot requests, the callback fires once and gets cleaned up. The protocol is asymmetric on purpose: clients send request-response upstream, and the server streams deltas downstream. This mirrors Server-Sent Events for reads with regular HTTP for writes -- which means the entire protocol can fall back to long polling for the 1.6% of environments where WebSocket is unavailable.

The id correlation pattern also makes multiplexing natural. A single WebSocket carries multiple document connections, each on their own ID. No head-of-line blocking between documents. No need for separate connections per subscription.
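
A sketch of that callback map. The names are illustrative; the one-shot versus streaming distinction is the part that matters:

```typescript
// A registered handler is either one-shot (request-response) or
// streaming (document connection emitting ongoing deltas).
type Handler = { onMessage: (msg: any) => void; streaming: boolean };

class Correlator {
  private nextId = 1;
  private handlers = new Map<number, Handler>();

  // Allocate an id for an outbound request and retain its callback.
  register(onMessage: (msg: any) => void, streaming = false): number {
    const id = this.nextId++;
    this.handlers.set(id, { onMessage, streaming });
    return id;
  }

  // Route an inbound frame by its "id" field.
  dispatch(msg: { id: number; [k: string]: any }): void {
    const h = this.handlers.get(msg.id);
    if (!h) return;                                   // late reply after cleanup: drop it
    h.onMessage(msg);
    if (!h.streaming) this.handlers.delete(msg.id);   // one-shot: fire once, clean up
  }
}
```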

The reducer pattern for slow clients

When a client cannot keep up, the naive approach is to queue. Queues grow. Memory dies. Game over.

The better approach -- and this is the core insight from building BladeRunner at scale where 90%+ of messages were intentionally discarded -- is to coalesce. Delta updates are inherently coalescable. If the server computed {"score": 5} and then {"score": 7} before the client consumed either, the merged delta is just {"score": 7}. The intermediate value is irrelevant.

This is why delta-based streaming handles congestion "for free" when implemented correctly. The server maintains a pending delta buffer per client. When flow control kicks in and the network pipe is full, new deltas merge into the pending buffer. When the pipe opens up, one coalesced delta ships. The size of the pending buffer asymptotically approaches the size of a full state snapshot -- bounded, predictable, no unbounded queues.

The cost? A slow client sees state jumps. It misses the animation of score going 5, 6, 7 and just sees 5 then 7. For most applications this is the right tradeoff. For the rare case where every intermediate state matters, you need a different protocol entirely (and probably a different architecture).
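
The merge step can be sketched as a shallow object merge. Adama's real deltas are JSON merge patches with additional array-ordering markers, so this is a simplification:

```typescript
// Shallow coalescing of pending deltas: later values for the same key
// overwrite earlier ones, so the pending buffer never grows past the
// key set of a full snapshot.
type Delta = Record<string, unknown>;

function coalesce(pending: Delta, next: Delta): Delta {
  return { ...pending, ...next };
}

// The score example from above: intermediate values are discarded.
const merged = [{ score: 5 }, { score: 6 }, { score: 7, level: 2 }, { lives: 2 }]
  .reduce(coalesce, {} as Delta);
// merged is { score: 7, level: 2, lives: 2 }
```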

[Figure: backpressure via delta coalescing. Server emits six deltas d1..d6. A fast client receives all six individually. A slow client on a congested network receives d1, then a single coalesced delta d2+d3+d4+d5 once the pipe opens, then d6 -- three deliveries instead of six. Coalescing example: d2 {"score": 5}, d3 {"score": 6}, d4 {"score": 7, "level": 2}, d5 {"lives": 2} merge into {"score": 7, "level": 2, "lives": 2}; intermediate values are discarded, final state is preserved. Buffer size is bounded: the worst case is a full document snapshot. No unbounded queues. No OOM.]

The routing architecture: WebProxy, routing table, and Adama Host

The production topology evolved through many iterations (I documented eight topologies on the path from "Ultra YOLO single process" to the production architecture). The one that stuck has three tiers.

The WebProxy terminates TLS and WebSocket connections from browsers. It handles authentication, serves static assets, and manages the API surface. It is stateless in the important sense -- losing a WebProxy means clients reconnect to another one and resume.

The Routing Table uses a gossip-based failure detector to track which Adama Host owns which document. When hosts join or leave, rendezvous hashing rebalances document assignments and the routing table propagates changes reactively to WebProxies. The WebProxy subscribes to host assignments rather than looking them up once -- because the host for a document can change mid-stream during deployments or failures.
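
Rendezvous hashing itself is small enough to show whole: score every live host against the document key and take the winner, so removing one host only moves the documents that host owned. This sketch uses FNV-1a purely for illustration; the real implementation's hash and host identifiers will differ:

```typescript
// FNV-1a string hash, kept to unsigned 32-bit arithmetic.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Rendezvous (highest-random-weight) hashing: the host with the highest
// score for this document key owns the document.
function pickHost(documentKey: string, hosts: string[]): string {
  let best = hosts[0];
  let bestScore = -1;
  for (const host of hosts) {
    const score = fnv1a(documentKey + "|" + host);
    if (score > bestScore) { bestScore = score; best = host; }
  }
  return best;
}
```

The stability property is what makes this better than modulo hashing for document routing: a membership change only reshuffles the minimum number of assignments.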

The Adama Host is where documents live as in-memory state machines. Each document is an actor: single-threaded, sequential message processing, atomic transactions with rollback. The host persists changes to a write-ahead log (data-caravan with memory-mapped files and async S3 backup), and streams deltas to connected viewers through the WebProxy.
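
The apply-or-rollback discipline can be sketched as follows. This is a simplification: real hosts run one such machine per document on a dedicated queue, and rollback covers partial transactions, not just thrown errors:

```typescript
// A document as a single-threaded state machine with atomic
// apply-or-rollback: a transition that throws leaves the prior state
// and sequence number intact.
class DocumentStateMachine<S> {
  private seq = 0;
  constructor(private state: S) {}

  // Apply a transition atomically; on error nothing is committed.
  apply(transition: (state: S) => S): { seq: number; state: S } {
    const next = transition(this.state); // throws => no commit
    this.state = next;
    this.seq++;
    return { seq: this.seq, state: this.state };
  }

  snapshot(): { seq: number; state: S } {
    return { seq: this.seq, state: this.state };
  }
}
```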

Between WebProxy and Adama Host, the custom binary protocol replaces the original gRPC link. This is where the 50% CPU reduction happened. The protocol is code-generated from an API definition, so adding a new operation means editing XML and regenerating -- no hand-written serialization.

[Figure: production three-tier WebSocket architecture. Browsers and mobile clients connect over WSS (TLS) to a stateless WebProxy tier (N instances) that handles TLS termination, auth, API routing, and stream routing, subscribing to host assignments. A gossip-based routing table using rendezvous hashing maps documents to Adama Hosts. Each host runs documents as single-threaded actors (e.g. game-001, game-002, chat-017 on host 1; game-003, lobby-001 on host 2) with a per-document WAL and async S3 backup for durable storage. Key insight: the WebSocket terminates at the WebProxy; internal traffic uses a leaner binary protocol.]

Delta streaming as the natural fit

The WebSocket protocol's streaming nature aligns perfectly with delta-based state synchronization. When a document's state changes, the runtime computes per-viewer deltas -- JSON merge patches (RFC 7386) extended with @o markers for array ordering and @s for sequence numbers. Each connected client gets a personalized delta reflecting only the fields they are allowed to see (privacy filtering happens during delta computation, not after).

The delta format is intentionally coalescable. Two deltas applied in sequence produce the same result as one merged delta. This property is what makes the backpressure story work and what makes reconnection simple: on reconnect, the server sends a fresh full snapshot, and the client replaces its local state. No negotiation about "where did we leave off." Just a clean restart. The cost of this simplicity is re-downloading the full view, but for most documents (tens of KB) this is a single TCP round trip.
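
For reference, plain RFC 7386 application (without the @o/@s extensions) looks like this: null removes a key, objects merge recursively, anything else replaces wholesale:

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Apply an RFC 7386 JSON merge patch to a target document.
function mergePatch(target: Json, patch: Json): Json {
  if (patch === null || typeof patch !== "object" || Array.isArray(patch)) {
    return patch; // non-object patch replaces the target wholesale
  }
  const base: { [k: string]: Json } =
    target !== null && typeof target === "object" && !Array.isArray(target)
      ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) delete base[key];              // null means "remove"
    else base[key] = mergePatch(base[key] ?? null, value);
  }
  return base;
}
```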

When NOT to use WebSocket

Honesty compels me to say: most applications should not be WebSocket-first.

If your data changes less than once per minute, polling with HTTP is simpler, cheaper, and far easier to cache and scale. CDNs were built for request-response. Load balancers were built for request-response. Every monitoring tool, every debugging proxy, every developer's mental model defaults to request-response.

If you need guaranteed delivery of every intermediate state (financial transactions, audit logs), WebSocket delta coalescing will lose data by design. Use a persistent queue with acknowledgment.

If your team does not have experience operating stateful connections, the operational burden will surprise you. Connection counts, memory per connection, graceful drain during deploys, thundering herd mitigation -- these are not things you figure out in a sprint.

WebSocket-first makes sense when your application is fundamentally about real-time shared state: collaborative editors, multiplayer games, live dashboards, chat. For everything else, REST with occasional polling is the right call, and choosing it is not a failure of ambition. It is engineering maturity.

The WebSocket is an optimization. A powerful one. But an optimization of a system that must first work correctly without it. That is why Adama can be polled -- every document state is fetchable via a single GET. The WebSocket layer exists to make that poll infinitely cheaper. It is not the foundation. The data model is the foundation. The delta protocol is the optimization. And the WebSocket is the transport that makes the optimization practical.