I spent over half a decade as a technical leader for a very large soft-real-time distributed system. In that time, I built pub/sub infrastructure, debugged pub/sub infrastructure, and watched pub/sub infrastructure collapse under its own weight. The pattern looks seductive -- publish to a topic, subscribers get notified, done. But when your subscribers are also publishers (which is to say, when your subscribers are people), the math goes quadratic and everything falls apart.
Let me walk through exactly why, with numbers, and then show the architectural trick that turns quadratic back into linear.
The entire problem is visible in the core pub/sub loop:
function publish(subscribersByTopic, topic, payload) {
  var list = subscribersByTopic[topic];
  if (list) {
    for (var i = 0; i < list.length; i++) {
      list[i].deliver(payload);
    }
  }
}
This is the core logic of topical pub/sub. Everything else -- brokers, distributed logs, Kafka, MQTT, AWS API Gateway -- is engineering on top of this fundamental loop: for each publish, iterate over all subscribers and deliver.
There are two schools of distributed pub/sub. The broker philosophy (AWS API Gateway, MQTT brokers) focuses on managing the subscriber mapping durably -- making sure subscribersByTopic survives failures. The distributed log philosophy (Kafka, Kinesis) focuses on making the publishes durable -- subscribers tail the log and filter for their topic.
Both scale. Both work for decoupling predictable systems. Both fail in the same way when humans are involved.
Here is the scenario. You have an interactive experience -- a game, a collaboration session, a live dashboard -- with N connected users. Each user is both reading state and producing state. In pub/sub terms, each user is both a subscriber and a publisher.
When user A makes a change, it must be delivered to all N-1 other users. When user B makes a change, same thing. When all N users are active, each of the N publishers fans out to the other N-1 subscribers, for N(N-1) total deliveries.

That is O(N^2) message deliveries. For 10 users, that is 90 deliveries. For 100 users, 9,900. For 1,000 users, 999,000.
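The arithmetic is easy to verify with a toy simulation of the publish loop above (JavaScript, purely illustrative):

```javascript
// Count deliveries when every one of N users publishes once
// and all N users subscribe to the same topic.
function deliveriesPerRound(n) {
  let delivered = 0;
  const subscribers = Array.from({ length: n }, (_, id) => ({
    id,
    deliver() { delivered++; },
  }));
  for (const publisher of subscribers) {
    // Fan out to everyone except the publisher itself.
    for (const sub of subscribers) {
      if (sub !== publisher) sub.deliver();
    }
  }
  return delivered; // N * (N - 1)
}

console.log(deliveriesPerRound(10));   // 90
console.log(deliveriesPerRound(100));  // 9900
console.log(deliveriesPerRound(1000)); // 999000
```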
The traditional fix is the client/server model: clients send their state to a central server, the server aggregates it and sends a compact representation back to everyone. This works -- but it is also exactly what you end up building when you put a server in front of pub/sub: pub/sub with an aggregation layer bolted on. The question is why that aggregation layer is so painful to build on top of existing messaging infrastructure.
Even if you solve the quadratic fan-out, pub/sub has a fundamental problem: message loss is inevitable. When a client disconnects and reconnects, messages published during the gap are gone. MQTT illustrates this clearly: a client that reconnects with a clean session has no record of what it missed, and the protocol gives it no way to ask.
The fix is to make the publish stream durable -- store every message and let reconnecting clients catch up. But MQTT lacks a sequencer concept, so you have to build a new protocol layer on top. And now you are maintaining a durable log with consumer offsets, which is Kafka, and you have just traded one infrastructure problem for another.
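In sketch form, the protocol layer you end up maintaining looks like this -- a minimal in-memory stand-in for what Kafka does durably (names illustrative; a real system makes the log durable and the offsets replicated):

```javascript
// Minimal durable-log sketch: the log stores every publish, and each
// consumer tracks an offset so it can catch up after a disconnect.
class DurableLog {
  constructor() {
    this.entries = [];
  }
  append(payload) {
    this.entries.push(payload);
    return this.entries.length; // new high-water mark
  }
  // Everything a consumer missed since its last committed offset.
  readFrom(offset) {
    return this.entries.slice(offset);
  }
}

const log = new DurableLog();
log.append("a");
log.append("b");

// A client disconnects after committing offset 2...
let committedOffset = 2;
log.append("c"); // ...and this is published during the gap.

// On reconnect, the client replays from its offset: nothing is lost.
const missed = log.readFrom(committedOffset); // ["c"]
```

The cost is everything around this sketch: retention policies, offset storage, rebalancing -- the operational surface of a distributed log.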
Worse, even with a durable log, flow control between the system and slow clients means different clients will see different intermediate states. Client A might see messages 1 through 50, while slow client B sees only 1, 2, 3, 15, 30, 50. Both end up at the same final state, but the intermediate experience is wildly inconsistent. If your product relies on the stream of events (not just the final state), debugging this is a nightmare.
The insight behind Adama is to place a stateful reducer between publishers and subscribers. Instead of fanning out raw events, clients send messages to a single document. The document ingests those messages, applies product logic, updates its state, and computes per-viewer deltas.
Here is a minimal pub/sub system built in Adama:
record Publish {
  public principal who;
  public long when;
  public string payload;
}

table<Publish> _publishes;

public formula publishes =
  iterate _publishes order by when asc;

message PublishMessage {
  string payload;
}

channel publish(PublishMessage message) {
  _publishes <- {
    who: @who,
    when: Time.now(),
    payload: message.payload
  };
  // Age out old publishes
  (iterate _publishes
    where when < Time.now() - 60000).delete();
  // Hard cap at 100 most recent
  (iterate _publishes
    order by when desc
    offset 100).delete();
}
The complexity math changes completely: a burst of N publishes coalesces inside the document into one bounded state update, and distribution becomes N per-viewer deltas -- O(N) work per settled update instead of O(N^2) raw deliveries.
When publishes outstrip the ability to deliver updates, the document absorbs the spike. It acts as a reducer, enforcing product logic (age out old messages, cap at 100) to keep state bounded. The delta protocol then handles distribution to clients with built-in flow control -- slow clients get coarser deltas covering more changes, fast clients get fine-grained updates.
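To make the coalescing concrete, here is a minimal JavaScript stand-in for the document above (the class and method names are illustrative, not Adama's runtime API):

```javascript
// Reducer sketch: a burst of raw publishes is absorbed into one bounded
// state, and each connected viewer receives a single delta per settle,
// not one message per publish.
class ReducerDocument {
  constructor(cap) {
    this.cap = cap;
    this.publishes = []; // bounded state, like the Adama table
    this.deltasSent = 0;
  }
  ingest(payload) {
    this.publishes.push(payload);
    // Hard cap: keep only the most recent `cap` entries.
    if (this.publishes.length > this.cap) {
      this.publishes = this.publishes.slice(-this.cap);
    }
  }
  // After a burst settles, send one delta per connected viewer.
  settle(viewerCount) {
    this.deltasSent += viewerCount;
  }
}

const doc = new ReducerDocument(100);
for (let i = 0; i < 500; i++) doc.ingest(`msg-${i}`); // burst of 500
doc.settle(50); // 50 connected viewers

console.log(doc.publishes.length); // 100 (bounded by product logic)
console.log(doc.deltasSent);       // 50, not 500 * 49
```

Compare the last line to raw pub/sub: 500 publishes across 50 mutually subscribed clients would have been 24,500 deliveries.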
The reducer sits between clients and persistence. It absorbs the N incoming messages, reduces them into a single state update, persists that update, and then computes N per-viewer deltas for distribution. The scaling limit of the log is significantly elevated because Adama can leverage the data center's memory bandwidth (100+ GB/s) to reduce updates, which outstrips both a 10 Gbit network and a 3 GB/s NVMe.
This architecture can scale horizontally with respect to the number of documents. Each document is independent. But within a single document, the reducer and its log form a choke point. For most applications -- games, chat rooms, collaborative editing -- this is not a problem because the per-document write rate is bounded by human input speed.
There is an even simpler use of the reducer pattern. Many systems use pub/sub as a notification layer: "something changed, go re-query the database." You can model this directly:
public int max_db_seq = 0;

message NotifyWrite {
  int db_seq;
}

channel notify(NotifyWrite message) {
  if (message.db_seq > max_db_seq) {
    max_db_seq = message.db_seq;
  }
}
This ships an ever-increasing number to all connected clients. There is no data loss -- if you miss sequence 5 and get sequence 8, you know to re-query. The reducer absorbs spikes: if 100 writes happen in a burst, the document settles to the highest sequence number, producing one delta to each client instead of 100.
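The settling behavior can be sketched in JavaScript (a stand-in for the document above; `flush` models the delta broadcast and is not an actual Adama API):

```javascript
// Notification reducer: a burst of database writes settles to one
// number, producing a single delta per client instead of one per write.
class SeqNotifier {
  constructor() {
    this.maxDbSeq = 0;
    this.dirty = false;
  }
  notify(dbSeq) {
    if (dbSeq > this.maxDbSeq) {
      this.maxDbSeq = dbSeq;
      this.dirty = true;
    }
  }
  // Flush once after the burst: returns the delta to broadcast, if any.
  flush() {
    if (!this.dirty) return null;
    this.dirty = false;
    return { max_db_seq: this.maxDbSeq };
  }
}

const n = new SeqNotifier();
for (let seq = 1; seq <= 100; seq++) n.notify(seq); // burst of 100 writes
console.log(n.flush()); // { max_db_seq: 100 } -- one delta, not 100
console.log(n.flush()); // null -- nothing new to send
```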
This illustrates the despair of building a low-latency pub/sub system. You optimize the hell out of the fan-out pipeline, and then your consumers take that notification and... query a database, erasing in the last mile whatever latency you won in the first. The reducer pattern embraces this reality: make the notification smart enough to carry the data itself, and the database query disappears.
The hidden tax on pub/sub is debugging. When clients see inconsistent intermediate states because flow control dropped different messages for different clients, the bug reports are maddening. "User A sees the chat in one order, user B sees a different order, and user C is missing three messages." Good luck reproducing that.
With a reducer, every client converges to the same state (modulo privacy filtering). The intermediate states may differ due to timing, but the final state is consistent. And if something goes wrong, you can replay the document's event log deterministically to reproduce exactly what happened.
The reducer model has clear costs. First, it introduces a single point of serialization per document. All writes go through one place. For applications where the write rate exceeds what a single thread can handle (high-frequency trading, IoT telemetry at scale), this is a bottleneck.
Second, it requires the server to hold document state in memory while active. A pub/sub topic is cheap -- just a routing table entry. A reducer document has state, privacy policies, reactive formulas. The per-document overhead is higher.
Third, it binds your data model to the reducer's language and runtime. With pub/sub, you can use any language, any framework, any data format. With a reactive document, you write Adama code or you do not get the benefits.
For cross-document use cases -- leaderboards, search indexes, global directories -- where you need to aggregate data from many documents into one place, Adama provides a replication engine. Documents can watch expressions via content hashing and push create/update/delete events to external systems (MySQL, analytics pipelines, billing). This addresses the legitimate pub/sub use case of "something changed, go update the aggregate" without the quadratic fan-out.
These are real costs. But for the class of applications where people are interacting with shared state -- which is the class of applications that actually needs real-time synchronization -- the reducer model eliminates the quadratic scaling problem, the message loss problem, and the intermediate state consistency problem in one architectural move. I will take those tradeoffs.