Q3 2020
Making a Reactive Engine Fast Enough
The modest theme for July was to jump into performance and measure a few things. That sentence aged poorly. July turned into a month of obsessive benchmarking, humbling discoveries, and a 78% CPU reduction earned through equal parts engineering and stumbling over my own stupidity.
The test setup was a 4.8 KLOC board game prototype. Seed the random number generator for predictability, play one meaty game 101 times (throw away the first run for JVM warmup), measure time and "billing cost" -- a number proportional to work done since the language bills by the statement. Starting point: 739ms per game, 12.8M billing cost. That's approximately 1ms per decision. Super slow.
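The measurement loop itself is nothing fancy. A sketch of the harness described above, where runGame is a hypothetical stand-in that plays one full seeded game and reports its wall time and billing cost:

```javascript
// Sketch of the benchmark harness; runGame() is an illustrative stand-in
// that plays one seeded game and returns { millis, billing }.
function benchmark(runGame, runs = 101) {
  runGame(); // throw away the first run: JVM warmup
  let millis = 0, billing = 0;
  for (let i = 1; i < runs; i++) {
    const r = runGame();
    millis += r.millis;
    billing += r.billing;
  }
  const measured = runs - 1; // 100 measured games
  return { millis: millis / measured, billing: billing / measured };
}
```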
First optimization: exploit primary key lookups in where clauses. I wrote a simple extraction algorithm to detect id == expression patterns in boolean expressions, then short-circuit the table scan.
function extract(expr) {
  // Match `id == <expr>` in either operand order; the non-id side is the key.
  if (expr.type === "BinaryOp" && expr.op === "==") {
    if (expr.left.type === "Variable" && expr.left.name === "id") return expr.right;
    if (expr.right.type === "Variable" && expr.right.name === "id") return expr.left;
  }
  return null;
}
Result: 2.5% billing reduction, 1.4% time reduction. Ok, this is going to be a slog.
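For illustration, here is roughly how the extracted key short-circuits the scan. The AST node shapes (Variable, Constant) and lookupByPrimaryKey are my illustrative names, not Adama's internals; a null return means the caller falls back to the full table scan:

```javascript
// The `id == <expr>` extraction, restated so this sketch is self-contained.
function extract(expr) {
  if (expr.type === "BinaryOp" && expr.op === "==") {
    if (expr.left.type === "Variable" && expr.left.name === "id") return expr.right;
    if (expr.right.type === "Variable" && expr.right.name === "id") return expr.left;
  }
  return null;
}

// Illustrative short-circuit: tables are assumed to be Maps keyed by id.
function lookupByPrimaryKey(table, expr) {
  const keyNode = extract(expr);
  if (keyNode === null || keyNode.type !== "Constant") {
    return null; // no usable key; the caller runs the full scan
  }
  const row = table.get(keyNode.value); // O(1) instead of O(rows)
  return row === undefined ? [] : [row];
}
```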
I turned off client view computation as a direction-seeking experiment. Time dropped from 728ms to 510ms, and billing dropped 70%. The client views were eating a third of my CPU and the vast majority of billing cost. That was a thread worth pulling.
Caching record-to-JSON views came next. If a record has no bubbles, no visibility requirements, only public/private fields, and only primary data types -- then the JSON output can be cached per record and invalidated reactively on change. This alone cut billing 20% and CPU 12.4%.
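A minimal sketch of that idea, assuming a record that qualifies for caching; CachedRecord and its methods are illustrative names, not Adama's implementation:

```javascript
// Per-record JSON cache with reactive invalidation: any write clears the
// cached string, and the next read recomputes it.
class CachedRecord {
  constructor(fields) {
    this.fields = { ...fields };
    this.cachedJson = null; // null means "must recompute"
  }
  set(name, value) {
    this.fields[name] = value;
    this.cachedJson = null; // invalidate reactively on change
  }
  toJson() {
    if (this.cachedJson === null) {
      this.cachedJson = JSON.stringify(this.fields);
    }
    return this.cachedJson; // cache hit on repeated reads
  }
}
```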
Then I went after table indexing. Built a DocumentMonitor interface to measure which columns in which tables were most effective at rejecting rows. The data was revealing:
| table | column | calls | effectiveness |
|---|---|---|---|
| skill_cards | location | 71,199 | 78.34% |
| skill_cards | skill | 57,681 | 80% |
| civilian_ships | status | 31,507 | 96.24% |
Out of 6.7M tests on skill_cards.location, 5.2M could be quickly rejected via an index. The challenge was to avoid paying the insertion cost in a reactive system -- you need lazy bookkeeping with a catch-all bucket for indeterminate items. After upgrading the parser, updating code generation, and building the set intersection logic: an additional 2% CPU reduction. I am Jack's sense of disappointment. But billing dropped 77%, which is customer-friendly, so we kept it.
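The lazy bookkeeping can be sketched like this: newly touched rows sit in a catch-all bucket and are only filed into value buckets when a query actually runs. LazyIndex is an illustrative name, not Adama's implementation:

```javascript
// Lazy single-column index with a catch-all bucket for indeterminate rows.
class LazyIndex {
  constructor(column) {
    this.column = column;
    this.buckets = new Map(); // column value -> Set of rows
    this.unknown = new Set(); // rows not yet filed (catch-all bucket)
  }
  add(row) {
    this.unknown.add(row); // pay nothing at insertion time
  }
  invalidate(row) {
    // A write touched the row; move it back to the catch-all bucket.
    for (const bucket of this.buckets.values()) bucket.delete(row);
    this.unknown.add(row);
  }
  query(value) {
    // File everything in the catch-all bucket, then answer from buckets.
    for (const row of this.unknown) {
      const v = row[this.column];
      if (!this.buckets.has(v)) this.buckets.set(v, new Set());
      this.buckets.get(v).add(row);
    }
    this.unknown.clear();
    return this.buckets.get(value) ?? new Set();
  }
}
```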
The real breakthrough came from fixing a bug in the reactive tree. Something felt off about why so much computation was happening. An audit revealed sloppy mixing of invalidation and dirty signals. The two principles: invalidation always flows from data down to formulas; dirty signals flow up from data to root. All but two classes behaved correctly, and one had a giant TODO on it. Fixing these dropped us to 550ms and 2.3M billing -- a 25% CPU reduction and 82% billing reduction from the start.
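The two directions can be shown with a toy node class (illustrative names, not the real reactive tree): invalidation fans out to subscribed formulas, while a single dirty bit bubbles toward the root and stops early once an ancestor is already dirty:

```javascript
// Toy reactive node demonstrating the two signal directions.
class ReactiveNode {
  constructor(parent) {
    this.parent = parent;
    this.dirty = false;
    this.subscribers = []; // formulas depending on this node
  }
  raiseDirty() {
    // Dirty flows UP toward the root; short-circuit on an already-dirty node.
    let node = this;
    while (node !== null && !node.dirty) {
      node.dirty = true;
      node = node.parent;
    }
  }
  change() {
    // Invalidation flows DOWN to dependent formulas...
    for (const formula of this.subscribers) formula.invalidate();
    // ...while dirtiness flows UP to the root.
    this.raiseDirty();
  }
}

class Formula {
  constructor() { this.valid = false; }
  invalidate() { this.valid = false; }
}
```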
Then I found a stupid 1ms spin-lock delay in my benchmark code: the scheduler for future state transitions was injecting random delays on 0ms transitions. Fixing this took 550ms down to 350ms. Moral of the day -- measuring is hard.
Day three of the second performance push: I'm dumb. I wasn't actually deleting items from tables; I was hiding them without removing them from internal structures, so most loops were filtering dead rows. Fixing this alone dropped time below 120ms. I found it while investigating the correctness of the delta model, not performance.
The delta model is the core idea: instead of treating the entire document as a compare-and-set blob, Adama translates domain messages into data differentials that can be appended to a log. The physics are beautiful -- as document size increases, time stays bounded by the changes in flight, network cost is proportional to changes, CPU cost is proportional to changes, and conflicts can be batched locally.
function integrate_message2(doc_reactive_cache, msg, key) {
  // Catch the cached document up with the log before computing anything.
  sync_document(doc_reactive_cache, key);
  const delta = compute_prepare_delta(doc_reactive_cache, msg, key);
  if (!append_delta(key, delta)) {
    // The log advanced underneath us; fall back to the slower full path.
    integrate_message(doc_reactive_cache, msg, key);
  } else {
    // Append succeeded; fold our own delta back into the cache.
    sync_document(doc_reactive_cache, key);
  }
}
For the final push, I tackled client view deltas. The naive approach -- compute the entire view per client, compare to the previous view, emit a diff -- was 350ms. Computing the delta as we go by storing a resident copy and producing the delta as a side-effect: 137ms. Switching from JSON trees to streaming readers and writers everywhere: 95ms.
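The resident-copy trick, sketched on flat key/value views (the real views are trees, and ClientView is an illustrative name): keep the last view shipped to each client and emit only what changed, with null signaling a removed field:

```javascript
// Compute the client delta as a side-effect of updating a resident copy.
class ClientView {
  constructor() {
    this.resident = {}; // last view shipped to this client
  }
  update(next) {
    const delta = {};
    for (const key of Object.keys(next)) {
      if (this.resident[key] !== next[key]) {
        delta[key] = next[key]; // changed or new field
        this.resident[key] = next[key];
      }
    }
    for (const key of Object.keys(this.resident)) {
      if (!(key in next)) {
        delta[key] = null; // null signals a removed field
        delete this.resident[key];
      }
    }
    return delta; // only what changed goes on the wire
  }
}
```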
The chat room case study proved the architecture worked end to end. Three steps: write the back-end in Adama (define state, handle messages, expose reactive formulas), upload the script, build a thin UI that listens to tree changes. A chat message results in a delta like {"chat":{"44":{"who":{"agent":"jeffrey"},"what":"Hello Human"},"@o":[...]}} flowing to every connected client. The trifecta of laying out state, ingesting state from people, and reactively exposing state in real-time via formulas was sufficient to build real products.
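On the client side, the thin UI's job reduces to merging deltas like the one above into a local tree. This generic recursive merge is a sketch, not Adama's actual client library; it treats arrays (like the "@o" ordering metadata) as leaf values and passes them through untouched:

```javascript
// Merge a JSON delta into a local tree: null deletes a field, nested
// objects recurse, and everything else (including arrays) is a leaf update.
function mergeDelta(tree, delta) {
  for (const [key, value] of Object.entries(delta)) {
    if (value === null) {
      delete tree[key]; // null means the field was removed
    } else if (typeof value === "object" && !Array.isArray(value)) {
      if (typeof tree[key] !== "object" || tree[key] === null) tree[key] = {};
      mergeDelta(tree[key], value); // recurse into sub-objects
    } else {
      tree[key] = value; // leaf update
    }
  }
  return tree;
}
```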
The final scoreboard told the story. Comparing Adama's approach to stateless compare-and-set on the board game: 11.1% of the CPU time, 5% of the client bandwidth, 2% of the storage bandwidth, and 94.8% of client updates fit within a single Ethernet frame. The potential was real. I just had to build the rest of the platform around it.