Q2 2020

Genesis

Building a Language from Scratch

I hate. I hate bad infrastructure. I hate leaky abstractions which become infrastructure. And apparently I hate myself enough to start building a programming language from scratch during a global pandemic.

This is how Adama began -- not with a grand plan, but with psychotic issues and a belief that I'd found a different evolutionary branch in how we build software. The language is domain-specific, originally meant for board games, but somewhere in the building I discovered properties that make it broadly applicable. The vision crystallized around a new type of infrastructure where documents are tiny virtual machines, state synchronization is automatic, and developers stop thinking about the plumbing between users.

By May, I had over 3,000 lines of code written in my own language. That's a remarkable feeling -- suffering through your own creation while making progress on a real product rather than being stuck in the infinite meta-game of improving the language itself. The first milestone was simple: play an actual board game with friends. Not release it publicly (licensing issues with Battlestar Galactica), but prove that this thing could ship something real.

Progress was slow. Every weekend I lurched forward, filling in missing rules, adding content, fixing bugs, re-reading the rule book, working on the UI. I adopted a strategy borrowed from the wisdom of "Write Games, Not Engines" -- resist the siren's call to improve the language endlessly, and instead follow a loop: fix show-stoppers, ship a game, hold a retrospective, pick exactly three improvements, repeat.

The whole thing actually started years earlier with building a custom browser using SDL and then Skia with C#. I wanted a single socket as the only way to get data, limited client-side logic, anti-entropy protocols for reconciliation, 100% reactive UI, and the server in complete control. Building a new browser turned out to be (surprise) a monumental task. But the core insight survived: when all state lives within a single server, the challenge of shipping a complex product drops by several orders of magnitude. There are practically zero failure modes. There's just one tiny problem -- the reliability of a single server in today's cloud is not great. I persisted anyway.

The UI architecture that emerged was ruthlessly simple. The entire application state is a giant JSON object. The server diffs that state and keeps clients up to date using JSON Merge Patch (RFC 7386). The UI just reacts to changes of a single object. A small change from one client to the back-end results in a small change from the back-end to the other clients, using a small amount of network traffic. When clients learn of changes, they do work proportional to the change at hand. No excess. No fuss.
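For a sense of what applying such a diff involves, here is a minimal sketch of the merge rules from RFC 7386. The `mergePatch` and `isPlainObject` helpers are hypothetical names used for illustration; they are not Adama's or GOAT's code, and this version returns a new object rather than updating one in place.

```javascript
// Sketch of JSON Merge Patch (RFC 7386). `mergePatch` is a hypothetical
// helper for illustration, not part of Adama or the GOAT client.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function mergePatch(target, patch) {
  // A non-object patch (scalar, array, or null) replaces the target outright.
  if (!isPlainObject(patch)) {
    return patch;
  }
  // An object patch merges into an object target; any other target is discarded.
  const result = isPlainObject(target) ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key]; // null means "remove this member"
    } else {
      result[key] = mergePatch(result[key], value);
    }
  }
  return result;
}

// Made-up example state: only the changed fields travel over the wire.
const state = { turn: 'alice', players: { alice: { score: 3 }, bob: { score: 5 } } };
const diff = { turn: 'bob', players: { alice: { score: 4 } } };
const next = mergePatch(state, diff);
console.log(JSON.stringify(next));
// {"turn":"bob","players":{"alice":{"score":4},"bob":{"score":5}}}
```

The diff names only `turn` and one score, so everything else in the state (here, bob's score) is left alone.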

GOAT.SieveSubscribe(sieve, {'turn': function(change) {
  document.getElementById('turn').innerHTML = change.after;
}});

GOAT.SieveMerge(state, diff, sieve); // powers the entire engine

This "object sieve subscribe" pattern meant UI updates are derived ONLY from the update stream. No global re-computation, no giant reconstruction for small changes. I was aiming for a battery-efficient engine where only updates would refresh the screen.
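One way to picture the pattern is a subscription tree that mirrors the shape of the state, walked in lockstep with each diff. The sketch below is an assumption about how such a thing could work, not the GOAT implementation: `subscribe`, `mergeAndNotify`, the `HANDLERS` symbol, and the `before` field on the change object are all invented for illustration (only `change.after` appears in the original snippet).

```javascript
// Hypothetical sketch of a subscription tree ("sieve") driven by a diff.
// These functions are illustrative stand-ins, not the GOAT implementation.
const HANDLERS = Symbol('handlers');

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Register callbacks in the sieve; nested objects in `spec` mirror the state shape.
function subscribe(sieve, spec) {
  for (const [key, value] of Object.entries(spec)) {
    if (!sieve[key]) sieve[key] = {};
    const node = sieve[key];
    if (typeof value === 'function') {
      if (!node[HANDLERS]) node[HANDLERS] = [];
      node[HANDLERS].push(value);
    } else {
      subscribe(node, value);
    }
  }
}

// Apply `diff` to `state` in place and fire only the handlers on paths the diff touches.
function mergeAndNotify(state, diff, sieve) {
  for (const [key, value] of Object.entries(diff)) {
    const before = state[key];
    const node = sieve ? sieve[key] : undefined;
    if (value === null) {
      delete state[key]; // null removes the member, as in JSON Merge Patch
    } else if (isPlainObject(value)) {
      if (!isPlainObject(before)) state[key] = {};
      mergeAndNotify(state[key], value, node);
    } else {
      state[key] = value;
    }
    const handlers = node && node[HANDLERS];
    if (handlers) {
      // Caveat: for object values, `before` and `after` are the same mutated object.
      for (const fn of handlers) fn({ before, after: state[key] });
    }
  }
}

const state = { turn: 'alice', board: { score: 0 } };
const sieve = {};
const seen = []; // stands in for DOM writes so the sketch runs outside a browser
subscribe(sieve, {
  turn: (change) => seen.push(`turn: ${change.before} -> ${change.after}`),
  board: { score: (change) => seen.push(`score: ${change.after}`) },
});
mergeAndNotify(state, { board: { score: 2 } }, sieve); // only the score handler fires
mergeAndNotify(state, { turn: 'bob' }, sieve); // only the turn handler fires
console.log(seen); // [ 'score: 2', 'turn: alice -> bob' ]
```

The work done per update is bounded by the size of the diff, not the size of the state, which is the property the battery-efficiency goal depends on.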

Then in June, I ripped out ANTLR and wrote the parser from scratch. I'd started with ANTLR for parsing and building AST nodes, but frustration mounted -- whitespace handling, error messages, the inability to cleanly support refactoring, code completion, and LSP hover docs. The advice of those who recommend against parser generators proved true. The new hand-rolled parser was 20% faster, achieved 100% code coverage, and gave me control over comments, formatting, and error reporting that ANTLR never would.

The objectives were clear: unified comment handling, the parser as an ally for formatting, great error messages, fast code completion queries, and failing on the first error (it is simply better that way). With the new parser I had enough information to think about code coverage at the character level. I started dreaming about a testing tool where test cases are generated automatically, transitions are generated to achieve code coverage, and AI techniques minimize the total number of automated tests. It was exciting to think that I'd never need to mock or write test code again, since the goal of testing is primarily about reducing entropy and maximizing determinism.
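Several of those objectives start at the tokenizer. As a rough illustration (not Adama's parser, and every name here is invented), the sketch below records a line and column for each token, attaches comments to the token that follows them instead of throwing them away, and stops at the first character it cannot handle.

```javascript
// Illustrative tokenizer, not Adama's parser. All names are hypothetical.
class ParseError extends Error {
  constructor(message, line, column) {
    super(`${message} at line ${line}, column ${column}`);
    this.line = line;
    this.column = column;
  }
}

function tokenize(source) {
  const tokens = [];
  let index = 0, line = 1, column = 1;
  let pendingComments = []; // comments waiting to attach to the next token

  const advance = (count) => {
    for (let i = 0; i < count; i++) {
      if (source[index] === '\n') { line++; column = 1; } else { column++; }
      index++;
    }
  };

  // Sticky regexes match only at `lastIndex`, so each rule is tried at the cursor.
  // The comment rule comes before the symbol rule because both can start with '/'.
  const rules = [
    ['whitespace', /\s+/y],
    ['comment', /\/\/[^\n]*/y],
    ['number', /\d+/y],
    ['identifier', /[A-Za-z_]\w*/y],
    ['symbol', /[-+*\/=;(){}]/y],
  ];

  while (index < source.length) {
    let matched = false;
    for (const [kind, pattern] of rules) {
      pattern.lastIndex = index;
      const match = pattern.exec(source);
      if (!match) continue;
      const text = match[0];
      if (kind === 'comment') {
        pendingComments.push(text);
      } else if (kind !== 'whitespace') {
        tokens.push({ kind, text, line, column, comments: pendingComments });
        pendingComments = [];
      }
      advance(text.length);
      matched = true;
      break;
    }
    if (!matched) {
      // Fail on the first error, with an exact position.
      throw new ParseError(`Unexpected character '${source[index]}'`, line, column);
    }
  }
  // Simplification: comments after the last token are dropped in this sketch.
  return tokens;
}

const tokens = tokenize('// whose turn\nturn = 1;');
console.log(tokens[0]);
// { kind: 'identifier', text: 'turn', line: 2, column: 1, comments: [ '// whose turn' ] }

let firstError = null;
try {
  tokenize('turn = 1;\nscore = #;');
} catch (error) {
  firstError = error.message;
}
console.log(firstError); // Unexpected character '#' at line 2, column 9
```

Because every token carries its position and its comments, a formatter or a hover-doc lookup can work from the token stream alone, which is the kind of control a generated parser tends to hide.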

Looking back, this quarter was about establishing the fundamentals -- a language that works, a parser I control, a UI architecture built on differential state synchronization, and enough of a real product to know the ideas hold weight. The game worked but the UI sucked and I was afraid of game-stopping bugs. The empire was small. But it existed.