Durable State Machines That Survive Server Crashes
I spent years building multiplayer backends with Node.js, and the operational pain was real. Every time I restarted a server, every in-flight game session died. Players lost their progress. The state machine driving the game logic -- whose turn it was, what phase the round was in, which cards had been played -- all gone. I had to build external coordination layers: Redis for state, RabbitMQ for message queuing, Postgres for checkpointing. The saga pattern crept in with its compensating transactions, and suddenly my "simple card game server" had more infrastructure than a banking system.
The dungeon master pattern is the alternative I built into Adama. The server controls the workflow -- like a tabletop game's Dungeon Master -- and the entire execution state survives crashes, restarts, and migrations between hosts. No external coordinators. No saga compensations. Just straight-line code that happens to be immortal.
Most backend architectures are passive. A request arrives, you process it, you respond. The client orchestrates everything. But games, approval workflows, auctions, and onboarding flows all share a common property: the server needs to drive the conversation. It decides whose turn it is, asks specific people for input, enforces ordering, and handles timeouts.
In Adama, this is expressed through state machines defined with the # notation. Each state is a named block of code that executes when the document enters that state:
#waitingForPlayers {
  // Logic runs when we enter this state
}

#gameInProgress {
  // Different logic here
}

#gameOver {
  // Final state
}
The transition keyword moves between states. Crucially, transitions are not immediate jumps -- they schedule the next state to run after the current transaction commits. This means every state transition is a durable checkpoint.
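As a minimal sketch (the state names, the _players table, and its size() check are illustrative, not from a real document):

#lobby {
  // When enough players have connected, schedule the next state.
  // The jump happens only after this block's transaction commits,
  // so the state change itself is a durable checkpoint.
  if (_players.size() >= 2) {
    transition #gameInProgress;
  }
}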
Channels are how clients talk to documents. A channel is a typed endpoint that accepts a specific message type and runs handler code. Think of them as the API surface of a document:
message Say { string text; }

channel say(Say msg) {
  _chat <- {who: @who, text: msg.text, when: Time.datetime()};
}
But the real power shows up with incomplete channels -- channels declared without handlers. These exist so the state machine can ask specific clients for input, flipping the usual client-server relationship:
message Move { int x; int y; }

channel<Move> move_channel;
The fetch and await pair is where the dungeon master pattern gets interesting. fetch requests input from a specific principal (player). await blocks until that input arrives. But this is not a busy-wait. The document state -- including the pending request -- is persisted to durable storage. The server can crash, restart on a different machine, and the await will resume exactly where it left off when the player finally responds.
#playerTurn {
  future<Move> f = move_channel.fetch(current_player);
  Move m = f.await();
  applyMove(m);
  if (checkWinner()) {
    transition #gameOver;
  } else {
    current_player = getNextPlayer();
    transition #playerTurn;
  }
}
Beyond fetch, channels provide decide and choose for structured input. decide(principal, options[]) restricts the player to picking from a set of valid options -- no cheating. choose(principal, options[], limit) lets the player select a subset. Both return futures and are fully durable:
channel<Play> play;

#playerTurn {
  list<Play> open = iterate _board where piece == @no_one;
  if (play.decide(current_player, @convert<Play>(open)).await() as pick) {
    applyMove(pick);
  }
  transition #nextTurn;
}
The separation of fetch and await into two calls is deliberate. It enables parallel requests to multiple players:
#getResponses {
  future<Answer> f1 = answer.fetch(player1);
  future<Answer> f2 = answer.fetch(player2);
  // Both players respond independently. Order doesn't matter.
  Answer a1 = f1.await();
  Answer a2 = f2.await();
  processAnswers(a1, a2);
  transition #nextPhase;
}
Even though we await player1 first, player2 can respond at any time. If player2 responds before player1, their answer is queued and immediately available when we reach f2.await().
Add the in keyword, followed by a number of seconds, to delay a transition:
#bidding {
  transition #auctionClosed in 3600; // Close after 1 hour
}
These delayed transitions are durable. Schedule a transition for one hour from now, crash the server after 30 minutes, and the transition still fires 30 minutes after recovery. No external cron jobs. No Redis-backed timers. No SQS delay queues.
This matters for real applications. Auction deadlines, session timeouts, scheduled events -- they all execute reliably as a built-in property of the document.
The saga pattern exists because distributed systems need multi-step workflows that can fail partway through. Each step has a compensating transaction to undo it if a later step fails. In practice, this means: