July 6th, 2022

Woe betide anyone that uses MQTT (at scale)

By Jeffrey M. Barber

As I wander, I’m drafting new marketing around the real-time aspects of this platform. This approach is more relatable and easier to explain, but I feel it comes with expectations around integration. One integration would be slapping an MQTT front-end on the platform to adapt MQTT clients to it. The benefit would be simpler integrations, since many people are familiar with MQTT and already have it in their stack. The drawback is that I spent multiple years strangling MQTT at Meta, where I (regrettably) patented a new protocol.

All manner of problems arise when you scale MQTT up, specifically when you turn MQTT into a proxy to multiple subscription sources. Products issue a SUBSCRIBE request, and the broker uses the topic to route it to a subscription source. There are two options to contend with: either the SUBSCRIBE request durably maps that subscriber and topic to a host, or the SUBSCRIBE request is ephemeral like a socket.
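To make the two options concrete, here is a minimal broker-side sketch; every name in it (Broker, SubscriptionSource, pickHostFor) is hypothetical rather than any real MQTT broker API:

```java
// Hypothetical broker-side routing of a SUBSCRIBE to a subscription source.
// All names here (Broker, SubscriptionSource, pickHostFor) are illustrative.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class Broker {
  // Option A (durable): persist (clientId, topic) -> host so any broker can find it later.
  private final Map<String, String> durableRoutes = new ConcurrentHashMap<>();
  // Option B (ephemeral): hold a live stream object; when it dies, the subscription dies.
  private final Map<String, SubscriptionSource> ephemeralRoutes = new ConcurrentHashMap<>();

  void onSubscribe(String clientId, String topic, boolean durable) {
    String host = pickHostFor(topic); // route by topic
    if (durable) {
      durableRoutes.put(clientId + "/" + topic, host); // survives socket and broker churn
    } else {
      ephemeralRoutes.put(clientId + "/" + topic, openStream(host, topic)); // dies with the socket
    }
  }

  private String pickHostFor(String topic) { return "host-for-" + topic; } // placeholder
  private SubscriptionSource openStream(String host, String topic) { return new SubscriptionSource(host, topic); }
}

final class SubscriptionSource {
  SubscriptionSource(String host, String topic) { /* open a TCP-like stream to the source */ }
}
```

The rest of this post is essentially about what goes wrong with each of those two maps.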

Let’s start by assuming the MQTT broker has a single subscription mapping to a remote host via a TCP-like stream. Well, the stream will inevitably fail for a multitude of reasons (like deployments). MQTT has no way to signal to the client that the stream died mid-way, so engineers have three choices. The first choice is to do nothing and let the subscription languish as dead. The second choice is to invent a mini-protocol, specific to the product, that signals the death of the stream to the client. The third choice is to have the broker implement retry logic; this choice is nice except it creates a reliability problem, as messages go poof during the retry period. These choices suck, so let’s consider the typical option of making the subscription durable.
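Here is a hedged sketch of that third choice (all names invented) that shows where the reliability problem lives: anything published upstream between the stream dying and the re-subscribe succeeding is simply gone, and the MQTT client never learns it was at risk:

```java
// Illustrative only: choice three, broker-side retry of a dead upstream stream.
// Messages published upstream between onStreamDeath() and a successful
// re-subscribe are lost, and MQTT gives the client no way to learn this.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class RetryingSubscription {
  private final ScheduledExecutorService retry = Executors.newSingleThreadScheduledExecutor();

  void onStreamDeath(String topic) {
    // The MQTT client still believes it is subscribed; nothing is sent to it.
    retry.schedule(() -> resubscribe(topic), 1, TimeUnit.SECONDS); // the loss window lives here
  }

  private void resubscribe(String topic) {
    if (!tryOpenStream(topic)) {
      retry.schedule(() -> resubscribe(topic), 2, TimeUnit.SECONDS); // back off, widening the loss window
    }
  }

  private boolean tryOpenStream(String topic) { return true; } // placeholder for the real reconnect
}
```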

Now, you have a durable subscription which relies on a mapping of topic to a list of subscribers on various hosts. By the way, this is what AWS recommends when using the AWS Gateway. The expectation then is that events generated within the fleet must look up the subscriptions and then publish to each subscriber via a network call. This implicitly creates holes, as publishers will encounter problems publishing to the gateway. Worse yet, there will be inconsistency between people subscribed to the same topic, because fan-out introduces partial failures. This is compounded by the new failure modes of the topic-to-subscriber mapping itself, and confounded further because caching and replicating that mapping affect its consistency, which creates yet another reliability issue.
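A sketch of that fan-out under assumed names (SubscriberDirectory and GatewayClient are illustrative, not a real AWS API) makes the partial-failure problem visible:

```java
// Illustrative fan-out from a publisher to durable subscribers; names are hypothetical.
// Each network call can fail independently, so two clients subscribed to the same
// topic can end up seeing different event streams.
import java.util.List;

final class FanOut {
  void publish(String topic, byte[] payload, SubscriberDirectory directory, GatewayClient gateway) {
    List<String> connections = directory.lookup(topic); // itself cached/replicated, so possibly stale
    for (String connectionId : connections) {
      try {
        gateway.postToConnection(connectionId, payload); // one network call per subscriber
      } catch (Exception partialFailure) {
        // This subscriber silently missed an event that its peers received.
      }
    }
  }
}

interface SubscriberDirectory { List<String> lookup(String topic); }
interface GatewayClient { void postToConnection(String connectionId, byte[] payload) throws Exception; }
```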

The core issue fundamentally boils down to the fact that publish/subscribe is an awful pattern. Yes, it seems and feels simple, but it has problems and should only be used when you don’t really care about reliability. Publish/subscribe just sucks, and because it sucks, so does MQTT… if you care about reliability. MQTT is basically only good for unreliable sensor data.

Ultimately, reliability is hard, and I’ve dealt with this demon for long enough. I’m very happy with where I left that team, because the essential thing was to be honest with customers. You can learn more about the scale aspect via my SREcon17 presentation, the protocol via the patent, or the serverless broker called BladeRunner. However, I’m looking towards a brighter future where I don’t have to resurrect such things.

Shift your mind towards reliability

Looking to the future, an interesting option is RSocket, which requires a mind-shift that I do believe is the correct one. I’ve contended with RSocket in the past, much to my dismay (and to the dismay of the people who created RSocket). What I didn’t realize at the time was that RSocket and publish/subscribe are fundamentally incongruent precisely because publish/subscribe sucks and lacks any notion of flow control. That lack of flow control fundamentally creates message loss, which is precisely the issue that both ephemeral and durable subscriptions exhibit.
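To illustrate the mind-shift, here is a minimal credit-based sketch in the spirit of RSocket’s request(n) semantics; it is not the rsocket-java API, just the core idea that a producer may only send what the consumer has explicitly asked for:

```java
// Conceptual credit-based flow control in the spirit of RSocket's request(n);
// not the rsocket-java API, just the core idea.
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

final class CreditStream<T> {
  private final Queue<T> pending = new ArrayDeque<>();
  private final Consumer<T> downstream;
  private long credits = 0;

  CreditStream(Consumer<T> downstream) { this.downstream = downstream; }

  // The consumer grants credits; the producer may never exceed them.
  synchronized void request(long n) {
    credits += n;
    drain();
  }

  // The producer offers items; they buffer (boundably) instead of silently dropping.
  synchronized void offer(T item) {
    pending.add(item);
    drain();
  }

  private void drain() {
    while (credits > 0 && !pending.isEmpty()) {
      credits--;
      downstream.accept(pending.poll());
    }
  }
}
```

With credits, overload becomes a visible, bounded queue on the producer instead of invisible loss somewhere in the fan-out.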

If you care about reliability, then you have to shift your mind. This shift requires thinking about data and what you publish. Here is where the problems of existing real-time systems reveal themselves. Most real-time systems don’t hold the authoritative data; they are optimistic update channels. At first, it may be just some kind of notification that something new is available, but such a system is easy to abuse. More abuse will escalate the importance of reliability, and that’s when things have to change.

Don’t get me wrong, business is good for teams that make real-time offerings, because it is hard to take away the positive impact of real-time once people have it. It’s better to have pub/sub than nothing. For many years, my aspirational north star for the team I led was that “reality is real-time”. However, the end goal requires an exceptional focus on the data being shared rather than on optimistic signals, and that focus on data must contend with the reality of the network, which requires one thing: flow control.

Flow control and Adama

Adama is close to leveraging flow control by using document differentials, which can batch and collapse on the server; a reliable transport just needs to make sure differentials are sent in order. If there is a disconnect, then we have the nuclear option of sending the entire document again. Furthermore, since a large history is persisted, it’s entirely possible to further complicate the protocol by rewinding a copy of the document to the client’s state and then fast-forwarding recent updates to produce a new differential. I’ll avoid this complex negotiation for a long time, since sending the entire document is both easier and cheaper, but it’s nice that such possibilities are available in the future. It’s a warm and fuzzy feeling.
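Here is a hedged sketch of that design with hypothetical names (Delta, Document, merge, and snapshot are assumptions, not the actual Adama types): differentials queue per client, collapse while the transport is busy, and a reconnect triggers the nuclear option:

```java
// Illustrative per-client delta queue: deltas batch and collapse on the server while
// the transport is busy, and a reconnect falls back to resending the whole document.
// Delta, Document, merge(), and snapshot() are assumed stand-ins, not Adama's types.
final class ClientView {
  private Delta pendingDelta = null; // collapsed deltas not yet sent
  private boolean inFlight = false;

  synchronized void onDocumentChange(Delta delta) {
    pendingDelta = (pendingDelta == null) ? delta : pendingDelta.merge(delta); // batch + collapse
    if (!inFlight) sendNext();
  }

  synchronized void onAck() { // the reliable transport confirms in-order delivery
    inFlight = false;
    if (pendingDelta != null) sendNext();
  }

  synchronized void onReconnect(Document document) {
    pendingDelta = null;
    send(document.snapshot()); // nuclear option: ship the entire document again
  }

  private void sendNext() {
    inFlight = true;
    send(pendingDelta);
    pendingDelta = null;
  }

  private void send(Object frame) { /* write to the reliable, ordered transport */ }
}

interface Delta { Delta merge(Delta next); }
interface Document { Delta snapshot(); }
```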

By focusing on the data, the failure mode will look like polling. This means I could leverage an unreliable transport like UDP (or MQTT) with client code that negotiates re-ordering and gaps. Unfortunately, that amounts to rebuilding some of the key components that make TCP… TCP, but it also reframes how I think about MQTT from a business perspective.
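A sketch of that client code, with invented names: each frame carries a sequence number, a gap or reorder degrades to requesting a fresh snapshot, and the failure mode is indeed indistinguishable from polling:

```java
// Illustrative client over an unreliable transport (UDP, or MQTT at QoS 0):
// sequence numbers detect gaps and reordering; rather than buffering and
// retransmitting like TCP, this sketch degrades to a polling-like full resync.
final class UnreliableReceiver {
  private long expectedSeq = 0;

  void onFrame(long seq, byte[] delta) {
    if (seq < expectedSeq) return;               // duplicate or stale frame: drop it
    if (seq > expectedSeq) { resync(); return; } // gap: a delta was lost or reordered
    expectedSeq++;
    apply(delta);
  }

  private void resync() {
    byte[] snapshot = requestFullDocument(); // the failure mode looks like polling
    applySnapshot(snapshot);
    expectedSeq = latestSeqFrom(snapshot);
  }

  private void apply(byte[] delta) { /* apply the differential to local state */ }
  private void applySnapshot(byte[] snapshot) { /* replace local state wholesale */ }
  private byte[] requestFullDocument() { return new byte[0]; } // placeholder
  private long latestSeqFrom(byte[] snapshot) { return expectedSeq; } // placeholder
}
```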

Ultimately, if I wanted to offer customers an MQTT front-end, then I’d have to provide some kind of configuration.

  • The first model is that the MQTT client would need to support the delta model and handle full updates as failures occur; this requires clients to deploy code, and that need for custom client code tends to negate the reason for using MQTT in the first place.
  • The second model is configuration on the server to simply send the entire document on every update, with some kind of rate-limit knob. This second model would mirror how sensors send snapshots via MQTT. I’m not a fan of the second option, but it does avoid custom code on clients; a sketch of such a configuration follows this list.
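If those were the two knobs, the server-side configuration might look something like this sketch; the names and fields are invented for illustration, not an actual Adama or MQTT surface:

```java
// Hypothetical configuration for an MQTT front-end; every name here is invented.
enum MqttDeliveryModel {
  DELTAS,        // model one: clients deploy delta-aware code, full update on failure
  FULL_DOCUMENT  // model two: every update ships the entire document
}

final class MqttFrontEndConfig {
  final MqttDeliveryModel model;
  final int maxUpdatesPerSecond; // rate-limit knob, only meaningful for FULL_DOCUMENT

  MqttFrontEndConfig(MqttDeliveryModel model, int maxUpdatesPerSecond) {
    this.model = model;
    this.maxUpdatesPerSecond = maxUpdatesPerSecond;
  }
}
```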

This trade-off, however, is not free for clients or the server, so the answer is to dip into how AWS thinks and simply charge customers for bandwidth and memory. This philosophy could extend to a variety of protocols which are not as efficient or as elegant, but which empower customers. I’m not a fan of trade-offs, but they are inevitable in life and business.

If you are excited about a real-time offering, then join the community and ask for things.