February 28th, 2024

Designing a Usage/Limit system to avoid large bills

By Jeffrey M. Barber

Since this platform aims to be a next-gen, all-in-one, server-less kind of thing, I want to avoid the criticism of “infinite cost” (and being shamed on Serverless Horrors). I too believe that all vendors should provide a maximum usage cap, and I’m shocked it isn’t done in more places. Actually, I’m not shocked since the lack of usage caps maximizes shareholder value.

However, my goal is to provide a reasonable usage cap that can be understood well enough while also providing a reasonable degree of precision. “Precision” is what makes this hard because, by the moment you check in with the accountant, the cost has already been incurred. There are three types of machines that will submit billing records: web, adama, and overlord. The web fleet submits bandwidth and request records. The adama fleet submits records around CPU, messaging, memory, backups, etc. The overlord fleet submits storage charges.

I’m not comfortable with doing any kind of cap on storage as that requires rejecting requests, which becomes exceptionally problematic. However, the two places I’m comfortable placing caps are (1) bandwidth, and (2) compute/memory.

Bandwidth Usage Caps

This is the macro concern and top priority, so I’ll take a multi-tiered approach as this will need to grow in sophistication to maximize value. At first, I intend to create a bandwidth token budget that can refill periodically and then coordinate with the billing document. At the end of the day, I want to give three cost control mechanisms:

  • Maximum Hourly Crawl Bandwidth over a Day (GB)
  • Maximum Hourly Public Bandwidth over a Day (GB)
  • Maximum Hourly Authenticated Bandwidth over a Day (GB)

The easiest thing per bucket of bandwidth is to just have a token bucket that gets refreshed by coordinating with a central resource (in Adama’s universe, the billing document). The difficult aspect is sharing that budget between many web servers, and the web servers have to be involved since Adama is also a cache/CDN. The number of machines becomes a huge multiplier of over-charge, so we have to divide the tokens between the machines as we vend them out. This is why I’m making the granularity of the rate limit an hour “over a day”: we expect surges which we want to be tolerant of (and this helps maximize shareholder value too).

The algorithm when requesting a token refresh is to sum the bandwidth for the bucket over the prior 24 hours, subtract that from the hourly maximum * 24, and then return that value divided by the number of web servers. At a certain point, this will start to return low values, and if we have too many web servers then we just choke out. There are many ways to overcome this challenge using fun techniques to reduce the pressure on the coordinating entity, so I’ll skip those details.
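The refresh math above can be sketched in a few lines; this is a minimal illustration of the arithmetic, and the function name and parameters are mine, not the actual Adama billing-document API:

```python
def tokens_for_refresh(hourly_cap_gb: float,
                       usage_last_24h_gb: float,
                       num_web_servers: int) -> float:
    """Return the bandwidth tokens (GB) to vend to one web server.

    Budget = hourly cap * 24 hours, minus what the fleet already
    spent in the prior 24 hours, split evenly across web servers.
    """
    remaining = max(0.0, hourly_cap_gb * 24 - usage_last_24h_gb)
    return remaining / num_web_servers

# Example: a 10 GB/hour cap with 180 GB already used in the last day
# leaves 240 - 180 = 60 GB, so each of 4 web servers gets 15 GB.
```

Note the clamp at zero: once the fleet has spent the day's budget, every server is vended nothing until usage rolls out of the 24-hour window.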

The different buckets are precisely to cater to various markets, and here is where I’ll need to grow in sophistication such that I could apply machine learning to classify a traffic pattern as either a “good faith actor” versus a “bad faith actor”. The initial goal is to apply simple rules to classify traffic into the various buckets, but this can grow into various protocols to validate that actual humans are using the thing.

Compute / Memory

Fortunately, the design of Adama makes it advantageous to introduce usage caps around the scarcest resource: memory. From that, we also get a bounded amount of compute, so that’s nice. Here, we will give two macro knobs: minimum capacity and maximum capacity. This will ultimately inform capacity planning and traffic routing to constrain capacity to hosts; each host will then constrain itself to an appropriate budget without needing coordination, which also helps minimize noisy-neighbor aspects of the service.

So, we will introduce:

  • Minimum Bricks
  • Maximum Bricks

Then, each brick gets two knobs for memory:

  • Soft Maximum Hourly of Memory (MB)
  • Hard Maximum Hourly of Memory (MB)

The soft cap will prevent the loading of documents while the hard cap will start the process of tearing down existing documents. Each brick then gets a soft and hard maximum for CPU:

  • Soft Maximum CPU usage per hour (minutes).
  • Hard Maximum CPU usage per hour (minutes).

Such that the soft cap will reject new traffic and the hard cap will start shedding existing traffic.
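Putting the four per-brick knobs together, the decision logic reads as a pair of threshold checks. This is a minimal sketch of that behavior; the function names, action labels, and thresholds are illustrative, not Adama's internals:

```python
def memory_decision(used_mb: float, soft_mb: float, hard_mb: float) -> str:
    """Apply the per-brick memory caps described above."""
    if used_mb >= hard_mb:
        return "unload-documents"  # hard cap: start tearing down documents
    if used_mb >= soft_mb:
        return "reject-loads"      # soft cap: stop loading new documents
    return "ok"

def cpu_decision(used_minutes: float, soft_min: float, hard_min: float) -> str:
    """Apply the per-brick hourly CPU caps described above."""
    if used_minutes >= hard_min:
        return "shed-traffic"      # hard cap: shed existing traffic
    if used_minutes >= soft_min:
        return "reject-new"        # soft cap: reject new traffic
    return "ok"
```

The key property is that both checks use only local counters, so a brick enforces its budget without any coordination.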

Now, I just have to define a brick. Chances are, it will be a single thread on a single machine, so two bricks will use two threads on distinct machines. Something like that. However, this gets confusing very quickly, and an alternative is to just set a maximum budget and work backwards. So I’ll create a formula that converts $ into these six parameters and then enable experts to tune their system as the formula evolves over time to maximize the experience.
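Working backwards from a dollar budget might look something like the following. Every price, ratio, and default below is a made-up placeholder to show the shape of the conversion, not Adama's actual pricing:

```python
def budget_to_knobs(dollars_per_month: float) -> dict:
    """Convert a monthly $ budget into the six capacity knobs.

    All unit prices here are hypothetical, for illustration only.
    """
    price_per_brick = 5.0     # $/brick/month (assumed)
    mb_per_brick = 512.0      # soft memory cap per brick (assumed)
    cpu_min_per_brick = 20.0  # soft CPU minutes/hour per brick (assumed)

    max_bricks = max(1, int(dollars_per_month / price_per_brick))
    min_bricks = max(1, max_bricks // 4)  # keep a floor of capacity
    return {
        "min_bricks": min_bricks,
        "max_bricks": max_bricks,
        "soft_memory_mb": mb_per_brick,
        "hard_memory_mb": mb_per_brick * 1.25,   # headroom before teardown
        "soft_cpu_minutes": cpu_min_per_brick,
        "hard_cpu_minutes": cpu_min_per_brick * 1.25,
    }
```

The point of the single-input formula is that most users only ever touch the dollar figure, while experts can still override the six derived knobs directly.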

Future

Ultimately, I believe usage caps are a good fit, but they are antagonistic in many ways. As the platform evolves, I’ll gather data to offer various bundles, which will increase my profits at the expense of low-traffic users. I’m not sure how I feel about that in general, so I want to maintain faith that the market will demand usage caps.