Workload management strategies

In a React world, everything is an event that anybody can grab and process according to whatever its responsibility is. The notion of event allows for very low contract coupling; it acts as a medium between classes. A message is similar, but with more coupling as it aggregates endpoints or endpoints addresses.
This model offers flexible design where events flow and are processed through the systems. But what happens when you have too many events to process?

If you do not act, the system will exhibit delays, and if the flow does not slow down, the delay will get larger and larger.
This is not acceptable for many systems and this post will offer you some tackling strategies and rules of thumb when designing a system.

I am busy AND late!

lines are time line, bullets for event, paint lines represent processing time
lines are time line, bullets for event, paint lines represent processing time

The first one is to use horizontal or vertical scalability, but both of those have their limits. And more often than not, scalability is limited by the available budget. Your only option then is to try to limit the workload.


This one is well-known. The idea is to handle the tasks in batches instead of one by one. It typically allows for scale optimization, as part of the processing can be done for a set of events instead of once per event.

Processing events in batch provides only marginal optimization
Processing events in batch provides only marginal optimization

If preprocessing steps are time-consuming, such as opening a file or some connection to be established, batching greatly improves throughput, but not latency. Alas, one usually obtain marginal gains when other tactics are available, such as keeping the resources open for the whole duration.


It means that you explicitly control somehow the incoming workload. Thereby you simply prevent any system to submit requests/events/workload until you are in a position to process it. This is typically done through the TCP flow control mechanism.

The application can control/throttle the incoming flow.
The application can control/throttle the incoming flow.

It is a very efficient mechanism but  you are transferring your workload problem to some upstream system. If you are the sole consumer, the producer can apply any of those strategy to cope with a slower downstream system.

But if multiple consumers are present, this is a problem and the source may soon be forced to disconnect/stop transmitting to the slow system. This is a conversation for a future post.


It is a very crude method, that implies that you are ok to discard a significant part of the workload. Two algorithms dominate the approach:

  1. keeping one (1) sample out of every n samples
  2. keeping one (1) sample on a periodic basis, such as one per second
Here we drop every other samples
Here we drop every other samples

The first one is the simplest to implement, but as you divide the incoming workload by a constant, there is still a chance that the system may be overwhelmed by incoming traffic. But dividing traffic can only bring you so far, and a ten time increase can bring your system on its knees.

A sample is snapshoted at a fixed frequency to be propagated/processed
A sample is snapshoted at a fixed frequency to be propagated/processed

The second one is a bit trickier to implement, but it provides you with a controlled maximum workload; this one is interesting as you put a cap on the processed traffic. It is an interesting solution, the main downside is that it still incur an extra latency, especially when incoming traffic is limited. So it is time to look for another solution.


This means that you are able to conflate the incoming workload through an n to 1 transformation. In layman terms, imagine you have a queue of pending work that you can capture at once and compress to a single task. This implies that work tasks are similar and can be transformed with an acceptable loss of information.
In practice this is often applicable by topic/stream and related to measures of some kind.

Discarding events that are already late/superseded by a new value
Discarding events that are already late/superseded by a new value

Such as CPU thermal capture reading for fan speed control: you care only for the current temperature, so if multiple readings are pending, you can keep the latest and dropped the rest.
Imagine you have the following readings in your queue: 50,52,54,52,60; the first value being the oldest. You can process 60 and empty the queue. Alternatively, you could compute the average (53.6) and process it. Or a weighting system to favor recent readings against old one.
You get the idea.
Two assumptions for that to apply are:
– you can compress the incoming data with acceptable impact to the system responsibility
– the ‘compression’ induced workload is negligible in regard to the actual processing.

At the end of the day

First thing first: understand the behavior of your inputs

Secondly: identify upstream data importance, whether you can drop some events or not.

Thirdly: measure, measure and then measure, just to be sure.

Finally: plan for when your system will be overloaded and late and fail responsibly!!

3 thoughts on “Workload management strategies

  1. Nice post!

    I simply disagree with one statement: “batching greatly improves throughput, but not latency”.

    Even if it’s counter intuitive, a smart batching may sometimes improve both the throughput, but also the average latency (I know, I know… we shouldn’t find ‘latency’ and ‘average’ in the same sentence 😉

    Refers to Martin Thompson’s Smart Batching post for further details:


    1. Thanks for the appreciation.
      I have the usual/basic batching strategy. Smart batching is different and typically assumes contention to be beneficial. It would deserve a dedicated entry. But the post is already long.


    2. Thanks for the word of appreciation and the pointer.

      Indees, there are occurences where batching can reduce latency versus a basic per event implementation.

      Martin Thompson’s example is pretty extreme in the sense that he assumes on overhead/data ratio of 10, i.e 10 bytes of overhead versus 1 byte of data.

      Yes it can happen, but this is not the general case either.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.