In a reactive world, everything is an event that anyone can grab and process according to its own responsibility. The notion of an event allows for very low contract coupling; it acts as a medium between classes. A message is similar, but with more coupling, as it carries endpoints or endpoint addresses.
This model offers a flexible design where events flow through the system and are processed along the way. But what happens when you have too many events to process?
If you do not act, the system will exhibit delays, and if the flow does not slow down, those delays will grow larger and larger.
This is not acceptable for many systems, so this post offers some strategies and rules of thumb for tackling the problem when designing a system.
I am busy AND late!

The first option is to scale horizontally or vertically, but both approaches have their limits. And more often than not, scalability is limited by the available budget. Your only remaining option is then to try to limit the workload.
Batching
This one is well known. The idea is to handle tasks in batches instead of one by one. It typically allows for economies of scale, as part of the processing can be done once per set of events instead of once per event.

If preprocessing steps are time-consuming, such as opening a file or establishing a connection, batching greatly improves throughput, but not latency. Alas, the gains are usually marginal when other tactics are available, such as keeping the resources open for the whole duration.
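To make this concrete, here is a minimal sketch in Python. The open_resource callback and the batch size are illustrative placeholders, not part of any particular framework; the point is simply that the expensive setup is paid once per batch instead of once per event.

```python
from typing import Any, Callable, Iterable, List

def process_in_batches(events: Iterable[Any],
                       open_resource: Callable[[], Any],
                       handle: Callable[[Any, Any], None],
                       batch_size: int = 100) -> None:
    """Drain events in fixed-size batches, paying the setup cost once per batch."""
    batch: List[Any] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            _flush(batch, open_resource, handle)
            batch = []
    if batch:                       # flush the trailing, partial batch
        _flush(batch, open_resource, handle)

def _flush(batch, open_resource, handle) -> None:
    resource = open_resource()      # e.g. a file or a connection, opened once per batch
    try:
        for event in batch:
            handle(resource, event)
    finally:
        resource.close()            # assumes the resource exposes close()
```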
Throttling/Capping
This means that you somehow explicitly control the incoming workload. You simply prevent any upstream system from submitting requests/events/workload until you are in a position to process it. This is typically done through the TCP flow control mechanism.

It is a very efficient mechanism, but you are transferring your workload problem to some upstream system. If you are the sole consumer, the producer can apply any of these strategies to cope with a slower downstream system.
But if multiple consumers are present, this becomes a problem, and the source may soon be forced to disconnect from, or stop transmitting to, the slow system. This is a conversation for a future post.
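As a rough illustration, here is a sketch of explicit capping with a bounded queue in Python; the queue size and the sentinel value are arbitrary choices for the example. Once the buffer is full, the producer blocks until the consumer catches up, which is the same principle as TCP flow control.

```python
import queue
import threading

# Bounded buffer: once it is full, put() blocks, so the producer is throttled
# until the consumer catches up, the same principle as TCP flow control.
work = queue.Queue(maxsize=1000)

def producer(events):
    for event in events:
        work.put(event)        # blocks while the queue is full
    work.put(None)             # sentinel: no more events

def consumer(handle):
    while True:
        event = work.get()
        if event is None:
            break
        handle(event)

# Usage sketch:
# threading.Thread(target=producer, args=(incoming_events,), daemon=True).start()
# consumer(print)
```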
Sampling
This is a very crude method, which implies that you are okay with discarding a significant part of the workload. Two algorithms dominate the approach:
- keeping one (1) sample out of every n samples
- keeping one (1) sample on a periodic basis, such as one per second

The first one is the simplest to implement, but as you only divide the incoming workload by a constant, there is still a chance that the system may be overwhelmed by incoming traffic. Dividing traffic can only take you so far, and a tenfold increase can bring your system to its knees.
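A minimal sketch of the first algorithm (the value of n is arbitrary here):

```python
def one_in_n(events, n=10):
    """Keep one event out of every n; the workload is only divided by a
    constant, so a large enough traffic spike can still overwhelm the system."""
    for index, event in enumerate(events):
        if index % n == 0:
            yield event

# list(one_in_n(range(25), n=10)) -> [0, 10, 20]
```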

The second one is a bit trickier to implement, but it provides a controlled maximum workload, which is interesting as it puts a cap on the processed traffic. The main downside is that it still incurs extra latency, especially when incoming traffic is limited. So it is time to look for another solution.
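A possible sketch of the second algorithm, assuming a timer-driven implementation (the class name and period are illustrative): at most one event is forwarded per period, and pending events wait for the next tick, which is where the extra latency comes from.

```python
import threading
import time

class PeriodicSampler:
    """Forward at most one event per period. Pending events wait for the next
    tick, which caps the processed traffic but adds latency: even a lone event
    may wait up to one full period before being handled."""

    def __init__(self, handle, period_s=1.0):
        self._handle = handle
        self._period_s = period_s
        self._lock = threading.Lock()
        self._pending = None
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, event):
        with self._lock:
            self._pending = event            # newer events replace older pending ones

    def _run(self):
        while True:
            time.sleep(self._period_s)
            with self._lock:
                event, self._pending = self._pending, None
            if event is not None:
                self._handle(event)
```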
Conflating
This means that you are able to conflate the incoming workload through an n-to-1 transformation. In layman's terms, imagine you have a queue of pending work that you can capture all at once and compress into a single task. This implies that the work tasks are similar and can be merged with an acceptable loss of information.
In practice, this is often applied per topic/stream and relates to measurements of some kind.

Take CPU thermal sensor readings for fan speed control: you care only about the current temperature, so if multiple readings are pending, you can keep the latest and drop the rest.
Imagine you have the following readings in your queue: 50, 52, 54, 52, 60, the first value being the oldest. You can process 60 and empty the queue. Alternatively, you could compute the average (53.6) and process that, or use a weighting scheme to favor recent readings over old ones.
You get the idea.
Two assumptions must hold for this to apply:
- you can compress the incoming data with an acceptable impact on the system's responsibility
- the workload induced by the 'compression' is negligible compared to the actual processing.
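Here is a minimal sketch of both conflation strategies applied to the readings above (the function names are mine, for illustration only):

```python
from collections import deque

def conflate_latest(pending: deque):
    """Drain every pending reading and keep only the most recent one."""
    latest = None
    while pending:
        latest = pending.popleft()
    return latest

def conflate_average(pending: deque):
    """Drain every pending reading and merge them into their average."""
    readings = []
    while pending:
        readings.append(pending.popleft())
    return sum(readings) / len(readings) if readings else None

# The readings from the example, oldest first:
readings = deque([50, 52, 54, 52, 60])
print(conflate_latest(deque(readings)))    # 60
print(conflate_average(readings))          # 53.6
```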
At the end of the day
First things first: understand the behavior of your inputs.
Secondly: assess the importance of upstream data, i.e. whether you can drop some events or not.
Thirdly: measure, measure and then measure again, just to be sure.
Finally: plan for when your system will be overloaded and late, and fail responsibly!
Nice post!
I simply disagree with one statement: “batching greatly improves throughput, but not latency”.
Even if it’s counter-intuitive, smart batching may sometimes improve not only the throughput but also the average latency (I know, I know… we shouldn’t find ‘latency’ and ‘average’ in the same sentence 😉).
Refer to Martin Thompson’s Smart Batching post for further details:
http://mechanical-sympathy.blogspot.fr/2011/10/smart-batching.html
Thanks for the appreciation.
I covered the usual/basic batching strategy here. Smart batching is different and typically relies on contention to deliver its benefits. It would deserve a dedicated entry, but the post is already long.
Thanks for the word of appreciation and the pointer.
Indeed, there are occurrences where batching can reduce latency versus a basic per-event implementation.
Martin Thompson’s example is pretty extreme in the sense that he assumes an overhead/data ratio of 10, i.e. 10 bytes of overhead versus 1 byte of data.
Yes, it can happen, but it is not the general case either.