The Sequencer (part 2.1)

Update

I received two more proposals; I will comment on them soon.

Original

I closed the last episode with a little exercise for my readers: suggest how to complete my requirements, namely by ensuring guaranteed ordering for the sequencer.

Alas, only two were skilled or brave enough to face the challenge and try an answer, and I thank them for that.

The two proposals were similar, but freeman provided a gist, so let’s discuss it.


using System;
using System.Collections.Concurrent;
using System.Threading;

namespace Sequencer
{
    public class Sequencer
    {
        private readonly ConcurrentQueue<Action> _pendingTasks = new ConcurrentQueue<Action>();
        private readonly object _running = new Object();

        public void Dispatch(Action action)
        {
            // Queue the task
            _pendingTasks.Enqueue(action);
            // Schedule a processing run (may be a noop)
            ThreadPool.QueueUserWorkItem(x => Run());
        }

        // Run when the pool has available CPU time for us.
        private void Run()
        {
            if (Monitor.TryEnter(_running))
            {
                try
                {
                    Action taskToRun;
                    while (_pendingTasks.TryDequeue(out taskToRun))
                    {
                        taskToRun();
                    }
                }
                finally
                {
                    Monitor.Exit(_running);
                }
            }
        }
    }
}

This will definitely capture and secure the order of execution. And I like the smart use of TryEnter, which gets rid of the boolean used to store the state.
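To see the guarantee in action, here is a minimal usage sketch of mine (the gist’s class is repeated so the sample is self-contained; the demo scaffolding is my own, not part of the proposal):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// The Sequencer from the gist above, repeated so this sketch is self-contained.
public class Sequencer
{
    private readonly ConcurrentQueue<Action> _pendingTasks = new ConcurrentQueue<Action>();
    private readonly object _running = new Object();

    public void Dispatch(Action action)
    {
        _pendingTasks.Enqueue(action);
        ThreadPool.QueueUserWorkItem(x => Run());
    }

    private void Run()
    {
        if (Monitor.TryEnter(_running))
        {
            try
            {
                Action taskToRun;
                while (_pendingTasks.TryDequeue(out taskToRun))
                {
                    taskToRun();
                }
            }
            finally
            {
                Monitor.Exit(_running);
            }
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var sequencer = new Sequencer();
        var results = new List<int>();
        var done = new CountdownEvent(3);

        // Dispatched in order 1, 2, 3; the FIFO queue plus the single
        // draining loop guarantee execution in that order, one at a time.
        for (var i = 1; i <= 3; i++)
        {
            var captured = i;
            sequencer.Dispatch(() => { results.Add(captured); done.Signal(); });
        }

        done.Wait();
        Console.WriteLine(string.Join(",", results)); // prints 1,2,3
    }
}
```

Because Dispatch enqueues before scheduling a run, and a single loop drains the FIFO queue, submission order is preserved no matter which pool thread ends up doing the work.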

But, and this is a big but, this solution violates another (implicit) requirement. I have to apologize for this one, as I failed to state it earlier.

But you know customers: they understand what they need when the development is over :-).

That requirement is fairness: fairness between Sequencer instances, as well as fairness between Sequencers and other tasks. Fairness is to be understood as the guarantee that all submitted tasks will eventually be executed, that they have equivalent access to execution units (i.e. cores), and that they complete with similar delays.
It means that no component can gain exclusive access to execution unit(s) and that tasks are executed roughly in the order they are submitted.

This being defined, this solution is not fair. Can you tell me why?

Note: here is a gist for the first proposal

The Sequencer (part 2)

In my last post I made a promise: to show the implementation of the Sequencer.

So please bear with me while I review the requirements. The Sequencer:

  1. Is a facade: wraps an execution context, such as a thread or a pool of threads
  2. Prevents race conditions: enforces sequential execution of submitted tasks
  3. Guarantees ordering: submitted tasks are processed in the order they are submitted
  4. Avoids useless consumption of resources.

The first one is simply a design choice, no need to elaborate. Next is how to prevent race conditions: as the execution basis is a lambda/delegate/runnable of some sort, we need to prevent concurrent execution of those.

A simplistic state machine can help us. The possible states are running and waiting; transitions are triggered by incoming tasks.
If the machine is in the ‘waiting‘ state, it transitions to ‘running‘ and executes the task; when the task is done, it transitions back to ‘waiting‘.
But if the machine is in the ‘running‘ state, execution cannot start; it needs to wait for the appropriate state.
It can be implemented with a lock, the ‘running‘ state being embodied by the lock being held. What interests me is how we deal with conflicts, i.e. when a task comes in while the state is still ‘running‘.
If we respect the lock semantics, the incoming request is blocked until we reach the desired state. Blocking implies waste and potential deadlocks. So no go there.

We need to make our logic a bit more complex. So, when a task is submitted for execution and the state is ‘running‘, we can store it for deferred execution. There is no need to keep the current thread blocked.
As a consequence, the transition out of ‘running‘ is now: when the task is executed, if there is at least one other task pending execution, we execute it (the state remains ‘running‘). When there are no pending tasks, we can safely transition to ‘waiting‘.

Ok, time to look at some code. Disclaimer: this implementation is for educational purposes; I know it will look ugly to some, slow to others… you name it. But I do not care, so bear with me.
For non C# readers: an Action is a parameter-less procedure (i.e. no return value).


using System;
using System.Collections.Generic;
using System.Threading;

namespace Sequencer
{
    public class Sequencer
    {
        private readonly Queue<Action> _pendingTasks = new Queue<Action>();
        private bool _isRunning;
        private readonly object _lock = new Object();

        public void Dispatch(Action action)
        {
            // Here we queue a call to our own logic
            ThreadPool.QueueUserWorkItem(x => Run((Action) x), action);
        }

        // Run when the pool has available CPU time for us.
        private void Run(Action action)
        {
            lock (_lock)
            {
                if (_isRunning)
                {
                    // We need to store the task and stop
                    _pendingTasks.Enqueue(action);
                    return;
                }
                // OK, we can run
                _isRunning = true;
            }
            while (true)
            {
                // Execute the next action
                action();
                // Check whether other tasks are available
                lock (_lock)
                {
                    if (_pendingTasks.Count == 0)
                    {
                        _isRunning = false;
                        return;
                    }
                    // Pop the next task
                    action = _pendingTasks.Dequeue();
                }
            }
        }
    }
}

As you can see, the actual implementation is slightly more complex than described earlier. Indeed, we need to cater for two atomic transactions:

  1. On a new task: starting it right away or queuing it
  2. When the task is done: fetching a new one or ‘freeing’ the sequencer

Let’s look at the requirements:

  • Requirement #1: we wrap an existing execution mechanism (the .Net thread pool). Nothing fancy here; it will be improved in a future revision
  • Requirement #2: done as well; the logic has been designed specifically for that purpose
  • Requirement #3: we did nothing specific here, but on the other hand it looks OK
  • Requirement #4: by using a deferred execution mechanism, we avoided the need to block threads, something we know is expensive

So, are we done now?

Nope, requirement #3 is definitely not covered. On a multicore machine (basically any machine nowadays), multiple threads (from the thread pool) can start to work on tasks at the same time, and there will be a race to queue them in the sequencer.
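To make the race tangible, here is a sketch of mine around the listing above (the demo scaffolding is my own): sequential execution holds, so every task runs exactly once, but nothing pins the submission order once the pool is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// The Sequencer from the listing above, repeated so this sketch is self-contained.
public class Sequencer
{
    private readonly Queue<Action> _pendingTasks = new Queue<Action>();
    private bool _isRunning;
    private readonly object _lock = new Object();

    public void Dispatch(Action action)
    {
        ThreadPool.QueueUserWorkItem(x => Run((Action) x), action);
    }

    private void Run(Action action)
    {
        lock (_lock)
        {
            if (_isRunning)
            {
                _pendingTasks.Enqueue(action);
                return;
            }
            _isRunning = true;
        }
        while (true)
        {
            action();
            lock (_lock)
            {
                if (_pendingTasks.Count == 0)
                {
                    _isRunning = false;
                    return;
                }
                action = _pendingTasks.Dequeue();
            }
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var sequencer = new Sequencer();
        var order = new List<int>();
        var done = new CountdownEvent(1000);

        for (var i = 0; i < 1000; i++)
        {
            var captured = i;
            // Each Dispatch posts its own Run(action) to the pool: several
            // workers can race on _lock, so nothing guarantees that task 0
            // reaches the sequencer before task 1.
            sequencer.Dispatch(() => { order.Add(captured); done.Signal(); });
        }

        done.Wait();
        // All tasks ran exactly once (sequential execution holds)...
        Console.WriteLine(order.Count); // prints 1000
        // ...but 'order' may differ from 0..999: ordering is not guaranteed.
    }
}
```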

We need to refine this implementation.
What would be your proposal to have ordering guarantee?

The Sequencer (part 1)

Several years ago, I was the lead on a new mass market making system. One of the objectives of the project was to have low latencies and controlled throughput: the target was an isolated-event latency below 1 ms and a 99th percentile below 10 ms.

Throughput had to be controlled because downstream bandwidth was reduced to a trickle, a mere 50 updates/sec while the inputs were in the order of 2000 updates/sec. So the system required an arbitration algorithm for these output decisions.

At that time, I still had a burning memory of my time as a lead C++ dev on an in-house multithreaded fat-client app: I basically spent the best part of two years rebuilding the threading model from scratch, a long and painful process. This definitely was an apprentice’s journey towards multithreading craftsmanship.

Sadly, I ultimately came to the conclusion that, at that time, building and maintaining a multithreaded app was a daunting task.

Therefore I decided to push for a radically different approach: managing concurrency without any locks.

When you think about it, what is the best way to manage concurrency?

First of all, by realizing that there is no concurrency per se: 99.99% of the time you’ll have locks that enforce mutual exclusion. Managing concurrency means preventing actual concurrency from happening in the first place.

And then was born the ‘Sequencer’.

The ‘Sequencer’ is a design pattern that has been created in C# and replicated in Java and C++. The contract is simple:

The Sequencer:

  • Is a facade: wraps an execution context, such as a thread or a pool of threads
  • Prevents race conditions: enforces sequential execution of submitted tasks
  • Guarantees sequentiality: submitted tasks are processed in the order they are submitted

And that’s it. The Sequencer was coupled with another, simpler pattern: the ‘Dispatcher’, in charge of providing actual execution capacity. There are several variant implementations of IDispatcher, but I will present them in a future post.

Actually, the Sequencer is a specific implementation of an IDispatcher.
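As a sketch (the member names are my guess; the actual interface is only presented later in the series), the contract could be as small as:

```csharp
using System;

// Hypothetical shape of the IDispatcher contract (my sketch; the real
// interface is shown in a later post).
public interface IDispatcher
{
    // Submit a task for execution on the wrapped execution context.
    void Dispatch(Action action);
}

// Simplest possible implementation: execute synchronously on the caller.
public class SynchronousDispatcher : IDispatcher
{
    public void Dispatch(Action action)
    {
        action();
    }
}

public static class Demo
{
    public static void Main()
    {
        IDispatcher dispatcher = new SynchronousDispatcher();
        dispatcher.Dispatch(() => Console.WriteLine("dispatched")); // prints dispatched
    }
}
```

The point of the abstraction is that callers only see Dispatch; whether execution is synchronous, on a dedicated thread, or sequenced over a pool is the implementation’s business.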

It looks a lot like an actor without the messaging infrastructure.

Let’s talk implementation.

Behind its apparent simplicity, the Sequencer hides moderate complexity. But I’ll talk further about it in an upcoming post.

Quality is too serious to be left in the Craftsmen’s hands!

First contact

I must confess I have something of an ambivalent relationship with the ‘Software Craftsmanship‘ movement. My first contact with it was the discovery of some post about ‘developer’s pride’. Personally, I am more of a ‘developer’s humility’ kind of guy! Boasting pride sounds stupid and short-sighted to me. Reading the manifesto and other articles somewhat moderated my initial reaction: striving for continuous self-improvement and promoting test-driven practices are objectives I deeply believe in. My next realisation was that Software Craftsmanship was basically promoting XP practices. But, instead of making them core to a team-based effort, the craftsmanship movement promotes the idea that they must be a personal initiative.

Kinda cute but…

It looks fairly naive to me. Don’t get me wrong, it is important to embrace a quality-driven approach, but if you believe you can transform such a large industry by the sheer goodwill of some engineers, I am sorry to say you are going to be very disappointed. Furthermore, I think this may even be counterproductive, as it will eventually create a split between craftsmen and average developers, the former being more consistent in their delivery than the latter, but significantly more expensive too.

By the way, let’s say it out in the open: quality has a cost, period. And let’s go to the core of the argument: quality is too serious to be left in the Craftsmen’s hands.

Of course

Yes, it is often a smart investment, but it has an upfront cost. Therefore, it must not be hidden from the customer/product owner/sponsor. Most IT projects’ main deliverable is a code base and the processes that transform this code base into working software. And this code base is more often than not expected to be extendable, or at least decently maintainable. That implies that automated tests are part of the deliverable, and one that significantly enhances its maintainability.

The question is then: what is the business value of tests?

Every other industry that I am aware of has a decent appreciation of the value of tests.

At the lower end you will find cheap plastic contraptions made by some sub-par factory: we don’t need tests to understand we sell crap, and replacing is cheaper than fixing.

At the higher end you will find the aeronautics industry, where extensive testing is the norm: whenever something fails, the whole industry is at risk.

Both of those testing strategies are relevant to the business model of the respective industry.

USE THE SAME APPROACH FOR YOUR PROJECTS!

Negotiate with your customer/product owner the investment in tests. You have metrics to prove the effort:

– coverage (line or branch)

– mutation testing (demonstrates effectiveness)

Of course, regulatory constraints may also drive the expected level of quality and testing.

So what about Craftmanship?

If you want to engage on the Craftsmanship journey, you have my full support. But understand that you have to adapt your testing practices to the wishes of your customer.

 

The secret for 100% test coverage: remove code

Update note

Based on the interesting feedback I got (it can be seen on Tom’s ramblings), I realized this post probably needed some tweaking and scope precision. I put them at the end.

What is the adequate objective for test coverage?

60%?

80%?

99%?

100%?

I have often pondered this question, like many before me and, I suppose, many after. Why aim for 100%? The 80/20 law clearly applies to test coverage: trying to cover every corner case that lies in the code requires a significant investment in time and brain cells. Plus, integration points can never really be properly covered.

On the other hand, having 100% coverage provides huge benefits:

  1. Every single line of code is constantly tested
  2. Trust in code is high
  3. Any uncovered line is a regression

What happens if the target is, say, 80%?

  1. A significant part of the code is never tested
  2. Trust in code is moderate and can degrade
  3. 20% of uncovered lines is significant: 2000 lines for a 10K-line code base. That means full namespaces can hide in there.

For me, there is no question: 100% coverage is the only worthy objective; do not settle for less.

Yes, there are exceptions, usually at integration points. Mocks are not a real solution either: they can help you increase your coverage, but not by that much. The pragmatic solution is to wrap integration points into isolated modules (jars/assemblies). Think hexagonal architecture there. You will have specific coverage targets for those modules; you also need to make sure that no other code creeps in; and finally, understand that those are weak points in your design.

While working on nFluent, I constantly make sure unit tests exercise every single line of code of the library. It also means that I help contributors reach the target. It is not that difficult, especially if you do TDD!

There is one simple golden rule: to reach and maintain 100% coverage, you do not need to add tests; you have to remove uncovered lines!

This is important, so let me restate it: a line of code that is not covered is not maintainable, must be seen as not working, and must be removed!

Think about it:

  1. The fact that no automated test exists means the behavior can be silently changed, not just the implementation!
  2. Any newcomer, including your proverbial future self, will have to guess the expected behavior!
  3. What will happen if the code gets executed some day in production?
  4. If you are doing TDD, you can safely assume the code is useless!

So, when you discover uncovered lines, refactor to remove them or to make sure they are exercised. But do not add tests for the sake of coverage.

Having 100% coverage does not mean the code is bug free

Tom’s comments implied that I was somehow trying to promote this idea. 100% coverage is no proof of bug-free code, and does not imply it at all. The quality and relevance of your tests are essential attributes; that is exactly why I promote removing untested lines. A test specially crafted to cover a line would not be driven by an actual need and would be artificial. The net result would be a less agile code base.

On the other hand, if you have 100% coverage and you discover some reproducible bug, either through manual testing or in production, you should be able to add an automated test to prevent any reoccurrence.
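For instance, a minimal sketch of such a pinning test (the class, method, and bug are entirely hypothetical):

```csharp
using System;
using System.Globalization;

public static class PriceFormatter
{
    // Hypothetical method whose (now fixed) bug dropped the sign of
    // negative prices in production.
    public static string Format(decimal price)
    {
        return price.ToString("0.00", CultureInfo.InvariantCulture);
    }
}

public static class RegressionTests
{
    // The test added after reproducing the bug: it pins the expected
    // behavior so the bug cannot silently come back.
    public static void Main()
    {
        var formatted = PriceFormatter.Format(-1.5m);
        if (formatted != "-1.50")
        {
            throw new Exception("regression: " + formatted);
        }
        Console.WriteLine("ok"); // prints ok
    }
}
```

With full coverage already in place, such a test slots in naturally; without it, the surrounding code is often not testable enough to accept it.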

When coverage is insufficient, there is a high probability that you will not be able to add this test, keeping the door open for future regressions!

If you want to build trust based on coverage metrics, you need to look into branch coverage and property-based testing at the very least. But I do not think this is a smart objective.

Note

  • This post focuses on new code! For legacy code, the approach should be to add tests before anything else, and never remove working code 🙂