TPL findings

I designed an event-driven concurrency framework for the .NET ecosystem several years ago. Its name is IA+ Threading, and it is still in use and gaining ground. While I will discuss its specifics at some future date, let us just say that we have so far discarded any attempt to migrate to TPL, mostly because we are unable to implement a similar design on top of it.
Of course, smart newcomers challenge this choice and pile up arguments in favor of TPL. The main concerns were:
– performance: TPL is expected to be the fastest kid on the block, thanks to a superior skill set and a larger budget. A fair assumption.
– data parallelism: IA+ lacks anything resembling Parallel.For
– no support for task cancellation
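For readers less familiar with TPL, the two missing features look roughly like this in TPL itself: Parallel.For provides the data parallelism, and a CancellationToken provides cancellation. A minimal sketch (the iteration count and workload are made up for illustration):

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class TplFeatures
{
    // Sum 0..n-1 with Parallel.For; the token allows cooperative cancellation.
    public static long ParallelSum(int n, CancellationToken token)
    {
        var options = new ParallelOptions { CancellationToken = token };
        long sum = 0;
        Parallel.For(0, n, options, i =>
        {
            Interlocked.Add(ref sum, i); // thread-safe accumulation
        });
        return sum;
    }

    static void Main()
    {
        using var cts = new CancellationTokenSource();
        Console.WriteLine(ParallelSum(1_000_000, cts.Token)); // 499999500000

        // Cancelling the token makes the loop throw OperationCanceledException.
        cts.Cancel();
        try { ParallelSum(1_000_000, cts.Token); }
        catch (OperationCanceledException) { Console.WriteLine("cancelled"); }
    }
}
```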

Our framework was designed to implement massive real-time systems, such as pricers or arbitration automata. Performance was definitely a major requirement, both from a throughput and a latency point of view, and on that front we were not really worried.

We discussed introducing data parallelism into the framework; it was definitely a new paradigm and a shift from the original intent. But from a pragmatic point of view, it only required implementing the equivalent of Parallel.For and Parallel.ForEach. So we decided to include those features and benchmark our implementation against TPL.
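To give a sense of what "the equivalent of Parallel.For" entails at its very simplest, here is an illustrative sketch: static range partitioning across one thread per core. This is not the IA+ implementation, just the baseline shape of the feature, with none of the work stealing, load balancing, or exception aggregation a production version needs:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class NaiveParallel
{
    // A minimal, illustrative Parallel.For equivalent: split [from, to) into
    // one contiguous chunk per worker thread and join them all at the end.
    public static void For(int from, int to, Action<int> body)
    {
        int workers = Environment.ProcessorCount;
        int chunk = Math.Max(1, (to - from + workers - 1) / workers);
        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            int start = from + w * chunk;
            int end = Math.Min(to, start + chunk);
            threads[w] = new Thread(() =>
            {
                for (int i = start; i < end; i++) body(i);
            });
            threads[w].Start();
        }
        foreach (var t in threads) t.Join();
    }
}

class Program
{
    static void Main()
    {
        long sum = 0;
        NaiveParallel.For(0, 1000, i => Interlocked.Add(ref sum, i));
        Console.WriteLine(sum); // 499500
    }
}
```

Static partitioning like this performs poorly when iterations have uneven cost, which is precisely why TPL resorts to work stealing, as discussed below.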

We got many unexpected results and findings that are worth sharing:

  • The TPL implementation is aggressively unfair: by default it grabs all CPUs to complete its task, using long-running tasks and work-stealing techniques. This means you should not expect any other task to be executed as long as some Parallel.For is in progress.
  • TPL options to limit concurrency are to be regarded more as hints than as hard limits. Our tests hinted that more threads than expected were active. We did not run exploratory tests to understand the algorithm, so no definitive conclusion.
  • The .NET thread pool is sized on first use and takes processor affinity into account, i.e. the number of cores made available to the process rather than simply the number of cores in the machine.
  • On hyperthreaded CPUs, core-usage readings are misleading. This will be the subject of a dedicated post.
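The concurrency-limit point can be probed along these lines (a sketch, not our actual benchmark harness): record every distinct managed thread ID observed inside a Parallel.For running under MaxDegreeOfParallelism. The cap bounds how many iterations run concurrently, yet the set of distinct pool threads touched over the whole run can be larger, since the scheduler may rotate pool threads in and out, which makes raw thread observations easy to misread:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyProbe
{
    // Run a Parallel.For capped at maxDop and return how many distinct
    // managed threads executed at least one iteration.
    public static int DistinctThreads(int iterations, int maxDop)
    {
        var seen = new ConcurrentDictionary<int, byte>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };
        Parallel.For(0, iterations, options, i =>
        {
            seen.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
            Thread.SpinWait(10_000); // simulate a little CPU-bound work
        });
        return seen.Count;
    }

    static void Main()
    {
        // The cap limits *concurrent* iterations, but the count of distinct
        // threads seen over the whole run can exceed 2.
        Console.WriteLine(ConcurrencyProbe.DistinctThreads(10_000, 2));
    }
}
```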

The main message is that you should not mix task and data parallelism in any low-latency system. And for those of you who may be interested, our framework fared well, on par with TPL in terms of speed and scalability; we are actually 2-3% slower than TPL but use 5% less CPU. Our assumption is that the delay comes from the fact that the invoking thread does not participate in the Parallel.For, and that our superior efficiency (less CPU) is due to IA+ offering fewer options and being leaner than TPL.
