Message-based concurrency

Message-based concurrency is a share-nothing approach: no business object is shared between the various collaborating systems. Note that ‘systems’ here covers both applications/services and threads, and they are usually referred to as ‘agents’, ‘actors’ or ‘nodes’, depending on the messaging strategy. I will refer to them as ‘agents’ for the rest of this entry.

Instead of sharing entities, agents send messages detailing part or all of the current state of one or more entities. For example, whenever a new article is added to a basket by an agent, it will send the content of the basket to the agent dedicated to computing the total value of the basket, which in turn will communicate back the updated total.
This approach is also known as ‘event-driven’ architecture.
Note that this approach fits nicely with inversion of control, and often includes facilities allowing any agent to receive notifications for any entity it is interested in.
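To make the basket example concrete, here is a minimal Java sketch (all names are illustrative, not taken from any actual framework) in which two ‘agents’ exchange immutable messages over queues instead of sharing the basket entity:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Share nothing: the basket agent sends the basket's content as an immutable
// message; the pricing agent owns the total and replies with the updated value.
public class BasketAgents {
    record BasketMessage(List<Double> prices) {}   // immutable message payload

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<BasketMessage> toPricing = new ArrayBlockingQueue<>(16);
        BlockingQueue<Double> toBasket = new ArrayBlockingQueue<>(16);

        Thread pricingAgent = new Thread(() -> {
            try {
                BasketMessage msg = toPricing.take();   // receive the basket state
                double total = msg.prices().stream()
                                  .mapToDouble(Double::doubleValue).sum();
                toBasket.put(total);                    // reply with the new total
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pricingAgent.start();

        // The basket agent adds an article, then sends the whole basket content.
        toPricing.put(new BasketMessage(List.of(10.0, 2.5, 7.5)));
        System.out.println("Updated total: " + toBasket.take()); // prints "Updated total: 20.0"
        pricingAgent.join();
    }
}
```

No entity is shared: each agent only ever touches its own state and the messages it receives, so there is nothing to lock.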

Pros

  • Share nothing means no race condition (at the entity/data level)
  • Message passing helps keep coupling low
  • It eases horizontal scalability (messages can be sent in-process or cross-process)

Cons

  • Synchronization is hard, but it is needed far less often
  • Provides eventual consistency (due to little to no synchronization)
  • Harder to debug

Overall, message-based concurrency is a very good model. The learning curve is a bit steep, but the resulting systems are stable and scalable. Unless hard consistency is a strong requirement, you should consider message-based concurrency before anything else.

This has been my personal favorite for quite some time now, and I am still waiting for a use case where it would not be the best approach.

The hard parts are:
– living in an eventually consistent world
– managing asynchronous operations
– identifying the adequate libraries


The reactive manifesto

The reactive manifesto states that modern systems must be reactive to offer the expected level of service, and that to achieve this they need to be event-driven, message-based designs. These key design decisions are what make them scalable, resilient and reactive.

If you are a regular reader, event-driven and message based should definitely ring a bell here!

http://www.reactivemanifesto.org/

OOD and concurrency


Some background first

Object Oriented Development is still the dominant programming model. It replaced Structured Development some 20 years ago. New languages have made interesting breakthroughs recently (Erlang, Python), but none of them is in a position to take the throne: functional programming is powerful but hard to master, and dynamic typing is expressive but leads to messy code.

Its success mainly comes from the accessibility of C++. Many interesting proposals were available at the time, including Smalltalk, but C++ ruled them all simply by being pragmatic.

Thanks to OOD, developers made significant progress towards the Holy Grail: code reuse. To that effect, it offered three major paradigms:

  • inheritance: reuse of code by having a class inherit from one or more others. Changes made to the super class trickle down to the derived classes
  • encapsulation: the concept of interface: code can manipulate entities through a contract, and the implementation details can change without impacting that code (as long as the contract does not change)
  • polymorphism: those contracts can be implemented by many classes, each of them with its own behavior
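The value of encapsulation and polymorphism can be sketched with a classic Java interface example (the types and numbers here are purely illustrative):

```java
// Encapsulation: callers depend only on the contract, not on any implementation.
interface PaymentMethod {
    double fee(double amount);   // the contract
}

// Polymorphism: several classes implement the same contract,
// each with its own behavior.
class CreditCard implements PaymentMethod {
    public double fee(double amount) { return amount * 0.02; }  // 2% fee
}

class BankTransfer implements PaymentMethod {
    public double fee(double amount) { return 0.50; }           // flat fee
}

public class Checkout {
    // This code only knows the contract; implementations can change freely
    // without impacting it -- that is what keeps coupling low.
    static double totalWithFee(PaymentMethod pm, double amount) {
        return amount + pm.fee(amount);
    }

    public static void main(String[] args) {
        System.out.println(totalWithFee(new CreditCard(), 100.0));   // 102.0
        System.out.println(totalWithFee(new BankTransfer(), 100.0)); // 100.5
    }
}
```

`Checkout` never needs to change when a new payment method is added: that is the low coupling the entry argues for.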

Initially, there was a strong emphasis on inheritance but, over the years, experience has taught us that the value lies in encapsulation and polymorphism, because they permit low coupling. Inheritance turned out to be more of an embarrassment than anything else, and is often cited by OOD’s opponents.

What about concurrency?

Concurrency is orthogonal to OOD. And since OOD implies linear execution, concurrency wreaks havoc in that perfect world. Methods are transactions that change or consume the state of the object (this/self). If two methods are executed simultaneously, the object state may shift unexpectedly, e.g.: the key found in some internal dictionary may be removed by a delete command, there is an error message but no error code, etc.

The traditional workaround for this is to rely on exclusive access, i.e. using synchronized in Java and locks in other languages.
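In Java, that exclusive-access workaround looks like this (a minimal sketch of the classic lost-update problem):

```java
// A counter whose increment is a read-modify-write sequence: without exclusive
// access, two threads can interleave and silently lose updates.
class Counter {
    private int value = 0;

    // synchronized turns the method into a critical section: only one thread
    // at a time can execute it on a given instance.
    public synchronized void increment() { value++; }
    public synchronized int get() { return value; }
}

public class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // 200000 with synchronized; remove it and the result is usually lower
        System.out.println(counter.get());
    }
}
```

This works, but every caller must go through the lock, and composing several such objects is exactly where the pitfalls discussed below appear.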
But we know that locks hide so many pitfalls that you cannot expect to produce durable code using them.
That has pushed many highly skilled developers to reject OOP altogether and embrace a more radical approach: functional programming.
Clojure is their flagship, making it easy to implement scalable code thanks to a share-nothing paradigm.

I have a different view on this: I strongly believe that any solution with a steep learning curve will fail, as most developers would be kept out of it. That being said, I am a very strong supporter of immutability, message passing and asynchronous programming. It is just that I am sure you can combine these with good old OOD.

Well, I have been doing that for more than 7 years now.

Concurrency Paradigms: Data parallelism

Data parallelism is the approach where the same program/algorithm can be applied to multiple subsets of data in parallel. This is not a new approach and it has variants:

  • vector computing: data are expressed as vectors, often of fixed size, and all entries are processed in parallel. This technique is used in all current GPUs and exists in most current CPUs (SSEx instructions for Intel, as an example)
  • mass parallelism: systems using many dumb nodes with dedicated memory and small processing capabilities

In C#, the Parallel.For method (and its variants) helps implement data parallelism.
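For readers outside the .NET world, a rough Java analogue of Parallel.For is a parallel stream over the index range (a sketch, not a full equivalent):

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Data parallelism: the same operation is applied to every index of the data
// set, and the runtime splits the index range across the available cores.
public class DataParallel {
    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        Arrays.fill(data, 2.0);

        // Each entry is processed independently of the others, which is what
        // allows the runtime to partition the work freely.
        IntStream.range(0, data.length)
                 .parallel()
                 .forEach(i -> data[i] = data[i] * data[i]);

        System.out.println(data[0]); // 4.0
    }
}
```

The key constraint is the one the cons list below points at: each iteration must be independent of the others, or this transformation is not valid.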

Pros:

  • Scales well, as the dataset is usually far larger than the number of available execution units (think thousands versus tens)
  • Many helpful frameworks/libraries exist
  • Primitives are friendly
  • Can be used for horizontal scalability as well

Cons:

  • Only makes sense at the algorithm level
  • Does not work for all algorithms

Bottom line: the dominant model for scientific/computationally intensive topics.

Concurrency Paradigms: Task parallelism

Task parallelism is the approach where the application execution is broken down into smaller independent tasks. Those tasks are typically executed by a thread pool; as long as they are independent, they can be executed concurrently. If any synchronization or mutual exclusion is needed, one can still use locks or events as usual; but you need to remain cautious, as thread pools and synchronization may not play well together and can lead to tricky deadlock or livelock situations.

The strongest attribute of this paradigm is the fact that it scales very well, assuming the program is able to provide enough independent tasks to feed the thread pool.

This approach has gained a lot of traction since the new millennium, especially as it fits the basic needs of HTTP servers very well: they expose a stateless service, and each request is short-lived.
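A minimal Java sketch of this paradigm: independent, short-lived tasks submitted to a thread pool, much like an HTTP server handling requests (the tasks here are trivial placeholders):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Task parallelism: the pool picks up independent tasks and runs them
// concurrently on its worker threads.
public class TaskParallel {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = List.of(
                () -> 1 + 1,    // each task is independent: no shared state,
                () -> 2 * 21,   // so no synchronization is needed here
                () -> 10 - 3
            );
            int sum = 0;
            // invokeAll blocks until every task has completed
            for (Future<Integer> f : pool.invokeAll(tasks)) sum += f.get();
            System.out.println(sum); // 2 + 42 + 7 = 51
        } finally {
            pool.shutdown();
        }
    }
}
```

As soon as tasks start touching shared state, you are back to the locking problems described earlier, which is exactly the first con below.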

Pros:

  • Scales well
  • Relatively easy to master

Cons:

  • No help regarding synchronization needs
  • Implies tasks are short-lived.

Bottom line: THE PARADIGM OF CHOICE for server-side applications, but you need to think ahead about your synchronization/mutex needs.