Concurrency Paradigms: Data parallelism


Data parallelism is the approach where the same program/algorithm can be applied to multiple subsets of data in parallel. This is not a new approach and it has variants:

  • vector computing: data are expressed as vector, often of fixed size, and all entries are processed in parallel. This technique is used in all current GPUs and exists in most current CPU (SSEx instructions for Intel, as an example)
  • mass parallelism: systems using many dumb nodes with dedicated memory and small
    English: CRAY-1 on display in the hallways of ...
    English: CRAY-1 famous SIMD calculator (Photo credit: Wikipedia)

    processing capabilities

In C#, the Parallel.For method (and its variants) help to implement data parallelism.

Pros:

  • Scales well as the dataset is usually far larger than the number of available execution units (think thousands versus tens)
  • Many helping framework/libraries exist
  • Primitives are friendly
  • Can be used for horizontal scalability as well

Cons:

  • Only makes sense at the algorithm level
  • Does not work for all algorithms

Bottom line: Dominant model for scientific/computational intensive topics.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.