Data parallelism is the approach where the same program or algorithm is applied to multiple subsets of the data in parallel. It is not a new approach, and it comes in several variants:
- vector computing: data are expressed as vectors, often of fixed size, and all entries are processed in parallel. This technique is used in all current GPUs and exists in most current CPUs (for example, the SSEx instructions on Intel processors)
- mass parallelism: systems using many dumb nodes with dedicated memory and small processing capabilities
[Image: CRAY-1, famous SIMD calculator (photo credit: Wikipedia)]
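To make the vector computing variant concrete, here is a minimal sketch using `System.Numerics.Vector<T>`, which maps to SIMD instructions when the hardware supports them. The class name `VectorDemo` and the method `Add` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Numerics;

public static class VectorDemo
{
    // Hypothetical helper: adds two arrays element by element,
    // processing Vector<float>.Count entries per iteration.
    public static float[] Add(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new float[a.Length];
        int width = Vector<float>.Count; // number of floats per hardware vector
        int i = 0;

        // Vectorized part: one addition handles 'width' entries at once.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail for the entries that do not fill a whole vector.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }

        return result;
    }
}
```

The vector width depends on the machine, which is why the sketch reads `Vector<float>.Count` at run time instead of hard-coding a size.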
In C#, the Parallel.For method (and its variants, such as Parallel.ForEach) helps to implement data parallelism.
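As an illustration, here is a minimal sketch of `Parallel.For` applying the same operation to every element of an array. The class `DataParallelDemo` and the method `SquareAll` are hypothetical names, and squaring stands in for any per-element computation that does not depend on the other elements.

```csharp
using System;
using System.Threading.Tasks;

public static class DataParallelDemo
{
    // Hypothetical helper: squares every element of the input in parallel.
    public static double[] SquareAll(double[] input)
    {
        var output = new double[input.Length];

        // Parallel.For(fromInclusive, toExclusive, body):
        // the runtime splits the index range across the available cores.
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, input.Length, i =>
        {
            output[i] = input[i] * input[i];
        });

        return output;
    }

    public static void Main()
    {
        var squares = SquareAll(new double[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(", ", squares));
    }
}
```

The sketch is safe only because the iterations are independent. As soon as one iteration reads what another writes, explicit synchronization or a different decomposition is required.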
Pros:
- Scales well as the dataset is usually far larger than the number of available execution units (think thousands versus tens)
- Many supporting frameworks and libraries exist
- Primitives are easy to use (typically a parallel loop)
- Can be used for horizontal scalability as well
Cons:
- Only makes sense at the algorithm level
- Does not work for all algorithms
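A small sketch of the last point: when each step needs the result of the previous one, there are no independent subsets of data to hand out. The example below iterates the logistic map, chosen here purely as an illustration. The names `SequentialDemo` and `Iterate` are hypothetical.

```csharp
using System;

public static class SequentialDemo
{
    // Hypothetical helper: applies x <- r * x * (1 - x) 'steps' times.
    // Iteration n cannot start before iteration n - 1 has finished,
    // so replacing this loop with Parallel.For would give wrong results.
    public static double Iterate(double x0, double r, int steps)
    {
        double x = x0;
        for (int n = 0; n < steps; n++)
        {
            x = r * x * (1 - x);
        }
        return x;
    }
}
```

Data parallelism can still help around such a loop, for instance by running it for many independent starting values at once, but not inside a single chain of dependent steps.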
Bottom line: this is the dominant model for scientific and computationally intensive workloads.