Designing and Evaluating
Parallel Programs
Anda Iamnitchi
Federated Distributed Systems
Fall 2006
Textbook (online): Designing and Building Parallel Programs
by Ian Foster
(http://www-unix.mcs.anl.gov/dbpp/)
Parallel Machines
Flynn's Taxonomy
First proposed by Michael J. Flynn in 1966:
• SISD: single instruction, single data
• MISD: multiple instruction, single data
• SIMD: single instruction, multiple data
• MIMD: multiple instruction, multiple data
A Parallel Programming Model:
Tasks and Channels
Task operations:
• send msg
• receive msg
• create task
• terminate
In practice:
1. Message passing: MPI (see the sketch after this list)
2. Data parallelism
3. Shared memory
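The four task operations map directly onto message-passing calls. A minimal MPI sketch (not from the slides; it assumes the program is run with at least two processes) in which rank 0 sends one message over a channel to rank 1:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value = 42;
    MPI_Init(&argc, &argv);                 /* create task(s) */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send msg */
    else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* receive msg */
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();                         /* terminate */
    return 0;
}

Typically compiled with mpicc and launched with something like mpirun -np 2.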
Parallel Algorithms Examples:
Finite Difference
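A hedged sketch of the idea in MPI (hypothetical names; one step of a 1-D three-point stencil): each task owns n points plus two halo cells and swaps boundary values with its neighbors before updating.

#include <mpi.h>

void stencil_step(double *u, double *unew, int n, int rank, int size) {
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    /* u[0] and u[n+1] are halo cells; u[1..n] are the points this task owns.
       MPI_PROC_NULL turns the exchange into a no-op at the domain boundary. */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                 &u[n + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    for (int i = 1; i <= n; i++)            /* three-point update */
        unew[i] = (u[i - 1] + 2.0 * u[i] + u[i + 1]) / 4.0;
}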
Parallel Algorithms Examples:
Pairwise Interactions
Molecular dynamics: total force f_i acting on atom X_i
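A sketch of the sequential kernel (force() is a hypothetical pair-force function; positions are simplified to one dimension): the O(N^2) double loop whose iterations the parallel version distributes across tasks, for example by circulating atom data around a ring.

double force(double xi, double xj);   /* hypothetical pair-force function */

/* Total force f[i] on each atom x[i]: sum of contributions from all others */
void total_forces(int n, const double *x, double *f) {
    for (int i = 0; i < n; i++) {
        f[i] = 0.0;
        for (int j = 0; j < n; j++)
            if (j != i)
                f[i] += force(x[i], x[j]);
    }
}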
Parallel Algorithms Examples:
Search
Parallel Algorithms Examples:
Parameter Study
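A parameter study is embarrassingly parallel: each task can evaluate an independent subset of the parameter values. A minimal MPI sketch (model() is a hypothetical stand-in for the real simulation, and the parameter grid is invented):

#include <mpi.h>
#include <stdio.h>

double model(double p) { return p * p; }   /* stand-in for the real computation */

int main(int argc, char **argv) {
    int rank, size, nparams = 1000;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    double local = 0.0;
    for (int i = rank; i < nparams; i += size)   /* cyclic distribution of parameters */
        local += model(i * 0.01);
    double all;
    MPI_Reduce(&local, &all, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("combined result: %f\n", all);
    MPI_Finalize();
    return 0;
}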
Parallel Program Design
Partitioning
Domain Decomposition:
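A minimal sketch of the usual first step of domain decomposition, a block partition: a hypothetical helper that returns the half-open index range [lo, hi) owned by one task when N data points are divided as evenly as possible among size tasks.

/* Block partition: the first N % size tasks get one extra point. */
void block_range(int N, int rank, int size, int *lo, int *hi) {
    int base = N / size, rem = N % size;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);   /* half-open range [lo, hi) */
}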
Functional Decomposition:
Partitioning Design Checklist
• Does your partition define at least an order of magnitude
more tasks than there are processors in your target
computer?
• Does your partition avoid redundant computation and
storage requirements?
• Are tasks of comparable size?
• Does the number of tasks scale with problem size?
Ideally, an increase in problem size should increase the
number of tasks rather than the size of individual tasks.
• Have you identified several alternative partitions? You
can maximize flexibility in subsequent design stages by
considering alternatives now. Remember to investigate
both domain and functional decompositions.
Communication
Local
Unstructured and Dynamic
Global
Asynchronous
Communication Design Checklist
• Do all tasks perform about the same number of communication operations?
• Does each task communicate only with a small number of neighbors?
• Are communication operations able to proceed concurrently?
• Is the computation associated with different tasks able to proceed concurrently?
Agglomeration
Increasing Granularity
Replicating Computation
Sum all values and store the total on all nodes.
• array: 2(N-1) steps
• tree: 2 log N steps
Ring instead of array, in only (N-1) steps? (See the sketch below.)
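A sketch of the ring answer, assuming one value per task: every task forwards the value it currently holds and accumulates what arrives, so after N-1 steps each task holds the complete sum. The array needs 2(N-1) steps because partial sums must travel to one end and the total must travel back.

#include <mpi.h>

double ring_sum(double mine, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int next = (rank + 1) % size, prev = (rank + size - 1) % size;
    double sum = mine, incoming;
    for (int step = 0; step < size - 1; step++) {
        MPI_Sendrecv(&mine, 1, MPI_DOUBLE, next, 0,
                     &incoming, 1, MPI_DOUBLE, prev, 0,
                     comm, MPI_STATUS_IGNORE);
        mine = incoming;    /* pass each original value once around the ring */
        sum += incoming;
    }
    return sum;             /* after N-1 steps, every task holds the total */
}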
Agglomeration Design Checklist
• Has agglomeration reduced communication costs by increasing locality?
• If agglomeration has replicated computation, have you verified that the benefits of this replication outweigh its costs, for a range of problem sizes and processor counts?
• If agglomeration replicates data, have you verified that this does not compromise the scalability of your algorithm by restricting the range of problem sizes or processor counts that it can address?
• Has agglomeration yielded tasks with similar computation and communication costs?
• Does the number of tasks still scale with problem size?
• If agglomeration eliminated opportunities for concurrent execution, have you verified that there is sufficient concurrency for current and future target computers?
• Can the number of tasks be reduced still further, without introducing load imbalances, increasing software engineering costs, or reducing scalability?
• If you are parallelizing an existing sequential program, have you considered the cost of the modifications required to the sequential code?
Mapping
Recursive Bisection
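A simplified sketch of recursive bisection (1-D only, and it assumes the points are already sorted along the splitting coordinate; the full technique alternates coordinates at each level): split the point set and the processor range in half, and recurse.

/* Assign owner ranks [rank_lo..rank_hi] to sorted points [lo, hi). */
void bisect(int *owner, int lo, int hi, int rank_lo, int rank_hi) {
    if (rank_lo == rank_hi) {
        for (int i = lo; i < hi; i++)
            owner[i] = rank_lo;             /* leaf: one rank owns this slice */
        return;
    }
    int rmid = rank_lo + (rank_hi - rank_lo) / 2;
    /* Split the points proportionally to the sizes of the two rank groups. */
    int mid = lo + (int)((long long)(hi - lo) * (rmid - rank_lo + 1)
                         / (rank_hi - rank_lo + 1));
    bisect(owner, lo, mid, rank_lo, rmid);
    bisect(owner, mid, hi, rmid + 1, rank_hi);
}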
Task Scheduling
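A hedged sketch of centralized task scheduling (manager/worker): rank 0 hands out task indices on demand, and a negative index tells a worker to stop. The task body (result = task * task) is a stand-in; the single manager is exactly the potential bottleneck the checklist below asks about.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, ntasks = 100;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {                        /* manager */
        int next = 0, done = 0;
        MPI_Status st;
        while (done < size - 1) {
            int result;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);  /* a worker asks for work */
            int task = (next < ntasks) ? next++ : -1;   /* -1 = no more work */
            if (task < 0) done++;
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
    } else {                                /* worker */
        int task, result = 0;
        for (;;) {
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD); /* request work */
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (task < 0) break;
            result = task * task;           /* stand-in for real computation */
        }
    }
    MPI_Finalize();
    return 0;
}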
Mapping Design Checklist
• If considering an SPMD design for a complex problem,
have you also considered an algorithm based on
dynamic task creation and deletion?
• If considering a design based on dynamic task creation
and deletion, have you also considered an SPMD
algorithm?
• If using a centralized load-balancing scheme, have you
verified that the manager will not become a bottleneck?
• If using a dynamic load-balancing scheme, have you
evaluated the relative costs of different strategies?
• If using probabilistic or cyclic methods, do you have a
large enough number of tasks to ensure reasonable load
balance? Typically, at least ten times as many tasks as
processors are required.
Case Study: Atmosphere Model
Approaches to Performance
Evaluation
• Amdahl's Law (formulas sketched after this outline)
• Developing models:
– Execution time:
• Computation time
• Communication time
• Idle time
– Efficiency and Speedup
• Scalability analysis:
– With fixed problem size
– With scaled problem size
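In outline, the standard definitions behind these bullets (T_1 is the best sequential time, T_P the time on P processors, f the inherently serial fraction of the work):

\[
S(P) = \frac{1}{f + (1-f)/P} \le \frac{1}{f} \qquad \text{(Amdahl's Law)}
\]
\[
T = T_{\mathrm{comp}} + T_{\mathrm{comm}} + T_{\mathrm{idle}}, \qquad
S = \frac{T_1}{T_P}, \qquad E = \frac{S}{P}
\]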