CSE 574 Parallel Processing
Download
Report
Transcript CSE 574 Parallel Processing
Lecture 3:
Designing Parallel Programs
Methodological Design
Designing and Building Parallel
Programs
by Ian Foster
www-unix.mcs.anl.gov/dbpp
Methodological Design
Partitioning
Communication
Agglomeration
Mapping
Methodological Design
PROBLEM
Methodological Design
PROBLEM
Partitioning
Partitioning
The computation that is to
be performed and the data
operated on by this
computation are
decomposed into small
tasks.
Practical issues such as
the number of processors
in the target computer are
ignored, and attention is
focused on recognizing
opportunities for parallel
execution.
Partitioning
Functional Decomposition
Data Decomposition
Partitioning
Functional
Decomposition
Partitioning
Data
Decomposition
Methodological Design
Single Program Multiple Data (SPMD)
programming model
A fixed number of identical tasks are created at
program startup.
Each task executes the same program but
operates on different data.
Methodological Design
PROBLEM
Communication
Partitioning
Communication
The communication
required to coordinate task
execution is determined,
and appropriate
communication structures
and algorithms are
defined.
Communication
Latency (L): How long does it take to start sending a
"message"? (in microseconds)
Bandwidth (B): What data rate can be sustained once the
message is started? (in Mbytes/sec)
Transfer time = L + (1/B) * data size
Communication
Local Communication
Xi,j = (4 Xi,j + Xi-1,j + Xi+1,j + Xi,j-1 + Xi,j+1 )/8
Xi-1,j
Xi,j-1
Xi,j
Xi+1,j
Xi,j+1
5-point Stencil
Communication
Global Communication
n
∑X
i
i=0
∑07
∑03
∑01
X0
X1
∑47
∑23
X2
X3
∑45
X4
X5
∑67
X6
X7
Communication
Static Communication
Communicating tasks do not change over time.
Dynamic Communication
Communicating tasks are determined by data
computed at run time.
Communication
Dynamic Communication
k
1 2 1 3 4
A
10 15 20 25 30
B
10 15 10 20 25
B[i]=A[ k[i] ]
Methodological Design
PROBLEM
Partitioning
Communication
Agglomeration
Agglomeration
The task and
communication structures
defined in the first two
stages of a design are
evaluated with respect to
performance requirements
and implementation costs.
If necessary, tasks are
combined into larger tasks
to improve performance or
to reduce development
costs.
Agglomeration
Agglomeration
Agglomeration
Goals are:
Increasing Granularity (fine grain / course grain)
Decreasing communication/computation ratio
Agglomeration
Xi,j = (4 Xi,j + Xi-1,j + Xi+1,j + Xi,j-1 + Xi,j+1 )/8
Agglomeration
Xi,j = (4 Xi,j + Xi-1,j + Xi+1,j + Xi,j-1 + Xi,j+1 )/8
Agglomeration
Methodological Design
PROBLEM
Mapping
Partitioning
Communication
Agglomeration
Mapping
Each task is assigned to a
processor in a manner that
minimizes the total
execution time.
Mapping can be specified
statically or determined at
runtime by load-balancing
algorithms.
Mapping
If each task performs the same amount of computation
and communicates:
Cyclic Mapping
Mapping
The goal of mapping algorithms is to minimize total
execution time.
Mapping algorithms attempt to satisfy the competing
goals of:
maximizing processor utilization
minimizing communication costs.
Mapping is NP-complete
Mapping
Two strategies to achieve this goal:
Place tasks that are able to execute concurrently
on different processors
Place tasks that communicate frequently on the
same processor
Mapping
Consider the topology
Methodological Design
Case Study:
Atmosphere
Model
Methodological Design
Case Study:
Atmosphere
Model
Methodological Design
Case Study:
Atmosphere
Model
Methodological Design
Case Study:
Atmosphere
Model
Partitioning
Nx x Ny x Nz tasks
Methodological Design
Case Study:
Atmosphere
Model
Communication
9-point stencil in horizontal dimension
3-point stencil in vertical dimension
Methodological Design
Computations
1. Global operations
where
denotes the mass at grid point (i,j,k) .
2. Physics calculations
The total clear sky (TCS) at level k≥1 is defined as
Case Study:
Atmosphere
Model
Communication
where level 0 is the top of the atmosphere and
cldi is the cloud fraction at level i .
In total, the physics component of the model requires
on the order of 30 communications per grid point and per time step
Methodological Design
Case Study:
Atmosphere
Model
Communication
obtains data from eight other tasks
Methodological Design
Case Study:
Atmosphere
Model
Communication
only 4 communications
obtains data from eight other tasks
are required per task
Methodological Design
Case Study:
Atmosphere
Model
Vertical dimension requires communication (2 messages)
also (30 messages) for various ``physics'' computations
These communications can be avoided by
agglomerating tasks within each vertical column.
Therefore
tasks will be created.
Agglomeration
Methodological Design
Case Study:
Atmosphere
Model
Mapping
Methodological Design
Methodological Design
Agglomeration
Methodological Design
Increase Granularity
Decrease communication/computation ratio