ppt - Parallel Programming Laboratory
Download
Report
Transcript ppt - Parallel Programming Laboratory
Implementing Simplified
Molecular Dynamics Simulation
in Different Parallel Paradigms
Chao Mei
April 27th, 2006
CS498LVK
MD Simulation Recall
Time evolution of a set of interacting
atoms
Interaction between molecules
Intra: bonds and bends between atoms
Inter: interaction of point charges
Interaction can simply follow Newton’s
law (F=ma) but in certain range
Intuitive Sequential Algorithm
Input: an array of atoms
Physical attributes: position, force, velocity
etc.
Algorithm in one timestep:
for each atom A in the array
for each atom B in the array
if(distance(A,B)<=cutoff)
do force calculation for A and B
Update physical attributes of atoms
Ideas of Parallelization
Data decomposition
Chunking atom array
Distribute every chunk to processors
Atoms information be communicated
between every chunk pair
Implementation in Shared-Memory
Barrier before next time step
Simple in OpenMP
Only add compiler directives on sequential code
Automatic chunking by compiler
Not difficult in others (HPF, UPC, CAF, GAs
etc.)
Chunking done by programmer
Different code form for remote array access
Access Conflicts in Shared Data
1
2
3
P1
4
7
8
P4
Be careful! Add programming complexity
Implementations in Message-Passing
Basic scheme of one time step
for every other processor i
Send local atoms chunk processor i
for every other processor i
Receiving atoms chunk from processor i
Compute force between local chunk and received
one
Update local atoms
Barrier()
Overlapping communication/computation
Charm++ and GAs
Other Paradigms
Fit
Moderately fit
BSP
STAPL
DSM (Treadmarks, CID, CASHMERe)
Probably not fit
Cilk
Implementation Performance
200
180
160
Time (ms)
140
120
on
on
on
on
100
80
60
40
20
0
OpenMP
MPI
Charm++
1
2
4
8
procs
procs
procs
procs
Wait…think again!
The above parallelization approach is
not good for high-performance
Possible wasteful communication
Better ideas of parallelization
Space decomposition (1-away) based
on cutoff
Simulation System
Cell
Pros/Cons of Better Ideas
Reduced communication
Update in every step change atoms’
position
Re-decompose space at every step?
Do every certain amount steps
Possible imbalanced computation
every cell contains different number of
atoms
What We Needed for Implementation?
Cell arrays (each with size of cutoff)
A way to know which two cells need to
exchange data
A global table (can be indexed by two Cells)
Neighbor list associated with each cell
Load-balancing
Charm++ Perfectly Fits Better Ideas
Object-oriented programming
One-sided communication for
overlapping computation
Support for multidimensional array
Data-driven computation
Built-in load-balancing
Summary
MD simulation is not very difficult to
implement both in shared memory and
message-passing paradigms
Parallelization techniques matters
Simple one not difficult to implement
Better one motivates more features in
language
Thank You!