REC Tiruchi, Dec 2002 - CNT simulations

Computational issues in Carbon nanotube simulation
Ashok Srinivasan
Department of Computer Science
Florida State University
Outline
• Background
– Sequential computation
– Performance analysis
• Parallelization
– Shared memory
– Message passing
– Load balancing
– Communication reduction
• Research issues
Background
• Uses of Carbon nanotubes
– Materials
– NEMS
– Transistors
– Displays
– Etc
• www.ipt.arc.nasa.gov
Sequential computation
• Molecular dynamics, using Brenner’s potential
– Short-range interactions
– Neighbors can change dynamically during the course of the simulation
– Computational scheme (see the sketch below)
• Find force on each particle due to interactions with “close” neighbors
• Update position and velocity of each atom
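A minimal sketch, in C, of the computational scheme above. It uses a toy pair force as a stand-in for Brenner's potential and a simple explicit position/velocity update (the actual code uses a predictor-corrector, as the profile later shows); all names and bounds here are illustrative.

#define NMAX   1024      /* illustrative bounds */
#define MAXNBR 16

typedef struct { double r[3], v[3], f[3]; } Atom;

Atom atom[NMAX];
int  nbr[NMAX][MAXNBR];  /* neighbor list of each atom   */
int  nnbr[NMAX];         /* number of "close" neighbors  */

/* Toy pair force on atom i due to neighbor j (stand-in only). */
static void pair_force(int i, int j) {
    double d[3], r2 = 0.0;
    for (int k = 0; k < 3; k++) { d[k] = atom[i].r[k] - atom[j].r[k]; r2 += d[k] * d[k]; }
    double s = 1.0 / (r2 * r2);
    for (int k = 0; k < 3; k++) atom[i].f[k] += s * d[k];
}

void md_step(int n, double dt, double mass) {
    /* Step 1: force on each atom due to its close neighbors. */
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 3; k++) atom[i].f[k] = 0.0;
        for (int m = 0; m < nnbr[i]; m++) pair_force(i, nbr[i][m]);
    }
    /* Step 2: update velocity and position of each atom. */
    for (int i = 0; i < n; i++)
        for (int k = 0; k < 3; k++) {
            atom[i].v[k] += dt * atom[i].f[k] / mass;
            atom[i].r[k] += dt * atom[i].v[k];
        }
}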
Force computations
• Pair interactions
• Bond angles
• Dihedral
• Multibody
Performance analysis
Profile of execution time
• 1: Force
• 2: Neighbor list
• 3: Predictor/corrector
• 4: Thermostating
• 5: Miscellaneous
Profile for force computations
Parallelization
•
•
•
•
Shared memory
Message passing
Load balancing
Communication reduction
Shared memory parallelization
• Do each of the following loops in parallel (see the OpenMP sketch below)
– For each atom i
• Update forces due to atom i
• If neighboring atoms are owned by other threads, update an auxiliary array
– For each thread
• Collect force terms for atoms it owns
– Srivastava et al., SC ’97 and CSE 2001
• Simulated 10^5 to 10^7 atoms
• Speedup around 16 on 32 processors
• Include long-range forces too
Lexical decomposition
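A hedged OpenMP sketch of the two parallel loops described above (not the code from the cited papers): each thread accumulates force terms for atoms owned by other threads into its own auxiliary array, and a second loop lets each thread collect the terms for the atoms it owns. The names (fxyz, faux, owner, pair_term) and the toy pair term are illustrative, owner[] is assumed to match the static loop partition, and at most NTHR threads are assumed.

#include <omp.h>

#define NMAX   1024
#define NTHR   32
#define MAXNBR 16

int    natoms, owner[NMAX], nnbr[NMAX], nbr[NMAX][MAXNBR];
double pos[NMAX][3];
double fxyz[NMAX][3];          /* final force on each atom           */
double faux[NTHR][NMAX][3];    /* per-thread auxiliary force arrays  */

/* Toy force contribution of the (i,j) interaction, added to fi and fj. */
static void pair_term(int i, int j, double fi[3], double fj[3]) {
    for (int k = 0; k < 3; k++) {
        double d = pos[i][k] - pos[j][k];
        fi[k] += d; fj[k] -= d;
    }
}

void forces_shared_memory(void) {
    #pragma omp parallel
    {
        int t = omp_get_thread_num();

        #pragma omp for schedule(static)
        for (int i = 0; i < natoms; i++)
            for (int k = 0; k < 3; k++) fxyz[i][k] = 0.0;

        /* Loop 1: forces due to atom i; contributions to atoms owned by
         * other threads go into this thread's auxiliary array.          */
        #pragma omp for schedule(static)
        for (int i = 0; i < natoms; i++)
            for (int m = 0; m < nnbr[i]; m++) {
                int j = nbr[i][m];
                if (owner[j] == t)
                    pair_term(i, j, fxyz[i], fxyz[j]);
                else
                    pair_term(i, j, fxyz[i], faux[t][j]);
            }

        /* Loop 2: each thread collects the auxiliary terms for the atoms
         * it owns (the implicit barrier after loop 1 makes this safe).   */
        #pragma omp for schedule(static)
        for (int j = 0; j < natoms; j++)
            for (int t2 = 0; t2 < NTHR; t2++)
                for (int k = 0; k < 3; k++) {
                    fxyz[j][k] += faux[t2][j][k];
                    faux[t2][j][k] = 0.0;
                }
    }
}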
Message passing parallelization
• Decompose the domain into cells
– Each cell contains its atoms
• Assign a set of adjacent cells to each processor
• Each processor computes values for its cells, communicating with neighbors when their data is needed (see the MPI sketch below)
• Caglar and Griebel, World Scientific, 1999
– Simulated 10^8 atoms on up to 512 processors
– Linear speedup for 160,000 atoms on 64 processors
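A hedged MPI sketch of the communication pattern described above (not the Caglar-Griebel code): the cells are assumed to be split into slabs along the tube axis, and each rank exchanges the atoms in its boundary cells with its two neighboring ranks before computing forces for its own cells. All names and bounds are illustrative.

#include <mpi.h>

#define MAXB 256                 /* max atoms per boundary slab (illustrative) */

int    n_left_boundary, n_right_boundary;             /* my boundary-cell atoms    */
double left_boundary[MAXB][3], right_boundary[MAXB][3];
double ghost_left[MAXB][3], ghost_right[MAXB][3];      /* neighbors' boundary atoms */

static void compute_local_forces(void) { /* force/update loop over owned cells */ }

void exchange_and_compute(MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    /* Send my right-boundary cells to the right neighbor while receiving its
     * left-boundary cells, and vice versa: only the neighbor data actually
     * needed for the short-range forces is communicated. */
    MPI_Sendrecv(right_boundary, 3 * n_right_boundary, MPI_DOUBLE, right, 0,
                 ghost_left,     3 * MAXB,             MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(left_boundary,  3 * n_left_boundary,  MPI_DOUBLE, left,  1,
                 ghost_right,    3 * MAXB,             MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);

    /* Each processor now computes forces, positions, and velocities for the
     * atoms in its own cells, using the ghost atoms where needed. */
    compute_local_forces();
}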
Load balancing
• Atom-based decomposition
– For each atom, compute forces due to each bond, angle, and dihedral
– Load not balanced
Load balancing ... 2
• Bond-based decomposition
– For each bond, compute forces due to that bond, angles, and dihedrals
– Finer grained
– Load still not balanced!
Load balancing ... 3
• Load imbalance was not caused by granularity
– Symmetry is used to reduce calculations
• If i > j, don’t compute for bond (i,j)
• So threads get unequal load
– Change the condition to (see the sketch below)
• If i+j is even, don’t compute bond (i,j) if i > j
• If i+j is odd, don’t compute bond (i,j) if i < j
• This does not work, due to the regular structure of the nanotube
– Use a different condition to balance the load
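The two conditions above can be written as small predicates; a hedged sketch in C, where skip(i, j) returning true means the force terms for bond (i,j) are not computed in atom i's loop (function names are illustrative).

/* Original rule: skip the bond when i > j.  Every bond is still computed
 * exactly once, but atoms with small indices do most of the work, so the
 * threads owning them carry more load. */
int skip_simple(int i, int j)  { return i > j; }

/* Parity-based variant from the slide: alternate which endpoint does the
 * work depending on whether i + j is even or odd.  On a nanotube this
 * still fails to balance the load because of the tube's regular index
 * structure, so a different condition is used instead. */
int skip_parity(int i, int j) {
    if ((i + j) % 2 == 0) return i > j;   /* even sum: lower-index atom computes  */
    else                  return i < j;   /* odd sum: higher-index atom computes  */
}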
Load balancing ... 4
• Load is much better balanced now
– ... at least for this simple configuration
Locality
• Locality is important to reduce cache misses
• Current scheme is based on lexical ordering
• Alternate: decompose based on a breadth-first search traversal of the atom-interaction graph
Locality ... 2
Research issues
• Neighbor search
• Parallelization
– Dynamic load balancing
– Communication reduction
• Multi-scale simulation of nano-composites
Neighbor search
• Neighbor lists
– Crude algorithm
• Compare each pair, and determine if they are close enough
• O(N^2) for N atoms
– Cell-based algorithm (see the sketch below)
• Divide space into cells
• Place atoms in their respective cells
• Compare atoms only in neighboring cells
• Problem
– Many empty cells
– Inefficient use of memory
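A hedged sketch of the cell-based algorithm above, using the standard linked-cell technique (illustrative names and sizes, not code from the talk): atoms are binned into cells of side at least the cutoff, and each atom is compared only with atoms in its own and the 26 neighboring cells. Empty cells still take memory, which is the drawback noted above.

#define NMAX  1024
#define NC    8                    /* cells per dimension (illustrative)     */

double pos[NMAX][3];
double box = 16.0, rcut = 2.0;     /* box length and cutoff, box/NC >= rcut  */
int    head[NC][NC][NC];           /* first atom in each cell, -1 if empty   */
int    next_atom[NMAX];            /* next atom in the same cell             */

static int cell_of(double x) {     /* map a coordinate to a cell index */
    int c = (int)(x / (box / NC));
    return (c < 0) ? 0 : (c >= NC ? NC - 1 : c);
}

void build_cells(int n) {
    for (int i = 0; i < NC; i++)
        for (int j = 0; j < NC; j++)
            for (int k = 0; k < NC; k++) head[i][j][k] = -1;
    for (int a = 0; a < n; a++) {             /* place atoms in their cells */
        int i = cell_of(pos[a][0]), j = cell_of(pos[a][1]), k = cell_of(pos[a][2]);
        next_atom[a] = head[i][j][k];
        head[i][j][k] = a;
    }
}

/* Collect all neighbors of atom a within rcut by scanning only the 27
 * cells around a's cell; returns the number of neighbors found. */
int find_neighbors(int a, int nbr_out[], int max_nbr) {
    int ci = cell_of(pos[a][0]), cj = cell_of(pos[a][1]), ck = cell_of(pos[a][2]);
    int count = 0;
    for (int di = -1; di <= 1; di++)
     for (int dj = -1; dj <= 1; dj++)
      for (int dk = -1; dk <= 1; dk++) {
        int i = ci + di, j = cj + dj, k = ck + dk;
        if (i < 0 || i >= NC || j < 0 || j >= NC || k < 0 || k >= NC) continue;
        for (int b = head[i][j][k]; b != -1; b = next_atom[b]) {
            if (b == a) continue;
            double r2 = 0.0;
            for (int d = 0; d < 3; d++) {
                double dx = pos[a][d] - pos[b][d];
                r2 += dx * dx;
            }
            if (r2 < rcut * rcut && count < max_nbr) nbr_out[count++] = b;
        }
      }
    return count;
}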
Computational geometry techniques
• Orthogonal search data structures
– K-d tree (see the sketch below)
• Tree construction time: O(N log N)
• Worst case search overhead: O(N^(2/3))
• Memory: O(N)
– Range tree
• Tree construction time: O(N log^2 N)
• Worst case search overhead: O(log^2 N)
• Memory: O(N log^2 N)
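A hedged sketch of a 3-dimensional k-d tree in C (a generic textbook version, not code from the talk): recursive median splits build the tree in place over the atom array, and an orthogonal range query visits a subtree only when the query box can reach its side of the splitting plane. All names are illustrative.

#include <stdio.h>
#include <stdlib.h>

typedef struct { double x[3]; int id; } Atom;

static int cmp_dim;                         /* split dimension used by qsort */
static int cmp_atom(const void *a, const void *b) {
    double d = ((const Atom *)a)->x[cmp_dim] - ((const Atom *)b)->x[cmp_dim];
    return (d > 0) - (d < 0);
}

/* Build: sort the subarray on the current dimension and recurse on the two
 * halves around the median; the array itself stores the tree implicitly.   */
static void build(Atom *a, int n, int dim) {
    if (n <= 1) return;
    cmp_dim = dim;
    qsort(a, n, sizeof(Atom), cmp_atom);
    int mid = n / 2;
    build(a, mid, (dim + 1) % 3);
    build(a + mid + 1, n - mid - 1, (dim + 1) % 3);
}

/* Orthogonal range query: report atoms inside the box [lo, hi]. */
static void query(const Atom *a, int n, int dim,
                  const double lo[3], const double hi[3]) {
    if (n <= 0) return;
    int mid = n / 2;
    const Atom *m = &a[mid];
    int inside = 1;
    for (int d = 0; d < 3; d++)
        if (m->x[d] < lo[d] || m->x[d] > hi[d]) inside = 0;
    if (inside) printf("atom %d in range\n", m->id);
    if (lo[dim] <= m->x[dim]) query(a, mid, (dim + 1) % 3, lo, hi);
    if (hi[dim] >= m->x[dim]) query(a + mid + 1, n - mid - 1, (dim + 1) % 3, lo, hi);
}

int main(void) {
    Atom atoms[5] = {
        {{0.0, 0.0, 0.0}, 0}, {{1.4, 0.0, 0.0}, 1}, {{2.8, 0.1, 0.0}, 2},
        {{0.1, 1.2, 0.0}, 3}, {{5.0, 5.0, 5.0}, 4}
    };
    build(atoms, 5, 0);
    double lo[3] = {-0.5, -0.5, -0.5}, hi[3] = {2.0, 2.0, 2.0};
    query(atoms, 5, 0, lo, hi);               /* reports atoms 0, 1, and 3 */
    return 0;
}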
Desired properties of search techniques
• Update should be efficient
– But the number of atoms does not change
– Position changes only slightly
– The queries are known too
– May be able to use knowledge of the structure of the nanotube
– Parallelization
Parallelization
• Load balancing and locality
– Better graph based techniques
– Geometric partitioning
– Dynamic schemes
• Use structure of the tube
• Stochastic versions of
– Spectral partitioning
– Diffusive schemes
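As one illustration of the diffusive schemes mentioned above, here is a hedged sketch of a single diffusion step over a 1-D chain of processors (for example, slabs along the tube axis); the names load, new_load, and alpha are illustrative, and a small alpha such as 0.25 keeps the iteration stable.

/* One diffusive load-balancing step: each processor moves a fraction alpha
 * of the load difference toward each neighbor, so repeated steps drive the
 * loads toward the average while keeping transfers local.                  */
void diffuse_load(double load[], double new_load[], int nproc, double alpha) {
    for (int p = 0; p < nproc; p++) {
        double transfer = 0.0;
        if (p > 0)         transfer += alpha * (load[p - 1] - load[p]);
        if (p < nproc - 1) transfer += alpha * (load[p + 1] - load[p]);
        new_load[p] = load[p] + transfer;
    }
}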
Multi-scale simulation of nanocomposites
• Molecular dynamics at nano-scale
• Finite element method at larger scale
– New models to link the scales
– New algorithms to dynamically balance the loads, while minimizing communication
– Latency tolerance and scalability to large numbers of processors