ACES III and SIAL
ACES III and SIAL: technologies
for petascale computing in
chemistry and materials physics
Erik Deumens, Victor Lotrich, Mark
Ponton, Tomasz Kus, Norbert Flocke,
Ajith Perera, Rod Bartlett
AcesQC, LLC
QTP, University of Florida
Gainesville, Florida
Nov 14, 08
Outline of the talk
Performance results
Design of petascale-capable software
What can be done today?
How does SIAL work?
What makes it different?
Outlook
ACES III software
Developed under CHSSI CBD-03
Parallel for shared and distributed memory
Capabilities
Hartree-Fock (RHF, UHF)
MBPT(2) energy, gradient, Hessian
CCSD(T) energy and gradient
(DROPMO)
EOM-CC excited state energies
Two examples
Luciferin (C11H8O3S2N2)
RHF
C1 symmetry
Basis = aug-cc-pVDZ (494 basis functions)
Ncorrocc = 46 (correlated occupied orbitals)
Sucrose (C12H22O11)
RHF
C1 symmetry
Basis = 6-311G** (546 basis functions)
=91
Luciferin CCSD(T)
CCSD on 128 processors
One iteration: 23 min
Total 12 iterations: 275 min
(T)
Hardest 8 occupied orbitals: 420 min on 128 processors
Total 48 correlated orbitals: 420 min on 768 processors
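The two (T) timings can be read as a weak-scaling test: 6 times as many orbitals on 6 times as many processors in the same wall time. The sketch below makes that arithmetic explicit. It assumes, purely for illustration, that (T) work is proportional to the number of occupied orbitals processed; the slide's "hardest 8" wording shows the per-orbital cost is not uniform, so treat the result as a rough estimate. The function name is hypothetical.

```python
def weak_scaling_efficiency(work_a, procs_a, minutes_a, work_b, procs_b, minutes_b):
    """Work per processor-minute in run b relative to run a (1.0 means ideal weak scaling)."""
    rate_a = work_a / (procs_a * minutes_a)
    rate_b = work_b / (procs_b * minutes_b)
    return rate_b / rate_a


# (T) runs from the slide; "work" is approximated by the number of occupied orbitals.
eff = weak_scaling_efficiency(8, 128, 420, 48, 768, 420)
print(f"(T) weak-scaling efficiency: {eff:.2f}")  # prints 1.00

# CCSD consistency check: 12 iterations at about 23 min each.
print(f"12 x 23 min = {12 * 23} min (slide reports 275 min)")
```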
[Figure: Luciferin CCSD scaling. Minutes per iteration (12 iterations) versus number of processors (32, 64, 128, 256) for two code versions, the Jan code and the May code, compared with ideal scaling]
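The "ideal" curve in scaling plots like these is normally a reference time scaled by p_ref/p, and parallel efficiency is measured against it. A minimal sketch, assuming for illustration that the 23 min per iteration at 128 processors from the previous slide is the reference point (the transcript does not say which code version produced it); the helper names are hypothetical.

```python
def ideal_times(t_ref, p_ref, procs):
    """Perfect strong scaling: time per iteration falls as 1/p from a reference run."""
    return {p: t_ref * p_ref / p for p in procs}


def parallel_efficiency(t_ref, p_ref, p, t_measured):
    """Fraction of the ideal speedup achieved at p processors (1.0 = ideal)."""
    return (t_ref * p_ref) / (p * t_measured)


curve = ideal_times(23.0, 128, [32, 64, 128, 256])
for p, t in curve.items():
    print(f"{p:4d} processors: {t:5.1f} min/iter (ideal)")
```

The measured Jan and May points would be passed to `parallel_efficiency`; their values are not recoverable from this transcript.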
[Figure: Sucrose CCSD scaling on Cray XT4. Minutes per iteration (8 iterations) versus number of processors (256, 512, 1024, 1536, 2048) for the Sep code, compared with ideal scaling]
[Figure: (H2O)21H+ scaling, 657 basis functions, 84 correlated occupied orbitals. Minutes per iteration versus number of processors (1024, 2048, 3072, 4096) for the Sep code, compared with ideal scaling]
Outline of the talk
Performance results
Design of petascale-capable software
What can be done today?
How does SIAL work?
What makes it different?
Outlook
A computer with a single CPU
Basic data item: 64-bit number
High-level language: Fortran, C
c = a + b
Assembly language
ADD dest,src
ADD is an operation code
dest and src are registers
The ACES III parallel machine
Basic data item: a data block of 10,000 64-bit numbers -> "super number"
High-level language: being developed
Assembly language: SIAL (super instruction assembly language)
Example SIAL statement: R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L)
xaces3 -> super instruction processor
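To make the "super number" idea concrete, the sketch below spells out in plain Python what the contraction statement above computes for one set of blocks: every element of R is a sum over the contracted indices C and D. The nested-list layout and the single edge length n for every index are assumptions for illustration, not the layout xaces3 uses. With n = 10, a 4-index block holds 10^4 = 10,000 numbers, matching the block size quoted above.

```python
from itertools import product


def new_block(n, value=0.0):
    """A 4-index block with edge length n, stored as nested lists (illustrative layout)."""
    return [[[[value] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]


def contract_block(R, V, T, n):
    """One super instruction: R(I,J,K,L) += sum over C,D of V(I,J,C,D) * T(C,D,K,L)."""
    for i, j, k, l in product(range(n), repeat=4):
        acc = 0.0
        for c, d in product(range(n), repeat=2):
            acc += V[i][j][c][d] * T[c][d][k][l]
        R[i][j][k][l] += acc
    return R


n = 2
R = contract_block(new_block(n), new_block(n, 1.0), new_block(n, 2.0), n)
print(R[0][0][0][0])  # each element sums n*n = 4 products of 1.0 * 2.0 -> 8.0
```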
User-level execution flow
algo.sial -> SIAL compiler -> algo.sio
input + algo.sio -> ACES III
Coarse-grain parallelism
Executing the super instructions in a SIAL algorithm
Example: memory super instruction
GET block
Can come from local node RAM or from other node RAM
Time for data to become available differs
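The GET super instruction has two very different paths depending on where the block lives. The sketch below assumes a round-robin placement rule (block b lives on worker b mod N) and hypothetical names (`Worker`, `owner`, `get_block`); the real SIP's placement may differ. In a real run the remote path is a message exchange whose completion time varies, which is exactly the point of this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """A hypothetical worker holding the distributed-array blocks it owns."""
    rank: int
    blocks: dict = field(default_factory=dict)


def owner(block_index, n_workers):
    """Assumed round-robin placement: block b lives on worker b mod N."""
    return block_index % n_workers


def get_block(me, block_index, workers):
    """GET: return (source, data); local blocks are read directly, others are fetched."""
    src = owner(block_index, len(workers))
    data = workers[src].blocks[block_index]
    return ("local RAM" if src == me else f"RAM of worker {src}"), data


n_workers = 4
workers = [Worker(rank=r) for r in range(n_workers)]
for b in range(8):                                  # scatter 8 blocks over 4 workers
    workers[owner(b, n_workers)].blocks[b] = [float(b)] * 3

for b in (1, 2, 5):
    source, _ = get_block(1, b, workers)
    print(f"worker 1 GET block {b}: from {source}")
```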
Fine-grain parallelism
Inside super instructions
Example: compute super instructions
* (contractions)
compute_integrals
Can use multiple cores
Can use accelerators
GPGPUs and Cell processors
FPGAs (field-programmable gate arrays)
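Fine-grain parallelism means one compute super instruction is itself split across cores. The sketch below splits the rows of a small matrix product (the form a block contraction takes once its indices are fused) across a thread pool. It only illustrates the decomposition: Python threads give no speedup for pure-Python arithmetic because of the global interpreter lock, and a real compute super instruction would rely on multithreaded native code, BLAS, or an accelerator. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(A, B, rows):
    """Compute the given rows of A @ B (lists of lists)."""
    inner, ncols = len(B), len(B[0])
    return {r: [sum(A[r][k] * B[k][c] for k in range(inner)) for c in range(ncols)] for r in rows}


def parallel_matmul(A, B, n_threads=4):
    """Split the output rows of one compute super instruction across threads."""
    chunks = [range(t, len(A), n_threads) for t in range(n_threads)]
    C = [None] * len(A)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(lambda rows: matmul_rows(A, B, rows), chunks):
            for r, row in part.items():
                C[r] = row
    return C


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
print(parallel_matmul(A, B, n_threads=2))  # multiplying by the identity returns A
```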
Super instruction flow
Worker i:
GET a -> ask worker j
…
d = b*c
… wait for a?
a arrives (sent by worker j); e = a*d
…
Worker j:
…
send a -> worker i
…
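The flow above can be simulated with asyncio standing in for message passing between tasks: worker i issues the GET, computes d = b*c while the request is in flight, and waits only when a is actually needed. Blocks are reduced to single numbers, the 10 ms delay is arbitrary, and all names are hypothetical; this models the scheduling idea, not the SIP's communication layer.

```python
import asyncio


async def worker_j(mailbox, delay):
    """Owner of block a: sends it after a simulated communication delay."""
    await asyncio.sleep(delay)
    await mailbox.put(3.0)                          # block 'a', reduced to one number


async def worker_i(mailbox, log):
    b, c = 2.0, 5.0
    log.append("GET a (request sent to worker j)")
    pending = asyncio.create_task(mailbox.get())    # asynchronous: do not block yet
    d = b * c                                       # independent work overlaps the transfer
    log.append(f"d = b*c = {d}")
    a = await pending                               # wait only when 'a' is needed
    log.append("a arrives")
    e = a * d
    log.append(f"e = a*d = {e}")
    return e


async def main():
    mailbox, log = asyncio.Queue(), []
    e, _ = await asyncio.gather(worker_i(mailbox, log), worker_j(mailbox, 0.01))
    return e, log


e, log = asyncio.run(main())
print("\n".join(log))
```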
Super instruction performance
Super instructions are asynchronous
Makes execution very elastic: a worker keeps computing while it waits for data
Helps maintain consistent performance on many parallel architectures
Distributed data
N worker tasks, each with local RAM
Data distributed in the RAM of the workers
AO-based (atomic orbital): direct use of integrals
MO-based (molecular orbital): use of transformed integrals
Array blocks are spread over all workers
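A quick size estimate shows why arrays must be spread over all workers. The sketch uses the (H2O)21H+ example from the scaling slide (657 basis functions, 84 correlated occupied orbitals) and adds two assumptions of its own: 210 electrons give 105 occupied orbitals, so 657 - 105 = 552 virtual orbitals, and the T2 amplitudes T(i,j,a,b) are stored in double precision with no symmetry exploited. The helper name is hypothetical.

```python
def t2_bytes(n_occ, n_virt, bytes_per_number=8):
    """Storage for T2 amplitudes T(i,j,a,b) with no symmetry exploited."""
    return n_occ**2 * n_virt**2 * bytes_per_number


# (H2O)21H+: 657 basis functions, 84 correlated occupied orbitals (from the slide).
# Assumption: 210 electrons -> 105 occupied orbitals -> 657 - 105 = 552 virtuals.
n_bf, n_occ_total, n_corr_occ = 657, 105, 84
n_virt = n_bf - n_occ_total
total = t2_bytes(n_corr_occ, n_virt)

print(f"T2 array: {total / 1e9:.1f} GB in total")              # 17.2 GB
for n_workers in (1024, 4096):
    print(f"  spread over {n_workers} workers: {total / n_workers / 1e6:.1f} MB each")
```

At roughly 17 GB for a single amplitude array, no one node holds it comfortably, while 1024 workers need only about 17 MB each.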
Served (disk-resident) data
M server tasks
have access to local or global disk storage
accept, store, and retrieve blocks
can also compute integrals when asked
Data served to and from disk
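A server task's core job (accept, store, and retrieve blocks on disk) can be sketched with one file per block. The `BlockServer` class, its method names, and the file naming scheme are hypothetical; the sketch uses the standard `array` module to write raw doubles and does not model integral computation or message handling.

```python
import os
import tempfile
from array import array


class BlockServer:
    """Hypothetical server task: accepts, stores and retrieves blocks on disk."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, block_id):
        name = "block_" + "_".join(map(str, block_id)) + ".bin"
        return os.path.join(self.directory, name)

    def store(self, block_id, values):
        with open(self._path(block_id), "wb") as f:
            array("d", values).tofile(f)            # raw 64-bit doubles

    def retrieve(self, block_id):
        path = self._path(block_id)
        count = os.path.getsize(path) // array("d").itemsize
        data = array("d")
        with open(path, "rb") as f:
            data.fromfile(f, count)
        return list(data)


with tempfile.TemporaryDirectory() as tmp:
    server = BlockServer(tmp)
    server.store((1, 2), [0.5, 1.5, 2.5])
    print(server.retrieve((1, 2)))                  # [0.5, 1.5, 2.5]
```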
ACES III design
[Diagram: input flows through two layers to output. High level (problem): concepts, data structures, algorithms, written in the Super Instruction Assembly Language (SIAL). Low level (performance): communication, input/output, handled by the Super Instruction Processor, SIP (xaces3).]
Outline of the talk
Performance results
Design of petascale-capable software
What can be done today?
How does SIAL work?
What makes it different?
Outlook
Clear divisions
Extreme object-oriented approach
High level = problem-domain specific: concepts, data structures, algorithms
Low level = focus on performance: processor and memory speed, communication latency and bandwidth
Super Instruction Coding
Write the algorithm in the high-level super instruction assembly language
Declare (block) arrays and (block) indices
DO ... END DO construct
PARDO ... END PARDO construct
Basic operations: add, multiply, and contract
SIP_BARRIER
Each line maps to a few super instructions
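The semantics of PARDO and SIP_BARRIER can be modeled with threads: the iterations of a PARDO are divided among workers, and the barrier guarantees that every block written in one phase exists before any worker reads it in the next. This is not SIAL syntax; the static round-robin schedule, the array names, and the helper functions are hypothetical, and the SIP may schedule iterations differently.

```python
import threading
from itertools import product


def pardo_schedule(ranges, n_workers):
    """Hypothetical static schedule: the t-th index tuple goes to worker t mod N."""
    plan = {w: [] for w in range(n_workers)}
    for t, idx in enumerate(product(*ranges)):
        plan[t % n_workers].append(idx)
    return plan


def run(n_workers=3, nblk=4):
    X, Y = {}, {}                               # "distributed arrays" keyed by block (i, j)
    lock = threading.Lock()
    barrier = threading.Barrier(n_workers)      # plays the role of SIP_BARRIER
    plan = pardo_schedule([range(nblk), range(nblk)], n_workers)

    def worker(w):
        for i, j in plan[w]:                    # PARDO i, j: fill blocks of X
            with lock:
                X[(i, j)] = float(i + j)
        barrier.wait()                          # no worker continues until X is complete
        for i, j in plan[w]:                    # PARDO i, j: read the transposed block of X
            with lock:
                Y[(i, j)] = 2.0 * X[(j, i)]

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return X, Y


X, Y = run()
print(len(X), len(Y), Y[(1, 3)])                # 16 16 8.0
```

Without the barrier, a worker could try to read X[(j, i)] before the worker that owns that iteration has written it.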
Optimize and Tune
Optimize with traditional techniques
Optimize the basic contraction operations by mapping them to DGEMM calls
Create fast code to generate integrals
Optimize memory allocation by using multiple block stacks
Optimize execution and data movement
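Mapping a contraction to DGEMM works by fusing index pairs: with row-major storage, V(I,J,C,D) is already a (IJ) x (CD) matrix, T(C,D,K,L) is a (CD) x (KL) matrix, and R(I,J,K,L) is a (IJ) x (KL) matrix, so R += V T is a single matrix product. The sketch checks this against the explicit six-loop version. `gemm_accumulate` is a naive pure-Python stand-in for a BLAS DGEMM call, and the block shape is hypothetical.

```python
import random
from itertools import product


def gemm_accumulate(m, n, k, A, B, C):
    """C[m x n] += A[m x k] @ B[k x n]; row-major flat lists (stand-in for DGEMM)."""
    for i in range(m):
        for p in range(k):
            a = A[i * k + p]
            for j in range(n):
                C[i * n + j] += a * B[p * n + j]


def contract_as_gemm(R, V, T, ni, nj, nc, nd, nk, nl):
    """R(I,J,K,L) += V(I,J,C,D) T(C,D,K,L) with (I,J), (C,D), (K,L) fused."""
    gemm_accumulate(ni * nj, nk * nl, nc * nd, V, T, R)


def contract_loops(R, V, T, ni, nj, nc, nd, nk, nl):
    """Reference six-loop version of the same contraction."""
    for i, j, c, d, k, l in product(range(ni), range(nj), range(nc),
                                    range(nd), range(nk), range(nl)):
        r = ((i * nj + j) * nk + k) * nl + l
        v = ((i * nj + j) * nc + c) * nd + d
        t = ((c * nd + d) * nk + k) * nl + l
        R[r] += V[v] * T[t]


random.seed(0)
dims = (2, 3, 2, 2, 3, 2)                       # ni, nj, nc, nd, nk, nl (hypothetical)
ni, nj, nc, nd, nk, nl = dims
V = [random.random() for _ in range(ni * nj * nc * nd)]
T = [random.random() for _ in range(nc * nd * nk * nl)]
R1 = [0.0] * (ni * nj * nk * nl)
R2 = [0.0] * (ni * nj * nk * nl)
contract_as_gemm(R1, V, T, *dims)
contract_loops(R2, V, T, *dims)
print(max(abs(x - y) for x, y in zip(R1, R2)))  # difference at rounding level
```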
Programmer productivity: Other
Other tools for parallel development
Simple syntax
Specify precise data layout
UPC (Unified Parallel C)
CAF (Co-Array Fortran)
GA (Global Arrays toolkit)
DDI (Distributed Data Interface)
PGAS (partitioned global address space)
Rigorous array blocking
Programmer productivity: SIAL
SIAL has simple syntax
Exact data layout is done by the SIP
Allows runtime tuning and optimization
SIAL has a rich set of data structures
Distributed array
Served array
Temporary array
Local array
Experience shows it is more expressive
Outline of the talk
Performance results
Design of petascale-capable software
What can be done today?
How does SIAL work?
What makes it different?
Outlook
New SIAL developer tools coming
Develop a higher-level programming language
Programmer support
Eclipse as IDE (integrated development environment) for SIAL coding
Understands SIAL syntax
Code refactoring tools
Rewrite code
Help improve performance
New algorithms being explored
SIAL: Data staging
Huge served array
Copy a section into a distributed array
Work efficiently on the distributed array
Similar to BLAS-3 management of cache
ACES III: Linear scaling
Localized orbitals
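Data staging follows the same pattern as cache blocking in BLAS-3 routines: copy a section of a large, slow array into fast storage, do as much work as possible on the copy, then move to the next section. In the sketch below, nested lists stand in for the huge served array and the staged copies stand in for the distributed array; the function names and tile size are hypothetical, and a real implementation would stage whole blocks between servers and workers.

```python
def stage(served, rows, cols):
    """Copy a rectangular section of the (slow) served array into a local buffer."""
    return [[served[r][c] for c in cols] for r in rows]


def staged_matmul(A, B, tile):
    """C = A @ B computed section by section: stage tiles, then work on the copies."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        rows = range(i0, min(i0 + tile, n))
        for j0 in range(0, m, tile):
            cols = range(j0, min(j0 + tile, m))
            for p0 in range(0, k, tile):
                inner = range(p0, min(p0 + tile, k))
                a = stage(A, rows, inner)       # len(rows) x len(inner) section of A
                b = stage(B, inner, cols)       # len(inner) x len(cols) section of B
                for ii, r in enumerate(rows):
                    for jj, c in enumerate(cols):
                        C[r][c] += sum(a[ii][pp] * b[pp][jj] for pp in range(len(inner)))
    return C


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
B = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(staged_matmul(A, B, tile=2))              # [[4.0, 6.0], [12.0, 14.0], [20.0, 22.0]]
```

The result does not depend on the tile size; only the amount of data held in fast storage at once changes.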
New domains being explored
Apply the “super instruction” design pattern
Find the “super number”, the basic data item in the domain
“Super instructions” then follow
Need
A domain specialist, or a few of them
Willingness and expertise to explore alternative algorithms
Towards petascale computing
ACES III
Ready for real work
Has run on 8,192 processors
SIAL
Useful in electronic structure
Can be used in other domains