ACES III and SIAL

ACES III and SIAL: technologies for petascale computing in chemistry and materials physics
Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett
AcesQC, LLC
QTP, University of Florida
Gainesville, Florida
Nov 14, 08

Outline of the talk
  Performance results
    What can be done today?
  Design of petascale capable software
    How does SIAL work?
    What makes it different?
  Outlook

ACES III software
  Developed under CHSSI CBD-03
  Parallel for shared and distributed memory
  Capabilities
    Hartree-Fock (RHF, UHF)
    MBPT(2) energy, gradient, hessian
    CCSD(T) energy and gradient (DROPMO)
    EOM-CC excited state energies

Two examples
  Luciferin (C11H8O3S2N2)
    RHF
    C1 symmetry
    Basis = aug-cc-pVDZ (494 bf)
    Ncorrocc = 46
  Sucrose (C12H22O11)
    RHF
    C1 symmetry
    Basis = 6-311G** (546 bf)
    = 91

Luciferin CCSD(T)
  CCSD on 128 processors
    One iteration: 23 min
    Total 12 iterations: 275 min
  (T)
    Hardest 8 occupied orbitals: 420 min on 128 processors
    Total 48 correlated orbitals: 420 min on 768 processors

Luciferin CCSD scaling
  [Chart: minutes per iteration (axis 0 to 140) at 32, 64, 128 and 256 processors; 12 iterations; two code versions (Jan code, May code) plotted against ideal scaling]

Sucrose CCSD scaling
  [Chart: minutes per iteration (axis 0 to 35) at 256, 512, 1024, 1536 and 2048 processors on Cray XT4; 8 iterations; Sep code plotted against ideal scaling]

(H2O)21H+ scaling
  [Chart: minutes per iteration (axis 0 to 35) at 1024, 2048, 3072 and 4096 processors; 657 bf, 84 correlated occupied orbitals; Sep code plotted against ideal scaling]

A computer with a single CPU
  Basic data item: 64-bit number
  High level language: Fortran, C
    c=a+b
  Assembly language
    ADD dest,src
    ADD is an operation code
    dest and src are registers

The ACES III parallel machine
  Basic data item: data block of 10,000 64-bit numbers -> super number
  High level language: being developed
  Assembly language: SIAL (super instruction assembly language)
    R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L)
    xaces3 -> super instruction processor

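The SIAL statement R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L) acts on whole blocks: the repeated indices C and D are summed, and the result is accumulated into the R block. A minimal Python sketch of one such block operation follows. The names `SEG`, `make_block` and `contract_block` are hypothetical, and the segment length is kept at 2 for readability; a four-index block with segments of length 10 would hold the 10,000 numbers quoted above, though the source does not state the segment length ACES III uses.

```python
import itertools

SEG = 2  # segment length per index (assumption; kept tiny for readability)

def make_block(fill):
    """Build a SEG x SEG x SEG x SEG block as a dict keyed by four indices."""
    return {idx: fill(*idx) for idx in itertools.product(range(SEG), repeat=4)}

def contract_block(r, v, t):
    """Accumulate r[i,j,k,l] += sum over c,d of v[i,j,c,d] * t[c,d,k,l]."""
    for i, j, k, l in itertools.product(range(SEG), repeat=4):
        r[i, j, k, l] += sum(
            v[i, j, c, d] * t[c, d, k, l]
            for c, d in itertools.product(range(SEG), repeat=2)
        )
    return r

v = make_block(lambda i, j, c, d: 1.0)
t = make_block(lambda c, d, k, l: 2.0)
r = make_block(lambda i, j, k, l: 0.0)
contract_block(r, v, t)
```

Every element of `r` receives SEG * SEG products of 1.0 and 2.0. The point of the sketch is that the caller issues one operation per block rather than one per number.
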
User level execution flow
  algo.sial -> SIAL compiler -> algo.sio
  input, algo.sio -> ACES III

Coarse grain parallelism
  Executing super instructions in SIAL algorithm
  Example: memory super instruction
    GET block
    Can be from
      Local node RAM
      Other node RAM
    Time for data to become available differs

Fine grain parallelism
  Inside super instructions
  Example: Compute super instruction
    * (contractions)
    compute_integrals
  Can use multiple cores
  Can use accelerators
    GPGPUs and Cell processors
    FPGAs (field programmable gate arrays)

Super instruction flow
  Worker i              Worker j
  GET a -> ask j        …
  …                     <- send a
  d=b*c                 …
  … wait for a?         …
  a arrives <-          …
  e=a*d                 …
  …                     …

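The flow above, where worker i requests block a, keeps computing, and waits only when a is needed, can be sketched in Python with a future. This is an illustration under stated assumptions, not the SIP implementation: scalars stand in for blocks, a thread stands in for worker j, and `send_block`, `worker_i` and the 0.05 s delay are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_block(name, store, delay):
    """Worker j's side: look up the requested block and return it after a delay."""
    time.sleep(delay)  # stands in for network latency (assumed value)
    return store[name]

def worker_i(executor, remote_store):
    events = []
    # GET a: issue the request and continue without waiting.
    pending_a = executor.submit(send_block, "a", remote_store, 0.05)
    events.append("GET a issued")
    # d = b * c uses only local data, so it overlaps with the transfer.
    b, c = 3.0, 4.0
    d = b * c
    events.append("d computed")
    # Block only at the point where a is actually needed.
    a = pending_a.result()
    events.append("a arrived")
    e = a * d
    events.append("e computed")
    return e, events

with ThreadPoolExecutor(max_workers=1) as pool:
    e, events = worker_i(pool, {"a": 2.0})
```

The wait for a is deferred until the last possible moment, so any latency shorter than the time spent on d=b*c is hidden.
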
Super instruction performance
  Super instructions are asynchronous
  Makes execution very elastic
  Helps maintain consistent performance on many parallel architectures

Distributed data
  N worker tasks, each with local RAM
  Data distributed in RAM of workers
    AO-based: direct use of integrals
    MO-based: use transformed integrals
    Array blocks are spread over all workers

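One simple way to spread array blocks over N workers is a round-robin assignment on the linearized block index. The source does not say which mapping SIP uses, so the Python sketch below is an assumption chosen for illustration; `owner`, `N_WORKERS` and `BLOCKS_PER_DIM` are hypothetical names.

```python
import itertools

def owner(block_index, blocks_per_dim, n_workers):
    """Map a multi-dimensional block index to a worker rank (round-robin)."""
    linear = 0
    for idx in block_index:
        linear = linear * blocks_per_dim + idx
    return linear % n_workers

N_WORKERS = 4
BLOCKS_PER_DIM = 3

# Record which blocks of a two-index array each worker would hold in its RAM.
layout = {rank: [] for rank in range(N_WORKERS)}
for block in itertools.product(range(BLOCKS_PER_DIM), repeat=2):
    layout[owner(block, BLOCKS_PER_DIM, N_WORKERS)].append(block)
```

Because any worker can evaluate `owner` locally, a GET for a block can be routed to the right worker without a central directory.
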
Served (disk resident) data
  M server tasks
    have access to local or global disk storage
    accept, store and retrieve blocks
    also can compute integrals when asked
  Data served to and from disk

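The accept, store and retrieve role of a server task can be sketched as a small class that keeps each block in its own file. This is a toy stand-in, not the ACES III server: `BlockServer`, its `put` and `get` methods, the file naming and the array name "T2" are all hypothetical.

```python
import os
import pickle
import tempfile

class BlockServer:
    """Toy stand-in for a server task: keeps each block in its own file on disk."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, array_name, block_index):
        tag = "_".join(str(i) for i in block_index)
        return os.path.join(self.directory, f"{array_name}_{tag}.blk")

    def put(self, array_name, block_index, block):
        """Accept a block from a worker and store it on disk."""
        with open(self._path(array_name, block_index), "wb") as fh:
            pickle.dump(block, fh)

    def get(self, array_name, block_index):
        """Retrieve a previously stored block from disk."""
        with open(self._path(array_name, block_index), "rb") as fh:
            return pickle.load(fh)

with tempfile.TemporaryDirectory() as scratch:
    server = BlockServer(scratch)
    server.put("T2", (0, 1), [1.0, 2.0, 3.0])
    restored = server.get("T2", (0, 1))
```

Workers see only `put` and `get`, so whether the disk is local or global is a detail hidden behind the server.
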
ACES III design
  High level (problem): concepts, data structures, algorithms
    Super Instruction Assembly Language (SIAL)
  Low level (performance): communication, input/output
    Super Instruction Processor, SIP (xaces3)
  input -> output

Clear divisions
  Extreme object oriented approach
  High level = problem domain specific
    Concepts
    Data structures
    Algorithms
  Low level = focus on performance
    Processor and memory speed
    Communication latency and bandwidth

Super Instruction Coding
  Write algorithm in high level super instruction assembly language
    Declare (block) arrays, (block) indices
    DO - END DO construct
    PARDO - END PARDO construct
    Basic operations: add, multiply and contract
    SIP_BARRIER
    Each line maps to a few super instructions

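The control structure listed above, a parallel PARDO loop with a sequential DO loop inside and a barrier afterwards, can be mimicked in Python. The sketch is not SIAL syntax and not how SIP schedules work: a thread pool stands in for the workers, scalars stand in for blocks, and `NB` and `pardo_body` are hypothetical names.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

NB = 2  # number of blocks per index (assumption)

# Scalars stand in for blocks so the control flow stays visible.
A = {(i, k): float(i + k + 1) for i in range(NB) for k in range(NB)}
B = {(k, j): float(k * j + 1) for k in range(NB) for j in range(NB)}

def pardo_body(ij):
    """One PARDO iteration: owns output block (i, j), runs a sequential DO over k."""
    i, j = ij
    acc = 0.0
    for k in range(NB):  # DO k ... END DO
        acc += A[i, k] * B[k, j]
    return ij, acc

# PARDO i, j ... END PARDO: iterations are independent and handed to workers.
with ThreadPoolExecutor(max_workers=4) as workers:
    C = dict(workers.map(pardo_body, itertools.product(range(NB), repeat=2)))
# Leaving the with-block waits for all workers, the role SIP_BARRIER plays here.
```

Each PARDO iteration writes a different output block, so the iterations need no coordination until the barrier.
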
Optimize and Tune
  Optimize with traditional techniques
    optimize the basic contraction operations by mapping them to DGEMM calls
    create fast code to generate integrals
    optimize memory allocation by using multiple block stacks
    optimize execution and data movement

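Mapping a contraction to a DGEMM call relies on fusing index pairs: V(i,j,c,d) is viewed as a matrix with row (i,j) and column (c,d), T(c,d,k,l) as a matrix with row (c,d) and column (k,l), and their matrix product is the contracted block. The Python sketch below checks that equivalence on a tiny case. The plain `matmul` function is a stand-in for a tuned DGEMM, and all names and fill values are hypothetical.

```python
import itertools

N = 2  # segment length of every index (assumption, kept tiny)
PAIRS = list(itertools.product(range(N), repeat=2))  # fused index order

# Four-index blocks with arbitrary integer-valued entries.
V4 = {(i, j, c, d): float(i + 2 * j + 3 * c + 4 * d)
      for i, j, c, d in itertools.product(range(N), repeat=4)}
T4 = {(c, d, k, l): float(c - d + k * l)
      for c, d, k, l in itertools.product(range(N), repeat=4)}

def matmul(a, b):
    """Plain matrix product; a tuned DGEMM would replace this in practice."""
    return [[sum(a[r][m] * b[m][c] for m in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

# Fuse (i,j) into rows and (c,d) into columns, and likewise for T.
Vm = [[V4[i, j, c, d] for (c, d) in PAIRS] for (i, j) in PAIRS]
Tm = [[T4[c, d, k, l] for (k, l) in PAIRS] for (c, d) in PAIRS]
Rm = matmul(Vm, Tm)

def direct(i, j, k, l):
    """Reference: the four-index contraction written out explicitly."""
    return sum(V4[i, j, c, d] * T4[c, d, k, l] for (c, d) in PAIRS)

matches = all(
    Rm[PAIRS.index((i, j))][PAIRS.index((k, l))] == direct(i, j, k, l)
    for (i, j) in PAIRS for (k, l) in PAIRS
)
```

When the summed indices are not adjacent in storage, the block has to be permuted before the matrix product; the sketch leaves that step out.
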
Programmer productivity: Other
  Other tools for parallel development
    UPC (Unified Parallel C)
    CAF (Co-Array Fortran)
    GA (Global Arrays Toolkit)
    DDI (Distributed Data Interface)
    PGAS (partitioned global address space)
  Simple syntax
  Specify precise data layout
    Rigorous array blocking

Programmer productivity: SIAL
  SIAL has simple syntax
    Experience shows it is more expressive
  Exact data layout is done by SIP
    Allows runtime tuning and optimization
  SIAL has a rich set of data structures
    Distributed array
    Served array
    Temporary array
    Local array

New SIAL developer tools coming
  Develop higher level programming language
  Programmer support
    Eclipse as IDE (integrated development environment) for SIAL coding
    Understands SIAL syntax
    Code refactoring tools
      Rewrite code
      Help improve performance

New algorithms being explored
  SIAL: Data staging
    Huge served array
    Copy section into distributed array
    Work efficiently on distributed array
    Similar to BLAS-3 management of cache
  ACES III: Linear scaling
    Localized orbitals

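The data staging idea can be sketched as follows: a large array lives on disk, one section at a time is copied into memory, and all arithmetic touches only the in-memory copy. This Python illustration uses a temporary file of doubles in place of a served array and a plain in-memory array in place of a distributed array; `stage_sections`, `TOTAL` and `SECTION` are hypothetical.

```python
import array
import os
import tempfile

def stage_sections(path, total, section):
    """Yield consecutive sections of a disk-resident array of doubles."""
    with open(path, "rb") as fh:
        for start in range(0, total, section):
            count = min(section, total - start)
            staged = array.array("d")
            staged.fromfile(fh, count)  # copy one section into RAM
            yield staged

TOTAL, SECTION = 1000, 128  # illustrative sizes (assumptions)

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Create the disk-resident array: the values 0.0, 1.0, ..., 999.0.
    with open(path, "wb") as fh:
        array.array("d", (float(i) for i in range(TOTAL))).tofile(fh)
    # All work happens on the staged in-memory section, never on the file.
    total_sum = sum(sum(x * x for x in sec)
                    for sec in stage_sections(path, TOTAL, SECTION))
finally:
    os.remove(path)
```

As with cache blocking in BLAS-3 routines, the benefit comes from doing as much work as possible on a section while it is resident in the faster level of storage.
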
New domains being explored
  Need
    A domain specialist, or a few of them
    Willingness and expertise to explore alternative algorithms
  Apply “super instruction” design pattern
    Find “super number”, the basic data item in the domain
    “Super instructions” then follow

Towards petascale computing
  ACES III
    Ready for real work
    Has run on 8,192 processors
  SIAL
    Useful in electronic structure
    Can be used in other domains