Transcript ppt

Efficient Deflation-Based Preconditioning
for the
Communication-Avoiding
Conjugate Gradient Method
Erin Carson
Nicholas Knight, James Demmel
Univ. of California, Berkeley
Monday, March 16, SIAM CSE 2015, Salt Lake City, Utah
Why Avoid “Communication”?
• Algorithms have two costs: computation and communication
• Communication : moving data between levels of memory hierarchy
(sequential), between processors (parallel)
sequential
parallel
CPU
cache
CPU
DRAM
CPU
DRAM
DRAM
CPU
DRAM
CPU
DRAM
• On today’s computers, communication is expensive, computation is
cheap, in terms of both time and energy
• Need to redesign algorithms to avoid communication!
2
Future Exascale Systems
Petascale
Predicted Exascale
Systems (2009)
Systems
System Peak
Factor
Improvement
~1000
Node Memory
Bandwidth
25 GB/s
0.4-4 TB/s
~10-100
Total Node Interconnect
Bandwidth
3.5 GB/s
100-400 GB/s
~100
100 ns
50 ns
~1
Memory Latency
Interconnect Latency
*Sources:
~1
from P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)
• Gaps between communication/computation cost only growing larger in
future systems
• Avoiding communication will be essential for applications at exascale!
3
Krylov Solvers: Limited by Communication
Orthogonalize
 Inner products
Parallel: global reduction (Allreduce)
Sequential: multiple reads/writes to slow memory
SpMV
orthogonalize
Dependencies between communication-bound kernels
in each iteration limit performance!
4
Example: Classical Conjugate Gradient (CG)
SpMV
Inner products
5
Communication-avoiding Krylov methods
• Communication-avoiding Krylov subspace methods (CA-KSMs) can
asymptotically reduce parallel latency
• First known reference: (Van Rosendale, 1983); lots of work by
Chronopoulos on “s-step” methods
• Many methods and variations created since; see Hoemmen’s 2010
PhD thesis for thorough overview
6
Outer Loop
7
• No communication required!
8
Example: CA-Conjugate Gradient
via CA Matrix
Powers Kernel
Local computations
within inner loop require
no communication!
9
Deflation
• Deflation: technique to increase convergence rate
• Krylov solvers very efficient for reducing high frequency modes but
slow to smooth out low frequency ones
• Idea: solve in a separate manner high freq. and low freq. errors
• Explicitly solve the linear system in the known eigenspace, use
CG to solve for the remaining invariant subspace
• Deflated CG first due to Nicolaides (1987)
• Used in many applications
• Pressure-Poisson equation within an incompressible flow solver
(e.g., blood flow): (Mut, Aubry, Löhner, and Cebral, 2010)
• Bubbly flow problems (Tang and Vuik, 2007)
• Magnetostatic FEM problems (De Gersem and Hameyer, 2001)
• Structural dynamics (Perotti and Simoncini, 2003)
10
Deflation
Can deflation techniques be applied to CA-CG while maintaining
asymptotic reduction in communication cost?
11
Deflated CG Algorithm (Saad et al., 2000)
SpMVs and dot products
required in each inner
loop, as in CG
New
computations
due to deflation
12
Avoiding Communication in Deflation
13
Outer Loop
Construct s-step bases for Krylov
subspaces (Nearest neighbor
communication)
14
Deflated CA-CG
Local operation,
requires no
communication
15
Computation and Communication Complexity
Flops
Words moved
Messages
CG
CA-CG
Deflated
CG
Deflated
CA-CG
16
Rough Performance Model
Modeled speedup per
iteration of CA vs. classical
method on model problem
(2D 5pt stencil)
Not the whole story…
17
Is This Efficient in Practice?
18
Extensions: 2-Level Preconditioning
• Which preconditioners will still allow communication-avoiding
techniques is another question
• No communication for diagonal scaling…
19
A Potential Application
• (Vuik, Segal, and Meijerink, 1999) Oil drilling: determine fluid pressures
to predict presence of oil and gas in reservoirs
• Solve time-dependent diffusion equation using finite elements
• Underground consists of layers with very large differences in
permeability
• Coefficient matrix very ill-conditioned
Earth surface
• Many small eigenvalues in the
matrix, but with diagonal
preconditioning:
# of extreme eigenvalues =
number of layers with a high
permeability (sandstone)
sandstone
shale
sandstone
shale
sandstone
shale
sandstone
20
Conclusions and Future Work
• Summary
• Deflation can be implemented in CA-CG in a way that still avoids
communication
• Nontrivial tradeoffs between speed per iteration and convergence
rate for different methods – requires further study and applicationspecific analysis
• Related work
• Deflated restarting in CA-GMRES variant (Wakam and Erhel, 2013)
• Deflation in CA-GMRES (Yamazaki, Tomov, and Dongarra, 2014)
• Deflated pipelined CG (Ghysels, Vanroose, and Meerbergen, 2014)
• Future Work & Extensions
• Performance studies and applications
• Further exploration of 2-level CA preconditioning strategies
• Solving (slowly-changing) series of linear systems (recycling Krylov
subspaces (Parks et al., 2006))
21
Thank you!
email: [email protected]