Parallel Solution of the Poisson Problem Using MPI
Jason W. DeGraw
Department of Mechanical and Nuclear Engineering
The Pennsylvania State University
Introduction
• The Poisson equation arises frequently in many areas of
mechanical engineering:
– Heat transfer (Heat conduction with internal heat generation)
– Solid mechanics (Beam torsion)
– Fluid mechanics (Potential flow)
• There are several reasons to use the Poisson equation as a test problem:
– It is (relatively) easy to solve
– Many different exact solutions are available for comparison
A Basic Poisson Problem
Our first case is the simple two dimensional Poisson equation on a
square:
$u_{xx} + u_{yy} = 1 \quad \text{in } \Omega = [-1,1] \times [-1,1]$
$u = 0 \quad \text{on } \partial\Omega$
(The domain is the square with corners at (-1,-1) and (1,1).)
Finite Difference Approach
• The standard second order difference equation is:
$\frac{u_{i,j-1} + u_{i,j+1} - 4u_{i,j} + u_{i-1,j} + u_{i+1,j}}{h^2} = 1$
• This formulation leads to a system of equations that is
– symmetric
– banded
– sparse
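Solving this difference equation for $u_{i,j}$ gives the point-update form used by the iterative methods discussed below (a standard rearrangement, stated here for reference):

$u_{i,j} = \frac{1}{4}\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - h^2\right)$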
Solving the System
• The system may be solved in many ways:
– Direct methods (Gaussian elimination)
– Simple iterative methods (Jacobi, Gauss-Seidel, etc.)
– Advanced iterative methods (CG, GMRES)
• We will only consider iterative methods here, as we would like to
test ways to solve big problems. We will use
– Jacobi
– Gauss-Seidel
– Conjugate gradient
Jacobi and Gauss-Seidel
• The Jacobi and Gauss-Seidel iterative methods are easy to understand and easy to implement (see the sketch below).
• Some advantages:
– No explicit storage of the matrix is required
– The methods are fairly robust and reliable
• Some disadvantages
– Really slow (Gauss-Seidel)
– REALLY slow (Jacobi)
• Relaxation methods (which are related) are usually better, but
can be less reliable.
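A minimal sketch of one Jacobi sweep for the five-point stencil above, in the same FORTRAN 77 style as the code later in the talk (the array names u and unew and the grid size N are illustrative assumptions, not taken from the original slides):

c     One Jacobi sweep for u_xx + u_yy = 1 on an (N+2)x(N+2) grid
c     (boundary values of u are fixed at zero; unew receives the
c     updated interior values)
      do j = 2, N+1
         do i = 2, N+1
            unew(i,j) = 0.25*( u(i-1,j) + u(i+1,j)
     &                       + u(i,j-1) + u(i,j+1) - h*h )
         enddo
      enddo

Gauss-Seidel differs only in that updated values are used as soon as they are available (overwriting u in place), which is why it converges faster than Jacobi.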
The Conjugate Gradient Method
• CG is a much more powerful way to solve the problem.
• Some advantages:
– Easy to program (compared to other advanced methods)
– Fast (theoretical convergence in N steps for an N by N system)
• Some disadvantages:
– Explicit representation of the matrix is probably necessary
– Applies only to SPD matrices (not an issue here)
• Note: In this particular case, preconditioning is not an issue. In
more general problems, preconditioning must be addressed.
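For reference, the unpreconditioned CG iteration can be sketched as follows. Here matvec and dotp stand in for the matrix-vector product and dot product routines discussed on the next slide; all names, the tolerance tol, and the iteration limit maxit are illustrative assumptions, not the code actually used for these results.

c     Conjugate gradient sketch for A*x = b, with A symmetric
c     positive definite and ndof unknowns
      call matvec(x, ap)
      do i = 1, ndof
         r(i) = b(i) - ap(i)
         p(i) = r(i)
      enddo
      rsold = dotp(ndof, r, r)
      do iter = 1, maxit
         call matvec(p, ap)
         alpha = rsold/dotp(ndof, p, ap)
         do i = 1, ndof
            x(i) = x(i) + alpha*p(i)
            r(i) = r(i) - alpha*ap(i)
         enddo
         rsnew = dotp(ndof, r, r)
         if (sqrt(rsnew) .lt. tol) goto 10
         beta = rsnew/rsold
         do i = 1, ndof
            p(i) = r(i) + beta*p(i)
         enddo
         rsold = rsnew
      enddo
   10 continue

Each iteration therefore costs one matrix-vector product, two dot products, and three vector updates, which is why the next slide focuses on those two kernels.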
CG (cont’d)
• As discussed in class, the CG algorithm requires two types of
matrix-vector operations:
– Vector dot product
– Matrix-vector product
• The matrix-vector product is more expensive, so we must be
careful how we implement it.
• Also keep in mind that both operations will require
communication.
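As an illustration of the communication pattern, a distributed dot product can be formed from a local partial sum followed by a global reduction. This is a sketch assuming the vectors are already block-distributed across processes; the routine name pardot is illustrative.

      subroutine pardot(nloc, x, y, global)
c     Dot product of two distributed vectors: each process sums
c     its nloc local entries, then MPI_ALLREDUCE combines the
c     partial sums so every process receives the global result.
      include 'mpif.h'
      integer nloc, i, ierr
      double precision x(nloc), y(nloc), xylocal, global
      xylocal = 0.0d0
      do i = 1, nloc
         xylocal = xylocal + x(i)*y(i)
      enddo
      call MPI_ALLREDUCE(xylocal, global, 1,
     &     MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)
      return
      end

The matrix-vector product additionally requires exchanging values owned by neighboring processes, which is why it is the more expensive of the two operations.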
Matrix Storage
• We could try and take advantage of the banded nature of the
system, but a more general solution is the adoption of a sparse
matrix storage strategy.
• Complicated structures are possible, but we will use a very
simple one:
• Each row of the matrix is stored in compressed form using three arrays:
– As(i,k): the k-th nonzero value in row i
– j(i,k): the column index of that value
– n(i): the number of nonzeros in row i
[Slide figure: a small example matrix A and its corresponding As, j, and n arrays]
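A small illustrative example of this storage (hypothetical values, not the matrix shown on the original slide):

          | 4 -1  0 |      n(1) = 2,  j(1,1:2) = 1, 2,     As(1,1:2) =  4, -1
      A = |-1  4 -1 |      n(2) = 3,  j(2,1:3) = 1, 2, 3,  As(2,1:3) = -1,  4, -1
          | 0 -1  4 |      n(3) = 2,  j(3,1:2) = 2, 3,     As(3,1:2) = -1,  4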
Matrix Storage (Cont’d)
• The previous example is pretty trivial, but consider the following:
– An N by N grid leads to an N^2 by N^2 system
– Each row in the system has no more than 5 nonzero entries
• Example: N=64
– Full storage: 64^4 = 16777216
– Sparse storage: 64^2 * 5 * 3 = 61440
– “Sparseness” ratio (sparse / full): 0.00366
• For large N, storing the full matrix is incredibly wasteful.
• This form is easy to use in a matrix-vector multiply routine.
Sparse Matrix Multiplication
• The obligatory chunk of FORTRAN 77 code:
c     Compute y = A*x using the row-compressed storage (As, j, n)
      do i = 1, M
         y(i) = 0.0
         do k = 1, n(i)
            y(i) = y(i) + As(i,k)*x(j(i,k))
         enddo
      enddo
Limitations of Finite Differences
• Unfortunately, it is not easy to use finite differences in complex
geometries.
• While it is possible to formulate curvilinear finite difference
methods, the resulting equations are usually pretty nasty.
• The simplicity that makes finite differences attractive in simple
geometries disappears in more complicated geometries.
Finite Element Methods
• The finite element method, while more complicated than finite
difference methods, easily extends to complex geometries.
• A simple (and short) description of the finite element method is
not easy to give. Here goes anyway:
PDE → Weak Form → Matrix System
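For completeness, the weak form referred to here is the standard one for a Poisson problem (this statement does not appear explicitly on the slides): multiply $-\nabla^2 u = f$ by a test function $v$ that vanishes on the boundary and integrate by parts, giving

$\int_\Omega \nabla u \cdot \nabla v \, d\Omega = \int_\Omega f\, v \, d\Omega \quad \text{for all admissible } v$

The Galerkin finite element discretization of this statement is what produces the matrix system.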
Finite Element Methods (cont’d)
• The following steps usually take place:
1. Load mesh and boundary conditions
2. Compute element stiffness matrices
3. Assemble global stiffness matrix and right-hand side
4. Solve system of equations
5. Output results
• Where can we improve things?
– Steps 1 and 5 are somewhat fixed
– Step 4 – use parallel solution techniques
– Avoid step 3 entirely – use an element by element solution technique
An Element by Element Algorithm
• The assembly process can be considered as a summation:
$K = \sum_e K^e$
• Then:
$Kx = \left(\sum_e K^e\right)x = K^{e_1}x + K^{e_2}x + \cdots$
• We can compute each element stiffness matrix and store it.
Computing the matrix-vector product is then done via the above
element-by-element (EBE) formula.
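A minimal sketch of the EBE matrix-vector product under these assumptions (the names nel, nen, the stored element stiffnesses ke, and the element connectivity array conn are illustrative, not taken from the original slides):

c     Element-by-element product y = K*x without assembling K:
c     gather the element's entries of x, multiply by the stored
c     element stiffness, and scatter-add the result into y
      do i = 1, ndof
         y(i) = 0.0
      enddo
      do iel = 1, nel
         do ia = 1, nen
            yloc = 0.0
            do ib = 1, nen
               yloc = yloc + ke(ia,ib,iel)*x(conn(ib,iel))
            enddo
            y(conn(ia,iel)) = y(conn(ia,iel)) + yloc
         enddo
      enddo

In parallel, the elements can be distributed across processes, and the scatter-add then requires communication only for nodes shared between processes.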
[Results figures: Computation Time (two slides), Communication Ratio, Speedup, Parallel Efficiencies, MFLOPS, Solution]
A Potential Flow Problem
• A simple test problem for which there is an “exact” solution is
potential flow over a wavy wall
$\phi_{xx} + \phi_{yy} = 0$
$\partial\phi/\partial n = U$ on the inflow boundary, with $U$ = 100 ft/s
$\partial\phi/\partial n = 0$ on the remaining boundaries
Wavy wall: $y = 0.05\sin(x)$
A Potential Flow Problem (cont’d)
• The exact solution is
$\phi = U\left[x + 0.05\sin(x)\,e^{-y}\right]$
• This problem is easily solved with
finite element methods
• The computed solution is accurate to within 1%
Conclusions
• CG is a “better” iterative method than Jacobi or Gauss-Seidel.
• Finite element methods, while more complicated than finite
difference methods, are more geometrically flexible.
• Finite element methods are generally more difficult to implement
and the setup process is particularly expensive.
• The EBE technique reduced the communication ratio, but took
longer to actually produce results.
Conclusions (cont’d)
• Clearly, communication issues are important when using CG-type methods.
• Implementing a fully parallel finite element scheme is quite
difficult. Some issues that were not addressed here:
– Distributed assembly
– Better mesh distribution schemes (i.e., fewer broadcasts)
• Good results were obtained using both FEM and finite difference
methods.