CS 484/584 Transcript
Iterative Methods
Gaussian elimination is considered to
be a direct method to solve a system.
An indirect (iterative) method produces
a sequence of values that converges to
the solution of the system.
Computation is halted in an indirect
algorithm when a specified accuracy is
reached.
Why Iterative Methods?
Sometimes we don't need to be exact.
– Input has inaccuracy, etc.
– Only requires a few iterations
If the system is sparse, the matrix can
be stored in a compact format that keeps
only the nonzero entries (see the CSR
sketch after this list).
Iterative methods are usually stable.
– Errors are dampened
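As a sketch of one such compact format (the struct and field names are illustrative, not from the lecture), compressed sparse row (CSR) storage keeps only the nonzeros:

/* Hypothetical sketch of compressed sparse row (CSR) storage
   for an n x n sparse matrix: only nonzeros are stored. */
typedef struct {
    int    n;        /* number of rows                           */
    int   *row_ptr;  /* row i's nonzeros are entries row_ptr[i]
                        .. row_ptr[i+1]-1 of col_idx and val     */
    int   *col_idx;  /* column index of each stored nonzero      */
    float *val;      /* value of each stored nonzero             */
} csr_matrix;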
Iterative Methods
Consider the following system.
$$\begin{bmatrix} 7 & -6 \\ -8 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}$$
Now write out the equations
– solve for the ith unknown in the ith equation
$$x_1 = \tfrac{6}{7}\,x_2 + \tfrac{3}{7}$$
$$x_2 = \tfrac{8}{9}\,x_1 - \tfrac{4}{9}$$
Iterative Methods
Come up with an initial guess for each xi.
Hopefully, the equations will produce
better values, which can be used to
calculate still better values, and so on,
converging to the answer.
Iterative Methods
Jacobi iteration
– Use all old values to compute new values
 k      x1        x2
 0    0.00000   0.00000
10    0.14865  -0.19820
20    0.18682  -0.24908
30    0.19662  -0.26215
40    0.19913  -0.26551
50    0.19977  -0.26637
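A minimal C driver (not from the lecture) that runs Jacobi on the 2x2 example above and prints every 10th iterate; it should reproduce the progression in the table:

#include <stdio.h>

int main(void)
{
    double x1 = 0.0, x2 = 0.0;              /* initial guess */
    for (int k = 1; k <= 50; k++) {
        /* Jacobi: right-hand sides use only the OLD values */
        double new_x1 = 6.0 / 7.0 * x2 + 3.0 / 7.0;
        double new_x2 = 8.0 / 9.0 * x1 - 4.0 / 9.0;
        x1 = new_x1;
        x2 = new_x2;
        if (k % 10 == 0)
            printf("k=%2d  x1=%8.5f  x2=%8.5f\n", k, x1, x2);
    }
    return 0;
}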
Jacobi Iteration
The ith equation has the form
$$\sum_{j=0}^{n-1} A[i,j]\,x[j] = b[i]$$
which can be rewritten as
$$x[i] = \frac{1}{A[i,i]}\left(b[i] - \sum_{j \ne i} A[i,j]\,x[j]\right)$$
Jacobi Iteration
The vector (b - Ax) is zero if and only
if x is the exact answer.
Define this vector to be the residual r
Now rewrite the solution equation
$$x_k[i] = \frac{r[i]}{A[i,i]} + x_{k-1}[i]$$
void Jacobi(float A[][], float b[], float x[], float epsilon)
{
    float x1[];                        // next iterate
    float r[]; float r0norm;           // residual and its initial norm
    // Start from an initial guess (e.g., random) stored in x
    r = b - Ax;                        // matrix-vector mult. etc.
    r0norm = ||r||2;                   // this is basically the magnitude
    while (||r||2 > epsilon * r0norm)
    {
        for (j = 0; j < n; j++)
            x1[j] = r[j] / A[j,j] + x[j];
        x = x1;                        // update x BEFORE recomputing r
        r = b - Ax;
    }
}
Parallelization of Jacobi
3 main computations per iteration:
– Inner product (2-norm of the residual)
– Loop calculating the x[j]s
– Matrix-vector mult. to calculate r
If A[j,j], b[j], and r[j] are on the same
proc., the x[j] loop requires no
communication.
The inner product and the matrix-vector
mult. require communication.
Inner Product (2-norm of residual)
Suppose data is distributed row-wise.
The squared 2-norm is simply the dot
product of the residual with itself
– IP = Sum(r[j] * r[j]); the 2-norm is sqrt(IP)
[Figure: the rows of A and the matching
entries of x and b split across P0, P1,
P2, and P3]
This only requires a global sum collapse
– O(log p)
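A sketch of that global sum collapse, assuming MPI and that each process holds local_n entries of the residual; MPI_Allreduce combines the partial sums in O(log p) steps:

#include <math.h>
#include <mpi.h>

/* Sketch: 2-norm of the residual when each process owns local_n
   entries of r; the partial dot products are combined with a
   global sum collapse. */
float residual_norm2(const float *r_local, int local_n, MPI_Comm comm)
{
    float local_sum = 0.0f, total = 0.0f;
    for (int j = 0; j < local_n; j++)
        local_sum += r_local[j] * r_local[j];   /* local part of r . r */
    MPI_Allreduce(&local_sum, &total, 1, MPI_FLOAT, MPI_SUM, comm);
    return sqrtf(total);                        /* 2-norm = sqrt(r . r) */
}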
Matrix-Vector Multiplication
Again, the data is distributed row-wise.
Each proc. requires all of the elements
of the vector to calculate its part of the
resulting answer (see the sketch below).
This results in an all-to-all gather
– O(p log p)
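A sketch of this step, again assuming MPI; the function name, the counts/displs arrays, and the scratch vector x_full are illustrative, not from the lecture:

#include <mpi.h>

/* Sketch: row-wise matrix-vector multiply. Each process owns
   local_n rows of A (row-major, n columns) and the matching slice
   of x; counts[]/displs[] describe every process's slice, and
   x_full is scratch space of length n. */
void matvec_rowwise(const float *A_local, const float *x_local,
                    float *y_local, int local_n, int n,
                    const int *counts, const int *displs,
                    float *x_full, MPI_Comm comm)
{
    /* Every process needs all of x: gather all slices everywhere */
    MPI_Allgatherv(x_local, local_n, MPI_FLOAT,
                   x_full, counts, displs, MPI_FLOAT, comm);
    for (int i = 0; i < local_n; i++) {          /* local rows only */
        float sum = 0.0f;
        for (int j = 0; j < n; j++)
            sum += A_local[i * n + j] * x_full[j];
        y_local[i] = sum;
    }
}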
Jacobi Iteration
Resulting communication cost with
floats (4 bytes):
– Tcomm = #iterations * (TIP + TMVM)
– TIP = log p * (ts + tw * 4)
– TMVM = p log p * (ts + tw * nrows/p * 4)
Iterative Methods
Gauss-Seidel
– Use the new values as soon as available
 k      x1        x2
 0    0.00000   0.00000
10    0.21977  -0.24909
20    0.20130  -0.26531
30    0.20008  -0.26659
40    0.20000  -0.26666
Gauss-Seidel Iteration
The basic Gauss-Seidel iteration is
$$x_k[i] = \frac{1}{A[i,i]}\left(b[i] - \sum_{j=0}^{i-1} A[i,j]\,x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\,x_{k-1}[j]\right)$$
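A minimal serial C sketch of one sweep of this iteration over a dense, row-major matrix (an illustration, not the lecture's code); updating x in place means entries j < i already hold the new iterate x_k while entries j > i still hold x_{k-1}:

/* One Gauss-Seidel sweep over a dense n x n system A x = b. */
void gauss_seidel_sweep(int n, const float *A, const float *b, float *x)
{
    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        for (int j = 0; j < n; j++)
            if (j != i)
                sum += A[i * n + j] * x[j];   /* j < i: new, j > i: old */
        x[i] = (b[i] - sum) / A[i * n + i];
    }
}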
Gauss-Seidel Iteration
Rule: Always use the newest values of
the previously computed variables.
Problem: Sequential?
Gauss-Seidel is indeed sequential if the
matrix is dense.
Parallelism is a function of the sparsity
and ordering of the equation matrix.
Gauss-Seidel Iteration
We can increase the available parallelism
by changing the numbering (ordering) of
the unknowns in the system.
Parallelizing Red-Black GSI
Partitioning?
Block checkerboard.
Communication?
2 phases per iteration.
1- compute red cells using values from black cells
2- compute black cells using values from red cells
Communication is required for each phase.
Partitioning
[Figure: block-checkerboard partitioning
of the grid among P0, P1, P2, and P3]
Communication
[Figure: neighboring partitions P0–P3
exchange boundary cell values each phase]
Procedure Gauss-SeidelRedBlack
    while ( error > limit )
        send black values to neighbors
        recv black values from neighbors
        compute red values
        send red values to neighbors
        recv red values from neighbors
        compute black values
        compute error   /* only do every so often */
    endwhile
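As a concrete (hypothetical) instance of the two compute phases, here is a serial red-black sweep for a 2D Laplace problem with a 5-point stencil; the grid size N and the equation itself are assumptions, not fixed by the lecture:

#define N 64    /* interior grid size; illustrative value */

/* One red-black iteration on an (N+2) x (N+2) grid u with fixed
   boundary values. Cell (i,j) is red when (i+j) is even, black
   when odd; each phase reads only the other color's values. */
void red_black_iteration(float u[N + 2][N + 2])
{
    for (int color = 0; color < 2; color++)       /* 0 = red, 1 = black */
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                if ((i + j) % 2 == color)         /* this phase's cells */
                    u[i][j] = 0.25f * (u[i-1][j] + u[i+1][j]
                                     + u[i][j-1] + u[i][j+1]);
}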
Extending Red-Black Coloring
Goal: Produce a graph coloring scheme
such that no node has a neighbor of the
same color.
Simple finite element and finite
difference methods produce graphs in
which each node has only 4 neighbors.
– Two colors suffice
What about more complex graphs?
More complex graphs
Use graph coloring heuristics, then
number the nodes one color at a time
(a greedy sketch follows below).
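A sketch of one such heuristic (all names illustrative): greedy coloring gives every node the smallest color not already used by any of its neighbors, so no node ends up with a neighbor of the same color:

/* Greedy graph coloring: nodes 0..n-1, adj[v] lists the deg[v]
   neighbors of node v, color[v] receives v's color. */
void greedy_color(int n, int **adj, const int *deg, int *color)
{
    for (int v = 0; v < n; v++)
        color[v] = -1;                      /* uncolored */
    for (int v = 0; v < n; v++) {
        int c = 0, clash = 1;
        while (clash) {
            clash = 0;
            for (int k = 0; k < deg[v]; k++)
                if (color[adj[v][k]] == c) {
                    clash = 1;              /* neighbor has c; try c+1 */
                    c++;
                    break;
                }
        }
        color[v] = c;
    }
}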
Successive Overrelaxation
Devised to speed up the convergence of
Gauss-Seidel
– apply extrapolation using a weighted
average between current & previous iterates
Choose a weighting that will accelerate
the rate of convergence.
SOR
Gauss-Seidel iteration
$$x_k[i] = \frac{1}{A[i,i]}\left(b[i] - \sum_{j=0}^{i-1} A[i,j]\,x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\,x_{k-1}[j]\right)$$
Don’t compute directly into x.
Compute a weighted average instead:
$$\partial[i] = \frac{1}{A[i,i]}\left(b[i] - \sum_{j=0}^{i-1} A[i,j]\,x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\,x_{k-1}[j]\right)$$
$$x_k[i] = x_{k-1}[i] + \omega\left(\partial[i] - x_{k-1}[i]\right)$$
SOR
SOR Equation
$$x_k[i] = x_{k-1}[i] + \omega\left(\partial[i] - x_{k-1}[i]\right)$$
Choose ω in the range (0, 2)
– technically, ω < 1 is underrelaxation
– if you choose ω ≥ 2 it won’t converge
– if you choose ω incorrectly it won’t
converge
SOR Algorithm
Choose an initial guess for x[i]
for k = 1, 2, ...
    for i = 1, 2, ..., n
        ∂ = 0
        for j = 1, 2, ..., i-1
            ∂ = ∂ + A[i,j] * xk[j]
        end
        for j = i+1, ..., n
            ∂ = ∂ + A[i,j] * xk-1[j]
        end
        ∂ = (b[i] - ∂) / A[i,i]
        xk[i] = xk-1[i] + ω * (∂ - xk-1[i])
    end
    check for convergence
end
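A minimal serial C sketch of one sweep of this algorithm over a dense, row-major matrix (illustrative, not the lecture's code):

/* One SOR sweep over a dense n x n system A x = b. */
void sor_sweep(int n, const float *A, const float *b, float *x,
               float omega)
{
    for (int i = 0; i < n; i++) {
        float d = 0.0f;                     /* the ∂ accumulator above */
        for (int j = 0; j < n; j++)
            if (j != i)
                d += A[i * n + j] * x[j];   /* j < i: new, j > i: old */
        d = (b[i] - d) / A[i * n + i];      /* Gauss-Seidel value */
        x[i] = x[i] + omega * (d - x[i]);   /* weighted average */
    }
}

With omega = 1 this reduces to plain Gauss-Seidel.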
Parallelizing SOR
Just like Gauss-Seidel
Create the dependency graph
– Color
– Number by color
Phases
– Communicate nodes of previous phase
– Compute nodes of this phase
– Move to next phase
Conclusion
Iterative methods are used when an
exact answer is not computable or
needed.
Gauss-Seidel converges faster than
Jacobi, but its parallelism is trickier.
Finite element codes ultimately boil
down to systems of equations
– Solve with either Jacobi or Gauss-Seidel
Consider
Are systems of equations and finite
element methods related?