
CS 484
Iterative Methods
Gaussian elimination is considered to be a direct method for solving a system.
An indirect (iterative) method produces a sequence of values that converges to the solution of the system.
Computation is halted in an indirect algorithm when a specified accuracy is reached.

Why Iterative Methods?

Sometimes we don't need to be exact.
– Input has inaccuracy, etc.
– Only a few iterations may be required
If the system is sparse, the matrix can be stored in a compact format.
Iterative methods are usually stable.
– Errors are dampened
Iterative Methods

Consider the following system:

$$\begin{bmatrix} 7 & -6 \\ -8 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}$$

Now write out the equations
– solve for the ith unknown in the ith equation:

$$x_1 = \tfrac{6}{7}\, x_2 + \tfrac{3}{7}, \qquad x_2 = \tfrac{8}{9}\, x_1 - \tfrac{4}{9}$$
Iterative Methods

Come up with an initial guess for each xi.

Hopefully, the equations will produce better values, which can in turn be used to calculate still better values, and so on, converging to the answer.
Iterative Methods

Jacobi iteration
– Use all old values to compute new values
k      x1         x2
0      0.00000    0.00000
10     0.14865   -0.19820
20     0.18682   -0.24908
30     0.19662   -0.26215
40     0.19913   -0.26551
50     0.19977   -0.26637
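
As a sanity check, here is a minimal C sketch (not from the slides; the variable names are illustrative) that runs plain Jacobi on the 2x2 system above from the initial guess (0, 0). Printing every 10th iterate reproduces the table values.

#include <stdio.h>

int main(void)
{
    /* System: 7*x1 - 6*x2 = 3,  -8*x1 + 9*x2 = -4 */
    double x1 = 0.0, x2 = 0.0;   /* initial guess */

    for (int k = 0; k <= 50; k++) {
        if (k % 10 == 0)
            printf("k=%2d  x1=%8.5f  x2=%8.5f\n", k, x1, x2);
        /* Jacobi: compute both new values from the old values only */
        double n1 = (6.0 * x2 + 3.0) / 7.0;
        double n2 = (8.0 * x1 - 4.0) / 9.0;
        x1 = n1;
        x2 = n2;
    }
    return 0;
}

The iterates approach the exact solution x1 = 3/15 = 0.2, x2 = -4/15 ≈ -0.26667.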
Jacobi Iteration

The ith equation has the form

$$\sum_{j=0}^{n-1} A[i,j]\, x[j] = b[i]$$

which can be rewritten as:

$$x[i] = \frac{1}{A[i,i]} \left( b[i] - \sum_{j \neq i} A[i,j]\, x[j] \right)$$

Jacobi Iteration
The vector (b - Ax) is zero if and only if we have the exact answer.
Define this vector to be the residual r.
Now rewrite the solution equation:

$$x_k[i] = \frac{r_{k-1}[i]}{A[i,i]} + x_{k-1}[i]$$
void Jacobi(float A[][], float b[], float x[], float epsilon)
{
    float x1[];                       // next iterate
    float r[];                        // residual vector
    float r0norm;

    // Randomly select an initial x vector
    r = b - Ax;                       // involves a matrix-vector mult., etc.
    r0norm = ||r||2;                  // 2-norm: basically the magnitude
    while (||r||2 > epsilon * r0norm)
    {
        for (j = 0; j < n; j++)
            x1[j] = r[j] / A[j,j] + x[j];
        x = x1;                       // adopt the new iterate
        r = b - Ax;                   // recompute residual with the new x
    }
}
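
For concreteness, a runnable serial C version of this pseudocode; the dense row-major layout, the max_iters guard, and the helper loops are assumptions for illustration, not part of the slides.

#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Minimal serial Jacobi sketch: A is n x n, dense, row-major.  */
/* Assumes A has nonzero diagonal entries.                      */
void jacobi(int n, const float *A, const float *b,
            float *x, float epsilon, int max_iters)
{
    float *x1 = malloc(n * sizeof *x1);   /* next iterate */
    float *r  = malloc(n * sizeof *r);    /* residual b - A*x */

    /* initial residual and its norm for the stopping test */
    float r0norm = 0.0f, rnorm;
    for (int i = 0; i < n; i++) {
        float ax = 0.0f;
        for (int j = 0; j < n; j++)
            ax += A[i * n + j] * x[j];
        r[i] = b[i] - ax;
        r0norm += r[i] * r[i];
    }
    r0norm = sqrtf(r0norm);
    rnorm = r0norm;

    for (int k = 0; k < max_iters && rnorm > epsilon * r0norm; k++) {
        for (int j = 0; j < n; j++)
            x1[j] = r[j] / A[j * n + j] + x[j];
        memcpy(x, x1, n * sizeof *x);     /* adopt the new iterate */

        /* recompute the residual with the new iterate */
        rnorm = 0.0f;
        for (int i = 0; i < n; i++) {
            float ax = 0.0f;
            for (int j = 0; j < n; j++)
                ax += A[i * n + j] * x[j];
            r[i] = b[i] - ax;
            rnorm += r[i] * r[i];
        }
        rnorm = sqrtf(rnorm);
    }
    free(x1);
    free(r);
}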
Parallelization of Jacobi

3 main computations per iteration:
– Inner product (2-norm)
– Loop calculating the x[j]s
– Matrix-vector mult. to calculate r
If A[j,j], b[j], & r[j] are on the same proc.:
– The loop requires no communication
– The inner product and matrix-vector mult. require communication
Inner Product (2 norm of residual)


Suppose the data is distributed row-wise.
The inner product (2-norm) is simply a dot product:
– IP = Sum(x[j] * x[j])

[Figure: A x = b, with the rows of A and the corresponding entries of x and b divided among P0-P3]
This only requires a global sum collapse
– O(log p)
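
In MPI terms, the global sum collapse is one reduction. A sketch, assuming each process holds n_local entries of the residual (the function and variable names are illustrative):

#include <mpi.h>
#include <math.h>

/* Each process sums the squares of its local residual entries,  */
/* then one MPI_Allreduce combines them: O(log p) communication. */
float residual_norm(const float *r_local, int n_local)
{
    float local = 0.0f, global = 0.0f;
    for (int j = 0; j < n_local; j++)
        local += r_local[j] * r_local[j];
    MPI_Allreduce(&local, &global, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    return sqrtf(global);
}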
Matrix-Vector Multiplication
Again, data is distributed row-wise.
Each proc. requires all of the elements of the vector to calculate its part of the resulting answer.
This results in an all-gather.
– O(p log p)
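
A sketch of that gather with MPI_Allgather, assuming the rows divide evenly among processes (the names and layout are illustrative):

#include <mpi.h>

/* Assemble the full iterate, then form the local rows of r = b - A*x. */
/* x_local: this process's nrows_local entries; x_full: all n entries. */
void local_residual(int n, int nrows_local,
                    const float *A_local, const float *b_local,
                    const float *x_local, float *x_full, float *r_local)
{
    /* All-gather: every process contributes its block of x
       and receives everyone else's. */
    MPI_Allgather(x_local, nrows_local, MPI_FLOAT,
                  x_full, nrows_local, MPI_FLOAT, MPI_COMM_WORLD);

    for (int i = 0; i < nrows_local; i++) {
        float ax = 0.0f;
        for (int j = 0; j < n; j++)
            ax += A_local[i * n + j] * x_full[j];
        r_local[i] = b_local[i] - ax;
    }
}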
Jacobi Iteration

Resulting cost for float (4 bytes)
– T_comm = #iterations * (T_IP + T_MVM)
– T_IP = log p * (ts + tw * 4)
– T_MVM = p log p * (ts + tw * (nrows/p) * 4)
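
As a rough illustration (these parameter values are invented, not from the slides): with p = 4, nrows = 1024, ts = 100, and tw = 1 in some time unit, T_IP = log2(4) * (100 + 1*4) = 208, while T_MVM = 4*log2(4) * (100 + 1*(1024/4)*4) = 8 * 1124 = 8992 per iteration, so the all-gather for the matrix-vector multiply dominates the communication cost.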
Iterative Methods

Gauss-Seidel
– Use the new values as soon as available
k      x1         x2
0      0.00000    0.00000
10     0.21977   -0.24909
20     0.20130   -0.26531
30     0.20008   -0.26659
40     0.20000   -0.26666
Gauss-Seidel Iteration

The basic Gauss-Seidel iteration is
$$x_k[i] = \frac{1}{A[i,i]} \left( b[i] - \sum_{j=0}^{i-1} A[i,j]\, x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\, x_{k-1}[j] \right)$$
Gauss-Seidel Iteration
Rule: Always use the newest values of the previously computed variables.
Problem: Isn't this sequential?
Gauss-Seidel is indeed sequential if the matrix is dense.
Parallelism is a function of the sparsity and ordering of the equation matrix.
Gauss-Seidel Iteration

We can increase the possible parallelism by changing the numbering of the system. In a red-black (checkerboard) ordering of a grid, each red cell has only black neighbors and vice versa, so all cells of one color can be updated in parallel.
Parallelizing Red-Black GSI

Partitioning?
– Block checkerboard
Communication?
– 2 phases per iteration:
1. Compute red cells using values from black cells
2. Compute black cells using values from red cells
– Communication is required for each phase
Partitioning

[Figure: block-checkerboard partitioning of the grid among P0-P3]

Communication

[Figure: exchange of boundary cell values between neighboring processes P0-P3]
Procedure Gauss-SeidelRedBlack
    while ( error > limit )
        send black values to neighbors
        recv black values from neighbors
        compute red values
        send red values to neighbors
        recv red values from neighbors
        compute black values
        compute error        /* only do every so often */
    endwhile
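
For intuition, a serial sketch of one red-black sweep for a 4-neighbor (5-point) stencil, assuming a Laplace-style averaging update on an (n+2) x (n+2) array with a boundary ring; the details are illustrative:

/* One red-black Gauss-Seidel sweep on an n x n interior grid.      */
/* Cells with (i + j) even are "red"; (i + j) odd are "black".      */
void red_black_sweep(int n, double u[n + 2][n + 2])
{
    /* Phase 1: update red cells; they read only black neighbors. */
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            if ((i + j) % 2 == 0)
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                  u[i][j - 1] + u[i][j + 1]);

    /* Phase 2: update black cells with the fresh red values. */
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            if ((i + j) % 2 == 1)
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                  u[i][j - 1] + u[i][j + 1]);
}

In the parallel version, the send/recv of the opposite color's boundary values happens before each compute phase, exactly as in the pseudocode above.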
Extending Red-Black Coloring
Goal: Produce a graph coloring scheme such that no node has a neighbor of the same color.
Simple finite element and finite difference methods produce graphs with only 4 neighbors.
– Two colors suffice
What about more complex graphs?
More complex graphs
Use graph coloring heuristics.
Number the nodes one color at a time.
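
A common heuristic is greedy coloring: visit the nodes in some order and give each one the smallest color not used by any neighbor. A hedged C sketch, assuming an adjacency-list layout:

/* Greedy graph coloring: adj[i] lists the neighbors of node i, */
/* deg[i] is its neighbor count. Colors are 0, 1, 2, ...        */
/* Returns the number of colors used.                           */
int greedy_color(int n, int **adj, const int *deg, int *color)
{
    int used = 0;
    for (int i = 0; i < n; i++)
        color[i] = -1;                    /* uncolored */

    for (int i = 0; i < n; i++) {
        /* Find the smallest color not taken by a neighbor. */
        int c = 0;
        for (int again = 1; again; ) {
            again = 0;
            for (int k = 0; k < deg[i]; k++) {
                if (color[adj[i][k]] == c) {
                    c++;                  /* conflict: try the next color */
                    again = 1;
                    break;
                }
            }
        }
        color[i] = c;
        if (c + 1 > used)
            used = c + 1;
    }
    return used;
}

Nodes that share a color are mutually independent, so the iteration can proceed in one parallel phase per color.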

Successive Overrelaxation

Devised to speed up the convergence of Gauss-Seidel
– apply extrapolation using a weighted average between current & previous iterates
Choose a weighting that will accelerate the rate of convergence.
SOR

Gauss-Seidel iteration
$$x_k[i] = \frac{1}{A[i,i]} \left( b[i] - \sum_{j=0}^{i-1} A[i,j]\, x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\, x_{k-1}[j] \right)$$
Don't compute directly into x.
Instead, compute a weighted average:
$$\partial[i] = \frac{1}{A[i,i]} \left( b[i] - \sum_{j=0}^{i-1} A[i,j]\, x_k[j] - \sum_{j=i+1}^{n-1} A[i,j]\, x_{k-1}[j] \right)$$

$$x_k[i] = x_{k-1}[i] + \omega \left( \partial[i] - x_{k-1}[i] \right)$$

SOR

SOR Equation

$$x_k[i] = x_{k-1}[i] + \omega \left( \partial[i] - x_{k-1}[i] \right)$$

Choose ω in the range (0, 2)
– technically, ω < 1 is underrelaxation
– if you choose ω > 2 it won't converge
– if you choose ω poorly it converges slowly, or not at all
SOR Algorithm
Choose an initial guess for x[i]
for k = 1, 2, ...
    for i = 1, 2, ..., n
        ∂ = 0
        for j = 1, 2, ..., i-1
            ∂ = ∂ + a[i,j] * xk[j]
        end
        for j = i+1, ..., n
            ∂ = ∂ + a[i,j] * xk-1[j]
        end
        ∂ = (b[i] - ∂) / a[i,i]
        xk[i] = xk-1[i] + ω * (∂ - xk-1[i])
    end
    check for convergence
end
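
A direct C translation of this algorithm might look like the following; the in-place update (entries below i already hold the new sweep's values) and the convergence test are illustrative assumptions:

#include <math.h>

/* SOR sweeps on a dense n x n row-major system A*x = b.          */
/* x is updated in place: x[j] for j < i already holds x_k[j],    */
/* while x[j] for j > i still holds x_{k-1}[j].                   */
void sor(int n, const double *A, const double *b,
         double *x, double omega, double tol, int max_iters)
{
    for (int k = 0; k < max_iters; k++) {
        double change = 0.0;              /* largest update this sweep */
        for (int i = 0; i < n; i++) {
            double d = 0.0;
            for (int j = 0; j < i; j++)
                d += A[i * n + j] * x[j];         /* new values */
            for (int j = i + 1; j < n; j++)
                d += A[i * n + j] * x[j];         /* old values */
            d = (b[i] - d) / A[i * n + i];
            double xi = x[i] + omega * (d - x[i]);
            if (fabs(xi - x[i]) > change)
                change = fabs(xi - x[i]);
            x[i] = xi;
        }
        if (change < tol)                 /* check for convergence */
            break;
    }
}

With omega = 1.0 this reduces to Gauss-Seidel; values of omega somewhat above 1 typically accelerate convergence.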
Parallelizing SOR
Just like Gauss-Seidel:
Create the dependency graph
– Color
– Number by color
Phases
– Communicate nodes of previous phase
– Compute nodes of this phase
– Move to next phase
Conclusion
Iterative methods are used when an exact answer is not computable or not needed.
Gauss-Seidel converges faster than Jacobi, but parallelism is trickier.
Finite element codes are simply systems of equations.
– Solve with either Jacobi or Gauss-Seidel
Consider

Are systems of equations and finite
element methods related?