Observation on Parallel Computation of Transitive and Max-closure Problems
Motivation
The TC problem has numerous applications in many areas of computer science.
There is a lack of coarse-grained algorithms for distributed environments with slow communication.
Decreasing the number of dependences in a solution could improve the performance of the algorithm.
Slide 2
What is transitive closure?
GENERIC TRANSITIVE CLOSURE PROBLEM (TC)
Input: a matrix A with elements from a semiring S = ⟨⊕, ⊗⟩
Output: the matrix A*, where A*(i,j) is the sum of all simple paths from i to j

Instances of ⟨⊕, ⊗⟩ and the corresponding TC problem:
⟨OR, AND⟩: Boolean closure, i.e. the TC of a directed graph
⟨MIN, +⟩: all-pairs shortest paths
⟨MIN, MAX⟩: minimum spanning tree {all (i,j): A(i,j) = A*(i,j)}
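As a concrete illustration, here is a minimal sketch (not taken from the slides) of the generic closure computed Warshall-style over a caller-supplied semiring; the function name `closure` and the example matrices are illustrative choices.
----------------------------------------------------------------------
# A minimal sketch (not from the slides): the generic TC computed
# Warshall-style over a caller-supplied semiring (plus, times).

def closure(a, plus, times):
    """Return A* for the n x n matrix `a` over the semiring (plus, times)."""
    n = len(a)
    c = [row[:] for row in a]                      # work on a copy of A
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # elementary Operation(i, k, j)
                c[i][j] = plus(c[i][j], times(c[i][k], c[k][j]))
    return c

# <OR, AND>: Boolean closure, i.e. reachability in a directed graph
adj = [[1, 1, 0],
       [0, 1, 1],
       [0, 0, 1]]
print(closure(adj, lambda x, y: x or y, lambda x, y: x and y))

# <MIN, +>: all-pairs shortest paths
INF = float("inf")
w = [[0, 3, INF],
     [INF, 0, 4],
     [1, INF, 0]]
print(closure(w, min, lambda x, y: x + y))
----------------------------------------------------------------------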
Slide 3
Fine-grained and coarse-grained algorithms for the TC problem
Warshall algorithm
(1 stage)
Leighton algorithm
(2 stages)
Guibas-Kung-Thompson (GKT) algorithm
(2 or 3 stages)
Partial Warshall algorithm
(2 stages)
Slide 4
Warshall algorithm
for k := 1 to n do
  for all 1 ≤ i, j ≤ n parallel do
    Operation(i, k, j)
----------------------------------
Operation(i, k, j): a(i,j) := a(i,j) ⊕ a(i,k) ⊗ a(k,j)
----------------------------------
[Figure: matrix diagram with rows and columns k, k+1, k+2; entries marked X are updated using entries marked Y]
Slide 5
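To illustrate the "parallel do" step, here is a minimal sketch (not the authors' implementation) over the Boolean semiring: for a fixed k all Operation(i, k, j) updates are independent, so they can be expressed as one vectorized update. The NumPy-based formulation and the function name are illustrative.
----------------------------------------------------------------------
# A minimal sketch (not the authors' code): Boolean Warshall where, for each
# fixed k, all Operation(i, k, j) updates are performed "in parallel" as one
# vectorized step.
import numpy as np

def warshall_boolean(adj):
    """Boolean transitive closure; `adj` is an n x n 0/1 NumPy array."""
    a = adj.astype(bool).copy()
    n = a.shape[0]
    for k in range(n):
        # a(i,j) := a(i,j) OR (a(i,k) AND a(k,j)) for all i, j at once
        a |= np.outer(a[:, k], a[k, :])
    return a.astype(int)
----------------------------------------------------------------------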
[Figure: numerical example]
Slide 6
Coarse-grained computations
[Figure: the matrix partitioned into blocks A(I,J), e.g. A11, A24, A32, each of size n x n]
Slide 7
Naïve Coarse-Grained Algorithms
Actual =
[Figure: 6 x 6 Boolean example matrix]
Slide 8
Actual =
[Figure: 6 x 6 Boolean example matrix and the corresponding graph on nodes 1-6, with parts labelled I and II]
Slide 9
Coarse-grained Warshall algorithm
Algorithm Blocks-Warshall
for K := 1 to N do
  A(K,K) := A*(K,K)
  for all 1 ≤ I, J ≤ N, I ≠ K ≠ J parallel do
    Block-Operation(K,K,J) and Block-Operation(I,K,K)
  for all 1 ≤ I, J ≤ N parallel do
    Block-Operation(I,K,J)
----------------------------------------------------------------------
Block-Operation(I, K, J): A(I,J) := A(I,J) ⊕ A(I,K) ⊗ A(K,K) ⊗ A(K,J)
----------------------------------------------------------------------
Slide 10
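A possible rendering of Blocks-Warshall over the Boolean semiring, written as a sequential sketch (not the authors' code); the helper names bmm and star, the block-view convention, and the NumPy representation are all illustrative choices, and the sketch assumes A* includes the empty path (identity on the diagonal).
----------------------------------------------------------------------
# A sequential sketch (not the authors' code) of Blocks-Warshall over the
# Boolean semiring. The matrix is an (N*n) x (N*n) array split into N x N
# blocks of size n x n; the "parallel do" loops are written sequentially.
import numpy as np

def bmm(x, y):
    """Boolean matrix product: entry (i,j) is the OR over t of x(i,t) AND y(t,j)."""
    return (x.astype(int) @ y.astype(int)) > 0

def star(block):
    """Boolean closure A* of one block (fine-grained Warshall, diagonal included)."""
    a = block.copy()
    np.fill_diagonal(a, True)                  # assume A* contains the empty path
    for k in range(a.shape[0]):
        a |= np.outer(a[:, k], a[k, :])
    return a

def blocks_warshall(a, n):
    """Boolean Blocks-Warshall on an (N*n) x (N*n) boolean array `a`, in place."""
    N = a.shape[0] // n
    B = lambda I, J: a[I*n:(I+1)*n, J*n:(J+1)*n]        # view of block A(I,J)
    for K in range(N):
        B(K, K)[:] = star(B(K, K))                      # A(K,K) := A*(K,K)
        for I in range(N):
            if I != K:                                  # pivot row and column first
                B(K, I)[:] |= bmm(B(K, K), B(K, I))     # Block-Operation(K,K,I)
                B(I, K)[:] |= bmm(B(I, K), B(K, K))     # Block-Operation(I,K,K)
        for I in range(N):                              # then all blocks
            for J in range(N):
                # Block-Operation(I,K,J): A(I,J) |= A(I,K) (x) A(K,K) (x) A(K,J)
                B(I, J)[:] |= bmm(bmm(B(I, K), B(K, K)), B(K, J))
    return a
----------------------------------------------------------------------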
Implementation of the Warshall TC Algorithm
[Figure: block matrix diagram for step k]
The implementation in terms of multiplication of submatrices.
Slide 11
[Figure: the example graph on nodes 1-6 with newly added edges; parts labelled I and II]
Slide 12
Decomposition properties
In order to package elementary operations into
computationally independent groups we consider
the following decomposition properties:
A min-path from i to j is a path whose intermediate nodes have numbers smaller than min(i,j).
A max-path from i to j is a path whose intermediate nodes have numbers smaller than max(i,j).
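A small illustration (not from the slides) of the two definitions as predicates over a sequence of node numbers; the example path is made up for illustration.
----------------------------------------------------------------------
# Illustrative predicates (not from the slides) for the two path classes;
# `path` is a sequence of node numbers such as [5, 6, 7].

def is_min_path(path):
    """All intermediate nodes are smaller than min(first, last)."""
    bound = min(path[0], path[-1])
    return all(v < bound for v in path[1:-1])

def is_max_path(path):
    """All intermediate nodes are smaller than max(first, last)."""
    bound = max(path[0], path[-1])
    return all(v < bound for v in path[1:-1])

print(is_max_path([5, 6, 7]))   # True:  6 < max(5, 7) = 7
print(is_min_path([5, 6, 7]))   # False: 6 is not smaller than min(5, 7) = 5
----------------------------------------------------------------------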
Slide 13
Slide 14
GKT algorithm
[Figure: the matrix partitioned into submatrices A, B, C and their results A', B', C']
Slide 15
An example graph
[Figure: graph on nodes 1-7 showing an initial path, a max-path, and the transitive closure of the graph]
Slide 16
What is the Max-closure problem?
The Max-closure problem is the problem of computing all max-paths in a graph.
Max-closure is the main ingredient of the TC computation.
Slide 17
Max-Closure --> TC
The Max-closure computation performs 2/3 of the total operations.
Max-to-Transitive performs 1/3 of the total operations.
The algorithm "Max-to-Transitive" reduces TC to matrix multiplication once the Max-closure is computed.
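A quick sanity check of the 1/3 vs 2/3 split (simple counting, not spelled out on the slide), based on the guards max(i,j) > k and max(i,j) < k used in the fine-grained algorithms on the next slide: for each fixed k there are n^2 - k^2 pairs (i,j) with max(i,j) > k and (k-1)^2 pairs with max(i,j) < k, so
sum_{k=1..n} (n^2 - k^2) ≈ (2/3) n^3   (Max-closure)
sum_{k=1..n} (k-1)^2 ≈ (1/3) n^3       (Max-to-Transitive)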
Slide 18
A Fine-Grained Parallel Algorithm

Algorithm Max-Closure
for k := 1 to n do
  for all 1 ≤ i, j ≤ n, max(i,j) > k, i ≠ j parallel do
    Operation(i, k, j)

Algorithm Max-to-Transitive
Input: matrix A, such that Amax = A
Output: the transitive closure of A
for all k ≤ n parallel do
  for all i, j with max(i,j) < k, i ≠ j parallel do
    Operation(i, k, j)
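A sequential Boolean-semiring sketch of the two phases (not the authors' code); the "parallel do" loops are written as ordinary loops, and 0-based indices replace the slides' 1-based ones.
----------------------------------------------------------------------
# A sequential sketch (not the authors' code) of the two fine-grained phases
# over the Boolean semiring, with 0-based indices.

def max_closure(a):
    """Max-closure phase: Operation(i, k, j) only where max(i, j) > k."""
    n = len(a)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i != j and max(i, j) > k:
                    a[i][j] = a[i][j] or (a[i][k] and a[k][j])
    return a

def max_to_transitive(a):
    """Given A = Amax, finish the TC: Operation(i, k, j) where max(i, j) < k."""
    n = len(a)
    for k in range(n):                     # independent across k after Max-closure
        for i in range(n):
            for j in range(n):
                if i != j and max(i, j) < k:
                    a[i][j] = a[i][j] or (a[i][k] and a[k][j])
    return a

# Transitive closure of a 0/1 adjacency matrix `adj` (illustrative name):
# tc = max_to_transitive(max_closure(adj))
----------------------------------------------------------------------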
Slide 19
Coarse-grained Max-closure Algorithm
Algorithm CG-Max-Closure {Partial Blocks-Warshall}
for K := 1 to N do
  A(K,K) := A*(K,K)
  for all 1 ≤ I, J ≤ N, I ≠ K ≠ J parallel do
    Block-Operation(K,K,J) and Block-Operation(I,K,K)
  for all 1 ≤ I, J ≤ N, max(I,J) > K ≥ min(I,J) parallel do
    Block-Operation(I,K,J)
-------------------------------------------------------------------------------
Block-Operation(I, K, J): A(I,J) := A(I,J) ⊕ A(I,K) ⊗ A(K,J)
-------------------------------------------------------------------------------
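Mirroring the Blocks-Warshall sketch above and reusing its illustrative bmm and star helpers, a possible sequential rendering of CG-Max-Closure; the guard max(I,J) > K >= min(I,J) is a reconstruction of the garbled slide text and is therefore an assumption.
----------------------------------------------------------------------
# A sketch of CG-Max-Closure, assuming the bmm() and star() helpers and the
# block-view convention from the Blocks-Warshall sketch above.
import numpy as np

def cg_max_closure(a, n):
    """Partial Blocks-Warshall: Max-closure of an (N*n) x (N*n) boolean array, in place."""
    N = a.shape[0] // n
    B = lambda I, J: a[I*n:(I+1)*n, J*n:(J+1)*n]        # view of block A(I,J)
    for K in range(N):
        B(K, K)[:] = star(B(K, K))                      # A(K,K) := A*(K,K)
        for I in range(N):
            if I != K:
                B(K, I)[:] |= bmm(B(K, K), B(K, I))     # Block-Operation(K,K,I)
                B(I, K)[:] |= bmm(B(I, K), B(K, K))     # Block-Operation(I,K,K)
        for I in range(N):
            for J in range(N):
                if max(I, J) > K >= min(I, J):          # restricted "parallel do" (assumed guard)
                    B(I, J)[:] |= bmm(B(I, K), B(K, J)) # Block-Operation(I,K,J)
    return a
----------------------------------------------------------------------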
Slide 20
Implementation of the Max-Closure Algorithm
[Figure: block matrix diagram for step K]
The implementation in terms of multiplication of submatrices.
Slide 21
Experimental results
[Figure: measured running times (label: ~3.5 h)]
Slide 22
Increase / Decrease of overall time
While the computation time decreases as processors are added, the communication time increases
=> there is an "ideal" number of processors.
All experiments were carried out on a cluster of 20 workstations
=> some workstations were running more than one worker process.
Slide 23
Conclusion
The major advantage of the algorithm is the reduction of the communication cost at the expense of a small increase in computation cost.
This makes the algorithm useful for systems with slow communication.
Slide 24