Parallel Computation
Download
Report
Transcript Parallel Computation
Complexity
21-1
Parallel Computation
Complexity
Andrei Bulatov
Complexity
21-2
Multiprocessor Computation
Obviously, several communicating processors often can solve problems
faster than a single processor
processor1
processor2
processor3
memory1
memory2
memory3
shared memory
…
processorn
memoryn
Complexity
Assumptions
• Exchanging messages takes no time (one step in the other model)
• Writing to and reading from the memory takes no time and can be shared
• We can use as many processors as we wish (for different instances of
the same problem different number of processors can be used), but this
number is polynomial in the size of the input
- from the point of view of complexity theory, fixed number of processors
makes no sense — it gives only constant speed up
- we need to reveal “true” parallelism in algorithms
- an algorithm with unlimited number of processors can be “scaled” to an
algorithm with fixed number of processors
21-3
Complexity
21-4
Example
A11
Given two matrices A
A
n1
compute their product C A B
A1n
B11
and B
B
Ann
n1
n
Recall that Cij Aik Bkj
k 1
Straightforward computation of C can be done in O ( n 3 )
Can we do better with parallel computation?
Obviously, yes: use a separate processor for each of the n 3
multiplications (denote it Pikj );
then use Pi1 j to compute the sum for Cij
Totally, we need time O(n)
B1n
Bnn
Complexity
We have obtained polynomial speed-up
This is not sufficient: changing the model of sequential computation it is
possible to get some polynomial speed-up
We seek for exponential drop in the time required, that is an algorithm that
works in O(logn) or at least O (log k n )
21-5
Complexity
21-6
Fast Algorithm for Matrix Multiplication
Pi1 j
Pi1 j
Pi1 j
Pi1 j
Pi 2 j
Pi 9 j
Pi 5 j
Pi 3 j
Pi 3 j
Pi 5 j
Pi 4 j Pi 5 j
Pi 6 j
O(log n) steps
Pi1 j
Pi 7 j
Pi 7 j
Thus, C can be computed in O(logn)
Pi 9 j
Pi11 j
Pi 8 j Pi 9 j Pi10 j Pi11 j Pi12 j
Complexity
21-7
Complexity Measures
We solved the Matrix Multiplication problem using n 3 processors in O(logn)
steps
These two numbers constitute the complexity measure for parallel algorithms:
• number of processors
• time complexity
As we parallelized an algorithm of time complexity O ( n 3 ) the total
amount of work done by a parallel algorithm cannot be less than n 3
n3
So the minimal number of processors required is O
log n
Complexity
Brent’s Principle
Our algorithm can be improved so that this theoretical bound is achieved
n3
• compute n in log n “shifts” using
processors at each shift
log
n
3
n
• use the same
processors to compute the first log log n
log n
parallel additions
3
• complete computation as before
Since the total number of steps is at most 2 log n, this algorithm works in
O(log n)
Brent’s Principle
If there is a parallelization, A, of a sequential algorithm that
works in O(f(n)), such that the time complexity of A is
g(n), then there is a parallelization, B, of the same algorithm
f (n)
such that B works in O(g(n)) and uses
processors
g (n)
21-8
Complexity
Graph Reachability
Reachability
Instance: A finite directed graph G and vertices s and t
Question: Is there a path from s to t?
The natural algorithm for Reachability (breadth-first search)
cannot be parallelized in an obvious way
Even if many processors are arranged to process nodes from the stack,
we cannot stop before a path is found that requires at least as many
steps as the length of the path, that is up to n – 1
It is suspected that both searches in graphs are inherently sequential
21-9
Complexity
21-10
Adjacency Matrix
1
4
3
G
2
10
1
A(G )
1
0
1
10
1
0
1
1
10
1
00
00
11
10
What is ( A(G )) 2 ?
n
A V Aik Akj
2
ij
k 1
Aij2 1 if and only if there is a path from i to j of length 2
Therefore, to solve Reachability for all possible pairs of vertices
we need to compute An1
This can be done in O (log 2 n ) parallel steps with O ( n 3 log 2 n ) total work