Transcript Document

Computability and Complexity
Parallel Computation
Andrei Bulatov
Multiprocessor Computation
Obviously, several communicating processors can often solve problems faster than a single processor
[Diagram: processors 1 through n, each with its own local memory, all connected to a shared memory]
Assumptions
• Exchanging messages takes no time (one step in the other model)
• Writing to and reading from the memory take no time, and the memory can be shared
• We can use as many processors as we wish (different instances of the same problem may use a different number of processors), but this number is polynomial in the size of the input
- from the point of view of complexity theory, a fixed number of processors makes no sense: it gives only a constant speed-up
- we need to reveal the "true" parallelism in algorithms
- an algorithm with an unlimited number of processors can be "scaled" to an algorithm with a fixed number of processors
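These assumptions amount to a simple cost model: in one parallel step, any number of independent operations may run at once, and only the number of steps (and the peak number of processors) is charged. The sketch below is a toy simulation of that model, not anything from the lecture; the names `ParallelMachine` and `parallel_step` are made up for illustration.

```python
# A minimal simulation of the idealized parallel cost model (illustrative only):
# any number of independent operations may run in one step; we charge one unit
# of time per step and record the peak number of "processors" used.

class ParallelMachine:
    def __init__(self):
        self.steps = 0           # parallel time used so far
        self.max_processors = 0  # largest number of processors used in a single step

    def parallel_step(self, tasks):
        """Run all tasks 'simultaneously' (simulated sequentially) and charge one step."""
        results = [task() for task in tasks]
        self.steps += 1
        self.max_processors = max(self.max_processors, len(tasks))
        return results

if __name__ == "__main__":
    # Example: add 8 pairs of numbers in a single parallel step.
    m = ParallelMachine()
    pairs = [(i, i + 1) for i in range(8)]
    sums = m.parallel_step([lambda a=a, b=b: a + b for a, b in pairs])
    print(sums, m.steps, m.max_processors)   # 8 sums, 1 step, 8 processors
```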
Example
Given two matrices
\[
A = \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{n1} & \cdots & A_{nn} \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix},
\]
compute their product $C = A \cdot B$.
Recall that $C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj}$
Straightforward computation of C can be done in $O(n^3)$
Can we do better with parallel computation?
Obviously, yes: use a separate processor for each of the $n^3$ multiplications (denote it $P_{ikj}$); then use $P_{i1j}$ to compute the sum for $C_{ij}$
In total, we need time O(n)
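As a sanity check, here is one way to simulate the algorithm just described (my own sketch in Python, not code from the lecture; the function name `parallel_matmul_naive` is made up): one round performs all $n^3$ multiplications at once, then the processor $P_{i1j}$ accumulates the $n$ products for $C_{ij}$, one addition per round, for $O(n)$ rounds in total.

```python
# Simulate the naive parallel algorithm: one processor P_ikj per product,
# then P_i1j adds the n products for C_ij one per round (O(n) rounds total).
def parallel_matmul_naive(A, B):
    n = len(A)
    rounds = 0

    # Round 1: all n^3 multiplications happen "simultaneously".
    prod = [[[A[i][k] * B[k][j] for j in range(n)] for k in range(n)] for i in range(n)]
    rounds += 1

    # Rounds 2..n: P_i1j adds one more product per round.
    C = [[prod[i][0][j] for j in range(n)] for i in range(n)]
    for k in range(1, n):               # each iteration is one parallel round
        for i in range(n):
            for j in range(n):
                C[i][j] += prod[i][k][j]
        rounds += 1

    return C, rounds                    # rounds == n

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul_naive(A, B))  # ([[19, 22], [43, 50]], 2)
```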
We have obtained a polynomial speed-up
This is not sufficient: just by changing the model of sequential computation it is possible to get a polynomial speed-up
We seek an exponential drop in the time required, that is, an algorithm that works in $O(\log n)$ or at least $O(\log^k n)$
Fast Algorithm for Matrix Multiplication
[Diagram: the partial sums for $C_{ij}$ are added pairwise in a binary tree by the processors $P_{i1j}, P_{i2j}, \ldots, P_{inj}$, one tree level per parallel step, taking $O(\log n)$ steps]
Thus, C can be computed in $O(\log n)$
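One possible way to realize this pairwise-addition tree in code (again my own sketch; the name `parallel_matmul_fast` is made up): after the single multiplication round, the list of partial sums for each $C_{ij}$ is halved in every round, so only $\lceil\log_2 n\rceil$ addition rounds are needed.

```python
# Simulate the fast parallel algorithm: one multiplication round, then the
# partial sums for each C_ij are added pairwise, halving their number per round.
def parallel_matmul_fast(A, B):
    n = len(A)
    rounds = 1   # the single round of n^3 simultaneous multiplications
    # partial[i][j] is the list of values still to be summed for C_ij
    partial = [[[A[i][k] * B[k][j] for k in range(n)] for j in range(n)] for i in range(n)]

    while any(len(partial[i][j]) > 1 for i in range(n) for j in range(n)):
        for i in range(n):
            for j in range(n):
                vals = partial[i][j]
                # pair up neighbours; an unpaired last element is carried over
                paired = [vals[t] + vals[t + 1] for t in range(0, len(vals) - 1, 2)]
                if len(vals) % 2:
                    paired.append(vals[-1])
                partial[i][j] = paired
        rounds += 1                       # one parallel round per tree level

    C = [[partial[i][j][0] for j in range(n)] for i in range(n)]
    return C, rounds                      # rounds == 1 + ceil(log2(n))

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(parallel_matmul_fast(A, B))     # ([[19, 22], [43, 50]], 2)
```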
Complexity Measures
We solved the Matrix Multiplication problem using $n^3$ processors in $O(\log n)$ steps
These two numbers constitute the complexity measure for parallel algorithms:
• number of processors
• time complexity
As we parallelized an algorithm of time complexity $O(n^3)$, the total amount of work done by a parallel algorithm cannot be less than $n^3$
So the minimal number of processors required is $O\!\left(\frac{n^3}{\log n}\right)$
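Spelled out, this is a one-line work-time calculation; the display below is a sketch of that reasoning, with $p$ and $t$ introduced here as the number of processors and the parallel time.

```latex
% Sketch of the work-time argument: p = number of processors, t = parallel time.
% The p*t operations available must cover the n^3 operations of work.
\[
  p \cdot t \;\ge\; n^{3},
  \qquad t = O(\log n)
  \quad\Longrightarrow\quad
  p \;\ge\; \frac{n^{3}}{t} \;\approx\; \frac{n^{3}}{\log n}.
\]
```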
Brent’s Principle
Our algorithm can be improved so that this theoretical bound is achieved:
• compute the $n^3$ multiplications in $\log n$ "shifts", using $n^3/\log n$ processors at each shift
• use the same $n^3/\log n$ processors to compute the first $\log\log n$ rounds of parallel additions
• complete the computation as before
Since the total number of steps is at most $2\log n$, this algorithm works in $O(\log n)$
Brent’s Principle
If there is a parallelization A of a sequential algorithm that works in $O(f(n))$, such that the time complexity of A is $g(n)$, then there is a parallelization B of the same algorithm such that B works in $O(g(n))$ and uses $f(n)/g(n)$ processors
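To make the principle concrete, the following sketch (my own illustration, not from the lecture; `brent_schedule` is a made-up name) simulates $p$ virtual processors on $q \le p$ physical ones: each parallel step that used $p$ processors is replayed in $\lceil p/q \rceil$ sub-steps, so the slowdown is exactly the ratio of processor counts.

```python
import math

# Brent-style scheduling (sketch): one parallel step that uses p virtual
# processors is executed on q physical processors in ceil(p/q) sub-steps.
def brent_schedule(tasks, q):
    """Run `tasks` (one per virtual processor) on q physical processors.
    Returns the results and the number of physical sub-steps charged."""
    results = []
    substeps = 0
    for start in range(0, len(tasks), q):   # q tasks per physical sub-step
        batch = tasks[start:start + q]
        results.extend(task() for task in batch)
        substeps += 1
    return results, substeps                # substeps == ceil(len(tasks) / q)

if __name__ == "__main__":
    # 12 virtual processors simulated on 4 physical ones: 3 sub-steps instead of 1.
    tasks = [lambda x=x: x * x for x in range(12)]
    res, steps = brent_schedule(tasks, 4)
    print(res, steps, math.ceil(12 / 4))    # squares of 0..11, 3, 3
```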
Graph Reachability
Reachability
Instance: A finite directed graph G and vertices s and t
Question: Is there a path from s to t?
The natural algorithm for Reachability (breadth-first search) cannot be parallelized in an obvious way
Even if many processors are arranged to process the nodes currently in the queue, we cannot stop before a path is found, and finding a path requires at least as many steps as its length, which can be up to n – 1
It is suspected that both breadth-first and depth-first search in graphs are inherently sequential
Adjacency Matrix
[Diagram: a directed graph G on the vertices 1, 2, 3, 4 and its 0/1 adjacency matrix A(G)]
What is $(A(G))^2$?
$A^2_{ij} = \bigvee_{k=1}^{n} A_{ik} \wedge A_{kj}$
$A^2_{ij} = 1$ if and only if there is a path from i to j of length 2 (of length at most 2 when the diagonal entries of A(G) are set to 1)
Therefore, to solve Reachability for all possible pairs of vertices we need to compute $A^{n-1}$
This can be done in $O(\log^2 n)$ parallel steps with $O(n^3 \log^2 n)$ total work
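The whole algorithm fits in a few lines. The sketch below is my own Python (the name `reachability_matrix` is made up) under the convention that the diagonal of the adjacency matrix is set to 1, so that each vertex reaches itself; it squares the Boolean matrix about $\log_2 n$ times, and each squaring corresponds to one parallelizable Boolean matrix multiplication, mirroring the $O(\log^2 n)$-step parallel algorithm.

```python
# Reachability by repeated Boolean squaring of the adjacency matrix (sketch).
# With 1s on the diagonal, the k-th squaring captures all paths of length <= 2^k,
# so about log2(n) squarings suffice; each squaring is one (parallelizable)
# Boolean matrix multiplication.
def reachability_matrix(adj):
    n = len(adj)
    # reflexive closure: every vertex reaches itself
    R = [[bool(adj[i][j]) or (i == j) for j in range(n)] for i in range(n)]
    length = 1
    while length < n - 1:
        R = [[any(R[i][k] and R[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]          # Boolean square: paths of length <= 2*length
        length *= 2
    return R                             # R[i][j] is True iff j is reachable from i

if __name__ == "__main__":
    # Example graph: 0 -> 1 -> 2 -> 3, with no edges back.
    adj = [[0, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 0]]
    R = reachability_matrix(adj)
    print(R[0][3], R[3][0])              # True False
```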