Fast computation of maximum - uni

Download Report

Transcript Fast computation of maximum - uni

Lecture 4 : Accelerated Cascading
and Parallel List Ranking
• We will first discuss a technique called
accelerated cascading for designing very
fast parallel algorithms.
• We will then study a very important
technique for ranking the elements of a list
in parallel.
1
Advanced Topics in Algorithms and Data Structures
Fast computation of maximum
Input: An array A holding p elements from a linearly
ordered universe S. We assume that all the elements in
A are distinct.
Output: The maximum element from the array A.
We use a boolean array M such that M(k)=1 if and only
if A(k) is the maximum element in A.
Initialization: We allocate p processors to set each entry
in M to 1.
2
Advanced Topics in Algorithms and Data Structures
Fast computation of maximum
Step 1: Assign p processors for each element in A,
p2 processors overall.
•Consider the p processors allocated to A(j). We
name these processors as P1, P2,..., Pi,..., Pp.
•Pi compares A(j) with A(i) :
If A(i) > A(j) then M(j) := 0
else do nothing.
3
Advanced Topics in Algorithms and Data Structures
Fast computation of maximum
Step 2: At the end of Step 1, M(k) , 1  k  p
will be 1 if and only if A(k) is the maximum
element.
•We allocate p processors, one for each
entry in M.
•If the entry is 0, the processor does nothing.
•If the entry is 1, it outputs the index k of the
maximum element.
4
Advanced Topics in Algorithms and Data Structures
Fast computation of maximum
Complexity: The processor requirement is p2
and the time complexity is O(1).
• We need concurrent write facility and
hence the Common CRCW PRAM model.
5
Advanced Topics in Algorithms and Data Structures
Optimal computation of
maximum
• This is the same algorithm which we used
for adding n numbers.
6
Advanced Topics in Algorithms and Data Structures
Optimal computation of
maximum
• This algorithm takes O(n) processors and
O(log n) time.
• We can reduce the processor complexity
to O(n / log n). Hence the algorithm does
optimal O(n) work.
7
Advanced Topics in Algorithms and Data Structures
An O(log log n) time algorithm
• Instead of a binary tree, we use a more complex
2k
tree. Assume that n  2 .
2k 1
• The root of the tree has 2  n children.
2k i1
• Each node at the i-th level has 2
children
for 0  i  k  1.
• Each node at level k has two children.
8
Advanced Topics in Algorithms and Data Structures
An O(log log n) time algorithm
Some Properties
• The depth of the tree is k. Since
n  2 , k    loglog n 
2k
• The number of nodes at the i-th level is
2k 2k i
2
,for 0  i  k.
Prove this by induction.
9
Advanced Topics in Algorithms and Data Structures
An O(log log n) time algorithm
The Algorithm
• The algorithm proceeds level by level,
starting from the leaves.
• At every level, we compute the maximum
of all the children of an internal node by
the O(1) time algorithm.
• The time complexity is O(log log n) since
the depth of the tree is O(log log n).
10
Advanced Topics in Algorithms and Data Structures
An O(log log n) time algorithm
Total Work:
• Recall that the O(1) time algorithm needs
O(p2) work for p elements.
2k i1
• Each node at the i-th level has 2 children.
• So the total work for each node at the i-th
2k i1 2
level is O(2 ) .
11
Advanced Topics in Algorithms and Data Structures
An O(log log n) time algorithm
Total Work:
2k 2k i
• There are 2
nodes at the i-th level.
Hence the total work for the i-th level is:
2k i1 2
O(2
2k 2k i
) 2
 O(2 )  O(n)
2k
• For O(log log n) levels, the total work is
O(n log log n) . This is suboptimal.
12
Advanced Topics in Algorithms and Data Structures
Accelerated cascading
• The first algorithm which is based on a
binary tree, is optimal but slow.
• The second algorithm is suboptimal, but
very fast.
• We combine these two algorithms through
the accelerated cascading strategy.
13
Advanced Topics in Algorithms and Data Structures
Accelerated cascading
• We start with the optimal algorithm until
the size of the problem is reduced to a
certain value.
• Then we use the suboptimal but very fast
algorithm.
14
Advanced Topics in Algorithms and Data Structures
Accelerated cascading
Phase 1.
• We apply the binary tree algorithm, starting
from the leaves and upto log log log n
levels.
• The number of candidates reduces to
n
2log log log n
n

.
log log n
• The total work done so far is O(n) and the
total time is O(log log log n) .
15
Advanced Topics in Algorithms and Data Structures
Accelerated cascading
Phase 2.
• In this phase, we use the fast algorithm on
n

) remaining candidates.
the n  O(
log log n
• The total work is O(n log log n)  O(n) .
• The total time is O(log log n)  O(log log n) .
• Theorem: Maximum of n elements can be
computed in O(log log n) time and O(n)
work on the Common CRCW PRAM.
16
Advanced Topics in Algorithms and Data Structures