Lecture 11, February 8


Lecture 11 Deterministic Selection
• The goal is to determine the i-th smallest element of a list of n
elements in linear time. No random numbers are used.
• The algorithm is due to Blum, Floyd, Pratt, Rivest, and Tarjan
(1973). The idea is the same as RANDOMIZED-SELECT, but
instead of choosing the pivot at random, we partition around an
element that we have chosen very cleverly.
• Here's the idea: we split our list of n numbers into floor(n/5) groups
of 5 elements, and one additional group of at most 4 elements. Then
we sort each group of 5 elements, and find the median of each
group of 5. This gives us a list of floor(n/5) medians. We now
(recursively!) determine the median of these floor(n/5) medians, and
call it x. This is the number we will partition around.
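The grouping step can be sketched in a few lines of Python. This is only an illustration: the function name group_medians and the sample list are made up for the example, and the leftover group of at most 4 elements is simply ignored, as in the description above.

```python
def group_medians(a):
    """Return the medians of the floor(n/5) full groups of 5 in `a`.

    The leftover group of at most 4 elements is ignored.
    """
    m = len(a) // 5
    medians = []
    for g in range(m):
        group = sorted(a[5 * g : 5 * g + 5])  # sort one group of 5
        medians.append(group[2])              # middle of 5 sorted elements
    return medians


data = [12, 3, 7, 9, 1, 20, 15, 4, 8, 6, 11, 2]
print(group_medians(data))  # [7, 8]
```

The pivot x would then be the median of this list of medians, found by a recursive call to the selection routine.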
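Picture (n=27): the elements are drawn in columns, one column per 5-element group, each column sorted from smallest to largest. S marks an element known to be smaller than x, L marks an element known to be larger than x, and * marks an element whose order relative to x is not known.
Columns = 5 element groups.
S<x<L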
Time Complexity: T(n) ≤ T(n/5) + T(3n/4) + dn, which gives
T(n) = O(n); we will prove this by induction in class.
Algorithm: select i-th smallest element
• SELECT(A,i)
    n := |A|;
    if n < 60 then sort(A); return i-th smallest element;
    else
        m := floor(n/5);
        divide A up into m groups of 5 elements (up to 4 leftover elements stay outside the groups);
        sort each of the m groups in ascending order;
        M := array of medians of each group;
        x := SELECT(M,ceil(m/2)); /* median of all the medians */
        k := X-PARTITION(A,x);
        /* partition array A into elements ≤ x and elements > x, with x
           itself placed at position k; return k, the number of elements
           on the "low side" of the partition (counting x) */
        if i = k then return x;
        else if (i < k) then return SELECT(A[1..k-1],i);
        else return SELECT(A[k+1..n],i-k);
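For experimenting, here is a minimal Python sketch of the same procedure. It is not a line-by-line translation: instead of an in-place X-PARTITION it builds three new lists (smaller than, equal to, and larger than x), which is an assumption made here so that repeated values are handled simply. The name select and the test data are made up for the example.

```python
import random


def select(a, i):
    """Return the i-th smallest element of list `a` (i is 1-based)."""
    n = len(a)
    if n < 60:
        return sorted(a)[i - 1]           # small input: just sort
    m = n // 5
    # Median of each full group of 5; leftover elements are skipped.
    medians = [sorted(a[5 * g : 5 * g + 5])[2] for g in range(m)]
    x = select(medians, (m + 1) // 2)     # ceil(m/2)-th smallest median
    # Three-way split around x (assumption: replaces X-PARTITION).
    low = [v for v in a if v < x]
    equal = [v for v in a if v == x]
    high = [v for v in a if v > x]
    if i <= len(low):
        return select(low, i)
    if i <= len(low) + len(equal):
        return x
    return select(high, i - len(low) - len(equal))


random.seed(0)
data = [random.randrange(1000) for _ in range(500)]
assert select(data, 250) == sorted(data)[249]
```

Copying into new lists costs extra memory but keeps the index arithmetic easy to follow; an in-place partition as in the pseudocode avoids the copies.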
More careful analysis
• There are floor(n/5) total columns in which 5 elements appear.
Of these, at least 1/2 (more precisely, ceil(floor(n/5)/2) ) contain
an L. All of these columns, except the one where x appears,
contribute 3 to the count of L's; the one where x appears
contributes 2. The conclusion is that ≥ 3 ceil(floor(n/5)/2) - 1
elements are L's, that is, are greater than x.
• Hence at most n - (3 ceil(floor(n/5)/2) - 1) elements are ≤ x.
By the symmetric count of S's, at most the same number of elements
are ≥ x, so whichever side we recurse on has at most this many elements.
Now we claim that
n - (3 ceil(floor(n/5)/2) - 1) ≤ 7n/10 + 3
for all n ≥ 1. To prove this, note that floor(n/5) ≥ n/5 - 1,
hence ceil(floor(n/5)/2) ≥ n/10 - 1/2, hence 3 ceil(floor(n/5)/2) - 1 ≥
3n/10 - 5/2. Hence n - (3 ceil(floor(n/5)/2) - 1) ≤ 7n/10 + 5/2 ≤ 7n/10 + 3,
which proves the result.
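As a sanity check on the claim, the inequality can be tested numerically. The sketch below uses integer arithmetic only (both sides are multiplied by 10 to avoid fractions); the function name max_low_side and the upper limit of 100000 are arbitrary choices for the example.

```python
def max_low_side(n):
    """n - (3 * ceil(floor(n/5) / 2) - 1), the bound on elements <= x."""
    m = n // 5                 # floor(n/5)
    half = (m + 1) // 2        # ceil(m/2) for a non-negative integer m
    return n - (3 * half - 1)


# Check 10 * bound <= 7n + 30, i.e. bound <= 7n/10 + 3, for many n.
violations = [n for n in range(1, 100001)
              if 10 * max_low_side(n) > 7 * n + 30]
print(violations)  # []
```

A check like this is no substitute for the proof, but it catches slips in the constants.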
Analysis continues
• It follows that the time T(n) to select from a list of n elements
satisfies the following inequality:
T(n) ≤ T(floor(n/5)) + T(r) + dn
where r ≤ 7n/10 + 3, and the dn term soaks up the time needed
to do the sorting and the partitioning. Let's guess that this
recurrence obeys T(n) ≤ cn for some c:
• T(n) ≤ T(floor(n/5)) + T(r) + dn
≤ cn/5 + cr + dn
≤ cn/5 + c(7n/10 + 3) + dn
= 9cn/10 + 3c + dn
and we want this to be at most cn. Now
9cn/10 + 3c + dn ≤ cn iff (c/10 - d)n ≥ 3c.
• So let's assume n ≥ 60 and c ≥ 10d. Then (c/10 - d) n ≥ (c/10 - d) 60, and
this is ≥ 3c exactly when 6c - 60d ≥ 3c, that is, when c ≥
20d. If n ≥ 60, and c ≥ 20d, then the induction step indeed
works. For n < 60, everything can be done in O(1) time, so the
base cases also hold once c is chosen large enough.
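The induction can also be checked numerically. The sketch below is built on assumptions chosen for illustration: d = 1, c = 20d, the base cases n < 60 cost exactly cn (the worst the hypothesis allows), and r always takes its largest permitted value floor(7n/10 + 3). The function name t and the range tested are likewise made up.

```python
from functools import lru_cache

D = 1        # assumed constant in the dn term
C = 20 * D   # the analysis asks for c >= 20d


@lru_cache(maxsize=None)
def t(n):
    """Worst-case cost under the recurrence T(n) = T(n/5) + T(r) + dn."""
    if n < 60:
        return C * n              # assumed base case: exactly cn
    r = (7 * n) // 10 + 3         # largest r with r <= 7n/10 + 3
    return t(n // 5) + t(r) + D * n


assert all(t(n) <= C * n for n in range(1, 20001))
print(t(60), C * 60)  # 1200 1200
```

At n = 60 the bound is met with equality, which is why the threshold 60 and the condition c ≥ 20d go together.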
Prepare for the midterm exam
• In particular, there will be one question on divide-and-conquer,
one question on dynamic programming, and
one question on greedy algorithms (including proving that a
greedy algorithm is correct).
• You need to understand the O, Θ, Ω asymptotic notations.
• Know how to solve recurrence relations. Use the
Master Theorem.