Transcript Lecture 4

Median/Order Statistics Algorithms
• Minimum and Maximum
• Selection in expected linear time
• Selection in worst-case linear time
Minimum and Maximum
• How many comparisons are sufficient to
find minimum/maximum?
• How many comparisons are sufficient to
find both minimum AND maximum?
• Show that n + ⌈log n⌉ - 2 comparisons are
sufficient to find the second minimum (and the
minimum)
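The two classic answers to these questions are sketched below. The first is the pairing trick for the simultaneous minimum and maximum. The second is a knockout tournament for the second minimum: the runner-up must have lost directly to the champion, and the champion plays at most ⌈log n⌉ matches. Function names and the comparison counter are illustrative assumptions.

```python
def min_and_max(a):
    """Return (minimum, maximum, comparisons) for a non-empty list a.

    Items are processed in pairs: compare the pair, then the smaller item
    against the current min and the larger against the current max, so each
    pair costs 3 comparisons instead of 4.
    """
    n = len(a)
    comparisons = 0
    if n % 2:                            # odd length: seed both with a[0]
        lo = hi = a[0]
        start = 1
    else:                                # even length: one comparison seeds both
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n - 1, 2):
        x, y = a[i], a[i + 1]
        comparisons += 1
        if x > y:
            x, y = y, x                  # now x <= y
        comparisons += 2                 # smaller vs. min, larger vs. max
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi, comparisons


def second_smallest(a):
    """Return (minimum, second minimum, comparisons) for len(a) >= 2.

    A knockout tournament uses n - 1 comparisons to find the minimum; the
    second minimum is the smallest item that lost directly to the minimum.
    """
    comparisons = 0
    players = [(x, []) for x in a]       # (value, items it beat directly)
    while len(players) > 1:
        next_round = []
        for i in range(0, len(players) - 1, 2):
            (x, beaten_x), (y, beaten_y) = players[i], players[i + 1]
            comparisons += 1
            if x <= y:
                next_round.append((x, beaten_x + [y]))
            else:
                next_round.append((y, beaten_y + [x]))
        if len(players) % 2:             # odd player out gets a bye
            next_round.append(players[-1])
        players = next_round
    minimum, losers = players[0]
    second = losers[0]
    for v in losers[1:]:                 # at most ceil(log2 n) - 1 comparisons
        comparisons += 1
        if v < second:
            second = v
    return minimum, second, comparisons
```

On the list 17 12 6 23 19 8 5 10, `min_and_max` uses 3n/2 - 2 = 10 comparisons and `second_smallest` uses n + ⌈log n⌉ - 2 = 9.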
Median Problem
• How quickly can we find the median (or, in
general, the kth smallest element) of an
unsorted list of numbers?
• Two approaches
– Quicksort partition algorithm: expected Θ(n)
time but Ω(n²) time in the worst case
– Deterministic algorithm: Θ(n) time in the worst case
Quicksort Approach
• int Select(int A[], k, low, high)
– Choose a pivot item
– Determine rank of pivot element in current
partition
• Compare all items to this pivot element
– If pivot is kth item, return pivot
– Else update low and high and recurse on
partition that contains kth item
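A minimal sketch of this Select scheme, assuming a uniformly random pivot and a Lomuto-style in-place partition (the slide only says "choose a pivot item"). Here k is a 1-based rank and low/high are 0-based inclusive Python indices; `kth_smallest` is a hypothetical convenience wrapper.

```python
import random


def select(a, k, low, high):
    """Return the k-th smallest item of a, searching only a[low..high].

    k is a 1-based rank in the whole list; low and high are 0-based,
    inclusive indices with low <= k - 1 <= high. Rearranges a in place.
    """
    while True:
        if low == high:
            return a[low]
        # Choose a pivot uniformly at random and move it to the end of the range.
        p = random.randint(low, high)
        a[p], a[high] = a[high], a[p]
        pivot = a[high]
        # Lomuto partition: compare every other item in the range to the pivot.
        store = low
        for i in range(low, high):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[high] = a[high], a[store]
        rank = store + 1                 # 1-based rank of the pivot
        if rank == k:
            return pivot
        if k < rank:
            high = store - 1             # k-th item lies left of the pivot
        else:
            low = store + 1              # k-th item lies right of the pivot


def kth_smallest(values, k):
    """Copy the input and search the whole range."""
    a = list(values)
    return select(a, k, 0, len(a) - 1)
```

Iterating instead of recursing keeps the sketch free of Python's recursion limit; each loop pass corresponds to one "update low and high and recurse" step.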
Example
k = 5
Range after partitioning          pivot  low  high  rank
(start) 17 12 6 23 19 8 5 10
6 8 5 10 17 12 23 19                10     1    8     4
17 12 19 23                         19     5    8     7
12 17                               12     5    6     5
found: rank 5 is 12
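The trace in the example is consistent with always taking the last item of the current range as the pivot and partitioning stably (smaller items keep their order, then the pivot, then the larger items). That pivot rule is an inference from the numbers, not something the slide states. The sketch below replays it with 1-based positions; `traced_select` is a hypothetical name.

```python
def traced_select(values, k):
    """Replay Select with pivot = last item of the current range.

    Returns (answer, trace), where trace lists (low, high, rank) using
    1-based, inclusive positions as in the example.
    """
    a = list(values)
    low, high = 1, len(a)
    trace = []
    while True:
        segment = a[low - 1:high]
        pivot = segment[-1]
        # Stable partition: smaller items, then the pivot, then the rest.
        smaller = [x for x in segment[:-1] if x < pivot]
        larger = [x for x in segment[:-1] if x >= pivot]
        a[low - 1:high] = smaller + [pivot] + larger
        rank = low + len(smaller)        # final position of the pivot
        trace.append((low, high, rank))
        if rank == k:
            return pivot, trace
        if k < rank:
            high = rank - 1
        else:
            low = rank + 1
```

Calling `traced_select([17, 12, 6, 23, 19, 8, 5, 10], 5)` reproduces the three rows of the table and returns 12.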
Probabilistic Analysis
• Assume each of n! permutations is equally likely
• Modify earlier indicator variable analysis of
quicksort to handle this k-selection problem
• What is the probability that the ith smallest item is compared
to the jth smallest item (i < j)?
– If k is contained in (i..j)?
– If k ≤ i?
– If k ≥ j?
Cases where (i..j) do not contain k
• Case k ≥ j:
$\sum_{i=1}^{k-1}\sum_{j=i+1}^{k} \frac{2}{k-i+1} = \sum_{i=1}^{k-1} \frac{2(k-i)}{k-i+1}$
$= \sum_{i=1}^{k-1} \frac{2i}{i+1}$ [replace k-i with i]
$= 2\sum_{i=1}^{k-1} \frac{i}{i+1}$
$\le 2(k-1)$ [each term i/(i+1) < 1]
• Case k ≤ i:
$\sum_{j=k+1}^{n}\sum_{i=k}^{j-1} \frac{2}{j-k+1} = \sum_{j=k+1}^{n} \frac{2(j-k)}{j-k+1}$
$= \sum_{j=1}^{n-k} \frac{2j}{j+1}$
[replace j-k with j and change bounds]
$= 2\sum_{j=1}^{n-k} \frac{j}{j+1}$
$\le 2(n-k)$
• Total for both cases is ≤ 2n-2
Case where (i..j) contains k (i < k < j)
• At most 1 interval of size 3 contains k
– i=k-1, j=k+1
• At most 2 intervals of size 4 contain k
– i=k-1, j=k+2 and i=k-2, j= k+1
• In general, at most q-2 intervals of size q contain k
• Each such pair is compared with probability 2/q, so we get
$\sum_{q=3}^{n} (q-2)\frac{2}{q} \le \sum_{q=3}^{n} 2 = 2(n-2)$
• Summing together all cases, (2n-2) + 2(n-2) = 4n-6, so the expected
number of comparisons is less than 4n
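The three cases can be checked numerically. Assuming distinct keys and a uniformly random pivot, the ith and jth smallest items are compared exactly when the first pivot drawn from ranks min(i,k)..max(j,k) is one of them, giving probability 2/(max(j,k) - min(i,k) + 1). This covers all three cases above at once. The sketch sums these probabilities exactly; function names are illustrative.

```python
from fractions import Fraction


def compare_probability(i, j, k):
    """Probability that the i-th and j-th smallest items (i < j) are compared
    while selecting the k-th smallest, assuming distinct keys and a
    uniformly random pivot."""
    lo, hi = min(i, k), max(j, k)
    return Fraction(2, hi - lo + 1)


def expected_comparisons(n, k):
    """Exact expected number of comparisons: the sum over all pairs i < j."""
    return sum(compare_probability(i, j, k)
               for i in range(1, n + 1)
               for j in range(i + 1, n + 1))
```

For example, selecting the median of 3 items gives 8/3 comparisons: 2 if the first pivot is the median, and 3 otherwise.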
Best case, Worst-case
• Best case running time?
• What happens in the worst-case?
– Pivot element chosen is always what?
– This leads to comparing all possible pairs
– This leads to Θ(n²) comparisons
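To see the two extremes concretely, the sketch below counts comparisons when the pivot is always the last item of the range (an assumed deterministic rule) on already-sorted input. For k = 1 every pivot is the maximum of its range, giving (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons. For k = n the first pivot is already the answer, so only n-1 comparisons are made. `comparisons_last_pivot` is a hypothetical helper.

```python
def comparisons_last_pivot(values, k):
    """Count item-to-pivot comparisons made by Select (1-based rank k)
    when the pivot is always the last item of the current range."""
    a = list(values)
    low, high = 0, len(a) - 1            # 0-based inclusive range
    count = 0
    while low < high:
        pivot = a[high]
        store = low
        for i in range(low, high):
            count += 1                   # one comparison with the pivot
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[high] = a[high], a[store]
        if store == k - 1:               # pivot is the k-th item
            break
        if k - 1 < store:
            high = store - 1
        else:
            low = store + 1
    return count
```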
Deterministic O(n) approach
• Need to guarantee a good pivot element while
doing O(n) work to find the pivot element
• int Select(int A[], k, low, high)
– Choosing pivot element
• Divide into groups of 5
• For each group of 5, find that group’s median
• Use median of the medians as pivot element
– Determine rank of pivot element
• Compare some remaining items directly to median
– Update low and high and recurse on partition that
contains kth item (or return kth item if it is pivot)
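A compact sketch of the median-of-medians Select, assuming list-based three-way partitioning rather than the in-place low/high version on the slide. The lower median is used for even-sized lists, and sorting a group of at most 5 items stands in for a fixed median-of-5 routine. `mom_select` is a hypothetical name.

```python
def mom_select(a, k):
    """Return the k-th smallest item (1-based) of list a in worst-case O(n) time."""
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Divide into groups of 5 and take each group's (lower) median.
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Recursively find the median of the medians and use it as the pivot.
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    # Compare items to the pivot; the pivot itself is in a, so both sides shrink.
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    if k <= len(smaller):
        return mom_select(smaller, k)
    if k <= len(a) - len(larger):        # k-th item equals the pivot
        return pivot
    return mom_select(larger, k - (len(a) - len(larger)))
```

Splitting into "smaller", "equal" and "larger" keeps the sketch correct with duplicate keys, since every recursive call is on a strictly shorter list.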
Guarantees on the pivot element
• Median of medians is guaranteed to be smaller than all the
red colored items
– Why?
– How many red items are there?
• Likewise, median of medians is guaranteed to be larger
than the blue colored items
• Thus the median of medians has rank roughly between 3n/10 and 7n/10
• What elements do we need to compare to pivot to
determine its rank?
– How many of these are there?
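The guarantee can be checked directly. When n is a multiple of 10 and the keys are distinct, the lower median of the 2m = n/5 group medians has at least 3m items at or below it and at least 3(m+1) items at or above it, so its rank lies between 3n/10 and 7n/10. The sketch finds the median of medians by plain sorting (illustration only, not linear time); `median_of_medians_rank` is a hypothetical helper.

```python
def median_of_medians_rank(a):
    """1-based rank of the median of medians within a (distinct items assumed)."""
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = sorted(g[(len(g) - 1) // 2] for g in groups)
    pivot = medians[(len(medians) - 1) // 2]   # lower median of the medians
    return sorted(a).index(pivot) + 1
```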
Analysis of number of comparisons
• int Select(int A[], k, low, high), with the cost of each step
– Choosing pivot element
• For each group of 5, find that group’s median: c1 n/5 comparisons (c1 for median of 5)
• Find the median of the medians: recurse on problem of size n/5
– Compare remaining items directly to median: c2 n comparisons
– Recurse on correct partition: problem of size 7n/10
• T(n) = T(n/5) + T(7n/10) + (c1/5 + c2)n
Solving recurrence relation
• T(n) = T(7n/10) + T(n/5) + O(n)
– Key observation: 7/10 + 1/5 = 9/10 < 1
• Prove T(n) ≤ cn for some constant c by
induction on n
• By the induction hypothesis, T(n) ≤ 7cn/10 + cn/5 + dn
= 9cn/10 + dn
• Need 9cn/10 + dn ≤ cn
• Thus c/10 ≥ d, i.e. c ≥ 10d (with c also large enough to cover the base cases)