Computer Algorithms
Lecture 12
Medians and Order Statistics
Ch. 9
Some of these slides are courtesy of D. Plaisted et al, UNC and M. Nicolescu, UNR
1
Order Statistic
• ith order statistic: ith smallest element of a set of n distinct
elements.
• Minimum: first order statistic.
• Maximum: nth order statistic.
• Median: “half-way point” of the set.
– Unique, when n is odd – occurs at i = (n+1)/2.
– Two medians when n is even.
• Lower median, at i = n/2.
• Upper median, at i = n/2 + 1.
– In general:
• Lower median, at i = ⌊(n+1)/2⌋.
• Upper median, at i = ⌈(n+1)/2⌉.
• For consistency, “median” will refer to the lower median.
2
Selection Problem
• Selection problem:
– Input: A set A of n distinct numbers and a number i,
with 1 ≤ i ≤ n.
– Output: the element x ∈ A that is larger than exactly i – 1
other elements of A.
• Obviously, this can be solved in O(n lg n). Why?
• Is it really necessary to sort all the numbers?
• We will study faster linear-time algorithms.
– For the special cases when i = 1 and i = n.
– For the general problem.
3
Minimum (Maximum)
Minimum(A)
1. min ← A[1]
2. for i ← 2 to length[A]
3.    do if min > A[i]
4.       then min ← A[i]
5. return min
Maximum can be determined similarly.
• T(n) = Θ(n).
• Worst-case no. of comparisons: n – 1.
• Can we do better? Why not?
• Minimum(A) has the optimal number of comparisons.
4
Simultaneous Minimum and
Maximum
• Some applications need to determine both the maximum and minimum
of a set of elements.
– Example: Graphics program trying to fit a set of points onto a
rectangular display.
• Independent determination of maximum and minimum requires 2n – 2
comparisons.
• Can we reduce this number?
– Yes.
5
Simultaneous Minimum and
Maximum
• At most 3n/2 comparisons
• Setting up initial values:
– n is odd: set both min and max to the first element
– n is even: compare the first two elements, assign the smallest
one to min and the largest one to max
• Total number of comparisons:
– n is odd: we do 3(n-1)/2 comparisons
– n is even: we do 1 initial comparison + 3(n-2)/2 more
comparisons = 3n/2 - 2 comparisons
6
Example: Simultaneous Min, Max
• n = 5 (odd), array A = {2, 7, 1, 3, 4}
1. Set min = max = 2
2. Compare the remaining elements in pairs:
– 1 < 7 → compare 1 with min and 7 with max → min = 1, max = 7   (3 comparisons)
– 3 < 4 → compare 3 with min and 4 with max → min = 1, max = 7   (3 comparisons)
We performed 3(n – 1)/2 = 6 comparisons.
7
Example: Simultaneous Min, Max
• n = 6 (even), array A = {2, 5, 3, 7, 1, 4}
1. Compare 2 with 5: 2 < 5   (1 comparison)
2. Set min = 2, max = 5
3. Compare the remaining elements in pairs:
– 3 < 7 → compare 3 with min and 7 with max → min = 2, max = 7   (3 comparisons)
– 1 < 4 → compare 1 with min and 4 with max → min = 1, max = 7   (3 comparisons)
We performed 3n/2 – 2 = 7 comparisons.
8
Simultaneous Minimum and
Maximum. Summary
• At most 3n/2 comparisons
– Process elements in pairs
– Maintain the minimum and maximum of elements seen so far
– Don’t compare each element to the minimum and maximum
separately
– Compare the elements of a pair to each other
– Compare the larger element to the maximum so far, and
compare the smaller element to the minimum so far
– This leads to only 3 comparisons for every 2 elements
• Total no. of comparisons, C ≤ 3⌊n/2⌋.
– For odd n: C = 3(n – 1)/2 = 3⌊n/2⌋.
– For even n: C = 3(n – 2)/2 + 1 (for the initial comparison)
= 3n/2 – 2 < 3⌊n/2⌋.
9
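The pairing scheme summarized above can be sketched in Python (an illustrative sketch, not from the slides; the function name `min_max` is my own):

```python
def min_max(a):
    """Return (min, max) of a non-empty list using about 3n/2 comparisons.

    Pairs strategy: compare the two elements of each pair to each other,
    then the smaller to the running min and the larger to the running max.
    """
    n = len(a)
    if n % 2:                       # n odd: seed both extremes from a[0]
        lo = hi = a[0]
        start = 1
    else:                           # n even: 1 comparison seeds min and max
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    for i in range(start, n, 2):    # 3 comparisons per remaining pair
        x, y = (a[i], a[i + 1]) if a[i] < a[i + 1] else (a[i + 1], a[i])
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi

print(min_max([2, 7, 1, 3, 4]))     # (1, 7)
print(min_max([2, 5, 3, 7, 1, 4]))  # (1, 7)
```

The two calls reproduce the odd and even examples from the preceding slides, with 6 and 7 comparisons respectively.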
General Selection Problem
• Seems more difficult than Minimum or Maximum.
– Yet, it has solutions with the same asymptotic complexity as
Minimum and Maximum.
• We will study 2 algorithms for the general problem:
– One with expected linear-time complexity.
– A second whose worst-case complexity is linear.
10
Selection in Expected Linear Time
• The general selection problem seems much harder, but it too runs in Θ(n) time (in expectation)
• Works like QUICKSORT (Randomized Partition):
– Partition the array recursively
– Unlike QUICKSORT: recurse on only one side of the partition!
– QUICKSORT runs in O(n lg n); SELECT runs in expected Θ(n)
• Exploits the abilities of Randomized-Partition (RP).
– RP returns the index k in the sorted order of a randomly chosen
element (pivot).
• If the order statistic we are interested in, i, equals k, then we are
done.
• Else, reduce the problem size using its other ability.
– RP rearranges the other elements around the random pivot.
• If i < k, selection can be narrowed down to A[1..k – 1].
• Else, select the (i – k)th element from A[k+1..n].
(Assuming RP operates on A[1..n]. For A[p..r], change k appropriately.)
11
Randomized Quicksort: review
Quicksort(A, p, r)
  if p < r then
    q := Rnd-Partition(A, p, r);
    Quicksort(A, p, q – 1);
    Quicksort(A, q + 1, r)
  fi

(Figure: Partition splits A[p..r] around the pivot (here 5) into A[p..q – 1] and A[q+1..r].)

Rnd-Partition(A, p, r)
  i := Random(p, r);
  A[r] ↔ A[i];
  x, i := A[r], p – 1;
  for j := p to r – 1 do
    if A[j] ≤ x then
      i := i + 1;
      A[i] ↔ A[j]
    fi
  od;
  A[i + 1] ↔ A[r];
  return i + 1
12
Randomized-Select
Randomized-Select(A, p, r, i)   // select the i-th order statistic
1. if p = r
2.    then return A[p]
3. q ← Randomized-Partition(A, p, r)
4. k ← q – p + 1
5. if i = k
6.    then return A[q]
7. elseif i < k
8.    then return Randomized-Select(A, p, q – 1, i)
9.    else return Randomized-Select(A, q + 1, r, i – k)
The pivot is not included in either subarray!!
13
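A Python sketch of the procedure above (my own illustration mirroring the pseudocode; `randomized_partition` follows the Rnd-Partition scheme from the Quicksort review slide):

```python
import random

def randomized_partition(a, p, r):
    """Partition a[p..r] around a randomly chosen pivot; return its final index."""
    k = random.randint(p, r)
    a[r], a[k] = a[k], a[r]          # move the random pivot to the end
    x, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # place the pivot at its sorted position
    return i + 1

def randomized_select(a, p, r, i):
    """Return the i-th smallest element of a[p..r] (i is 1-indexed)."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    k = q - p + 1                    # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    elif i < k:
        return randomized_select(a, p, q - 1, i)
    else:                            # pivot excluded from both subarrays
        return randomized_select(a, q + 1, r, i - k)

a = [3, 2, 9, 0, 7, 5, 4, 8, 6, 1]
print(randomized_select(a, 0, len(a) - 1, 5))  # 4 (the 5th smallest)
```

Note that only one recursive call is made per level, unlike Quicksort's two.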
Analysis
• Worst-case complexity:
– Θ(n²): we could get unlucky and always recurse on a subarray
that is only one element smaller than the previous subarray.
• Average-case complexity:
– Θ(n). Intuition: because the pivot is chosen at random, we expect
to get rid of half of the list each time we choose a random
pivot q.
– Why Θ(n) and not Θ(n lg n)?
14
Random Variables and Expectation
Def.: (Discrete) random variable X: a function from a
sample space S to the real numbers.
– It associates a real number with each possible outcome of an
experiment
E.g.: X = face of one fair die
– Possible values: {1, 2, 3, 4, 5, 6}
– Probability to take any of the values: 1/6
15
Random Variables and Expectation
• Expected value (expectation, mean) of a discrete random
variable X is:
E[X] = Σx x Pr{X = x}
– “Average” over all possible values of random variable X
E.g.: X = face of one fair die
E[X] = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 3.5
16
Example
E.g.: flipping two coins:
– Earn $3 for each head, lose $2 for each tail
– X: random variable representing your earnings
– Three possible values for variable X:
• 2 heads → x = $3 + $3 = $6, Pr{2 H's} = ¼
• 2 tails → x = –$2 – $2 = –$4, Pr{2 T's} = ¼
• 1 head, 1 tail → x = $3 – $2 = $1, Pr{1 H, 1 T} = ½
– The expected value of X is:
E[X] = 6 · Pr{2 H's} + 1 · Pr{1 H, 1 T} – 4 · Pr{2 T's}
     = 6 · ¼ + 1 · ½ – 4 · ¼ = 1
17
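The expectation above can be checked by enumerating the four equally likely outcomes (a small sketch of my own; `Fraction` keeps the arithmetic exact):

```python
from itertools import product
from fractions import Fraction

# Two fair coin flips; each of the 4 outcomes has probability 1/4.
# Payoffs from the slide: earn $3 per head, lose $2 per tail.
payoff = {"H": 3, "T": -2}
expectation = sum(Fraction(1, 4) * (payoff[a] + payoff[b])
                  for a, b in product("HT", repeat=2))
print(expectation)  # 1
```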
More Examples
E.g.: X = lottery earnings
– 1/15,000,000 probability to win a $16,000,000 prize
– Possible values: 0 and 16,000,000
– Probability to win 0: 1 – 1/15,000,000 = 14,999,999/15,000,000
E[X] = 16,000,000 · 1/15,000,000 + 0 · 14,999,999/15,000,000 = 16/15 ≈ 1.07
18
Indicator Random Variables
• Given a sample space S and an event A, we define the
indicator random variable I{A} associated with A:
– I{A} = 1 if A occurs, 0 if A does not occur
• The expected value of an indicator random variable XA is:
E[XA] = Pr {A}
• Proof:
E[XA] = E[I{A}] = 1  Pr{A} + 0  Pr{Ā} = Pr{A}
19
Example
• Determine the expected number of heads obtained when flipping a
coin
– Space of possible values:
S = {H, T}
– Random variable Y: takes on the values H and T, each with probability ½
• Indicator random variable XH: the coin coming up heads (Y = H)
– Counts the number of heads obtained in the flip
– XH = I{Y = H} = 1 if Y = H, 0 if Y = T
• The expected number of heads obtained in one flip of the coin is:
E[XH] = E[I{Y = H}] = 1 · Pr{Y = H} + 0 · Pr{Y = T} = 1 · ½ + 0 · ½ = ½
20
Analysis of Expected Time for
Randomized-Select
• The analysis follows that of randomized quicksort
• Let T(n) = the random variable for the running time of
RANDOMIZED-SELECT on an input of size n, assuming the random
numbers are independent.
• For k = 0, 1, …, n–1, define the indicator random variable
Xk = 1 if PARTITION generates a k : n–k–1 split, 0 otherwise.
21
Average-case Analysis
• A call to RandomizedSelect may
– Terminate immediately with the correct answer,
– Recurse on A[p..q – 1], or
– Recurse on A[q+1..r].
• To obtain an upper bound, assume that the ith smallest element that we
want is always in the larger subarray.
• RP takes O(n) time on a problem of size n.
• Hence, the recurrence for T(n) is:
T(n) = Σ_{k=1}^{n} Xk · (T(max(k – 1, n – k)) + O(n))
• For a given call of RS, Xk = 1 for exactly one value of k, and Xk = 0 for all
other k.
22
Analysis of Expected Time for
Randomized-Select
• To obtain an upper bound, assume that the ith element
always falls in the larger side of the partition:
T(n) = T(max{0, n–1}) + Θ(n)   if 0 : n–1 split,
       T(max{1, n–2}) + Θ(n)   if 1 : n–2 split,
       …
       T(max{n–1, 0}) + Θ(n)   if n–1 : 0 split
     = Σ_{k=0}^{n–1} Xk · (T(max{k, n–k–1}) + Θ(n))
23
Average-case Analysis
T(n) = Σ_{k=1}^{n} Xk · (T(max(k – 1, n – k)) + O(n))
     = Σ_{k=1}^{n} Xk · T(max(k – 1, n – k)) + O(n)
Taking expectation, we have
E[T(n)] = E[ Σ_{k=1}^{n} Xk · T(max(k – 1, n – k)) ] + O(n)
        = Σ_{k=1}^{n} E[Xk · T(max(k – 1, n – k))] + O(n)    (by linearity of expectation)
        = Σ_{k=1}^{n} E[Xk] · E[T(max(k – 1, n – k))] + O(n)    (by Eq. (C.23))
        = Σ_{k=1}^{n} (1/n) · E[T(max(k – 1, n – k))] + O(n)    (by Eq. (9.1), E[Xk] = 1/n)
24
Average-case Analysis (Contd.)
max(k – 1, n – k) = k – 1, if k > ⌈n/2⌉
                    n – k, if k ≤ ⌈n/2⌉
Expanding the summation, we have
E[T(n)] = O(n) + (1/n) [ E[T(n – 1)] + E[T(n – 2)] + … + E[T(n – ⌊n/2⌋)]
                       + E[T(⌈n/2⌉)] + … + E[T(n – 1)] ]
• If n is odd, T(n – 1) through T(⌈n/2⌉) occur twice and T(⌊n/2⌋) occurs once.
• If n is even, T(n – 1) through T(⌈n/2⌉) occur twice.
Thus, we have
E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n–1} E[T(k)] + O(n).
25
Average-case Analysis (Contd.)
• We solve the recurrence by substitution.
• Guess E[T(n)] = O(n), i.e., E[T(k)] ≤ ck for all k < n and a suitable constant c.
E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n–1} ck + an
        = (2c/n) [ Σ_{k=1}^{n–1} k – Σ_{k=1}^{⌊n/2⌋–1} k ] + an
        ≤ (c/n) (3n²/4 + n/2 – 2) + an
        = c (3n/4 + 1/2 – 2/n) + an
        ≤ 3cn/4 + c/2 + an
        = cn – (cn/4 – c/2 – an)
        ≤ cn, if cn/4 – c/2 – an ≥ 0.
Now, cn/4 – c/2 – an ≥ 0
  ⇔ n (c/4 – a) ≥ c/2
  ⇔ n ≥ (c/2) / (c/4 – a) = 2c/(c – 4a), if c > 4a.
Thus, if we assume T(n) = O(1) for
n < 2c/(c – 4a), we have E[T(n)] =
O(n).
26
Summary of Randomized Order
Statistic Selection
• Works fast: linear expected time.
• Excellent algorithm in practice.
• But, the worst case is very bad: Θ(n2).
• Question:
– Is there an algorithm that runs in linear time in the worst case?
• IDEA:
– Generate a good pivot recursively.
27
Selection in Worst-Case Linear Time
• Algorithm Select:
– Like RandomizedSelect, finds the desired element by
recursively partitioning the input array.
– Unlike RandomizedSelect, is deterministic.
• Uses a variant of the deterministic Partition routine.
• Partition is told which element to use as the pivot.
– Achieves linear-time complexity in the worst case by
• Guaranteeing that the split is always “good” at each
Partition.
• How can a good split be guaranteed?
28
Guaranteeing a Good Split
• We will have a good split if we can ensure that the pivot is the median
element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
29
Choosing a Pivot
• Median-of-Medians:
– Divide the n elements into ⌈n/5⌉ groups.
• ⌊n/5⌋ groups contain 5 elements each; 1 group contains
n mod 5 < 5 elements.
• Determine the median of each of the groups.
– Sort each group using Insertion Sort. Pick the median from the
sorted list of group elements.
• Recursively find the median x of the ⌈n/5⌉ medians.
• Recurrence for the running time (of median-of-medians):
– T(n) = O(n) + T(⌈n/5⌉) + ….
30
Algorithm Select
• Determine the median-of-medians x (using the procedure on the
previous slide.)
• Partition the input array around x using the variant of Partition.
• Let k be the index of x that Partition returns.
• If k = i, then return x.
• Else if i < k, then apply Select recursively to A[1..k–1] to find the ith
smallest element.
• Else if i > k, then apply Select recursively to A[k+1..n] to find the (i – k)th
smallest element.
31
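A compact Python sketch of Select (my own illustration; for brevity it partitions by building new lists rather than using the in-place Partition variant, and it sorts each 5-element group with `sorted()` instead of insertion sort):

```python
def select(a, i):
    """Return the i-th smallest (1-indexed) of a list of distinct numbers
    in worst-case linear time, using the median-of-medians pivot."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # 1. Split into groups of 5 (the last group may be smaller);
    #    take the (lower) median of each group.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # 2. Recursively select the median of the medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)
    # 3. Partition around x; k is the rank of x in a.
    lower = [e for e in a if e < x]
    upper = [e for e in a if e > x]
    k = len(lower) + 1
    if i == k:
        return x
    elif i < k:
        return select(lower, i)
    else:
        return select(upper, i - k)

print(select([9, 3, 7, 1, 5, 8, 2, 6, 4, 0], 4))  # 3 (the 4th smallest)
```

As on the slides, the pivot is excluded from both recursive subproblems, and the elements are assumed distinct.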
Choosing Pivot
1. Divide the n elements into groups of 5.
32
Choosing Pivot
(Figure: each 5-element group drawn as a column, with lesser elements below and greater elements above.)
1. Divide the n elements into groups of 5. Find the median
of each 5-element group.
33
Choosing Pivot
(Figure: x marks the median of the group medians.)
1. Divide the n elements into groups of 5. Find the median
of each 5-element group.
2. Recursively SELECT the median x of the ⌈n/5⌉ group
medians to be the pivot.
34
Choosing Pivot
3. At least half the group medians are ≤ x, which is at
least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians.
• Therefore, at least 3⌊n/10⌋ elements are ≤ x.
• Similarly, at least 3⌊n/10⌋ elements are ≥ x.
35
Choosing Pivot
(Figure: the ⌈n/5⌉ groups of 5 elements each, the last group holding the n mod 5 leftover elements; arrows point from larger to smaller elements; the median-of-medians x separates the elements < x from the elements > x.)
36
Interesting Results
• Assumption: Elements are distinct. Why?
• At least half of the ⌈n/5⌉ medians are greater than or equal to x, the
median of medians.
• Thus, at least half of the ⌈n/5⌉ groups contribute 3 elements that are
greater than x.
– The last group and the group containing x may contribute fewer
than 3 elements; exclude these two groups.
• Hence, the no. of elements > x is at least
3 (⌈(1/2) ⌈n/5⌉⌉ – 2) ≥ 3n/10 – 6.
• Symmetrically, the no. of elements < x is at least 3n/10 – 6.
• Thus, in the worst case, Select is called recursively on at most 7n/10 + 6
elements.
37
Working with Recursion
SELECT(i, n)
1. Divide the n elements into groups of 5; find the median
of each 5-element group (insertion sort).   [Θ(n)]
2. Recursively SELECT the median x of the ⌈n/5⌉
group medians to be the pivot.   [T(n/5)]
3. Partition around the pivot x. Let k = rank(x).   [Θ(n)]
4. if i = k then return x
   elseif i < k
      then recursively SELECT the ith smallest
      element in the lower part   [T(7n/10)]
   else recursively SELECT the (i – k)th smallest
      element in the upper part
38
The Recurrence Equation
• T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)
– T(⌈n/5⌉): time to find the median of medians.
– T(7n/10 + 6): the remaining part of the problem.
– O(n): dividing, finding the medians, and partitioning around the
median of medians.
• This turns out to be linear.
• Assume T(n) = Θ(1) for n < 140.
39
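As a quick numeric sanity check that this recurrence grows linearly (my own sketch; the base-case value 1 and the unit cost n stand in for the Θ(1) and O(n) terms):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # T(n) = T(ceil(n/5)) + T(7n/10 + 6) + n, with T(n) = 1 for n < 140
    if n < 140:
        return 1
    return T(-(-n // 5)) + T(7 * n // 10 + 6) + n

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, T(n) / n)  # the ratio stays bounded, consistent with T(n) = O(n)
```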
Solving the recurrence
• To show: T(n) = O(n), i.e., T(n) ≤ cn for a suitable c and all n > 0.
• Assume: T(n) ≤ cn for a suitable c and all n < 140 (base case).
• Substituting the inductive hypothesis into the recurrence:
T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (–cn/10 + 7c + an)
     ≤ cn, if –cn/10 + 7c + an ≤ 0.
• –cn/10 + 7c + an ≤ 0 ⇔ c ≥ 10a · n/(n – 70), when n > 70.
• For n ≥ 140, n/(n – 70) ≤ 2, so c ≥ 20a suffices.
• n/(n – 70) is a decreasing function of n. Verify.
• Hence, c can be chosen for any n₀ > 70, provided it can be assumed
that T(n) = O(1) for n ≤ n₀.
• Thus, Select has linear-time complexity in the worst case.
40
Guaranteeing a Good Split.
Summary
• We will have a good split if we can ensure that the pivot is the median
element or an element close to the median.
• Hence, determining a reasonable pivot is the first step.
• Which element do you want to partition around?
• Worst-case: elements are sorted
• Does RANDOMIZED-SELECT guarantee a good split?
• Idea: recursively find the median of medians
1. Divide the elements into groups of 5.
2. Find the median of each 5-element group.
3. Find the median of those medians.
4. Partition around that median.
5. The partition point will have k elements to the left and
n – k elements to the right. Make a decision based on how i compares with k.
41
Summary
• Finding the ith order statistic using sorting takes Θ(n lg n).
• Not necessary to sort everything
• Can leverage off of the PARTITION method from
QUICKSORT
• You can guarantee a good split by finding the median of
medians
42