Divide-and-Conquer


Divide-and-Conquer
Dr. Yingwu Zhu
P65-74, p83-88, p93-96, p170-180
Divide-and-Conquer
The most well-known algorithm design technique:
1. Divide instance of problem into two or more
smaller instances
• Recursive case
2. Solve smaller instances independently and
recursively
• When to stop? – base case
3. Obtain solution to original (larger) instance by
combining these solutions
Divide-and-Conquer Technique (cont.)
[Figure: a problem of size n is divided into subproblem 1 of size n/2 and subproblem 2 of size n/2; the solution to each subproblem is obtained, and the two solutions are combined into a solution to the original problem.]
A General Template
// S is a problem instance of size n
Algorithm divide_and_conquer(S)
    if (S is small enough to handle)
        solve it                      // base case: conquer
    else
        split S into two (equally sized) subproblems S1 and S2
        divide_and_conquer(S1)
        divide_and_conquer(S2)
        combine the solutions to S1 and S2
    endif
end
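As an illustration of this template (a minimal Python sketch added here, not from the slides; the function name dc_sum and the choice of problem are assumptions), summing a list fits the pattern directly: split, solve each half recursively, combine with +.
# Minimal sketch of the template above, applied (as an assumed example) to summing a list.
def dc_sum(values):
    if len(values) <= 1:                  # base case: small enough to handle
        return values[0] if values else 0
    mid = len(values) // 2                # split S into S1 and S2
    return dc_sum(values[:mid]) + dc_sum(values[mid:])   # solve recursively and combine
print(dc_sum([8, 3, 2, 9, 7, 1, 5, 4]))   # -> 39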
General Divide-and-Conquer Recurrence
• Recursive algorithms are a natural fit for
divide-and-conquer
– Distinguish from Dynamic Programming
• Recall the efficiency analysis of recursive algorithms from Lecture 1
– Key: the recurrence relation
– Solving it by backward substitution is often cumbersome!
General Divide-and-Conquer Recurrence
T(n) = aT(n/b) + f(n)
where f(n) ∈ Θ(n^d), d ≥ 0; f(n) accounts for the time spent on dividing the problem into smaller ones and combining their solutions
Master Theorem: If a < b^d, T(n) ∈ Θ(n^d)
                If a = b^d, T(n) ∈ Θ(n^d log n)
                If a > b^d, T(n) ∈ Θ(n^(log_b a))
Note: The same results hold with O instead of Θ.
Examples: T(n) = 4T(n/2) + n    ⇒ T(n) ∈ ?
          T(n) = 4T(n/2) + n^2  ⇒ T(n) ∈ ?
          T(n) = 4T(n/2) + n^3  ⇒ T(n) ∈ ?
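(For instance, the mergesort recurrence T(n) = 2T(n/2) + n has a = 2, b = 2, d = 1; since a = b^d, the theorem gives T(n) ∈ Θ(n log n).)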
Divide-and-Conquer Examples
• Sorting: mergesort and quicksort
• Maximum subarray problem
• Multiplication of large integers
• Closest-pair problem
• Matrix multiplication: Strassen’s algorithm (reading)
Mergesort
• Question #1: what makes mergesort distinct
from many other sorting algorithms?
– It can be used both as an internal and as an external sorting algorithm
• Question #2: How to design mergesort using
divide-and-conquer technique?
Mergesort
• Split array A[0..n-1] into two about equal halves and
make copies of each half in arrays B and C
• Sort arrays B and C recursively
• Q: when to stop?
• Merge sorted arrays B and C into array A as follows:
– Repeat the following until no elements remain in one of the arrays:
• compare the first elements in the remaining unprocessed
portions of the arrays
• copy the smaller of the two into A, while incrementing the
index indicating the unprocessed portion of that array
– Once all elements in one of the arrays are processed, copy the
remaining unprocessed elements from the other array into A.
Pseudocode of Mergesort
Pseudocode of Merge
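The pseudocode figures are not reproduced in this transcript; the following Python sketch is one illustrative way to write them, following the description above (the names merge_sort and merge are this sketch's own).
def merge_sort(a):
    # Top-down mergesort: split into halves, sort each recursively, merge the copies.
    if len(a) <= 1:               # base case: 0 or 1 element is already sorted
        return a
    mid = len(a) // 2
    b = merge_sort(a[:mid])       # sorted copy of the left half
    c = merge_sort(a[mid:])       # sorted copy of the right half
    return merge(b, c)
def merge(b, c):
    # Repeatedly copy the smaller of the two front elements into the result,
    # then append whatever remains in the other array.
    result, i, j = [], 0, 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            result.append(b[i]); i += 1
        else:
            result.append(c[j]); j += 1
    result.extend(b[i:])
    result.extend(c[j:])
    return result
print(merge_sort([8, 3, 2, 9, 7, 1, 5, 4]))   # -> [1, 2, 3, 4, 5, 7, 8, 9]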
Mergesort Example
[Figure: mergesort trace on the array 8 3 2 9 7 1 5 4. The array is recursively split into 8 3 2 9 and 7 1 5 4, then into 8 3, 2 9, 7 1, 5 4, and finally into single elements; merging the pieces back up gives 3 8, 2 9, 1 7, 4 5, then 2 3 8 9 and 1 4 5 7, and finally the sorted array 1 2 3 4 5 7 8 9.]
Analysis of Mergesort
• Time efficiency by recurrence relation:
T(n) = 2T(n/2) + f(n)
The merge operation makes n-1 comparisons in the worst case!
T(n) = Θ(n log n)
• Number of comparisons in the worst case is close to
theoretical minimum for comparison-based sorting:
log2 n! ≈ n log2 n - 1.44n (Section 11.2)
• Space requirement: Θ(n) extra space (not in-place)
• Can be implemented without recursion (bottom-up)
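A sketch of that non-recursive, bottom-up variant (illustrative; it reuses merge from the earlier sketch): runs of width 1, 2, 4, … are merged until one sorted run remains.
def merge_sort_bottom_up(a):
    # Bottom-up mergesort: repeatedly merge adjacent runs of width 1, 2, 4, ...
    a = list(a)
    width = 1
    while width < len(a):
        for lo in range(0, len(a), 2 * width):
            mid, hi = lo + width, min(lo + 2 * width, len(a))
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])   # merge() from the sketch above
        width *= 2
    return a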
Mergesort: A big picture
• Problem: Assume you want to sort X terabytes
(even petabytes) of data using a cluster of M
(even thousands of) computers.
• Solution?
Similar to Search Engines in some aspects:
data partitioning!
Exercises #1
a. Write pseudocode for a divide-and-conquer algorithm for
finding the position of the largest element in an array of n
numbers.
b. Set up and solve a recurrence relation for the number of key
comparisons made by your algorithm.
c. How does this algorithm compare with the brute-force
algorithm for this problem?
Quicksort
• Select a pivot (partitioning element) – here, the first
element for simplicity!
• Rearrange the list so that all the elements in the first s
positions are smaller than or equal to the pivot and all
the elements in the remaining n-s positions are larger
than the pivot (see next slide for an algorithm)
[Partition diagram:  p | A[i] ≤ p | A[i] > p]
• Exchange the pivot with the last element in the first
(i.e., ≤ p) subarray; the pivot is now in its final position
• Sort the two subarrays recursively
Quicksort
• Basic operation: split/divide
– It differs from the divide operation in mergesort
– What is the major difference?
• Each split places the pivot in its final position, with every element in the left sublist ≤ every element in the right sublist
• No explicit merge step is needed
Partitioning Algorithm
[Pseudocode figure for the partition procedure is not reproduced; its annotations mark where the two scans stop: A[i] > p and A[j] <= p.]
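In place of the missing figure, here is an illustrative Python sketch of a two-index partition with the first element as pivot, and the quicksort that uses it (the names partition and quicksort are this sketch's own).
def partition(a, lo, hi):
    # i scans right for an element >= pivot, j scans left for an element <= pivot;
    # out-of-place pairs are swapped until the indices cross.
    p = a[lo]                        # pivot: first element, for simplicity
    i, j = lo, hi + 1
    while True:
        i += 1
        while i <= hi and a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]        # place the pivot in its final position
    return j                         # s: the pivot's final index
def quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        s = partition(a, lo, hi)     # split; no combine/merge step is needed
        quicksort(a, lo, s - 1)
        quicksort(a, s + 1, hi)
data = [8, 2, 13, 5, 14, 3, 7]
quicksort(data)
print(data)                          # -> [2, 3, 5, 7, 8, 13, 14]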
Quicksort Example
8, 2, 13, 5, 14, 3, 7
Analysis of Quicksort
• Best case: T(n) =?
• Worst case: T(n) =?
Analysis of Quicksort
• Best case: split in the middle — Θ(n log n)
• Worst case: sorted array! — Θ(n2)
• Average case: random arrays — Θ(n log n)
– Assume the split can happen in each position with equal
probability! See textbook for details!
• Improvements:
– better pivot selection: median-of-three partitioning
– switch to insertion sort on small sublists
– elimination of recursion
This combination yields a 20-25% improvement
• Considered the method of choice for internal
sorting of large files (n ≥ 10000)
Randomized Quicksort
• Many regard the randomized quicksort as the
sorting algorithm of choice for large inputs
• Random sampling:
– Select a randomly-chosen element from the
subarray as the pivot
• Expected running time: O(n log n)
– Chapter 7.4, p180
Randomized Quicksort
randomized_partition(A[l..r])
    pos ← l + rand() % (r - l + 1)   // pick a pivot position uniformly at random from l..r
    swap(A[l], A[pos])               // move the random pivot to the front
    return partition(A[l..r])        // then call the default partition
Questions for Quicksort
• Q1: How to implement the median-of-three
rule by reusing the previous implementation
of the split operation?
• Q2: How to implement a non-recursive
quicksort?
Divide-and-Conquer Examples
• Sorting: mergesort and quicksort
• Binary search (a degenerate case)
• Binary tree traversals
• Multiplication of large integers
• Matrix multiplication: Strassen's algorithm
• Closest-pair algorithm
Maximum Subarray Problem
Problem: given an array of n numbers, find a contiguous
subarray whose sum has the largest value.
Application: an unrealistic stock market game, in which you
decide when to buy and sell a stock, with full knowledge of
the past and future. The restriction is that you can perform
just one buy followed by one sell. The buy and sell both occur
right after the close of the market.
The interpretation of the numbers: each number represents the
stock value at closing on any particular day.
Maximum Subarray Problem
Example:
Maximum Subarray Problem
Another Example: buying low and selling high,
even with perfect knowledge, is not trivial:
A brute-force solution
• An O(n^2) solution
• It considers all C(n,2) = n(n-1)/2 pairs (a sketch follows this slide)
• Not a pleasant prospect if we are rummaging
through long time-series (Who told you it was
easy to get rich???), even if you are allowed to
post-date your stock options…
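A brute-force sketch in Python (illustrative function name and prices, not from the slides): try every buy day i and every later sell day j, keeping the best difference.
def best_single_trade(prices):
    # O(n^2) brute force over all C(n,2) buy/sell day pairs.
    best = 0
    for i in range(len(prices)):              # candidate buy day
        for j in range(i + 1, len(prices)):   # candidate (later) sell day
            best = max(best, prices[j] - prices[i])
    return best
print(best_single_trade([100, 113, 110, 85, 105, 102, 86, 63, 81, 101]))   # -> 38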
A Better Solution: Max Subarray
Transformation: Instead of the daily price, let us
consider the daily change: A[i] is the difference
between the closing value on day i and that on day i-1.
The problem becomes that of finding a contiguous
subarray the sum of whose values is maximum.
• On a first look this seems even worse: roughly the same
number of intervals (one fewer, to be precise), and the
requirement to add the values in the subarray rather than just
computing a difference:
Ω(n^3)?
Max Subarray
How do we divide?
We observe that a maximum contiguous subarray A[i…j] must be located as
follows:
1. It lies entirely in the left half of the original array: [low…mid];
2. It lies entirely in the right half of the original array: [mid+1…high];
3. It straddles the midpoint of the original array: i ≤ mid < j.
Max Subarray: Divide & Conquer
• The “left” and “right” subproblems are smaller versions of the
original problem, so they are part of the standard Divide &
Conquer recursion.
• The “middle” subproblem is not, so we will need to count its
cost as part of the “combine” (or “divide”) cost.
– The crucial observation (and it may not be entirely obvious) is that we
can find the maximum crossing subarray in time linear in the length
of the A[low…high] subarray.
How? A crossing subarray A[i…j] must be made up of A[i…mid] and A[mid+1…j], so
we find the largest-sum A[i…mid] (ending at mid) and the largest-sum A[mid+1…j]
(starting at mid+1) and combine them.
The middle subproblem
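An illustrative Python sketch of this linear-time step (names assumed; it requires low ≤ mid < high): the best crossing subarray is the best suffix of A[low..mid] glued to the best prefix of A[mid+1..high].
def max_crossing_subarray(a, low, mid, high):
    # Best suffix of a[low..mid], which must end at mid.
    left_sum, total, max_left = float("-inf"), 0, mid
    for i in range(mid, low - 1, -1):
        total += a[i]
        if total > left_sum:
            left_sum, max_left = total, i
    # Best prefix of a[mid+1..high], which must start at mid+1.
    right_sum, total, max_right = float("-inf"), 0, mid + 1
    for j in range(mid + 1, high + 1):
        total += a[j]
        if total > right_sum:
            right_sum, max_right = total, j
    return max_left, max_right, left_sum + right_sum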
Algorithms: Max Subarray
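A sketch of the full recursive algorithm (illustrative; it reuses max_crossing_subarray from above) returning (start, end, sum) of a maximum subarray; the sample array of daily changes is just an example.
def max_subarray(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low == high:                                    # base case: one element
        return low, high, a[low]
    mid = (low + high) // 2
    left = max_subarray(a, low, mid)                   # case 1: entirely in left half
    right = max_subarray(a, mid + 1, high)             # case 2: entirely in right half
    cross = max_crossing_subarray(a, low, mid, high)   # case 3: straddles the midpoint
    return max(left, right, cross, key=lambda t: t[2]) # keep the largest sum
changes = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
print(max_subarray(changes))                           # -> (7, 10, 43)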
Max Subarray: Algorithm Efficiency
We finally have:
1
if n  1,
T n   
2T n /2  n  if n  1.
The recurrence has the same form as that for
MERGESORT, and thus we should expect it to have
the same
solution T(n) = (n lg n).
This algorithm is clearly substantially faster than any of
the brute-force methods. It required some
cleverness, and the programming is a little more
complicated – but the payoff is large.
Multiplication of Large Integers
Consider the problem of multiplying two (large)
n-digit integers represented by arrays of their
digits such as:
A = 12345678901357986429
B = 87654321284820912836
Multiplication of Large Integers
Consider the problem of multiplying two (large) n-digit
integers represented by arrays of their digits such as:
A = 12345678901357986429 B = 87654321284820912836
The grade-school algorithm:
   a1  a2  …  an
×  b1  b2  …  bn
(d10) d11 d12 … d1n
(d20) d21 d22 … d2n
…
(dn0) dn1 dn2 … dnn
Efficiency: n^2 one-digit multiplications
Multiplication of Large Integers
Discussion: How to apply “divide-and-conquer”
to this problem?
First Cut
A small example: A  B where A = 2135 and B = 4014
A = (21·102 + 35), B = (40 ·102 + 14)
So, A  B = (21 ·102 + 35)  (40 ·102 + 14)
= 21  40 ·104 + (21  14 + 35  40) ·102 + 35  14
In general, if A = A1A2 and B = B1B2 (where A and B are
n-digit, A1, A2, B1, B2 are n/2-digit numbers),
A  B = A1  B1·10n + (A1  B2 + A2  B1) ·10n/2 + A2  B2
Recurrence for the number of one-digit multiplications
T(n):
T(n) = 4T(n/2), T(1) = 1
Solution: T(n) = n2
Second Cut
A  B = A1  B1·10n + (A1  B2 + A2  B1) ·10n/2 + A2  B2
The idea is to decrease the number of multiplications from 4
to 3:
(A1 + A2 )  (B1 + B2 ) = A1  B1 + (A1  B2 + A2  B1) + A2  B2,
i.e., (A1  B2 + A2  B1) = (A1 + A2 )  (B1 + B2 ) - A1  B1 - A2  B2,
which requires only 3 multiplications at the expense of (4-1)
extra add/sub.
Recurrence for the number of multiplications T(n):
T(n) = 3T(n/2), T(1) = 1
Solution: T(n) = 3log 2n = nlog 23 ≈ n1.585
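An illustrative Python sketch of this three-multiplication scheme on nonnegative decimal integers (the name karatsuba and the splitting details are this sketch's assumptions):
def karatsuba(a, b):
    # Multiply nonnegative integers using 3 recursive half-size products.
    if a < 10 or b < 10:                     # base case: a one-digit factor
        return a * b
    half = max(len(str(a)), len(str(b))) // 2
    a1, a2 = divmod(a, 10 ** half)           # a = a1*10^half + a2
    b1, b2 = divmod(b, 10 ** half)           # b = b1*10^half + b2
    p1 = karatsuba(a1, b1)                   # A1 x B1
    p2 = karatsuba(a2, b2)                   # A2 x B2
    p3 = karatsuba(a1 + a2, b1 + b2)         # (A1 + A2) x (B1 + B2)
    return p1 * 10 ** (2 * half) + (p3 - p1 - p2) * 10 ** half + p2
print(karatsuba(2135, 4014))                 # -> 8569890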
Integer Multiplication
• To multiply two n-digit integers:
– Add two ½n digit integers.
– Multiply three ½n-digit integers.
– Add, subtract, and shift ½n-digit integers to obtain result.
x  2 n / 2  x1
y  2 n / 2  y1
xy  2 n  x1 y1
 2 n  x1 y1




x0
y0
2 n / 2  x1 y0  x0 y1   x0 y0
2 n / 2   (x1  x0 ) (y1  y0 )  x1 y1  x0 y0   x0 y0
A
B
A
C
C

T(n)  T  n /2   T  n /2   T 1 n /2  
recursive calls
 T(n)  O(n
44
log 2 3
)  O(n1.585 )
(n)
add, subtract, shift
Large-Integer Multiplication
• Questions:
• What if the two large numbers have different numbers of digits?
• What if n is an odd number?
Closest-Pair Problem
• S is a set of n points Pi=(xi, yi) in the plane
• For simplicity, n is a power of two
• Without loss of generality, we assume points
are ordered by their x coordinates
• Discussion: How to apply divide-and-conquer?
Closest-Pair Problem by Divide-and-Conquer
Step 1
Divide the points given into two subsets S1 and S2 by a
vertical line x = c so that half the points lie to the left or on the
line and half the points lie to the right or on the line.
Closest Pair by Divide-and-Conquer (cont.)
Step 2 Find recursively the closest pairs for the left and right
subsets.
Step 3 Set d = min{d1, d2}
We can limit our attention to the points in the symmetric
vertical strip of width 2d as candidates for a closer pair. Let C1
and C2 be the subsets of points in the left subset S1 and of
the right subset S2, respectively, that lie in this vertical
strip. The points in C1 and C2 are stored in increasing
order of their y coordinates, which is maintained by
merging during the execution of the next step.
Step 4 For every point P(x,y) in C1, we inspect points in
C2 that may be closer to P than d. There can be no more
than 6 such points (because d ≤ d2)!
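A condensed Python sketch of Steps 1-4 (illustrative names; for brevity it treats the whole strip at once and re-sorts it by y at each level, giving O(n log^2 n), rather than maintaining y-order by merging as Step 3 describes):
from math import dist, inf
def closest_pair(points):
    pts = sorted(points)                     # points as (x, y) tuples, ordered by x
    return _closest(pts)
def _closest(pts):
    n = len(pts)
    if n <= 3:                               # small base case: brute force
        return min((dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]),
                   default=inf)
    mid = n // 2
    c = pts[mid][0]                          # Step 1: vertical dividing line x = c
    d = min(_closest(pts[:mid]), _closest(pts[mid:]))     # Steps 2-3: d = min{d1, d2}
    strip = sorted((p for p in pts if abs(p[0] - c) < d), key=lambda p: p[1])
    for i, p in enumerate(strip):            # Step 4: only a few strip points can beat d
        for q in strip[i + 1:i + 8]:
            if q[1] - p[1] >= d:             # y-distance already >= d: stop scanning
                break
            d = min(d, dist(p, q))
    return d
print(closest_pair([(2, 3), (12, 30), (40, 50), (5, 1), (12, 10), (3, 4)]))   # ~ 1.414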
Closest Pair by Divide-and-Conquer: Worst Case
The worst case scenario is depicted below:
Efficiency of the Closest-Pair Algorithm
Running time of the algorithm is described by
T(n) = 2T(n/2) + f(n), where f(n)  O(n)
By the Master Theorem (with a = 2, b = 2, d = 1)
T(n)  O(n log n)