Introduction - Ceng Anadolu


Today’s Material
• Lower Bounds on Comparison-based Sorting
• Linear-Time Sorting Algorithms
– Counting Sort
– Radix Sort
How Fast Can We Sort?
• Basic Sorting Algorithms
  – BubbleSort, InsertionSort and SelectionSort all run in O(N^2)
• Fast Sorting Algorithms
  – Mergesort, Heapsort and Quicksort all run in O(N log N) best-case time
• Can we do any better?
  – Can we come up with a sorting algorithm that sorts in O(N log log N)? What about O(N)?
Lower Bound on Comparison-based Sorting
• Recall our basic assumption: we can only compare two elements at a time
  – Suppose you are given N elements
  – How many possible orderings can you get?
• Example: a, b, c (N = 3)
  – How many distinct sequences exist?
  – N = 3: we have 3·2·1 = 3! = 6 orderings:
    1. a b c   2. b c a   3. c a b
    4. a c b   5. b a c   6. c b a
• For N elements:
  – N choices · (N-1) choices · (N-2) choices · … · 1 choice = N! orderings
A “Decision Tree” for Sorting N=3 Elements
[Figure: decision tree for sorting three elements a, b, c. The root holds all six possible orderings (a<b<c, b<c<a, c<a<b, a<c<b, b<a<c, c<b<a); each comparison (a vs. b, a vs. c, b vs. c) splits the remaining orderings between its two children, and each leaf holds exactly one ordering. Leaves contain all possible orderings of a, b, c.]
Decision Trees and Sorting
• A Decision Tree for Sorting is a binary tree such that:
  – Each node = a set of orderings
  – Each edge = 1 comparison
  – Each leaf = 1 unique ordering
• How many leaves for N distinct elements?
  – Only 1 leaf has the correct sorted ordering for the given input
• Each sorting algorithm corresponds to a decision tree
  – It finds the correct leaf by following edges (= comparisons)
• Run time >= maximum number of comparisons
  – Depends on the depth of the decision tree
  – What is the depth of a decision tree for N distinct elements?
Decision Trees and Sorting
• Suppose you have a binary tree of depth d. How many leaves can the tree have?
  – E.g., depth = 1 → at most 2 leaves
  – Depth = 2 → at most 4 leaves, etc.
  – Depth = d → at most 2^d leaves. Easy to prove.
• Number of leaves L <= 2^d  →  d >= log(L)
  – The decision tree has L = N! leaves
  – So its depth d >= log(N!)
  – What is log(N!)?
    log(N!) = log N + log(N-1) + … + log(N/2) + … + log 1
            >= log N + log(N-1) + … + log(N/2)      (keeping only the first N/2 terms)
            >= (N/2)·log(N/2) = Ω(N log N)
• Result: Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons
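To make the bound concrete, here is a small, purely illustrative C snippet that tabulates log2(N!) — by the decision-tree argument above, a lower bound on the worst-case number of comparisons — next to N·log2(N) for small N:

#include <stdio.h>
#include <math.h>

/* A decision tree with N! leaves has depth >= log2(N!), so any
   comparison sort needs at least that many comparisons in the worst case. */
int main(void) {
    double log2_fact = 0.0;
    for (int n = 1; n <= 12; n++) {
        log2_fact += log2(n);                 /* log2(n!) = sum of log2(i) for i = 1..n */
        printf("N = %2d   log2(N!) = %6.2f   N*log2(N) = %6.2f\n",
               n, log2_fact, n * log2((double)n));
    }
    return 0;
}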
Using the Lower Bound for Sorting
• The lower bound that we proved for sorting is in fact used to prove lower bounds for other problems. Here is how:
  – Say you have a problem for which you want to prove a lower bound
  – If we can reduce the sorting problem to the problem to be solved, then we have essentially proved that the lower bound for the new problem is N log N. Why?
    • Because if we could solve your problem in less time than N log N, then we could solve sorting in less time than N log N. And we have just proved that that is not possible!
Example (NlogN) Problems
• Convex Hull
  – Given a set of points in the plane, find the smallest convex polygon that encloses them
• Closest Pair
  – Given a list of n elements, which pair are closest in value?
• Element Uniqueness
  – Given a list of n numbers, are there any duplicates in the list?
What about Counting Sort?
• Problem: Sort integers in the range 1 to B
• Algorithm (a C sketch follows below):
  1. Allocate an array Count having B slots ("buckets")
  2. Initialize: Count[i] = 0 for i = 1 to B
  3. For each input integer i, do Count[i]++
  4. After reading all inputs, scan Count[i] for i = 1 to B and print i once for each occurrence (i.e., Count[i] times)
• Ex: Sort the following integers in the range 1 to 9
  – 4 2 5 5 9 8 8 3 1 2
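A minimal C sketch of this counting idea, using the example input from this slide (the array and variable names are illustrative, not part of the original):

#include <stdio.h>

#define B 9   /* keys are in the range 1..B */

int main(void) {
    int input[] = {4, 2, 5, 5, 9, 8, 8, 3, 1, 2};   /* example from the slide */
    int n = sizeof(input) / sizeof(input[0]);
    int count[B + 1] = {0};                          /* count[i] = occurrences of i */

    for (int j = 0; j < n; j++)                      /* step 3: count each input value */
        count[input[j]]++;

    for (int i = 1; i <= B; i++)                     /* step 4: emit i once per occurrence */
        while (count[i]-- > 0)
            printf("%d ", i);
    printf("\n");                                    /* prints: 1 2 2 3 4 5 5 8 8 9 */
    return 0;
}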
What if A holds records with integer key?
// A[1..n]    : Holds the initial input. A[j].key is the integer key value
//              on which to sort; A[j].data is the other satellite data.
// B[1..n]    : Holds the sorted output.
// Count[1..k]: Count[i] is the rank of i, that is, the number of
//              elements of A whose key value is less than or equal to i.
// k          : Number of buckets (the B of the previous slide, renamed here
//              to avoid clashing with the output array B).

CountingSort(A, B, n, k) {
  for i = 1 to k do Count[i] = 0;
  for j = 1 to n do Count[A[j].key]++;
  for i = 2 to k do Count[i] += Count[i-1];
  for j = n downto 1 do {
    i = A[j].key;
    B[Count[i]] = A[j];
    Count[i]--;
  } //end-for
} //end-CountingSort
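For concreteness, here is one way the pseudocode above could look as runnable C; the record type, field names, and 0-based indexing are assumptions made for illustration, not part of the original:

#include <stdio.h>
#include <stdlib.h>

typedef struct { int key; char data; } Record;   /* illustrative record type */

/* Stable counting sort of A[0..n-1] into Bout[0..n-1]; keys are in 1..k */
void counting_sort(const Record *A, Record *Bout, int n, int k) {
    int *count = calloc(k + 1, sizeof(int));      /* count[i] = #records with key i */
    for (int j = 0; j < n; j++)
        count[A[j].key]++;
    for (int i = 2; i <= k; i++)                  /* prefix sums: count[i] = rank of key i */
        count[i] += count[i - 1];
    for (int j = n - 1; j >= 0; j--) {            /* scan backwards to keep the sort stable */
        int i = A[j].key;
        Bout[count[i] - 1] = A[j];                /* -1 because C arrays are 0-based */
        count[i]--;
    }
    free(count);
}

int main(void) {
    Record A[] = {{1,'a'}, {4,'e'}, {3,'r'}, {1,'s'}, {3,'v'}};   /* input from the next slide */
    Record Bout[5];
    counting_sort(A, Bout, 5, 4);
    for (int j = 0; j < 5; j++)
        printf("%d:%c ", Bout[j].key, Bout[j].data);              /* 1:a 1:s 3:r 3:v 4:e */
    printf("\n");
    return 0;
}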
Counting Sort Example
Input A (key:data), n = 5, keys in the range 1..4:
  A = 1:a  4:e  3:r  1:s  3:v

Count array, indexed 1..4:
  After the counting pass:     Count = 2 0 2 1
  After the prefix-sum pass:   Count = 2 2 4 5

Placing records into B from j = n down to 1:
  j=5 (3:v) -> B[4] = 3:v,  Count = 2 2 3 5
  j=4 (1:s) -> B[2] = 1:s,  Count = 1 2 3 5
  j=3 (3:r) -> B[3] = 3:r,  Count = 1 2 2 5
  j=2 (4:e) -> B[5] = 4:e,  Count = 1 2 2 4
  j=1 (1:a) -> B[1] = 1:a,  Count = 0 2 2 4

Sorted output:
  B = 1:a  1:s  3:r  3:v  4:e
Counting Sort Running Time
• What is the running time for sorting N integers using counting sort?
  – Running Time: O(B+N)
  – B to zero/scan the Count array and N to read the input
  – If B is O(N), then the running time for counting sort = O(N)
  – Doesn't this violate the Ω(N log N) lower bound result?
  – No – when we do Count[i]++, we are effectively comparing one element against all B buckets, not just two elements
    • Not a regular 2-way comparison-based sort
Radix Sort: Stable Counting Sort
• Problem: What if the number of buckets needed is too large?
• Recall: Stable sort = a sort that does not change the order of items with the same key
• Radix sort = stable bucket sort on "slices" of the key
  1. Divide integers/strings into digits/characters
  2. Bucket-sort from the least significant to the most significant digit/character
  – Uses linked lists
  – Stability ensures keys already sorted stay sorted
  – Takes O(P(B+N)) time where P = number of digits
  (A C sketch using a counting pass per digit follows below.)
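Here is a minimal C sketch of the idea, assuming non-negative integer keys; it uses a stable counting pass per decimal digit (an array-based stand-in for the linked-list buckets mentioned above), and the function and variable names are illustrative only:

#include <stdio.h>
#include <string.h>

#define BASE 10

/* One stable counting pass on the digit selected by exp (1, 10, 100, ...) */
static void counting_pass(int *a, int n, int exp) {
    int output[n], count[BASE] = {0};
    for (int j = 0; j < n; j++)                      /* count occurrences of each digit */
        count[(a[j] / exp) % BASE]++;
    for (int d = 1; d < BASE; d++)                   /* prefix sums -> final positions */
        count[d] += count[d - 1];
    for (int j = n - 1; j >= 0; j--) {               /* backwards scan keeps the pass stable */
        int d = (a[j] / exp) % BASE;
        output[--count[d]] = a[j];
    }
    memcpy(a, output, n * sizeof(int));
}

/* P passes, least significant digit first; total O(P(B+N)) */
void radix_sort(int *a, int n, int p) {
    int exp = 1;
    for (int pass = 0; pass < p; pass++, exp *= BASE)
        counting_pass(a, n, exp);
}

int main(void) {
    int a[] = {576, 494, 194, 296, 278, 176, 954};   /* example from the next slide */
    int n = sizeof(a) / sizeof(a[0]);
    radix_sort(a, n, 3);                             /* 3-digit keys -> 3 passes */
    for (int j = 0; j < n; j++) printf("%d ", a[j]); /* 176 194 278 296 494 576 954 */
    printf("\n");
    return 0;
}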
Radix Sort Example
Input       Pass 1            Pass 2            Pass 3            Sorted
            (last digit)      (middle digit)    (first digit)

576         49[4]             9[5]4             [1]76             176
494         19[4]             5[7]6             [1]94             194
194         95[4]             1[7]6             [2]78             278
296         57[6]             2[7]8             [2]96             296
278         29[6]             4[9]4             [4]94             494
176         17[6]             1[9]4             [5]76             576
954         27[8]             2[9]6             [9]54             954
Radix Sort Running Time
• P = # of digits → P iterations over the data set
• Each iteration takes O(B+N) time
  – Total: O(P(B+N)) time where P = number of digits
• E.g., N numbers in base 10 in the range 0-999999 (max. 6 digits)
  – Total: O(6·(10+N))
  – Total: O(60 + 6N) = O(N)
Summary of Sorting
• Sorting choices:
  – O(N^2) – Bubblesort, Selection Sort, Insertion Sort
  – O(N^x) – Shellsort (x = 3/2, 4/3, 2, etc. depending on the increment sequence)
  – O(N log N) average case running time:
    • Mergesort: easy to code but uses O(N) extra space
    • Heapsort: needs 2 comparisons to move data (between 2 children and the parent) – may not be fast in practice
    • Quicksort: fastest in practice but trickier to code, O(N^2) worst case
  – O(P·N):
    • Radix sort (using Counting sort) for special cases where keys are P-digit integers/strings