Transcript Lecture 17

CSE 326: Data Structures
Sorting
Lecture 17: Wednesday, Feb 19, 2003
Today
• Bucket sort (review)
• Radix sort
• Merge sort for external memory
Lower Bound
Recall:
• Any sorting algorithm based on comparisons requires Ω(n log n) time
  – More precisely: for every such algorithm there exists a “bad input” on which it takes Ω(n log n) time
  – The theorem can be extended: the average running time is also Ω(n log n)
• The next two algorithms (Bucket, Radix) apparently break this theorem!
Bucket Sort
• Now let’s sort in O(N) time
• Assume: A[0], A[1], …, A[N-1] ∈ {0, 1, …, M-1}, where M is not too big
• Example: sort 1,000,000 person records on the first character of their last names
  – Hence M = 128 (in practice: M = 27)
Bucket Sort
Queue bucketSort(Array A, int N) {
    for k = 0 to M-1
        Q[k] = new Queue;
    for j = 0 to N-1
        Q[A[j]].enqueue(A[j]);      // equal keys keep their input order
    Result = new Queue;
    for k = 0 to M-1
        Result = Result.append(Q[k]);
    return Result;                  // stable sorting!
}
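As a runnable sketch of the pseudocode above (a hypothetical Python rendering; the `key` parameter is my addition, generalizing the slide’s last-name example):

```python
def bucket_sort(a, m, key=lambda x: x):
    """Sort items whose keys lie in 0..m-1, stably, in O(m + n) time."""
    buckets = [[] for _ in range(m)]      # Q[0] .. Q[M-1], one queue per key
    for x in a:                           # enqueue each item into its bucket
        buckets[key(x)].append(x)
    result = []
    for q in buckets:                     # concatenate buckets in key order
        result.extend(q)                  # equal keys keep input order: stable
    return result

# Sort person records on the first letter of the last name (the slide's example):
people = [("Smith", 2), ("Adams", 1), ("Smith", 1), ("Brown", 3)]
print(bucket_sort(people, 26, key=lambda p: ord(p[0][0]) - ord('A')))
# → [('Adams', 1), ('Brown', 3), ('Smith', 2), ('Smith', 1)]
```

Note how the two “Smith” records stay in their original relative order: that is the stability that Radix Sort will rely on.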
Bucket Sort
• Running time: O(M+N)
• Space: O(M+N)
• Recall that M << N, hence time = O(N)
• What about the Theorem that says sorting takes Ω(N log N)?
  – No contradiction: Bucket Sort is not based on key comparisons; instead it exploits the fact that the keys are small
Radix Sort
• I still want to sort in time O(N): non-trivial keys
• A[0], A[1], …, A[N-1] are strings
  – Very common in practice
• Each string is c_{d-1} c_{d-2} … c_1 c_0, where c_0, c_1, …, c_{d-1} ∈ {0, 1, …, M-1}; e.g. M = 128
• Other example: decimal numbers
RadixSort
• Radix = “the base of a number system” (Webster’s dictionary)
  – Alternate terminology: the radix is the number of bits needed to represent 0 to base−1; one can say “base 8” or “radix 3”
• Used in the 1890 U.S. census by Hollerith
• Idea: BucketSort on each digit, bottom up (least significant digit first).
The Magic of RadixSort
• Input list: 126, 328, 636, 341, 416, 131, 328
• BucketSort on the lower digit: 341, 131, 126, 636, 416, 328, 328
• BucketSort that result on the next-higher digit: 416, 126, 328, 328, 131, 636, 341
• BucketSort that result on the highest digit: 126, 131, 328, 328, 341, 416, 636
Inductive Proof that RadixSort Works
• Keys: d-digit numbers, base B
– (that wasn’t hard!)
• Claim: after the i-th BucketSort pass, the least significant i digits are sorted.
  – Base case: i = 0. 0 digits are (trivially) sorted.
  – Inductive step: assume the claim for i, prove it for i+1.
    Consider two numbers X, Y; write X_i for the i-th digit of X:
    • X_{i+1} < Y_{i+1}: the (i+1)-th BucketSort pass puts them in order
    • X_{i+1} > Y_{i+1}: same thing
    • X_{i+1} = Y_{i+1}: the order depends on the last i digits, which by the induction hypothesis are already sorted; since BucketSort is stable, that order is preserved
Radix Sort
Array radixSort(Array A, int N) {
    for k = 0 to d-1
        A = bucketSort(A, on position k)   // one stable pass per digit
    return A;
}
Running time: T = O(d(M+N)) = O(dN) = O(Size)
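The two routines combine into a short runnable sketch (a hypothetical Python rendering; `radix_sort` and its parameters are my names, not the course’s code):

```python
def radix_sort(a, d, base=10):
    """LSD radix sort: d passes of stable bucket sort, least significant digit first."""
    for k in range(d):                    # digit position k = 0 (lowest) .. d-1
        buckets = [[] for _ in range(base)]
        for x in a:                       # stable bucket sort on digit k
            buckets[(x // base**k) % base].append(x)
        a = [x for q in buckets for x in q]
    return a

print(radix_sort([126, 328, 636, 341, 416, 131, 328], 3))
# → [126, 131, 328, 328, 341, 416, 636]
```

Running it on the input from “The Magic of RadixSort” reproduces the intermediate lists shown on that slide after each pass.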
Radix Sort

[Figure: trace of Radix Sort on A = {35, 52, 53, 32, 25, 55, 33} using queues Q[0]–Q[9]. Bucket-sorting on the low digit gives A = {52, 32, 53, 33, 35, 25, 55}; bucket-sorting that result on the high digit gives A = {25, 32, 33, 35, 52, 53, 55}.]
Running time of Radixsort
• N items, d digit keys of max value M
• How many passes?
• How much work per pass?
• Total time?
Running time of Radixsort
• N items, d-digit keys of max digit value M
• How many passes? d
• How much work per pass? N + M
  – just in case M > N, we need to account for the time to empty out the buckets between passes
• Total time? O(d(N+M))
Radix Sort
• What is the size of the input? Size = dN

[Figure: array of N strings A[0..N-1], each d characters c_{d-1} … c_0; e.g. A[0] = “Smith”, A[1] = “Jones”.]

• Radix sort takes time O(Size) !!
Radix Sort
• Variable-length strings:

[Figure: array A[0]–A[4] of strings of varying lengths.]

• Can adapt Radix Sort to sort in time O(Size)!
  – What about our Theorem??
Radix Sort
• Suppose we want to sort N distinct numbers
• Represent them in decimal:
  – Need d = log₁₀ N digits
• Hence RadixSort takes time O(Size) = O(dN) = O(N log N)
• The total Size of N distinct keys is Ω(N log N)!
• No conflict with the theory ☺
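A quick numeric illustration of this argument (`digits_needed` is a hypothetical helper name): writing down N distinct decimal values takes d = ⌈log₁₀ N⌉ digits each, so the input Size = dN already grows like N log N:

```python
def digits_needed(n):
    """Least d such that d decimal digits can write n distinct values 0..n-1."""
    return len(str(n - 1))

for n in (10**3, 10**6, 10**9):
    # Size = d*N grows like N log N as N grows
    print(n, digits_needed(n), digits_needed(n) * n)
```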
Sorting HUGE Data Sets
• US Telephone Directory:
– 300,000,000 records
• 64-bytes per record
– Name: 32 characters
– Address: 54 characters
– Telephone number: 10 characters
– About 2 gigabytes of data
– Sort this on a machine with 128 MB RAM…
• Other examples?
Merge Sort Good for Something!
• Basis for most external sorting routines
• Can sort any number of records using a tiny
amount of main memory
– in extreme case, only need to keep 2 records in
memory at any one time!
External MergeSort
• Split input into two “tapes” (or areas of disk)
• Merge tapes so that each group of 2 records is
sorted
• Split again
• Merge tapes so that each group of 4 records is
sorted
• Repeat until data entirely sorted
log N passes
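A minimal in-memory sketch of the scheme above, with Python lists standing in for tapes (all names are hypothetical; a real implementation streams pages from disk rather than holding lists in RAM):

```python
import heapq

def external_merge_sort(data):
    """2-way external merge sort sketch: merge sorted runs pairwise, so the
    minimum run length doubles each pass (about log2 N passes for N records)."""
    runs = [[x] for x in data]                 # pass 0: runs of length 1
    passes = 0
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs) - 1, 2):   # merge adjacent pairs of runs
            merged.append(list(heapq.merge(runs[i], runs[i + 1])))
        if len(runs) % 2:                      # an odd run out carries over
            merged.append(runs[-1])
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes

print(external_merge_sort([6, 2, 9, 4, 8, 7, 5, 6, 3, 1, 2]))
# → ([1, 2, 2, 3, 4, 5, 6, 6, 7, 8, 9], 4)
```

With 11 records it takes ⌈log₂ 11⌉ = 4 merge passes, matching the “log N passes” claim.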
Sorting
• Illustrates the difference in algorithm design
when your data is not in main memory:
– Problem: sort 8Gb of data with 8Mb of RAM
• We know we can do it in O(n log n) time, but let’s
see the number of disk I/O’s
2-Way Merge-sort: Requires 3 Buffers
• Pass 1: Read a page, sort it, write it.
  – only one buffer page is used
• Pass 2, 3, …, etc.:
  – three buffer pages are used (INPUT 1, INPUT 2, OUTPUT)
  – buffer size = 8Kb (typically)

[Figure: two input buffers and one output buffer in main memory, streaming pages from disk and back to disk.]
2-Way Merge-sort
• A run = a maximal sequence of sorted elements

[Figure: a sequence of numbers partitioned into sorted runs.]

• Main property of 2-way merge:
  – If the minimum run length is L, then after the merge pass the minimum run length is 2L
• Initially: minimum run length = 1
• After one pass: = 2
• After two passes: = 4
• ... (why?)
Two-Way External Merge Sort
• Each pass we read + write each page in the file.
• N pages in the file ⇒ the number of passes is ⌈log₂ N⌉ + 1
• So the total cost is: 2N(⌈log₂ N⌉ + 1) I/O’s
• Improvement: start with larger runs
• Sort 1GB with 1MB memory in 10 passes

[Figure: input file 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2. Pass 0 sorts each page: 1-page runs 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2. Pass 1: 2-page runs 2,3,4,6 | 4,7,8,9 | 1,3,5,6 | 2. Pass 2: 4-page runs 2,3,4,4,6,7,8,9 | 1,2,3,5,6. Pass 3: one 8-page run 1,2,2,3,3,4,4,5,6,6,7,8,9.]
2-Way Merge-sort
• Hence we need about log₂ N passes through the entire data
• How much is N? N ≈ 10⁶ pages (why? 8Gb of data / 8Kb per page = 10⁶)
• Hence we need to read and write the entire 8Gb of data log₂(10⁶) ≈ 20 times!
• It takes about 1 minute to read 1Gb of data
  – often even more
• It takes at least 160 minutes ≈ 3 hours to sort
2-Way Merge-sort: less dumb
• Use the 8Mb of main memory better!
• Initial step: run formation
  – Read 8Mb of data into main memory
  – Sort it (what algorithm would you use?)
  – Write it to disk
• Now the runs are already 8Mb after one pass!
• After subsequent passes the runs are 2 × 8Mb, 4 × 8Mb, . . .
• Need log₂(8Gb/8Mb) = log₂(10³) ≈ 10 passes
• 1.5h instead of 3h: we have time to see a movie
Can We Do Better?
• We have more main memory than three buffer pages
• Use it during all passes
• Multiway merge: given M sorted sequences, merge them all at once
Multiway Merge
• At each step: select the smallest value among the m run heads and store it in the output
• What data structure should we use here?

[Figure: m sorted input runs (2, 32, 69, 94, …; 1, 9, 27, 60, 80, …; 3, 4, 5, 94, …; 10, 20, 30, 40, 50, …; 4, 4, …) being merged into a single sorted output run.]
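One good answer to the question on this slide is a min-heap (priority queue) holding the current head of each run, so selecting the smallest costs O(log m) per output element. A hypothetical Python sketch:

```python
import heapq

def multiway_merge(runs):
    """Merge m sorted runs; a min-heap holds one head element per run,
    so each output element costs O(log m)."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)    # smallest among all run heads
        out.append(value)
        if j + 1 < len(runs[i]):             # advance that run's cursor
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out

runs = [[2, 32, 69, 94], [1, 9, 27, 60, 80], [3, 4, 5, 94], [10, 20, 30, 40, 50]]
print(multiway_merge(runs))
```

The run index `i` in each heap entry breaks ties between equal values, keeping pops well-defined.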
Multiway Merge-Sort
• Phase one: load M bytes into main memory, sort
  – Result: runs of length M/B blocks each

[Figure: disk → M bytes of main memory, viewed as M/B blocks → disk. B = size of one block = 8Kb typically.]
Pass Two
• Merge m = M/B − 1 runs into a new run (one buffer page is reserved for the output)
• Runs now have (M/B)(M/B − 1) ≈ (M/B)² blocks

[Figure: M/B − 1 input buffers and one output buffer in main memory, streaming runs between disks.]
Pass Three
• Merge M/B − 1 runs into a new run
• Runs now have (M/B)(M/B − 1)² ≈ (M/B)³ blocks

[Figure: same buffer layout as in pass two.]
Multiway Merge-Sort
• Input file has N bytes; block size B, memory size M
• Need ⌈log_{M/B}(N/B)⌉ = ⌈(log(N/B)) / (log(M/B))⌉ complete passes over the file
• N/B = 8Gb / 8Kb = 10⁶
• M/B = 8Mb / 8Kb = 10³
• Hence we need log(10⁶)/log(10³) = 2 passes!
• About 32 minutes: time for two movies
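The pass count can be checked numerically with the slide’s sizes (a sketch with hypothetical names; it uses the slide’s approximation of an M/B-way merge, while strictly the fan-in is M/B − 1):

```python
def multiway_passes(n_bytes, m_bytes, b_bytes):
    """Passes of multiway merge-sort: run formation, then M/B-way merges
    (the slide's approximation; strictly the fan-in is M/B - 1, since one
    buffer page is reserved for the output)."""
    n_blocks = n_bytes // b_bytes      # N/B blocks in the file
    fan_in = m_bytes // b_bytes        # M/B buffer pages
    run_blocks = fan_in                # after run formation: runs of M/B blocks
    passes = 1                         # run formation is the first pass
    while run_blocks < n_blocks:       # each merge pass multiplies run length
        run_blocks *= fan_in
        passes += 1
    return passes

# The slide's sizes: 8Gb file, 8Mb of RAM, 8Kb blocks (decimal units, as on the slide)
print(multiway_passes(8 * 10**9, 8 * 10**6, 8 * 10**3))  # → 2
```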
Multiway Merge-Sort
• With today’s main memories, we can sort almost any file in two passes
• The file can have up to (M/B)² blocks
• XML Toolkit:
  – xsort sorts using multiway merge, in two passes