Sorting
HKOI Training Team (Advanced)
2006-01-21
What is sorting?
Given: A list of n elements: A1,A2,…,An
Re-arrange the elements to make them
follow a particular order, e.g.
Ascending Order: A1 ≤ A2 ≤ … ≤ An
Descending Order: A1 ≥ A2 ≥ … ≥ An
We will talk about sorting in ascending
order only
Why is sorting needed?
Some algorithms work only when the data
is sorted
e.g. binary search
Better presentation of data
Often required by problem setters, to reduce
workload in judging
Why learn Sorting Algorithms?
The C++ STL already provides a sort()
function
Unfortunately, there is no such
implementation for Pascal
This is a minor point, though
Why learn Sorting Algorithms?
Most importantly, OI problems do not
directly ask for sorting, but their solutions
may be closely linked with sorting
algorithms
In most cases, C++ STL sort() is useless.
You still need to write your own “sort”
So… it is important to understand the
idea behind each algorithm, and also
their strengths and weaknesses
Some Sorting Algorithms…
Bubble Sort
Insertion Sort
Selection Sort
Shell Sort
Heap Sort
Merge Sort
Quick Sort
Counting Sort
Radix Sort
How many of them do you know?
Bubble, Insertion, Selection…
Simple, in terms of
Idea, and
Implementation
Unfortunately, they are inefficient
O(n²) – not good if n is large
Algorithms being taught today are far
more efficient than these
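For reference, a minimal Insertion Sort might look like the sketch below (the function name is ours; the slides give no code). Shell Sort, which follows, builds directly on this insertion step.

```cpp
#include <vector>

// Insertion Sort: grow a sorted prefix, inserting each element into place.
// O(n^2) in the worst case, but fast when n is small or the list is
// almost sorted; that is the observation Shell Sort exploits.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];  // shift larger elements one slot right
            --j;
        }
        a[j] = key;
    }
}
```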
Shell Sort
Named after its inventor, Donald Shell
Observation: Insertion Sort is very
efficient when
n is small, or
the list is almost sorted
Shell Sort
[Diagram: an example list divided into segments of elements k apart]
Divide the list into k non-contiguous segments
Elements in each segment are k elements
apart
In the beginning, choose a large k so that all
segments contain only a few elements (e.g. k = n/2)
Sort each segment with Insertion Sort
Shell Sort
[Diagram: the example list after each segment is sorted]
Definition: A list is said to be “k-sorted” when
A[i] ≤ A[i+k] for 1 ≤ i ≤ n-k
Now the list is 5-sorted
Shell Sort
[Diagram: insertion steps within the segments]
After each pass, reduce k (e.g. by half)
Although the number of elements in each
segment increases, the segments are usually
mostly sorted
Sort each segment with Insertion Sort again
Shell Sort
[Diagram: the final insertion pass on the example list]
Finally, k is reduced to 1
The list now looks mostly sorted
Perform a 1-sort, i.e. the ordinary Insertion Sort
Shell Sort – Worse than Ins. Sort?
In Shell Sort, we still have to perform an
Insertion Sort at the end
A lot of operations are done before the
final Insertion Sort
Isn’t it worse than Insertion Sort?
Shell Sort – Worse than Ins. Sort?
The final Insertion Sort is more efficient
than before
All sorting operations before the final one
are done efficiently
k-sorts compare far-apart elements
Elements “move” faster, reducing the
amount of movement and comparison
Shell Sort – Increment Sequence
In our example, k starts at n/2 and is halved
in each pass until it reaches 1,
i.e. {n/2, n/4, n/8, …, 1}
This is called the “Shell sequence”
In a good increment sequence, all
numbers should be relatively prime to
each other
Hibbard’s sequence: {2^m − 1, 2^(m−1) − 1, …, 7,
3, 1}
Shell Sort – Analysis
Average complexity: O(n^1.5)
Worst case of Shell Sort with the Shell
sequence: O(n²)
When will it happen?
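The passes described above can be sketched as follows (a minimal version using the Shell sequence {n/2, n/4, …, 1}; the function name is ours):

```cpp
#include <vector>

// Shell Sort with the Shell sequence: each pass is an insertion sort over
// elements k apart; k is halved after each pass until it reaches 1.
void shellSort(std::vector<int>& a) {
    for (std::size_t k = a.size() / 2; k >= 1; k /= 2) {
        for (std::size_t i = k; i < a.size(); ++i) {
            int key = a[i];
            std::size_t j = i;
            while (j >= k && a[j - k] > key) {
                a[j] = a[j - k];  // shift within the segment
                j -= k;
            }
            a[j] = key;  // the final pass (k = 1) is ordinary Insertion Sort
        }
    }
}
```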
Heap Sort
In Selection Sort, we scan the entire list
to search for the maximum, which takes
O(n) time
Is there a better way to get the
maximum?
With the help of a heap, we may reduce
the search time to O(lg n)
Heap Sort – Build Heap
1. Create a heap from the list
[Diagram: a max-heap built from the list 2 8 5 7 1 4]
Heap Sort
2. Pick the maximum, restore the heap property
[Diagram: the maximum, 8, is removed and the heap property restored]
Heap Sort
3. Repeat step 2 until the heap is empty
[Diagram: the next maximum, 7, is removed]
Heap Sort – Analysis
Complexity: O(n lg n)
Not a stable sort
Difficult to implement
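A compact sketch of the above (our own code; the slides give none): build a max-heap bottom-up, then repeatedly swap the maximum to the end of the array and shrink the heap.

```cpp
#include <algorithm>  // std::swap
#include <vector>

// Restore the max-heap property at index i within a[0..n).
void siftDown(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (true) {
        std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) return;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

void heapSort(std::vector<int>& a) {
    std::size_t n = a.size();
    // Build the heap bottom-up.
    for (std::size_t i = n / 2; i-- > 0; ) siftDown(a, i, n);
    // Move the maximum to the end, then restore the heap on the rest.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);
        siftDown(a, 0, end - 1);
    }
}
```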
Merging
Given two sorted lists, merge them to
form a new sorted list
A naïve approach: append the second
list to the first, then sort the whole list
Slow: takes O(n lg n) time
Is there a better way?
Merging
We make use of a property of sorted lists:
The first element is always the minimum
What does that imply?
An additional array is needed to store the
temporary merged list
Pick the smallest among the un-inserted numbers and append it to
the merged list
Merging
[Diagram: two sorted lists, List A (1 3 7 …) and List B (2 3 6 …), merged into a temporary array]
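The merging step above, sketched in C++ (names and example lists are ours):

```cpp
#include <vector>

// Merge two sorted lists in O(|a| + |b|) time: the front of each list is
// its minimum, so repeatedly move the smaller front into the output.
std::vector<int> mergeLists(const std::vector<int>& a,
                            const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);  // <= keeps it stable
    while (i < a.size()) out.push_back(a[i++]);  // append leftovers
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```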
Merge Sort
Merge Sort follows the divide-and-conquer approach
Divide: divide the n-element sequence into
two (n/2)-element subsequences
Conquer: sort the two subsequences
recursively
Combine: merge the two sorted
subsequences to produce the answer
Merge Sort
1. Divide the list into two
[Diagram: the list 2 8 5 7 1 4 is split into 2 8 5 and 7 1 4]
Merge Sort
2. Call Merge Sort recursively to sort the two
subsequences
Merge Sort
3. Merge the lists (into a temporary array)
[Diagram: the sorted sublists 2 5 8 and 1 4 7 are merged]
4. Move the elements back to the list
Merge Sort – Analysis
Complexity: O(n lg n)
Stable sort (what is a stable sort?)
Not an “in-place” sort, i.e. additional memory is required
Easy to implement; no knowledge of
other data structures needed
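A straightforward top-down sketch of the steps above (our own formulation; it returns a new vector rather than sorting in place):

```cpp
#include <vector>

// Top-down Merge Sort: split in half, sort each half recursively, then
// merge. O(n lg n), stable, but needs O(n) additional memory.
std::vector<int> mergeSort(std::vector<int> a) {
    if (a.size() <= 1) return a;
    std::vector<int> left(a.begin(), a.begin() + a.size() / 2);
    std::vector<int> right(a.begin() + a.size() / 2, a.end());
    left = mergeSort(left);
    right = mergeSort(right);
    std::vector<int> out;
    out.reserve(a.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size())
        out.push_back(left[i] <= right[j] ? left[i++] : right[j++]);
    while (i < left.size()) out.push_back(left[i++]);
    while (j < right.size()) out.push_back(right[j++]);
    return out;
}
```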
Stable Sort
What is a stable sort?
The name of a sorting algorithm
A sorting algorithm that has stable
performance over all distributions of
elements, i.e. Best ≈ Average ≈ Worst
A sorting algorithm that preserves the
original order of duplicated keys
Stable Sort
Original List: 1, 3a, 5, 3b, 4, 2
Stable Sort: 1, 2, 3a, 3b, 4, 5
Un-stable Sort: 1, 2, 3b, 3a, 4, 5
Stable Sort
Which sorting algorithms is/are stable?
Stable: Bubble Sort, Insertion Sort, Merge Sort
Un-stable: Selection Sort, Shell Sort, Heap Sort
Stable Sort
In our previous example, what is the
difference between 3a and 3b?
When will a stable sort be more useful?
Sorting records
Multiple keys
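As an illustration of multi-key sorting with a stable sort (the Student record and its fields are invented for this example): sort by the minor key first, then stable-sort by the major key; records that tie on the major key keep their minor-key order.

```cpp
#include <algorithm>  // std::stable_sort
#include <string>
#include <vector>

struct Student { std::string name; int cls, score; };

// Sort by class (major key), breaking ties by score (minor key):
// sort on the minor key first, then stable-sort on the major key.
void sortByClassThenScore(std::vector<Student>& v) {
    std::stable_sort(v.begin(), v.end(),
        [](const Student& a, const Student& b) { return a.score < b.score; });
    std::stable_sort(v.begin(), v.end(),
        [](const Student& a, const Student& b) { return a.cls < b.cls; });
}
```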
Quick Sort
Quick Sort also uses the divide-and-conquer approach
Divide: divide the list into two by partitioning
Conquer: sort the two lists by calling Quick
Sort recursively
Combine: combine the two sorted lists
Quick Sort – Partitioning
Given: A list and a “pivot” (usually an
element in the list)
[Diagram: | < pivot | pivot | ≥ pivot |]
Re-arrange the elements so that
Elements on the left-hand side of “pivot” are
less than the pivot, and
Elements on the right-hand side of the
“pivot” are greater than or equal to the pivot
Quick Sort – Partitioning
e.g. Take the first element as pivot
[Diagram: partitioning the list 4 6 7 0 9 3 9 4 with pivot 4; pointers lo and hi scan for a pair to swap]
Swap all pairs of elements that meet
the following criteria:
The left one is greater than or equal to the pivot
The right one is smaller than the pivot
Finally, swap the pivot with A[hi]
Quick Sort
After partitioning:
[Diagram: the pivot 4 in its final position, with 0 and 3 on its left and the elements ≥ 4 on its right]
Quick Sort
Apply Quick Sort on both lists
Quick Sort – Analysis
Complexity
Best: O(n lg n)
Worst: O(n2)
Average: O(n lg n)
When will the worst case happen?
How to avoid the worst case?
In-Place Sort
Not a stable sort
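One way to code the partitioning scheme above (first element as pivot, two pointers swapping out-of-place pairs, then swapping the pivot into position); a sketch, with names and index bounds as our own assumptions:

```cpp
#include <algorithm>  // std::swap
#include <vector>

// Partition a[lo..hi] around a[lo]; returns the pivot's final index.
int partition(std::vector<int>& a, int lo, int hi) {
    int pivot = a[lo];
    int i = lo + 1, j = hi;
    while (true) {
        while (i <= j && a[i] < pivot) ++i;   // find a left element >= pivot
        while (i <= j && a[j] >= pivot) --j;  // find a right element < pivot
        if (i > j) break;
        std::swap(a[i], a[j]);
    }
    std::swap(a[lo], a[j]);  // put the pivot between the two parts
    return j;
}

void quickSort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quickSort(a, lo, p - 1);
    quickSort(a, p + 1, hi);
}
```

Calling `quickSort(a, 0, a.size() - 1)` sorts the whole list; choosing a random element as pivot instead of the first is the usual way to avoid the sorted-input worst case.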
Counting Sort
Consider the following list of numbers
5, 4, 2, 1, 4, 3, 4, 2, 5, 1, 4, 5, 3, 2, 3, 5, 5
Range of numbers = [1,5]
We may count the occurrence of each
number
Value:      1  2  3  4  5
Occurrence: 2  3  3  4  5
Counting Sort (1)
With the frequency table, we can
reconstruct the list in ascending order
Value:      1  2  3  4  5
Occurrence: 2  3  3  4  5
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5
Counting Sort (1)
Can we sort records with this counting
sort?
Is this sort stable?
Counting Sort (2)
An alternative way: use a cumulative
frequency table and a temporary array
Given the following “records”: 3 2 1 2 2
[Diagram: frequency table and cumulative frequency table for the records]
Counting Sort (2)
[Diagram: each record is copied into the temporary array at the position given by the cumulative frequency table]
Counting Sort – Analysis
Complexity: O(n+k), where k is the range
of the numbers
Not an in-place sort
Stable sort (Method 2)
Cannot be applied to data with wide
ranges
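Method 2 above, sketched in C++ (a value range of [1, k] and the function name are our assumptions). Scanning the input from right to left is what makes the sort stable:

```cpp
#include <vector>

// Stable counting sort for values in [1, k]: cumulative counts give each
// value's final position; scanning the input right-to-left preserves the
// original order of equal keys.
std::vector<int> countingSort(const std::vector<int>& a, int k) {
    std::vector<int> count(k + 1, 0);
    for (int x : a) ++count[x];
    for (int v = 1; v <= k; ++v) count[v] += count[v - 1];  // cumulative
    std::vector<int> out(a.size());
    for (std::size_t i = a.size(); i-- > 0; )
        out[--count[a[i]]] = a[i];  // place a[i], then decrement its slot
    return out;
}
```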
Radix Sort
Counting Sort requires a “frequency
table”
The size of the frequency table depends
on the range of the elements
If the range is large (e.g. 32-bit), it may
be infeasible, if not impossible, to create
such a table
Radix Sort
We may consider an integer as a “record
of digits”, where each digit is a key
The significance of the keys decreases from left to
right
e.g. the number 123 consists of 3 digits
Leftmost digit: 1 (most significant)
Middle digit: 2
Rightmost digit: 3 (least significant)
Radix Sort
Now the problem becomes a multi-key
record sorting problem
Sort the records on the least significant
key with a stable sort
Repeat with the 2nd least significant key,
the 3rd least significant key, and so on
Radix Sort
For all keys in these “records”, the range
is [0,9], a narrow range
We apply Counting Sort to do the sorting
here
Radix Sort
Original List: 101 097 141 110 997 733
Radix Sort
Sort on the least significant digit
101 097 141 110 997 733
[Diagram: Counting Sort on the last digit]
Radix Sort
Sort on the 2nd least significant digit
110 101 141 733 097 997
[Diagram: Counting Sort on the middle digit]
Radix Sort
Lastly, the most significant digit
101 110 733 141 097 997
[Diagram: Counting Sort on the first digit; final result: 097 101 110 141 733 997]
Radix Sort – Analysis
Complexity: O(dn), where d is the
number of digits
Not an In-place Sort
Stable Sort
Can we run Radix Sort on
Real numbers?
Strings?
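The three passes above, sketched as LSD Radix Sort over decimal digits (non-negative integers assumed; each pass is the stable counting sort from the previous section):

```cpp
#include <vector>

// LSD Radix Sort: a stable counting sort on each decimal digit,
// least significant digit first.
void radixSort(std::vector<int>& a, int digits) {
    int div = 1;
    for (int d = 0; d < digits; ++d, div *= 10) {
        std::vector<int> count(10, 0), out(a.size());
        for (int x : a) ++count[(x / div) % 10];
        for (int v = 1; v < 10; ++v) count[v] += count[v - 1];
        for (std::size_t i = a.size(); i-- > 0; )  // right-to-left: stable
            out[--count[(a[i] / div) % 10]] = a[i];
        a = out;
    }
}
```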
Choosing Sorting Algorithms
List Size
Data distribution
Data Type
Availability of Additional Memory
Cost of Swapping/Assignment
Choosing Sorting Algorithms
List Size
If n is small, any sorting algorithm will do
If n is large (e.g. ≥5000), O(n²) algorithms
may not finish within the time limit
Data Distribution
If the list is mostly sorted, running Quick Sort
with the “first element as pivot” rule is extremely painful
Insertion Sort, on the other hand, is very
efficient in this situation
Choosing Sorting Algorithms
Data Type
It is difficult to apply Counting Sort and
Radix Sort on real numbers or any other
data types that cannot be converted to
integers
Availability of Additional Memory
Merge Sort, Counting Sort, Radix Sort
require additional memory
Choosing Sorting Algorithms
Cost of Swapping/Assignment
Moving large records may be very time-consuming
Selection Sort takes at most (n-1) swap
operations
Swap pointers of records (i.e. swap the
records logically rather than physically)
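The pointer-swapping idea can be realized with an index array: sort indices by the records' keys and leave the records themselves in place (the Record type here is invented for illustration):

```cpp
#include <algorithm>  // std::sort
#include <numeric>    // std::iota
#include <string>
#include <vector>

struct Record { int key; std::string payload; };

// Sort "pointers" instead of records: order an array of indices by the
// records' keys, so the (possibly large) records are never moved.
std::vector<std::size_t> sortedIndices(const std::vector<Record>& recs) {
    std::vector<std::size_t> idx(recs.size());
    std::iota(idx.begin(), idx.end(), 0);  // fill with 0, 1, 2, ...
    std::sort(idx.begin(), idx.end(),
        [&](std::size_t a, std::size_t b) { return recs[a].key < recs[b].key; });
    return idx;
}
```

Reading the records in the order of the returned indices then yields the sorted sequence without a single record copy.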