Sorting - Ken Cosh


Data Structures and
Algorithms
Dr. Ken Cosh
Sorting
Review
• Recursion
– Tail Recursion
– Non-tail Recursion
– Indirect Recursion
– Nested Recursion
– Excessive Recursion
This week
• Sorting Algorithms
– The efficiency of many algorithms depends on
the way data is stored.
– Most data structures can be improved by
ordering the data structure in some way.
– This week we will investigate different
approaches to sorting data.
Sorting Criteria
• The first step is to decide what to sort by;
– numerical
– alphabetical
– ASCII codes
Algorithm Measures
• Best Case
– Often when data already sorted
• Worst Case
– Data completely disorganised, or in reverse order.
• Average Case
– Random order
• Some sorting algorithms are the same for all
three cases, others can be tailored to suit certain
cases.
Sorting Algorithms
• Sorting Algorithms are generally
composed of ‘comparisons’ and
‘movements’.
• If the data being sorted is complicated
(strings for example), then the comparison
part can be more expensive.
• If the data items are large, then the
movement part can be more expensive.
Elementary Sorting Algorithms
• First let’s investigate 3 elementary sorting
algorithms;
– Insertion Sort
– Selection Sort
– Bubble Sort
Insertion Sort
• Insertion sort starts with the first two
elements, and compares them;
– a[0] < a[1] ?
• If necessary it will swap them.
• Next the 3rd element is compared and
positioned, and then the 4th etc.
Insertion Sort
template<class T>
void insertionsort(T data[], int n) {
    for (int i = 1, j; i < n; i++) {
        T tmp = data[i];
        for (j = i; j > 0 && tmp < data[j-1]; j--)
            data[j] = data[j-1];
        data[j] = tmp;
    }
}
Insertion Sort
• The insertion method only sorts the part of the
array that needs to be sorted – if the array is
already sorted, less processing is required.
– Best Case when data is already in order, 1
comparison per position (n-1), or O(n).
– Worst Case is when data is in reverse order,
in which case comparisons are required on every
iteration of both the outer and the inner loop, O(n²)
– How about Average Case? The result should be
somewhere between O(n) and O(n2), but which is it
closer to?
Insertion Sort – Average Case
• The number of iterations of the outer loop
depends on how far the number is away from
where it should be;
– it could loop 0 times if it is where it should be, or i
times if it needs to be inserted at the start.
– Therefore on average it should be;
(0 + 1 + 2 + … + i) / i = (i+1)/2
– This is the average per position, which needs to be
multiplied by the number of positions (n), leaving
O(n²), sadly closer to the worst case than the best.
Selection Sort
• Selection Sort takes each value and
moves it to its final place.
– i.e. it first searches for the smallest value and
moves it to the start, then the next smallest
and moves that to position 2.
Selection Sort
template<class T>
void selectionsort(T data[], int n) {
    for (int i = 0, j, least; i < n-1; i++) {
        for (j = i+1, least = i; j < n; j++)
            if (data[j] < data[least])
                least = j;
        swap(data[least], data[i]);
    }
}
• Once again this algorithm relies on a double loop, so it is
O(n²)
BubbleSort
• Imagine a glass of beer…
• The bubbles slowly rise to the surface.
• Now imagine those bubbles are the data
to be sorted, with the lowest values rising
to the surface.
• That’s how Bubblesort works!
BubbleSort
• Starting with the other end of the dataset, a[n-1],
each value is compared with its predecessor,
a[n-2], and if it is less it is swapped.
• Next a[n-2] is compared with a[n-3], and if
necessary swapped, and so on.
• Therefore at the end of the first loop through the
array, the smallest value is in position a[0], but
many of the other values have been bubbled
towards their final location.
• The next loop through will move values closer
still.
BubbleSort
template<class T>
void bubblesort(T data[], int n) {
    for (int i = 0; i < n-1; i++)
        for (int j = n-1; j > i; --j)
            if (data[j] < data[j-1])
                swap(data[j], data[j-1]);
}
BubbleSort
• With Bubble Sort there are the same
number of comparisons in every case
(best, worst and average)
• Sadly that number of comparisons is;
n(n-1)/2, or O(n²)
Cocktail Sort
• A variation of the BubbleSort is called
Cocktail Sort.
• Here the bubbles move in both directions
alternatingly. As though a cocktail was
being shaken. First lower values are
bubbled towards the start, then higher
values are bubbled towards the end.
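The slides don’t give code for this variant, so here is a minimal C++ sketch in the style of the bubblesort above (the function name and the early-exit `swapped` flag are my own additions):

```cpp
#include <utility>   // std::swap

// Cocktail (shaker) sort: alternating bubble passes in both directions.
template<class T>
void cocktailsort(T data[], int n) {
    int left = 0, right = n - 1;
    bool swapped = true;
    while (swapped && left < right) {
        swapped = false;
        // Pass 1: bubble lower values towards the start.
        for (int j = right; j > left; --j)
            if (data[j] < data[j-1]) {
                std::swap(data[j], data[j-1]);
                swapped = true;
            }
        ++left;
        // Pass 2: bubble higher values towards the end.
        for (int j = left; j < right; ++j)
            if (data[j+1] < data[j]) {
                std::swap(data[j], data[j+1]);
                swapped = true;
            }
        --right;
    }
}
```

After the first pair of passes the smallest value is at position 0 and the largest at position n-1, so both ends of the range shrink by one each time around.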
Elementary Sorting Algorithms
• Sorry… but O(n²) just isn’t good enough
anymore!
Efficient Sorting Algorithms
• Shell Sort
• Heap Sort
• Quick Sort
• Merge Sort
• Radix Sort
ShellSort
• One problem with the basic sorting
algorithms is that their runtime grows
faster than the size of the array.
• Therefore one way to tackle this problem is
to perform the sort on smaller
subarrays of the array.
• ShellSort (named after Donald Shell),
breaks the array into subarrays, sorts
them, and then sorts the whole array.
ShellSort
• Shellsort is actually slightly more complex than
that, because decisions need to be made about
the size of each subarray.
• Normally the Sort is performed on a decreasing
number of sub arrays, with the final sort being
performed on the whole array (the sub arrays
growing in size each Sort).
• Note that the actual Sort part can use any of the
Elementary Sorting Algorithms already
discussed.
Shell Sort Sample
• Original Data Set
10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11
• 5 subarray Sort
10, 4, 11 → 4, 10, 11
15, 19 → 15, 19
2, 5 → 2, 5
9, 8 → 8, 9
12, 0 → 0, 12
• After 5 subarray Sort
4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11
Shell Sort Sample cont.
4, 15, 2, 8, 0, 10, 19, 5, 9, 12, 11
• 3 subarray Sort
4, 8, 19, 12 → 4, 8, 12, 19
15, 0, 5, 11 → 0, 5, 11, 15
2, 10, 9 → 2, 9, 10
• After 3 subarray Sort
4, 0, 2, 8, 5, 9, 12, 11, 10, 19, 15
• Final Sort
0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19
Shell Sort
• In the Sample, I arbitrarily chose to sort 5
subarrays, then 3, then the whole array.
• In reality those increments aren’t ideal, but
then it isn’t clear which increments are.
• From empirical studies, a suggested
increment follows this pattern;
– h1 = 1
– hi+1 = 3hi+1
– until ht+2 ≥n
• What sequence will this give?
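The recurrence gives the increments 1, 4, 13, 40, 121, … A minimal C++ sketch of Shellsort using these increments, with an insertion sort run over each h-spaced subarray (the code is my own illustration, not from the slides):

```cpp
// Shellsort with the 3h+1 increment sequence (1, 4, 13, 40, ...).
template<class T>
void shellsort(T data[], int n) {
    int h = 1;
    while (3*h + 1 < n)          // largest increment smaller than n
        h = 3*h + 1;
    for (; h >= 1; h = (h - 1) / 3) {
        // Insertion sort each of the h interleaved subarrays:
        // elements h apart are compared and shifted.
        for (int i = h; i < n; ++i) {
            T tmp = data[i];
            int j = i;
            for (; j >= h && tmp < data[j-h]; j -= h)
                data[j] = data[j-h];
            data[j] = tmp;
        }
    }
}
```

The final pass (h = 1) is a plain insertion sort over the whole array, but by then the data is nearly ordered, so it runs close to its best case.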
Shell Sort
• Shell Sort is shown to be better than
O(n²), but how much better depends on the
increments.
• With Shell Sort, the choice of Elementary
Sorting Algorithm can affect the efficiency
of the ShellSort, but it’s not agreed how.
• A favourite is surely the “Cocktail Shaker
Sort”, which uses iterations of the bubble
sort, finishing off with an insertion sort – as
the cherry is inserted into the cocktail!
Heap Sort
• To investigate how Heap Sort works, we
need to investigate a Heap further.
• A heap is a binary tree with 2 additional
properties;
– First the value of each node is not less than
the values in its children (i.e. the highest value
is the root).
– Second the tree is perfectly balanced.
Heap Sort
• Heap Sort makes use of a heap, and is a
variation of Selection Sort.
• The weakness of selection Sort was the number
of comparisons required – because of the
improved searchability of a tree, heap sort
reduces the comparisons made during selection
sort.
• First a heap is created, then the root is moved to
the end of the dataset, before a new heap is
created.
• Therefore the complexity of the algorithm mirrors
the complexity of creating a heap.
Heap Sample
• Remove root node, and make new heap
from 2 heaps.
[Heap diagram: root 15, with the remaining
values 12, 11, 1, 6, 10, 8, 2, 3 on the levels below]
Heap Sort
• Rather than moving the root to a new array,
heapsort uses the same memory space and
swaps the root value with the last leaf node (8).
• 8 is then moved down the tree, swapping with
the larger child until it reaches its new base (i.e.
swap with 12 and then 11).
• The new root can then be swapped with the next
lowest node (1).
• And the process repeated, which in the worst
case is O(n lg n).
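The process above can be sketched in C++; the heap lives in the array itself, and the “swap with the larger child” step is the `moveDown` loop (the function names are my own, not the lecture’s):

```cpp
#include <utility>   // std::swap

// Restore the heap property from index first down to index last,
// repeatedly swapping a node with its larger child.
template<class T>
void moveDown(T data[], int first, int last) {
    int largest = 2*first + 1;               // left child
    while (largest <= last) {
        if (largest < last && data[largest] < data[largest+1])
            ++largest;                       // right child is larger
        if (data[first] < data[largest]) {
            std::swap(data[first], data[largest]);
            first = largest;
            largest = 2*first + 1;
        }
        else break;                          // heap property holds
    }
}

template<class T>
void heapsort(T data[], int n) {
    for (int i = n/2 - 1; i >= 0; --i)       // build the initial heap
        moveDown(data, i, n-1);
    for (int i = n-1; i >= 1; --i) {
        std::swap(data[0], data[i]);         // root to its final place
        moveDown(data, 0, i-1);              // re-heap the remainder
    }
}
```

Building the heap is O(n), and each of the n root removals costs at most O(lg n), giving the O(n lg n) bound.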
QuickSort
• Quicksort uses a recursive approach, which is
similar to the approach we used when guessing
a number between 1 and 10.
• Quicksort shares with Shellsort the idea of
breaking the problem down into subproblems
that are easier to solve.
• Quicksort divides the values into 2 subarrays –
those lower than value ‘x’ and those higher than
value ‘x’.
• The 2 subarrays can then be sorted – guess
what… using QuickSort.
QuickSort
• The problem remains which value to
choose as the pivot or bound value?
• Obviously if we choose a value too near to
either end of the dataset we will have an
unbalanced split.
• We could arbitrarily choose the first value,
but some arrays are already partly sorted,
therefore we could arbitrarily choose the
middle value.
Quick Sort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{2, 0} 4 {10, 15, 9, 12, 19, 5, 8, 11}
{0, 2} 4 {10, 9, 5, 8, 11} 12 {15, 19}
0, 2, 4, {} 5 {10, 9, 8, 11} 12 {15, 19}
0, 2, 4, 5, {8} 9 {10, 11}, 12, 15, 19
0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19
QuickSort Pseudocode
Quicksort(array[])
    if length(array) > 1
        choose bound;
        while there are elements in array
            include element in either subarray1 or subarray2;
        quicksort(subarray1);
        quicksort(subarray2);
QuickSort
• Best Case – when the array divides into even
size parts;
n + 2(n/2) + 4(n/4) + 8(n/8) + … + n(n/n) = O(n lg n)
• Worst Case – when one subarray is always size 1;
– O(n²)
• Average case is in between. Ideally we want to
avoid the worst case, but to do that we need to
pick a good pivot value;
– A better approach is to choose the median of the first,
last and middle values
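A C++ sketch of quicksort with this median-of-three pivot choice; the partitioning details here are one common way to do it, not necessarily the lecture’s:

```cpp
#include <utility>   // std::swap

// Quicksort on data[first..last], choosing the bound as the median
// of the first, middle and last values.
template<class T>
void quicksort(T data[], int first, int last) {
    if (first >= last)
        return;                              // 0 or 1 element: sorted
    int mid = (first + last) / 2;
    // Order data[first], data[mid], data[last]; the median ends up at mid.
    if (data[mid]  < data[first]) std::swap(data[mid],  data[first]);
    if (data[last] < data[first]) std::swap(data[last], data[first]);
    if (data[last] < data[mid])   std::swap(data[last], data[mid]);
    T bound = data[mid];
    int lower = first, upper = last;
    while (lower <= upper) {                 // partition around bound
        while (data[lower] < bound) ++lower;
        while (bound < data[upper]) --upper;
        if (lower <= upper)
            std::swap(data[lower++], data[upper--]);
    }
    quicksort(data, first, upper);           // values <= bound
    quicksort(data, lower, last);            // values >= bound
}
```

Called as `quicksort(data, 0, n-1)`. The median-of-three step also guards against the already-sorted input that makes a first-element pivot degenerate to O(n²).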
MergeSort
• The problem with QuickSort is still picking
a good pivot value.
• MergeSort combats this by concentrating
on merging two already sorted subarrays.
• The merge process can be quite
straightforward assuming the two subarrays
are already sorted.
• We can sort the subarrays recursively
using MergeSort!
Merge Sort
Mergesort(data)
    if data has at least 2 elements
        Mergesort(left half);
        Mergesort(right half);
        merge(both halves);
merge
merge(array1, array2, array3)
    int i1 = 0, i2 = 0, i3 = 0;
    while both array2 and array3 have elements
        if array2[i2] < array3[i3]
            array1[i1++] = array2[i2++];
        else array1[i1++] = array3[i3++];
    add the remains of either array2 or array3 into array1;
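The pseudocode above can be sketched in C++; here a single temporary vector stands in for array1, and the two halves of one vector stand in for array2 and array3 (an illustration under my own naming, not the lecture’s code):

```cpp
#include <vector>

// Merge the sorted halves data[first..mid] and data[mid+1..last].
template<class T>
void merge(std::vector<T>& data, int first, int mid, int last) {
    std::vector<T> tmp;
    int i = first, j = mid + 1;
    while (i <= mid && j <= last)            // take the smaller head
        tmp.push_back(data[i] < data[j] ? data[i++] : data[j++]);
    while (i <= mid)  tmp.push_back(data[i++]);  // add the remains
    while (j <= last) tmp.push_back(data[j++]);
    for (int k = 0; k < (int)tmp.size(); ++k)
        data[first + k] = tmp[k];            // copy back in place
}

template<class T>
void mergesort(std::vector<T>& data, int first, int last) {
    if (first < last) {                      // at least 2 elements
        int mid = (first + last) / 2;
        mergesort(data, first, mid);
        mergesort(data, mid + 1, last);
        merge(data, first, mid, last);
    }
}
```

Called as `mergesort(data, 0, data.size() - 1)`. The temporary vector is the extra space cost discussed on the next slide.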
MergeSort
{10, 15, 2, 9, 12, 4, 19, 5, 8, 0, 11}
{10, 15, 2, 9, 12, 4} {19, 5, 8, 0, 11}
{10, 15, 2} {9, 12, 4} {19, 5, 8} {0, 11}
{10, 15} {2} {9, 12} {4} {19, 5} {8} {0, 11}
{2, 10, 15} {4, 9, 12} {5, 8, 19} {0, 11}
{2, 4, 9, 10, 12, 15} {0, 5, 8, 11, 19}
{0, 2, 4, 5, 8, 9, 10, 11, 12, 15, 19}
MergeSort
• The MergeSort algorithm is O(n lg n), but
has a major drawback due to the space
required to store multiple arrays (in the run
time stack).
• An iterative version, while visually more
complex, is a little more efficient
space-wise.
• Also using a linked list rather than an array
can counter the space issue.
Radix Sorting
• Radix Sorting mirrors a real life approach to
sorting.
• Given a selection of books, we might first make
a pile of all the ‘A’ author books, all the ‘B’ author
books etc. Then we would sort each pile.
• Or, if given a series of 10 digit numbers, we
might first sort according to the first digit, and
then according to the second digit etc.
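A C++ sketch of the digit-by-digit idea for non-negative integers. This version distributes from the least significant digit first; because each distribution pass is stable, the result after all digit passes is the same as sorting digit by digit from the front (the pile-of-queues representation is my own choice):

```cpp
#include <vector>
#include <queue>

// Radix sort non-negative integers with at most 'digits' decimal digits.
void radixsort(std::vector<int>& data, int digits) {
    std::queue<int> piles[10];               // one pile per digit 0-9
    int divisor = 1;
    for (int d = 0; d < digits; ++d) {
        for (int x : data)                   // distribute by current digit
            piles[(x / divisor) % 10].push(x);
        data.clear();
        for (int p = 0; p < 10; ++p)         // collect piles in order
            while (!piles[p].empty()) {
                data.push_back(piles[p].front());
                piles[p].pop();
            }
        divisor *= 10;
    }
}
```

Each pass is O(n), so the whole sort is O(d·n) for d-digit keys – no comparisons between elements at all.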
Comparing approaches
• Take a look at the table on page 511 to
see how the different sorting algorithms
compare.