Transcript Lecture 8

Lecture 8
Sorting
Sorting (Chapter 7)
We have a list of real numbers.
Need to sort the real numbers in increasing order
(smallest first).
Important points to remember:
We have O(N2) algorithms
O(NlogN) algorithms (heap sort)
In general, any algorithm for sorting must be (NlogN)
(will prove it)
In special cases, we can have O(N) sorting algorithms.
Insertion Sort
Section 7.2 (Weiss)
Initially P = 1
Let the first P elements be sorted.
(3)Then the P+1 th element is inserted properly in
the list so that now P+1 elements are sorted.
Now increment P and go to step (3)
P+1th element is inserted as follows:
Store the P+1 th element first as some temporary variable,
temp;
If Pth element greater than temp, then P+1th element is set
equal to the Pth one,
If P-1 th element greater than temp, then Pth element is set
equal to temp…..
Continue like this till some kth element is less than or equal
to temp, or you reach the first position.
Let this stop at kth position (k can be 1). Set the kth element
equal to temp.
Extended Example
Need to sort:
34 8 64 51 32 21
P = 1;
Looking at first element only, and we do not change.
P = 2;
Temp = 8;
34 > Temp, so second element is set to 34.
We have reached the end of the list. We stop there. Thus,
first position is set equal to Temp;
After second pass;
8 34 64 51 32 21
Now, the first two elements are sorted.
Set P = 3.
Note down the rest from the board.
Pseudo Code
Assume that the list is stored in an array, A (can do with a
linked list as well)
Insertion Sort(A[],int N)
{
for (P = 1; P < N; P++)
{
Temp = A[P];
for (j = P; j > 0 and A[j-1] > Temp; j--) A[j] = A[j-1];
A[j] = Temp;
} }
Complexity Analysis
Inner loop is executed P times, for each P
Because of outer loop, P changes from 1 to N.
Thus overall we have 1 + 2 +…..N.
This is O(N2)
Space requirement: O(N)
Heap Sort
Section 7.5
Complexity: O(NlogN)
Suppose we need to store the sorted elements, in this case
we need an additional array. Total space is 2N. This is O(N),
but can we do with a space of N positions only?
Whenever we do a deletemin , the heapsize shrinks by one.
We can store the elements in the additional spaces.
We have a heap of size N.
After the first deletemin, the last element is at position N-1.
Store the retrieved element at position N
After the second deletemin, the last element is at position N2, so the retrieved element is stored at N-1, and so on.
Merge Sort
Divide and Conquer
Divide the list into two smaller lists:
Sort the smaller lists.
Merge the sorted lists to get an overall sorted list.
How do we merge 2 sorted lists?
Consider 2 sorted arrays A and B
We will merge them into a sorted array C.
Have three variables, posA, posB, posC
Initially, both of these are at the respective first positions
(3)Compare the elements at posA and posB
Whichever is smaller, goes into the posC position of C,
and the position is advanced in the corresponding array.
posC is also advanced (e.g., if element of A goes into C,
then posA is advanced).
Go to step (3)
After all the elements are exhausted in one array, the
remaining of the other are copied in C.
Suppose we want to merge
1 13 24 26
2 15 27 38
Copy the example from the board.
Complexity of the merging?
Pseudocode for merging
Merge (A, B, C){
posA = 0;
posB=0;
posC=0;
While (posA < sizeA)&& (posB < sizeB)
{if (A[posA] <B[posB])
{ C[posC]=A[posA];
posA = posA + 1;}
Else
{C[posC] = B[posB];
posB = posB + 1;}
posC = posC + 1;}
While (posB < sizeB)
{C[posC] =B[posB];
Increment posC and posB;}
While (posA < sizeA)
{C[posC] =A[posA];
Increment posC and posA;}
}
Overall approach
Divide an array into 2 parts
Recursively sort the 2 parts (redivide, recursively sort,
sorting one element is easy)
Merge the 2 parts
Mergesort(A,left,right)
{ if (left right) break; center = (left + right)/2;
Mergesort(A,left,center);
Mergesort(A,center+1,right);
B = left half of A; C = right half of A;
Merge(B,C,D); Set left to right of A = D;}
Note down the extended example from the board;
T(n) = 2T(n/2) + cn
T(1) = 1;
Using master theorem, we get T(n) as O(nlogn)
Any problem with merge sort?
Quick Sort
Storage: O(n)
Section 7.7
We will choose a pivot element in the list.
Partition the array into two parts.
One contains elements smaller than this pivot,
another contains elements larger than this pivot.
Recursively sort and merge the two partitions.
How do we merge?
We do not need any additional space for
merging. Thus storage is O(n)
How can the approach become bad?
Copy the example from the board.
We would like the partitions to be as equal as possible.
This depends on the choice of the pivot.
Equal partitions are assured if we can choose our pivot as
the element which is in the middle of the sorted list.
It is not easy to find that.
Pivots are chosen randomly.
In worst case, pivot choice can make one partition have
n-1 elements, another partition 1 element.
On an average this will not happen.
Pivot can be chosen as a random element in the list.
Or choose the first, the last and the center and take the
median of the three (middle position).
We will do the latter.
We discuss the partitioning now.
Interchange the pivot with the last element
Have a pointer at the first element (P1), and one at the
second last element (P2).
Move P1 to the right skipping elements which are less
than the pivot.
Move P2 to the left skipping elements which are more
than the pivot.
Stop P1 when we encounter an element greater than or
equal to the pivot.
Stop P2 when we encounter an element lesser than or
equal to the pivot.
Interchange the elements pointed to by P1 and P2.
If P1 is right of P2, stop, otherwise move P1 and P2 as
before till we stop again.
When we stop, swap P1 with the last element
which is the pivot
Note down the example from the board.
8 1 4 9 6 3 5 2 7 0
At any time can you say anything about the
elements to the left of P1?
Also, for right of P2?
Elements to the left of P1 are less than or equal to
the pivot.
Elements to the right of P1 are greater than or equal
to the pivot.
When P1 and P2 cross, what can you say about the
elements in between P1 and P2?
They are all equal to the pivot.
Suppose P1 and P2 have crossed, and stopped and
the pivot is interchanged with P1.
How do we form the partition?
Everything including P1 and its right are in one
partition (greater).
Remaining are in the left partition.
We can also do some local optimizations in length.
Complexity?
Space?
Procedure Summary
Partition the array
Sort the partition recursively.
Note down the example from the board.
8 1 4 9 6 3 5 2 7 0
Pseudocode
Quicksort(A, left, right)
{ Find pivot;
Interchange pivot and A[right];
P1 = left; P2 = right – 1;
Partition(A, P1, P2, pivot); /*returns newP1*/
Interchange A[newP1] and A[right];
Quicksort(A, left, newP1-1);
Quicksort(A, newP1, right); }
Partition(A, P1, P2,pivot)
{
While (P1  P2)
{
While (A[P1] < pivot) increment P1;
While (A[P2] > pivot) decrement P2;
Swap A[P1] and A[P2];
increment P1; decrement P2;
}
newP1 = P1;
return(newP1);
}
Worst Case Analysis
T(n) = T(n1) + T(n2) + cn
T(1) = 1;
n1 + n2 = n
In good case, n1 = n2 = n/2 always
Thus T(n) = O(nlogn)
In bad case, n1 = 1 n2 = n-1 always
T(n) = pn + T(n-1)
= pn + p(n-1) + T(n-2)
……..
= p(n + n-1 +……+1)
= pn2
Thus T(n) is O(n2 ) in worst case.
Average case complexity is O(nlog n)
Quicksort performs well for large inputs, but not so good
for small inputs.
When the divisions become small, we can use insertion
sort to sort the small divisions instead.
General Lower Bound For
Sorting
Section 7.9
Suppose a list of size k must be sorted.
How many orderings can we have for k members?
Depending on the values of the n numbers, we can
have n! possible orders, e.g., the sorted output for
a,b,c can be a,b,c, b,a,c, a,c,b, c,a,b, c,b,a, b,c,a
Any comparison based sorting process can be represented
as a binary decision tree.
A node represents a comparison, and a branch represents
the outcome of a comparison.
The possible orders must be the leaves of the decision
tree, i.e., depending on certain orders we will have
certain comparison sequences, and we will reach a
certain leaf.
Note down the example.
Different sorting algorithms will have different
comparison orders, but the leaves are same for all
sorting trees.
Thus any sorting algorithm for sorting n inputs can be
represented as a binary decision tree of n! leaves.
Such a tree has depth at least log n!
This means that any comparison based sorting algorithm need
to perform at least log n! Comparisons in the worst case.
log n! = log n + log (n-1) +…..+log(2) + log(1)
log n + log (n-1) +……+ log (n/2) + ….
(n/2)log (n/2)
= (n/2)log (n) – (n/2) = (nlog n)
Special Case Sorting
Now we will present some linear time sorting algorithms.
These apply only when the input has a special structure, e.g.,
inputs are integers.
Counting sort
Radix sort
Bucket sort
Please follow class notes
Counting Sort
Suppose we know that the list to be sorted consists of
n integers in the range 1 to M
We will declare an array B with M positions.
Initially all positions of B contain 0.
Scan through list A
A[j] is inserted in B[A[j]]
B is scanned once again, and the nonzero elements are read out.
Note down example from the board.
What do we do with equal elements?
B is an array of pointers.
Each position in the array has 2 pointers, head and tail. Tail
points to the end of a linked list, and head points to the
beginning.
A[j] is inserted at the end of the list B[A[j]]
Again, Array B is sequentially traversed and each nonempty list
is printed out.
Complexity?
O(n + M)
If M is O(n), then complexity is O(n)
Storage?
O(n+M)
Storage could be large for large ranges.
Supposing we have a list of elements, such that every
element has 2 fields, one an integer, and another field
something else.
During sorting we just look at the integer field.
Supposing element a occurs before element b in the
input list, and a and b have the same integer
components, Can their positions be reversed in the
output if counting sort is used?
Stability Property
Relative position of 2 equal elements remain the same in the
output as in the input.
Is merge sort stable?
Quick sort?
Insertion sort?
Stability property of counting sort will be used in designing
a new sorting algorithm, radix sort.
Radix Sort
Every integer is represented by at most d digits
(e.g., decimal representation)
Every digit has k values (k = 10 for decimal, k = 2 for
binary)
There are n integers
We would sort the integers by their least significant digits
first using counting sort.
Next sort it by its second least significant digit and so on.
Note down example from the board
Should it work?
Note that if the most significant digit of one number a is
less than that of another number b, then a comes before b.
However, if the most significant digits are the same
for a and b, and the difference is in the second most
significant digits ( second most significant digit of a
is less than that of b), then a comes before b.
Why?
Remember radix sort.
Complexity?
Complexity Analysis
There are d counting sorts.
Each counting sort works on n numbers with the range
being 0 to k.
Complexity of each counting sort is O(n + k)
Overall O(d(n+k))
Storage?
O(n + k)
If just counting sort were used for these numbers, what
is the worst case complexity?
O(kd + n) for storage and running time
Bucket Sort
We have sorted integers in worst case linear time.
Can we do anything similar for real numbers?
Real numbers can be sorted in average case linear
complexity under a certain assumption.
assuming that the real numbers are uniformly
distributed
if we know that the real numbers belong to a
certain range, then uniform distribution means any
two equal size subranges contain roughly the same
number of real numbers in the list.
Bucket sort can be used to sort such real numbers in
average case linear complexity
The idea is similar to direct sequence hashing and
counting sort.
We have a list of n real numbers.
We have an array of n pointers. Each position has a
pointer pointing to a linked list.
We have a function which generates an integer from
a real number, and adds the real number in the linked
list at the position.
The function is chosen such that the elements at the position j of
the array are less than those at position k, if j < k.
Thus we just sort the linked lists at every position.
Next we start from position 0, print out the sorted linked
list, go to position 1, print out the sorted linked (assuming
it is not empty) and so on.
The output is a sorted list.
The function must be such that the number of
elements in each linked list is roughly the same.
So if there are n real numbers and n positions in the
array, then each linked has roughly a constant
number c.
Thus sorting complexity for a linked list is a
constant.
What is the printing complexity?
Thus overall complexity is O(n)
How do we find such a function?
We can if the numbers are uniformly distributed.
Suppose we know the numbers are in an interval of size M.
We divide the interval M into n equal size subintervals, and
name them 0, 1,….n-1.
Any real number in subinterval j is added to the list at
position j in the array.
Given any real number, the function actually finds the
number of the subinterval where it belongs.
Firstly, why should such a function put roughly
equal number of elements in each list?
Second, evaluation time for the function must be a
constant.
That is, the function must find the interval
number in constant time.
Will give an example function which applies for a
specific case, but in general such functions can be
designed for more general cases as well.
Suppose the real numbers are in the interval [0, 1].
The function nx gets the correct interval number for
any real number x in the interval [0, 1].
What is nx for any number x is the interval [0, 1/n) ?
any number x is the interval [1/n, 2/n) ?
any number x is the interval [2/n, 3/n) ?
……………………………..
Evaluation complexity for the function?
Thus we have divided the interval [0, 1] in equal
sized subintervals, and
We have a function which returns the subinterval
number of a real number in constant time.
Procedure Summary
We have a list of n real numbers.
Given any number x, we add it at the list at A[nx] .
We sort all of the individual linked lists.
We output the sorted linked list at A[0], then at A[1], and so
on.
The output is sorted.
On an average every linked list contains a constant
number of elements, (as a linked list contains elements
from a subinterval, and all subintervals are equal sized,
and hence they have equal number of elements because of
uniform distribution)
Thus sorting complexity is constant for each linked list.
Thus overall sorting and printing complexity is O(n).