Sorting Algorithms


Data Structures and Algorithms
Rabie A. Ramadan
Part II
Data Structures and Algorithms
Algorithms and Programs
• Algorithm: a mechanical procedure written in such a way that human beings can understand it (pseudocode)
• Program: a mechanical procedure written in such a way that a computer can execute it
Given an algorithm, we are led to ask:
• What is it supposed to do?
• Does it really do what it is supposed to do?
• How efficiently does it do it?
Computing the Running Time
Goals of an algorithm:
• Easy to understand, code, and debug.
• Makes efficient use of the computer's resources; in particular, it runs as fast as possible.
The running time depends on factors such as:
• the input to the program: its size and its nature (e.g. the size of the array to be sorted),
• the quality of the code generated by the compiler used to create the object program,
• the nature and speed of the instructions on the machine used to execute the program, and
• the time complexity of the algorithm underlying the program.
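Computing the Running Time (Cont.)
Let n be the size of the input and T(n) the running time, measured in instructions executed.
Worst case → the maximum number of instructions is executed over all inputs of size n
Best case → the minimum number of instructions is executed over all inputs of size n
Average case Tavg(n) → the running time averaged over all inputs of size n; often the most realistic measure, but usually the hardest to compute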
Big-Oh and Big-Omega Notation
Big O
• We say that T(n) is O(f(n)) if there are positive constants c and n0 such that T(n) <= c f(n) whenever n >= n0.
• Defines an upper bound on the running time of an algorithm/program.
Big Omega (Ω)
• T(n) is Ω(g(n)) means that there exists a positive constant c such that T(n) >= c g(n) infinitely often (for an infinite number of values of n).
• Defines a lower bound on the running time of an algorithm/program.
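As an illustrative check of both definitions, consider a running time chosen only for this example (it does not come from the slides):

```latex
T(n) = 3n^2 + 5n + 2 \;\le\; 3n^2 + 5n^2 + 2n^2 = 10\,n^2 \quad \text{for all } n \ge 1,
\qquad
T(n) \;\ge\; 3\,n^2 \quad \text{for all } n \ge 0 .
```

So T(n) is O(n^2) with c = 10 and n0 = 1, and it is also Ω(n^2) with c = 3.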
Find the Complexity of the following equation:
Group Activity
Find the complexity
Procedure Calls
What is a procedure ?
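Count all of the calls to this procedure
    function fact ( n: integer ): integer;
        { fact(n) computes n! }
        begin
    (1)     if n <= 1 then
    (2)         fact := 1
            else
    (3)         fact := n * fact(n-1)
        end; { fact }
Let T(n) be the running time for fact(n).
The running time for lines (1) and (2) is O(1),
for line (3) it is O(1) + T(n-1).
Hence T(n) = c + T(n-1) for n > 1 and T(n) = d for n <= 1, for some constants c and d, which gives T(n) = c(n-1) + d = O(n).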
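To see the linear number of calls concretely, here is a minimal Python sketch of the same recursion. The `counter` argument and the helper `count_calls` are hypothetical additions used only to count calls; they are not part of the original procedure.

```python
def fact(n, counter):
    """Return n! and record every call in counter[0]."""
    counter[0] += 1          # one unit of O(1) work per call
    if n <= 1:
        return 1
    return n * fact(n - 1, counter)


def count_calls(n):
    """Return (n!, number of calls made to fact)."""
    counter = [0]
    value = fact(n, counter)
    return value, counter[0]


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, count_calls(n))
```

For n >= 1 the call count equals n, which matches T(n) = O(n).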
Basic Data Structures
Way of organizing information
Many different data structures
Different problems may require different data structures
Each data structure has unique properties that make it well suited
to give a certain view of the data.
There are many different ways of creating the same data structure in
a computer.
Our Objectives
• Show how data structures are represented in the computer
• Identify linear and nonlinear data structures
• Manipulate data structures with basic operations
• Compare different implementations of the same data structure
Computer Memory
Every piece of data that is stored in a computer
is kept in a memory cell with a specific address.
• The computer can store many different types of
data in its memory.
Computer Memory (Cont.)
Storing the string 'apple' in the computer's memory, it
might look like this.
Storing a list of names might look like this
Computer Memory (Cont.)
Can we represent the following tree the same way?
It does not make sense. Right!!
Stack (Call Back)
Last-in, first-out (LIFO)
Insert → PUSH
Delete → POP
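A minimal Python sketch of the LIFO behaviour, assuming a plain list as the underlying storage. The class name `Stack` and its method names are illustrative.

```python
class Stack:
    """LIFO container: the last item pushed is the first one popped."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # insert at the top

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()      # delete from the top

    def is_empty(self):
        return not self._items


if __name__ == "__main__":
    s = Stack()
    for x in (1, 2, 3):
        s.push(x)
    print(s.pop(), s.pop(), s.pop())
```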
Queue (Call Back)
First-in, first-out (FIFO)
Insert → ENQUEUE
Delete → DEQUEUE
It has a head and a tail
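A minimal Python sketch of the FIFO behaviour, assuming `collections.deque` as the underlying storage so that both ends are cheap to update. The class name `Queue` is illustrative.

```python
from collections import deque


class Queue:
    """FIFO container: items leave in the order they arrived."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)          # insert at the tail

    def dequeue(self):
        if not self._items:
            raise IndexError("dequeue from empty queue")
        return self._items.popleft()      # delete from the head

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    q = Queue()
    for x in ("a", "b", "c"):
        q.enqueue(x)
    print(q.dequeue(), q.dequeue(), q.dequeue())
```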
Queue (Call Back)
Priority queue
An abstract data type that supports the following operations:
• Add an element to the queue with an associated priority
• Remove the element from the queue that has the highest priority, and return it
With an unsorted list, it takes O(1) to insert an element and O(n) to find and return the highest-priority element.
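The O(1) insert and O(n) removal costs can be sketched as follows, assuming the queue is kept as an unsorted Python list of (priority, element) pairs. The class and method names are hypothetical.

```python
class UnsortedPriorityQueue:
    """Priority queue on an unsorted list: O(1) add, O(n) remove."""

    def __init__(self):
        self._items = []                          # (priority, element) pairs

    def add(self, element, priority):
        self._items.append((priority, element))   # O(1): no ordering kept

    def remove_highest(self):
        if not self._items:
            raise IndexError("remove from empty priority queue")
        # O(n): scan every entry to find the highest priority.
        best = max(range(len(self._items)), key=lambda i: self._items[i][0])
        # Move it to the end so that removing it is O(1).
        self._items[best], self._items[-1] = self._items[-1], self._items[best]
        return self._items.pop()[1]


if __name__ == "__main__":
    pq = UnsortedPriorityQueue()
    pq.add("low", 1)
    pq.add("high", 9)
    pq.add("mid", 5)
    print(pq.remove_highest(), pq.remove_highest(), pq.remove_highest())
```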
Linked Lists
• Objects are arranged in a linear order.
• The order in a linked list is determined by a pointer in each object.
• Linked lists provide a simple, flexible representation for dynamic sets.
Linked List
• Could be a singly or doubly linked list
• Could be a circular list
• Sorted or unsorted
Linked list operations
• Searching → O(n)
• Inserting → O(1) if unsorted (insert at the head), O(n) if sorted
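A minimal Python sketch of an unsorted singly linked list showing the O(1) insert at the head and the O(n) search. The names `Node`, `LinkedList` and `to_list` are illustrative.

```python
class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node        # pointer that determines the order


class LinkedList:
    def __init__(self):
        self.head = None

    def insert(self, key):
        """O(1): the new node becomes the head."""
        self.head = Node(key, self.head)

    def search(self, key):
        """O(n): follow the pointers until the key is found or the list ends."""
        node = self.head
        while node is not None and node.key != key:
            node = node.next
        return node

    def to_list(self):
        keys, node = [], self.head
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


if __name__ == "__main__":
    lst = LinkedList()
    for k in (1, 2, 3):
        lst.insert(k)
    print(lst.to_list())
```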
Binary tree
Each node has two pointers → left and right
Complete Binary Tree
If the height is h, the number of nodes is at most 2^(h+1) - 1.
Missing nodes can occur only at the bottom level of the tree.
Rooted trees with unbounded branching
• If every node has at most k children, each node can store k child pointers.
• If the number of children is not known ahead of time, a fixed number of child pointers per node either wastes space or is not enough.
The solution (left-child, right-sibling representation):
• left-child[x] points to the leftmost child of node x, and
• right-sibling[x] points to the sibling of x immediately to the right.
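The left-child, right-sibling idea can be sketched in Python as below. Each node keeps only two pointers no matter how many children it has. The helper names `add_child` and `children` are hypothetical.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left_child = None       # leftmost child of this node
        self.right_sibling = None    # sibling immediately to the right


def add_child(parent, child):
    """Attach child as the rightmost child of parent."""
    if parent.left_child is None:
        parent.left_child = child
    else:
        node = parent.left_child
        while node.right_sibling is not None:
            node = node.right_sibling
        node.right_sibling = child
    return child


def children(node):
    """Return the keys of node's children, left to right."""
    keys = []
    child = node.left_child
    while child is not None:
        keys.append(child.key)
        child = child.right_sibling
    return keys


if __name__ == "__main__":
    root = TreeNode("root")
    for key in ("a", "b", "c"):
        add_child(root, TreeNode(key))
    print(children(root))
```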
Heaps
A heap is a binary tree with the following properties:
• It is a complete binary tree; that is, each level of the tree is completely filled, except possibly the bottom level, which is filled from left to right.
• It satisfies the heap-order property: the data item stored in each node is greater than or equal to the data items stored in its children.
Heap Example
• Not a heap → not complete
• Not a heap → complete, but does not satisfy the heap-order property
Heap Representation
As an Array
An array A that represents a heap is an array with two attributes:
– length, the number of elements in the array
– heap-size, the number of heap elements stored in the array
Viewed as a binary tree and as an array: the root of the tree is stored at A[0], its left child at A[1], its right child at A[2], etc.
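With the root stored at A[0], the positions of a node's parent and children follow from simple index arithmetic. A short Python sketch, where the function names and the sample array are illustrative:

```python
def parent(i):
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def is_max_heap(a):
    """Check the heap-order property for every non-root element."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))


if __name__ == "__main__":
    heap = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
    print(left_child(0), right_child(0), is_max_heap(heap))
```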
Heap Operations
The height of a node in a tree is the number of edges on the longest
simple downward path from the node to a leaf (i.e. the maximum depth
below that node).
The height of an n-element heap based on a binary tree is
floor(log2(n)).
The basic operations on heaps run in time at most proportional to the
height of the tree and thus take O(log n) time.
Maintaining the Heap Property
One of the more basic heap operations is converting a complete binary tree to a heap.
Such an operation is called “Heapify”.
Its inputs are an array A and an index i into the array.
When Heapify is called, it is assumed that the binary trees rooted at LeftChild(i) and
RightChild(i) are heaps, but that A[i] may be smaller than its children, thus violating
the 2nd heap property.
The function of Heapify is to let the value at A[i] “float down” in the heap so that the
subtree rooted at index i becomes a heap.
The action required from Heapify is as follows:
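A minimal Python sketch of this "float down" step, assuming a max-heap stored in a 0-based list and an explicit `heap_size` argument. The signature is an assumption for illustration, not the slides' own pseudocode.

```python
def heapify(a, i, heap_size):
    """Let a[i] float down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at the children of i are already heaps.
    """
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:             # heap-order property holds at i
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest                  # continue from the child we swapped with


if __name__ == "__main__":
    data = [4, 14, 10, 8, 7, 9, 3, 2, 4, 1]
    heapify(data, 0, len(data))
    print(data)
```

Each iteration moves one level down the tree, so the running time is bounded by the height of the heap, O(log n).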
Sorting Algorithms
Bubble sort O(n2)
Insertion sortO(n2)
Selection sortO(n2)
Shell sortO(n2)
Heap sortO(n log n)
Merge sortO(n log n)
Quick sortO(n log n)
O(n2) Complexity
O(n log n)
Bubble Sort
• Simple, but also the slowest of the sorts listed here.
• Compares each item in the list with the item next to it, swapping them if required.
• The algorithm repeats this process until it makes a pass all the way through the list without swapping any items.
Pros: Simplicity and ease of implementation.
Cons: Very inefficient, O(n^2).
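The passes described above can be sketched in Python as follows. The function works on a copy of the input; that choice, and the name `bubble_sort`, are assumptions made for the example.

```python
def bubble_sort(items):
    """Return a sorted copy of items using repeated adjacent swaps."""
    a = list(items)
    swapped = True
    while swapped:                       # stop after a pass with no swaps
        swapped = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:          # neighbours out of order
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return a


if __name__ == "__main__":
    print(bubble_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```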
Heap Sort
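Building a heap out of the data set:
• Given an array A[1..n], the elements in the subarray A[floor(n/2)+1 .. n] are all leaves, so each is already a one-element heap.
• Call 'Heapify' on the remaining nodes, from index floor(n/2) down to 1.
Using 'Heapify' to sort the array:
• Repeatedly swap the root (the largest element) with the last element of the heap, shrink the heap by one, and call 'Heapify' on the root.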
Heap Sort
Pros: In-place and non-recursive, making it a good choice for extremely large data sets. O(n log n)
Cons: Slower than the merge and quick sorts.
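A minimal Python sketch of heap sort, assuming a 0-based list (so the leaves start at index n // 2 rather than floor(n/2)+1) and an iterative `heapify` helper. The helper is repeated here so the block is self-contained, and the function sorts a copy of its input.

```python
def heapify(a, i, heap_size):
    """Let a[i] float down within a[0:heap_size] (max-heap, 0-based)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and a[left] > a[largest]:
            largest = left
        if right < heap_size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_sort(items):
    """Return a sorted copy of items."""
    a = list(items)
    n = len(a)
    # Build the heap: indices n // 2 .. n - 1 are leaves, so start just below them.
    for i in range(n // 2 - 1, -1, -1):
        heapify(a, i, n)
    # Move the current maximum to the end and restore the heap on the rest.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        heapify(a, 0, end)
    return a


if __name__ == "__main__":
    print(heap_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```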
Insertion Sort
• The basic version requires two lists (a source list and a sorted list); an in-place version is used to save space.
• Based on the technique used by card players to arrange a hand of cards:
• The player keeps the cards that have been picked up so far in sorted order.
• When the player picks up a new card, he makes room for the new card and then inserts it in its proper place.
Pros: Relatively simple and easy to implement.
Cons: Inefficient for large lists, O(n^2).
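The card-player technique can be sketched in Python as below. The sort is done in place on a copy of the input; the function name is illustrative.

```python
def insertion_sort(items):
    """Return a sorted copy of items, inserting one element at a time."""
    a = list(items)
    for i in range(1, len(a)):
        card = a[i]                      # the newly picked-up card
        j = i - 1
        while j >= 0 and a[j] > card:    # make room by shifting larger cards right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = card                  # insert the card in its proper place
    return a


if __name__ == "__main__":
    print(insertion_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```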
Merge Sort
• Divide-and-conquer algorithm.
• Splits the list to be sorted into two equal halves and places them in separate arrays.
• Each array is recursively sorted,
• then merged back together to form the final sorted list.
Pros: Marginally faster than the heap sort for larger sets, O(n log n).
Cons: At least twice the memory requirements of the other sorts; recursive.
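A minimal Python sketch of the split, recurse, and merge steps. The helper `merge` and the use of list slices for the two halves are assumptions made for the example; the slices are the "separate arrays" that account for the extra memory.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])      # at most one of these two is non-empty
    result.extend(right[j:])
    return result


def merge_sort(items):
    """Return a sorted copy of items."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


if __name__ == "__main__":
    print(merge_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```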
Selection Sort
• Selects the smallest unsorted item remaining in the list,
• then swaps it with the item in the next position to be filled.
Pros: Simple and easy to implement.
Cons: Inefficient for large lists, O(n^2). It is so similar to the more efficient insertion sort that the insertion sort should be used in its place.
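The two steps above can be sketched in Python as follows, working on a copy of the input. The function name is illustrative.

```python
def selection_sort(items):
    """Return a sorted copy of items."""
    a = list(items)
    for fill in range(len(a) - 1):
        # Find the smallest item in the unsorted part a[fill:].
        smallest = fill
        for i in range(fill + 1, len(a)):
            if a[i] < a[smallest]:
                smallest = i
        # Swap it into the next position to be filled.
        a[fill], a[smallest] = a[smallest], a[fill]
    return a


if __name__ == "__main__":
    print(selection_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```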
Shell Sort
• Shell sort works by comparing elements that are distant from each other, rather than adjacent elements as in the insertion sort.
• Shell sort makes multiple passes through a list; on each pass it sorts a number of equally sized sets, made of elements a fixed increment apart, using the insertion sort.
• The increment shrinks on every pass until it reaches 1.
Shellsort Examples
Sort: 18 32 12 5 38 33 16 2
8 numbers to be sorted; Shell's increment will be floor(n/2)
* floor(8/2) → floor(4) = 4
Increment 4: four groups of elements, 4 positions apart
18 32 12 5 38 33 16 2
Step 1) Only look at 18 and 38 and sort them in order;
18 and 38 stay at their current positions because they are in order.
Step 2) Only look at 32 and 33 and sort them in order;
32 and 33 stay at their current positions because they are in order.
Step 3) Only look at 12 and 16 and sort them in order;
12 and 16 stay at their current positions because they are in order.
Step 4) Only look at 5 and 2 and sort them in order;
2 and 5 need to be switched to be in order.
Shellsort Examples (con’t)
Sort: 18 32 12 5 38 33 16 2
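Resulting numbers after the increment 4 pass:
18 32 12 2 38 33 16 5
* floor(4/2) → floor(2) = 2
Increment 2: two groups of elements, 2 positions apart
Step 1) Look at 18, 12, 38, 16 and sort them in their appropriate locations: 12, 16, 18, 38
Step 2) Look at 32, 2, 33, 5 and sort them in their appropriate locations: 2, 5, 32, 33
Resulting numbers after the increment 2 pass:
12 2 16 5 18 32 38 33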
Shellsort Examples (con’t)
Sort: 18 32 12 5 38 33 16 2
* floor(2/2) → floor(1) = 1
Increment 1: a single group containing every element
The last increment or phase of Shellsort is basically an insertion
sort algorithm; it leaves the list fully sorted: 2 5 12 16 18 32 33 38
Pros: Efficient for medium-size lists.
Cons: Somewhat complex algorithm, not nearly as
efficient as the merge, heap, and quick sorts.
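The worked example can be reproduced with a short Python sketch, assuming Shell's increments floor(n/2), floor(n/4), ..., 1. Returning the list after every pass (the `passes` value) is a hypothetical addition used only to trace the example.

```python
def shell_sort(items):
    """Return (sorted copy, list of (increment, state after that pass))."""
    a = list(items)
    passes = []
    gap = len(a) // 2
    while gap > 0:
        # Insertion sort on each set of elements that are `gap` apart.
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            while j >= gap and a[j - gap] > current:
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
        passes.append((gap, list(a)))
        gap //= 2
    return a, passes


if __name__ == "__main__":
    result, passes = shell_sort([18, 32, 12, 5, 38, 33, 16, 2])
    for gap, state in passes:
        print("increment", gap, state)
```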
Quick Sort
• If there are one or fewer elements in the array to be sorted, return.
• Pick an element in the array to serve as a "pivot" point.
• Split the array into two parts: one with elements larger than the pivot and the other with elements smaller than the pivot.
• Recursively repeat the algorithm for both parts of the original array.
Its complexity is affected by the choice of pivot point:
• The worst-case efficiency of the quick sort is O(n^2).
• As long as the pivot point is chosen randomly, the expected complexity is O(n log n).
Pros: Extremely fast.
Cons: Very complex algorithm, massively recursive.
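A minimal Python sketch of the steps above with a randomly chosen pivot. For readability it builds new lists instead of partitioning in place, which is a simplification and not how an optimized quick sort is usually written; elements equal to the pivot are kept in their own list.

```python
import random


def quick_sort(items):
    """Return a sorted copy of items (list-building variant, not in place)."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)                 # random pivot
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)


if __name__ == "__main__":
    print(quick_sort([18, 32, 12, 5, 38, 33, 16, 2]))
```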
Bucket Sort
• It assumes that the input is generated by a random process that distributes elements uniformly over the interval [0, 1).
• Divide the interval [0, 1) into n equal-sized subintervals, or buckets, and then distribute the n input numbers into the buckets.
• Simply sort the numbers in each bucket and then go through the buckets in order, listing the elements in each.
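A minimal Python sketch of these three steps, assuming every input value lies in [0, 1). Sorting each bucket with the built-in `sorted` is a choice made for brevity; any simple sort would do, since each bucket is expected to hold only a few elements. The sample values are made up for illustration.

```python
def bucket_sort(values):
    """Return a sorted copy of values, all assumed to lie in [0, 1)."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]             # n equal-sized subintervals
    for v in values:
        if not 0 <= v < 1:
            raise ValueError("bucket_sort expects values in [0, 1)")
        index = min(int(n * v), n - 1)           # guard against float rounding
        buckets[index].append(v)
    result = []
    for bucket in buckets:                       # visit the buckets in order
        result.extend(sorted(bucket))
    return result


if __name__ == "__main__":
    data = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
    print(bucket_sort(data))
```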