ANALYSIS OF ALGORITHMS
Download
Report
Transcript ANALYSIS OF ALGORITHMS
Hashing
Hash Tables - Introduction
A structure that offers fast insertion and
searching
Insertion and searching is almost O(1)
Hashing
- a range of key values is transformed into
a range of array index values
Hashing - Introduction
In a dictionary, if the main key was the array
index, searching and inserting items would be
very fast.
Example: Empdata[1000], employee database,
index=employee number
- search for employee with emp. number = 500
- Answer: Empdata[500]
- Running Time: O(1)
Hash Tables
In the previous example, it was easy since
employee number is an integer.
What if the main key is a word in the
English Alphabet (i.e. last names)
How can the main key be mapped into an
array index
Hash Tables
Sum of Digits Method
- map the alphabet A-Z to the numbers 1
to 26 (a=1,b=2,c=3,etc.)
- add the total of the letters
- For example, “cats”
(c=3,a=1,t=20,s=19);3+1+20+19=43
-”cats” will be stored using index = 43
Hash Tables
Problem
- Too may words with the same index
- “was”,”tin”,”give”,”tend”,”moan”,”tick” and
several other words add to 43
Hashing
Another Method (Multiply by Powers)
- an integer in the numeric system is in
the power of 10
- 7546=7x1000 + 5x100 + 4x10+6
- 7546=7x103 + 5x102 + 4x101+6
Can do the same thing with words
- Use 27 as base (26 letters + blank)
Hashing
“cats”=3*273+1*272+20*271+19
= 60,337
unique index for every word
main drawback : takes too much space
- For up to 10 letter words(279), one
7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate
unique keys, it assigns spaces to nonwords (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of
numbers from the numbers-multiplied-by
powers system into a smaller(reasonably
sized) array
Hashing
Hashing function
- The process of converting a number in a
large range into a number in a smaller
range.
Hashing
“cats”=3*273+1*272+20*271+19
= 60,337
unique index for every word
main drawback : takes too much space
- For up to 10 letter words(279), one
7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate
unique keys, it assigns spaces to nonwords (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of
numbers from the numbers-multiplied-by
powers system into a smaller(reasonably
sized) array
Hashing
Hashing function
- The process of converting a number in a
large range into a number in a smaller
range.
Size of smaller range
- twice the size of the data set (2s)
- for 50,000 words, array of 100,000
elements
Hashing
Hash Function
- achieved by using the modulo function
(returns the remainder)
- for example, 33 mod 10 = 3
- LargeNumber mod Smallrange
Hashing
Hugenumber=C0*279+C1*278+C2*277 …. C9*270
arraysize = numberofwords * 2
arrayindex=Hugenumber mod arraysize
Hashing - Collisions
Hashing presents the risk of two elements
with the same index (although better than
sumofdigits).
Collision
- two elements with the same index key
after hashing
Collisions
Two approaches to handle collision
- Open Addressing
- Separate Chaining
Open Addressing
- Finding the next available free cell
Separate Chaining
- install a linked list at each index
Open Addressing
Three Types
- Linear Probing, Quadratic Probing, and
Double Hashing
Linear Probing
- Finding the next available cell
(x+1,x+2,etc.)
- leads to clustering
Clustering
Quadratic Probing
- Finds next available cell using the
squares as the step method
(x+1,x+4,x+27,etc)
Double Hash
- Hash again using a different hash
function to find next free cell
- 2nd hash : step size
Separate Chaining
A linked list is installed in the array index
such that entries with the same keys are
attached to the linked list
1
2
3
986
1881
1
333
Hashing
Read Chapter 5 (Data Structures and
Algorithms in C by Weiss)
Chapter 7 (in Goodrich and Tamassia
Book)
Hash Functions implementations are
presented in these chapters
Summary Notes
Data Structures and
Algorithms
Conceptual Approach
Java and C Implementations are
presented in both books
For further studies, focus more on the
mathematical aspects (look at theorems &
propositions)
- Proving
When to use what
General Purpose Data Structures
- arrays,linked lists, trees, and hash tables
- used to store and retrieve data using
key values
-applications : can be used for storing
personnel records, inventories, contact
lists,etc.
General Purpose Data
Structures
Arrays
Best used :
- when amount of data is reasonably
small
- when to amount of data is predictable
in advance
General Purpose Data
Structures
Linked Lists
- when data stored cannot be predicted
- when data will be frequently inserted
and deleted
Binary Search Trees
- used when arrays or linked lists are too
slow
- O(logN) : insertion,searching, deletion
General Purpose Data
Structures
Hash Tables
- fastest data storage structure
- used in spell checkers and as symbol
tables in compilers
- may require additional memory for open
addressing implementations
Special Purpose Data
Structures
Stacks,Queues (Priority Queues)
used by a computer program to aid in
carrying out some algorithm
For example, in graph algorithms, stack
and queues were used
Abstract Data Types
- implemented by a more fundamental
data structure (array,linked list)
- conceptual aids
Special Purpose Data
Structures
Stacks
- used when you want to access the last
data inserted (LIFO structure)
- implemented using array or linked lists
depending on size
Queues
- used when you want to access the first
data item (FIFO structure)
Graphs
Unique data structure
Directly model real world situations
(maps, flight-airports,etc)
structure of the graph reflects the
structure of the problem
main choice is representation : adjacent
list or adjacency matrix
Sorting
For limited data elements (up to 10001500 entries), insertion sort may be
sufficient
When bogged down, can use merge sort
or quick sort (merge sort however
requires additional memory)
Sorting - Running Times
Sort
Average Running Time
Bubble
O( n2)
Selection
O( n2)
Insertion
O( n2)
ShellSort
O(n3/2)
Quick Sort
O(n log n)
MergeSort
O(n log n)