ANALYSIS OF ALGORITHMS

Download Report

Transcript ANALYSIS OF ALGORITHMS

Hashing
Hash Tables - Introduction
A structure that offers fast insertion and
searching
Insertion and searching is almost O(1)
Hashing
- a range of key values is transformed into
a range of array index values
Hashing - Introduction
In a dictionary, if the main key was the array
index, searching and inserting items would be
very fast.
Example: Empdata[1000], employee database,
index=employee number
- search for employee with emp. number = 500
- Answer: Empdata[500]
- Running Time: O(1)
Hash Tables
In the previous example, it was easy since
employee number is an integer.
What if the main key is a word in the
English Alphabet (i.e. last names)
How can the main key be mapped into an
array index
Hash Tables
Sum of Digits Method
- map the alphabet A-Z to the numbers 1
to 26 (a=1,b=2,c=3,etc.)
- add the total of the letters
- For example, “cats”
(c=3,a=1,t=20,s=19);3+1+20+19=43
-”cats” will be stored using index = 43
Hash Tables
Problem
- Too may words with the same index
- “was”,”tin”,”give”,”tend”,”moan”,”tick” and
several other words add to 43
Hashing
Another Method (Multiply by Powers)
- an integer in the numeric system is in
the power of 10
- 7546=7x1000 + 5x100 + 4x10+6
- 7546=7x103 + 5x102 + 4x101+6
Can do the same thing with words
- Use 27 as base (26 letters + blank)
Hashing
“cats”=3*273+1*272+20*271+19
= 60,337
unique index for every word
main drawback : takes too much space
- For up to 10 letter words(279), one
7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate
unique keys, it assigns spaces to nonwords (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of
numbers from the numbers-multiplied-by
powers system into a smaller(reasonably
sized) array
Hashing
Hashing function
- The process of converting a number in a
large range into a number in a smaller
range.
Hashing
“cats”=3*273+1*272+20*271+19
= 60,337
unique index for every word
main drawback : takes too much space
- For up to 10 letter words(279), one
7,000,000,000,000 (7000 gigabytes)
Hashing
While the scheme was able to generate
unique keys, it assigns spaces to nonwords (aaaaaa,zzzzzzz,aaacccc,etc.)
Be able to compress a huge range of
numbers from the numbers-multiplied-by
powers system into a smaller(reasonably
sized) array
Hashing
Hashing function
- The process of converting a number in a
large range into a number in a smaller
range.
Size of smaller range
- twice the size of the data set (2s)
- for 50,000 words, array of 100,000
elements
Hashing
Hash Function
- achieved by using the modulo function
(returns the remainder)
- for example, 33 mod 10 = 3
- LargeNumber mod Smallrange
Hashing
 Hugenumber=C0*279+C1*278+C2*277 …. C9*270
 arraysize = numberofwords * 2
 arrayindex=Hugenumber mod arraysize
Hashing - Collisions
Hashing presents the risk of two elements
with the same index (although better than
sumofdigits).
Collision
- two elements with the same index key
after hashing
Collisions
Two approaches to handle collision
- Open Addressing
- Separate Chaining
Open Addressing
- Finding the next available free cell
Separate Chaining
- install a linked list at each index
Open Addressing
Three Types
- Linear Probing, Quadratic Probing, and
Double Hashing
Linear Probing
- Finding the next available cell
(x+1,x+2,etc.)
- leads to clustering
Clustering
Quadratic Probing
- Finds next available cell using the
squares as the step method
(x+1,x+4,x+27,etc)
Double Hash
- Hash again using a different hash
function to find next free cell
- 2nd hash : step size
Separate Chaining
A linked list is installed in the array index
such that entries with the same keys are
attached to the linked list
1
2
3
986
1881
1
333
Hashing
Read Chapter 5 (Data Structures and
Algorithms in C by Weiss)
Chapter 7 (in Goodrich and Tamassia
Book)
Hash Functions implementations are
presented in these chapters
Summary Notes
Data Structures and
Algorithms
Conceptual Approach
Java and C Implementations are
presented in both books
For further studies, focus more on the
mathematical aspects (look at theorems &
propositions)
- Proving
When to use what
General Purpose Data Structures
- arrays,linked lists, trees, and hash tables
- used to store and retrieve data using
key values
-applications : can be used for storing
personnel records, inventories, contact
lists,etc.
General Purpose Data
Structures
Arrays
Best used :
- when amount of data is reasonably
small
- when to amount of data is predictable
in advance
General Purpose Data
Structures
Linked Lists
- when data stored cannot be predicted
- when data will be frequently inserted
and deleted
Binary Search Trees
- used when arrays or linked lists are too
slow
- O(logN) : insertion,searching, deletion
General Purpose Data
Structures
Hash Tables
- fastest data storage structure
- used in spell checkers and as symbol
tables in compilers
- may require additional memory for open
addressing implementations
Special Purpose Data
Structures
Stacks,Queues (Priority Queues)
used by a computer program to aid in
carrying out some algorithm
For example, in graph algorithms, stack
and queues were used
Abstract Data Types
- implemented by a more fundamental
data structure (array,linked list)
- conceptual aids
Special Purpose Data
Structures
Stacks
- used when you want to access the last
data inserted (LIFO structure)
- implemented using array or linked lists
depending on size
Queues
- used when you want to access the first
data item (FIFO structure)
Graphs
Unique data structure
Directly model real world situations
(maps, flight-airports,etc)
structure of the graph reflects the
structure of the problem
main choice is representation : adjacent
list or adjacency matrix
Sorting
For limited data elements (up to 10001500 entries), insertion sort may be
sufficient
When bogged down, can use merge sort
or quick sort (merge sort however
requires additional memory)
Sorting - Running Times
Sort
Average Running Time
Bubble
O( n2)
Selection
O( n2)
Insertion
O( n2)
ShellSort
O(n3/2)
Quick Sort
O(n log n)
MergeSort
O(n log n)