Transcript Heap Sort

Hashing
CS 110: Data Structures and Algorithms
First Semester, 2010-2011
Hashing
► In
a dictionary, if it can be arranged such
that the key is also the index to the array
that stores the entries, searching and
inserting items would be very fast
► Example: empdata[1000]
index = employee ID number
► Search
for employee with ID number 500
► return empdata[500]
► Running Time: O(1)
Hash Table
►A
data structure implemented as an array
of objects, where the search keys
correspond to the array indices
► Insert and find operations involve
straight forward array accesses: O(1)
time complexity
About Hash Tables
► In
the first example shown, it was
relatively easy since employee number is
an integer
► A few problems may arise in different
situations
About Hash Table
► Problem
1: possible integer key values
might be too large; creating an
appropriate array might be impractical
► Need
to map large integer values to
smaller array indices
► Problem
2: What if the key is a word in
the English Alphabet (e.g. last names)
► Need
to map names to integers (indices)
Large Values to Small Values
► Hash
function: converts a number from a
large range into a number from a smaller
range (the range of array indices)
► Size of the array
► Rule
of thumb: the array size should be
about twice the size of the data set
► For 50,000 words, use an array of 100,000
elements
Hash Function and Modulo
► Simplest
Hash Function: achieved by
using the modulo function (returns the
remainder)
► For
example, 33 % 10 = 3
► General Formula:
LargeNumber % SmallRange
Hash Functions for Names
► Sum
of Digits Method
► Map
the alphabet A to Z to the numbers 1
to 26 (a=1, b=2, c=3, etc)
► Add the total of the letters
► For example, “cats”
►c=3,
a=1, t=20, s=19, 3+1+20+19=43
►“cats” will be stored using index 43
► Use
modulo to map to a smaller array
Collisions
► Problem
► Too
many words with the same index
► “was”, “tin”, “give”, “tend”, “moan”, “tick”
and several other words add to 43
► These are called collisions: case where two
different search keys hash to the same
index value
Collisions
► Can
occur even when dealing with
integers
► Suppose
the size of the hash table is 100
► Keys 158 and 358 hash to the same value
when using the modulo hash function
Collision Resolution Policy
► Need
to know what to do when a
collision occurs; i.e. during an insert
operation; What if the array slot is
already occupied?
► Most common policy: go to the next
available slot
► “wrap
around” the array if necessary
Collision Resolution Policy
► Consequence:
when searching, use the
hash function, first check whether the
element is the one you are looking for
► If not, try the next slots
► How do you know if the element is not in
the array?
Probe Sequence
► Sequence
of indices that serve as array
slots where a key value would map to
► The first index in the probe sequence is
the home position; the value of the hash
function
► The next indices are the alternative slots
Probe Sequence
► Suppose
the array size is 10, and the
hash function is h(K) = K%10.
► The probe sequence for K=25 is:
► 5,6,7,8,9,0,1,2,3,4
► Here,
we assume that most common
collision resolution policy of going to the
next slot: p(K,i) = I
► Goal:
exhaust array slots
Hash Table Operations
► Insert
►
object Obj with key value K
home  h(K)
for i  0 to M-1 do
pos = (home + p(K,i)) % 10
if HT[pos].getKey() = K then
throw exception “error”
// or overwrite it
else if HT[pos] is null then
HT[pos]  Obj
break;
Hash Table Operations
► Finding
►
an object with key value K
home  h(K)
for i  0 to M-1 do
pos = (home + p(K,i)) %
if HT[pos].getKey() = K
return HT[pos]
else if HT[pos] is null
throw exception “not
10
then
then
found”
Hash Table Operations
► Although
insert and find run in O(1) time
during typical conditions, the time
complexity in the worst-case is O(n)
► Something to think about: characterize
the worst-case scenarios for insert and
find
Removing Elements
► Removing
an element from a hash table
during a delete operation poses a
problem
► If we set the corresponding hash table
entry to null, then succeeding find
operations might not work properly
►
Recall that for the find algorithm, seeing a null
means a target element is not found but in fact
the element might be in a next slot
Removing Elements
► Solution:
► Arrange
tombstone
it so that deleted entries seem null
when inserting, but don’t seem null when
searching
► Requires a simple flag on the objects
stored
Hash Tables in Java
► java.util.Hashtable
► Important
methods for Hashtable class
► put(Object
key, Object entry)
► Object get(Object key)
► remove(Object key)
► boolean constainsKey(Object key)
Summary
► Hash
tables implement the dictionary
data structure and enable O(1) insert,
find, and remove operations
► Caveat:
O(n) in the worst-case because of
the possibility of collisions
Summary
► Requires
a hash function(maps keys to
array indices) and a collision resolution
policy
► Probe
sequence depicts a sequence of
array slots that an object would occupy,
given its key
► In
Java: use the Hashtable class