Transcript Document
Lecture 9
Topics
Hashing
Reference: Introduction to Algorithm by Cormen
Chapter 12: Hash Tables
Data Structure and Algorithm
1.1
Introduction
A hash table is a generalization of the
simpler notion of an ordinary array
Searching for an element in a hash table
can take as long as searching for an
element in an array/linked list i.e. O(N)
time in the worst case.
But under reasonable assumptions,
hashing takes O(1) time to search an
element in a hash table.
Data Structure and Algorithm
1.2
Dictionary/Table
Keys
Given a student ID find the record (entry)
Data Structure and Algorithm
1.3
Direct Addressing
Direct Addressing is a simple technique that works well
when the universe U of keys is reasonably small.
Let Universe U= (0,1,…….m-1} where m is not too large
We may assume that no two elements have the same
key.
To represent the dynamic set, we use an array or direct
address table T[0..m-1] in which each position or slot
correspond to a key in the universe U
Data Structure and Algorithm
1.4
Direct Addressing
Slot k points to an element in the set with key k
If the set contains no element with key k, then T[k]=nil
0
1
7
9
2
2
3
3
4
2
1
key data
0
U
(universe of keys)
4
T
k4 5
3
5
8k5
6
5
7
8
6
9
actual
keys
Data Structure and Algorithm
1.5
8
Direct-address Table
Direct addressing is a simple technique that works well when the
universe of keys is small.
Assuming each key corresponds to a unique slot.
Direct-Address-Search(T,k)
return T[k]
Direct-Address-Insert(T,x)
return T[key[x]] x
Direct-Address-Delete(T,x)
return T[key[x]] Nil
5
2
3
entry
1
/
/
/
5
6
7
O(1) time for all operations
Data Structure and Algorithm
/
1
4
7
1
0
1.6
5
/
7
The Problem With Direct Addressing
If the universe U is large, storing a table T of size |U|
may be impractical or even impossible.
Furthermore, the set K of keys actually stored may be
so small relative to U that most of the space allocated
for T would be wasted.
Solution: map keys to smaller range 0..m-1
This mapping is called a hash function
Data Structure and Algorithm
1.7
Hash function
Hash function h maps the universe U of keys into slots of a hash
table T [0..m-1]:
h:U
{0,1,….m-1}
But two keys may hash to the same slot – a collision
T
U
(universe of keys)
0
h(k1)
h(k4)
k1
k4
K
(actual
keys)
k2
k5
h(k2) = h(k5)
h(k3)
k3
m-1
Data Structure and Algorithm
1.8
Next Problem
But two keys may hash to the same slot – a
collision
T
0
U
(universe of keys)
h(k1)
h(k4)
k1
k4
K
(actual
keys)
k2
k5
h(k2) = h(k5)
h(k3)
k3
m-1
Data Structure and Algorithm
1.9
Resolving Collisions
How can we solve the problem of
collisions?
Solution 1: chaining
Solution 2: open addressing
Data Structure and Algorithm
1.10
Chaining
Chaining puts elements that hash to the same slot in a linked
list:
U
(universe of keys)
k6
k2
k4
k1 ——
k7
k5
——
——
——
k1
k4
K
(actual
k7
keys)
T
——
k5
——
k8
k3
k3 ——
k8
——
Data Structure and Algorithm
1.11
k6 ——
k2 ——
Chaining (insert at the head)
U
(universe of keys)
k1
k4
K
(actual
k7
keys)
k6
k2
k5
k8
k3
Data Structure and Algorithm
T
——
k1 ——
——
——
——
——
——
——
——
——
1.12
Chaining (insert at the head)
U
(universe of keys)
k6
k2
k1 ——
——
——
——
k1
k4
K
(actual
k7
keys)
T
——
k5
k2 ——
——
k8
k3
Data Structure and Algorithm
k3 ——
——
——
1.13
Chaining (insert at the head)
U
(universe of keys)
k6
k2
k41 ——
——
——
——
k1
k4
K
(actual
k7
keys)
T
——
k5
k2 ——
——
k8
k3
Data Structure and Algorithm
k3 ——
——
——
1.14
k1 ——
Chaining (insert at the head)
U
(universe of keys)
k6
k2
k41 ——
k1 ——
k52 ——
k2 ——
——
——
——
k1
k4
K
(actual
k7
keys)
T
——
k5
——
k8
k3
k3 ——
——
k6 ——
Data Structure and Algorithm
1.15
Chaining (Insert to the head)
U
(universe of keys)
k6
k2
k4
k1 ——
k7
k5
——
——
——
k1
k4
K
(actual
k7
keys)
T
——
k5
——
k8
k3
k3 ——
k8
——
Data Structure and Algorithm
1.16
k6 ——
k2 ——
Operations
Direct-Hash-Search(T,k)
Search for an element with key k in list T[h(k)]
(running time is proportional to length of the list)
Direct-Hash-Insert(T,x)
(worst case O(1))
Insert x at the head of the list T[h(key[x])]
Direct-Hash-Delete(T,x)
Delete x from the list T[h(key[x])]
(same as searching)
Data Structure and Algorithm
1.17
Open Addressing
Basic idea (details in Section 12.4):
To insert: if slot is full, try another slot, …,
until an open slot is found (probing)
To search, follow same sequence of probes
as would be used when inserting the
element
If reach element with correct key, return it
If reach a NULL pointer, element is not in
table
Good for fixed sets (adding but no deletion)
Table needn’t be much bigger than n
Data Structure and Algorithm
1.18
Choosing A Hash Function
Choosing the hash function well is crucial
Bad hash function puts all elements in same
slot
A good hash function:
Should distribute keys uniformly into slots
Should not depend on patterns in the data
Three popular methods:
Division method
Multiplication method
Universal hashing
Data Structure and Algorithm
1.19
The Division Method
h(k) = k mod m
hash k into a table with m slots using the slot
given by the remainder of k divided by m
Elements with adjacent keys hashed to different
slots: good
If keys bear relation to m: bad
In Practice: pick table size m = prime number
not too close to a power of 2 (or 10)
Data Structure and Algorithm
1.20