Transcript CS2007Ch12C

COSC 2007
Data Structures II
Chapter 13
Advanced Implementation of
Tables III
Topics

Hashing

Definition
Hash function
 Key
 Hash value
 collision


2
Open hashing
Common Problem

A common pattern in many programs is to
store and look up data



Because it is so common, many data
structures for it have been investigated

3
Find student record, given ID#
Find person address, given phone #
How?
Phone Number Problem

Problem: phone company wants to implement
caller ID.
given a phone number (the key), look up person’s
name or address(the data)
 lots of phone numbers (P=107-1) in a given area
code
 only a small fraction of them are in use


4
Nobody has a phone number :0000000 or 0000001
Comparison of Time Complexity (average)
Operation
Insertion
Deletion
Search
O(1)
O(n)
O(n)
Unsorted reference O(1)
O(n)
O(n)
Sorted Array
O(n)
O(n)
O(logn)
Sorted reference
O(n)
O(n)
O(n)
BST
O(logn)
O(logn)
O(logn)
Unsorted Array
Can we do better than O(logn)?
5
Can we do better than O(log N)?



All previous searching techniques require a
specified amount of time (O(logn) or O(n))
Time usually depends on number of elements
(n) stored in the table
In some situations searching should be almost
instantaneous -- how?

Examples


6
911 emergency system
Air-traffic control system
Can we do better than O(log N)?

Search is O(1), but huge amount of space wasted. – how to
solve this?
7
000-0002
000-0001
000-0000
Null Null Null
Xu
•••
Null Null
•••
•••
Null
Sub
263-3049

Answer: Yes … sort of, if we're lucky.
General idea: take the key of the data record
you’re inserting, and use that number directly as
the item number in a list (array).
259-1623

•••
Hashing

Basic idea:


Don't use the data value directly.
Given an array of size B, use a hash function,
h(x), which maps the given data record x to some
(hopefully) unique index (“bucket”) in the array.
0
1
x
8
h
h(x)
B-1
What is Hash Table?


The simplest kind of hash table is an array of
records.
This example has 101 records.
[0]
[1]
[2]
[3]
[4]
[5]
[100]
...
An array of records
9
What is Hash Table?
[4]
Number 256-2879


8888 Queen St.
Linda Kim
Each record has a special
field, called its key.
In this example, the key
is a long integer field
called Number.
[0]
[1]
[2]
[3]
[4]
[5]
[100]
...
An array of records
10
What is Hash Table?
[4]
Number 256-2879

The number is person's
phone number,
and the rest is
person name or address.
[0]
[1]
[2]
[3]
[4]
[5]
[100]
...
An array of records
11
What is Hash Table?

When a hash table is in use, some spots
contain valid records, and other spots are
"empty".
[0]
[1]
[2]
Number 281942902
Number
[3]
233667136
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
12
Inserting a New Record?


Number 265-1556
In order to insert a new record,
the key must somehow be
converted to an array index.
The index is called the
hash value of the key.
[0]
[1]
[2]
Number 281942902
Number
[3]
233667136
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
13
Inserting a New Record?

Number 265-1556
Typical way to create a hash value:
(Number mod 101)
What is (265-1556 mod 101) ?
[0]
[1]
[2]
Number 281942902
Number
[3]
233667136
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
14
Inserting a New Record?Number 265-1556

Typical way to create a hash value:
(Number mod 101)
What is (2651556 mod 101) ?
[0]
[1]
[2]
Number 281942902
Number
[3]
233667136
3
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
15
Number 265-1556
Inserting a New Record?

The hash value is used for
the location of the
new record.
[0]
[1]
[2]
Number 281942902
Number
[3]
233667136
[3]
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
16
Inserting a New Record?

The hash value is used for the location of
the new record.
[0]
[1]
[2]
Number 281942902
Number
233667136
[3]
Number 580625685
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
17
What is Hashing?

What is hashing?





Each item has a unique key.
Use a large array called a Hash Table.
Use a Hash Function.
Hashing is like indexing in that it involves associating a key with
a relative record address.
Hashing, however, is different from indexing in two important
ways:


With hashing, there is no obvious connection between the key and the
location.
With hashing two different keys may be transformed to the same
address.
A Hash function is a function h(K) which transforms a key K
into an address.
18

What is Hashing?

An address calculator (hashing function) is used
to determine the location of the item
0
Search key
Array
Address Calculator
(Hash table)
(Hash function)
N-1
19
What Can Be Hashed?



20
Anything!
Can hash on numbers, strings, structures, etc.
Java defines a hashing method for general objects
which returns an integer value.
Where do we use Hashing?




21
Databases (phone book, student name list).
Spell checkers.
Computer chess games.
Compilers.
Hashing and Tables


Hashing gives us another implementation of Table
ADT
Hashing operations

Initialize





Insert
Search
Delete
Hash the key; this gives an index; use it to find the
value stored in the table in O(1)

22
all locations in Hash Table are empty.
Great improvement over Log N.
Hashing

Insert pseudocode
tableInsert (newItem)
i = the array index that the address calculator gives you for the new item’s
search key
table[i]=newItem

Retrieval pseudocode
tableRerieve (searchKey)
i = array index for searchKey given by the hash function
if (table[i].getKey( ) == searchKey)
return table[i]
else
return null
23
Hashing

Deletion pseudocode
tableDelete (searchKey)
i = array index for searchKey given by the hash function
success=(tabke[I].getKey() equals searchKey
if (success)
Delete the item from table[i]
Return success
24
Hash Tables

Table size


Mapping




Simple to compute
Ideally 1-1: not possible
Even distribution
Main problems



25
Entries are numbered 0 to TSIZE-1
Choosing table size
Choosing a good hash function
What to do on collisions
How to choose the Table Size?
TSIZE = 11
H (Key) = Key mod TSIZE
TSIZE = 10
110
20
0
15
210
20
1
320
22
22
2
460
26
3
520
49
4
54
600
54
5
15
6
26
7
8
9
49
26
0
1
2
3
4
5
6
7
8
9
110
210
320
460
520
600
110 0
210,320 1
2
520 3
4
5
600 6
7
8
460 9
10
How to choose a Hashing Function?

The hash function we choose depends on
the type of the key field (the key we use to
do our lookup).


Rule



27
Finding a good one can be hard
Be easy to calculate.
Use all of the key.
Spread the keys uniformly.
How to choose a Hashing Function?

Example:

Student Ids (integers)
h(idNumber) = idNumber % B
eg. h(678921) = 678921 % 100 = 21

Names (char strings)
h(name) = (sum over the ascii values) % B
eg. h(“Bill”) = (66+105+108+108) % 101 = 86
28
Number
2641455
Collision

Here is another new record to
insert, with a hash value of 2.
My hash
value is [2].
[0]
[1]
[2]
Number 281942902
Number
233667136
[3]
Number 580625685
[4]
Number
[5]
[100]
506643548
Number 155778322
...
An array of records
29
What to do on collisions?


Open hashing (separate chaining)
Close hashing (open address)



30
Linear Probing
Quadratic Probing
Double hashing
Open hashing (separate chaining)

Keep a list of all elements that hash to the same
value.
0
0
1
4
9
16
25
36
49
64
81
31
0
1
2
3
4
5
6
7
8
9
1
81
4
25
16
64
9
49
36
Open hashing (separate chaining)

Secondary Data Structure




List
Search tree
another hash table
We expect small collision

List


32
Simple
Small overhead
0
1
2
3
4
5
6
7
8
9
0
1
81
4
25
16
64
9
49
36
Operations with Chaining

Insert with chaining



Search with chaining


33
Apply hash function to get a position.
Insert key into the Linked List at this position.
Apply hash function to get a position.
Search the Linked List at this position.
Open hashing (separate chaining)
public class ChainNode
{
Private KeyedItem item;
private ChainNode next;
public ChainNode(KeyedItem newItem, ChainNode nextNode) {
item = newItem;
next= nextNode;
// set and get methods
}
} // end of ChainNode
34
Open hashing (separate chaining)
public class HashTable
{
private final int HASH_TABLE_SIZE = 101; // size of hash table
private ChainNode [] table;
//hash table
private int size;
//size of hash table
public HashTable() {
table = new ChainNode [HASH_TABLE_SIZE];
size =0;
}
35
public bool tableIsEmpty() { return size ==0;}
public int tableLength() { return size;}
public void tableInsert(KeyedItem newItem) throws
HashException {}
public boolean tableDelete(Comparable searchKey) {}
public KeyedIten tableRetrieve(Comparable searchKey) {}
} // end of hashtable
Open hashing (separate chaining)
tableInsert(newItem)
if (table is not full) {
searchKey= the search key of newItem
i = hashIndex (searchKey)
node= reference to a new node containing
newItem
node.setNext (table[I]);
table[I] = node
}
else //table full
throw new HashException ()
36
Open hashing (separate chaining)
tableRetrieve (searchKey)
i = hashIndex (searchKey)
node= table [I];
while ((node !=null)&& node.getItem().getKey()!=
searchKey )
node=getNext ()
if (node !=null)
return node.getITem()
else
return null
37
Evaluation of Chaining

Disadvantages of Chaining


More complex to implement.
Search and Delete are harder. We need to know: The
number of elements in the table (N); the number of
buckets (B); the quality of the hash function


Advantage of Chaining


Insertions is easy and quick.
Allows more records to be stored.

38
Worse case (O(n)) for searching
The size of table is dynamic
Review

A(n) ______ maps the search key of a
table item into a location that will contain
the item.




39
hash function
hash table
AVL tree
heap
Review

A hash table is a(n) ______.




40
stack
queue
array
list
Review

The condition that occurs when a hash
function maps two or more distinct search
keys into the same location is called a(n)
______.




41
disturbance
collision
Rotation
congestion
Review

______ is a collision-resolution scheme
that searches the hash table sequentially,
starting from the original location specified
by the hash function, for an unoccupied
location.




42
Linear probing
Quadratic probing
Double hashing
Separate chaining
Review

______ is a collision-resolution scheme that
searches the hash table for an unoccupied
location beginning with the original location that
the hash function specifies and continuing at
increments of 12, 22, 32, and so on.




43
Linear probing
Double hashing
Quadratic probing
Separate chaining
Review

______ is a collision-resolution scheme
that uses an array of linked lists as a hash
table.




44
Linear probing
Double hashing
Quadratic probing
Separate chaining
Review

The load factor of a hash table is
calculated as ______.




45
table size + current number of table items
table size – current number of table items
current number of table items * table size
current number of table items / table size