CS-240 Data Structures

Download Report

Transcript CS-240 Data Structures

Hash Tables

a hash table is an array of size Tsize


has index positions 0 .. Tsize-1
two types of hash tables

closed hash table



Chained (open) hash table



array element type is a <key, value> pair
all items stored in the array
element type is a pointer to a linked list of nodes containing
<key, value> pairs
items are stored in the linked list nodes
keys are used to generate an array index

home address (0 .. Tsize-1)
1
faster searching
 "balanced"
search trees guarantee O(log2 n)
search path by controlling height of the
search tree
AVL tree
 2-3-4 tree
 red-black tree (used by STL associative
container classes)

 hash
table allows for O(1) search
performance

search time does not increase as n increases
2
Considerations
 How

load factor of a hash table is n/Tsize
 Hash

big an array?
function to use?
int hash(KeyType key)
 Collision

// 0 .. Tsize-1
resolution strategy?
hash function is many-to-one
3
Hash Function
a
hash function is used to map a key
to an array index (home address)

search starts from here
 insert,
retrieve, update, delete all start
by applying the hash function to the
key
4
Some hash functions
 if
KeyType is int - key % TSize
 if KeyType is a string - convert to an
integer and then % Tsize
 goals for a hash function
fast to compute
 even distribution

 cannot
guarantee no collisions unless all
key values are known in advance
5
An Closed Hash Table
Hash (key) produces
an index in the range
0 to 6. That index is
the “home address”
Some insertions:
K1 --> 3
K2 --> 5
K3 --> 2
0
1
2
3
4
5
6
K3
K3info
K1
K1info
K2
K2info
key value
6
Handling Collisions
Some more insertions:
K4 --> 3
K5 --> 2
K6 --> 4
Linear probing collision
resolution strategy
0
1
2
3
4
5
6
K6
K6info
K3
K3info
K1
K1info
K4
K4info
K2
K2info
K5
K5info
7
Search Performance
0
1
2
3
4
5
6
K6
K6info
K3
K3info
K1
K1info
K4
K4info
K2
K2info
K5
K5info
Average number of probes needed
to retrieve the value with key K?
K hash(K)
K1
3
K2
5
K3
2
K4
3
K5
2
K6
4
#probes
1
1
1
2
5
4
14/6 = 2.33 (successful)
unsuccessful search?
8
A Chained Hash Table
insert keys:
K1 --> 3
K2 --> 5
K3 --> 2
K4 --> 3
K5 --> 2
K6 --> 4
linked lists of synonyms
0
1
2
3
4
5
6
K3 K3info
K1 K1info
K5 K5info
K4 K4info
K6 K6info
K2 K2info
9
Search Performance
0
1
2
3
4
5
6
Average number of probes needed
to retrieve the value with key K?
K3 K3info
K1 K1info
K6 K6info
K2 K2info
K5 K5info
K4 K4info
K hash(K)
K1
3
K2
5
K3
2
K4
3
K5
2
K6
4
#probes
1
1
1
2
2
1
8/6 = 1.33 (successful)
unsuccessful search?
10
successful search performance
load factor
0.5
0.7
0.9
1.0
2.0
open addressing open addressing chaining
(linear probing) (double hashing)
1.50
2.17
5.50
-------
1.39
1.72
2.56
-------
1.25
1.35
1.45
1.50
2.00
11
Factors affecting Search
Performance
 quality
of hash function
how uniform?
 depends on actual data

 collision
resolution strategy used
 load factor of the HashTable
N/Tsize
 the lower the load factor the better the
search performance

12
Traversal
 Visit
each item in the hash table
 Closed hash table
O(Tsize) to visit all n items
 Tsize is larger than n

 Chained

hash table
O(Tsize + n) to visit all n items
 Items
value
are not visited in order of key
13
Deletions?
 search
for item to be deleted
 chained hash table

find node and delete it
 open
hash table
must mark vacated spot as “deleted”
 is different than “never used”

14
Hash Table Summary
 search
speed depends on load factor
and quality of hash function
should be less than .75 for open addressing
 can be more than 1 for chaining

 items
not kept sorted by key
 very good for fast access to unordered
data with known upper bound

to pick a good TSize
15
heap

is a binary tree that


is complete
has the heap-order property



efficient data structure for PriorityQueue ADT


max heap - item stored in each node has a key/priority that
is >= the priority of the items stored in each of its children
min heap - item stored in each node has a key/priority that
is <= the priority of the items stored in each of its children
requires the ability to compare items based on their
priorities
basis for the heapsort algorithm
16
two heaps
23
18
8 12
4 2
7
9
1
4
9
8
23 12
1
7
2
18
A heap is always a complete binary tree
17
a complete binary tree can be
stored in an array
23
18
8 12
4 2
7
9
for the item in A[i]:
leftChild is in A[2i+1]
rightChild is in A[2i+2]
parent
is in A[(i-1)/2]
1
A 23 18 9 8 12 7 1 4 2
0
1
2
3
4
5
6
7
8
Size
9
18
PriorityQueue ADT

Data Items


a collection of items which can be ordered by priority
Operations






constructor - creates an empty PQ
empty () - returns true iff a PQ is empty
size () - returns the number of items in a PQ
push (item) - adds an item to a PQ
top () - returns the item in a PQ with the highest priority
pop () – removes the item with the highest priority
from a PQ
19
PQ Data structures

unordered array or linked list



ordered array or linked list



push is O(n)
top and pop are (1)
heap



push is O(1)
top and pop are (n)
top is O(1)
push and pop are O(log2 n)
STL has a priority_queue class

is implemented using a heap
20
PQ operations

top

return item at A[0]
push and pop must maintain heap-order property
 push




put new item at end (in A[size])
re-establish the heap-order property by moving the new
item to where it belongs
pop



A[0] is item to delete
swap A[0] and A[size-1]
move item at A[0] down a path to where it belongs
21
pop( )
4
A
8
18
2
23
12
7
9
1
4
12
8
2
18 12
2
23
23 18 9 8 12 7 1 4 2
0 1 2 3 4 5 6 7 8
18
7
9
1
Size
8 9
22
Balanced Search Trees

several varieties (Ch.13)




AVL trees
2-3-4 trees
Red-Black trees
B-Trees (used for searching secondary memory)
nodes are added and deleted so that the height
of the tree is kept under control
 insert and delete take more work, but retrieval
(also insert & delete) never more than log2 n
because height is controlled

23