CS-240 Data Structures
Download
Report
Transcript CS-240 Data Structures
Hash Tables
a hash table is an array of size Tsize
has index positions 0 .. Tsize-1
two types of hash tables
closed hash table
Chained (open) hash table
array element type is a <key, value> pair
all items stored in the array
element type is a pointer to a linked list of nodes containing
<key, value> pairs
items are stored in the linked list nodes
keys are used to generate an array index
home address (0 .. Tsize-1)
1
faster searching
"balanced"
search trees guarantee O(log2 n)
search path by controlling height of the
search tree
AVL tree
2-3-4 tree
red-black tree (used by STL associative
container classes)
hash
table allows for O(1) search
performance
search time does not increase as n increases
2
Considerations
How
load factor of a hash table is n/Tsize
Hash
big an array?
function to use?
int hash(KeyType key)
Collision
// 0 .. Tsize-1
resolution strategy?
hash function is many-to-one
3
Hash Function
a
hash function is used to map a key
to an array index (home address)
search starts from here
insert,
retrieve, update, delete all start
by applying the hash function to the
key
4
Some hash functions
if
KeyType is int - key % TSize
if KeyType is a string - convert to an
integer and then % Tsize
goals for a hash function
fast to compute
even distribution
cannot
guarantee no collisions unless all
key values are known in advance
5
An Closed Hash Table
Hash (key) produces
an index in the range
0 to 6. That index is
the “home address”
Some insertions:
K1 --> 3
K2 --> 5
K3 --> 2
0
1
2
3
4
5
6
K3
K3info
K1
K1info
K2
K2info
key value
6
Handling Collisions
Some more insertions:
K4 --> 3
K5 --> 2
K6 --> 4
Linear probing collision
resolution strategy
0
1
2
3
4
5
6
K6
K6info
K3
K3info
K1
K1info
K4
K4info
K2
K2info
K5
K5info
7
Search Performance
0
1
2
3
4
5
6
K6
K6info
K3
K3info
K1
K1info
K4
K4info
K2
K2info
K5
K5info
Average number of probes needed
to retrieve the value with key K?
K hash(K)
K1
3
K2
5
K3
2
K4
3
K5
2
K6
4
#probes
1
1
1
2
5
4
14/6 = 2.33 (successful)
unsuccessful search?
8
A Chained Hash Table
insert keys:
K1 --> 3
K2 --> 5
K3 --> 2
K4 --> 3
K5 --> 2
K6 --> 4
linked lists of synonyms
0
1
2
3
4
5
6
K3 K3info
K1 K1info
K5 K5info
K4 K4info
K6 K6info
K2 K2info
9
Search Performance
0
1
2
3
4
5
6
Average number of probes needed
to retrieve the value with key K?
K3 K3info
K1 K1info
K6 K6info
K2 K2info
K5 K5info
K4 K4info
K hash(K)
K1
3
K2
5
K3
2
K4
3
K5
2
K6
4
#probes
1
1
1
2
2
1
8/6 = 1.33 (successful)
unsuccessful search?
10
successful search performance
load factor
0.5
0.7
0.9
1.0
2.0
open addressing open addressing chaining
(linear probing) (double hashing)
1.50
2.17
5.50
-------
1.39
1.72
2.56
-------
1.25
1.35
1.45
1.50
2.00
11
Factors affecting Search
Performance
quality
of hash function
how uniform?
depends on actual data
collision
resolution strategy used
load factor of the HashTable
N/Tsize
the lower the load factor the better the
search performance
12
Traversal
Visit
each item in the hash table
Closed hash table
O(Tsize) to visit all n items
Tsize is larger than n
Chained
hash table
O(Tsize + n) to visit all n items
Items
value
are not visited in order of key
13
Deletions?
search
for item to be deleted
chained hash table
find node and delete it
open
hash table
must mark vacated spot as “deleted”
is different than “never used”
14
Hash Table Summary
search
speed depends on load factor
and quality of hash function
should be less than .75 for open addressing
can be more than 1 for chaining
items
not kept sorted by key
very good for fast access to unordered
data with known upper bound
to pick a good TSize
15
heap
is a binary tree that
is complete
has the heap-order property
efficient data structure for PriorityQueue ADT
max heap - item stored in each node has a key/priority that
is >= the priority of the items stored in each of its children
min heap - item stored in each node has a key/priority that
is <= the priority of the items stored in each of its children
requires the ability to compare items based on their
priorities
basis for the heapsort algorithm
16
two heaps
23
18
8 12
4 2
7
9
1
4
9
8
23 12
1
7
2
18
A heap is always a complete binary tree
17
a complete binary tree can be
stored in an array
23
18
8 12
4 2
7
9
for the item in A[i]:
leftChild is in A[2i+1]
rightChild is in A[2i+2]
parent
is in A[(i-1)/2]
1
A 23 18 9 8 12 7 1 4 2
0
1
2
3
4
5
6
7
8
Size
9
18
PriorityQueue ADT
Data Items
a collection of items which can be ordered by priority
Operations
constructor - creates an empty PQ
empty () - returns true iff a PQ is empty
size () - returns the number of items in a PQ
push (item) - adds an item to a PQ
top () - returns the item in a PQ with the highest priority
pop () – removes the item with the highest priority
from a PQ
19
PQ Data structures
unordered array or linked list
ordered array or linked list
push is O(n)
top and pop are (1)
heap
push is O(1)
top and pop are (n)
top is O(1)
push and pop are O(log2 n)
STL has a priority_queue class
is implemented using a heap
20
PQ operations
top
return item at A[0]
push and pop must maintain heap-order property
push
put new item at end (in A[size])
re-establish the heap-order property by moving the new
item to where it belongs
pop
A[0] is item to delete
swap A[0] and A[size-1]
move item at A[0] down a path to where it belongs
21
pop( )
4
A
8
18
2
23
12
7
9
1
4
12
8
2
18 12
2
23
23 18 9 8 12 7 1 4 2
0 1 2 3 4 5 6 7 8
18
7
9
1
Size
8 9
22
Balanced Search Trees
several varieties (Ch.13)
AVL trees
2-3-4 trees
Red-Black trees
B-Trees (used for searching secondary memory)
nodes are added and deleted so that the height
of the tree is kept under control
insert and delete take more work, but retrieval
(also insert & delete) never more than log2 n
because height is controlled
23