Transcript Hash Tables

Lecture 21: Hash Tables
Wednesday, November 17, 2004
1
Outline
• Hash-tables (13.4)
Note:
• Used as indexes (far less than B+ trees)
• Used during query processing (a lot !!)
2
In Class
• Suppose the B+ tree has depth 4 and degree d=200
• How many records does the relation have
(maximum) ?
• How many index blocks do we need to read and/or
write during:
– A key lookup
– An insertion
– A deletion
3
Hash Tables
• Secondary storage hash tables are much like
main memory ones
• Recall basics:
– There are n buckets
– A hash function f(k) maps a key k to {0, 1, …, n-1}
– Store in bucket f(k) a pointer to record with key k
• Secondary storage: bucket = block, use
overflow blocks when needed
4
Hash Table Example
• Assume 1 bucket (block) stores 2 keys +
pointers
e
0
• h(e)=0
b
• h(b)=h(f)=1
1
f
• h(g)=2
g
2
• h(a)=h(c)=3
3
a
c
Here: h(x) = x mod 4
5
Searching in a Hash Table
•
•
•
•
Search for a:
Compute h(a)=3
Read bucket 3
1 disk access
0
1
e
b
f
2
g
3
a
c
6
Insertion in Hash Table
• Place in right bucket, if space
• E.g. h(d)=2
0
1
2
e
b
f
g
d
3
a
c
7
Insertion in Hash Table
• Create overflow block, if no space
• E.g. h(k)=1
0
1
2
• More over- 3
flow blocks
may be needed
e
b
k
f
g
d
a
c
8
Hash Table Performance
• Excellent, if no overflow blocks
• Degrades considerably when number of
keys exceeds the number of buckets (I.e.
many overflow blocks).
9
Extensible Hash Table
• Allows has table to grow, to avoid
performance degradation
• Assume a hash function h that returns
numbers in {0, …, 2k – 1}
• Start with n = 2i << 2k , only look at first i
most significant bits
10
Extensible Hash Table
• E.g. i=1, n=2i=2, k=4
i=1
0
1
0(010)
1
1(011)
1
• Note: we only look at the first bit (0 or 1)
11
Insertion in Extensible Hash
Table
• Insert 1110
i=1
0
1
0(010)
1
1(011)
1
1(110)
12
Insertion in Extensible Hash
Table
• Now insert 1010
i=1
0
1
0(010)
1
1(011)
1
1(110), 1(010)
• Need to extend table, split blocks
• i becomes 2
13
Insertion in Extensible Hash
Table
i=2
00
01
10
11
0(010)
1
10(11)
2
10(10)
11(10)
2
14
Insertion in Extensible Hash
Table
• Now insert 0000, then 0101
i=2
0(010)
1
0(000), 0(101)
00
01
10
11
10(11)
2
10(10)
11(10)
2
• Need to split block
15
Insertion in Extensible Hash
Table
• After splitting the block
i=2
00
01
10
11
00(10)
2
00(00)
01(01)
2
10(11)
2
10(10)
11(10)
2
16
Extensible Hash Table
• How many buckets (blocks) do we need to
touch after an insertion ?
• How many entries in the hash table do we
need to touch after an insertion ?
17
Performance Extensible Hash
Table
• No overflow blocks: access always one read
• BUT:
– Extensions can be costly and disruptive
– After an extension table may no longer fit in
memory
18
Linear Hash Table
•
•
•
•
Idea: extend only one entry at a time
Problem: n= no longer a power of 2
Let i be such that 2i <= n < 2i+1
After computing h(k), use last i bits:
– If last i bits represent a number > n, change msb
from 1 to 0 (get a number <= n)
19
Linear Hash Table Example
• n=3
(01)00
i=2
(11)00
(01)11 BIT FLIP
00
01
10
(10)10
20
Linear Hash Table Example
• Insert 1000: overflow blocks…
(01)00
i=2
(10)00
(11)00
(01)11
00
01
10
(10)10
21
Linear Hash Tables
• Extension: independent on overflow blocks
• Extend n:=n+1 when average number of
records per block exceeds (say) 80%
22
Linear Hash Table Extension
• From n=3 to n=4
i=2
(01)00
(01)00
(11)00
(11)00
(01)11
00
01
10
(01)11
i=2
(10)10
• Only need to touch
one block (which one ?)
(10)10
00
01
10
n=11
(01)11
23
Linear Hash Table Extension
• From n=3 to n=4 finished
(01)00
(11)00
• Extension from n=4
to n=5 (new bit)
• Need to touch every
single block (why ?)
i=2
(10)10
00
01
10
11
(01)11
24