h + 1

Transcript h + 1

ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
ΤΕΧΝΙΚΕΣ ΚΑΤΑΚΕΡΜΑΤΙΣΜΟΥ
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
1
HASHING
Αποτελεσματικός τρόπος για:
...αποθήκευση δεδομένων
...ανάκτηση δεδομένων
Στόχος: αναζήτηση σε O(1)
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
2
Βασική λειτουργία
• Αποθηκεύουμε το στοιχείο με κλειδί k, στη θέση
h(k) του πίνακα κατακερματισμού.
• Συνάρτηση κατακερματισμού h :
– Όταν k δεν είναι ακέραιος, το h(k) είναι.
– Έχει τιμές από 0 ως Ν -1.
• Συγκρούσεις, όταν k1 ≠ k2 και h(k1) = h(k2)
– Διαφορετικά κλειδιά αντιστοιχίζονται στην ίδια θέση
του πίνακα κατακερματισμού.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
3
Συνάρτηση κατακερματισμού
• Επιτελεί δύο λειτουργίες:
– μετατρέπει την τιμή του κλειδιού σε ακέραιο,
– περιορίζει την τιμή του ακεραίου που υπολόγισε στο
προηγούμενο βήμα, εντός της περιοχής [0..N-1]
• Η πιο προφανής μέθοδος είναι η μέθοδος της
διαίρεσης ( mod N )
• Μια εναλλακτική μέθοδος είναι η :
h(k)= (a*k+b) mod N
(MAD – Multiply Add and Divide), όπου το N είναι
ένας πρώτος αριθμός και a mod N <> 0.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
4
Πίνακας Κατακερματισμού
0

1
2
025-612-0001

3
981-101-0003
4
451-229-0004
Έστω N το μέγεθος του
πίνακα, εδώ Ν = 10000
…
Χρησιμοποιείται η
συνάρτηση
κατακερματισμού: «πάρε τα
4 τελευταία ψηφία», δηλ.
h(k) = k mod Ν
9997 
9998
200-751-9998
9999

ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
5
Αντιμετώπιση Συγκρούσεων
• Όσο δεν έχουμε συγκρούσεις, ο κατακερματισμός
αποδίδει σε χρόνο Ο(1)
• Ορισμός: Παράγων Φόρτου (load factor)
α (ή λ) = n / N
– n = πλήθος στοιχείων που έχουν εισαχθεί
– N = μέγεθος πίνακα
• Στην περίπτωση συγκρούσεων τα στοιχεία μπορεί
να φυλάσσονται:
– σε άλλη δομή δεδομένων έξω από τον πίνακα
– σε εναλλακτικές θέσεις του πίνακα κατακερματισμού
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
6
Questions to Ask When Analyzing Resolution
Schemes
1. Are we guaranteed to find an empty cell if
there is one?
2. Are we guaranteed we won’t be checking the
same cell twice during one insertion?
3. What should the load factor be to obtain
O(1) average-case insert, search, and delete?
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
7
Three factors affecting the performance of
hashing
1. The hash function
– Ideally, it should distribute keys and entries evenly throughout
the table
– It should minimise collisions, where the position given by the hash
function is already occupied
2. The size of the table
– Too big will waste memory; too small will increase collisions and
may eventually force rehashing (copying into a larger table)
– Should be appropriate for the hash function used – and a prime
number is best
3. The collision resolution strategy
– Separate chaining: chain together several keys/entries in each
position
– Open addressing: store the key/entry in a different position
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
8
1. Choosing a hash function:
turning a key into a table position
• Truncation
– Ignore part of the key and use the rest as the array index (converting
non-numeric parts)
– A fast technique, but check for an even distribution throughout the table
• Folding
– Partition the key into several parts and then combine them in any
convenient way
– Unlike truncation, uses information from the whole key
• Modular arithmetic (used by truncation & folding, and on its own)
– To keep the calculated table position within the table, divide the position
by the size of the table, and take the remainder as the new position
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
9
Examples of hash functions
• Truncation: If students have an 9-digit identification
number, take the last 3 digits as the table position
– e.g. 925371622 becomes 622
• Folding: Split a 9-digit number into three 3-digit
numbers, and add them
– e.g. 925371622 becomes 925 + 371 + 622 = 1923
• Modular arithmetic: If the table size is 1000, the first
example always keeps within the table range, but the
second example does not (it should be mod 1000)
– e.g. 1923 mod 1000 = 923
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
(in C++: 1923 % 1000)
10
• Using a telephone number as a key
– The area code is not random, so will not spread the keys/entries evenly
through the table (many collisions)
– The last 3-digits are more random
• Using a name as a key
– Use full name rather than surname (surname not particularly random)
– Assign numbers to the characters (e.g. a = 1, b = 2; or use Unicode
values)
– Strategy 1: Add the resulting numbers. Bad for large table size.
– Strategy 2: Call the number of possible characters c (e.g. c = 54 for
alphabet in upper and lower case, plus space and hyphen). Then multiply
each character in the name by increasing powers of c, and add together.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
11
Modular arithmetic : Division
• The key is subject to modular (remainder)
division by an integer, which is usually
prime.
• This integer should be almost equal the
desired size of the array. The result of
the division – the remainder – determines
which array location is used.
• most common-used
others
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
in
combination
with
12
Modular arithmetic : Midsquare
• The key is squared and the digits in the
middle are retained for the address. This
works better with smaller hash values
(sizes less than 10000)
EXAMPLE : number 9876
(9876)2 = 975 353 76
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
13
Modular arithmetic : Folding / Boundary Folding
• Social Security Number : 387-58-1505
– hash as sum of three integers:
• 387 + 58 + 1505 = 1950
– hash as sum of three integers:
387
+ 85 (this number is reversed)
+ 1505
= 1977
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
14
Bar Coding
• A bar code consists of 10 digits:
1 234 567 890
• Store bar codes in a hash table
• Suppose total number of bar codes is less than
10,000….
– Where and how do I store the codes?
– What is the size of hash table?
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
15
Modular arithmetic : Folding
• The key is divided into several parts, each of which
are combined and processed to give an address. For
example, if the bar code is 70662 11001
• HashTable has 15000 entries
• Group into pairs: 70 66 21 10 01
• Multiply the first three pairs together
70 x 66 x 21 = 97020
• Add this number to the last two pairs:
97020 + 10 + 01 = 97031
• Find the remainder of mod division by 14987 (15000 –
3)
97031 % 14987 = 7109
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
16
• bar code : 66702 10110
– Group into pairs: 66 70 21 01 10
–
–
–
66 x 70 x 21 = 97020
97020 + 1 + 10 = 97031
97031 % 14987 = 7109
OOPS….same value as last bar code!
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
17
Real problems
• Suppose we are storing numeric id’s of customers,
maybe 100,000
• Now, we want to check if a person is delinquent, usually
less than 400 such people.
• Use an array of size 1000, for the delinquents.
• Put id in at id mod tableSize.
– id = 987567 table index = 567
• Clearly fast for searching
• But what happens if entries collide?
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
18
• Suppose we are storing students by social
security number
• How many students?
• How big should the table be?
• How do I store the students?
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
19
2. Choosing the table size to minimize
collisions
• As the number of elements in the table increases, the likelihood
of a collision increases - so make the table as large as practical
• If the table size is 100, and all the hashed keys are dividable by
10, there will be many collisions!
– Particularly bad if table size is a power of a small integer such as 2 or
10
• More generally, collisions may be more frequent if:
– Greatest Common Divisor (hashed keys, table size) > 1
• Therefore, make the table size a prime number (GCD = 1)
• An excess of approximately 30% is typical.
– This means that if s is the number of slots in the table and e is the
number of elements, then
s = a prime number >= 4/3 * e
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
20
3. The collision resolution strategy
• Collisions may still happen, so we need a
collision resolution strategy.
• Two principal strategies (techniques) :
–
Separate chaining
–
Open addressing
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
21
Open Addressing Strategy
• To insert a key K, compute h0(K). If the
location of the hash array, let T[h0(K)], is
empty, insert it there. If collision occurs,
probe alternative cell h1(K), h2(K), .... until an
empty cell is found.
hi(K) = (hash(K) + f(i)) mod m,
with f(0) = 0
– f: collision resolution strategy
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
22
Probing: If the table position given by the hashed key
is already occupied, increase the position by some
amount, until an empty position is found
– Linear probing: increase by 1 each time [mod table size!]
– Quadratic probing: to the original position, add 1, 4, 9, 16,…
Use the collision resolution strategy when inserting and when
finding (ensure that the search key and the found keys match)
Double hash : result of linear probing  result of another hash
function
With open addressing, the table size should be double the
expected number of elements
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
23
• Linear Probing
– f(i) = i
• cells are probed sequentially (with wraparound)
• hi(K) = (hash(K) + i) mod m
• Quadratic Probing
– f(i) = i2
• hi(K) = ( hash(K) +
i2 ) mod m
• Double Hashing
– f(i) = i * hash2(K)
• e.g. hash2(K) = R - (K mod R), with R is a
prime smaller than m
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
24
Linear Probing
• If the table is fairly empty with many collisions,
linear probing may cluster (group) keys/entries
– This increases the time to insert and to find
1
2
3
4
5
6
7
8
For a table of size n, then if the table is empty, the probability of the
next entry going to any particular place is 1/n
In the diagram, the probability of position 2 getting filled next is 2/n
(either a hash to 1 or to 2 fills it)
Once 2 is full, the probability of 4 being filled next is 4/n and then of 7 is
7/n (i.e. the probability of getting long strings steadily increases)
Linear Probing suffers from primary clustering
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
25
Primary Clustering
• We call a block of contiguously occupied table entries a cluster
• On the average, when we insert a new key K, we may hit the
middle of a cluster. Therefore, the time to insert K would be
proportional to half the size of a cluster. That is, the larger the
cluster, the slower the performance.
• Linear probing has the following disadvantages:
– Once h(K) falls into a cluster, this cluster will definitely grow in size
by one. Thus, this may worsen the performance of insertion in the
future.
– If two cluster are only separated by one entry, then inserting one
key into a cluster can merge the two clusters together. Thus, the
cluster size can increase drastically by a single insertion. This
means that the performance of insertion can deteriorate drastically
after a single insertion.
– Large clusters are easy targets for collisions.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
26
Primary Clustering
Consider inserting the following entries
81, 70, 97, 63, 76, 38, 85, 68, 21, 9, 55, 73, 57, 60, 72, 74, 85, 16, 61, 7, 49
Use the number modulo 25 to determine which
bin it should occupy
– The first five don’t cause any collisions
0
1
76
2
3
4
5
6
7
8
9
10
11
81
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
12
13
63
14
15
16
17
18
19
20
70
21
22
23
24
97
27
Primary Clustering
Inserting 38 causes a collision in bin 13
The next seven do not cause any further
collisions
0
1
76
2
3
4
5
6
7
55
81
57
8
9
10
9
85
11
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
12
13
14
63
38
15
16
17
18
68
19
20
21
22
23
70
21
97
73
24
28
Primary Clustering
The next four insertions cause collisions:
60 (bin
72 (bin
74 (bin
85 (bin
10)
22)
24)
10)
We can safely insert 16 into bin 16
0
1
74
76
2
3
4
5
6
7
55
81
57
8
9
10
11
12
13
14
9
85
60
85
63
38
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
15
16
16
17
18
68
19
20
21
22
23
24
70
21
97
73
72
29
Primary Clustering
The remaining insertions all cause collisions:
61 (bin 11)
7 (bin 7)
49 (bin 24)
0
1
2
74
76
49
3
4
5
6
7
8
9
10
11
12
13
14
15
16
55
81
57
7
9
85
60
85
63
38
61
16
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
17
18
68
19
20
21
22
23
24
70
21
97
73
72
30
Primary Clustering
The length of these chains will affect the number of
probes required to perform insertions, accesses, or
removals
It is possible to estimate the average number of probes
for a successful search, where λ is the load factor:
1
2
1 

1 

 1  
For example: if λ = 0.5, we require 1.5 probes on
average
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
31
Primary Clustering
The number of probes for an unsuccessful
search or for an insertion is higher:
1
2

1 
1 

2 



1




For 0 ≤  ≤ 1, then (1 – )2 ≤ 1 – , and therefore
the reciprocal will be larger
– Again, if  = 0.5 then we require 2.5 probes on
average
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
32
Primary Clustering
The following plot shows how the number of
required probes increases
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
33
Primary Clustering
• Our goal was to keep all operations O(1)
• Unfortunate, as  grows, so does the run time
• One solution is to keep the load factor under a given
bound
• If we choose  = 2/3, then the number of probes for
either a successful or unsuccessful search is 2 and 5,
respectively.
• Therefore, we have three choices:
– Choose M large enough so that we will not pass this load factor
– Double the number of bins if the chosen load factor is reached
– Choose a different strategy from linear probing
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
34
Primary Clustering
• The first solution (choose M sufficiently large) is most
useful if we know all the possible entries
• The second (doubling) is only useful if we have an
environment where we can dynamically allocate memory
• For the third, we will look at quadratic probing and
double hashing. Quadratic Probing is the most common
technique to avoid clustering.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
35
Quadratic probing
• Quadratic probing is a solution to the primary
clustering problem
– Linear probing adds 1, 2, 3, etc. to the original hashed key
– Quadratic probing adds 12, 22, 32 etc. to the original
hashed key
• However, whereas linear probing guarantees that all
empty positions will be examined if necessary, quadratic
probing does not.
• Two keys with different home positions will have different probe
sequences
– e.g. m=101, h(k1)=30, h(k2)=29
– probe sequence for k1: 30,30+1, 30+4, 30+9
– probe sequence for k2: 29, 29+1, 29+4, 29+9
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
36
Example (1) - Quadratic Probing
Use quadratic probing to insert the following numbers into
an initially empty hash table with 11 bins where the hash
value of a number is the least-significant digit.
81, 70, 34, 49, 50, 64
Starting with an initially empty table:
0
1
2
3
4
5
6
7
8
9
10
∅
∅
∅
∅
∅
∅
∅
∅
∅
∅
∅
We insert 81 in 1,
0
1
2
3
4
5
6
7
8
9
10
∅
81
∅
∅
∅
∅
∅
∅
∅
∅
∅
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
37
70 in 0,
0
1
2
3
4
5
6
7
8
9
10
70
81
∅
∅
∅
∅
∅
∅
∅
∅
∅
34 in bin 4,
0
1
2
3
4
5
6
7
8
9
10
70
81
∅
∅
34
∅
∅
∅
∅
∅
∅
and 49 in bin 9.
0
1
2
3
4
5
6
7
8
9
10
70
81
∅
∅
34
∅
∅
∅
∅
49
∅
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
38
• Inserting 50, we note that bin 0 is occupied and
therefore we check:
–
–
–
–
0 + 1 ≡ 1 which is occupied,
0 + 4 ≡ 4 which is occupied,
0 + 9 ≡ 9 which is occupied, and
0 + 16 ≡ 5 which is unoccupied.
• Thus, 50 goes into bin 5.
0
1
2
3
4
5
6
7
8
9
10
70
81
∅
∅
34
50
∅
∅
∅
49
∅
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
39
• Inserting 64, we note that bin 4 is occupied, and
therefore we check:
– bin 4 + 1 ≡ 5 which is occupied, and
– bin 4 + 4 ≡ 8 which is unoccupied.
• Thus, 64 goes into bin 8.
0
1
2
3
4
5
6
7
8
9
10
70
81
∅
∅
34
50
∅
∅
64
49
∅
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
40
Example (2) – Quadratic Probing
From the hash table in Example 1, search for the elements 60, 61, 62, 63, 64, 65, 66,
67, 68, and 69.
60. Searching for 60, we check 0, 0 + 1 ≡ 1, 0 + 4 ≡ 4, 0 + 9 ≡ 9, 0 + 16 ≡ 5, 0 + 25 ≡ 3,
and 3 is empty, and therefore 60 is not in the hash table.
61. Searching for 61, we check 1 and 1 + 1 ≡ 2 and 2 is empty. Therefore 61 is not in the
hash table.
62. 2 is empty, therefore 62 is not in the hash table.
63. 3 is empty, therefore 63 is not in the hash table.
64. Searching for 64, we check 4, 4 + 1 ≡ 5, and 4 + 4 ≡ 8, and 64 is located in 8, and
therefore 64 is in the hash table.
65. Searching for 65, we check 5 and 5 + 1 ≡ 6 and 6 is empty. Therefore 65 is not in
the hash table.
66. 6 is empty, therefore 66 is not in the hash table.
67. 7 is empty, therefore 67 is not in the hash table.
68. Searching for 68, we check 8, 8 + 1 ≡ 9, 8 + 4 ≡ 1, and 8 + 9 ≡ 6 and 6 is empty.
Therefore 68 is not in the hash table.
69. Searching for 69, we check 9 and 9 + 1 ≡ 10 and 10 is empty. Therefore 69 is not
located in the hash table.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
41
Example (3) - Quadratic Probing
• For example, suppose an element was to be inserted in
bin 23 in a hash table with 31 bins
• The sequence in which the bins would be checked is:
23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0
• Even if two bins are initially close, the sequence in
which subsequent bins are checked varies greatly
• Again, with M = 31 bins, compare the first 16 bins which
are checked starting with 22 and 23:
22
23
22,23,26,0,7,16,27,9,24,10,29,19,11,5,1,30
23,24,27,1,8,17,28,10,25,11,30,20,12,6,2,0
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
42
Quadratic Probing: Properties
• Thus, quadratic probing solves the problem of
primary clustering
• Unfortunately, there is a second problem which
must be dealt with
• Suppose we have M = 8 bins:
12 ≡ 1, 22 ≡ 4, 32 ≡ 1
• In this case, we are checking bin h + 1 twice
having checked only one other bin
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
43
• Disadvantage of this method:
– After a number of probes the sequence of steps repeats itself
(remember that the step will be probe number2 mod the size of
the hash table). This repetition occurs when the probe number
is roughly half the size of the hash table.
– e.g. Table size 16 and original hashed key 3 gives the sequence:
3, 4, 7, 12, 3, 12, 7, 4…
• More generally, with quadratic probing, insertion may be
impossible if the table is more than half-full!
• Need to rehash
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
44
• For any  < 0.5, quadratic probing will find an
empty slot; for bigger , quadratic probing may find
a slot
• If the table size is prime, then a new key can always
be inserted if the table is at least half empty
• Keys that hash to the same home position will probe
the same alternative cells
• Simulation results suggest that it generally causes
less than an extra half probe per search
• Quadratic probing does not suffer from primary
clustering: keys hashing to the same area are not bad
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
45
Secondary Clustering
• The phenomenon of primary clustering will not occur
with quadratic probing
• Quadratic Probing suffers from a milder form of
clustering called secondary clustering (if multiple items
all hash to the same initial bin, the same sequence of
numbers will be followed ). The effect is less significant
than that of primary clustering.
• As with linear probing, if two keys have the same initial
probe position, then their probe sequences are the
same, since h(k1,0) = h(k2,0) implies h(k1,1) = h(k2,1).
So only m distinct probes are used.
• Clustering can occur around the probe sequences.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
46
Secondary Clustering
• Secondary clustering may be a problem if the hash
function does not produce an even distribution of
entries
• To avoid secondary clustering, the probe sequence need
to be a function of the original key value, not the home
position
• One solution to secondary is double hashing:
associating with each element an initial bin (defined by
one hash function) and a skip (defined by a second hash
function)
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
47
Review
Linear probing:
– Look at bins k, k + 1, k + 2, k + 3, k + 4, …
– Primary clustering
Quadratic probing:
– Look at bins k, k + 1, k + 4 , k + 9, k + 16, …
– Secondary clustering (dangerous for poor hash
functions)
– Expensive:
• Prime-sized arrays
• Euclidean algorithm for calculating remainders
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
48
Double Hashing
An alternate solution
– Give each object (with high probability) a different
jump size
– Associate with each object an initial bin and a jump
size
for ( int k = 0; k < M; ++k ) {
bin = (initial + k*jump) % M;
– The jump size and the number of bins M must be
relatively prime
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
49
Problem:
– Will initial + k*jump step through all of the
bins?
– The output of:
M = 16;
initial = 5
jump
= 12;
for ( int k = 0; k < M; ++k ) {
cout << (initial + k*jump) % M << ' ';
}
is
5 1 13
9
5
1 13
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
9
5
1 13
9
5
1 13
9
50
Problem:
– Not all jump sizes will visit all bins
– However, in this case, we visit all bins
M = 16;
initial = 5
jump
= 7;
for ( int k = 0; k < M; ++k ) {
cout << (initial + k*jump) % M << ' ';
}
is
5 12
3 10
1
8 15
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
6 13
4 11
2
9
0
7 14
51
Double Hashing
• Double hashing aims to avoid both primary and
secondary clustering and is guaranteed to find a free
element in a hash table as long as the table is not full.
It achieves these goals by calculating the step value
using a second hash function g.
step(k) = k.g(key)
• This new hash function g should:
– be different from the original hash function (remember
that it was the original hash function that resulted in the
collision in the first place) and,
– not result in zero (as original index + 0 = original index)
• One of the best methods available for open addressing
because the permutations produced have many of the
characteristics of randomly chosen permutations.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
52
• The second hash function is usually chosen as follows:
g(key) = q – (key % q),
where q is a prime number q < N (N is the size of the
array).
• In order for the entire table to be searched, the value of
the second hash function, g(k), must be relatively prime to
the table size m.
• Remark: It is important that the size of the hash table is a
prime number if double hashing is to be used. This
guarantees that successive probes will (eventually) try
every index in the hash table before an index is repeated
(which would indicate that the hash table is full).
– For other hashings (and for q) we want to use prime
numbers to eliminate existing patterns in the data.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
53
Double Hashing
• Probe sequence:
0th probe = h(k) mod TableSize
1th probe = (h(k) + g(k)) mod TableSize
2th probe = (h(k) + 2*g(k)) mod TableSize
3th probe = (h(k) + 3*g(k)) mod TableSize
...
ith probe = (h(k) + i*g(k)) mod TableSize
• Performance of Double hashing:
–
–
Much better than linear or quadratic probing because it
eliminates both primary and secondary clustering.
BUT requires a computation of a second hash function.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
54
Example
h(k) = k mod 7 and g(k) = 5 – (k mod 5)
93
76
0
1
2
3
4
5
6
Probes
76
1
0
1
2
3
4
5
6
93
76
1
40
0
1
2
3
4
5
6
93
40
76
1
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
47
0
1
2
3
4
5
6
47
93
40
76
2
10
0
1
2
3
4
5
6
47
93
10
40
76
1
55
0
1
2
3
4
5
6
47
93
10
55
40
76
2
55
Selecting a second hash function
• Second hash function g()must never evaluate to zero
• For any key K, g(K) must be relatively prime to the table size
m. Otherwise, we will only be able to examine a fraction of the
table entries.
– E.g., if hash(K)
= 0 and g(K) = m/2, then we can only
examine the entries T[0], T[m/2],and nothing else!
• One solution is to make m prime, and choose R to be a prime
smaller than m, and set
g(K) = R – (K mod R)
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
56
Example
• Load the keys 18, 26, 35, 9, 64, 47, 96, 36, and 70 in this
order, in an empty hash table of size 13
(a) using double hashing with the first hash function:
h(key) = key % 13
and the second hash function:
g(key) =
1 + key % 12
(b) using double hashing with the first hash function:
h(key) = key % 13
and the second hash function:
g(key) =
7
-
key % 7
Show all computations.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
57
hi(key) = [h(key) + i*g(key)]% 13
h0(18) = (18%13)%13 = 5
h(key) = key % 13
h0(26) = (26%13)%13 = 0
g(key) = 1 + key % 12
h0(35) = (35%13)%13 = 9
h0(9) = (9%13)%13 = 9 collision
g(9) = 1 + 9%12 = 10
h1(9) = (9 + 1*10)%13 = 6
h0(64) = (64%13)%13 = 12
h0(47) = (47%13)%13 = 8
h0(96) = (96%13)%13 = 5 collision
g(96) = 1 + 96%12 = 1
h1(96) = (5 + 1*1)%13 = 6
collision
h2(96) = (5 + 2*1)%13 = 7
h0(36) = (36%13)%13 = 10
h0(70) = (70%13)%13 = 5
collision
g(70) = 1 + 70%12 = 11
h1(70) = (5 + 1*11)%13 = 3
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
58
hi(key) = [h(key) + i*g(key)]% 13
h(key) = key % 13
h0(18) = (18%13)%13 = 5
h0(26) = (26%13)%13 = 0
g(key) = 7 - key % 7
h0(35) = (35%13)%13 = 9
h0(9) = (9%13)%13 = 9 collision
g(9) = 7 - 9%7 = 5
h1(9) = (9 + 1*5)%13 = 1
h0(64) = (64%13)%13 = 12
h0(47) = (47%13)%13 = 8
h0(96) = (96%13)%13 = 5 collision
g(96) = 7 - 96%7 = 2
h1(96) = (5 + 1*2)%13 = 7
h0(36) = (36%13)%13 = 10
h0(70) = (70%13)%13 = 5
collision
g(70) = 7 - 70%7 = 7
h1(70) = (5 + 1*7)%13 = 12
collision
h2(70) = (5 + 2*7)%13 = 6
59
Probing Techniques - review
0th try
1st try i
2nd try
3rd try
0th try
1st try i
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0th try
2nd try
2nd try
3rd try
…
…
i
Quadratic probing: Double hashing*:
1st try
3rd try
…
Linear probing:
*(determined by a second
hash function)
60
Rehashing
• If the load factor goes over the safe limit, we should
increase the size of the hash table (as for dynamic
arrays). This process is called rehashing.
• Comments:
– we cannot just double the size of the table, as the
size should be a prime number;
– it will change the main hash function
– it’s not enough to just copy items
• Rehashing will take time O(N)
Rehashing
When the table gets too full, create a bigger
table (usually 2x as large) and hash all the
items from the original table into the new
table.
• When to rehash?
– When load factor reaches some threshold (e.g,.
λ ≥0.5),
– when an insertion fails
– some other threshold
• Cost of rehashing?
62
Rehashing: enlarging the table
• To rehash:
– Create a new table of double the size (adjusting until it is again prime)
– Transfer the entries in the old table to the new table, by recomputing
their positions (using the hash function)
• When should we rehash?
– When the table is completely full
– With quadratic probing, when the table is half-full or insertion fails
• Why double the size?
– If n is the number of elements in the table, there must have been n/2
insertions before the previous rehash (if rehashing done when table
full)
– So by making the table size 2n, a constant cost is added to each
insertion
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
63
Rehashing Example
h(x) = x mod 7
λ = 0.57
h(x) = x mod 17
λ = 0.29
Rehashing
Insert 23
λ = 0.71
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
64
Applications (1)
• Compilers use hash tables to keep track of declared variables
• A hash table can be used for on-line spelling checkers — if
misspelling detection (rather than correction) is important, an
entire dictionary can be hashed and words checked in constant
time
• Game playing programs use hash tables to store seen positions,
thereby saving computation time if the position is encountered
again
• Hash functions can be used to quickly check for inequality — if two
elements hash to different values they must be different
• Storing sparse data
• Useful in applications when the input keys come in sorted order.
This is a bad case for binary search tree. AVL tree and B+-tree
are harder to implement and they are not necessarily more
efficient.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
65
Applications (2)
• Keeping track of customer account information
at a bank
– Search through records to check balances and
perform transactions
• Keep track of reservations on flights
– Search to find empty seats, cancel/modify
reservations
• Search engine
– Looks for all documents containing a given word
66
Performance of Hashing
• The number of probes depends on the load
factor (usually denoted by ) which represents
the ratio of entries present in the table to the
number of positions in the array
• We also need to consider successful and
unsuccessful searches separately
• For a chained hash table, the average number
of probes for an unsuccessful search is  and
for a successful search is 1 + /2
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
67
Performance of Hashing (2)
• For open addressing, the formulae are more complicated but
typical values are:
Load Factor
0.1
0.5
0.8
0.9
0.99
Successful search
Linear Probes
Quadratic Probes
1.05
1.04
1.6
1.5
3.4
2.1
6.2
2.7
21.3
5.2
Unsuccessful search
Linear Probes
1.13
Quadratic probes
1.13
2.7
2.2
15.4
5.2
59.8
11.9
430
126
• Note that these do not depend on the size of the array or the
number of entries present but only on the ratio (the load factor)
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
68
Performance of Hashing (3)
• In the worst case, searches,
insertions and removals on a
hash table take O(n) time
• The worst case occurs when all
the keys inserted into the map
collide
• The load factor  = n/N affects
the performance of a hash table
• Assuming that the hash values
are like random numbers, it can
be shown that the expected
number of probes for an
insertion with open addressing
is
1 / (1  )
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
• In practice, hashing is very
fast provided the load
factor is not close to 100%
• Applications of hash
tables:
– small databases
– compilers
– browser caches
69
Analysis of Hashing with Chaining
• Worst case
– All keys hash into the same bucket
– a single linked list.
– insert, delete, find take O(n) time.
• Average case
– Keys are uniformly distributed into buckets
– O(1+N/B): N is the number of elements in a hash
table, B is the number of buckets.
– If N = O(B), then O(1) time per operation.
– N/B is called the load factor of the hash table.
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
70
The Efficiency of Hashing
The relative efficiency of four collision-resolution methods
13 B-71
Commonly Used Hash Functions
Suppose that each key is a string. The following C++ function uses the division
method to compute the address of the key:
int hashFunction(char *key, int keyLength)
{
int sum = 0;
for(int j = 0; j <= keyLength; j++)
sum = sum + static_cast<int>(key[j]);
return (sum % HTSize);
}//end hashFunction
Note that static_cast<int> operator truncates key[j] into an integer.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
72
Important Factors When Designing Hash Tables
To Minimize Collisions:
1. Distribute the elements evenly.
–
–
Use a hash function that distributes keys evenly
Make the table size, m, a prime number not near a
power of two if using a division method hash
function
2. Use a load factor, λ = n / m, that’s
appropriate for the implementation.
–
–
1.0 or less for chaining ( i.e., n ≤ m ).
0.5 or less for linear or quadratic probing or double
hashing ( i.e., n ≤ m / 2 )
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
73
Hashing in the real world
• Search time varies, but on average it is
great – no matter how much information!
• Udi Manber (chief scientist,
):
”The most important techniques behind
hashing, hashing,
hashing, and hashing”.
Yahoo! are: hashing,
• Lots of other critical applications in databases,
search engines, algorithms, etc.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
74
Comparison : other representations versus hashing
• Hash tables are very good if there is a need for many
searches in a reasonably stable table
• Hash tables are not so good if there are many
insertions and deletions, or if table traversals are
needed — in this case, AVL trees are better
• If there are more data than available memory then use
a B-tree
• Also, hashing is very slow for any operations which
require the entries to be sorted
– e.g. Find the minimum key
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
75
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
76
Παραδείγματα & Ασκήσεις
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
77
Example: Collision Resolution
Using the hash function:
h (k)
= 2k mod 10
Insert the numbers:
2, 20, 7, 15, 3, 0, 4, 14
into a one-dimensional
array of 10 elements
using linear probing to
resolve collisions.
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
h (k) = 2k mod 10
2, 20, 7, 15, 3, 0, 4, 14
0
1
20
15
2
3
4
0
5
6
7
3
7
8
9
2
4
14
78
ΑΣΚΗΣΗ - 1
• Give the contents of the hash table that
results when you insert items with the keys :
EASYQUTION
in that order into an initially empty table of M =
5 lists, using separate chaining with unordered
lists. Use the hash function k mod M to
transform the kth letter of the alphabet into a
table index, e.g., hash(I) = hash(9) = 9 % 5 = 4.
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
79
ΑΣΚΗΣΗ - 2
• Give the contents of the hash table that
results when you insert items with the keys
EASYQUTION
in that order into an initially empty table of
size M = 16 using linear probing.
• Use the hash function 11k mod M to transform
the kth letter of the alphabet into a table
index.
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
80
ΑΣΚΗΣΗ - 3
•
The following eight strings are to be inserted into an initially empty static
hash table that has 13 available addresses (0 to 12). Use linear probing to
resolve collisions. The value returned when the hash function is applied to
each string is shown in the column labelled Home address.
String
sharks
dragons
eagles
knights
eels
raiders
broncos
warriors
•
Home address
2
7
1
12
0
1
9
12
Draw the hash table after the strings are inserted in the order given.
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
81
ΛΥΣΗ άσκησης - 3
0 eels
1 eagles
2 sharks
3 tigers [home=1]
4 warriors [home=12]
5
6
7 dragons
8
9 bulldogs
10
11
12 knights
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
82
ΑΣΚΗΣΗ -4
• Δίνεται ο παρακάτω πίνακας
κατακερματισμού με αλυσίδα :
• Να εισάγετε τον αριθμό 53 στον
πίνακα με βάση τη συνάρτηση
κατακερματισμού που
χρησιμοποιείται.
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
10
23
24
36
1
14
16
17
7
29
31
20
56
42
83
Λύση άσκησης - 4
53 = 4 x 11 + 9
53 mod 11 = 9
0
1
2
3
4
5
6
7
8
9
10
23
24
36
1
56
14
16
17
7
29
31
20
42
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
10
23
24
36
1
14
16
17
7
29
53
20
56
42
31
84
ΑΣΚΗΣΗ - 5
• Ποια είναι η πολυπλοκότητα της μεθόδου
κατακερματισμού με αλυσίδα;
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
85
ΑΣΚΗΣΗ - 6
• Δίνεται ο πίνακας κατακερματισμού με
τη μέθοδο linear probing:
– Να εισάγετε τον αριθμό 12 στον πίνακα με
βάση τη συνάρτηση κατακερματισμού που
χρησιμοποιείται.
– Στη συνέχεια να αναζητήσετε τον αριθμό
15 (να βρείτε αν υπάρχει η όχι στον
πίνακα)
– Στο τέλος να διαγράψετε τον αριθμό 9
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
10
42
1
24
14
16
28
7
31
9
86
Linear Probing (insert 12)
12 = 1 x 11 + 1
12 mod 11 = 1
0
1
2
3
4
5
6
7
8
9
10
42
1
24
14
16
28
7
31
9
0
1
2
3
4
5
6
7
8
9
10
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
42
1
24
14
12
16
28
7
31
9
87
Search with linear probing (Search 15)
15 = 1 x 11 + 4
15 mod 11 = 4
0
1
2
3
4
5
6
7
8
9
10
42
1
24
14
12
16
28
7
31
9
NOT FOUND !
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
88
Deletion with linear probing: Delete 9
9 = 0 x 11 + 9
9 mod 11 = 9
0
1
2
3
4
5
6
7
8
9
10
42
1
24
14
12
16
28
7
31
9
FOUND !
ΔΠΘ-ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
10
42
1
24
14
12
16
28
7
31
D
89
ΑΣΚΗΣΗ - 7
h(k,i)=(h(k)+ i2) mod m
Offsets: 0, 1, 4, …
With H=h(k), we try the following
cells with wraparound:
H, H + 12, H + 22, H + 32 …
Insert Keys: 10, 23, 14, 9, 16, 25, 36,
44, 33
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
90
Double Hashing: Example
h1(k)=k mod 13
h2(k)=1+(k mod 11)
h(k,i)=(h1(k)+i*h2(k)) mod 13
• Insert key 14:
h1(14,0)=14 mod
h(14,1)=(h1(14)
= (1 +
h(14,2)=(h1(14)
= (1 +
13 = 1
+ h2(14)) mod 13
4) mod 13 = 5
+ 2 h2(14)) mod 13
8) mod 13 = 9
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
0
1
2
3
4
5
6
7
8
9
10
11
12
79
69
98
72
14
50
91
ΑΣΚΗΣΗ – 8
0
1
2
3
4
5
6
7
8
9
Hash Functions:
H(K) = K mod M
G(K)= 1 + ((K/M) mod (M-1))
M =
Insert these values into the hash table in this
order. Resolve any collisions with double hashing:
13, 28, 33, 147, 43
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
92
Review : Open Addressing Facts
• In general, primes give the best table sizes.
• With any open addressing method of collision resolution,
as the table fills, there can be a severe degradation in the table
performance.
• Load factors between 0.6 and 0.7 are common.
• Load factors > 0.7 are undesirable.
• The search time depends only on the load factor, not on the table
size.
• We can use the desired load factor to determine appropriate table
size:
ΔΠΘ - ΜΠΔ:ΔΟΜΕΣ ΔΕΔΟΜΕΝΩΝ & ΑΡΧΕΙΩΝ
93

h + 1

Transcript h + 1

Directory