Ch25 - Skylight Publishing
Download
Report
Transcript Ch25 - Skylight Publishing
Java Methods
Object-Oriented Programming
and Data Structures
2nd AP edition with GridWorld
Maria Litvin ● Gary Litvin
Chapter
Hash
Function
Danger
Keep Out
25
Lookup Tables and Hashing
Copyright © 2011 by Maria Litvin, Gary Litvin, and Skylight Publishing. All rights reserved.
Objectives:
• Learn about lookup tables
• Learn about hashing
• Review java.util.HashSet and
java.util.HashMap
25-2
Lookup Tables
• A lookup table is an array that helps to find
data very quickly.
• The array stores references to data records
(or some values).
• A data record is identified by some key.
• The value of a key is directly translated into
an array index using a simple formula.
25-3
Lookup Tables (cont’d)
• Only one key can be mapped onto a
particular index (no collisions).
• The index that corresponds to a key must fall
into the valid range (from 0 to array.length-1).
• Access to data is “instantaneous” (O(1)).
25-4
Lookup Tables: Example 1
Zip
codes
0
1
...
600
601
...
1004
1005
1006
1007
1008
1009
...
99950
99951
...
99998
99999
<null>
<null>
Corresponding
locales
<null>
Adjuntas, PR
Amherst, MA
Barre, MA
<null>
Belchertown, MA
Blanford, MA
Bondsville, MA
Some table entries
remain unused
Ketchikan, AK
<null>
<null>
<null>
25-5
Lookup Tables: Example 2
private static final int [ ] n_thPowerOf3 =
{ 1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683 };
...
// precondition: 0 <= n < 10
public int powOf3 (int n)
{
return n_thPowerOf3 [ n ];
}
25-6
Lookup Tables: Example 3
256 colors used
in a particular
image; each of
the palette entries
corresponds to a
triplet of RGB
values
25-7
Applications of Lookup Tables
•
•
•
•
Data retrieval
Data compression and encryption
Tabulating functions
Color mapping
25-8
Hash Tables
• A hash table is similar to a lookup table.
• The value of a key is translated into an array
index using a hash function.
• The index computed for a key must fall into
the valid range.
• The hash function can map different keys
onto the same array index — this situation is
called a collision.
25-9
Hash Tables (cont’d)
• The hash function should map the keys onto
the array indices randomly and uniformly.
• A well-designed hash table and hash function
minimize the number of collisions.
• There are two common techniques for
resolving collisions: chaining and probing.
25-10
Each element in the
array is itself a collection,
called a bucket (a list or
a BST), which is
searched for the desired
key
Chaining
Hash
Function
"Peach"
126
Danger
Keep Out
122
123
124
125
126
127
"Peach"
Buckets
25-11
If the place where we want to store
the key is occupied by a different
key, we store the former in another
location in the same array, computed
using a certain probing formula
Probing
"Peach"
Hash
Function
126
Danger
Keep Out
The probing function
recalculates the
index
Peach?
125
126
127
128
Apple Banana Lemon Orange
Nope...
Nope...
129
130
Peach
Pl
Yes!
25-12
java.util.HashSet<E> and
java.util.HashMap<K,V> Classes
• These classes implement the Set<E>
and Map<K,V> interfaces, respectively,
using hash tables (with chaining).
• This implementation may be more
efficient than TreeSet and TreeMap.
25-13
HashSet and HashMap (cont’d)
• Collisions are resolved through chaining.
• The sizes of buckets must remain relatively
small.
• Load factor:
Load factor =
Total number of items
Number of buckets
25-14
HashSet and HashMap (cont’d)
• Fine tuning:
Load factor too large lots of collisions
Load factor too small wasted space and slow
iterations over the whole set
• If the load factor exceeds the specified limit,
the table is automatically rehashed into a
larger table; if possible this should be
avoided.
25-15
HashSet and HashMap (cont’d)
• Objects in a HashSet or keys in a HashMap
must have a reasonable int hashCode
method that overrides Object’s hashCode and
helps calculate the hashing function.
• The hashCode method returns an int from the
entire int range; it is later mapped on the
range of indices in a particular hash table.
• String, Integer, Double each have a
reasonable hashCode defined.
25-16
hashCode Examples
• For String:
hashCode s0 31n1 s1 31n2 sn1
(where si is Unicode for the i-th character in the
string)
• For Person:
public int hashCode ( )
{
return getFirstName( ).hashCode( ) +
getLastName( ).hashCode( );
}
25-17
Consistency
• HashSet / HashMap first use hashCode, then
equals.
• TreeSet / TreeMap use only compareTo (or a
comparator)
• For consistent performance, these methods
should agree with each other:
x.equals (y) x.compareTo (y) == 0
x.equals (y) x.hashCode( ) == y.hashCode( )
25-18
HashSet<E> Constructors
Never mind...
25-19
HashMap<K,V> Constructors
25-20
Review:
• What is the main difference between a lookup
table and a hash table?
• What is a collision?
• Name two common techniques for resolving
collisions.
• What is a bucket?
• What is a load factor?
25-21
Review (cont’d):
• How is hash table performance affected when
the load factor is too high? Too low?
• What happens to a HashSet or a HashMap
when the load factor exceeds the specified
limit?
• HashSet’s no-args constructor sets the initial
capacity to 16 and the load factor limit to
0.75. How many values can be stored in this
table before it is rehashed?
25-22
Review (cont’d):
• What is the sequence of values returned by
an iterator for a HashSet?
• What is the range of values for the hashCode
method?
• Which method(s) of an object are used to find
it in a TreeSet? A HashSet?
25-23