Java Foundations

Download Report

Transcript Java Foundations

Appendix I
Hashing
Chapter Scope
• Hashing, conceptually
• Using hashes to solve problems
• Hash implementations
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 2
Hashing
• In previous collections, the location of an element in
a collection was either:
– determined by the order in which they were added
(examples)
– determined by comparing some key related to the
element (examples)
• In hashing elements are stored in the collection at a
location determined by applying a hash function to
the value to be stored.
– That is, the elements are stored in a hash table, with their
location determined by a hashing function
– Each location is called a cell or a bucket.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 3
Idealistically..
• In an ideal world each value would be hashed to
a unique address in a 1-to-1 fashion.
• If this were the case, then the time to
access/store data in a hash table would be O(1)
• Factors to prevent this:
– Less than perfect hash function
– Limitations on the size of the address space
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 4
Example
•
•
•
•
•
Consider an example where we create an array that will
hold 26 elements
To store names, we create a simple hashing function
that associates the first letter of each name to a
separate cell
The first letter of the string determines into which cell
the name is stored.
Because the access time to a particular element is
independent of the number of elements stored all
operations would be O(1).
But this requires that each element map to a unique
position.
– If this is achieved, we have what is called a perfect
hashing function.
• Using our example, under what circumstance
would this be perfect hashing function? Is this
realistic?
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 5
Less than Perfect
• A collision occurs when two or more elements
map to the same location
– For our example, when two names that begin with
the same letter we have a collision.
• Collisions will have to be resolved somehow.
– There are several techniques for storing multiple
elements that map to the same bucket which we look
at later.
• Even if a hashing function isn't perfect, a good
hashing function can still result in O(1)
operations.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 6
Hash Table Size
• How large should the table be?
• If we have a dataset of size n and a perfect
hashing function, we'd need a table of size n.
• Without a perfect hashing function, a good
guideline is to make the table 150% of the
dataset size.
• But what if we do not know the size of the
dataset?
– We then rely on dynamic resizing – creating a larger
hash table when the demand for space occurs.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 7
Dynamic Resizing
• Deciding when to resize is key.
• One possibility: when the table is full
– But performance of a hash table seriously degrades
as it becomes full.
• A better approach is to use a load factor – a
percentage of occupancy at which the table will
be resized.
– For example, if the load factor is .5, then the table
would be resized when 50% of the table is filled.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 8
Hashing Functions
• There are many good approaches to hashing
functions
– The method used in the name example is extraction –
part of an element's key value is used to compute the
location.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 9
Hashing Function Examples
• Extraction
– Using only a part of the element’s value or key to
compute the location at which to store the element.
• Example on page 1007
• Extract the first character of the value and calculate it’s
offset from the letter ‘A’ to determine its location.
– ‘A’ maps to 0; ‘B’ maps to 1, etc.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 10
Hashing Function Examples
• Division
– In this approach, the index is calculated as the remainder of
the key divided by some positive integer p.
– For a positive integer p, the result will be in the range 0 to p-1.
• Hashcode(key) = Math.abs(key) % p
– Since this yields 0 to p-1 location indices, we use the table size as p.
– Example: Apply the hash function to a Key Value = 79 with a
table size of 43:
– Hash Table Index = Math.abs(79) % 43
= 36
– Good idea: Using a prime number p as the table size, i.e. the divisor, can provide a better
distribution of keys across the address space 0 to p-1.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 11
Hashing Function Examples
• Folding
– In this approach, the index is created by dividing the key into parts
and then combining or folding the parts together.
– In general, the parts have the same length as the desired index
except for perhaps the last part. If the first folding does not result in
an index within the desired range, a use either extraction or division
to yield a smaller index.
– Shift folding
• The parts are added together to create the index
– Key = 987-65-4321
– Hash Table Index = 987 + 654 + 321 => 1962
– Boundary folding
• A slight variation of shift folding where some of the parts of the key are
reversed before adding
– Key = 987-65-4321
– 987 + 654 + 321
– Hash Table Index = 987 + 456 + 321 => 1764
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 12
Hashing Function Examples
• Mid-Square Method
• In the mid-square method, the index is
calculated by multiplying the key by itself and
then using the extraction method (from the
middle)
• For example, for key of 4321:
• Product = 4321 * 4321 = 18671041
• Extract three digits from the middle: 710
• It is important that the same three digits be
extracted each time
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 13
More Functions
• Review the rest of these functions.
• In class, we go to slide 18
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 14
Hashing Function Examples
• Radix Transformation method
– Transform the key into another numeric base
– If our key is 23 in base 10, we might convert it to 32 in
base 7
– Then we use the division method and divide the
converted key by the table size and use the remainder as
the index
• Example: Hashcode(23)
– Convert the key 23, which is in base 10 to base 7
» 23 base 10 is 32 in base 7
– Use division method to convert to index
» Hash Table Index = Math.abs(32) % 17
= 15
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 15
Hashing Functions
• In the digit analysis method, the index is formed
by extracting and then manipulating specific
digits from the key
• If the key is 1234567, we might select the digits
in positions 2 through 4 yielding 234
• The manipulation could then take many forms:
– reversing the digits (432)
– performing a circular shift (423)
– swapping each pair of digits (324)
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 16
Hashing Functions
• In the length-dependent method, the key and the
length of the key are combined in some way to
form either the index itself or an intermediate
version
• If our key is 8765, we might multiply the first two
digits by the length and then divide by the last
digit, yielding 69
• If our table size is 43, we would then use the
division method to yield an index of 26
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 17
Java Hashing Functions
• Java.lang.Object hashcode method
– Returns an integer based on the memory location of
the object
• This is generally not useful, but ensures that all
objects have a hashcode method
• A class may override the inherited version of
hashcode to provide their own
• The String and Integer classes define their
own hashcode methods
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 18
Resolving Collisions
• As mentioned, without a perfect hashing
function, collisions must be resolved
• There are several techniques for this as well
• Chaining
– Treat the table as an array of linked lists
• Open Addressing
– linear probing
– quadratic probing
– double hashing
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 19
Chaining with Links or Overflow Area
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 20
Open Addressing
• The open addressing method addresses collisions
by looking for another unused position in the
table
– The simplest approach is linear probing.
• One problem with linear probing is the development of
clusters of occupied cells.
• There are other approaches that address the
issue of clustering.
– quadratic probing
– double hashing
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 21
Open Addressing
• Linear probing – if an element hashes to
position p and that position is occupied,
try position (p+1)%s where s is the size of
the table
• One problem with linear probing is the
development of clusters of occupied cells
• Example: Ann, Andrew, Bob, Doug,
Elizabeth, Betty, Barbara, Hal, Bill.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 22
Open Addressing
•
Quadratic Probing yields a better distribution
–
newhascode(x) =
hashcode(x) + (-1) ^(i-1) * ((i+1) /2) ^ 2
–
Results in the search sequence: p, p+1, p-1,
p+4, p -4, p+9, p-9, p+16, p-16, …
–
Example: Amanda, Adam, Belinda, David,
Edward, Mei, Bill, Bart, Tim, Howard,
Bolton (Exercise Handout)
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
i
-1 ^ i-1
(i+1)/2 ^ 2
_________________________________________
1
1.0
1.0
2
-1.0
1.0
3
1.0
4.0
4
-1.0
4.0
5
1.0
9.0
6
-1.0
9.0
7
1.0
16.0
8
-1.0
16.0
9
1.0
25.0
10
-1.0
25.0
11
1.0
36.0
12
-1.0
36.0
13
1.0
49.0
14
-1.0
49.0
15
1.0
64.0
16
-1.0
64.0
17
1.0
81.0
18
-1.0
81.0
19
1.0
100.0
20
-1.0
100.0
21
1.0
121.0
22
-1.0
121.0
21 - 23
Deleting Elements
• Removing an element from a chained implementation falls into one of
five cases
– if the element is uniquely mapped, simply remove it
– if the element is stored in the table but has an index into an overflow area, replace the
element and the next index value in the table with the element and next index value of
the array position pointed to by the element to be removed
– if the element is at the end of the list of elements, set its position to null, set the next
pointer of the previous element to null, and add that position to the overflow
• Two more cases for chaining:
– if the element is in the middle of the list, set its position in the overflow to
null, and reset the pointer of the previous element to skip it
– if the element is not in the table, throw an exception
• When using open addressing, deletion creates more of a challenge
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 24
Java Collections Hash Tables
• The Java Collections API provides seven
implementations of hashing
• Three of these are:
– Hashtable – Key-Value Pairs, the oldest
class, synchronized.
– HashMap- Key-Value Pairs,
unsynchronized, permits null values
– HashSet –Values only which are unique,
unsynchronized, permits null values
– Note: The chaining method is used to
resolve collisions.
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
21 - 25
The Hashtable Class
The Hashtable Class (cont)
20.5 – The HashSet Class
• A set is a collection of elements with no duplicates.
• No other relationship among the elements should be assumed.
20.5 – The HashMap Class
• A map is a collection that establishes a relationship between keys and values
• The goal is to have an efficient way of obtaining a value given its key
• Keys of a map must be unique, but multiple keys could map to the same
object
Note on Implementing Sets and Maps
• Note that in addition to a Hashtable based
implementation, the Java API provides other
implementations for the Sets and Maps.
– Specifically, the TreeSet and TreeMap classes
which use an underlying tree to hold the elements of
the set or map.
• These trees are binary search trees, and in particular are
red-black balanced trees
• All of the basic operations are performed with O(log n)
efficiency
Java Foundations, 3rd Edition, Lewis/DePasquale/Chase
22 - 30