PowerPoint slides
Download
Report
Transcript PowerPoint slides
Grokking Hash Tables
A hash table is…
Just a data structure that’s used to establish a
mapping between arbitrary objects
Not to be confused with Map:
An interface that specifies methods that a
program could call to get/set/query key/value
mappings
Basically, defines what mappings are
A hash table is one way (but not the only) to
make a Map
And your MondoHashTable is just
implementation of that hash table data struct
that happens to match the Map interface
Example:
/**
* MondoHashTable is a hash table based
* implementation of the Map interface
*/
import java.util.Map;
public class MondoHashTable implements Map
{
…
}
So what’s a mapping?
A mapping is just a pair relationship between
two things
In math, we write a mapping m: xy to denote
that m establishes relationships between things
of type x and things of type y
Big restriction: every unique x must map to a
single y
Essentially: every left hand side has one and
only one right hand side
Examples of mappings
376.0827
(abs(sqrt(37)))
79609 “Prof Lane’s office phone”
123456789 studentRecord(“J. Student”)
-6.0827 36.999
studentRecord(“J. Student”) gradeSet()
largeFunkyDataObject()
otherObject()
“nigeria” 114
“cs35” 12
Left side is the key; right side is the value
Small integers are easy…
double[] mapArray=new double[200];
mapArray[37]=Math.sqrt(37);
37
6.0827
What about non-integers?
Desire: have a table that gives quick lookup for
arbitrary objects
Doesn’t require vast space
Answer: introduce an intermediate step
Hash function turns key into hash code
Use hash code to look up key/val pair in table
table[h(key)]=<key,value>
The big picture…
MondoHashTable
“nigeria”
get(“nigeria”)
h(“nigeria”)
1945462417
nigeria
114
Some practical questions
1.9 billion? Isn’t that a little much?
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
A: java.lang.Object.hashCode()
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
A: java.lang.Object.hashCode()
How big should the table be, initially?
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
A: java.lang.Object.hashCode()
How big should the table be, initially?
Good choice: pick a prime #
Ask the user (arg to constructor)
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
A: java.lang.Object.hashCode()
How big should the table be, initially?
Good choice: pick a prime #
Ask the user (arg to constructor)
What happens if the table gets too full?
Some practical questions
1.9 billion? Isn’t that a little much?
A: reduce mod table size
int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
A: java.lang.Object.hashCode()
How big should the table be, initially?
Good choice: pick a prime #
Ask the user (arg to constructor)
What happens if the table gets too full?
A: resize it!
#1 killer question: Collisions
What happens if
Depends…
If a.equals(b), then these are the same key
If not…
This is a hash collision
Basically, you have two different keys pointing at
the same location in the hash table
Have to resolve this somehow -- find unique
storage for every key and don’t lose anything
(a.hashCode()%tSz)==(b.hashCode()%tSz)
Collision strategy 1: Chaining
Make each cell in the hash table a “bucket”
containing multiple key/value pairs
h(“nigeria”)
nigeria
114
h(“viagra”)
viagra
29
Collision strategy 2: Open addressing
Each cell of the table actually holds a key/value
pair
When you have a collision, rehash to find a new
location for the new pair
Linear probing: try next cell in line
Quadratic probing: try cell h+1, then h+4,
h+9, h+16, h+15 …
Double hashing:
h(k)=(h1(k)+i*h2(k)) mod table.size()
Repeat probes until you find an empty spot
Map.keySet()
A common operation on hash tables (Maps): get
all of the keys
You’ll probably use this in the “dump”
functionality of SpamBGon
Map requires:
Set keySet(): Returns a set view
of the keys contained in this
map.
What’s a view?
Views of data
Different interface to the same underlying data
Doesn’t copy the data -- just provides new
methods for accessing it
A “set view of the keys”, then, is an object that
behaves like (i.e., implements) a set, but
gives you access to all of the keys of the hash
table.
The set view picture
MondoHashTable
Set
size()
contains()
iterator()
keySet()