PowerPoint slides

Download Report

Transcript PowerPoint slides

Grokking Hash Tables
A hash table is…


Just a data structure that’s used to establish a
mapping between arbitrary objects
Not to be confused with Map:
An interface that specifies methods that a
program could call to get/set/query key/value
mappings
 Basically, defines what mappings are
 A hash table is one way (but not the only) to
make a Map
And your MondoHashTable is just
implementation of that hash table data struct
that happens to match the Map interface


Example:
/**
* MondoHashTable is a hash table based
* implementation of the Map interface
*/
import java.util.Map;
public class MondoHashTable implements Map
{
…
}
So what’s a mapping?



A mapping is just a pair relationship between
two things
In math, we write a mapping m: xy to denote
that m establishes relationships between things
of type x and things of type y
Big restriction: every unique x must map to a
single y

Essentially: every left hand side has one and
only one right hand side
Examples of mappings









376.0827
(abs(sqrt(37)))
79609  “Prof Lane’s office phone”
123456789  studentRecord(“J. Student”)
-6.0827  36.999
studentRecord(“J. Student”)  gradeSet()
largeFunkyDataObject() 
otherObject()
“nigeria”  114
“cs35”  12
Left side is the key; right side is the value
Small integers are easy…
double[] mapArray=new double[200];
mapArray[37]=Math.sqrt(37);
37
6.0827
What about non-integers?




Desire: have a table that gives quick lookup for
arbitrary objects
Doesn’t require vast space
Answer: introduce an intermediate step
 Hash function turns key into hash code
 Use hash code to look up key/val pair in table
table[h(key)]=<key,value>
The big picture…
MondoHashTable
“nigeria”
get(“nigeria”)
h(“nigeria”)
1945462417
nigeria
114
Some practical questions

1.9 billion? Isn’t that a little much?
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size

int h=hashFunction(Object a) % table.length;
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size


int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size


int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
 A: java.lang.Object.hashCode()
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size



int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
 A: java.lang.Object.hashCode()
How big should the table be, initially?
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size



int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
 A: java.lang.Object.hashCode()
How big should the table be, initially?
 Good choice: pick a prime #
 Ask the user (arg to constructor)
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size




int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
 A: java.lang.Object.hashCode()
How big should the table be, initially?
 Good choice: pick a prime #
 Ask the user (arg to constructor)
What happens if the table gets too full?
Some practical questions

1.9 billion? Isn’t that a little much?
 A: reduce mod table size




int h=hashFunction(Object a) % table.length;
Where do the hash functions come from?
 A: java.lang.Object.hashCode()
How big should the table be, initially?
 Good choice: pick a prime #
 Ask the user (arg to constructor)
What happens if the table gets too full?
 A: resize it!
#1 killer question: Collisions

What happens if


Depends…
 If a.equals(b), then these are the same key
If not…
This is a hash collision
Basically, you have two different keys pointing at
the same location in the hash table
Have to resolve this somehow -- find unique
storage for every key and don’t lose anything




(a.hashCode()%tSz)==(b.hashCode()%tSz)
Collision strategy 1: Chaining

Make each cell in the hash table a “bucket”
containing multiple key/value pairs
h(“nigeria”)
nigeria
114
h(“viagra”)
viagra
29
Collision strategy 2: Open addressing


Each cell of the table actually holds a key/value
pair
When you have a collision, rehash to find a new
location for the new pair
 Linear probing: try next cell in line
 Quadratic probing: try cell h+1, then h+4,
h+9, h+16, h+15 …
 Double hashing:
h(k)=(h1(k)+i*h2(k)) mod table.size()

Repeat probes until you find an empty spot
Map.keySet()



A common operation on hash tables (Maps): get
all of the keys
 You’ll probably use this in the “dump”
functionality of SpamBGon
Map requires:
 Set keySet(): Returns a set view
of the keys contained in this
map.
What’s a view?
Views of data



Different interface to the same underlying data
Doesn’t copy the data -- just provides new
methods for accessing it
A “set view of the keys”, then, is an object that
behaves like (i.e., implements) a set, but
gives you access to all of the keys of the hash
table.
The set view picture
MondoHashTable
Set
size()
contains()
iterator()
keySet()