Transcript 08_dict
Data Structures
Hash Table (aka Dictionary)
i206
Fall 2010
John Chuang
Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,
Nicolas Christin, or Ion Stoica.
Outline
What is a data structure
Basic building blocks: arrays and linked lists
Data structures (uses, methods, performance):
-
List, stack, queue
Dictionary
Tree
Graph
John Chuang
2
Dictionary
Also known as hash table, lookup table,
associative array, or map
A searchable collection of key-value pairs
- each key is associated with one value
Many possible applications, e.g.,
-
Address book
Student record database
Routing tables
…
John Chuang
3
Dictionary Methods
get(k): if the dictionary has an entry with key k,
return its associated value; else return null
put(k,v): insert entry (k,v) into the dictionary; if
key is not already in dictionary, return null; else
return old value associated with k
remove(k): if dictionary has an entry with key k,
remove it from dictionary and return its
associated value; else return null
size()
isEmpty()
John Chuang
4
Dictionary in Python
Example:
>>> agents = {‘006’:’Jack Giddings’, ‘007’:’James Bond’}
>>> agents[‘001’] = ’Edward Donne’
>>> agents[‘007’]
‘James Bond’
John Chuang
5
Desirable Properties
Fast search, insert and delete (get, put, and
remove)
Efficient space usage
Arrays: O(1) operations; not space efficient
Linked lists: O(n) operations; space efficient
Binary trees: O(log n) operations; space efficient
Can we do better?
John Chuang
6
Hash Table
A generalization of an array that, if properly
designed, can realize fast insert/delete/search
operations with good space efficiency
- average run-time of O(1)
- worst case run-time of O(n)
A hash table is:
- Good for storing and retrieving key-value pairs
- Not good for iterating through a list of items
John Chuang
7
Hash Table
A hash table consists of two major components:
- Bucket array (for storing entries)
- Hash function (for mapping keys to buckets)
hash value/index
table
John Chuang
7
6
5
4
3
2
1
0
buckets
obj1
key=15
Obj3
key=4
Obj2
key=36
Obj4
key=2
Obj5
key=1
8
Hash Table Design
Bucket array is an array A of size N, where each
cell of A is considered a “bucket”
Hash function h maps each key k to an integer
in the range [0, N -1]
Store entry (k,v) in the bucket A[h(k)]
Search for entry (k,v) in the bucket A[h(k)]
Choose hash function such that
- Can take arbitrary objects (keys) as input
- Hash computation is fast
- Key mappings evenly distributed across [0, N -1]
John Chuang
9
Example Hash Function
h(k) = k mod N
mod stands for modulo, the remainder of the division of two
numbers. For example:
-
8 mod
9 mod
10 mod
15 mod
5=3
5=4
5=0
5=0
Observe that collisions are possible:
- two different keys hash to the same value
John Chuang
10
Example
h(k) = k mod 5
Insert (2,x)
Insert (21,y)
key value
0
1
2
2
x
Insert (34,z)
key value
key value
0
1 21 y
0
1 21 y
There is a
2
2
array entry #4
2
x
2
x
3
3
3
4
4
4 34 z
John Chuang
Insert (54,w)
collision at
???
11
Dealing with Collisions
Hashing with Chaining: every hash table entry
contains a pointer to a linked list of keys that
hash in the same entry
Insert (101,x)
Insert (54,w)
key
0
1
2
3
value
21
y
2
x
0
1
2
3
101 x
2
x
54
w
21
y
4
4
54
w
34 z
34 z
CHAIN
John Chuang
12
Dictionary Performance
What is the run time for insert/search/delete?
- Insert: It takes O(1) time to compute the hash function and insert at
head of linked list
- Search: It is proportional to the length of the linked list
- Delete: Same as search
Insert (101,x)
Insert (54,w)
key
0
1
2
3
value
21
y
2
x
0
1
2
3
101 x
2
x
54
w
21
y
4
4
54
w
34 z
34 z
CHAIN
John Chuang
13
Load Factor
Average length of the linked lists is a function of the
load factor
- Load factor = number of items in hash table / array size
In Python, the implementation details are transparent to
the programmer (load factor kept under 2/3)
In Java, the programmer can set the initial table size
(default=16) and/or the load factor (default = 0.75)
John Chuang
14
Distributed Hash Table (DHT)
Similar to traditional hash table data structure, except
data is stored in distributed peer nodes
- Each node is analogous to a bucket in a hash table.
Applications: distributed search, e.g., peer-to-peer
networks, content distribution networks (CDNs), etc.
DHTs are typically designed to scale to large numbers of
nodes and to handle continual node arrivals and
departures/failures.
Put(), Get() interface like a regular hash table:
- put(id, item);
- item = get(id);
John Chuang
15
DHT Example: Chord
Finger Table
i
i id+2
0 1
1 2
2 4
Nodes: n1(id=1),
n2(id=2), n3(id=0),
n4(id=6)
Items inserted:
f1(id=7), f2(id=1)
Finger Table
i id+2i succ
0 7
0
1 0
0
2 2
2
0
Finger Table Items
i id+2i succ 1
0 2
2
1 3
6
2 5
6
1
7
6
2
Finger Table
5
3
4
John Chuang
Items
7
succ
1
2
6
i id+2i succ
0 3
6
1 4
6
2 6
6
Slide adapted from Ion Stoica, Nicolas Christin
16
DHT Example: Chord
Upon receiving a query for
item id, a node:
Check if the item is stored
locally
If not, forwards the query
to the largest node in its
successor table that does
not exceed id
Finger Table
i id+2i succ
0 7
0
1 0
0
2 2
2
Finger Table
i
i id+2
0 1
1 2
2 4
0
Finger Table Items
i id+2i succ 1
0 2
2
1 3
6
2 5
6
1
7
query(7)
6
2
Finger Table
5
3
4
John Chuang
Items
7
succ
1
2
6
i id+2i succ
0 3
6
1 4
6
2 6
6
Slide adapted from Ion Stoica, Nicolas Christin
17