Transcript 08_dict

Data Structures
Hash Table (aka Dictionary)
i206
Fall 2010
John Chuang
Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,
Nicolas Christin, or Ion Stoica.
Outline
 What is a data structure
 Basic building blocks: arrays and linked lists
 Data structures (uses, methods, performance):
-
List, stack, queue
Dictionary
Tree
Graph
John Chuang
2
Dictionary
 Also known as hash table, lookup table,
associative array, or map
 A searchable collection of key-value pairs
- each key is associated with one value
 Many possible applications, e.g.,
-
Address book
Student record database
Routing tables
…
John Chuang
3
Dictionary Methods
 get(k): if the dictionary has an entry with key k,
return its associated value; else return null
 put(k,v): insert entry (k,v) into the dictionary; if
key is not already in dictionary, return null; else
return old value associated with k
 remove(k): if dictionary has an entry with key k,
remove it from dictionary and return its
associated value; else return null
 size()
 isEmpty()
John Chuang
4
Dictionary in Python
 Example:
>>> agents = {‘006’:’Jack Giddings’, ‘007’:’James Bond’}
>>> agents[‘001’] = ’Edward Donne’
>>> agents[‘007’]
‘James Bond’
John Chuang
5
Desirable Properties
 Fast search, insert and delete (get, put, and
remove)
 Efficient space usage




Arrays: O(1) operations; not space efficient
Linked lists: O(n) operations; space efficient
Binary trees: O(log n) operations; space efficient
Can we do better?
John Chuang
6
Hash Table
 A generalization of an array that, if properly
designed, can realize fast insert/delete/search
operations with good space efficiency
- average run-time of O(1)
- worst case run-time of O(n)
 A hash table is:
- Good for storing and retrieving key-value pairs
- Not good for iterating through a list of items
John Chuang
7
Hash Table
 A hash table consists of two major components:
- Bucket array (for storing entries)
- Hash function (for mapping keys to buckets)
hash value/index
table
John Chuang
7
6
5
4
3
2
1
0
buckets
obj1
key=15
Obj3
key=4
Obj2
key=36
Obj4
key=2
Obj5
key=1
8
Hash Table Design
 Bucket array is an array A of size N, where each
cell of A is considered a “bucket”
 Hash function h maps each key k to an integer
in the range [0, N -1]
 Store entry (k,v) in the bucket A[h(k)]
 Search for entry (k,v) in the bucket A[h(k)]
 Choose hash function such that
- Can take arbitrary objects (keys) as input
- Hash computation is fast
- Key mappings evenly distributed across [0, N -1]
John Chuang
9
Example Hash Function
 h(k) = k mod N
 mod stands for modulo, the remainder of the division of two
numbers. For example:
-
8 mod
9 mod
10 mod
15 mod
5=3
5=4
5=0
5=0
 Observe that collisions are possible:
- two different keys hash to the same value
John Chuang
10
Example
 h(k) = k mod 5
Insert (2,x)
Insert (21,y)
key value
0
1
2
2
x
Insert (34,z)
key value
key value
0
1 21 y
0
1 21 y
There is a
2
2
array entry #4
2
x
2
x
3
3
3
4
4
4 34 z
John Chuang
Insert (54,w)
collision at
???
11
Dealing with Collisions
 Hashing with Chaining: every hash table entry
contains a pointer to a linked list of keys that
hash in the same entry
Insert (101,x)
Insert (54,w)
key
0
1
2
3
value
21
y
2
x
0
1
2
3
101 x
2
x
54
w
21
y
4
4
54
w
34 z
34 z
CHAIN
John Chuang
12
Dictionary Performance
 What is the run time for insert/search/delete?
- Insert: It takes O(1) time to compute the hash function and insert at
head of linked list
- Search: It is proportional to the length of the linked list
- Delete: Same as search
Insert (101,x)
Insert (54,w)
key
0
1
2
3
value
21
y
2
x
0
1
2
3
101 x
2
x
54
w
21
y
4
4
54
w
34 z
34 z
CHAIN
John Chuang
13
Load Factor
 Average length of the linked lists is a function of the
load factor
- Load factor = number of items in hash table / array size
 In Python, the implementation details are transparent to
the programmer (load factor kept under 2/3)
 In Java, the programmer can set the initial table size
(default=16) and/or the load factor (default = 0.75)
John Chuang
14
Distributed Hash Table (DHT)
 Similar to traditional hash table data structure, except
data is stored in distributed peer nodes
- Each node is analogous to a bucket in a hash table.
 Applications: distributed search, e.g., peer-to-peer
networks, content distribution networks (CDNs), etc.
 DHTs are typically designed to scale to large numbers of
nodes and to handle continual node arrivals and
departures/failures.
 Put(), Get() interface like a regular hash table:
- put(id, item);
- item = get(id);
John Chuang
15
DHT Example: Chord
Finger Table
i
i id+2
0 1
1 2
2 4
 Nodes: n1(id=1),
n2(id=2), n3(id=0),
n4(id=6)
 Items inserted:
f1(id=7), f2(id=1)
Finger Table
i id+2i succ
0 7
0
1 0
0
2 2
2
0
Finger Table Items
i id+2i succ 1
0 2
2
1 3
6
2 5
6
1
7
6
2
Finger Table
5
3
4
John Chuang
Items
7
succ
1
2
6
i id+2i succ
0 3
6
1 4
6
2 6
6
Slide adapted from Ion Stoica, Nicolas Christin
16
DHT Example: Chord
 Upon receiving a query for
item id, a node:
 Check if the item is stored
locally
 If not, forwards the query
to the largest node in its
successor table that does
not exceed id
Finger Table
i id+2i succ
0 7
0
1 0
0
2 2
2
Finger Table
i
i id+2
0 1
1 2
2 4
0
Finger Table Items
i id+2i succ 1
0 2
2
1 3
6
2 5
6
1
7
query(7)
6
2
Finger Table
5
3
4
John Chuang
Items
7
succ
1
2
6
i id+2i succ
0 3
6
1 4
6
2 6
6
Slide adapted from Ion Stoica, Nicolas Christin
17