Transcript Document
Distributed Hash tables
1
Overview
Objective
A distributed lookup service
Data items are distributed among n parties
Anyone in the network can find an item
efficiently
No central node or infrastructure
Applications:
P2P file sharing, without central node (items
are files)
Large scale, distributed or grid computation
(items are computational tasks)
2
Chord
Uses consistent hashing to map keys to
nodes.
Enables
Lookup
Node joining the network
Node leaving the network
Each node maintains routing information on
the network.
3
Consistent hashing
Consistent hash function assigns each node and
key an m-bit identifier.
A node is a member of the network: a processing
and communication unit.
A key identifies a data item:
Example: movie name
A node’s identifier can be defined by e.g. hashing
the node’s IP address.
An identifier for node or key can be produced by
hashing
ID(node) = hash(IP address)
ID(key) = hash(key)
4
Consistent hashing (cont.)
Properties of consistent hashing:
Load balancing – given N nodes, with high probability
O(1/N) of the keys are stored in each node.
When the N-th node joins, with high probability O(1/N)
of the keys are moved.
Practical problems with identifiers
IP is not always a good identifier:
• NAT
• DHCP
Key is not always well defined (e.g. file name).
5
Chord ring
Chord orders all m-bit identifiers (both nodes and
keys) on ring of 2m elements
Order is defined modulo 2m (2m-1<0)
Identifier space for keys – {0,…,2m-1}
Not all keys need be in use concurrently
Identifier space for nodes – {0,…,2m-1}
Not all nodes are necessarily active
concurrently
Key k is assigned to node n if
id(n)=min {id (n’); id(n’)id(k)} ( is interpreted
modulo 2m, i.e. first node clockwise).
n is called successor(k)
6
Successor Nodes
identifier
node
6
1
0
successor(6) = 0
6
identifier
circle
6
5
key
successor(1) = 1
1
7
X
2
2
successor(2) = 3
3
4
2
m=3
7
More on hashing
Desirable properties for hash function:
Collision resistant on nodes (mandatory)
Collision resistant on keys (hopefully)
Load balancing
Good adversarial behavior.
Chord suggestion
SHA-1
ID(node) = SHA-1(IP address)
ID(key) = SHA-1(key)
Drawback
Full SHA-1 implies m=160
Truncated SHA-1 may not be collision-resistant
8
Key lookup
Suppose each node stores successor(node)
We could use the following pseudo-code
n.find_successor(id)
if (id [node, successor])
return successor;
else
// forward the query around the circle
return successor.find_successor(id);
Example of execution
What is the time/message complexity?
9
Key lookup in O(m)
Each node n holds additional routing
information: finger table
m entries (at most)
Each entry is called “finger”
Data in each entry
Finger[i].node=successor (n+2i-1 mod 2m), i=1,…,m
Finger[i] includes TCP/IP access information
such as IP and port.
Denote the i-th finger by node.finger(i)
10
Example – finger tables
finger table
start
int.
1
2
4
[1,2)
[2,4)
[4,0)
1
6
1
3
0
finger table
start
Int.
0
7
succ.
keys
6
2
3
5
[2,3)
[3,5)
[5,1)
succ.
keys
1
3
3
0
2
5
3
4
finger table
start
int.
4
5
7
[4,5)
[5,7)
[7,3)
succ.
keys
2
0
0
0
11
Example – fast lookup
12
Fast lookup - pseudo code
// ask node n to find the successor of id
n.find_successor(id)
if (id (n, successor])
return successor;
else
n’ = closest_preceding_node(id);
return n’.find_successor(id);
// search the local table for the highest predecessor of
id
n.closest_preceding_node(id)
for i = m downto 1
if (finger[i] (n, id))
return finger[i];
return n;
13
Fast lookup - analysis
Time complexity at each node
O(m)
Theorem - Number of nodes in a search
path is O(m) in the worst case
Number of messages is equal to number of
nodes in search path
Theorem – if nodes are distributed
uniformly among identifiers then number
of nodes in search path is O(log n) with
high probability.
14
Fast lookup – analysis (cont.)
Xi – i-th node is in last interval of 2m/N
identifiers in search path.
X=Σxi
=E[X]=1
Chernoff bound –
Pr[X>(1+)]<(e/(1+)
1+)
15
Node Joins and Stabilizations
Nodes join dynamically
Network maintains two invariants
Node successor is up to date.
For every key k, successor(k) is responsible for k.
When node n joins:
n.successor and n.finger[i] are updated.
Successor and finger tables of other nodes are updated.
Correct keys are transferred to n.
Each node runs a “stabilization” protocol
periodically in the background to update successor
pointer and finger table.
16
Node Joins and Stabilizations
“Stabilization” protocol contains 6
functions:
create()
join()
stabilize()
notify()
fix_fingers()
check_predecessor()
Each node has both successor and
predecessor pointers
17
Node Joins – join()
When node n first starts, it calls n.join(n’),
where n’ is any known Chord node.
The join() function asks n’ to find the
immediate successor of n.
join() does not make the rest of the
network aware of n.
18
Node Joins – join()
// create a new Chord ring.
n.create()
predecessor = nil;
successor = n;
// join a Chord ring containing node n’.
n.join(n’)
predecessor = nil;
successor = n’.find_successor(n);
19
Node Joins – stabilize()
Each time node n runs stabilize(), it asks
its successor for the successor’s
predecessor p, and decides whether p
should be n’s successor instead.
stabilize() notifies node n’s successor of n’s
existence, giving the successor the chance
to change its predecessor to n.
The successor does this only if it knows of
no closer predecessor than n.
20
Node Joins – stabilize()
// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
x = successor.predecessor;
if (x (n, successor))
successor = x;
successor.notify(n);
// n’ thinks it might be our predecessor.
n.notify(n’)
if (predecessor is nil or n’ (predecessor, n))
predecessor = n’;
21
Node Joins – Join and
Stabilization
ns
n joins
n runs stabilize
np
succ(np) = ns
n
nil
succ(np) = n
pred(ns) = np
pred(ns) = n
predecessor = nil
n acquires ns as successor via some n’
n notifies ns being the new predecessor
ns acquires n as its predecessor
np runs stabilize
np asks ns for its predecessor (now n)
np acquires n as its successor
np notifies n
n will acquire np as its predecessor
all predecessor and successor pointers
are now correct
fingers still need to be fixed, but old
fingers will still work
22
Node Joins – fix_fingers()
Each node periodically calls fix fingers to
make sure its finger table entries are
correct.
New nodes initialize their finger tables
using fix_fingers().
Existing nodes incorporate new nodes into
their finger tables using fix_fingers().
Each node maintains a pointer next into its
finger table.
23
Node Joins – fix_fingers()
// called periodically. refreshes finger table entries.
n.fix_fingers()
next = next + 1 ;
if (next > m)
next = 1 ;
finger[next].node = find_successor(n + 2next-1);
// checks whether predecessor has failed.
n.check_predecessor()
if (predecessor has failed)
predecessor = nil;
What is the complexity?
24
Node Failures
Key step in failure recovery is maintaining correct successor
pointers
To help achieve this, each node maintains a successor-list of its
r nearest successors on the ring
If node n notices that its successor has failed, it replaces it
with the first live entry in the list
Successor lists are stabilized as follows:
node n reconciles its list with its successor s by copying s’s
successor list, removing its last entry, and prepending s to it.
If node n notices that its successor has failed, it replaces it
with the first live entry in its successor list and reconciles its
successor list with its new successor.
25