Transcript Document

Distributed Hash tables
1
Overview
 Objective
A distributed lookup service
 Data items are distributed among n parties
 Anyone in the network can find an item
efficiently
 No central node or infrastructure

 Applications:
P2P file sharing, without central node (items
are files)
 Large scale, distributed or grid computation
(items are computational tasks)

2
Chord
 Uses consistent hashing to map keys to
nodes.
 Enables
Lookup
 Node joining the network
 Node leaving the network

 Each node maintains routing information on
the network.
3
Consistent hashing
 Consistent hash function assigns each node and
key an m-bit identifier.
 A node is a member of the network: a processing
and communication unit.
 A key identifies a data item:

Example: movie name
 A node’s identifier can be defined by e.g. hashing
the node’s IP address.
 An identifier for node or key can be produced by
hashing


ID(node) = hash(IP address)
ID(key) = hash(key)
4
Consistent hashing (cont.)
 Properties of consistent hashing:
 Load balancing – given N nodes, with high probability
O(1/N) of the keys are stored in each node.
 When the N-th node joins, with high probability O(1/N)
of the keys are moved.
 Practical problems with identifiers

IP is not always a good identifier:
• NAT
• DHCP

Key is not always well defined (e.g. file name).
5
Chord ring
 Chord orders all m-bit identifiers (both nodes and





keys) on ring of 2m elements
Order is defined modulo 2m (2m-1<0)
Identifier space for keys – {0,…,2m-1}
 Not all keys need be in use concurrently
Identifier space for nodes – {0,…,2m-1}
 Not all nodes are necessarily active
concurrently
Key k is assigned to node n if
 id(n)=min {id (n’); id(n’)id(k)} ( is interpreted
modulo 2m, i.e. first node clockwise).
n is called successor(k)
6
Successor Nodes
identifier
node
6
1
0
successor(6) = 0
6
identifier
circle
6
5
key
successor(1) = 1
1
7
X
2
2
successor(2) = 3
3
4
2
m=3
7
More on hashing
 Desirable properties for hash function:
 Collision resistant on nodes (mandatory)
 Collision resistant on keys (hopefully)
 Load balancing
 Good adversarial behavior.
 Chord suggestion
 SHA-1
 ID(node) = SHA-1(IP address)
 ID(key) = SHA-1(key)
 Drawback

Full SHA-1 implies m=160

Truncated SHA-1 may not be collision-resistant
8
Key lookup
 Suppose each node stores successor(node)
 We could use the following pseudo-code
n.find_successor(id)
if (id  [node, successor])
return successor;
else
// forward the query around the circle
return successor.find_successor(id);
 Example of execution
 What is the time/message complexity?
9
Key lookup in O(m)
 Each node n holds additional routing
information: finger table
m entries (at most)
 Each entry is called “finger”

 Data in each entry
Finger[i].node=successor (n+2i-1 mod 2m), i=1,…,m
 Finger[i] includes TCP/IP access information
such as IP and port.

 Denote the i-th finger by node.finger(i)
10
Example – finger tables
finger table
start
int.
1
2
4
[1,2)
[2,4)
[4,0)
1
6
1
3
0
finger table
start
Int.
0
7
succ.
keys
6
2
3
5
[2,3)
[3,5)
[5,1)
succ.
keys
1
3
3
0
2
5
3
4
finger table
start
int.
4
5
7
[4,5)
[5,7)
[7,3)
succ.
keys
2
0
0
0
11
Example – fast lookup
12
Fast lookup - pseudo code
// ask node n to find the successor of id
n.find_successor(id)
if (id  (n, successor])
return successor;
else
n’ = closest_preceding_node(id);
return n’.find_successor(id);
// search the local table for the highest predecessor of
id
n.closest_preceding_node(id)
for i = m downto 1
if (finger[i]  (n, id))
return finger[i];
return n;
13
Fast lookup - analysis
 Time complexity at each node

O(m)
 Theorem - Number of nodes in a search
path is O(m) in the worst case
 Number of messages is equal to number of
nodes in search path
 Theorem – if nodes are distributed
uniformly among identifiers then number
of nodes in search path is O(log n) with
high probability.
14
Fast lookup – analysis (cont.)
 Xi – i-th node is in last interval of 2m/N
identifiers in search path.
 X=Σxi
 =E[X]=1
 Chernoff bound –

Pr[X>(1+)]<(e/(1+)
1+) 
15
Node Joins and Stabilizations
 Nodes join dynamically
 Network maintains two invariants
 Node successor is up to date.
 For every key k, successor(k) is responsible for k.
 When node n joins:
 n.successor and n.finger[i] are updated.
 Successor and finger tables of other nodes are updated.
 Correct keys are transferred to n.
 Each node runs a “stabilization” protocol
periodically in the background to update successor
pointer and finger table.
16
Node Joins and Stabilizations
 “Stabilization” protocol contains 6
functions:
create()
 join()
 stabilize()
 notify()
 fix_fingers()
 check_predecessor()

 Each node has both successor and
predecessor pointers
17
Node Joins – join()
 When node n first starts, it calls n.join(n’),
where n’ is any known Chord node.
 The join() function asks n’ to find the
immediate successor of n.
 join() does not make the rest of the
network aware of n.
18
Node Joins – join()
// create a new Chord ring.
n.create()
predecessor = nil;
successor = n;
// join a Chord ring containing node n’.
n.join(n’)
predecessor = nil;
successor = n’.find_successor(n);
19
Node Joins – stabilize()
 Each time node n runs stabilize(), it asks
its successor for the successor’s
predecessor p, and decides whether p
should be n’s successor instead.
 stabilize() notifies node n’s successor of n’s
existence, giving the successor the chance
to change its predecessor to n.
 The successor does this only if it knows of
no closer predecessor than n.
20
Node Joins – stabilize()
// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
x = successor.predecessor;
if (x  (n, successor))
successor = x;
successor.notify(n);
// n’ thinks it might be our predecessor.
n.notify(n’)
if (predecessor is nil or n’  (predecessor, n))
predecessor = n’;
21
Node Joins – Join and
Stabilization

ns
n joins


n runs stabilize


np
succ(np) = ns
n
nil
succ(np) = n
pred(ns) = np
pred(ns) = n


predecessor = nil
n acquires ns as successor via some n’
n notifies ns being the new predecessor
ns acquires n as its predecessor
np runs stabilize




np asks ns for its predecessor (now n)
np acquires n as its successor
np notifies n
n will acquire np as its predecessor

all predecessor and successor pointers
are now correct

fingers still need to be fixed, but old
fingers will still work
22
Node Joins – fix_fingers()
 Each node periodically calls fix fingers to
make sure its finger table entries are
correct.
 New nodes initialize their finger tables
using fix_fingers().
 Existing nodes incorporate new nodes into
their finger tables using fix_fingers().
 Each node maintains a pointer next into its
finger table.
23
Node Joins – fix_fingers()
// called periodically. refreshes finger table entries.
n.fix_fingers()
next = next + 1 ;
if (next > m)
next = 1 ;
finger[next].node = find_successor(n + 2next-1);
// checks whether predecessor has failed.
n.check_predecessor()
if (predecessor has failed)
predecessor = nil;
 What is the complexity?
24
Node Failures
 Key step in failure recovery is maintaining correct successor
pointers
 To help achieve this, each node maintains a successor-list of its
r nearest successors on the ring
 If node n notices that its successor has failed, it replaces it
with the first live entry in the list
 Successor lists are stabilized as follows:


node n reconciles its list with its successor s by copying s’s
successor list, removing its last entry, and prepending s to it.
If node n notices that its successor has failed, it replaces it
with the first live entry in its successor list and reconciles its
successor list with its new successor.
25