Document

Transcript Document

Distributed Hash tables
1
Overview
 Objective
A distributed lookup service
 Data items are distributed among n parties
 Anyone in the network can find an item
efficiently
 No central node or infrastructure

 Applications:
P2P file sharing, without central node (items
are files)
 Large scale, distributed or grid computation
(items are computational tasks)

2
Chord
 Uses consistent hashing to map keys to
nodes.
 Enables
Lookup
 Node joining the network
 Node leaving the network

 Each node maintains routing information on
the network.
3
Consistent hashing
 Consistent hash function assigns each node and
key an m-bit identifier.
 A node is a member of the network: a processing
and communication unit.
 A key identifies a data item:

Example: movie name
 A node’s identifier can be defined by e.g. hashing
the node’s IP address.
 An identifier for node or key can be produced by
hashing


ID(node) = hash(IP address)
ID(key) = hash(key)
4
Consistent hashing (cont.)
 Properties of consistent hashing:
 Load balancing – given N nodes, with high probability
O(1/N) of the keys are stored in each node.
 When the N-th node joins, with high probability O(1/N)
of the keys are moved.
 Practical problems with identifiers

IP is not always a good identifier:
• NAT
• DHCP

Key is not always well defined (e.g. file name).
5
Chord ring
 Chord orders all m-bit identifiers (both nodes and





keys) on ring of 2m elements
Order is defined modulo 2m (2m-1<0)
Identifier space for keys – {0,…,2m-1}
 Not all keys need be in use concurrently
Identifier space for nodes – {0,…,2m-1}
 Not all nodes are necessarily active
concurrently
Key k is assigned to node n if
 id(n)=min {id (n’); id(n’)id(k)} ( is interpreted
modulo 2m, i.e. first node clockwise).
n is called successor(k)
6
Successor Nodes
identifier
node
6
1
0
successor(6) = 0
6
identifier
circle
6
5
key
successor(1) = 1
1
7
X
2
2
successor(2) = 3
3
4
2
m=3
7
More on hashing
 Desirable properties for hash function:
 Collision resistant on nodes (mandatory)
 Collision resistant on keys (hopefully)
 Load balancing
 Good adversarial behavior.
 Chord suggestion
 SHA-1
 ID(node) = SHA-1(IP address)
 ID(key) = SHA-1(key)
 Drawback

Full SHA-1 implies m=160

Truncated SHA-1 may not be collision-resistant
8
Key lookup
 Suppose each node stores successor(node)
 We could use the following pseudo-code
n.find_successor(id)
if (id  [node, successor])
return successor;
else
// forward the query around the circle
return successor.find_successor(id);
 Example of execution
 What is the time/message complexity?
9
Key lookup in O(m)
 Each node n holds additional routing
information: finger table
m entries (at most)
 Each entry is called “finger”

 Data in each entry
Finger[i].node=successor (n+2i-1 mod 2m), i=1,…,m
 Finger[i] includes TCP/IP access information
such as IP and port.

 Denote the i-th finger by node.finger(i)
10
Example – finger tables
finger table
start
int.
1
2
4
[1,2)
[2,4)
[4,0)
1
6
1
3
0
finger table
start
Int.
0
7
succ.
keys
6
2
3
5
[2,3)
[3,5)
[5,1)
succ.
keys
1
3
3
0
2
5
3
4
finger table
start
int.
4
5
7
[4,5)
[5,7)
[7,3)
succ.
keys
2
0
0
0
11
Example – fast lookup
12
Fast lookup - pseudo code
// ask node n to find the successor of id
n.find_successor(id)
if (id  (n, successor])
return successor;
else
n’ = closest_preceding_node(id);
return n’.find_successor(id);
// search the local table for the highest predecessor of
id
n.closest_preceding_node(id)
for i = m downto 1
if (finger[i]  (n, id))
return finger[i];
return n;
13
Fast lookup - analysis
 Time complexity at each node

O(m)
 Theorem - Number of nodes in a search
path is O(m) in the worst case
 Number of messages is equal to number of
nodes in search path
 Theorem – if nodes are distributed
uniformly among identifiers then number
of nodes in search path is O(log n) with
high probability.
14
Fast lookup – analysis (cont.)
 Xi – i-th node is in last interval of 2m/N
identifiers in search path.
 X=Σxi
 =E[X]=1
 Chernoff bound –

Pr[X>(1+)]<(e/(1+)
1+) 
15
Node Joins and Stabilizations
 Nodes join dynamically
 Network maintains two invariants
 Node successor is up to date.
 For every key k, successor(k) is responsible for k.
 When node n joins:
 n.successor and n.finger[i] are updated.
 Successor and finger tables of other nodes are updated.
 Correct keys are transferred to n.
 Each node runs a “stabilization” protocol
periodically in the background to update successor
pointer and finger table.
16
Node Joins and Stabilizations
 “Stabilization” protocol contains 6
functions:
create()
 join()
 stabilize()
 notify()
 fix_fingers()
 check_predecessor()

 Each node has both successor and
predecessor pointers
17
Node Joins – join()
 When node n first starts, it calls n.join(n’),
where n’ is any known Chord node.
 The join() function asks n’ to find the
immediate successor of n.
 join() does not make the rest of the
network aware of n.
18
Node Joins – join()
// create a new Chord ring.
n.create()
predecessor = nil;
successor = n;
// join a Chord ring containing node n’.
n.join(n’)
predecessor = nil;
successor = n’.find_successor(n);
19
Node Joins – stabilize()
 Each time node n runs stabilize(), it asks
its successor for the successor’s
predecessor p, and decides whether p
should be n’s successor instead.
 stabilize() notifies node n’s successor of n’s
existence, giving the successor the chance
to change its predecessor to n.
 The successor does this only if it knows of
no closer predecessor than n.
20
Node Joins – stabilize()
// called periodically. verifies n’s immediate
// successor, and tells the successor about n.
n.stabilize()
x = successor.predecessor;
if (x  (n, successor))
successor = x;
successor.notify(n);
// n’ thinks it might be our predecessor.
n.notify(n’)
if (predecessor is nil or n’  (predecessor, n))
predecessor = n’;
21
Node Joins – Join and
Stabilization

ns
n joins


n runs stabilize


np
succ(np) = ns
n
nil
succ(np) = n
pred(ns) = np
pred(ns) = n


predecessor = nil
n acquires ns as successor via some n’
n notifies ns being the new predecessor
ns acquires n as its predecessor
np runs stabilize




np asks ns for its predecessor (now n)
np acquires n as its successor
np notifies n
n will acquire np as its predecessor

all predecessor and successor pointers
are now correct

fingers still need to be fixed, but old
fingers will still work
22
Node Joins – fix_fingers()
 Each node periodically calls fix fingers to
make sure its finger table entries are
correct.
 New nodes initialize their finger tables
using fix_fingers().
 Existing nodes incorporate new nodes into
their finger tables using fix_fingers().
 Each node maintains a pointer next into its
finger table.
23
Node Joins – fix_fingers()
// called periodically. refreshes finger table entries.
n.fix_fingers()
next = next + 1 ;
if (next > m)
next = 1 ;
finger[next].node = find_successor(n + 2next-1);
// checks whether predecessor has failed.
n.check_predecessor()
if (predecessor has failed)
predecessor = nil;
 What is the complexity?
24
Node Failures
 Key step in failure recovery is maintaining correct successor
pointers
 To help achieve this, each node maintains a successor-list of its
r nearest successors on the ring
 If node n notices that its successor has failed, it replaces it
with the first live entry in the list
 Successor lists are stabilized as follows:


node n reconciles its list with its successor s by copying s’s
successor list, removing its last entry, and prepending s to it.
If node n notices that its successor has failed, it replaces it
with the first live entry in its successor list and reconciles its
successor list with its new successor.
25

Document

Transcript Document

Directory