
Distributed Lookup Systems
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
A Scalable Content-Addressable Network
Motivation
How to find data in a distributed system?
[Figure: a publisher inserts (Key="LetItBe", Value=MP3 data) at one of the nodes N1..N5 connected over the Internet; a client elsewhere issues Lookup("LetItBe") and must somehow locate the data.]
Applications
- Peer-to-peer systems: Napster, Gnutella, Groove, FreeNet, …
- Large-scale storage management systems: Publius, OceanStore, PAST, Farsite, CFS, …
- Mirroring (web caching)
- Any wide-area name resolution system
Outline
- Types of solutions
- Evaluation criteria
- CAN and Chord
  - basic idea
  - insert/retrieve
  - join/leave
  - recovery from failures
Centralized Solution
Central server (Napster)
[Figure: the publisher registers (Key="LetItBe", Value=MP3 data) with a central database (DB); nodes N1..N5 and the client reach it over the Internet, and the client's Lookup("LetItBe") is answered by the central server.]
Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.)
[Figure: the client's Lookup("LetItBe") is flooded from node to node across N1..N5 until the publisher's copy is found.]
Worst case: O(m) messages per lookup
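As an illustration of the flooding approach (not taken from the slides), here is a minimal Python sketch that floods a lookup over an arbitrary neighbor graph; the node names and topology below are made up for the example.

```python
from collections import deque

def flood_lookup(start, key, neighbors, data_at):
    """Breadth-first flood: ask every reachable node until one has the key.
    Returns (owner, messages_sent); (None, messages_sent) if the key is nowhere."""
    visited = {start}
    queue = deque([start])
    messages = 0
    while queue:
        node = queue.popleft()
        if key in data_at.get(node, {}):
            return node, messages
        for peer in neighbors.get(node, []):
            if peer not in visited:
                visited.add(peer)
                messages += 1          # one lookup message per edge traversed
                queue.append(peer)
    return None, messages

# Hypothetical 5-node overlay, loosely matching the figure
neighbors = {"N1": ["N2", "N3"], "N2": ["N1", "N4"], "N3": ["N1", "N5"],
             "N4": ["N2", "N5"], "N5": ["N3", "N4"]}
data_at = {"N5": {"LetItBe": "MP3 data"}}
print(flood_lookup("N1", "LetItBe", neighbors, data_at))   # ('N5', 4)
```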
Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
[Figure: the client's Lookup("LetItBe") is routed hop by hop through N1..N5 directly toward the node responsible for the key published as (Key="LetItBe", Value=MP3 data).]
Routing Challenges
- Define a useful nearness metric
- Keep the hop count small
- Keep the routing table a reasonable size
- Stay robust despite rapid changes in membership
Evaluation
- Scalability
  - routing path length
  - per-node state
- Latency
- Load balancing
- Robustness
  - routing fault tolerance
  - data availability
Content Addressable Network
Basic idea
- Internet-scale hash table
- Interface:
  - insert(key, value)
  - value = retrieve(key)
- The table is partitioned among many individual nodes
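To make the interface concrete, here is a minimal Python sketch of the hash-table contract a DHT exposes; the class name and the local-dict implementation are illustrative stand-ins for a table that is really partitioned across many nodes.

```python
class DistributedHashTable:
    """Illustrative interface only: a real DHT partitions this table
    across many nodes instead of keeping one local dict."""

    def __init__(self):
        self._table = {}

    def insert(self, key, value):
        # In CAN/Chord the key is hashed and the pair is routed to the
        # node responsible for that hash; here we just store locally.
        self._table[key] = value

    def retrieve(self, key):
        return self._table.get(key)

dht = DistributedHashTable()
dht.insert("LetItBe", "MP3 data")
assert dht.retrieve("LetItBe") == "MP3 data"
```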
CAN - Solution
- virtual d-dimensional Cartesian coordinate space
- dynamically partitioned into zones, each "owned" by one node
- a key is mapped to a point P in the space
- (key, value) is stored at the node that owns the zone containing P
CAN - Example
node I::insert(K,V)
(1) a = hx(K), b = hy(K)
(2) route (K,V) -> (a,b)
(3) the node owning the zone that contains (a,b) stores (K,V)
[Figure: node I issues the insert; (K,V) ends up at the owner of the zone containing the point (a,b) in the 2-d space.]
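A minimal Python sketch of step (1), assuming a 2-d space of [0,1)^2; the specific hash construction (SHA-1 with an "x:"/"y:" prefix) is an illustrative choice, not the paper's definition.

```python
import hashlib

def _unit_hash(data: bytes) -> float:
    """Map bytes to a float in [0, 1) via SHA-1 (illustrative choice)."""
    h = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return h / float(1 << 160)

def hx(key: str) -> float:          # x-coordinate hash
    return _unit_hash(("x:" + key).encode())

def hy(key: str) -> float:          # y-coordinate hash
    return _unit_hash(("y:" + key).encode())

# Step (1): the key deterministically names a point in the unit square.
key = "LetItBe"
point = (hx(key), hy(key))
print(point)
# Steps (2)-(3): route (K,V) toward `point` and let the zone owner store it;
# routing and storage are sketched with the following slides.
```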
CAN - Routing
- each node maintains state for its neighbors only
- each message contains the destination coordinates
- greedy forwarding to the neighbor with coordinates closest to the destination
[Figure: a message travels from a node near (x,y) across neighboring zones toward the zone containing (a,b).]
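A simplified Python sketch of greedy forwarding, assuming each node knows its own and its neighbors' zone centers; real CAN forwards toward the neighbor whose zone is closest to the destination point, which the zone-center distance approximates here. The node names and grid are made up.

```python
import math

def greedy_route(start, dest, neighbors_of, center_of):
    """At each hop, forward to the neighbor whose zone center is closest to dest.
    Returns the path of node ids; stops when no neighbor improves on the current node."""
    path = [start]
    current = start
    while True:
        best = min(neighbors_of[current],
                   key=lambda n: math.dist(center_of[n], dest), default=None)
        if best is None or math.dist(center_of[best], dest) >= math.dist(center_of[current], dest):
            return path            # current node's zone is nearest to (or contains) dest
        current = best
        path.append(current)

# Hypothetical 2x2 grid of zones with centers in the unit square
center_of = {"A": (0.25, 0.25), "B": (0.75, 0.25), "C": (0.25, 0.75), "D": (0.75, 0.75)}
neighbors_of = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
print(greedy_route("A", (0.8, 0.8), neighbors_of, center_of))   # ['A', 'B', 'D']
```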
CAN – Node Insertion
1) the new node discovers some node "I" already in the CAN
2) the new node picks a random point (p,q) in the space
3) I routes a join request to (p,q) and discovers node J, the current owner of that zone
4) J's zone is split in half; the new node owns one half
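A toy Python sketch of steps 2-4, assuming rectangular zones in the unit square that are split along their longer side; the Zone class, the split rule, and which half the newcomer receives are all simplified for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Zone:
    xmin: float; xmax: float; ymin: float; ymax: float
    owner: str

    def contains(self, p):
        x, y = p
        return self.xmin <= x < self.xmax and self.ymin <= y < self.ymax

def join(zones, new_node):
    """New node picks a random point, finds the owning zone, splits it in half."""
    p = (random.random(), random.random())              # step 2: random point (p,q)
    old = next(z for z in zones if z.contains(p))        # steps 1+3: route to its owner J
    if (old.xmax - old.xmin) >= (old.ymax - old.ymin):   # step 4: split the longer side
        mid = (old.xmin + old.xmax) / 2
        new = Zone(mid, old.xmax, old.ymin, old.ymax, new_node)
        old.xmax = mid
    else:
        mid = (old.ymin + old.ymax) / 2
        new = Zone(old.xmin, old.xmax, mid, old.ymax, new_node)
        old.ymax = mid
    zones.append(new)
    return new

zones = [Zone(0.0, 1.0, 0.0, 1.0, "J")]   # one node currently owns the whole space
print(join(zones, "new"))
```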
CAN – Node Failure
Need to repair the space:
- recover the database
  - soft-state updates
  - use replication; rebuild the database from replicas
- repair routing
  - takeover algorithm
CAN – Takeover Algorithm
- Simple failures
  - know your neighbors' neighbors
  - when a node fails, one of its neighbors takes over its zone
- More complex failure modes
  - simultaneous failure of multiple adjacent nodes
  - scoped flooding to discover neighbors
  - hopefully a rare event
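A hedged sketch of the simple-failure case: when a node stops responding, the neighbor with the smallest zone volume claims its zone (the heuristic used in the CAN paper); the dictionaries below are invented for the example.

```python
def takeover(failed, zones, neighbors_of):
    """zones: owner -> (xmin, xmax, ymin, ymax). The live neighbor with the
    smallest zone volume absorbs the failed node's zone."""
    def volume(owner):
        xmin, xmax, ymin, ymax = zones[owner]
        return (xmax - xmin) * (ymax - ymin)
    successor = min(neighbors_of[failed], key=volume)
    absorbed = zones.pop(failed)
    print(f"{successor} takes over zone {absorbed} of failed node {failed}")
    return successor

zones = {"A": (0.0, 0.5, 0.0, 1.0), "B": (0.5, 1.0, 0.0, 0.5), "C": (0.5, 1.0, 0.5, 1.0)}
neighbors_of = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
takeover("A", zones, neighbors_of)   # B claims A's space (smallest zone; ties broken arbitrarily)
```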
CAN - Evaluation
- Scalability
  - per node, the number of neighbors is 2d
  - the average routing path is (d/4)·n^(1/d) hops
- Can scale the network without increasing per-node state
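Plugging hypothetical numbers into the slide's formulas shows the trade-off (these are formula evaluations, not measurements):

```python
def can_neighbors(d):
    return 2 * d                       # neighbor-table size per node

def can_path_length(d, n):
    return (d / 4) * n ** (1 / d)      # average hops, per the slide's formula

for d in (2, 4, 8):
    n = 1_000_000
    print(d, can_neighbors(d), round(can_path_length(d, n), 1))
# d=2: 4 neighbors, ~500 hops for n=1e6;  d=8: 16 neighbors, ~11 hops
```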
CAN – Design Improvements
- increase the number of dimensions and realities (reduces path length)
- heuristics that reduce per-CAN-hop latency
  - RTT-weighted routing
  - multiple nodes per zone (peer nodes)
  - deterministically replicate entries
CAN - Weaknesses
- Impossible to perform a fuzzy search
- Susceptible to malicious activity
- Must maintain coherence of all the indexed data (network overhead, efficient distribution)
- Routing latency is still relatively high
- Poor performance without the design improvements above
Chord
Based on consistent hashing for the key-to-node mapping
- standard hashing, e.g. x -> ax+b (mod p), provides good balance across bins
- consistent hashing: a small change in the bucket set does not induce a total remapping of items to buckets
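The difference is easy to demonstrate. The small Python experiment below (our own illustration, not from the slides) adds one bucket/node and measures the fraction of keys that move under each scheme: mod-N hashing remaps most keys, while ring-based consistent hashing moves only the keys that fall between the new node and its predecessor.

```python
import hashlib

def h(value: str, bits: int = 32) -> int:
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big") % (1 << bits)

def mod_bucket(key, n_buckets):
    return h(key) % n_buckets

def ring_bucket(key, node_ids):
    """Consistent hashing: a key goes to the first node id >= its hash (wrapping)."""
    k = h(key)
    return min((nid for nid in node_ids if nid >= k), default=min(node_ids))

keys = [f"key-{i}" for i in range(10_000)]

# mod-N: go from 10 to 11 buckets
before = [mod_bucket(k, 10) for k in keys]
after = [mod_bucket(k, 11) for k in keys]
print("mod-N moved:", sum(b != a for b, a in zip(before, after)) / len(keys))

# consistent hashing: add one node to a 10-node ring
ring = sorted(h(f"node-{i}") for i in range(10))
before = [ring_bucket(k, ring) for k in keys]
after = [ring_bucket(k, sorted(ring + [h("node-10")])) for k in keys]
print("ring moved: ", sum(b != a for b, a in zip(before, after)) / len(keys))
```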
Chord IDs
- m-bit identifier space for both keys and nodes
- Key identifier = SHA-1(key), e.g. Key="LetItBe" -> ID=60
- Node identifier = SHA-1(IP address), e.g. IP="198.10.10.1" -> ID=123
- Both are uniformly distributed over the identifier space
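A minimal Python sketch of the ID scheme: SHA-1, reduced to an m-bit space. The slide's example IDs (60 and 123) assume a 7-bit toy space, so the values printed below will differ.

```python
import hashlib

M = 7                                   # toy identifier space: IDs in [0, 2^7)

def chord_id(name: str, m: int = M) -> int:
    """m-bit Chord identifier: SHA-1 of the name, reduced mod 2^m."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << m)

print(chord_id("LetItBe"))       # key identifier
print(chord_id("198.10.10.1"))   # node identifier (hash of the IP address)
```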
Chord – Consistent Hashing
[Figure: a circular 7-bit ID space holding nodes N32, N90, N123 (N123 = SHA-1 of IP "198.10.10.1") and keys K5, K20, K60 (K60 = SHA-1 of "LetItBe"), K101.]
A key is stored at its successor: the node with the next-higher ID.
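The storage rule itself is a few lines of Python; the node IDs below are the ones in the figure, and the helper name is ours.

```python
def successor(key_id, node_ids):
    """The node that stores key_id: the first node ID >= key_id, wrapping past 0."""
    ring = sorted(node_ids)
    return next((n for n in ring if n >= key_id), ring[0])

nodes = [32, 90, 123]
print(successor(60, nodes))    # 90  -> K60 is stored at N90
print(successor(101, nodes))   # 123 -> K101 is stored at N123
print(successor(5, nodes))     # 32  -> K5 is stored at N32
```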
Chord – Basic Lookup
Every node knows its successor in the ring
[Figure: the query "Where is LetItBe?" hashes to K60 and is forwarded around the ring N10 -> N32 -> N55 -> N90; N90 answers "N90 has K60".]
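With only successor pointers, a lookup is a linear walk around the ring. A minimal Python sketch using the node IDs from the figure (the 7-bit ring size and the interval helper are assumptions of the sketch):

```python
RING = 1 << 7                         # 7-bit toy identifier space

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return 0 < (x - a) % RING <= (b - a) % RING

def basic_lookup(start, key_id, successor_of):
    """Follow successor pointers until key_id falls between a node and its successor."""
    node, hops = start, 0
    while not in_half_open(key_id, node, successor_of[node]):
        node = successor_of[node]
        hops += 1
    return successor_of[node], hops   # this node is responsible for key_id

successor_of = {10: 32, 32: 55, 55: 90, 90: 123, 123: 10}
print(basic_lookup(10, 60, successor_of))   # (90, 2): forwarded N10 -> N32 -> N55; N90 has K60
```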
Chord – "Finger Tables"
- Every node knows m other nodes in the ring
- The distance to those nodes increases exponentially
- Finger i points to the successor of n + 2^i
[Figure: node N80's fingers point to the successors of 80+2^0, 80+2^1, ..., 80+2^6, reaching nodes such as N96, N112, and N16.]
Chord – "Finger Tables"
N32's finger table (ring of nodes N32, N40, N52, N60, N70, N79, N80, N85, N102, N113 in a 7-bit ID space):
  33..33  -> N40
  34..35  -> N40
  36..39  -> N40
  40..47  -> N40
  48..63  -> N52
  64..95  -> N70
  96..31  -> N102
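The table can be reproduced mechanically from the previous slide's rule. A small Python sketch, assuming m = 7 and the node IDs shown above:

```python
M = 7
NODES = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def successor(ident):
    ident %= 1 << M
    return next((n for n in NODES if n >= ident), NODES[0])

def finger_table(n):
    """Finger i covers IDs [n + 2^i, n + 2^(i+1)) and points to successor(n + 2^i)."""
    rows = []
    for i in range(M):
        start = (n + (1 << i)) % (1 << M)
        end = (n + (1 << (i + 1)) - 1) % (1 << M)
        rows.append((start, end, successor(start)))
    return rows

for start, end, node in finger_table(32):
    print(f"{start}..{end} -> N{node}")
# 33..33 -> N40, 34..35 -> N40, 36..39 -> N40, 40..47 -> N40,
# 48..63 -> N52, 64..95 -> N70, 96..31 -> N102
```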
Chord – Lookup Algorithm
[Figure: the same ring and N32's finger table as on the previous slide, now used to route a lookup.]
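The slides do not spell out the routing rule, but in the Chord paper a lookup repeatedly jumps to the closest preceding finger until the key falls between a node and its successor. A Python sketch of that algorithm on the 7-bit ring above (helper names are ours):

```python
M = 7
RING = 1 << M
NODES = sorted([32, 40, 52, 60, 70, 79, 80, 85, 102, 113])

def succ(ident):
    ident %= RING
    return next((n for n in NODES if n >= ident), NODES[0])

def in_open(x, a, b):
    """x strictly between a and b on the ring."""
    return 0 < (x - a) % RING < (b - a) % RING

def in_half_open(x, a, b):
    """x in the ring interval (a, b]."""
    return 0 < (x - a) % RING <= (b - a) % RING

def fingers(n):
    """Finger i of node n points to the successor of n + 2^i."""
    return [succ(n + (1 << i)) for i in range(M)]

def closest_preceding_finger(n, key_id):
    for f in reversed(fingers(n)):
        if in_open(f, n, key_id):
            return f
    return n

def find_successor(n, key_id, hops=0):
    """Chord-paper style lookup: O(log N) finger jumps in expectation."""
    if key_id == n:
        return n, hops
    nxt = succ((n + 1) % RING)          # n's immediate ring successor
    if in_half_open(key_id, n, nxt):
        return nxt, hops
    return find_successor(closest_preceding_finger(n, key_id), key_id, hops + 1)

print(find_successor(32, 60))   # (60, 1): one finger hop from N32 to N52; N60 stores K60
```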
Chord - Node Insertion
Three-step process:
- Initialize all fingers of the new node
- Update fingers of existing nodes
- Transfer keys from the successor to the new node

Less aggressive mechanism (lazy finger update):
- Initialize only the finger to the successor node
- Periodically verify the immediate successor and predecessor
- Periodically refresh finger table entries
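The lazy mechanism corresponds to Chord's periodic stabilization. A hedged Python sketch of the successor/predecessor maintenance (the method names follow the paper's stabilize/notify, but everything runs in one process here and fingers are omitted):

```python
RING = 1 << 7                      # 7-bit toy identifier space

def in_open(x, a, b):
    """True if x lies strictly between a and b on the ring."""
    return 0 < (x - a) % RING < (b - a) % RING

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # a lone node is its own successor
        self.predecessor = None

    def stabilize(self):
        """Run periodically: adopt a closer successor if one has joined,
        then tell that successor we may be its predecessor."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        if self.predecessor is None or in_open(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# Lazy join of N36 into an existing {N20, N40} ring: set only the successor
# pointer, then let periodic stabilization repair the rest.
n20, n36, n40 = Node(20), Node(36), Node(40)
n20.successor, n40.successor = n40, n20
n20.stabilize(); n40.stabilize()           # two-node ring learns its predecessors
n36.successor = n40                        # N36 joins lazily
n36.stabilize(); n20.stabilize()           # N40 adopts N36; N20 re-points to N36
print(n20.successor.id, n40.predecessor.id)   # 36 36
```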
Joining the Ring – Step 1
- Initialize the new node's finger table
  - locate any node p already in the ring
  - ask p to look up the fingers of the new node N36
[Figure: N36 joins a ring containing N5, N20, N40, N60, N80, N99 and asks p to Lookup(37,38,40,…,100,164) to fill its fingers.]
Joining the Ring – Step 2
- Update fingers of existing nodes
  - the new node calls an update function on existing nodes
  - existing nodes can recursively update fingers of other nodes
[Figure: N36's join triggers finger updates at N5, N20, N40, N60, N80, N99.]
Joining the Ring – Step 3
- Transfer keys from the successor node to the new node
[Figure: N36 takes over the keys it is now responsible for: copy keys 21..36 (e.g. K30) from N40 to N36; K38 stays at N40.]
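A minimal sketch of the transfer rule: the new node takes every key in (predecessor, new_id] from its successor. The key and node IDs are the figure's; the ring size and helper names are assumptions of the sketch.

```python
RING = 1 << 8                               # toy identifier space large enough for the figure's IDs

def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return 0 < (x - a) % RING <= (b - a) % RING

def transfer_keys(successor_store, predecessor_id, new_id):
    """Move the keys the new node is now responsible for off its successor."""
    moved = {k: v for k, v in successor_store.items()
             if in_half_open(k, predecessor_id, new_id)}
    for k in moved:
        del successor_store[k]
    return moved

n40_keys = {30: "...", 38: "..."}
n36_keys = transfer_keys(n40_keys, predecessor_id=20, new_id=36)
print(n36_keys, n40_keys)    # {30: '...'} {38: '...'}  -> K30 moves to N36, K38 stays at N40
```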
Chord – Handling Failures
- Use a successor list
  - each node knows its r immediate successors
  - after a failure, a node knows the first live successor
  - correct successors guarantee correct lookups
- The guarantee is probabilistic
  - r can be chosen to make the probability of lookup failure arbitrarily small
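A small sketch of the successor-list fail-over (the data structures below are invented for the example): when the immediate successor dies, fall back to the next live entry.

```python
def first_live_successor(successor_list, is_alive):
    """Return the first reachable node in our r-entry successor list."""
    for node in successor_list:
        if is_alive(node):
            return node
    raise RuntimeError("all r successors failed (choose a larger r)")

successors_of_n32 = [40, 52, 60]          # r = 3 immediate successors of N32
alive = {40: False, 52: True, 60: True}   # N40 has just crashed
print(first_live_successor(successors_of_n32, lambda n: alive[n]))   # 52
```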
Chord - Evaluation
- Efficient: O(log N) messages per lookup
  - N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive changes in membership
- Assuming no malicious participants
Chord - Weaknesses
- NOT that simple (compared to CAN)
- Member joining is complicated
  - the aggressive mechanism requires too many messages and updates
  - no analysis of convergence for the lazy finger mechanism
- Key management is mixed between layers
  - the upper layer does insertion and handles node failures
  - Chord transfers keys when a node joins (no leave mechanism!)
- Routing table grows with the number of members in the group
- Worst-case lookup can be slow
Summary
Both systems:
- fully distributed and scalable
- efficient lookup
- robust
- simplicity (?)
- susceptible to malicious activity
How do these relate to OSD?
- Very similar if the data is "public"
- If the data is "private", only a few locations are available for storing it
- Does OSD help (make it easy) peer-to-peer computing?
- Any more comments?