Distributed Lookup Systems
Chord: A Scalable Peer-to-peer
Lookup Service for Internet Applications
A Scalable Content Addressable
Network
Motivation
How to find data in a distributed system?
[Figure: a publisher stores (Key=“LetItBe”, Value=MP3 data) at one of the nodes N1-N5 connected over the Internet; a client issues Lookup(“LetItBe”) and must somehow find the node holding the data.]
Applications
Peer-to-peer systems
Napster, Gnutella, Groove, FreeNet, …
Large scale storage management systems
Publius, OceanStore, PAST, Farsite, CFS ...
Mirroring (web caching)
Any wide area name resolution system
Outline
Types of solutions
Evaluation Criteria
CAN and Chord
basic idea
insert/retrieve
join/leave
recovery from failures
Centralized Solution
Central server (Napster)
[Figure: Napster-style centralized lookup: the publisher registers (Key=“LetItBe”, Value=MP3 data) with a central DB server; the client sends Lookup(“LetItBe”) to that server, which points it at the node (N1-N5) holding the data.]
Distributed Solution (1)
Flooding (Gnutella, Morpheus, etc.)
Worst case O(m) messages per lookup
[Figure: the client’s Lookup(“LetItBe”) is flooded from node to node across N1-N5 until a node holding the publisher’s data answers.]
Distributed Solution (2)
Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
[Figure: the client’s Lookup(“LetItBe”) is routed hop by hop through N1-N5 toward the node responsible for the key, where (Key=“LetItBe”, Value=MP3 data) is stored.]
Routing Challenges
Define a useful nearness metric
Keep the hop count small
Keep the routing table a reasonable size
Stay robust despite rapid changes in membership
Evaluation
Scalability
routing path length
per-node state
Latency
Load balancing
Robustness
routing fault tolerance
data availability
Content Addressable Network
Basic idea
Internet scale hash table
Interface
insert(key,value)
value = retrieve(key)
Table is partitioned among many individual nodes (minimal interface sketch below)
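For reference, a minimal single-node sketch of that insert/retrieve interface (the class and variable names here are illustrative, not from the CAN paper); CAN keeps the same interface but partitions the table so that each node holds only the keys whose points fall in its zone.

```python
class LookupService:
    def __init__(self):
        self._table = {}                 # one node's share of the table

    def insert(self, key, value):
        self._table[key] = value         # in CAN: first routed to the owner node

    def retrieve(self, key):
        return self._table.get(key)

svc = LookupService()
svc.insert("LetItBe", b"MP3 data")
assert svc.retrieve("LetItBe") == b"MP3 data"
```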
CAN - Solution
virtual d-dimensional Cartesian coordinate space
dynamically partitioned into zones, each “owned” by one node
a key is mapped to a point P
(key, value) is stored at the node that owns the zone containing P
CAN - Example
node I::insert(K,V)
(1) a = hx(K), b = hy(K)
(2) route (K,V) -> (a,b)
(3) the node owning the zone containing (a,b) stores (K,V)
[Figure: node I computes the point (a,b) from the key and the insert is routed across the zone grid to the owner of (a,b), where (K,V) is stored.]
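A small sketch of step (1), the key-to-point mapping, under the assumption of a 2-d unit square and hash functions hx/hy built from salted SHA-1 (the CAN paper does not mandate a particular hash function).

```python
import hashlib

def _h(key: str, salt: str) -> float:
    """Hash key into [0, 1) with SHA-1; the salt yields independent hx, hy."""
    digest = hashlib.sha1((salt + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def key_to_point(key: str) -> tuple[float, float]:
    """Step (1): a = hx(K), b = hy(K) in the 2-d unit square."""
    return _h(key, "x"), _h(key, "y")

a, b = key_to_point("LetItBe")
# Steps (2)-(3): (K, V) is then routed to, and stored at, the node whose
# zone contains the point (a, b).
```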
CAN - Routing
each node maintains routing state only for its neighbors
a message carries the destination coordinates (a,b)
greedy forwarding: send to the neighbor whose coordinates are closest to the destination (sketch below)
[Figure: a message currently at the node owning (x,y) is forwarded zone by zone toward (a,b).]
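A sketch of the greedy forwarding rule, with the simplifying assumption that each neighbor is represented by the center point of its zone.

```python
import math

def next_hop(neighbors: dict[str, tuple[float, float]],
             dst: tuple[float, float]) -> str:
    """Forward to the neighbor whose zone center is closest to dst."""
    return min(neighbors, key=lambda n: math.dist(neighbors[n], dst))

# Example: three neighbors, destination point (0.8, 0.2) -> neighbor "B".
hop = next_hop({"A": (0.2, 0.3), "B": (0.7, 0.4), "C": (0.5, 0.9)},
               dst=(0.8, 0.2))
assert hop == "B"
```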
CAN - Node Insertion
1) new node discovers some node “I” already in the CAN
2) new node picks a random point (p,q) in the coordinate space
3) I routes to (p,q) and discovers node J, the owner of that zone
4) J’s zone is split in half; the new node owns one half (sketch below)
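A sketch of step 4, the zone split, assuming zones are axis-aligned boxes (x_lo, x_hi, y_lo, y_hi); in the paper the split dimension follows a fixed ordering as the space is subdivided.

```python
def split_zone(zone, dim):
    """Split an axis-aligned zone in half along dimension dim (0 = x, 1 = y)."""
    x_lo, x_hi, y_lo, y_hi = zone
    if dim == 0:
        mid = (x_lo + x_hi) / 2
        return (x_lo, mid, y_lo, y_hi), (mid, x_hi, y_lo, y_hi)
    mid = (y_lo + y_hi) / 2
    return (x_lo, x_hi, y_lo, mid), (x_lo, x_hi, mid, y_hi)

# J keeps one half; the new node takes the other half, together with the
# (K, V) pairs whose points fall inside it.
j_half, new_half = split_zone((0.0, 1.0, 0.0, 1.0), dim=0)
```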
CAN – Node Failure
Need to repair the space
recover database
soft-state updates
use replication, rebuild database from replicas
repair routing
takeover algorithm
CAN – Takeover Algorithm
Simple failures
know your neighbor’s neighbors
when a node fails, one of its neighbors takes over its
zone
More complex failure modes
simultaneous failure of multiple adjacent nodes
scoped flooding to discover neighbors
hopefully, a rare event
CAN - Evaluation
Scalability
per node, number of neighbors is 2d
average routing path is (d/4)·n^(1/d) hops (worked numbers below)
Can scale the network without increasing
per-node state
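To make the trade-off concrete, a quick calculation for an assumed network of about one million nodes:

```python
# neighbors per node = 2*d; average path length = (d/4) * n**(1/d) hops
n = 2**20                          # ~1 million nodes (assumed example size)
for d in (2, 4, 10):
    print(d, 2 * d, round((d / 4) * n ** (1 / d)))
# d=2  ->  4 neighbors, ~512 hops
# d=4  ->  8 neighbors,  ~32 hops
# d=10 -> 20 neighbors,  ~10 hops
```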
CAN – Design Improvements
increase dimensions, realities (reduce
the path length)
heuristics (reduce the per-CAN-hop
latency)
RTT-weighted routing
multiple nodes per zone (peer nodes)
deterministically replicate entries
CAN - Weaknesses
Impossible to perform a fuzzy search
Susceptible to malicious activity
Maintaining coherence of all the indexed data (network overhead, efficient distribution)
Routing latency is still relatively high
Poor performance without the design improvements above
Chord
Based on consistent hashing for key to node
mapping
standard hashing, e.g. x -> ax + b (mod p)
provides good balance across bins
consistent hashing
a small change in the bucket set does not induce a total remapping of items to buckets
Chord IDs
m-bit identifier space for both keys and nodes
Key identifier = SHA-1(key), e.g. Key=“LetItBe” -> SHA-1 -> ID=60
Node identifier = SHA-1(IP address), e.g. IP=“198.10.10.1” -> SHA-1 -> ID=123
Both are uniformly distributed
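A sketch of the ID assignment, truncating SHA-1 to m bits; m = 7 matches the example ring on the next slide, while the particular IDs 60 and 123 above are illustrative rather than what this exact truncation would produce.

```python
import hashlib

def chord_id(name: str, m: int = 7) -> int:
    """Map a key or an IP address onto the m-bit identifier circle."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

key_id  = chord_id("LetItBe")        # key identifier
node_id = chord_id("198.10.10.1")    # node identifier
```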
Chord – Consistent Hashing
[Figure: circular 7-bit ID space (0-127) with nodes N32, N90 and N123 (IP=“198.10.10.1”) and keys K5, K20, K60 (=“LetItBe”), K101 placed on the ring.]
A key is stored at its successor: node with next higher ID
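A sketch of that placement rule: the successor is the first node ID at or clockwise after the key ID, wrapping around the ring (the global node list is used here only for illustration; no Chord node ever holds it).

```python
import bisect

def successor(node_ids: list[int], key_id: int) -> int:
    """First node ID >= key_id, wrapping around the identifier circle."""
    ids = sorted(node_ids)
    return ids[bisect.bisect_left(ids, key_id) % len(ids)]

assert successor([32, 90, 123], 60) == 90     # K60 is stored at N90
assert successor([32, 90, 123], 125) == 32    # K125 wraps around past 0
```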
Chord – Basic Lookup
Every node knows its successor in the ring
[Figure: ring with nodes N10, N32, N55, N90, N123; the query “Where is ‘LetItBe’?” hashes to K60 (Hash(“LetItBe”) = K60) and is forwarded successor by successor until the answer “N90 has K60” comes back.]
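A sketch of this successor-only lookup, with the ring represented as a dict from each node ID to its successor (an illustration-only shortcut): the query walks the ring one node at a time, so it may take O(N) hops.

```python
def in_interval(k, a, b):
    """True if k lies in the half-open ring interval (a, b]."""
    return (a < k <= b) if a < b else (k > a or k <= b)

def ring_lookup(successor_of, start, key_id):
    node = start
    while not in_interval(key_id, node, successor_of[node]):
        node = successor_of[node]          # one hop per node: O(N) worst case
    return successor_of[node]              # the node responsible for key_id

ring = {10: 32, 32: 55, 55: 90, 90: 123, 123: 10}
assert ring_lookup(ring, start=10, key_id=60) == 90   # "N90 has K60"
```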
Chord – “Finger Tables”
Every node knows m other nodes in the ring
Distances to the fingers increase exponentially
Finger i points to the successor of n + 2^i (i = 0, …, m-1)
[Figure: node N80’s fingers point at the successors of 80 + 2^0, 80 + 2^1, …, 80 + 2^6 around the ring; the other nodes shown are N16, N96, N112.]
Chord – “Finger Tables”
N32’s Finger Table (ring nodes shown: N32, N40, N52, N60, N70, N79, N80, N85, N102, N113)
  interval    successor
  33..33  ->  N40
  34..35  ->  N40
  36..39  ->  N40
  40..47  ->  N40
  48..63  ->  N52
  64..95  ->  N70
  96..31  ->  N102
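A sketch that rebuilds N32's table from the rule finger[i] = successor((n + 2^i) mod 2^m), using the ring membership shown in the figure; recomputing successors from a global list is done purely for illustration.

```python
import bisect

def successor(ids, key_id):
    ids = sorted(ids)
    return ids[bisect.bisect_left(ids, key_id) % len(ids)]

def finger_table(ids, n, m=7):
    """finger[i] = successor((n + 2**i) mod 2**m), for i = 0 .. m-1."""
    return [successor(ids, (n + 2 ** i) % 2 ** m) for i in range(m)]

ring = [32, 40, 52, 60, 70, 79, 80, 85, 102, 113]
assert finger_table(ring, 32) == [40, 40, 40, 40, 52, 70, 102]
```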
Chord – Lookup Algorithm
[Figure: the same ring and N32 finger table as above; a lookup forwards the query to the closest finger that precedes the target ID, roughly halving the remaining distance at each hop.]
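A sketch of the lookup itself: keep forwarding to the closest finger that precedes the target ID until the target falls between the current node and its successor. Computing every node's fingers from a global list, as below, is again only for illustration.

```python
import bisect

RING = 128                                   # 7-bit identifier space

def successor(ids, k):
    ids = sorted(ids)
    return ids[bisect.bisect_left(ids, k) % len(ids)]

def between(k, a, b):
    """True if k lies in the open ring interval (a, b)."""
    return (a < k < b) if a < b else (k > a or k < b)

def find_successor(ids, n, key_id):
    fingers = {x: [successor(ids, (x + 2 ** i) % RING) for i in range(7)]
               for x in ids}
    while True:
        succ = successor(ids, (n + 1) % RING)
        if key_id == succ or between(key_id, n, succ):
            return succ                       # key_id lies in (n, succ]
        # otherwise hop to the closest finger preceding key_id
        preceding = [f for f in fingers[n] if between(f, n, key_id)]
        n = max(preceding, key=lambda f: (f - n) % RING, default=succ)

ring = [32, 40, 52, 60, 70, 79, 80, 85, 102, 113]
assert find_successor(ring, 32, 44) == 52    # N32 resolves K44 via finger N40
```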
Chord - Node Insertion
Three step process:
Initialize all fingers of new node
Update fingers of existing nodes
Transfer keys from successor to new node
Less aggressive mechanism (lazy finger update):
Initialize only the finger to successor node
Periodically verify immediate successor and predecessor (stabilization sketch below)
Periodically refresh finger table entries
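A hedged sketch of the lazy mechanism, patterned on the Chord paper's stabilize/notify routines; failure checks, fix_fingers and real remote calls are omitted, and the in-memory object references stand in for RPCs.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.succ = self          # successor pointer (initially itself)
        self.pred = None          # predecessor pointer

    def stabilize(self):
        """Run periodically: adopt any node that slipped in before our successor."""
        x = self.succ.pred
        if x is not None and between(x.id, self.id, self.succ.id):
            self.succ = x
        self.succ.notify(self)    # tell the successor we may be its predecessor

    def notify(self, candidate):
        if self.pred is None or between(candidate.id, self.pred.id, self.id):
            self.pred = candidate

def between(k, a, b):
    """True if k lies in the open ring interval (a, b)."""
    return (a < k < b) if a < b else (k > a or k < b)

# Example: N36 joins with successor N40; two stabilize() rounds link the ring.
a, b = Node(36), Node(40)
a.succ = b
a.stabilize()
b.stabilize()
assert b.pred is a and a.pred is b
```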
Joining the Ring – Step 1
Initialize the new finger table
locate any node p in the ring
ask node p to lookup fingers of new node N36
[Figure: ring with nodes N5, N20, N40, N60, N80, N99; the new node N36 has p issue Lookup(37, 38, 40, …, 100, 164) to fill in its finger table.]
Joining the Ring – Step 2
Updating fingers of existing nodes
new node calls update function on existing nodes
existing node can recursively update fingers of
other nodes
[Figure: ring with nodes N5, N20, N40, N60, N80, N99; after N36 joins, the existing nodes whose fingers should now point at N36 get updated.]
Joining the Ring – Step 3
Transfer keys from successor node to
new node
[Figure: keys 21..36 (e.g. K30) are copied from successor N40 to the new node N36; K38 remains at N40.]
Chord – Handling Failures
Use successor list
each node knows its r immediate successors
after a failure it still knows the first live successor
correct successor pointers guarantee correct lookups
the guarantee holds only with some probability
r can be chosen to make the probability of lookup failure arbitrarily small (e.g. if each successor fails independently with probability p, all r successors fail with probability p^r)
Chord - Evaluation
Efficient: O(log N) messages per lookup
N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive changes in
membership
Assuming no malicious participants
Chord - Weakness
NOT that simple (compared to CAN)
Member joining is complicated
the aggressive mechanism requires too many messages and updates
no analysis of convergence for the lazy finger mechanism
Key management mechanism is mixed between layers
the upper layer does insertion and handles node failures
Chord transfers keys when a node joins (there is no leave mechanism!)
Routing table grows with the number of members in the group
Worst-case lookups can be slow
Summary
Both systems:
fully distributed and scalable
efficient lookup
robust
simplicity (?)
susceptible to malicious activity
How do these relate to OSD?
Very similar if data is “public”
If data is “private”, only a few
locations are available for storing data
Does OSD help (make it easier) with peer-to-peer computing?
Any more comments?