Chord: A Scalable Peer-to-peer Lookup
Service for Internet Applications
Dr. Yingwu Zhu
A peer-to-peer storage problem
• 10,000 scattered music enthusiasts
• Willing to store and serve replicas
• Contributing resources: storage, bandwidth, etc.
• How do you find the data?
• An efficient lookup mechanism is needed!
The lookup problem
[Diagram: a publisher somewhere on the Internet holds Key=“title”, Value=MP3 data…; a client at another node issues Lookup(“title”) and must somehow find which of N1–N6 has it.]
Centralized lookup (Napster)
[Diagram: the publisher at N4 registers SetLoc(“title”, N4) with a central DB; the client sends Lookup(“title”) to the DB and is directed to N4, where Key=“title”, Value=MP3 data… is stored.]
Simple, but O(N) state and a single point of failure
Legal issues!
Napster: Publish
[Diagram: a peer at 123.2.21.23 announces “I have X, Y, and Z!”; the central server records insert(X, 123.2.21.23), …]
Napster: Search
[Diagram: the client asks the central server “Where is file A?”; the server answers search(A) --> 123.2.0.18; the client then fetches the file directly from 123.2.0.18.]
Napster
• Central Napster server
• Can ensure correct results?
• Bottleneck for scalability
• Single point of failure
• Susceptible to denial of service
• Malicious users
• Lawsuits, legislation
• Search is centralized
• File transfer is direct (peer-to-peer)
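To make the architecture concrete, here is a minimal sketch of a Napster-style central index (hypothetical class and method names, not the real protocol): the server only maps keys to peer addresses, and the file transfer itself happens directly between peers.

```python
# Minimal sketch of a centralized index (hypothetical names, not Napster's
# actual protocol): the server maps keys to peer addresses; peers then
# fetch files directly from each other.

class CentralIndex:
    def __init__(self):
        self.locations = {}                       # key -> set of peer addresses

    def publish(self, key, peer_addr):
        # "I have X!": a peer registers a file it is willing to serve
        self.locations.setdefault(key, set()).add(peer_addr)

    def lookup(self, key):
        # "Where is file X?": all O(N) state lives on this one server
        return self.locations.get(key, set())

index = CentralIndex()
index.publish("X", "123.2.21.23")
print(index.lookup("X"))                          # client fetches from that peer
```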
Flooded queries (Gnutella)
[Diagram: the publisher at N4 holds Key=“title”, Value=MP3 data…; the client’s Lookup(“title”) is flooded from neighbor to neighbor across the overlay until it reaches N4.]
Robust, but worst case O(N) messages per lookup
Gnutella: Query Flooding
Breadth-First Search (BFS)
[Diagram legend: source node; forwarded query; processed query; found result; forwarded response.]
Gnutella: Query Flooding
• A node/peer connects to a set of Gnutella neighbors
• Forward queries to all neighbors
• The peer that has the information responds
• Flood the network, with a TTL to terminate the query (see the sketch below)
+ Results are complete
– Wastes bandwidth
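A minimal sketch of TTL-limited flooding over an in-memory neighbor graph (illustrative names, not the Gnutella wire protocol): each query carries a TTL, every peer processes it at most once, and forwarding stops when the TTL reaches zero.

```python
from collections import deque

# Sketch of Gnutella-style query flooding with a TTL (illustrative only):
# the query spreads breadth-first, each peer processes it at most once,
# and the TTL bounds how far it can travel.

def flood_query(neighbors, has_file, start, key, ttl=4):
    seen = {start}
    hits = []
    queue = deque([(start, ttl)])
    while queue:
        node, t = queue.popleft()
        if key in has_file.get(node, set()):
            hits.append(node)                     # "found result" travels back
        if t == 0:
            continue                              # TTL exhausted: stop forwarding
        for nb in neighbors.get(node, []):
            if nb not in seen:                    # never process a query twice
                seen.add(nb)
                queue.append((nb, t - 1))
    return hits

neighbors = {"N6": ["N3", "N7"], "N3": ["N6", "N4"], "N4": ["N3"], "N7": ["N6"]}
has_file = {"N4": {"title"}}
print(flood_query(neighbors, has_file, "N6", "title", ttl=3))   # ['N4']
```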
Gnutella: Random Walk
• Improvement over query flooding
• Same overlay structure as Gnutella
• Forward the query to a random subset of a node’s neighbors (see the sketch below)
+ Reduced bandwidth requirements
– Incomplete results
– High latency
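A sketch of the random-walk variant under the same assumptions as the flooding sketch above: each step forwards the query to a single randomly chosen neighbor instead of all of them, which saves bandwidth but may miss the file.

```python
import random

# Sketch of a random walk (illustrative): forward the query to one randomly
# chosen neighbor per hop, for at most max_hops hops. Cheaper than flooding,
# but results can be incomplete and latency is higher.

def random_walk(neighbors, has_file, start, key, max_hops=16, rng=random):
    node = start
    for _ in range(max_hops):
        if key in has_file.get(node, set()):
            return node                           # found (if the walk got lucky)
        choices = neighbors.get(node, [])
        if not choices:
            return None
        node = rng.choice(choices)                # forward to a random neighbor
    return None                                   # gave up: incomplete result
```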
KaZaA (FastTrack network)
• Hybrid of centralized Napster and decentralized Gnutella
• Super-peers act as local search hubs
• Each super-peer is similar to a Napster server for a small portion of the network
• Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time)
• Users upload their list of files to a super-peer
• Super-peers periodically exchange file lists
• You send queries to a super-peer for files of interest
• The local super-peer may flood the queries to other super-peers if it cannot satisfy them
• Exploits the heterogeneity of peer nodes (see the sketch below)
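A rough sketch of the super-peer idea (hypothetical classes, not the FastTrack protocol): each super-peer indexes the file lists its leaf peers upload and forwards a query to the other super-peers only when it cannot answer it locally.

```python
# Sketch of super-peer search (hypothetical classes, not FastTrack's
# protocol): each super-peer indexes its own leaf peers and forwards
# a query to other super-peers only if it cannot answer locally.

class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.index = {}                           # key -> set of leaf-peer addresses
        self.other_superpeers = []

    def upload_file_list(self, peer_addr, keys):
        for k in keys:
            self.index.setdefault(k, set()).add(peer_addr)

    def query(self, key, forward=True):
        hits = set(self.index.get(key, set()))
        if not hits and forward:
            for sp in self.other_superpeers:      # query spreads only among super-peers
                hits |= sp.query(key, forward=False)
        return hits

sp1, sp2 = SuperPeer("SP1"), SuperPeer("SP2")
sp1.other_superpeers = [sp2]
sp2.upload_file_list("123.2.22.50", {"A"})
print(sp1.query("A"))                             # {'123.2.22.50'}
```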
KaZaA
• Uses super-nodes to improve scalability and establish a hierarchy
  • Uptime, bandwidth
• Closed-source
• Uses HTTP to carry out downloads
• Encrypted protocol; queuing, QoS
KaZaA: Network Design
[Diagram: a two-tier overlay in which designated “super nodes” index and route for the ordinary peers attached to them.]
KaZaA: File Insert
[Diagram: a peer at 123.2.21.23 tells its super-peer “I have X!”; the super-peer records insert(X, 123.2.21.23), …]
KaZaA: File Search
[Diagram: the client asks its super-peer “Where is file A?”; the query is passed among super-peers (search(A) --> 123.2.0.18, search(A) --> 123.2.22.50) and the replies point the client at 123.2.0.18 and 123.2.22.50.]
Routed queries (Freenet, Chord, etc.)
[Diagram: the publisher holds Key=“title”, Value=MP3 data…; the client’s Lookup(“title”) is routed hop by hop through the overlay toward the node responsible for the key.]
Routing challenges
• Define a useful key nearness metric
• Keep the hop count small
• Keep the tables small
• Stay robust despite rapid change (node addition/removal)
• Freenet: emphasizes anonymity
• Chord: emphasizes efficiency and simplicity
Chord properties
• Efficient: O(log(N)) messages per lookup
• N is the total number of servers
• Scalable: O(log(N)) state per node
• Robust: survives massive failures
• Proofs are in paper / tech report
• Assuming no malicious participants
Chord overview
• Provides peer-to-peer hash lookup:
  • Lookup(key) → IP address
  • Mapping: key → IP address
• How does Chord route lookups?
• How does Chord maintain routing tables?
Chord IDs
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space
• How to map key IDs to node IDs?
Consistent hashing [Karger 97]
[Diagram: a circular 7-bit ID space; Key 5 and Node 105 become K5 and N105; keys K5, K20, K80 and nodes N32, N90, N105 are placed on the ring.]
A key is stored at its successor: node with next higher ID
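A minimal sketch of this mapping in code (assumed helper names, local simulation): SHA-1 places both keys and node addresses in the same circular ID space, and a key is assigned to the first node after it on the ring.

```python
import hashlib
from bisect import bisect_right

# Minimal consistent-hashing sketch (local simulation): SHA-1 maps keys and
# node addresses into one circular 160-bit ID space; a key is stored at its
# successor, the node with the next higher ID (wrapping around the ring).

M = 160                                            # SHA-1 gives 160-bit IDs

def chord_id(text, m=M):
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** m)

def successor(sorted_node_ids, ident):
    """First node ID clockwise from `ident`, wrapping past zero."""
    i = bisect_right(sorted_node_ids, ident)
    return sorted_node_ids[i % len(sorted_node_ids)]

nodes = sorted(chord_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
key = chord_id("title")
print("key", key, "is stored at node", successor(nodes, key))
```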
Basic lookup
[Diagram: on a ring with N10, N32, N60, N90, N105, N120, node N10 asks “Where is key 80?”; the query follows successor pointers around the ring until the answer “N90 has K80” comes back.]
Simple lookup algorithm
Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
• Correctness depends only on successors
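A sketch of this successor-only lookup as a local simulation (the real system would make an RPC to each next hop): every node knows only its successor, so the query walks the ring one node at a time.

```python
# Sketch of successor-only lookup (local simulation of the pseudocode above):
# each node knows only its successor, so the query walks the ring one hop at
# a time; correct, but O(N) hops in the worst case.

def between(x, a, b):
    """True if x lies in the ring interval (a, b], handling wrap-around."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def simple_lookup(ring, start, key_id):
    """ring: sorted list of node IDs; start: ID of the querying node."""
    node, hops = start, 0
    while True:
        succ = ring[(ring.index(node) + 1) % len(ring)]
        if between(key_id, node, succ):
            return succ, hops                     # succ is responsible for key_id
        node, hops = succ, hops + 1               # forward to the next hop

print(simple_lookup([10, 32, 60, 90, 105, 120], 10, 80))   # (90, 2): N90 has K80
```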
“Finger table” allows log(N)-time lookups
[Diagram: from N80, the fingers form a fast track / express lane around the ring, pointing ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way across the ID space.]
Finger i points to successor of n + 2^i
[Diagram: for N80, the finger at 80 + 2^5 = 112 points to the successor of 112, which is N120; the other fingers point ½, ¼, 1/8, …, 1/128 of the way around the ring.]
Lookup with fingers
Lookup(my-id, key-id)
  look in local finger table for
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor             // done
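A sketch that puts the two pieces together (local simulation on a small 7-bit ring, using the node IDs from the O(log N) example below): build each node's finger table as successor(n + 2^i), then route by jumping to the closest preceding finger.

```python
from bisect import bisect_right

# Sketch of finger tables plus finger-based lookup (local simulation on a
# 7-bit ring). Finger i of node n points to successor(n + 2^i); each hop
# jumps to the highest finger that still precedes the key.

M = 7
RING = 2 ** M

def successor(ring, ident):
    i = bisect_right(ring, ident % RING)
    return ring[i % len(ring)]

def between(x, a, b):
    """x in the ring interval (a, b], handling wrap-around."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def build_fingers(ring, n):
    return [successor(ring, n + 2 ** i) for i in range(M)]

def finger_lookup(ring, fingers, start, key):
    n, hops = start, 0
    while True:
        succ = fingers[n][0]                      # finger 0 is n's successor
        if between(key, n, succ):
            return succ, hops                     # done: succ stores the key
        # otherwise jump to the closest preceding finger (or the successor)
        n = next((f for f in reversed(fingers[n]) if between(f, n, key)), succ)
        hops += 1

ring = sorted([5, 10, 20, 32, 60, 80, 99, 110])
fingers = {n: build_fingers(ring, n) for n in ring}
print(finger_lookup(ring, fingers, 32, 19))       # K19's successor is N20
```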
Lookups take O(log(N)) hops
[Diagram: on a ring with N5, N10, N20, N32, N60, N80, N99, N110, node N32 issues Lookup(K19); each finger hop covers at least half of the remaining ID-space distance, so the query reaches K19’s successor N20 in O(log N) hops.]
Joining: linked list insert
[Diagram, step 1: new node N36 performs Lookup(36) to find where it belongs, between N25 and N40 (which currently stores K30 and K38).]
Join (2)
[Diagram, step 2: N36 sets its own successor pointer to N40.]
Join (3)
[Diagram, step 3: keys 26..36 (here K30) are copied from N40 to N36.]
Join (4)
[Diagram, step 4: N25’s successor pointer is set to N36.]
Update finger pointers in the background
Correct successors produce correct lookups
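A sketch of these four steps on a local data model (hypothetical Node class; the real system performs them via RPCs, and fingers are repaired later by the background stabilization).

```python
# Sketch of the join steps on a local model (hypothetical Node class; the
# real protocol does this with RPCs and fixes fingers in the background).

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.keys = {}                            # key id -> value

def join(new, lookup):
    succ = lookup(new.id)                         # 1. Lookup(new.id) finds the successor
    new.successor = succ                          # 2. new node sets its successor pointer
    moved = {k: v for k, v in succ.keys.items()   # 3. copy the keys that now belong
             if k <= new.id}                      #    to the new node (ring wrap-around
    new.keys.update(moved)                        #    ignored for brevity)
    for k in moved:
        del succ.keys[k]
    # 4. the predecessor later sets its successor pointer to `new`;
    #    finger tables are updated lazily afterwards.
    return new

n40 = Node(40); n40.keys = {30: "K30 data", 38: "K38 data"}
n36 = join(Node(36), lambda ident: n40)           # stand-in for a real Lookup
print(sorted(n36.keys), sorted(n40.keys))         # [30] [38]
```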
Failures might cause incorrect lookup
[Diagram: on a ring with N10, N80, N85, N102, N113, N120, node N80 issues Lookup(90), but several of the nodes just after it have failed.]
N80 doesn’t know correct successor, so incorrect lookup
Solution: successor lists
• Each node knows r immediate successors
• After failure, will know first live successor
• Correct successors guarantee correct lookups
• Guarantee is with some probability
Choosing the successor list length
• Assume 1/2 of the nodes fail
• P(successor list all dead) = (1/2)^r
  • I.e., P(this node breaks the Chord ring)
  • Depends on independent failures
• P(no broken nodes) = (1 - (1/2)^r)^N
• r = 2 log(N) makes this prob. = 1 - 1/N (see the check below)
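The arithmetic behind the last bullet, under the stated independence assumption: with r = 2·log2(N), (1/2)^r = 1/N^2, so P(no broken nodes) = (1 - 1/N^2)^N ≈ 1 - 1/N. A quick numeric check:

```python
import math

# Numeric check of the successor-list sizing argument (assumes half the nodes
# fail, independently): with r = 2*log2(N), a node's whole successor list is
# dead with probability 1/N^2, and the ring survives with probability ~1 - 1/N.

N = 1000
r = 2 * math.log2(N)                              # about 20 successors per node
p_list_all_dead = 0.5 ** r                        # = 1/N^2 = 1e-6
p_ring_intact = (1 - p_list_all_dead) ** N        # no node breaks the ring
print(r, p_list_all_dead, p_ring_intact, 1 - 1 / N)   # ~19.93, ~1e-6, ~0.999, 0.999
```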
Lookup with fault tolerance
Lookup(my-id, key-id)
  look in local finger table and successor-list
    for highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
    if call failed,
      remove n from finger table
      return Lookup(my-id, key-id)
  else return my successor          // done
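A sketch of the retry behaviour in that pseudocode (local simulation; is_alive stands in for a real RPC attempt): the best candidate from the finger table and successor list is tried first, and a failed candidate is removed before retrying.

```python
# Sketch of the fault-tolerant next-hop choice (local simulation; `is_alive`
# stands in for an RPC that may fail): try the highest candidate from the
# finger table plus successor list; if it is dead, drop it and retry.

RING = 2 ** 160

def between(x, a, b):
    return (a < x <= b) if a < b else (x > a or x <= b)

def clockwise(a, b):
    return (b - a) % RING

def next_hop(my_id, key_id, fingers, successors, is_alive):
    candidates = set(fingers) | set(successors)
    while True:
        in_range = [c for c in candidates if between(c, my_id, key_id)]
        if not in_range:                          # no such node: key is at my successor
            return ("done", min(successors, key=lambda s: clockwise(my_id, s)))
        best = max(in_range, key=lambda c: clockwise(my_id, c))
        if is_alive(best):
            return ("forward to", best)           # would call Lookup(key-id) there
        candidates.discard(best)                  # call failed: remove n and retry
```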
Chord status
• Working implementation as part of CFS
• Chord library: 3,000 lines of C++
• Deployed in a small Internet testbed
• Includes:
  • Correct concurrent join/fail
  • Proximity-based routing for low delay
  • Load control for heterogeneous nodes
  • Resistance to spoofed node IDs
Experimental overview
• Quick lookup in large systems
• Low variation in lookup costs
• Robust despite massive failure
• See paper for more results
Experiments confirm theoretical results
Chord lookup cost is O(log N)
[Plot: average messages per lookup vs. number of nodes; the curve grows logarithmically and the constant is 1/2, i.e., about (1/2)·log2(N) messages per lookup.]
Failure experimental setup
• Start 1,000 CFS/Chord servers
• Successor list has 20 entries
• Wait until they stabilize
• Insert 1,000 key/value pairs
• Five replicas of each
• Stop X% of the servers
• Immediately perform 1,000 lookups
Massive failures have little impact
[Plot: failed lookups (percent, 0–1.4) vs. failed nodes (percent, 5–50); the failure rate stays small even when half the nodes fail. Annotation: (1/2)^6 is 1.6%.]
Related Work
• CAN (Ratnasamy, Francis, Handley, Karp, Shenker)
• Pastry (Rowstron, Druschel)
• Tapestry (Zhao, Kubiatowicz, Joseph)
• Chord emphasizes simplicity
Chord Summary
• Chord provides peer-to-peer hash lookup
• Efficient: O(log(N)) messages per lookup
• Robust as nodes fail and join
• Good primitive for peer-to-peer systems
http://www.pdos.lcs.mit.edu/chord
Reflection on Chord
• Strict overlay structure
• Strict data placement
• If data keys are uniformly distributed, and # of keys >> # of nodes:
  • Load balanced: each node has an O(1/N) fraction of the keys
  • Node addition/deletion moves only O(1/N) of the load; load movement is minimized!
Reflection on Chord
• Routing table (successor list + finger table)
  • Deterministic
  • Unaware of the network topology
  • Routing latency could be a problem
  • Proximity Neighbor Selection (PNS)
    • m neighbor candidates, choose the one with minimum latency (see the sketch below)
    • Still O(log N) hops
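A sketch of Proximity Neighbor Selection (illustrative; measure_rtt is a stand-in for a real latency probe): among the m candidates that could fill a routing-table slot, keep the one with the lowest measured latency.

```python
# Sketch of Proximity Neighbor Selection (illustrative): when several nodes
# could fill the same routing-table slot, probe m candidates and keep the
# one with the lowest latency. `measure_rtt` is a stand-in for a real probe.

def pick_pns_neighbor(candidates, measure_rtt, m=16):
    probes = candidates[:m]                       # the m neighbor candidates
    return min(probes, key=measure_rtt)           # choose minimum latency

rtt = {"N80": 0.120, "N85": 0.030, "N99": 0.075}  # fake measurements (seconds)
print(pick_pns_neighbor(["N80", "N85", "N99"], lambda n: rtt[n]))   # N85
```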
Reflection on Chord
• Predecessor + successor pointers must be correct and are aggressively maintained
• Finger tables are lazily maintained
• Tradeoff: maintenance bandwidth vs. routing correctness
Reflection on Chord
• Assumes uniform node distribution
• In the wild, nodes are heterogeneous
  • Load imbalance!
• Virtual servers
  • A node hosts multiple virtual servers (see the sketch below)
  • O(log N) virtual servers per node
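A sketch of the virtual-server idea (the “ip#i” naming is just for illustration): a physical node joins the ring under several IDs, so a more capable node can take responsibility for a larger share of the keys.

```python
import hashlib

# Sketch of virtual servers (illustrative naming): each physical node joins
# under several IDs, with more virtual servers for higher-capacity nodes, so
# its share of the keyspace roughly tracks its capacity.

def chord_id(text):
    return int(hashlib.sha1(text.encode()).hexdigest(), 16)

def virtual_ids(ip, count):
    return [chord_id(f"{ip}#{i}") for i in range(count)]

owner = {}                                        # virtual server ID -> physical node
for ip, capacity in [("10.0.0.1", 1), ("10.0.0.2", 4)]:
    for vid in virtual_ids(ip, 8 * capacity):     # more capacity, more virtual servers
        owner[vid] = ip
print(sum(1 for v in owner.values() if v == "10.0.0.2") / len(owner))   # ~0.8
```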
Join: lazy finger update is OK
[Diagram: ring with N2, N25, the newly joined N36, and N40; K30 is now stored at N36.]
N2 finger should now point to N36, not N40
Lookup(K30) visits only nodes < 30, will undershoot
CFS: a peer-to-peer storage system
• Inspired by Napster, Gnutella, Freenet
• Separates publishing from serving
• Uses spare disk space, net capacity
• Avoids centralized mechanisms
CFS architecture
[Diagram: CFS architecture. A DHash distributed block store layer (block storage, availability/replication, authentication, caching, consistency, server selection, keyword search) sits on top of Chord, which provides lookup.]
• Powerful lookup simplifies other mechanisms