Transcript Pastry
Pastry
Peter Druschel, Rice University
Antony Rowstron, Microsoft Research UK
Some slides are borrowed from the original
presentation by the authors
Outline
•
•
•
•
Background
Pastry
Pastry proximity routing
PAST
Background
Peer-to-peer systems
•
•
•
•
distribution
decentralized control
self-organization
symmetry (communication, node roles)
Common issues
•
•
•
•
Organize, maintain overlay network
Resource allocation/load balancing
Resource location
Network proximity routing
Pastry provides a generic p2p substrate
Architecture
Event
notification
Network
storage
Pastry
TCP/IP
?
P2p application layer
P2p substrate
(self-organizing
overlay network)
Internet
Structured p2p overlays
The primitive route(M, X) routes message M
to the live node with node Id closest to key X
Node ids and keys are from a large, sparse id space
Distributed Hash Tables (DHT)
nodes
k1,v1
Operations:
insert(k,v)
lookup(k)
P2P
overlay
network
k2,v2
k3,v3
k4,v4
k5,v5
k6,v6
• p2p overlay maps keys to nodes
• completely decentralized and self-organizing
• robust, scalable
Outline
•
•
•
•
•
•
Background
Pastry
Pastry proximity routing
PAST
SCRIBE
Conclusions
Pastry: Object distribution
2128-1 O
objId
Consistent hashing
[Karger et al. ‘97]
128 bit circular id space
nodeIds (uniform random)
nodeIds
objIds (uniform random)
Invariant: node with
numerically closest nodeId
maintains object
Pastry: Object insertion/lookup
2128-1 O
X
Msg with key X is
routed to live node
with nodeId closest
to X
Problem: complete
routing table not
feasible
Route(X)
Pastry: Routing
Tradeoff
• O(log N) routing table size
• O(log N) message forwarding steps
Routing table of # 65a1fcx
Row 0
0
x
1
x
2
x
3
x
4
x
Row 1
6
0
x
6
1
x
6
2
x
6
3
x
6
4
x
6
5
0
x
6
5
1
x
6
5
2
x
6
5
3
x
6
5
4
x
6
5
a
2
x
6
5
a
3
x
6
5
a
4
x
Row 2
Row 3
log16 N rows
6
5
a
0
x
5
x
7
x
8
x
9
x
a
x
b
x
c
x
d
x
e
x
f
x
6
6
x
6
7
x
6
8
x
6
9
x
6
a
x
6
b
x
6
c
x
6
d
x
6
e
x
6
f
x
6
5
5
x
6
5
6
x
6
5
7
x
6
5
8
x
6
5
9
x
6
5
b
x
6
5
c
x
6
5
d
x
6
5
e
x
6
5
f
x
6
5
a
5
x
6
5
a
6
x
6
5
a
7
x
6
5
a
8
x
6
5
a
9
x
6
5
a
b
x
6
5
a
c
x
6
5
a
d
x
6
5
a
e
x
6
5
a
f
x
6
5
a
a
x
Pastry: Routing
d471f1
d467c4
d462ba
d46a1c
d4213f
Route(d46a1c)
65a1fc
Properties
d13da3
log16 N steps
O(log N) state
Pastry: Leaf sets
Each node maintains IP addresses of the
nodes with the L/2 numerically closest
larger and smaller nodeIds, respectively.
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination
Pastry: Routing procedure
if (destination is within range of our leaf set)
forward to numerically closest member
else
let l = length of shared prefix
let d = value of l-th digit in D’s address
if (Rld exists)
(Rld = entry at column d row l)
forward to Rld
else
forward to a known node that
(a) shares at least as long a prefix
(b) is numerically closer than this node
Pastry: Performance
Integrity of overlay/ message delivery:
• guaranteed unless L/2 simultaneous failures
of nodes with adjacent nodeIds
Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 max
• During failure recovery:
– O(N) worst case, average case much better
Self-organization
How are the routing tables and leaf sets
initialized and maintained?
• Node addition
• Node departure (failure)
Pastry: Node addition
d471f1
d467c4
d462ba
d46a1c
d4213f
New node: d46a1c
Route(d46a1c)
65a1fc
d13da3
Node departure (failure)
Leaf set members exchange heartbeat
• Leaf set repair (eager): request set
from farthest live node in set
• Routing table repair (lazy): get table
from peers in the same row, then higher
rows
Pastry: Experimental results
Prototype
• implemented in Java
• emulated network
• deployed currently at ~25 sites
worldwide
Pastry: Average # of hops
4.5
Average number of hops
4
3.5
3
2.5
2
1.5
Pastry
Log(N)
1
0.5
0
1000
10000
Number of nodes
L=16, 100k random queries
100000
Outline
• Background
• Pastry
• Pastry proximity routing
Pastry: Proximity routing
Proximity metric = time delay estimated by ping
A node can probe distance to any other node
Each routing table entry uses a node close to
the local node (in the proximity space),
among all nodes with the appropriate node Id
prefix.
Pastry: Routes in proximity
space
d467c4
d471f1
d467c4
d462ba
d46a1c
Proximity space
d4213f
Route(d46a1c)
d13da3
d4213f
65a1fc
NodeId space
d462ba
65a1fc
d13da3
Pastry: Distance traveled
1.4
Relative Distance
1.3
1.2
1.1
1
Pastry
0.9
Complete routing table
0.8
1000
10000
Number of nodes
L=16, 100k random queries, Euclidean proximity space
100000
Pastry: Locality properties
Expected distance traveled by a message in
the proximity space is within a small constant
of the minimum
Among k nodes with node Ids closest to the
key, message likely to reach the node closest
to the source node first
Pastry: Node addition
d471f1
d467c4
d462ba
d46a1c
d4213f
Route(d46a1c)
d467c4
Proximity space
d13da3
65a1fc
d4213f
New node: d46a1c
NodeId space
d462ba
65a1fc
d13da3
Distance traveled by Pastry message
Pastry delay vs IP delay
2500
Mean = 1.59
2000
1500
1000
500
0
0
200
400
600
800
1000
1200
1400
Distance between source and destination
GATech top., .5M hosts, 60K nodes, 20K random messages
Pastry: API
• route(M, X): route message M to node with
nodeId numerically closest to X
• deliver(M): deliver message M to application
• forwarding(M, X): message M is being
forwarded towards key X
• newLeaf(L): report change in leaf set L to
application
Pastry: Security
•
•
•
•
Secure nodeId assignment
Secure node join protocols
Randomized routing
Byzantine fault-tolerant leaf set
membership protocol
Pastry: Summary
• Generic p2p overlay network
• Scalable, fault resilient, self-organizing,
secure
• O(log N) routing steps (expected)
• O(log N) routing table size
• Network proximity routing
PAST: File Retrieval
C
k replicas
Lookup
fileId
file located in log16 N
steps (expected)
usually locates replica
nearest client C