Transcript Pastry_ICQ

Pastry: Scalable, decentralized
object location and routing for
large-scale peer-to-peer systems
Antony Rowstron and Peter Druschel
Proc. of the 18th IFIP/ACM International
Conference on Distributed Systems
Platforms, November 2001.
Outline






Pastry Goals
Routing
Adding/Deleting Nodes
Use of Locality
Results
Conclusions
Goals of Pastry







Create robust P2P overlay substrate
Provide efficient location and routing services
Make it highly scalable (100,000+ nodes)
Use locality to speed underlying network routing
Fault tolerance: repair, good replication support
No authentication or encryption
No naming or attribute based queries
Related Work

Chord – another overlay substrate

Build on top of Pastry:
• Overlay routing performance also O(log N)
• No notion of locality
• PAST - persistent file storage
• SCRIBE - publish/subscribe application
Overlay Substrate



Pastry overlay network consists of nodes.
Node ID is random 128 bit number
• Id’s uniformly distributed in circular 128 bit space.
• So numerically adjacent id’s are likely to be widely
dispersed in underlying physical network.
• Provides fault tolerance if you replicate data on
nodes adjacent in id space.
Pastry basically provides message delivery service
• Give a 128 bit key and message to Pastry.
• Pastry routes message to node with node id
numerically closest to key value.
• Like a distributed hash table.
Pastry API



Pastry’s API only supports message delivery
Pastry offers a very simple API!
•
nodeId = pastryInit(credentials, appHandle)
•
Route(msg, key)
• Joins to network and returns this nodes id
• Route msg to node with id closest to key
Pastry requires applications to export:
•
•
•
Deliver(msg,key) // msg arrived for this node
Forward(msg,key,nextID) // about to forward msg
newLeafs(leafSet) // notification leaf set changed
Basic Routing


ID’s treated as sequence of digits in base 2^b
•
A good number for b is 4, so a node or key ID is a
sequence of 32 hex digits.
Routing is based on address prefixes
•
•
•
•
Nodes forward messages to a node “closer” to keyID
Closer in terms of length of common prefix
• If current nodeID shares first n digits with keyID
• Forward to a node that shares > n prefix digits with
keyID
If no longer common prefix nodeID is in routing table
• Then forward to a node that is numerically closer to
keyID.
Requires less than ceil( log 2 N ) steps
b
Basic Routing
Routing from 65a1fc to
d46a1c

Each hop, shared
prefix gets longer, (or
at least numerically
closer).
Node State for Routing



To support routing each node maintains three pieces of
state
• Leaf Set, “Neighborhood” Set and a Routing Table
Leaf Set has id/ip’s of nearest nodes in id space
• L/2 nearest nodes with greater ids, L/2 with lower ids.
• Typical L is 16 or 32
• Leaf set is 8 or 16 adjacent nodes on each side of
current node
Neighborhood Set, nodes close by locality metric
• Not used in routing, but when populating routing tables.
• Typical set size is 32 nodes.
Node State for Routing

Routing table has forwarding info
• Each entry has an appropriate nodeID and its ip
address.
• Table has approx. (log N) rows and (2^b) columns
• For 100K nodes with b=4, 5 rows and 16 columns
• Nth row has entries with same n digit prefix as current
• E.g. every entry in row 2 has the same 2 digit prefix as
•
node.
N+1st digit of each entry has value of its column #
• E.g. row 5, column 2 will have same first 5 digits as the
current node’s id and the 6th digit will be “2”.
Example Routing Table

Example in base 4.
L = 8 leafs.
M = 8 neighbors
Each row down the
shared prefix is one
digit longer. The next
digit after the shared
prefix is the column
number.
Routing Algorithm

Given a key to forward, a node does the
following:
•
•
•
If key within range of leaf nodes, forward to leaf.
• done
Else if routing table has node with longer common
prefix with key than current node, forward to it.
Else forward to a node with same length prefix, but
closer numerically.
• Only fails if the L/2 leafs closer to the key fail
simultaneously.
•
Unlikely, since leaf nodes widely distributed in underlying
net.
Adding a Node


Assumption:
• Join at a node that is close in terms of locality metric.
• Valid if ip-multicast enabled, otherwise is a problem.
Send a join msg to your id (no one else knows it yet)
• It gets routed to node with id closest to yours.
• Use that nodes leaf set as basis for yours.
• Get neighbors from the joined node, since it is assumed
close.
• Get all state info from nodes in route and neighbors.
• Build routing table from aggregate state data using best
nodes in terms of locality.
• Cost: approx. (3 x 2^b) x log N = 200+ RPCs for 100k
nodes and b=4
Adding a Node
Leaf Set, Neighborhood Set and
Routing Table
Neighborhood Set
Leaf Set
Losing a Node



Leaf node fails
•
Get an adjacent leaf to send you its leaf set.
Routing entry fails
•
Contact nodes in that row for a replacement.
• If they don’t have one, ask nodes in next row.
• Experiments show it takes about 57 RPC calls to fix.
Ping neighborhood set periodically
•
If lose one, ask another neighbor for their set
Locality


Assumption:
•
The locality space obeys triangulation inequality
• Not true for ip hops. Authors looking into it.
Assumption not necessarily true
•
but authors claim experimental results show the
algorithm works.
Locating Nearest of k Nodes

If data replicated at k nodes (e.g. a leaf set)
•
•
Want to find nearest of k, in terms of locality.
Algorithm works pretty well as described, but
•
Heuristic
• Adding a heuristic can help
• Estimate node densities by looking at current’s leafs
• Based on this density estimate, switch to numerically
nearest address routing to find nearest replica, instead
of using routing table to get to node with id closest to
keyID.
Heuristic Results

Reaching nearest of k nodes
• 10,000 node network
• No heuristic: closest 68%, or second best
•
87%
Heuristic:
92%
closest 76%, or second best
Results: Overlay Routing


Simulation with 100K nodes, run in 1 JVM
Overlay Routing Performance
• With b=4, L=16 and Neighbors=32
• Average # hops =4, max = 5
• Not surprising since “uniform
distribution” of id’s.
• 16^5 is > 100K, so 5 digits makes an id
unique.
Results: Overlay Routing
Results: Locality


Using distance in a plane as locality metric
Node positions picked randomly
• Compared sum of distances for all hops, against
a straight line from source to destination.
• Routed distance was 30-40% longer than line.
• Is impressive given how small the routing
tables are.
• For 100K nodes, b = 4, routing table is
about 75 entries.
• Interesting but doesn’t necessarily say much
about internet.
Results: Locality
Optional: Some Vulnerabilities

A malicious node can silently consume
messages
•
Currently no way to detect, but can defend against it.
• Randomize routing, usually take best hop, but not
always.

• After multiple queries one should avoid the bad node.
Partitioning can occur
•
If connectivity goes away for a while, node groups
may “repair” all their connections to rest of overlay
• When connectivity comes back up, still partitioned.
• No good solution. Authors suggest ip multicast
periodically.
Conclusion

Issues to solve:
•
•
•




Joining is a problem, especially since need to join a
node close in locality.
No security built in.
Not sure how well locality metric works in real world.
Overlay routing very efficient, few node hops
Very small routing tables
Locality works well in simulation environment
Overall, appears to be a very scalable solution