Overlays and DHTs
Presented by
Dong Wang and Farhana Ashraf
Schedule
Review of Overlay and DHT
RON
Pastry
Kelips
Review of Overlay and DHT
Overlay
Network built on top of another network
Nodes connected to each other by logical/virtual links
Improves Internet routing and is easy to deploy
Examples: P2P (Gnutella, Chord, …), RON
DHT
Allows lookup, insert, and delete of objects by key in a distributed setting
Performance Concerns
Load Balancing
Fault Tolerance
Efficiency of lookups and inserts
Locality
CAN, Chord, Pastry, and Tapestry are all DHTs
Resilient Overlay Network
Acknowledged:
http://nms.csail.mit.edu/ron/
Previous CS525 Courses
David G. Andersen et al.
MIT
SOSP 2001
RON-Resilient Overlay Network
Motivation
Goal
Design
Evaluation
Discussion
RON-Motivation
The current Internet backbone is NOT able to:
• Detect failed paths and recover quickly
• BGP takes several minutes to recover from faults
• Detect flooding and congestion effectively
• Leverage redundant paths efficiently
• Express fine-grained policies/metrics
RON-Basic Idea
(Figure: four RON nodes A, B, C, D connected by overlay links over the Internet)
The triangle inequality does not always hold on the Internet!
E.g., for latency it is possible that AB + BC < AC
RON exploits the underlying path redundancy of the Internet to provide better paths and route around failures
RON is an end-to-end solution: packets are simply encapsulated and sent over the normal Internet
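A minimal sketch of the one-hop indirection idea (node names and latency values are made up for illustration): if the measured latency through some intermediate RON node beats the direct path, forward through that node.

# Sketch of RON-style one-hop indirection; latencies (ms) are hypothetical
# values a RON node might have learned from probing, not real measurements.
latency = {
    ("A", "C"): 120,                      # direct Internet path A -> C
    ("A", "B"): 30, ("B", "C"): 40,       # detour through B
    ("A", "D"): 50, ("D", "C"): 90,       # detour through D
}

def best_path(src, dst, intermediates=("B", "D")):
    # Start with the direct path, then try each single-hop detour.
    best_cost, best_route = latency[(src, dst)], [src, dst]
    for mid in intermediates:
        cost = latency[(src, mid)] + latency[(mid, dst)]
        if cost < best_cost:
            best_cost, best_route = cost, [src, mid, dst]
    return best_cost, best_route

print(best_path("A", "C"))   # (70, ['A', 'B', 'C']): the detour via B beats the direct path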
RON-Main Goals
Fast failure detection and recovery
• Average detection and recovery delay < 20 s
Tighter integration of routing/path selection with the application
• Applications can specify metrics that affect routing
Expressive policy routing
• Fine-grained, aimed at users and hosts
RON-Design
Overlay
Old idea in networks
Easily deployed; lets the Internet focus on scalability
Only keep functionality between active peers
Approach
Aggressively probe all inter-RON node paths
Exchange routing information
Route along the best path (from an end-to-end view) consistent with the routing policy
RON-Design: Architecture
Probe between nodes, detect path quality
Store path qualities at Performance Database
Link-state routing protocol among nodes
Data are handled by an application-specific conduit and forwarded over UDP
RON-Design: Routing and Lookup
Policy routing
• Classify traffic by policy
• Generate a table per policy
Metric optimization
• The application tags each packet with its specific metric
• Generate a table per metric
Multi-level routing table and 3-stage lookup
• Policy -> Metric -> Next hop
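A hedged sketch of the 3-stage lookup described above; the table layout and names are illustrative assumptions, not RON's actual data structures.

# Illustrative 3-stage lookup: policy -> metric -> destination -> next hop.
routing_tables = {
    "no-internet2-commercial": {            # one table per policy
        "latency":    {"nodeC": "nodeB"},   # one sub-table per metric
        "loss":       {"nodeC": "nodeD"},
        "throughput": {"nodeC": "nodeC"},   # direct path wins for this metric
    },
}

def next_hop(packet):
    # Stage 1: select the table for the packet's policy tag.
    per_policy = routing_tables[packet["policy"]]
    # Stage 2: select the sub-table for the metric the application tagged.
    per_metric = per_policy[packet["metric"]]
    # Stage 3: look up the next hop for the destination.
    return per_metric[packet["dst"]]

pkt = {"policy": "no-internet2-commercial", "metric": "latency", "dst": "nodeC"}
print(next_hop(pkt))   # -> nodeB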
RON Design-Probing and Outage Detection
Probe every PROBE_INTERVAL (12 s)
With 3 packets, both participants get RTT and reachability estimates without synchronized clocks
If a probe is lost, send the next one immediately, up to 3 more probes (PROBE_TIMEOUT = 3 s)
Declare an outage after 4 consecutive probe losses
Average outage detection time = 19 s
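A simplified sketch of this probing schedule; the probe itself is stubbed out with a random stand-in, and only the timing constants come from the slide.

import random, time

PROBE_INTERVAL = 12          # seconds between routine probes
PROBE_TIMEOUT = 3            # seconds before trying again after a loss
MAX_CONSECUTIVE_LOSSES = 4   # losses needed before declaring an outage

def send_probe(peer):
    # Stand-in for the real 3-packet probe exchange; True means the peer answered.
    return random.random() > 0.2

def monitor(peer):
    losses = 0
    while True:
        if send_probe(peer):
            losses = 0
            time.sleep(PROBE_INTERVAL)    # path looks healthy, wait for the next round
        else:
            losses += 1
            if losses >= MAX_CONSECUTIVE_LOSSES:
                print("outage detected to", peer)
                losses = 0
            time.sleep(PROBE_TIMEOUT)     # lost probe: retry sooner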
RON Design-Policy Routing
Allows users to define the types of traffic allowed on particular network links
Place policy tags on packets and build up policy
based routing table
Two policy mechanisms
exclusive cliques
general policies
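A small sketch of the exclusive-clique style of policy (link attributes and tag names are hypothetical): links carry attributes, and a packet's policy tag decides which links it may use.

LINKS = {
    ("A", "B"): {"internet2": True},
    ("B", "C"): {"internet2": True},
    ("A", "D"): {"internet2": False},
    ("D", "C"): {"internet2": False},
}

POLICIES = {
    # e.g. commercial traffic must stay off Internet2 links
    "commercial": lambda attrs: not attrs["internet2"],
    "default":    lambda attrs: True,
}

def allowed_links(policy_tag):
    # Filter the overlay links down to those the policy permits;
    # metric optimization then runs only over this subset.
    rule = POLICIES[policy_tag]
    return [link for link, attrs in LINKS.items() if rule(attrs)]

print(allowed_links("commercial"))   # -> [('A', 'D'), ('D', 'C')]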
RON-Evaluation
Two main datasets from Internet deployments
RON1: N = 12 nodes, 132 distinct paths, traversing 36 ASes and 74 inter-AS paths
RON2: N = 16 nodes, 240 distinct paths, traversing 50 ASes and 118 inter-AS paths
Policy: prohibit sending traffic to or from commercial sites over Internet2
RON Evaluation-Major Results
Increased resilience
• RON takes ~10 s to route around a failure, compared to BGP's several minutes
• Many Internet outages are avoidable
Improved performance: loss rate, latency, TCP throughput
Single-hop indirect routing works well
Overhead is reasonable and acceptable
RON vs. Internet: 30-minute loss rates
• RON1: able to route around all outages
• RON2: about 60% of outages are overcome
Performance-Loss rate
Performance-Latency
Performance-TCP Throughput
Performance improvement: RON employs app-specific metric optimization to select paths
Scalability
Conclusions
RON improves network reliability and
performance
The overlay approach is attractive for resiliency: ease of development, fewer nodes, simple substrate
Single-hop indirection in RON works well
RON also introduces more probing and update traffic into the network
RON-Discussion
Aggressiveness
RON never backs off as TCP does
Can it coexist with current traffic on Internet?
What happens if everyone starts to use RON?
Is it possible to modify RON to achieve good behavior in a global sense?
Scalability
Trades scalability for improved reliability
Many RONs coexisting in the Internet
Hierarchical structure of RON network
Distributed Hash Table
Problem
Route a message with key K to the node Z whose ID is closest to K
Not scalable if the routing table contains all the nodes
(Figure: example ID space with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; node A = 65a1fc routes Route(d46a1c) toward the node with ID closest to key X = d46a1c)
Tradeoffs
Memory per node
Lookup latency
Number of messages
Pastry: Scalable, decentralized
object location and routing for
large-scale peer-to-peer systems
Antony Rowstron
Peter Druschel
Middleware 2001
Acknowledged:
Previous CS525 Courses
Motivation
Node IDs are assigned randomly
With high probability nodes with adjacent IDs are
diverse
Considers network locality
Seeks to minimize distance messages travel
Scalar proximity metric
#IP routing hops
RTT
Pastry: Node Soft State
Storage requirement at each node = O(log N)
• Leaf set: immediate neighbors in the ID space, used for routing (similar to Chord's successor and predecessor lists)
• Routing table: used for routing (similar to Chord's finger table entries)
• Neighborhood set: nodes closest according to the locality metric, used to update the routing table
Pastry: Node Soft State
Leaf Set
• Contains L nodes, closest in the ID space
Neighborhood Set
• Contains M nodes, closest according to the proximity metric
Routing Table
• Entries in row n share exactly the first n digits with the local node's ID
• Entries are chosen according to the proximity metric
Pastry: Routing
Case I
• Key within the leaf set
• Route to the node in the leaf set with ID closest to the key
Case II (Prefix Routing)
• Key not within the leaf set
• Route to a node in the routing table that shares one more digit with the key than the local node does
Case III
• Key not within the leaf set and Case II not possible
• Route to a node that shares at least as many digits with the key as the local node, but is numerically closer to the key
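A simplified sketch of the three cases (IDs are hex strings; the leaf set and routing table are plain Python structures, not Pastry's actual representation).

def shared_prefix_len(a, b):
    # Number of leading hex digits two IDs have in common.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(key, local_id, leaf_set, routing_table):
    # leaf_set: IDs numerically closest to local_id.
    # routing_table: {row: {next_digit: node_id}} where every node in row r
    # shares exactly r digits with local_id.
    dist = lambda a, b: abs(int(a, 16) - int(b, 16))
    # Case I: key falls inside the leaf set range -> closest known node wins.
    lo = min(leaf_set, key=lambda x: int(x, 16))
    hi = max(leaf_set, key=lambda x: int(x, 16))
    if int(lo, 16) <= int(key, 16) <= int(hi, 16):
        return min(leaf_set + [local_id], key=lambda n: dist(n, key))
    # Case II: prefix routing -> a node sharing one more digit with the key.
    r = shared_prefix_len(key, local_id)
    entry = routing_table.get(r, {}).get(key[r])
    if entry is not None:
        return entry
    # Case III: any known node with at least as long a shared prefix, numerically closer.
    known = leaf_set + [n for row in routing_table.values() for n in row.values()]
    candidates = [n for n in known
                  if shared_prefix_len(key, n) >= r and dist(n, key) < dist(local_id, key)]
    return min(candidates, key=lambda n: dist(n, key)) if candidates else local_id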
Routing Example
(Figure: example lookup(d46a1c) starting at node 65a1fc and routed through d13da3, d4213f, d462ba toward d467c4 and d471f1)
Each hop cuts the candidate ID space down to 1/(2^b) of its size
Number of hops needed is log_{2^b}(N)
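For example, with b = 4 (hexadecimal digits, as in the IDs above) and N = 10^6 nodes, roughly log_16(10^6) ≈ 5 hops suffice.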
Self Organization: Node Join
New node: X = d46a1c
A = 65a1fc is X's neighbor (a nearby node)
X asks A to Route(d46a1c); the join message is forwarded toward Z = d467c4, the existing node with ID closest to X
(Figure: join message routed from A through d13da3, d4213f, d462ba to Z, with d471f1 nearby in the ID space)
Pastry: Node State Initialization
New node: X = d46a1c
• Leaf set of X = leaf set of Z (Z = d467c4, the node with ID closest to X)
• Neighborhood set of X = neighborhood set of A (A = 65a1fc, the nearby node X first contacted)
• Routing table: row zero of X = row zero of A; row one of X = row one of B (B = d13da3, the next node on the join route)
(Figure: the join route from A = 65a1fc through B = d13da3, d4213f, d462ba to Z = d467c4)
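A hedged sketch of how the joining node might assemble this state from the nodes visited by its join message (function and field names are illustrative, not from the paper's pseudocode).

def initialize_state(join_route, leaf_set_of_Z, neighborhood_set_of_A):
    # join_route: list of (node_id, routing_table) pairs for A, B, ..., Z in the
    # order the join message visited them; the i-th node shares i digits with X,
    # so its row i is a usable row i for X.
    state = {
        "leaf_set": list(leaf_set_of_Z),                  # Z is closest in the ID space
        "neighborhood_set": list(neighborhood_set_of_A),  # A is closest in the network
        "routing_table": {},
    }
    for i, (_, table) in enumerate(join_route):
        if i in table:
            state["routing_table"][i] = dict(table[i])    # copy row i from the i-th node
    return state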
Pastry: Node State Update
X informs any nodes that need to be aware of
its arrival
X also improves its table locality by requesting
neighborhood sets from all nodes X knows
In practice: optimistic approach
Pastry: Node departure (failure)
Leaf set repair (eager, all the time):
• Leaf set members exchange keep-alive messages
• Request the set from the farthest live node in the set
Routing table repair (lazy, upon failure):
• Get a replacement entry from peers in the same row; if not found, from higher rows
Neighborhood set repair (eager)
Routing Performance
Average number of hops = O(log N)
Pastry uses locality information
Kelips: Building an Efficient and
Stable P2P DHT through Increased
Memory and Background Overhead
Indranil Gupta, Ken Birman, Prakash Linga, Al Demers,
and Robert van Renesse
Acknowledged:
Previous CS525 Courses
IPTPS 2003
Motivation
For n = 1,000,000 and 20 bytes per entry
Storage requirement at a Pastry node ≈ 120 bytes
Not using memory efficiently
How can we achieve O(1) lookup latency?
Increase memory usage per node
Design
Consists of k virtual affinity groups
Each node is a member of exactly one affinity group
Soft state
• Affinity group view: (partial) set of other nodes lying in the same affinity group
• Contacts: (constant-sized) set of nodes lying in each of the foreign affinity groups
• Filetuples: (partial) set of (filename, host IP address) pairs, where the host (homenode) lies in the same affinity group
• Each entry carries an RTT estimate and a heartbeat count
Kelips: Node Soft State
(Figure: example node soft state — the affinity group view lists node ids with heartbeat counts and RTTs, e.g. node 18: hbeat 1890, 23 ms and node 167: hbeat 2067, 67 ms; the contacts table maps each foreign affinity group to a small set of contact nodes, e.g. group 0 -> [129, 30, …] and group 1 -> [15, 160, …]; the filetuple table maps filenames such as p2p.txt to their homenodes within the affinity group)
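A minimal sketch of this per-node soft state (class and field names are assumptions for exposition, not Kelips code).

import time

class KelipsNode:
    def __init__(self, node_id, k):
        self.k = k
        self.group = hash(node_id) % k   # this node's own affinity group
        self.view = {}        # affinity group view: node_id -> {"rtt": ..., "heartbeat": ...}
        self.contacts = {}    # foreign group number -> small, constant-sized set of node_ids
        self.filetuples = {}  # filename -> homenode_id (homenode is in this node's group)

    def record_heartbeat(self, node_id, rtt):
        # Refresh (or create) the view entry for a node in the same affinity group.
        self.view[node_id] = {"rtt": rtt, "heartbeat": time.time()}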
Storage Requirement @ Kelips Node
S(k, n) = n/k (affinity group view) + c·(k − 1) (contacts) + F/k (filetuples) entries
Minimized at k = √((n + F) / c)
Assuming F is proportional to n and c is fixed: optimal k = O(√n)
For n = 1,000,000 and F = 10 million: total memory requirement < 20 MB
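A worked check of the minimization, assuming c = 2 contacts per foreign group and 20 bytes per entry (both values are assumptions for illustration):

import math

n, F, c = 1_000_000, 10_000_000, 2           # c = 2 is an assumed value
k = math.sqrt((n + F) / c)                   # minimizer of S(k) = n/k + c*(k-1) + F/k
entries = n / k + c * (k - 1) + F / k
print(round(k), round(entries))              # ~2345 groups, ~9400 entries per node
# At 20 bytes per entry this is a few hundred KB, comfortably within the <20 MB bound above.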
Algorithm: Lookup
Lookup(key D) at node A
• Affinity group of D's homenode: G = hash(D)
• A sends a message to the closest node X in its contact set for affinity group G
• X finds the homenode of D in its filetuple set
• O(1) lookup
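A sketch of this lookup path, reusing the soft-state fields from the earlier sketch; send() is an assumed messaging primitive.

def kelips_lookup(node, filename, send):
    # Hash the filename to find the affinity group of its homenode.
    g = hash(filename) % node.k
    if g == node.group:
        return node.filetuples.get(filename)   # resolve locally from our own filetuples
    # Otherwise ask one contact in group g (the closest by RTT in practice);
    # that contact resolves the homenode from its filetuple set: one hop, O(1).
    contact = next(iter(node.contacts[g]))
    return send(contact, ("resolve", filename))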
Maintaining Soft State
Heartbeat mechanism
Soft state entries refreshed periodically within and across
groups
Each node periodically selects a few nodes as
gossip targets and sends them partial soft state
information
Uses constant gossip message size
O(log n) complexity for gossiping
Load Balance
(Figures: load balance with N = 1500 nodes and 38 affinity groups, and with 1000 nodes and 30 affinity groups)
Discussion Points
What happens when triangular inequality does not hold
for a proximity metric?
What happens for high churn rate in Pastry and Kelips?
What was the intuition behind affinity groups?
Can we use Kelips in an Internet-scale network?
Conclusion
A DHT tradeoff: storage requirement vs. lookup latency

                      Chord       Pastry      Kelips
Storage requirement   O(log N)    O(log N)    O(√N)
Lookup latency        O(log N)    O(log N)    O(1)
Going one step further
One hop lookups for p2p overlays
http://www.usenix.org/events/hotos03/tech/talks/gupta_talk.pdf