Overlays and DHTs
Presented by
Dong Wang and Farhana Ashraf
Schedule
• Review of Overlay and DHT
• RON
• Pastry
• Kelips
Review of Overlay and DHT

Overlay
• Network built on top of another network
• Nodes connected to each other by logical/virtual links
• Improves Internet routing and is easy to deploy
• Examples: P2P (Gnutella, Chord, …), RON

DHT
• Lets you look up, insert, and delete objects with keys in a distributed setting
• Performance concerns:
  • Load balancing
  • Fault tolerance
  • Efficiency of lookups and inserts
  • Locality
• CAN, Chord, Pastry, and Tapestry are all DHTs (a toy interface sketch follows)
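To make the lookup/insert/delete abstraction concrete, here is a minimal, hedged sketch (not taken from any of the papers) that maps each key to the node whose ID is numerically closest to the key's hash; the class name `ClosestNodeDHT` and the centralized node list are illustrative simplifications.

```python
import hashlib

class ClosestNodeDHT:
    """Toy sketch: store each key at the node whose ID is closest to hash(key)."""

    def __init__(self, node_ids, id_bits=32):
        self.node_ids = sorted(node_ids)          # IDs of participating nodes
        self.id_space = 2 ** id_bits              # size of the circular ID space
        self.store = {nid: {} for nid in node_ids}

    def _key_to_id(self, key):
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest, "big") % self.id_space

    def _responsible_node(self, key):
        kid = self._key_to_id(key)
        # Closest ID on the ring; distance wraps around the ID space.
        return min(self.node_ids,
                   key=lambda nid: min((nid - kid) % self.id_space,
                                       (kid - nid) % self.id_space))

    def insert(self, key, value):
        self.store[self._responsible_node(key)][key] = value

    def lookup(self, key):
        return self.store[self._responsible_node(key)].get(key)

    def delete(self, key):
        self.store[self._responsible_node(key)].pop(key, None)
```

In a real DHT no node knows the full membership list; each node keeps only partial state (O(log N) in Pastry, O(√n) in Kelips), and routing a request to the responsible node is what the rest of these slides are about.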
Resilient Overlay Network
David G. Andersen et al., MIT
SOSP 2001
Acknowledgments: http://nms.csail.mit.edu/ron/ and previous CS525 courses
RON-Resilient Overlay Network
• Motivation
• Goal
• Design
• Evaluation
• Discussion
RON-Motivation

The current Internet backbone is NOT able to:
• Detect failed paths and recover quickly
  • BGP takes several minutes to recover from faults
• Detect flooding and congestion effectively
• Leverage redundant paths efficiently
• Express fine-grained policies/metrics
RON-Basic Idea

[Figure: four nodes A, B, C, D connected by direct and indirect Internet paths]

• The triangle inequality does not usually hold for the Internet!
  • E.g., for latency it is possible that AB + BC < AC
• RON exploits the underlying path redundancy of the Internet to provide better paths and to route around failures
• RON is an end-to-end solution: packets are simply encapsulated and sent over the normal Internet (a path-selection sketch follows)
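A minimal sketch (illustrative only, not the paper's code) of the single-hop indirection idea: given measured pairwise latencies, choose the better of the direct path A→C and the best indirect path A→B→C. The latency values and node names below are made up.

```python
# Hedged sketch of RON-style one-hop path selection based on measured latency.
# The latency matrix is invented for illustration.
latency = {
    ("A", "C"): 120.0,   # direct path is slow or congested
    ("A", "B"): 30.0,
    ("B", "C"): 40.0,
    ("A", "D"): 25.0,
    ("D", "C"): 110.0,
}

def best_path(src, dst, peers, latency):
    """Return (path, latency), choosing direct vs. single-hop indirect routing."""
    best = ([src, dst], latency.get((src, dst), float("inf")))
    for hop in peers:
        if hop in (src, dst):
            continue
        via = (latency.get((src, hop), float("inf"))
               + latency.get((hop, dst), float("inf")))
        if via < best[1]:
            best = ([src, hop, dst], via)
    return best

print(best_path("A", "C", ["B", "D"], latency))   # -> (['A', 'B', 'C'], 70.0)
```

The same comparison could be made with loss rate or throughput instead of latency, which is the application-specified metric selection described in the goals that follow.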
RON-Main Goals

• Fast failure detection and recovery
  • Average detection-and-recovery delay < 20 s
• Tighter integration of routing/path selection with the application
  • Applications can specify metrics to affect routing
• Expressive policy routing
  • Fine-grained, aimed at users and hosts
RON-Design

• Overlay
  • An old idea in networks
  • Easily deployed; lets the Internet focus on scalability
  • Keeps functionality only between active peers
• Approach
  • Aggressively probe all inter-RON-node paths
  • Exchange routing information
  • Route along the best path (from the end-to-end view) consistent with routing policy
RON-Design: Architecture

• Probe between nodes to detect path quality
• Store path qualities in a performance database
• Run a link-state routing protocol among the nodes
• Data are handled by an application-specific conduit and forwarded over UDP
RON-Design: Routing and Lookup

• Policy routing
  • Classify traffic by policy
  • Generate a table per policy
• Metric optimization
  • The application tags the packet with its specific metric
  • Generate a table per metric
• Multi-level routing table and 3-stage lookup: Policy -> Metric -> Next hop (see the sketch below)
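A minimal sketch of what a three-stage policy → metric → next-hop lookup might look like; the table contents, policy names, and node names are invented assumptions, not RON's actual data structures.

```python
# Illustrative 3-stage lookup: policy -> metric -> destination -> next hop.
# All entries are made up for the example.
routing_table = {
    "no-commercial-over-I2": {
        "latency":    {"nodeC": "nodeB"},
        "throughput": {"nodeC": "nodeD"},
    },
    "default": {
        "latency":    {"nodeC": "nodeC"},   # direct path is currently best
    },
}

def next_hop(packet):
    """Resolve the next hop for a packet tagged with a policy and a metric."""
    per_policy = routing_table.get(packet["policy"], routing_table["default"])
    per_metric = per_policy.get(packet["metric"], next(iter(per_policy.values())))
    return per_metric.get(packet["dst"])

pkt = {"policy": "default", "metric": "latency", "dst": "nodeC"}
print(next_hop(pkt))   # -> 'nodeC'
```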
RON Design-Probing and Outage Detection

• Probe every PROBE_INTERVAL (12 s)
• With 3 packets, both participants get an RTT and reachability estimate without synchronized clocks
• If a probe is lost, send the next one immediately, up to 3 more probes (PROBE_TIMEOUT = 3 s)
• Declare an outage after 4 consecutive probe losses (sketched below)
• Outage detection time is 19 s on average
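A hedged sketch of the outage-detection logic these parameters imply (routine probes, faster back-to-back probes after a loss, outage after 4 consecutive losses); the class and method names are invented for illustration.

```python
PROBE_INTERVAL = 12.0    # seconds between routine probes
PROBE_TIMEOUT = 3.0      # wait before treating a probe as lost / sending the next
OUTAGE_THRESHOLD = 4     # consecutive losses before declaring an outage

class PathProber:
    """Toy model of per-path probing state (not RON's actual implementation)."""

    def __init__(self):
        self.consecutive_losses = 0
        self.outage = False

    def on_probe_reply(self, rtt_seconds):
        # Any successful probe resets the loss counter and clears the outage.
        self.consecutive_losses = 0
        self.outage = False

    def on_probe_loss(self):
        # Each lost probe counts toward the outage threshold.
        self.consecutive_losses += 1
        if self.consecutive_losses >= OUTAGE_THRESHOLD:
            self.outage = True

    def next_probe_delay(self):
        # After a loss, probe again quickly; otherwise wait the normal interval.
        return PROBE_TIMEOUT if self.consecutive_losses else PROBE_INTERVAL

prober = PathProber()
for _ in range(4):
    prober.on_probe_loss()
print(prober.outage)   # -> True
```

With these constants, a failure that occurs midway through a probe interval is detected after roughly 12/2 + 4 × 3 ≈ 18 s, consistent with the ~19 s average above.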
RON Design-Policy Routing

• Allows users to define the types of traffic allowed on particular network links
• Places policy tags on packets and builds a policy-based routing table
• Two policy mechanisms (an example check follows):
  • Exclusive cliques
  • General policies
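As an illustration of the exclusive-clique mechanism (traffic is permitted only between members of a defined group of sites), here is a hedged sketch; the clique membership and function name are made up.

```python
# Illustrative policy check: an "exclusive clique" permits traffic on a link
# only if both endpoints belong to the clique (e.g., an Internet2-only group).
INTERNET2_CLIQUE = {"mit", "cmu", "utah", "cornell"}   # hypothetical members

def clique_permits(src_site, dst_site, clique=INTERNET2_CLIQUE):
    return src_site in clique and dst_site in clique

print(clique_permits("mit", "cmu"))         # -> True
print(clique_permits("mit", "some-isp"))    # -> False (non-member excluded)
```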
RON-Evaluation

• Two main datasets from an Internet deployment:
  • RON1: N = 12 nodes, 132 distinct paths, traversing 36 ASes and 74 inter-AS paths
  • RON2: N = 16 nodes, 240 distinct paths, traversing 50 ASes and 118 inter-AS paths
• Policy: prohibit sending traffic to or from commercial sites over Internet2
RON Evaluation-Major Results

• Increases the resilience of the overlay network
• RON takes ~10 s to route around a failure
  • Compared to BGP's several minutes
  • Many Internet outages are avoidable
• Improves performance: loss rate, latency, TCP throughput
• Single-hop indirect routing works well
• Overhead is reasonable and acceptable

RON vs. Internet: 30-minute loss rates
• For RON1, RON was able to route around all outages
• For RON2, about 60% of outages were overcome
Performance-Loss rate
Performance-Latency
Performance-TCP Throughput
Performance Improvement: RON employs app-specific metric optimization to select paths
Scalability
Conclusions

• RON improves network reliability and performance
• The overlay approach is attractive for resiliency: easier development, fewer nodes, a simple substrate
• Single-hop indirection in RON works well
• RON also introduces more probing and update traffic into the network
RON-Discussion

• Aggressiveness
  • RON never backs off the way TCP does
  • Can it coexist with current traffic on the Internet?
  • What happens if everyone starts to use RON?
  • Is it possible to modify RON to achieve good behavior in a global sense?
• Scalability
  • Trades scalability for improved reliability
  • Many RONs coexisting in the Internet
  • A hierarchical structure of RON networks
Distributed Hash Table

• Problem
  • Route a message with key K to the node Z whose ID is closest to K
  • Not scalable if the routing table contains all the nodes
• Tradeoffs
  • Memory per node
  • Lookup latency
  • Number of messages

[Figure: ID ring with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; node A = 65a1fc issues Route(d46a1c) toward X = d46a1c]
Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems
Antony Rowstron and Peter Druschel
Middleware 2001
Acknowledgments: previous CS525 courses
Motivation

• Node IDs are assigned randomly
  • With high probability, nodes with adjacent IDs are diverse
• Considers network locality
  • Seeks to minimize the distance messages travel
  • Scalar proximity metric: # IP routing hops, or RTT
Pastry: Node Soft State

Storage requirement at each node = O(log N)

[Figure: per-node state]
• Leaf set: immediate neighbors in the ID space, used for routing (similar to successor and predecessor lists)
• Routing table: used for routing (similar to finger table entries)
• Neighborhood set: nodes closest according to the locality metric, used to update the routing table
Pastry: Node Soft State

• Leaf set
  • Contains the L nodes closest in the ID space
• Neighborhood set
  • Contains the M nodes closest according to the proximity metric
• Routing table
  • Entries in row n share exactly the first n digits with the local node's ID
  • Nodes are chosen according to the proximity metric
(A sketch of this per-node state follows.)
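A hedged sketch of how this per-node state might be represented, assuming node IDs are strings of base-2^b digits (hex for b = 4); the class and field names are illustrative, not Pastry's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class PastryNodeState:
    """Illustrative Pastry soft state for one node (IDs as hex strings, b = 4)."""
    node_id: str
    b: int = 4
    leaf_set: list = field(default_factory=list)           # L nodes closest in ID space
    neighborhood_set: list = field(default_factory=list)   # M nodes closest by proximity
    # routing_table[(row, digit)]: a node whose ID shares the first `row` digits
    # with node_id and whose next digit equals `digit`; roughly log_{2^b}(N) rows
    # with up to 2^b - 1 useful entries each.
    routing_table: dict = field(default_factory=dict)

    def shared_prefix_len(self, other_id: str) -> int:
        """Number of leading digits other_id shares with this node's ID."""
        n = 0
        for a, c in zip(self.node_id, other_id):
            if a != c:
                break
            n += 1
        return n
```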
Pastry: Routing

• Case I: key within the leaf set
  • Route to the node in the leaf set whose ID is closest to the key
• Case II (prefix routing): key not within the leaf set
  • Route to a node in the routing table that shares one more digit with the key than the local node does
• Case III: key not within the leaf set and Case II is not possible
  • Route to a node that shares at least as many digits with the key as the local node, but is numerically closer to the key
(A sketch of this three-case decision follows.)
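A minimal sketch of the three-case decision, assuming IDs and keys are equal-length hex strings, the routing table is keyed by (shared-prefix length, next digit), and leaf-set coverage is approximated by a numeric range check; all helper names are invented.

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pastry_next_hop(local_id, key, leaf_set, routing_table):
    """Choose the next hop for `key` at node `local_id` (illustrative only)."""
    num = lambda s: int(s, 16)

    # Case I: the key falls within the range covered by the leaf set ->
    # deliver to the leaf-set node (or ourselves) numerically closest to it.
    if leaf_set and min(map(num, leaf_set)) <= num(key) <= max(map(num, leaf_set)):
        return min(leaf_set + [local_id], key=lambda nid: abs(num(nid) - num(key)))

    # Case II: prefix routing -> a routing-table entry sharing one more digit.
    p = shared_prefix_len(local_id, key)
    entry = routing_table.get((p, key[p])) if p < len(key) else None
    if entry is not None:
        return entry

    # Case III: fall back to any known node that shares at least as long a
    # prefix with the key and is numerically closer to it than we are.
    for nid in leaf_set + list(routing_table.values()):
        if (shared_prefix_len(nid, key) >= p
                and abs(num(nid) - num(key)) < abs(num(local_id) - num(key))):
            return nid
    return local_id   # we are the closest node we know of: deliver locally
```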
Routing Example

• Each hop cuts the remaining ID space by a factor of 2^b
• Number of hops needed is about log_{2^b}(N) (worked example below)

[Figure: ring of nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; lookup(d46a1c) issued at 65a1fc is routed by prefix matching toward the node with ID closest to d46a1c]
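As a worked instance of the hop-count bound (my own arithmetic, assuming b = 4 and N = 10^6 nodes):

```latex
\text{hops} \approx \left\lceil \log_{2^{b}} N \right\rceil
            = \left\lceil \frac{\log_{2} N}{b} \right\rceil
            = \left\lceil \frac{\log_{2} 10^{6}}{4} \right\rceil
            = \left\lceil \frac{19.93}{4} \right\rceil
            = 5 .
```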
Self Organization: Node Join

• New node X = d46a1c joins; A = 65a1fc is X's neighbor (a nearby node X already knows)
• X asks A to route a join message with X's own ID as the key: Route(d46a1c)

[Figure: ring with d13da3, d4213f, d462ba, Z = d467c4, d471f1; the join message travels from A = 65a1fc toward Z, the existing node whose ID is numerically closest to X]
Pastry: Node State Initialization

• New node X = d46a1c; A = 65a1fc is the bootstrap node; B = d13da3 is the next node on the join route; Z = d467c4 is the node with ID numerically closest to X
• Leaf set of X = leaf set of Z
• Neighborhood set of X = neighborhood set of A
• Routing table
  • Row zero of X = row zero of A
  • Row one of X = row one of B

[Figure: join route Route(d46a1c) from A = 65a1fc via d13da3 toward Z = d467c4; d4213f, d462ba, and d471f1 also shown on the ring]
Pastry: Node State Update

• X informs any nodes that need to be aware of its arrival
• X also improves the locality of its table by requesting neighborhood sets from all the nodes it knows about
• In practice: an optimistic approach
Pastry: Node Departure (Failure)

• Leaf set repair (eager, all the time):
  • Leaf set members exchange keep-alive messages
  • On a failure, request the set from the furthest live node in the set
• Routing table repair (lazy, upon failure):
  • Get a replacement entry from peers in the same row; if not found, from higher rows
• Neighborhood set repair (eager)
Routing Performance

• Average number of hops grows as O(log N)
• Pastry uses locality information
Kelips: Building an Efficient and Stable P2P DHT through Increased Memory and Background Overhead
Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, and Robert van Renesse
IPTPS 2003
Acknowledgments: previous CS525 courses
Motivation

• For n = 1,000,000 nodes and 20 bytes per entry:
  • Storage requirement at a Pastry node = 120 bytes
  • Memory is not being used efficiently
• How can we achieve O(1) lookup latency?
  • Increase the memory usage per node
Design

• Consists of k virtual affinity groups
• Each node is a member of one affinity group
• Soft state (sketched below):
  • Affinity group view: a (partial) set of other nodes lying in the same affinity group
  • Contacts: a (constant-sized) set of nodes lying in each of the foreign affinity groups
  • Filetuples: a (partial) set of (filename, host IP address) pairs, where the host (homenode) lies in the same affinity group
  • Each entry carries an RTT estimate and a heartbeat count
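A hedged sketch of this soft state as a data structure, assuming the affinity group of a node or file is hash(identifier) mod k; the class and field names are illustrative, not the paper's code.

```python
import hashlib
from dataclasses import dataclass, field

def affinity_group(identifier: str, k: int) -> int:
    """Map a node ID or filename to one of k affinity groups."""
    return int(hashlib.sha1(identifier.encode()).hexdigest(), 16) % k

@dataclass
class KelipsNodeState:
    """Illustrative Kelips soft state for one node."""
    node_id: str
    k: int
    # Affinity group view: node -> (heartbeat count, RTT estimate in ms).
    group_view: dict = field(default_factory=dict)
    # Contacts: foreign group number -> small, constant-sized list of nodes.
    contacts: dict = field(default_factory=dict)
    # Filetuples: filename -> homenode (a node in this same affinity group).
    filetuples: dict = field(default_factory=dict)

    @property
    def my_group(self) -> int:
        return affinity_group(self.node_id, self.k)
```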
Kelips: Node Soft State

[Figure: example soft state at a node in affinity group #0 (of groups #0, #1, …, #k-1). Affinity group view: (id, heartbeat count, RTT) entries such as (18, 1890, 23 ms) and (167, 2067, 67 ms). Contacts: per foreign group, a small list of contact nodes, e.g. [129, 30, …] and [15, 160, …]. Filetuples: filename -> homenode entries, e.g. p2p.txt and its homenode in the same group.]
Storage Requirement @ Kelips Node

• Entries per node: S(k, n) = n/k (affinity group view) + c·(k − 1) (contacts) + F/k (filetuples)
• S is minimized at k = √((n + F)/c) (derivation below)
• Assuming F is proportional to n and c is fixed, the optimal k = O(√n)
• For n = 1,000,000 and F = 10 million files: total memory requirement < 20 MB
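A short derivation of the optimal k (my own algebra, consistent with the expression above):

```latex
S(k,n) = \frac{n}{k} + c\,(k-1) + \frac{F}{k},
\qquad
\frac{\partial S}{\partial k} = -\,\frac{n+F}{k^{2}} + c = 0
\;\Longrightarrow\;
k^{*} = \sqrt{\frac{n+F}{c}},
```

which is O(√n) when F grows proportionally to n and c is a constant.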
Algorithm: Lookup

• Lookup(key D) at node A:
  • The affinity group G of D's homenode is determined by hash(D)
  • A sends a message to the closest node X in its contact set for affinity group G
  • X finds D's homenode in its filetuple set
  • O(1) lookup (a sketch follows)
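A hedged sketch of this two-step lookup, reusing `affinity_group()` and `KelipsNodeState` from the earlier sketch; the choice of contact and the `rpc_send` callback are assumptions for the example.

```python
# Illustrative Kelips lookup: one message to a contact in the right group.
def kelips_lookup(a: "KelipsNodeState", filename: str, rpc_send):
    """Resolve filename -> homenode in O(1) messages (toy model)."""
    g = affinity_group(filename, a.k)       # group where the homenode lives
    if g == a.my_group:
        # The homenode is in our own group: consult the local filetuples.
        return a.filetuples.get(filename)
    # Otherwise ask one contact in group g (say, the first / lowest-RTT one).
    contact = a.contacts[g][0]
    return rpc_send(contact, ("FIND_HOMENODE", filename))
```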
Maintaining Soft State

• Heartbeat mechanism
  • Soft-state entries are refreshed periodically, within and across groups
• Each node periodically selects a few nodes as gossip targets and sends them partial soft-state information (sketched below)
• Uses a constant gossip message size
• O(log n) complexity for gossip dissemination
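A minimal sketch of one gossip round (random target selection plus a bounded payload so the message size stays constant); the fan-out, payload cap, and function names are assumptions, not Kelips' actual parameters.

```python
import random

GOSSIP_FANOUT = 3     # assumed number of gossip targets per round
MAX_ENTRIES = 32      # cap that keeps the gossip message size constant

def gossip_round(state: "KelipsNodeState", rpc_send):
    """Send a constant-sized slice of soft state to a few random group members."""
    members = [n for n in state.group_view if n != state.node_id]
    targets = random.sample(members, min(GOSSIP_FANOUT, len(members)))
    view_items = list(state.group_view.items())
    payload = {
        "view":  dict(random.sample(view_items, min(MAX_ENTRIES, len(view_items)))),
        "files": dict(list(state.filetuples.items())[:MAX_ENTRIES]),
    }
    for t in targets:
        rpc_send(t, ("GOSSIP", payload))
```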
Load Balance

[Plots: load distribution for N = 1500 nodes with 38 affinity groups, and for 1000 nodes with 30 affinity groups]
Discussion Points

• What happens when the triangle inequality does not hold for a proximity metric?
• What happens under a high churn rate in Pastry and Kelips?
• What was the intuition behind affinity groups?
• Can we use Kelips in an Internet-scale network?
Conclusion

A DHT tradeoff: storage requirement vs. lookup latency

                      Chord      Pastry     Kelips
Storage requirement   O(log N)   O(log N)   O(√n)
Lookup latency        O(log N)   O(log N)   O(1)
Going one step further

• One Hop Lookups for Peer-to-Peer Overlays
  • http://www.usenix.org/events/hotos03/tech/talks/gupta_talk.pdf