A Shared Infrastructure for Overlay Applications


Infrastructure Primitives for
Overlay Networks
Karthik Lakshminarayanan
(with Ion Stoica and Scott Shenker)
SAHARA/i3 Retreat – Summer, 2003
1
Goal: Share Overlay Functionality
What do overlays share?
– Underlying IP infrastructure (of course!)
– Underlying hardware (maybe, e.g. PlanetLab)
Why not share…
• Higher level overlay functionality
– Each application designs overlay routing from scratch
– Lower deployment barrier: design effort & deployment expense
• Network weather information
– Each application performs probes to find good overlay paths
– Reduce overlay maintenance overhead
2
Diverse Overlay Requirements
What are the requirements for supporting most overlay applications?
• Routing control
– Adaptive routing based on application-sensitive metrics
– Measurements of the virtual link characteristics
• Data manipulation
– Manipulate/store (e.g. transcode) data in the path to the destination
3
Our Approach
• Embed in the infrastructure:
– Low-level routing mechanisms, e.g. forwarding,
replication
• Third-party services:
– Services are implemented at end-hosts, shared
using an open interface
– Information for making routing decisions, e.g.
measurements of path delay, loss-rate, bandwidth
• At the end-hosts:
– Not shared at all, e.g. policies for choosing paths
4
Outline
• Motivation and Challenges
• Infrastructure Primitives
• Network Measurements
• System Architecture – Weather Service
• Experiments
• Some Applications
5
Path Selection
[Figure: nodes R, R’, n1 and n2; packet m]
• Similar to “loose source routing”
• End-hosts specify points through which packet is routed
• Routing between the specified points handled by IP
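The idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical packet that carries its remaining waypoints as a list; the names `Packet`, `next_hop` and `trace` are invented for this sketch and are not part of i3 or IP. Routing between consecutive waypoints is left to IP and is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    """Hypothetical packet carrying the overlay points it must still visit."""
    payload: bytes
    waypoints: List[str] = field(default_factory=list)  # remaining points, in order


def next_hop(packet: Packet) -> str:
    """Pop and return the next waypoint; IP handles delivery to that point."""
    if not packet.waypoints:
        raise ValueError("packet has already reached its final destination")
    return packet.waypoints.pop(0)


def trace(packet: Packet) -> List[str]:
    """Return the sequence of waypoints the packet visits, in order."""
    visited = []
    while packet.waypoints:
        visited.append(next_hop(packet))
    return visited
```

For example, `trace(Packet(b"m", ["n1", "n2", "R'"]))` visits n1, then n2, then R'.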
6
Packet Replication
[Figure: nodes R, R’, n1 and n2; packet m and its replica m1]
• End-hosts specify that a particular packet be replicated at a node and then sent along a path
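A small sketch of replication at a node, under the assumption that each copy simply follows its own list of waypoints after the replication point. The function name `replicate_at` and the labels used for the copies are hypothetical.

```python
from typing import Dict, List


def replicate_at(node: str, current_path: List[str],
                 branches: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the full path of every copy of a packet replicated at `node`.

    current_path: points the original packet follows (must contain `node`).
    branches: label -> points each extra copy follows after leaving `node`.
    """
    if node not in current_path:
        raise ValueError(f"{node} is not on the packet's path")
    # Every copy shares the path up to and including the replication node.
    prefix = current_path[: current_path.index(node) + 1]
    paths = {"original": list(current_path)}
    for label, tail in branches.items():
        paths[label] = prefix + list(tail)
    return paths
```

Calling `replicate_at("n1", ["R", "n1", "R'"], {"m1": ["n2", "R'"]})` gives the original path R, n1, R' plus a copy m1 that leaves n1 towards n2 before reaching R'.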
7
Infrastructure Primitives
• Path Selection
• Packet Replication
Claim: This is enough to do (i) Adaptive routing (ii)
Measurements (iii) Data manipulation
Why this approach?
• Control path must be outside – collective knowledge
to decide what to monitor
• No difference between data and measurement traffic
– better security, nodes have no incentive to lie
8
Implementation alternatives
• At the IP layer:
– Path selection
• Implemented in the form of loose source routing
• Requires path in the packet header
– Packet replication requires a new primitive
• Why we chose i3:
– Implements the two primitives without any changes
• Path selection: Set up routing state beforehand (instead of in the header)
– Robustness to node failures
– We know it well!
This is one possible realization, and not the only one
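To illustrate what "routing state set up beforehand" can look like, here is a sketch loosely modelled on i3-style triggers. It is a simplified dictionary, not the i3 API: the table layout, the function `deliver` and the identifiers are assumptions made for this example. A chained trigger gives path selection, and several targets under one identifier give replication.

```python
from typing import Dict, List

# Hypothetical trigger table: identifier -> targets. A target that is itself
# a key in the table is another identifier; anything else is an end-host.
Triggers = Dict[str, List[str]]


def deliver(identifier: str, triggers: Triggers, max_hops: int = 16) -> List[str]:
    """Follow triggers from `identifier`; return the end-hosts that get a copy."""
    if max_hops == 0:
        raise RuntimeError("trigger chain too long (possible loop)")
    receivers: List[str] = []
    for target in triggers.get(identifier, []):
        if target in triggers:   # chained trigger: next point on the path
            receivers.extend(deliver(target, triggers, max_hops - 1))
        else:                    # end-host address: final delivery
            receivers.append(target)
    return receivers
```

With `{"id1": ["id2", "id3"], "id2": ["hostA"], "id3": ["hostB"]}`, a packet sent to `id1` is replicated and reaches both hostA and hostB, without any path being carried in the packet header.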
9
Metrics of measurement
• Round-trip delay
• Loss-rate
• Available bandwidth
• Bottleneck bandwidth
… in the process, demonstrate the versatility of the primitives
11
Round-trip Delay
[Figure: R probes nodes n1 and n2 with packets m and m1. To measure: RTT(n1→n2)]
• Use path selection primitive to send packet m along R→n1→R
• Use path selection in conjunction with packet replication to send packet m1 along R→n1→n2→n1→R
• Difference yields the RTT of the link (n1↔n2)
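The subtraction in the last bullet can be written out as a short sketch. The function name `link_rtt` is hypothetical, the delays are assumed to be in milliseconds, and clamping a negative difference to zero is a choice made for this example rather than something the slides specify.

```python
def link_rtt(rtt_short: float, rtt_long: float) -> float:
    """RTT of the virtual link n1<->n2 from two probe round-trip times.

    rtt_short: measured RTT of the probe sent along R->n1->R
    rtt_long:  measured RTT of the probe sent along R->n1->n2->n1->R
    """
    diff = rtt_long - rtt_short
    # Jitter can make the difference slightly negative; clamp at zero
    # (an assumption of this sketch).
    return max(diff, 0.0)
```

For instance, probes of 40 ms and 95 ms give `link_rtt(40.0, 95.0)`, i.e. an estimated 55 ms for the link n1↔n2.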
12
One-way Loss Rate
[Figure: R probes nodes n1 and n2 with packets m, m1 and m2. To measure l(n1→n2)]
• m2 used to differentiate loss on (n1→n2) from that on (n2→n1)
• (m ∧ ¬m1 ∧ ¬m2) ⇒ loss on virtual link (n1→n2)
– False positives
– False negatives
• Probability of false positives/negatives ≈ O(p²)
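The inference rule on this slide can be sketched as a predicate over probe outcomes. The representation is an assumption: each probe round is recorded as three booleans saying whether m, m1 and m2 arrived back at R, and rounds in which m itself is missing are skipped as unusable. The names `infers_loss` and `estimate_loss_rate` are hypothetical, and the O(p²) false positive/negative terms are not modelled.

```python
from typing import Iterable, Tuple

Round = Tuple[bool, bool, bool]  # (m received, m1 received, m2 received)


def infers_loss(m: bool, m1: bool, m2: bool) -> bool:
    """The slide's rule: m arrived but neither m1 nor m2 did."""
    return m and not m1 and not m2


def estimate_loss_rate(rounds: Iterable[Round]) -> float:
    """Fraction of usable rounds in which loss on n1->n2 is inferred."""
    usable = 0
    losses = 0
    for m, m1, m2 in rounds:
        if not m:
            continue  # assumption: a round without m tells us nothing about n1->n2
        usable += 1
        if infers_loss(m, m1, m2):
            losses += 1
    if usable == 0:
        raise ValueError("no usable probe rounds")
    return losses / usable
```

A round such as `(True, False, True)` is not counted as a loss on n1→n2, because m2 arriving shows the packet crossed that link.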
14
Available Bandwidth
• Come to the poster session.
15
What we envision
[Figure: Clients A, B, C and D and Weather Services 1 and 2; interactions labelled "Network measurements", "Query/reply routing info." and "Setup routes"]
Challenge: To make the measurements scale to an infrastructure of 1000s of nodes
17
Experiments: Delay Estimation
• More than 92% of the samples have error < 10%
• If we consider the median over 15 consecutive samples, 98.3% of the samples have error < 10%
19
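The median-over-15-samples smoothing mentioned for delay estimation could look like the sketch below. The slide does not say whether the windows slide or are disjoint; a sliding window is assumed here, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def median_smooth(samples: Sequence[float], window: int = 15) -> List[float]:
    """Median of each run of `window` consecutive samples (sliding window)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [median(samples[i:i + window])
            for i in range(len(samples) - window + 1)]


def relative_error(estimate: float, truth: float) -> float:
    """Relative error of an estimate against a reference value."""
    return abs(estimate - truth) / truth
```

The median suppresses isolated outliers: in `[10, 11, 50, 12, 10]` with a window of 3, the spike of 50 never appears in the smoothed output.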
Experiments: Loss-Rate Estimation
• Accuracy of 90% in over 89% of the cases (after
filtering the few nodes with high losses)
20
Experiments: Avail-BW Estimation
• Within a factor of two for 70% of the pairs
• Avail-BW is not static, so this is reasonable
21
How applications can use this
• Adaptive routing:
– End-hosts query the Weather Service (WS) and construct the overlay
– Quality of paths depends on how sophisticated the WS is
– No changes to infrastructure if metrics change
• Multicast:
– Union of different unicast paths that the WS returns
– Number of replicas is no larger than the degree of the
overlay graph
• Finding closest replica:
– Client queries the WS to get the best among a set of nodes
– WS may export an API that allows this*
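A sketch of how an end-host might use such information, assuming a hypothetical WS reply in the form of a nested dictionary of link delays. The reply format and the functions `best_path` and `multicast_edges` are assumptions for illustration; the slides do not define the WS interface. Unicast paths are chosen with Dijkstra's algorithm, and the multicast structure is the union of those paths.

```python
import heapq
from typing import Dict, List, Set, Tuple

Delays = Dict[str, Dict[str, float]]  # hypothetical WS reply: delays[u][v] in ms


def best_path(delays: Delays, src: str, dst: str) -> List[str]:
    """Lowest-delay overlay path from src to dst (Dijkstra's algorithm)."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in delays.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"no path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def multicast_edges(delays: Delays, root: str,
                    receivers: List[str]) -> Set[Tuple[str, str]]:
    """Union of the unicast paths from root to each receiver, as directed edges."""
    edges: Set[Tuple[str, str]] = set()
    for r in receivers:
        p = best_path(delays, root, r)
        edges.update(zip(p, p[1:]))
    return edges
```

Because path choice happens at the end-host, switching to a different metric only changes the numbers in the reply, which matches the point that the infrastructure needs no changes when metrics change.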
22
Multicast experiment
• Nodes at 37 sites in PlanetLab (1-3 per site).
• Delay-optimized multicast tree rooted at Stanford
– Union of delay-optimized unicast paths
– 90% of the nodes had RDP < 1.38; 99.7% of the nodes had RDP < 2
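The slide does not spell out RDP; assuming it denotes relative delay penalty, the usual overlay-multicast metric (delay along the overlay path divided by the direct unicast delay), it could be computed as in this sketch. The function names and the sample values in the note below are illustrative only and are not measurements from the experiment.

```python
from typing import Dict


def rdp(overlay_delay: float, direct_delay: float) -> float:
    """Relative delay penalty: overlay delay divided by direct unicast delay."""
    if direct_delay <= 0:
        raise ValueError("direct delay must be positive")
    return overlay_delay / direct_delay


def fraction_below(rdps: Dict[str, float], threshold: float) -> float:
    """Fraction of nodes whose RDP is strictly below `threshold`."""
    if not rdps:
        raise ValueError("no nodes given")
    return sum(1 for v in rdps.values() if v < threshold) / len(rdps)
```

A node reached in 60 ms over the tree and 40 ms directly has an RDP of 1.5; `fraction_below` then gives the kind of summary quoted on the slide.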
23
Summary of design
• Minimalist infrastructure functionality
• Delegate routing to applications
– Applications know their requirements best
• Delegate performance measurements to third-party applications
– Allows this to evolve to meet changing requirements
24
Open questions & Future work
• Why minimalist design?
– Why not more primitives? E.g. for supporting QoS
• What if path characteristics are correlated?
– Shared bottleneck
– Losses at the egress/ingress link
• Sub-problems
– By having incomplete information about network weather, how much
do we lose (if at all)?
– How much does accuracy of measurements affect the final outcome?
– If the underlying routing is bad, how diverse must the overlay be to do a good job?
• Design API and develop applications based on it
25