Scalable and Deterministic Overlay Network Diagnosis
Download
Report
Transcript Scalable and Deterministic Overlay Network Diagnosis
Scalable and Deterministic
Overlay Network Diagnosis
Yao Zhao, Yan Chen
Northwestern Lab for Internet
and Security Technology (LIST)
Dept. of Computer Science
Northwestern University
http://list.cs.northwestern.edu
David Bindel
Computer Science
Division
Dept. of EECS
University of
California at Berkeley
When something breaks in the Internet, the
Internet's very decentralized structure makes
it hard to figure out what went wrong and
even harder to assign responsibility.
̶ ̶ “Looking Over the Fence at Networks:
A Neighbor's View of Networking
Research”, by Committees on Research
Horizons in Networking, National
Research Council, 2001.
Motivation
Internet diagnosis very important
To end users
To overlay network service providers (e.g., Akamai)
To Internet service providers (ISP)
But a very challenging problem due to the privacy
of network administration
Solution
E2E measurements by end users -- overlay networks
Related Works
Router based approaches [SOSP03]
Mostly ICMP based, ICMP rate limiting
Unscalable for simultaneous diagnosis
Cannot deterministically separate forward/backward path loss
Statistical approaches [MINC, INFOCOM03]
Non-deterministic: fundamentally under-constrained system
Inference based on temporal correlation in a multicast tree
Have to compromise for unicast, then sensitive to cross traffic
Optimization based on assumptions: # of lossy links small
Random sampling, linear programming, and Bayesian inference.
Unscalable: iterative refinement slow to converge for large
networks
Problem Formulation
Given an overlay of N end hosts and O(N2)
paths, to what granularity can we
deterministically diagnosis the network fault?
Assumptions:
Topology measurable
Can only measure the E2E path, not the link
Outlines
Architecture and algebraic model
Identifying virtual links
Evaluation with simulations
Internet experiments
Our Approach
Monitor a basis set of O(n·logn) paths that fully describe
the O(n2) paths
Decompose the paths into minimal deterministically
identifiable segments
Compute the loss rate for each segment for diagnosis
topology
Overlay Network
Operation Center
measurements
End hosts
Trouble
spots location
Diagnosis results:
Qwest access link:
63.232.180.230->63.232.33.134
Peering between UUNET and AOL:
64.45.216.154->172.139.89.74
Linear algebraic model
Path loss rate p, link loss rate l
1 p1 (1 l1)(1 l 2)
log( 1 l1)
log( 1 p1) log( 1 l1) log( 1 l 2) 1 1 0 log( 1 l 2)
log( 1 l 3)
x1
1 1 0 x 2 b1
x3
1
p1
3
D
2
B
C
Putting All Paths Together
Gx b, where path matrix G {0 | 1}rs
link loss rate vector x
s1
, path loss rate vector b r1
…
k rank (G) s
Usually an under - constrained system!
=
Identifiable and Unidentifiable
Vectors in the row space of G are identifiable
Otherwise, unidentifiable
x2
A
1
p1
3
D
2
B
p2
C
1 1 0
G
1
1
1
(1,-1,0)
(1,1,0)
Row(path) space
(identifiable)
(1,1,1)
(0,0,1)
x3
x1
Outlines
Architecture and algebraic model
Identifying virtual links
Evaluation with simulations
Internet experiments
Definition of Virtual Links
Uniquely identified shortest path segments
Identifiable
Consecutive
Undecomposable
b
1
1’
2
3’
4
3
2’
4’
5
4 paths, 5 links
a
e
c
d
5 virtual links
One More Example
6 paths, 8 links
4 virtual links:
Corresponding to links 1, 2, 3+4+7 and 5+6+8
respectively
3’
1
1’
2
3
4
2’
5 4’ 6
5’
7
6’
8
1
0
1
G
1
0
0
1
1
0
0
1
0
0
1
1
0
0
0
0
1
1
0
0
0
0
0
0
1
1
0
0
0
0
1
1
0
0
1
1
0
0
1
0
0
0
1
1
1
Computing Virtual Links in
Undirected Graph
Check if a vector is a virtual link
QR decomposition: G QR
v Qv ?
O(l·k) to check if a vector of length l
is in row space of G
O(l2)
0 1 0
QG
1 0 0
x3
(1,1,1)
potential virtual links in a
(1,0,0)
x1
path of length l
(0,1,0)
(1,1,0)
Total complexity
x2
Row space
2
3
2
O(l·k·l ·k)=O(l ·k )
Small constant: only 4.2 sec for 135-node network
Undirected vs. Directed Graphs
Directed graph
Any linear combination =>
incoming outgoing
Theorem: In a directed graph, no end-to-end
path contains an identifiable subpath.
Rescue: Good Path Algorithm
Identifying virtual links in undirected graphs
Use topology only
For directed graphs: additional info needed
Path loss rate
Use the link property constraint to break the deadlock
All the links in a good path are good links, i.e. no or little
loss.
Most of the paths on the Internet are good paths
System Flowchart
Monitors O(n·logn) paths that can fully describe
all the O(n2) paths (SIGCOMM04)
Inherit load balancing, monitoring adaptation, etc.
Optimization steps: find the minimal basis for identifiability test
Measure
topology
to get G
Select a
basis of
G, G, for
monitoring
Stage 1: set up scalable
monitoring system for
diagnosis
Good path
algorithm on
G
Estimated
loss rates
for all
paths in G
Reduced
paths G’’
Good path
algorithm on G
Select a
basis of G’’:
G ' QR
Reduced
paths G’
Find all
lossy virtual
links in G
Stage 2: online update the measurements and diagnosis
Outlines
Architecture and algebraic model
Identifying virtual links
Evaluation with simulations
Internet experiments
Metrics
Avg length of lossy virtual links in all lossy paths
Diagnosis granularity
The avg number of potential lossy links in a lossy path
Example (Path 1 w/ lossy VL 1 of length 5, path 2 and 3 w/ lossy
VL 2 of length 2)
Avg lossy VL length: (5+2)/2 = 3.5
Avg diagnosis granularity: (5+2+2)/3 = 3
Accuracy
Absolute error |p – p’ |
Relative error
p( ) p' ( )
F ( p, p' ) max
,
p
'
(
)
p
(
)
where p( ) max( , p), p' ( ) max( , p' )
Simulation Methodology
Topology type
Topology size
10% ~ 50%
Link loss rate distribution
1000 ~ 20000 or 184k nodes
Fraction of end hosts on the overlay network
Three types of BRITE router-level topologies
Mecator topology
LLRD1 and LLRD2 models
Loss model
Bernoulli and Gilbert
Sample of Simulation Results
(Barabasi+Gilbert)
Sample of Simulation Results
(Barabasi+Gilbert)
Results using Mercator Topology
# of end
hosts on OL
Avg
LP
# of
LP
# of links Avg LP
in LP
Length
Avg VLL
in LP
Avg BVLL
in LP
50
8.86
1459
1304
3.55(4.86) 2.56(3.54) 2.97(4.18)
100
8.8
5625
3182
3.22(4.5)
1.76(2.36) 2.21(3.11)
200
8.85
22303
7065
3.2(4.21)
1.6(2.07)
1.99(2.74)
Gibbs Sampling (Infocom03)
D
Ensemble of loss rates of links in the network
Goal
Observed packet transmission and loss at the clients
Determine the posterior distribution P(|D)
Approach
Use Markov Chain Monte Carlo with Gibbs sampling to
obtain samples from P(|D)
Draw conclusions based on the samples
Comparison with Bayesian Inference
using Gibbs Sampling (1)
Comparison with Bayesian Inference
using Gibbs Sampling (2)
Outlines
Architecture and algebraic model
Identifying virtual links
Evaluation with simulations
Internet experiments
Methodology
Planetlab
Topology measured by Traceroute
135 end hosts
Avg path length is 17.2
Path loss rate by active UDP probing
300 40-byte UDP packets per measured path
in 90 sec
Small overhead: 17.9kb if even measuring all
paths
Diagnosis Results
Loss
rate
[0,
lossy path [0.05, 1.0] (15.8%)
0.05) [0.05, 0.1) [0.1, 0.3) [0.3, 0.5) [0.5, 1.0) 1.0
%
84.2
17.2
Total end-to-end paths
15.6
24.9
15.8
26.5
18,090
Avg Path Length
17.2
After removing 79.5% good paths w/ 80.5% good links …
Avg lossy path (>5% loss rate) length
11.5 (9.0)
Avg lossy virtual link length
4.3 (3.1)
Avg Granularity
4.0 (2.7)
The numbers in () are those after removing sequential link chains.
Speed Results
On a Pentium-IV 3.2GHz PC
Average setup time (selecting 5,706 paths for
monitoring): 109.3 seconds
Diagnosis of 2,858 lossy paths: 4.2 seconds
Validation
Cross Validation
Divide 5720 paths into two sets (2860 each)
Get 571 virtual links from the first set
Check consistency with the second path set
99.1% paths in the second set are consistent
with virtual links computed by the first set.
IP Spoofing based Validation
D:b,
TTL=255
ICMP:S:a,
S:r3,
D:c,
TTL=255
UDP:
S:c,
TTL=2
D:c,
c
r1
a
r2
r3
b
IP Spoofing based Consistency
Checking
Use the function of source routing of IP
Spoofing to create new path segments
Validation is the same as cross validation
Results:
1000 new path including part of segments in
potential lossy paths
94.1% loss spoofed paths are consistent with 361 out
of 1664 lossy virtual links
5.9% paths are inconsistent with 45 virtual links
Conclusions
Propose the first deterministic and scalable
overlay diagnosis system based on a linear
algebraic approach
Diagnosis with virtual links:
Identifiable, consecutive and minimal path segments
Directed topology indecomposable to VL
Good path algorithms for rescue
Both simulation and Internet experiments show
fast & accurate diagnosis w/ optimal granularity
Backup Slides
Previous Work
“Computing the unmeasured: An algebraic approach to
Internet mapping,” INFOCOM’01
“User-level internet path diagnosis,” SOSP’03
Need the support of routers
Not accurate
“Multicast-based inference of network-internal loss
characteristics,” IEEE Transactions in Information Theory,
1999.
Can’t work on directed graph
Multicast support or unicast approximation
“Server-based inference of Internet link lossiness,”
INFOCOM'03
Can only determine whether a link is lossy or not
Distribution of Length of lossy Virtual
Links
IP Spoof Based Diagnosis