Network Tomography: From Link Counts to the OD Matrix

Download Report

Transcript Network Tomography: From Link Counts to the OD Matrix

Trajectory Sampling for
Direct Traffic Observation
Matthias Grossglauser
joint work with Nick Duffield
AT&T Labs – Research
Traffic Engineering
Two large flows
overload!
Traffic Engineering
overload!
New egress point
for first flow
Multi-homed customer
Traffic Engineering
OSPF shortest path splitting
overload!
Traffic Engineering
• Goal: domain-wide control & management to
– Satisfy performance goals
– Use resources efficiently
• Knobs:
– Configuration & topology: provisioning, capacity
planning
– Routing: OSPF weights, MPLS tunnels, BGP
policies,…
– Traffic classification (diffserv), admission control,…
• Measurements are key: closed control loop
– Characterize demand: what’s coming in?
– Observe network state: how is the network
reacting? (low-level adaptivity!)
– Check performance: what’s the customer’s QoS?
Traffic Matrix vs. Path Matrix
• Traffic matrix
– # bytes from ingress i to egress j
• Path matrix
– Spatial flow of traffic through domain
– # bytes for every path from i to j
Flow Measurement
flow 1
flow 2
flow 3
flow 4
• IP flow abstraction
– Set of packets with “same” src and dest IP
addresses
– Packets that are “close” together in time (a few
seconds)
• Cisco NetFlow
– Router maintains a cache of statistics about active
flows
– Router exports a measurement record for each
flow
Inferring the Path Matrix
from the Traffic Matrix
Network State Uncertainty
• Hard to get an up-to-date snapshot of…
• …routing
–
–
–
–
Large state space
Vendor-specific implementation
Deliberate randomness
Multicast
• …element states
– Links, cards, protocols,…
• …element performance
– Packet loss, delay at links
missing alarms
missing “down” alarms
noise
spurious down
Direct Traffic Observation
• Goal: direct observation
– No network model & state estimation
• Basic idea:
– Sample packets at each link
– Sampling decision based on hash over packet
content
– Consistent sampling  trajectories
– Labels based on second hash function
• Exploit entropy in packet content to obtain
statistically representative set of trajectories
Sampling and Labeling
• Fields of interest collected only once
• Multicast: trajectory is a tree
Fields Included in Hashes
Collisions: Identical Packets
Sampling and Labeling Hashes
• x: subset of packet bits, represented as
binary number
• Sampling hash
– h(x) = x mod A
– Sample if h(x) < r
– r/A: thinning factor
• Labeling hash
– g(x) = x mod M
• Make appropriate choice of A, M
– predictable patterns should “mix” well
Pseudo-Random Sampling
• Goal: infer metrics of interest from
trajectory samples
– E.g., what fraction of traffic of customer x
on a link y?
• Question: is sample set statistically
representative?
– Obvious for “really random” sampling
– Distribution of a field in the sampled subset
= real distribution?
– In other words: does the complement of
the field provide enough entropy?
Quality of
Deterministic Sampling
• Experiment: statistical test to check if sampled
and full distributions are close
– Chi-square statistic to verify independence
hypothesis
– Hypothesis: sampled distribution consistent with
full distribution
m01 m02 ... m0 I
n j : # samples in bin j
mij : # packets (un)sampled in bin j m11 m12 ... m1I
1
I
n1
n2 ... nI
(mij  m ij )2
T  
i 0 j 1
mij
– Confidence
level C(T) for hypothesis, where C is
2
cdf of  with I-1 degrees of freedom
m0
m1
n
Chi-square Test on Source Address
If C (T )  1   , then accept hypothesis
Bitwise Independence
• 2x2 contingency table formed by
– sampling decision
– l-th bit of packet
Optimal Sampling
• Fix amount of measurement traffic c per time
period
• Problem:
–
–
–
–
n: number of samples in sampling period
M: alphabet size, m=log2(M) bits/label
nm: total amount of measurement traffic [bits]
Goal: maximize # unique labels, subject to nm<c
• Result:
– optimal alphabet size M*=c log(2)
– optimal number of samples n*=M*/log(M*)
– example: c=1Mb/period 
Label Collisions and
Trajectory Ambiguity
Ambiguity cont.
• Rule for acyclic subgraphs + unicast packets:
– unambiguous if each connected component of the subgraph is
• (a) a source tree
• (b) a sink tree without loss
Inference
Experiment
• Experiment:
infer from trajectory samples
– Estimate fraction of traffic from customer
– Source address  customer
– Source address  sampling + label
• Fraction of customer traffic on backbone link:
̂
# unique labels common on b, c
ˆ 
# unique labels on b
Estimated Fraction (c=1000bit)
Estimated Fraction (c=10kbit)
Sampling Device
MPLS: simple additional logic to look “behind” label stack
Sampling Device Implementation
• Interface vs. processing speed
– OC-192: 10 Gbps
– State of the art DSP:
• Proc: 600M MACs x 32 bit: 20 Gbps
• I/O: 300MHz x 256 bit: 70 Gbps
– Moore’s law vs. interface speed growth
• Vendor interest: cisco, juniper, avici
• Advantages
Summary
– Trajectory sampling estimates path matrix
…and other metrics: loss, link delay
– Direct observation: no routing model + network state
estimation
– No router state
– Multicast (source tree), DDoS (sink tree)
– Control over measurement overhead
– Small measurement delay
• Disadvantages
– Requires support on linecards
• Open questions & research problems
– Collection, storage, querying (in progress)
– Management interface