Trajectory Sampling for Direct Traffic Oberservation
Download
Report
Transcript Trajectory Sampling for Direct Traffic Oberservation
Trajectory Sampling for Direct
Traffic Oberservation
N.G. Duffield and Matthias Grossglauser
IEEE/ACM Transactions on Networking, Vol. 9,
No. 3 June 2001
Problem: Which (spatial) path
does traffic take?
Circuit switched
networks (e.g.
telephone):
Per-call state is
maintained
=>trivial
IP networks:
Don’t maintain perflow information
?
Why is this interesting?
Quality of Service depends on traffic
management
Traffic control
Timescale: seconds
no human intervention
Traffic engineering
Timescale: minutes - months
Resource allocation
Pricing
Failover strategies
Options
Indirect
measurement
Direct
measurement
Uses information from
both
Network model
Network state
Direct observation of
traffic at multiple
points in the
network
Problems with indirect
measurement
Behavior of network elements depends
on vendor-specific design choices
Deliberate sources of randomness to
avoid collision
Events outside domain (route
advertising by neighboring domains)
Interactions may be too complex to
predict
Direct measurement:
Sampling of packets
Sample packets that traverse each link
Subset of packets used as
representative
Problem:
How do we get the actual path?
Key idea of the paper
Use a deterministic hash function over
the packet’s content to determine
subset of packets
Use the same hash function throughout
the domain
Use second hash function to label
packets
Theory
Measurement domain represented as a
directed graph
Packets
enter at ingress node
exit at egress node
Invariance function
Packet content without changing fields, e.g.
time-to-live field which is decremented each hop
Sampling Hash Function
Decides whether or not a given packet
should be sampled
Deterministic function of the invariant
packet content
Same function on each link
Mod function
# of bins chosen to achieve desired
sampling probability
Identification Hash Function
Entire packet content could be used
Aim: limit traffic to measurement
collection system
Hash results in m-bit binary number
Additional information may be included
Length of packet
Source, destination
Invariant content
Header: three categories of fields
Variable fields (not included)
Low entropy fields (not included)
Content changes little between packets
E.g., version, header length, protocol
High entropy fields (included)
E.g., TTL, header checksum, etc.
Source and destination IP, etc.
Part of remainder of packet
Ambiguities (f-h)
Dealing with ambiguities
Probability that trajectory can be
disambiguated depends on network
topology and traffic => renormalization
of results necessary
Safer to discard all duplicate labels
(greater loss of samples)
Specification of Hash
Functions
Ordered bits of invariant part of packet
content x are considered as binary
integers: f(x)
Sampling hash function
h(f(x)) = f(x) mod A
Identification hash function
g(f(x)) = f(x) mod B
with A, B positive integers
Identical Packets
Automatically ambiguous
Question:
How much packet content is needed to
avoid collisions?
Answer:
40 bytes lead to collision probability
smaller than 10-3
Implementation of hashing
40 byte “numbers” are represented by
vector of 16 bit words
z = (zk ,zk-1,…,z0) = Si=0k zi 216i
Use 32 bit long division
Iteratively compute
(zk ,zk-1,…,z0) mod A
= (zk-1+ 216(zk mod A),…,z0) mod A
Sampling independent of
packet content?
Note: IP address of source and
destination are included in the invariant
content!
Chi-squared test
40 byte packet prefix
=> 95% confidence level
20 byte packet prefix results in strong
dependence
Optimal Sampling
Tradeoff
More unambiguous samples
=> more accuracy
More samples
=> more measurement traffic
Optimize for given measurement traffic mn
(m bits per sample, n samples)
Small m increases collisions
Large m means smaller n
Unique Samples as function of
Total samples (label bits fixed)
Practical Example
Domain with 100 (OC-192) links
Measurement epoch 10s
Measurement traffic < 108 bits
n* = 3.8x106 (samples to collect)
Optimal sampling suggests
One sample per 217 packets
Labels 26 bits long
(Question to the authors
Doesn’t the measurement traffic itself
get sampled and thereby add another
source of error?
… may be part of their future work
statement)
Example
Service provider wants to determine what
fraction of packets on a certain backbone link
belongs to a certain customer
Compare
customer packets observed both on backbone and
on access link
Total number of packets observed on backbone
Real and estimated fractions largely within
error bars
Implementation issues
Can trajectory sampling be part of next
generation of high-speed interfaces?
Authors claim “yes”:
Compute both hash functions in parallel
Processor cost negligible compared with
cost of interface cards
Processor speed doubles every 18 months,
maximum trunk speed every 21 months
Other Common Approaches
Aggregation-based
approaches
e.g., sum of packets
traversing a link
Sampling-based
approaches
sample subset of
observations
Aggregation-based
Approaches
Link measurements (direct)
Traffic statistics (# of bytes / # of packets
transferred / dropped)
Measurements reported periodically
Flow aggregation (indirect)
Flow: sequence of packets with common
field in header
Relies on emulation of routing protocol
Sampling-based Approaches
Active end-to-end probes (direct)
Hosts send probe packets to one or more
other hosts
Packet loss rate
Round-trip delay
End-to-end path characteristics
Variation: collect and exchange
measurements of multicast session
Related Work
Measure end-to-end performance of
individual flows
ATM cells sampled at ingress and egress
points
Determine QoS for a single connection,
e.g., delay and loss rate
Extensions and Other
Applications
Distributed denial of service attacks
Filtering
Attackers use packet spoofing
A configurable packet filter may allow
trajectory sampling for a subset of packets
Probe Packets
Packet content may be constructed to
ensure sampling
Conclusions
Simple processing
No Router state required
Packets directly observed