Trajectory Sampling for Direct Traffic Oberservation

Download Report

Transcript Trajectory Sampling for Direct Traffic Oberservation

Trajectory Sampling for Direct
Traffic Oberservation
N.G. Duffield and Matthias Grossglauser
IEEE/ACM Transactions on Networking, Vol. 9,
No. 3 June 2001
Problem: Which (spatial) path
does traffic take?
Circuit switched
networks (e.g.
telephone):
Per-call state is
maintained
=>trivial
IP networks:
Don’t maintain perflow information
?
Why is this interesting?

Quality of Service depends on traffic
management

Traffic control



Timescale: seconds
no human intervention
Traffic engineering




Timescale: minutes - months
Resource allocation
Pricing
Failover strategies
Options
Indirect
measurement
Direct
measurement
Uses information on
 Network model
 Network state
Direct observation of
traffic at multiple
points in the
network
Problems with indirect
measurement




Behavior of network elements depends
on vendor-specific design choices
Deliberate sources of randomness to
avoid collision
Events outside domain (route
advertising by neighboring domains)
Interactions may be too complex to
predict
Direct measurement:
Sampling of packets


Sample packets that traverse each link
Subset of packets used as
representative
Problem:
How do we get the actual path?
Key idea of the paper



Use a deterministic hash function over
the packet’s content to determine
subset of packets
Use the same hash function throughout
the domain
Use second hash function to label
packets
Theory


Measurement domain represented as a
directed graph
Packets



enter at ingress node
exit at egress node
Invariance function
Packet content without changing fields, e.g.
time-to-live field which is decremented each hop

Sampling Hash Function




Decides whether or not a given packet
should be sampled
Deterministic function of the invariant
packet content
Same function on each link
Results in L-bit binary number
Identification Hash Function




Entire packet content could be used
Aim: limit traffic to measurement
collection system
Results in m-bit binary number
Additional information may be included


Length of packet
Source, destination
Invariant content
Header: three categories of fields


Variable fields (not included)


Low entropy fields (not included)



Content changes little between packets
E.g., version, header length, protocol
High entropy fields (included)


E.g., TTL, header checksum, etc.
Source and destination IP, etc.
Part of remainder of packet
Ambiguities (f-h)
Dealing with ambiguities


Probability that trajectory can be
disambiguated depends on network
topology and traffic => renormalization
of results necessary
Safer to discard all duplicate labels
(greater loss of samples)
Specification of Hash
Functions



Ordered bits of invariant part of packet
content x are considered as binary
integers: f(x)
Sampling hash function
h(f(x)) = f(x) mod A
Identification hash function
g(f(x)) = f(x) mod B
with A, B positive integers
Identical Packets
Automatically ambiguous
=> lead to biased estimators
Question:
 How much packet content is needed to
avoid collisions?
Answer:
 40 bytes lead to collision probability
smaller than 10-3

Implementation of hashing



40 byte “numbers” are represented by
vector of 16 bit words
z = (zk ,zk-1,…,z0) = Si=0k zi 216i
Use 32 bit long division
Iteratively compute
(zk ,zk-1,…,z0) mod A
= (zk-1+ 216(zk mod A),…,z0) mod A
Sampling independent of
packet content?


Note: IP address of source and
destination are included in the invariant
content!
Chi-squared test


40 byte packet prefix
=> 95% confidence level
20 byte packet prefix results in strong
dependence
Optimal Sampling

Tradeoff



More unambiguous samples
=> more accuracy
More samples
=> more measurement traffic
Optimize for given measurement traffic mn
(m bits per sample, n samples)
 Small m increases collisions
 Large m means smaller n
(Question to the authors


Doesn’t the measurement traffic itself
get sampled and thereby add another
source of error?
… may be part of their future work
statement)
Example
Service provider wants to determine what
fraction of packets on a certain backbone link
belongs to a certain customer
 Compare



customer packets observed both on backbone and
on access link
Total number of packets observed on backbone
Real and estimated fractions largely within
error bars
Implementation issues


Can trajectory sampling be part of next
generation of high-speed interfaces?
Authors claim “yes”:



Compute both hash functions in parallel
Processor cost negligible compared with
cost of interface cards
Processor speed doubles every 18 months,
maximum trunk speed every 21 months
Other Common Approaches

Aggregation-based
approaches

e.g., sum of packets
traversing a link

Sampling-based
approaches

sample subset of
observations
Aggregation-based
Approaches

Link measurements (direct)



Traffic statistics (# of bytes / # of packets
transferred / dropped)
Measurements reported periodically
Flow aggregation (indirect)


Flow: sequence of packets with common
field in header
Relies on emulation of routing protocol
Sampling-based Approaches

Active end-to-end probes (direct)

Hosts send probe packets to one or more
other hosts




Packet loss rate
Round-trip delay
End-to-end path characteristics
Variation: collect and exchange
measurements of multicast session
Related Work

Measure end-to-end performance of
individual flows


ATM cells sampled at ingress and egress
points
Determine QoS for a single connection,
e.g., delay and loss rate
Extensions and Other
Applications

Distributed denial of service attacks


Filtering


Attackers use packet spoofing
A configurable packet filter may allow
trajectory sampling for a subset of packets
Probe Packets

Packet content may be constructed to
ensure sampling
Conclusions



Simple processing
No Router state required
Packets directly observed