Analyzing network traffics
Download
Report
Transcript Analyzing network traffics
Internet Measurement, Section 6.3
Analyzing Network Traffics
Mohammad Hassan Hajiesmaili
ECE Department, University of Tehran
Fall 2009
Outline
Packet Capture
Data Management
Data Reduction
Inference
2
Packet Capture
Passive Traffic Management
Packet Capture in General Purpose Systems
Packet Capture in Special Purpose Systems
Control Plane Traffic
3
Packet Capture in General Purpose Systems
Libpcap
Promiscuous mode
Packet Filter
Report all packet received
Specify how much of each packet should be captured
Parsing tools
Tcpdump
Ethereal
Commercial products
4
Packet Capture in General Purpose Systems
Broadcast LANs
Switched LANs
Libpcap
Port Mirroring
Unrestricted Access
Pktd & scriptroute
Set Access policies by system Admin
5
Packet Capture in Special Purpose Systems
Monitoring Core Network Links
In Optical Scenarios the link speed is faster
than PCI bus
Specialized Network Interface and Interface Driver
Some Example
OC3MON
OC12MON
6
Control Plane Traffic
Capture control packet traffic
Example
Local view of BGP system
Establishing a session with a BGP-speaking router
Tools
GNU Zebra
Quagga
Can capture other routing traffic
OSPF
RIP
7
Data Management
Full Packet Capture is challenging
Limited bus bandwidth
Limited memory access speed
Limited speed and capacities of disk array
Limited processing power
Specialized tools
Using sophisticated algorithms for operating on large
stream of data
Example:
Smacq
Windmill
8
Data Management
Database management problem
Very large data sets
Incrementally over time
Continuous queries
Solution
Data Stream Management
Example:
Tribeca
STREAMS
TelegraphCQ
Gigascope
Queries are expresses in GSQL
9
Looking at the traffic
Too much data for
a human
Do something
smarter!
10
Data Reduction
Traffic Counters
Flow Capture
Sampling
Summarization
Dimensionality Reduction
Probabilistic Model
11
Counter
Use aggregation to form time series of counts
of traffic statistics
Time series are constructed by periodic polling
Bytes or packets per unit time
Generically called SNMP counts
Benefits:
Capturing without much performance impact on
router
Extremely compact compared to traffic traces
12
Counters - Drawback
SNMP transport is via UDP
Measurement packets can be lost
Difficulty in obtaining synchronized time
series across multiple interfaces
Too coarse-grained for many needs
13
Flow Capture
Counters provide basic information
Almost all traffic semantics are absent
Capture and store packet trains or flows
14
Packet Train
A burst of packets arriving from the same
source and heading to the same destination.
If the spacing between two packets exceeds
some inter-train gap, they are said to belong
to different trains.
15
Capture Packet Train
Can be used for
Monitoring basic network activity
Monitoring users and applications
Network planning
Security analysis
Accounting and Billing
Tools for capturing are present in all major
routers
16
Packet Train Record Content
IP header (5-tuple)
Source IP address, Source TCP port, Destination IP
address, Destination port, Protocol ID
Start Time
End Time
Number of Packet
Number of bytes contained in the packet train
Dramatic decrease in trace size compare to full
packet capture
17
Packet Flows
Capturing packet flows rather than packet
trains
Require higher level software for processing
and interpreting the raw data
18
Flow Capture
IETF standards for flow capture
IP Flow Information Export effort (IPFIX)
For Exporters: Providers of flow data
For Collectors: Consumers of flow data
Real Time Flow Metering (RTFM)
Meter MIB that can be accessed via SNMP
Meter Readers: Collect flow data
Managers: Coordinate meters and meter readers
19
Sampling
In sampling scheme, a subset of packets are
chosen for capture
Two important question
How should packet be chosen for sampling?
How should one correct or compensate for the
sampling process when performing analysis?
Basic packet sampling
Trajectory sampling
20
Basic Packet Sampling
The sampling process is performed
independently on each link being monitored
Two category:
Variable rate sampling
Constant rate sampling
Random sampling
Deterministic sampling
Stratified sampling
21
Trajectory Sampling
Basic Packet Sampling
Trajectory Sampling Idea
Packet capture at multiple points in a network
Can not obtain per-packet delay
If a packet is chosen for sampling at any point in
the network, it is chosen at all point in the
network.
Idea of implementation
Calculation of hash function for each packet
22
Trajectory Sampling - Advantages
Easy to obtain metrics on customer performance
Per-customer packet delay
Detect routing loops
Trace denial of service attacks
23
Summarization
Form compact summaries of large volume of
data
Bloom Filter
Sketches
Other Approaches
24
Review: Bloom Filters
Given a set S = {x1,x2,x3,…xn} on a universe U,
want to answer queries of the form:
Is y S .
Bloom filter provides an answer in
“Constant” time (time to hash).
Small amount of space.
But with some probability of being wrong.
Alternative to hashing with interesting
tradeoffs.
25
Bloom Filters
Start with an m bit array, filled with 0s.
B
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
1
1
0
1
0
Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.
B
0
1
0
0
1
0
1
0
0
1
1
1
0
To check if y is in S, check B at Hi(y). All k values must be 1.
B
0
1
0
0
1
0
1
0
0
1
1
1
0
Possible to have a false positive; all k values are 1, but y is not in S.
B
0
n items
1
0
0
1
0
1
0
m = cn bits
0
1
1
1
0
1
k hash functions
26
False Positive Probability
Pr( specific bit of filter is 0) is
p ' (1 1 / m) kn e kn / m p
If r is fraction of 0 bits in the filter then false
positive probability is
(1 r ) k (1 p' ) k (1 p) k (1 e k / c ) k
Find optimal at k = (ln 2)m/n by calculus.
n items
So optimal FPP is about (0.6185)m/n
m = cn bits
k hash functions
27
Example
False positive rate
0.1
0.09
0.08
m/n = 8
0.07
0.06
0.05
0.04
0.03
Opt k = 8 ln 2 = 5.45...
0.02
0.01
0
0
1
2
3
4
5
6
7
8
9
10
Hash functions
n items
m = cn bits
k hash functions
28
Handling Deletions
B
Bloom filters can handle insertions, but not
deletions.
0
1
0
0
1
0
xi
xj
1
0
0
1
1
1
0
1
1
0
If deleting xi means resetting 1s to 0s, then
deleting xi will “delete” xj.
29
Counting Bloom Filters
Start with an m bit array, filled with 0s.
B
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
2
1
0
1
1
0
1
1
0
Hash each item xj in S k times. If Hi(xj) = a, add 1 to B[a].
B
0
3
0
0
1
0
2
0
0
3
2
1
0
To delete xj decrement the corresponding counters.
B
0
2
0
0
0
0
2
0
0
3
2
1
0
Can obtain a corresponding Bloom filter by reducing to 0/1.
B
0
1
0
0
0
0
1
0
0
1
1
1
0
30
Counting Bloom Filters: Overflow
Must choose counters large enough to avoid
overflow.
Poisson approximation suggests 4
bits/counter.
Average load using k = (ln 2)m/n counters is ln 2.
Probability a counter has load at least 16:
ln 2
e (ln 2) / 16! 6.78E 17
16
31
Bloom Filters in Networking
Summarizing the contents of web caches to
facilitate sharing cache content
Peer-to-Peer Systems
Routing
Active Queue Management
Topology Discovery
An Extension
Multistage bloom filters for storage of multisets
32
Sketches
X: histogram of flow counts observed
Xi: number of packets observed for flow i
Random lower-dimension projection
For m << n
m*n Matrix P
P has a single 1 in a randomly chosen row
All other entries in the column are 0
Forming the product Px is equivalent to
constructing a single counter set of the multistage
filter
33
Sketches
Dimension reducing random projection
Linear
Applications
Well suited to processing via wavelets
Traffic compression
Traffic similarity detection
Heavy-hitter estimation
Drawback
The set of keys encoded cannot easily be retrieved
directly form data structure
34
Summarization approaches
Trie data structure
Constructed based on address prefix
Each node in the tire stores the traffic volume
corresponding to all addresses contained in the
prefix
Approaches used to count the number of
distinct values in a traffic trace
Probabilistic counting
Bitmap algorithms
35
Summarization approaches
Approaches that maintain traffic summary for
a while
Landmark window model
Sliding window model
36
Dimensionality Reduction
Approaches for solving the problem of high
dimensionality in traffic measurements.
Dimension reduction approaches
Tend to find an alternate representation of data
that exposes the true (low-dimensional) structure
in the data
Clustering
Principal Component Analysis
37
Who is using my link?
38
Looking at traffic aggregates
Src.
Dest.
IPIP
Dest.
Source
IP port
Src.
Dest.
netnet
Dest. net
Rank
Destination IP
Src. port
Dest. port
Traffic
Aggregating on
individual
packet
header
Rank
Source
port
Traffic11.9%
1
jeff.dorm.bigU.edu
fields gives useful
results but Which network
Rank Destination network Traffic
Protocol
42.1%
2 are notWeb
tracy.dorm.bigU.edu
3.12%
uses
web
Traffic reports1
always
at the
right
Where
does
theand
What apps
granularity (e.g.
individual
IP
address,
subnet,
1
library.bigU.edu
27.5%
which
one
3
risc.cs.bigU.edu
2.83%
come
from?
2
Kazaatraffic
6.7%
etc.)
are used?
cs.bigU.edu
18.1%
……kazaa?
3aggregates
Sshdefined
Cannot show 2
over6.3%
multiple
fields (e.g. which
uses which application)
3 network
dorm.bigU.edu
17.8%
The traffic analysis tool should automatically
find aggregates over the
right
fields
attothe
Most
traffic
goes
right granularity
the dorms …
39
Ideal traffic report
Traffic aggregate
Traffic
Web traffic
42.1%
Web traffic to library.bigU.edu
26.7%
Web traffic from www.schwarzenegger.com
13.4%
ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu
11.9%
Web is the dominant
This paper
is about
the network
application
Thisisisgiving
aheavy
Denial
of
The library
a
That’s a traffic
big flash
administrator
insightful
Service attack !! reports
user of web
crowd!
40
Clustering
Similarity metrics
Defined on the set of traffic features
Specific form of Vector Quantization
Challenges
Discovering a set of cluster definitions that
succinctly describe the traffic
Search problem in high dimensional space
41
Principal Component Analysis
Clustering
Nonlinear dimensionality reduction
PCA
Linear
Optimal, in the sense of capturing maximum
variability in the data using a minimum number of
dimension
42
Principal Component Analysis (PCA)
Away to
Why important?
identifying “patterns” in data
Expressing the data in order to highlight the
correlations such as similarities and dissimilarities
Hard to visualize the patterns of high dimensional data
How to take advantages of PCA
Compressing Data by reducing the number of
dimensions without “hopefully” much losing of data
information
43
Background Mathematics
Linear Algebra
Matrix representation of Data
Statistics Concepts
Mean – Expectation of the data distribution
Covariance – Sparseness of data distribution
Build Covariance Matrix (CM)
Covariance Matrix tells us the correlations of data between
dimensions of data
CMij = Positive -> ith dimension increased, so does jth
dimension
CMij = Negative -> ith dimension increased, jth dimension
decreased
CMij = 0 -> No correlation, which means Independency
45
PCA (1)
For any given
dataset, PCA finds
a new coordinate
system that maps
maximum variability
in the data to a
minimum number of
coordinates
New axes are called
Principal Axes or
Components
48
Principal Component Analysis
Suppose that the original variables X1, X2, . . . ,
Xm form a coordinate system in m-dimensional
space.
Each variable Xi represent an n × 1 vector,
where n is the number of records.
Standardized variable Zi is the n × 1 vector,
where Zi = (Xi − µi )/σii , µi is the mean of Xi ,
and σii is the standard deviation of Xi
In matrix notation: Z = (V1/2)−1(X − µ), and V1/2 is
a diagonal matrix (nonzero entries only on the
diagonal)
49
50
51
52
53
Example
We turn to the houses data set [3], which provides census information from all the
block groups from the 1990 California census.
For this data set, a block group has an average of 1425.5 people living in an area that
is geographically compact. Block groups that contained zero entries for any of the
variables were excluded.
Median house value is the response variable;
the predictor variables are:
Median income Population
Housing median age Households
Total rooms Latitude
Total bedrooms Longitude
The original data set had 20,640 records, of which 18,540 were selected randomly for
a training data set, and 2100 held out for a test data set.
A quick look at the variables is provided in Figure 1.1. (“Range” is Clementine’s type
label for continuous variables.)
Median house value appears to be in dollars, but median income has been scaled to
a continuous scale from 0 to 15.
Note that longitude is expressed in negative terms, meaning west of Greenwich.
Larger absolute values for longitude indicate geographic locations farther west.
55
Figure 1.1: the great disparity in variability among the variables. Median
income has a standard deviation less than 2, while total rooms has a
standard deviation over 2100. If we proceeded to apply principal
components analysis without first standardizing the variables, total rooms
would dominate median income’s influence,
56
X1 = median income, X2 = housing median age, . . . , X8 = longitude,
so m = 8 and n = 18,540.
for example, for the first block group, the median house value is $452,600, the
median income is 8.325, so on.
57
Variables have been standardized by z-standard. Zi = (Xi − µi ) /σii ,
Diagonally from left to right, the standardized variables
minc-z (median income), hage-z (housing median age), rooms-z (total
rooms), bedrms-z (total bedrooms), popn-z (population), hhlds-z (number of
households),lat-z (latitude), and long-z (longitude).
58
59
Table and graph show strong evidence for multicollinearity in the data set
if we perform, a multiple regression analysis of median housing value on the
predictors, the regression results would become quite unstable, with tiny shifts
in the predictors leading to large changes in the regression coefficients.
Now we need PCA (Principal components analysis) to identify the
components underlying the correlated variables.
Then the principal components can be used for further analysis such as in
regression analysis, classification, and so on.
60
61
Specific Questions
Are there low dimensional representations for
a set of OD flows?
Do OD flows share common features?
What do the feature look like?
Can we get a high-level understanding of a
set of OD flows in terms of these features?
62
PCA on OD flows
The Set of packets entering a network at a
given point and exiting at another point is
called an origin-destination flow
Set of flows mapped onto a single PC is
called an Eigen flow. V is a new basis for X
64
Empirical studies
Find intrinsic dimensionality of OD flows
67
Inference
Hidden data is significant challenge in traffic
analysis
Inference can play a role in missing
information about Internet structure
Traffic Matrix Estimation
A key summary of a network’s workload is its
traffic matrix
One approach to obtaining traffic matrix is via
inference
68
Associated Matrices
Incidence matrix of G=(V,E)
n×n matrix with (n=|V|)
c
a
b
d
e
Routing matrix
All-pairs paths
a 1
b 0
A c 0
d 0
e 0
1
0
1
0
0
1
1
0
0
0
1
1
0
0
1
0
0
1
0
0
0
1
0
0
0
0
1
0
0
1
0
0
0
1
0
0
0
0
1
1
0
0
0
0
1
69
Applications of Routing Matrix (1)
Origin-destination
(OD) flow
c
a
b
d
e
y Ax
a 1
b 0
c 0
d 0
e 0
1
0
1
0
0
1
1
0
0
0
1
1
0
0
1
0
0
1
0
0
0
1
0
0
0
0
1
0
0
1
0
0
0
1
0
0
0
0
1
1
1
1
1
0 4
1
0 4
1
0 2
1
0 2
1
1 4
1
1
1
70
The problem
1
y1 x1 x3
route 1
route 3 router
route 2
3
71
2
Want to compute the traffic xj along
route j from measurements on the
links, yi
y1 1 0 1x1
y 2 1 1 0x 2
y 3 0 1 1x 3
The problem
1
y1 x1 x3
route 1
route 3 router
route 2
3
2
Want to compute the traffic xj along
route j from measurements on the
links, yi
y = Ax
72
Underconstrained
linear inverse problem
Traffic matrix
y = Ax
Link
measurements
Routing matrix
Many more unknowns than measurements
73
Traffic Matrix Estimation
Given a set of link counts y
Infer the set of OD flow counts x
In typical network, the dimension of y is much
less than the dimension of x
The traffic matrix
estimation is an ill-posed
linear problem
The constraints
to dictate a
are not sufficient
single solution
There are infinitely many solutions that are
consistent with the observations
74
Direct Tomographic approach
Under-constrained problem
Find additional constraints
Use a model to do so
Disadvantage
75
Typical approach is to use higher order statistics of
the traffic to find additional constraints
Complex algorithm – doesn’t scale (~1000 nodes,
10000 routes)
Reliance on higher order stats is not robust given the
problems in SNMP data
Model may not be correct -> result in problems
Inconsistency between model and solution
Regularization
Want a solution that satisfies constraints: y = Ax
Such (ill-posed linear inverse) problems occur in
Under constrained system
Many solutions satisfy the equations
Must somehow choose the “best” solution
Medical imaging: e.g CAT scans
Seismology
Astronomy
Statistical intuition => Regularization
Penalty function J(x)
solution:
arg min y Ax 2 J x
x
2
76
How does this relate to other methods?
Previous methods are just particular cases of J(x)
Tomogravity (Zhang, Roughan, Greenberg and Duffield)
J(x) is a weighted quadratic distance from a gravity model
A very natural alternative
Start from a penalty function that satisfies the
“maximum entropy” principle
Minimum Mutual Information
77