Analyzing Network Traffic

Internet Measurement, Section 6.3
Analyzing Network Traffic
Mohammad Hassan Hajiesmaili
ECE Department, University of Tehran
Fall 2009
Outline

- Packet Capture
- Data Management
- Data Reduction
- Inference
Packet Capture

- Passive Traffic Measurement
- Packet Capture in General Purpose Systems
- Packet Capture in Special Purpose Systems
- Control Plane Traffic
Packet Capture in General Purpose Systems

- Libpcap
  - Promiscuous mode
- Packet filter
  - Report all packets received
  - Specify how much of each packet should be captured
- Parsing tools
  - Tcpdump
  - Ethereal
  - Commercial products
Packet Capture in General Purpose Systems

- Broadcast LANs
  - Libpcap
- Switched LANs
  - Port mirroring
- Unrestricted access
  - Pktd & Scriptroute
  - Access policies set by the system admin
Packet Capture in Special Purpose Systems

- Monitoring core network links
- In optical scenarios the link speed is faster than the PCI bus
  - Specialized network interface and interface driver
- Some examples:
  - OC3MON
  - OC12MON
Control Plane Traffic

- Capture control-packet traffic
- Example: a local view of the BGP system
  - Establishing a session with a BGP-speaking router
- Tools
  - GNU Zebra
  - Quagga
  - Can capture other routing traffic
    - OSPF
    - RIP
Data Management

- Full packet capture is challenging
  - Limited bus bandwidth
  - Limited memory access speed
  - Limited speed and capacity of disk arrays
  - Limited processing power
- Specialized tools
  - Use sophisticated algorithms for operating on large streams of data
  - Examples:
    - Smacq
    - Windmill
Data Management

- Database management problem
  - Very large data sets
  - Growing incrementally over time
  - Continuous queries
- Solution: Data Stream Management
  - Examples:
    - Tribeca
    - STREAMS
    - TelegraphCQ
    - Gigascope
      - Queries are expressed in GSQL
Looking at the Traffic

Too much data for a human. Do something smarter!
Data Reduction

- Traffic Counters
- Flow Capture
- Sampling
- Summarization
- Dimensionality Reduction
- Probabilistic Models
Counters

- Use aggregation to form time series of counts of traffic statistics
  - Bytes or packets per unit time
  - Generically called SNMP counts
- Time series are constructed by periodic polling
- Benefits:
  - Capture without much performance impact on the router
  - Extremely compact compared to traffic traces
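As a concrete illustration, the polling-and-differencing step can be sketched in a few lines; the 32-bit counter width and the single-wraparound correction are illustrative assumptions, not details from the slides:

```python
def counter_delta(prev, curr, width=32):
    """Difference between two successive polls of a cumulative
    SNMP-style counter, correcting for (at most one) wraparound.
    The 32-bit width is an illustrative assumption."""
    return (curr - prev) % (1 << width)

def byte_rate(prev, curr, interval_s, width=32):
    """Average bytes per second between two polls."""
    return counter_delta(prev, curr, width) / interval_s

# A counter that wrapped past 2**32 between two 30-second polls:
rate = byte_rate(4_294_967_000, 704, 30)   # 1000 bytes over 30 s
```

Differencing successive polls this way is what turns raw cumulative counts into the per-interval time series the slide describes.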
Counters - Drawbacks

- SNMP transport is via UDP
  - Measurement packets can be lost
- Difficult to obtain synchronized time series across multiple interfaces
- Too coarse-grained for many needs
Flow Capture

- Counters provide only basic information
  - Almost all traffic semantics are absent
- Instead, capture and store packet trains or flows
Packet Train

- A burst of packets arriving from the same source and heading to the same destination.
- If the spacing between two packets exceeds some inter-train gap, they are said to belong to different trains.
Capturing Packet Trains

- Can be used for:
  - Monitoring basic network activity
  - Monitoring users and applications
  - Network planning
  - Security analysis
  - Accounting and billing
- Tools for capturing are present in all major routers
Packet Train Record Content

- IP header (5-tuple):
  - Source IP address, source TCP port, destination IP address, destination port, protocol ID
- Start time
- End time
- Number of packets
- Number of bytes contained in the packet train
- Dramatic decrease in trace size compared to full packet capture
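A minimal sketch of building such records from a time-ordered packet list; the tuple layout and the one-second default gap are illustrative assumptions, not values from the slides:

```python
def packet_trains(packets, max_gap=1.0):
    """Group a time-ordered packet list into packet train records.
    A gap of more than max_gap seconds between packets of the same
    5-tuple starts a new train. Record: (5-tuple, start time,
    end time, packet count, byte count)."""
    open_trains = {}   # 5-tuple -> [start, end, n_packets, n_bytes]
    records = []
    for ts, sip, sport, dip, dport, proto, length in packets:
        key = (sip, sport, dip, dport, proto)
        train = open_trains.get(key)
        if train is not None and ts - train[1] <= max_gap:
            train[1] = ts          # extend the current train
            train[2] += 1
            train[3] += length
        else:
            if train is not None:  # gap exceeded: close the old train
                records.append((key, *train))
            open_trains[key] = [ts, ts, 1, length]
    records.extend((key, *train) for key, train in open_trains.items())
    return records
```

Each record carries only the fields listed above, which is where the dramatic reduction in trace size comes from.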
Packet Flows

- Capturing packet flows rather than packet trains
- Requires higher-level software for processing and interpreting the raw data
Flow Capture

- IETF standards for flow capture
- IP Flow Information Export (IPFIX)
  - Exporters: providers of flow data
  - Collectors: consumers of flow data
- Realtime Traffic Flow Measurement (RTFM)
  - A Meter MIB that can be accessed via SNMP
  - Meter readers: collect flow data
  - Managers: coordinate meters and meter readers
Sampling

- In a sampling scheme, a subset of packets is chosen for capture
- Two important questions:
  - How should packets be chosen for sampling?
  - How should one correct or compensate for the sampling process when performing analysis?
- Basic packet sampling
- Trajectory sampling
Basic Packet Sampling

- The sampling process is performed independently on each link being monitored
- Two categories:
  - Variable-rate sampling
  - Constant-rate sampling
    - Random sampling
    - Deterministic sampling
    - Stratified sampling
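Random sampling, together with the simplest compensation step of scaling sampled totals by 1/p, can be sketched as follows (the helper names are mine, not from the slides):

```python
import random

def sample_random(packets, p, seed=None):
    """Random sampling: keep each packet independently with probability p."""
    rng = random.Random(seed)
    return [pkt for pkt in packets if rng.random() < p]

def estimate_total(sampled_values, p):
    """Compensate for sampling when analyzing: scale the sampled sum by 1/p."""
    return sum(sampled_values) / p
```

Scaling by 1/p answers the second question on the previous slide for simple totals; more sophisticated estimators are needed for quantities that are not linear in the packet counts.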
Trajectory Sampling

- Basic packet sampling:
  - Packets are captured at multiple points in a network
  - Cannot obtain per-packet delay
- Trajectory sampling idea:
  - If a packet is chosen for sampling at any point in the network, it is chosen at all points in the network
- Implementation idea:
  - Compute a hash function over each packet
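The hash-based implementation idea can be sketched as follows; CRC-32 stands in for the sampling hash, and the threshold sets the effective sampling rate (both illustrative choices, not prescribed by the slides):

```python
import zlib

def selected(invariant_fields, threshold, modulus=1 << 16):
    """Trajectory-sampling style selection: every monitor applies the
    same hash to the packet fields that do not change hop-to-hop, so
    a given packet is either sampled at all points in the network or
    at none. threshold/modulus is the effective sampling rate."""
    return zlib.crc32(invariant_fields) % modulus < threshold
```

Because the decision depends only on packet content, monitors on different links agree on every packet, which is what makes per-packet trajectories (and hence per-packet delays) observable.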
Trajectory Sampling - Advantages

- Easy to obtain metrics on customer performance
  - Per-customer packet delay
- Detect routing loops
- Trace denial-of-service attacks
Summarization

- Form compact summaries of large volumes of data
  - Bloom filters
  - Sketches
  - Other approaches
Review: Bloom Filters

- Given a set S = {x1, x2, x3, …, xn} on a universe U, we want to answer queries of the form: Is y ∈ S?
- A Bloom filter provides an answer in
  - "Constant" time (time to hash)
  - A small amount of space
  - But with some probability of being wrong
- An alternative to hashing, with interesting tradeoffs
Bloom Filters

- Start with an m-bit array B, filled with 0s.
- Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.
- To check if y is in S, check B at Hi(y). All k values must be 1.
- Possible to have a false positive: all k values are 1, but y is not in S.
- (n items, m = cn bits, k hash functions)
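The insert/lookup procedure above can be sketched directly; deriving the k hash functions from SHA-256 with per-index salts is an illustrative choice, not something the slides prescribe:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hashes
    derived from SHA-256 with different salts."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)   # one byte per bit, for clarity

    def _positions(self, item):
        """The k array positions H1(item), ..., Hk(item)."""
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```

A lookup can return a false positive, but never a false negative: every bit set during insertion stays set.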
False Positive Probability

- Pr(a specific bit of the filter is 0) is
  p' = (1 - 1/m)^(kn) ≈ e^(-kn/m) ≡ p
- If r is the fraction of 0 bits in the filter, then the false positive probability is
  (1 - r)^k ≈ (1 - p')^k ≈ (1 - p)^k = (1 - e^(-k/c))^k
- Calculus finds the optimum at k = (ln 2) m/n.
- So the optimal false positive probability is about (0.6185)^(m/n).
- (n items, m = cn bits, k hash functions)
Example

- [Plot: false positive rate (0 to 0.1) vs. number of hash functions k (0 to 10), for m/n = 8.]
- Optimal k = 8 ln 2 = 5.54...
- (n items, m = cn bits, k hash functions)
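The false positive formula and the optimal k are easy to reproduce numerically; this sketch just evaluates the expressions from the previous slide:

```python
import math

def false_positive_prob(m, n, k):
    """Asymptotic Bloom filter false positive probability (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k

def optimal_k(m, n):
    """The k minimizing the false positive probability: (ln 2) m/n."""
    return math.log(2) * m / n

# m/n = 8, as in the example: optimal k = 8 ln 2 ≈ 5.545,
# giving a false positive rate near (0.6185)**8 ≈ 0.02.
k_opt = optimal_k(8000, 1000)
```

In practice k must be an integer, so one evaluates the formula at the two integers bracketing (ln 2) m/n and keeps the better one.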
Handling Deletions

- Bloom filters can handle insertions, but not deletions.
- [Figure: two items xi and xj whose hash positions overlap in the bit array B.]
- If deleting xi means resetting its 1s to 0s, then deleting xi will "delete" xj.
Counting Bloom Filters

- Start with an m-cell array of counters, filled with 0s.
- Hash each item xj in S k times. If Hi(xj) = a, add 1 to B[a].
- To delete xj, decrement the corresponding counters.
- A corresponding Bloom filter can be obtained by reducing the counters to 0/1.
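The counter variant can be sketched the same way as the plain filter; the SHA-256-with-salts hash family is again an illustrative choice:

```python
import hashlib

class CountingBloomFilter:
    """Counting Bloom filter sketch: counters instead of bits, so
    deletions are supported by decrementing."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [0] * m

    def _positions(self, item):
        """The k counter positions H1(item), ..., Hk(item)."""
        return [int.from_bytes(
                    hashlib.sha256(f"{i}:{item}".encode()).digest()[:8],
                    "big") % self.m
                for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.counts[pos] += 1

    def remove(self, item):
        for pos in self._positions(item):
            self.counts[pos] -= 1

    def __contains__(self, item):
        return all(self.counts[pos] > 0 for pos in self._positions(item))
```

Because deletion only decrements the counters an item itself incremented, removing one item no longer "deletes" another that shares positions with it.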
Counting Bloom Filters: Overflow

- Must choose counters large enough to avoid overflow.
- A Poisson approximation suggests 4 bits per counter.
  - Average load using k = (ln 2) m/n hash functions is ln 2.
  - Probability a counter has load at least 16:
    ≈ e^(-ln 2) (ln 2)^16 / 16! ≈ 6.78E-17
Bloom Filters in Networking

- Summarizing the contents of web caches to facilitate sharing cache content
- Peer-to-peer systems
- Routing
- Active queue management
- Topology discovery
- An extension: multistage Bloom filters for storage of multisets
Sketches

- x: histogram of flow counts observed
  - xi: number of packets observed for flow i
- Random lower-dimensional projection
  - For m << n, an m × n matrix P
  - Each column of P has a single 1 in a randomly chosen row; all other entries in the column are 0
  - Forming the product Px is equivalent to constructing a single counter set of the multistage filter
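The product Px never needs the full matrix: since each column of P holds a single 1, it is enough to remember which row each flow maps to. A pure-Python sketch (the helper names are mine):

```python
import random

def sketch_matrix(m, n, seed=0):
    """Random projection P (m rows, n columns): each column has a
    single 1 in a randomly chosen row, all other entries 0.
    Represented as the chosen row index for each column."""
    rng = random.Random(seed)
    return [rng.randrange(m) for _ in range(n)]

def apply_sketch(rows, x, m):
    """Compute Px: an m-entry counter set summarizing the length-n
    flow-count vector x."""
    counters = [0] * m
    for j, xj in enumerate(x):
        counters[rows[j]] += xj
    return counters
```

Since every column contributes exactly once, the projection preserves the total packet count while shrinking the vector from n entries to m.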
Sketches

- A dimension-reducing random projection
- Linear
  - Well suited to processing via wavelets
- Applications
  - Traffic compression
  - Traffic similarity detection
  - Heavy-hitter estimation
- Drawback
  - The set of keys encoded cannot easily be retrieved directly from the data structure
Summarization Approaches

- Trie data structure
  - Constructed based on address prefixes
  - Each node in the trie stores the traffic volume corresponding to all addresses contained in the prefix
- Approaches used to count the number of distinct values in a traffic trace
  - Probabilistic counting
  - Bitmap algorithms
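A minimal sketch of such a trie over binary address prefixes; the dict-of-dicts representation and helper names are mine, not from the slides:

```python
def add_traffic(trie, addr_bits, nbytes):
    """Insert one packet into a binary prefix trie: every node on the
    path from the root accumulates the volume for its prefix."""
    node = trie
    node["bytes"] = node.get("bytes", 0) + nbytes
    for bit in addr_bits:           # e.g. "00001010" for a /8 prefix
        node = node.setdefault(bit, {})
        node["bytes"] = node.get("bytes", 0) + nbytes

def prefix_volume(trie, prefix_bits):
    """Traffic volume for all addresses under the given prefix."""
    node = trie
    for bit in prefix_bits:
        if bit not in node:
            return 0
        node = node[bit]
    return node.get("bytes", 0)
```

A query for any prefix is then a single walk down the trie, so per-prefix volumes at every granularity come from one pass over the trace.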
Summarization Approaches

- Approaches that maintain a traffic summary over a window of time
  - Landmark window model
  - Sliding window model
Dimensionality Reduction

- Approaches for solving the problem of high dimensionality in traffic measurements
- Dimension reduction approaches tend to find an alternate representation of the data that exposes its true (low-dimensional) structure
  - Clustering
  - Principal Component Analysis
Who is using my link?
Looking at Traffic Aggregates

Rank  Destination IP        Traffic
1     jeff.dorm.bigU.edu    11.9%
2     tracy.dorm.bigU.edu   3.12%
3     risc.cs.bigU.edu      2.83%

Rank  Destination network   Traffic
1     library.bigU.edu      27.5%
2     cs.bigU.edu           18.1%
3     dorm.bigU.edu         17.8%

Rank  Application           Traffic
1     Web                   42.1%
2     Kazaa                 6.7%
3     Ssh                   6.3%

- Aggregating on individual packet header fields (source/destination IP, source/destination network, port, protocol) gives useful results, but:
  - Traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)
  - They cannot show aggregates defined over multiple fields (e.g. which network uses which application)
- The questions we want answered cut across fields: Where does the traffic come from? What applications are used? Most traffic goes to the dorms, but which network uses web and which one Kazaa?
- The traffic analysis tool should automatically find aggregates over the right fields at the right granularity
Ideal Traffic Report

Traffic aggregate                                            Traffic
Web traffic                                                  42.1%
Web traffic to library.bigU.edu                              26.7%
Web traffic from www.schwarzenegger.com                      13.4%
ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu   11.9%

- This is giving the network administrator insightful reports:
  - Web is the dominant application
  - The library is a heavy user of web traffic
  - That's a big flash crowd!
  - This is a Denial of Service attack!!
Clustering

- Similarity metrics
  - Defined on the set of traffic features
  - A specific form of vector quantization
- Challenges
  - Discovering a set of cluster definitions that succinctly describe the traffic
  - A search problem in a high-dimensional space
Principal Component Analysis

- Clustering
  - Nonlinear dimensionality reduction
- PCA
  - Linear
  - Optimal, in the sense of capturing maximum variability in the data using a minimum number of dimensions
Principal Component Analysis (PCA)

- A way to
  - identify "patterns" in data
  - express the data so as to highlight correlations, such as similarities and dissimilarities
- Why important?
  - It is hard to visualize the patterns in high-dimensional data
- How to take advantage of PCA
  - Compress the data by reducing the number of dimensions without, hopefully, losing much information
Background Mathematics

- Linear algebra
  - Matrix representation of data
- Statistics concepts
  - Mean: expectation of the data distribution
  - Covariance: spread of the data distribution
  - Build the covariance matrix (CM)
    - The covariance matrix tells us the correlations between dimensions of the data
      - CMij positive: as the ith dimension increases, so does the jth dimension
      - CMij negative: as the ith dimension increases, the jth dimension decreases
      - CMij = 0: no correlation between the dimensions
PCA (1)

- For any given dataset, PCA finds a new coordinate system that maps the maximum variability in the data to a minimum number of coordinates
- The new axes are called principal axes or principal components
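In two dimensions the principal components can be found in closed form from the 2 × 2 covariance matrix, which makes the "maximum variability" idea concrete. A sketch (real analyses would use a linear algebra library; the function names are mine):

```python
import math

def covariance(xs, ys):
    """Population covariance of two equal-length variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n

def principal_axes_2d(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix, largest first:
    the variance captured by each principal component."""
    a = covariance(xs, xs)
    b = covariance(xs, ys)
    d = covariance(ys, ys)
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 + disc, tr / 2 - disc
```

For perfectly correlated data the second eigenvalue is zero, so a single coordinate captures all the variability, which is exactly the low-dimensional structure PCA looks for.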
Principal Component Analysis

- Suppose that the original variables X1, X2, . . . , Xm form a coordinate system in m-dimensional space.
- Each variable Xi is an n × 1 vector, where n is the number of records.
- The standardized variable Zi is the n × 1 vector Zi = (Xi − µi)/σii, where µi is the mean and σii the standard deviation of Xi.
- In matrix notation: Z = (V^(1/2))^(−1)(X − µ), where V^(1/2) is a diagonal matrix (nonzero entries only on the diagonal).
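The standardization Zi = (Xi − µi)/σii can be sketched per variable; using the population standard deviation is an illustrative choice:

```python
import math

def standardize(column):
    """Z-standardize one variable: (Xi - mu_i) / sigma_i, so the
    result has mean 0 and variance 1."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    return [(x - mu) / sigma for x in column]
```

After this step every variable contributes on the same scale, which is what prevents a large-variance variable from dominating the components.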
Example

- We turn to the houses data set [3], which provides census information from all the block groups of the 1990 California census.
- In this data set, a block group has an average of 1425.5 people living in a geographically compact area. Block groups that contained zero entries for any of the variables were excluded.
- Median house value is the response variable; the predictor variables are:
  - Median income, housing median age, total rooms, total bedrooms, population, households, latitude, and longitude.
- The original data set had 20,640 records, of which 18,540 were selected randomly for a training data set and 2,100 held out for a test data set.
- A quick look at the variables is provided in Figure 1.1. ("Range" is Clementine's type label for continuous variables.)
- Median house value appears to be in dollars, but median income has been scaled to a continuous scale from 0 to 15.
- Note that longitude is expressed in negative terms, meaning west of Greenwich. Larger absolute values of longitude indicate geographic locations farther west.
Figure 1.1 shows the great disparity in variability among the variables. Median income has a standard deviation less than 2, while total rooms has a standard deviation over 2,100. If we proceeded to apply principal components analysis without first standardizing the variables, total rooms would dominate median income's influence.
X1 = median income, X2 = housing median age, . . . , X8 = longitude, so m = 8 and n = 18,540.
For example, for the first block group the median house value is $452,600, the median income is 8.325, and so on.
The variables have been standardized by z-standardization: Zi = (Xi − µi)/σii.
Diagonally from left to right, the standardized variables are minc-z (median income), hage-z (housing median age), rooms-z (total rooms), bedrms-z (total bedrooms), popn-z (population), hhlds-z (number of households), lat-z (latitude), and long-z (longitude).
The table and graph show strong evidence of multicollinearity in the data set.
If we performed a multiple regression analysis of median housing value on the predictors, the regression results would be quite unstable, with tiny shifts in the predictors leading to large changes in the regression coefficients.
We therefore need PCA (principal components analysis) to identify the components underlying the correlated variables.
The principal components can then be used for further analysis, such as regression analysis, classification, and so on.
Specific Questions

- Are there low-dimensional representations for a set of OD flows?
- Do OD flows share common features?
- What do the features look like?
- Can we get a high-level understanding of a set of OD flows in terms of these features?
PCA on OD Flows

- The set of packets entering a network at a given point and exiting at another point is called an origin-destination (OD) flow
- The set of flows mapped onto a single principal component is called an eigenflow; V is a new basis for X
Empirical Studies

- Find the intrinsic dimensionality of OD flows
Inference

- Hidden data is a significant challenge in traffic analysis
- Inference can play a role in recovering missing information about Internet structure
- Traffic Matrix Estimation
  - A key summary of a network's workload is its traffic matrix
  - One approach to obtaining the traffic matrix is via inference
Associated Matrices

- Incidence matrix of G = (V, E)
  - An n × n matrix, with n = |V|
- Routing matrix A
  - Rows indexed by links (a, b, c, d, e); columns indexed by all-pairs paths
  - A[i][j] = 1 if path j traverses link i, and 0 otherwise
- [Figure: an example five-link network and its 0/1 routing matrix A.]
Applications of Routing Matrix (1)

- An origin-destination (OD) flow contributes load to every link on its path
- Given the vector x of OD flow volumes, the vector of link loads is y = Ax
- [Figure: the example network and routing matrix A; multiplying A by an example flow vector x gives link loads y = (4, 4, 2, 2, 4).]
The Problem

- Want to compute the traffic xj along route j from measurements on the links, yi
- Example: three routes through one router over three links, where routes 1 and 3 share link 1, so y1 = x1 + x3:

  [y1]   [1 0 1] [x1]
  [y2] = [1 1 0] [x2]
  [y3]   [0 1 1] [x3]

- In general, y = Ax
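Plugging the example matrix into a tiny matrix-vector product shows how link loads arise from route traffic (the flow volumes are made-up numbers):

```python
def matvec(A, x):
    """Link loads y = A x for a 0/1 routing matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# The three-route example: y1 = x1 + x3, y2 = x1 + x2, y3 = x2 + x3.
A = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
x = [3.0, 1.0, 2.0]        # traffic on routes 1, 2, 3
y = matvec(A, x)           # link loads [5.0, 4.0, 3.0]
```

In this toy example A happens to be square and invertible, so x could be recovered exactly from y; in real networks there are far more OD pairs than links, which is what makes the inverse problem underconstrained.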
Underconstrained Linear Inverse Problem

- y = Ax
  - y: link measurements
  - A: routing matrix
  - x: traffic matrix
- Many more unknowns than measurements
Traffic Matrix Estimation

- Given a set of link counts y, infer the set of OD flow counts x
- In a typical network, the dimension of y is much less than the dimension of x
  - Traffic matrix estimation is an ill-posed linear inverse problem
  - The constraints are not sufficient to dictate a single solution
  - There are infinitely many solutions consistent with the observations
Direct Tomographic Approach

- Under-constrained problem: find additional constraints
- Use a model to do so
  - The typical approach is to use higher-order statistics of the traffic to find additional constraints
- Disadvantages
  - Complex algorithms that don't scale (~1000 nodes, 10000 routes)
  - Reliance on higher-order statistics is not robust, given the problems in SNMP data
  - The model may not be correct, which leads to problems
  - Inconsistency between model and solution
Regularization

- Want a solution that satisfies the constraints y = Ax
  - Under-constrained system: many solutions satisfy the equations
  - Must somehow choose the "best" solution
- Such (ill-posed linear inverse) problems occur in
  - Medical imaging (e.g. CAT scans)
  - Seismology
  - Astronomy
- Statistical intuition leads to regularization
  - Penalty function J(x)
  - Solution:
    argmin_x ||y − Ax||^2 + λ^2 J(x)
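A sketch of regularized estimation on the toy three-link example, with the simplest penalty J(x) = ||x||^2 and plain gradient descent (both illustrative choices; tomogravity instead penalizes distance from a gravity model):

```python
def estimate_traffic(A, y, lam=0.1, steps=5000, lr=0.05):
    """argmin_x ||y - Ax||^2 + lam^2 ||x||^2 by gradient descent,
    keeping flow volumes nonnegative after each step."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # residual r = Ax - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i]
             for i in range(m)]
        # gradient step on each coordinate, then project to x >= 0
        x = [max(x[j] - lr * (2 * sum(A[i][j] * r[i] for i in range(m))
                              + 2 * lam ** 2 * x[j]), 0.0)
             for j in range(n)]
    return x
```

The penalty term is what selects one solution out of the many that satisfy (or nearly satisfy) y = Ax; changing J(x) changes which solution is called "best".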
How Does This Relate to Other Methods?

- Previous methods are just particular cases of J(x)
- Tomogravity (Zhang, Roughan, Greenberg and Duffield)
  - J(x) is a weighted quadratic distance from a gravity model
- A very natural alternative
  - Start from a penalty function that satisfies the "maximum entropy" principle
  - Minimum Mutual Information