
Mining Anomalies Using
Traffic Feature Distributions
Anukool Lakhina, Mark Crovella (cs.bu),
Christophe Diot (Intel)
SIGCOMM 2005
Reference



SIGCOMM 2004 – “Diagnosing Network-Wide
Traffic Anomalies”
SIGCOMM 2005 – “Mining Anomalies Using
Traffic Feature Distributions”
Authors:
Anukool Lakhina (Ph.D. @ Boston Univ.)
Mark Crovella (Professor @ Boston Univ.)
Christophe Diot (@ Intel Research Lab.)
2008/6/20
Speaker: Li-Ming Chen
Outline




Network-wide observation
Using subspace method to detect volume
anomalies (SIGCOMM’04)
Volume vs. Traffic Feature Distribution
(SIGCOMM’05)
Anomaly Diagnosis Methodology



Anomaly Detection
Anomaly Classification
Conclusion & comments
Anomaly Diagnosis

Is my network experiencing unusual conditions?


e.g., being attacked?, worm spreading?, equipment
outages?, misconfigurations? unknown…
Anomaly Diagnosis



Detection – is there an unusual event?
Identification – what is the best explanation?
Quantification – how serious is the problem?
Previous Work on Anomaly Detection

Largely focused on:
Point solutions for specific types of anomalies
  E.g., portscans, worms, DoS…
  Not a general approach
Single-link traffic data
  Not a network-wide view
Rule-based classification
  Not unsupervised
-> A general, unsupervised method for reliably
detecting and classifying network anomalies is
needed
Network-wide Observation

Study the proposed anomaly detection and
classification framework using sampled flow data
collected from all access links of backbone networks


Backbone networks: Abilene, Géant and Sprint
OD flow is the traffic that
enters at an origin PoP
and exits at a destination
PoP of a backbone network
PoP: Points of Presence
Volume Anomaly Detection:
Problem Statement

A volume anomaly is a
sudden change in an OD
flow


i.e., point to point traffic
Given link traffic
measurements, diagnose
the volume anomalies
Why care about OD Flows?
• If we only monitor traffic
on network links, volume
arising from an OD flow
may not be noticeable.
• Thus, naïve approach
won’t work if OD flow
info isn’t available.
• (Problem)
• A network with n PoPs
will have n^2 OD flows.
• -> OD flows are high
dimensional data…
Subspace Analysis of Link Traffic

Even if OD flow information is not available, and only link
traffic information is available, PCA can be applied and
subspace technique can detect volume anomalies



PCA: Principal Component Analysis
Link traffic info: data consist of time samples of traffic volumes at
all m links in the network
-> Y is the t x m traffic measurement matrix
-> An arbitrary row y of Y denotes one sample
Reasons:
Links share OD flows
Set of OD flows is also low dimensional
Use PCA to separate normal and anomalous traffic
The Subspace Method




An approach to separate normal from anomalous
traffic
Normal subspace, $\mathcal{S}$: space spanned by the first k
principal components
Anomalous subspace, $\tilde{\mathcal{S}}$: space spanned by the
remaining principal components
Then, decompose traffic on all links by projecting onto
$\mathcal{S}$ and $\tilde{\mathcal{S}}$ to obtain:
$\mathbf{y} = \hat{\mathbf{y}} + \tilde{\mathbf{y}}$
where $\mathbf{y}$ is the traffic vector of all links at a particular
point in time, $\hat{\mathbf{y}}$ is the normal traffic vector, and
$\tilde{\mathbf{y}}$ is the residual traffic vector
A Geometric Illustration
[Figure: traffic vector y plotted on axes "Traffic on Link 1" and "Traffic on Link 2"]
In general, anomalous traffic results in a large
value of the residual vector $\tilde{\mathbf{y}}$
Capture size of vector using squared
prediction error (SPE): $\mathrm{SPE} = \|\tilde{\mathbf{y}}\|^2$
Subspace Analysis Results
• Note that during anomaly,
normal component
doesn’t change that much
while residual component
changes quite a lot.
• Thus, anomalies can be
detected by setting some
threshold.
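The subspace steps on the preceding slides can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the routing-like matrix, the synthetic data, the injected anomaly and the median-based threshold are all assumptions made for the example (the slides only say "some threshold").

```python
import numpy as np

def subspace_spe(Y, k):
    """Return the squared prediction error (SPE) of every row of Y.

    Y is the t x m link-traffic matrix; k is the number of principal
    components assumed to span the normal subspace.
    """
    Yc = Y - Y.mean(axis=0)               # center each link's time series
    # Rows of Vt are the principal axes, ordered by captured variance.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                          # m x k basis of the normal subspace
    normal = Yc @ P @ P.T                 # projection onto the normal subspace
    residual = Yc - normal                # projection onto the anomalous subspace
    return np.sum(residual ** 2, axis=1)  # SPE = squared norm of the residual

# Synthetic example (not data from the paper): 3 OD flows routed over 8 links.
rng = np.random.default_rng(0)
t, k = 200, 3
routing = np.array([[1, 1, 0, 0, 1, 0, 0, 1],
                    [0, 1, 1, 0, 0, 1, 0, 1],
                    [0, 0, 1, 1, 0, 0, 1, 1]], dtype=float)
od_flows = rng.normal(size=(t, k))
Y = od_flows @ routing + 0.1 * rng.normal(size=(t, routing.shape[1]))
# Inject one anomalous time bin (hypothetical pattern).
Y[120] += np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)

spe = subspace_spe(Y, k)
threshold = 10 * np.median(spe)           # placeholder threshold, an assumption
anomalous_bins = np.flatnonzero(spe > threshold)
```

Because links share OD flows, the first k components capture the normal variation, and the injected bin shows up mainly in the residual.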
Outline




Network-wide observation
Using subspace method to detect volume
anomalies (SIGCOMM’04)
Volume vs. Traffic Feature Distribution
(SIGCOMM’05)
Anomaly Diagnosis Methodology



Anomaly Detection
Anomaly Classification
Conclusion & comments
Introduction

Challenges for automatically detecting and classifying
anomalies:




Anomalies are a moving target (can span a vast range of events)
New anomalies will continue to arise
Anomalies present in network-wide traffic data are buried like
needles in a haystack
Goal of this paper:
Seek methods that are able to detect a diverse and general set of
network anomalies
-> With high detection rate and low false alarm rate
Seek to mine the anomalies from the data by discovering and
interpreting the patterns present in network-wide traffic
Traffic Feature Distributions


Most anomalies share a common characteristic
Anomalies can be detected and distinguished by
inspecting traffic features:

4-tuple: SrcIP, SrcPort, DstIP, DstPort
Volume vs.
Traffic Feature Distribution

Volume based detection schemes have been
successful in isolating large traffic changes


But a large number of anomalies do NOT cause detectable
disruptions in traffic volume
Using traffic feature distribution


Augments volume-based anomaly detection
Traffic distributions can reveal valuable information
about the structure of anomalies

-> information which is not present in traffic volume
measures
Traffic Feature Distributions
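[Figure: histograms of # packets per destination port and per destination IP, for typical traffic and during a port scan]
Dest. ports during the port scan: ~450 new destination ports -> dispersed histogram, high entropy
Dest. IPs during the port scan: one destination (victim) dominates -> concentrated histogram, low entropy
Summarize using sample entropy of histogram X:
$H(X) = -\sum_{i=1}^{N} \left(\frac{n_i}{S}\right) \log_2 \left(\frac{n_i}{S}\right)$
where symbol i occurs $n_i$ times; S is total # of observations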
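As a minimal sketch of the sample entropy used on this slide, the function below computes it (in bits) from a list of observed feature values. The port lists are hypothetical and only illustrate that a dispersed histogram scores higher than a concentrated one.

```python
import math
from collections import Counter

def sample_entropy(values):
    """Sample entropy (in bits) of the empirical histogram of `values`."""
    counts = Counter(values)          # n_i: how often symbol i occurs
    total = sum(counts.values())      # S: total number of observations
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical destination ports seen in one time bin (illustrative only).
concentrated_ports = [80] * 6 + [443] * 2
dispersed_ports = list(range(1000, 1008))   # each port seen exactly once

h_concentrated = sample_entropy(concentrated_ports)
h_dispersed = sample_entropy(dispersed_ports)
```

The value ranges from 0 (all observations identical) up to log2 of the number of distinct symbols (all symbols equally frequent).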
Port scan anomalies viewed in terms
of traffic volume and in terms of
entropy
Port scan dwarfed
in volume metrics…
But stands out in
feature entropy,
which also reveals
its structure
Entropy based scheme



In volume based scheme, # of packets or bytes per time
slot was the variable.
In entropy based scheme, in every time slot, the entropy
of every traffic feature is the variable.
This gives us a three-way data matrix H.
H(t, p, k) denotes, at time t, the entropy of OD flow p for traffic feature k.
To apply the subspace method, we need to unfold it into a
single-way representation.
Multiway Subspace Method:
(Multi-way to single-way)
[Figure: the # timebins x # od-pairs x feature-types array H is unfolded by placing H(SrcIP), H(SrcPort), H(DstIP), H(DstPort) side by side, each of size # timebins x # od-pairs]
Decompose into a single-way matrix
Now apply the usual subspace decomposition (PCA)
Every row of the matrix will be decomposed into a normal part and a residual part: $\mathbf{h} = \hat{\mathbf{h}} + \tilde{\mathbf{h}}$
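The unfolding step can be sketched with a single array reshape. This is an illustrative sketch only: the feature order along the last axis (srcIP, srcPort, dstIP, dstPort) is an assumption, the toy values are arbitrary, and any per-feature normalization is left out.

```python
import numpy as np

def unfold(H):
    """Unfold a t x p x 4 entropy array into a t x 4p matrix.

    Feature order along the last axis is assumed to be
    srcIP, srcPort, dstIP, dstPort.
    """
    t, p, k = H.shape
    # Move the feature axis in front of the OD-flow axis, then flatten both,
    # so columns [j*p, (j+1)*p) hold feature j for all p OD flows.
    return H.transpose(0, 2, 1).reshape(t, k * p)

# Toy array: 5 time bins, 3 OD flows, 4 features (values are arbitrary).
H = np.arange(5 * 3 * 4, dtype=float).reshape(5, 3, 4)
H_flat = unfold(H)
```

Each row of the unfolded matrix then holds the entropy of every feature of every OD flow for one time bin, which is the form the subspace method expects.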
Comparing Entropy Detections with
Detections in Volume Metrics (1)
[Figure: scatter plot with regions "Found in Entropy Only", "Found in both metrics" and "Found in Volume Only"]
Points that lie to the right of the vertical line are volume-detected
anomalies and points that lie above the horizontal line are
detected in entropy.
Comparing Entropy Detections with
Detections in Volume Metrics (2)
Detection Rate by Injecting Real
Anomalies

Evaluation Methodology
Superimpose known anomaly traces into OD flows
Test sensitivity at varying anomaly intensities, by thinning trace
Results are averaged over a sequence of experiments
[Figure: detection-rate plot with labels 12%, 6.3%, 1.3%, 0.63%]
Classifying Anomalies by Clustering


Enables unsupervised classification
Each anomaly is a point in 4-D space:
h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)]
Questions:
Do anomalies form clusters in this space?
Are the clusters meaningful?
-> Internally consistent, Externally different
What can we learn from the clusters?
Use Hierarchical Agglomerative Algorithm for
determining clusters
Minimizes intra-cluster variation and maximizes inter-cluster
variation
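A bottom-up (agglomerative) clustering of such 4-D points might look like the sketch below. It is not the authors' code: the centroid-distance linkage, the fixed target number of clusters and the four sample vectors are all assumptions chosen to keep the example short.

```python
import math

def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain.

    Cluster distance is the Euclidean distance between centroids, a
    linkage chosen here for brevity (an assumption of this sketch).
    """
    clusters = [[i] for i in range(len(points))]   # start with singletons

    def centroid(members):
        dims = len(points[0])
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dims)]

    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]     # merge cluster b into a
        del clusters[b]
    return clusters

# Hypothetical entropy vectors h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)].
anomalies = [
    [0.9, -0.8, 0.1, 0.0],
    [1.0, -0.9, 0.0, 0.1],
    [-0.1, -0.9, 0.0, 1.0],
    [0.0, -1.0, 0.1, 0.9],
]
groups = agglomerative(anomalies, 2)
```

Each returned group is a list of indices into `anomalies`; here the two anomalies with high srcIP entropy end up together, as do the two with high dstPort entropy.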
Clustering Known Anomalies
(2-D view)
[Figure: clusters labeled "Code Red Scanning", "Multi source DOS attack" and "Single source DOS attack"]
Abilene anomaly clusters
(3-D view)
• Results of both clustering
algorithms are consistent
• Heuristics identify about
10 clusters in dataset
Anomaly Clusters in Abilene data
Conclusion


Feature distributions as summarized by entropy
are promising for general anomaly diagnosis
Network-Wide Detection:



Entropy significantly augments volume metrics
Highly sensitive: Detection rates of 90% possible,
even when anomaly is 1% of background traffic
Anomaly Classification:

Clusters are meaningful, and reveal new anomalies
Comments



The paper only discusses anomaly detection on
offline data. Can it be enhanced for online
anomaly detection?
We still need volume based detection because
feature distribution does not identify all
anomalies.
Can other fields in packet header be used for
anomaly detection?