
Sensitivity of PCA for Traffic Anomaly Detection
Evaluating the robustness of current best practices
Haakon Ringberg (1), Augustin Soule (2), Jennifer Rexford (1), Christophe Diot (2)
(1) Princeton University, (2) Thomson Research
Outline
- Background and motivation
  - Traffic anomaly detection
  - PCA and subspace approach
- Problems with methodology
- Conclusion & future directions
A network in the Internet
[Diagram: a network connected to several neighboring ASes]
Network anomalies
[Diagram: example anomalies hitting the network, such as a "March Madness" flash crowd, botnet traffic, and "VIAGRA" spam]
We want to be able to detect these anomalies!
Network anomaly detectors
[Diagram: an anomaly detector watching the network and reporting "We're good!"]
- Monitor the health of the network
- Real-time reporting of anomalies
Principal Components Analysis (PCA): benefits
- Network-wide analysis: finds correlations across multiple links
- Demonstrated ability to detect a wide variety of anomalies [Lakhina SIGCOMM'04, Lakhina IMC'04]
- PCA plus the subspace methodology acts as the anomaly detector
- We use the same software
[Diagram: a PCA-based anomaly detector monitoring a network in which one AS is the victim]
Principal Components Analysis (PCA)
- PCA transforms the data into a new coordinate system
- The principal components (the new bases) are ordered by captured variance
- The first k tend to capture periodic trends
  - normal subspace vs. anomalous subspace
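For concreteness, a minimal sketch of this decomposition in numpy; the data here is a random stand-in rather than real traces, and the sizes and the top-k value are illustrative:

    import numpy as np

    # X is a (time bins x links) traffic matrix; random stand-in data, not real traces.
    rng = np.random.default_rng(0)
    T, L = 2016, 77                       # e.g. one week of 5-minute bins over 77 links
    X = rng.normal(size=(T, L))

    # Center each link's time series and diagonalize the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = (Xc.T @ Xc) / T
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # The principal components (columns of eigvecs) are the new bases,
    # ordered by the fraction of variance they capture.
    explained = eigvals / eigvals.sum()
    k = 4                                             # the "topk" normal subspace size
    print(f"variance captured by the first {k} PCs: {explained[:k].sum():.2%}")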
Pictorial overview of the subspace methodology
1. Training: separate normal & anomalous traffic patterns
2. Detection: find spikes
3. Identification: find the original spatial location that caused the spike (e.g. router, flow)
[Diagram: PCA splits the network signal into a normal component and an anomalous component]
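A minimal sketch of the three steps, assuming a (time bins x links) traffic matrix; the 3-sigma cutoff below is a toy stand-in for the Q-statistic threshold used in the published subspace method:

    import numpy as np

    rng = np.random.default_rng(0)
    T, L, k = 2016, 77, 4                             # illustrative sizes; k is the topk knob
    X = rng.normal(size=(T, L))                       # stand-in for real link traffic
    Xc = X - X.mean(axis=0)

    # 1. Training: PCA on the traffic matrix; the top-k PCs span the "normal" subspace.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                      # (L, k) basis of the normal subspace

    # 2. Detection: project each time bin onto the anomalous subspace and flag spikes
    #    in the residual energy (toy threshold; the real method uses the Q-statistic).
    C_anom = np.eye(L) - P @ P.T
    spe = np.einsum('ti,ij,tj->t', Xc, C_anom, Xc)    # squared prediction error per bin
    alarms = np.where(spe > spe.mean() + 3 * spe.std())[0]

    # 3. Identification: map each flagged bin back to the links/flows driving the spike.
    for t in alarms[:5]:
        residual = C_anom @ Xc[t]
        print(f"bin {t}: largest residual contribution on link {np.abs(residual).argmax()}")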
Pictorial overview of problems with the subspace methodology
- Defining normalcy can be challenging
  - Tunable knobs (topk)
  - Contamination
- PCA's coordinate remapping makes it difficult to identify the original location of an anomaly
[Diagram: the same PCA pipeline, annotated with the topk cutoff]
Data used
- Géant and Abilene networks
- IP flow traces
- 21/11 through 28/11 2005
- Anomalies were manually verified
Outline
- Background and motivation
- Problems with approach
  - Sensitivity to its parameters
  - Contamination of normalcy
  - Identifying the location of detected anomalies
- Conclusion & future directions
Sensitivity to topk
- PCA separates normal from anomalous traffic patterns
- This works because the top PCs tend to capture periodic trends
- ...and a large fraction of the variance
[Diagram: PCA splitting the signal into normal and anomalous components]
Sensitivity to topk
- Where is the line drawn between normal and anomalous?
- What is too anomalous?
[Diagram: the topk cutoff between the normal and anomalous subspaces]
Sensitivity to topk
Very sensitive to the number of principal components included!
Sensitivity to topk
- Sensitivity wouldn't be an issue if we could tune the topk parameter
- We've tried many different methods:
  - 3σ deviation heuristic
  - Cattell's Scree Test
  - Humphrey-Ilgen
  - Kaiser's Criterion
- None are reliable
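As a sketch of what such tuning rules look like, here are toy versions of two of them (Kaiser's Criterion and the 3σ deviation heuristic); the details are illustrative, not the exact implementations that were evaluated:

    import numpy as np

    def choose_topk(Xc):
        """Toy versions of two selection rules; Xc is a centered (time x links) matrix.
        The other heuristics named on this slide follow the same pattern, and none of
        these rules gives a reliably good topk in practice."""
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        eigvals = s ** 2 / len(Xc)

        # Kaiser's Criterion (covariance variant): keep components whose eigenvalue
        # exceeds the average eigenvalue.
        kaiser_k = int(np.sum(eigvals > eigvals.mean()))

        # 3-sigma deviation heuristic: walk the PCs in order and stop at the first one
        # whose per-bin projection deviates from its mean by more than three std devs.
        proj = Xc @ Vt.T
        z = np.abs(proj - proj.mean(axis=0)) / proj.std(axis=0)
        exceeds = np.any(z > 3, axis=0)
        three_sigma_k = int(np.argmax(exceeds)) if exceeds.any() else proj.shape[1]

        return kaiser_k, three_sigma_k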
Contamination of normalcy
- What happens to large anomalies?
  - They capture a large fraction of the variance
  - Therefore they are included among the top PCs
- This invalidates the assumption that the top PCs need to be periodic
- It pollutes the definition of normal
- In our study, the outage pictured affected 75 of 77 links
  - It was only detected on a handful of them!
[Diagram: the PCA pipeline alongside a plot of the outage]
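A synthetic illustration of this contamination effect (not the study's data): one very large event across 75 of 77 links ends up dominating the first principal component, so it is absorbed into the "normal" subspace rather than flagged:

    import numpy as np

    # Synthetic data: diurnal traffic on 77 links plus one large event that hits
    # 75 of them in a single time bin.
    rng = np.random.default_rng(1)
    T, L = 2016, 77
    t = np.arange(T)[:, None]
    phases = rng.uniform(0, 2 * np.pi, size=L)
    X = np.sin(2 * np.pi * t / 288 + phases) + 0.1 * rng.normal(size=(T, L))
    X[1000, :75] += 50.0                               # the anomaly

    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # If the event is large enough, the direction it defines captures more variance
    # than the periodic structure, so it lands in the first PC and pollutes "normal".
    spike_dir = np.zeros(L)
    spike_dir[:75] = 1.0
    spike_dir /= np.linalg.norm(spike_dir)
    print("overlap between PC1 and the spike direction:", abs(Vt[0] @ spike_dir))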
Identifying anomaly locations
- Spikes appear when the state vector is projected onto the anomaly subspace
- But network operators don't care about this
- They want to know where it happened!
- How do we find the original location of the anomaly?
[Diagram: the state vector and its projection onto the anomaly subspace]
Identifying anomaly locations
- Previous work used a simple heuristic:
  - Associate a detected spike with the k flows that contribute most to the state vector v
  - There is no clear a priori reason for this association
[Diagram: the state vector, the anomaly subspace, and locations A and B in the network]
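A toy rendering of that heuristic, assuming a state vector x for a flagged time bin and a normal-subspace basis P as in the earlier sketches (names are illustrative):

    import numpy as np

    def top_contributors(x, P, n=5):
        """Rank the original dimensions (links or flows) by the size of their
        contribution to the projection of x onto the anomaly subspace."""
        residual = x - P @ (P.T @ x)                   # part of x in the anomaly subspace
        return np.argsort(np.abs(residual))[::-1][:n]  # indices of the n largest contributors

    # e.g. suspects = top_contributors(Xc[alarm_bin], P) for a bin flagged by the detector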
Outline
- Background and motivation
- Problems with approach
- Conclusion & future directions
  - Defining normalcy
  - Identifying the location of an anomaly
Defining normalcy
- Large anomalies can cause a spike in the first few PCs
  - This diminishes effectiveness
  - But we can presumably smooth these out with a weighted moving average (WMA)
- But the first PCs aren't always periodic
  - whichk instead of topk?
  - Initial results suggest this might be challenging as well
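As a sketch of the kind of smoothing this slide alludes to, here is a simple weighted moving average over each link's time series; the window size and weighting scheme are assumptions, not taken from the talk:

    import numpy as np

    def wma_smooth(X, window=12):
        """Illustrative weighted moving average: every bin becomes a weighted mean of the
        previous `window` bins, with linearly increasing weight on the most recent ones.
        One candidate way to damp large one-off spikes before training PCA."""
        w = np.arange(1, window + 1, dtype=float)
        w /= w.sum()
        T, L = X.shape
        out = np.empty((T - window + 1, L))
        for j in range(out.shape[0]):
            out[j] = w @ X[j:j + window]              # (window,) @ (window, L) -> (L,)
        return out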
Fundamental disconnect between objective functions
- PCA is optimal at finding orthogonal vectors ordered by captured variance
- But variance need not correspond to normalcy (i.e. periodicity)
- When do they coincide?
Identifying anomaly locations
- PCA is very effective at finding correlations
- But this is accomplished by remapping all data to a new coordinate system
- Its strength in detection becomes a weakness in identification
- This is an inherent limitation
[Diagram: the network and its neighboring ASes, with the victim highlighted]
Conclusion
- PCA is sensitive to its parameters; a more robust methodology is required
  - Training: defining normalcy (topk, whichk)
  - Detection: tuning the threshold
  - Identification: a better heuristic
- Disconnect between objective functions
  - PCA finds variance
  - We seek periodicity
- PCA's strengths can be weaknesses
  - The transformation is good at detecting correlations
  - This causes difficulty in identifying the anomaly location
Thanks!
Questions?
Haakon Ringberg
Princeton University Computer Science
http://www.cs.princeton.edu/~hlarsen/
Outline
- Background and motivation
- Problems with approach
- Future directions
- Conclusion
  - Addressable problems versus fundamental problems
Conclusion: addressable
- PCA is sensitive to its parameters; a more robust methodology is required
  - Training: defining normalcy (topk, whichk)
  - Detection: tuning the threshold
  - Identification: a better heuristic
- Previous work used the same data and optimized parameter settings as Lakhina et al.
- But these concerns might be addressable
Conclusion: fundamental
- We don't know what "normal" is
  - Disconnect between objective functions: PCA finds variance, we seek periodicity
- PCA's strengths can be weaknesses
  - The transformation is good at detecting correlations
  - This causes difficulty in identifying the anomaly location
- Are other methods more appropriate?
  - We require a standardized evaluation framework