Sensitivity of PCA for Traffic Anomaly Detection
Evaluating the robustness of current best practices
Haakon Ringberg (1), Augustin Soule (2),
Jennifer Rexford (1), Christophe Diot (2)
(1) Princeton University, (2) Thomson Research
Outline
Background and motivation
Traffic anomaly detection
PCA and subspace approach
Problems with methodology
Conclusion & future directions
2
A network in the Internet
[Diagram: a network connected to four neighboring ASes]
3
Network anomalies
[Diagram: computers attached to the network; labels: March Madness, BOTNET, VIAGRA]
We want to be able to detect these anomalies!
4
Network anomaly detectors
[Diagram: a Network Anomaly Detector reporting "We’re good!"]
Monitor health of network
Real-time reporting of anomalies
5
Principal Components Analysis (PCA) Benefits
Finds correlations across multiple links
Network-wide analysis [Lakhina SIGCOMM’04]
Demonstrated ability to detect wide variety of anomalies [Lakhina IMC’04]
Subspace methodology
We use same software
[Diagram: Anomaly Detector = PCA; a network connected to ASes, one end marked Victim]
6
Principal Components
Analysis (PCA)
PCA transforms data into new coordinate system
Principal components (new bases) ordered by captured variance
The first k tend to capture periodic trends
normal subspace vs. anomalous subspace
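The split described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the software used in the study: the function name `pca_subspaces`, the synthetic six-link traffic matrix, and the 5-minute bin size are all assumptions made for the example.

```python
import numpy as np

def pca_subspaces(X, k):
    """Split the principal components of X (time bins x links) into a
    normal subspace (first k PCs) and an anomalous subspace (the rest)."""
    Xc = X - X.mean(axis=0)                  # center each link's time series
    # Rows of Vt are the principal components, ordered by singular value,
    # which is the same as ordering by captured variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)           # fraction of variance per PC
    P_normal = Vt[:k].T                      # links x k
    P_anom = Vt[k:].T                        # links x (links - k)
    return P_normal, P_anom, var_frac

# Synthetic data (assumed): 6 links sharing one daily cycle plus small noise.
rng = np.random.default_rng(0)
t = np.arange(288 * 7)                       # one week of 5-minute bins
daily = np.sin(2 * np.pi * t / 288)
X = np.outer(daily, rng.uniform(1.0, 3.0, size=6))
X += 0.1 * rng.normal(size=X.shape)

P_normal, P_anom, var_frac = pca_subspaces(X, k=1)
print("variance captured per PC:", var_frac.round(3))
```

In this toy data the single periodic trend dominates the variance, so the first PC captures it; real traffic is rarely this clean.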
7
Pictorial overview of
subspace methodology
1. Training: separate normal & anomalous traffic patterns
2. Detection: find spikes
3. Identification: find original spatial location that caused spike (e.g. router, flow)
[Diagram: PCA splits the signal into normal and anomalous parts; network with nodes A and B]
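The training and detection steps can be sketched as follows. The synthetic traffic, the injected spike, and the mean-plus-5-sigma threshold are assumptions chosen for illustration; the threshold in particular is a simple stand-in, not the one used in the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_links, k = 2016, 8, 2
t = np.arange(n_bins)

# Synthetic link traffic (assumed): two periodic trends plus noise.
X = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1.0, 3.0, n_links))
X += np.outer(np.cos(2 * np.pi * t / 144), rng.uniform(0.5, 1.5, n_links))
X += 0.1 * rng.normal(size=(n_bins, n_links))
X[1000, 3] += 3.0                        # injected spike on link 3, bin 1000

# 1. Training: the first k PCs define the normal subspace.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T

# 2. Detection: energy left after removing the normal-subspace projection.
residual = Xc - Xc @ P @ P.T
spe = np.sum(residual**2, axis=1)        # squared prediction error per bin
threshold = spe.mean() + 5 * spe.std()   # assumed threshold
alarms = np.flatnonzero(spe > threshold)
print("alarm bins:", alarms)
```

Step 3, identification, is not covered here: the alarm only says when something happened, not on which link or flow.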
8
Pictorial overview of problems
with subspace methodology
Defining normalcy can be challenging
Tunable knobs
Contamination
PCA’s coordinate remapping makes it difficult to identify the original location of an anomaly
[Diagram: PCA splits the signal into normal and anomalous parts at topk; network with nodes A and B]
9
Data used
Géant and Abilene networks
IP flow traces
21/11 through 28/11 2005
Anomalies were manually verified
10
Outline
Background and motivation
Problems with approach
Sensitivity to its parameters
Contamination of normalcy
Identifying the location of detected anomalies
Conclusion & future directions
11
Sensitivity to topk
PCA separates normal from anomalous traffic patterns
Works because top PCs tend to capture periodic trends
And large fraction of variance
[Diagram: PCA splits the signal into normal and anomalous parts]
12
Sensitivity to topk
Where is the line drawn between normal and anomalous?
What is too anomalous?
[Diagram: PCA splits the signal into normal and anomalous parts at topk]
13
Sensitivity to topk
Very sensitive to number of principal components included!
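One way to see this sensitivity is to sweep topk and count the alarms raised at each setting. The sketch below uses synthetic traffic with three periodic trends and one injected spike; the helper `alarms_for_k` and its mean-plus-5-sigma threshold are assumptions for the example, not the study's code.

```python
import numpy as np

def alarms_for_k(X, k, n_sigma=5.0):
    """Bins whose residual energy exceeds mean + n_sigma * std
    when the first k PCs are treated as the normal subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    residual = Xc - Xc @ P @ P.T
    spe = np.sum(residual**2, axis=1)
    return np.flatnonzero(spe > spe.mean() + n_sigma * spe.std())

# Synthetic traffic (assumed): three periodic trends, noise, one spike.
rng = np.random.default_rng(2)
n_bins, n_links = 2016, 10
t = np.arange(n_bins)
X = 0.1 * rng.normal(size=(n_bins, n_links))
for period in (288, 144, 96):
    X += np.outer(np.sin(2 * np.pi * t / period),
                  rng.uniform(0.5, 1.5, n_links))
X[1000, 4] += 3.0                        # injected spike

for k in range(1, n_links):
    hits = alarms_for_k(X, k)
    print(f"topk={k}: {hits.size} alarms, spike found: {1000 in hits}")
```

The set of alarms depends on k even though the data never changes, which is the effect the slide reports on real traces.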
14
Sensitivity to topk
Sensitivity wouldn’t be an issue if we could tune the topk parameter
We’ve tried many different methods
3σ deviation heuristic
Cattell’s Scree Test
Humphrey-Ilgen
Kaiser’s Criterion
None are reliable
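Two of the listed heuristics are easy to sketch. The code below follows the common textbook statement of Kaiser's criterion (keep components whose correlation-matrix eigenvalue exceeds 1) and our reading of the 3σ heuristic (walk the PCs in order and stop at the first one whose projection deviates more than 3σ from its mean). Function names and the synthetic data are assumptions; this is not the implementation evaluated in the study.

```python
import numpy as np

def kaiser_k(X):
    """Kaiser's criterion: count correlation-matrix eigenvalues above 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
    return int(np.sum(eigvals > 1.0))

def three_sigma_k(X, n_sigma=3.0):
    """3-sigma heuristic (as we read it): the first PC whose projection has
    a point more than n_sigma std from its mean starts the anomalous
    subspace; the PCs before it form the normal subspace."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    for i in range(U.shape[1]):
        u = U[:, i]                      # projection on PC i, up to scale
        if np.any(np.abs(u - u.mean()) > n_sigma * u.std()):
            return i
    return U.shape[1]

# Synthetic traffic (assumed): two periodic trends plus noise on 8 links.
rng = np.random.default_rng(4)
t = np.arange(2016)
X = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1.0, 3.0, 8))
X += np.outer(np.cos(2 * np.pi * t / 144), rng.uniform(0.5, 1.5, 8))
X += 0.1 * rng.normal(size=X.shape)

k_kaiser = kaiser_k(X)
k_3sigma = three_sigma_k(X)
print("Kaiser:", k_kaiser, " 3-sigma:", k_3sigma)
```

The two rules need not agree on the same data, which gives a feel for why picking topk automatically is hard.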
15
Contamination of normalcy
[Diagram: PCA splits the signal into normal and anomalous parts]
What happens to large anomalies?
They capture a large fraction of variance
Therefore they are included among top PCs
Invalidates assumption that top PCs are periodic
Pollutes definition of normal
In our study, the outage shown on the slide affected 75 of 77 links
Only detected on a handful!
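The contamination effect can be reproduced on synthetic data. In this sketch an outage drops traffic on 9 of 10 links for 30 bins; the same detector is trained once on clean data and once on the contaminated data. All names, sizes, and the mean-plus-5-sigma threshold are assumptions for illustration.

```python
import numpy as np

def fit(X, k):
    """Return the column means and the first k PCs (normal subspace)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def spe(X, mean, P):
    """Squared prediction error of each time bin outside the normal subspace."""
    Xc = X - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual**2, axis=1)

rng = np.random.default_rng(3)
n_bins, n_links, k = 2016, 10, 2
t = np.arange(n_bins)
clean = 10.0 + np.outer(np.sin(2 * np.pi * t / 288),
                        rng.uniform(1.0, 3.0, n_links))
clean += 0.1 * rng.normal(size=clean.shape)

outage = slice(1000, 1030)
dirty = clean.copy()
dirty[outage, :9] -= 6.0                 # large outage on 9 of 10 links

def outage_alarms(train, test):
    """Count outage bins flagged when training on `train`."""
    mean, P = fit(train, k)
    s_train = spe(train, mean, P)
    threshold = s_train.mean() + 5 * s_train.std()   # assumed threshold
    return int(np.sum(spe(test, mean, P)[outage] > threshold))

hits_clean = outage_alarms(clean, dirty)
hits_dirty = outage_alarms(dirty, dirty)
print("outage bins flagged, clean training:", hits_clean)
print("outage bins flagged, contaminated training:", hits_dirty)
```

When the outage is part of the training data it is large enough to claim one of the top PCs, so it ends up inside the normal subspace and leaves almost no residual to detect.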
16
Identifying anomaly locations
Spikes when state vector projected on anomaly subspace
But network operators don’t care about this
They want to know where it happened!
How do we find the original location of the anomaly?
[Diagram: state vector projected onto the anomaly subspace]
17
Identifying anomaly locations
Previous work used a simple heuristic
Associate detected spike with k flows with the largest contribution to the state vector v
No clear a priori reason for this association
[Diagram: state vector projected onto the anomaly subspace; network with nodes A and B]
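One reading of this heuristic is "pick the k largest entries of the state vector". The sketch below implements that reading and then shows a case where it points at the wrong flow. The function name `top_k_flows` and the numbers are made up for illustration.

```python
import numpy as np

def top_k_flows(v, k):
    """Indices of the k flows with the largest absolute value in v."""
    order = np.argsort(np.abs(v))[::-1]
    return order[:k]

# A state vector in which flows 2 and 4 stand out.
v = np.array([0.2, -0.1, 4.5, 0.3, -3.9, 0.05])
print(top_k_flows(v, 2))                 # [2 4]

# Why the association is not obvious: an anomaly on a small flow can be
# masked by a flow that is simply large all the time.
baseline = np.array([100.0, 5.0, 5.0])
anomaly = np.array([0.0, 20.0, 0.0])     # flow 1 jumps from 5 to 25
suspect = top_k_flows(baseline + anomaly, 1)
print(suspect)                           # [0], not the anomalous flow 1
```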
18
Outline
Background and motivation
Problems with approach
Conclusion & future directions
Defining normalcy
Identifying the location of an anomaly
19
Defining normalcy
Large anomalies can cause a spike in first few PCs
Diminishes effectiveness
But we can presumably smooth these out (WMA)
But first PCs aren’t always periodic
whichk instead of topk?
Initial results suggest this might also be challenging
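A whichk selection could, for instance, rank PCs by how periodic their time series is rather than by variance. The score below (share of spectral power in the single strongest frequency) and the 0.5 cutoff are hypothetical choices made for this sketch; the slides do not specify how whichk would be computed.

```python
import numpy as np

def periodicity_score(u):
    """Hypothetical score: share of non-DC spectral power that sits in the
    single strongest frequency bin (near 1 for a pure sinusoid)."""
    power = np.abs(np.fft.rfft(u - u.mean()))**2
    power = power[1:]                    # drop the DC bin
    return float(power.max() / power.sum())

def which_k(X, min_score=0.5):
    """Indices of PCs whose time series pass the periodicity cutoff."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = np.array([periodicity_score(U[:, i])
                       for i in range(U.shape[1])])
    return np.flatnonzero(scores >= min_score), scores

# Synthetic traffic (assumed): daily cycle, noise, and a large outage.
rng = np.random.default_rng(5)
t = np.arange(2016)
X = 10.0 + np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1.0, 3.0, 10))
X += 0.1 * rng.normal(size=X.shape)
X[1000:1030, :9] -= 6.0

selected, scores = which_k(X)
print("periodicity score per PC:", scores.round(2))
print("PCs kept as normal:", selected)
```

Even in this toy case the outage leaks into the first PC, so a score like this only partly separates periodic from anomalous behavior, which echoes the slide's caution.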
20
Fundamental disconnect between objective functions
PCA is optimal at finding orthogonal vectors ordered by captured variance
But variance need not correspond to normalcy (i.e. periodicity)
When do they coincide?
21
Identifying anomaly locations
PCA is very effective at finding correlations
But this is accomplished by remapping all data to new coordinate system
Strength in detection becomes weakness in identification
Inherent limitation
[Diagram: networks connected to neighboring ASes, one end marked Victim]
22
Conclusion
PCA is sensitive to its parameters
More robust methodology required
Training: defining normalcy (topk, whichk)
Detection: tuning threshold
Identification: better heuristic
Disconnect between objective functions
PCA finds variance
We seek periodicity
PCA’s strengths can be weaknesses
Transformation good at detecting correlations
Causes difficulty in identifying anomaly location
23
Thanks!
Questions?
Haakon Ringberg
Princeton University Computer Science
http://www.cs.princeton.edu/~hlarsen/
Outline
Background and motivation
Problems with approach
Future directions
Conclusion
Addressable problems versus fundamental problems
25
Conclusion: addressable
PCA is sensitive to its parameters
More robust methodology required
Training: defining normalcy (topk, whichk)
Detection: tuning threshold
Identification: better heuristic
Previous work used same data and optimized parameter settings as Lakhina et al.
But these concerns might be addressable
26
Conclusion: fundamental
We don’t know what “normal” is
Disconnect between objective functions
PCA finds variance
We seek periodicity
PCA’s strengths can be weaknesses
Transformation good at detecting correlations
Causes difficulty in identifying anomaly location
Are other methods more appropriate?
We require a standardized evaluation framework
27