EL 933 Final Project Presentation

Combining Filtering and Statistical
Methods for Anomaly Detection
Augustin Soule
Kavé Salamatian
Nina Taft
Motivation
•Traffic anomalies are a fact of life in computer networks
–Outages, attacks, flash crowds, DoS, alpha events, etc.
•Anomaly detection and identification in a timely fashion is challenging
–Operators typically monitor traffic per link by eye using SNMP or IP flows. Is that effective?
–Characterization: building a model of what constitutes normal behavior.
–Efficient methods for anomaly detection: do any exist?
Overview
•Introduction – aim and general discussion of the methodology
•Approach – what tools are used and why?
•Sequence of the different analyses for anomaly detection:
–Modeling the network
–Obtaining the residual using filtering
–Analysis of the residual using all the methods, comparing the methods along the way
•Results (analysis)
•Conclusion
•Future work – some thoughts on what I like about the paper
Introduction
•Aim – develop an approach to anomaly detection for large-scale networks using the traffic matrix (TM).
•The main idea is to predict the normal behavior of the network (the TM approach), then filter the "normal" traffic out of the actual traffic matrix, which is obtained using more recent measurement data than that used for the prediction.
•Examine the anomalies in the residual traffic using the 4 methods proposed in the paper, two of which are new and the other two of which are already in use.
Approach
•What is a traffic matrix? How is it obtained here? (SNMP data) What time interval should be considered?
•What kinds of anomalies are in focus, and why only those?
•A few quick examples of such anomalies.
•What is a Kalman filter? What is it used for here?
•What are the four different methods proposed?
Sequencing the different analysis procedures
•SNMP data -> TM -> OD flows -> filter out the normal traffic to get the residual -> anomaly analysis
•What are OD flows? Their significance here:
–They are preferred over direct use of the monitored data captured at the target granularity level.
•What does anomaly analysis include?
•Methods used for analysis, and validation of these methods:
–Using actual data
–Using a synthetic anomaly generator
Continued ..
•Using a network-wide perspective (the TM) for volume anomaly detection is justified, but the amount of data is large, so the approach has to scale.
•By projecting onto a small number of principal components we can filter out the normal traffic. The traffic projecting onto the remaining components is analyzed for anomalies.
Modeling the Network
•Obtain per-link statistics on byte counts (via SNMP today)
•Infer the TM that includes all the OD flows (the hidden states!)
•Total traffic on a link = sum of all OD flows traversing that link, which can be expressed as: Yt = At Xt + Vt
–where Yt is the vector of link counts at time t, Xt is the vector of OD flows, Vt corresponds to the measurement errors, and At is the routing matrix.
•To capture the dynamics of the OD flows we need a model that specifies Xt+1 in terms of Xt: Xt+1 = Ct Xt + Wt
–where Ct is the state transition matrix and Wt is the noise that accounts for randomness in the fluctuation of the flows.
•For traffic estimation, get the total byte count per link and then partition it based on the OD flows traversing that link. When an anomaly occurs on a link, it could get distributed across all the OD flows on that link; to avoid that, Ct is taken to be a diagonal matrix.
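The measurement equation Yt = At Xt + Vt can be sketched numerically. This is a toy example with a made-up 3-link, 4-flow topology and made-up traffic values (using numpy), not the Abilene setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy topology: 3 links, 4 OD flows. A[i, j] = 1 if OD flow j
# traverses link i (a fixed, made-up routing matrix).
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)

x_t = np.array([10.0, 5.0, 8.0, 3.0])   # hidden OD-flow volumes X_t
v_t = rng.normal(0.0, 0.1, size=3)      # measurement noise V_t

# Observed link byte counts: each link sees the sum of its OD flows.
y_t = A @ x_t + v_t
print(y_t)   # close to [15, 13, 11]
```

The point of the model is that only y_t is measured; the flows x_t are hidden and must be inferred.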
Continued..
•The assumption is that Vt and Wt are uncorrelated.
•The task now is to estimate the next instance Xt+1, and that is done using a Kalman filter.
•The Kalman filter is a robust filter that estimates the hidden state process using a two-step (predict/update) approach that iterates at each time t. Using this we obtain the estimate of Xt given the observations up to time i, with t >= i, and hence the estimates are obtained.
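The two-step iteration can be sketched as a generic Kalman predict/update routine. This is a minimal sketch of the standard equations using the slides' notation (At, Ct, with assumed covariances Q for Wt and R for Vt); it is not the paper's exact implementation:

```python
import numpy as np

def kalman_step(x_est, P, y, A, C, Q, R):
    """One predict/update iteration for the model
    X_{t+1} = C X_t + W_t,  Y_t = A X_t + V_t,
    with cov(W_t) = Q and cov(V_t) = R."""
    # Prediction: propagate the estimate through the transition model.
    x_pred = C @ x_est
    P_pred = C @ P @ C.T + Q

    # Update: correct the prediction with the new link counts.
    S = A @ P_pred @ A.T + R                # innovation covariance
    K = P_pred @ A.T @ np.linalg.inv(S)     # Kalman gain
    innovation = y - A @ x_pred             # the residual analyzed later
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_est)) - K @ A) @ P_pred
    return x_new, P_new, innovation
```

With a steady observation the estimate converges geometrically and the innovation (the residual fed to the detection methods) goes to zero; anomalies show up as innovations that stay large.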
Continued ….
•The methods used and their details:
•Method I – based on comparing the residual traffic to a threshold.
–ADV: it flags an anomaly very fast, as the test is evaluated as soon as the Kalman filter processes the new observation.
–DISADV: the test is performed on each observation independently of past results.
–This creates a high false-positive rate, i.e. it will declare an anomaly based on a single observation, which is not the right approach.
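Method I reduces to a pointwise test, sketched below. The function name and the choice of scaling by the residual's own standard deviation are illustrative assumptions, not the paper's exact threshold rule:

```python
import numpy as np

def threshold_detect(residual, k=3.0):
    """Flag samples where |residual| exceeds k standard deviations.
    Each sample is tested on its own, which is why this basic method
    reacts instantly but suffers a high false-positive rate."""
    residual = np.asarray(residual, dtype=float)
    return np.abs(residual) > k * residual.std()
```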
•Method II – based on comparing the local and the global variance of the filtered residual signal.
–ADV: uses a cumulative-summation (CUSUM) approach, which fixes the disadvantage of the previous method. It is a very powerful method, proven to be the best estimator when the variance and the level change are unknown.
–DISADV: it adds some delay to detection, since it needs a few observations after the anomaly to estimate the deviation level.
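A CUSUM accumulator of the kind Method II relies on can be sketched as follows (the drift and threshold values are illustrative):

```python
def cusum_detect(residual, drift=0.5, h=5.0):
    """One-sided CUSUM: accumulate deviations above the allowed drift
    and raise an alarm when the running sum exceeds h. Accumulating
    across samples suppresses one-off spikes, at the cost of a few
    samples of detection delay."""
    s, alarms = 0.0, []
    for t, r in enumerate(residual):
        s = max(0.0, s + r - drift)
        if s > h:
            alarms.append(t)
            s = 0.0   # restart accumulation after an alarm
    return alarms
```

Note the trade-off named on the slide: a sustained shift is caught a few samples after it starts, while an isolated spike below h never fires.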
Continued..
•Method III – based on multi-scale analysis and variance
–ADV: the rationale behind multi-scale analysis is that anomalies should appear at different time scales, so by monitoring multiple scales the false-positive rate should be reduced, because a change on a single time scale alone will not trigger an alarm.
–DISADV: a detection lag is involved, since wavelet analysis is required.
•Method IV – based on a multi-scale variance shift
–In this method an alarm is triggered if the ratio between the local and the global variance exceeds a threshold.
–The analysis is based on two scales: the scale at which the global variance is calculated and the scale at which the local variance is calculated.
–Again a detection lag is seen, since wavelet analysis of the signal is performed.
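Method IV's local-to-global variance ratio can be sketched directly. Plain sliding windows stand in here for the two wavelet scales the slide describes, and the window sizes and ratio are illustrative:

```python
import numpy as np

def vshift_detect(residual, local_win=8, global_win=64, ratio=5.0):
    """Alarm when the variance over a short local window exceeds
    `ratio` times the variance over a longer global window ending
    at the same point."""
    residual = np.asarray(residual, dtype=float)
    alarms = []
    for t in range(global_win, len(residual)):
        local = residual[t - local_win:t]     # fine-scale variance
        glob = residual[t - global_win:t]     # coarse-scale variance
        if local.var() > ratio * glob.var():
            alarms.append(t)
    return alarms
```

Because the test needs a full local window of post-anomaly samples, the detection lag mentioned on the slide falls out of the construction.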
Validation of the methods
•Validation of the methods is done using ROC (receiver operating characteristic) curves.
–These curves are a graphical representation of the trade-off between the false positives and the false negatives of a decision threshold.
–They are used because they allow the methods to be compared across the whole range of the decision threshold.
–ROC analysis also gives the performance of the methods in terms of detection time.
•An algorithm is considered good if its ROC curve climbs rapidly towards the upper-left corner (a very high fraction of the true anomalies detected with only a few false positives).
•How quickly the ROC climbs is quantified by the area under the curve: the larger the area, the better the algorithm. Each algorithm has an ROC curve, and the curves are then compared.
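The ROC construction described above can be sketched by sweeping the threshold over per-sample anomaly scores (the function names are illustrative):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the decision threshold over the anomaly scores and return
    (FPR, TPR) pairs, from the strictest threshold to the loosest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    fpr, tpr = [0.0], [0.0]
    for th in np.sort(np.unique(scores))[::-1]:
        pred = scores >= th
        tpr.append((pred & labels).sum() / labels.sum())
        fpr.append((pred & ~labels).sum() / (~labels).sum())
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve (trapezoid rule): larger is better."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A detector that scores every true anomaly above every normal sample reaches the upper-left corner and gets AUC = 1.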
Continued..
•Validating these methods also includes testing them to find the operating range of each, and this can be done in two ways.
•Approach I: collect live data in the form of a packet- or flow-level trace and then have the trace "labeled".
–Labeling is the procedure of identifying each anomalous event along with its start and finish times.
–ADV of labeling: the methods can be tested against real-world anomalies.
–DISADV of labeling: the parameters of an anomaly cannot be varied, so the limits of the methods cannot be probed with the trace.
•Approach II: synthetically generate attacks and test the methods on them.
–ADV: the parameters of the attacks can be varied, so the limits of each method can be identified.
–DISADV: these attacks are not real-world, so they may not reflect the attacks of real concern.
Continued..
•Both of these approaches are used in the paper so that the methods can be analyzed better.
•Consider the Abilene network for testing:
–It has 11 PoPs.
–Data is collected from every PoP.
–This data results in a TM of 121 OD flows.
•Synthetic anomaly generator:
–Select either one OD flow or a set of OD flows, then add anomalies on top of the baseline traffic level for those flows.
–An anomaly is characterized by 4 factors: volume, duration, number of OD flows, and shape function. (Table 1)
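The generator's four factors map naturally onto a small injection routine. This is a sketch under assumed names, with two illustrative shape functions standing in for the actual ones in the paper's Table 1:

```python
import numpy as np

def inject_anomaly(tm, flows, start, duration, volume, shape="square"):
    """Add a synthetic volume anomaly on top of baseline OD flows.

    tm       -- baseline traffic matrix, shape (time, num_od_flows)
    flows    -- indices of the OD flows the anomaly rides on
    volume   -- peak added volume
    shape    -- "square" (constant pulse) or "ramp" (linear rise)
    """
    out = tm.copy()                       # leave the baseline untouched
    steps = np.arange(duration)
    if shape == "ramp":
        profile = volume * (steps + 1) / duration
    else:
        profile = np.full(duration, float(volume))
    for f in flows:
        out[start:start + duration, f] += profile
    return out
```

Varying volume, duration, flow count, and shape this way is what lets the validation probe each method's operating range.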
Results
Continued..
•These were the results with the actual data from the network.
•The Abilene network had 27 anomalies, and the graph above shows that the basic method performed best (7% FPR at 100% TPR), i.e. it misses no anomalies.
•The second-best method caught about 85% of the anomalies at the same FPR.
•The wavelet method did not reach FNR = 0 even with a huge threshold.
Continued…
•The results with the synthetic anomaly generator are as follows:
–500 anomalies were generated.
–Durations varied between 5 minutes and 2 hours.
–The OD flows involved were 1 to 7 per anomaly (randomly chosen).
•The basic method and the second method performed equally well, and best overall.
–The second method does not perform very well on the actual data, whereas with the synthetic generator it performs well.
–The reason may lie in the properties of the anomalies themselves.
Detection Time Comparison
•The onset of attacks on the Internet is rapid, so the methods should be fast and have a very low detection time.
Continued..
•The second method detected 90% of the anomalies with no lag, and the basic method detected 95% of the anomalies with no lag.
•The wavelet method takes about half an hour to detect anomalies, so it is too slow, and it also did not perform well.
•It is interesting that the vshift method performs better on the synthetic data than on the Abilene network.
Conclusion and what I feel
•The TM is an interesting granularity level for anomaly detection.
•The estimation and prediction were interesting, given the use of wavelets.
•The idea of filtering out normal traffic to analyze the residual traffic for anomalies was nice.
•The detection schemes were tested well, since both actual data and the synthetic anomaly generator were used.
•The anomaly generator came close to depicting anomalies like DoS attacks and flash crowds, which helped in validating and evaluating the four methods well, but it was not very effective at depicting attacks like worms.
•The wavelet-based method did not do very well because of the detection time involved.