SDM02adaptive - Columbia University


Ensemble-based Adaptive Intrusion Detection
Wei Fan, IBM T.J. Watson Research
Salvatore J. Stolfo, Columbia University
Data Mining for Intrusion Detection
[Diagram: connection records (e.g., telnet, 10, 3, ...) → feature construction (e.g., ftp, 10, 20, ...) → labeled training data (label existing connections) → inductive learner → intrusion detection model]
Some interesting requirements
- New types of intrusions are constantly invented by hackers: for example, the coordinated attacks on many e-business websites in 2000.
- Hackers tend to use new types of intrusions that the intrusion detection system is unaware of or weak at detecting.
- Data mining for intrusion detection is a very data-intensive process: very large data, evolving patterns, real-time detection.
Question
- When new types of intrusions are invented, can we quickly adapt our existing model to detect these new intrusions before they cause more damage?
- If we don't have a solution, the new attack will cause significant damage.
- For this kind of problem, having a solution that is not completely satisfactory is better than having no solution.
Naive Approach - Complete Retraining
[Diagram: existing training data + new data → merged training data → inductive learner → NEW intrusion detection model]
Problem with the Naive Approach
- Since the merged data (existing plus new) will be very large, it takes a long time to compute a detection model.
- By the time the model is constructed, the new attack will probably already have done enough damage to our system.
New Approach
[Diagram: new data → learner → NEW model; NEW model + existing model → combined model]
Key point: we compute a model only from the data on the new types of intrusions.
How do we label connections?
[Diagram: a new connection is first classified by the existing model as "normal", one of the previously known intrusion types, or an anomaly (connection type unrecognized); anomalies are then classified by the NEW model as "normal" or one of the new intrusion types]
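The two-stage routing above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the model stand-ins and intrusion labels (`smurf`, `mailbomb`) are placeholders.

```python
# Two-stage classification: route connections the existing model cannot
# recognize (anomalies) to the model trained on the new intrusion types.
# Model stand-ins and label names below are illustrative only.

def classify(connection, existing_model, new_model):
    label = existing_model(connection)   # "normal", a known intrusion, or "anomaly"
    if label != "anomaly":
        return label                     # existing model recognizes the connection
    return new_model(connection)         # "normal" or a new intrusion type

# Toy stand-ins for the two models:
existing = lambda c: ("smurf" if c.get("icmp_flood")
                      else "anomaly" if c.get("unseen") else "normal")
new = lambda c: "mailbomb" if c.get("smtp_burst") else "normal"

print(classify({"icmp_flood": True}, existing, new))                   # smurf
print(classify({"unseen": True, "smtp_burst": True}, existing, new))   # mailbomb
```

Only anomalies pay the cost of the second model; everything the existing model already recognizes is answered immediately.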
Basic Idea
- The existing model is built to identify THREE classes:
  - normal
  - some known type of intrusion
  - anomaly: a connection that is neither normal nor any known type of intrusion.
- For anomaly detection, we use the artificial anomaly generation method (Fan et al., ICDM 2001).
Anomaly Detection
- Generate "artificial anomalies" from the training data: similar to "near misses".
- Artificial anomalies are data points that are different from the training data.
- The algorithm concentrates on feature values that are infrequent in the training data.
- Distribution-based Artificial Anomaly (Fan et al., ICDM 2001).
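A minimal sketch of the infrequent-value idea: perturb a training point by swapping one feature to a value that is rare for that feature, and label the result an anomaly. This is a paraphrase of the idea only; the published ICDM 2001 algorithm differs in its details, and the function name and toy data are made up.

```python
# Sketch: generate "near miss" artificial anomalies by replacing one
# feature's value with an infrequent alternative (illustrative only; not
# the exact distribution-based algorithm from Fan et al., ICDM 2001).
import random
from collections import Counter

def artificial_anomalies(data, n, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0])
    # Per-feature value frequencies over the training data.
    freq = [Counter(row[j] for row in data) for j in range(n_features)]
    anomalies = []
    for _ in range(n):
        row = list(rng.choice(data))
        j = rng.randrange(n_features)
        values = [v for v in freq[j] if v != row[j]]
        if not values:
            continue
        # Prefer rare values: weight each alternative by inverse frequency.
        weights = [1.0 / freq[j][v] for v in values]
        row[j] = rng.choices(values, weights=weights)[0]
        anomalies.append(tuple(row))
    return anomalies

data = [("tcp", "http", 0), ("tcp", "http", 0), ("udp", "dns", 1)]
print(artificial_anomalies(data, 3))
```

Each generated point differs from a real training point in exactly one feature, which is what makes it a "near miss" rather than random noise.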
Four Configurations
- H1(x): the existing model.
- H2(x): the new model.
- The four configurations differ in how H2(x) is computed, how H1(x) and H2(x) are combined, and how a connection is processed and classified.
Configuration I
Configuration II
Configuration III
Configuration IV
Experiment
- 1998 DARPA Intrusion Detection Evaluation Dataset
- 22 different types of intrusions.
Experiment
- A sequence is used to introduce intrusions into the training data, simulating new intrusions being invented and launched by hackers:
  - there are 22! unique sequences;
  - we randomly used 3 unique sequences.
- The results are averaged.
- RIPPER, unordered rulesets.
3 Unique Sequences
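Drawing a few distinct orderings out of the 22! possible ones can be sketched as below; the type names are placeholders and the sampling routine is an assumption, not the paper's code.

```python
# Draw k distinct random orderings of the intrusion types (the experiment
# uses 3 random sequences out of the 22! possible orderings).
import random

def sample_sequences(intrusion_types, k, seed=0):
    rng = random.Random(seed)
    seen = set()
    while len(seen) < k:
        # Sampling all items without replacement yields a random permutation.
        seen.add(tuple(rng.sample(intrusion_types, len(intrusion_types))))
    return sorted(seen)

types = [f"type_{i:02d}" for i in range(22)]
sequences = sample_sequences(types, 3)
print(len(sequences))  # 3
```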
Measurements
- All results are on the new intrusion types.
- Precision: if I catch a potential thief, what is the probability that it is a real thief?
- Recall: what is the probability that real thieves are detected?
- Anomaly Detection Rate: the fraction classified as anomaly.
- Other Detection Rate: the fraction classified as other types of intrusions.
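Precision and recall here follow the standard definitions, computed over the new intrusion types; the label strings below are illustrative.

```python
# Standard precision/recall over detections of the new intrusion types.
# precision = TP / (TP + FP): of the connections flagged, how many are real?
# recall    = TP / (TP + FN): of the real intrusions, how many were caught?

def precision_recall(y_true, y_pred, positive="new_intrusion"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["new_intrusion", "new_intrusion", "normal", "new_intrusion"]
y_pred = ["new_intrusion", "normal", "new_intrusion", "new_intrusion"]
p, r = precision_recall(y_true, y_pred)
print(round(p, 3), round(r, 3))  # 0.667 0.667
```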
Precision Results
Recall Results
Anomaly Detection Rate
Other Detection Rate Results
Summary of results
- The most accurate is Configuration I, where the new model is trained from normal data plus the new intrusion type, and every connection the old model predicts as normal or anomaly is re-examined by the new model.
- Reason:
  - The existing model's precision in detecting normal connections influences the combined model's accuracy.
  - The new data is limited in amount, so the artificial anomalies generated from it are limited as well.
Training Efficiency
Related Work (incomplete list)
- Anomaly Detection:
  - SRI's IDES uses probability distributions of past activities to measure the abnormality of host events; we measure network events.
  - Forrest et al. use the absence of subsequences to measure abnormality.
  - Lane and Brodley employ a similar approach, but use an incremental learning approach to update stored sequences of UNIX shell commands.
  - Ghosh and Schwartzbard use a neural network to learn a profile of normality and a distance function to detect abnormality.
- Generating Artificial Data:
  - Nigam et al. assign labels to unlabeled data using a classifier trained on labeled data.
  - Chang and Lippmann applied voice transformation techniques to add artificial training talkers and increase variability.
- Multiple classifiers:
Summary and Future Work
- Proposed a two-step, two-classifier approach for efficient training and fast model deployment.
- Empirically tested in the intrusion detection domain.
- Need to test whether it works well in other domains.