A Data Mining Approach for Building
Cost-Sensitive and Light Intrusion
Detection Models
PI Meeting - July, 2000
North Carolina State University
Columbia University
Florida Institute of Technology
Overview
• Project description
• Progress report:
– correlation
– cost-sensitive modeling
– anomaly detection
– collaboration with industry
• Plan of work for 2000-2001
New Ideas/Hypotheses
• High-volume automated attacks can overwhelm an
IDS and its staff.
• Use cost-sensitive data mining algorithms to
construct ID models that consider cost factors:
– damage cost, response cost, operational cost, etc.
• Multiple specialized and light ID models can be
dynamically activated/configured at run time.
• Cost-effectiveness as the guiding principle and
multi-model correlation as the architectural
approach.
Impact
• A better understanding of the cost factors, cost
models, and cost metrics related to intrusion
detection.
• Modeling techniques and deployment strategies for
cost-effective IDSs.
• “Clustering” techniques for grouping intrusions and
building specialized and light models.
• An architecture for dynamically activating,
configuring, and correlating ID models.
Correlation: Model and Issues
[Figure: base models correlated across sources and across time/sources]
• “Good” base models: data sources and
modeling techniques.
• The combined model: the correlation
algorithms and network topology.
Correlation: Approaches
• Extend previous work in JAM
• A sequence of time-stamped records
– each is composed of signals from multiple sensors
(network topology information embedded);
• Apply data mining techniques to learn how
to correlate the signals to generate a
combined sensor:
– link analysis, sequence analysis, machine learning
(classification), etc.
Correlation: Integrating NM and
ID Signals
• A stream of measures (anomaly reports) on
MIB variables of network elements and a
stream of ID signals:
– Better coverage;
– Early sensing of attacks.
• Normal measures of network traffic and
parameter values of ID signatures
– S = f(N, A); if A is invariant, then S = g(N).
– Automatic parameter adjustment: S1 = g(N1).
Cost Factors of IDSs
• Attack taxonomy: result/target/technique
• Development cost
• Damage cost (DCost)
– The amount of damage when ID is not available or
ineffective.
• Response cost (RCost)
– The cost of acting upon an alarm of potential intrusion.
• Operational cost (OpCost)
– The cost of processing and analyzing audit data;
– Mainly the computational costs of the features.
Cost Models of IDSs
• The total cost of an IDS over a set of
events:
• $\mathrm{CumulativeCost}(E) = \sum_{e \in E} (\mathrm{CCost}(e) + \mathrm{OpCost}(e))$
• CCost(e), the consequential cost, depends
on the prediction made for event e
Consequential Cost (CCost)
• For event e:
Outcome             CCost(e)                Condition
Miss (FN)           DCost(e)
False Alarm (FP)    RCost(e’) + PCost(e)    if DCost(e’) ≥ RCost(e’)
                    0                       otherwise
Hit (TP)            RCost(e) + DCost(e)     if DCost(e) ≥ RCost(e)
                    DCost(e)                otherwise
Normal (TN)         0
Misclassified Hit   RCost(e’) + DCost(e)    if DCost(e’) ≥ RCost(e’)
                    DCost(e)                otherwise
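The consequential-cost cases and the cumulative-cost sum can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Event` record, its field names, and the outcome strings are hypothetical, and the comparison in each condition is assumed to be "DCost ≥ RCost", in line with the objective of not responding when RCost > DCost.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical record for one event e and the IDS prediction e'."""
    outcome: str              # "FN", "FP", "TP", "TN", or "MISCLASSIFIED"
    dcost: float = 0.0        # DCost(e): damage cost of the true event
    rcost: float = 0.0        # RCost(e): response cost for the true event
    dcost_pred: float = 0.0   # DCost(e'): damage cost of the predicted event
    rcost_pred: float = 0.0   # RCost(e'): response cost for the predicted event
    pcost: float = 0.0        # PCost(e): taken as given from the table
    opcost: float = 0.0       # OpCost(e): cost of processing this event


def ccost(ev: Event) -> float:
    """Consequential cost of one event, following the table case by case."""
    if ev.outcome == "FN":
        return ev.dcost
    if ev.outcome == "TN":
        return 0.0
    if ev.outcome == "FP":
        # Respond (and pay) only if the predicted damage justifies the response.
        return ev.rcost_pred + ev.pcost if ev.dcost_pred >= ev.rcost_pred else 0.0
    if ev.outcome == "TP":
        return ev.rcost + ev.dcost if ev.dcost >= ev.rcost else ev.dcost
    if ev.outcome == "MISCLASSIFIED":
        return ev.rcost_pred + ev.dcost if ev.dcost_pred >= ev.rcost_pred else ev.dcost
    raise ValueError(f"unknown outcome: {ev.outcome}")


def cumulative_cost(events) -> float:
    """CumulativeCost(E): sum of CCost(e) + OpCost(e) over all events."""
    return sum(ccost(ev) + ev.opcost for ev in events)


# Illustrative cost values only; they are not taken from the experiments.
EXAMPLE_EVENTS = [
    Event("FN", dcost=30.0, opcost=1.0),
    Event("TP", dcost=50.0, rcost=10.0, opcost=2.0),
    Event("FP", dcost_pred=5.0, rcost_pred=20.0, pcost=3.0, opcost=1.0),
]
```

With these made-up numbers the false alarm is not acted upon, because the predicted damage is lower than the cost of responding, so it contributes only its operational cost.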
Cost-sensitive Modeling: Objectives
• Reducing operational costs:
– Use cheap features in ID models.
• Reducing consequential costs:
– Do not respond to an intrusion if RCost >
DCost.
Cost-sensitive Modeling: Approaches
• Reducing operational costs:
– A multiple-model approach:
• Build multiple rule-sets, each with features of different
cost levels;
• Use cheaper rule-sets first, costlier ones later only for
required accuracy.
– Feature-Cost-Sensitive Rule Induction:
• Search heuristic considers information gain AND
feature cost.
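The multiple-model idea (cheaper rule-sets first, costlier ones only when needed) can be sketched as a cascade. The sketch below is an assumption-laden illustration: the rule-sets, the feature names `service` and `syn_count`, the confidence values, and the cost figures are all hypothetical, and "required accuracy" is approximated by a per-level confidence threshold.

```python
from typing import Callable, Dict, List, Tuple

# A rule-set maps a connection record to (predicted label, confidence in [0, 1]).
RuleSet = Callable[[Dict], Tuple[str, float]]
# One cascade level: (rule-set, operational cost of its features, minimum confidence).
Level = Tuple[RuleSet, float, float]


def cascade_predict(record: Dict, levels: List[Level]) -> Tuple[str, float]:
    """Try rule-sets from cheapest to costliest and stop at the first one
    whose prediction is confident enough. Returns (label, total cost spent)."""
    spent = 0.0
    label = "normal"
    for ruleset, op_cost, min_confidence in levels:
        spent += op_cost                      # pay for this level's features
        label, confidence = ruleset(record)
        if confidence >= min_confidence:
            break                             # good enough, skip costlier levels
    return label, spent


def cheap_rules(record: Dict) -> Tuple[str, float]:
    """Hypothetical rule-set using only a cheap, per-packet feature."""
    if record["service"] == "http":
        return "normal", 0.95
    return "dos", 0.50


def costly_rules(record: Dict) -> Tuple[str, float]:
    """Hypothetical rule-set using a costly, windowed traffic feature."""
    if record["syn_count"] > 100:
        return "dos", 0.99
    return "normal", 0.90


# The last level uses a threshold of 0.0 so its prediction is always accepted.
LEVELS: List[Level] = [(cheap_rules, 1.0, 0.9), (costly_rules, 10.0, 0.0)]
```

A record that the cheap rule-set classifies confidently costs 1.0 here, while an ambiguous one falls through to the costly rule-set and costs 11.0 in total.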
Cost-sensitive Modeling: Approaches
(continued)
• Reducing consequential costs:
– MetaCost:
• Purposely re-label intrusions with RCost > DCost as
normal.
– Post-Detection decision:
• Action depends on comparison of RCost and DCost.
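Both ways of reducing consequential cost come down to the same comparison, applied either before training or after detection. The sketch below covers only the re-labeling rule stated above, not the full MetaCost procedure; the function names, the per-class cost tables, and the cost values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-class costs, for illustration only.
DCOST: Dict[str, float] = {"buffer_overflow": 100.0, "ipsweep": 1.0}
RCOST: Dict[str, float] = {"buffer_overflow": 40.0, "ipsweep": 5.0}


def relabel_for_training(
    examples: List[Tuple[dict, str]],
    dcost: Dict[str, float],
    rcost: Dict[str, float],
) -> List[Tuple[dict, str]]:
    """Training-time variant: intrusions that cost more to respond to than
    the damage they cause are re-labeled as normal before learning."""
    relabeled = []
    for features, label in examples:
        if label != "normal" and rcost[label] > dcost[label]:
            label = "normal"
        relabeled.append((features, label))
    return relabeled


def should_respond(predicted: str, dcost: Dict[str, float], rcost: Dict[str, float]) -> bool:
    """Post-detection variant: the model predicts the true class, and the
    decision module responds only when the damage justifies the response."""
    return predicted != "normal" and dcost[predicted] >= rcost[predicted]
```

The post-detection variant keeps the model's predictions intact, so the cost tables can change without retraining.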
Latest Results
• OpCost
– Compare the multiple-model approach with single-model
approach;
– rdc%: (single - multiple)/single;
– range: 57% to 79%.
[Chart: average OpCost per connection; series: Single, Multiple]
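As a worked example of the rdc% measure, assuming it is reported as a percentage of the single-model cost. The function name and the input numbers are illustrative and are not read off the chart.

```python
def rdc_percent(single_cost: float, multiple_cost: float) -> float:
    """Cost reduction of the multiple-model approach relative to the
    single-model approach: (single - multiple) / single, as a percentage."""
    return 100.0 * (single_cost - multiple_cost) / single_cost


# Made-up costs: an average of 200 per connection for the single model
# and 50 for the multiple-model approach gives a 75% reduction.
example_rdc = rdc_percent(200.0, 50.0)
```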
Latest Results (continued)
• CCost using a post-detection cost-sensitive
decision module
– rdc% range: 75% to 95%;
– Compared with single model: slightly better rdc%;
– Compared with cost-insensitive models: 25% higher rdc%.
[Chart: total CCost; series: CS-single, CS-multiple, CI-single, CI-Multiple]
Anomaly Detection
• Unsupervised Training Methods
– Build models over noisy (not clean) data
• Artificial Anomalies
– Improves performance of anomaly detection
methods.
• Combining misuse and anomaly detection.
AD over Noisy Data
• Builds normal models over data containing
some anomalies.
• Motivating Assumptions:
– Intrusions are extremely rare compared to
normal activity.
– Intrusions are quantitatively different.
Approach Overview
• Mixture Model
– Normal Component
– Anomalous Component
• Build Probabilistic Model of Data
• Max Likelihood test for detection.
Mixture Model of Anomalies
• Assume a generative model: The data is
generated with a probability distribution D.
• Each element originates from one of two
components.
– M, the Majority Distribution ($x \in M$).
– A, the Anomalous Distribution ($x \in A$).
• Thus: $D = (1-\lambda)M + \lambda A$
Modeling Probability Distributions
• Train Probability Distributions over current
sets of M and A.
• $P_M(X)$ = probability distribution for Majority
• $P_A(X)$ = probability distribution for Anomaly
• Any probability modeling method can be
used: Naïve Bayes, Max Entropy, etc.
Detecting Anomalies
• Likelihood of a partition of the set of all
elements D into M and A:
$$L(D) = \prod_{x \in D} P_D(x) = \Big( (1-\lambda)^{|M|} \prod_{x \in M} P_M(x) \Big) \Big( \lambda^{|A|} \prod_{x \in A} P_A(x) \Big)$$
• Log Likelihood (for computational reasons):
LL(D)=log(L(D))
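Taking the logarithm turns the products into sums, which avoids numerical underflow when many small probabilities are multiplied. Writing the mixture weight of the anomalous component as $\lambda$ (the symbol is an assumption here), the log likelihood expands as follows:

```latex
LL(D) = |M| \log(1-\lambda) + \sum_{x \in M} \log P_M(x)
      + |A| \log \lambda + \sum_{x \in A} \log P_A(x)
```

Each element contributes a single additive term, so moving one element from M to A changes only a few terms of the sum.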
Algorithm for Detection
• Assume all elements are normal ($M_0 = D$,
$A_0 = \emptyset$).
• Compute $P_D(X)$.
• Using $P_D(X)$, compute LL(D).
• For each element compute difference in
LL(D) if removed from M and inserted into
A.
• If the difference is large enough, then
declare the element an anomaly.
Evaluating xt
$M_{t+1} = M_t - \{x_t\}$
$A_{t+1} = A_t \cup \{x_t\}$
Recompute $P_M$ and $P_A$ over the new sets (efficiently).
If $(LL_{t+1} - LL_t) >$ threshold, $x_t$ is an anomaly.
Otherwise $x_t$ is normal.
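The detection loop above can be sketched for categorical data as follows. Several choices here are assumptions made only for illustration: the majority model is a frequency estimate with add-one smoothing, the anomalous model is uniform over the observed values, the mixture weight `lam` and the `threshold` are arbitrary defaults, and an element judged normal is put back into M before the next one is evaluated.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def log_likelihood(m_counts: Counter, n_m: int, n_a: int,
                   vocab_size: int, lam: float, alpha: float = 1.0) -> float:
    """Log likelihood of a partition: |M| log(1-lam) + sum log P_M
    + |A| log lam + sum log P_A, with a uniform P_A."""
    ll = 0.0
    if n_m:
        ll += n_m * math.log(1.0 - lam)
        denom = n_m + alpha * vocab_size
        for count in m_counts.values():
            if count:
                # Smoothed frequency estimate of P_M for this value.
                ll += count * math.log((count + alpha) / denom)
    if n_a:
        ll += n_a * (math.log(lam) + math.log(1.0 / vocab_size))
    return ll


def detect_anomalies(data: Sequence[Hashable], lam: float = 0.05,
                     threshold: float = 0.5) -> List[bool]:
    """Return one flag per element: True if moving it from M to A raises
    the log likelihood by more than the threshold."""
    vocab_size = len(set(data))
    m_counts = Counter(data)          # start with every element in M
    n_m, n_a = len(data), 0
    ll = log_likelihood(m_counts, n_m, n_a, vocab_size, lam)
    flags = []
    for x in data:
        m_counts[x] -= 1              # tentatively move x from M to A
        new_ll = log_likelihood(m_counts, n_m - 1, n_a + 1, vocab_size, lam)
        if new_ll - ll > threshold:
            flags.append(True)        # keep x in A
            n_m, n_a, ll = n_m - 1, n_a + 1, new_ll
        else:
            flags.append(False)
            m_counts[x] += 1          # x is normal: put it back into M
    return flags
```

This version recomputes the full log likelihood at every step for clarity; an efficient variant would update only the terms affected by the moved element.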
Experiments
• Two Sets of experiments:
– Measured Performance against comparison
methods over noisy data.
– Measured Performance trained over noisy data
against comparison methods trained over clean
data.
AD Using Artificial Anomalies
• Generate abnormal behavior artificially
– assume the given normal data are representative
– “near misses” of normal behavior are considered
abnormal
– change the value of only one feature in an instance
of normal behavior
– sparsely represented values are sampled more
frequently
– “near misses” help define a tight boundary
enclosing the normal behavior
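A minimal sketch of generating such near misses from categorical connection records. It is one possible reading of the bullets above, not the project's generator: the function name, the inverse-frequency weighting used to favor sparse values, and the example records are assumptions.

```python
import random
from collections import Counter
from typing import List, Tuple


def make_artificial_anomalies(normal: List[Tuple], n_samples: int,
                              seed: int = 0) -> List[Tuple]:
    """Create near misses by changing exactly one feature of a normal
    instance, keeping only results that do not occur in the normal data."""
    rng = random.Random(seed)
    normal_set = set(normal)
    n_features = len(normal[0])
    values = [sorted({row[f] for row in normal}) for f in range(n_features)]
    counts = [Counter(row[f] for row in normal) for f in range(n_features)]
    anomalies: List[Tuple] = []
    attempts = 0
    while len(anomalies) < n_samples and attempts < 100 * n_samples:
        attempts += 1
        f = rng.randrange(n_features)
        if len(values[f]) < 2:
            continue                   # no alternative value to switch to
        # Inverse-frequency weights: every distinct value of feature f is
        # equally likely to be drawn, which oversamples sparse values.
        weights = [1.0 / counts[f][row[f]] for row in normal]
        base = rng.choices(normal, weights=weights, k=1)[0]
        new_value = rng.choice([v for v in values[f] if v != base[f]])
        candidate = base[:f] + (new_value,) + base[f + 1:]
        if candidate not in normal_set:
            anomalies.append(candidate)
    return anomalies


# Illustrative records: (protocol, service, flag).
NORMAL = ([("tcp", "http", "SF")] * 5
          + [("udp", "dns", "SF")] * 2
          + [("tcp", "ftp", "REJ")])
```

The generated tuples can then be labeled "anomaly" and mixed with the normal data to train a classifier.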
Experimental Results
• Learning algorithm: RIPPER rule learner.
• Data: 1998/99 DARPA evaluation
– U2R, R2L, DOS, PRB: 22 “clusters”
• Training data: normal and artificial anomalies
• Results
– Overall hit rate: 94.26% (correctly classified as normal or intrusion)
– Overall false alarm rate: 2.02%
– 100% detection: buffer_overflow, guess_passwd, phf,
back
– 0% detection: perl, spy, teardrop, ipsweep, nmap
– 50+% detection: 13 out of 22 intrusion subclasses
Combining Anomaly And Misuse
Detection
• Training data: normal, artificially generated
anomalies, known intrusions
• The learned model can predict normal,
anomaly, or known intrusion subclass
• Experiments were performed on increasing
subsets of known intrusion subclasses in the
training data (simulates identified intrusions
over time).
Combining Anomaly And Misuse
Detection (continued)
• Treat phf, pod, teardrop, spy, and smurf
as unknown (absent from the training data)
• Anomaly detection rate: phf=25%,
pod=100%, teardrop=93.91%, spy=50%,
smurf=100%
• Overall false alarm rate: 0.20%
• The false alarm rate dropped from 2.02%
to 0.20% when some known attacks were
included in training
Collaboration with Industry
• RST Inc.
– Anomaly detection on NT systems
• NFR Inc.
– real-time IDS
• SAS Institute
– off-line ID (funded by SAS)
• Aprisma (Cabletron)
– Integrating ID with NM (funded by Aprisma)
• HRL Labs
– ID in wireless networks (funded by HRL)
Plan for 2000-2001
• Dynamic cost-sensitive modeling and
deployment
– work with industry for realistic cost analysis
and real-time testing
• Anomaly detection
– improve existing algorithms using feedback
from evaluation
• Correlation
– develop/evaluate algorithms for integrating
data/evidence from multiple sources