One Class Support Vector Machines for Detecting Anomalous
Download
Report
Transcript One Class Support Vector Machines for Detecting Anomalous
One Class Support Vector Machines
for Detecting Anomalous Windows
Registry Accesses
Krysta Svore
Joint Work with Katherine Heller, Angelos Keromytis, and
Salvatore Stolfo
Anomaly Detection
We would like to be able to detect anomalous
behavior by monitoring Windows registry
queries
We studied a host-based intrusion detection
system that applies a one class support
vector machine algorithm within the Registry
Anomaly Detection (RAD) system
Key Results
We show that the Probabilistic Anomaly
Detection (PAD) algorithm dramatically
outperforms the OCSVM algorithm due to its
hierarchical prior
To improve the performance of the OCSVM
algorithm, we need knowledge on how to
construct well-defined kernels
Host-based Intrusion Detection Systems
(IDS)
Most often attacked operating system is
Microsoft Windows
Combative methods
Virus scanner and security patches
Unable to combat unknown attacks
Requires frequent updates
Host-based IDS
Detects intrusions by monitoring system accesses
Used with data mining techniques for improved
detection ability
Our Approach
Use the Registry Anomaly Detection (RAD)
system to monitor Windows registry queries
Apply a One Class Support Vector Machine
(OCSVM) algorithm within the RAD system
Compare our system with previous work
using the Probabilistic Anomaly Detection
(PAD) algorithm within the RAD system
The Windows Registry
Stores configuration settings for programs,
security information, user profiles, and other
system information
Consists of registry keys and their associated
values
Queried by programs for information by
accessing a specific registry key
Process: EXPLORER.EXE
Query: OpenKey
Key: HKCR\CLSID\B41DB860-8EE4-11D2-9906E49FADC173CA\shellex\MayChangeDefaultMenu
Response: SUCCESS
ResultValue: NOTFOUND
The Windows Registry
Activity varies
from several
accesses per
minute to several
thousand
accesses per
minute
Almost all
programs use the
registry
Registry Anomaly Detection (RAD)
System
Three components:
Audit Sensor
Model Generator
Analyzes the registry access
Develops a model of normal behavior
Anomaly Detector
Classifies a new registry access as normal or attack
according to the generated model
Probabilistic Anomaly Detection (PAD)
Algorithm
Developed by E. Eskin et al., 2002
Relies on consistency checks over normal data and
labels a record anomalous if it fails any one of these
tests
First order consistency check
Verifies a value is consistent with observed values of that
feature in the normal data set
5 total checks
Second order consistency check
Determines the conditional probability of a feature value given
another feature value
20 total checks
PAD
Probability of an observed element
P(X = i) = C(Ni + α)/(kα+N)
where N - total number of observations
Ni - number of observations of symbol i
α - “pseudo count” for each observed symbol
k - number of observed symbols
Probability of an unobserved element
P(X = i) = (1-C)/(L-k)
L - number of possible symbols
Scaling factor
C= N/(N+L-k)
An Example
LAN
We expect the probability of seeing a previously
unobserved IP address to be small
Peering Center
We expect the probability of seeing a previously
unobserved IP address to be almost 1
One Class Support Vector Machine
(OCSVM) Algorithm
Maps input data into a high dimensional
feature space
Iteratively finds the maximal margin in the
hyperplane which best separates the training
data from the origin
Solves optimization problem to find rule f with
maximal margin
f(x)=‹w,x›+b
If f(x)<0 , label x as anomalous
OCSVM
OCSVM: Kernels
Equivalent to solving the dual quadratic programming
problem
minα (1/2) ∑i,j αiαjK(xi,xj) s.t. 0≤αi≤1/(νl) , ∑i αi = 1
where αi - Lagrange multiplier
v - parameter to control trade-off between distance of
hyperplane from the origin and number of points in
training dataset
l - number of points in training dataset
Kernel function projects input vectors into a feature
space allowing for nonlinear decision boundaries
Feature map:
Kernel Function:
Φ: X → RN
K(xi,xj) = ‹Φ(xi), Φ(xj)›
Efficiency Comparison
Complexity of PAD
Time: O(v 2 R 2 )
Space: O(vR 2 )
where v – number of unique record values
R – number of record components
Complexity of OCSVM
Time: O(dL3 )
Space: O(d(L+T))
where d – number of dimensions
L – number of records in training dataset
T – number of records in test dataset
Experimental Data
Collected training data on Windows NT 4.0
Used approximately 500,000 attack-free
records to train system
Tested system using approximately 300,000
records with 2,000 labeled as attack
Experiments
Used three common kernels
Linear: K(x,y) = (x·y)
Polynomial: K(x,y) = (x·y+1)d , where d is the degree of
the polynomial
Gaussian: K(x,y) = e -║x-y║2/(2σ2) , where σ2 is the
variance
Used two types of feature vectors
Binary
One dimension for every unique entry for each of the five
given record values
Frequency-based
Each feature corresponds to the number of occurrences of
the corresponding record component in the training set
Dimensionality of Binary and Frequencybased Feature Vectors
Binary feature vector
Dimension = Number of possible entries for
feature value 1 * number of possible entries for
feature value 2 * … * number of possible entries
for feature value 5
Frequency-based feature vector
Dimension = 5
Evaluation
Used two statistics to compare the system’s
accuracy
Detection rate
False positive rate
Percentage of attack records that have been correctly
identified
Percentage of normal records that have been mislabeled as
anomalous
Plotted results with a Receiver Operator
Characteristic (ROC) curve
Plots percentage of false positives versus percentage of
true positives
Results
Results
Results
PAD algorithm outperforms OCSVM algorithm due
to the use of a hierarchical prior to estimate the
probabilities
Knows the likelihood of encountering a previously
unencountered feature value
Ability of OCSVM to detect anomalies relies on the
choice of the kernel
Need a novel, well-defined kernel which accounts
for highly discriminative information
Conclusions
Registry activity provides a good platform for
anomaly detection
PAD is more reliable than OCSVM
Currently no known way to learn a “most
optimal” kernel
By analyzing different algorithms, we can
identify what needs to be captured in the
kernel definition
Future Work
Improve kernel definition to incorporate more
discriminative information
Update the model as new data is labeled
Requires an efficient way of remodeling the data
over time
Test our system on the UNIX platform