Lecture17 - The University of Texas at Dallas


Evolving Insider Threat Detection
Pallabi Parveen
Dr. Bhavani Thuraisingham (Advisor)
Dept. of Computer Science, University of Texas at Dallas
Funded by AFOSR
Outline
- Evolving Insider Threat Detection
- Unsupervised Learning
- Supervised Learning
Evolving Insider Threat Detection
[Diagram: system log traces from week i are gathered, features are extracted and selected, and a learning algorithm (unsupervised graph-based anomaly detection, GBAD, or supervised one-class SVM, OCSVM) updates an ensemble of models online; the ensemble is then tested on system traces from week i+1 to flag anomalies.]
Ensemble-Based Stream Mining:
Insider Threat Detection Using Unsupervised Graph-Based Learning
Outline: Unsupervised Learning
- Insider Threat
- Related Work
- Proposed Method
- Experiments & Results
Definition of an Insider
An insider is someone who exploits, or intends to exploit, their legitimate access to assets for unauthorized purposes.
Insider Threat Is a Real Threat
- 2001 Computer Crime and Security Survey:
  - $377 million in financial losses due to attacks
  - 49% of respondents reported incidents of unauthorized network access by insiders
Insider Threat (Continued)
- Approaches to the insider threat:
  - Detection
  - Prevention
- Our detection-based approach:
  - Unsupervised learning: graph-based anomaly detection
  - Ensemble-based stream mining
Related Work
- "Intrusion Detection Using Sequences of System Calls": supervised learning, by Hofmeyr et al. [3]
- "Mining for Structural Anomalies in Graph-Based Data Representations (GBAD) for Insider Threat Detection": unsupervised learning, by Eberle and Holder [1]
- All of these approaches are static: they cannot learn from an evolving data stream.
Related Approaches and Comparison with the Proposed Solution

Techniques Proposed By        Supervised/Unsupervised   Concept-Drift   Insider Threat   Graph-Based
Forrest, Hofmeyr              Supervised                X               √                X
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Our Approach (EIT)            Unsupervised              √               √                √
Why Unsupervised Learning?
- One approach to detecting insider threats is supervised learning, where models are built from training data.
- However, only about 0.03% of the training data is associated with insider threats (the minority class), while 99.97% is associated with non-threats (the majority class).
- Unsupervised learning is an alternative that avoids training on such severely imbalanced labels.
Why Stream Mining?
- Static approaches cannot learn from an evolving data stream.
- As new data chunks arrive, the decision boundary shifts; instances near the previous boundary fall victim to concept drift, so a stale model misclassifies both normal and anomalous data.
[Diagram: a data stream of chunks showing the previous and current decision boundaries, with normal and anomalous instances that are victims of concept drift.]
Proposed Method
- Graph-based anomaly detection (GBAD, unsupervised learning) [2]
- Ensemble-based stream mining
GBAD Approach
- Determine the normative pattern S using the SUBDUE minimum description length (MDL) heuristic, which minimizes:
  M(S, G) = DL(G|S) + DL(S)

Unsupervised Pattern Discovery
- Graph compression and the minimum description length (MDL) principle: the best graphical pattern S minimizes the description length of S plus the description length of the graph G compressed with pattern S:
  min_S ( DL(S) + DL(G|S) )
- The description length DL(S) is the minimum number of bits needed to represent S (SUBDUE).
- Compression can be based on inexact matches to the pattern.
[Diagram: a graph compressed by replacing repeated occurrences of substructures S1 and S2 with single placeholder vertices.]
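The MDL heuristic above can be illustrated with a toy calculation. This is a minimal sketch, not GBAD's actual encoding: description length is crudely approximated as one unit per vertex and edge, and compressing G with S collapses each occurrence of S to a single placeholder vertex. All function names are illustrative.

```python
# Toy illustration of the MDL heuristic M(S, G) = DL(G|S) + DL(S).
# Assumption: DL is approximated by vertex + edge counts, and each
# occurrence of S in G collapses to one placeholder vertex.

def dl(vertices, edges):
    """Crude description length: one unit per vertex and per edge."""
    return vertices + edges

def mdl_score(s_vertices, s_edges, g_vertices, g_edges, occurrences):
    """M(S,G) = DL(G|S) + DL(S) under the toy compression model."""
    # Each occurrence of S collapses to one vertex and loses its edges.
    compressed_v = g_vertices - occurrences * (s_vertices - 1)
    compressed_e = g_edges - occurrences * s_edges
    return dl(compressed_v, compressed_e) + dl(s_vertices, s_edges)

# A 3-vertex/2-edge pattern occurring 5 times in a 40-vertex, 50-edge
# graph compresses the graph better (lower score) than a rarer pattern.
frequent = mdl_score(3, 2, 40, 50, occurrences=5)
rare = mdl_score(2, 1, 40, 50, occurrences=2)
```

The pattern with the lowest score is taken as the normative substructure; deviations from it are candidate anomalies.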
Three Types of Anomalies
Three algorithms handle the different anomaly categories using graph compression and the minimum description length (MDL) principle:
1. GBAD-MDL finds anomalous modifications
2. GBAD-P (Probability) finds anomalous insertions
3. GBAD-MPS (Maximum Partial Substructure) finds anomalous deletions
Example of a Graph with a Normative Pattern and the Different Types of Anomalies
[Diagram: a normative structure of labeled vertices (A, B, C, D, E, G) repeated across the graph, with one occurrence containing an insertion (GBAD-P), one a modification (GBAD-MDL), and one a deletion (GBAD-MPS).]
Proposed Method
- Graph-based anomaly detection (GBAD, unsupervised learning)
- Ensemble-based stream mining

Characteristics of a Data Stream
- Continuous flow of data
- Examples: network traffic, sensor data, call center records
Data Stream Classification
- Single-model incremental classification
- Ensemble-model-based classification
- The ensemble-based approach is more effective than the incremental approach.
Ensemble of Classifiers
[Diagram: an input x is classified by three classifiers, C1 (+), C2 (+), and C3 (-); majority voting over the individual outputs yields the ensemble output +.]
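The voting step pictured above can be sketched in a few lines. The +1/-1 label encoding and the tie-breaking rule are assumptions for illustration, not part of the original design:

```python
# Minimal sketch of K-model majority voting: each classifier votes
# +1 (normal) or -1 (anomaly); the ensemble output is the majority.
def ensemble_vote(votes):
    # Assumption: a tie is resolved toward anomaly (-1).
    return 1 if sum(votes) > 0 else -1

# Two of three classifiers vote normal, so the ensemble says normal.
result = ensemble_vote([+1, +1, -1])
```

With an odd number of models (as in the K = 3 example above), ties cannot occur.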
Proposed Ensemble-Based Insider Threat Detection (EIT)
- Maintain K GBAD models, each with q normative patterns
- Majority voting
- Updated ensemble:
  - Always maintain K models
  - Drop the least accurate model

Ensemble-Based Classification of Data Streams (Unsupervised Learning, GBAD)
- Build a model (with q normative patterns) from each data chunk
- Keep the best K such models as the ensemble
- Example: K = 3
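The chunk-by-chunk loop can be sketched as follows. `train` and `score` are stand-in callables (GBAD model construction and model evaluation are not reproduced here), so this shows only the best-K bookkeeping:

```python
# Sketch of the chunked stream loop: build one model per chunk, rank all
# candidate models on the newest chunk, and keep only the best K.
# Assumption: `train(chunk)` returns a model and `score(model, chunk)`
# returns a quality value where higher is better.
def stream_ensemble(chunks, train, score, k):
    ensemble = []
    for chunk in chunks:
        ensemble.append(train(chunk))  # model from the newest chunk
        ranked = sorted(ensemble, key=lambda m: score(m, chunk), reverse=True)
        ensemble = ranked[:k]          # keep the best K models
    return ensemble
```

The ensemble therefore adapts as the stream evolves: a new chunk can displace an older, less accurate model.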
[Diagram: data chunks D1, D2, ..., D543 each yield a model with normative patterns (C1, C2, ..., C543); the ensemble keeps the best three (e.g., C1, C42, C53), is tested on chunk D654 to produce a prediction, and is then updated.]
EIT-U Pseudocode

Ensemble(Ensemble A, test graph t, chunk S)

LABEL/TEST WITH THE NEW MODEL:
1: Compute a new model with q normative substructures using GBAD from S
2: Add the new model to A
3: For each model M in A
4:   For each class/normative substructure q in M
5:     Results1 ← Run GBAD-P with test graph t and q
6:     Results2 ← Run GBAD-MDL with test graph t and q
7:     Results3 ← Run GBAD-MPS with test graph t and q
8:     Anomalies ← Parse(Results1, Results2, Results3)
     End For
   End For
9: For each anomaly N in Anomalies
10:   If more than half of the models agree
11:     AgreedAnomalies ← N
12:     Add 1 to the incorrect count of each disagreeing model
13:   Add 1 to the correct count of each agreeing model
    End For

UPDATE THE ENSEMBLE:
14: Remove the model with the lowest correct/(correct + incorrect) ratio
End Ensemble
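The ensemble-update step (line 14) can be sketched directly from the per-model `correct`/`incorrect` counts the pseudocode maintains. The dict representation of a model is an assumption for illustration:

```python
# Sketch of EIT-U line 14: drop the model with the lowest
# correct/(correct + incorrect) ratio. Models are represented here as
# dicts holding the running vote-agreement counts from lines 12-13.
def prune_ensemble(models):
    def accuracy(m):
        total = m["correct"] + m["incorrect"]
        return m["correct"] / total if total else 0.0
    worst = min(models, key=accuracy)
    return [m for m in models if m is not worst]

models = [{"correct": 8, "incorrect": 2},   # 0.8 accuracy, kept
          {"correct": 3, "incorrect": 7}]   # 0.3 accuracy, dropped
kept = prune_ensemble(models)
```

After pruning, the newly trained model added in line 2 takes the freed slot, so the ensemble size stays at K.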
Experiments
- 1998 MIT Lincoln Laboratory dataset
- 500,000+ vertices
- K = 1, 3, 5, 7, 9 models
- q = 5 normative substructures per model/chunk
- 9 weeks of data; each chunk covers 1 week
A Sample System Call Record from the MIT Lincoln Dataset
[Diagram: a token sub-graph linking a token vertex to <Return Value>, <Path>, <Date>, <User Audit ID>, <Call>, <Arguments>, <Process ID>, and <Terminal> vertices.]
Performance
Total Ensemble Accuracy

# of Models    True Positives   False Positives   False Negatives
Normal GBAD    9                920               0
K=3            9                188               0
K=5            9                180               0
K=7            9                179               0
K=9            9                150               0
Performance (contd.)
- 0 false negatives
- Significant decrease in false positives as the number of models increases
- False positives decrease slowly after K = 3
Performance (contd.)
[Figure: distribution of false positives.]
Performance (contd.)
Summary of Datasets A & B

Entry           Dataset A   Dataset B
User            Donaldh     William
# of vertices   269         1283
# of edges      556         469
Week            2-8         4-7
Day             Friday      Thursday
Performance (contd.)
[Figures, for fixed K = 6 on dataset A: the effect of q on TP rates, on FP rates, and on runtime; true positives vs. number of normative substructures.]
Performance (contd.)
[Figures, for fixed q = 4 on dataset A: the effect of K on TP rates and on runtime.]
Evolving Insider Threat Detection Using Supervised Learning

Outline: Supervised Learning
- Related Work
- Proposed Method
- Experiments & Results
Related Approaches and Comparison with the Proposed Solutions

Techniques Proposed By        Supervised/Unsupervised   Concept-Drift   Insider Threat   Graph-Based
Liu                           Unsupervised              X               √                X
Holder (GBAD)                 Unsupervised              X               √                √
Masud, Fan (Stream Mining)    Supervised                √               N/A              N/A
Forrest, Hofmeyr              Supervised                X               √                X
Our Approach (EIT-U)          Unsupervised              √               √                √
Our Approach (EIT-S)          Supervised                √               √                X
Why a One-Class SVM?
- Insider threat data is the minority class.
- Traditional support vector machines (SVMs) trained on such an imbalanced dataset are likely to perform poorly on test data, especially on the minority class.
- One-class SVMs (OCSVM) address the rare-class issue by building a model from only normal (i.e., non-threat) data.
- During testing, test data is classified as normal or anomalous based on its geometric deviation from the model.
Proposed Method
- One-class SVM (OCSVM), supervised learning
- Ensemble-based stream mining

One-Class SVM (OCSVM)
- Maps the training data into a high-dimensional feature space (via a kernel).
- Then iteratively finds the maximal-margin hyperplane that best separates the training data from the origin, corresponding to the classification rule:
  f(x) = <w, x> + b
  where w is the normal vector and b is a bias term.
- For testing: if f(x) < 0, we label x as an anomaly; otherwise, as normal data.
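As an illustration of this decision rule, the following sketch uses scikit-learn's `OneClassSVM` (an assumed tooling choice; the original work does not specify an implementation). The model is fit on normal points only and labels test points +1 (normal) or -1 (anomaly) by the sign of the decision function:

```python
# Illustrative one-class SVM: train on "normal" data only, then flag
# test points that deviate geometrically from the learned region.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # non-threat data

# nu bounds the fraction of training points treated as outliers;
# the rbf kernel maps data into a high-dimensional feature space.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(normal)

far_outlier = np.array([[8.0, 8.0]])  # far from all training data
label = model.predict(far_outlier)    # -1 => anomaly, +1 => normal
```

The choice of `nu` trades false positives against false negatives, which matters here given the extreme class imbalance discussed above.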
Proposed Ensemble-Based Insider Threat Detection (EIT)
- Maintain K OCSVM (one-class SVM) models
- Majority voting
- Updated ensemble:
  - Always maintain K models
  - Drop the least accurate model

Ensemble-Based Classification of Data Streams (Supervised Learning)
- Divide the data stream into equal-sized chunks
- Train a classifier from each data chunk
- Keep the best K OCSVM classifiers as the ensemble
- Example: K = 3
[Diagram: labeled data chunks D1, D2, ..., D543 each train a classifier (C1, C2, ..., C543); the ensemble keeps the best three (e.g., C1, C42, C53) and predicts labels for the unlabeled chunk D654. This addresses infinite length and concept drift.]
EIT-S Pseudocode (Testing)

Algorithm 1: Testing
Input: A ← Build-initial-ensemble()
       Du ← latest chunk of unlabeled instances
Output: Prediction/label of Du

1: Fu ← Extract&Select-Features(Du)   // feature set for Du
2: for each xj ∈ Fu do
3:   Results ← NULL
4:   for each model M in A
5:     Results ← Results ∪ Prediction(xj, M)
     end for
6:   Anomalies ← MajorityVoting(Results)
   end for
EIT-S Pseudocode (Ensemble Update)

Algorithm 2: Updating the classifier ensemble
Input: Dn: the most recently labeled data chunk
       A: the current ensemble of the best K classifiers
Output: an updated ensemble A

1: for each model M ∈ A do
2:   Test M on Dn and compute its expected error
3: end for
4: Mn ← newly trained one-class SVM classifier (OCSVM) from data Dn
5: Test Mn on Dn and compute its expected error
6: A ← best K classifiers from Mn ∪ A based on expected error
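The selection in line 6 of Algorithm 2 reduces to ranking candidate models by expected error. A minimal sketch, where the model identifiers and error values are illustrative placeholders:

```python
# Sketch of Algorithm 2, line 6: keep the K models with the lowest
# expected error measured on the newest labeled chunk Dn.
# `errors` maps a model id to its expected error (lower is better).
def best_k(errors, k):
    return sorted(errors, key=errors.get)[:k]

# The new model Mn competes with the current ensemble on equal terms.
errors = {"M1": 0.20, "M2": 0.05, "Mn": 0.10, "M3": 0.30}
survivors = best_k(errors, 3)  # M3 is dropped
```

Note that the newly trained model Mn is not guaranteed a slot: it survives only if it outperforms an existing ensemble member on Dn.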
Extracted Feature Set
- Time, user ID, machine IP, command, argument, path, return value
- Example record encoded in sparse index:value format:
  1 1:29669 6:1 8:1 21:1 32:1 36:0
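The sparse line above follows the LIBSVM-style "label index:value" convention. A hedged sketch of the encoding step; the index-to-feature mapping shown is an assumption, not the paper's exact feature dictionary:

```python
# Sketch of encoding one system-call record as a sparse LIBSVM-style
# line: "label idx:val idx:val ...", with indices sorted ascending.
def to_libsvm(label, features):
    """features: {index: value} for the non-zero (or present) features."""
    body = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    return f"{label} {body}"

# Hypothetical mapping: index 1 = time-of-day in seconds, 6 and 8 =
# one-hot indicators for a command and a path.
line = to_libsvm(1, {1: 29669, 6: 1, 8: 1})
```

Sparse encoding keeps the per-record footprint small even when the full feature dictionary (commands, paths, terminals, etc.) is large.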
Performance
Updating vs. Non-Updating Stream Approach

                      Updating Stream   Non-Updating Stream
False Positives       13774             24426
True Negatives        44362             33710
False Negatives       1                 1
True Positives        9                 9
Accuracy              0.76              0.58
False Positive Rate   0.24              0.42
False Negative Rate   0.1               0.1
Performance (contd.)
Summary of Dataset A

Entry          Dataset A
User           Donaldh
# of records   189
Week           2-7 (Friday only)

Supervised (EIT-S) vs. Unsupervised (EIT-U) Learning

                      Supervised Learning   Unsupervised Learning
False Positives       55                    95
True Negatives        122                   82
False Negatives       0                     5
True Positives        12                    7
Accuracy              0.71                  0.56
False Positive Rate   0.31                  0.54
False Negative Rate   0                     0.42
Conclusion & Future Work
Conclusion:
- Evolving insider threat detection using stream mining with both unsupervised and supervised learning
Future Work:
- Misuse detection on mobile devices
- Cloud computing for improving processing time
Publications
Conference Papers:
- Pallabi Parveen, Jonathan Evans, Bhavani Thuraisingham, Kevin W. Hamlen, Latifur Khan, "Insider Threat Detection Using Stream Mining and Graph Mining," in Proc. of the Third IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT 2011), October 2011, MIT, Boston, USA (full-paper acceptance rate: 13%).
- Pallabi Parveen, Zackary R. Weger, Bhavani Thuraisingham, Kevin Hamlen, and Latifur Khan, "Supervised Learning for Insider Threat Detection Using Stream Mining," to appear in the 23rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2011), Nov. 7-9, 2011, Boca Raton, Florida, USA (acceptance rate: 30%).
- Pallabi Parveen, Bhavani M. Thuraisingham, "Face Recognition Using Multiple Classifiers," ICTAI 2006: 179-186.
Journal:
- Jeffrey Partyka, Pallabi Parveen, Latifur Khan, Bhavani M. Thuraisingham, Shashi Shekhar, "Enhanced geographically typed semantic schema matching," J. Web Sem. 9(1): 52-70 (2011).
Others:
- Neda Alipanah, Pallabi Parveen, Sheetal Menezes, Latifur Khan, Steven Seida, Bhavani M. Thuraisingham, "Ontology-driven query expansion methods to facilitate federated queries," SOCA 2010, 18.
- Neda Alipanah, Piyush Srivastava, Pallabi Parveen, Bhavani M. Thuraisingham, "Ranking Ontologies Using Verified Entities to Facilitate Federated Queries," Web Intelligence 2010: 332-337.
References
1. W. Eberle and L. Holder, "Anomaly Detection in Data Represented as Graphs," Intelligent Data Analysis, Volume 11, Number 6, 2007. http://ailab.wsu.edu/subdue
2. L. Chen, S. Zhang, L. Tu, "An Algorithm for Mining Frequent Items on Data Stream Using Fading Factor," COMPSAC (2) 2009: 172-177.
3. S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion Detection Using Sequences of System Calls," Journal of Computer Security, vol. 6, pp. 151-180, 1998.
4. M. Masud, J. Gao, L. Khan, J. Han, B. Thuraisingham, "A Practical Approach to Classify Evolving Data Streams: Training with Limited Amount of Labeled Data," Int. Conf. on Data Mining, Pisa, Italy, December 2010.
Thank You