Anomaly Detection Using Data Mining Techniques
Download
Report
Transcript Anomaly Detection Using Data Mining Techniques
Anomaly Detection Using Data
Mining Techniques
Margaret H. Dunham, Yu Meng, Donya
Quick, Jie Huang, Charlie Isaksson
CSE Department
Southern Methodist University
Dallas, Texas 75275
[email protected]
This material is based upon work supported by the National Science Foundation under
Grant No. IIS-0208741
12/9/08, Sandia
National Labs
1
Objectives/Outline
Develop modeling techniques which can
“learn/forget” past behavior of spatiotemporal
stream events. Apply to prediction of anomalous
events.
Introduction
EMM Overview
EMM Applications to Anomaly Detection
EMM Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
2
12/9/08, Sandia
National Labs
3
Outline
Introduction
Motivation
What is an anomaly?
Spatiotemporal Data
Modeling Spatiotemporal Data
EMM Overview
EMM Applications to Anomaly Detection
Emm Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
4
Motivation
A growing number of applications generate streams
of data.
Computer network monitoring data
Call detail records in telecommunications
Highway transportation traffic data
Online web purchase log records
Sensor network data
Stock exchange, transactions in retail chains, ATM
operations in banks, credit card transactions.
Data mining techniques play a key role in modeling
and analyzing this data.
12/9/08, Sandia
National Labs
5
What is Anomaly?
Event that is unusual
Event that doesn’t occur frequently
Predefined event
What is unusual?
What is deviation?
12/9/08, Sandia
National Labs
6
What is Anomaly in Stream Data?
Rare - Anomalous – Surprising
Out of the ordinary
Not outlier detection
No knowledge of data distribution
Data is not static
Must take temporal and spatial values into account
May be interested in sequence of events
Ex: Snow in upstate New York is not an anomaly
Snow in upstate New York in June is rare
Rare events may change over time
12/9/08, Sandia
National Labs
7
Statistical View of Anomaly
Outlier
Data item that is outside the normal
distribution of the data
Identify by Box Plot
Image from Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
12/9/08, Sandia
National Labs
8
Statistical View of Anomaly
Identify by looking at distribution
THIS DOES NOT WORK with stream data
Image from www.wikipedia.org, Normal distribution.
12/9/08, Sandia
National Labs
9
Data Mining View of Anomaly
Classification Problem
Build classifier from training data
Problem is that training data shows what is
NOT an anomaly
Thus an anomaly is anything that is not viewed
as normal by the classification technique
MUST build dynamic classifier
Identify anomalous behavior
Signatures of what anomalous behavior looks
like
Input data is identified as anomaly if it is similar
enough to one of these signatures
Mixed – Classification and Signature
12/9/08, Sandia
National Labs
10
Visualizing Anomalies
Temporal Heat Map (THM) is a visualization technique
for streaming data derived from multiple sensors.
Two dimensional structure similar to an infinite table.
Each row of the table is associated with one sensor
value.
Each column of the table is associated with a point in
time.
Each cell within the THM is a color representation of
the sensor value
Colors normalized (in our examples)
0 – While
0.5 – Blue
1.0 - Red
12/9/08, Sandia
National Labs
11
THM of VoIP Data
VoIP traffic data was provided by Cisco Systems and
represents logged VoIP traffic in their Richardson, Texas
facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17
11:29:11 2003.
12/9/08, Sandia
National Labs
12
Spatiotemporal Stream Data
Records may arrive at a rapid rate
High volume (possibly infinite) of continuous data
Concept drifts: Data distribution changes on the fly
Data does not necessarily fit any distribution
pattern
Multidimensional
Temporal
Spatial
Data are collected in discrete time intervals,
Data are in structured format, <a1, a2, …>
Data hold an approximation of the Markov
property.
12/9/08, Sandia
National Labs
13
Spatiotemporal Environment
Events arriving in a stream
At any time, t, we can view the state
of the problem as represented by a
vector of n numeric values:
Vt = <S1t, S2t, ..., Snt>
V1
S1
S2
…
Sn
S11
S21
…
Sn1
V2
S12
S22
…
Sn2
…
…
…
…
…
Vq
S1q
S2q
…
Snq
Time
12/9/08, Sandia
National Labs
14
Data Stream Modeling
Single pass: Each record is examined at most once
Bounded storage: Limited Memory for storing synopsis
Real-time: Per record processing time must be low
Summarization (Synopsis )of data
Use data NOT SAMPLE
Temporal and Spatial
Dynamic
Continuous (infinite stream)
Learn
Forget
Sublinear growth rate - Clustering
12/9/08, Sandia
National Labs
15 15
MM
A first order Markov Chain is a finite or countably infinite
sequence of events {E1, E2, … } over discrete time
points, where Pij = P(Ej | Ei), and at any time the future
behavior of the process is based solely on the current
state
A Markov Model (MM) is a graph with m vertices or states,
S, and directed arcs, A, such that:
S ={N1,N2, …, Nm}, and
A = {Lij | i 1, 2, …, m, j 1, 2, …, m} and Each arc,
Lij = <Ni,Nj> is labeled with a transition probability
Pij = P(Nj | Ni).
12/9/08, Sandia
National Labs
16
Problem with Markov Chains
The required structure of the MC may not be certain
at the model construction time.
As the real world being modeled by the MC
changes, so should the structure of the MC.
Not scalable – grows linearly as number of events.
Our solution:
Extensible Markov Model (EMM)
Cluster real world events
Allow Markov chain to grow and shrink
dynamically
12/9/08, Sandia
National Labs
17
Outline
Introduction
EMM Overview
EMM Applications to Anomaly
Detection
EMM Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
18
Extensible Markov Model (EMM)
Time Varying Discrete First Order Markov Model
Nodes are clusters of real world states.
Learning continues during application phase.
Learning:
Transition probabilities between nodes
Node labels (centroid/medoid of cluster)
Nodes are added and removed as data arrives
12/9/08, Sandia
National Labs
19
Related Work
Splitting Nodes in HMMs
Create new states by splitting an existing state
M.J. Black and Y. Yacoob,”Recognizing facial expressions in image
sequences using local parameterized models of image motion”, Int. Journal
of Computer Vision, 25(1), 1997, 23-48.
Dynamic Markov Modeling
States and transitions are cloned
G. V. Cormack, R. N. S. Horspool. “Data compression using dynamic
Markov Modeling,” The Computer Journal, Vol. 30, No. 6, 1987.
Augmented Markov Model (AMM)
Creates new states if the input data has never been seen
in the model, and transition probabilities are adjusted
Dani Goldberg, Maja J Mataric. “Coordinating mobile robot group behavior
using a model of interaction dynamics,” Proceedings, the Third
International Conference on Autonomous Agents (agents ’99), Seattle,
Washington
12/9/08, Sandia
National Labs
20
EMM vs AMM
Our proposed EMM model is similar to AMM, but is more
flexible:
EMM continues to learn during the application phase.
State matching is determined using a clustering
technique.
EMM not only allows the creation of new nodes, but
deletion (or merging) of existing nodes. This allows the
EMM model to “forget” old information which may not be
relevant in the future. It also allows the EMM to adapt to
any main memory constraints for large scale datasets.
EMM performs one scan of data and therefore is suitable
for online data processing.
12/9/08, Sandia
National Labs
21
EMM
Extensible Markov Model (EMM): at any time t,
EMM consists of an MM and algorithms to modify
it, where algorithms include:
EMMSim, which defines a technique for matching
between input data at time t + 1 and existing states
in the MM at time t.
EMMIncrement algorithm, which updates MM at
time t + 1 given the MM at time t and classification
measure result at time t + 1.
Additional algorithms may be added to modify the
model or for applications.
12/9/08, Sandia
National Labs
22
EMMSim
Find closest node to incoming event.
If none “close” create new node
Labeling of cluster is centroid/medoid of
members in cluster
Problem
Nearest Neighbhor O(n)
BIRCH O(lg n)
• Requires second phase to recluster
initial
12/9/08, Sandia
National Labs
23
EMMIncrement
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
2/3
2/3
2/21
2/3
1/1
1/2
1/2
N3
N1
1/3
N2
1/1
1/2
1/1
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0.>
12/9/08, Sandia
National Labs
24
EMMDecrement
N1
N3
1/3
1/3
2/2
1/3
N2
1/2
N5
12/9/08, Sandia
National Labs
N1
1/3
N3
1/3
1/6
Delete N2
1/6
1/3
N6
N5
1/6
N6
25
EMM Advantages
Dynamic
Adaptable
Use of clustering
Learns rare event
Scalable:
Growth of EMM is not linear on size of
data.
Hierarchical feature of EMM
Creation/evaluation quasi-real time
Distributed / Hierarchical extensions
12/9/08, Sandia
National Labs
26
Growth of EMM
number of state in model
800
700
600
threshold 0.994
500
threshold 0.995
400
threshold 0.996
300
threshold 0.997
200
threshold 0.998
100
1502
1423
1344
1265
1186
1107
1028
949
870
791
712
633
554
475
396
317
238
159
80
1
0
number of input data (total 1574)
Servent Data
12/9/08, Sandia
National Labs
27
EMM Performance – Growth Rate
Data
Ser
went
Ouse
Sim 0.99
Jaccrd 156
Dice
72
Cosine 11
Ovrlap 2
Jaccrd 56
Dice
40
Cosine 6
Ovrlap 1
12/9/08, Sandia
National Labs
Threshold
0.992 0.994 0.996 0.998
190
268
389
667
92
123
191
389
14
19
31
61
2
3
3
4
66
81
105
162
43
52
66
105
8
10
13
24
1
1
1
1
28
EMM Performance – Growth Rate
12/9/08, Sandia
National Labs
Minnesota Traffic Data
29
Outline
Introduction
EMM Overview
EMM Applications to Anomaly
Detection
EMM Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
30
Datasets/Anomalies
MnDot – Minnesota Department of Transportation
Automobile Accident
Ouse and Serwent – River flow data from England
Flood
Drought
KDD Cup’99
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Intrusion
Cisco VoIP – VoIP traffic data obtained at Cisco
Unusual Phone Call
12/9/08, Sandia
National Labs
31
Rare Event Detection
Detected unusual
weekend traffic pattern
Weekdays Weekend
12/9/08, Sandia
National Labs
Minnesota DOT Traffic Data
32
Our Approach to Detect Anomalies
By learning what is normal, the model can
predict what is not
Normal is based on likelihood of occurrence
Use EMM to build model of behavior
We view a rare event as:
Unusual event
Transition between events states which
does not frequently occur.
Base rare event detection on determining
events or transitions between events that do
not frequently occur.
Continue learning
12/9/08, Sandia
National Labs
33
EMMRare
EMMRare algorithm indicates if the current
input event is rare. Using a threshold
occurrence percentage, the input event is
determined to be rare if either of the following
occurs:
The frequency of the node at time t+1 is
below this threshold
The updated transition probability of the MC
transition from node at time t to the node at
t+1 is below the threshold
12/9/08, Sandia
National Labs
34
Determining Rare
Occurrence Frequency (OFc) of a node Nc :
OFc =
CNc
CN
i
i
Normalized Transition Probability (NTPmn),
from one state, Nm, to another, Nn :
NTPmn = CL
mn
CN
i
i
12/9/08, Sandia
National Labs
36
EMMRare
Given:
• Rule#1: CNi <= thCN
• Rule#2: CLij <= thCL
• Rule#3: OFc <= thOF
• Rule#4: NTPmn <= thNTP
Input: Gt: EMM at time t
i: Current state at time t
R= {R1, R2,…,RN}: A set of rules
Output: At: Boolean alarm at time t
Algorithm:
1 Ri = True
At =
0 Ri = False
12/9/08, Sandia
National Labs
37
Rare Event in Cisco Data
12/9/08, Sandia
National Labs
38
Risk assessment
Problem: Mitigate false alarm rate while maintaining a
high detection rate.
Methodology:
Historic feedbacks can be used as a free resource to
take out some possibly safe anomalies
Combine anomaly detection model and user’s
feedbacks.
Risk level index
Evaluation metrics: Detection rate, false alarm rate.
Detection rate
Detection rate = TP/(TP+TN)
False alarm rate
False alarm rate = FP/(TP+FP)
Operational Curve
12/9/08, Sandia
National Labs
39
Reducing False Alarms
•Calculate Risk using historical feedback
•Historical Feedback:
•Count of true alarms:
12/9/08, Sandia
National Labs
40
Detection Rate Experiments
DETECTION RATE OF ANOMALY DETECTION AND RISK ASSESSMENT MODELS
1
0.5
0
16
ANOMALY DETECTION
RISK ASSESSMENT
18
20
22
24
26
28
(a) EUCLIDEAN THRESHOLD FOR CLUSTERING (th)
30
32
1
0.5
0
0
ANOMALY DETECTION
RISK ASSESSMENT
0.1
1
0.3
0.4
0.5
0.6
0.7
0.8
(b) RISK ASSESSMENT WEIGHT FACTOR (alpha)
0.9
1
ANOMALY DETECTION
RISK ASSESSMENT
0.5
0
0
0.2
50
100
150
200
250
300
(c) EMM STATE CARDINALITY THRESHOLD (thNode)
350
400
1
0.5
0
0
12/9/08, Sandia
National Labs
ANOMALY DETECTION
RISK ASSESSMENT
50
100
150
200
250
(d) EMM TRANSITION CARDINALITY THRESHOLD (thLink)
300
41
False Alarm Rate
FALSE ALARM RATE OF ANOMALY DETECTION AND RISK ASSESSMENT MODELS
1
ANOMALY DETECTION
RISK ASSESSMENT
0.5
0
16
18
1
20
22
24
26
28
(a) EUCLIDEAN THRESHOLD FOR CUSTERING (th)
0.1
1
1
0.5
0
0
12/9/08, Sandia
National Labs
0.2
0.3
0.4
0.5
0.6
0.7
0.8
(b) RISK ASSESSMENT WEIGHT FACTOR (alpha)
0.9
1
ANOMALY DETECTION
RISK ASSESSMENT
0.5
0
0
32
ANOMALY DETECTION
RISK ASSESSMENT
0.5
0
0
30
50
100
150
200
250
300
(c) EMM STATE CARDINALITY THRESHOLD (thNode)
350
400
ANOMALY DETECTION
RISK ASSESSMENT
50
100
150
200
250
(d) EMM TRANSITION CARDINALITY THRESHOLD (thLink)
300
42
Outline
Introduction
EMM Overview
EMM Applications to Anomaly
Detection
EMM Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
43
Research Objectives
Apply proven spatiotemporal modeling technique to seismic
data
Construct EMM to model sensor data
Local EMM at location or area
Hierarchical EMM to summarize lower level models
Represent all data in one vector of values
EMM learns normal behavior
Develop new similarity metrics to include all sensor data types
(Fusion)
Apply anomaly detection algorithms
12/9/08, Sandia
National Labs
44
Input Data Representation
•Vector of sensor values (numeric) at
precise time points or aggregated over
time intervals.
•Need not come from same sensor
types.
•Similarity/distance between vectors
used to determine creation of new nodes
in EMM.
12/9/08, Sandia
National Labs
45
EMM with Seismic Data
Input – Wave arrivals (all or one per sensor)
Identify states and changes of states in seismic
data
Wave form would first have to be converted into a
series of vectors representing the activity at
various points in time.
Initial Testing with RDG data
Use amplitude, period, and wave type
12/9/08, Sandia
National Labs
46
New Distance Measure
Data = <amplitude, period, wave type>
Different wave type = 100% difference
For events of same wave type:
50% weight given to the difference in amplitude.
50% weight given to the difference in period.
If the distance is greater than the threshold, a state
change is required.
amplitude =
| amplitudenew – amplitudeaverage | / amplitudeaverage
period =
| periodnew – periodaverage | / periodaverage
12/9/08, Sandia
National Labs
47
EMM with Seismic Data
States 1, 2, and
3 correspond to
Noise, Wave A,
and Wave B
respectively.
12/9/08, Sandia
National Labs
48
Preliminary Testing
RDG data February 1, 1981 – 6
earthquakes
Find transition times close to known
earthquakes
9 total nodes
652 total transitions
Found all quakes
12/9/08, Sandia
National Labs
49
Outline
Introduction
EMM Overview
EMM Applications to Anomaly
Detection
EMM Applications to Nuclear Testing
Future Work
12/9/08, Sandia
National Labs
50
Ongoing/Future Work
Extend to Emerging Patterns
Extend to Hierarchical/Distributed
Yu Su
Test with more data – KDD Cup
Compare to other approaches
Charlie Isaksson
Apply to nuclear testing
12/9/08, Sandia
National Labs
51
Hierarchical EMM
Summary
EMM
Regional
EMM
Local
EMM
12/9/08, Sandia
National Labs
Local
EMM
Regional
EMM
Local
EMM
Local
EMM
Local
EMM
52
12/9/08, Sandia
National Labs
53
References
Zhigang Li and Margaret H. Dunham, “ STIFF: A Forecasting Framework for Spatio-Temporal
Data”, Proceedings of the First International Workshop on Knowledge Discovery in Multimedia
and Complex Data, May 2002, pp 1-9.
Zhigang Li, Liangang Liu, and Margaret H. Dunham, “ Considering Correlation Between Variables
to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp
519-531.
Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE
ICDM Conference, November 2004, pp 371-374.
Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic
Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in
Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data
Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic
Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June
2006, pp 258-265.
Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,”
Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium,
November 26, 2007, Shreveport Louisiana.
12/9/08, Sandia
National Labs
54