Project Schedule
Net-Centric Software and Systems I/UCRC
Self-Detection of Abnormal Event Sequences
Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan
Date: April 7, 2011
Copyright © 2011 NSF Net-Centric I/UCRC.
All Rights Reserved.
2010/Current Project Overview
Self-Detection of Abnormal Event Sequences
Tasks:
1. Prepare Cisco event sequence data for
analysis tools.
2. Develop clustering, local outlier factor (LOF), and
probabilistic finite state automata (PFSA)
based techniques for anomaly detection.
3. Apply the techniques on Cisco datasets,
analyze and validate the results.
4. Use streaming techniques, parallelization,
and prefix tree method to handle large
datasets from Cisco.
5. Enhance the anomaly detection tools for
on-the-fly anomaly detection.
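The automata-based detection named in Task 2 can be illustrated with a toy probabilistic finite state automaton. This is a minimal sketch, not the MDI library used in the project: the states, events, probabilities, and threshold below are hypothetical values chosen only for illustration. It scores an event sequence by its average log-probability per event and flags sequences that score below a threshold or take a transition the automaton does not define.

```python
import math

# Hypothetical PFSA: state -> {event: (next_state, probability)}.
PFSA = {
    "idle":    {"setup": ("ringing", 0.9), "error": ("idle", 0.1)},
    "ringing": {"answer": ("talking", 0.8), "cancel": ("idle", 0.2)},
    "talking": {"hangup": ("idle", 1.0)},
}

def avg_log_prob(sequence, start="idle"):
    """Average log-probability per event; -inf if a transition is undefined."""
    state, total = start, 0.0
    for event in sequence:
        if event not in PFSA[state]:
            return float("-inf")
        state, p = PFSA[state][event]
        total += math.log(p)
    return total / len(sequence) if sequence else 0.0

def is_anomalous(sequence, threshold=-1.0):
    """Flag a sequence whose average log-probability is below the (assumed) threshold."""
    return avg_log_prob(sequence) < threshold
```

In a learned automaton the transition probabilities would be estimated from training traces rather than written by hand.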
Research Goals:
1. Develop a diverse set of anomaly detection
techniques for handling datasets with
different characteristics.
2. Handle large datasets, which remain a major
issue in current data mining research,
especially for attributed event
sequences.
3. Develop run-time anomaly detection
techniques to detect non-crashing faults in
deployed systems to mitigate critical
failures and ensure software reliability.
Project Schedule:
Task 1: preprocessor
Task 1/2/3/4/5: fine tuning
Task 2/3/4: various anomaly detection techniques and applying them
Task 5: on-the-fly detection
(Gantt chart; the timeline axis runs monthly from April 2010 to April 2011.)
Benefits to Industry Partners:
1. A comprehensive set of techniques and tools
to allow best analysis of different datasets.
2. Real-time on-the-fly anomaly detection
capability.
3. Rapid adaptation of the tools to handle other
application-specific datasets.
Project Results to Date
Task / Status / Progress and Accomplishment
1. Prepare Cisco event sequence data for analysis tools.
Progress: Use lex/yacc to implement a flexible preprocessor. Refine the preprocessor to eliminate noisy data.
2. Develop clustering, local outlier factor (LOF), and probabilistic finite state automata (PFSA) based techniques for anomaly detection.
Progress: Completed the programs for clustering and LOF, and adapted the MDI (minimal divergence inference) library for state-based anomaly detection.
3. Apply the techniques on Cisco datasets, analyze and validate the results.
Progress: The results show high precision and recall in identifying injected anomalies. Currently working with Cisco on result validation.
4. Use streaming techniques, parallelization, and prefix tree method to handle large datasets from Cisco.
Progress: Invented the prefix tree based approach, which facilitates the analysis of large datasets and reduces processing time by more than 20-fold.
5. Enhance the anomaly detection tools for on-the-fly anomaly detection.
Progress: Completed MDI and LOF approaches to detect anomalies on-the-fly. Updating preprocessor, filters, and diagnostic output. Need to integrate the software into Cisco Call Manager for online analysis.
Status legend: Significant Finding/Accomplishment; Task Complete; Task Partially Complete; Task Not Started
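Precision and recall for Task 3 are measured against the injected anomalies. A minimal sketch of that calculation follows; the flow IDs are hypothetical placeholders, not project data or results.

```python
def precision_recall(flagged, injected):
    """Precision and recall of flagged items against known injected anomalies."""
    flagged, injected = set(flagged), set(injected)
    true_pos = len(flagged & injected)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(injected) if injected else 0.0
    return precision, recall

# Hypothetical flow IDs for illustration only.
injected = {3, 8, 15}
flagged = {3, 8, 21, 40}
precision, recall = precision_recall(flagged, injected)
```

Precision is the fraction of flagged flows that really were injected anomalies; recall is the fraction of injected anomalies that were flagged.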
Major Accomplishments, Discoveries, and Surprises
Various Methods for Comparison & Integration:
Anomaly Detection for Event Sequences
1. Clustering: prefix-tree based K-Medoid
2. Density: prefix-tree based LOF
3. Automata: MDI, using prefix tree traces as input
Diagram labels: Developed; Tool Optimized & Added Anomaly Detection Capability
Real Time Processing Method:
Interval t to t+T: Collect D_t, Build A_{t-T}, Apply A_{t-2T}
Interval t+T to t+2T: Collect D_{t+T}, Build A_t, Apply A_{t-T}
Interval t+2T to t+3T: Collect D_{t+2T}, Build A_{t+T}, Apply A_t
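The real-time scheme can be read as a staggered pipeline: in each interval of length T, new data is collected, a model is built from the previous interval's data, and the model built one interval earlier is applied. The sketch below simulates that reading sequentially. The model here (a table of previously seen sequences) is a hypothetical stand-in for MDI or LOF, and applying the older model to the incoming window is an assumption; the slide does not say which data each model is applied to.

```python
from collections import Counter

def build_model(window):
    """Hypothetical model: frequency table of the event sequences in one window."""
    return Counter(tuple(seq) for seq in window)

def apply_model(model, window):
    """Flag sequences the model never saw (a stand-in for MDI or LOF scoring)."""
    return [tuple(seq) for seq in window if tuple(seq) not in model]

def pipeline(windows):
    """Yield (k, flagged) per interval: Collect D_k, Build A_{k-1}, Apply A_{k-2}."""
    collected = []  # D_0, D_1, ...
    models = {}     # j -> A_j, the model built from D_j
    for k, window in enumerate(windows):
        collected.append(window)                           # Collect D_k
        if k >= 1:
            models[k - 1] = build_model(collected[k - 1])  # Build A_{k-1}
        # Apply A_{k-2}; no model is available during the first two intervals.
        flagged = apply_model(models[k - 2], window) if k >= 2 else []
        yield k, flagged
```

In a deployed system the three steps would run concurrently within each interval; the loop above only shows which model is available at which time.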
Prefix Tree Based Methods
Experimental Results:
Data Set: 2 GB of Cisco SDL trace logs (197,628 signal flows
with 18 manually injected anomalies). Experiments were conducted on a
PC with an Intel Core i5 Duo 2.67 GHz CPU and 8 GB RAM.
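The idea behind the prefix tree based methods is that many signal flows share common prefixes, so identical flows can be stored once with a count and analyzed once. The following is a minimal sketch of such a tree, not the project's implementation; the class and function names are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # event -> Node
        self.count = 0      # number of traces passing through this node
        self.ends = 0       # number of traces ending exactly here

def build_prefix_tree(traces):
    """Insert every trace into a prefix tree, counting shared prefixes."""
    root = Node()
    for trace in traces:
        node = root
        node.count += 1
        for event in trace:
            node = node.children.setdefault(event, Node())
            node.count += 1
        node.ends += 1
    return root

def distinct_traces(node, prefix=()):
    """Yield (trace, multiplicity) for each distinct trace stored in the tree."""
    if node.ends:
        yield prefix, node.ends
    for event, child in node.children.items():
        yield from distinct_traces(child, prefix + (event,))

def node_count(node):
    """Total number of nodes, a rough measure of the compressed size."""
    return 1 + sum(node_count(child) for child in node.children.values())
```

A detector such as K-Medoid or LOF can then run over the distinct traces, weighted by multiplicity, instead of over every raw flow.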
(Figure label: 2nd closest neighbor)
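The "2nd closest neighbor" label presumably refers to the neighborhood size used by the density-based method; that reading is an assumption. The sketch below computes the standard local outlier factor with k = 2 on numeric points. How the project measures distance between event sequences is not stated, so Euclidean distance on hypothetical feature vectors is used here, and the points are assumed to be distinct.

```python
import math

def lof_scores(points, k=2):
    """Local outlier factor for each point (requires len(points) > k, no duplicates)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th closest neighbor
        k_dist.append(kd)
        # The k-neighborhood includes ties at the k-distance.
        neighbors.append([j for j in order if dist[i][j] <= kd])

    def lrd(i):
        """Local reachability density: inverse of the mean reachability distance."""
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    return [
        sum(lrds[j] for j in neighbors[i]) / (len(neighbors[i]) * lrds[i])
        for i in range(n)
    ]
```

Scores near 1 indicate a point whose density matches its neighbors; scores well above 1 indicate a likely outlier.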