Information Visualization for
an Intrusion Detection System
Ching-Lung Fu
James Blustein
Daniel Silver
Overview
Research Objective:
explore / discover factors for building a better
IDS (network based)
Initial stage of our research
Shortcomings of IDS
Spatial Hypertext / visualization
ML & UM + IDS + SH
Recent Update
Revisit the IDS users
Problem Source
Rule based IDS
Machine-learning-based IDS: high error rates,
resulting in a network too restricted to be used, or
an IDS vulnerable to new types of attacks
Training Data imbalance: available “real-attack”
training examples are scarce
A machine learning algorithm needs to “see” enough
examples to generalize to “unseen” future examples
Ambiguous data
Could a human expert do better?
Current Machine Learning algorithms cannot
generalize better than humans
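The training-data imbalance point can be made concrete with a small sketch. The counts below are invented for illustration and do not come from the talk; the "detector" is a hypothetical one that never raises an alarm. With so few attack examples, it still scores near-perfect accuracy while catching no attacks.

```python
# Hypothetical traffic sample: 1 = attack, 0 = normal.
# The counts are invented for illustration only.
labels = [1] * 10 + [0] * 9990

# A trivial "detector" that never raises an alarm.
predictions = [0] * len(labels)

# Overall accuracy: fraction of samples labelled correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Attack recall: fraction of real attacks that were flagged.
attacks = sum(labels)
caught = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = caught / attacks

print(f"accuracy      = {accuracy:.3f}")
print(f"attack recall = {recall:.3f}")
```

Accuracy alone is therefore a poor guide when attack examples are scarce; per-class measures such as recall show the problem.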
Problem Source
High false detections
Preventing immediate response to the real
attacks
User’s trust
Unusable IDS: most system admins now
attend to the problem only after the attack or after
the damage has been done.
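One way to see why high false detections erode trust is to estimate what fraction of alarms are real. The sketch below applies Bayes' rule; the function name and all three rates are assumptions chosen for illustration, not measurements from the talk.

```python
def alarm_precision(attack_rate, detection_rate, false_alarm_rate):
    """Fraction of raised alarms that correspond to real attacks (Bayes' rule)."""
    true_alarms = attack_rate * detection_rate
    false_alarms = (1.0 - attack_rate) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)


# Illustrative numbers only: 1 event in 1000 is an attack, the detector
# catches 99% of attacks, and 1% of normal events trigger a false alarm.
precision = alarm_precision(
    attack_rate=0.001,
    detection_rate=0.99,
    false_alarm_rate=0.01,
)
print(f"{precision:.1%} of alarms are real attacks")
```

Under these assumed rates, fewer than one alarm in ten is a real attack, even though the detector looks accurate on paper.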
Alternative IDS
Reduce the dependence on the detection
mechanism
Visual intelligence
harnessing human abilities
keeps humans “in the loop”, contributing
judgment and sharing some responsibility
personal involvement & empowerment
Alternative IDS
A visualization + machine learning tool
could provide the answer
SH as a visualization mechanism
Information Triage
What is Spatial Hypertext (SH) ?
Graphic workspace with freely manipulable
objects.
Relationship represented by color, proximity,
alignment, containment, etc.
Supports ambiguity & implicit relationships
Examples in the next few pages
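As a rough illustration of relationships expressed by proximity, the sketch below models a workspace of freely placed objects and recovers implicit groups from their positions. The `Card` class, the `proximity_groups` helper, the coordinates, and the distance threshold are all hypothetical; this is not the framework described in the talk.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Card:
    """A freely placed workspace object (hypothetical model)."""
    label: str
    x: float
    y: float
    color: str


def proximity_groups(cards, max_gap):
    """Group cards connected by chains of neighbours within max_gap."""
    groups = []
    unvisited = list(range(len(cards)))
    while unvisited:
        frontier = [unvisited.pop(0)]
        group = []
        while frontier:
            i = frontier.pop()
            group.append(i)
            # Collect unvisited cards close enough to card i.
            near = [
                j for j in unvisited
                if hypot(cards[i].x - cards[j].x, cards[i].y - cards[j].y) <= max_gap
            ]
            for j in near:
                unvisited.remove(j)
            frontier.extend(near)
        groups.append(sorted(cards[k].label for k in group))
    return groups


# Invented example workspace: two related cards placed close together.
workspace = [
    Card("host-A scan", 10, 10, "red"),
    Card("host-A login failures", 14, 12, "red"),
    Card("host-B backup", 80, 75, "green"),
]
groups = proximity_groups(workspace, max_gap=10)
print(groups)
```

No explicit link is ever stored between the two host-A cards; the grouping is implied by where the user placed them, which is the kind of implicit structure the slide refers to.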
SH – example 1
Power of Visualization – example 2
An on-line example
http://www.hivegroup.com/salesforce.html
SH as a visualization mechanism (continued)
Emerging information
Humans have excellent visual intelligence
Able to contain a lot of information
Please see my poster for a new framework
under development
Challenges
The information visualization cannot be effective
if the machine learning components cannot
deliver accurate information
The publicly available testing datasets are not
good enough
Data ambiguity always exists
The ML algorithms are not the bottleneck;
the feature extraction processes are
The ML algorithms may be used to “mine” the
features used directly by visualization tools;
human eyes detect the anomalies
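A minimal sketch of "mining" features for a visualization tool: score each feature by how well it separates known normal traffic from known attack traffic, then hand the top-scoring features to the display as axes. The scoring rule, the feature names, and the sample values are all assumptions made for illustration, not the method used in this research.

```python
from statistics import mean, pstdev


def separation(normal_values, attack_values):
    """Gap between class means, scaled by the overall spread."""
    spread = pstdev(normal_values + attack_values)
    if spread == 0:
        return 0.0
    return abs(mean(normal_values) - mean(attack_values)) / spread


def top_features(normal, attack, k=2):
    """Return the k feature names with the highest separation score."""
    scores = {name: separation(normal[name], attack[name]) for name in normal}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented sample values per feature, for illustration only.
normal = {
    "packets_per_sec": [10, 12, 11, 9],
    "distinct_ports": [2, 3, 2, 3],
    "mean_packet_size": [500, 520, 480, 510],
}
attack = {
    "packets_per_sec": [90, 110, 100, 95],
    "distinct_ports": [40, 55, 60, 45],
    "mean_packet_size": [505, 495, 515, 490],
}

axes = top_features(normal, attack)
print("plot these features:", axes)
```

The algorithm only chooses what to show; deciding whether a point on the resulting plot is an anomaly is left to the person looking at it.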
Revisit the IDS users
Most of them still rely on primitive tools
IDSs are not trusted at all
Respond to problems only after complaints
have been made
Many organizations refused our visit because they do
not have an IDS: “security through obscurity”
Some organizations simply unplug important
systems from the network to avoid unnecessary
exposure
Conclusion
Improve the current ML-based IDS as a
component
Data Mining on features for information
visualization
Spatial Hypertext – a hybrid approach in
which information visualization
complements the IDS
Questions ?
Ching-Lung Fu
Dalhousie Computer Science
<[email protected]>