dc10-burroughs-bayesian


Correlating Network Attacks Using Bayesian Multiple Hypothesis Tracking
Daniel J. Burroughs
Institute for Security Technology Studies
Thayer School of Engineering
Dartmouth College
May 1, 2002
Outline
• Institute for Security Technology Studies
• Needs and goals
• System overview
• Sensor Modeling
• Attacker Modeling
• Hypothesis Management
• Testing and Evaluation
• Summary and Future Work
Institute for Security Technology Studies
• Security and counter-terrorism research center
• Funded by the NIJ
• Main focus is on computer security
• Investigative Research for Infrastructure Assurance (IRIA)
• Joint effort with Thayer School of Engineering
The Internet and Security in a Nutshell
[Figure: separate IDS sensors, each raising its own ALERT!]
What is the Need?
• Distributed and/or coordinated attacks
– Increasing rate and sophistication
• Infrastructure protection
– Coordinated attack against infrastructure
– Attacks against multiple infrastructure components
• Overwhelming amounts of data
– Huge effort required to analyze
– Lots of uninteresting events
What is the System?
• Reorganization of existing data
• Data fusion
• Building situational knowledge
• Not an intrusion detection system
[Figure: RealSecure, SHADOW, and Snort sensors, the Tracking System, and a Security Database]
Network Centered View
• Network viewed in isolation
• Limited view of attacker’s activity
• Defensive posture
Distributed Attack
[Figure: Denial of Service]
Attacker Centered View
• More complete picture
• Information gathering
• Requires cooperation and data fusion
Radar Tracking
• Multiple sensors
• Multiple targets
• Heterogeneous sensors
• Real-time tracking
• Incomplete data
• Inaccurate data
[Figure: RealSecure, Snort, and SHADOW sensors]
Gather and Correlate
• Collecting data
– Time correlation, communications, common formatting, etc.
– These issues are addressed by numerous projects
• IDEF, IDMEF, CIDF, D-Shield, Incidents.org, etc.
• Correlating data
– How can we tell what events are related?
– Attacker’s goals determine behavior
– Multiple hypothesis tracking
Multiple Hypothesis Tracking
• Events analyzed on arrival
• Scenario created
• Alternate hypothesis
[Figure: a port scan followed by a buffer overflow is interpreted either as one attack (Attack 1: Port Scan, then Buffer Overflow) OR as two separate attacks (Attack 1: Port Scan; Attack 2: Buffer Overflow)]
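Hypothesis Evaluation
• Hypotheses are evaluated based on the behaviors of the sensor and the target
• What real-world event caused the given sensor output?
• How likely is it that the target moved to this position?
$$p(t_k, s_k) = \frac{1}{C}\, L_k(y_k \mid s_k)\, p^{-}(t_k, s_k)$$
Here $L_k(y_k \mid s_k)$ is the likelihood of sensor output $y_k$ at time $t_k$ given target state $s_k$, $p^{-}(t_k, s_k)$ is the prior after the motion update, and $C$ is a normalizing constant.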
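The evaluation combines a motion model (how likely the target moved to a state) with a sensor likelihood (how likely the output is in that state), then normalizes. A minimal sketch of one such step over a small discrete state space; the attack stages, transition probabilities and likelihood values are hypothetical and do not come from the talk.

```python
# One Bayesian tracking step over a discrete state space.
# STATES, transition and the likelihood values are hypothetical illustrations.

STATES = ["recon", "exploit", "post-exploit"]

def motion_update(prior, transition):
    """Predicted prior: p_minus[s] = sum over s_prev of q(s | s_prev) * prior[s_prev].

    transition[s_prev][s] is q(s | s_prev); each row sums to 1.
    """
    n = len(prior)
    return [sum(transition[sp][s] * prior[sp] for sp in range(n)) for s in range(n)]

def information_update(predicted, likelihood):
    """Posterior: p[s] = L(y | s) * p_minus[s] / C, with C making the result sum to 1."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    c = sum(unnormalized)
    if c == 0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / c for u in unnormalized]

prior = [1.0, 0.0, 0.0]                 # track known to be in "recon"
transition = [[0.6, 0.4, 0.0],          # recon -> recon / exploit
              [0.0, 0.7, 0.3],          # exploit -> exploit / post-exploit
              [0.0, 0.0, 1.0]]          # post-exploit is absorbing
buffer_overflow_likelihood = [0.1, 0.8, 0.3]  # L(y | s) for a buffer-overflow alert

predicted = motion_update(prior, transition)
posterior = information_update(predicted, buffer_overflow_likelihood)
best = STATES[posterior.index(max(posterior))]
```

Because the motion model allows "recon" to move to "exploit" and the alert is far more likely in "exploit", that stage wins after normalization.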
IDS Overview
• Two methods of intrusion detection
– Signature detection (pattern matching)
• Low false positive / Detects only known attacks
– Statistical anomaly detection
• High false positive / Detects wider range of attacks
• Two domains to be observed
– Network
– Host
Signature Detection vs. Anomaly Detection
• Modeling signature detection is easy
– If a known attack occurred in an observable area,
then p(detection) = 1, else p(detection) = 0
• Modeling anomaly detection is more difficult
– Noisy and/or unusual attacks are more likely to be seen
• Denial of Service, port scans, unused services, etc.
– Other types of attacks may be missed
• Malformed web requests, some buffer overflows, etc.
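The two sensor models can be written as detection-probability functions. A minimal sketch: the signature model follows the rule above directly. The anomaly model's per-attack probabilities are invented placeholders that only encode "noisy attacks are seen more often". A real model would be estimated from sensor data.

```python
# Detection-probability models for the two IDS types.
# KNOWN_SIGNATURES and ANOMALY_DETECTION_RATE are hypothetical placeholder values.

KNOWN_SIGNATURES = {"port_scan", "known_buffer_overflow"}

def p_detect_signature(attack_type: str, observable: bool) -> float:
    """Signature IDS: certain detection of a known attack in its observable area, else none."""
    return 1.0 if observable and attack_type in KNOWN_SIGNATURES else 0.0

# Illustrative rates: noisy or unusual traffic stands out, subtle attacks often do not.
ANOMALY_DETECTION_RATE = {
    "denial_of_service": 0.9,
    "port_scan": 0.8,
    "unused_service": 0.7,
    "buffer_overflow": 0.3,
    "malformed_web_request": 0.2,
}

def p_detect_anomaly(attack_type: str, observable: bool, default: float = 0.5) -> float:
    """Anomaly IDS: a detection probability that depends on how noisy the attack is."""
    if not observable:
        return 0.0
    return ANOMALY_DETECTION_RATE.get(attack_type, default)
```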
Event Measurements
• Minimal feature set is extracted from reports
– Source IP, destination IP
– Source port, destination port
– Type of attack
– Time
• These are then used to describe a hyperspace through which the attack moves
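One way to hold the minimal feature set is a small record whose numeric fields give a point in that hyperspace. This sketch assumes addresses are mapped to integers with Python's standard `ipaddress` module. It also assumes the attack type is kept as a categorical label rather than a coordinate. The class and method names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """Minimal feature set extracted from one IDS report (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    attack_type: str  # categorical; not used as a numeric coordinate here
    time: float       # seconds since an arbitrary epoch

    def coordinates(self) -> tuple:
        """Numeric position of the event in the feature hyperspace."""
        return (
            int(ipaddress.ip_address(self.src_ip)),
            int(ipaddress.ip_address(self.dst_ip)),
            self.src_port,
            self.dst_port,
            self.time,
        )

e = Event("10.0.0.1", "10.0.0.2", 40000, 80, "port_scan", 5.0)
```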
Bayesian Inference
• Forward response of sensor is well known
– Given real-world event x, what is H(x)?
• We need to reason backwards
– Given sensor output H(x), what is x?
• Forward response and prior distribution of x
– Probability of H(x) given x
– Probability of a particular x existing
$$p\big(x \mid H(x)\big) = \frac{L\big(H(x) \mid x\big)\, p(x)}{\int L\big(H(x) \mid x\big)\, p(x)\, dx}$$
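When x is one-dimensional, the denominator of Bayes' rule can be approximated numerically on a grid. A minimal sketch, assuming a uniform prior on [0, 10] and a Gaussian sensor response (the reading equals x plus noise). Both choices are illustrative stand-ins, not the sensor models from the talk.

```python
import math

def posterior_on_grid(xs, prior, likelihood, observed):
    """p(x | h) = L(h | x) p(x) / integral of L(h | x) p(x) dx, on an evenly spaced grid."""
    dx = xs[1] - xs[0]
    unnormalized = [likelihood(observed, x) * prior(x) for x in xs]
    evidence = sum(u * dx for u in unnormalized)  # Riemann-sum approximation of the integral
    return [u / evidence for u in unnormalized]

def gaussian_sensor(h, x, sigma=1.0):
    """Forward response: density of reading h when the true value is x."""
    return math.exp(-0.5 * ((h - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def uniform_prior(x):
    return 0.1  # uniform density on [0, 10]

xs = [i * 0.01 for i in range(1001)]  # grid over [0, 10]
post = posterior_on_grid(xs, uniform_prior, gaussian_sensor, observed=4.0)
map_estimate = xs[post.index(max(post))]
```

With a flat prior the posterior simply mirrors the sensor response around the reading. A non-uniform prior would shift `map_estimate` toward values of x that are more common a priori.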
Attacker Model
• Attackers are not as easy to observe
– Often we are only able to observe them through
the sensors (IDS)
• State of the attack is difficult to describe
• We have three sources of attack data
– Simulation
– Dartmouth / Thayer network
– Def Con
Simulation
• Purely generated data
– Models for generating attack sequences and noise
– Highly controllable – good for development
• Generated attacks with ‘background noise’
– Use Thayer IDS for background noise
– More interesting for testing
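A minimal sketch of the "generated attacks with background noise" idea. Background events arrive as a Poisson process with random sources and ports, and one scripted port scan is embedded among them. The rates, the documentation-range addresses and the single scan scenario are assumptions for illustration. In the talk, the background noise came from the Thayer IDS instead.

```python
import random

def simulate(seed=0, duration=600.0, noise_rate=0.05,
             scan_start=100.0, scan_ports=range(20, 30), scan_gap=0.5):
    """Return (time, source_ip, dst_port, label) tuples sorted by time.

    Background noise: Poisson arrivals at noise_rate events/second from random sources.
    Attack: one sequential port scan from a single (hypothetical) source.
    """
    rng = random.Random(seed)
    events = []
    t = rng.expovariate(noise_rate)  # exponential inter-arrival times give a Poisson process
    while t < duration:
        src = f"198.51.100.{rng.randint(1, 254)}"
        events.append((t, src, rng.randint(1, 65535), "noise"))
        t += rng.expovariate(noise_rate)
    for i, port in enumerate(scan_ports):
        events.append((scan_start + i * scan_gap, "203.0.113.7", port, "scan"))
    events.sort(key=lambda e: e[0])
    return events

events = simulate(seed=42)
```

Keeping the labels lets a tracker's output be scored against the known scenario, which is what makes purely generated data convenient during development.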
Dartmouth / Thayer Network
[Figure: network diagram showing ISTS, SignalQuest, four switches, three Snort sensors, and two SHADOW sensors]
Def-Con Capture-The-Flag
• Hacker game
• Unrealistic data in some aspects
– Lack of stealth, lack of firewall, etc.
• Many attacks, many scenarios
– 16,250 events in 2.5 hours
– 89 individual scenarios
• Classified by Oliver Dain at Lincoln Labs
State Problem
• Desire to describe state as a Markov process
– Reduces computational complexity and space
• Easy for an aircraft, difficult for an attack
– Non-linear, non-contiguous space
[Figure: aircraft state (X, Y, Z; yaw, pitch, roll; position & velocity) versus attack state (?)]
State Problem
• No simple method for describing state
• Use a history of events in the track
– Increases computational complexity
– Increases memory requirements
• Use a weighted window of past events
– Calculate various relationships between past and current events
Windowed History
• Minimum history needed to differentiate state
• Weighting of events to lend more value to recent events
• Relationships calculated between pairs and sequences of events
[Figure: window of past events Xt-6, Xt-5, Xt-4, Xt-3, Xt-2, Xt-1 and the current event Xt]
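The talk does not give the weighting scheme. This sketch assumes exponential decay: the newest event has weight 1 and each older one is multiplied by a decay factor. It also assumes a seven-event window, matching Xt-6 through Xt. The pairwise relationship computed here, the weighted mean gap between consecutive arrival times, is one hypothetical example.

```python
from collections import deque

class WeightedWindow:
    """Keeps the last `size` event times; newer events get larger weights.

    Assumed weighting: the newest event has weight 1 and each older one is
    multiplied by `decay` (the slides do not specify the scheme).
    """

    def __init__(self, size: int = 7, decay: float = 0.5):
        self.events = deque(maxlen=size)  # Xt-6 ... Xt when size is 7
        self.decay = decay

    def add(self, event_time: float) -> None:
        self.events.append(event_time)  # the oldest event drops out automatically

    def weights(self):
        n = len(self.events)
        return [self.decay ** (n - 1 - i) for i in range(n)]

    def weighted_mean_gap(self):
        """Weighted mean of gaps between consecutive events (a pairwise relationship)."""
        times = list(self.events)
        if len(times) < 2:
            return None
        gaps = [b - a for a, b in zip(times, times[1:])]
        w = self.weights()[1:]  # each gap takes the weight of its newer event
        return sum(g * wi for g, wi in zip(gaps, w)) / sum(w)

window = WeightedWindow()
for t in [0.0, 1.0, 2.0, 3.0, 13.0]:
    window.add(t)
mean_gap = window.weighted_mean_gap()  # 5.8: the recent 10 s gap dominates
```

The unweighted mean gap of the same events is 3.25. The weighting lets a recent change in pacing show up in the state summary quickly, without storing the full track history.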
Common History
• Don’t care which path was taken
• Just need to distinguish current state
[Figure: event paths 1a, 1b, 1c leading to State 1 and paths 2a, 2b, 2c leading to State 2]
Predictive Model
• To determine the likelihood that an event belongs in a series, predictive models are needed
• Based on current state, what is the probability distribution for the target motion?
• Different types of attacks have different distributions
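A minimal sketch of per-attack-type predictive distributions. It assumes, hypothetically, that the gap since the track's last event and the distance between source addresses are independent and exponentially distributed. Scanning and distributed DoS get different means. The parameter values are invented for illustration.

```python
import math

def exp_pdf(x: float, mean: float) -> float:
    """Exponential density with the given mean, for x >= 0."""
    return math.exp(-x / mean) / mean

# Hypothetical parameters: mean gap (seconds) since the track's last event and
# mean source-IP distance, per attack type.
MODELS = {
    "scan": {"gap": 1.0, "ip_dist": 1.0},   # one source, steady pacing
    "dos": {"gap": 0.01, "ip_dist": 1e6},   # many (possibly spoofed) sources, rapid fire
}

def predictive_likelihood(attack_type: str, gap: float, ip_dist: float) -> float:
    """Likelihood of the next event under a track of the given type.

    Assumes gap and IP distance are independent given the attack type.
    """
    m = MODELS[attack_type]
    return exp_pdf(gap, m["gap"]) * exp_pdf(ip_dist, m["ip_dist"])

def best_explanation(gap: float, ip_dist: float) -> str:
    return max(MODELS, key=lambda t: predictive_likelihood(t, gap, ip_dist))
```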
Attacker Motion Probability Distributions
[Figure panels: motion update for scanning; motion update for DoS (Denial of Service)]
Events are readily distinguishable based on arrival time and source IP distance
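Both quantities can be computed directly from a pair of reports. This sketch assumes "source IP distance" means the absolute difference between addresses read as integers, which is one plausible reading; the talk does not define it. It uses Python's standard `ipaddress` module, and the function names are hypothetical.

```python
import ipaddress

def source_ip_distance(a: str, b: str) -> int:
    """Absolute difference between two addresses interpreted as integers (assumed metric)."""
    return abs(int(ipaddress.ip_address(a)) - int(ipaddress.ip_address(b)))

def motion_features(prev, curr):
    """prev and curr are (arrival_time, source_ip) pairs; returns (arrival gap, IP distance)."""
    return curr[0] - prev[0], source_ip_distance(prev[1], curr[1])

# Consecutive probes from one scanner: moderate gap, zero distance.
scan_step = motion_features((10.0, "192.0.2.5"), (10.5, "192.0.2.5"))
# Consecutive packets in a spoofed flood: tiny gap, large distance.
dos_step = motion_features((10.0, "192.0.2.5"), (10.001, "203.0.113.9"))
```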
Feature Extraction
• Historical data sets used to determine good differentiating feature sets
• These are used in combination to measure the fitness of new events to scenarios
• Use a neural net to discover complex patterns
Neural Net
• Empirically derived probability distributions work well for simple attacks
– But they are difficult to compute for more complex ones
• Machine learning is applied to solve this
– Neural net feeds from event feature set values
– Fitness function is calculated from this
Neural Net
• Fitness functions created for various feature subsets
– e.g., rate of events vs. IP source velocity
• These values feed a neural net
• The NN then determines an overall fitness value
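A minimal sketch of the combining step. A tiny feedforward network with one hidden layer of sigmoid units maps a vector of sub-fitness values to a single overall fitness in (0, 1). The weights are made-up placeholders, not trained values. In the described system they would be learned from labeled scenarios such as the DefCon data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer with sigmoid activations."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical, untrained parameters: two sub-fitness inputs, two hidden units, one output.
W_HIDDEN = [[2.0, 1.5], [-1.0, 2.5]]
B_HIDDEN = [-1.0, -0.5]
W_OUT = [[1.8, 1.2]]
B_OUT = [-1.5]

def overall_fitness(sub_fitness):
    """sub_fitness: per-feature-subset fitness values in [0, 1], e.g.
    [rate-of-events vs. source-velocity fitness, port-pattern fitness]."""
    hidden = dense(sub_fitness, W_HIDDEN, B_HIDDEN)
    return dense(hidden, W_OUT, B_OUT)[0]

good_fit = overall_fitness([0.9, 0.8])
poor_fit = overall_fitness([0.1, 0.1])
```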
Hypothesis Management
• In the brute-force approach, each new event doubles the number of hypotheses
• Without pruning, complexity grows exponentially
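The growth is easy to quantify. If the count starts at one hypothesis and doubles with every event, then after $n$ events:

$$N_n = 2\,N_{n-1} = 2^{n}, \qquad N_{20} = 2^{20} = 1{,}048{,}576, \qquad N_{30} = 2^{30} = 1{,}073{,}741{,}824$$

Twenty events already give over a million hypotheses, and thirty give over a billion.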
Branch and Prune
• Calculate all possible hypotheses
• Prune back unlikely or completed ones
– Must be very aggressive in pruning
– Many hypotheses are not kept long
• Inefficient method of controlling growth
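A minimal sketch of branch-and-prune. It assumes a hypothesis is a set of tracks and that each new event may join any existing track or start a new one. Pruning simply keeps the `keep` highest-scoring hypotheses, a beam. `toy_score` is a hypothetical stand-in for the Bayesian fitness that rewards tracks whose consecutive events share a source.

```python
import heapq

def branch(hypothesis, event):
    """Yield every child hypothesis: the event joins one existing track or starts a new one.

    A hypothesis is a tuple of tracks; a track is a tuple of events.
    """
    for i in range(len(hypothesis)):
        tracks = list(hypothesis)
        tracks[i] = tracks[i] + (event,)
        yield tuple(tracks)
    yield hypothesis + ((event,),)

def branch_and_prune(events, score, keep=10):
    """Expand every hypothesis on each event, then keep only the `keep` best."""
    hypotheses = [()]
    for event in events:
        children = [child for h in hypotheses for child in branch(h, event)]
        hypotheses = heapq.nlargest(keep, children, key=score)  # the pruning step
    return hypotheses

def toy_score(hypothesis):
    """Hypothetical fitness: reward consecutive same-source events, charge per track."""
    s = 0.0
    for track in hypothesis:
        s -= 0.5
        for a, b in zip(track, track[1:]):
            s += 1.0 if a[1] == b[1] else -2.0
    return s

events = [(0, "A"), (1, "B"), (2, "A"), (3, "B")]  # (time, source) pairs
best = branch_and_prune(events, toy_score)[0]
```

All children are still generated before pruning, which is why this approach controls growth inefficiently.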
Selective Branching
• Often there is a clear winner
– Why bother creating hypotheses for the others?
• Measure the difference between the fitness of the top choice and the fitness of the second choice
• If it is greater than a predetermined threshold, no branching is needed
• The number of branches can be determined with the threshold
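A minimal sketch of the threshold test, assuming each candidate placement for a new event already has a fitness score. If the best candidate beats the runner-up by more than `threshold`, only it is kept. Otherwise every candidate within `threshold` of the best becomes a branch, so the threshold also sets the number of branches. The function name and the "within threshold" rule are illustrative assumptions.

```python
def select_branches(candidates, threshold):
    """candidates: (fitness, option) pairs for placing one new event.

    Returns the options to branch on: only the winner when it leads the
    runner-up by more than `threshold`, otherwise every option whose fitness
    is within `threshold` of the best.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    if not ranked:
        return []
    best = ranked[0][0]
    if len(ranked) == 1 or best - ranked[1][0] > threshold:
        return [ranked[0][1]]  # clear winner: no branching needed
    return [option for fitness, option in ranked if best - fitness <= threshold]
```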
Preprocessing and Multi-pass
• Some sequences of events are simply related
• Port scans
– Noisy
• Many events
• Require many evaluations
– Easily grouped
• Preprocessing groups these into single larger events
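A minimal sketch of the preprocessing step. It assumes the input is already a time-sorted list of port-scan alerts, each given as (time, source, destination, destination port). Alerts with the same source and destination are merged into one aggregate event while the gap between them stays within a hypothetical `max_gap`.

```python
from dataclasses import dataclass, field

@dataclass
class ScanGroup:
    """One aggregate event standing in for many related port-scan alerts."""
    src: str
    dst: str
    start: float
    end: float
    ports: list = field(default_factory=list)

def group_port_scans(events, max_gap=2.0):
    """Merge scan alerts with the same (source, destination) whose gap is <= max_gap.

    events: (time, src, dst, dst_port) tuples, assumed sorted by time.
    """
    open_groups = {}
    finished = []
    for t, src, dst, port in events:
        key = (src, dst)
        group = open_groups.get(key)
        if group is not None and t - group.end <= max_gap:
            group.end = t
            group.ports.append(port)
        else:
            if group is not None:
                finished.append(group)  # close the stale group for this pair
            open_groups[key] = ScanGroup(src, dst, t, t, [port])
    finished.extend(open_groups.values())
    return sorted(finished, key=lambda g: g.start)

alerts = [
    (0.0, "203.0.113.7", "192.0.2.10", 21),
    (0.5, "203.0.113.7", "192.0.2.10", 22),
    (1.0, "198.51.100.3", "192.0.2.10", 80),
    (1.2, "203.0.113.7", "192.0.2.10", 23),
    (10.0, "203.0.113.7", "192.0.2.10", 25),
]
groups = group_port_scans(alerts)
```

Five alerts become three events here. The saving grows with scan length, since each group is evaluated once by the tracker instead of once per probe.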
Multi-Pass Approach
• Develop small attack sequences initially
• Chain sequences together in later passes
– Small sequences become atomic events
• May aid ‘missing data’ problem
[Figure: a first pass groups a, b, c, d into a-b-c-d; f, g, h into f-g-h; and k, l, m into k-l-m. A later pass chains these into a-b-c-d-f-g-h-k-l-m]
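A minimal sketch of the multi-pass idea. It uses only time proximity as the chaining rule, which is a simplification: in the tracker each pass would use hypothesis scoring rather than a fixed gap. The output of one pass becomes the atomic input of the next. The event times are invented so the result reproduces the a-b-c-d / f-g-h / k-l-m example.

```python
def chain(items, max_gap):
    """items: (start, end, labels) tuples sorted by start; merge neighbours <= max_gap apart."""
    merged = []
    for start, end, labels in items:
        if merged and start - merged[-1][1] <= max_gap:
            s, e, l = merged[-1]
            merged[-1] = (s, max(e, end), l + labels)
        else:
            merged.append((start, end, labels))
    return merged

def multi_pass(events, gaps):
    """events: (time, label) pairs; gaps: one max_gap per pass, typically increasing."""
    items = [(t, t, [label]) for t, label in sorted(events)]
    for g in gaps:
        items = chain(items, g)  # each pass treats the previous sequences as atomic events
    return ["-".join(labels) for _, _, labels in items]

events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"),
          (10, "f"), (11, "g"), (12, "h"),
          (20, "k"), (21, "l"), (22, "m")]
first_pass = multi_pass(events, [1.5])
both_passes = multi_pass(events, [1.5, 10.0])
```

Because the second pass only measures the gap between whole sequences, a missing event inside a sequence does not prevent it from being chained.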
Testing and Evaluation
• Testing has been performed with data collected from the Thayer network and DefCon data sets
– Thayer testing used the earlier probability distribution method
– DefCon testing used the machine learning approach
• Arranging for a live run at DefCon
Thayer Testing and Evaluation
• Testing performed on Thayer data
– Roughly 1500 events
– 20 Scenarios
– Roughly half of the data were single events
Thayer Testing and Evaluation
• Accuracy measured by number of correctly placed scenario events
• Best hypothesis had ~20% of the single events included in tracks
• Most confident hypothesis not always most accurate
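The talk scores a hypothesis by the number of correctly placed scenario events but does not spell out the matching rule. A minimal sketch, assuming each ground-truth scenario is credited to the single track holding most of its events. It also assumes events that belong to no scenario are ignored. This is one hypothetical scoring rule, not the one used in the evaluation.

```python
from collections import Counter, defaultdict

def correctly_placed(tracks, truth):
    """Count scenario events that sit in their scenario's best-matching track.

    tracks: lists of event ids from one hypothesis.
    truth: event id -> scenario id, or None for events outside any scenario.
    """
    per_scenario = defaultdict(Counter)  # scenario -> Counter(track index -> event count)
    for i, track in enumerate(tracks):
        for e in track:
            if truth[e] is not None:
                per_scenario[truth[e]][i] += 1
    # Credit each scenario only to the track that holds most of its events.
    return sum(counts.most_common(1)[0][1] for counts in per_scenario.values())

truth = {1: "s1", 2: "s1", 3: "s2", 4: None, 5: "s2"}
score = correctly_placed([[1, 2, 3], [4, 5]], truth)
```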
DefCon Testing and Evaluation
• Testing performed on DefCon data
– 2.5 Hour time slice
– Roughly 16,000 events
– 89 Scenarios
– Hand classified by Oliver Dain at Lincoln Labs
• Neural net approach used
– Trained with random time slice of data
DefCon Testing and Evaluation
[Figure from Dain & Cunningham (October, 2001)]
DefCon Testing and Evaluation
• Accuracy measured by number of correctly placed scenario events
• Achieved higher accuracy, but less stable with fewer hypotheses
Summary
• Reorganize data already being collected
• Provide a ‘higher-level’ view of the situation
• Reduce the work of the security analyst
• Radar tracking analogy
• Multisensor data fusion
• Multiple hypothesis tracking
Future Work
• Incorporate wider variety of sensors
– Host-based IDS
– System logs
– Other network devices (firewall, router, etc.)
• Larger scale implementation
– Scaling, timing, communications
• Integration with network analysis tools
Final Summary
[Figure: BAD vs. GOOD, with the Tracking System]
Questions?
Acknowledgements
• Linda Wilson and George Cybenko
• Oliver Dain & Richard Cunningham
• Robert Gray
• Robert Morris
• Daniel Bilar
• Goufei Jiang
• Chris Brenton
• Bill Stearns
Objectives
• Gather and correlate intrusion reports
• Develop attack sequences
• Reorganize existing data
• Use techniques from radar tracking applications
Multiple Target Tracking
[Figure: six successive scans, numbered 1 to 6]
Each scan is a sweep of the radar or a sampling of the IDS reports
Targets at each sweep are clear, but the paths are not
Multiple Target Tracking
• Hypotheses are generated and evaluated as new data arrives
• Belief in a hypothesis is recalculated with additional data
Dartmouth / Thayer Network
[Figure: network diagram showing the Outside link, two 100 Mb switches, two 10 Mb switches, and three IDS sensors]