Transcript project2

Statistical based IDS
background introduction
Statistical IDS background
• Why do we do this project
• Attack introduction
• IDS architecture
• Data description
• Feature extraction
• Statistical method introduction
• Result analysis
Project goals
• Related work
– The Internet is subject to various network attacks, including
denial-of-service (DoS) attacks and port scans.
– Overall traffic detection
– Flow-level detection
• Our goals
– Detect both attacks at the same time
– Differentiate DoS and port scans
Attack introduction
• TCP SYN flooding
- An important form of DoS attack
- Exploits TCP’s three-way handshake mechanism
and its limitation in maintaining half-open connections
- Feature: spoofed source IP
- Recent reflected SYN/ACK flooding attacks
Attack introduction
• Port scan
- Horizontal scan
- Vertical scan
- Block scan
- Feature: real source IP address
[Figure: port scan types (vertical, block, horizontal); axes: port number and source IP]
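The three scan types can be told apart by how many distinct hosts and ports one source probes. A minimal sketch, assuming the usual definitions (horizontal: one port across many hosts; vertical: many ports on one host; block: both) and probes given as (destination IP, destination port) pairs from a single source. The function name and the `min_spread` threshold are hypothetical, not taken from the slides.

```python
def classify_scan(probes, min_spread=5):
    """Classify the probes sent by one source IP.

    probes: iterable of (dst_ip, dst_port) tuples.
    min_spread: hypothetical threshold for "many" hosts or ports.
    """
    probes = list(probes)  # allow generators; we iterate twice
    hosts = {ip for ip, _ in probes}
    ports = {port for _, port in probes}
    many_hosts = len(hosts) >= min_spread
    many_ports = len(ports) >= min_spread
    if many_hosts and many_ports:
        return "block"        # many ports on many hosts
    if many_hosts:
        return "horizontal"   # few ports across many hosts
    if many_ports:
        return "vertical"     # many ports on few hosts
    return "none"


if __name__ == "__main__":
    sweep = [("10.0.0.%d" % i, 80) for i in range(10)]
    print(classify_scan(sweep))
```

A real detector would also need a time window and per-source bookkeeping; this only illustrates the distinction between the three shapes.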
Statistical IDS architecture
• Learning part
• Detection part
[Figure: architecture diagram with Learning and Detection parts; box labels: training data, real traffic stream, data preprocessing, statistical learning, learned models, data sequence, statistical detection, reporting result]
Data description
• DARPA98 data
– The first standard corpus for evaluating network
intrusion detection systems
– From the Information Systems Technology Group
(IST) of MIT Lincoln Laboratory
– Sponsored by the Defense Advanced Research Projects Agency
(DARPA ITO) and the Air Force Research Laboratory
(AFRL/SNHS)
– Seven weeks of training data
– Two weeks of detection data
Data description
• DARPA98 data format
897048008.080700 172.16.114.169.1024 > 195.73.151.50.25: S ACK
1055330111:1055330111(0) win 512 <mss 1460>
- Time stamp: 897048008.080700
- Source IP address + port: 172.16.114.169.1024
- Destination IP address + port: 195.73.151.50.25
- TCP flag: S (other possible values: R, F, P)
- ACK flag: ACK
- Other part of packet header:
1055330111:1055330111(0) win 512 <mss 1460>
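The field breakdown above can be turned into a small parser. This is a sketch that assumes every record is a single line shaped exactly like the sample (IPv4 addresses, port appended after a final dot, an optional `ACK` token after the flag). The `Packet` type and `parse_line` function are illustrative names, not part of the DARPA98 tooling.

```python
import re
from typing import NamedTuple, Optional

# Pattern follows the sample record on the slide; other layouts are not handled.
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+"
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+)\s+>\s+"
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):\s+"
    r"(?P<flag>[SFRP.]+)\s+"
    r"(?:(?P<ack>ACK)\s+)?"
    r"(?P<rest>.*)$"
)


class Packet(NamedTuple):
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    flag: str
    ack: bool
    rest: str


def parse_line(line: str) -> Optional[Packet]:
    """Return a Packet, or None if the line does not match the sample layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Packet(
        timestamp=float(m["ts"]),
        src_ip=m["src"],
        src_port=int(m["sport"]),
        dst_ip=m["dst"],
        dst_port=int(m["dport"]),
        flag=m["flag"],
        ack=m["ack"] is not None,
        rest=m["rest"],
    )


if __name__ == "__main__":
    sample = ("897048008.080700 172.16.114.169.1024 > 195.73.151.50.25: "
              "S ACK 1055330111:1055330111(0) win 512 <mss 1460>")
    print(parse_line(sample))
```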
Feature extraction
• Calculate the metrics over each 5-minute window of traffic
• Metrics
- For example:
SYN-SYN_ACK pair
SYN-FIN + SYN-RSTactive pair
traffic volume
SYN packet volume
……
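A minimal sketch of per-window metric calculation, assuming packets have already been reduced to (timestamp, flag, ack) tuples. Two interpretations here are assumptions rather than statements from the slides: "traffic volume" is counted in packets, and the SYN-SYN_ACK pair is represented as the difference between the two counts. The function and key names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the 5-minute interval from the slide


def window_metrics(packets):
    """Aggregate simple counters per 5-minute window.

    packets: iterable of (timestamp, flag, ack) tuples, where flag is a
    string such as "S" and ack is a bool.
    Returns {window_index: {"packets", "syn", "syn_ack", "syn_minus_syn_ack"}}.
    """
    windows = defaultdict(lambda: {"packets": 0, "syn": 0, "syn_ack": 0})
    for ts, flag, ack in packets:
        w = windows[int(ts // WINDOW_SECONDS)]
        w["packets"] += 1              # traffic volume (assumed: packet count)
        if "S" in flag:
            if ack:
                w["syn_ack"] += 1      # SYN/ACK packet
            else:
                w["syn"] += 1          # plain SYN packet
    for w in windows.values():
        # Assumed reading of the "SYN-SYN_ACK pair": difference of the counts.
        w["syn_minus_syn_ack"] = w["syn"] - w["syn_ack"]
    return dict(windows)


if __name__ == "__main__":
    demo = [(0.0, "S", False), (10.0, "S", True), (300.0, "S", False)]
    print(window_metrics(demo))
```

The other metrics listed above would be added as further counters in the same loop.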
Statistical method
• Statistical-based IDS
Goal: use statistical metrics and
algorithms to distinguish anomalous
traffic from benign traffic, and to
distinguish between different types of attacks.
- Advantage: can detect unknown attacks
- Disadvantage: false positives and false
negatives
Hidden Markov Model (HMM)
• HMM is a very useful statistical learning
model. It has been applied successfully
to speech recognition.
- Advantages
1. Analyzes sequence data (represented by observation
probabilities and transition probabilities)
2. Supports both unsupervised and supervised
training
3. High accuracy
- Disadvantage
Comparatively long training time
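To show how observation and transition probabilities combine over a sequence, here is a sketch of the standard forward algorithm, which computes the likelihood of an observation sequence under a given HMM. The two-state model and all of its numbers are made up for illustration. The slides do not say which HMM computation the project uses.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) using the forward algorithm.

    obs: list of observation indices.
    start_p[i]: probability of starting in state i.
    trans_p[i][j]: probability of moving from state i to state j.
    emit_p[i][o]: probability that state i emits observation o.
    """
    if not obs:
        return 1.0  # the empty sequence has probability 1
    n = len(start_p)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans_p[i][j] for i in range(n)) * emit_p[j][o]
            for j in range(n)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical model: state 0 = normal, state 1 = attack;
    # observation 0 = low SYN count, observation 1 = high SYN count.
    start = [0.9, 0.1]
    trans = [[0.95, 0.05], [0.10, 0.90]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(forward_likelihood([0, 0, 1, 1], start, trans, emit))
```

For long sequences the products underflow, so practical implementations scale `alpha` at each step or work in log space.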
Double Gaussian model
• Introduction
- Two Gaussian distribution models are used to represent two
classes of behavior
- Compute the probability of the current behavior under each
class's Gaussian parameters
- Compare them. The current behavior is assigned to the class
with the larger probability.
• Training period
- Estimate the Gaussian parameters for each class
• Detection period
- Use the two sets of Gaussian parameters to compute the probabilities and
compare them
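The training and detection periods can be sketched for a single metric as follows. This assumes one scalar feature per window, labeled training values for each class, and a nonzero standard deviation in both classes. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(values):
    """Training: estimate (mean, standard deviation) for one class."""
    return mean(values), pstdev(values)


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with parameters mu, sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def classify(x, normal_params, attack_params):
    """Detection: pick the class whose Gaussian gives x the larger density."""
    p_normal = gaussian_pdf(x, *normal_params)
    p_attack = gaussian_pdf(x, *attack_params)
    return "attack" if p_attack > p_normal else "normal"


if __name__ == "__main__":
    # Made-up metric values (for example, SYN counts per window).
    normal_params = fit_gaussian([10, 12, 11, 9, 13])
    attack_params = fit_gaussian([95, 100, 105, 110, 90])
    for value in (12, 98):
        print(value, classify(value, normal_params, attack_params))
```

With several metrics, one option is a multivariate Gaussian per class; the comparison step stays the same.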
Double Gaussian model
• Advantage
– Simple, easy to understand
– Fast
• Disadvantage
– Does not capture sequence characteristics
Result analysis
• Evaluation
- Important quantitative analysis:
false positives + false negatives
- Look at the metric values and find the
reasons behind the errors
- Repeat the experiments
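The false positive and false negative counts can be computed by comparing per-window detector output against ground-truth labels. A minimal sketch, assuming both are equal-length sequences of booleans with True meaning "attack"; the function name and the returned keys are illustrative.

```python
def error_rates(predicted, actual):
    """Count false positives/negatives and derive their rates.

    predicted, actual: equal-length sequences of booleans (True = attack).
    """
    pairs = list(zip(predicted, actual))
    fp = sum(1 for p, a in pairs if p and not a)   # alarm raised, no attack
    fn = sum(1 for p, a in pairs if a and not p)   # attack missed
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    return {
        "false_positives": fp,
        "false_negatives": fn,
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }


if __name__ == "__main__":
    predicted = [True, True, False, False, True]
    actual = [True, False, False, True, True]
    print(error_rates(predicted, actual))
```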