Network Intrusion Detection using Random Forests.

Network Intrusion Detection
Using Random Forests
Jiong Zhang
Mohammad Zulkernine
School of Computing
Queen's University
Kingston, Ontario, Canada
Outline
• Motivation
• Intrusion detection system
• Data mining meets intrusion detection
• Proposed architecture
• Challenges and solutions
• Experimental results
• Conclusion and future work
PST2005
Jiong Zhang and Mohammad Zulkernine
2
Motivation

• An intrusion prevention system (firewall) cannot prevent all attacks.
[Figure: diagram with two intruders, a firewall, the Internet, and a victim]
Motivation (contd.)
Statistical data for intrusions
• Total reported losses in 2004: $141,496,560. (Source: FBI survey for Year 2004)
• 50% of security breaches are undetected. (Source: FBI Statistics for Year 2000)
Intrusion Detection Techniques

Misuse Detection
• Extracts patterns of known intrusions
• Cannot detect novel intrusions
• Has low false positive rate

Anomaly Detection
• Builds profiles for normal activities
• Uses the deviations from the profiles to detect attacks
• Can detect unknown attacks
• Has high false positive rate
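
The contrast between the two techniques can be sketched in a few lines. This is only an illustration, not the detectors used in the talk: the signature strings, the per-window connection counts, and the three-standard-deviation threshold are all hypothetical values chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical signatures of known intrusions (illustrative strings only).
KNOWN_SIGNATURES = {"/etc/passwd", "cmd.exe"}

def misuse_detect(payload):
    """Misuse detection: flag a payload only if it matches a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)

def build_profile(normal_counts):
    """Anomaly detection, step 1: summarise normal activity as (mean, stdev)."""
    return mean(normal_counts), stdev(normal_counts)

def anomaly_detect(count, profile, threshold=3.0):
    """Anomaly detection, step 2: flag large deviations from the profile."""
    mu, sigma = profile
    return abs(count - mu) > threshold * sigma

# Hypothetical connection counts per time window observed during normal use.
profile = build_profile([10, 12, 11, 9, 13, 10, 11])

known_attack = misuse_detect("GET /../../etc/passwd")   # matches a signature
novel_attack = misuse_detect("GET /new-exploit")        # no signature, so missed
flood = anomaly_detect(200, profile)                    # far outside the profile
busy_but_legitimate = anomaly_detect(20, profile)       # also flagged
```

The last line shows why anomaly detection tends to raise more false positives: an unusual but harmless burst deviates from the profile just as an attack does.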
Network Intrusion Detection System (NIDS)
• Monitors network traffic to detect intrusions
• Monitors more targets on a network
• Detects some attacks that host-based systems miss
• Does not affect network operations
Current NIDS
Many current NIDSs (such as Snort) have these properties:
• Rule-based
• Unable to detect novel attacks
• High maintenance cost
Rule Based vs. Data Mining

Rule-based systems
Intrusion Data → Security Experts → Rules

Data mining based systems
Labeled Data → Data Mining Engine → Patterns
Data Mining Meets Intrusion Detection
• Extract patterns of intrusions for misuse detection
• Build profiles of normal activities for anomaly detection
• Build classifiers to detect attacks
• Some IDSs have successfully applied data mining techniques in intrusion detection
Proposed Architecture
[Figure: Architecture of the proposed NIDS. Labels in the diagram: Networks, Packets, Sensors, Audited data, On-line Preprocessors, Feature vectors, Detector, Alarms, Alarmer, Database (On line), Data Set, Training data, Off-line Preprocessor, Feature vectors, Pattern Builder, Patterns, Database (Off line). The diagram is divided into an on-line part and an off-line part.]
Random Forests
• Unsurpassed in accuracy among current data mining algorithms
• Runs efficiently on large data sets with many features
• Gives estimates of which features are important
• No problem handling nominal data
• No over-fitting
Imbalanced Intrusion

Problems
• Higher error rate for minority intrusions
• Some minority intrusions are more dangerous
• Need to improve the performance for the minority intrusions

Proposed Solution
• Down-sample the majority intrusions and over-sample the minority intrusions
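
The proposed solution can be sketched as follows. The slides do not say how many records each class should end up with, so the `target_per_class` parameter, the function name `rebalance`, and the class names in the usage lines are assumptions made for illustration.

```python
import random
from collections import defaultdict

def rebalance(records, labels, target_per_class, seed=0):
    """Return a training set with exactly `target_per_class` records per class.

    Classes with more records are down-sampled without replacement; classes
    with fewer keep every original record and are padded with random
    duplicates (over-sampling with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        if len(recs) >= target_per_class:
            chosen = rng.sample(recs, target_per_class)
        else:
            chosen = recs + rng.choices(recs, k=target_per_class - len(recs))
        out_records.extend(chosen)
        out_labels.extend([lab] * target_per_class)
    return out_records, out_labels

# Hypothetical imbalance: 1000 records of a majority class, 5 of a minority class.
records = list(range(1005))
labels = ["majority"] * 1000 + ["minority"] * 5
balanced_records, balanced_labels = rebalance(records, labels, target_per_class=50)
```

Duplicating minority records does not add information, but it stops the classifier from treating the minority class as negligible during training.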
Feature Selection
• Essential for improving detection rate
• Reduces the computational cost
• Many NIDSs select features by intuition or domain knowledge
Feature Selection over the KDD’99 Dataset
• Calculate variable importance using random forests.
• Select the 38 most important features in detection.
[Figure: importance of each of the 41 features (importance axis from -10 to 15). Features in the order shown on the feature axis: 3, 23, 10, 35, 33, 17, 8, 6, 32, 14, 24, 5, 36, 40, 13, 12, 4, 16, 34, 22, 1, 2, 29, 31, 38, 37, 30, 18, 19, 41, 27, 9, 26, 11, 28, 25, 39, 15, 7, 20, 21]
Some Features

The two most important features
• Feature 3. service type, such as http, telnet, and ftp
• Feature 23. count, number of connections to the same host as the current one during the past two seconds

The three least important features
• Feature 7. land, 1 if connection is from/to the same host/port; 0 otherwise
• Feature 20. num_outbound_cmds, number of outbound commands in an ftp session
• Feature 21. is_hot_login, 1 if the login belongs to the “hot” list; 0 otherwise
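
Once the features are ranked, the selection step itself is a simple cut. The sketch below copies the feature order from the feature axis of the importance plot; reading that order as a most-to-least-important ranking is an assumption, although it agrees with the most and least important features named above. The helper names are hypothetical.

```python
# Feature numbers in the order they appear on the importance plot.
# Assumption: the axis runs from the most important feature to the least.
RANKED_FEATURES = [
    3, 23, 10, 35, 33, 17, 8, 6, 32, 14, 24, 5, 36, 40, 13, 12, 4, 16, 34, 22,
    1, 2, 29, 31, 38, 37, 30, 18, 19, 41, 27, 9, 26, 11, 28, 25, 39, 15, 7, 20, 21,
]

def select_top_features(ranked, k):
    """Keep the k highest-ranked feature numbers."""
    return ranked[:k]

def project(record, selected):
    """Reduce a full record to the selected features (feature numbers are 1-based)."""
    return [record[i - 1] for i in selected]

selected = select_top_features(RANKED_FEATURES, 38)
dropped = sorted(set(RANKED_FEATURES) - set(selected))

# A stand-in record whose value at position i is simply i, to show the projection.
reduced = project(list(range(1, 42)), selected)
```

With a cut at 38, the dropped features are exactly the three listed as least important (7, 20, and 21).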
Parameter Optimization for Random Forests
• Optimize the parameter Mtry of random forests to improve detection rate.
• Choose 15 as the optimal value, which reaches the minimum of the oob error rate.
[Figure: oob error rate (axis from 0.00165 to 0.00215) and time (axis from 0 to 600) plotted against Mtry = 5, 10, 15, 20, 25, 30, 35, 38]
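
The tuning procedure can be illustrated with a toy stand-in for random forests. The sketch below is not the implementation used in the experiments: each "tree" is a single threshold split (a stump) trained on a bootstrap sample and restricted to `mtry` randomly chosen features, and the out-of-bag (oob) error is the error of the majority vote over the trees that did not see a given record. The synthetic data, the candidate list, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Find the threshold split, among `features`, with the fewest training errors."""
    default = majority(y)
    best = (len(y) + 1, None, None, default, default)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:
                continue  # a threshold at the maximum value does not split the data
            l_lab, r_lab = majority(left), majority(right)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:  # no usable split was found: fall back to the majority label
        return l_lab
    return l_lab if row[f] <= t else r_lab

def oob_error(X, y, mtry, n_trees=25, seed=0):
    """Out-of-bag error of a bagged ensemble of stumps using `mtry` features each."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        features = rng.sample(range(n_features), mtry)    # random feature subset
        stump = fit_stump([X[i] for i in bag], [y[i] for i in bag], features)
        for i in set(range(n)) - set(bag):                # records this tree never saw
            votes[i][predict_stump(stump, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    return sum(pred != truth for pred, truth in scored) / len(scored)

def best_mtry(X, y, candidates):
    """Evaluate every candidate Mtry and return the one with the lowest oob error."""
    errors = {m: oob_error(X, y, m) for m in candidates}
    return min(errors, key=errors.get), errors

# Synthetic data: 6 numeric features, only feature 0 determines the label.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(6)] for _ in range(60)]
y = [int(row[0] > 0.5) for row in X]
best, errors = best_mtry(X, y, candidates=[1, 2, 4, 6])
```

For the 38 selected KDD’99 features, the candidate list would be the Mtry values on the plot's axis (5, 10, 15, 20, 25, 30, 35, 38). Because the oob estimate reuses the records left out of each bootstrap sample, no separate validation set is needed for this search.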
Performance Comparison on the KDD’99 Dataset
• Our approach provides a lower overall error rate and cost compared to the best KDD’99 result.
• Feature selection can improve the performance of intrusion detection.
[Figure: two bar charts comparing the best KDD result, the experiment without feature selection, and the experiment with feature selection. One chart shows the overall error rate (axis from 6.95% to 7.35%); the other shows the cost of misclassification (axis from 0.225 to 0.234).]
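
Both measures can be computed from a confusion matrix. The sketch below assumes that the cost on the slide is the average misclassification cost per test record, obtained by weighting each cell of the confusion matrix with a cost matrix, as in the KDD’99 contest scoring. The three-class matrices are hypothetical numbers for illustration, not the experimental results or the contest's cost matrix.

```python
def overall_error_rate(confusion):
    """Fraction of records off the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

def average_cost(confusion, cost):
    """Average cost per record: sum of count * cost over all cells, divided by the total."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total

# Hypothetical example: rows are actual classes, columns are predicted classes.
confusion = [
    [90, 5, 5],
    [2, 45, 3],
    [1, 4, 45],
]
# Hypothetical costs: correct predictions cost 0, some mistakes cost more than others.
cost = [
    [0, 1, 2],
    [1, 0, 2],
    [3, 2, 0],
]

error_rate = overall_error_rate(confusion)   # 20 of 200 records misclassified
avg_cost = average_cost(confusion, cost)     # 34 cost units over 200 records
```

The two measures can rank classifiers differently, because the cost weights each kind of mistake while the error rate treats all mistakes alike.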
Conclusion and Future Work
• The random forests algorithm can help improve detection performance and select features.
• Sampling techniques can reduce the time to build patterns and increase the detection rate of minority intrusions.
• In the future, we will focus on anomaly detection and a multiple classifier architecture.