NOVEL APPROACH FOR NETWORK
INTRUSION DETECTION
A.B. Babatope
N.A. Azeez
INTRODUCTION
Information that is not properly secured is vulnerable to
intrusions and threats. Security measures ensure that
information maintains its integrity.
An Intrusion Detection System (IDS) is one of the methods
of securing computer networks and systems, as it detects
attacks before they gain access to the system.
AIM AND OBJECTIVES
The aim of this project is to develop an IDS using a genetic
algorithm approach that is better at detecting unusual events
and threats within a computer network.
The objectives are:
• To analyse the different Artificial Intelligence algorithms used in Intrusion
Detection Systems
• To develop an Intrusion Detection System that detects more threats to
computer networks and is less likely to produce errors.
CONTRIBUTIONS
The results showed that intrusive attacks were more
frequent on the Class A set of IP addresses than on the
other classes.
They also showed that the probe category of attacks was
the most frequent across the system.
LITERATURE REVIEW
According to Scarfone and Mell (2007),
“Intrusion detection is the process of monitoring the
events occurring in a computer system or network and
analysing them for signs of possible incidents, which are
violations or imminent threats of violation of computer
security policies, acceptable use policies or standard
security practices.”
The following are types of network attacks:
• Denial of Service (DoS)
• Remote to User Attacks (R2L)
• User to Root Attacks (U2R)
• Probing
RELATED WORKS
| Title | Approach | Strength | Weakness |
|---|---|---|---|
| The Design and Implementation of Intrusion Detection System based on Data Mining Technology (Zhou & Zhao, 2013) | Data Mining | Adaptive ability | Not time efficient |
| Implementation of an Intrusion Detection System (Ourida, 2012) | Intrusion sensor (Snort) | It takes less time to implement. | Security issues |
GENETIC ALGORITHM
A genetic algorithm (GA) is a problem-solving method
modelled on biological genetic operators.
The decision variables of a search problem are encoded as
strings of symbols drawn from an alphabet. The strings are
the chromosomes, the symbols are the genes, and the value
of a gene is called an allele.
During evolution, different operators are used to
process the chromosomes in each generation. The
operators include:
• Selection (or Reproduction)
• Crossover (or Recombination)
• Mutation
Selection: the phase in which individuals with better
fitness are selected from the population; less fit
individuals are discarded.
Crossover: a process in which randomly selected pairs of
individuals exchange parts of their chromosomes with each
other, until an entirely new population has been generated.
Mutation: random changes to genes that restore diversity in
the population, which is otherwise reduced by repeated use
of the crossover operator.
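The three operators can be illustrated with a minimal Python sketch. It assumes chromosomes are plain lists of genes, threshold-based selection, single-point crossover and per-gene random mutation; the function names and parameters (`select`, `crossover`, `mutate`, `rate`, `alleles`) are hypothetical and are not taken from the project's C# implementation.

```python
import random


def select(population, fitness, threshold):
    """Selection: keep only the individuals whose fitness exceeds the threshold."""
    return [chromosome for chromosome in population if fitness(chromosome) > threshold]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: two equal-length parents swap their tails."""
    point = rng.randint(1, len(parent_a) - 1)  # cut point strictly inside the list
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, alleles, rng):
    """Mutation: replace each gene by a random allele with probability `rate`."""
    return [rng.choice(alleles) if rng.random() < rate else gene
            for gene in chromosome]
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make runs repeatable. Other selection schemes (for example roulette-wheel selection) would fit the same interface.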
I. Random generation of initial chromosomes
II. Set w1 = 0.2, w2 = 0.8, T = 0.5, Max Generations = 100
III. Set N = total number of records in training set
IV. Set generation counter = 0
V. For each chromosome in population
VI.     Set A = 0, AB = 0
VII.    For each record in dataset
VIII.       If record matches chromosome
IX.             AB = AB + 1
X.          End If
XI.         If record matches only condition part
XII.            A = A + 1
XIII.       End If
XIV.    End for Each record
XV. End for Each chromosome
XVI. Fitness =
XVII. If Fitness > T
XVIII.  Select fitted chromosomes into new selection pool
XIX. End if
XX. For each chromosome in new pool/population
XXI.    Select chromosome for breeding
XXII.   Apply crossover and mutation to new offspring
XXIII.  Place newly created chromosome into population
XXIV. End for Each
XXV. Kill old pool, new pool now current pool
XXVI. Increment generation Counter by 1
XXVII. If generation Counter < Max Generation then
XXVIII. Goto line V
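The counting loop in steps V to XVI can be sketched in Python as follows. The formula for step XVI is not preserved in this transcript, so the sketch assumes the support-confidence form described by Li (2004), fitness = w1 · AB/N + w2 · AB/A, where A counts the records that match the condition part of a rule and AB counts those that also match its outcome. Treating the last gene as the outcome (attack name) and `-1` as a wildcard gene are also assumptions, and the function names are hypothetical.

```python
def matches(rule_genes, record_genes):
    """A rule gene matches when it is the wildcard -1 or equals the record gene."""
    return all(rule == -1 or rule == value
               for rule, value in zip(rule_genes, record_genes))


def fitness(chromosome, records, w1=0.2, w2=0.8):
    """Weighted sum of support (AB / N) and confidence (AB / A) for one rule."""
    condition, outcome = chromosome[:-1], chromosome[-1]
    n = len(records)
    a = ab = 0
    for record in records:
        if matches(condition, record[:-1]):
            a += 1                      # record matches the condition part
            if record[-1] == outcome:
                ab += 1                 # record matches the whole chromosome
    if n == 0 or a == 0:
        return 0.0                      # rule covers nothing, so no evidence
    return w1 * (ab / n) + w2 * (ab / a)
```

A chromosome would then be kept for the next pool when `fitness(chromosome, records) > T`, with T = 0.5 as set in step II.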
LINKING GA WITH INTRUSION DETECTION
The genetic algorithm is linked with intrusion detection by
using the GA to classify the different network
connections that the system comes across.
Each network connection is represented as a
chromosome by the genetic algorithm. The attributes of
each network connection are represented as genes in the
chromosome.
The system acts on each network connection as a
chromosome. Therefore, the GA makes it possible for the
intrusion detection system to differentiate the different
types of network connections.
FITNESS FUNCTION
The fitness function scales the value of an individual
relative to the rest of the population. It identifies
the best possible solutions among the candidates
in the population.
It is used to determine the fittest set of chromosomes,
relative to the other chromosomes present, that will be
used for recombination in the next generation.
Algorithm I: Fitness Function Algorithm
SYSTEM DESIGN
Figure 1: Genetic Algorithm architecture
Table 1: Chromosome representation

| Attribute Name | Number of Genes | Format |
|---|---|---|
| Duration | 3 | H:M:S |
| Protocol | 1 | Numeric |
| Source port | 1 | Numeric |
| Destination port | 1 | Numeric |
| Source IP | 4 | a.b.c.d |
| Destination IP | 4 | a.b.c.d |
| Attack name | 1 | String |
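The layout in Table 1 adds up to 15 genes per chromosome (3 + 1 + 1 + 1 + 4 + 4 + 1). A minimal Python sketch of flattening one connection into that layout is given below. The function name `encode_connection` is hypothetical, and because the numeric protocol codes are not given here, the protocol is kept as its name; that choice is an assumption.

```python
def encode_connection(duration, protocol, src_port, dst_port, src_ip, dst_ip, attack):
    """Flatten one connection into a 15-gene list following the Table 1 layout."""
    genes = [int(part) for part in duration.split(":")]   # 3 genes: H, M, S
    genes.append(protocol)                                # 1 gene (name kept; code mapping unknown)
    genes.append(int(src_port))                           # 1 gene
    genes.append(int(dst_port))                           # 1 gene
    genes += [int(part) for part in src_ip.split(".")]    # 4 genes: a.b.c.d
    genes += [int(part) for part in dst_ip.split(".")]    # 4 genes: a.b.c.d
    genes.append(attack)                                  # 1 gene: attack name
    return genes


example = encode_connection("0:0:53", "ftp-data", 38127, 1985,
                            "65.127.-1.203", "243.192.194.177", "rcp")
```

Here `example` holds 15 genes, and the `-1` in the source address is carried through unchanged as a gene value.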
IMPLEMENTATION
The software was developed in C# using Microsoft Visual
Studio, an integrated development environment (IDE). The
system was developed as a console program.
On initialisation of the software, random chromosomes are
generated and the system locates and reads the dataset
file. The system continues to iterate, producing results for
each generation, until the maximum number of generations is reached.
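The start-up and iteration described above can be sketched in Python as follows. This is only an illustration: the gene ranges, the `wildcard_rate` parameter, the use of `-1` as a "don't care" gene and the function names are all assumptions, and the `evolve` callback stands in for one full generation of selection, crossover and mutation.

```python
import random


def random_chromosome(rng, attacks, protocols, wildcard_rate=0.1):
    """Build one random 15-gene chromosome (assumed gene ranges, -1 = wildcard)."""
    def gene(low, high):
        return -1 if rng.random() < wildcard_rate else rng.randint(low, high)

    genes = [gene(0, 23), gene(0, 59), gene(0, 59)]   # duration H, M, S
    genes.append(rng.choice(protocols))               # protocol
    genes += [gene(0, 65535), gene(0, 65535)]         # source and destination port
    genes += [gene(0, 255) for _ in range(8)]         # source IP and destination IP
    genes.append(rng.choice(attacks))                 # attack name
    return genes


def run_generations(evolve, population, max_generations=100):
    """Apply `evolve` once per generation until the generation limit is reached."""
    generation = 0
    while generation < max_generations:
        population = evolve(population)
        generation += 1
    return population
```

Passing a seeded `random.Random` instance as `rng` keeps the initial population reproducible between runs.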
Figure 2: Initialisation of the program
Table 2: The result of the intrusion detection system

| DURATION | PROTOCOL | SOURCE PORT | DESTINATION PORT | SOURCE IP | DESTINATION IP | ATTACK NAME |
|---|---|---|---|---|---|---|
| 0:0:53 | ftp-data | 38127 | 1985 | 65.127.-1.203 | 243.192.194.177 | rcp |
| -1:1:0 | Auth | 26586 | 55979 | 20.154.30.-1 | 51.70.111.254 | rsh |
| 0:0:-1 | Rsh | 62512 | 26370 | 160.72.57.42 | 135.-1.237.4 | Phf |
| -1:1:0 | Auth | 26586 | -1 | 242.41.151.123 | 239.18.88.223 | Guess |
| 0:0:-1 | Rsh | 62728 | 55979 | 20.154.30.-1 | 51.70.111.254 | rsh |
| 0:0:-1 | http | -1 | 12106 | 104.65.17.124 | -1.191.-1.244 | Port-scan |
| 0:0:39 | Rsh | 62728 | -1 | 242.41.151.123 | 239.18.88.223 | Guess |
| 0:0:39 | http | -1 | 12106 | 104.65.17.124 | -1.191.-1.244 | Port-scan |
DARPA DATASET
The Defense Advanced Research Projects Agency (DARPA) dataset
was created in 1998 by the Lincoln Laboratory of MIT out of the
need to evaluate intrusion detection systems. It was first
made available to the public in February 1998.
Three classes of attacks are present in the dataset:
• Probe (Portscan)
• R2L (phf, guess)
• U2R (rlogin, rsh, rcp)
The following table shows the distribution of the types of intrusive
network connections present in the dataset file.
Table 3: Distribution of the intrusive connections in the dataset

| Category | Attacks (count) | Total |
|---|---|---|
| Probe | Portscan (30) | 30 |
| R2L | Phf (1), Guess (4) | 5 |
| U2R | rlogin (1), rsh (2), rcp (1) | 4 |
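The category totals in Table 3 follow directly from the per-attack counts. The short Python sketch below reproduces that arithmetic; the attack-to-category mapping and the counts are taken from the slides, while the names `ATTACK_CATEGORY`, `ATTACK_COUNTS` and `category_totals` are hypothetical.

```python
# Attack-to-category mapping and per-attack counts as listed on the slides.
ATTACK_CATEGORY = {
    "portscan": "Probe",
    "phf": "R2L",
    "guess": "R2L",
    "rlogin": "U2R",
    "rsh": "U2R",
    "rcp": "U2R",
}
ATTACK_COUNTS = {"portscan": 30, "phf": 1, "guess": 4, "rlogin": 1, "rsh": 2, "rcp": 1}


def category_totals(counts, categories):
    """Sum the per-attack counts into one total per attack category."""
    totals = {}
    for attack, count in counts.items():
        category = categories[attack]
        totals[category] = totals.get(category, 0) + count
    return totals


totals = category_totals(ATTACK_COUNTS, ATTACK_CATEGORY)
# totals == {"Probe": 30, "R2L": 5, "U2R": 4}, i.e. 39 intrusive connections in all
```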
Figure 3: Graphical distribution of the network connections (categories: Normal, Probe, R2L, U2R; axis scale 0 to 100)
ANALYSIS OF RESULTS
Three sets of results were selected for analysis.
Each set of results consists of 50 records.
The first analysis is the classification of the source and
destination IP addresses. The second analysis is the
relationship between the protocol and the type of
intrusion.
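The first analysis can be sketched with the classful IPv4 ranges, which depend only on the first octet: A for 0 to 127, B for 128 to 191, C for 192 to 223, D for 224 to 239 and E for 240 to 255. How the project handled an address whose first octet is `-1` is not stated, so returning `None` in that case is an assumption, and the function name `ip_class` is hypothetical.

```python
from collections import Counter


def ip_class(address):
    """Return the classful IPv4 class (A to E) from the first octet of a.b.c.d."""
    first = int(address.split(".")[0])
    if first == -1:
        return None              # wildcard first octet: class cannot be determined
    if not 0 <= first <= 255:
        raise ValueError(f"invalid first octet in {address!r}")
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"
    return "E"


# Tally the classes of a few source addresses taken from Table 2.
sources = ["65.127.-1.203", "20.154.30.-1", "160.72.57.42", "242.41.151.123"]
class_counts = Counter(ip_class(address) for address in sources)
```

Applying the same tally to the source and destination columns of each 50-record result set gives the per-class counts for one run.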
• First run
Table 4: Classification of IP addresses of the first set of results

| Class | Source IP | Destination IP |
|---|---|---|
| Class A | 20 | 50 |
| Class B | 30 | 0 |
| Class C | 0 | 0 |
| Class D | 0 | 0 |
| Class E | 0 | 0 |
Figure 4: Graphical classification of the IP addresses of the first set of results (series: Source IP, Destination IP; categories: Class A to Class E)
Table 5: Distribution of intrusion attacks to the Protocols for the first set of results
Protocols
Attack names
Phf
Port Scan
Rcp
rlogin
rsh
ftp-data
2
3
1
6
http
2
4
14
20
Rsh
3
2
3
8
Smtp
5
2
5
12
1
3
4
12
26
ftp
telnet
12
Figure 5: Graphical representation of the distribution of attacks to protocols for the first set of results (protocols: ftp, ftp-data, http, Rsh, Smtp, telnet; series: attack names Phf, Port scan, rcp, rlogin, rsh)
• Second run
Table 6: Classification of IP addresses of the second set of results

| Class | Source IP | Destination IP |
|---|---|---|
| Class A | 19 | 39 |
| Class B | 0 | 11 |
| Class C | 0 | 0 |
| Class D | 0 | 0 |
| Class E | 31 | 0 |
Figure 6: Graphical classification of the IP addresses of the second set of results (series: Source IP, Destination IP; categories: Class A to Class E)
Table 7: Distribution of intrusion attacks to the Protocols for the second set of results
Protocols
Attack names
Phf
Port Scan
Rcp
rlogin
rsh
ftp
2
4
6
ftp-data
11
21
32
8
4
12
21
29
http
Rsh
Smtp
telnet
Figure 7: Graphical representation of the distribution of attacks to protocols for the second set of results (protocols: ftp, ftp-data, http, Rsh, Smtp, telnet; series: attack names Phf, Port scan, rcp, rlogin, rsh)
• Third run
Table 8: Classification of IP addresses of the third set of results

| Class | Source IP | Destination IP |
|---|---|---|
| Class A | 21 | 50 |
| Class B | 8 | 0 |
| Class C | 21 | 0 |
| Class D | 0 | 0 |
| Class E | 0 | 0 |
Figure 8: Graphical classification of the IP addresses of the third set of results (series: Source IP, Destination IP; categories: Class A to Class E)
Table 9: Distribution of intrusion attacks to the Protocols for the third set of results
Protocols
Attack names
Phf
Port Scan
Rcp
rlogin
rsh
1
8
16
25
1
9
15
25
2
17
31
ftp
ftp-data
http
Rsh
Smtp
telnet
Figure 9: Graphical representation of the distribution of attacks to protocols for the third set of results (protocols: ftp, ftp-data, http, Rsh, Smtp, telnet; series: attack names Phf, Port scan, rcp, rlogin, rsh)
• Fourth run
Table 10: Classification of IP addresses of the fourth set of results

| Class | Source IP | Destination IP |
|---|---|---|
| Class A | 46 | 44 |
| Class B | 4 | 0 |
| Class C | 0 | 0 |
| Class D | 0 | 0 |
| Class E | 0 | 6 |
Figure 10: Graphical classification of the IP addresses of the fourth set of results (series: Source IP, Destination IP; categories: Class A to Class E)
Table 11: Distribution of intrusion attacks to the Protocols for the fourth set of results
Protocols
Attack names
Phf
Port Scan
Rcp
rlogin
rsh
http
4
5
9
Rsh
1
1
2
Smtp
9
30
39
14
36
ftp
ftp-data
telnet
Figure 11: Graphical representation of the distribution of attacks to protocols for the fourth set of results (protocols: ftp, ftp-data, http, Rsh, Smtp, telnet; series: attack names Phf, Port scan, rcp, rlogin, rsh)
FUTURE WORK
The genetic algorithm approach to intrusion detection
needs to be continually improved, and the classification
rules need to be continually enhanced so that the system
classifies connections correctly.
CONCLUSION
Securing information against attacks requires continuous
research as new threats emerge, in order to stay up to
date and minimise risk and cost.
REFERENCES
Chittur, A., 2001. Model Generation for an Intrusion Detection System Using Genetic
Algorithms.
Hoque, M.S., Mukit, A. & Bikas, A.N., 2012. An Implementation of Intrusion Detection
System using Genetic Algorithm. International Journal of Network Security & Its
Applications (IJNSA), Vol. 4, No. 2, March 2012.
Li, W., 2004. Using Genetic Algorithm for Network Intrusion Detection. Mississippi State
University, Mississippi State, MS 39762.
Sastry, K., Goldberg, D. & Kendall, G., 2005. Genetic Algorithms.
Scarfone, K. & Mell, P., 2007. Guide to Intrusion Detection and Prevention Systems (IDPS).
National Institute of Standards and Technology NIST special publication 800-94.