Immune System Metaphors Applied to Intrusion Detection and
Download
Report
Transcript Immune System Metaphors Applied to Intrusion Detection and
Immune System Metaphors
Applied to Intrusion Detection
and Related Problems
by
Ian Nunn, SCS, Carleton University
[email protected]
Overview of Presentation
• Review of immune system properties of most
interest
• Algorithm design and the representation of
application domains
• Examples of two recognition algorithms
• Overview of application areas
• Focus on intrusion detection systems (IDS)
• Advantages of IS models and future research
• The IS model as a swarm system
Immune System Characteristics
of Interest
• The human immune system (IS) is a system of detectors
(principally B and T cells) that:
– After initial negative selection (tolerization), does not recognize
elements of the body (self)
– Is adaptable in that it can recognize over time, any foreign element
(non-self) including those never before encountered
– Remembers previous foreign element encounters
– Dynamically regenerates its elements
– Regulates the population size and diversity of its elements
– Is robust to input signal noise (recognition region) and detector loss
– Is distributed in nature with no central or hierarchical control
– Is error tolerant in that self recognition does not halt the system
– Is self-protecting since it is part of self
Representation of Self/Non-Self
• IS elements involved are cellular proteins and their peptide
sequences
• Recognition is based on matching
of structural regions called epitopes
on antigens and paratopes on
antibodies
• Shape space model: a parameterized
representation (genotype) of the
conformational form of self/nonself elements (phenotype)
IS Application Algorithm Design
• Requires a deep understanding of the problem domain
• Self/non-self discrimination the fundamental IS principle
• Steps in designing an IS algorithm:
– Identification of features allowing correct and complete self/nonself discrimination*
– Representation or encoding of features, particularly of continuous
real-valued parameters*. Ab and Ag feature strings of same length
facilitate algorithm performance analysis
– Determination of a matching or fitness function. Important for
evolution of Ab populations (affinity maturation)
– Selection of IS principles to apply, e.g. negative selection,
costimulation, affinity maturation, etc.
* This is hard stuff and an important step in applying any modeling technique whether
genetic algorithms or swarm simulations (recall for army ants the problem of deciding
what parameters to assign to the ants and to the environment and what values to allow).
Approach to Feature Selection
and Representation
• Antibodies and antigens represented by strings of features:
– The set of actual values observed such as sensor readings, voltages,
ASCII text is called the application’s phenotype
– The coded representation is called the application’s genotype
• A feature is encoded by symbols from a finite alphabet
• Some application feature domains:
– Binary variables: digital signals in computer systems
– Discrete real variables: ASCII character text
– Continuous real variables: real world sensors
• Continuous domains must be mapped onto discrete
domains since we work with finite alphabets to ensure
finite Ab/Ag population spaces
Phenotype Representation: Change
Detection Problem Domains
•
•
•
•
•
OS (UNIX) processes: sequences of top level system calls
Program execution: alphabet symbols represent op codes
File system: reduction to ASCII or binary strings
User behavior and interface use: keystrokes, mouse clicks
Time series data representation of a physical (plant)
processes: x/y position of a milling machine tool
• Memory accesses: memory address calls
• Local network traffic: TCP/IP packets: addresses, ports
• Network traffic through routers and gateways: TCP/IP
packets, addresses, volume
IS Phenotype Encoding and
Matching Using a Binary Model1
• Genotype = Phenotype : 32 bit string on a binary alphabet
•
Many matching (fitness) functions possible, e.g. for li a
contiguous substring of l 1’s in the complementary match
(Hamming distance)
M 0( Ag, Ab) li 26 in theaboveexample
i
Example: Use of a Binary Model
with a GA for Clonal Selection1
• Start with randomly generated Ag and possibly incomplete
Ab populations
• For each Ab in turn, compute its average match (fitness)
with a random fixed-size Ag subpopulation
• Use a standard GA with mutation but no crossover to
evolve successively better generations of antibodies
• Niches observed to develop in coverage space for genetic
commonalities (bacterial polysaccaride coating) if the
initial populations have a bias
• Self recognition minimized (without negative selection) by
selecting for more Ag specific instead of more general
antibodies – less likely to match self
Establishing Antibody Fitness
Random subpopulation
M 0( Ab2) 8
GA Evolution
________________
1011010111110111
1
of Antibodies
Use of a Negative Selection
Algorithm for Clonal Selection5,2
•
•
Want explicit self-filtering (tolerization)
Algorithm:
1. Generate the set S of self (sub)strings
2. Generate a set R0 of random strings
3. Match each string from R0 against S :
–
–
•
•
•
Match (non-complementary) on at least r contiguous
locations: reject
No match: add string to detector set R
How to generate detectors efficiently an issue
Match detector set against target strings to detect intruder
Strings can be on any alphabet
Negative
2
Selection Algorithm
Match Ab1: 10xx
Match Ab4: xx00
Self string to be protected: 1011 0111 0011 0000,
length of contiguous match substring r = 2
The Problem of
6
Holes
• For a particular choice of matching rule and Ab repertoire,
some non-self strings may not be found causing a hole in
the coverage space
• Let s1 and s2 be two
antibodies matching over
r-1 contiguous bits and
h1 and h2 be two antigens
• A detector that matches any r contiguous bits in h1 will
also match either s1 or s2 for the same feature string. The
same for h2. So h1 and h2 are undetectable.
Major Application Categories of
Immune System Theory
• Machine learning and pattern recognition: limited but
promising work done to date
• Associative memories: limited work done to date
• Elimination of identified elements:
– IS model: use the B cell and Tk cell “kill” disable viruses
– Use a phagocyte analogue for cleanup and garbage collection
– IBM virus lab and Forrest’s group at UNM have looked at this
• Recovery, repair and augmentation of identified elements :
– IS model: use the B cell and Tk cell analogue to deliver a positive
payload to an agent
– Very little work done to date
Application Areas (cont.)
• Detection problems – where most of the work has
occurred:
–
–
–
–
–
–
Fault: failure of a self element (industrial plant systems)
Change: any change in self (tumors)
Anomaly: unusual presentation of a self element
Virus: presence of a non-self element
Intrusion: attempt to gain access by non-self element
Many of the classical issues of computer and network
security involve some element of detection or self/nonself discrimination
The Intrusion Detection Problem
• Two classical types of intrusion detection systems
(IDS):
– Host-based: domain is a single machine possibly on a
network
– Network-based: domain is a network of hosts
• Two classes of problem:
– Anomaly detection: deviations from normal local
resource use and network traffic
– Misuse detection: usage identified with known system
vulnerabilities and security policies
Essential Requirements of a
Network-based IDS
• Robustness to host failures and noisy signals (anomalous
behavior)
• Easy (self-)configurability of hosts
• Easy extendibility to new hosts
• Scalability: extendible to large networks without
degradation of performance
• Adaptability: dynamically able to recognize new anomalies
• Efficiency: simple and low overhead operation
• Global analysis: able to correlate local events to form
global patterns
Network Representation
• Commonly represent problem as the connection events
(not message content) between computers
• Kim and Bentley4:
– Phenotype: 35 real-valued fields in four categories - connection
identifier, port vulnerabilities, TCP handshaking, traffic intensity
– Genotype: 35 genes. A detector gene has three “nucleotides”
(cluster number in (0, 9), min offset, max offset). An antibody or
antigen has a single real value.
– Cluster and offset tables established for each host at start
– A matching function maps an Ag or Ab value to a cluster and takes
the distance to the nearest cluster bound as the measure of
similarity
– Use positive detection events to evolve the offsets for clusters
Kim and Bentley Model4
New lower interval
bound for cluster 2
New upper interval
bound for cluster 2
Network Representation (cont.)
• Hofmeyr and Forrest3:
– Phenotype: 3 integer fields (source IP address,
destination IP address, service or port number)
– Genotype: for a detector, 49 bit binary string + state
Algorithmic
3
Refinements
• Detectors may have a lifetime at the end of which they are
replaced if they have not matched – maintain diversity
• Activation threshold and time decay on activation level to
deter limited autoimmune reaction to rare self strings
• Local activation causes a message to be sent to other hosts
decreasing their activation levels (cytokine costimulation)
• Matching rule may result in holes in coverage. A randomly
assigned permutation mask to control packet presentation
helps avoid this (MHC molecule host diversity)
contributing to population diversity.
• Each host has a unique detector set contributing to
diversity and self-protection across a population of hosts
The Hofmeyr and Forrest
3
Model
Antigen
pheotype
Host-based
refinement fields
Detector
state
Problem Posed by Computer
Applications
• The repertoire of human self proteins is fixed over
a lifetime
• In networks, valid hosts are added and deleted
without notice so what “self” is constantly
changes
• Among a fixed set of hosts, valid usage patterns
may change without notice
• One solution: costimulation by a trusted (human)
authority both at start and subsequent operation
• Much work needs to be done
Advantages of IS Models
• Adaptability through the ability to recognize
foreign patterns never before encountered
• Distributed detection contributes to:
– Diversity (shape space coverage)
– Robustness (failure of individual hosts)
– Scalability and extendibility
• Quick response to new variants of old attacks
• Ability to reproduce detectors of increasing fitness
while self-regulating the overall population
IS as a Swarm System
• The IS model has a number of characteristics in common with swarm
systems:
– Large populations of independent agents of characterizable classes
– Each agent has at most a very few characteristic simple behaviors:
•
•
•
•
•
Bind with another appropriate agent and activate (B and T cells)
Kill something (killer T cell)
Clone myself (B cell)
Secrete a signaling chemical or an antibody (T and B cells)
Live for a long time (memory B and T cells)
– Simple interactions with the environment:
• Special things that happen in lymphoid organs
• Secreting signal chemicals which alter environmental properties (cytokines
and inflammation)
– Self-organizing as an emergent property
– No centralized control over the system
Areas for Additional Research
• Matching rules with good computational
properties, perhaps application specific ones
• Self/non-self representation and encoding
• Algorithms for generating detector sets
• Other selection algorithms
• Incorporation of additional IS characteristics
• Detector set populations: evolution, dynamics and
emergent properties at the species level
References
1.
2.
3.
4.
Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to
Explore Pattern Recognition in the Immune System. Evolutionary
Computation, 1(3):191-211, 1993.
Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination
in a Computer. Proceeding of the 1994 IEEE Symposium on
Research in Security and Privacy, Los Alamitos CA, 1994.
Hofmeyr and Forrest. Immunity by Design: An Artificial Immune
System. In Proceedings of 1999 GECCO Conference, 1999.
Kim and Bentley. Negative selection and niching by an artificial
immune system for network intrusion detection. In Late Breaking
Papers at the 1999 Genetic and Evolutionary Computation
Conference, Orlando, Florida, 1999.
References (cont.)
5.
6.
Forrest, Allen, Perelson and Cherukuri. A Change-Detection
Algorithm Inspired by the Immune System. Submitted to IEEE
Transactions on Software Engineering, 1995.
D'haeseleer, Forrest and Helman. An Immunological Approach to
Change Detection: Algorithms, Analysis, and Implications.
Proceeding of the 1994 IEEE Symposium on Research in Security
and Privacy, 1996.