ICT619 Intelligent Systems
Download
Report
Transcript ICT619 Intelligent Systems
ICT619 Intelligent
Systems
Topic 5: Genetic
Algorithms
Genetic Algorithms
Introduction
How GAs work
The TSP as an example
Business Applications of GA
Advantages of GA systems
Some issues related to GA based systems
Case Study
ICT619
2
What is a genetic algorithm?
GA part of the broader soft computing (aka
"computational intelligence") paradigm known as
evolutionary computation
First introduced by Holland (1975)
Inspired by possibility of problem solving through a
process of evolution
ICT619
3
What is a GA? (cont’d)
GA mimics biological evolution to generate better
solutions from existing solutions through
survival of the fittest
crossbreeding and
mutation
ICT619
4
What is a GA? (cont’d)
A GA is capable of finding solutions for many problems
for which no usable algorithmic solutions exist
GA methodology particularly suited for optimization
Optimization searches a solution space consisting of a
large number of possible solutions
GA reduces the search space through evolution of
solutions, conceived as individuals in a population
ICT619
5
Intelligence and Evolution
One way of understanding intelligence is as the capability of a creature
to adapt itself to an ever-changing environment
We normally think of adaptation as changes in the characteristics
(including behaviours) of a single animal in response to experiences
over its history
But adaptation is also change in the characteristics of a species, over
the generations, in response to environmental change
An individual creature is in competition with other individuals of the
same species for resources, mates etc.
There is also rivalry from other species which may be a direct (predator)
or indirect (food, water, land, etc.) threat
In nature, evolution operates on populations of organisms, ensuring by
natural selection that characteristics that serve the members well tend
to be passed on to the next generation, while those that don’t die out
ICT619
6
Evolution as Optimisation
Evolution can be seen as a process leading to the optimisation of a
population’s ability to survive and thus reproduce in a specific
environment.
Evolutionary fitness - the measure of the ability to respond adequately
to the environment, is the quantity that is actually optimised in natural
life
Consider a normal population of rabbits. Some rabbits are naturally
faster than others. Any characteristic has a range of variation that is due
to i) sexual reproduction and ii) mutation
We may say that the faster rabbits possess superior fitness, since they
have a greater chance of avoiding foxes, surviving and then breeding
If two parents have superior fitness, there is a good chance that a
combination of their genes will produce an offspring with even higher
fitness. We say that there is crossover between the parents’ genes
Over successive generations, the entire population of rabbits tends to
become faster to meet their environment challenges in the face of foxes
ICT619
7
How GAs work
A population of candidate solutions - mathematical
representations - is repeatedly altered until an optimal
solution is found
The GA evolutionary cycle
1. Starts with a randomly generated initial population of solutions
(1st generation)
2. Selects a population of better solutions (next generation) by
using a measure of 'goodness' (a fitness evaluation function)
3. Alters new generation population through crossbreeding and
mutation
Processes of selection (step 2) and alteration (step 3) lead to
a population with a higher proportion of better solutions
ICT619
8
How GAs work (cont’d)
The GA evolutionary cycle continues until an
acceptable solution is found in the current population,
or
some control parameter such as the maximum
number of generations is exceeded
ICT619
9
How solutions are represented
A series of genes, known as a chromosome,
represents one possible solution
Each gene in the chromosome represents one
component of the solution pattern
Each gene can have one of a number of possible
values known as alleles
The process of converting a solution from its original
form into a chromosome is known as coding
ICT619
10
How solutions are represented
(cont’d)
The most common form of representing a solution as a
chromosome is a string of binary digits (aka a binary
vector) eg 1010110101001
Each bit in this string is a gene with two alleles: 0 and 1
Other forms of representation are also used, eg,
integer vectors
Solution bit strings are decoded to enable them to be
evaluated using a fitness measure
ICT619
11
GA Selection
Selection in GA based on a process analogous to that
of biological evolution
Only the fittest survive and contribute to the gene pool
of the next generation
Fitness proportional selection
Each chromosome’s likelihood of being selected is
proportional to its fitness value.
Solutions failing selection are “bad”, and are discarded
ICT619
12
Alteration = Crossover + Mutation
Alteration refines good solutions from current generation to
produce next generation of solutions
Carried out by performing crossover and mutation
Crossover by splicing two chromosomes at a crossover point and
swapping the spliced parts
A better chromosome may be created by combining genes with
good characteristics from one chromosome with some good genes
in the other chromosome
Crossover carried out with a probability – typically 0.7
Chromosomes not crossed over are cloned
ICT619
13
Crossover and Mutation
Mutation
A random adjustment in the genetic composition
Can be useful for introducing new characteristics in a
population
May be counterproductive
Probability kept low: typically 0.001 to 0.01
ICT619
14
ICT619
An albino is a common mutation
15
The typical Genetic Algorithm
1. Represent the solution as a chromosome of fixed
length, choose size of population N, crossover
probability pc and mutation probability pm.
2. Define a fitness function f for measuring fitness of
chromosomes.
3. Create an initial solution population randomly of size
N:
x1, x2, …, xN
4. Use the fitness function f to evaluate the fitness value
of each solution in the current generation:
f(x1), f(x2), …, f(xN)
ICT619
16
The typical Genetic Algorithm
(cont’d)
5. Select “good” solutions based on fitness value.
Discard rest of the solutions.
6. If acceptable solution(s) found in the current
generation or maximum number of generations is
exceeded then stop.
7. Alter the solution population using crossover and
mutation to create a new generation of solutions with
population size N.
8. Go to step 4.
ICT619
17
The typical Genetic Algorithm
(cont’d)
ICT619
18
The Travelling Salesperson Problem
Given a set of n cities (A, B, C, ...) find a closed tour of all cities
with the shortest total distance d
Tour 'cost' may be something other than distance d
This is an optimization problem with following constraints
1.
2.
Each city to be visited once and only once
Total distance travelled must be shortest
possible
The time required to find a solution by exhaustive search
increases exponentially - the problem is NP-hard
Possible number of tours for n cities = n!/2n
1 million centuries for 50 cities at the rate of 1 billion tours per sec!
ICT619
19
The Travelling Salesperson Problem
In 1987, Martin Groetschel and Olaf Holland found an optimal tour of 666
interesting places in the world. Source: http://www.tsp.gatech.edu//index.html
The TSP example (cont’d)
Representation and coding of TSP solutions
The representation might be an ordered list of numbers each
representing a city nominally (known as order-based GA):
1) London
2) Venice
3) Dunedin
4) Singapore
5) Beijing
6) Phoenix
CityList1
CityList2
(3 5 7 2 1 6 4 8)
(2 5 7 6 8 1 3 4)
7) Tokyo
8) Victoria
Alternatively, the representation of the solution may be
encoded in binary on a matrix...
ICT619
21
The TSP example (cont’d)
Representation and coding of TSP solutions
A solution to the TSP problem is an ordered list of the n
cities
Each city is assigned 1 out of n possible positions
Representation of the solution may be visualised as a
table:
Each row represents a city
Each column associated with a tour position for cities
ICT619
22
The TSP example (cont’d)
The tour represented above is CAEBDC
One possible bit string code for this solution:
01000 00010 1000 00001 00100
(rows written end to end)
Binary bit strings can produce "faulty" chromosomes
needing repair
An integer vector scheme produced a 100 city tour
9.4% above optimal cost
ICT619
23
An Optimal 100-City Tour
ICT619
24
Business Applications of GA
Increasing number of industrial and business
applications of GA since late 1980s
In business, applications include (Kingdon 1997)
Portfolio optimisation
Bankruptcy prediction
Financial forecasting
Fraud detection
Scheduling
Design of complex
machines eg. jet engines
ICT619
25
Business Applications of GA (cont’d)
First Quadrant - investment firm in
California
Started using GA technique in 1993
Uses GA to manage US$5 billion worth of
investments in 17 different countries
Their evolved model earns, on average, $255 for
every $100 invested over six years, as opposed to
$205 for other types of modeling systems (Begley &
Beals, 1995)
ICT619
26
Advantages of GA systems
Useful when no algorithms or heuristics are
available for solving a problem
No formulation of the solution is required - only
"recognition" of a good solution
A GA system can be built as long as a solution
representation and an evaluation scheme can
be worked out
So minimal domain expert access is required
ICT619
27
Advantages of GA systems
GA can act as an alternative to Expert Systems if
number of rules is too large or
the nature of the knowledge-base too dynamic
Traditional optimization techniques if
constraints and objective functions are non-linear
and/or discontinuous
ICT619
28
Advantages of GA systems (cont'd)
GA does not guarantee optimal solutions, but produce
near optimal solutions which are likely to be very good
Solution time with GA is highly predictable
Determined by
Size of the population
Time taken to decode and evaluate a solution and
Number of generations of population
GA uses simple operations to solve problems that are
computationally prohibitive otherwise
Example: the TSP problem
ICT619
29
Advantages of GA systems (cont'd)
Because of simplicity, GA software are
Reasonably sized and self-contained
Easier to embed them as a module in another
system
GA can also aid in developing intelligent
business systems that use other
methodologies, eg,
Building the rule base of an expert system
Finding optimal neural networks
ICT619
30
Some issues related to GA based
systems
Level of explainability
Capability to explain why a particular solution was
arrived at is practically nil
The system does not know what a fitness value really
means
Scalability
Moderately scalable
Accommodates increased number of variables by
increasing the length of the chromosome
But
A longer chromosome means a larger population space
(more potential combinations of genes)
More time required for decoding and fitness evaluation
ICT619
31
Some issues related to GA based
systems (cont’d)
Data requirements
In general, GA do not require extensive access to data
but some applications may need it to evaluate solutions
This makes the quality and quantity of data is important
Local maxima
Local maxima are regions that hold good solutions
relative to regions around them, but which do not
necessarily contain the best overall solutions
The region(s) that contain the best solutions are called
global maxima
GAs are less prone to being trapped in local maxima
because of the use of mutation and crossover
ICT619
32
Some issues related to GA based
systems (cont’d)
Premature convergence
A GA is said to have converged prematurely if it
explores a local maximum extensively
It may be then dominated by very similar solutions
within the region
Most significant factor leading to such convergence is a
mutation rate which is too slow
Mutation interference is an effect opposite to that of
premature convergence
ICT619
33
Some issues related to GA based
systems (cont’d)
Mutation interference
Finding a mutation rate which allows the GA to
converge but which also allows adequate exploration of
the solution space is essential for satisfactory
performance
Mutation interference occurs when mutation rates in a
GA are too high, and as a result:
Solutions are frequently or drastically mutated
The algorithm never manages to explore any region of
the space thoroughly
Any good solutions found tend to be destroyed rapidly
ICT619
34
Case Study - Help Desk Task
Scheduling (Dhar & Stein 1997,
pp.219-227)
GA based system developed at Moody’s for
scheduling service tasks to its customer
service representatives
Major constraints
The system
Must minimise computer downtime and customer
dissatisfaction
Must integrate with existing database system which kept track
help desk requests.
ICT619
35
Case study – constraints (cont’d)
Must be flexible to
Accommodate new types of task definitions and changes
in employee, training etc.
Allow administrator to modify solutions
Must generate and reevaluate schedules quickly
(under 15 minutes) and consistently
Must not take administrator or CSRs away from their
jobs for any extended period of time
Must be developed quickly
ICT619
36
Case study – constraints (cont’d)
Should be scalable in case of future growth in number
of requests for help and the number of CSRs
Must not be too complicated for its users – the
administrator and CSRs
The main difficulties in meeting the constraints:
the large number of tasks
the large number of CSRs
the varying capabilities of CSRs, and
the wide variety of tasks
ICT619
37
Case Study - Variables and issues
needing to be considered
The priority of a task, which is determined by the severity of
the problem
The length of time required to perform the task and how it
would affect the servicing of other users
The ability of various CSRs to perform different levels of
tasks (expertise must match the complexity)
Low priority tasks must not be kept waiting indefinitely
The measure of goodness of a schedule to be based on
amount of downtime each schedule cost the organisation.
ICT619
38
Possible solution methodologies
considered
1. Traditional linear programming (a numerical
optimisation technique)
2. A rule based expert system
3. A GA based system
ES ruled out because
Expertise to solve this problem not expressible as a set of
rules
Help desk administrator not available for knowledge extraction
Linear programming ruled out because
It fails if no optimal solution can be found
It does not produce any sub-optimal solutions, which is the
case with GA.
ICT619
39
Case Study - the solution
SOGA (Schedule Optimising for GA)
A hybrid system consisting of GA and fuzzy system
components
The GA component deals with the scheduling task.
Each task in the queue is represented by a gene
The entire task list forms the chromosome
Each chromosome is decoded by a scheduling module
that assigns tasks to available CSRs who can perform
them
ICT619
40
Case Study - the solution (cont’d)
Fitness of each chromosome is determined by
calculating the amount of downtime that would result
based on the schedule represented by the
chromosome.
Schedules generated by the GA component are
modified by the FS component
SOGA runs in the background behind the help request
tracking system
Updates schedules based upon a predefined time
interval (eg, every 10 or 15 minutes)
CSRs access their current job queue through their
interface to accept jobs.
ICT619
41
Case Study - Results
The system is timely – generating schedules in about 5
minutes.
The solutions are found to be good by the help desk
administrator
The system is flexible enough to allow for task
definitions
The system scales up well to larger domains (higher
number of tasks)
The SOGA system was developed in two months using
one programmer overseen by its designers
ICT619
42
REFERENCES
Begley, S. and Beals, G. "Software au naturel." Newsweek, May 8, 1995,
p.70
Dhar, V., & Stein, R., Seven Methods for Transforming Corporate
Data into Business Intelligence., Prentice Hall 1997, pp. 126-148,
203-210.
Goldberg, D. E., Genetic and Evolutionary Algorithms Come of
Age, Communications of the ACM, Vol.37, No.3, March 1994,
pp.113-119.
Holland, J. H., Adaptation in Natural and Artificial Systems, Univ.
of Michigan Press, 1975.
Kingdon, J., Intelligent Systems and Financial Forecasting,
Springer Verlag, London 1997.
Medsker,L., Hybrid Intelligent Systems, Kluwer Academic Press,
Boston 1995.
Michalewicz, Z., Genetic Algorithms + Data Structures = Evolution
Programs, Springer-Verlag, Berlin 1996.
Negnevitsky, M. Artificial Intelligence A Guide to Intelligent
Systems, Addison-Wesley 2005.
ICT619
43