Genetic Algorithm
Download
Report
Transcript Genetic Algorithm
Genetic Algorithm
What is a genetic algorithm?
“Genetic Algorithms
are defined as global
optimization procedures that use an
analogy of genetic evolution of biological
organisms.”
Basically genetic algorithms tend to find
the best solution to a problem by following
an evolutionary process.
Basic Genetic Algorithm
Parallel Genetic Algorithm
For large population sizes, G.A. is
computationally infeasible.
Hence the use of Parallel Genetic
Algorithms.
Parallel Genetic Algorithm
Model and Encoding
Island Model -: Each processor runs a
G.A. on a subset of the population and
there is periodic migration.
Fixed Length Binary String Encoding -:
Here if gene is included in the subset
then value is 1 else 0.
Fitness Evaluation
Two Different Criteria
Classification Accuracy
Size
of the subset
fitness(x) = w1 * accuracy(x) + w2 *(1 – dimensionality(x))
Here,
accuracy(x) =test accuracy of the classifier
built with the gene subset represented by x
dimensionality(x) [0,1] = the dimension
of the subset
Fitness Evaluation
w1 = weight assigned to accuracy
w2 = weight assigned to dimensionality
High classification accuracy and low
dimension has high fitness.
Data Sets Used
Test Parameters
The tests were run on two processors.
The parameters of G.A. in each
processor were set as -:
Population
Size : 1000
Trials : 400000
Crossover probability: 0.6
Mutation probability: 0.001
Test Parameters
Selection Strategy: Elitist
Migration Probability: 0.002
Crossover probability of average level to get different
subpopulation with good traits of the parents.
Mutation Probability low to avoid randomness of
selection.
Selection Strategy is Elitist which ensures that the
best individuals are kept and hence leads to more
accurate subsets of genes.
Results
Results
Leukemia Data Set
Subset
with 29 Genes found
Classifies 36/38 training instances correctly
Classifies 30/34 test instances correctly
Colon Data Set
Subset
with 30 genes found
92% accuracy on the training data set
Results Comparison
Results better than other algorithms
such as G-S and NB algorithms which
have accuracies less than 90% and
gene numbers varying from 10 to 500.
Average Performance Graphs
Conclusion
Method does well in finding smaller
gene subsets and better accuracies.
Fitness function needs to be something
more sophisticated than the simple one
used right now to ensure a final
compact subset every time.