
Genetic-Algorithm-Based Instance and Feature Selection
Instance Selection and Construction for Data Mining, Ch. 6
H. Ishibuchi, T. Nakashima, and M. Nii
Abstract

• A GA-based approach for selecting a small number of instances from a given data set in a pattern classification problem.
• To improve the classification ability of our nearest neighbor classifier by searching for an appropriate reference set.
Genetic Algorithm
• Coding
  • Binary string of length (n + m):
    S = a_1 a_2 ... a_n s_1 s_2 ... s_m
  • a_i : inclusion (1) or exclusion (0) of the i-th feature
  • s_p : inclusion (1) or exclusion (0) of the p-th instance
• Fitness function
  • Minimize |F|, minimize |P|, and maximize g(S)
  • |F| : number of selected features
  • |P| : number of selected instances
  • g(S) : classification performance
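As a concrete illustration (not from the chapter), a minimal Python sketch of this coding: the chromosome is one binary vector whose first n bits are the feature mask and whose last m bits are the instance mask. The names `decode`, `feature_mask`, and `instance_mask` are mine.

```python
import numpy as np

def decode(chromosome, n_features):
    """Split a binary string of length n + m into the feature part
    a_1 ... a_n and the instance part s_1 ... s_m."""
    a = chromosome[:n_features].astype(bool)   # a_i = 1: i-th feature selected (F)
    s = chromosome[n_features:].astype(bool)   # s_p = 1: p-th instance in the reference set (P)
    return a, s

# Example: n = 4 features, m = 6 instances -> string of length 10
rng = np.random.default_rng(0)
chrom = rng.integers(0, 2, size=4 + 6)
feature_mask, instance_mask = decode(chrom, n_features=4)
```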
Genetic Algorithm
• Performance measure (first one): gA(S)
  • The number of correctly classified instances
  • Minimize |P| subject to gA(S) = m
• Performance measure (second one): gB(S)
  • When an instance xq is included in the reference set P, xq is not selected as its own nearest neighbor:
    dF(xp*, xq) = min{ dF(xp, xq) | xp ∈ P },        if xq ∉ P
    dF(xp*, xq) = min{ dF(xp, xq) | xp ∈ P − {xq} },  if xq ∈ P
• Fitness
  fitness(S) = Wg · g(S) − WF · |F| − WP · |P|
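A minimal sketch of gB(S) and the scalar fitness, assuming Euclidean distance over the selected features and the weight values given on the parameter slide; the function names `g_B` and `fitness` are mine, not the chapter's.

```python
import numpy as np

def g_B(X, y, feature_mask, instance_mask):
    """Count training instances correctly classified by their nearest
    neighbor in the reference set P, never letting x_q match itself."""
    XF = X[:, feature_mask]                     # distances use selected features only
    P = np.flatnonzero(instance_mask)           # indices of the reference set
    correct = 0
    for q in range(len(X)):
        cand = P[P != q]                        # exclude x_q from its own neighbors
        if cand.size == 0:
            continue
        d = np.linalg.norm(XF[cand] - XF[q], axis=1)
        correct += int(y[cand[np.argmin(d)]] == y[q])
    return correct

def fitness(X, y, a, s, Wg=5.0, WF=1.0, WP=1.0):
    """fitness(S) = Wg*g(S) - WF*|F| - WP*|P|."""
    return Wg * g_B(X, y, a, s) - WF * a.sum() - WP * s.sum()
```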
Genetic Algorithm
1. Initialization
2. Genetic Operation: iterate the following procedure Npop/2 times to generate Npop strings (see the sketch below)
   1. Randomly select a pair of strings
   2. Apply uniform crossover
   3. Apply a mutation operator
3. Generation Update: select the Npop best strings from the 2Npop current and generated strings
4. Termination test
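A compact sketch of this loop, reusing `decode` and `fitness` from above. For brevity it uses a simple unbiased bit-flip mutation; the biased variant actually used in the experiments is sketched under "Biased Mutation" below.

```python
def run_ga(X, y, n_pop=50, n_gen=500, pm=0.01, rng=None):
    """GA from the slide: uniform crossover, bit-flip mutation, and an
    elitist generation update keeping the N_pop best of 2*N_pop strings."""
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = X.shape                                          # m instances, n features
    pop = rng.integers(0, 2, size=(n_pop, n + m))           # 1. initialization
    for _ in range(n_gen):                                  # 4. stop after n_gen
        children = []
        for _ in range(n_pop // 2):                         # 2. N_pop/2 pairs
            i, j = rng.integers(n_pop, size=2)              # random parent pair
            mask = rng.integers(0, 2, size=n + m).astype(bool)
            children.append(np.where(mask, pop[i], pop[j])) # uniform crossover
            children.append(np.where(mask, pop[j], pop[i]))
        children = np.array(children)
        flip = rng.random(children.shape) < pm              # mutation
        children = np.where(flip, 1 - children, children)
        merged = np.vstack([pop, children])                 # 3. 2*N_pop strings
        scores = [fitness(X, y, c[:n].astype(bool), c[n:].astype(bool))
                  for c in merged]
        pop = merged[np.argsort(scores)[::-1][:n_pop]]      # keep the N_pop best
    return pop[0]
```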
Numerical Example
Biased Mutation
• An effective way to decrease the number of selected instances is to bias the mutation probability.
• In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0 than to the mutation from sp = 0 to sp = 1.
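A sketch of the biased operator with the rates from the parameter slide; only the instance bits get the asymmetric treatment, while feature bits keep the symmetric rate. The helper name `biased_mutate` is mine.

```python
import numpy as np

def biased_mutate(chrom, n_features, rng, pm_feat=0.01, pm_10=0.1, pm_01=0.01):
    """Flip feature bits with Pm = 0.01; flip instance bits 1 -> 0 with
    Pm = 0.1 but 0 -> 1 with only Pm = 0.01, pushing |P| downward."""
    c = chrom.copy()
    u = rng.random(c.size)
    is_feat = np.arange(c.size) < n_features
    flip = is_feat & (u < pm_feat)                  # symmetric feature mutation
    flip |= ~is_feat & (c == 1) & (u < pm_10)       # instance: 1 -> 0, large rate
    flip |= ~is_feat & (c == 0) & (u < pm_01)       # instance: 0 -> 1, small rate
    c[flip] = 1 - c[flip]
    return c
```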
Data sets
• 2 artificial + 4 real data sets:
  • Normal distribution with small overlap
  • Normal distribution with large overlap
  • Iris data
  • Appendicitis data
  • Cancer data
  • Wine data
Parameter Specifications
• Population size: 50
• Crossover prob.: 1.0
• Mutation prob.:
  • Pm = 0.01 for feature selection
  • Pm(1 → 0) = 0.1 for instance selection
  • Pm(0 → 1) = 0.01 for instance selection
• Stopping condition: 500 generations
• Weight values: Wg = 5, WF = 1, WP = 1
• Performance measure: gA(S) or gB(S)
• 30 trials for each data set
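Plugging these settings into the `run_ga` sketch above (hypothetical two-class data; the sketch uses the unbiased mutation, so results will differ from the chapter's):

```python
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),     # hypothetical class 0
               rng.normal(2.0, 1.0, size=(50, 4))])    # hypothetical class 1
y = np.repeat([0, 1], 50)

best = run_ga(X, y, n_pop=50, n_gen=500, pm=0.01)
a, s = decode(best, n_features=4)
print(f"|F| = {a.sum()}, |P| = {s.sum()}")
```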
Performance on Training Data
Performance on Test Data
• Leave-one-out procedure (iris & appendicitis)
• 10-fold cross-validation (cancer & wine)
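A sketch of this evaluation protocol, assuming scikit-learn's splitters and the helpers above: the GA runs inside each fold and the evolved 1-NN rule is scored on the held-out instances. The function `cv_accuracy` and the empty-reference-set guard are mine.

```python
from sklearn.model_selection import LeaveOneOut, KFold

def cv_accuracy(X, y, splitter, n_gen=50):             # n_gen shortened for speed
    hits, total = 0, 0
    n = X.shape[1]
    for tr, te in splitter.split(X):
        best = run_ga(X[tr], y[tr], n_gen=n_gen)       # evolve on the fold's training part
        a, s = decode(best, n_features=n)
        if not s.any():                                # guard: empty reference set
            s[:] = True
        ref, ref_y = X[tr][s][:, a], y[tr][s]
        for xq, yq in zip(X[te][:, a], y[te]):         # classify held-out instances
            d = np.linalg.norm(ref - xq, axis=1)
            hits += int(ref_y[np.argmin(d)] == yq)
        total += len(te)
    return hits / total

# leave-one-out for iris & appendicitis, 10-fold CV for cancer & wine:
# acc = cv_accuracy(X, y, LeaveOneOut())
# acc = cv_accuracy(X, y, KFold(n_splits=10, shuffle=True, random_state=0))
```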
Effect of Feature Selection
Effect on NN
Some Variants