Transcript Document
Particle swarm optimization for
parameter determination and feature
selection of support vector machines
Shih-Wei Lin, Kuo-Ching Ying,
Shih-Chieh Chen, Zne-Jung Lee
Expert Systems with
Applications 2008
Introduction
Classification problems have been extensively
studied.
Support vector machine (SVM) is a popular
pattern classification method with many diverse
applications.
Kernel parameter setting in the SVM training
procedure, along with the feature selection,
significantly influences the classification
accuracy.
Introduction
1. How to choose the optimal input feature subset for SVM?
2. How to set the best kernel parameters?
Hybridizing the particle swarm optimization
(PSO) and SVM to improve the classification
accuracy with a small and appropriate feature
subset.
This makes the optimal separating hyper-plane
obtainable in both linear and non-linear
classification problems
Support Vector Machine (SVM)
Support vector machine (SVM) is a new
technique for data classification were first
suggested by Vapnik in 1995.
SVM is using Separating Hyperplane to
distinguish the data of two or several
different Class that deal with the data
mining problem of classification.
Kernel Function
Several kernel functions help the SVM in
obtaining the optimal solution.
For example: Linear, Polynomial, RBF, Sigmoid
The RBF is generally applied most frequently,
because it can classify multi-dimensional data,
unlike a linear kernel function.
The RBF has fewer parameters to set than other
kernel.
RBF is an effective option for kernel function.
SVM Parameters Setting
Proper parameters setting can improve the
classification accuracy of SVM.
The parameters that should be optimized include
penalty parameter C and the parameters with
different kernel function (gamma of RBF).
Grid Algorithm is an alternative to find the best
C and the gamma parameter, however it is time
consuming and does not perform well.
Feature Selection
Feature selection is used to identify a powerfully
predictive subset of fields within the database
and to reduce the number of fields presented to
the mining process.
Affects several aspects of pattern classification:
1.The accuracy of classification algorithm learned
2.The time needed for learning a classification
function
3.The number of examples needed for learning
4.The cost associated with feature
Feature Selection
Filter
Wrapper
No
Full Set
Training
Data
Feature
Generation
Subset
Learning
Algorithm
Accuracy
Good ?
Phase 1
Yes
Phase 2
Accuracy
Testing
Classifier
Testing
Data
Learning
Algorithm
Best Subset
Training
Data
Particle swarm optimization
Particle swarm optimization (PSO)
(Kennedy & Eberhart,1995) is an
emerging population-based meta-heuristic
The new position of a particle is calculated
using the following formula:
Particle swarm optimization
rnd( ) is a random function in the range[0, 1]
Positive constant c1 and c2 are personal and social
learning factors.
w is the inertia weight and inertia weight balances the
global exploration and local exploitation.
Pi,d denote the best previous position encountered by the
ith particle.
Pg,d denotes the global best position thus far.
t denotes the iteration counter.
Search concept of PSO.
Grid-Search Algorithm
Particle representation
The flowchart of PSO algorithm.
Fitness = Accuracy
Platform
The platform adopted to develop the
PSO + SVM approach is a PC with the
following features:
Intel Pentium IV 3.0 GHz CPU
512 MB RAM
Windows XP operating system
Visual C++ 6.0 development environment
Dataset
Cross-Validation
To guarantee that the present results are
valid and can be generalized for making
predictions regarding new data
Using k-fold-cross-validation
This study used k = 10, meaning that all of
the data will be divided into ten parts, each
of which will take turns at being the
testing data set.
PSO-based parameters determination
and feature selection approach for SVM
Comparison between the
PSO + SVM, NSVM, SVM, LSVM
Fung & Mangasarian (2003) and Liao et al (2004)
PSO+SVM VS PSO+GA
PSO + SVM approach with and without
feature selection and grid search
PSO + SVM approach with and
without feature selection
Conclusions
We proposed a PSO-based strategy to select
features subset and to set the parameters for
SVM classification.
We have conducted two experiments to evaluate
the classification accuracy of the proposed PSObased approach with RBF kernel and the grid
search method on 17 real-world datasets from
UCI database.
Generally, compared with the grid search
approach, the proposed PSO-based approach has
good accuracy performance with fewer features.
Future research
Other kernel parameters can also be
optimized using the same approach.
Experimental results obtained from UCI
datasets, other public datasets and realworld problems can be tested in the future
to verify and extend this approach.
Thank You
Q&A