Incremental Reduced Support Vector Machines


Incremental Reduced Support Vector Machines
Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang
National Taiwan University of Science and Technology and Institute of
Statistical Science Academia Sinica
2003 International Conference on Informatics, Cybernetics, and Systems
ISU, Kaohsiung, Dec. 14 2003
Outline
 Support Vector Machines for classification problems
 Linear and nonlinear SVMs
 Difficulties with nonlinear SVMs for large problems
 Storage and computational complexity
 Reduced Support Vector Machines
 Incremental Reduced Support Vector Machines
 Numerical Results
 Conclusions
Support Vector Machines (SVMs)
Powerful tools for Data Mining
SVMs have become one of the most promising learning algorithms for classification and regression
SVMs have a sound theoretical foundation
Based on statistical learning theory
SVMs have an optimally defined separating surface
SVMs can be generated very efficiently and achieve high accuracy
SVMs can be extended from the linear to the nonlinear case
By using kernel functions
Support Vector Machines for Classification
Maximizing the Margin between Bounding Planes
[Figure: the two classes A+ and A- separated by bounding planes, with the margin between the planes maximized]
Support Vector Machine Formulation
Solve the quadratic program for some ν > 0:

(QP)   min_{w, γ, y}   ν e'y + (1/2) w'w
       s.t.   D(Aw − eγ) + y ≥ e,   y ≥ 0

where A is the m × n data matrix, e is a vector of ones, and D is a diagonal matrix whose diagonal entries +1 or −1 denote A+ or A− membership of each data point.
SSVM (Smooth Support Vector Machine) is an efficient SVM algorithm proposed by Yuh-Jye Lee
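The authors solve this problem with SSVM; as a rough illustration only, the following sketch minimizes the equivalent unconstrained hinge-loss form of the QP with a general-purpose optimizer. The function and parameter names (train_linear_svm, nu) are illustrative, not from the paper.

```python
# Minimal sketch: linear SVM via the unconstrained hinge-loss form of the QP.
# Not the authors' SSVM solver; names and defaults are illustrative.
import numpy as np
from scipy.optimize import minimize

def train_linear_svm(A, d, nu=1.0):
    """A: (m, n) data matrix, d: (m,) labels in {+1, -1}, nu: error penalty."""
    m, n = A.shape

    def objective(params):
        w, gamma = params[:n], params[n]
        margins = d * (A @ w - gamma)            # signed margin of each point
        slack = np.maximum(0.0, 1.0 - margins)   # violation of the bounding planes
        return nu * slack.sum() + 0.5 * (w @ w)  # error penalty + margin maximization

    res = minimize(objective, np.zeros(n + 1), method="Powell")
    return res.x[:n], res.x[n]                   # separating plane: w'x = gamma
```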
Nonlinear Support Vector Machine
Extend to nonlinear cases by using kernel functions
The value of the kernel function represents the inner product in the feature space
Map data from the input space to a higher dimensional feature space where the data can be separated linearly
Nonlinear Support Vector Machine formulation:

min_{u, γ, y}   ν e'y + (1/2) u'u
s.t.   D(K(A, A')Du − eγ) + y ≥ e,   y ≥ 0

where K(A, A') is the m × m kernel matrix; the separating surface is K(x', A')Du = γ.
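The slides do not fix a specific kernel at this point; the Gaussian (RBF) kernel is a common choice in this line of work. A minimal sketch, with mu as an assumed kernel-width parameter:

```python
# Gaussian (RBF) kernel matrix: K[i, j] = exp(-mu * ||A_i - B_j||^2).
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Kernel matrix between the rows of A (m x n) and the rows of B (k x n)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]       # ||A_i||^2 as a column
        + np.sum(B**2, axis=1)[None, :]     # ||B_j||^2 as a row
        - 2.0 * A @ B.T                     # cross terms
    )
    return np.exp(-mu * np.maximum(sq_dists, 0.0))   # clip round-off negatives
```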
Difficulties with Nonlinear SVM for Large Problems
The nonlinear kernel K(A, A') is fully dense
Long CPU time to compute the m × m kernel entries
Runs out of memory while storing the m × m kernel matrix
Computational complexity depends on m
Complexity of nonlinear SSVM grows rapidly with m
Separating surface depends on almost the entire dataset
Need to store the entire dataset after solving the problem
Reduced Support Vector Machines
Overcoming Computational & Storage Difficulties by Using a Rectangular Kernel
Choose a small random sample Ā of A
The small random sample Ā is a representative sample of the entire dataset
Typically Ā is 1% to 10% of the rows of A
Replace the full kernel K(A, A') by the rectangular kernel K(A, Ā'), with the corresponding reduced variable ū, in nonlinear SSVM
Only need to compute and store m × m̄ numbers for the rectangular kernel
Computational complexity is reduced because it depends on m̄ rather than m
The nonlinear separator only depends on Ā
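A minimal sketch of this construction, reusing the gaussian_kernel sketch above; the sampling rate and names are illustrative choices:

```python
# RSVM idea: random reduced set A_bar and rectangular kernel K(A, A_bar').
import numpy as np

def reduced_kernel(A, rate=0.10, mu=0.1, seed=None):
    """Return the m x m_bar rectangular kernel and the reduced set A_bar."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    m_bar = max(2, int(rate * m))                    # e.g. 1% to 10% of the rows of A
    idx = rng.choice(m, size=m_bar, replace=False)   # random rows define A_bar
    A_bar = A[idx]
    return gaussian_kernel(A, A_bar, mu), A_bar      # m x m_bar instead of m x m
```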
Reduced Set
The reduced set Ā plays the most important role in RSVM
It is natural to raise two questions:
Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance?
Is there a mechanism that determines the size of the reduced set automatically or dynamically?
The incremental reduced support vector machine is proposed to answer these questions
Our Observations (Ⅰ)
The nonlinear separating surface is a linear combination of a set of kernel functions determined by the reduced set
If these kernel functions are very similar, the hypothesis space they span will be very limited
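A tiny numeric illustration of this observation, on synthetic data and reusing the gaussian_kernel sketch above: the kernel functions generated by two nearly identical points are nearly collinear, so keeping both in the reduced set barely enlarges the spanned space.

```python
# Kernel columns of two almost-identical points are almost linearly dependent.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))              # synthetic training data
x1 = A[0]
x2 = x1 + 1e-3 * rng.normal(size=5)        # a point very close to x1

k1 = gaussian_kernel(A, x1[None, :]).ravel()   # kernel function of x1 on the data
k2 = gaussian_kernel(A, x2[None, :]).ravel()   # kernel function of x2 on the data
cos = k1 @ k2 / (np.linalg.norm(k1) * np.linalg.norm(k2))
print(f"cosine similarity of the two kernel columns: {cos:.6f}")   # close to 1
```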
Our Observations (Ⅱ)
Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current set of kernel functions
These points contribute the most extra information
How to Measure the Dissimilarity?
Solving least squares problems
The information criterion: the distance from the candidate's kernel vector K(A, x') to the column space of the current reduced kernel matrix K(A, Ā') is greater than a threshold
This distance can be determined by solving a least squares problem
Dissimilarity Measurement
Solving least squares problems
The least squares problem  min_β ||K(A, Ā')β − K(A, x')||²  has a unique solution β* when K(A, Ā') has full column rank, and the distance is ||K(A, Ā')β* − K(A, x')||
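A minimal sketch of this distance computation; the function name is illustrative:

```python
# Distance of a kernel vector to the column space of the reduced kernel matrix.
import numpy as np

def distance_to_column_space(K_bar, k):
    """K_bar: m x m_bar reduced kernel, k: m-vector K(A, x') for a candidate x."""
    beta, *_ = np.linalg.lstsq(K_bar, k, rcond=None)   # least squares solution beta*
    return np.linalg.norm(K_bar @ beta - k)            # norm of the residual
```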
IRSVM Algorithm pseudo-code (sequential version)
1   Randomly choose two data points from the training data as the initial reduced set
2   Compute the reduced kernel matrix
3   For each data point not in the reduced set:
4       Compute its kernel vector
5       Compute the distance from the kernel vector
6           to the column space of the current reduced kernel matrix
7       If its distance exceeds a certain threshold
8           Add this point into the reduced set and form the new reduced kernel matrix
9   Until several successive failures have happened in line 7
10  Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11  A new data point is classified by the separating surface
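A runnable sketch of the selection loop (steps 1-9), reusing the gaussian_kernel and distance_to_column_space sketches above; the threshold, kernel width, and stopping count are illustrative choices rather than the authors' settings.

```python
# Sequential IRSVM reduced-set selection (illustrative sketch of steps 1-9).
import numpy as np

def select_reduced_set(A, threshold=0.1, mu=0.1, max_failures=50, seed=None):
    """Sequentially grow the reduced set, mirroring the pseudo-code above."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(A.shape[0])
    reduced = list(order[:2])                          # step 1: two random points
    K_bar = gaussian_kernel(A, A[reduced], mu)         # step 2: reduced kernel matrix
    failures = 0
    for i in order[2:]:                                # step 3: remaining points
        k = gaussian_kernel(A, A[[i]], mu).ravel()     # step 4: its kernel vector
        if distance_to_column_space(K_bar, k) > threshold:   # steps 5-7
            reduced.append(i)                          # step 8: grow the reduced set
            K_bar = np.column_stack([K_bar, k])
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:               # step 9: successive failures
                break
    return A[reduced]                                  # feed into the reduced-kernel SVM
```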
Speed up IRSVM
Note that we have to solve the least squares problem many times, and its cost grows with the size of the current reduced kernel matrix
The main cost of each solve depends on the current reduced kernel matrix, not on the particular candidate kernel vector
Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once (see the sketch after the batch pseudo-code below)
IRSVM Algorithm pseudo-code (Batch version)
1   Randomly choose two data points from the training data as the initial reduced set
2   Compute the reduced kernel matrix
3   For a batch of data points not in the reduced set:
4       Compute their kernel vectors
5       Compute the corresponding distances from these kernel vectors
6           to the column space of the current reduced kernel matrix
7       For those points whose distance exceeds a certain threshold
8           Add those points into the reduced set and form the new reduced kernel matrix
9   Until no data points in a batch were added in lines 7-8
10  Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11  A new data point is classified by the separating surface
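A minimal sketch of the batch distance computation behind the speed-up: one factorization of the current reduced kernel is reused for every candidate in the batch. Using a QR factorization here is an assumed implementation detail, not necessarily the authors' exact method.

```python
# Batch distances to the column space of the reduced kernel, factoring once.
import numpy as np

def batch_distances(K_bar, K_batch):
    """K_bar: m x m_bar reduced kernel, K_batch: m x b kernel vectors of a batch."""
    Q, _ = np.linalg.qr(K_bar)                  # cost depends only on the reduced kernel
    residuals = K_batch - Q @ (Q.T @ K_batch)   # remove projection onto col(K_bar)
    return np.linalg.norm(residuals, axis=0)    # one distance per candidate point
```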
IRSVM on four public data sets
Conclusions
IRSVM: an advanced algorithm built on RSVM
It starts with an extremely small reduced set and sequentially expands it to include informative data points
It determines the size of the reduced set automatically and dynamically rather than requiring it to be pre-specified
The reduced set generated by IRSVM is more representative
All advantages of RSVM for dealing with large-scale nonlinear classification problems are retained
Experimental tests show that IRSVM uses a smaller reduced set without sacrificing classification accuracy