Transcript Document

Support Vector
Machines: A Survey
Qiang Yang, for ICDM 2006 Panel
http://www.cse.ust.hk/~qyang
Partially based on slides from Prof Andrew Moore at CMU:
http://www.cs.cmu.edu/~awm/tutorials
For ICDM Panel on 10 Best Algorithms
Dec 21, 2006
Support Vector Machines
•
Problem:
• Given a set of objects with
known descriptive variables,
and from known classes, i
• {(Xi, yi), i=1, 2, … t}
• Find
•
• a discriminative function f(x)
such that f(Xi)=yi.
SVM today:
• A must try for most applications
• Mathematically well founded
• Robust to noise (non-support
vectors ignored)
• Works even for dozens of
training data
• Among the most accurate
algorithms
• Has many extensions
• Can be scaled up (ongoing
work…)
For ICDM Panel of 10 best algorithms
•
•
•
•
•
Hard-Margin Linear Classifier
• Maximize Margin
• Support Vectors
• Quadratic Programming
Soft-Margin Linear Classifier
Non-Linear Separable Problem and
Kernels
• XOR
Extension to
• Regression for numerical class
• Ranking rather than
classification
SMO and Core vector machines
Support Vector Machines: Slide 2
Linear Classifiers
x
denotes +1
a
f
yest
f(x,w,b) = sign(w. x - b)
denotes -1
How would you
classify this data?
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 3
Linear Classifiers
x
denotes +1
a
f
yest
f(x,w,b) = sign(w. x - b)
denotes -1
Any of these
would be fine..
..but which is
best?
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 4
Maximum Margin
x
denotes +1
denotes -1
Support Vectors
are those
datapoints that
the margin
pushes up
against
For ICDM Panel of 10 best algorithms
a
f
yest
f(x,w,b) = sign(w. x - b)
The maximum
margin linear
classifier is the
linear classifier
with the maximum
margin.
Support Vector Machines: Slide 5
SVM: Questions
1. Can we understand the meaning of the SVM through a
solid theoretical foundation?
2. Can we extend the SVM formulation to handle cases
where we allow errors to exist, when even the best
hyperplane must admit some errors on the training data?
3. Can we extend the SVM formulation so that it works in
situations where the training data are not linearly
separable?
4. Can we extend the SVM formulation so that the task is to
rank the instances in the likelihood of being a positive
class member, rather than classification?
5. Can we scale up the algorithm for finding the maximum
margin hyperplanes to thousands and millions of instances?
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 6
Q1. Support Vector Machines:
Foundations
• The problem of finding the maximum margin can
be transformed to finding the roots of a
Lagrangian
• Can be solved using quadratic programming (QP)
• Has solid theoretical foundations
• Future error < Training error + C*h1/2
• h=VC dimension, which is the max number of
examples shattered by a function class f(x,a)
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 7
Q2: extension to allow errors
denotes +1
denotes -1
When noise exists:
Minimize
w.w + C (distance of error
points to their
correct place)
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 8
Q3: Non-linear transformation to
Feature spaces
• General idea: introduce kernels
Φ: x → φ(x)
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 9
Q4: extension to other tasks?
• SVM for regression analysis
• SV regression
f(x,w,b) = w. x - b
• SVM for ranking
• Ranking SVM
• Idea
• For each order pair of
instances (x1, x2)
where x1 < x2 in
ranking
• Generate a new
instance
• <x1,x2,+1>
• Train an SVM on the
new data set
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 10
Q5: Scale Up?
• One of the initial drawbacks of SVM is its computational
inefficiency.
• However, this problem is being solved with great success.
• SMO:
• break a large optimization problem into a series of smaller
problems, each only involves a couple of carefully chosen
variables
• The process iterates
• Core Vector Machines
• finding an approximate minimum enclosing ball of a set of
instances.
• These instances, when mapped to an N-dimensional space,
represent a core set
• Solving the SVM learning problem on these core sets can
produce a good approximation solution in very fast speed.
• Train high quality SVM on millions of data in seconds
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 11
References
• An excellent tutorial on VC-dimension and Support Vector
Machines:
• http://www.cs.cmu.edu/~awm/tutorials
• C.J.C. Burges. A tutorial on support vector machines for
pattern recognition. Data Mining and Knowledge
Discovery, 2(2):955-974, 1998.
http://citeseer.nj.nec.com/burges98tutorial.html
• The VC/SRM/SVM Bible: (Not for beginners including
myself)
Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience;
1998
• Software: SVM-light, http://svmlight.joachims.org/, LibSVM,
http://www.csie.ntu.edu.tw/~cjlin/libsvm/ SMO in Weka
For ICDM Panel of 10 best algorithms
Support Vector Machines: Slide 12