Mathematical Programming in Support Vector Machines
Proximal Support Vector Machine Classifiers
KDD 2001
San Francisco August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute
University of Wisconsin - Madison
Support Vector Machines:
Maximizing the Margin between Bounding Planes
[Figure: points of the classes $A+$ and $A-$ separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ with normal $w$; the distance between the planes is $\frac{2}{\|w\|_2}$.]
Proximal Vector Machines:
Fitting the Data using two parallel Bounding Planes
[Figure: points of the classes $A+$ and $A-$ clustered around the two parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ with normal $w$; the distance between the planes is $\frac{2}{\|w\|_2}$.]
Standard Support Vector Machine Formulation
Solve the quadratic program for some $\nu > 0$:
$$\min_{(w,\gamma,y)}\ \frac{\nu}{2}\,\|y\|_2^2 + \frac{1}{2}\,\|[w;\gamma]\|_2^2 \quad \text{s.t.}\ \ D(Aw - e\gamma) + y \ge e \qquad \text{(QP)}$$
where $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership.
Margin is maximized by minimizing $\frac{1}{2}\,\|[w;\gamma]\|_2^2$
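To make the notation concrete, here is a minimal NumPy sketch of the quantities $A$, $D$, and $e$ (the toy data and variable names are illustrative, not from the talk):

```python
import numpy as np

# Toy data: m = 4 points in n = 2 dimensions; each row of A is one data point.
A = np.array([[ 2.0,  3.0],
              [ 3.0,  3.5],
              [-1.0, -2.0],
              [-2.0, -1.5]])
labels = np.array([ 1,  1, -1, -1])   # class membership of each row

D = np.diag(labels)        # D_ii = +1 for A+ points, -1 for A- points
e = np.ones(A.shape[0])    # vector of ones
```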
PSVM Formulation
We have from the QP SVM formulation, with the inequality constraint replaced by an equality:
$$\min_{(w,\gamma,y)}\ \frac{\nu}{2}\,\|y\|_2^2 + \frac{1}{2}\,\|[w;\gamma]\|_2^2 \quad \text{s.t.}\ \ D(Aw - e\gamma) + y = e \qquad \text{(QP)}$$
Solving the constraint for $y = e - D(Aw - e\gamma)$ and substituting into the objective gives:
$$\min_{(w,\gamma)}\ \frac{\nu}{2}\,\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\,\|[w;\gamma]\|_2^2$$
This simple but critical modification changes the nature of the optimization problem tremendously!
Advantages of New Formulation
The objective function remains strongly convex.
An explicit exact solution can be written in terms of the problem data.
The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
Exact leave-one-out correctness can be obtained in terms of the problem data.
Linear PSVM
We want to solve:
$$\min_{(w,\gamma)}\ \frac{\nu}{2}\,\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\,\|[w;\gamma]\|_2^2$$
Setting the gradient equal to zero gives a nonsingular system of linear equations.
The solution of this system gives the desired PSVM classifier.
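Written out (a worked expansion of the gradient step, not shown on the slide, using the $H = [A\ \ -e]$ notation introduced on the next slide):

```latex
% With z = [w; \gamma] and H = [A \; -e], we have D(Aw - e\gamma) = DHz, so the
% objective is f(z) = (\nu/2)\,\| e - DHz \|_2^2 + (1/2)\,\|z\|_2^2.
% Since D'D = I, setting the gradient to zero gives
\nabla f(z) = -\nu H' D (e - DHz) + z = 0
\quad\Longrightarrow\quad
\Big( \tfrac{I}{\nu} + H'H \Big)\, z = H'De .
```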
Linear PSVM Solution
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$
Here, $H = [A\ \ -e]$.
The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$; $n$ is usually much smaller than $m$.
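In NumPy, this closed form is a single small linear solve (continuing the toy setup above; the value of nu is an assumed choice of $\nu$):

```python
nu = 1.0                                      # assumed value of the parameter nu
H = np.hstack([A, -e[:, None]])               # H = [A  -e], shape (m, n+1)
z = np.linalg.solve(np.eye(H.shape[1]) / nu + H.T @ H,   # (I/nu + H'H) z = H'De
                    H.T @ (D @ e))
w, gamma = z[:-1], z[-1]                      # separating plane: x'w = gamma
```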
Linear Proximal SVM Algorithm
Input: $A,\ D$
Define: $H = [A\ \ -e]$
Calculate: $v = H'De$
Solve: $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$
Classifier: $\mathrm{sign}(w'x - \gamma)$
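The four steps above, wrapped as a self-contained sketch (the function names are mine, not the authors'):

```python
import numpy as np

def linear_psvm(A, labels, nu=1.0):
    """Linear Proximal SVM: solve (I/nu + H'H)[w; gamma] = H'De."""
    m, n = A.shape
    e = np.ones(m)
    D = np.diag(labels)                       # D_ii = +/-1
    H = np.hstack([A, -e[:, None]])           # H = [A  -e]
    v = H.T @ (D @ e)                         # v = H'De
    z = np.linalg.solve(np.eye(n + 1) / nu + H.T @ H, v)
    return z[:-1], z[-1]                      # w, gamma

def linear_classify(X, w, gamma):
    """Classifier: sign(x'w - gamma), applied to each row x of X."""
    return np.sign(X @ w - gamma)
```

On the toy data above, `linear_classify(A, *linear_psvm(A, labels))` should reproduce `labels`.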
Nonlinear PSVM Formulation
Linear PSVM (linear separating surface: $x'w = \gamma$):
$$\min_{(w,\gamma,y)}\ \frac{\nu}{2}\,\|y\|_2^2 + \frac{1}{2}\,\|[w;\gamma]\|_2^2 \quad \text{s.t.}\ \ D(Aw - e\gamma) + y = e \qquad \text{(QP)}$$
By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:
$$\min_{(u,\gamma)}\ \frac{\nu}{2}\,\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\,\|[u;\gamma]\|_2^2$$
Replace $AA'$ by a nonlinear kernel $K(A,A')$:
$$\min_{(u,\gamma)}\ \frac{\nu}{2}\,\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\,\|[u;\gamma]\|_2^2$$
The Nonlinear Classifier
The nonlinear classifier:
$$K(x',A')Du = \gamma, \qquad K(A,A') : R^{m \times n} \times R^{n \times m} \rightarrow R^{m \times m}$$
where $K$ is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:
$$K(A,A')_{ij} = \exp\!\left(-\mu\,\|A_i - A_j\|_2^2\right), \quad i,j = 1,\ldots,m$$
The $ij$-entry of $K(A,A')$ represents the "similarity" of the data points $A_i$ and $A_j$.
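A minimal sketch of this Gaussian kernel matrix (here `B` stands for the rows whose transposes form $A'$, and `mu` is an assumed bandwidth value):

```python
def gaussian_kernel(A, B, mu=0.1):
    """K(A, B')_ij = exp(-mu * ||A_i - B_j||^2) for rows A_i of A and B_j of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq_dists)

K = gaussian_kernel(A, A)    # K(A, A'), an m x m "similarity" matrix
```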
Nonlinear PSVM
Similar to the linear case,
setting the gradient equal to zero, we obtain:
$$\begin{bmatrix} u \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$
Here $H$ is defined slightly differently: $H = [K(A,A')\ \ -e]$.
The linear system to solve is now of size $(m+1) \times (m+1)$.
However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.
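A sketch of the reduced-kernel (RSVM) idea, reusing the `gaussian_kernel` sketch above: form a rectangular kernel against a small random subset of the rows of $A$ (the 10% subset size is an arbitrary assumption):

```python
rng = np.random.default_rng(0)
m = A.shape[0]
idx = rng.choice(m, size=max(1, m // 10), replace=False)  # random ~10% of the rows
A_bar = A[idx]                                            # reduced row set
K_red = gaussian_kernel(A, A_bar)                         # m x m_bar rectangular kernel
# With H = [K_red  -e], the system (I/nu + H'H) is only (m_bar + 1) x (m_bar + 1).
```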
Nonlinear Proximal SVM Algorithm
Input: $A,\ D$
Define: $K = K(A,A')$, $H = [K\ \ -e]$
Calculate: $v = H'De$
Solve: $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} u \\ \gamma \end{bmatrix} = v$
Classifier: $\mathrm{sign}(K(x',A')u - \gamma)$, where this $u$ corresponds to $Du$ in the dual formulation above.
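The nonlinear steps as a NumPy sketch, again reusing `gaussian_kernel` from above (names and parameter defaults are illustrative assumptions):

```python
def nonlinear_psvm(A, labels, nu=1.0, mu=0.1):
    """Nonlinear Proximal SVM with a Gaussian kernel: solve (I/nu + H'H)[u; gamma] = H'De."""
    m = A.shape[0]
    e = np.ones(m)
    D = np.diag(labels)
    K = gaussian_kernel(A, A, mu)             # K = K(A, A'), m x m
    H = np.hstack([K, -e[:, None]])           # H = [K  -e]
    v = H.T @ (D @ e)                         # v = H'De
    z = np.linalg.solve(np.eye(m + 1) / nu + H.T @ H, v)
    return z[:-1], z[-1]                      # u, gamma

def nonlinear_classify(X, A, u, gamma, mu=0.1):
    """Classifier: sign(K(x', A') u - gamma) for each row x of X."""
    return np.sign(gaussian_kernel(X, A, mu) @ u - gamma)
```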