Mathematical Programming in Support Vector Machines


Proximal Support Vector Machine Classifiers
KDD 2001
San Francisco, August 26-29, 2001
Glenn Fung & Olvi Mangasarian
Data Mining Institute
University of Wisconsin - Madison
Support Vector Machines
Maximizing the Margin between Bounding Planes

[Figure: two parallel bounding planes, $x'w = \gamma + 1$ bounding the set $A+$ and $x'w = \gamma - 1$ bounding the set $A-$, separated by the margin $\frac{2}{\|w\|_2}$.]
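The margin in the figure is the distance between the two parallel planes, which follows from the plane-to-plane distance formula:

$$\text{margin} = \frac{(\gamma + 1) - (\gamma - 1)}{\|w\|_2} = \frac{2}{\|w\|_2}$$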
Proximal Support Vector Machines
Fitting the Data Using Two Parallel Bounding Planes

[Figure: the same two parallel planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, now fitted proximally to the sets $A+$ and $A-$ respectively, again separated by $\frac{2}{\|w\|_2}$.]
Standard Support Vector Machine Formulation

• Solve the quadratic program for some $\nu > 0$:

$$\min_{y,\,w,\,\gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{(QP)}$$

$$\text{s.t.}\ \ D(Aw - e\gamma) + y \ge e,$$

where $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership.
• The margin is maximized by minimizing $\frac{1}{2}\|(w,\gamma)\|_2^2$.
PSVM Formulation

We have from the QP SVM formulation, with the inequality constraint replaced by an equality:

$$\min_{w,\,\gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{(QP)}$$

$$\text{s.t.}\ \ D(Aw - e\gamma) + y = e$$

Solving for $y$ in terms of $w$ and $\gamma$ gives:

$$\min_{w,\,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$

This simple but critical modification changes the nature of the optimization problem tremendously!
Advantages of New Formulation

• The objective function remains strongly convex.
• An explicit exact solution can be written in terms of the problem data.
• The PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space.
• Exact leave-one-out correctness can be obtained in terms of the problem data.
Linear PSVM
We want to solve:
$$\min_{w,\,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2$$
Setting the gradient equal to zero gives a nonsingular system of linear equations; solving this system gives the desired PSVM classifier.
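Filling in that step: writing $H = [A \;\; -e]$ and $z = (w, \gamma)$, so the problem reads $\min_z \frac{\nu}{2}\|e - DHz\|_2^2 + \frac{1}{2}\|z\|_2^2$, and using $D^2 = I$, the gradient condition becomes

$$-\nu H'D(e - DHz) + z = 0 \iff \left(\tfrac{I}{\nu} + H'H\right) z = H'De,$$

which is exactly the system solved on the next slide.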
Linear PSVM Solution
$$\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

Here, $H = [A \;\; -e]$.
• The linear system to solve depends on $H'H$, which is of size $(n + 1) \times (n + 1)$.
• $n$ is usually much smaller than $m$.
Linear Proximal SVM Algorithm
Input: $A,\ D$
Define: $H = [A \;\; -e]$
Calculate: $v = H'De$
Solve: $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = v$
Classifier: $\mathrm{sign}(w'x - \gamma)$
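As a concrete illustration, here is a minimal NumPy sketch of the four steps above. The function name `linear_psvm` and the default value of $\nu$ are illustrative choices, not part of the original algorithm.

```python
import numpy as np

def linear_psvm(A, d, nu=1.0):
    """Linear PSVM sketch: A is the (m, n) data matrix, d holds the
    +1/-1 labels (the diagonal of D), nu > 0 weights the error term.
    Returns (w, gamma) for the classifier sign(x'w - gamma)."""
    m, n = A.shape
    H = np.hstack([A, -np.ones((m, 1))])   # Define: H = [A  -e]
    v = H.T @ d                            # Calculate: v = H'De (De = d)
    M = np.eye(n + 1) / nu + H.T @ H       # (I/nu + H'H), (n+1) x (n+1)
    z = np.linalg.solve(M, v)              # Solve for [w; gamma]
    return z[:n], z[n]

# Toy usage: two shifted point clouds
A = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
d = np.hstack([np.ones(50), -np.ones(50)])
w, gamma = linear_psvm(A, d)
predictions = np.sign(A @ w - gamma)       # Classifier: sign(w'x - gamma)
```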
Nonlinear PSVM Formulation
• Linear PSVM (linear separating surface: $x'w = \gamma$):

$$\min_{w,\,\gamma}\ \frac{\nu}{2}\|y\|_2^2 + \frac{1}{2}\|(w,\gamma)\|_2^2 \quad \text{(QP)}$$

$$\text{s.t.}\ \ D(Aw - e\gamma) + y = e$$

By QP "duality", $w = A'Du$. Maximizing the margin in the "dual space" gives:

$$\min_{u,\,\gamma}\ \frac{\nu}{2}\|e - D(AA'Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$

• Replace $AA'$ by a nonlinear kernel $K(A, A')$:

$$\min_{u,\,\gamma}\ \frac{\nu}{2}\|e - D(K(A,A')Du - e\gamma)\|_2^2 + \frac{1}{2}\|(u,\gamma)\|_2^2$$
The Nonlinear Classifier
• The nonlinear classifier: $K(x', A')Du = \gamma$, where

$$K(A, A'): R^{m \times n} \times R^{n \times m} \longrightarrow R^{m \times m}$$

• $K$ is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel:

$$K(A, A')_{ij} = e^{-\mu\|A_i - A_j\|_2^2}, \quad i, j = 1, \dots, m$$

• The $ij$-entry of $K(A, A')$ represents the "similarity" of the data points $A_i$ and $A_j$.
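A minimal NumPy sketch of this kernel matrix follows; the name `gaussian_kernel` and the default `mu` are illustrative, and passing `B = A` reproduces the $m \times m$ matrix $K(A, A')$ above.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Gaussian kernel matrix: entry (i, j) is exp(-mu * ||A_i - B_j||^2)."""
    # Expand ||a - b||^2 = ||a||^2 - 2 a'b + ||b||^2 to avoid explicit loops.
    sq = (np.sum(A**2, axis=1)[:, None]
          - 2.0 * A @ B.T
          + np.sum(B**2, axis=1)[None, :])
    return np.exp(-mu * sq)
```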
Nonlinear PSVM
• Similar to the linear case, setting the gradient equal to zero, we obtain:

$$\begin{bmatrix} u \\ \gamma \end{bmatrix} = \left(\frac{I}{\nu} + H'H\right)^{-1} H'De$$

with $H$ defined slightly differently: $H = [K(A, A') \;\; -e]$.
• Here, the linear system to solve is of size $(m + 1) \times (m + 1)$.
However, reduced kernel techniques (RSVM) can be used to reduce dimensionality.
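To make the reduced-kernel remark concrete, here is a hedged sketch of one RSVM-style variant: a random row subset $\bar{A}$ of $A$ replaces the full kernel, shrinking the system from $(m + 1) \times (m + 1)$ to $(k + 1) \times (k + 1)$. It reuses `gaussian_kernel` from above; the function name, defaults, and uniform random sampling are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def reduced_kernel_psvm(A, d, nu=1.0, mu=0.1, frac=0.1, seed=0):
    """RSVM-style reduced-kernel PSVM sketch: keep only k = frac * m
    kernel columns, so the linear solve is (k+1) x (k+1)."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    k = max(1, int(frac * m))
    idx = rng.choice(m, size=k, replace=False)  # random row subset A_bar
    K = gaussian_kernel(A, A[idx], mu)          # rectangular m x k kernel
    H = np.hstack([K, -np.ones((m, 1))])        # H = [K(A, A_bar')  -e]
    v = H.T @ d                                 # v = H'De
    z = np.linalg.solve(np.eye(k + 1) / nu + H.T @ H, v)
    # Classify x via sign(gaussian_kernel(x[None], A[idx], mu) @ u - gamma)
    return idx, z[:k], z[k]
```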
Nonlinear Proximal SVM Algorithm

Input: $A,\ D$
Define: $K = K(A, A')$, $H = [K \;\; -e]$
Calculate: $v = H'De$
Solve: $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} u \\ \gamma \end{bmatrix} = v$, then set $\bar{u} = Du$
Classifier: $\mathrm{sign}(K(x', A')\bar{u} - \gamma)$
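Putting the steps together, here is a minimal sketch of the nonlinear algorithm, again reusing the `gaussian_kernel` helper from above; the function name and defaults are illustrative.

```python
import numpy as np

def nonlinear_psvm(A, d, nu=1.0, mu=0.1):
    """Nonlinear PSVM sketch: returns (u_bar, gamma) for the classifier
    sign(K(x', A') u_bar - gamma)."""
    m = A.shape[0]
    K = gaussian_kernel(A, A, mu)               # Define: K = K(A, A')
    H = np.hstack([K, -np.ones((m, 1))])        # Define: H = [K  -e]
    v = H.T @ d                                 # Calculate: v = H'De
    z = np.linalg.solve(np.eye(m + 1) / nu + H.T @ H, v)  # Solve
    u, gamma = z[:m], z[m]
    return d * u, gamma                         # u_bar = Du, with D = diag(d)

# Classify a new point x:
# np.sign(gaussian_kernel(x[None, :], A, mu) @ u_bar - gamma)
```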