Support Vector Machines:
Hype or Hallelujah?
Kristin Bennett
Math Sciences Dept
Rensselaer Polytechnic Inst.
http://www.rpi.edu/~bennek
M2000, October 2-4, 2000
Outline
Support Vector Machines for Classification
Linear Discrimination
Nonlinear Discrimination
Extensions
Application in Drug Design
Hallelujah
Hype
Support Vector Machines
(SVM)
A methodology for inference based on Vapnik’s
Statistical Learning Theory.
Key Ideas:
“Maximize Margins”
“Do the Dual”
“Construct Kernels”
Best Linear Separator?
[A sequence of figure slides shows the same two-class data set with several candidate separating lines.]
Find Closest Points in
Convex Hulls
[Figure: the convex hulls of the two classes, with closest points c and d.]
Plane Bisects Closest Points
The separating plane $x \cdot w = b$ bisects the segment joining the closest points, with normal $w = c - d$.
[Figure: the bisecting plane between hull points c and d.]
Find using quadratic
program
$$\min_{\alpha}\ \tfrac{1}{2}\,\|c - d\|^2, \qquad c = \sum_{i \in \text{Class } 1} \alpha_i x_i, \quad d = \sum_{i \in \text{Class } -1} \alpha_i x_i$$
$$\text{s.t.} \quad \sum_{i \in \text{Class } 1} \alpha_i = 1, \quad \sum_{i \in \text{Class } -1} \alpha_i = 1, \quad \alpha_i \ge 0, \ i = 1, \dots, \ell$$
Many existing and new solvers.
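For instance, a minimal sketch of this QP with CVXPY (an assumed off-the-shelf solver; X1, X2 and all other names are illustrative, not from the talk):

```python
# A minimal sketch of the closest-points QP, assuming the CVXPY solver;
# X1 and X2 hold one point per row, one array per class (illustrative).
import cvxpy as cp
import numpy as np

def closest_points_plane(X1, X2):
    """Find the closest points c, d of the two convex hulls, then bisect."""
    a1 = cp.Variable(X1.shape[0], nonneg=True)   # hull multipliers, Class 1
    a2 = cp.Variable(X2.shape[0], nonneg=True)   # hull multipliers, Class -1
    objective = cp.Minimize(0.5 * cp.sum_squares(X1.T @ a1 - X2.T @ a2))
    cp.Problem(objective, [cp.sum(a1) == 1, cp.sum(a2) == 1]).solve()
    c, d = X1.T @ a1.value, X2.T @ a2.value
    w = c - d                                    # normal of the bisecting plane
    b = (c @ c - d @ d) / 2                      # plane passes through (c + d)/2
    return w, b
```

A new point x is then classified by the sign of $x \cdot w - b$.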
Best Linear Separator:
Supporting Plane Method
Maximize the distance between two parallel supporting planes
$$x \cdot w + b = +1 \qquad \text{and} \qquad x \cdot w + b = -1.$$
Distance = "margin" = $\dfrac{2}{\|w\|}$.
Maximize margin using
quadratic program
$$\min_{w, b}\ \tfrac{1}{2}\|w\|^2$$
$$\text{s.t.} \quad x_i \cdot w + b \ge +1 \ \ \forall\, i \in \text{Class } 1, \qquad x_i \cdot w + b \le -1 \ \ \forall\, i \in \text{Class } -1$$
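This primal QP also states directly in a solver; a minimal CVXPY sketch (X, y, and the rest are illustrative names):

```python
# A minimal sketch of the supporting-plane (max-margin) QP with CVXPY;
# X is (points x features), y holds +1/-1 labels (illustrative names).
import cvxpy as cp

def max_margin_plane(X, y):
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    # y_i (x_i.w + b) >= 1 states both class constraints at once
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                      [cp.multiply(y, X @ w + b) >= 1])
    prob.solve()
    return w.value, b.value
```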
Dual of Closest Points Method
is Support Plane Method
The closest-points problem
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i=1}^{\ell} y_i \alpha_i x_i\Big\|^2 \quad \text{s.t.} \quad \sum_{i \in \text{Class } 1} \alpha_i = 1, \quad \sum_{i \in \text{Class } -1} \alpha_i = 1, \quad \alpha_i \ge 0$$
is the dual of the supporting-plane problem
$$\min_{w, b}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i (x_i \cdot w + b) \ge 1, \ i = 1, \dots, \ell.$$
Solution only depends on support vectors:
$$w = \sum_{i=1}^{\ell} y_i \alpha_i x_i, \qquad y_i = \begin{cases} +1 & i \in \text{Class } 1 \\ -1 & i \in \text{Class } -1 \end{cases}$$
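Given a dual solution $\alpha$ from any QP solver, the primal plane follows directly; a hedged numpy sketch (names illustrative):

```python
# Sketch: recovering the primal plane from a dual solution alpha
# (alpha may come from any QP solver; names are illustrative).
import numpy as np

def plane_from_dual(X, y, alpha, tol=1e-8):
    w = (alpha * y) @ X                  # w = sum_i y_i alpha_i x_i
    sv = alpha > tol                     # support vectors: alpha_i > 0
    b = np.mean(y[sv] - X[sv] @ w)       # from y_i (x_i.w + b) = 1 at the SVs
    return w, b, np.flatnonzero(sv)
```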
Statistical Learning
Theory
Misclassification error and the function
complexity bound generalization error.
Maximizing margins minimizes complexity.
“Eliminates” overfitting.
Solution depends only on Support Vectors, not on the number of attributes.
Margins and Complexity
Skinny margin is more flexible, thus more complex.
Margins and Complexity
Fat margin
is less complex.
Linearly Inseparable Case
Convex hulls intersect: the same argument won't work.
Reduced Convex Hulls
Don’t Intersect
Reduce each hull by adding an upper bound $D$ (e.g. $D = \tfrac{1}{2}$) on the multipliers:
$$d = \sum_{i \in \text{Class } -1} \alpha_i x_i, \qquad \sum_{i \in \text{Class } -1} \alpha_i = 1, \qquad 0 \le \alpha_i \le D$$
(and likewise for $c$ over Class 1).
Find Closest Points
Then Bisect
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i \in \text{Class } 1} \alpha_i x_i \;-\; \sum_{i \in \text{Class } -1} \alpha_i x_i\Big\|^2$$
$$\text{s.t.} \quad \sum_{i \in \text{Class } 1} \alpha_i = 1, \quad \sum_{i \in \text{Class } -1} \alpha_i = 1, \quad 0 \le \alpha_i \le D$$
No change except for the bound D.
D determines the number of support vectors (see the sketch below).
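In code, the change is a single extra bound on the hull multipliers; a sketch reusing the CVXPY variables from the earlier example (a1, a2 nonnegative, D assumed):

```python
# Sketch of the only change needed for the inseparable case: cap the
# hull multipliers at D (a1, a2 are nonneg CVXPY variables as before).
import cvxpy as cp

def reduced_hull_constraints(a1, a2, D):
    """Reduced convex hulls: weights sum to 1 and are bounded above by D."""
    return [cp.sum(a1) == 1, cp.sum(a2) == 1, a1 <= D, a2 <= D]
```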
Linearly Inseparable Case:
Supporting Plane Method
Just add a non-negative error (slack) vector z:
$$\min_{w, b, z}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} z_i$$
$$\text{s.t.} \quad y_i (x_i \cdot w + b) + z_i \ge 1, \qquad z_i \ge 0, \ i = 1, \dots, \ell$$
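A sketch of this soft-margin primal in CVXPY (assumed solver; C and all names illustrative):

```python
# Sketch of the soft-margin primal QP with CVXPY; C trades margin width
# against total error (illustrative names).
import cvxpy as cp

def soft_margin_plane(X, y, C=1.0):
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    z = cp.Variable(X.shape[0], nonneg=True)   # error vector z
    obj = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(z))
    prob = cp.Problem(obj, [cp.multiply(y, X @ w + b) + z >= 1])
    prob.solve()
    return w.value, b.value
```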
Dual of Closest Points Method
is Support Plane Method
The bounded closest-points problem
$$\min_{\alpha}\ \tfrac{1}{2}\Big\|\sum_{i=1}^{\ell} y_i \alpha_i x_i\Big\|^2 \quad \text{s.t.} \quad \sum_{i \in \text{Class } 1} \alpha_i = 1, \quad \sum_{i \in \text{Class } -1} \alpha_i = 1, \quad 0 \le \alpha_i \le D$$
is the dual of the soft-margin supporting-plane problem
$$\min_{w, b, z}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} z_i \quad \text{s.t.} \quad y_i (x_i \cdot w + b) + z_i \ge 1, \quad z_i \ge 0, \ i = 1, \dots, \ell.$$
Solution only depends on support vectors:
$$w = \sum_{i=1}^{\ell} y_i \alpha_i x_i$$
Nonlinear Classification
In two dimensions, $x = (a, b)$ and the linear discriminant is
$$x \cdot w = w_1 a + w_2 b.$$
Mapping to a quadratic feature space, $\Phi(x) = (a,\ b,\ ab,\ a^2,\ b^2)$, the discriminant becomes
$$\Phi(x) \cdot w = w_1 a + w_2 b + w_3 ab + w_4 a^2 + w_5 b^2.$$
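A small illustration of this map in Python (the weights below are arbitrary demonstration values, not from the talk):

```python
# The quadratic feature map from the slide: (a, b) -> (a, b, ab, a^2, b^2).
import numpy as np

def phi(x):
    a, b = x
    return np.array([a, b, a * b, a**2, b**2])

w = np.array([1.0, -1.0, 0.5, 0.2, 0.2])   # hypothetical weights
x = np.array([2.0, 3.0])
# Linear in the mapped space, quadratic in the original (a, b):
print(phi(x) @ w)   # w1*a + w2*b + w3*a*b + w4*a^2 + w5*b^2
```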
Nonlinear Classification: Map
to higher dimensional space
IDEA: Map each point to higher dimensional feature space and
construct linear discriminant in the higher dimensional space.
Define $\Phi(x): \mathbb{R}^n \to \mathbb{R}^{n'}$ with $n' \gg n$.
Dual SVM becomes:
$$\min_{\alpha}\ \tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} y_i y_j \alpha_i \alpha_j\, \Phi(x_i) \cdot \Phi(x_j) \;-\; \sum_{i=1}^{\ell} \alpha_i$$
$$\text{s.t.} \quad \sum_{i=1}^{\ell} y_i \alpha_i = 0, \qquad \alpha_i \ge 0, \ i = 1, \dots, \ell$$
Generalized Inner Product
By Hilbert-Schmidt Kernels (Courant and Hilbert 1953)
$$\Phi(u) \cdot \Phi(v) = K(u, v)$$
for certain $\Phi$ and $K$, e.g.:
$(u \cdot v + 1)^d$ — degree $d$ polynomial
$\exp(-\|u - v\|^2 / \sigma^2)$ — radial basis function machine
$\mathrm{sigmoid}(\eta\,(u \cdot v) + c)$ — two-layer neural network
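As a sanity check on the identity $\Phi(u) \cdot \Phi(v) = K(u, v)$, here is a hedged Python sketch that verifies the degree-2 polynomial kernel against an explicit feature map (all names illustrative):

```python
# Sketch: kernels compute the feature-space inner product directly.
# Checked below for d = 2, where an explicit map phi2 is easy to write.
import numpy as np

def poly_kernel(u, v, d=2):
    return (u @ v + 1) ** d

def rbf_kernel(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / sigma**2)

def phi2(x):                     # phi2(u) . phi2(v) == (u.v + 1)^2 in 2-D
    a, b = x
    r = np.sqrt(2.0)
    return np.array([1.0, r * a, r * b, r * a * b, a**2, b**2])

u, v = np.array([1.0, 2.0]), np.array([0.5, -1.0])
assert np.isclose(phi2(u) @ phi2(v), poly_kernel(u, v))
```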
Final Classification via
Kernels
The Dual SVM becomes:
$$\min_{\alpha}\ \tfrac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} y_i y_j \alpha_i \alpha_j\, K(x_i, x_j) \;-\; \sum_{i=1}^{\ell} \alpha_i$$
$$\text{s.t.} \quad \sum_{i=1}^{\ell} y_i \alpha_i = 0, \qquad \alpha_i \ge 0, \ i = 1, \dots, \ell$$
Final SVM Algorithm
Solve Dual SVM QP
Recover primal variable b
Classify new x
$$f(x) = \operatorname{sign}\Big(\sum_{i:\, \alpha_i > 0} \alpha_i y_i K(x, x_i) + b\Big)$$
Solution only depends on support vectors ($\alpha_i > 0$).
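A minimal sketch of the resulting classifier (illustrative names; kernel is any $K(u, v)$ such as those above):

```python
# Sketch of the final classifier: only the support vectors (alpha_i > 0)
# enter the sum; kernel is any function K(u, v) (illustrative names).
import numpy as np

def classify(x, X_sv, y_sv, alpha_sv, b, kernel):
    s = sum(a * yi * kernel(x, xi)
            for a, yi, xi in zip(alpha_sv, y_sv, X_sv))
    return np.sign(s + b)
```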
Support Vector Machines
(SVM)
Key Formulation Ideas:
“Maximize Margins”
“Do the Dual”
“Construct Kernels”
Generalization Error Bounds
Practical Algorithms
SVM Extensions
Regression
Variable Selection
Boosting
Density Estimation
Unsupervised Learning:
  Novelty/Outlier Detection
  Feature Detection
  Clustering
Example in Drug Design
Goal: predict the bio-reactivity of molecules to decrease drug development time.
The target is the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule.
Constructs a quantitative structure-activity relationship (QSAR) model.
SVM Regression:
ε-insensitive loss function
[Figure: the ε-insensitive loss as a function of the residual $(x \cdot w + b) - y$: zero on the tube $[-\varepsilon, +\varepsilon]$, increasing linearly outside it.]
SVM Minimizes
Underestimate+Overestimate
$$\min_{w, b, z, z^*}\ C \sum_{i=1}^{\ell} (z_i + z_i^*) + \|w\|_1$$
$$\text{s.t.} \quad \sum_{j=1}^{\ell} w_j K(x_i, x_j) + b - y_i \le \varepsilon + z_i$$
$$\qquad\ \ y_i - \sum_{j=1}^{\ell} w_j K(x_i, x_j) - b \le \varepsilon + z_i^*$$
$$\qquad\ \ z_i,\ z_i^* \ge 0, \quad i = 1, \dots, \ell$$
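A hedged sketch of this regression LP, assuming CVXPY and a precomputed kernel matrix K (eps, C, and all names are illustrative):

```python
# Hedged sketch of the 1-norm kernel regression LP above; K is the
# precomputed kernel matrix, eps and C are illustrative parameters.
import cvxpy as cp

def svm_regression(K, y, C=1.0, eps=0.1):
    n = K.shape[0]
    w = cp.Variable(n)
    b = cp.Variable()
    z = cp.Variable(n, nonneg=True)    # overestimates beyond eps
    zs = cp.Variable(n, nonneg=True)   # underestimates beyond eps
    resid = K @ w + b - y
    obj = cp.Minimize(C * cp.sum(z + zs) + cp.norm1(w))
    cp.Problem(obj, [resid <= eps + z, -resid <= eps + zs]).solve()
    return w.value, b.value
```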
LCCKA Problem
Training data: 66 molecules.
The 323 original attributes are wavelet coefficients of TAE Descriptors.
A subset of 39 attributes was selected by a linear 1-norm SVM (with no kernels); see the sketch below.
For details see the DDASSL project link off of http://www.rpi.edu/~bennek.
Test set results are reported.
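The selection step can be sketched as a linear 1-norm SVM whose penalty drives most weights to exactly zero (a hedged illustration, not the talk's actual code; C and tol are assumed parameters):

```python
# Hedged sketch of variable selection with a linear 1-norm SVM: the 1-norm
# penalty drives many weights to zero; keep attributes with nonzero w_j.
import cvxpy as cp
import numpy as np

def one_norm_svm_select(X, y, C=1.0, tol=1e-6):
    w = cp.Variable(X.shape[1])
    b = cp.Variable()
    z = cp.Variable(X.shape[0], nonneg=True)
    obj = cp.Minimize(cp.norm1(w) + C * cp.sum(z))
    cp.Problem(obj, [cp.multiply(y, X @ w + b) + z >= 1]).solve()
    return np.flatnonzero(np.abs(w.value) > tol)   # selected attribute indices
```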
LCCK Prediction
[Scatter plot: LCCKA test set estimates, predicted value vs. true value on $[0, 1]$; $Q^2 = 0.25$.]
Many Other Applications
Speech Recognition
Database Marketing
Quark Flavors in High Energy Physics
Dynamic Object Recognition
Knock Detection in Engines
Protein Sequence Problem
Text Categorization
Breast Cancer Diagnosis
See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html
Hallelujah!
Generalization theory and practice meet
General methodology for many types of
problems
Same Program + New Kernel = New method
No problems with local minima
Few model parameters. Selects capacity.
Robust optimization methods.
Successful Applications
BUT…
HYPE?
Will SVMs beat my best hand-tuned method Z
for X?
Do SVMs scale to massive datasets?
How to choose C and the kernel?
What is the effect of attribute scaling?
How to handle categorical variables?
How to incorporate domain knowledge?
How to interpret results?
Support Vector Machine
Resources
http://www.support-vector.net/
http://www.kernel-machines.org/
Links off my web page:
http://www.rpi.edu/~bennek