Non-linear optimization
Download
Report
Transcript Non-linear optimization
Kernel based data fusion
Discussion of a Paper by G. Lanckriet
1
Paper
2
Overview
Problem:
Aggregation of heterogeneous data
Idea:
Different data are represented by different
kernels
Question:
How to combine different kernels in an
elegant/efficient way?
Solution:
Linear combination and SDP
Application: Recognition of ribosomal and
membrane proteins
3
Linear combination of kernels
xTKx = x2K
x2 K
weight
kernel
0
x
Resulting kernel K is positive definite
(xTKx > 0 for x, provided i > 0 and xTKi x > 0 )
Elegant aggregation of heterogeneous data
More efficient than training of individual SVMs
KCCA uses unweighted sum over individual kernels
4
Support Vector Machine
square
norm
vector
penalty
term
Hyperplane
slack
variables
5
Dual form
quadratic,
convex
scalar 0
positive definite
Lagrange
multipliers
Maximization instead of minimization
Equality constraints
Lagrange multipliers instead of w,b,
Quadratic program (QP)
6
Inserting linear combination
ugly
Fixed trace,
avoids trivial
solution
Combined kernel must be
within the cone of
positive semidefinite
matrices
7
Cone and other stuff
Positive semidefinite cone:
A
Positive semidefinite:
xTAx ≥ 0, x
The set of all symmetric positive semidefinite matrices of
particular dimension is called the positive semidefinite cone.
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
8
Semidefinite program (SDP)
Fixed trace,
avoids trivial
solution
positive
semidefinite
constraints
9
Dual form
quadratic
constraint
Quadratically constraint quadratic program (QCQP)
QCQPs can be solved more efficiently than SDPs
(O(n3) <-> O(n4.5))
Interior point methods
10
Interior point algorithm
Linear program:
maximize cTx
subject to Ax < b
x≥0
Classical Simplex method follows edges of polyhedron
Interior point methods walk through the interior of the
feasible region
11
Application
Recognition of ribosomal and membrane proteins
in yeast
3 Types of data
Amino acid sequences
Protein protein interactions
mRNA expression profiles
7 Kernels
Empirical kernel map -> sequence homology
FFT -> sequence hydropathy
BLAST(B), Smith-Waterman(SW), Pfam
KD hydropathy profiles, padding, low-pass filter, FFT, RBF
Interaction kernel(LI) -> PPI
Diffusion(D) -> PPI
RBF(E) -> gene expression
12
Results
Combination of kernels performs better than individual kernels
Gene expression (E) most important for ribosomal protein
recognition
PPI (D) most important for membrane protein recognition
13
Results
Small improvement compared to weights = 1
SDP robust in the presence of noise
How performs SDP versus kernel weights derived
from accuracy of individual SVMs?
Membrane protein recognition
Other methods use sequence information only
TMHMM designed for topology prediction
TMHMM not trained on yeast only
14
Why is this cool?
Everything you ever dreamed of:
Optimization of C included
(2-norm soft margin SVM =1/C)
Hyperkernels
(optimize the kernel itself)
Transduction
(learn from labeled & unlabeled samples in polynomial time)
SDP has many applications
(Graph theory, combinatorial optimization, …)
15
Literature
Learning the kernel matrix with semidefinite programming
G.R.G.Lanckrit et. al, 2004
Kernel-based data fusion and its application to protein
function prediction in yeast
G.R.G.Lanckrit et. al, 2004
Machine learning using Hyperkernels
C.S.Ong, A.J.Smola, 2003
Semidefinite optimization
M.J.Todd, 2001
http://www-user.tu-chemnitz.de/~helmberg/semidef.html
16
Software
SeDuMi (SDP)
Mosek (QCQP, Java,C++, commercial)
YALMIP (Matlab)
… http://www-user.tu-chemnitz.de/~helmberg/semidef.html
17