Non-linear optimization


Kernel based data fusion
Discussion of a Paper by G. Lanckriet
1
Paper
2
Overview
Problem:
Aggregation of heterogeneous data
Idea:
Different data are represented by different
kernels
Question:
How to combine different kernels in an
elegant/efficient way?
Solution:
Linear combination and SDP
Application: Recognition of ribosomal and
membrane proteins
3
Linear combination of kernels
K = Σᵢ μᵢ Kᵢ        (μᵢ: weight, Kᵢ: kernel)

Resulting kernel K is positive definite
(xᵀKx = Σᵢ μᵢ xᵀKᵢx > 0 for all x ≠ 0, provided μᵢ > 0 and xᵀKᵢx > 0)
Elegant aggregation of heterogeneous data
More efficient than training of individual SVMs
KCCA uses unweighted sum over individual kernels
4
Support Vector Machine
minimize   ½‖w‖² + C Σᵢ ξᵢ        (squared norm of the weight vector + penalty term)
subject to yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0

w, b define the separating hyperplane; the ξᵢ are slack variables
5
Dual form
maximize   Σᵢ αᵢ − ½ Σᵢⱼ αᵢαⱼ yᵢyⱼ K(xᵢ, xⱼ)        (quadratic, convex; K positive definite)
subject to 0 ≤ αᵢ ≤ C (scalar),  Σᵢ αᵢyᵢ = 0

 Maximization instead of minimization
 Equality constraints
 Lagrange multipliers α instead of w, b, ξ
Quadratic program (QP)
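This dual QP can be solved directly on a toy problem; the four-point data set, the linear kernel, and the scipy SLSQP solver are my own choices for illustration, not from the talk:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data; labels y in {-1, +1} (hypothetical example)
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
C = 10.0
K = X @ X.T                                   # linear kernel matrix

def neg_dual(a):                              # negate: maximize <=> minimize
    return -(a.sum() - 0.5 * (a * y) @ K @ (a * y))

res = minimize(neg_dual, x0=np.zeros(len(y)),
               bounds=[(0.0, C)] * len(y),    # 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],
               method="SLSQP")
alpha = res.x                                 # Lagrange multipliers
w = (alpha * y) @ X                           # recover primal weight vector
k = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
b = y[k] - X[k] @ w                           # bias from a free support vector
```

sign(wᵀx + b) then classifies all four training points correctly; only the two closest points across the classes end up with nonzero αᵢ.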
6
Inserting linear combination
Inserting K = Σₖ μₖ Kₖ into the dual gives an ugly min–max problem:
minimize over the weights μ the maximum over α of the dual objective.

Constraints:
 trace(K) = c fixed, avoids the trivial solution
 Combined kernel must be within the cone of positive semidefinite matrices
7
Cone and other stuff
Positive semidefinite:
A matrix A is positive semidefinite if xᵀAx ≥ 0 for all x.
The set of all symmetric positive semidefinite matrices of a
particular dimension is called the positive semidefinite cone.
http://www.convexoptimization.com/dattorro/positive_semidefinate_cone.html
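The cone structure can be verified numerically; a small numpy sketch (the random matrices are my own illustration): nonnegative scalings and sums of PSD matrices stay in the cone.

```python
import numpy as np

def is_psd(A, tol=1e-9):
    """Symmetric A is PSD iff all eigenvalues are >= 0 (x^T A x >= 0 for all x)."""
    return np.linalg.eigvalsh(A).min() >= -tol

rng = np.random.default_rng(1)
B1, B2 = rng.normal(size=(2, 4, 4))
A1, A2 = B1 @ B1.T, B2 @ B2.T      # B B^T is always symmetric PSD

ok_scale = is_psd(3.0 * A1)        # closed under nonnegative scaling
ok_sum = is_psd(A1 + A2)           # closed under addition
ok_neg = is_psd(-A1)               # leaving the cone: -A1 is not PSD
```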
8
Semidefinite program (SDP)
The min–max problem can be written as a semidefinite program: a linear
objective under positive semidefinite constraints, together with the
fixed-trace constraint trace(K) = c that avoids the trivial solution.
9
Dual form
Dualizing again yields a quadratic constraint:

 Quadratically constrained quadratic program (QCQP)
 QCQPs can be solved more efficiently than SDPs (O(n³) vs. O(n⁴·⁵))
Interior point methods
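A toy QCQP, sketched here with scipy's general-purpose SLSQP solver rather than a dedicated interior point method (the projection example is mine, not from the talk): a quadratic objective under a single quadratic constraint.

```python
import numpy as np
from scipy.optimize import minimize

# QCQP toy: project a point onto the unit ball, i.e.
# minimize ||x - p||^2  subject to the quadratic constraint ||x||^2 <= 1
p = np.array([3.0, 4.0])                     # point outside the unit ball

res = minimize(
    lambda x: np.sum((x - p) ** 2),                               # quadratic objective
    x0=np.zeros(2),
    constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x @ x}], # quadratic constraint
    method="SLSQP",
)
# The optimum is the projection p / ||p|| = (0.6, 0.8)
```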
10
Interior point algorithm
Linear program:
maximize   cᵀx
subject to Ax ≤ b
           x ≥ 0

 Classical simplex method follows the edges of the polyhedron
 Interior point methods walk through the interior of the feasible region
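A minimal LP of this form, solved with an interior point method via scipy's HiGHS backend (the toy numbers are mine):

```python
import numpy as np
from scipy.optimize import linprog

# maximize c^T x  <=>  minimize -c^T x, subject to Ax <= b, x >= 0
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 6.0])

res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2,
              method="highs-ipm")            # HiGHS interior point solver
# res.x is the maximizer, -res.fun the maximal objective value
```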
11
Application


Recognition of ribosomal and membrane proteins in yeast

3 types of data:
 Amino acid sequences
 Protein–protein interactions
 mRNA expression profiles

7 kernels:
 Empirical kernel map -> sequence homology (BLAST (B), Smith–Waterman (SW), Pfam)
 FFT -> sequence hydropathy (KD hydropathy profiles, padding, low-pass filter, FFT, RBF)
 Interaction kernel (LI) -> PPI
 Diffusion (D) -> PPI
 RBF (E) -> gene expression
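Two of these kernel types can be sketched in a few lines; the tiny graph and expression profiles below are hypothetical stand-ins for the PPI and mRNA data:

```python
import numpy as np
from scipy.linalg import expm

# Toy PPI graph on 4 proteins (adjacency matrix, hypothetical)
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
K_diff = expm(-0.5 * L)                 # diffusion kernel K = exp(-beta * L), beta = 0.5

# RBF kernel on toy expression profiles (one row per protein)
E = np.array([[0.1, 0.9], [0.2, 0.8], [0.7, 0.1], [0.9, 0.2]])
sq = np.sum((E[:, None] - E[None, :]) ** 2, axis=-1)
K_rbf = np.exp(-sq / 2.0)

# Both Gram matrices are symmetric PSD, so they can enter the weighted sum
```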
12
Results



 Combination of kernels performs better than individual kernels
 Gene expression (E) most important for ribosomal protein recognition
 PPI (D) most important for membrane protein recognition
13
Results




 Small improvement compared to uniform weights (all weights = 1)
 SDP robust in the presence of noise
 How does SDP perform versus kernel weights derived from the accuracy of individual SVMs?

Membrane protein recognition:
 Other methods use sequence information only
 TMHMM designed for topology prediction
 TMHMM not trained on yeast only
14
Why is this cool?
Everything you ever dreamed of:
 Optimization of C included (for the 2-norm soft margin SVM, the parameter equals 1/C)
 Hyperkernels (optimize the kernel itself)
 Transduction (learn from labeled & unlabeled samples in polynomial time)
 SDP has many applications (graph theory, combinatorial optimization, …)
15
Literature





 Learning the kernel matrix with semidefinite programming
  G. R. G. Lanckriet et al., 2004
 Kernel-based data fusion and its application to protein function prediction in yeast
  G. R. G. Lanckriet et al., 2004
 Machine learning using Hyperkernels
  C. S. Ong, A. J. Smola, 2003
 Semidefinite optimization
  M. J. Todd, 2001
 http://www-user.tu-chemnitz.de/~helmberg/semidef.html
16
Software
 SeDuMi (SDP, Matlab)
 Mosek (QCQP, Java, C++, commercial)
 YALMIP (Matlab)
 … http://www-user.tu-chemnitz.de/~helmberg/semidef.html
17