Sparse Coding and Automatic Relevance
Download
Report
Transcript Sparse Coding and Automatic Relevance
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding and Automatic
Relevance Determination for
Multi-way models
Morten Mørup
Joint work with
DTU Informatics
Intelligent Signal Processing
Technical University of Denmark
Sparse’09
8 April 2009
Lars Kai Hansen
DTU Informatics
Intelligent Signal Processing
Technical University of Denmark
1
Informatics and Mathematical Modelling / Intelligent Signal Processing
Vector: 1-way array/ 1st order tensor,
Matrix: 2-way array/ 2nd order tensor,
3-way array/3rd order tensor
Multi-way modeling has become an important tool for
Unsupervised Learning and are in frequent use
today in a variety of fields including
Psychometrics
Chemometrics
NeuroImaging
Textmining
Signal Processing
Sparse’09
8 April 2009
(Subject x Task x Time)
(Sample x Emission x Absorption)
(Channel x Time x Trial)
(Term x Document x Time/User)
(ICA, i.e. diagonalization of Cummulants)
2
Informatics and Mathematical Modelling / Intelligent Signal Processing
Some Tensor algebra
XI x J x K
I x JK Matrix
X(1)
Matricizing
X X(n)
X(2)
X(3)
N-mode multiplication
J x KI Matrix
K x IJ Matrix
i.e. for matrix multiplication CA=A x1C , ABT=A x2B
Sparse’09
8 April 2009
3
Informatics and Mathematical Modelling / Intelligent Signal Processing
Tensor Decomposition
The two most commonly used tensor decomposition
models are the CandeComp/PARAFAC (CP) model
and the Tucker model
Sparse’09
8 April 2009
4
Informatics and Mathematical Modelling / Intelligent Signal Processing
CP-model unique
Tucker model not unique
What constitutes an adequate number of
components,
i.e. determining J for CP and J1,J2,…,JN for Tucker
is an open problem, particularly difficult for the
Tucker model as the number of components are
specified for each modality separately
Sparse’09
8 April 2009
5
Informatics and Mathematical Modelling / Intelligent Signal Processing
Agenda
To use sparse coding to Simplify the Tucker core forming
a unique representation as well as enable interpolation
between the Tucker (full core) and CP (diagonal core)
model
To use sparse coding to turn off excess components in the
CP and Tucker model and thereby select the model order.
To tune the pruning strength from data by Automatic
Relevance Determination (ARD).
Sparse’09
8 April 2009
6
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding
Nature, 1996
Bruno A. Olshausen
David J. Field
Preserve Information
Preserve Sparsity (Simplicity)
Tradeoff parameter
Sparse’09
8 April 2009
7
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding (cont.)
Image patch 1 to N
…
Patch image
Vectorize patches
Sparse coding attempts to both learn
dictionary and encoding at the same time
(This problem is not convex!)
Sparse’09
8 April 2009
8
Informatics and Mathematical Modelling / Intelligent Signal Processing
Automatic Relevance Determination (ARD)
Automatic Relevance Determination (ARD) is a
hierarchical Bayesian approach widely used for
model selection
In ARD hyper-parameters explicitly represents the
relevance of different features by defining the range
of variation.
(i.e., Range of variation0 Feature removed)
Sparse’09
8 April 2009
9
Informatics and Mathematical Modelling / Intelligent Signal Processing
A Bayesian formulation of the Lasso / BPD problem
Sparse’09
8 April 2009
10
Informatics and Mathematical Modelling / Intelligent Signal Processing
ARD in reality a L0-norm optimization scheme. As
such ARD based on Laplace prior corresponds to
L0-norm optimization by re-weighted L1-norm
L0 norm by re-weighted L2 follows by imposing Gaussian prior instead of Laplace
Sparse’09
8 April 2009
11
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Tucker decomposition by ARD
Brakes into standard Lasso/BPD sub-problems of the form
Sparse’09
8 April 2009
12
Informatics and Mathematical Modelling / Intelligent Signal Processing
Solving efficiently the Lasso/BPD sub-problems
Notice, in general the alternating sub-problems for Tucker estimation have J<I!
Sparse’09
8 April 2009
13
Informatics and Mathematical Modelling / Intelligent Signal Processing
The Tucker ARD Algorithm
Sparse’09
8 April 2009
14
Informatics and Mathematical Modelling / Intelligent Signal Processing
Results
Fluorescene data: emission x excitation x samples
Tucker(10,10,10) models were fitted to the data, given are below the extracted cores
Synthetic Tucker(3,4,5)
Sparse’09
Tucker(3,6,4)
8 April 2009
Tucker(3,3,3)
15
Tucker(?,4,4)
Tucker(4,4,4)
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding in fact 2-way case of Tucker model
The presented ARD framework extracts a 20
component model when trained on the natural image
data of (Olshausen and Field, Nature 1996)
Image patch 1 to N
…
Patch image
Vectorize patches
Sparse’09
8 April 2009
16
Informatics and Mathematical Modelling / Intelligent Signal Processing
Conclusion
Imposing sparseness on the core
enable to interpolate between
Tucker and CP model
ARD based on MAP estimation
form a simple framework to tune
the pruning in sparse coding
models giving the model order.
ARD framework especially useful
for the Tucker model where the
order is specified for each mode
separately which makes
exhaustive evaluation of all
potential models expensive.
ARD-estimation based on MAP is
closely related to L0 norm
estimation based on reweighted
L1 and L2 norm. Thus, ARD form
a principled framework for
learning the sparsity pattern in
CS.
Sparse’09
8 April 2009
17
Informatics and Mathematical Modelling / Intelligent Signal Processing
http://mmds.imm.dtu.dk
Sparse’09
8 April 2009
18