Sparse Coding and Automatic Relevance

Download Report

Transcript Sparse Coding and Automatic Relevance

Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding and Automatic
Relevance Determination for
Multi-way models
Morten Mørup
Joint work with
DTU Informatics
Intelligent Signal Processing
Technical University of Denmark
Sparse’09
8 April 2009
Lars Kai Hansen
DTU Informatics
Intelligent Signal Processing
Technical University of Denmark
1
Informatics and Mathematical Modelling / Intelligent Signal Processing
Vector: 1-way array/ 1st order tensor,
Matrix: 2-way array/ 2nd order tensor,
3-way array/3rd order tensor
Multi-way modeling has become an important tool for
Unsupervised Learning and are in frequent use
today in a variety of fields including





Psychometrics
Chemometrics
NeuroImaging
Textmining
Signal Processing
Sparse’09
8 April 2009
(Subject x Task x Time)
(Sample x Emission x Absorption)
(Channel x Time x Trial)
(Term x Document x Time/User)
(ICA, i.e. diagonalization of Cummulants)
2
Informatics and Mathematical Modelling / Intelligent Signal Processing
Some Tensor algebra
XI x J x K
I x JK Matrix
X(1)
 Matricizing
X  X(n)
X(2)
X(3)
 N-mode multiplication
J x KI Matrix
K x IJ Matrix
i.e. for matrix multiplication CA=A x1C , ABT=A x2B
Sparse’09
8 April 2009
3
Informatics and Mathematical Modelling / Intelligent Signal Processing
Tensor Decomposition
The two most commonly used tensor decomposition
models are the CandeComp/PARAFAC (CP) model
and the Tucker model
Sparse’09
8 April 2009
4
Informatics and Mathematical Modelling / Intelligent Signal Processing
 CP-model unique
 Tucker model not unique
 What constitutes an adequate number of
components,
i.e. determining J for CP and J1,J2,…,JN for Tucker
is an open problem, particularly difficult for the
Tucker model as the number of components are
specified for each modality separately
Sparse’09
8 April 2009
5
Informatics and Mathematical Modelling / Intelligent Signal Processing
Agenda
 To use sparse coding to Simplify the Tucker core forming
a unique representation as well as enable interpolation
between the Tucker (full core) and CP (diagonal core)
model
 To use sparse coding to turn off excess components in the
CP and Tucker model and thereby select the model order.
 To tune the pruning strength from data by Automatic
Relevance Determination (ARD).
Sparse’09
8 April 2009
6
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding
Nature, 1996
Bruno A. Olshausen
David J. Field
Preserve Information
Preserve Sparsity (Simplicity)
Tradeoff parameter
Sparse’09
8 April 2009
7
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding (cont.)
Image patch 1 to N
…
Patch image
Vectorize patches

Sparse coding attempts to both learn
dictionary and encoding at the same time
(This problem is not convex!)
Sparse’09
8 April 2009
8
Informatics and Mathematical Modelling / Intelligent Signal Processing
Automatic Relevance Determination (ARD)
 Automatic Relevance Determination (ARD) is a
hierarchical Bayesian approach widely used for
model selection
 In ARD hyper-parameters explicitly represents the
relevance of different features by defining the range
of variation.
(i.e., Range of variation0  Feature removed)
Sparse’09
8 April 2009
9
Informatics and Mathematical Modelling / Intelligent Signal Processing
A Bayesian formulation of the Lasso / BPD problem
Sparse’09
8 April 2009
10
Informatics and Mathematical Modelling / Intelligent Signal Processing
ARD in reality a L0-norm optimization scheme. As
such ARD based on Laplace prior corresponds to
L0-norm optimization by re-weighted L1-norm
L0 norm by re-weighted L2 follows by imposing Gaussian prior instead of Laplace
Sparse’09
8 April 2009
11
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Tucker decomposition by ARD
Brakes into standard Lasso/BPD sub-problems of the form
Sparse’09
8 April 2009
12
Informatics and Mathematical Modelling / Intelligent Signal Processing
Solving efficiently the Lasso/BPD sub-problems
Notice, in general the alternating sub-problems for Tucker estimation have J<I!
Sparse’09
8 April 2009
13
Informatics and Mathematical Modelling / Intelligent Signal Processing
The Tucker ARD Algorithm
Sparse’09
8 April 2009
14
Informatics and Mathematical Modelling / Intelligent Signal Processing
Results
Fluorescene data: emission x excitation x samples
Tucker(10,10,10) models were fitted to the data, given are below the extracted cores
Synthetic Tucker(3,4,5)
Sparse’09
Tucker(3,6,4)
8 April 2009
Tucker(3,3,3)
15
Tucker(?,4,4)
Tucker(4,4,4)
Informatics and Mathematical Modelling / Intelligent Signal Processing
Sparse Coding in fact 2-way case of Tucker model
 The presented ARD framework extracts a 20
component model when trained on the natural image
data of (Olshausen and Field, Nature 1996)
Image patch 1 to N
…
Patch image
Vectorize patches

Sparse’09
8 April 2009
16
Informatics and Mathematical Modelling / Intelligent Signal Processing
Conclusion
 Imposing sparseness on the core
enable to interpolate between
Tucker and CP model
 ARD based on MAP estimation
form a simple framework to tune
the pruning in sparse coding
models giving the model order.
 ARD framework especially useful
for the Tucker model where the
order is specified for each mode
separately which makes
exhaustive evaluation of all
potential models expensive.
 ARD-estimation based on MAP is
closely related to L0 norm
estimation based on reweighted
L1 and L2 norm. Thus, ARD form
a principled framework for
learning the sparsity pattern in
CS.
Sparse’09
8 April 2009
17
Informatics and Mathematical Modelling / Intelligent Signal Processing
http://mmds.imm.dtu.dk
Sparse’09
8 April 2009
18