
Latent Variable Models
Christopher M. Bishop
1. Density Modeling
A standard approach: parametric models
- A model with a number of adaptive parameters.
- The Gaussian distribution is widely used:

$$ p(\mathbf{t} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-d/2} \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{t} - \boldsymbol{\mu})^{\mathrm{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{t} - \boldsymbol{\mu}) \right\} $$
- Maximum likelihood: the parameters are set by maximizing the log-likelihood

$$ \mathcal{L}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \ln p(D \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln p(\mathbf{t}_n \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) $$
- Limitations
  - Too flexible: a full covariance matrix has an excessive number of free parameters.
  - Not flexible enough: the Gaussian can represent only uni-modal distributions.

These limitations motivate mixture models and latent variable models.
1.1. Latent Variables

The number of parameters in a Gaussian distribution:
- Σ: d(d+1)/2 and μ: d, so the total grows as d² (e.g. for d = 100 the full covariance matrix alone has 5050 free parameters).
- Assuming a diagonal covariance matrix reduces Σ to d parameters, but this means the components of t are modeled as statistically independent.

Latent variables
- The number of degrees of freedom can be controlled, while correlations between the data variables can still be captured.

Goal
- To express the distribution p(t) of the observed variables t1,…,td in terms of a smaller number of latent variables x = (x1,…,xq), where q < d.
Cont’d
- Joint distribution p(t, x), where the t_i are conditionally independent given x:

$$ p(\mathbf{t}, \mathbf{x}) = p(\mathbf{x}) \, p(\mathbf{t} \mid \mathbf{x}) = p(\mathbf{x}) \prod_{i=1}^{d} p(t_i \mid \mathbf{x}) $$

- A Bayesian network expresses this factorization.
Cont’d
- Express p(t|x) in terms of a mapping from latent variables to data variables, with u an additive noise term:

$$ \mathbf{t} = \mathbf{y}(\mathbf{x}; \mathbf{w}) + \mathbf{u} $$

- The definition of the latent variable model is completed by specifying the noise distribution p(u), the mapping y(x; w), and the marginal distribution p(x).
- The desired model for the distribution p(t) is obtained by marginalizing over the latent variables, but the integral is intractable in almost all cases:

$$ p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x} $$

- Factor analysis, one of the simplest latent variable models, uses a linear mapping:

$$ \mathbf{t} = \mathbf{W}\mathbf{x} + \boldsymbol{\mu} + \mathbf{u} $$
Cont’d
- W, μ: adaptive parameters.
- p(x): chosen to be N(0, I).
- u: chosen to be zero-mean Gaussian with a diagonal covariance matrix Ψ.
- Then p(t) is Gaussian, with mean μ and covariance matrix Ψ + WWᵀ.
- Degrees of freedom: (d+1)(q+1) − q(q+1)/2.
- The model can capture the dominant correlations between the data variables.
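
The generative view of factor analysis is easy to check numerically. The following sketch (an illustration assuming NumPy; the parameter values are invented for the demonstration) samples from the model and compares the empirical covariance of t against Ψ + WWᵀ:

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, N = 5, 2, 10_000                      # data dim, latent dim, sample count

# Hypothetical parameter values, chosen only for this demonstration.
W = rng.normal(size=(d, q))                 # factor loadings
mu = rng.normal(size=d)                     # data mean
Psi = np.diag(rng.uniform(0.1, 0.5, d))     # diagonal noise covariance

# Generative process: x ~ N(0, I_q), u ~ N(0, Psi), t = W x + mu + u.
x = rng.normal(size=(N, q))
u = rng.multivariate_normal(np.zeros(d), Psi, size=N)
t = x @ W.T + mu + u

# The marginal p(t) is Gaussian with covariance Psi + W W^T, so the
# empirical covariance of the samples should be close to it.
print(np.abs(np.cov(t, rowvar=False, bias=True) - (Psi + W @ W.T)).max())
```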
1.2. Mixture Distributions
- Uni-modal → mixture of M simpler parametric distributions:

$$ p(\mathbf{t}) = \sum_{i=1}^{M} \pi_i \, p(\mathbf{t} \mid i) $$

- p(t|i): usually a normal distribution with its own μ_i, Σ_i.
- π_i: mixing coefficients, satisfying 0 ≤ π_i ≤ 1 and Σ_i π_i = 1.
- The mixing coefficients can be viewed as prior probabilities for the values of the component label i.
- Consider an indicator variable z_ni specifying which component generated point t_n.
- Posterior probabilities: R_ni is the expectation of z_ni:

$$ R_{ni} = p(i \mid \mathbf{t}_n) = \frac{\pi_i \, p(\mathbf{t}_n \mid i)}{\sum_j \pi_j \, p(\mathbf{t}_n \mid j)} $$
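
A minimal sketch of computing the responsibilities R_ni, assuming NumPy and SciPy (the function name and argument layout are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(T, pis, mus, Sigmas):
    """R[n, i] = pi_i p(t_n | i) / sum_j pi_j p(t_n | j) for a Gaussian mixture."""
    # Column i holds the weighted density pi_i p(t_n | i) for every data point.
    weighted = np.column_stack([
        pi * multivariate_normal.pdf(T, mean=mu, cov=Sigma)
        for pi, mu, Sigma in zip(pis, mus, Sigmas)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)
```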
Cont’d
- EM algorithm: the parameters are updated by maximizing the expected complete-data log-likelihood

$$ \mathcal{L}_{\mathrm{comp}}(\{\pi_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}) = \sum_{n=1}^{N} \sum_{i=1}^{M} R_{ni} \ln \{ \pi_i \, p(\mathbf{t}_n \mid i) \} $$
Mixture of latent-variable models
- Bayesian network representation of a mixture of latent variable models: given the values of i and x, the variables t1,…,td are conditionally independent.
2. Probabilistic Principal Component Analysis
Summary
- q principal axes v_j, j ∈ {1,…,q}.
- The v_j are the q dominant eigenvectors of the sample covariance matrix:

$$ \mathbf{S} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{t}_n - \hat{\boldsymbol{\mu}})(\mathbf{t}_n - \hat{\boldsymbol{\mu}})^{\mathrm{T}} $$

- q principal components:

$$ \mathbf{u}_n = \mathbf{V}^{\mathrm{T}} (\mathbf{t}_n - \hat{\boldsymbol{\mu}}) $$

- Reconstruction vector:

$$ \hat{\mathbf{t}}_n = \mathbf{V} \mathbf{u}_n + \hat{\boldsymbol{\mu}} $$
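
These three steps (sample covariance, dominant eigenvectors, projection and reconstruction) translate directly into code. A minimal NumPy sketch, with illustrative function and variable names:

```python
import numpy as np

def pca(T, q):
    """Project data onto the q dominant eigenvectors of the sample covariance."""
    mu_hat = T.mean(axis=0)
    S = (T - mu_hat).T @ (T - mu_hat) / len(T)  # sample covariance matrix
    _, eigvecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :q]                 # top-q eigenvectors as columns
    U = (T - mu_hat) @ V                        # principal components u_n
    T_hat = U @ V.T + mu_hat                    # reconstruction vectors t_hat_n
    return V, U, T_hat
```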
Disadvantage
- The absence of a probability density model and an associated likelihood measure.
2.1. Relationship to Latent Variables