Latent Variable Models
Christopher M. Bishop
1. Density Modeling
A standard approach: parametric models with a number of adaptive parameters.
The Gaussian distribution is widely used:
p(t \mid \mu, \Sigma) = (2\pi)^{-d/2} \, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (t - \mu)^T \Sigma^{-1} (t - \mu) \right\}
Log-likelihood method:
L(\mu, \Sigma) = \ln p(D \mid \mu, \Sigma) = \sum_{n=1}^{N} \ln p(t_n \mid \mu, \Sigma)
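As a concrete illustration of the log-likelihood above, the following sketch (with made-up stand-in data) fits a Gaussian by maximum likelihood and evaluates L(mu, Sigma) as a sum over data points:

```python
import numpy as np

# Hypothetical illustration: fit a Gaussian by maximum likelihood and
# evaluate the log-likelihood L(mu, Sigma) = sum_n ln p(t_n | mu, Sigma).
rng = np.random.default_rng(0)
N, d = 500, 3
data = rng.normal(size=(N, d))                 # stand-in data set D = {t_n}

mu = data.mean(axis=0)                         # ML estimate of the mean
Sigma = np.cov(data, rowvar=False, bias=True)  # ML covariance (divides by N)

diff = data - mu
_, logdet = np.linalg.slogdet(Sigma)
quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(Sigma), diff)
L = -0.5 * (N * d * np.log(2 * np.pi) + N * logdet + quad.sum())
```

The `einsum` line evaluates the quadratic form (t_n − μ)ᵀ Σ⁻¹ (t_n − μ) for all n at once.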
Limitations
too flexible: the number of parameters is excessive
not flexible enough: the density is only uni-modal
This motivates mixture models and latent variable models.
1.1. Latent Variables
The number of parameters in a normal distribution: \Sigma contributes d(d+1)/2 and \mu contributes d, so the total grows as O(d^2). Assuming a diagonal covariance matrix reduces \Sigma to d parameters, but this
means the components of t are treated as statistically independent.
Latent variables
The number of degrees of freedom can be controlled, and correlations can still be
captured.
Goal
to express p(t) of the variables t1,…,td in terms of a smaller
number of latent variables x = (x1,…,xq), where q < d.
Cont’d
Joint distribution p(t, x):
p(t, x) = p(x) \, p(t \mid x) = p(x) \prod_{i=1}^{d} p(t_i \mid x)
A Bayesian network expresses this factorization.
Cont’d
Express p(t|x) in terms of a mapping from latent variables to data
variables:
t = y(x; w) + u
The definition of the latent variable model is completed by
specifying the distribution p(u), the mapping y(x; w), and the marginal
distribution p(x).
The desired model is the distribution p(t), but the integral is intractable in
almost all cases:
p(t) = \int p(t \mid x) \, p(x) \, dx
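To make the marginalization concrete, here is a hypothetical 1-D sketch that estimates p(t) = ∫ p(t|x) p(x) dx by Monte Carlo, for a linear-Gaussian choice of y(x; w) and p(u) where the exact marginal is known (the slope `w` and noise variance `s2` are assumed values, not from the source):

```python
import numpy as np

# Hypothetical 1-D sketch: x ~ p(x) = N(0, 1), t | x ~ N(w*x, s2).
# The marginal p(t) = ∫ p(t|x) p(x) dx is estimated by averaging the
# conditional density over samples of x; for this linear-Gaussian case
# the exact marginal is N(t | 0, w^2 + s2), so the estimate can be checked.
rng = np.random.default_rng(1)
w, s2 = 2.0, 0.5                          # assumed mapping slope and noise
t = 1.0                                   # point at which to evaluate p(t)

xs = rng.normal(size=200_000)             # samples from p(x)
cond = np.exp(-(t - w * xs) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
p_mc = cond.mean()                        # Monte Carlo estimate of p(t)

var = w ** 2 + s2                         # exact marginal variance
p_exact = np.exp(-t ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
```

For nonlinear mappings y(x; w) no such closed form exists, which is why the integral is intractable in almost all cases.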
Factor analysis: one of the simplest latent variable models:
t = W x + \mu + u
Cont’d
W, \mu:
adaptive parameters
p(x): chosen to be N(0, I)
u: chosen to be a zero-mean Gaussian with a diagonal covariance
matrix \Psi.
Then p(t) is Gaussian, with mean \mu and covariance matrix
\Psi + WW^T.
Degrees of freedom: (d+1)(q+1) - q(q+1)/2
Can capture the dominant correlations between the data
variables.
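The factor-analysis model can be checked by sampling from it: drawing x ~ N(0, I) and u ~ N(0, Ψ) and forming t = Wx + μ + u should give data whose covariance matches Ψ + WWᵀ. A sketch with made-up parameter values:

```python
import numpy as np

# Sketch (hypothetical parameter values) of the factor-analysis generative
# model t = W x + mu + u, checking that sampled t have mean mu and
# covariance close to Psi + W W^T.
rng = np.random.default_rng(2)
d, q, N = 4, 2, 200_000

W = rng.normal(size=(d, q))                 # factor loadings (assumed)
mu = np.array([1.0, -2.0, 0.5, 3.0])
psi = np.array([0.3, 0.1, 0.2, 0.4])        # diagonal of Psi

x = rng.normal(size=(N, q))                 # latent variables, x ~ N(0, I)
u = rng.normal(size=(N, d)) * np.sqrt(psi)  # noise, u ~ N(0, Psi)
t = x @ W.T + mu + u                        # observed variables

model_cov = np.diag(psi) + W @ W.T          # predicted covariance Psi + W W^T
emp_cov = np.cov(t, rowvar=False)
```

The dq + 2d parameters of W, μ, Ψ are far fewer than the d(d+3)/2 of a full-covariance Gaussian when q ≪ d, while WWᵀ still carries the dominant correlations.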
1.2. Mixture Distributions
A uni-modal density can be generalized to a mixture of M simpler parametric
distributions:
p(t) = \sum_{i=1}^{M} \pi_i \, p(t \mid i)
The components are usually normal distributions, each with its own \mu_i, \Sigma_i.
\pi_i: mixing coefficients, with 0 \le \pi_i \le 1 and \sum_i \pi_i = 1.
The mixing coefficients can be viewed as prior probabilities for the values of the
label i.
Consider an indicator variable z_ni. The posterior probabilities R_ni are the expectations of z_ni:
R_{ni} = p(i \mid t_n) = \frac{\pi_i \, p(t_n \mid i)}{\sum_j \pi_j \, p(t_n \mid j)}
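The responsibility formula above can be sketched for a two-component 1-D Gaussian mixture (all parameter values below are hypothetical):

```python
import numpy as np

# Sketch of the posterior responsibilities R_ni for a two-component 1-D
# Gaussian mixture (parameter values are made up for illustration).
def gauss(t, mu, var):
    """Univariate normal density, broadcast over components."""
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

pi = np.array([0.4, 0.6])                 # mixing coefficients, sum to 1
mus = np.array([-2.0, 2.0])
variances = np.array([1.0, 1.0])

t_n = np.array([-2.0, 0.0, 2.0])          # three data points
joint = pi * gauss(t_n[:, None], mus, variances)   # pi_i * p(t_n | i)
R = joint / joint.sum(axis=1, keepdims=True)       # R_ni = p(i | t_n)
```

A point at a component mean is assigned almost entirely to that component, while the midpoint t = 0 splits in proportion to the priors π_i.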
Cont’d
EM algorithm: maximize the expected complete-data log-likelihood
L_{comp}(\{\pi_i, \mu_i, \Sigma_i\}) = \sum_{n=1}^{N} \sum_{i=1}^{M} R_{ni} \ln\{\pi_i \, p(t_n \mid i)\}
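A minimal EM loop for a 1-D Gaussian mixture, alternating the E-step (responsibilities) with the M-step (closed-form updates that maximize the expected complete-data log-likelihood); the data and initial values are synthetic stand-ins:

```python
import numpy as np

# Minimal EM sketch for a two-component 1-D Gaussian mixture. The E-step
# computes responsibilities R_ni; the M-step re-estimates the parameters
# maximizing the expected complete-data log-likelihood. Synthetic data.
rng = np.random.default_rng(3)
t = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

M = 2
pi = np.full(M, 1.0 / M)                  # initial mixing coefficients
mu = np.array([-1.0, 1.0])                # rough initial means
var = np.ones(M)

def gauss(t, mu, var):
    return np.exp(-(t[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: R_ni = pi_i p(t_n|i) / sum_j pi_j p(t_n|j)
    joint = pi * gauss(t, mu, var)
    R = joint / joint.sum(axis=1, keepdims=True)
    # M-step: re-estimate pi_i, mu_i, var_i from the responsibilities
    Nk = R.sum(axis=0)
    pi = Nk / len(t)
    mu = (R * t[:, None]).sum(axis=0) / Nk
    var = (R * (t[:, None] - mu) ** 2).sum(axis=0) / Nk
```

With well-separated components the loop recovers means near ±3 and mixing coefficients near 0.3 and 0.7.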
Mixture of latent-variable models
Bayesian network representation of a mixture of latent
variable models. Given the values of i and x, the variables
t1,…,td are conditionally independent.
2. Probabilistic Principal Component Analysis
Summary
q principal axes v_j, j \in \{1,…,q\}
The v_j are the q dominant eigenvectors of the sample covariance matrix:
S = \frac{1}{N} \sum_{n=1}^{N} (t_n - \hat{\mu})(t_n - \hat{\mu})^T
q principal components:
u_n = V^T (t_n - \hat{\mu})
reconstruction vector:
\hat{t}_n = V u_n + \hat{\mu}
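The conventional PCA steps above can be sketched directly with an eigendecomposition (the toy data are synthetic):

```python
import numpy as np

# Sketch of conventional PCA: sample covariance S, its q dominant
# eigenvectors V, principal components u_n, and reconstructions t_hat_n.
rng = np.random.default_rng(4)
N, d, q = 400, 5, 2
T = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))  # correlated toy data

mu_hat = T.mean(axis=0)
centered = T - mu_hat
S = centered.T @ centered / N             # sample covariance matrix

eigval, eigvec = np.linalg.eigh(S)        # eigenvalues in ascending order
V = eigvec[:, ::-1][:, :q]                # q dominant eigenvectors as columns

U = centered @ V                          # principal components u_n
T_hat = U @ V.T + mu_hat                  # reconstruction vectors t_hat_n
```

The average squared reconstruction error equals the sum of the discarded eigenvalues, which is the sense in which the q principal axes are optimal.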
Disadvantage: the absence of a probability density model and an associated likelihood
measure.
2.1. Relationship to Latent Variables