A Non-Parametric Bayesian Method for Inferring Hidden Causes
by F. Wood, T. L. Griffiths and Z. Ghahramani
Discussion led by Qi An
ECE, Duke University
Outline
• Introduction
• A generative model with hidden causes
• Inference algorithms
• Experimental results
• Conclusions
Introduction
• A variety of methods from Bayesian statistics have been applied to find model structure from a set of observed variables
  – Find the dependencies among the observed variables
  – Introduce hidden causes and infer their influence on the observed variables
Introduction
• Learning model structure containing hidden causes presents a significant challenge
  – The number of hidden causes is unknown and potentially unbounded
  – The relation between hidden causes and observed variables is unknown
• Previous Bayesian approaches assume the number of hidden causes is finite and fixed.
A hidden causal structure
• Assume we have T samples of N binary variables. Let X be the N×T data matrix and A an N×N dependency matrix among X_1, X_2, …, X_N.
• Introduce K binary hidden causes, each with T samples. Let Y be the K×T matrix of hidden causes and Z an N×K dependency matrix between X_1, X_2, …, X_N and Y_1, Y_2, …, Y_K.
• K can potentially be infinite.
A hidden causal structure
[Figure omitted: bipartite graph linking hidden causes (diseases) to observed variables (symptoms)]
A generative model
• Our goal is to estimate the dependency matrix Z and the hidden causes Y.
• From Bayes' rule, P(Y, Z | X) ∝ P(X | Y, Z) P(Y) P(Z).
• We start by assuming K is finite, and then consider the case where K → ∞.
A generative model
• Assume the entries of X are conditionally independent given Z and Y, and are generated from a noisy-OR distribution:

  P(x_{i,t} = 1 | Z, Y) = 1 − (1 − λ)^(z_{i,:} y_{:,t}) (1 − ε)

  where z_{i,:} y_{:,t} = Σ_{k=1}^{K} z_{i,k} y_{k,t}, ε is a baseline probability that x_{i,t} = 1, and λ is the probability with which any one of the hidden causes is effective (a small code sketch follows below).
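To make the noisy-OR likelihood concrete, here is a minimal numpy sketch (my own illustration, not the authors' code); the names noisy_or_prob, lam, and eps are assumptions for illustration, and eps > 0 keeps the log-likelihood finite when no cause is active:

    import numpy as np

    def noisy_or_prob(Z, Y, lam, eps):
        """P(x_{i,t} = 1 | Z, Y) = 1 - (1 - lam)^(z_{i,:} y_{:,t}) * (1 - eps).

        Z   : (N, K) binary matrix linking observed variables to hidden causes
        Y   : (K, T) binary matrix of hidden-cause activations
        lam : probability that any single active, connected cause is effective
        eps : baseline probability that x_{i,t} = 1 with no active cause
        """
        active = Z @ Y                                   # z_{i,:} y_{:,t} for every (i, t)
        return 1.0 - (1.0 - lam) ** active * (1.0 - eps)

    def log_likelihood(X, Z, Y, lam, eps):
        """Bernoulli log-likelihood of the full (N, T) data matrix X."""
        p = noisy_or_prob(Z, Y, lam, eps)
        return np.sum(X * np.log(p) + (1 - X) * np.log(1 - p))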
A generative model
• The entries of Y are assumed to be drawn independently from a Bernoulli distribution.
• Each column of Z is assumed to be Bernoulli(θ_k) distributed. If we further assume a Beta(α/K, 1) hyper-prior on θ_k and integrate it out,

  P(Z) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K)

  where m_k = Σ_{i=1}^{N} z_{i,k} is the number of observed variables connected to hidden cause k.
• These assumptions on Z are exactly the same as those underlying the Indian buffet process (IBP); a forward-sampling sketch follows below.
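A hedged sketch of forward-sampling the finite-K model under exactly these assumptions (the function name and parameters are illustrative; p is the Bernoulli parameter for Y):

    import numpy as np

    def sample_finite_model(N, T, K, alpha, p, lam, eps, seed=None):
        """Forward-sample (Z, Y, X) from the finite-K model described above."""
        rng = np.random.default_rng(seed)
        theta = rng.beta(alpha / K, 1.0, size=K)           # theta_k ~ Beta(alpha/K, 1)
        Z = (rng.random((N, K)) < theta).astype(int)       # z_{i,k} ~ Bernoulli(theta_k)
        Y = (rng.random((K, T)) < p).astype(int)           # y_{k,t} ~ Bernoulli(p)
        p_x = 1.0 - (1.0 - lam) ** (Z @ Y) * (1.0 - eps)   # noisy-OR observation probabilities
        X = (rng.random((N, T)) < p_x).astype(int)
        return Z, Y, X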
Taking the infinite limit
• If we let K approach infinity, the distribution on X remains well-defined, and we only need to keep track of the rows of Y (equivalently, the columns of Z) for which m_k > 0.
• After some algebra and a reordering of the columns of Z, the distribution on Z becomes the Indian buffet process prior

  P([Z]) = ( α^{K_+} / ∏_h K_h! ) exp(−α H_N) ∏_{k=1}^{K_+} (N − m_k)! (m_k − 1)! / N!

  where K_+ is the number of columns with m_k > 0, K_h is the number of columns with binary history h, and H_N = Σ_{j=1}^{N} 1/j.
• The Indian buffet process is defined in terms of a sequence of N customers entering a restaurant and choosing from an infinite array of dishes. The first customer tries the first Poisson(α) dishes. The remaining customers enter one by one; customer i picks each previously sampled dish k with probability m_k / i and then tries Poisson(α/i) new dishes (a sampling sketch follows below).
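The culinary metaphor translates directly into a procedure for drawing Z. A minimal sketch of sampling from the IBP as just described (my own illustration, not the authors' code):

    import numpy as np

    def sample_ibp(N, alpha, seed=None):
        """Draw a binary matrix Z (N customers x K_+ dishes) from the IBP."""
        rng = np.random.default_rng(seed)
        m, rows = [], []                    # m[k] = number of customers who have tried dish k
        for i in range(1, N + 1):
            # customer i takes each previously sampled dish k with probability m_k / i ...
            row = [int(rng.random() < m_k / i) for m_k in m]
            k_new = rng.poisson(alpha / i)  # ... and then tries Poisson(alpha/i) new dishes
            row += [1] * k_new
            m = [m_k + r for m_k, r in zip(m, row)] + [1] * k_new
            rows.append(row)
        Z = np.zeros((N, len(m)), dtype=int)
        for i, row in enumerate(rows):
            Z[i, :len(row)] = row
        return Z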
Reversible jump MCMC
Gibbs sampler for the infinite case
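The transcript does not reproduce the sampling equations, but the core Gibbs move for an existing entry z_{i,k} follows directly from the model: the IBP conditional prior P(z_{i,k} = 1 | z_{-i,k}) = m_{-i,k} / N combined with the noisy-OR likelihood of row i. A hedged sketch under those assumptions (it omits the birth/death moves for new features that the full sampler needs, and assumes eps > 0):

    import numpy as np

    def gibbs_update_zik(X, Z, Y, i, k, lam, eps, rng):
        """Resample z_{i,k} given all other variables (existing feature k only)."""
        N = Z.shape[0]
        m_minus = Z[:, k].sum() - Z[i, k]    # how many *other* variables use cause k
        if m_minus == 0:
            Z[i, k] = 0                      # singleton features handled by separate moves
            return
        log_post = np.empty(2)
        for v in (0, 1):
            Z[i, k] = v
            p_i = 1.0 - (1.0 - lam) ** (Z[i] @ Y) * (1.0 - eps)     # noisy-OR, row i only
            loglik = np.sum(X[i] * np.log(p_i) + (1 - X[i]) * np.log(1 - p_i))
            logprior = np.log(m_minus / N if v == 1 else 1 - m_minus / N)
            log_post[v] = loglik + logprior
        p1 = 1.0 / (1.0 + np.exp(log_post[0] - log_post[1]))        # normalize over {0, 1}
        Z[i, k] = int(rng.random() < p1)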
Experimental results
Synthetic data
[Figure omitted: results shown as a function of the number of iterations]
Real data
[Figure omitted]
Conclusions
• A non-parametric Bayesian technique for inferring hidden causes is developed and demonstrated.
• It recovers the correct number of hidden causes and can be used to obtain reasonably good estimates of the causal structure.
• It can be integrated into Bayesian structure learning over both observed variables and hidden causes.