Bayesian methods with Monte Carlo Markov Chains III
Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 8
More Examples of Gibbs Sampling
An Example with Three Random Variables (1)
To sample (X, Y, N) as follows:
$$ (X, Y, N) \sim f(x, y, n) \propto c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \frac{e^{-\lambda} \lambda^{n}}{n!}, $$
where $x = 0, 1, 2, \ldots, n$, $0 \le y \le 1$, $n = 0, 1, 2, \ldots$,
$\alpha, \beta, \lambda$ are known, and $c$ is a constant.
An Example with Three Random Variables (2)
One can see that
$$ f(x \mid y, n) \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \text{Binomial}(n, y), $$
$$ f(y \mid x, n) \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \text{Beta}(x+\alpha,\ n-x+\beta), $$
$$ f(n-x \mid x, y) \propto \frac{e^{-(1-y)\lambda} \big((1-y)\lambda\big)^{n-x}}{(n-x)!} \sim \text{Poisson}\big((1-y)\lambda\big). $$
An Example with Three Random Variables (3)
Gibbs sampling algorithm:
1. Initial setting: t = 0,
   $y_0 \sim \text{Unif}[0, 1]$ or an arbitrary value in $[0, 1]$,
   $n_0 \sim \text{Discrete Unif}[1, \infty)$ or an arbitrary integer in $[1, \infty)$,
   $x_0 \sim \text{Bin}(n_0, y_0)$.
2. Sample a value $(y_{t+1}, n_{t+1}, x_{t+1})$ from
   $y_{t+1} \sim \text{Beta}(x_t + \alpha,\ n_t - x_t + \beta)$,
   $n_{t+1} \sim x_t + \text{Poisson}\big((1 - y_{t+1})\lambda\big)$,
   $x_{t+1} \sim \text{Bin}(n_{t+1}, y_{t+1})$.
3. t = t + 1; repeat step 2 until convergence.
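A sketch of this sampler follows (Python here, since the slides' R and C listings are not reproduced in this transcript; the Poisson and binomial helpers are minimal stand-ins for library routines):

```python
import math
import random

def sample_poisson(lam):
    # Knuth's multiplication method: multiply uniforms until the
    # product drops below exp(-lam)
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def sample_binomial(n, p):
    # Sum of n Bernoulli(p) draws
    return sum(random.random() < p for _ in range(n))

def gibbs(alpha, beta, lam, n_samples, burn_in=1000):
    # Step 1: initial setting
    y = random.uniform(0.0, 1.0)
    n = 1
    x = sample_binomial(n, y)
    out = []
    for t in range(burn_in + n_samples):
        # Step 2: draw each variable from its full conditional
        y = random.betavariate(x + alpha, n - x + beta)
        n = x + sample_poisson((1.0 - y) * lam)
        x = sample_binomial(n, y)
        if t >= burn_in:
            out.append((x, y, n))
    return out

samples = gibbs(alpha=2, beta=7, lam=16, n_samples=10000)
```

The parameter values match the slides' runs (10000 samples with α=2, β=7, λ=16); the burn-in length is an assumption.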
An Example with Three Random Variables by R
10000 samples with α=2, β=7, and λ=16
An Example with Three Random Variables by C (1)
10000 samples with α=2, β=7, and λ=16
An Example with Three Random Variables by C (2)
An Example with Three Random Variables by C (3)
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b.
A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
(Diagram: gamete formation in the double heterozygote AaBb. For the female (F), non-recombinant gametes occur with probability 1-r' and recombinant gametes with probability r', the female recombination fraction; for the male (M), the corresponding probabilities are 1-r and r, the male recombination fraction.)
Example 1 in Genetics (2)
r and r' are the recombination rates for male and female, respectively.
Suppose the parental origin of this heterozygote is from the mating AABB × aabb. The problem is to estimate r and r' from the offspring of selfed heterozygotes.
Fisher, R. A. and Balmukand, B. (1928). The
estimation of linkage from the offspring of
selfed heterozygotes. Journal of Genetics, 20,
79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (3)
FEMALE \ MALE    AB (1-r)/2           ab (1-r)/2           aB r/2            Ab r/2
AB (1-r')/2      AABB (1-r)(1-r')/4   aABb (1-r)(1-r')/4   aABB r(1-r')/4    AABb r(1-r')/4
ab (1-r')/2      AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4    Aabb r(1-r')/4
aB r'/2          AaBB (1-r)r'/4       aabB (1-r)r'/4       aaBB rr'/4        AabB rr'/4
Ab r'/2          AABb (1-r)r'/4       aAbb (1-r)r'/4       aABb rr'/4        AAbb rr'/4
Example 1 in Genetics (4)
Four distinct phenotypes: A*B*, A*b*, a*B* and
a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (5)
Let $\theta = (1-r)(1-r')$; then
$$ P(A^*B^*) = \frac{2+\theta}{4}, \qquad P(A^*b^*) = P(a^*B^*) = \frac{1-\theta}{4}, \qquad P(a^*b^*) = \frac{\theta}{4}. $$
Example 1 in Genetics (6)
Hence, the random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
$$ \text{Multinomial}\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right). $$
We know that $\theta = (1-r)(1-r')$, $0 \le r \le 1/2$, and $0 \le r' \le 1/2$.
So $1/4 \le \theta \le 1$.
Example 1 in Genetics (7)
Suppose that we observe the data
y = (y1, y2, y3, y4) = (125, 18, 20, 24),
which is a random sample from
$$ \text{Multinomial}\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right). $$
Then the probability mass function is
$$ f(y \mid \theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2+y_3} \left(\frac{\theta}{4}\right)^{y_4}. $$
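With this pmf, the log-likelihood is easy to evaluate numerically. As an illustration only (Python here, not the course's R/C code), one can locate its maximizer by a grid search over the admissible range of θ:

```python
import math

y = (125, 18, 20, 24)   # observed counts for (A*B*, A*b*, a*B*, a*b*)

def log_likelihood(theta):
    # log f(y | theta), dropping the constant multinomial coefficient,
    # with cell probabilities (2+theta)/4, (1-theta)/4, (1-theta)/4, theta/4
    return (y[0] * math.log((2.0 + theta) / 4.0)
            + (y[1] + y[2]) * math.log((1.0 - theta) / 4.0)
            + y[3] * math.log(theta / 4.0))

# Grid search for the maximizer over the admissible range (1/4, 1)
grid = [0.25 + (i + 0.5) * 0.75 / 20000 for i in range(20000)]
theta_hat = max(grid, key=log_likelihood)
```

The same maximizer can be found in closed form by setting the score to zero, which here reduces to a quadratic in θ.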
Example 1 in Genetics (8)
How to estimate θ?
MME (shown last week):
MLE (shown last week):
Bayesian method:
http://en.wikipedia.org/wiki/Bayesian_method
http://en.wikipedia.org/wiki/Method_of_moments_
%28statistics%29
http://en.wikipedia.org/wiki/Maximum_likelihood
Example 1 in Genetics (9)
As the value of θ is between ¼ and 1, we can assume that the prior distribution of θ is Uniform(¼, 1).
The posterior distribution is
$$ f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}. $$
The integral in the denominator, $\int f(y \mid \theta) f(\theta)\, d\theta$, does not have a closed form.
Example 1 in Genetics (10)
We will consider the mean of the posterior distribution (the posterior mean),
$$ E(\theta \mid y) = \int \theta\, f(\theta \mid y)\, d\theta. $$
The Monte Carlo Markov chain method is a good way to estimate $E(\theta \mid y)$ even if $\int f(y \mid \theta) f(\theta)\, d\theta$ and the posterior mean do not have closed forms.
Example 1 by R
Direct numerical integration when $\theta \sim U(\tfrac{1}{4}, 1)$:
We can assume other prior distributions to compare the resulting posterior means:
Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^-5,10^-5).
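The direct numerical integration can be sketched as follows (Python here; the course's version is in R). Because the U(¼, 1) prior density is constant, it cancels in the ratio and only the likelihood is needed:

```python
import math

y = (125, 18, 20, 24)

def likelihood(theta):
    # f(y | theta) up to a constant factor
    return math.exp(y[0] * math.log((2.0 + theta) / 4.0)
                    + (y[1] + y[2]) * math.log((1.0 - theta) / 4.0)
                    + y[3] * math.log(theta / 4.0))

def posterior_mean(lower=0.25, upper=1.0, m=20000):
    # Midpoint rule for E(theta | y) = ∫ theta f(theta | y) dtheta;
    # the constant prior cancels between numerator and denominator
    h = (upper - lower) / m
    num = den = 0.0
    for i in range(m):
        t = lower + (i + 0.5) * h
        w = likelihood(t)
        num += t * w
        den += w
    return num / den

print(round(posterior_mean(), 6))
```

The result should agree closely with the U(¼, 1) entry in the comparison table that follows.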
Example 1 by C/C++
Replace the prior with other distributions, such as Beta(1,1), …, Beta(1e-5,1e-5).
Beta Prior
Comparison for Example 1 (1)
Method                 Estimate θ̂     Method                        Estimate θ̂
MME                    0.683616       Bayesian, Beta(2,3)           0.564731
MLE                    0.663165       Bayesian, Beta(3,2)           0.577575
Bayesian, U(¼,1)       0.573931       Bayesian, Beta(½,½)           0.574928
Bayesian, Beta(1,1)    0.573918       Bayesian, Beta(10^-5,10^-5)   0.588925
Bayesian, Beta(2,2)    0.572103       Bayesian, Beta(10^-7,10^-7)   shown below
Comparison for Example 1 (2)
Method                       Estimate θ̂      Method                        Estimate θ̂
Bayesian, Beta(10,10)        0.559905        Bayesian, Beta(10^-7,10^-7)   0.193891
Bayesian, Beta(10^2,10^2)    0.520366        Bayesian, Beta(10^-7,10^-7)   0.400567
Bayesian, Beta(10^4,10^4)    0.500273        Bayesian, Beta(10^-7,10^-7)   0.737646
Bayesian, Beta(10^5,10^5)    0.500027        Bayesian, Beta(10^-7,10^-7)   0.641388
Bayesian, Beta(10^n,10^n)    → 0.5 as n→∞    Bayesian, Beta(10^-7,10^-7)   not stationary
Part 9
Gibbs Sampling Strategy
Sampling Strategy (1)
Strategy I:
Run one chain for a long time.
After some "burn-in" period, sample points every fixed number of steps.
(Diagram: burn-in period, then N samples from one chain.)
The Gibbs sampling code example in the previous lecture uses sampling strategy I.
http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
Sampling Strategy (2)
Strategy II:
Run the chain N times, each run for M steps.
Each run starts from a different state point.
Return the last state of each run.
(Diagram: each of the N chains has its own burn-in; the N samples are the last states of the chains.)
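Strategy II can be sketched generically (Python here; the slides' own listings are in R and C/C++ and are not reproduced in this transcript). The chain's transition kernel is passed in as a function; the Gaussian random-walk step used below is only a placeholder kernel, not a real MCMC sampler:

```python
import random

def strategy_two(step, init, n_chains, m_steps):
    # Run n_chains independent chains for m_steps each and
    # return only the last state of each chain.
    samples = []
    for _ in range(n_chains):
        state = init()              # each run starts from a different point
        for _ in range(m_steps):
            state = step(state)     # burn-in happens inside every run
        samples.append(state)
    return samples

# Placeholder kernel for illustration: a Gaussian random walk
samples = strategy_two(step=lambda x: x + random.gauss(0.0, 1.0),
                       init=lambda: random.uniform(-1.0, 1.0),
                       n_chains=100, m_steps=50)
```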
Sampling Strategy (3)
Strategy II by R:
Sampling Strategy (4)
Strategy II by C/C++:
Strategy Comparison
Strategy I:
Performs "burn-in" only once, which saves time.
Samples might be correlated (although only weakly).
Strategy II:
Better chance of "covering" the state space, especially if the chain is slow to reach stationarity.
Must perform "burn-in" steps for each chain, which takes more time.
Hybrid Strategies (1)
Run several chains and sample a few points from each.
This combines the benefits of both strategies.
(Diagram: each chain has its own burn-in; N samples are taken from each chain.)
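The hybrid scheme can be sketched the same way (Python, with a placeholder random-walk kernel; the function and parameter names are illustrative, not from the slides). Here each chain is burned in once and then contributes several thinned samples:

```python
import random

def hybrid_sampling(step, init, n_chains, burn_in, n_per_chain, thin):
    # Run several chains; after each chain's burn-in, keep n_per_chain
    # states, thinning by `thin` steps between kept states.
    samples = []
    for _ in range(n_chains):
        state = init()
        for _ in range(burn_in):          # discard burn-in states
            state = step(state)
        for _ in range(n_per_chain):
            for _ in range(thin):         # skip steps to reduce correlation
                state = step(state)
            samples.append(state)
    return samples

# Placeholder kernel for illustration: a Gaussian random walk
samples = hybrid_sampling(step=lambda x: x + random.gauss(0.0, 1.0),
                          init=lambda: 0.0,
                          n_chains=4, burn_in=100, n_per_chain=25, thin=5)
```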
Hybrid Strategies (2)
Hybrid Strategy by R:
Hybrid Strategies (3)
Hybrid Strategy by C/C++:
Part 10
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
Another kind of MCMC method.
The Metropolis-Hastings algorithm can draw samples from any probability distribution π(x), requiring only that a function proportional to the density can be calculated at x.
Process in three steps:
Set up a Markov chain;
Run the chain until stationary;
Estimate with Monte Carlo methods.
http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let $\pi = (\pi_1, \ldots, \pi_n)$ be a probability density (or mass) function (pdf or pmf).
$f(\cdot)$ is any function, and we want to estimate
$$ I = E_\pi(f) = \sum_{i=1}^{n} f(i)\, \pi_i. $$
Construct $P = \{P_{ij}\}$, the transition matrix of an irreducible Markov chain with states $1, 2, \ldots, n$, where $P_{ij} = \Pr\{X_{t+1} = j \mid X_t = i\}$, $X_t \in \{1, 2, \ldots, n\}$, and $\pi$ is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times t = 1, …, N and calculate the Monte Carlo sum
$$ \hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t); $$
then $\hat{I} \to I$ as $N \to \infty$.
Sheldon M. Ross (1997). Proposition 4.3, Introduction to Probability Models, 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution π, we must construct a Markov chain transition matrix P with π as its stationary distribution, i.e., πP = π.
Consider a matrix P made to satisfy the reversibility condition that for all i and j,
$$ \pi_i P_{ij} = \pi_j P_{ji}. $$
This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all j, and hence π is a stationary distribution for P.
Metropolis-Hastings Algorithm (5)
Let a proposal $Q = \{Q_{ij}\}$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and the range of Q is equal to the range of π.
But π does not have to be a stationary distribution of Q.
Process: tweak $Q_{ij}$ to yield π.
(Diagram: states from $Q_{ij}$ do not follow π; after the tweak, states from $P_{ij}$ follow π.)
Metropolis-Hastings Algorithm (6)
We assume that $P_{ij}$ has the form
$$ P_{ij} = Q_{ij}\, \alpha(i, j) \quad (i \ne j), \qquad P_{ii} = 1 - \sum_{j \ne i} P_{ij}, $$
where $\alpha(i, j)$ is called the acceptance probability; i.e., given $X_t = i$, take
$X_{t+1} = j$ with probability $\alpha(i, j)$,
$X_{t+1} = i$ with probability $1 - \alpha(i, j)$.
Metropolis-Hastings Algorithm (7)
For $i \ne j$, the condition $\pi_i P_{ij} = \pi_j P_{ji}$ becomes
$$ \pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i). \quad (*) $$
WLOG, suppose for some (i, j) that $\pi_i Q_{ij} \ge \pi_j Q_{ji}$.
In order to achieve equality (*), one can introduce a probability $\alpha(i, j) \le 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.
Metropolis-Hastings Algorithm (8)
Then
$$ \pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i) = \pi_j Q_{ji} \implies \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}. $$
These arguments imply that the acceptance probability $\alpha(i, j)$ must be
$$ \alpha(i, j) = \min\left(1,\ \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right). $$
Metropolis-Hastings Algorithm (9)
M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix Q with transition probabilities $Q_{ij}$.
Step 2: Let t = 0 and initialize $X_0$ from the states of Q.
Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{i\cdot}$.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number U from Uniform(0,1).
If $U \le \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.
Step 5: t = t + 1; repeat Steps 3–5 until convergence.
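Steps 1–5 can be sketched on a small discrete state space (Python here; the four-state target below is hypothetical, chosen so the stationary frequencies are easy to check). With a uniform proposal, $Q_{ij} = Q_{ji}$, so the acceptance probability reduces to $\min(1, \pi_j/\pi_i)$:

```python
import random

def metropolis_hastings(pi, states, n_iter):
    # Discrete M-H with a symmetric uniform proposal Q
    # (Q_ij = 1/len(states)), so alpha = min(1, pi_j / pi_i)
    x = random.choice(states)            # Step 2: initialize X_0
    chain = []
    for _ in range(n_iter):
        y = random.choice(states)        # Step 3: propose Y from Q
        alpha = min(1.0, pi[y] / pi[x])  # acceptance probability
        if random.random() <= alpha:     # Step 4: accept or reject
            x = y
        chain.append(x)
    return chain

# Hypothetical unnormalized target on four states
pi = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
chain = metropolis_hastings(pi, states=[0, 1, 2, 3], n_iter=50000)
freq = [chain.count(s) / len(chain) for s in (0, 1, 2, 3)]
```

The empirical frequencies in `freq` should approach 0.1, 0.2, 0.3, 0.4, the normalized target.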
Metropolis-Hastings Algorithm (11)
An example of Steps 3–5:
1. Sample $Y_1$ from $Q_{X_0,\cdot}$ and compute $\alpha(X_0, Y_1) = \min\left(1,\ \frac{\pi(Y_1)\, Q_{Y_1, X_0}}{\pi(X_0)\, Q_{X_0, Y_1}}\right)$: accepted, so $X_1 = Y_1$.
2. Sample $Y_2$ from $Q_{X_1,\cdot}$ and compute $\alpha(X_1, Y_2) = \min\left(1,\ \frac{\pi(Y_2)\, Q_{Y_2, X_1}}{\pi(X_1)\, Q_{X_1, Y_2}}\right)$: not accepted, so $X_2 = X_1 = Y_1$.
3. Sample $Y_3$ from $Q_{X_2,\cdot}$ and compute $\alpha(X_2, Y_3) = \min\left(1,\ \frac{\pi(Y_3)\, Q_{Y_3, X_2}}{\pi(X_2)\, Q_{X_2, Y_3}}\right)$: accepted, so $X_3 = Y_3$.
And so on.
(Diagram: proposals $Y_1, Y_2, \ldots, Y_N$ drawn from $Q_{ij}$ are "tweaked" through the acceptance step into the chain $X_1, X_2, \ldots, X_N$, which follows $P_{ij}$.)
Metropolis-Hastings Algorithm (12)
We may define a "rejection rate" as the proportion of times t for which $X_{t+1} = X_t$. Clearly, in choosing Q, high rejection rates are to be avoided.
Example: if a proposal Y falls where π is low, then $\frac{\pi(Y)}{\pi(X_t)}$ will be small and it is likely that $X_{t+1} = X_t$, so more repetitions of Steps 3–5 are needed.
(Diagram: a density π with the current state $X_t$ near the mode and the proposal $Y$ in a low-density region.)
Example (1)
Simulate a bivariate normal distribution:
$$ X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_2\left( \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \right), $$
i.e.,
$$ \pi(X) = \frac{\exp\left( -\tfrac{1}{2} (X-\mu)^{T} \Sigma^{-1} (X-\mu) \right)}{2\pi\, |\Sigma|^{1/2}}. $$
Example (2)
Metropolis-Hastings algorithm:
1. Initialize $X_0$; set i = 0.
2. Generate $U_1 \sim U(-1, 1)$ and $U_2 \sim U(-1, 1)$ such that $U_1$ and $U_2$ are independent; then $U_i = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$.
3. $Y_i = X_i + U_i$.
4. $X_{i+1} = Y_i$ w.p. $\alpha(X_i, Y_i) = \min\left\{1,\ \frac{\pi(Y_i)}{\pi(X_i)}\right\}$;
   $X_{i+1} = X_i$ w.p. $1 - \alpha(X_i, Y_i)$.
5. i = i + 1; repeat steps 2–4 until convergence.
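A sketch of this sampler in Python (the slides' versions are in R and C; the parameter values μ = (3, 7), unit variances, and correlation 0.5 are assumptions chosen only for illustration):

```python
import math
import random

def make_log_density(mu, sigma):
    # Log of the bivariate normal density pi(X), up to an additive
    # constant; the 2x2 covariance matrix is inverted by hand.
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    def log_pi(x):
        dx = (x[0] - mu[0], x[1] - mu[1])
        return -0.5 * (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
                       + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return log_pi

def mh_bivariate(log_pi, x0, n_iter):
    # Steps 2-5: propose Y = X + U with U1, U2 ~ U(-1, 1) independent,
    # accept with probability min{1, pi(Y)/pi(X)} (done in log scale)
    x = x0
    chain = []
    for _ in range(n_iter):
        y = (x[0] + random.uniform(-1.0, 1.0),
             x[1] + random.uniform(-1.0, 1.0))
        if random.random() <= math.exp(min(0.0, log_pi(y) - log_pi(x))):
            x = y
        chain.append(x)
    return chain

# Assumed parameters (not from the slides), for illustration only
log_pi = make_log_density(mu=(3.0, 7.0), sigma=((1.0, 0.5), (0.5, 1.0)))
chain = mh_bivariate(log_pi, x0=(3.0, 7.0), n_iter=20000)
```

Sample means computed from the chain (after discarding an initial stretch) should approach the assumed μ.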
Example of M-H Algorithm by R
Example of M-H Algorithm by C (1)
Example of M-H Algorithm by C (2)
Example of M-H Algorithm by C (3)
A Figure to Check Simulation Results
Black points are simulated samples; colored points show the probability density.
(Scatter plot with $X_1$ on the horizontal axis, ranging from 0 to 6, and $X_2$ on the vertical axis, ranging from 4 to 10.)
Exercises
Write your own programs similar to those
examples presented in this talk, including
Example 1 in Genetics and other
examples.
Write programs for the examples mentioned on the reference web pages.
Write programs for the other examples
that you know.