Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 8
More Examples of Gibbs Sampling
An Example with Three Random Variables (1)

To sample (X, Y, N) ~ f(x, y, n), where

    f(x, y, n) = c \binom{n}{x} y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \frac{e^{-\lambda} \lambda^{n}}{n!},

x = 0, 1, 2, ..., n; 0 ≤ y ≤ 1; n = 0, 1, 2, ...; α, β, λ are known; and c is a normalizing constant.
An Example with Three Random Variables (2)

One can see that

    f(x | y, n) \propto \binom{n}{x} y^{x} (1-y)^{n-x} \sim \mathrm{Binomial}(n, y),

    f(y | x, n) \propto y^{x+\alpha-1} (1-y)^{n-x+\beta-1} \sim \mathrm{Beta}(x+\alpha,\ n-x+\beta),

    f(n-x | x, y) \propto \frac{e^{-(1-y)\lambda} \left[(1-y)\lambda\right]^{n-x}}{(n-x)!} \sim \mathrm{Poisson}((1-y)\lambda).
An Example with Three Random Variables (3)

Gibbs sampling algorithm:
1. Initial setting: t = 0,
   y_0 ~ Uniform[0, 1] or an arbitrary value in [0, 1],
   n_0 ~ Discrete Uniform[1, ∞) or an arbitrary integer value in [1, ∞),
   x_0 ~ Binomial(n_0, y_0).
2. Sample a new value (y_{t+1}, n_{t+1}, x_{t+1}) from
   y_{t+1} ~ Beta(x_t + α, n_t - x_t + β),
   n_{t+1} ~ x_t + Poisson((1 - y_{t+1}) λ),
   x_{t+1} ~ Binomial(n_{t+1}, y_{t+1}).
3. t = t + 1; repeat step 2 until convergence.
An Example with Three Random Variables by R

10,000 samples with α = 2, β = 7, and λ = 16.
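The R code on the original slide is shown as an image and is not reproduced in this transcript. The following is a minimal R sketch of the sampler above, assuming 10,000 kept draws and the parameter values on the slide; variable names are illustrative.

alpha <- 2; beta <- 7; lambda <- 16
n.sim <- 10000

x <- numeric(n.sim); y <- numeric(n.sim); n <- numeric(n.sim)
y[1] <- runif(1)                      # y0 ~ Uniform[0, 1]
n[1] <- 1 + rpois(1, lambda)          # an arbitrary starting integer >= 1
x[1] <- rbinom(1, n[1], y[1])         # x0 ~ Binomial(n0, y0)

for (t in 2:n.sim) {
  y[t] <- rbeta(1, x[t - 1] + alpha, n[t - 1] - x[t - 1] + beta)
  n[t] <- x[t - 1] + rpois(1, (1 - y[t]) * lambda)
  x[t] <- rbinom(1, n[t], y[t])
}

c(mean(x), mean(y), mean(n))          # sample means; histograms of x, y, n
                                      # approximate the marginal distributions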
An Example with Three Random Variables by C (1)

10,000 samples with α = 2, β = 7, and λ = 16.
An Example with Three Random Variables by C (2)
An Example with Three Random Variables by C (3)
Example 1 in Genetics (1)

Two linked loci with alleles A and a, and B and b:
  A, B: dominant
  a, b: recessive
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Figure: gamete formation from the double heterozygote in the female (F) and male (M) parents; r denotes the male recombination fraction and r' the female recombination fraction.]
Example 1 in Genetics (2)

r and r' are the recombination rates for males and females, respectively.
Suppose the parental origin of this heterozygote is the mating AABB × aabb. The problem is to estimate r and r' from the offspring of selfed heterozygotes.

Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (3)

                          MALE
FEMALE         AB (1-r)/2           ab (1-r)/2           aB r/2             Ab r/2
AB (1-r')/2    AABB (1-r)(1-r')/4   aABb (1-r)(1-r')/4   aABB r(1-r')/4     AABb r(1-r')/4
ab (1-r')/2    AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4     Aabb r(1-r')/4
aB r'/2        AaBB (1-r)r'/4       aabB (1-r)r'/4       aaBB rr'/4         AabB rr'/4
Ab r'/2        AABb (1-r)r'/4       aAbb (1-r)r'/4       aABb rr'/4         AAbb rr'/4
Example 1 in Genetics (4)

Four distinct phenotypes: A*B*, A*b*, a*B*, and a*b*.
  A*: the dominant phenotype from (Aa, AA, aA).
  a*: the recessive phenotype from aa.
  B*: the dominant phenotype from (Bb, BB, bB).
  b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (5)

Let θ = (1-r)(1-r'). Then

    P(A*B*) = \frac{2+\theta}{4},
    P(A*b*) = P(a*B*) = \frac{1-\theta}{4},
    P(a*b*) = \frac{\theta}{4}.
Example 1 in Genetics (6)

Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:

    \mathrm{Multinomial}\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right).

We know that θ = (1-r)(1-r'), 0 ≤ r ≤ 1/2, and 0 ≤ r' ≤ 1/2. So 1/4 ≤ θ ≤ 1.
Example 1 in Genetics (7)

Suppose that we observe the data y = (y1, y2, y3, y4) = (125, 18, 20, 24), which is a random sample from

    \mathrm{Multinomial}\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right).

Then the probability mass function is

    f(y \mid \theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2+y_3} \left(\frac{\theta}{4}\right)^{y_4}.
Example 1 in Genetics (8)

How to estimate θ?
  MME (shown in last week's lecture):
  MLE (shown in last week's lecture):
  Bayesian method:

http://en.wikipedia.org/wiki/Bayesian_method
http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29
http://en.wikipedia.org/wiki/Maximum_likelihood
Example 1 in Genetics (9)

As the value of θ is between 1/4 and 1, we can assume that the prior distribution of θ is Uniform(1/4, 1).
The posterior distribution is

    f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}.

The integral in the denominator, \int f(y \mid \theta) f(\theta)\, d\theta, does not have a closed form.
Example 1 in Genetics (10)

We will consider the mean of the posterior distribution (the posterior mean),

    E(\theta \mid y) = \int \theta\, f(\theta \mid y)\, d\theta.

The Markov chain Monte Carlo (MCMC) method is a good way to estimate E(θ | y) even if \int f(y \mid \theta) f(\theta)\, d\theta and the posterior mean do not have closed forms.
Example 1 by R

Direct numerical integration when θ ~ Uniform(1/4, 1).
We can assume other prior distributions to compare the resulting posterior means:
Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(0.5,0.5), Beta(10^-5, 10^-5).
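The slide's R code is shown as an image. Below is a minimal sketch of the direct numerical integration, assuming the observed counts y = (125, 18, 20, 24) and the Uniform(1/4, 1) prior described above; function and variable names are illustrative.

y <- c(125, 18, 20, 24)

# Likelihood up to a constant; the multinomial coefficient cancels in the ratio.
lik   <- function(th) ((2 + th) / 4)^y[1] * ((1 - th) / 4)^(y[2] + y[3]) * (th / 4)^y[4]
prior <- function(th) dunif(th, min = 1/4, max = 1)

numerator   <- integrate(function(th) th * lik(th) * prior(th), lower = 1/4, upper = 1)$value
denominator <- integrate(function(th) lik(th) * prior(th),      lower = 1/4, upper = 1)$value
numerator / denominator   # posterior mean E(theta | y)

With these data the ratio comes out at roughly 0.574, in line with the Bayesian U(1/4, 1) entry in the comparison table below; replacing `prior` (and the integration limits) with a Beta density on (0, 1) gives the other rows.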
Example 1 by C/C++

Replace the prior distribution with the other choices, such as Beta(1,1), ..., Beta(1e-5, 1e-5).

Beta Prior
Comparison for Example 1 (1)

Method                     Estimate of θ      Method                          Estimate of θ
MME                        0.683616           Bayesian, Beta(2,3)             0.564731
MLE                        0.663165           Bayesian, Beta(3,2)             0.577575
Bayesian, U(1/4,1)         0.573931           Bayesian, Beta(1/2,1/2)         0.574928
Bayesian, Beta(1,1)        0.573918           Bayesian, Beta(10^-5,10^-5)     0.588925
Bayesian, Beta(2,2)        0.572103           Bayesian, Beta(10^-7,10^-7)     (shown below)
Comparison for Example 1 (2)

Method                        Estimate of θ        Method                          Estimate of θ
Bayesian, Beta(10,10)         0.559905             Bayesian, Beta(10^-7,10^-7)     0.193891
Bayesian, Beta(10^2,10^2)     0.520366             Bayesian, Beta(10^-7,10^-7)     0.400567
Bayesian, Beta(10^4,10^4)     0.500273             Bayesian, Beta(10^-7,10^-7)     0.737646
Bayesian, Beta(10^5,10^5)     0.500027             Bayesian, Beta(10^-7,10^-7)     0.641388
Bayesian, Beta(10^n,10^n)     → 0.5 as n → ∞       Bayesian, Beta(10^-7,10^-7)     not stationary
Part 9
Gibbs Sampling Strategy
Sampling Strategy (1)

Strategy I:
  Run one chain for a long time.
  After some "burn-in" period, sample points every fixed number of steps.
  [Figure: a burn-in period, then N samples taken from the one chain.]
  The Gibbs sampling code example in the previous lecture uses sampling Strategy I.

http://www.cs.technion.ac.il/~cs236372/tirgul09.ps
Sampling Strategy (2)

Strategy II:
  Run the chain N times, each run for M steps.
  Each run starts from a different state point.
  Return the last state of each run.
  [Figure: N chains, each with a burn-in; the N samples are the last state of each chain.]
Sampling Strategy (3)

Strategy II by R:
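The slide's R code is an image. Below is a minimal sketch of Strategy II applied to the (X, Y, N) Gibbs sampler of Part 8, assuming the same α = 2, β = 7, λ = 16; the number of chains and the run length M are illustrative choices.

alpha <- 2; beta <- 7; lambda <- 16
M <- 200          # steps per chain (treated as the burn-in length)
n.chains <- 1000  # number of chains = number of returned samples

run.one.chain <- function() {
  y <- runif(1)                      # each run starts from a different state
  n <- 1 + rpois(1, lambda)
  x <- rbinom(1, n, y)
  for (t in 1:M) {                   # run for M steps
    y <- rbeta(1, x + alpha, n - x + beta)
    n <- x + rpois(1, (1 - y) * lambda)
    x <- rbinom(1, n, y)
  }
  c(x = x, y = y, n = n)             # return only the last state
}

samples <- t(replicate(n.chains, run.one.chain()))
colMeans(samples)                    # Monte Carlo estimates from the N samples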
Sampling Strategy (4)

Strategy II by C/C++:
Strategy Comparison

Strategy I:
  Performs "burn-in" only once, which saves time.
  Samples might be correlated (although only weakly).
Strategy II:
  Better chance of "covering" the state space, especially if the chain is slow to reach stationarity.
  Must perform the "burn-in" steps for each chain, which takes more time.
Hybrid Strategies (1)

Run several chains and sample a few points from each.
Combines the benefits of both strategies.
[Figure: several chains, each with a burn-in period followed by N samples.]
Hybrid Strategies (2)

Hybrid Strategy by R:
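Again the slide's code is an image. Below is a minimal sketch of the hybrid strategy for the same (X, Y, N) Gibbs sampler: several chains, a burn-in for each, and several (here thinned) samples kept per chain. The chain count, burn-in length, and thinning interval are illustrative choices.

alpha <- 2; beta <- 7; lambda <- 16
n.chains <- 10; burn.in <- 200; n.keep <- 1000; thin <- 5

run.chain <- function(n.steps) {
  y <- runif(1); n <- 1 + rpois(1, lambda); x <- rbinom(1, n, y)
  out <- matrix(NA, nrow = n.steps, ncol = 3, dimnames = list(NULL, c("x", "y", "n")))
  for (t in 1:n.steps) {
    y <- rbeta(1, x + alpha, n - x + beta)
    n <- x + rpois(1, (1 - y) * lambda)
    x <- rbinom(1, n, y)
    out[t, ] <- c(x, y, n)
  }
  out
}

keep <- seq(burn.in + thin, burn.in + n.keep * thin, by = thin)   # drop burn-in, then thin
samples <- do.call(rbind,
                   lapply(1:n.chains, function(k) run.chain(burn.in + n.keep * thin)[keep, ]))
colMeans(samples)   # pooled estimates over all chains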
Hybrid Strategies (3)

Hybrid Strategy by C/C++:
Part 10
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)

Another kind of MCMC method.
The Metropolis-Hastings algorithm can draw samples from any probability distribution π(x), requiring only that a function proportional to the density can be calculated at x.
The process has three steps:
  Set up a Markov chain;
  Run the chain until it is stationary;
  Estimate with Monte Carlo methods.

http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)

Let π = (π_1, ..., π_n) be a probability density (or mass) function (pdf or pmf).
f(·) is any function and we want to estimate

    I = E_\pi(f) = \sum_{i=1}^{n} f(i)\, \pi_i.

Construct P = {P_ij}, the transition matrix of an irreducible Markov chain with states 1, 2, ..., n, where P_ij = Pr{X_{t+1} = j | X_t = i}, X_t ∈ {1, 2, ..., n}, and π is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)

Run this Markov chain for times t = 1, ..., N and calculate the Monte Carlo sum

    \hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t);

then \hat{I} \to I as N \to \infty.

Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models, 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
Metropolis-Hastings Algorithm (4)

In order to perform this method for a given distribution π, we must construct a Markov chain transition matrix P with π as its stationary distribution, i.e. πP = π.
Consider a matrix P made to satisfy the reversibility condition that for all i and j,

    \pi_i P_{ij} = \pi_j P_{ji}.

This property ensures that \sum_i \pi_i P_{ij} = \pi_j for all j (since \sum_i \pi_i P_{ij} = \sum_i \pi_j P_{ji} = \pi_j), and hence π is a stationary distribution for P.
Metropolis-Hastings Algorithm (5)

Let a proposal Q = {Q_ij} be irreducible, where Q_ij = Pr(X_{t+1} = j | X_t = i) and the range of Q is equal to the range of π.
But π does not have to be a stationary distribution of Q.
Process: tweak Q_ij to yield π.
[Diagram: states generated from Q_ij (not distributed as π) are "tweaked" into states generated from P_ij (distributed as π).]
Metropolis-Hastings Algorithm (6)

We assume that P_ij has the form

    P_{ij} = Q_{ij}\, \alpha(i, j) \quad (i \ne j),
    P_{ii} = 1 - \sum_{j \ne i} P_{ij},

where α(i, j) is called the acceptance probability, i.e. given X_t = i, take

    X_{t+1} = j with probability \alpha(i, j),
    X_{t+1} = i with probability 1 - \alpha(i, j).
Metropolis-Hastings Algorithm (7)

For i ≠ j,

    \pi_i P_{ij} = \pi_j P_{ji}
    \iff \pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i).    (*)

WLOG, for some (i, j), \pi_i Q_{ij} > \pi_j Q_{ji}.
In order to achieve equality (*), one can introduce a probability α(i, j) < 1 on the left-hand side and set α(j, i) = 1 on the right-hand side.
Metropolis-Hastings Algorithm (8)

Then

    \pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i) = \pi_j Q_{ji}
    \Rightarrow \alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}.

These arguments imply that the acceptance probability α(i, j) must be

    \alpha(i, j) = \min\left(1,\ \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right).
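One special case, added here as a note: when the proposal is symmetric, Q_ij = Q_ji, the proposal terms cancel and the acceptance probability reduces to the Metropolis form

    \alpha(i, j) = \min\left(1,\ \frac{\pi_j}{\pi_i}\right),

which is exactly the ratio used in the bivariate normal example later in this part.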
Metropolis-Hastings Algorithm (9)

M-H Algorithm:
Step 1: Choose an irreducible Markov chain transition matrix Q with transition probabilities Q_ij.
Step 2: Let t = 0 and initialize X_0 from the states of Q.
Step 3 (Proposal Step): Given X_t = i, sample Y = j from Q_{i,·} (the i-th row of Q).
Metropolis-Hastings Algorithm (10)

M-H Algorithm (cont.):
Step 4 (Acceptance Step): Generate a random number U from Uniform(0, 1).
    If U ≤ α(i, j), set X_{t+1} = Y = j; else X_{t+1} = X_t = i.
Step 5: t = t + 1; repeat Steps 3-5 until convergence.
Metropolis-Hastings Algorithm (11)

An example of Steps 3-5:
1. Y_1 from Q_{X_0,·} and \alpha(X_0, Y_1) = \min\left(1,\ \frac{\pi(Y_1) Q_{Y_1, X_0}}{\pi(X_0) Q_{X_0, Y_1}}\right); accepted, so X_1 = Y_1.
2. Y_2 from Q_{X_1,·} and \alpha(X_1, Y_2) = \min\left(1,\ \frac{\pi(Y_2) Q_{Y_2, X_1}}{\pi(X_1) Q_{X_1, Y_2}}\right); not accepted, so X_2 = X_1 = Y_1.
3. Y_3 from Q_{X_2,·} and \alpha(X_2, Y_3) = \min\left(1,\ \frac{\pi(Y_3) Q_{Y_3, X_2}}{\pi(X_2) Q_{X_2, Y_3}}\right); accepted, so X_3 = Y_3.
...and so on.
[Diagram: the proposals Y_1, Y_2, Y_3, ..., Y_N from Q_ij are "tweaked" into the chain states X_1 = Y_1, X_2 = Y_1, X_3 = Y_3, ..., X_N.]
Metropolis-Hastings Algorithm (12)

We may define a "rejection rate" as the proportion of times t for which X_{t+1} = X_t. Clearly, in choosing Q, high rejection rates are to be avoided.
Example: if the proposed Y falls where π is much smaller than at the current state X_t, then π(Y)/π(X_t) will be small, it is likely that X_{t+1} = X_t, and more repetitions of Steps 3-5 are needed.
[Figure: a density π with the current state X_t near its mode and the proposal Y in a low-density region.]
Example (1)

Simulate a bivariate normal distribution:

    X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_2\left(\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\right),

i.e.

    \pi(X) = \frac{\exp\left(-\frac{1}{2}(X-\mu)^{T} \Sigma^{-1} (X-\mu)\right)}{2\pi\, |\Sigma|^{1/2}}.
Example (2)

Metropolis-Hastings Algorithm:
1. X_0 = μ, i = 0.
2. Generate independent U_1 ~ U(-1, 1) and U_2 ~ U(-1, 1), and set U_i = (U_1, U_2)^T.
3. Y_i = X_i + U_i.
4. Set
    X_{i+1} = Y_i with probability \alpha(X_i, Y_i) = \min\left\{1,\ \frac{\pi(Y_i)}{\pi(X_i)}\right\},
    X_{i+1} = X_i with probability 1 - \alpha(X_i, Y_i).
5. i = i + 1; repeat Steps 2-4 until convergence.
Example of M-H Algorithm by R
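The slide's R code is an image. Below is a minimal sketch of the algorithm above; the mean vector, covariance matrix, and number of iterations are illustrative choices, not the values used on the original slide.

mu    <- c(3, 7)
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
Sigma.inv <- solve(Sigma)

# Log target density up to a constant; the constant cancels in the acceptance ratio.
log.pi <- function(x) -0.5 * t(x - mu) %*% Sigma.inv %*% (x - mu)

n.iter  <- 10000
samples <- matrix(NA, nrow = n.iter, ncol = 2)
x <- mu                                    # Step 1: start at the mean
for (i in 1:n.iter) {
  y <- x + runif(2, min = -1, max = 1)     # Steps 2-3: uniform random-walk proposal
  if (log(runif(1)) <= log.pi(y) - log.pi(x)) x <- y   # Step 4: accept or reject
  samples[i, ] <- x
}
colMeans(samples)                          # should approach mu after convergence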
Example of M-H Algorithm by C (1)
Example of M-H Algorithm by C (2)
Example of M-H Algorithm by C (3)
A Figure to Check Simulation Results

Black points are the simulated samples; the colored points show the probability density.
[Figure: scatter plot of the simulated samples over the bivariate normal density, with axes X1 and X2.]
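A sketch of such a check, assuming the `samples`, `mu`, and `Sigma` objects from the M-H sketch above; it overlays the simulated points on contour lines of the target density computed on a grid (contours rather than the colored points of the original figure).

dens <- function(x1, x2) {
  d <- c(x1, x2) - mu
  exp(-0.5 * sum(d * (solve(Sigma) %*% d))) / (2 * pi * sqrt(det(Sigma)))
}
g1 <- seq(min(samples[, 1]), max(samples[, 1]), length.out = 100)
g2 <- seq(min(samples[, 2]), max(samples[, 2]), length.out = 100)
z  <- outer(g1, g2, Vectorize(dens))
plot(samples, pch = 20, cex = 0.3, xlab = "X1", ylab = "X2")   # black sample points
contour(g1, g2, z, add = TRUE, col = "red")                    # density contours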
Exercises

Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned on the reference web pages.
Write programs for other examples that you know.