Bayesian methods with Monte Carlo Markov Chains II

Download Report

Transcript Bayesian methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte
Carlo Markov Chains II
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
1
Part 3
An Example in Genetics
2
Example 1 in Genetics (1)

Two linked loci with alleles A and a, and B
and b



A, B: dominant
a, b: recessive
A double heterozygote AaBb will produce
gametes of four types: AB, Ab, aB, ab
A
A
A
a
B
A
a
B
b
a
B
b
b
b
a
B
F (Female)
1- r’
r’ (female recombination fraction)
M (Male)
1-r
r (male recombination fraction)
3
Example 1 in Genetics (2)
r and r’ are the recombination rates for
male and female
 Suppose the parental origin of these
heterozygote is from the mating
of A A B B  a a b b . The problem is to estimate r
and r’ from the offspring of selfed
heterozygotes.


Fisher, R. A. and Balmukand, B. (1928). The
estimation of linkage from the offspring of
selfed heterozygotes. Journal of Genetics, 20,
79–92.

http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/b
4
ank/handout12.pdf
Example 1 in Genetics (3)
MALE
F
E
M
A
L
E
AB
(1-r)/2
ab
(1-r)/2
aB
r/2
Ab
r/2
AB
(1-r’)/2
AABB
(1-r) (1-r’)/4
aABb
(1-r) (1-r’)/4
aABB
r (1-r’)/4
AABb
r (1-r’)/4
ab
(1-r’)/2
AaBb
(1-r) (1-r’)/4
aabb
(1-r) (1-r’)/4
aaBb
r (1-r’)/4
Aabb
r (1-r’)/4
aB
r’/2
AaBB
(1-r) r’/4
aabB
(1-r) r’/4
aaBB
r r’/4
AabB
r r’/4
Ab
r’/2
AABb
(1-r) r’/4
aAbb
(1-r) r’/4
aABb
r r’/4
AAbb
r r’/4
5
Example 1 in Genetics (4)










Four distinct phenotypes: A*B*, A*b*, a*B* and
a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b* : the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
6
Example 1 in Genetics (5)
L et   (1  r )(1  r '), th en
P ( A * B *) 
2
,
4
P ( A * b *)  P ( a * B *) 
1
,
4
P ( a * b *) 

4
.
7
Example 1 in Genetics (6)
Hence, the random sample of n from the offspring
of selfed heterozygotes will follow a multinomial
distribution:
 2  1 1  
M ultinom ial  n ;
,
,
, .
4
4
4
4

W e know that   (1  r )(1  r '), 0  r  1 / 2, and 0  r '  1 / 2.
So, 1    1 / 4.
8
Bayesian for Example 1 in Genetics (1)

To simplify computation, we let
P ( A * B *)   1 , P ( A * b *)   2
P ( a * B *)   3 , P ( a * b *)   4

The random sample of n from the offspring of
selfed heterozygotes will follow a multinomial
distribution:
M u ltin o m ia l  n ;  1 ,  2 ,  3 ,  4  .
9
Bayesian for Example 1 in Genetics (2)
If we assume a Dirichlet prior distribution
with parameters: D ( 1 ,  2 ,  3 ,  4 ) to estimate
probabilities for A*B*, A*b*,a*B* and
a*b* .
 Recall that
 A*B*: 9 gametic combinations.
 A*b*: 3 gametic combinations.
 a*B*: 3 gametic combinations.
 a*b*: 1 gametic combination.
We consider ( 1 ,  2 ,  3 ,  4 )  (9, 3, 3,1).

10
Bayesian for Example 1 in Genetics (3)
Suppose that we observe the data of y =
(y1, y2, y3, y4) = (125, 18, 20, 24).
 So the posterior distribution is also
Dirichlet with parameters
D(134, 21, 23, 25)
 The Bayesian estimation for probabilities
are: ( 1 ,  2 ,  3 ,  4 )=(0.660, 0.103, 0.113,
0.123)

11
Bayesian for Example 1 in Genetics (4)

Consider the original model,
P ( A * B *) 
2
4

, P ( A * b *)  P ( a * B *) 
1
, P ( a * b *) 
4

.
4
The random sample of n also follow a
multinomial distribution:
 2  1 1  
( y1 , y 2 , y 3 , y 4 ) ~ M ultinom ial  n ;
,
,
, .
4
4
4
4


We will assume a Beta prior distribution:
Beta ( 1 , 2 ).
12
Bayesian for Example 1 in Genetics (5)

The posterior distribution becomes
P ( | y 1 , y 2 , y 3 , y 4 ) 
P ( y 1 , y 2 , y 3 , y 4 |  ) P ( )
 P( y , y
1

2
, y 3 , y 4 |  ) P ( ) d 
.
The integration in the above denominator,
 P( y , y
1
2
, y 3 , y 4 |  ) P ( ) d  ,
does not have a close form.
13
Bayesian for Example 1 in Genetics (6)


How to solve this problem?
Monte Carlo Markov Chains (MCMC)
Method!
What value is appropriate for ( 1 , 2 ) ?
14
Part 4
Monte Carlo Methods
15
Monte Carlo Methods (1)
Consider the game of
solitaire: what’s the
chance of winning
with a properly
shuffled deck?
 http://en.wikipedia.or
g/wiki/Monte_Carlo_
method
 http://nlp.stanford.ed
u/local/talks/mcmc_2
004_07_01.ppt

?
Lose
Lose
Win
Lose
Chance of winning is 1 in 4!
16
Monte Carlo Methods (2)
Hard to compute analytically because
winning or losing depends on a complex
procedure of reorganizing cards
 Insight: why not just play a few hands,
and see empirically how many do in fact
win?
 More generally, can approximate a
probability density function using only
samples from that density

17
Monte Carlo Methods (3)
Given a very large set X and a distribution
f(x) over it.
 We draw a set of N i.i.d. random samples.
 We can then approximate the distribution
using these samples.

f(x)
X
fN ( x) 
1
N
N
 1( x
i 1
(i)
 x)  f ( x)
N 
18
Monte Carlo Methods (4)

We can also use these samples to
compute expectations:
EN (g ) 

1
N
N

i 1
g(x
(i)
)  E(g) 
N 
 g ( x) f (x)
x
And even use them to find a maximum:
(i )
xˆ  arg m ax[ f ( x )]
x
(i)
19
Monte Carlo Example
 X1,

, X n be i.i.d. N (0,1)
Solution:
E ( X i )= 
4

, find
4
E ( X i )= ?

-
x
4
1
2
exp( 
x
2
2
4!
) dx 
2
4/2
4
( )!
2

24
3
8
Use Monte Carlo method to approximation
20
Part 5
Monte Carle Method vs.
Numerical Integration
21
Numerical Integration (1)

Theorem (Riemann Integral):
If f is continuous, integrable, then the Riemann
Sum:
n
Sn 

f ( ci )( x i  1  x i ), w here ci  [ x i , x i  1 ]
i 1
n 
Sn

m ax | x i 1  x i |
I 

b
f ( x ) dx , a constant
a
i 1 ,.., n

http://en.wikipedia.org/wiki/Numerical_integratio
n
http://en.wikipedia.org/wiki/Riemann_integral
22
Numerical Integration (2)

Trapezoidal Rule:
xi  a 
ba
n
x i 1  x i  h 

n
h
2
i 1
 h[
ba
f ( xi )  f ( xi  1)
n
ST 
i , equally spaced intervals
1
2
f ( x1 )  f ( x 2 )  ...  f ( x n ) 
1
2
f ( x n  1 )]
http://en.wikipedia.org/wiki/Trapezoidal_rule
23
Numerical Integration (3)

An Example of Trapezoidal Rule by R :
24
Numerical Integration (4)

Simpson’s Rule:
S pecifically,


b
f ( x ) dx 
ba
[ f (a)  4 f (
6
a
ab
)  f ( b )]
2
One derivation replaces the integrand f(x) by
the quadratic polynomial P(x) which takes the
same values as f(x) at the end points a and b
and the midpoint m = (a+b) / 2.
P ( x)  f (a )

b
a
P ( x) 
( x  m )( x  b )
( a  m )( a  b )
ba
6
[ f (a )  4 f (
 f (m )
ab
( x  a )( x  b )
( m  a )( m  b )
 f (b )
( x  a )( x  m )
( b  a )( b  m )
)  f ( b )]
2
http://en.wikipedia.org/wiki/Simpson_rule
25
Numerical Integration (5)
Example of Simpson’s Rule by R:
0
500
1000
1500

-100
-50
0
50
100
x
26
Numerical Integration (6)

Two Problems in numerical integration:
1.
(b,a)= (  ,  ) , i.e. 


f ( x )d x
How to use numerical integration?
1
Logistic transform: y 
x
1 e
2.
Two or more high dimension integration?
Monte Carle Method!
http://en.wikipedia.org/wiki/Numerical_integration
27
Monte Carlo Integration

I 

f ( x ) dx 

f ( x)
g ( x ) dx ,
g ( x)
where g(x) is a probability distribution
function and let h(x)=f(x)/g(x).
iid

If x1 , ..., x n ~ g ( x ), then
1
n
LLN
h( x ) 

n
i
i 1

n 
E ( h ( x )) 
 h ( x ) g ( x ) dx  
f ( x ) dx  I
http://en.wikipedia.org/wiki/Monte_Carlo_integration 28
Example (1)


1
 (18)
0
 (7 )  (11)
(It is
2
x (1  x ) dx  ?
10
10
 0.0350877 !)
57

Numerical Integration by Trapezoidal Rule:
29
Example (2)

Monte Carlo Integration:


 (18)
1
0
 (7)  (11)
Let

1
x
10
x ~ B eta (7,11)
 (7  11)
4
 (7)  (11)
0
1
x (1  x ) dx 
10
n
x

n
i 1
x
7 1
LLN
4
i
n 
x
0
4
 (7  11)
 (7)  (11)
x
7 1
(1  x )
11 1
, then
(1  x )
 E(x ) 
4

1

11 1
dx  E ( x )
4
1
 (18)
0
 (7 )  (11)
x (1  x ) dx
10
10
30
dx
Example by C/C++ and R
31
Part 6
Markov Chains
32
Markov Chains (1)
A Markov chain is a mathematical model
for stochastic systems whose states,
discrete or continuous, are governed by
transition probability.
 Suppose the random variable X0, X1,…
take state space (Ω) that is a countable
set of value. A Markov chain is a process
that corresponds to the network.

X0
X1
...
X t 1
Xt
X t 1
...
33
Markov Chains (2)

The current state in Markov chain only
depends on the most recent previous
states.
P ( X t  1  j | X t  i , X t  1  it  1 ,
, X 0  i0 )
 P ( X t  1  j | X t  i )  T ran sitio n P ro b ab ility
w h ere ( i0 , ..., i , j )  

http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tut
orial/Lect1_MCMC_Intro.pdf
34
An Example of Markov Chains
   (1, 2, 3, 4, 5)
X  ( X 0 , X 1 , ..., X t , ...)  
where X0 is initial state and so on.
P is transition matrix.
1
1  0.4

2 0.5

P  3  0.0

4  0.0
5  0.0
2
3
4
5
0.6
0.0
0.0
0.0
0.5
0.0
0.3
0.0
0.7
0.0
0.1
0.3
0.3
0.0
0.5
0.0 

0.0

0.0 

0.6 
0.3 
35
Definition (1)

Define the probability of going from state i
to state j in n time steps as
(n)
p ij
 P ( X tn  j | X t  i)
A state j is accessible from state i if there
(n)
p
are n time steps such that ij  0 , where
n=0,1,…
 A state i is said to communicate with
state j (denote: i  j ), if it is true that
both i is accessible from j and that j is
accessible from i.

36
Definition (2)
A state i has period d(i) if any return to
state i must occur in multiples of d(i) time
steps.
 Formally, the period of a state is defined
as
(n)
d ( i )  gcd{ n : Pii  0}


If d(i)= 1, then the state is said to be
aperiodic; otherwise (d(i)>1), the state
is said to be periodic with period d(i).
37
Definition (3)
A set of states C is a communicating
class if every pair of states in C
communicates with each other.
 Every state in a communicating class must
have the same period
 Example:

38
Definition (4)
A finite Markov chain is said to be
irreducible if its state space (Ω) is a
communicating class; this means that, in
an irreducible Markov chain, it is possible
to get to any state from any state.
 Example:

39
Definition (5)
A finite state irreducible Markov chain is
said to be ergodic if its states are
aperiodic
 Example:

40
Definition (6)
A state i is said to be transient if, given
that we start in state i, there is a non-zero
probability that we will never return back
to i.
 Formally, let the random variable Ti be the
next return time to state i (the “hitting
time”):
Ti  m in{ n : X n  i | X 0  i}


Then, state i is transient iff there exists a
finite Ti such that: P (Ti   )  1
41
Definition (7)
A state i is said to be recurrent or
persistent iff there exists a finite Ti such
that: P (Ti   )  1 .
 The mean recurrent time  i  E [Ti ] .
 State i is positive recurrent if  i is finite;
otherwise, state i is null recurrent.
 A state i is said to be ergodic if it is
aperiodic and positive recurrent. If all
states in a Markov chain are ergodic, then
the chain is said to be ergodic.

42
Stationary Distributions

Theorem: If a Markov Chain is irreducible
and aperiodic, then
(n)
ij
P


1
j
as n   ,  i , j  
Theorem: If a Markov chain is irreducible
and aperiodic, then  !  j  lim P ( X n  j )
and  j 

i
i
Pij ,

n 
j
 1,  j  
i
where  is stationary distribution.
43
Definition (7)

A Markov chain is said to be reversible, if
there is a stationary distribution  such
that
 i Pij   j P ji  i , j  

Theorem: if a Markov chain is reversible,
then
 j    i Pij
i
44
An Example of Stationary
Distributions

A Markov chain:
0.4
0.3
0.3
0.7
1

2
0.3
0.3
0.7
3
 0.7

P  0.3

 0.0
0.0 

0.3

0.7 
0.3
0.4
0.3
The stationary distribution is
1
3

1
3
 0.7
1
0.3


3
 0.0
0.3
0.4
0.3
0.0 
 1
0.3  
 3
0.7 
1
3
1
 
3
1
3
1
3 
1
3 
45
Properties of Stationary Distributions

Regardless of the starting point, the
process of irreducible and aperiodic
Markov chains will converge to a
stationary distribution.

The rate of converge depends on
properties of the transition probability.
46
Part 7
Monte Carlo Markov Chains
47
Applications of MCMC

Simulation:
 n  x   1
n  x   1
Ex: ( x , y ) ~ f ( x , y )  c   y
(1  y )
,
x
w here x  0,1, 2, ..., n , 0  y  1,  ,  are know n.

Integration: computing in high dimensions.
1
Ex:
E ( g ( y )) 


g ( y ) f ( y )dy.
0
Bayesian Inference:
Ex: Posterior distributions, posterior
means…
48
Monte Carlo Markov Chains
MCMC method are a class of algorithms
for sampling from probability distributions
based on constructing a Markov chain that
has the desired distribution as its
stationary distribution.
 The state of the chain after a large
number of steps is then used as a sample
from the desired distribution.
 http://en.wikipedia.org/wiki/MCMC

49
Inversion Method vs. MCMC (1)
Inverse transform sampling, also
known as the probability integral
transform, is a method of sampling a
number at random from any probability
distribution given its cumulative
distribution function (cdf).
 http://en.wikipedia.org/wiki/Inverse_trans
form_sampling_method

50
Inversion Method vs. MCMC (2)


A random variable with a cdf F, then F
has a uniform distribution on [0, 1].
The inverse transform sampling method
works as follows:
1.
2.
3.
Generate a random number from the
standard uniform distribution; call this u.
Compute the value x such that F(x) = u; call
this xchosen.
Take xchosen to be the random number drawn
from the distribution described by F.
51
Inversion Method vs. MCMC (3)
For one dimension random variable,
Inversion method is good, but for two or
more high dimension random variables,
Inverse Method maybe not.
 For two or more high dimension random
variables, the marginal distributions for
those random variables respectively
sometime be calculated difficult with more
time.

52
Gibbs Sampling
One kind of the MCMC methods.
 The point of Gibbs sampling is that given a
multivariate distribution it is simpler to
sample from a conditional distribution
rather than integrating over a joint
distribution.
 George Casella and Edward I. George.
"Explaining the Gibbs sampler". The
American Statistician, 46:167-174, 1992.
(Basic summary and many references.)


http://en.wikipedia.org/wiki/Gibbs_sampling
53
Example 1 (1)

To sample x from:
 n  x   1
n  x   1
( x, y ) ~ f ( x, y )  c   y
(1  y )
,
x
where x  0,1, 2, ..., n , 0  y  1, n ,  ,  are know n
c is a constant.

One can see that
f (x | y) 
f ( y | x) 
n x
nx
   y (1  y )
~ Binom ial ( n , y ),
f ( y)
x
f ( x, y )
f ( x, y )
f ( x)
 y
x   1
(1  y )
n  x   1
~ Beta ( x   , n  x   ).
54
Example 1 (2)

Gibbs sampling Algorithm:

Initial Setting:
 y 0 ~ U niform [0,1] or a arbitrary value  [0,1]

 x 0 ~ B in ( n , y 0 )

For t=0,…,n, sample a value (xt+1,yt+1) from
 y t  1 ~ B eta ( x t   , n  x t   )

 x t  1 ~ B in ( n , y t  1 )

Return (xn, yn)
55
Example 1 (3)
x , yt  y.
Under regular conditions: x t t

t 
 How many steps are needed for
convergence?


Within an acceptable error, such as
i  10

t  i 1
10

i  20
xt


t  i  11
xt
 0.001,  i 
.
10
n is large enough, such as n=1000.
56
Example 1 (4)

Inversion Method:

x is Beta-Binomial distribution.
x ~ f ( x) 


1
f ( x , y ) dy
0
 n   (   )  ( x   )  ( n  x   )
 
 (n     )
 x   ( )  (  )
The cdf of x is F(x) that has a uniform
distribution on [0, 1].
 n   (   )  ( i   )  ( n  i   )
F ( x)    
 (n     )
i  0  i   ( )  (  )
x
57
Gibbs sampling by R (1)
58
Gibbs sampling by R (2)
Inversion method
59
Gibbs sampling by R (3)
60
Inversion Method by R
61
Gibbs sampling by C/C++ (1)
62
Gibbs sampling by C/C++ (2)
Inversion method
63
Gibbs sampling by C/C++ (3)
64
Gibbs sampling by C/C++ (4)
65
Gibbs sampling by C/C++ (5)
66
Inversion Method by C/C++ (1)
)
67
Inversion Method by C/C++ (2)
68
Plot Histograms by Maple (1)

Figure 1:1000 samples with n=16, α=5
and β=7.
Blue-Inversion method
Red-Gibbs sampling
69
Plot Histograms by Maple (2)
70
Probability Histograms by Maple (1)

Figure 2:

Blue histogram and yellow line are pmf of x.
1
Red histogram is Pˆ ( X  x )   P ( X  x | Y  y ) from
m
Gibbs sampling.
m

i
i
i 1
71
Probability Histograms by Maple (2)
The probability histogram of blue
histogram of Figure 1 would be similar to
the blue probability histogram of Figure 2,
when the sample size   .
 The probability histogram of red histogram
of Figure 1 would be similar to the red
probability histogram of Figure 2, when
the iteration n   .

72
Probability Histograms by Maple (3)
73
Exercises

Write your own programs similar to those
examples presented in this talk, including
Example 1 in Genetics and other
examples.

Write programs for those examples
mentioned at the reference web pages.

Write programs for the other examples
that you know.
74