Bayesian methods with Monte Carlo Markov Chains II
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 3
An Example in Genetics
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b:
A, B: dominant
a, b: recessive
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Diagram: meiosis of the double heterozygote AaBb, producing the gametes AB, Ab, aB and ab, with each branch labeled with probability 1/2.]
Example 1 in Genetics (2)
Probabilities for genotypes in gametes:
No recombination: male 1-r, female 1-r'. Recombination: male r, female r'.

Gamete:    AB         ab         aB      Ab
Male:      (1-r)/2    (1-r)/2    r/2     r/2
Female:    (1-r')/2   (1-r')/2   r'/2    r'/2
Example 1 in Genetics (3)
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92.
More:
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (4)
Offspring genotypes and probabilities for each female x male gamete combination:

FEMALE \ MALE   AB (1-r)/2           ab (1-r)/2           aB r/2             Ab r/2
AB (1-r')/2     AABB (1-r)(1-r')/4   aABb (1-r)(1-r')/4   aABB r(1-r')/4     AABb r(1-r')/4
ab (1-r')/2     AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4     Aabb r(1-r')/4
aB r'/2         AaBB (1-r)r'/4       aabB (1-r)r'/4       aaBB r r'/4        AabB r r'/4
Ab r'/2         AABb (1-r)r'/4       aAbb (1-r)r'/4       aABb r r'/4        AAbb r r'/4
Example 1 in Genetics (5)
Four distinct phenotypes:
A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (6)
Let φ = (1-r)(1-r'). Then
P(A*B*) = (2+φ)/4
P(A*b*) = P(a*B*) = (1-φ)/4
P(a*b*) = φ/4
Example 1 in Genetics (7)
Hence, a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4)
We know that φ = (1-r)(1-r'), 0 ≤ r ≤ 1/2, and 0 ≤ r' ≤ 1/2.
So 1/4 ≤ φ ≤ 1.
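As a quick check of the algebra above, the four probabilities can be coded directly (a small R sketch; the function name `phenotype_probs` is ours, not from the slides):

```r
# Phenotype probabilities as a function of phi = (1 - r)(1 - r').
phenotype_probs <- function(phi) {
  c(AB = (2 + phi) / 4, Ab = (1 - phi) / 4,
    aB = (1 - phi) / 4, ab = phi / 4)
}
sum(phenotype_probs(0.56))   # the four probabilities sum to 1
(1 - 1/2) * (1 - 1/2)        # smallest possible phi is 1/4
```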
Bayesian for Example 1 in Genetics (1)
To simplify computation, we let
P(A*B*) = θ_1, P(A*b*) = θ_2, P(a*B*) = θ_3, P(a*b*) = θ_4.
A random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
Multinomial(n; θ_1, θ_2, θ_3, θ_4)
Bayesian for Example 1 in Genetics (2)
We assume a Dirichlet prior distribution with parameters D(α_1, α_2, α_3, α_4) to estimate the probabilities for A*B*, A*b*, a*B* and a*b*.
Recall that
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
We consider (α_1, α_2, α_3, α_4) = (9, 3, 3, 1).
Bayesian for Example 1 in Genetics (3)
Suppose that we observe the data
y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 24).
The posterior distribution is then also Dirichlet, with parameters D(134, 21, 23, 25).
The Bayesian estimates of the probabilities are
(θ_1, θ_2, θ_3, θ_4) = (0.660, 0.103, 0.113, 0.123).
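The conjugate update above can be reproduced in a few lines of R (a sketch; the variable names are ours):

```r
# Dirichlet(9, 3, 3, 1) prior plus multinomial counts y gives a
# Dirichlet(alpha + y) posterior; posterior means are alpha_i / sum(alpha).
alpha <- c(9, 3, 3, 1)
y <- c(125, 18, 20, 24)
posterior <- alpha + y                # (134, 21, 23, 25)
post_mean <- posterior / sum(posterior)
round(post_mean, 3)                   # 0.660 0.103 0.113 0.123
```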
Bayesian for Example 1 in Genetics (4)
Consider the original model,
P(A*B*) = (2+φ)/4, P(A*b*) = P(a*B*) = (1-φ)/4, P(a*b*) = φ/4.
The random sample of size n also follows a multinomial distribution:
(y_1, y_2, y_3, y_4) ~ Multinomial(n; (2+φ)/4, (1-φ)/4, (1-φ)/4, φ/4).
We will assume a Beta prior distribution for φ:
φ ~ Beta(α_1, α_2).
Bayesian for Example 1 in Genetics (5)
The posterior distribution becomes
P(φ | y_1, y_2, y_3, y_4) = P(y_1, y_2, y_3, y_4 | φ) P(φ) / ∫ P(y_1, y_2, y_3, y_4 | φ) P(φ) dφ.
The integral in the denominator,
∫ P(y_1, y_2, y_3, y_4 | φ) P(φ) dφ,
does not have a closed form.
Bayesian for Example 1 in Genetics (6)
How to solve this problem?
The Monte Carlo Markov Chain (MCMC) method!
What values are appropriate for α_1, α_2?
Part 4
Monte Carlo Methods
Monte Carlo Methods (1)
Consider the game of solitaire: what's the chance of winning with a properly shuffled deck?
http://en.wikipedia.org/wiki/Monte_Carlo_method
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
[Illustration: four simulated hands (Lose, Lose, Win, Lose), so the estimated chance of winning is 1 in 4!]
Monte Carlo Methods (2)
Hard to compute analytically because
winning or losing depends on a complex
procedure of reorganizing cards.
Insight: why not just play a few hands, and
see empirically how many do in fact win?
More generally, can approximate a
probability density function using only
samples from that density.
Monte Carlo Methods (3)
Given a very large set X and a distribution f(x) over it, we draw a set of N i.i.d. random samples. We can then approximate the distribution using these samples:
f_N(x) = (1/N) Σ_{i=1}^{N} 1(x^(i) = x) → f(x) as N → ∞
Monte Carlo Methods (4)
We can also use these samples to compute expectations:
E_N(g) = (1/N) Σ_{i=1}^{N} g(x^(i)) → E(g) = Σ_x g(x) f(x) as N → ∞
And even use them to find a maximum:
x̂ = arg max_{x^(i)} f(x^(i))
Monte Carlo Example
Let X_1, ..., X_n be i.i.d. N(0,1). Find E(X_i^4).
Solution:
E(X_i^4) = ∫_{-∞}^{∞} x^4 (1/√(2π)) exp(-x^2/2) dx = 4! / (2^{4/2} (4/2)!) = 24/8 = 3
Use the Monte Carlo method to approximate it:
> x <- rnorm(100000) # 100000 samples from N(0,1)
> x <- x^4
> mean(x)
[1] 3.034175
Part 5
Monte Carlo Method vs.
Numerical Integration
Numerical Integration (1)
Theorem (Riemann Integral):
If f is continuous and integrable, then the Riemann sum
S_n = Σ_{i=1}^{n} f(c_i)(x_{i+1} - x_i), where c_i ∈ [x_i, x_{i+1}],
converges, as max_{i=1,...,n} |x_{i+1} - x_i| → 0, to the constant
I = ∫_a^b f(x) dx.
http://en.wikipedia.org/wiki/Numerical_integration
http://en.wikipedia.org/wiki/Riemann_integral
Numerical Integration (2)
Trapezoidal Rule:
Take x_i = a + i(b-a)/n, i = 0, ..., n, i.e. n equally spaced intervals with
x_{i+1} - x_i = h = (b-a)/n.
S_T = Σ_{i=0}^{n-1} h [f(x_i) + f(x_{i+1})] / 2
    = h [ (1/2) f(x_0) + f(x_1) + ... + f(x_{n-1}) + (1/2) f(x_n) ]
http://en.wikipedia.org/wiki/Trapezoidal_rule
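The rule above can be sketched as a small R function (the name `trapezoid` is ours; n is the number of subintervals):

```r
# Composite trapezoidal rule with n equally spaced subintervals on [a, b].
trapezoid <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)   # nodes x_0, ..., x_n
  h <- (b - a) / n
  h * (sum(f(x)) - (f(x[1]) + f(x[n + 1])) / 2)  # interior nodes weighted 1, ends 1/2
}
trapezoid(function(x) x^2, 0, 1, 1000)   # close to 1/3
```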
Numerical Integration (3)
An Example of the Trapezoidal Rule by R:
f <- function(x) {((x-4.5)^3+5.6)/1234}
x <- seq(-100, 100, 1)
plot(x, f(x), ylab = "f(x)", type = 'l', lwd = 2)
temp <- matrix(0, nrow = 6, ncol = 4)
temp[, 1] <- seq(-100, 100, 40); temp[, 2] <- rep(f(-100), 6)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
temp <- matrix(0, nrow = 5, ncol = 4)
temp[, 1] <- seq(-100, 80, 40); temp[, 2] <- f(temp[, 1])
temp[, 3] <- seq(-60, 100, 40); temp[, 4] <- f(temp[, 3])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
Numerical Integration (4)
Numerical Integration (5)
Simpson’s Rule:
Specifically,
∫_a^b f(x) dx ≈ (b-a)/6 [ f(a) + 4 f((a+b)/2) + f(b) ]
One derivation replaces the integrand f(x) by the quadratic polynomial P(x) which takes the same values as f(x) at the end points a and b and the midpoint m = (a+b)/2.
Numerical Integration (6)
P(x) = f(a) (x-m)(x-b)/[(a-m)(a-b)] + f(m) (x-a)(x-b)/[(m-a)(m-b)] + f(b) (x-a)(x-m)/[(b-a)(b-m)]
∫_a^b P(x) dx = (b-a)/6 [ f(a) + 4 f((a+b)/2) + f(b) ]
http://en.wikipedia.org/wiki/Simpson_rule
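The single-interval formula above can be coded directly (a minimal R sketch; the function name `simpson` is ours). Since Simpson's rule is exact for polynomials up to degree three, a cubic makes a good check:

```r
# Simpson's rule on a single interval [a, b].
simpson <- function(f, a, b) {
  (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
}
simpson(function(x) x^3, 0, 1)   # 1/4, exact for cubics
```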
Numerical Integration (7)
Example of Simpson’s Rule by R:
f <- function(x) {((x-4.5)^3+100*x^2+5.6)/1234}
x <- seq(-100, 100, 1)
p <- function(x) {
  f(-100)*(x-0)*(x-100)/(-100-0)/(-100-100) +
  f(0)*(x+100)*(x-100)/(0+100)/(0-100) +
  f(100)*(x-0)*(x+100)/(100-0)/(100+100)
}
matplot(x, cbind(f(x), p(x)), ylab = "", type = 'l', lwd = 2, lty = 1)
temp <- matrix(0, nrow = 3, ncol = 4)
temp[, 1] <- seq(-100, 100, 100); temp[, 2] <- rep(p(-60), 3)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
Numerical Integration (8)
Numerical Integration (9)
Two problems in numerical integration:
1. Improper limits, b = ∞ or a = -∞, i.e. ∫_{-∞}^{∞} f(x) dx. How can numerical integration be used? Apply a logistic transform, y = 1/(1+e^x), which maps the real line onto (0, 1).
2. Integration in two or more (high) dimensions? Use the Monte Carlo method!
http://en.wikipedia.org/wiki/Numerical_integration
Monte Carlo Integration
I = ∫ f(x) dx = ∫ [f(x)/g(x)] g(x) dx,
where g(x) is a probability density function; let h(x) = f(x)/g(x).
If x_1, ..., x_n ~ g(x) i.i.d., then by the law of large numbers,
(1/n) Σ_{i=1}^{n} h(x_i) → E[h(x)] = ∫ h(x) g(x) dx = ∫ f(x) dx = I as n → ∞.
http://en.wikipedia.org/wiki/Monte_Carlo_integration
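The identity above can be tried in R. As an illustration (our choice, not from the slides), take f(x) = x^2 exp(-x^2/2) on the real line and let g be the standard normal density, so that I = √(2π):

```r
# Importance-sampling form of Monte Carlo integration: average
# h(x) = f(x)/g(x) over i.i.d. draws from the density g.
set.seed(1)
f <- function(x) x^2 * exp(-x^2 / 2)  # integrand (our example)
h <- function(x) f(x) / dnorm(x)      # g = standard normal density
g_draws <- rnorm(100000)
mean(h(g_draws))                      # close to sqrt(2 * pi)
```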
Example (1)
∫_0^1 [Γ(18)/(Γ(7)Γ(11))] x^{10} (1-x)^{10} dx = ?
(It is 2/57 ≈ 0.0350877.)
Numerical integration by the trapezoidal rule: [figure in the original slides]
Example (2)
Monte Carlo Integration:
∫_0^1 [Γ(18)/(Γ(7)Γ(11))] x^{10} (1-x)^{10} dx
  = ∫_0^1 x^4 [Γ(7+11)/(Γ(7)Γ(11))] x^{7-1} (1-x)^{11-1} dx
Let x ~ Beta(7, 11). Then, by the LLN,
(1/n) Σ_{i=1}^{n} x_i^4 → E(x^4) = ∫_0^1 x^4 [Γ(18)/(Γ(7)Γ(11))] x^{6} (1-x)^{10} dx as n → ∞.
Example by C/C++ and R
> x <- rbeta(10000, 7, 11)
> x <- x^4
> mean(x)
[1] 0.03547369
Part 6
Markov Chains
Markov Chains (1)
A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by transition probabilities.
Suppose the random variables X_0, X_1, ... take values in a state space (Ω) that is a countable set. A Markov chain is a process that corresponds to the network:
X_0 → X_1 → ... → X_{t-1} → X_t → X_{t+1} → ...
Markov Chains (2)
The current state of a Markov chain depends only on the most recent previous state:
P(X_{t+1} = j | X_t = i, X_{t-1} = i_{t-1}, ..., X_0 = i_0) = P(X_{t+1} = j | X_t = i)  (transition probability)
where i_0, ..., i, j ∈ Ω.
http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf
An Example of Markov Chains
Ω = {1, 2, 3, 4, 5}
X = (X_0, X_1, ..., X_t, ...), where X_0 is the initial state, and so on.
P is the transition matrix:

         1    2    3    4    5
    1  0.4  0.6  0.0  0.0  0.0
    2  0.5  0.0  0.5  0.0  0.0
P = 3  0.0  0.3  0.0  0.7  0.0
    4  0.0  0.0  0.1  0.3  0.6
    5  0.0  0.3  0.0  0.5  0.2
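The matrix above can be entered in R to check that each row sums to one and to compute multi-step transition probabilities (a sketch; the variable names are ours):

```r
# The 5-state transition matrix from the slide; rows are "from" states.
P <- matrix(c(0.4, 0.6, 0.0, 0.0, 0.0,
              0.5, 0.0, 0.5, 0.0, 0.0,
              0.0, 0.3, 0.0, 0.7, 0.0,
              0.0, 0.0, 0.1, 0.3, 0.6,
              0.0, 0.3, 0.0, 0.5, 0.2),
            nrow = 5, byrow = TRUE)
rowSums(P)        # each row sums to 1
P2 <- P %*% P     # two-step transition probabilities
P2[1, ]           # distribution after two steps starting from state 1
```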
Definition (1)
Define the probability of going from state i to state j in n time steps as
p_ij^(n) = P(X_{t+n} = j | X_t = i).
A state j is accessible from state i if there is an n such that p_ij^(n) > 0, where n = 0, 1, ...
A state i is said to communicate with state j (denoted i ↔ j) if both j is accessible from i and i is accessible from j.
Definition (2)
A state i has period d(i) if any return to state i must occur in multiples of d(i) time steps.
Formally, the period of a state is defined as
d(i) = gcd{ n : P_ii^(n) > 0 }.
If d(i) = 1, then the state is said to be aperiodic; otherwise (d(i) > 1), the state is said to be periodic with period d(i).
Definition (3)
A set of states C is a communicating
class if every pair of states in C
communicates with each other.
Every state in a communicating class must have the same period.
Example:
Definition (4)
A finite Markov chain is said to be
irreducible if its state space (Ω) is a
communicating class; this means that, in
an irreducible Markov chain, it is possible to
get to any state from any state.
Example:
Definition (5)
A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.
Example:
Definition (6)
A state i is said to be transient if, given that we start in state i, there is a non-zero probability that we will never return to i.
Formally, let the random variable T_i be the next return time to state i (the “hitting time”):
T_i = min{ n : X_n = i | X_0 = i }.
Then state i is transient iff P(T_i < ∞) < 1.
Definition (7)
A state i is said to be recurrent or persistent iff P(T_i < ∞) = 1.
The mean recurrence time is μ_i = E(T_i).
State i is positive recurrent if μ_i is finite; otherwise, state i is null recurrent.
A state i is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
Stationary Distributions
Theorem: If a Markov chain is irreducible and aperiodic, then
P_ij^(n) → 1/μ_j as n → ∞, for all i, j.
Theorem: If a Markov chain is irreducible and aperiodic, then there exists a unique
π_j = lim_{n→∞} P(X_n = j), with π_j = Σ_i π_i P_ij and Σ_j π_j = 1,
where π is the stationary distribution.
Definition (8)
A Markov chain is said to be reversible if there is a stationary distribution π such that
π_i P_ij = π_j P_ji for all i, j.
Theorem: If a Markov chain is reversible, then π_j = Σ_i π_i P_ij.
An Example of Stationary Distributions
A Markov chain:
[Diagram: three states 1, 2, 3 with self-loop probabilities 0.7, 0.4, 0.7 and transition probabilities 0.3 between neighboring states.]

     0.7  0.3  0.0
P =  0.3  0.4  0.3
     0.0  0.3  0.7

The stationary distribution is π = (1/3, 1/3, 1/3):

                   0.7  0.3  0.0
(1/3  1/3  1/3) x  0.3  0.4  0.3  =  (1/3  1/3  1/3)
                   0.0  0.3  0.7
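The stationarity claim above can be verified numerically in R (a short sketch):

```r
# Transition matrix of the three-state chain above.
P <- matrix(c(0.7, 0.3, 0.0,
              0.3, 0.4, 0.3,
              0.0, 0.3, 0.7), nrow = 3, byrow = TRUE)
pi0 <- rep(1/3, 3)
pi0 %*% P   # equals (1/3, 1/3, 1/3), so pi0 P = pi0
```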
Properties of Stationary Distributions
Regardless of the starting point, the
process of irreducible and aperiodic Markov
chains will converge to a stationary
distribution.
The rate of convergence depends on properties of the transition probability.
Part 7
Monte Carlo Markov Chains
Applications of MCMC
Simulation:
Ex: (x, y) ~ f(x, y) = c C(n, x) y^{x+α-1} (1-y)^{n-x+β-1},
where C(n, x) is the binomial coefficient, x = 0, 1, 2, ..., n, 0 ≤ y ≤ 1, and α, β are known.
Integration: computing in high dimensions.
Ex: E[g(y)] = ∫_0^1 g(y) f(y) dy
Bayesian inference:
Ex: posterior distributions, posterior means, ...
Monte Carlo Markov Chains
MCMC methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution.
The state of the chain after a large number of steps is then used as a sample from the desired distribution.
http://en.wikipedia.org/wiki/MCMC
Inversion Method vs. MCMC (1)
Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution given its cumulative distribution function (cdf).
http://en.wikipedia.org/wiki/Inverse_transform_sampling_method
Inversion Method vs. MCMC (2)
If a random variable has cdf F, then F applied to that variable has a uniform distribution on [0, 1].
The inverse transform sampling method works as follows:
1. Generate a random number from the standard uniform distribution; call this u.
2. Compute the value x such that F(x) = u; call this x_chosen.
3. Take x_chosen to be the random number drawn from the distribution described by F.
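The three steps above can be sketched in R. As an illustration (our choice), sample from Exponential(rate = 2), whose cdf F(x) = 1 - exp(-2x) inverts to x = -log(1 - u)/2:

```r
# Inverse transform sampling for Exponential(rate = 2):
# step 1: u ~ Uniform(0, 1); step 2: solve F(x) = u; step 3: keep x.
set.seed(1)
u <- runif(100000)
x <- -log(1 - u) / 2   # F^{-1}(u) for F(x) = 1 - exp(-2x)
mean(x)                # close to the Exponential(2) mean, 1/2
```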
Inversion Method vs. MCMC (3)
For a one-dimensional random variable, the inversion method works well; for two or more dimensions, it may not.
In two or more dimensions, the marginal distributions of the individual random variables can be difficult and time-consuming to calculate.
Gibbs Sampling
One kind of MCMC method.
The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to integrate over a joint distribution.
George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
http://en.wikipedia.org/wiki/Gibbs_sampling
Example 1 (1)
To sample x from
(x, y) ~ f(x, y) = c C(n, x) y^{x+α-1} (1-y)^{n-x+β-1},
where x = 0, 1, 2, ..., n; 0 ≤ y ≤ 1; n, α, β are known; and c is a constant.
One can see that
f(x | y) = f(x, y)/f(y) ∝ C(n, x) y^x (1-y)^{n-x} ~ Binomial(n, y)
f(y | x) = f(x, y)/f(x) ∝ y^{x+α-1} (1-y)^{n-x+β-1} ~ Beta(x+α, n-x+β)
Example 1 (2)
Gibbs sampling algorithm:
Initial setting:
  y_0 ~ Uniform(0, 1), or an arbitrary value in (0, 1)
  x_0 ~ Bin(n, y_0)
For t = 0, ..., n, sample a value (x_{t+1}, y_{t+1}) from
  y_{t+1} ~ Beta(x_t + α, n - x_t + β)
  x_{t+1} ~ Bin(n, y_{t+1})
Return (x_n, y_n).
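The algorithm above can be transcribed almost line for line into R (a sketch with our example values n = 16, α = 5, β = 7; the marginal mean of x is nα/(α+β)):

```r
# Gibbs sampler for (x, y): y | x ~ Beta(x + alpha, n - x + beta),
# x | y ~ Binomial(n, y), as in the algorithm above.
set.seed(1)
n <- 16; alpha <- 5; beta <- 7; steps <- 5000
y <- runif(1)                      # y0 ~ Uniform(0, 1)
x <- rbinom(1, n, y)               # x0 ~ Bin(n, y0)
xs <- numeric(steps)
for (t in 1:steps) {
  y <- rbeta(1, x + alpha, n - x + beta)
  x <- rbinom(1, n, y)
  xs[t] <- x
}
mean(xs)   # close to n*alpha/(alpha+beta) = 20/3
```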
Example 1 (3)
Under regularity conditions, x_t → x and y_t → y in distribution as t → ∞.
How many steps are needed for convergence?
Until the chain is within an acceptable error, such as
| (1/10) Σ_{t=i+1}^{i+10} x_t - (1/10) Σ_{t=i+11}^{i+20} x_t | ≤ 0.001 for all i,
or until n is large enough, such as n ≥ 1000.
Example 1 (4)
Inversion method:
x has a beta-binomial distribution:
x ~ f(x) = ∫_0^1 f(x, y) dy = C(n, x) Γ(α+β) Γ(x+α) Γ(n-x+β) / [Γ(α) Γ(β) Γ(n+α+β)]
The cdf of x is F(x), and F(x) has a uniform distribution on [0, 1]:
F(x) = Σ_{i=0}^{x} C(n, i) Γ(α+β) Γ(i+α) Γ(n-i+β) / [Γ(α) Γ(β) Γ(n+α+β)]
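The beta-binomial pmf above can be coded and checked in R (a sketch; the function name `dbetabinom` is ours):

```r
# Beta-binomial pmf, following the formula above (n = 16, alpha = 5, beta = 7).
dbetabinom <- function(x, n, a, b) {
  choose(n, x) * gamma(a + b) * gamma(x + a) * gamma(n - x + b) /
    (gamma(a) * gamma(b) * gamma(n + a + b))
}
p <- dbetabinom(0:16, 16, 5, 7)
sum(p)          # the pmf sums to 1
sum(0:16 * p)   # mean n*alpha/(alpha+beta) = 20/3
```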
Gibbs sampling by R (1)
N = 1000; num = 16; alpha = 5; beta = 7
tempy <- runif(1); tempx <- rbeta(1, alpha, beta)
j = 0; Forward = 1; Afterward = 0
while((abs(Forward-Afterward) > 0.0001) && (j <= 1000)){
  Forward = Afterward; Afterward = 0
  for(i in 1:N){
    tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
    tempx <- rbinom(1, num, tempy)
    Afterward = Afterward+tempx
  }
  Afterward = Afterward/N; j = j+1
}
sample <- matrix(0, nrow = N, ncol = 2)
for(i in 1:N){
  tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
  tempx <- rbinom(1, num, tempy)
Gibbs sampling by R (2)
  sample[i, 1] = tempx; sample[i, 2] = tempy
}
sample_Inverse <- rbetabin(N, num, alpha, beta)  # inverse method
write(t(sample), "Sample for Ex1 by R.txt", ncol = 2)
Xhist <- cbind(hist(sample[, 1], nclass = num)$count,
               hist(sample_Inverse, nclass = num)$count)
write(t(Xhist), "Histogram for Ex1 by R.txt", ncol = 2)
prob <- matrix(0, nrow = num+1, ncol = 2)
for(i in 0:num){
  if(i == 0){
    prob[i+1, 2] = mean(pbinom(i, num, sample[, 2]))
    prob[i+1, 1] = gamma(alpha+beta)*gamma(num+beta)
    prob[i+1, 1] = prob[i+1, 1]/(gamma(beta)*gamma(num+beta+alpha))
  }
Gibbs sampling by R (3)
  else{
    if(i == 1){
      prob[i+1, 1] = num*alpha/(num-1+alpha+beta)
      for(j in 0:(num-2))
        prob[i+1, 1] = prob[i+1, 1]*(beta+j)/(alpha+beta+j)
    }
    else
      prob[i+1, 1] = prob[i+1, 1]*(num-i+1)/(i)*(i-1+alpha)/(num-i+beta)
    prob[i+1, 2] = mean(pbinom(i, num, sample[, 2]) - pbinom(i-1, num, sample[, 2]))
  }
  if(i != num)
    prob[i+2, 1] = prob[i+1, 1]
}
write(t(prob), "ProbHistogram for Ex1 by R.txt", ncol = 2)
Inversion Method by R (1)
rbetabin <- function(N, size, alpha, beta){
  Usample <- runif(N)
  Pr_0 = gamma(alpha+beta)*gamma(size+beta)/gamma(beta)/gamma(size+beta+alpha)
  Pr = size*alpha/(size-1+alpha+beta)
  for(i in 0:(size-2))
    Pr = Pr*(beta+i)/(alpha+beta+i)
  Pr_Initial = Pr
  sample <- array(0, N)
  CDF <- array(0, (size+1))
  CDF[1] <- Pr_0
Inversion Method by R (2)
  for(i in 1:size){
    CDF[i+1] = CDF[i]+Pr
    Pr = Pr*(size-i)/(i+1)*(i+alpha)/(size-i-1+beta)
  }
  for(i in 1:N){
    sample[i] = which.min(abs(Usample[i]-CDF))-1
  }
  return(sample)
}
Gibbs sampling by C/C++ (1)-(4)
Inversion Method by C/C++ (1)-(2)
[The C/C++ source code for the Gibbs sampler and the inverse method appears as screenshots in the original slides.]
Plot Histograms by Maple (1)
Figure 1: 1000 samples with n = 16, α = 5 and β = 7.
Blue: inversion method. Red: Gibbs sampling.
Plot Histograms by Maple (2)
Probability Histograms by Maple (1)
Figure 2:
The blue histogram and yellow line are the pmf of x.
The red histogram is P̂(X = x) = (1/m) Σ_{i=1}^{m} P(X = x | Y_i = y_i) from Gibbs sampling.
Probability Histograms by Maple (2)
The probability histogram of the blue histogram of Figure 1 would be similar to the blue probability histogram of Figure 2 when the sample size → ∞.
The probability histogram of the red histogram of Figure 1 would be similar to the red probability histogram of Figure 2 when the number of iterations n → ∞.
Probability Histograms by Maple (3)
Exercises
Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned on the referenced web pages.
Write programs for other examples that you know.