Transcript Lecture 3
Random walk
P(X_{n+1} = i + 1 | X_n = i) = p
P(X_{n+1} = i - 1 | X_n = i) = 1 - p
[Figure: simulated sample path of the random walk, x(t) for t = 0, ..., 100]
Computational statistics 2009
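As a quick illustration (a sketch, not part of the original slides), the walk defined above can be simulated directly; the values p = 0.5, 100 steps, and starting point 0 are assumptions chosen to match the figure.

```python
import random

def random_walk(p=0.5, n_steps=100, x0=0, seed=1):
    """Simulate a simple random walk: step +1 with probability p,
    -1 with probability 1 - p."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        step = 1 if rng.random() < p else -1
        path.append(path[-1] + step)
    return path

path = random_walk()
print(path[:10])
```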
Random walk with absorbing barrier
P(X_{n+1} = i + 1 | X_n = i) = p, i ≥ 1
P(X_{n+1} = i - 1 | X_n = i) = 1 - p, i ≥ 1
P(X_{n+1} = 0 | X_n = 0) = 1
[Figure: simulated sample path of the random walk with absorbing barrier at 0, x(t) for t = 0, ..., 100]
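The absorbing barrier changes only one line of the simulation sketch: once the chain reaches 0 it stays there. The starting value 5 and seed are arbitrary choices for illustration.

```python
import random

def absorbing_walk(p=0.5, n_steps=100, x0=5, seed=2):
    """Random walk with an absorbing barrier at 0: once the chain
    hits 0 it remains there forever."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        if x == 0:
            path.append(0)  # absorbed: P(X_{n+1}=0 | X_n=0) = 1
        else:
            path.append(x + (1 if rng.random() < p else -1))
    return path

path = absorbing_walk()
```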
Discrete-time Markov chains
Let X0, X1, X2, … be a sequence of integer-valued random variables
Then {Xn}n is said to be a Markov chain if
P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i)
for all states i, j, i_0, ..., i_{n-1} and all n
Transition probabilities depend on the past history of the chain only
through the current value
Discrete-time Markov chains
- examples
Let X_n be the cumulative number of heads in n tosses of a coin
Let Y_n = 1 if it is raining on day n, and Y_n = 0 otherwise
Are {X_n} and {Y_n} Markov chains?
Discrete-time Markov chain
- transition matrix
Let X_n be the cumulative number of heads in n tosses of a coin
For a fair coin, the transition matrix is

P =
| 0.5 0.5  0   0  ... |
|  0  0.5 0.5  0  ... |
|  0   0  0.5 0.5 ... |
|  :   :   :   :   .  |

i.e. p_{i,i} = p_{i,i+1} = 0.5 and p_{ij} = 0 otherwise.
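As a sketch (assuming a fair coin and truncating the infinite state space at an arbitrary size), the matrix can be built and sanity-checked like this:

```python
import numpy as np

def coin_transition_matrix(n_states=5, p=0.5):
    """Transition matrix for the cumulative-heads chain, truncated to
    states 0..n_states-1 (the last state is made absorbing purely as a
    truncation device)."""
    P = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        P[i, i] = 1 - p      # tails: the count stays at i
        P[i, i + 1] = p      # heads: the count moves to i + 1
    P[-1, -1] = 1.0          # truncation boundary
    return P

P = coin_transition_matrix()
print(P.sum(axis=1))  # every row of a transition matrix sums to 1
```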
Discrete-time Markov chain
- transition matrix
Let Y_n = 1 if it is raining on day n, and Y_n = 0 otherwise
Transition matrix?
Ehrenfest’s diffusion model
A
B
Suppose that M molecules are distributed among the compartments A and B.
Change at each time point the distribution among A and B by selecting one molecule
at random and placing it in a randomly selected compartment.
Let Xn be the number of molecules in compartment A at time n.
Do the limiting probabilities lim_{n→∞} p_ij^(n) exist?
Stationary distribution of a Markov chain
For an irreducible, aperiodic chain with states (x_1, ..., x_S) and transition probabilities p_ij = p(x_i, x_j) there is a unique probability distribution with mass probabilities π_j = π(x_j) satisfying
π_j = Σ_i π_i p_ij
Σ_j π_j = 1
This distribution is known as the stationary distribution of the Markov chain
If the initial distribution π^(0) of a chain is equal to its stationary distribution, the marginal distribution π^(n) of the state at time n is again given by the stationary distribution
Limiting probabilities for irreducible
aperiodic Markov chains
For an aperiodic irreducible Markov chain with stationary distribution π it can be shown that
lim_{n→∞} p_ij^(n) = π_j > 0,
regardless of the initial distribution π^(0).
The limiting probability that the process will be in state x_j at time n equals the long-run proportion of time that the process will be in state x_j.
A way to generate values from a distribution f is to construct a Markov chain with f as its stationary distribution, and to run the chain from an arbitrary starting value until the distribution π^(n) converges to f
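The convergence of p_ij^(n) can be seen numerically by taking matrix powers. The 2-state chain below is a hypothetical rain/no-rain example (the transition probabilities 0.8/0.2 and 0.4/0.6 are assumptions for illustration only); its stationary distribution, solving π = πP with Σπ_j = 1, is (2/3, 1/3).

```python
import numpy as np

# Hypothetical 2-state chain: state 0 = dry day, state 1 = rainy day.
# The probabilities are illustrative assumptions, not from the lecture.
P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

Pn = np.linalg.matrix_power(P, 50)  # entries p_ij^(n) for n = 50
print(Pn)  # every row approaches the stationary distribution (2/3, 1/3)
```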
Ehrenfest’s diffusion model
– stationary distribution
A
B
Suppose that M molecules are distributed among the compartments A and B.
At each time point the distribution among A and B is changed by selecting
one molecule at random and placing it in a randomly selected compartment.
Let Xn be the number of molecules in compartment A at time n. The stationary
distribution is then given by
π_j = π_{j-1} (M - j + 1)/(2M) + π_j (1/2) + π_{j+1} (j + 1)/(2M),  1 ≤ j ≤ M - 1,
and the equations relating π_1 to π_0 and π_M to π_{M-1}.
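These balance equations are solved by the Binomial(M, 1/2) distribution, π_j = C(M, j)/2^M. As a sketch (the transition probabilities below follow from the model description: a random molecule is picked and placed in a random compartment), this can be checked numerically:

```python
import numpy as np
from math import comb

def ehrenfest_matrix(M):
    """Transition matrix for Ehrenfest's model: pick one of the M
    molecules at random, place it in a randomly chosen compartment."""
    P = np.zeros((M + 1, M + 1))
    for j in range(M + 1):
        P[j, j] = 0.5                        # molecule returned to its own side
        if j > 0:
            P[j, j - 1] = j / (2 * M)        # an A-molecule moved to B
        if j < M:
            P[j, j + 1] = (M - j) / (2 * M)  # a B-molecule moved to A
    return P

M = 10
P = ehrenfest_matrix(M)
pi = np.array([comb(M, j) / 2**M for j in range(M + 1)])  # Binomial(M, 1/2)
print(np.allclose(pi @ P, pi))  # True: pi satisfies pi = pi P
```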
A simple proposal-rejection method
for random number generation
Target distribution:
π(x) = λ^x e^{-λ} / x!,  x = 0, 1, ...
Proposal chain: simple random walk
Given xt, next propose y = xt + 1 or xt – 1, each with probability 0.5
Compute the “goodness ratio”
r = π(y)/π(x_t) = λ^{y - x_t} · x_t!/y! = λ/(x_t + 1) if y = x_t + 1,  or  x_t/λ if y = x_t - 1
Acceptance/rejection:
Let U ~ Uniform(0, 1)
Accept the proposal (set x_{t+1} = y) if U < r; reject otherwise (set x_{t+1} = x_t).
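The steps above can be sketched as follows. The boundary handling (a proposal of -1 has target probability 0 and is always rejected) is an assumption the slide leaves implicit; λ = 10 matches the example that follows.

```python
import random

def poisson_mh(lam=10, n_iter=5000, x0=0, seed=3):
    """Random-walk proposal-rejection sampler targeting Poisson(lam)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_iter):
        y = x + (1 if rng.random() < 0.5 else -1)  # propose x_t +/- 1
        if y < 0:
            r = 0.0                 # pi(y) = 0 outside the support
        elif y == x + 1:
            r = lam / (x + 1)       # pi(x_t + 1) / pi(x_t)
        else:
            r = x / lam             # pi(x_t - 1) / pi(x_t)
        if rng.random() < r:        # accept if U < r
            x = y
        samples.append(x)
    return samples

samples = poisson_mh()
```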
Random number generation
- an example of a Markov chain proposal-rejection method
Target distribution:
π(x) = 10^x e^{-10} / x!,  x = 0, 1, ...
[Figure: trace plot x(t) of the chain over the first 60 iterations]
Random number generation
- an example of a Markov chain proposal-rejection method
Target distribution:
π(x) = 10^x e^{-10} / x!,  x = 0, 1, ...
[Figures: trace plot x(t) over 1000 iterations and frequency histogram of the sampled values]
Markov Chain Monte Carlo (MCMC) methods
Algorithms for sampling from probability distributions based on
constructing a Markov chain that has the desired distribution as
its equilibrium distribution
The most common application of these algorithms is numerically
calculating multi-dimensional integrals, such as
E(h(X_1, ..., X_k)) = ∫_{R^k} h(x_1, ..., x_k) f_{X_1,...,X_k}(x_1, ..., x_k) dx_1 ... dx_k
which is estimated by the ergodic average
(1/(N - N_0)) Σ_{n = N_0 + 1}^{N} h(X_1^(n), ..., X_k^(n))
where N_0 is the number of burn-in iterations discarded.
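The ergodic average above is just a sample mean over the post-burn-in part of the chain; a minimal sketch (the toy chain values are made up for illustration):

```python
def ergodic_average(h, samples, burn_in):
    """Estimate E[h(X)] by averaging h over the chain output after
    discarding the first burn_in draws (the N_0 burn-in period)."""
    kept = samples[burn_in:]
    return sum(h(x) for x in kept) / len(kept)

# Toy chain output, for illustration only:
chain = [1, 2, 3, 2, 4, 3, 3, 2]
print(ergodic_average(lambda x: x, chain, burn_in=2))      # estimates E[X]
print(ergodic_average(lambda x: x * x, chain, burn_in=2))  # estimates E[X^2]
```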
Random walk MCMC methods
Metropolis-Hastings algorithm: Generates a random walk
using a proposal density and a method for rejecting proposed
moves
Gibbs sampling: Requires that all the conditional distributions
of the target distribution can be sampled exactly.
Slice sampling: Alternates uniform sampling in the vertical
direction with uniform sampling from the horizontal `slice'
defined by the current vertical position.
Random number generation
- the Metropolis-Hastings algorithm
Start with any initial value x0 and a proposal chain T(x, y)
Suppose xt has been drawn at time t
Draw y ~ T(x_t, ·) (i.e. propose a move for the next step)
Compute the Metropolis ratio (or “goodness ratio”)
r = π(y) T(y, x_t) / (π(x_t) T(x_t, y))
Acceptance/rejection:
x_{t+1} = y with probability p = min(1, r), and x_{t+1} = x_t with probability 1 - p
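The algorithm can be sketched generically; the example target π(x) ∝ 0.9^x on x ≥ 0 and the symmetric ±1 proposal are illustrative assumptions, not from the slides.

```python
import random

def metropolis_hastings(pi, propose, T, x0, n_iter, seed=4):
    """Generic Metropolis-Hastings sampler: pi is the (possibly
    unnormalized) target, propose(rng, x) draws y from the proposal
    chain, and T(x, y) is the proposal density."""
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = propose(rng, x)
        r = (pi(y) * T(y, x)) / (pi(x) * T(x, y))  # Metropolis ratio
        if rng.random() < min(1.0, r):
            x = y                                   # accept the move
        chain.append(x)                             # else keep x_t
    return chain

# Illustrative run: target pi(x) proportional to 0.9^x on x >= 0,
# symmetric +/-1 proposal (so T cancels in the ratio).
chain = metropolis_hastings(
    pi=lambda x: 0.9 ** x if x >= 0 else 0.0,
    propose=lambda rng, x: x + (1 if rng.random() < 0.5 else -1),
    T=lambda x, y: 0.5,
    x0=0, n_iter=2000)
```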
The Gibbs sampler for bivariate distributions
- a simple example
Let's look at simulating observations of a bivariate normal vector (X, Y) with zero mean and unit variance for the marginals, and a correlation of ρ between the two components.
[X | Y = y] ~ N(ρy, 1 - ρ²)
[Y | X = x] ~ N(ρx, 1 - ρ²)
Let’s start from, say, x = 10, y = 5. Draw new x’s from [X | Y = y] and new y’s from [Y | X = x]
[Figure: scatter plot of the 100 simulated (x, y) values]
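A minimal sketch of this bivariate Gibbs sampler; the starting point (10, 5) and 100 draws come from the slide, while ρ = 0.8 is an assumed value (the slide leaves ρ symbolic).

```python
import random

def gibbs_bvn(rho=0.8, n_iter=100, x0=10.0, y0=5.0, seed=5):
    """Gibbs sampler for a standard bivariate normal with correlation rho:
    alternately draw X | Y = y ~ N(rho*y, 1 - rho^2) and
    Y | X = x ~ N(rho*x, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5   # conditional standard deviation
    x, y = x0, y0
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # draw new x from [X | Y = y]
        y = rng.gauss(rho * x, sd)  # draw new y from [Y | X = x]
        draws.append((x, y))
    return draws

draws = gibbs_bvn()
```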
Gibbs sampler
The Gibbs sampler is a way to generate empirical distributions of a
random vector (X, Y) when the conditional probability distributions F(X | Y)
and G(Y | X) are known
Start with a random initial X, draw a Y from G(Y | X), then use that Y to draw a new X from F(X | Y), and so on indefinitely.
Keep track of the X's and Y's generated; this gives enough samples to find the unconditional joint distribution of (X, Y).