Computational Statistics 2009 – Lecture 3 (transcript)

Random walk
P(X_{n+1} = i + 1 | X_n = i) = p
P(X_{n+1} = i - 1 | X_n = i) = 1 - p
[Figure: simulated sample path of the random walk over 100 steps]
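As an illustration (not part of the original slides), a path like the one in the figure can be simulated in a few lines of Python; the step probability p, the number of steps and the seed below are arbitrary choices.

```python
import numpy as np

def simulate_random_walk(n_steps=100, p=0.5, x0=0, seed=None):
    """Simple random walk: step +1 with probability p, -1 with probability 1 - p."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    return np.concatenate(([x0], x0 + np.cumsum(steps)))

path = simulate_random_walk(n_steps=100, p=0.5, seed=1)
```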
Random walk with absorbing barrier
P(X_{n+1} = i + 1 | X_n = i) = p,      i \geq 1
P(X_{n+1} = 0 | X_n = 0) = 1
P(X_{n+1} = i - 1 | X_n = i) = 1 - p,  i \geq 1
[Figure: simulated sample path of the random walk with absorbing barrier at 0, 100 steps]
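A sketch of the absorbing-barrier version (again not from the slides, with arbitrary parameter choices): once the walk reaches 0 it stays there.

```python
import numpy as np

def simulate_absorbing_walk(n_steps=100, p=0.5, x0=5, seed=None):
    """Random walk with an absorbing barrier at 0: from state 0 the chain never moves."""
    rng = np.random.default_rng(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        if x > 0:                                 # states i >= 1 move up or down
            x += 1 if rng.random() < p else -1
        path.append(x)                            # state 0 is absorbing
    return np.array(path)

path = simulate_absorbing_walk(n_steps=100, p=0.5, x0=5, seed=1)
```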
Discrete-time Markov chains
• Let X_0, X_1, X_2, … be a sequence of integer-valued random variables.
• Then {X_n}_n is said to be a Markov chain if

  P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j | X_n = i)

  for all i, j, i_0, …, i_{n-1}, n.
• Transition probabilities depend on the past history of the chain only through the current value.
Discrete-time Markov chains
- examples
Let X_n be the cumulative number of heads in n tosses of a coin.
Let Y_n = 1 if it is raining on day n, and Y_n = 0 otherwise.
Are {X_n}_n and {Y_n}_n Markov chains?
Discrete-time Markov chain
- transition matrix
Let X_n be the cumulative number of heads in n tosses of a coin. The transition matrix is

  P = \begin{pmatrix}
  0.5 & 0.5 & 0   & 0   & \cdots \\
  0   & 0.5 & 0.5 & 0   & \cdots \\
  0   & 0   & 0.5 & 0.5 & \cdots \\
  \vdots &  &     & \ddots & \ddots
  \end{pmatrix}
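As a small sketch (not from the slides), a truncated version of this infinite matrix can be built and checked numerically; the truncation size and the absorbing last row are choices made here purely to keep the matrix finite.

```python
import numpy as np

def coin_toss_transition_matrix(n_states=6, p_head=0.5):
    """Truncated transition matrix of the cumulative-heads chain:
    from state i the chain stays at i (tails) or moves to i + 1 (heads)."""
    P = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        P[i, i] = 1 - p_head
        P[i, i + 1] = p_head
    P[-1, -1] = 1.0          # truncation: keep the last row a probability distribution
    return P

P = coin_toss_transition_matrix()
assert np.allclose(P.sum(axis=1), 1.0)   # every row sums to one
```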
Discrete-time Markov chain
- transition matrix
Let Y_n = 1 if it is raining on day n, and Y_n = 0 otherwise.
Transition matrix?
Ehrenfest’s diffusion model
[Figure: two compartments, A and B]

• Suppose that M molecules are distributed among the compartments A and B.
• At each time point, change the distribution among A and B by selecting one molecule at random and placing it in a randomly selected compartment.
• Let X_n be the number of molecules in compartment A at time n.
• Do the limiting probabilities \lim_{n \to \infty} p_{ij}^{(n)} exist? (See the simulation sketch below.)
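A simulation sketch of the dynamics just described (not part of the slides; M, the number of steps and the starting state are arbitrary), which can be used to inspect the long-run occupation frequencies empirically.

```python
import numpy as np

def simulate_ehrenfest(M=20, n_steps=100_000, x0=0, seed=0):
    """Ehrenfest model: pick one of the M molecules uniformly at random and put it
    into a uniformly chosen compartment; X_n = number of molecules in A."""
    rng = np.random.default_rng(seed)
    x, counts = x0, np.zeros(M + 1)
    for _ in range(n_steps):
        molecule_in_A = rng.random() < x / M    # is the chosen molecule currently in A?
        goes_to_A = rng.random() < 0.5          # randomly selected destination compartment
        x += int(goes_to_A and not molecule_in_A) - int(molecule_in_A and not goes_to_A)
        counts[x] += 1
    return counts / n_steps                     # empirical occupation frequencies

freq = simulate_ehrenfest(M=20)
```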
Stationary distribution of a Markov chain
• For an irreducible, aperiodic chain with states (x_1, …, x_S) and transition probabilities p_{ij} = p(x_i, x_j) there is a unique probability distribution with mass probabilities \pi_j = \pi(x_j) satisfying

  \pi_j = \sum_i \pi_i\, p_{ij}, \qquad \sum_j \pi_j = 1

• This distribution is known as the stationary distribution of the Markov chain.
• If the initial distribution \pi^{(0)} of a chain is equal to its stationary distribution, the marginal distribution \pi^{(n)} of the state at time n is again given by the stationary distribution.
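As a numerical sketch (not from the slides), the two conditions above can be solved directly for a finite chain; the two-state transition matrix below is an arbitrary example.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 for a finite transition matrix P
    (assumed irreducible and aperiodic)."""
    S = P.shape[0]
    A = np.vstack([P.T - np.eye(S), np.ones(S)])   # balance equations + normalisation
    b = np.concatenate([np.zeros(S), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])          # arbitrary two-state example
print(stationary_distribution(P))   # approximately [0.833, 0.167]
```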
Limiting probabilities for irreducible, aperiodic Markov chains
• For an aperiodic, irreducible Markov chain with stationary distribution \pi it can be shown that

  \lim_{n \to \infty} p_{ij}^{(n)} = \pi_j, \qquad j \geq 0,

  regardless of the initial distribution \pi^{(0)}.
• The limiting probability that the process will be in state x_j at time n equals the long-run proportion of time that the process will be in state x_j.
• A way to generate values from a distribution f is to construct a Markov chain with f as its stationary distribution, and to run the chain from an arbitrary starting value until the distribution \pi^{(n)} converges to f.
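A short numerical illustration (not from the slides) of the limit statement: the rows of the n-step transition matrix P^n all approach the same stationary distribution, whatever the starting state.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])              # same arbitrary two-state example as above
Pn = np.linalg.matrix_power(P, 50)      # 50-step transition probabilities
print(Pn)                               # both rows are close to [0.833, 0.167]
```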
Ehrenfest’s diffusion model
– stationary distribution
[Figure: two compartments, A and B]

• Suppose that M molecules are distributed among the compartments A and B.
• At each time point the distribution among A and B is changed by selecting one molecule at random and placing it in a randomly selected compartment.
• Let X_n be the number of molecules in compartment A at time n. The stationary distribution is then given by

  \pi_j = \pi_{j-1}\,\frac{M - j + 1}{2M} + \pi_j\,\frac{1}{2} + \pi_{j+1}\,\frac{j + 1}{2M}, \qquad 1 \leq j \leq M - 1,

  and the equations relating \pi_1 to \pi_0 and \pi_M to \pi_{M-1}.
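As a check (not on the slide), these balance equations can be solved numerically from the transition probabilities implied by the model description; the result agrees with the Binomial(M, 1/2) probabilities, a standard fact about the Ehrenfest chain.

```python
import numpy as np
from math import comb

def ehrenfest_stationary(M=10):
    """Build the Ehrenfest transition matrix on states 0..M and solve pi P = pi."""
    P = np.zeros((M + 1, M + 1))
    for j in range(M + 1):
        P[j, j] = 0.5                        # chosen molecule is placed back in its own compartment
        if j > 0:
            P[j, j - 1] = j / (2 * M)        # a molecule leaves A
        if j < M:
            P[j, j + 1] = (M - j) / (2 * M)  # a molecule enters A
    A = np.vstack([P.T - np.eye(M + 1), np.ones(M + 1)])
    b = np.concatenate([np.zeros(M + 1), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

M = 10
pi = ehrenfest_stationary(M)
binom_pmf = np.array([comb(M, k) for k in range(M + 1)]) / 2**M
print(np.allclose(pi, binom_pmf))            # True: stationary law is Binomial(M, 1/2)
```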
A simple proposal-rejection method for random number generation
• Target distribution:

  \pi(x) = \frac{\lambda^x}{x!}\, e^{-\lambda}, \qquad x = 0, 1, \ldots

• Proposal chain: simple random walk.
  Given x_t, next propose y = x_t + 1 or y = x_t - 1, each with probability 0.5.
• Compute the "goodness ratio"

  r = \frac{\pi(y)}{\pi(x_t)} = \begin{cases} \lambda/(x_t + 1), & \text{if } y = x_t + 1 \\ x_t/\lambda, & \text{if } y = x_t - 1 \end{cases}

• Acceptance/rejection:
  Let U ~ Uniform(0, 1). Accept y if U < r; reject otherwise.
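A minimal Python sketch of these steps (not from the slides) for a Poisson(λ) target; the chain length, starting value and seed are arbitrary.

```python
import numpy as np

def poisson_rw_metropolis(lam, n_iter=5000, x0=0, seed=0):
    """Proposal-rejection sampler for the Poisson(lam) target pi(x) = lam^x e^{-lam} / x!,
    using the symmetric random-walk proposal described on the slide."""
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_iter):
        y = x + (1 if rng.random() < 0.5 else -1)   # propose x + 1 or x - 1
        if y == x + 1:
            r = lam / (x + 1)                       # pi(y) / pi(x) for an up-move
        else:
            r = x / lam                             # pi(y) / pi(x) for a down-move (0 when x = 0)
        if rng.random() < r:                        # accept if U < r, otherwise keep x
            x = y
        chain.append(x)
    return np.array(chain)
```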
Random number generation
- an example of a Markov chain proposal-rejection method
• Target distribution:

  \pi(x) = \frac{10^x}{x!}\, e^{-10}, \qquad x = 0, 1, \ldots

[Figure: trace plot of the simulated chain x(t) over the first 60 iterations]
Random number generation
- an example of a Markov chain proposal-rejection method
• Target distribution:

  \pi(x) = \frac{10^x}{x!}\, e^{-10}, \qquad x = 0, 1, \ldots

[Figure: trace plot of the chain x(t) over 1000 iterations and a histogram of the sampled values]
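Plots like the ones summarised above can be reproduced approximately from the sampler sketched earlier; this snippet assumes that `poisson_rw_metropolis` is in scope and uses matplotlib for the two panels.

```python
import numpy as np
import matplotlib.pyplot as plt

chain = poisson_rw_metropolis(lam=10, n_iter=1000)        # target pi(x) = 10^x e^{-10} / x!

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(chain)                                           # trace plot of x(t)
ax1.set_xlabel("t"); ax1.set_ylabel("x(t)")
ax2.hist(chain, bins=np.arange(chain.max() + 2) - 0.5)    # histogram of the sampled values
ax2.set_xlabel("x"); ax2.set_ylabel("Frequency")
plt.show()
```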
Markov Chain Monte Carlo (MCMC) methods
Algorithms for sampling from probability distributions based on
constructing a Markov chain that has the desired distribution as
its equilibrium distribution
The most common application of these algorithms is numerically
calculating multi-dimensional integrals, such as
E\bigl(h(X_1, \ldots, X_k)\bigr) = \int_{\mathbb{R}^k} h(x_1, \ldots, x_k)\, f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k \approx \frac{1}{N - N_0} \sum_{n = N_0 + 1}^{N} h\bigl(X_1^{(n)}, \ldots, X_k^{(n)}\bigr)
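A sketch (not from the slides) of the right-hand side above: an ergodic average over the MCMC output after discarding a burn-in of N_0 iterations. It reuses the `poisson_rw_metropolis` sampler sketched earlier as a stand-in chain.

```python
import numpy as np

def mcmc_expectation(chain, h, burn_in):
    """Ergodic-average estimate of E[h(X)] from MCMC output,
    discarding the first `burn_in` iterations (N_0 on the slide)."""
    samples = np.asarray(chain)[burn_in:]
    return np.mean([h(x) for x in samples])

chain = poisson_rw_metropolis(lam=10, n_iter=20_000)           # assumed defined above
print(mcmc_expectation(chain, lambda x: x, burn_in=1000))      # close to E[X]   = 10
print(mcmc_expectation(chain, lambda x: x**2, burn_in=1000))   # close to E[X^2] = 110
```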
Random walk MCMC methods
• Metropolis-Hastings algorithm: Generates a random walk using a proposal density and a method for rejecting proposed moves.
• Gibbs sampling: Requires that all the conditional distributions of the target distribution can be sampled exactly.
• Slice sampling: Alternates uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position.
Random number generation
- the Metropolis-Hastings algorithm
• Start with any initial value x_0 and a proposal chain T(x, y).
• Suppose x_t has been drawn at time t.
• Draw y ~ T(x_t, y) (i.e. propose a move for the next step).
• Compute the Metropolis ratio (or "goodness ratio")

  r = \frac{\pi(y)\, T(y, x_t)}{\pi(x_t)\, T(x_t, y)}

• Acceptance/rejection:

  x_{t+1} = \begin{cases} y, & \text{with probability } p = \min(1, r) \\ x_t, & \text{with probability } 1 - p \end{cases}
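A generic sketch of these steps (not from the slides); the function and argument names are illustrative, and the standard-normal example at the end is an arbitrary choice of target and proposal.

```python
import numpy as np

def metropolis_hastings(pi, T_sample, T_density, x0, n_iter=10_000, seed=0):
    """Metropolis-Hastings sampler.
    pi        : (possibly unnormalised) target density pi(x)
    T_sample  : draws a proposal y given the current state x
    T_density : evaluates the proposal density T(x, y)
    """
    rng = np.random.default_rng(seed)
    x, chain = x0, []
    for _ in range(n_iter):
        y = T_sample(x, rng)
        r = (pi(y) * T_density(y, x)) / (pi(x) * T_density(x, y))   # Metropolis ratio
        if rng.random() < min(1.0, r):                              # accept with prob. min(1, r)
            x = y
        chain.append(x)
    return np.array(chain)

# Example: standard normal target with a Gaussian random-walk proposal
# (symmetric, so the T terms cancel, but they are kept for generality).
chain = metropolis_hastings(
    pi=lambda x: np.exp(-0.5 * x**2),
    T_sample=lambda x, rng: x + rng.normal(scale=1.0),
    T_density=lambda x, y: np.exp(-0.5 * (y - x)**2),
    x0=0.0,
)
```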
The Gibbs sampler for bivariate distributions
- a simple example
• Let's look at simulating observations of a bivariate normal vector (X, Y) with zero mean and unit variance for the marginals, and a correlation of ρ between the two components.
• [X | Y = y] ~ N(ρy, 1 - ρ^2)
  [Y | X = x] ~ N(ρx, 1 - ρ^2)
• Let's start from, say, x = 10, y = 5. Draw new x's from [X | Y = y] and new y's from [Y | X = x].
[Figure: scatter plot of 100 simulated (x, y) values]
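A sketch of the two-step Gibbs updates just described (not from the slides); ρ = 0.8 and the seed are arbitrary, while the starting point (10, 5) matches the slide.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=100, x0=10.0, y0=5.0, seed=0):
    """Gibbs sampler for a bivariate normal with standard normal marginals and
    correlation rho, alternating draws from the two full conditionals."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    sd = np.sqrt(1 - rho**2)                 # conditional standard deviation
    samples = []
    for _ in range(n_iter):
        x = rng.normal(rho * y, sd)          # draw from [X | Y = y] = N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, sd)          # draw from [Y | X = x] = N(rho*x, 1 - rho^2)
        samples.append((x, y))
    return np.array(samples)

samples = gibbs_bivariate_normal(rho=0.8, n_iter=100)
```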
Gibbs sampler
• The Gibbs sampler is a way to generate empirical distributions of a random vector (X, Y) when the conditional probability distributions F(X | Y) and G(Y | X) are known.
• Start with a random set of possible X's, draw Y's from G(Y | X), then use those Y's to draw X's, and so on indefinitely.
• Keep track of the X's and Y's seen; this will give enough samples to find the unconditional distribution of (X, Y).