Markov Chains and the Metropolis-Hastings (MH) Algorithm
Markov Chains
Markov Chains (1)
A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by transition probabilities. Suppose the random variables $X_0, X_1, \ldots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that corresponds to the network

$$X_0 \to X_1 \to \cdots \to X_{t-1} \to X_t \to X_{t+1} \to \cdots$$
Markov Chains (2)
In a Markov chain, the next state depends only on the current state, not on the earlier history:

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)$$

where $i_0, \ldots, i_{t-1}, i, j \in \Omega$. The right-hand side, $P(X_{t+1} = j \mid X_t = i)$, is the transition probability.

http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf
An Example of a Markov Chain
The state space is $\Omega = \{1, 2, 3, 4, 5\}$ and $X = (X_0, X_1, \ldots, X_t, \ldots)$, where $X_0$ is the initial state and so on. $P$ is the transition matrix:

$$P = \begin{pmatrix}
0.4 & 0.6 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.3 & 0.0 & 0.7 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.3 & 0.6 \\
0.0 & 0.3 & 0.0 & 0.5 & 0.2
\end{pmatrix}$$
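As a minimal sketch (not part of the original slides), the chain above can be simulated by repeatedly sampling the next state from the row of $P$ indexed by the current state; states 1-5 map to row indices 0-4, and the helper name `simulate` is an assumption of this sketch:

```python
import numpy as np

# Transition matrix from the example; row i is the distribution of the
# next state given that the current state is i (states 1..5 -> 0..4).
P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])

def simulate(P, x0, steps, seed=0):
    """Return one trajectory X_0, X_1, ..., X_steps of the chain."""
    rng = np.random.default_rng(seed)
    path = [x0]
    for _ in range(steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

print(simulate(P, x0=0, steps=10))  # a short sample trajectory
```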
Definition (1)
Define the probability of going from state $i$ to state $j$ in $n$ time steps as

$$p_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i)$$

A state $j$ is accessible from state $i$ if there is an $n$ such that $p_{ij}^{(n)} > 0$, where $n = 0, 1, \ldots$

A state $i$ is said to communicate with state $j$ (denoted $i \leftrightarrow j$) if it is true both that $i$ is accessible from $j$ and that $j$ is accessible from $i$.
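A hedged sketch of these definitions: the $n$-step probability $p_{ij}^{(n)}$ is the $(i, j)$ entry of the matrix power $P^n$, so accessibility can be checked numerically (reusing the matrix `P` from the earlier sketch; the search bound is an assumption of this sketch):

```python
from numpy.linalg import matrix_power

def accessible(P, i, j):
    """True if some n in 0..|Omega| has p_ij^(n) = (P^n)[i, j] > 0;
    a shortest path between two states never needs more steps than that."""
    return any(matrix_power(P, n)[i, j] > 0 for n in range(len(P) + 1))

def communicates(P, i, j):
    """i <-> j iff each state is accessible from the other."""
    return accessible(P, i, j) and accessible(P, j, i)
```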
Definition (2)
A state $i$ has period $d_i$ if any return to state $i$ must occur in multiples of $d_i$ time steps. Formally, the period of a state is defined as

$$d_i = \gcd\{n : P_{ii}^{(n)} > 0\}$$

If $d_i = 1$, then the state is said to be aperiodic; otherwise ($d_i > 1$), the state is said to be periodic with period $d_i$.
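As a sketch, the period can be approximated by taking the gcd over return times up to a truncation bound (the bound `max_n` is an assumption of this sketch, not part of the definition):

```python
from functools import reduce
from math import gcd

from numpy.linalg import matrix_power

def period(P, i, max_n=50):
    """d_i = gcd{ n >= 1 : (P^n)[i, i] > 0 }, scanning n up to max_n."""
    returns = [n for n in range(1, max_n + 1) if matrix_power(P, n)[i, i] > 0]
    return reduce(gcd, returns) if returns else 0
```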
Definition (3)
A set of states $C$ is a communicating class if every pair of states in $C$ communicates with each other. Every state in a communicating class must have the same period.

Example: in the five-state chain above, every state communicates with every other (e.g. $1 \to 2 \to 3 \to 4 \to 5$ and $5 \to 2 \to 1$), so $\{1, 2, 3, 4, 5\}$ is a single communicating class.
Definition (4)
A finite Markov chain is said to be irreducible if its state space $\Omega$ is a single communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.

Example: the five-state chain above is irreducible, since its whole state space is one communicating class.
Definition (5)
A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.

Example: the five-state chain above is ergodic: it is irreducible, and $P_{11} = 0.4 > 0$ makes state 1 (and hence every state in its communicating class) aperiodic.
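Combining the sketches above, both conditions can be checked numerically for the example chain (`P`, `communicates`, and `period` are the hypothetical helpers defined earlier):

```python
# Using P, communicates(), and period() from the earlier sketches:
irreducible = all(communicates(P, i, j) for i in range(5) for j in range(5))
aperiodic = all(period(P, i) == 1 for i in range(5))
print(irreducible and aperiodic)  # True: the finite chain is ergodic
```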
Definition (6)
A state $i$ is said to be transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$.

Formally, let the random variable $T_i$ be the first return time to state $i$ (the "hitting time"):

$$T_i = \min\{n : X_n = i \mid X_0 = i\}$$

Then, state $i$ is transient iff $P(T_i < \infty) < 1$.
Definition (7)
A state $i$ is said to be recurrent or persistent iff $P(T_i < \infty) = 1$.

The mean recurrence time is $\mu_i = E[T_i]$. State $i$ is positive recurrent if $\mu_i$ is finite; otherwise, state $i$ is null recurrent.

A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
Stationary Distributions
Theorem: If a Markov chain is irreducible and aperiodic, then

$$P_{ij}^{(n)} \to \frac{1}{\mu_j} \quad \text{as } n \to \infty, \quad \forall i, j$$

Theorem: If a Markov chain is irreducible and aperiodic, then there exists a unique $\pi$ such that $\pi_j = \lim_{n \to \infty} P(X_n = j) > 0$ and

$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1$$

where $\pi$ is the stationary distribution.
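A hedged numerical check of these theorems on the five-state example: all rows of $P^n$ converge to the same vector $\pi$, and $1/\pi_j$ recovers the mean recurrence times $\mu_j$ (reusing `P` from the earlier sketch):

```python
import numpy as np

# Rows of P^n converge to pi regardless of the starting state,
# and the limit entries are 1/mu_j by the first theorem.
Pn = np.linalg.matrix_power(P, 200)
print(Pn[0])      # ~ pi, and (approximately) the same for every row
print(1 / Pn[0])  # ~ mean recurrence times mu_j
```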
Definition (8)
A Markov chain is said to be reversible if there is a stationary distribution $\pi$ such that

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \forall i, j$$

Theorem: if a Markov chain is reversible, then $\pi_j = \sum_i \pi_i P_{ij}$. (Summing the reversibility condition over $i$ gives $\sum_i \pi_i P_{ij} = \pi_j \sum_i P_{ji} = \pi_j$.)
An Example of Stationary Distributions
A Markov chain with three states and transition matrix

$$P = \begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}$$

The stationary distribution is $\pi = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$, since

$$\begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix}
\begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}
= \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix}$$
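As a sketch, $\pi$ can also be recovered numerically as the left eigenvector of $P$ for eigenvalue 1 (equivalently, a right eigenvector of $P^{\mathsf{T}}$), normalized to sum to 1:

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

# pi P = pi  <=>  P.T pi.T = pi.T: take the eigenvector of P.T whose
# eigenvalue is largest (it equals 1 for a stochastic matrix).
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()  # normalize (this also fixes an arbitrary sign)
print(pi)           # ~ [1/3, 1/3, 1/3]
```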
Properties of Stationary Distributions
Regardless of the starting point, an irreducible and aperiodic Markov chain converges to its stationary distribution.

The rate of convergence depends on properties of the transition matrix; for a finite chain it is governed by the second-largest eigenvalue modulus of $P$.
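A hedged illustration on the three-state example above: the second-largest eigenvalue modulus (SLEM) controls how fast the rows of $P^n$ approach $\pi$:

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
slem = moduli[1]  # largest modulus is 1; the next one sets the rate
for n in (1, 5, 10, 20):
    err = np.abs(np.linalg.matrix_power(P, n) - 1/3).max()
    print(n, err, slem ** n)  # err shrinks roughly in proportion to slem**n
```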
Markov Chain Monte Carlo
Markov Chain Monte Carlo

MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.

The state of the chain after a large number of steps is then used as a sample from the desired distribution.
http://en.wikipedia.org/wiki/MCMC
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
The Metropolis-Hastings algorithm can draw samples from any probability distribution $\Pi(x)$, requiring only that a function proportional to the density can be calculated at $x$.

The process has three steps:
1. Set up a Markov chain;
2. Run the chain until stationary;
3. Estimate with Monte Carlo methods.

http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let $\pi = (\pi_1, \ldots, \pi_n)$ be a probability density (or mass) function (pdf or pmf). $f$ is any function, and we want to estimate

$$I = E_\pi[f] = \sum_{i=1}^{n} f(i)\,\pi_i$$

Construct $P = \{P_{ij}\}$, the transition matrix of an irreducible Markov chain with states $1, 2, \ldots, n$, where

$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i), \qquad X_t \in \{1, 2, \ldots, n\}$$

and $\pi$ is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times $t = 1, \ldots, N$ and calculate the Monte Carlo sum

$$\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t)$$

then $\hat{I} \to I$ as $N \to \infty$.

Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
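A minimal sketch of this estimator on the five-state chain, reusing the `simulate` helper from the earlier sketch; the test function `f` is an arbitrary illustrative choice, not from the slides:

```python
import numpy as np

# Estimate I = E_pi[f] by averaging f along one long trajectory.
f = lambda i: (i + 1) ** 2  # arbitrary function of the state (states 0..4)
path = simulate(P, x0=0, steps=100_000)
I_hat = np.mean([f(x) for x in path])
print(I_hat)  # approaches sum_i f(i) pi_i as the trajectory grows
```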
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution $\pi$, we must construct a Markov chain transition matrix $P$ with $\pi$ as its stationary distribution, i.e. $\pi P = \pi$.

Suppose the matrix $P$ is made to satisfy the reversibility condition: for all $i$ and $j$,

$$\pi_i P_{ij} = \pi_j P_{ji}$$

This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence that $\pi$ is a stationary distribution for $P$.
Metropolis-Hastings Algorithm (5)
Let a proposal $Q = \{Q_{ij}\}$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and let the range of $Q$ equal the range of $\pi$. However, $\pi$ does not have to be a stationary distribution of $Q$.

Process: tweak $Q_{ij}$ to yield $\pi$:

states from $Q_{ij}$ (not $\pi$) $\xrightarrow{\text{tweak}}$ states from $P_{ij}$ ($\pi$)
Metropolis-Hastings Algorithm (6)
We assume that $P_{ij}$ has the form

$$P_{ij} = Q_{ij}\,\alpha(i, j) \quad (i \neq j), \qquad P_{ii} = 1 - \sum_{j \neq i} P_{ij},$$

where $\alpha(i, j)$ is called the acceptance probability, i.e. given $X_t = i$, take

$$X_{t+1} = \begin{cases} j & \text{with probability } \alpha(i, j) \\ i & \text{with probability } 1 - \alpha(i, j) \end{cases}$$
Metropolis-Hastings Algorithm (7)
For $i \neq j$, we want $\pi_i P_{ij} = \pi_j P_{ji}$, i.e.

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) \qquad (*)$$

WLOG, suppose that for some $(i, j)$, $\pi_i Q_{ij} > \pi_j Q_{ji}$. In order to achieve equality in $(*)$, one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.
Metropolis-Hastings Algorithm (8)
Then

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) = \pi_j Q_{ji}$$

$$\alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}$$

These arguments imply that the acceptance probability $\alpha(i, j)$ must be

$$\alpha(i, j) = \min\left(1, \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right)$$
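For instance (an illustrative example, not from the slides), with $\pi = (0.2, 0.8)$ and a symmetric proposal ($Q_{ij} = Q_{ji}$), the ratio reduces to $\pi_j / \pi_i$:

$$\alpha(1, 2) = \min\left(1, \frac{0.8}{0.2}\right) = 1, \qquad \alpha(2, 1) = \min\left(1, \frac{0.2}{0.8}\right) = 0.25$$

Moves toward higher probability are always accepted; moves toward lower probability are accepted only part of the time.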
Metropolis-Hastings Algorithm (9)
M-H Algorithm:

Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probabilities $Q_{ij}$.

Step 2: Let $t = 0$ and initialize $X_0$ from the states of $Q$.

Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{iY}$.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):

Step 4 (Acceptance Step): Generate a random number $U$ from $\mathrm{Unif}(0, 1)$. If $U \leq \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.

Step 5: Let $t = t + 1$, and repeat Steps 3-5 until convergence.
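Putting Steps 1-5 together, a minimal self-contained sketch of the discrete M-H sampler. The uniform proposal and the target `pi` below are illustrative assumptions, not from the slides; with a symmetric proposal the $Q$ terms cancel in the acceptance ratio:

```python
import numpy as np

def metropolis_hastings(pi, steps, seed=0):
    """Discrete M-H with a uniform proposal Q_ij = 1/n (symmetric)."""
    rng = np.random.default_rng(seed)
    n = len(pi)
    x = int(rng.integers(n))             # Step 2: initialize X_0
    samples = []
    for _ in range(steps):
        y = int(rng.integers(n))         # Step 3: propose Y = j from Q
        alpha = min(1.0, pi[y] / pi[x])  # Step 4: acceptance probability
        if rng.uniform() <= alpha:       # accept Y with probability alpha,
            x = y                        # ... else stay at X_t
        samples.append(x)                # Step 5: advance t and repeat
    return samples

pi = np.array([0.1, 0.2, 0.3, 0.4])      # illustrative target distribution
samples = metropolis_hastings(pi, steps=50_000)
# Empirical state frequencies approach pi (discard early samples in practice):
print(np.bincount(samples, minlength=len(pi)) / len(samples))
```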