Markov Chains and the Metropolis-Hastings (MH) Algorithm
Markov Chains
Markov Chains (1)
A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by transition probabilities. Suppose the random variables $X_0, X_1, \ldots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that corresponds to the network

$$X_0 \to X_1 \to \cdots \to X_{t-1} \to X_t \to X_{t+1} \to \cdots$$
Markov Chains (2)
In a Markov chain, the next state depends only on the current state, not on the earlier history:

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)$$

where $i_0, \ldots, i_{t-1}, i, j \in \Omega$. The right-hand side, $P(X_{t+1} = j \mid X_t = i)$, is the transition probability.

http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf
An Example of a Markov Chain
The state space is $\Omega = \{1, 2, 3, 4, 5\}$ and $X = (X_0, X_1, \ldots, X_t, \ldots)$, where $X_0$ is the initial state and so on. $P$ is the transition matrix:

$$P = \begin{pmatrix}
0.4 & 0.6 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.5 & 0.0 & 0.0 \\
0.0 & 0.3 & 0.0 & 0.7 & 0.0 \\
0.0 & 0.0 & 0.1 & 0.3 & 0.6 \\
0.0 & 0.3 & 0.0 & 0.5 & 0.2
\end{pmatrix}$$
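As a minimal sketch (not part of the original slides), the chain above can be simulated by repeatedly sampling the next state from the row of $P$ indexed by the current state; states 1-5 map to row indices 0-4, and the helper name `simulate` is an assumption of this sketch:

```python
import numpy as np

# Transition matrix from the example; row i is the distribution of the
# next state given that the current state is i (states 1..5 -> 0..4).
P = np.array([
    [0.4, 0.6, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.7, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
    [0.0, 0.3, 0.0, 0.5, 0.2],
])

def simulate(P, x0, steps, seed=0):
    """Return one trajectory X_0, X_1, ..., X_steps of the chain."""
    rng = np.random.default_rng(seed)
    path = [x0]
    for _ in range(steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

print(simulate(P, x0=0, steps=10))  # a short sample trajectory
```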
Definition (1)
Define the probability of going from state $i$ to state $j$ in $n$ time steps as

$$p_{ij}^{(n)} = P(X_{t+n} = j \mid X_t = i)$$

A state $j$ is accessible from state $i$ if there is an $n$ such that $p_{ij}^{(n)} > 0$, where $n = 0, 1, \ldots$

A state $i$ is said to communicate with state $j$ (denoted $i \leftrightarrow j$) if it is true both that $i$ is accessible from $j$ and that $j$ is accessible from $i$.
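A hedged sketch of these definitions: the $n$-step probability $p_{ij}^{(n)}$ is the $(i, j)$ entry of the matrix power $P^n$, so accessibility can be checked numerically (reusing the matrix `P` from the earlier sketch; the search bound is an assumption of this sketch):

```python
from numpy.linalg import matrix_power

def accessible(P, i, j):
    """True if some n in 0..|Omega| has p_ij^(n) = (P^n)[i, j] > 0;
    a shortest path between two states never needs more steps than that."""
    return any(matrix_power(P, n)[i, j] > 0 for n in range(len(P) + 1))

def communicates(P, i, j):
    """i <-> j iff each state is accessible from the other."""
    return accessible(P, i, j) and accessible(P, j, i)
```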
Definition (2)
A state $i$ has period $d_i$ if any return to state $i$ must occur in multiples of $d_i$ time steps. Formally, the period of a state is defined as

$$d_i = \gcd\{n : P_{ii}^{(n)} > 0\}$$

If $d_i = 1$, then the state is said to be aperiodic; otherwise ($d_i > 1$), the state is said to be periodic with period $d_i$.
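As a sketch, the period can be approximated by taking the gcd over return times up to a truncation bound (the bound `max_n` is an assumption of this sketch, not part of the definition):

```python
from functools import reduce
from math import gcd

from numpy.linalg import matrix_power

def period(P, i, max_n=50):
    """d_i = gcd{ n >= 1 : (P^n)[i, i] > 0 }, scanning n up to max_n."""
    returns = [n for n in range(1, max_n + 1) if matrix_power(P, n)[i, i] > 0]
    return reduce(gcd, returns) if returns else 0
```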
Definition (3)
A set of states $C$ is a communicating class if every pair of states in $C$ communicates with each other. Every state in a communicating class must have the same period.

Example: in the five-state chain above, every state communicates with every other (e.g. $1 \to 2 \to 3 \to 4 \to 5$ and $5 \to 2 \to 1$), so $\{1, 2, 3, 4, 5\}$ is a single communicating class.
Definition (4)
A finite Markov chain is said to be irreducible if its state space $\Omega$ is a single communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.

Example: the five-state chain above is irreducible, since its whole state space is one communicating class.
Definition (5)
A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.

Example: the five-state chain above is ergodic: it is irreducible, and $P_{11} = 0.4 > 0$ makes state 1 (and hence every state in its communicating class) aperiodic.
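Combining the sketches above, both conditions can be checked numerically for the example chain (`P`, `communicates`, and `period` are the hypothetical helpers defined earlier):

```python
# Using P, communicates(), and period() from the earlier sketches:
irreducible = all(communicates(P, i, j) for i in range(5) for j in range(5))
aperiodic = all(period(P, i) == 1 for i in range(5))
print(irreducible and aperiodic)  # True: the finite chain is ergodic
```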
Definition (6)
A state $i$ is said to be transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$.

Formally, let the random variable $T_i$ be the first return time to state $i$ (the "hitting time"):

$$T_i = \min\{n : X_n = i \mid X_0 = i\}$$

Then, state $i$ is transient iff $P(T_i < \infty) < 1$.
Definition (7)
A state $i$ is said to be recurrent or persistent iff $P(T_i < \infty) = 1$.

The mean recurrence time is $\mu_i = E[T_i]$. State $i$ is positive recurrent if $\mu_i$ is finite; otherwise, state $i$ is null recurrent.

A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
Stationary Distributions
Theorem: If a Markov chain is irreducible and aperiodic, then

$$P_{ij}^{(n)} \to \frac{1}{\mu_j} \quad \text{as } n \to \infty, \quad \forall i, j$$

Theorem: If a Markov chain is irreducible and aperiodic, then there exists a unique $\pi$ such that $\pi_j = \lim_{n \to \infty} P(X_n = j) > 0$ and

$$\pi_j = \sum_i \pi_i P_{ij}, \qquad \sum_j \pi_j = 1$$

where $\pi$ is the stationary distribution.
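A hedged numerical check of these theorems on the five-state example: all rows of $P^n$ converge to the same vector $\pi$, and $1/\pi_j$ recovers the mean recurrence times $\mu_j$ (reusing `P` from the earlier sketch):

```python
import numpy as np

# Rows of P^n converge to pi regardless of the starting state,
# and the limit entries are 1/mu_j by the first theorem.
Pn = np.linalg.matrix_power(P, 200)
print(Pn[0])      # ~ pi, and (approximately) the same for every row
print(1 / Pn[0])  # ~ mean recurrence times mu_j
```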
Definition (8)
A Markov chain is said to be reversible if there is a stationary distribution $\pi$ such that

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \forall i, j$$

Theorem: if a Markov chain is reversible, then $\pi_j = \sum_i \pi_i P_{ij}$. (Summing the reversibility condition over $i$ gives $\sum_i \pi_i P_{ij} = \pi_j \sum_i P_{ji} = \pi_j$.)
An Example of Stationary Distributions
A Markov chain with three states and transition matrix

$$P = \begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}$$

The stationary distribution is $\pi = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$, since

$$\begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix}
\begin{pmatrix}
0.7 & 0.3 & 0.0 \\
0.3 & 0.4 & 0.3 \\
0.0 & 0.3 & 0.7
\end{pmatrix}
= \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix}$$
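As a sketch, $\pi$ can also be recovered numerically as the left eigenvector of $P$ for eigenvalue 1 (equivalently, a right eigenvector of $P^{\mathsf{T}}$), normalized to sum to 1:

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

# pi P = pi  <=>  P.T pi.T = pi.T: take the eigenvector of P.T whose
# eigenvalue is largest (it equals 1 for a stochastic matrix).
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()  # normalize (this also fixes an arbitrary sign)
print(pi)           # ~ [1/3, 1/3, 1/3]
```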
Properties of Stationary Distributions
Regardless of the starting point, an irreducible and aperiodic Markov chain converges to its stationary distribution.

The rate of convergence depends on properties of the transition matrix; for a finite chain it is governed by the second-largest eigenvalue modulus of $P$.
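A hedged illustration on the three-state example above: the second-largest eigenvalue modulus (SLEM) controls how fast the rows of $P^n$ approach $\pi$:

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])

moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
slem = moduli[1]  # largest modulus is 1; the next one sets the rate
for n in (1, 5, 10, 20):
    err = np.abs(np.linalg.matrix_power(P, n) - 1/3).max()
    print(n, err, slem ** n)  # err shrinks roughly in proportion to slem**n
```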
Markov Chain Monte Carlo
Markov Chain Monte Carlo

MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.

The state of the chain after a large number of steps is then used as a sample from the desired distribution.
http://en.wikipedia.org/wiki/MCMC
Metropolis-Hastings Algorithm
Metropolis-Hastings Algorithm (1)
The Metropolis-Hastings algorithm can draw samples from any probability distribution $\Pi(x)$, requiring only that a function proportional to the density can be calculated at $x$.

The process has three steps:
1. Set up a Markov chain;
2. Run the chain until stationary;
3. Estimate with Monte Carlo methods.

http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm
Metropolis-Hastings Algorithm (2)
Let $\pi = (\pi_1, \ldots, \pi_n)$ be a probability density (or mass) function (pdf or pmf). $f$ is any function, and we want to estimate

$$I = E_\pi[f] = \sum_{i=1}^{n} f(i)\,\pi_i$$

Construct $P = \{P_{ij}\}$, the transition matrix of an irreducible Markov chain with states $1, 2, \ldots, n$, where

$$P_{ij} = \Pr(X_{t+1} = j \mid X_t = i), \qquad X_t \in \{1, 2, \ldots, n\}$$

and $\pi$ is its unique stationary distribution.
Metropolis-Hastings Algorithm (3)
Run this Markov chain for times $t = 1, \ldots, N$ and calculate the Monte Carlo sum

$$\hat{I} = \frac{1}{N} \sum_{t=1}^{N} f(X_t)$$

then $\hat{I} \to I$ as $N \to \infty$.

Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
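A minimal sketch of this estimator on the five-state chain, reusing the `simulate` helper from the earlier sketch; the test function `f` is an arbitrary illustrative choice, not from the slides:

```python
import numpy as np

# Estimate I = E_pi[f] by averaging f along one long trajectory.
f = lambda i: (i + 1) ** 2  # arbitrary function of the state (states 0..4)
path = simulate(P, x0=0, steps=100_000)
I_hat = np.mean([f(x) for x in path])
print(I_hat)  # approaches sum_i f(i) pi_i as the trajectory grows
```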
Metropolis-Hastings Algorithm (4)
In order to perform this method for a given distribution $\pi$, we must construct a Markov chain transition matrix $P$ with $\pi$ as its stationary distribution, i.e. $\pi P = \pi$.

Suppose the matrix $P$ is made to satisfy the reversibility condition: for all $i$ and $j$,

$$\pi_i P_{ij} = \pi_j P_{ji}$$

This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence that $\pi$ is a stationary distribution for $P$.
Metropolis-Hastings Algorithm (5)
Let a proposal $Q = \{Q_{ij}\}$ be irreducible, where $Q_{ij} = \Pr(X_{t+1} = j \mid X_t = i)$, and let the range of $Q$ equal the range of $\pi$. However, $\pi$ does not have to be a stationary distribution of $Q$.

Process: tweak $Q_{ij}$ to yield $\pi$:

states from $Q_{ij}$ (not $\pi$) $\xrightarrow{\text{tweak}}$ states from $P_{ij}$ ($\pi$)
Metropolis-Hastings Algorithm (6)
We assume that $P_{ij}$ has the form

$$P_{ij} = Q_{ij}\,\alpha(i, j) \quad (i \neq j), \qquad P_{ii} = 1 - \sum_{j \neq i} P_{ij},$$

where $\alpha(i, j)$ is called the acceptance probability, i.e. given $X_t = i$, take

$$X_{t+1} = \begin{cases} j & \text{with probability } \alpha(i, j) \\ i & \text{with probability } 1 - \alpha(i, j) \end{cases}$$
Metropolis-Hastings Algorithm (7)
For $i \neq j$, we want $\pi_i P_{ij} = \pi_j P_{ji}$, i.e.

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) \qquad (*)$$

WLOG, suppose that for some $(i, j)$, $\pi_i Q_{ij} > \pi_j Q_{ji}$. In order to achieve equality in $(*)$, one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.
Metropolis-Hastings Algorithm (8)
Then

$$\pi_i Q_{ij}\,\alpha(i, j) = \pi_j Q_{ji}\,\alpha(j, i) = \pi_j Q_{ji}$$

$$\alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}$$

These arguments imply that the acceptance probability $\alpha(i, j)$ must be

$$\alpha(i, j) = \min\left(1, \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}\right)$$
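For instance (an illustrative example, not from the slides), with $\pi = (0.2, 0.8)$ and a symmetric proposal ($Q_{ij} = Q_{ji}$), the ratio reduces to $\pi_j / \pi_i$:

$$\alpha(1, 2) = \min\left(1, \frac{0.8}{0.2}\right) = 1, \qquad \alpha(2, 1) = \min\left(1, \frac{0.2}{0.8}\right) = 0.25$$

Moves toward higher probability are always accepted; moves toward lower probability are accepted only part of the time.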
Metropolis-Hastings Algorithm (9)
M-H Algorithm:

Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probabilities $Q_{ij}$.

Step 2: Let $t = 0$ and initialize $X_0$ from the states of $Q$.

Step 3 (Proposal Step): Given $X_t = i$, sample $Y = j$ from $Q_{iY}$.
Metropolis-Hastings Algorithm (10)
M-H Algorithm (cont.):

Step 4 (Acceptance Step): Generate a random number $U$ from $\mathrm{Unif}(0, 1)$. If $U \leq \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.

Step 5: Let $t = t + 1$, and repeat Steps 3-5 until convergence.
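Putting Steps 1-5 together, a minimal self-contained sketch of the discrete M-H sampler. The uniform proposal and the target `pi` below are illustrative assumptions, not from the slides; with a symmetric proposal the $Q$ terms cancel in the acceptance ratio:

```python
import numpy as np

def metropolis_hastings(pi, steps, seed=0):
    """Discrete M-H with a uniform proposal Q_ij = 1/n (symmetric)."""
    rng = np.random.default_rng(seed)
    n = len(pi)
    x = int(rng.integers(n))             # Step 2: initialize X_0
    samples = []
    for _ in range(steps):
        y = int(rng.integers(n))         # Step 3: propose Y = j from Q
        alpha = min(1.0, pi[y] / pi[x])  # Step 4: acceptance probability
        if rng.uniform() <= alpha:       # accept Y with probability alpha,
            x = y                        # ... else stay at X_t
        samples.append(x)                # Step 5: advance t and repeat
    return samples

pi = np.array([0.1, 0.2, 0.3, 0.4])      # illustrative target distribution
samples = metropolis_hastings(pi, steps=50_000)
# Empirical state frequencies approach pi (discard early samples in practice):
print(np.bincount(samples, minlength=len(pi)) / len(samples))
```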