Transcript of lecture slides (PPT)

STAT 534: Statistical Computing
Hari Narayanan
[email protected]
Course objectives
• Write programs in R and C tailored to specifics
of statistics problems you want to solve
• Familiarize yourself with:
– optimization techniques
– Markov Chain Monte Carlo (MCMC)
Logistics
• Class:
– Tuesdays and Thursdays 12:00pm – 1:20pm
• Office hours:
– Thursday 2:30pm – 4pm (Padelford B-301) or by appt
• Textbooks:
– Robert & Casella : Introducing Monte Carlo Methods with
R
– Kernighan & Ritchie : The C Programming Language
• Evaluation:
– 4 assignments
– 2 quizzes
– Final project
Note to students
• These slides contain more material than
was covered in class.
• The main source,
http://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf,
is referred to at
http://faculty.washington.edu/harin/stat534-lectures.html
Review of MCMC done so far
• Natural questions
1. What is the Monte Carlo method?
An elementary example: (a) finding the area of an irregular
region in the plane;
(b) integrating a nonnegative function over an
area.
(a) Area of an irregular region
• Bound the region by a rectangle
• Pick points uniformly within the rectangle
• Area of region / Area of rectangle ≈
(Number inside) / Total
Example:
Area of a circle of radius 1 / (area of square of side 2) = π/4
Approximating Pi
• Draw a circle with radius = 1
inside a 2x2 square, [-1,1] x [-1,1]
• Pick points (x,y) uniformly, taking
x <- runif(n, min=-1, max=1); y <- runif(n,
min=-1, max=1)
• Let int be the number of points
which fell inside the circle.
approx_pi = 4*int/n
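The slide's recipe can be sketched as a short R program (the sample size n is a free choice; note the use of runif rather than rnorm, since the points must be uniform in the square):

```r
# Monte Carlo estimate of pi: sample uniform points in [-1,1] x [-1,1]
# and count the fraction that lands inside the unit circle.
set.seed(1)
n <- 100000
x <- runif(n, min = -1, max = 1)
y <- runif(n, min = -1, max = 1)
inside <- sum(x^2 + y^2 <= 1)   # points that fell inside the circle
approx_pi <- 4 * inside / n     # (number in)/total estimates pi/4
approx_pi
```

For n = 100000 the estimate is typically within about 0.01 of π.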
Integrating a nonnegative function
over an area
• Pick points according to a suitable distribution
How to pick points according to a given
distribution?
Picking points randomly
• for area of irregular region pick points uniformly at
random within a larger regular region
and count how many fall within the region in question.
• for finding the integral of a nonnegative function over
an irregular region, pick points according to a suitable
distribution
How to pick points according to a given distribution?
The Markov chain method
Pick a Markov chain whose steady-state
distribution is the desired distribution.
Start from any point and let the Markov chain run for
some time.
Pick the point you reach.
You will have picked the point you reached
with approximately the steady-state probability for
that point.
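A minimal R sketch of this idea, using an illustrative 3-state chain (the matrix P below is an assumption, chosen so that its steady-state distribution is (0.25, 0.5, 0.25)); the long-run fraction of time spent in each state approximates that distribution:

```r
# Run a small Markov chain for many steps and compare visit
# frequencies to the known stationary distribution.
set.seed(2)
P <- matrix(c(0.50, 0.50, 0.00,
              0.25, 0.50, 0.25,
              0.00, 0.50, 0.50), nrow = 3, byrow = TRUE)
steps <- 200000
state <- 1                     # start from any point
visits <- numeric(3)
for (t in 1:steps) {
  state <- sample(1:3, 1, prob = P[state, ])  # one step of the chain
  visits[state] <- visits[state] + 1
}
visits / steps                 # close to (0.25, 0.5, 0.25)
```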
Markov Chain + filter=New MC
• Here you begin with a certain Markov chain
with a symmetric transition matrix
• But you make your move to the next point
with an additional condition of acceptance or
rejection
Details of the Metropolis Filter
• Let T be a symmetric transition matrix.
(T is reversible with respect to the uniform
probability distribution on the sample space Ω.)
We will modify the transitions made according to T
to obtain a chain with stationary distribution π
Details of Metropolis Filter
• Evolution of the new chain :
• When at state x, a candidate move is generated from T(x, ·).
• If the proposed new state is y, then the move is censored with
probability 1 − a(x,y): with probability a(x,y) the
new state is the proposed state y, and with probability
1 − a(x,y) the chain remains at x.
Note : This is a new Markov chain with transition matrix, say P
What is the new transition matrix?
• P(x,y) = T(x,y) a(x,y)                      if y ≠ x
P(x,x) = 1 − ∑ over z≠x (T(x,z)a(x,z))
We know transition matrix P has stationary
distribution π if
π(x)T(x,y)a(x,y)= π(y)T(y,x)a(y,x) for all x not
equal to y.
Transition matrix P has stationary distribution π if
π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x) for all x not equal to y.
• Proof: For π to be a stationary distribution of the
transition matrix P(x,y), we need
πP = π, i.e. ∑ over x (π(x)P(x,y)) = π(y).
For x ≠ y, P(x,y) = T(x,y)a(x,y), so by the condition above
π(x)P(x,y) = π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x) = π(y)P(y,x);
this also holds trivially when x = y.
So ∑ over x (π(x)P(x,y)) = ∑ over x (π(y)P(y,x))
= π(y) (∑ over x P(y,x)) = π(y), since ∑ over x P(y,x) = 1.
Back to the filter
• The acceptance probability of a move is a(x,y).
• Rejection of moves slows the chain.
• So make a(x,y) as large as possible.
• How large can it be?
• We will force the `nice’ condition
π(x)T(x,y)a(x,y) = π(y)T(y,x)a(y,x), i.e. (T being symmetric)
π(x)a(x,y) = π(y)a(y,x).
• Since a(x,y) and a(y,x) are probabilities (each ≤ 1),
π(x)a(x,y) ≤ π(x) and π(x)a(x,y) = π(y)a(y,x) ≤ π(y).
• So a(x,y) ≤ min(π(y)/π(x), 1).
Metropolis chain of a Markov chain
T(x,y): transition matrix of the original Markov chain.
P(x,y): transition matrix of the Metropolis chain,
which is therefore defined as follows:
P(x,y) = T(x,y) min(π(y)/π(x), 1)   if y ≠ x
P(x,x) = 1 − ∑ over z≠x (P(x,z))
For this P(x,y), π is a stationary distribution.
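As a numerical check of this construction, the R sketch below builds P from an illustrative symmetric proposal Tmat and target pi_target (both chosen arbitrarily for the demonstration, not from the slides) and verifies that πP = π:

```r
# Build the Metropolis transition matrix from a symmetric proposal and
# a target distribution, then check stationarity numerically.
m <- 4
Tmat <- matrix(1 / m, m, m)      # symmetric proposal: uniform moves
pi_target <- c(0.1, 0.2, 0.3, 0.4)
P <- matrix(0, m, m)
for (x in 1:m) for (y in 1:m) {
  if (y != x) P[x, y] <- Tmat[x, y] * min(pi_target[y] / pi_target[x], 1)
}
diag(P) <- 1 - rowSums(P)        # holding probability P(x, x)
max(abs(as.vector(pi_target %*% P) - pi_target))   # ~ 0: pi is stationary
```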
Why all this complication?
• Note that the Metropolis chain is `just another
Markov chain’.
So why not start with it right away instead of
starting with some other Markov chain and
then doing this acceptance/rejection?
Reasons for the indirect procedure
1. It may be difficult to build a Markov chain
directly for a stationary distribution π.
2. The Metropolis chain depends only on the
ratio π(y)/π(x). Suppose we know π only up to
some normalizing constant M which is
difficult to compute. The Metropolis chain does
not care what M is.
Numbers assigned to vertices of a
graph. Find max value vertex
• If you `greedily’ move from any point to a
higher neighbour you can get trapped
at a local maximum
Consider a regular graph.
A random walk on it has a symmetric transition matrix.
For k ≥ 1, define πk(x) = k^(f(x))/Z(k),
where f(x) is the number assigned to vertex x and
Z(k) = ∑ over x (k^(f(x))), making πk(x) a
probability measure.
Suppose number of vertices Large
• A Markov chain targeting πk is difficult to
build directly because Z(k) is difficult to compute.
• But the Metropolis chain of the `random walk
Markov chain’ accepts a proposed move from x to y with
probability min(k^(f(y)−f(x)), 1).
We will show that for large k this randomized
algorithm ultimately gets to a maximum-value vertex.
Proof that Metropolis method hits max
value vertex
Since πk(x) = k^(f(x))/Z(k), this distribution
favours vertices with large values of f(x). Let f* be the maximum value
that a vertex has.
Define Ω* = the collection of all vertices with the maximum value.
• Now πk(x) can be rewritten as
(k^(f(x))/k^(f*)) / (Z(k)/k^(f*)).
• Z(k)/k^(f*) can be written as |Ω*| plus the sum,
over the remaining vertices, of k^(f(x)−f*); each of those terms
tends to 0 as k grows.
• For large k, therefore, πk(x) converges to 1/|Ω*| for x in Ω*
and to 0 otherwise,
• i.e. with probability approaching 1 we hit one of the vertices with the maximum value.
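The whole argument can be sketched in R on a small example: a random walk on a 6-cycle (a regular graph) with an illustrative vertex assignment f that has a local maximum, filtered with acceptance probability min(k^(f(y)−f(x)), 1). Both f and k are assumptions chosen for the demonstration.

```r
# Metropolis search for the max-value vertex of a cycle graph.
set.seed(3)
f <- c(1, 3, 2, 1, 5, 1)   # vertex values; local max at vertex 2, global at 5
n <- length(f)
k <- 20
x <- 1
for (t in 1:5000) {
  y <- ((x - 1 + sample(c(-1, 1), 1)) %% n) + 1  # random-walk proposal on cycle
  if (runif(1) < min(k^(f[y] - f[x]), 1)) x <- y # Metropolis acceptance
}
x   # with high probability, the vertex of maximum value
```

A greedy ascent from vertex 1 would stop at the local maximum (vertex 2, where f = 3); the randomized chain escapes it because downhill moves are accepted with small but positive probability.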