The Monte Carlo method

Nathan Baker
([email protected])
BME 540
What is Monte Carlo?
• A method to evaluate integrals
• Popular in all areas of science, economics,
sociology, etc.
• Involves:
– Random sampling of domain
– Evaluation of function
– Acceptance/rejection of points
Why Monte Carlo?
• The domain of integration is very
complex:
– Find the integral of a complicated
solvent-solute energy function outside
the boundary of a protein.
– Find the volume of an object from a
tomographic data set.
• The integrals are very high-dimensional:
– A partition function or free energy for a
protein
– Analyzing metabolic networks
– Predicting trends in the stock market
Basic Monte Carlo Algorithm
• Suppose we want to approximate
$I = \int f(x)\, dx$
in a high-dimensional space
• For i = 1 to n
– Pick a point $x_i$ at random
– Accept or reject the point based on a criterion
– If accepted, add $f(x_i)$ to the total sum
• Error estimates are “free” by calculating sums of
squares
• Error typically decays as $1/\sqrt{N}$
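Spelled out (a standard result; the explicit formulas are not on the slide): for N points drawn uniformly over a domain of volume V, the running sum and sum of squares give both the estimate and its statistical error,

$$I \approx \frac{V}{N}\sum_{i=1}^{N} f(x_i), \qquad \delta I \approx V\sqrt{\frac{\overline{f^2} - \overline{f}^{\,2}}{N}}, \qquad \overline{f} = \frac{1}{N}\sum_{i} f(x_i),\quad \overline{f^2} = \frac{1}{N}\sum_{i} f(x_i)^2 .$$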
Example: the area of a circle
• Sample points randomly from a square surrounding a circle of radius 5
• 10,000 sample points
• Acceptance criterion: inside the circle
• Actual area: 78.54
• Calculated area: 78.66

[Figure: accepted sample points inside the circle of radius 5, and the running area estimate converging toward 78.54 as the number of samples grows from 2,000 to 10,000]
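A minimal sketch of this example (variable names, seed, and structure are mine, not from the slide): hit-or-miss Monte Carlo for the circle area, with the "free" error estimate from the previous slide.

    import numpy as np

    rng = np.random.default_rng(0)
    n, radius = 10_000, 5.0

    # Sample points uniformly from the square [-5, 5] x [-5, 5] surrounding the circle.
    pts = rng.uniform(-radius, radius, size=(n, 2))

    # Acceptance criterion: the point lies inside the circle.
    inside = (pts ** 2).sum(axis=1) <= radius ** 2

    # Estimate: (fraction accepted) x (area of the square).
    square_area = (2 * radius) ** 2
    area = inside.mean() * square_area

    # "Free" error estimate from the variance of the accept/reject indicator.
    err = square_area * inside.std() / np.sqrt(n)

    print(f"estimated area = {area:.2f} +/- {err:.2f}  (exact: {np.pi * radius**2:.2f})")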
Example: more complicated shapes
[Figure: Monte Carlo sampling of a more complicated shape]
Example: multiple dimensions
• What is the average of a variable for an N-dimensional probability distribution?
• Two approaches:
– Quadrature
• Discretize each dimension into a set of n points
• Possibly use adaptivity to guide discretization
• For a reasonably smooth function, error decreases as $n^{-N/2}$
– Monte Carlo
• Sample m points from the space
• Possibly weight sampling based on reference function
• Error decreases as $m^{-1/2}$
Problems: sampling tails of distributions
• We want to:
– Integrate a sharply-peaked function, $f(x) = \exp(-100\,x^2)$
– Use Monte Carlo with uniformly-distributed random numbers
• What happens?
– Very few points contribute to the integral (~9%)
– Poor computational efficiency/convergence
• Can we ignore the tails? NO!
• Solution: use a different distribution

[Figure: the sharply-peaked integrand on $[-1, 1]$, and a histogram of the uniformly-distributed sample points]
Improved sampling: change of variables
• One way to improve sampling is to change
variables:
– New distribution is flatter
– Uniform variates more useful
• Advantages:
– Simplicity
– Very useful for generating distributions of non-uniform
variates (coming up)
• Disadvantages
– Most useful for invertible functions
Change of variables: method
1. Given an integral: $I = \int_{y_i}^{y_f} f(y)\, dy$
2. Transform variables: $y = g(x)$
3. Choose variables to give a (nearly) flat distribution: $f(g(x))\,\dfrac{dg(x)}{dx} \approx C$
4. Integrate: $I = \int_{x_i}^{x_f} f(g(x))\,\dfrac{dg(x)}{dx}\, dx$, with $x_i = g^{-1}(y_i)$ and $x_f = g^{-1}(y_f)$
Change of variables application: exponential
1. Given an integral: $I = \int_{y_i}^{y_f} e^{-\lambda y}\, dy$
2. Transform variables: $y = g(x) = -\dfrac{\ln x}{\lambda}$
3. Choose variables to give a (nearly) flat distribution: $f(g(x))\,\dfrac{dg(x)}{dx} = -\lambda^{-1}$
4. Integrate: $I = \int_{x_i}^{x_f} \left(-\lambda^{-1}\right) dx = \lambda^{-1}(x_i - x_f)$, with $x_i = g^{-1}(y_i) = e^{-\lambda y_i}$ and $x_f = g^{-1}(y_f) = e^{-\lambda y_f}$
Change of variables application: exponential
• Before transformation:
– 0.5% of points in the domain contribute to the integral
– Slow convergence

[Figure: running estimate of the integral versus number of samples (2,000–10,000) before the transformation; the estimate wanders between roughly 0.038 and 0.052]
Change of variables example: exponential
• Before transformation:
– 0.5% of points in the domain contribute to the integral
– Slow convergence
• After transformation:
– All of the points in the domain contribute
– Rapid (exact) convergence
• In practice:
– Speed-up is best when the inversion is nearly exact

[Figure: running estimate of the integral versus number of samples (2,000–10,000) after the transformation]
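A sketch of the before/after comparison for $I = \int e^{-\lambda y}\, dy$ (the choices $\lambda = 1$ and limits $[0, 5]$ are mine; the slides do not give them). After the transformation $x = e^{-\lambda y}$ the integrand becomes the constant $\lambda^{-1}$, so every sample contributes identically.

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam, y_i, y_f = 10_000, 1.0, 0.0, 5.0
    exact = (np.exp(-lam * y_i) - np.exp(-lam * y_f)) / lam

    # Before: uniform samples of y; only a small region near y_i contributes much.
    y = rng.uniform(y_i, y_f, n)
    before = (y_f - y_i) * np.exp(-lam * y).mean()

    # After: with y = -ln(x)/lam, the integrand is the constant 1/lam on the
    # transformed interval between x_f = exp(-lam*y_f) and x_i = exp(-lam*y_i).
    x_i, x_f = np.exp(-lam * y_i), np.exp(-lam * y_f)
    after = (x_i - x_f) / lam   # every sample contributes identically -> exact

    print(f"exact {exact:.5f}, uniform MC {before:.5f}, transformed {after:.5f}")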
Importance sampling
• Functions aren’t always invertible
• Is there another way to improve sampling of “important” regions of the function?
– Find flat distributions
– Bias sampling
• Find a function that is almost proportional to the one of
interest:
h  x 
f  x
C
g  x
• Rewrite your integral as a “weighted” integral:
$I = \int f(x)\, dx = \int h(x)\, g(x)\, dx$
Importance sampling example: a lumpy Gaussian
• Our original integrand is
$f(x) = \exp\!\left(-\dfrac{x^2}{2} - \dfrac{\sin\!\left(3 e^{x/2}\right)}{20}\right)$
• This is close to
$g(x) = \exp\!\left(-\dfrac{x^2}{2}\right)$
• Therefore:
– Sample random numbers from (we’ll talk about how to do this later):
$p(x) = \dfrac{1}{\sqrt{2\pi}}\exp\!\left(-\dfrac{x^2}{2}\right)$
– Evaluate the following integrand over the random number distribution:
$h(x) = \exp\!\left(-\dfrac{\sin\!\left(3 e^{x/2}\right)}{20}\right)$

[Figure: $f(x)$ and $g(x)$ plotted together (left); the ratio $h(x) = f(x)/g(x)$, which stays within roughly 0.8–1.15 (right)]
Importance sampling example: a lumpy Gaussian
• Convergence is pretty good (actual value 1.41377…)

[Figure: running estimate versus number of samples (2,000–10,000), converging near 1.414]
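A sketch of the importance-sampling recipe above, assuming the integrand reconstructed on the previous slide (the exact constants in the original example may differ, so the printed value need not match 1.41377): draw x from the Gaussian p(x) and average h(x) = f(x)/g(x), rescaled by the normalization of g.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # h(x) = f(x)/g(x): the "lumpy" factor left over after dividing out the Gaussian.
    def h(x):
        return np.exp(-np.sin(3.0 * np.exp(x / 2.0)) / 20.0)

    # Sample from p(x) = exp(-x^2/2)/sqrt(2*pi); since g = sqrt(2*pi) * p,
    # the integral of h*g is sqrt(2*pi) times the mean of h over these samples.
    x = rng.normal(size=n)
    hx = h(x)
    estimate = np.sqrt(2.0 * np.pi) * hx.mean()
    err = np.sqrt(2.0 * np.pi) * hx.std() / np.sqrt(n)

    print(f"I = {estimate:.4f} +/- {err:.4f}")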
Evolution of Monte Carlo methods so far…
1. Uniform points and original integrand…
• …but this had very poor efficiency….
2. Uniform points and transformed integrand…
• …but this only worked for certain integrands….
3. Non-uniform points and scaled integrand…
• …but this is very cumbersome for complicated integrands…
4. Now, we try Markov chain approaches…
Markov chains
$x_1, x_2, \ldots, x_n, x_{n+1}$
• Properties
– A sequence of randomly-chosen states
– The probability of transitions between states is independent of history: $p(y \to z) = p(z_{n+1} \mid y_n)$
– The entire chain represents a stationary probability distribution: $p(z_{n+1}) = \int p(z_{n+1} \mid y_n)\, p(y_n)\, dy_n$
• Examples
– Random number generators
– Brownian motion
– Hidden Markov Models
– (Perfectly) encrypted data

[Figure: a random-walk trajectory generated by a Markov chain]
Detailed balance
• What sorts of Markov chains reach stationary
distributions?
– This is the same question as posed for the Liouville
equation…
– Equilibrium processes have stationary distributions
• What does it mean to be at equilibrium?
– Reversibility
– Detailed balance relates stationary probabilities to
transitions:
p  x  x  y   p  y   y  x 
Detailed balance: example
• Markov chain with a Gaussian stationary probability distribution
• Detailed balance satisfied

[Figure: histogram of the sampled states; the counts follow the target Gaussian over roughly $-2 \le x \le 3$]
Detailed balance: example
• Markov chain with a Gaussian stationary probability distribution, EXCEPT:
– Steps to the right are 1% more favorable
• Detailed balance violated
• When both states are unfavorable, a 1% bias makes a big difference!

[Figure: trajectory of the biased chain over several thousand steps and the resulting histogram, which no longer matches the Gaussian]
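A sketch of both examples (the random-walk move, Metropolis-style acceptance rule, and the exact way the 1% bias enters are my guesses at the slides' setup): a chain whose acceptance satisfies detailed balance for a standard Gaussian, and the same chain with rightward moves made 1% more favorable, which no longer samples the target distribution.

    import numpy as np

    rng = np.random.default_rng(0)

    def chain(n_steps, bias=1.0):
        """Random-walk Markov chain targeting exp(-x^2/2); `bias` multiplies the
        acceptance probability of rightward moves (bias=1.01 breaks detailed balance)."""
        x, xs = 0.0, []
        for _ in range(n_steps):
            step = rng.uniform(-0.5, 0.5)
            y = x + step
            accept = min(1.0, np.exp(-(y**2 - x**2) / 2.0))
            if step > 0:
                accept *= bias          # the 1% bias toward rightward steps
            if rng.random() < accept:
                x = y
            xs.append(x)
        return np.array(xs)

    balanced = chain(10_000)            # histogram follows the Gaussian
    biased = chain(10_000, bias=1.01)   # stationary distribution is distorted
    print("balanced mean:", balanced.mean(), " biased mean:", biased.mean())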
Markov chain Monte Carlo
• Assembling the entire distribution for MC is usually hard:
– Complicated energy landscapes
– High-dimensional systems
– Extraordinarily difficult normalization
• Solution:
– Build up distribution from Markov chain
– Choose local transition probabilities which generate distribution
of interest (i.e., ensure detailed balance)
– Each random variable is chosen based on the previous variable
in the chain
– “Walk” along the Markov chain until convergence reached
• Result: Normalization not required, calculations are local
Application: microcanonical ensemble
• Consider the particle in an N-D box:
$E = \dfrac{h^2}{8ma^2}\sum_{j=1}^{3N} n_j^2$
• Monte Carlo algorithm for the microcanonical ensemble:
– Sample trial moves $n_j \to \max(0,\, n_j \pm 1)$ with acceptance
$a(n_j \to n_j \pm 1) = \begin{cases} 1 & \text{for } 0 \le H(n_j) \le E \\ 0 & \text{otherwise} \end{cases}$
– Determine the number of states with energy ≤ E
– Relate to the microcanonical partition function by differentiation:
$\Sigma(E) = \int_0^E \Omega(\varepsilon)\, d\varepsilon, \qquad \Omega(E)\, dE = \Sigma'(E)\, dE$
Application: microcanonical ensemble
9-dimensional particle in a box

[Figure: Monte Carlo results for the 9-dimensional particle in a box versus the number of samples (5,000–20,000); curves are labeled $h^2/(8ma^2)$, $3h^2/(8ma^2)$, and $5h^2/(8ma^2)$]
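A sketch in the spirit of the circle-area example rather than the walk-based scheme on the slide (the dimensionality, energy cutoff, hit-or-miss estimator, and the assumption that quantum numbers start at 1 are all my choices): estimate Σ(E), the number of 9-dimensional quantum-number vectors with energy at most E, by sampling uniformly from an enclosing box; Ω(E) then follows by numerical differentiation.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigma(E, dims=9, n_samples=100_000):
        """Estimate Sigma(E): the number of states n_1..n_dims >= 1 with
        sum(n_j^2) <= E, where E is measured in units of h^2/(8 m a^2)."""
        n_max = int(np.floor(np.sqrt(E)))   # box that encloses the energy shell
        if n_max < 1:
            return 0.0
        n = rng.integers(1, n_max + 1, size=(n_samples, dims))
        inside = (n ** 2).sum(axis=1) <= E
        return inside.mean() * n_max ** dims    # fraction accepted x box size

    E, dE = 400.0, 10.0
    print("Sigma(E) ~", sigma(E))
    # Omega(E) from a centered finite difference of Sigma(E).
    print("Omega(E) ~", (sigma(E + dE) - sigma(E - dE)) / (2 * dE))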
Markov chain Monte Carlo: flavors
• Molecular simulation: Metropolis
• Bayesian inference: Gibbs
• Hidden Markov Models: Viterbi
• …etc.
Application: stochastic transitions
• This is a very simplistic version of kinetic Monte Carlo…
• Each Monte Carlo step corresponds to a time interval $\Delta t$
• The probability of moving between states in that time interval is related to the rate constants:
$a(i \to j;\, \Delta t) = p(i \to j;\, \Delta t) = 1 - e^{-k_{ij}\Delta t}$
• Example: a two-state system $A \rightleftharpoons B$ with forward rate $k_{01}$ and reverse rate $k_{10}$
• Simulate to give mean first passage times, transient populations, etc.

[Figure: populations $n_0$ and $n_1$ versus Monte Carlo step (500–2,000) for $k_{01}\Delta t = 0.05$ and $k_{10}\Delta t = 0.10$]
Metropolis Monte Carlo
• Start with the detailed balance condition:
$p(x)\,\pi(x \to y) = p(y)\,\pi(y \to x)$
• Derive an “acceptance ratio” condition (for a symmetric trial move):
$\dfrac{a(x \to y)}{a(y \to x)} = \dfrac{p(y)}{p(x)}$
• Choose a particular acceptance ratio:
$a(x \to y) = \begin{cases} \dfrac{p(y)}{p(x)}, & \text{if } p(y) < p(x) \\[4pt] 1, & \text{otherwise} \end{cases}$
Application: canonical ensemble
• Our un-normalized probabilities look like Boltzmann factors:
$p(x) = e^{-\beta H(x)}, \qquad \beta = (kT)^{-1}$
• Our acceptance ratio is therefore:
$a(x \to y) = \begin{cases} \dfrac{e^{-\beta H(y)}}{e^{-\beta H(x)}}, & \text{if } e^{-\beta H(y)} < e^{-\beta H(x)} \\[4pt] 1, & \text{otherwise} \end{cases}
= \begin{cases} e^{-\beta\left[H(y) - H(x)\right]}, & \text{if } H(y) > H(x) \\[2pt] 1, & \text{otherwise} \end{cases}$
Algorithm: NVT Metropolis Monte Carlo
Given: a state x and an energy function H(x)
While system not converged:
  a. Generate a new state y
  b. If H(y) ≤ H(x), then
       a ← 1
     Else
       Calculate a uniform variate U
       If U ≤ exp(−β[H(y) − H(x)]), then
         a ← 1
       Else
         a ← 0
       End if.
     End if.
  c. If a equals 1, then
       Accept the state and calculate observables
       x ← y
     End if.
End while.
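A sketch of the loop above for a single coordinate in a harmonic potential (the potential, β, step size, and observable are my choices; the following slide shows the same kind of system). Note that the running average includes the current state every step, repeating the old state after a rejection.

    import numpy as np

    rng = np.random.default_rng(0)

    def H(x):
        return 0.5 * x ** 2          # harmonic "energy function"

    beta, step, n_steps = 1.0, 1.0, 50_000
    x, samples, accepted = 0.0, [], 0

    for _ in range(n_steps):
        y = x + rng.uniform(-step, step)          # generate a new state y
        if H(y) <= H(x) or rng.random() <= np.exp(-beta * (H(y) - H(x))):
            x = y                                 # accept the move
            accepted += 1
        samples.append(x)                         # accumulate observables

    samples = np.array(samples)
    print(f"acceptance {accepted / n_steps:.2f}, <x^2> = {samples.var():.3f} (expect 1/beta = 1.0)")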
Metropolis MC in a harmonic potential

[Figure: a 2-D harmonic potential surface $w(x, y)$ and the cloud of points sampled by Metropolis Monte Carlo]

Advantages of Metropolis MC simulations
• Does not require forces:
– Rapidly-changing energy functions
– No differentiation required
• Amenable to complex move sets:
– Torsions
– Rotamers
– Tautomers
– Etc.
Monte Carlo “machinery”
• Boundary
conditions
– Finite
– Periodic
• Interactions
– Complete
– Truncated
– Periodic
• How do we choose
“moves”?
[Figure: a pair interaction potential $u(x)$ versus separation $x$]
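A sketch of the two bookkeeping pieces mentioned above, for a cubic periodic box of side L (a common convention; the box size and cubic geometry are my choices, not something the slide specifies): wrapping coordinates back into the box and using the minimum-image separation when evaluating (possibly truncated) interactions.

    import numpy as np

    L = 10.0          # box edge length (arbitrary choice)

    def wrap(r):
        """Apply periodic boundary conditions: map coordinates into [0, L)."""
        return r % L

    def min_image_distance(r1, r2):
        """Minimum-image separation between two particles in the periodic box."""
        d = r1 - r2
        d -= L * np.round(d / L)
        return np.linalg.norm(d)

    # Example: two particles near opposite faces are actually close neighbors.
    a, b = np.array([0.5, 0.5, 0.5]), np.array([9.8, 0.5, 0.5])
    print(min_image_distance(a, b))   # ~0.7, not ~9.3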
Monte Carlo moves
• Trial moves
– Rigid body translation
– Rigid body rotation
– Internal conformational
changes (soft vs. stiff
modes)
– Titration/electronic
states
–…
• Questions:
– How “big” a move
should we take?
– Move one particle or
many?
Monte Carlo moves
• How “big” a move should
we take?
– Smaller moves: better
acceptance rate, slower
sampling
– Bigger moves: faster
sampling, poorer
acceptance rate
– Amortize mean squared
displacement with
respect to CPU time
– “Rules of thumb”: target a particular percent acceptance rate
• Move one particle or
many?
– Possible to achieve
more efficient sampling
with correct multi-particle
moves
– One-particle moves
must choose particles at
random
Random variates
• Don’t develop your own uniform variate method
– Look for long period, lack of bias, etc.
– These are usually fulfilled in random(), drand48(), etc.
– I like “Mersenne twister”
• Use uniform variates for starting point of other
distributions
• Can use all the methods we’ve described:
– Standard sampling (rejection method)
– Transformations
– Metropolis (watch out for correlations)
Random variates: transformations
• Useful for functions with easy inverses
• Sample a uniform variable and transform via the inverse to give particular variates:
$x = -\ln u, \qquad p(x) = e^{-x}$
• Tougher distributions require rejection with respect to an “easy” distribution…

[Figure: histogram of transformed variates following the exponential density $p(x) = e^{-x}$ over roughly $0 \le x \le 7$]
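A sketch of the transformation shown above (the sample size and the comparison against NumPy's built-in generator are just a sanity check of my own): uniform variates u are mapped through the inverse CDF to give exponentially distributed variates.

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.random(100_000)            # uniform variates on [0, 1)
    x = -np.log(1.0 - u)               # x = -ln(u'); using 1 - u avoids log(0)

    print(x.mean(), x.var())           # both should be ~1 for the unit exponential
    print(rng.exponential(size=100_000).mean())   # library generator, for comparison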
Problem: poor convergence
• Error in Monte Carlo decreases as $N^{-1/2}$
• What if you want to improve the error by 1%? Since the error falls only as $N^{-1/2}$, even small improvements require disproportionately more samples (halving the error takes four times as many points)
• This is due to the “filling behavior” of uniformly-distributed random points
• Affects nearly all areas of MC simulations
• Convergence can also be difficult to detect:
– Observe the decrease in the variance of the data
– Employ error estimates
– Sub-sample the data

[Figure: uniformly-distributed random points in the unit square at several sample sizes, and the local point density versus $N$ (2,000–10,000)]
Quasi-random sequences
• Compare to the “pseudo-random” methods we’ve discussed so far.
• Can we find a sub-random sequence with low internal correlations that fills space better?
• Solution: a maximally-avoiding set of points
• Number-theoretic methods:
– Base transformations
– Primes
– Polynomials
• Convergence can reach $N^{-1}$ (Sobol sequence)
• Caveats:
– How many points (resolution)?
– What about sharply-varying functions?

Hammersley quasi-random sequence:
For $j = 1, 2, 3, \ldots, 2^m - 1$:
  $a = \text{base}_2(j)$
  $b = (\text{reverse the digits of } a)$
  $x = \dfrac{\text{base}_{10}(a)}{2^m}, \qquad y = \dfrac{\text{base}_{10}(b)}{2^m}$
End for.
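A sketch of the boxed recipe above (function name and the choice m = 8 are mine); the digit-reversal step is the base-2 radical inverse.

    import numpy as np

    def hammersley_2d(m):
        """2-D Hammersley points for j = 1 .. 2^m - 1, following the recipe above."""
        n = 2 ** m
        pts = []
        for j in range(1, n):
            a = format(j, "b").zfill(m)       # a = base-2 representation of j
            b = a[::-1]                       # b = digits of a, reversed
            x = int(a, 2) / n                 # x = base-10 value of a / 2^m  (= j / 2^m)
            y = int(b, 2) / n                 # y = base-10 value of b / 2^m
            pts.append((x, y))
        return np.array(pts)

    pts = hammersley_2d(8)                    # 255 well-spread points in the unit square
    print(pts[:4])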
Application: the bootstrap method
• Useful for estimating errors in fit parameters
• Resampling of the original data
• Examination of the resulting fit parameters
• Algorithm:
– Generate fit parameters for the original data
– For (number of resamples desired):
• Resample the data with replacement
• Generate new fit parameters
• Save the deviation with respect to the original fit parameters
– Analyze the deviation statistics
• Example: true model $f(x) = 4x + 1 + R$, fit model $f(x) = a_2 x^2 + a_1 x + a_0$; bootstrap results $a_2 = 0.4 \pm 0.8$, $a_1 = 4.1 \pm 0.1$, $a_0 = 1.0 \pm 0.1$

[Figure: the data and fitted curve, and histograms of the bootstrapped deviations for each fit parameter]
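A sketch of the algorithm above (the synthetic data follow the “true” model quoted on the slide; the number of data points, noise level R, and resample count are my choices):

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: true model f(x) = 4x + 1 plus noise, fit with a quadratic.
    x = rng.uniform(-2, 2, 50)
    y = 4 * x + 1 + rng.normal(0, 1, x.size)

    a2, a1, a0 = np.polyfit(x, y, 2)          # fit parameters for the original data

    n_resamples = 1000
    devs = np.empty((n_resamples, 3))
    for i in range(n_resamples):
        idx = rng.integers(0, x.size, x.size)      # resample the data with replacement
        devs[i] = np.polyfit(x[idx], y[idx], 2) - (a2, a1, a0)   # deviation from original fit

    # Spread of the deviations gives the error estimate for each parameter.
    for name, val, spread in zip(("a2", "a1", "a0"), (a2, a1, a0), devs.std(axis=0)):
        print(f"{name} = {val:.2f} +/- {spread:.2f}")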
Summary
• A method for sampling
– Integrals
– Configuration space
– Error space
• Any distribution that can be represented by
transitions can be sampled
• Sampling can be accelerated using various
tricks