The Monte Carlo method
Nathan Baker
([email protected])
BME 540
What is Monte Carlo?
• A method to evaluate integrals
• Popular in all areas of science, economics,
sociology, etc.
• Involves:
– Random sampling of domain
– Evaluation of function
– Acceptance/rejection of points
Why Monte Carlo?
• The domain of integration is very
complex:
– Find the integral of a complicated
solvent-solute energy function outside
the boundary of a protein.
– Find the volume of an object from a
tomographic data set.
• The integrals are very high-dimensional:
– A partition function or free energy for a
protein
– Analyzing metabolic networks
– Predicting trends in the stock market
Basic Monte Carlo Algorithm
• Suppose we want to approximate $I = \int f(x)\,dx$ in a high-dimensional space
• For i = 1 to n
– Pick a point xi at random
– Accept or reject the point based on an acceptance criterion
– If accepted, then add f(xi) to total sum
• Error estimates are “free” by calculating sums of
squares
• Error typically decays as $1/\sqrt{N}$ (see the sketch below)
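A minimal Python sketch of this algorithm. The integrand $f(x) = x^2$ on $[0, 2]$ is an illustrative assumption (not from the slides), and the accept/reject step is trivial here because every sampled point lies in the domain; the accumulated sum of squares supplies the “free” error estimate:

import random

def mc_integrate(f, a, b, n=10_000):
    total, total_sq = 0.0, 0.0
    for _ in range(n):
        x = random.uniform(a, b)          # pick a point x_i at random
        fx = f(x)
        total += fx                       # add f(x_i) to the total sum
        total_sq += fx * fx               # sum of squares -> "free" error estimate
    mean = total / n
    variance = total_sq / n - mean * mean
    # estimate = volume * mean; the standard error decays as 1/sqrt(N)
    return (b - a) * mean, (b - a) * (variance / n) ** 0.5

estimate, error = mc_integrate(lambda x: x * x, 0.0, 2.0)
print(f"integral ≈ {estimate:.4f} ± {error:.4f} (exact: {8.0 / 3.0:.4f})")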
Example: the area of a circle
• Sample points randomly from the square surrounding a circle of radius 5
• 10,000 sample points
• Acceptance criterion: inside the circle
• Actual area: 78.54
• Calculated area: 78.66 (see the sketch below)
[Figures: sampled points scattered over the square; running area estimate settling near 78.54 as the sample count approaches 10,000]
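A Python sketch of this example, with the acceptance criterion deciding which sampled points count (exact numbers will vary from run to run):

import math, random

n, accepted = 10_000, 0
for _ in range(n):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    if x * x + y * y <= 25.0:      # acceptance criterion: inside the circle
        accepted += 1
area = 100.0 * accepted / n        # square area (100) times accepted fraction
print(f"calculated area: {area:.2f} (actual: {math.pi * 25.0:.2f})")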
Example: more complicated shapes
[Figure: uniform sample points accepted inside a more complicated, non-circular shape]
Example: multiple dimensions
• What is the average of a variable for an N-dimensional probability distribution?
• Two approaches:
– Quadrature
• Discretize each dimension into a set of n points ($n^N$ points in total)
• Possibly use adaptivity to guide the discretization
• For a reasonably smooth function, the error decreases as roughly $m^{-2/N}$ in the total number of points m
– Monte Carlo
• Sample m points from the space
• Possibly weight the sampling based on a reference function
• The error decreases as $m^{-1/2}$, independent of N
Problems: sampling tails of distributions
• We want to:
– Integrate a sharply peaked function, $f(x) = \exp(-100x^2)$
– Use Monte Carlo with uniformly distributed random numbers
• What happens? (see the sketch below)
– Very few points contribute to the integral (~9%)
– Poor computational efficiency/convergence
• Can we ignore the tails? NO!
• Solution: use a different distribution
[Figures: the sharply peaked integrand on $[-1, 1]$; histogram of uniform samples, almost all of which land where $f$ is negligible]
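A Python sketch of the problem, assuming the reconstructed integrand $f(x) = \exp(-100x^2)$ on $[-1, 1]$ (the exact form on the original slide is unclear) and an arbitrary cutoff for what counts as “contributing”:

import math, random

n, total, contributing = 10_000, 0.0, 0
for _ in range(n):
    x = random.uniform(-1.0, 1.0)
    fx = math.exp(-100.0 * x * x)
    total += fx
    if fx > 0.01:                  # arbitrary threshold for "contributes"
        contributing += 1
estimate = 2.0 * total / n         # domain width times the mean of f
exact = math.sqrt(math.pi) / 10.0 * math.erf(10.0)
print(f"estimate {estimate:.4f} vs exact {exact:.4f}; "
      f"{100.0 * contributing / n:.0f}% of points contributed")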
Improved sampling: change of variables
• One way to improve sampling is to change
variables:
– New distribution is flatter
– Uniform variates more useful
• Advantages:
– Simplicity
– Very useful for generating distributions of non-uniform
variates (coming up)
• Disadvantages
– Most useful for invertible functions
Change of variables: method
1. Given an integral: $I = \int_{y_i}^{y_f} f(y)\,dy$
2. Transform variables: $y = g(x)$
3. Choose the transformation to give a (nearly) flat distribution: $f(g(x))\,\dfrac{dg(x)}{dx} \approx C$
4. Integrate: $I = \int_{x_i}^{x_f} f(g(x))\,\dfrac{dg(x)}{dx}\,dx$, with $x_i = g^{-1}(y_i)$ and $x_f = g^{-1}(y_f)$
Change of variables application: exponential
1. Given an integral: $I = \int_{y_i}^{y_f} e^{-y}\,dy$
2. Transform variables: $y = g(x) = -\log x$
3. The transformed integrand is exactly flat: $f(g(x))\,\left|\dfrac{dg(x)}{dx}\right| = e^{\log x} \cdot \dfrac{1}{x} = 1$
4. Integrate: $I = \int_{x_i}^{x_f} dx$, with $x_i = g^{-1}(y_i) = e^{-y_i}$ and $x_f = g^{-1}(y_f) = e^{-y_f}$
Change of variables example: exponential
• Before transformation:
– 0.5% of points in the domain contribute to the integral
– Slow convergence
• After transformation:
– All of the points in the domain contribute
– Rapid (exact) convergence (see the sketch below)
• In practice:
– The speed-up is best when the inversion is nearly exact
[Figures: running estimates of the integral before the transformation (noisy, slowly converging) and after (converged essentially immediately)]
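A Python sketch of the transformed integral, taking $y_i = 0$ and $y_f = 5$ as assumed limits. Because the transformed integrand is identically 1, every point contributes and the estimate is exact:

import math, random

y_i, y_f = 0.0, 5.0
x_lo, x_hi = math.exp(-y_f), math.exp(-y_i)   # x = g^{-1}(y) = e^{-y}

n, total = 10_000, 0.0
for _ in range(n):
    x = random.uniform(x_lo, x_hi)
    # transformed integrand: f(g(x)) * |dg/dx| = e^{log x} * (1/x) = 1
    total += math.exp(math.log(x)) * (1.0 / x)
estimate = (x_hi - x_lo) * total / n
print(f"estimate {estimate:.6f} vs exact {1.0 - math.exp(-5.0):.6f}")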
Importance sampling
• Functions aren’t always invertible
• Is there another way to improve sampling of “important” regions of the function?
– Find flat distributions
– Bias sampling
• Find a function $g(x)$ that is almost proportional to the one of interest: $h(x) = \dfrac{f(x)}{g(x)} \approx C$
• Rewrite your integral as a “weighted” integral: $I = \int f(x)\,dx = \int h(x)\,g(x)\,dx$
Importance sampling example: a lumpy Gaussian
• Our original integrand is $f(x) = \exp\!\left(-\dfrac{x^2}{2} + \dfrac{\sin(3x)\,e^{-x/2}}{20}\right)$
• This is close to $g(x) = \exp\!\left(-\dfrac{x^2}{2}\right)$
• Therefore:
– Sample random numbers from (we’ll talk about how to do this later): $p(x) = \dfrac{1}{\sqrt{2\pi}}\exp\!\left(-\dfrac{x^2}{2}\right)$
– Evaluate the following integrand over the random number distribution: $h(x) = \exp\!\left(\dfrac{\sin(3x)\,e^{-x/2}}{20}\right)$
[Figures: $f$ and $g$, nearly indistinguishable; the ratio $h = f/g$, which stays within roughly 0.8 to 1.15]
Importance sampling example: a lumpy Gaussian
• Convergence is pretty good (actual value 1.41377…; see the sketch below)
[Figure: running estimate settling between about 1.4135 and 1.4155 over 10,000 samples]
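A Python sketch of the machinery, using the reconstructed $h(x)$ above and sampling from the normalized Gaussian $p(x)$. The forms of $f$ and $g$ are reconstructions from a garbled slide, and with the unnormalized $g(x) = e^{-x^2/2}$ the estimate is $\sqrt{2\pi}\,\langle h \rangle_p$, which need not reproduce the value quoted above:

import math, random

def h(x):
    # ratio h(x) = f(x)/g(x) of the "lumpy" Gaussian to the plain Gaussian
    return math.exp(math.sin(3.0 * x) * math.exp(-x / 2.0) / 20.0)

n = 10_000
samples = [h(random.gauss(0.0, 1.0)) for _ in range(n)]   # x drawn from p(x)
estimate = math.sqrt(2.0 * math.pi) * sum(samples) / n    # I = ∫ h(x) g(x) dx
print(f"importance-sampling estimate: {estimate:.5f}")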
Evolution of Monte Carlo methods so far…
1. Uniform points and the original integrand…
– …but this had very poor efficiency…
2. Uniform points and a transformed integrand…
– …but this only worked for certain integrands…
3. Non-uniform points and a scaled integrand…
– …but this is very cumbersome for complicated integrands…
4. Now, we try Markov chain approaches…
Markov chains
• Properties:
– A sequence of randomly chosen states $x_1, x_2, \ldots, x_n, x_{n+1}, \ldots$
– The probability of transitions between states is independent of history: $p(z_{n+1}) = \int p(z_{n+1} \mid y_n)\,p(y_n)\,dy_n$
– The entire chain represents a stationary probability distribution
• Examples (a minimal chain is sketched below):
– Random number generators
– Brownian motion
– Hidden Markov Models
– (Perfectly) encrypted data
[Figure: a two-dimensional random walk]
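A minimal Python sketch of a Markov chain: a two-state toy chain (an assumed example) whose transitions depend only on the current state, and whose visit frequencies converge to its stationary distribution:

import random

leave = {0: 0.2, 1: 0.4}        # assumed probabilities of leaving each state
state, visits, n = 0, [0, 0], 100_000
for _ in range(n):
    if random.random() < leave[state]:   # depends only on the current state
        state = 1 - state
    visits[state] += 1
# stationary distribution: pi_0/pi_1 = leave[1]/leave[0] = 2, i.e. (2/3, 1/3)
print([v / n for v in visits])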
Detailed balance
• What sorts of Markov chains reach stationary
distributions?
– This is the same question as posed for the Liouville
equation…
– Equilibrium processes have stationary distributions
• What does it mean to be at equilibrium?
– Reversibility
– Detailed balance relates stationary probabilities to transitions:
$p(x)\,\pi(x \to y) = p(y)\,\pi(y \to x)$
Detailed balance: example
• Markov chain with a Gaussian stationary probability distribution
• Detailed balance satisfied
[Figure: histogram of visited states, forming a Gaussian centered near the origin]
Detailed balance: example
• Markov chain with a Gaussian stationary probability distribution, EXCEPT:
– Steps to the right are 1% more favorable
• Detailed balance violated
• When both states are unfavorable, a 1% bias makes a big difference!
[Figures: the biased chain drifting steadily away from the origin; histogram of visited states smeared far to the right]
Markov chain Monte Carlo
• Assembling the entire distribution for MC is usually hard:
– Complicated energy landscapes
– High-dimensional systems
– Extraordinarily difficult normalization
• Solution:
– Build up distribution from Markov chain
– Choose local transition probabilities that generate the distribution of interest (i.e., ensure detailed balance)
– Each random variable is chosen based on the previous variable
in the chain
– “Walk” along the Markov chain until convergence reached
• Result: Normalization not required, calculations are local
Application: microcanonical ensemble
• Consider the particle in an N-D box, with energy $E = \dfrac{h^2}{8ma^2}\sum_{j=1}^{3N} n_j^2$
• Monte Carlo algorithm for the microcanonical ensemble:
– Sample moves of the quantum numbers, $n_j \to \max(1,\, n_j \pm 1)$, with acceptance $a(n_j \to n_j') = \begin{cases} 1, & \text{for } 0 \le H(n') \le E \\ 0, & \text{otherwise} \end{cases}$
– Determine the number of states with energy $\le E$: $\Sigma(E) = \int_0^E \Omega(E')\,dE'$
– Relate to the microcanonical partition function by differentiation: $\Omega(E) = \dfrac{d\Sigma(E)}{dE}$
Application: microcanonical ensemble
• 9-dimensional particle in a box
[Figures: running Monte Carlo estimates of the number of states with energy below $E$ for $E = h^2/8ma^2$, $3h^2/8ma^2$, and $5h^2/8ma^2$; the accumulated $\Sigma(E)$ versus $E$]
Markov chain Monte Carlo: flavors
• Molecular simulation: Metropolis
• Bayesian inference: Gibbs
• Hidden Markov Models: Viterbi
• …etc….
Application: stochastic transitions
• This is a very simplistic version of kinetic Monte Carlo…
• Each Monte Carlo step corresponds to a time interval $\Delta t$
• The probability of moving between states $i \to j$ in that time interval is related to the rate constant: $a(i \to j; \Delta t) = p(i \to j; \Delta t) = 1 - e^{-k_{ij}\Delta t}$
• Example: the two-state reaction $A \rightleftharpoons B$, with forward and reverse rate constants
• Simulate to give mean first passage times, transient populations, etc. (see the sketch below)
[Figure: transient population $n_0$ for $k_{01}\Delta t = 0.05$, $k_{10}\Delta t = 0.10$, relaxing toward its equilibrium value over ~2000 steps]
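A Python sketch of this scheme for the two-state system, using the slide’s parameters $k_{01}\Delta t = 0.05$ and $k_{10}\Delta t = 0.10$ (walker count and step count are arbitrary choices):

import math, random

k01_dt, k10_dt = 0.05, 0.10
p01 = 1.0 - math.exp(-k01_dt)        # P(0 -> 1) in one time interval
p10 = 1.0 - math.exp(-k10_dt)        # P(1 -> 0) in one time interval

walkers, steps = 1000, 500
states = [0] * walkers               # all walkers start in state 0 (A)
for _ in range(steps):
    for i, s in enumerate(states):
        if random.random() < (p01 if s == 0 else p10):
            states[i] = 1 - s        # hop to the other state
n0 = states.count(0) / walkers
print(f"population of state 0: {n0:.3f} "
      f"(stationary value: {p10 / (p01 + p10):.3f})")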
Metropolis Monte Carlo
• Start with the detailed balance condition: $p(x)\,\pi(x \to y) = p(y)\,\pi(y \to x)$
• Derive an “acceptance ratio” condition, where $\alpha$ is the probability of proposing the move: $\dfrac{a(x \to y)}{a(y \to x)} = \dfrac{\alpha(y \to x)\,p(y)}{\alpha(x \to y)\,p(x)}$
• Choose a particular acceptance ratio: $a(x \to y) = \begin{cases} \dfrac{p(y)}{p(x)}, & \text{if } p(y) < p(x) \\ 1, & \text{otherwise} \end{cases}$
Application: canonical ensemble
• Our un-normalized probabilities look like Boltzmann factors: $p(x) = e^{-\beta H(x)}, \quad \beta = \dfrac{1}{kT}$
• Our acceptance ratio is therefore: $a(x \to y) = \begin{cases} \dfrac{e^{-\beta H(y)}}{e^{-\beta H(x)}}, & \text{if } e^{-\beta H(y)} < e^{-\beta H(x)} \\ 1, & \text{otherwise} \end{cases}$
• That is: $a(x \to y) = \begin{cases} e^{-\beta\,(H(y) - H(x))}, & \text{if } H(y) > H(x) \\ 1, & \text{otherwise} \end{cases}$
Algorithm: NVT Metropolis Monte Carlo
Given: a state $x$ and an energy function $H(x)$
While system not converged:
a. Generate a new state $y$
b. If $H(y) \le H(x)$, then
  $a \leftarrow 1$
Else
  Calculate a uniform variate $U$
  If $U \le e^{-\beta(H(y) - H(x))}$, then
    $a \leftarrow 1$
  Else
    $a \leftarrow 0$
  End if.
End if.
c. If $a$ equals 1, then
  Accept the state and calculate observables: $x \leftarrow y$
End if.
End while.
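A runnable Python sketch of this algorithm for a one-dimensional harmonic energy $H(x) = x^2/2$ with $\beta = 1$ (both assumed for illustration; this previews the harmonic-potential example on the next slide). Sampled states should follow $e^{-H(x)}$, a unit Gaussian:

import math, random

def metropolis(H, x0, n_steps, beta=1.0, step=0.5):
    x, samples = x0, []
    for _ in range(n_steps):
        y = x + random.uniform(-step, step)     # a. generate a new state y
        dH = H(y) - H(x)
        # b. accept downhill moves; accept uphill moves with prob e^{-beta*dH}
        if dH <= 0.0 or random.random() < math.exp(-beta * dH):
            x = y                               # c. accept: x <- y
        samples.append(x)       # record the current state as the observable
    return samples

samples = metropolis(lambda x: 0.5 * x * x, 0.0, 50_000)
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples) - mean * mean
print(f"<x> = {mean:.3f} (expect 0); variance = {var:.3f} (expect 1)")

Unlike the pseudocode above, this sketch records the state on every step (a rejected move repeats the old state), which is the standard way to accumulate averages.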
Metropolis MC in a harmonic potential
[Figures: a two-dimensional harmonic energy surface $w(x, y)$; Metropolis samples scattered according to the corresponding Boltzmann distribution]
Advantages of Metropolis MC simulations
• Does not require forces:
– Rapidly changing energy functions
– No differentiation required
• Amenable to complex move sets:
– Torsions
– Rotamers
– Tautomers
– Etc.
Monte Carlo “machinery”
• Boundary conditions
– Finite
– Periodic
• Interactions
– Complete
– Truncated
– Periodic
• How do we choose “moves”?
[Figure: an interaction potential $u(x)$]
Monte Carlo moves
• Trial moves
– Rigid body translation
– Rigid body rotation
– Internal conformational
changes (soft vs. stiff
modes)
– Titration/electronic
states
–…
• Questions:
– How “big” a move
should we take?
– Move one particle or
many?
Monte Carlo moves
• How “big” a move should
we take?
– Smaller moves: better
acceptance rate, slower
sampling
– Bigger moves: faster
sampling, poorer
acceptance rate
– Amortize mean squared
displacement with
respect to CPU time
– “Rules of thumb”: tune the step size toward a target acceptance rate
• Move one particle or
many?
– Possible to achieve
more efficient sampling
with correct multi-particle
moves
– One-particle moves
must choose particles at
random
Random variates
• Don’t develop your own uniform variate method
– Look for long period, lack of bias, etc.
– These criteria are usually satisfied by random(), drand48(), etc.
– I like “Mersenne twister”
• Use uniform variates for starting point of other
distributions
• Can use all the methods we’ve described:
– Standard sampling (rejection method)
– Transformations
– Metropolis (watch out for correlations)
Random variates: transformations
• Useful for functions with easy inverses
• Sample a uniform variable and transform via the inverse to give the particular variates, e.g. $x = -\ln u$ gives $p(x) = e^{-x}$ (see the sketch below)
• Tougher distributions require rejection with respect to an “easy” distribution…
[Figure: histogram of transformed variates following the exponential density $e^{-x}$]
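A Python sketch of transformation sampling for the exponential density above: invert the CDF $u = 1 - e^{-x}$ to get $x = -\ln(1-u)$, which has the same distribution as $-\ln u$:

import math, random

def exponential_variate():
    u = random.random()              # uniform in [0, 1)
    return -math.log(1.0 - u)        # using 1 - u avoids log(0)

samples = [exponential_variate() for _ in range(100_000)]
print(f"sample mean: {sum(samples) / len(samples):.3f} (expect 1.0)")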
Problem: poor convergence
• Error in Monte Carlo decreases as $N^{-1/2}$
• What if you want to improve the error by 1%? The $N^{-1/2}$ decay means you need roughly 2% more points
• This is due to the “filling behavior” of uniformly distributed random points
• Affects nearly all areas of MC simulations
• Convergence can also be difficult to detect:
– Observe the decrease in the variance of the data
– Employ error estimates
– Sub-sample the data
[Figures: uniform random points filling the unit square at increasing N; density of filled space versus N]
Quasi-random sequences
• Compare to the “pseudo-random” methods we’ve discussed so far
• Can we find a sub-random sequence with low internal correlations that fills space better?
• Solution: a maximally-avoiding set of points
• Number-theoretic methods:
– Base transformations
– Primes
– Polynomials
• Example, the Hammersley quasi-random sequence (code sketch below):
For $j = 1, 2, 3, \ldots, 2^m - 1$:
  $a \leftarrow j$ in base 2
  $b \leftarrow$ the digits of $a$, reversed
  $x_j = \dfrac{\mathrm{base10}(a)}{2^m}, \quad y_j = \dfrac{\mathrm{base10}(b)}{2^m}$
End for.
• Convergence can reach $N^{-1}$ (Sobol sequence)
• Caveats:
– How many points (resolution)?
– What about sharply varying functions?
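A Python sketch of the Hammersley construction above (base-2 digit reversal; $m$ is an assumed small constant):

def hammersley(m):
    points = []
    for j in range(1, 2 ** m):
        a = format(j, f"0{m}b")                # j in base 2, padded to m digits
        b = a[::-1]                            # reverse the digits
        points.append((int(a, 2) / 2 ** m,     # x_j
                       int(b, 2) / 2 ** m))    # y_j
    return points

for x, y in hammersley(3):                     # 2^3 - 1 = 7 points
    print(f"({x:.3f}, {y:.3f})")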
Application: the bootstrap method
• Useful for estimating errors in fit parameters
• Resampling of the original data
• Examination of the resulting fit parameters
• Algorithm (sketched below):
– Generate fit parameters for the original data
– For (number of resamples desired):
• Resample the data with replacement
• Generate new fit parameters
• Save the deviation with respect to the original fit parameters
– Analyze the deviation statistics
• Example: data from the true model $f(x) = 4x + 1 + R$ ($R$ = random noise), fit with $f(x) = a_2 x^2 + a_1 x + a_0$, gives $a_2 = 0.4 \pm 0.8$, $a_1 = 4.1 \pm 0.1$, $a_0 = 1.0 \pm 0.1$
[Figures: the noisy data with the quadratic fit; histograms of the bootstrapped deviations of $a_2$, $a_1$, and $a_0$]
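A Python sketch of the bootstrap. For brevity it fits a straight line $a_1 x + a_0$ (closed-form least squares) rather than the slide’s quadratic, to synthetic data from the reconstructed model $f(x) = 4x + 1 + R$; the spread of refit parameters over resamples estimates the parameter errors:

import random

random.seed(1)
xs = [i / 10.0 - 2.0 for i in range(41)]                   # x in [-2, 2]
data = [(x, 4.0 * x + 1.0 + random.gauss(0.0, 1.0)) for x in xs]

def fit_line(pts):
    # closed-form least-squares fit of y = a1*x + a0
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a1, (sy - a1 * sx) / n

a1_orig, a0_orig = fit_line(data)        # fit parameters for the original data
dev1, dev0 = [], []
for _ in range(1000):
    resample = [random.choice(data) for _ in data]         # with replacement
    a1, a0 = fit_line(resample)
    dev1.append(a1 - a1_orig); dev0.append(a0 - a0_orig)   # save deviations

rms = lambda v: (sum(d * d for d in v) / len(v)) ** 0.5    # deviation statistics
print(f"a1 = {a1_orig:.2f} ± {rms(dev1):.2f}, a0 = {a0_orig:.2f} ± {rms(dev0):.2f}")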
Summary
• A method for sampling
– Integrals
– Configuration space
– Error space
• Any distribution that can be represented by
transitions can be sampled
• Sampling can be accelerated using various
tricks