
Computational Methods
in Physics
PHYS 3437
Dr Rob Thacker
Dept of Astronomy & Physics (MM-301C)
[email protected]
Today’s Lecture

Introduction to Monte Carlo methods
 Background
 Integration techniques

Introduction


“Monte Carlo” refers to the use of random numbers to model random
events that may represent a mathematical or physical problem
Typically, MC methods require many millions of random numbers
 Of course, computers cannot actually generate truly random numbers
 However, we can make the period of repetition absolutely enormous
Such pseudo-random number generators are based on truncating
numbers of their significant digits
See Numerical Recipes, p 266-280 (2nd edition FORTRAN)
“Anyone who considers arithmetical methods of producing
random digits is, of course, in a state of sin.”
John von Neumann
History of numerical Monte Carlo methods

Another contribution to numerical methods that grew out of research
at Los Alamos
Late 1940s: scientists want to follow the paths of neutrons through
various sub-atomic collision events
Ulam & von Neumann suggest using random sampling to estimate this
process
100 events can be calculated in 5 hours on ENIAC
The method is given the name “Monte Carlo” by Nicholas Metropolis
Explosion of inappropriate use in the 1950s gave the technique
a bad name

Subsequent research illuminated when the method was appropriate
Terminology


Random deviate – a number chosen uniformly at random from [0,1]
Normal deviate – a number chosen randomly from (-∞,∞) weighted by a
Gaussian
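As a small illustration, both kinds of deviate can be drawn with NumPy's
random generator (the library choice is an assumption; the lecture itself
points to the FORTRAN routines in Numerical Recipes):

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # pseudo-random generator with a fixed seed

u = rng.random(5)             # random deviates: uniform on [0, 1)
z = rng.standard_normal(5)    # normal deviates: Gaussian-weighted on (-inf, inf)
print(u)
print(z)
```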
Background to MC integration
Suppose we have a definite integral

  I = \int_a^b f(x)\,dx

Given a “good” set of N sample points {x_i} we can estimate the
integral as

  I = \int_a^b f(x)\,dx \approx \frac{b-a}{N} \sum_{i=1}^{N} f(x_i)

[Figure: f(x) sampled at points e.g. x_3, x_9 on [a,b]; each sample
point yields an element of the integral of width (b-a)/N and height
f(x_i)]
What MC integration really does

While the previous explanation is a reasonable interpretation of the
way MC integration works, the most popular explanation is below

[Figure: a box over [a,b] whose height is given by a random sample of
f(x); averaging many such boxes gives the integral]
Mathematical Applications



Let’s formalize this just a little bit…
Since by the mean value theorem

  \int_a^b f(x)\,dx = (b - a)\,\langle f \rangle

We can approximate the integral by calculating (b-a)<f>, and we can
calculate <f> by averaging many values of f(x)

  \langle f \rangle_N = \frac{1}{N} \sum_{i=1}^{N} f(x_i)

where x_i ∈ [a,b] and the values are chosen randomly
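A minimal sketch of this estimator in Python (NumPy assumed; the function
and interval are chosen to match the example on the next slide):

```python
import numpy as np

def mc_integrate(f, a, b, n, rng):
    """Estimate int_a^b f(x) dx as (b - a) * <f>_N from n random samples."""
    x = rng.uniform(a, b, size=n)      # x_i drawn uniformly from [a, b]
    return (b - a) * np.mean(f(x))     # (b - a) times the average of f(x_i)

rng = np.random.default_rng(0)
estimate = mc_integrate(np.exp, 0.0, 1.0, 1000, rng)
print(estimate)   # should be close to e - 1 = 1.71828...
```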
Example



Consider evaluating

  I = \int_0^1 e^x\,dx = e - 1 = 1.718281828...

Let’s take N=1000, then evaluate f(x)=e^x with x ∈ [0,1] at 1000
random points
For this set of points define I_1 = (b-a)<f>_{N,1} = <f>_{N,1},
since b-a=1
Next choose 1000 different x ∈ [0,1] and create a new estimate
I_2 = <f>_{N,2}
Next choose another 1000 different x ∈ [0,1] and create a new
estimate I_3 = <f>_{N,3}
Distribution of the estimates


We can carry on doing this, say 10,000 times, at which point we’ll
have 10,000 values estimating the integral, and the distribution of
these values will be a normal distribution
The distribution of all of the I_N integrals constrains the errors
we would expect on a single I_N estimate
 This is the Central Limit Theorem: for any given I_N estimate the
sum of the random variables within it will converge toward a normal
distribution
 Specifically, the standard deviation will be the estimate of the
error in a single I_N estimate
 The mean, x_0, will approach e - 1

[Figure: Gaussian distribution of the estimates,
  \propto e^{-(x - x_0)^2 / 2 s_N^2},
with x_0 - s_N, x_0 and x_0 + s_N marked and the 1/e level indicated]
Calculating sN

The formula for the standard deviation of N samples is

  (N - 1)\,s_N^2 = \langle f(x_i)^2 \rangle - \langle f(x_i) \rangle^2
                 = \frac{1}{N} \sum_{i=1}^{N} f(x_i)^2
                   - \left( \frac{1}{N} \sum_{i=1}^{N} f(x_i) \right)^2

If there is no deviation in the data then the RHS is zero
Given some deviation, as N→∞ the RHS will settle to some constant
value > 0 (in this case ~ 0.2420359…)
Thus we can write

  s_N \propto \frac{1}{\sqrt{N - 1}} \sim \frac{1}{\sqrt{N}}
A rough measure of how good a random number generator is: how well
does a histogram of the 10,000 estimates fit a Gaussian?
[Figure (mc.ps): histogram of the 10,000 integral estimates, 1000
samples per I integration; the standard deviation is 0.491/√1000.
Increasing the number of integral estimates makes the distribution
closer and closer to the infinite limit.]
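A minimal sketch (NumPy assumed) that repeats the 1000-sample estimate
10,000 times and compares the spread of the estimates with the predicted
s_N:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_estimates = 1000, 10_000

# 10,000 independent MC estimates of int_0^1 e^x dx, each from 1000 samples
x = rng.random((n_estimates, n_samples))
estimates = np.exp(x).mean(axis=1)      # each row gives one I_N = <f>_N

print(estimates.mean())                 # approaches e - 1 = 1.71828...
print(estimates.std())                  # spread of the estimates
print(np.sqrt(0.2420359 / n_samples))   # predicted s_N ~ 0.491 / sqrt(1000)
```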
Resulting statistics

For data that fits a Gaussian, the theory of probability distribution
functions asserts that
 68.3% of the data (<f>_N) will fall within ±s_N of the mean
 95.4% of the data (19/20) will fall within ±2s_N of the mean
 99.7% of the data will fall within ±3s_N, etc…
Interpretation of poll data:
 “These results will be accurate to ±4%, (19 times out of 20)”
 The ±4% corresponds to ±2s → s ≈ 0.02
 Since s ∝ 1/sqrt(N) → N ~ 2500
This highlights one of the difficulties with random sampling: to
improve the result by a factor of 2 we must increase N by a factor
of 4!
Why would we use this method to evaluate integrals?

For 1D it doesn’t make a lot of sense
 Taking h~1/N, then composite trapezoid rule error ~ h^2 ~ 1/N^2 = N^-2
 Double N, get a result 4 times better
In 2D, we can use an extension of the trapezoid rule to use squares
 Taking h~1/N^{1/2}, then error ~ h^2 ~ N^-1
 In 3D we get h~1/N^{1/3}, then error ~ h^2 ~ N^{-2/3}
 In 4D we get h~1/N^{1/4}, then error ~ h^2 ~ N^{-1/2}
MC beneficial for dimensions ≥ 4


Monte Carlo methods always have s_N ~ N^{-1/2} regardless of the
dimension
Comparing to the 4D convergence behaviour we see that MC integration
becomes practical at this point
 It wouldn’t make any sense for 3D though
For anything higher than 4D (e.g. 6D, 9D, which are possible!) MC
methods tend to be the only way of doing these calculations
MC methods also have the useful property of being comparatively
immune to singularities, provided that
 The random generator doesn’t hit the singularity
 The integral does indeed exist!
Importance sampling



In reality many integrals have functions that vary
rapidly in one part of the number line and more
slowly in others
To capture this behaviour with MC methods
requires that we introduce some way of “putting
our points where we need them the most”
We really want to introduce a new function into
the problem, one that allows us to put the
samples in the right places
General outline

Suppose we have two similar functions g(x) & f(x), and g(x) is easy
to integrate, then

  I = \int_a^b f(x)\,dx = \int_a^b \frac{f(x)}{g(x)}\,g(x)\,dx

Change variables: let dy = g(x)dx, then

  y(x) = \int^x g(x')\,dx'

when x = a, y = y(a); x = b, y = y(b)

  I = \int_{y(a)}^{y(b)} \frac{f(x(y))}{g(x(y))}\,dy
General outline cont



The integral we have derived,

  I = \int_{y(a)}^{y(b)} \frac{f(x(y))}{g(x(y))}\,dy,

has some nice properties:
 Because g(x)~f(x) (i.e. g(x) is a reasonable approximation of f(x)
that is easy to integrate) the integrand should be approximately 1
 and the integrand shouldn’t vary much!
It should be possible to calculate a good approximation with a
fairly small number of samples
Thus by applying the change of variables and mapping our sample
points we get a better answer with fewer samples
Example



Let’s look at integrating f(x)=e^x again on [0,1]
MC random samples are 0.23, 0.69, 0.51, 0.93
Our integral estimate is then

  I ≈ (1 - 0) \frac{1}{4} (e^{0.23} + e^{0.69} + e^{0.51} + e^{0.93})
    ≈ 1.8630

Difference to answer ≈ 1.8630 - 1.7183 ≈ 0.1447
Apply importance sampling

We first need to decide on our g(x) function; as a guess let us take
g(x) = 1 + x
 Well, it isn’t really a guess – we know this is the first two terms
of the Taylor expansion of e^x!
y(x) is thus given by

  y(x) = \int_0^x (1 + x')\,dx' = x + \frac{x^2}{2}

For the end points we get y(0)=0, y(1)=3/2
Rearrange y(x) to give x(y):

  x(y) = -1 + \sqrt{1 + 2y}
Set up integral & evaluate samples

The integral to evaluate is now

  I = \int_0^{3/2} \frac{e^x}{1 + x}\,dy
    = \int_0^{3/2} \frac{e^{-1 + \sqrt{1 + 2y}}}{\sqrt{1 + 2y}}\,dy

We must now choose y’s on the interval [0, 3/2]

  y       e^{-1 + \sqrt{1+2y}} / \sqrt{1+2y}
  0.345   1.038
  1.035   1.211
  0.765   1.135
  1.395   1.324

Close to 1 because g(x)~f(x)
Evaluate

For the new integral we have

  I ≈ (3/2 - 0) \frac{1}{4} (1.038 + 1.211 + 1.135 + 1.324)
    ≈ 1.7655

Difference to answer ≈ 1.7655 - 1.7183 ≈ 0.0472
So 3 times better than the previous estimate!
 Clearly this technique of ensuring the integrand doesn’t vary too
much is extremely powerful
 Importance sampling is particularly important in multidimensional
integrals and can add 1 or 2 significant figures of accuracy for a
minimal amount of effort
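A minimal sketch of this importance-sampled estimate in Python (NumPy
assumed), using the same g(x) = 1 + x and the change of variables derived
above, but with many random samples rather than the four shown:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Plain MC estimate of int_0^1 e^x dx
x = rng.random(n)
plain = np.mean(np.exp(x))

# Importance-sampled estimate with g(x) = 1 + x:
# y is uniform on [0, 3/2], x(y) = -1 + sqrt(1 + 2y),
# and the integrand f(x)/g(x) = e^x / (1 + x) is close to 1 everywhere.
y = rng.uniform(0.0, 1.5, n)
xy = -1.0 + np.sqrt(1.0 + 2.0 * y)
importance = 1.5 * np.mean(np.exp(xy) / (1.0 + xy))

exact = np.e - 1.0
print(abs(plain - exact), abs(importance - exact))  # second error is typically much smaller
```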
Rejection technique


Thus far we’ve looked in detail at the effect of changing sample
points on the overall estimate of the integral
An alternative approach may be necessary when you cannot easily
sample the desired region, which we’ll call W
 Particularly important in multi-dimensional integrals when you can
calculate the integral for a simple boundary but not a complex one
We define a larger region V that includes W
 Note you must also be able to calculate the size of V easily
The sample function is then redefined to be zero outside the volume,
but have its normal value within it
Rejection technique diagram
[Figure: f(x) inside a bounding region V, with the region W we want
to calculate lying under the curve]

Area of W = integral of region V multiplied by the fraction of points
falling below f(x) within V
Algorithm: just count the total number of points calculated & the
number in W!
Better selection of points: subrandom sequences


Choosing N points using a uniform deviate produces an error that
converges as N^-0.5
If we could choose points “better” we could make convergence faster
 For example, using a Cartesian grid of points leads to a method
that converges as N^-1
 Think of Cartesian points as “avoiding” one another and thus
sampling a given region more completely
 However, we don’t know a priori how fine the grid should be
 We want to avoid short-range correlations – points shouldn’t be
too close to one another
A better solution is to choose points that attempt to “maximally
avoid” one another
A list of sub-random sequences







 Tore-SQRT sequences
 Van der Corput & Halton sequences
 Faure sequence
 Generalized Faure sequence
 Nets & (t,s)-sequences
 Sobol sequence
 Niederreiter sequence
Many to choose from!

We’ll look very briefly at Halton & Sobol sequences, both of which
are covered in detail in Numerical Recipes
Halton’s sequence

Suppose in 1d we obtain the jth number in the sequence, denoted H_j,
via
 (1) write j as a number in base b, where b is prime
   e.g. 17 in base 3 is 122
 (2) Reverse the digits and place a radix point in front
   e.g. 0.221 in base 3
It should be clear why this works: adding an additional digit makes
the “mesh” of numbers progressively finer
For a sequence of points in n dimensions (x_i1,…,x_in) we would
typically use the first n primes to generate separate sequences for
each of the x_ij components
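A minimal sketch of the 1d construction in pure Python (the function name
is mine, for illustration only):

```python
def halton(j, b):
    """Return the jth Halton number in base b: write j in base b,
    reverse the digits, and place a radix point in front."""
    h, scale = 0.0, 1.0 / b
    while j > 0:
        j, digit = divmod(j, b)   # peel off the base-b digits of j
        h += digit * scale        # each reversed digit goes one place further right
        scale /= b
    return h

# 17 in base 3 is 122; reversing gives 0.221 (base 3) = 2/3 + 2/9 + 1/27
print(halton(17, 3))   # 0.9259...

# In n dimensions, use the first n primes, one per coordinate
points = [(halton(j, 2), halton(j, 3)) for j in range(1, 11)]
```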
2d Halton’s sequence example
Pairs of points constructed from base 3 & 5
Halton sequences
Sobol (1967) sequence


Useful method described in Numerical Recipes as providing close to
N^-1 convergence rate
Algorithms are also available at www.netlib.org

[Figure: 2d Sobol sequence points, from Numerical Recipes]
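Quasi-random generators are also available in libraries; as one hedged
example, SciPy's scipy.stats.qmc module provides both Halton and Sobol
samplers (module availability and the test integral are assumptions, not
part of the original slides):

```python
import numpy as np
from scipy.stats import qmc

# 2d Sobol points; scrambling randomizes the sequence while keeping
# its low-discrepancy structure
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
points = sampler.random(n=1024)     # 1024 points in [0, 1)^2

# Quasi-MC estimate of the double integral of e^(x+y) over the unit square
estimate = np.mean(np.exp(points[:, 0] + points[:, 1]))
print(estimate)   # exact value is (e - 1)^2 = 2.9525...
```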
Summary




MC methods are a useful way of numerically integrating systems that
are not tractable by other methods
The key part of MC methods is the N^-0.5 convergence rate
Numerical integration techniques can be greatly improved using
importance sampling
If you cannot write down a function easily then the rejection
technique can often be employed
Next Lecture

More on MC methods – simulating random
walks