Transcript: Lesson 1

Lesson 1: Introduction to Monte Carlo
• Go over outline and syllabus
• Background and course overview
• Statistical Formulae:
  • Mean
  • SD of population
  • SD of mean
  • A little practice
Background and course overview
• Monte Carlo methods are a branch of mathematics that involves the stochastic solution of problems.
• They are an experimental approach to solving a problem (the "Method of Statistical Trials").
• When the analyst is trying to use a Monte Carlo approach to estimate a physically measurable variable, the approach breaks down into two steps:
  1. Devise a numerical experiment whose expected value would correspond to the desired measurable value x.
  2. Run the problem to determine an estimate of this variable. We call the estimate x̂.
• Many (probably most) Monte Carlo problems are of the "Hit or Miss" category, which finds the probability of some event occurring (e.g., hitting a straight flush, a neutron escaping through a surface, etc.).
BG and Overview (2)
• The first step can be either very simple or very complicated, based on the particular features of the problem.
• If the problem is itself stochastic, the experimental design step is very simple: let the mathematical simulation simulate the problem. This is an analog simulation, since the calculation is a perfect analog of the problem.
• Lucky for us, the transport of neutral particles is a stochastic situation. All we HAVE to do to get a guess at a measurable effect from a transport situation is to simulate the stochastic "decisions" that nature makes.
BG and Overview (3)
• For processes that are NOT inherently stochastic, the experimental design is more complex and generally requires that the analyst:
  1. Derive an equation (e.g., heat transfer equation, Boltzmann transport equation) from whose solution an estimate of the effect of interest can be inferred.
  2. Develop a Monte Carlo method to solve the equation.
• In this course we will do BOTH approaches:
  • 3 weeks of event-based mathematical basics
  • 4 weeks of optimization of event-based transport methods
  • 3 weeks function-based, using a more formal "functional analysis" approach to the solution of integral and differential equations (with the BTE as an example)
Our First Example: Finding π
• Our first example will be a numerical estimation of π, based on use of a "hit or miss" approach. We know that the ratio of the area of a circle to the area of the square that (just barely) encloses it is:

  $$\Pr\{\text{hitting circle}\} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$$

• Knowing this, we can design an experiment that will deliver an expected value of π.
Our First Example (2)
1. Choose a point at random inside a 2×2 square by:
   A. Choosing a random number (x1) between -1 and 1 for the x coordinate, and
   B. Choosing a random number (x2) between -1 and 1 for the y coordinate.
   NOTE: By doing this you have made an implicit "change of variable" to this square (which is called a "unit hypercube of order 2").
Our First Example (3)
2. Score the result of the trial: consider a "hit" (score = 4) to be the situation when the chosen point is inside the circle, i.e.,

   $$x^2 + y^2 \le 1$$

   with a "miss" scoring 0. (Why does a success score 4?)
3. Run the experiment a large number (N) of times, with the final estimate of the circle's area being an average of the results:

   $$\hat{\pi}_N = \frac{\sum_{i=1}^{N} s_i}{N}$$
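Below is a minimal Java sketch of this hit-or-miss experiment (my own illustration, not the course's code), using java.util.Random as the stream of random numbers:

    import java.util.Random;

    // Hit-or-miss estimate of pi: each trial picks a point uniformly in the
    // 2x2 square and scores 4 for a hit (inside the unit circle), 0 for a miss.
    public class PiHitOrMiss {
        public static void main(String[] args) {
            int N = 1_000_000;           // number of trials (histories)
            Random rng = new Random();   // stream of random numbers on (0,1)
            double sum = 0.0;            // running sum of the scores s_i

            for (int i = 0; i < N; i++) {
                double x = 2.0 * rng.nextDouble() - 1.0;  // x1 in (-1,1)
                double y = 2.0 * rng.nextDouble() - 1.0;  // x2 in (-1,1)
                if (x * x + y * y <= 1.0) {
                    sum += 4.0;          // hit: score 4
                }                        // miss: score 0
            }

            System.out.println("Estimate of pi after " + N + " trials: " + sum / N);
        }
    }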
Coding
• This course will require lots of coding. You need to be able to write code in Java.
  • Syntax similar to other languages (if you stay non-object-oriented)
  • Free download of the language
  • Useful for other things (web pages)
  • Simple tutorial available in the Public area of the course (JavaLite.pdf)
Basic view of MC process
• Our basic view of a Monte Carlo process is a black box that has a stream of random numbers (between 0 and 1) as input and a stream of estimates of the effect of interest as output.
• Sometimes the estimates can be quite approximate, but with a long enough stream of estimates, we can get a good sample.
3 Formulae
• There are three statistical formulae that we will be using over and over in this course:
  • Our estimate of the expected value
  • Our estimate of the variance of the sample
  • Our estimate of the variance of the expected value
• You must be able to tell them apart.
Estimate of the expected value
• The first, and most important, deals with how we gather from the stream of estimates the BEST POSSIBLE estimate of the expected value. The resulting formula for x̂_N is:

  $$\hat{x}_N = \frac{\sum_{i=1}^{N} x_i}{N}$$

• Thus, our best estimate is the unweighted average of the individual estimates. This is not surprising, of course.
• Let's compare with a couple of exact formulae.
Mean of continuous distribution
• For a continuous distribution, p(x), over a range (a,b) (i.e., x = a to x = b), the true mean, x̄, is the first moment of x:

  $$\bar{x} = \int_a^b x\, p(x)\, dx$$

• where we have assumed that p(x) is a true probability density function (pdf), obeying the following:

  $$p(x) \ge 0 \text{ in the range } (a,b), \qquad \int_a^b p(x)\, dx = 1$$
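As a quick worked check (not on the slide): for the uniform pdf p(x) = 1/(b-a) on (a,b), these conditions hold and the mean is the midpoint of the range:

  $$\bar{x} = \int_a^b \frac{x}{b-a}\, dx = \frac{1}{b-a}\cdot\frac{b^2 - a^2}{2} = \frac{a+b}{2}$$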
Mean of discrete distribution
• For a discrete distribution we choose one of M choices, each of which has probability p_i.
• The equation for the mean is:

  $$\bar{x} = \sum_{i=1}^{M} p_i\, x_i$$

• Again, we have limitations on the probabilities:

  $$p_i \ge 0 \text{ for all } i, \qquad \sum_{i=1}^{M} p_i = 1$$
Example: π problem
• For our example of finding π, we were dealing with a binomial distribution (i.e., two possible outcomes):
  • Outcome 1 = Hit the circle:
    $$x_1 = 4, \qquad p_1 = \pi/4 \approx 0.7854$$
  • Outcome 2 = Miss the circle:
    $$x_2 = 0, \qquad p_2 = 1 - \pi/4 \approx 0.2146$$
• Therefore, the expected value is:

  $$\sum_{i=1}^{M} p_i\, x_i = 0.7854 \times 4 + 0.2146 \times 0 = \pi$$
Estimate of the sample variance
• Variance = expected squared error:

  $$\sigma^2(x) = \int_a^b p(x)\,(x-\bar{x})^2\, dx \quad \text{for a continuous distribution}$$

  $$\sigma^2(x) = \sum_{i=1}^{M} p_i\,(x_i-\bar{x})^2 \quad \text{for a discrete distribution}$$

• MC estimate of the variance:

  $$\sigma^2(x) \approx S^2(x_n) = \frac{\sum_{i=1}^{N}(x_i-\hat{x}_N)^2}{N-1} = \frac{N}{N-1}\left[\sum_{i=1}^{N}\frac{x_i^2}{N} - \left(\sum_{i=1}^{N}\frac{x_i}{N}\right)^2\right]$$
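A minimal Java sketch of these running-sum formulae (my own illustration, not the course's code), applied to the hit-or-miss π scores from the earlier example; it also forms the variance of the mean, S²/N, which is introduced on a later slide:

    import java.util.Random;

    // Running-sum estimates of the sample mean, the sample variance S^2, and
    // the variance of the mean S^2/N, using the hit-or-miss pi scores
    // (4 for a hit, 0 for a miss) as the x_i.
    public class SampleStatistics {
        public static void main(String[] args) {
            int N = 10_000;
            Random rng = new Random();
            double sum = 0.0;    // accumulates sum of x_i
            double sumSq = 0.0;  // accumulates sum of x_i^2

            for (int i = 0; i < N; i++) {
                double x = 2.0 * rng.nextDouble() - 1.0;
                double y = 2.0 * rng.nextDouble() - 1.0;
                double score = (x * x + y * y <= 1.0) ? 4.0 : 0.0;
                sum += score;
                sumSq += score * score;
            }

            double mean = sum / N;                                   // x-hat_N
            double sampleVar = (N / (double) (N - 1))
                    * (sumSq / N - (sum / N) * (sum / N));           // S^2(x_i)

            System.out.println("mean (x-hat_N)    = " + mean);
            System.out.println("S^2 of the sample = " + sampleVar);
            System.out.println("S^2 of the mean   = " + sampleVar / N);
            System.out.println("S of the mean     = " + Math.sqrt(sampleVar / N));
        }
    }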
Sample standard deviation
• The standard deviation is the square root of the variance.
• The same is true of our estimate of it:

  $$S(x_n) = \sqrt{S^2(x_n)}$$

  (Many texts call the estimate the "standard error".)
• Recall that we have been talking about properties of the sample distribution: how much the individual estimates differ from each other.
Example: π problem
• Using the same outcomes as before:

  $$\sigma^2(x) = \sum_{i=1}^{M} p_i\,(x_i-\bar{x})^2 = 0.7854\,(4-\pi)^2 + 0.2146\,\pi^2 = 2.697$$

  $$\sigma(x) = \sqrt{2.697} = 1.642$$

• This is a very non-normal distribution.
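As a cross-check (my own arithmetic, not on the slide), the same value follows from σ² = E[x²] - x̄²:

  $$\sigma^2(x) = E[x^2] - \bar{x}^2 = 0.7854 \times 4^2 + 0.2146 \times 0^2 - \pi^2 = 12.566 - 9.870 \approx 2.697$$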
Estimate of the variance of mean
• We now turn our attention to the variance and standard deviation of the mean:
  • How much confidence we have in the mean that we obtained from N samples.
• We could estimate this by making many estimates of the mean (each using N independent samples) and doing a statistical analysis on these estimates.
• To our great relief, we can streamline this process and get an estimate of the variance of the mean from a single set of N samples.
Variance of mean (cont’d)
• The text has a derivation showing that the variance of the mean is related to the variance of the distribution by:

  $$\sigma^2(\hat{x}_N) = \frac{\sigma^2(x)}{N}$$

• Since we do not know the actual variance, we have to use our estimate:

  $$\sigma^2(\hat{x}_N) \approx S^2(\hat{x}_N) = \frac{S^2(x_i)}{N}, \qquad \sigma(\hat{x}_N) \approx S(\hat{x}_N) = \frac{S(x_i)}{\sqrt{N}}$$
Example: π problem
• Back to our example of finding π: using the probabilities from the previous example, the standard deviation of the mean for a sample of N = 10,000 would be:

  $$\sigma(\hat{x}_N) \approx S(\hat{x}_N) = \frac{S(x_i)}{\sqrt{N}} = \frac{1.642}{\sqrt{10{,}000}} = 0.01642$$

• This brings us to the famous $N^{1/2}$ principle of Monte Carlo: each extra digit of accuracy requires that the problem be run with 100 times as many histories.
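To make the scaling concrete (my own arithmetic, not on the slide): since σ(x̂_N) ∝ 1/√N, shrinking the uncertainty above by a factor of 10 requires 100 times as many histories:

  $$\frac{\sigma(\hat{x}_{100N})}{\sigma(\hat{x}_{N})} = \sqrt{\frac{N}{100N}} = \frac{1}{10}, \qquad \text{e.g. } \frac{1.642}{\sqrt{1{,}000{,}000}} = 0.001642$$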
Markov inequality
• Most distributions are not normal.
• What can we say about the probability of a selection being within 1σ when the distribution is NOT normal?
• An upper bound is given by the Chebyshev inequality, but before attacking it, we need to build a tool we will use: the Markov inequality.
• Thought experiment: If I tell you that a group of people has an average weight of 100 pounds, what can you say about the number that weigh more than 200 pounds?
Markov inequality (2)

  $$E(x) = \bar{x} = \int_0^\infty x\, p(x)\, dx$$
  $$\ge \int_{n\bar{x}}^\infty x\, p(x)\, dx \quad \text{(because the range of integration is smaller)}$$
  $$\ge \int_{n\bar{x}}^\infty (n\bar{x})\, p(x)\, dx \quad \text{(because } n\bar{x} \text{ is the lower limit of } x\text{)}$$
  $$= n\bar{x} \int_{n\bar{x}}^\infty p(x)\, dx \quad \text{(because } n\bar{x} \text{ is a constant)}$$
  $$= n\bar{x}\, \Pr\{x \ge n\bar{x}\} \quad \text{(because the integral defines the probability)}$$

  $$\Rightarrow \quad \Pr\{x \ge n\bar{x}\} \le \frac{1}{n}$$
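Applying this to the thought experiment on the previous slide (the slides leave the answer implied): with x̄ = 100 pounds and n = 2,

  $$\Pr\{x \ge 200\} = \Pr\{x \ge 2\bar{x}\} \le \frac{1}{2}$$

so at most half of the group can weigh 200 pounds or more.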
Chebyshev inequality
• The Chebyshev inequality applies the Markov inequality to the variance instead of the average:

  $$E\left[(x-\bar{x})^2\right] = \sigma^2 \ge n\sigma^2\, \Pr\left\{(x-\bar{x})^2 \ge n\sigma^2\right\}
    \quad\Rightarrow\quad \Pr\left\{(x-\bar{x})^2 \ge n\sigma^2\right\} \le \frac{1}{n}$$

• Replace n with its square (it's just a positive number!):

  $$\Pr\left\{(x-\bar{x})^2 \ge n^2\sigma^2\right\} \le \frac{1}{n^2}
    \quad\Rightarrow\quad \Pr\left\{\,|x-\bar{x}| \ge n\sigma\,\right\} \le \frac{1}{n^2}$$
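As a quick numerical instance (my own, not on the slide; the 1σ, 2σ, and 3σ cases are left as questions on the next slide): taking n = 10,

  $$\Pr\{\,|x-\bar{x}| \ge 10\sigma\,\} \le \frac{1}{10^2} = 0.01$$

so at least 99% of any distribution lies within 10 standard deviations of its mean.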
Chebyshev inequality (2)
The resulting statements you can make are not very satisfying to us (especially since we are used to normal distributions):
• Normal: 68.3% within 1σ vs. Chebyshev: ?
• Normal: 95.4% within 2σ vs. Chebyshev: ?
• Normal: 99.7% within 3σ vs. Chebyshev: ?
• But Chebyshev is extremely valuable to theoretical mathematicians because it proves that the integral over the "tails" is guaranteed to decrease with n, with a limit of 0.
Law of Large Numbers
• The theoretical basis of Monte Carlo is the Law of Large Numbers.
• LLN: The weighted average value of the function, f̄, is

  $$\bar{f} = \int_a^b f(x)\, p(x)\, dx = \lim_{N\to\infty} \frac{\sum_{i=1}^{N} f(x_i)}{N}, \qquad \text{where the } x_i \text{ are chosen using } p(x)$$

• This relates the result of a continuous integration to the result of a discrete sampling. All MC comes from this.
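A minimal Java sketch of the LLN in action (my own illustration, not from the slides): estimating the integral of f(x) = x² over (0,1), whose exact value is 1/3, by sampling the x_i from the uniform pdf p(x) = 1 and averaging f(x_i):

    import java.util.Random;

    // Law of Large Numbers as Monte Carlo integration: average f(x_i) over
    // x_i drawn from p(x) = 1 on (0,1) to estimate the integral of f.
    public class LlnIntegration {
        public static void main(String[] args) {
            int N = 1_000_000;
            Random rng = new Random();
            double sum = 0.0;

            for (int i = 0; i < N; i++) {
                double x = rng.nextDouble();  // x_i sampled from p(x) = 1 on (0,1)
                sum += x * x;                 // accumulate f(x_i) = x_i^2
            }

            System.out.println("MC estimate of integral of x^2 on (0,1): " + sum / N);
            System.out.println("Exact value: " + (1.0 / 3.0));
        }
    }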
Law of Large Numbers (2)
• At first glance, this looks like it would be useful for mathematicians trying to estimate integrals, but not particularly useful to us: we are not performing integrations, we are simulating physical phenomena.
• This attitude indicates that you are just not thinking abstractly enough: all Monte Carlo processes are (once you dig down) integrations over a domain of "all possible outcomes".
  • Our values of "x" range over all possible histories that a particle might have.
Central limit theorem
• The second most important (i.e., useful) theoretical result for Monte Carlo is the Central Limit Theorem.
• CLT: The sum of a sufficiently large number of independent, identically distributed (i.i.d.) random variables becomes normally distributed as N increases.
• This is useful for us because we can draw useful conclusions from the results of a large number of samples (e.g., 68.3% within one standard deviation, etc.).