Monte Carlo simulation - University of South Carolina


Random Numbers and Simulation

Generating truly random numbers with a computer program is not possible
• Programs have been developed to generate pseudo-random numbers
• Values are generated from deterministic algorithms
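To make "deterministic algorithm" concrete, here is a minimal Python sketch of a linear congruential generator (LCG), a classic pseudo-random scheme. The constants a = 1664525, c = 1013904223, m = 2^32 are a commonly quoted textbook choice and are an assumption here; R and Python both default to the Mersenne Twister, not an LCG. The point is only that the same seed always reproduces the same sequence.

```python
class LCG:
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_uniform(self) -> float:
        """Advance the state and return a value in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def first_values(seed: int, k: int = 5) -> list:
    """The first k uniforms produced from a given seed."""
    gen = LCG(seed)
    return [gen.next_uniform() for _ in range(k)]


if __name__ == "__main__":
    print(first_values(2011))
    # Re-seeding reproduces the sequence exactly: nothing here is truly random.
    print(first_values(2011) == first_values(2011))  # True
```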
© Fall 2011 John Grego and the University of South Carolina
Random Numbers
Pseudo-random deviates from a good generator can pass standard statistical tests for randomness
• They appear to be independent and identically distributed
• Random number generators for common distributions are available in R
• Special techniques (STAT 740) may be needed for other distributions
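The slides rely on R's built-in generators (for example rnorm, runif, rexp, rbeta). As a rough Python analogue, assuming only the standard library's random module, the sketch below draws from four common distributions with a fixed seed and checks that each sample mean lands near its theoretical value (10, 2, 1 and 5/6, respectively).

```python
import random
import statistics

rng = random.Random(740)  # seeded so the results are reproducible

n = 10_000
samples = {
    "normal N(10, 1)": [rng.gauss(10, 1) for _ in range(n)],
    "uniform U[0, 4]": [rng.uniform(0, 4) for _ in range(n)],
    "exponential, rate 1": [rng.expovariate(1.0) for _ in range(n)],
    "Beta(5, 1)": [rng.betavariate(5, 1) for _ in range(n)],
}

for name, xs in samples.items():
    print(f"{name:22s} sample mean = {statistics.fmean(xs):.3f}")
```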

Monte Carlo Simulation

Some common uses of simulation
• Modeling stochastic behavior
• Calculating definite integrals
• Approximating the sampling distribution of a statistic (e.g., the maximum of a random sample)
Modeling Stochastic Behavior



Buffon’s needle
Random Walk
Observe X1, X2, …, where p = P(Xi = 1) = P(Xi = −1) = .5, and study S1, S2, …, where
$$S_i = \sum_{j=1}^{i} X_j$$
Modeling Stochastic Behavior

This random walk is also called Gambler’s ruin: each Xi represents a $1 bet with a return of $2 for a win and $0 for a loss.
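A minimal Python sketch of this process, using only the standard library: each step X_j is +1 with probability p and −1 otherwise, so S_i is the gambler's net winnings after i bets. The function name random_walk and the seed are illustrative choices, not taken from the slides.

```python
import random
from itertools import accumulate


def random_walk(n: int, p: float, rng: random.Random) -> list:
    """Return [S_1, ..., S_n], where each step X_j is +1 with probability p, else -1."""
    steps = [1 if rng.random() < p else -1 for _ in range(n)]
    return list(accumulate(steps))  # S_i = X_1 + ... + X_i


if __name__ == "__main__":
    path = random_walk(20, 0.5, random.Random(1))
    print(path)  # one realization of a fair game over 20 bets
```

Passing p ≠ .5 gives the unfair game.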
Gambler’s Ruin


The properties of a fair game (p=.5) are
a lot more interesting than the properties
of an unfair game (p≠.5)
Some properties of this process are easy to anticipate (e.g., E(Si) = i(2p − 1), which is 0 for a fair game)
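Why E(Si) is easy: each step has E(Xj) = p − (1 − p) = 2p − 1, so by linearity E(Si) = i(2p − 1). The Python sketch below (mean_final_position is a hypothetical helper) checks this by averaging S_n over many simulated walks; for n = 50 the formula gives 0 when p = .5 and 10 when p = .6.

```python
import random


def mean_final_position(n: int, p: float, reps: int, seed: int = 0) -> float:
    """Average of S_n over `reps` simulated walks: a Monte Carlo estimate of E(S_n)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        # S_n is the sum of n independent +1/-1 steps
        total += sum(1 if rng.random() < p else -1 for _ in range(n))
    return total / reps


if __name__ == "__main__":
    n, reps = 50, 20_000
    for p in (0.5, 0.6):
        print(f"p={p}: simulated {mean_final_position(n, p, reps):.3f}, "
              f"theory {n * (2 * p - 1):.3f}")
```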
Gambler’s Ruin

Some properties are difficult to anticipate, but can be estimated by simulation:
• Expected number of returns to 0
• Expected length of a winning streak
• Probability of going broke given an initial
bank
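A Python sketch of two of these estimates. With no upper limit, a gambler in a fair game goes broke eventually with probability 1, so ruin_probability assumes (hypothetically) that play also stops when the bank reaches a target amount; for a fair game the exact answer is then 1 − bank/target, which gives the simulation something to be checked against. mean_returns_to_zero counts visits to 0 in a fixed number of steps. Both function names are illustrative.

```python
import random


def ruin_probability(bank: int, target: int, p: float, reps: int, seed: int = 0) -> float:
    """Estimate P(fortune hits 0 before reaching `target`), starting from `bank` dollars.

    Each bet wins $1 with probability p and loses $1 otherwise.
    """
    rng = random.Random(seed)
    ruined = 0
    for _ in range(reps):
        s = bank
        while 0 < s < target:  # keep betting until broke or at the target
            s += 1 if rng.random() < p else -1
        ruined += (s == 0)
    return ruined / reps


def mean_returns_to_zero(n: int, reps: int, seed: int = 0) -> float:
    """Estimate the expected number of times a fair walk revisits 0 within n steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        s = 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            total += (s == 0)
    return total / reps


if __name__ == "__main__":
    print("P(broke | bank=3, target=10):", ruin_probability(3, 10, 0.5, 20_000), "exact: 0.7")
    print("returns to 0 in 100 steps:", mean_returns_to_zero(100, 5_000))
```

For 100 steps the exact expected number of returns to 0 is about 7.04, so the second estimate can be checked as well.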
Calculating Definite Integrals

In statistics, we often have to calculate
difficult definite integrals (posterior
distributions, expected values)
$$I = \int_a^b h(x)\,dx$$
(here, x could be multidimensional)

Calculating Definite Integrals

Example 1
$$I_1 = \int_0^1 \frac{4}{1+x^2}\,dx$$

Example 2
$$I_2 = \int_0^1 \int_0^1 \left(4 - x_1^2 - 2x_2^2\right) dx_2\,dx_1$$
Hit-or-Miss Monte Carlo

Example 1
$$h(x) = \frac{4}{1+x^2}$$
$$\int_0^1 \frac{4}{1+x^2}\,dx = 4\left(\arctan(1) - \arctan(0)\right) = 4\left(\frac{\pi}{4} - 0\right) = \pi$$

Choose c such that c ≥ h(x) over the entire region of interest (here, c = 4)
Hit-or-Miss Monte Carlo
Generate n random uniform (Xi, Yi) pairs, with the Xi’s from U[a,b] (here, U[0,1]) and the Yi’s from U[0,c] (here, U[0,4])
• Count the number of times (call this m) that Yi is less than h(Xi)
• Then I1 ≈ c(b − a)m/n

• I.e., (height)(width)(proportion under curve)
Classical Monte Carlo Integration
$$I = \int_a^b h(x)\,dx$$
Take n random uniform values, U1, …, Un, over [a, b] and estimate I using
$$\hat{I} = \frac{b-a}{n}\sum_{i=1}^{n} h(U_i)$$
This method looks almost too simple, but it is actually more efficient than Hit-or-Miss Monte Carlo: for the same n, its estimate has smaller variance
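A minimal Python sketch of the classical estimator, applied to both examples. For Example 2 it reads the integrand as 4 − x1² − 2x2² on the unit square (an interpretation of the slide's formula), whose exact value is 4 − 1/3 − 2/3 = 3; since the square has area 1, the (b − a) factor becomes 1. Function names are illustrative.

```python
import math
import random


def crude_mc(h, a: float, b: float, n: int, seed: int = 0) -> float:
    """Classical Monte Carlo: (b - a) times the average of h at n uniform points on [a, b]."""
    rng = random.Random(seed)
    return (b - a) * sum(h(rng.uniform(a, b)) for _ in range(n)) / n


def crude_mc_unit_square(h2, n: int, seed: int = 0) -> float:
    """Same idea on [0, 1] x [0, 1]; the region has area 1, so no scaling is needed."""
    rng = random.Random(seed)
    return sum(h2(rng.random(), rng.random()) for _ in range(n)) / n


if __name__ == "__main__":
    n = 100_000
    print("I1 ~", crude_mc(lambda x: 4 / (1 + x * x), 0.0, 1.0, n), "(exact: pi)")
    print("I2 ~", crude_mc_unit_square(lambda x1, x2: 4 - x1**2 - 2 * x2**2, n), "(exact: 3)")
```

For Example 1 the efficiency claim can be made exact: a single Hit-or-Miss draw is 4 times a Bernoulli(π/4) variable, with variance 16(π/4)(1 − π/4) = 4π − π² ≈ 2.70, while a single classical draw h(U) has variance ∫₀¹ h(x)² dx − π² = 2π + 4 − π² ≈ 0.41. The classical estimate therefore needs about 6.5 times fewer draws for the same precision.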

Expected Value of a Function of a Random Variable

Suppose X is a random variable with density f. Find E[h(X)] for some function h, e.g.,
$$E[X], \qquad E\left[\sqrt{X}\right], \qquad E[\sin X]$$
Expected Value of a Function of a Random Variable
$$E[h(X)] = \int_{\mathcal{X}} h(x)\,f(x)\,dx$$

For n random values X1, X2, …, Xn from
the distribution of X (i.e., with density f),

$$E[h(X)] \approx \frac{1}{n}\sum_{i=1}^{n} h(X_i)$$
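A generic version of this estimator as a minimal Python sketch: mc_expectation (a hypothetical helper) averages h over draws produced by a user-supplied sampler. The check uses an example not taken from the slides, X ~ U(0, π) with h = sin, for which E[sin X] = (1/π)∫₀^π sin x dx = 2/π ≈ 0.637.

```python
import math
import random
from typing import Callable


def mc_expectation(h: Callable[[float], float],
                   sampler: Callable[[random.Random], float],
                   n: int, seed: int = 0) -> float:
    """Estimate E[h(X)] by (1/n) * sum h(X_i), with X_i = sampler(rng) drawn from density f."""
    rng = random.Random(seed)
    return sum(h(sampler(rng)) for _ in range(n)) / n


if __name__ == "__main__":
    est = mc_expectation(math.sin, lambda r: r.uniform(0, math.pi), 100_000)
    print(est, "vs 2/pi =", 2 / math.pi)
```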
Examples
Example 3: If X is a random variable with a N(10,1) distribution, find E(X²)
• Example 4: If Y is a random variable with a Beta(5,1) distribution, find E(−ln Y)
• There are more advanced methods of integration using simulation (e.g., importance sampling)
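Both examples have closed-form answers that a simulation can be checked against: E(X²) = Var(X) + (E X)² = 1 + 100 = 101, and since −ln Y is exponential with rate 5 when Y ~ Beta(5,1), E(−ln Y) = 1/5. A minimal Python sketch (sample size and seed are arbitrary choices):

```python
import math
import random

rng = random.Random(2011)
n = 100_000

# Example 3: X ~ N(10, 1); estimate E(X^2) by the average of X_i^2.
ex3 = sum(rng.gauss(10, 1) ** 2 for _ in range(n)) / n

# Example 4: Y ~ Beta(5, 1); estimate E(-ln Y) by the average of -ln(Y_i).
ex4 = sum(-math.log(rng.betavariate(5, 1)) for _ in range(n)) / n

print(f"Example 3: {ex3:.3f} (exact: 101)")
print(f"Example 4: {ex4:.4f} (exact: 0.2)")
```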

Integration
In R, integrate() performs numerical integration for functions of a single variable (not using simulation techniques)
• adapt() in the adapt package performs multivariate numerical integration
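For contrast with Monte Carlo, here is a Python sketch of a much simpler deterministic rule, composite Simpson's rule. It is not what integrate() does (R documents integrate() as adaptive quadrature based on QUADPACK), but like integrate() it uses no random numbers, so the same call always returns the same value. On Example 1's integrand 4/(1 + x²) it matches π to many digits with only 101 function evaluations.

```python
import math


def simpson(h, a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's rule with n (even) subintervals: deterministic, no random numbers."""
    if n % 2:
        raise ValueError("n must be even")
    width = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        # interior points alternate weights 4, 2, 4, ..., 4
        total += (4 if i % 2 else 2) * h(a + i * width)
    return total * width / 3


if __name__ == "__main__":
    print(simpson(lambda x: 4 / (1 + x * x), 0.0, 1.0), "vs pi =", math.pi)
```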

Approximating the Sampling Distribution of a Statistic
To perform inference (CIs, hypothesis tests) based on a sample statistic, we need to know the sampling distribution of the statistic, at least approximately
• Example: X1, X2, …, Xn ~ iid N(μ, σ²). Then
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t(n − 1) distribution, where X̄ and s are the sample mean and sample standard deviation
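To see simulation recover this known result, the Python sketch below (t_statistic and tail_fraction are hypothetical helpers) draws many normal samples of size n = 5, computes T each time, and records how often |T| exceeds 2.776, approximately the 97.5th percentile of t(4). If T really follows t(n − 1), that fraction should be close to 0.05. The values μ = 10 and σ = 2 are arbitrary.

```python
import math
import random
import statistics


def t_statistic(xs, mu: float) -> float:
    """T = (xbar - mu) / (s / sqrt(n)); statistics.stdev uses the n - 1 divisor, like s."""
    n = len(xs)
    return (statistics.fmean(xs) - mu) / (statistics.stdev(xs) / math.sqrt(n))


def tail_fraction(n: int, mu: float, sigma: float, cutoff: float,
                  reps: int, seed: int = 0) -> float:
    """Fraction of simulated |T| values above `cutoff`, for iid N(mu, sigma^2) samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(xs, mu)) > cutoff:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # 2.776 is approximately the 97.5th percentile of t(4), so expect roughly 0.05.
    print(tail_fraction(n=5, mu=10.0, sigma=2.0, cutoff=2.776, reps=20_000))
```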
Approximating the Sampling Distribution of a Statistic

What if the data’s distribution is not
known?
• Large sample: Central Limit Theorem
• Small sample: Normal theory or
nonparametric procedures based on
permutation distributions
Approximating the Sampling Distribution of a Statistic

If the population distribution is known, we
can approximate the sampling distribution
with simulation.
• Repeatedly (m times) generate random
samples of size n from the population
distribution
• Calculate a statistic (say, S) each time
• The empirical (observed) distribution of the S values approximates the true distribution of S
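The three steps translate directly into a small Python sketch. simulate_statistic (a hypothetical helper) draws m samples of size n, applies the statistic to each, and returns the m simulated S values, whose empirical distribution is the approximation. As a check it uses the maximum of a random sample: for n = 5 draws from U(0, 1), the maximum has exact mean n/(n + 1) = 5/6.

```python
import random
import statistics
from typing import Callable, Sequence


def simulate_statistic(statistic: Callable[[Sequence[float]], float],
                       draw: Callable[[random.Random], float],
                       n: int, m: int, seed: int = 0) -> list:
    """Return m simulated values of `statistic`, each computed from a fresh sample of size n."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(m)]


if __name__ == "__main__":
    s_values = simulate_statistic(max, lambda r: r.random(), n=5, m=10_000)
    print("mean of S:", statistics.fmean(s_values), "(exact: 5/6)")
    print("empirical 90th percentile:", sorted(s_values)[int(0.9 * len(s_values))])
```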

Example

X1, X2, X3, X4 ~Expon(1)

What is the sampling distribution of:
• $\bar{X}$ (the mean)
• $\dfrac{\max(X) + \min(X)}{2}$ (the midrange)
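A Python sketch of this example, with m = 20,000 simulated samples (an arbitrary choice). For reference, the exact answers are known: X̄ has mean 1 and standard deviation 1/2 (it is Gamma with shape 4 and rate 4), while the midrange has mean (25/12 + 1/4)/2 = 7/6 ≈ 1.167, using E[max] = 1 + 1/2 + 1/3 + 1/4 = 25/12 and E[min] = 1/4, and standard deviation about 0.635. For exponential data the midrange is therefore both biased and more variable than the mean, and the simulated summaries should land close to these values.

```python
import random
import statistics

rng = random.Random(20)
m, n = 20_000, 4  # m simulated samples, each of size n

means, midranges = [], []
for _ in range(m):
    xs = [rng.expovariate(1.0) for _ in range(n)]
    means.append(sum(xs) / n)                  # the mean
    midranges.append((max(xs) + min(xs)) / 2)  # the midrange

for name, vals in (("mean", means), ("midrange", midranges)):
    q = statistics.quantiles(vals, n=4)  # quartiles of the simulated values
    print(f"{name:8s} average={statistics.fmean(vals):.3f} "
          f"sd={statistics.stdev(vals):.3f} quartiles={[round(v, 3) for v in q]}")
```

A histogram of each list gives the full approximate sampling distribution; the quartiles are a compact text summary of its shape.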