Lecture 6: Bootstraps and Maximum Likelihood Methods
Bootstrapping: a way to generate empirical probability distributions. Very handy for making estimates of uncertainty.
100 realizations of a normal distribution p(y) with ȳ = 50 and σ_y = 100.
What is the distribution of the estimate y_est = (1/N) Σ_i y_i ?
We know this should be a normal distribution with expectation ȳ = 50 and variance σ_y²/N, that is, a standard deviation of σ_y/√N = 10.
[Figure: p(y) vs. y, and p(y_est) vs. y_est]
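As a concrete illustration (a minimal sketch, not part of the original slides; variable names are my own), the 100 realizations and the estimate could be generated in MatLab like this:

N = 100;
y = 50 + 100*randn(N,1);   % 100 realizations of the normal distribution p(y)
yest = mean(y);            % the estimate y_est = (1/N) * sum(y_i)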
Here's an empirical way of determining the distribution, called bootstrapping.
N original data    random integers in the range 1 to N    N resampled data
y_1                4                                       y'_1
y_2                3                                       y'_2
y_3                7                                       y'_3
y_4                11                                      y'_4
y_5                4                                       y'_5
y_6                1                                       y'_6
y_7                9                                       y'_7
…                  …                                       …
y_N                6                                       y'_N
(Each y'_i is the original datum whose index is the random integer in its row, e.g. y'_1 = y_4.)
Compute the estimate y'_est = (1/N) Σ_i y'_i
Now repeat a gazillion times and examine the resulting distribution of estimates.
Note that we are doing random sampling with replacement of the original dataset y to create a new dataset y'.
Note: the same datum, y_i, may appear several times in the new dataset, y'.
The analogy: a pot holds an infinite number of y's with distribution p(y). A cup of N y's is drawn from the pot, poured into a new pot, and one cup is then taken from that new pot.
Does a cup drawn from the pot capture the statistical behavior of what's in the pot? Is what's in the two pots more or less the same thing?
Random sampling is easy to code in MatLab:
yprime = y(unidrnd(N,N,1));
Here unidrnd(N,N,1) returns a vector of N random integers between 1 and N; y is the original data and yprime is the resampled data.
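The full bootstrap loop could be sketched as follows (a minimal sketch under the assumptions above; Nboot and the histogram call are illustrative choices, not from the lecture):

Nboot = 100000;                   % number of bootstrap realizations
yest = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1));  % resample the original data with replacement
    yest(k) = mean(yprime);       % estimate from this resampled dataset
end
hist(yest,50);                    % empirical distribution of the estimates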
The theoretical and bootstrap results match pretty well!
[Figure: theoretical distribution compared with a bootstrap of 10^5 realizations]
Obviously bootstrapping is of limited utility when we know the theoretical distribution (as in the previous example), but it can be very useful when we don't.
For example, what's the distribution of σ_y^est, where (σ_y^est)² = (1/(N−1)) Σ_i (y_i − y_est)² and y_est = (1/N) Σ_i y_i ?
(Yes, I know a statistician would know the theoretical answer, since (N−1)(σ_y^est)²/σ_y² follows a chi-squared distribution …)
To do the bootstrap we calculate
y'_est = (1/N) Σ_i y'_i
(σ_y'^est)² = (1/(N−1)) Σ_i (y'_i − y'_est)²
and σ_y'^est = √[(σ_y'^est)²]
many times, say 10^5 times.
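In MatLab this could be sketched as (a sketch only; variable names such as sest are my own):

Nboot = 100000;
sest = zeros(Nboot,1);
for k = 1:Nboot
    yprime  = y(unidrnd(N,N,1));                     % resample with replacement
    ypest   = mean(yprime);                          % y'_est
    sest(k) = sqrt( sum((yprime-ypest).^2)/(N-1) );  % sigma_y'_est
end
Ebar = mean(sest);                % expected value of the estimates
sd   = std(sest);                 % spread of the estimates (2*sd gives the 95% interval)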
Here's the bootstrap result:
[Figure: p(σ_y^est) from a bootstrap with 10^5 realizations; the true value σ_y^true = 100 and the estimate σ_y^est are marked]
I numerically calculate an expected value of 92.8 and a standard deviation of 6.2. Note that the distribution is not quite centered about the true value of 100.
This is random variation. The original N = 100 data are not quite representative of an infinite ensemble of normally distributed values.
So we would be justified in saying σ_y ≈ 92.6 ± 12.4, that is, ±2×6.2, the 95% confidence interval.
The Maximum Likelihood Method: a way to fit parameterized probability distributions to data. Very handy when you have good reason to believe the data follow a particular distribution.
Likelihood Function, L: the logarithm of the probable-ness of a given dataset.
Suppose N data y are all drawn from the same distribution p(y). The probable-ness of a single measurement y_i is p(y_i), so the probable-ness of the whole dataset is
p(y_1) × p(y_2) × … × p(y_N) = Π_i p(y_i)
and
L = ln Π_i p(y_i) = Σ_i ln p(y_i)
Now imagine that the distribution p(y) is known up to a vector m of unknown parameters; write p(y; m), with the semicolon as a reminder that it's not a joint probability. Then L is a function of m:
L(m) = Σ_i ln p(y_i; m)
The Principle of Maximum Likelihood: choose m so that it maximizes L(m), that is, ∂L/∂m_i = 0.
The dataset that was in fact observed is then the most probable one that could have been observed.
Example: a normal distribution of unknown mean ȳ and variance σ².
p(y_i) = (2π)^(−1/2) σ^(−1) exp{ −½ σ^(−2) (y_i − ȳ)² }
L = Σ_i ln p(y_i) = −½ N ln(2π) − N ln(σ) − ½ σ^(−2) Σ_i (y_i − ȳ)²
∂L/∂ȳ = 0 = σ^(−2) Σ_i (y_i − ȳ)
∂L/∂σ = 0 = −N σ^(−1) + σ^(−3) Σ_i (y_i − ȳ)²
(The factors of N arise because the sums run from 1 to N.)
Solving for ȳ and σ:
0 = σ^(−2) Σ_i (y_i − ȳ)  gives  ȳ = N^(−1) Σ_i y_i
0 = −N σ^(−1) + σ^(−3) Σ_i (y_i − ȳ)²  gives  σ² = N^(−1) Σ_i (y_i − ȳ)²
Interpreting the results:
ȳ = N^(−1) Σ_i y_i, so the sample mean is the maximum likelihood estimate of the expected value of the normal distribution.
σ² = N^(−1) Σ_i (y_i − ȳ)², so the sample variance (more or less*) is the maximum likelihood estimate of the variance of the normal distribution.
*There is the issue of N vs. N−1 in the formula.
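A quick MatLab aside on that footnote (a sketch, not from the lecture): MatLab's default var normalizes by N−1, while the maximum likelihood estimate normalizes by N:

ybar  = mean(y);    % ML estimate of the expected value
s2_ml = var(y,1);   % ML estimate of the variance (normalizes by N)
s2    = var(y);     % the usual sample variance (normalizes by N-1)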
Example: 100 data drawn from a normal distribution with true ȳ = 50 and σ = 100.
[Figure: the likelihood surface L(ȳ, σ); its maximum is at ȳ = 62, σ = 107]
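A likelihood surface like this one could be computed by a brute-force grid search, sketched below (the grid limits and spacing are illustrative assumptions; y and N are the data from the example):

ybars  = linspace(0,100,201);             % trial values of the mean
sigmas = linspace(50,200,201);            % trial values of sigma
L = zeros(length(sigmas),length(ybars));
for i = 1:length(sigmas)
    for j = 1:length(ybars)
        L(i,j) = -0.5*N*log(2*pi) - N*log(sigmas(i)) ...
                 - 0.5*sum((y-ybars(j)).^2)/sigmas(i)^2;
    end
end
[Lmax,k] = max(L(:));                     % locate the maximum of L
[imax,jmax] = ind2sub(size(L),k);
ybar_ml = ybars(jmax);  sigma_ml = sigmas(imax);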
Another example: the exponential distribution
p(y_i) = ½ σ^(−1) exp{ −σ^(−1) |y_i − ȳ| }
Is the parameter ȳ really the expectation? Is the parameter σ² really the variance?
Check the normalization. Use z = y_i − ȳ:
∫_{−∞}^{+∞} p(y_i) dy_i = ½ σ^(−1) ∫_{−∞}^{+∞} exp{ −σ^(−1) |y_i − ȳ| } dy_i
= ½ σ^(−1) × 2 ∫_0^{+∞} exp{ −σ^(−1) z } dz
= σ^(−1) (−σ) exp{ −σ^(−1) z } |_0^{+∞} = 1
Is ȳ the expectation?
E(y_i) = ∫_{−∞}^{+∞} y_i × ½ σ^(−1) exp{ −σ^(−1) |y_i − ȳ| } dy_i
Use z = y_i − ȳ:
E(y_i) = ½ σ^(−1) ∫_{−∞}^{+∞} (z + ȳ) exp{ −σ^(−1) |z| } dz
The term z exp{ −σ^(−1) |z| } is an odd function times an even function, so its integral is zero, leaving
E(y_i) = ½ σ^(−1) × 2ȳ ∫_0^{+∞} exp{ −σ^(−1) z } dz = −ȳ exp{ −σ^(−1) z } |_0^{+∞} = ȳ
YES!
Is s the variance ?
+
var(yi) = - (yi-y)2 ½ s-1 exp{ - s-1 |yi-y| } dyi
use z= s-1(yi-y)
+ 2 2
-1
E(yi) = ½ s - s z exp{ -|z| } s dz
=
s2
+ 2
0 z
= 2 s 2  s2
exp{ -z } dz
CRC Math Handbook gives this
integral as equal to 2
Not Quite …
Maximum likelihood estimate:
L = N ln(½) − N ln(σ) − σ^(−1) Σ_i |y_i − ȳ|
∂L/∂ȳ = 0 = σ^(−1) Σ_i sgn(y_i − ȳ)
∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σ_i |y_i − ȳ|
(Here we use d|x|/dx = sgn(x), which is +1 for x > 0 and −1 for x < 0.)
The first condition requires ȳ to be such that Σ_i sgn(y_i − ȳ) = 0: the sum is zero when half the y_i's are bigger than ȳ and half are smaller, so ȳ is the median of the y_i's.
Once ȳ is known, then ∂L/∂σ = 0 = −N σ^(−1) + σ^(−2) Σ_i |y_i − ȳ| gives
σ = N^(−1) Σ_i |y_i − ȳ|  with  ȳ = median(y)
Note that when N is even, ȳ is not unique; it can be anything between the two middle values in a sorted list of y_i's.
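In MatLab the two exponential-distribution estimates could be sketched as (a sketch; variable names are my own):

ybar_exp  = median(y);                % ML estimate of the expected value
sigma_exp = mean(abs(y - ybar_exp));  % sigma = (1/N) * sum(|y_i - y_est|)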
Comparison:
Normal distribution: the best estimate of the expected value is the sample mean.
Exponential distribution: the best estimate of the expected value is the sample median.
Comparison:
Normal distribution: short-tailed; outliers are extremely uncommon; the expected value should be chosen to make outliers have as small a deviation as possible.
Exponential distribution: relatively long-tailed; outliers are relatively common; the expected value should ignore the actual value of outliers.
[Figure: two datasets y_i, one with an outlier; the median and mean are marked in each]
Another important distribution: the Gutenberg-Richter distribution (e.g. earthquake magnitudes).
For earthquakes greater than some threshold magnitude m0, the probability that the earthquake will have a magnitude greater than m is
P(m) = 10^(−b (m−m0))
or
P(m) = exp{ −log(10) b (m−m0) } = exp{ −b' (m−m0) }  with  b' = log(10) b
This is a cumulative (exceedance) distribution, so the probability that the magnitude is greater than m0 is unity:
P(m0) = exp{0} = 1
The probability density distribution is its derivative (apart from sign, since P decreases with m):
p(m) = b' exp{ −b' (m−m0) }
The maximum likelihood estimate of b' follows from
L(b') = N ln(b') − b' Σ_i (m_i − m0)
∂L/∂b' = 0 = N/b' − Σ_i (m_i − m0)
b' = N / Σ_i (m_i − m0)
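A one-line MatLab version of this estimate could look like the sketch below (assuming m is a vector of magnitudes above the threshold m0; names are my own):

bprime = length(m) / sum(m - m0);     % b' = N / sum(m_i - m0)
b = bprime / log(10);                 % Gutenberg-Richter b, since b' = log(10)*b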
Originally Gutenberg & Richter made a mistake …
[Figure: log10 P(m) vs. magnitude m, a straight line with slope = −b]
… by estimating the slope b using least-squares, and not the maximum likelihood formula.
Yet another important distribution: the Fisher distribution on a sphere (e.g. paleomagnetic directions).
Given unit vectors x_i that scatter around some mean direction x̄, the probability distribution for the angle θ between x_i and x̄ (that is, cos(θ) = x_i · x̄) is
p(θ) = [κ / (2 sinh(κ))] sin(θ) exp{ κ cos(θ) }
κ is called the "precision parameter".
Rationale for the functional form: p(θ) ∝ exp{ κ cos(θ) }.
For θ close to zero, cos(θ) ≈ 1 − ½θ², so
p(θ) ∝ exp{ κ cos(θ) } ≈ exp{κ} exp{ −½ κ θ² }
which is a Gaussian.
I'll let you figure out the maximum likelihood estimate of the central direction, x̄, and the precision parameter, κ.