Lecture 10 - Lamont–Doherty Earth Observatory


Lecture 10
Hypothesis Testing
from a previous lecture …
Functions of a Random Variable
any function of a random variable is itself a random variable
If x has distribution p(x)
then y(x) has distribution
p(y) = p[x(y)] |dx/dy|
example
Let x have a uniform (white) distribution on [0,1]
[Figure: p(x) = 1 on the interval [0,1], zero elsewhere.]
Uniform probability that x is anywhere between 0 and 1
Let y = x²
then x = √y
p[x(y)] = 1 and dx/dy = ½ y^(-1/2)
So p(y) = ½ y^(-1/2) on the interval [0,1]
[Figure: the transformed distribution p(y) = ½ y^(-1/2) on [0,1].]
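As an aside (not in the original lecture), here is a minimal Python sketch that checks this change-of-variables result by simulation; the sample size and bin edges are arbitrary choices:

```python
# Minimal Monte Carlo check of p(y) = (1/2) y^(-1/2) for y = x^2,
# x uniform on [0,1]; compare a normalized histogram of y to the formula.
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(1_000_000) ** 2                  # y = x^2, x ~ uniform[0,1]

bins = np.linspace(0.05, 1.0, 20)               # start above the y = 0 singularity
hist, edges = np.histogram(y, bins=bins, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 0.5 / np.sqrt(centers)              # p(y) = (1/2) y^(-1/2)

for c, h, p in zip(centers, hist, predicted):
    print(f"y = {c:.3f}   empirical = {h:.3f}   predicted = {p:.3f}")
```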
Another example
Let x have a normal distribution with zero expectation
and unit variance. To avoid complication, assume
x>0, so that the distribution is twice the usual
amplitude …
p(x) = 2 (2π)^(-1/2) exp(-½x²)
The distribution of y = x² is:
p(y) = p(x(y)) dx/dy = (2πy)^(-1/2) exp(-½y)
note that we used, as before, dx/dy = ½ y^(-1/2)

You can check that ∫₀^∞ p(y) dy = 1 by looking up
∫₀^∞ y^(-1/2) exp(-ay) dy = √(π/a) in a math book.
[Figure: p(x) and p(y); note the singularity of p(y) at the origin.]
Results not so different from uniform distribution
from a previous lecture …
Functions of Two Random Variables
any function of several random variables is itself a random
variable
If (x,y) has joint distribution p(x,y)
then given u(x,y) and v(x,y),
p(u,v) = p[x(u,v), y(u,v)] |∂(x,y)/∂(u,v)|
where |∂(x,y)/∂(u,v)| is the Jacobian determinant
note then that
p(u)=p(u,v)dv
and
p(v)=p(u,v)du
example
p(x,y) = 1/(2πσ²) exp(-x²/2σ²) exp(-y²/2σ²)
= 1/(2πσ²) exp(-(x²+y²)/2σ²)
an uncorrelated normal distribution of two variables with
zero expectation and equal variance, σ²
[Figure: contours of p(x,y) for σ = 1.]
What’s the distribution of u = x² + y² ?
We need to choose a function v(x,y). A reasonable choice is
motivated by polar coordinates, v = tan⁻¹(x/y)
Then x = √u sin(v) and y = √u cos(v)
And the Jacobian determinant is
∂x/∂u = ½ u^(-1/2) sin(v)   ∂x/∂v = √u cos(v)
∂y/∂u = ½ u^(-1/2) cos(v)   ∂y/∂v = -√u sin(v)
so that
|∂(x,y)/∂(u,v)| = |½ u^(-1/2) sin(v) · (-√u sin(v)) − √u cos(v) · ½ u^(-1/2) cos(v)|
= ½ sin²(v) + ½ cos²(v) = ½
(We usually call these variables r² and θ in polar coordinates.)
So p(x,y) = 1/(2πσ²) exp(-(x²+y²)/2σ²)
transforms to
p(u,v) = 1/(4πσ²) exp(-u/2σ²)
p(u) = ∫₀^{2π} 1/(4πσ²) exp(-u/2σ²) dv = 1/(2σ²) exp(-u/2σ²)
Note: ∫₀^∞ p(u) du = -exp(-u/2σ²) |₀^∞ = 1, as expected
[Figure: p(u) vs. u, an exponential decay.]
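As a quick simulation check (not in the original lecture, and assuming numpy), u = x² + y² should behave as an exponential distribution with mean 2σ²:

```python
# Draw x, y as independent normals with variance sigma^2 and verify that
# u = x^2 + y^2 has p(u) = (1/(2 sigma^2)) exp(-u/(2 sigma^2)).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = sigma * rng.standard_normal(1_000_000)
y = sigma * rng.standard_normal(1_000_000)
u = x**2 + y**2

print("sample mean of u:", u.mean())             # theory: 2 sigma^2 = 2
print("empirical P(u > 2):", (u > 2).mean())     # theory: exp(-1) ~ 0.368
print("theoretical P(u > 2):", np.exp(-2 / (2 * sigma**2)))
```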
The point of my showing you this is to give you the sense that
computing the probability distributions associated with functions
of random variables is not particularly mysterious, but instead is
rather routine (though possibly algebraically tedious).
Four (and only four) Important Distributions
Start with a bunch of random variables x_i
that are uncorrelated, normally distributed, with zero
expectation and unit variance.
The four important distributions are:
the distribution of x_i itself, and the
distributions of three possible choices of u(x₀, x₁, …):
u = Σ_{i=1}^N x_i²
u = x₀ / √( N⁻¹ Σ_{i=1}^N x_i² )
u = ( N⁻¹ Σ_{i=1}^N x_i² ) / ( M⁻¹ Σ_{i=1}^M x_{N+i}² )
Important Distribution #1
Distribution of xi itself
(normal distribution with zero mean and unit variance)
p(x_i) = (2π)^(-1/2) exp(-½x_i²)
Suppose that a random variable y has expectation ȳ and variance σ_y².
Then note that the variable
Z = (y − ȳ)/σ_y
is normally distributed with zero mean and unit variance.
We show this by noting p(Z) = p(y(Z)) dy/dZ with dy/dZ = σ_y, so that
p(y) = (2π)^(-1/2) σ_y⁻¹ exp(-½(y−ȳ)²/σ_y²) transforms to p(Z) = (2π)^(-1/2) exp(-½Z²)
properties of the normal distribution
(with zero expectation and unit variance)
p(x_i) = (2π)^(-1/2) exp(-½x_i²)
Mean = 0
Mode = 0
Variance = 1
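A small Python sketch of the standardization trick above; the values ȳ = 50 and σ_y = 4 are illustrative choices, not from the lecture:

```python
# Standardize y to Z = (y - ybar)/sigma_y and confirm that Z has
# zero mean and unit variance.
import numpy as np

rng = np.random.default_rng(0)
ybar, sigma_y = 50.0, 4.0                        # hypothetical values
y = ybar + sigma_y * rng.standard_normal(1_000_000)
Z = (y - ybar) / sigma_y

print("mean of Z:    ", Z.mean())                # ~ 0
print("variance of Z:", Z.var())                 # ~ 1
```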
Important Distribution #2
Distribution of u = Σ_{i=1}^N x_i²,
the sum of squares of N normally-distributed random
variables with zero expectation and unit variance.
This is called the “chi-squared distribution with N degrees
of freedom” and u is given the special symbol u = χ_N²
We have already computed the N=1 and N=2 cases!
[Figure: p(χ_N²) for several values of N; the N=1 and N=2 curves are the cases we worked out above.]
properties of the chi-squared distribution
p(χ_N²) = [χ_N²]^(½N−1) exp(−½[χ_N²]) / ( 2^(½N) (½N−1)! )
Mean = N
Mode = 0 if N < 2, N−2 otherwise
Variance = 2N
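These properties are easy to check against a library implementation; a sketch using scipy's chi2 (the choice N = 5 is arbitrary):

```python
# Compare the stated chi-squared properties (mean N, variance 2N,
# mode N-2 for N >= 2) with scipy's chi2 distribution.
import numpy as np
from scipy.stats import chi2

N = 5
dist = chi2(N)
print("mean:    ", dist.mean())                  # N = 5
print("variance:", dist.var())                   # 2N = 10

u = np.linspace(0.01, 20, 10_000)                # locate the mode numerically
print("mode:    ", u[np.argmax(dist.pdf(u))])    # ~ N - 2 = 3
```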
Important Distribution #3
Distribution of u = x₀ / √( N⁻¹ Σ_{i=1}^N x_i² ),
the ratio of
a normally-distributed random variable with zero
expectation and unit variance
and
the square root of the sum of squares of N normally-distributed
random variables with zero expectation and unit
variance, divided by N.
This is called “Student’s t distribution with N degrees of
freedom” and u is given the special symbol u = t_N
[Figure: p(t_N) for several values of N. Note that the N=1 case is very “long-tailed”.]
Looks pretty much like a Gaussian …
in fact, is a Gaussian in the limiting
case N → ∞
properties of Student’s t_N distribution
p(t_N) = [ (½N−½)! / ( √(Nπ) (½N−1)! ) ] · (1 + N⁻¹t_N²)^(−½(N+1))
Mean = 0
Mode = 0
Variance = ∞ if N < 3, N/(N−2) otherwise
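A short scipy sketch (not in the original lecture) of the long tails and the Gaussian limit; the threshold of 3 is an arbitrary choice:

```python
# Tail probability P(T > 3) for Student's t with increasing N,
# compared with the normal tail: the t tail shrinks toward it.
from scipy.stats import t, norm

for N in (1, 5, 25, 1000):
    print(f"N = {N:4d}   P(T > 3) = {t.sf(3, N):.5f}")
print(f"normal     P(Z > 3) = {norm.sf(3):.5f}")
```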
Important Distribution #4
Distribution of u = ( N⁻¹ Σ_{i=1}^N x_i² ) / ( M⁻¹ Σ_{i=1}^M x_{N+i}² ),
the ratio of
the sum of squares of N normally-distributed random variables with
zero expectation and unit variance, divided by N
and
the sum of squares of M normally-distributed random variables with
zero expectation and unit variance, divided by M
This is called the “F distribution with N and M degrees of freedom” and u
is given the special symbol u = F_{N,M}
[Figure: p(F_{N,M}) for several values of N and M.]
properties of the F_{N,M} distribution
p(F_{N,M}) = too complicated for me to type in
Mean = M/(M−2) if M > 2
Mode = [(N−2)/N] · [M/(M+2)] if N > 2
Variance = 2M²(N+M−2) / ( N(M−2)²(M−4) ) if M > 4
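A quick scipy check of the stated mean and variance formulas (N = 10, M = 12 are arbitrary choices; M > 4 so both formulas apply):

```python
# Compare scipy's F-distribution moments against the formulas above.
from scipy.stats import f

N, M = 10, 12
print("mean:    ", f.mean(N, M), "vs", M / (M - 2))
print("variance:", f.var(N, M),
      "vs", 2 * M**2 * (N + M - 2) / (N * (M - 2)**2 * (M - 4)))
```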
Hypothesis Testing
The Null Hypothesis
always a variant of this theme:
the results of an experiment differ
from the expected value only because
of random variation
Test of Significance of Results
say to 95% significance
The Null Hypothesis would generate
the observed result less than 5% of the
time
Example: You buy an automated pipe-cutting
machine that cuts long pipes into many segments
of equal length.
Specifications:
calibration (mean, μ_m): exact
repeatability (variance, σ_m²): 100 mm²
Now you test the machine by having it cut 25 pipe
segments of 10000-mm length. You then measure and
tabulate the length of each pipe segment, L_i.
Question 1: Is the machine’s calibration correct?
Null Hypothesis: any difference between the mean length of the test
pipe segments from the specified 10000 mm can be ascribed to
random variation
you estimate the mean of the 25 samples: m_obs = 9990 mm
The mean length deviates (μ_m − m_obs) = 10 mm from the setting of 10000 mm.
Is this significant?
Note from a prior lecture, the variance of the mean is
σ_mean² = σ_data²/N.
So the quantity
Z = (μ_m − m_obs) / (σ_m/√N), where m_obs = N⁻¹ Σ_i L_i,
is normally distributed with zero expectation and
unit variance.
In our case Z = 10 / (10/√25) = 10 / (10/5) = 5
(Scaling a quantity so it has zero mean and unit variance is an important trick.)
Z = 5 means that m_obs is 5 standard deviations away from
its expected value.
The amount of area under the normal distribution more than ±5
standard deviations away from the mean is very small.
We can calculate it using the Excel function:
NORMDIST(x,mean,standard_dev,cumulative)
x is the value for which you want the distribution.
mean is the arithmetic mean of the distribution.
standard_dev is the standard deviation of the distribution.
Cumulative is a logical value that determines the form of the
function. If cumulative is TRUE, NORMDIST returns the
cumulative distribution function; if FALSE, it returns the
probability density function.
=2*NORMDIST(-5,0,1,TRUE) = 5.7421E-07 = 0.00006%
Factor of two to account for both tails
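For readers without Excel, a scipy sketch of the same two-tailed computation:

```python
# Area beyond +/-5 standard deviations under the standard normal,
# the scipy analogue of 2*NORMDIST(-5,0,1,TRUE).
from scipy.stats import norm

Z = (10000 - 9990) / (10 / 25**0.5)              # Z = 10 / (10/5) = 5
print(2 * norm.cdf(-Z))                          # ~ 5.7e-07, as on the slide
```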
Thus the Null Hypothesis
that the machine is well-calibrated
can be excluded
with very high confidence
Question 2: Is the machine’s repeatability within
specs?
Null Hypothesis: any difference between the
repeatability (variance) of the test pipe segments
from the specified σ_m² = 100 mm² can be ascribed
to random variation
The quantity x_i = (L_i − μ_m) / σ_m is normally
distributed with mean = 0 and variance = 1, so
the quantity χ_N² = Σ_i (L_i − μ_m)² / σ_m² is chi-squared
distributed with 25 degrees of freedom.
Suppose that the root mean squared variation of pipe lengths
was [N⁻¹ Σ_i (L_i − μ_m)²]^(1/2) = 12 mm.
Then χ₂₅² = Σ_i (L_i − μ_m)² / σ_m² = 25 × 144 / 100 = 36
CHIDIST(x,degrees_freedom)
x is the value at which you want to evaluate the distribution.
degrees_freedom is the number of degrees of freedom.
CHIDIST = P(X>x), where X is a χ² random variable.
The probability that χ₂₅² ≥ 36 is CHIDIST(36,25) = 0.07, or 7%
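The scipy analogue of this CHIDIST call (a sketch; sf is the upper-tail probability):

```python
# P(chi-squared_25 >= 36), the scipy analogue of CHIDIST(36,25).
from scipy.stats import chi2

print(chi2.sf(36, 25))                           # ~ 0.07, i.e. about 7%
```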
Thus the Null Hypothesis
that the difference from the expected
value of 10 mm is due to random variation
cannot be excluded
(not with greater than 95% confidence)
Question #3
But suppose the manufacturer had not
stated a repeatability spec,
just a calibration spec
you can’t test the calibration
using the quantity
Z = (μ_m − m_obs) / (σ_m/√N)
since σ_m is not known.
Since the manufacturer has not supplied a
variance, we must estimate it from the data:
s_obs² = N⁻¹ Σ_i (L_i − μ_m)² = 144 mm²
and use it in the formula
(μ_m − m_obs) / (s_obs/√N)
But the quantity
(μ_m − m_obs) / (s_obs/√N)
is not normally distributed,
because s_obs is itself a random variable;
it’s t-distributed.
Remember: t_N = x₀ / √( N⁻¹ Σ_{i=1}^N x_i² )
In our case t_N = 10 / (12/5) = 4.16
TDIST(x,degrees_freedom,tails)
x is the numeric value at which to evaluate the
distribution.
Degrees_freedom is an integer indicating the number of
degrees of freedom.
Tails specifies the number of distribution tails to return. If
tails = 1, TDIST returns the one-tailed distribution. If tails
= 2, TDIST returns the two-tailed distribution.
TDIST is calculated as TDIST = P(x<X), where X is a
random variable that follows the t-distribution.
P(t₂₅ > 4.16) = TDIST(4.16,25,1) = 0.00016 = 0.016%
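The scipy analogue of this one-tailed TDIST call (a sketch):

```python
# One-tailed probability P(t_25 > 4.16), the analogue of TDIST(4.16,25,1).
from scipy.stats import t

print(t.sf(4.16, 25))                            # ~ 0.00016 = 0.016%
```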
Thus the Null Hypothesis
that the difference from the expected result
of 10000 is due to random variation
can be excluded
to high probability,
but not nearly as high as when the
manufacturer told us the repeatability
Question #4
Suppose you performed the test twice, a
year apart, and wanted to know
has the repeatability changed?
This Year: [N⁻¹ Σ_i (L_i^yr1 − μ_m)²]^(1/2) = 12 mm
Last Year: [M⁻¹ Σ_i (L_i^yr2 − μ_m)²]^(1/2) = 14 mm
(let’s say N = M = 25 in both cases)
Null Hypothesis: any difference between the
repeatability (variance) of the test pipe segments
between years can be ascribed to random
variation
The ratio of mean-squared errors is F-distributed:
F_{N,M} = ( N⁻¹ Σ_{i=1}^N x_i² ) / ( M⁻¹ Σ_{i=1}^M x_{N+i}² )
= 12²/14² = 144/196 = 0.735
Note that since F is of the form F = a/b, with both a and b
fluctuating around a mean value, we really want the cumulative
probability that F < 144/196 and F > 196/144.
[Figure: p(F_{N,M}), which peaks near 1, with the two tails F < 144/196 and F > 196/144 indicated.]
FDIST(x,degrees_freedom1,degrees_freedom2)
x is the value at which to evaluate the function.
Degrees_freedom1 is the numerator degrees of freedom.
Degrees_freedom2 is the denominator degrees of freedom.
FDIST is calculated as FDIST = P(F>x), where F is a random variable that
has an F distribution.
Since P(F<x) = 1 − P(F>x),
Left-hand tail:
P(F < 144/196) = 1 − FDIST(0.735,25,25) ≈ 0.22 = 22%
Right-hand tail:
P(F > 196/144) = FDIST(1/0.735,25,25) ≈ 0.22 = 22%
Both tails: ≈ 44%
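The scipy analogue (a sketch; f.cdf gives the lower tail P(F&lt;x) and f.sf the upper tail P(F&gt;x)):

```python
# Two-tailed F probability for the variance ratio 12^2/14^2.
from scipy.stats import f

F = 144.0 / 196.0                                # 12^2 / 14^2
left = f.cdf(F, 25, 25)                          # P(F < 144/196)
right = f.sf(1.0 / F, 25, 25)                    # P(F > 196/144)
print(left, right, left + right)                 # ~ 0.22 each, ~ 0.44 total
```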
Thus the Null Hypothesis
that the year-to-year difference in
variance is due to random variation
cannot be excluded
there is no strong reason to believe that
the repeatability of the machine has
changed between the years
Question #5: Suppose you performed the test twice, a
year apart, and wanted to know if the calibration
changed.
This Year:
m_obs^yr1 = N⁻¹ Σ_i L_i^yr1 = 9990
s_obs^yr1 = [N⁻¹ Σ_i (L_i^yr1 − μ_m)²]^(1/2) = 12 mm
Last Year:
m_obs^yr2 = M⁻¹ Σ_i L_i^yr2 = 9993
s_obs^yr2 = [M⁻¹ Σ_i (L_i^yr2 − μ_m)²]^(1/2) = 14 mm
(let’s say N = M = 25 in both cases)
The problem that we face is that while
t_N^yr1 = (m_obs^yr1 − μ_m) / (s_obs^yr1/√N)
and
t_N^yr2 = (m_obs^yr2 − μ_m) / (s_obs^yr2/√N)
are individually t-distributed, their difference,
t_N^yr1 − t_N^yr2,
is not t-distributed. Statisticians have circumvented
this problem by cooking up a function of (m_obs^yr1,
m_obs^yr2, s_obs^yr1, s_obs^yr2) that is approximately
t-distributed. But it’s messy.
In our case the significance probability is 19%.
Note: Excel’s function TTEST() allows you to
perform the test on columns of data, without typing
the formulas … very handy!
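A scipy sketch of the same comparison; the two arrays below are hypothetical stand-ins (the raw pipe lengths are not given in the transcript), and equal_var=False selects Welch’s approximation, one common version of the “cooked up” approximately-t statistic:

```python
# Two-sample test of m_obs_yr1 = m_obs_yr2 with unknown, possibly
# unequal variances (the analogue of Excel's TTEST on two columns).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
L_yr1 = 9990 + 12 * rng.standard_normal(25)      # hypothetical year-1 lengths
L_yr2 = 9993 + 14 * rng.standard_normal(25)      # hypothetical year-2 lengths

stat, p = ttest_ind(L_yr1, L_yr2, equal_var=False)
print(stat, p)
```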
Thus the Null Hypothesis
that the difference in means is due to
random variation
cannot be excluded
5 tests
m_obs = m_prior when m_prior and σ_prior are known:
normal distribution
s_obs = σ_prior when m_prior and σ_prior are known:
chi-squared distribution
m_obs = m_prior when m_prior is known but σ_prior is unknown:
t distribution
s1_obs = s2_obs when m1_prior and m2_prior are known:
F distribution
m1_obs = m2_obs when σ1_prior and σ2_prior are unknown:
modified t distribution
Example 1: LaGuardia Airport Mean Daily Temperature
Was the 5-year period 1950-1954 significantly warmer
or cooler than the 5-year period 2000-2004?
[Figure: LaGuardia mean daily temperature time series for 1950-1954 and for 2000-2004.]
Null Hypothesis: any differences between the mean temperatures
of these two time periods can be ascribed to random variation
Type of Test: t-test modified to test two means
Results
1950-1954 Mean Temperature = 55.8658 ± 0.77
2000-2004 Mean Temperature = 55.8792 ± 0.80
T-test Significance Probability 49%
The Null Hypothesis, that the difference in means is due to
random variation, cannot be rejected
Issue about noise
Note that we are estimating σ by treating the short-term (days-to-months) temperature fluctuations as ‘noise’.
Is this correct?
Certainly such fluctuations are not measurement noise in the
normal sense.
They might be considered ‘model noise’ in the sense that they are
caused by weather systems that are unmodeled (by us)
However, such noise probably does not meet all the requirements
for use in the statistical test. In particular, it probably has
some day-to-day correlation (hot today, hot tomorrow, too)
that violates our implicit assumption of uncorrelated noise.
Example 2: Does a parabola fit better
than a straight line?
[Figure: discharge (cfs) vs. day for the first 7 days of the Neuse River hydrograph shown in an early lecture; N = 7.]
A parabola will always fit better than a straight line,
because it has an extra parameter.
But does it fit significantly better?
Null Hypothesis: Any difference in fit is
due to random variation
Approximation: the ratio of prediction errors follows an F-distribution, with
the number of degrees of freedom given by the number of data minus the
number of parameters in the fit.
Linear fit: (N−2)⁻¹ Σ_{i=1}^N (d_i^obs − d_i^pre)² = 153431
Quadratic fit: (N−3)⁻¹ Σ_{i=1}^N (d_i^obs − d_i^pre)² = 6985
F = 153431 / 6985 = 21.96
P(F<21.96) = 1-FDIST(21.96,5,4) = 0.995 = 99.5%
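A numpy/scipy sketch of this line-vs-parabola test; the discharge values below are made-up placeholders, since the actual Neuse River data are not reproduced in the transcript:

```python
# Fit a line (N-2 degrees of freedom) and a parabola (N-3), form the
# ratio of prediction errors per degree of freedom, and evaluate it
# against the F distribution, as on the slide.
import numpy as np
from scipy.stats import f

day = np.arange(1.0, 8.0)                        # days 1..7, N = 7
d = np.array([300., 900., 1800., 2600., 3100., 3300., 3200.])  # made-up data

mse = {}
for degree, dof in ((1, 5), (2, 4)):
    coeffs = np.polyfit(day, d, degree)
    resid = d - np.polyval(coeffs, day)
    mse[degree] = np.sum(resid**2) / dof         # prediction error per dof

F = mse[1] / mse[2]
print("F =", F, "  P(F < F_obs) =", f.cdf(F, 5, 4))
```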
The Null Hypothesis can be rejected
with
99.5% confidence
Another Issue about noise
Note that we are again basing estimates upon
‘model noise’
in the sense that the prediction error is being controlled,
at least partly, by the misfit of the curve, as well as
by measurement error.
As before, such noise probably does not meet all the
requirements for use in the statistical test. So the test
needs to be used with some caution.