Hypothesis Testing
The goal of hypothesis testing is to set up a procedure (or procedures) that allows us to decide
whether a model is acceptable in light of our experimental observations.
Example: a theory predicts a Higgs branching ratio of 2×10⁻⁵ and you measure (4 ± 2)×10⁻⁵.
The hypothesis we want to test is "are experiment and theory consistent?"
Hypothesis testing does not have to compare theory and experiment.
Example: CLEO measures the Λc lifetime to be (180 ± 7) fs while SELEX measures (198 ± 7) fs.
The hypothesis we want to test is “are the lifetime results from CLEO and SELEX consistent?”
There are two types of hypothesis tests: parametric and non-parametric.
Parametric: compares the values of parameters (e.g. does the mass of the proton equal the mass of the electron?)
Non-parametric: deals with the shape of a distribution (e.g. is the angular distribution consistent with being flat?)
Consider the case of neutron decay. Suppose we have two
theories that both predict the energy spectrum of the
electron emitted in the decay of the neutron. Here a
parametric test might not be able to distinguish between
the two theories since both theories might predict the
same average energy of the emitted electron.
However, a non-parametric test would be able to
distinguish between the two theories, as the shape of the
energy spectrum differs for each theory.
Hypothesis testing
A procedure for using hypothesis testing:
a) Measure (or calculate) something.
b) Find something that you wish to compare with your measurement (theory, experiment).
c) Form a hypothesis (e.g. my measurement, x, is consistent with the PDG value): H0: x = xPDG. H0 is called the "null hypothesis".
d) Calculate the confidence level that the hypothesis is true.
e) Accept or reject the hypothesis depending on some minimum acceptable confidence level.
Problems with the above procedure:
a) What is a confidence level?
b) How do you calculate a confidence level?
c) What is an acceptable confidence level?
How would we test the hypothesis "the space shuttle is safe"? Is 1 explosion per 10 launches safe? Or 1 explosion per 1000 launches?
A working definition of the confidence level:
The probability of the event happening by chance.
Example: Suppose we measure some quantity X and we know that it is described by a Gaussian
pdf with μ = 0 and σ = 1. What is the confidence level for measuring X ≥ 2 (i.e. 2σ from the mean)?

P(X \ge 2) = \int_{2}^{\infty} P(\mu,\sigma,x)\,dx = \int_{2}^{\infty} P(0,1,x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{2}^{\infty} e^{-x^2/2}\,dx \approx 0.025

Thus we would say that the confidence level for measuring X ≥ 2 is 0.025 or 2.5%,
and we would expect to get a value of X ≥ 2 one out of 40 tries if the underlying pdf is Gaussian.
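As a numerical cross-check, here is a minimal Python sketch of the same tail-probability calculation (the use of scipy is my assumption, not something from the slide):

```python
# Confidence level for measuring X >= 2 when X follows a Gaussian with mu = 0, sigma = 1.
from scipy.stats import norm

cl = norm.sf(2.0)   # survival function = 1 - CDF = area of the upper tail beyond 2 sigma
print(f"P(X >= 2) = {cl:.4f}")   # ~0.023, i.e. roughly 2.5%, or about 1 try in 40
```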
880.P20 Winter 2006
Richard Kass
2
Hypothesis Testing Cautions
Two types of errors associated with hypothesis testing:
Type I: reject H0 when it is true
Type II: accept H0 when it is false
For the case where H0: θ = θ0 and the alternative is H1: θ = θ1, we can calculate the probability
of a Type I or Type II error. Assume we reject H0 if x > x_c.

[Figure: the pdfs f(x|θ0) and f(x|θ1) plotted vs x with the cut at x_c. The area of f(x|θ0) above x_c is the probability of a Type I error; the area of f(x|θ1) below x_c is the probability of a Type II error.]

\text{prob. Type I error} = \int_{x_c}^{\infty} f(x\,|\,\theta_0)\,dx

\text{prob. Type II error} = \int_{-\infty}^{x_c} f(x\,|\,\theta_1)\,dx
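To make the two integrals concrete, here is a small Python sketch for the common case where both f(x|θ0) and f(x|θ1) are Gaussians; the means, width, and cut value below are illustrative assumptions, not numbers from the slide:

```python
# Type I / Type II error probabilities for Gaussian pdfs under H0 and H1.
# All numerical values here are assumed for illustration only.
from scipy.stats import norm

theta0, theta1, sigma = 0.0, 3.0, 1.0   # assumed means under H0 and H1, common width
x_c = 1.5                               # assumed cut: reject H0 if x > x_c

type_I = norm.sf(x_c, loc=theta0, scale=sigma)     # area of f(x|theta0) above x_c
type_II = norm.cdf(x_c, loc=theta1, scale=sigma)   # area of f(x|theta1) below x_c
print(f"P(Type I)  = {type_I:.3f}")    # reject H0 even though it is true
print(f"P(Type II) = {type_II:.3f}")   # accept H0 even though H1 is true
```

Raising the cut x_c lowers the Type I error at the price of a larger Type II error, which is the trade-off the figure illustrates.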
A few cautions about using confidence limits
a) You must know the underlying pdf to calculate the limits.
Example: suppose we have a scale of known accuracy (σ = 10 gm) and we weigh
something to be 20 gm. Assuming a Gaussian pdf we could calculate a 2.5% chance that
our object weighs ≤ 0 gm?? We must make sure that the probability distribution is
defined in the region where we are trying to extract information.
b) What does a confidence level really mean? Classical vs. Bayesian viewpoints.
Hypothesis Testing-Gaussian Variables
We wish to test whether a quantity we have measured, μ (= the average of n measurements),
is consistent with a known mean (μ0).
Test       Conditions     Test Statistic      Test Distribution
μ = μ0     σ² known       (μ − μ0)/(σ/√n)     Gaussian (mean 0, σ = 1)
μ = μ0     σ² unknown     (μ − μ0)/(s/√n)     Student's "t-distribution" t(n − 1) with n − 1 DOF

Here s is the standard deviation estimated from the n measurements.
Example: Do free quarks exist? Quarks are nature's fundamental building blocks and are
thought to have electric charge (|q|) of either (1/3)e or (2/3)e (e = charge of electron).
Suppose we do an experiment to look for |q| = 1/3 quarks.
We measure: q = 0.90 ± 0.2. This gives μ = 0.90 and σ = 0.2.
Quark theory: q = 0.33. This is μ0.
H0: our measurement of 0.90 ± 0.2 is consistent with μ0 = 0.33.
We want to test the hypothesis μ = μ0 when σ is known. Thus we use the first line in the table.

z = \frac{\mu - \mu_0}{\sigma/\sqrt{n}} = \frac{0.9 - 0.33}{0.2/\sqrt{1}} = 2.85
We want to calculate the probability of getting z ≥ 2.85, assuming a Gaussian pdf:

\text{prob}(z \ge 2.85) = \int_{2.85}^{\infty} P(\mu,\sigma,x)\,dx = \int_{2.85}^{\infty} P(0,1,x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{2.85}^{\infty} e^{-x^2/2}\,dx \approx 0.002
The CL here is just 0.2%! What we are saying is that if we repeated our experiment 1000
times, then only 2 of the experiments would measure a value q ≥ 0.9 if the true mean
were q = 1/3. This is not strong evidence for |q| = 1/3 quarks!
If acceptable CL=5%, then we would reject H0
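A minimal Python sketch of this one-sample test with known σ (scipy assumed; the numbers are those from the slide):

```python
# Is the measured charge q = 0.90 +/- 0.2 consistent with mu0 = 0.33 (sigma known)?
import math
from scipy.stats import norm

q, sigma, n = 0.90, 0.2, 1   # measurement, known error, number of measurements
mu0 = 0.33                   # hypothesized mean (|q| = 1/3 quark)

z = (q - mu0) / (sigma / math.sqrt(n))
cl = norm.sf(z)              # one-sided upper-tail probability
print(f"z = {z:.2f}, CL = {cl:.3f}")   # z ~ 2.85, CL ~ 0.002 -> reject H0 at a 5% CL
```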
Hypothesis Testing-Gaussian Variables
Do charge 2/3 quarks exist?
If instead of |q| = 1/3 quarks we tested for |q| = 2/3, what would we get for the CL?
Now we have μ = 0.9 and σ = 0.2 as before, but μ0 = 2/3.
H0: our measurement of 0.90 ± 0.2 is consistent with μ0 = 0.67.
We now have z = 1.17, prob(z ≥ 1.17) = 0.13, and the CL = 13%.
Now free quarks are starting to get believable!
If acceptable CL = 5%, then we would accept H0.
Another variation of the quark problem
Suppose we have 3 measurements of the charge q:
q1 = 1.1, q2 = 0.7, and q3 = 0.9
We don't know the variance beforehand so we must determine the variance from our data.
Thus we use the second test in the table.
\mu = (q_1 + q_2 + q_3)/3 = 0.9

s^2 = \frac{\sum_{i=1}^{n}(q_i - \mu)^2}{n-1} = \frac{0.2^2 + (-0.2)^2 + 0^2}{2} = 0.04

z = \frac{\mu - \mu_0}{s/\sqrt{n}} = \frac{0.9 - 0.33}{0.2/\sqrt{3}} = 4.94

H0: our charge measurements are consistent with |q| = 1/3 quarks.
In this problem z is described by Student’s t-distribution.
Note: Student is the pseudonym of statistician W.S. Gosset who was employed by a famous English brewery.
Just like the Gaussian pdf, in order to evaluate the t-distribution one must resort to a look-up table
(see for example Table 7.2 of Barlow).
In this problem we want prob(z ≥ 4.94) when n − 1 = 2. The probability of z ≥ 4.94 is 0.02.
This is ~10× greater than the first part of this example, where we knew the variance ahead of time.
If acceptable CL=5%, then we would reject H0
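A minimal Python sketch of the Student's t version, using the three charge measurements from the slide (scipy assumed):

```python
# Are q1 = 1.1, q2 = 0.7, q3 = 0.9 consistent with mu0 = 0.33 when sigma is unknown?
import math
from scipy.stats import t

q = [1.1, 0.7, 0.9]
mu0 = 0.33
n = len(q)
mu = sum(q) / n                                           # sample mean = 0.9
s = math.sqrt(sum((x - mu) ** 2 for x in q) / (n - 1))    # sample std. deviation = 0.2

t_stat = (mu - mu0) / (s / math.sqrt(n))                  # ~4.94
cl = t.sf(t_stat, df=n - 1)                               # upper tail of t with n-1 = 2 DOF
print(f"t = {t_stat:.2f}, CL = {cl:.3f}")                 # CL ~ 0.02 -> reject H0 at a 5% CL
```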
Hypothesis Testing-Gaussian Variables
Tests when both means are unknown but come from a Gaussian pdf:

Test           Conditions                  Test Statistic                  Test Distribution
μ1 − μ2 = 0    σ1², σ2² known              (μ1 − μ2)/√(σ1²/n + σ2²/m)      Gaussian (mean 0, σ = 1)
μ1 − μ2 = 0    σ1² = σ2² = σ², unknown     (μ1 − μ2)/(Q √(1/n + 1/m))      t(n + m − 2)
μ1 − μ2 = 0    σ1², σ2² unknown            (μ1 − μ2)/√(s1²/n + s2²/m)      approx. Gaussian (mean 0, σ = 1)

with Q² = [(n − 1)s1² + (m − 1)s2²]/(n + m − 2); n and m are the number of measurements for each mean.
Example: Do two experiments agree with each other?
CLEO measures the Λc lifetime to be (180 ± 7) fs while SELEX measures (198 ± 7) fs.
H0: the CLEO result (180 ± 7) fs is consistent with the SELEX result (198 ± 7) fs.
(Here the quoted 7 fs errors are already the errors on the means, so σ1²/n = σ2²/m = 7².)

z = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/m}} = \frac{198 - 180}{\sqrt{7^2 + 7^2}} = 1.82

P(|z| \ge 1.82) = 1 - \int_{-1.82}^{1.82} P(\mu,\sigma,x)\,dx = 1 - \int_{-1.82}^{1.82} P(0,1,x)\,dx = 1 - \frac{1}{\sqrt{2\pi}} \int_{-1.82}^{1.82} e^{-x^2/2}\,dx = 1 - 0.93 = 0.07
Thus 7% of the time we should expect the experiments to disagree at this level.
If acceptable CL=5%, then we would accept H0
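A minimal Python sketch of this two-experiment comparison (scipy assumed; lifetimes and errors taken from the slide):

```python
# Are the CLEO and SELEX Lambda_c lifetime measurements consistent with each other?
import math
from scipy.stats import norm

mu1, err1 = 198.0, 7.0   # SELEX lifetime and error (fs)
mu2, err2 = 180.0, 7.0   # CLEO lifetime and error (fs)

z = (mu1 - mu2) / math.sqrt(err1**2 + err2**2)   # ~1.82
cl = 2 * norm.sf(abs(z))                         # two-sided probability of a larger discrepancy
print(f"z = {z:.2f}, CL = {cl:.2f}")             # CL ~ 0.07 -> accept H0 at a 5% CL
```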
Hypothesis Testing-non-Gaussian Variables
Assume we have a bunch of measurements (xi's) that are NOT from a Gaussian pdf
(e.g. they could be from a Poisson distribution).
We can calculate a quantity that looks like a χ², which compares our data (d_i) with the
corresponding predictions (PR_i) from a pdf:
X^2 = \sum_{i=1}^{n} \frac{(d_i - PR_i)^2}{PR_i}
K. Pearson showed that for a wide variety of pdfs (e.g. Poisson) the above
test statistic becomes distributed according to a χ² pdf with n − 1 dof.
For example, suppose the data are from a Poisson distribution with mean μ:

P(m, \mu) = \frac{\mu^m e^{-\mu}}{m!}

If we have a total of N events in our sample, then we predict NP(m, μ) events for a given value of m (m = 0, 1, 2, 3, 4, ...).
At the same time, we have observed N_m events with that value of m.
If NP(m, μ) > 5 for all m, then the following is approximately a χ² with n − 1 dof:
X^2 = \sum_{m=0}^{M} \frac{(N_m - NP(m,\mu))^2}{NP(m,\mu)}

Note that for a Poisson, NP(m,\mu) = \sigma_m^2, so

X^2 = \sum_{m=0}^{M} \frac{(N_m - NP(m,\mu))^2}{NP(m,\mu)} = \sum_{m=0}^{M} \frac{(N_m - NP(m,\mu))^2}{\sigma_m^2}
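A minimal Python sketch of Pearson's statistic (scipy assumed; the helper name pearson_chi2 and the treatment of degrees of freedom are my own choices, not the slide's):

```python
# Pearson's chi-square comparison of observed counts with predicted counts.
import numpy as np
from scipy.stats import chi2

def pearson_chi2(observed, predicted, dof):
    """Return X^2 and the probability of a chi-square with `dof` DOF exceeding it."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    x2 = float(np.sum((observed - predicted) ** 2 / predicted))
    return x2, chi2.sf(x2, dof)

# dof is usually (number of bins) - 1, minus one more for each parameter
# (e.g. the Poisson mean) that was estimated from the data themselves.
```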
Hypothesis Testing-Pearson's χ² Test
The following are the numbers of neutrino events detected in 10-second intervals
by the IMB experiment on 23 February 1987, around which time the supernova
SN 1987A was first seen by experimenters:
#events:      0     1    2    3   4   5  6  7  8  9
#intervals:  1024  860  307  58  15  3  0  0  0  1
Assuming the data are described by a Poisson distribution, calculate the average, i.e. the average number of events expected in an interval:

\mu = \frac{\sum_{i=0}^{8} (\#\text{events})_i \,(\#\text{intervals})_i}{\sum_{i=0}^{8} (\#\text{intervals})_i} = 0.774 \qquad (= 0.777 \text{ if we include the interval with 9 events})
We can calculate a χ² assuming the data are described by a Poisson distribution.
The predicted number of intervals with n events is:

\text{prediction}(n) = (\text{total } \#\text{intervals}) \times \frac{\mu^n e^{-\mu}}{n!}

\chi^2 = \sum_{i=0}^{8} \frac{(\#\text{intervals}_i - \text{prediction}_i)^2}{\text{prediction}_i} = 3.6

Note: we use σ² = prediction for a Poisson.
#events:                0     1    2    3   4   5  6    7     8      9
#intervals predicted:  1064  823  318  82  16  2  0.3  0.03  0.003  0.0003
There are 7 (= 9 − 2) DOF here, and the probability of getting χ²/D.O.F. ≥ 3.6/7 is high (≈ 80%), indicating a good fit to a Poisson.
However, if the last data point (the interval with 9 events) is included:

\chi^2 = \sum_{i=0}^{9} \frac{(\#\text{intervals}_i - \text{prediction}_i)^2}{\text{prediction}_i} = 3335 \qquad \text{and} \qquad \chi^2/\text{D.O.F.} = 3335/8 \approx 417

The probability of getting a χ²/D.O.F. this large from a Poisson distribution with μ = 0.774 is essentially 0.
Hence the nine events are most likely coming from the supernova explosion and not just from a Poisson fluctuation.
Reject H0: "the interval with 9 events is from a Poisson with μ = 0.77".
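To see how incompatible the 9-event interval is with the Poisson hypothesis, here is a small Python sketch (scipy assumed) that estimates how many intervals with ≥ 9 events we would expect, given μ ≈ 0.77 and the total number of intervals in the table above:

```python
# Expected number of 10 s intervals with >= 9 events if the rate is Poisson with mu ~ 0.77.
from scipy.stats import poisson

mu = 0.774
intervals = [1024, 860, 307, 58, 15, 3, 0, 0, 0, 1]   # observed intervals with 0..9 events
n_intervals = sum(intervals)

p_ge_9 = poisson.sf(8, mu)        # P(m >= 9) = 1 - P(m <= 8)
expected = n_intervals * p_ge_9   # expected number of such intervals in the whole data set
print(f"P(m >= 9) = {p_ge_9:.1e}, expected intervals = {expected:.1e}")
# Both numbers are vanishingly small, so one 9-event interval is not a Poisson fluctuation.
```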