Introduction to hypothesis testing

Idea
1. Formulate the research hypothesis H1
   - New theory, effect of a treatment, etc.
2. Formulate the opposite hypothesis H0
   - The theory is wrong, there is no effect, status quo
   - This is often called the “null hypothesis”
   - The two hypotheses should be mutually exclusive and exhaustive: exactly one of them must be true
3. See whether H0 can be rejected
Rules for rejecting
• Assume that H0 is true
• Use existing knowledge to construct a probability model for hypothetical data: what would the relative frequency distribution of hypothetical data x be after infinitely repeated sampling if H0 were true?
   - i.e. define p(x | H0)
• Set up a decision rule: if the actual observed data x0 falls in a “critical region”, reject H0 and accept H1
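The three steps above can be sketched with a small simulation. This is only an illustration under stated assumptions: the H0 model (10 standard-normal measurements summarized by their mean), the upper 5% tail as the critical region, and the observed value `x0` are all made up for the example and do not come from the slides.

```python
import random
from statistics import mean

def sample_statistic(rng, n=10):
    """Draw one hypothetical data set under H0 and summarize it by its mean.

    Assumption: under H0 each measurement is standard normal.
    """
    return mean([rng.gauss(0.0, 1.0) for _ in range(n)])

rng = random.Random(0)  # fixed seed, only so the sketch is reproducible

# Approximate p(x | H0) by repeating the sampling many times
# (a finite stand-in for "infinite repetition").
null_draws = sorted(sample_statistic(rng) for _ in range(20_000))

# Critical region chosen for illustration: the upper 5% of the H0 distribution.
k = null_draws[int(0.95 * len(null_draws))]

x0 = 0.7  # hypothetical observed statistic
reject = x0 > k
print(f"simulated critical value k = {k:.3f}, x0 = {x0}, reject H0: {reject}")
```

The simulated distribution plays the role of p(x | H0); in practice it is usually derived analytically rather than simulated.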
Choosing the critical region
• No theory dictates this choice
• Common practice: choose k so that P(x > k | H0) = 0.05; k is called the “critical value”
   - If x0 > k, reject H0
   - If x0 ≤ k, do not reject H0
• In other words: if P(x > x0 | H0) < 0.05, reject H0
• P(x > x0 | H0) is the p-value
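The equivalence between the critical-value rule and the p-value rule can be checked numerically. The sketch below assumes, purely for illustration, that the statistic x is standard normal under H0 and that large values count as "extreme"; the observed value `x0` is hypothetical.

```python
from statistics import NormalDist

# Assumption: under H0 the statistic x follows a standard normal distribution.
h0_model = NormalDist(mu=0.0, sigma=1.0)
alpha = 0.05

# Critical value k such that P(x > k | H0) = alpha (upper tail).
k = h0_model.inv_cdf(1.0 - alpha)

def p_value(x0: float) -> float:
    """Upper-tail probability P(x > x0 | H0)."""
    return 1.0 - h0_model.cdf(x0)

def reject_h0(x0: float) -> bool:
    """Critical-value form of the decision rule."""
    return x0 > k

x0 = 2.0  # hypothetical observed value
print(f"k = {k:.3f}, p-value = {p_value(x0):.4f}, reject H0: {reject_h0(x0)}")
```

Both rules give the same decision here: `x0 > k` holds exactly when `p_value(x0) < alpha`, because the tail probability decreases as x0 grows.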
P-value in words
• Frequency probability of hypothetical data being more “extreme” than the observed data we have, given that the null hypothesis is true
• A frequency probability statement about data that we do not have, under the assumption that the null hypothesis is true
What p-value is NOT
1. The probability of the hypothesis being true given the observed data, P(H0 | x0)
2. The probability or probability density of the observed data given that the hypothesis is true, P(x0 | H0)
3. The risk of being wrong if you claim that H0 is false
How to interpret in terms of H0 ?
• There is no quantitative interpretation
• Qualitatively, what can it mean:
   - If p-value < 0.05:
      - H0 is false, there is a true and practically meaningful effect
      - H0 is false, there is a true but practically meaningless effect, and your sample size was large enough to pick that up
      - H0 is true, you were just “lucky”
   - If p-value > 0.05:
      - H0 is false, there is a meaningful effect, but your sample size was too small to pick that up
      - H0 is false, there is a tiny effect, but your sample size was too small to pick that up
      - H0 is true, you were not “lucky”
• Effect size, amount of data and plausibility of H0 all affect the qualitative conclusion
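How effect size and sample size together drive the chance of getting p-value < 0.05 can be illustrated with a short calculation. The sketch assumes a one-sided z-test of H0: mean = 0 with known standard deviation; the effect sizes and sample sizes are hypothetical values chosen only to show the pattern.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def rejection_probability(effect: float, n: int, sigma: float = 1.0,
                          alpha: float = 0.05) -> float:
    """P(p-value < alpha) for a one-sided z-test of H0: mean = 0
    when the true mean equals `effect` (assumed known sigma)."""
    z_crit = std_normal.inv_cdf(1.0 - alpha)
    shift = effect * sqrt(n) / sigma
    return 1.0 - std_normal.cdf(z_crit - shift)

# Hypothetical scenarios: no effect, a tiny effect, a sizeable effect.
for effect in (0.0, 0.02, 0.5):
    for n in (10, 100_000):
        prob = rejection_probability(effect, n)
        print(f"effect={effect:<5} n={n:<7} P(reject H0)={prob:.3f}")
```

With no effect the rejection probability stays at 0.05 for any n, a tiny effect is almost always detected once n is huge, and a sizeable effect is often missed when n is small.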
Statistical significance?
• If the test statistic falls in the critical region, the observed effect is often said to be statistically significant
• However: there is no statistical theory that defines the concept directly
• The meaning of statistical significance is defined by the person who designs the test, through the choice of critical region
Publication bias
• H1: Rats that hear the Rolling Stones live longer or shorter than those that hear Led Zeppelin
• H0: No effect
• 100 independent experiments by different research teams:
   - 8 studies with p-value < 0.05
   - 92 studies with p-value > 0.05
• Because journals like p-values < 0.05, those papers are much more likely to be published
Consequence?
• When a question is studied often enough, some study will eventually produce a p-value below 0.05, regardless of whether H0 is true
• Sensational findings pop up easily
• Publication policy exaggerates the phenomenon
• Interpretation of the p-value, once again?
   - Small p-value: there is a potential mismatch between the observations and H0
   - Not-so-small p-value: H0 predicts the data reasonably well
   - Note: regardless of the p-value, the “mismatch” between H1 and the data could be small or large
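The publication-bias scenario can be mimicked with a simulation in which H0 is true in every study. Everything here is an assumption made for illustration: the group size, the unit-variance normal data, the two-sided z-test, and the random seed. The count it prints is a property of this simulation, not of the 100 experiments on the slide.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_study(rng, n_per_group=20, true_difference=0.0):
    """Return the two-sided p-value of a z-test comparing two groups.

    Assumption: measurements are normal with known unit variance.
    With true_difference=0.0 the null hypothesis is true by construction.
    """
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    group_b = [rng.gauss(true_difference, 1.0) for _ in range(n_per_group)]
    z = (mean(group_a) - mean(group_b)) / sqrt(2.0 / n_per_group)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

rng = random.Random(1)  # fixed seed, only so the sketch is reproducible
p_values = [simulate_study(rng) for _ in range(100)]
significant = [p for p in p_values if p < 0.05]
print(f"{len(significant)} of {len(p_values)} studies have p < 0.05 "
      "although H0 is true in all of them")
```

On average about 5 in 100 such studies fall below 0.05 by chance alone; if mostly those get published, the literature overstates the evidence against H0.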
Example: effect of a treatment
• H0: no effect: the means are equal
• H1: some effect: the means differ
• Assume equal (but unknown) variance in both groups
• Assume a normal distribution in both groups
• Data: 10 measurements from each of the control and treatment groups
• Summarize the data with a statistic:

$$ t = \frac{m_1 - m_2}{\sqrt{\frac{s_1^2}{10} + \frac{s_2^2}{10}}} $$

   m: group mean
   s: group standard deviation
Distribution of hypothetical data
• Under H0, the t-statistic follows a t-distribution with 18 degrees of freedom (number of observations minus 2) when sampling is repeated infinitely
• The critical region is defined by the t-distribution after choosing 0.05 as the significance level
   - P(|t| > k | H0) = 0.05, k = 2.1
   - If |t| > 2.1, reject H0
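A minimal sketch of the treatment example, using only the Python standard library. The measurements below are invented for illustration and are not data from the slides; the critical value 2.1 is taken from the slide rather than computed, since the standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(group1, group2):
    """t = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2), with s^2 the sample variance."""
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)
    return (m1 - m2) / sqrt(v1 / len(group1) + v2 / len(group2))

# Hypothetical measurements, 10 per group (made up for this example).
control = [5.1, 4.9, 5.6, 5.0, 4.7, 5.3, 5.2, 4.8, 5.4, 5.0]
treatment = [5.9, 5.4, 6.1, 5.7, 5.2, 6.0, 5.8, 5.5, 6.2, 5.6]

CRITICAL_VALUE = 2.1  # k from the slide: P(|t| > k | H0) = 0.05 with 18 df

t = t_statistic(control, treatment)
reject = abs(t) > CRITICAL_VALUE
print(f"t = {t:.2f}, reject H0: {reject}")
```

With equal group sizes this statistic coincides with the pooled-variance two-sample t-statistic, which is why the 18 degrees of freedom apply.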