Testing hypotheses with tests of significance

Significance testing
Ioannis Karagiannis
(based on previous EPIET material)
18th EPIET/EUPHEM Introductory course
28.09.2012
The idea of statistical inference
[Diagram: Population → Sample; hypotheses are tested on the sample; conclusions based on the sample; generalisation to the population]
2
Inferential statistics
• Uses patterns in the sample data to draw
inferences about the population represented,
accounting for randomness
• Two basic approaches:
– Hypothesis testing
– Estimation
• Common goal: conclude on the effect of an
independent variable on a dependent variable
3
The aim of a statistical test
To reach a deterministic decision (“yes” or “no”)
about observed data on a probabilistic basis.
4
Why significance testing?
Norovirus outbreak on a Greek island:
“The risk of illness was higher among people
who ate raw seafood (RR=21.5).”
Is the association due to chance?
5
The two hypotheses
Null Hypothesis (H0): There is NO difference between the two groups (= no effect) (RR = 1)
Alternative Hypothesis (H1): There is a difference between the two groups (= there is an effect) (e.g. RR = 21.5)
When you perform a test of statistical significance,
you reject or do not reject the Null Hypothesis (H0)
6
Norovirus on a Greek island
• Null hypothesis (H0): “There is no association
between consumption of raw seafood and
illness.”
• Alternative hypothesis (H1): “There is an
association between consumption of raw
seafood and illness.”
7
Hypothesis testing
• Tests of statistical significance
• Data not consistent with H0 :
– H0 can be rejected in favour of some alternative
hypothesis H1 (the objective of our study).
• Data are consistent with the H0 :
– H0 cannot be rejected
You cannot say that the H0 is true.
You can only decide to reject it or not reject it.
8
p value
p value = probability of observing our result (e.g. a
difference between proportions or an RR), or
more extreme values, if the null hypothesis
is true
H0 is rejected or not rejected using the reported p value
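As a minimal illustration of this definition, the sketch below computes an exact two-sided p value for a made-up example that is not part of the course material: 9 heads in 10 tosses of a coin, with H0 stating that the coin is fair. The function name and the numbers are assumptions chosen only for illustration.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided p value: total probability, under H0, of all outcomes
    that are as likely as or less likely than the observed count k."""
    probs = [comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so that ties with the observed probability are counted.
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))

# Hypothetical data: 9 heads in 10 tosses, H0: P(heads) = 0.5.
p_value = binomial_two_sided_p(9, 10)
print(round(p_value, 4))
```

The outcomes counted as "at least as extreme" here are 0, 1, 9 and 10 heads, so the p value is 22/1024.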
9
p values – practicalities
Low p values = low degree of compatibility between H0
and the observed data:
association unlikely to be by chance
you reject H0, the test is significant
High p values = high degree of compatibility between H0
and the observed data:
association likely to be by chance
you don’t reject H0, the test is not
significant
10
Levels of significance – practicalities
We need a cut-off!
1%
5%
10%
p value > 0.05 = H0 not rejected (non significant)
p value ≤ 0.05 = H0 rejected (significant)
BUT:
Always give the exact p value rather than just “significant”
vs. “non-significant”.
11
Examples from the literature
• “The limit for statistical significance was set at p=0.05.”
• “There was a strong relationship (p<0.001).”
• “…, but it did not reach statistical significance (ns).”
• “The relationship was statistically significant (p=0.0361).”
p = 0.05: an agreed convention, not an absolute truth
“Surely, God loves the 0.06 nearly as much
as the 0.05” (Rosnow and Rosenthal, 1991)
12
p = 0.05 and its errors
• Level of significance, usually α = 0.05
• p value used for decision making
But still 2 possible errors:
H0 should not be rejected, but it was rejected:
Type I or alpha error
H0 should be rejected, but it was not rejected:
Type II or beta error
13
Types of errors
Decision based on the p value | Truth: no diff (H0 not to be rejected) | Truth: diff (H0 to be rejected, H1)
H0 not rejected (no diff)     | Right decision (1-α)                   | Type II error (β)
H0 rejected (H1, diff)        | Type I error (α)                       | Right decision (1-β)
• H0 is “true” but rejected: Type I or α error
• H0 is “false” but not rejected: Type II or β error
14
More on errors
• Probability of Type I error:
– Value of α is determined in advance of the test
– The significance level is the level of α error that we
would accept (usually 0.05)
• Probability of Type II error:
– Value of β depends on the size of effect (e.g. RR, OR)
and sample size
– 1-β: Statistical power of a study to detect an effect of
a specified size (e.g. 0.80)
– Fix β in advance: choose an appropriate sample size
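The link between α, β and sample size can be illustrated with a small simulation. This is only a sketch under assumed values that are not from the slides: risks of 30% in both groups when H0 is true, 30% vs. 50% when it is false, and 30 or 100 people per group. The function names are hypothetical, and the p value uses the chi-square test with 1 degree of freedom.

```python
import random
from math import erfc, sqrt

def chi2_p_value(a, b, c, d):
    """p value of the chi-square test (1 degree of freedom) for the 2x2 table
    [[a, b], [c, d]]; returns 1.0 if a margin is zero (test undefined)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / margins
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def rejection_rate(risk_exposed, risk_unexposed, n_per_group, alpha=0.05,
                   n_sim=2000, seed=1):
    """Share of simulated studies in which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_sim):
        ill_exp = sum(rng.random() < risk_exposed for _ in range(n_per_group))
        ill_unexp = sum(rng.random() < risk_unexposed for _ in range(n_per_group))
        p = chi2_p_value(ill_exp, n_per_group - ill_exp,
                         ill_unexp, n_per_group - ill_unexp)
        rejected += p <= alpha
    return rejected / n_sim

type_i_rate = rejection_rate(0.3, 0.3, 100)  # H0 true: estimates alpha
power_small = rejection_rate(0.3, 0.5, 30)   # H0 false, small study: 1 - beta
power_large = rejection_rate(0.3, 0.5, 100)  # H0 false, larger study: 1 - beta
print(type_i_rate, power_small, power_large)
```

When H0 is true, the rejection rate should be close to the chosen α. When H0 is false, the rejection rate is the power (1-β), which grows with the sample size for a given effect size.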
15
Quantifying the association
• Test of association of exposure and outcome
• E.g. chi2 test or Fisher’s exact test
• Comparison of proportions
• Chi2 value quantifies the association
• The larger the chi2 value, the smaller the
p value
– the more the observed data deviate from the
assumption of independence (no effect).
16
Chi-square value
$$\chi^2 = \sum \frac{(\text{observed num.} - \text{expected num.})^2}{\text{expected num.}}$$
17
Norovirus on a Greek island
2x2 table (observed numbers, expected numbers in parentheses)
               | Ill      | Non ill   | Total
Raw seafood    | 29 (6)   | 9 (31)    | 38
No raw seafood | 5 (27)   | 136 (114) | 141
Total          | 34 (19%) | 145 (81%) | 179
Expected proportion of ill and not ill: 19% ill, 81% non-ill
Expected number of ill and not ill for each cell: row total x 19% ill, row total x 81% non-ill
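The expected numbers can be recomputed directly from the row and column totals. The sketch below assumes the usual rule (row total x column total / grand total), which is equivalent to multiplying each row total by the overall proportions of ill and non-ill; the dictionary layout and function name are illustrative choices.

```python
observed = {
    "Raw seafood": {"Ill": 29, "Non ill": 9},
    "No raw seafood": {"Ill": 5, "Non ill": 136},
}

def expected_counts(table):
    """Expected count per cell under H0: row total x column total / grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total
              for col in cells}
        for row, cells in table.items()
    }

expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, {col: round(value, 1) for col, value in cells.items()})
```

Computed without rounding, the expected number of ill raw seafood consumers is about 7.2 (38 x 34/179), somewhat higher than the 6 shown on the slide; the other three cells agree with the slide after rounding.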
18
Chi-square calculation
               | Ill         | Non ill         | Total
Raw seafood    | (29-6)²/6   | (9-31)²/31      | 38
No raw seafood | (5-27)²/27  | (136-114)²/114  | 141
Total          | 34          | 145             | 179
χ² = 125
p < 0.001
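The same calculation can be sketched in code. This version assumes the Pearson chi-square without continuity correction and uses unrounded expected counts; the p value relies on the fact that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(√(χ²/2)). The function name is illustrative.

```python
from math import erfc, sqrt

def chi_square_2x2(observed):
    """Pearson chi-square (no continuity correction) for a 2x2 table given as
    [[a, b], [c, d]], with expected counts from the row and column totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    p_value = erfc(sqrt(chi2 / 2))  # valid for 1 degree of freedom only
    return chi2, p_value

# Rows: raw seafood, no raw seafood; columns: ill, non ill.
chi2, p_value = chi_square_2x2([[29, 9], [5, 136]])
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

With unrounded expected counts the statistic comes out at about 103 rather than 125, because the slide works with rounded expected numbers. The conclusion is unchanged: the p value remains far below 0.001.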
19
Norovirus on a Greek island
“The attack rate of illness among consumers of
raw seafood was 21.5 times higher than
among non consumers of these food items
(p<0.001).”
The p value is smaller than the chosen
significance level of α = 5%.
→ The null hypothesis is rejected.
There is a probability of less than 0.001 (<1/1000) that an association at least as strong as the one observed
would have occurred by chance if there were no true association between
eating imported raw seafood and illness.
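The figure of 21.5 can be checked from the 2x2 table of the outbreak (29 ill among 38 consumers, 5 ill among 141 non-consumers). A minimal sketch, with an illustrative helper name:

```python
def attack_rate(ill, total):
    """Proportion of people in a group who became ill."""
    return ill / total

ar_exposed = attack_rate(29, 38)     # consumers of raw seafood
ar_unexposed = attack_rate(5, 141)   # non-consumers
risk_ratio = ar_exposed / ar_unexposed
print(f"{ar_exposed:.1%} vs {ar_unexposed:.1%}, RR = {risk_ratio:.1f}")
```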
20
C2012 vs facilitators
The ultimate (eye) test.
H0: the proportion of facilitators wearing glasses
during the Tuesday morning sessions was equal
to the proportion of fellows wearing glasses.
H1: the above proportions were different.
21
C2012 vs facilitators
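Observed numbers, expected numbers in parentheses
            | Glasses  | No glasses | Total
Fellow      | 11 (13)  | 27 (25)    | 38
Facilitator | 6 (4.6)  | 8 (9.4)    | 14
Total       | 17 (33%) | 35 (67%)   | 52
Expected proportion with and without glasses: 33% with (+ve), 67% without (-ve)
Expected number for each cell: row total x 33% +ve, row total x 67% -ve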
22
Chi-square calculation
            | Glasses       | No glasses
Fellow      | (11-13)²/13   | (27-25)²/25
Facilitator | (6-4.6)²/4.6  | (8-9.4)²/9.4
χ² = 1.11
p = 0.343
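As a cross-check, the sketch below recomputes the test from the observed counts, assuming the Pearson chi-square without continuity correction and using the shortcut formula for 2x2 tables. Variable names are illustrative.

```python
from math import erfc, sqrt

# Rows: fellows, facilitators; columns: glasses, no glasses.
a, b = 11, 27
c, d = 6, 8

n = a + b + c + d
# Shortcut formula for a 2x2 table, algebraically equal to the sum of
# (observed - expected)^2 / expected over the four cells.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_value = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 degree of freedom
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

With unrounded expected counts the statistic is about 0.90, and the corresponding p value matches the reported 0.343. The value of 1.11 appears to result from working with the rounded expected numbers shown on the slide.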
23
t-test
• Used to compare means of a continuous
variable in two different groups
• Assumes normal distribution
24
t-test
• H0: fellows with glasses do not tend to sit
further in the back of the room compared to
fellows without glasses
• H1: fellows with glasses tend to sit further in
the back of the room compared to fellows
without glasses
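The course data for this example are not reproduced in the transcript, so the sketch below uses invented seating rows purely for illustration. It computes the two-sample t statistic with pooled variance, assuming roughly normal data and similar variances in both groups; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def students_t(sample_1, sample_2):
    """Two-sample t statistic with pooled variance (assumes roughly normal
    data and similar variances in both groups)."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = ((n1 - 1) * variance(sample_1) +
                  (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    t = (mean(sample_1) - mean(sample_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical seating rows (1 = front); invented values, not course data.
rows_glasses = [3, 4, 5, 5, 6]
rows_no_glasses = [1, 2, 2, 3, 4]
t, df = students_t(rows_glasses, rows_no_glasses)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The p value is then read from the t distribution with the given degrees of freedom. Because H1 here is directional (further back, not merely different), a one-sided p value would be the natural choice.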
25
Epidemiology and statistics
27
Criticism of significance testing
“Epidemiological applications need more than a
decision as to whether chance alone could have
produced an association.”
(Rothman et al. 2008)
Estimation of an effect measure (e.g. RR, OR)
rather than significance testing.
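As an illustration of estimation, the sketch below computes the risk ratio for the norovirus example together with an approximate 95% confidence interval. The interval method (normal approximation on the log scale) is an assumption of this sketch and is not specified in the slides; the function name is illustrative.

```python
from math import exp, log, sqrt

def risk_ratio_ci(ill_exp, n_exp, ill_unexp, n_unexp, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (ill_exp / n_exp) / (ill_unexp / n_unexp)
    se_log_rr = sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# 29 ill among 38 raw seafood consumers, 5 ill among 141 non-consumers.
rr, lower, upper = risk_ratio_ci(29, 38, 5, 141)
print(f"RR = {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

The interval, roughly 9 to 52, conveys both the size of the effect and the precision of the estimate, which a p value alone does not.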
28
Suggested reading
• KJ Rothman, S Greenland, TL Lash, Modern Epidemiology,
Lippincott Williams & Wilkins, Philadelphia, PA, 2008
• SN Goodman, R Royall, Evidence and Scientific Research,
AJPH 78, 1568, 1988
• SN Goodman, Toward Evidence-Based Medical Statistics.
1: The P Value Fallacy, Ann Intern Med. 130, 995, 1999
• C Poole, Low P-Values or Narrow Confidence Intervals:
Which are more Durable? Epidemiology 12, 291, 2001
29
Previous lecturers
• Alain Moren
• Paolo D’Ancona
• Lisa King
• Ágnes Hajdu
• Preben Aavitsland
• Doris Radun
• Manuel Dehnert
30