Transcript slides

On teaching statistical inference:
What do p values (not) mean?
Bruce Blaine, PhD, PStat®
Department of Mathematical and Computing Sciences
St. John Fisher College
Limitations of NHST
The misapplication of null hypothesis significance testing
(NHST) procedures for statistical inference is well known.
NHST procedures do not address what researchers most
want to know.
• NHST procedures test a (nil) null hypothesis, which is
rarely true and therefore uninformative to reject.
• NHST procedures deliver a conditional probability,
p(D|Ho), which is commonly misinterpreted.
• NHST procedures do not test research hypotheses.
• NHST procedures do not quantify effect size.
Misinterpretations of p values
Two misinterpretations of p values from NHST procedures
are common in the social sciences (cf. Kline, 2004):
1. Magnitude fallacy
p values are misread as an effect size statistic, so that smaller p values are taken
to mean a larger (stronger) treatment effect.
“…the effect was marginally significant, p=.07”
“…the effect was highly (or extremely) significant, p<.001”
2. Validity fallacy
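p(D|Ho), the probability of data at least this extreme if Ho is true, is misread as
p(H1|D), the probability that the research hypothesis is true given the data.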
“…the treatment improved the outcome, p<.05”
“…the treatment had no effect on the outcome, p>.05”
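Why p(D|Ho) and p(H1|D) can differ sharply is easy to show with Bayes' rule. The Python sketch below is a hypothetical illustration, not one of the original exercises: the significance level, the power, and the prior probability that H1 is true are all assumed inputs, and the function name is made up for this example. It returns the share of "significant" results for which Ho is actually true.

```python
def prob_h0_given_significant(alpha, power, prior_h1):
    """P(Ho true | p < alpha) via Bayes' rule; all three inputs are assumptions."""
    false_pos = alpha * (1 - prior_h1)  # P(p < alpha and Ho true)
    true_pos = power * prior_h1         # P(p < alpha and H1 true)
    return false_pos / (false_pos + true_pos)


# Illustrative inputs only: alpha = .05, power = .50, three assumed priors for H1.
for prior_h1 in (0.5, 0.2, 0.1):
    share = prob_h0_given_significant(alpha=0.05, power=0.5, prior_h1=prior_h1)
    print(f"P(H1) = {prior_h1:.1f}: P(Ho | p < .05) = {share:.3f}")
```

With alpha = .05 and power = .50, the share is about .09 when half of the tested hypotheses are true, but about .47 when only one in ten is, even though every one of those results has p < .05.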
Classroom exercise 1:
Addressing the magnitude fallacy
1. In Excel (using the Data Analysis ToolPak add-in), have students
enter the data from the hypothetical experiment in Table 1.
2. Provide (or have students create) Table 2.
3. Have students run an independent-samples t test
(assume equal variances).
4. Copy and paste the treatment and control data to increase each
group's n by 5, repeating the t test each time.
5. Fill in Table 2 with values from the analyses.

Table 1.
Treatment   Control
1           3
2           4
3           5
4           6
5           7

Table 2.
                   Group size (n)
Statistic          5       10      15
Mean difference
Pooled variance
t
p (2-tailed)
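Instructors who want to check the Excel output can sketch the same steps in Python. This is a minimal sketch, not the ToolPak's implementation. It assumes a pooled-variance (equal-variance) t test, takes the mean difference as Treatment minus Control (so t is negative), and gets the two-tailed p value from the classical finite series for Student's t with integer degrees of freedom (as in Abramowitz and Stegun's Handbook, chapter 26), so only the standard library is needed. The helper names are hypothetical.

```python
import math
import statistics

# Table 1 data (hypothetical experiment from the slide)
TREATMENT = [1, 2, 3, 4, 5]
CONTROL = [3, 4, 5, 6, 7]


def t_two_tailed_p(t, df):
    """Two-tailed p value for Student's t with integer df, via the finite series."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    term = total = 1.0
    if df % 2 == 0:
        # P(|T| < |t|) = sin(theta) * [1 + (1/2)c2 + (1*3)/(2*4)c2^2 + ...], df/2 terms
        for k in range(2, df, 2):
            term *= (k - 1) / k * c2
            total += term
        central = s * total
    elif df == 1:
        central = 2 * theta / math.pi
    else:
        # P(|T| < |t|) = (2/pi) * [theta + sin*cos * (1 + (2/3)c2 + ...)]
        for k in range(3, df - 1, 2):
            term *= (k - 1) / k * c2
            total += term
        central = 2 / math.pi * (theta + s * math.cos(theta) * total)
    return 1.0 - central


def pooled_t_test(x, y):
    """Independent-samples t test assuming equal variances (x minus y)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    diff = statistics.mean(x) - statistics.mean(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / df
    t = diff / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return diff, sp2, t, t_two_tailed_p(t, df)


# Step 4: paste extra copies of the data so each group has n = 5, 10, 15.
print(f"{'n':>3} {'diff':>6} {'sp2':>6} {'t':>6} {'p':>7}")
for copies in (1, 2, 3):
    diff, sp2, t, p = pooled_t_test(TREATMENT * copies, CONTROL * copies)
    print(f"{5 * copies:>3} {diff:>6.1f} {sp2:>6.2f} {t:>6.2f} {p:>7.4f}")
```

The t values come out as -2.00, -3.00 and -3.74. The pooled variances (2.50, 2.22, 2.14) and p values (about .0805, .0077 and .0008) agree with the rounded values students should obtain in Excel.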
Classroom exercise 1:
Results
This exercise should show that the p values decrease across the 3
experiments even though the treatment has the same effect in
each. Why?
Students should come to appreciate that larger samples produce
smaller estimated standard errors of the mean difference. For a
constant mean difference (which does not change in this exercise),
a smaller standard error produces a larger absolute t value and
therefore a smaller p value.
                   Group size (n)
Statistic          5       10      15
Mean difference    2       2       2
Pooled variance    2.5     2.2     2.1
t                  -2.00   -3.00   -3.74
p (2-tailed)       0.0800  0.0080  0.0008
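The arithmetic behind the results can be written out. With the mean difference (Treatment minus Control) fixed at -2 and both groups of size n, the pooled-variance t statistic and its standard error (SE) work out as follows:

```latex
\begin{aligned}
t &= \frac{\bar{X}_T - \bar{X}_C}{\sqrt{s_p^2\,(1/n_T + 1/n_C)}} = \frac{-2}{\sqrt{s_p^2 \cdot 2/n}} \\
n = 5:&\quad s_p^2 = \tfrac{10}{4} = 2.5, \quad SE = \sqrt{2.5 \cdot \tfrac{2}{5}} = 1, \quad t = -2.00 \\
n = 10:&\quad s_p^2 = \tfrac{20}{9} \approx 2.22, \quad SE = \sqrt{\tfrac{20}{9} \cdot \tfrac{2}{10}} = \tfrac{2}{3}, \quad t = -3.00 \\
n = 15:&\quad s_p^2 = \tfrac{30}{14} \approx 2.14, \quad SE = \sqrt{\tfrac{30}{14} \cdot \tfrac{2}{15}} = \sqrt{\tfrac{2}{7}} \approx 0.535, \quad t \approx -3.74
\end{aligned}
```

The pooled variance barely moves (from 2.5 to about 2.14). Almost all of the drop in the standard error comes from the 2/n factor, which is why |t| grows and p shrinks while the mean difference stays the same.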
Classroom exercise 2:
Addressing the validity fallacy
Imagine 3 studies that compare the GPAs of students with high (Treatment, or T)
and low (Control, or C) Facebook time, with descriptive statistics from the
studies in the table below:
1. Have students observe (via hand-calculated t tests or 95%
confidence intervals) that none of the 3 studies would reject Ho at
p<.05.
2. In Excel (using the Meta Easy add-in), have students enter the
data from the 3 hypothetical studies and generate a meta-analysis of the
effect of Facebook time on GPA.
STUDY       T mean   C mean   T SD   C SD   T n   C n
Apple       2.7      3.3      0.6    0.6    20    20
Blueberry   2.9      3.2      0.9    0.9    20    20
Cherry      2.7      2.9      0.8    0.8    30    30
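Step 1 can be checked with a short Python sketch. It assumes pooled-variance t tests computed from the summary statistics in the table (difference taken as T minus C), with the two-tailed p value from a finite-series formula for Student's t with integer degrees of freedom. The function names are hypothetical. On the values as transcribed here, it gives t ≈ -1.05 for Blueberry and t ≈ -0.97 for Cherry (both p > .05). Apple, however, gives t ≈ -3.16 (p < .01), because its 0.6-point difference is large relative to SDs of 0.6. The Apple row is therefore worth checking against the original data before classroom use.

```python
import math

# Summary statistics from the table: (T mean, C mean, T SD, C SD, T n, C n)
STUDIES = {
    "Apple": (2.7, 3.3, 0.6, 0.6, 20, 20),
    "Blueberry": (2.9, 3.2, 0.9, 0.9, 20, 20),
    "Cherry": (2.7, 2.9, 0.8, 0.8, 30, 30),
}


def t_two_tailed_p(t, df):
    """Two-tailed p value for Student's t with integer df, via the finite series."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    term = total = 1.0
    if df % 2 == 0:
        # P(|T| < |t|) = sin(theta) * [1 + (1/2)c2 + (1*3)/(2*4)c2^2 + ...]
        for k in range(2, df, 2):
            term *= (k - 1) / k * c2
            total += term
        central = s * total
    elif df == 1:
        central = 2 * theta / math.pi
    else:
        # P(|T| < |t|) = (2/pi) * [theta + sin*cos * (1 + (2/3)c2 + ...)]
        for k in range(3, df - 1, 2):
            term *= (k - 1) / k * c2
            total += term
        central = 2 / math.pi * (theta + s * math.cos(theta) * total)
    return 1.0 - central


def t_from_summary(mt, mc, sdt, sdc, nt, nc):
    """Pooled-variance t test computed from summary statistics (T minus C)."""
    df = nt + nc - 2
    sp2 = ((nt - 1) * sdt**2 + (nc - 1) * sdc**2) / df
    t = (mt - mc) / math.sqrt(sp2 * (1 / nt + 1 / nc))
    return t, df, t_two_tailed_p(t, df)


for name, summary in STUDIES.items():
    t, df, p = t_from_summary(*summary)
    verdict = "reject Ho" if p < 0.05 else "retain Ho"
    print(f"{name:<10} t({df}) = {t:6.2f}, p = {p:.4f}: {verdict}")
```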
Classroom exercise 2:
Results
The exercise should show that although none of the 3 studies is
statistically significant on its own (defined as p<.05), the Facebook effect
on GPA is significant when their data are combined.
Students should notice that the 95% CI estimate of the pooled Facebook
effect on GPA (the fixed-effect, or FE, diamond in the forest plot) does not include 0.
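The FE diamond can be approximated with a fixed-effect, inverse-variance meta-analysis. This sketch assumes the pooled effect is the raw mean difference in GPA (T minus C) with a normal-theory 95% CI. Meta Easy may use a standardized effect size or a different variance formula, so its numbers may differ somewhat. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# (T mean, C mean, T SD, C SD, T n, C n) for Apple, Blueberry, Cherry
STUDIES = [
    (2.7, 3.3, 0.6, 0.6, 20, 20),
    (2.9, 3.2, 0.9, 0.9, 20, 20),
    (2.7, 2.9, 0.8, 0.8, 30, 30),
]


def fixed_effect(studies, level=0.95):
    """Inverse-variance fixed-effect pooling of raw mean differences."""
    effects, weights = [], []
    for mt, mc, sdt, sdc, nt, nc in studies:
        effects.append(mt - mc)                          # study mean difference
        weights.append(1 / (sdt**2 / nt + sdc**2 / nc))  # 1 / its sampling variance
    w_sum = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / w_sum
    se = math.sqrt(1 / w_sum)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)       # 1.96 for a 95% CI
    p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
    return pooled, (pooled - z_crit * se, pooled + z_crit * se), p


pooled, (lo, hi), p = fixed_effect(STUDIES)
print(f"FE mean difference = {pooled:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], p = {p:.4f}")
```

With the tabled values this gives a pooled difference of about -0.39 GPA points, with a 95% CI of about [-0.64, -0.15]. Because that interval excludes 0, the pooled effect is significant.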
Summary lessons
These exercises let the data teach students where p
values come from and how to interpret them properly.
o Exercise 1 shows that because p values depend on
both the mean difference and the sample size, they
cannot be trusted to quantify the mean difference
on their own.
o Exercise 2 shows that treating “nonsignificant”
studies as evidence against H1 can be misleading.
Genuine treatment effects may be obscured in
studies with small samples, high variability, or both.
On teaching statistical inference:
more estimation, less NHST
o Typical social science statistics textbooks and curricula
are overdependent upon NHST methods for statistical
inference.
o These exercises can be part of a larger effort to teach
more estimation methods in basic statistics courses,
including confidence intervals, effect size statistics, and
meta-analysis.
o Estimation methods are more intuitive because they
speak to research hypotheses rather than null hypotheses.