Effect Size - Durham University
Effect Size Issues
Rob Coe
WRI Workshop, 18 March 2013
Four parts
I. What is Effect Size?
II. The case for using effect size (5 reasons)
III. Problems in using effect size (6 problems)
IV. Recommendations (13 recommendations)
What is Effect Size?
Sources
Coe, R. (2002) 'It's the effect size, stupid: what effect size is and why it is important'. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, 12-14 September 2002.

Coe, R.J. (2012) 'Effect Size', in J. Arthur, M. Waring, R. Coe and L.V. Hedges (eds) Research Methods and Methodologies in Education. London: Sage.
Normal distribution
[Figure: two normal curves, one with a high standard deviation (spread out) and one with a low standard deviation (tightly grouped)]

Effect Size is the difference between the two groups, relative to the standard deviation:

Effect Size = (Mean of experimental group − Mean of control group) / Standard deviation
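The formula translates directly into code. A minimal sketch, not from the slides themselves: the function name is illustrative, and it assumes the pooled sample SD, which is only one of the choices discussed in Part III.

```python
import numpy as np

def effect_size(experimental, control):
    """Standardised mean difference, using the pooled sample SD.
    (Which SD to use is itself a choice -- see Part III.)"""
    e = np.asarray(experimental, dtype=float)
    c = np.asarray(control, dtype=float)
    pooled_var = ((len(e) - 1) * e.var(ddof=1) + (len(c) - 1) * c.var(ddof=1)) \
                 / (len(e) + len(c) - 2)
    return (e.mean() - c.mean()) / np.sqrt(pooled_var)
```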
Examples of Effect Sizes:

ES = 0.2
o 58% of control group below mean of experimental group
o "Equivalent to the difference in heights between 15 and 16 year old girls"
o Probability you could guess which group a person was in = 0.54
o Change in the proportion above a given threshold: from 50% to 58%, or from 75% to 81%

ES = 0.5
o 69% of control group below mean of experimental group
o "Equivalent to the difference in heights between 14 and 18 year old girls"
o Probability you could guess which group a person was in = 0.60
o Change in the proportion above a given threshold: from 50% to 69%, or from 75% to 88%

ES = 0.8
o 79% of control group below mean of experimental group
o "Equivalent to the difference in heights between 13 and 18 year old girls"
o Probability you could guess which group a person was in = 0.66
o Change in the proportion above a given threshold: from 50% to 79%, or from 75% to 93%
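Under the normality assumption behind these figures, each interpretation is a one-line transformation of the effect size. A sketch (the function name and dictionary labels are mine, not the slides'):

```python
from scipy.stats import norm

def interpret(d):
    """Interpretations of an effect size d, assuming both groups are
    normally distributed with equal SDs."""
    return {
        "% of control below experimental mean": norm.cdf(d),
        "P(correctly guessing a person's group)": norm.cdf(d / 2),
        "proportion above threshold: from 50%": norm.cdf(d),
        "proportion above threshold: from 75%": norm.cdf(norm.ppf(0.75) + d),
    }

for d in (0.2, 0.5, 0.8):
    print(d, interpret(d))
# d = 0.2 gives 0.58, 0.54, 0.58 and 0.81, matching the table above
```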
Effect Sizes from EEF Toolkit
[Figure: chart of effect sizes from the EEF Toolkit]
The case for using effect size measures
Source
Coe, R. (2004) 'Issues arising from the use of effect sizes in analysing and reporting research', in I. Schagen and K. Elliot (eds) But what does it mean? The use of effect sizes in educational research. Slough, UK: National Foundation for Educational Research.
http://www.nfer.ac.uk/nfer/publications/SEF01/SEF01.pdf
1. Effect size enables uncalibrated measures to be interpreted

From a questionnaire on teachers' perceptions of their training needs (7 item scale, each item coded 1-4):

By age group:
o age 20-40: mean 2.98, SD 0.87, n 389
o age 41-65: mean 2.09, SD 0.95, n 345
o Effect size = 0.98
o "younger teachers expressed stronger needs than their older colleagues"

By gender:
o female: mean 2.64, SD 1.05, n 451
o male: mean 2.44, SD 1.11, n 283
o Effect size = 0.19
o "female teachers appeared to have higher training needs than males"
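When only summary statistics are reported, as here, the effect size can still be recovered. A sketch assuming the pooled SD (the slide does not say which SD was used, but pooling reproduces both quoted values):

```python
import math

def d_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from published summary statistics, pooling the two SDs."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

print(d_from_summary(2.98, 0.87, 389, 2.09, 0.95, 345))  # ~0.98 (age)
print(d_from_summary(2.64, 1.05, 451, 2.44, 1.11, 283))  # ~0.19 (gender)
```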
2. Effect size emphasises amounts, not just statistical significance

The dichotomous "significant/not" decision is almost never appropriate
The size of a difference is almost always important
"Significance" has many meanings, but is inevitably related to the size of the difference
Nonsensical dichotomies

Dataset A:
Experimental Group: 52, 69, 83, 66, 58, 69, 68, 44, 62, 51, 31, 70, 44, 63, 70, 55, 86, 74, 68, 74 (mean 63.0, SD 13.7)
Control Group: 51, 60, 57, 80, 45, 37, 56, 47, 55, 63, 51, 45, 69, 62, 63, 46, 73, 39, 49, 52 (mean 54.9, SD 11.2)
t-test gives: p = 0.049
Statistically significant difference: THE TREATMENT WORKED!

Dataset B (identical, except a single control score changed from 69 to 70):
Experimental Group: as above (mean 63.0, SD 13.7)
Control Group: 51, 60, 57, 80, 45, 37, 56, 47, 55, 63, 51, 45, 70, 62, 63, 46, 73, 39, 49, 52 (mean 55.0, SD 11.3)
t-test gives: p = 0.052
Difference not significant: IT DIDN'T WORK!
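The comparison can be rerun directly. A sketch using SciPy's independent-samples t-test; the transcribed scores may not reproduce the slide's p-values to the third decimal place, but the point stands either way: a one-point change in a single score flips the dichotomous verdict.

```python
from scipy.stats import ttest_ind

experimental = [52, 69, 83, 66, 58, 69, 68, 44, 62, 51,
                31, 70, 44, 63, 70, 55, 86, 74, 68, 74]
control_a = [51, 60, 57, 80, 45, 37, 56, 47, 55, 63,
             51, 45, 69, 62, 63, 46, 73, 39, 49, 52]
control_b = list(control_a)
control_b[12] = 70  # the single score that differs between the datasets

print(ttest_ind(experimental, control_a).pvalue)  # slide reports p = 0.049
print(ttest_ind(experimental, control_b).pvalue)  # slide reports p = 0.052
```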
Don’t ignore amounts
[Figure: three effect size estimates with confidence intervals, plotted on a scale from -1 to 3: (a) "not significant", (b) "significant", (c) "significant"]
Types of significance

Statistical significance
o Probability that the difference is due to chance
Practical significance
o Theoretical or applied importance
Clinical significance
o 'extent to which the intervention makes a real difference to the quality of life' (Kazdin, 1999)
Economic significance
o Benefit in relation to cost (Leech and Onwuegbuzie, 2003)
3. Effect size draws attention to the
margin of error
Statistical power is important, but often
overlooked
Much apparent disagreement is actually just
sampling error
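The margin of error can be made explicit. A sketch using the usual large-sample approximation for the standard error of a standardised mean difference (Hedges and Olkin, 1985); the effect size and sample sizes here are invented for illustration.

```python
import math

def d_standard_error(d, n1, n2):
    """Approximate standard error of a standardised mean difference."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

d, n1, n2 = 0.5, 30, 30          # hypothetical study
se = d_standard_error(d, n1, n2)
print(f"d = {d}, 95% CI ({d - 1.96 * se:.2f}, {d + 1.96 * se:.2f})")
# ~(-0.01, 1.01): a "medium" effect barely distinguishable from zero
```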
[Figure: effect size estimates with confidence intervals, plotted on a scale from -1 to 3]
4. Effect size may help reduce reporting bias

The "file-drawer" problem is alive and well
Within-study reporting bias can also be a problem
[Figure: effect size estimates with confidence intervals, plotted on a scale from -1 to 3]
5. Effect size allows the accumulation of knowledge

Meta-analysis can combine results from different studies
Small studies are worth doing
Problems in using effect size measures
1. Which effect size?
Proportion of variance accounted for
o Universal measure, but
o Non-directional
o Sensitive to violations of assumptions
o Large standard errors
o Interpretation counter-intuitive
o ‘Effect’ should mean effect
Non-parametric effect size measures
Odds ratio
Un-standardised (raw) difference
2. Which standard deviation?
Pooled or control group?
o Control group is conceptually purer
o Pooled is statistically better (provided compatible)
o Sometimes there isn’t a ‘control’ group
Residual standard deviation
o Residual gain becomes ‘progress’
o Effect sizes substantially inflated and dependent on
correlation
o Important to report clearly
Restricted range
o Effect size is higher if the range is restricted (see the sketch below)
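A short simulation makes the restriction effect concrete; the population, the gain and the cut-offs are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test with SD 10; every treated pupil gains 5 points,
# so the effect size on the full range is 0.5
control = rng.normal(50, 10, 100_000)
treated = control + 5

def d(x, y):
    pooled_sd = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)
    return (x.mean() - y.mean()) / pooled_sd

print(d(treated, control))                # ~0.5 on the full range

keep = (control > 45) & (control < 55)    # e.g. a selective intake
print(d(treated[keep], control[keep]))    # ~1.8: same gain, inflated ES
```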
3. Measurement reliability

Standardised mean difference is spuriously affected by the reliability of the outcome measure
o Part of the variance in measured scores is due to measurement error
o More error → higher SD → lower ES
o Reliability should be reported (see the sketch below)
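Where the reliability is known, the classical correction for attenuation can undo this. A sketch; the numbers are illustrative, not from the slides.

```python
import math

def disattenuated_d(d_observed, reliability):
    """Effect size on true scores: measurement error inflates the observed SD,
    so d_observed = d_true * sqrt(reliability)."""
    return d_observed / math.sqrt(reliability)

print(disattenuated_d(0.40, 0.64))  # 0.50: error hid a fifth of the effect
```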
4. Non-normal distributions
[Figures: two examples of non-normal score distributions, plotted on a scale from -4 to 4]
An effect size of 1

[Figure: two pairs of distributions, each shifted by one standard deviation]
Normal: median person raised to the 84th percentile
Contaminated-normal: median person raised to the 97th percentile
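The slide does not state the contaminated-normal's parameters. A simulation sketch assuming, for illustration, a 95/5 mixture of N(0, 1) and N(0, 10²), which reproduces the quoted percentile: the long tails inflate the overall SD, so a one-SD shift clears far more of the distribution than under normality.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Assumed contaminated normal: 95% N(0, 1) plus 5% N(0, 100)
outlier = rng.random(n) < 0.05
scores = np.where(outlier, rng.normal(0, 10, n), rng.normal(0, 1, n))

shift = scores.std()                    # one (overall) SD, i.e. ES = 1
new_position = np.median(scores) + shift
print((scores < new_position).mean())   # ~0.97, versus 0.84 if truly normal
```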
5. Interpreting effects

Cohen's "0.2 = small, 0.5 = medium, 0.8 = large" is a bit simplistic
Interpretation depends on
o Translation into a familiar metric
o Comparison with known effects
o Costs
o Feasibility
o Benefits (and their value)
o Availability of alternatives
6. Incommensurability
Commensurable outcomes
o Construct
o Operationalisation
o Reliability
Commensurable treatments
o Well defined?
o Fidelity of delivery
o Intensity / duration
o Control group treatment
Commensurable populations
o Range
Recommendations
Recommendations

1. Calculate and report standardised effect size, with confidence interval / standard error, for all comparisons
2. Show these graphically
3. Report all relevant comparisons regardless of whether confidence intervals include zero
4. Interpret effect sizes by comparison with known effects and in relation to familiar metrics
5. Report un-standardised raw differences whenever the outcome is measured on a familiar scale
Recommendations (cont)

6. Interpret the significance of an effect with regard to issues such as its
o effect size
o theoretical importance
o associated benefits
o associated costs
o policy relevance
o feasibility
o comparison with available alternatives
7. Don't use the word 'effect' (with or without 'size') unless a causal claim is intended and can be justified
Recommendations (cont)

9. Be cautious about the calculation and interpretation of standardised effect sizes whenever
o Sample has restricted range
o Population is not known to be normal
o Outcome measure has low or unknown reliability
o Outcomes have been statistically adjusted (residuals)
10. Always report reliability of measures, extent of restriction, correlations (or R²) in these cases
Recommendations (cont)

11. Small studies with low power and statistically non-significant effects should still be conducted, reported and published, provided they are free from bias
12. Synthesise the results of compatible studies using meta-analysis
13. Beware of combining or comparing effect sizes from studies with incommensurable outcomes, treatments or populations