Campbell Collaboration Training Materials
Effect Sizes
Overview
Overview of Effect Sizes
Effect Sizes from the d Family
Effect Sizes from the r Family
Effect Sizes for Categorical Data
Connections Between the Effect-Size Metrics
2
Effect sizes
Meta-analysis expresses the results of each
study using a quantitative index of effect size
(ES).
ESs are measures of the strength or magnitude
of a relationship of interest.
ESs have the advantage of being comparable
(i.e., they estimate the same thing) across all of
the studies and therefore can be summarized
across studies in the meta-analysis.
ESs are relatively independent of sample size.
3
Effect sizes
An effect size is a quantitative index that
represents the results of a study.
Effect sizes make study results comparable so
that …
results can be compared across studies, or
results can be summarized across studies.
Examples of effect-size indices include
standardized mean differences (ds), and
correlation coefficients (rs).
4
Effect sizes
A crucial conceptual distinction is between
effect-size …
estimates, computed from studies (sample
effect sizes), and
parameters (population or true effect sizes).
We want to make inferences about effect-size
parameters using effect-size estimates.
5
Types of effect size
Most reviews use effect sizes from one of three
families of effect sizes:
the d family, including the standardized mean
difference,
the r family, including the correlation coefficient,
and
the odds ratio (OR) family, including proportions
and other measures for categorical data.
6
Types of effect size
Test statistics (e.g., t statistics, F tests, and so
on) are not ideal ESs because they depend on both:
Effect size and
Sample size (n)
That is, Test Statistic = f(Effect Size, sample size)
7
Types of effect size
The significance level (a.k.a. the p value) is also
not an ideal ES because it depends on the test
statistic and n.
Studies with the same effect sizes can get
different p values, simply because they differ in
sample size.
Studies with fundamentally different results can
get the same p values, because they differ in
size.
Thus, the p value is a misleading index of effect
size.
8
The choice of effect size
A particular index is chosen to make results from
different studies comparable to one another. The
choice depends on the ...
question of interest for the review,
designs of studies being reviewed,
statistical analyses that have been reported, and
measures of the outcome variable.
9
The choice of effect size
When we have continuous data (means and
standard deviations) for two groups, we typically
compute a raw mean difference or a
standardized difference – an effect size from the
d family,
When we have correlational data, we typically
compute a correlation (from the r family), or
When we have binary data (the patient lived or
died, the student passed or failed), we typically
compute an odds ratio, a risk ratio, or a risk
difference.
10
Features of most effect sizes
We introduce some notation for a common case –
the treatment/control comparison.
Let Ȳ_T be the mean posttest score in the treatment
group, Ȳ_C be the mean control-group posttest
score, and S_Ypooled be the pooled within-groups
standard deviation for the Y scores (i.e., the t-test
SD). Then we may compute standardized T - C
differences using posttest means as

g_post = (Ȳ_T - Ȳ_C) / S_Ypooled
11
Features of most effect sizes
Remember that all statistical estimators are
estimating some parameter. What parameter is
being estimated by gpost?
The answer is the population standardized mean
difference, usually denoted by the Greek letter
delta (δ), where population means and the population
SD σ appear in place of the sample values:

δ_post = (μ_T - μ_C) / σ_Ypooled
12
Expected values of effect sizes
Some ES indices are biased in small samples.
It is common to correct for this small-sample
bias.
The posttest effect size gpost is biased, with
expected value
E[gpost ] = δ/c(m),
where c(m)=1 - 3/(4m-1), and m = nT + nC – 2.
In general m is the df for the appropriate t test,
here, the two-sample t test.
13
Expected values of effect sizes
So now we can correct for bias: d = c(m)*gpost.
The expected value of d is δ.
The correlation is also biased (toward zero), and can
be corrected via
r_u = r [1 + (1 - r²)/(2n - 2)].
Proportions are not biased, and do not need
correction.
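As a quick numeric check, the small-sample correction above can be sketched in Python (a minimal sketch; the function names `c` and `unbiased_d` are ours, not from the original materials):

```python
def c(m):
    """Small-sample correction factor c(m) = 1 - 3/(4m - 1)."""
    return 1 - 3 / (4 * m - 1)

def unbiased_d(g, n_t, n_c):
    """Bias-corrected standardized mean difference d = c(m) * g,
    with m = n_t + n_c - 2 (the two-sample t-test df)."""
    return c(n_t + n_c - 2) * g

# e.g. g = -0.72 with 30 participants per group
d = unbiased_d(-0.72, 30, 30)
```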
14
Variances of effect sizes
Effect-size indices also have variances that can
be estimated using data from the individual
study from which the ES is obtained.
Below we provide the variances of many ES
indices, noting that in all cases the variance is
an inverse function of the study sample size.
Thus smaller studies have larger variances,
representing less precise information about the
effect of interest. The ES variance is a key
component of nearly all statistical analyses
used in meta-analysis.
15
Statistical properties (Variances)
Often the variances of ES indices are also
conditional on (i.e., are functions of) the
parameter values. Consider the variance of d:
Var(d) = v = (n_T + n_C)/(n_T n_C) + δ²/[2(n_T + n_C)],

which is a function of δ. Below we introduce
transformations that can be used with some ES
indices to remove the parameter from the
variance (i.e., to stabilize the variances).
16
Variance of the standardized mean difference
Var(d) = v = (n_T + n_C)/(n_T n_C) + d²/[2(n_T + n_C)]
As d increases (becomes more unusual or
extreme) the variance also increases. We are
more uncertain about extreme effects.
The variance also depends on the sample sizes,
and as the ns increase, the variance decreases.
Large studies provide more precise data; we are
more certain about effects from large studies.
17
Statistical properties (Variances)
Variances of effect sizes are not typically equal
across studies, even if stabilized. This is because
most variances depend on sample sizes, and it is
rare to have identical-sized samples when we look
at sets of studies.
Thus, homoscedasticity assumptions are nearly
always not met by most meta-analysis data!!
This is why we do not use “typical” statistical
procedures (like t tests and ANOVA) for most
analyses in meta-analysis.
18
Quick Examples: Common Study Outcomes for
Treatment-Control Meta-analyses
19
Common study outcomes for trt/ctrl
meta-analysis
Treatment (T)/control (C) studies:
Above we introduced the standardized T - C
difference in posttest means:
g_post = (Ȳ_T - Ȳ_C) / S_Ypooled
We also can compute T-C differences in other
metrics and for other outcomes.
20
Common study outcomes for trt/ctrl
meta-analysis: d family
We may also compute standardized T - C
differences in:
gain or difference score means for D = Y – X:
g_diff = (D̄_T - D̄_C) / S_Dpooled   (standardized by the difference SD)
or
g_diff = (D̄_T - D̄_C) / S_Ypooled   (standardized by the posttest SD)
covariate adjusted means:
g_resid = [(Ȳ_T - b X̄_T) - (Ȳ_C - b X̄_C)] / S_residual
21
Common study outcomes for trt/ctrl
meta-analysis: Categorical outcomes
differences between proportions: P_T - P_C
odds ratios for proportions:
[P_T / (1 - P_T)] / [P_C / (1 - P_C)]
log odds ratios:
log{ [P_T / (1 - P_T)] / [P_C / (1 - P_C)] }
differences between arcsine-transformed
proportions:
2 sin⁻¹(√P_T) - 2 sin⁻¹(√P_C)
22
Less common study outcomes for trt/ctrl
meta-analysis
differences between transformed variances:
2 log(S_T) - 2 log(S_C) or 2 log(S_T/S_C)
probability values from various tests of Trt/Ctrl
differences, such as the t test (shown), ANOVA F
test, etc.:
p_i = ∫ from t_obs to ∞ of f(t) dt
23
Other common study outcomes for
meta-analysis: d family
Single group studies
standardized posttest - pretest mean difference:
g_change = (Ȳ_T - X̄_T) / S_Y  or  (Ȳ_T - X̄_T) / S_X
covariate adjusted means:
g_residchange = (Ȳ_T - b X̄_T) / S_residual
proportions (e.g., post-trt counts for outcome A):
P_T = n_A / n
arcsine proportions: 2 sin⁻¹(√P_T)
24
Other common study outcomes for
meta-analysis
odds ratios for single proportions:
P_T / (1 - P_T)  or  log[ P_T / (1 - P_T) ]
correlations: r
correlation matrices: r_1, ..., r_{p(p-1)/2}
variance ratios: S_post/S_pre or 2 log(S_post/S_pre)
“variance accounted for” measures: R², Eta², etc.
25
Common study outcomes for meta-analysis
We next treat each of the three families of effect
sizes in turn :
Effect Sizes from the d Family
Effect Sizes from the r Family
Effect Sizes for Categorical Data
26
More Detail on Effect Sizes:
The d Family
27
Standardized mean difference
The standardized mean difference may be
appropriate when
Studies use different (continuous) outcome
measures
Study designs compare the mean outcomes in
treatment and control groups
Analyses use ANOVA, t tests, and sometimes
chi-squares (if the underlying outcome can be
viewed as continuous)
28
Standardized mean difference: Definition
Group                  Means        SD         Effect size
Population
  Treatment, Control   μ_T, μ_C     σ          δ = (μ_T - μ_C) / σ
Sample
  Treatment, Control   Ȳ_T, Ȳ_C     S_p        d = c(m) (Ȳ_T - Ȳ_C) / S_Ypooled
29
Computing standardized mean difference
The first steps in computing d effect sizes
involve assessing what data are available and
what’s missing. You will look for:
Sample size and unit information
Means and SDs or SEs for treatment and control
groups
ANOVA tables
F or t tests in text, or
Tables of counts
30
Sample sizes
Regardless of exactly what you compute you will
need to get sample sizes (to correct for bias and
compute variances).
Sample sizes can vary within studies so check
initial reports of n against
n for each test or outcome or
df associated with each test
31
Calculating effect-size estimates from
research reports
A major issue is often computing the within-group
standard deviation Spooled.
The standard deviation determines the “metric” for
standardized mean differences.
Different test statistics (e.g., t vs. multi-way
ANOVA F ) use different SD metrics.
In general it is best to try to compute or convert to
the metric of within-group (i.e., Treatment and
Control) standard deviations.
32
Calculating effect sizes from means and SDs
Glass’s or Cohen’s effect size is defined as

g = (Ȳ_T - Ȳ_C) / S_Ypooled,

and

S_Ypooled = √{ [ (n_T - 1) S_T² + (n_C - 1) S_C² ] / [ (n_T - 1) + (n_C - 1) ] },
where nT and nC are group sample sizes, and
ST2 and SC2 are group variances. Also recall that
d = g *[1 - 3/(4m-1)],
where m = nT + nC – 2.
33
Variance of the standardized mean difference
Most notable for statistical work in meta-analysis
is the fact that each of the study indices has a
“known” variance. These variances are often
conditional on the parameter values.
For d the variance is

Var(d) = v = (n_T + n_C)/(n_T n_C) + δ²/[2(n_T + n_C)].

The variance is computed by substituting d for δ.
34
Confidence interval for the standardized
mean difference
The 95% confidence interval for d is

d ± 1.96 SE_d,  where SE_d = √Var(d).
35
Calculating effect sizes from means and SDs
Example (equal n):

             Ȳ       S (S²)       n
Treatment    980     50 (2500)    30
Control      1020    60 (3600)    30

Pooled standard deviation:
S_Ypooled = √[(2500 + 3600)/2] = √3050 = 55.23

Effect size:
g = (Ȳ_T - Ȳ_C) / S_Ypooled = (980 - 1020)/55.23 = -40/55.23 = -0.72
36
Calculating effect sizes from means and SDs
Data:
             Ȳ       S (S²)       n
Treatment    980     50 (2500)    30
Control      1020    60 (3600)    30

Unbiased effect size:
g = -0.72,  d = -0.72 × [1 - 3/(4×58 - 1)] = -0.72 × 0.987 = -0.71

SE(d) = √[ (n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C)) ]
      = √[ (30 + 30)/(30 × 30) + (-0.71)²/(2 × 60) ] = 0.266

95% CI:
-0.71 ± 1.96 × (.27) = -0.71 ± 0.53
or -1.24 to -0.18
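The whole calculation on this slide can be reproduced with a short Python sketch (the helper name `d_from_means` is ours, assuming the formulas on the preceding slides):

```python
import math

def d_from_means(m_t, s_t, n_t, m_c, s_c, n_c):
    """Bias-corrected standardized mean difference d, its SE, and 95% CI,
    following the pooled-SD, c(m), and Var(d) formulas above (a sketch)."""
    s_pooled = math.sqrt(((n_t - 1) * s_t**2 + (n_c - 1) * s_c**2)
                         / ((n_t - 1) + (n_c - 1)))
    g = (m_t - m_c) / s_pooled
    m = n_t + n_c - 2
    d = g * (1 - 3 / (4 * m - 1))            # small-sample bias correction
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, se, (d - 1.96 * se, d + 1.96 * se)

d, se, ci = d_from_means(980, 50, 30, 1020, 60, 30)
# d ≈ -0.71, se ≈ 0.266, CI roughly (-1.24, -0.19), matching the slide up to rounding
```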
37
Calculating effect sizes: Practice
Compute the values of d, the SEs, and the 95%
CIs for these two studies:
Study 1      Treatment   Control
Mean         12          15
SD           4           6
n            12          12

Study 2      Treatment   Control
Mean         6.5         5
SD           4           4
n            60          60
Answers are at the end of the section.
38
Calculating effect sizes from the independent
groups F test
If the study’s design is a two-group (treatment-
control) comparison and the ANOVA F statistic
is reported, then

g = √[ F (n_T + n_C) / (n_T n_C) ].
You must determine the sign from other
information in the study.
39
Calculating effect sizes from the independent
groups t test
When the study makes a two-group (treatment-
control) comparison and the t statistic is reported,
we can also compute g easily. Then

g = t √[ (n_T + n_C) / (n_T n_C) ].
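Both conversions (from t here, and from F on the previous slide) can be sketched as follows (helper names are ours; the sign of the F-based g must be supplied from other information in the study):

```python
import math

def g_from_t(t, n_t, n_c):
    """g from an independent-groups t: g = t * sqrt((nT + nC)/(nT * nC))."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))

def g_from_f(f, n_t, n_c, sign=1):
    """g from a two-group ANOVA F; F carries no direction, so the
    sign must be determined from the study report."""
    return sign * math.sqrt(f * (n_t + n_c) / (n_t * n_c))
```

Since F = t² for a two-group comparison, the two routes agree in magnitude.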
40
Calculating effect sizes from the two-way
ANOVA
Exactly how we compute d for the two-way
ANOVA depends on the information reported in
the study.
We consider two cases:
the full ANOVA table is reported, and
the cell means and SDs are reported.
41
Calculating effect sizes from the two-way
ANOVA table
Suppose A is the treatment factor and B is the
other factor in this design. We pool the B and AB
factors with within-cell variation to get

S²_pooled = (SS_B + SS_AB + SS_W) / (df_B + df_AB + df_W) = MS_Within,

where MS_Within is the MSW for the one-way
design with A as the only factor. Then g is
computed as

g = √[ (n_T + n_C) MS_A / (n_T n_C S²_pooled) ]
42
Calculating effect sizes from the two-way
ANOVA cell means and SDs
Suppose we have J subgroups within the
treatment and control groups, with means Ȳ_ij and
sample sizes n_ij (i = 1 is the treatment group and
i = 2 is the control group). We first compute the
treatment and control group means:

Ȳ_T = [ Σ_{j=1..J} n_1j Ȳ_1j ] / [ Σ_{j=1..J} n_1j ],
Ȳ_C = [ Σ_{j=1..J} n_2j Ȳ_2j ] / [ Σ_{j=1..J} n_2j ]
43
Calculating effect sizes from the two-way
ANOVA cell means and SDs
Then compute the standard deviation S_pooled via

S²_pooled = [ SS_B + Σ_{i=1..2} Σ_{j=1..J} (n_ij - 1) S_ij² ] / (n_T + n_C - 2),

where SS_B is the between-cells sum of squares
within the treatment and control groups:

SS_B = Σ_{j=1..J} n_1j (Ȳ_1j - Ȳ_T)² + Σ_{j=1..J} n_2j (Ȳ_2j - Ȳ_C)²

Then calculate the effect size as

g = (Ȳ_T - Ȳ_C) / S_pooled.
44
Calculating effect sizes from the two-way
ANOVA: Variants
There are, of course, variants of these two
methods.
For example, you might have the MSW, but not
the within-cell standard deviations (the S_ij).
Then you could use df_W × MS_W in place of the
sum of weighted S_ij² values in the last term of the
numerator in the expression for S²_pooled on the
previous slide.
45
Calculating effect sizes from the one-way ANCOVA
Suppose a study uses a one-way ANCOVA with
a factor that is a treatment-control comparison.
Can we use the ANCOVA F statistic to compute
the effect size? NO! Or rather, if we do we will
not get a comparable effect-size measure.
The error term used in the ANCOVA F test is not
the same as the unadjusted within (treatment or
control) group variance, and is usually smaller
than the one-way MSW.
46
Calculating effect sizes from the one-way ANCOVA
The F statistic is F = MSB/MSW, but
MSW is the covariate adjusted squared SD
within the treatment and control groups, and
MSB is the covariate adjusted mean difference
between treatment and control groups.
To get the SD needed for a comparable effect
size, we must reconstruct the unadjusted SD
within treatment and control groups.
47
Calculating effect sizes from the one-way ANCOVA
The unadjusted SD is

S_pooled = S_Adjusted / √(1 - r²),

where r is the covariate-outcome correlation, so

g = √[ (n_T + n_C)(1 - r²) F / (n_T n_C) ]
48
Calculating effect sizes from the one-way ANCOVA
The procedure is equivalent to:
computing g using the ANCOVA F, as if it
were from a one-way ANOVA (we will call this
g_Uncorrected), then
“correcting” the g for covariate adjustment
via

g_Corrected = g_Uncorrected √(1 - r²)
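A sketch of this correction in Python, assuming the formulas above (the helper names are ours, not from the original materials):

```python
import math

def g_from_ancova_f(f, n_t, n_c, r, sign=1):
    """g from a one-way ANCOVA F, corrected back to the unadjusted-SD
    metric: g = sign * sqrt(F * (1 - r**2) * (nT + nC) / (nT * nC)),
    where r is the covariate-outcome correlation."""
    return sign * math.sqrt(f * (1 - r**2) * (n_t + n_c) / (n_t * n_c))

def correct_for_covariate(g_uncorrected, r):
    """Equivalent two-step route: treat F as one-way ANOVA, then shrink
    by sqrt(1 - r**2)."""
    return g_uncorrected * math.sqrt(1 - r**2)
```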
49
Calculating effect sizes from the one-way ANCOVA
The effect size given previously uses the
adjusted means in the numerator.
However, the reviewer needs to decide whether
unadjusted or covariate adjusted mean
differences are desired. In randomized
experiments, they will not differ much.
Unadjusted means may not be given in the
research report, leading to a practical decision to
calculate effects based on adjusted means.
50
Calculating effect sizes from the two-way ANCOVA

Calculating effect sizes from two-way ANCOVA
designs poses a combination of the problems in
two-way ANOVA designs and one-way ANCOVA
designs.
The procedure to compute g has two steps:
Compute the d statistic as for the two-way ANOVA.
Correct the d value for covariate adjustment via

g_Adjusted = g_Unadjusted √(1 - r²)
51
Calculating effect sizes from tests on gain
scores
Suppose a t statistic is given for a test of the
difference between gains of the T and C groups.
Can we use this t statistic to get g? NO! Or
rather, as before, this will give a g that is not
totally comparable to the standard t-test g.
The standard deviation in the t statistic for gains
is the SD of gains, not posttest scores. To
compute a comparable effect size, we have to
reconstruct the SD of the posttest scores.
52
Calculating effect sizes from tests on gain
scores
The SD of the posttest scores is

S_pooled = S_Gain / √[2(1 - r)],

where r is the pretest-posttest correlation, thus

g = t_Gain √[ 2(1 - r)(n_T + n_C) / (n_T n_C) ]
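A sketch of this conversion, assuming the pretest-posttest correlation r is known (the helper name `g_from_gain_t` is ours):

```python
import math

def g_from_gain_t(t_gain, n_t, n_c, r):
    """g in the posttest-SD metric from a t test on gain scores:
    g = t_gain * sqrt(2 * (1 - r) * (nT + nC) / (nT * nC)),
    where r is the pretest-posttest correlation."""
    return t_gain * math.sqrt(2 * (1 - r) * (n_t + n_c) / (n_t * n_c))
```

Note that when r = .5, 2(1 - r) = 1 and the result coincides with the ordinary t-based g.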
53
Calculating effect sizes from tests on gain
scores
The effect size given previously also uses the
difference between mean gains in the numerator.
Thus, the reviewer needs to decide whether
differences in mean posttest scores or mean
gains are desired. In randomized experiments, the
two types of mean will not usually differ much
from each other.
Post-test means may not be given in the research
report, leading to a practical decision to calculate
effects based on differences in mean gains.
54
Auxiliary data for effect-size calculation
Our examples of the calculation of effect sizes
from designs using ANCOVA and gain scores
illustrate the fact that sometimes auxiliary
information (such as the value of r) is needed to
compute effect sizes.
This information may be missing in many studies,
or even may be missing from all studies.
55
Auxiliary data for effect-size calculation
That poses a choice for the reviewer:
omit studies with missing r values, or
impute r values in some way.
The reviewer's decision on imputation must be
made explicit in the methods section of the
meta-analysis report.
56
Calculating effect sizes: Answers to practice
exercise
Compute the values of d, the SEs, and the 95%
CIs for these two studies:
Study 1      Treatment   Control
Mean         12          15
SD           4           6
n            12          12

Study 2      Treatment   Control
Mean         6.5         5
SD           4           4
n            60          60
57
Calculating effect sizes: Answers to practice
exercise
Study 1      Treatment   Control
Mean         12          15
SD           4           6
n            12          12

For study 1 the values of S_pooled and g are

S_pooled = √{ [ (n_T - 1) S_T² + (n_C - 1) S_C² ] / [ (n_T - 1) + (n_C - 1) ] }
         = √{ [ (11)16 + (11)36 ] / [ (11) + (11) ] } = √26 = 5.1

g = (Ȳ_T - Ȳ_C) / S_pooled = (12 - 15)/5.1 = -3/5.1 = -0.588 or -0.59
58
Calculating effect sizes: Answers to practice
exercise
Study 1      Treatment   Control
Mean         12          15
SD           4           6
n            12          12

For study 1, d is -0.59 × [1 - 3/(4×22 - 1)] = -0.59 × 0.966,
or d = -0.57.

The values of SE_d and the 95% CI are

SE_d = √[ (n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C)) ]
     = √[ 24/(12 × 12) + (-0.57)²/(2 × 24) ] = 0.416

95% CI: d ± 1.96 SE_d
-0.57 ± 1.96 × (0.42), or -1.39 to 0.25
59
Calculating effect sizes: Answers to practice
exercise
Study 2      Treatment   Control
Mean         6.5         5
SD           4           4
n            60          60

For study 2 the values of S_pooled and g are

S_pooled = √{ [ (n_T - 1) S_T² + (n_C - 1) S_C² ] / [ (n_T - 1) + (n_C - 1) ] }
         = √{ [ (59)16 + (59)16 ] / [ (59) + (59) ] } = 4

g = (Ȳ_T - Ȳ_C) / S_pooled = (6.5 - 5)/4 = 1.5/4 = 0.375 or 0.38
60
Calculating effect sizes: Answers to practice
exercise

Study 2      Treatment   Control
Mean         6.5         5
SD           4           4
n            60          60

For study 2, d is
0.38 × [1 - 3/(4×118 - 1)] = 0.38 × 0.994 = 0.38.

The values of SE_d and the 95% CI are

SE_d = √[ (n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C)) ]
     = √[ 120/(60 × 60) + (0.38)²/(2 × 120) ] = √(.0333 + .0006) = 0.18

95% CI: d ± 1.96 SE_d
0.38 ± 1.96 × (0.18), or 0.02 to 0.74
61
Calculating effect sizes: Answers to practice
exercise
95% CI for study 1:
d ± 1.96 SE_d = -0.57 ± 1.96 × (0.42), or -1.39 to 0.25
95% CI for study 2:
d ± 1.96 SE_d = 0.38 ± 1.96 × (0.18), or 0.02 to 0.74
Even though the effect size for study 2 is smaller
in absolute value than that for study 1, its SE is
smaller and thus the 95% CI does not include 0.
62
More Detail on Effect Sizes:
The r Family
63
The r family
The correlation coefficient or r family effects
may be appropriate when …
studies have a continuous outcome measure,
study designs assess the relation between a
quantitative predictor and the outcome (possibly
controlling for covariates), or
the analysis uses regression (or the general linear
model).
64
Cohen’s benchmarks
Jacob Cohen (1988) proposed general
definitions for anticipating the size of effect-size
estimates:

          d     r
Small    .20   .10
Medium   .50   .30
Large    .80   .50
65
More on Cohen
Cohen intended these to be “rules of thumb”,
and emphasized that they represent average
effects from across the social sciences.
He cautioned that in some areas, smaller effects
may be more typical, due to
measurement error or
the relative weakness of interventions.
Each reviewer will need to make judgments
about what is “typical” based on his or her
expertise.
66
The r family
The most commonly used effect size in this
family is the correlation coefficient r.
This also equals the standardized regression
coefficient when there is only one predictor in a
regression equation.
Sample: r      Population: ρ
67
The r family
When computing a correlation coefficient, scores
are actually being standardized, which makes r
itself standardized.
Recall that the z score is z = (Y - Ȳ) / S_Y.
To compute r we have

r = Σ z_x z_y / n,

where n is the number of X-Y pairs.
68
Statistical properties (Variances)
The variance of the correlation depends on the
sample size and parameter value. We estimate
the variance by using each study's correlation to
estimate its parameter ρ. So for study i, we have
v_i = Var(r_i) = (1 - r_i²)²/(n_i - 1),
or we can use a consistent estimator of ρ (e.g., an
average r̄)
v_i = Var(r_i) = (1 - r̄²)²/(n_i - 1).
69
Statistical properties (Transformation)
Sometimes we transform effect sizes because
that simplifies statistical analyses or makes our
assumptions more justifiable.
A common transformation of correlations is the
Fisher z-transform of the correlation:

z = .5 ln[ (1 + r) / (1 - r) ].
70
Statistical properties (Transformation)
Consider a z-transformed correlation coefficient
from a sample of size n. The z transform is a
variance stabilizing transformation, which
means the variance of z does not depend on r,
as did the variance of r.
The variance of z is

Var(z_i) = v_i = 1 / (n_i - 3).
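The z transform, its variance, and a back-transformed confidence interval can be sketched as follows (a minimal sketch; the helper names are ours):

```python
import math

def fisher_z(r):
    """Fisher z-transform: z = 0.5 * ln((1 + r)/(1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Back-transform from the z metric to the r metric."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def ci_for_r(r, n):
    """95% CI for r built in the z metric, where Var(z) = 1/(n - 3),
    then transformed back to the r metric."""
    z = fisher_z(r)
    se = math.sqrt(1 / (n - 3))
    return z_to_r(z - 1.96 * se), z_to_r(z + 1.96 * se)

# e.g. r = .60, n = 50 gives roughly (.39, .75) in the r metric
lo, hi = ci_for_r(0.60, 50)
```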
71
Correlation example
Example:
Study 1: r = .60, z = .693, n = 50 pairs

Effect size: r = .60, z_r = .693

Variance of r:
v = (1 - r²)²/(n - 1), so SE_r = (1 - r²)/√(n - 1) = (1 - .6²)/√49 = .64/7 = .091

95% CI:
.60 ± 1.96 × (.091) in the r metric,
or .421 to .779 in the r metric
72
Correlation example
Example:
Study 1: r = .60, z = .693, n = 50 pairs

Effect size: r = .60, z_r = .693

SE of z_r:
SE_zr = √[1/(n - 3)] = √(1/47) = .146

95% CI:
0.693 ± 1.96 × (0.146) in the z metric
0.406 to 0.98 in the z metric, and
.385 to .753 in the r metric
73
Correlation example
Example:
Study 1: r = .60, z = .693, n = 50 pairs

Effect size: r = .60, z_r = .693

We must transform the CI for z back to the r metric to
get the 95% CI for r:
0.693 ± 1.96 × (0.146) in the z metric
0.406 to 0.98 in the z metric, and
.385 to .753 in the r metric
74
More Detail on Effect Sizes:
Categorical Data
75
Effect sizes for categorical data
The effect sizes in the d and r families are
mainly for studies with continuous outcomes.
Three popular effect sizes for categorical data
will be introduced here:
the odds ratio,
the risk ratio (or rate ratio), and
the risk difference (the difference between two
probabilities).
76
Effect sizes for categorical data
Consider a study in which a treatment group (T)
and a control group (C) are compared with
respect to the frequency of a binary
characteristic among the participants.
In each group we will count how many
participants have the binary outcome of interest.
We will refer to having the binary outcome as
“being in the focal group” (e.g., passing a test,
being cured of a disease, etc.).
77
Effect sizes for categorical data
Let πT and πC denote the population probabilities
of being in the focal group within each of the two
groups T and C; PT and PC denote the sample
probabilities.
                 Population                                Sample
Risk Difference  Δ = π_T - π_C                             RD = P_T - P_C
Risk Ratio       θ_RR = π_T / π_C                          RR = P_T / P_C
Odds Ratio       θ_OR = [π_T(1 - π_C)] / [π_C(1 - π_T)]    OR = [P_T(1 - P_C)] / [P_C(1 - P_T)]
78
Odds ratio
The frequencies (n11, n12, n21, and n22) for the
binary outcomes are counted for both treatment
and control groups.
             Yes (Focal group)   No
Treatment    n11                 n21
Control      n12                 n22
The odds ratio (OR) is the most widely used
effect-size measure for dichotomous outcomes.
79
Odds ratio
The OR is calculated as

OR = (n11 / n21) / (n12 / n22) = n11 n22 / (n12 n21)

             Yes    No
Treatment    n11    n21
Control      n12    n22

An OR = 1 represents no effect, or no difference
between treatment and control in the odds of
being in the focal group.
The lower bound of the OR is 0 (the Control outcome
is better than the Treatment outcome).
Its upper bound is infinity (Treatment > Control).
80
Log Odds ratio
The range of values of the OR is inconvenient
for drawing inferences, and its distribution can
be quite non-normal. The logarithm of OR (LOR)
is more nearly normal, and is calculated as
LOR = ln(OR)

LOR makes interpretation more intuitive. It is
similar in some respects to d.
• A value of 0 represents no T vs. C difference
or no treatment effect.
• The LOR ranges from -∞ to ∞.
81
Log Odds ratio
The standard error of the LOR (SE_LOR) is
calculated as

SE_LOR = √( 1/n11 + 1/n12 + 1/n21 + 1/n22 )

An approximate 95% confidence interval for each
LOR can be calculated as

95% CI = LOR ± 1.96 × SE_LOR
82
Example: Pearson’s hospital staff typhoid
incidence data
In 1904, Karl Pearson reviewed evidence on the
effects of a vaccine against typhoid (also called
enteric fever).
Pearson’s review included 11 studies of mortality
and immunity to typhoid among British soldiers.
The treatment was an inoculation against
typhoid and the “cases” – those who became
infected – were the focal group.
83
Example: Pearson’s hospital staff typhoid
incidence data
The focal group is the group of diseased staff
members:

                 Immune      Diseased    Total
Inoculated       n11 = 265   n21 = 32    297
Not Inoculated   n12 = 204   n22 = 75    279
Total            469         107         576

The OR is calculated as

OR = (n11 / n21) / (n12 / n22) = n11 n22 / (n12 n21) = (265 × 75) / (204 × 32) = 3.04

The odds of becoming diseased are 3 times
greater for staff members who are not inoculated.
85
Example: Pearson’s hospital staff typhoid
incidence data
OR can also be computed from cell proportions:

               Inoculated           Not Inoculated       Total counts
Immune         P11 = 265/576 = .46  P12 = 204/576 = .35  469
Diseased       P21 = 32/576 = .06   P22 = 75/576 = .13   107
Total counts   297                  279                  576

OR = P11 P22 / (P12 P21) = (.46 × .13) / (.35 × .06) = 3.04
86
Example: Pearson’s hospital staff typhoid
incidence data
The LOR in this example is
LOR = ln(3.04) = 1.11
The SE of the LOR in this example is

SE_LOR = √( 1/265 + 1/204 + 1/32 + 1/75 ) = 0.23
87
Example: Pearson’s hospital staff typhoid
incidence data
The approximate 95% confidence interval for
LOR in this example is
95% CI = LOR ± 1.96 × SE_LOR = 1.11 ± 1.96 × 0.23
Upper bound LOR: 1.56
Lower bound LOR: 0.66
The CI does not include zero, suggesting a
significant positive treatment effect for
inoculations.
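Pearson's 2×2 table can be run through a short sketch of these formulas (the helper name `log_odds_ratio` is ours, not from the original materials):

```python
import math

def log_odds_ratio(n11, n12, n21, n22):
    """LOR = ln(n11*n22 / (n12*n21)) and its
    SE = sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)."""
    lor = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
    return lor, se

# Pearson's data: inoculated 265 immune / 32 diseased; not inoculated 204 / 75
lor, se = log_odds_ratio(265, 204, 32, 75)
ci = (lor - 1.96 * se, lor + 1.96 * se)
# lor ≈ 1.11, se ≈ 0.23; exponentiating the bounds returns to the OR metric
```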
88
Example: Pearson’s hospital staff typhoid
incidence data
The LORs can be transformed back to the OR
metric via the exponential function:
Upper bound OR = exp(1.56) = 4.76
Lower bound OR = exp(0.66) = 1.93
Again here we can see the CI for the OR does
not include 1, indicating a significant treatment
effect.
89
Converting OR or LOR to d
If most of the effect sizes are in the d metric and
just a few expressed as OR or LOR, the OR or
LOR can be converted to d so all the effect sizes
can be pooled.
The transformation developed by Cox (as cited in
Sánchez-Meca, Marín-Martínez, & Chacón-Moscoso,
2003) works well. It is computed as

d_Cox = LOR / 1.65
90
Relative risk
Relative risk (RR) is also used for dichotomous
outcomes.
           Treatment   Control
Immune     P11         P12
Diseased   P21         P22
Total      P•1         P•2

RR = (P11 / P•1) / (P12 / P•2),

where P•1 and P•2 are the marginal proportions for
the first column (treatment) and the second
column (control), respectively.
91
Relative risk
The relative risk ranges from 0 to infinity.
A relative risk (RR) of 1 indicates that there is no
difference in risk between the two groups.
A relative risk (RR) larger than one indicates that
the treatment group has higher risk (of being in the
focal group) than the control.
A relative risk (RR) less than one indicates that the
control group has higher risk than the treatment
group.
92
Log Relative risk
As was true for the LOR, the logarithm of the RR
(LRR) has better statistical properties. It is
calculated as
LRR = ln(RR)

The range of the LRR is from -∞ to ∞, and as for
the LOR, a value of 0 indicates no treatment
effect.
93
Example: Pearson’s hospital staff typhoid
incidence data
                 Immune               Diseased            Total
Inoculated       P11 = 265/576 = .46  P21 = 32/576 = .06  P•1 = 297/576 = .52
Not Inoculated   P12 = 204/576 = .35  P22 = 75/576 = .13  P•2 = 279/576 = .48
Total            469                  107                 576

RR = (.46/.52) / (.35/.48) = .89/.73 = 1.22
LRR = ln(1.22) = .199

The probability of being immune if inoculated is 1.22
times higher than the probability if not inoculated.
94
Log Relative risk
The standard error of the LRR is calculated as

SE_LRR = √[ n21/(n•1 n11) + n22/(n•2 n12) ],

based on the tabled counts:

           Yes    No     Total
Treatment  n11    n21    n•1
Control    n12    n22    n•2
Total      n1•    n2•
95
Example: Pearson’s hospital staff typhoid
incidence data
The standard error of the LRR for our example is

SE_LRR = √[ n21/(n•1 n11) + n22/(n•2 n12) ]
       = √[ 32/(297 × 265) + 75/(279 × 204) ] = 0.04
96
Example: Pearson’s hospital staff typhoid
incidence data
The 95% CI is
LRR ± 1.96 × SE_LRR = 0.20 ± 1.96 × 0.04
Upper bound LRR: 0.28
Lower bound LRR: 0.12
These LRRs can be transformed back to the
RR metric via the exponential function:
Upper bound RR = exp(0.28) = 1.32
Lower bound RR = exp(0.12) = 1.12
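A sketch of the LRR computation for Pearson's table (the helper name `log_relative_risk` is ours):

```python
import math

def log_relative_risk(n11, n12, n21, n22):
    """RR = (n11/n.1) / (n12/n.2) with column totals n.1 = n11 + n21
    and n.2 = n12 + n22; returns ln(RR) and its SE,
    SE = sqrt(n21/(n.1*n11) + n22/(n.2*n12))."""
    col_t, col_c = n11 + n21, n12 + n22
    lrr = math.log((n11 / col_t) / (n12 / col_c))
    se = math.sqrt(n21 / (col_t * n11) + n22 / (col_c * n12))
    return lrr, se

lrr, se = log_relative_risk(265, 204, 32, 75)
# lrr ≈ 0.20, se ≈ 0.04; exponentiating the CI bounds returns to the RR metric
```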
97
Converting RR to OR
The RR can also be converted to the OR via

OR = RR × (1 - P12/P•2) / (1 - P11/P•1)

In Pearson's example, it is

OR = 1.22 × (1 - .35/.48) / (1 - .46/.52) = 3.04
98
Risk difference (difference between two
proportions)
The risk difference (RD) between proportions is
often considered the most intuitive effect size for
categorical data.
RD = n11/n•1 - n12/n•2

The standard error of RD (SE_RD) is

SE_RD = √( n11 n21 / n•1³ + n12 n22 / n•2³ )
99
Example: Pearson’s hospital staff typhoid
incidence data
For Pearson’s data, 89.2% of those inoculated
were immune (265/297) and 73.1% of those not
inoculated were immune (204/279).
The difference in immunity rates for those
inoculated or not is
RD = .89 - .73 = .16
The standard error of the difference is

SE_RD = √( n11 n21/n•1³ + n12 n22/n•2³ )
      = √( 265 × 32/297³ + 204 × 75/279³ ) = 0.032
100
Example: Pearson’s hospital staff typhoid
incidence data
In general a 95% CI for RD is
RD 1.96 * SERD .16 1.96 * 0.032
Upper bound RD: .22
Lower bound RD: .10
Because the value 0 (meaning no difference in
risk) is not included in the CI, we again conclude
there is a treatment effect.
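A sketch of the RD computation (the helper name `risk_difference` is ours):

```python
import math

def risk_difference(n11, n12, n21, n22):
    """RD = n11/n.1 - n12/n.2 with column totals n.1 = n11 + n21 and
    n.2 = n12 + n22; SE = sqrt(n11*n21/n.1**3 + n12*n22/n.2**3)."""
    col_t, col_c = n11 + n21, n12 + n22
    rd = n11 / col_t - n12 / col_c
    se = math.sqrt(n11 * n21 / col_t**3 + n12 * n22 / col_c**3)
    return rd, se

rd, se = risk_difference(265, 204, 32, 75)
# rd ≈ 0.16, se ≈ 0.032, matching the slide
```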
101
Connections Between the
Effect-Size Metrics
102
Conversions among effect-size metrics
The effects d, r, and the odds ratio (OR) can all
be converted from one metric to another.
Sometimes it is convenient to convert effects for
comparison purposes.
A second reason may be that just a few studies
present results that require computation of a
particular effect size. For example, if most
studies present results as means and SDs (and
thus allow d to be calculated), but one reports
the correlation of treatment with the outcome,
one might want to convert the single r to a d.
103
Conversions between d and LOR
Converting d to the log odds ratio:

ln(OR) = π d / √3,   SE²_ln(OR) = π² (SE_d)² / 3

Converting the log odds ratio to d:

d = √3 ln(OR) / π,   SE²_d = 3 SE²_ln(OR) / π²
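These conversions can be sketched as follows (helper names ours; the factor π/√3 arises because the logistic distribution has variance π²/3):

```python
import math

def d_to_lor(d):
    """ln(OR) = pi * d / sqrt(3)."""
    return math.pi * d / math.sqrt(3)

def lor_to_d(lor):
    """d = sqrt(3) * ln(OR) / pi."""
    return math.sqrt(3) * lor / math.pi
```

The two functions are exact inverses of each other.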
104
Conversions of r and d
To convert r to d, we first compute the SE of the
correlation using

SE_r = (1 - r²) SE_FisherZ,   with SE_FisherZ = √[1/(n - 3)].

Then

d = 2r / √(1 - r²),

and

SE²_d = 4 SE²_r / (1 - r²)³.
105
Conversions of r and d
Converting d to r:

r = d / √(d² + A),  where  A = (n1 + n2)² / (n1 n2).

A is a “correction factor” for cases where the
groups are not the same size.
If the specific group sizes are not available,
assume they are equal and use n1 = n2 = n,
which yields A = 4.
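A sketch of both conversions (the helper names are ours):

```python
import math

def r_to_d(r):
    """d = 2r / sqrt(1 - r**2)."""
    return 2 * r / math.sqrt(1 - r**2)

def d_to_r(d, n1=None, n2=None):
    """r = d / sqrt(d**2 + A), with A = (n1 + n2)**2 / (n1 * n2);
    if group sizes are unknown, assume equal sizes, giving A = 4."""
    a = 4.0 if n1 is None or n2 is None else (n1 + n2)**2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)
```

With equal group sizes (A = 4) the two conversions are inverses.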
106
References
Cohen, J. (1988). Statistical power analysis for the
behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum.
Cox, D. R. (1970). Analysis of binary data. New
York: Chapman & Hall/CRC.
Pearson, K. (1904). Report on certain enteric
fever inoculation statistics. British Medical
Journal, 3, 1243-1246.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-
Moscoso, S. (2003). Effect-size indices for
dichotomized outcomes in meta-analysis.
Psychological Methods, 8(4), 448-467.
107
C2 Training Materials Team
Thanks are due to the following institutions and
individuals
Funder: Norwegian Knowledge Centre for the
Health Sciences
Materials contributors: Betsy Becker, Harris
Cooper, Larry Hedges, Mark Lipsey, Therese
Pigott, Hannah Rothstein, Will Shadish, Jeff
Valentine, David Wilson
Materials developers: Ariel Aloe, Betsy Becker,
Sunny Kim, Jeff Valentine, Meng-Jia Wu, Ying
Zhang
Training co-coordinators: Betsy Becker and
Therese Pigott
108