Campbell Collaboration Training Materials

Effect Sizes
Overview

Overview of Effect Sizes

Effect Sizes from the d Family

Effect Sizes from the r Family

Effect Sizes for Categorical Data

Connections Between the Effect-Size Metrics
2
Effect sizes

Meta-analysis expresses the results of each
study using a quantitative index of effect size
(ES).

ESs are measures of the strength or magnitude
of a relationship of interest.

ESs have the advantage of being comparable
(i.e., they estimate the same thing) across all of
the studies and therefore can be summarized
across studies in the meta-analysis.

ESs are relatively independent of sample size.
3
Effect sizes

An effect size is a quantitative index that
represents the results of a study.

Effect sizes make study results comparable so that
- results can be compared across studies, or
- results can be summarized across studies.

Examples of effect-size indices include
- standardized mean differences (ds), and
- correlation coefficients (rs).
4
Effect sizes

A crucial conceptual distinction is between effect-size
- estimates, computed from studies (sample effect sizes), and
- parameters (population or true effect sizes).

We want to make inferences about effect-size parameters using effect-size estimates.
5
Types of effect size
Most reviews use effect sizes from one of three families of effect sizes:
- the d family, including the standardized mean difference,
- the r family, including the correlation coefficient, and
- the odds ratio (OR) family, including proportions and other measures for categorical data.
6
Types of effect size

Test statistics (e.g., t statistics, F tests, and so on) are not ideal ESs because they depend on:
- effect size, and
- sample size (n).

That is, Test Statistic = f(Effect Size, Sample Size).
7
Types of effect size

The significance level (a.k.a. the p value) is also not an ideal ES because it depends on the test statistic and n.

Studies with the same effect sizes can get different p values, simply because they differ in sample size.

Studies with fundamentally different results can get the same p values, because they differ in sample size.

Thus, the p value is a misleading index of effect size.
8
The choice of effect size

A particular index is chosen to make results from different studies comparable to one another. The choice depends on the
- question of interest for the review,
- designs of studies being reviewed,
- statistical analyses that have been reported, and
- measures of the outcome variable.
9
The choice of effect size

When we have continuous data (means and
standard deviations) for two groups, we typically
compute a raw mean difference or a
standardized difference – an effect size from the
d family,

When we have correlational data, we typically
compute a correlation (from the r family), or

When we have binary data (the patient lived or
died, the student passed or failed), we typically
compute an odds ratio, a risk ratio, or a risk
difference.
10
Features of most effect sizes

We introduce some notation for a common case –
the treatment/control comparison.

Let Ȳ_T be the mean posttest score in the treatment group, Ȳ_C be the mean control-group posttest score, and S_pooled be the pooled within-groups standard deviation of the Y scores (i.e., the t-test SD). Then we may compute standardized T - C differences using posttest means as

g_post = (Ȳ_T - Ȳ_C) / S_pooled
11
Features of most effect sizes

Remember that all statistical estimators are
estimating some parameter. What parameter is
being estimated by gpost?

The answer is the population standardized mean difference, usually denoted by the Greek letter delta, where population means and the population SD σ appear in place of the sample values:

δ_post = (μ_T - μ_C) / σ_pooled
12
Expected values of effect sizes

Some ES indices are biased in small samples.
It is common to correct for this small-sample
bias.

The posttest effect size g_post is biased, with expected value

E[g_post] = δ / c(m),

where c(m) = 1 - 3/(4m - 1) and m = n_T + n_C - 2.

In general m is the df for the appropriate t test; here, the two-sample t test.
13
Expected values of effect sizes

So now we can correct for bias: d = c(m) * g_post.

The expected value of d is δ.

The correlation is also biased, and can be corrected via

r_u = r [1 + (1 - r²)/(2n - 2)].

Proportions are not biased, and do not need
correction.
14
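The small-sample bias correction above can be sketched in Python (a minimal sketch; the function name is ours):

```python
def bias_corrected_d(g, n_t, n_c):
    """Correct a standardized mean difference g for small-sample bias.

    Uses c(m) = 1 - 3/(4m - 1) with m = n_t + n_c - 2 (the t-test df),
    so that d = c(m) * g has expected value approximately delta.
    """
    m = n_t + n_c - 2
    return (1 - 3 / (4 * m - 1)) * g

# With g = -0.72 and n = 30 per group (m = 58), d is about -0.71.
print(round(bias_corrected_d(-0.72, 30, 30), 2))
```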
Variances of effect sizes

Effect-size indices also have variances that can
be estimated using data from the individual
study from which the ES is obtained.

Below we provide the variances of many ES
indices, noting that in all cases the variance is
an inverse function of the study sample size.
Thus smaller studies have larger variances,
representing less precise information about the
effect of interest. The ES variance is a key
component of nearly all statistical analyses
used in meta-analysis.
15
Statistical properties (Variances)

Often the variances of ES indices are also
conditional on (i.e., are functions of) the
parameter values. Consider the variance of d:
n n

Var (d )  v  T C 
n n
2(nT  nC )
T
C
2
which is a function of . Below we introduce
transformations that can be used with some ES
indices to remove the parameter from the
variance (i.e., to stabilize the variances).
16
Variance of the standardized mean difference
nT  n C
2
Var (d )  v  T C 
n n
2(nT  nC )

As d increases (becomes more unusual or
extreme) the variance also increases. We are
more uncertain about extreme effects.

The variance also depends on the sample sizes,
and as the ns increase, the variance decreases.
Large studies provide more precise data; we are
more certain about effects from large studies.
17
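The behavior described on this slide can be sketched in Python (the function name is ours):

```python
def var_d(d, n_t, n_c):
    """Variance of d: (nT + nC)/(nT * nC) + d**2 / (2 * (nT + nC))."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

# The variance grows with |d| and shrinks as the group sizes grow.
print(var_d(0.8, 30, 30) > var_d(0.2, 30, 30))    # more extreme d, larger v
print(var_d(0.5, 100, 100) < var_d(0.5, 30, 30))  # larger study, smaller v
```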
Statistical properties (Variances)

Variances of effect sizes are not typically equal
across studies, even if stabilized. This is because
most variances depend on sample sizes, and it is
rare to have identical-sized samples when we look
at sets of studies.

Thus, homoscedasticity assumptions are almost never met by meta-analytic data!

This is why we do not use “typical” statistical
procedures (like t tests and ANOVA) for most
analyses in meta-analysis.
18
Quick Examples:
Common Study Outcomes
for
Treatment-Control Meta-analyses
19
Common study outcomes for trt/ctrl
meta-analysis

Treatment (T)/control (C) studies:

Above we introduced the standardized T - C difference in posttest means:

g_post = (Ȳ_T - Ȳ_C) / S_pooled

We also can compute T - C differences in other metrics and for other outcomes.
20
Common study outcomes for trt/ctrl
meta-analysis: d family

We may also compute standardized T - C differences in:

- gain or difference score means for D = Y - X:

  g_diff = (D̄_T - D̄_C) / S_D,pooled  (standardized by the difference SD)
  or (D̄_T - D̄_C) / S_Y,pooled  (standardized by the posttest SD)

- covariate adjusted means:

  g_resid = [(Ȳ_T - b X̄_T) - (Ȳ_C - b X̄_C)] / S_residual
21
Common study outcomes for trt/ctrl
meta-analysis: Categorical outcomes
- differences between proportions: P_T - P_C

- odds ratios for proportions:

  OR = [P_T / (1 - P_T)] / [P_C / (1 - P_C)]

- log odds ratios: log{[P_T / (1 - P_T)] / [P_C / (1 - P_C)]}

- differences between arcsine-transformed proportions:

  2 arcsin(√P_T) - 2 arcsin(√P_C)
22
Less common study outcomes for trt/ctrl
meta-analysis
- differences between transformed variances:

  2 log(S_T) - 2 log(S_C) or 2 log(S_T / S_C)

- probability values from various tests of Trt/Ctrl differences, such as the t test (shown), the ANOVA F test, etc.:

  p_i = ∫[t_obs, ∞] f(t) dt
23
Other common study outcomes for
meta-analysis: d family
Single group studies:

- standardized posttest - pretest mean difference:

  g_change = (Ȳ_T - X̄_T) / S_Y or (Ȳ_T - X̄_T) / S_X

- covariate adjusted means:

  g_residchange = (Ȳ_T - b X̄_T) / S_residual

- proportions (e.g., post-treatment counts for outcome A):

  P_T = n_A / n

- arcsine proportions: 2 arcsin(√P_T)
24
Other common study outcomes for
meta-analysis
- odds ratios for single proportions:

  P_T / (1 - P_T) or log[P_T / (1 - P_T)]

- correlations: r

- correlation matrices: r_1, ..., r_{p(p-1)/2}

- variance ratios: S_post/S_pre or 2 log(S_post/S_pre)

- "variance accounted for" measures: R², Eta², etc.
25
Common study outcomes for meta-analysis

We next treat each of the three families of effect sizes in turn:

Effect Sizes from the d Family

Effect Sizes from the r Family

Effect Sizes for Categorical Data
26
More Detail on Effect Sizes:
The d Family
27
Standardized mean difference

The standardized mean difference may be appropriate when
- studies use different (continuous) outcome measures,
- study designs compare the mean outcomes in treatment and control groups, or
- analyses use ANOVA, t tests, and sometimes chi-squares (if the underlying outcome can be viewed as continuous).
28
Standardized mean difference: Definition
             Population                  Sample
Means        μ_T (Trt), μ_C (Ctl)        Ȳ_T (Trt), Ȳ_C (Ctl)
SD           σ                           S_p
Effect size  δ = (μ_T - μ_C) / σ         d = c(m) (Ȳ_T - Ȳ_C) / S_pooled
29
Computing standardized mean difference

The first steps in computing d effect sizes involve assessing what data are available and what's missing. You will look for:
- sample size and unit information,
- means and SDs or SEs for treatment and control groups,
- ANOVA tables,
- F or t tests in text, or
- tables of counts.
30
Sample sizes

Regardless of exactly what you compute, you will need to get sample sizes (to correct for bias and compute variances).

Sample sizes can vary within studies, so check initial reports of n against
- the n for each test or outcome, or
- the df associated with each test.
31
Calculating effect-size estimates from
research reports

A major issue is often computing the within-group
standard deviation Spooled.

The standard deviation determines the “metric” for
standardized mean differences.

Different test statistics (e.g., t vs. multi-way
ANOVA F ) use different SD metrics.

In general it is best to try to compute or convert to
the metric of within-group (i.e., Treatment and
Control) standard deviations.
32
Calculating effect sizes from means and SDs

Glass’s or Cohen’s effect size is defined as

g = (Ȳ_T - Ȳ_C) / S_pooled,

where

S_pooled = √{[(n_T - 1) S²_T + (n_C - 1) S²_C] / [(n_T - 1) + (n_C - 1)]},

n_T and n_C are group sample sizes, and S²_T and S²_C are group variances. Also recall that

d = g * [1 - 3/(4m - 1)], where m = n_T + n_C - 2.
33
Variance of the standardized mean difference

Most notable for statistical work in meta-analysis
is the fact that each of the study indices has a
“known” variance. These variances are often
conditional on the parameter values.

For d the variance is

Var(d) = v = (n_T + n_C)/(n_T n_C) + δ²/(2(n_T + n_C))

The variance is computed by substituting d for δ.
34
Confidence interval for the standardized
mean difference

The 95% confidence interval for d is

d ± 1.96 * SE_d, where SE_d = √Var(d)
35
Calculating effect sizes from means and SDs
Data (equal n):

            Mean    S (S²)      n
Treatment   980     50 (2500)   30
Control     1020    60 (3600)   30

Pooled standard deviation:

S_pooled = √[(2500 + 3600)/2] = √3050 = 55.23

Effect size:

g = (Ȳ_T - Ȳ_C) / S_pooled = (980 - 1020)/55.23 = -40/55.23 = -0.72
36
Calculating effect sizes from means and SDs
Data:

            Mean    S (S²)      n
Treatment   980     50 (2500)   30
Control     1020    60 (3600)   30

Unbiased effect size:

g = -0.72, d = -0.72 * [1 - 3/(4*58 - 1)] = -0.72 * 0.987 = -0.71

SE(d) = √[(n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C))] = √[(30 + 30)/(30*30) + (-0.71)²/(2*60)] = 0.266

95% CI: -0.71 ± 1.96 * (0.27) = -0.71 ± 0.53, or -1.24 to -0.18
37
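The full chain on the preceding slides (pooled SD, g, bias correction d = c(m) g, SE, and 95% CI) can be sketched in Python; the function name is ours:

```python
import math

def smd_with_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Bias-corrected standardized mean difference with SE and 95% CI."""
    # Pooled within-groups SD, as on the slides.
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                         / ((n_t - 1) + (n_c - 1)))
    g = (mean_t - mean_c) / s_pooled
    m = n_t + n_c - 2
    d = (1 - 3 / (4 * m - 1)) * g          # small-sample bias correction
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, se, (d - 1.96 * se, d + 1.96 * se)

# The slide's example: means 980 vs. 1020, SDs 50 and 60, n = 30 per group.
d, se, ci = smd_with_ci(980, 50, 30, 1020, 60, 30)
print(round(d, 2), round(se, 2))  # roughly -0.71 and 0.27
```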
Calculating effect sizes: Practice

Compute the values of d, the SEs, and the 95% CIs for these two studies:

Study 1    Treatment   Control
  Mean     12          15
  SD       4           6
  n        12          12

Study 2    Treatment   Control
  Mean     6.5         5
  SD       4           4
  n        60          60

Answers are at the end of the section.
38
Calculating effect sizes from the independent
groups F test

If the study’s design is a two-group (treatment-control) comparison and the ANOVA F statistic is reported, then

g = ±√[(n_T + n_C) F / (n_T n_C)].

You must determine the sign from other information in the study.
39
Calculating effect sizes from the independent
groups t test

When the study makes a two group (treatmentcontrol) comparison and the t statistic is reported,
we can also compute d easily. Then
n n
g t
.
T C
n n
T
C
40
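The two conversions on these slides can be sketched together (function names are ours); since F = t² for a two-group comparison, both routes agree up to sign:

```python
import math

def g_from_t(t, n_t, n_c):
    """g from an independent-groups t: g = t * sqrt((nT + nC)/(nT * nC))."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))

def g_from_f(f, n_t, n_c, sign=1):
    """g from a two-group ANOVA F; the sign must come from the report."""
    return sign * math.sqrt((n_t + n_c) * f / (n_t * n_c))

# F = t**2, so the two routes give the same magnitude.
print(round(g_from_t(2.0, 30, 30), 3), round(g_from_f(4.0, 30, 30), 3))
```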
Calculating effect sizes from the two-way
ANOVA

Exactly how we compute d for the two-way ANOVA depends on the information reported in the study.

We consider two cases:
- the full ANOVA table is reported, and
- the cell means and SDs are reported.
41
Calculating effect sizes from the two-way
ANOVA table

Suppose A is the treatment factor and B is the other factor in this design. We pool the B and AB factors with within-cell variation to get

S²_pooled = (SS_B + SS_AB + SS_W) / (df_B + df_AB + df_W) = MS_Within,

where MS_Within is the MSW for the one-way design with A as the only factor. Then g is computed as

g = ±√[(n_T + n_C) MS_A / (n_T n_C S²_pooled)]
42
Calculating effect sizes from the two-way
ANOVA cell means and SDs

Suppose we have J subgroups within the treatment and control groups, with means Ȳ_ij and sample sizes n_ij (i = 1 is the treatment group and i = 2 is the control group). We first compute the treatment and control group means:

Ȳ_T = Σ_{j=1}^{J} n_1j Ȳ_1j / Σ_{j=1}^{J} n_1j and Ȳ_C = Σ_{j=1}^{J} n_2j Ȳ_2j / Σ_{j=1}^{J} n_2j
43
Calculating effect sizes from the two-way
ANOVA cell means and SDs
Then compute the standard deviation S_pooled via

S²_pooled = [SS_B + Σ_{i=1}^{2} Σ_{j=1}^{J} (n_ij - 1) S²_ij] / (n_T + n_C - 2),

where SS_B is the between-cells sum of squares within the treatment and control groups:

SS_B = Σ_{j=1}^{J} n_1j (Ȳ_1j - Ȳ_T)² + Σ_{j=1}^{J} n_2j (Ȳ_2j - Ȳ_C)²

Then calculate the effect size as g = (Ȳ_T - Ȳ_C) / S_pooled.
44
Calculating effect sizes from the two-way
ANOVA: Variants

There are, of course, variants of these two
methods.

For example, you might have the MSW, but not the within-cell standard deviations (the S_ij).

Then you could use df_W * MS_W in place of the sum of weighted S²_ij values in the last term of the numerator in the expression for S²_pooled on the previous slide.
45
Calculating effect sizes from the one-way ANCOVA

Suppose a study uses a one-way ANCOVA with
a factor that is a treatment-control comparison.

Can we use the ANCOVA F statistic to compute
the effect size? NO! Or rather, if we do we will
not get a comparable effect-size measure.

The error term used in the ANCOVA F test is not
the same as the unadjusted within (treatment or
control) group variance, and is usually smaller
than the one-way MSW.
46
Calculating effect sizes from the one-way ANCOVA
The F statistic is F = MSB/MSW, but
- MSW is the covariate adjusted squared SD within the treatment and control groups, and
- MSB is the covariate adjusted mean difference between treatment and control groups.
To get the SD needed for a comparable effect
size, we must reconstruct the unadjusted SD
within treatment and control groups.
47
Calculating effect sizes from the one-way ANCOVA

The unadjusted SD is

S_pooled = S_Adjusted / √(1 - r²),

where r is the covariate-outcome correlation, so

g = ±√[(n_T + n_C)(1 - r²) F / (n_T n_C)]
48
Calculating effect sizes from the one-way ANCOVA

The procedure is equivalent to
- computing the g using the ANCOVA F, as if it were from a one-way ANOVA (we will call this g_Uncorrected), and
- then “correcting” the g for covariate adjustment via

g_Corrected = g_Uncorrected √(1 - r²)
49
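The ANCOVA route just described (compute g from F as if it were a one-way ANOVA, then multiply by √(1 − r²)) can be sketched as follows; function names are ours:

```python
import math

def g_from_ancova_f(f, n_t, n_c, r, sign=1):
    """g from a one-way ANCOVA F, rescaled to the unadjusted SD metric.

    r is the covariate-outcome correlation; the second factor undoes the
    covariate adjustment of the ANCOVA error term.
    """
    g_uncorrected = sign * math.sqrt((n_t + n_c) * f / (n_t * n_c))
    return g_uncorrected * math.sqrt(1 - r**2)

# With r = 0 there is no adjustment, so no correction is applied.
print(round(g_from_ancova_f(4.0, 30, 30, 0.0), 3))
```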
Calculating effect sizes from the one-way ANCOVA
The effect size given previously uses the
adjusted means in the numerator.

However, the reviewer needs to decide whether
unadjusted or covariate adjusted mean
differences are desired. In randomized
experiments, they will not differ much.

Unadjusted means may not be given in the
research report, leading to a practical decision to
calculate effects based on adjusted means.
50
Calculating effect sizes from the two-way ANCOVA

Calculating effect sizes from two-way ANCOVA designs poses a combination of the problems in two-way ANOVA designs and one-way ANCOVA designs.

The procedure to compute g has two steps:
- compute the g statistic as for the two-way ANOVA, and
- correct the g value for covariate adjustment via

g_Adjusted = g_Unadjusted √(1 - r²)
51
Calculating effect sizes from tests on gain
scores

Suppose a t statistic is given for a test of the
difference between gains of the T and C groups.

Can we use this t statistic to get g? NO! Or
rather, as before, this will give a g that is not
totally comparable to the standard t-test g.

The standard deviation in the t statistic for gains
is the SD of gains, not posttest scores. To
compute a comparable effect size, we have to
reconstruct the SD of the posttest scores.
52
Calculating effect sizes from tests on gain
scores
The SD of the posttest scores is

S_pooled = S_Gain / √[2(1 - r)],

where r is the pretest-posttest correlation; thus

g = t_Gain √[2(1 - r)(n_T + n_C) / (n_T n_C)]
53
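The gain-score conversion can be sketched in Python (the function name is ours):

```python
import math

def g_from_gain_t(t_gain, n_t, n_c, r):
    """g in the posttest-SD metric from a t test on T vs. C gain scores.

    r is the pretest-posttest correlation:
    g = t_gain * sqrt(2 * (1 - r) * (n_t + n_c) / (n_t * n_c)).
    """
    return t_gain * math.sqrt(2 * (1 - r) * (n_t + n_c) / (n_t * n_c))

# When r = 0.5, 2(1 - r) = 1 and this reduces to the ordinary
# t-to-g conversion; larger r (more reliable gains) shrinks g.
print(round(g_from_gain_t(2.0, 30, 30, 0.5), 3))
```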
Calculating effect sizes from tests on gain
scores

The effect size given previously also uses the
difference between mean gains in the numerator.

Thus, the reviewer needs to decide whether
differences in mean posttest scores or mean
gains are desired. In randomized experiments, the
two types of mean will not usually differ much
from each other.

Post-test means may not be given in the research
report, leading to a practical decision to calculate
effects based on differences in mean gains.
54
Auxiliary data for effect-size calculation

Our examples of the calculation of effect sizes from designs using ANCOVA and gain scores illustrate the fact that sometimes auxiliary information (such as the value of r) is needed to compute effect sizes.

This information may be missing in many studies,
or even may be missing from all studies.
55
Auxiliary data for effect-size calculation

That poses a choice for the reviewer:
- omit studies with missing r values, or
- impute r values in some way.
The reviewer's decision on imputation must be
made explicit in the methods section of the
meta-analysis report.
56
Calculating effect sizes: Answers to practice
exercise
Compute the values of d, the SEs, and the 95% CIs for these two studies:

Study 1    Treatment   Control
  Mean     12          15
  SD       4           6
  n        12          12

Study 2    Treatment   Control
  Mean     6.5         5
  SD       4           4
  n        60          60
57
Calculating effect sizes: Answers to practice
exercise
Study 1    Treatment   Control
  Mean     12          15
  SD       4           6
  n        12          12

For study 1 the values of S_pooled and g are

S_pooled = √{[(n_T - 1) S²_T + (n_C - 1) S²_C] / [(n_T - 1) + (n_C - 1)]} = √{[(11)16 + (11)36] / [(11) + (11)]} = √26 = 5.1

g = (Ȳ_T - Ȳ_C) / S_pooled = (12 - 15)/5.1 = -3/5.1 = -0.588, or -0.59
58
Calculating effect sizes: Answers to practice
exercise
Study 1    Treatment   Control
  Mean     12          15
  SD       4           6
  n        12          12

For study 1, d is -0.59 * [1 - 3/(4*22 - 1)] = -0.59 * 0.966, or d = -0.57.

The values of SE_d and the 95% CI are

SE_d = √[(n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C))] = √[24/(12*12) + (-0.57)²/(2*24)] = 0.416

95% CI: d ± 1.96 * SE_d = -0.57 ± 1.96 * 0.42, or -1.39 to 0.25
59
Calculating effect sizes: Answers to practice
exercise
Study 2    Treatment   Control
  Mean     6.5         5
  SD       4           4
  n        60          60

For study 2 the values of S_pooled and g are

S_pooled = √{[(n_T - 1) S²_T + (n_C - 1) S²_C] / [(n_T - 1) + (n_C - 1)]} = √{[(59)16 + (59)16] / [(59) + (59)]} = 4

g = (Ȳ_T - Ȳ_C) / S_pooled = (6.5 - 5)/4 = 1.5/4 = 0.375, or 0.38
60
Calculating effect sizes: Answers to practice exercise

Study 2    Treatment   Control
  Mean     6.5         5
  SD       4           4
  n        60          60

For study 2, d is 0.38 * [1 - 3/(4*118 - 1)] = 0.38 * 0.994 = 0.38.

The values of SE_d and the 95% CI are

SE_d = √[(n_T + n_C)/(n_T n_C) + d²/(2(n_T + n_C))] = √[120/(60*60) + (0.38)²/(2*120)] = √(.0333 + .0006) = 0.18

95% CI: d ± 1.96 * SE_d = 0.38 ± 1.96 * 0.18, or 0.02 to 0.74
61
Calculating effect sizes: Answers to practice
exercise
95% CI for study 1: d ± 1.96 * SE_d = -0.57 ± 1.96 * 0.42, or -1.39 to 0.25

95% CI for study 2: d ± 1.96 * SE_d = 0.38 ± 1.96 * 0.18, or 0.02 to 0.74

Even though the effect size for study 2 is smaller in absolute value than that for study 1, its SE is smaller and thus the 95% CI does not include 0.
62
More Detail on Effect Sizes:
The r Family
63
The r family
The correlation coefficient or r family effects may be appropriate when
- studies have a continuous outcome measure,
- study designs assess the relation between a quantitative predictor and the outcome (possibly controlling for covariates), or
- the analysis uses regression (or the general linear model).
64
Cohen’s benchmarks
Jacob Cohen (1988) proposed general definitions for anticipating the size of effect-size estimates:

          d      r
Small     .20    .10
Medium    .50    .30
Large     .80    .50
65
More on Cohen

Cohen intended these to be “rules of thumb”,
and emphasized that they represent average
effects from across the social sciences.

He cautioned that in some areas, smaller effects
may be more typical, due to
 measurement error or
 the relative weakness of interventions.

Each reviewer will need to make judgments
about what is “typical” based on his or her
expertise.
66
The r family

The most commonly used effect size in this
family is the correlation coefficient r.

This also equals the standardized regression
coefficient when there is only one predictor in a
regression equation.
Sample: r    Population: ρ
67
The r family

When computing a correlation coefficient, scores
are actually being standardized, which makes r
itself standardized.
Y Y
z
 Recall that z is
SY

To compute r we have
r
z x z y
n
where n is the number of X-Y pairs.
68
Statistical properties (Variances)

The variance of the correlation depends on the sample size and the parameter value. We estimate the variance by using each study's correlation to estimate its parameter ρ. So for study i, we have

v_i = Var(r_i) = (1 - r_i²)² / (n_i - 1),

or we can use a consistent estimator of ρ (e.g., an average r̄):

v_i = Var(r_i) = (1 - r̄²)² / (n_i - 1).
69
Statistical properties (Transformation)

Sometimes we transform effect sizes because
that simplifies statistical analyses or makes our
assumptions more justifiable.

A common transformation of correlations is the Fisher z-transform of the correlation:

z = .5 ln[(1 + r) / (1 - r)].
70
Statistical properties (Transformation)

Consider a z-transformed correlation coefficient
from a sample of size n. The z transform is a
variance stabilizing transformation, which
means the variance of z does not depend on r,
as did the variance of r.

The variance of z is

Var(z_i) = v_i = 1 / (n_i - 3).
71
Correlation example
Example:

Study   r     z      n (pairs)
1       .60   .693   50

Effect size: r = .60, z_r = .693

SE of r: SE_r = (1 - r²) / √(n - 1) = (1 - .6²) / √49 = .64/7 = .091

95% CI: .60 ± 1.96 * (.091) in the r metric, or .421 to .779 in the r metric
72
Correlation example
Example:

Study   r     z      n (pairs)
1       .60   .693   50

Effect size: r = .60, z_r = .693

SE of z_r: SE_zr = 1/√(n - 3) = 1/√47 = .146

95% CI: 0.693 ± 1.96 * (0.146) in the z metric, or 0.406 to 0.98 in the z metric, and .385 to .753 in the r metric
73
Correlation example
Example:

Study   r     z      n (pairs)
1       .60   .693   50

Effect size: r = .60, z_r = .693

We must transform the CI for z back to the r metric to get the 95% CI for r:

0.693 ± 1.96 * (0.146) in the z metric
0.406 to 0.98 in the z metric, and .385 to .753 in the r metric
74
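The z-transform, CI, and back-transform on these slides can be sketched in Python (the function name is ours); the slide's .385 lower bound reflects rounded intermediate values, while exact arithmetic gives about .386:

```python
import math

def r_ci_via_fisher_z(r, n):
    """95% CI for a correlation via the Fisher z transform."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)
    lo, hi = z - 1.96 * se, z + 1.96 * se
    # Back-transform the z limits to the r metric; tanh inverts the z transform.
    return math.tanh(lo), math.tanh(hi)

# The slide's example: r = .60 from n = 50 pairs.
lo, hi = r_ci_via_fisher_z(0.60, 50)
print(round(lo, 3), round(hi, 3))  # about 0.386 and 0.753
```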
More Detail on Effect Sizes:
Categorical Data
75
Effect sizes for categorical data

The effect sizes in the d and r families are
mainly for studies with continuous outcomes.

Three popular effect sizes for categorical data will be introduced here:
- the odds ratio,
- the risk ratio (or rate ratio), and
- the risk difference (the difference between two probabilities).
76
Effect sizes for categorical data

Consider a study in which a treatment group (T)
and a control group (C) are compared with
respect to the frequency of a binary
characteristic among the participants.

In each group we will count how many
participants have the binary outcome of interest.
We will refer to having the binary outcome as
“being in the focal group” (e.g., passing a test,
being cured of a disease, etc.).
77
Effect sizes for categorical data

Let π_T and π_C denote the population probabilities of being in the focal group within each of the two groups T and C; P_T and P_C denote the sample proportions.

                  Population                               Sample
Risk Difference   Δ = π_T - π_C                            RD = P_T - P_C
Risk Ratio        θ_RR = π_T / π_C                         RR = P_T / P_C
Odds Ratio        θ_OR = π_T(1 - π_C) / [π_C(1 - π_T)]     OR = P_T(1 - P_C) / [P_C(1 - P_T)]
78
Odds ratio

The frequencies (n11, n12, n21, and n22) for the binary outcomes are counted for both treatment and control groups.

                    Treatment   Control
Yes (focal group)   n11         n12
No                  n21         n22

The odds ratio (OR) is the most widely used effect-size measure for dichotomous outcomes.
79
Odds ratio

The OR is calculated as

OR = (n11/n21) / (n12/n22) = n11 n22 / (n12 n21)

        Treatment   Control
Yes     n11         n12
No      n21         n22

An OR = 1 represents no effect, or no difference between treatment and control in the odds of being in the focal group.

The lower bound of the OR is 0 (the Control outcome is better than the Treatment outcome); its upper bound is infinity (the Treatment outcome is better than the Control outcome).
80
Log Odds ratio

The range of values of the OR is inconvenient for drawing inferences, and its distribution can be quite non-normal. The logarithm of the OR (LOR) is more nearly normal, and is calculated as

LOR = ln(OR)

The LOR makes interpretation more intuitive. It is similar in some respects to d.
- A value of 0 represents no T vs. C difference, or no treatment effect.
- The LOR ranges from -∞ to ∞.
81
Log Odds ratio

The standard error of the LOR (SE_LOR) is calculated as

SE_LOR = √(1/n11 + 1/n12 + 1/n21 + 1/n22)

An approximate 95% confidence interval for each LOR can be calculated as

95% CI = LOR ± 1.96 * SE_LOR
82
Example: Pearson’s hospital staff typhoid
incidence data

In 1904, Karl Pearson reviewed evidence on the
effects of a vaccine against typhoid (also called
enteric fever).

Pearson’s review included 11 studies of mortality and immunity to typhoid among British soldiers.

The treatment was an inoculation against
typhoid and the “cases” – those who became
infected – were the focal group.
83
Example: Pearson’s hospital staff typhoid
incidence data
84
Example: Pearson’s hospital staff typhoid
incidence data
The focal group is the group of diseased staff members:

            Inoculated   Not Inoculated   Total
Immune      n11 = 265    n12 = 204        469
Diseased    n21 = 32     n22 = 75         107
Total       297          279              576

The OR is calculated as

OR = (n11/n21) / (n12/n22) = n11 n22 / (n12 n21) = (265/32) / (204/75) = (265 * 75) / (204 * 32) = 3.04

The odds of becoming diseased are 3 times greater for staff members who are not inoculated.
85
Example: Pearson’s hospital staff typhoid
incidence data
The OR can also be computed from cell proportions:

            Inoculated       Not Inoculated    Total counts
Immune      P11 = 265/576    P12 = 204/576     469
            = .46            = .35
Diseased    P21 = 32/576     P22 = 75/576      107
            = .06            = .13
Total       297              279               576

OR = P11 P22 / (P12 P21) = (.46 * .13) / (.35 * .06) = 3.04
86
Example: Pearson’s hospital staff typhoid
incidence data
The LOR in this example is

LOR = ln(3.04) = 1.11

The SE of the LOR in this example is

SE_LOR = √(1/265 + 1/204 + 1/32 + 1/75) = 0.23
87
Example: Pearson’s hospital staff typhoid
incidence data
The approximate 95% confidence interval for the LOR in this example is

95% CI = LOR ± 1.96 * SE_LOR = 1.11 ± 1.96 * 0.23
Upper bound LOR: 1.56
Lower bound LOR: 0.66

The CI does not include zero, suggesting a significant positive treatment effect for inoculation.
88
Example: Pearson’s hospital staff typhoid
incidence data

The LORs can be transformed back to the OR
metric via the exponential function:
Upper bound OR = exp(1.56) = 4.76
Lower bound OR = exp(0.66) = 1.93

Again here we can see the CI for the OR does
not include 1, indicating a significant treatment
effect.
89
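The OR, LOR, SE, and CI calculations for a 2×2 table can be sketched in Python (the function name is ours); with Pearson's counts the back-transformed bounds come out near 1.94 and 4.79, slightly different from the slide's 1.93 and 4.76 because the slide rounds its intermediate values:

```python
import math

def lor_with_ci(n11, n12, n21, n22):
    """Odds ratio, log odds ratio, SE, and 95% CI from a 2x2 table."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    lor = math.log(odds_ratio)
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return odds_ratio, lor, se, (lor - 1.96 * se, lor + 1.96 * se)

# Pearson's typhoid data: 265, 204, 32, 75.
odds_ratio, lor, se, (lo, hi) = lor_with_ci(265, 204, 32, 75)
print(round(odds_ratio, 2), round(lor, 2), round(se, 2))  # 3.04 1.11 0.23
print(round(math.exp(lo), 2), round(math.exp(hi), 2))     # about 1.94 and 4.79
```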
Converting OR or LOR to d

If most of the effect sizes are in the d metric and
just a few expressed as OR or LOR, the OR or
LOR can be converted to d so all the effect sizes
can be pooled.

The transformation developed by Cox (as cited in Sánchez-Meca, Marín-Martínez, & Chacón-Moscoso, 2003) works well. It is computed as

d_Cox = LOR / 1.65
90
Relative risk

Relative risk (RR) is also used for dichotomous outcomes.

            Treatment   Control   Total
Immune      P11         P12       P1•
Diseased    P21         P22       P2•
Total       P•1         P•2

RR = (P11 / P•1) / (P12 / P•2),

where P•1 and P•2 are the marginal proportions for the first column (treatment) and the second column (control), respectively.
91
Relative risk

The relative risk ranges from 0 to infinity.

A relative risk (RR) of 1 indicates that there is no difference in risk between the two groups.

A relative risk (RR) larger than 1 indicates that the treatment group has higher risk (of being in the focal group) than the control.

A relative risk (RR) less than 1 indicates that the control group has higher risk than the treatment group.
92
Log Relative risk

As was true for the LOR, the logarithm of the RR (LRR) has better statistical properties. It is calculated as

LRR = ln(RR)

The range of the LRR is from -∞ to ∞, and as for the LOR, a value of 0 indicates no treatment effect.
93
Example: Pearson’s hospital staff typhoid
incidence data
            Inoculated       Not Inoculated    Total
Immune      P11 = 265/576    P12 = 204/576     469
            = .46            = .35
Diseased    P21 = 32/576     P22 = 75/576      107
            = .06            = .13
Total       P•1 = 297/576    P•2 = 279/576     576
            = .52            = .48

RR = (.46/.52) / (.35/.48) = .89/.73 = 1.22

LRR = ln(1.22) = .199

The probability of being immune if inoculated is 1.22 times higher than the probability if not inoculated.
94
Log Relative risk

The standard error of the LRR is calculated as

SE_LRR = √[n21 / (n•1 n11) + n22 / (n•2 n12)]

based on the tabled counts:

         Treatment   Control   Total
Yes      n11         n12       n1•
No       n21         n22       n2•
Total    n•1         n•2
95
Example: Pearson’s hospital staff typhoid
incidence data
The standard error of the LRR for our example is

SE_LRR = √[n21 / (n•1 n11) + n22 / (n•2 n12)] = √[32 / (297 * 265) + 75 / (279 * 204)] = 0.04
96
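The RR, LRR, and SE calculations can be sketched in Python (the function name is ours):

```python
import math

def lrr_with_se(n11, n12, n21, n22, n_t, n_c):
    """Risk ratio, log risk ratio, and SE from the slides' count layout.

    n_t and n_c are the treatment and control totals (n.1 and n.2).
    """
    rr = (n11 / n_t) / (n12 / n_c)
    lrr = math.log(rr)
    se = math.sqrt(n21 / (n_t * n11) + n22 / (n_c * n12))
    return rr, lrr, se

# Pearson's typhoid data again.
rr, lrr, se = lrr_with_se(265, 204, 32, 75, 297, 279)
print(round(rr, 2), round(lrr, 2), round(se, 2))  # 1.22 0.2 0.04
```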
Example: Pearson’s hospital staff typhoid
incidence data
The 95% CI is

LRR ± 1.96 * SE_LRR = 0.20 ± 1.96 * 0.04
Upper bound LRR: 0.28
Lower bound LRR: 0.12

These LRRs can be transformed back to the RR metric via the exponential function:
Upper bound RR = exp(0.28) = 1.32
Lower bound RR = exp(0.12) = 1.12
97
Converting RR to OR

The RR can also be converted to the OR via

OR = RR * [(1 - P12/P•2) / (1 - P11/P•1)]

In Pearson’s example, it is

OR = 1.22 * [(1 - .35/.48) / (1 - .46/.52)] = 3.04
98
Risk difference (difference between two
proportions)
The risk difference (RD) between proportions is often considered the most intuitive effect size for categorical data.

RD = n11/n•1 - n12/n•2

The standard error of RD (SE_RD) is

SE_RD = √[n11 n21 / n•1³ + n12 n22 / n•2³]
99
Example: Pearson’s hospital staff typhoid
incidence data
For Pearson’s data, 89.2% of those inoculated were immune (265/297) and 73.1% of those not inoculated were immune (204/279).

The difference in immunity rates for those inoculated or not is RD = .89 - .73 = .16

The standard error of the difference is

SE_RD = √[n11 n21 / n•1³ + n12 n22 / n•2³] = √[(265 * 32)/297³ + (204 * 75)/279³] = 0.032
100
Example: Pearson’s hospital staff typhoid
incidence data
In general a 95% CI for RD is

RD ± 1.96 * SE_RD = .16 ± 1.96 * 0.032
Upper bound RD: .22
Lower bound RD: .10

Because the value 0 (meaning no difference in risk) is not included in the CI, we again conclude there is a treatment effect.
101
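The RD, SE, and CI steps can be sketched in Python (the function name is ours):

```python
import math

def rd_with_ci(n11, n12, n21, n22, n_t, n_c):
    """Risk difference and 95% CI, following the slides' formulas."""
    rd = n11 / n_t - n12 / n_c
    se = math.sqrt(n11 * n21 / n_t**3 + n12 * n22 / n_c**3)
    return rd, se, (rd - 1.96 * se, rd + 1.96 * se)

# Pearson's typhoid data: the CI excludes 0.
rd, se, (lo, hi) = rd_with_ci(265, 204, 32, 75, 297, 279)
print(round(rd, 2), round(se, 3))  # 0.16 0.032
```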
Connections Between the
Effect-Size Metrics
102
Conversions among effect-size metrics
The effects d, r, and the odds ratio (OR) can all be converted from one metric to another. Sometimes it is convenient to convert effects for comparison purposes.

A second reason may be that just a few studies present results that require computation of a particular effect size. For example, if most studies present results as means and SDs (and thus allow d to be calculated), but one reports the correlation of treatment with the outcome, one might want to convert the single r to a d.
103
Conversions between d and LOR

Converting d to the log odds ratio:

ln(OR) = π d / √3,   SE_ln(OR) = √[π² (SE_d)² / 3]

Converting the log odds ratio to d:

d = √3 ln(OR) / π,   SE_d = √[3 SE²_ln(OR) / π²]
104
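These d ↔ ln(OR) conversions can be sketched in Python (function names are ours); the two directions are exact inverses:

```python
import math

def d_to_lor(d):
    """ln(OR) = pi * d / sqrt(3), as on the slide."""
    return math.pi * d / math.sqrt(3)

def lor_to_d(lor):
    """d = sqrt(3) * ln(OR) / pi, the inverse conversion."""
    return math.sqrt(3) * lor / math.pi

# A round trip returns the original d.
print(abs(lor_to_d(d_to_lor(0.5)) - 0.5) < 1e-12)  # True
```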
Conversions of r and d

To convert r to d, we first compute the SE of the correlation using

SE_r = (1 - r²) * SE_FisherZ,   with SE_FisherZ = 1/√(n - 3)

Then

d = 2r / √(1 - r²),

and

SE_d = √[4 SE²_r / (1 - r²)³]
105
Conversions of r and d

Converting d to r:

r = d / √(d² + A),

where

A = (n1 + n2)² / (n1 n2)

is a “correction factor” for cases where the groups are not the same size.

If the specific group sizes are not available, assume they are equal and use n1 = n2 = n, which yields A = 4.
106
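The r ↔ d conversions can be sketched in Python (function names are ours); with equal group sizes (A = 4) the two directions are exact inverses:

```python
import math

def r_to_d(r):
    """d = 2r / sqrt(1 - r^2), as on the slide."""
    return 2 * r / math.sqrt(1 - r**2)

def d_to_r(d, n1=None, n2=None):
    """r = d / sqrt(d^2 + A), with A = (n1 + n2)^2 / (n1 * n2).

    If group sizes are unknown, assume equal groups, so A = 4.
    """
    a = 4.0 if n1 is None or n2 is None else (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)

# With A = 4 a round trip returns the original r.
print(abs(d_to_r(r_to_d(0.3)) - 0.3) < 1e-9)  # True
```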
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cox, D. R. (1970). Analysis of binary data. New York: Chapman & Hall/CRC.

Pearson, K. (1904). Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243-1246.

Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological Methods, 8(4), 448-467.
107
C2 Training Materials Team

Thanks are due to the following institutions and individuals:

- Funder: Norwegian Knowledge Centre for the Health Sciences
- Materials contributors: Betsy Becker, Harris Cooper, Larry Hedges, Mark Lipsey, Therese Pigott, Hannah Rothstein, Will Shadish, Jeff Valentine, David Wilson
- Materials developers: Ariel Aloe, Betsy Becker, Sunny Kim, Jeff Valentine, Meng-Jia Wu, Ying Zhang
- Training co-coordinators: Betsy Becker and Therese Pigott
108