Systematic Reviews:
Methods and Procedures
George A. Wells
Editor, Cochrane Musculoskeletal Review
Group
Department of Epidemiology and Community Medicine
University of Ottawa
Ottawa, Ontario, Canada
Meta-analysis:
• Meta-analysis is a statistical analysis of a
collection of studies
• Meta-analysis methods focus on contrasting and
comparing results from different studies in
anticipation of identifying consistent patterns and
sources of disagreements among these results
• Primary objective:
• Synthetic goal (estimation of summary effect)
vs
• Analytic goal (estimation of differences)
• Systematic Review:
– the application of scientific strategies that limit
bias to the systematic assembly, critical
appraisal and synthesis of all relevant studies
on a specific topic
• Meta-Analysis:
– a systematic review that employs statistical
methods to combine and summarize the
results of several studies
Features of narrative reviews and systematic reviews
                  NARRATIVE                   SYSTEMATIC
QUESTION          Broad                       Focused
SOURCES/SEARCH    Usually unspecified;        Comprehensive; explicit
                  possibly biased
SELECTION         Unspecified; biased?        Criterion-based; uniformly applied
APPRAISAL         Variable                    Rigorous
SYNTHESIS         Usually qualitative         Quantitative
INFERENCE         Sometimes evidence-based    Usually evidence-based
Steps of a Cochrane
Systematic Review
• Clearly formulated question
• Comprehensive data search
• Unbiased selection and extraction
process
• Critical appraisal of data
• Synthesis of data
• Perform sensitivity and subgroup
analyses if appropriate and possible
• Prepare a structured report
• What is the study objective?
– to validate results in a large population
– to guide new studies
• Pose question in both biologic and health care terms, specifying with operational definitions:
– population
– intervention
– outcomes (both beneficial and harmful)
Inclusion Criteria
• Study design
• Population
• Interventions
• Outcomes
• Need a well formulated and co-ordinated effort
• Seek guidance from a librarian
• Specify language constraints
• Requirements for comprehensiveness of search depend on the field and question to be addressed
• Possible sources include:
– computerized bibliographic databases
– review articles
– abstracts
– conference proceedings
– dissertations
– books
– experts
– granting agencies
– trial registries
– industry
– journal handsearching
• Procedure:
– usually begin with searches of bibliographic reports (citation indexes, abstract databases)
– publications retrieved and references therein searched for more references
Published Reports
(publication bias, i.e. tendency to publish statistically significant results)
• As a step towards eliminating publication bias, need information from unpublished research:
– databases of unpublished reports
– clinical research registries
– clinical trial registries
– unpublished theses
– conference indexes
Study Selection
• 2 independent reviewers select studies
• Selection of studies addressing the
question posed based on a priori
specification of the population,
intervention, outcomes and study design
• Level of agreement: kappa
• Differences resolved by consensus
• Specify reasons for rejecting studies
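Agreement between the two reviewers is commonly summarized with Cohen's kappa. The sketch below is one way to compute it, assuming each reviewer's include/exclude decisions are available as parallel lists. The function name `cohen_kappa` and the ten decisions are made up for illustration and do not come from any review.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the raters agree
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    # Undefined (division by zero) if chance agreement is exactly 1
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for ten candidate studies
reviewer_1 = ["in", "in", "in", "out", "out", "out", "out", "out", "in", "out"]
reviewer_2 = ["in", "in", "out", "out", "out", "out", "out", "out", "in", "in"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Here the reviewers agree on 8 of 10 studies, but kappa is lower than 0.8 because part of that agreement would be expected by chance.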
Data Extraction
• 2 independent reviewers extract data using predetermined forms
– Patient characteristics
– Study design and methods
– Study results
– Methodologic quality
• Level of agreement: kappa
• Differences resolved by consensus
Data Extraction ….
• Be explicit, unbiased and reproducible
• Include all relevant measures of benefit
and harm of the intervention
• Contact investigators of the studies for
clarification of published methods etc.
• Extract individual patient data when
published data do not answer questions
about: intention-to-treat analyses, time-to-event analyses, subgroups, dose-response relationships
Description of Studies
• Size of study
• Characteristics of study patients
• Details of specific interventions used
• Details of outcomes assessed
Methodologic Quality Assessment
• Can use as:
• threshold for inclusion
• possible explanation for heterogeneity
• Base quality assessments on extent to
which bias is minimized
• Make quality assessment scoring systems
transparent and parsimonious
• Evaluate reproducibility of quality
assessment
• Report quality scoring system used
Quality Assessment: Example
Study            Random   Blinding   Dropouts
Adami 1995       +        +          +
Black 1996       ++       +          +
Bone 1997        +        +          --
Chestnut 1995    +        +          +
Hosking 1998     +        --         +
Liberman 1995    +        +          +
McClung 1998     +        +          +
++ indicates that randomization was appropriate (e.g. random numbers were computer generated)
Outcome
• Discrete (event)
– Odds Ratio (OR)
– Relative Risk (RR)
– Risk Difference (RD)
• Continuous (measured)
– Mean Difference (MD)
– Standardized Mean Difference (SMD)
• For each measure: (Basic Data) lead to an Overall Estimate, under Fixed Effects or Random Effects
Effect measures: discrete data
P1 = event rate in experimental group
P2 = event rate in control group
• RD = Risk difference = P2 - P1
• RR = Relative risk = P1 / P2
• RRR = Relative risk reduction = (P2 - P1) / P2
• OR = Odds ratio = [P1/(1-P1)] / [P2/(1-P2)]
• NNT = No. needed to treat = 1 / (P2 - P1)
Example
Experimental event rate = 0.3
Control event rate = 0.4
RD = 0.4 - 0.3 = 0.1
RR = 0.3 / 0.4 = 0.75
RRR = (0.4 - 0.3) / 0.4 = 0.25
OR = (0.3/0.7) / (0.4/0.6) = 0.64
NNT = 1 / (0.4 - 0.3) = 10
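The example above can be reproduced with a few lines of Python. This is only a sketch of the definitions on the slide; the function name `effect_measures` is an illustrative choice, and RD follows the slide's convention of control minus experimental.

```python
def effect_measures(p1, p2):
    """Effect measures from event rates: p1 = experimental, p2 = control."""
    return {
        "RD": p2 - p1,                              # risk difference (control minus experimental)
        "RR": p1 / p2,                              # relative risk
        "RRR": (p2 - p1) / p2,                      # relative risk reduction
        "OR": (p1 / (1 - p1)) / (p2 / (1 - p2)),    # odds ratio
        "NNT": 1 / (p2 - p1),                       # number needed to treat
    }


measures = effect_measures(0.3, 0.4)
for name, value in measures.items():
    print(f"{name} = {value:.2f}")
```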
Discrete - Odds Ratio (OR)
                Event   No event   Total   Basic Data
Experimental    a       b          ne      a/ne
Control         c       d          nc      c/nc
Pe = a/ne, Pc = c/nc
Odds: (number of patients experiencing event) / (number of patients not experiencing event)
Odds ratio: (Odds in Experimental group) / (Odds in Control group)
$$ OR = \frac{P_e/(1-P_e)}{P_c/(1-P_c)} = \frac{ad}{bc} $$
Discrete - Odds Ratio Example
                Event   No event   Total   Basic Data
Experimental    13      33         46      13/46
Control         7       31         38      7/38
Pe = 13/46, Pc = 7/38
$$ OR = \frac{13 \times 31}{7 \times 33} = 1.745 $$
Discrete - Relative Risk (RR)
                Event   No event   Total   Basic Data
Experimental    a       b          ne      a/ne
Control         c       d          nc      c/nc
Pe = a/ne, Pc = c/nc
Risk: (number of patients experiencing event) / (number of patients)
Risk Ratio: (Risk in Experimental group) / (Risk in Control group)
$$ RR = \frac{P_e}{P_c} = \frac{a(c+d)}{(a+b)c} $$
Discrete - Relative Risk - Example
                Event   No event   Total   Basic Data
Experimental    13      33         46      13/46
Control         7       31         38      7/38
Pe = 13/46, Pc = 7/38
$$ RR = \frac{P_e}{P_c} = \frac{13/46}{7/38} = 1.534 $$
Discrete - Risk Difference (RD)
                Event   No event   Total   Basic Data
Experimental    a       b          ne      a/ne
Control         c       d          nc      c/nc
Pe = a/ne, Pc = c/nc
Risk: (number of patients experiencing event) / (number of patients)
Risk Difference: (Risk in Experimental group) - (Risk in Control group)
$$ RD = P_e - P_c = \frac{a}{a+b} - \frac{c}{c+d} $$
Discrete - Risk Difference - Example
                Event   No event   Total   Basic Data
Experimental    13      33         46      13/46
Control         7       31         38      7/38
Pe = 13/46, Pc = 7/38
RD = Pe - Pc = 13/46 - 7/38 = 0.098
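The three worked examples share the same 2x2 table, so they can be checked together. This is a minimal sketch; the function name `two_by_two` is an illustrative choice, and the cell labels a, b, c, d follow the slides.

```python
def two_by_two(a, b, c, d):
    """Point estimates from a 2x2 table.

    a, b = events / non-events in the experimental group
    c, d = events / non-events in the control group
    """
    ne, nc = a + b, c + d          # group sizes
    pe, pc = a / ne, c / nc        # event rates
    return {
        "OR": (a * d) / (b * c),   # odds ratio
        "RR": pe / pc,             # relative risk
        "RD": pe - pc,             # risk difference (experimental minus control)
    }


estimates = two_by_two(13, 33, 7, 31)
for name, value in estimates.items():
    print(f"{name} = {value:.3f}")
```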
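Discrete - Odds Ratio (O)
                Event   No event   Total
Experimental    a       b          ne
Control         c       d          nc
pe = a/ne, pc = c/nc
Estimator:
$$ o = \frac{\hat p_e/(1-\hat p_e)}{\hat p_c/(1-\hat p_c)}, \qquad L_o = \ln(o) $$
Standard Error:
$$ s_{L_o} = \left[ \frac{1}{n_e p_e (1-p_e)} + \frac{1}{n_c p_c (1-p_c)} \right]^{1/2} $$
100(1-α)% CI:
$$ L_o \pm Z_{\alpha/2}\, s_{L_o} \ \text{(log scale)}, \qquad \exp\left(L_o \pm Z_{\alpha/2}\, s_{L_o}\right) \ \text{(odds ratio scale)} $$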
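Discrete - Relative Risk (R)
                Event   No event   Total
Experimental    a       b          ne
Control         c       d          nc
pe = a/ne, pc = c/nc
Estimator:
$$ r = \hat p_e / \hat p_c, \qquad L_r = \ln(r) $$
Standard Error:
$$ s_{L_r} = \left[ \frac{1-p_e}{n_e p_e} + \frac{1-p_c}{n_c p_c} \right]^{1/2} $$
100(1-α)% CI:
$$ L_r \pm Z_{\alpha/2}\, s_{L_r} \ \text{(log scale)}, \qquad \exp\left(L_r \pm Z_{\alpha/2}\, s_{L_r}\right) \ \text{(relative risk scale)} $$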
Discrete - Risk Difference (D)
                Event   No event   Total
Experimental    a       b          ne
Control         c       d          nc
pe = a/ne, pc = c/nc
Estimator:
$$ d = \hat p_e - \hat p_c $$
Standard Error:
$$ s_d = \left[ \frac{p_e (1-p_e)}{n_e} + \frac{p_c (1-p_c)}{n_c} \right]^{1/2} $$
100(1-α)% CI:
$$ d \pm Z_{\alpha/2}\, s_d $$
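The standard errors and confidence intervals for OR, RR and RD can be sketched as follows, applied to the 2x2 table from the earlier examples (13/46 versus 7/38). The function name `intervals` and the default critical value 1.96 (a 95% interval) are illustrative assumptions. The OR and RR intervals are built on the log scale and then exponentiated, as on the slides.

```python
import math


def intervals(a, b, c, d, z=1.96):
    """Estimates, standard errors and CIs for OR, RR and RD from a 2x2 table."""
    ne, nc = a + b, c + d
    pe, pc = a / ne, c / nc

    # Odds ratio: work on the log scale
    log_or = math.log((pe / (1 - pe)) / (pc / (1 - pc)))
    se_log_or = math.sqrt(1 / (ne * pe * (1 - pe)) + 1 / (nc * pc * (1 - pc)))

    # Relative risk: work on the log scale
    log_rr = math.log(pe / pc)
    se_log_rr = math.sqrt((1 - pe) / (ne * pe) + (1 - pc) / (nc * pc))

    # Risk difference: arithmetic scale
    rd = pe - pc
    se_rd = math.sqrt(pe * (1 - pe) / ne + pc * (1 - pc) / nc)

    return {
        "OR": (math.exp(log_or),
               math.exp(log_or - z * se_log_or),
               math.exp(log_or + z * se_log_or)),
        "RR": (math.exp(log_rr),
               math.exp(log_rr - z * se_log_rr),
               math.exp(log_rr + z * se_log_rr)),
        "RD": (rd, rd - z * se_rd, rd + z * se_rd),
        "se_log_or": se_log_or,
        "se_log_rr": se_log_rr,
        "se_rd": se_rd,
    }


result = intervals(13, 33, 7, 31)
for name in ("OR", "RR", "RD"):
    estimate, lower, upper = result[name]
    print(f"{name} = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The standard error of the log odds ratio written this way is algebraically the same as the familiar form sqrt(1/a + 1/b + 1/c + 1/d).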
When to use OR / RR / RD
Association     OR        RR        RD
Range           (0,∞)     (0,∞)     (-1,1)
‘Decreased’     <1        <1        <0
None            1         1         0
‘Increased’     >1        >1        >0
OR vs RR
Odds Ratio ≈ Relative Risk if event occurs infrequently
(i.e. a and c small relative to b and d)
$$ RR = \frac{a(c+d)}{(a+b)c} \approx \frac{ad}{bc} = OR $$
Odds Ratio is further from 1 than Relative Risk if event occurs frequently
(OR > RR when RR > 1; OR < RR when RR < 1)
RD vs RR
When interpretation in terms of absolute difference is better
than in relative terms (e.g. interest in absolute reduction in
adverse events)
PROPERTIES OF RISK DIFFERENCE (RD),
RELATIVE RISK (RR) AND ODDS RATIO (OR)
                                                    RD     RR     OR
Simple measure?                                     Yes    Yes    No
Symmetric (measure unaffected by
labelling of study groups)?                         Yes    No     Yes
Predicted event rates restricted to
[0,1] if measure is assumed constant?               No     No     Yes
Unbiased estimate available?                        Yes    No     No
Efficient estimation in small samples?              No     No     Yes
Motivating biological model available?              Yes    Yes    Yes
Continuous Data - Mean Difference (MD)
                number   mean   standard deviation
Experimental    ne       x̄e     se
Control         nc       x̄c     sc
Mean difference (MD):
$$ MD = \bar x_e - \bar x_c, \qquad se(\bar x_e - \bar x_c) = \sqrt{\frac{s_e^2}{n_e} + \frac{s_c^2}{n_c}} $$
100(1-α)% CI:
$$ (\bar x_e - \bar x_c) \pm Z_{\alpha/2}\, se(\bar x_e - \bar x_c) $$
Continuous Data - Standardized Mean Difference (SMD)
                number   mean   standard deviation
Experimental    ne       x̄e     se
Control         nc       x̄c     sc
SMD:
$$ d = f\, \frac{\bar x_e - \bar x_c}{s} $$
where:
$$ s = \sqrt{\frac{(n_e - 1) s_e^2 + (n_c - 1) s_c^2}{n_e + n_c - 2}}, \qquad f = \frac{4(n_e + n_c - 2) - 4}{4(n_e + n_c - 2) - 1} $$
$$ se(d) = \left[ \frac{n_e + n_c}{n_e n_c} + \frac{d^2}{2(n_e + n_c)} \right]^{1/2} $$
100(1-α)% CI:
$$ d \pm Z_{\alpha/2}\, se(d) $$
When to use MD / SMD
Mean Difference
• When studies have comparable outcome measures (ie.
Same scale, probably same length of follow-up)
• A meta-analysis using MDs is known as a weighted mean
difference (WMD)
Standardized Mean Difference
• When studies use different outcome measurements which
address the same clinical outcome (eg different scales)
• Converts scale to a common scale: number of standard
deviations
Example: Combining different scales
for Swollen Joint Count
             Expt                   Control
Study        Mean    SD      N      Mean    SD      N      MD       SMD
Andersen     6.9     5.2     12     19.4    12.2    12     -12.5    -1.287
Furst        18.0    11.0    17     27.0    15.0    16     -9.0     -0.671
Pinheiro     --      --      --     --      --      --     --       --
Weinblatt    20.0    7.75    15     23.0    8.0     16     -3.0     -0.371
Williams     17.0    12.6    56     25.0    13.4    48     -8.0     -0.612
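The MD and SMD columns of this example can be recomputed from the group summaries. The sketch below assumes the SMD is the pooled-SD standardized difference multiplied by the small-sample correction factor f = (4m - 4)/(4m - 1) with m = ne + nc - 2, as given on the SMD slide. The function name `md_and_smd` is illustrative, and the Pinheiro row is omitted because no data are reported for it.

```python
import math


def md_and_smd(ne, xe, se, nc, xc, sc):
    """Mean difference and standardized mean difference with its standard error."""
    md = xe - xc
    m = ne + nc - 2
    # Pooled standard deviation of the two groups
    s = math.sqrt(((ne - 1) * se**2 + (nc - 1) * sc**2) / m)
    # Small-sample correction factor
    f = (4 * m - 4) / (4 * m - 1)
    d = f * md / s
    se_d = math.sqrt((ne + nc) / (ne * nc) + d**2 / (2 * (ne + nc)))
    return md, d, se_d


# (study, ne, mean_e, sd_e, nc, mean_c, sd_c) from the table above
studies = [
    ("Andersen", 12, 6.9, 5.2, 12, 19.4, 12.2),
    ("Furst", 17, 18.0, 11.0, 16, 27.0, 15.0),
    ("Weinblatt", 15, 20.0, 7.75, 16, 23.0, 8.0),
    ("Williams", 56, 17.0, 12.6, 48, 25.0, 13.4),
]

smd_by_study = {}
for name, *summary in studies:
    md, d, se_d = md_and_smd(*summary)
    smd_by_study[name] = d
    print(f"{name:10s} MD = {md:6.1f}  SMD = {d:6.3f}  se(SMD) = {se_d:.3f}")
```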
Sources of Variation over Studies
• “True” inter-study variation may exist
(fixed/random-effects model)
• Sampling error may vary among studies
(sample size)
• Characteristics may differ among studies
(population, intervention)
Modelling Variation
• Parameter of interest: θ (quantifies
average treatment effect)
• Number of independent studies: k
• Summary Statistic: Yi (i=1,2,…,k)
• Large sample size: asymptotic normal
distribution
Fixed-effects model vs Random-effects model
Fixed-Effects Model
• Outcome $Y_i$ from study i is a sample from a
distribution with mean θ
(i.e. common mean across studies)
• $Y_i$ are independently distributed as $N(\theta, s_i^2)$
(i=1,2,…,k), where $s_i^2 = Var(Y_i)$, and assume
$E(Y_i) = \theta$
[Figure: Fixed-Effects Model]
Random-Effects Model
• Outcome $Y_i$ from study i is a sample from a
distribution with mean $\theta_i$
(i.e. study-specific means)
• $Y_i$ are independently distributed as $N(\theta_i, s_i^2)$
(i=1,2,…,k), where $s_i^2 = Var(Y_i)$, and assume
$E(Y_i) = \theta_i$
• $\theta_i$ is a realization from a distribution of ‘effects’
with mean θ
• $\theta_i$ are independently distributed as $N(\theta, \tau^2)$
(i=1,2,…,k) where
• $\tau^2 = Var(\theta_i)$ is the inter-study variation
• θ is the average treatment effect
[Figure: Random-Effects Model]
Random-Effects Model …..
Estimating Average Study Effect θ
• after averaging study-specific effects, distribution
of $Y_i$ is $N(\theta, s_i^2 + \tau^2)$
• although θ is parameter of interest, $\tau^2$ must be
considered and estimated
Estimating Study-Specific Effects $\theta_i$
• distribution of $\theta_i$ conditional on observed data, θ and $\tau^2$
is $N\left(F_i \theta + (1 - F_i) Y_i,\ s_i^2 (1 - F_i)\right)$
• where $F_i$ is the shrinkage factor for the ith
study, $F_i = s_i^2 / (s_i^2 + \tau^2)$
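The shrinkage formula is easy to check numerically. In this sketch the overall mean and between-study variance are treated as known, and all input values are made up for illustration. The symbols theta and tau2 stand for the average effect and the inter-study variance.

```python
def shrink(y_i, s_i, theta, tau2):
    """Conditional mean and variance of a study-specific effect.

    y_i, s_i = observed estimate and its standard error for study i
    theta    = average treatment effect (assumed known here)
    tau2     = inter-study variance (assumed known here)
    """
    f_i = s_i**2 / (s_i**2 + tau2)             # shrinkage factor
    mean = f_i * theta + (1 - f_i) * y_i       # pulled from y_i toward theta
    variance = s_i**2 * (1 - f_i)
    return f_i, mean, variance


# Hypothetical study: estimate 1.0 with standard error 0.5
f_i, mean, variance = shrink(y_i=1.0, s_i=0.5, theta=0.0, tau2=0.25)
print(f_i, mean, variance)
```

A study with a large standard error relative to the inter-study variance has a shrinkage factor near 1, so its estimate is pulled strongly toward the overall mean.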
Modelling Variation
• Studies are stratified and then combined to
account for differences in sample size and study
characteristics
• A weighted average of estimates from each study
is calculated
• Question of whether a common or study-specific
parameter is to be estimated remains ….
Procedure:
• perform test of homogeneity
• if no significant difference, use fixed-effects model
• otherwise identify study characteristics that stratify
studies into subsets with homogeneous effects, or use
random-effects model
Fixed Effects Model
• Require from each study
– effect estimate; and
– standard error of effect estimate
• Combine these using a weighted average:
pooled estimate = sum of (estimate × weight) / sum of weights
where weight = 1 / variance of estimate
• Assumes a common underlying effect behind every trial
Fixed-Effects Model: General Scheme
Study    Measure   Std Error   Weight
1        Y1        s1          W1
2        Y2        s2          W2
.        .         .           .
.        .         .           .
.        .         .           .
k        Yk        sk          Wk
(no association: Yi=0)
Overall Measure:
$$ \hat\theta_{mle} = \frac{\sum_i W_i Y_i}{\sum_i W_i}, \qquad se(\hat\theta) = \frac{1}{\sqrt{\sum_i W_i}}, \qquad W_i = \frac{1}{s_i^2} $$
100(1-α)% CI:
$$ \hat\theta \pm Z_{\alpha/2}\, se(\hat\theta) $$
Chi-Square Tests:
$$ \chi^2_{total} = \chi^2_{assoc} + \chi^2_{homog}, \qquad \text{df: } k = 1 + (k-1) $$
$$ \chi^2_{total} = \sum_i W_i Y_i^2 \sim \chi^2_k $$
$$ \chi^2_{assoc} = \frac{\left(\sum_i W_i Y_i\right)^2}{\sum_i W_i} \sim \chi^2_1 = [N(0,1)]^2 $$
If ‘large’: association
$$ \chi^2_{homog} = \sum_i W_i (Y_i - \hat\theta)^2 \sim \chi^2_{k-1} \quad \text{(Cochran's Q test)} $$
If ‘large’: heterogeneity
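The fixed-effects scheme and the chi-square partition above can be sketched in a few lines. The first study below is the log odds ratio from the 2x2 example earlier in the slides (13/46 versus 7/38); the other three estimates and standard errors are hypothetical values added only to have something to pool. The function name `fixed_effect` is illustrative.

```python
import math


def fixed_effect(y, se, z=1.96):
    """Inverse-variance fixed-effects pooling with the chi-square partition."""
    w = [1 / s**2 for s in se]                     # W_i = 1 / s_i^2
    sum_w = sum(w)
    sum_wy = sum(wi * yi for wi, yi in zip(w, y))
    theta = sum_wy / sum_w                         # pooled estimate
    se_theta = 1 / math.sqrt(sum_w)
    return {
        "theta": theta,
        "se": se_theta,
        "ci": (theta - z * se_theta, theta + z * se_theta),
        "chi2_total": sum(wi * yi**2 for wi, yi in zip(w, y)),   # k df
        "chi2_assoc": sum_wy**2 / sum_w,                         # 1 df
        "q": sum(wi * (yi - theta) ** 2 for wi, yi in zip(w, y)),  # k-1 df
    }


# Study 1: log OR and its standard error from the slide example
log_or_1 = math.log((13 * 31) / (7 * 33))
se_1 = math.sqrt(1 / 13 + 1 / 33 + 1 / 7 + 1 / 31)

# Studies 2 to 4: hypothetical log odds ratios and standard errors
y = [log_or_1, 0.10, -0.20, 0.35]
se = [se_1, 0.30, 0.40, 0.25]

fe = fixed_effect(y, se)
print(f"pooled log OR = {fe['theta']:.3f} (se {fe['se']:.3f})")
print(f"Q = {fe['q']:.3f} on {len(y) - 1} df")
```

The total chi-square equals the association component plus Q, which gives a quick check on any implementation.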
Features in Graphic Display
• For each trial
– estimate (square)
– 95% confidence interval (CI) (line)
– size (square) indicates weight allocated
• Solid vertical line of ‘no effect’
– if CI crosses line then effect not significant (p>0.05)
• Horizontal axis
– arithmetic: RD, MD, SMD
– logarithmic: OR, RR
• Diamond represents combined estimate and 95% CI
• Dashed line plotted vertically through combined estimate
Odds Ratio
Three methods for combining
(1) Mantel-Haenszel method
(2) Peto’s method
(3) Maximum likelihood method
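As an illustration of the first method, the Mantel-Haenszel pooled odds ratio is the sum over studies of a·d/N divided by the sum of b·c/N, where N is the total sample size of each study's 2x2 table. The slides name the method without giving the formula, so this is a sketch of the standard estimator. The first table is the slide example; the second table is hypothetical.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio.

    tables: iterable of (a, b, c, d) with
      a, b = events / non-events in the experimental group
      c, d = events / non-events in the control group
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = [
    (13, 33, 7, 31),    # 2x2 example from the earlier slides
    (20, 80, 15, 85),   # hypothetical second study
]

pooled_or = mantel_haenszel_or(tables)
print(f"Mantel-Haenszel OR = {pooled_or:.3f}")
```

With a single study the estimator reduces to ad/bc, and with several studies it is a weighted average of the study odds ratios with weights b·c/N.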
[Figures: forest plots for Relative Risk, Risk Difference, Peto Odds Ratio, Mantel-Haenszel Odds Ratio, Weighted Mean Difference and Standardized Mean Difference]
Heterogeneity
• Define meaning of heterogeneity for each review
• Define a priori the important degree of heterogeneity (in
large data sets trivial heterogeneity may be statistically
significant)
• If heterogeneity exists examine potential sources
(differences in study quality, participants, intervention
specifics or outcome measurement/definition)
• If heterogeneity exists across studies, consider using
random effects model
• If heterogeneity can be explained using a priori hypotheses,
consider presenting results by these subgroups
• If heterogeneity cannot be explained, proceed with caution
with further statistical aggregation and subgroup analysis
Heterogeneity: How to Identify it
• Common sense
– are the patients, interventions and outcomes in
each of the included studies sufficiently
similar?
• Exploratory analysis of study-specific estimates
• Statistical tests
Heterogeneity: How to deal with it
Lau et al. 1997
Heterogeneity: Exploring it
• Subgroup analyses
– subsets of trials
– subsets of patients
– SUBGROUPS SHOULD BE PRE-SPECIFIED
TO AVOID BIAS
• Meta-regression
– relate size of effect to characteristics of the trials
Exploring Heterogeneity: subgroup analysis
Exploring Heterogeneity: subgroup analysis
Random Effects Model
• Assume true effect estimates really vary across studies
• Two sources of variation:
- within studies (between patients)
- between studies (heterogeneity)
• What the software does:
- Revise weights to take into account both components of
variation:
• weight = 1 / (variance + heterogeneity)
• When heterogeneity exists we get
– a different pooled estimate (but not necessarily) with a different
interpretation
– a wider confidence interval
– a larger p-value
Random Effects Model
If $\tau^2$ is known then MLE of θ is
$$ \hat\theta(\tau)_{mle} = \frac{\sum_i W_i(\tau) Y_i}{\sum_i W_i(\tau)}, \qquad \text{where } W_i(\tau) = \frac{1}{s_i^2 + \tau^2} $$
If $\tau^2$ is unknown three common methods of inference
can be used:
• Restricted Maximum Likelihood (REML)
• Bayesian
• Method of Moments (MOM)
Method of Moments (Random effects model)
$$ \tau_w^2 = \max\left\{ 0,\ \frac{\chi^2_{homog} - (k-1)}{\sum_i W_i - \sum_i W_i^2 / \sum_i W_i} \right\} $$
Study    Measure   Weight (FE)   Weight (RE)
1        Y1        W1            W1* = (W1^-1 + τ²w)^-1
2        Y2        W2            W2* = (W2^-1 + τ²w)^-1
.        .         .             .
.        .         .             .
.        .         .             .
k        Yk        Wk            Wk* = (Wk^-1 + τ²w)^-1
Overall Measure:
$$ \hat\theta^* = \frac{\sum_i W_i^* Y_i}{\sum_i W_i^*}, \qquad se(\hat\theta^*) = \frac{1}{\sqrt{\sum_i W_i^*}} $$
100(1-α)% CI:
$$ \hat\theta^* \pm Z_{\alpha/2}\, se(\hat\theta^*) $$
Effect of model choice
on study weights
Larger studies receive proportionally
less weight in RE model
than in FE model
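The method-of-moments random-effects calculation, and its effect on the weights, can be sketched as below. The five study estimates and standard errors are hypothetical and chosen to be visibly heterogeneous; the function name `random_effects_mom` is illustrative.

```python
import math


def random_effects_mom(y, se, z=1.96):
    """Method-of-moments random-effects pooling."""
    k = len(y)
    w = [1 / s**2 for s in se]                              # fixed-effects weights
    sum_w = sum(w)
    theta_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - theta_fe) ** 2 for wi, yi in zip(w, y))

    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / (sum_w - sum(wi**2 for wi in w) / sum_w))

    # Random-effects weights: W_i* = 1 / (1/W_i + tau2)
    w_star = [1 / (1 / wi + tau2) for wi in w]
    sum_w_star = sum(w_star)
    theta_re = sum(wi * yi for wi, yi in zip(w_star, y)) / sum_w_star
    se_re = 1 / math.sqrt(sum_w_star)

    return {
        "theta_fe": theta_fe,
        "se_fe": 1 / math.sqrt(sum_w),
        "q": q,
        "tau2": tau2,
        "theta_re": theta_re,
        "se_re": se_re,
        "ci_re": (theta_re - z * se_re, theta_re + z * se_re),
        "share_fe": [wi / sum_w for wi in w],               # relative FE weights
        "share_re": [wi / sum_w_star for wi in w_star],     # relative RE weights
    }


# Hypothetical effect estimates and standard errors for five studies
y = [0.10, 0.80, -0.30, 0.60, 1.20]
se = [0.20, 0.25, 0.30, 0.15, 0.40]

re = random_effects_mom(y, se)
print(f"tau^2 = {re['tau2']:.3f}")
print(f"FE: {re['theta_fe']:.3f} (se {re['se_fe']:.3f})")
print(f"RE: {re['theta_re']:.3f} (se {re['se_re']:.3f})")
for s, fe_share, re_share in zip(se, re["share_fe"], re["share_re"]):
    print(f"se = {s:.2f}: FE weight {fe_share:.1%}, RE weight {re_share:.1%}")
```

In this made-up data set the most precise study (se = 0.15) loses relative weight under the random-effects model, while the least precise one gains, which is the pattern the slide describes.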
[Figures: Fixed Effects vs Random Effects forest plots for discrete data and for continuous data; Omission of Outlier - Chestnut Study]
Analysis
• Include all relevant and clinically useful measures
of treatment effect
• Perform a narrative, qualitative summary when
data are too sparse, of too low quality or too
heterogeneous to proceed with a meta-analysis
• Specify if fixed or random effects model is used
• Describe proportion of patients used in final
analysis
• Use confidence intervals
• Include a power analysis
• Consider cumulative meta-analysis (by order of
publication date, baseline risk, study quality) to
assess the contribution of successive studies
Subgroup Analyses
• Pre-specify hypothesis-testing subgroup
analyses and keep few in number
• Label all a posteriori subgroup analyses
• When subgroup differences are detected,
interpret in light of whether they are:
– established a priori
– few in number
– supported by plausible causal mechanisms
– important (qualitative vs quantitative)
– consistent across studies
– statistically significant (adjusted for multiple testing)
Sensitivity Analyses
• Test robustness of results relative to key features of the
studies and key assumptions and decisions
• Include tests of bias due to retrospective nature of
systematic reviews (e.g. with/without studies of lower
methodologic quality)
• Consider fragility of results by determining effect of small
shifts in number of events between groups
• Consider cumulative meta-analysis to explore relationship
between effect size and study quality, control event rates
and other relevant features
• Test a reasonable range of values for missing data from
studies with uncertain results
Funnel Plot
• Scatterplot of effect estimates against sample
size
• Used to detect publication bias
• If no bias, expect symmetric, inverted funnel
[Figure: symmetric inverted funnel of points]
• If bias, expect asymmetric or skewed shape
[Figure: asymmetric funnel of points; suggestion of missing small studies]
Funnel Plot Example 1: Prophylaxis of
NSAID induced Gastric Ulcers
[Figure: funnel plot of Sample Size (0 to 700) against Effect Size (RR, 0.0 to 1.2); Intervention: H2-Blockers]
Funnel Plot Example 2: Alendronate for
Postmenopausal Osteoporosis
[Figure: funnel plot of Sample Size (0 to 2500) against Weighted Mean Difference (0 to 10); WMD of % change in lumbar bone mineral density]
Presentation of Results
• Include a structured abstract
• Include a table of the key elements of each study
• Include summary data from which the measures
are computed
• Employ informative graphic displays representing
confidence intervals, group event rates, sample
sizes etc.
Interpretation of Results
• Interpret results in context of current health care
• State methodologic limitations of studies and
review
• Consider size of effect in studies and review, their
consistency and presence of dose-response
relationship
• Consider interpreting results in context of
temporal cumulative meta-analysis
• Interpret results in light of other available
evidence
• Make recommendations clear and practical
• Propose future research agenda (clinical and
methodological requirements)
Generic Inferential Framework
Generic inferential framework
(1) Conceptually, think of a ‘generic’ effect
size statistic T
(2) corresponding effect size parameter θ
(3) associated standard error SE(T), square
root of variance
(4) for some effect sizes, some suitable
transformation may be needed to make
inference based on normal distribution
theory
Generic inferential framework ...
(A) Fixed-Effects Model (FEM):
– Assume a common effect size
– Obtain average effect size as a weighted mean
(unbiased)
• Optimal weight is reciprocal of variance (inverse
variance weighted method)
Generic inferential framework ...
• Variances inversely proportional to within-study sample sizes
– what is the effect of larger studies in
calculating weights?
– may also weigh by ‘quality’ index, q, scaled
from 0 to 1
Generic inferential framework ...
• Average effect size has conditional
variance (a function of conditional
variances of each effect size, quality
index, …)
– e.g. V = 1/total weight
• Multiply the resulting standard error by
appropriate critical value (1.96, 2.58, 1.645)
• Construct confidence interval and/or test statistic
Generic inferential framework ...
• Test the homogeneity assumption using a
weighted effect size sums of squares of
deviations, Q
• If Q exceeds the critical value of chi-square at k-1 d.f. (k = number of studies),
then observed between-study variance is
significantly greater than what would be
expected under the null hypothesis
Generic inferential framework ...
• When within-study sample sizes are very large, the Q test
may reject homogeneity even when individual effect size
estimates do not differ much
• One can take different courses of action when homogeneity
is rejected (see next page)
Generic inferential framework ...
• Methodologic choices in dealing with ‘heterogeneous’ data
Generic inferential framework ...
(B) Random-Effects Model (REM):
– Total variability of an observed study effect size reflects
within and between variance (extra variance component)
– If between-studies variance is zero, equations of REM
reduce to those of FEM
– Presence of a variance component which is significantly
different from zero may be indicative of REM
Generic inferential framework ...
• Once significance of variance component is
established (e.g. Q test for homogeneity of effect
size),
– its magnitude should be estimated
– variance components can be estimated in many ways!
• the most commonly used method is the so-called
DerSimonian-Laird method, which is based on a method-of-moments approach
– Compute random effects weighted mean as an estimate
of the average of the random effects in the population
– construct confidence interval and conduct hypothesis
tests as before (new variance and thus new weights!!!)
Correlation Coefficient
Example: Correlation coefficient
• A measure of association more popular in cross-sectional observational studies than in RCTs is
Pearson’s correlation coefficient, r, given by
$$ r = \frac{\sum (X - \bar X)(Y - \bar Y)}{\sqrt{\sum (X - \bar X)^2 \sum (Y - \bar Y)^2}} $$
• X and Y must be continuous (e.g. blood pressure
and weight)
• r lies between -1 and 1
• not available in RevMan / MetaView at this time
Correlation coefficient (cont’d)
• Following the generic framework
discussed earlier:
– the effect size statistic is r
– the corresponding effect size parameter is the
underlying population correlation coefficient, ρ
– in this case, a suitable transformation is
needed to achieve approximate normality of
effect size
– inference is conducted on the scale of the
transformed variable and final results are
back-transformed to the original scale
Correlation coefficient (cont’d)
Assuming X and Y have a bivariate normal distribution, the Fisher’s Z
transformed variable
$$ Z = \frac{1}{2} \log \frac{1+r}{1-r} $$
has, for large samples, an approximate normal distribution with mean
$$ \zeta = \frac{1}{2} \log \frac{1+\rho}{1-\rho} $$
and variance
$$ Var(Z) = \frac{1}{n-3} $$
Hence, weighting factor associated with Z is W = 1/Var(Z) = n-3.
Correlation coefficient (cont’d)
• meta-analysis is carried out on Z-transformed measures
and final results are transformed back to the scale of
correlation using
$$ r = \frac{e^{2Z} - 1}{e^{2Z} + 1} $$
Numerical Example
• Source: Fleiss J., Statistical Methods in Medical Research 1993; 2: 121-145.
• correlation coefficients reported by 7
independent studies in education are included in
the meta-analysis
• Comparison: association between a
characteristic of the teacher and the mean
measure of his or her student’s achievement
Example: Fleiss (1993)
==============================================================
Study    n     r        Z*       W**    WZ       WZ²
==============================================================
1        15    -0.073   -0.073   12     -0.876   0.064
2        16    0.308    0.318    13     4.134    1.315
3        15    0.481    0.524    12     6.288    3.295
4        16    0.428    0.457    13     5.941    2.715
5        15    0.180    0.182    12     2.184    0.397
6        17    0.290    0.299    14     4.186    1.252
7        15    0.400    0.424    12     5.088    2.157
--------------------------------------------------------------
Sum                              88     26.945   11.195
==============================================================
*Z = Fisher’s Z-transformation of r
** W = n-3
$$ Q = \sum W_i (Z_i - \bar Z)^2 = \sum W_i Z_i^2 - \frac{\left(\sum W_i Z_i\right)^2}{\sum W_i} = 11.195 - \frac{(26.945)^2}{88} = 2.94 $$
Q = 2.94 on 6 df is not statistically significant.
Results and discussions
• No evidence for heterogeneous association
across studies
• Fixed effect analysis may be undertaken
• Questions:
– Would a random effect analysis as shown earlier
produce a different numerical value for the combined
correlation coefficient?
– How would the weights be modified to carry out a REM?
Results and discussions (cont’d)
• the weighted mean of Z is
$$ \bar Z = \frac{\sum W_i Z_i}{\sum W_i} = \frac{26.945}{88} = 0.306 $$
• the approximate standard error of the combined
mean is
$$ SE(\bar Z) = \frac{1}{\sqrt{\sum W_i}} = \frac{1}{\sqrt{88}} = 0.107 $$
Results and discussions (cont’d)
• Test of significance is carried out using
$$ z = \frac{\bar Z}{SE(\bar Z)} = \frac{0.306}{0.107} = 2.86 $$
– this value exceeds the critical value 1.96
(corresponding to 5% level of significance), so we
conclude that average value of Z (hence the average
correlation) is statistically significant
Results and discussions (cont’d)
• 95% confidence interval for the mean of Z, $\zeta = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$, is
$$ \bar Z \pm 1.96 \times SE(\bar Z): \qquad 0.096 \le \zeta \le 0.516 $$
• Transforming back to the original scale, a 95% CI
for the parameter of interest, ρ, is
$$ 0.096 \le \rho \le 0.474 $$
– again confirming a significant association
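The whole Fleiss example can be re-run from the reported n and r values. The sketch below uses `math.atanh` for Fisher's Z and `math.tanh` for the back-transformation, which are the same functions as the log and exponential formulas above. Because it works from unrounded Z values, the results can differ from the slide figures in the third decimal place.

```python
import math

# Sample sizes and correlation coefficients from the seven studies
n = [15, 16, 15, 16, 15, 17, 15]
r = [-0.073, 0.308, 0.481, 0.428, 0.180, 0.290, 0.400]

z = [math.atanh(ri) for ri in r]          # Fisher's Z = 0.5 * log((1 + r) / (1 - r))
w = [ni - 3 for ni in n]                  # W = 1 / Var(Z) = n - 3

sum_w = sum(w)
sum_wz = sum(wi * zi for wi, zi in zip(w, z))
sum_wz2 = sum(wi * zi**2 for wi, zi in zip(w, z))

q = sum_wz2 - sum_wz**2 / sum_w           # homogeneity statistic on k - 1 = 6 df
z_bar = sum_wz / sum_w                    # weighted mean of Z
se_z_bar = 1 / math.sqrt(sum_w)           # standard error of the weighted mean
z_stat = z_bar / se_z_bar                 # test of zero average correlation

z_lower = z_bar - 1.96 * se_z_bar
z_upper = z_bar + 1.96 * se_z_bar
# Back-transform to the correlation scale: tanh(Z) = (e^(2Z) - 1) / (e^(2Z) + 1)
r_pooled = math.tanh(z_bar)
r_lower, r_upper = math.tanh(z_lower), math.tanh(z_upper)

print(f"Q = {q:.2f} on {len(n) - 1} df")
print(f"mean Z = {z_bar:.3f}, SE = {se_z_bar:.3f}, z = {z_stat:.2f}")
print(f"pooled r = {r_pooled:.3f} (95% CI {r_lower:.3f} to {r_upper:.3f})")
```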
Critical Appraisal of a
Systematic Review
(A) The Message
• Does the review set out to answer a
precise question about patient care?
– Should be different from an uncritical
encyclopedic presentation
(B) The Validity
• Have studies been sought thoroughly:
– Medline and other relevant bibliographic databases
– Cochrane controlled clinical trials register
– Foreign language literature
– "Grey literature" (unpublished or un-indexed reports:
theses, conference proceedings, internal reports,
non-indexed journals, pharmaceutical industry files)
– Reference chaining from any articles found
– Personal approaches to experts in the field to find
unpublished reports
– Hand searches of the relevant specialized journals.
Validity (cont’d)
• Have inclusion and exclusion criteria for
studies been stated explicitly, taking
account of the patients in the studies, the
interventions used, the outcomes
recorded and the methodology?
Validity (cont’d)
• Have the authors considered the
homogeneity of the studies: the idea that
the studies are sufficiently similar in their
design, interventions and subjects to
merit combination.
– this is done either by eyeballing graphs like
the forest plot or by applications of chi-square
tests (Q test)
(C) The Utility
• The various studies may have used
patients of different ages or social
classes, but if the treatment effects are
consistent across the studies, then
generalisation to other groups or
populations is more justified.
Utility (cont’d)
• Be wary of sub-group analyses where the
authors attempt to draw new conclusions
by comparing the outcomes for patients in
one study with the patients in another
study
– Be wary of "data-dredging" exercises, testing
multiple hypotheses against the data,
especially if the hypotheses were constructed
after the study had begun data collection.
Utility (cont’d)
• One may also want to ask:
– Were all clinically important outcomes considered?
– Are the benefits worth the harms and costs?