meta-analysis workshop slides


Introduction to Meta-analysis
for Intelligence Research
Materials are at: http://tinyurl.com/meta-analysisJuly
Michael McDaniel
Virginia Commonwealth University
[email protected]
Workshop at the International Summer School Interdisciplinary Approaches to the Study of Intelligence
July 12, 2016
St. Petersburg, Russia
Meta-analysis in brief
Meta-analysis in brief
• Meta-analysis is the quantitative combination of information from
multiple empirical studies to produce an estimate of the overall magnitude
of a relation, or impact of an intervention.
• Meta-analysis is a quantitative method used to combine the quantitative
outcomes (effect sizes) of primary research studies.
• Meta-analysis uses statistical procedures to determine the best estimate
of the population effect size.
• Meta-analysis examines whether the effect is uniform or whether it varies across studies.
• In the event that the effect varies across studies, meta-analytic procedures
assist the researcher in determining the sources of variation.
2
Meta-analysis in brief
Meta-analysis in brief
Meta-analysis of correlations between truth and beauty:
Sample          N     Effect Size
Manny (1983)    250   .17
Moe (1995)      114   .25
Jack (2004)     617   .20
Zappa (2011)     45   .39
• Calculate/estimate:
• The mean and variance of the effect size (e.g., r) distribution.
• The variance due to random sampling error.
• The variance that is not due to random sampling error.
• Seek to explain the non-sampling error variance.
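As a rough illustration of these steps, the short sketch below runs the calculation on the four truth-and-beauty correlations from the table above: an N-weighted mean r, the observed variance, and the variance expected from sampling error alone (one common variant of the Hunter-Schmidt formulas; the variable names are mine).

```python
# Minimal meta-analysis of the four truth/beauty correlations shown above.
ns = [250, 114, 617, 45]
rs = [.17, .25, .20, .39]

total_n = sum(ns)
mean_r = sum(n * r for n, r in zip(ns, rs)) / total_n                    # N-weighted mean r
var_obs = sum(n * (r - mean_r) ** 2 for n, r in zip(ns, rs)) / total_n   # observed variance
# Variance expected from random sampling error alone:
var_error = sum(n * (1 - mean_r ** 2) ** 2 / (n - 1) for n in ns) / total_n
var_residual = max(var_obs - var_error, 0)                               # non-sampling-error variance

print(f"mean r = {mean_r:.3f}")
print(f"observed variance = {var_obs:.5f}")
print(f"sampling error variance = {var_error:.5f}")
print(f"residual (non-sampling-error) variance = {var_residual:.5f}")
```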
3
Meta-analysis in brief
Meta-analysis in brief
Meta-analysis with corrections:
Sample          N     r     rxx (Truth)   ryy (Beauty)   Range rest. (u)
Manny (1983)    250   .17   .76           .88            .70
Moe (1995)      114   .25   .85           .84            .84
Jack (2004)     617   .20   .84           .76            .76
Zappa (2011)     45   .39   .89           .78            .90
• Estimate the mean and variance of a population distribution in which (a)
variables are assessed without measurement error, and (b) variables have no
restriction in their range.
• Seek to explain non-sampling error and non-artifactual variance.
4
Meta-analysis and systematic reviews
Meta-analysis and systematic reviews
• There are differences in the use of the terms meta-analysis and systematic
reviews across scientific disciplines.
• Meta-analysis is a quantitative method used to combine the quantitative
outcomes (effect sizes) of primary research studies.
• Meta-analysis is the statistical or data analytic part of a systematic review of
a research topic.
• The term systematic review refers to all steps in conducting a systematic
literature review. This includes defining study objectives, searching the
literature, coding data, performing the analysis of the data (meta-analysis), and drawing conclusions from the analysis.
5
Poker chips
How many red poker chips are in the bag?
• Poker chip exercise:
• Shake up the bag (3 times).
• Pick 10 chips from the bag and count how many are red.
• Record the number of red chips here: _____ .
• Put the chips back in the bag and pass it to the next person.
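If you want to try the exercise without a physical bag, here is a minimal simulation sketch; the true proportion of red chips (0.4 below) and the number of simulated participants are assumptions, not the actual contents of the workshop bag.

```python
import random

TRUE_PROPORTION_RED = 0.4   # assumed value, not the actual contents of the workshop bag
DRAWS_PER_PERSON = 10
PARTICIPANTS = 20

def draw_sample() -> int:
    """Draw 10 chips (with replacement, for simplicity) and count how many are red."""
    return sum(random.random() < TRUE_PROPORTION_RED for _ in range(DRAWS_PER_PERSON))

counts = [draw_sample() for _ in range(PARTICIPANTS)]
print(counts)                                           # each person's count of red chips
print(sum(counts) / (PARTICIPANTS * DRAWS_PER_PERSON))  # pooled estimate of the proportion
```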
6
Why do a systematic review?
Why do a systematic review?
• In general, literature reviews, whether narrative or systematic, have much
higher citation counts than primary studies.
• It is much more efficient to gain knowledge by reading one literature
review than reading 100 primary studies.
• Authors of literature reviews achieve a reputation as experts in that area.
• Particularly true with systematic reviews.
• Systematic reviews are highly cited.
• Once a systematic review is done on a literature, new primary studies need
to be framed with reference to the systematic review.
7
Objectives of this workshop
Objectives of this workshop
At the end of this talk students should:
• Understand …
• … the steps in the systematic review process.
• … some of the basic conceptual and statistical background of meta-analysis.
• … how to interpret the results of a meta-analysis.
8
Objectives of this workshop
Objectives of this workshop
• I will address meta-analyses in the two primary traditions:
• The meta-analytic approach in the “Hedges and Olkin (H&O) tradition”
(many other researchers have contributed to this tradition; e.g., Borenstein
et al. [2009], Hedges and Olkin [1985], Hedges and Vevea [1998],
Raudenbush [1994]).
• The meta-analytic approach in the “psychometric tradition” or “Hunter and
Schmidt (H&S) tradition” (which is based primarily on the work of Hunter
and Schmidt [1990, 2004] and Schmidt and Hunter [2015]; other researchers
have contributed to this tradition).
9
Resources
Resources
• These slides are available at http://tinyurl.com/meta-analysisJuly.
• There is also a full meta-analysis course with extensive lecture notes, data
sets, and free software.
• You also have access to a 30-day license for Comprehensive Meta-analysis
software.
10
Intelligence and brain volume
Demonstration of meta-analysis
• Intelligence and brain volume is a topic of historical interest.
• In 1981, Stephen Gould published his book The Mismeasure of Man in
which he asserted that brain volume was unrelated to intelligence.
• Although evidence accumulated contrary to his assertions, he did not
acknowledge the evidence in subsequent updates of the book.
• McDaniel, M.A. (2005). Big-brained people are smarter: A meta-analysis of
the relationship between in vivo brain volume and intelligence,
Intelligence, 33, 337-346.
• I will update that analysis and in the process demonstrate some
approaches to meta-analysis.
11
Literature reviews
Literature reviews, systematic reviews, meta-analysis, stages of the review process
Literature reviews
Literature reviews
• The purpose of a literature review is to summarize a body of empirical
literature. The objective is to process many research findings to draw
conclusions.
• Literature reviews are typically classified as:
• Narrative
• Systematic
13
Literature reviews
Narrative reviews
• Narrative reviews:
• Usually written by experts in the field.
• Use informal and subjective methods to collect and interpret information.
• Generally provide a narrative summary of the research literature (hence, the
name “narrative review”).
14
Literature reviews
Narrative reviews
Problems with narrative reviews:
• Different experts may perform a review on the same question and come to
different conclusions.
• Sometimes due to the review of different sets of studies.
• Results and conclusions differ across narrative reviews.
• Even when same studies are reviewed, the process of integrating them is
subjective.
• Results and conclusions differ across narrative reviews.
• Narrative reviews become less efficient with more data.
• While a researcher can combine the results of a few studies in his or her head, it becomes increasingly difficult to do so as the number of studies increases.
15
Literature reviews
Narrative reviews
• Narrative reviews have trouble with variation in results across studies:
• If all studies give the same result, it would be easy to summarize the data in
a narrative review.
• However, a collection of studies usually does not give the same results.
• As the number of studies grows, they often examine different populations.
• The size of the relationship of interest may vary in different populations (i.e., there are moderator
effects).
• As the number of studies grows, they often use different measures of the same construct.
• The size of the relationship of interest may vary depending on the measure (i.e., the measure
functions as a moderator; see Pace & Brannick, 2010).
• The narrative reviewer, who has enough trouble summarizing studies when they are all
done with the same population, same measure, etc., has a much harder task when
moderators are present.
16
Literature reviews
Systematic reviews
• What is a systematic review?
• A summary of the research literature that uses:
• systematic and explicit methods to identify and select relevant studies.
• objective techniques to combine these studies and produce results.
17
Literature reviews
Systematic reviews
• A systematic review aims to be:
• Explicit (e.g. in its statement of objectives, materials and methods).
• Systematic (e.g. in its identification of literature).
• Transparent (e.g. in its criteria and decisions).
• Unbiased.
• Reproducible (e.g. in its methodology and conclusions).
• Reproducibility (and thus being explicit, systematic, transparent, and unbiased) is the hallmark of systematic and meta-analytic reviews (Cooper & Hedges, 2009; Egger et al., 2001).
18
Literature reviews
Systematic reviews
• A systematic review is structured and explicit regarding:
• Planning the review
• Formulating the question
• Comprehensively searching for relevant literature
• Inclusion or exclusion of studies
• Extraction of data from the included studies
• Synthesis of data (meta-analysis)
• Interpretation and reporting of results
19
Literature reviews
Systematic reviews
• Why is systematic reviewing necessary?
• There may be so much literature on a topic that it is impossible to make
sense of it using traditional narrative techniques.
• Research often produces results that appear to be contradictory, even when
they are actually consistent.
• Research often produces results that are actually contradictory due to
moderator variables (populations, treatments, measures, settings,
methods, designs), and we need to be able to understand how these
influence the observed effects.
20
Literature reviews
Systematic reviews
• Why is systematic reviewing necessary?
• Among the earliest meta-analyses were syntheses of:
• 833 tests of the effectiveness of psychotherapy (Smith & Glass, 1977).
• 345 studies of the effects of interpersonal expectations on behavior (Rosenthal & Rubin,
1978).
• 725 estimates of the relation between class size and academic achievement (Glass &
Smith, 1979).
• 866 comparisons of the differential validity of employment tests for Black and White
workers (Hunter, Schmidt & Hunter, 1979).
21
Literature reviews
Systematic reviews
• Other reasons why systematic reviews are better?
• In addition to being more objective and more replicable, they:
• … are more efficient with more information.
• … can deal with variation due to moderators.
22
Literature reviews
Summary: Narrative vs. systematic reviews
• Narrative reviews:
• Influenced by authors’ point of view (subjectivity, bias).
• Author does not need to justify (or even state) the criteria for inclusion.
• Search for data does not need to be systematic and/or comprehensive.
• Methods not usually specified.
• Narrative summary or vote count.
• Cannot replicate the review.
23
Literature reviews
Summary: Narrative vs. systematic reviews
• Systematic reviews:
• Scientific approach (models itself on primary research).
• Inclusion (exclusion) criteria determined a priori.
• Comprehensive search for relevant information.
• Explicit methods of data extraction and coding.
• Meta-analysis is generally used to combine and synthesize data.
• Replicable.
24
Literature reviews
Book recommendations (recent books)
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. West Sussex, UK: Wiley.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.
25
Literature reviews
Book recommendations (recent books)
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York, NY: Russell Sage Foundation.
Cooper, H. (2010). Research synthesis and meta-analysis: A step-by-step approach (4th ed.). Thousand Oaks, CA: Sage.
Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook for systematic reviews of interventions; Version 5.1.0 [updated March 2011]: The Cochrane Collaboration. Available from www.cochrane-handbook.org.
26
Literature reviews
Some article recommendations
Berman, N. G., & Parker, R. A. (2002). Meta-analysis: Neither quick nor easy. BMC Medical Research Methodology, 2, 1-9. doi: 10.1186/1471-2288-2-12
Chalmers, I., Hedges, L. V., & Cooper, H. (2002). A brief history of research synthesis. Evaluation & the Health Professions, 25, 12-37. doi: 10.1177/0163278702025001003
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8. doi: 10.3102/0013189X005010003
Hedges, L. V. (1992). Meta-analysis. Journal of Educational Statistics, 17, 279-296. doi: 10.2307/1165125
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486-504. doi: 10.1037/1082-989x.3.4.486
Kepes, S., McDaniel, M., Brannick, M., & Banks, G. (2013). Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-analytic Reporting Standards). Journal of Business and Psychology, 28, 123-143. doi: 10.1007/s10869-013-9300-2
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540. doi: 10.1037/0021-9010.62.5.529
Schmidt, F. L., & Hunter, J. E. (2003). History, development, evolution, and impact of validity generalization and meta-analysis methods, 1975-2001. In K. R. Murphy (Ed.), Validity generalization: A critical review (pp. 31-65). Mahwah, NJ: Lawrence Erlbaum.
Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M. (1993). Refinements in validity generalization methods: Implications for the situational specificity hypothesis. Journal of Applied Psychology, 78, 3-12. doi: 10.1037/0021-9010.78.1.3
27
Sampling error
Demonstration of CMA
Demonstration of CMA
• Poker chip data
29
Sampling error
Introduction to sampling error
• Poker chip exercise review questions:
• What is the population?
• What is the sample?
• What is the population parameter?
• What is the sample statistic?
• What is the sampling distribution?
• What is the standard error?
• Why does the sample statistic sometimes differ from the population
parameter?
• What happens when one increases the size of the sample?
30
Sampling error
Random sampling error
• Sampling error is random error.
• It is somewhat predictable, nevertheless.
• Small samples are more likely to be unrepresentative of the population than
large samples.
31
Sampling error
Random sampling error
• The relationship between sampling error and sample size is asymptotic.
• Increasing sample size results in decreasing random sampling error.
• As sample size increases, one gets diminishing returns in the reduction of random sampling error.
[Figure: sampling error plotted against sample size]
32
Sampling error
Random sampling error
• If the true proportion in a population is 50% *
If sample size is …   … sampling error is
10                    +/- 31.0%
50                    +/- 13.9%
100                   +/- 9.8%
200                   +/- 6.9%
1,000                 +/- 3.1%
3,000                 +/- 1.8%
• * As the proportion deviates from .50, sampling errors get smaller.
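These margins of error are the usual 95% half-widths for a proportion, 1.96 · sqrt(p(1 − p)/n); the short sketch below reproduces the table.

```python
from math import sqrt

p = 0.50  # true population proportion
for n in (10, 50, 100, 200, 1_000, 3_000):
    margin = 1.96 * sqrt(p * (1 - p) / n)  # 95% margin of error
    print(f"n = {n:>5}: +/- {margin:.1%}")
```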
33
Sampling error
Random sampling error
• Sampling error expressed as confidence intervals.
• Correlation of .20 with varying sample sizes.
34
Sampling error
Random sampling error
• In addition to sample size, sampling error also is a function of the
magnitude of the population value.
• One can estimate very large values with small samples.
• If the relationship between two variables is perfect, one does not need to
collect too many cases to find this out.
35
Sampling error
Random sampling error
• Examples of sampling error as a function of effect size magnitude:
• If 99% of people believe that drinking poison is a bad idea, one does not
need to poll very many people to get an accurate estimate.
• If only 1% of people believe that “earth worm” flavored ice cream is delicious
and good to eat, one does not need to poll very many people to get an
accurate estimate.
36
Sampling error
Random sampling error
• Because sampling error is random …
• … sometimes the sampling error will cause the sample to underestimate the
population value.
• … sometimes the sampling error will cause the sample to overestimate the
population value.
• Unbiased error.
37
Sampling error
Random sampling error and meta-analysis
• What does this have to do with meta-analysis?
• Meta-analysis is based on sampling error theory.
• Each individual study represents one sample from a given population and
each study sample is likely to differ from the population due to sampling
error.
38
Sampling error
Random sampling error and meta-analysis
• Because …
• … meta-analysis combines the results from many different studies, and
• … sampling error across studies tends to cancel out,
• sampling errors tend to form a normal distribution, and all of the sampling
errors in one direction will be balanced by the sampling errors in the other
direction.
• The meta-analytic mean gives a close approximation to the population
value.
• Sampling error adds error variance to the distribution of observed effect
sizes. On average, the variance of the observed distribution will
substantially overestimate the variance of the population distribution.
39
Effect Sizes
Effect sizes
Effect sizes
• Study results are usually represented quantitatively as effect sizes.
• Effect sizes are measures of direction and strength (magnitude) of a
relation.
• Effect sizes are independent of sample size.
• Effect sizes have the advantage of being comparable (to mean the same
thing) across all studies in a meta-analysis.
41
Effect sizes
Effect sizes
• What are not effect sizes (examples):
• t-test value
• F-test value
• Chi-square value
• Why not?
• Significance test = f(effect size magnitude; sample size).
• Confound effect size and sample size.
42
Effect sizes
Effect sizes
• For inclusion in a meta-analysis, a study needs to provide an effect size
(ES) measure, and a standard error for that ES.
• OR information that you can convert into an effect size and a standard error,
for the Hedges and Olkin tradition analyses.
• OR, for the Hunter and Schmidt tradition, a correlation and sample size (and reliability estimates, standard deviations, etc., for psychometric meta-analysis).
43
Effect sizes
Effect sizes
• Common effect sizes in intelligence research are:
• Correlation coefficients
• Intelligence correlated with X
• Standardized mean differences
• Mean intelligence differences by some grouping variable (e.g., demographic differences)
• Any measure of magnitude for which a standard error can be calculated
would be an effect size on which a meta-analysis can be conducted:
• Heritability estimates, means, odds ratios, relative risk, risk ratios
44
Effect sizes
Standardized mean difference
• Represents a standardized group contrast on an inherently continuous
measure.
• Uses the pooled standard deviation (some situations use control group
standard deviation).
• Commonly called “d.”
• Hedges’ “g” is the bias-corrected version of this effect size measure.
d = (mean difference) / SD_within

SE_d = sqrt( 1/N_G1 + 1/N_G2 + d² / (2 · (N_G1 + N_G2)) )
45
Effect sizes
Standardized mean difference
• Calculate d and SEd
            Mean   SD   N
Treatment   110    20   50
Control     100    20   50
• The difference between the treatment and control means is 10.
• The within group SD is 20.
• Sample size for each group is 50.
46
Effect sizes
Standardized mean difference
d = 10 / 20 = .5

SE_d = sqrt( 1/50 + 1/50 + .5² / (2 · (50 + 50)) ) = .203
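A minimal code sketch of the same calculation (the function name is mine):

```python
from math import sqrt

def standardized_mean_difference(mean_t, mean_c, sd_within, n_t, n_c):
    """Compute d and its standard error using the formulas on the previous slide."""
    d = (mean_t - mean_c) / sd_within
    se_d = sqrt(1 / n_t + 1 / n_c + d ** 2 / (2 * (n_t + n_c)))
    return d, se_d

print(standardized_mean_difference(110, 100, 20, 50, 50))  # -> (0.5, ~0.203)
```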
47
Effect sizes
Other effect sizes
• Standardized mean difference (d) in matched designs
• Standardized gain score
• Gain or change between two measurement points on the same variable.
• Pre-post correlation needs to be taken into account in computing d or g.
48
Effect sizes
Converting among effect sizes
• David Wilson’s ES calculator:
• http://cebcp.org/practical-meta-analysis-effect-size-calculator/
• http://mason.gmu.edu/~dwilsonb/ma.html
• Also an Excel file from Dr. Wilson is available
• James DeCoster ES calculator
• Excel file
• Most commercial programs that do meta-analysis also have an ES
calculation component.
• Vary in how flexible and comprehensive they are.
49
Effect sizes
Book recommendations
Grissom, R. J. & Kim, J. J. (2012). Effect sizes for
research: univariate and multivariate
applications (2nd ed.). New York, NY: Routledge.
Ellis, P. D. (2010). The essential guide to effect sizes:
statistical power, meta-analysis, and the
interpretation of research results. Cambridge,
UK: Cambridge University Press.
50
Two camps of meta-analysis
Two camps of meta-analysis
Two camps of meta-analysis
• Hedges and Olkin tradition:
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York, NY:
Academic Press.
Hedges, L. V., & Vevea, J. L. (1998). Fixed- and random-effects models in meta-analysis.
Psychological Methods, 3, 486-504. doi: 10.1037/1082-989x.3.4.486
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. West Sussex, UK: Wiley.
• Hunter and Schmidt tradition (also known as psychometric meta-analysis):
Hunter, J. E. & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in
research findings. 1st edition. Thousand Oaks, CA: Sage.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in
research findings. 2nd edition. Thousand Oaks, CA: Sage.
Schmidt, F. L. & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in
research findings. 3rd edition. Thousand Oaks, CA: Sage.
52
Two camps of meta-analysis
Two camps of meta-analysis
Comparison of the Hedges and Olkin (H&O) tradition and the Hunter and Schmidt (H&S, psychometric) tradition:
• Market share: The H&O tradition has most of the market share; the H&S tradition is most likely to be found in I/O psychology and management (OB and HR).
• Statistical underpinnings: Well elaborated in the H&O tradition; less well elaborated in the H&S tradition.
• Recognition of statistical artifacts: H&O usually considers only sampling error; H&S emphasizes the need to correct for many statistical artifacts.
• Significance testing: H&O incorporates statistical significance testing; H&S shuns statistical significance testing.
• Sensitivity analyses: H&O includes many types and emphasizes graphical displays; H&S is beginning to use sensitivity analyses.
53
Two camps of meta-analysis
Two camps of meta-analysis
• Both camps of meta-analysis seek to know what the population mean is.
• Hedges and Olkin practitioners will usually define the population mean as
some average of the observed effect sizes.
• Hunter and Schmidt practitioners will define the population mean as some
average of observed effect sizes that has been corrected for statistical
artifacts, such as measurement error and possibly range restriction (or range
enhancement).
• These corrections are possible within the Hedges and Olkin framework, but almost
nobody ever does them (not necessarily easy to do).
54
Two camps of meta-analysis
Two camps of meta-analysis
• Hedges and Olkin meta-analysis is an analysis of effect sizes at the
indicator level (the observed effect size is judged to be a non-biased
estimate of the population effect size).
• Hunter and Schmidt/psychometric meta-analysis is an analysis of effect
sizes at the latent level (the observed effect size is judged to be a biased
estimate of the population effect size).
55
Two camps of meta-analysis
Two camps of meta-analysis
• Both camps of meta-analysis …
• … want to understand what causes variance across studies in the magnitude
of the effect sizes.
• … recognize that the variance across studies may be a function of both
“statistical noise” and systematic sources.
• … recognize random sampling error as one source of non-systematic
“statistical noise” (i.e., statistical artifact).
• … recognize one class of the systematic variance to be moderators.
56
Two camps of meta-analysis
Two camps of meta-analysis
• Hunter and Schmidt (i.e., psychometric meta-analysis) recognize that
observed effect sizes may differ across studies due to differences across
studies in measurement error and range restriction (or enhancement) and
other sources of systematic variance (i.e., other statistical artifacts).
57
Two camps of meta-analysis
Two camps of meta-analysis
• Hedges and Olkin camp members consider two possible sources of
variance:
σ²_Observed = σ²_Moderator + σ²_Sampling error
58
Two camps of meta-analysis
Two camps of meta-analysis
• Hunter and Schmidt consider multiple sources of variance:
σ²_Observed = σ²_Moderator + σ²_Sampling error + σ²_Other artifactual variance
59
Two camps of meta-analysis
Two camps of meta-analysis
• To the extent that there is artifactual variance, those in the Hedges and
Olkin camp will lump it in with the moderator variance.
• This would be in error.
• Although the Hedges and Olkin camp does not use the word “artifactual,”
they would agree that sampling error is a source of uninteresting noise in
the distribution of effect sizes.
• Hunter and Schmidt call uninteresting noise in the distribution of effect
sizes “artifactual variance,” and they refer to the sources of this noise as
“artifacts.”
60
Two camps of meta-analysis
Book recommendations
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H.
R. (2009). Introduction to meta-analysis. West Sussex,
UK: Wiley.
Schmidt, F. L., & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error and bias in research findings (3rd ed.). Thousand Oaks, CA: Sage.
Schulze, R. (2004). Meta-analysis: A comparison of approaches. Cambridge, MA: Hogrefe & Huber.
For a good summary of the book, see: Schulze, R. (2007). Current methods for meta-analysis: Approaches, issues, and developments. Zeitschrift für Psychologie / Journal of Psychology, 215, 90-103. doi: 10.1027/0044-3409.215.2.90
61
Effect size weighting
Effect size weighting
Effect size weighting
• Meta-analyses typically weight effect sizes to give more weight to effect
sizes from samples with greater precision.
• Precision is a function of the standard error of the effect size (and thus,
indirectly, sample size).
63
Effect size weighting
Effect size weighting; H&O tradition
• In meta-analyses in the Hedges and Olkin tradition, effect sizes are
weighted in one of two ways, depending on the estimation model.
• Fixed-effects model:
• Effect sizes are weighted by the inverse of the sampling error variance (i.e.,
precision) when all of the variance in the distribution of effect sizes can be
attributed to sampling error (fixed-effects model).
• A sample’s weight (W_i) is defined as the inverse of the sample’s sampling error variance (V_i), which is its squared standard error (SE_i²):
• W_i = 1 / V_i, where V_i = SE_i²
64
Effect size weighting
Effect size weighting; H&O tradition
• Random-effects model:
• When sampling error does not explain all of the variance in the observed
effect sizes, the weight is altered (to account for the fact that sampling error
does not explain all of the variance):
• W_i = 1 / (V_i + τ²),
where V_i is the sampling error variance (1/(N_i − 3) for correlations transformed to Fisher’s z) and τ² (tau squared) is the between-sample variability (i.e., the variability that cannot be attributed to sampling error).
• Although V_i is different for each effect size, the τ² portion of the effect size weight is constant for all effect sizes.
• The τ² component of the weight tends to make the weights more similar than in the fixed-effects case.
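To make the contrast concrete, here is a small sketch computing both sets of weights for a few hypothetical effect sizes; the sampling variances and τ² below are made-up values (in practice τ² is estimated from the data, e.g., with the DerSimonian-Laird estimator).

```python
# Hypothetical sampling variances (V_i = SE_i^2) for four effect sizes.
variances = [0.010, 0.025, 0.004, 0.050]
tau_squared = 0.02  # assumed between-study variance; normally estimated from the data

fixed_weights = [1 / v for v in variances]                   # W_i = 1 / V_i
random_weights = [1 / (v + tau_squared) for v in variances]  # W_i = 1 / (V_i + tau^2)

# Normalized weights show that the random-effects weights are more similar to one another.
print([round(w / sum(fixed_weights), 3) for w in fixed_weights])
print([round(w / sum(random_weights), 3) for w in random_weights])
```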
65
Effect size weighting
Effect size weighting; H&S tradition
• In the H&S tradition (i.e., the psychometric tradition), effect sizes are weighted by sample size when the magnitudes of the observed correlations are not corrected for biases (e.g., statistical artifacts such as measurement error).
• W_i = N_i
• Large sample studies have smaller standard errors and thus greater
precision.
66
Effect size weighting
Effect size weighting; H&S tradition
• In the psychometric tradition, effect sizes can be corrected for biases (i.e.,
statistical artifacts other than sampling error) to estimate the population
effect size in the absence of such biases.
• These corrected correlations have a larger standard error and thus lower
precision.
• W_i = N_i · A_i²,
where A_i is a sample’s compound attenuation factor, which accounts for sample-specific bias due to measurement error (i.e., unreliability) and range restriction.
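A sketch of this weighting using the reliabilities from the earlier truth-and-beauty example; for brevity the compound attenuation factor below reflects measurement error only, so treat it as an illustration rather than the full Hunter-Schmidt procedure.

```python
from math import sqrt

# (N, reliability of x, reliability of y), taken from the earlier truth/beauty table.
samples = [(250, .76, .88), (114, .85, .84), (617, .84, .76), (45, .89, .78)]

weights = []
for n, rxx, ryy in samples:
    a = sqrt(rxx * ryy)          # compound attenuation factor (measurement error only;
                                 # range restriction is omitted here for brevity)
    weights.append(n * a ** 2)   # W_i = N_i * A_i^2

print([round(w, 1) for w in weights])
```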
67
Effect size weighting
Effect size weighting; H&S tradition
• As a result, the weight of a sample (i.e., its effect size) is not only
dependent on its size, but also on the degree of statistical error (e.g.,
measurement error and error due to range restriction) in the variables of
interest (e.g., samples with measures that have less bias due to statistical
error receive more weight than samples with more error).
• Meta-analysis in the H&S tradition (i.e., psychometric meta-analysis) is
considered a random-effects model.
• Its weights are the same regardless of whether sampling error accounts for
all observed variance or not.
68
Stages of a systematic (meta-analytic) review
Stages of a systematic (meta-analytic) review
Systematic review: Just statistics?
• Meta-analytic reviews, systematic reviews, and meta-analysis are often viewed
and taught as a set of statistical procedures.
• Meta-analysis (actually, a systematic or meta-analytic review) is best viewed as a
multi-stage process with the statistical analysis (the meta-analysis) as only one of
the stages.
• Remember (Kepes et al., 2013, p. 124):
• Meta-analysis is a quantitative method used to combine the quantitative outcomes
(effect sizes) of primary research studies. Meta-analysis is the statistical or data
analytic part of a systematic review of a research topic.
• A systematic review follows a specific, replicable protocol to collect and evaluate
scientific evidence with the primary objective of producing answers to research
questions that cannot be addressed adequately by single studies (Cooper, 1998;
Cooper & Hedges, 2009).
• However, meta-analysis, as a statistical technique, can be used to analyze data
without the systematic review process (Cooper & Hedges, 2009).
70
Stages of a systematic (meta-analytic) review
Stages of a systematic review
• Most of the work involved in conducting a research synthesis is not spent
in statistical analysis.
• The scientific contribution of the synthesis is dependent on all stages of
the meta-analysis and not just the statistical analysis stage.
• Some systematic reviews do not do (i.e., include) a meta-analysis.
71
Stages of a systematic (meta-analytic) review
Stages of a systematic review
• What a systematic review should contain:
• A clearly defined, explicit question.
• A comprehensive and systematic search for studies.
• An explicit, reproducible strategy for screening and including studies
(inclusion/exclusion criteria).
• Explicit, reproducible data extraction (coding).
• Appropriate analysis and reporting of results.
• Interpretation supported by data.
• Implications for future research, and if relevant, for policy or practice.
72
Stages of a systematic (meta-analytic) review
Stages of a systematic review
• Cooper’s (1998) five stage model:
• Problem formulation.
• Data collection.
• Data evaluation.
• Data analysis and interpretation.
• Presentation of results.
Cooper, H. (1998). Synthesizing research: A guide for literature reviews (3rd ed.). Thousand Oaks, CA: Sage.
73
Stages of a systematic (meta-analytic) review
Stage 1: Problem formulation
• A clearly defined explicit question will provide guidance regarding how to:
• Collect studies (with their samples).
• Check which studies (i.e., samples) should be included.
• Conduct the analyses.
• Interpret the results.
74
Stages of a systematic (meta-analytic) review
Stage 1: Problem formulation
• Does it pay to do this meta-analysis?
• Is it an important or “interesting” problem?
• Does one have the resources (literature search, retrieval, quality
assessment, coding, etc.)?
• Has someone already done it?
• An update might (not!) be a large enough contribution to warrant publication.
• Major constraint: Enough primary research on a topic must exist before a
synthesis can be conducted.
• How much primary research is enough?
75
Stages of a systematic (meta-analytic) review
Stage 1: Problem formulation
• How much primary research is enough to conduct a systematic review?
• From a statistical point of view, one only needs two effect sizes.
• The number of effect sizes (i.e., samples) one has in the meta-analytic distribution and the meta-analytic sample size (the combined sample size of all primary samples in the meta-analytic distribution) both help determine the contribution of the meta-analysis.
• Is a systematic review, including the meta-analysis, of two effect sizes a
useful scientific contribution?
76
Stages of a systematic (meta-analytic) review
Stage 1: Problem formulation
• Systematic review protocol:
• A protocol is a plan for conducting a systematic/meta-analytic review.
• Why should one write a protocol?
• Forces one to read and understand the background.
• Makes one formulate a focused question.
• Makes one plan the information retrieval strategy.
• Makes one think through and describe inclusion/exclusion criteria clearly.
• Makes one think about the data to be collected and the methods used to
analyze them.
77
Stages of a systematic (meta-analytic) review
Stage 1: Problem formulation
• Good outline for a protocol:
• Section 4.5 in Higgins, J. P. T., & Green, S. (Eds.). (2011). Cochrane handbook
for systematic reviews of interventions; Version 5.1.0 [updated March 2011]:
The Cochrane Collaboration. Available from www.cochrane-handbook.org.
78
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Data collection and data evaluation includes:
• Search the literature.
• Retrieve information (i.e., data).
• Evaluate information based on inclusion criteria.
• Appraise the quality (i.e., features) of each study and its sample(s).
• Extract data and code features of included samples of studies.
• Document all activities.
• Remember, reproducibility, which requires transparency, etc., is the hallmark of a
systematic review!
79
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Documentation in a 2007 JAP article:
• We conducted a search of the OCB literature by using a number of online
databases (e.g., Web of Science, PsycINFO) as well as by examining the
reference lists of previous reviews.
• Is this explicit, transparent, replicable?
Hoffman, B. J., Blair, C. A., Meriac, J. P., & Woehr, D. J. (2007). Expanding the criterion domain? A quantitative review of the OCB literature. Journal of
Applied Psychology, 92, 555-566. doi: 10.1037/0021-9010.92.2.555
80
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Documentation in a 2006 JAP article:
• We began with an automated search of PsycINFO (Psychological Abstracts)
and ABI/Inform using the key words compensation satisfaction, pay
satisfaction, compensation equity, pay equity, compensation fairness, and pay
fairness. We also searched manually 12 journals for the years 1960 through
2003: Academy of Management Journal, […] and Personnel Psychology. We
chose the year 1960 to begin this search because the first formal attempts to
measure pay satisfaction (e.g., the JDI; Smith et al., 1969) and the first theories
of pay satisfaction (e.g., Lawler, 1971) were developed in the 1960s and early
1970s, and we were unaware of any empirical work on pay level satisfaction
before that time. We also examined the empirical studies that included pay
level satisfaction for references to other publications or articles that might
have included pay level satisfaction.
• Is this explicit, transparent, replicable?
Williams, M. L., McDaniel, M. A., & Nguyen, N. T. (2006). A meta-analysis of the antecedents and consequences of pay level satisfaction.
Journal of Applied Psychology, 91, 392-413. doi: 10.1037/0021-9010.91.2.392
81
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Data collection: How to search:
• Search comprehensively.
• All domains, no language restriction, unpublished and published literature, up-to-date.
• Document the search (replicability).
• Good overview article:
Rothstein, H. R. (2012). Accessing relevant literature. In H. M. Cooper (Ed.),
APA handbook of research methods in psychology (Vol. 1: Foundations,
planning, measures, and psychometrics, pp. 133-144). Washington, DC:
American Psychological Association.
82
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Documenting the search:
• Search terms and the number of hits returned from PsycINFO:
Terrizzi, J.A., Shook, N.J., & McDaniel, M.A. (2013). The behavioral immune system and social conservatism: A
meta-analysis. Evolution and Human Behavior, 34, 99-108.
83
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Documentation of the data collection and
evaluation (i.e., winnowing) process
• Flow chart of samples identified,
included, and excluded (example).
Kepes, S., McDaniel, M. A., Brannick, M. T., & Banks, G. C. (2013). Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-Analytic Reporting Standards). Journal of Business and Psychology, 28, 123-143. doi: 10.1007/s10869-013-9300-2 (see the supplementary materials).
84
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Coding form (1):
• One needs a coding form (typically digital, maybe in paper form) to record
information from the study.
• In addition to the study citation and effect size, one typically records:
• Sample size.
• Information to correct for statistical artifacts (psychometric meta-analysis).
• Variables that might explain variance across studies.
• No form is appropriate for all reviews.
• Forms will need to be adapted to make them suitable to record the
information required for your review.
85
Stages of a systematic (meta-analytic) review
Stages 2 & 3: Data collection and evaluation
• Coding form (2):
• ID Information
• Study ID; Coder ID; Date of completion of this form; Title of study; Source; Authors; Date
of publication; Etc.
• Study characteristics:
• Language of study; Type of study (published article, technical report, conference paper,
abstract, unpublished report, etc.); Etc.
• Sample-, context-, and methods-related characteristics:
• Sample-related characteristics: Sample size (number of individuals and “units” in the
sample); Sample type (e.g., children, students, employees, etc.); Demographics (e.g.,
gender, age, etc.);
• Context-related characteristics: Industry type; Geographical region; Etc.
• Methods-related characteristics: Unit of analysis; Design/data structure; Type of measure (and source of data); Hypothesized/unhypothesized relation; Etc.
86
Stages of a systematic (meta-analytic) review
Stage 4: Data analysis and interpretation
• Check the data!!!!
• Before you get to the data analysis, assume there are errors in your data.
• Run descriptive statistics and frequency analyses for a multitude of
variables.
• Sort your data by the magnitude of the effect sizes. Check the coding of the
several largest and several smallest effect sizes.
• Were the effect sizes coded wrong or are they outliers?
• Sort your data by sample size. Audit the coding of the samples with the
largest sample sizes.
• The large sample size studies will have a large impact on your results because the effect
sizes from the large sample size studies will receive larger weights.
• Run outlier analyses
87
Stages of a systematic (meta-analytic) review
Stage 5: Presentation of results
• In general, be very transparent and use tables and figures, especially
graphs, to communicate your findings.
• See the supplemental materials from Kepes, S., McDaniel, M. A., Brannick,
M. T., & Banks, G. C. (2013). Meta-analytic reviews in the organizational
sciences: Two meta-analytic schools on the way to MARS (the Meta-Analytic
Reporting Standards). Journal of Business and Psychology, 28, 123-143. doi:
10.1007/s10869-013-9300-2
88
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Two statistical estimation models:
• Fixed-effects model.
• Random-effects model.
90
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Key assumptions of the fixed-effects meta-analytic model:
• There is one “true” effect size which underlies all the samples in the analysis.
• All differences in observed effects are due to sampling error.
• Key assumptions of the random-effects meta-analytic model:
• No such assumptions are made under the random-effects model.
• The “true” effect is allowed to vary from sample to sample (or study to
study).
• Samples differ in the mixes of participants, in the implementations of interventions, and
in other ways that might influence the ES.
• There may be different effect sizes in different samples (or studies).
• Hence, differences in observed effects are due to sample (study, and other)
characteristics, not just sampling error.
91
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Random-effects meta-analytic model:
• The effect size will vary from one study to the next for two reasons:
• The first is random error within studies, as in the fixed effect model.
• The second is true variation in effect size from one study to the next.
• If we had an infinite number of samples (meeting our inclusion criteria), the
“true” effect size would be observed to be distributed about some mean.
• The samples that actually were collected and are in the meta-analytic
distribution are assumed to represent a random sample of these studies
with their unique effect sizes.
• Hence the term “random-effects.”
92
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• We distinguish between a “true” effect size and an “observed” effect size.
• “True” effect size:
• A sample’s true effect size is the effect size in the underlying population, and
is the effect size that we would observe if the sample would be infinitely
large (and, thus, has no sampling error).
• Psychometric meta-analysis will offer another definition of “true” effect
size.
• Observed effect size:
• A sample’s observed effect size is the effect size that is actually observed.
93
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Under the fixed-effect model we assume that all samples in the meta-analysis share a common (“true”) effect size.
• Best called a “common effects model” (and not a fixed-effects model).
• The observed effect size varies from one sample to the next only because
of sampling error.
94
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Graphs and plots:
• In many graphs and plots, for each sample …
• … a circle represents the “true” effect size.
• … a square represents the observed effect size.
95
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Fixed-effects model:
96
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Fixed-effects model:
97
Fixed-effects vs. random-effects models
Fixed-effects vs. random-effects models
• Random-effects model:
98
Sources of artifactual variance
Sources of artifactual variance
Sources of artifactual variance
• Sources of artifactual variance in a set of samples:
• Introduction to psychometric meta-analysis.
• Study and sample artifacts.
• In this presentation, we are using correlations as effect sizes. The
reliability corrections are also applicable to standardized mean differences.
100
Sources of artifactual variance
Psychometric meta-analysis
• Sampling error is a major source of error in estimating population
parameters, and population level relations.
• A variety of methodological problems (artifacts) such as error of
measurement and range restriction affect real studies and distort their
results (i.e. what we observe); this obscures our ability to know the “true”
relations that exists in the population.
101
Sources of artifactual variance
Psychometric meta-analysis
• However, we could …
• … combine data from individual studies into one “super” (i.e., big/large)
study:
• We could reduce the impact of sampling error on our results.
• … correct for the influence of artifacts:
• We could reduce their distorting effects on study results.
• The result would be a better estimate of population level (“true”)
parameters and relations.
102
Sources of artifactual variance
Psychometric meta-analysis
• Psychometric meta-analysis:
• Calculates population parameter estimates (ρ) by correcting for sampling
error variance and other artifacts.
• Another way of saying this is that it quantifies and removes the impact of
(some) methodological deficiencies on research results.
103
Sources of artifactual variance
Sources of artifactual error
• Results may vary from study to study due to substantive reasons
(moderators).
• Understanding these substantive differences is often the primary goal of a
meta-analysis.
• Variance related to moderators is substantive variance, not artifactual
variance.
• Results also vary from study to study due to reasons that we usually don’t
find interesting.
• Artifactual variance is variance that we don’t find interesting.
104
Sources of artifactual variance
Sources of artifactual error
• For example, results vary from study to study or sample to sample due to
random sampling error.
• Random sampling error is a source of artifactual error.
• Other sources of artifactual error are differences across studies in:
• Measurement error.
• Dichotomization of continuous variables.
• Range variation (range restriction and enhancement).
• Imperfect construct validity.
• Transcriptional errors.
105
Sources of artifactual variance
Measurement error
• Measures have less than perfect reliability.
• Measurement error causes an observed effect size to underestimate the
population effect size.
• For correlations, both variables will (are highly likely to) have
measurement error.
• For standardized mean differences and proportions, the dependent
variable will have measurement error.
• Measurement error always works in the direction of causing the observed
effect size to underestimate the population effect size.
• One way of understanding why this occurs is through classical test theory.
106
Sources of artifactual variance
Measurement error
• Classical test (true score) theory holds that an observed score is a function
of a true score and an error score:
O = T + E
• Thus, the variance of a distribution of observed scores is the sum of the
true score variance and the error score variance:
σ²_O = σ²_T + σ²_E
107
Sources of artifactual variance
Measurement error
• The error is random and has a mean of zero. For a distribution of observed
scores, this error does not alter the mean but adds variance to the
distribution.
• Thus, for a given measure, its distribution of true scores has less variance
than its distribution of observed scores.
• Recall:
σ²_O = σ²_T + σ²_E
… and, thus, σ²_T = σ²_O − σ²_E
108
Sources of artifactual variance
Measurement error
• A correlation coefficient measures the degree of linear covariation
between two variables.
• The correlation between two variables measured without error variance is the population correlation (assuming the absence of any other statistical artifacts).
• Thus, in the absence of other statistical artifacts, the correlation between the true scores of two variables is the population correlation.
• Error variance is random. When correlating two observed variables, the
error variances of the two variables cannot covary.
• The error variance is “noise.”
• The scattergram of the observed variables is more circular than the
scattergram of the true score variables.
109
Sources of artifactual variance
Measurement error
• The correlation between two observed variables is less than the
correlation of the variables in true score form.
• Therefore, measurement error causes an observed correlation to
underestimate its population value.
110
Sources of artifactual variance
Measurement error
• One can estimate the population correlation from the observed
correlation, by dividing the observed correlation by the product of the
square roots of the reliabilities:
ρ = r / sqrt( rel_A · rel_B )
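The same correction as a small code sketch (the function name is mine):

```python
from math import sqrt

def correct_for_attenuation(r_observed, rel_a, rel_b):
    """Estimate the correlation between the constructs as if measured without error."""
    return r_observed / sqrt(rel_a * rel_b)

# Example using the first row of the earlier truth/beauty table:
print(correct_for_attenuation(0.17, 0.76, 0.88))  # ~0.21
```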
111
Sources of artifactual variance
Measurement error
• Measurement error also affects standardized mean differences (e.g., the d
effect size).
• The d effect size is the mean difference between groups expressed in a z-score metric:
d = (x̄_E − x̄_C) / σ
• The standard deviation in the denominator is a function of the standard
deviations in the two groups.
• As measurement error variance is added to true score variance, the
observed variance of the dependent variable increases.
112
Sources of artifactual variance
Measurement error
113
Sources of artifactual variance
Measurement error
• As the observed variance increases, so does the denominator of the d
effect size calculation.
• As the denominator increases, the d effect size decreases.
• Note that measurement error does not affect the means of the variables (on average); it only increases variance!
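A tiny numeric illustration with assumed values: reliability is the ratio of true-score variance to observed variance, so adding measurement error inflates the observed SD in the denominator and shrinks d.

```python
from math import sqrt

mean_diff = 10        # true mean difference between groups
sd_true = 20          # true-score SD within groups
reliability = 0.80    # assumed reliability of the dependent variable

var_observed = sd_true ** 2 / reliability    # observed variance = true variance / reliability
d_true = mean_diff / sd_true                 # 0.50
d_observed = mean_diff / sqrt(var_observed)  # shrunken by measurement error
print(d_true, round(d_observed, 3))          # 0.5 0.447
```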
114
Sources of artifactual variance
Measurement error
• In sum, measurement error reduces the magnitude of the observed effect
size below its population value.
• For a distribution of observed effect sizes, with measurement error as the
sole operating artifact, the mean observed effect size will be an
underestimate of the mean population effect size.
115
Sources of artifactual variance
Range variation
• A correlation is influenced by the range of variation in the variables used in
calculating the correlation.
• At an extreme end, if a variable has a zero variance, the variable is actually a
constant and will correlate zero with all other variables.
• Likewise, variables with some variance, but not much, will have correlations
of small magnitude with other variables.
116
Sources of artifactual variance
Range variation
• Example:
• In the U.S., the primary tests used to screen applicants for college are the
SAT and the ACT.
• SAT (or ACT) scores are not used at some colleges to screen college
applicants.
• At such schools, SAT (or ACT) scores have a broad range and can correlate with other
variables.
• In contrast, other schools use the SAT (or ACT) in making admission
decisions and only admit those with extremely high scores.
• In these very selective schools, the SAT (or ACT) has very little variance, and cannot
correlate with anything.
117
Sources of artifactual variance
Range variation
• Range variation in variables may be a function of several factors:
• Direct selection on a predictor (only select the best).
• Indirect selection on a predictor (or predictors) (sample is selected using a
predictor that is correlated with the predictor of interest).
• Attrition (best get promoted; the worst get fired).
118
Sources of artifactual variance
Range variation
• Consider the correlation between intelligence and brain volume.
• Under U.S. law, participants in research must be volunteers.
• Volunteers are typically people with more free time, higher income, and
higher intelligence.
• Thus, research in intelligence tends to have range restriction on
intelligence.
• The range restriction causes the effect sizes (e.g., correlation coefficients)
to underestimate the population effect size.
119
Sources of artifactual variance
Range variation
• This range restriction in intelligence is best described as indirect range
restriction because it is due to the relationship between intelligence and
being a volunteer.
• If a study required participants to have an IQ score of 115 or higher, then
the range restriction is best described as direct range restriction. I suspect
that this is uncommon in intelligence research (and most research).
120
Sources of artifactual variance
Range variation
• The way to handle range variation is to define a reference population and
express all observed effect sizes in terms of the reference population.
• One can adjust an effect size to what it would be if the sample had the same
variance as the reference population.
• In personnel selection, the reference population is typically defined as the
applicant pool.
• One adjusts the correlation in the observed sample to that which would
have been obtained if the selection tool (e.g., the interview) had the same
variance as in the reference population.
• Most intelligence measures are designed to have a reference population
standard deviation of 15.
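As an illustration, the classic correction for direct range restriction (Thorndike’s Case II) can be sketched as follows; indirect range restriction, which is more common, requires a different and more involved correction described by Hunter and Schmidt, so this is only a simplified example with made-up numbers.

```python
from math import sqrt

def correct_direct_range_restriction(r_restricted, sd_restricted, sd_reference):
    """Thorndike Case II: estimate the correlation in the reference population."""
    big_u = sd_reference / sd_restricted   # degree of range restriction
    return (big_u * r_restricted) / sqrt((big_u ** 2 - 1) * r_restricted ** 2 + 1)

# Made-up example: r = .25 observed in a sample whose IQ SD is 10
# rather than the reference-population SD of 15.
print(correct_direct_range_restriction(0.25, 10, 15))  # ~0.36
```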
121
Sources of artifactual variance
Summary of study artifacts
• The degree of distortion in observed effect sizes due to study artifacts
varies across studies. These differences across studies result in artifactual
variance which cause the observed variance to be larger than the
population variance.
• Several of the study artifacts affect the magnitude of observed effect sizes
such that the mean observed effect size usually underestimates the mean
population effect size.
• Thus, most distributions of observed effect sizes overestimate the
variance of effect sizes in the population and underestimate the mean
population effect size.
122
Sources of artifactual variance
Additional recommended readings
Some good starting points for more specific readings:
Kepes, S., McDaniel, M., Brannick, M., & Banks, G. (2013). Meta-analytic reviews
in the organizational sciences: Two meta-analytic schools on the way to
MARS (the Meta-analytic Reporting Standards). Journal of Business and
Psychology, 28, 123-143. doi: 10.1007/s10869-013-9300-2
Schmidt, F. L. & Hunter, J. E. (2015). Methods of meta-analysis: Correcting error
and bias in research findings. 3rd edition. Thousand Oaks, CA: Sage.
Schmidt, F. L., & Hunter, J. E. (2003). History, development, evolution, and
impact of validity generalization and meta-analysis methods, 1975-2001. In
K. R. Murphy (Ed.), Validity generalization: A critical review. (pp. 31-65).
Mahwah, NJ: Lawrence Erlbaum.
123
Heterogeneity
Heterogeneity
Heterogeneity
• Homogeneity vs. heterogeneity:
• A set of effect sizes is called homogeneous if all of the variance in the set can be attributed to random sampling error.
• If the above is not true, the set of effect sizes is called heterogeneous.
• These two terms are primarily used in the Hedges and Olkin tradition.
• Because the Hedges and Olkin tradition seldom corrects for artifactual
variance other than sampling error, heterogeneity refers to any source of
non-sampling variance.
125
Heterogeneity
Heterogeneity
• Sources of heterogeneous variance addressed in the Hedges and Olkin
tradition:
• Moderators.
• Sources of heterogeneous variance addressed in the psychometric tradition:
• Moderators and differences across studies due to statistical artifacts (e.g.,
measurement error and range restriction).
126
Heterogeneity
Taxonomy of sources of heterogeneity
• Mis-estimation of random sampling error variance:
• Heterogeneity is defined as variance that cannot be attributed to random
sampling error variance.
• Random sampling error variance is estimated.
• It is an unbiased estimate in the sense that, when it is inaccurate, it errs too high or too low with about equal frequency.
• If the sampling error variance is underestimated, it will appear that the data are more
heterogeneous than they actually are.
• Thus, apparent heterogeneity may be due to the mis-estimation of random sampling error variance.
127
Heterogeneity
Taxonomy of sources of heterogeneity
• Systematic Variance:
• Moderators, mediators, and moderator-like variables.
• Non-substantive systematic variance (artifactual):
• Differences across studies in measurement error.
• Differences across studies in range restriction.
• Differences across studies in other statistical artifacts.
128
Heterogeneity
Taxonomy of sources of heterogeneity
• Study characteristics:
• Longitudinal vs. concurrent
• Within and between designs
• Source of the study
• Published article, conference paper, dissertation, technical report, otherwise unpublished
• Study language
• Study country (or geographic region)
• [...]
129
Heterogeneity
Taxonomy of sources of heterogeneity
• Study characteristics:
• [...]
• Study quality
• Journal quality
• Quality of study
• Confidence in inference
• Double blind random assignment
• Observational
• Purpose of the study
• Data of interest to meta-analyst was central focus of study vs. tangential focus
• Age and training
• Funding
• Who benefits? (follow the money)
130
Heterogeneity
Taxonomy of sources of heterogeneity
• Characteristics of the sample:
• Demographics
• Source of sample (community-based vs. students)
• Subject selection criteria
131
Heterogeneity
Taxonomy of sources of heterogeneity
• Characteristics of the intervention:
• Strength of the intervention
• Might be a function of length (two weeks vs. two years)
• Complexity, breadth, or likelihood of change in the outcome measure
• Not cursing vs. eloquence
• Variability of toxicity of Hookah substance A vs. Hookah substance B
• Building paper airplanes vs. weight loss
• Type of the intervention
• a vs. b
• Lecture vs. practice
• Self-paced vs. not self-paced
• Feedback or no feedback
132
Heterogeneity
Taxonomy of sources of heterogeneity
• Characteristics of measures:
• Measure A vs. measure B
• Construct similarity
• Global conscientiousness vs. facet conscientiousness
• Brain weight vs. brain volume
• Blood measure vs. air measure
• Sensitivity of the measure
• Measurement error
• Time period for an event (3-month survival vs. 6-month survival)
• Format of measure
• Structured vs. unstructured interview
• Observation vs. self-report
133
Heterogeneity
Taxonomy of sources of heterogeneity
• Allegiance of the author:
• Author is on the X side vs. the Y side of an issue or debate
• Funding of the author
• Industry interests (e.g., pharmaceutical company or test vendor)
• Liability or benefit to the author’s institution or funding source
134
Sensitivity analysis
(part of the data analysis
step in a systematic review)
Based on Kepes, S. & McDaniel, M. A. (2013, August). Publication bias: Causes, detection,
and remediation. PDW presented at the annual meeting of the Academy of Management,
Orlando, FL.
Sensitivity analysis
Sensitivity analysis
• A sensitivity analysis examines the extent to which results and conclusions
are altered as a result of changes in the data or analysis approach
(Greenhouse & Iyengar, 2009).
• Sensitivity analyses often focus on decisions made in the coding or analysis
of the data.
• Sensitivity analyses address “what if” questions.
• If the conclusions do not change as a result of the sensitivity analysis, one
can state that the conclusions are robust and one can have greater
confidence in the conclusions.
Greenhouse, J. B., & Iyengar, S. (2009). Sensitivity analysis and diagnostics. In H. Cooper, L. V. Hedges & J. C. Valentine (Eds.), The handbook of
research synthesis and meta-analysis (2nd ed.). (pp. 417-433): New York, NY: Russell Sage Foundation.
136
Sensitivity analysis
Sensitivity analysis
• Sensitivity analyses are seldom conducted in meta-analyses in the social
and organizational sciences.
• Only 16% of meta-analyses conducted sensitivity analyses (Aguinis et al.,
2011).
• Because meta-analyses have a strong impact on their literatures, sensitivity
analyses need to become much more common (and reported) in meta-analyses.
137
Sensitivity analysis
Robustness
• Meta-analyses tend to be influential papers (e.g., cited widely).
• Thus, the conclusions from the analyses should be robust.
• Robustness …
• … is the degree to which the results and conclusions of a meta-analysis
remain stable when conditions of the data or of the analysis change
(Greenhouse & Iyengar, 2009).
138
Sensitivity analysis
Sensitivity analysis: Outliers
• Only 3% of meta-analyses conduct outlier analyses (Aguinis et al., 2011).
• Effect size outlier (large or small)
• Graphical methods and statistical tests for outliers (e.g., SAMD statistic; Beal, Corey, &
Dunlap, 2002).
• Sample size outlier (large)
• Sample sizes influence effect size weights in meta-analyses.
139
Sensitivity analysis
Sensitivity analysis: Outliers
• Specific sample removed analysis:
• Identify potential outliers or otherwise influential samples with Viechtbauer
and Cheung’s (2010) outlier and influence diagnostics, or other methods
(including graphical methods).
• Viechtbauer and Cheung’s (2010) outlier and influence diagnostic is preferred as the
SAMD analysis does not take (residual) heterogeneity into account when identifying
potential outliers.
• Run the meta-analysis with and without the identified outlier(s).
• Examine the means.
• How much does the distribution mean change when a given sample is
excluded from the analysis?
• Are the results due to a small number of influential samples?
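• A minimal R sketch of the specific-sample-removed idea, using the metafor package and invented data; influence() implements the Viechtbauer and Cheung (2010) diagnostics, and leave1out() shows how the mean shifts when each sample is dropped:

library(metafor)

r <- c(.15, .22, .30, .08, .62)   # invented correlations; the last is deliberately extreme
n <- c(120, 85, 400, 60, 40)      # invented sample sizes

dat <- escalc(measure = "ZCOR", ri = r, ni = n)
res <- rma(yi, vi, data = dat)

inf <- influence(res)   # Viechtbauer & Cheung (2010) outlier and influence diagnostics
print(inf)              # flags potentially influential samples
plot(inf)

leave1out(res)          # mean effect with each sample removed, one at a time

# Re-run without a flagged sample (here the 5th, by way of illustration) and compare the means
rma(yi, vi, data = dat[-5, ])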
140
Sensitivity analysis
Sensitivity analysis: Publication bias
• Publication bias analyses are a type of sensitivity analysis.
• Publication bias exists when the research available to the reviewer on a
topic is unrepresentative of all the literature on the topic (Kepes et al.,
2012; Rothstein et al., 2005).
• Publication bias is addressed (as data censoring) in the Meta-Analysis
Reporting Standards (MARS) of the APA publication manual.
141
Sensitivity analysis
Sensitivity analysis: Publication bias
• Only between 3% (Aguinis et al., 2011) and 30% (Kepes et al., 2012) of
meta-analyses conduct publication bias analyses (typically with
inappropriate methods; Banks et al., 2012; Kepes et al., 2012).
• Similar terms/phenomena:
• Availability bias, dissemination bias.
• Not necessarily about published vs. not published.
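• For orientation, a minimal R sketch (invented data) of three commonly used publication bias checks in the metafor package; this is not the full battery of methods discussed in this workshop:

library(metafor)

r <- c(.31, .28, .35, .22, .40, .18, .33)   # invented correlations
n <- c(45, 60, 38, 150, 30, 220, 55)        # invented sample sizes

dat <- escalc(measure = "ZCOR", ri = r, ni = n)
res <- rma(yi, vi, data = dat)

funnel(res)     # funnel plot: inspect for asymmetry
regtest(res)    # Egger-type regression test for funnel plot asymmetry
trimfill(res)   # trim-and-fill adjusted estimate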
142
Sensitivity analysis
Sensitivity analysis: Publication bias
• In the medical sciences, for depression drugs:
143
Sensitivity analysis
Sensitivity analysis: Publication bias
• Taxonomy of causes of publication bias (Banks & McDaniel, 2011; Kepes et
al. 2012):
• Outcome-level causes
• Sample-level causes
144
Sensitivity analysis
Outcome-level publication bias
• Outcome-level publication bias:
• Outcome-level publication bias refers to selective reporting of results (i.e.,
selective reporting of effect sizes). In other words, the primary study is
available but some results are not reported.
145
Sensitivity analysis
Outcome-level publication bias
• Authors may decide to exclude some effect sizes prior to submitting the
paper.
• Not statistically significant
• Contrary to:
• expected finding
• the author’s theoretical position
• the editor’s or reviewers’ theoretical positions
• past research
• Results that disrupt the paper’s “story line.”
146
Sensitivity analysis
Outcome-level publication bias
• Authors may also (1):
• Choose the analytic method that maximizes the magnitude of the effect
size.
• Not report the effect size under alternative analysis methods.
• Engage in HARKing (hypothesizing after results are known) (Kerr, 1998).
• HARKing may involve deleting some effect sizes.
• HARKing serves to “convert Type I errors into non-replicable theory, and hides null results
from future generations of researchers” (Rupp, 2011, p. 486).
• Bedeian, Taylor, and Miller (2010) reported that 92% of faculty know of a colleague who has
engaged in HARKing.
• This is a sad state of affairs.
147
Sensitivity analysis
Outcome-level publication bias
• Authors may also (2):
• For disciplines that use many control variables, a researcher can go “fishing”
for the control variables that yield the expected results.
• Discard the control variables that yield results inconsistent with the expected result.
• Fail to report the effect sizes prior to “fishing.”
• Manufacture false results (Yong, 2012).
148
Sensitivity analysis
Outcome-level publication bias
• The editorial review process can result in outcome-level bias.
• Reviewers and editors may promote HARKing by knowing the results and
then offering alternative explanations (Leung, 2011; Rupp, 2011).
• An editor or reviewer may:
• Request that the author change the focus of the paper, making some results less relevant.
• Request that the author shorten the paper (e.g., delete “non-central” effect sizes that are
not significant).
• Request that the author drop the analyses yielding statistically non-significant effect
sizes.
149
Sensitivity analysis
Sample-level publication bias
• Sample-level causes of publication bias concern the non-availability of an
entire sample.
• Sources of this bias include author decisions, the editorial review process,
and organizational constraints.
• Research in medicine suggests that author decisions are the primary cause
of non-publication and thus missing samples (Chalmers & Dickersin, 2013;
Dickersin, 1990, 2005).
• An author will likely work on the paper that has the best chance of getting
into the best journal.
• Other papers are abandoned.
• Results in small-magnitude effects being hidden from the publicly available research
literature.
150
Sensitivity analysis
Sample-level publication bias
• Authors may have personal norms or adopt organizational norms that hold
that only articles in top journals “count.”
• Count for tenure, promotions, raises, discretionary dollars.
• Thus, authors may abandon papers that don’t make the top journal cut.
• Results are “lost” to the literature.
• The editorial process will reject:
• Poorly framed papers.
• Papers without statistically significant findings.
• Papers with results contrary to existing literature and current theory.
• Well done papers with research that “didn’t work.”
151
Sensitivity analysis
Sample-level publication bias
• These editorial decisions result in suppression of effect sizes at the sample level.
• Typically, samples with smaller magnitude effect sizes will be “lost.”
• When large effects are socially uncomfortable (e.g., mean demographic
differences), the larger effects may be suppressed.
• To clarify:
• Editors should reject papers that are bad (e.g., bad framing, lack of clear
focus, incomplete theory, poorly developed hypotheses, awful measures,
poorly designed, inappropriate analysis).
• Just don’t define “bad” as …
• … small magnitude/non-significant effect sizes.
• … results inconsistent with hypotheses.
152
Sensitivity analysis
Not missing data at random
• Neither outcome-level publication bias nor sample-level publication bias
results in a “missing data at random” situation.
• Not missing at random (NMAR)
• There is nothing random about it.
• It is systematic!
153
Sensitivity analysis
References and readings for sensitivity analysis
Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking myths and urban legends about meta-analysis. Organizational Research
Methods, 14, 306-331. doi: 10.1177/1094428110375720
Banks, G. C., Kepes, S., & Banks, K. P. (2012). Publication bias: The antagonist of meta-analytic reviews and effective policy making. Educational Evaluation and
Policy Analysis, 34, 259-277. doi: 10.3102/0162373712446144
Banks, G. C., Kepes, S., & McDaniel, M. A. (2015). Publication bias: Understanding the myths concerning threats to the advancement of science. In C. E. Lance &
R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 36-64). New York, NY: Routledge.
Banks, G.C., Kepes, S., & McDaniel, M.A. (2012). Publication bias: A call for improved meta-analytic practice in the organizational sciences. International Journal
of Selection and Assessment, 20, 182-196. doi: 10.1111/j.1468-2389.2012.00591.x
Banks, G.C. & McDaniel, M.A. (2011). The kryptonite of evidence-based I-O psychology. Industrial and Organizational Psychology: Perspectives on Science and
Practice, 4, 40-44. doi: 10.1111/j.1754-9434.2010.01292.x
Beal, D. J., Corey, D. M., & Dunlap, W. P. (2002). On the bias of Huffcutt and Arthur's (1995) procedure for identifying outliers in the meta-analysis of correlations.
Journal of Applied Psychology, 87, 583-589. doi: 10.1037/0021-9010.87.3.583
Becker, B. J. (2005). The failsafe N or file-drawer number. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta analysis: Prevention,
assessment, and adjustments (pp. 111-126). West Sussex, UK: Wiley.
Bedeian, A. G., Taylor, S. G., & Miller, A. N. (2010). Management science on the credibility bubble: Cardinal sins and various misdemeanors. Academy of
Management Learning & Education, 9, 715–725. doi: 10.5465/amle.2010.56659889
Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50, 1088-1101. doi:10.2307/2533446
Begg, C.B. & Berlin, J.A. (1988). Publication bias: A problem in interpreting medical data. Journal of the Royal Statistical Society. Series A (Statistics in Society), 151,
419-463. doi: 10.2307/2982993
Berlin, J.A. & Ghersi, D. (2005). Preventing publication bias: Registries and prospective meta-analysis. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.),
Publication bias in meta analysis: Prevention, assessment, and adjustments (pp. 35-48). West Sussex, UK: Wiley.
Berry, C. M., Sackett, P. R., & Tobares, V. (2010). A meta-analysis of conditional reasoning tests of aggression. Personnel Psychology, 63, 361-384. doi:
10.1111/j.1744-6570.2010.01173.x
154
Sensitivity analysis
References and readings for sensitivity analysis
Borenstein, M. (2005). Software for publication bias. In H. R. Rothstein, A. J. Sutton & M. Borenstein (Eds.), Publication bias in meta analysis: Prevention,
assessment, and adjustments (pp. 193-220). West Sussex, UK: Wiley.
Chalmers, I., & Dickersin, K. (2013). Biased under-reporting of research reflects biased under-submission more than biased editorial rejection. F1000Research, 2,
1-6. doi: 10.12688/f1000research.2-1.v1
Cooper H. M. (1979). Statistically combining independent studies: A meta-analysis of sex differences in conformity research. Journal of Personality and Social
Psychology, 37, 131-146. doi: 10.1037/0022-3514.37.1.131
Dalton, D. R., Aguinis, H., Dalton, C. M., Bosco, F. A., & Pierce, C. A. (2012). Revisiting the file drawer problem in meta-analysis: An assessment of published and
non-published correlation matrices. Personnel Psychology, 65, 221-249. doi: 10.1111/j.1744-6570.2012.01243.x
Dickersin, K. (1990). The existence of publication bias and risk factors for its occurrence. Journal of the American Medical Association, 263, 1385-1389.
doi:10.1001/jama.263.10.1385
Dickersin, K. (2005). Publication bias: Recognizing the problem, understandings its origins and scope, and preventing harm. In H. R. Rothstein, A. J. Sutton, & M.
Borenstein (Eds.), Publication bias in meta analysis: Prevention, assessment, and adjustments (pp. 11-34). West Sussex, UK: Wiley.
Doucouliagos, H., & Stanley, T. D. (2009). Publication selection bias in minimum-wage research? A metaregression analysis. British Journal of Industrial Relations,
47, 406-428. doi:10.1111/j.1467-8543.2009.00723.x
Duval, S. J. (2005). The ‘‘trim and fill’’ method. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in meta-analysis: Prevention, assessment,
and adjustments (pp. 127-144). West Sussex, UK: Wiley.
Egger, M., Smith, G. D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634. doi:
10.1136/bmj.315.7109.629
Fanelli, D. (2010). "Positive" results increase down the hierarchy of the sciences. PLoS ONE, 5, e10068. doi: 10.1371/journal.pone.0010068
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891-904. doi: 10.1007/s11192-011-0494-7
Ferguson, C. J., & Brannick, M. T. (2012). Publication bias in psychological science: Prevalence, methods for identifying and controlling, and implications for the
use of meta-analyses. Psychological Methods, 17, 120–128. doi:10.1037/a0024445
155
Sensitivity analysis
References and readings for sensitivity analysis
Field, A. P., & Gillett, R. (2010). How to do a meta-analysis. British Journal of Mathematical and Statistical Psychology, 63, 665-694. doi: 10.1348/000711010X502733
Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin and Review, 21, 1180-1184. doi: 10.3758/s13423-014-0601-x
Greenhouse, J. B., & Iyengar, S. (2009). Sensitivity analysis and diagnostics. In H. Cooper, L. V. Hedges & J. C. Valentine (Eds.), The handbook of research synthesis
and meta-analysis (2nd ed.). (pp. 417-433): New York, NY, US: Russell Sage Foundation.
Hambrick, D.C. (2007). The field of management's devotion to theory: Too much of a good thing? Academy of Management Journal, 50, 1348-1352. doi
10.5465/AMJ.2007.28166119
Harrison, J. S., Banks, G. C., Pollack, J. M., O’Boyle, E. H., & Short, J. (in press). Publication bias in strategic management research. Journal of Management. doi:
10.1177/0149206314535438
Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7, 246-255. doi:10.1214/ss/1177011364
Hopewell, S., Clarke, M., & Mallett, S. (2005). Grey literature and systematic reviews. In H. R. Rothstein, A. J. Sutton, & M. Borenstein (Eds.), Publication bias in
meta analysis: Prevention, assessment, and adjustments (pp. 48-72). West Sussex, UK: Wiley.
Ioannidis J. P. A. & Trikalinos T. A. (2005). Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular
genetics research and randomized trials. Journal of Clinical Epidemiology, 58, 543-9. doi: 10.1016/j.jclinepi.2004.10.019
Ioannidis J. P. A. & Trikalinos T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4, 245-253,
James, L. R., McIntyre, M. D., Glisson, C. A., Green, P. D., Patton, T. W., LeBreton, J. M., . . . Williams, L. J. (2005). A conditional reasoning measure for aggression.
Organizational Research Methods, 8, 69-99. doi: 10.1177/1094428104272182
Kepes, S., Banks, G. C., McDaniel, M. A., & Sitzmann, T. (2012, August). Assessing the robustness of meta-analytic results and conclusions. Paper presented at
the annual meeting of the Academy of Management, Boston, MA.
Kepes, S., Banks, G. C., & Oh, I.-S. (2014). Avoiding bias in publication bias research: The value of "null" findings. Journal of Business and Psychology. doi:
10.1007/s10869-012-9279-0
156
Sensitivity analysis
References and readings for sensitivity analysis
Kepes, S., Banks, G. C., McDaniel, M. A., & Whetzel, D. L. (2012). Publication bias in the organizational sciences. Organizational Research Methods, 15, 624-662. doi:
10.1177/1094428112452760
Kepes, S. & McDaniel, M.A. (2015). The validity of conscientiousness is overestimated in the prediction of job performance. PLoS ONE 10(10): e0141468.
doi:10.1371/journal.pone.0141468
Kepes, S., & McDaniel, M. A. (2013). How trustworthy is the scientific literature in industrial and organizational psychology? Industrial and Organizational
Psychology: Perspectives on Science and Practice, 6, 252-268. doi: 10.1111/iops.12045
Kepes, S., McDaniel, M. A., Brannick, M. T., & Banks, G. C. (2013). Meta-analytic reviews in the organizational sciences: Two meta-analytic schools on the way to
MARS (the Meta-analytic Reporting Standards). Journal of Business and Psychology, 28, 123-143. doi: 10.1007/s10869-013-9300-2
Kepes, S., McDaniel, M. A., Banks, C., Hurtz, G., & Donovan, J. (2011, April). Publication bias and the validity of the Big Five. Paper presented at the 26th Annual
Conference of the Society for Industrial and Organizational Psychology. Chicago.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196-217. doi: 10.1207/s15327957pspr0203_4
Lehrer, J. (2010). The truth wears off. New Yorker, 86, 52-57.
Leung, K. (2011). Presenting post hoc hypotheses as a priori: Ethical and theoretical issues. Management and Organization Review, 7, 471-479. doi: 10.1111/j.1740-8784.2011.00222.x
McDaniel, M. A., Whetzel, D., Schmidt, F. L., Maurer, S. (1994). The validity of the employment interview: A comprehensive review and meta-analysis. Journal of
Applied Psychology, 79, 599-616. doi: 10.1037/0021-9010.79.4.599
McDaniel, M. A., McKay, P. & Rothstein, H. (2006, May). Publication bias and racial effects on job performance: The elephant in the room. Paper presented at the
21st Annual Conference of the Society for Industrial and Organizational Psychology. Dallas.
McDaniel, M. A., Rothstein, H. R. & Whetzel, D. L. (2006). Publication bias: A case study of four test vendors. Personnel Psychology, 59, 927-953. doi:
10.1111/j.1744-6570.2006.00059.x
O'Boyle, E. H., Banks, G. C., & Rutherford, M. W. (2014). Publication bias in entrepreneurship research: An examination of dominant relations to performance.
Journal of Business Venturing, 29, 773-784. doi: 10.1016/j.jbusvent.2013.10.001
O’Boyle, E. H., Banks, G. C., & Gonzalez-Mule, E. (in press). The chrysalis effect: How ugly initial results metamorphosize into beautiful articles. Journal of
Management. doi: 10.1177/0149206314527133
157
Sensitivity analysis
References and readings for sensitivity analysis
Palmer, T. M., Peters, J. L., Sutton, A. J., & Moreno, S. G. (2008). Contour-enhanced funnel plots for meta-analysis. Stata Journal, 8, 242-254.
Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2008). Contour-enhanced meta-analysis funnel plots help distinguish publication bias from
other causes of asymmetry. Journal of Clinical Epidemiology, 61, 991-996. doi:10.1016/j.jclinepi.2007.11.010
Pollack, J. M. & McDaniel, M. A. (2008, April). An examination of the PreVisor Employment Inventory for publication bias. Paper presented at the 23rd Annual
Conference of the Society for Industrial and Organizational Psychology. San Francisco.
Renkewitz, F., Fuchs, H. M., & Fiedler, S. (2011). Is there evidence of publication biases in JDM research? Judgment and Decision Making, 6, 870-881.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638-641. doi: 10.1037/0033-2909.86.3.638
Rothstein, H. (2012). Accessing relevant literature. In H. M. Cooper (Ed.), APA handbook of research methods in psychology: Vol. 1. Foundations, planning,
measures, and psychometrics (pp. 133-144). Washington, DC: American Psychological Association.
Rothstein, H. R., Sutton, A. J., & Borenstein, M. (Eds.). (2005). Publication bias in meta-analysis: Prevention, assessment, and adjustments. West Sussex, UK: Wiley.
Rupp, D.E. (2011). Ethical issues faced by editors and reviewers. Management and Organization Review, 7, 481-493. doi: 10.1111/j.1740-8784.2011.00227.x
Stanley, T. D., & Doucouliagos, H. (2014). Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods, 5, 60-78. doi:
10.1002/jrsm.1095
Stanley, T. D. (2008). Meta-regression methods for detecting and estimating empirical effect in the presence of publication selection. Oxford Bulletin of
Economics and Statistics, 70, 103-127. doi:10. 1111/j.1468-0084.2007.00487.x
Stanley, T. D. and Doucouliagos, H. (2011). Meta-Regression Approximations to Reduce Publication Selection Bias. Manuscript available at
www.deakin.edu.au/buslaw/aef/workingpapers/papers/2011_4.pdf
Sterling, T. D., & Rosenbaum, W. L. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa.
American Statistician, 49, 108-112. doi: 10.1080/00031305.1995.10476125
Sterne, J. A. C., Sutton, A. J., Ioannidis, J. P., Terrin, N., Jones, D. R., Lau, J., . . . Higgins, J. P. (2011). Recommendations for examining and interpreting funnel plot
asymmetry in meta-analyses of randomized controlled trials. British Medical Journal, 343, d4002. doi:10.1136/bmj.d4002
158
Sensitivity analysis
References and readings for sensitivity analysis
Sutton, A. J., Abrams, K. R., Jones, D. R., Sheldon, T. A., & Song, F. (2000). Methods for meta-analysis in medical research. West Sussex, UK: Wiley.
Tate, B. W. & McDaniel, M. A. (2008, August). Race differences in personality: an evaluation of moderators and publication bias. Paper presented at the Annual
meeting of the Academy of Management, Anaheim CA.
Terrin, N., Schmid, C. H., Lau, J., & Olkin, I. (2003). Adjusting for publication bias in the presence of heterogeneity. Statistics in Medicine, 22, 2113-2126.
doi:10.1002/sim.1461
Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60, 419-435.
doi:10.1007/BF02294384
Vevea, J. L., & Woods , C.M. (2005). Publication bias in research synthesis: Sensitivity analysis using a priori weight functions. Psychological Methods, 10, 428–443.
Viechtbauer, W., & Cheung, M. W. L. (2010). Outlier and influence diagnostics for meta-analysis. Research Synthesis Methods, 1, 112-125. doi: 10.1002/jrsm.11
Weinhandl, E.D., & Duval, S. (2012). Generalization of trim and fill for application in meta-regression. Research Synthesis Methods, 3, 51-67.
159
IQ and brain volume
Intelligence and brain volume
• My 2005 paper on brain volume and intelligence was a response to critics of
the research who were denying that brain volume had any relation to
intelligence:
• Some were misinformed, largely driven by Gould’s (1981) The Mismeasure of
Man and its subsequent revisions, which did not report contrary evidence.
• Some were advocates of the position that intelligence was a social construct
without any biological basis.
• I am not arguing that the relationship between brain volume and
intelligence is particularly useful for understanding the nature of
intelligence. Research in other areas including neuroscience and behavioral
genetics would be much more useful.
160
IQ and brain volume
Intelligence and brain volume
• The observed mean correlation between intelligence and brain volume was
.29.
• The value of .29 would likely be an underestimate of the population
correlation due to range restriction.
• The 2005 analysis corrected for range restriction but used a direct range
restriction formula because at the time of the data analysis (2003), a usable
formula was not available for indirect range restriction corrections.
161
IQ and brain volume
Reanalysis of McDaniel (2005)
• The reanalysis for this workshop employs corrections for:
• Indirect range restriction
• Small amounts of measurement error in the assessment of intelligence.
• The analysis also includes outlier analysis.
• The program is called g and brain volume psychometric meta-analysis.R
and is in the directory: …\2016 McDaniel - Meta-Analysis\R exercises
which is in the CARMA 2016 course materials to be found at
http://tinyurl.com/meta-analysisJuly
• The data file is in the 2016 McDaniel - Meta-Analysis/Data sets and is called
G_HeadSize_Final_Analysis_data_set.xls
162
IQ and brain volume
Reanalysis of McDaniel (2005)
• An outlier analysis using the metafor package in R identified three
outliers:
• Staff (2002) r = -.07
• Tan et al. (1999) r = .62
• Aylward et al. (2002). Supplemented r = -.13
• I dropped these studies and based the reanalysis on 34 samples.
163
IQ and brain volume
Reanalysis of McDaniel (2005)
• To estimate the range restriction in this analysis, I used the standard
deviation of IQ when it was reported and used the median of the reported
values (13.2) to impute the missing values.
• The population standard deviation was set at 15.
• Indirect range restriction is preferred over direct range restriction because
studies did not intentionally restrict the range of IQ (e.g., by selecting only
individuals with IQs at or above a certain level).
• Rather, the range restriction was likely due to samples of volunteers, who,
on average, have restricted variance on IQ because they are not
representative of the population of all humans. This would be indirect
range restriction.
164
IQ and brain volume
Reanalysis of McDaniel (2005)
• I assumed that the reliability of full-scale IQ was .95.
• I assumed that the reliability of brain volume measures was 1.0. There is
very little data on the reliability of in vivo brain volume.
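• A simplified R sketch of what an indirect range restriction (Case IV style) correction amounts to for a single correlation, using the assumptions above; this is a back-of-the-envelope version, not the workshop’s actual R program, and it approximates the restricted-sample reliability with the unrestricted value:

r_obs <- .32         # observed mean correlation from this reanalysis
rxx_a <- .95         # assumed IQ reliability (unrestricted population)
ryy   <- 1.00        # assumed reliability of brain volume
u_x   <- 13.2 / 15   # observed SD of IQ / population SD of IQ

# Ratio of true-score SDs implied by indirect range restriction
u_t <- sqrt((u_x^2 - (1 - rxx_a)) / rxx_a)

# Correct for measurement error
r_true <- r_obs / sqrt(rxx_a * ryy)

# Correct for range restriction using u_t
r_pop <- (r_true / u_t) / sqrt(1 + r_true^2 * (1 / u_t^2 - 1))
round(r_pop, 2)      # lands near the .37 reported on the next slide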
165
IQ and brain volume
Reanalysis of McDaniel (2005)
Analysis                                                            Estimated population correlation
Observed mean correlation (McDaniel, 2005)                          .29
Observed mean correlation in this reanalysis (a)                    .32
Mean corrected for direct range restriction (McDaniel, 2005)        .33
Mean corrected for indirect range restriction in this reanalysis    .37

a. The difference between .29 and .32 is due to dropping 3 outliers
in the reanalysis.
166
IQ and brain volume
Pietschnig et al., 2015
• Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2015).
Meta-analysis of associations between human brain volume and
intelligence differences: How strong are they and what do they mean?
Neuroscience and Biobehavioral Reviews, 57, 411–432.
• Graciously, and consistent with goals of transparency in science, they
released their data set to me so that I could use it in this workshop.
• I thank them for the release of the data.
167
IQ and brain volume
Pietschnig et al., 2015
• They report a mean of .24 based on samples of healthy and unhealthy
adults.
• For healthy samples, the mean was .26.
• The comparable value from McDaniel (2005) was .29.
• Admirably, they did conduct many publication bias analyses.
• Regrettably, they did not conduct any outlier analyses, correct for
measurement error, or correct for indirect range restriction.
168
IQ and brain volume
Preliminary re-analysis of Pietschnig et al data
• Analyses are hampered by the data set not reporting the standard
deviations of IQ.
• For this preliminary reanalysis:
• I used the data for full-scale IQ in healthy subjects, which makes the data
comparable to McDaniel (2005)
• I used the imputed value of 13.2 that I used in my reanalysis of McDaniel
(2005)
• Assumed a reliability of .95 for the IQ measure and 1.0 for brain volume, as
used in the reanalysis of McDaniel (2005).
• Conducted a psychometric meta-analysis with indirect range restriction
corrections.
169
IQ and brain volume
Preliminary re-analysis of Pietschnig et al. data
• The Pietschnig et al. (2015) data is labelled Pietschnig et al. (2015) data.xlsx
and can be found in the directory … 2016 McDaniel - Meta-Analysis/Data
sets
• The R program for the reanalysis is labelled Reanalysis Pietschng.R and can
be found in the directory … 2016 McDaniel - Meta-Analysis\R exercises
• There were 84 healthy samples in the data set.
• Outlier analyses suggested that 11 were outliers.
• The reanalysis of these data was based on 73 (84 - 11) healthy samples.
170
IQ and brain volume
Compare McDaniel (2005) with Pietschnig et al.
(2015) original and reanalysis
McDaniel (estimated population correlation)
  Observed mean correlation (McDaniel, 2005):                        .29
  Observed mean correlation in this reanalysis:                      .32
  Mean corrected for direct range restriction (McDaniel, 2005):      .33
  Mean corrected for indirect range restriction in this reanalysis:  .37

Pietschnig et al. (estimated population correlation)
  Observed mean correlation (Pietschnig et al., 2015):               .26
  Observed mean correlation in this reanalysis:                      .28
  Mean corrected for direct range restriction:                       NA
  Mean corrected for indirect range restriction in this reanalysis:  .33
171
IQ and brain volume
If one wants to do more comprehensive
analyses…
• Use the Pietschnig et al. (2015) healthy sample data.
• There is more of it than in McDaniel (2005).
• The goal of the meta-analysis is to estimate the relationship in healthy samples.
• Identify the studies that report the standard deviation of IQ and use the
median to estimate the missing values.
• Conduct sensitivity analyses to evaluate varying estimates of the missing
standard deviations of IQ.
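• A hypothetical R sketch of that sensitivity analysis, reusing the simplified single-correlation correction from the earlier sketch (so it only approximates the full meta-analytic correction): vary the imputed SD of IQ and watch how the corrected estimate moves.

correct_irr <- function(r_obs, sd_restricted, sd_pop = 15, rxx_a = .95, ryy = 1) {
  u_x    <- sd_restricted / sd_pop
  u_t    <- sqrt((u_x^2 - (1 - rxx_a)) / rxx_a)
  r_true <- r_obs / sqrt(rxx_a * ryy)
  (r_true / u_t) / sqrt(1 + r_true^2 * (1 / u_t^2 - 1))
}

sd_candidates <- c(12, 13.2, 14, 15)   # plausible imputed SDs of IQ
# .28 is the observed mean from the Pietschnig et al. reanalysis above
round(sapply(sd_candidates, function(s) correct_irr(.28, s)), 2)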
172
IQ and brain volume
If one wants to do more comprehensive
analyses…
• Conduct outlier analyses.
• Conduct publication bias analyses with and without outliers on the
uncorrected correlations.
• Some publication bias results may go away when outliers are dropped.
• Conduct publication bias separately by subgroups as data permits.
173
IQ and brain volume
If one wants to do more comprehensive
analyses…
• Conduct publication bias analyses with and without outliers on the
correlations corrected for measurement error and range restriction.
• The data set DAT produced by the R code contains the estimated population
correlation and the estimated sampling variance of the estimated
population correlation. These variables can be used for many of the
publication bias analyses.
• I don’t think anyone has ever done this before.
• Conduct publication bias separately by subgroups as data permits.
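• A hedged R sketch of that idea: the column names yi (corrected correlation) and vi (its estimated sampling variance) are assumed here, and a tiny stand-in for DAT is built only so the sketch runs on its own; the actual DAT produced by the workshop script may use different names.

library(metafor)

# Stand-in for the DAT object produced by the workshop's R code (invented values)
DAT <- data.frame(yi = c(.35, .41, .28, .52, .30),        # corrected correlations
                  vi = c(.010, .020, .006, .030, .012))   # their sampling variances

res_corrected <- rma(yi, vi, data = DAT)

funnel(res_corrected)     # funnel plot of the corrected correlations
regtest(res_corrected)    # Egger-type asymmetry test
trimfill(res_corrected)   # trim-and-fill on the corrected distribution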
174
IQ and brain volume
If one wants to do more comprehensive
analyses…
• Report the meta-analysis results using the Hedges and Olkin approach
used by Pietschnig et al. (2015)
• Do this separately by moderator subgroups.
• Will provide means by subgroups
• Run a meta-regression if sufficiently powered.
• Report the meta-analysis results using psychometric meta-analysis with
reliability corrections and indirect range restriction correction.
• Do this separately by moderator subgroups.
• Will provide means by subgroups
• Run a meta-regression if sufficiently powered.
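• A minimal R sketch (invented data and a made-up moderator) of subgroup means and a meta-regression with the metafor package:

library(metafor)

r <- c(.31, .28, .35, .22, .40, .18, .33, .25)   # invented correlations
n <- c(45, 60, 38, 150, 30, 220, 55, 90)         # invented sample sizes

dat <- escalc(measure = "ZCOR", ri = r, ni = n)
dat$setting <- c("child", "adult", "child", "adult",
                 "child", "adult", "adult", "child")   # made-up moderator

# Means by moderator subgroup
is_child <- dat$setting == "child"
rma(yi, vi, data = dat, subset = is_child)
rma(yi, vi, data = dat, subset = !is_child)

# Meta-regression: does the moderator account for heterogeneity?
rma(yi, vi, mods = ~ setting, data = dat)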
175
IQ and brain volume
If one wants to do more comprehensive
analyses…
• Accept that you will have multiple estimates of the correlation between
intelligence and brain volume.
• Report all the estimates.
• Send me a copy of the results.
176
Thank you.
177