
Estimating the reproducibility of
psychological science:
accounting for the statistical significance of
the original study
Robbie C. M. van Aert & Marcel A. L. M. van Assen
Tilburg University & Utrecht University
Social Sciences Meta-Research Group
http://www.metaresearch.nl/
The Problem
Example (Maxwell et al., 2015, in Am Psy)
Independent sample t-test
Original:
d = 0.5, t(78) = 2.24, p = 0.028
Replication (power = .8 at d = 0.5)
d = 0.23, t(170) = 1.50, p = 0.135
Conclusion?!?
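The reported p-values can be checked directly from the t statistics and degrees of freedom; a quick sketch (assuming SciPy is available):

```python
# Recompute the two-sided p-values of the example from the t statistics
# (sketch; scipy assumed available).
from scipy import stats

def two_sided_p(t, df):
    """Two-sided p-value of an independent-samples t-test."""
    return 2 * stats.t.sf(abs(t), df)

p_orig = two_sided_p(2.24, 78)    # original study
p_rep = two_sided_p(1.50, 170)    # replication
print(p_orig, p_rep)              # close to the reported 0.028 and 0.135
```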
An omnipresent and relevant problem: 61% in the RPP
Questions considered relevant:
1) Does the effect exist? (zero or not)
2) What is the effect size? (best guess)
Problem and Solution
Problem
How to evaluate the results of the original and the replication study?
Solution
Accurate estimation of effect size …
… taking the statistical significance of the original study into account
The Message
(1) Methods should take statistical significance of original study into
account
Easy, natural, insightful
(2) We developed such methods (frequentist and Bayesian)
(3) Need huge sample sizes (N~1,000) to distinguish 0 from small effect
→ With current sample sizes in psychology, one or two studies are not
sufficient to accurately estimate effect size
(4) Apply methods to Reproducibility Project (2015)
→ Zero effect is the best guess for only a few nonsignificant replications
Overview
1. Publication bias and Reproducibility
2. Why we should take significance of the original study into account
3. Bayesian method
4. Analytical results Bayesian method
5. Application: Reproducibility Project Psychology
6. Conclusion and discussion
1. Publication bias and Reproducibility
• Publication bias is ‘the selective publication of studies with a
statistically significant outcome’
• Evidence of publication bias is
HUMONGOUS
• 97% of published original studies are statistically
significant in psychology (97% in the
RPP), but average power is much
lower (estimates: 8%, about 20%, 35%, 50%)
• → So… convinced?
1. Publication bias and Reproducibility
Publication bias is the 800-lb gorilla in psychology’s living room
(Ferguson and Heene, 2012)
1. Publication bias and Reproducibility
But… psychologists do not see the gorilla?!?
‘Shock’ after RPP (97% → 36%)
2. Why we should take significance of original
study into account
Assume researcher’s goal:
replicate significant original
i. Selection of high score
ii. Score subject to (sampling) error
→ Regression to the mean:
original overestimates, replication accurate
! Holds irrespective of publication bias !
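The regression-to-the-mean argument can be illustrated with a small simulation; the true effect, sample size, and one-sided selection rule below are our own illustrative choices, not values from the talk:

```python
# Selecting significant originals overestimates the effect; their
# replications, drawn without selection, do not (illustrative numbers).
import numpy as np

rng = np.random.default_rng(1)
d_true, n_per_group = 0.2, 40            # small true effect
se = np.sqrt(2 / n_per_group)            # approximate SE of Cohen's d
d_orig = rng.normal(d_true, se, 100_000)
significant = d_orig / se > 1.96         # one-sided selection of "high scores"

# Replications of the selected originals, drawn without any selection
d_rep = rng.normal(d_true, se, significant.sum())

print(d_orig[significant].mean())        # well above d_true
print(d_rep.mean())                      # close to d_true
```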
3. Bayesian method
[Snapshot Bayesian Hybrid Method]
Assumptions
– Original study is statistically significant
– Both studies estimate the same effect (fixed-effect)
– No questionable research practices
Basic idea
1) Assume 4 effect sizes (0, small, medium, large [Cohen]) = snapshots
2) Compute posterior probability of four effects = Bayesian
3) Take statistical significance of original study into account = hybrid
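A minimal sketch of these three steps, using a normal approximation for Cohen's d and a uniform prior over the snapshots (the actual method uses exact densities; the function name and the d-scale snapshot values 0, 0.2, 0.5, 0.8 are our assumptions):

```python
# Snapshot idea, sketched: posterior over four candidate effect sizes,
# with the original study's likelihood truncated to the significant
# region ("hybrid") and the replication's likelihood untruncated.
import numpy as np
from scipy import stats

def snapshot_posterior(d_orig, se_orig, d_rep, se_rep,
                       snapshots=(0.0, 0.2, 0.5, 0.8), alpha=0.05):
    d_crit = stats.norm.ppf(1 - alpha / 2) * se_orig
    post = np.empty(len(snapshots))
    for i, d in enumerate(snapshots):
        l_rep = stats.norm.pdf(d_rep, d, se_rep)
        # probability that the original comes out significant given d
        p_sig = (stats.norm.sf(d_crit, d, se_orig)
                 + stats.norm.cdf(-d_crit, d, se_orig))
        l_orig = stats.norm.pdf(d_orig, d, se_orig) / p_sig
        post[i] = l_orig * l_rep          # uniform prior over snapshots
    return post / post.sum()

# Maxwell et al. example, with SE approximated as sqrt(2 / n_per_group)
post = snapshot_posterior(0.5, np.sqrt(2 / 40), 0.23, np.sqrt(2 / 86))
print(dict(zip(['zero', 'small', 'medium', 'large'], post.round(2))))
```

In this rough sketch the small effect gets the largest posterior probability, matching the talk's "best guess = small effect"; the exact numbers will differ from the published method's.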
3. Bayesian method
Basic idea
Likelihoods replication study
3. Bayesian method
Basic idea
Likelihoods original study
3. Bayesian method
Basic idea
Applied to the example of Maxwell et al. (2015)
Evidence of 0 and small effect increased; best guess = small effect
Advantages of method
• Easy, natural, insightful
• Easy (re)computation of the posterior for priors other than uniform
4. Analytical results Bayesian method
Independent variables:
• ρ = 0, 0.1, 0.3, 0.5
[0, Small, Medium, Large]
• N of both original and replication: 31, 55, 96, 300, and 1,000
Dependent variables:
• Expected posterior probability
• Probability of strong evidence (posterior > .75 or Bayes
Factor > 3)
4. Analytical results Bayesian method
Expected posterior probability (hybrid)
Need huge sample sizes (N~1,000) to distinguish 0 from small effect
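A simplified simulation (ours, not the talk's analysis) of why N on the order of 1,000 is needed: the expected posterior probability of the true snapshot ρ = 0.1, from a single unbiased study with a Fisher-z likelihood and a uniform prior over the four snapshots:

```python
# At N = 96 the snapshots 0 and 0.1 are about one standard error apart
# and hard to tell apart; at N = 1,000 they separate clearly.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
snapshots = np.array([0.0, 0.1, 0.3, 0.5])
rho_true = 0.1

def expected_posterior(n, reps=20_000):
    se = 1 / np.sqrt(n - 3)                       # SE of Fisher's z
    z = rng.normal(np.arctanh(rho_true), se, reps)
    lik = stats.norm.pdf(z[:, None], np.arctanh(snapshots), se)
    post = lik / lik.sum(axis=1, keepdims=True)   # uniform prior
    return post.mean(axis=0)[1]                   # mean posterior of rho = 0.1

print(expected_posterior(96), expected_posterior(1000))
```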
4. Analytical results Bayesian method
Expected posterior probability (WRONG method)
Uncorrected for publication bias → overestimation
4. Analytical results Bayesian method
Expected posterior probability (hybrid)
Easier to distinguish medium and large effect sizes
4. Analytical results Bayesian method
Probability of strong evidence (hybrid)
High sample size needed for 0 and small effect
5. Application: Reproducibility Project Psychology
100 studies from JPSP, Psych Science, and JEP → 67 could be included
Evidence according to posterior probabilities > .25
0 = zero, S(mall), M(edium), L(arge)
Strong evidence (posterior probability > .75)
Only a few studies have strong evidence for a zero effect (13.4%)
6. Conclusion and discussion
Messages
(1) Methods should take statistical significance of original study into
account
(2) We developed such methods (frequentist and Bayesian)
(3) Need huge sample sizes (N~1,000) to distinguish 0 from small effect
→ With current sample sizes in psychology, one or two studies are not
sufficient to accurately estimate effect size
(4) Apply methods to Reproducibility Project (2015)
→ Zero effect is the best guess for only a few nonsignificant replications
6. Conclusion and discussion
Other
• Apply methods to the reproducibility project in economics
• Power analysis (how large should the replication N be for an 80%
chance of strong evidence?)
• Unequal sample sizes of original and replication:
discarding original studies, i.e. using only the replication, is
optimal for estimation in some conditions
→ Start all over again in some fields?!?
• App / user-friendly program
Thank you for your attention
1. Publication bias and Reproducibility
But… psychologists do not see the gorilla?!?
‘Shock’ after RPP (97% → 36%)
Denial of results by some psychologists and methodologists
• Bad replication
• Not generalizable to other settings/people/times
• Statistical evaluation of results not right (e.g. Maxwell et al.)
ALL criticisms are true to some extent!
→ NEED accurate methods!
2. Why we should take significance of original
study into account
Assume researcher’s goal: replicate significant original
i. Selection of high score
ii. Score subject to (sampling) error
→ Regression to the mean: expected value of replication is smaller
than that of the original
! Holds irrespective of publication bias !
Assume researcher’s goal: replicate original (which was significant)
No researcher’s selection of high score, but…
selection of high score through publication bias → regression to the
mean still holds, and the significance of the original study should
still be taken into account
Other very important messages we would like to
convey, but really have no time for it
1. Analysis shows that discarding original studies, i.e. using only the
replication, is optimal for estimation if ...
(i) the true effect size is zero to small, and
(ii) Nreplication > Noriginal
→ Start all over again in some fields?!?
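A rough simulation of this claim under illustrative assumptions of ours (small true effect, one-sided selection of significant originals, fixed-effect pooling, larger replication):

```python
# Compare RMSE of fixed-effect pooling (bias-selected original + replication)
# against using the replication alone (illustrative setup, not the talk's).
import numpy as np

rng = np.random.default_rng(3)
d_true = 0.1                                     # zero-to-small true effect
se_o, se_r = np.sqrt(2 / 40), np.sqrt(2 / 120)   # N_replication > N_original
reps = 200_000

d_o = rng.normal(d_true, se_o, reps)
keep = d_o / se_o > 1.96                         # only significant originals publish
d_o = d_o[keep]
d_r = rng.normal(d_true, se_r, keep.sum())       # unbiased replications

w_o, w_r = 1 / se_o**2, 1 / se_r**2
pooled = (w_o * d_o + w_r * d_r) / (w_o + w_r)   # fixed-effect pooling

rmse = lambda est: np.sqrt(np.mean((est - d_true) ** 2))
print(rmse(pooled), rmse(d_r))                   # replication-only wins here
```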
2. Using Bayesian analysis, or confidence intervals, rather than
frequentist statistics is not the solution ...
→ Using larger sample sizes is part of the solution