Outcome reporting bias - Open Science Framework


Fostering openness, integrity, and
reproducibility of scientific research
April Clyburne-Sherin @april_cs
Center for Open Science @OSFramework
http://cos.io/
Technology to enable change
Training to enact change
Incentives to embrace change
Reproducible statistics in the health sciences
April Clyburne-Sherin
Reproducible Research Evangelist
[email protected]
Reproducible statistics in the health sciences
The problem with the published literature
• Reproducibility
• Power
• Reporting Bias
• Researcher degrees of freedom
The solution
• Preregistration
How to evaluate the published literature
• p-values
• Effect sizes and confidence intervals
How to preregister
• Open Science Framework
Reproducible statistics in the health sciences
Learning objectives
• The findings of many studies cannot be reproduced
• Low powered studies produce inflated effect sizes
• Low powered studies produce low chance of finding true positives
• Researcher Degrees of Freedom lead to inflated false positive rates
• Selective reporting biases the literature
• Preregistration is a simple solution for reproducible statistics
• A p-value is not enough to establish clinical significance
• Effect sizes plus confidence intervals work better together
Power in Neuroscience
Button et al. (2013)
Figure 1. Positive Results by Discipline.
Fanelli D (2010) “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068. doi:10.1371/journal.pone.0010068
The findings of many studies cannot be reproduced
Why should you care?
• To increase the efficiency of your own work
• Hard to build on our own work, or the work of others in our lab
• We may not have the knowledge we think we have
• Hard to even check this if reproducibility is low
Current barriers to reproducibility
● Statistical
  o Low power
  o Researcher degrees of freedom
  o Ignoring null results
● Transparency
  o Poor documentation
  o Loss of materials and data
  o Infrequent sharing
Low powered studies mean low chance of finding a true positive
● Low reproducibility due to power
  o 16% chance of finding the effect twice (see the arithmetic sketch below)
● Inflated effect size estimates
● Decreased likelihood of true positives
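A minimal arithmetic sketch of where a figure like 16% can come from (the 40% power value is an assumption for illustration, not from the slides): if each study has 40% power, the chance that two independent studies both detect a true effect is power squared.

    # Illustrative sketch: replication chance under low power.
    # The 40% power figure is an assumed value, not from the slides.
    power = 0.40             # assumed power of each individual study
    p_both = power ** 2      # chance BOTH independent studies detect the effect
    print(f"P(effect found in both studies) = {p_both:.2f}")  # 0.16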
Researcher Degrees of Freedom lead to inflated false positive rates
Simmons, Nelson, & Simonsohn (2012)
Selective reporting biases the literature
• Selective reporting
  o Outcome reporting bias
• 62% of trials had at least one primary outcome changed, introduced, or omitted
• 50%+ of pre-specified outcomes not reported
1. Chan, An-Wen, et al. "Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles." JAMA 291.20 (2004): 2457-2465.
2. Macleod, Malcolm R., et al. "Biomedical research: increasing value, reducing waste." The Lancet 383.9912 (2014): 101-104.
Why does selective reporting matter?
• Selective reporting
  o Outcome reporting bias
Response from a trialist who had analysed data on a prespecified outcome but not reported them:
“When we looked at that data, it actually showed an increase in harm amongst those who got the active treatment, and we ditched it because we weren’t expecting it and we were concerned that the presentation of these data would have an impact on people’s understanding of the study findings. …
The argument was, look, this intervention appears to help people, but if the paper says it may increase harm, that will, it will, be understood differently by, you know, service providers. So we buried it.”
Smyth, R. M. D., et al. "Frequency and reasons for outcome reporting bias in clinical trials: interviews with trialists." BMJ 342 (2011): c7153.
Solution: Pre-registration
• Before data is collected, specify:
  o The what of the study
    – Research question
    – Population
    – Primary outcome
    – General design
  o Pre-analysis plan
    – Information on the exact analysis that will be conducted
    – Sample size
    – Data processing and cleaning procedures
    – Exclusion criteria
    – Statistical analyses
● Registered in a read-only format and time-stamped (see the sketch below)
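To make this concrete, here is a minimal, hypothetical sketch of the fields a pre-analysis plan can capture, written as a plain Python data structure. The field names and values are illustrative, not an OSF schema.

    # Hypothetical pre-analysis plan, written down BEFORE data collection.
    # Field names and values are illustrative only.
    preregistration = {
        "research_question": "Does treatment X reduce systolic blood pressure?",
        "population": "Adults aged 40-65 with stage 1 hypertension",
        "primary_outcome": "Change in systolic BP at 12 weeks (mmHg)",
        "design": "Two-arm parallel randomized controlled trial",
        "sample_size": 200,  # justified by an a priori power analysis
        "exclusion_criteria": ["current antihypertensive use", "pregnancy"],
        "data_cleaning": "Winsorize readings beyond 3 SD; intention-to-treat",
        "statistical_analysis": "Two-sample t-test on change scores, alpha = 0.05",
    }
    # The plan is then registered read-only and time-stamped (e.g., on the
    # Open Science Framework) so it cannot be quietly altered later.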
Positive result rate dropped from 57% to 8% after preregistration was required.
Pre-registration in the health sciences
Evaluating the literature
A p-value is not enough to establish clinical significance
● Missing clinical insight such as treatment effect size, magnitude of change, or direction of the outcome
● Clinically significant differences can be statistically insignificant
● Clinically unimportant differences can be statistically significant
P-values
What is a p-value?
● The probability of obtaining data at least as extreme as yours if there is no treatment effect
● A significance level of α = 0.05 means there is a 95% probability that the researcher will correctly conclude that there is no treatment effect when there really is no treatment effect
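A small simulation sketch (illustrative, not from the slides) makes that long-run 5% false positive rate concrete: when both groups are drawn from the same distribution, about 5% of t-tests still come out “significant” at α = 0.05.

    # Simulate t-tests when the null is TRUE: no treatment effect exists.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n_sims, n = 0.05, 10_000, 30
    false_positives = 0
    for _ in range(n_sims):
        a = rng.normal(0, 1, n)   # both groups from the SAME distribution
        b = rng.normal(0, 1, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1
    print(f"False positive rate: {false_positives / n_sims:.3f}")  # ~0.05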
P-values
What is a p-value?
● Generally leads to dichotomous thinking
  o Either something is significant or it is not
● Influenced by the number and variability of subjects
● Changes from one sample to the next
The dance of the p-values
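The “dance” can be reproduced in a few lines: repeated samples from the same population, with a real effect, give wildly different p-values. A minimal sketch (the effect size and sample size are assumed values):

    # Repeated identical experiments with a real effect: watch p bounce.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_effect, n = 0.5, 32      # assumed Cohen's d and per-group sample size
    for i in range(10):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(true_effect, 1.0, n)
        p = stats.ttest_ind(treated, control).pvalue
        print(f"replication {i + 1}: p = {p:.3f}")
    # Typical output ranges from p < .001 to p > .5 across identical replications.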
P-values
A p-value is not enough to establish clinical significance
● P-values should be considered along with
  o Effect size
  o Confidence intervals
  o Power
  o Study design
Effect Size
● A measure of the magnitude of interest; tells us ‘how much’
● Generally leads to thinking about estimation, rather than a dichotomous decision about significance
● Often combined with confidence intervals (CIs) to give us a sense of how much uncertainty there is around our estimate (see the sketch below)
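As a sketch of reporting magnitude with its uncertainty, here is Cohen’s d with an approximate 95% CI. The normal-approximation standard error is a common textbook formula, assumed here for simplicity.

    # Cohen's d with an approximate 95% CI (normal-approximation SE).
    import numpy as np

    def cohens_d_with_ci(x, y, z=1.96):
        nx, ny = len(x), len(y)
        pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) +
                             (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
        d = (np.mean(x) - np.mean(y)) / pooled_sd
        se = np.sqrt((nx + ny) / (nx * ny) + d**2 / (2 * (nx + ny)))
        return d, d - z * se, d + z * se

    rng = np.random.default_rng(2)
    x, y = rng.normal(0.5, 1, 40), rng.normal(0.0, 1, 40)
    d, lo, hi = cohens_d_with_ci(x, y)
    print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")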
Confidence Intervals
● Provide a ‘plausible’ range for the effect size in the population
  o In 95% of the samples you draw from a population, the interval will contain the true population effect (simulated below)
    – Not the same thing as saying that 95% of the sample ESs will fall within the interval
● Can also be used for NHST
  o If 0 falls outside of the CI, then your test will be statistically significant
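The coverage claim can be checked by simulation: across many repeated samples, roughly 95% of the 95% CIs contain the true population mean (not 95% of the sample values). A minimal sketch:

    # Check CI coverage: how often does the 95% CI contain the true mean?
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    true_mean, n, n_sims = 10.0, 25, 10_000
    covered = 0
    for _ in range(n_sims):
        sample = rng.normal(true_mean, 2.0, n)
        se = sample.std(ddof=1) / np.sqrt(n)
        margin = stats.t.ppf(0.975, df=n - 1) * se
        if sample.mean() - margin <= true_mean <= sample.mean() + margin:
            covered += 1
    print(f"Coverage: {covered / n_sims:.3f}")  # ~0.95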
Better together
● Why should you always report both effect sizes and CIs?
  o Effect sizes, like p-values, are bouncy
  o A point estimate can convey an invalid sense of certainty about your ES
● CIs give you additional information about the plausible upper and lower bounds of bouncing ESs
Better together
So why use ESs + CIs?
● Give you more fine-grained information about your data
  o Point estimates, plausible values, and uncertainty
● Give more information for replication attempts
● Used for meta-analytic calculations, so are more helpful for accumulating knowledge across studies
Low powered studies still produce inflated effect sizes
● If I use ESs and CIs rather than p-values, do I still have to worry about sample size?
  o Underpowered studies tend to over-estimate the ES (see the simulation sketch below)
  o Larger samples will lead to better estimation of the ES and smaller CIs
    – They will have higher levels of precision
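A simulation sketch of this inflation (the true effect and sample size are assumed values): among the results that reach significance in a small study, the estimated effect sizes systematically overshoot the truth.

    # "Winner's curse": significant results from small studies overestimate d.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    true_d, n_small, n_sims = 0.3, 20, 5_000  # assumed true effect, group size
    significant_ds = []
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_small)
        treated = rng.normal(true_d, 1.0, n_small)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
            significant_ds.append((treated.mean() - control.mean()) / pooled_sd)
    print(f"True d = {true_d}; mean significant d = {np.mean(significant_ds):.2f}")
    # The significant subset averages well above the true effect of 0.3.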
Precision isn’t cheap
● To get high precision (narrow CIs) in any one study, you need large samples
  o Example: You need about 250 people to get an accurate, stable estimate of the ES in psychology
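The cost follows from the square-root law: the half-width of a 95% CI for a mean shrinks only as 1/√n, so halving the interval requires quadrupling the sample. A minimal sketch:

    # CI half-width versus sample size for a mean (SD assumed to be 1).
    import numpy as np

    sd, z = 1.0, 1.96
    for n in (25, 100, 250, 1000):
        half_width = z * sd / np.sqrt(n)
        print(f"n = {n:>4}: 95% CI half-width = {half_width:.2f} SD units")
    # Quadrupling n only halves the CI width: precision isn't cheap.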
Free training on how to make research more reproducible
http://cos.io/stats_consulting
Find this presentation at
https://osf.io/rwtyf/
Questions: [email protected]