Pre-analysis plans - Running Randomized Evaluations

Pre-analysis plans
Module 8.3
Recap on statistics
• If we find a result is significant at the 5% level,
what does this mean?
– if there were truly no effect, there is a 5% or less
probability of seeing a result this large by chance
• If we test 10 independent hypotheses and every null
is true, there is a 40% chance we will reject the null
at the 5% level at least once
– i.e. a 40% chance of finding at least one hypothesis
significant at the 5% level by chance alone
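
The 40% figure follows from 1 - 0.95^10. A minimal sketch of that arithmetic, assuming the tests are independent and every null hypothesis is true (the function name is illustrative, not from the course material):

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false rejection across n independent tests
    when every null hypothesis is true."""
    # Probability that all n tests correctly fail to reject is (1 - alpha)^n.
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 10, 20):
    print(n, round(familywise_error_rate(n), 3))
```

With one test the rate is the nominal 5%; it climbs quickly as more hypotheses are tested.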
Publication bias: what is it?
• Evaluations with positive or negative significant
impacts more likely to get published
• If enough evaluations are run on a given type of
program, some will give a positive result by chance
• Those with a vested interest may even deliberately
run many studies and publicize only some of the
results
• Published results will suggest the program is
effective even if it isn’t
Study registers as a solution
• All studies are registered before the results are
known
• Shows how many studies examined a type of
program
• If 20 studies are started on a given question and
only one positive study is published, we may
worry the others found no result
• Incentives for registration
– Some medical journals require registration
before study started for publication
• New registries for RCTs in social sciences
– E.g. by the American Economic Association
Coverage of registries
• Q: what are the advantages of having a
registry that only covers RCTs?
• Q: Is publication bias only a problem for RCTs?
– If not, why not have a registry for non-RCTs?
• Q: Should registration only be allowed before
the start of a study?
Data mining: what is it?
• Looking at the data many different ways, trying to find
the result you want
• If we test the impact of a program on many different
outcomes, some will show a positive (or negative) impact
by chance
• If we test the impact of a program on many different
subgroups, some will show a positive (or negative)
impact by chance
• We may be falsely accused of data mining
– E.g. we test one subgroup and report the results but readers
think maybe we tested many and only reported the one
that was significant
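
The outcome-mining problem can be illustrated with a small simulation. This is a sketch under stated assumptions: the program has zero true effect on every outcome, outcomes are independent standard normal draws, and significance is judged with a normal-approximation test. All names and sample sizes are illustrative.

```python
import math
import random


def two_sided_p_value(treat, comp):
    """Two-sided p-value for a difference in means (normal approximation)."""
    n_t, n_c = len(treat), len(comp)
    mean_t, mean_c = sum(treat) / n_t, sum(comp) / n_c
    var_t = sum((x - mean_t) ** 2 for x in treat) / (n_t - 1)
    var_c = sum((x - mean_c) ** 2 for x in comp) / (n_c - 1)
    z = (mean_t - mean_c) / math.sqrt(var_t / n_t + var_c / n_c)
    return math.erfc(abs(z) / math.sqrt(2))


def share_with_false_positive(n_outcomes, n_studies=400, n_per_arm=50,
                              alpha=0.05, seed=0):
    """Share of simulated studies reporting at least one 'significant'
    outcome even though the true effect is zero everywhere."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            treat = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            comp = [rng.gauss(0, 1) for _ in range(n_per_arm)]
            if two_sided_p_value(treat, comp) < alpha:
                hits += 1
                break  # one significant outcome is enough to "find" an impact
    return hits / n_studies


print("1 outcome:  ", share_with_false_positive(1))
print("10 outcomes:", share_with_false_positive(10))
```

The same logic applies to subgroups: each extra cut of the data is another chance for a spurious result.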
Preanalysis plans as a solution
• Write down in advance how the data will be
analyzed
– What outcomes are of primary interest
– What subgroups will be examined
• Register the plan with some external organization
• When presenting results, show all those covered
in the PAP
– highlight any deviations from or additions to the PAP
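
One way to make "show everything in the PAP and highlight deviations" concrete is to compare the reported analyses against the registered ones. The sketch below is hypothetical: the class names, the `audit` method, and the example outcomes are assumptions for illustration, not part of any registry's tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Analysis:
    """One pre-specified test: an outcome measured for a subgroup."""
    outcome: str
    subgroup: str = "all"


@dataclass
class PreAnalysisPlan:
    primary: frozenset = field(default_factory=frozenset)
    secondary: frozenset = field(default_factory=frozenset)

    def audit(self, reported):
        """Compare what was reported against what was planned."""
        planned = set(self.primary) | set(self.secondary)
        reported = set(reported)
        by_name = lambda a: (a.outcome, a.subgroup)
        return {
            # Planned analyses that the write-up omits.
            "planned_but_missing": sorted(planned - reported, key=by_name),
            # Analyses not in the PAP; report them, but flag them as additions.
            "additions_to_flag": sorted(reported - planned, key=by_name),
        }


# Hypothetical plan and write-up.
plan = PreAnalysisPlan(
    primary=frozenset({Analysis("trust"), Analysis("participation", "women")}),
    secondary=frozenset({Analysis("participation", "youth")}),
)
reported = [
    Analysis("trust"),
    Analysis("participation", "women"),
    Analysis("trust", "youth"),
]
result = plan.audit(reported)
for label, items in result.items():
    print(label, [(a.outcome, a.subgroup) for a in items])
```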
When is a PAP most useful?
• When a study has a large number of outcomes
with no obvious hierarchy of which are the most
important
• When researchers know they are interested in
differential impact on different subgroups
(heterogeneous treatment effects)
• When researchers are concerned others will push
them to find positive impacts
• When researchers want to adjust statistical tests for
multiple hypothesis tests
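
The slides do not say which adjustment to use. As an illustration only, the sketch below implements two standard corrections, Bonferroni and Holm, on made-up p-values. Knowing the number of tests in advance, which a PAP pins down, is what makes either adjustment well defined.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment: less conservative than Bonferroni,
    still controls the chance of any false rejection."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value never falls below
        # the adjusted value of a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Made-up p-values for four pre-specified outcomes.
raw = [0.01, 0.04, 0.03, 0.20]
print("Bonferroni:", [round(p, 3) for p in bonferroni(raw)])
print("Holm:      ", [round(p, 3) for p in holm(raw)])
```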
Incentives for immunization (no PAP)
• Program to increase immunization in rural India with two arms
– Predictable and reliable immunization camps
– Camps plus 1 kg of lentils for each shot and a set of plates on
completion of full immunization
• One main outcome (immunization), more than one indicator
– Ever received any immunization
– % with full immunization
• Showing two indicators illustrated why the program worked
– Regular camps increased children with at least one shot
– Incentive increased % with full immunization
– Suggests unwillingness to immunize was not the main barrier to full
immunization; instead, the incentive helped maintain persistence
Results: number of immunizations
Source: Banerjee et al 2010
Results: fully immunized
Source: Banerjee et al 2010
Community driven development (with PAP)
• Evaluation of the impact of a CDD program including:
– Quality and quantity of public goods
– Social capital
– Trust
– Participation
• Each outcome area had many indicators, e.g. many public goods
– >300 outcome indicators in total
• Several important subgroups to examine:
– Women and youth primary targets
– Program implemented in two very different regions
When should a PAP be written?
• Before the baseline?
– Prevents us from learning about what questions have low
response rates or inconsistent answers
• Before the program starts?
– Prevents researchers from taking advantage of random shocks
that happen to mainly affect treatment or comparison
– Also prevents researchers from adding hypotheses they think of
later, e.g. unanticipated negative consequences
• Before looking at any data?
– Can be useful to look at comparison data to determine
appropriate control variables
– Can drop variables where there is little chance of improvement
(e.g. 95% of the comparison group already do so)
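
Screening outcomes on comparison-group data could look like the sketch below. The 95% ceiling comes from the example above; the function name, outcome names, and rates are hypothetical.

```python
def outcomes_with_room_to_improve(comparison_rates, ceiling=0.95):
    """Keep binary outcomes whose comparison-group rate is below the ceiling.

    Uses comparison-group data only, so the choice cannot be influenced
    by the estimated treatment effect.
    """
    return {name: rate for name, rate in comparison_rates.items()
            if rate < ceiling}


# Hypothetical comparison-group rates for three binary outcomes.
rates = {"outcome_a": 0.97, "outcome_b": 0.62, "outcome_c": 0.41}
kept = outcomes_with_room_to_improve(rates)
print(sorted(kept))
```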
What should be covered in a PAP?
• Main outcome measures
• Which outcome measures are primary and which
secondary
• The precise composition
• Subgroups to be analyzed
• Direction of expected impact if we want to use a one-sided test
• Primary specification to be used
– What control variables to include
– How the outcome variable is specified: in logs, changes, etc.
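
Why the expected direction has to be written down in advance can be seen by comparing one-sided and two-sided p-values for the same test statistic. This is a minimal sketch assuming a standard normal test statistic; the function name and the value z = 1.8 are illustrative.

```python
import math


def p_values_from_z(z: float) -> dict:
    """One-sided (expected effect positive) and two-sided p-values
    for a standard normal test statistic."""
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return {
        "one_sided_positive": upper_tail,
        "two_sided": math.erfc(abs(z) / math.sqrt(2)),
    }


p = p_values_from_z(1.8)
print(round(p["one_sided_positive"], 3), round(p["two_sided"], 3))
```

A statistic of 1.8 clears the 5% threshold one-sided but not two-sided. Choosing the one-sided test after seeing the sign of the estimate would therefore overstate significance.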
Disadvantages of a PAP
• Any analysis that is not included in the PAP has less
credibility
– Only do a PAP if you have the time to think it through
carefully
• Sometimes patterns in the data tell consistent stories we
never thought of
– We might want the flexibility to pursue these
• With complex evaluations it can be hard to think through
all outcome combinations and how analysis should
proceed with each
– We may do different secondary analysis if the impact is
positive vs. negative
– One option is to do the PAP in stages: look at some data, then
write another PAP