Statistical meta-evaluation: A role for mixed evidence?


Transcript

External validity: What role for short-cut impact assessment?
(Mixing types of trees to see the forest)
Tanguy Bernard, AFD
Ruth Hill, IFPRI
Doing what works
Seeing the forest and picking the trees
• Banerjee and He (2008)
– List projects that have been shown to work (with 'internally valid' studies).
– Argue for scaling them up at the global level before anything else is done.
• Yet, things proven effective somewhere may not be effective elsewhere (and vice versa):
– Greenberg et al. (2003), US: welfare programs have different results across sites.
– Attanasio et al. (2004), Mexico: impact of Progresa three times larger in richer states.
– Differences are likely greater across countries.
• Internal validity: make sure that Δx caused Δy.
• External validity: Δx in similar circumstances should also lead to Δy.
•
Unknown: what is meant by similar circumstances. Need to know how impact
varies with
– Type of environments (1)
– Program modalities (2)
and Interactions of (1) and (2)
Risk: missing the forest for the trees (Bardhan, 2005)
→ This paper: one way to raise the number of trees (by using other types of trees)
Using existing trees (1):
Same objective, diverse approaches
[Figure: cost per additional year of schooling, by intervention. Source: Duflo, 2009 lecture]
Using existing trees (2):
Same approach, various environments
Program modality, beneficiaries, environment and impact of CCT projects:

Condition | Program | Country    | Beneficiaries | Initial enrollment (%) | Impact
E         | FSSAP   | Bangladesh | G             | 44.1                   | 12
E         | JFPR    | Cambodia   | G             | 65                     | 31.3
E         | CESSP   | Cambodia   | G+B           | 65                     | 21.4
E+H       | CS      | Chile      | G+B           | 60.7                   | 7.5
E+H       | FA      | Colombia   | G+B           | 91.7                   | 2.1
E+H       | BDH     | Ecuador    | G+B           | 75.2                   | 10.3
E+H       | PAF     | Honduras   | G+B           | 66.4                   | 3.3
E+H       | Opp.    | Mexico     | G+B           | 94                     | 1.9
E+H       | AC      | Nicaragua  | G+B           | 90.5                   | 6.6
E+H       | RPS     | Nicaragua  | G+B           | 72                     | 12.8
E+H       | SRMP    | Turkey     | G+B           | 87.9                   | -3

Condition: Education (E), Health (H). Beneficiaries: Girls (G), Boys (B). Transfer generosity (% of consumption) ranges from 0.6 to 29.3 across these programs; most impact estimates are significant at the 5% (**) or 1% (***) level.
Source: Fiszbein and Schady, 2009
Moreover: some programs are pilots, others are national; some are rural, others urban; some have strong monitoring and associated sanctions, others do not; etc.
Using existing trees (3):
Deriving general lessons
Category   | Location          | Intervention                                   | Result
User fees  | Rural Kenya       | Deworming pills: $0.30 vs. free                | ↓ take-up 82%
User fees  | Peri-urban Zambia | Water disinfectant priced below market         | Price elasticity: -0.6
User fees  | Rural Kenya       | Mosquito nets: $0 vs. $0.75                    | ↓ take-up 75%
User fees  | Rural Kenya       | School uniforms: $0 vs. $5.82                  | ↑ attendance 7%-15%
Incentives | Rural Mexico      | Variants of Progresa (50%-75% of school costs) | ↑ attendance, esp. older kids
Incentives | Bogota, Colombia  | –                                              | ↑ attendance, more if rewarded
Incentives | Rural Kenya       | Free school meals                              | ↑ attendance by 31%
Incentives | Rural Kenya       | Merit scholarship                              | ↑ test scores
Incentives | Rural Malawi      | HIV tests: $0 to $3                            | ↑ attendance 8.9% per $1; ↑↑ attendance between $0 and $0.10

Adapted from Kremer and Holla, 2008
Conclusion: while generally consistent with human capital theory, there is some evidence of peer effects and time-inconsistent preferences.
If we had more of these trees…
Meta-regressions
• With a large enough number of RCTs, one can estimate a regression (a sketch follows below):
$y_i = \beta_0 + E_i'\beta_1 + P_i'\beta_2 + (E_i \cdot P_i)'\beta_3 + \varepsilon_i$
• E: Vector of context characteristics
– Pre-intervention level of outcome
– Site characteristics (e.g. rural/urban, drought prone/moisture reliable)
– Period characteristics (e.g. economic growth)
• P: Vector of project characteristics
– Targeting (e.g. gender targeting, geographical targeting)
– Intensity (e.g. per capita size of project, scale: national or local)
– Modalities (e.g. free vs co-payment, type of condition, who implemented)
• Problem: need enough (project-level) observations to run the regression
– Replication problems with RCTs (public goods, resources, publications, etc.)
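In practice this meta-regression is an OLS of estimated impacts on context characteristics, project characteristics, and their interactions. A minimal sketch, assuming a hypothetical project-level dataset (the file name, column names, and use of statsmodels are illustrative, not part of the paper):

```python
# Illustrative meta-regression: y_i = b0 + E_i'b1 + P_i'b2 + (E_i*P_i)'b3 + e_i,
# where each row of the (hypothetical) dataset is one evaluated project.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("projects.csv")  # hypothetical file: one row per evaluated project

formula = (
    "impact ~ baseline_enrollment + rural + gdp_growth"   # E: context characteristics
    " + girls_only + transfer_share + national_scale"     # P: project characteristics
    " + baseline_enrollment:transfer_share"               # selected E x P interactions
)

# Robust standard errors, since project-level impact estimates vary in precision
fit = smf.ols(formula, data=df).fit(cov_type="HC1")
print(fit.summary())
```

With only a handful of RCTs per sector the design matrix is close to singular, which is exactly the 'need enough observations' problem noted above.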
There are other types of trees
Project evaluations
• Habicht and Vaughan (1999):
– Adequacy: « Did the expected change occur? »
– Plausibility: « Did the program seem to have an effect above and beyond external influences? »
– Probability: « Did the program have an effect with probability < x%? »
• Donors collect independent information on project effectiveness:
– WB: 25% of projects (~70/year) evaluated by non-project staff (6 weeks, with field mission)
– EBRD: 76% of projects independently evaluated since 2003
– UNDP: until 1999, all projects greater than $1 million independently evaluated
– ADB: 40% independently evaluated
– AFD: all projects evaluated by external consultants, in the field (~70/year)
– …
• Different types of trees…
– Impact is most often assessed judgmentally, with no specific data collection
– Sometimes rated on a scale (satisfactory, unsatisfactory, etc.), sometimes not
• Generally low use.
There are other types of trees
Recent meta-evaluation
Dependent variable: outcome rating of individual projects (satisfactory/unsatisfactory scale)

            | WB (planned) | WB (actual) | ADB (planned) | ADB (actual)
% funding   | 0.064        | -0.123 ***  | -0.585        | -0.837
            | [0.090]      | [0.038]     | [0.54]        | [0.49]
Length      | 0.0054       | -0.0722 **  | 0.01          | -0.0705
            | [0.016]      | [0.035]     | [0.016]       | [0.038]
n           | 664          | 136         | 664           | 137
Adjusted R² | 0.26         | 0.2         | 0.24          | 0.18

Other controls (dummies): sector, country, year of approval, year of closing
Source: Banerjee and He, 2008
There are other types of trees
Measurement error problems (cf. Hyslop and Imbens, 2001)
• Classical measurement error (CME): the error is independent of the true value
– E.g. error due to an imprecise measurement tool
• Optimal prediction error (OPE(1)): the error is independent of the reported value
– The agent reporting the data is fully aware of the imprecision of his/her tool and reports his/her best estimate, given his/her information set.
• Critical: the agent's awareness.
– If he/she is aware of not having the exact information, he/she will understand the question « what is the value of X? » as « what is your best guess? ».
– Knowing this helps infer the type of error and the associated bias.
• Important: correlation of errors in the outcome and in the independent variables. Bias is lower if there is no correlation.
There are other types of trees
Measurement errors and biases
If there is no correlation between measurement errors in the dependent and independent variables:
Scenario | Error structure         | Classical    | OPE (1)      | OPE (2)
1        | No error                | no bias      | no bias      | no bias
2        | Error in regressor only | towards zero | no bias      | away from zero
3        | Error in outcome only   | no bias      | towards zero | no bias
4        | Error in both           | towards zero | towards zero | away from zero

Source: Hyslop & Imbens, 2001
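The 'towards zero' and 'no bias' entries for classical error can be illustrated with a short simulation; the data-generating process below is purely hypothetical and only meant to show why classical error in the regressor attenuates the OLS slope while classical error in the outcome does not:

```python
# Simulation: classical measurement error (CME) in the regressor biases the OLS
# slope towards zero; CME in the outcome leaves it unbiased (only noisier).
# Hypothetical data-generating process, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0

x = rng.normal(size=n)               # true regressor
y = beta * x + rng.normal(size=n)    # true outcome

x_noisy = x + rng.normal(size=n)     # classical error added to the regressor
y_noisy = y + rng.normal(size=n)     # classical error added to the outcome

def ols_slope(x, y):
    """Univariate OLS slope estimate."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("true slope:              ", beta)
print("error in regressor only: ", round(ols_slope(x_noisy, y), 3))  # ~1.0: towards zero
print("error in outcome only:   ", round(ols_slope(x, y_noisy), 3))  # ~2.0: no bias
```

Here the noise has the same variance as the true regressor, so the attenuation factor is 1/2.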
→ First best: no bias
→ Conservative position: avoid the « away from zero » cases
→ Avoid correlation of errors between outcome and regressors
There are other types of trees
What types of errors could we have?
• Outcome variables
– Mainstreaming: no bias in the projects selected to be evaluated, and raises the number of observations
– Comparability (across projects, across agencies): all measures carry the same meaning
– Credibility: independence, and evaluators trained in the typical problems of impact assessment (missing data)
→ OPE(1)
• Independent variables
– Initial level from administrative statistics (→ CME)
– Context typologies, e.g. development domains (Chamberlin, Pender, Yu, 2005), micro-regions (Torero, 2007) (→ CME)
– Project design: no error
Mixing the trees…
• Use RCTs to test unbiasedness: is $\gamma$ significant? (sketched below)

$y_i = \beta_0 + E_i'\beta_1 + P_i'\beta_2 + (E_i \cdot P_i)'\beta_3 + \gamma T_s + \varepsilon_i$

where $T_s$ indicates the type of evaluation (RCT vs. short-cut).
• Weak correlation between evaluation type and project characteristics is necessary if the parameters are to be identified.
→ A number of randomly selected projects should be evaluated with RCTs.
• Use RCTs to test the prediction performance of the meta-analysis.
→ Requires that a large number of RCTs be implemented as well.
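A minimal sketch of the unbiasedness test above: pool RCT-based and short-cut estimates, add the indicator T_s, and test whether its coefficient γ differs from zero. The pooled dataset and column names are hypothetical, not from the paper:

```python
# Test of unbiasedness: regress impact on E, P, E x P and an RCT indicator T_s,
# then test H0: gamma = 0 (RCT and short-cut evaluations agree, conditional on E and P).
# Column names and the pooled dataset are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pooled_evaluations.csv")  # hypothetical: RCT + short-cut estimates

formula = (
    "impact ~ baseline_enrollment + rural + girls_only + transfer_share"
    " + baseline_enrollment:transfer_share"
    " + is_rct"                             # T_s: 1 if the estimate comes from an RCT
)
fit = smf.ols(formula, data=df).fit(cov_type="HC1")

print("gamma:", fit.params["is_rct"], "p-value:", fit.pvalues["is_rct"])

# Identification requires that is_rct be only weakly correlated with context and
# project characteristics, e.g. projects drawn at random for RCT evaluation.
```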
Recommendations
• Systematic evaluation of projects
• Harmonize outcome indicators
– Harmonize with RCTs
– Train evaluators in the problems of impact evaluation (missing counterfactual) and encourage judgement-based corrections
– Define ‘quality’ standards
• Harmonize project-level indicators
• Global dataset of environmental indicators
• Centralize information
Conclusion
1. Pure statistical learning is unlikely. But with some theory of what 'similar' means, meta-evaluations can be used for tests.
→ One of the tools towards external validity
2. Short-cut evidence can be used to raise the number of observations, although at the cost of potential biases.
3. Under certain conditions, these biases may be well understood, and meta-evaluation results informative.
4. Overcoming these biases requires coordination.
Thank you