Randomized Evaluations: Applications
Matching Methods & Propensity Scores
Kenny Ajayi
October 27, 2008
Global Poverty and Impact Evaluation
Program Evaluation Methods
RANDOMIZATION (EXPERIMENTS)
QUASI-EXPERIMENTS
Regression Discontinuity
Matching, Propensity Score
Difference-in-Differences
Matching Methods
Creating a counterfactual
To measure the effect of a program, we want to
measure
E[Y | D = 1, X] - E[Y | D = 0, X]
but we only observe one of these outcomes for
each individual.
Evaluation Exercise
Argentine Antipoverty Program
Basic Idea
Match each participant (treated) with
one or more nonparticipants (untreated)
with similar observed characteristics
Counterfactual = matched comparison group
(i.e. nonparticipants with same characteristics as
participants)
Illustrate Example
Basic Idea
This assumes that there is no selection bias
based on unobserved characteristics
i.e. there is “selection on observables” and
participation is independent of outcomes once we
control for observable characteristics (X)
What might some of these unobserved
characteristics be?
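One way to see why unobserved characteristics matter is to write the naive comparison in potential-outcomes notation. The symbols Y1 and Y0 (an individual's outcome with and without the program) are an assumption introduced here for illustration; they do not appear on the slides.

```latex
\begin{aligned}
E[Y \mid D = 1] - E[Y \mid D = 0]
  &= \underbrace{E[Y_1 - Y_0 \mid D = 1]}_{\text{effect on participants}}
   + \underbrace{E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0]}_{\text{selection bias}}
\end{aligned}
```

Under selection on observables, the second term is assumed to be zero once we condition on X, so the matched comparison recovers the first term.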
Propensity Score
When the set of observed variables is
large, we match participants with
nonparticipants using a summary measure:
the propensity score: the probability of participating
in the program (being treated), as a function of the
individual’s observed characteristics
P(X) = Prob(D = 1|X)
D indicates participation in project
X is the set of observable characteristics
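When X takes only a few discrete values, the definition P(X) = Prob(D = 1|X) can be illustrated directly as the share of participants within each cell of X. The sketch below uses hypothetical records and a hypothetical helper name (empirical_propensity); it is only meant to make the definition concrete, not to reproduce any estimate from the slides.

```python
from collections import defaultdict

# Hypothetical records: (x, d), where x is a tuple of observed
# characteristics and d = 1 for participants, 0 for nonparticipants.
records = [
    (("rural", "low_income"), 1),
    (("rural", "low_income"), 1),
    (("rural", "low_income"), 0),
    (("rural", "high_income"), 0),
    (("rural", "high_income"), 1),
    (("urban", "low_income"), 0),
    (("urban", "low_income"), 0),
    (("urban", "low_income"), 1),
    (("urban", "low_income"), 0),
]


def empirical_propensity(records):
    """Return {x: share of units with characteristics x that have D = 1}."""
    counts = defaultdict(lambda: [0, 0])  # x -> [participants, total]
    for x, d in records:
        counts[x][0] += d
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


scores = empirical_propensity(records)
for x, p in sorted(scores.items()):
    print(x, round(p, 3))
```

With many or continuous characteristics the cells become too sparse for this to work, which is why a model such as a logit is normally used instead.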
Propensity Score
We maintain the assumption of selection
on observables:
i.e., assume that participation is independent of
outcomes conditional on Xi
E (Y|X, D = 1) = E (Y|X, D = 0)
if there had not been a program
This is false if there are unobserved
characteristics affecting both participation and outcomes
Evaluation Exercise
Argentine Antipoverty Program
Propensity Score Matching
1. Get representative and comparable data
on participants and nonparticipants
(ideally using the same survey & a similar time period)
2. Estimate the probability of program
participation as a function of observable
characteristics
(using a logit or other discrete choice model)
Jalan and Ravallion (2003)
3. Use predicted values from estimation to
generate propensity score p(xi)
for all treatment and comparison group members
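Steps 2 and 3 can be sketched as follows: fit a logit of participation on observed characteristics, then use the fitted probabilities as propensity scores. This is a minimal pure-Python illustration using gradient ascent on the log-likelihood; the data, the function names (fit_logit, propensity_scores), and the learning-rate settings are all hypothetical. In practice a statistics package would be used for the estimation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_index(beta, x):
    # beta[0] is the intercept; beta[1:] are the coefficients on x.
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logit(X, d, lr=0.5, n_iter=5000):
    """Fit Prob(D = 1 | X) = sigmoid(b0 + b.x) by gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            err = di - sigmoid(linear_index(beta, x))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted participation probability p(xi) for every unit."""
    return [sigmoid(linear_index(beta, x)) for x in X]


# Hypothetical data: x = (scaled household size, poor indicator).
X = [(0.2, 1), (0.4, 1), (0.5, 0), (0.9, 1),
     (0.1, 0), (0.3, 0), (0.8, 0), (0.7, 1)]
d = [1, 1, 0, 1, 0, 0, 1, 0]  # 1 = participant, 0 = nonparticipant

beta = fit_logit(X, d)
p_hat = propensity_scores(X, beta)
```

Scores are generated for participants and nonparticipants alike, since both groups are needed for the matching step.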
Propensity Score Matching
4. Match participants: find a sample of
non-participants with similar p(xi)
Restrict samples to ensure common support
Common Support
[Figure: densities of propensity scores for nonparticipants and for participants, plotted against the propensity score from 0 (low probability of participating, given X) to 1 (high probability of participating, given X), with the region of common support marked.]
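Restricting to common support can be sketched with a simple minimum/maximum rule: keep only units whose scores fall inside the range where both groups have scores. This rule is one common choice and an assumption here, since the slides do not specify how the region is determined. The scores and function names are hypothetical.

```python
def common_support(scores_treated, scores_control):
    """Return (low, high), the overlap of the two groups' score ranges."""
    low = max(min(scores_treated), min(scores_control))
    high = min(max(scores_treated), max(scores_control))
    if low > high:
        raise ValueError("the two groups have no common support")
    return low, high


def restrict(scores, low, high):
    """Keep only the scores that lie inside the common-support region."""
    return [p for p in scores if low <= p <= high]


# Hypothetical propensity scores.
treated = [0.35, 0.50, 0.62, 0.80, 0.93]
control = [0.05, 0.12, 0.30, 0.41, 0.66, 0.71]

low, high = common_support(treated, control)
treated_kept = restrict(treated, low, high)
control_kept = restrict(control, low, high)
```

Here the participants with very high scores and the nonparticipants with very low scores are dropped, because no comparable unit exists in the other group.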
Propensity Score Matching
Determine a tolerance limit:
how different can matched control individuals
or villages be?
Decide on a matching technique
Nearest neighbors, nonlinear matching,
multiple matches
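A tolerance limit and a nearest-neighbor technique can be combined as in the sketch below. It matches each participant to the nonparticipant with the closest propensity score, with replacement, and leaves a participant unmatched if the closest score is farther away than the tolerance (often called a caliper). The unit ids, scores, and the 0.05 tolerance are hypothetical.

```python
def nearest_neighbor_match(treated, controls, caliper=0.05):
    """Match each treated unit to the control with the closest score.

    treated, controls: dicts mapping a unit id to its propensity score.
    Matching is with replacement. Treated units whose nearest control is
    farther away than `caliper` are left unmatched.
    """
    matches = {}
    for t_id, t_score in treated.items():
        c_id = min(controls, key=lambda c: abs(controls[c] - t_score))
        if abs(controls[c_id] - t_score) <= caliper:
            matches[t_id] = c_id
    return matches


# Hypothetical propensity scores.
treated = {"T1": 0.30, "T2": 0.55, "T3": 0.90}
controls = {"C1": 0.28, "C2": 0.52, "C3": 0.60, "C4": 0.70}

matches = nearest_neighbor_match(treated, controls)
print(matches)
```

A tighter tolerance gives closer matches but leaves more participants unmatched, so the choice trades match quality against sample size.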
Propensity Score Matching
5. Once matches are made, we can
calculate impact by comparing the
means of outcomes across
participants and their matches
The difference in outcomes for each participant
and its match is the estimate of the gain due to the
program for that observation.
Calculate the mean of these individual gains to
obtain the average overall gain.
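The calculation in step 5 can be sketched as follows, assuming one match per participant. The outcome values and ids are hypothetical.

```python
def average_gain(outcomes_treated, outcomes_control, matches):
    """Return (mean gain, list of individual gains) over matched pairs.

    matches maps each participant id to the id of its matched
    nonparticipant.
    """
    gains = [outcomes_treated[t] - outcomes_control[c]
             for t, c in matches.items()]
    return sum(gains) / len(gains), gains


# Hypothetical outcomes (e.g. income) and matches.
outcomes_treated = {"T1": 120.0, "T2": 95.0, "T3": 110.0}
outcomes_control = {"C1": 100.0, "C2": 90.0, "C3": 105.0}
matches = {"T1": "C1", "T2": "C2", "T3": "C3"}

mean_gain, gains = average_gain(outcomes_treated, outcomes_control, matches)
```

Keeping the individual gains, and not only their mean, is what makes it possible to look at how the estimated effect varies across participants.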
Possible Scenarios
Case 1: Baseline Data Exists
If we arrive at baseline, we can match participants with
nonparticipants using baseline characteristics.
Case 2: No Baseline Data
If we arrive afterwards, we can only match participants
with nonparticipants using time-invariant
characteristics.
Extensions
Matching at baseline can be very useful:
For Estimation:
Use baseline data for matching then combine with other
techniques (e.g. difference-in-differences strategy)
Know the assignment rule, then match based on this rule
For Sampling:
Select non-randomized (but matched) evaluation samples
Be cautious of ex-post matching
Avoid matching on variables that change due to program
participation (i.e. endogenous variables)
What are some time-invariant characteristics?
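Combining baseline matching with a difference-in-differences strategy can be sketched as below: for each matched pair, the change in the nonparticipant's outcome is subtracted from the change in the participant's outcome, and the results are averaged. The function name and the numbers are hypothetical.

```python
def matched_did(pairs):
    """Average difference-in-differences over matched pairs.

    pairs: list of (treated_before, treated_after,
                    control_before, control_after).
    """
    diffs = [(t1 - t0) - (c1 - c0) for t0, t1, c0, c1 in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical outcomes for three pairs matched on baseline characteristics.
pairs = [
    (50.0, 70.0, 48.0, 60.0),
    (40.0, 55.0, 42.0, 50.0),
    (60.0, 75.0, 61.0, 70.0),
]

did_estimate = matched_did(pairs)
```

Differencing removes any fixed gap in outcome levels between a participant and its match, so the estimate relies on matched units following similar trends and not on their having identical levels.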
Key Factors
Identification Assumption
Selection on Observables: After controlling for
observables, treated and control groups are not
systematically different
Data Requirements
Rich data on as many observable characteristics as
possible
Large sample size (so that it is possible to find
appropriate match)
Additional Considerations
Advantages
Might be possible to do with existing survey data
Doesn’t require randomization/experiment/baseline data
Allows estimation of heterogeneous treatment
effects because we have individual counterfactuals,
instead of just having group averages.
Doesn’t require assumption of linearity
Additional Considerations
Disadvantages
Strong identifying assumption: that there are no
unobserved differences
But if individuals are otherwise identical, then why did some
participate and others not?
Requires good quality data
Need to match on as many characteristics as possible
Requires sufficiently large sample size
Need a match for each participant in the treatment group
Jalan & Ravallion (2003b)
Does piped water reduce diarrhea for
children in rural India?
Data
Rural Household Survey
No baseline data
Detailed information on:
Health status of household members
Education levels of household members
Household income
Access to piped water
What would you use for D, Y, and X?
Propensity Score Regression
Matching
Prior to matching, the estimated
propensity scores for those with and
without piped water were, respectively,
0.5495 and 0.1933.
After matching there was negligible
difference in the mean propensity scores
of the two groups:
0.3743, for those with piped water
0.3742, for the matched control group
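The before-and-after comparison of mean propensity scores is a simple balance check, which can be sketched as follows. The scores below are hypothetical toy values and are not the data from the study.

```python
def mean(values):
    return sum(values) / len(values)


def score_gap(scores_treated, scores_comparison):
    """Difference in mean propensity score between the two groups."""
    return mean(scores_treated) - mean(scores_comparison)


# Hypothetical propensity scores.
treated = [0.50, 0.60, 0.70]
all_controls = [0.10, 0.20, 0.49, 0.61, 0.70, 0.15]
matched_controls = [0.49, 0.61, 0.70]  # nearest neighbor of each treated unit

gap_before = score_gap(treated, all_controls)
gap_after = score_gap(treated, matched_controls)
print(round(gap_before, 4), round(gap_after, 4))
```

A gap that shrinks toward zero after matching suggests that the matched comparison group resembles the participants in terms of observed characteristics; it says nothing about unobserved ones.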
Results
“Prevalence and duration of diarrhea among
children under five in rural India are significantly
lower on average for families with piped water than
for observationally identical households without it.”
“However, our results indicate that the health gains
largely by-pass children in poor families, particularly
when the mother is poorly educated.”
Conclusion
Matching is a useful way to control for
OBSERVABLE heterogeneity
Especially when randomization or RD approach
is not possible
However, it requires relatively strong
assumptions