Estimating Causal Effects with Experimental Data - CERGE-EI

Some Basic Terminology
• Start with example where X is binary
(though simple to generalize):
– X=0 is control group
– X=1 is treatment group
• Causal effect sometimes called treatment
effect
• Randomization implies everyone has
same probability of treatment
Why is Randomization Good?
• If X allocated at random then we know that X
is independent of all pre-treatment
variables in the whole wide world
• An amazing claim, but true
• Implies there cannot be a problem of
omitted variables, reverse causality, etc.
• On average, only reason for difference
between treatment and control group is
different receipt of treatment
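A quick simulation can make the independence claim concrete. The sketch below is purely illustrative: the pre-treatment variable `w`, the sample size and the true effect of 1.0 are made-up assumptions, not taken from any study. Because treatment is assigned by coin flip, the two groups have almost the same mean of `w`, and the difference in mean outcomes lands close to the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical pre-treatment variable (e.g. ability), generated before assignment
w = rng.normal(size=n)

# Random assignment: each unit treated with probability 0.5, independently of w
x = rng.integers(0, 2, size=n)

# Assumed outcome model: w matters for y, and the true treatment effect is 1.0
y = 2.0 * w + 1.0 * x + rng.normal(size=n)

# Balance: the mean of w should be almost the same in both groups
w_gap = w[x == 1].mean() - w[x == 0].mean()

# The simple difference in means recovers the assumed treatment effect
effect = y[x == 1].mean() - y[x == 0].mean()

print(f"difference in mean of w: {w_gap:.3f}")
print(f"difference in mean of y: {effect:.3f}")
```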
Why is this useful?
An Example: Racial Discrimination
• Black men earn less than white men in US
LOGWAGE    |      Coef.   Std. Err.        t
-----------+--------------------------------
BLACK      |  -.1673813    .0066708   -25.09
NO_HS      |  -.2138331    .0077192   -27.70
SOMECOLL   |   .1104148    .0049139    22.47
COLLEGE    |   .4660205    .0048839    95.42
AGE        |   .0704488    .0008552    82.38
AGESQUARED |  -.0007227    .0000101   -71.41
_cons      |   1.088116    .0172715    63.00
• Could be discrimination, or other factors unobserved by the
researcher but observed by the employer?
• Hard to fully resolve with non-experimental data
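Because the dependent variable is a log wage, the BLACK coefficient can be turned into a percentage gap with exp(b) - 1 (the standard conversion for a dummy in a log-linear regression), and each t statistic is simply the coefficient divided by its standard error. A small arithmetic check using the numbers from the regression output above:

```python
import math

# Coefficient and standard error on BLACK, copied from the regression output
coef_black = -0.1673813
se_black = 0.0066708

# t statistic = coefficient / standard error
t_black = coef_black / se_black

# Exact percentage wage gap implied by a dummy in a log-wage regression
pct_gap = math.exp(coef_black) - 1

print(f"t = {t_black:.2f}")  # matches the reported -25.09
print(f"implied wage gap = {pct_gap:.1%}")
```

So, holding education and age fixed, black men in this sample earn roughly 15% less than white men.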
An Experimental Design
• Bertrand/Mullainathan “Are Emily and
Greg More Employable Than Lakisha and
Jamal”, American Economic Review, 2004
• Create fake CVs and send them in response to job
adverts
• Allocate names at random to CVs – some
given ‘black-sounding’ names, others
‘white-sounding’ names
• Outcome variable is the call-back rate
• Interpretation – not direct measure of
racial discrimination, just effect of having a
‘black-sounding’ name – may have other
connotations.
• But name uncorrelated by construction
with other material on CV
The Treatment Effect
• Want estimate of:
$E(y_i \mid X_i = 1) - E(y_i \mid X_i = 0)$
Estimating Treatment Effects: the
Statistics Course Approach
• Take mean of outcome variable in
treatment group
• Take mean of outcome variable in control
group
• Take difference between the two
• No problems but:
– Does not generalize to where X is not binary
– Does not directly compute standard errors
Estimating Treatment Effects: A
Regression Approach
• Run regression:
yi=β0+β1Xi+εi
• Proposition 2.2: the OLS estimator of β1 is an unbiased estimator of
the causal effect of X on y.
• Proof: many ways to prove this, but perhaps the simplest is:
– Proposition 1.1 says OLS estimates E(y|X)
– E(y|X=0) = β0, so the OLS estimate of the intercept is a consistent estimate
of E(y|X=0)
– E(y|X=1) = β0 + β1, so the OLS estimate of β1 is a consistent estimate of E(y|X=1) - E(y|X=0)
• Hence can read off estimate of treatment effect from coefficient on X
• Approach easily generalizes to where X is not binary
• Also gives estimate of standard error
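The equivalence between the regression coefficient and the difference in means can be checked numerically. This is a sketch on simulated data (the sample size, the baseline of 5.0 and the assumed effect of 0.3 are illustrative assumptions); it fits OLS with NumPy's least-squares solver and compares the intercept and slope with the group means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000

x = rng.integers(0, 2, size=n)            # binary treatment indicator
y = 5.0 + 0.3 * x + rng.normal(size=n)    # simulated outcome, assumed effect 0.3

# Statistics-course approach: difference in group means
diff_means = y[x == 1].mean() - y[x == 0].mean()

# Regression approach: OLS of y on a constant and x
design = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1 = beta

# Intercept = control-group mean; slope = difference in means
print(b0, y[x == 0].mean())
print(b1, diff_means)
```

The two pairs agree up to floating-point error, which is exactly the argument in the proof: with a single binary regressor, OLS just fits the two conditional means.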
Computing Standard Errors
• Unless told otherwise, regression packages
compute standard errors assuming errors are
homoskedastic, i.e. Var(εi|Xi) = σ² for all i
• Even if only interested in the effect of treatment on
the mean, X may affect other aspects of the distribution,
e.g. the variance
• This will cause heteroskedasticity
• Heteroskedasticity does not make OLS
regression coefficients inconsistent but does
make OLS standard errors inconsistent
‘Robust’ Standard Errors
• Also called:
– Huber standard errors
– White standard errors
– Heteroskedastic-consistent standard errors
• Simple to use in practice e.g. in STATA:
. reg y x, robust
• With a binary X, robust standard errors reproduce (up to
small-sample corrections) the statistics course approach:
– Get variance of estimate of mean of treatment group and
of control group
– Sum to give estimate of variance of difference in
means
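For a binary X the robust and statistics-course approaches coincide exactly: the HC0 ("White") robust variance of the slope equals the sum of the variances of the two group means, when each group variance uses divisor n_g. The sketch below computes HC0 by hand with NumPy rather than calling a statistics package, on simulated data in which treatment is assumed to raise the spread of y as well as its mean (so the errors are heteroskedastic). Packages often apply a small-sample correction on top of HC0 (e.g. HC1), so their numbers can differ slightly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

x = rng.integers(0, 2, size=n)
# Assumed DGP: treatment shifts the mean by 0.5 and doubles the error s.d.
y = 0.5 * x + rng.normal(scale=1.0 + x, size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt((bread @ meat @ bread)[1, 1])

# Conventional (homoskedastic) OLS standard error, for comparison
sigma2 = resid @ resid / (n - 2)
se_classic = np.sqrt(sigma2 * bread[1, 1])

# Statistics-course approach: add the variances of the two group means (ddof=0)
y1, y0 = y[x == 1], y[x == 0]
se_means = np.sqrt(y1.var() / y1.size + y0.var() / y0.size)

print(se_robust, se_means, se_classic)
```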
Bertrand/Mullainathan:
Basic Results
Summary So Far
• Econometrics very easy if all data comes
from randomized controlled experiment
• Just need to collect data on
treatment/control and outcome variables
• Just need to compare means of outcomes
of treatment and control groups
• Is data on other variables of any use at
all?
– Not necessary but useful
Including Other Regressors
• Can get consistent estimate of treatment effect
without worrying about other variables
• Reason is that randomization ensures no
problem of omitted variables bias
• But there are reasons to include other
regressors:
– Improved efficiency
– Check for randomization
– Improve randomization
– Control for conditional randomization
– Heterogeneity in treatment effects
The Uses of Other Regressors I:
Improved Efficiency
• Don’t just want consistent estimate of causal
effect – also want low standard error (or high
precision or efficiency).
• Standard formula for the variance of the OLS
estimate of β1 is σ²/(N·Var(X)), where N is the sample size
• σ² is the variance of the regression residual,
i.e. (1-R²)·Var(y)
• Include more variables and R² rises, so σ² falls and
precision improves – formal proof (Proposition 2.4) a bit
more complicated but this is the basic idea.
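A simulation illustrates the efficiency argument. Under the assumptions below (a hypothetical pre-treatment variable `w` that explains most of the variance of y, random assignment of x, and an assumed effect of 0.2), adding `w` to the regression leaves the treatment-effect estimate centred on the same value but shrinks the conventional standard error, because the residual variance σ² falls.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(3)
n = 5_000
w = rng.normal(size=n)                      # hypothetical pre-treatment covariate
x = rng.integers(0, 2, size=n)              # randomized treatment
y = 0.2 * x + 3.0 * w + rng.normal(size=n)  # w explains most of the variance in y

const = np.ones(n)
b_short, se_short = ols(y, np.column_stack([const, x]))
b_long, se_long = ols(y, np.column_stack([const, x, w]))

print(f"without w: {b_short[1]:.3f} (se {se_short[1]:.3f})")
print(f"with w:    {b_long[1]:.3f} (se {se_long[1]:.3f})")
```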
The Uses of Other Regressors II:
Check for Randomization
• Randomization can go wrong
– Poor implementation of research design
– Bad luck
• If randomization done well then pre-treatment variables W should
be independent of X – this is testable:
– Test for differences in means of W between treatment and control
groups
– Probit model of X on W: W should have no explanatory power
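One simple version of the balance test is a two-sample t statistic for each covariate. The helper below is a hypothetical sketch (the function name and the unequal-variance form of the statistic are choices made here, not taken from the slides); absolute values well above 2 would suggest that randomization went wrong for that variable.

```python
import numpy as np

def balance_t(w, x):
    """t statistic for the difference in mean of covariate w between x == 1 and x == 0."""
    w1, w0 = w[x == 1], w[x == 0]
    diff = w1.mean() - w0.mean()
    se = np.sqrt(w1.var(ddof=1) / w1.size + w0.var(ddof=1) / w0.size)
    return diff / se

rng = np.random.default_rng(4)
n = 1_000
x = rng.integers(0, 2, size=n)

w_good = rng.normal(size=n)            # independent of x, as randomization implies
w_bad = rng.normal(size=n) + 0.5 * x   # correlated with x: a failed randomization

print(f"balanced covariate:   t = {balance_t(w_good, x):.2f}")  # usually small
print(f"unbalanced covariate: t = {balance_t(w_bad, x):.2f}")
```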
The Uses of Other Regressors III:
Improve Randomization
• Can also use W at stage of assigning
treatment
• Can guarantee that in your sample X and
W are independent, rather than this holding
only in expectation
• This is what Bertrand/Mullainathan do
when assigning names to CVs
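Using W at the assignment stage can be sketched as blocked (stratified) randomization: within each stratum of W, exactly half the units (rounded down) are treated, so X and W are balanced in the sample by construction rather than only in expectation. The function name and the two strata below are hypothetical illustrations; this is not necessarily the exact procedure Bertrand and Mullainathan used.

```python
import numpy as np

def blocked_assignment(strata, rng):
    """Treat exactly floor(n_s / 2) randomly chosen units in each stratum s."""
    x = np.zeros(strata.size, dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        treated = rng.choice(idx, size=idx.size // 2, replace=False)
        x[treated] = 1
    return x

rng = np.random.default_rng(5)
# Hypothetical stratifying variable, e.g. a low/high CV-quality flag
strata = rng.integers(0, 2, size=101)
x = blocked_assignment(strata, rng)

for s in np.unique(strata):
    in_s = strata == s
    print(f"stratum {s}: {x[in_s].sum()} treated out of {in_s.sum()}")
```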
The Uses of Other Regressors IV:
Adjust for Conditional
Randomization
• This is case where must include W to get
consistent estimates of treatment effects
• Conditional randomization is where
probability of treatment is different for
people with different values of W, but
random conditional on W
• Why have conditional randomization?
– May have no choice
– May want to do it (cf. stratification)
An Example: Project STAR
• Allocation of students to classes is random within schools
• But small number of classes per school
• This leads to following relationship between probability of treatment
and number of kids in school:
[Figure: fraction in treatment group (y-axis, roughly 0.1 to 0.5) against number of kids in school (x-axis, roughly 40 to 120)]
Controlling for Conditional
Randomization
• X can now be correlated with W
• But, conditional on W, X is independent of
other factors
• But must get the functional form of the relationship
between y and W correct, or use matching
procedures
• This is not the case with (unconditional)
randomization – see class exercise
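A Project STAR-style sketch of conditional randomization, with all numbers assumed for illustration: two kinds of school (w = 0 or 1) with different baseline outcomes and different probabilities of treatment, and a true effect of 1.0. The naive difference in means is biased because x is correlated with w; adding w as a regressor (a single dummy, which is the correct functional form here by construction) recovers the effect.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000

w = rng.integers(0, 2, size=n)               # school type
p_treat = np.where(w == 1, 0.8, 0.2)         # treatment probability depends on w
x = (rng.random(n) < p_treat).astype(int)    # random conditional on w
y = 5.0 * w + 1.0 * x + rng.normal(size=n)   # assumed true effect of x is 1.0

naive = y[x == 1].mean() - y[x == 0].mean()

X = np.column_stack([np.ones(n), x, w])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = beta[1]

print(f"naive difference in means: {naive:.2f}")  # biased upwards
print(f"controlling for w:         {adjusted:.2f}")
```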
Heterogeneity in Treatment Effects
• So far have assumed causal (treatment)
effect the same for everyone
• No good reason to believe this
• Start with case of no other regressors:
yi=β0+β1iXi+εi
• Random assignment implies X
independent of β1i
• Sometimes called random coefficients
model
What treatment effect to estimate?
• Would like to estimate causal effect for everyone
– this is not possible: Holland’s ‘fundamental
problem of causal inference’ (each individual is observed
either treated or untreated, never both)
• Can only hope to estimate some average
• Average treatment effect:
$ATE = E(\beta_{1i}) = \bar{\beta}_1$
• Proposition 2.5: OLS estimates ATE
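Proposition 2.5 can be illustrated with a random-coefficients simulation. The distribution of β1i below (normal with mean 2 and s.d. 1.5) is an arbitrary assumption; since x is randomly assigned and hence independent of β1i, the OLS slope lands close to the average effect E(β1i) = 2, even though no individual effect is ever observed.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

beta1_i = rng.normal(loc=2.0, scale=1.5, size=n)  # individual treatment effects
x = rng.integers(0, 2, size=n)                    # random assignment, independent of beta1_i
y = 1.0 + beta1_i * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"OLS slope: {beta[1]:.3f}, mean of beta1_i: {beta1_i.mean():.3f}")
```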
Observable Heterogeneity
• Full outcomes notation:
– Outcome if in control group:
y0i=γ0’Wi+u0i
– Outcome if in treatment group:
y1i=γ1’Wi+u1i
• Treatment effect is (y1i-y0i) and can be written as:
(y1i-y0i )=(γ1- γ0 )’Wi+u1i-u0i
• Note treatment effect has observable and unobservable
component
• Can estimate as:
– Two separate equations
– One single equation
Combining treatment and control
groups into single regression
• We can write:
$y_i = X_i y_{1i} + (1 - X_i) y_{0i}$
• Combining outcomes equations leads to:
$y_i = X_i(\gamma_1'W_i + u_{1i}) + (1 - X_i)(\gamma_0'W_i + u_{0i})$
$\quad = \gamma_0'W_i + (\gamma_1 - \gamma_0)'X_iW_i + u_{0i} + X_i(u_{1i} - u_{0i})$
• Regression includes W and interactions of W
with X – these are observable part of treatment
effect
• Note: error likely to be heteroskedastic
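The single-equation version can be estimated by regressing y on W and on X interacted with W. In this sketch W is a constant plus one hypothetical covariate, and the γ vectors are assumed values; the coefficients on the interaction terms estimate γ1 - γ0, the observable part of the treatment effect. The treated error u1i is given a larger variance here, so the combined error is heteroskedastic and robust standard errors would be the natural choice for inference (not computed below).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50_000

w1 = rng.normal(size=n)
W = np.column_stack([np.ones(n), w1])   # W_i = (1, w1_i)
gamma0 = np.array([1.0, 0.5])           # assumed control-outcome coefficients
gamma1 = np.array([2.0, 1.5])           # assumed treatment-outcome coefficients

x = rng.integers(0, 2, size=n)          # random assignment
u0 = rng.normal(size=n)
u1 = rng.normal(scale=2.0, size=n)      # larger error variance when treated

y0 = W @ gamma0 + u0
y1 = W @ gamma1 + u1
y = x * y1 + (1 - x) * y0               # observed outcome

# Regressors: W and the interactions X_i * W_i
design = np.column_stack([W, x[:, None] * W])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("gamma0 estimate:", coef[:2])
print("gamma1 - gamma0 estimate:", coef[2:])
```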
Bertrand/Mullainathan
• Different treatment effect for high and low quality
CVs:
Units of Measurement
• Causal effect measured in units of
‘experiment’ – not very helpful
• Often want to convert causal effects to
more meaningful units e.g. in Project
STAR what is effect of reducing class size
by one child
Simple estimator of this would be:
$\frac{E(y \mid X=1) - E(y \mid X=0)}{E(S \mid X=1) - E(S \mid X=0)}$
• where S is class size
• Takes the treatment effect on outcome variable
and divides by treatment effect on class size
• Not hard to compute but how to get standard
error?
IV Can Do the Job
• Can’t run regression of y on S – S influenced by
factors other than treatment status
• But X is:
– Correlated with S
– Uncorrelated with unobserved stuff (because of
randomization)
• Hence X can be used as an instrument for S
• IV estimator has form (just-identified case):
$\hat{\beta}_{IV} = (X'S)^{-1} X'y$
The Wald Estimator
• This will give estimate of standard error of
treatment effect
• Where instrument is binary and no other
regressors are included, the IV estimate of the slope
coefficient can be shown to be (with sample means in place of expectations):
$\frac{E(y \mid X=1) - E(y \mid X=0)}{E(S \mid X=1) - E(S \mid X=0)}$
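The claim that the just-identified IV estimator reduces to the Wald ratio with a binary instrument is easy to verify numerically. In this hypothetical STAR-like simulation, x is random assignment to a small class, class size S is assumed to fall by about 7 pupils for treated students, an unobserved `ability` term affects both S and y (so OLS of y on S would be biased), and each extra pupil is assumed to lower the score by 0.1. Both matrices include a constant.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10_000

x = rng.integers(0, 2, size=n)                            # randomized assignment (instrument)
ability = rng.normal(size=n)                              # unobserved, affects S and y
S = 22.0 - 7.0 * x + 2.0 * ability + rng.normal(size=n)   # class size
y = 50.0 - 0.1 * S + 1.0 * ability + rng.normal(size=n)   # assumed effect: -0.1 per extra pupil

# Just-identified IV: beta = (X'S)^{-1} X'y, with a constant in both matrices
Xmat = np.column_stack([np.ones(n), x])
Smat = np.column_stack([np.ones(n), S])
beta_iv = np.linalg.solve(Xmat.T @ Smat, Xmat.T @ y)

# Wald estimator: ratio of the two treatment-control differences in means
wald = (y[x == 1].mean() - y[x == 0].mean()) / (S[x == 1].mean() - S[x == 0].mean())

print(f"IV slope: {beta_iv[1]:.4f}, Wald ratio: {wald:.4f}")
```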
Partial Compliance
• So far:
– In control group implies no treatment
– In treatment group implies get treatment
• Often things are not as clean as this
– Treatment is an opportunity
– Close substitutes available to those in control
group
– Implementation not perfect e.g. pushy parents
An Example: Moving to Opportunity
• Designed to investigate the impact of living in
bad neighbourhoods on outcomes
• Gave some residents of public housing projects
chance to move out
• Two treatments:
– Voucher for private rental housing
– Voucher for private rental housing restricted for use in
‘good’ neighbourhoods
• No-one forced to move so imperfect compliance
– Roughly 60% and 40%, respectively, did use the vouchers
Some Terminology
• Z denotes whether in control or treatment
group – ‘intention-to-treat’
• X denotes whether actually get treatment
• With perfect compliance:
– Pr(X=1|Z=1) = 1
– Pr(X=1|Z=0) = 0
• With imperfect compliance:
1 > Pr(X=1|Z=1) > Pr(X=1|Z=0) > 0
What Do We Want to Estimate?
• ‘Intention-to-Treat’:
ITT=E(y|Z=1)-E(y|Z=0)
• This can be estimated in usual way
• Treatment Effect on Treated
$TOT = \frac{E(y \mid Z=1) - E(y \mid Z=0)}{E(X \mid Z=1) - E(X \mid Z=0)}$
Estimating TOT
• Can’t get TOT from simple regression of y on Z (that estimates ITT)
• But should recognize TOT as a Wald estimator
• Can be estimated by regressing y on X using Z as
instrument
• Relationship between TOT and ITT:
$ITT = TOT \cdot [\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)]$
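A minimal simulation of partial compliance, loosely in the spirit of MTO but with made-up numbers: z is the randomized offer, only about half of those offered take it up, no one in the control group can get the treatment, and the treatment effect is assumed to be 2.0 for everyone. ITT is the difference in mean outcomes by z, TOT is ITT divided by the difference in take-up rates, and the ITT/TOT relationship holds by construction.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100_000

z = rng.integers(0, 2, size=n)    # intention-to-treat (offer)
takes_up = rng.random(n) < 0.5    # would use the voucher if offered
x = (z == 1) & takes_up           # actual treatment; controls never treated
y = 2.0 * x + rng.normal(size=n)  # assumed constant effect of 2.0

itt = y[z == 1].mean() - y[z == 0].mean()
first_stage = x[z == 1].mean() - x[z == 0].mean()
tot = itt / first_stage

print(f"ITT = {itt:.3f}, take-up difference = {first_stage:.3f}, TOT = {tot:.3f}")
```

With take-up near 50%, TOT comes out at roughly twice ITT, the same pattern reported for MTO.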
Most Important Results from MTO
• No effects on adult economic outcomes
• Improvements in adult mental health
• Beneficial outcomes for teenage girls
• Adverse outcomes for teenage boys
Sample results from MTO
• TOT approximately twice the size of ITT
• Consistent with 50% use of vouchers
IV with Heterogeneous Treatment
Effects (More Difficult)
• If treatment effect same for everyone then
TOT recovers this (obvious)
• But what if treatment effect
heterogeneous?
• No simple answer to this question
• Suppose model for treatment effect is:
$y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$
Proposition 2.6
The IV estimate for the heterogeneous
treatment case is a consistent estimate of:
$\text{plim}\, \hat{\beta}_{1,IV} = \frac{E(\pi_i \beta_{1i})}{E(\pi_i)}$
where:
$\pi_i = \Pr(X_i = 1 \mid Z_i = 1) - \Pr(X_i = 1 \mid Z_i = 0)$
is the difference in the probability of treatment
for individual i when in the treatment and control
group
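Proposition 2.6 can be checked by simulation. In this sketch (all values assumed), nobody is treated when z = 0, so πi equals the individual take-up probability when offered; half the population has πi = 0.8 and β1i = 3, the other half πi = 0.2 and β1i = 1. The ATE is therefore about 2, but the IV estimate converges to E(πi β1i)/E(πi) = 1.3/0.5 = 2.6, because high-return individuals get more weight.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000

high = rng.random(n) < 0.5
pi_i = np.where(high, 0.8, 0.2)     # individual compliance probability (nobody treated if z == 0)
beta1_i = np.where(high, 3.0, 1.0)  # individual treatment effects

z = rng.integers(0, 2, size=n)
x = (z == 1) & (rng.random(n) < pi_i)
y = 1.0 + beta1_i * x + rng.normal(size=n)

iv = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())

ate = beta1_i.mean()                              # close to 2.0
weighted = (pi_i * beta1_i).mean() / pi_i.mean()  # close to 2.6

print(f"IV = {iv:.3f}, E(pi*beta)/E(pi) = {weighted:.3f}, ATE = {ate:.3f}")
```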
Interpretation
• This is weighted average of treatment
effects
• ‘Weights’ will vary with the instrument –
contrast with the homogeneous treatment
case, where every valid instrument estimates the same effect
• Some cases in which can interpret IV
estimate as ATE
How will IV estimate differ from
ATE?
• IV is ATE if no correlation between β1i and πi
• Previous formula says depends on covariance of β1i and
πi
• In some situations can sign – but not always
• Example 1: no-one gets treatment in the absence of the
programme, so
$\pi_i = p_{1i}$, where $p_{1i} = \Pr(X_i = 1 \mid Z_i = 1)$
• If those who get treatment when in the treatment group
are those with the highest returns then:
$\mathrm{Cov}(\pi_i, \beta_{1i}) = \mathrm{Cov}(p_{1i}, \beta_{1i}) > 0$
• IV > ATE
• Example 2: treatment is voluntary for
those in the control group but compulsory
for those in the treatment group
• This implies
$\pi_i = 1 - p_{0i}$, where $p_{0i} = \Pr(X_i = 1 \mid Z_i = 0)$
• If those who get treatment in the control group are
those with the highest returns then:
$\mathrm{Cov}(\pi_i, \beta_{1i}) = -\mathrm{Cov}(p_{0i}, \beta_{1i}) < 0$
• IV < ATE
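Both examples follow from rewriting the Proposition 2.6 limit as the ATE plus a covariance term, using the identity E(πi β1i) = E(πi)E(β1i) + Cov(πi, β1i). Since E(πi) > 0 whenever the instrument shifts treatment on average, the sign of the gap between IV and ATE is the sign of Cov(πi, β1i):

$$\text{plim}\, \hat{\beta}_{1,IV} = \frac{E(\pi_i \beta_{1i})}{E(\pi_i)} = \frac{E(\pi_i)E(\beta_{1i}) + \mathrm{Cov}(\pi_i, \beta_{1i})}{E(\pi_i)} = ATE + \frac{\mathrm{Cov}(\pi_i, \beta_{1i})}{E(\pi_i)}$$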
Angrist/Imbens Monotonicity
Assumption
• Case where IV estimate is not ATE
• Assume that everyone moved in same
direction by treatment – monotonicity
assumption
• Then can show that IV is the average
treatment effect for those whose behaviour is
changed by being in the treatment group (the ‘compliers’)
• They call this the Local Average
Treatment Effect (LATE)
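The LATE result can be illustrated with a simulation built on three hypothetical compliance types (no defiers, so monotonicity holds): compliers (50%, effect 3.0) are treated only if offered, always-takers (20%, effect 1.0) are always treated, and never-takers (30%, effect 0.5) are never treated. All shares and effects are assumptions. Always-takers and never-takers behave the same in both groups, so they drop out of the Wald ratio and IV recovers the compliers' average effect rather than the ATE of about 1.85.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 200_000

# Hypothetical compliance types (monotonicity: nobody is a "defier")
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
beta1_i = np.select([types == "complier", types == "always"], [3.0, 1.0], default=0.5)

z = rng.integers(0, 2, size=n)
x = (types == "always") | ((types == "complier") & (z == 1))
y = beta1_i * x + rng.normal(size=n)

iv = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
late = beta1_i[types == "complier"].mean()  # 3.0 by construction
ate = beta1_i.mean()                        # about 1.85

print(f"IV = {iv:.3f}, LATE = {late:.3f}, ATE = {ate:.3f}")
```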
Problems with Experiments
• Expense
• Ethical Issues
• Threats to Internal Validity
– Failure to follow experiment
– Experimental effects (Hawthorne effects)
• Threats to External Validity
– Non-representative programme
– Non-representative sample
– Scale effects
Conclusions on Experiments
• Are ‘gold standard’ of empirical research
• Are becoming more common
• Not enough of them to keep us busy
• Study of non-experimental data can
deliver useful knowledge
• Some issues similar, others different