Transcript - Bepress
Matt Bogard
Office of Institutional Research
Western Kentucky University
Graduate study in applied economics, mathematical statistics, econometrics,
and experimental design
Focus on applied micro-econometric applications
WKU Dept of Economics Faculty Research Seminar
applied micro-econometrics
Quasi-experimental designs are often
emphasized heavily in applied microeconometric settings
WKU Office of Institutional Research: applied
research and analytics
Build intuition and understanding of concepts related
to quasi-experimental designs and causal inference
A heuristic approach, abstracting from many technical
details
Topics:
◦ Potential Outcomes/Rubin Causal Model
◦ Fundamental Problem of Causal Inference
◦ Selection Bias
◦ Conditional Independence Assumption
◦ Unobserved Heterogeneity and Endogeneity
◦ Matching
◦ Propensity Score Methods
◦ Instrumental Variables
◦ Difference-in-Difference
◦ Regression Discontinuity
Subjects are randomly assigned to a
treatment and control group
Subjects in each group are identical in all
respects except for the treatment assignment
A difference in means or outcomes
‘identifies’ the treatment effect
Ideally all other factors have been accounted
for in the experimental design
Random assignment of treatment is key
Not always possible in an IR setting
Based on (Angrist and Pischke, 2008; Rubin, 1974;
Imbens and Wooldridge, 2009; Klaiber & Smith, 2009)
Precise way to think about the implications of
RCE vs non-random treatment assignment
Yi = outcome of interest for individual i
di = treatment indicator (0,1)
Y0i = baseline potential outcome
Y1i = potential treatment outcome
E[Y1i − Y0i] = average treatment effect (ATE)
Causal effect of interest, but we can't observe Y1i
and Y0i for the same individual
Reality forces us to compare outcomes for
different individuals, E[Yi|di=1] − E[Yi|di=0],
who may or may not be comparable
Fundamental problem of causal inference!
Sometimes subjects self-select their treatment vs. random
assignment
E[Yi|di=1] − E[Yi|di=0] = E[Y1i−Y0i] + {E[Y0i|di=1] − E[Y0i|di=0]}
(Observed Difference) = (Causal Effect) + (Selection Bias)
The observed effect or difference is equal to the population
average treatment effect (ATE) E[Y1i-Y0i] in addition to the
bracketed term which characterizes selection bias.
The treatment effect is confounded with selection bias in the
observed differences
E[Yi|di=1] - E[Yi|di=0] = E[Y1i-Y0i] + {E[Y0i|di=1] - E[Y0i|di=0]}
Selection Bias
i.e., if those who self-select into treatment (di=1) have greater or lesser
baseline potential (Y0i) than those who do not (di=0), then we have
selection bias, or E[Y0i|di=1] ≠ E[Y0i|di=0]
Estimated treatment effects then lead to erroneous conclusions
In an RCE, E[Y0i|di=1] = E[Y0i|di=0], and the selection
bias term goes away
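The decomposition can be illustrated with a small simulation (all numbers hypothetical): the treatment effect is set to 2, and individuals with higher baseline potential self-select into treatment, so the observed difference equals the ATE plus the selection bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Baseline potential outcome Y0 varies across individuals;
# the true treatment effect is a constant 2.0, so Y1 = Y0 + 2.
y0 = rng.normal(10, 2, n)
y1 = y0 + 2.0

# Self-selection: those with higher baseline potential are
# more likely to take the treatment (d = 1).
p_take = 1 / (1 + np.exp(-(y0 - 10)))
d = rng.random(n) < p_take

observed = np.where(d, y1, y0)

observed_diff = observed[d].mean() - observed[~d].mean()
ate = (y1 - y0).mean()                          # 2.0 by construction
selection_bias = y0[d].mean() - y0[~d].mean()   # E[Y0|d=1] - E[Y0|d=0]

# Decomposition: observed difference = ATE + selection bias
print(observed_diff, ate + selection_bias)
```

Because the treatment effect is constant here, the identity holds exactly in the sample, not just in expectation.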
Simple t-test
◦ SAS: PROC TTEST
◦ R: t.test()
Regression: Y = β0 + β1D + β2X + e
◦ D = treatment indicator
◦ X = vector of controls
◦ SAS: PROC REG, GLM, GLIMMIX, LOGISTIC, and others
◦ R: lm()
(Rubin, 1973; Angrist & Pischke, 2008; Rosenbaum and
Rubin, 1983; Angrist and Hahn, 2004)
Comparisons conditional on covariates may remove
selection bias:
E[Yi|xi,di=1] − E[Yi|xi,di=0] = E[Y1i−Y0i|xi], or Y1i,Y0i ⊥ di | xi
Treatment assignment (di) and response (Y1i,Y0i) are
conditionally independent given covariates xi.
‘Selection on observables’
Motivation for matching
CIA implies balance on observed covariates, which
‘recreates’ a situation similar to a randomized experiment
where all subjects are essentially the same except for the
treatment (Thoemmes and Kim,2011)
Matching achieves ‘identification’
Achieved by comparing units with similar covariate values
and computing a weighted average of the contrasts based on
the distribution of covariates:
Σx δx P(Xi=x) = E[Y1i−Y0i] = ATE, where δx is the
treatment-control contrast at covariate value x
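A sketch of this matching estimator on simulated data with one discrete covariate (all numbers hypothetical): the within-cell contrasts are averaged with weights P(X = x), which removes the selection bias that contaminates the naive comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# One discrete covariate x; treatment probability and baseline
# outcome both depend on x (selection on observables).
x = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])
d = rng.random(n) < np.array([0.2, 0.5, 0.8])[x]
y0 = 5 + 3 * x + rng.normal(0, 1, n)
y1 = y0 + 2.0                        # true ATE = 2
y = np.where(d, y1, y0)

# Naive comparison is contaminated by selection bias:
naive = y[d].mean() - y[~d].mean()

# Matching estimator: treatment-control contrast within each cell,
# averaged with weights P(X = x):
ate_hat = sum(
    (y[(x == v) & d].mean() - y[(x == v) & ~d].mean()) * (x == v).mean()
    for v in (0, 1, 2)
)
print(round(naive, 2), round(ate_hat, 2))
```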
Regression is also a type of matching estimator (Angrist &
Pischke, 2009)
It weights the treatment-control contrasts by the conditional
variance of treatment in each cell
Y = β0 + β1D + β2X + e
β1 = E[Yi|xi,di=1] − E[Yi|xi,di=0]
"It's all about comparisons… It's a structured way of
computing average comparisons in data." - Andrew Gelman
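Under the CIA, adding the covariate to the regression removes the selection bias. A minimal numpy sketch (simulated data, true effect set to 2; β1 is recovered by ordinary least squares):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Selection on an observable x: controlling for x satisfies the CIA.
x = rng.normal(0, 1, n)
d = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)
y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(0, 1, n)  # true effect = 2

# Naive difference in means vs. regression with the control:
naive = y[d == 1].mean() - y[d == 0].mean()
X = np.column_stack([np.ones(n), d, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(naive, 2), round(beta[1], 2))  # naive is biased; beta[1] is near 2
```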
Relies on CIA – selection on observables
Matching can be cumbersome and difficult to
implement
Alternative: match on the estimated
probability of receiving treatment
Propensity Score: p(xi) = p(di=1|xi)
Rosenbaum and Rubin’s propensity score
theorem (1983):
If Y1i,Y0i ⊥di| xi
then Y1i,Y0i ⊥ di|p(xi)
Rosenbaum and Rubin (1984)
Treatment and control groups are stratified
or divided into groups/categories/bins of
propensity scores
Comparisons are made across strata and
combined to estimate an average treatment
effect
Can remove up to 90% of bias due to factors
related to selection using as few as five strata
(Rosenbaum and Rubin, 1984)
p(xi) is typically estimated via logistic
regression, but support vector machines,
decision trees, and boosting algorithms can also be used
SAS: PROC LOGISTIC, or SAS Enterprise Miner
Regression, Decision Tree, SVM, or Gradient
Boosting nodes
R: glm() with family = binomial(link = "logit"), rpart(), or
library(randomForest) with randomForest()
SAS: Matched comparisons typically made via
macros and array processing (see links in
appendix)
R: matchit() from the MatchIt package, or optmatch
STATA: match, psscore
I have not used STATA or R for matching
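For readers without any of these tools at hand, the whole pipeline, estimating p(x) and then stratifying, can be sketched in plain numpy on simulated data (all coefficients hypothetical; the Newton-Raphson fit stands in for PROC LOGISTIC or glm()):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Two observed covariates drive both selection and the outcome.
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
d = (rng.random(n) < 1 / (1 + np.exp(-(0.8 * x1 + 0.5 * x2)))).astype(float)
y = 2.0 * d + 1.5 * x1 + 1.0 * x2 + rng.normal(0, 1, n)  # true effect = 2

# Step 1: estimate p(x) = P(d=1|x) by logistic regression
# (Newton-Raphson on the log-likelihood).
X = np.column_stack([np.ones(n), x1, x2])
b = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ b)))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (d - p))
pscore = 1 / (1 + np.exp(-(X @ b)))

# Step 2: stratify on quintiles of the estimated propensity score and
# combine the within-stratum contrasts, weighted by stratum size.
edges = np.quantile(pscore, [0.2, 0.4, 0.6, 0.8])
stratum = np.searchsorted(edges, pscore)
ate_hat = sum(
    (y[(stratum == s) & (d == 1)].mean() - y[(stratum == s) & (d == 0)].mean())
    * np.mean(stratum == s)
    for s in range(5)
)

naive = y[d == 1].mean() - y[d == 0].mean()
print(round(naive, 2), round(ate_hat, 2))
```

Consistent with Rosenbaum and Rubin (1984), five strata remove most but not all of the bias, so the stratified estimate lands much closer to 2 than the naive difference.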
Identification: uses p(xi) as weights
to create a pseudo-population such
that the distribution of covariates in
the population is independent of
treatment assignment (Austin, 2011)
Uses all the data and is less cumbersome
than matching
E[Y1i − Y0i] = E[ Yi di / p(xi) − Yi (1−di) / (1−p(xi)) ] = ATE
Example:
◦ di=0 & p(xi) = .75, then weight = 1/(1−.75) = 4 people
◦ di=1 & p(xi) = .75, then weight = 1/.75 = 1.33 people
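A sketch of the IPTW estimator on simulated data (hypothetical coefficients; the true propensity score is used directly to isolate the weighting idea, whereas in practice p(xi) would be estimated as described above):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Single observed confounder x with known propensity p(x).
x = rng.normal(0, 1, n)
p = 1 / (1 + np.exp(-x))                      # p(x) = P(d=1|x)
d = (rng.random(n) < p).astype(float)
y = 2.0 * d + 3.0 * x + rng.normal(0, 1, n)   # true ATE = 2

# IPTW estimator: E[ y*d/p(x) - y*(1-d)/(1-p(x)) ] = ATE.
# A treated unit with p = .75 counts as 1/.75 = 1.33 people;
# an untreated unit with p = .75 counts as 1/(1-.75) = 4 people.
ate_iptw = np.mean(y * d / p - y * (1 - d) / (1 - p))
naive = y[d == 1].mean() - y[d == 0].mean()
print(round(naive, 2), round(ate_iptw, 2))
```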
Impact of Increased Academic Intensity on Transfer Rates:
An Application of Matching Estimators to Student-Unit
Record Data. William R. Doyle. Res High Educ (2009) 50:52–72
Assessing the Effectiveness of a College Freshman Seminar
Using Propensity Score Adjustments. H. Clark and Nicole
L. Cundiff. Res High Educ (2011) 52:616–639
"Estimating the Causal Effect of Advising Contacts on Fall
to Spring Retention Using Propensity Score Matching and
Inverse Probability of Treatment Weighted Regression."
Matt Bogard. 2013. Working Paper.
http://works.bepress.com/matt_bogard/25
Suppose we wish to estimate: Y = β0 + β1D + β2A + ε (1)
But we can't observe A, so we regress: Y = β0 + β1D + e (2)
Our estimator for β1 can be expressed as:
b = COV(Y,D)/VAR(D)
By substitution of (2) we get:
b = COV(β0 + β1D + e, D)/VAR(D)
b = COV(β0,D)/VAR(D) + COV(β1D,D)/VAR(D) + COV(e,D)/VAR(D)
b = β1 + COV(e,D)/VAR(D)
◦ If COV(e,D) = 0 then b is unbiased
◦ If we omit a variable like A from (1) then, to the extent that D and A are
correlated, D becomes correlated with the error term: COV(e,D) ≠ 0
◦ OLS does not correctly estimate the treatment effect
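The algebra above can be verified numerically. A minimal simulation (made-up coefficients; A is visible to the simulator but omitted by the analyst) shows that the short-regression slope equals β1 + COV(e,D)/VAR(D):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# True model: Y = b0 + b1*D + b2*A + eps, with D and A correlated.
a = rng.normal(0, 1, n)
D = 0.7 * a + rng.normal(0, 1, n)            # D correlated with A
y = 1.0 + 2.0 * D + 3.0 * a + rng.normal(0, 1, n)

# Omit A: the short regression's slope is COV(Y,D)/VAR(D).
b_short = np.cov(y, D)[0, 1] / np.var(D, ddof=1)

# The short-model error absorbs b2*A, so COV(e,D) != 0 and
# b_short = b1 + COV(e,D)/VAR(D).
e = y - (1.0 + 2.0 * D)                      # error using the true b0, b1
check = 2.0 + np.cov(e, D)[0, 1] / np.var(D, ddof=1)
print(round(b_short, 3), round(check, 3))    # identical up to rounding
```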
We may not be able to observe or measure ‘A’
Find Z that is correlated with D but uncorrelated with A such that
COV(Z,e) = 0
We can use Z to construct bIV = COV(Y,Z)/COV(D,Z)
If we substitute Y = β0 + β1D + e we get:
COV(β0 + β1D + e, Z)/COV(D,Z) = β1 + COV(e,Z)/COV(D,Z)
By construction COV(e,Z) = 0, so we get an unbiased estimate of β1
bIV = COV(Y,Z)/COV(D,Z)
bIV isolates the 'quasi-experimental' variation
in D (the part induced by Z) that is related to Y
2SLS:
◦ DEST = β0 + β1Z + e (stage 1)
◦ Y = β0 + βIV DEST + e (stage 2)
Note: IV provides a LATE (local average treatment effect)
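A sketch of both estimators on simulated data (hypothetical coefficients; the confounder A is unobserved by the analyst, and Z satisfies the exclusion restriction by construction). OLS is biased, while the covariance-ratio IV and the explicit 2SLS give the same, roughly unbiased answer:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Unobserved A confounds D; instrument Z shifts D but is
# independent of A.
a = rng.normal(0, 1, n)                       # unobserved
z = rng.normal(0, 1, n)                       # instrument
D = 0.5 * z + 0.7 * a + rng.normal(0, 1, n)
y = 1.0 + 2.0 * D + 3.0 * a + rng.normal(0, 1, n)   # true effect = 2

# OLS is biased because COV(e, D) != 0:
b_ols = np.cov(y, D)[0, 1] / np.var(D, ddof=1)

# IV estimator: bIV = COV(Y,Z)/COV(D,Z)
b_iv = np.cov(y, z)[0, 1] / np.cov(D, z)[0, 1]

# Equivalent 2SLS: stage 1 fits D on Z; stage 2 regresses Y on D-hat.
X1 = np.column_stack([np.ones(n), z])
d_hat = X1 @ np.linalg.lstsq(X1, D, rcond=None)[0]
X2 = np.column_stack([np.ones(n), d_hat])
b_2sls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

print(round(b_ols, 2), round(b_iv, 2), round(b_2sls, 2))
```

With a single instrument, the covariance ratio and 2SLS are algebraically identical, which the simulation confirms.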
SAS: Y = β0 + β1D + β2X + ε (X = controls)
PROC SYSLIN DATA=DAT1 2SLS;
  ENDOGENOUS D;
  INSTRUMENTS Z X;
  MODEL Y = D X;
  TITLE "IV Estimate";
RUN;
R: summary(ivreg(Y ~ D + X | X + Z, data = DAT1))
(ivreg() from the AER package)
STATA: ivreg2
Using Instrumental Variables to Account for Selection
Effects in Research on First-Year Programs. Gary R. Pike,
Michele J. Hansen and Ching-Hui Lin. Research in Higher
Education, Volume 52, Number 2, 194–214
◦ Z = (summer bridge, selected major)
"Estimating the Causal Effect of Advising Contacts on
Fall to Spring Retention Using Propensity Score
Matching and Inverse Probability of Treatment Weighted
Regression." Matt Bogard. 2013.Working Paper
http://works.bepress.com/matt_bogard/25
◦ Updated with ‘dorm’ and ‘adviser’ as possible instruments
Difference-in-Difference Estimate of Treatment
Effects
y = b0 + b1D + b2 t+b3 D*t + e
[Figure: outcome trends for treatment (A, A*) and control (B)
groups from 2011 (t = 0) to 2012 (t = 1); the treatment
effect is b3]
DD estimators measure the departure from an
inter-temporal trend and use this to identify
treatment effects
• In the absence of treatment, the difference between the control (B) and treatment (A) groups is
assumed fixed over time.
• DD estimators are a special type of fixed effects estimator.
                               2011      2012
Group                         (t = 0)   (t = 1)   Difference
No Treatment Effect  A (D=1)    0.4       0.5        0.1
Treatment Effect     A* (D=1)   0.4       0.7        0.3
Control              B (D=0)    0.3       0.4        0.1
Difference in Difference: 0.3 − 0.1 = 0.2
                     2011       2012
Group               (t = 0)    (t = 1)             Difference
Treatment  A (D=1)  b0 + b1    b0 + b1 + b2 + b3   b2 + b3
Control    B (D=0)  b0         b0 + b2             b2
Difference in Difference: b3
Goodman, J. Who merits financial aid?:
Massachusetts' Adams Scholarship. Journal of
Public Economics. Volume 92, Issues 10–11,
October 2008, Pages 2121–2131
Used difference-in-difference & Regression
Discontinuity
SAS or R: Regression models with interactions
using PROC REG, GLM, GLIMMIX,LOGISTIC, or
lm() with R
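The interaction model can be sketched with simulated data (hypothetical cell means chosen to echo the earlier table: b0 = 0.3, b1 = 0.1, b2 = 0.1, b3 = 0.2). The interaction coefficient equals the difference in differences of the four cell means:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

D = rng.integers(0, 2, n).astype(float)   # 1 = treatment group
t = rng.integers(0, 2, n).astype(float)   # 1 = post period (2012)
y = 0.3 + 0.1 * D + 0.1 * t + 0.2 * D * t + rng.normal(0, 0.05, n)

# y = b0 + b1*D + b2*t + b3*D*t + e; b3 is the DD treatment effect.
X = np.column_stack([np.ones(n), D, t, D * t])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Same number as the "difference in differences" of the cell means:
cells = {(g, p): y[(D == g) & (t == p)].mean() for g in (0, 1) for p in (0, 1)}
dd = (cells[1, 1] - cells[1, 0]) - (cells[0, 1] - cells[0, 0])
print(round(b[3], 3), round(dd, 3))
```

Because the model is saturated in the two binary indicators, the regression reproduces the cell means exactly, so the two numbers agree.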
• Treatment assignment is equivalent to random assignment within
the neighborhood of the cutoff (Lee & Lemieux,2010).
• Y = f(x) + ρD + e, where f(x) may be a pth-order polynomial
• More advanced approaches use local linear regressions
• Comparisons of outcomes in the neighborhood of X0 provide
estimates of the treatment effect ρ that do not depend on an
exactly correct specification of the functional form of E[Y|X]
(Angrist & Pischke, 2009)
Sharp: subjects are assigned to treatment or
control based on whether the observed
variable crosses the cutoff X0
Fuzzy: subjects with values of X near the
cutoff appear in both treatment and control groups
('mis-assignment')
Non-compliance
Selection on observables and unobservables
The discontinuity serves as an instrument for treatment
status
Note: RD provides a LATE
◦ Goodman, J. Who merits financial aid?: Massachusetts'
Adams Scholarship. Journal of Public Economics. Volume
92, Issues 10–11, October 2008, Pages 2121–2131
◦ RD estimates were larger than DD
◦ van der Klaauw (2002). Estimating the effect of financial
aid offers on college enrollment: A regression-discontinuity
approach. International Economic Review. 43(4), 1249–1287
◦ Moss, B.G. and William H. Y. (2006) Shaping Policies
Related to Developmental Education: An Evaluation Using
the Regression-Discontinuity Design. Educational
Evaluation and Policy Analysis, Vol. 28, No. 3 (Autumn),
pp. 215–229
SAS: Basic functional forms Y = f(x) + ρD + e
PROC REG, GLM, GLIMMIX, LOGISTIC
R: lm() or, more advanced, RDestimate() from the
'rdd' package: RDestimate(y ~ x, data = <data>, cutpoint = <X0>)
STATA: more advanced functions 'rdrobust',
'rdbwselect', 'rdbinselect'
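A minimal sharp-RD sketch in numpy (simulated data; the cutoff, functional form, and bandwidth are all hypothetical choices). A polynomial fit with a treatment dummy and a simple local comparison near the cutoff both recover the jump:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000

# Sharp RD: treatment assigned when the running variable x crosses
# the cutoff X0 = 0; the true jump (treatment effect) is rho = 2.
x = rng.uniform(-1, 1, n)
D = (x >= 0).astype(float)
y = 1.0 + 2.0 * D + 1.5 * x + 0.5 * x**2 + rng.normal(0, 0.5, n)

# Y = f(x) + rho*D + e with f(x) a 2nd-order polynomial:
X = np.column_stack([np.ones(n), D, x, x**2])
rho_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Local comparison in a narrow window around the cutoff gives a
# similar answer without relying on the global functional form:
h = 0.1                               # bandwidth (hypothetical choice)
local = y[(x >= 0) & (x < h)].mean() - y[(x < 0) & (x > -h)].mean()
print(round(rho_hat, 2), round(local, 2))
```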
Slides:
http://works.bepress.com/matt_bogard/27
Paper: A Guide to Quasi-Experimental Designs
http://works.bepress.com/matt_bogard/24
Toy data sets and SAS/R code (forthcoming)
www.econometricsense.blogspot.com
Presentation Slides & More Discussion: www.econometricsense.blogspot.com
Elizabeth Stuart’s Matching Methods Page:
http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html
Example SAS Paper with Sample Code for Propensity Score Matching:
◦ Lanehart, R.E., de Gil, P.R., Kim, E.S., Bellara, A.P., Kromrey, J.D. & Lee, R.S.
(2012). Propensity score analysis and assessment of propensity score
approaches using SAS® procedures. Paper 314-2012. SAS® Global Forum
2012 Proceedings. North Carolina: SAS® Institute
Other Suggested Reading:
◦ Stephanie Riegg Cellini. Causal Inference and Omitted Variable Bias in
Financial Aid Research: Assessing Solutions. The Review of Higher
Education, Spring 2008, Volume 31, No. 3, pp. 329–354
◦ Angrist, J. D. & Pischke J. (2009). Mostly harmless econometrics: An
empiricist's companion. Princeton University Press.
◦ Matt Bogard. 2013. "A Guide to Quasi-Experimental Designs"
Available at: http://works.bepress.com/matt_bogard/24