Slide 1 - World Bank


Impact Evaluation
Causal Inference
Sebastian Galiani
November 2006
Motivation

The research questions that motivate
most studies in the health sciences
are causal in nature. For example:

What is the efficacy of a given drug in
a given population? What fraction
of deaths from a given disease could have
been avoided by a given treatment or
policy?
HDN SAR WBI
Motivation

The most challenging empirical questions in
economics also involve causal-effect
relationships:

Does school decentralization improve school quality?
Motivation

Interest in these questions is motivated
by:
• Policy concerns
  Does privatization of water systems improve child
  health?
• Theoretical considerations
• Problems facing individual decision
  makers
Causal Analysis

The aim of standard statistical analysis, typified
by likelihood and other estimation techniques, is
to infer parameters of a distribution from
samples drawn from that distribution.

With the help of such parameters, one can:
1. Infer association among variables,
2. Estimate the likelihood of past and future events,
3. Update the likelihood of events in light of new
   evidence or new measurements.
Causal Analysis

These tasks are managed well by standard
statistical analysis as long as experimental
conditions remain the same.

Causal analysis goes one step further:

Its aim is to infer aspects of the data generation
process.

With the help of such aspects, one can deduce not only
the likelihood of events under static conditions, but also
the dynamics of events under changing conditions.
Causal Analysis

This capability includes:
1. Predicting the effects of interventions
2. Predicting the effects of spontaneous changes
3. Identifying causes of reported events

This distinction implies that causal and
associational concepts do not mix.
Causal Analysis
The word cause is not in the vocabulary of standard
probability theory.

All that probability theory allows us to say is that two
events are mutually correlated, or dependent,
meaning that if we find one, we can expect to
encounter the other.

Scientists seeking causal explanations for
complex phenomena or rationales for policy
decisions must therefore supplement the
language of probability with a vocabulary for
causality.
Causal Analysis

Two languages for causality have been
proposed:
1. Structural equation modeling (SEM)
   (Haavelmo 1943).
2. The Neyman-Rubin potential outcome
   model (RCM) (Neyman, 1923; Rubin,
   1974).
The Rubin Causal Model

Define the population by U. Each unit in U
is denoted by u.

For each u ∈ U, there is an associated value
Y(u) of the variable of interest Y, which we
call the response variable.

Let A be a second variable defined on U.
We call A an attribute of the units in U.

The key notion is the potential for exposing or
not exposing each unit to the action of a cause:

Each unit has to be potentially exposable to any
one of the causes.

Thus, Rubin takes the position that causes are
only those things that could be treatments in
hypothetical experiments.

An attribute cannot be a cause in an experiment,
because the notion of potential exposability does
not apply to it.

For simplicity, we assume that there are just two
causes or levels of treatment.

Let D be a variable that indicates the cause to
which each unit in U is exposed:
\[
D(u) =
\begin{cases}
t & \text{if unit } u \text{ is exposed to treatment} \\
c & \text{if unit } u \text{ is exposed to control}
\end{cases}
\]
In a controlled study, D is constructed by the
experimenter. In an uncontrolled study, it is
determined by factors beyond the experimenter’s
control.

The values of Y are potentially affected by the
particular cause, t or c, to which the unit is
exposed.

Thus, we need two response variables:
Yt(u), Yc(u)

Yt is the value of the response that would be
observed if the unit were exposed to t and

Yc is the value that would be observed on the
same unit if it were exposed to c.

Let D also be expressed as a binary
variable:
D = 1 if D = t and D = 0 if D = c

Then, the observed outcome of each unit u can
be written as:
Y(u) = D Y1(u) + (1 – D) Y0(u)

Definition: For every unit u, treatment {Du = 1 instead of
Du = 0} causes the effect
\[ \delta_u = Y_1(u) - Y_0(u) \]

This definition of a causal effect assumes that the
treatment status of one individual does not affect the
potential outcomes of other individuals.

Fundamental Problem of Causal Inference: It is
impossible to observe the value of Y1(u) and Y0(u) on the
same unit and, therefore, it is impossible to observe the
effect of t on u.

Another way to express this problem is to say that we
cannot infer the effect of treatment because we do not
have the counterfactual evidence i.e. what would have
happened in the absence of treatment.
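
The fundamental problem can be made concrete with a small sketch. It is only an illustration under stated assumptions: the population, the constant effect of 3 and the names `units` and `observed_outcome` are hypothetical and do not come from the slides. Each simulated unit carries both potential outcomes, but the observation rule Y = D Y1 + (1 – D) Y0 reveals only one of them.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: every unit carries BOTH potential outcomes.
units = []
for _ in range(5):
    y0 = random.gauss(10.0, 2.0)   # outcome if exposed to control
    y1 = y0 + 3.0                  # outcome if exposed to treatment (assumed effect of 3)
    d = random.randint(0, 1)       # treatment indicator D
    units.append({"y0": y0, "y1": y1, "d": d})


def observed_outcome(unit):
    """Y = D*Y1 + (1 - D)*Y0: only one potential outcome is revealed."""
    return unit["d"] * unit["y1"] + (1 - unit["d"]) * unit["y0"]


for unit in units:
    y = observed_outcome(unit)
    hidden = "Y0" if unit["d"] == 1 else "Y1"
    print(f"D={unit['d']}  observed Y={y:.2f}  unobserved counterfactual: {hidden}")
```

In real data the hidden column is never available, which is why the unit-level effect Y1(u) – Y0(u) cannot be computed directly.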

Given that the causal effect for a single unit u
cannot be observed, we aim to identify the
average causal effect for the entire population
or for sub-populations.

The average treatment effect ATE of t (relative
to c) over U (or any sub-population) is given by:
\[
\begin{aligned}
\mathrm{ATE} &= E[Y_1(u) - Y_0(u)] \\
             &= E[Y_1(u)] - E[Y_0(u)] \\
\bar{\delta} &= \bar{Y}_1 - \bar{Y}_0
\end{aligned}
\tag{1}
\]

The statistical solution replaces the impossible-to-observe
causal effect of t on a specific unit
with the possible-to-estimate average causal
effect of t over a population of units.

Although E(Y1) and E(Y0) cannot both be
calculated, they can be estimated.

Most econometric methods attempt to construct,
from observational data, consistent estimates of
the population means E(Y1) and E(Y0).

Consider the following simple estimator
of ATE:
\[
\hat{\delta} = \hat{E}[Y_1 \mid D = 1] - \hat{E}[Y_0 \mid D = 0]
\tag{2}
\]

Note that equation (1) is defined for the
whole population, whereas equation (2)
represents an estimator to be evaluated
on a sample drawn from that population.
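
As a minimal sketch, the estimator in (2) is just a difference of two sample means. The function name `difference_in_means` and the six data points below are made up for illustration.

```python
from statistics import mean


def difference_in_means(outcomes, treated):
    """Sample analog of E[Y1 | D = 1] - E[Y0 | D = 0].

    outcomes: observed Y for each unit
    treated:  1 if the unit was treated, 0 if it was a control
    """
    treated_y = [y for y, d in zip(outcomes, treated) if d == 1]
    control_y = [y for y, d in zip(outcomes, treated) if d == 0]
    if not treated_y or not control_y:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated_y) - mean(control_y)


# Made-up sample of six units (illustrative numbers only).
y = [12.0, 15.0, 11.0, 9.0, 10.0, 8.0]
d = [1, 1, 1, 0, 0, 0]
estimate = difference_in_means(y, d)
print(f"estimated effect: {estimate:.3f}")
```

Whether this number says anything about the ATE depends entirely on how D was assigned, which is the subject of the slides that follow.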

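Let π equal the proportion of the population
that would be assigned to the treatment
group.

Decomposing ATE, we have:
\[
\begin{aligned}
\bar{\delta} &= \pi \, \bar{\delta}_{\{D=1\}} + (1 - \pi) \, \bar{\delta}_{\{D=0\}} \\
 &= \pi \, E[Y_1 - Y_0 \mid D = 1] + (1 - \pi) \, E[Y_1 - Y_0 \mid D = 0] \\
 &= \bigl\{ \pi \, E[Y_1 \mid D = 1] + (1 - \pi) \, E[Y_1 \mid D = 0] \bigr\}
  - \bigl\{ \pi \, E[Y_0 \mid D = 1] + (1 - \pi) \, E[Y_0 \mid D = 0] \bigr\} \\
 &= \bar{Y}_1 - \bar{Y}_0
\end{aligned}
\]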

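If we assume that
\[
E[Y_1 \mid D = 1] = E[Y_1 \mid D = 0]
\quad \text{and} \quad
E[Y_0 \mid D = 1] = E[Y_0 \mid D = 0],
\]
then
\[
\begin{aligned}
\bar{\delta} &= \bigl\{ \pi \, E[Y_1 \mid D = 1] + (1 - \pi) \, E[Y_1 \mid D = 1] \bigr\}
  - \bigl\{ \pi \, E[Y_0 \mid D = 0] + (1 - \pi) \, E[Y_0 \mid D = 0] \bigr\} \\
 &= E[Y_1 \mid D = 1] - E[Y_0 \mid D = 0]
\end{aligned}
\]

which is consistently estimated by its sample
analog estimator:
\[
\hat{\delta} = \hat{E}[Y_1 \mid D = 1] - \hat{E}[Y_0 \mid D = 0]
\]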

Thus, a sufficient condition for the standard
estimator to consistently estimate the true ATE is
that:
\[
E[Y_1 \mid D = 1] = E[Y_1 \mid D = 0]
\quad \text{and} \quad
E[Y_0 \mid D = 1] = E[Y_0 \mid D = 0]
\]

In this situation, the average outcome under the treatment
and the average outcome under the control do not differ
between the treatment and control groups.

In order to satisfy these conditions, it is sufficient that
treatment assignment D be uncorrelated with the potential
outcome distributions of Y1 and Y0.

The principal way to achieve this uncorrelatedness is
through random assignment of treatment.
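
A small simulation can illustrate why random assignment matters. This is a sketch under stated assumptions: the population, the effect distribution and the self-selection rule are all hypothetical. Under a coin-flip assignment the difference in means lands close to the true ATE; when units with high Y0 select into treatment, it does not.

```python
import random
from statistics import mean

random.seed(42)
N = 20000

# Hypothetical population with heterogeneous treatment effects.
y0 = [random.gauss(10.0, 2.0) for _ in range(N)]
y1 = [a + random.gauss(3.0, 1.0) for a in y0]
true_ate = mean(y1) - mean(y0)


def naive_estimate(d):
    """Mean of Y1 among the treated minus mean of Y0 among the controls."""
    treated = [y1[i] for i in range(N) if d[i] == 1]
    control = [y0[i] for i in range(N) if d[i] == 0]
    return mean(treated) - mean(control)


# Random assignment: a coin flip, unrelated to (Y0, Y1).
d_random = [random.randint(0, 1) for _ in range(N)]

# Self-selection (assumed rule): units with high Y0 tend to take the treatment.
d_selected = [1 if a + random.gauss(0.0, 1.0) > 10.0 else 0 for a in y0]

est_random = naive_estimate(d_random)
est_selected = naive_estimate(d_selected)

print(f"true ATE:                 {true_ate:.3f}")
print(f"estimate, randomized D:   {est_random:.3f}")
print(f"estimate, self-selected D:{est_selected:.3f}")
```

The gap in the self-selected case comes from the treated group having a higher Y0 to begin with, not from the treatment itself.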

In most circumstances, there is simply no
information available on how those in the control
group would have reacted if they had received the
treatment instead.

This is the basis for an important insight into the
potential biases of the standard estimator (2).

After a bit of algebra, it can be shown that:
\[
\hat{\delta} = \bar{\delta}
 + \underbrace{\bigl( E[Y_0 \mid D = 1] - E[Y_0 \mid D = 0] \bigr)}_{\text{Baseline Difference}}
 + \underbrace{(1 - \pi) \bigl( \bar{\delta}_{\{D=1\}} - \bar{\delta}_{\{D=0\}} \bigr)}_{\text{Treatment Heterogeneity}}
\]

This equation specifies the two sources of
bias that need to be eliminated from
estimates of causal effects from observational
studies:
1. Selection bias: baseline difference.
2. Treatment heterogeneity.

Most of the methods available only deal with
selection bias, simply assuming that the
treatment effect is constant in the population
or by redefining the parameter of interest in
the population.
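
The bias decomposition can be checked numerically. The sketch below uses a hypothetical population in which units with a high baseline Y0 both gain more from treatment and are more likely to be treated; all numbers and names are assumptions for illustration. It verifies that the naive difference in means equals the ATE plus the baseline difference plus (1 – share treated) times the gap in average effects between treated and untreated units.

```python
import random
from statistics import mean

random.seed(7)
N = 5000

# Hypothetical population: the gain from treatment grows with the baseline Y0.
y0 = [random.gauss(10.0, 2.0) for _ in range(N)]
y1 = [a + 1.0 + 0.5 * (a - 10.0) for a in y0]
effect = [b - a for a, b in zip(y0, y1)]

# Assumed selection rule: high-Y0 units are more likely to be treated.
d = [1 if a + random.gauss(0.0, 1.0) > 10.0 else 0 for a in y0]


def group_mean(values, group):
    """Mean of `values` over units whose treatment indicator equals `group`."""
    return mean([v for v, g in zip(values, d) if g == group])


share_treated = sum(d) / N
ate = mean(effect)
baseline_difference = group_mean(y0, 1) - group_mean(y0, 0)
heterogeneity = (1 - share_treated) * (group_mean(effect, 1) - group_mean(effect, 0))
naive = group_mean(y1, 1) - group_mean(y0, 0)

print(f"naive estimate        : {naive:.4f}")
print(f"ATE                   : {ate:.4f}")
print(f"baseline difference   : {baseline_difference:.4f}")
print(f"heterogeneity term    : {heterogeneity:.4f}")
print(f"ATE + both bias terms : {ate + baseline_difference + heterogeneity:.4f}")
```

In this finite population the identity holds exactly (up to floating-point rounding), because the share treated is computed from the same units.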
Treatment on the Treated

ATE is not always the parameter of interest.

In a variety of policy contexts, it is the average
treatment effect for the treated that is of
substantive interest:
TOT =E [Y1(u) – Y0(u)| D = 1]
= E [Y1(u)| D = 1] – E [Y0(u)| D = 1]
Treatment on the Treated

The standard estimator (2) consistently
estimates TOT if:
\[
E[Y_0 \mid D = 1] = E[Y_0 \mid D = 0]
\]
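
To illustrate, the following sketch builds a hypothetical population in which selection into treatment depends on the gain from treatment but not on the baseline outcome Y0. The selection rule and all numbers are assumptions. In that setting the difference in means tracks TOT, even though TOT differs from ATE.

```python
import random
from statistics import mean

random.seed(3)
N = 20000

# Hypothetical population: the gain varies across units, independently of Y0.
y0 = [random.gauss(10.0, 2.0) for _ in range(N)]
gain = [random.gauss(2.0, 1.0) for _ in range(N)]
y1 = [a + g for a, g in zip(y0, gain)]

# Assumed selection rule: units with a larger gain are more likely to be treated,
# while Y0 plays no role in the decision.
d = [1 if g + random.gauss(0.0, 1.0) > 2.0 else 0 for g in gain]

ate = mean(gain)
tot = mean([g for g, di in zip(gain, d) if di == 1])
naive = (mean([b for b, di in zip(y1, d) if di == 1])
         - mean([a for a, di in zip(y0, d) if di == 0]))

print(f"ATE   : {ate:.3f}")
print(f"TOT   : {tot:.3f}")
print(f"naive : {naive:.3f}")
```

The naive estimate is close to TOT because the control group provides a valid stand-in for the treated group's Y0; it overstates the ATE because the treated are those who gain the most.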
Structural Equation Modeling

Structural equation modeling was originally
developed by geneticists (Wright 1921) and
economists (Haavelmo 1943).
Structural Equations

Definition: An equation
y = β x + ε    (8)
is said to be structural if it is to be interpreted as follows:

In an ideal experiment where we control X to x and any
other set Z of variables (not containing X or Y) to z, the
value y of Y is given by β x + ε, where ε is not a function of
the settings x and z.
This definition is in the spirit of Haavelmo (1943), who
explicitly interpreted each structural equation as a statement
about a hypothetical controlled experiment.

Thus, to the often asked question, “Under what
conditions can we give causal interpretation to
structural coefficients?”

Haavelmo would have answered: Always!

According to the founding father of SEM, the
conditions that make the equation y = β x + ε
structural are precisely those that make the
causal connection between X and Y have no
other value but β, and that ensure that nothing
about the statistical relationship between x and
ε can ever change this interpretation of β.

The average causal effect: The average
causal effect on Y of treatment level x is
the difference in the conditional
expectations:
E(Y|X = x) – E(Y|X = 0)

In the context of dichotomous
interventions (x = 1), this causal effect is
called the average treatment effect
(ATE).
Representing Interventions

Consider the structural model M:
z = fz(w)
x = fx(z, ν)
y = fy(x, u)

We represent an intervention in the model through
a mathematical operator denoted do(x).

do(x) simulates physical interventions by deleting
certain functions from the model, replacing them
by a constant X = x, while keeping the rest of the
model unchanged.

To emulate an intervention do(x0) that holds X constant (at X
= x0) in model M, replace the equation for x with x = x0, and
obtain a new model, Mx0:
z = fz(w)
x = x0
y = fy(x, u)

The joint distribution associated with the modified model,
denoted P(z, y | do(x0)), describes the post-intervention
(“experimental”) distribution.

From this distribution, one is able to assess treatment
efficacy by comparing aspects of this distribution at different
levels of x0.
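
The replacement of one equation by a constant can be sketched in a few lines. The functional forms chosen for `f_z`, `f_x` and `f_y` below are purely hypothetical, as is the name `v` for the second background variable; the slides leave these functions unspecified. The point is only that the intervention overrides the equation for x and leaves everything else untouched.

```python
import random
from statistics import mean


# Hypothetical functional forms for the three structural equations.
def f_z(w):
    return w


def f_x(z, v):
    return 0.8 * z + v


def f_y(x, u):
    return 2.0 * x + u


def solve(w, v, u, do_x=None):
    """Solve model M for one unit.

    If do_x is given, solve the modified model instead: the equation
    for x is replaced by the constant x = do_x, the rest is unchanged.
    """
    z = f_z(w)
    x = f_x(z, v) if do_x is None else do_x
    y = f_y(x, u)
    return z, x, y


random.seed(1)
# Background variables (w, v, u) for a hypothetical population.
background = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
              for _ in range(10000)]


def expected_y_under_do(x0):
    """Average of Y in the post-intervention model with X held at x0."""
    return mean([solve(w, v, u, do_x=x0)[2] for (w, v, u) in background])


effect = expected_y_under_do(1.0) - expected_y_under_do(0.0)
print(f"E[Y | do(x=1)] - E[Y | do(x=0)] = {effect:.3f}")
```

Comparing the post-intervention averages at two levels of x0 is one way to assess treatment efficacy from the modified model.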
Structural Parameters

Definition: The interpretation of a structural
equation as a statement about the behavior of Y
under a hypothetical intervention yields a simple
definition for the structural parameters.
The meaning of β in the equation y = β x + ε is
simply
\[
\beta = \frac{\partial}{\partial x} E[Y \mid do(x)]
\]
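
The difference between this interventional reading of β and an ordinary regression slope can be illustrated with a sketch. The data-generating process below is hypothetical: the disturbance ε is deliberately constructed to be correlated with x in the observational data. A finite difference of E[Y | do(x)] recovers β, while the least-squares slope of y on x does not.

```python
import random
from statistics import mean

random.seed(11)
N = 20000
BETA = 2.0  # assumed structural coefficient

# Hypothetical observational regime: x and the disturbance share a common source v.
v = [random.gauss(0.0, 1.0) for _ in range(N)]
eps = [1.5 * vi + random.gauss(0.0, 1.0) for vi in v]
x_obs = v
y_obs = [BETA * xi + ei for xi, ei in zip(x_obs, eps)]


def expected_y_do(x0):
    """E[Y | do(x0)]: X is set to x0 for every unit, disturbances are left as they are."""
    return mean([BETA * x0 + e for e in eps])


# Finite difference of E[Y | do(x)]; exact here because the equation is linear in x.
beta_do = (expected_y_do(1.0) - expected_y_do(0.0)) / 1.0

# Ordinary least-squares slope of y on x in the observational data.
mx, my = mean(x_obs), mean(y_obs)
ols_slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x_obs, y_obs))
             / sum((xi - mx) ** 2 for xi in x_obs))

print(f"slope of E[Y | do(x)] : {beta_do:.3f}")
print(f"OLS slope of y on x   : {ols_slope:.3f}")
```

The regression slope picks up the association between x and ε, whereas the structural parameter is defined by what happens to Y when x is set by intervention.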
Counterfactual Analysis in Structural
Models

Consider again model Mx0. Call the solution of Y
the potential response of Y to x0.

We denote it as Yx0(u, ν, w).

This entity can be given a counterfactual
interpretation, for it stands for the way an
individual with characteristics (u, ν, w) would
respond, had the treatment been x0, rather than
the x = fx(z, ν) actually received by the individual.

In our example,
Yx0(u, ν, w) = Yx0(u) = y = fy(x0, u)
• This interpretation of counterfactuals, cast as solutions to
  modified systems of equations, provides the conceptual
  and formal link between structural equation modeling and
  the Rubin potential-outcome framework.
• It assures us that the end results of the two approaches
  will be the same.
• Thus, the choice of model is strictly a matter of
  convenience or insight.
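
The link between the two frameworks can be sketched for a binary treatment. The structural function `f_y` and the three units below are hypothetical. Solving the modified model at x0 = 1 and x0 = 0 yields the potential outcomes Y1(u) and Y0(u), and the observed outcome coincides with the switching rule D Y1 + (1 – D) Y0 of the potential-outcome model.

```python
def f_y(x, u):
    """Hypothetical structural equation for y."""
    return 2.0 * x + u


def potential_response(x0, u):
    """Y_{x0}(u): the solution for Y in the model where x is set to x0."""
    return f_y(x0, u)


# Three made-up units: background characteristic u and treatment actually received x.
units = [{"u": 0.5, "x": 1}, {"u": -1.0, "x": 0}, {"u": 2.0, "x": 1}]

rows = []
for unit in units:
    y1 = potential_response(1, unit["u"])   # what the unit would show under treatment
    y0 = potential_response(0, unit["u"])   # what the unit would show under control
    observed = f_y(unit["x"], unit["u"])    # outcome generated by the structural model
    switching = unit["x"] * y1 + (1 - unit["x"]) * y0  # potential-outcome observation rule
    rows.append({"y1": y1, "y0": y0, "observed": observed,
                 "switching": switching, "effect": y1 - y0})

for row in rows:
    print(row)
```

Both descriptions generate the same observed data, which is the sense in which the choice between them is a matter of convenience.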
References

Judea Pearl (2000): Causality: Models, Reasoning and
Inference, CUP. Chapters 1, 5 and 7.
Trygve Haavelmo (1944): “The probability approach in
econometrics”, Econometrica 12, pp. iii-vi+1-115.
Arthur Goldberger (1972): “Structural Equations Methods in
the Social Sciences”, Econometrica 40, pp. 979-1002.
Donald B. Rubin (1974): “Estimating causal effects of
treatments in randomized and nonrandomized
experiments”, Journal of Educational Psychology 66, pp.
688-701.
Paul W. Holland (1986): “Statistics and Causal Inference”,
Journal of the American Statistical Association 81, pp. 945-70,
with discussion.