Transcript Z - UCLA
ROBUSTNESS OF
CAUSAL CLAIMS
Judea Pearl
Computer Science Department
UCLA
www.cs.ucla.edu/~judea
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) affects both x and y]
In linear systems: $y = ax + e$
$a$ is non-identifiable.
The effect of smoking on cancer is, in general, non-identifiable (from observational studies).
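A minimal simulation sketch of why the regression slope of y on x does not recover a when u is unobserved. The coefficient values, sample size, and Gaussian noise are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = 0.5                                    # hypothetical true effect of x on y

u = rng.normal(size=n)                     # unobserved genetic factor
x = 0.8 * u + rng.normal(size=n)           # smoking, partly driven by u
y = a * x + 0.8 * u + rng.normal(size=n)   # cancer: y = a*x + e, where e contains u


def slope(dep, indep):
    """Regression slope of dep on indep: cov(dep, indep) / var(indep)."""
    return np.cov(dep, indep)[0, 1] / np.var(indep, ddof=1)


r_yx = slope(y, x)
print(f"true a = {a}, regression slope R_yx = {r_yx:.3f}")
```

Because u feeds into both x and y, the slope absorbs the confounding path and lands well above a; no amount of additional data on (x, y) alone removes that gap.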
ROBUSTNESS:
MOTIVATION
[Diagram: Z (Price of Cigarettes) → x (Smoking) with coefficient b; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) affects both x and y]
Z – Instrumental variable; cov(z,u) = 0
a is identifiable
$R_{yz} = ab$
$R_{xz} = b$
$a = R_{yz} / R_{xz}$
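A minimal sketch of the instrumental-variable ratio on simulated data, assuming Gaussian noise and hypothetical coefficient values (none of the numbers come from the slides). The instrument z is generated independently of u, matching cov(z,u) = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a, b = 0.5, 0.7                            # hypothetical structural coefficients

z = rng.normal(size=n)                     # instrument, independent of u
u = rng.normal(size=n)                     # unobserved confounder
x = b * z + 0.8 * u + rng.normal(size=n)
y = a * x + 0.8 * u + rng.normal(size=n)


def slope(dep, indep):
    """Regression slope of dep on indep: cov(dep, indep) / var(indep)."""
    return np.cov(dep, indep)[0, 1] / np.var(indep, ddof=1)


r_yz = slope(y, z)                         # approximately a * b
r_xz = slope(x, z)                         # approximately b
a_iv = r_yz / r_xz                         # ratio estimand for a

print(f"R_yz = {r_yz:.3f}, R_xz = {r_xz:.3f}, a_iv = {a_iv:.3f}")
```

The ratio recovers a even though u is never observed, because u drops out of both covariances with z.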
ROBUSTNESS:
MOTIVATION
[Diagram: Z (Price of Cigarettes) → x (Smoking) with coefficient b; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) affects both x and y]
Problem with Instrumental Variables:
The model may be wrong!
$R_{yz} \neq ab$
$a \neq R_{yz} / R_{xz}$
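One hypothetical way the model could be wrong, chosen only for illustration (the slide does not say which assumption fails): suppose z also affects y directly with a coefficient d, an assumed symbol, while z stays independent of the error terms. The ratio estimand then picks up a bias of d/b:

```latex
\begin{aligned}
x &= bz + e_x, \qquad y = ax + dz + e_y \\
R_{xz} &= b, \qquad R_{yz} = ab + d \\
\frac{R_{yz}}{R_{xz}} &= a + \frac{d}{b} \neq a \quad \text{whenever } d \neq 0
\end{aligned}
```

With a single instrument the data give no warning of this: the ratio is still computable, it simply no longer equals a.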
ROBUSTNESS:
MOTIVATION
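[Diagram: instruments Z1 (Price of Cigarettes) and Z2 (Peer Pressure), each pointing into x (Smoking), with instrument coefficients b and g; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) affects both x and y]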
Solution: Invoke several instruments
$a_1 = R_{yz_1} / R_{xz_1}$
$a_2 = R_{yz_2} / R_{xz_2}$
Surprise: $a_1 = a_2$ $\Rightarrow$ model is likely correct
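A sketch of the two-instrument check on simulated data, assuming both instruments are valid and using hypothetical coefficients and Gaussian noise. The helper `iv_ratio` is an illustrative name, not something defined in the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a, b, g = 0.5, 0.7, 0.6                    # hypothetical coefficients

z1 = rng.normal(size=n)                    # first instrument
z2 = rng.normal(size=n)                    # second instrument
u = rng.normal(size=n)                     # unobserved confounder
x = b * z1 + g * z2 + 0.8 * u + rng.normal(size=n)
y = a * x + 0.8 * u + rng.normal(size=n)


def iv_ratio(y, x, z):
    """Ratio estimand R_yz / R_xz, which equals cov(y, z) / cov(x, z)."""
    return np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]


a1 = iv_ratio(y, x, z1)
a2 = iv_ratio(y, x, z2)
print(f"a1 = {a1:.3f}, a2 = {a2:.3f}, difference = {a1 - a2:+.3f}")
```

The two estimates rest on different assumptions (one about z1, one about z2), so their agreement is informative: if either instrument were invalid, the two ratios would generally disagree.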
ROBUSTNESS:
MOTIVATION
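[Diagram: instruments Z1 (Price of Cigarettes), Z2 (Peer Pressure), Z3 (Anti-smoking Legislation), …, Zn, each pointing into x (Smoking), with instrument coefficients b and g shown; x → y (Cancer) with coefficient a; u (Genetic Factors, unobserved) affects both x and y]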
Greater surprise: $a_1 = a_2 = a_3 = \dots = a_n = q$
Claim a = q is highly likely to be correct
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; y → s (Symptom); u (Genetic Factors, unobserved) affects both x and y]
Symptoms do not act as instruments
a remains non-identifiable
Why? Taking a noisy measurement (s) of an
observed variable (y) cannot add new information
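A small sketch of this point, under stated assumptions: confounding is represented as a covariance r between the error terms of x and y, and the symptom is modeled as s = c·y + e_s with hypothetical values for c and the noise variance. Two models with different values of a then imply exactly the same covariance matrix over (x, y, s), so observing s cannot separate them.

```python
import numpy as np


def implied_cov(a, r, var_ey, c=0.9, var_es=0.5):
    """Covariance of (x, y, s) implied by x = e_x, y = a*x + e_y, s = c*y + e_s.

    r is cov(e_x, e_y), standing in for the unobserved confounder u;
    var(e_x) is fixed at 1. All numeric values are illustrative.
    """
    coeffs = np.array([[0.0, 0.0, 0.0],
                       [a,   0.0, 0.0],
                       [0.0, c,   0.0]])
    err_cov = np.array([[1.0, r,      0.0],
                        [r,   var_ey, 0.0],
                        [0.0, 0.0,    var_es]])
    total = np.linalg.inv(np.eye(3) - coeffs)   # maps errors to variables
    return total @ err_cov @ total.T


cov_1 = implied_cov(a=0.5, r=0.3, var_ey=1.00)
cov_2 = implied_cov(a=0.2, r=0.6, var_ey=1.27)

print(np.round(cov_1, 3))
print(np.allclose(cov_1, cov_2))
```

Every covariance involving s is just c times the corresponding covariance involving y, so s adds no equation that could pin down a.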
ROBUSTNESS:
MOTIVATION
[Diagram: x (Smoking) → y (Cancer) with coefficient a; y points to symptoms S1, S2, …, Sn; u (Genetic Factors, unobserved) affects both x and y]
Adding many symptoms does not help.
a remains non-identifiable
ROBUSTNESS:
MOTIVATION
Given a parameter a in a general graph
[Diagram: x → y with coefficient a]
Find if a can evoke an equality surprise
$a_1 = a_2 = \dots = a_n$
associated with several independent estimands of a
Formulate: Surprise, over-identification, independence
Robustness: The degree to which a is robust to violations
of model assumptions
ROBUSTNESS:
FORMULATION
Bad attempt:
Parameter a is robust (over-identified) if:
$a = f_1(\cdot)$
$a = f_2(\cdot)$
$f_1, f_2$: two distinct functions
If the model induces a constraint $g(\cdot) = 0$, then
$a = f(\cdot) + t_1[g(\cdot)] = f(\cdot) + t_2[g(\cdot)]$
and the $t_i[g(\cdot)]$ are distinct.
ROBUSTNESS:
FORMULATION
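[Diagram: chain x → y → z with coefficients b (x → y) and c (y → z); error terms $e_x$, $e_y$, $e_z$]
$x = e_x$
$y = bx + e_y$
$z = cy + e_z$
$R_{yx} = b$
$R_{zx} = bc$
$R_{zy} = c$
(b) $b = R_{yx}$; $b = R_{zx} / R_{zy}$
(c) $c = R_{zy}$; $c = R_{zx} / R_{yx}$
constraint: $R_{zx} = R_{yx} R_{zy}$
y → z is irrelevant to the derivation of b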
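A sketch that simulates the chain model and evaluates both estimands for b and for c, along with the induced constraint. Independent standard-normal errors and the coefficient values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
b, c = 0.6, 0.4                            # hypothetical coefficients

x = rng.normal(size=n)                     # x = e_x
y = b * x + rng.normal(size=n)             # y = b*x + e_y
z = c * y + rng.normal(size=n)             # z = c*y + e_z


def slope(dep, indep):
    """Regression slope of dep on indep: cov(dep, indep) / var(indep)."""
    return np.cov(dep, indep)[0, 1] / np.var(indep, ddof=1)


r_yx, r_zx, r_zy = slope(y, x), slope(z, x), slope(z, y)

b_direct, b_ratio = r_yx, r_zx / r_zy      # the two expressions listed for b
c_direct, c_ratio = r_zy, r_zx / r_yx      # the two expressions listed for c
constraint_gap = r_zx - r_yx * r_zy        # near zero when the model holds

print(f"b: {b_direct:.3f} vs {b_ratio:.3f}")
print(f"c: {c_direct:.3f} vs {c_ratio:.3f}")
print(f"R_zx - R_yx * R_zy = {constraint_gap:+.4f}")
```

Numerically both pairs agree, which is exactly why agreement alone is a poor test of over-identification: the second expression for b is the first one rewritten through the constraint, not a derivation that rests on different assumptions.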
RELEVANCE:
FORMULATION
Definition 8
Let A be an assumption embodied in model M,
and p a parameter in M. A is said to be relevant
to p if and only if there exists a set of assumptions
S in M such that S and A sustain the identification
of p but S alone does not sustain such
identification.
Theorem 2
An assumption A is relevant to p if and only if A is a
member of a minimal set of assumptions sufficient
for identifying p.
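A brute-force sketch of Definition 8 and Theorem 2 on a toy example. The oracle `toy_identifies`, the assumption names A1 to A4, and both helper functions are hypothetical; a real oracle would be an identification procedure for the model at hand. The search for minimal sets assumes the oracle is monotone, meaning that adding assumptions never destroys identification.

```python
from itertools import combinations


def minimal_sufficient_sets(assumptions, identifies):
    """All minimal subsets S for which identifies(S) holds (monotone oracle)."""
    minimal = []
    for size in range(len(assumptions) + 1):
        for combo in combinations(assumptions, size):
            s = frozenset(combo)
            # Smaller sets are visited first, so s is minimal exactly when
            # no previously found minimal set is contained in it.
            if identifies(s) and not any(m <= s for m in minimal):
                minimal.append(s)
    return minimal


def relevant_by_definition(a, assumptions, identifies):
    """Definition 8: some S identifies p together with a, but not without it."""
    others = [x for x in assumptions if x != a]
    for size in range(len(others) + 1):
        for combo in combinations(others, size):
            s = frozenset(combo)
            if identifies(s | {a}) and not identifies(s):
                return True
    return False


def toy_identifies(s):
    """Hypothetical oracle: p is identified by A1 alone, or by A2 and A3 together."""
    return {"A1"} <= s or {"A2", "A3"} <= s


assumptions = ["A1", "A2", "A3", "A4"]
minimal = minimal_sufficient_sets(assumptions, toy_identifies)
by_theorem = {a for a in assumptions if any(a in m for m in minimal)}
by_definition = {a for a in assumptions
                 if relevant_by_definition(a, assumptions, toy_identifies)}

print(sorted(sorted(m) for m in minimal))
print(sorted(by_theorem), sorted(by_definition))
```

In this toy case the two routes give the same relevant assumptions, and A4 is irrelevant because no identification ever depends on it. The enumeration is exponential in the number of assumptions, so it is only meant to make the definitions concrete.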
ROBUSTNESS:
FORMULATION
Definition 5 (Degree of over-identification)
A parameter p (of model M) is identified to
degree k (read: k-identified) if there are k
minimal sets of assumptions each yielding a
distinct estimand of p.
ROBUSTNESS:
FORMULATION
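[Diagram: graph over x, y, z with edges labeled b and c]
Minimal assumption sets for c.
[Diagrams: three graphs over x, y, z, labeled G1, G2 and G3, each with an edge labeled c]
Minimal assumption sets for b.
[Diagram: graph over x, y, z with an edge labeled b]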
FROM MINIMAL ASSUMPTION SETS
TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
Definition
A claim C is identified to degree k in model M (graph
G), if there are k edge supergraphs of G that permit the
identification of C, each yielding a distinct estimand.
e.g., Claim: (Total effect) TE(x,z) = q
[Diagram: first edge supergraph over x, y, z]
TE(x,z) = $R_{zx}$
[Diagram: second edge supergraph over x, y, z]
TE(x,z) = $R_{yx} R_{zy \cdot x}$
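A sketch comparing two estimands of the total effect on data simulated from the chain x → y → z, taking the second estimand to be the product of R_yx and the partial regression coefficient R_zy·x (a reading consistent with R_yx = b and R_zy = c for the chain model). Coefficients, noise distributions, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
b, c = 0.6, 0.4                            # hypothetical coefficients; TE(x,z) = b*c

x = rng.normal(size=n)
y = b * x + rng.normal(size=n)
z = c * y + rng.normal(size=n)

r_zx = np.cov(z, x)[0, 1] / np.var(x, ddof=1)
r_yx = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Partial regression coefficient of z on y with x held fixed:
# least-squares fit of z on (y, x, intercept), keeping the coefficient of y.
design = np.column_stack([y, x, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, z, rcond=None)
r_zy_x = coef[0]

te_1 = r_zx                                # first estimand
te_2 = r_yx * r_zy_x                       # second estimand

print(f"b*c = {b * c:.3f}, te_1 = {te_1:.3f}, te_2 = {te_2:.3f}")
```

When the data really come from the chain, both estimands approximate the same quantity; each is licensed by a different supergraph, which is what makes their agreement meaningful.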
CONCLUSIONS
1. A formal definition of the ROBUSTNESS of causal
claims:
“A claim is robust when it is insensitive to
violations of some of the model assumptions”
2. Graphical criteria and algorithms for computing
the degree of robustness of a given causal
claim.