Inferring Causal Graphs - SFU Computing Science

Download Report

Transcript Inferring Causal Graphs - SFU Computing Science

Inferring Causal Graphs
Computing 882
Simon Fraser University
Spring 2002
Applications of Bayes
Nets (I)
• Windows Office “Paper Clip.”
• Bill Gates: “The competitive
advantage of Microsoft lies in
our expertise in Bayes Nets.”
• UBC Intelligent Tutoring
System (ASI X-change).
Applications of Bayes
Nets (II)
• University Drop-outs: Search
program Tetrad II says that
higher SAT score would lead to
lower drop-out rate. Carnegie
Mellon uses this to reduce its
drop-out rates.
• Tetrad II recalibrates Mass
Spectrometer on earth satellite.
• Tetrad II predicts relation
between corn exports and
exchange rates.
Bayes Nets: Basic
Definitions
• Defn: A and B are independent
iff P(A and B) = P(A) x P(B).
• Exercise: Prove that A and B are
independent iff P(A|B) = P(A).
• Thus independence implies
irrelevance.
Independence Among
Variables
• Let X,Y,Z be random variables.
• X is independent of Y iff
P(X=x| Y=y) = P(X=x) for all
x,y s.t. P(Y=y) > 0.
• X is independent of Y given Z
iff P(X=x|Y=y,Z=z) = P(Z=z)
for all y,z s.t. P(Y=y and Z=z)
>0.
• Notation: (X Y|Z).
• Intuitively: given information Z,
Y is irrelevant to X.
Axioms for
Informational Relevance
• Pearl (2000), p.11. It’s possible
to read the  symbol as
“irrelevant”. Then we can
consider a number of axioms for
 as axiomatizations of
relevance, for example:
• Symmetry: if (X  Y|Z) then
(Y  X|Z).
• Decomposition: if (X  YW|Z)
then (X  Y|Z).
Markovian Parents
• In constructing a Bayes net, we look
for “direct causes” – variables that
“immediately determine” the value
of another value. Such direct causes
“screen off” other variables.
• Formally: Let an ordering of
variables X1, …, Xn be given.
Consider Xj. Let PA be any subset of
X1,…,Xj-1. Suppose that P(Xj|PA) =
P(Xj|X1,..,Xj) and that no subset of
PA has this property. Then PA forms
the Markovian parents of Xj.
Markovian Parents and
Bayes Nets
• Given an ordering of variables, we
can construct a causal graphs by
drawing arrows between Markovian
parents and children. Note that
graphs are suitable for drawing the
distinction between “direct” and
“intermediate” causes.
• Exercise: For the variables in figure
1.2, construct a Bayes net in the
given ordering.
• Exercise: Construct a Bayes net
along the ordering (X5, X1, X3, X2,
X4).
Independence in Bayes
Nets
• Note how useful irrelevance
information is – think of a
Prolog-style logical database.
• A typical problem: Given some
information Z, and a query
about X, is Y relevant to X?
• For Bayes nets, the d-separation
criterion is a powerful answer.
d-separation
•
•
•
In principle, information can
flow along any path between
two variables X and Y.
Provisos: A path is blocked by
any collider.
Conditioning on a node
reverses its status.
–
–
Conditioning on non-collider
makes it block.
Conditioning on collider or its
descendant makes it unblocked.
d-separation
characterizes
independence
• If X,Y d-separated by Z in a
DAG G, then (X Y|Z) in all
probability distributions
compatible with G.
• If X,Y not d-separated by Z in a
DAG G, then not [(X Y|Z) in
all probability distributions
compatible with G.].
Observational
Equivalence
• Suppose we can observe the
probabilities of various
occurrences (rain vs. umbrellas,
smoking vs. lung cancer etc.).
• How does prob constrain graph?
• Two causal graphs G1,G2 are
compatible with the same probs
iff.
– G1 has the same adjacencies as
G2 and the same v-structures
(basically, colliders).
Observational
Equivalence: Examples
(I)
• In sprinkler network, cannot tell
whether X1 -> X2 or vice versa.
But can tell that X2 -> X4 and
X4 -> X5.
• General note: You cannot
always tell in machine learning
what the correct hypothesis is
even if you have all possible
data -> need more assumptions
or other kinds of data.
Observational
Equivalence: Examples
(II)
• Vancouver sun, March 29, 2002.
“Adolescents …. Are more likely to
turn to violence in their early
twenties if they watch more than an
hour of television a day… The team
tracked more than 700 children and
took into account the “chicken and
egg” question: Does watching
television cause aggression or do
people prone to aggression watch
more television?”
[Science, Dr. Johnson, Columbia U.]
Two Models of
Aggressive behaviour
Disposition to aggression
TV watching
Violent behavour
Disposition to aggression
TV watching
Violent behavour
Are these two graphs
observationally distinguishable?
Minimal Graphs
• A graph G is minimal for a
probability distribution P iff
– G is compatible with P, and
– no subgraph of G is compatible
with P.
• Example: not minimal if A
{B,C,D}
A
C
B
D
Note on minimality
• Intuitively, minimality requires that
you add an edge between A and B
only if there is some dependence
between A and B.
• In statistical tests, dependence is
observable but independence is not.
• So minimality amounts to “assume
independence until dependence is
observed”.
• That is exactly the strategy for
minimizing mind changes! (“assume
reaction is impossible until
observed”).
Stable Distributions
• A distribution P is stable iff
there is a graph G such that (X
 Y |Z) in P iff X and Y are dseparated given Z in G.
• Intuitively, stability rules out
“exact counterbalance”: two
forces both having a causal
effect but cancelling out each
other exactly in every
circumstance.
Inferring Causal
Structure: The IC
Algorithm
•
•
•
Assume a stable probability
distribution P.
Find a minimal graph for P
with as many edges directed
as possible.
General idea: First find
variables that are “directly
causally related”. Connect
those. Add arrows as far as
possible.
Inferring Causal
Structure: The IC
Algorithm
1. For each pair of variables X and Y,
look for a “screen off” set S(X,Y)
s.t. X  Y| S(X,Y) holds. If there
is no such set, add an undirected
edge between X and Y.
2. For each pair X,Y with a common
neighbour Z, check if Z is part of a
“screening off” set S(X,Y). If not,
make Z a common consequence of
X,Y.
3. Orient edges without creating
cycles or v-structures.
Rules for Orientation
•
•
•
•
Given a  b, b – c add b c if a,c
are not linked (no new collider).
Given a c b, a – b add a  b
(no cycle).
Given a – c d and c d  b and a
– b add a  b if c,d are not linked
(no cycle + no new collider).
Given a – c b and a – d b and a
– b add a  b if c,d are not linked
(no cycle + no new collider).