Monday slides
Causal learning and modeling
David Danks
CMU Philosophy & Psychology
2014 NASSLLI
High-level overview
Monday:
History of causal inference
Basic representation of causal structures
Tuesday:
Inference & reasoning using graphical models
Interventions in causal structures
High-level overview
Wednesday:
Basic principles of search & causal discovery
Thursday:
Challenges to causal discovery, and responses
Both principled and real-world
High-level overview
Friday: One of two possibilities:
Singular / actual causation & counterfactuals (in the causal graphical model framework)
Recent advances in causal learning & inference
Decided by a vote at end-of-class tomorrow (Tues)
Structure & assumptions
Mix of lecture & (group) problem-solving, so if you have questions/uncertainty:
Ask!
If you’re confused, then someone else probably is too…
Assuming basic knowledge of probabilities
Focus is on conceptual/foundational issues, not the technical details
But ask if you want to know more about those details!
A BRIEF HISTORY OF CAUSAL DISCOVERY
“Big Picture” (very roughly)
Greeks - 1750: Unhelpful platitudes
1750 - 1950: Practical successes
1950 - present: Computers + Formal models = principled methods
Aristotle
384-322 BC
Trying to answer:
“Why does X have A?”
Four types of ‘cause’
Formal: Because of its structure
Material: Because of its composition
Efficient: Because of its development
Final: Because of its purpose
But no systematic theory of inference
Francis Bacon
1561-1626
Novum Organum (1620)
For any phenomenon, construct:
The table of presence (tabula praesentiae)
The table of absence (tabula absentiae)
The table of degrees (tabula graduum)
The cause of the phenomenon is the set of properties that explains every case on each of the three tables
John Stuart Mill
1806-1873
System of Logic (1843)
Algorithmic form of Bacon’s method (though unattributed)
Method of agreement
Method of difference
Method of concomitant variation
David Hume
1711-1776
Causal inference cannot be done using deduction
It is always logically possible that future “causes” will not be followed by the effect
Actually a general argument about induction
But we do it by “custom or habit”
Had an evolutionary justification, but no framework in which to express it
Responses to Hume’s skepticism
Hume’s arguments were quite influential in philosophical circles
And still matter in present-day philosophy
But in the sciences, people were starting to find methods that (sometimes) gave answers that at least seemed right…
Regression (Least Squares)
18th c. astronomy: find the “best” values for 6 unknowns given 75 observations
Euler (1748)
Failed due to computational intractability
Legendre (1805)
Developed the method of least squares
Gauss (1795 / 1809)
Independent (earlier, unpublished) discovery & justification
Still the most common causal inference method…
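A minimal sketch of the least-squares idea in its simplest case, a straight line y = a + b·x: choose a and b to minimize the sum of squared residuals. With one predictor this has the closed-form solution used below; with many unknowns (as in the astronomy problems) one solves the full normal equations instead. The function names and data are hypothetical.

```python
# Ordinary least squares for y = a + b*x via the closed-form normal-equation solution.


def fit_line(xs, ys):
    """Return (intercept a, slope b) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept: the fitted line passes through the means
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """The quantity least squares minimizes."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line([0, 1, 2, 3], [2, 5, 8, 11])  # hypothetical, exactly linear data
print(a, b)  # 2.0 3.0
```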
Growth of statistics
Early theory of statistics emerges from probability theory throughout the 1800s
[Timeline figure: Laplace (1749-1827), Quetelet (1796-1874), Galton (1822-1911), Pearson (1857-1936), Spearman (1863-1945), Yule (1871-1951)]
Ronald A. Fisher
1890-1962
Essentially the father of modern statistics, and developed:
An array of statistical tests
An analysis of various experimental designs
The standard statistical and methodological reference texts for a generation of scientists
Sewall Wright
1889-1988
Path analysis
Graphs encode high-level structure, and then regression can be used to estimate parameters
By mid-20th c., it had been adopted by a number of economists and sociologists
But no search procedures were provided
Have to know the high-level structure
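A sketch of this division of labor, assuming the graph is already known and all relationships are linear with zero-mean noise: regress each variable on its parents and read the fitted coefficients as path coefficients. The three-variable graph (A → B, A → C, B → C), its coefficients, and all helper names are hypothetical; note that nothing here searches for the graph itself.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(y, predictors):
    """Least-squares coefficients of y on the predictor columns (no intercept: noise has mean 0)."""
    k = len(predictors)
    xtx = [[sum(p * q for p, q in zip(predictors[i], predictors[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * v for p, v in zip(predictors[i], y)) for i in range(k)]
    return solve(xtx, xty)


def path_coefficients(data, parents):
    """Given a known DAG (node -> list of parents), regress each node on its parents."""
    return {node: dict(zip(pa, ols(data[node], [data[p] for p in pa])))
            for node, pa in parents.items() if pa}


def simulate(n, seed=0):
    """Hypothetical true model: A -> B (0.8), A -> C (0.5), B -> C (-0.7)."""
    rng = random.Random(seed)
    data = {"A": [], "B": [], "C": []}
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = 0.8 * a + rng.gauss(0.0, 1.0)
        c = 0.5 * a - 0.7 * b + rng.gauss(0.0, 1.0)
        data["A"].append(a)
        data["B"].append(b)
        data["C"].append(c)
    return data


PARENTS = {"A": [], "B": ["A"], "C": ["A", "B"]}  # the graph must be supplied, not learned
estimates = path_coefficients(simulate(20000), PARENTS)
print(estimates)  # estimates should be close to the true values 0.8, 0.5 and -0.7
```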
Causal graphical models
Developed by statisticians, computer scientists, and philosophers
Dawid, Spiegelhalter, Wermuth, Cox, Lauritzen, Pearl, Spirtes, Glymour, Scheines
Represent both qualitative and quantitative aspects of causation
REPRESENTING CAUSAL STRUCTURES
Qualitative representation
We want a representation that captures many
qualitative features of causality
Causation occurs among variables ⇒ One node per variable
[Figure: one node each for Food Eaten, Exercise, Weight, Metabolism]
Asymmetry of causation ⇒ Need an asymmetric connection in the graph
[Figure: Food Eaten, Exercise, Weight, Metabolism joined by directed edges]
No (immediate) reciprocal causation ⇒ No cycles (without explicit temporal indexing)
[Figure: Food Eaten, Exercise, Weight, Metabolism at Time t, with the same variables repeated at Time t+1]
Directed Acyclic Graphs
More precisely: DAG G = <V, E>
V = set of nodes (for variables)
E = set of edges (i.e., ordered pairs of nodes)
Path π = sequence of adjacent edges
Directed path = path whose edges all point in the same direction
Acyclicity: No directed path from any node A back to itself
In general: We use genealogical & topological
language to describe graphical relationships
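A DAG G = <V, E> can be stored directly as a node list plus a list of ordered pairs. This sketch (hypothetical helper names) checks acyclicity with Kahn's algorithm, a standard topological-sort technique, on the Exercise (E), Food Eaten (FE), Metabolism (M), Weight (W) graph used in the factorization example later in these slides. The extra Weight → Food Eaten edge, and its time-indexed version, are illustrative assumptions showing how indexing by time removes the cycle.

```python
def has_directed_cycle(nodes, edges):
    """Return True if some directed path leads from a node back to itself.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges. Nodes on a
    cycle (or downstream of one) never reach in-degree 0, so not all get removed.
    """
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for tail, head in edges:  # each edge is an ordered pair: tail -> head
        children[tail].append(head)
        indegree[head] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed < len(nodes)


def parents(node, edges):
    """Genealogical language: the parents of a node are the tails of edges into it."""
    return {tail for tail, head in edges if head == node}


V = ["E", "FE", "M", "W"]
EDGES = [("E", "FE"), ("E", "M"), ("FE", "W"), ("M", "W")]
print(has_directed_cycle(V, EDGES))                  # False: a DAG
print(has_directed_cycle(V, EDGES + [("W", "FE")]))  # True: FE -> W -> FE

# Hypothetical time-indexed version: copy the graph at t and t+1 ("t1"),
# and let Weight at time t influence Food Eaten at time t+1.
V2 = [f"{v}_{t}" for t in ("t", "t1") for v in V]
EDGES2 = [(f"{a}_{t}", f"{b}_{t}") for t in ("t", "t1") for a, b in EDGES] + [("W_t", "FE_t1")]
print(has_directed_cycle(V2, EDGES2))                # False
```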
Quantitative representation
DAGs alone can represent “A causes B”…
but not “strength” or “form” of causation
Need to represent the relationships between the various variables’ states
Exact quantitative representation will depend on the
type of variables being represented
Bayesian networks
All variables are discrete/categorical
Represent quantitative causation using a joint
probability distribution
I.e., a specification of the probability of any combination of variable values, such as:
P(E=Hi & FE=Lo & M=Hi & W=Hi) = 0.001;
P(E=Hi & FE=Lo & M=Hi & W=Lo) = 0.03;
etc.
Note: Nothing inherently Bayesian about Bayes nets!
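To make the cost of a full joint distribution concrete: with four binary variables there are 2^4 = 16 cells, 15 of them free because they must sum to 1. The probabilities below are made up (arbitrary positive weights normalized to 1), and `prob` is a hypothetical helper that sums out whichever variables are not named.

```python
from itertools import product

VARS = ("E", "FE", "M", "W")  # Exercise, Food Eaten, Metabolism, Weight
VALUES = ("Lo", "Hi")

# Hypothetical joint distribution: arbitrary positive weights normalized to sum to 1.
assignments = list(product(VALUES, repeat=len(VARS)))
weights = [i + 1 for i in range(len(assignments))]
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}


def prob(joint, **fixed):
    """P(the named variables take the given values), summing out all the others."""
    positions = {name: VARS.index(name) for name in fixed}
    return sum(p for a, p in joint.items()
               if all(a[positions[name]] == value for name, value in fixed.items()))


print(len(joint))                                    # 16 cells, i.e. 15 free parameters
print(prob(joint, E="Hi", FE="Lo", M="Hi", W="Hi"))  # a single cell of the table
print(prob(joint, W="Hi"))                           # a marginal: the sum of 8 cells
```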
Structural Equation Models (SEMs)
All variables are continuous/real-valued
Represent quantitative causation using systems of
linear equations
For example:
E = a1·FE + a2·M + a3·W + ε_E
FE = b1·E + b2·M + b3·W + ε_FE
etc.
Connecting the pieces
DAG-based graphical model:
Qualitative: the DAG
Quantitative: P(X) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X1, X2)
Connection between them: ???
Connecting the pieces
Causal Markov assumption:
Variables are independent of their non-effects conditional on their direct causes
Use the qualitative graph to constrain the quantitative relationships
Encodes the intuition of “screening off”
Given the values of the direct causes, learning the value of a non-effect doesn’t help me predict the variable
Connecting the pieces
Markov assumption for Bayes nets ⇒
Markov factorization of P(X1, X2, …):
P(X1, X2, …) = ∏i P(Xi | Parents(Xi))
Example:
[Figure: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight]
⇒
P(E, FE, W, M) = P(E) * P(FE | E) * P(M | E) * P(W | M, FE)
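The factorization can be checked numerically. In this sketch every conditional probability is a hypothetical number. The code builds the joint as P(E) * P(FE | E) * P(M | E) * P(W | M, FE), confirms it sums to 1, and confirms screening off: once M and FE are fixed, also learning E does not change the probability of W. It also counts parameters: 9 for the factorized form versus 15 for an unrestricted joint over four binary variables.

```python
from itertools import product

VALUES = ("Lo", "Hi")

# Hypothetical conditional probability tables; each number is P(child = "Hi" | parents).
P_E_HI = 0.4
P_FE_HI = {"Lo": 0.6, "Hi": 0.3}                  # keyed by E
P_M_HI = {"Lo": 0.3, "Hi": 0.7}                   # keyed by E
P_W_HI = {("Lo", "Lo"): 0.5, ("Lo", "Hi"): 0.8,   # keyed by (M, FE)
          ("Hi", "Lo"): 0.1, ("Hi", "Hi"): 0.4}

N_FACTORIZED = 1 + len(P_FE_HI) + len(P_M_HI) + len(P_W_HI)  # 9 free parameters
N_FULL = 2 ** 4 - 1                                           # 15 free parameters


def bern(p_hi, value):
    """Probability of a binary value given P(value = "Hi")."""
    return p_hi if value == "Hi" else 1.0 - p_hi


def joint(e, fe, m, w):
    """Markov factorization: P(E) * P(FE | E) * P(M | E) * P(W | M, FE)."""
    return (bern(P_E_HI, e) * bern(P_FE_HI[e], fe)
            * bern(P_M_HI[e], m) * bern(P_W_HI[(m, fe)], w))


def p_w_hi_given(e, fe, m):
    """P(W = Hi | E = e, FE = fe, M = m), computed from the joint."""
    num = joint(e, fe, m, "Hi")
    return num / (num + joint(e, fe, m, "Lo"))


total = sum(joint(*a) for a in product(VALUES, repeat=4))
print(total)  # 1, up to floating-point rounding
for fe, m in product(VALUES, repeat=2):
    # Screening off: the answer does not depend on E once the parents M, FE are fixed.
    print(fe, m, p_w_hi_given("Lo", fe, m), p_w_hi_given("Hi", fe, m))
```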
Connecting the pieces
Markov assumption for SEMs:
Markov factorization of joint probability density:
f(X1, X2, …) = ∏i f(Xi | Parents(Xi))
Example:
[Figure: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight]
⇒
E = ε_E
FE = a1·E + ε_FE
M = b1·E + ε_M
W = c1·FE + c2·M + ε_W
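A simulation sketch of this linear SEM, with hypothetical coefficient values and standard-normal noise terms. Food Eaten and Metabolism share the common cause Exercise, so they are correlated. The Markov assumption says they are independent given Exercise, which in a linear-Gaussian model shows up as a partial correlation near zero.

```python
import math
import random


def simulate(n, a1=0.8, b1=0.6, c1=0.5, c2=-0.4, seed=0):
    """Draw n samples from the SEM; the coefficient defaults are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)                      # E  = eps_E
        fe = a1 * e + rng.gauss(0.0, 1.0)            # FE = a1*E + eps_FE
        m = b1 * e + rng.gauss(0.0, 1.0)             # M  = b1*E + eps_M
        w = c1 * fe + c2 * m + rng.gauss(0.0, 1.0)   # W  = c1*FE + c2*M + eps_W
        rows.append((e, fe, m, w))
    return rows


def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


E, FE, M, W = (list(col) for col in zip(*simulate(20000)))
print(corr(FE, M))             # clearly positive: common cause E
print(partial_corr(FE, M, E))  # near zero: E screens off FE from M
```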
Connecting the pieces
Causal Faithfulness assumption
The only independencies are those predicted by the Markov assumption
Uses the quantitative relations to constrain the qualitative graph
Implication: No exactly counter-balancing causal paths
Exercise → Food Eaten → Weight and Exercise → Metabolism → Weight do not exactly offset one another
Implication: No perfectly deterministic relationships
In particular, no variable is a mathematical function of others
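A sketch of the kind of case Faithfulness rules out, using the linear SEM form from the previous slides with hypothetical coefficients. With independent noise and Var(E) = 1, Cov(E, W) is the sum over the two directed paths of the product of their coefficients, a1·c1 + b1·c2. Choosing coefficients that make this sum exactly zero leaves Exercise and Weight uncorrelated (and, with Gaussian noise, independent) even though Exercise causes Weight.

```python
import math
import random


def implied_cov_e_w(a1, b1, c1, c2, var_e=1.0):
    """Cov(E, W) implied by E -> FE -> W and E -> M -> W with independent noise terms."""
    return var_e * (a1 * c1 + b1 * c2)


def sample_corr_e_w(a1, b1, c1, c2, n=20000, seed=1):
    """Simulate the SEM and return the sample correlation between E and W."""
    rng = random.Random(seed)
    es, ws = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        fe = a1 * e + rng.gauss(0.0, 1.0)
        m = b1 * e + rng.gauss(0.0, 1.0)
        w = c1 * fe + c2 * m + rng.gauss(0.0, 1.0)
        es.append(e)
        ws.append(w)
    me, mw = sum(es) / n, sum(ws) / n
    cov = sum((x - me) * (y - mw) for x, y in zip(es, ws))
    return cov / math.sqrt(sum((x - me) ** 2 for x in es) * sum((y - mw) ** 2 for y in ws))


# Hypothetical, exactly offsetting coefficients: 0.8*0.5 + 0.5*(-0.8) = 0.
print(implied_cov_e_w(0.8, 0.5, 0.5, -0.8))  # 0.0
print(sample_corr_e_w(0.8, 0.5, 0.5, -0.8))  # near zero, despite E causing W
print(sample_corr_e_w(0.8, 0.5, 0.5, -0.4))  # clearly nonzero once the paths do not cancel
```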
Causal vs. statistical models
Bayes nets and SEMs are not inherently causal models
Markov and Faithfulness assumptions can be expressed purely as constraints linking the graph and the quantitative model
Assuming a non-causal version of the assumptions ⇒ purely statistical model
I.e., a compact representation of statistical independencies among some set of variables
Causation and intervention
Causal claims support counterfactuals
In particular, those about interventions
“If I had flipped the switch, the light would have turned on”
“If she hadn’t dropped the plate, then it would not have broken”
Etc.
Causation and intervention
One of the central causal asymmetries
Interventions on a cause lead to changes in the effect
In contrast, interventions on an effect do not lead to changes in the cause
Flipping the switch turns off the light
Breaking the light bulb doesn’t flip the switch
Some have argued that this is the paradigmatic feature of causation (Woodward, Hausman)
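The asymmetry can be mimicked with structural equations, where an intervention replaces the equation for the intervened variable and leaves every other equation alone. This is a toy sketch: the two-variable model, the background variable u_switch, and the function name are hypothetical.

```python
# Toy structural model for the switch/light example (hypothetical).
# Equations are listed in causal order; Python dicts preserve insertion order.
EQUATIONS = {
    "switch": lambda v: v["u_switch"],  # switch position is set by background factors
    "light": lambda v: v["switch"],     # light is on exactly when the switch is on
}


def evaluate(background, do=None):
    """Compute every variable; an intervened-on variable ignores its own equation."""
    do = do or {}
    values = dict(background)
    for var, equation in EQUATIONS.items():
        values[var] = do[var] if var in do else equation(values)
    return values


print(evaluate({"u_switch": False}))                       # light stays off
print(evaluate({"u_switch": False}, do={"switch": True}))  # intervening on the cause changes the light
print(evaluate({"u_switch": True}, do={"light": False}))   # intervening on the effect leaves the switch on
```

Because the switch's equation never mentions the light, overriding the light's value has no route back to the switch; overriding the switch, by contrast, propagates forward through the light's equation.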
Looking ahead…
Have: Basic formal representation for causation
Need:
Fundamental causal asymmetry (of intervention)
Inference & reasoning methods
Search & causal discovery methods