“Cogito ergo sum” …or do I?: Causality vs Statistical
“Cogito ergo sum” …or do I?:
When can Causality be inferred from DPGM
Felipe Orihuela-Espina
Instituto Nacional de Astrofísica, Óptica y
Electrónica (INAOE)
DyNaMo Research Meeting, 3-4th June 2011
Cogito ergo sum
• A familiar (for the audience) graphical representation
[Figure: graph Cogito (cause) → Sum (effect), each node with states Present / Not present, and a probability table with entries 1.0, 0.5, 0, 0.5]
Felipe Orihuela-Espina (INAOE)
Why is causality so problematic?
• Cannot be computed from the
data alone
• Systematic temporal
precedence is not sufficient
• Co-occurrence is not sufficient
• It is not always a direct relation
(indirect relations,
transitivity/mediation, etc may
be present), let alone linear…
• It may occur across frequency
bands
• YOU NAME IT HERE…
A very silly example
Which process causes which?
Causality is so difficult that “it would be
very healthy if more researchers
abandoned thinking of and using terms
such as cause and effect” [Muthen1987 in
PearlJ2011]
A real example
An ECG
[KaturaT2006] only claim that
there are interrelations
(quantified using MI)
[OrihuelaEspinaF2010]
THE CONTRIBUTION OF
PHILOSOPHY
Causality in Philosophy
• Aristotle’s four “causes” of a
thing
– The material cause (that out of
which the thing is made),
– the formal cause (that into
which the thing is made),
– the efficient cause (that which
makes the thing), and
– the final cause (that for which
the thing is made).
In [HollandPW1986]
Aristotle (384BC-322BC)
Causality in Philosophy
• Hume’s legacy
– Sharp distinction between analytical
(thoughts) and empirical (facts) claims
– Causal claims are empirical
– All empirical claims originate from experience
(sensory input)
• Hume’s three basic criteria for causation
– (a) spatial/temporal contiguity,
– (b) temporal succession, and
– (c) constant conjunction
• It is not empirically verifiable that the cause
produces the effect, but only that the cause
is invariably followed by the effect.
[HollandPW1986, PearlJ1999_IJCAITalk]
David Hume (1711-1776)
Causality in Philosophy
• Mill’s general methods of
experimental enquiry
– Method of concomitant variation (i.e.
correlation…)
– Method of difference (i.e. causation)
– Method of residues (i.e. induction)
– Method of agreement (i.e. null effect –
can only rule out possible causes)
• Mill “only” codified these methods;
they had been put forth by Sir
Francis Bacon 250 years earlier (The
Advancement of Learning and
Novum Organum Scientiarum)
In [HollandPW1986]
John Stuart Mill (1806-1873)
Sir Francis Bacon (1561-1626)
Causality in Philosophy
• Suppes’ probabilistic theory of
causality
– “… one event is the cause of another
if the appearance of the first is
followed with a high probability by
the appearance of the second, and
there is no third event that we can
use to factor out the probability
relationship between the first and
second events”
– C is a genuine cause of E if:
• P(E|C) > P(E) (C is a prima facie cause), and
• there is no third event D with P(E|C,D) = P(E|D) and
P(E|C,D) ≥ P(E|C) (otherwise C is a spurious cause)
[SuppesP1970, HollandPW1986]
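The two conditions above can be checked mechanically once a joint distribution over the three events is available. The following is only a sketch under stated assumptions: the binary events, the probability values and all function names (`make_joint`, `is_prima_facie`, `is_spurious`) are hypothetical illustrations and do not come from the talk or from Suppes' text.

```python
from itertools import product

def make_joint(p_d, p_c_given_d, p_e_given_cd):
    """Joint P(E, C, D) over binary events, built as P(D) P(C|D) P(E|C,D)."""
    joint = {}
    for e, c, d in product((0, 1), repeat=3):
        pd = p_d if d else 1 - p_d
        pc = p_c_given_d[d] if c else 1 - p_c_given_d[d]
        pe = p_e_given_cd[(c, d)] if e else 1 - p_e_given_cd[(c, d)]
        joint[(e, c, d)] = pd * pc * pe
    return joint

POSITION = {"e": 0, "c": 1, "d": 2}

def prob(joint, **fixed):
    """Probability that the named events take the given values."""
    return sum(p for key, p in joint.items()
               if all(key[POSITION[name]] == value for name, value in fixed.items()))

def cond(joint, target, given):
    """Conditional probability P(target | given)."""
    return prob(joint, **target, **given) / prob(joint, **given)

def is_prima_facie(joint):
    """P(E|C) > P(E)."""
    return cond(joint, {"e": 1}, {"c": 1}) > prob(joint, e=1)

def is_spurious(joint, tol=1e-9):
    """P(E|C,D) = P(E|D) and P(E|C,D) >= P(E|C) for the third event D."""
    p_e_cd = cond(joint, {"e": 1}, {"c": 1, "d": 1})
    p_e_d = cond(joint, {"e": 1}, {"d": 1})
    p_e_c = cond(joint, {"e": 1}, {"c": 1})
    return abs(p_e_cd - p_e_d) < tol and p_e_cd >= p_e_c - tol

# Hypothetical case 1: D drives both C and E; C has no influence on E.
common_cause = make_joint(0.5, {0: 0.1, 1: 0.9},
                          {(0, 0): 0.2, (1, 0): 0.2, (0, 1): 0.8, (1, 1): 0.8})
# Hypothetical case 2: E depends on C only.
direct_cause = make_joint(0.5, {0: 0.1, 1: 0.9},
                          {(0, 0): 0.1, (1, 0): 0.9, (0, 1): 0.1, (1, 1): 0.9})

for name, joint in (("common cause", common_cause), ("direct cause", direct_cause)):
    print(name, "prima facie:", is_prima_facie(joint), "spurious:", is_spurious(joint))
```

In both hypothetical cases C raises the probability of E, but only in the common-cause case does conditioning on D factor the relationship out.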
Patrick Colonel Suppes (1922-)
Lucie Stern Professor Emeritus of
Philosophy at Stanford
CAUSALITY: DIFFERENT VIEWS,
SAME CONCEPT
Causality requires time!
• “…there is little use in the practice of
attempting to discuss causality without
introducing time” [Granger,1969]
– …whether philosophical, statistical, econometrical,
topological, etc…
Causality requires directionality!
• Algebraic equations, e.g. regression “do not
properly express causal relationships […] because
algebraic equations are symmetrical objects […]
To express the directionality of the underlying
process, Wright augmented the equation with a
diagram, later called path diagram in which
arrows are drawn from causes to effects”
[PearlJ2009]
– Feedback and instantaneous causality in any case are
a double causation.
From association to causation
• Barriers between classical statistics and
causal analysis [PearlJ2009]
1. Coping with untested assumptions and changing
conditions
2. Inappropriate mathematical notation
Causality
[Slide annotation: the levels are ordered along a “Stronger” / “Weaker” scale]
• Zero-level causality: a statistical association, i.e. non-independence, which cannot be removed by
conditioning on allowable alternative features.
– e.g. Granger’s, Topological
• First-level causality: Use of one treatment over another
causes a change in outcome
– e.g. Rubin’s, Pearl’s
• Second-level causality: Explanation via a generating
process, provisional and hardly lending itself to formal
characterization, either merely hypothesized or solidly
based on evidence
– e.g. Suppes’, Wright’s path analysis
– e.g. Smoking causes lung cancer
Inspired by [CoxDR2004]
It is debatable whether second-level causality is indeed causality
Variable types and their joint
probability distribution
• Variable types:
– Background variables (B) – specify what is fixed
– Potential causal variables (C)
– Intermediate variables (I) – surrogates, monitoring,
pathways, etc
– Response variables (R) – observed effects
• Joint probability distribution of the variables:
P(RICB) = P(R|ICB) P(I|CB) P(C|B) P(B)
…but it is possible to integrate over I (marginalized)
P(RCB) = P(R|CB) P(C|B) P(B)
In [CoxDR2004]
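As a small numerical check of the factorization and of the marginalization over I, the sketch below builds the joint distribution for four binary variables and verifies that summing out I leaves P(RCB) = P(R|CB) P(C|B) P(B). All probability values and names are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import isclose

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

# Hypothetical conditional probability tables (each entry is P(variable = 1 | parents)).
p_b = 0.3
p_c_given_b = {0: 0.2, 1: 0.6}
p_i_given_cb = {(c, b): 0.1 + 0.5 * c + 0.2 * b for c, b in product((0, 1), repeat=2)}
p_r_given_icb = {(i, c, b): 0.05 + 0.6 * i + 0.1 * c + 0.2 * b
                 for i, c, b in product((0, 1), repeat=3)}

# P(RICB) = P(R|ICB) P(I|CB) P(C|B) P(B)
joint = {}
for r, i, c, b in product((0, 1), repeat=4):
    joint[(r, i, c, b)] = (bern(p_r_given_icb[(i, c, b)], r)
                           * bern(p_i_given_cb[(c, b)], i)
                           * bern(p_c_given_b[b], c)
                           * bern(p_b, b))

# Marginalize over the intermediate variable I.
p_rcb = {}
for (r, i, c, b), p in joint.items():
    p_rcb[(r, c, b)] = p_rcb.get((r, c, b), 0.0) + p

def p_r_given_cb(r, c, b):
    """P(R|CB) obtained by summing the intermediate variable out of P(R|ICB) P(I|CB)."""
    return sum(bern(p_r_given_icb[(i, c, b)], r) * bern(p_i_given_cb[(c, b)], i)
               for i in (0, 1))

# P(RCB) = P(R|CB) P(C|B) P(B)
factorization_holds = all(
    isclose(p_rcb[(r, c, b)], p_r_given_cb(r, c, b) * bern(p_c_given_b[b], c) * bern(p_b, b))
    for r, c, b in product((0, 1), repeat=3)
)
print("joint sums to one:", isclose(sum(joint.values()), 1.0))
print("reduced factorization holds:", factorization_holds)
```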
Granger’s Causality
• Granger’s causality:
– Y is causing X (Y→X) if we are better able
to predict X using all available
information (Z) than if the information
apart from Y had been used.
• The groundbreaking paper:
– Granger (1969) “Investigating causal
relations by econometric models and
cross-spectral methods” Econometrica
37(3): 424-438
• Granger’s causality is only a
statement about one thing
happening before another!
– Rejects instantaneous causality:
considered as slowness in recording
of information
Sir Clive William John Granger
(1934-2009) – University of
Nottingham – Nobel Prize
Winner
Granger’s Causality
• “The future cannot cause the past” [Granger
1969]
– “the direction of the flow of time [is] a central
feature”
– Feedback is a double causation; X→Y and Y→X,
denoted X↔Y
• “causality…is based entirely on the predictability
of some series…” [Granger 1969]
– Causal relationships may be investigated in terms of
coherence and phase diagrams
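The predictability reading of Granger's definition can be illustrated with a toy simulation. This is a minimal sketch, not Granger's own procedure: it assumes two zero-mean series, a single lag, and plain least squares, and the coefficients, seed and function names are hypothetical. It compares the one-step prediction error of a series from its own past with the error obtained when the past of the other series is added.

```python
import random

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_variance(regressors, target):
    """Least-squares fit of target on the regressors (no intercept); mean squared residual."""
    k = len(regressors)
    gram = [[sum(u * v for u, v in zip(regressors[i], regressors[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(u * t for u, t in zip(regressors[i], target)) for i in range(k)]
    w = solve(gram, rhs)
    resid = [t - sum(w[i] * regressors[i][n] for i in range(k))
             for n, t in enumerate(target)]
    return sum(e * e for e in resid) / len(resid)

def granger_gain(effect, cause):
    """Relative drop in prediction error of `effect` when the lagged `cause` is added."""
    target, own_past, other_past = effect[1:], effect[:-1], cause[:-1]
    restricted = residual_variance([own_past], target)
    full = residual_variance([own_past, other_past], target)
    return 1.0 - full / restricted

# Hypothetical system in which the past of y feeds into x, but not the other way round.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(1999):
    y_next = 0.5 * y[-1] + rng.gauss(0, 1)
    x_next = 0.5 * x[-1] + 0.8 * y[-1] + rng.gauss(0, 1)
    y.append(y_next)
    x.append(x_next)

gain_y_to_x = granger_gain(x, y)
gain_x_to_y = granger_gain(y, x)
print("gain from adding past y when predicting x:", round(gain_y_to_x, 3))
print("gain from adding past x when predicting y:", round(gain_x_to_y, 3))
```

A clear gain in one direction and a negligible one in the other is what "Y→X" means here; it remains a statement about temporal precedence and predictability only.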
Topological causality
• “A causal manifold is one with an assignment
to each of its points of a convex cone in the
tangent space, representing physically the
future directions at the point. The usual
causality in MO extends to a causal structure
in M’.” [SegalIE1981]
• Causality is seen as embedded in the
geometry/topology of manifolds
– Causality is a curve function defined over the
manifold
• The groundbreaking book:
– Segal IE “Mathematical Cosmology and
Extragalactic Astronomy” (1976)
• I am not sure whether Segal is the father of
causal manifolds, but his contribution to the
field is simply overwhelming…
Irving Ezra Segal (1918-1998) Professor of Mathematics at MIT
Causal (homogeneous Lorentzian) Manifolds:
The topological view of causality
• The cone of causality [SegalIE1981, RainerM1999,
MosleySN1990, KrymVR2002]
[Figure: cone of causality with regions labelled Future, Instant present and Past]
Rubin Causal Model
• Rubin Causal Model:
– “Intuitively, the causal effect of one
treatment relative to another for a
particular experimental unit is the
difference between the result if the
unit had been exposed to the first
treatment and the result if, instead,
the unit had been exposed to the
second treatment”
• The groundbreaking paper:
– Rubin (1978) “Bayesian inference for causal
effects: The role of randomization”
The Annals of Statistics 6(1): 34-58
• The term Rubin causal model was
coined by Paul Holland [HollandPW1986]
Donald B Rubin (1943 – ) –
John L. Loeb Professor of Stats
at Harvard
Rubin Causal Model
• Causality is an algebraic difference:
treatment causes the effect Ytreatment(u) − Ycontrol(u)
…or, in other words, the effect of a cause is always relative
to another cause [HollandPW1986]
• Rubin causal model establishes the conditions under
which associational (e.g. Bayesian) inference may infer
causality (makes assumptions for causality explicit).
Fundamental Problem of Causal Inference
• Only Ytreatment(u) or Ycontrol(u) can be observed on the
same unit u, but not both.
– Causal inference is impossible without making untested
assumptions
– …yet causal inference is still possible under uncertainty
[HollandPW1986] (two otherwise identical populations u
must be prepared and all appropriate background variables
must be considered in B).
• Again! (see the slide “Statistical dependence vs
Causality”): causal questions cannot be computed
from the data alone, nor from the distributions that
govern the data [PearlJ2009]
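A toy simulation may help to see both the problem and the way around it. The sketch below is an illustration under explicit assumptions, not Rubin's or Holland's procedure: every unit is given both potential outcomes (which real data never provide), a constant unit-level effect of 1.5 is assumed, and a single background variable b influences the outcome. All names and numbers are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)
TRUE_EFFECT = 1.5  # assumed constant unit-level effect Ytreatment(u) - Ycontrol(u)

# Each unit carries BOTH potential outcomes; in practice only one is ever observed.
units = []
for _ in range(10000):
    b = rng.gauss(0, 1)                       # background variable
    y_control = 2.0 + b + rng.gauss(0, 0.5)   # outcome if not treated
    y_treatment = y_control + TRUE_EFFECT     # outcome if treated
    units.append((b, y_control, y_treatment))

def observed_difference(assignment):
    """Difference of observed group means: treated units reveal Ytreatment, the rest Ycontrol."""
    treated = [yt for (_, _, yt), a in zip(units, assignment) if a]
    control = [yc for (_, yc, _), a in zip(units, assignment) if not a]
    return mean(treated) - mean(control)

# Randomized assignment: treatment is independent of the background variable.
randomized = [rng.random() < 0.5 for _ in units]
# Self-selection: units with a high background value end up treated.
self_selected = [b > 0 for b, _, _ in units]

estimate_randomized = observed_difference(randomized)
estimate_self_selected = observed_difference(self_selected)
print("randomized estimate:   ", round(estimate_randomized, 2))
print("self-selected estimate:", round(estimate_self_selected, 2))
```

Under randomization the two groups are comparable and the difference of means approaches the assumed effect; under self-selection the same computation mixes the effect with the influence of b.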
Relation between Granger, Rubin and
Suppes causalities
                                  Granger    Rubin’s model
Cause (Treatment)                 Y          t
Effect                            X          Ytreatment(u)
All other available information   Z          Z (pre-exposure variables)
• Granger’s noncausality:
X is not a Granger cause of Y (relative to information in Z) if
X and Y are conditionally independent given Z (i.e.
P(Y|X,Z)=P(Y|Z))
• Granger’s noncausality is equivalent to Suppes’ spurious case
Modified from [HollandPW1986]
Pearl’s statistical causality
(a.k.a. structural theory)
• “Causation is encoding behaviour under
intervention […] Causality tells us which
mechanisms [stable functional
relationships] is to be modified [i.e. broken]
by a given action” [PearlJ1999_IJCAI]
• Causality, intervention and mechanisms can
be encapsulated in a causal model
• The groundbreaking book:
– Pearl J “Causality: Models, Reasoning and
Inference” (2000)*
Judea Pearl (1936-) Professor of computer science and
statistics at UCLA
• Pearl’s results do establish conditions under
which first level causal conclusions are
possible [CoxDR2004]
* With permission of his 1995 Biometrika paper masterpiece
Sewall Green Wright (1889-1988) – Father of path analysis (graphical rules)
[PearlJ2000, Lauritzen2000, DawidAP2002]
Statistical causality
• Conditioning vs Intervening [PearlJ2000]
– Conditioning: P(R|C) = Σ_B P(R|C,B) P(B|C); useful but
inappropriate for causality, as changes in the past (B) occur
before intervention (C)
– Intervention: P(R║C) = Σ_B P(R|C,B) P(B); Pearl’s definition of
causality
• Underlying assumption: The distribution of R (and I)
remains unaffected by the intervention.
– Watch out! This is not trivial: serious interventions may
distort all relations [CoxDR2004]
• β_CB = 0 (structural coefficient) ⇒ C ╨ B (conditional independence) ⇒ P(R|C) = P(R║C),
i.e. there is no difference between conditioning and intervention
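The difference between the two quantities can be made concrete with a small discrete example. The sketch below reads both expressions as sums over the background variable B, i.e. P(R|C) = Σ_B P(R|C,B) P(B|C) and P(R║C) = Σ_B P(R|C,B) P(B); that reading, the binary variables, the probability tables and the function names are all assumptions made for illustration.

```python
from math import isclose

def bern(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def conditioning(p_b, p_c_given_b, p_r_given_cb, c=1):
    """P(R=1 | C=c): weights each background level by P(B | C=c)."""
    p_c = sum(bern(p_c_given_b[b], c) * p_b[b] for b in p_b)
    return sum(p_r_given_cb[(c, b)] * bern(p_c_given_b[b], c) * p_b[b] / p_c
               for b in p_b)

def intervention(p_b, p_r_given_cb, c=1):
    """P(R=1 || C=c): weights each background level by its marginal P(B)."""
    return sum(p_r_given_cb[(c, b)] * p_b[b] for b in p_b)

# Hypothetical tables: P(B=b), P(C=1 | B=b), P(R=1 | C=c, B=b).
p_b = {0: 0.5, 1: 0.5}
p_r_given_cb = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.3, (1, 1): 0.9}
dependent_c = {0: 0.1, 1: 0.9}    # C depends strongly on B
independent_c = {0: 0.5, 1: 0.5}  # C independent of B

cond_dependent = conditioning(p_b, dependent_c, p_r_given_cb)
cond_independent = conditioning(p_b, independent_c, p_r_given_cb)
do_c = intervention(p_b, p_r_given_cb)

print("P(R|C),  C depends on B:    ", round(cond_dependent, 3))
print("P(R|C),  C independent of B:", round(cond_independent, 3))
print("P(R||C):                    ", round(do_c, 3))
```

When C and B are independent the two quantities coincide; when they are not, conditioning on C also shifts the distribution of B and the two numbers differ.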
LOOKING FOR CAUSALITY: DYNAMIC
PROBABILISTIC CAUSAL MODELS AND
SOME OTHER ANALYTICAL TOOLS
Some tools for looking at causality…
beyond the interest of this research meeting
• Structural Causal Models* and Path Analysis
[WrightS1921,1932, PearlJ2009]
– Structural Equation Modelling [WrightS1921,
PearlJ2011]
• Directed Transfer Function [Kaminski 1991,
2001 and 2005]
• Dynamic Causal Modelling [FristonKJ2003]
• Partial Directed Coherence [BaccaláLA2001]
* Well… this one is of interest… as it is the father of probabilistic dynamic models
Bayesian Networks
• Bayesian networks are structures (often in the form of a
graph) describing probabilistic relationships between
variables [PearlJ2000, KaminskiM2005]
– Conditional independencies are represented by missing
edges
– Arrows suggest causal directionality but merely indicate the
possibility of a causal relation (i.e. they are only a
notational clue); implication of causality must be discarded
as inadequate [PearlJ2009]
• Conditional distributions, e.g. P(X|Y), determine
associational distributions [HollandPW1986]
Causal Bayesian Networks
• The problem of Identification:
– Can the controlled (post-intervention) distribution P(R║C)
be estimated from data governed by the pre-intervention
distribution P(RCB)?
– The answer is a “yes, but…”
• i.e. as long as we account for general control of confounding and
counterfactuals, admissibility, Markovian graphs (i.e. acyclic
graph), ignorability, and a few other criteria beyond my humble
human limitation… seasoned with a good dose of inscrutable
maths.
• Some “recommended” reading if you are up to the challenge:
[PearlJ2000, 2009, Lauritzen2000, DawidAP2002]
Dynamic Graphical Models
• Tian’s theorem:
– “A sufficient condition for identifying a causal effect P(R║C) is
that every path between C and any of its children traces at least
one arrow emanating from a measured variable I”
– Translation to plain English: You ought to account for
confounders (which are also part of your graph) and causal
relations must cross through those confounders (i.e. they have
been taken into account)
• Note that Tian’s theorem is sufficient but not necessary, i.e. direct
links C→R may still encode direct causality
– More translation to plain English: P(R║C) cannot encode
questions of attribution (e.g. how many deaths are due to a
specific exposure?) or of susceptibility (e.g. how many would
have become diseased if exposed?)
• Note the important implication that a thoroughly/carefully designed
randomized controlled trial may not suffice!
Dynamic Graphical Models:
A common error when using them…
• Correct methodology of the structural approach to causation [PearlJ2009]:
1. Define the target quantity
2. Assume: Formulate causal assumptions
3. Identify: Determine if the target is identifiable
4. Estimate: i.e. approximate
• Common application of the methodology of the structural approach to causation:
1. Estimate: i.e. approximate
2. Assume: Formulate causal assumptions
3. Sometimes Define the target quantity
Conclusions
Cogito → Sum ?
• Well… only if you can prove that no other factor
intervenes…
Questions?
THANKS!
BACK UP SLIDES
Structural Causal Models and Path
Analysis
• [WrightS1921, 1932, GoldbergerA1972, 1973,
DuncanO1975, PearlJ2009]
Structural Equation Modelling
• “a huge logical gap exists between
‘establishing causation,’ which requires
careful manipulative experiments, and
‘interpreting parameters as causal effects’”
[PearlJ2011]
Directed Transfer Function
• Uses coherence and phase
• Can be interpreted in terms of Granger’s causality [KaminskiM2001]
[Figure from [KaminskiM2001]: panels labelled Coherence and Phase]
[KaminskiM 1991, 2001 and 2005]
Dynamic Causal Modelling
• A bilinear model by
which the neural model
(not observed) is
inferred from the
haemodynamic model
(observed)
[FristonKJ2003]
• Embodies requisite
constraints using a
Bayesian framework
Fig. 1. This is a schematic illustrating the concepts underlying dynamic causal
modelling. In particular it highlights the two distinct ways in which inputs or
perturbations can elicit responses in the regions or nodes that compose the
model. In this example there are five nodes, including visual areas V1 and V4 in
the fusiform gyrus, areas 39 and 37, and the superior temporal gyrus STG.
Stimulus-bound perturbations designated u1 act as extrinsic inputs to the
primary visual area V1. Stimulus-free or contextual inputs u2 mediate their
effects by modulating the coupling between V4 and BA39 and between BA37
and V4. For example, the responses in the angular gyrus (BA39) are caused by
inputs to V1 that are transformed by V4, where the influences exerted by V4
are sensitive to the second input. The dark square boxes represent the
components of the DCM that transform the state variables zi in each region
(neuronal activity) into a measured (hemodynamic) response yi
[FristonKJ2003]
Partial Directed Coherence
• Based on Granger’s causality
[BaccaláLA2001]
WHAT IS NOT CAUSALITY – AND
OTHER COMMON MISCONCEPTIONS
Statistical dependence
• Statistical dependence is a type of relation between any two
variables [WermuthN1998]: if we find one, we can expect to find
the other
[Figure: scale running from statistical independence, through association (symmetric or asymmetric), to deterministic dependence]
• The limits of statistical dependence
– Statistical independence: The distribution of one variable is the same
no matter which level the other variable takes
X and Y are independent ⇔ P(X∩Y)=P(X)P(Y)
– Deterministic dependence: Levels of one variable occur in an exactly
determined way with changing levels of the other.
– Association: Intermediate forms of statistical dependency
• Symmetric
• Asymmetric (a.k.a. response) or directed association
Associational Inference ≡ Descriptive Statistics!!!
• The most detailed information linking two
variables is given by the joint distribution:
P(X=x,Y=y)
• The conditional distribution describes how the
distribution of X changes as Y varies:
P(X=x|Y=y)=P(X=x,Y=y)/P(Y=y)
• Associational statistics is simply descriptive
(estimates, regressions, posterior distributions,
etc…) [HollandPW1986]
– Example: Regression of X on Y is the conditional
expectation E(X|Y=y)
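The three descriptive quantities on this slide (joint distribution, conditional distribution and regression as a conditional expectation) can be computed directly from a table. The joint table below is hypothetical and only serves as a worked example; exact fractions are used so the arithmetic can be followed by hand.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X=x, Y=y), keyed by (x, y).
joint = {
    (0, 0): F(3, 10), (1, 0): F(1, 10),
    (0, 1): F(1, 10), (1, 1): F(2, 10), (2, 1): F(3, 10),
}

def p_y(y):
    """Marginal P(Y=y)."""
    return sum(p for (_, y_value), p in joint.items() if y_value == y)

def conditional(y):
    """Conditional distribution P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)."""
    return {x: p / p_y(y) for (x, y_value), p in joint.items() if y_value == y}

def regression(y):
    """Regression of X on Y: the conditional expectation E(X | Y=y)."""
    return sum(x * p for x, p in conditional(y).items())

for y in (0, 1):
    print("y =", y, "P(X|Y=y) =", conditional(y), "E(X|Y=y) =", regression(y))
```

Nothing in these computations says anything about what would happen to X if Y were set by intervention; they only summarize the table.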
Regression and Correlation;
two common forms of associational inference
• Regression Analysis: “the study of the dependence of one or more response
variables on explanatory variables” [CoxDR2004]
– Strong regression ≠ causality [Box1966]
– Prediction systems ≠ Causal systems [CoxDR2004]
• Correlation is a relation over mean values; two variables correlate as they move
over/under their mean together (correlation is a “normalization” of the
covariance)
• Correlation ≠ Statistical dependence
– If X and Y are statistically independent then r=0 (i.e. absence of correlation), but the opposite is not true
[MarrelecG2005].
• Correlation ≠ Causation [YuleU1900 in CoxDR2004, WrightS1921]
– Yet, causal conclusions from a carefully designed (often a synonym of randomized) experiment are often (not
always) valid [HollandPW1986, FisherRA1926 in CoxDR2004]
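A standard textbook counterexample shows why zero correlation does not guarantee statistical independence: take X uniform on {-1, 0, 1} and Y = X². The example is not from the talk; it is included only as a sketch, with exact fractions so that the zero covariance is not a rounding artefact.

```python
from fractions import Fraction as F

# X is uniform on {-1, 0, 1} and Y is fully determined by X.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}  # P(X=x, Y=y)

def expectation(f):
    """Expected value of f(x, y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

def marginal_x(x):
    return sum(p for (x_value, _), p in joint.items() if x_value == x)

def marginal_y(y):
    return sum(p for (_, y_value), p in joint.items() if y_value == y)

# Covariance E[XY] - E[X]E[Y]; the correlation r is zero exactly when this is zero.
covariance = (expectation(lambda x, y: x * y)
              - expectation(lambda x, y: x) * expectation(lambda x, y: y))

# Independence would require P(X=x, Y=y) = P(X=x) P(Y=y) for every pair.
p_joint_00 = joint.get((0, 0), F(0))
p_product_00 = marginal_x(0) * marginal_y(0)

print("covariance:", covariance)
print("P(X=0, Y=0):", p_joint_00, "vs P(X=0) P(Y=0):", p_product_00)
```

Here Y depends deterministically on X, yet the covariance (and hence r) is exactly zero.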
Coherence:
yet another common form of associational inference
• Often understood as “correlation in the frequency domain”
Cxy = |Gxy|²/(Gxx Gyy)
– where Gxy is the cross-spectral density and Gxx, Gyy are the auto-spectral densities of the two signals,
– i.e. coherence is the squared magnitude of the cross-spectrum
normalized by the power of each signal at that frequency.
• Coherence measures the degree to which two series are
related
– Coherence alone does not imply causality! The temporal lag
of the phase difference between the signals must also be
considered.
Statistical dependence vs Causality
• Statistical dependence provides associational
relations and can be expressed in terms of a joint
distribution alone
– Causal relations CANNOT be expressed in terms of
statistical association alone [PearlJ2009]
• Associational inference ≠ Causal Inference
[HollandPW1986, PearlJ2009]
– …ergo, Statistical dependence ≠ Causal Inference
– In associational inference, time is merely operational
Causation defies (1st level) logic…
• Input:
– “If the floor is wet, then it rained”
– “If we break this bottle, the floor will get wet”
• Logic output:
– “If we break this bottle, then it rained”
Example taken from [PearlJ1999]