
Causal reasoning in Biomedical Informatics
Chitta Baral
Professor
Department of Computer Science and Engineering
[email protected]
Causal connection versus Correlation
• Rain, falling barometer: they usually have a one-to-one correspondence.
- Does a falling barometer cause rain?
- Does rain cause a falling barometer?
• Rain and mud:
- Does rain cause mud?
• Smoking and cancer:
- Does smoking cause cancer?
• What causes global warming?
Simpson’s Paradox:
Who should take the drug?
(Male, Female, Unknown Sex)

                      Recovery   ~Recovery   #people   Rec. rate
Male,   took drug        18          12         30        60%
Male,   no drug           7           3         10        70%
Female, took drug         2           8         10        20%
Female, no drug           9          21         30        30%
Total,  took drug        20          20         40        50%
Total,  no drug          16          24         40        40%
Simpson’s Paradox (cont.)
• Summary
- 60% of males who took the drug recovered
- 70% of males who did not take the drug recovered
- 20% of females who took the drug recovered
- 30% of females who did not take the drug recovered
- 50% of people who took the drug recovered
- 40% of people who did not take the drug recovered
• Paradox:
- If we know the patient is male or female, then we should not give the drug!
- If we do not know the sex, then we should give the drug!
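As a quick check, the rates above can be recomputed directly from the counts in the table (a minimal sketch in Python; variable names are illustrative):

```python
# Recovery counts from the table: (recovered, total) per group.
counts = {
    ("male", "drug"): (18, 30),
    ("male", "no_drug"): (7, 10),
    ("female", "drug"): (2, 10),
    ("female", "no_drug"): (9, 30),
}

# Stratified rates: within each sex, the drug looks worse.
for sex in ("male", "female"):
    for treatment in ("drug", "no_drug"):
        r, n = counts[(sex, treatment)]
        print(sex, treatment, f"{r / n:.0%}")

# Aggregated rates: overall, the drug looks better.
for treatment in ("drug", "no_drug"):
    r = sum(counts[(s, treatment)][0] for s in ("male", "female"))
    n = sum(counts[(s, treatment)][1] for s in ("male", "female"))
    print("total", treatment, f"{r / n:.0%}")
```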
Why the Simpson’s Paradox?
• From the given data we can calculate the following:
- P(recovery | took drug, male)
- P(recovery | took drug, female)
- P(recovery | took drug)
• We should instead be calculating the following:
- P(recovery | do(drug), male)
- P(recovery | do(drug), female)
- P(recovery | do(drug))
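If one is willing to assume that sex is the only confounder here (an assumption, not something the data alone tells us), P(recovery | do(drug)) can be computed with the back-door adjustment formula, P(recovery | do(drug)) = Σ_sex P(recovery | drug, sex) · P(sex). A sketch continuing the example:

```python
# Back-door adjustment, assuming sex is the only confounder.
p_sex = {"male": 40 / 80, "female": 40 / 80}   # 40 males, 40 females
p_rec = {                                      # P(recovery | treatment, sex)
    ("drug", "male"): 0.6, ("drug", "female"): 0.2,
    ("no_drug", "male"): 0.7, ("no_drug", "female"): 0.3,
}

for treatment in ("drug", "no_drug"):
    p_do = sum(p_rec[(treatment, s)] * p_sex[s] for s in p_sex)
    print(f"P(recovery | do({treatment})) = {p_do:.0%}")
# do(drug) = 40%, do(no_drug) = 50%: the drug is harmful for everyone,
# matching the per-sex conclusion and resolving the paradox.
```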
Causality: story and questions
• Story: In a group of people, 50% were given treatment for an ailment and 50% were not. In both the treated and the untreated group, 50% recovered and 50% did not.
- Joe, a patient, took the treatment and died.
- What is the probability that Joe’s death occurred due to the treatment?
- Or, what is the probability that Joe, who died under treatment, would have lived had he not been treated?
• Can we answer these questions from the above story?
- No.
- Why not?
The probability distribution

  X    Y    Prob.
  0    0    0.25
  0    1    0.25
  1    0    0.25
  1    1    0.25

• X = 1: treatment was given
• X = 0: treatment was not given
• Y = 1: patient died
• Y = 0: patient recovered
Causal Model 1 of the story
• The model:
- U1: a variable that decides treatment; U2: a variable that decides if someone will die
- X, Y: treatment given, patient died
- X ← U1 (X = U1)
- Y ← U2 (Y = U2)
- P(U1) = 0.5, P(U2) = 0.5: leads to the same probability table
• Observation: Joe took the treatment and died.
- X = 1 and Y = 1. Thus U1 = 1 and U2 = 1.
• Question: What is the probability that Joe would have lived if he had not taken the treatment?
- We do X = 0. Find P(Y = 0 | do(X = 0)).
- Y = U2 = 1 (regardless of the value of X and U1)
- Hence P(Y = 0 | do(X = 0)) = 0. (Joe would have died anyway.)
Causal Model 2 of the story
• The model:
- U2: a genetic factor which, if present, kills people who take the treatment and, if absent, kills people who do not take the treatment
- U1, X, Y: decides treatment, treatment given, patient died
- X ← U1 (X = U1)
- Y ← U2, Y ← X (Y = X*U2 + (1-X)(1-U2))
- P(U1) = 0.5, P(U2) = 0.5: leads to the same probability table
• Observation: Joe took the treatment and died.
- X = 1 and Y = 1. Thus U1 = 1 and U2 = 1.
• Question: What is the probability that Joe would have lived if he had not taken the treatment?
- We do X = 0. Find P(Y = 0 | do(X = 0)).
- Y = 0*1 + (1-0)(1-1) = 0
- Hence P(Y = 0 | do(X = 0)) = 1. (Joe would not have died.)
Summary of the story
• There are at least two causal models that are consistent with the data (50% of ...)
• In model 1 Joe would still have died if he had not taken the treatment.
• But in model 2 Joe would have lived if he had not taken the treatment.
• Moral:
- Causal models are the key.
- Statistical data alone is not of much use.
- Multiple causal models may correspond to the same statistical data.
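A minimal sketch of the two structural models in Python: both reproduce the observed 25/25/25/25 distribution over (X, Y), yet they give opposite answers to the counterfactual question about Joe (U1 = U2 = 1):

```python
import random

def model1(u1, u2, x=None):
    """Model 1: Y = U2, unaffected by treatment."""
    x = u1 if x is None else x          # do(X = x) overrides U1
    return x, u2

def model2(u1, u2, x=None):
    """Model 2: Y = X*U2 + (1 - X)*(1 - U2)."""
    x = u1 if x is None else x
    return x, x * u2 + (1 - x) * (1 - u2)

# Both models induce the same joint distribution over (X, Y).
for model in (model1, model2):
    samples = [model(random.randint(0, 1), random.randint(0, 1))
               for _ in range(100_000)]
    print({xy: samples.count(xy) / len(samples)
           for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]})

# Counterfactual for Joe (U1 = 1, U2 = 1) under do(X = 0):
print(model1(1, 1, x=0))  # (0, 1): Joe dies anyway
print(model2(1, 1, x=0))  # (0, 0): Joe would have lived
```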
Causal relations in molecular biology
• Certain proteins (transcription factors) regulate the expression of genes
• One protein may inhibit or activate another protein or another biochemical molecule
• Catalysts in metabolic reactions
Source: http://www.ornl.gov/sci/techresources/Human_Genome/graphics/slides/images/REGNET.jpg
How do we get causal models?
• Traditional approach:
- Knock out genes (too slow)
- Temporal or time series data
- Not feasible for human cells
• Can we infer causal information from steady-state data?
- To some extent
Example
Suppose we have 3 variables A, B, and C, and we have obtained from the data that:
• A and B are dependent.
• B and C are dependent.
• A and C are independent.
Think of A, B, and C that satisfy the above.
Example (cont.)
Most likely your interpretation of A, B, and C would satisfy the causal relations A → B ← C, as shown below; a quick simulation follows.
[Figure: A → B ← C]
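A numeric sanity check of this pattern (a minimal sketch; the additive mechanism B = A + C is just one convenient illustrative choice):

```python
import random

N = 100_000
A = [random.randint(0, 1) for _ in range(N)]
C = [random.randint(0, 1) for _ in range(N)]
B = [a + c for a, c in zip(A, C)]      # B is caused by both A and C

def corr(xs, ys):
    """Pearson correlation, used as a crude (in)dependence check."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

print(corr(A, B))   # around 0.7: A and B are dependent
print(corr(C, B))   # around 0.7: B and C are dependent
print(corr(A, C))   # near 0: A and C are independent
```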
Some necessary definitions
These definitions are needed to state when the algorithm works.
Causal model
• Causal structure: a directed acyclic graph (DAG)
• Causal model: a causal structure with parameters (functions for each variable with parents, and probabilities for the variables without parents)
Conditional independence and d-separation
• X and Y are said to be conditionally independent given Z if P(x | y, z) = P(x | z) whenever P(y, z) > 0.
• d-separation: A path p is said to be d-separated by a set of nodes Z if
- p contains a chain i → m → j or a fork i ← m → j, and m is in Z; or
- p contains a collider i → m ← j, and neither m nor any of its descendants is in Z.
• Z is said to d-separate X and Y if every path between a node in X and a node in Y is d-separated by Z.
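As a sketch, the chain and collider cases can be checked mechanically; the example below assumes networkx 2.8 or later, where the test is exposed as `d_separated` (renamed `is_d_separator` in newer releases):

```python
import networkx as nx

chain = nx.DiGraph([("i", "m"), ("m", "j")])      # i → m → j
collider = nx.DiGraph([("i", "m"), ("j", "m")])   # i → m ← j

# Chain: conditioning on m blocks the path.
print(nx.d_separated(chain, {"i"}, {"j"}, {"m"}))     # True
print(nx.d_separated(chain, {"i"}, {"j"}, set()))     # False

# Collider: the path is blocked only when m is NOT conditioned on.
print(nx.d_separated(collider, {"i"}, {"j"}, set()))  # True
print(nx.d_separated(collider, {"i"}, {"j"}, {"m"}))  # False
```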
Observationally equivalent
• Two directed acyclic graphs (DAGs) are observationally equivalent if they have the same set of independencies.
• Alternative definition: two DAGs are observationally equivalent if they have the same skeleton and the same set of v-structures.
- V-structures are structures of the form a → x ← b such that there is no arrow between a and b.
Observationally equivalent networks
• Two networks that are observationally equivalent cannot be distinguished without resorting to manipulative experimentation or temporal information.
Preference
• Ordering between DAGs: G1 is preferable to G2 if every distribution that can be simulated using G1 (and some parameters) can also be simulated using G2 (and some parameters).
• In the absence of hidden variables, tests of preference and (observational) equivalence can be reduced to tests of induced dependencies, which can be determined directly from the topology of the DAG without considering the parameters.
Stability/faithfulness
• Stability/faithfulness: A DAG and a distribution are faithful to each other if they exhibit the same set of independencies. A distribution is said to be faithful if it is faithful to some DAG.
• With the added assumption of stability, every distribution has a unique minimal causal structure (up to d-separation equivalence), as long as there are no hidden variables.
IC algorithm and faithfulness
• Given a faithful distribution, the IC and IC* algorithms can find the set of DAGs that are faithful to this distribution, in the absence and in the presence of hidden variables, respectively.
IC Algorithm: Step 1
• For each pair of variables a and b in V, search for a set Sab such that (a ╨ b | Sab) holds in P – in other words, a and b should be independent in P, conditioned on Sab.
• Construct an undirected graph G such that vertices a and b are connected with an edge if and only if no such set Sab can be found.
[Figure: if some Sab with (a ╨ b | Sab) exists, a and b are left unconnected; otherwise the edge a – b is added.]
IC Algorithm: Step 2
• For each pair of nonadjacent variables a and b with a common neighbor c, check if c ∈ Sab.
- If it is, then continue;
- Else add arrowheads at c, i.e., a → c ← b.
[Figure: if c ∈ Sab, the structure a – c – b is left undirected; otherwise it is oriented as the v-structure a → c ← b.]
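A compact sketch of these two steps, written against an abstract conditional-independence oracle ci(a, b, S) (the oracle itself, e.g. a statistical test on the data, is assumed and not shown):

```python
from itertools import combinations

def ic_steps_1_and_2(variables, ci):
    """Steps 1-2 of IC. ci(a, b, S) returns True iff a is independent
    of b given the set S in the distribution P."""
    sep = {}            # separating set S_ab found for each pair
    edges = set()       # undirected skeleton
    # Step 1: connect a-b iff no separating set exists.
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        found = next((set(S)
                      for k in range(len(others) + 1)
                      for S in combinations(others, k)
                      if ci(a, b, set(S))), None)
        if found is None:
            edges.add(frozenset((a, b)))
        else:
            sep[frozenset((a, b))] = found
    # Step 2: orient a → c ← b when c is a common neighbor of the
    # nonadjacent pair (a, b) but c is not in S_ab.
    arrows = set()
    for a, b in combinations(variables, 2):
        if frozenset((a, b)) in edges:
            continue
        for c in variables:
            if (frozenset((a, c)) in edges and frozenset((b, c)) in edges
                    and c not in sep.get(frozenset((a, b)), set())):
                arrows.update({(a, c), (b, c)})
    return edges, arrows

# For the earlier A, B, C example, an oracle in which only A ╨ C holds
# yields the skeleton A-B, B-C and the v-structure A → B ← C:
oracle = lambda a, b, S: {a, b} == {"A", "C"} and not S
print(ic_steps_1_and_2(["A", "B", "C"], oracle))
```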
Microarray data
[Figure: expression matrix with genes as rows and samples as columns.]
• Genes are up-regulated or down-regulated;
Our work on learning causal models
• We developed an algorithm for learning causal relationships given knowledge of topological ordering information; it:
- uses conditional dependencies and independencies among variables;
- incorporates topological information; and
- learns mutual information among genes.
Steps of learning gene causal relationships: the mIC algorithm and its evaluation
• Step 1: obtain the probability distribution, the data sampling, and the topological order of the genes;
• Step 2: apply the algorithms to find causal relations;
• Step 3: compare the original and obtained networks based on the two notions of precision and recall;
• Step 4: repeat steps 1-3 for every random network.
We applied the learning algorithm to a melanoma dataset
• Melanoma: a malignant tumor occurring most commonly in the skin;
Knowledge we have
• The 10 genes involved in this study were chosen from the 587 genes in the melanoma data;
• Previous studies show that WNT5A has been identified as a gene of interest involved in melanoma;
• Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing;
• Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway.
Important information we discovered
[Figure: learned causal relations around WNT5A.]
• Pirin causatively influences WNT5A: “In order to maintain the level of WNT5A we need to directly control WNT5A or control it through pirin.”
• Causal connection between WNT5A and MART-1: “WNT5A directly causes MART-1.”
Modeling and simulation of a causal Boolean network (BN)
• Boolean network:
[Figure: nodes A and B feed into node C through a Boolean function f, i.e., C = f(A, B).]
• Proper function: a function that reflects the influence of all of its operators.
• Example: c = f(a, b) = (a ∧ b) ∨ (a ∧ ¬b) = a is not a proper function, since it does not depend on b.
Simulation process in our study:
1. Generate M BNs with up to 3 causal parents for each node;
2. For each BN, generate a random proper function for each node;
3. Assign random probabilities to the root gene(s);
4. Given one configuration, get the probability distribution;
5. Collect 200 data points for each network;
6. Repeat steps 3-5 for all M networks.
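A minimal sketch of steps 2-5 for a single network (names and sizes are illustrative; the "proper" test simply rejects functions in which some parent never changes the output):

```python
import random
from itertools import product

def random_proper_function(n_parents):
    """Random Boolean truth table in which every parent has influence."""
    while True:
        table = {bits: random.randint(0, 1)
                 for bits in product((0, 1), repeat=n_parents)}
        if all(any(table[b] != table[b[:i] + (1 - b[i],) + b[i + 1:]]
                   for b in table)
               for i in range(n_parents)):
            return table

def sample_network(parents, n_points=200):
    """parents: dict node -> tuple of parents (empty for root genes),
    listed in topological order. Returns n_points joint samples."""
    funcs = {v: random_proper_function(len(ps))
             for v, ps in parents.items() if ps}
    bias = {v: random.random() for v, ps in parents.items() if not ps}
    data = []
    for _ in range(n_points):
        state = {}
        for v, ps in parents.items():          # topological order
            state[v] = (int(random.random() < bias[v]) if not ps
                        else funcs[v][tuple(state[p] for p in ps)])
        data.append(state)
    return data

# Example BN with up to 3 causal parents per node: C = f(A, B).
data = sample_network({"A": (), "B": (), "C": ("A", "B")})
```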
Comparing original and obtained networks
• The original graph is a DAG, while the obtained graph has both directed and undirected edges;
[Table: directed edges in the obtained graph are scored against the original graph as TP, FP, TN, and FN; undirected edges are scored with the corresponding partial counts PTP, PFP, PTN, and PFN.]
• Recall = ATP/(AFN + ATP), Precision = ATP/(ATP + AFP)
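As a sketch of the edge-level scoring for fully directed edges only (how the partial, P-prefixed counts for undirected edges are folded into the A-prefixed aggregates is specific to the paper and not reconstructed here):

```python
def precision_recall(original_edges, obtained_edges):
    """Both arguments are sets of directed edges (u, v)."""
    tp = len(original_edges & obtained_edges)   # correctly found edges
    fp = len(obtained_edges - original_edges)   # spurious edges
    fn = len(original_edges - obtained_edges)   # missed edges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# One correct edge, one missed, one spurious -> (0.5, 0.5).
print(precision_recall({("A", "B"), ("B", "C")},
                       {("A", "B"), ("C", "A")}))
```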
Learning with the IC algorithm
[Figure: evaluation results for the IC algorithm.]
Learning with the mIC algorithm
[Figure: evaluation results for the mIC algorithm.]
Conclusion
• Causality differs from correlation.
• P(X|Y) differs from P(X | do(Y)).
• While P(X|Y) can be answered using a joint probability distribution and other representations of it (such as Bayes nets), to answer P(X | do(Y)) one needs a causal model.
• We have worked on various causal model representations and how to reason with them.
• Causal models can be learned by knockout experiments and from temporal and time series data.
• Recent algorithms have been proposed to learn causal models from steady-state data:
- the IC algorithm;
- we have improved on the IC algorithm.
References
• Judea Pearl: Reasoning with cause and effect. http://singapore.cs.ucla.edu/IJCAI99/index.html
• Xin Zhang, Chitta Baral, Seungchan Kim: An algorithm to learn causal connections between genes from steady state data: simulation and its application to melanoma dataset. Proc. of the 10th Conference on Artificial Intelligence in Medicine (AIME 05), 23-27 July 2005, Aberdeen, Scotland, pages 524-534. http://www.public.asu.edu/~cbaral/papers/AIME05_final.pdf