
Learning Linear Causal Models
Oksana Kohutyuk
ComS 673 Spring 2005
Department of Computer Science
Iowa State University
Outline

- Motivation
- Linear causal models
  - Introduction
  - Properties
- Structure learning
  - Greedy hillclimbing search
    - Results
  - PC algorithm
    - Results
  - EGS/EGS2 algorithms
    - Results
Motivation

- Given a set of data, what can we deduce about the unknown underlying regulatory network?
- Observe the state of a cell and how it changes under different conditions, and from this derive a model of how these state changes are generated.
Linear Causal Models

- Special case of Bayesian networks
- Assumes linear interactions between a child and its parents in the graph
  - Influences of parents are additive
  - Cannot model combinatorial effects
- Parameters specify the type of influence (positive/negative) and the amount of influence
- Allows continuous data
Linear Causal Models

- Example:
  X2 = C1 + b12 X1 + e1
  X3 = C2 + b13 X1 + b23 X2 + e2
  ….
- [Graph over X1–X6]
  - Nodes = variables (e.g., gene expression levels)
  - Arcs = causes (e.g., biological processes)
Properties

- Conditional independence: I(x, z, y) denotes that x is independent of y given z
- Each node is asserted to be conditionally independent of its non-descendants, given its parents
- Each node is conditionally independent of all other nodes given its Markov blanket
- Then the total probability factors as:

  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
Independence

- Structure and correlation constraints
  - Acyclic graphs
  - Example:
    X2 = C1 + b12 X1 + e1
    X3 = C2 + b23 X2 + e2
- Calculating the correlation of X1 and X3 from the model's equations, we find that it is fixed by the structure: for this chain, ρ13 = ρ12 · ρ23
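The chain constraint ρ13 = ρ12 · ρ23 can be checked by simulation; the coefficients below are arbitrary illustrative choices, not values from the deck.

```python
import numpy as np

# Simulate the chain X1 -> X2 -> X3 with arbitrary (hypothetical) coefficients.
rng = np.random.default_rng(1)
n = 200_000
x1 = rng.normal(size=n)
x2 = 0.5 + 2.0 * x1 + rng.normal(size=n)   # X2 = C1 + b12 X1 + e1
x3 = -1.0 + 1.5 * x2 + rng.normal(size=n)  # X3 = C2 + b23 X2 + e2

r = np.corrcoef([x1, x2, x3])
gap = abs(r[0, 2] - r[0, 1] * r[1, 2])     # should vanish for a chain
print(gap < 0.01)  # True: rho_13 = rho_12 * rho_23 up to sampling noise
```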
Conditional independence

- In linear causal models, conditional independence is equivalent to zero partial correlation, also known as a vanishing partial correlation:

  ρxy.z = (ρxy − ρxz ρyz) / √((1 − ρxz²)(1 − ρyz²))

- ρxy.z = 0 ⇒ I(x, z, y): there is no linear correlation between x and y given the value of z
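A quick numeric illustration of a vanishing partial correlation (the coefficients are arbitrary assumptions): x and y are driven only by a shared cause z, so they are strongly correlated, yet ρxy.z ≈ 0.

```python
import numpy as np

# x and y share only the common cause z, so rho_xy.z should vanish.
rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

r = np.corrcoef([x, y, z])
r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
# First-order partial correlation formula from the slide:
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(abs(r_xy) > 0.5, abs(r_xy_z) < 0.02)  # strongly correlated, yet rho_xy.z ~ 0
```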
Conditional independence

- [Figure: example of a linear correlation between x and y]
Conditional independence

- When conditioning on a set of variables, the partial correlation coefficient can be computed recursively:

  ρxy.qZ = (ρxy.Z − ρxq.Z ρyq.Z) / √((1 − ρxq.Z²)(1 − ρyq.Z²))
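The recursion can be implemented directly, bottoming out at the pairwise correlation matrix when the conditioning set is empty; the demo matrix is a hypothetical one consistent with a chain X1 → X2 → X3 (so ρ13 = ρ12 ρ23).

```python
import numpy as np

def partial_corr(i, j, cond, R):
    """rho_{ij.cond} via the slide's recursion, bottoming out at the
    pairwise correlation matrix R when the conditioning set is empty."""
    if not cond:
        return R[i][j]
    q, rest = cond[0], cond[1:]
    r_ij = partial_corr(i, j, rest, R)
    r_iq = partial_corr(i, q, rest, R)
    r_jq = partial_corr(j, q, rest, R)
    return (r_ij - r_iq * r_jq) / np.sqrt((1 - r_iq**2) * (1 - r_jq**2))

# Hypothetical correlations consistent with a chain X1 -> X2 -> X3:
R = [[1.00, 0.60, 0.48],
     [0.60, 1.00, 0.80],
     [0.48, 0.80, 1.00]]
print(abs(partial_corr(0, 2, (1,), R)) < 1e-12)  # True: X1 ⟂ X3 given X2
```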
Conditional independence

- First-order partial correlation:

  ρxy.z = (ρxy − ρxz ρyz) / √((1 − ρxz²)(1 − ρyz²))

- where the ingredients are the standard statistics:
  - Correlation coefficient: ρxy = cov(x, y) / (σx σy)
  - Covariance: cov(x, y) = E[(x − E[x])(y − E[y])]
  - Variance: var(x) = E[(x − E[x])²]
  - Standard deviation: σx = √var(x)
  - Expected value/mean: E[x]
Conditional independence

- D-separation
  - Two variables I and J are independent (have zero partial correlation) if all paths between them are blocked by evidence K
  - A path is blocked in three cases:
    - Intermediate node: I → K → J, with K observed
    - Tail-to-tail node: I ← K → J, with K observed
    - Head-to-head node: I → K ← J, with K (and its descendants) unobserved
Inferring independence in graphs

- Algorithm to determine if dsep(X, Z, Y) holds:
  1. Recursively prune every leaf node W as long as W does not belong to X ∪ Y ∪ Z
  2. Delete all edges leaving any node in Z
  3. Test whether X and Y are disconnected in the resulting graph
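The three steps above can be sketched directly (a minimal implementation, with directed edges given as (parent, child) pairs):

```python
from collections import defaultdict

def d_separated(X, Z, Y, nodes, edges):
    """Test dsep(X, Z, Y): prune barren leaves, cut edges leaving Z,
    then check whether X and Y are disconnected."""
    nodes, edges = set(nodes), set(edges)
    keep = set(X) | set(Y) | set(Z)
    # 1. Recursively prune leaf nodes (no outgoing edges) not in X ∪ Y ∪ Z.
    changed = True
    while changed:
        changed = False
        for w in list(nodes - keep):
            if not any(u == w for (u, v) in edges):
                nodes.discard(w)
                edges = {(u, v) for (u, v) in edges if v != w}
                changed = True
    # 2. Delete all edges leaving any node in Z.
    edges = {(u, v) for (u, v) in edges if u not in Z}
    # 3. X and Y are d-separated iff disconnected (ignoring edge direction).
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    frontier, seen = list(X), set(X)
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v); frontier.append(v)
    return not (seen & set(Y))

chain = ({"X1", "X2", "X3"}, {("X1", "X2"), ("X2", "X3")})
print(d_separated(["X1"], ["X2"], ["X3"], *chain))     # True: X2 blocks the chain
collider = ({"X1", "X2", "X3"}, {("X1", "X3"), ("X2", "X3")})
print(d_separated(["X1"], [], ["X2"], *collider))      # True: collider unobserved
print(d_separated(["X1"], ["X3"], ["X2"], *collider))  # False: conditioning opens it
```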
Inferring independence in graphs

- I(X2, {}, X4)?
- [Graph over X1–X7]

Inferring independence in graphs

- I(X2, X5, X4)?
- [Graph over X1–X7]

Inferring independence in graphs

- I(X3, X2, X5)?
- [Graph over X1–X7]
Paper review

- “Revising Regulatory Networks: From Expression Data to Linear Causal Models”
  S. D. Bay, J. Shrager, A. Pohorille, and P. Langley
  http://newatlantis.isle.org/~sbay/papers/jbi-BSPL.pdf
Greedy hillclimbing search

- A partial model already exists
- Modify/improve the initial model using microarray gene expression data

Revising the model

- Start with the initial model
- Find causal relationships
  - Use greedy hillclimbing to search through the space of candidate models
  - Use the bootstrap method to determine the stability of the suggested revisions
- Determine the type of regulation between variables (positive/negative regulation)
Hillclimbing search

- Greedy search algorithm:
  1. Start with a proposed model M (a DAG)
  2. Until no improvement in the score can be made:
     a. Randomly pick two nodes x, y
     b. Add, delete, or reverse an edge between x and y, depending on which operation causes the best improvement in the score of the model
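A minimal sketch of this loop. The score here is a stand-in (distance to a known target structure), since the actual score comes from the partial-correlation tests described later, and a fixed number of random picks replaces the "until no improvement" test for simplicity.

```python
import random

def is_dag(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == len(nodes)

def hillclimb(nodes, score, iters=200, seed=0):
    """Greedy search over DAG structures via add/delete/reverse moves."""
    rng = random.Random(seed)
    edges = set()
    best = score(edges)
    for _ in range(iters):
        x, y = rng.sample(sorted(nodes), 2)
        candidates = [edges | {(x, y)},               # add x -> y
                      edges - {(x, y)},               # delete x -> y
                      (edges - {(x, y)}) | {(y, x)}]  # reverse to y -> x
        for cand in candidates:
            if cand != edges and is_dag(nodes, cand) and score(cand) > best:
                edges, best = set(cand), score(cand)
    return edges, best

# Toy score (an assumption for the demo): distance to a known target DAG.
target = {("A", "B"), ("B", "C")}
found, best = hillclimb({"A", "B", "C"}, lambda es: -len(es ^ target))
print(found == target)  # True: the search recovers the target structure
```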
Revising model structure

- Determine the partial correlation ρxy.z for every combination and ordering of variables x, y, z
- In the model:
  - Determine whether ρxy.z = 0 ≡ I(x, z, y), using the d-separation rules
- In the data:
  - Test whether rxy.z = 0 using the first-order partial correlation coefficient
Determining partial correlations

- In the data:
  - Test the significance of rxy.z, the observed value of ρxy.z
  - Null hypothesis: H0: ρxy.z = 0
  - Alternate hypothesis: Ha: ρxy.z ≠ 0
  - Perform a t-test with a 95% two-sided confidence interval:

    t(N−3) = rxy.z √(N − 3) / √(1 − rxy.z²)

  - Accept H0 only if |t(N−3)| < c.v.
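The test t(N−3) = rxy.z √(N − 3) / √(1 − rxy.z²) can be sketched as follows; the 1.96 cutoff is an assumption that approximates the two-sided 95% critical value for large N (the exact value depends on the N − 3 degrees of freedom).

```python
import math

def accept_vanishing(r_xy_z, n, crit=1.96):
    """t-test for H0: rho_xy.z = 0; crit ~ two-sided 95% cutoff for large N."""
    t = r_xy_z * math.sqrt(n - 3) / math.sqrt(1 - r_xy_z ** 2)
    return abs(t) <= crit  # True -> accept H0: vanishing partial correlation

print(accept_vanishing(0.05, 100))    # True: weak correlation, small sample
print(accept_vanishing(0.05, 10000))  # False: same correlation, large sample
```

Note how the same observed correlation is accepted as vanishing on a small sample but rejected on a large one: the statistic grows with √(N − 3).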
Determining partial correlations

- If the null hypothesis is clearly rejected or accepted, there are four possible outcomes:
  - model entails ρxy.z = 0 and data implies rxy.z = 0 (true positive)
  - model entails ρxy.z = 0 and data implies rxy.z ≠ 0 (false positive)
  - model entails ρxy.z ≠ 0 and data implies rxy.z = 0 (false negative)
  - model entails ρxy.z ≠ 0 and data implies rxy.z ≠ 0 (true negative)

Scoring function

- [Scoring function based on counts of the four outcomes above]
Parameter signs and correlations

- If I and J are directly connected, assume sign(ρij) = sign(rij)
- Otherwise, multiply the signs of the links along the connecting path
- If I and J are connected by multiple paths, pick the sign implied by the dominant path
- Use greedy hillclimbing with the scoring function
Bootstrapping

- Goal: learn stable changes from an initial model
- Method:
  - Sample with replacement from a data set of size n to create k new data sets of size n (in this case, k = 20)
  - Record the suggested changes
  - Only accept changes with repeatability greater than a threshold (in this case, 75%)
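A sketch of the bootstrap filter; `revise` is a hypothetical stand-in for the full hillclimbing revision procedure, and the demo stubs it out.

```python
import random

def stable_revisions(data, revise, k=20, threshold=0.75, seed=0):
    """Resample the data k times, re-run the revision procedure, and keep
    only changes suggested in at least `threshold` of the runs."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(k):
        resample = [rng.choice(data) for _ in data]  # n draws with replacement
        for change in revise(resample):
            counts[change] = counts.get(change, 0) + 1
    return {c for c, m in counts.items() if m / k >= threshold}

# Demo with a stubbed revision procedure: one change is always suggested,
# another only on unusual resamples, so only the first survives the filter.
data = list(range(10))
revise = lambda s: {"add X1->X2"} | ({"delete X3->X4"} if sum(s) > 60 else set())
print(stable_revisions(data, revise))  # {'add X1->X2'}
```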
Results

- [Result figures for the model-revision experiments]
Issues

- Independence tests are performed conditioning on only one variable
- The search can easily get stuck in local maxima
PC Algorithm

1. Start with a complete undirected graph G
2. For all adjacent nodes x and y, try to separate the nodes by checking for conditional independencies between x and y. Check a conditional independence relation I(x, y | S) if all variables in S are adjacent to x or y. If I(x, y | S) holds for some set S, remove the edge between x and y.
3. For each triple of nodes (x, y, z) such that x is adjacent to y and y is adjacent to z but x is not adjacent to z, orient x – y – z as x → y ← z if and only if y was not in S
4. Until no more edges can be directed:
   a. direct all arcs necessary to avoid new v-structures
   b. direct all arcs necessary to avoid cycles
   (using rules R1 – R4)
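Steps 1–2 (the skeleton phase) can be sketched as follows, with `indep` standing in for the conditional-independence test; the orientation steps are omitted.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Edge-removal phase of PC: start from the complete undirected graph and
    drop x-y whenever some conditioning set S taken from the current
    neighbours separates them. `indep(x, y, S)` stands in for the CI test."""
    adj = {n: set(nodes) - {n} for n in nodes}
    sepset = {}
    depth = 0
    while any(len(adj[x] - {y}) >= depth for x in nodes for y in adj[x]):
        for x in sorted(nodes):
            for y in sorted(adj[x]):
                for S in combinations(sorted(adj[x] - {y}), depth):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(S)
                        break
        depth += 1
    return adj, sepset

# Independence oracle for the chain A -> B -> C: only A ⟂ C given B.
indep = lambda x, y, S: frozenset((x, y)) == frozenset(("A", "C")) and "B" in S
adj, sepset = pc_skeleton({"A", "B", "C"}, indep)
print(adj["B"] == {"A", "C"} and adj["A"] == {"B"})  # True: skeleton A - B - C
```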
PC Algorithm: Problems

- order(V), that is, the order in which the independencies are checked, makes a difference in the orientation of edges
- Relies heavily on independence tests
- Result: the PC algorithm is unstable

“A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data”
Denver Dash, Marek J. Druzdzel
In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), San Francisco, CA, 1999

PC Algorithm: problems

- [Figure illustrating PC's instability]
EGS Algorithm

Repeat:
1. Draw an α from P(α).
2. Call PerformPC(α, order(V), D, I) to generate an essential graph G0.
3. Randomly convert G0 to a DAG S0.
4. Calculate P(D, S0) and record the structure with the maximum value.
5. Randomly generate a different configuration of order(V).

EGS Algorithm

Scoring metric (BSM):
- [Bayesian scoring metric formula]
EGS2 Algorithm

Input: a set of variables V, a distribution P(α) over the significance level α, a test of independence I(x, y | S), an initial configuration for order(V), and an integer n.

Repeat:
1. Draw an α from P(α).
2. Call PerformPC(α, order(V), D, I) to generate an essential graph G0.
3. Randomly convert G0 to a DAG S0.
4. Calculate P(D, S0) and record the structure with the maximum value.
5. Randomly generate a different configuration of order(V).
Until: n structures are generated without updating the maximum value of P(D, S0).
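The outer loop can be sketched with stand-in components (all stubs below are assumptions for the demo, not the paper's implementations of PerformPC, the DAG extension, or P(D, S0)):

```python
import random

def egs2(V, draw_alpha, perform_pc, to_dag, score, n, seed=0):
    """EGS2 outer loop (sketch): repeat PC with random alpha and order(V),
    score a random DAG extension, stop after n tries without improvement."""
    rng = random.Random(seed)
    order = list(V)
    best_dag, best_score = None, float("-inf")
    stale = 0
    while stale < n:                   # anytime stopping rule
        alpha = draw_alpha(rng)
        g0 = perform_pc(alpha, tuple(order))
        s0 = to_dag(g0, rng)
        sc = score(s0)
        if sc > best_score:
            best_dag, best_score, stale = s0, sc, 0
        else:
            stale += 1
        rng.shuffle(order)             # new random configuration of order(V)
    return best_dag, best_score

# Stub components for the demo (pure assumptions): PerformPC just returns the
# node order, and the score prefers structures whose order starts with "A".
best, sc = egs2(["A", "B", "C"],
                draw_alpha=lambda r: 0.05,
                perform_pc=lambda a, order: order,
                to_dag=lambda g, r: g,
                score=lambda s: 1.0 if s[0] == "A" else 0.0,
                n=10)
print(sc)  # 1.0
```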
Results

- [Result figures for the EGS/EGS2 experiments]
Discussion

- Limitations of linear causal models:
  - Cannot model combinatorial effects
  - Do not model time-dependent effects
References

- S. D. Bay, J. Shrager, A. Pohorille, and P. Langley. “Revising Regulatory Networks: From Expression Data to Linear Causal Models.” http://newatlantis.isle.org/~sbay/papers/jbi-BSPL.pdf
- D. Dash and M. J. Druzdzel. “A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data.” In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), San Francisco, CA, 1999.