Learning Linear Causal Models
Oksana Kohutyuk
ComS 673 Spring 2005
Department of Computer Science
Iowa State University
Outline
Motivation
Linear causal models
  Introduction
  Properties
Structure learning
  Greedy hillclimbing search and results
  PC algorithm and results
  EGS/EGS2 algorithm and results
Motivation
Given a set of observational data, what can we deduce
about the unknown underlying regulatory network?
Observe the state of a cell and how it changes under
different conditions, and from these observations derive
a model of how the state changes are generated
Linear Causal Models
Special case of Bayesian Networks
Assumes linear interactions between a child and its
parents in the graph
Influences of parents are additive
Cannot model combinatorial effects
Parameters specify the type of influence
(positive/negative) and the amount of influence
Allows continuous data
Linear Causal Models
Example
X2 = C1 + b12X1 + e1
X3 = C2 + b13X1 + b23X2 + e2
….
[Figure: example DAG over X1, ..., X6]
Nodes = variables (e.g., gene expression levels)
Arcs = causal influences (e.g., biological processes)
Properties
Conditional independence: I(x, z, y) means x is independent of y given z
Each node is asserted to be conditionally independent of
its non-descendants, given its parents
Each node is conditionally independent of all other nodes
given its Markov blanket
Then the joint probability factorizes as:
P(X1, ..., Xn) = Π_{i=1..n} P(Xi | parents(Xi))
Independence
Structure and Correlation Constraints
Acyclic graphs
Example:
X2 = C1 + b12X1 + e1
X3 = C2 + b23X2 + e2
Calculating the correlation of X1 and X3 from the model's
equations, we find that ρ13 = ρ12 · ρ23: the structure entails
constraints on the correlations the data can exhibit
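For the chain X1 → X2 → X3 given by the two equations above, the model entails the constraint ρ13 = ρ12 · ρ23. This can be checked by simulation; the sketch below draws data from the two equations (the coefficients and noise scales are arbitrary choices) and compares the sample correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulate the chain X1 -> X2 -> X3 from the slide's two equations
# (coefficients b12 = 0.8, b23 = 1.3 and noise scales are arbitrary).
x1 = rng.normal(size=n)
x2 = 1.0 + 0.8 * x1 + rng.normal(scale=0.5, size=n)   # X2 = C1 + b12*X1 + e1
x3 = -0.5 + 1.3 * x2 + rng.normal(scale=0.7, size=n)  # X3 = C2 + b23*X2 + e2

r = np.corrcoef([x1, x2, x3])
r12, r23, r13 = r[0, 1], r[1, 2], r[0, 2]

# The chain structure entails rho13 = rho12 * rho23; the two sides
# agree up to sampling noise.
print(r13, r12 * r23)
assert abs(r13 - r12 * r23) < 0.01
```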
Conditional independence
In linear causal models conditional independence is
equivalent to zero partial correlation, also known as a
vanishing partial correlation
ρxy.z = (ρxy − ρxz ρyz) / √((1 − ρxz²)(1 − ρyz²))
ρxy.z = 0 ⇒ I(x, z, y): there is no linear correlation between x
and y given the value of z
Conditional independence
[Figure: example of a linear correlation between x and y]
Conditional independence
When conditioning on a set of variables, the partial
correlation coefficient can be computed recursively
ρxy.qZ = (ρxy.Z − ρxq.Z ρyq.Z) / √((1 − ρ²xq.Z)(1 − ρ²yq.Z))
Conditional independence
First-order partial correlation:
ρxy.z = (ρxy − ρxz ρyz) / √((1 − ρxz²)(1 − ρyz²))
where
ρxy = cov(x, y) / (σx σy) is the correlation coefficient
cov(x, y) = E[(x − μx)(y − μy)] is the covariance
σ² = E[(x − μ)²] is the variance and σ the standard deviation
μ = E[x] is the expected value (mean)
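The first-order formula can be applied directly to sample correlations. A minimal sketch (NumPy, with simulated data standing in for real measurements): for a chain x → z → y, conditioning on z should drive the partial correlation of x and y to zero.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation r_xy.z computed from samples
    via the slide's formula."""
    r = np.corrcoef([x, y, z])
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

# Sanity check on a chain x -> z -> y: x and y are marginally correlated,
# but conditioning on z removes the correlation.
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
z = 0.9 * x + rng.normal(scale=0.5, size=100_000)
y = 0.7 * z + rng.normal(scale=0.5, size=100_000)

print(partial_corr(x, y, z))   # close to 0
```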
Conditional independence
D-separation
Two variables I and J are independent (have zero partial
correlation) if all paths between them are blocked by
evidence K
A path is blocked in three cases
Intermediate node (chain I → K → J): blocked when K is observed
Tail-to-tail node (common cause I ← K → J): blocked when K is observed
Head-to-head node (collider I → K ← J): blocked when neither K nor
any of its descendants is observed
Inferring independence in graphs
Algorithm to determine if dsep(X, Z, Y) holds:
Recursively prune every leaf node W as long as W
does not belong to X U Y U Z
Delete all edges leaving any node in Z
Test if X and Y are disconnected in the resulting graph
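The three-step procedure above can be sketched directly. The graph at the end is a hypothetical example (the slides' X1–X7 graph is not reproduced here), and `parents` encodes the DAG as a child-to-parents map:

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Test dsep(X, Z, Y) with the prune-and-disconnect procedure above.
    `parents` maps each node to the set of its parents in the DAG."""
    X, Y, Z = set(X), set(Y), set(Z)
    keep = set(parents)
    edges = {(p, c) for c in parents for p in parents[c]}

    # 1. Recursively prune leaf nodes (no outgoing edge) outside X u Y u Z.
    changed = True
    while changed:
        changed = False
        for w in list(keep):
            has_child = any(p == w for p, c in edges)
            if not has_child and w not in X | Y | Z:
                keep.discard(w)
                edges = {(p, c) for p, c in edges if w not in (p, c)}
                changed = True

    # 2. Delete all edges leaving any node in Z.
    edges = {(p, c) for p, c in edges if p not in Z}

    # 3. X and Y are d-separated iff they are now disconnected
    #    (edge direction ignored for the connectivity test).
    adj = {v: set() for v in keep}
    for p, c in edges:
        adj[p].add(c); adj[c].add(p)
    seen, frontier = set(X & keep), deque(X & keep)
    while frontier:
        v = frontier.popleft()
        for u in adj.get(v, ()):
            if u not in seen:
                seen.add(u); frontier.append(u)
    return not (seen & Y)

# Hypothetical example: X1 -> X2 -> X4 and X1 -> X3 -> X5.
g = {"X1": set(), "X2": {"X1"}, "X3": {"X1"}, "X4": {"X2"}, "X5": {"X3"}}
print(d_separated(g, {"X2"}, {"X3"}, {"X1"}))  # True: X1 blocks the only path
print(d_separated(g, {"X2"}, {"X3"}, set()))   # False: common cause X1 unblocked
```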
Inferring independence in graphs
I (X2, {} , X4)?
[Figure: example graph over X1, ..., X7]

Inferring independence in graphs
I (X2, X5, X4)?
[Figure: example graph over X1, ..., X7]
Inferring independence in graphs
I (X3, X2, X5)?
[Figure: example graph over X1, ..., X7]
Paper review
“Revising Regulatory Networks: From
Expression Data to Linear Causal Models”
S. D. Bay, J. Shrager, A. Pohorille, and P. Langley
http://newatlantis.isle.org/~sbay/papers/jbi-BSPL.pdf
Greedy hillclimbing search
A partial model already exists
Modify/improve initial model using microarray gene
expression data
Revising the model
Start with initial model
Find causal relationships
Use greedy hillclimbing to search through the space of
candidate models
Use the bootstrap method to determine the stability of the
suggested revisions
Determine the type of regulation between variables
(positive/negative regulation)
Hillclimbing search
Greedy search algorithm:
1. Start with a proposed model M (a DAG)
2. Until no improvements in the score can be
made
1. Randomly pick two nodes x, y
2. Add, delete, or reverse an edge between x and
y, depending on which operation causes the
best improvement in the score of the model
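The loop above can be sketched as follows. This is a toy version: the scoring function here simply rewards matching a hypothetical target edge set (the paper's score compares the model against partial-correlation tests), and a fixed iteration budget stands in for the "no improvement" stopping rule.

```python
import random

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly remove nodes with no incoming edges."""
    edges = set(edges)
    remaining = set(nodes)
    while remaining:
        free = {n for n in remaining if not any(c == n for _, c in edges)}
        if not free:
            return False          # every remaining node has a parent: a cycle
        remaining -= free
        edges = {(p, c) for p, c in edges if p in remaining and c in remaining}
    return True

def hillclimb(nodes, edges, score, iters=500, seed=0):
    rng = random.Random(seed)
    edges = set(edges)
    best = score(edges)
    for _ in range(iters):
        x, y = rng.sample(nodes, 2)   # 1. randomly pick two nodes
        # 2. candidate single-edge moves: add, delete, or reverse.
        if (x, y) in edges:
            cands = [edges - {(x, y)}, (edges - {(x, y)}) | {(y, x)}]
        elif (y, x) not in edges:
            cands = [edges | {(x, y)}]
        else:
            cands = []
        for cand in cands:
            if is_acyclic(nodes, cand) and score(cand) > best:
                edges, best = cand, score(cand)
    return edges, best

# Hypothetical "true" structure used only to define the toy score.
target = {("X1", "X2"), ("X2", "X3")}
score = lambda e: len(e & target) - len(e - target)
final, s = hillclimb(["X1", "X2", "X3"], set(), score)
print(final == target, s)
```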
Revising model structure
Determine partial correlation ρxy.z for every
combination and ordering of variables x,y,z
In the model:
Determine if ρxy.z = 0 ≡ I(x, z, y) using d-separation rules
In the data:
Test if rxy.z = 0 using first-order partial correlation
coefficient
Determining partial correlations
In the data:
Test the significance of rxy.z, the observed value of ρxy.z
Null hypothesis H0: ρxy.z = 0
Alternate hypothesis Ha: ρxy.z ≠ 0
Perform a t-test with a 95% two-sided confidence interval:
t_{N−3} = rxy.z √(N − 3) / √(1 − r²xy.z)
Accept H0 only if |t_{N−3}| < c.v.
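The acceptance rule above is a one-liner. In the sketch below, 1.96 is used as the critical value, which is the usual large-sample approximation of the two-sided 95% cutoff (the exact value depends on N):

```python
import math

def accept_independence(r, N, crit=1.96):
    """Accept H0 (rho_xy.z = 0) iff |t_{N-3}| < c.v., per the test above.
    crit=1.96 approximates the two-sided 95% critical value for large N."""
    t = r * math.sqrt(N - 3) / math.sqrt(1 - r * r)
    return abs(t) < crit

# The same small sample correlation is accepted as zero with little data
# but rejected with a lot of data.
print(accept_independence(0.05, 100))     # True: weak evidence of dependence
print(accept_independence(0.05, 10_000))  # False: same r, much larger N
```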
Determining partial correlations
If the null hypothesis is clearly rejected or accepted, there are
four possible outcomes:
model entails ρxy.z = 0 and data implies rxy.z = 0 (true positive)
model entails ρxy.z = 0 and data implies rxy.z ≠ 0 (false positive)
model entails ρxy.z ≠ 0 and data implies rxy.z = 0 (false negative)
model entails ρxy.z ≠ 0 and data implies rxy.z ≠ 0 (true negative)
Scoring function
Parameter signs and correlations
If I and J are directly connected, assume
sign (ρij) = sign (rij)
Otherwise, multiply the signs of the links
If I and J are connected by multiple paths, pick the sign
implied by the dominant path
Use greedy hillclimbing with scoring function
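The multiply-the-signs rule for an indirect influence is just a product over the links of the path; the path below is a hypothetical example:

```python
from math import prod

# Net sign of an indirect influence: multiply the signs of the links on
# the path. Hypothetical path X1 -(+)-> X2 -(-)-> X3 -(-)-> X4.
path_signs = [+1, -1, -1]
net = prod(path_signs)
print(net)   # two negative links cancel, so the net influence is positive
```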
Bootstrapping
Goal: learn stable changes from an initial model
Method:
Sample with replacement from a data set of size n to
create k new data sets of size n (in this case, k=20)
Record suggested changes
Only accept changes with repeatability greater than a
threshold (in this case, 75%)
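The bootstrap filter above can be sketched as follows; the `propose_changes` callable is a stand-in for the whole revision procedure (hillclimbing plus partial-correlation tests), and the demo data and decision rule are toy examples.

```python
import random

def stable_changes(data, propose_changes, k=20, threshold=0.75, seed=0):
    """Resample `data` with replacement k times, run the revision procedure
    on each resample, and keep only changes suggested in at least
    `threshold` of the runs (k=20, threshold=0.75 per the slide)."""
    rng = random.Random(seed)
    counts = {}
    n = len(data)
    for _ in range(k):
        sample = [rng.choice(data) for _ in range(n)]  # resample, size n
        for change in propose_changes(sample):
            counts[change] = counts.get(change, 0) + 1
    return {c for c, m in counts.items() if m / k >= threshold}

# Toy demo: a "change" is proposed whenever the resampled mean is positive.
data = [0.4, 1.1, 0.9, -0.2, 0.7, 1.3, 0.5, -0.1, 0.8, 1.0]
changes = stable_changes(
    data, lambda s: {"add edge"} if sum(s) / len(s) > 0 else set())
print(changes)
```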
Results
Results
Issues
Independence tests are performed
conditioning on only one variable
The search can easily get stuck in local
maxima
PC Algorithm
1. Start with a complete undirected graph G
2. For all adjacent nodes x and y, try to separate nodes by checking
for conditional independencies between x and y. Check a
conditional independence relation I(x, y|S) if all variables in S are
adjacent to x or y. If I(x, y|S) for some set S, remove the edge
between x and y.
3. For each triple of nodes (x, y, z) such that x is adjacent to y and y is
adjacent to z but x is not adjacent to z, orient x – y – z as x → y ← z if and only if y was not in the separating set S
4. Until no more edges can be directed
1. direct all arcs necessary to avoid new v-structures
2. direct all arcs necessary to avoid cycles
using rules R1 – R4
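Steps 1-2 (the adjacency search) can be sketched with a conditional-independence oracle standing in for the statistical test; the `indep` oracle and the three-variable example below are assumptions for illustration, and the orientation rules (steps 3-4) are omitted.

```python
from itertools import combinations

def pc_skeleton(variables, indep):
    """PC steps 1-2: start from a complete undirected graph and remove the
    edge x - y whenever some conditioning set S among x's neighbours makes
    x and y independent. Returns the skeleton and the separating sets
    (the sepsets drive the collider orientation in step 3)."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    depth = 0
    while any(len(adj[x] - {y}) >= depth for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                # Condition on subsets of x's current neighbours, size = depth.
                for S in combinations(sorted(adj[x] - {y}), depth):
                    if indep(x, y, set(S)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(S)
                        break
        depth += 1
    return adj, sepset

# Oracle for the chain A -> B -> C: A and C are independent given B.
oracle = lambda x, y, S: {x, y} == {"A", "C"} and "B" in S
skel, seps = pc_skeleton(["A", "B", "C"], oracle)
print(skel)   # A - B - C skeleton: the A-C edge is removed with sepset {B}
```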
PC Algorithm: Problems
Order(V), that is, the order in which the
independencies are checked, makes a
difference in the orientation of edges
Relies heavily on independence tests
Result: PC algorithm is unstable
“A Hybrid Anytime Algorithm for the Construction of Causal
Models From Sparse Data”
Denver Dash, Marek J. Druzdzel
In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial
Intelligence (UAI-99), San Francisco, CA, 1999
EGS Algorithm
Repeat:
1. Draw an α from P(α).
2. Call PerformPC(α, order(V), D, I) to generate an essential graph G0.
3. Randomly convert G0 to a DAG S0.
4. Calculate P(D, S0) and record the structure with the maximum value.
5. Randomly generate a different configuration of order(V).
EGS Algorithm
Scoring metric (BSM):
EGS2 Algorithm
Input: A set of variables V, a distribution P(α) over the significance level α, a
test of independence I(x; y | S), an initial configuration for order(V), and an
integer n.
Repeat:
1. Draw an α from P(α).
2. Call PerformPC(α, order(V), D, I) to generate an essential graph G0.
3. Randomly convert G0 to a DAG S0.
4. Calculate P(D, S0) and record the structure with the maximum value.
5. Randomly generate a different configuration of order(V).
Until: n structures are generated without updating the maximum value
of P(D, S0).
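The control flow of EGS2 can be sketched with stand-in callables for every component named above; `perform_pc`, the DAG conversion, and the score in the demo are toy stubs, not the real procedures.

```python
import random

def egs2(draw_alpha, perform_pc, to_dag, log_score, orders, n, seed=0):
    """EGS2 loop: repeatedly sample an (alpha, order(V)) configuration, run
    PC, score the resulting DAG, and stop after n draws in a row without
    improving the best score."""
    rng = random.Random(seed)
    best_dag, best_score, misses = None, float("-inf"), 0
    order = orders[0]
    while misses < n:
        alpha = draw_alpha(rng)                      # 1. draw alpha from P(alpha)
        dag = to_dag(perform_pc(alpha, order), rng)  # 2-3. PC, then random DAG
        s = log_score(dag)                           # 4. score P(D, S0)
        if s > best_score:
            best_dag, best_score, misses = dag, s, 0
        else:
            misses += 1
        order = rng.choice(orders)                   # 5. new order(V) configuration
    return best_dag, best_score

# Toy stubs: "structures" are integers scored by closeness to 7.
best, s = egs2(
    draw_alpha=lambda rng: rng.choice([0.01, 0.05, 0.1]),
    perform_pc=lambda alpha, order: int(alpha * 100) + len(order),
    to_dag=lambda g, rng: g + rng.randint(-2, 2),
    log_score=lambda d: -abs(d - 7),
    orders=[["X1", "X2"], ["X2", "X1"]],
    n=25,
)
print(best, s)
```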
Results
Results
Discussion
Limitations of linear causal models:
Cannot model combinatorial effects
Do not model time dependent effects
References
S. D. Bay, J. Shrager, A. Pohorille, and P. Langley. "Revising Regulatory
Networks: From Expression Data to Linear Causal Models."
http://newatlantis.isle.org/~sbay/papers/jbi-BSPL.pdf
D. Dash and M. J. Druzdzel. "A Hybrid Anytime Algorithm for the Construction
of Causal Models From Sparse Data." In Proceedings of the Fifteenth Annual
Conference on Uncertainty in Artificial Intelligence (UAI-99), San Francisco,
CA, 1999.