Simulation and Application to learn gene causal relationships

Transcript Simulation and Application to learn gene causal relationships

Simulation and Application on
learning gene causal relationships
Xin Zhang
Introduction
• High-throughput genetic technologies empowers
to study how genes interact with each other;
• Simulation to evaluate how well IC algorithm
learns gene causal relationships;
• We present an algorithm (mIC algorithm) for
learning causal relationship with knowledge of
topological ordering information, and apply it on
Melanoma dataset;
• Apply mIC algorithm on Melanoma dataset;
Steps for Simulation Study
• Construct a causal network N;
• Generate datasets based on the causal network;
• Learning the simulated data using causal
algorithms (e.g. IC algorithm) to obtain network
N´;
• Compare the original network N with obtained
network N´ w.r.t precision and recall;
Modeling and simulation of a causal
Boolean network (BN)
• Boolean network:
A
B
f
C
C=f(A,B)
• Constructing a causal structure;
• Assign parameters (proper functions) for each
node with casual parents;
• Assign probability distribution;
Constructing Boolean Network
1. Generate M BNs with up to 3 causal parents for each
node;
2. For each BN, generate a random proper function for
each node;
3. Assign random probabilities for the root gene(s);
4. Given one configuration, get probability distribution;
5. Collect 200 data points for each network;
6. Repeat above steps 3-5 for all M networks.
Constructing Causal Structure
A
B
C
E
D
Steps for constructing causal structure
Proper function (1)
Proper function: The function that reflects the
influence of the operators.
Example:
By simplifying f, c is a function of a with c = a
b is a pseudo predictor of c, and has no effect on c.
f is not a proper function.
Proper function (2)
• Definition:
 With n predictors, the number of proper
function is given by:
Probability Distribution
Generating dataset
Steps of learning gene causal
relationships
• Step1: obtain the probability distribution and data
sampling;
• Step2: apply algorithms to find causal relations;
• Step3: compare the original and obtained networks
based on the two notions of precision and recall;
• Step4: repeat step 1-3 for every random network;
Comparing two networks
A
B
C
A
D
Original Network
B
C
D
Obtained Network
Precision and Recall
• Original graph is a DAG, while obtained graph
has both directed and undirected edges;
Orig Graph
Obt. Graph
FN
TP
TN
FP
PFN, PTP
PTN, PFP
Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
Observational equivalence and Transitive
Closure
• Two DAGs are said to be observational equivalent (OE) if
they have the same skeleton and the same set of vstructure;
A
B
A
B
OE
C
D
C
D
Transitive closure (TC):
A ->B -> C with A -> C
cc(x,y): is true if there is a directed or an undirected edge from x to y;
pcc(x,y): is true if there is a path from x to y consisting of properly
directed and undirected edges
pcc(x,y):= cc(x,y) | pcc(x,z)  pcc(z,y)
Result for IC algorithm
How to improve IC algorithm
• The original IC algorithm did not have good
results on learning gene causal
relationships;
• A possible way to improve the performance
is to incorporate extra information;
• If we know the topological ordering of the
regulatory network, it would be helpful to
improve the learning result;
Gene topological ordering
• If a specific gene is the causal parent of
another gene;
• In a pathway, if one gene appears before
another gene;
• If one gene is at the beginning or at the end
of the pathway;
IC algorithm + topological ordering information
mIC algorithm
• mIC algorithm based on IC, but incorporates both topological
ordering information with steady state data to infer causality;
• 3 Steps of mIC algorithm:
– Find conditional independence:
For each pair of gene gi and gj in a dataset, test pairwise conditional
independence. If they are dependent, search for a set
Sij = {gk | gi and gj are independent given gk, with i<k<j, or j<k<i}.
Construct an undirected graph G such that gi and gj are connected with an edge
if an only if they are pairwise dependent and no Sij can be found;
– Find v-structure:
For each pair of nonadjacent genes gi and gj with common neighbor gk, if gk
Sij, and k>i, k>j, add arrowheads pointing at gk, such as
gi ->gk <- gj;
– Orientate more directed edges according to rules:
Orientate the undirected edges without creating new cycles and v-structures;
Results from mIC algorithm
Melanoma dataset
• The 10 genes involved in this study chosen from 587 genes
from the melonoma data;
• Previous studies show that WNT5A has been identified as
a gene of interest involved in melanoma;
• Controlling the influence of WNT5A in the regulation can
reduce the chance of melanoma metastasizing;
Applying mIC algorithm on Melanoma
Dataset
Partial biological prior knowledge:
MMP3 is expected to be the end of the pathway
WNT5A
Pirin causatively influences WNT5A –
In order to maintain the level of
WNT5A we need to directly control
WNT5A or through pirin.
WNT5A directly causes MART-1
Conclusion
• Evaluated IC algorithm using simulation data;
• We presented mIC algorithm that can infer gene causal
relationship from steady state data with gene topological
ordering information;
• Performed simulation based on Boolean network to
evaluate the performance of the causal algorithms;
• We applied mIC algorithm to real biological microarray
data Melanoma dataset;
• The result showed that some of the important causal
relationships associated with WNT5A gene have been
identified using mIC algorithm.

Simulation and Application to learn gene causal relationships

Transcript Simulation and Application to learn gene causal relationships

Directory