Simulation and Application to learn gene causal relationships

Download Report

Transcript Simulation and Application to learn gene causal relationships

Simulation and Application on
learning gene causal relationships
Xin Zhang
Introduction
• High-throughput genetic technologies empowers
to study how genes interact with each other;
• Simulation to evaluate how well IC algorithm
learns gene causal relationships;
• We present an algorithm (mIC algorithm) for
learning causal relationship with knowledge of
topological ordering information, and apply it on
Melanoma dataset;
• Apply mIC algorithm on Melanoma dataset;
Steps for Simulation Study
• Construct a causal network N;
• Generate datasets based on the causal network;
• Learning the simulated data using causal
algorithms (e.g. IC algorithm) to obtain network
N´;
• Compare the original network N with obtained
network N´ w.r.t precision and recall;
Modeling and simulation of a causal
Boolean network (BN)
• Boolean network:
A
B
f
C
C=f(A,B)
• Constructing a causal structure;
• Assign parameters (proper functions) for each
node with casual parents;
• Assign probability distribution;
Constructing Boolean Network
1. Generate M BNs with up to 3 causal parents for each
node;
2. For each BN, generate a random proper function for
each node;
3. Assign random probabilities for the root gene(s);
4. Given one configuration, get probability distribution;
5. Collect 200 data points for each network;
6. Repeat above steps 3-5 for all M networks.
Constructing Causal Structure
A
B
C
E
D
Steps for constructing causal structure
Proper function (1)
Proper function: The function that reflects the
influence of the operators.
Example:
By simplifying f, c is a function of a with c = a
b is a pseudo predictor of c, and has no effect on c.
f is not a proper function.
Proper function (2)
• Definition:
 With n predictors, the number of proper
function is given by:
Probability Distribution
Generating dataset
Steps of learning gene causal
relationships
• Step1: obtain the probability distribution and data
sampling;
• Step2: apply algorithms to find causal relations;
• Step3: compare the original and obtained networks
based on the two notions of precision and recall;
• Step4: repeat step 1-3 for every random network;
Comparing two networks
A
B
C
A
D
Original Network
B
C
D
Obtained Network
Precision and Recall
• Original graph is a DAG, while obtained graph
has both directed and undirected edges;
Orig Graph
Obt. Graph
FN
TP
TN
FP
PFN, PTP
PTN, PFP
Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
Observational equivalence and Transitive
Closure
• Two DAGs are said to be observational equivalent (OE) if
they have the same skeleton and the same set of vstructure;
A
B
A
B
OE
C
D
C
D
Transitive closure (TC):
A ->B -> C with A -> C
cc(x,y): is true if there is a directed or an undirected edge from x to y;
pcc(x,y): is true if there is a path from x to y consisting of properly
directed and undirected edges
pcc(x,y):= cc(x,y) | pcc(x,z)  pcc(z,y)
Result for IC algorithm
How to improve IC algorithm
• The original IC algorithm did not have good
results on learning gene causal
relationships;
• A possible way to improve the performance
is to incorporate extra information;
• If we know the topological ordering of the
regulatory network, it would be helpful to
improve the learning result;
Gene topological ordering
• If a specific gene is the causal parent of
another gene;
• In a pathway, if one gene appears before
another gene;
• If one gene is at the beginning or at the end
of the pathway;
IC algorithm + topological ordering information
mIC algorithm
• mIC algorithm based on IC, but incorporates both topological
ordering information with steady state data to infer causality;
• 3 Steps of mIC algorithm:
– Find conditional independence:
For each pair of gene gi and gj in a dataset, test pairwise conditional
independence. If they are dependent, search for a set
Sij = {gk | gi and gj are independent given gk, with i<k<j, or j<k<i}.
Construct an undirected graph G such that gi and gj are connected with an edge
if an only if they are pairwise dependent and no Sij can be found;
– Find v-structure:
For each pair of nonadjacent genes gi and gj with common neighbor gk, if gk
Sij, and k>i, k>j, add arrowheads pointing at gk, such as
gi ->gk <- gj;
– Orientate more directed edges according to rules:
Orientate the undirected edges without creating new cycles and v-structures;
Results from mIC algorithm
Melanoma dataset
• The 10 genes involved in this study chosen from 587 genes
from the melonoma data;
• Previous studies show that WNT5A has been identified as
a gene of interest involved in melanoma;
• Controlling the influence of WNT5A in the regulation can
reduce the chance of melanoma metastasizing;
Applying mIC algorithm on Melanoma
Dataset
Partial biological prior knowledge:
MMP3 is expected to be the end of the pathway
WNT5A
Pirin causatively influences WNT5A –
In order to maintain the level of
WNT5A we need to directly control
WNT5A or through pirin.
WNT5A directly causes MART-1
Conclusion
• Evaluated IC algorithm using simulation data;
• We presented mIC algorithm that can infer gene causal
relationship from steady state data with gene topological
ordering information;
• Performed simulation based on Boolean network to
evaluate the performance of the causal algorithms;
• We applied mIC algorithm to real biological microarray
data Melanoma dataset;
• The result showed that some of the important causal
relationships associated with WNT5A gene have been
identified using mIC algorithm.