Inferring subnetworks from perturbed expression profiles
Dana Pe’er, Aviv Regev, Gal Elidan and Nir Friedman
Bioinformatics, Vol. 17, Suppl. 1, 2001
Motivation
• Expression profiles give genome-wide information about the state of metabolism, gene regulation, signal transduction, etc.
• One would like to infer functional relationships between the genes from these data.
• Perturbations such as mutations give insight into the effects of particular genes and help us infer causal relationships.
Tool – Bayesian Networks
[Figure: a simple Bayesian network in which Z is a parent of both X and Y, with conditional distributions Pr(X|Z) and Pr(Y|Z)]
• Random Variables: Gene Expression Levels
• Probabilistic Dependencies: Regulatory Interactions
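To make the "variables = expression levels, edges = regulatory interactions" idea concrete, here is a minimal Python sketch (not from the paper) of the network in the figure above. The three-level discretization and all probability values are illustrative assumptions only.

```python
# Structure of the figure above: Z is a parent of both X and Y.
parents = {"Z": [], "X": ["Z"], "Y": ["Z"]}

# Conditional probability tables over discretized levels {-1, 0, 1}
# (under-expressed, unchanged, over-expressed); the numbers are made up.
levels = (-1, 0, 1)
cpd_Z = {(): {-1: 0.2, 0: 0.6, 1: 0.2}}
cpd_X = {(z,): {-1: 0.7, 0: 0.2, 1: 0.1} if z == -1 else
         {-1: 0.1, 0: 0.2, 1: 0.7} if z == 1 else
         {-1: 0.2, 0: 0.6, 1: 0.2}
         for z in levels}
cpd_Y = dict(cpd_X)  # for brevity, Y responds to Z the same way as X
cpds = {"Z": cpd_Z, "X": cpd_X, "Y": cpd_Y}

def joint_probability(assignment):
    """P(X, Y, Z) factorizes as a product of local conditionals P(X_i | Pa_i)."""
    p = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpds[var][pa_values][assignment[var]]
    return p

print(joint_probability({"Z": 1, "X": 1, "Y": 0}))  # 0.2 * 0.7 * 0.2 = 0.028
```

The joint distribution factorizes into a product of local conditionals, which is what lets the scores on the next slides decompose gene by gene.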
Goal of Paper
• Extend the Bayesian framework in a cellular context to deal with mutations
• Develop better methods to discretize data
• Define and learn new features in the model, such as mediators, activators, and inhibitors
• Construct subnetworks of strong statistical significance
Learning Networks
• The network is learned by maximizing a score function with respect to the collected data:

S(G : D) = \sum_i S_{local}(X_i, Pa_i^G : D)

S_{local}(X_i, U : D) = \log P(Pa_i = U) + \log \int \prod_m P(X_i[m] \mid U[m], \theta) \, dP(\theta)

D = data; G = graph; Pa_i^G = parents of X_i in G; X_i[m] = expression level of gene i in sample m
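When expression levels are discretized and a Dirichlet prior is used, the integral in S_local has a closed form. The sketch below computes that closed form in the BDeu style; the uniform prior, the `alpha` equivalent sample size, and the omission of the structure-prior term log P(Pa_i = U) are simplifying assumptions of this sketch, not the paper's exact choices.

```python
import math
from collections import Counter, defaultdict

def local_score(child, parents, data, arity, alpha=1.0):
    """Closed-form BDeu-style value of
       log ∫ ∏_m P(X_i[m] | U[m], θ) dP(θ)
    for a discretized child with a Dirichlet prior.
    `data` is a list of dicts mapping variable -> discrete value,
    `arity` maps variable -> number of states, `alpha` is the
    equivalent sample size (an assumption, not the paper's choice)."""
    # Count N_jk: occurrences of each (parent configuration j, child state k).
    counts = defaultdict(Counter)
    for sample in data:
        j = tuple(sample[p] for p in parents)
        counts[j][sample[child]] += 1

    q = 1
    for p in parents:
        q *= arity[p]                      # number of parent configurations
    a_jk = alpha / (q * arity[child])      # per-cell pseudo-count
    a_j = alpha / q                        # per-configuration pseudo-count

    score = 0.0
    for j, child_counts in counts.items():
        n_j = sum(child_counts.values())
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in child_counts.values():
            score += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return score
```

The full graph score is then the sum of such local scores over all genes plus the log structure prior, and it is this sum that the structure search maximizes.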
Equivalent Graphs
• Two graphs may imply the same dependencies; such graphs are called equivalent. For example, X → Y → Z and X ← Y ← Z both imply only that X and Z are independent given Y.
[Figure: two equivalent directed graphs over X, Y, Z]
• So instead of directed graphs we use partially directed graphs, in which edges whose direction is not determined by the data are left undirected.
[Figure: the corresponding partially directed graph over X, Y, Z]
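Equivalence can be checked structurally: by a standard result (Verma and Pearl), two DAGs are equivalent exactly when they have the same skeleton and the same v-structures. A small self-contained sketch (not the paper's code):

```python
def skeleton(dag):
    """dag maps each node to the list of its parents."""
    return {frozenset((u, v)) for v, ps in dag.items() for u in ps}

def v_structures(dag):
    """v-structures: X -> Z <- Y with X and Y not adjacent."""
    skel = skeleton(dag)
    vs = set()
    for child, ps in dag.items():
        ps = sorted(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                if frozenset((ps[i], ps[j])) not in skel:  # parents not adjacent
                    vs.add((ps[i], child, ps[j]))
    return vs

def equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

chain    = {"X": [], "Y": ["X"], "Z": ["Y"]}     # X -> Y -> Z
reverse  = {"X": ["Y"], "Y": ["Z"], "Z": []}     # X <- Y <- Z
collider = {"X": [], "Z": [], "Y": ["X", "Z"]}   # X -> Y <- Z

print(equivalent(chain, reverse))    # True: same skeleton, no v-structures
print(equivalent(chain, collider))   # False: the collider adds a v-structure
```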
Learning with Mutations
• If gene X is mutated, we replace its expression level by a constant. For example, if X is knocked out, its expression is replaced by 0.
• Our new score function is:

S_{local}(X_i, U : D) = \log P(Pa_i = U) + \log \int \prod_{m : X_i \notin Int(m)} P(X_i[m] \mid U[m], \theta) \, dP(\theta)

where Int(m) is the set of "intervened" (mutated) variables in experiment m.
• Notice that two structurally equivalent graphs are no longer guaranteed to get the same score. If two graphs get the same score under this scoring function, they are called "intervention equivalent."
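In code, the change is small: when scoring gene X_i, simply drop the samples in which X_i itself was perturbed. The sketch below reuses the hypothetical `local_score` from the earlier sketch; `interventions[m]` plays the role of Int(m).

```python
def intervened_local_score(child, parents, data, interventions, arity, alpha=1.0):
    """Same closed-form local score as before, but samples in which the child
    itself was perturbed are dropped from its counts: a clamped gene tells us
    nothing about how it would have responded to its parents."""
    kept = [sample for m, sample in enumerate(data)
            if child not in interventions[m]]
    return local_score(child, parents, kept, arity, alpha)
```

Note that a perturbed gene still appears, with its clamped value, in the parent configurations used to score its children.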
Other Perturbations
• Temperature sensitivity, kinetic mutations, and environmental stress can also be modeled in the Bayesian network framework.
• A node is added for each condition, which can take the values "on" or "off."
[Figure: a "Temperature" condition node added to the network over X, Y, and Z]
What Do Bayesian Networks Buy Us?
1) Markov Neighbors (Direct Relationships)
[Figure: X and Y are Markov neighbors when X is a parent of Y, when Y is a parent of X, or when X and Y share a common child Z]
2) Activators/Inhibitors
[Figure: Y with parents A, X, and B]
Let U = Parents(Y) \ {X}. If, for all states u of U, P(Y = 1 | X, u) is increasing in X, then we say X is an activator of Y; if P(Y = 1 | X, u) is decreasing in X, then we say X is an inhibitor of Y.
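A sketch of this monotonicity test, assuming the learned conditional distribution of Y is available as a function `cond_prob(y_val, x_val, u)`; the function name and the use of non-strict monotonicity are assumptions of this sketch.

```python
from itertools import product

def classify_regulator(x, y_parents_minus_x, states, cond_prob):
    """Classify X as an activator or an inhibitor of Y per the definition above.
    `cond_prob(y_val, x_val, u)` is assumed to return P(Y = y_val | X = x_val, U = u)
    from the learned network; `states` maps each variable to its ordered list of
    discrete values (e.g., [-1, 0, 1])."""
    u_states = list(product(*(states[p] for p in y_parents_minus_x)))
    x_states = states[x]

    def monotone(holds):
        # The condition must hold for every state u of U and every step up in X.
        return all(holds(cond_prob(1, x_states[i + 1], u),
                         cond_prob(1, x_states[i], u))
                   for u in u_states
                   for i in range(len(x_states) - 1))

    if monotone(lambda higher_x, lower_x: higher_x >= lower_x):
        return "activator"
    if monotone(lambda higher_x, lower_x: higher_x <= lower_x):
        return "inhibitor"
    return "neither"
```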
3) d-Separation: Mediators
[Figure: a graph over X, Y, Z, and U in which Z and U separate X from Y]
Both Z and U d-separate X and Y. In this framework they are called mediators of X and Y.
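d-separation itself is purely graphical and can be tested with the standard moralized-ancestral-graph construction; the sketch below (not the paper's code) checks whether a conditioning set separates X from Y.

```python
from collections import deque

def d_separated(dag, x, y, given):
    """Return True if `given` d-separates x from y in `dag` (node -> parent list),
    using the moralized-ancestral-graph construction."""
    # 1. Restrict to x, y, the conditioning set, and their ancestors.
    relevant, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(dag[node])
    # 2. Moralize: connect co-parents, then drop edge directions.
    undirected = {n: set() for n in relevant}
    for child in relevant:
        ps = list(dag[child])
        for p in ps:
            undirected[p].add(child)
            undirected[child].add(p)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                undirected[ps[i]].add(ps[j])
                undirected[ps[j]].add(ps[i])
    # 3. Remove the conditioning nodes and test reachability from x to y.
    blocked = set(given)
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        for nb in undirected[node]:
            if nb == y:
                return False          # still connected: not d-separated
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                queue.append(nb)
    return True

# Mediator example: if every path from X to Y goes through Z (e.g., X -> Z -> Y),
# then Z d-separates X and Y, i.e. Z is a mediator.
chain = {"X": [], "Z": ["X"], "Y": ["Z"]}
print(d_separated(chain, "X", "Y", ["Z"]))   # True
print(d_separated(chain, "X", "Y", []))      # False
```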
Feature Confidence
• A confidence can be associated with each feature, measuring how sure we are about the truth of the detected feature. This confidence is given by:

P(f(G) \mid D) \approx \sum_{\text{high-scoring } G} f(G) \, P(G \mid D)

where f(G) is the indicator function of the feature of interest.
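Since the sum over high-scoring graphs cannot be enumerated, a common way to approximate it is a non-parametric bootstrap. A minimal sketch, assuming a `features_of(data)` routine (not shown) that learns a network from a dataset and returns the set of features of the best graph found:

```python
import random

def feature_confidence(data, features_of, n_boot=200, seed=0):
    """Bootstrap-style estimate of P(f(G) | D): resample the experiments with
    replacement, relearn a network on each resample, and report the fraction of
    resamples in which each feature (e.g., an edge or Markov pair) appears."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        for feature in features_of(resample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: c / n_boot for feature, c in counts.items()}
```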
Building Significant Subnetworks
1) Naïve Approach:
For some threshold T, find all edges whose confidence is above T. For each maximally connected subgraph of size greater than 3, grow out the graph by adding edges whose confidence is greater than some weaker threshold S.
[Figure: a seed subnetwork over A, B, X, Y, and Z]
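A sketch of this seed-and-grow procedure; the threshold values and the exact growth rule (adding weaker edges that touch the seed) are illustrative assumptions.

```python
def naive_subnetworks(edges_with_conf, strong=0.8, weak=0.5, min_size=4):
    """Keep edges above the strong threshold T, take connected components
    ("seeds") with more than 3 nodes, then grow each seed with edges above the
    weaker threshold S that touch it. `edges_with_conf` is a list of
    (u, v, confidence) triples; the thresholds here are made up."""
    strong_edges = [(u, v) for u, v, c in edges_with_conf if c >= strong]

    # Connected components of the strong-edge graph (depth-first search).
    adj = {}
    for u, v in strong_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, seeds = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            seeds.append(comp)

    # Grow each seed by adding weaker edges incident to it.
    grown = []
    for comp in seeds:
        kept = {(u, v) for u, v in strong_edges if u in comp and v in comp}
        extra = {(u, v) for u, v, c in edges_with_conf
                 if weak <= c < strong and (u in comp or v in comp)}
        nodes = comp | {n for edge in extra for n in edge}
        grown.append((nodes, kept | extra))
    return grown
```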
2) Score-based Approach
They want to build a subnetwork and associate with it a score measuring the network's significance. If we build a network with k nodes out of a possible n nodes and include l edges, the score we assign the network is:

\binom{n}{k} \binom{K}{l} \prod_i F(c_i)

where K = \binom{k}{2} is the number of possible edges on k nodes, c_i is the confidence of edge i, and F(x) is the probability that an edge has confidence greater than or equal to x.
F(x) is estimated empirically by counting the fraction of edges with confidence greater than x.
Using this criterion, networks are built from seeds as in the naïve approach and grown one node at a time.
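A sketch of this scoring formula as written on the slide, with F estimated empirically from the full set of edge confidences; reading smaller values as "harder to obtain by chance" is an interpretation of this sketch, not a quote from the paper.

```python
from bisect import bisect_left
from math import comb

def make_F(all_confidences):
    """Empirical F(x): fraction of candidate edges with confidence >= x."""
    sorted_conf = sorted(all_confidences)
    n = len(sorted_conf)
    return lambda x: (n - bisect_left(sorted_conf, x)) / n

def subnetwork_score(n_total, k_nodes, edge_confidences, F):
    """Significance score for a candidate subnetwork with k nodes (out of
    n_total) and l edges of confidences c_i:
        C(n, k) * C(K, l) * prod_i F(c_i),   where K = C(k, 2).
    Smaller values suggest a subnetwork that is harder to obtain by chance."""
    l = len(edge_confidences)
    K = comb(k_nodes, 2)
    score = comb(n_total, k_nodes) * comb(K, l)
    for c in edge_confidences:
        score *= F(c)
    return score
```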
Data
• The Rosetta Inpharmatics Compendium
• Organism: S. cerevisiae
• 300 genome-wide expression profiles (experiments)
• 276 deletion mutants
• 11 tetracycline-regulatable alleles
• 13 chemically treated cultures
• In this paper, 565 genes were analyzed
Pairwise Relations
• The method can recognize functional relationships missed by similarity measures. Scores are reported as (confidence, Pearson correlation).
• Purine biosynthesis pair: ADE2–ADE1, (0.797, 0.518)
• Novel prediction: ESC4 (chromatin silencing)–KU70 (DNA break repair), (0.914, 0.162)
• A literature search reveals strong support for this interaction.
Separator Relations
• Transcription regulators: nuclear fusion
• Post-translational activation (by phosphorylation): cell wall integrity pathway
• Post-translational negative regulation: G-protein mating signalling pathway
[Figure: example separator relations among KAR4, FUS1, AGA1, TEC1, STE6, SST2, SLT2, Rlm1p, and Swi4/6]
Subnetwork Analysis
[Figure: a learned subnetwork containing KAR4, SST2, TEC1, SLT2, KSS1, YLR343W, YLR334C, STE6, FUS1, PRM1, AGA1, AGA2, TOM6, FIG1, FUS3, and YEL059W]
• They claim they often recover modular components
• More structure than clustering alone
• Visual inspection can give clues to unknown gene functions
• STE12 is missing and the marginal position of FUS3 is disturbing
• http://www.cs.huji.ac.il/labs/compbio/ismb01/
Conclusions
• This technique is better than clustering alone because confidence measures can detect interactions previously undetected. Also, we get more specific information about the structure of interaction networks, so it is easier to guess at unknown gene functions.
• Statistical significance of features allows biological exploration of the interaction network.
• Cannot recover all interactions
• No incorporation of prior biological knowledge