Mechanistic Networks: explaining gene expression data
using literature-based molecular interactions
Andreas Krämer, Stuart Tugendreich, Jeff Green; Ingenuity Systems, 1700 Seaport Blvd, 3rd Floor, Redwood City, CA 94063
Summary
The Ingenuity® Knowledge Base contains over 4.5 million findings
curated from the scientific literature and third-party databases, and
constitutes the basis for a large-scale network of molecular
interactions. Most of these interactions represent direct or indirect
causal relationships between genes and chemicals in various
experimental situations. We present a method to identify causal
interactions that are likely relevant in the context of a given gene
expression data set, and construct regulatory networks upstream of
the genes whose expression has been observed to change. This
identifies potential molecular signaling mechanisms that explain the
observed expression changes.
Identification of causal relationships
between upstream regulators that are
likely implicated in the data set
Building mechanistic networks from
upstream regulators and upstream causal
edges
Key idea: Relevant causal edges (A→B in the diagram below) are
enriched in “causal triangles” with respect to regulated genes in the
data set:
We identify possible signaling cascades that connect an upstream
regulator to the up- or down-regulated genes through several steps.
Hypothesis networks are constructed “top down”, starting with any
identified regulator as the root node. The expectation is that the root
node is an indirect regulator (like a ligand or receptor) while the
“bottom layer” of the network contains transcription factors that are
directly connected to the data through expression edges. Expression
edges to the data are not included in the networks shown below.
[Diagram: causal triangle. A, B: upstream regulators; C: data set gene; expression edges lead into C.]
Ingenuity Knowledge Base and literature-based global causal network
Given the presence of a “causal mechanism” A→B→C, it is likely
(with sufficient coverage) that the causal effect A→C has also
been observed and is present in the causal network.
Method:
[Diagram: literature findings + public databases → network of cause-effect relationships]
• Ingenuity Knowledge Base contains ~4.5M findings from the
biomedical literature
• Causal network with ~39000 nodes and ~116000 edges
• Nodes represent genes, chemicals, microRNA
• Edges represent cause-effect relationships (experimental
observations) and binding events
• Edge types: expression/transcription, molecular modification,
activation/inhibition, proteolysis, localization etc.
• Edges are associated with direction of effect (increase/decrease)
• Many edges represent indirect cause-effect relationships that
were observed in various contexts (tissue, cell-type etc).
Inference of upstream regulators from
gene expression data
Given a data set of up- and down-regulated genes, determine
upstream molecules in the causal network that are connected to
data set molecules through transcription or expression edges.
1) Determine the set S of upstream regulators that pass a given
overlap p-value cutoff.
2) For each causal edge between any pair of regulators in S,
calculate an edge p-value based on the overlap between the
corresponding regulated genes in the data set (Fisher’s Exact Test
(FET) p-value, with the data set as universe).
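The edge p-value in step 2 can be sketched as a Fisher’s Exact Test on the two regulators' sets of regulated data set genes. This is a minimal illustration, not the IPA implementation: the function names and toy gene identifiers are hypothetical, and the right-tailed form of the test is an assumption carried over from the overlap p-value.

```python
from math import comb


def fisher_right_tail(k, n_a, n_b, n_universe):
    """Right-tailed Fisher's Exact Test: P(overlap >= k).

    n_a and n_b are the sizes of two subsets of a universe of n_universe
    items; k is the size of their observed intersection.
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    # Hypergeometric tail: sum the probability of every overlap size >= k.
    tail = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def edge_p_value(targets_a, targets_b, dataset_genes):
    """Edge p-value for a causal edge A->B (hypothetical helper).

    Assumption: only genes in the data set count, i.e. the data set
    is the universe, as stated in step 2.
    """
    universe = set(dataset_genes)
    a = set(targets_a) & universe
    b = set(targets_b) & universe
    return fisher_right_tail(len(a & b), len(a), len(b), len(universe))


# Toy example with made-up gene identifiers.
dataset = [f"g{i}" for i in range(10)]
p = edge_p_value(dataset[:5], dataset[:4], dataset)
print(f"edge p-value: {p:.4f}")
```

A low value means the two regulators share more regulated data set genes than expected by chance.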
For a network with “breadth” N and “depth” K:
1) Starting from any upstream regulator, select N regulators that are
connected downstream through edges with lowest edge p-values.
2) For each of those regulators perform Step 1 recursively. Stop if
maximal path length K is reached. Avoid cycles.
3) Build network from union of all paths.
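The three steps above can be sketched as a short recursive routine. This is an illustration under stated assumptions, not the IPA implementation: the function name, the dictionary representation of edge p-values, and the regulator names are hypothetical, and it assumes that nodes already on the current path are excluded before the N best edges are chosen.

```python
def build_mechanistic_network(root, edge_p, breadth, depth):
    """Return the set of edges in the network rooted at `root`.

    edge_p maps (source, target) pairs of upstream regulators to their
    edge p-values. `breadth` is N and `depth` is K (maximal path length,
    counted in edges).
    """
    downstream = {}
    for (src, dst), p in edge_p.items():
        downstream.setdefault(src, []).append((p, dst))

    network = set()

    def extend(node, path):
        # Step 2: stop once the maximal path length K is reached.
        if len(path) - 1 >= depth:
            return
        # Avoid cycles: skip regulators already on the current path.
        candidates = sorted(
            (p, dst) for p, dst in downstream.get(node, []) if dst not in path
        )
        # Step 1: keep the N downstream regulators with lowest edge p-values.
        for _, dst in candidates[:breadth]:
            network.add((node, dst))  # Step 3: union of all paths
            extend(dst, path + (dst,))

    extend(root, (root,))
    return network


# Toy input with made-up regulator names and p-values.
toy_edges = {
    ("ROOT", "R1"): 0.001,
    ("ROOT", "R2"): 0.01,
    ("ROOT", "R3"): 0.1,
    ("R1", "R4"): 0.02,
    ("R1", "ROOT"): 0.0001,  # would close a cycle, so it is skipped
    ("R4", "R5"): 0.03,      # beyond depth 2, so it is not reached
}
print(sorted(build_mechanistic_network("ROOT", toy_edges, breadth=2, depth=2)))
```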
Example 1: A mechanistic network for the
estradiol data set
[Diagram: regulators A and B and the data set]
Causal relationships AB with significant overlap (low edge pvalue) are more likely part of a regulation mechanism that can
explain the observed data.
Example 2: Primary human endothelial cells
stimulated with TNF-α
D. Viemann et al., J. Leukoc Biol 2006, 80(1): 174-185 GSE2639
Example 1: Top-scoring causal edges (low
edge p-values) for estradiol data set
Regulator A | Regulator B | Edge p-value
Top-scoring upstream regulators:
Regulators can be:
• Transcription factors
• Any molecule (incl. endogenous
chemicals, drugs, microRNA) (using
indirect expression findings)
Mechanistic network for TNF as upstream regulator identifies the known
NF-κB pathway.
A. Overlap p-value (used as a score)
Measures the significance of the overlap between the expression
pattern and the genes affected by a given regulator
(Fisher’s Exact Test, right-tailed)
B. Activation z-score
Infer activation state of regulator by testing for match
in up/down regulation pattern (z-score)
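The activation z-score in B can be illustrated with a simple unweighted sign statistic. The poster does not give the formula, so the form used here (sum of signed matches divided by the square root of their number) is an assumption, and the function name and gene identifiers are hypothetical.

```python
from math import sqrt


def activation_z_score(edge_signs, observed):
    """Unweighted activation z-score (assumed form, see lead-in).

    edge_signs: gene -> +1 if the regulator increases the gene, -1 if it
                decreases it (direction of effect on the expression edge).
    observed:   gene -> +1 if the gene is up-regulated in the data set,
                -1 if it is down-regulated.
    """
    shared = [g for g in edge_signs if g in observed]
    if not shared:
        return 0.0
    # +1 when the observation matches an activated regulator, -1 otherwise.
    matches = sum(edge_signs[g] * observed[g] for g in shared)
    # Under a random-sign null model this sum has variance len(shared).
    return matches / sqrt(len(shared))


# Toy example with made-up genes: three matches, one mismatch.
signs = {"g1": +1, "g2": +1, "g3": -1, "g4": +1}
data = {"g1": +1, "g2": +1, "g3": -1, "g4": -1, "g5": +1}
print(activation_z_score(signs, data))
```

In this form a positive score suggests an activated regulator and a negative score an inhibited one.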
Applications:
• Generate hypotheses about mechanism of action
• Find potential regulators with similar response
• Find potential regulators with opposite response
Inferred upstream causal edges are
enriched in canonical pathways
Consider signaling pathways in the Ingenuity Pathway Library and
overlay relationships from the global causal network. Example:
Example 1: Estradiol exposure in MCF-7
breast cancer cells
CY Lin et al., PLoS Genet 2007, 3(6):e87 GSE11352
Top-scoring upstream regulators ordered by overlap p-value:
• pool all 321 signaling pathways in one
network
• use edge p-value to predict actual
pathway edges from all overlaid edges
• ROC for estradiol data set [plot: true positive rate vs. false positive rate]
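A ROC curve of this kind can be computed by ranking overlaid edges by edge p-value and checking each against the set of actual pathway edges. The sketch below uses made-up scores and labels, hypothetical function names, and ignores tied p-values; it is not the analysis code behind the poster.

```python
def roc_points(scored_edges):
    """ROC points from (edge_p_value, is_pathway_edge) pairs.

    Edges are ranked by ascending p-value (lowest p-value = strongest
    prediction). Returns (false positive rate, true positive rate)
    points. Assumes at least one true and one false edge.
    """
    ranked = sorted(scored_edges, key=lambda item: item[0])
    n_pos = sum(1 for _, is_true in ranked if is_true)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy input: made-up edge p-values and pathway-membership labels.
toy = [(0.001, True), (0.01, True), (0.02, False), (0.2, True), (0.5, False)]
pts = roc_points(toy)
print(pts)
print(f"AUC = {auc(pts):.3f}")
```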
Upstream Regulator Analysis and
Mechanistic Networks are new
features available in Ingenuity®
Pathway Analysis (IPA®)