Network inference from repeated observations of node sets
Neil Clark, Avi Ma'ayan
Network Inference
[Figures: protein-protein interaction network; cell signaling network]
Overview
• Network inference: the deduction of an underlying network of interactions from indirect data.
1. A general class of network inference problems
2. Network inference approach
3. Applications:
1. Inference of physical interactions: PPI
2. Inference of gene associations: stem cell genes
3. Inference of statistical interactions: drug/side-effect network
GMT files
The inference problem
• Input: sets of entities (genes or proteins or ...) in the form of a GMT file, i.e. the results of experiments or, more generally, of sampling.
• Assumptions:
• 1. An underlying network of interactions exists between the entities in the GMT file.
• 2. Each line of the GMT file carries information about the connectivity of the underlying network.
• The problem: given a GMT file, can we extract enough information to resolve the underlying network?
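The input format can be made concrete. In the standard GMT layout each line is tab-separated: a set name, a description, then the members of the set. The sketch below reads such text into a list of node sets; the function name `parse_gmt` and the example experiments are hypothetical, and repeated set names are kept as separate observations (an assumption about how repeats should be handled).

```python
from typing import FrozenSet, List, Tuple


def parse_gmt(text: str) -> List[Tuple[str, FrozenSet[str]]]:
    """Parse GMT-formatted text into a list of (set_name, members) pairs.

    Each non-empty line is tab-separated: set name, description, then the
    members of the set. Repeated set names are kept as separate observations.
    """
    node_sets: List[Tuple[str, FrozenSet[str]]] = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3:
            continue  # blank or malformed line: no members to record
        name, _description, *members = fields
        node_sets.append((name, frozenset(m for m in members if m)))
    return node_sets


if __name__ == "__main__":
    # Two hypothetical pull-down experiments written as GMT lines.
    gmt_text = "exp1\tbait A\tA\tB\tC\nexp2\tbait B\tB\tC\tD\n"
    for name, members in parse_gmt(gmt_text):
        print(name, sorted(members))
```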
A synthetic example
Approach
• Forget for the moment that we know the underlying network and pretend we have only the GMT file.
• Use the accumulation of our coarse data to infer the fine details of the underlying network.
• Consider the set of all networks that are consistent with our data; there are likely to be many.
• Use an algorithm to sample this ensemble of networks at random.
• The mean adjacency matrix of the sampled networks gives the probability of each link being present within the ensemble.
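The slides do not describe the sampling algorithm itself, so the following is only a minimal sketch under a stated assumption: "consistent with our data" is read as "every node set induces a connected subgraph". One simple (hypothetical) way to draw such a network is to wire each node set with a random spanning tree, decoded from a random Prüfer sequence, and take the union over all sets. This guarantees consistency but does not sample uniformly from all consistent networks. Averaging the sampled adjacency matrices gives the per-link probabilities; `random_spanning_tree` and `sample_mean_adjacency` are illustrative names.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple

Edge = FrozenSet[str]


def random_spanning_tree(nodes: Sequence[str], rng: random.Random) -> List[Tuple[str, str]]:
    """Draw a uniformly random labeled tree on `nodes` via a random Prüfer sequence."""
    n = len(nodes)
    if n < 2:
        return []
    prufer = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    edges: List[Tuple[str, str]] = []
    for v in prufer:
        # Attach the smallest-index current leaf to v, then retire that leaf.
        leaf = min(i for i in range(n) if degree[i] == 1)
        edges.append((nodes[leaf], nodes[v]))
        degree[leaf] -= 1
        degree[v] -= 1
    # Exactly two nodes of degree 1 remain; joining them completes the tree.
    u, w = [i for i in range(n) if degree[i] == 1]
    edges.append((nodes[u], nodes[w]))
    return edges


def sample_mean_adjacency(
    node_sets: Iterable[Iterable[str]], n_samples: int, seed: int = 0
) -> Dict[Edge, float]:
    """Estimate each edge's probability as its frequency across sampled networks."""
    sets = [sorted(set(s)) for s in node_sets]
    rng = random.Random(seed)
    counts: Dict[Edge, int] = {}
    for _ in range(n_samples):
        network: Set[Edge] = set()  # one sampled network: union of one tree per set
        for members in sets:
            for a, b in random_spanning_tree(members, rng):
                network.add(frozenset((a, b)))
        for edge in network:
            counts[edge] = counts.get(edge, 0) + 1
    # Sparse mean adjacency matrix: pairs never sampled have probability 0.
    return {edge: c / n_samples for edge, c in counts.items()}


if __name__ == "__main__":
    observed = [{"A", "B", "C"}, {"B", "C", "D"}, {"B", "C"}]
    probs = sample_mean_adjacency(observed, n_samples=20000)
    for edge, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(sorted(edge), round(p, 3))
```

With these toy sets, B and C are always linked because the two-node set forces that edge, A and D are never linked because they never co-occur, and the remaining pairs each appear in one three-node set and come out near 2/3.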
Inference live!
Information content
Analytic Approximation
• When this approach is applied to real data, there are typically large numbers of nodes.
• The sample space of networks can then be very large, which makes sampling computationally demanding.
• We therefore write a simple analytic approximation that mimics the action of the sampling algorithm:
$$ p_{ij} = 1 - \prod_{k} \left( 1 - \frac{2\alpha}{n_{ij}^{k}} \right) $$
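The analytic approximation above can be evaluated directly from the node sets. The sketch below assumes that $n_{ij}^{k}$ is the size of the k-th node set containing both i and j and that α is a user-chosen parameter; neither reading is spelled out in the slides. Clipping $2\alpha/n$ to at most 1 is an added assumption that keeps every factor a valid probability when α is large or sets are small. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def analytic_edge_probabilities(
    node_sets: Iterable[Iterable[str]], alpha: float = 1.0
) -> Dict[Pair, float]:
    """p_ij = 1 - prod_k (1 - 2*alpha / n_ij^k), reading n_ij^k as the size
    of the k-th node set that contains both i and j (an interpretation)."""
    # Running product of "no link contributed by this set" factors per pair.
    not_linked: Dict[Pair, float] = defaultdict(lambda: 1.0)
    for node_set in node_sets:
        members = sorted(set(node_set))
        n = len(members)
        if n < 2:
            continue  # a singleton carries no information about links
        # Assumption: clip so each factor stays a probability in [0, 1].
        q = min(1.0, 2.0 * alpha / n)
        for i, j in combinations(members, 2):
            not_linked[(i, j)] *= 1.0 - q
    return {pair: 1.0 - s for pair, s in not_linked.items()}


if __name__ == "__main__":
    observed = [{"A", "B", "C"}, {"B", "C", "D"}, {"B", "C"}]
    for pair, p in sorted(analytic_edge_probabilities(observed).items()):
        print(pair, round(p, 3))
```

Only pairs that co-occur in at least one set are stored, so the result is a sparse stand-in for the full matrix; this keeps memory proportional to the observed co-memberships rather than to the square of the number of nodes.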
Compare analytic approximation
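A possible reading of the factor $2\alpha/n_{ij}^{k}$, under the assumption (not stated in the slides) that the sampler connects each node set of size $n$ with a uniformly random spanning tree $T$: a spanning tree has $n-1$ edges among $\binom{n}{2}$ possible pairs, and by symmetry every pair is equally likely to be one of them. If the trees $T_k$ for different sets are drawn independently, a link is absent only when it is absent from every tree:

$$ \Pr\big[(i,j) \in T\big] = \frac{n-1}{\binom{n}{2}} = \frac{n-1}{n(n-1)/2} = \frac{2}{n}, \qquad p_{ij} = 1 - \prod_{k} \Pr\big[(i,j) \notin T_k\big] = 1 - \prod_{k} \left( 1 - \frac{2}{n_{ij}^{k}} \right) $$

Under this reading the analytic approximation coincides with the sampled edge frequencies at α = 1, and α acts as a knob that scales the per-set edge density away from the spanning-tree case.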
Correction for sampling bias
• Destroy the information in the GMT file by randomly permuting it, then compare the actual weight of each edge with the distribution of its weights across the randomly permuted GMT files.
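The slides do not say how the GMT file is permuted or how the comparison is summarized. The sketch below assumes one common choice: pool every (set, member) slot, shuffle, and deal the members back into sets of the original sizes, which preserves how often each node is observed while destroying which nodes co-occur. A node dealt twice into the same set collapses to one entry, so sizes are only approximately preserved. Each observed edge is then scored by an empirical p-value, the fraction of permuted files giving that pair at least the observed weight. The weight function re-implements the analytic approximation with the same interpretation of $n_{ij}^{k}$ as a set size; all function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]


def edge_weights(node_sets: Sequence[Set[str]], alpha: float = 1.0) -> Dict[Pair, float]:
    """Edge weight 1 - prod_k (1 - 2*alpha/n_k) over the sets containing each pair."""
    survive: Dict[Pair, float] = defaultdict(lambda: 1.0)
    for s in node_sets:
        if len(s) < 2:
            continue
        q = min(1.0, 2.0 * alpha / len(s))
        for i, j in combinations(sorted(s), 2):
            survive[(i, j)] *= 1.0 - q
    return {pair: 1.0 - v for pair, v in survive.items()}


def permute_memberships(node_sets: Sequence[Set[str]], rng: random.Random) -> List[Set[str]]:
    """Shuffle all memberships and deal them back into sets of the original sizes."""
    pool = [node for s in node_sets for node in s]
    rng.shuffle(pool)
    permuted: List[Set[str]] = []
    start = 0
    for s in node_sets:
        # Duplicate draws of one node within a set collapse to a single member.
        permuted.append(set(pool[start:start + len(s)]))
        start += len(s)
    return permuted


def permutation_pvalues(
    node_sets: Sequence[Set[str]], n_perm: int = 200, alpha: float = 1.0, seed: int = 0
) -> Dict[Pair, float]:
    """Empirical p-value per observed edge, with +1 smoothing so it is never 0."""
    rng = random.Random(seed)
    observed = edge_weights(node_sets, alpha)
    exceed = {pair: 0 for pair in observed}
    for _ in range(n_perm):
        null = edge_weights(permute_memberships(node_sets, rng), alpha)
        for pair, w in observed.items():
            if null.get(pair, 0.0) >= w:  # absent pairs have null weight 0
                exceed[pair] += 1
    return {pair: (exceed[pair] + 1) / (n_perm + 1) for pair in observed}
```

Small p-values flag pairs whose co-occurrence is hard to explain by node frequency alone, which is the kind of bias (for example, nodes that show up in many experiments) such a correction is meant to address.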
Application to Infer PPIs
Malovannaya A et al. Analysis of the human endogenous coregulator complexome.
Cell. 2011 May 27;145(5):787-99
PPI network
Validation
• Compare the inferred PPI network with the following databases:
– BioCarta
– HPRD PPI
– InnateDB
– IntAct
– KEGG
– MINT mammalia
– MIPS
– BioGrid
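The slides do not state which agreement measure was used against these databases. As an illustration only, the sketch below scores a ranked list of inferred edges against a reference set of known interactions with two standard measures: precision among the top k edges, and ROC AUC computed as the probability that a random reference edge outscores a random non-reference edge (ties count one half). Restricting the comparison to pairs that received a score is an assumption, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def precision_at_k(scores: Dict[Edge, float], reference: Set[Edge], k: int) -> float:
    """Fraction of the k highest-scoring inferred edges found in the reference."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(edge in reference for edge in top) / len(top) if top else 0.0


def roc_auc(scores: Dict[Edge, float], reference: Set[Edge]) -> float:
    """Probability that a reference edge outscores a non-reference edge (ties = 1/2)."""
    pos = [s for e, s in scores.items() if e in reference]
    neg = [s for e, s in scores.items() if e not in reference]
    if not pos or not neg:
        raise ValueError("need both reference and non-reference edges")
    # Direct pairwise count; fine for a sketch, O(len(pos) * len(neg)).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    e = lambda a, b: frozenset((a, b))
    inferred = {e("A", "B"): 0.9, e("A", "C"): 0.8, e("B", "C"): 0.1, e("B", "D"): 0.05}
    known = {e("A", "B"), e("B", "C")}  # toy stand-in for a database's PPIs
    print(precision_at_k(inferred, known, 2), roc_auc(inferred, known))
```

Storing edges as frozensets makes the comparison insensitive to the order in which a database lists the two interactors.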
Comparison
Validation
Application to stem cells
• We used two types of high-throughput data from the ESCAPE database (www.maayanlab.net/ESCAPE).
• ChIP-X data: from ChIP-chip and ChIP-seq experiments.
– 203,190 protein-DNA binding interactions in the proximity of coding regions from 48 ESC-relevant source proteins.
• Logof followed by microarray data: a manually compiled database of protein-mRNA regulatory interactions derived from loss-of-function and gain-of-function experiments followed by microarray profiling.
– 154,170 interactions from 16 ESC-relevant regulatory proteins from loss-of-function studies, and 54 from gain-of-function studies.
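The slides do not say how these interaction lists are turned into GMT lines. One plausible construction, offered here as an assumption, is one node set per source protein containing the genes it binds or regulates. The sketch groups (source, target) pairs that way and writes them out as GMT text; the function names, the placeholder description "na", and the example identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sets_by_source(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (source protein, target gene) pairs into one target set per source."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for source, target in interactions:
        grouped[source].add(target)
    return dict(grouped)


def to_gmt(node_sets: Dict[str, Set[str]], description: str = "na") -> str:
    """Serialize sets as GMT lines: name, description, then members, tab-separated."""
    lines = [
        "\t".join([name, description, *sorted(members)])
        for name, members in sorted(node_sets.items())
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical source proteins and targets, for illustration only.
    pairs = [("TF1", "g2"), ("TF1", "g1"), ("TF2", "g1")]
    print(to_gmt(sets_by_source(pairs)), end="")
```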
ChIP-X network
Logof network
Combining networks
• Each data source gives a different perspective on the associations between the genes.
• Combining the different perspectives may yield new insights; e.g. weak but consistent associations across different perspectives can be revealed by the enhanced signal-to-noise ratio.
$$ p_{ij} = 1 - \prod_{k_1} \left( 1 - \frac{2\alpha}{n_{ij}^{k_1}} \right) \prod_{k_2} \left( 1 - \frac{2\beta}{n_{ij}^{k_2}} \right) \cdots $$
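The combined formula multiplies one product per data source, each with its own parameter (α for the first source, β for the second, and so on). The sketch below evaluates it for any number of sources, again reading $n_{ij}^{k}$ as the size of a node set containing both i and j and clipping each factor to stay a probability; both are assumptions, and `combined_edge_probabilities` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple

Pair = Tuple[str, str]


def combined_edge_probabilities(
    sources: Sequence[Tuple[Iterable[Iterable[str]], float]]
) -> Dict[Pair, float]:
    """p_ij = 1 - prod over sources s, prod over sets k of s, of (1 - 2*a_s / n_ij^k).

    `sources` is a list of (node_sets, a_s) pairs; a_s plays the role of
    alpha, beta, ... for that data source.
    """
    not_linked: Dict[Pair, float] = defaultdict(lambda: 1.0)
    for node_sets, weight in sources:
        for node_set in node_sets:
            members = sorted(set(node_set))
            n = len(members)
            if n < 2:
                continue
            q = min(1.0, 2.0 * weight / n)  # clipped so the factor stays in [0, 1]
            for pair in combinations(members, 2):
                not_linked[pair] *= 1.0 - q
    return {pair: 1.0 - s for pair, s in not_linked.items()}


if __name__ == "__main__":
    source_1 = [{"A", "B", "C", "D"}]  # toy sets from a first data source
    source_2 = [{"A", "B", "C", "D"}]  # toy sets from a second data source
    print(combined_edge_probabilities([(source_1, 1.0)])[("A", "B")])
    print(combined_edge_probabilities([(source_1, 1.0), (source_2, 1.0)])[("A", "B")])
```

In this toy case a pair seen in a four-node set from one source gets 0.5, and seeing it again in a second source raises it to 0.75, which is the sense in which weak but consistent associations can accumulate across perspectives.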
Combination of ChIP-X and Logof
An extension of the approach...
Application II: Inference of a network of statistical relationships in the AERS database
• The Adverse Event Reporting System (AERS) database contains records of ...
AERS Record 1: Drug 1, Drug 2, ... | Side-effect 1, Side-effect 2, ...
AERS Record 2: Drug 3, Drug 4, ... | Side-effect 3, Side-effect 4, ...
…
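Each AERS record can be viewed as one observed node set. As a sketch under assumptions not stated in the slides, the code below treats a record's drugs together with its side effects as a single set, applies the same analytic weighting, and keeps only drug/side-effect pairs. It also assumes the drug and side-effect vocabularies do not overlap; the function name and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def drug_side_effect_weights(
    records: Iterable[Tuple[Set[str], Set[str]]], alpha: float = 1.0
) -> Dict[Tuple[str, str], float]:
    """Treat each record's drugs plus side effects as one node set and return
    analytic weights for (drug, side effect) pairs only."""
    not_linked: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0)
    for drugs, effects in records:
        # Assumes drug and side-effect names never coincide.
        n = len(drugs | effects)  # size of this record's node set
        if n < 2:
            continue
        q = min(1.0, 2.0 * alpha / n)  # clipped so the factor stays in [0, 1]
        for d in drugs:
            for e in effects:
                not_linked[(d, e)] *= 1.0 - q
    return {pair: 1.0 - s for pair, s in not_linked.items()}


if __name__ == "__main__":
    toy_records = [
        ({"Drug 1", "Drug 2"}, {"Side-effect 1"}),
        ({"Drug 1"}, {"Side-effect 1", "Side-effect 2"}),
    ]
    for pair, w in sorted(drug_side_effect_weights(toy_records).items()):
        print(pair, round(w, 3))
```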
AERS sub network
AERS Large-scale Adjacency Matrix
And finally…
Summary
• We described a general class of problems in network inference.
• A network of physical interactions between proteins was inferred from high-throughput IP/MS experiments.
• The method has been applied to examine associations between stem-cell genes from multiple perspectives.
• We have begun to apply the approach to the inference of statistical interactions between drugs and side effects based on the AERS database.
• More details can be found on the website: www.maayanlab.net/S2N