Integrative Genomics and Biomarker discovery


Network Inference
Chris Holmes
Oxford Centre for Gene Function &
Department of Statistics
University of Oxford
Overview




Statistical Inference
Challenges of inferring network topology &
the structure of local dependencies
Use of “Integrative Genomics” to aid
inference
Conclusions
Inference


Inference is the process of “learning from
data”
We have two objects to infer:
• Network structure (topology)
• Functional form of the dependencies within a given network structure
FY ( y )  y
y  FY ( y ), GY ( y )
Probabilistic (Bayesian) Networks



Graphical structure used to define interactions
which encode a set of conditional independencies
Way of simplifying a joint distribution
Have become extremely popular in genomics
- R. Cowell et al, Springer (1999)
- Friedman, http://www.cs.huji.ac.il/~nir/
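To make "way of simplifying a joint distribution" concrete, here is a minimal sketch for a three-node chain A -> B -> C over binary variables. The chain, the variable names and every probability value are assumptions chosen for illustration; none of them come from the talk. The sketch checks that the factorised joint sums to one, and the helper `prob_c_given_ab` lets one confirm the encoded conditional independence (C is independent of A given B).

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C
# over binary variables; the numbers are illustrative only.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.3, 1: 0.7}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """Joint probability from the chain factorisation P(A) P(B|A) P(C|B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob_c_given_ab(c, a, b):
    """P(C = c | A = a, B = b), computed from the joint distribution."""
    return joint(a, b, c) / sum(joint(a, b, k) for k in (0, 1))


# The factorised joint is a proper distribution.
total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))

# Free parameters: a full joint table over 3 binary variables needs 2**3 - 1;
# the chain needs 1 (for A) + 2 (for B|A) + 2 (for C|B).
full_table_params = 2 ** 3 - 1
chain_params = 1 + 2 + 2
```

The saving is small for three nodes but grows quickly: the full table is exponential in the number of variables, while the factorised form grows with the size of each node's parent set.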
Probabilistic Networks

Advantages:
• Coherent axiomatic framework
• Provides a calculus for integrating information from multiple sources that guards against logical inconsistencies
• Allows precise statements of uncertainty
- on global network structure (topologies), and marginals
• Sequential experimental design
- Calculate optimal follow-up experiments to learn most about the network structure given current state of knowledge
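One common way to formalise "optimal follow-up experiment" is to pick the experiment with the lowest expected posterior entropy over candidate network structures. The sketch below assumes a finite set of candidate structures and a known outcome likelihood for each experiment; the function names, the experiment names and all numbers are hypothetical, and the talk does not say which design criterion was intended.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def expected_posterior_entropy(prior, likelihood):
    """Expected entropy over structures after seeing one experimental outcome.

    prior[h] is the current probability of structure h;
    likelihood[h][o] is Pr(outcome o | structure h) for this experiment.
    """
    n_structures = len(prior)
    n_outcomes = len(likelihood[0])
    expected = 0.0
    for o in range(n_outcomes):
        # Predictive probability of outcome o under the current beliefs.
        p_outcome = sum(prior[h] * likelihood[h][o] for h in range(n_structures))
        if p_outcome == 0:
            continue
        # Bayes update for each structure given outcome o.
        posterior = [prior[h] * likelihood[h][o] / p_outcome
                     for h in range(n_structures)]
        expected += p_outcome * entropy(posterior)
    return expected


def best_experiment(prior, experiments):
    """Name of the experiment that minimises expected posterior entropy."""
    return min(experiments,
               key=lambda name: expected_posterior_entropy(prior, experiments[name]))


# Two candidate structures, two hypothetical experiments.
prior = [0.5, 0.5]
experiments = {
    "knockout": [[0.9, 0.1], [0.1, 0.9]],   # outcomes discriminate the structures
    "replicate": [[0.5, 0.5], [0.5, 0.5]],  # outcomes are uninformative
}
chosen = best_experiment(prior, experiments)
```

In this toy setting the uninformative experiment leaves one full bit of uncertainty, so the discriminating experiment is selected.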
Probabilistic Networks

Disadvantages:
 Causal relationships not explicitly handled


Dawid AP. Causal inference without counterfactuals (with
Discussion). J Am Statist Assoc (2000)
Restrictions on valid structures

Hammersley-Clifford theorem; Rue & Held,
Gaussian Markov Random Fields, Chapman Hall
(2005)
Network Inference

Prior on network space leads to posterior
Pr(F | y) ∝ Pr(y | F) Pr(F)

Computational framework to learn
• Markov Chain Monte Carlo: Gilks et al, MCMC in Practice, Chapman & Hall (1996)
• Stochastic search
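For a network small enough to enumerate, the posterior over structures can be computed exactly rather than by MCMC or stochastic search. The sketch below compares just two structures for a pair of binary variables, "X -> Y" against "no edge", using Beta(1, 1) priors on every probability so that the marginal likelihood has a closed form. The priors, the function names and the toy data are assumptions for illustration, not the model used in the talk.

```python
from math import exp, lgamma


def log_marginal(successes, failures, a=1.0, b=1.0):
    """Log marginal likelihood of a binary sequence under a Beta(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + successes) + lgamma(b + failures)
            - lgamma(a + b + successes + failures))


def posterior_edge(pairs, prior_edge=0.5):
    """Posterior probability of 'X -> Y' versus 'no edge' given (x, y) pairs."""
    n = len(pairs)
    x_ones = sum(x for x, _ in pairs)
    y_ones = sum(y for _, y in pairs)
    log_px = log_marginal(x_ones, n - x_ones)

    # No edge: Y has a single distribution, whatever the value of X.
    log_no_edge = log_px + log_marginal(y_ones, n - y_ones)

    # Edge: Y has a separate distribution for each value of X.
    log_edge = log_px
    for x_value in (0, 1):
        ys = [y for x, y in pairs if x == x_value]
        log_edge += log_marginal(sum(ys), len(ys) - sum(ys))

    # Posterior is proportional to marginal likelihood times prior.
    w_edge = exp(log_edge) * prior_edge
    w_none = exp(log_no_edge) * (1.0 - prior_edge)
    return w_edge / (w_edge + w_none)


correlated = [(0, 0)] * 5 + [(1, 1)] * 5
balanced = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3
p_correlated = posterior_edge(correlated)
p_balanced = posterior_edge(balanced)
```

With perfectly correlated toy data the posterior strongly favours the edge; with balanced data it favours the simpler structure, because the marginal likelihood penalises the extra parameters. For realistic numbers of genes the structure space cannot be enumerated, which is where sampling and search methods come in.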
Hypothesis-Driven Networks

Originally networks were hypothesis driven



Well defined small networks
Experiments set up to test specific hypothesis
Then arrival of high-throughput genomic
(disruptive) technologies
• Treats network structure as unknown
• Data mining (data dredging?)
Bayesian Network Approach
Aim is to find graph topology that maximises likelihood given the data
Finding Optimal Network – Hard Problem
Data Driven Networks


Data is extremely sparse, compared with the
dimensionality of the network space
Great uncertainty in any conclusions


High numbers of false positives (false connections) and false
negatives (missing connections)
This uncertainty is encompassed in a fully Bayesian
model, via the posterior distribution on network space,
Pr(F | y)
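The "dimensionality of the network space" can be made concrete by counting the directed acyclic graphs on n labelled nodes with Robinson's recurrence. This is a standard combinatorial result rather than something stated on the slides, and the function name `count_dags` is only illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled directed acyclic graphs on n nodes.

    Robinson's recurrence: choose the k nodes with no incoming edges,
    then use inclusion-exclusion over the choice of k.
    """
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


# Counts for small networks; growth is super-exponential in n.
small_counts = [count_dags(n) for n in range(1, 6)]
```

Five nodes already admit 29281 structures, so a typical expression data set with tens of samples cannot single out one network among those possible for hundreds of genes.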
The Learned Network Structure
Data Driven Networks

A problem with data mining approaches

Often the “data goes in one end and
the answer comes out the other end
untouched by human thought” –
adapted from Doug Altman
Further complicating issues

Dynamic networks


Network Dynamics


Imoto (2002); Beal et al, Bioinformatics
(2005)
Luscombe et al, Nature, (2004)
Interventional analysis

Ideker et al, Science, (2002)
Way Forward


More refined Prior structures
Multiple information sources

Literature mining


Comparative genomics


Rajagopalan, Bioinformatics (2005)
Amoutzias, EMBO (2004)
Combining other genomic measurement platforms

Schadt et al, Nat. Genet. (2005); Zhu et al, Cytogenet Genome Res.
(2004); Beer and Tavazoie, Cell. (2004)
Improving Network Inference
[Figure: sources of information feeding network inference: perturbations, genetics, biological context, expression observations, regulatory signals, comparative genomics]
Integrative Genomics


Combine information from multiple sources to
improve precision
Information is preserved across sources while
noise (random variation) is independent across
information sources
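The precision argument can be sketched with inverse-variance weighting: if several sources measure the same quantity with independent, unbiased noise, the combined estimate has a smaller variance than any single source. The independence and known-variance assumptions, the function name and the numbers are all illustrative; real genomic platforms may share biases that this sketch ignores.

```python
def combine_sources(estimates, variances):
    """Inverse-variance weighted combination of independent estimates.

    Returns the combined estimate and its variance, assuming each source
    is unbiased and the noise is independent across sources.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return combined, 1.0 / total_weight


# Two equally noisy sources: the variance halves.
equal_estimate, equal_variance = combine_sources([1.0, 3.0], [1.0, 1.0])

# A precise source and a noisy source: the noisy one still helps a little.
mixed_estimate, mixed_variance = combine_sources([2.0, 4.0], [0.5, 2.0])
```

The combined variance is always below the smallest input variance, which is the sense in which adding a further independent source improves precision.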
[Figure: levels of biological information (germline DNA, somatic DNA, RNA, protein, physiology, environment) and the measurement platforms shown with them (sequencing, SNPs, epigenetics & CGH, microarrays, proteomics, metabonomics). Schadt et al., Nat. Genet., July 2005.]
Transcription – cis and trans motifs
[Figure panels: AND logic; OR logic; NOT logic]
Combinatorial patterns help identify
groups of transcripts predicted to
show similar abundance profiles
Beer and Tavazoie, Cell. 2004
[Figure legend: solid = actual expression; dashed = predicted expression]
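The AND, OR and NOT combinations can be illustrated by grouping genes according to which motifs occur in their promoters. The sketch is a deliberately simple deterministic version with hypothetical gene names (`gene1` to `gene4`) and motif names (`M1` to `M3`); it does not reproduce the analysis in the cited paper.

```python
# Hypothetical promoter annotations: gene name -> set of motifs present.
promoters = {
    "gene1": {"M1", "M2"},
    "gene2": {"M1"},
    "gene3": {"M2", "M3"},
    "gene4": set(),
}


def has_both(motifs):
    """AND logic: both M1 and M2 are present."""
    return "M1" in motifs and "M2" in motifs


def has_either(motifs):
    """OR logic: at least one of M1 and M2 is present."""
    return "M1" in motifs or "M2" in motifs


def has_m1_not_m3(motifs):
    """NOT logic: M1 is present and M3 is absent."""
    return "M1" in motifs and "M3" not in motifs


def group_by_rule(annotations, rule):
    """Sorted names of the genes whose promoter satisfies the rule."""
    return sorted(gene for gene, motifs in annotations.items() if rule(motifs))


and_group = group_by_rule(promoters, has_both)
or_group = group_by_rule(promoters, has_either)
not_group = group_by_rule(promoters, has_m1_not_m3)
```

Genes falling in the same group would be the ones predicted to show similar abundance profiles under that rule.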
Conclusions


Current move back towards more hypothesis-driven
analysis on smaller networks
Conditioning on well-characterised network
structures and using multiple data sources to infer
and explore local topographic regions
References

Bayes nets: Friedman,
http://www.cs.huji.ac.il/~nir/