Statistical methods and tools for
integrative analysis of perturbation
signatures
Mario Medvedovic
Laboratory for Statistical Genomics and Systems Biology
Department of Environmental Health
University of Cincinnati Medical Center
http://GenomicsPortals.org
Aims of the project
Methods for characterizing concordances in perturbation
signatures and constructing meta-signatures
Explaining LINCS signatures and meta-signatures by constructing
regulatory network models
Use of LINCS signatures and models to explain disease-related
signatures
On- and off-line computational infrastructure
Mario Medvedovic, Environmental Health, University of Cincinnati
Concordances in perturbation
signatures (e.g., gene expression)
$(s_g^i, p_g^i)$: $s_g^i$ is the differential expression score and $p_g^i$ the probability of differential expression
for gene $g = 1, \ldots, G$ and perturbation signature $i = 1, 2$
Given two differential expression signatures, are the genes
differentially expressed in both signatures more common than
expected by chance?
What are the genes with “unusually” high similarities in
differential expression?
Currently used statistical methods for addressing these questions
are inadequate.
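To make the first question concrete, here is a minimal Python sketch of one familiar threshold-based approach: call genes differentially expressed in each signature, count the overlap, and compute a hypergeometric tail probability. The slides do not name the methods they consider inadequate, so treating this test as one of them is an assumption; the function name and the counts are hypothetical.

```python
from math import comb

def overlap_p_value(n_genes, n_sig1, n_sig2, n_both):
    """P(overlap >= n_both) if n_sig2 genes were drawn at random from n_genes,
    of which n_sig1 are differentially expressed in the first signature."""
    upper = min(n_sig1, n_sig2)
    total = comb(n_genes, n_sig2)
    tail = sum(
        comb(n_sig1, k) * comb(n_genes - n_sig1, n_sig2 - k)
        for k in range(n_both, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 genes, 50 and 40 genes called in each signature,
# 8 genes called in both (about 2 would be expected by chance).
p = overlap_p_value(n_genes=1000, n_sig1=50, n_sig2=40, n_both=8)
print(p)
```

A test of this kind depends on the cutoff used to call genes and ignores the differential expression scores and probabilities themselves.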
• Generalized Random Set (GRS) analysis (Freudenberg et al., Bioinformatics 27: 70, 2011)
Concordance statistic:
$$X = \frac{1}{2}\left(\frac{\sum_g p_g^1 s_g^2}{\sum_g p_g^1} + \frac{\sum_g p_g^2 s_g^1}{\sum_g p_g^2}\right), \qquad f_{H_0}(X) \sim N(E(X), \mathrm{Var}(X))$$
“Meta-signature”:
$$e_g = \frac{G}{2}\left(\frac{p_g^1 s_g^2}{\sum_k p_k^1} + \frac{p_g^2 s_g^1}{\sum_k p_k^2}\right)$$
$f_{H_0}(e)$ obtained by randomly permuting gene labels
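The following is a minimal Python sketch of the concordance statistic and meta-signature, not the authors' implementation. It assumes the slide's formulas read as $e_g = \frac{G}{2}\left(p_g^1 s_g^2 / \sum_k p_k^1 + p_g^2 s_g^1 / \sum_k p_k^2\right)$ and $X$ equal to the mean of the $e_g$. The slide gives a normal null for $X$ without stating $E(X)$ or $\mathrm{Var}(X)$, so the sketch swaps in a gene-label permutation test for $X$ instead. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean

def meta_signature(s1, p1, s2, p2):
    """Per-gene values e_g, under the assumed reading of the slide's formula."""
    G = len(s1)
    sum_p1, sum_p2 = sum(p1), sum(p2)
    return [
        (G / 2) * (p1[g] * s2[g] / sum_p1 + p2[g] * s1[g] / sum_p2)
        for g in range(G)
    ]

def concordance(s1, p1, s2, p2):
    """Concordance statistic X, taken here as the mean of the e_g."""
    return mean(meta_signature(s1, p1, s2, p2))

def permutation_p_value(s1, p1, s2, p2, n_perm=1000, seed=0):
    """Permutation null for X: shuffle the gene labels of signature 2.
    This replaces the normal approximation mentioned on the slide."""
    rng = random.Random(seed)
    observed = concordance(s1, p1, s2, p2)
    idx = list(range(len(s1)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        s2_perm = [s2[i] for i in idx]  # scores and probabilities move together
        p2_perm = [p2[i] for i in idx]
        if concordance(s1, p1, s2_perm, p2_perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: two identical signatures with three strongly
# differentially expressed genes out of ten.
s1 = [3.0, 2.5, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p1 = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
s2 = list(s1)
p2 = list(p1)
observed, p_value = permutation_p_value(s1, p1, s2, p2)
print(observed, p_value)
```

Genes with large $e_g$ are those with a high score in one signature and a high probability of differential expression in the other, which is what lets the meta-signature single out genes shared by both signatures.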
• GRS works significantly better than alternatives
• “Meta-signatures” of two “concordant” signatures are more functionally coherent
• “Meta-signatures” accentuate common features of two (possibly) different regulatory programs
• Extend the methodology to groups of signatures
• Form groups of concordant signatures and associated “meta-signatures” for different types of readouts
• Integration across different perturbations
Regulatory network models of
signatures and meta-signatures
Integrated Perturbation Signature and Meta-Signatures
Integration across different types of readouts
Gene-level scores assessing the likelihood that the genes’ activity
readout is affected by one or a set of perturbations
Correlating with existing pathways
De novo regulatory network construction by integrating with the
global protein-protein and protein-gene interaction networks
Network models of LINCS signatures
and meta-signatures
[Diagram: network model of a LINCS signature. Inputs: drug-target interaction data (primary targets of the perturbation), biochemical response data (signal transducers), public-domain ChIP-seq data (transcription regulation by TF1, ..., TFn), and public-domain transcriptional responses to perturbations (change in gene expression). The inputs are combined with a random network walk model to give regulatory activity scores for all nodes, that is, an integrated regulatory network activity signature in response to a perturbation. Related outputs: network meta-signatures, active subnetworks, known pathways.]
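The slides do not specify the random network walk model, so the sketch below uses a generic random walk with restart as a stand-in: the walk restarts at the primary targets of the perturbation, and its stationary probabilities serve as activity scores for all nodes. The network, node names, and restart probability are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at the seeds.

    adjacency: dict mapping each node to a list of neighbour nodes
    seeds: dict mapping seed nodes (e.g. drug targets) to positive weights
    """
    nodes = sorted(adjacency)
    total_seed = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total_seed for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            nbrs = adjacency[n]
            if not nbrs:
                # A node without neighbours sends its mass back to the seeds.
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * p0[m]
                continue
            share = (1 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical toy network: a drug target, a signal transducer,
# two transcription factors, and one regulated gene.
network = {
    "target": ["kinase"],
    "kinase": ["target", "TF1", "TF2"],
    "TF1": ["kinase", "geneA"],
    "TF2": ["kinase"],
    "geneA": ["TF1"],
}
scores = random_walk_with_restart(network, seeds={"target": 1.0})
for node, score in sorted(scores.items(), key=lambda t: -t[1]):
    print(node, round(score, 4))
```

In this toy example the scores decay with network distance from the target; a real model would also weight edges by the biochemical and transcriptional response data listed in the diagram.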
Library of Regulatory Network
models and signatures
Using LINCS signatures and models to
explain disease-related signatures
Correlate the disease-related readouts (e.g., gene expression profiles)
with corresponding LINCS signatures and meta-signatures
Associate LINCS models and complementary types of readouts
with the disease
Construct disease-specific regulatory models
Associate LINCS phenotypic readouts (e.g., images, proliferation,
apoptosis) with the disease
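The first step above can be sketched as ranking a library of perturbation signatures by their similarity to a disease signature. Plain Pearson correlation is used here only as a simple stand-in; the slides propose GRS for signature comparison. The signature names and values are hypothetical, and all signatures are assumed to be scored over the same ordered gene list.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_signatures(disease, library):
    """Library signatures sorted from most to least correlated with the disease."""
    scored = [(name, pearson(disease, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Hypothetical differential expression scores over the same five genes.
disease = [2.0, 1.5, -1.0, 0.2, -0.5]
library = {
    "pert_A": [1.8, 1.2, -0.9, 0.1, -0.4],
    "pert_B": [-2.0, -1.4, 1.1, -0.3, 0.6],
    "pert_C": [0.1, -0.2, 0.3, 0.0, 0.1],
}
ranked = rank_signatures(disease, library)
for name, r in ranked:
    print(name, round(r, 3))
```

Signatures at the bottom of the ranking, with strongly negative correlation, may be as informative as those at the top, since they point to perturbations that move expression in the opposite direction to the disease.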
Genomics Portals
http://GenomicsPortals.org
[Diagram: Genomics Portals architecture]
LINCS Readouts of Cellular States (primary data)
Genomics Data: Epigenomics Events, microRNA Expression, Gene Expression, Transcription Factor Binding, CGH, CpG Islands
Analytical Tools (R and Bioconductor): Statistical Analysis, Machine Learning, Visualization
Functional Knowledge Base: Gene Ontology, Interaction networks, KEGG pathways, Literature concepts, Transcriptional Modules
Analyses:
• Signature comparison (GRS)
• Network models
• Interactive network visualization
Signature libraries:
• LINCS signatures and meta-signatures
• Public domain signatures and meta-signatures
• LINCS network models and network-based signatures
Outcomes:
• New functional knowledge
• New physiological understanding
• New testable hypotheses
Integration with other projects
[Diagram: the Genomics Portals architecture from the previous slide (LINCS readouts, genomics data, analytical tools, functional knowledge base, signature libraries), linked to three other projects]
Linked projects:
• Project of PI Schurer
• itNETZ: Integrative and Translational Network-based Cellular Signature Analyzer (PI: Zhou)
• A Systems Approach to Elucidate Mechanisms of Drug Activity and Sensitivity (PI: Califano)
Items exchanged with the linked projects (arrow labels in the diagram):
• Raw data
• Meta data
• Methods for deriving summaries and scores
• Signatures, networks
• Data dumps
• Direct db queries
• Web access to analysis engines
• Integrated signatures and models
• Meta-signatures
• Disease-related signatures
• Regulatory event signatures
• Analysis engines for comparisons against signatures, meta-signatures and networks
• Analytical synergies
• Methods
• Tools
Our team
(http://BayesianGenomics.org)
The Team:
PI: Mario Medvedovic, Bioinformatician, Assoc Professor, Department of
Environmental Health
Co-I: Siva Sivaganesan, Statistician, Professor in Dept of Mathematics
Co-I: John Reinchard, Molecular Biologist, Research Scientist in Dept of
Environmental Health
Mukta Phatak, PhD Bioinformatician, Res Associate in SGSB Lab
Jing Chen, PhD Bioinformatician, Res Associate in SGSB Lab
Wen Niu, MS in CS and Mol Biol, Application Specialist in SGSB Lab