Presentation Title - University of Illinois at Urbana

Nathan D. Price
Department of Chemical and Biomolecular Engineering
Center for Biophysics and Computational Biology
Institute for Genomic Biology
University of Illinois, Urbana-Champaign
May 22, 2009
Meta-Analysis: What can we learn from integrating
data from 2000 microarrays?

How distinct and separable are each of the phenotypes?

What are the molecular features that are unique for each
phenotype?

Are these differences sufficient to enable identification of
phenotype just given the microarray?

What are shared molecular features between subsets of
phenotypes?

Can we reliably reconstruct networks active in bee brains
and at what scale?
A few key issues

Data normalization from the loop design arrays

Using knowledge from Drosophila to aid in interpretation at multiple levels
- Single gene
- Pathway
- Network
- Cell specificity?

Interpretation of data from homogenized brain
- Multiple cell types

Scale of network reconstruction/inference possible
- Particularly since we don’t have time series or molecular perturbation experiments
Minimal sample needs for statistical learning

Differential expression: 10s
Pathway analysis: 10s
Classification: 10s-100s
Network inference: 100s-1000s

Data sets in the 1000s are rare in biology today –
so tremendous opportunity!
Classification: molecular signatures to differentiate
phenotypes
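[Figure: scatter plot of OBSCN expression against C9orf65 expression, both on log scales from 10^1 to 10^5. Points are marked by clinicopathological diagnosis (X = GIST, O = LMS), and the two regions of the plot are labeled "Classified as GIST" and "Classified as LMS".]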
Accuracy on data = 99%
Predicted accuracy on future data (LOOCV) = 98%
Price, N.D. et al, PNAS 104:3414-9 (2007)
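
The slide above separates the two diagnoses using the expression of just two genes and reports a leave-one-out cross-validation (LOOCV) estimate. The following is a minimal sketch of that idea, assuming the classifier simply asks which of the two genes is more highly expressed; the slide does not state the exact decision rule. The data are synthetic, and the names `gene_a`, `gene_b`, `fit_pair_rule`, `predict_pair_rule`, and `loocv_accuracy` are hypothetical.

```python
import numpy as np

def fit_pair_rule(x_a, x_b, labels):
    """Learn which class goes with 'gene A > gene B' (assumes two classes)."""
    classes = sorted(set(labels))
    above = x_a > x_b
    # Fraction of each class in which gene A exceeds gene B
    frac = {c: above[labels == c].mean() for c in classes}
    hi = max(classes, key=lambda c: frac[c])
    lo = [c for c in classes if c != hi][0]
    return hi, lo

def predict_pair_rule(rule, x_a, x_b):
    """Predict the 'hi' class where gene A > gene B, else the 'lo' class."""
    hi, lo = rule
    return np.where(x_a > x_b, hi, lo)

def loocv_accuracy(x_a, x_b, labels):
    """Refit the rule with each sample held out, then test on that sample."""
    n = len(labels)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        rule = fit_pair_rule(x_a[keep], x_b[keep], labels[keep])
        pred = predict_pair_rule(rule, x_a[i:i + 1], x_b[i:i + 1])[0]
        correct += int(pred == labels[i])
    return correct / n

# Synthetic log-scale expression values (NOT the data from the slide)
rng = np.random.default_rng(0)
n = 20
labels = np.array(["GIST"] * n + ["LMS"] * n)
gene_a = np.concatenate([rng.normal(12, 1, n), rng.normal(8, 1, n)])
gene_b = np.concatenate([rng.normal(8, 1, n), rng.normal(12, 1, n)])

rule = fit_pair_rule(gene_a, gene_b, labels)
acc = loocv_accuracy(gene_a, gene_b, labels)
print("rule:", rule, "LOOCV accuracy:", acc)
```

Because the rule depends only on the ordering of two measurements within the same sample, it is unaffected by any monotone per-sample rescaling, which is one reason such pairwise comparisons are attractive for combining arrays from different experiments.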
Multi-class classification: Example from brain disease
Disease (sample #)          Accuracy   Sensitivity   Specificity
Alzheimer’s (93)            99.9%      100%          99.8%
Astrocytoma I (35)          100%       100%          100%
Astrocytoma II (10)         99.1%      100%          99.1%
Astrocytoma III (43)        93.0%      72.1%         94.4%
ATRT (9)                    100%       100%          100%
EFT (5)                     100%       100%          100%
Epilepsy (26)               99.6%      100%          99.6%
Glioblastoma (147)          92.5%      78.2%         96.2%
Ganglioneuroma (8)          100%       100%          100%
Medulloblastoma (51)        99.9%      98.0%         100%
Meningioma (67)             99.7%      98.5%         99.8%
Neuroblastoma (115)         99.9%      99.1%         100%
Oligodendroglioma II (32)   96.9%      68.8%         98.2%
Oligodendroglioma III (20)  97.4%      95.0%         97.5%
Parkinson’s (38)            100%       100%          100%
PNET (5)                    100%       100%          100%
Overall (704)               98.6%      91.8%         99.1%

ATRT: atypical teratoid/rhabdoid tumor
EFT: Ewing’s Family Disease
PNET: primitive neuroectodermal tumor
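
The per-disease figures above show accuracy staying high even where sensitivity drops (for example Astrocytoma III and Oligodendroglioma II). A minimal sketch of how that can happen, assuming each class is scored one-versus-rest from a confusion matrix; the slide does not say how its figures were computed, and the 3-class matrix below is made up for illustration.

```python
import numpy as np

def one_vs_rest_metrics(confusion):
    """Per-class metrics; confusion[i, j] = true class i predicted as class j."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    tp = np.diag(confusion)                # correctly labeled as the class
    fn = confusion.sum(axis=1) - tp        # members of the class that were missed
    fp = confusion.sum(axis=0) - tp        # other samples wrongly given the label
    tn = total - tp - fn - fp              # everything else
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical confusion matrix for three classes (rows: true, columns: predicted)
confusion = [[9, 1, 0],
             [2, 7, 1],
             [0, 0, 10]]
metrics = one_vs_rest_metrics(confusion)
for name, values in metrics.items():
    print(name, np.round(values, 3))
```

Under this scoring, the true negatives for a class include every correctly handled sample from all other classes, so a small class can miss a sizeable share of its own samples and still show a high one-versus-rest accuracy.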
Novel method for pathway analysis:
Differential rank conservation (DIRAC)
[Figure, panel 1: "…across pathways in a phenotype". The rankings of genes g1-g4 are compared across samples; a tightly regulated pathway shows the highest conservation and a weakly regulated pathway the lowest.]
[Figure, panel 2: "…across phenotypes for a pathway". The ranking of genes g1-g8 within a pathway is shuffled between phenotypes (GIST vs. LMS).]
Eddy et al, In preparation
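
The slide contrasts pathways whose gene ranking is the same in every sample with pathways whose ranking is shuffled. One way to put a number on that, sketched here under stated assumptions: compare every pair of genes in the pathway, take the majority ordering across samples as a template, and average how well each sample matches it. This is an illustrative sketch, not taken from the cited work in preparation; the function names and the toy matrices are hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_orderings(expr):
    """expr: genes x samples. Returns a (pairs x samples) boolean array that is
    True where gene i is expressed below gene j in that sample."""
    pairs = combinations(range(expr.shape[0]), 2)
    return np.array([expr[i] < expr[j] for i, j in pairs])

def rank_conservation(expr):
    """Mean agreement of each sample's gene ordering with the majority ordering."""
    orderings = pairwise_orderings(expr)
    # Template: the ordering seen in more than half of the samples, per gene pair
    template = orderings.mean(axis=1) > 0.5
    # Per-sample score: fraction of gene pairs that agree with the template
    matching = (orderings == template[:, None]).mean(axis=0)
    return matching.mean()

# Toy pathway with the same ordering (g1 < g2 < g3 < g4) in every sample
tight = np.array([[1.0, 1.2, 0.9, 1.1],
                  [2.0, 2.1, 2.2, 1.9],
                  [3.0, 3.3, 3.1, 2.8],
                  [4.0, 4.2, 3.9, 4.1]])

# Toy pathway whose ordering differs from sample to sample
weak = np.array([[1.0, 4.0, 2.0, 3.0],
                 [2.0, 3.0, 4.0, 1.0],
                 [3.0, 2.0, 1.0, 4.0],
                 [4.0, 1.0, 3.0, 2.0]])

tight_score = rank_conservation(tight)
weak_score = rank_conservation(weak)
print("tight:", tight_score, "weak:", weak_score)
```

Computing the same score separately for each phenotype would give one value per pathway per phenotype, which is the kind of quantity a pathways-by-phenotypes comparison needs.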
Rank difference scores in GBM and normal
Diverse rank conservation in brain disease
[Figure: map of rank conservation with pathways on one axis and phenotypes on the other, on a scale from highest to lowest rank conservation. Annotated phenotypes: low conservation, Astrocytoma, grade I; lower conservation, Astrocytoma, grade III; lowest conservation, Glioblastoma (grade IV).]
Differential regulation of pathway ranking in disease
Classification with DIRAC
Network inference
Training Set (268 conditions)
Test Set (24 conditions)
Similar accuracies on the training and test sets: no sign of overfitting, and the model has predictive capacity!
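
The argument on this slide is that a model fit on the training conditions performs about as well on held-out conditions. A minimal sketch of that check, assuming a plain least-squares model of one target gene from a few regulators. This is not the Inferelator; the data are synthetic, and only the 268/24 split sizes are taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 268, 24

# Hypothetical regulator weights used to generate the synthetic target gene
true_weights = np.array([1.0, -0.8, 0.6, 0.0, 0.4])

def simulate(n):
    """Draw regulator levels and a noisy target-gene response for n conditions."""
    regulators = rng.normal(size=(n, true_weights.size))
    target = regulators @ true_weights + rng.normal(scale=0.5, size=n)
    return regulators, target

X_train, y_train = simulate(n_train)
X_test, y_test = simulate(n_test)

# Fit on the training conditions only
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def correlation(X, y):
    """Pearson correlation between predicted and observed target expression."""
    return np.corrcoef(X @ weights, y)[0, 1]

train_r = correlation(X_train, y_train)
test_r = correlation(X_test, y_test)
print("train r:", round(train_r, 3), "test r:", round(test_r, 3))
```

A test correlation far below the training correlation would point to overfitting; similar values on both sets suggest the fitted relationships generalize to unseen conditions.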
Bi-clustering for data
reduction and learning:
SAMBA, cMONKEY
Bonneau, R. et al, Genome Biology 7:R36 (2006)
Visual Representations
My lab has made a functional MATLAB-based version of the Inferelator, and we are looking forward to testing it out on the BeeSpace data!
[Figure: embedded slide labeled "geneWAY" (Slide 14 of a final project overview presentation, May 5, 2009)]
Price Lab
Postdocs
Pan-Jun Kim
Amit Ghosh
Graduate Students
James Eddy
Shu-wen Huang
Matt Gonnerman
Swati Gupta
Caroline Milne
Ravali Raju
Jaeyun Sung
Chunjing Wang
Sriram Chandrasekaran
Collaborators
Donald Geman, Johns Hopkins
Lee Hood, Institute for Systems Biology
Ilya Shmulevich, Institute for Systems Biology
Jonathan Trent, MD Anderson Cancer Center
Wei Zhang, MD Anderson Cancer Center
Funding Sources
NIH Howard Temin Pathway to Independence Award
NSF CAREER
Department of Defense – TATRC
Energy Biosciences Institute (BP)