Transcript Slide 1
Toward a Characterization of Gene Expression in Single Tumor Samples
JOSEPH LUCAS
The Power of Microarrays
Promise of “personalized medicine”
Lack of consistency/reproducibility
Problem with overfitting
Microarrays from lab bench to clinic
Data collection bias and quality control
In vitro -> in vivo
In vivo -> in vitro
Translation of meta-genes from in vitro to in vivo
Overview
Laboratory bias
Experiments on the lab bench
Correction using doping controls
Modeling to alleviate bias
Tumor expression
Factors as markers of pathway activity
Biological relevance
Clinical relevance
Beyond
Oncogene Upregulation
Human Mammary Epithelial Cells (HMEC)
9 upregulated oncogenes and one set of controls
Data collected in three batches
Demonstrates collection bias
Bild et al., “Oncogenic pathway signatures in human cancers as
a guide to targeted therapies” Nature 439 (19), 2006
Collection Bias
Doping control
Should be identical across all observations
Collection Bias
Consistent errors across many genes
May obscure interesting biology
Modeling to Correct Collection Bias
Systematic Errors
NFE2L1
Before Subtracting Error
Systematic Errors, Corrected
[Figure: NFE2L1, panels "Before Subtracting Error" and "After Subtracting Error"; annotation "Upregulation of MYC"]
• NFE2L1 – regulation of apoptosis
• MYC binding sequence in promoter
• GENES & DEVELOPMENT (2003-01-15)
Single Sample – Factor Modeling
Personalized medicine
Need to deal with one array at a time
Cannot use the same correction technique
Relative levels of genes within a sample should be informative
Design Matrix
Latent Factors
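The two labels above name the pieces of a sparse factor regression model. A minimal sketch of one common way to write such a model follows; the symbols are assumptions chosen for illustration and are not taken from the slide.

```latex
% x_{g,i}     : expression of gene g on array i
% \mu_g       : baseline expression of gene g
% h_{j,i}     : known design-matrix entry j for array i (e.g. which oncogene was upregulated)
% \lambda_{k,i}: unobserved latent factor k for array i (e.g. a collection artifact)
% \beta, \alpha: gene-specific (sparse) loadings on design terms and latent factors
x_{g,i} = \mu_g
        + \sum_{j=1}^{r} \beta_{g,j}\, h_{j,i}
        + \sum_{k=1}^{K} \alpha_{g,k}\, \lambda_{k,i}
        + \nu_{g,i}
```

Under this reading, the design-matrix terms carry the planned experimental effects, and the latent-factor terms absorb structure shared across genes that the design does not explain.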
Latent factors for Correction of Lab Bias
Microenvironment Experiments
Chen et al., “Genomic analysis of response to lactic acidosis in
human cancers”
Exact same conditions as oncogene experiment
24 “control” arrays split across 2 labs and 4 time points
Uncorrelated measurements of gene expression?
Correlation between Two Different Samples
Microenvironment, array #1
Consistently Correlated across all Pairs
Correlation of ≈ -0.6
Microenvironment, array #1
Factor Model almost Eliminates Correlation
Before Correction
After Correction
Oncogene, array #7
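The before/after comparison on these slides can be illustrated with synthetic data. The sketch below is not the speaker's model: it assumes a single bias factor whose per-gene loading is already known, and every number in it (gene count, noise level, factor scores) is invented for illustration. Two simulated control arrays share the artifact, so their residuals are correlated; subtracting the least-squares fit of the factor removes most of that correlation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def remove_factor(residuals, loading):
    """Subtract the least-squares fit of one latent factor from one array."""
    score = sum(r * a for r, a in zip(residuals, loading)) / sum(a * a for a in loading)
    return [r - score * a for r, a in zip(residuals, loading)]

random.seed(0)
n_genes = 2000  # hypothetical gene count
# Hypothetical per-gene sensitivity to a shared collection artifact.
bias_loading = [random.gauss(0.0, 1.0) for _ in range(n_genes)]

def simulate_control_array(bias_score):
    """Residual expression (gene means already removed) of one control array."""
    return [bias_score * a + random.gauss(0.0, 0.8) for a in bias_loading]

array_a = simulate_control_array(1.0)
array_b = simulate_control_array(1.0)

corr_before = pearson(array_a, array_b)
corr_after = pearson(remove_factor(array_a, bias_loading),
                     remove_factor(array_b, bias_loading))
print(f"correlation before correction: {corr_before:.2f}")
print(f"correlation after correction:  {corr_after:.2f}")
```

In a real analysis the loading would be estimated by fitting the factor model rather than known in advance; the true loading is used here only to keep the sketch short.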
Microarray Quality Control (MAQC)
120 arrays, also U133+ 2.0
6 different labs
5 repetitions per group
4 groups
Universal Human Reference RNA
Human Brain Reference RNA
Titration of RNA to form groups
Nature Biotechnology, all of volume 24 (2006)
Example 2
We believe these are collection errors
Due to pH, temperature, duration before washing, etc
Errors should be universal for U133+ 2.0 arrays
Keep all oncogene and microenvironment control observations
Keep all 120 observations from MAQC
Mean expression for each gene is different between MAQC and HMECs
Refit model, but assume error correction is same!
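The idea of refitting with data-set-specific gene means but a shared error correction can be sketched as follows. This is a simplified illustration under stated assumptions, not the model from the talk: one bias factor, a loading that is known rather than estimated, and simulated data whose sizes and noise levels are all hypothetical.

```python
import random

def center_within_dataset(arrays):
    """Subtract each gene's mean, computed within one data set (rows are arrays)."""
    n_arrays = len(arrays)
    gene_means = [sum(column) / n_arrays for column in zip(*arrays)]
    return [[x - m for x, m in zip(row, gene_means)] for row in arrays]

def subtract_shared_bias(residuals, loading):
    """Remove the least-squares fit of the shared bias factor from one array."""
    score = sum(r * a for r, a in zip(residuals, loading)) / sum(a * a for a in loading)
    return [r - score * a for r, a in zip(residuals, loading)]

def mean_square(rows):
    """Average squared value over all arrays and genes."""
    values = [v for row in rows for v in row]
    return sum(v * v for v in values) / len(values)

random.seed(1)
n_genes = 500  # hypothetical
# One loading vector shared by both data sets (the "same error correction").
shared_loading = [random.gauss(0.0, 1.0) for _ in range(n_genes)]

def simulate_dataset(gene_means, n_arrays):
    """Arrays = data-set-specific gene means + shared bias factor + noise."""
    arrays = []
    for _ in range(n_arrays):
        bias_score = random.gauss(0.0, 1.0)
        arrays.append([m + bias_score * a + random.gauss(0.0, 0.3)
                       for m, a in zip(gene_means, shared_loading)])
    return arrays

# Gene means differ between the two data sets, as on the slide.
hmec_means = [random.uniform(6.0, 10.0) for _ in range(n_genes)]
maqc_means = [random.uniform(4.0, 12.0) for _ in range(n_genes)]
datasets = {"HMEC": simulate_dataset(hmec_means, 8),
            "MAQC": simulate_dataset(maqc_means, 8)}

results = {}
for name, arrays in datasets.items():
    residuals = center_within_dataset(arrays)
    corrected = [subtract_shared_bias(row, shared_loading) for row in residuals]
    results[name] = (mean_square(residuals), mean_square(corrected))
    print(name, "residual variance before/after:", results[name])
```

Because the correction works on one array's residuals at a time, the same step could be applied to a newly arriving array once the gene means and the loading are fixed.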
Retain Ability to Correct Bias in HMEC
Improved Fidelity
Have we improved the fidelity?
[Figure panels: "Raw data, Labs 1, 2, 3, 5"; "Raw data, Lab 4"; "Raw data, Lab 6"; "Corrected data". x-axis groups: UH, 75% UH, 25% UH, HB]
UH – Universal Human Reference RNA
HB – Human Brain Reference RNA
Improved Fidelity
Very different error types, both corrected
[Figure: differentially expressed gene across the groups UH, 75% UH, 25% UH, HB]
Defining Success
By design, should be monotone ordering
Does probability of correctly ordering increase?
[Figure: panels "Before Correction" and "After Correction"; x-axis groups: UH, 75% UH, 25% UH, HB]
Red points are not monotone → failure
Red points are monotone → success
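The success criterion on this slide can be written down directly. The sketch below assumes each gene is summarized by one value per titration group (for example a group mean); that summary, the gene names, and the expression values are hypothetical and only serve to show the check.

```python
# Titration order implied by the design: pure UH, two mixtures, pure HB.
TITRATION_ORDER = ["UH", "75% UH", "25% UH", "HB"]

def is_monotone(values):
    """True if the sequence never changes direction (ties allowed)."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)

def fraction_monotone(expression_by_gene):
    """Share of genes whose group summaries follow the titration order."""
    flags = [is_monotone([groups[g] for g in TITRATION_ORDER])
             for groups in expression_by_gene.values()]
    return sum(flags) / len(flags)

# Hypothetical group summaries for two genes, before and after correction.
before = {
    "gene_1": {"UH": 9.1, "75% UH": 8.0, "25% UH": 8.7, "HB": 7.6},  # order broken
    "gene_2": {"UH": 5.0, "75% UH": 5.4, "25% UH": 6.1, "HB": 6.5},
}
after = {
    "gene_1": {"UH": 9.0, "75% UH": 8.6, "25% UH": 8.1, "HB": 7.7},
    "gene_2": {"UH": 5.1, "75% UH": 5.5, "25% UH": 6.0, "HB": 6.4},
}

print("fraction monotone before:", fraction_monotone(before))
print("fraction monotone after: ", fraction_monotone(after))
```

Comparing the two fractions over all differentially expressed genes gives one way to answer the question of whether the probability of correct ordering increases.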
MAQC Experiment
More than Error Correction?
Can correct biases from vastly different experiments
Aggregate data from multiple labs across multiple time points
Analyze and incorporate new data as it comes in
Metagenes discovered in vitro can be used as in vivo phenotypes, however
Signatures developed in cloned cells
Lack biological variability
In vivo, other pathways will be active/inactive
Factor Evolution
Break down into multiple pathways in vivo
Evolutionary factor search to dissect and enhance signatures
Carvalho, et al., “High-dimensional sparse factor modelling: Applications in gene expression genomics”, submitted
Consider behavior of genes in vivo
Miller, et al., “An expression signature for p53 status in human
breast cancer predicts mutation status, transcriptional effects, and
patient survival”, PNAS, 102, 13550-13555 (2005)
in vitro → in vivo
[Diagram labels: "Expression differences from lactic acidosis experiment"; "Highly differentially expressed genes"; "Initial genes" ('225378_at', '225399_at', '225407_at', '225493_at', '225527_at', '225681_at', '225768_at', ...); "New genes"; "factors"; "* Mean()"]
P53 Wild Type versus Mutant
• Each factor is a collection of genes that are expressed together across all samples
P53 Wild Type versus Mutant
• Combinations of factors are predictive of important phenotypes
Tamoxifen
Didn’t receive Tamoxifen
Treated with Tamoxifen
Dark Blue
• All patients receiving Tamoxifen were ER positive
• Tamoxifen sensitivity independent of ER status
Light Blue
Endothelial Cell Signature?
Contains 143 of the 188 genes in a known
microvascular endothelial cell signature
Chi et al., “Endothelial cell diversity revealed by global
expression profiling”, PNAS, 100 (19), 2003
“independently of ER status of tumor cells, Tam
could affect the microvessel structure through the
antagonism with endothelial cells ER”
Clinical Cancer Research Vol. 7, 2656-2661, September 2001
[Tamoxifen] “inhibited tube formation by rat
microvascular endothelial cells”
Gen Pharmacol (2000) 34: 107-16
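One way to gauge whether an overlap such as 143 of 188 signature genes is surprising is a hypergeometric tail probability. The slide does not say that this test was used, and it gives neither the number of genes in the factor nor the size of the gene universe, so `factor_size` and `universe` in the usage line are hypothetical placeholders.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log10_overlap_tail(universe, signature, factor_size, overlap):
    """log10 of P(X >= overlap) when factor_size genes are drawn at random
    from a universe that contains `signature` signature genes."""
    lowest = max(overlap, factor_size - (universe - signature), 0)
    highest = min(signature, factor_size)
    if lowest > highest:
        return float("-inf")  # the requested overlap is impossible
    log_terms = [log_comb(signature, k)
                 + log_comb(universe - signature, factor_size - k)
                 - log_comb(universe, factor_size)
                 for k in range(lowest, highest + 1)]
    peak = max(log_terms)  # log-sum-exp keeps tiny probabilities representable
    total = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return total / math.log(10)

# 143 and 188 come from the slide; universe and factor_size are placeholders.
print(log10_overlap_tail(universe=20000, signature=188, factor_size=500, overlap=143))
```

Working in log space matters here because the probability for an overlap this large is far too small to represent as an ordinary float.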
Estrogen Receptors
Trained on Miller
Predictive on others (Wang)
Breast Tumor Factors in Lung
Factor behavior in Lung tissue
Endothelial cell factor
Estrogen Receptor factor
Endothelial Cell Factor
Breast Cancer Samples
Lung Cancer Samples
Estrogen Receptor Factor
Breast Cancer Samples
Lung Cancer Samples
Summary
Correction of laboratory biases
Allows aggregation of multiple data sets
Discovery of conserved metagenes relevant to
Survival
Cellular phenotypes, ER, PgR, P53
Identification of novel biology
Within a framework that allows identification of
meta-genes on single arrays
Beyond?
Concurrent modeling of multiple different tumor types
Collaborators
Statistics
Mike West
Carlos Carvalho
Dan Merl
Quanli Wang
Biology
Jen-Tsan Ashley Chi
Joe Nevins
Julia Ling-Yu Chen
Andrea Bild
Microarrays to Identify Phenotypes
Disease Diagnosis
Cancer
Alzheimer's
Survival prediction
Infection
Metastasis prediction
Psoriatic Arthritis
Drug susceptibility
Leber’s Congenital Amaurosis
Usher syndrome
...
Development
Embryonic development
Cellular differentiation
Radial symmetry
Internal structure
...
Obesity
Oligo GEArray® Mouse Obesity Microarray: OMM-017
• PharmaFrontier Co., Ltd.
• Genetel Pharmaceuticals
• Hong Kong DNA Chips
...
• Liver Int. 2005 Dec;25(6):1091-6.
• Obesity Research 11:188-194 (2003)
• Physiological Genomics 20:224-232 (2005)
...
Alzheimer's
Oligo GEArray® Human Alzheimer's Disease Microarray
• PNAS 2004 Feb 17;101(7):2173-8. Epub 2004 Feb 9
• The Journal of Neuroscience, Feb 9, 2005, 25(6):1571-1578
• Ann Neurol. 2005 Dec;58(6):909-19
...
• Primorigen Biosciences
• ProteomTech
• Ciphergen Biosystems, Inc
Cancer
Disease Diagnosis
• “Incipient Alzheimer's disease: Microarray correlation analyses reveal major transcriptional and tumor suppressor responses” PNAS, February 17, 2004, vol. 101 no. 7, 2173-2178
• “Microarray Analyses of Peripheral Blood Cells Identifies Unique Gene Expression Signature in Psoriatic Arthritis” Mol Med. 2005 Jan–Dec; 11(1-12): 21–29
Microarrays at IGSP
Joseph Nevins – Rb/E2F pathway
Jen-Tsan Ashley Chi – tumor microenvironment
Phil Febbo – gene expression as phenotypes
Anil Potti – individualized chemotherapy
Tom Kepler – activation of dendritic cells
Gregory Wray – development in echinoderms
Paul Magwene – co-expression in microorganisms
Geoffrey Ginsburg – expression in peripheral blood
John Olson – surgical oncology
Ornit Chiba-Falek –
Philip Benfey – development and cell differentiation in Arabidopsis
>95 papers published by the IGSP microarray facility since 1999
Expanding the Role of
Need not include only collection bias
Identifying signatures in other samples
Factors associated with:
Lactic acidosis
Hypoxia
Various oncogenes
Other sources
Gene lists
Simple experiment
Change in g,j
Progesterone Receptors
Trained on Miller
Predictive on others (Massague)
• Makes use of the ER factor and a new PgR specific factor
Predicting Survival
Trained on Miller
Predictive on the others (Wang)
TGF -
Progesterone Receptor
Breast Cancer Samples
Lung Cancer Samples
P53 Mutants Breast vs. Lung
Breast Cancer Samples
Lung Cancer Samples
TGF -
Breast Cancer Samples
Lung Cancer Samples
Estrogen Receptor Breast vs. Ovarian
Breast Cancer Samples
Ovarian Cancer Samples