Toward a Characterization of Gene
Expression in Single Tumor Samples
JOSEPH LUCAS
The Power of Microarrays
 Promise of “personalized medicine”
 Lack of consistency/reproducibility
 Problem with overfitting
 Microarrays from lab bench to clinic
 Data collection bias and quality control
 In vitro -> in vivo
 In vivo -> in vitro
 Translation of meta-genes from in vitro to in vivo
Overview
 Laboratory bias
 Experiments on the lab bench
 Correction doping controls
 Modeling to alleviate bias
 Tumor expression
 Factors as markers of pathway activity
 Biological relevance
 Clinical relevance
 Beyond
Oncogene Upregulation
 Human Mammary Epithelial Cells (HMEC)
 9 upregulated oncogenes and one set of controls
 Data collected in three batches
 Demonstrates collection bias
 Bild et al., “Oncogenic pathway signatures in human cancers as
a guide to targeted therapies” Nature 439 (19), 2006
Collection Bias
 Doping control
 Should be identical
across all observations
Collection Bias
 Consistent errors across many
genes
 May obscure interesting biology
Modeling to Correct Collection Bias
Systematic Errors
NFE2L1
Before Subtracting Error
Systematic Errors, Corrected
Before Subtracting Error
NFE2L1
Upregulation
of MYC
• NFE2L1 – regulation of apoptosis
• MYC binding sequence in promoter
• GENES & DEVELOPMENT (2003-01-15)
After Subtracting Error
Single Sample – Factor Modeling
 Personalized medicine
 Need to deal with one array at a time
 Cannot use the same correction technique
 Relative levels of genes within a sample should be informative
Design Matrix
Latent Factors
Latent factors for Correction of Lab Bias
 Microenvironment Experiments
 Chen et al., “Genomic analysis of response to lactic acidosis in
human cancers”
 Exact same conditions as oncogene experiment
 24 “control” arrays split across 2 labs and 4 time
points
 Uncorrelated measurements of gene expression?
Correlation between Two Different Samples
Microenvironment, array #1
Consistently Correlated across all Pairs
Correlation of ≈ -0.6
Microenvironment, array #1
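The pairwise comparison on these slides can be sketched in a few lines. This is a minimal illustration, assuming that the quantity being correlated is each array's residual after subtracting the per-gene mean across arrays (the slides do not state the exact preprocessing). The function name and the toy data are hypothetical.

```python
import numpy as np

def pairwise_residual_correlation(expr):
    """expr: genes x arrays matrix of log expression.

    Returns the arrays x arrays matrix of Pearson correlations
    between gene-centered residuals.
    """
    residuals = expr - expr.mean(axis=1, keepdims=True)  # remove each gene's mean
    return np.corrcoef(residuals, rowvar=False)          # columns (arrays) are the variables

# Hypothetical toy data: 500 genes, 4 control arrays sharing one bias pattern.
rng = np.random.default_rng(0)
gene_means = rng.normal(8.0, 1.0, size=(500, 1))
bias = rng.normal(0.0, 1.0, size=(500, 1))
signs = np.array([[1.0, 1.0, -1.0, -1.0]])  # two arrays pushed one way, two the other
noise = rng.normal(0.0, 0.1, size=(500, 4))
expr = gene_means + bias * signs + noise

corr = pairwise_residual_correlation(expr)
print(np.round(corr, 2))
```

If the control arrays differed only by independent noise, the off-diagonal entries would be close to zero up to the small negative offset that centering introduces. Large positive or negative entries indicate a shared, systematic pattern.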
Factor Model almost Eliminates Correlation
Before Correction
After Correction
Microenvironment, array #1
Factor Model almost Eliminates Correlation
Before Correction
After Correction
Oncogene, array #7
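As a rough illustration of how a latent factor can absorb a shared bias pattern, the sketch below removes the leading singular component of the gene-centered arrays. It uses a plain truncated SVD as a stand-in for the sparse factor model the talk refers to and should not be read as the speaker's method. The function name, the `n_factors` parameter, and the toy data are assumptions.

```python
import numpy as np

def remove_latent_factors(expr, n_factors=1):
    """expr: genes x arrays matrix of log expression.

    Centers each gene, estimates the top n_factors singular components
    of the residuals, and subtracts them as the estimated bias.
    Returns (corrected expression, estimated bias).
    """
    residuals = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    est_bias = (u[:, :n_factors] * s[:n_factors]) @ vt[:n_factors, :]
    return expr - est_bias, est_bias

# Hypothetical toy data: 200 genes, 6 arrays, one rank-1 bias plus small noise.
rng = np.random.default_rng(1)
gene_means = rng.normal(8.0, 1.0, size=(200, 1))
loading = rng.normal(0.0, 1.0, size=(200, 1))             # per-gene sensitivity to the bias
scores = np.array([[1.0, 1.0, 1.0, -1.0, -1.0, -1.0]])    # per-array bias level
expr = gene_means + loading @ scores + rng.normal(0.0, 0.05, size=(200, 6))

corrected, est_bias = remove_latent_factors(expr, n_factors=1)
rel_err = np.linalg.norm(est_bias - loading @ scores) / np.linalg.norm(loading @ scores)
print(f"relative error of estimated bias: {rel_err:.3f}")
```

A truncated SVD removes whatever dominates the variance, so on real tumor data it could also remove biology. That is one reason to estimate the bias factors on control arrays, where the true signal is expected to be constant.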
Microarray Quality Control (MAQC)
 120 arrays, also U133+ 2.0
 6 different labs
 5 repetitions per group
 4 groups
 Universal Human Reference RNA
 Human Brain Reference RNA
 Titration of RNA to form groups
 Nature Biotechnology, all of volume 24 (2006)
Example 2
 We believe these are collection errors
 Due to pH, temperature, duration before washing, etc
 Errors should be universal for U133+ 2.0 arrays
 Keep all oncogene and microenvironment control
observations
 Keep all 120 observations from MAQC
 Mean expression for each gene is different between
MAQC and HMEC’s
 Refit model, but assume error correction is same!
Retain Ability to Correct Bias in HMEC
Improved Fidelity
 Have we improved the fidelity?
Panels: Raw data, Labs 1,2,3,5; Raw data, Lab 4; Raw data, Lab 6; Corrected data
Groups: UH, 75% UH, 25% UH, HB
UH – Universal Human Reference RNA
HB – Human Brain Reference RNA
Improved Fidelity
 Very different error types, both corrected
Differentially Expressed Gene
Groups: UH, 75% UH, 25% UH, HB
Defining Success
 By design, should be monotone ordering
 Does probability of correctly ordering increase?
Panels: Before Correction; After Correction
Groups in both panels: UH, 75% UH, 25% UH, HB
Red points are not monotone → failure
Red points are monotone → success
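The success criterion can be sketched as follows. Because the four groups are a titration from UH to HB, a gene's expected level should move in one direction along UH, 75% UH, 25% UH, HB. This is a minimal sketch, assuming replicates are summarized by their group mean (the slides do not say how replicates are combined). The function names and the toy values are hypothetical.

```python
from statistics import mean

GROUPS = ["UH", "75% UH", "25% UH", "HB"]  # titration order from the slides

def is_monotone(values):
    """True if values never increase or never decrease along the titration."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def fraction_monotone(expression_by_gene):
    """expression_by_gene maps gene -> {group name: [replicate values]}."""
    hits = 0
    for groups in expression_by_gene.values():
        means = [mean(groups[g]) for g in GROUPS]
        hits += is_monotone(means)
    return hits / len(expression_by_gene)

# Hypothetical toy values: gene_a decreases steadily, gene_b does not.
toy = {
    "gene_a": {"UH": [9.1, 9.0], "75% UH": [8.6, 8.7], "25% UH": [7.9, 8.0], "HB": [7.2, 7.4]},
    "gene_b": {"UH": [5.0, 5.2], "75% UH": [5.9, 6.1], "25% UH": [5.4, 5.3], "HB": [6.5, 6.6]},
}
print(fraction_monotone(toy))  # prints 0.5
```

Computing this fraction on the raw and on the corrected data gives one way to ask whether the probability of a correct ordering has increased.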
MAQC Experiment
More than Error Correction?
 Can correct biases from vastly different experiments
 Aggregate data from multiple labs across multiple time points
 Analyze and incorporate new data as it comes in
 Metagenes discovered in vitro can be used as in vivo phenotypes, however
Signatures developed in cloned cells
Lack biological variability
In vivo, other pathways will be active/inactive
Factor Evolution
 Break down into multiple pathways in vivo
 Evolutionary factor search to dissect and enhance signatures


Carvalho et al., “High-dimensional sparse factor modelling: Applications in gene expression genomics”, submitted
Consider behavior of genes in vivo

Miller, et al., “An expression signature for p53 status in human
breast cancer predicts mutation status, transcriptional effects, and
patient survival”, PNAS, 102, 13550-13555 (2005)
in vitro → in vivo
* Mean()
Initial
genes
New
genes
'225378_at'
'225399_at'
'225407_at'
'225493_at'
'225527_at'
'225681_at'
'225768_at'
...
Expression
differences from
lactic acidosis
experiment
Highly
differentially
expressed genes
factors
P53 Wild Type versus Mutant
• Each factor is a collection of genes
that are expressed together across all
samples
P53 Wild Type versus Mutant
• Combinations of factors
are predictive of important
phenotypes
Tamoxifen
Didn’t receive Tamoxifen
Treated with Tamoxifen
Dark Blue
• All patients receiving Tamoxifen were ER positive
• Tamoxifen sensitivity independent of ER status
Light Blue
Endothelial Cell Signature?
 Contains 143 of the 188 genes in a known
microvascular endothelial cell signature

Chi et al., “Endothelial cell diversity revealed by global
expression profiling”, PNAS, 100 (19), 2003
 “independently of ER status of tumor cells, Tam
could affect the microvessel structure through the
antagonism with endothelial cells ER”

Clinical Cancer Research Vol. 7, 2656-2661, September 2001
 [Tamoxifen] “inhibited tube formation by rat
microvascular endothelial cells”

Gen Pharmacol (2000) 34: 107-16
Estrogen Receptors
Trained on Miller
Predictive on others (Wang)
Breast Tumor Factors in Lung
 Factor behavior in Lung tissue
 Endothelial cell factor
 Estrogen Receptor factor
Endothelial Cell Factor
Breast Cancer Samples
Lung Cancer Samples
Estrogen Receptor Factor
Breast Cancer Samples
Lung Cancer Samples
Summary
 Correction of laboratory biases
 Allows aggregation of multiple data sets
 Discovery of conserved metagenes relevant to
 Survival
 Cellular phenotypes, ER, PgR, P53
 Identification of novel biology
 Within a framework that allows identification of
meta-genes on single arrays
 Beyond?

Concurrent modeling of multiple different tumor types
Collaborators
Statistics
Mike West
Carlos Carvalho
Dan Merl
Quanli Wang
Biology
Jen-Tsan Ashley Chi
Joe Nevins
Julia Ling-Yu Chen
Andrea Bild
Microarrays to Identify Phenotypes
Disease Diagnosis
Cancer
Alzheimer's
Survival prediction
Infection
Metastasis prediction
Psoriatic Arthritis
Drug susceptibility
Leber’s Congenital Amaurosis
Usher syndrome
...
Development
Embryonic development
Cellular differentiation
Radial symmetry
Internal structure
...
Obesity
Oligo GEArray® Mouse Obesity
Microarray: OMM-017
•PharmaFrontier Co., Ltd.
•Genetel Pharmaceuticals
•Hong Kong DNA Chips
...
•Liver Int. 2005 Dec;25(6):1091-6.
•Obesity Research 11:188-194 (2003)
•Physiological Genomics 20:224-232 (2005)
...
Alzheimer's
Oligo GEArray® Human
Alzheimer's Disease Microarray
•PNAS 2004 Feb 17;101(7):2173-8.
Epub 2004 Feb 9
•The Journal of Neuroscience, Feb 9,
2005, 25(6):1571-1578
•Ann Neurol. 2005 Dec;58(6):909-19
...
•Primorigen Biosciences
•ProteomTech
•Ciphergen Biosystems, Inc
Cancer
Disease Diagnosis
“Incipient Alzheimer's disease:
Microarray correlation analyses
reveal major transcriptional and
tumor suppressor responses” PNAS,
February 17, 2004, vol. 101 no. 7,
2173-2178
• “Microarray Analyses of Peripheral Blood
Cells Identifies Unique Gene Expression
Signature in Psoriatic Arthritis” Mol Med.
2005 Jan–Dec; 11(1-12): 21–29
Microarrays at IGSP
Joseph Nevins – Rb/E2F pathway
Jen-Tsan Ashley Chi – tumor microenvironment
Phil Febbo – gene expression as phenotypes
Anil Potti – individualized chemotherapy
Tom Kepler – activation of dendritic cells
Gregory Wray – development in echinoderms
Paul Magwene – co-expression in microorganisms
Geoffrey Ginsburg – expression in peripheral blood
John Olson – surgical oncology
Ornit Chiba-Falek –
Philip Benfey – development and cell differentiation in Arabidopsis
>95 papers published by the IGSP microarray facility since 1999
Expanding the Role of 
 Need not include only collection bias
 Identifying signatures in other samples
 Factors associated with:
 Lactic acidosis
 Hypoxia
 Various oncogenes
 Other sources
 Gene lists
Simple experiment
Change in g,j
Estrogen Receptors
Trained on Miller
Predictive on others (Wang)
Progesterone Receptors
Trained on Miller
Predictive on others (Massague)
• Makes use of the ER
factor and a new PgR
specific factor
Predicting Survival
Trained on Miller
Predictive on the others (Wang)
TGF-β
Progesterone Receptor
Breast Cancer Samples
Lung Cancer Samples
P53 Mutants Breast vs. Lung
Breast Cancer Samples
Lung Cancer Samples
TGF-β
Breast Cancer Samples
Lung Cancer Samples
Estrogen Receptor Breast vs. Ovarian
Breast Cancer Samples
Ovarian Cancer Samples