Breast Cancer, Expression Profiles and
Binary Regression in 7000 Dimensions
Computational Diagnostics
We are a new research group in the department of
Computational Molecular Biology at the Max Planck
Institute for Molecular Genetics in Berlin-Dahlem.
Our group is part of the Berlin Center for Genome
Based Bioinformatics and participates in the NGFN
(National Genome Research Network).
Rainer Spang, Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West
Duke Medical Center & Duke University
Estrogen Receptor Status
•7000 genes
Research
•49 breast tumors
A comprehensive understanding of the mostly
subtle differences in gene expression in
patient-specific cell samples is crucial for elucidating
the molecular characteristics of diseases as well as
for the optimal choice of treatment. Large-scale
gene expression profiling allows for a systematic
investigation of the molecular characteristics of
diseases. Recently, there has been tremendous
progress in the development of technologies that
allow for the parallel measurement of expression
levels for tens of thousands of genes. However, it is
still very challenging to interpret the data and use
it in clinical decision processes.
•25 ER+
The focus of this group is to develop statistical
methodology for the use of gene expression
profiles in medical diagnostics. We aim to identify
patterns in expression profiles that improve or
facilitate diagnosis, help to predict clinical
outcome or refine common diagnostic schemes.
Members
Stefan Bentink
Web: www.molgen.mpg.de/~bentink
email: [email protected]
•24 ER-
7000 Numbers Are More
Numbers Than We Need
Overfitting: We Cannot
Identify a Model
Informative Priors
•There are many different models that assign
high probabilities for ER+ tumors and low
probabilities for ER- tumors in the training set
•For a new patient we find among these models
some that support that she is ER+ and others
that predict she is ER-
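The two bullets above can be made concrete with a small numerical sketch. It uses random numbers in place of real expression profiles and a plain linear score instead of the poster's regression model (both are assumptions made only for illustration; the names `w_fit`, `w_plus`, `w_minus` and `x_new` are hypothetical). With 49 tumors and 7000 genes, there are directions in gene space that the training data cannot see, so two weight vectors can agree on every training tumor and still disagree on a new patient.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tumors, n_genes = 49, 7000                    # sizes quoted on the poster
X = rng.standard_normal((n_tumors, n_genes))    # synthetic profiles (assumption)
y = np.array([1.0] * 25 + [-1.0] * 24)          # +1 = ER+, -1 = ER-
x_new = rng.standard_normal(n_genes)            # hypothetical new patient

X_pinv = np.linalg.pinv(X)                      # shape (n_genes, n_tumors)

# Minimum-norm weights that reproduce the training labels exactly.
w_fit = X_pinv @ y

# Part of the new profile that is invisible to the training data:
# X @ d is numerically zero, so adding multiples of d never changes the fit.
d = x_new - X_pinv @ (X @ x_new)

# Scale d so that it outweighs the score of w_fit on the new patient.
c = (abs(w_fit @ x_new) + 1.0) / (d @ d)
w_plus = w_fit + c * d                          # pushes the new patient to ER+
w_minus = w_fit - c * d                         # pushes the new patient to ER-

for name, w in [("w_plus", w_plus), ("w_minus", w_minus)]:
    train_ok = np.array_equal(np.sign(X @ w), y)
    print(name, "fits all training tumors:", train_ok,
          "| score for new patient:", float(x_new @ w))
```

Both models are indistinguishable on the training set, so the data alone cannot say which one to trust; this is the identification problem that the informative prior is meant to resolve.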
Likelihood
Prior
Posterior
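The three labels above are the ingredients of Bayes' rule. Written out in generic notation (the symbols $\beta$ for the regression coefficients and $D$ for the training data are introduced here and do not appear on the poster):

```latex
p(\beta \mid D)
  \;=\; \frac{p(D \mid \beta)\, p(\beta)}{\int p(D \mid \beta')\, p(\beta')\, d\beta'}
  \;\propto\;
  \underbrace{p(D \mid \beta)}_{\text{likelihood}}
  \;\times\;
  \underbrace{p(\beta)}_{\text{prior}}
```

When the likelihood is nearly flat over many models, as in the overfitting case, the shape of the posterior is determined largely by the prior.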
Phone: (+49 30) 8413-1352
Claudio Lottaz
Prior Choice
Web: www.molgen.mpg.de/~lottaz
email: [email protected]
Phone: (+49 30) 8413-1352
Center
Orientation
Not too wide, not too narrow
Florian Markowetz
auto adjusting model
Web: www.molgen.mpg.de/~markowet
hyper-parameters with
their own priors
email: [email protected]
Phone: (+49 30) 8413-1352
Rainer Spang (head)
Assumptions on the model correspond to
assumptions on the diagnosis
orthogonal super-genes
Web: www.molgen.mpg.de/~spang
email: [email protected]
Phone: (+49 30) 8413-1352
Stefanie Scheid
Web: www.molgen.mpg.de/~scheid
email: [email protected]
Phone: (+49 30) 8413-1352
Which Genes Have Driven the Prediction?
Gene                          Weight
nuclear factor 3 alpha        0.853
cysteine rich heart protein   0.842
estrogen receptor             0.840
intestinal trefoil factor     0.840
x box binding protein 1       0.835
gata 3                        0.818
ps 2                          0.818
liv1                          0.812
... many many more ...        ...
Publications
Prediction and uncertainty in the analysis of gene expression profiles
Rainer Spang, Carrie Blanchette, Harry Zuzan, Jeffrey R. Marks, Joseph Nevins and Mike West
Proceedings of the German Conference on Bioinformatics GCB 2001
Predicting the clinical status of human breast cancer by using gene expression profiles
West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR.
Proc Natl Acad Sci U S A. 2001 Sep 25;98(20):11462-7
Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA
microarray analysis
Ishida S, Huang E, Zuzan H, Spang R, Leone G, West M, Nevins JR.
Mol Cell Biol. 2001 Jul;21(14):4684-99
What are the additional assumptions that came in by the prior?
•The model cannot be dominated by only a few super-genes (genes!)
•The diagnosis is done based on global changes in the expression profiles influenced by many
genes
•The assumptions are neutral with respect to the individual diagnosis
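One way to see how such a prior acts is the following sketch. It is not the authors' implementation: it assumes that the orthogonal super-genes come from a singular value decomposition, uses a logistic link, replaces the hyper-priors by a single fixed prior variance, and runs on synthetic data. The names `fit_map`, `prior_var`, `theta` and `beta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tumors, n_genes = 49, 7000                      # sizes quoted on the poster
y = np.array([1.0] * 25 + [0.0] * 24)             # 1 = ER+, 0 = ER-

# Synthetic expression matrix (assumption): 200 genes shift with ER status.
X = rng.standard_normal((n_tumors, n_genes))
X[:, :200] += y[:, None] - 0.5
X -= X.mean(axis=0)                               # center every gene

# SVD: the columns of U are orthonormal "super-genes" (one value per tumor),
# each one a linear combination of all 7000 genes.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
keep = s > 1e-8 * s[0]                            # centering removes one direction
U, s, Vt = U[:, keep], s[keep], Vt[keep]

def fit_map(U, y, prior_var, n_iter=200):
    """MAP estimate for logistic regression on super-genes with a
    zero-centered Gaussian prior N(0, prior_var) on every coefficient."""
    theta = np.zeros(U.shape[1])
    # Safe step size: U has orthonormal columns, so the curvature of the
    # negative log posterior is at most 0.25 + 1 / prior_var.
    step = 1.0 / (0.25 + 1.0 / prior_var)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(U @ theta)))
        grad = U.T @ (p - y) + theta / prior_var
        theta -= step * grad
    return theta

theta = fit_map(U, y, prior_var=1.0)

# Translate super-gene coefficients back into one weight per gene.
beta = Vt.T @ (theta / s)

prob_er_pos = 1.0 / (1.0 + np.exp(-(X @ beta)))
print("training calls correct:", np.mean((prob_er_pos > 0.5) == (y == 1)))
print("genes with non-zero weight:", np.count_nonzero(beta), "of", n_genes)
```

In this sketch the prior shrinks every super-gene coefficient toward zero, and each super-gene mixes all genes, so the implied gene weights are spread over the whole profile. With all coefficients at zero, the model predicts a probability of 0.5 for any profile, which is one possible reading of "neutral with respect to the individual diagnosis".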