Prognostic Model Building with
Biomarkers in Pharmacogenomics Trials
Li-an Xu & Douglas Robinson
Statistical Genetics & Biomarkers
Exploratory Development, Global Biometric Sciences
Bristol-Myers Squibb
2006 FDA/Industry Statistics Workshop
Theme - Statistics in the FDA and Industry: Past, Present, and Future
Washington, DC
September 27-29, 2006
Outline
Statistical Challenges in Prognostic Model Building
Data quantity and quality across multiple platforms
Dimension reduction in model building process
Model performance measures
Realistic assessment of model performance
Handling correlated predictors: when p >> n
Data Quantity and Quality Across Platforms
Tumor samples for mRNA
Trial A sample size: 161 subjects
134 usable (sufficient quality and quantity) mRNA samples (85%)
Trial B sample size: 110 subjects
83 usable mRNA samples (75%)
Plasma protein profiling (Liquid Chromatography / Mass Spectrometry)
Trial B sample size: 110 subjects
90 usable plasma samples (82%)
Even if sample collection is mandatory, usable sample size <
subject sample size
Need to design studies based on expected usable
sample size
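The design point above amounts to simple arithmetic: divide the number of usable samples the analysis needs by the expected usable fraction. A minimal sketch, using the Trial B counts from this slide (83 and 90 usable samples out of 110 subjects); the function name `required_enrollment` and the target of 100 usable samples are assumptions for illustration only.

```python
import math

def required_enrollment(target_usable: int, usable_fraction: float) -> int:
    """Subjects to enroll so the expected number of usable samples reaches the target."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(target_usable / usable_fraction)

# Usable fractions computed from the Trial B counts on this slide
mrna_fraction = 83 / 110
plasma_fraction = 90 / 110

# Hypothetical target of 100 usable samples (an assumption, not from the trials)
print(required_enrollment(100, mrna_fraction))
print(required_enrollment(100, plasma_fraction))
```

This only matches the expected count; a real design would likely also allow for variability in the usable fraction.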
Dimension Reduction in Prognostic Model Building
Number of potential predictors is greater than number of
subjects (p>>n) in high throughput biomarker studies
No unique solutions in prognostic model fitting with
traditional methods
Regularized methods can provide some possible solutions
Penalized logistic regression (PLR) + Recursive
Feature Elimination (RFE)
Threshold gradient descent + RFE
Further dimension reduction may still be needed
Incorporate prior information (e.g. results from
preclinical studies as the starting point for p)
Intersection of single-biomarker results from multiple
statistical methods
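To make the PLR + RFE idea concrete, here is a minimal pure-Python sketch. It assumes a ridge (L2) penalty fitted by gradient descent and drops one feature per step; the slides do not specify the penalty, the optimizer, or the elimination schedule, so these are illustrative choices. The function names and the simulated data are hypothetical.

```python
import math
import random

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=200):
    """Fit L2-penalized logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The lam * w term shrinks weights toward zero; the intercept is not penalized
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rfe_ranking(X, y, lam=0.1):
    """Refit, drop the feature with the smallest |weight|, repeat.
    Returns feature indices ordered from first eliminated to last surviving."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > 1:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = fit_ridge_logistic(X_sub, y, lam=lam)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(active.pop(weakest))
    return eliminated + active

# Simulated data (not from the trials): 16 patients, 24 features, so p > n
rng = random.Random(0)
y = [i % 2 for i in range(16)]
X = [[rng.gauss(0, 1) for _ in range(24)] for _ in y]
for row, yi in zip(X, y):
    row[0] = (2.0 if yi else -2.0) + rng.gauss(0, 0.5)  # feature 0 carries the signal

ranking = rfe_ranking(X, y)
print("last surviving features:", ranking[-3:])
```

The sketch only produces an elimination order. Choosing how many features to keep would still need a cross-validated error estimate at each model size.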
Dimension Reduction Through Penalized
Logistic Regression with Recursive Feature
Elimination to Select Genes
[Figure: training set of ~22,000 genes by patients, reduced by recursive feature elimination down to 1 gene; plot of average cross-validation error against number of predictors in model]
Choose the model with the smallest cross-validation error and fewest genes
Dimension Reduction Through Preclinical Studies
Predicting cell line sensitivity to a compound
18 cancer cell lines (12 sensitive, 6 resistant)
Identified top 200 genes associated with in vitro
sensitivity/resistance
[Figure: example of one gene. Expression level (axis from 0 to 20,000) across the 18 cancer cell lines, grouped as sensitive and resistant, with an expression scale from low to high. Cell lines shown: Zr-75-1, MCF7, Zr-75-30, BT474, Her2MCF7, HCC1428, MDAMB436, HCC70, Hs578T, SkBr3, MDAMB157, HCC1954, BT20, BT549, HCC38, MDAMB435S, AU565, HCC1806]
Predicting Response in Trial A
            All treated patients   Patients included in the genomics analysis
N           161                    134
Response    29 (18%)               23 (17%)

Model                                                           PPV (95% CI)       NPV (95% CI)       Sensitivity (95% CI)   Specificity (95% CI)   Error
Starting with full gene list, resulting in 6-gene model         0 (0-0.30)         0.81 (0.69-0.89)   0 (0-0.26)             0.84 (0.72-0.91)       0.580
Starting with preclinical top 200, resulting in 10-gene model   0.45 (0.21-0.72)   0.89 (0.79-0.95)   0.45 (0.21-0.72)       0.89 (0.79-0.95)       0.326
Dimension reduction by using prior preclinical results
seemed to help in this trial
Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods
Method     Resp1   Resp2   Resp3   Resp4   TTP
Log Reg    X       X       X       X
t-Test     X       X       X       X
Cox                                        X

[Figure: Venn diagram of probesets selected by each method. Logistic regression: 297 probesets; t-test: 396 probesets; Cox proportional hazards: 446 probesets. Labeled overlap counts: 46, 97, and 51]
Intersection resulted in 51 potential candidates
It may be more beneficial to start model building with this set than
the complete set of potential predictors (work currently in progress)
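The intersection step itself is a plain set operation once each method has produced its list of significant probesets. A minimal sketch, with hypothetical probeset identifiers and a hypothetical helper name `intersect_candidates`; the real lists (297, 396, and 446 probesets) are not reproduced here.

```python
def intersect_candidates(*selected):
    """Return the probesets flagged by every method."""
    sets = [set(s) for s in selected]
    return set.intersection(*sets) if sets else set()

# Hypothetical probeset identifiers, for illustration only
logistic = {"ps_001", "ps_002", "ps_003", "ps_007"}
t_test = {"ps_002", "ps_003", "ps_005", "ps_007"}
cox = {"ps_003", "ps_004", "ps_007", "ps_009"}

candidates = intersect_candidates(logistic, t_test, cox)
print(sorted(candidates))
```

The resulting set would then serve as the starting predictor list for model building in place of the complete set.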
Model Performance Measures
Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance
Dependent on the threshold
Area under the ROC curve (AUC) may be a better measure for comparing models
[Figure: three panels (Model 1, Model 2, Model 3) plotting response probability (0 to 1) by response status (Non-Responder, Responder). These figures are from simulated perfect predictors]
All three models yield complete separation between responders and non-responders
Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior
AUC correctly shows equivalence

           Sensitivity   Specificity   PPV    NPV    AUC
Model 1    0.73          1             1      0.79   1
Model 2    1             1             1      1      1
Model 3    1             0.77          0.81   1      1
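The contrast between threshold-dependent measures and AUC can be reproduced on a small made-up example. The sketch below uses hypothetical predicted probabilities for three models that each separate responders from non-responders completely but at different probability levels. The values are invented and do not reproduce the simulation behind the slide. AUC is computed as the fraction of responder/non-responder pairs ranked correctly.

```python
def threshold_metrics(probs, labels, threshold=0.5):
    """Sensitivity, specificity, PPV, NPV when predicting responder if prob >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def auc(probs, labels):
    """Fraction of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 4 non-responders (0) followed by 4 responders (1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = {
    "Model 1": [0.10, 0.15, 0.20, 0.25, 0.35, 0.45, 0.60, 0.70],  # separation below 0.5
    "Model 2": [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90],  # separation at 0.5
    "Model 3": [0.30, 0.40, 0.55, 0.65, 0.75, 0.80, 0.90, 0.95],  # separation above 0.5
}

for name, probs in models.items():
    print(name, threshold_metrics(probs, labels), "AUC:", auc(probs, labels))
```

At the 0.5 cutoff only the second model looks perfect, while the AUC is the same for all three because it depends only on the ranking of the predicted probabilities.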
Realistic Assessment of Model Performance
When sample size is reasonably large
Split sample into a training set and an
independent test set
Build the model on the training
set and test the model
performance on the test set
Pro: One independent test of model
performance for the model picked in
the training set
Cons:
When sample size is small, the
estimate of performance may
have a large variance
Reduced sample size for training
may yield sub-optimal model
Entire model building procedure
should be cross-validated
• Christophe Ambroise & Geoffrey J.
McLachlan, PNAS 99(10): 2002
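Cross-validating the entire procedure means that biomarker selection is redone inside every training fold, so held-out samples never influence which predictors are chosen. The sketch below contrasts that with selecting features once on all samples. It assumes a simple filter (difference in class means) and a nearest-centroid classifier as stand-ins for the actual selection and modeling methods; the function names and the pure-noise simulated data are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def top_features(X, y, m):
    """Indices of the m features with the largest absolute difference in class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, yi in zip(X, y) if yi == 1]
        neg = [row[j] for row, yi in zip(X, y) if yi == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]

def centroid_predict(X_train, y_train, X_test, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    cents = {}
    for label in (0, 1):
        rows = [r for r, yi in zip(X_train, y_train) if yi == label]
        cents[label] = [mean([r[j] for r in rows]) for j in feats]

    def dist(row, label):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[label]))

    return [1 if dist(row, 1) < dist(row, 0) else 0 for row in X_test]

def cv_error(X, y, k, m, rng, select_inside=True):
    """k-fold error; assumes every training fold contains both classes."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    outside_feats = top_features(X, y, m)  # sees every sample, including held-out ones
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = top_features(X_tr, y_tr, m) if select_inside else outside_feats
        preds = centroid_predict(X_tr, y_tr, [X[i] for i in fold], feats)
        wrong += sum(p != y[i] for p, i in zip(preds, fold))
    return wrong / len(y)

# Pure-noise simulated data: 20 subjects, 500 features, no real signal
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(0, 1) for _ in range(500)] for _ in y]

honest = cv_error(X, y, k=5, m=10, rng=random.Random(2), select_inside=True)
biased = cv_error(X, y, k=5, m=10, rng=random.Random(2), select_inside=False)
print("selection inside folds:", honest)
print("selection on all data: ", biased)
```

With no true signal, an honest error estimate should sit near 0.5. Selecting features on all samples before cross-validation would typically be expected to give a lower, overly optimistic figure.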
Realistic Assessment of Model Performance
When sample size is small, one cannot split data into training /
test set
Cross-validation alone is a reasonable alternative
Warning: Initial performance estimate may be misleading
[Figure: cross-validated AUC against number of predictors, showing individual runs and the average AUC]
Cross-validation should be repeated multiple times
Allows one to observe effects of sampling variability
The average of replicate estimators gives a more accurate assessment of model performance
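Repeating cross-validation with different random partitions can be sketched as follows. The example assumes a nearest-centroid score as a stand-in for the actual prognostic model and pools held-out scores across folds before computing AUC; both are illustrative choices, and the simulated data and function names are hypothetical.

```python
import random
import statistics

def auc(scores, labels):
    """Fraction of (responder, non-responder) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def cv_auc(X, y, k, rng):
    """One k-fold run; assumes every training fold contains both classes."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    scores = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        c_pos = centroid([X[i] for i in train if y[i] == 1])
        c_neg = centroid([X[i] for i in train if y[i] == 0])
        for i in fold:
            # Higher score means closer to the responder centroid
            scores[i] = sq_dist(X[i], c_neg) - sq_dist(X[i], c_pos)
    return auc(scores, y)

def repeated_cv_auc(X, y, k=5, n_repeats=20, seed=0):
    """Repeat k-fold cross-validation with a fresh random partition each time."""
    rng = random.Random(seed)
    return [cv_auc(X, y, k, rng) for _ in range(n_repeats)]

# Simulated data: 30 subjects, 2 features, responders shifted on the first feature
data_rng = random.Random(42)
y = [0] * 15 + [1] * 15
X = [[data_rng.gauss(yi, 1.0), data_rng.gauss(0, 1.0)] for yi in y]

runs = repeated_cv_auc(X, y)
print("single-run AUC range:", min(runs), "to", max(runs))
print("average AUC over runs:", statistics.mean(runs))
```

The spread between the smallest and largest single-run AUC shows how much one partition alone can mislead, while the average over runs is the steadier summary.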
Handling Correlated Predictors: When p >> n
Complex correlation structure (mRNA as example)
Multiple probe sets interrogate the same gene
Multiple genes function together in pathways
Not all pathways are known
Multiple response definitions that are interrelated
False positive genes may be correlated with true
positives
Most prognostic modeling techniques do not handle this
well
Recursive feature elimination may remove important
predictors because of correlations
This is an open research problem
Summary
Need to design studies based on expected usable sample size
Dimension reduction in the model building process
Overfitting problem can be mitigated by regularized methods
To further reduce the candidate set of predictors
Preclinical information can be useful
Intersection of single-biomarker results by different statistical
methods may also be useful
Model performance
Independent test set may be important for validation
purposes. When sample size is small, cross-validation is a
viable alternative.
Cross-validation should include biomarker selection
procedures and needs to be performed appropriately
Cross-validation should be repeated multiple times
Performance measures should be carefully chosen when
comparing multiple models. AUC often is a good choice.
Handling correlated predictors is still an open research problem
Acknowledgments
Can Cai
Scott Chasalow
Ed Clark
Mark Curran
Ashok Dongre
Matt Farmer
Alexander Florczyk
Shirin Ford
Susan Galbraith
Ji Gao
Nancy Gustafson
Ben Huang
Tom Kelleher
Christiane Langer
Hyerim Lee
Haolan Lu
David Mauro
Shelley Mayfield
Oksana Mokliatchouk
Relekar Padmavathibai
Barry Paul
Lynn Ploughman
Amy Ronczka
Katy Simonsen
Eric Strittmatter
Dana Wheeler
Shujian Wu
Shuang Wu
Kim Zerba
Renping Zhang