Li-An - American Statistical Association


Prognostic Model Building with
Biomarkers in Pharmacogenomics Trials
Li-an Xu & Douglas Robinson
Statistical Genetics & Biomarkers
Exploratory Development, Global Biometric Sciences
Bristol-Myers Squibb
2006 FDA/Industry Statistics Workshop
Theme - Statistics in the FDA and Industry: Past, Present, and Future
Washington, DC
September 27-29, 2006
Outline

Statistical Challenges in Prognostic Model Building

Data quantity and quality across multiple platforms

Dimension reduction in model building process

Model performance measures

Realistic assessment of model performance

Handling correlated predictors: when p >> n
Data Quantity and Quality Across Platforms

Tumor samples for mRNA
  • Trial A Sample Size: 161 Subjects
    • 134 usable (sufficient quality and quantity) mRNA samples (85%)
  • Trial B Sample Size: 110 Subjects
    • 83 usable mRNA samples (75%)
Plasma protein profiling (Liquid Chromatography / Mass Spectrometry)
  • Trial B Sample Size: 110 Subjects
    • 90 usable plasma samples (82%)
Even if sample collection is mandatory, usable sample size < subject sample size
Need to design studies based on expected usable sample size
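The design point above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming the usable fraction from an earlier trial is a reasonable planning estimate; the function name `required_enrollment` and the target of 100 usable samples are hypothetical and not from the slides.

```python
import math


def required_enrollment(target_usable: int, usable_fraction: float) -> int:
    """Subjects to enroll so the expected number of usable samples meets the target."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(target_usable / usable_fraction)


# Usable fractions reported for Trial B on this slide
trial_b_mrna = 83 / 110      # about 75% usable mRNA samples
trial_b_plasma = 90 / 110    # about 82% usable plasma samples

# Hypothetical planning target: 100 usable samples per platform
print(required_enrollment(100, trial_b_mrna))
print(required_enrollment(100, trial_b_plasma))
```

This only adjusts for the expected loss; it does not account for uncertainty in the usable fraction itself.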
Dimension Reduction in Prognostic Model Building



Number of potential predictors is greater than number of subjects (p >> n) in high-throughput biomarker studies
  • No unique solutions in prognostic model fitting with traditional methods
Regularized methods can provide some possible solutions
  • Penalized logistic regression (PLR) + Recursive Feature Elimination (RFE)
  • Threshold gradient descent + RFE
Further dimension reduction may still be needed
  • Incorporate prior information (e.g. results from preclinical studies as the starting point for p)
  • Intersection of single-biomarker results from multiple statistical methods
Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes
[Figure: Training set matrix (genes by patients) reduced recursively from ~22,000 genes down to 1 gene, alongside a plot of average cross-validation error versus number of predictors in model.]
Choose the model with the smallest cross-validation error and fewest genes
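The selection rule on this slide (smallest cross-validation error, then fewest genes) can be sketched as follows. This illustrates only the final selection step, not PLR or RFE themselves. The error values, the function name `pick_model_size`, and the optional tolerance `tol` are hypothetical and not from the slides.

```python
from typing import Dict


def pick_model_size(cv_error_by_size: Dict[int, float], tol: float = 0.0) -> int:
    """Return the smallest model size whose CV error is within tol of the minimum."""
    best = min(cv_error_by_size.values())
    candidates = [size for size, err in cv_error_by_size.items() if err <= best + tol]
    return min(candidates)


# Hypothetical average cross-validation error for each number of predictors
cv_error = {1: 0.42, 2: 0.38, 5: 0.31, 10: 0.27, 20: 0.27, 50: 0.30, 200: 0.35}

print(pick_model_size(cv_error))            # ties at the minimum go to the smaller model
print(pick_model_size(cv_error, tol=0.05))  # a looser rule accepts an even smaller model
```

The tolerance is an optional extension: with `tol=0.0` the function applies the rule exactly as stated on the slide.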
Dimension Reduction Through Preclinical Studies
Predicting cell line sensitivity to a compound
  • 18 cancer cell lines (12 sensitive, 6 resistant)
  • Identified top 200 genes associated with in vitro sensitivity/resistance
[Figure: Example of one gene. Expression level (axis 0 to 20,000) across the 18 cancer cell lines, grouped as Sensitive or Resistant, with a Low-to-High expression scale. Cell lines shown: Zr-75-1, MCF7, Zr-75-30, BT474, Her2MCF7, HCC1428, MDAMB436, HCC70, Hs578T, SkBr3, MDAMB157, HCC1954, BT20, BT549, HCC38, MDAMB435S, AU565, HCC1806.]
Predicting Response in Trial A
Population                                    N     Response
All treated patients                          161   29 (18%)
Patients included in the genomics analysis    134   23 (17%)

Models (95% CI in parentheses)

Model                                                            PPV                NPV                Sensitivity        Specificity        Error
Starting with full gene list, resulting in 6-gene model          0 (0-0.30)         0.81 (0.69-0.89)   0 (0-0.26)         0.84 (0.72-0.91)   0.580
Starting with preclinical top 200, resulting in 10-gene model    0.45 (0.21-0.72)   0.89 (0.79-0.95)   0.45 (0.21-0.72)   0.89 (0.79-0.95)   0.326

Dimension reduction by using prior preclinical results seemed to help in this trial
Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods

Method     Resp1   Resp2   Resp3   Resp4   TTP
Log Reg    X       X       X       X
t-Test     X       X       X       X
Cox                                        X

[Venn diagram: Logistic Regression (297 probesets), t-Test (396 probesets), Cox Proportional Hazards (446 probesets); region counts shown: 46, 97, 51.]

Intersection resulted in 51 potential candidates
It may be more beneficial to start model building with this set than the complete set of potential predictors (work currently in progress)
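The intersection step can be sketched with plain set operations. This is a minimal sketch, assuming each method has already produced its own list of significant probesets; the probeset IDs and the function name `intersect_hits` are hypothetical.

```python
def intersect_hits(hits_by_method):
    """Keep only the probesets flagged by every method."""
    sets = [set(ids) for ids in hits_by_method.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical single-biomarker results; real lists would hold hundreds of probesets
hits = {
    "logistic_regression": {"ps_001", "ps_002", "ps_003", "ps_007"},
    "t_test": {"ps_002", "ps_003", "ps_004", "ps_007"},
    "cox_ph": {"ps_003", "ps_005", "ps_007"},
}

candidates = intersect_hits(hits)
print(sorted(candidates))
```

The resulting candidate set would then serve as the starting point for model building in place of the full predictor list.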
Model Performance Measures
Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance
  • Dependent on the threshold
Area under the ROC curve (AUC) may be a better measure for comparing models

[Figure: three panels (Model 1, Model 2, Model 3), each plotting Response Probability (0 to 1) against Response Status (Non-Responder, Responder). These figures are from simulated perfect predictors.]

All three models yield complete separation between responders and non-responders
Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior
AUC correctly shows equivalence

          Sensitivity   Specificity   PPV    NPV    AUC
Model 1   0.73          1             1      0.79   1
Model 2   1             1             1      1      1
Model 3   1             0.77          0.81   1      1
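The contrast between threshold-based measures and AUC can be reproduced on a toy example. This is a minimal sketch: the three score vectors below are hypothetical (they are not the simulated data behind the slide), and the AUC is computed in its Mann-Whitney form, the probability that a responder scores higher than a non-responder.

```python
def auc(scores, labels):
    """Mann-Whitney AUC: share of responder/non-responder pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold=0.5):
    """Sensitivity and specificity when predicting 'responder' for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = responder, 0 = non-responder

# Hypothetical predicted probabilities; every model separates the classes perfectly
model_1 = [0.60, 0.55, 0.45, 0.40, 0.30, 0.25, 0.20, 0.10]  # scores shifted low
model_2 = [0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]  # centered on 0.5
model_3 = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.45, 0.40]  # scores shifted high

for name, scores in [("Model 1", model_1), ("Model 2", model_2), ("Model 3", model_3)]:
    print(name, sens_spec(scores, labels), auc(scores, labels))
```

At the 0.5 threshold only the second model looks perfect, yet all three have the same AUC because AUC depends only on the ranking of the scores.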
Realistic Assessment of Model Performance
When sample size is reasonably large
  • Split sample into a training set and independent test set
    • Build the model on the training set and test the model performance on the test set
  • Pro: One independent test of model performance for the model picked in the training set
  • Cons:
    • When sample size is small, the estimate of performance may have a large variance
    • Reduced sample size for training may yield sub-optimal model
Entire model building procedure should be cross-validated
  • Christophe Ambroise & Geoffrey J. McLachlan, PNAS 99(10): 2002
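Why the entire model building procedure needs to be cross-validated can be seen on pure noise data. This is a minimal sketch under stated assumptions: a nearest-centroid classifier and a mean-difference gene ranking stand in for the actual modeling methods, and the sizes (20 subjects, 500 features, 10 selected) are arbitrary illustrative choices. None of the function names come from the slides.

```python
import random


def top_features(X, y, k):
    """Rank features by absolute difference in class means; return the top k indices."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def centroid_predict(X_train, y_train, x_new, features):
    """Nearest-centroid rule restricted to the selected features."""
    def centroid(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return [sum(r[j] for r in rows) / len(rows) for j in features]

    def dist(c):
        return sum((x_new[j] - cj) ** 2 for j, cj in zip(features, c))

    return 1 if dist(centroid(1)) < dist(centroid(0)) else 0


def loo_error(X, y, k, select_inside):
    """Leave-one-out error; selection is redone in each fold or done once on all data."""
    fixed = None if select_inside else top_features(X, y, k)
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(X_tr, y_tr, k) if select_inside else fixed
        errors += centroid_predict(X_tr, y_tr, X[i], feats) != y[i]
    return errors / len(X)


rng = random.Random(0)
n, p, k = 20, 500, 10
y = [1] * (n // 2) + [0] * (n // 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]  # no real signal

biased = loo_error(X, y, k, select_inside=False)  # selection saw the held-out subject
honest = loo_error(X, y, k, select_inside=True)   # selection repeated inside each fold
print(biased, honest)
```

Since the features carry no information, an honest error estimate should be around chance. Selecting features on the full data before cross-validating typically produces a much lower, misleading error.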
Realistic Assessment of Model Performance
When sample size is small, one cannot split data into training / test set
  • Cross-validation alone is a reasonable alternative
  • Warning: Initial performance estimate may be misleading
[Figure: Cross-validated AUC versus Number of Predictors, showing individual runs and the average AUC.]
Cross-validation should be repeated multiple times
  • Allows one to observe effects of sampling variability
  • The average of replicate estimators gives a more accurate assessment of model performance
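Repeating cross-validation with different random splits can be sketched as below. This is a minimal sketch under several assumptions: simulated data with a single weak predictor, a nearest-centroid score as the model, 5 folds, 20 repeats, and AUC computed on pooled out-of-fold scores. The slide's figure varies the number of predictors; this sketch fixes it at one to keep the example short.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Mann-Whitney AUC with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(x, y, n_folds, rng):
    """One K-fold run: score each subject with centroids fit on the other folds."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        c1 = mean(x[i] for i in train if y[i] == 1)  # responder centroid
        c0 = mean(x[i] for i in train if y[i] == 0)  # non-responder centroid
        for i in test:
            # Higher score means closer to the responder centroid
            scores[i] = (x[i] - c0) ** 2 - (x[i] - c1) ** 2
    return auc(scores, y)


# Simulated data: 20 responders and 20 non-responders, one weakly informative predictor
data_rng = random.Random(1)
y = [1] * 20 + [0] * 20
x = [data_rng.gauss(0.8 if label == 1 else 0.0, 1.0) for label in y]

# Repeat 5-fold cross-validation with 20 different random splits
runs = [cv_auc(x, y, 5, random.Random(seed)) for seed in range(20)]
print("range of individual runs:", min(runs), max(runs))
print("average AUC:", mean(runs))
```

The spread between the individual runs reflects the sampling variability of the fold assignment, which a single run would hide.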
Handling Correlated Predictors: When p >> n

Complex correlation structure (mRNA as example)
  • Multiple probe sets interrogate the same gene
  • Multiple genes function together in pathways
  • Not all pathways are known
  • Multiple response definitions that are interrelated
  • False positive genes may be correlated with true positives
Most prognostic modeling techniques do not handle this well
  • Recursive feature elimination may remove important predictors because of correlations
This is an open research problem
Summary
Need to design studies based on expected usable sample size
Dimension reduction in the model building process
  • Overfitting problem can be mitigated by regularized methods
  • To further reduce the candidate set of predictors
    • Preclinical information can be useful
    • Intersection of single-biomarker results by different statistical methods may also be useful
Model performance
  • Independent test set may be important for validation purposes. When sample size is small, cross-validation is a viable alternative.
  • Cross-validation should include biomarker selection procedures and needs to be performed appropriately
  • Cross-validation should be repeated multiple times
  • Performance measures should be carefully chosen when comparing multiple models. AUC often is a good choice.
Handling correlated predictors is still an open research problem
Acknowledgments
Can Cai
Scott Chasalow
Ed Clark
Mark Curran
Ashok Dongre
Matt Farmer
Alexander Florczyk
Shirin Ford
Susan Galbraith
Ji Gao
Nancy Gustafson
Ben Huang
Tom Kelleher
Christiane Langer
Hyerim Lee
Haolan Lu
David Mauro
Shelley Mayfield
Oksana Mokliatchouk
Relekar Padmavathibai
Barry Paul
Lynn Ploughman
Amy Ronczka
Katy Simonsen
Eric Strittmatter
Dana Wheeler
Shujian Wu
Shuang Wu
Kim Zerba
Renping Zhang