Statistical Evaluation of Surrogate Markers
University of Pennsylvania Annual Conference on
Statistical Issues in Clinical Trials
Statistical Evaluation of Surrogate Markers:
Validity, Efficiency and Sensitivity
Yongming Qu, PhD
Eli Lilly and Company
Indianapolis, Indiana
April 18, 2012
This talk is based on previous and ongoing research in collaboration with Michael Case,
Somnath Sarkar, Wen Li, and Pandurang M. Kulkarni.
Outline
Introduction
Biomarker, surrogate marker and surrogate endpoint
Validity and efficiency of surrogate marker
Quantities used in statistical validation
Proportion of Treatment Effect (PTE)
General Association
Likelihood reduction factor (LRF)
Proportion of Information Gain (PIG)
Effect of measurement error and adjustment for it
Summary
UPENN Clinical Trials Conference
April 18, 2012
Biomarker and Surrogate Endpoint (SE)
Biomarker: “a characteristic that is objectively
measured and evaluated as an indicator of normal
biologic processes, pathogenic processes, or
pharmacologic responses to a therapeutic
intervention.” (Clin Pharmacol Ther
2001;69:89-95.)
Surrogate endpoint: “a laboratory measurement or
a physical sign used as a substitute for a clinically
meaningful endpoint that measures directly how a
patient feels, functions or survives. Changes
induced by a therapy on a surrogate endpoint are
expected to reflect changes in a clinically
meaningful endpoint” (Temple 1995)
Validation of Surrogate Endpoint (SE)
Surrogate endpoint is intended to replace clinical outcome for
any therapy
Surrogate endpoint is independent of therapy
Traditional way of validating surrogate endpoint using
treatment is not feasible
Surrogate endpoint needs to be validated
To evaluate the surrogate endpoint, large confirmatory clinical
trials need to be conducted for both surrogate and clinical
endpoints
If large confirmatory clinical trials are conducted, the drug
efficacy should have been established.
There is no need for surrogate endpoint for this drug
The conclusion from this drug cannot be extrapolated to other
drugs because different drugs may work through different
pathways
Validation of SE – New Thinking
Validation of SE should be based on the disease mechanism,
not the effect of treatment
Hemoglobin A1c (HbA1c) is a widely used SE for average
glucose
The validation of this SE is not based on any clinical studies involving
different treatments
It is based on biochemistry and physiology
Progression-free survival (PFS) is widely used as an SE for
cancer survival
The validation should be based on biology of the disease and tumor, not
individual drugs
Surrogate Marker (SM)
Surrogate marker for a drug is a marker which could be used to
predict the drug’s efficacy or safety
Example of the usefulness of a surrogate marker
Suppose bone mineral density (BMD) is a surrogate marker for
osteoporotic fracture
The long-term effect of an osteoporosis drug on fracture is difficult
and costly to determine
A woman takes an osteoporosis drug
Clinicians measure her BMD after 6 months of use
If BMD is increased, this drug works for this woman and she should
continue to use this drug
If BMD is not increased, this drug does not work for this woman, and she
should switch to a different drug
SM is very useful to monitor patients and identify which drug
works best for a patient in the early stage of the disease
SM can NOT be used to replace clinical outcome for drug approval!
SE, SM and Biomarker
[Diagram relating SE, SM and Biomarker]
Question: Is a particular biomarker a SM?
SE Validation (Prentice)
Prentice (Stat. Med. 1989, 8:431–440) proposed a necessary and
sufficient condition for a surrogate endpoint
f(S|Z) = f(S) ⇔ f(T|Z) = f(T)
This definition is too stringent: it essentially requires the surrogate
endpoint to be “equivalent” to the clinical outcome
Prentice’s key operational criterion f(T|S, Z) = f(T|S) does not
guarantee this condition
This condition can be weakened. A marker is said to be a SE if
f(S|Z) = f(S)
f(T|Z)=f(T) for any Z
Practically, this condition cannot be validated through clinical
trials testing drug effect
One can NOT prove a mathematical theory through enumeration!
One can invalidate a SE if the above relationship does not hold for one
treatment Z
Surrogate Marker - Concepts
Validity: A marker S is said to be a valid surrogate
marker for a clinical outcome T for a particular
treatment if
f(T|Z) ≠ f(T)
f(T|S, Z) = f(T|S)
where Z is the treatment indicator with Z = 1 for the
treatment and Z = 0 for control
Efficiency: For two surrogate markers S1 and S2, we say
S1 is more efficient than S2 if
Var[T|Z, S1] < Var[T|Z, S2]
Validity is a much higher hurdle than efficiency in
practice
Proportion of Treatment Effect
Consider two models
T | Z = a0 + aZ·Z
T | S, Z = b0 + bZ·Z + bS·S
The PTE (Freedman et al, Stat. Med. 1992;11:167-178) is
PTE = 1 – bZ/aZ
Drawbacks of PTE
Not confined to [0,1]
Large variability makes the results
uninformative
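As a rough numerical illustration of the PTE formula, the sketch below fits the two models by ordinary least squares on simulated data. The continuous outcome, the parameter values and the helper name `proportion_treatment_effect` are assumptions made for this example and do not come from the slides.

```python
import numpy as np

def proportion_treatment_effect(t, s, z):
    """PTE = 1 - bZ/aZ from least-squares fits of T|Z and T|S,Z."""
    ones = np.ones_like(t)
    # Model T|Z: columns are intercept and treatment indicator
    coef_z, *_ = np.linalg.lstsq(np.column_stack([ones, z]), t, rcond=None)
    # Model T|S,Z: columns are intercept, treatment indicator and marker
    coef_sz, *_ = np.linalg.lstsq(np.column_stack([ones, z, s]), t, rcond=None)
    a_z, b_z = coef_z[1], coef_sz[1]
    return 1.0 - b_z / a_z

# Hypothetical data: the treatment acts on the outcome only through the marker
rng = np.random.default_rng(2012)
n = 4000
z = np.repeat([0.0, 1.0], n // 2)
s = z + rng.normal(size=n)          # treatment shifts the marker
t = 2.0 * s + rng.normal(size=n)    # outcome depends on S, not directly on Z

pte = proportion_treatment_effect(t, s, z)
print(f"PTE = {pte:.3f}")
```

Because PTE is a ratio of two estimated coefficients, it becomes unstable whenever aZ is estimated close to zero, which is one way to see the variability drawback listed above.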
General Association
Consider two models
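\[
Y_{S,j} = \mu_S + a_{Z,S} Z_j + \varepsilon_{S,j}, \qquad
Y_{T,j} = \mu_T + a_{Z,T} Z_j + \varepsilon_{T,j},
\]
where
\[
\operatorname{Var}\begin{pmatrix} \varepsilon_{S,j} \\ \varepsilon_{T,j} \end{pmatrix}
= \begin{pmatrix} \sigma_{SS} & \sigma_{ST} \\ \sigma_{ST} & \sigma_{TT} \end{pmatrix}
\]
Buyse and Molenberghs (Biometrics 1998;54:1014-1029) suggested using the coefficient of
determination to evaluate the surrogate marker
\[
R^2 = (\sigma_{SS}\sigma_{TT})^{-1}\sigma_{ST}^2, \qquad
\operatorname{Var}[\varepsilon_{T,j} \mid \varepsilon_{S,j}]
= \sigma_{TT} - \sigma_{ST}^2\sigma_{SS}^{-1} = \sigma_{TT}(1 - R^2)
\]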
Artificial Example 1
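\[
Y_{S,j} = \mu_S + a_{Z,S} Z_j + \varepsilon_{S,j}, \qquad
Y_{T,j} = \mu_T + a_{Z,T} Z_j + \varepsilon_{T,j},
\]
\[
\operatorname{Var}\begin{pmatrix} \varepsilon_{S,j} \\ \varepsilon_{T,j} \end{pmatrix}
= \begin{pmatrix} \sigma_{SS} & \sigma_{ST} \\ \sigma_{ST} & \sigma_{TT} \end{pmatrix},
\qquad R^2 = (\sigma_{SS}\sigma_{TT})^{-1}\sigma_{ST}^2
\]
Let \varepsilon_{S,j} = \varepsilon_{T,j}, then R^2 = 1
\[
E[Y_{T,j} \mid Z_j = 0, Y_{S,j}] = \mu_T - \mu_S + Y_{S,j}
\]
\[
E[Y_{T,j} \mid Z_j = 1, Y_{S,j}] = \mu_T - \mu_S + Y_{S,j} + (a_{Z,T} - a_{Z,S})
\]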
The relationship between clinical outcome and
marker depends on treatment group
YS,j is not a good surrogate marker!
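Artificial Example 1 can be checked numerically. The sketch below shares one error term between marker and outcome, so the treatment-adjusted R² is 1 even though the marker-outcome relationship shifts between arms. All parameter values and the helper name `residual_r2` are hypothetical choices for illustration.

```python
import numpy as np

def residual_r2(y_s, y_t, z):
    """sigma_ST^2 / (sigma_SS * sigma_TT) computed from treatment-adjusted residuals."""
    # Subtracting each arm's mean is equivalent to regressing on the indicator Z
    res_s = y_s - np.where(z == 1, y_s[z == 1].mean(), y_s[z == 0].mean())
    res_t = y_t - np.where(z == 1, y_t[z == 1].mean(), y_t[z == 0].mean())
    cov = np.cov(res_s, res_t)
    return cov[0, 1] ** 2 / (cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(0)
n = 1000
z = np.repeat([0, 1], n // 2)
eps = rng.normal(size=n)                        # shared error: eps_S = eps_T
mu_s, mu_t, a_zs, a_zt = 0.0, 1.0, 0.5, 2.0     # hypothetical parameter values
y_s = mu_s + a_zs * z + eps
y_t = mu_t + a_zt * z + eps

r2 = residual_r2(y_s, y_t, z)
# Offset of Y_T relative to Y_S within each arm
shift_control = (y_t - y_s)[z == 0].mean()
shift_treated = (y_t - y_s)[z == 1].mean()
print(r2, shift_control, shift_treated)
```

The two offsets differ by a_{Z,T} - a_{Z,S}, which is the treatment-dependent term in the conditional expectation for the treated arm.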
Artificial Example 2
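\[
Y_{S,j} = \mu_S + a_{Z,S} Z_j + \varepsilon_{S,j}, \qquad
Y_{T,j} = b_0 + b_1 Y_{S,j} + u_j,
\]
\[
\begin{pmatrix} \varepsilon_{S,j} \\ u_j \end{pmatrix} \sim
\mathrm{NID}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_{SS} & 0 \\ 0 & \sigma_{uu} \end{pmatrix} \right)
\]
\[
R^2 = \frac{1}{1 + \sigma_{uu} / (b_1^2 \sigma_{SS})}
\]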
Depending on the parameters, R2 can take any
value between 0 and 1
The effect of treatment on the clinical outcome acts
solely through the marker YS,j
YS,j is a perfect surrogate marker!
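A small numerical check of Artificial Example 2 follows. It evaluates the closed-form R² and compares it with the treatment-adjusted squared correlation in simulated data. The parameter values (mu_S = 0, a_{Z,S} = 0.5, b_0 = 1) and the function names are assumptions for this sketch only, and b_1 is assumed to be non-zero.

```python
import numpy as np

def r2_example2(b1, sigma_ss, sigma_uu):
    """Closed form R^2 = 1 / (1 + sigma_uu / (b1^2 * sigma_SS)); requires b1 != 0."""
    return 1.0 / (1.0 + sigma_uu / (b1 ** 2 * sigma_ss))

def empirical_r2(b1, sigma_ss, sigma_uu, n=20000, seed=0):
    """Treatment-adjusted squared correlation in data simulated from Example 2."""
    rng = np.random.default_rng(seed)
    z = np.repeat([0.0, 1.0], n // 2)
    eps_s = rng.normal(scale=np.sqrt(sigma_ss), size=n)
    u = rng.normal(scale=np.sqrt(sigma_uu), size=n)
    y_s = 0.5 * z + eps_s            # hypothetical mu_S = 0, a_{Z,S} = 0.5
    y_t = 1.0 + b1 * y_s + u         # hypothetical b_0 = 1; no direct Z effect
    # Remove arm means so that only the residual association remains
    res_s = y_s - np.where(z == 1, y_s[z == 1].mean(), y_s[z == 0].mean())
    res_t = y_t - np.where(z == 1, y_t[z == 1].mean(), y_t[z == 0].mean())
    return np.corrcoef(res_s, res_t)[0, 1] ** 2

for sigma_uu in (0.1, 1.0, 10.0):
    print(sigma_uu, r2_example2(1.0, 1.0, sigma_uu), empirical_r2(1.0, 1.0, sigma_uu))
```

Changing only the outcome noise sigma_uu moves R² across most of its range, although the treatment effect still passes entirely through the marker in every case.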
Likelihood Reduction Factor (LRF)
Consider two models
T | Z = a0 + aZ·Z (1)
T | S, Z = b0 + bZ·Z + bS·S (2)
Alonso et al. (Biometrics 2004; 60:724-728) defined
the likelihood reduction factor (LRF) as
LRF(Z,S:Z) = 1 – exp(–LRT(Z,S:Z)/n)
where LRT(Z,S:Z) is the likelihood ratio test statistic
comparing the two models (2) and (1)
LRF lies in [0,1] but may be unable to
reach 1 for some models
An adjusted LRF (LRFa) was therefore proposed
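To make the LRF definition concrete, here is a minimal sketch that assumes a continuous outcome and normal linear models for (1) and (2); the slides do not restrict the model family, so this is only one instance. The function names and simulated data are hypothetical, and the adjusted version LRFa is not implemented here.

```python
import numpy as np

def gaussian_loglik(X, y):
    """Maximized log-likelihood of the normal linear model y ~ X (ML variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    n = len(y)
    return -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)

def likelihood_reduction_factor(t, s, z):
    """LRF(Z,S:Z) = 1 - exp(-LRT(Z,S:Z)/n) for normal linear models."""
    n = len(t)
    ones = np.ones(n)
    ll_1 = gaussian_loglik(np.column_stack([ones, z]), t)      # model (1): T|Z
    ll_2 = gaussian_loglik(np.column_stack([ones, z, s]), t)   # model (2): T|S,Z
    lrt = 2.0 * (ll_2 - ll_1)
    return 1.0 - np.exp(-lrt / n)

# Hypothetical data in which the marker explains most of the outcome
rng = np.random.default_rng(3)
n = 2000
z = np.repeat([0.0, 1.0], n // 2)
s = z + rng.normal(size=n)
t = s + 0.5 * rng.normal(size=n)

lrf = likelihood_reduction_factor(t, s, z)
print(f"LRF = {lrf:.3f}")
```

Under this Gaussian assumption the LRF reduces to one minus the ratio of residual sums of squares of models (2) and (1), so it behaves like a partial R² for adding S to a model that already contains Z.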
A Different Approach
Instead of comparing (LRFa(Z,S:Z); Alonso, et al)
T | Z = a0 + aZ·Z
T | S, Z = b0 + bZ·Z + bS·S
We compare (new quantity)
T | S = g0 + gZ·S
T | S, Z = b0 + bZ·Z + bS·S
Proportion of Information Gain (PIG)
Consider three models
T = c0 (1)
T | S = g0 + gZ·S (2)
T | S, Z = b0 + bZ·Z + bS·S (3)
Qu and Case (Biometrics 2007;63:958-963) defined the
proportion of information gain (PIG) as
PIG = LRT(S:1) / LRT(Z,S:1)
where LRT(Z,S:1) is the likelihood ratio test statistic
comparing the models (3) and (1), and LRT(S:1) is the
likelihood ratio test statistic comparing the models (2)
and (1)
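The PIG ratio can be sketched for a continuous outcome with normal linear models, where each likelihood ratio statistic against the intercept-only model equals n·log(RSS_null / RSS_model). The Gaussian assumption, the simulated scenarios and the function names are illustrative choices, not part of the original definition.

```python
import numpy as np

def lrt_vs_null(X, t):
    """LRT of the normal linear model t ~ X against the intercept-only model (1)."""
    n = len(t)
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    rss_model = np.sum((t - X @ beta) ** 2)
    rss_null = np.sum((t - t.mean()) ** 2)
    return n * np.log(rss_null / rss_model)

def proportion_information_gain(t, s, z):
    """PIG = LRT(S:1) / LRT(Z,S:1)."""
    ones = np.ones(len(t))
    lrt_s = lrt_vs_null(np.column_stack([ones, s]), t)       # model (2) vs (1)
    lrt_zs = lrt_vs_null(np.column_stack([ones, z, s]), t)   # model (3) vs (1)
    return lrt_s / lrt_zs

rng = np.random.default_rng(11)
n = 2000
z = np.repeat([0.0, 1.0], n // 2)
s = z + rng.normal(size=n)

# Scenario A (hypothetical): treatment affects the outcome only through S
t_valid = s + rng.normal(size=n)
# Scenario B (hypothetical): treatment also has a direct effect on the outcome
t_direct = s + 2.0 * z + rng.normal(size=n)

pig_valid = proportion_information_gain(t_valid, s, z)
pig_direct = proportion_information_gain(t_direct, s, z)
print(pig_valid, pig_direct)
```

When Z adds nothing beyond S, the two statistics nearly coincide and PIG is close to 1; a direct treatment effect that S does not capture pulls PIG down.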
A Simple Simulation
logit(Pr(T=1 | S, Z)) = -S
S = Z + u, u ~ N(0, s^2)
Validity of SE is met
Compare the performance of PTE, LRFa and
PIG for various s2
Sample size = 1,000 (n=500 per group)
1,000 simulation samples
Qu and Case (Biometrics 2007;63:958-963)
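The simulation design above can be reproduced in outline as follows. This is a sketch rather than the authors' code: it estimates PIG only, fits the logistic models with a hand-written Newton-Raphson routine, uses 200 replicates instead of 1,000 to keep the run short, and fixes the error variance at 1. The function names are hypothetical.

```python
import numpy as np

def logistic_loglik(X, y, n_iter=25):
    """Maximized log-likelihood of a logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def simulate_pig(s2, n_per_group=500, n_rep=200, seed=0):
    """Mean and SD of PIG over replicates of S = Z + u, logit Pr(T=1|S,Z) = -S."""
    rng = np.random.default_rng(seed)
    z = np.repeat([0.0, 1.0], n_per_group)
    ones = np.ones_like(z)
    pigs = np.empty(n_rep)
    for r in range(n_rep):
        s = z + rng.normal(scale=np.sqrt(s2), size=z.size)
        t = rng.binomial(1, 1.0 / (1.0 + np.exp(s))).astype(float)
        ll_1 = logistic_loglik(ones[:, None], t)                    # T = c0
        ll_2 = logistic_loglik(np.column_stack([ones, s]), t)       # T | S
        ll_3 = logistic_loglik(np.column_stack([ones, z, s]), t)    # T | S, Z
        pigs[r] = (ll_2 - ll_1) / (ll_3 - ll_1)   # ratio of the two LRT statistics
    return pigs.mean(), pigs.std(ddof=1)

mean_pig, sd_pig = simulate_pig(s2=1.0)
print(f"PIG mean = {mean_pig:.3f}, SD = {sd_pig:.3f}")
```

For very small error variance S is almost collinear with Z, and a plain Newton-Raphson fit of the full model may become numerically unstable; a production analysis would use an established GLM routine.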
Simulation Results: Mean (SD)
s      PTE          LRFa         PIG
0.01   1.38 (6.66)  0.02 (0.02)  0.98 (0.02)
0.10   1.04 (0.70)  0.06 (0.06)  0.98 (0.02)
1.00   1.02 (0.20)  0.82 (0.05)  1.00 (0.01)
2.00   1.06 (0.34)  0.96 (0.02)  1.00 (0.00)
4.00   1.28 (1.57)  0.99 (0.01)  1.00 (0.00)
Qu and Case (Biometrics 2007;63:958-963)
EFFECT OF MEASUREMENT ERROR ON EVALUATION OF BIOMARKERS
Measurement Error in Biomarker
Biomarker may be measured with error
W = S + U, S = the true value for the marker, U is the
measurement error and W is the observed value
The magnitude of measurement error is
generally described by
Proportion of variation due to measurement
error: Var(U)÷Var(W)
<30% is considered small
30-50% is considered moderate
> 50% is considered large
Reliability: Var(S)÷Var(W)
Measurement error can attenuate the
estimate of PIG (and of PTE, etc.)
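The quantities on this slide can be illustrated with a short simulation. The sketch below computes the reliability Var(S)/Var(W), labels the error proportion using the thresholds listed above, and shows the classical attenuation of a regression slope when W is used in place of S. A slope is used instead of PIG to keep the example short, and the variances and the helper name `classify_error` are hypothetical.

```python
import numpy as np

def classify_error(proportion):
    """Label Var(U)/Var(W) using the thresholds listed on the slide."""
    if proportion < 0.30:
        return "small"
    if proportion <= 0.50:
        return "moderate"
    return "large"

rng = np.random.default_rng(1)
n = 20000
s = rng.normal(size=n)                        # true marker, Var(S) = 1
u = rng.normal(scale=np.sqrt(0.5), size=n)    # measurement error, Var(U) = 0.5
w = s + u                                     # observed marker
t = 2.0 * s + rng.normal(size=n)              # outcome driven by the true marker

reliability = s.var() / w.var()
error_share = u.var() / w.var()
slope_true = np.polyfit(s, t, 1)[0]           # slope using the error-free marker
slope_naive = np.polyfit(w, t, 1)[0]          # slope using the observed marker

print(classify_error(error_share), reliability, slope_true, slope_naive)
```

In this linear setting the naive slope is approximately the reliability times the error-free slope, which is the attenuation the slide refers to.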
Simulation extrapolation (SIMEX)
PIG(X) is what we want
PIG(W) is the estimate with measurement error
E[PIG(W + √(−1)·U*) | X] has the same expectation as PIG(X), where U* and U are IID
The above quantity is generally hard to estimate. SIMEX uses simulation to estimate
the trend of the bias (often assuming a quadratic curve) and then extrapolates to
obtain a less biased estimator.
E[PIG(W + √λ·U*) | W], λ ≥ 0
Cook and Stefanski, JASA 1994;89:1314-1328.
Li and Qu, Stat in Med. 2010: 2338–2346
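The two SIMEX steps, simulation and extrapolation, can be sketched on a simpler target than PIG. The example below corrects a regression slope, assumes the measurement-error SD `sigma_u` is known, and uses a quadratic extrapolant. The grid of lambda values, the number of simulations and all data-generating values are hypothetical.

```python
import numpy as np

def simex_slope(w, t, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=50, seed=0):
    """SIMEX estimate of the slope of t on the error-free marker.

    Assumes the measurement-error SD sigma_u is known and extrapolates
    with a quadratic curve in lambda.
    """
    rng = np.random.default_rng(seed)
    mean_slopes = []
    for lam in lambdas:
        # Simulation step: add extra error with variance lam * sigma_u^2
        slopes = [
            np.polyfit(w + np.sqrt(lam) * sigma_u * rng.normal(size=w.size), t, 1)[0]
            for _ in range(n_sim)
        ]
        mean_slopes.append(np.mean(slopes))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1
    coeffs = np.polyfit(lambdas, mean_slopes, 2)
    return np.polyval(coeffs, -1.0)

rng = np.random.default_rng(7)
n, beta, sigma_u = 5000, 1.0, np.sqrt(0.5)
s = rng.normal(size=n)                    # true marker (unobserved in practice)
w = s + sigma_u * rng.normal(size=n)      # observed marker with error
t = beta * s + 0.5 * rng.normal(size=n)

naive = np.polyfit(w, t, 1)[0]
corrected = simex_slope(w, t, sigma_u)
print(naive, corrected)
```

The quadratic extrapolant reduces the attenuation bias but does not remove it completely, which matches the description of SIMEX as giving a less biased, rather than unbiased, estimator.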
Bone Mineral Density (BMD) and Fracture
[Figure: healthy spine vs. kyphotic spine with vertebral fracture; BMD measured by dual-energy x-ray absorptiometry (DEXA)]
BMD = BMC / BMA
Multiple Outcomes of Raloxifene Evaluation (MORE)
The MORE study was a 3-year randomized, double-blind, placebo-controlled clinical trial
evaluating the treatment effect of raloxifene on vertebral fracture.
Vertebral fracture was assessed at years 2 and 3, or when back pain symptoms occurred
BMD was measured at baseline and years 1, 2 and 3.
Sarkar, et al, J Bone Miner Res 2002;17:1–10
Adjustment for Measurement Error in PIG Estimation
Objective: to evaluate whether the change in femoral neck BMD is a good surrogate
marker for vertebral fracture
Femoral neck BMD was measured twice at baseline
The estimated standard deviation of the measurement error = 0.023 g/cm²
The proportion of the variability due to measurement error in the observed BMD change was ~70%
(Qu, et al. Stat in Med 2007; 26:197--211)
           PIG    95% CI
Naive      0.30   (0.05, 0.62)
Adjusted   0.50   (0.08, 0.91)
Li and Qu, Stat in Med. 2010: 2338–2346
Even after adjusting for measurement error, the change in femoral neck BMD is still not a good
surrogate marker
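The slides do not say how the measurement-error SD was obtained from the duplicate baseline scans. One common approach, sketched here under the assumption of independent errors with equal variance and no systematic shift between scans, uses Var(W1 − W2) = 2·Var(U). The simulated BMD values are hypothetical; only the 0.023 g/cm² figure is taken from the slide.

```python
import numpy as np

def measurement_error_sd(first, second):
    """SD of measurement error from paired repeat measurements.

    If W1 = S + U1 and W2 = S + U2 with independent, equal-variance errors,
    then Var(W1 - W2) = 2 * Var(U).
    """
    diff = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    return np.sqrt(np.mean(diff ** 2) / 2.0)

rng = np.random.default_rng(5)
n = 5000
true_bmd = rng.normal(loc=0.7, scale=0.1, size=n)    # hypothetical true BMD values
scan_1 = true_bmd + rng.normal(scale=0.023, size=n)  # first baseline measurement
scan_2 = true_bmd + rng.normal(scale=0.023, size=n)  # second baseline measurement

sd_u = measurement_error_sd(scan_1, scan_2)
print(f"estimated measurement-error SD = {sd_u:.4f} g/cm^2")
```

An estimate of this kind is what a SIMEX-type adjustment needs as input, since the simulation step adds error in multiples of the measurement-error variance.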
Summary
New concepts of surrogate marker and
surrogate endpoint
Definition of validity and efficiency of a
surrogate marker
PIG is so far a reasonable quantity for
evaluating a surrogate marker
Measurement error in the marker can
attenuate the estimate of PIG
SIMEX is a general method to correct for
bias due to measurement error
Abstract
Statistical Evaluation of Surrogate Markers: Validity, Efficiency and
Sensitivity
Yongming Qu, PhD
Surrogate markers are important in drug development because they may dramatically reduce
development cost and cycle time compared with using actual clinical outcomes.
Statistical evaluation of surrogate markers dates back some thirty years, yet
little progress has been made in identifying new surrogate endpoints. Demonstrating a
treatment effect on clinical outcomes remains a mandatory requirement for clinical
drug development in many disease areas. For example, “the FDA approved Avastin for
advanced breast cancer in February 2008, after one clinical trial showed that combining
Avastin with another drug, paclitaxel, delayed the median time before tumors worsened by
5.5 months, compared with using paclitaxel alone. But the women who got Avastin did not
live significantly longer than those who got only paclitaxel, which is also known by its brand
name Taxol” (http://www.nytimes.com/2011/06/27/health/27drug.html). In this research,
we discuss validity, efficiency and sensitivity in the statistical evaluation of surrogate
markers. New definitions, with simulations and examples, are provided.