Controlling False Positive Rate Due to Multiple Analyses Unstratified


Controlling False Positive Rate
Due to Multiple Analyses
Unstratified vs. Stratified Logrank Test
Peiling Yang, Gang Chen, George Y.H. Chi
DBI/OB/OPaSS/CDER/FDA
The views expressed in this talk are those of the authors and do not
necessarily represent those of the Food and Drug Administration.
Motivation: Example of Drug X
Primary endpoint: Survival
Hypothesis: Overall constant H.R. ≤ 1 vs. > 1
Primary analysis: Unstratified logrank
Results:
                        Unstratified   Stratified
Observed statistic         1.762          2.228
P-value (1-sided)          0.039          0.013
Q: Is this finding statistically significant?
Issues to Explore
• Implications of these tests/analyses.
• Eligibility of an efficacy claim based on these tests/analyses.
• Practicability of multiple testing/analyses.
Outline
• Notations / Settings
• Introduction to logrank test
– Unstratified, stratified
• Comparisons
– Hypotheses, test statistic, test procedure, inference
• Practicability of hypotheses testing
• Multiple testing/analyses
• Example of Drug X
• Summary
Settings / Notations
• 2 arms (control: $j = 1$; experimental: $j = 2$).
• $K$ strata: $k = 1, \ldots, K$.
• Patients randomized within strata.
• $t_1 < t_2 < \cdots < t_D$: distinct death times.
• $d_{ijk}$: # of deaths and $Y_{ijk}$: # of patients at risk at death time $t_i$, in the $j$th arm and $k$th stratum.
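Settings / Notations
Counts at death time $t_i$ ($d$: # of deaths; $Y$: # of patients at risk):
• In stratum $k$: $d_{i.k} = \sum_{j=1}^{2} d_{ijk}$, $Y_{i.k} = \sum_{j=1}^{2} Y_{ijk}$
• In arm $j$: $d_{ij.} = \sum_{k=1}^{K} d_{ijk}$, $Y_{ij.} = \sum_{k=1}^{K} Y_{ijk}$
• Total: $d_{i..} = \sum_{j=1}^{2} d_{ij.}$, $Y_{i..} = \sum_{j=1}^{2} Y_{ij.}$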
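The dot notation can be mirrored directly in code. The sketch below stores hypothetical toy counts in nested lists indexed `[i][j][k]` (0-based, so arm index 0 is the control arm $j = 1$) and forms each marginal total by summing over the dotted index; the data and the helper names `stratum_total`, `arm_total` and `grand_total` are illustrative assumptions, not part of the original analysis.

```python
# Hypothetical toy counts, indexed [i][j][k] with 0-based indices:
#   i = death time t_{i+1}, j = arm (0 = control, 1 = experimental), k = stratum.
d = [  # d_{ijk}: number of deaths
    [[1, 0], [0, 1]],   # t_1: control (k=1, k=2), experimental (k=1, k=2)
    [[0, 1], [1, 0]],   # t_2
]
Y = [  # Y_{ijk}: number of patients at risk
    [[10, 8], [10, 9]],
    [[9, 7], [10, 8]],
]
N_ARMS, K = 2, 2


def stratum_total(x, i, k):
    """x_{i.k}: sum over the two arms within stratum k at time t_i."""
    return sum(x[i][j][k] for j in range(N_ARMS))


def arm_total(x, i, j):
    """x_{ij.}: sum over the K strata within arm j at time t_i."""
    return sum(x[i][j][k] for k in range(K))


def grand_total(x, i):
    """x_{i..}: sum over both arms (and hence all strata) at time t_i."""
    return sum(arm_total(x, i, j) for j in range(N_ARMS))


# d_{.1.}: total control-arm deaths over all death times
d_dot1dot = sum(arm_total(d, i, 0) for i in range(len(d)))

print(stratum_total(Y, 0, 0), arm_total(d, 0, 0), grand_total(Y, 0), d_dot1dot)
# prints: 20 1 37 2
```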
Settings / Notations
• Hazard ratio (ctrl./exper.), assumed constant over time:
– Across strata: $c$
– Within stratum $k$: $c_k$
• Non-informative censoring
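Introduction: Unstratified Logrank
$H_0^u: c \le 1$ vs. $H_1^u: c > 1$
Test statistic:
$$W_u = \frac{d_{.1.} - E^u[d_{.1.}]}{\sqrt{\mathrm{VAR}^u[d_{.1.}]}},$$
where $d_{.1.} = \sum_i d_{i1.}$ is the total number of deaths in the control arm,
$$E^u[d_{.1.}] = \sum_i Y_{i1.} \frac{d_{i..}}{Y_{i..}}, \qquad \mathrm{VAR}^u[d_{.1.}] = \sum_i \frac{Y_{i1.}}{Y_{i..}} \cdot \frac{Y_{i2.}}{Y_{i..}} \cdot \frac{Y_{i..} - d_{i..}}{Y_{i..} - 1} \cdot d_{i..}.$$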
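A minimal Python sketch of the unstratified statistic, assuming the counts have already been pooled over strata into one $(d_{i1.}, d_{i2.}, Y_{i1.}, Y_{i2.})$ tuple per distinct death time; the function name and input layout are hypothetical. Terms with $Y_{i..} = 1$ are skipped in the variance because $Y_{i1.} Y_{i2.} = 0$ there, which also avoids the $0/0$ factor.

```python
import math


def unstratified_logrank(times):
    """Unstratified logrank statistic W_u (sketch).

    `times`: one (d_i1, d_i2, Y_i1, Y_i2) tuple per distinct death time t_i,
    with counts pooled over strata (the d_{ij.} and Y_{ij.} of the slides).
    Arm 1 is control, so W_u > 0 when control has more deaths than expected.
    """
    observed = expected = variance = 0.0
    for d1, d2, y1, y2 in times:
        d, y = d1 + d2, y1 + y2              # d_{i..}, Y_{i..}
        if y == 0:
            continue                          # nobody at risk: no contribution
        observed += d1                        # builds d_{.1.}
        expected += y1 * d / y                # term of E^u[d_{.1.}]
        if y > 1:                             # variance term is 0 when Y_{i..} = 1
            variance += (y1 / y) * (y2 / y) * ((y - d) / (y - 1)) * d
    return (observed - expected) / math.sqrt(variance)


# One death time, 5 patients per arm, both deaths in the control arm
print(unstratified_logrank([(2, 0, 5, 5)]))   # ≈ 1.5
```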
Introduction: Unstratified Logrank
• $W_u \sim N(0, 1)$ (approximately) under the least favorable parameter configuration ($c = 1$) in $H_0^u$.
• Reject $H_0^u$ if $W_u > z_\alpha$.
• Type I error rate is controlled at level $\alpha$.
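The level-$\alpha$ decision rule and the one-sided p-values on the motivation slide can be checked with Python's standard-library `statistics.NormalDist`. One-sided $\alpha = 0.025$ is assumed here, matching the level used in the Drug X example; the helper names are hypothetical. Note that the stratified "rejection" printed below is a valid level-$\alpha$ conclusion only for $H_0^s$, not for the prespecified $H_0^u$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def critical_value(alpha):
    """z_alpha: upper-alpha quantile of N(0, 1)."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def one_sided_p_value(w):
    """P(N(0, 1) > w): one-sided p-value for an observed statistic w."""
    return 1 - STD_NORMAL.cdf(w)


def reject(w, alpha=0.025):
    """Level-alpha rule: reject the null when W > z_alpha."""
    return w > critical_value(alpha)


# Drug X observed statistics from the motivating example
for label, w in [("unstratified", 1.762), ("stratified", 2.228)]:
    print(label, round(one_sided_p_value(w), 3), reject(w))
# unstratified 0.039 False
# stratified 0.013 True
```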
Introduction: Stratified Logrank
$H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$.
Test statistic:
$$W_s = \frac{d_{.1.} - E^s[d_{.1.}]}{\sqrt{\mathrm{VAR}^s[d_{.1.}]}},$$
where
$$E^s[d_{.1.}] = \sum_k \sum_i Y_{i1k} \frac{d_{i.k}}{Y_{i.k}}, \qquad \mathrm{VAR}^s[d_{.1.}] = \sum_k \sum_i \frac{Y_{i1k}}{Y_{i.k}} \cdot \frac{Y_{i2k}}{Y_{i.k}} \cdot \frac{Y_{i.k} - d_{i.k}}{Y_{i.k} - 1} \cdot d_{i.k}.$$
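The stratified version differs only in where the expectation and variance terms are formed: within each stratum's own risk set, then summed over strata. In this sketch the input is a hypothetical list with one entry per stratum, each a list of $(d_{i1k}, d_{i2k}, Y_{i1k}, Y_{i2k})$ tuples; the function name is illustrative.

```python
import math


def stratified_logrank(strata):
    """Stratified logrank statistic W_s (sketch).

    `strata`: a list with one entry per stratum k; each entry is a list of
    (d_i1k, d_i2k, Y_i1k, Y_i2k) tuples, one per death time. Expectations and
    variances are formed within each stratum and then summed over strata,
    while the observed count is still the total control-arm deaths d_{.1.}.
    """
    observed = expected = variance = 0.0
    for times in strata:
        for d1, d2, y1, y2 in times:
            d, y = d1 + d2, y1 + y2          # d_{i.k}, Y_{i.k}
            if y == 0:
                continue
            observed += d1
            expected += y1 * d / y            # term of E^s[d_{.1.}]
            if y > 1:                         # variance term is 0 when Y_{i.k} = 1
                variance += (y1 / y) * (y2 / y) * ((y - d) / (y - 1)) * d
    return (observed - expected) / math.sqrt(variance)


# Two identical (hypothetical) strata, each with both deaths in the control arm
print(stratified_logrank([[(2, 0, 5, 5)], [(2, 0, 5, 5)]]))   # ≈ 2.121
```

With a single stratum the function reduces to the unstratified statistic.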
Introduction: Stratified Logrank
• $W_s \sim N(0, 1)$ (approximately) under the least favorable parameter configuration ($c_k = 1$ for all $k$) in $H_0^s$.
• Reject $H_0^s$ if $W_s > z_\alpha$.
• Type I error rate is controlled at level $\alpha$.
Comparison of Hypotheses
• Different hypothesis formulations:
– Unstratified: $H_0^u: c \le 1$ vs. $H_1^u: c > 1$.
– Stratified: $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$.
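Comparison of Test Statistics
• $\mathrm{Corr}(W_u, W_s) = 1$ because both are built from the same r.v. $d_{.1.}$.
• Adding and subtracting $E^u[d_{.1.}]$ in the numerator of $W_s$ gives $W_s = a W_u + b$, where
$$a = \sqrt{\frac{\mathrm{VAR}^u[d_{.1.}]}{\mathrm{VAR}^s[d_{.1.}]}} \quad \text{and} \quad b = \frac{E^u[d_{.1.}] - E^s[d_{.1.}]}{\sqrt{\mathrm{VAR}^s[d_{.1.}]}}.$$
• $W_u \sim N(0, 1) \Rightarrow W_s \sim N(b, a^2)$.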
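Because $W_u$ and $W_s$ share the observed count $d_{.1.}$ and differ only in their centering and scaling constants, the identity $W_s = a W_u + b$ is pure algebra for any given data set. The sketch below checks it on hypothetical counts for two strata aligned on common death times; `moments` and `pool` are illustrative helpers and the data are made up.

```python
import math


def moments(groups):
    """Return (O, E, V) for logrank-type sums (sketch).

    O = control-arm deaths; E and V = sums of the per-time mean and variance
    terms, each computed within its own group of (d1, d2, y1, y2) tuples.
    """
    o = e = v = 0.0
    for times in groups:
        for d1, d2, y1, y2 in times:
            d, y = d1 + d2, y1 + y2
            if y == 0:
                continue
            o += d1
            e += y1 * d / y
            if y > 1:
                v += (y1 / y) * (y2 / y) * ((y - d) / (y - 1)) * d
    return o, e, v


def pool(strata):
    """Sum counts over strata at each death time (strata are assumed to be
    aligned on the same distinct death times t_1, ..., t_D)."""
    return [tuple(map(sum, zip(*rows))) for rows in zip(*strata)]


# Hypothetical counts: 2 strata x 3 common death times, (d1, d2, Y1, Y2)
strata = [
    [(2, 0, 20, 10), (1, 1, 18, 10), (0, 1, 17, 9)],   # stratum 1
    [(0, 1, 5, 15), (1, 0, 5, 14), (1, 1, 4, 14)],     # stratum 2
]

o_u, e_u, v_u = moments([pool(strata)])   # unstratified: pooled risk sets
o_s, e_s, v_s = moments(strata)           # stratified: within-stratum risk sets
w_u = (o_u - e_u) / math.sqrt(v_u)
w_s = (o_s - e_s) / math.sqrt(v_s)

a = math.sqrt(v_u / v_s)
b = (e_u - e_s) / math.sqrt(v_s)
print(o_u == o_s, math.isclose(w_s, a * w_u + b))   # True True
```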
Comparison of Test Procedure
• To test $H_0^u: c \le 1$ vs. $H_1^u: c > 1$:
– Use $W_u$ and reject $H_0^u$ if $W_u > z_\alpha$.
– If $W_s$ is used instead, the adjusted critical value $a z_\alpha + b$ is required for a valid level-$\alpha$ test.
Comparison of Test Procedure
• To test $H_0^s: c_k \le 1$ for all $k$ vs. $H_1^s: c_k > 1$ for at least one $k$:
– Use $W_s$ and reject $H_0^s$ if $W_s > z_\alpha$.
– If $W_u$ is used instead, the adjusted critical value $(z_\alpha - b)/a$ is required for a valid level-$\alpha$ test.
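Treating $a$ and $b$ as fixed, as the slides do, both adjusted critical values can be checked numerically: each should give a rejection probability of exactly $\alpha$ under the corresponding null distribution. The values $a = 1.2$ and $b = 0.3$ below are arbitrary illustrative choices (any $a > 0$ works), and the function names are hypothetical.

```python
from statistics import NormalDist


def adjusted_cv_for_ws(alpha, a, b):
    """Critical value for W_s when the null is H0u: a * z_alpha + b."""
    return a * NormalDist().inv_cdf(1 - alpha) + b


def adjusted_cv_for_wu(alpha, a, b):
    """Critical value for W_u when the null is H0s: (z_alpha - b) / a."""
    return (NormalDist().inv_cdf(1 - alpha) - b) / a


def rejection_prob(cv, mean, sd):
    """P(W > cv) for W ~ N(mean, sd^2)."""
    return 1 - NormalDist(mean, sd).cdf(cv)


alpha, a, b = 0.025, 1.2, 0.3   # hypothetical a > 0 and b

# Under c = 1: W_u ~ N(0, 1), hence W_s = a*W_u + b ~ N(b, a^2)
size_s = rejection_prob(adjusted_cv_for_ws(alpha, a, b), b, a)
# Under all c_k = 1: W_s ~ N(0, 1), hence W_u = (W_s - b)/a ~ N(-b/a, 1/a^2)
size_u = rejection_prob(adjusted_cv_for_wu(alpha, a, b), -b / a, 1 / a)
print(round(size_s, 6), round(size_u, 6))   # 0.025 0.025
```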
Comparison of Inference
• Rejection of $H_0^u$:
– Infer an overall positive treatment effect in the entire population.
• Rejection of $H_0^s$:
– Can only infer a positive treatment effect in "at least one stratum".
– Further testing is required to identify those strata before making a claim, and the error rate for identifying the wrong strata also needs to be controlled.
Practicability of Hypotheses Testing
• Unstratified hypotheses are tested when the goal is to infer an overall positive treatment effect in the entire population.
• Stratified hypotheses are tested when the goal is to infer a positive treatment effect in certain strata.
• Multiple testing of both unstratified and stratified hypotheses is acceptable when it is not clear whether the treatment is effective in the entire population or only in certain strata (but both nulls need to be prespecified in the protocol).
Multiple Testing/Analyses
• Multiple testing of unstratified (using $W_u$) and stratified (using $W_s$) hypotheses.
• Error to control: strong familywise error (SFE), including the following:
– When $c \le 1$ and all $c_k \le 1$: falsely inferring $c > 1$ or some $c_k > 1$.
– When $c \le 1$ and some $c_k > 1$: falsely inferring $c > 1$ or the wrong $c_k > 1$.
Note: the parameter space "all $c_k \le 1$ but $c > 1$" is impossible.
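One way to see why SFE control matters is to compute the chance of a false claim when both $W_u$ and $W_s$ are examined at the unadjusted $z_\alpha$ and success is declared if either exceeds it. This sketch assumes, as in the slides' comparison, that $c = 1$, $W_u \sim N(0, 1)$ and $W_s = a W_u + b$ with fixed $a > 0$. Since $W_s > z_\alpha$ is the event $W_u > (z_\alpha - b)/a$, the union of the two rejection regions is $W_u > \min(z_\alpha, (z_\alpha - b)/a)$. The $a$, $b$ values and function names are hypothetical, and a small Monte Carlo run is included as a cross-check.

```python
import random
from statistics import NormalDist


def fwe_either(alpha, a, b):
    """P(W_u > z_alpha or W_s > z_alpha) when W_u ~ N(0, 1) and W_s = a*W_u + b
    with fixed a > 0. W_s > z_alpha is the event W_u > (z_alpha - b)/a, so the
    union of both rejection regions is W_u > min(z_alpha, (z_alpha - b)/a)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(min(z, (z - b) / a))


def fwe_either_mc(alpha, a, b, n=100_000, seed=1):
    """Monte Carlo estimate of the same probability (cross-check)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    hits = 0
    for _ in range(n):
        w_u = rng.gauss(0.0, 1.0)
        w_s = a * w_u + b
        hits += (w_u > z) or (w_s > z)   # a false claim from either analysis
    return hits / n


alpha, a, b = 0.025, 1.1, 0.2   # hypothetical a and b
print(fwe_either(alpha, a, b), fwe_either_mc(alpha, a, b))   # both ≈ 0.055
```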
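Multiple Testing/Analyses
Property of SFE: a familywise error (FE) is nested in another FE.
[Diagram: parameter-space regions "$c \le 1$ & all $c_k \le 1$", "$c \le 1$ & at least one $c_k > 1$", "$c > 1$ & at least one $c_k > 1$" and an impossible space, annotated "FE" and "Nested FE: Which $c_k > 1$?"]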
Example -- Drug X
$H_0^u: c \le 1$ vs. $H_1^u: c > 1$

Logrank test           Unstratified $W_u$   Stratified $W_s$
Observed statistic     1.762                2.228
P-value (1-sided)      0.039                0.013 (for $H_0^s$)

• $W_s = a W_u + b$, where $a = 1.039$, $b = 0.409$.
• The critical value for $W_s$ should be adjusted to $a z_\alpha + b$ (≈ 2.445 at one-sided $\alpha = 0.025$), which $W_s = 2.228$ does not exceed.
• False positive error rate using $W_s$ without adjustment = 0.066:
– Inflation = 0.066 − 0.025 = 0.041.
• Answer: This finding is not statistically significant; the prespecified unstratified test gives p = 0.039 > 0.025.
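The Drug X arithmetic can be reproduced from the $a$ and $b$ values on the slide, assuming one-sided $\alpha = 0.025$ and treating $a$ and $b$ as fixed. With these rounded inputs the unadjusted false positive rate comes out near 0.068, close to but not identical with the 0.066 reported above, and the adjusted critical value is about 2.445.

```python
from statistics import NormalDist

a, b = 1.039, 0.409           # values reported for Drug X
alpha = 0.025                 # one-sided level (the baseline in the inflation figure)
z = NormalDist().inv_cdf(1 - alpha)           # z_alpha, about 1.96

# Treating a and b as fixed: under c = 1, W_s = a*W_u + b ~ N(b, a^2)
adjusted_cv = a * z + b                        # about 2.445
fpr_unadjusted = 1 - NormalDist(b, a).cdf(z)   # P(W_s > z_alpha), about 0.068

w_u, w_s = 1.762, 2.228       # observed statistics
print(f"adjusted critical value for W_s: {adjusted_cv:.3f}")
print(f"unadjusted false positive rate:   {fpr_unadjusted:.3f}")
print("W_u > z_alpha:", w_u > z)                  # False
print("W_s > z_alpha (unadjusted):", w_s > z)     # True
print("W_s > a*z_alpha + b:", w_s > adjusted_cv)  # False
```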
Figure 1: False positive rate vs. desired level (without adjustment)
Summary
• Hypotheses (unstratified, stratified, or both)
– should reflect what is to be claimed;
– need to be prespecified in the protocol.
• If the stratified null is rejected, further testing is required to identify the strata in which the treatment effect is positive.
• The strong familywise error rate needs to be controlled regardless of whether single or multiple testing is performed.