Lecture 18 - 97 - School of Public Health

Trial Objectives
Superiority, Non-inferiority,
and Equivalence
Questions of Interest
• Is the new treatment better than the control
treatment that I am using now? (superiority
trial)
• If it is not better, is the new treatment as good as
(not unacceptably inferior to) the control
treatment that I am using now? (non-inferiority
trial)
• Can I use the new treatment and the control
treatment interchangeably? (equivalence trial)
Non-inferiority and equivalence trials are usually
considered when there is an active control.
Definitions (ICH Guidelines – E9)
• Superiority trial – a trial with the primary objective of
showing that the response to the investigational
product is superior to a comparative agent (active or
placebo control).
• Equivalence trial – a trial with the primary objective of
showing that the response to two or more treatments
differs by an amount which is clinically unimportant
(active control).
• Non-inferiority trial – a trial with the primary objective
of showing that the response to the investigational
product is not clinically inferior (or not unacceptably
inferior) to a comparative agent (active or placebo
control but usually active) – very common in the
regulatory setting either for a new treatment or for a
new label indication.
FDA Guidance
• “The objective of a non-inferiority trial is to show that
any difference in the effectiveness of the two drugs is
small enough to allow a conclusion that the new drug
is not substantially less effective than the active
control.”
• “FDA considers the selection of a non-inferiority
margin to be the single greatest challenge in
designing, conducting, and interpreting non-inferiority
trials…If a non-inferiority margin is incorrectly
calculated and set too large, a drug that is not effective
may appear to be effective; if the margin is too small,
an effective drug may appear ineffective.”
GAO-10-798 Evidence from Clinical Trials
Reasons for Active Controls
• An active treatment (comparator) with
established efficacy exists.
• If superiority can be established, the standard
of care is improved.
• While a short-term study with a placebo
control might be ethical, if the outcome is
morbidity/mortality, a trial with use of a
placebo is not ethical if an accepted standard
of care treatment exists (recall papers by
Temple and Ellenberg).
The Number and Type of Active Comparator
Studies Vary by Sponsor (Commercial versus
Non-Commercial)
• Among published reports of trials between
June 2008 and September 2009 in major
medical journals, 97/212 (46%) used an active
comparator.
• 36/108 (33%) with commercial sponsors and
61/104 (59%) with non-commercial sponsors.
• 18/36 (50%) of active controlled commercial
trials were non-inferiority versus 5/61 (8%) of
non-commercial trials.
JAMA 2010; 303:951-958
Examples – Non-Inferiority - 1
• Safety: Is a new vaccine for pertussis
(whooping cough) that has an improved safety
profile as effective in preventing whooping
cough as the currently licensed vaccine?
• Ease of use: Is a new oral anticoagulant noninferior to warfarin for stroke and systemic
embolism among patients with atrial
fibrillation? (N Engl J Med 2011)
Examples – Non-Inferiority - 2
• Treatment duration: Is a short course of
treatment for latent TB infection (3
months of INH plus rifapentine) as
effective as 9 months of INH in
preventing active TB? (N Engl J Med
2011)
• Cost: Is bevacizumab, an inexpensive alternative to
ranibizumab, noninferior for visual acuity among patients
with age-related macular degeneration?
(N Engl J Med 2011)
Example - HIV Trial:
Abacavir-Lamivudine-Zidovudine vs
Indinavir-Lamivudine-Zidovudine
JAMA 2001;285:1155-1163.
“The study was powered to assess
treatment equivalence for the primary
endpoint (i.e., a plasma HIV RNA level
<= 400 copies/mL at week 48 for the
intent-to-treat population). For the
primary end point, treatments were
considered equivalent if the 95%
confidence interval was within the
bound -12% to 12%.”
Motivation for Evaluating New Treatments in
Non-Inferiority and Equivalence Trials
New Treatment
• Costs less
• More convenient to use (e.g., short course of
prophylaxis for TB, no blood tests as for
warfarin)
• Lower risk of side effects (e.g., pertussis
vaccine)
But is it as effective?
Active and Placebo Controls in One Trial
(Usually concurrent placebo arm is absent, but this
may be practical in some short-term studies)
[Diagram: Randomize to Drug A (control), Drug B (experimental), or placebo; comparisons labeled Superiority and Non-inferiority.]
Effect of Hypericum perforatum (St. John’s
Wort) in Major Depressive Disorder
[Diagram: Randomize to sertraline (active control), St. John’s Wort (experimental), or placebo (control).]
Neither sertraline nor St. John’s Wort was significantly different
from placebo in this 8-week study. The authors noted “without a placebo,
hypericum could easily have been considered as effective as sertraline…”
JAMA 2002; 287:1807-1814.
In the absence of a concurrent placebo, we
have to provide assurance that the
active control would have been superior
to placebo if it had been used, and that the
test treatment would have beaten placebo
had it been used (indirect inference).
Non-inferiority or Equivalence Trials:
Key Features
• Efficacy of reference or control treatment
(anchor) must be clearly established (control is
better than nothing).
• Target population and outcome measures
must be similar to the trial that established
efficacy of control (constancy assumption).
• Margin of non-inferiority/equivalence must be a
priori stated, clinically relevant, and chosen to
ensure new treatment is better than “imputed”
nothing (non-inferiority margin).
Assay Sensitivity and Constancy are Critical
Assumptions in Interpreting Non-inferiority and
Equivalence Trials
Assay Sensitivity (def.) – ability to demonstrate a difference between
active and inactive treatments
• Can you assume that the standard treatment (active control) is
effective?
• How do you tell the difference between a good trial that
establishes two active treatments to be similarly effective from a
bad trial that incorrectly claims similarity?
– External evidence: historical data that the control treatment is
effective
– Internal evidence: a high-quality trial
Constancy (def.)
• The historical evidence that the control treatment is effective
(better than placebo) still holds in the setting of the current noninferiority trial
Hung and O’Neill, Encyclopedia of Clinical Trials
Historical Evidence Concerning Efficacy of
Active Control and Defining the Non-Inferiority or
Equivalence Margin
• One trial
• Meta-analysis or overview of trials (need to be
cognizant of “file-drawer” problem)
• Point estimate or lower bound of the 95% CI
• Retention of a certain fraction of the superiority of the active
control over placebo (e.g., 50%)
– True event probabilities for active control and placebo are
20% and 30%
– Show that the event probability with the new treatment is below
25% (a difference, or non-inferiority margin, between new
treatment and active control of 5%)
Would like to convince people that if you had used placebo you would have won!
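The 50% retention example is simple arithmetic; a minimal Python sketch makes the steps explicit. The helper `retention_margin` is hypothetical, and it assumes the historical event rates for active control and placebo are known exactly, which (as the next slide notes) is rarely the case.

```python
def retention_margin(p_control, p_placebo, retain=0.5):
    """Risk-difference margin that preserves a fraction `retain` of the
    active control's benefit over placebo (event rates; lower is better).

    Hypothetical helper for illustration; assumes the historical rates are exact.
    """
    benefit = p_placebo - p_control        # control's absolute benefit over placebo
    margin = (1.0 - retain) * benefit      # share of that benefit the new drug may lose
    ceiling = p_control + margin           # highest acceptable event rate for the new drug
    return margin, ceiling


margin, ceiling = retention_margin(p_control=0.20, p_placebo=0.30, retain=0.5)
print(f"margin = {margin:.3f}; new-treatment event rate must be below {ceiling:.3f}")
# prints: margin = 0.050; new-treatment event rate must be below 0.250
```

Setting `retain=0.0` recovers the full control-versus-placebo difference (10%) as the margin, which only guarantees that the new treatment beats an "imputed" placebo; larger retention fractions give tighter margins.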
General Problems in Determining the Non-Inferiority Margin
• What is “unacceptably inferior” or an acceptable level of
non-inferiority – often in the eyes of the beholder!
• Multiple outcomes are at play – non-inferiority margins are
typically defined for the primary endpoint but many
outcomes may be considered.
• Constancy assumption: same endpoint and duration of follow-up as the trial(s) that established efficacy of the active control.
• The margin assumes we know the “true” effect of the active control,
but often there is substantial variability.
• In some cases, there are multiple choices for active control.
How do you prove two treatments are
equal?
Cannot prove Ho: Δ = 0
“It is never correct to claim that treatments
have no effect or that there is no difference in
the effects of treatments. It is impossible to
prove … that two treatments have the same
effect. There will always be uncertainty
surrounding estimates of treatment effects,
and a small difference can never be excluded…
An analysis of 45 reports of trials purporting to
test equivalence found that only a quarter set
boundaries on their equivalence.”
The non-inferiority/equivalence margin must
be specified in the protocol!
Alderson P, Chalmers I. BMJ 2003;326:1691-8.
Relationship Between Significance
Tests and Confidence Intervals
[Figure: confidence intervals for the treatment difference on an axis running from “Control better” through 0 to “New agent better”: superiority strongly shown (p = 0.002), superiority shown (p = 0.05), and superiority not shown (p = 0.20).]
Superiority Trial – ALLHAT:
Lisinopril vs Chlorthalidone for CHD Incidence,
CVD Composite Outcome, and ESRD*
[Figure: hazard ratios (lisinopril/chlorthalidone) with 95% CIs; HR < 1.00 favors lisinopril, HR > 1.00 favors chlorthalidone.]
CHD: 95% CI 0.91–1.08
CVD composite: 95% CI 1.05–1.16
ESRD: 95% CI 0.88–1.38
In ALLHAT, 15,255 participants were randomized to chlorthalidone and 9,000+
participants were randomized to each of 3 other treatments.
JAMA 2002;288:2981-2997.
Interpretation of Head to Head (Equivalence)
Trials:
CONVINCE and CAPPP
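[Figure: hazard ratios with 95% CIs for the CONVINCE trial result (verapamil/SOC), the CAPPP trial result (captopril/SOC), and an overview of 9 trials, shown against the CONVINCE equivalence bounds (0.86–1.16); the region below 1.00 is labeled “Calcium Channel Blocker better” and the region above 1.00 “SOC better”.]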
CAPPP = Captopril Primary Prevention Project. Authors
concluded: “captopril and conventional treatment did not differ in efficacy.”
See JAMA 2003;289:2073-2082 for the CONVINCE trial.
Example: 2NN Study
• A study of first-line antiretroviral therapy in HIV
• Main comparison between nevirapine twice daily and
efavirenz (plus stavudine and lamivudine) in terms of
‘treatment failure’ (based on virology, disease
progression, therapy change)
• Primary objective was to establish the non-inferiority
of nevirapine twice daily (δ =10%)
Lancet 2004, 363:1253-63
Results: 2NN Study
• Confidence intervals for the difference in failure rates
(EFV − NVP)
– All data: (−12.8%, 0.9%)
– Those starting medication: (−14.6%, −0.8%)
• Neither interval lies completely above the δ
value of −10%; one interval also excludes
zero (it lies entirely below 0, i.e., more failures with nevirapine).
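The check described on this slide can be sketched in Python with the two reported intervals. The helper name is hypothetical; the sign convention (difference = EFV − NVP failure rate, so negative values mean more failures on nevirapine) follows the slide, and the margin of 10 percentage points is the study's stated δ.

```python
def check_nvp_noninferiority(ci_lower, ci_upper, margin):
    """CI (percentage points) for the failure-rate difference EFV - NVP.

    Negative values mean nevirapine (NVP) failed more often. Non-inferiority of
    NVP requires the whole interval to lie above -margin.
    """
    noninferior = ci_lower > -margin
    nvp_significantly_worse = ci_upper < 0   # interval excludes zero on the NVP-worse side
    return noninferior, nvp_significantly_worse


reported = {
    "All data": (-12.8, 0.9),
    "Those starting medication": (-14.6, -0.8),
}
for label, (lo, hi) in reported.items():
    ni, worse = check_nvp_noninferiority(lo, hi, margin=10.0)
    print(f"{label}: non-inferior={ni}, NVP significantly worse={worse}")
```

Neither interval passes the non-inferiority check, and the second one also shows nevirapine significantly worse, which is why the authors' conclusion of “similar efficacy” is not supported by their own design.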
Conclusions: 2NN Study
• BUT, the authors concluded:
‘Antiviral therapy with nevirapine or efavirenz
showed similar efficacy, so triple-drug
regimens with either … are valid for first-line
treatment’
Lancet 2004, 363:1253-63
Interpretation of Non-Inferiority Trials:
6 Examples (A – F): Hazard ratio (Test Drug/Standard)
and 95% CI
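[Figure: point estimates and 95% CIs for six trials (A–F) on a hazard-ratio axis from 0.6 to 1.4, with “Test drug better” below 1.0 and “Standard drug better” above it; the axis marks the zone of noninferiority and the estimated benefit of the standard drug over placebo, and the intervals are grouped as Superiority, Noninferiority (i.e., Equivalence), Inferiority, and Underpowered trial.]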
Antman EM, Circulation 2001;103:e101-e104.
A = Test drug is superior to standard
B = Test drug is better than standard and can be considered
“non-inferior” to standard
C = Test drug is worse than standard but not that much worse,
and can be considered “non-inferior” to standard
D = Test drug is inferior to standard and non-inferiority
criteria not satisfied.
E = Test drug is very inferior to standard (non-inferiority
criteria not satisfied)
F = Trial is inconclusive due to small size and resultant wide CI
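The six cases can be expressed as rules on where the 95% CI for the hazard ratio sits relative to 1.0 and to the non-inferiority margin. This is one possible reading of the figure, not Antman's own algorithm: the function name is hypothetical, the margin of 1.2 in the example is assumed (the figure's margin value is not reproduced here), and the “like A” comments are approximate matches. The rule set also covers an interval lying between 1.0 and the margin, which shows both non-inferiority and inferiority.

```python
def classify_hr_ci(lower, upper, margin):
    """Classify a 95% CI for the hazard ratio (test/standard; HR < 1 favours
    the test drug) against 1.0 and a non-inferiority margin > 1.

    Hypothetical helper; one reading of the A-F cases.
    """
    if lower > upper or margin <= 1.0:
        raise ValueError("need lower <= upper and margin > 1")
    if upper < 1.0:
        return "superior"                              # like A
    if upper < margin:
        if lower > 1.0:
            return "non-inferior but also inferior"    # CI between 1.0 and the margin
        return "non-inferior"                          # like B or C
    if lower > margin:
        return "markedly inferior"                     # like E: whole CI beyond the margin
    if lower > 1.0:
        return "inferior, non-inferiority not shown"   # like D
    return "inconclusive"                              # like F: CI spans 1.0 and the margin


MARGIN = 1.2  # assumed margin, for illustration only
for ci in [(0.70, 0.90), (0.85, 1.10), (1.05, 1.15), (1.10, 1.35), (1.25, 1.40), (0.80, 1.35)]:
    print(ci, classify_hr_ci(*ci, MARGIN))
```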
Possible Reasons for
Non-Significant Difference
• Small sample size
• Poor compliance to study treatments
• Losses-to-follow-up
• Equivalent regimens
Absence of proof of a treatment difference does
not constitute proof of an absence of a treatment difference.
Non-Inferiority and Equivalence Trials
Considerations
• We cannot prove Pe = Pc or µ1 = µ2; therefore the conventional
superiority test of Ho (no difference) versus HA (a difference) is not
appropriate: a small, underpowered study could fail to reject Ho and
incorrectly lead to a claim of equivalence (absence of evidence is not
evidence of absence), and if power is very high, Ho may be rejected
when the difference is not clinically important.
• Since Ho cannot be accepted, either reverse the roles of
type 1 and 2 errors (i.e., rejection of Ho implies equivalence)
or focus on confidence intervals
• The margin must be chosen not only to rule out the
smallest clinically meaningful difference, but also to be sure the
new treatment is better than no treatment
• Consensus on what equivalence means, especially in a
broad sense, is hard to achieve
1-Sided Hypothesis Testing (Non-inferiority)
A = new treatment; B = standard;
$P_A$ and $P_B$ = event rates (failure rates)
$\Delta = P_A - P_B$; $\Delta > 0$ implies the standard is better (lower values of $\Delta$ favor the new treatment)
$H_0: \Delta \ge \delta_0$ (B better by at least $\delta_0$)
$H_A: \Delta < \delta_0$ (A not worse by as much as $\delta_0$; A is close to B)
If Ho is rejected, treatments are “equivalent”
Roles of null and alternative hypotheses are reversed. In
practice, this is confusing to people.
Blackwelder W, Cont Clin Trials 1982
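A minimal Python sketch of this one-sided test for two proportions, using the simple unrestricted (Wald) standard error. The function name and the event counts in the example are hypothetical; the Farrington and Manning restricted variance discussed later in this lecture is an alternative to the standard error used here.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_z_test(events_a, n_a, events_b, n_b, delta0):
    """One-sided test of H0: PA - PB >= delta0 versus HA: PA - PB < delta0.

    A = new treatment, B = standard; PA, PB are event (failure) rates, so a
    positive difference favours the standard. Uses the unrestricted (Wald) SE.
    """
    pa, pb = events_a / n_a, events_b / n_b
    diff = pa - pb
    se = sqrt(pa * (1 - pa) / n_a + pb * (1 - pb) / n_b)
    z = (delta0 - diff) / se                 # large z is evidence against H0
    p_value = 1.0 - NormalDist().cdf(z)      # one-sided p-value
    return diff, z, p_value


# Hypothetical data: 80/800 failures on A, 88/800 on B, margin 0.05
diff, z, p = noninferiority_z_test(80, 800, 88, 800, delta0=0.05)
print(f"difference = {diff:.3f}, z = {z:.2f}, one-sided p = {p:.5f}")
```

With the same standard error, rejecting H0 at one-sided α = 0.025 is the same decision as finding the upper limit of the two-sided 95% CI for PA − PB below δ0.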
Parallel Group Studies
with Continuous Outcomes: Sample Size
Formula is the Same Except for δ0
$\Delta = \mu_A - \mu_B$
$n = n_A = n_B = \dfrac{2\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\delta_0 - \Delta)^2}$
$\alpha = 0.025$: $z_{1-\alpha} = 1.96$; $1 - \beta = 0.90$: $z_{1-\beta} = 1.28$
$n_A = n_B = \dfrac{2\sigma^2 (10.5)}{(\delta_0 - \Delta)^2}$
Note: If $\Delta = 0$, then this is equivalent to a superiority trial to detect $\delta_0$ with 90% power.
Example
Non-Inferiority Trial for New BP Lowering Drug
δO = 4 mmHg
Δ = 0, -2 (A better) and +2 (B better)
σ² = 100; α = 0.025 (1-sided); 1-β = 0.90
1:1 allocation
δO    Δ     No. per group
4     0     132
4     +2    525
4     −2    58
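The formula and the three rows of the table can be reproduced with a short Python sketch. It uses the slide's rounded z values (1.96 and 1.28); the function name is hypothetical. Unrounded, the three sample sizes are about 131.2, 524.9 and 58.3 per group; rounding up to whole participants gives 132, 525 and 59, so the table's 58 for Δ = −2 corresponds to rounding to the nearest integer.

```python
from math import ceil


def n_per_group_continuous(delta0, Delta, sigma2, z_alpha=1.96, z_beta=1.28):
    """Per-group n for a non-inferiority comparison of two means (1:1 allocation).

    delta0: non-inferiority margin; Delta: assumed true mu_A - mu_B
    (positive = standard B better); sigma2: common outcome variance.
    """
    if Delta >= delta0:
        raise ValueError("non-inferiority cannot be shown when Delta >= delta0")
    return 2 * sigma2 * (z_alpha + z_beta) ** 2 / (delta0 - Delta) ** 2


# BP example: delta0 = 4 mmHg, sigma^2 = 100, alpha = 0.025 (1-sided), power 0.90
for Delta in (0, +2, -2):
    n = n_per_group_continuous(delta0=4, Delta=Delta, sigma2=100)
    print(f"Delta = {Delta:+d}: n = {n:.1f} (round up to {ceil(n)})")
```

The sample size is driven by the distance δ0 − Δ: assuming the new drug is slightly better (Δ = −2) shrinks the trial, while assuming it is slightly worse (Δ = +2) quadruples it relative to Δ = 0.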
Confidence Interval Approach
Example of Type I Error
[Figure: estimated difference $\hat\Delta$ with its $(1-2\alpha)$ CI on an axis running from A (new treatment better) through 0 and $\delta_0$ to B (standard treatment better).]
Type I error = Prob(incorrectly rejecting null hypothesis)
In this case: incorrectly claiming “equivalence” when the treatments are not (the reverse of the usual situation)
Upper limit of $(1-2\alpha)$ CI $< \delta_0$, but $\Delta \ge \delta_0$
We want to reject $H_0$ when $\Delta < \delta_0$, not when $\Delta \ge \delta_0$ ($H_0$ is true)
Confidence Interval Approach
Example of Type II Error
[Figure: estimated difference $\hat\Delta$ with its $(1-2\alpha)$ CI on an axis running from A (new treatment better) through 0 and $\delta_0$ to B (standard treatment better).]
Type II error = Prob(incorrectly accepting null hypothesis)
In this case: incorrectly claiming the treatments are not equivalent when they are (also the reverse of the usual situation)
Upper limit of $(1-2\alpha)$ CI $> \delta_0$, but $\Delta < \delta_0$
We want to reject $H_0$ when $\Delta < \delta_0$, not accept it.
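A small Monte Carlo sketch makes the two error types concrete for the continuous-outcome example (δ0 = 4 mmHg, σ = 10, n = 132 per group, one-sided α = 0.025). It assumes normal outcomes with a known common SD and draws the observed difference in means directly; the function name and seed are arbitrary choices. With the true difference at the margin (H0 true), the share of trials that claim non-inferiority estimates the type I error, which should be near 0.025; with no true difference, that share estimates the power, which should be near 0.90.

```python
import random
from math import sqrt


def prob_claim_noninferiority(true_diff, delta0, sigma, n, z=1.96, n_sims=20000, seed=1):
    """Monte Carlo probability that the upper limit of the (1 - 2*alpha) CI for
    mu_A - mu_B falls below delta0, i.e. that H0: Delta >= delta0 is rejected.

    Assumes normal outcomes with known common SD `sigma` and n per group, so the
    observed difference in means is drawn directly from N(true_diff, 2*sigma^2/n).
    """
    rng = random.Random(seed)
    se = sigma * sqrt(2.0 / n)
    claims = 0
    for _ in range(n_sims):
        diff_hat = rng.gauss(true_diff, se)
        if diff_hat + z * se < delta0:       # upper CI limit below the margin
            claims += 1
    return claims / n_sims


# Type I error: truth exactly at the margin (H0 true), so any claim is an error
type_1 = prob_claim_noninferiority(true_diff=4.0, delta0=4.0, sigma=10.0, n=132)
# Power: no true difference, so failing to claim non-inferiority is a type II error
power = prob_claim_noninferiority(true_diff=0.0, delta0=4.0, sigma=10.0, n=132)
print(f"type I error ~ {type_1:.3f}, power ~ {power:.3f}, type II error ~ {1 - power:.3f}")
```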
Sample Size for Equivalence
Design Based on CI Limits
A = New Treatment; B = Standard
Prob(upper limit of CI exceeds $\delta_0$ when $\Delta < \delta_0$) $= \beta$:
$P\left[(\hat P_A - \hat P_B) + z_{1-\alpha}\,\sigma > \delta_0\right] = P\left[\dfrac{(\hat P_A - \hat P_B) - (P_A - P_B)}{\sigma} > \dfrac{\delta_0 - (P_A - P_B)}{\sigma} - z_{1-\alpha}\right] = \beta$
where $\sigma^2 = \dfrac{P_A(1-P_A)}{N_A} + \dfrac{P_B(1-P_B)}{N_B}$
Sample Size for Equivalence
Design Based on CI Limits (cont.)
A = New Treatment; B = Standard
Choose NA and NB to ensure β is small.
$N_A = N_B = N = \dfrac{(z_{1-\alpha} + z_{1-\beta})^2\,[P_A(1-P_A) + P_B(1-P_B)]}{[\delta_0 - (P_A - P_B)]^2}$
$N = \dfrac{2P(1-P)\,(z_{1-\alpha} + z_{1-\beta})^2}{\delta_0^2}$ if $P_A = P_B = P$
Makuch and Simon (Cancer Treatment Reports, 1978) suggest α =
0.10 (1-sided) and β = 0.20; I like α = .05 (and usually 2-sided)
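The Makuch and Simon formula translates directly into Python. This sketch uses the rounded z values from the continuous-outcome slide (1.96 for one-sided α = 0.025, 1.28 for 90% power) and rounds up to whole participants; the function name is hypothetical. With these inputs it gives 756, 1,071, 1,344 and 336 for the equal-rate rows of the table that follows; for PA = PB = 0.05 and δ0 = 0.01 it gives about 9,972.7 before rounding.

```python
from math import ceil


def n_makuch_simon(pa, pb, delta0, z_alpha=1.96, z_beta=1.28):
    """Per-group N (1:1) from the Makuch-Simon normal approximation.

    pa, pb: anticipated event (failure) rates for new treatment A and standard B;
    delta0: margin, with H0: PA - PB >= delta0.
    """
    shortfall = delta0 - (pa - pb)           # distance from the assumed truth to the margin
    if shortfall <= 0:
        raise ValueError("assumed PA - PB must be below delta0")
    variance = pa * (1 - pa) + pb * (1 - pb)
    return ceil((z_alpha + z_beta) ** 2 * variance / shortfall ** 2)


for p, d0 in [(0.10, 0.05), (0.15, 0.05), (0.20, 0.05), (0.20, 0.10)]:
    print(p, d0, n_makuch_simon(p, p, d0))
```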
For Proportions and Relative Risks, Farrington
and Manning’s Approach is Better
• Problem arises because of estimation of
variance under the null hypothesis.
• Farrington and Manning (Stat Med 1990) have
shown that their maximum likelihood approach
is better particularly for small values of pc and
pe.
• Algorithm can be easily programmed.
Stat Med 1990; 9:1447-1454
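One way to program the algorithm for the risk difference is sketched below. The closed-form root of the cubic likelihood equation for the restricted maximum likelihood estimates (PA and PB constrained to PA − PB = δ0) is transcribed from our reading of Farrington and Manning (1990) and should be checked against the paper before use; the function names are hypothetical. Exact normal quantiles are used, and for PA = PB = 0.10, δ0 = 0.05 the sketch gives about 775 per group.

```python
from math import acos, ceil, copysign, cos, pi, sqrt
from statistics import NormalDist


def fm_restricted_mle(p_a, p_b, delta0, theta=1.0):
    """Restricted MLEs of (PA, PB) under H0: PA - PB = delta0.

    p_a, p_b: observed or anticipated proportions; theta = n_B / n_A.
    Closed-form root of the cubic score equation (transcribed from
    Farrington and Manning, 1990; verify before relying on it).
    """
    a = 1.0 + theta
    b = -(1.0 + theta + p_a + theta * p_b + delta0 * (theta + 2.0))
    c = delta0 ** 2 + delta0 * (2.0 * p_a + theta + 1.0) + p_a + theta * p_b
    d = -p_a * delta0 * (1.0 + delta0)
    v = b ** 3 / (27.0 * a ** 3) - b * c / (6.0 * a ** 2) + d / (2.0 * a)
    u = copysign(sqrt(b ** 2 / (9.0 * a ** 2) - c / (3.0 * a)), v)
    ratio = max(-1.0, min(1.0, v / u ** 3))  # guard acos against rounding error
    w = (pi + acos(ratio)) / 3.0
    pa_tilde = 2.0 * u * cos(w) - b / (3.0 * a)
    return pa_tilde, pa_tilde - delta0


def n_farrington_manning(p_a, p_b, delta0, alpha=0.025, power=0.90, theta=1.0):
    """Group-A sample size for the non-inferiority test of H0: PA - PB >= delta0."""
    if delta0 - (p_a - p_b) <= 0:
        raise ValueError("assumed PA - PB must be below delta0")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    pa0, pb0 = fm_restricted_mle(p_a, p_b, delta0, theta)
    sd_null = sqrt(pa0 * (1 - pa0) + pb0 * (1 - pb0) / theta)   # variance under H0
    sd_alt = sqrt(p_a * (1 - p_a) + p_b * (1 - p_b) / theta)    # variance under the assumed truth
    n_a = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (delta0 - (p_a - p_b)) ** 2
    return ceil(n_a)


print(n_farrington_manning(0.10, 0.10, 0.05))
```

The only change from the Makuch and Simon calculation is that the variance term multiplying z_{1−α} is evaluated at the restricted estimates, i.e., under the null hypothesis rather than under the assumed true rates.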
Sample Size for Proportions for Non-Inferiority Trial: Makuch and
Simon versus Farrington and Manning (PA=PB)*
Sample Size per Group
PA (PE)   PB (PC)   δO     Makuch and Simon   Farrington and Manning
0.05      0.05      0.01   9,972              10,032
0.10      0.10      0.05   756                775
0.15      0.15      0.05   1,071              1,080
0.20      0.20      0.05   1,344              1,348
0.20      0.20      0.10   336                340
* α = 0.025 (1-sided); 1-β = 0.90; 1:1 allocation
Sample Size for Proportions for Non-Inferiority Trial: Makuch
and Simon versus Farrington and Manning (PA = or ≠ PB)*
Sample Size per Group
PA (PE)   PB (PC)   δO     Makuch and Simon   Farrington and Manning
0.10      0.10      0.05   756                775
0.125     0.10      0.05   3,343              3,379
0.10      0.125     0.05   371                384
* α = 0.025 (1-sided); 1-β = 0.90; 1:1 allocation
Sample Size for Proportions: Superiority Trial with
Specified Delta or Inferiority with Farrington and Manning
(1:1 allocation and 1-β = 0.90)
Sample Size per Group
PA (PE)   PB (PC)   δO     Superiority*   Farrington and Manning** (1st)   Farrington and Manning** (2nd)
0.05      0.05      0.01   9,021          10,032                           8,174
0.10      0.10      0.05   581            775                              630
0.15      0.15      0.05   917            1,080                            880
0.20      0.20      0.05   1,211          1,349                            1,099
0.20      0.20      0.10   266            340                              277
* α = 0.05 (2-sided); PE = PC − δO
** α = 0.025 (1-sided) in 1st column; α = 0.05 (1-sided) in 2nd column
General Approach
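[Figure: relative risk (RR) axis running from “New Treatment Better” to “Standard Treatment Better”, showing the estimated RR with its (1 − 2α) CI, here a 95% CI (α = .025), and the margin RRo.]
RRo is chosen so that if the upper limit of the CI is < RRo, we conclude “equivalence”.
RRo usually ≠ 1.0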
CONVINCE Design
• Based on the findings from 17 trials with over
50,000 participants, the CVD risk reduction
associated with BP lowering by diuretics and
beta-blockers was estimated as 24%.
• Equivalence margin was set to ensure that
there would be no more than a 50% loss of
efficacy based on this point estimate.
• Upper bound = 1.16 = 0.88 (12% reduction)/
0.76 (24% reduction).
• Lower bound = 1/1.16 = 0.86.
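The CONVINCE margin arithmetic, sketched in Python. The function name and the `max_loss` parameter are hypothetical; like the slide, the sketch treats the 24% point estimate as the true effect of diuretics and beta-blockers.

```python
def convince_bounds(control_rr=0.76, max_loss=0.5):
    """Equivalence bounds on the hazard-ratio scale (new/standard) that allow
    at most `max_loss` of the standard's historical risk reduction to be lost."""
    reduction = 1.0 - control_rr                        # 0.24: historical CVD risk reduction
    retained_rr = 1.0 - (1.0 - max_loss) * reduction    # 0.88: 12% reduction kept
    upper = retained_rr / control_rr                    # relative to the standard: 0.88 / 0.76
    lower = 1.0 / upper                                 # reciprocal bound, symmetric on the log scale
    return lower, upper


lower, upper = convince_bounds()
print(f"{lower:.2f} to {upper:.2f}")  # prints: 0.86 to 1.16
```

Because the lower bound is the reciprocal of the upper one, the equivalence region is symmetric on the log hazard-ratio scale rather than on the hazard-ratio scale itself.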
Confidence Interval Approach
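to Monitoring for CONVINCE
[Figure: hazard-ratio axis marking the lower limit of equivalence (0.86; calcium channel blocker better), no difference (1.0), and the upper limit of equivalence (1.16; diuretic/β-blocker better), with example intervals labeled “Equivalence” and “Inconclusive”.]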
Non-inferiority and superiority
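The 95% CI for the difference between the control and the
intervention lies entirely above −δ, i.e., non-inferiority is demonstrated.
Because the CI also lies entirely above 0, in this case both
non-inferiority and superiority have been demonstrated.
[Figure: CI on an axis marked −δ and 0, running from “Control treatment better” through “No difference” to “New treatment better”.]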
Non-inferiority and Inferiority
The 95% CI for the difference between the control and the
intervention lies entirely above −δ, i.e., non-inferiority is demonstrated,
but it also lies entirely below 0.
In this case both non-inferiority and inferiority
have been demonstrated.
[Figure: CI on an axis marked −δ and 0, running from “Control treatment better” through “No difference” to “New treatment better”.]
Summary - Determining Equivalence
• The first step in establishing equivalence is to define the ‘limits of equivalence’ (±δ)
• Having conducted the trial, calculate the
95% confidence intervals for the difference
between the control and the new treatment
• If the confidence interval is entirely within ±
δ then equivalence is established
Summary - Determining Non-inferiority
• Equivalence requires that the difference
control − new intervention is both > −δ and < δ:
the new treatment must be neither worse
nor better than the control by more than a fixed
amount.
• In contrast, with non-inferiority we are only
interested in determining whether the new
treatment is no worse than the control by more
than the amount δ.
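The equivalence and non-inferiority rules can be sketched as a single Python function that reports which claims a confidence interval supports. The function name is hypothetical, the difference is assumed to be oriented so that positive values favour the new treatment, and the example intervals are hypothetical; the limits of ±12 percentage points are borrowed from the HIV trial example earlier in this lecture.

```python
def ci_conclusions(lower, upper, delta):
    """Claims supported by a CI for the treatment difference, oriented so that
    positive values favour the new treatment, against limits of +/- delta.
    """
    return {
        "equivalent": -delta < lower and upper < delta,   # entirely within +/- delta
        "non_inferior": lower > -delta,                   # only the lower limit matters
        "superior": lower > 0,
        "inferior": upper < 0,
    }


# Hypothetical CIs (percentage points) with limits of +/-12
print(ci_conclusions(-8.0, 5.0, 12.0))    # within the limits
print(ci_conclusions(-9.0, -1.0, 12.0))   # within the limits, yet entirely below 0
print(ci_conclusions(2.0, 15.0, 12.0))    # non-inferior and superior, not equivalent
```

The third interval illustrates the difference between the two designs: non-inferiority only looks at the lower limit, whereas equivalence also fails when the new treatment may be better by more than δ.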
Analysis of Non-inferiority/Equivalence
Trials
• Superiority trials are analysed by intention-to-treat
(ITT) because it is the most conservative and least
likely to be biased.
• ITT analysis of non-inferiority trials is not
conservative - there is a bias towards no
difference.
• Per-protocol analysis can be biased because not all
randomised patients are included.
• Recommendation: Analyze by both ITT and per
protocol (need to ensure power for both).
Testing for Superiority after Non-Inferiority
• In some situations it may be appropriate to test
for superiority after testing for non-inferiority.
• Regulatory authorities do not require any
multiplicity adjustment for this.
• In this situation, while the primary analysis for
non-inferiority might be based on a “per
protocol” population, the primary analysis for
the superiority analysis should be intention to
treat.
Equivalence/Non-Inferiority Trials
Summary
• Equivalence/non-inferiority trials may be larger than, smaller than, or similar in size to
superiority trials – this depends on the margin chosen and on whether the new therapy is
assumed to be more efficacious.
• Equivalence is “in the eyes of the beholder” – select margins carefully!
• The absence of a significant difference in a superiority trial does not imply
equivalence
• Need to be sure about the efficacy of the active control treatment based
on earlier trials.
• It is critical that the conduct of equivalence/non-inferiority trials is
excellent.
• Because of the difficulty of interpretation, equivalence and non-inferiority trials
should be used cautiously.
• More head to head superiority comparisons of approved treatments are
needed.
Quality of Reporting of Non-inferiority and
Equivalence Trials
(JAMA 2006;295:1147-1151)
• Margin of non-inferiority/equivalence defined in
most trials, but rationale for margin missing in
majority of studies.
• About 25% of reports did not give sample size
justification in sufficient detail to reproduce it.
• Less than 50% described both intention to treat
and per protocol analysis.
• About 15% of reports did not state confidence
intervals.
Guidelines for Reporting Non-inferiority and
Equivalence Trials+
(JAMA 2006;295:1152-1160)
• Specification of whether the trial is a non-inferiority
study
• Sample size details (specification and rationale for
non-inferiority margin)
• Use of 1- or 2-sided confidence interval
• Nature of analysis: intention to treat, per protocol
or both
• Presentation of results: confidence intervals
+ Builds on CONSORT guidelines for superiority trials.