FALL_2004_JC


Statistical Issues in Interpreting
Clinical Trials
D. L. DeMets
Journal of Internal Medicine
255: 529-537. 2004
“Lies, Damn Lies, and Clinical Statistics”
Justin L. Grobe
September 1, 2004
Drug Development Paradigm

Medicinal Chemistry


Animal Testing


Targeted development of new
compounds
Test efficacy, potency, safety
Human Clinical Trials

Multiple phases to test efficacy,
potency, safety, and to compare new
intervention to standard
Clinical Trials – Design Paradigm

Randomization



Assignment to treatment group
Order effects
Placebo


“Control”
(Ethical considerations)
Input = Return


“No clever analysis can rescue a
flawed design or poorly conducted
trial.”
Compliance issues
Five Major Statistical Issues
1. Intention-to-treat principle
2. Surrogate outcome measures
3. Subgroup analyses
4. Missing data
5. Noninferiority trials
Statistical Issue 1:
Intention-to-treat principle


“… all patients are accounted for in
the primary analysis, and primary
events observed during the follow-up period are to be accounted for as
well.”
Results can be biased if either of
these aspects is not adhered to
Myths and examples

Myth: Large trials are free of these
concerns



Increasing the number of patients
reduces the variability of the estimated
treatment effect, thereby making detection
of differences easier;
EXCEPT, this also makes biases in the
outcome measurement easier to detect,
WHICH MAY cause detection of
differences which do not actually exist
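A minimal sketch of why trial size does not cure bias. The setup is an assumption, not taken from the slides: a two-arm trial with a continuous outcome, a known standard deviation `sd`, no true treatment effect, and a fixed measurement bias `bias` favouring one arm. The function name and the numbers are hypothetical.

```python
import math

def expected_z(bias, sd, n_per_arm):
    """Expected z-statistic when the only difference between arms is a systematic bias."""
    # Standard error of a difference in means with equal arm sizes
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return bias / standard_error

# Hypothetical numbers: bias of 0.1 units, outcome SD of 1.0
for n in (100, 1000, 10000):
    print(n, round(expected_z(0.1, 1.0, n), 2))
```

Under these assumptions the expected z-statistic grows with the square root of the sample size, so a bias that is invisible in a small trial crosses the usual 1.96 threshold in a large one.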
To include or not to include…

Two common reasons to drop
patient data


Post hoc ineligibility assessment
Lack of patient compliance
TABLE 1: Post-hoc ineligibility assessment
Anturane Reinfarction Trial

1629 patients who had survived a heart
attack

813 patients received Anturane
816 patients received placebo

71 patients deemed “ineligible” for analysis by protocol

Table 1 1980 Anturane mortality results
               Randomized      ‘Eligible’      ‘Ineligible’   P-value (eligible versus ineligible)
Anturane (%)   74/813 (9.1)    64/775 (8.3)    10/38 (26.3)   0.0001
Placebo (%)    89/816 (10.9)   85/783 (10.9)   4/33 (12.1)    0.92
P-value        0.20            0.07            0.12
Striking statistical comparisons are made by including/excluding patients in
each group: thus the results are biased by post hoc exclusions
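The drug-versus-placebo comparisons in Table 1 can be roughly re-derived from the counts. The sketch below uses a pooled two-proportion z-test, which is an assumption: the slides do not say which test produced the published p-values, so the output should only be expected to land near them, not match them exactly. The helper name `two_proportion_p` is hypothetical.

```python
import math

def two_proportion_p(deaths_a, n_a, deaths_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Counts from Table 1: (Anturane deaths, n, placebo deaths, n)
table1 = {
    "randomized": (74, 813, 89, 816),
    "eligible": (64, 775, 85, 783),
    "ineligible": (10, 38, 4, 33),
}
for label, counts in table1.items():
    print(label, round(two_proportion_p(*counts), 3))
```

The point of the exercise is the direction of the shift: removing the 71 ‘ineligible’ patients moves the comparison toward significance without any change in the drug.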
TABLE 2: Patient compliance
Coronary Drug Project

3885 post-heart attack men were
given clofibrate or placebo

708 clofibrate and 1813 placebo patients were
at least 80% compliant
Table 2 Coronary drug project 5-year mortality
                        Total (as reported)   By compliance   <80%   ≥80%
Clofibrate   n          1103                  1065            357    708
             % Deaths   20.0                  18.2            24.6   15.0
Placebo      n          2782                  2695            882    1813
             % Deaths   20.9                  19.4            28.2   15.1
Compliance itself is considered an outcome: thus to base the interpretation
of the ‘drug outcome’ on the ‘compliance outcome’ is confounding
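A short arithmetic check on Table 2, written as a sketch. It recombines the two compliance strata into the ‘By compliance’ totals and then measures the mortality gap between poor and good compliers in each arm. The function names are hypothetical; the numbers are the table's.

```python
# (n, % deaths) from Table 2, by compliance stratum
clofibrate = {"<80%": (357, 24.6), ">=80%": (708, 15.0)}
placebo = {"<80%": (882, 28.2), ">=80%": (1813, 15.1)}

def pooled_mortality(strata):
    """Patient-weighted average of the stratum mortality percentages."""
    total_n = sum(n for n, _ in strata.values())
    return sum(n * pct for n, pct in strata.values()) / total_n

def compliance_gap(strata):
    """Mortality of poor compliers minus mortality of good compliers (percentage points)."""
    return strata["<80%"][1] - strata[">=80%"][1]

print(round(pooled_mortality(clofibrate), 1), round(pooled_mortality(placebo), 1))
print(round(compliance_gap(clofibrate), 1), round(compliance_gap(placebo), 1))
```

The gap appears in the placebo arm as well, where the ‘drug’ is inert, which is why compliance has to be read as an outcome in its own right and not as a measure of drug effect.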
Dealing with noncompliance

Larger sample sizes are required to
compensate for the dilution effect of
noncompliance


10% noncompliance requires 23%
increase in sample size
20% noncompliance requires a 56%
increase in sample size
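The two figures above are consistent with a commonly used dilution rule in which the required sample size scales by 1 / (1 - r)^2 for a noncompliance rate r. The slides do not state the formula, so treating it as the source of these numbers is an assumption; the function name is hypothetical.

```python
def sample_size_inflation(noncompliance_rate):
    """Fractional increase in sample size under the 1 / (1 - r)^2 dilution rule."""
    return 1.0 / (1.0 - noncompliance_rate) ** 2 - 1.0

for rate in (0.10, 0.20):
    print(f"{rate:.0%} noncompliance -> {sample_size_inflation(rate):.0%} more patients")
```

The square in the denominator is what makes the penalty grow faster than the noncompliance rate itself.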
Statistical Issue 2:
Surrogate outcome measures

Outcome measures of primary
question must be:






Clinically relevant
Sensitive to intervention
Ascertainable in all patients
Resistant to bias
Result: Large, time-consuming,
costly studies
Alternative approach: surrogate
outcome measures
Surrogate outcome measure:
Assumption

If the intervention will modify surrogate
outcome, it will modify the primary
clinical outcome
Surrogate outcome measure:
Requirements
1. Surrogate outcome must be predictive of
clinical outcome
2. Surrogate outcome must fully capture
the total effect of the intervention on the
clinical outcome
“Necessary and sufficient”
Surrogate outcome measures:
Difficult to obtain and validate


Intervention may
modify the surrogate
and have no or only
partial effect on the
clinical outcome
Intervention may
modify the clinical
outcome without
affecting the
surrogate
(Note: NOT surprisingly, track record for use
of surrogate outcome measures is very bad)
Surrogate outcome measures:
Example:

Cardiac Arrhythmia Suppression Trial (CAST)
Three drugs tested for suppression of cardiac
arrhythmias


All three drugs had been shown to suppress premature
cardiac ventricular contractions (surrogate)
Two drugs terminated early (10-15% into study)
because both drugs dramatically increased cause-specific
sudden death and total mortality
Table 3 Cardiac Arrhythmia Suppression Trial
Early termination in two drug arms
                  Drugs   Placebo
Sudden death      33      9
Total mortality   56      22
Clearly the interventions (drugs) had differential effects on the surrogate measure
(premature cardiac ventricular contractions) and the clinical outcome (mortality)
Statistical Issue 3:
Subgroup analyses

Clinical trials usually try to include
as many (diverse) patients as
possible for multiple reasons:




Large sample size
Reasonable recruitment time
Assess internal consistency of results
Seemingly logical use of the large
data set is to do many post hoc
analyses on subgroups
Subgroup analysis:
Mathematical problems

Introduction of subgroups increases
probability of false positives

5 subgroups yield a greater than 20%
chance of at least one statistically
significant (p=0.05) difference BY
CHANCE
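The 20% figure follows from simple probability. The sketch below assumes the subgroup tests are independent and that there is no true effect in any subgroup; both are simplifications, and the function name is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

With five subgroups the chance is about 23%, and it keeps climbing as more subgroups are examined.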
Subgroup analysis:
MERIT trial

Beta-blocker (metoprolol) treatment for patients
with congestive heart failure

Showed a 34% reduction in mortality overall
Subgroup analysis:
MERIT trial
Subgroup analysis found mortality
results consistent across many
subgroups
Subgroup analysis:
MERIT trial
In the USA, total
mortality is not reduced,
yet total mortality plus
any hospitalization is…?
Subgroup analysis:
MERIT trial


Two other similar heart failure trials
evaluating other beta-blockers
showed no regional difference;
THUS, it is likely that the MERIT
finding is due to chance alone.
Subgroup analysis:
PRAISE-I and PRAISE-II trials

PRAISE-I performed to evaluate
amlodipine for the treatment of
congestive heart failure



Subgroups:
Ischemia
Nonischemia
Analysis of subgroups separately showed
a significant (p<0.001) effect of
amlodipine on heart failure in
nonischemic patients, but no effect on
ischemic patients
Researchers decided to perform PRAISE-II
trial on nonischemic patients only
Subgroup analysis:
PRAISE-I and PRAISE-II trials


PRAISE-II showed remarkably
similar mortality results in the drug
and placebo groups
PRAISE-II directly opposed the
exciting results of PRAISE-I’s
subgroup analysis
Statistical Issue 4:
Missing data


Missing data is often simply
“dropped”
This violates two rules:
1. Intention-to-treat rule: all patients
must be accounted for in primary
outcome analysis
2. Common sense rule: if patient is too
sick to complete trial, this may be
informative!
Missing data

In “time to event” trials (like
mortality), data can be missing
because the study ends before the
event happens



Patients are then “censored” (follow-up
ends before the event is observed)
This can introduce serious bias if the
censoring is related to the outcome
(Mortality studies in USA have no excuse: death
indices allow follow-up without help from patient)
Statistical Issue 5:
Noninferiority trials

“New intervention is not worse than the
standard”


New intervention may be:
Easier to administer
Better tolerated
Less toxic
Less expensive
Any given study may be a superiority and/or
noninferiority trial, depending on results
Noninferiority trials

Three challenges must be met:
1. Noninferiority trial must be of highest
quality to detect clinically meaningful
differences
2. Noninferiority trial must have a
strong, effective control intervention
(state-of-the-art care)
3. Margin of indifference is arbitrary,
depending on medical importance of
treatment and risk-to-benefit tradeoffs
Noninferiority trials:
OPTIMAAL Trial

Losartan (angiotensin II receptor
blocker) vs captopril (ACE inhibitor)
in heart failure patient population


Losartan has fewer (and less severe)
side effects than captopril
OPTIMAAL


Designed to detect 20% reduction in
relative risk, with 95% power
Margin of indifference set at 1.1

Thus upper limit of 95% confidence interval
needed to exclude a relative risk of 1.1 to
declare losartan “noninferior” to captopril
Noninferiority trials:
OPTIMAAL Trial

Mortality results for OPTIMAAL

Relative risk of 1.126 with upper 95%
confidence limit of 1.28


NEITHER superiority nor noninferiority
was achieved
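The decision rule behind this verdict can be sketched in a few lines. It reads the slide's 1.28 as the upper 95% confidence limit of the relative risk (losartan vs. captopril, values above 1 favouring captopril), which is an assumption about the slide's wording. The function name and the labels it returns are hypothetical.

```python
def classify_trial(upper_limit, margin):
    """Classify a trial from the upper 95% confidence limit of the relative risk
    (new vs. standard; values above 1 favour the standard)."""
    if upper_limit < 1.0:
        # Whole interval lies below 1: new intervention is better
        return "superior"
    if upper_limit < margin:
        # Interval excludes the margin of indifference
        return "noninferior"
    return "neither"

# OPTIMAAL mortality: upper limit 1.28, margin of indifference 1.1
print(classify_trial(1.28, 1.1))
```

Because 1.28 lies above the 1.1 margin, the interval cannot rule out a clinically important excess risk, so the trial lands in the third category.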
Using historical data, researchers computed
that captopril had a relative risk of
0.806 vs. placebo, and thus calculated
that losartan must have a
relative risk of 0.906 vs. placebo…

The statistically appropriate conclusion at
this point is:

NO ACCEPTABLE CONCLUSIONS POSSIBLE
FROM THIS DATA
CONCLUSIONS



Statistics cannot make up for bad
design
Statistics cannot make up for poor
execution of design
Statistics is very limited in being
able to compensate for





Ineligible patients being enrolled
Noncompliance
Unreliable outcome measures
Missing data
Underpowered trials