FALL_2004_JC
Statistical Issues in Interpreting
Clinical Trials
D. L. DeMets
Journal of Internal Medicine
255: 529-537. 2004
“Lies, Damn Lies, and Clinical Statistics”
Justin L. Grobe
September 1, 2004
Drug Development Paradigm
Medicinal Chemistry
Targeted development of new
compounds
Animal Testing
Test efficacy, potency, safety
Human Clinical Trials
Multiple phases to test efficacy,
potency, safety, and to compare new
intervention to standard
Clinical Trials – Design Paradigm
Randomization
Assignment to treatment group
Order effects
Placebo
“Control”
(Ethical considerations)
Input = Return
“No clever analysis can rescue a
flawed design or poorly conducted
trial.”
Compliance issues
Five Major Statistical Issues
1. Intention-to-treat principle
2. Surrogate outcome measures
3. Subgroup analyses
4. Missing data
5. Noninferiority trials
Statistical Issue 1:
Intention-to-treat principle
“… all patients are accounted for in
the primary analysis, and primary
events observed during the follow-up period are to be accounted for as
well.”
Results can be biased if either of
these aspects is not adhered to
Myths and examples
Myth: Large trials are free of these
concerns
Increased numbers of patients
decreases variability of response
variable, thereby making detection of
differences easier;
EXCEPT, this amplifies biases in the
outcome measurement
WHICH MAY cause detection of
differences which do not actually exist
To include or not to include…
Two common reasons to drop
patient data
Post hoc ineligibility assessment
Lack of patient compliance
TABLE 1: Post-hoc ineligibility assessment
Anturane Reinfarction Trial
1629 patients who had survived a heart
attack
813 patients received Anturane
816 patients received placebo
71 patients deemed “ineligible” for analysis by protocol
Table 1 1980 Anturane mortality results
               Randomized      ‘Eligible’      ‘Ineligible’    P-value (eligible vs. ineligible)
Anturane (%)   74/813 (9.1)    64/775 (8.3)    10/38 (26.3)    0.0001
Placebo (%)    89/816 (10.9)   85/783 (10.9)   4/33 (12.1)     0.92
P-value        0.20            0.07            0.12
Striking statistical comparisons are made by including/excluding patients in
each group: thus the results are biased by post hoc exclusions
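The percentages in Table 1 can be recomputed from the counts, and the eligible-versus-ineligible comparison within each arm can be approximated with a pooled two-proportion z-test. This is a sketch only: the slides do not say which test produced the published P-values, so the z-test and the helper names below are assumptions, and the computed values need not match the table exactly.

```python
import math

# (deaths, patients) taken from Table 1, Anturane Reinfarction Trial.
TABLE_1 = {
    "Anturane": {"eligible": (64, 775), "ineligible": (10, 38)},
    "Placebo": {"eligible": (85, 783), "ineligible": (4, 33)},
}


def mortality_percent(deaths, patients):
    """Mortality as a percentage of the patients in a group."""
    return 100.0 * deaths / patients


def two_proportion_p_value(deaths_a, n_a, deaths_b, n_b):
    """Two-sided P-value from a pooled two-proportion z-test.

    Assumed test: the source does not name the method it used.
    """
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (deaths_a / n_a - deaths_b / n_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


for arm, groups in TABLE_1.items():
    (d_elig, n_elig), (d_inel, n_inel) = groups["eligible"], groups["ineligible"]
    print(
        arm,
        round(mortality_percent(d_elig, n_elig), 1),
        round(mortality_percent(d_inel, n_inel), 1),
        two_proportion_p_value(d_elig, n_elig, d_inel, n_inel),
    )
```

The point of the comparison is that the patients declared ineligible in the Anturane arm had much higher mortality than those kept, while no such gap appears in the placebo arm, so removing them flatters the drug.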
TABLE 2: Patient compliance
Coronary Drug Project
3885 post-heart attack men were
given clofibrate or placebo
708 clofibrate and 1813 placebo patients were
at least 80% compliant
Table 2 Coronary drug project 5-year mortality
                         Total (as reported)   By compliance   <80%    ≥80%
Clofibrate   n           1103                  1065            357     708
             % Deaths    20.0                  18.2            24.6    15.0
Placebo      n           2782                  2695            882     1813
             % Deaths    20.9                  19.4            28.2    15.1
Compliance itself is considered an outcome: thus to base the interpretation
of the ‘drug outcome’ on the ‘compliance outcome’ is confounding
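As a rough check on Table 2, the mortality in the "By compliance" column can be recovered as a patient-weighted average of the two compliance groups, and the gap between poor and good compliers can be computed within each arm. The function names are illustrative, and death counts are back-calculated from the rounded percentages, so this is an approximation rather than the trial's own calculation.

```python
# (patients, percent deaths) taken from Table 2, Coronary Drug Project.
TABLE_2 = {
    "Clofibrate": {"<80%": (357, 24.6), ">=80%": (708, 15.0)},
    "Placebo": {"<80%": (882, 28.2), ">=80%": (1813, 15.1)},
}


def pooled_mortality(groups):
    """Patient-weighted mortality (%) across the compliance groups of one arm."""
    total_patients = sum(n for n, _ in groups.values())
    # Approximate deaths per group from the rounded percentage.
    total_deaths = sum(n * pct / 100.0 for n, pct in groups.values())
    return 100.0 * total_deaths / total_patients


def compliance_gap(groups):
    """Mortality of poor compliers minus mortality of good compliers, in points."""
    return groups["<80%"][1] - groups[">=80%"][1]


for arm, groups in TABLE_2.items():
    print(arm, round(pooled_mortality(groups), 1), round(compliance_gap(groups), 1))
```

The gap between poor and good compliers is large even in the placebo arm, where the pill itself can have no effect, which is why comparing only the compliant patients says more about who complies than about the drug.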
Dealing with noncompliance
Larger sample sizes are required to
compensate for the dilution effect of
noncompliance
10% noncompliance requires 23%
increase in sample size
20% noncompliance requires a 56%
increase in sample size
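The two figures above are consistent with a common approximation in which a noncompliance rate p shrinks the observable treatment effect by a factor (1 - p), so the required sample size grows by 1/(1 - p)^2. The slides do not state the formula behind the numbers, so the function below is an assumed reconstruction, not the source's method.

```python
def sample_size_inflation(noncompliance_rate):
    """Fractional increase in sample size needed to offset noncompliance.

    Assumes the treatment effect is diluted by (1 - noncompliance_rate) and
    that required sample size scales with 1 / effect**2.
    """
    if not 0.0 <= noncompliance_rate < 1.0:
        raise ValueError("noncompliance_rate must be in [0, 1)")
    return 1.0 / (1.0 - noncompliance_rate) ** 2 - 1.0


for rate in (0.10, 0.20):
    print(f"{rate:.0%} noncompliance -> {sample_size_inflation(rate):.0%} more patients")
```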
Statistical Issue 2:
Surrogate outcome measures
Outcome measures of primary
question must be:
Clinically relevant
Sensitive to intervention
Ascertainable in all patients
Resistant to bias
Result: Large, time-consuming,
costly studies
Alternative approach: surrogate
outcome measures
Surrogate outcome measure:
Assumption
If the intervention will modify surrogate
outcome, it will modify the primary
clinical outcome
Surrogate outcome measure:
Requirements
1. Surrogate outcome must be predictive of
clinical outcome
2. Surrogate outcome must fully capture
the total effect of the intervention on the
clinical outcome
“Necessary and sufficient”
Surrogate outcome measures:
Difficult to obtain and validate
Intervention may
modify the surrogate
and have no or only
partial effect on the
clinical outcome
Intervention may
modify the clinical
outcome without
affecting the
surrogate
(Note: NOT surprisingly, track record for use
of surrogate outcome measures is very bad)
Surrogate outcome measures:
Example:
Cardiac Arrhythmia Suppression Trial (CAST)
Three drugs tested for suppression of cardiac
arrhythmias
All three drugs had been shown to suppress premature
cardiac ventricular contractions (surrogate)
Two drugs terminated early (10-15% into study)
because both drugs dramatically increased cause-specific
sudden death and total mortality
Table 3 Cardiac Arrhythmia Suppression Trial
Early termination in two drug arms
                  Drugs   Placebo
Sudden death      33      9
Total mortality   56      22
Clearly the interventions (drugs) had differential effects on the surrogate measure
(premature cardiac ventricular contractions) and the clinical outcome (mortality)
Statistical Issue 3:
Subgroup analyses
Clinical trials usually try to include
as many (diverse) patients as
possible for multiple reasons:
Large sample size
Reasonable recruitment time
Assess internal consistency of results
Seemingly logical use of the large
data set is to do many post hoc
analyses on subgroups
Subgroup analysis:
Mathematical problems
Introduction of subgroups increases
probability of false positives
5 subgroups yields greater than 20%
chance of at least one (p=0.05)
statistically significant difference BY
CHANCE
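The "greater than 20%" figure follows from treating the subgroup tests as independent, each with a 5% false-positive rate. Independence is an assumption made here for illustration (real subgroups often overlap or correlate), and the function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    is significant at level alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

With five subgroups the chance is about 22.6%, and it keeps climbing as more subgroups are examined.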
Subgroup analysis:
MERIT trial
Beta-blocker (metoprolol) treatment for patients
with congestive heart failure
Showed a 34% reduction in mortality overall
Subgroup analysis:
MERIT trial
Consistency of mortality
results across lots of
subgroups found with
subgroup analysis:
Subgroup analysis:
MERIT trial
In the USA, total
mortality is not reduced,
yet total mortality plus
any hospitalization is…?
Subgroup analysis:
MERIT trial
Two other similar heart failure trials
evaluating other beta-blockers
showed no regional difference;
THUS, it is likely that the MERIT
finding is due to chance alone.
Subgroup analysis:
PRAISE-I and PRAISE-II trials
PRAISE-I performed to evaluate
amlodipine for the treatment of
congestive heart failure
Subgroups:
Ischemia
Nonischemia
Analysis of subgroups separately showed
a significant (p<0.001) effect of
amlodipine on heart failure in
nonischemic patients, but no effect on
ischemic patients
Researchers decided to perform PRAISE-II
trial on nonischemic patients only
Subgroup analysis:
PRAISE-I and PRAISE-II trials
PRAISE-II showed remarkably
similar mortality results in the drug
and placebo groups
PRAISE-II directly opposed the
exciting results of PRAISE-I’s
subgroup analysis
Statistical Issue 4:
Missing data
Missing data is often simply
“dropped”
This violates two rules:
1. Intention-to-treat rule: all patients
must be accounted for in primary
outcome analysis
2. Common sense rule: if patient is too
sick to complete trial, this may be
informative!
Missing data
In “time to event” trials (like
mortality), data can be missing
because the study ends before the
event happens
Patients are then “censored” (dropped)
This can introduce serious
mathematical bias
(Mortality studies in USA have no excuse: death
indices allow follow-up without help from patient)
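A small sketch of why discarding incompletely followed patients can bias a time-to-event result. It compares a Kaplan-Meier estimate that keeps censored patients in the risk set until their last follow-up with the same estimate after those patients are deleted. The Kaplan-Meier estimator is not named in the slides and is assumed here as the standard approach; the patient data are invented purely for illustration.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival estimate at each observed event time.

    observations: list of (time, died) pairs. died=False means the patient
    was censored, i.e. still alive when follow-up stopped at that time.
    """
    survival = 1.0
    curve = {}
    for t in sorted({time for time, died in observations if died}):
        # Everyone still under observation at time t is at risk.
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, died in observations if time == t and died)
        survival *= 1.0 - deaths / at_risk
        curve[t] = survival
    return curve


# Hypothetical follow-up data (months, died?), invented for illustration only.
patients = [(2, True), (3, False), (4, True), (5, False), (6, True)]

with_censoring = kaplan_meier(patients)
censored_dropped = kaplan_meier([p for p in patients if p[1]])

print("kept in risk set:", with_censoring)
print("dropped entirely:", censored_dropped)
```

In this invented data set, deleting the censored patients lowers the estimated survival at 4 months, because the time they were known to be alive is thrown away.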
Statistical Issue 5:
Noninferiority trials
“New intervention is not worse than the
standard”
New intervention may be:
Easier to administer
Better tolerated
Less toxic
Less expensive
Any given study may be a superiority and/or
noninferiority trial, depending on results
Noninferiority trials
Three challenges must be met:
1. Noninferiority trial must be of highest
quality to detect clinically meaningful
differences
2. Noninferiority trial must have a
strong, effective control intervention
(state-of-the-art care)
3. Margin of indifference is arbitrary,
depending on medical importance of
treatment and risk-to-benefit tradeoffs
Noninferiority trials:
OPTIMAAL Trial
Losartan (angiotensin II receptor
blocker) vs captopril (ACE inhibitor)
in heart failure patient population
Losartan has fewer (and less severe)
side effects than captopril
OPTIMAAL
Designed to detect 20% reduction in
relative risk, with 95% power
Margin of indifference set at 1.1
Thus 95% confidence interval needed to
exclude risk of 1.1 to declare losartan
“noninferior” to captopril
Noninferiority trials:
OPTIMAAL Trial
Mortality results for OPTIMAAL
Relative risk of 1.126 with upper 95%
confidence limit of 1.28
NEITHER superiority nor noninferiority
was achieved
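The decision rule can be sketched as a comparison of the upper confidence limit against 1.0 and against the margin of indifference. This assumes the relative risk is expressed as new treatment versus standard (values above 1 favor the standard) and reads the reported 1.28 as the upper end of the 95% confidence interval. The function name and the two extra example values are hypothetical.

```python
def classify_trial(upper_confidence_limit, margin):
    """Classify a trial from the upper 95% confidence limit of the relative risk.

    Relative risk is assumed to be new treatment vs. standard, so values
    below 1 favor the new treatment.
    """
    if upper_confidence_limit < 1.0:
        return "superior"
    if upper_confidence_limit < margin:
        return "noninferior"
    return "inconclusive"


# OPTIMAAL mortality: upper limit 1.28 against a margin of 1.1.
print(classify_trial(1.28, margin=1.1))

# Hypothetical upper limits, for contrast only.
print(classify_trial(1.05, margin=1.1))
print(classify_trial(0.95, margin=1.1))
```

Because 1.28 lies above the 1.1 margin, the interval does not exclude a clinically important excess risk, so neither claim can be made.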
Researchers computed that captopril
had (historical data) a relative risk of
0.806 vs. placebo, and thus calculated
that losartan must therefore have a
relative risk of 0.906 vs. placebo…
The statistically appropriate conclusion at
this point is:
NO ACCEPTABLE CONCLUSIONS POSSIBLE
FROM THIS DATA
CONCLUSIONS
Statistics cannot make up for bad
design
Statistics cannot make up for poor
execution of design
Statistics is very limited in being
able to compensate for:
Ineligible patients being enrolled
Noncompliance
Unreliable outcome measures
Missing data
Underpowered trials