THE RELIABILITY AND UNRELIABILITY
OF SUB-GROUP ANALYSES
JEFFREY L. PROBSTFIELD, MD, FACP, FACC, FAHA, FESC, FSCT
Professor of Medicine (Cardiology)
University of Washington
Research grants- Abbott, Boehringer Ingelheim, King,
Sanofi-Aventis Pharmaceuticals, NHLBI, NCI;
Consultantship-King Pharmaceuticals;
no stocks, options or BOD positions
DILEMMA
“The response of the average patient to
therapy is not necessarily the response of
the patient being treated.”
Bernard, 1865
SUBGROUP ANALYSES
DRIVEN BY:
“Should all patients be given XYZ
before, during, or after ABC or
can/should treatment be limited to
a select group?”
SUBGROUPS- PETO
• Only one thing is worse than doing
subgroup analyses---believing the
results
PETO: HOW TO SPOIL A
GOOD TRIAL RESULT
1. Undertake many data-dependent
subgroup analyses.
2. Find some subgroups where treatment
has no significant effect (or even,
perhaps, no apparent effect
whatsoever).
3. Publish the findings in such a way that
many readers believe them.
CARE RESULTS - NO CHD RISK
REDUCTION BELOW LDL-C 125 mg/dL
“…Although our finding cannot be
considered definitive and requires
confirmation, it suggests that an LDL
cholesterol level of 125 mg per deciliter
may be an approximate lower boundary
for a clinically important influence of the
LDL cholesterol level on coronary heart
disease…”
NEJM 1996;335:1001-1009
CHOLESTEROL LEVELS AND
CHD RISK REDUCTION
Cholesterol and Recurrent Events (CARE)
4159 Participants
Plasma      Participants With Events      % Risk
LDL-C       Placebo      Pravastatin      Reduction
≤ 137       269          210              23 (8 to 36)
> 137       280          220              24 (10 to 36)
NEJM 1996;335:1001-1009
CHOLESTEROL LEVELS AND
CHD RISK REDUCTION
Cholesterol and Recurrent Events (CARE)
4159 Participants
Plasma       Participants with Event      % Risk
LDL-C        Placebo      Pravastatin     Reduction
< 125        93           89              -3 (-38 to 23)
125-150      311          239             26 (13 to 38)
> 150-175    145          102             35 (17 to 50)
NEJM 1996;335:1001-1009
CARE SUBGROUP ANALYSIS
RISK REDUCTION BELOW
BASELINE LDL-C mg/dL
Concerns about divisions described
Plasma LDL-C    Participants N    %CHD Risk Reduction    95% CI
> x(20%)        850               N/A                    N/A
> 150mg/dL      953               35                     17 to 50
125 - 150       2355              26                     13 to 38
< 137.5         2090              23                     8 to 36
< 130           1386              15                     N/A
< 127           1034              10                     N/A
< 125(20%)      851               -3                     -38 to 23
HPS
LDL-C        N       RRR CHD
<3 (<116)    6793    33%
>3<3.5       5063    25%
>3.5         8680    42%
LDL-C        N       Δ LDL-C    E      RRR CHD
<100         3421    -35        69     22%
100-130      7068    -37        86     28%
>130         9927    -39        104    24%
No interaction with Vitamin Cocktail
PROPER SUBGROUP
“A common set
of baseline parameters”
IMPROPER SUBGROUP:
“Characterized by a variable
measured after randomization”
Hypokalemia Associated With Diuretic
Use and CV Events in SHEP
Franse, et al. Hypertension. 2000;35:1025-1030
[Figure: CHD Event Rate by Year 1 K+ Strata. Curves for Y1 K+ < 3.5, Y1 K+ = 3.5-5.4, and Y1 K+ > 5.4. Y-axis: Cumulative CHD Event Rate (0.00 to 0.15). X-axis: Years to CHD (0 to 5).]
HR (95% CI) shown on the plot for the Hyper/Normo-K+ and Hypo/Normo-K+ comparisons: 1.28 (0.69, 2.40) and 1.03 (0.82, 1.30)
EXAMPLES OF IMPROPER
SUBGROUPS:
1. Responders vs. nonresponders
2. Adherers vs Non-adherers
FIVE-YEAR MORTALITY ACCORDING TO
BASE-LINE CHOLESTEROL AND CHANGE
FROM BASE-LINE, ADJUSTED
FOR 40 BASE-LINE CHARACTERISTICS
Baseline                            Treatment Group: Clofibrate    Treatment Group: Placebo
Cholesterol     Cholesterol
MG/DL           Change              No. of Pts.    % mortality     No. of Pts.    % mortality
< 250           All men             507            20.0 ± 1.8      1319           19.9 ± 1.1
> 250           All men             490            17.5 ± 1.7      1216           20.6 ± 1.2
All men         Fall                680            17.2 ± 1.4      1376           20.7 ± 1.1
All men         Rise                317            22.2 ± 2.3      1159           19.7 ± 1.2
< 250           Fall                295            16.0 ± 2.1      614            21.2 ± 1.6
< 250           Rise                212            25.5 ± 3.0      705            18.7 ± 1.5
> 250           Fall                385            18.1 ± 2.0      762            20.2 ± 1.5
> 250           Rise                105            15.5 ± 3.5      454            21.3 ± 1.9
FIVE-YEAR MORTALITY: PATIENTS GIVEN
CLOFIBRATE OR PLACEBO, ACCORDING TO
CUMULATIVE ADHERENCE TO PROTOCOL
PRESCRIPTION
                     Treatment Group: Clofibrate        Treatment Group: Placebo
Adherence            No. of Pts.    % mortality          No. of Pts.    % mortality
< 80%                357            24.6 ± 2.3 (22.5)    882            28.2 ± 1.5 (25.8)
> 80%                708            15.0 ± 1.3 (15.7)    1813           15.1 ± 0.8 (16.4)
Total study group    1065           18.2 ± 1.2 (18.0)    2695           19.4 ± 0.8 (19.5)
STRUCTURED HYPOTHESES
• Carefully state hypothesis
• Allow analyses to capture the effect
INTERACTION
(Differential Subgroup Effect)
“A treatment effect that
differs by subgroup.”
QUANTITATIVE INTERACTION
Different amount (quantity) of
benefit in various subgroups.
QUALITATIVE INTERACTION
True Benefit in some subgroups
and True Harm in others
(1 of over 700)
QUALITATIVE DIFFERENCES
-WHY NOT?
• Extremes excluded
• Lack of replication in other studies
BIASES AND ERRORS IN
DETERMINING SUBGROUP
EFFECTS
1. Subgroups lack statistical power
2. Random variation - widely divergent
estimates of treatment benefit
3. Statistical multiplicity
4. Post-hoc analyses - extreme results the
product of random errors
5. Replication - a posteriori vs. a priori
FURBERG AND BYINGTON
CIRCULATION 1983;67:I98-I101
• 146 Subgroups in BHAT: Few defined a priori
• Distribution of subgroup results - Gaussian
• Impact of change on data set inversely related
to sample size. (Participants or deaths gives
similar distribution)
3 CRITERIA FOR CONFIDENCE
IN SUBGROUP FINDING
• Dose response relationship
• Independent findings within the study
• Replication by outside trial
EXPECTED EFFECTS OF TRIAL
SIZE ON TRIAL RESULTS
Columns: (1) Total no. of deaths in trial (treated+control); (2) Approx. no. of patients randomized if risk 10%; (3) Approx. probability of failing to achieve 1P<0.01 significance if true risk reduction is 1/4; (4) Comments that might be made of size before trial begins
(1) Deaths    (2) Patients    (3) Probability    (4) Comment
0-50          (under 500)     Over 0.9           Utterly inadequate
50-150        (1000)          0.7-0.9            Probably inadequate
150-350       (3000)          0.3-0.7            Probably adequate, possibly not
350-650       (6000)          0.1-0.3            Possibly adequate
Over 650      (10,000)        Under 0.1          Definitely Adequate
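The failure probabilities in the table above can be roughly reproduced with a normal approximation. The sketch below is not necessarily the calculation behind the slide. It assumes 1:1 allocation, a low event rate, an observed-minus-expected statistic whose variance is about D/4 for D total deaths, and a one-sided 0.01 threshold. The function name is made up for illustration.

```python
from statistics import NormalDist


def prob_fail_to_reach_significance(total_deaths, risk_reduction=0.25,
                                    one_sided_alpha=0.01):
    """Approximate chance that a trial misses one-sided significance.

    Assumptions (ours, not the slide's): 1:1 allocation, rare events, and an
    observed-minus-expected statistic with variance of about total_deaths / 4.
    """
    if total_deaths <= 0:
        raise ValueError("total_deaths must be positive")
    std_normal = NormalDist()
    # Expected share of all deaths that fall in the treated arm.
    treated_share = (1.0 - risk_reduction) / (2.0 - risk_reduction)
    # Expected standardized distance from the null share of one half.
    expected_z = (0.5 - treated_share) * total_deaths / (total_deaths / 4.0) ** 0.5
    critical_z = std_normal.inv_cdf(1.0 - one_sided_alpha)
    power = std_normal.cdf(expected_z - critical_z)
    return 1.0 - power


if __name__ == "__main__":
    for deaths in (50, 150, 350, 650):
        print(deaths, round(prob_fail_to_reach_significance(deaths), 2))
```

Under these assumptions the values at 50, 150, 350 and 650 deaths should fall in or at the edges of the bands listed in the table, which suggests the table reflects a calculation of this general kind.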
Actual effects of trial size on trial results. Relationship between
the total number of deaths in the two treatment groups and the
result actually attained, in the 24 trials of a treatment (long-term
beta-blockade) that reduces the odds of death by about 22 ± 4%
Columns: (1) Total no. of deaths in trial (β-bl.+plac.); (2) Mean no. of patients randomized; (3) Statistical power; then No. of trials resulting in: (4) P<0.05 against; (5) Non-sigt. against; (6) Non-sigt. favorable; (7) P<0.05 favorable
(1) Deaths    (2) Patients    (3) Power                                             (4)    (5)    (6)    (7)
0-50          (255)           Utterly Inadequate                                    0      5      5      0
50-150        (861)           Probably Inadequate                                   0      1      9      1
150-350       (2925)          Possibly adequate, probably not                       0      0      1      2
350-650       (No such β bl. trials exist)    Probably Adequate                     -      -      -      -
Over 650      (No such β bl. trials exist)    Definitely Adequate                   -      -      -      -
TOTAL         (866)           Inadequate separately, adequate only in aggregate     0      6      15     3
SUBGROUP EFFECT
Treatment effect in a specific proper subgroup.
Must be significantly different from
overall effect!!
HYPOTHETICAL SUBGROUP EFFECTS
ILLUSTRATING THE “PLAY OF CHANCE” IN A
TRIAL THAT SHOWS CLEAR OVERALL BENEFIT
                  Treated,          Control,          Risk
                  events/N (%)      events/N (%)      Decrease (%)    p Value
Overall result    240/3,000 (8)     300/3,000 (10)    20              < 0.01
Subgroup A        80/1,000 (8)      100/1,000 (10)    20              NS
Subgroup B        70/1,000 (7)      110/1,000 (11)    36              < 0.001
Subgroup C        90/1,000 (9)      90/1,000 (9)      0               NS
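One way to check whether subgroups A, B and C really differ from one another, rather than reading each subgroup's own p value, is a heterogeneity (interaction) test. The sketch below uses an inverse-variance test on log relative risks. That choice of test is ours, not the slide's, and it assumes the first events/N column is the treated arm. The helper names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def heterogeneity_test(subgroups):
    """Test that the log relative risk is the same in every subgroup.

    `subgroups` holds (treated_events, treated_n, control_events, control_n).
    Returns (Q statistic, degrees of freedom, p value).
    """
    log_rrs, weights = [], []
    for te, tn, ce, cn in subgroups:
        log_rrs.append(math.log((te / tn) / (ce / cn)))
        variance = 1 / te - 1 / tn + 1 / ce - 1 / cn  # variance of log RR
        weights.append(1 / variance)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(subgroups) - 1
    return q, df, chi2_sf(q, df)


# Subgroups A, B, C from the hypothetical table.
TABLE = [(80, 1000, 100, 1000), (70, 1000, 110, 1000), (90, 1000, 90, 1000)]

if __name__ == "__main__":
    q, df, p = heterogeneity_test(TABLE)
    print(f"Q = {q:.2f}, df = {df}, p = {p:.3f}")
```

With these counts the heterogeneity p value comes out above 0.05, so the apparent spread from 0% to 36% risk decrease is compatible with chance variation around the overall 20%.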
MULTIPLE COMPARISONS
Example:
• 1,000 participants; Mortality Rate = 10%
• Treat A vs. Treat B: Treatments equally effective
• 10 subgroups (equal size) randomly formed
Relative Risk      Probability
Reduction to:      (percent)
.5                 99
.33                80
.1                 5
• Nominal p value - inappropriate
• Conservative approach - p/Sn
• Especially important in trial where main outcome is
not statistically significant
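The example above can be explored with a small simulation. This is a sketch under stated assumptions, not a reconstruction of how the slide's probabilities were obtained: both arms are the same size within each subgroup, and the "relative risk" of a subgroup is taken as the ratio of the smaller to the larger death count, in whichever direction looks favourable. The function name is hypothetical.

```python
import random


def chance_of_extreme_subgroup(n_sims=1000, n_participants=1000, mortality=0.10,
                               n_subgroups=10, thresholds=(0.5, 0.33, 0.1), seed=1):
    """Estimate how often at least one random subgroup shows an extreme ratio.

    Both arms share the same true mortality, so every apparent effect is chance.
    """
    rng = random.Random(seed)
    per_arm = n_participants // (2 * n_subgroups)  # participants per arm per subgroup
    hits = {t: 0 for t in thresholds}
    for _ in range(n_sims):
        smallest_ratio = 1.0
        for _ in range(n_subgroups):
            deaths_a = sum(rng.random() < mortality for _ in range(per_arm))
            deaths_b = sum(rng.random() < mortality for _ in range(per_arm))
            high = max(deaths_a, deaths_b)
            if high == 0:
                continue  # no deaths in either arm: ratio undefined, skip
            smallest_ratio = min(smallest_ratio, min(deaths_a, deaths_b) / high)
        for t in thresholds:
            if smallest_ratio <= t:
                hits[t] += 1
    return {t: hits[t] / n_sims for t in thresholds}


if __name__ == "__main__":
    for threshold, share in chance_of_extreme_subgroup().items():
        print(f"ratio <= {threshold}: {share:.0%} of simulated trials")
```

The exact percentages depend on the definitions chosen here, so they need not match the slide's figures, but the pattern is the same: with ten small subgroups, a halving of risk in at least one of them is the rule rather than the exception.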
Subgroup Results
Mortality and Morbidity
[Forest plot of relative risk by subgroup. X-axis: 0.50, 0.75, 1.0, 1.25; "Valsartan better" to the left of 1.0, "Valsartan worse" to the right. Overall: .87 (.77 to .97).]
Subgroup              % pts
Overall               100
Beta blocker (yes)    35
no beta blocker       65
ACEI (yes)            93
no ACEI               7
Plot annotations (RR of death, P): 41%, P<0.05*; ACEI + Beta blocker: 42%, P=0.009; no ACEI* (*n=366): 44.0%, P=.0002
Cohn. N Engl J Med. 2001; *FDA analysis/package insert
CV Death or Hospitalization for CHF
                         Candesartan    Placebo      Test for
                         event/n        event/n      interaction
Diabetes          No     680/2715       815/2721     P=0.09
                  Yes    470/1088       495/1075
Hypertension      No     484/1710       579/1703     P=0.17
                  Yes    666/2093       731/2093
ACEIs             No     586/2230       688/2244     P=0.51
                  Yes    564/1573       622/1552
Beta blocker      No     611/1701       710/1695     P=0.32
                  Yes    539/2102       600/2101
Spironolactone    No     880/3160       1041/3167    P=0.19
                  Yes    270/643        269/629
Overall                  1150/3803      1310/3796
[Forest plot. X-axis: Hazard ratio, 0.6 to 1.4; "candesartan better" to the left of 1.0, "placebo better" to the right.]
EXAMPLE OF “SUBGROUPING” IN
INTERNATIONAL STUDY OF
INFARCT SURVIVAL-2 (ISIS-2):
ASTROLOGY AND ASPIRIN
Vascular Mortality at Week 5
                                         Aspirin (%)            Placebo (%)            Odds Decrease (% ± SD)
Patients born under Libra and Gemini     150/1,357 (11.1)       147/1,442 (10.2)       8% adverse (NS)
Patients born under other “birth signs”  654/7,228 (9.0)        868/7,157 (12.1)       26% ± 5 (p<0.0001)
Overall results                          804/8,587 (9.4)        1,016/8,600 (11.8)     23% ± 4 (p<0.0001)
ORDERED SUBGROUPS
• Strong biological rationale
• Reflects natural ordering
• Correct for multiplicity
• Only indicate those as significant
which have a p value less than p/Sn
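The p/Sn rule can be sketched as a simple threshold check. The slide does not define Sn; the sketch assumes it denotes the number of subgroup comparisons, which makes the rule a Bonferroni-style correction. The function name is hypothetical, and the p values are the interaction tests quoted on the candesartan slide earlier in the deck.

```python
def significant_after_correction(p_values, alpha=0.05):
    """Flag p values that fall below alpha divided by the number of comparisons.

    Assumes "Sn" on the slide means the number of comparisons made.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    threshold = alpha / len(p_values)
    flags = {name: p < threshold for name, p in p_values.items()}
    return threshold, flags


# Interaction p values from the candesartan subgroup slide.
INTERACTION_P = {
    "Diabetes": 0.09,
    "Hypertension": 0.17,
    "ACEIs": 0.51,
    "Beta blocker": 0.32,
    "Spironolactone": 0.19,
}

if __name__ == "__main__":
    threshold, flags = significant_after_correction(INTERACTION_P)
    print("threshold:", threshold)
    for name, is_significant in flags.items():
        print(name, "significant" if is_significant else "not significant")
```

With five comparisons the threshold drops from 0.05 to 0.01, and none of the five interaction tests passes it.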
Reduction of Stroke
According to Sex
                      Stroke Rates
           N          Active Rx    Placebo    % Difference
Males      2046       5.5%         8.7%       -38%
Females    2690       4.7%         7.3%       -31%
GISSI-1
• Overall result - streptokinase treatment,
20% reduction in total mortality
• Benefit confined to:
– Anterior MI
– Age ≤ 65 years
– Treatment ≤ 6 hours
• Subsequent trials and pooled results do
not confirm
HYPOTHETICAL EXAMPLE OF
ORDERED SUBGROUPS: RELATIVE
RISK AS A FUNCTION OF TIME OF
THROMBOLYTIC THERAPY
Hours         14-d Mortality (%)       Relative
After Pain    Treated    Control       Risk        P
≤2            10         20            .50         .01
>2–4          11         16            .70         NS
>4–8          17         22            .77         NS
>8–12         20         25            .80         NS
>12           23         24            .95         NS
Overall       14         21            .67         .001
SUBGROUPS
DEFINED A-PRIORI
• Suggestive differential subgroup effect
• State in design of new trial
• Publish (multiplicity, design analysis, plan
over-sampling)
• Test within an existing data set
STROKE SUBGROUP
HYPOTHESIS
On BP Meds at ICV                    Off BP Meds at ICV
35% of participants                  65% of participants
Net reduction in stroke rate =10%    Net reduction in stroke rate =40%
80% power to detect 30% treatment difference
STROKE EVENTS BY
MEDICATION STATUS
GROUP      N       NFS    FS    CS
Not on Medications
Active     1584    64     5     67
Placebo    1577    88     11    96
χ² = 6.72   Relative Risk (active/placebo) = 0.69
P = .0096   95% CI = 0.51-0.95
On Medications
Active     781     32     5     36
Placebo    794     61     3     63
χ² = 5.11   Relative Risk (active/placebo) = 0.57
P = .0237   95% CI = 0.38-0.85
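The stratum-specific relative risks can be recomputed from the counts, and the two strata compared directly through the ratio of their relative risks. This is a sketch with several assumptions: it takes the CS column as the stroke count, uses crude risks with a log-scale Wald interval, and ignores follow-up time, so small differences from the slide's figures are expected if those came from a time-to-event analysis. Function names are hypothetical.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Crude relative risk (a vs. b) with a log-scale Wald interval.

    Returns (rr, lower, upper, standard error of log rr).
    """
    rr = (events_a / n_a) / (events_b / n_b)
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se), se


def ratio_of_relative_risks(stratum_1, stratum_2, z=1.96):
    """Ratio of two stratum-specific relative risks with its interval.

    An interval that includes 1 means the strata do not differ detectably.
    """
    rr1, _, _, se1 = relative_risk(*stratum_1)
    rr2, _, _, se2 = relative_risk(*stratum_2)
    ratio = rr1 / rr2
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)


# (active events, active N, placebo events, placebo N), using the CS column.
NOT_ON_MEDS = (67, 1584, 96, 1577)
ON_MEDS = (36, 781, 63, 794)

if __name__ == "__main__":
    for label, stratum in (("Not on meds", NOT_ON_MEDS), ("On meds", ON_MEDS)):
        rr, low, high, _ = relative_risk(*stratum)
        print(f"{label}: RR {rr:.2f} ({low:.2f} to {high:.2f})")
    ratio, low, high = ratio_of_relative_risks(ON_MEDS, NOT_ON_MEDS)
    print(f"Ratio of RRs: {ratio:.2f} ({low:.2f} to {high:.2f})")
```

Both strata show benefit on their own, and the interval for the ratio of relative risks spans 1, so these counts give no clear evidence that the effect differs by medication status.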
MRFIT RESEARCH GROUP, AM J
CARDIOL 1985;55:1-15
ECG ABNORMALITIES AT BASELINE
                   Present                                      Absent
                   Person    No. of    Per 1000                 Person    No. of    Per 1000
                   Yrs.      Deaths    Person Yrs.              Yrs.      Deaths    Person Yrs.
Not on Diuretic    6074      9         1.48                     17,356    35        2.02
On Diuretic        6433      38        5.91                     14,399    33        2.29
SUBGROUP HYPOTHESIS
Will the treatment of ISH reduce the
frequency of major coronary events
more in those free of baseline ECG
abnormalities than in those with such
abnormalities?
OTHER COMBINED EVENTS
BY TREATMENT GROUP
Event                    Active    Placebo    Rel. Risk    95% CI
Nonfatal MI/CHD Death    104       141        0.73         0.57-0.94
Stroke/MI/CHD/Death      199       289        0.67         0.56-0.81
CHD                      140       184        0.75         0.60-0.94
CVD                      289       414        0.68         0.58-0.79
NONFATAL MI & CHD DEATH BY
BASELINE ECG ABNORMALITIES
Treatment    Number    Events    Rate per 100 (SE)
With baseline ECG abnormalities
Active       1429      67        6.0 (0.8)
Placebo      1426      96        8.0 (0.9)
χ² = 5.73   Rel. Risk = 0.69
P = 0.02    95% CI = 0.50-0.94
Without baseline ECG abnormalities
Active       903       35        4.5 (0.8)
Placebo      922       43        4.6 (0.7)
χ² = 0.70   Rel. Risk = 0.83
P = 0.40    95% CI = 0.53-1.29
SUDDEN DEATH BY BASELINE
ECG ABNORMALITIES
Treatment    Number    Events    Rate per 100 (SE)
With baseline ECG abnormalities
Active       1429      15        1.2 (0.3)
Placebo      1426      17        1.3 (0.4)
χ² = 0.14   Rel. Risk = 0.88
P = 0.71    95% CI = 0.44-1.75
Without baseline ECG abnormalities
Active       903       8         1.2 (0.4)
Placebo      922       5         0.6 (0.3)
χ² = 0.77   Rel. Risk = 1.64
P = 0.38    95% CI = 0.54-5.01
SUBGROUPS DEFINED
A-POSTERIORI
• “Grist” for formulating hypothesis
• Watch for alternative definitions!
• Should be clearly stated and reported
as an estimate of effect with appropriate
confidence interval
SUBGROUPS AND
MONITORING TRIALS
• Use statistically sound monitoring method
• Interference with main trial endpoint - rare
• Formulate hypothesis and test prospectively
• Terminate subgroup
• “Mega trials” - special problems
BUCHWALD AND COLLEAGUES:
POSCH, NEJM 1990;323:946-955
• Cholesterol lowering by ileal bypass for
secondary prevention of CHD
• 838 participants randomized
• Primary outcome-Fatal and non-fatal CHD
– 62 vs 49, p=0.164
• Subgroup, EF<50% vs >50%,
– Final 24 vs 39, p=0.021
– post-hoc analysis 6 vs 17
– after observed phenomenon - 18 vs 22
ESSENTIALS IN
SUBGROUP REPORTING
• PRESPECIFICATION OF SUBGROUPS
• NUMBER OF SUBGROUPS
• NUMBER OF SUBGROUP OUTCOMES
• STATISTICAL METHODS (INTERACTION)
• NUMBER OF SIGNIFICANT SUBGROUPS FOUND
• EMPHASIS OF SUBGROUP VS PRIMARY
OUTCOME
SUGGESTIONS TO APPROPRIATELY PERFORM
AND INTERPRET SUBGROUP ANALYSIS
Hernandez AV, et al. Am Heart J. 2006;151:257-64.
DICTUM 1
“The treatment effect in all subgroups of
patients without obvious contraindications
to treatment is likely to be qualitatively (in
the same direction) similar.”
DICTUM 2
“The treatment effect in all subgroups of
patients without obvious contraindications
to treatment is likely to be quantitatively
(difference in degree) dissimilar even
when effects appear to be identical.”
DICTUM 3
“Estimates of treatment effect within a
subgroup chosen for special emphasis
are usually ‘biased’ and so the most
appropriate estimate in a subgroup is
closer to the overall result.”
KEY POINTS IN SUBGROUP
ANALYSES & INTERPRETATION I
Design
1. State clearly a few important and plausible subgroup
hypotheses in advance. Include the direction of the expected
effect.
2. Rank the subgroup hypotheses in order of plausibility.
3. Calculate power to detect key subgroup effects. If it is
inadequate, consider building adequate power to detect key
subgroup effects.
4. State whether the trial will be continued even after the
overall results are convincing but the subgroup effects are
not significant. Decide whether the primary method of
monitoring will focus on the subgroups or on the overall trial.
5. State primary analytical methods in advance.
KEY POINTS IN SUBGROUP
ANALYSES AND INTERPRETATION II
Monitoring a trial
1. Rigorous evidence of benefit or harm in
subgroups postulated a priori: consider
selective discontinuation of that subgroup.
2. Evidence of benefit or harm in
unexpected subgroups: postulate a
hypothesis to be tested in the remaining
part of the study.
KEY POINTS IN SUBGROUP
ANALYSES & INTERPRETATION III
Analyses and interpretation
1. Use statistical methods that capture the framework of the prior
hypotheses.
2. Place greater emphasis on the overall results than on what may be
apparent within a particular subgroup.
3. Distinguish between prior and data-derived hypotheses. Do not
calculate p values for data-derived hypotheses because such p
values usually bear little resemblance to what would occur if the
hypothesis were tested independently in another study.
4. Use tests of “interactions” and/or correct for multiplicity of
statistical comparisons. (“Nominal” p values are usually misleading.)
5. Interpret the results in the context of similar data from other trials,
from the architecture of the entire set of data on all patients, and from
principles of biological coherence.
KEY POINTS IN SUBGROUP
ANALYSES AND INTERPRETATION IV
Improper subgroups and analyses
1. Avoid analyses of subgroups based on post-randomization response, adherence, etc.
2. Avoid emphasizing nominal p values.
3. Do not emphasize data-derived analyses or
analyses based on post-hoc definitions of
subgroups.