Biostatistics Case Studies 2006
Session 5:
Reporting Subgroup Results
Peter D. Christenson
Biostatistician
http://gcrc.LAbiomed.org/Biostat
Subgroup Issues
• Measuring subgroup effect
• Subgroups separately
• Interaction
• Selection of subgroups
• A priori
• Post-hoc
• Based on data
• Significance/strength of conclusions
• Transparency of analysis
• Formal statistical comparisons; p-values, CIs.
Case Study
Editorial: pp. 1667-69
Case Study: Abstract
Main Subgroup Result
Separate Subgroup Comparisons
% with events; Δ = Aspirin Only - Combination (percentage points); RR = Combination vs. Aspirin Only

Subgroup       N      Combination  Aspirin Only  Δ      p     RR (95% CI)
Symptomatic    12153  6.9          7.9           1.0    0.05  0.88 (0.77 to 1.0)
Asymptomatic   3284   6.6          5.5           -1.0   0.20  1.2 (0.91 to 1.59)
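As a rough check, Δ and RR can be recomputed from the rounded event rates on this slide. This is a minimal Python sketch, assuming 1:1 allocation within each subgroup (per-arm sizes are not shown) and a simple normal-approximation (Wald) test for the risk difference; `compare_arms` is a hypothetical helper name. Because the inputs are rounded percentages and the trial used its own analysis, the output only approximates the reported values (e.g., RR ≈ 0.87 rather than 0.88, and Δ ≈ -1.1 rather than -1.0).

```python
import math
from statistics import NormalDist

# Event rates (%) from the slide. Per-arm sizes are not shown, so this sketch
# ASSUMES 1:1 allocation within each subgroup (each arm = N / 2).
SUBGROUPS = {
    "Symptomatic": {"n": 12153, "combination": 6.9, "aspirin": 7.9},
    "Asymptomatic": {"n": 3284, "combination": 6.6, "aspirin": 5.5},
}


def compare_arms(n_total, pct_comb, pct_asp, level=0.95):
    """Hypothetical helper: risk difference (aspirin minus combination, in
    percentage points), relative risk (combination / aspirin), Wald CI for the
    difference, and two-sided p-value, using the normal approximation."""
    n_arm = n_total / 2  # assumption: equal arm sizes
    p_c, p_a = pct_comb / 100, pct_asp / 100
    delta = p_a - p_c  # positive values favour the combination
    rr = p_c / p_a
    se = math.sqrt(p_c * (1 - p_c) / n_arm + p_a * (1 - p_a) / n_arm)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (100 * (delta - z_crit * se), 100 * (delta + z_crit * se))
    p_value = 2 * (1 - NormalDist().cdf(abs(delta) / se))
    return 100 * delta, rr, ci, p_value


for name, d in SUBGROUPS.items():
    delta, rr, (lo, hi), p = compare_arms(d["n"], d["combination"], d["aspirin"])
    print(f"{name:12s} delta={delta:+.1f}  RR={rr:.2f}  "
          f"95% CI for delta: {lo:.2f} to {hi:.2f}  p={p:.3f}")
```

The Wald interval is used only because it is the simplest to show; with event counts this large it is close to more refined intervals.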
Separate Subgroup Conclusions
• Symptomatic group: Combination better
• Large N.
• Is the magnitude of the effect relevant? See the CIs.
• Asymptomatic group: Inconclusive (0.91 ≤ RR ≤ 1.59)
• Same magnitude as in symptomatics, but apparently in the opposite direction.
• Much smaller N; less power.
• A difference between subgroups has not been demonstrated.
• Use an interaction test to do so.
• Is such a test needed, given the CIs?
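“Much smaller N; less power” can be made concrete. This is a minimal sketch, assuming a hypothetical target difference of 1.0 percentage point (7.9% vs 6.9%), 1:1 allocation, and the usual normal approximation for a two-sample test of proportions; `power_two_proportions` is an illustrative name, not from the source. Under these assumptions, power is roughly 0.56 with N = 12153 but only about 0.19 with N = 3284.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for proportions
    (unpooled variance; the negligible opposite rejection tail is ignored)."""
    se = math.sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(p1 - p2) / se - z_crit)


# HYPOTHETICAL target: a 1.0 percentage-point difference (7.9% vs 6.9%), with
# each subgroup ASSUMED to be split 1:1 between the two arms.
for name, n_total in (("Symptomatic", 12153), ("Asymptomatic", 3284)):
    pw = power_two_proportions(0.079, 0.069, n_total // 2)
    print(f"{name:12s} N={n_total:5d}  approximate power={pw:.2f}")
```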
Subgroup Interaction
Symptomatic (N = 12153): Δ = 1.0, p = 0.05; RR = 0.88 (0.77 to 1.0)
vs.
Asymptomatic (N = 3284): Δ = -1.0, p = 0.20; RR = 1.2 (0.91 to 1.59)
Interaction = ΔΔ = 1.0 - (-1.0) = 2.0%, with 95% CI ~ 0.65% to 3.35%
Why Is Interaction Relevant? Next slide
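One way to get a CI for ΔΔ is to back-calculate each subgroup's standard error from its Δ and two-sided p-value (assuming normal z-tests), then add the variances, since the subgroups are independent. This is a minimal sketch of that approach; the helper names are hypothetical. With Δ = 1.0 (p = 0.05) and Δ = -1.0 (p = 0.20) it gives roughly 0.2% to 3.8%, wider than the slide's approximate 0.65% to 3.35%, though both intervals exclude zero.

```python
from statistics import NormalDist

Z = NormalDist()


def se_from_p(estimate, p_two_sided):
    """Back-calculate a standard error from an estimate and its two-sided
    p-value, assuming the p-value came from a normal (z) test: SE = |est| / z."""
    z = Z.inv_cdf(1 - p_two_sided / 2)
    return abs(estimate) / z


def interaction(delta1, p1, delta2, p2, level=0.95):
    """Difference between two independent subgroup effects (delta-delta),
    with a normal-theory CI and two-sided p-value."""
    se1, se2 = se_from_p(delta1, p1), se_from_p(delta2, p2)
    dd = delta1 - delta2
    se_dd = (se1 ** 2 + se2 ** 2) ** 0.5  # independent subgroups: variances add
    z_crit = Z.inv_cdf(0.5 + level / 2)
    p_dd = 2 * (1 - Z.cdf(abs(dd) / se_dd))
    return dd, (dd - z_crit * se_dd, dd + z_crit * se_dd), p_dd


dd, (lo, hi), p = interaction(1.0, 0.05, -1.0, 0.20)
print(f"Delta-Delta = {dd:.1f}%, 95% CI {lo:.2f}% to {hi:.2f}%, p = {p:.3f}")
```

Back-calculating from rounded p-values is crude; with the trial's raw counts, the SEs would be computed directly.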
Subgroup Conclusions with Interaction
• Symptomatic group: Combination better
• Large N.
• Is the magnitude of the effect relevant? See the CIs.
• Asymptomatic group: Inconclusive (0.91 ≤ RR ≤ 1.59)
• Same magnitude as in symptomatics, but apparently in the opposite direction.
• Much smaller N; less power.
• Difference between subgroups:
• Significant according to the interaction test.
• The inverse “non-effect” in asymptomatics nevertheless contributes to it.
Change Data to Give Non-Significant Interaction
Suppose (% with events):

Subgroup       N      Combination  Aspirin Only  Δ      p     RR (95% CI)
Symptomatic    12153  6.9          7.9           1.0    0.05  0.88 (0.77 to 1.0)
Asymptomatic   3284   6.6          6.4           -0.2   0.80  1.03 (0.40 to 1.4)

→ P for interaction ~ 0.50.
Change conclusions?
Changed Data Subgroup Conclusions
• Symptomatic group: Combination better
• Large N.
• Is the magnitude of the effect relevant? See the CIs.
• Asymptomatic group: Inconclusive (0.40 ≤ RR ≤ 1.40)
• Apparently negligible, but not proven.
• Much smaller N; less power.
• Difference between subgroups:
• Not demonstrated.
• Use the CI for ΔΔ to quantify the magnitude of the difference.
Change Data Again: Larger Asymptomatic N
Suppose (% with events; new changes marked →):

Subgroup       N               Combination  Aspirin Only  Δ      p                    RR (95% CI)
Symptomatic    12153           6.9          7.9           1.0    0.05                 0.88 (0.77 to 1.0)
Asymptomatic   3284 → 10000    6.6          6.4           -0.2   0.80 (at N = 3284)   1.03 (0.40 to 1.4 → 0.96 to 1.1)

→ P for interaction will be small.
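The same interaction question can be asked on the relative-risk scale: compare log(RR) between subgroups, with each SE recovered from its reported 95% CI (the “difference between two estimates” approach described by Altman and Bland). This is a minimal sketch applying it to the original, changed, and twice-changed asymptomatic data; the function names are hypothetical. It assumes each CI is symmetric on the log scale, which the hypothetical 0.40 to 1.4 interval is not, so treat the output as rough. Under these assumptions the interaction p-value is about 0.05, about 0.6, and about 0.04, respectively: the same qualitative pattern as on the slides, though not the exact ~0.50 quoted for the changed data.

```python
import math
from statistics import NormalDist

Z = NormalDist()
Z95 = Z.inv_cdf(0.975)


def log_se_from_ci(lower, upper):
    """SE of log(RR) implied by a 95% CI, ASSUMING the CI is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def rr_interaction(rr1, ci1, rr2, ci2):
    """Ratio of two independent subgroup relative risks, with 95% CI and two-sided p-value."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.hypot(log_se_from_ci(*ci1), log_se_from_ci(*ci2))  # variances add
    p = 2 * (1 - Z.cdf(abs(diff) / se))
    ratio_ci = (math.exp(diff - Z95 * se), math.exp(diff + Z95 * se))
    return math.exp(diff), ratio_ci, p


SYMPTOMATIC = (0.88, (0.77, 1.0))
ASYMPTOMATIC_SCENARIOS = {
    "original": (1.2, (0.91, 1.59)),
    "changed": (1.03, (0.40, 1.4)),
    "twice-changed": (1.03, (0.96, 1.1)),
}

for label, (rr2, ci2) in ASYMPTOMATIC_SCENARIOS.items():
    ratio, (lo, hi), p = rr_interaction(*SYMPTOMATIC, rr2, ci2)
    print(f"{label:14s} ratio of RRs = {ratio:.2f} (95% CI {lo:.2f} to {hi:.2f}), p = {p:.2f}")
```

A narrower asymptomatic CI shrinks the SE of the ratio, which is why the twice-changed data give a small interaction p-value even though the asymptomatic RR is unchanged.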
Twice-Changed Data Subgroup Conclusions
• Symptomatic group: Combination better
• Large N.
• Is the magnitude of the effect relevant? See the CIs.
• Asymptomatic group: Negligible (0.96 ≤ RR ≤ 1.1)
• Shown to be negligible.
• Larger N → narrower CI; power is no longer relevant.
• Difference between subgroups:
• Significant, demonstrated by the interaction test.
• Use the CI for ΔΔ to quantify the magnitude of the difference.
Many Subgroup Analyses
12 Subgroups + Overall
Formal Multiple Comparison Adjustment
• Number of comparisons: k.
• Individual comparison false positive error rate = α.
• Experiment-wise error rate = α*.
• Bonferroni adjustment:
• Assume k comparisons are independent.
• True negative rate = specificity = 1 – α.
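• Set $\alpha^* = 1 - (1 - \alpha)^k$ and solve: $\alpha = 1 - (1 - \alpha^*)^{1/k} \approx \alpha^*/k$.
• The exact solution is the Šidák adjustment; $\alpha^*/k$ is the Bonferroni adjustment, which also controls the experiment-wise error without assuming independence.
• So, typically require p < 0.05/(number of tests) = 0.05/13 ≈ 0.004 here.
• Conservative if comparisons are correlated; can be improved if the correlation is known.
• No adjustment: P(≥1 false positive) = $1 - 0.95^k$ ≈ 0.49 if k = 13. See next slide.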
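The adjustment formulas above are easy to check numerically. This is a minimal sketch; the function names are illustrative, and the experiment-wise error formula assumes independent tests.

```python
def sidak_alpha(alpha_star, k):
    """Per-comparison alpha giving experiment-wise error alpha_star for k
    independent comparisons (the exact solution, known as the Sidak adjustment)."""
    return 1 - (1 - alpha_star) ** (1 / k)


def bonferroni_alpha(alpha_star, k):
    """Bonferroni per-comparison alpha: the first-order approximation alpha*/k,
    which also holds without independence (union bound)."""
    return alpha_star / k


def familywise_error(alpha, k):
    """P(at least one false positive) among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


k = 13  # 12 subgroups + overall
print(f"Exact (Sidak) threshold: {sidak_alpha(0.05, k):.5f}")
print(f"Bonferroni threshold:    {bonferroni_alpha(0.05, k):.5f}")
print(f"P(>=1 false positive), no adjustment: {familywise_error(0.05, k):.2f}")

# How the unadjusted risk grows with the number of comparisons.
for n_tests in (1, 2, 5, 10, 13, 20, 50):
    print(f"k={n_tests:2d}  P(>=1 false positive) = {familywise_error(0.05, n_tests):.3f}")
```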
Likelihood of False Positive Conclusions
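To illustrate how the chance of a false positive grows with many comparisons, and how Bonferroni behaves when comparisons are correlated, here is a Monte Carlo sketch: simulate k = 13 two-sided z-tests with every null hypothesis true and count how often at least one is “significant”. The equicorrelated structure (correlation `rho` induced by a shared normal component) is an assumption chosen for illustration, and `simulate_fwer` is a hypothetical name. With independence, the unadjusted rate should be near 1 - 0.95^13 ≈ 0.49 and the Bonferroni rate just under 0.05; as `rho` grows, the Bonferroni rate falls further below 0.05, which is what “conservative if comparisons are correlated” means.

```python
import math
import random
from statistics import NormalDist


def simulate_fwer(k, rho, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of P(at least one two-sided z-test rejects) when all
    k null hypotheses are true and the k statistics are equicorrelated (rho)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shared_w, own_w = math.sqrt(rho), math.sqrt(1 - rho)  # keeps each variance at 1
    hits = 0
    for _ in range(reps):
        common = rng.gauss(0.0, 1.0)  # shared component induces the correlation
        if any(abs(shared_w * common + own_w * rng.gauss(0.0, 1.0)) > z_crit
               for _ in range(k)):
            hits += 1
    return hits / reps


k = 13
for rho in (0.0, 0.5, 0.9):
    unadjusted = simulate_fwer(k, rho, alpha=0.05)
    bonferroni = simulate_fwer(k, rho, alpha=0.05 / k)
    print(f"rho={rho:.1f}  no adjustment: {unadjusted:.3f}  Bonferroni: {bonferroni:.3f}")
```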
Subgroup Multiple Comparison Comments
• Many other, more specialized methods exist.
• Pre-specified comparisons count just as post-hoc ones do, provided the post-hoc ones were not chosen based on the results.
• Why limit the “experiment-wise” count to subgroup comparisons?
• Tables 1-3: 22 + 20 + 26 potential covariates, with no formal comparisons in this paper (but what if a large difference had been observed?).
• P-values in Table 4: 12 efficacy and safety comparisons.
• Figure 2: 12 subgroups, with at least one explicit test.
Subgroup Multiple Comparison Conclusions
• Examining subgroups is usually necessary.
• To claim more than descriptive observations, adjust in a well-defined way.
• Typically, report subgroup results as observational, and:
• Explain decisions and choices of subgroups.
• Formal adjustment typically not necessary.
• Avoid p-values. Emphasize CI range.
• Separate planned analyses from data-mining results.
• Number of comparisons should be explicit.
Recommendations for Reporting on Subgroups:
• See the Editorial; use it to justify the following approach to the journal.
• Do not make multiple comparison adjustment.
• Be transparent about all analyses.
• State where conclusions are based on interactions.
• Report the number of comparisons planned before looking at the data that were (1) included in and (2) not included in the paper.
• Report which results were a consequence of looking at the data; give no p-values for them.
• Report whether alternate definitions of a subgroup were examined.
• For subgroups, give confidence intervals (the range of effects compatible with the data) rather than p-values.
Recommendations: An Example of Starting to Apply Them
Cohan (2005) Crit Care 23;10:2359-66.