Biostatistics in Practice
Session 6: Data and Analyses: Too Little or Too Much
Youngju Pak, Biostatistician
http://research.LABioMed.org/Biostat
Too Little or Too Much: Data
• Too Little
  • Too few subjects: study not sufficiently powered (Session 4)
  • A biasing characteristic not measured: attributability of effects questionable (Session 5)
  • Subjects do not complete study, or do not comply, e.g., take all doses (This session)
• "Too Much"
  • All subjects, not a sample (This session)
  • Irrelevant detectability (This session)
Too Little or Too Much: Analyses
• Too Few: Miss an Effect
• Too Many: Spurious Results
• Numerous analyses due to:
  • Multiple possible outcomes.
  • Ongoing analyses as more subjects accrue.
  • Many potential subgroups.
Non-Completing or Non-Complying Subjects
All Study Subjects or "Appropriate" Subset
What is the most relevant group of studied subjects: all randomized, or mostly compliant, or completed study, or …?
• Possible Bias Using Only Completers
  • Comparison: % cured, placebo vs. treated.
  • Many more placebo subjects are not curing and go elsewhere; they do not complete the study.
  • Cure rate is biased upward in placebo completers → under-estimates the treatment effect.
  • If cure rate is biased upward in treatment completers → over-estimates the treatment effect.
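This bias is easy to see in a toy simulation. Below is a minimal sketch; all numbers (30% vs. 50% cure, the dropout probabilities) are hypothetical assumptions, not figures from the session:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy numbers: uncured subjects drop out far more often (80%)
# than cured subjects (5%), in both arms.
def cure_rates(n=100_000, p_cure=0.30, drop_uncured=0.80, drop_cured=0.05):
    cured = rng.random(n) < p_cure
    drop_p = np.where(cured, drop_cured, drop_uncured)
    completed = rng.random(n) > drop_p
    # Cure rate among all randomized vs. among completers only
    return cured.mean(), cured[completed].mean()

for arm, p in (("placebo", 0.30), ("treated", 0.50)):
    all_rand, completers = cure_rates(p_cure=p)
    print(f"{arm}: all randomized {all_rand:.1%}, completers only {completers:.1%}")
# Both completer rates are inflated, and the placebo rate is inflated more,
# so the apparent treatment effect shrinks (~15 points vs. the true 20).
```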
Criteria for Appropriate Subset
• Study Goal:
  • Scientific effect? → primarily compliance
  • Societal impact? → primarily dropout
• Potential Biased Conclusions:
  • Why not completed?
  • Study arms equivalent?
Possible Study Populations
• Per-Protocol Subjects:
  • Had all measurements, visits, doses, etc.
  • "Modified": relaxations, e.g., 85% of doses.
  • Emphasis on scientific effect.
• Intention-to-Treat Subjects:
  • Everyone who was randomized.
  • "Modified": slight relaxations, e.g., ≥ 1 dose.
  • Emphasis on non-biased policy conclusion.
Intention-to-Treat (ITT)
• ITT specifies the population; it includes non-completers.
• Still need to define outcomes for non-completers, i.e., "impute" values.
• Typical to define non-completers as not cured.
ITT: Two Ways to Impute Unknown Values
[Figure: two panels plot change from baseline for individual subjects at baseline, intermediate, and final visits. LOCF (Last Observation Carried Forward) carries a dropout's last observed value forward, i.e., progression is presumed to stop. LRCF (Last Rank Carried Forward) carries the subject's last observed rank forward, maintaining the expected relative progression.]
"Too Much" Data
• All Possible Data, No Sample
• "Too much" data to need probabilistic statements; you already have the whole truth.
• Not always as obvious as it sounds.
• Examples: Electronic Medical Records (EMR), some chart reviews; site-specific, not samples.
• Confidence intervals usually irrelevant.
• Reference ranges and some non-generalizable comparisons may be valid.
Irrelevant (?) Detectability with Large Study
• Significant differences (p<0.05) in %s between placebo and treatment groups:

  N/Group   Difference        #Treated* to Cure 1
  100       50% vs. 63.7%       7
  1000      50% vs. 54.4%      23
  5000      50% vs. 52.0%      50
  10000     50% vs. 51.4%      71
  50000     50% vs. 50.6%     167

*NNT = Number Needed to Treat = 100/Δ
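The table's pattern can be checked with a short back-of-envelope calculation. The sketch below (an assumption of mine, using the pooled two-proportion z-test; rounding differs slightly from the slide) finds the smallest treatment cure rate reaching p < 0.05 against a 50% placebo rate, and the implied NNT:

```python
from math import sqrt

# For each group size, scan for the smallest treatment cure rate that
# differs significantly (two-sided p < 0.05, i.e., z >= 1.96) from a
# 50% placebo cure rate, using the pooled two-proportion z-test.
def min_detectable_rate(n, p0=0.50, z=1.96):
    p1 = p0
    while True:
        p1 += 0.0001
        pbar = (p0 + p1) / 2
        se = sqrt(pbar * (1 - pbar) * (2 / n))   # pooled standard error
        if (p1 - p0) / se >= z:
            return p1

for n in (100, 1000, 5000, 10000, 50000):
    p1 = min_detectable_rate(n)
    delta_pct = (p1 - 0.50) * 100
    print(f"n/group={n:>6}  50% vs. {p1:.1%}  NNT ~ {100 / delta_pct:.0f}")
```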
Too Little or Too Much: Analyses
• Multiple:
  • Outcomes
  • Subgroups
  • Ongoing effects
• Exploring vs. Proving
Multiple Outcomes
• Balance Between Missing an Effect and Spurious Results
• Food Additives and Hyperactivity Study:
  • Uses composite score.
  • Many other indicators of hyperactivity.
Multiple Outcomes
[Diagram: the GHA (Global Hyperactivity Aggregate) combines several hyperactivity scales, e.g., Teacher ADHD (10 items), Parent ADHD (10 items), Class ADHD (12 items), and Conner (4 items).]
• Could perform: 10 + 10 + 12 + 4 = 36 item analyses.
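One standard way to avoid 36 separate item analyses is to pre-specify a single composite. A minimal sketch with simulated data (the item counts follow the diagram; everything else is an assumption, not the study's actual scoring rule) standardizes each scale and averages the z-scores into one GHA-like aggregate:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated item data for 40 children across four hyperactivity scales.
n_children = 40
scales = {"teacher": 10, "parent": 10, "class": 12, "conner": 4}

z_scores = []
for name, n_items in scales.items():
    raw = rng.normal(size=(n_children, n_items)).sum(axis=1)  # scale total
    z_scores.append((raw - raw.mean()) / raw.std())           # standardize

# One aggregate outcome per child -> one primary test instead of 36.
gha = np.mean(z_scores, axis=0)
print(gha[:5])
```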
Multiple Subgroup Analyses: Example
• Editorial: pp. 1667-69
• Comparing Two Treatments in 25 Subgroups + Overall
Multiple Subgroup Analyses
• Lagakos, NEJM 354(16):1667-1669.
• False Positive Conclusions: 72% chance of claiming at least one false effect with 25 comparisons.
A Correction for Multiple Analyses
• No Correction:
  • If using p<0.05, then P[true negative] = 0.95.
  • If 25 comparisons are independent, P[all true negative] = (1-0.05)^25 = (0.95)^25 = 0.28.
  • So, P[at least 1 false positive] = 1 - 0.28 = 0.72.
• Bonferroni Correction:
  • To maintain P[true negative in k tests] = 0.95 = (1-p*)^k, need to use p* = 1 - (0.95)^(1/k) ≈ 0.05/k.
  • So, use p<0.05/k to maintain a <5% overall false positive rate (type I error rate).
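These numbers are easy to verify directly; a quick check of the slide's arithmetic:

```python
# Verify the slide's numbers for k = 25 independent tests at alpha = 0.05.
k, alpha = 25, 0.05

p_all_true_neg = (1 - alpha) ** k          # (0.95)^25 ≈ 0.28
p_any_false_pos = 1 - p_all_true_neg       # ≈ 0.72

p_star_exact = 1 - (1 - alpha) ** (1 / k)  # exact per-test threshold ≈ 0.00205
p_star_bonf = alpha / k                    # Bonferroni approximation = 0.002

print(f"P[at least 1 false positive] = {p_any_false_pos:.2f}")
print(f"exact p* = {p_star_exact:.5f}, Bonferroni p* = {p_star_bonf:.5f}")
```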
Accounting for Multiple Analyses
• Some formal corrections "built-in" to p-values:
  • Bonferroni: general purpose
  • Tukey: for pairs of group means, >2 groups
  • Many statistical software packages will compute p-values adjusted for the multiple tests using these methods (see the sketch after this list).
• Formal corrections may not be necessary:
  • Transparency about what was done is most important.
  • Be aware of the number of analyses you ran and report it with any conclusions.
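For example, in Python, statsmodels computes such adjusted p-values; the raw p-values below are made up for illustration:

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.003, 0.020, 0.045, 0.300]   # hypothetical p-values from 4 outcomes

# Bonferroni-adjusted p-values; reject flags compare them to alpha = 0.05
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
for p, pa, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f} -> adjusted p = {pa:.3f}, significant: {r}")
```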
Reporting Multiple Analyses
• Clopidogrel paper 4 slides back: no p-values or probabilistic conclusions for the 25 subgroups.
• Another paper's transparency: Cohan, Crit Care Med 33(10):2358-2366.
Multiple Mid-Study Analyses
• Should effects be monitored as more and more subjects complete?
• Some mid-study analyses:
  • Interim analyses
  • Study size re-evaluation
  • Feasibility analyses
Mid-Study Analyses
[Figure: effect estimate plotted against number of subjects enrolled (time →); with too many analyses, the unstable early effect can suggest a wrong early conclusion.]
• Need to monitor, but also account for many analyses.
Mid-Study Analyses
• Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable and can invalidate final comparisons.
• Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early due to very dramatic effects, and final comparisons, if the study continues, are adjusted to validly account for "peeking".
Continued …
Mid-Study Analyses
• Mid-study reassessment of study size is advised for long studies. Only standard deviations to date, not effects themselves, are used to assess original design assumptions.
• Feasibility analysis:
  • may use the assessment noted above to decide whether to continue the study.
  • may measure effects, like interim analyses, by unmasked advisors, to project ahead on the likelihood of finding effects at the planned end of study.
Continued …
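A sketch of that sample-size reassessment, assuming a two-arm comparison of means and the usual normal-approximation formula (the SDs and effect size below are hypothetical):

```python
from math import ceil
from scipy.stats import norm

# n per group = 2 * (z_{alpha/2} + z_beta)^2 * sd^2 / delta^2,
# re-run mid-study with the pooled SD observed so far, keeping the
# originally planned effect size delta.
def n_per_group(sd, delta, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd**2 / delta**2)

print(n_per_group(sd=10, delta=5))  # original design assumption: 63/group
print(n_per_group(sd=14, delta=5))  # larger SD observed to date: 124/group
```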
Mid-Study Analyses
Examples: Studies at Harbor
• Randomized; not masked; data available to PI.
• Compared treatment groups repeatedly, as more subjects were enrolled.
• Study 1: Groups do not differ; plan to add more subjects.
  • Consequence → final p-value not valid; the probability requires no prior knowledge of the effect.
• Study 2: Groups differ significantly; plan to stop study.
  • Consequence → use of this p-value not valid; the probability requires incorporating the later comparison.
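The Harbor examples can be made concrete with a small simulation. This sketch (an assumed setup: two arms with no true difference, unit-variance outcomes, an unadjusted z-test after every 20 subjects per arm) shows how repeated looks inflate the nominal 5% false positive rate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Return True if ANY of the 10 interim z-tests crosses |z| > 1.96,
# even though the two arms are drawn from the same distribution.
def any_look_significant(looks=10, step=20, z_crit=1.96):
    a = rng.normal(size=looks * step)
    b = rng.normal(size=looks * step)
    for k in range(1, looks + 1):
        n = k * step
        z = (a[:n].mean() - b[:n].mean()) / np.sqrt(2 / n)
        if abs(z) > z_crit:
            return True
    return False

trials = 2000
hits = sum(any_look_significant() for _ in range(trials))
# Typically close to 19% with 10 looks -- far above the nominal 5%.
print(f"False positive rate with 10 looks: {hits / trials:.2%}")
```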
Bad Science That Seems So Good
1. Re-examining data, or using many outcomes, seeming to be due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Looking for effects in many subgroups.
• Actually bad? Could be negligent NOT to do these, but you need to account for doing them.
How to Avoid Misleading Results
• Analyses should be planned before the data are collected (how many dependent and independent variables are to be collected, and what hypotheses are to be tested).
• All planned analyses should be completed and reported.
We have learned …
1. Study designs
2. Descriptive vs. inferential statistics
3. Hypothesis testing and the p-value
4. Five elements to determine a sample size
5. Covariates and multivariate regression models
6. Bonferroni's correction
EPILOGUE
GIVE A BIG CLAP TO YOURSELF
SINCE YOU'VE MADE IT THIS FAR!
CONGRATULATIONS!!!