What is wrong with What is said?
Hasan Yazici, MD
University of Istanbul
Vitamin-E in gout
the Hypothesis
• Several recent reports hint that vitamin E
can lower serum uric acid levels (1-3).
• These are either single case reports or
uncontrolled, open studies in small groups
of patients.
• Thus we tested the hypothesis to see
whether vitamin E was effective treatment
for gout in a 6 month, double blind,
placebo controlled study.
What is wrong?
A hypothesis should not be
formulated as a question.
Methods - 1
• The patients were randomized into the
vitamin E (n = 60) and placebo (n= 60)
groups.
• After randomization each patient signed a
written informed consent.
• The patients were assessed monthly with
serum uric acid, hematocrit, serum creatinine,
cholesterol and testosterone levels.
• The number of gout attacks was also noted.
What is wrong?
• The randomization follows, not
precedes, the informed consent.
Methods -2
• 30 patients had to leave the study for various
reasons.
– 20 patients from the active drug group and 10 from the
placebo left the study before its completion (ns).
– Thus the final analysis for drug efficacy was made among
90 patients.
• Among the 20 patients who dropped out from the
treatment group 3 moved to another city.
• 15 complained of severe indigestion and no
information was available for the remaining 2.
• Dyspepsia was the reason for withdrawal in 7
patients in the placebo group while headaches were
the main reason for discontinuation of treatment in
the remaining 3.
What is wrong?
• The intention to treat principle is
neglected.
Intention to treat
another study
• 90 patients were initially randomized into 30 patients
each of placebo, QXY 2mg qd and QXY 5mg qd
groups.
• One patient in the QXY 5mg group developed
pneumonia and had to be hospitalized 10 days after
treatment started.
• The patient was withdrawn and, according to the
protocol, another patient was recruited.
• Thus, the intention to treat analysis brought the
number analyzed to 31 in the QXY 5mg group and
the total number to 91.
What is wrong?
• How was randomization achieved in the
newcomer?
• The newcomer inflates the denominator
when looking at harm.
Methods-3
• The allocation of 60 patients to the
active drug and the placebo arms
ensured an 86% power to detect a
difference rate of 25% in the uric acid
levels between the 2 groups using a
2-sided Fisher’s exact test with α set at
0.05.
What is wrong?
• In a power calculation it is not enough
to state the magnitude of change; the
anticipated event rates should also be
given.
• Ensured 86% power?
Sample size calculations
4 components
I. Type I (alpha) error
II. Type II (beta) error
III. Event rate in the control group
IV. Event rate in the treatment group
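The four components can be tied together in a short calculation. What follows is only a sketch, assuming the common normal-approximation formula for comparing two proportions with a 2-sided test; the function name and the event rates 0.20 and 0.42 are illustrative inputs, not a prescribed method.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.90):
    """Approximate number of patients per arm (2-sided test, two proportions).

    Hypothetical helper using the normal approximation; an exact test
    such as Fisher's would need somewhat different numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # I. type I (alpha) error
    z_beta = z.inv_cdf(power)           # II. type II (beta) error, power = 1 - beta
    p_bar = (p_control + p_treatment) / 2
    # III and IV: both event rates enter through the variance terms.
    pooled_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    unpooled_sd = math.sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    numerator = (z_alpha * pooled_sd + z_beta * unpooled_sd) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


# Illustrative event rates: 20% in the control and 42% in the treatment group.
n = sample_size_per_group(0.20, 0.42)
print(n)

# The same relative difference (25%) needs very different sample sizes
# depending on the event rates themselves.
print(sample_size_per_group(0.10, 0.125), sample_size_per_group(0.40, 0.50))
```

The last line is the reason a statement such as "a difference rate of 25%" is not enough on its own: without the two event rates the required sample size cannot be derived.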
Power calculations for
efficacy and harm
• The primary end point was the achievement of at
least 20% improvement according to the ACR criteria
(ACR20) at week 20.
• The sample size of 180 patients per group was
chosen to ensure an adequate safety evaluation.
• The sample size also ensured that there was 90%
power to detect a significant difference in the
proportion of ACR20 responders between the
treatment groups using a significance level of 0.05,
assuming 20% and 42% of the patients in the control
and the treatment groups respectively achieved ACR
20 responses.
What is wrong?
• No basis for the assumptions related to harm
• Post hoc power calculations
Powering for safety
• A sample size of 300 patients each in
the study drug and the control groups
was determined to demonstrate a
specific adverse event rate of 1% or
less with 95% confidence.
What is wrong?
The probability of an event not
happening does not give us information
about the likelihood of events
happening in the different arms of the
study.
The zero patient method
• If one screens n individuals and does not find
an attribute y among this group;
• Then one can conclude that the frequency of
y is less than 1/0.33n among the n individuals
surveyed, with 95% confidence.
• This approximation is true for prevalences
< 0.02 of the attribute surveyed.
H Yazici et al Rheumatology Oxford (2001)
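The approximation can be checked against the exact binomial limit. This is a minimal sketch: the function names are hypothetical, and the exact limit is the standard one obtained by solving (1 - p)^n = 0.05 for p, which is an assumption about how the approximation is derived rather than something stated on the slide.

```python
def approx_upper_limit(n):
    """Slide formula: frequency is below 1/(0.33 n) with 95% confidence."""
    return 1 / (0.33 * n)


def exact_upper_limit(n, confidence=0.95):
    """Largest frequency p still compatible with 0 events in n: (1 - p)^n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)


for n in (50, 150, 300):
    print(n, round(approx_upper_limit(n), 4), round(exact_upper_limit(n), 4))
```

With 300 individuals and no events, both limits sit close to 1%. Note that this bounds the frequency only when no event at all is observed.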
Efficacy
a subgroup analysis
• The uric acid levels decreased to 5.2 ± 1.5mg/dL from
a baseline of 9.3 ± 2.3 mg/dL in the treatment and
from 9.7 ± 1.9 mg/dL in the placebo groups.
• There were no differences in the lowering of uric acid
between the 2 groups (p>0.05).
• There were also no differences in the number of gout
attacks between the 2 groups of patients. However a
subgroup analysis was also done.
• Among those patients in the treatment group who gave
a history of more than 5 gouty attacks per year, as
compared to those who had less than 3, the uric acid
levels were significantly lower after treatment (p<
0.05).
Reviewer comments
• The findings in the subgroup analysis,
conducted among a small number of patients,
should be interpreted with caution.
• Even though the authors found a statistically
significant difference in efficacy the study was
not primarily planned to assess this
difference.
• This lessens the external validity of these
results.
What is wrong?
The main problem with this subgroup
analysis is not that the subgroup is
small. It is an effort in the direction of
proving that the new drug at hand is
efficacious.
Harm
another subgroup analysis
• Table 3 gives the rate of observed adverse events.
Indigestion was rather frequent in either group;
15/45 patients in the treatment and 14/45 in the
placebo groups.
• A subgroup analysis was also done after the patients
who participated in the trial were questioned about a
pre-trial history of indigestion. It turned out that
20/45 patients in the treatment and 18/45 in the
placebo group had pre-trial indigestion.
• A further analysis among these patients revealed that
14/20 in the treatment and 2/18 in the placebo group
(p < 0.03) with a history of pre-trial indigestion also
reported indigestion during the trial.
Reviewer comments
• The findings in the subgroup analysis,
conducted among a small number of patients,
should be interpreted with caution.
• Even though the authors found a statistically
significant difference in harm the study was
not primarily planned to assess this
difference. This lessens the external validity
of these results.
What is wrong?
Again the main issue is not the size of
the subgroup. This subgroup analysis
looks at the possibility that this new
drug might not be all that harmless.
Thus the exercise is in the direction of
falsification and therefore is justified.
Dr. Pincus study
• Recently, in a placebo controlled
withdrawal study, Dr. Pincus showed
that small doses of prednisone (1-4
mg/day) were significantly effective in
the management of rheumatoid
arthritis.
• The study was conducted among 31
patients
Reviewer comments
This beneficial effect of prednisone
described among a small number of
patients should be interpreted with
extreme caution even if the authors
found a statistically significant
difference. The number of patients
studied, thus the study power, was
simply too small.
What is wrong?
• The reviewer is partially right.
• The issue of power applies only if we are
missing a more important outcome that
would have been more evident in a larger
group.
• External validity? The small number of
patients might be quite different from the real
life RA patients.
• However (!) this is not simply an issue of
numbers but of patient selection.
The coxib study
• 6000 patients with osteoarthritis of the knee were
randomized to receive either the new coxib (NCB) or
the traditional coxib (TCB).
• 40% of the NCB and 35% of the TCB patients found
total pain relief.
• Hypertension was a problem in 2.5 % of the NCB and
4.5 % of the TCB patients.
• It was concluded:
a. NCB decreased pain by 13%
b. There was 45% less hypertension with NCB
What is wrong?
• Nothing. Needs more interpretation.
• An NNT analysis will tell you that you have to
treat 20 patients to see the superiority of NCB
over TCB in 1 patient (0.40 - 0.35 = 0.05; 1/0.05 = 20).
• An NNH analysis will say that you have to treat
50 patients with TCB to harm 1 more patient
with hypertension as compared to using NCB.
NNT & NNH
• Relative risk (RR): outcome rate in the study
group / outcome rate in the control group
• Absolute risk reduction (ARR): difference in the rates
• Relative risk reduction (RRR): ARR / outcome rate in the
control group, or 1 - RR
• NNT or NNH: 1/ARR (frequency of hypertension in the
control group 0.045, frequency of hypertension in the
study group 0.025, ARR = 0.020; NNH = 50).
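These definitions can be applied directly to the coxib figures. The sketch below uses a hypothetical helper and treats NCB as the study group and TCB as the control group; the absolute difference is taken without sign so that the same function serves both benefit (NNT) and harm (NNH).

```python
def treatment_effects(rate_study, rate_control):
    """Relative and absolute effect measures for two outcome rates."""
    rr = rate_study / rate_control
    absolute_difference = abs(rate_control - rate_study)
    return {
        "RR": rr,
        "absolute_difference": absolute_difference,
        "relative_difference": absolute_difference / rate_control,
        "number_needed": 1 / absolute_difference,
    }


# Total pain relief: 40% with NCB, 35% with TCB.
pain = treatment_effects(0.40, 0.35)
# Hypertension: 2.5% with NCB, 4.5% with TCB.
hypertension = treatment_effects(0.025, 0.045)

print("NNT:", round(pain["number_needed"]))
print("NNH:", round(hypertension["number_needed"]))
print("Relative reduction in hypertension:", round(hypertension["relative_difference"], 2))
```

The relative figures look impressive on their own; the numbers needed to treat or harm put them back on the scale of individual patients.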
A withdrawal study
• Two previous double blind studies of colchicine in Behçet's
syndrome (BS) had shown no superiority of this agent over placebo
in treating the oral ulcers in this condition.
• Recently a withdrawal study was done among those
patients who claimed benefit from colchicine. They
were randomized to continue to receive the active
drug or placebo.
• After 3 months those who stopped taking colchicine
had significantly more ulcers (sign test, p = 0.02).
Reviewer comments
• This study is interesting but is of limited
use. A problem with all withdrawal trials,
they seldom represent real life use.
• The authors used the “sign test” to analyze
the differences in oral ulcers between the 2
groups. Is this a new test? I suspect it is
not quite powerful. Why not use the more
powerful tests of significance to show the
real differences, if any?
What is wrong? (I)
• The main problem with withdrawal
studies is that they are not done often
enough.
• They provide excellent information
about a possible type II error in
previous, traditional RCTs.
• An important issue is that they do not
give a fair picture of drug associated
harm.
What is wrong? (II)
• The sign test is a time honored,
conservative tool.
• Significance in a conservative test gives
more validity to the significant differences
observed.
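How little the sign test assumes is easiest to see in code. This is a sketch of the usual 2-sided sign test for paired observations, using an exact binomial tail with probability 0.5; the function name and the counts in the example are hypothetical and are not data from the colchicine study.

```python
from math import comb


def sign_test_p(n_positive, n_negative):
    """2-sided exact sign test; ties are assumed to be dropped beforehand."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # Probability of a split at least this lopsided if + and - are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example: 9 of 10 pairs change in the same direction.
print(sign_test_p(9, 1))
```

Only the direction of each difference is used and the size is ignored, which is why the test is conservative.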
The mighty “p”
Among the 96 patients allocated to the
new medication there were 3 cases of
myocardial infarction while the same
was true for 2/94 patients allocated to
placebo (p=0.86).
What is wrong?
• Better give it with the statistic used
(sign test, p = 0.02).
• Do not use it to mesmerize the reader.
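A p value depends on the test that produced it. The slide does not say which test gave p = 0.86, so the sketch below swaps in a 2-sided Fisher's exact test purely as an assumption, written with the standard library only; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """2-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability, kept as an exact integer.
        return comb(col1, x) * comb(total - col1, row1 - x)

    observed = weight(a)
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(total, row1)


# 3 infarctions among 96 on the new medication, 2 among 94 on placebo.
print(fisher_exact_two_sided(3, 93, 2, 92))
```

A different test applied to the same table can give a different p value, which is the reason for naming the statistic next to it.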
The Commandments
• Hypothesis as an affirmative statement
• Intention to treat
• Components of the power calculation
• Post hoc power analyses
• Powering for harm, no short cuts
• Justifications for subgroup analyses
• Small but significant
• Meaning of treatment effects
• Withdrawal studies to be cherished
• The “mighty p”
Summary
• In an RCT, as in all scientific
endeavor, all efforts aimed
at proving the hypotheses
are wrong.
• The aim is falsification.
Vitamin-E in gout
the Conclusion
Vitamin E is an innocuous agent and with
the possible additional benefit of an
antihypertensive effect, as was also
serendipitously noted in this study, such
studies are surely warranted. We are about
to start a study along those lines at our
institution.
What is wrong?
Do not preempt!