Assessment of Blinding in Drug Clinical Trials

Blinding Assessment
in Clinical Trials
Heejung Bang, PhD
Outline
• Background
• Statistical methods
• Examples
• Other issues and discussion
Background
• Human behavior is influenced by what we know or believe,
and everybody is tempted to find out what is going on.
• Blinding or masking: single, double, or triple (Dictionary for
Clinical Trials 1999)
• Reduces selection and information bias, and improves
compliance
• Considerable effort is directed at disguising the dissimilarity between
treatments (e.g., taste, smell, appearance, mode of
delivery)
• Treatment assignment sequence: the use of blocks of
random lengths (suggested by ICH-E9)
Background
• Blinding is not always feasible or relevant, e.g.,
-- some surgical treatments
-- treatment vs. nothing
-- early vs. late interventions
-- animal studies?
• Imperfect blinding is preferable to an open design
(Furberg & Soliman 2008)
• Bias can occur in every aspect of a trial, e.g.,
treatment assignment, data collection, assessment,
and analysis.
• Bias may be more pronounced in trials with
commercial sponsors.
Some online searches (publication bias?)
Double blind
- PubMed: ~126K
- Google Scholar: ~1400K
Single blind
- PubMed: ~30K
- Google Scholar: ~1750K
Unblinded
- PubMed: ~1K
- Google Scholar: ~26K
Glossary added in Evidence-Based
Medicine (2000)
• Allocation concealed
• Allocation not concealed
• Unclear allocation concealment
• Blinded
• Blinded (unclear)
• Unblinded
Remark: In reality, ‘double-blind’ claims without checks and
balances are highly common, especially in titles
and keywords.
Definitions of Double-Blind
(Park et al. Anesthesiology, 2002)
Important historical examples:
1. Zinc study
In a single-blind, placebo-controlled trial, the
benefit of zinc on taste disorders was
shown to be statistically significant.
An identical trial was repeated with only one
difference, being double-blind, and showed no
benefit.
Problems: 1. responses were very subjective;
2. vested interest
2. Women’s Health Initiative: attacked?
High drop-out and premature unblinding among
clinicians and participants (e.g., due to vaginal bleeding).
Unblinding could lead to behavioral changes and
detection bias (e.g., participants on the experimental treatment are more closely
monitored).
It is necessary to ask why hazard ratios for ‘blinded’
and ‘unblinded’ women have not been presented
(Shapiro 2003)
Detection bias due to unblinding could lower the
crude rate ratio of 1.28 to 1.02 (Garbe & Suissa 2004)
WHI became an observational study (Shapiro 2003)
FDA said:
“[drug name]-related side effects have the
potential to unblind subjects and
investigators. Unblinding may result in
ascertainment bias of subjective study
endpoints. We recommend that you
administer a questionnaire at study
completion to investigate the effectiveness
of blinding the subjects and treating and
evaluating physicians”
Office of Therapeutics Research and Review, Center for Biologics Evaluation
and Research, FDA 2003
FDA also said:
“DRUDP requests that subjects and
investigators state at the end of the
subject’s participation as to what treatment
assignment they think was made, in order
to assess the adequacy of blinding”
Office of Drug Evaluation III (ODE III), Center for Drug Evaluation and
Research, FDA 2005
CONSORT (revised version, 2001)
recommended reporting “how the success of
blinding was evaluated.”
International Committee of Medical
Journal Editors (1979)
… describe the methods for and success of
any blinding of observations.
Blinding assessment in nonpharmacologic
treatment studies (Boutron et al. 2007)
Can nonpharmacologic treatments
be blinded?
Investigators: even if the surgeon cannot be blinded,
the healthcare providers following participants after
the procedure could still be blinded, and contact between
other caregivers and the surgeon could be avoided (Boutron et al. 2007).
Patients: use of drapes, eye patches, opaque
goggles, specific positioning, a curtain or box, etc.
Blinding to the study hypothesis
Blinded centralized assessment of the outcome
More Background
• Everybody knows blinding is important, but...
• Grossly incomplete reporting of procedures and of any
assessment of blinding; a call for urgent improvement
(Schulz et al. 1996; Fergusson et al. 2004;
Hróbjartsson et al. 2007; Kolahi et al. 2009)
• Not many statistical methods are available: only two
blinding indexes in the literature.
• Some papers present only data or naïve analyses.
• How to handle “Don’t know” answers? They are different from
missing data!
Common formats of blinding questionnaire
With 3 response categories about their guess:
“Drug”, “Placebo” or “Don’t know (DK)”
With 5 response categories about guess and certainty of
guess:
“Strongly believe the treatment is drug”
“Somewhat believe the treatment is drug”
“DK”
“Somewhat believe the treatment is placebo”
“Strongly believe the treatment is placebo”
Remarks:
1. We may re-ask those who answered DK initially.
2. Some questionnaires do not allow DK and force a guess; I don’t think
this is a good idea.
Common data structures
• 2x3 format
                          Treatment Guess
Treatment Assignment      Drug          Placebo       DK            Total
Drug                      n11 (P1|1)    n12 (P2|1)    n13 (P3|1)    n1.
Placebo                   n21 (P1|2)    n22 (P2|2)    n23 (P3|2)    n2.
Total                     n.1           n.2           n.3           N
Pj|i = P(guess j | assigned treatment i)
DK = Don’t know, N = total number of patients
Common data structures
• 2x5 format
                      Treatment Guess
Treatment Assignment  1           2           3           4           5           Total
Drug                  n11 (P1|1)  n12 (P2|1)  n13 (P3|1)  n14 (P4|1)  n15 (P5|1)  n1.
Placebo               n21 (P1|2)  n22 (P2|2)  n23 (P3|2)  n24 (P4|2)  n25 (P5|2)  n2.
Total                 n.1         n.2         n.3         n.4         n.5         N
1) Strongly believe to be on Drug;
2) Somewhat believe to be on Drug;
3) DK;
4) Somewhat believe to be on Placebo;
5) Strongly believe to be on Placebo.
We may also collect
• Ancillary data from those who answered DK
                      Treatment Guess
Treatment Assignment  Drug           Placebo        Total
Drug                  n311 (P31|1)   n312 (P32|1)   n31.
Placebo               n321 (P31|2)   n322 (P32|2)   n32.
Total                 n3.1           n3.2           n3.
Remark: n3. (in ancillary data) = n.3 (in 2x3 or 2x5
format) if no missing data
Naïve/standard methods
1. Chi-square test (Hughes & Krahn, 1985)
- Comparing the proportions of correct and incorrect
answers
- 2x2 chi-square test to compare Pcor and Pinc among
participants excluding DKs
- 2x3 chi-square test to compare Pcor and Pinc among all
participants
Remark: Strictly speaking, this is like performing a one-sample binomial test!
2. Kappa statistic
- Note that kappa measures agreement, but we
should measure disagreement!
3. McNemar’s test
- before/after assessment
Remark: a measure/assessment of (un)blinding may
be more appropriate than a test (e.g., a p-value).
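The binomial remark above can be made concrete: among participants who gave a definite guess, comparing correct with incorrect answers amounts to testing whether P(correct) = 0.5. Below is a minimal Python sketch of an exact two-sided binomial test. The function names are hypothetical, and the two-sided convention (summing all outcomes no more likely than the observed one) is one common choice, assumed here rather than taken from Hughes & Krahn.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes out of n.

    Sums P(X = x) over every outcome x that is no more likely than the
    observed k (one common two-sided convention).
    """
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with k are not lost to rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def correct_guess_test(n_correct: int, n_incorrect: int) -> float:
    """Test H0: P(correct guess) = 0.5 among participants who did not answer DK."""
    return binomial_two_sided_p(n_correct, n_correct + n_incorrect)


# Hypothetical counts: 30 correct and 10 incorrect guesses, DKs excluded.
print(round(correct_guess_test(30, 10), 4))
```

Note that DK answers simply drop out of this test, which is one reason the remarks on this slide favor a measure of (un)blinding over a p-value.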
Blinding Index (James et al., 1996)
- Modified version of the kappa statistic
- BI = \{1 + P_{DK} + (1 - P_{DK})\kappa_D\}/2
where
P_{DK} = P(\mathrm{DK})
\kappa_D = (P_{Do} - P_{De}) / P_{De}
P_{Do} = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij} P_{ij} / (1 - P_{DK}), \quad P_{DK} \neq 1
P_{ij} = n_{ij}/N
w_{ij} = 0 (correct guess)
= 0.5 (incorrect guess)
= 1 (DK)
P_{De} = \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij} P_{.j} (P_{i.} - P_{i3}) / (1 - P_{DK})^2
k = number of treatment arms (here k = 2, so column 3 is DK)
- 0≤BI≤1
If PDK = 1, then BI =1 (complete blinding);
If PDK = 0 and PDo = 0 (i.e., all responses are correct),
then BI = 0 (complete unblinding);
If PDK = 0 and PDo = PDe (i.e., 50% correct, 50% incorrect),
then BI = 0.5 (random guessing).
- Variance formulae available.
- Unblinding may be claimed if the two-sided CI does not cover 0.5.
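The definitions above translate directly into a short point-estimate calculation. This Python sketch rests on stated assumptions: a k x (k+1) table of counts whose rows are the assigned arms, whose first k columns are guesses in the same order, and whose last column is DK. The name `james_bi` is hypothetical, the variance formulae are omitted, and this is not the Stata "blinding" module.

```python
def james_bi(table):
    """James et al. (1996) blinding index from a k x (k+1) table of counts.

    table[i][j] = number of participants assigned to arm i whose guess was
    arm j; the last column holds the "Don't know" (DK) answers.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    n_dk = sum(row[k] for row in table)
    if n_dk == n:
        return 1.0  # everyone answered DK: complete blinding
    p = [[count / n for count in row] for row in table]  # P_ij = n_ij / N
    p_dk = n_dk / n                                      # P_DK

    def w(i, j):
        # Disagreement weights: 0 for a correct guess, 0.5 for an incorrect one.
        # (DK has weight 1 but is excluded from the double sums below.)
        return 0.0 if i == j else 0.5

    row_tot = [sum(row) for row in p]                             # P_i.
    col_tot = [sum(p[i][j] for i in range(k)) for j in range(k)]  # P_.j
    p_do = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k)) / (1 - p_dk)
    p_de = sum(w(i, j) * col_tot[j] * (row_tot[i] - p[i][k])
               for i in range(k) for j in range(k)) / (1 - p_dk) ** 2
    if p_de == 0:
        raise ValueError("expected disagreement is zero; kappa_D is undefined")
    kappa_d = (p_do - p_de) / p_de
    return (1 + p_dk + (1 - p_dk) * kappa_d) / 2


# Hypothetical 2x3 table: rows Drug/Placebo; columns Drug, Placebo, DK.
print(round(james_bi([[30, 10, 10], [15, 20, 15]]), 3))
```

The three boundary cases listed on the slide (all DK, all correct, 50/50 guessing) give 1, 0, and 0.5 under this sketch.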
Limitations of existing methods
• Most methods are descriptive or use naïve
statistics.
• DK is influential in James’ method
(DK should be real DK).
• Existing methods cannot 1) detect different
behaviors in the two arms, 2) distinguish qualitatively different
scenarios, or 3) give the proportion of unblinding
beyond chance.
New blinding index
(Bang et al., 2004) (2x3 format)
Define r_{i|i} = P_{i|i} / (P_{1|i} + P_{2|i}), i = 1 for drug, i = 2 for placebo
(i.e., proportion of correct guesses among participants who
provided their treatment guesses in the i-th arm)
Without DK: new BI_i = 2r_{i|i} - 1
(i.e., proportion of correct guesses in the i-th arm
beyond chance level)
• With DK:
new BI_i = (2r_{i|i} - 1)(P_{1|i} + P_{2|i}), estimated by
\widehat{BI}_i = \left(2\frac{n_{ii}}{n_{i1} + n_{i2}} - 1\right)\left(\frac{n_{i1} + n_{i2}}{n_{i1} + n_{i2} + n_{i3}}\right)
Var(\widehat{BI}_i) = \{P_{1|i}(1 - P_{1|i}) + P_{2|i}(1 - P_{2|i}) + 2P_{1|i}P_{2|i}\} / n_{i.}
Remark: new BI_i is equal to
P_{1|1} - P_{2|1} for the drug arm
P_{2|2} - P_{1|2} for the placebo arm
under the trinomial distribution.
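A minimal Python sketch of the 2x3 estimator and its plug-in variance for one arm. The names `bang_bi` and `wald_ci` are hypothetical, and the interval uses a normal (Wald) approximation. That choice is an assumption of this sketch, not a statement of how the published intervals were computed.

```python
from math import sqrt


def bang_bi(n_drug: int, n_placebo: int, n_dk: int, correct: str):
    """Bang et al. (2004) blinding index for one arm of a 2x3 table.

    n_drug, n_placebo, n_dk: participants in this arm guessing "Drug",
    "Placebo", or answering "Don't know".
    correct: "drug" for the drug arm, "placebo" for the placebo arm.
    Returns (estimate, standard error).
    """
    if correct not in ("drug", "placebo"):
        raise ValueError('correct must be "drug" or "placebo"')
    n_arm = n_drug + n_placebo + n_dk
    n_guess = n_drug + n_placebo
    n_right = n_drug if correct == "drug" else n_placebo
    if n_guess == 0:
        estimate = 0.0  # all DK: the (P1|i + P2|i) factor is zero
    else:
        # (2 * r_ii - 1) * (P1|i + P2|i), as in the estimator above
        estimate = (2 * n_right / n_guess - 1) * (n_guess / n_arm)
    p1, p2 = n_drug / n_arm, n_placebo / n_arm  # plug-in P1|i and P2|i
    var = (p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n_arm
    return estimate, sqrt(var)


def wald_ci(estimate: float, se: float, z: float = 1.96):
    """Normal-approximation interval (an assumption of this sketch)."""
    return estimate - z * se, estimate + z * se


est, se = bang_bi(30, 10, 10, "drug")  # hypothetical drug-arm counts
print(round(est, 3), [round(x, 3) for x in wald_ci(est, se)])
```

For example, 30 Drug, 10 Placebo, and 10 DK answers in the drug arm give (2 x 30/40 - 1) x 40/50 = 0.4, which equals P1|1 - P2|1 = 0.6 - 0.2, as in the remark above.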
New blinding index (2x5 format)
• More general 2x5 format (& ancillary data for DK)
new BIi = P1|i + w2|iP2|i + w31|iP31|i - P5|i - w4|iP4|i - w32|iP32|i
subject to 0 ≤ w31|i = w32|i ≤ w2|i = w4|i ≤ 1 and
P1|i + P2|i + P31|i + P32|i + P4|i + P5|i = 1
Remarks:
1. The 2x3 format and data without ancillary data are special
cases of this format.
2. Suggested weights for sensitivity analysis: w31|i = w32|i = 0.25 and
w2|i = w4|i = 0.5.
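The weighted 2x5 index can be sketched the same way. Assumptions: `bang_bi_2x5` is a hypothetical name; every DK respondent answered the ancillary question (so the six proportions sum to 1); and for the placebo arm the sign is flipped so that leaning toward "Placebo" counts as correct, mirroring P2|2 - P1|2 in the 2x3 case.

```python
def bang_bi_2x5(counts, correct, w_somewhat=0.5, w_dk_lean=0.25):
    """Weighted Bang BI for one arm of a 2x5 table with ancillary DK data.

    counts = (n1, n2, n31, n32, n4, n5):
      n1  strongly believe Drug,     n2  somewhat believe Drug,
      n31 DK but leaning Drug,       n32 DK but leaning Placebo,
      n4  somewhat believe Placebo,  n5  strongly believe Placebo.
    correct: "drug" or "placebo" (the arm's true assignment).
    Default weights are the suggested sensitivity-analysis values.
    """
    if not 0 <= w_dk_lean <= w_somewhat <= 1:
        raise ValueError("weights must satisfy 0 <= w_dk_lean <= w_somewhat <= 1")
    n1, n2, n31, n32, n4, n5 = counts
    n = sum(counts)
    # Drug-direction score: P1 + w2*P2 + w31*P31 - P5 - w4*P4 - w32*P32
    score = (n1 + w_somewhat * n2 + w_dk_lean * n31
             - n5 - w_somewhat * n4 - w_dk_lean * n32) / n
    # Assumption: in the placebo arm the correct direction is "Placebo",
    # so the sign flips (as with P2|2 - P1|2 in the 2x3 case).
    return score if correct == "drug" else -score


counts = (10, 5, 3, 2, 5, 5)  # hypothetical drug-arm counts
print(round(bang_bi_2x5(counts, "drug"), 3))
print(round(bang_bi_2x5(counts, "drug", w_somewhat=1.0, w_dk_lean=0.0), 3))
```

Setting w_somewhat = 1 and w_dk_lean = 0 collapses the index to P(guess Drug) - P(guess Placebo), i.e., the 2x3 index for the drug arm, which is consistent with Remark 1.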
• -1 ≤ new BI ≤ 1
If ri|i = 1 and ni3 = 0 (i.e., all responses are correct),
new BIi = 1 (complete unblinding).
If ri|i = 0 and ni3 = 0 (i.e., all responses are incorrect),
new BIi = -1 (complete blinding or complete
unblinding in the opposite direction:
how to interpret??).
If ri|i = 0.5 (i.e., 50% correct and 50% incorrect among
participants who gave a definite guess),
new BIi = 0 (random guessing).
• Unblinding may be claimed if the one-sided CI does not
cover 0.
• Bang’s BI is directly interpretable (as the % of
unblinding beyond chance) and captures
different behaviors in different arms.
DON’T COMBINE!
• Bang’s BI is easily extended to multiple arms.
Nine blinding scenarios
Example 1: Acupuncture for
subacute stroke rehabilitation
Park et al. (Arch Int Med 2005) conducted a
sham-controlled, subject- and assessor-blinded
RCT to evaluate acupuncture for recovery of
activities of daily living after stroke.
Primary outcome: change in Barthel score
Secondary outcome: HRQOL
Blinding data were collected from patients at
the end of 2 weeks of treatment.
Results
James’ BI = 0.73 [95% CI: 0.66, 0.80]
Bang’s BIs = 0.47 [0.33, 0.61] in acupuncture;
-0.31 [-0.49, -0.13] in sham
Possible interpretation: James’ BI indicates
successful blinding, while Bang’s BIs show that
a significantly higher % of patients in both arms
(beyond chance level) reported that they received
acupuncture: the ‘wishful thinking’ and/or ‘lack of
idea about control treatment’ scenarios.
Example 2: Warfarin-Aspirin
Symptomatic Intracranial Disease
(WASID) trial
Hertzberg et al. (2008) investigated whether a
dose-modification schedule is effective
for blinding warfarin trials, using the WASID
study (Chimowitz et al. NEJM 2005).
The authors compared it with blinding in the
SPINAF trial (Ezekowitz et al. NEJM 1992).
Results
In the WASID & SPINAF trials, Bang’s BI
consistently showed more unblinding in the warfarin arm
than in the comparator arm (aspirin in WASID), whereas other indices
did not capture this (e.g., James’ BI always
declared successful blinding).
-- if you combine BIs from different arms, a
cancel-out effect can occur.
-- summarizing a pattern can be important.
Frequent unblinding in the warfarin arm may be due to the
number of dose changes, hemorrhage, etc.
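A toy calculation shows how combining arms can hide a pattern. The counts below are hypothetical (not WASID or SPINAF data): in both arms, 70 participants guess "Drug" and 30 guess "Placebo". Each arm-specific BI is computed as (correct - incorrect)/arm size, which equals (2r - 1)(P1|i + P2|i). The pooled value applies the same formula after adding correct and incorrect guesses across arms. The helper name `arm_bi` is hypothetical.

```python
def arm_bi(n_correct: int, n_incorrect: int, n_dk: int) -> float:
    """Bang-type BI: (correct - incorrect) / total, i.e. (2r - 1)(P1|i + P2|i)."""
    return (n_correct - n_incorrect) / (n_correct + n_incorrect + n_dk)


# Hypothetical counts: in BOTH arms, 70 guess "Drug", 30 guess "Placebo", 0 DK.
drug_arm = {"drug": 70, "placebo": 30, "dk": 0}
placebo_arm = {"drug": 70, "placebo": 30, "dk": 0}

# Correct guess is "drug" in the drug arm and "placebo" in the placebo arm.
bi_drug = arm_bi(drug_arm["drug"], drug_arm["placebo"], drug_arm["dk"])
bi_placebo = arm_bi(placebo_arm["placebo"], placebo_arm["drug"], placebo_arm["dk"])

# Pooling: add correct and incorrect guesses across arms, then apply the formula.
pooled = arm_bi(drug_arm["drug"] + placebo_arm["placebo"],
                drug_arm["placebo"] + placebo_arm["drug"],
                drug_arm["dk"] + placebo_arm["dk"])

print(f"drug arm BI = {bi_drug:.2f}, placebo arm BI = {bi_placebo:.2f}, pooled = {pooled:.2f}")
```

Both arms lean toward "Drug", yet the pooled value is 0, which would look like random guessing.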
Example 3:
Gilron et al. (Pain 2005)
A placebo-controlled RCT of
perioperative administration of
gabapentin, rofecoxib and their
combination for spontaneous and
movement-evoked pain after abdominal
hysterectomy.
Patient satisfaction and blinding
questionnaires were completed on the
afternoon of postoperative day 2.
Our suggestion in practice
Clearly state who was blinded.
Blinding procedures and assessments should be
reported in publications for relevant trials.
- One good approach may be to report
James’ BI and Bang’s BI together for the
totality of evidence, while understanding the pros
and cons of each method and their different
underlying assumptions.
- Selective reporting can be problematic.
Debates:
When to ask blinding questions?
Shortly after randomization vs. during vs.
after trial?
Sackett (2004), Henneicke-von Zepelin
(2005) and Hemilä (2005) claim that
‘assessment may be inappropriate after
the trial’ due to confounding between
efficacy and correct guessing.
Bang et al. (2005, 2010)
Statistically speaking, of course, the best
approach is to ask twice or more. However, we
still prefer ‘after the trial’. Although we may not
be able to know whether blinding is true blinding, whether a DK
is a genuine DK, or who lied, we never want to break the
blind. The more we ask, the more curious participants may
become: ‘less is more’.
Blinding data can tell a story about the ‘entire’
course of the trial, such as early and late efficacy
and side effects, wrongdoings, and lies.
If you want to test the blinding at the beginning,
do so with a third party or in a pilot study.
Blinding & Patient satisfaction
questionnaires together?
We may ask some more qualitative questions:
For example,
‘Why do you believe you received treatment x?’
‘When/how did you find out?’
preferably together with other general questions
(e.g., participants’ satisfaction, problems or
comments/suggestions) at the study close-out.
Again, we do not want participants to try to
guess, even in their future trials, or to feel that not
knowing is bad.
Discussion
• We should encourage all participants to
provide honest guesses and may include
extra questions to evaluate 1) the credibility of
DK answers and 2) the reasons for their guesses, etc.
• Subgroup analysis by blinding status could be
important (Vitamin C trial 1975; Hemilä 1996).
• Assessment of blinding can be statistically straightforward,
but the final conclusion depends on subjective judgment and
on the nature of the study (e.g., how large is large enough?).
Discussion (cont’)
• 1) BI estimation (with 95% CI),
2) classification into the 9 blinding scenarios, and
3) careful interpretation and identification of potential causes
may together provide a comprehensive evaluation of
blinding in RCTs.
• If ‘undesirable unblinding’ occurs, it is important
to identify the causes and fix the problems for
future studies; not all unblinding is undesirable!
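Step 2 above can be sketched as a simple rule, but the rule is an assumption. The nine scenarios are not spelled out in this text, so the sketch only assumes that each arm's Bang BI falls into one of three categories according to whether its 95% CI lies above 0, lies below 0, or covers 0, which gives 3 x 3 = 9 combinations. The function names are hypothetical.

```python
def classify_arm(lower: float, upper: float) -> str:
    """Classify one arm's Bang BI by whether its CI excludes 0 (assumed rule)."""
    if lower > 0:
        return "correct guessing beyond chance"
    if upper < 0:
        return "opposite guessing beyond chance"
    return "consistent with random guessing"


def blinding_scenario(drug_ci, placebo_ci):
    """Pair of per-arm categories; 3 x 3 = 9 possible combinations."""
    return classify_arm(*drug_ci), classify_arm(*placebo_ci)


# CIs reported for Example 1 (acupuncture vs. sham).
print(blinding_scenario((0.33, 0.61), (-0.49, -0.13)))
```

With the Example 1 intervals, the acupuncture arm guesses correctly beyond chance, while the sham arm leans toward "acupuncture". The Example 1 slide reads this pattern as ‘wishful thinking’.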
Discussion (cont’)
• Blinding research is destined to be subjective,
qualitative, and imperfect.
• However, empirical (quantitative) evidence is
almost always good to have.
• At the end of the day, what is the impact on the primary
treatment effect?
-- Unblinding may not invalidate the primary results.
Personal belief about
good treatment/trial
‘Treatment effect’
should be greater than
‘Noncompliance effect’
should be greater than
‘Unblinding effect’
CONSORT has been extended to RCTs of
nonpharmacologic treatment
(Boutron et al. for the CONSORT group.
Ann Int Med 2008)
- Blinding issues are extensively discussed
in this paper/statement.
If you still think blinding is not
important, think about:
Would the same verdict be reached with or without
blinding?
Can you distinguish Coke from Pepsi when
blinded?
“If blinding is not associated with treatment
effects, why do the vast majority of drug
trials use a double-blind design? Is the world
uninformed or wrong?” Furberg and Soliman
(2008)
A Sample
Blinding Assessment Protocol
See:
Bang et al. Blinding Assessment in Clinical
Trials: a Review of Statistical Methods and
a Proposal of Blinding Assessment
Protocol. Clinical Research & Regulatory
Affairs. (2010).
Acknowledgements
Dr. Isabel Canette and Ms. Jiefeng Chen in the
Stata team
-- "blinding" module to compute the two BIs, available
in Stata since 2008.
http://biostat.mc.vanderbilt.edu/wiki/Main/NM_R_FUNCTIONS
SAS code available (Author: Chris Smith)
Co-authors: Ms. Liyun Ni (at Amgen) and Dr.
Clarence E. Davis (at UNC)