Statistical considerations for
clinical trials
Guei-Feng Tsai, Ph.D.
Center for Drug Evaluation, Taiwan
Outline
General principles
Study design considerations
Data analysis considerations
Concluding remarks
Four phases of clinical trials
Phase I
- Seeking maximum tolerated dose (MTD)
- Often performed in healthy volunteers
Phase II
- “proof of concept” stage
- Search for minimal effective dose
- Short-term efficacy/safety will be explored
Phase III (confirmatory phase)
- Provide firm evidence of efficacy and safety
- For approval of the drug, with a few exceptions, it is necessary to conduct
two adequate and well-controlled trials with results statistically
significant at 0.05 level.
Phase IV (post-marketing)
- Long term efficacy/safety
General Principles
When in the exploratory phase, analyses of a variety of
subgroups, endpoints, covariates, etc. are appropriate and
welcome. Such trials cannot be the basis of the formal
proof of efficacy.
When conducting a confirmatory study, the principle is to
perform the protocol-defined analyses to show that the
investigational drug is safe and effective.
More statistical considerations arise in later-phase,
confirmatory trials.
Considerations on clinical trials
Primary endpoint(s)
Study design
– Tools against bias
– Choice of design
– Type of comparison
Sample size calculation
Interim analyses
Statistical analyses
– Analysis population and handling of missing data
– Statistical methods
– Control of type I error (multiplicity issue)
Primary endpoint(s)
Provide the most convincing evidence directly related to the
primary objective.
One primary endpoint is generally expected in a confirmatory
trial.
Specify how the outcome will be measured.
– binary (event occurred or not occurred)
– count (the frequency of an event observed in a time period)
– time to event (how long it takes to observe an event of interest)
If derived from a questionnaire, its validity and reliability should be
established.
Redefinition of the primary endpoint after unblinding will
almost always be unacceptable.
Case #1: Biased measurement
Indication: Prevention of metastatic bone disease due to breast cancer
Primary endpoint: Total count of bone events during the 96-week study
period (8 periods of 12 weeks each)
• Favors subjects who spent less time in the trial (favors dropouts)
Amended: Total count of bone events adjusted by follow-up time (n/t)
• Favors subjects who spent less time in the trial and had no fracture
May consider: “time to the first bone event”
Multiple primary endpoints
It may sometimes be desirable to use more than one
primary variable, each of which (or a subset of which)
could be sufficient to cover the range of effects of the
therapies.
– Alzheimer's disease: ADAS-Cog and change from baseline
in ADCS-ADL
The impact on the type I error should be explained because
of the potential for multiplicity problems
– If effects must be demonstrated on all of the designated primary
variables, no adjustment of the type I error is needed
Composite primary endpoints
A single measure of effect from a combined set of different
variables
Common in time to event analyses
CV example
– Primary: time to first occurrence of any event from the
composite of CV death, MI, and stroke
– Secondary:
• time to first occurrence of each component of the
primary composite endpoint individually in the order
of MI, death from vascular causes and then stroke
• The time to first occurrence of any event from the
composite of all-cause mortality, MI, and stroke
Tools against bias
Controlled
– No control (single arm)
– Placebo or active controls
Blindness
– Limit the conscious and unconscious bias in the conduct
and interpretation of a trial
– double-blind, single-blind, open-label
– When double-blind is not feasible, single-blind or open-label
designs may be acceptable if there is
• an objective primary efficacy endpoint
• a blinded reader
Tools against bias (cont’d)
Randomization
– Reduce selection bias
– Produce comparable groups in which the distributions of
prognostic factors are similar
– Generate a randomization list using a validated system that will
automate the random assignment of treatment arms to
randomization numbers in a specified ratio
Type of randomization
– Simple randomization
– Block randomization
– Stratified randomization
– Dynamic randomization
Simple randomization
Example: The computer generated sequence:
4,8,3,2,7,2,6,6,3,4,2,1,6,2,0, …….
– Two treatment groups A & B (criterion: even → A, odd → B):
• AABABAAABAABAAA …… (after 15 patients: 11A, 4B)
– Two groups with a different randomization ratio (e.g., 2:3):
• criterion: {0,1,2,3} → A, {4,5,6,7,8,9} → B
• BBAABABBABAABAA …
Advantage
– Simplest method (similar to flipping a coin)
– Allocation scheme is predefined and unchanged
Disadvantage
– If sample size is small, may cause severe imbalance in the
numbers of each arm
– May lead to imbalance among the treatment groups with respect to
prognostic factors
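The two digit-to-treatment rules in the example above can be reproduced with a short Python sketch. The function names are hypothetical and the seed used for the fresh sequence is an arbitrary illustrative choice; a real trial would use a validated randomization system.

```python
import random

def assign_even_odd(digits):
    """1:1 allocation: even digit -> A, odd digit -> B."""
    return "".join("A" if d % 2 == 0 else "B" for d in digits)

def assign_by_sets(digits, a_digits=frozenset({0, 1, 2, 3})):
    """Digits in a_digits -> A, all other digits -> B (2:3 with the default set)."""
    return "".join("A" if d in a_digits else "B" for d in digits)

# The computer-generated sequence shown on the slide
slide_digits = [4, 8, 3, 2, 7, 2, 6, 6, 3, 4, 2, 1, 6, 2, 0]

one_to_one = assign_even_odd(slide_digits)
two_to_three = assign_by_sets(slide_digits)
print(one_to_one, one_to_one.count("A"), one_to_one.count("B"))
print(two_to_three)

# Generating a new sequence (seed chosen arbitrarily for reproducibility)
rng = random.Random(2024)
fresh_digits = [rng.randrange(10) for _ in range(15)]
print(assign_even_odd(fresh_digits))
```

With only 15 patients the 1:1 rule already yields an 11 to 4 split, which illustrates the imbalance risk listed under "Disadvantage".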
Block randomization
Avoid overall imbalance
Generate a sequence of blocks, each block contains a prespecified number of treatment assignments in random order
Divide patients into blocks of equal sizes
Allocation scheme is predefined and unchanged
Example: 2 treatments (A & B)
– All 6 possible assignments for block size=4
• AABB, ABAB, ABBA, BBAA, BAAB, BABA
• For each block, randomly choose one possibility
ABABBABAABBA…
Patients: 1 2 3 4 5 6 7 8 9 10 11 12 …
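A minimal Python sketch of permuted-block randomization for two treatments follows. It enumerates the possible blocks and concatenates randomly chosen ones; the function names and the seed are illustrative assumptions, not part of the original slides.

```python
import random
from itertools import permutations

def possible_blocks(block_size=4):
    """All distinct orderings of a block holding equal numbers of A and B."""
    half = block_size // 2
    return sorted({"".join(p) for p in permutations("A" * half + "B" * half)})

def block_randomize(n_patients, block_size=4, seed=None):
    """Concatenate randomly chosen blocks until n_patients are covered."""
    rng = random.Random(seed)
    blocks = possible_blocks(block_size)
    sequence = ""
    while len(sequence) < n_patients:
        sequence += rng.choice(blocks)
    return sequence[:n_patients]

blocks = possible_blocks(4)
schedule = block_randomize(12, block_size=4, seed=1)
print(blocks)    # the 6 possible assignments for block size 4
print(schedule)  # one 12-patient allocation list
```

Every completed block contains two A's and two B's, so the overall imbalance can never exceed half a block.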
Block randomization (cont’d)
If not blinded, may produce selection bias, because the last
assignments in a block become predictable (block size 6):
– AABA?? must be BB
– ABABA? must be B
– Solution: blinding or random block sizes
Disadvantage
– Does not account for prognostic factors either
Stratified randomization
Ensure treatment groups balanced with respect to
prognostic factors
Allocation scheme is predefined and unchanged
Does not use information on previously assigned subjects
Randomization is performed within each stratum and is
usually blocked
Stratified randomization (cont’d)
Example: age (<40, 40-60, >60); sex (M, F)
– Total number of strata = 3 x 2 = 6
Age     Male             Female
<40     ABBA, BAAB, …    ABBA, BAAB, …
40-60   ABBA, BAAB, …    ABAB, BBAA, …
>60     AABB, ABBA, …    BAAB, ABAB, …
Disadvantage:
– Performs poorly if there are too many strata
Dynamic randomization
Rationale:
– 4 prognostic factors each with 3 levels, for example,
• 3 x 3 x 3 x 3 = 81 strata
– If only 120 patients will be enrolled, some strata will probably
contain no patients and many more will contain only 1
– Imbalance across treatments for any factor could still exist
Dynamic (adaptive) randomization
– Treatment assignment probabilities are adjusted
– Uses information on previously assigned subjects
– Ensures treatment groups are balanced with respect to prognostic factors
– Ensures overall treatment balance
Dynamic randomization (cont’d)
Example: 3 prognostic factors (age, sex, disease status)
– Suppose there are 50 patients enrolled and the 51st patient
is male, age 63, and stage III
Factor    Level       Trt A   Trt B
Sex       Male        16      14
          Female      10      10
Age       <40         13      12
          40-60       9       6
          >60         4       6
Disease   Stage I     6       4
          Stage II    13      16
          Stage III   7       4
Dynamic randomization (cont’d)
Category    Trt A   Trt B   Sign of difference (A-B)
Male        16      14      +
Age >60     4       6       –
Stage III   7       4       +
Total       27      24      2 “+” and 1 “–”
Two possible criteria
– Count only the direction (sign) of the difference in each category. Trt A
is “ahead” in two categories out of three, so assign the patient to Trt B
– Add the totals over all categories (27 As vs 24 Bs). Since Trt A is
“ahead,” assign the patient to Trt B
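The two criteria can be written as a small Python sketch using the counts from the example. The data layout, the function names, and the tie handling (returning None so the caller can fall back to simple randomization) are assumptions made for illustration.

```python
# (Trt A, Trt B) counts among the first 50 patients, per factor level (from the slide)
counts = {
    ("sex", "Male"): (16, 14),
    ("sex", "Female"): (10, 10),
    ("age", "<40"): (13, 12),
    ("age", "40-60"): (9, 6),
    ("age", ">60"): (4, 6),
    ("stage", "I"): (6, 4),
    ("stage", "II"): (13, 16),
    ("stage", "III"): (7, 4),
}

def category_totals(counts, levels):
    """Sum the A and B counts over the new patient's categories."""
    total_a = sum(counts[lv][0] for lv in levels)
    total_b = sum(counts[lv][1] for lv in levels)
    return total_a, total_b

def assign_by_sign(counts, levels):
    """Criterion 1: assign the arm that is 'behind' in more categories."""
    a_ahead = sum(1 for lv in levels if counts[lv][0] > counts[lv][1])
    b_ahead = sum(1 for lv in levels if counts[lv][0] < counts[lv][1])
    if a_ahead == b_ahead:
        return None  # tie: caller falls back to simple randomization (assumption)
    return "B" if a_ahead > b_ahead else "A"

def assign_by_total(counts, levels):
    """Criterion 2: assign the arm with the smaller total over the categories."""
    total_a, total_b = category_totals(counts, levels)
    if total_a == total_b:
        return None  # tie: caller falls back to simple randomization (assumption)
    return "B" if total_a > total_b else "A"

# The 51st patient: male, age 63, stage III
patient_51 = [("sex", "Male"), ("age", ">60"), ("stage", "III")]
print(category_totals(counts, patient_51))
print(assign_by_sign(counts, patient_51), assign_by_total(counts, patient_51))
```

This sketch assigns the favored arm deterministically. Because the slides describe adjusting assignment probabilities, an implementation might instead assign the favored arm with a high probability rather than with certainty.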
Type of design
Parallel design
– The most common clinical trial design
– Subjects are randomized to one of two (or more) treatment arms.
[Diagram: Randomization (Baseline) → Treatment A or Treatment B]
Type of design (cont’d)
Crossover design
– Minimizes the number of subjects required
– Within-patient comparison
– Requirements for crossover design
• Disease is chronic and stable
• Effect develops fully within the treatment period
• Washout period is sufficiently long
• No carryover effect
Type of design (cont’d)
Factorial design
– Two or more treatments are evaluated simultaneously through the use
of varying combinations of the treatments
Usually used in trials of combination drugs/treatments
Regulatory requirement: each component must make a
contribution to the claimed effect of the combination
Example: combination of ACE inhibitor and HCTZ for
treating hypertension
– In general, want to see ACE/HCTZ > ACE and ACE/HCTZ > HCTZ
– Four arms (2 x 2): placebo, ACE inhibitor, HCTZ, ACE/HCTZ
Type of comparison
Superiority
– To demonstrate new treatment (T) is superior to placebo (P) or
active control (C)
Equivalence
– To demonstrate new treatment (T) is equivalent to active control
(C) within a small margin δ (-δ < T-C < δ)
– Generic drug/vaccine lot-to-lot consistency
Non-inferiority (NI)
– Ethical concern
– To demonstrate new treatment is not worse than active control
by more than a small margin δ (T-C > -δ)
Issues in NI trials
Issue: assay sensitivity
Assay sensitivity is the ability of the trial to distinguish
effective from ineffective drugs
A trial that successfully demonstrates superiority has
simultaneously demonstrated assay sensitivity
Similarity of test drug and active control can mean either
both drugs were effective or neither was effective
The difficulty with NI trials is that we do not measure
the effect of the active control in the study
– Solution: 3-arm NI trials
Issues in NI trials (cont’d)
Issue: constancy assumption
Constancy of the active control effect: the current active control
effect needs to be assessed with respect to the following
– patient population
– endpoints
– dose
– use of washout period
– medical practice and concomitant medications
Changes in the above can affect the effect size of the active
control and, therefore, the appropriate margin, or
completely undermine assay sensitivity
Constancy assumption may not hold
Julious and Wang (2008)
Case #2: Invalid NI margin
Population: osteoarthritis (OA) of the knee
Treatment arms: Test drug (T), active control (C), placebo (P)
Primary endpoint: Change from baseline in the WOMAC pain subscale
NI margin: 10 mm
Comment: The choice of 10 mm as the non-inferiority margin was not
justified.
WOMAC Pain at Week 12
Arm   Change from BL, LS mean   Diff vs. C, LS mean (95% CI)   Diff vs. P, LS mean (95% CI)
T     -41.99                    -0.22 (-4.76, 4.32)            -6.36 (-11.98, -0.73)
C     -41.77                    -                              -6.14 (-11.75, -0.52)
P     -35.64                    -                              -
Choice of NI margin
Selection of a NI margin is based upon a combination
of statistical reasoning and clinical judgment
An appropriate choice of margin should provide
assurance that the test drug has a clinically relevant
effect greater than zero
Points to Consider on Choice of the Non-Inferiority Margin (CPMP/EWP/2158/99)
Determination of NI margin
Fixed margin method
– Two 95% confidence intervals (CIs)
1) A conservative estimate of the control effect:
M = Lower limit of the 95% CI of treatment difference
(C– P) from the historical trials
2) Get a 95% CI for the difference between test drug and
control (T–C) and note if lower bound of the CI is > -M
Synthesis method
– The fraction retention approach
– Define the NI margin based on the concept of preserving a
certain fraction of the active control effect (historical data)
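The two steps of the fixed margin method can be sketched in Python as below. The numbers are hypothetical and chosen only for illustration; the sketch assumes larger values of the endpoint are better, and the function name is an assumption rather than an established API.

```python
def fixed_margin_check(hist_cp_ci_lower, tc_ci_lower):
    """Fixed margin method with M set to the lower 95% CI limit of C - P.

    Assumes larger endpoint values are better, so non-inferiority is
    concluded when the lower bound of the T - C interval lies above -M.
    Returns (M, non_inferiority_concluded).
    """
    if hist_cp_ci_lower <= 0:
        # The historical interval includes zero: no reliable control effect
        raise ValueError("historical data do not show a control effect")
    margin = hist_cp_ci_lower
    return margin, tc_ci_lower > -margin

# Hypothetical inputs: historical C - P lower limit 4.0, current T - C lower limit -2.5
margin, non_inferior = fixed_margin_check(hist_cp_ci_lower=4.0, tc_ci_lower=-2.5)
print(margin, non_inferior)
```

In this sketch the margin equals the whole conservative estimate of the control effect. Preserving only a fraction of that effect, as described under the synthesis method, would lead to a smaller margin.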
Case #3: NI margin (Fixed margin method)
Indication: Parkinson’s disease
Treatment arms: Test drug (T), active control (C)
Primary efficacy endpoint: Change from baseline to end of maintenance
period (Week 33) in the sum of Unified Parkinson’s Disease
Rating Scale (UPDRS) parts II and III scores
NI margin: 3 points
Rationale and references: The margin of 3 points was chosen based on the
lower bound of the 95% CI for the treatment difference
in UPDRS in prior trials of drug C vs. placebo (Trials
#1 and #2)
Case #4: NI margin (Synthesis method)
Study design: Phase III, randomized, multi-national, multi-center,
open-label, non-inferiority study in patients with
previously treated metastatic colon cancer
Treatment arms: Test drug (T), active drug (C)
Primary efficacy endpoint: Overall survival (OS)
Primary efficacy analysis: Cox model
Primary objective: T is non-inferior to C for OS (i.e., T retains at least
50% of the OS benefit of C relative to best supportive
care)
Non-inferiority analysis method: Synthesis method
Sample size estimation
Early exploratory clinical trials: usually no sample size
justification is required
Phase III confirmatory trials: determination of sample size should
be justifiable
– Main hypothesis of the trial (superiority/non-inferiority/equivalence)
– Measure of outcome (continuous, categorical, time to event)
– Test statistic (t-test, χ2 test, Fisher’s exact test or log-rank test)
– H0 and Ha, an important treatment difference (Δ) and SD (σ)
– Type I error rate (α): probability of concluding an ineffective drug is effective
– Type II error rate (β): probability of concluding an effective drug is ineffective
– Drop-out rate (withdrawals and protocol violations)
Sample size estimation (cont’d)
Sample size calculation based on: (comparisons of means)
– Type I error (α, often 0.05)
– Type II error (β, often 0.2 or 0.1; 1-β is the power)
– Mean difference Δ
– Standard deviation of observation (σ)
– Predicted drop-out rate
– Allocation ratio (2 arms 1:1, 2:1, or 3:1)
Total sample size for 1:1 allocation:
$$N = \frac{4\,(Z_{\alpha/2} + Z_{\beta})^{2}\,\sigma^{2}}{\Delta^{2}}$$
Example: α=0.05, β=0.1, Δ=1, σ=3
• 1:1 allocation: 380 subjects
• 2:1 allocation: 429 subjects (13% increase)
• 3:1 allocation: 508 subjects (34% increase)
• 5:1 allocation: 684 subjects (80% increase)
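The example figures can be reproduced with the Python sketch below. The 1:1 total comes from the usual normal-approximation formula for comparing two means. For a k:1 allocation, the sketch inflates the 1:1 total by (k+1)²/(4k) and rounds up to a multiple of k+1; that rounding rule is an assumption that happens to match the numbers on the slide, and no drop-out adjustment is applied.

```python
import math
from fractions import Fraction
from statistics import NormalDist

def total_n_equal_allocation(alpha, beta, delta, sigma):
    """Total N for two equal arms: N = 4 (Z_{alpha/2} + Z_beta)^2 sigma^2 / delta^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(1 - beta)
    per_arm = math.ceil(2 * (z * sigma / delta) ** 2)  # round each arm up
    return 2 * per_arm

def total_n_unequal(n_equal, k):
    """Inflate a 1:1 total to k:1 allocation, rounded up to a multiple of k + 1."""
    # Smaller arm = n_equal * (k + 1) / (4 k); exact fraction avoids float error
    smaller_arm = math.ceil(Fraction(n_equal * (k + 1), 4 * k))
    return smaller_arm * (k + 1)

n_equal = total_n_equal_allocation(alpha=0.05, beta=0.1, delta=1, sigma=3)
print("1:1", n_equal)
for k in (2, 3, 5):
    n_k = total_n_unequal(n_equal, k)
    print(f"{k}:1", n_k, f"({100 * (n_k / n_equal - 1):.0f}% increase)")
```

The inflation factor grows quickly with the allocation ratio, which is why strongly unbalanced designs are costly in total sample size.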
Case #5: sample size estimation
Treatment: Test drug vs. placebo
Primary hypothesis: Superiority (comparison of proportions)
Sample size: 92 subjects (46/group)
Parameter assumption: α=0.05, power=80%; p0=5%, p1=25%
Final results: p-value>0.05; p0=40%, p1=50%
Required sample size (40% vs. 50%): 770 subjects (385/group)
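The gap between the planned and the required sample size in this case can be checked with a short Python sketch. It assumes a two-sided test and the unpooled normal-approximation formula for two proportions; the slides do not state which formula was used, so small rounding differences are possible.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p0, p1, alpha=0.05, power=0.80):
    """Unrounded per-group n for comparing two proportions (unpooled variance)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return z ** 2 * variance_sum / (p1 - p0) ** 2

planned = n_per_group_proportions(0.05, 0.25)   # design assumption
observed = n_per_group_proportions(0.40, 0.50)  # rates actually observed
print(round(planned, 1), round(observed, 1))
```

Under these assumptions the planned value is about 46 per group, while the observed rates would call for about 385 per group, so the study was substantially underpowered for the difference actually seen.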
Interim analysis
Long-term studies for life-threatening diseases usually use a
group sequential design and perform interim analyses for
stopping the trial early due to efficacy or futility.
Interim analyses to assess efficacy do spend α: the overall type I
error grows with the number of efficacy analyses when each is
tested at the 0.05 level
# of efficacy analyses   Overall type I error
1                        0.050
2                        0.083
3                        0.107
4                        0.126
5                        0.142
10                       0.190
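The inflation shown in the table can be approximated by simulation. The sketch below assumes equally spaced looks, a two-sided test at the nominal 0.05 level at every look, and normally distributed data under the null hypothesis; the number of simulations and the seed are arbitrary choices, so the estimates will only be close to the tabulated values.

```python
import random

def overall_type1_error(n_looks, n_sims=20000, z_crit=1.96, seed=0):
    """Estimate P(reject at any look | H0) with unadjusted repeated testing."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)  # new block of data under H0
            z_stat = cumulative / look ** 0.5  # standardized cumulative statistic
            if abs(z_stat) > z_crit:
                false_positives += 1
                break  # trial stops at the first "significant" look
    return false_positives / n_sims

for k in (1, 2, 3, 4, 5, 10):
    print(k, round(overall_type1_error(k), 3))
```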
Interim analysis (cont’d)
Common methods for controlling type I error
– Fixed boundaries (Pocock, O’Brien-Fleming, Haybittle-Peto)
• Specify the number K of planned interim analyses and the exact
timing of each interim analysis in advance
– Spending function approach
• The spending function α*(t) is an increasing function of the
information fraction t (0<t<1)
• Specify the spending function in advance
• Cannot change the spending function during the trial
Design adaptations based on the results of interim analyses
should be pre-planned at the protocol design stage
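As an illustration of the spending function approach, the Python sketch below evaluates two commonly cited Lan-DeMets forms: an O'Brien-Fleming-type function and a Pocock-type function. Choosing these two forms is an assumption made for illustration; the slides do not say which spending function a given trial should use.

```python
import math
from statistics import NormalDist

def obrien_fleming_type(t, alpha=0.05):
    """O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{alpha/2} / sqrt(t)), 0 < t <= 1."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))

def pocock_type(t, alpha=0.05):
    """Pocock-type spending: alpha * ln(1 + (e - 1) * t), 0 < t <= 1."""
    return alpha * math.log(1 + (math.e - 1) * t)

# Cumulative alpha spent at a few information fractions
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obrien_fleming_type(t), 4), round(pocock_type(t), 4))
```

Both functions spend the full α at t = 1. The O'Brien-Fleming-type function spends very little early on, which keeps the final analysis close to the nominal level.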
Analysis population (ITT or PP)
Superiority trials
– ITT is recommended, as it generally gives a conservative estimate
of the treatment effect
• all randomized subjects (with a baseline measurement and at least 1 post-treatment measurement)
Non-inferiority/Equivalence trials
– ITT can be subject to bias from non-compliance, whereas PP can be
subject to selection bias
– PP is biased because not all randomized patients are included
– It is recommended that non-inferiority trials be analyzed
in both the ITT and PP populations
Handling of missing data
Excluding some patients with missing data is not
compatible with the ITT principle
Methods for the imputation of missing values
– Non-responder for binary data
– Last observation carried forward (LOCF)
– Single/Multiple imputation
– Other mathematical imputations
Sensitivity analyses (best-case scenario, worst-case
scenario, etc.) are essential
Case #6: Handling of missing data
Study design: A multicenter, randomized, double-blind, placebo-controlled,
parallel-group Phase III study for the treatment of
osteoporosis in men: Test drug (T), placebo (P)
Primary endpoint: Proportion of subjects with at least one new morphometric
vertebral fracture over 24 months (logistic regression)
Handling missing data: Missing fracture status was imputed using the LOCF method
in the primary analyses (not conservative)
Sensitivity analyses:
1. Perform the analysis using non-missing data only
2. Perform the analysis after using the single imputation
method of assigning a value of “No fracture” to all subjects
who had undetermined fracture value at Month 24 in the
ITT population
3. Perform the analysis using the multiple imputation method
for binary outcome on the ITT population
Statistical method
Check the assumptions of the statistical method being used to
make comparisons
Specify any transformations of the data likely to be required before
analysis
– mathematical transformations (logarithms, square root, etc) for
normalization of the outcome variables
Choose an appropriate statistical method
If the assumptions of proposed statistical method do not hold,
then alternative statistical approaches (eg, non-parametric)
should be described and pre-specified
Statistical methods by data scale
Continuous, normal
– Independent: t test, ANOVA / ANCOVA
– Dependent: paired t test / GLMM / GEE, repeated measures ANOVA
Continuous, non-normal (non-parametric)
– Independent: Mann-Whitney U test, Kruskal-Wallis test
– Dependent: Wilcoxon signed-rank test / Friedman statistic
Categorical
– Independent: Fisher’s exact test, χ2 test / CMH test, logistic regression
– Dependent: McNemar’s test (binary) / GLMM / GEE
Time to event
– Kaplan-Meier analysis
– Log-rank test
– Cox proportional hazards model
Control of type I error
Multiple treatment comparisons
– eg, 2 or more dosage levels vs. placebo
– Dunnett’s procedure
– Closed testing procedure
• high vs. placebo, middle vs. placebo, low vs. placebo
Multiple primary endpoints
– Bonferroni, Holm, Hochberg, etc.
Multiple primary and secondary endpoints
– Closed testing strategy (Marcus, 1976)
– Gatekeeping methods (Westfall et al., 1999; Dmitrienko, 2003)
Multiple interim analyses
– Pocock (1977), O’Brien and Fleming (1979), Lan and DeMets (1983)
Case #7: multiplicity (multiple primary endpoints)
Population: Clinical trial in patients with COPD
Treatment: Experimental drug vs. placebo
Three primary endpoints:
E1: Change from baseline in FEV1
E2: Change from baseline in total COPD symptom score
E3: Percentage of subjects with one or more COPD exacerbations
Testing strategy: The trial is declared successful if the drug is superior
to placebo with respect to E1 and at least one of E2 and E3
Steps: Perform the 1st primary analysis (on E1), then perform the 2nd
primary analyses (on E2 and E3) with an adjustment for multiplicity
(Hochberg’s method)
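The testing strategy can be sketched in Python as follows. The sketch assumes E1 is tested first at the full α and acts as a gate for E2 and E3, which are then tested with Hochberg's step-up procedure. The p-values are hypothetical and the function names are illustrative.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices ordered from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Largest p is compared with alpha, the next with alpha/2, and so on
        if p_values[idx] <= alpha / (rank + 1):
            for j in order[rank:]:  # reject this one and all smaller p-values
                reject[j] = True
            break
    return reject

def trial_successful(p_e1, p_e2, p_e3, alpha=0.05):
    """Success requires E1 plus at least one of E2 and E3 (assumed gatekeeping)."""
    if p_e1 > alpha:
        return False
    return any(hochberg_reject([p_e2, p_e3], alpha))

# Hypothetical p-values
print(trial_successful(0.01, 0.02, 0.20))  # E2 passes at alpha/2
print(trial_successful(0.01, 0.03, 0.20))  # neither E2 nor E3 passes
```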
Case #8: multiplicity (primary/secondary endpoints)
Population: Clinical trial in patients with depression
Treatment: Experimental drug vs. placebo
Primary endpoint: Change from baseline in HAMD17 score (H01)
Key secondary endpoints:
1) Response rate based on the HAMD17 score (H02)
2) Remission rate based on the HAMD17 score (H03)
Testing strategy (fixed sequence):
– Test H01: if P>α, stop; if P<α, go on to H02
– Test H02: if P>α, stop; if P<α, go on to H03
– Test H03
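A fixed-sequence strategy is simple to express in code. The minimal Python sketch below tests each hypothesis at the full α in the pre-specified order and stops at the first non-significant result; the p-values are hypothetical.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Return the hypotheses rejected by a fixed-sequence procedure.

    ordered_p_values is a list of (name, p_value) in the pre-specified order.
    """
    rejected = []
    for name, p in ordered_p_values:
        if p >= alpha:
            break  # stop: no later hypothesis may be tested
        rejected.append(name)
    return rejected

# Hypothetical p-values for H01 (primary) and H02, H03 (key secondary)
print(fixed_sequence_test([("H01", 0.001), ("H02", 0.03), ("H03", 0.20)]))
print(fixed_sequence_test([("H01", 0.08), ("H02", 0.001), ("H03", 0.001)]))
```

Because testing stops at the first failure, no α adjustment is needed, but a small p-value later in the sequence cannot be claimed once an earlier hypothesis fails.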
Concluding remarks
Adequate and well-controlled clinical trials
Clear, specific, relevant objectives
Selection of representative subjects
Sound efficacy and safety endpoints
Incorporate methods to minimize/avoid bias
Calculation of sample size and power
Appropriate statistical methods of analysis
Selected references
ICH E9 ‘Note for Guidance on Statistical Principles for Clinical Trials’,
September 1998
ICH E10 ‘Note for Guidance on Choice of Control Group’, July 2000
FDA Guidance for Industry: ‘Non-inferiority Clinical Trials’ March 2010
FDA Draft Guidance for Industry: ‘Adaptive Design Clinical Trials for
Drugs and Biologics’, February 2010
CPMP ‘Points to Consider on Switching between Superiority and Non-Inferiority’, July 2000
CHMP ‘Guideline on the Choice of the Non-Inferiority Margin’, July 2005
CHMP ‘Guideline on missing data in confirmatory clinical trials’, 2010
Thank you for your attention!
Q&A