Stratified Analysis of a Binary Endpoint and “Beyond”
Christy Chuang-Stein
Statistical Research and Consulting Center
Pfizer Inc
ASA Biopharm Section Webinar
May 7 2009
Related Webinars Offered Previously
October 21, 2008
Devan Mehrotra - Stratified Analyses: Tips for Improving Power
(http://www.biopharmnet.com/doc/2008_10_21_webinar.pdf)
April 3, 2009
Frank Harrell – Case Study in Parametric Survival Modeling
First 16 slides or so on “Covariable Adjustment in Randomized Clinical Trials”
(http://www.biopharmnet.com/doc/2009_04_03_webinar.pdf)
Outline of This Webinar
Stratified Analysis of a Binary Endpoint
  Inverse vs CMH Weighting
  Simpson’s Paradox and Collapsibility
Beyond
  Stratified Randomization vs Stratified Analysis
  Stratification and Subgroup Analysis
  Sample Sizing for a Multi-regional Trial
  Regulatory Guidances on Global Trials, Data Extrapolation
Conclusion
A Sepsis Study
A confirmatory, double-blind, placebo-controlled trial in severe sepsis; IV treatment of 96 hours duration; randomization stratified by center.
Primary analysis was 28-day mortality rate after treatment onset, stratified by 3 pre-specified covariates: APACHE II score, age and protein C activity.
Trial was terminated by an independent DSMB for efficacy after the 2nd interim analysis of 1520 patients.
Many subgroup analyses were conducted, including APACHE II subgroups (4 defined by the observed quartiles), subgroups defined by the components of the APACHE II score, and subgroups defined by 1, 2, 3, or at least 4 organ dysfunctions.
Notations for 28 Day Mortality Rate
                   APACHE II Score Stratum
Treatment          3-19 (1Q)    20-24 (2Q)    25-29 (3Q)    30-53 (4Q)
New Trt            p11          p12           p13           p14
Placebo            p21          p22           p23           p24
Risk Difference    d1           d2            d3            d4
When Dealing with Binary Outcome
Three measures are commonly used to assess efficacy within the jth APACHE II stratum:
Risk difference dj: p1j – p2j
Relative risk rj: p1j / p2j
Odds ratio oj: { p1j (1 - p2j) } / { (1 - p1j) p2j }
Denote the observed rate by $\hat{p}_{ij} = n_{ij1} / n_{ij+}$, the number of patients with the event over the number of patients for treatment i in stratum j.
We will focus on risk difference. In each stratum, estimate p1j – p2j by $\hat{p}_{1j} - \hat{p}_{2j}$. We will get an overall treatment effect estimate and construct a test statistic.
Test an Overall Treatment Effect
A common approach is to form a weighted average and construct a test statistic for the overall effect as
$$\hat{d} = \frac{\sum_j w_j \hat{d}_j}{\sum_j w_j}, \qquad X^2 = \frac{\hat{d}^2}{\mathrm{var}(\hat{d})}$$
X2 has an asymptotic chi-square distribution with 1 degree of freedom if Σj wj dj = 0.
Choice of Weights – Method 1
Inverse variance – {wj} is equal to the inverse of the sample variance of $\hat{d}_j$. In this case, X2 will be
$$X^2 = \frac{\left(\sum_j w_j \hat{d}_j\right)^2}{\sum_j w_j}$$
When dj = d (the risk difference is uniform across the strata), the inverse variance weighting produces the minimum variance estimate for the common risk difference d, which is unbiased for large samples. This method is favored by meta analysts.
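A short derivation, offered as a sketch, of why the inverse-variance statistic takes this form. It assumes the strata are independent and treats the weights as fixed; the symbol $s_j^2$ for the sample variance of $\hat{d}_j$ is notation assumed here, so that $w_j = 1/s_j^2$.

```latex
\mathrm{var}(\hat{d})
  = \frac{\sum_j w_j^2 s_j^2}{\left(\sum_j w_j\right)^2}
  = \frac{\sum_j w_j}{\left(\sum_j w_j\right)^2}
  = \frac{1}{\sum_j w_j},
\qquad
X^2 = \frac{\hat{d}^2}{\mathrm{var}(\hat{d})}
    = \left(\frac{\sum_j w_j \hat{d}_j}{\sum_j w_j}\right)^2 \sum_j w_j
    = \frac{\left(\sum_j w_j \hat{d}_j\right)^2}{\sum_j w_j}.
```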
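Choice of Weights – Method 2
CMH method – {wj} is equal to $(1/n_{1j+} + 1/n_{2j+})^{-1} = n_{1j+} n_{2j+} / n_j$, which is proportional to the harmonic mean of n1j+ and n2j+. This method produces the X2 test by Cochran, which is asymptotically equivalent to a test developed by Mantel and Haenszel. Continuity correction could be applied.
Writing $n_{ij}$ for $n_{ij+}$, $n_j = n_{1j} + n_{2j}$, and $\bar{p}_j$ for the pooled event rate in stratum j:
$$X^2_C = \frac{\left(\sum_j \frac{n_{1j} n_{2j}}{n_j} \hat{d}_j\right)^2}{\sum_j \frac{n_{1j} n_{2j}}{n_j} \bar{p}_j (1 - \bar{p}_j)}, \qquad X^2_{MH} = \frac{\left(\sum_j \frac{n_{1j} n_{2j}}{n_j} \hat{d}_j\right)^2}{\sum_j \frac{n_{1j} n_{2j}}{n_j - 1} \bar{p}_j (1 - \bar{p}_j)}$$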
CMH Method
Let fj represent the relative frequency of patients in the jth stratum in the population. When the study population mimics the target population, the CMH estimate is approximately unbiased for Σj fj dj.
This makes CMH weighting attractive when one is not sure if the treatment effect is the same across the strata.
Assumptions on True Mortality Rates
                        Disease Severity Score
Treatment               1Q      2Q      3Q      4Q
New Trt                 12%     21%     27%     36%
Placebo                 12%     24%     36%     48%
True Risk Difference    0%      -3%     -9%     -12%
When the mortality rate is low, there is not much room to
improve. Most of the benefit is in the high-risk population.
Impact of Weighting
Weighting by the relative frequency of a stratum within the population leads to an overall treatment effect Σj fj dj of 0.25*(0%)+0.25*(-3%)+0.25*(-9%)+0.25*(-12%) = -6%, i.e., a 6% absolute reduction in mortality.
Assume equal allocation within each stratum. The overall treatment effect estimate under the CMH weighting will approach a 6% reduction for large samples.
If we use the inverse variance weighting, we will weight treatment effects in the 1Q, 2Q, 3Q and 4Q by 2.23 : 1.38 : 1.20 : 1.00. The effect estimate will approach a 4.5% reduction for large samples.
The inverse variance weighting will underestimate the magnitude of the parameter Σj fj dj of interest in this case.
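The two large-sample limits above can be checked with a few lines of Python. This is a minimal sketch, not the presenter's calculation: it assumes equal-sized quartile strata, equal allocation, and a stratum variance proportional to p1j(1 - p1j) + p2j(1 - p2j). The variable names are illustrative.

```python
# True mortality rates from the "Assumptions on True Mortality Rates" slide.
p_new = [0.12, 0.21, 0.27, 0.36]
p_placebo = [0.12, 0.24, 0.36, 0.48]
f = [0.25, 0.25, 0.25, 0.25]  # assumed: quartile strata of equal size

d = [a - b for a, b in zip(p_new, p_placebo)]

# With equal allocation, CMH weights are proportional to stratum size,
# so the CMH estimate approaches the f-weighted average of the d_j.
cmh_limit = sum(fj * dj for fj, dj in zip(f, d))

# Assumption: with equal arm sizes in every stratum, the variance of each
# stratum estimate is proportional to p1(1 - p1) + p2(1 - p2).
var = [a * (1 - a) + b * (1 - b) for a, b in zip(p_new, p_placebo)]
w = [1 / v for v in var]
iv_limit = sum(wj * dj for wj, dj in zip(w, d)) / sum(w)

print(f"CMH limit:              {cmh_limit:.2%}")
print(f"Inverse-variance limit: {iv_limit:.2%}")
print("Relative inverse-variance weights (4Q = 1):",
      [round(wj / w[-1], 2) for wj in w])
```

Under these assumptions the limits come out near -6% and -4.5%. The relative weights printed here depend on the variance formula assumed, so they need not match the rounded ratios quoted on the slide exactly.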
Results for 28 Day Mortality Rate
                   APACHE II Score Stratum
Treatment          3-19 (1Q)      20-24 (2Q)     25-29 (3Q)     30-53 (4Q)
New Trt            15% (N=218)    22% (N=218)    24% (N=204)    38% (N=210)
Placebo            12% (N=215)    26% (N=222)    36% (N=162)    49% (N=241)
Risk Difference    3%             -4%            -12%           -11%
Findings from the Sepsis Trial
The CMH test statistic has a value of 7.310 with 1 degree of freedom (no continuity correction). The two-sided P-value is 0.0068. The CMH test statistic computes the variance assuming p1j = p2j for all j.
A 95% CI for the overall difference in the mortality rate (new treatment – placebo) under the CMH weighting is (-9.8%, -1.6%). The calculation of variance in this case does not assume p1j = p2j.
The inverse variance approach produces a 95% CI for the difference in the mortality rate (new treatment – placebo) of (-8.1%, -0.1%).
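The findings above can be approximated from the rounded table on the results slide. The sketch below is illustrative only: it works from rounded percentages rather than the actual event counts, so it should land close to, but not exactly on, the reported values. The helper name `weighted_estimate` is hypothetical.

```python
from math import sqrt

# Rounded rates and sample sizes as read from the results slide (1Q to 4Q).
p_new = [0.15, 0.22, 0.24, 0.38]
n_new = [218, 218, 204, 210]
p_pbo = [0.12, 0.26, 0.36, 0.49]
n_pbo = [215, 222, 162, 241]

d = [a - b for a, b in zip(p_new, p_pbo)]
# Unpooled variance of each stratum risk difference (does not assume p1j = p2j).
s2 = [a * (1 - a) / na + b * (1 - b) / nb
      for a, na, b, nb in zip(p_new, n_new, p_pbo, n_pbo)]


def weighted_estimate(weights):
    """Return the weighted risk difference and its standard error."""
    total = sum(weights)
    est = sum(w * dj for w, dj in zip(weights, d)) / total
    se = sqrt(sum(w ** 2 * v for w, v in zip(weights, s2))) / total
    return est, se


w_cmh = [na * nb / (na + nb) for na, nb in zip(n_new, n_pbo)]
w_iv = [1 / v for v in s2]

# Mantel-Haenszel form of the test: variance computed under p1j = p2j.
pooled = [(a * na + b * nb) / (na + nb)
          for a, na, b, nb in zip(p_new, n_new, p_pbo, n_pbo)]
null_var = sum(na * nb / (na + nb - 1) * pj * (1 - pj)
               for na, nb, pj in zip(n_new, n_pbo, pooled))
x2_mh = sum(w * dj for w, dj in zip(w_cmh, d)) ** 2 / null_var

Z95 = 1.96  # two-sided 95% normal quantile
cmh_est, cmh_se = weighted_estimate(w_cmh)
iv_est, iv_se = weighted_estimate(w_iv)
cmh_ci = (cmh_est - Z95 * cmh_se, cmh_est + Z95 * cmh_se)
iv_ci = (iv_est - Z95 * iv_se, iv_est + Z95 * iv_se)

print(f"CMH chi-square: {x2_mh:.2f}")
print(f"CMH estimate {cmh_est:.3f}, 95% CI ({cmh_ci[0]:.3f}, {cmh_ci[1]:.3f})")
print(f"Inverse-variance estimate {iv_est:.3f}, "
      f"95% CI ({iv_ci[0]:.3f}, {iv_ci[1]:.3f})")
```

The inverse-variance interval sits closer to zero than the CMH interval because the first quartile, where the observed difference is positive, has the smallest variance and therefore the largest inverse-variance weight.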
Comparing across Strata
The difference in the mortality rates (new treatment – placebo) in the 4 APACHE II strata ranges from 3% to -12%.
The graph suggests a possible
interaction that might be
qualitative in nature.
We will look at an approach
proposed by Gail and Simon
(1985, Biometrics, 41:361-372)
to test for qualitative interaction.
[Figure: observed risk difference (new treatment – placebo) by APACHE II quartile, 1Q to 4Q; vertical axis from -0.15 to 0.05.]
Dmitrienko et al (2005). Analysis of Clinical Trials Using SAS.
Test for Qualitative Interaction
Let O+ = {d : dj ≥ 0 for all j}, the set of non-negative differences.
Let O- = {d : dj ≤ 0 for all j}, the set of non-positive differences.
$$Q^+ = \sum_{j=1}^{J} \frac{\hat{d}_j^2}{s_j^2} I(\hat{d}_j > 0), \qquad Q^- = \sum_{j=1}^{J} \frac{\hat{d}_j^2}{s_j^2} I(\hat{d}_j < 0), \qquad Q = \min(Q^+, Q^-)$$
Here $s_j^2$ is the sample variance of $\hat{d}_j$.
Q > c can be used to test the null hypothesis of no qualitative interaction.
Q follows a fairly complex distribution based on a weighted sum of chi-square distributions. SAS code is available in the book by Dmitrienko et al.
Test for Qualitative Interaction
Q+ can be used to test the null hypothesis of all
differences being negative. Q- can be used to test the
null hypothesis of all differences being positive.
For the sepsis study, the two-sided Gail-Simon test has
a P-value of 0.4822.
The one-sided P-value for H0 of positive differences (new treatment – placebo) is 0.0030. The one-sided P-value for H0 of negative differences is 0.6005.
Like other interaction tests, G-S test requires strong
evidence before we can reject the no qualitative
interaction hypothesis.
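For readers without SAS, the Gail-Simon P-values can be sketched in plain Python. This is an illustrative sketch, not the code from Dmitrienko et al. It uses the binomial mixture of chi-square tails from Gail and Simon (1985) and the rounded rates from the results slide, so the output should be close to, but not identical to, the reported P-values. The function names `chi2_sf` and `gail_simon` are hypothetical.

```python
from math import comb, erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper tail P(chi-square_df > x) for integer df >= 1, using the recursion
    sf(df + 2) = sf(df) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    if x <= 0:
        return 1.0
    k = 1 if df % 2 else 2
    sf = erfc(sqrt(x / 2)) if k == 1 else exp(-x / 2)
    while k < df:
        sf += (x / 2) ** (k / 2) * exp(-x / 2) / gamma(k / 2 + 1)
        k += 2
    return sf


def gail_simon(d, s2):
    """Return (two-sided P, one-sided P for H0 'all differences positive',
    one-sided P for H0 'all differences negative')."""
    j = len(d)
    q_pos = sum(dj ** 2 / v for dj, v in zip(d, s2) if dj > 0)
    q_neg = sum(dj ** 2 / v for dj, v in zip(d, s2) if dj < 0)
    q = min(q_pos, q_neg)
    p_two = sum(comb(j - 1, h) * 0.5 ** (j - 1) * chi2_sf(q, h)
                for h in range(1, j))

    def one_sided(stat):
        return sum(comb(j, h) * 0.5 ** j * chi2_sf(stat, h)
                   for h in range(1, j + 1))

    # Negative estimates are the evidence against "all differences positive".
    return p_two, one_sided(q_neg), one_sided(q_pos)


# Rounded rates and sample sizes from the results slide (1Q to 4Q).
p_new, n_new = [0.15, 0.22, 0.24, 0.38], [218, 218, 204, 210]
p_pbo, n_pbo = [0.12, 0.26, 0.36, 0.49], [215, 222, 162, 241]
d = [a - b for a, b in zip(p_new, p_pbo)]
s2 = [a * (1 - a) / na + b * (1 - b) / nb
      for a, na, b, nb in zip(p_new, n_new, p_pbo, n_pbo)]

p_two, p_h0_positive, p_h0_negative = gail_simon(d, s2)
print(f"two-sided: {p_two:.3f}")
print(f"H0 all positive: {p_h0_positive:.4f}")
print(f"H0 all negative: {p_h0_negative:.3f}")
```

The two-sided test is driven entirely by the single positive difference in the first quartile, which is small relative to its standard error. That is why the test does not come close to rejecting.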
In the End…
Data from this single study led to the approval of
Xigris®
Xigris® INDICATIONS AND USAGE
Xigris is indicated for the reduction of mortality in
adult patients with severe sepsis (sepsis associated
with acute organ dysfunction) who have a high risk
of death (e.g., as determined by APACHE II).
Safety and efficacy have not been established in
adult patients with severe sepsis and lower risk of
death.
Table in the Package Insert
                              Xigris                      Placebo
APACHE II quartile (score)    Total    Mortality rate     Total    Mortality rate
1st + 2nd (3-24)              436      18.8%              437      19.0%
3rd + 4th (25-53)             414      30.9%              403      43.7%
Patients who have a high risk for death are represented by an APACHE II score in the 3rd and 4th APACHE II score categories.
Treatment effects need to differ more than what is shown in this case for the Gail-Simon test to conclude interaction.
Questions
Could one have anticipated this extent of treatment
difference before the trial?
If yes, what would have been a good design and
analysis strategy?
Options
Specify the high risk population as the primary analysis
population and enroll adequate patients in this group.
Test both the high risk population and the entire
population with adjustment for multiplicity.
Analysis follows the design strategy.
The LIFE Study
Losartan Intervention For Endpoint Reduction in
Hypertension Study.
Conducted at 945 sites in 7 countries.
Enrolled 9193 hypertensive patients with left ventricular hypertrophy (LVH).
The primary endpoint is a composite endpoint of cardiovascular deaths, stroke, and myocardial infarction.
Results reviewed by the FDA Cardiovascular and Renal Drugs AC on Jan 6 2003 for a new proposed indication:
Cozaar is indicated to reduce the risk of cardiovascular
morbidity and mortality as measured by the combined
incidence of cardiovascular death, stroke, and myocardial
infarction in hypertensive patients with left ventricular
hypertrophy.
Some Background
Losartan’s then label states that the effect in blood pressure reduction in blacks was somewhat less than that in whites (a common statement for beta-blockers).
FDA statistician quoted data from three endpoint studies of other drugs. These studies demonstrated less or no treatment effect in blacks when compared to whites.
On the primary endpoint, when compared to atenolol, losartan had a hazard ratio of 0.869 (95% CI from 0.772 to 0.979) with a P-value of 0.021. The effect came primarily from the stroke component of the composite.
The issue of how losartan compared to atenolol in blacks came up.
Hazard Ratio and 95% CIs - Primary Endpoint
[Figure: forest plot of hazard ratios and 95% CIs for the primary endpoint. Rows: Overall, United States, Male, Female, Black, White, Age < 65, Age 65 or over.]
Gail-Simon Test
Nominal p-value for Black vs. Non-Black Qualitative
Interaction = 0.016.
Impossible to correctly adjust this p-value for multiple
comparisons post hoc.
3 subgroups pre-specified for special importance (U.S.
region, Diabetics, ISH)
To do it correctly, the formal analysis plan would need to
list all important subgroups and specify a method to
correctly adjust for the number of tests.
Source: John Lawrence’s (FDA Statistical Reviewer) slides at the
January 6 2003 FDA AC meeting. For more discussion, see
http://www.fda.gov/ohrms/dockets/ac/03/slides/3920s1.htm
COZAAR® Package Insert
Indications and Usage
… COZAAR is indicated to reduce the risk of
stroke in patients with hypertension and left
ventricular hypertrophy, but there is evidence that
this benefit does not apply to Black patients. …
Clinical Pharmacology
In the LIFE study, Black patients treated with atenolol were at
lower risk of experiencing the primary composite endpoint
compared with Black patients treated with COZAAR…. This
finding could not be explained on the basis of differences in
the populations other than race or on any imbalances between
treatment groups… the LIFE study provides no evidence that
the benefits of COZAAR on reducing the risk of cardiovascular
events in hypertensive patients with left ventricular
hypertrophy apply to Black patients.
Observations
In the case of Xigris, subgroups defined by APACHE II score were pre-specified. Statistical significance was not achieved by the Gail-Simon test at the 5% level.
In the case of COZAAR, race subgroups were not pre-specified. They are, however, among the “usual” demographic subgroups and there is a priori reason for looking at this subgroup. A post hoc Gail-Simon test produced a P-value less than 0.05.
The end results (language in the product package insert) are similar – the label describes differential treatment effects in the subgroups.
Clinical Summary of Safety
Study        Drug A    Drug B
1            8%        4%
2            7%        6%
3            1%        1%
4            1%        2%
5            21%       20%
6            8%        10%
Total Avg    13%       9.5%
# of Pts     1000      750
13% vs 9.5%: a two-sided P-value of 0.023.
Clinical Summary of Safety
Study        Drug A    # of Pts    Drug B    # of Pts
1            8%        100         4%        100
2            7%        100         6%        100
3            1%        100         1%        100
4            1%        100         2%        100
5            21%       500         20%       250
6            8%        100         10%       100
Total Avg    13%       1000        9.5%      750
95% CI for the diff (A – B) using inverse variance weighting is (-0.017, 0.018) with a point estimate of 0.001. What happens?
The study with the highest AE rates had twice as many
subjects on Drug A as on Drug B.
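The contrast between crude pooling and a stratified analysis can be reproduced from the per-study rows. The sketch below is illustrative: it uses the rounded percentages in the table, so crude rates recomputed from those rows may differ slightly from the rounded totals on the slide.

```python
from math import sqrt

# (rate on A, patients on A, rate on B, patients on B) for studies 1 to 6.
studies = [
    (0.08, 100, 0.04, 100),
    (0.07, 100, 0.06, 100),
    (0.01, 100, 0.01, 100),
    (0.01, 100, 0.02, 100),
    (0.21, 500, 0.20, 250),
    (0.08, 100, 0.10, 100),
]

# Crude pooling: total events over total patients, ignoring study.
crude_a = (sum(pa * na for pa, na, _, _ in studies)
           / sum(na for _, na, _, _ in studies))
crude_b = (sum(pb * nb for _, _, pb, nb in studies)
           / sum(nb for _, _, _, nb in studies))

# Stratified analysis: inverse-variance weighted within-study differences.
diffs = [pa - pb for pa, _, pb, _ in studies]
variances = [pa * (1 - pa) / na + pb * (1 - pb) / nb
             for pa, na, pb, nb in studies]
weights = [1 / v for v in variances]
iv_est = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
iv_se = 1 / sqrt(sum(weights))
iv_ci = (iv_est - 1.96 * iv_se, iv_est + 1.96 * iv_se)

print(f"Crude rates: A {crude_a:.1%}, B {crude_b:.1%}")
print(f"Stratified difference (A - B): {iv_est:.3f}, "
      f"95% CI ({iv_ci[0]:.3f}, {iv_ci[1]:.3f})")
```

The crude comparison is distorted because study 5 contributes half of the Drug A patients but only a third of the Drug B patients. The stratified estimate compares the drugs within each study and so is not affected by that imbalance.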
Simpson’s Paradox
             Study 1                            Study 2
Treatment    Event        No Event     Total    Event       No Event     Total
New          180 (60%)    120 (40%)    300      60 (30%)    140 (70%)    200
Control      60 (60%)     40 (40%)     100      60 (30%)    140 (70%)    200
• Within each study, the two groups have the same event rates.
• Study 1 randomized patients 1:1:1:1 to 3 doses and 1 control.
• Study 2 randomized patients 1:1 to one dose and control.
Results Pooled over Studies
Treatment    Event        No Event     Combined
New          240 (48%)    260 (52%)    500
Control      120 (40%)    180 (60%)    300
Pooling produces an event rate of 48% for the new
treatment and 40% for the control.
The chi-square statistic has a two-sided P- value =
0.028.
Conducting un-stratified (un-adjusted) analysis in this
case will lead to an erroneous conclusion.
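A minimal sketch of the two analyses, using the counts from the two studies. The unstratified P-value is computed with a normal approximation that is equivalent to the 1 df Pearson chi-square test without continuity correction. The function name `two_sided_p` is illustrative.

```python
from math import erfc, sqrt

# (events, patients) for the new treatment and control in each study.
studies = [((180, 300), (60, 100)), ((60, 200), (60, 200))]


def two_sided_p(e1, n1, e2, n2):
    """Two-sided P-value for comparing two proportions (Pearson chi-square,
    1 df, no continuity correction) via the equivalent z statistic."""
    pooled = (e1 + e2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (e1 / n1 - e2 / n2) / se
    return erfc(abs(z) / sqrt(2))


# Unstratified analysis: collapse the two studies into one 2x2 table.
e_new = sum(new[0] for new, _ in studies)
n_new = sum(new[1] for new, _ in studies)
e_ctl = sum(ctl[0] for _, ctl in studies)
n_ctl = sum(ctl[1] for _, ctl in studies)
p_pooled = two_sided_p(e_new, n_new, e_ctl, n_ctl)

# Stratified analysis: CMH-weighted average of within-study risk differences.
weights = [new[1] * ctl[1] / (new[1] + ctl[1]) for new, ctl in studies]
diffs = [new[0] / new[1] - ctl[0] / ctl[1] for new, ctl in studies]
stratified_diff = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)

print(f"Pooled rates: new {e_new / n_new:.0%}, control {e_ctl / n_ctl:.0%}")
print(f"Unstratified two-sided P-value: {p_pooled:.3f}")
print(f"Stratified risk difference: {stratified_diff:.3f}")
```

The stratified difference is zero because the event rates agree within each study. The apparent pooled difference comes only from the different randomization ratios combined with different baseline event rates.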
Collapsibility
In this example, the risk difference is not collapsible
over the studies (i.e., we can’t ignore “study”).
Randomization (treatment assignment) is not
independent of study in the two-way marginal table of
treatment by study.
                 Study 1    Study 2    Combined
New Treatment    300        200        500
Control          100        200        300
Total            400        400
Collapsibility
When both randomization ratio and risk difference are
the same across studies, risk difference is collapsible
over studies.
In this case, the proportion of events for each treatment is a weighted average of the proportions in individual studies with weights proportional to the study sizes.
                 Study 1      Study 2      Combined
New Treatment    60%          30%          40%
Control          60%          30%          40%
Total            (3:1) 400    (3:1) 800
In General
If the two treatments have the same effect in all
studies (null hypothesis) and in addition, the
randomization ratio is the same, then risk difference,
risk ratio, and odds ratio are all collapsible across
studies.
In the above case, the risk difference is 0 and the
relative risk and odds ratio are 1.
Otherwise, collapsibility depends on the chosen measure of association (risk difference, risk ratio, odds ratio) - Greenland, 1998, Encyclopedia of Biostatistics.
Collapsibility Depends on Measure
1:1 randomization, equal risk difference in two studies
              Study 1          Study 2          Combined
New Trt       80% (N = 100)    40% (N = 100)    60%
Control       60% (N = 100)    20% (N = 100)    40%
Risk Diff     0.20             0.20             0.20
Risk Ratio    1.33             2.00             1.50
Odds Ratio    2.67             2.67             2.25
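The table can be recomputed directly from event counts. In this small sketch the counts are derived from the stated rates with 100 patients per arm, and the function name `measures` is illustrative.

```python
def measures(events_new, n_new, events_ctl, n_ctl):
    """Risk difference, risk ratio and odds ratio for one 2x2 table."""
    p1, p2 = events_new / n_new, events_ctl / n_ctl
    return {
        "rd": p1 - p2,
        "rr": p1 / p2,
        "or": (p1 / (1 - p1)) / (p2 / (1 - p2)),
    }


study1 = measures(80, 100, 60, 100)
study2 = measures(40, 100, 20, 100)
# Collapsing over study: add up events and patients within each arm.
combined = measures(80 + 40, 200, 60 + 20, 200)

for name, m in (("Study 1", study1), ("Study 2", study2), ("Combined", combined)):
    print(f"{name:9s} RD {m['rd']:.2f}  RR {m['rr']:.2f}  OR {m['or']:.2f}")
```

The risk difference is the same in both studies and survives pooling. The odds ratio is also the same in both studies, yet the pooled odds ratio is smaller than either, which is the sense in which the odds ratio is not collapsible here.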
Observations
Meta analysis procedure is frequently used to
combine efficacy results.
Should use meta analysis (stratified analysis) when
summarizing safety data from different studies,
especially when studies have different patient
populations and/or different randomization ratios.
If there is no a priori information suggesting different
risk differences for different studies, inverse variance
weighting would be a good choice.
Should always consider stratified analysis when
covariates are highly correlated with the response.
Stratified (Adjusted) Analysis
Factor defining strata is prognostic of response.
Allowing comparison within more homogeneous
groups.
Factor defining strata is predictive of treatment
effect.
Issue of interaction
Evaluating treatment effect with subgroups
Overall treatment effect might be less meaningful if
the interaction between treatment and factor is
substantial
Stratified Randomization vs Analysis
If we employ stratified randomization, the convention is to
include the stratifying factor in the analysis
(CPMP/EWP/2863/99 on adjustment for baseline
covariates).
When there are >=50 patients in each treatment group,
Grizzle found that there was little advantage to using
stratified randomization with two strata when the strata
are roughly equally represented (Grizzle, Controlled
Clinical Trials, 1982).
The incremental benefit of stratified randomization beyond that due to the stratified analysis is minimal (Permutt, DIJ 2007).
Stratified Randomization vs Analysis
The above is due to the fact that, for a reasonable sample
size, the chance that the randomization will produce the
type of imbalance that will substantially affect the
inference is low.
If a stratum is small, stratified randomization could reduce
the chance of imbalance.
If we are forced to treat un-stratified analysis as the
primary analysis, stratified randomization could generally
give us results close to those from an adjusted analysis.
Stratified allocation is used to ensure adequate (or even
greater) representation of a particular type of patients in
the study.
Permutt, DIJ 2007
50 subjects will be randomized to each of two treatments. There are 50 men and 50 women. Gender is a prognostic factor and could be used as a stratifying factor for randomization and/or analysis, resulting in 4 options: stratified randomization and analysis (R&A), stratified randomization only (R Only), stratified analysis only (A Only), Neither.
Assume the standard deviation is 10, and a treatment effect that will result in 80% power with 25 per group per gender under the R&A option (i.e., Δ = 5.6).
Assume no treatment by gender interaction, with the gender effect varying between 0 and 20.
Permutt, DIJ 2007
Under “A Only” (stratified analysis without stratified
randomization), the power was calculated for each
possible (treatment,gender) allocation combination. The
power was then averaged using probability under the
hypergeometric distribution as the weight.
Under option “R Only” (stratified randomization without
stratified analysis), Type I error could be lower than the
nominal level (two-sided 5%) because the reduction in
the variance of the estimated treatment effect due to
stratified randomization is not properly accounted for in
the analysis. (See the original paper.)
Numerical Results (Permutt, DIJ 2007)
Gender effect    R&A      R Only    A Only    Neither
0                0.800    0.800     0.799     0.800
2                0.800    0.799     0.799     0.796
4                0.800    0.795     0.799     0.784
6                0.800    0.790     0.799     0.765
8                0.800    0.783     0.799     0.739
12               0.800    0.765     0.799     0.671
16               0.800    0.744     0.799     0.590
20               0.800    0.724     0.799     0.508
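The “Neither” column can be approximated with a simple normal calculation. This is a sketch under stated assumptions, not Permutt's computation. It assumes that ignoring gender in both randomization and analysis adds g²/4 to the outcome variance when men and women are equally represented (g is the gender effect). It also uses a one-tailed normal approximation to the two-sided 5% test. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()
SIGMA = 10.0      # within-gender standard deviation (from the slide)
DELTA = 5.6       # treatment effect (from the slide)
N_PER_ARM = 50    # 25 men and 25 women per treatment group
Z_CRIT = norm.inv_cdf(0.975)


def approx_power(sd):
    """Normal-approximation power of a two-sample comparison with given SD."""
    se = sd * sqrt(2 / N_PER_ARM)
    return norm.cdf(DELTA / se - Z_CRIT)


def power_neither(gender_effect):
    """Power when gender is ignored in both randomization and analysis.
    Assumption: a 50/50 gender mix inflates the variance by g**2 / 4."""
    return approx_power(sqrt(SIGMA ** 2 + gender_effect ** 2 / 4))


for g in (0, 2, 4, 6, 8, 12, 16, 20):
    print(f"gender effect {g:2d}: approximate power {power_neither(g):.3f}")
```

Under these assumptions the approximation tracks the “Neither” column closely. This suggests the loss of power in that column comes from the gender effect being absorbed into the residual variance.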
Stratification & Subgroup Analysis
How does the treatment perform in patients with mild
disease?
Do patients with mild/moderate disease respond to the treatment similarly to patients with severe disease?
This is typically phrased as an interaction between treatment and disease severity at baseline.
If heterogeneous effect (interaction) exists, is it
qualitative or quantitative?
Subgroup Analysis: Issues
Multiplicity leading to inflated false positive rate
Lack of statistical power leading to inflated false
negative rate
Treatment groups not comparable within subgroups because randomization was not done within the subgroups
Appropriate reporting/interpretation to ensure
scientifically defensible and balanced conclusion
We will focus on the first two issues here.
False Positive
Multiplicity
With multiple subgroup analyses, the probability of a false positive finding is substantial.
With 10 independent tests (α=0.05), the chance of at least one false positive is > 40%.
Lagakos (2006) NEJM 354;16
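The 40% figure follows from the complement rule for independent tests. A small worked example, with the illustrative function name `prob_any_false_positive`:

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: {prob_any_false_positive(k):.1%}")
```

Subgroup tests in a real trial are rarely independent, so this should be read as a rough guide to how fast the error rate grows rather than as an exact figure.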
Forest Plot of Treatment Effect
Hypothetical study
Typical Result
4000 patients in 20 countries (200 patients each) with a control arm risk of 20% and an experimental arm risk of 15%.
Homogeneous absolute risk reduction of 5% in all countries.
Marschner (DIA Annual Meeting)
Simulation Study of Country Differences
In 10,000 simulations of similar studies, the largest and smallest treatment effects among the 20 countries were calculated.
– On average, the largest treatment effect among the 20 countries was a 15% absolute risk reduction on the experimental therapy.
– On average, the smallest treatment effect among the 20 countries was a 5% absolute risk increase on the experimental therapy.
Purely by chance, the observed experimental treatment effect in different countries can be expected to range from extremely beneficial to apparently harmful.
Marschner (DIA Annual Meeting)
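A simulation along these lines can be sketched as follows. It is not Marschner's code. It assumes 1:1 allocation within each country (100 patients per arm) and uses 1,000 replicates instead of 10,000 to keep the run short. The function name and seed are arbitrary.

```python
import random


def simulate_country_extremes(n_sims=1000, n_countries=20, n_per_arm=100,
                              p_control=0.20, p_experimental=0.15, seed=2009):
    """Average largest and smallest observed absolute risk reduction
    (control rate minus experimental rate) across countries."""
    rng = random.Random(seed)
    total_max = total_min = 0.0
    for _ in range(n_sims):
        reductions = []
        for _ in range(n_countries):
            ctl = sum(rng.random() < p_control for _ in range(n_per_arm))
            exp = sum(rng.random() < p_experimental for _ in range(n_per_arm))
            reductions.append((ctl - exp) / n_per_arm)
        total_max += max(reductions)
        total_min += min(reductions)
    return total_max / n_sims, total_min / n_sims


mean_max, mean_min = simulate_country_extremes()
print(f"Average largest risk reduction:  {mean_max:.1%}")
print(f"Average smallest risk reduction: {mean_min:.1%}")
```

A negative smallest reduction corresponds to an apparent risk increase on the experimental therapy, even though the true reduction is 5% in every country.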
Prob of Neg Result for One Subgroup
Assuming two groups and a continuous endpoint:
Factors increasing the probability
• Substantial imbalance between treatment groups
• Substantial differences in the subgroup size
• A large number of subgroups
Factors decreasing the probability
• Balanced treatments and subgroup size
• A large treatment effect size
• A large sample size
Disjoint Subgroups
2-sided α = 0.05
1:1 ratio with perfect balance between treatments
Various scenarios for subgroup size
# of Subgroups    80% Power    90% Power
3                 15 – 35%     9 – 30%
5                 40 – 60%     30 – 35%
Li, Chuang-Stein, Hoseyni, DIJ (2007), 41:47-56.
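The lower ends of these ranges can be reproduced with a closed-form calculation. The sketch is illustrative and is not taken from the cited paper. It assumes a normal endpoint with known variance, a uniform treatment effect, perfect 1:1 balance, and disjoint subgroups, so that the subgroup estimates are independent. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

norm = NormalDist()


def prob_any_negative(fractions, power, alpha_two_sided=0.05):
    """Probability that at least one disjoint subgroup shows a negative
    estimated effect, given each subgroup's share of the total sample."""
    # Standardized overall effect implied by the design's power.
    drift = norm.inv_cdf(1 - alpha_two_sided / 2) + norm.inv_cdf(power)
    prob_all_positive = 1.0
    for fraction in fractions:
        # A subgroup holding this share of patients has a standardized
        # effect of drift * sqrt(share).
        prob_all_positive *= norm.cdf(drift * sqrt(fraction))
    return 1 - prob_all_positive


for k in (3, 5):
    for power in (0.80, 0.90):
        equal = [1 / k] * k
        print(f"{k} equal subgroups, {power:.0%} power: "
              f"{prob_any_negative(equal, power):.1%}")
```

Equal-sized subgroups give values near the lower end of each range in the table. Unequal sizes, for example `prob_any_negative([0.6, 0.3, 0.1], 0.80)`, push the probability higher because the smallest subgroup has a much noisier estimate.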
Overlapping Subgroups (Simulations)
Each baseline covariate defines 3 subgroups with
equal proportions (2 or 5 covariates).
Probabilities based on simulations (1000 replicates).
Unconditional on the overall result.
                  Effect Size = 0.25            Effect Size = 0.50
# of Subgroups    80% Power      90% Power      80% Power     90% Power
                  (253/group)    (338/group)    (64/group)    (86/group)
6                 24%            17%            22%           15%
15                38%            26%            43%           27%
Overlapping Subgroups (Simulations)
Each baseline covariate defines 3 subgroups with
equal proportions (2 or 5 covariates).
Probabilities based on simulations (1000 replicates).
Conditional on a statistically significant overall result.
                  Effect Size = 0.25            Effect Size = 0.50
# of Subgroups    80% Power      90% Power      80% Power     90% Power
                  (253/group)    (338/group)    (64/group)    (86/group)
6                 12%            11%            13%           9%
15                28%            21%            27%           21%
MERIT-HF Trial
The only pivotal trial to assess the efficacy and safety of
metoprolol (Toprol XL) as an adjunctive therapy to
optimal standard therapy for patients with congestive
heart failure.
There were 3991 patients from several hundred sites in the US and 13 European countries.
The study has two primary endpoints, total mortality and
a composite endpoint.
27% of the patients (539 on placebo and 532 on
metoprolol) were from the US.
HR & 95% CI - Total Mortality
[Figure: forest plot of hazard ratios and 95% CIs for total mortality; horizontal axis from 0.0 to 2.0, labeled “Favors Toprol-XL” and “Favors Placebo”. Rows: All, US, Europe, NYHA II, NYHA III, NYHA IV, EF <= 0.25, EF > 0.25, Previous acute MI: Y, Previous acute MI: N, Gender - Male, Gender - Female, Age <= 69.4, Age > 69.4.]
Implication for Designing Global Trials
Desire to control (minimize) the probability
of observing a negative treatment effect in
at least one region when the treatment
effect is positive and uniform across all
regions in a multi-regional (global) trial.
Current State
Bob O’Neill: PhRMA/FDA Workshop on Multi-Regional
Trials 2007.
Robert Califf: PhRMA/FDA Workshop 2007
MRCT Cross-Functional Key Issues Team
The Biostatistics and Data Management Group
convened a Multi-Regional Clinical Trials (MRCT)
Key Issues team after the workshop.
Bruce Binkowitz (Merck), stat co-chair of the MRCT working group, will present the group’s progress at the Harvard/Schering Plough workshop on May 28-29. The theme of the workshop is “Global Trials: Challenges and Opportunities”.
Simultaneous Global Development Committee
PhRMA also has a SGD Committee.
Its focus is Regulatory, seeking to enable a
regulatory framework for allowing global
development of therapies that could
result in simultaneous global submissions with one
single global data-set
expedite global patient access to these products
Current focus is China, Korea, Taiwan and Japan.
Asian Region Cooperation
The 1st China-Japan-Korea Ministerial Meeting on
Health was held in Korea in April 2007. The 2nd one
took place in Nov 2008 in Beijing.
They declared in the “Joint Statement of the First
Tripartite Health Ministers Meeting (THMM)” to jointly
promote cooperation in areas of Clinical Researches,
...
Cooperation in an investigation on ethnic factors
MHLW set up a study group to investigate differences in
PK/PD and safety among Asian populations
The 1st report on PK differences is targeted for 2Q2009
– The Goal: Could Asia be regarded as “one population”?
One Approach for Sample Sizing
1. A continuous endpoint that follows a normal
distribution. Large values are desirable.
2. Treatment effect within each region is estimated by the
difference in the observed means (or observed mean
changes from baseline).
3. Effect size (Δ/σ) is uniform across regions.
4. The one-sided significance level for the primary
analysis on the overall treatment effect is 2.5%. Power
to detect (Δ/σ) is 1-β.
5. For simplicity, we will work with 3 regions with 1:1
allocation to 2 treatments.
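The assumptions above determine the per-group sample size N. A minimal sketch, assuming a two-sample z-test with known σ (the function name and the known-σ simplification are illustrative, not taken from the slides):

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power, alpha=0.025):
    """Per-group N for a one-sided two-sample z-test (known sigma assumed).

    effect_size is the standardized effect (Delta / sigma); alpha is the
    one-sided significance level. Uses N = 2 * ((z_alpha + z_beta) / effect)^2.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return ceil(2 * (z_sum / effect_size) ** 2)
```

Under these assumptions, n_per_group(0.25, 0.80) gives 252 and n_per_group(0.25, 0.90) gives 337 patients per group.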
Framework (Kawai et al DIJ, 2008)
Sample Size/Group: N
The number N is determined to provide an 80% or 90% power
for the primary analysis at the one-sided 2.5% level.
Region      Relative size    Share of N    Estimated treatment effect (New treatment - Placebo)
Region 1    Smallest         p1            D1
Region 2    2nd smallest     p2            D2
Region 3    Largest          p3            D3
Due to the constraints of p1 ≤ p2 ≤ p3 and p1+p2+p3=1,
p1 ≤ p2 ≤ (1 - p1)/2.
Basis for Deciding Regional Size
We want a high probability (e.g. 80% or 90%) that
the point estimates for the treatment effect in all
regions are positive.
[Figure: estimated treatment effects D1, D2 and D3 (New treatment - Placebo) for Regions 1, 2 and 3, shown against a reference value of 0.]
Pcs = Probability that three regions show consistent results.
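A sketch of how the unconditional Pcs could be computed, assuming independent normally distributed regional estimates with known σ and N chosen to give exactly the stated power. Under those assumptions the probability that region i shows a positive estimate is Φ(√pi · (z_α + z_β)), and Pcs is the product over regions. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def unconditional_pcs(shares, power, alpha=0.025):
    """Probability that every regional point estimate is positive.

    shares: regional fractions of N (p1, p2, p3), summing to 1.
    Assumes a common effect size and N set to give exactly `power`
    at one-sided level `alpha`, so the effect size itself cancels out.
    """
    z = NormalDist()
    theta = z.inv_cdf(1 - alpha) + z.inv_cdf(power)  # mean of overall z-statistic
    prob = 1.0
    for p in shares:
        prob *= z.cdf(sqrt(p) * theta)  # Pr(D_i > 0) for a region with share p
    return prob
```

Because the effect size cancels, this sketch depends only on the regional shares, the power and the significance level.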
Plots of Pcs against p1 with p2=p1
[Figure: probability of observing all Di > 0 (Pcs, vertical axis, 0.0 to 1.0) plotted against p1 (horizontal axis, 0.05 to 0.30), with one curve for Power: 90% and one for Power: 80%. Values 0.151, 0.213 and 0.277 are marked on the p1 axis. Annotation: Pcs never reaches 90%.]
Worst case with two small regions and a large one.
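The smallest p1 that reaches a target Pcs in this worst case (p2 = p1, p3 = 1 - 2·p1) can be found numerically. The sketch below is illustrative only: it assumes the same independent-normal, known-σ setup, and the function names are hypothetical. It uses bisection, which relies on Pcs increasing in p1 up to p1 = 1/3.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()


def pcs_two_small(p1, power, alpha=0.025):
    """Unconditional Pcs for regional shares (p1, p1, 1 - 2*p1)."""
    theta = _Z.inv_cdf(1 - alpha) + _Z.inv_cdf(power)
    small = _Z.cdf(sqrt(p1) * theta)
    large = _Z.cdf(sqrt(1 - 2 * p1) * theta)
    return small * small * large


def min_p1_for_target(target, power, alpha=0.025, tol=1e-6):
    """Smallest p1 with Pcs >= target, or None if never reached by p1 = 1/3."""
    lo, hi = 1e-9, 1 / 3
    if pcs_two_small(hi, power, alpha) < target:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pcs_two_small(mid, power, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

Under these assumptions, a 90% target is not attainable at 80% power, since even three equal regions fall short of it.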
But …
In practice, inference concerning regional results
(as a secondary analysis) is relevant only if the
overall treatment effect in the confirmatory trial is
statistically significant.
The above calls for looking at Pcs conditional on
first concluding a significant overall treatment
effect at the one-sided 2.5% level.
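One way to approximate the conditional Pcs is by simulation. The sketch below is an assumption-laden illustration, not the method behind the slides: it treats the overall test as an unstratified z-test with known σ, draws standardized regional estimates, and keeps only the simulated trials in which the overall result is significant. The function name, seed and number of simulations are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def conditional_pcs(shares, power, alpha=0.025, n_sim=200_000, seed=1):
    """Monte Carlo estimate of Pr(all D_i > 0 | overall test significant)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha)
    theta = crit + z.inv_cdf(power)  # mean of the overall z-statistic
    rng = random.Random(seed)
    n_significant = 0
    n_consistent = 0
    for _ in range(n_sim):
        # Standardized regional estimates: X_i ~ N(sqrt(p_i) * theta, 1).
        x = [rng.gauss(sqrt(p) * theta, 1.0) for p in shares]
        # Overall z-statistic is the sqrt(p_i)-weighted sum of the X_i.
        z_overall = sum(sqrt(p) * xi for p, xi in zip(shares, x))
        if z_overall > crit:
            n_significant += 1
            if all(xi > 0 for xi in x):
                n_consistent += 1
    return n_consistent / n_significant
```

Because a significant overall result and positive regional estimates are positively associated, the conditional value should be no smaller than the unconditional one, up to simulation error.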
Conditional Pcs vs Unconditional Pcs
Entries are conditional Pcs (unconditional Pcs), in %.
(p1, p2, p3)          80% power      90% power
(0.05, 0.05, 0.9)     57.5 (53.7)    64.6 (58.6)
(0.1, 0.1, 0.8)       71.1 (65.6)    73.8 (71.7)
(0.15, 0.15, 0.7)     82.5 (73.4)    82.5 (79.9)
(0.2, 0.2, 0.6)       86.1 (78.9)    90.0 (85.3)
(0.25, 0.25, 0.5)     90.9 (82.5)    92.2 (88.8)
(0.3, 0.3, 0.4)       93.2 (84.5)    94.4 (90.7)
Treatment effect = 0.250, σ = 1
Conditional Pcs When p2 = p1
[Figure: conditional Pcs plotted against p1 for Power: 90% and Power: 80%, with the unconditional Pcs curves shown for comparison. At each power level, ○ marks Δ/σ = 0.125 and + marks Δ/σ = 0.250.]
PMDA Guidance (Sept 28 2007)
Basic Principles on Global Clinical Trials
(http://www.pmda.go.jp/english/publications/index.html)
Method 1
Look at D_Japan/D_all. Want
Pr(D_Japan/D_all > 0.5 | common D) > 80%
Method 2
The “consistency” approach.
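The Method 1 criterion above can be approximated in closed form. The sketch below is an illustration under stated assumptions rather than the guidance's own calculation: normal estimates with known σ, a common effect, N set to give exactly the stated power, and the ratio event replaced by the linear event D_Japan - 0.5·D_all > 0 (which ignores the small chance that D_all is not positive). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_region_retains_fraction(share, power, fraction=0.5, alpha=0.025):
    """Approximate Pr(D_region / D_all > fraction) under a common effect.

    share: the region's fraction of the total sample size.
    The standardized difference D_region - fraction * D_all has mean
    (1 - fraction) * theta and variance 1/share - 2*fraction + fraction**2.
    """
    z = NormalDist()
    theta = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    sd = sqrt(1 / share - 2 * fraction + fraction ** 2)
    return z.cdf((1 - fraction) * theta / sd)
```

Under this approximation, at 90% power a region needs a share of a little over 22% of the total sample size to meet the 80% requirement.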
EMEA Reflection Paper
Released for public comments in January 2009.
Questions the relevance of some clinical data from emerging
regions to support marketing applications in EU due to
– Intrinsic factors including genetics and nature of disease
– Extrinsic factors including medical practice, disease definition and
study population
Includes 5 product areas where extrapolation of study results
to European population had been found to be difficult.
Encourages an in-depth prospective evaluation of factors if the
trial is to provide evidence to support EU filing. It is possible
that additional clinical trials within EU might be necessary if
extrapolation is judged to be problematic.
Summary
When there is no reason to suspect the risk difference to
differ across strata, IV weighting produces the minimum
variance and asymptotically unbiased estimate. However,
when the proportions are in the range of (0.25, 0.75), CMH
estimates are generally quite close to the IV estimates.
When risk difference is suspected to differ across strata,
CMH tends to produce more sensible estimates.
It is critically important to know the studies and where the
data came from. Naïve pooling could produce very
misleading results and should be avoided.
Stratification often leads to subgroup analysis. We need to
consider the role subgroup analysis will play in reporting and
interpreting trial results.
References
Califf RM. (2007). Multiregional clinical trials. Presented at the PhRMA-FDA
workshop, Oct 29-30, Washington DC.
Dmitrienko A, Molenberghs G, Chuang-Stein C, and Offen W. (2005)
Analysis of Clinical Trials Using SAS: A Practical Guide. Cary, NC: SAS
Institute Inc.
EMEA Points to consider on adjustment for baseline covariates.
CPMP/EWP/2863/99 (Nov 2003, coming into operation).
EMEA Reflection paper on the extrapolation of results from clinical studies
conducted outside Europe to the EU-population. CHMP/EWP/692702/2008.
Released for public comments, January 2009.
Greenland S. (1998). Collapsibility. Encyclopedia of Biostatistics, Wiley.
786-788.
Grizzle JE. (1982). A note on stratifying versus complete random
assignment in clinical trials. Controlled Clinical Trials, 3:365-368.
Kawai N, Chuang-Stein C, Komiyama O, Ii Y. (2008). An approach to
rationalize partitioning sample size into individual regions in a multi-regional
trial. Drug Information Journal, 42(2):139-147.
Li Z, Chuang-Stein C, Hoseyni C. (2007). The probability of observing
negative subgroup results when the treatment effect is positive and
homogeneous across all subgroups. Drug Information Journal, 41(1):47-56.
Ministry of Health, Labour and Welfare. (2007). Basic Principles on Global
Clinical Trials. Available at:
http://www.pmda.go.jp/operations/notice/2007/file/0928010-e.pdf
O’Neill R. (2007). Multi-regional Clinical Trials: Why be concerned? A
Regulatory perspective on Issues. Presented at the PhRMA-FDA workshop,
Oct 29-30, Washington DC.
Permutt T. (2007). A note on stratification in clinical trials. Drug Information
Journal, 41:719-722.