Sample size determination
Nick Barrowman, PhD
Senior Statistician
Clinical Research Unit, CHEO Research Institute
March 29, 2010
Outline
• Example: lowering blood pressure
• Introduction to some statistical issues in sample size determination
• Two simple approximate formulas
• Descriptions of sample size calculations from the literature
Example
• Physicians design an intervention to reduce blood pressure in patients with high blood pressure.
• But does it work? We need a study to find out.
• How many participants are required?
• Too few: the study may fail to detect an effect even if there is one.
• Too many: the study may unnecessarily expose patients to risk.
The null hypothesis
• For intervention studies, the null hypothesis is usually that, on average, the intervention has no effect.
• Analogy: “innocent until proven guilty”. The null hypothesis is assumed true unless the data provide strong evidence against it.
• The physicians who designed the intervention believe the null hypothesis is false.
• The study is designed to test the null hypothesis.
• We often write H0 for the null hypothesis.
The study
• The population is considered to be all people who might be eligible for the intervention (eligibility might depend on age, other medical conditions, etc.).
• Study participants are viewed as a sample from this population.
• Suppose that for each study participant we measure blood pressure at baseline and after 6 weeks of the intervention.
• The outcome is the change in blood pressure.
• H0: the population mean change in blood pressure (BP) is 0.
Population vs. sample
[Figure: a random sample is drawn from the population. The sample mean of the change in blood pressure is calculated from the sample and used to make inferences about the population mean of the change in blood pressure.]
Probability distributions
[Figure: population distribution of the change in blood pressure, with the mean and ± 1 standard deviation marked.]
Recall that the variance is the square of the standard deviation; it is often written as σ².
[Figures: the population distribution of the change in blood pressure, followed by the sampling distribution of the mean change in blood pressure for N = 1, 2, 5 and 10.]
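Increasing the sample size reduces the variability of the sample mean. This variability is measured by the standard error (SE):
$\mathrm{SE} = \mathrm{SD} / \sqrt{N}$
where SD is the standard deviation and N is the sample size.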
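A small Python sketch of SE = SD/√N, with a simulation check that the spread of sample means shrinks as N grows. The SD of 7 mmHg (borrowed from the later example), the normal population, and the function names are illustrative assumptions, not part of the slides.

```python
import math
import random
import statistics

def standard_error(sd, n):
    """Standard error of a sample mean: SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)

def simulated_se(sd, n, reps=10000, seed=1):
    """Empirical SD of many sample means, each from n draws of a normal population with mean 0."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sd) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

SD = 7.0  # assumed SD of the change in blood pressure (mmHg), illustrative only
for n in (1, 2, 5, 10):
    # formula vs. simulation for the sample sizes shown in the figures
    print(n, round(standard_error(SD, n), 2), round(simulated_se(SD, n), 2))
```

The two columns should agree closely, and both fall as N increases.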
Variance and sample size
• As we’ve seen, increasing the sample size reduces the variance of the sample mean, which has the same effect as reducing the variance of the measurements.
• Equivalently, reducing the variance (e.g. by using a more precise measurement device) can reduce the sample size requirements.
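Turning the SE formula around: to reach a target SE we need N ≥ (SD/SE)², so halving the SD cuts the required N by roughly a factor of four. This sketch assumes, for illustration, that a more precise device halves the total SD of the outcome (in practice measurement error is only part of the variability); the target SE of 1 mmHg and the function name are hypothetical.

```python
import math

def n_for_target_se(sd, target_se):
    """Smallest N with SD / sqrt(N) <= target_se, i.e. N >= (SD / target_se)^2."""
    return math.ceil((sd / target_se) ** 2)

# Hypothetical target: SE of 1 mmHg for the mean change in blood pressure
print(n_for_target_se(7.0, 1.0))  # → 49 (original device, SD 7 mmHg)
print(n_for_target_se(3.5, 1.0))  # → 13 (hypothetical device with half the SD)
```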
Hypothesis test
[Figure: the sampling distribution of the mean under the null hypothesis, a.k.a. the null distribution.]
Reject the null hypothesis if the observed mean lies far in the tails of the null distribution, i.e. if chance alone is an unlikely explanation for the observed mean.
[Figure: the null distribution with the observed mean and the rejection region marked.]
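A minimal sketch of this test for the blood pressure example, assuming the SD is known so that a two-sided z test applies (when the SD is estimated from the data, a t test is more usual). It uses the standard library's `statistics.NormalDist`; the function name and the study numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(observed_mean, sd, n, alpha=0.05):
    """Two-sided z test of H0: population mean change in BP = 0 (SD assumed known)."""
    se = sd / math.sqrt(n)                           # standard error of the sample mean
    z = observed_mean / se                           # distance from 0 in standard errors
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = abs(z) > critical                       # observed mean is in the rejection region
    return z, p_value, reject

# Hypothetical study: 16 participants, SD 7 mmHg
print(z_test(-5.0, 7.0, 16))  # mean change of -5 mmHg: far in the tail, reject H0
print(z_test(-2.0, 7.0, 16))  # mean change of -2 mmHg: not far enough, do not reject
```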
Four possible scenarios
In reality, either the intervention has no effect (H0 is true) or it has an effect (H0 is false). Based on the study findings, we infer either that the intervention has no effect (accept H0) or that it has an effect (reject H0):

Study inference               | H0 true (no effect)   | H0 false (an effect)
------------------------------+-----------------------+---------------------
Accept H0 (infer no effect)   | Correctly accept H0   | Type-II error
Reject H0 (infer an effect)   | Type-I error          | Correctly reject H0
Type-I error
If the null hypothesis is true, an observed mean falling in the rejection region of the test leads to a type-I error. The probability of type-I error is the area under the null distribution over the rejection region, and is denoted by α (alpha).
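One way to see that α is the type-I error rate is to simulate many studies in which H0 really is true and count how often the test rejects. This sketch assumes normally distributed changes with a known SD of 7 mmHg and N = 16 (hypothetical values) and uses a two-sided z test.

```python
import math
import random
import statistics
from statistics import NormalDist

def simulate_type_i_error(sd=7.0, n=16, alpha=0.05, reps=20000, seed=1):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd / math.sqrt(n)
    rejections = 0
    for _ in range(reps):
        # H0 is true: each participant's change in BP has population mean 0
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        z = statistics.fmean(sample) / se
        if abs(z) > critical:
            rejections += 1
    return rejections / reps

print(simulate_type_i_error())  # should be close to alpha = 0.05
```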
Type-II error
• Type-II error is failing to reject the null hypothesis when it is false.
• The probability of type-II error is denoted β (beta).
• It depends on how big the true effect is.
• Sample size calculations therefore require specification of an alternative hypothesis, which indicates the size of effect we would like to detect.
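The dependence of β on the true effect can be computed directly under a normal approximation. This sketch assumes a two-sided z test with known SD; the SD of 7 mmHg, N = 16 and the candidate effects are hypothetical, as is the function name.

```python
import math
from statistics import NormalDist

def type_ii_error(delta, sd, n, alpha=0.05):
    """beta for a two-sided z test of H0: mean = 0 when the true mean change is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sd  # true mean measured in standard errors
    # probability that the observed mean lands in either tail of the rejection region
    power = std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
    return 1 - power

for delta in (2.0, 5.0, 8.0):  # hypothetical true mean changes (mmHg)
    print(delta, round(type_ii_error(delta, 7.0, 16), 3))
```

Larger true effects give smaller β; with no effect at all (δ = 0), "β" is just 1 − α.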
[Figures: relationship between type-I and type-II error for alpha = 0.05, 0.10 and 0.20.]
Relationship between type-I and type-II error
• Sample size calculations depend on the tradeoff between type-I and type-II error: for a fixed sample size, lowering alpha raises beta.
• We usually fix the probability of type-I error (alpha) at 5% and then try to minimize the probability of type-II error (beta).
• Define power = 1 – beta, the probability of rejecting H0 when it is false.
• We want to maximize power.
• One way to do this is by increasing the sample size.
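The alpha/beta tradeoff shown in the figures can be reproduced numerically. This sketch holds the sample size fixed and varies alpha, assuming a two-sided z test with known SD; δ = 5 mmHg, SD = 7 mmHg and N = 8 are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def power_two_sided(delta, sd, n, alpha):
    """Power of a two-sided z test of H0: mean = 0 when the true mean is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # a larger alpha gives a smaller critical value
    shift = delta * math.sqrt(n) / sd
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for alpha in (0.05, 0.10, 0.20):
    beta = 1 - power_two_sided(5.0, 7.0, 8, alpha)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Relaxing alpha enlarges the rejection region, which lowers beta; the cost is more false positives.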
How sample size affects power
[Figures: the effect on power of doubling and of quintupling the sample size.]
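With alpha fixed at 0.05, increasing N narrows the sampling distributions and raises power. This sketch uses the same normal-approximation power formula (two-sided z test, known SD) with a hypothetical base sample size of 8, then doubles and quintuples it; δ = 5 mmHg and SD = 7 mmHg are illustrative.

```python
import math
from statistics import NormalDist

def power_z_test(delta, sd, n, alpha=0.05):
    """Power of a two-sided z test of H0: mean = 0 when the true mean is delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sd  # grows like sqrt(n)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

base_n = 8  # hypothetical starting sample size
for label, n in (("base", base_n), ("doubled", 2 * base_n), ("quintupled", 5 * base_n)):
    print(f"{label:<10} N={n:<3} power={power_z_test(5.0, 7.0, n):.3f}")
```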
An approximate formula for the blood pressure example
• Suppose the variance of the change in blood pressure, σ² (sigma squared), is the same under the null and alternative hypotheses.
• Suppose alpha is fixed at 0.05 and we use two-sided tests (allowing for the possibility that blood pressure could be either increased or decreased by the intervention).
• Then we will have approximately 80% power to detect a mean change in blood pressure of δ (delta) if we enroll N participants, where
$N \approx 8\sigma^2 / \delta^2$
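One standard way to see where the 8 comes from is the normal-approximation formula N = (z₁₋α/₂ + z₁₋β)² σ²/δ², where z₁₋α/₂ ≈ 1.96 (alpha = 0.05, two-sided) and z₁₋β ≈ 0.84 (power = 80%), so the constant is (1.96 + 0.84)² ≈ 7.85, rounded up to 8. This sketch assumes that formula (it ignores the small contribution from the opposite tail); the function names are hypothetical.

```python
import math
from statistics import NormalDist

def one_group_constant(alpha=0.05, power=0.80):
    """(z_{1-alpha/2} + z_{power})^2: the multiplier of sigma^2 / delta^2 for one group."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2

def n_one_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate N for a one-sample two-sided z test, rounded up to a whole participant."""
    return math.ceil(one_group_constant(alpha, power) * sd ** 2 / delta ** 2)

print(round(one_group_constant(), 2))  # → 7.85
print(n_one_group(7.0, 5.0))           # → 16 (the rounded rule 8 * 49 / 25 = 15.68 also gives 16)
```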
Example
• Suppose the standard deviation of the change in blood pressure is anticipated to be 7 mmHg (so the variance is 49 mmHg²).
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a mean change of 5 mmHg.
• Then we would need about 8 × 49 / 5² = 15.68, i.e. about 16 participants.
When there are two groups
• So far, the example has used a single group of study participants.
• Usually we want to compare two groups: a control group that receives “standard of care” or placebo, and an experimental group that receives a new intervention.
• This is how most randomized controlled trials are set up.
• In this case, δ (delta) is the difference between the means of the two groups.
• For simplicity, assume that the variance is the same in the two groups.
An approximate sample size formula for the case of two groups
• A similar approximate formula applies, again assuming alpha = 0.05 and power = 80%:
$N_{\text{per group}} \approx 16\sigma^2 / \delta^2$
• Careful! This is the required sample size per group.
• Also, note that the constant is double what it was for the case of a single group.
• So the total sample size is 4 times as large.
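Under the same normal approximation, the difference of two independent group means has variance 2σ²/n, so the per-group constant is 2(z₁₋α/₂ + z₁₋β)² ≈ 15.7, rounded to 16. Two groups, each needing twice the single-group N, gives the factor of 4 in total. This sketch assumes equal variances and equal group sizes; the function names and the inputs (SD 7 mmHg, δ = 5 mmHg) are illustrative.

```python
import math
from statistics import NormalDist

def two_group_constant(alpha=0.05, power=0.80):
    """2 * (z_{1-alpha/2} + z_{power})^2, since Var(mean1 - mean2) = 2 * sigma^2 / n."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2

def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Approximate N per group for comparing two means, rounded up."""
    return math.ceil(two_group_constant(alpha, power) * sd ** 2 / delta ** 2)

print(round(two_group_constant(), 2))  # → 15.7
print(n_per_group(7.0, 5.0))           # → 31 (the rounded constant 16 gives 16 * 49 / 25 = 31.36, i.e. 32)
```

Using the unrounded constant gives a slightly smaller N; rounding the constant up to 16 errs on the side of a larger study.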
Example
• Suppose we want to compare patients randomized to placebo with patients randomized to a new intervention.
• Suppose the standard deviation is again anticipated to be 7 mmHg (so the variance is 49 mmHg²).
• Suppose we fix alpha at 0.05 and we’d like to have approximately 80% power to detect a difference of 5 mmHg between the group means.
• Then we would need about 16 × 49 / 5² = 31.36, i.e. about 32 participants per group, for a total of about 64 participants.
Summary
Required sample size …
• increases with the variance
• decreases as the size of the effect to detect increases
• decreases as the allowed probability of type-I error, alpha, increases
• decreases as the allowed probability of type-II error, beta, increases
Sample size determination has many other aspects
• Different types of outcomes: dichotomous (e.g. mortality), time-to-event (e.g. survival time), etc.
• Different designs: observational studies (e.g. case-control), surveys, prevalence studies
• Practical considerations: e.g. costs, feasibility of recruitment
Questions?
Review: A comedy of errors …
α = Probability of type-I error
Probability of a false conviction
(Rejecting the null hypothesis when it is in fact true.)
Power = 1 – β
Probability of a true conviction
(Rejecting the null hypothesis when it is in fact false.)