3_Sample size determination
Sample Size Determination
Bandit Thinkhamrop, PhD (Statistics)
Dept. of Biostatistics & Demography
Khon Kaen University
Essentials of sample size calculation
No one accepts any “magic number”
Too large vs too small
To justify the sample size to the sponsor and the Ethics Committee
To ensure:
– adequate power to test a hypothesis
– desired precision to obtain an estimate
Two main approaches
Hypothesis-based sample size calculation
– Involves “power” or the beta error
– Ensures a significant finding, but the result may not be clinically conclusive
– Easy and widely available
Confidence interval methods of sample size calculation
– Involve the precision of the estimation
– Ensure a clinically conclusive finding, as this method directly estimates the magnitude of effect
– Difficult and not widely available
Overall steps
Identify the primary outcome.
Identify and review the magnitude of effect and its variability that will be used as the basis of the conclusion of the research.
Identify the statistical method that will be used to obtain the main magnitude of effect.
Calculate the sample size.
Describe how the sample size was calculated in sufficient detail that the calculation can be reproduced.
Steps in the calculation
Base sample size calculation
Design effect (for correlated outcomes)
Contingency (increase to account for nonresponse or dropout)
Rounding up to the nearest (and comfortable) number (these adjustment steps are sketched below)
Evaluate whether this sample size would provide a precise and conclusive answer to the research question by analyzing the data as if it turned out as expected.
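A minimal sketch of the adjustment steps in Python. The function name and the example values for the design effect (DEFF) and the dropout proportion are purely illustrative assumptions, not values from this lecture:

```python
import math

def adjust_sample_size(n_base, deff=1.0, dropout=0.0, round_to=1):
    """Apply design effect, dropout contingency, and rounding to a base sample size.

    deff     : design effect for correlated outcomes (1.0 = simple random sampling)
    dropout  : anticipated nonresponse/dropout proportion (e.g. 0.10 for 10%)
    round_to : round the final number up to a multiple of this value
    """
    n = n_base * deff        # inflate for clustering / correlated outcomes
    n = n / (1.0 - dropout)  # inflate for expected nonresponse or dropout
    return math.ceil(n / round_to) * round_to

# Illustrative example: base n = 385, DEFF = 1.5, 10% dropout, round up to the nearest 10
print(adjust_sample_size(385, deff=1.5, dropout=0.10, round_to=10))  # -> 650
```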
Suggested approaches
For unknown parameters in the formula, try to find existing evidence or use your best “guesstimate”, i.e., an educated guess.
Do not base the calculation on only one scenario or only one reference. It is highly recommended to vary all key parameters to see how they affect the sample size.
Always evaluate the sufficiency of the sample size by estimating the main magnitude of effect and its 95% CI and checking whether it provides a conclusive finding.
Consult a statistician early.
Common pitfalls
Specifying an unjustified “magic” number
Basing the calculation on a simplified formula or a sample size table without understanding its limitations
“A previous study in this area recruited 50 subjects and found highly significant results (p=0.001), and therefore a similar sample size should be sufficient.”
– never justify a sample size like this
Inconsistency with the protocol
Relying too heavily on previous findings in the sample size calculation
Examples of common calculations
Mean – one group
Mean – two independent groups
Proportion – one group
Proportion – two independent groups
Get some ideas from these examples
Practice with your own research
Mean – one group: Formula
n = (Zα/2 × s / d)²
Where:
n = the sample size
Zα/2 = the standard normal coefficient, typically 1.96 for a 95% CI
s = the standard deviation
d = the desired precision, expressed as half of the maximum acceptable confidence interval width
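A minimal sketch of this one-group formula in Python (the function name is just illustrative). The table on the next slide appears to come from sample size software that applies additional small-sample corrections, so its values run slightly higher than this normal-approximation result:

```python
import math

def n_mean_one_group(sd, d, z_alpha=1.96):
    """n = (Z_alpha/2 * s / d)^2, rounded up."""
    return math.ceil((z_alpha * sd / d) ** 2)

# Standard deviation 30, desired half-width 10
print(n_mean_one_group(sd=30, d=10))  # -> 35 (the slide's software reports 38)
```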
Mean – one group: Calculations (fixed α = 0.05)
Expected standard deviation | Precision (half width) | n
25 | 5 | 99
30 | 5 | 141
25 | 10 | 27
30 | 10 | 38
Mean – one group: Description
A sample size of 38 would be able to
estimate a mean with a precision of 10
assuming a standard deviation of 30
according to a study by <Reference>.
That is, based on the expected mean
of 55 <Reference>, the 95%
confidence interval of the estimated
mean would be between 45 and 65.
Mean – two independent groups: Formula
n = 2 (Zα/2 + Zβ)² σ² / d²
Where:
n = the sample size in each group (assumes equal-sized groups)
Zβ = represents the desired power (typically 0.84 for 80% power)
Zα/2 = represents the desired level of statistical significance (typically 1.96 for α = 0.05)
σ² = a measure of variability (the variance, i.e., the square of the standard deviation; a pooled variance if the two groups differ)
d = the minimum meaningful difference, or effect size
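A minimal sketch of this formula in Python, written with the two group variances added (SD1² + SD2² in place of 2σ²) so it matches the unequal-SD scenario on the next slide. The slide's values are slightly larger, presumably because the software applies a t-distribution correction:

```python
import math

def n_mean_two_groups(sd1, sd2, diff, z_alpha=1.96, z_beta=0.84):
    """Per-group n = (Z_alpha/2 + Z_beta)^2 * (sd1^2 + sd2^2) / diff^2, rounded up."""
    return math.ceil((z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2) / diff ** 2)

# 80% power (Z_beta ~ 0.84), SDs of 20 and 25, minimum meaningful difference of 15
print(n_mean_two_groups(20, 25, 15))  # -> 36 per group (the slide's software reports 37)
```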
Mean – two independent groups: Calculations (fixed α = 0.05)
H0: M1 - M2 = 0; H1: M1 - M2 = D1 ≠ 0
Test statistic: Z test with pooled variance (SD1 = 20; SD2 = 25)
Power | Mean in control grp. | Minimum meaningful difference | n1 | n2
90% | 30 | 10 | 109 | 109
80% | 30 | 10 | 82 | 82
90% | 30 | 20 | 28 | 28
80% | 30 | 20 | 22 | 22
90% | 35 | 5 | 432 | 432
80% | 35 | 5 | 322 | 322
90% | 35 | 15 | 49 | 49
80% | 35 | 15 | 37 | 37
Mean – two independent groups: Description
A sample size of 37 in group one and 37 in group two would have 80% power to detect a difference between groups of 15, assuming a mean of 35 in the control group with estimated group standard deviations of 20 and 25, respectively, according to a study by <Reference>. The test statistic used is the two-sided two-sample t-test. The significance level of the test was set at 0.05.
Proportion – one group: Formula
n = Zα/2² × p(1 − p) / d²
Where:
n = the sample size
Zα/2 = the standard normal coefficient, typically 1.96 for a 95% CI
p = the expected proportion, expressed as a decimal (e.g., 0.45)
d = the desired precision, expressed as half of the maximum acceptable confidence interval width
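A minimal sketch of this formula in Python (function name illustrative); it reproduces the values in the table on the next slide:

```python
import math

def n_proportion_one_group(p, d, z_alpha=1.96):
    """n = Z_alpha/2^2 * p * (1 - p) / d^2, rounded up."""
    return math.ceil(z_alpha ** 2 * p * (1 - p) / d ** 2)

# The four scenarios from the table on the next slide
for p, d in [(0.15, 0.02), (0.20, 0.02), (0.15, 0.04), (0.20, 0.04)]:
    print(p, d, n_proportion_one_group(p, d))  # -> 1225, 1537, 307, 385
```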
Proportion – one group: Calculations (fixed α = 0.05)
Expected prevalence | Precision (half width) | n
15% | 2% | 1,225
20% | 2% | 1,537
15% | 4% | 307
20% | 4% | 385
Proportion – one group: Description
A sample size of 400 would have a
95% confidence interval of 16% to
24% assuming a prevalence of 20%
according to a study by <Reference>.
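A quick check of this statement, sketched with the normal-approximation confidence interval for a proportion:

```python
import math

# Half-width of the 95% CI for p = 0.20 with n = 400
half_width = 1.96 * math.sqrt(0.20 * 0.80 / 400)
print(round(half_width, 3))                                       # -> 0.039
print(round(0.20 - half_width, 3), round(0.20 + half_width, 3))   # -> 0.161 0.239, i.e. about 16% to 24%
```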
Proportion – two independent groups: Formula
n = [Zα/2 √(2 p̄(1 − p̄)) + Zβ √(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)²
Where:
n = the sample size in each group (assumes equal-sized groups)
Zβ = represents the desired power (typically 0.84 for 80% power)
Zα/2 = represents the desired level of statistical significance (typically 1.96 for α = 0.05)
p1, p2 = the expected proportions in the two groups, with p̄ = (p1 + p2)/2; the p(1 − p) terms are the measures of variability (analogous to a variance)
p1 − p2 = the minimum meaningful difference, or effect size
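A minimal sketch of this pooled-variance formula in Python (function name illustrative); it reproduces the per-group values in the table on the next slide:

```python
import math

def n_proportion_two_groups(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Per-group n for comparing two proportions with a pooled-variance Z test.

    z_beta = 0.8416 for 80% power, 1.2816 for 90% power.
    """
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# 80% power, 50% in the control group, minimum meaningful difference of 10%
print(n_proportion_two_groups(0.50, 0.60))                  # -> 388 per group
# 90% power for the same scenario
print(n_proportion_two_groups(0.50, 0.60, z_beta=1.2816))   # -> 519 per group
```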
Proportion – two independent groups: Calculations (fixed α = 0.05)
H0: P1 - P2 = 0; H1: P1 - P2 = D1 ≠ 0
Test statistic: Z test with pooled variance
Power | Proportion in control grp. | Minimum meaningful difference | n1 | n2
90% | 40% | 5% | 2,053 | 2,053
80% | 40% | 5% | 1,534 | 1,534
90% | 50% | 5% | 2,095 | 2,095
80% | 50% | 5% | 1,565 | 1,565
90% | 40% | 10% | 519 | 519
80% | 40% | 10% | 388 | 388
90% | 50% | 10% | 519 | 519
80% | 50% | 10% | 388 | 388
Proportion – two independent groups: Description
A sample size of 388 in group one and 388 in group two would have 80% power to detect a difference between groups of 10%, assuming a prevalence of 50% in the control group according to a study by <Reference>. The test statistic used is the two-sided Z test. The significance level of the test was set at 0.05.
Other considerations
Sampling design affects the calculation of the sample size
– Simple random sampling / assignment
– Stratified random sampling / assignment
– Clustered random sampling / assignment
Complex study designs affect the calculation of the sample size
– Matching
– Multiple stages of sampling
– Repeated measures
Usually the sample size calculation is based on the method of analysis
– Correlation, agreement, diagnostic performance
– Z-test
– Regression – multiple linear, logistic
– Multivariate analyses such as principal component or factor analysis
– Survival analyses
– Multilevel models
Other considerations
Demonstrate superiority
– Sample size sufficient to detect a difference between treatments
– Requires specifying the “minimum meaningful” difference
Demonstrate non-inferiority or equivalence
– The sample size required to demonstrate equivalence is larger than that required to demonstrate superiority
– Requires specifying the “non-inferiority margin” or “equivalence range”
Precision or Power Estimation
Equivalent to a sample size calculation – do it in the planning phase of the study
Do it when the number of available subjects is known
Wrong: “There are around 50 patients per year, of whom 10% may refuse to take part in the study. Therefore, over the 2 years of the study, the sample size will be 90 patients.”
Correct: “It is estimated that there will be 90 patients in the clinic. This will give a precision of the prevalence estimate of 20%, assuming a prevalence of 65%.” (A sketch of this precision calculation follows below.)
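A small sketch of the “correct” approach, assuming the normal-approximation confidence interval for a proportion (n = 90 and prevalence = 65% are taken from the example above):

```python
import math

def precision_for_proportion(p, n, z_alpha=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion."""
    return z_alpha * math.sqrt(p * (1 - p) / n)

half_width = precision_for_proportion(p=0.65, n=90)
print(round(half_width, 3))      # -> 0.099, i.e. about +/- 10 percentage points
print(round(2 * half_width, 3))  # -> 0.197, a total CI width of roughly 20%,
                                 #    consistent with the slide if "precision of 20%"
                                 #    is read as the full CI width
```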
Suggested learning resources
WWW:
Statistics Guide for Research Grant
Applicants at St George’s University of London
(maintained by Martin Bland):
– http://www-users.york.ac.uk/~mb55/guide/size.htm
Software:
PASS2008, nQuery, EpiTable,
SeqTrial, PS, etc.
Q&A