Summary of sample size determination for a desired margin of error
Sample Size and CI’s for the Population Mean (μ) and the Population Proportion (p)
Sample Size and CI’s for μ
Suppose we wish to estimate a population mean μ using
a 95% CI and have a margin of error no larger than E
units. What sample size do we need to use?
Recall the “large” sample CI for μ is given by:
$$\bar{X} \pm (z\text{-value}) \frac{s}{\sqrt{n}}$$
The term after the ± sign is the MARGIN OF ERROR (E):
$$E = (z\text{-value}) \frac{s}{\sqrt{n}}$$
Standard normal values:
90% z = 1.645
95% z = 1.96
99% z = 2.576
Note: The z-value should actually be a t-distribution value, but for
sample size planning purposes we will use a standard normal value.
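The standard normal values quoted above can be reproduced with Python's standard library. This is a minimal sketch; the helper name `z_value` is an assumption made for illustration, not something from the slides.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided standard normal critical value for a confidence level in (0, 1)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so we look up the quantile that cuts off the upper tail.
    upper_quantile = 1.0 - (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(upper_quantile)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} z = {z_value(level):.3f}")
```

For planning purposes the slides use these normal values in place of t-distribution values, so the sketch does the same.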
Sample Size and CI’s for μ
For a 95% CI, if we want margin of error E, we have
$$E = 1.96 \frac{s}{\sqrt{n}}$$
After some wonderful algebraic manipulation
$$n = \left(\frac{1.96\, s}{E}\right)^2$$
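For readers who want the skipped algebra, here is one way to fill in the step. It is written for a general critical value z (a symbol introduced here for brevity; the slide uses 1.96 for a 95% CI).

```latex
\begin{aligned}
E &= z \,\frac{s}{\sqrt{n}} \\
\sqrt{n} &= \frac{z\, s}{E} \\
n &= \left(\frac{z\, s}{E}\right)^{2}
\end{aligned}
```

Because n must be a whole number and a larger n gives a smaller margin of error, the computed value is rounded up.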
Oh, oh! We don’t know s!!
1. “Guesstimate”
2. Use sample SD from pilot or prior study.
3. Use the fact that about 95% of observations generally lie
within 2 SD’s of the mean, thus
$$s \approx \frac{\text{Range}}{4}$$
where Range represents the expected
maximum – minimum we would see
in the sample.
Could also use the fact that over 99% lie
within 3 SD’s and use 6 instead
of 4 in our crude approximation.
Example: Estimating Mean Cholesterol
Level of Females 30 – 40 yrs. of age
Q: What sample size would be necessary to estimate
the mean cholesterol level for the population of
females between the ages of 30 – 40 with a 95%
confidence interval that has a margin of error no
larger than E = 3 mg/dl?
Sample Size and CI’s for μ
Suppose from a pilot study we find s = 19.8 mg/dl
We can use this estimate to find the sample size that will
give E = 3 mg/dl.
$$n = \left(\frac{1.96\, s}{E}\right)^2 = \left(\frac{1.96 \times 19.8}{3}\right)^2 = 167.34 \;\Rightarrow\; n = 168$$
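The cholesterol calculation can be checked with a few lines of Python. This is a sketch only; the function name `sample_size_for_mean` and its parameter names are assumptions made for illustration.

```python
import math


def sample_size_for_mean(sd: float, margin: float, z: float = 1.96) -> int:
    """Smallest whole n for which z * sd / sqrt(n) is at most `margin`."""
    if sd <= 0 or margin <= 0:
        raise ValueError("sd and margin must be positive")
    # n = (z * sd / E)^2, rounded up because n must be a whole number.
    return math.ceil((z * sd / margin) ** 2)


# Pilot-study SD of 19.8 mg/dl, desired margin of error 3 mg/dl, 95% CI.
print(sample_size_for_mean(sd=19.8, margin=3.0))  # 168
```

Passing `z=1.645` or `z=2.576` gives the corresponding sample sizes for 90% or 99% confidence.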
Standard normal values
90% = 1.645
95% = 1.960
99% = 2.576
Sample Size and CI’s for μ
Suppose we do not have any information about the
standard deviation of the cholesterol levels of
individuals in this population.
We could use the Range/4 or Range/6 as crude
approximations to the standard deviation.
What is the smallest serum cholesterol level we
would expect to see? 100 mg/dl (my guess)
What is the largest? 300 mg/dl (my guess again)
SD approximation = 200/4 = 50 mg/dl
or
SD approximation = 200/6 = 33.33 mg/dl
Sample Size and CI’s for μ
Using this crude estimate for the standard deviation we
find the following sample size requirements
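$$n = \left(\frac{1.96 \times 50}{3}\right)^2 = 1067.11 \;\Rightarrow\; n = 1068$$
or
$$n = \left(\frac{1.96 \times 33.33}{3}\right)^2 = 474.18 \;\Rightarrow\; n = 475$$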
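To see how much the required sample size depends on the crude SD guess, the two range-based approximations can be compared side by side. This is a sketch under the slide's guessed range of 100 to 300 mg/dl; the helper names `sd_from_range` and `sample_size_for_mean` are assumptions made for illustration.

```python
import math


def sd_from_range(low: float, high: float, divisor: float = 4.0) -> float:
    """Crude SD approximation: expected range divided by 4 (or 6)."""
    return (high - low) / divisor


def sample_size_for_mean(sd: float, margin: float, z: float = 1.96) -> int:
    """Smallest whole n for which z * sd / sqrt(n) is at most `margin`."""
    return math.ceil((z * sd / margin) ** 2)


for divisor in (4, 6):
    sd = sd_from_range(100, 300, divisor)
    n = sample_size_for_mean(sd, margin=3.0)
    print(f"Range/{divisor}: SD ~ {sd:.2f} mg/dl, n = {n}")
```

The code uses the unrounded value 200/6 rather than 33.33, which changes the intermediate result slightly but still rounds up to 475.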
Sample Size and CI’s for p
Suppose we wish to estimate p using a 95% CI and have
a margin of error of 3%. What sample size do we need
to use?
Recall the CI for p is given by:
$$\hat{p} \pm (z\text{-value}) \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
The term after the ± sign is the MARGIN OF ERROR (E).
Sample Size and CI’s for p
Here for a 95% CI we want E = .03 or 3%
$$E = 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = .03$$
After some wonderful algebraic manipulation
$$n = \frac{1.96^2 \, \hat{p}(1 - \hat{p})}{E^2}$$
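The algebra for the proportion case follows the same pattern as for the mean. Here is a sketch for a general critical value z (a symbol introduced here for brevity; the slide uses 1.96).

```latex
\begin{aligned}
E &= z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\
E^{2} &= \frac{z^{2}\, \hat{p}(1 - \hat{p})}{n} \\
n &= \frac{z^{2}\, \hat{p}(1 - \hat{p})}{E^{2}}
\end{aligned}
```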
Oh, oh! We don’t know p-hat !!
1. “Guesstimate”
2. Use p-hat from pilot or prior
study.
3. Largest n we would ever need
comes when p-hat = .50.
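The third point can be verified by completing the square. This is a short sketch of why p-hat = .50 is the worst case.

```latex
\hat{p}(1 - \hat{p}) = \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^{2} \le \frac{1}{4},
\qquad \text{with equality only when } \hat{p} = \frac{1}{2}.
```

Since n is proportional to p-hat(1 − p-hat), no other value of p-hat can require a larger sample for the same E.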
Sample Size and CI’s for p
1. Informed approach (uses p̂ from prior knowledge)
$$n = \frac{1.96^2 \, \hat{p}(1 - \hat{p})}{E^2}$$
2. Conservative approach (i.e. worst case scenario; uses p̂ = .50)
$$n = \frac{1.96^2}{4E^2}$$
Standard normal values
90% = 1.645
95% = 1.960
99% = 2.576
Sample Size and CI’s for p
Original Question: Suppose we wish to estimate p
using a 95% CI and have a margin of error of 3%.
What sample size do we need to use?
Assume that we want to estimate the 5 yr. survival rate for a
new kidney cancer therapy, and we know historically that
this survival rate is around 20%.
Using informed approach
$$n = \frac{1.96^2 \, \hat{p}(1 - \hat{p})}{E^2} = \frac{1.96^2 (.20)(.80)}{.03^2} = 682.95 \;\Rightarrow\; n = 683 \text{ subjects}$$
Using conservative approach
$$n = \frac{1.96^2}{4E^2} = \frac{1.96^2}{4(.03^2)} = 1067.1 \;\Rightarrow\; n = 1068 \text{ subjects}$$
This is why media polls usually report a sampling
error of ±3% and note that the poll was based on a sample of
roughly n = 1000 individuals.
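Both approaches, and the media-poll remark, can be checked numerically. This is a minimal sketch; the function names `sample_size_for_proportion` and `margin_for_proportion` are assumptions made for illustration.

```python
import math


def sample_size_for_proportion(margin: float, p_hat: float = 0.5,
                               z: float = 1.96) -> int:
    """Smallest whole n with z * sqrt(p_hat * (1 - p_hat) / n) at most `margin`.

    The default p_hat = 0.5 is the conservative (worst case) choice.
    """
    if not 0.0 < p_hat < 1.0 or margin <= 0:
        raise ValueError("need 0 < p_hat < 1 and margin > 0")
    return math.ceil(z ** 2 * p_hat * (1.0 - p_hat) / margin ** 2)


def margin_for_proportion(n: int, p_hat: float = 0.5, z: float = 1.96) -> float:
    """Margin of error of the CI for p at sample size n."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


print(sample_size_for_proportion(0.03, p_hat=0.20))  # informed approach: 683
print(sample_size_for_proportion(0.03))              # conservative approach: 1068
print(round(margin_for_proportion(1000), 3))         # worst-case margin at n = 1000: 0.031
```

With n = 1000 the worst-case margin comes out at about 3.1%, which is consistent with polls reporting a sampling error of roughly 3%.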