Summary of sample size determination for a desired margin of error


Sample Size and CI’s for the Population Mean (μ) and the Population Proportion (p)

Sample Size and CI’s for μ


Suppose we wish to estimate a population mean μ using
a 95% CI and have a margin of error no larger than E
units. What sample size do we need to use?
Recall the “large” sample CI for μ is given by:
$$\bar{X} \pm (z\text{-value}) \frac{s}{\sqrt{n}}$$
The term after the ± sign is the MARGIN OF ERROR (E). The z-values are:
90% z = 1.645
95% z = 1.96
99% z = 2.576
Note: The z-value should actually be a t-distribution value, but for
sample size planning purposes we will use a standard normal value.
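As a quick check on where these standard normal values come from, here is a minimal Python sketch using the standard library's `statistics.NormalDist`. The helper name `z_value` is made up for illustration and is not part of the slides.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value (hypothetical helper).

    `confidence` is a proportion such as 0.95, not a percentage.
    """
    # A two-sided interval leaves (1 - confidence) / 2 in each tail.
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_value(level):.3f}")
```

Rounded to three decimals, this reproduces 1.645, 1.960, and 2.576.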

For a 95% CI, if we want margin of error E, we have
$$E = 1.96 \frac{s}{\sqrt{n}}$$
After some wonderful algebraic manipulation
$$n = \left( \frac{1.96\, s}{E} \right)^2$$
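For anyone who wants the skipped algebra spelled out, one way to fill it in (a sketch using the same symbols E, s, and n) is to isolate the square root of n and then square both sides:

$$E = 1.96 \frac{s}{\sqrt{n}} \;\Longrightarrow\; \sqrt{n} = \frac{1.96\, s}{E} \;\Longrightarrow\; n = \left( \frac{1.96\, s}{E} \right)^2$$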
Oh, oh! We don’t know s !!
1. “Guesstimate”
2. Use sample SD from pilot or prior study.
3. Use the fact that 95% of observations generally lie
within 2 SD’s of the mean, thus
$$s \approx \frac{\text{Range}}{4}$$
where Range represents the expected
maximum minus minimum we would see
in the sample.
Could also use the fact that about 99.7% lie
within 3 SD’s and use 6 instead
of 4 in our crude approximation.
Example: Estimating Mean Cholesterol
Level of Females 30 – 40 yrs. of age
Q: What sample size would be necessary to estimate
the mean cholesterol level for the population of
females between the ages of 30 – 40 with a 95%
confidence interval that has a margin of error no
larger than E = 3 mg/dl?

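Suppose from a pilot study we find s = 19.8 mg/dl.
We can use this estimate to find the sample size that will
give E = 3 mg/dl.
$$n = \left( \frac{1.96\, s}{E} \right)^2 = \left( \frac{1.96 \times 19.8}{3} \right)^2 = 167.34 \;\Rightarrow\; n = 168$$
We round up so that the margin of error is no larger than E.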
Standard normal values
90% = 1.645
95% = 1.960
99% = 2.576
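The cholesterol calculation can be checked with a few lines of Python. This is only a sketch under the normal-based planning formula used in these slides. The function name `sample_size_mean` and its arguments are made up for illustration.

```python
import math


def sample_size_mean(s, E, z=1.96):
    """Return (exact n, n rounded up) so that z * s / sqrt(n) <= E.

    Hypothetical helper: s is the SD estimate, E the desired margin
    of error (same units as s), z the standard normal value.
    """
    if s <= 0 or E <= 0:
        raise ValueError("s and E must be positive")
    n_exact = (z * s / E) ** 2
    # Round up: rounding down would give a margin of error above E.
    return n_exact, math.ceil(n_exact)


n_exact, n = sample_size_mean(s=19.8, E=3)
print(round(n_exact, 2), n)  # 167.34 168
```

Swapping in z = 1.645 or z = 2.576 gives the 90% and 99% versions of the same calculation.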

Suppose we do not have any information about the
standard deviation of the cholesterol levels of
individuals in this population.
We could use the Range/4 or Range/6 as crude
approximations to the standard deviation.
What is the smallest serum cholesterol level we
would expect to see? 100 mg/dl (my guess)
What is the largest? 300 mg/dl (my guess again)
SD approximation = 200/4 = 50 mg/dl
or
SD approximation = 200/6 = 33.33 mg/dl

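Using this crude estimate for the standard deviation, we
find the following sample size requirements:
$$n = \left( \frac{1.96\, s}{E} \right)^2 = \left( \frac{1.96 \times 50}{3} \right)^2 = 1067.11 \;\Rightarrow\; n = 1068$$
or
$$n = \left( \frac{1.96\, s}{E} \right)^2 = \left( \frac{1.96 \times 33.33}{3} \right)^2 = 474.18 \;\Rightarrow\; n = 475$$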
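The Range/4 and Range/6 calculations can be sketched in Python as follows. The 100 and 300 mg/dl endpoints are the guesses from the slides. The helper names `sd_from_range` and `n_for_margin` are hypothetical.

```python
import math


def sd_from_range(low, high, divisor=4):
    """Crude SD guess: expected range divided by 4 (or 6)."""
    return (high - low) / divisor


def n_for_margin(sd, E, z=1.96):
    """Sample size, rounded up, for margin of error E."""
    return math.ceil((z * sd / E) ** 2)


# Guessed smallest and largest cholesterol levels (mg/dl), E = 3 mg/dl.
for divisor in (4, 6):
    sd = sd_from_range(100, 300, divisor)
    print(f"Range/{divisor}: SD = {sd:.2f}, n = {n_for_margin(sd, 3)}")
```

The two guesses give n = 1068 and n = 475, so the choice of divisor more than doubles the required sample size. A pilot estimate of the SD is worth having when one is available.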
Sample Size and CI’s for p


Suppose we wish to estimate p using a 95% CI and have
a margin of error of 3%. What sample size do we need
to use?
Recall the CI for p is given by:
$$\hat{p} \pm (z\text{-value}) \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
The term after the ± sign is the MARGIN OF ERROR (E).

Here for a 95% CI we want E = .03 or 3%:
$$E = 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = .03$$
After some wonderful algebraic manipulation
$$n = \frac{1.96^2\, \hat{p}(1 - \hat{p})}{E^2}$$
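The algebra here follows the same pattern as for the mean. As a sketch, square both sides of the margin of error equation and then solve for n:

$$E = 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \;\Longrightarrow\; E^2 = \frac{1.96^2\, \hat{p}(1 - \hat{p})}{n} \;\Longrightarrow\; n = \frac{1.96^2\, \hat{p}(1 - \hat{p})}{E^2}$$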
Oh, oh! We don’t know p-hat !!
1. “Guesstimate”
2. Use p-hat from pilot or prior
study.
3. Largest n we would ever need
comes when p-hat = .50.
1. Informed approach
$$n = \frac{1.96^2\, \hat{p}(1 - \hat{p})}{E^2}$$
uses p̂ from prior knowledge.
2. Conservative approach (i.e. worst case scenario)
$$n = \frac{1.96^2}{4E^2}$$
uses p̂ = .50.
Standard normal values
90% = 1.645
95% = 1.960
99% = 2.576
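To see why p̂ = .50 is the worst case, note that the sample size formula depends on p̂ only through p̂(1 − p̂). A small Python sketch, with a grid and function name chosen purely for illustration, scans that term and finds where it peaks:

```python
def variance_term(p_hat):
    """The p_hat * (1 - p_hat) factor in the sample size formula."""
    return p_hat * (1 - p_hat)


# Illustrative grid of candidate proportions: 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]
worst = max(grid, key=variance_term)
print(worst, variance_term(worst))  # 0.5 0.25
```

The maximum value of 1/4 is where the 4 in the denominator of the conservative formula comes from.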

Original Question: Suppose we wish to estimate p
using a 95% CI and have a margin of error of 3%.
What sample size do we need to use?
Assume that we estimate the 5 yr. survival rate for a
new kidney cancer therapy, and we know historically that
this survival rate is around 20%.
Using informed approach
$$n = \frac{1.96^2\, \hat{p}(1 - \hat{p})}{E^2} = \frac{1.96^2 (.20)(.80)}{.03^2} = 682.95 \;\Rightarrow\; n = 683 \text{ subjects}$$
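The same calculation as a minimal Python sketch. The function name `sample_size_proportion` is hypothetical. Making p̂ = .50 the default is our own choice, so that leaving it out falls back to the worst case scenario.

```python
import math


def sample_size_proportion(E, p_hat=0.5, z=1.96):
    """Sample size, rounded up, for estimating p within margin E.

    Hypothetical helper: E and p_hat are proportions (0.03, not 3).
    """
    if not 0 < p_hat < 1 or E <= 0:
        raise ValueError("need 0 < p_hat < 1 and E > 0")
    n_exact = z ** 2 * p_hat * (1 - p_hat) / E ** 2
    return math.ceil(n_exact)


# Kidney cancer example: prior survival rate around 20%, E = 3%.
print(sample_size_proportion(E=0.03, p_hat=0.20))  # 683
```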
Using conservative approach
$$n = \frac{1.96^2}{4E^2} = \frac{1.96^2}{4(.03^2)} = 1067.1 \;\Rightarrow\; n = 1068 \text{ subjects}$$
This is why media polls usually report a sampling
error of ± 3% and that the poll was based on a sample of
about n = 1000 individuals.
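Running the formula in the other direction shows what margin of error a poll of a given size actually delivers. The sketch below assumes the worst case p̂ = .50 and a 95% z-value. The function name `margin_of_error` is made up for illustration.

```python
import math


def margin_of_error(n, p_hat=0.5, z=1.96):
    """Margin of error for a proportion from a sample of size n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (1000, 1068):
    print(f"n = {n}: E = {margin_of_error(n):.3f}")
```

With n = 1000 the margin of error comes out at about .031, which is commonly reported as 3%. It takes n = 1068 to get it down to .03 or below.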