Transcript estimate

CHAPTER 7
ESTIMATES
AND
SAMPLE SIZES
ESTIMATION: AN INTRODUCTION
Introduction
We have come a long way. We started by learning what statistics is
and what the two areas of applied statistics are. In Chapter 1, we
learned that:
1. Descriptive statistics consists of methods for organizing,
displaying, and describing data by using tables, graphs, and
summary measures.
2. Inferential statistics consists of methods that use sample results to
make decisions or predictions about a population.
In Chapters 2 and 3, we focused on descriptive statistics and
learned how to draw tables, how to graph data, and how to
calculate numerical summary measures such as mean, median,
mode, variance, and standard deviation.
Now, in Chapter 7, we will focus on inferential statistics. We begin
by discussing estimation.
Definition
Estimation is a process for assigning value(s) to a population
parameter based on information collected from a sample.
There are many real-life examples in which estimation is used. For
example, we may want to estimate the:
1. Mean fuel consumption for a particular model of car.
2. Proportion of students who completed the MAT 12 course with a
passing grade in the past 10 years.
3. Proportion of female high school students who dropped out of
school because of pregnancy.
4. Percentage of all California lawyers disbarred for committing a
criminal offense.
Of course, we could conduct a census to find the true mean or
proportion of the population in items 1 through 4. However, from what we
now know about a census, it would be:
1. Expensive.
2. Difficult, because every member of the population must be reached or
contacted.
3. Time consuming.
So, because of these problems with a census, a representative sample is
generally drawn from the population and the appropriate sample
statistic is calculated. Then,
1. A value is assigned to the population parameter based on the
calculated value of the sample statistic.
2. The value assigned to the population parameter based on the
value of the sample statistic is called an estimate of the population
parameter.
For example, suppose the Mathematics Department draws a sample of 50
students from all students who have taken MAT 12 in the past 10
years. The department records the number of students who
passed and failed the course, and calculates the sample
proportion, p̂, of students who passed the course to be 0.65. So,
• If the department assigns the value of the sample proportion, p̂, to
the population proportion, p, then 0.65 is called an estimate of
p and p̂ is called the estimator.
Summary
Estimation procedure involves:
• Draw a sample from the population.
• Collect required information from each element of the sample.
• Calculate the value of sample statistic.
• Assign the value to corresponding population parameter.
Note: The sample must be a simple random sample.
7-2 ESTIMATING A POPULATION
PROPORTION
The estimated value of a population parameter can be either
a point estimate or an interval estimate.
Point Estimate - Definition
A point estimate is the value of a sample statistic used to estimate
a population parameter.
So, if we use the sample proportion, p̂, as a point estimate
of p, then we can say that the proportion of all students who have
completed the MAT 12 course with a passing grade in the past 10 years is
about 0.65. That is,
Point estimate of a population parameter = Value of the corresponding
sample statistic
We discussed in Chapter 6 that the value of a sample statistic varies
from one sample to another, even for samples of the same size drawn
from the same population. Therefore,
1. The value assigned to the population proportion, p, based on a
point estimate depends on the sample drawn.
2. The value assigned to the population parameter is almost always
different from the true value of the population parameter.
An Interval Estimation
Definition:
An interval estimate is an interval built around the point estimate,
together with a probabilistic statement that the interval
contains the corresponding population parameter.
Therefore, following on from our example, rather than saying that the
proportion of all students who have passed MAT 12 in the last 10
years is 0.65, we would:
1. Add a number to, and subtract it from, 0.65 to obtain an interval, and then
2. Say that the interval contains the population proportion, p.
Now, let us subtract 0.2 from, and add 0.2 to, 0.65. Then we obtain the interval
(0.65 − 0.2 to 0.65 + 0.2) = (0.45 to 0.85)
1. We state that the population proportion, p, is likely to be contained in
the interval 0.45 to 0.85.
2. That is, we state that the proportion of all students who have taken MAT
12 with a passing grade in the past 10 years is likely to be between
0.45 and 0.85.
3. The value 0.45 is called the lower limit and 0.85 is called the upper limit.
4. The number we subtracted from and added to the point estimate is called the
margin of error.
5. The value of the margin of error depends on:
a. The standard deviation, σ_p̂, of the sample proportion, p̂.
b. The level of confidence that we would like to attach to the interval.
6. So,
a. The larger σ_p̂ is, the greater the margin of error.
b. To be more confident that the population proportion is
contained in the interval, we have to use a higher confidence level.
c. We add a probabilistic statement so that the
interval is based on the confidence level.
d. An interval constructed based on the
confidence level is called a confidence interval.
7. A confidence interval is defined as
Confidence interval = Point estimate ± Margin of error
[Figure: sampling distribution of p̂ centered at μ_p̂ = p, with the interval from .45 to .85 around p̂ = .65]
8. The confidence level associated with a confidence interval is
defined as
Confidence level = (1 − α)100%
When expressed as a probability, it is called the confidence coefficient:
Confidence coefficient = 1 − α
where α = significance level.
This means that we have (1 − α)100% confidence that the
interval contains the true population proportion.
7.3-7.4 ESTIMATION OF A POPULATION
MEAN: σ KNOWN
The three possible cases for constructing a confidence interval
for the population mean, μ, when σ is known are as follows:
I. We use the standard normal distribution to construct the confidence
interval for μ, with σ_x̄ = σ/√n (assuming that n/N ≤ 0.05), if:
1. The standard deviation σ is known.
2. The sample size is small, n < 30.
3. The population is normally distributed, or at least close to normal
with no outliers.
II. We use the standard normal distribution to construct the confidence
interval for μ, with σ_x̄ = σ/√n (assuming that n/N ≤ 0.05), if:
1. The standard deviation σ is known.
2. The sample size is large, n ≥ 30.
3. By the central limit theorem, the sampling distribution of the sample
mean is approximately normal. However, we may not be able to
use the standard normal distribution if the population distribution is
very different from the normal distribution.
III. We use a nonparametric method to construct the confidence
interval if:
a. The standard deviation σ is known.
b. The sample size is small, n < 30.
c. The population is not normally distributed, or its distribution is unknown.
The rest of this section deals with Cases I and II. We will not
cover Case III.
Formula
The (1 − α)100% confidence interval for μ under Cases I and II is
(1 − α)100% confidence interval = x̄ ± z σ_x̄,
where σ_x̄ = σ/√n,
and the margin of error is E = z σ_x̄.
[Figure: Three Possible Cases, σ known]
Let us revisit the definition of confidence level. Remember that
Confidence level = (1 − α), where α is the significance level.
The (1 − α)100% confidence level is the area under the normal
curve of x̄ between two points that lie on opposite sides of μ, at equal
distance from μ.
How to determine z given confidence level
1. To find the two locations for z, first note that:
a. The area between the two z's is (1 − α).
b. Since z1 and z2 are the same distance from the mean, μ, the sum of
the areas to the left of z1 and to the right of z2 is
1 − (1 − α) = α
c. Since the area to the left of z1 and the area to the right of z2 are
equal, then:
Area to the left of z1 = α/2
Area to the right of z2 = α/2
2. Using Table A-2, we can find the values of z1
and z2 that correspond to the required area.
3. Note that the values of z1 and z2 are the
same in magnitude, but they have opposite signs.
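The table lookup described above can also be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later for the standard library's `statistics.NormalDist`; the function name `z_for_confidence` is an illustrative choice and does not come from the slides.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return the positive z whose central area under the standard normal curve is `level`."""
    alpha = 1 - level                      # total area in the two tails
    # The area to the left of z2 is 1 - alpha/2, so invert the normal CDF there.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```

The computed values carry more decimals than a printed table, which is why a table may offer two neighboring entries (such as 1.64 or 1.65) for the same area.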
Interpretation of confidence level
Let us consider 20 samples of the same size taken from the same
population. Then,
1. Let us calculate the sample mean, x̄, for each sample.
2. Let us then calculate the confidence interval for μ around each sample
mean, x̄, based on a confidence level of 90%.
3. The sampling distribution of x̄ is a normal curve centered at μ (shown
in the accompanying figure).
4. In the context of this example, we say that about 90% of such
intervals, like those around x̄1 and x̄2, will include μ,
and about 10%, like the interval around x̄3, will not.
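This interpretation can be illustrated with a small simulation. The sketch below is not part of the original slides: the population values (μ = 50, σ = 10), the sample size, and the function name `coverage` are all assumptions chosen for illustration. It draws many samples, builds a 90% interval around each sample mean, and reports the fraction of intervals that contain μ.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                       # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z for the chosen confidence level
    margin = z * sigma / n ** 0.5                   # E = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 50, sigma = 10; samples of size 25.
print(coverage(mu=50, sigma=10, n=25, level=0.90, trials=2000))
```

With many trials the printed fraction should be close to 0.90, although it will rarely equal it exactly.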
Width of a Confidence Interval
As stated previously, the confidence interval is defined as
(1 − α)100% confidence interval = x̄ ± z σ_x̄,
where z σ_x̄ is the margin of error. The width of the confidence
interval therefore depends on z σ_x̄, which in turn depends on:
1. z, which depends on the confidence level.
2. σ and n, because σ_x̄ = σ/√n.
Since σ is outside the control of the investigator, the width of a
confidence interval can only be controlled through z and n. Thus,
the width is governed by the following relationships:
1. The value of z increases as the confidence level increases.
2. The value of z decreases as the confidence level decreases.
3. With n held constant, the higher the confidence level, the larger the
width of the confidence interval.
4. An increase in the sample size causes a decrease in the width of the
confidence interval.
In conclusion, we can reduce the width of a confidence interval by
lowering the confidence level or increasing the sample size.
Determining the Sample Size for the Estimation of
Mean
Because of the problems associated with conducting a census or
even a sample survey, we need a way to determine a sample
size that will produce the required results without wasting
effort or money on surveying a larger sample than necessary.
Starting from the margin of error and solving for n:
E = zσ/√n
E√n = zσ
√n = zσ/E
n = z²σ²/E²
So, to find the appropriate sample size, n, we need, in addition to σ:
• The confidence level, which determines z.
• The desired margin of error, E, which is half the width of the confidence interval.
So, given a predetermined margin of error, we can find the sample
size that will produce the required results.
Note that if σ is not known, one could take a small preliminary sample,
calculate the sample standard deviation, s, and then use s in lieu of σ
in the formula.
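The sample-size formula can be sketched in Python as follows. The function name `sample_size_for_mean` is an illustrative choice, not something from the slides. The inputs are those of Example #5 later in this chapter, with z taken from the normal table. The result is rounded up, because rounding down would give a margin of error slightly larger than E.

```python
import math

def sample_size_for_mean(z, sigma, margin_of_error):
    """Smallest whole n satisfying z * sigma / sqrt(n) <= margin_of_error."""
    n_exact = (z * sigma / margin_of_error) ** 2   # n = z^2 * sigma^2 / E^2
    return math.ceil(n_exact)                      # always round up to a whole observation

# Example #5a: 98% confidence (z = 2.33), sigma = 14.50, E = 5.50
print(sample_size_for_mean(2.33, 14.50, 5.50))
# Example #5b: 95% confidence (z = 1.96), sigma = 14.50, E = 4.25
print(sample_size_for_mean(1.96, 14.50, 4.25))
```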
Example #1 – Problem 8.10
Find z for each of the following confidence levels.
a) 90%
b) 95%
Example #1 – Solution
a) Given: 1 − α = .90
α = .10, α/2 = .05
From Table IV, the value of z that corresponds
to an area of .05 to the left of z1 is −1.65 or
−1.64. Also, the value of z that corresponds to
an area of .05 to the right of z2 is 1.64 or 1.65.
Thus, the value of z that corresponds to
a confidence level of 90% is 1.64 or 1.65.
[Figure: standard normal curve with area .90 between z1 and z2 and area .05 in each tail]
b) Given: 1 − α = .95
α = .05, α/2 = .025
From Table IV, the value of z that corresponds
to an area of .025 to the left of z1 is −1.96.
Also, the value of z that corresponds to
an area of .025 to the right of z2, or 0.975 to the
left of z2, is 1.96.
Thus, the value of z that corresponds to
a confidence level of 95% is 1.96.
[Figure: standard normal curve with area .95 between z1 and z2 and area .025 in each tail]
Example #2
For a data set obtained from a sample, n = 81 and x̄ = 48.25. It is known that
σ = 4.8.
a) What is the point estimate of μ?
b) Make a 95% confidence interval for μ.
c) What is the margin of error of estimate for part b?
Example #2 – Solution
Given: n = 81, x̄ = 48.25, σ = 4.8, population is normally distributed.
a) What is the point estimate of μ?
Point estimate of μ = x̄ = 48.25
b) Make a 95% confidence interval for μ.
The confidence level is 95%. Hence, the area in each tail of the normal
curve is α/2 = .05/2 = 0.025.
Since the population is normally distributed, we can use the normal distribution to
make the confidence interval. From Table IV, the value of z that corresponds
to an area of 0.025 in each tail of the curve is 1.96.
Thus, σ_x̄ = σ/√n = 4.8/√81 = 0.5333
The confidence interval for μ = x̄ ± z σ_x̄ = 48.25 ± 1.96(.5333)
= 48.25 ± 1.05 = 47.20 to 49.30
c) What is the margin of error?
E = z σ_x̄ = 1.96(.5333) = 1.05
Example #3
The standard deviation for a population is σ = 14.8. A sample of 25 observations
selected from this population gave a mean equal to 143.72. The population is
known to have a normal distribution.
a) Make a 99% confidence interval for μ.
b) Construct a 95% confidence interval for μ.
c) Determine a 90% confidence interval for μ.
d) Does the width of the confidence intervals constructed in parts a through c
decrease as the confidence level decreases? Explain your answer.
Example #3 – Solution
Given: n = 25, x̄ = 143.72, σ = 14.8, population is normally distributed.
Since the sample is drawn from a normally distributed population,
σ_x̄ = σ/√n = 14.8/√25 = 2.96
and we can use the normal distribution to make each confidence interval.
a) Make a 99% confidence interval for μ.
The confidence level is 99%. Hence, the area in each tail of the normal
curve is α/2 = .01/2 = 0.005.
From Table IV, the value of z that corresponds
to an area of 0.005 in each tail of the curve is 2.57 or 2.58.
Thus, the 99% confidence interval for μ = x̄ ± z σ_x̄ = 143.72 ± 2.58(2.96)
= 143.72 ± 7.64 = 136.08 to 151.36
b) Make a 95% confidence interval for μ.
The confidence level is 95%. Hence, the area in each tail of the normal
curve is α/2 = .05/2 = 0.025.
From Table IV, the value of z that corresponds
to an area of 0.025 in each tail of the curve is 1.96.
Thus, the 95% confidence interval for μ = x̄ ± z σ_x̄ = 143.72 ± 1.96(2.96)
= 143.72 ± 5.80 = 137.92 to 149.52
c) Make a 90% confidence interval for μ.
The confidence level is 90%. Hence, the area in each tail of the normal
curve is α/2 = .10/2 = 0.05.
From Table IV, the value of z that corresponds
to an area of 0.05 in each tail of the curve is 1.64 or 1.65.
Thus, the 90% confidence interval for μ = x̄ ± z σ_x̄ = 143.72 ± 1.65(2.96)
= 143.72 ± 4.88 = 138.84 to 148.60
d) Yes. As the confidence level decreases, the z value decreases, and so does
the width of the interval (15.28, 11.60, and 9.76 for parts a, b, and c).
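The pattern in part d can be seen by computing the width at each confidence level. The sketch below uses the σ and n of Example #3 and computes z with `statistics.NormalDist` instead of reading it from the table, so the widths differ slightly from the hand calculation, which used the rounded table values 2.58 and 1.65.

```python
from statistics import NormalDist

sigma, n = 14.8, 25                 # values from Example #3
se = sigma / n ** 0.5               # sigma_xbar = 2.96

widths = {}
for level in (0.99, 0.95, 0.90):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    widths[level] = 2 * z * se      # width = upper limit - lower limit = 2E
    print(f"{level:.0%}: z = {z:.3f}, width = {widths[level]:.2f}")
```

The width shrinks with each drop in confidence level because only z changes while σ and n stay fixed.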
Example #4
For a population, the value of the standard deviation is 4.96. A sample of 32
observations taken from this population produced the following data.
74 85 72 73 86 81 77 60 83 78 79 88 76 73 84 78
81 72 82 81 79 83 88 86 78 83 87 82 80 84 76 74
a) What is the point estimate of μ?
b) Make a 99% confidence interval for μ.
c) What is the margin of error of estimate for part b?
Example #4 – Solution
Given: n = 32 and, from the given data, x̄ = 2543/32 = 79.4688
Although the distribution of the population is not known, the sampling
distribution of x̄ is approximately normal by the central limit theorem
because the sample size is large, n ≥ 30. Thus,
σ_x̄ = σ/√n = 4.96/√32 = .8768
a) What is the point estimate of μ?
Point estimate of μ = x̄ = 79.4688
b) Make a 99% confidence interval for μ.
The confidence level is 99%. Hence, the area in each tail of the normal
curve is α/2 = .01/2 = 0.005.
Since the sampling distribution of x̄ is approximately normal, we can use the normal
distribution to make the confidence interval. From Table IV, the value of z
that corresponds to an area of 0.005 in each tail of the curve is 2.57 or 2.58.
Thus, the 99% confidence interval for μ = x̄ ± z σ_x̄ = 79.4688 ± 2.58(.8768)
= 79.4688 ± 2.2621 = 77.21 to 81.73
c) What is the margin of error?
E = z σ_x̄ = 2.58(.8768) = 2.2621
Example #5
For a population data set, σ = 14.50.
a) What should the sample size be for a 98% confidence interval for μ to
have a margin of error of estimate equal to 5.50?
b) What should the sample size be for a 95% confidence interval for μ to
have a margin of error of estimate equal to 4.25?
Example #5 – Solution
Given: σ = 14.50
a) Given that the confidence level = 98% and E = 5.50, find the sample size.
The area in each tail under the normal curve is α/2 = .02/2 = 0.01.
From Table IV, the value of z that corresponds to an area of 0.01 in
each tail under the normal curve is 2.33. Thus,
n = z²σ²/E² = (2.33)²(14.50)²/(5.50)² = 37.73 ≈ 38
b) Given that the confidence level = 95% and E = 4.25, find the sample size.
The area in each tail under the normal curve is α/2 = .05/2 = 0.025.
From Table IV, the value of z that corresponds to an area of 0.025 in
each tail under the normal curve is 1.96. Thus,
n = z²σ²/E² = (1.96)²(14.50)²/(4.25)² = 44.71 ≈ 45
In both parts n is rounded up to the next whole number.
Example #6
Inside the Box Corporation makes corrugated cardboard boxes. One type of
box is labeled with a breaking capacity of 75 pounds. Fifty-five
randomly selected boxes of this type were loaded until they broke. The average
breaking capacity of these boxes was found to be 78.52 pounds. Suppose that
the standard deviation of the breaking capacities of all such boxes is 2.63
pounds. Calculate a 99% confidence interval for the average breaking capacity
of all boxes of this type.
Example #6 – Solution
Given: x̄ = 78.52, σ = 2.63, n = 55
Since the sample is large, n ≥ 30, we can assume that the sampling distribution of x̄ is
approximately normal. Then,
σ_x̄ = σ/√n = 2.63/√55 = .3546
The confidence level is 99%. Hence, the area in each tail under the normal
curve is α/2 = .01/2 = 0.005.
Since the sampling distribution of x̄ is approximately normal, we
can use the normal distribution to make the confidence interval for μ. From
Table IV, the value of z that corresponds to an area of 0.005 in each tail under
the normal curve is 2.57 or 2.58.
Thus, the 99% confidence interval for μ = x̄ ± z σ_x̄ = 78.52 ± 2.58(.3546)
= 78.52 ± .9149 = 77.61 to 79.43 pounds
ESTIMATION OF A POPULATION
MEAN: σ NOT KNOWN
The three possible cases for constructing a confidence interval
for the population mean when σ is unknown are as follows:
I. We use the t distribution to construct the confidence interval for μ if:
1. The standard deviation, σ, is unknown.
2. The sample size is small, n < 30.
3. The population is normally distributed.
II. We use the t distribution to construct the confidence interval for μ if:
1. The standard deviation, σ, is unknown.
2. The sample size is large, n ≥ 30.
III. We use a nonparametric method to construct the confidence
interval for μ if:
1. The standard deviation, σ, is unknown.
2. The sample size is small, n < 30.
3. The population is not normally distributed.
[Figure: Three Possible Cases, σ not known]
The t Distribution
• The t distribution is also called Student's t distribution.
• It is similar to the normal distribution because it has:
1. A bell-shaped curve,
2. A total area of 1.0 under the curve, and
3. A population mean, μ, of zero.
• It is different from the normal distribution curve because:
1. It has a lower height and wider spread,
2. The units are denoted by t,
3. Its population standard deviation, σ, is defined as
σ = √(df/(df − 2)) (for df > 2), and
4. df is the degrees of freedom, defined as the number of
observations that can be chosen freely. It is given by
df = n − 1, where n is the sample size.
• The t distribution depends on only one parameter, df.
• As the sample size becomes larger, the t distribution approaches
the standard normal distribution.
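The last point can be seen directly from the standard deviation formula: as df grows, √(df/(df − 2)) shrinks toward 1, the standard deviation of the standard normal distribution. A minimal sketch follows. The function name `t_std_dev` is an illustrative choice, and the formula applies only for df > 2.

```python
import math

def t_std_dev(df):
    """Standard deviation of the t distribution with df degrees of freedom (df > 2)."""
    if df <= 2:
        raise ValueError("the formula sqrt(df / (df - 2)) requires df > 2")
    return math.sqrt(df / (df - 2))

for df in (3, 9, 30, 100, 1000):
    print(f"df = {df:>4}: sigma = {t_std_dev(df):.4f}")
```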
Figure 8.5 The t distribution for df = 9 and the
standard normal distribution.
1. Table A-3 lists the t value for a given number of degrees of freedom and an area in the
right tail under a t distribution curve.
2. This area is the same as the area in the left tail under the t distribution
curve because of symmetry.
Steps to read the t distribution in Table V:
1. Locate the number of degrees of freedom
under the column labeled "df", and draw
a horizontal line through the row.
2. Locate the area under one of the
columns for areas in the right tail under
the t distribution curve, and draw a
vertical line through the column.
3. The entry where the horizontal line and
vertical line meet is the required t
value.
4. For example, let us find the t value for a t
distribution with a sample size of 9 and
an area of 0.01 in the right tail of the t
distribution curve. Here df = 9 − 1 = 8, and the
table entry is t = 2.896.
Example #7
Find the value of t for the t distribution for each of the following.
a) Area in the right tail = .05 & df = 12
b) Area in the left tail = .025 & n = 66
Example #7 – Solution
(a) Given: Area in the right tail = 0.05, df = 12.
From Table V, the required t value is 1.782.
(b) Given: Area in the left tail = 0.025, n = 66,
then df = n − 1 = 66 − 1 = 65.
From Table V, the required t value is −1.997. It is negative because the
area is in the left tail.
Example #8
For each of the following, find the area in the appropriate tail of the t distribution.
a) t = 2.467 & df = 28
b) t = −1.672 & df = 58
c) t = −2.670 & n = 55
Example #8 – Solution
a) Given: t = 2.467 & df = 28
From Table V, the required area in the right tail
under the curve is 0.01.
b) Given: t = −1.672 & df = 58
From Table V, the required area in the left tail
under the curve is 0.05.
c) Given: t = −2.670 & n = 55, then df = n − 1 = 55 − 1 = 54
From Table V, the required area in the left tail
under the curve is 0.005.
Confidence Interval for μ Using the t Distribution
In Section 4.3, we defined σ_x̄ as
σ_x̄ = σ/√n
However, since σ is usually unknown, we calculate the sample
standard deviation, s, and use it in lieu of σ, with s_x̄ in place of σ_x̄.
s_x̄ is calculated as
s_x̄ = s/√n
Therefore, the (1 – α)100% confidence interval for μ is
(1 − α)100% confidence interval = x̄ ± t s_x̄, and Margin of error = t s_x̄
Note: If df > 75, we can either use:
1. The entries in the last row of Table V, where df = ∞, or
2. A normal distribution to approximate the t distribution.
Example #9
Find the value of t from the t distribution table for each of the following.
a) Confidence level = 99% & df = 13
b) Confidence level = 95% & n = 36
Example #9 – Solution
a) Given: Confidence level = 99% & df = 13, then
α = .01 and α/2 = 0.005
From Table V, the required t value = 3.012
b) Given: Confidence level = 95% & n = 36, then df = n − 1 = 35
α = .05 and α/2 = 0.025
From Table V, the required t value = 2.030
Example #10 – Problem 8.47
A sample of 11 observations taken from a normally distributed population
produced the following data.
-7.1 10.3 8.7 -3.6 -6.0 -7.5 5.2 3.7 9.8 -4.4 6.4
a) What is the point estimate of μ?
b) Make a 95% confidence interval for μ.
c) What is the margin of error of estimate for part b?
Example #10 – Solution
Given: n = 11
The sums needed for the calculations are:
x        x²
-7.1     50.41
10.3     106.09
8.7      75.69
-3.6     12.96
-6.0     36.00
-7.5     56.25
5.2      27.04
3.7      13.69
9.8      96.04
-4.4     19.36
6.4      40.96
Σx = 15.5     Σx² = 534.49
a) What is the point estimate of μ?
Point estimate of μ = x̄ = Σx/n = 15.5/11 = 1.4091
b) Make a 95% confidence interval for μ.
Since σ is unknown, we have to determine s and s_x̄:
s = √[(Σx² − (Σx)²/n)/(n − 1)] = √[(534.49 − (15.5)²/11)/10] = 7.1600
s_x̄ = s/√n = 7.1600/√11 = 2.1588
For a 95% confidence level, α = 0.05 and α/2 = .025
The required t value for an area of .025 and df = 10 is 2.228
The 95% confidence interval for μ = x̄ ± t s_x̄
= 1.4091 ± 2.228(2.1588) = 1.4091 ± 4.8098
= −3.40 to 6.22
c) E = t s_x̄ = 4.8098
Example #11
A random sample of 16 airline passengers at the Bay City airport showed that
the mean time spent waiting in line to check in at the ticket counter was 31
minutes with a standard deviation of 7 minutes. Construct a 99% confidence
interval for the mean time spent waiting in line by all passengers at this airport.
Assume that such waiting times for all passengers are normally distributed.
Example #11 – Solution
Given: n = 16, x̄ = 31 minutes, s = 7 minutes, confidence level = 99%, then df = 15.
The population is normally distributed.
s_x̄ = s/√n = 7/√16 = 1.75
For a 99% confidence level, α = 0.01 and α/2 = .005
The required t value for an area of .005 and df = 15 is 2.947
99% confidence interval for μ = x̄ ± t s_x̄
= 31 ± 2.947(1.75) = 31 ± 5.16 = 25.84 to 36.16 minutes
ESTIMATION OF A POPULATION
PROPORTION: LARGE SAMPLES
We already learned that for a large sample size, that is, when
np > 5 and nq > 5:
1. The sampling distribution of p̂ is approximately normal.
2. The mean, μ_p̂, of the sampling distribution of p̂ is equal to the
population proportion, p.
3. The standard deviation, σ_p̂, of the sampling distribution of the
sample proportion, p̂, is defined as
σ_p̂ = √(pq/n), where q = 1 − p
Since we may not know p, we will need to use s_p̂ as an estimate
of σ_p̂:
s_p̂ = √(p̂q̂/n), where p̂ = point estimate of p and q̂ = 1 − p̂
The (1 − α)100% confidence interval for p = p̂ ± z s_p̂
Margin of error = z s_p̂
DETERMINING THE SAMPLE SIZE FOR
THE ESTIMATION OF PROPORTION
Given the confidence level and the values
of p̂ and q̂, the sample size that will
produce a predetermined margin of
error E for the confidence interval
estimate of p is
n = z² p̂ q̂ / E²
In case the values of p̂ and q̂ are not known, we can do one of the following:
1. We make the most conservative estimate
of the sample size n by using p̂ = .5 and
q̂ = .5.
2. We take a preliminary sample (of
arbitrarily determined size) and calculate
p̂ and q̂ from this sample. Then we use
these values to find n.
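Both options can be sketched with one small function. The name `sample_size_for_proportion` and the numbers in the usage lines (95% confidence, E = .03, and a preliminary p̂ of .20) are hypothetical values chosen for illustration. They do not come from the slides.

```python
import math

def sample_size_for_proportion(z, margin_of_error, p_hat=0.5):
    """n = z^2 * p_hat * q_hat / E^2, rounded up; p_hat = 0.5 is the most conservative choice."""
    q_hat = 1 - p_hat
    return math.ceil(z ** 2 * p_hat * q_hat / margin_of_error ** 2)

# Hypothetical request: 95% confidence (z = 1.96) with E = .03
print(sample_size_for_proportion(1.96, 0.03))               # no prior information, p_hat = .5
print(sample_size_for_proportion(1.96, 0.03, p_hat=0.20))   # preliminary sample gave p_hat = .20
```

The choice p̂ = .5 is conservative because the product p̂q̂ is largest at .5, so it never underestimates the sample size that is needed.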
Example #12
Check if the sample size is large enough to use the normal distribution to
make a confidence interval for p for each of the following cases.
a. n = 50, p̂ = .25
b. n = 160, p̂ = .03
Answers:
a. np̂ = (50)(.25) = 12.5 and nq̂ = (50)(.75) = 37.5. Both exceed 5, so the
sample size is large enough to use the normal distribution.
b. np̂ = (160)(.03) = 4.8, which is less than 5, so the sample size is not
large enough to use the normal distribution.
Example #13
A sample of 200 observations selected from a population produced a
sample proportion equal to .91. Make a 90% confidence interval for
p.
Answer:
n = 200, p̂ = .91, q̂ = 1 − .91 = .09, s_p̂ = √(p̂q̂/n) = .02023611
The 90% confidence interval for p is p̂ ± z s_p̂ = .91 ± 1.65(.02023611)
= .91 ± .033 = .877 to .943
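The large-sample check and the interval calculation from the last two examples can be combined in one sketch. The function name `proportion_interval` is illustrative, and z = 1.65 is the table value used in the answer above.

```python
import math

def proportion_interval(p_hat, n, z):
    """Large-sample confidence interval for p; refuses samples that fail the np > 5, nq > 5 check."""
    q_hat = 1 - p_hat
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small to use the normal distribution")
    se = math.sqrt(p_hat * q_hat / n)      # s_p-hat
    margin = z * se                        # margin of error
    return p_hat - margin, p_hat + margin

# n = 200, p-hat = .91, 90% confidence (z = 1.65)
low, high = proportion_interval(0.91, 200, 1.65)
print(f"{low:.3f} to {high:.3f}")
```

Calling the function with n = 160 and p̂ = .03 raises the error, matching the conclusion of Example #12 part b.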