Transcript Week8
Confidence Interval Estimation
Week 8
Objectives
On completion of this module you should be
able to:
• calculate and interpret confidence interval
estimates for the mean and proportion,
• determine sample size for means and
proportions,
• understand the application of confidence
intervals, particularly in auditing, and
• consider ethical issues relating to confidence
interval estimation.
Confidence interval estimation
• Until now we have estimated population
parameters using point estimates – a single
value.
• Last week we saw how these point estimates can
vary from sample to sample.
• If we now use our understanding of the variability
in the sampling distribution of the mean, we can
develop an interval estimate of the population
mean.
• We construct this with some specified level of
confidence (90%, 95% and 99% are common).
Confidence interval estimation
of the mean (σ known)
• A confidence interval allows us to make an
inference about the population based on data
from a sample.
• Just as each sample results in a different
point estimate of a parameter, each sample will
also result in a different confidence interval.
• The level of confidence is given by (1 − α) × 100%,
where α is the proportion of the distribution in
the tails, outside the confidence interval
(so for 95% confidence, α = 0.05).
Confidence interval estimation
of the mean (σ known)
• Our chosen confidence level indicates how
confident we can be (in the long run) that the
interval resulting from our sample contains the
true population parameter.
• For example, a 95% confidence level indicates
that we can expect (in the long run) that in 95
out of 100 samples, the resulting interval will
contain the true population parameter value.
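The long-run interpretation can be checked with a small simulation. The following is a minimal Python sketch, not part of the lecture material: the population values (μ = 150, σ = 24), the sample size, the number of trials and the function name `coverage` are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=150.0, sigma=24.0, n=40, level=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose interval contains mu (sigma known)."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value Z_{alpha/2}
    margin = z * sigma / n ** 0.5                   # half-width of each interval
    hits = 0
    for _ in range(trials):
        # Draw one sample from the assumed normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should be close to 0.95, and it varies slightly with the seed and the number of trials. Each simulated sample gives a different interval, and roughly 95 in every 100 of them contain μ.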
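Confidence interval estimation
of the mean (σ known)
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where Z_{α/2} is the standardised normal distribution
value which has an upper tail probability of α/2.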
Confidence interval estimation
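of the mean (σ known)
• For example, for a 95% confidence interval,
α/2 = 0.025 lies in each tail and 0.95/2 = 0.475
lies on each side of the mean, so the critical
values are Z = −1.96 and Z = +1.96.
[Figure: standard normal curve with area 0.475 on each side of 0 and 0.025 in each tail beyond Z = ±1.96]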
Confidence interval estimation
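of the mean (σ known)
• For a 99% confidence interval, α/2 = 0.005 lies
in each tail and 0.99/2 = 0.495 lies on each side
of the mean, so the critical values are
Z = −2.58 and Z = +2.58.
[Figure: standard normal curve with area 0.495 on each side of 0 and 0.005 in each tail beyond Z = ±2.58]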
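The critical values read from the normal table can also be computed directly. This is a minimal Python sketch using the standard library's `statistics.NormalDist`. The helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Value of Z with an upper tail probability of alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.4f}")
```

To two decimal places the 95% and 99% values agree with the table values of 1.96 and 2.58. The 90% value is approximately 1.645.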
Example 8-1
A lecturer is interested in the amount of time it
takes students to complete a particular
assignment.
It is known that the standard deviation is 24
minutes.
The lecturer takes a random sample of 40 students
and discovers that in this sample, the mean time to
complete the assignment is 150 minutes.
(a) Set up a 95% confidence interval estimate for the
population mean time taken to complete the
assignment.
Solution 8-1
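• We are given σ = 24, X̄ = 150 and n = 40.
• For a 95% confidence interval, Z = 1.96 and:
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 150 \pm 1.96 \times \frac{24}{\sqrt{40}} = 150 \pm 7.4377 \text{ (to 4 dec. pl.)}$$
$$142.56 \le \mu \le 157.44$$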
• So the lecturer can be 95% confident that the
true mean time taken to complete the
assignment is between 142.56 and 157.44
minutes.
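The calculation in part (a) can be reproduced with a few lines of Python. This is a sketch only. The function name `mean_ci_sigma_known` is hypothetical, and the critical value is computed rather than read from a table, so it is 1.95996... rather than exactly 1.96.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_sigma_known(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)              # amount added and subtracted
    return xbar - margin, xbar + margin

# Values from Example 8-1: sample mean 150, sigma 24, n = 40
lower, upper = mean_ci_sigma_known(150, 24, 40)
print(f"{lower:.2f} to {upper:.2f}")
```

Rounded to two decimal places this gives the same interval as the hand calculation, 142.56 to 157.44 minutes.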
Solution 8-1
(b) Recalculate your answer to (a) based on a
90% confidence interval.
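• For a 90% confidence interval, α/2 = 0.05 lies in
each tail and 0.90/2 = 0.45 lies on each side of the
mean, so the critical values are Z = −1.645 and
Z = +1.645.
[Figure: standard normal curve with area 0.45 on each side of 0 and 0.05 in each tail beyond Z = ±1.645]
Solution 8-1
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 150 \pm 1.645 \times \frac{24}{\sqrt{40}} = 150 \pm 6.2423 \text{ (to 4 dec. pl.)}$$
$$143.76 \le \mu \le 156.24$$
• So the lecturer can be 90% confident that the
true mean time taken to complete the
assignment is between 143.76 and 156.24
minutes.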
Solution 8-1
(c) It was the lecturer’s intention that this
assignment take students no more than two
hours to complete. Comment on how well the
sampled students reflect this goal. Based on
your results, what would you recommend to the
lecturer?
• Even the widest interval (95%: 142.56 to 157.44)
does not contain 120 minutes (2 hours).
• This sample seems to indicate that average
time taken to complete the assignment is much
greater than the lecturer intended.
Solution 8-1
• It seems unlikely that students are going to be
able to complete the assignment in 2 hours.
• Recommend that the lecturer reduces the
assignment or changes their expectations…
Some important points to note:
• We would expect in the long run that only 5% of
samples would result in an interval which does
not contain the true mean time to complete the
assignment.
Solution 8-1
• It is possible that this sample is one such case,
BUT, given how far outside the interval the
desired time fell, this seems unlikely.
• The confidence interval relates to mean
(average) time taken to complete the
assignment.
• It is likely that some individual students will be
able to complete the assignment within the two
hours, but this does not appear to be the case
for the ‘average’ student.
Solution 8-1
(d) Suppose an extra class was offered to
students which enabled them to better
understand the goal of the assignment.
This resulted in a change of the standard
deviation for all students to only 12 minutes.
What effect would this have on your answer to
(a)?
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 150 \pm 1.96 \times \frac{12}{\sqrt{40}} = 150 \pm 3.7188 \text{ (to 4 dec. pl.)}$$
$$146.28 \le \mu \le 153.72$$
Solution 8-1
• So the lecturer can be 95% confident that the true
mean time taken to complete the assignment is
between 146.28 and 153.72 minutes.
• Reducing the standard deviation results in a
narrower confidence interval.
• The extra class has enabled a more precise
estimate of the true mean time taken to complete the
assignment (due to student completion times
being less varied), BUT it has not
reduced the assignment completion time (in this
example)!!
Confidence interval estimation
of the mean (σ unknown)
• Just as we use X̄ to estimate μ, so we use S to
estimate σ.
• If X is a normally distributed random variable,
then
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows a t distribution with n − 1 degrees of
freedom.
• The t distribution (or Student’s t distribution) looks
very similar to the normal distribution (bell-shaped
and symmetrical) but has more area in
the tails and less in the centre.
Confidence interval estimation
of the mean (σ unknown)
• As the degrees of freedom increase, the t
distribution approaches the standardised normal
distribution.
• For sample sizes of 120 or more, there is little
difference between Z and t (in which case the
normal distribution is often used even when σ is
unknown).
• Critical values for the t distribution (see Table
E.3 in the text) depend on the degrees of
freedom and the confidence level.
Confidence interval estimation
of the mean (σ unknown)
• Check for yourself that you can find the critical t
values in the example that follows.
• Read the information on page 289 of the text
which discusses degrees of freedom.
• The confidence interval for the mean (σ unknown)
is given by
$$\bar{X} \pm t_{n-1}\,\frac{S}{\sqrt{n}}$$
or
$$\bar{X} - t_{n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1}\,\frac{S}{\sqrt{n}}$$
Example 8-2
A group of students are concerned that bags of
a particular brand of potato chips weigh less
than the 50 grams that the packaging claims.
They take a random sample of 20 bags and
discover that for this sample, the mean weight
is 49.2 grams and the standard deviation is 1.1
grams.
Calculate the 95% confidence interval based on
their sample data.
Do you think the students are justified in their
belief?
Example 8-2
What potential problems are there with the way
the students have conducted this experiment?
Solution
• We are given X̄ = 49.2, S = 1.1 and n = 20.
• Since the population standard deviation is
unknown (we have a sample value), we will use
the t distribution in the confidence interval
formula.
Solution 8-2
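We have a sample size of 20, so the degrees of
freedom are: n − 1 = 20 − 1 = 19.
[Figure: t distribution with 19 degrees of freedom; central area 1 − α = 0.95, area α/2 = 0.025 in each tail, critical values t = −2.0930 and t = +2.0930]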
Solution 8-2
$$\bar{X} \pm t_{n-1}\,\frac{S}{\sqrt{n}} = 49.2 \pm 2.0930 \times \frac{1.1}{\sqrt{20}} = 49.2 \pm 0.5148$$
$$48.69 \le \mu \le 49.71$$
• So the students can be 95% confident that the
true population mean weight of the bags of
chips is between 48.69 and 49.71 grams.
• Based on this sample, it appears that the mean
weight of bags is less than 50 grams.
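The same interval can be sketched in Python. The Python standard library has no t distribution, so this sketch takes the tabulated critical value (2.0930 for 19 degrees of freedom, as used above) as an input. The function name `mean_ci_sigma_unknown` is an assumption made for illustration.

```python
from math import sqrt

def mean_ci_sigma_unknown(xbar, s, n, t_crit):
    """Confidence interval for the mean using a tabulated t value (n - 1 df)."""
    margin = t_crit * s / sqrt(n)   # t_{n-1} * S / sqrt(n)
    return xbar - margin, xbar + margin

# Values from Example 8-2: mean 49.2 g, S = 1.1 g, n = 20, t = 2.0930
lower, upper = mean_ci_sigma_unknown(49.2, 1.1, 20, t_crit=2.0930)
print(f"{lower:.2f} to {upper:.2f}")
print("Interval contains 50 g:", lower <= 50 <= upper)
```

The interval comes out as 48.69 to 49.71 grams. The claimed weight of 50 grams lies outside it, which is the basis for the conclusion above.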
Solution 8-2
• We cannot be sure of this conclusion unless we
know that this sample is representative of the
population. For example,
– Were the bags of chips randomly selected?
(consider the randomness of the choice of store,
location in stores, location of store (town, city etc))
– Is a sample of 20 bags sufficiently large?
(Doubtful!!)
– Were the weights of the bags measured accurately?
– Are there factors which result in a change of the
weight of the bags after packing (such as settling,
moisture content changes etc)?
• What other factors can you think of that might
affect the results?
Confidence interval estimation
of the proportion
• Recall that when both np and n(1-p) are at least 5,
the binomial distribution can be approximated by
the normal distribution.
• In this case, the confidence interval estimate for
the population proportion is:
$$p_s \pm Z\sqrt{\frac{p_s(1-p_s)}{n}}$$
or
$$p_s - Z\sqrt{\frac{p_s(1-p_s)}{n}} \le p \le p_s + Z\sqrt{\frac{p_s(1-p_s)}{n}}$$
Confidence interval estimation
of the proportion
where
p_s = sample proportion = X/n = (number of successes)/(sample size)
p = population proportion
Z = critical value from standard normal distribution
n = sample size
Example 8-3
A financial advice firm has been recommending
a particular investment opportunity to a large
number of its clients.
They surveyed a random sample of 500 clients
who took advantage of the investment and
discover that 408 of them are glad they made
the investment.
Construct both a 95% and 99% confidence
interval for the population proportion of clients
who are glad they made the investment.
Solution 8-3
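$$p_s = \frac{408}{500} = 0.816, \quad n = 500$$
• Both n p_s = 408 and n(1 − p_s) = 92 are at least 5,
so the normal distribution is appropriate.
• For the 95% confidence interval, Z = 1.96 and
$$p_s \pm Z\sqrt{\frac{p_s(1-p_s)}{n}} = 0.816 \pm 1.96\sqrt{\frac{0.816(1-0.816)}{500}} = 0.816 \pm 0.033964 \text{ (to 6 dec. pl.)}$$
$$0.782 \le p \le 0.850 \text{ (to 3 dec. pl.)}$$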
Solution 8-3
• So we can say with 95% confidence that the true
population proportion of clients who were happy with
the investment is between 0.782 and 0.850.
• For a 99% confidence interval, Z = 2.58 and
$$p_s \pm Z\sqrt{\frac{p_s(1-p_s)}{n}} = 0.816 \pm 2.58\sqrt{\frac{0.816(1-0.816)}{500}} = 0.816 \pm 0.044708 \text{ (to 6 dec. pl.)}$$
$$0.771 \le p \le 0.861 \text{ (to 3 dec. pl.)}$$
• So we can say with 99% confidence that the true
population proportion of clients who were happy with
the investment is between 0.771 and 0.861.
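Both intervals can be reproduced with a short Python sketch. The function name `proportion_ci` is hypothetical, and the critical values are computed (1.95996... and 2.57583...) rather than taken from the table, so the margins differ slightly from the hand calculation in later decimal places.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_s = successes / n
    # Check the rule of thumb for using the normal approximation
    if n * p_s < 5 or n * (1 - p_s) < 5:
        raise ValueError("normal approximation not appropriate")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_s * (1 - p_s) / n)
    return p_s - margin, p_s + margin

# Values from Example 8-3: 408 satisfied clients out of 500
for level in (0.95, 0.99):
    lower, upper = proportion_ci(408, 500, level)
    print(f"{level:.0%}: {lower:.3f} to {upper:.3f}")
```

To three decimal places the results match the hand calculations: 0.782 to 0.850 at 95% and 0.771 to 0.861 at 99%.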
Determining sample size
• Sample size has a huge impact in statistical
analyses.
• The chosen size is based on a balance
between accuracy and cost.
• The statistician decides how big a sampling
error is acceptable in estimating each of the
parameters.
Determining sample size
• Recall that a confidence interval for the mean is
found via:
$$\bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$$
• The amount added to or subtracted from the sample
mean is half the width of the interval; this represents
the amount of imprecision resulting from sampling error.
• The sampling error is therefore:
$$e = Z\,\frac{\sigma}{\sqrt{n}}$$
• Rearranging then gives an expression for n, the
sample size.
Determining sample size
• The sample size required to construct the
confidence interval estimate for the mean is:
$$n = \frac{Z^2\sigma^2}{e^2}$$
• To determine sample size you must know:
– the desired confidence level (which
determines Z)
– the acceptable sampling error (e)
– the standard deviation (σ)
Example 8-4
A consumer watchdog organisation is
interested in the mean amount charged per
hour by accountants for their services.
Based on studies in other similar countries, the
standard deviation is believed to be $12.75.
The organisation wants to estimate the mean
amount charged per hour to within ±$4 with
95% confidence.
What sample size is needed?
If 99% confidence were required, what would
the required sample size be?
Solution 8-4
• We are given: σ = 12.75 and e = 4,
and based on 95% confidence, Z = 1.96.
• The sample size is therefore:
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{1.96^2 \times 12.75^2}{4^2} = 39.03$$
which rounds up to 40.
• Important note: we round up to the next whole
number when determining sample size.
• So a sample of 40 accountants should be taken
to be 95% confident that the estimate of the
mean is within ±$4 of the true mean.
Solution 8-4
• Based on 99% confidence, Z = 2.58.
• The sample size is therefore:
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{2.58^2 \times 12.75^2}{4^2} = 67.63$$
which rounds up to 68.
• So a sample of 68 accountants should be taken
to be 99% confident that the estimate of the
mean is within ±$4 of the true mean.
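The sample size formula for the mean is easy to sketch in Python. The function name `sample_size_mean` is an assumption. The critical value is computed rather than rounded to 1.96 or 2.58, which changes the unrounded result slightly but not the final answer in this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, e, confidence=0.95):
    """Smallest n so that the sampling error for the mean is at most e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = Z^2 * sigma^2 / e^2, always rounded UP to a whole number
    return ceil((z * sigma / e) ** 2)

# Values from Example 8-4: sigma = 12.75, e = 4
print(sample_size_mean(12.75, 4, 0.95))
print(sample_size_mean(12.75, 4, 0.99))
```

The two calls give 40 and 68, in agreement with the solution. Using `ceil` rather than ordinary rounding reflects the rule of always rounding up.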
Determining sample size
• Recall that a confidence interval for the proportion
is found via:
$$p_s \pm Z\sqrt{\frac{p_s(1-p_s)}{n}}$$
• Sampling error is therefore:
$$e = Z\sqrt{\frac{p(1-p)}{n}}$$
• Rearranging this gives the sample size required to
construct the confidence interval estimate for the
proportion:
$$n = \frac{Z^2\,p(1-p)}{e^2}$$
Determining sample size
• To determine sample size you must know:
– the desired confidence level (which determines Z)
– the acceptable sampling error (e)
– the true proportion of successes (p)
• Unfortunately we don’t usually know p (as that is
normally what we are trying to estimate!!)
• We can therefore:
– use past information or relevant experiences to provide
an educated estimate of p
– use p = 0.5 as this results in the largest sample size
(often referred to as the ‘most conservative estimate’).
Example 8-5
The same group of students that were
discussed in Example 8-2 discovered a flaw in
the process of random selection of chip bags.
They decide to conduct the experiment again
and this time work out what sample size will be
needed to accurately estimate the population
proportion of chip bags that are underweight.
If they wish to be 95% confident that their
estimated proportion is within ±0.025 of the
population proportion, determine the required
sample size.
Solution 8-5
• We have Z=1.96 (95% confidence) and e=0.025.
• Since we have no information about p, the most
conservative estimate (p=0.5) is chosen.
• The sample size should be
$$n = \frac{Z^2\,p(1-p)}{e^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.025^2} = 1536.64$$
which rounds up to 1537.
• So the students need to sample 1537 bags of
chips in order for their estimate of the proportion
to be within ±0.025 of the population proportion
with 95% confidence.
Solution 8-5
• Note we could reduce the sample size
dramatically by:
– allowing a larger sampling error (e.g. e = 0.05 reduces
the sample size to 385)
– using a better estimate of p (perhaps based on results
of the first survey if these are sufficiently reliable – we
weren’t given this information, however, and so can’t
try this here)
• But, care is needed – these options might
reduce the effectiveness of the survey!!!
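The effect of the choices discussed above can be explored with a minimal Python sketch. The function name `sample_size_proportion` is hypothetical, and the value p = 0.2 in the last call is purely an illustrative assumption, not an estimate taken from the students' survey.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(e, p=0.5, confidence=0.95):
    """Smallest n so that the sampling error for a proportion is at most e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = Z^2 * p * (1 - p) / e^2, rounded up; p = 0.5 is the most conservative
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

print(sample_size_proportion(0.025))          # Example 8-5
print(sample_size_proportion(0.05))           # larger sampling error
print(sample_size_proportion(0.025, p=0.2))   # hypothetical prior estimate of p
```

The first two calls give 1537 and 385. Any value of p other than 0.5 gives a smaller required sample, which is why p = 0.5 is called the most conservative estimate.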
Applications of confidence
interval estimation in auditing
• Auditing makes use of statistical sampling in
order to estimate a population total amount.
• The point estimate for the population total is
given by:
$$\text{Total} = N\bar{X}$$
where N is the population size.
• The confidence interval estimate for the total is:
$$N\bar{X} \pm N\,t_{n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
Example 8-6
A budget eyewear store (which sells frames for
glasses, lens cleaner, cases, cloths etc) is
conducting the end of quarter inventory of its
stock.
It was determined that there were 1296 items in
stock of which a sample of 100 was randomly
selected.
An audit was conducted which found that the
mean value of the merchandise in the sample
was $196 and the standard deviation was
$67.50.
Example 8-6
Based on this information, find the 95%
confidence interval estimate of the total
estimated value of the merchandise in inventory
at the end of the quarter.
Solution
• We are given
N = 1296, n = 100, X̄ = 196, S = 67.50
and given 95% confidence we can find:
t_{n−1, α/2} = t_{99, 0.025} = 1.9842
Example 8-6
• The 95% confidence interval estimate will be:
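$$N\bar{X} \pm N\,t_{n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 1296 \times 196 \pm 1296 \times 1.9842 \times \frac{67.5}{\sqrt{100}}\sqrt{\frac{1296-100}{1296-1}}$$
$$= 254016 \pm 16681.1092 \text{ (4 dec. pl.)}$$
$$\$237,334.89 \le \text{Population total} \le \$270,697.11$$
• So the store can be 95% confident that the
population total merchandise value will be
between $237,334.89 and $270,697.11.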
Example 8-6
• Note that this sample has resulted in quite wide
bounds on the total merchandise value, caused by a
relatively large standard error.
• This result may indicate that the store must add up
every item in stock or, perhaps, just take a
larger sample!
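The interval for the total, including the finite population correction factor, can be checked with a short Python sketch. The function name `total_ci` is an assumption, and the t value (1.9842 for 99 degrees of freedom) is supplied from the table rather than computed.

```python
from math import sqrt

def total_ci(N, n, xbar, s, t_crit):
    """Confidence interval for a population total with finite population correction."""
    point = N * xbar                           # point estimate of the total
    fpc = sqrt((N - n) / (N - 1))              # finite population correction factor
    margin = N * t_crit * s / sqrt(n) * fpc    # amount added and subtracted
    return point - margin, point, point + margin

# Values from Example 8-6
lower, point, upper = total_ci(N=1296, n=100, xbar=196, s=67.50, t_crit=1.9842)
print(f"{point:,.2f} plus or minus {upper - point:,.2f}")
print(f"{lower:,.2f} to {upper:,.2f}")
```

This gives a point estimate of $254,016 with a margin of about $16,681.11. Increasing n in the call shows how a larger sample narrows the interval.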
Applications of confidence
interval estimation in auditing
• Difference estimation is used in auditing when
there are believed to be errors in a set of items
being audited.
• It allows the estimation of the magnitude of the
errors based on a sample.
• The average difference is:
$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}$$
where D_i = audited value − original value
Applications of confidence
interval estimation in auditing
• The standard deviation of the difference is:
$$S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n-1}}$$
• The confidence interval estimate for the total
difference is:
$$N\bar{D} \pm N\,t_{n-1}\,\frac{S_D}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
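Difference estimation can be sketched in Python as follows. Everything in the example call is hypothetical: the list of differences (100 sampled items, 5 of which contain errors), the population size of 1296 and the function name `total_difference_ci` are invented for illustration. The t value of 1.9842 is the tabulated 95% value for 99 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def total_difference_ci(differences, N, t_crit):
    """Confidence interval for the total difference (audited minus original)."""
    n = len(differences)
    d_bar = mean(differences)        # average difference
    s_d = stdev(differences)         # sample standard deviation (divisor n - 1)
    fpc = sqrt((N - n) / (N - 1))    # finite population correction factor
    point = N * d_bar
    margin = N * t_crit * s_d / sqrt(n) * fpc
    return point - margin, point, point + margin

# Hypothetical audit sample: items without errors have a difference of zero
differences = [12.50, -3.00, 7.25, 20.00, 4.75] + [0.0] * 95
lower, point, upper = total_difference_ci(differences, N=1296, t_crit=1.9842)
print(f"Estimated total difference: {point:.2f}")
print(f"Interval: {lower:.2f} to {upper:.2f}")
```

Items with no error must be included as zeros, otherwise both the average difference and its standard deviation would be overstated.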
Applications of confidence
interval estimation in auditing
• Often organisations are interested in
determining the maximum allowable proportion
of a certain event (such as a non-complying
item).
• This requires a one-sided confidence interval
for a proportion:
$$\text{Upper bound} = p_s + Z\sqrt{\frac{p_s(1-p_s)}{n}}\sqrt{\frac{N-n}{N-1}}$$
where Z is the standard normal value with a right-hand tail probability of α.
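A one-sided upper bound can be sketched in Python as below. The audit figures in the example call (10 non-complying items in a sample of 100 from a population of 1000) and the function name `proportion_upper_bound` are hypothetical. Note that the whole of α sits in the right-hand tail, so for 95% confidence Z is about 1.645 rather than 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_upper_bound(successes, n, N, confidence=0.95):
    """One-sided upper confidence bound for a proportion, with finite population correction."""
    p_s = successes / n
    z = NormalDist().inv_cdf(confidence)   # right-hand tail probability of alpha
    fpc = sqrt((N - n) / (N - 1))          # finite population correction factor
    return p_s + z * sqrt(p_s * (1 - p_s) / n) * fpc

# Hypothetical audit: 10 non-complying items in a sample of 100 from 1000
bound = proportion_upper_bound(10, 100, 1000)
print(f"Upper bound: {bound:.4f}")
```

For these assumed figures the bound is roughly 0.147, so the auditor could be 95% confident that no more than about 14.7% of items are non-complying.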
Ethical issues and confidence
interval estimation
Ethical issues to consider:
• Are confidence intervals included with point
estimates? (Can you find examples in the
media where this has not occurred?)
• Is the sample size stated?
• Is the confidence interval interpreted so that a
non-statistician can clearly understand it?
• Is every effort made to avoid ambiguity or
misleading conclusions?
After the lecture each week…
• Review the lecture material
• Complete all readings
• Complete all of the recommended problems
(listed in SG) from the textbook
• Complete at least some of the additional problems
• Consider (briefly) the discussion points prior to
tutorials