34 Confidence Intervals


“Teach A Level Maths”
Statistics 1
Confidence Intervals
© Christine Crisp
AQA
Normal Distribution diagrams in this presentation have been drawn using FX Draw
( available from Efofex at www.efofex.com )
"Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with
permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages"
We know that the best estimate of a population mean
is the mean of a random sample. The larger the
sample size, the better the estimate.
We also saw that the standard error of the distribution
of sample means gives a measure of the accuracy of the
estimate and that poor estimates occur rarely.
All of these comments are vague, so the final thing we
want to do is give some numerical indication of accuracy.
We do this by
• giving an interval of values within which the population
mean is likely to lie and
• saying how likely it is that the interval contains the
mean.
We’ll start by looking again at our weights of hens’ eggs and the distribution of sample means taken from the population.

The standard deviation of the population is $\sigma$.
The standard deviation of the sample means (the standard error) is $\dfrac{\sigma}{\sqrt{n}}$.

The distribution of sample means is given approximately by
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

[Diagram: the distribution of $\bar{X}$, centred on $\mu = 60$, with the central 95% shaded and the upper limit $a$ marked.]

Suppose we want to know the limits within which 95% of the sample means, $\bar{x}$, lie. We want to find $a$, the upper limit; symmetry will then give the lower limit.
Using the table “Percentage points of the Normal Distribution”, we can find the z value
$$z = 1.96$$
(Remember the table uses $\Phi(z)$, the area to the left of $z$, so we need 0·975, not 0·95.)

[Diagram: the standard Normal distribution $Z$, with 97.5% of the area to the left of $z$ and the central 95% lying between $-1.96$ and $1.96$.]

So 95% of the Z distribution lies within $\pm 1.96$ of the mean, 0.

We now have to convert from $z$ to $a$. As we are dealing with the distribution of sample means, the usual formula
$$z = \frac{x - \mu}{\sigma} \quad\text{becomes}\quad z = \frac{a - \mu}{\sigma/\sqrt{n}}$$
so
$$1.96 = \frac{a - \mu}{\sigma/\sqrt{n}} \;\Rightarrow\; a - \mu = 1.96\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; a = \mu + 1.96\frac{\sigma}{\sqrt{n}}$$
[Diagram: the distribution of $\bar{X}$, centred on $\mu = 60$, with the central 95% shaded and the upper limit $a = \mu + 1.96\dfrac{\sigma}{\sqrt{n}}$ marked.]

So, 95% of the sample means (the $\bar{X}$ distribution) lie within the interval
$$\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}$$
This inequality gives an interval within which $\bar{x}$ lies. However, we want to know how good our estimate of $\mu$ is, so we need an interval for $\mu$. We just need to rearrange.
$$\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}$$
Dealing with the 2 parts separately:
$$\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} \;\Rightarrow\; \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
and
$$\bar{x} \le \mu + 1.96\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu$$
Now putting them together again:
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
Since the inequality expresses an interval, we often write
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
The interval
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
is called the 95% confidence interval for the estimate of the population mean using the mean of a sample of size $n$.
The percentage of confidence tells us that if we had 100 samples, each of size $n$, and we formed an interval for each, we would expect 95 of the intervals to contain the population mean.
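The interval is straightforward to compute once $\bar{x}$, $\sigma$ and $n$ are known. The Python sketch below is illustrative only. The function name `mean_ci` and the example inputs are our own choices, and the sketch assumes the population standard deviation $\sigma$ is known, as in the derivation above. The default `z = 1.96` gives the 95% interval.

```python
import math

def mean_ci(xbar: float, sigma: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (xbar - z*sigma/sqrt(n), xbar + z*sigma/sqrt(n)).

    Assumes the population standard deviation sigma is known.
    """
    standard_error = sigma / math.sqrt(n)   # sd of the distribution of sample means
    margin = z * standard_error             # half-width of the interval
    return (xbar - margin, xbar + margin)

# Illustrative values: xbar = 50, sigma = 4, n = 16, so the standard error is 1
lo, hi = mean_ci(xbar=50.0, sigma=4.0, n=16)
print(round(lo, 2), round(hi, 2))  # 48.04 51.96
```

The interval is symmetric about $\bar{x}$, and only the `z` argument changes for other confidence levels.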
The hens’ eggs data had a mean of 60 and a standard deviation of $\sigma = 2.94$. Suppose we take a sample of size 5 and it has mean $\bar{x} = 61.0$. The interval is
$$\left(61 - 1.96\frac{2.94}{\sqrt{5}},\ 61 + 1.96\frac{2.94}{\sqrt{5}}\right) = (58.4,\ 63.6)$$

The 95% confidence interval (c.i.) is (58·4, 63·6).
This statement says that there is a probability of 0·95 that an interval constructed in this way contains $\mu$; the particular interval from 58·4 to 63·6 either contains $\mu$ or it does not. So, we expect a similar statement to be true 95 times out of 100 (and wrong 5 times out of 100!).
The diagram that follows shows this c.i. and those for several other samples of size 5 from the hens’ eggs data.
[Diagram: 95% confidence intervals for several samples of size 5 from the hens’ eggs data, drawn against the population mean, $\mu = 60$. The 1st sample is 56·2, 57·8, 61·4, 62·5, 67·2, with $\bar{x} = 61.0$; the c.i.s for the 1st, 2nd and later samples are marked.]
The 4th sample has a c.i. that doesn’t span the population mean.
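The interval for the 1st sample can be reproduced from its five values. Here is a minimal Python sketch, assuming (as above) that the population standard deviation $\sigma = 2.94$ is known.

```python
import math

# 1st sample of hens' egg weights, as listed for the diagram
sample = [56.2, 57.8, 61.4, 62.5, 67.2]
sigma = 2.94   # known population standard deviation
z = 1.96       # 95% confidence

n = len(sample)
xbar = sum(sample) / n                 # 61.02, quoted as 61.0
margin = z * sigma / math.sqrt(n)      # 1.96 * standard error
lower, upper = xbar - margin, xbar + margin
print(f"mean {xbar:.1f}, c.i. ({lower:.1f}, {upper:.1f})")  # mean 61.0, c.i. (58.4, 63.6)
```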
The diagram that follows shows the confidence intervals for
100 samples. It was copied from the software package
“Autograph 3” using “Autograph resources”, “Extras”.
[Diagram: 95% confidence intervals for 100 samples of size 5 from a Normal distribution.]
N.B. If we have only one sample and one c.i., it could be any of the intervals in the diagram.
When I took another 100 samples, again of size 5, and again drew the 95% confidence intervals, I got the following:
[Diagram: a second set of 95% confidence intervals for 100 samples of size 5.]
Why are there only 4 intervals that don’t include $\mu$?
ANS: As we are dealing with samples, although on average 5 intervals in 100 will miss $\mu$, any one set of 100 samples could give more or fewer than 5.
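The “95 out of 100” idea can also be seen by simulation. The sketch below is our own, not the Autograph resource. It assumes a Normal population with $\mu = 60$ and $\sigma = 2.94$ (the hens’ eggs figures), draws samples of size 5, and counts how many 95% intervals contain $\mu$. The seed and the number of repetitions are arbitrary choices.

```python
import math
import random

MU, SIGMA, N, Z = 60.0, 2.94, 5, 1.96

def interval_contains_mu(rng: random.Random) -> bool:
    """Draw one sample of size N, form the 95% c.i. and check whether it contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    margin = Z * SIGMA / math.sqrt(N)
    return xbar - margin <= MU <= xbar + margin

rng = random.Random(2024)  # arbitrary seed, for repeatability

# One set of 100 intervals: the number that miss MU varies around 5.
misses_in_100 = sum(not interval_contains_mu(rng) for _ in range(100))

# Many intervals: the proportion containing MU settles close to 0.95.
trials = 20_000
coverage = sum(interval_contains_mu(rng) for _ in range(trials)) / trials
print(misses_in_100, round(coverage, 3))
```

Changing the seed changes `misses_in_100`, just as each new set of 100 samples in the diagrams gives a different count.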
SUMMARY
• The 95% confidence interval (c.i.) for an estimate of the population mean is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
where
  • $\bar{x}$ is the sample mean,
  • $\dfrac{\sigma}{\sqrt{n}}$ is the standard deviation of the distribution of sample means, the standard error, and
  • 1·96 is the z value corresponding to the central 95% of a Normal distribution.
• If the population does not have a Normal distribution, we need a large sample, $n > 30$.
The 95% confidence interval (c.i.) for an estimate of the population mean is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
Now suppose we can’t accept being wrong 5% of the time and want to be wrong only 1% of the time.
Exercise
What will happen to the width of the c.i.?
Which part of the formula will change and what will it change to?
What will happen to the width of the c.i.?
ANS: It will increase.
Which part of the formula will change and what will it change to?
ANS: The z value, 1·96, changes. Instead of the z value that gives the central 95% of the distribution, we want the one that gives the central 99%, so we need the area to the left of z to equal 0·995.
[Diagram: the Z distribution with the central 99% shaded between $-z$ and $z$.]
Using p = 0·995 in the table: z = 2·5758
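The z values can be checked without the printed table by using Python’s standard-library `statistics.NormalDist`, whose `inv_cdf` method inverts the cumulative distribution function $\Phi$. In this sketch, `z_for_level` is a hypothetical helper of our own. For a central confidence level $c$ it looks up the left-hand area $(1 + c)/2$, exactly as we did with 0·995 for 99%.

```python
from statistics import NormalDist

def z_for_level(level: float) -> float:
    """z such that the central `level` of N(0, 1) lies between -z and z."""
    left_area = (1 + level) / 2          # e.g. 0.99 -> 0.995, as in the table
    return NormalDist().inv_cdf(left_area)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_for_level(level), 4))
```

To 4 d.p. these agree with the table values 1·2816, 1·6449, 1·9600 and 2·5758.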
e.g. 1(a) Find the 80% confidence interval for the mean of a population with standard deviation 5 using a random sample of size 40 with mean 20.
(b) In (a), did we need to assume the population has a Normal distribution?
Solution:
(a) The formula for a c.i. for $\mu$ is $\left(\bar{x} - z\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z\dfrac{\sigma}{\sqrt{n}}\right)$.
Standard error: $\dfrac{\sigma}{\sqrt{n}} = \dfrac{5}{\sqrt{40}} = 0.7906$
Find the z value: the central 80% leaves 90% of the area to the left of z, so $z = 1.2816$.
[Diagram: the Z distribution with the central 80% shaded and 90% of the area to the left of z.]
$$z\frac{\sigma}{\sqrt{n}} = 1.2816 \times 0.7906 = 1.01 \text{ (3 s.f.)}$$
$\Rightarrow$ the 80% c.i. for $\mu$ is
$$(20 - 1.01,\ 20 + 1.01) = (19.0,\ 21.0)$$
(b) No. The sample size, 40, is large, so by the Central Limit Theorem the distribution of sample means is approximately Normal even if the population is not.
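The arithmetic of e.g. 1(a) can be checked in a few lines. This sketch uses `statistics.NormalDist().inv_cdf(0.90)` in place of the table value 1·2816 and keeps the standard error unrounded. The limits still agree with the worked answer to 3 s.f.

```python
import math
from statistics import NormalDist

sigma, n, xbar = 5, 40, 20
z = NormalDist().inv_cdf(0.90)         # 80% c.i. -> left-hand area 0.90, z about 1.2816
standard_error = sigma / math.sqrt(n)  # about 0.7906
margin = z * standard_error            # about 1.01
print(f"({xbar - margin:.2f}, {xbar + margin:.2f})")  # (18.99, 21.01), i.e. (19.0, 21.0) to 3 s.f.
```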
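e.g. 2. Find the width of the 95% c.i. for the population mean of a variable with variance 16 using a random sample of size 40.
Solution: The formula for the 95% c.i. for $\mu$ is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
[Diagram: a number line from $\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}$, with $\bar{x}$ in the middle and each half of length $1.96\dfrac{\sigma}{\sqrt{n}}$.]
so the width is $2 \times 1.96\dfrac{\sigma}{\sqrt{n}}$ (which doesn’t depend on $\bar{x}$).
With $\sigma = \sqrt{16} = 4$:
$$2 \times 1.96 \times \frac{4}{\sqrt{40}} = 2.48 \text{ (3 s.f.)}$$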
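Because the width $2 \times 1.96\sigma/\sqrt{n}$ does not involve $\bar{x}$, it can be computed before any sample is taken. Here is a minimal sketch; `ci_width` is our own illustrative name.

```python
import math

def ci_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Width 2 * z * sigma / sqrt(n); the sample mean cancels out."""
    return 2 * z * sigma / math.sqrt(n)

sigma = math.sqrt(16)                 # variance 16 -> standard deviation 4
print(round(ci_width(sigma, 40), 2))  # 2.48
```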
Exercise
1. Calculate the 95% confidence interval for each of the following samples taken from a Normal distribution:
(a) $\bar{x} = 15,\ \sigma = 3,\ n = 5$
(b) $\bar{x} = 15,\ \sigma = 3,\ n = 20$
2. Write down the width of the intervals found in 1(a) and (b). By what factor did the width of the c.i. change from (a) to (b), and why did it change by this amount?
3. What is the formula for a 90% c.i. for a population mean?
Solutions:
1. The formula for the 95% confidence interval for $\mu$ is
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
(a) $\bar{x} = 15,\ \sigma = 3,\ n = 5$
C.i. is $\left(15 - 1.96\dfrac{3}{\sqrt{5}},\ 15 + 1.96\dfrac{3}{\sqrt{5}}\right) = (12.4,\ 17.6)$
(b) $\bar{x} = 15,\ \sigma = 3,\ n = 20$
C.i. is (13·7, 16·3)
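Here is a quick numerical check of both intervals in question 1, as a sketch assuming $\sigma$ is known. The helper `mean_ci` is our own name.

```python
import math

def mean_ci(xbar, sigma, n, z=1.96):
    """95% c.i. for the mean when sigma is known (z = 1.96 by default)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

ci_a = mean_ci(15, 3, 5)    # part (a)
ci_b = mean_ci(15, 3, 20)   # part (b)
print([round(v, 1) for v in ci_a], [round(v, 1) for v in ci_b])  # [12.4, 17.6] [13.7, 16.3]
```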
2. Solution: The intervals were:
(a) (12·4, 17·6), n = 5
(b) (13·7, 16·3), n = 20
Widths: (a) 5·2
(b) 2·6
The width is halved.
In (b) the sample size, n, was 4 times larger than in (a) but, in finding the c.i., we divide by $\sqrt{n}$, so the result is divided by $\sqrt{4} = 2$.
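The factor of 2 is exact rather than a result of rounding. The width is $2 \times 1.96\sigma/\sqrt{n}$, so the ratio of the widths is $\sqrt{20}/\sqrt{5} = \sqrt{4} = 2$. This sketch uses our own helper `ci_width` to compare the unrounded widths. The values 5·2 and 2·6 above come from the rounded limits.

```python
import math

def ci_width(sigma, n, z=1.96):
    """Width of the c.i.: 2 * z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

width_a = ci_width(3, 5)    # about 5.26
width_b = ci_width(3, 20)   # about 2.63
ratio = width_a / width_b   # sqrt(20) / sqrt(5) = 2
print(round(width_a, 2), round(width_b, 2), round(ratio, 6))  # 5.26 2.63 2.0
```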
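3. What is the formula for a 90% c.i.?
Solution:
We want $\Phi(z) = 0.95 \Rightarrow z = 1.6449$
[Diagram: the Z distribution with the central 90% shaded and 5% in the upper tail.]
So, the 90% c.i. is
$$\left(\bar{x} - 1.6449\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.6449\frac{\sigma}{\sqrt{n}}\right)$$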
SUMMARY
• Formulae for some of the confidence intervals are:
90%: $\left(\bar{x} - 1.6449\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.6449\dfrac{\sigma}{\sqrt{n}}\right)$
95%: $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$
99%: $\left(\bar{x} - 2.5758\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 2.5758\dfrac{\sigma}{\sqrt{n}}\right)$
• Apart from the z value for the 95% interval, which you may want to remember, use a sketch and look up the z value in the table. Remember that the percentage for the c.i. is the middle area, while the table uses the left-hand area (e.g. the 90% c.i. uses 0·95).
SUMMARY (continued)
• Increasing the sample size by a factor of 4 divides the previously calculated standard error by 2, so it halves the width of the confidence interval.
e.g. 95% c.i., $\bar{x} = 15$, $\sigma = 3$:
n = 5: (12·4, 17·6)
n = 20: (13·7, 16·3)
• Taking large samples can be very expensive so, in practice, to reduce the width of the c.i., we may need to reduce the level of confidence instead of increasing the sample size.
Unknown Population Standard Deviation
It is quite likely that we won’t know the standard deviation of the population. In this case, we must estimate it from the sample.
We use
$$S = s\sqrt{\frac{n}{n-1}}$$
where $s$ is the standard deviation of the sample values (calculated with divisor $n$), and $S^2$ is the unbiased estimator of $\sigma^2$.
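Python’s `statistics` module provides both versions of the standard deviation. `pstdev` divides by $n$ (giving $s$) and `stdev` divides by $n - 1$ (giving $S$). Here is a small sketch checking $S = s\sqrt{n/(n-1)}$ on the sample 3, 6, 10, 14, 17.

```python
import math
from statistics import pstdev, stdev

data = [3, 6, 10, 14, 17]
n = len(data)

s = pstdev(data)   # divisor n: standard deviation of the sample values
S = stdev(data)    # divisor n - 1: square root of the unbiased variance estimate

print(round(s, 2), round(S, 2))                      # 5.1 5.7
print(math.isclose(S, s * math.sqrt(n / (n - 1))))   # True
```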
e.g. Find the 90% c.i. for the population mean using the following random sample from a variable with a Normal distribution:
3, 6, 10, 14, 17
Solution:
Using calculator functions:
$\bar{x} = 10$, $s = 5.10$ (sample values)
$S = 5.70$ (estimate of $\sigma$)
90% c.i. is
$$\left(\bar{x} - 1.6449\frac{S}{\sqrt{n}},\ \bar{x} + 1.6449\frac{S}{\sqrt{n}}\right) = \left(10 - 1.6449\frac{5.7}{\sqrt{5}},\ 10 + 1.6449\frac{5.7}{\sqrt{5}}\right) = (5.81,\ 14.19)$$
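Here is the same calculation in Python, as a sketch. `statistics.stdev` supplies $S$, which replaces the unknown $\sigma$, and 1·6449 is taken from the table as above.

```python
import math
from statistics import mean, stdev

data = [3, 6, 10, 14, 17]
n = len(data)
xbar = mean(data)      # 10
S = stdev(data)        # about 5.70, used in place of the unknown sigma
z = 1.6449             # 90% c.i., from the table

margin = z * S / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"({lower:.2f}, {upper:.2f})")  # (5.81, 14.19)
```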
Exercise
1. Potatoes are sold in bags marked “5 kg”. The weights can be assumed to be normally distributed. A random sample of 10 bags was weighed (weights in kg) and found to be as follows:
5·04, 5·21, 5·11, 4·82, 5·32, 5·41, 4·82, 4·89, 5·22, 5·23
(a) Calculate a 95% confidence interval for the population mean weight, giving the limits to two decimal places.
(b) Use the sample and the confidence interval to comment on the claim that the bags weigh 5 kg.
Solution:
(a) The 95% c.i. is given by $\left(\bar{x} - 1.96\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\dfrac{\sigma}{\sqrt{n}}\right)$, with $\sigma$ estimated by $S$.
5·04, 5·21, 5·11, 4·82, 5·32, 5·41, 4·82, 4·89, 5·22, 5·23
Sample mean: $\bar{x} = 5.11$
Estimate of the population standard deviation: $S = 0.2087$
Standard error $= \dfrac{S}{\sqrt{n}} = \dfrac{0.2087}{\sqrt{10}} = 0.066$ (3 d.p.)
95% c.i.: $(\bar{x} - 1.96 \times 0.066,\ \bar{x} + 1.96 \times 0.066) = (4.98,\ 5.24)$
(b) Only 3 values in the sample are less than 5 kg, and the sample mean is greater than 5 kg. Even so, the c.i. shows that the mean weight of the bags could be less than 5 kg.
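Here is a sketch of part (a) in Python, using `statistics.stdev` for $S$ and the table value 1·96. It keeps the unrounded mean (5·107) rather than 5·11, and the limits still round to the same two decimal places. It also counts the bags under 5 kg, the figure quoted in (b).

```python
import math
from statistics import mean, stdev

weights = [5.04, 5.21, 5.11, 4.82, 5.32, 5.41, 4.82, 4.89, 5.22, 5.23]  # kg
n = len(weights)
xbar = mean(weights)                  # about 5.107
S = stdev(weights)                    # about 0.2087
standard_error = S / math.sqrt(n)     # about 0.066
margin = 1.96 * standard_error
lower, upper = xbar - margin, xbar + margin
below_5 = sum(w < 5 for w in weights)  # number of bags lighter than 5 kg

print(f"({lower:.2f}, {upper:.2f})", below_5)  # (4.98, 5.24) 3
```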