Transcript Ch 11

Chapter 11 Problems of Estimation
11.1 Estimation of means
11.2 Estimation of means (unknown variance)
11.3 Skip
11.4 Estimation of proportions
11.1 The Estimation of Means
How do we estimate the population mean μ and
standard deviation σ from sample data $x_1, x_2, \ldots, x_n$?
We usually use the sample mean $\bar{x}$ to estimate μ
and the sample standard deviation s to estimate σ.
$\bar{x}$ and s are called point estimates.
Point estimate of the mean
• For a given sample, the sample mean, which is
the point estimate of the population mean,
is a single number.
• Since sample means fluctuate from sample
to sample, we must expect an error.
• A point estimate alone does not tell us about
the possible size of the error.
Interval Estimate—Confidence intervals
• An interval estimate consists of an interval which
will contain the quantity it is supposed to estimate
with a specified probability (or degree of
confidence).
• Recall that for large random samples from infinite
populations, the sampling distribution of the mean
is approximately a normal distribution with
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
• So we will utilize some properties of normal
distribution to explain a confidence interval.
For a standard normal curve
[Figure: standard normal curve with area 1 - α between $-z_{\alpha/2}$ and $z_{\alpha/2}$]
Define $z_{\alpha/2}$ to be such that $P(Z > z_{\alpha/2}) = \alpha/2$. Hence the area
under the standard normal curve between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is
equal to 1 - α.
1 - α     α/2       z_{α/2}
0.80      0.10      1.282
0.90      0.05      1.645
0.95      0.025     1.96
0.98      0.010     2.326
0.99      0.005     2.576
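The tabulated critical values can be reproduced numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice and does not come from the text.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the point with upper-tail area alpha/2,
    where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Print one row per confidence level used in the table.
for conf in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.2f}  {z_critical(conf):.3f}")
```

The function takes the central area 1 - α as its argument, since that is how the table is indexed.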
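For X normal with mean μ and standard deviation σ,
[Figure: distribution of $\bar{x}$, with area 1 - α between $\mu - z_{\alpha/2}(\sigma/\sqrt{n})$ and $\mu + z_{\alpha/2}(\sigma/\sqrt{n})$]
With probability 1 - α, $\bar{x}$ deviates from μ by no more than
$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
This is called the maximum error of estimate with probability 1 - α.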
For X normal with mean μ and standard deviation σ,
[Figure: distribution of $\bar{x}$, with area 0.95 between $\mu - 1.96(\sigma/\sqrt{n}) \approx \mu - 2(se)$ and $\mu + 1.96(\sigma/\sqrt{n}) \approx \mu + 2(se)$]
The probability is 0.95 that $\bar{x}$ will differ from μ by at most
$E = 1.96 \cdot \frac{\sigma}{\sqrt{n}}$
that is, it will be “off” either way by at most 1.96 standard
errors of the mean.
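Maximum error E with probability 1 - α
• With probability 0.95, $\bar{x}$ deviates from μ by no
more than
$E = 1.96 \cdot \frac{\sigma}{\sqrt{n}}$
(approximately 2 standard errors away from the true value)
• Maximum error E for common probabilities:
Probability    Maximum error E
0.80           $1.282 \cdot \sigma/\sqrt{n}$
0.90           $1.645 \cdot \sigma/\sqrt{n}$
0.95           $1.96 \cdot \sigma/\sqrt{n}$
0.99           $2.576 \cdot \sigma/\sqrt{n}$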
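As a rough illustration of how the maximum error changes with the probability level and the sample size, here is a small sketch. The function name `max_error` and the values σ = 1.0, n = 100 and n = 400 are hypothetical and chosen only for the demonstration.

```python
from math import sqrt
from statistics import NormalDist

def max_error(sigma, n, confidence=0.95):
    """Maximum error E = z_{alpha/2} * sigma / sqrt(n) for a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

# Hypothetical inputs: sigma = 1.0, n = 100.
for conf in (0.80, 0.90, 0.95, 0.99):
    print(f"{conf:.2f}  E = {max_error(1.0, 100, conf):.4f}")

# Quadrupling n (hypothetical n = 400) halves the maximum error.
print(max_error(1.0, 400) / max_error(1.0, 100))
```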
Maximum error E with probability 1 - α (continued)
• The maximum error depends on both the
confidence level and sample size!
• You can determine the sample size according to
the confidence level and the maximum error.
Sample size for estimating μ
• How large must our sample be to keep our
error no more than E with probability 1 - α?
$z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \le E$
$z_{\alpha/2} \cdot \sigma \le E \sqrt{n}$
$\frac{z_{\alpha/2} \cdot \sigma}{E} \le \sqrt{n}$
$n \ge \frac{z_{\alpha/2}^2 \sigma^2}{E^2}$
As σ² increases, n increases.
As E decreases, n increases.
As our error probability α
decreases, n increases.
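The three trends can be checked numerically. This is a minimal sketch under the known-σ assumption; `sample_size_for_mean` is a hypothetical helper name, and the inputs (σ = 2.0 or 4.0, E = 0.5 or 0.25) are made-up values used only to show the direction of each change.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence=0.95):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / E) ** 2)

base = sample_size_for_mean(sigma=2.0, E=0.5)                        # reference case
bigger_sigma = sample_size_for_mean(sigma=4.0, E=0.5)                # larger sigma
smaller_E = sample_size_for_mean(sigma=2.0, E=0.25)                  # tighter error bound
smaller_alpha = sample_size_for_mean(sigma=2.0, E=0.5, confidence=0.99)  # smaller alpha

print(base, bigger_sigma, smaller_E, smaller_alpha)
```

`ceil` is used because a fractional n must be rounded up to guarantee the error bound.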
Confidence Interval for Means
After computing the sample mean $\bar{x}$, find a range of values
such that 95% of the time the resulting range includes the
true value μ.
For σ known and $\bar{x}$ normal,
$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
$P\left(\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$
or $\bar{x} \pm 1.96(SE_{\bar{x}})$ is a 95% confidence interval for μ.
Degree of Confidence
The degree of confidence states the probability
that the interval will give a correct answer.
• If you use 95% confidence interval often, in the long
run 95% of your intervals will contain the true
parameter value.
• When the method is applied once, you do not know if
your interval gave a correct value (95% of the time)
or not (5% of the time).
Example 11.1
• Suppose we measure specific gravity of a
metal, and σ=0.025.
• Send each of you into the lab to take n=25
measurements. The standard error of the mean is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.025}{5} = 0.005$
Example 11.1 (continued)
• 95% CI for the mean:
$\bar{x} - 1.96(0.005) < \mu < \bar{x} + 1.96(0.005)$
• If the true value is 2, then about 95% of
students will find this is true:
$\bar{x} - 1.96(0.005) < 2 < \bar{x} + 1.96(0.005)$
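The "about 95% of students" claim can be illustrated by simulation. The sketch below assumes each measurement is normally distributed around the true value 2 with σ = 0.025; the number of simulated students (10,000), the seed, and the function name `coverage` are arbitrary choices made for the illustration.

```python
import random
from math import sqrt

def coverage(true_mean=2.0, sigma=0.025, n=25, students=10_000, seed=0):
    """Fraction of simulated students whose 95% interval contains true_mean."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / sqrt(n)    # 1.96 * 0.005
    hits = 0
    for _ in range(students):
        # One student's sample mean from n simulated measurements.
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if xbar - half_width < true_mean < xbar + half_width:
            hits += 1
    return hits / students

print(coverage())
```

The printed fraction should be close to 0.95, though it varies a little with the seed and the number of simulated students.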
Confidence Intervals
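100(1 - α)% CI: $\bar{X} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$
Confidence    Interval
80%           $\bar{X} \pm 1.282 \cdot \sigma/\sqrt{n}$
90%           $\bar{X} \pm 1.645 \cdot \sigma/\sqrt{n}$
95%           $\bar{X} \pm 1.96 \cdot \sigma/\sqrt{n}$
99%           $\bar{X} \pm 2.576 \cdot \sigma/\sqrt{n}$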
Example 11.2
• X = breaking strength of a fish line.
σ=0.10. In a random sample of size n=10,
$\bar{x} = 10.3$.
Find a 95% confidence interval for μ, the
true average breaking strength.
Solution:
• Standard error of the mean:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.10}{\sqrt{10}} = 0.0316$
• Critical value = 1.96; maximum error is
$1.96 \, \sigma_{\bar{x}} = 0.062$
• CI:
$10.3 \pm 0.062$,
from 10.24 to 10.36
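The same arithmetic can be written as a short function. This is only a sketch of the known-σ interval; the name `z_interval` is an illustrative choice, and the critical value 1.96 is passed in as a default rather than computed.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z=1.96):
    """Confidence interval xbar +/- z * sigma / sqrt(n) for a known sigma."""
    se = sigma / sqrt(n)   # standard error of the mean
    E = z * se             # maximum error
    return xbar - E, xbar + E

# Fish line example: xbar = 10.3, sigma = 0.10, n = 10.
low, high = z_interval(10.3, 0.10, 10)
print(f"from {low:.2f} to {high:.2f}")
```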
Example 11.2 (continued)
• How large a sample size is needed in order
to get a maximum error no more than
0.01 with 95% probability if the sample
mean is used to estimate the true mean?
• Solution
$n = \frac{z_{\alpha/2}^2 \sigma^2}{E^2} = \frac{1.96^2 \times 0.10^2}{0.01^2} = 384.16$
n=385, always round up!
11.2 Estimation of Means (unknown variance)
• A sample of size n:
$x_1, x_2, \ldots, x_n$
from a normal population with mean μ and
standard deviation σ.
• If σ is known, with probability 1 - α
$\bar{x} - z_{\alpha/2} \cdot \sigma/\sqrt{n} < \mu < \bar{x} + z_{\alpha/2} \cdot \sigma/\sqrt{n}$
If σ is unknown
• Estimate σ by the sample standard deviation s.
• The estimated standard error of the mean will be
$SE = s/\sqrt{n}$
• Using the estimated standard error we have a confidence
interval of
$\bar{x} \pm$ ____ $(s/\sqrt{n})$
• The multiplier needs to be bigger than $z_{\alpha/2}$ (e.g., 1.96). The
confidence interval needs to be wider to take into account
the added uncertainty in using s to estimate σ.
• The correct multipliers were figured out by a Guinness
Brewery worker.
What is the correct multiplier? “t”
• 100(1 - α)% confidence interval when σ is
unknown
$\bar{x} \pm t_{\alpha/2}(s/\sqrt{n})$
• 95% CI = 100(1 - 0.05)% confidence interval when
σ is unknown
$\bar{x} \pm t_{0.025}(s/\sqrt{n})$
Properties of t distribution
• The value of $t_{\alpha/2}$ depends on how much
information we have about σ. The amount
of information we have about σ depends on
the sample size.
• This information is measured in “degrees of freedom”,
and for a sample from one normal
population this will be: df=n-1.
t curve and z curve
Both the standard normal curve N(0,1) (the z distribution), and all
t(k) distributions are density curves, symmetric about a mean of 0,
but t distributions have more probability in the tails.
As the sample size increases, the extra tail probability decreases and the t distribution
more closely approximates the z distribution. By n = 1000 they are
virtually indistinguishable from one another.
Critical values of t distribution
• t table is given in the book (p. 497)
$P(t > t_{\alpha}) = \alpha$
• It depends on the degrees of freedom as well
df     alpha    t
5      0.10     1.476
10     0.05     1.812
20     0.01     2.528
25     0.025    2.060
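For readers without the table at hand, the entries can be approximated numerically. The sketch below is not the book's method: it integrates the t density with Simpson's rule and finds the critical value by bisection, using only the standard library. The function names, the step count, and the search bracket [0, 200] are assumptions made for this illustration; in practice a statistics library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(t > x) for x >= 0, using Simpson's rule on [0, x] and symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Approximate t_alpha such that P(t > t_alpha) = alpha, by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid   # tail area too small: move left
    return (lo + hi) / 2

for df, alpha in ((5, 0.10), (10, 0.05), (20, 0.01), (25, 0.025)):
    print(df, alpha, round(t_critical(alpha, df), 3))
```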
Areas under the curve
• The area between $-t_{\alpha/2}$ and $t_{\alpha/2}$ is 1 - α
$P(-t_{\alpha/2} < t < t_{\alpha/2}) = 1 - \alpha$
$P\left(-t_{\alpha/2} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha$
Confidence interval for the mean
when σ is unknown
• With probability 1 - α
$\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
• Maximum error
$E = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$
Example (ex. 11.16, p 273)
• Noise level, n=12
74.0 78.6 76.8 75.5 73.8 75.6
77.3 75.8 73.9 70.2 81.0 73.9
1. Point estimate for the average noise level
of vacuum cleaners;
2. 95% Confidence interval
Solution
• n=12, $\bar{x} = 75.53$, $s = 2.75$
• Critical value with df=11:
$t_{0.025} = 2.201$
• 95% CI:
$75.53 \pm 2.201 \cdot \frac{2.75}{\sqrt{12}} = 75.53 \pm 1.75$
$73.78 < \mu < 77.28$
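The summary statistics and the interval can be checked from the raw data. This is a minimal sketch using the standard-library `statistics` module; the critical value 2.201 is taken from the solution rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Noise levels of the 12 vacuum cleaners from the example.
noise = [74.0, 78.6, 76.8, 75.5, 73.8, 75.6,
         77.3, 75.8, 73.9, 70.2, 81.0, 73.9]

n = len(noise)
xbar = mean(noise)      # point estimate of the mean
s = stdev(noise)        # sample standard deviation (divides by n - 1)
t_crit = 2.201          # t_{0.025} with df = 11, as given in the solution
E = t_crit * s / sqrt(n)
low, high = xbar - E, xbar + E

print(f"xbar = {xbar:.2f}, s = {s:.2f}, E = {E:.2f}")
print(f"{low:.3f} < mu < {high:.3f}")
```

Because the code carries full precision while the hand calculation rounds $\bar{x}$, s, and E first, the endpoints may differ slightly in the last digit.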
11.4 The Estimation of Proportions
• Notation:
1. μ, σ: mean and standard deviation
2. p: proportion = probability of a success
Consider count data:
n=# of trials, p=probability of a success
Estimate of p
• $X_i$ = 0 or 1, with probability 1-p or p
• Mean of $X_i$ = p: population mean
• X = sum of $X_i$
• Sample proportion (mean) X/n estimates p:
$\hat{p} = \frac{X}{n}$
Example 11.4
• Toss a coin 100 times and you get 45 heads
• Estimate p=probability of getting a head
Solution:
$\hat{p} = \frac{45}{100} = 0.45$
Is the coin a balanced one?
Estimate of p
$\hat{p}$ = sample proportion
$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
If np≥5 and n(1-p)≥5, then p̂ is approximately
normal.
Maximum error
• We have (1 - α)100% confidence that the error in
our estimate is at most
$E = z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \le z_{\alpha/2} \sqrt{\frac{\frac{1}{2} \cdot \frac{1}{2}}{n}}$
(worst case is p=1/2.)
CI
• An approximate 100(1 - α)% confidence
interval for p is
$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
Sample Size
• The sample size required to have
probability 1 - α that our error is no more
than E is
$n = p(1 - p) \left(\frac{z_{\alpha/2}}{E}\right)^2 \le \frac{1}{2} \cdot \frac{1}{2} \left(\frac{z_{\alpha/2}}{E}\right)^2$
Since p is unknown, you have to substitute an estimate for it
in the formula.
Maximize p(1-p) to get the sample size
• If you don’t have any prior information
about p, then
Maximum p(1-p)=1/4
$n = \frac{z_{\alpha/2}^2}{4E^2}$
If you know p is somewhere …
• If p ≤ 0.3 then
maximum p(1-p)=0.3(1-0.3)=0.21
$n = 0.21 \cdot \frac{z_{\alpha/2}^2}{E^2}$
• If p ≥ 0.6 then
maximum p(1-p)=0.6(1-0.6)=0.24
$n = 0.24 \cdot \frac{z_{\alpha/2}^2}{E^2}$
How to estimate the maximum
• Estimate p(1-p) by substituting for p the
value in its known range that is closest to 0.5:
p in (0, 0.1): use p=0.1
p in (0.3, 0.4): use p=0.4
p in (0.6, 1.0): use p=0.6
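The rule "use the value closest to 0.5" amounts to clamping 0.5 into the known range. Here is a small sketch; `worst_case_p` is a hypothetical helper name, and the range endpoints are treated as attainable for simplicity.

```python
def worst_case_p(low, high):
    """Value in [low, high] closest to 0.5; it maximizes p(1 - p) on that range."""
    return min(max(0.5, low), high)

# The three ranges listed above, plus one that contains 0.5.
for low, high in ((0.0, 0.1), (0.3, 0.4), (0.6, 1.0), (0.2, 0.8)):
    p = worst_case_p(low, high)
    print(f"({low}, {high}): p = {p}, p(1-p) = {p * (1 - p):.2f}")
```

When the range contains 0.5, the result is 0.5 itself, which matches the no-prior-information case with p(1-p)=1/4.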
Example 11.4 (continued)
• 95% CI for p
$0.45 \pm 1.96 \sqrt{\frac{0.45(0.55)}{100}} = 0.45 \pm 0.0975$
• 0.3525<p<0.5475 with 95% confidence
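The coin example can be reproduced with a few lines of code. This sketch implements the large-sample (normal approximation) interval for a proportion; the function name `proportion_interval` is an illustrative choice.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Approximate interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    E = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

# 45 heads in 100 tosses.
low, high = proportion_interval(45, 100)
print(f"{low:.4f} < p < {high:.4f}")
print("0.5 inside the interval:", low < 0.5 < high)
```

Since 0.5 falls inside the interval, these data give no strong reason to doubt that the coin is balanced.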
Example 11.5 (example 11.13 in text)
• A state highway dept wants to estimate
what proportion of all trucks operating
between two cities carry too heavy a load
• It wants to assert with 95% probability that the error is
no more than 0.04
• Sample size needed if
1. p is between 0.10 and 0.25
2. no idea what p is
Solution
1. E=0.04, $z_{0.025} = 1.96$, p=0.25
$n = 0.25(0.75) \cdot \frac{1.96^2}{0.04^2} = 450.19$
Round up to get n=451
2. E=0.04, $z_{0.025} = 1.96$, p(1-p)=1/4
$n = \frac{1.96^2}{4 \times 0.04^2} = 600.25$
n=601
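Both parts of the solution follow from one formula, so they can be checked with a single helper. This is a sketch only; `sample_size_for_proportion` is a hypothetical name, and p defaults to 0.5 to cover the case where nothing is known about p.

```python
from math import ceil

def sample_size_for_proportion(E, z=1.96, p=0.5):
    """Smallest integer n with z * sqrt(p * (1 - p) / n) <= E."""
    return ceil(p * (1 - p) * (z / E) ** 2)

n_with_range = sample_size_for_proportion(0.04, p=0.25)  # part 1: p at most 0.25
n_no_info = sample_size_for_proportion(0.04)             # part 2: worst case p = 0.5
print(n_with_range, n_no_info)
```

Knowing that p is at most 0.25 saves 150 observations compared with the worst case.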