
Estimation and Confidence Intervals
Chapter Nine
McGraw-Hill/Irwin
© 2006 The McGraw-Hill Companies, Inc., All Rights Reserved.
A point estimate is a single value (statistic) used to estimate a
population value (parameter).
E.g., X̄ (the sample mean) is a point estimate of μ
We cannot be sure that the point estimate equals the population mean. But we can
calculate an interval around this estimate and assert with a certain
level of confidence that the true population mean lies inside it.
A Confidence Interval is a range of values within which the
population parameter (e.g., μ) is expected to occur, at a specified
level of confidence generally expressed as a percent.
[Figure: Level of confidence; Confidence Interval]
Let us recall from Chapter 8 that …
•The best estimator of μ is X̄
•The SD of the sampling distribution of X̄ is σ/√n
Almost any X̄ you calculate from a sample (about 99.7% of them) will lie within
3(σ/√n) of μ (based on the Empirical Rule)
[Figure: sampling distribution of X̄, centered at μ with SD σ/√n; the points μ − 3(σ/√n) and μ + 3(σ/√n) are marked]
How much width around X̄?
From Chapter 8,
Sampling Error = X̄ – μ
We also know from Chapter 8,
Z = (X̄ – μ) / (σ/√n)
Combining the two,
Sampling Error, X̄ – μ = Z·(σ/√n)
So, if we add and subtract the above sampling error to and from X̄, we
can estimate the range (called the CI) within which μ is likely to lie:
$\bar{X} - Z\frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z\frac{\sigma}{\sqrt{n}}$
If σ is not known and n > 30, the sample standard deviation s is used in its place.
CI for the population mean μ is:
$\bar{X} \pm z \frac{s}{\sqrt{n}}$
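The large-sample formula above can be sketched as a small Python function. This is a minimal illustration, not a textbook routine. The function name `z_interval` and its `confidence` parameter are hypothetical. The z value comes from the standard library's `statistics.NormalDist` rather than a printed table.

```python
from statistics import NormalDist


def z_interval(x_bar: float, s: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample CI for the population mean: x_bar ± z * s / sqrt(n)."""
    # Two-sided critical value: leave (1 - confidence) / 2 of the area in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / n ** 0.5  # the sampling-error term z * (s / sqrt(n))
    return x_bar - margin, x_bar + margin


# Hypothetical numbers: mean 100, s = 15, n = 36
print(z_interval(100, 15, 36))
```

With these inputs the margin is about 1.96 × 2.5 ≈ 4.9, so the interval runs from roughly 95.1 to 104.9.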
Problem (page 250)
The AM Association wants info on the mean income of managers working in the
retail industry. A random sample of 256 managers had a mean of $45420 with a
standard deviation of $2050. What is the interval in which the population mean
is expected to lie at a 95% confidence level?
Since Z for 95% is 1.96*, the formula for the CI can be rewritten as:
$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$
= 45420 ± 1.96 (2050 / √256) = 45420 ± 251
So, the CI is $45169 - $45671
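The arithmetic of the manager-income problem can be checked with a few lines of Python. This sketch simply re-runs the slide's numbers. It uses z = 1.96 from the normal table, as the slide does.

```python
import math

x_bar = 45420   # sample mean income ($)
s = 2050        # sample standard deviation ($)
n = 256         # sample size
z = 1.96        # z for 95% confidence (normal table)

standard_error = s / math.sqrt(n)   # 2050 / 16 = 128.125
margin = z * standard_error         # about 251
lower, upper = x_bar - margin, x_bar + margin
print(f"95% CI: ${lower:,.0f} to ${upper:,.0f}")
```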
*See next slide
Why use Z = 1.96 for a CI at 95%?
Because the area under the standard normal curve between Z = −1.96 and Z = +1.96 is 95%
(see Appendix D)
Question: What would be the value of Z for CI at 99%?
Z = 2.58!
Notice that the CI widens when the confidence level is increased from 95% to 99%.
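The table lookup for Z can be reproduced from the standard normal distribution. For a central area equal to the confidence level, take the quantile at (1 + confidence)/2. The helper name `z_critical` below is hypothetical. `statistics.NormalDist().inv_cdf` is the standard-library inverse CDF.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided z whose central area under the standard normal curve equals `confidence`."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.2f}")
```

Rounded to two decimals, this gives 1.64, 1.96, 2.33 and 2.58. The values grow with the confidence level, which is why the interval widens.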
What does a CI at a 95% level of confidence mean?
It means that if we drew many samples and built an interval from each one,
about 95% of those intervals would contain the population mean μ.
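The "95% of the intervals" interpretation can be checked with a small simulation, in the spirit of the software experiment on the next slide. The assumptions here are deliberately simple. The population is normal with a known σ, and μ = 50, σ = 10 and n = 30 are hypothetical values. The function name `simulate_coverage` is also hypothetical.

```python
import random
import statistics


def simulate_coverage(mu=50.0, sigma=10.0, n=30, z=1.96, trials=5000, seed=1):
    """Fraction of simulated intervals x_bar ± z*sigma/sqrt(n) that contain the true mu."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    margin = z * sigma / n ** 0.5  # same width for every sample because sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


print(simulate_coverage())  # a value close to 0.95
```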
Try experimenting with the
Visual Statistics software
How do we increase our confidence?
1. Widen the interval (Z)
Let us say, based on past exams, I claim with 75% confidence that in
the coming test, the class average (μ ) will be between 70-80 points.
If I want to raise my confidence to 95%, I can do two things:
1) widen the CI from 70-80 to 60-90
2) increase n to reduce dispersion of the distribution
2. Increase the sample size (n)
A larger n squeezes the area (and therefore the probability) into a
narrower peak, so the level of confidence will be a high percentage
even with a narrower interval.
[Figure: sampling distribution of X̄ around μ, with SD = σ/√n]
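The two levers above both act on the half-width Z·σ/√n, and a short sketch makes the trade-off visible. The population SD of 10 is a hypothetical value, and the helper name `margin_of_error` is illustrative.

```python
import math


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """Half-width of the CI: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population SD

# Lever 1: a larger z (higher confidence) widens the interval for a fixed n.
for z in (1.96, 2.58):
    print(f"z = {z}, n = 100: ±{margin_of_error(z, sigma, 100):.2f}")

# Lever 2: a larger n narrows the interval for a fixed confidence level.
for n in (25, 100, 400):
    print(f"z = 1.96, n = {n}: ±{margin_of_error(1.96, sigma, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.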
t-Distribution
Use t-distribution when:
•n < 30 (e.g., you are crash-testing expensive autos!)
•only s is known (i.e., σ is unknown)
•the underlying population is approximately normal
X t
s
n
In general, if you see n < 30 (and σ unknown) in an exam problem,
you must think t-distribution!
The Story of t-Distribution
Once upon a time, there was a statistician called Gosset …
When you don’t know σ, you have to use s instead. But the problem is that when n is
small (n < 30), s varies widely from sample to sample and is not a good estimator of σ.
Gosset created a new distribution called ‘t’ that spreads the area under the curve
wider when n is small, and that gets closer and closer to the normal as n increases;
beyond about 30 the two are nearly the same!
[Figure: Z vs. t distribution for n = 5; 95% critical values Z = 1.96 and t = 2.776. Compare with Chart 9-2 in the text (page 255).]
Visual Statistics Demo
Using the Continuous Distribution module,
observe how the ±1.96 (95%) in Z is stretched outward to ±2.776 in t to keep
the area under the curve the same, at 0.95, when the sample size is only 5.
Look at it this way: since n is small, we are not sure s would be a good
estimate of σ, so we play it safe by widening the CI for the same confidence
level.
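Python's standard library has no t distribution. In practice one would call a library routine such as SciPy's `scipy.stats.t.ppf`. The sketch below avoids that dependency. It builds the t density from its formula, integrates it with Simpson's rule, and finds the two-sided critical value by bisection. The names `t_pdf`, `t_cdf` and `t_critical` are hypothetical helpers. The approach assumes x ≥ 0 and a critical value below 100.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function when df is large
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: 0.5 (left half, by symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical t: the central area between -t and +t equals `confidence`."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0  # assumed bracket for the upper quantile
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 1000):
    print(f"df = {df}: t = {t_critical(0.95, df):.3f}  (z = {z:.3f})")
```

For df = 4 (n = 5) this reproduces the 2.776 on the slide. As df grows, t shrinks toward z = 1.960.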
Practice! (problem on page 256)
A tire manufacturer wishes to investigate the tread life of its tires. A
sample of 10 tires driven 50000 miles revealed a sample mean of
0.32 inch of tread remaining with a standard deviation of 0.09 inch.
Construct a 95% CI for the population mean.
What is the formula to be used?
$\bar{X} \pm t \frac{s}{\sqrt{n}}$
What is the value of t for df = 9* and a 95% CI (page 498)? t = 2.262
What is the 95% CI?
= 0.32 ± 2.262 ( 0.09 / √10) = 0.32 ± 0.064
= 0.256 to 0.384
*df = (n -1)
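The tire-tread interval can be re-derived in Python. This sketch takes t = 2.262 from the t table for df = 9, exactly as the slide does, instead of computing it.

```python
import math

x_bar = 0.32   # sample mean tread remaining (inches)
s = 0.09       # sample standard deviation (inches)
n = 10         # number of tires tested
df = n - 1     # degrees of freedom = 9
t = 2.262      # t for df = 9 at 95% confidence (t table)

margin = t * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"df = {df}, margin = {margin:.3f}, 95% CI: {lower:.3f} to {upper:.3f} inch")
```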
Degrees of Freedom
You are in a room with 10 chairs and you are sitting in one of them. The
other chairs are empty. How many other chairs can you move to?
Ans: 9
So in general, df = n-1
CI for a population proportion
•So far we have studied variables measured on a ratio scale, for which we can
calculate means, e.g., a manager’s $ income and tire wear
•What if we have to work with a nominal scale variable where
values are categorized into one of two groups?
E.g., the CSUN career center reports that 75% of its graduates get a
job related to their major.
You cannot calculate the mean of Yes & No’s.
But, you can calculate a proportion of students who said Yes.
Getting the job in your major can be termed as ‘success’;
if the student got a job in a different field, then it is a
‘failure’.
So, the binomial distribution formulas we studied in Chapter
6 can be used to describe the sampling distribution of a
sample proportion (a random variable)!
The mean number of successes in a binomial distribution is nπ [Ch 6; Page 167]
The SD of a binomial distribution is √(nπ(1 − π)) [Page 167]
Binomial Distribution (See Page 170)
No. of heads (successes) in 10 trials of throwing a coin
Mean (expected number of heads) = 5 [notice the peak at X = 5]
If the X-axis is redrawn as X/10 (i.e., the proportion of successes), the
curve squeezes by a factor of 10, and so does its SD.
[Figure: the same distribution with the X-axis relabeled as X/n, running from 0 to 1.0]
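The coin example can be verified directly from the binomial probabilities. This sketch assumes a fair coin (π = 0.5) and n = 10, as on the slide. It computes the mean and SD of the count X, then shows that rescaling to the proportion X/n divides both by n.

```python
import math

n, pi = 10, 0.5  # 10 coin tosses, P(heads) = 0.5

# Exact binomial probabilities P(X = k) for k = 0..n
pmf = [math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]

mean_count = sum(k * p for k, p in enumerate(pmf))
sd_count = math.sqrt(sum((k - mean_count) ** 2 * p for k, p in enumerate(pmf)))

# Redrawing the axis as X/n divides both the mean and the SD by n.
mean_prop = mean_count / n
sd_prop = sd_count / n

print(mean_count, round(sd_count, 4))  # 5.0 1.5811, i.e. n*pi and sqrt(n*pi*(1-pi))
print(mean_prop, round(sd_prop, 4))    # 0.5 0.1581, i.e. pi and sqrt(pi*(1-pi)/n)
```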
Estimating population proportion
Here, we focus on the proportion of successes; so, we divide the
number of successes, x, by the total number of trials, n.
Note: p = x/n is the sample proportion, our estimate of π; its SD is $\sqrt{\frac{p(1-p)}{n}}$
CI for the population proportion π
σp = √(p(1 − p)/n)
π will almost certainly lie within 3σp of p
(Empirical Rule)
CI = p ± Z·√(p(1 − p)/n)
(Note the pattern: CI = sample statistic ± (Z for the confidence level) × (SD of the sampling distribution))
A sample of 500 executives who own their own home revealed 175
planned to sell their homes and retire to Arizona. Develop a 98%
confidence interval for the proportion of executives that plan to sell
and move to Arizona.
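Here p = 175/500 = .35 and, for 98% confidence, z = 2.33:
$.35 \pm 2.33\sqrt{\frac{(.35)(.65)}{500}} = .35 \pm .0497$
So, the CI is about .3003 to .3997.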
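The proportion interval follows the same "statistic ± Z × SD" pattern and can be sketched in Python. The function name `proportion_interval` is hypothetical. It uses z = 2.33 from the table, matching the slide. It also refuses to run when np or n(1 − p) falls below 5, the usual rule of thumb for the normal approximation to the binomial.

```python
import math


def proportion_interval(x: int, n: int, z: float) -> tuple[float, float, float]:
    """CI for a population proportion: p ± z * sqrt(p(1 - p)/n). Returns (p, lower, upper)."""
    p = x / n
    # The normal approximation is reasonable only if n*p and n*(1 - p) are both at least 5.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("n*p and n*(1-p) should both be >= 5")
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


p, lower, upper = proportion_interval(175, 500, z=2.33)  # z for 98% (table)
print(f"p = {p:.2f}, 98% CI: {lower:.4f} to {upper:.4f}")
```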
A word of caution
The normal approximation to the binomial works well when the following
two conditions are satisfied:
np ≥ 5 and n(1 − p) ≥ 5.
Here is why: (see page 170)
Calculating the sample size
3 factors affect the sample size:
•The level of confidence desired
•The margin of error the researcher will tolerate.
•The variability in the population being studied.
The formula for estimated sample size is:
$n = \left(\frac{z s}{E}\right)^2$
where
n is the size of the sample
E is the allowable error
z is the z- value corresponding to the selected level of confidence
(for 99%, from Appendix, Z=2.58)
s is the sample standard deviation from the pilot survey
P(r)oof !
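$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ [Ch 8; Page 235]
$\bar{X} - \mu = Z \frac{s}{\sqrt{n}}$
$E = Z \frac{s}{\sqrt{n}}$ (the allowable error E is the sampling error)
$E^2 = \frac{Z^2 s^2}{n}$
$n = \frac{Z^2 s^2}{E^2}$
$n = \left(\frac{Z s}{E}\right)^2$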
A utility company would like to estimate the mean monthly
electricity charge for a single family house within $5 using a 99%
level of confidence. The standard deviation is estimated to be
$20.00. How large a sample is required?
$n = \left(\frac{(2.58)(20)}{5}\right)^2 = (10.32)^2 = 106.5 \approx 107$
(Round up, so that the error stays within $5.)
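The sample-size formula is a one-liner, but the rounding matters. Rounding down would give a margin of error slightly larger than E. The sketch below, with the hypothetical name `sample_size_mean`, always rounds up with `math.ceil`.

```python
import math


def sample_size_mean(z: float, s: float, e: float) -> int:
    """n = (z * s / E)^2, rounded UP so the margin of error is at most E."""
    return math.ceil((z * s / e) ** 2)


print(sample_size_mean(z=2.58, s=20.0, e=5.0))  # 107 (raw value 106.5)
```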


The formula for
determining the
sample size in the case
of a proportion is
$n = p(1-p)\left(\frac{Z}{E}\right)^2$
[You can derive this by rearranging
Formula 9-6 on page 262]
where
p is the estimated proportion, based on past
experience or a pilot survey
z is the z value associated with the degree of
confidence selected
E is the maximum allowable error the
researcher will tolerate
Study the example worked out on page 267
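The proportion version can be sketched the same way. The inputs below are hypothetical and not from the text: no pilot estimate is available, so p = 0.5 is used (this value maximizes p(1 − p) and therefore gives the largest, most cautious n), with 95% confidence (z = 1.96) and E = 0.05. The name `sample_size_proportion` is illustrative.

```python
import math


def sample_size_proportion(p: float, z: float, e: float) -> int:
    """n = p(1 - p)(z / E)^2, rounded up to the next whole unit."""
    return math.ceil(p * (1 - p) * (z / e) ** 2)


# Hypothetical inputs: p = 0.5 (no prior estimate), z = 1.96 (95%), E = 0.05
print(sample_size_proportion(p=0.5, z=1.96, e=0.05))  # 385 (raw value 384.16)
```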
Finite Population Correction
If the population is finite (i.e., its size N is known), multiply
the SD of the sampling distribution by the following factor:
$\sqrt{\frac{N-n}{N-1}}$
N, population size
n, sample size
When n is small relative to N, the value of the factor is close to 1.
As n gets larger, the value of the correction factor gets smaller;
the logic is that if the sample is a substantial percentage of the
population, the estimate is more precise, so its SD shrinks (Table 9-1, p. 264)
Rule of thumb: Ignore correction factor if n/N < 0.05
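The correction factor and the rule of thumb can be sketched together in Python. The function names `fpc` and `standard_error` are hypothetical. The population sizes in the loop are made-up values chosen to show the factor moving toward 1 as N grows relative to n = 50.

```python
import math
from typing import Optional


def fpc(N: int, n: int) -> float:
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def standard_error(s: float, n: int, N: Optional[int] = None) -> float:
    """s / sqrt(n), multiplied by the FPC unless N is unknown or n/N < 0.05."""
    se = s / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= fpc(N, n)
    return se


# Hypothetical populations, each sampled with n = 50
for N in (100, 500, 10_000):
    print(f"N = {N}: n/N = {50 / N:.3f}, FPC = {fpc(N, 50):.4f}")
```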