Section 6-1, 6-2 - Los Rios Community College District


Chapter 6
Estimates and Sample Sizes
6-1 Estimating a Population Mean: Large Samples / σ Known
6-2 Estimating a Population Mean: Small Samples / σ Unknown
6-3 Estimating a Population Proportion
6-4 Estimating a Population Variance: will cover with Chapter 8
1
Overview
This chapter presents:
- Methods for estimating population means and proportions
- Methods for determining sample sizes
2
6-1
Estimating a Population Mean:
Large Samples / σ Known
3
Assumptions
- A large sample is defined as a sample with n > 30; this section also assumes σ is known.
- Data collected carelessly can be absolutely worthless, even if the sample is quite large.
4
Definitions
- Estimator: a formula or process for using sample data to estimate a population parameter
- Estimate: a specific value or range of values used to approximate some population parameter
- Point Estimate: a single value (or point) used to approximate a population parameter
The sample mean x̄ is the best point estimate of the population mean µ.
5
Definition
Confidence Interval
(or Interval Estimate)
a range (or an interval) of values used to
estimate the true value of the population
parameter
Lower # < population parameter < Upper #
As an example
Lower # < µ < Upper #
6
Definition
Why Confidence Intervals
A couple of points
1. Even though x̄ is the best estimate for µ and s is the best estimate for σ, they do not give us an indication of how good they are.
2. A confidence interval gives us a range of values based on
   a) variation of the sample data
   b) how accurate we want to be
3. The width of the range of values gives us an indication of how good the estimate is.
4. Half the width is called the Margin of Error (E). We will discuss how to calculate this later.
7
Definition
Degree of Confidence
(level of confidence or confidence coefficient)
Proportion of times that the confidence interval actually contains the population parameter
Degree of Confidence = 1 - α
often expressed as a percentage value, usually 90%, 95%, or 99%
So (α = 10%), (α = 5%), (α = 1%)
8
Interpreting a Confidence Interval
98.08 < µ < 98.32
Let: 1 - α = .95
Correct: We are 95% confident that the interval from 98.08 to 98.32 actually does contain the true value of µ. This means that if we were to select many different samples of sufficient size and construct the confidence intervals, 95% of them would actually contain the value of the population mean µ.
Wrong: There is a 95% chance that the true value of µ will fall between 98.08 and 98.32. (µ is a fixed value, not a random quantity, so a probability can be stated only for a sample statistic, not for a population parameter.)
9
Confidence Intervals from 20 Different Samples
Simulations
http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
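The "95% of them would contain µ" reading can be checked with a small simulation. The sketch below is only an illustration: the population values (µ = 98.2, σ = 0.62), the sample size, the number of trials, and the function name `coverage` are all assumptions chosen for the demo, not figures from these slides.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=98.2, sigma=0.62, n=106, level=0.95, trials=2000, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Critical value z_(alpha/2) for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population and average it.
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

print(coverage())  # a proportion close to 0.95
```

The proportion varies a little with the seed, which is the point: the 95% describes the long-run behavior of the procedure, not any single interval.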
10
Definition
Critical Value
The number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. The number zα/2 is a critical value that is a z score with the property that it separates an area of α/2 in the right tail of the standard normal distribution.
ENGLISH PLEASE!!!!
11
The Critical Value
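[Figure: standard normal curve centered at z = 0, with an area of α/2 in each tail, beyond -zα/2 on the left and zα/2 on the right]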
Found from calculator
12
Finding zα/2 for 95% Degree of Confidence
α = 5%
α/2 = 2.5% = .025
[Figure: standard normal curve with area .95 between the critical values -zα/2 and zα/2, and area .025 in each tail]
13
Finding zα/2 for 95% Degree of Confidence
α = 0.05
α/2 = 0.025
Use calculator to find a z score of 1.96
zα/2 = ±1.96
[Figure: standard normal curve with area .025 to the left of -1.96 and area .025 to the right of 1.96]
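As an alternative to the calculator, the critical value can be computed from the inverse of the standard normal cumulative distribution. This is a minimal sketch using Python's standard library; the function name `z_critical` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_(alpha/2) for a degree of confidence such as 0.95."""
    alpha = 1 - confidence
    # Area to the left of the positive critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 2))  # 1.96
```

The same call with 0.90 or 0.99 gives the critical values for the other common degrees of confidence.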
14
Finding zα/2 for other Degrees of Confidence
Examples:
1. 1 - α =
2. 1 - α =
3. 1 - α =
4. 1 - α =
5. 1 - α = (will use on test for ease of calculation)
Find critical value and sketch
15
Definition
Margin of Error
is the maximum likely difference observed between the sample mean x̄ and the true population mean µ.
denoted by E
[Figure: interval from the lower limit to the upper limit, with µ marked]
16
Confidence Interval (or Interval Estimate) for Population Mean µ
(Based on Large Samples: n > 30)
x̄ - E < µ < x̄ + E
where
E = zα/2 · σ/√n
17
When can we use zα/2?
- If n > 30 and we know σ
- If n ≤ 30, the population must have a normal distribution and we must know σ.
Knowing σ is largely unrealistic.
18
Round-Off Rule for Confidence
Intervals Used to Estimate µ
1. When using the original set of data, round the
confidence interval limits to one more decimal place than
used in original set of data.
2. When the original set of data is unknown and only the
summary statistics (n, x̄, s) are used, round the
confidence interval limits to the same number of decimal
places used for the sample mean.
19
Example: A study recorded the starting salaries of 100 college graduates who have taken a statistics course. The sample mean was $43,704 and the sample standard deviation was $9,879. Find the margin of error E and the 95% confidence interval. (Because n = 100 is large, the sample standard deviation is used as σ.)
n = 100
x̄ = 43704
σ = 9879
1 - α = 0.95
α = 0.05
α/2 = 0.025
zα/2 = 1.96
E = zα/2 · σ/√n = 1.96 · 9879/√100 = 1936.3
x̄ - E < µ < x̄ + E
43704 - 1936.3 < µ < 43704 + 1936.3
$41,768 < µ < $45,640
Based on the sample provided, we are 95% confident the population
(true) mean of starting salaries is between 41,768 & 45,640.
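The salary calculation can be reproduced in a few lines. This is a sketch under the slide's assumptions (σ treated as known, large sample); the function name `z_interval` is illustrative. It uses the unrounded critical value, so E differs from the hand calculation in the first decimal place, while the limits agree to the nearest dollar.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (lower, upper, E) for a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # E = z_(alpha/2) * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

lower, upper, e = z_interval(43704, 9879, 100, 0.95)
print(round(lower), round(upper))  # 41768 45640
```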
20
TI-83 Calculator
Finding Confidence intervals using z
1. Press STAT
2. Cursor to TESTS
3. Choose ZInterval
4. Choose Input: STATS*
5. Enter σ, x̄, n, and the confidence level
6. Cursor to Calculate
*If your input is raw data, enter the data in L1, then choose Input: DATA
21
Width of Confidence
Intervals
Test Question
What happens to the width of confidence
intervals with changing confidence
levels?
22
Finding the Point Estimate and E
from a Confidence Interval
Point estimate of µ:
x̄ = [(upper confidence interval limit) + (lower confidence interval limit)] / 2
Margin of Error:
E = (upper confidence interval limit) - x̄
23
Example
Find x̄ and E
26 < µ < 40
x̄ = (40 + 26) / 2 = 33
E = 40 - 33 = 7
Use for #4 on hw
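The two formulas above translate directly into code. A minimal sketch, with the function name `from_interval` chosen only for illustration:

```python
def from_interval(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    xbar = (upper + lower) / 2  # midpoint of the interval
    margin = upper - xbar       # half the width of the interval
    return xbar, margin

print(from_interval(26, 40))  # (33.0, 7.0)
```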
24
Sample Size for Estimating Mean µ
E = zα/2 · σ/√n
(solve for n by algebra)
n = [zα/2 · σ / E]²
zα/2 = critical z score based on the desired degree of confidence
E = desired margin of error
σ = population standard deviation
25
Example: If we want to estimate the mean weight of plastic discarded by households in one week, how many households must be randomly selected to be 99% confident that the sample mean is within 0.25 lb of the true population mean? (A previous study indicates the standard deviation is 1.065 lb.)
α = 0.01
zα/2 = 2.575
E = 0.25
σ = 1.065
n = [zα/2 · σ / E]² = [(2.575)(1.065) / 0.25]² = 120.3, rounded up to 121 households
If n is not a whole number, round it up to the next higher whole number.
We would need to randomly select 121 households to be
99% confident that this mean is within 1/4 lb of the
population mean.
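The sample-size formula and the round-up rule can be sketched as follows. The function name `sample_size_mean` is illustrative. The code uses the unrounded critical value rather than the table value 2.575, which changes the intermediate result slightly but not the rounded-up answer here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, margin, confidence):
    """Smallest n with z_(alpha/2) * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = (z * sigma / E)^2, always rounded up to the next whole number.
    return ceil((z * sigma / margin) ** 2)

print(sample_size_mean(1.065, 0.25, 0.99))  # 121
```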
27
Example: How large will the sample have to be if we want to decrease the margin of error from 0.25 to 0.2? Would you expect it to be larger or smaller?
α = 0.01
zα/2 = 2.575
E = 0.20
σ = 1.065
n = [zα/2 · σ / E]² = [(2.575)(1.065) / 0.2]² = 188.01, rounded up to 189 households
We would need to randomly select a larger sample because we require a smaller margin of error.
28
What happens when E is doubled?
E = 1: n = (zα/2 · σ)² / 1² = (zα/2 · σ)²
E = 2: n = (zα/2 · σ)² / 2² = (zα/2 · σ)² / 4
Sample size n is decreased to 1/4 of its
original value if E is doubled.
Larger errors allow smaller samples.
Smaller errors require larger samples.
29
Class Assignment
1. Use OLDFAITHFUL Data in Datasets File
2. Construct a 95% and 90% confidence interval for the mean
eruption duration. Write a conclusion for the 95% interval.
Assume σ to be 58 seconds
3. Compare the 2 confidence intervals. What can you conclude?
4. How large a sample must you choose to be 99% confident that the sample mean eruption duration is within 10 seconds of the true mean?
Guidelines:
1. Choose a partner
2. Suggest having one person working the calculator and one writing
3. Due at the end of class (5 HW points)
4. Each person must turn in a paper
30
6-2
Estimating a Population Mean:
Small Samples / σ Unknown
31
Small Samples
Assumptions
1. n ≤ 30
2. The sample is a random sample.
3. The sample is from a normally distributed population.
Case 1 (σ is known): Largely unrealistic
Case 2 (σ is unknown): Use the Student t distribution if the population is normal; if n is very large, use z
32
Determining which distribution to use
Case 1 (σ is known):
  n > 30              use z
  n < 30 & Normal     use z
  n < 30 & Skewed     neither
Case 2 (σ is unknown):
  n very large        use z
  n > 30              use t
  n < 30 & Normal     use t
  n < 30 & Skewed     neither
33
Determining which distribution to use
1. n = 150; x̄ = 100; s = 15; skewed distribution
2. n = 8; x̄ = 100; s = 15; normal distribution
3. n = 8; x̄ = 100; s = 15; skewed distribution
4. n = 150; x̄ = 100; σ = 15; skewed distribution
5. n = 8; x̄ = 100; σ = 15; skewed distribution
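The decision rules from the previous slide can be written as a small function and applied to the five cases above. This is only a sketch: the slides do not say how large "very large" is, so the `very_large` cutoff of 1000 is a hypothetical placeholder, and the function name `choose_distribution` is illustrative.

```python
def choose_distribution(n, sigma_known, population_normal, very_large=1000):
    """Return "z", "t", or "neither" following the decision table.

    very_large is a hypothetical cutoff; the slides give no number for it.
    """
    if sigma_known:
        # Sigma known: z works for large samples or normal populations.
        if n > 30 or population_normal:
            return "z"
        return "neither"
    # Sigma unknown from here on.
    if n >= very_large:
        return "z"
    if n > 30 or population_normal:
        return "t"
    return "neither"

# Case 1 above: n = 150, s given (sigma unknown), skewed population.
print(choose_distribution(150, sigma_known=False, population_normal=False))  # t
```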
34
Important Facts about the Student t Distribution
1. Developed by William S. Gosset in 1908
2. Density function is complex
3. Shape is determined by the sample size n
4. Has the same general symmetric bell shape as the normal distribution, but it reflects the greater variability (with wider distributions) that is expected with small samples.
5. The Student t distribution has a mean of t = 0, but the standard deviation varies with the sample size and is always greater than 1.
6. Is essentially the normal distribution for large n. For values of n > 30, the differences are so small that we can use either the critical z or the critical t value.
35
Student t Distributions for n = 3 and n = 12
[Figure: standard normal distribution overlaid with the Student t distributions for n = 12 and n = 3, all centered at 0]
Greater variability than standard normal due to small sample size
36
Student t Distribution
If the distribution of a population is essentially normal, then the distribution of
t = (x̄ - µ) / (s/√n)
is essentially a Student t distribution for all samples of size n.
critical values denoted by tα/2
37
Book Definition
Degrees of Freedom (df )
Corresponds to the number of sample
values that can vary after certain restrictions
have been imposed on all data values.
This doesn’t help me, how about you?
38
Definition
Degrees of Freedom (df )
In general, the degrees of freedom of an estimate is equal to the
number of independent scores (n) that go into the estimate minus
the number of parameters estimated.
In this section
df = n - 1
because we are estimating µ with x̄
39
Table A-3 / Calculators / Excel
- Table from website
- TI-84 (only)
- Excel function (TINV)
40
Table A-3 t Distribution
Degrees of   One tail:   .005     .01      .025     .05      .10      .25
freedom      Two tails:  .01      .02      .05      .10      .20      .50
1                        63.657   31.821   12.706   6.314    3.078    1.000
2                        9.925    6.965    4.303    2.920    1.886    .816
3                        5.841    4.541    3.182    2.353    1.638    .765
4                        4.604    3.747    2.776    2.132    1.533    .741
5                        4.032    3.365    2.571    2.015    1.476    .727
6                        3.707    3.143    2.447    1.943    1.440    .718
7                        3.500    2.998    2.365    1.895    1.415    .711
8                        3.355    2.896    2.306    1.860    1.397    .706
9                        3.250    2.821    2.262    1.833    1.383    .703
10                       3.169    2.764    2.228    1.812    1.372    .700
11                       3.106    2.718    2.201    1.796    1.363    .697
12                       3.054    2.681    2.179    1.782    1.356    .696
13                       3.012    2.650    2.160    1.771    1.350    .694
14                       2.977    2.625    2.145    1.761    1.345    .692
15                       2.947    2.602    2.132    1.753    1.341    .691
16                       2.921    2.584    2.120    1.746    1.337    .690
17                       2.898    2.567    2.110    1.740    1.333    .689
18                       2.878    2.552    2.101    1.734    1.330    .688
19                       2.861    2.540    2.093    1.729    1.328    .688
20                       2.845    2.528    2.086    1.725    1.325    .687
21                       2.831    2.518    2.080    1.721    1.323    .686
22                       2.819    2.508    2.074    1.717    1.321    .686
23                       2.807    2.500    2.069    1.714    1.320    .685
24                       2.797    2.492    2.064    1.711    1.318    .685
25                       2.787    2.485    2.060    1.708    1.316    .684
26                       2.779    2.479    2.056    1.706    1.315    .684
27                       2.771    2.473    2.052    1.703    1.314    .684
28                       2.763    2.467    2.048    1.701    1.313    .683
29                       2.756    2.462    2.045    1.699    1.311    .683
Large (z)                2.575    2.327    1.960    1.645    1.282    .675
41
Critical z Value vs Critical t Values
See “t distribution pdf.xls”
42
Finding tα/2 for the following Degrees of Confidence and sample size
Examples:
1. 1 - α =        n = 12
2. 1 - α =        n = 15
3. 1 - α =        n = 9
4. 1 - α =        n = 20
Find critical value and sketch
43
Confidence Interval for the Estimate of µ
Based on an Unknown σ and a Small Simple Random Sample from a Normally Distributed Population
x̄ - E < µ < x̄ + E
where
E = tα/2 · s/√n
tα/2 found in Table A-3 (df = n - 1)
44
Using the Normal and t Distribution
45
Example: Let’s do an example comparing z and t. Construct confidence intervals for each using the following data.
n = 16
x̄ = 50
s = 20
α = 0.05
α/2 = 0.025
We wouldn’t normally use a z distribution here because the sample is small, but let’s do it anyway and compare the width of that confidence interval to one created using a t distribution.
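The comparison can be worked through numerically. The sketch below takes both critical values from Table A-3 in the .05 two-tails column (1.960 from the "Large (z)" row, 2.132 from the df = 15 row); the function name `interval` is illustrative, and the results are computed here rather than quoted from the slides.

```python
from math import sqrt

n, xbar, s = 16, 50, 20

def interval(critical_value):
    """Return (lower, upper, E) for the given critical value."""
    margin = critical_value * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

z_lower, z_upper, z_margin = interval(1.960)  # Large (z) row
t_lower, t_upper, t_margin = interval(2.132)  # df = n - 1 = 15

print(round(z_lower, 2), round(z_upper, 2))  # 40.2 59.8
print(round(t_lower, 2), round(t_upper, 2))  # 39.34 60.66
```

The t interval is wider, which reflects the extra uncertainty from estimating σ with s in a small sample.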
46
Example: A study of 12 Dodge Vipers involved in collisions resulted in repairs averaging $26,227 with a standard deviation of $15,873. Find the 95% interval estimate of µ, the mean repair cost for all Dodge Vipers involved in collisions. (The 12 cars’ distribution appears to be bell-shaped.)
n = 12, df = 11
x̄ = 26,227
s = 15,873
α = 0.05
α/2 = 0.025
tα/2 = 2.201
E = tα/2 · s/√n = (2.201)(15,873)/√12 = 10,085.3
x̄ - E < µ < x̄ + E
26,227 - 10,085.3 < µ < 26,227 + 10,085.3
$16,141.7 < µ < $36,312.3
We are 95% confident that this interval contains the
average cost of repairing a Dodge Viper.
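The same calculation can be scripted with a table lookup in place of the calculator. This is a sketch: the dictionary `T_95` holds only a handful of entries copied from the .05 two-tails column of Table A-3, and both it and the function name `t_interval_95` are illustrative.

```python
from math import sqrt

# A few 95% critical values from Table A-3 (two tails = .05), keyed by df.
T_95 = {8: 2.306, 11: 2.201, 15: 2.132, 19: 2.093, 29: 2.045}

def t_interval_95(xbar, s, n):
    """Return (lower, upper, E) for a 95% t interval; df = n - 1 must be in T_95."""
    t = T_95[n - 1]
    margin = t * s / sqrt(n)  # E = t_(alpha/2) * s / sqrt(n)
    return xbar - margin, xbar + margin, margin

lower, upper, e = t_interval_95(26227, 15873, 12)
print(round(lower, 1), round(upper, 1), round(e, 1))  # 16141.7 36312.3 10085.3
```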
47
TI-83 Calculator
Finding Confidence intervals using t
1. Press STAT
2. Cursor to TESTS
3. Choose TInterval
4. Choose Input: STATS*
5. Enter x̄, s, n, and the confidence level
6. Cursor to Calculate
*If your input is raw data, enter the data in L1, then choose Input: DATA
48
49
6-3
Estimating a population
proportion
50
Assumptions
1. The sample is a random sample.
2. The conditions for the binomial distribution are satisfied (see Section 4-3).
3. The normal distribution can be used to approximate the distribution of sample proportions because np ≥ 5 and nq ≥ 5 are both satisfied.
51
Notation for Proportions
p = population proportion
p̂ = x/n = sample proportion of x successes in a sample of size n (pronounced ‘p-hat’)
q̂ = 1 - p̂ = sample proportion of failures in a sample of size n
52
Definition
Point Estimate
The sample proportion p̂ is the best point estimate of the population proportion p.
53
Confidence Interval for Population Proportion
p̂ - E < p < p̂ + E
where
E = zα/2 · √(p̂q̂/n)
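A proportion interval follows the same pattern as the interval for a mean. In the sketch below, the counts (540 successes out of 1000) are hypothetical numbers invented for the demonstration, and the function name `proportion_interval` is illustrative. The check on n·p̂ and n·q̂ mirrors the normal-approximation assumption stated earlier.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Return (lower, upper, E) for a z-based interval for a proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation requires at least 5 successes and 5 failures.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not justified: need np >= 5 and nq >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin

# Hypothetical data: 540 successes in a sample of 1000.
lower, upper, e = proportion_interval(540, 1000, 0.95)
print(round(lower, 3), round(upper, 3))  # 0.509 0.571
```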
54
Round-Off Rule for Confidence
Interval Estimates of p
Round the confidence
interval limits to
three significant digits.
55
Determining Sample Size
E = zα/2 · √(p̂q̂/n)
(solve for n by algebra)
n = (zα/2)² p̂q̂ / E²
56
Sample Size for Estimating Proportion p
When an estimate p̂ is known:
n = (zα/2)² p̂q̂ / E²
When no estimate p̂ is known:
n = (zα/2)² (0.25) / E²
57
Example: We want to determine, with a margin of error of two percentage points, the percentage of Americans who own their house. Assuming that we want 90% confidence in our results, how many Americans must we survey? An earlier study indicates 67.5% of Americans own their own home.
n = (zα/2)² p̂q̂ / E²
= (1.645)² (0.675)(0.325) / 0.02²
= 1483.8215 (using the unrounded calculator value zα/2 ≈ 1.64485)
= 1484 Americans
To be 90% confident that our sample percentage is within two percentage points of the true percentage for all Americans, we should randomly select and survey 1484 Americans.
58
Round-Off Rule for Sample Size n
When finding the sample size n, if the
result is not a whole number, always
increase the value of n to the next larger
whole number.
n = 1483.8215, rounded up to 1484
59
Example: We want to determine, with a margin of error of two percentage points, the percentage of Americans who own their house. Assuming that we want 90% confidence in our results, how many Americans must we survey? There is no prior information suggesting a possible value for the sample percentage.
n = (zα/2)² (0.25) / E²
= (1.645)² (0.25) / 0.02²
= 1690.9647 (using the unrounded calculator value zα/2 ≈ 1.64485)
= 1691 Americans
With no prior information, we need a larger sample to achieve the same results with 90% confidence and an error of no more than 2%.
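Both home-ownership examples can be reproduced with one function. This is a sketch: the name `sample_size_proportion` and the convention of passing `p_hat=None` when no prior estimate exists are illustrative choices. The unrounded critical value is used, which matches the intermediate results 1483.8215 and 1690.9647 shown above.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin, confidence, p_hat=None):
    """Sample size for estimating a proportion, rounded up.

    With no prior estimate (p_hat=None), p_hat * q_hat is replaced by 0.25,
    its largest possible value.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z ** 2 * pq / margin ** 2)

print(sample_size_proportion(0.02, 0.90, p_hat=0.675))  # 1484
print(sample_size_proportion(0.02, 0.90))               # 1691
```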
60
TI-83 Calculator
Finding Confidence intervals using z (proportions)
1. Press STAT
2. Cursor to TESTS
3. Choose 1-PropZInt
4. Enter x, n, and the confidence level
5. Cursor to calculate
61