Confidence Intervals for the Mean


• t distributions
• t confidence intervals for a population mean μ
• Sample size required to estimate μ
• Hypothesis tests for μ
In the 2012-2013 NFL season Adrian
Peterson of the Minn. Vikings rushed for
2,097 yards. The all-time single-season
rushing record is 2,105 yards (Eric Dickerson
1984 LA Rams). Shown below are Peterson’s
rushing yards in each game:
84 60 86 102 88 79 153 123
182 171 108 210 154 212 86 199
We would like to estimate Adrian Peterson’s
mean rushing ABILITY during the 2012-2013
season with a confidence interval.

When we select simple random samples of
size n, the sample means we find will vary
from sample to sample. We can model the
distribution of these sample means with a
probability model that is
\[ N\left(\mu, \frac{\sigma}{\sqrt{n}}\right) \]
so that
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
Note that SD(x̄) = σ/√n.
The sample standard deviation s provides an estimate
of the population standard deviation σ.
For a sample of size n, the sample standard deviation s is:
\[ s = \sqrt{\frac{1}{n-1} \sum_{i} (x_i - \bar{x})^2} \]
n − 1 is the “degrees of freedom.”
The value s/√n is called the standard error of x̄,
denoted SE(x̄).
\[ SE(\bar{x}) = \frac{s}{\sqrt{n}} \]
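As a quick check on these definitions, here is a minimal Python sketch that computes the sample mean, the sample standard deviation, and the standard error for Peterson's 16 games. The variable names are illustrative only; the standard-library `statistics.stdev` divides by n − 1, which matches the formula for s.

```python
import math
import statistics

# Peterson's rushing yards per game, copied from the game-by-game list above.
yards = [84, 60, 86, 102, 88, 79, 153, 123,
         182, 171, 108, 210, 154, 212, 86, 199]

n = len(yards)
x_bar = statistics.mean(yards)   # sample mean
s = statistics.stdev(yards)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)            # standard error of the sample mean, s / sqrt(n)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, SE = {se:.2f}")
```

The mean and s printed here should agree, after rounding, with the values 131.06 and 51.59 used in the worked example later in the slides.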

Substitute s (sample standard deviation)
for σ:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \quad\longrightarrow\quad \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
Not quite correct:
not knowing σ means using z is no
longer correct.
Suppose that a Simple Random Sample of size n is drawn from a
population whose distribution can be approximated by a N(µ, σ)
model. When σ is known, the sampling model for the mean x̄ is N(μ,
σ/√n).
When σ is estimated from the sample standard deviation s,
the sampling model for the mean x̄ follows a t distribution
t(μ, s/√n) with degrees of freedom n − 1.
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
is the 1-sample t statistic.

CONFIDENCE
INTERVAL for μ
\[ \bar{x} \pm t^{*}_{n-1} \frac{s}{\sqrt{n}} \]
where:
t* = Critical value
from t-distribution
with n − 1 degrees of
freedom
x̄ = Sample mean
s = Sample standard
deviation
n = Sample size



For very small samples (n <
15), the data should follow a
Normal model very closely.
For moderate sample sizes (n
between 15 and 40), t methods
will work well as long as the
data are unimodal and
reasonably symmetric.
For sample sizes larger than
40, t methods are safe to use
unless the data are extremely
skewed. If outliers are present,
analyses can be performed
twice, with the outliers and
without.
Very similar to z ~ N(0, 1)
• Sometimes called Student’s t distribution;
Gosset, brewery employee
• Properties:
i) symmetric around 0 (like z)
ii) degrees of freedom ν
if ν > 1, E(t_ν) = 0
if ν > 2, the standard deviation is σ_ν = √(ν/(ν − 2)), which is always
bigger than 1.
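To make property ii) concrete, the short sketch below evaluates the standard deviation √(ν/(ν − 2)) of a t distribution for a few degrees of freedom. The function name `t_sd` is made up for illustration; the formula is the standard one for ν > 2.

```python
import math

def t_sd(df):
    """Standard deviation of a t distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is finite only for df > 2")
    return math.sqrt(df / (df - 2))

# The SD is always above 1 and shrinks toward 1 as df grows.
for df in (3, 7, 10, 30, 100):
    print(f"df = {df:>3}: SD = {t_sd(df):.4f}")
```

The values stay above 1 and approach 1 as the degrees of freedom increase, which is consistent with the t distribution being very similar to z for large samples.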
x - x
z =
x
x - x
s
t =
, sx =
sx
n
Z
-3
-3
-2
-2
-1
-1
00
11
22
33
z=
x - x
x - x
t=
s
n

n
Z
t
-3
-3
-2
-2
-1
-1
00
11
22
33
Figure 11.3, Page 372
Degrees of Freedom
s =
x - x
t=
s
n
s2
n
s2 =
2
(X

X)
 i
i=1
Z
n -1
t1
-3
-3
-2
-2
-1
-1
00
11
22
33
Figure 11.3, Page 372
Degrees of Freedom
s =
x - x
t=
s
n
s2
n
s2 =
2
(X

X)
 i
i=1
Z
n -1
t1
t7
-3
-3
-2
-2
-1
-1
00
11
22
33
Figure 11.3, Page 372

90% confidence interval; df = n − 1 = 10

Critical values t* by confidence level:

Degrees of Freedom   0.80     0.90     0.95     0.98     0.99
1                    3.0777   6.314    12.706   31.821   63.657
2                    1.8856   2.9200   4.3027   6.9645   9.9250
.                    .        .        .        .        .
10                   1.3722   1.8125   2.2281   2.7638   3.1693
.                    .        .        .        .        .
100                  1.2901   1.6604   1.9840   2.3642   2.6259
∞                    1.282    1.6449   1.9600   2.3263   2.5758

90% confidence interval:
\[ \bar{x} \pm 1.8125 \frac{s}{\sqrt{11}} \]
P(t > 1.8125) = .05
P(t < −1.8125) = .05
[Figure: t10 density; area .90 between −1.8125 and 1.8125, area .05 in each tail]
Conf. level   z          t (n = 30)
90%           1.645      1.6991
95%           1.96       2.0452
98%           2.33       2.4620
99%           2.58       2.7564
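The comparison of z and t critical values can be reproduced numerically. The sketch below is one possible approach, not how the slide's table was produced: it integrates the t density with Simpson's rule and finds t* by bisection, using only the Python standard library. The helper names `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical; a statistics library would normally supply these values directly.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t0, df, steps=2000):
    """P(T > t0) for t0 >= 0, via Simpson's rule on [0, t0] (steps must be even)."""
    h = t0 / steps
    total = t_pdf(0.0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """Critical value t* leaving central area conf, found by bisection."""
    tail = (1 - conf) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail:
            lo = mid   # tail area still too large, so t* lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

df = 29  # n = 30
for conf in (0.90, 0.95, 0.98, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    print(f"{conf:.0%}: z* = {z_star:.4f}, t* = {t_critical(conf, df):.4f}")
```

For every confidence level the t critical value comes out larger than the corresponding z value, so a t interval is somewhat wider than a z interval built from the same s and n.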
In the 2012-2013 NFL season Adrian
Peterson of the Minn. Vikings rushed for
2,097 yards. The all-time single-season
rushing record is 2,105 yards (Eric Dickerson
1984 LA Rams). Shown below are Peterson’s
rushing yards in each game:
84 60 86 102 88 79 153 123
182 171 108 210 154 212 86 199
Construct a 95% confidence interval for
Peterson’s mean rushing ABILITY during the
2012-2013 season.
x̄ = 131.06, s = 51.59
\[ \bar{x} \pm t^{*} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1 \]
degrees of freedom = 16 − 1 = 15
From the t-table, for 95% confidence t* = 2.1314
\[ \bar{x} \pm t^{*} \frac{s}{\sqrt{n}} = 131.06 \pm 2.1314 \cdot \frac{51.59}{\sqrt{16}} = 131.06 \pm 27.49 = (103.57, 158.55) \]
We are 95% confident that the interval
(103.57, 158.55) contains Peterson's mean
rushing ABILITY per game.
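The same interval can be computed directly from the raw game data. This is a minimal sketch under the assumption that the critical value is read from the t-table (2.1314 for 95% confidence and 15 degrees of freedom); the function name `t_interval` is illustrative.

```python
import math
import statistics

yards = [84, 60, 86, 102, 88, 79, 153, 123,
         182, 171, 108, 210, 154, 212, 86, 199]

T_STAR = 2.1314  # from the t-table: 95% confidence, df = 16 - 1 = 15

def t_interval(data, t_star):
    """Return (lower, upper) for the interval x_bar +/- t_star * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    margin = t_star * statistics.stdev(data) / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = t_interval(yards, T_STAR)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Because the code uses the unrounded mean and standard deviation, the endpoints may differ from the hand calculation in the last decimal place.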



Because cardiac deaths increase after
heavy snowfalls, a study was conducted
to measure the cardiac demands of
shoveling snow by hand.
The maximum heart rates for 10 adult
males were recorded while shoveling
snow. The sample mean and sample
standard deviation were
x̄ = 175, s = 15
Find a 90% CI for μ, the population mean
max. heart rate for those who shovel
snow.
\[ \bar{x} \pm t^{*} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1 \]
x̄ = 175, s = 15, n = 10
From the t-table, t* = 1.8331
\[ 175 \pm 1.8331 \cdot \frac{15}{\sqrt{10}} = 175 \pm 8.70 = (166.30, 183.70) \]
We are 90% confident that the interval
(166.30, 183.70) contains the mean
maximum heart rate for snow shovelers.
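When only summary statistics are reported, as in the snow-shoveling study, the interval follows from x̄, s, n, and the table value alone. The sketch below assumes the table value 1.8331 (90% confidence, 9 degrees of freedom); the function name is hypothetical.

```python
import math

def t_interval_from_summary(x_bar, s, n, t_star):
    """Return (lower, upper, margin) for x_bar +/- t_star * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Snow-shoveling study: x_bar = 175, s = 15, n = 10, t* = 1.8331 (df = 9).
lower, upper, margin = t_interval_from_summary(175, 15, 10, 1.8331)
print(f"margin = {margin:.2f}, 90% CI: ({lower:.2f}, {upper:.2f})")
```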
Determining Sample Size to
Estimate μ
Required Sample Size To Estimate a
Population Mean μ
• If you desire a C% confidence interval
for a population mean μ with an
accuracy specified by you, how large
does the sample size need to be?
• We will denote the accuracy by ME,
which stands for Margin of Error.
Example: Sample Size to Estimate a
Population Mean μ
• Suppose we want to estimate the
unknown mean height μ of male
undergrad students at NC State with a
confidence interval.
• We want to be 95% confident that our
estimate is within .5 inch of μ.
• How large does our sample size need to
be?
Confidence Interval for μ
In terms of the margin of error ME,
the CI for μ can be expressed as
\[ \bar{x} \pm ME \]
The confidence interval for μ is
\[ \bar{x} \pm t^{*}_{n-1} \left( \frac{s}{\sqrt{n}} \right) \]
so
\[ ME = t^{*}_{n-1} \left( \frac{s}{\sqrt{n}} \right) \]
So we can find the sample size by solving
this equation for n:
\[ ME = t^{*}_{n-1} \left( \frac{s}{\sqrt{n}} \right) \]
which gives
\[ n = \left( \frac{t^{*}_{n-1}\, s}{ME} \right)^{2} \]
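For readers who want the intermediate algebra, the step from the margin-of-error equation to the expression for n can be written out as follows. It uses only the symbols already defined: multiply both sides by √n, divide by ME, then square.

```latex
\begin{align*}
ME &= t^{*}_{n-1}\,\frac{s}{\sqrt{n}} \\
\sqrt{n} &= \frac{t^{*}_{n-1}\, s}{ME} \\
n &= \left(\frac{t^{*}_{n-1}\, s}{ME}\right)^{2}
\end{align*}
```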
• Good news: we have an equation
• Bad news:
1. Need to know s
2. We don’t know n so we don’t know the
degrees of freedom to find t*n-1
A Way Around this Problem: Use
the Standard Normal
Use the corresponding z* from the standard normal
to form the equation
\[ ME = z^{*} \left( \frac{s}{\sqrt{n}} \right) \]
Solve for n:
\[ n = \left( \frac{z^{*} s}{ME} \right)^{2} \]
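[Figure: sampling distribution of x̄; the confidence level .95 is the central area between μ − 1.96σ/√n and μ + 1.96σ/√n, a distance ME on each side of μ]
Set
\[ ME = 1.96 \frac{\sigma}{\sqrt{n}} \]
and solve for n:
\[ n = \left( \frac{1.96\,\sigma}{ME} \right)^{2} \]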
Estimating s
• Previously collected data or prior
knowledge of the population
• If the population is normal or near-normal, then s can be conservatively
estimated by
\[ s \approx \frac{\text{range}}{6} \]
• 99.7% of obs. within 3σ of the mean
Example: sample size to
estimate mean height µ of
NCSU undergrad. male students
\[ n = \left( \frac{z^{*} s}{ME} \right)^{2} \]
We want to be 95% confident that we are
within .5 inch of μ, so
• ME = .5; z* = 1.96
• Suppose previous data indicates that s
is about 2 inches.
• n = [(1.96)(2)/(.5)]² = 61.47
• We should sample 62 male students
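The sample-size calculation is easy to script. The sketch below assumes the normal-approximation formula n = (z* s / ME)² and obtains z* from the standard-library `statistics.NormalDist`; the function name `required_sample_size` is illustrative, and the result is always rounded up.

```python
import math
from statistics import NormalDist

def required_sample_size(conf, s, me):
    """Return (raw value, rounded-up n) for n = (z* * s / me)**2."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # e.g. about 1.96 for 95%
    raw = (z_star * s / me) ** 2
    return raw, math.ceil(raw)

# Height example: 95% confidence, s about 2 inches, ME = 0.5 inch.
raw, n = required_sample_size(conf=0.95, s=2, me=0.5)
print(f"n = {raw:.2f}, so sample {n} students")
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.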
Example: Sample Size to Estimate a
Population Mean -Textbooks
• Suppose the financial aid office wants to
estimate the mean NCSU semester textbook
cost μ within ME = $25 with 98% confidence.
How many students should be sampled?
Previous data shows σ is about $85.
\[ n = \left( \frac{z^{*} \sigma}{ME} \right)^{2} = \left( \frac{(2.33)(85)}{25} \right)^{2} = 62.76 \]
round up to n = 63
Example: Sample Size to Estimate a Population
Mean -NFL footballs
• The manufacturer of NFL footballs uses a machine to
inflate new footballs
• The mean inflation pressure is 13.5 psi, but
uncontrollable factors cause the pressures of
individual footballs to vary from 13.3 psi to 13.7 psi
• After throwing 6 interceptions in a game, Peyton
Manning complains that the balls are not properly
inflated.
The manufacturer wishes to estimate the
mean inflation pressure to within .025 psi
with a 99% confidence interval. How many
footballs should be sampled?
Example: Sample Size to Estimate a
Population Mean μ
\[ n = \left( \frac{z^{*} \sigma}{ME} \right)^{2} \]
• 99% confidence → z* = 2.58; ME = .025
• σ = ? Inflation pressures range from 13.3 to 13.7 psi
• So range = 13.7 – 13.3 = .4; σ ≈ range/6 = .4/6 = .067
\[ n = \left( \frac{(2.58)(.067)}{.025} \right)^{2} = 47.8 \rightarrow 48 \]
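The football calculation combines the range/6 estimate of σ with the sample-size formula. A minimal sketch, with hypothetical function names and the slide's value z* = 2.58 passed in directly:

```python
import math

def sd_from_range(low, high):
    """Rough SD estimate for a near-normal population: range / 6."""
    return (high - low) / 6

def sample_size(z_star, sigma, me):
    """Smallest whole n satisfying n >= (z_star * sigma / me)**2."""
    return math.ceil((z_star * sigma / me) ** 2)

sigma = sd_from_range(13.3, 13.7)   # about 0.067 psi
n = sample_size(z_star=2.58, sigma=sigma, me=0.025)
print(f"sigma is about {sigma:.3f} psi, so sample {n} footballs")
```

Using the unrounded σ gives a raw value near 47.3 rather than 47.8, but both round up to 48 footballs.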
Chapter 23
Hypothesis Tests for a
Population Mean μ
25 pitchers with highest average
fastball velocity:
2007: ȳ1 = 95.92 mph
2013: ȳ2 = 96.33 mph, s2 = .794 mph
Was the ABILITY of the top 25
pitchers in 2013 to throw hard
greater than the ABILITY of the
top 25 pitchers in 2007 to throw
hard?
As in any hypothesis test, a hypothesis test for
μ requires a few steps:
1. State the null and alternative hypotheses (H0 versus HA)
a) Decide on a one-sided or two-sided test
2. Calculate the test statistic t and determine its degrees of
freedom
3. Find the area under the t distribution with the t-table or
technology
4. Determine the P-value with technology (or find bounds on
the P-value) and interpret the result
Step 1:
State the null and alternative hypotheses (H0 versus HA)
Decide on a one-sided or two-sided test
H0: μ = μ0 versus HA: μ > μ0 (1-tail test)
H0: μ = μ0 versus HA: μ < μ0 (1-tail test)
H0: μ = μ0 versus HA: μ ≠ μ0 (2-tail test)
Step 2:
Obtain data and calculate ȳ and s.
We perform a hypothesis test with null
hypothesis
H0: μ = μ0 using the test statistic
\[ t = \frac{\bar{y} - \mu_0}{SE(\bar{y})} \]
where the standard error of ȳ is
\[ SE(\bar{y}) = \frac{s}{\sqrt{n}} \]
When the null hypothesis is true, the test
statistic follows a t distribution with n-1
degrees of freedom. We use that model to
obtain a P-value.
The one-sample t-test; P-Values
Recall:
The P-value is the probability, calculated assuming the null
hypothesis H0 is true, of observing a value of the test statistic
as extreme as or more extreme than the value we actually observed.
The calculation of the P-value depends on whether the
hypothesis test is 1-tailed
(that is, the alternative hypothesis is
HA: μ < μ0 or HA: μ > μ0)
or 2-tailed
(that is, the alternative hypothesis is HA: μ ≠ μ0).
P-Values
Assume the value of the test statistic t is t0
If HA: μ > μ0, then P-value = P(t > t0)
If HA: μ < μ0, then P-value = P(t < t0)
If HA: μ ≠ μ0, then P-value = 2P(t > |t0|)
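The three P-value rules translate directly into code once a t cumulative distribution function is available. The sketch below is an illustration only: it approximates the CDF by integrating the t density with Simpson's rule, and the function names and the labels 'greater', 'less', and 'two-sided' are made up for this example. In practice a statistics library or the t-table would supply the tail areas.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t0, df, steps=2000):
    """P(T <= t0): Simpson's rule on [0, |t0|] plus symmetry about 0."""
    b = abs(t0)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # P(0 < T < |t0|)
    return 0.5 + area if t0 >= 0 else 0.5 - area

def p_value(t0, df, alternative):
    """P-value for the observed statistic t0 under the chosen alternative."""
    if alternative == "greater":              # HA: mu > mu0
        return 1 - t_cdf(t0, df)
    if alternative == "less":                 # HA: mu < mu0
        return t_cdf(t0, df)
    if alternative == "two-sided":            # HA: mu != mu0
        return 2 * (1 - t_cdf(abs(t0), df))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Check against the t-table: with 10 df, P(t > 1.8125) = .05.
print(round(p_value(1.8125, 10, "greater"), 4))
print(round(p_value(1.8125, 10, "two-sided"), 4))
```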
25 pitchers with highest average
fastball velocity:
2007: ȳ1 = 95.92 mph
2013: ȳ2 = 96.33 mph, s2 = .794 mph
Was the ABILITY of the top 25
pitchers in 2013 to throw hard
greater than the ABILITY of the
top 25 pitchers in 2007?
H0: μ = 95.92
HA: μ > 95.92
where μ is the average fastball
velocity of the top 25 2013 pitchers
n = 25; df = 24
ȳ = 96.33, s = .794
\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{96.33 - 95.92}{.794/\sqrt{25}} = 2.58 \]
P-value = P(t > 2.58) = .008
[Figure: t density with 24 df; area .008 to the right of 2.58]
Reject H0: Since P-value < .05, there is sufficient evidence
that top 25 pitchers in 2013 on average throw harder
Conf. Level   0.1     0.3     0.5     0.7     0.8     0.9     0.95    0.98    0.99
Two Tail      0.9     0.7     0.5     0.3     0.2     0.1     0.05    0.02    0.01
One Tail      0.45    0.35    0.25    0.15    0.1     0.05    0.025   0.01    0.005
df            Values of t
24            0.1270  0.3900  0.6848  1.0593  1.3178  1.7109  2.0639  2.4922  2.7969
\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{96.33 - 95.92}{.794/\sqrt{25}} = 2.58 \]
2.4922 < t = 2.58 < 2.7969; thus 0.005 < P-value < 0.01.
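Bounding a P-value from a t-table row is a simple lookup: find the two neighboring critical values that bracket |t| and read off their one-tail areas. The sketch below hard-codes the df = 24 row; the constant and function names are hypothetical.

```python
# One row of the t-table (df = 24) as (one-tail area, critical value) pairs.
T_ROW_DF24 = [
    (0.45, 0.1270), (0.35, 0.3900), (0.25, 0.6848), (0.15, 1.0593),
    (0.10, 1.3178), (0.05, 1.7109), (0.025, 2.0639), (0.01, 2.4922),
    (0.005, 2.7969),
]

def one_tail_bounds(t_abs, row):
    """Return (lower, upper) bounds on the one-tail P-value for |t| = t_abs."""
    lower, upper = 0.0, 0.5
    for tail_area, crit in row:
        if t_abs >= crit:
            upper = min(upper, tail_area)   # |t| is at or beyond this critical value
        else:
            lower = max(lower, tail_area)   # |t| falls short of this critical value
    return lower, upper

low, high = one_tail_bounds(2.58, T_ROW_DF24)
print(f"{low} < P-value < {high}")
```

For a two-tailed test, both bounds would be doubled.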
A popcorn maker wants a combination
of microwave time and power that
delivers high-quality popped corn with
less than 10% unpopped kernels, on
average. After testing, the research
department determines that power 9 at
4 minutes is optimum. The company
president tests 8 bags in his office
microwave and finds the following
percentages of unpopped kernels: 7,
13.2, 10, 6, 7.8, 2.8, 2.2, 5.2.
Do the data provide evidence that the
mean percentage of unpopped kernels is
less than 10%?
H0: μ = 10
HA: μ < 10
where μ is the true unknown mean percentage of unpopped
kernels
n = 8; df = 7
ȳ = 6.775, s = 3.64
\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{6.775 - 10}{3.64/\sqrt{8}} = -2.51 \]
Exact P-value = P(t < −2.51) = .02
[Figure: t density with 7 df; area .02 to the left of −2.51]
Reject H0: there is sufficient evidence that true mean
percentage of unpopped kernels is less than 10%
Conf. Level   0.1     0.3     0.5     0.7     0.8     0.9     0.95    0.98    0.99
Two Tail      0.9     0.7     0.5     0.3     0.2     0.1     0.05    0.02    0.01
One Tail      0.45    0.35    0.25    0.15    0.1     0.05    0.025   0.01    0.005
df            Values of t
7             0.1303  0.4015  0.7111  1.1192  1.4149  1.8946  2.3646  2.9980  3.4995
2.3646 < |t| = 2.51 < 2.9980 so .01 < P-value < .025
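The popcorn test can be run end to end from the eight raw percentages. This is a sketch under stated assumptions: the lower-tail area is approximated by integrating the t density with Simpson's rule (a stand-in for the table or for statistical software), and all function and variable names are illustrative.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_lower_tail(t0, df, steps=2000):
    """P(T < t0): Simpson's rule on [0, |t0|] plus symmetry about 0."""
    b = abs(t0)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                      # P(0 < T < |t0|)
    return 0.5 - area if t0 < 0 else 0.5 + area

unpopped = [7, 13.2, 10, 6, 7.8, 2.8, 2.2, 5.2]   # percentages from the 8 bags
mu0 = 10                                          # null value: H0 says mu = 10

n = len(unpopped)
y_bar = statistics.mean(unpopped)
s = statistics.stdev(unpopped)
t0 = (y_bar - mu0) / (s / math.sqrt(n))           # one-sample t statistic
p = t_lower_tail(t0, n - 1)                       # HA: mu < 10, so use the lower tail

print(f"mean = {y_bar:.3f}, s = {s:.2f}, t = {t0:.2f}, P-value = {p:.3f}")
```

The computed P-value should fall inside the bounds .01 < P-value < .025 obtained from the table.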