Transcript Slide 1
HOW MUCH SLEEP DID YOU GET LAST
NIGHT?
1. <6
2. 6
3. 7
4. 8
5. 9
6. >9
UPCOMING IN CLASS
Homework #11 due Sunday
Start working on the final step of your data
project
Quiz #4 is Wednesday November 14th
Exam #2 is Wednesday November 28th
POSSIBLE TESTS
One-proportion z-test
Two-proportion z-test
One-sample t-test for mean
Two-sample t-test for differences of means
EXAMPLES OF ONE-PROPORTION TEST
Everyone (100%) believes in ghosts
More than 10% of the population believes in
ghosts
Less than 2% of the population has been to jail
90% of the population wears contacts
EXAMPLES OF ONE-SAMPLE T-TEST
All Priuses have fuel economy > 50 mpg
Ford Focuses get 5 mpg on average
The average starting salary for ISU graduates is
more than $100,000
The average cholesterol level for a person with
diabetes is 240.
ETHANOL PLANT AND VOC LEVELS
Near Peoria, IL, a struggling ethanol plant was recently
fined for toxic wastewater pollution.
In the Midwest, the EPA reviews the VOC levels for 25
ethanol plants. The mean VOC for these plants is 300
tons per year, with a SD of 250.
Most of the plants have bypassed a stringent EPA
permitting process because they claimed to have levels
of VOC emission lower than 100 tons a year.
MORE CAUTIONS ABOUT INTERPRETING
CONFIDENCE INTERVALS
Remember that interpretation of your confidence
interval is key.
What NOT to say:
“95% of all the ethanol plants pollute
between 214.45 and 385.55 tons per year.”
The confidence interval is about the mean not the
individual values.
“We are 95% confident that a randomly
selected ethanol plant will pollute between
214.45 and 385.55 tons per year.”
Again, the confidence interval is about the mean not
the individual values.
MORE CAUTIONS ABOUT INTERPRETING
CONFIDENCE INTERVALS (CONT.)
What NOT to say:
“The mean pollution level is 300 tons per year 95%
of the time.”
The true mean does not vary—it’s the confidence
interval that would be different had we gotten a
different sample.
“95% of all samples will have pollution levels between
214.45 and 385.55 tons per year.”
The interval we calculate does not set a standard for
every other interval—it is no more (or less) likely to
be correct than any other interval.
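The endpoints quoted in these cautions can be checked against the ethanol summary statistics (n = 25, mean 300, SD 250). The sketch below backs out the critical value implied by the endpoint 385.55 and the confidence level that value corresponds to. It suggests the endpoints match a 90% t-interval with 24 degrees of freedom, so the 95% label is worth double-checking; a 95% interval from the same statistics would use a larger critical value and be wider. The helpers `t_pdf` and `t_cdf` are illustrative names, not part of the slides, and use only the Python standard library (with SciPy installed, `scipy.stats.t.cdf` would serve the same purpose).

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary statistics from the ethanol slide.
n, y_bar, s = 25, 300.0, 250.0
se = s / math.sqrt(n)                       # 250 / 5 = 50

# Critical value implied by the quoted upper endpoint, 385.55.
implied_t = (385.55 - y_bar) / se

# Two-sided confidence level that this critical value corresponds to.
implied_level = 2 * t_cdf(implied_t, n - 1) - 1

print(f"SE = {se:.2f}, implied t* = {implied_t:.3f}, implied level = {implied_level:.3f}")
```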
CHAPTER 23
Inference About Means
FROM PROPORTIONS TO MEANS
Now that we know how to create confidence
intervals and test hypotheses about proportions,
it’d be nice to be able to do the same for means.
Just as we did before, we will base both our
confidence interval and our hypothesis test on
the sampling distribution model.
NOTATION FOR THE THEORY
The Central Limit Theorem told us that the
sampling distribution model for means is Normal
with mean μ and standard deviation
$SD(\bar{y}) = \sigma / \sqrt{n}$.
All we need is a random sample of quantitative
data.
And the true population standard deviation, σ.
Well, that’s a problem…
APPLYING THE THEORY TO OUR SAMPLE
Proportions have a link between the proportion value
and the standard deviation of the sample proportion.
This is not the case with means—knowing the sample
mean tells us nothing about $SD(\bar{y})$.
We’ll do the best we can: estimate the population
parameter σ with the sample statistic s.
Our resulting standard error is
$SE(\bar{y}) = s / \sqrt{n}$.
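As a small illustration of the standard error of the mean, the sketch below computes s and SE from a sample. The eight values are hypothetical hours of sleep made up for this example, not data from the class poll.

```python
import math
import statistics

# Hypothetical sample (hours of sleep); not data from the slides.
sample = [6, 7, 5, 8, 7, 6, 9, 8]

n = len(sample)
y_bar = statistics.mean(sample)
s = statistics.stdev(sample)      # sample SD, uses the n - 1 denominator
se = s / math.sqrt(n)             # standard error of the sample mean

print(f"n = {n}, mean = {y_bar}, s = {s:.3f}, SE = {se:.3f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would divide by n instead and is not what the formula calls for.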
CAN’T USE NORMAL MODEL, MUST USE T-DISTRIBUTION
We now have extra variation in our standard error
from s, the sample standard deviation.
And, the shape of the sampling model changes—the
model is no longer Normal. So, what is the sampling
model?
THE T-DISTRIBUTION AND OUR SAMPLE
A practical sampling distribution model for
means
When the conditions are met, the standardized sample
mean
$t = \frac{\bar{y} - \mu}{SE(\bar{y})}$
follows a Student’s t-model with n – 1 degrees of
freedom.
We estimate the standard error with $SE(\bar{y}) = s / \sqrt{n}$.
T-DISTRIBUTION
Student’s t-models are unimodal, symmetric, and
bell shaped, just like the Normal.
But t-models with only a few degrees of freedom
have much fatter tails than the Normal.
T-DISTRIBUTION
As the degrees of freedom increase, the t-models
look more and more like the Normal.
In fact, the t-model with infinite degrees of
freedom is exactly Normal.
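Both points about the t-models (fatter tails at low degrees of freedom, and convergence to the Normal) can be seen numerically. The sketch below compares the two-sided tail probability beyond 2 for several t-models with the same probability under the standard Normal. The helpers `t_pdf` and `t_cdf` are illustrative names that integrate the t density with the standard library only; the degrees of freedom shown are arbitrary choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_tail(t_value, df):
    """P(|T| > t_value) for a t-model with df degrees of freedom."""
    return 2 * (1 - t_cdf(t_value, df))

normal_tail = 2 * (1 - NormalDist().cdf(2.0))
tails = {df: two_sided_tail(2.0, df) for df in (2, 5, 30, 1000)}

for df, p in tails.items():
    print(f"df = {df:>4}: P(|T| > 2) = {p:.4f}")
print(f"Normal  : P(|Z| > 2) = {normal_tail:.4f}")
```

The tail probability shrinks toward the Normal value as the degrees of freedom grow.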
GOSSET’S T
William S. Gosset, an employee of the Guinness
Brewery in Dublin, Ireland, worked long and
hard to find out what the sampling model was.
The sampling model that Gosset found has been
known as Student’s t.
The Student’s t-models form a whole family of
related distributions that depend on a parameter
known as degrees of freedom.
We often denote degrees of freedom as df, and
the model as $t_{df}$.
FINDING T-VALUES BY HAND
The Student’s t-model is different for each value of
degrees of freedom.
Because of this, Statistics books usually have one
table of t-model critical values for selected
confidence levels.
USING THE T-TABLES FIND THE
FOLLOWING CRITICAL-T VALUES
What is the critical value of t for a 90%
confidence interval with df=18?
What is the critical value of t for a 99%
confidence interval with df=78?
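Table lookups like these can be cross-checked in software. The sketch below finds the critical value for a given confidence level by bisection on a numerically integrated t CDF. The functions `t_pdf`, `t_cdf`, `t_ppf` and `critical_t` are illustrative helpers built on the standard library, not a method from the slides; with SciPy installed, `scipy.stats.t.ppf` returns the same quantiles.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Quantile for 0.5 <= p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_t(confidence, df):
    """t* that leaves (1 - confidence) / 2 in each tail."""
    return t_ppf(1 - (1 - confidence) / 2, df)

t_90_df18 = critical_t(0.90, 18)
t_99_df78 = critical_t(0.99, 78)
print(f"90% CI, df = 18: t* = {t_90_df18:.3f}")
print(f"99% CI, df = 78: t* = {t_99_df78:.3f}")
```

Printed tables often skip df = 78; the usual by-hand convention is to drop to the next smaller listed df, which gives a slightly larger critical value than the software result.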
FIND P-VALUES
Online Program
http://www.tutorhomework.com/statistics_tables/statistics_tables.html
TI-83/TI-84
http://www.keymath.com/documents/sia2/CalculatorNotes_Ch09_SIA2.pdf
USING YOUR SOFTWARE, FIND THE
FOLLOWING P-VALUES
What is the p-value for t≥2.61 with 4 degrees of
freedom?
What is the p-value for |t|>1.81 with 21 degrees
of freedom?
What is the p-value for |t|<1.53 with 21 degrees
of freedom?
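One way to check these three answers without a calculator or website is sketched below. It is a sketch under assumptions: `t_pdf` and `t_cdf` are illustrative helpers that integrate the t density using only the standard library, and each probability is built from the CDF by complement and symmetry. With SciPy installed, `scipy.stats.t.sf` and `scipy.stats.t.cdf` give the same quantities.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# P(t >= 2.61), df = 4: upper tail only.
p_upper = 1 - t_cdf(2.61, 4)

# P(|t| > 1.81), df = 21: both tails, so double the upper tail.
p_two_sided = 2 * (1 - t_cdf(1.81, 21))

# P(|t| < 1.53), df = 21: the middle region, the complement of both tails.
p_middle = 2 * t_cdf(1.53, 21) - 1

print(f"P(t >= 2.61), df = 4   : {p_upper:.4f}")
print(f"P(|t| > 1.81), df = 21 : {p_two_sided:.4f}")
print(f"P(|t| < 1.53), df = 21 : {p_middle:.4f}")
```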
A CONFIDENCE INTERVAL FOR MEANS? (CONT.)
One-sample t-interval for the mean
When the conditions are met, we are ready to find the
confidence interval for the population mean, μ.
The confidence interval is
$\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$
where the standard error of the mean is
$SE(\bar{y}) = s / \sqrt{n}$.
The critical value $t^*_{n-1}$ depends on the particular
confidence level, C, that you specify and on the number
of degrees of freedom, n – 1, which we get from the
sample size.
HW 11 – PROBLEM 6
A nutrition lab retests the sodium content of hot
dogs.
This time they use a sample of 75 ‘reduced
sodium’ frankfurters instead of 40.
The NEW sample produces a mean of 319mg
with a SD of 31.
The OLD sample produced a mean of 310mg with
a SD of 36.
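Applying the one-sample t-interval to the two sets of summary statistics above might look like the following sketch. The function `t_interval` and the helpers `t_pdf`, `t_cdf` and `t_ppf` are illustrative names, not part of the homework; the critical value is found by bisection on a numerically integrated t CDF using only the standard library, and 95% is an assumed default confidence level.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Quantile for 0.5 <= p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(y_bar, s, n, confidence=0.95):
    """Return (SE, t*, lower, upper) for a one-sample t-interval."""
    se = s / math.sqrt(n)
    t_star = t_ppf(1 - (1 - confidence) / 2, n - 1)
    return se, t_star, y_bar - t_star * se, y_bar + t_star * se

new = t_interval(319.0, 31.0, 75)   # NEW sample: n = 75
old = t_interval(310.0, 36.0, 40)   # OLD sample: n = 40

for label, (se, t_star, lower, upper) in (("new", new), ("old", old)):
    print(f"{label}: SE = {se:.2f}, t* = {t_star:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The larger sample has the smaller standard error, and its critical value is also slightly smaller because it has more degrees of freedom.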
SHOULD THE LARGER SAMPLE PRODUCE A
MORE ACCURATE PREDICTION OF REDUCED
SODIUM?
1. More accurate
2. Less accurate
WHAT IS THE SE FOR THE NEW SAMPLE?
1. 31/sqrt(75)
2. 36/sqrt(75)
3. 31/sqrt(40)
4. 36/sqrt(40)
FIND THE 95% CI, FOR THE NEW SAMPLE
1. 319± 1.960*3.58
2. 319± 1.992*3.58
3. 319± 1.665*3.58
4. 319± 2.021*3.58
INTERPRET
1. 95% of all “reduced sodium” hot dogs will have a
sodium content that falls within the interval
2. We are 95% confident the interval contains the true
mean of sodium content in this type of “reduced
sodium” hot dog
3. The interval contains the true mean sodium content
in this type of “reduced sodium” hot dogs 95% of the
time
4. 95% of the sodium content in this type of “reduced
sodium” hot dog will be contained in the interval.
FOOD LABELING REGULATIONS REQUIRE THAT
FOOD IDENTIFIED AS “REDUCED SODIUM” MUST
HAVE AT LEAST 30% LESS SODIUM THAN ITS
REGULAR COUNTERPART.
Let’s say, we find that the regular hot dog has 465mg
of sodium.
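The labeling rule turns into a single cutoff: "at least 30% less" than 465 mg means at most 0.70 × 465 = 325.5 mg. The sketch below compares that cutoff with a 95% interval for the new sample (n = 75, mean 319, SD 31). The critical value 1.992 is an assumption here, taken as the approximate 95% value for 74 degrees of freedom rather than computed in the code.

```python
import math

regular_mg = 465.0                     # regular hot dog, from the slide
max_reduced_mg = 0.70 * regular_mg     # at least 30% less, so at most 325.5 mg

# New sample from HW 11, Problem 6.
n, y_bar, s = 75, 319.0, 31.0
se = s / math.sqrt(n)
t_star = 1.992                         # assumed: approximate 95% critical value, df = 74

lower = y_bar - t_star * se
upper = y_bar + t_star * se

print(f"maximum allowable sodium: {max_reduced_mg:.1f} mg")
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f}) mg")
print("entire interval below the maximum:", upper < max_reduced_mg)
```

Under these assumptions the cutoff falls inside the interval, just below its upper end, so the interval is not entirely below the maximum.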
SHOULD THIS HOT DOG BE LABELED “REDUCED”
BASED ON THE SAMPLE?
1. Yes, b/c a 95% CI is less than the maximum allowable
sodium for a ‘reduced sodium’ frank
2. No, b/c a 95% CI extends above the maximum
allowable sodium for a ‘reduced sodium’ frank
3. No, b/c a 95% CI is less than the maximum allowable
sodium for a ‘reduced sodium’ frank
A TEST FOR THE MEAN
One-sample t-test for the mean
The assumptions and conditions for the one-sample t-test
for the mean are the same as for the one-sample t-interval.
We test the hypothesis $H_0: \mu = \mu_0$ using the statistic
$t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$
The standard error of the sample mean is
$SE(\bar{y}) = s / \sqrt{n}$.
When the conditions are met and the null hypothesis is
true, this statistic follows a Student’s t model with n – 1
df. We use that model to obtain a P-value.
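A minimal sketch of the one-sample t-test from summary statistics follows. The example reuses the ethanol numbers from earlier in the lecture (n = 25, mean 300, SD 250) and tests the plants' claimed level of 100 tons per year; choosing H0: μ = 100 against HA: μ > 100 is an assumption made for illustration, not a test stated on the slides. The names `one_sample_t`, `t_pdf` and `t_cdf` are illustrative, and the P-value comes from a numerically integrated t CDF built on the standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(y_bar, s, n, mu0):
    """Return (t statistic, df, upper-tail P-value, two-sided P-value)."""
    se = s / math.sqrt(n)
    t_stat = (y_bar - mu0) / se
    df = n - 1
    p_upper = 1 - t_cdf(t_stat, df)
    p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))
    return t_stat, df, p_upper, p_two_sided

# Assumed hypotheses: H0: mu = 100 vs HA: mu > 100 (illustrative choice).
t_stat, df, p_upper, p_two_sided = one_sample_t(y_bar=300.0, s=250.0, n=25, mu0=100.0)
print(f"t = {t_stat:.2f} on {df} df, one-sided P = {p_upper:.5f}, two-sided P = {p_two_sided:.5f}")
```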
INTERVALS AND TESTS (CONT.)
More precisely, a level C confidence interval
contains all of the possible null hypothesis values
that would not be rejected by a two-sided
hypothesis test at alpha level
1 – C.
So a 95% confidence interval matches a 0.05
level test for these data.
Confidence intervals are naturally two-sided, so
they match exactly with two-sided hypothesis
tests.
When the hypothesis is one sided, the
corresponding alpha level is (1 – C)/2.
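The link between the interval and the test can be written out in one line. This is a sketch in the notation of the earlier formula slides, with $t^*_{n-1}$ the critical value for confidence level C:

```latex
\mu_0 \in \left[\bar{y} - t^*_{n-1}\,SE(\bar{y}),\ \bar{y} + t^*_{n-1}\,SE(\bar{y})\right]
\iff
\left|\frac{\bar{y} - \mu_0}{SE(\bar{y})}\right| \le t^*_{n-1}
\iff
\text{two-sided } P\text{-value} \ge 1 - C
```

The interval puts (1 – C)/2 in each tail, which is why a one-sided test that uses the same critical value has alpha level (1 – C)/2.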
SAMPLE SIZE
To find the sample size needed for a particular
confidence level with a particular margin of error (ME),
solve this equation for n:
$ME = t^*_{n-1} \times \frac{s}{\sqrt{n}}$
The problem with using the equation above is that we
don’t know most of the values. We can overcome this:
We can use s from a small pilot study.
We can use z* in place of the necessary t value.
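With both workarounds in place, solving for n gives n = (z* × s / ME)², rounded up. A minimal sketch follows; the function name `sample_size_for_mean` and the pilot numbers (s = 10, ME = 2, 95% confidence) are hypothetical and chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(s_pilot, margin, confidence):
    """Smallest n with z* * s / sqrt(n) <= margin (z* stands in for t*)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * s_pilot / margin) ** 2)

# Hypothetical pilot study: s = 10 units, desired ME = 2 units, 95% confidence.
n_needed = sample_size_for_mean(s_pilot=10.0, margin=2.0, confidence=0.95)
print(n_needed)
```

Because t* is always a little larger than z*, the n found this way is slightly optimistic; one option is to recompute the margin of error with the t critical value for n – 1 degrees of freedom and bump n up if needed.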
HW 11 PROBLEM 4
WOULD A 99% CI BE WIDER OR
NARROWER THAN 98% CI?
1. Wider
2. Narrower
3. Would remain the same
WHAT ARE THE (DIS)ADVANTAGES OF THE
98% CI?
1. The 98% CI has less chance of containing the
true mean than the 99% CI, but the 99% CI is more
precise (narrower) than the 98% CI.
2. The 99% CI has less chance of containing the
true mean than the 98% CI, but the 98% CI is more
precise (narrower) than the 99% CI.
3. The 98% CI has less chance of containing the
true mean than the 99% CI, but the 98% CI is more
precise (narrower) than the 99% CI.
SUPPOSE WE DECREASE OUR SAMPLE SIZE
FROM N=55 TO N=25. HOW WOULD WE EXPECT
OUR 98% CI TO CHANGE?
1. Wider
2. Narrower
3. The same
HOW LARGE A SAMPLE WOULD YOU NEED TO
ESTIMATE THE MEAN BODY TEMPERATURE TO
WITHIN 0.1 DEGREES WITH 99% CONFIDENCE?
1. 0-100
2. 101-200
3. 201-300
4. 301-400
5. 400+
UPCOMING IN CLASS
Homework #11 due Sunday
Start working on the final step of your data
project
Quiz #4 is Wednesday November 14th
Exam #2 is Wednesday November 28th