10.2 Comparing Two Means

Objectives
SWBAT:
• DESCRIBE the shape, center, and spread of the sampling distribution of the difference
of two sample means.
• DETERMINE whether the conditions are met for doing inference about µ1 − µ2.
• CONSTRUCT and INTERPRET a confidence interval to compare two means.
• PERFORM a significance test to compare two means.
• DETERMINE when it is appropriate to use two-sample t procedures versus paired t
procedures.
What is meant by “the sampling distribution of the difference between two
means?”
Both x̄1 and x̄2 are random variables. The statistic x̄1 − x̄2 is the difference
of these two random variables. In Chapter 6, we learned that for any two
independent random variables X and Y,
\[ \mu_{X-Y} = \mu_X - \mu_Y \quad \text{and} \quad \sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y \]
The Sampling Distribution of the Difference Between Sample Means
Choose an SRS of size n1 from Population 1 with mean µ1 and
standard deviation σ1 and an independent SRS of size n2 from
Population 2 with mean µ2 and standard deviation σ2.
Shape When the population distributions are Normal, the sampling distribution
of x̄1 − x̄2 is approximately Normal. In other cases, the sampling distribution will
be approximately Normal if the sample sizes are large enough (n1 ≥ 30, n2 ≥ 30).
Center The mean of the sampling distribution of x̄1 − x̄2 is µ1 − µ2.
Spread The standard deviation of the sampling distribution of x̄1 − x̄2 is
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
as long as each sample is no more than 10% of its population (10% condition).
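However: σ1 and σ2 are usually unknown, so in practice we replace them with the
sample standard deviations s1 and s2. The result is the standard error of x̄1 − x̄2:
\[ SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
This measures how far the difference in sample means will typically be from the
difference in population means if we repeat the random sampling or random
assignment many times.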
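A quick simulation can make the center and spread facts concrete. The sketch below is not part of the original slides. It assumes two hypothetical Normal populations (the means, standard deviations, and sample sizes are made up for illustration) and compares the simulated mean and standard deviation of x̄1 − x̄2 with µ1 − µ2 and the square-root formula for the spread.

```python
import math
import random
import statistics

# Hypothetical populations, chosen only for illustration
MU1, SIGMA1, N1 = 50.0, 8.0, 40
MU2, SIGMA2, N2 = 45.0, 6.0, 35

def simulate_differences(reps=5000, seed=1):
    """Draw `reps` pairs of independent samples and return each x̄1 − x̄2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean([rng.gauss(MU1, SIGMA1) for _ in range(N1)])
        xbar2 = statistics.fmean([rng.gauss(MU2, SIGMA2) for _ in range(N2)])
        diffs.append(xbar1 - xbar2)
    return diffs

diffs = simulate_differences()
theory_center = MU1 - MU2
theory_spread = math.sqrt(SIGMA1**2 / N1 + SIGMA2**2 / N2)

print("center: simulated", round(statistics.fmean(diffs), 3), "theory", theory_center)
print("spread: simulated", round(statistics.stdev(diffs), 3), "theory", round(theory_spread, 3))
```

With a few thousand repetitions, the simulated values should land close to the theoretical ones, though never exactly on them.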
What is the formula for the two-sample t statistic? Is this on the formula
sheet? What does it measure?
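\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
It measures how far the difference in the sample means is from the hypothesized
difference µ1 − µ2 (usually 0), in standardized units.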
Conditions for Performing Inference About µ1 - µ2
• Random: The data come from two independent random samples or
from two groups in a randomized experiment.
o 10%: When sampling without replacement, check that
n1 ≤ (1/10)N1 and n2 ≤ (1/10)N2.
• Normal/Large Sample: Both population distributions (or the true
distributions of responses to the two treatments) are Normal or both
sample sizes are large (n1 ≥ 30 and n2 ≥ 30). If either population
(treatment) distribution has unknown shape and the corresponding
sample size is less than 30, use a graph of the sample data to
assess the Normality of the population (treatment) distribution. Do
not use two-sample t procedures if the graph shows strong
skewness or outliers.
What distribution does the two-sample t statistic have? Why do we
use a t statistic rather than a z statistic? How do you calculate the
degrees of freedom?
The two-sample t statistic has approximately a t distribution. We can use
technology to determine degrees of freedom OR we can use a conservative
approach, using the smaller of n1 – 1 and n2 – 1 for the degrees of freedom.
We use t rather than z when we do not know the population standard
deviation.
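For reference, the degrees of freedom that technology reports typically come from the Welch-Satterthwaite approximation. The slides do not state this formula, so treat it as background rather than something to memorize:

```latex
\[
df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
          {\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2
         + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}
\]
```

This value always falls between the smaller of n1 − 1 and n2 − 1 and the sum n1 + n2 − 2, which is why the conservative choice never overstates the degrees of freedom.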
Alternate Example: Leaking Helium
After buying many helium balloons only to see them deflate within a couple of days, Erin and
Jenna decided to test if helium-filled balloons deflate faster than air-filled balloons. To find out,
they bought 60 balloons of the same type and randomly divided them into two piles of 30,
filling the balloons in the first pile with helium and the balloons in the second pile with air.
Then, they measured the circumference of each balloon immediately after being filled and
again three days later. The average decrease in circumference of the helium-filled balloons was
26.5 cm with a standard deviation of 1.92 cm. The average decrease of the air-filled balloons
was 2.1 cm with a standard deviation of 2.79 cm.
(a) Why was it important that they used the same type of balloons? What is this called in
experiments?
This is called control. Using the same type of balloon eliminates a possible source of variability,
making the standard deviations smaller and increasing the power.
(b) Do these data provide convincing evidence that helium-filled balloons deflate faster than air-filled balloons?
Do:
t = 39.46
df = 51.4
P-value ≈ 0
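The Do-step numbers can be checked from the summary statistics alone. The following is a minimal sketch, assuming the unpooled standard error and the Welch-Satterthwaite degrees of freedom (an assumption about what the calculator computes); the function name `welch_t` is made up for this illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2, null_diff=0.0):
    """Return (t, df, conservative_df) from summary statistics, without pooling."""
    v1 = sd1**2 / n1  # estimated variance of the first sample mean
    v2 = sd2**2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    t = ((mean1 - mean2) - null_diff) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df, min(n1, n2) - 1

# Helium group first, air group second (values from the example)
t, df, df_conservative = welch_t(26.5, 1.92, 30, 2.1, 2.79, 30)
print(f"t = {t:.2f}, df = {df:.1f}, conservative df = {df_conservative}")
```

A t statistic near 39 is far beyond anything in a t table, which is why the P-value is reported as approximately 0 with either choice of degrees of freedom.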
(c) Interpret the P-value you got in part (b) in the context of this study.
There is approximately a 0 probability of getting a difference in sample means of
24.4 cm or more by the chance involved in the random assignment alone, assuming
that the mean decrease in circumference is the same for helium-filled and
air-filled balloons.
Is it OK to use your calculator for the Do step? Are there any drawbacks?
Yes, the calculator is allowed for the Do step. The drawback is that calculator-only
work earns no partial credit if the answer is wrong.
Note: Make sure you always say “NO” to pooling when doing inference for means.
Two-Sample t Interval for a Difference Between Two Means
Same conditions as before.
Alternate Example: Chocolate Chips
Ashtyn and Olivia wanted to know if generic chocolate chip cookies have as many chocolate
chips as name-brand chocolate chip cookies, on average. To investigate, they randomly selected
10 bags of Chips Ahoy cookies and 10 bags of Great Value cookies and randomly selected 1
cookie from each bag. Then, they carefully broke apart each cookie and counted the number of
chocolate chips in each. Here are their results:
Chips Ahoy: 17, 19, 21, 16, 17, 18, 20, 21, 17, 18
Great Value: 22, 20, 14, 17, 21, 22, 15, 19, 26, 18
(a) Construct and interpret a 99% confidence interval for the difference in the mean number of
chocolate chips in Chips Ahoy and Great Value cookies.
• Normal/Large Sample: Both samples are small (n = 10), so graph the data.
Because graphs of the two samples show no obvious skewness or outliers, it is
safe to use t procedures.
Calculator:
2-SampTInt
List1:
List2:
Freq1: 1
Freq2: 1
C-Level: 0.99
Pooled: No
Result: (−4.814, 2.814)
df = 13.145
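The calculator output can be reproduced by hand or in code. The sketch below uses only the Python standard library, so the t critical value is found numerically (Simpson's rule plus bisection) rather than looked up. The helper names `t_pdf`, `t_cdf`, and `t_critical` are made up for this illustration, and the Welch-Satterthwaite degrees of freedom are assumed to be what the calculator uses.

```python
import math
import statistics

chips_ahoy = [17, 19, 21, 16, 17, 18, 20, 21, 17, 18]
great_value = [22, 20, 14, 17, 21, 22, 15, 19, 26, 18]

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(conf_level, df):
    """Find t* whose central area equals conf_level, by bisection."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n1, n2 = len(chips_ahoy), len(great_value)
v1 = statistics.variance(chips_ahoy) / n1   # s1^2 / n1
v2 = statistics.variance(great_value) / n2  # s2^2 / n2
se = math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
diff = statistics.mean(chips_ahoy) - statistics.mean(great_value)
margin = t_critical(0.99, df) * se
lower, upper = diff - margin, diff + margin
print(f"df = {df:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The difference is taken as Chips Ahoy minus Great Value, so a negative value means the Great Value sample averaged more chips. The printed interval should agree closely with the 2-SampTInt result.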
Conclude: We are 99% confident that the interval from −4.814 to 2.814 captures the true
difference in the mean number of chocolate chips in Chips Ahoy and Great Value
cookies.
(b) Does your interval provide convincing evidence that there is a difference in the mean
number of chocolate chips?
Because the interval includes 0, there is not convincing evidence that there is a
difference in the mean number of chocolate chips in Chips Ahoy and Great Value
chocolate chip cookies.
When doing two-sample t procedures, should we pool the data to estimate a
common standard deviation? Is there any benefit? Are there any risks?
• Do not pool. Ever.
• Pooling assumes the two populations have the same variance and that the
population distributions are exactly Normal.
• In the real world, distributions are not exactly Normal, and the population
variances are not exactly equal. Therefore, do not pool.
• Note: The benefit of pooling is more degrees of freedom (n1 + n2 − 2), which
leads to narrower confidence intervals and smaller P-values. However, this benefit is
valid only when the population variances are equal and the population distributions
are exactly Normal. As seen above, this is rarely the case. DON’T POOL!!!
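To see what pooling actually changes, the sketch below computes the unpooled and pooled standard errors and degrees of freedom for the chocolate chip data from the earlier example. It is an illustration only, using the standard textbook pooled-variance formula, which the slides do not show.

```python
import math
import statistics

chips_ahoy = [17, 19, 21, 16, 17, 18, 20, 21, 17, 18]
great_value = [22, 20, 14, 17, 21, 22, 15, 19, 26, 18]
n1, n2 = len(chips_ahoy), len(great_value)
s1_sq = statistics.variance(chips_ahoy)   # sample variance, Chips Ahoy
s2_sq = statistics.variance(great_value)  # sample variance, Great Value

# Unpooled: each sample keeps its own variance
se_unpooled = math.sqrt(s1_sq / n1 + s2_sq / n2)
df_unpooled = (s1_sq / n1 + s2_sq / n2) ** 2 / (
    (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
)

# Pooled: assumes both populations share one variance
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

print(f"sample variances: {s1_sq:.2f} vs {s2_sq:.2f}")
print(f"unpooled: SE = {se_unpooled:.3f}, df = {df_unpooled:.2f}")
print(f"pooled:   SE = {se_pooled:.3f}, df = {df_pooled}")
```

Because the two samples are the same size, the two standard errors coincide here, and the only change is the jump in degrees of freedom. The sample variances differ by roughly a factor of four, so the equal-variance assumption behind that jump looks doubtful for these data.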
What about a two-sample test for a difference in proportions? Why do we pool
for this test?
• We pool for proportions because the null hypothesis states that p1 = p2, so we
need a single estimate of that common value p to compute the standard error. We
don’t have to worry about that with means.
Should you use two-sample t procedures with paired data? Why not? How can
you know which procedure to use?
• Do not use two-sample t procedures on paired data.
• Think about it this way:
Would the test be messed up if the data were scrambled in the lists?
• If the answer is yes, then the data are paired and you should be using the
differences.
• When determining your procedure, pay attention to the context of the
problem. It is going to indicate what procedure you want to use.
Alternate Example: Testing with distractions
Suppose you are designing an experiment to determine if students perform better on
tests when there are no distractions, such as a teacher talking on the phone. You have
access to two classrooms and 30 volunteers who are willing to participate in your
experiment.
a) Design an experiment so that a two-sample t test would be the appropriate
inference method.
• On 15 index cards write “A” and on 15 index cards write “B.”
• Shuffle the cards and hand them out at random to the 30 volunteers. All 30 subjects will
take the same reading comprehension test.
• Subjects who receive A cards will go to a classroom with no distractions, and subjects who
receive B cards will go to a classroom that will have the proctor talking on the phone during
the test.
• At the end of the experiment, compare the mean score for subjects in room A with the
mean score for subjects in room B.
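The index-card procedure can also be carried out in software. This is a minimal sketch, assuming the volunteers are identified by placeholder labels (`volunteer_01` and so on are hypothetical) and using a fixed seed only so the result is repeatable.

```python
import random

def assign_rooms(volunteers, seed=None):
    """Randomly split volunteers into two groups, like shuffling index cards."""
    half = len(volunteers) // 2
    cards = ["A"] * half + ["B"] * (len(volunteers) - half)
    random.Random(seed).shuffle(cards)  # shuffle the deck of cards
    room_a = [v for v, card in zip(volunteers, cards) if card == "A"]
    room_b = [v for v, card in zip(volunteers, cards) if card == "B"]
    return room_a, room_b

# Placeholder labels for the 30 volunteers
volunteers = [f"volunteer_{i:02d}" for i in range(1, 31)]
room_a, room_b = assign_rooms(volunteers, seed=2024)
print("Room A (no distractions):", room_a)
print("Room B (proctor on phone):", room_b)
```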
b) Design an experiment so that a paired t test would be the appropriate inference
method.
• Using the same procedure in part (a), divide the subjects into two rooms and give them the
same reading comprehension test. One room will be distraction free, and the other room
will have a proctor talking on the phone.
• Then, after a short break, give all 30 subjects a similar reading comprehension test but have
the distraction in the opposite room.
• At the end of the experiment, calculate the difference in the two reading comprehension
scores for each subject and compare the mean difference with 0.
c) Which experimental design is better? Explain.
The experimental design in part (b) is better since it eliminates an important source of
variability – the reading comprehension skills of the individual subjects. MORE
POWER!
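The gain in power can be illustrated with made-up numbers. In the sketch below, the scores are hypothetical (no data were collected in this example): eight subjects differ a lot from one another, but each scores a few points higher without the distraction. The same data are analyzed both ways.

```python
import math
import statistics

# Hypothetical scores for 8 subjects who each took a test in both rooms
quiet      = [62, 75, 81, 58, 90, 70, 85, 66]
distracted = [58, 72, 78, 55, 85, 69, 80, 62]

def paired_t(x, y):
    """t statistic for the mean of the within-subject differences (H0: mean difference = 0)."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def two_sample_t(x, y):
    """Unpooled two-sample t statistic, which ignores the pairing (H0: equal means)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

t_paired = paired_t(quiet, distracted)
t_two = two_sample_t(quiet, distracted)
print(f"paired t = {t_paired:.2f}, two-sample t = {t_two:.2f}")
```

The paired statistic comes out much larger because the subject-to-subject variation cancels when each subject's two scores are subtracted. The two-sample statistic leaves that variation in the standard error, where it swamps the effect.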
d) What is the purpose of random assignment in this experiment?
To make sure one environment (treatment) isn’t favored by always going first (or
second).