Chapter 9: Inferences for Two Samples

Yunming Mu
Department of Statistics
Texas A&M University
Outline
1 Overview
2 Inferences about Two Means: Independent and Small Samples
3 Inferences about Two Means: Independent and Large Samples
4 Inferences about Two Proportions
5 Inferences about Two Means: Matched Pairs
Overview
There are many important and meaningful
situations in which it becomes necessary
to compare two sets of sample data.
Definitions
Independent samples: the sample values selected from one
population are not related to, or somehow paired with,
the sample values selected from the other population.
Dependent samples: the values in one sample are related to the
values in the other sample. Such samples are often referred
to as matched pairs or paired samples.
Example
Do male and female college students differ with respect to
their fastest reported driving speed?
Population of all male
college students
Sample of n1 = 17 males
report average of 102.1 mph
Population of all female
college students
Sample of n2 = 21 females
report average of 85.7 mph
Graphical summary of sample data
[Figure: plot of Fastest Driving Speed (mph) by Gender (male, female); speed axis runs from 75 to 145 mph]
Numerical summary of sample data

Gender   N    Mean     Median   TrMean   StDev   SE Mean
female   21   85.71    85.00    85.26    9.39    2.05
male     17   102.06   100.00   101.00   17.05   4.14

Gender   Minimum   Maximum   Q1      Q3
female   75.00     105.00    77.50   92.50
male     75.00     145.00    90.00   115.00
The difference in the sample means is
102.06 - 85.71 = 16.35 mph
The Question in Statistical Notation
Let µM = the average fastest speed of all male students,
and µF = the average fastest speed of all female students.
Then we want to know whether µM ≠ µF.
This is equivalent to knowing whether µM - µF ≠ 0.
All possible questions in statistical notation
In general, we can always compare two averages by seeing
how their difference compares to 0:

This comparison…   is equivalent to …
µ1 ≠ µ2            µ1 - µ2 ≠ 0
µ1 > µ2            µ1 - µ2 > 0
µ1 < µ2            µ1 - µ2 < 0
Set up hypotheses
• Null hypothesis:
– H0: µM = µF [equivalent to µM - µF = 0]
• Alternative hypothesis:
– Ha: µM ≠ µF [equivalent to µM - µF ≠ 0]
Inferences about Two Means:
Independent and Small Samples
Pooled Two-Sample T Test
and T Interval
Assumptions:
1. The two samples are independent.
2. Both samples come from normally distributed populations; this
condition matters when the two sample sizes are small, n1 < 30 and n2 < 30.
3. Both variances are unknown but equal. Assume the variances are
equal only if neither sample standard deviation is more than twice
the other sample standard deviation.
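The rule of thumb in assumption 3 is easy to check mechanically. The following is a minimal sketch, assuming only the sample standard deviations are available; the function name `equal_variance_plausible` is hypothetical and not part of the slides. It is applied to the driving-speed standard deviations reported earlier (9.39 for females, 17.05 for males).

```python
def equal_variance_plausible(s1: float, s2: float) -> bool:
    """Rule of thumb from the slide: treat the variances as equal only if
    neither sample standard deviation is more than twice the other."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return larger <= 2 * smaller


# Driving-speed summary from the earlier slide: female SD 9.39, male SD 17.05.
ratio = 17.05 / 9.39
print(round(ratio, 2), equal_variance_plausible(9.39, 17.05))  # 1.82 True
```

A ratio below 2 only says the pooled procedure is not ruled out by this heuristic; it is not a formal test of equal variances.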
Confidence Intervals
Normal Samples w/ Unknown Equal Variance
$$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$$
where
$$E = t_{\alpha/2,\,n_1+n_2-2}\,\sqrt{S_p^2\,(1/n_1 + 1/n_2)}$$
$$S_p^2 = \frac{(n_1 - 1)\,S_1^2 + (n_2 - 1)\,S_2^2}{n_1 + n_2 - 2}$$
Leaded vs Unleaded
Each of the cars selected for the EPA study
was tested and the number of miles per
gallon for each was obtained and recorded
(Leaded=1 and Unleaded=2).

          Leaded (1)   Unleaded (2)
n         11           10
x̄         17.2         19.9
S         2.1          2.0
95% Confidence Interval
$$\alpha = 0.05, \quad t_{\alpha/2,\,n_1+n_2-2} = t_{0.025,\,11+10-2} = t_{0.025,\,19} = 2.093$$
$$S_p^2 = \frac{(11 - 1)\,2.1^2 + (10 - 1)\,2.0^2}{11 + 10 - 2} = 4.216$$
$$(17.2 - 19.9) \pm 2.093\,\sqrt{4.216\left(\frac{1}{11} + \frac{1}{10}\right)}$$
(-4.58, -0.82)
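The interval above can be reproduced from the summary statistics alone. This is a minimal sketch using only the Python standard library; the function names `pooled_variance` and `pooled_ci` are hypothetical, and the critical value 2.093 is taken from the slide (a t table lookup) rather than computed.

```python
import math


def pooled_variance(n1, s1, n2, s2):
    """Pooled variance S_p^2, weighting each sample variance by its df."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 under the equal-variance assumption."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    margin = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


T_CRIT = 2.093  # t_{0.025,19}, read from a t table as on the slide
sp2 = pooled_variance(11, 2.1, 10, 2.0)
lower, upper = pooled_ci(17.2, 2.1, 11, 19.9, 2.0, 10, T_CRIT)
print(round(sp2, 3))                     # 4.216
print(round(lower, 2), round(upper, 2))  # -4.58 -0.82
```

Because the whole interval lies below 0, it is consistent with leaded cars having the lower mean mpg.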
Pooled Two-Sample T Tests
Normal Samples w/ unknown Variance
$$t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2\,(1/n_1 + 1/n_2)}}$$
P-value:
Use the t distribution with n1 + n2 - 2 degrees of
freedom and find the P-value by following the same
procedure for t tests summarized in Ch 8.
Critical values:
Based on the significance level α,
use $t_{\alpha,\,n_1+n_2-2}$ for upper tail tests,
use $-t_{\alpha,\,n_1+n_2-2}$ for lower tail tests,
and use $|t_{\alpha/2,\,n_1+n_2-2}|$ for two tailed tests.
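Leaded vs Unleaded
Claim: µ1 < µ2
H0: µ1 = µ2
H1: µ1 < µ2
α = 0.05
Critical value: $-t_{0.05,\,19} = -1.729$
[Figure: t distribution with the "Reject H0" region to the left of -1.729 and the "Fail to reject H0" region to its right]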
Leaded vs Unleaded
Pooled Two-Sample T Test
Claim: µ1 < µ2
H0: µ1 = µ2    H1: µ1 < µ2    α = 0.05
$$t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2\,(1/n_1 + 1/n_2)}} = \frac{17.2 - 19.9 - 0}{\sqrt{4.216\,(1/11 + 1/10)}} = -3.01$$
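The test statistic for the leaded vs unleaded comparison can be checked with a few lines of code. This is a sketch under the pooled (equal-variance) assumption; the function name `pooled_t_statistic` is hypothetical, and the lower-tail critical value -1.729 is the table value quoted on the slides, not something the code derives.

```python
import math


def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2, null_diff=0.0):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = null_diff."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2 - null_diff) / se


t_stat = pooled_t_statistic(17.2, 2.1, 11, 19.9, 2.0, 10)
T_CRIT_LOWER = -1.729  # -t_{0.05,19}, from a t table as on the slides
reject_h0 = t_stat < T_CRIT_LOWER  # lower-tail test: reject when t is below it
print(round(t_stat, 2), reject_h0)  # -3.01 True
```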
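Leaded vs Unleaded
Claim: µ1 < µ2
H0: µ1 = µ2
H1: µ1 < µ2
α = 0.05
Sample data: t = -3.01, which falls below the critical value -1.729, so we reject the null hypothesis.
P-value = 0.0077 (= area of red region)
There is significant evidence to support the claim that the leaded
cars have a lower mean mpg than unleaded cars.
[Figure: t distribution with the "Reject H0" region to the left of -1.729, the "Fail to reject H0" region to its right, and the sample value t = -3.01 marked inside the rejection region]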
Two-Sample T Test and T Interval
Assumptions:
1. The two samples are independent.
2. Both samples come from normally distributed populations; this
condition matters when the two sample sizes are small, n1 < 30
and n2 < 30.
3. Both variances are unknown but unequal.
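Confidence Intervals
Normal Samples w/ Unknown Unequal Variance
$$(\bar{x}_1 - \bar{x}_2) - E < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + E$$
where
$$E = t_{\alpha/2,\,v}\,\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
$$v = \frac{\left[(se_1)^2 + (se_2)^2\right]^2}{\dfrac{(se_1)^4}{n_1 - 1} + \dfrac{(se_2)^4}{n_2 - 1}}, \quad se_1 = \frac{S_1}{\sqrt{n_1}}, \quad se_2 = \frac{S_2}{\sqrt{n_2}}$$
(round v down to the nearest integer)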
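The degrees-of-freedom formula is the step most prone to arithmetic slips, so a small sketch may help. The function name `welch_df` is hypothetical; the inputs are the driving-speed standard deviations and sample sizes from the numerical summary earlier in the chapter.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom v for the unequal-variance t procedures."""
    se1_sq = s1**2 / n1  # (se1)^2
    se2_sq = s2**2 / n2  # (se2)^2
    return (se1_sq + se2_sq) ** 2 / (
        se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1)
    )


# Driving speeds: females (SD 9.39, n = 21), males (SD 17.05, n = 17).
v = welch_df(9.39, 21, 17.05, 17)
v_rounded = math.floor(v)  # the slide says to round v down
print(v, v_rounded)
```

Rounded down, this gives 23, which agrees with the DF = 23 shown in the unpooled Minitab output at the end of this section.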
Unpooled Two-Sample T-Test
Normal Samples w/ Unknown Variance
$$t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$$
P-value:
Use the t distribution with v degrees of freedom
and find the P-value by following the same procedure for t
tests summarized in Ch 8.
Critical values:
Based on the significance level α,
use $t_{\alpha,\,v}$ for upper tail tests,
use $-t_{\alpha,\,v}$ for lower tail tests,
and use $|t_{\alpha/2,\,v}|$ for two tailed tests.
Example
We compare the density of two different types
of brick. Assuming normality of the two
density distributions and unequal unknown
variances, test if there is a difference in the
mean densities of two different types of brick.

          Type 1 brick   Type 2 brick
n         6              5
x̄         22.73          21.95
S         0.10           0.24
Unpooled Two-Sample T-Test
H0: µ1 = µ2    H1: µ1 ≠ µ2    α = 0.05
$$t = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} = \frac{22.73 - 21.95 - 0}{\sqrt{0.10^2/6 + 0.24^2/5}} = 6.792$$
$$v = 5 \ (5.15 \text{ rounded down}), \quad |t_{0.025,\,5}| = 2.571$$
P-Value = 0.001. Since |t| = 6.792 exceeds 2.571, reject the null and conclude that
there is a significant difference in the mean densities
of the two types of brick.
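The brick example can be verified end to end without a statistics package. The sketch below is an illustration, not the method used on the slides: it computes the unpooled t statistic and the rounded-down degrees of freedom from the formulas given earlier, and approximates the two-sided P-value by numerically integrating the t density with Simpson's rule. All function names (`welch_t`, `t_pdf`, `two_sided_p`) are hypothetical.

```python
import math


def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled t statistic and rounded-down degrees of freedom."""
    a, b = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / math.sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, math.floor(v)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 2 * (0.5 - area under the density from 0 to |t|).
    The area is approximated with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (0.5 - total * h / 3)


t_stat, df = welch_t(22.73, 0.10, 6, 21.95, 0.24, 5)
p_value = two_sided_p(t_stat, df)
print(round(t_stat, 3), df, round(p_value, 3))  # 6.792 5 0.001
```

With a statistics library available, a t-distribution tail function would replace the hand-written integration; the numerical version is used here only to keep the example self-contained.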
Two-sample t-test in Minitab
• Select Stat. Select Basic Statistics.
• Select 2-sample t to get a Pop-Up window.
• Click on the radio button before Samples in one
Column. Put the measurement variable in
Samples box, and put the grouping variable in
Subscripts box.
• Specify your alternative hypothesis.
• If appropriate, select Assume Equal Variances.
• Select OK.
Pooled two-sample t-test
Two sample T for Fastest

Gender   N    Mean    StDev   SE Mean
female   21   85.71   9.39    2.0
male     17   102.1   17.1    4.1

95% CI for mu (female) - mu (male): (-25.2, -7.5)
T-Test mu (female) = mu (male) (vs not =): T = -3.75   P = 0.0006   DF = 36
Both use Pooled StDev = 13.4
(Unpooled) two-sample t-test
Two sample T for Fastest

Gender   N    Mean    StDev   SE Mean
female   21   85.71   9.39    2.0
male     17   102.1   17.1    4.1

95% CI for mu (female) - mu (male): (-25.9, -6.8)
T-Test mu (female) = mu (male) (vs not =): T = -3.54   P = 0.0017   DF = 23
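The T and DF values in both Minitab outputs can be recovered from the numerical summary of the driving-speed data (means 85.71 and 102.06, standard deviations 9.39 and 17.05). The following is a minimal sketch, not Minitab's implementation; the function name `two_sample_t` and its `pooled` flag are hypothetical.

```python
import math


def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, pooled):
    """Return (t statistic, degrees of freedom) for H0: mu1 = mu2."""
    diff = xbar1 - xbar2
    if pooled:
        # Equal-variance version: pooled variance, df = n1 + n2 - 2.
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Unequal-variance version: separate variances, df rounded down.
        a, b = s1**2 / n1, s2**2 / n2
        se = math.sqrt(a + b)
        df = math.floor((a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1)))
    return diff / se, df


female = (85.71, 9.39, 21)   # mean, SD, n
male = (102.06, 17.05, 17)   # mean, SD, n
for pooled in (True, False):
    t, df = two_sample_t(*female, *male, pooled=pooled)
    print("pooled" if pooled else "unpooled", round(t, 2), df)
# pooled -3.75 36
# unpooled -3.54 23
```

The two versions lead to the same conclusion here, but the unpooled test has fewer degrees of freedom and a slightly smaller |T|, which is why its P-value is larger.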