
Chapter 13: Comparing Two Population Parameters
13.1 – Comparing Two Means
Comparative studies are more convincing than single-sample investigations, so one-sample inference is not as
common as comparative (two-sample) inference. In a
comparative study, we may want to compare two
treatments, or we may want to compare two
populations. In either case, the samples must be chosen
randomly and independently in order to perform
statistical inference.
How is this different from a matched pairs design?
In a matched pairs design, the observations come in pairs
(two similar subjects, or the same subject measured
twice), so the two sets of measurements are dependent.
Here you are comparing two separate, independent
samples given different treatments!
Two-Sample inference:
Compare two treatments or two populations. The null
hypothesis is that there is no difference between the
two parameters.
$$H_0: \mu_1 = \mu_2$$
or
$$H_0: \mu_1 - \mu_2 = 0$$
Review:
How do you subtract two means?
$$\mu_1 - \mu_2$$
How do you subtract two standard deviations?
$$\sqrt{\sigma_1^2 + \sigma_2^2}$$
Add their variances and take the square root!
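As a quick numeric check of the review rule, here is a minimal Python sketch. The function name `sd_of_difference` and the values 3 and 4 are illustrative assumptions, not from the notes, and the rule applies only when the two quantities are independent.

```python
import math

def sd_of_difference(sd1, sd2):
    """Standard deviation of (X1 - X2) for independent X1 and X2.

    Variances add, so we sum the squared standard deviations and
    take a single square root of the total.
    """
    return math.sqrt(sd1 ** 2 + sd2 ** 2)

# Hypothetical standard deviations chosen so the arithmetic is easy to follow.
print(sd_of_difference(3.0, 4.0))  # sqrt(9 + 16)
```

Note that the result (5) is smaller than 3 + 4 = 7: standard deviations themselves do not add.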
Two Sample Z: σ is known
SRS
Normality
• Population approx normal
• n1 + n2 ≥ 30 by CLT
Independence
N ≥ 10n
The two samples are independent

Two Sample T: σ is not known
SRS
Normality
• Population approx normal
• n1 + n2 ≥ 30 by CLT
• n1 + n2 < 30 and data doesn’t have strong skewness
Independence
N ≥ 10n
The two samples are independent
Note!
The two-sample t statistic does not have an exact t-distribution.
The degrees of freedom are calculated differently:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Your calculator will do this for you!
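For anyone who wants to see what the calculator is doing, the degrees-of-freedom formula can be sketched in a few lines of Python. The function name `welch_df` is an assumption made for this illustration; the inputs are the summary statistics from Example #1 in this section (s = 5.16 with n = 73, and s = 4.30 with n = 77).

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the two-sample t procedures."""
    v1 = s1 ** 2 / n1  # s1^2 / n1
    v2 = s2 ** 2 / n2  # s2^2 / n2
    numerator = (v1 + v2) ** 2
    denominator = v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)
    return numerator / denominator

# Summary statistics from Example #1 (self-transported, then ambulance).
print(round(welch_df(5.16, 73, 4.30, 77), 2))  # about 140.37
```

The result is usually not a whole number, which is why a table lookup falls back to a smaller tabulated df.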
Confidence Interval:
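estimate ± (critical value)(standard deviation of statistic)
Two Sample Z:
$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Two Sample T:
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$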
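The two-sample z interval can be sketched with Python's standard library, which provides the normal critical value through `statistics.NormalDist`. The function name and all of the numbers below are hypothetical, chosen only so the standard deviation of the difference works out neatly. The standard library has no t quantile function, so for the t version the critical value would have to come from a table or calculator.

```python
import math
from statistics import NormalDist

def two_sample_z_interval(mean1, mean2, sigma1, sigma2, n1, n2, confidence):
    """Confidence interval for mu1 - mu2 when both sigmas are known."""
    # Critical value z* that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_dev = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    estimate = mean1 - mean2
    return estimate - z_star * std_dev, estimate + z_star * std_dev

# Hypothetical inputs: means 50 and 47, sigmas 6 and 8, sample sizes 36 and 64.
low, high = two_sample_z_interval(50.0, 47.0, 6.0, 8.0, 36, 64, 0.95)
print(round(low, 4), round(high, 4))
```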
Hypothesis Test:
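test statistic = (estimate – hypothesized value) / (standard deviation of statistic)
Two Sample Z:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Two Sample T:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$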
Calculator Tip!
Two Sample Z:
STAT-TESTS- 2-SampZtest
STAT-TESTS- 2-SampZInt
Two Sample T:
STAT-TESTS- 2-SampTtest
STAT-TESTS- 2-SampTInt
Note: The only time you pool is when the standard
deviations are the same. This almost never happens, so just
don’t do it!
Example #1
Patients with heart-attack symptoms arrive at an emergency
room either by ambulance or self-transportation provided by
themselves, family, or friends. When a patient arrives at the
emergency room, the time of arrival is recorded. The time when
the patient’s diagnostic treatment begins is also recorded.
An administrator of a large hospital wanted to determine whether
the mean wait time (time between arrival and diagnostic
treatment) for patients with heart-attack symptoms differs
according to the mode of transportation. A random sample of 150
patients with heart-attack symptoms who had reported to the
emergency room was selected. For each patient, the mode of
transportation and wait time were recorded. Summary statistics
for each mode of transportation are shown in the table below.
Mode of          Sample   Mean Wait Time   Standard Deviation of
Transportation   Size     (in minutes)     Wait Time (in minutes)
Self             73       8.30             5.16
Ambulance        77       6.04             4.30
a. Use a 99% confidence interval to estimate the difference
between the mean wait times for ambulance transported
patients and self-transported patients at this emergency
room.
P:
μS = mean wait time for diagnostic treatment if
traveled by self-transportation
μA = mean wait time for diagnostic treatment if
traveled by ambulance
μD = μA - μS = Difference in wait times
A:
SRS (says so)
Normality nA + nS ≥ 30
77 + 73 ≥ 30
150 ≥ 30
By the CLT, ok to
assume normality
Independence
(More than 1500 people with heart-attack symptoms)
Self-transported patients shouldn’t influence the wait
time of ambulance-transported patients
N: Two-Sample t-interval
I:
$$df = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_S^2}{n_S}\right)^2}{\frac{1}{n_A - 1}\left(\frac{s_A^2}{n_A}\right)^2 + \frac{1}{n_S - 1}\left(\frac{s_S^2}{n_S}\right)^2} = \frac{\left(\frac{4.30^2}{77} + \frac{5.16^2}{73}\right)^2}{\frac{1}{77 - 1}\left(\frac{4.30^2}{77}\right)^2 + \frac{1}{73 - 1}\left(\frac{5.16^2}{73}\right)^2}$$
$$= \frac{0.36586}{0.0007587 + 0.00185} = 140.3717611$$
Using df = 100:
$$(\bar{x}_A - \bar{x}_S) \pm t^*_{100} \sqrt{\frac{s_A^2}{n_A} + \frac{s_S^2}{n_S}} = (6.04 - 8.30) \pm 2.626 \sqrt{\frac{4.30^2}{77} + \frac{5.16^2}{73}}$$
$$= -2.26 \pm 2.626(0.7777) = -2.26 \pm 2.0423 = (-4.302, -0.218)$$
Note: Using the calculator!
$$(\bar{x}_A - \bar{x}_S) \pm t^*_{df} \sqrt{\frac{s_A^2}{n_A} + \frac{s_S^2}{n_S}} = (6.04 - 8.30) \pm t^*_{140.37} \sqrt{\frac{4.30^2}{77} + \frac{5.16^2}{73}} = (-4.2910, -0.2291)$$
C:
I am 99% confident the true difference in mean wait
time (ambulance minus self-transported) is
between –4.2910 and –0.2291 minutes
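The table-based interval for this example can be reproduced with a short Python sketch. The function name `two_sample_t_interval` is an assumption for illustration; the critical value 2.626 (df = 100) is passed in by hand because it comes from a t table, as in the notes. The standard deviation of the difference works out to about 0.7777.

```python
import math

def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Interval for mu1 - mu2 using a critical value t* supplied by the caller."""
    std_dev = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # about 0.7777 here
    estimate = mean1 - mean2
    margin = t_star * std_dev
    return estimate - margin, estimate + margin

# Ambulance first, then self-transported, to match muD = muA - muS.
low, high = two_sample_t_interval(6.04, 4.30, 77, 8.30, 5.16, 73, t_star=2.626)
print(round(low, 3), round(high, 3))  # about -4.302 and -0.218
```

The calculator's interval is slightly narrower because it uses df = 140.37, which gives a slightly smaller critical value than df = 100.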
b. Based only on this confidence interval, do you think the
difference in the mean wait times is statistically significant?
Justify your answer.
Since 0 is not in the confidence interval, we can say that the
ambulance wait times are statistically significantly shorter
than the wait times for self-transported patients at the 99%
confidence level.
Example #2: The following is a list of salary rates (per hour in
dollars) for men and women with a high school diploma.
Women              Men
8       10.6       7.5     11.9
8.25    10.8       8.5     11.95
9       11         8.5     12
9.25    11.5       9.85    12
9.35    11.9       10.5    12
9.8     12.25      10.5    12.5
9.95    12.5       10.5    13
10      12.5       10.9    13.7
10      12.95      10.95   13.75
10      13.9       11      14.5
10.25   13.95      11      14.75
10.5    14.45      11.65   15
10.5    14.8       11.9    15.5
If the two samples are independent and are taken randomly, is
there significant evidence that the men make more money than
the women? Assume that in past experience σ = 1.99 dollars for
men and σ = 2.01 dollars for women.
P:
μM = mean dollars per hour for men with high
school diploma
μW = mean dollars per hour for women with
high school diploma
μD = μM - μW = Difference in dollars per hour
H:
$$H_0: \mu_M = \mu_W$$
or
$$H_0: \mu_M - \mu_W = 0$$
$$H_A: \mu_M > \mu_W$$
or
$$H_A: \mu_M - \mu_W > 0$$
A:
SRS (says so)
Normality nM + nW ≥ 30
26 + 26 ≥ 30
52 ≥ 30
By the CLT, ok to
assume normality
Independence
(More than 520 people with a high school diploma)
Men’s salaries shouldn’t influence the salaries of
women with high school diploma. Also, says
independent
N: Two-Sample Z-Test
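T:
$$z = \frac{(\bar{x}_M - \bar{x}_W) - (\mu_M - \mu_W)}{\sqrt{\frac{\sigma_M^2}{n_M} + \frac{\sigma_W^2}{n_W}}} = \frac{(11.76153 - 11.075) - (0)}{\sqrt{\frac{1.99^2}{26} + \frac{2.01^2}{26}}} = \frac{0.68653}{0.5547} = 1.2376$$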
O:
P(Z > 1.24) = 1 – P(Z < 1.24) = 1 – 0.8925 = 0.1075
M:
p > α
0.1075 > 0.05
Fail to reject the Null
S:
There is not enough evidence to say that men with a
high school diploma make more money per hour than
women.
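The whole test can be checked from the raw data with a small Python sketch. It assumes that, in each row of the salary table, the first two values are women's rates and the last two are men's; that reading reproduces the sample means 11.075 and 11.76153 used in the T step. The variable names are illustrative.

```python
import math
from statistics import NormalDist, mean

# Hourly rates, read from the table under the column assumption stated above.
women = [8, 10.6, 8.25, 10.8, 9, 11, 9.25, 11.5, 9.35, 11.9, 9.8, 12.25,
         9.95, 12.5, 10, 12.5, 10, 12.95, 10, 13.9, 10.25, 13.95,
         10.5, 14.45, 10.5, 14.8]
men = [7.5, 11.9, 8.5, 11.95, 8.5, 12, 9.85, 12, 10.5, 12, 10.5, 12.5,
       10.5, 13, 10.9, 13.7, 10.95, 13.75, 11, 14.5, 11, 14.75,
       11.65, 15, 11.9, 15.5]

sigma_men, sigma_women = 1.99, 2.01  # known sigmas given in the problem

mean_men, mean_women = mean(men), mean(women)
std_dev = math.sqrt(sigma_men ** 2 / len(men) + sigma_women ** 2 / len(women))
z = (mean_men - mean_women - 0) / std_dev

# One-sided alternative (men earn more), so take the upper tail.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 2))
```

The p-value here is computed from the unrounded z, so it differs slightly in the third decimal place from the table value found with z = 1.24.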
13.2 – Comparing Two Proportions
If we want to compare two populations or compare the responses
to two treatments from independent samples, we look at a two-sample proportion:
$$H_0: p_1 = p_2$$
or
$$H_0: p_1 - p_2 = 0$$
Conditions for Proportion Interval:
SRS
Normality
$$n_1\hat{p}_1 \ge 5 \qquad n_1(1 - \hat{p}_1) \ge 5$$
$$n_2\hat{p}_2 \ge 5 \qquad n_2(1 - \hat{p}_2) \ge 5$$
Independence
N ≥ 10(n1 + n2)
The two samples are independent
Confidence Interval:
estimate ± (critical value)(standard deviation of statistic)
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Conditions for Proportion Test:
$$\hat{p}_C = \frac{\text{count of successes in both samples}}{\text{count of individuals from both samples}} = \frac{x_1 + x_2}{n_1 + n_2}$$
SRS
Normality
$$n_1\hat{p}_C \ge 5 \qquad n_1(1 - \hat{p}_C) \ge 5$$
$$n_2\hat{p}_C \ge 5 \qquad n_2(1 - \hat{p}_C) \ge 5$$
Independence
N ≥ 10(n1 + n2)
The two samples are independent
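The combined proportion and the four normality checks for the test can be sketched in Python as follows. The function names and the counts (30 of 100 and 45 of 150) are hypothetical, used only to illustrate the arithmetic.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Combined sample proportion: all successes over all individuals."""
    return (x1 + x2) / (n1 + n2)

def normality_ok(n1, n2, p_c, minimum=5):
    """Check that every expected count under the pooled proportion is at least 5."""
    counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    return all(count >= minimum for count in counts)

# Hypothetical counts: 30 successes out of 100, and 45 out of 150.
p_c = pooled_proportion(30, 100, 45, 150)
print(p_c, normality_ok(100, 150, p_c))
```

With small samples the same check can fail; for instance, two samples of 10 with a pooled proportion of 0.3 give expected counts of only 3.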
Hypothesis Test:
test statistic = (estimate – hypothesized value) / (standard deviation of statistic)
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_C(1 - \hat{p}_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Calculator Tip!
Confidence Interval:
STAT-TESTS- 2-PropZInt
Hypothesis Test
STAT-TESTS- 2-PropZTest
Note: The only time you pool is when the standard
deviations are the same. This almost never happens, so just
don’t do it!
Example #1
An election is bitterly contested between two rivals. In a poll of
750 potential voters taken 4 weeks before the election, 420
indicated a preference for candidate Grumpy over candidate
Dopey. Two weeks later, a new poll of 900 randomly selected
potential voters found 465 who plan to vote for Grumpy.
Dopey immediately began advertising that support for Grumpy
was slipping drastically and that he was going to win the
election. Statistically speaking (at the 0.05 level), how happy
should Dopey be?
P:
p1 = true proportion of people who want Grumpy to win in
1st poll
p2 = true proportion of people who want Grumpy to win in
2nd poll
pD = p1 - p2 = Difference in proportion of people in 1st poll
and 2nd poll
H:
$$H_0: p_1 = p_2$$
or
$$H_0: p_1 - p_2 = 0$$
$$H_A: p_1 > p_2$$
or
$$H_A: p_1 - p_2 > 0$$
A:
SRS (Says in second one only. Must assume the first)
Normality
$$\hat{p}_C = \frac{\text{count of successes in both samples}}{\text{count of individuals from both samples}} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{885}{1650} = 0.536$$
$$n_1\hat{p}_C = (750)(0.536) \approx 402.27 \ge 5$$
$$n_1(1 - \hat{p}_C) = (750)(1 - 0.536) \approx 347.73 \ge 5$$
$$n_2\hat{p}_C = (900)(0.536) \approx 482.73 \ge 5$$
$$n_2(1 - \hat{p}_C) = (900)(1 - 0.536) \approx 417.27 \ge 5$$
Independence
Safe to assume there were more than 10(750+900),
or 16,500 voters
The first poll might have influenced the second poll,
proceed with caution!
N: 2-PropZTest
T:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_C(1 - \hat{p}_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.56 - 0.5167}{\sqrt{0.536(1 - 0.536)\left(\frac{1}{750} + \frac{1}{900}\right)}} = \frac{0.0433}{0.02466} = 1.7576$$
O:
P(Z > 1.75) = 1 – P(Z < 1.75) = 1 – 0.9599 = 0.0401
Or, by calculator:
P(Z > 1.7576) = 0.03941
M:
p < α
0.03941 < 0.05
Reject the Null
S:
There is enough evidence to say that the proportion of
voters that support Grumpy has dropped from the 1st
poll to the second.
Dopey should be very happy!
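The test statistic and p-value for this example can be reproduced with a short Python sketch. The function name `two_prop_z_test` is an assumption for illustration; the one-sided p-value matches the alternative that support in the first poll was higher than in the second.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and upper-tail p-value (HA: p1 > p2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # combined proportion under H0
    std_dev = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_dev
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# First poll: 420 of 750 for Grumpy. Second poll: 465 of 900.
z, p_value = two_prop_z_test(420, 750, 465, 900)
print(round(z, 4), round(p_value, 4))  # about 1.7576 and 0.0394
```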
Example #2
Two groups of 40 randomly selected students were chosen to be
part of a study on drop-out rates. One group was enrolled in a
counseling program designed to give them skills needed to
succeed in school and the other group received no special
counseling. Fifteen of the students who received counseling
dropped out of school, and 23 of the students who did not
receive counseling dropped out. Construct a 90% confidence
interval for the true difference between the drop-out rates of the
two groups.
P: pC = true proportion of students who drop out with
counseling
pN = true proportion of students who drop out without any
counseling
pD = pC - pN = Difference in proportion of students who
drop out with counseling vs. without
A: SRS
(says in both groups)
Normality
$$n_1\hat{p}_1 = (40)(0.375) = 15 \ge 5 \qquad n_2\hat{p}_2 = (40)(0.575) = 23 \ge 5$$
$$n_1(1 - \hat{p}_1) = (40)(1 - 0.375) = 25 \ge 5 \qquad n_2(1 - \hat{p}_2) = (40)(1 - 0.575) = 17 \ge 5$$
Independence
Safe to assume there were more than 10(40+40), or
800 students
The drop-out rate of the group with counseling might
influence the group without counseling. Proceed with
caution!
N: 2-PropZInt
I:
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.375 - 0.575) \pm 1.645 \sqrt{\frac{0.375(1 - 0.375)}{40} + \frac{0.575(1 - 0.575)}{40}}$$
$$= -0.2 \pm 1.645(0.1094) = (-0.3799, -0.0201)$$
C:
I am 90% confident the true difference in the
proportion of dropouts with counseling vs. without
counseling is between –0.3799 and –0.0201.
It appears that drop out rates are lower with the
group that got counseling than without it.
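As a check on the arithmetic, the 90% interval can be sketched in Python. The function name `two_prop_z_interval` is an assumption for illustration; the critical value comes from `statistics.NormalDist` rather than a table, so it is 1.6449 instead of the rounded 1.645, which does not change the interval to four decimal places.

```python
import math
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, confidence):
    """Unpooled confidence interval for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_dev = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    estimate = p1_hat - p2_hat
    return estimate - z_star * std_dev, estimate + z_star * std_dev

# Counseling group: 15 of 40 dropped out. No counseling: 23 of 40.
low, high = two_prop_z_interval(15, 40, 23, 40, 0.90)
print(round(low, 3), round(high, 3))  # about -0.380 and -0.020
```

Unlike the hypothesis test, the interval does not pool the two samples, because it does not assume the two proportions are equal.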