3680 Lecture 17

Math 3680
Lecture #17
Two-Sample Inference
Two-Sample Data:
Matched Pairs
Example. An industrial safety
program was recently instituted.
Ten similar plants recorded the
average weekly loss (averaged
over a month) in man-hours due
to accidents. The chart shows
the results both before and after
the safety program was
implemented.
Do the data provide statistically
significant evidence that the
safety program was effective?
Plant   Before   After
1       30.5     23
2       18.5     21
3       24.5     22
4       32       28.5
5       16       14.5
6       15       15.5
7       23.5     24.5
8       25.5     21
9       28       23.5
10      18       16.5
Note: We encountered this kind of problem before
with the sign test. However, the sign test did not take
into account the magnitudes of the differences
between the before and after data; instead, the sign test
only looked at which one was larger.
Using our improved techniques of hypothesis testing,
we are now able to give a more powerful test to
determine the effectiveness of the safety program.
Solution: Let X denote the before data, and Y the
after data. Let D = X - Y, the difference between the
two.
$H_0: \mu_D = 0$
$H_a: \mu_D > 0$
Significance level: $\alpha = 0.05$.
Plant   Before   After   Difference
1       30.5     23       7.5
2       18.5     21      -2.5
3       24.5     22       2.5
4       32       28.5     3.5
5       16       14.5     1.5
6       15       15.5    -0.5
7       23.5     24.5    -1
8       25.5     21       4.5
9       28       23.5     4.5
10      18       16.5     1.5

Avg Diff:        2.15
SD Diff:         3.00046293
Test Statistic:  2.26594933
P Value:         0.02484552
The problem now reduces to regular one-variable
hypothesis testing, which we know well by now.
Using the P-value method, we reject the null
hypothesis, because the P-value (about 0.025) is below
the significance level of 0.05. It appears that the
safety program was effective.
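The paired computation above can be checked with a short script. This is only a sketch using the Python standard library; the helper names `t_pdf` and `t_upper_tail` are made up for illustration, and the tail probability is approximated by Simpson's rule rather than taken from a statistics package.

```python
import math
import statistics

before = [30.5, 18.5, 24.5, 32, 16, 15, 23.5, 25.5, 28, 18]
after = [23, 21, 22, 28.5, 14.5, 15.5, 24.5, 21, 23.5, 16.5]

# Paired differences D = X - Y (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)            # sample SD (divides by n - 1)
t_stat = mean_d / (sd_d / math.sqrt(n))   # one-sample t statistic on the differences
df = n - 1


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


p_value = t_upper_tail(t_stat, df)        # one-sided, since Ha is mu_D > 0
print(f"mean = {mean_d:.2f}, SD = {sd_d:.4f}, t = {t_stat:.4f}, P = {p_value:.4f}")
```

The one-sided tail is used because the alternative hypothesis is that the mean difference is positive.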
Example. Mechanical science engineers studied the impact of
infrasound (sound waves at a frequency below the audibility
range of the human ear) on a person’s blood pressure. Five
university students were exposed to infrasound for one hour.
See table.
a) Does it appear that the mean systolic blood pressure
changed as a result of the infrasound?
b) Find a 95% (two-sided) confidence interval for the mean
difference in blood pressure.
C. Y. H. Qibai and H. Shi, Journal
of Low Frequency Noise, Vibration
and Active Control, Vol. 23 (2004)
Student   Before   After
1         105      118
2         113      129
3         106      117
4         126      134
5         113      115
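The slides pose this example without a worked solution. One possible way to set it up, following the matched-pairs method above, is sketched below. Two things are assumptions rather than part of the original: the differences are taken as after minus before, and the critical value 2.776 is the standard t-table entry for a two-sided 95% level with 4 degrees of freedom.

```python
import math
import statistics

before = [105, 113, 106, 126, 113]
after = [118, 129, 117, 134, 115]

# Assumption: D = after - before, so a positive D means pressure went up.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_d / se_d                          # tests H0: mu_D = 0

# Assumption: two-sided 95% critical value for n - 1 = 4 df, read from a t-table.
T_CRIT = 2.776

reject_h0 = abs(t_stat) > T_CRIT                # part (a), two-sided test
ci = (mean_d - T_CRIT * se_d, mean_d + T_CRIT * se_d)   # part (b)

print(f"mean diff = {mean_d:.2f}, SE = {se_d:.4f}, t = {t_stat:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
print(f"95% CI for the mean difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the interval is built from the same standard error and critical value as the two-sided test, it excludes 0 exactly when the test rejects.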
Two-Sample Data:
Independent Samples with
Different Variances
Previously, we had problems in which the data was
obviously paired. However, it’s not uncommon to
compare two different data sets which are not paired.
Example. The Ohio EPA collected Index of Biotic
Integrity (IBI) measurements for sites located in two
Ohio river basins; high IBIs indicate healthier fish
populations. Does it appear that the IBI values are the
same for both locations?
River Basin   Sample Size   Mean    SD
Muskingum     53            0.035   1.046
Hocking       51            0.34    0.96
E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural,
Biological, and Environmental Sciences, Vol. 10 (2005)
Notice that this is a different problem than the one-sample
problems that we saw earlier. Before, a typical question would
be “Is the mean less than 0.4?”. Now, the question is, “Is there
a difference?”
For such problems, we can use all of the previous machinery
of confidence intervals and hypothesis testing. However, a
couple of things will be different:
• The computation of the standard error (and hence the test
statistic), and
• The computation of the number of degrees of freedom (when
using the Student’s t-distribution).
Let’s define $D = \bar{X} - \bar{Y}$. As discussed in the past,
$$\mu_D = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y,$$
$$\mathrm{Var}(D) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y},$$
$$SE_D = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}.$$
We will typically be testing if the means are equal, in
which case the null hypothesis will be $\mu_D = 0$.
Furthermore, we will use Welch’s formula for
computing the number of degrees of freedom:
$$df = \frac{\left( \dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y} \right)^2}{\dfrac{1}{n_X - 1} \left( \dfrac{s_X^2}{n_X} \right)^2 + \dfrac{1}{n_Y - 1} \left( \dfrac{s_Y^2}{n_Y} \right)^2},$$
rounded down to the nearest integer. (There isn’t
precise agreement on this, but we’ll defer this
discussion to a more advanced statistics class.)
Example. The Ohio EPA collected Index of Biotic
Integrity (IBI) measurements for sites located in two
Ohio river basins; high IBIs indicate healthier fish
populations. Does it appear that the IBI values are the
same for both locations?
River Basin   Sample Size   Mean    SD
Muskingum     53            0.035   1.046
Hocking       51            0.34    0.96
E. L. Boone, Y. Keying and E. P. Smith, Journal of Agricultural,
Biological, and Environmental Sciences, Vol. 10 (2005)
Solution.
$H_0: \mu_M = \mu_H$, or $\mu_D = 0$.
$H_a: \mu_M \neq \mu_H$, or $\mu_D \neq 0$.
Significance level: $\alpha = 0.05$.
Now the cumbersome part:
$$SE_D = \sqrt{\frac{s_M^2}{n_M} + \frac{s_H^2}{n_H}} = \sqrt{\frac{(1.046)^2}{53} + \frac{(0.96)^2}{51}} \approx 0.196759$$
$$df = \frac{\left( \dfrac{s_M^2}{n_M} + \dfrac{s_H^2}{n_H} \right)^2}{\dfrac{1}{n_M - 1} \left( \dfrac{s_M^2}{n_M} \right)^2 + \dfrac{1}{n_H - 1} \left( \dfrac{s_H^2}{n_H} \right)^2} = \frac{\left( \dfrac{(1.046)^2}{53} + \dfrac{(0.96)^2}{51} \right)^2}{\dfrac{1}{52} \left( \dfrac{(1.046)^2}{53} \right)^2 + \dfrac{1}{50} \left( \dfrac{(0.96)^2}{51} \right)^2} \approx 101.776,$$
so we use 101 degrees
of freedom.
Test statistic:
$$t = \frac{D - \mu_D}{SE_D} = \frac{(0.035 - 0.34) - 0}{0.196759} \approx -1.55012.$$
The critical values are $\pm 1.98373$.
[Figure: Student's t density with 101 degrees of freedom. The test statistic at ±1.55 lies between the critical values ±1.984; the two-tailed area beyond ±1.55 is 12.42%.]
We fail to reject the null hypothesis. There is not
enough evidence to think that the mean IBI values are
different at these two locations.
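The Welch computation for the IBI data can be reproduced from the summary statistics alone. The following is a sketch using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are illustrative, and the two-sided P-value is approximated by Simpson's rule rather than looked up in a table.

```python
import math

# Summary statistics from the table (Muskingum = M, Hocking = H).
n_m, mean_m, sd_m = 53, 0.035, 1.046
n_h, mean_h, sd_h = 51, 0.34, 0.96

v_m = sd_m ** 2 / n_m                     # s_M^2 / n_M
v_h = sd_h ** 2 / n_h                     # s_H^2 / n_H
se_d = math.sqrt(v_m + v_h)               # standard error of the difference

# Welch's formula, then rounded down as in the lecture.
welch_df = (v_m + v_h) ** 2 / (v_m ** 2 / (n_m - 1) + v_h ** 2 / (n_h - 1))
df = math.floor(welch_df)

t_stat = ((mean_m - mean_h) - 0) / se_d   # null hypothesis: mu_D = 0


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


p_value = 2 * t_upper_tail(abs(t_stat), df)   # two-sided alternative
print(f"SE = {se_d:.6f}, Welch df = {welch_df:.3f} -> {df}")
print(f"t = {t_stat:.5f}, two-sided P = {p_value:.4f}")
```

Since the P-value exceeds 0.05, the script reaches the same conclusion as the critical-value comparison.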
Example. While carpets are nice in hospitals, they
may not be sanitary. In a Montana hospital, bacteria
levels per cubic foot of air were tested in 8 carpeted
and 8 uncarpeted rooms.
a) Are the bacteria levels in the uncarpeted rooms lower than
in the carpeted rooms?
b) Find a 95% confidence interval for the difference in the
mean number of bacteria per cubic foot of air.
Carpeted     11.8   8.2   7.1   13.0   10.8   10.1   14.6   14.0
Uncarpeted   12.1   8.3   3.8    7.2   12.0   11.1   10.1   13.7
W. G. Walter and A. Stober, Journal of Environmental Health, Vol. 30,
p. 405 (1968)
Two-Sample Data:
Testing for Proportions with
Independent Samples
Example. A mobile computer network consists of
computers that maintain wireless communication with
one another as they move about a given area. Two
different protocols are compared. With protocol A,
170 of 200 (85%) sent messages were successfully
received. With protocol B, 123 of 150 (82%) sent
messages were successfully received. Can we
conclude that protocol A has the higher success rate?
T. Camp et al., Proceedings of the IEEE International Conference on
Communications pp. 3318-3324 (2002)
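This problem is a case where the X and Y
populations measure proportions which are assumed
to be equal under the null hypothesis. Writing $\pi$ for
this common proportion, we have
$$\mu_D = E(\bar{X}) - E(\bar{Y}) = \pi - \pi = 0$$
$$\mathrm{Var}(D) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\pi(1 - \pi)}{n_X} + \frac{\pi(1 - \pi)}{n_Y}$$
$$SE_D = \sqrt{\pi(1 - \pi)} \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}} \approx \sqrt{p(1 - p)} \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$$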
In the last formula, the estimate p is pooled, meaning
that we compute the total number of successes over
the total number of trials.
Also, just like our previous problems regarding
proportions, we use the normal distribution and not the
Student t-distribution. Our test statistic will therefore
be labeled z.
As a consequence, we do not have to compute the
degrees of freedom for these kinds of problems.
Example. (Repeated for convenience) A mobile
computer network consists of computers that maintain
wireless communication with one another as they
move about a given area. Two different protocols are
compared. With protocol A, 170 of 200 (85%) sent
messages were successfully received. With protocol B,
123 of 150 (82%) sent messages were successfully
received. Can we conclude that protocol A has the
higher success rate?
T. Camp et al., Proceedings of the IEEE International Conference on
Communications pp. 3318-3324 (2002)
Solution.
$H_0: \pi_X = \pi_Y$
$H_a: \pi_X \neq \pi_Y$
Significance level: $\alpha = 0.05$
Under the null hypothesis, the proportion $\pi$ of
received messages is the same under both protocols,
and thus the population variance $\pi(1 - \pi)$ is the same.
For p, we use the pooled proportion from both
samples:
$$p = \frac{170 + 123}{200 + 150} \approx 0.837$$
$$SE_D = \sqrt{p(1 - p)} \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}} = \sqrt{(0.837)(0.163)} \sqrt{\frac{1}{200} + \frac{1}{150}} \approx 0.0399$$
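Test statistic:
$$z = \frac{D - \mu_D}{SE_D} = \frac{(0.85 - 0.82) - 0}{0.0399} \approx 0.752$$
The critical values are $\pm 1.96$.
[Figure: standard normal density. The test statistic at ±0.752 lies between the critical values ±1.96; the two-tailed area beyond ±0.752 is 45.21%.]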
We fail to reject the null hypothesis. There is not
enough evidence to think that the proportions of
successful deliveries are different.
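The pooled two-proportion test above can be sketched in a few lines of standard-library Python. The variable names and the helper `normal_cdf` are illustrative; the normal CDF is computed from `math.erf`.

```python
import math

# Successes and trials for each protocol, from the example.
x_a, n_a = 170, 200
x_b, n_b = 123, 150

p_a = x_a / n_a
p_b = x_b / n_b

# Pooled estimate: total successes over total trials.
p_pool = (x_a + x_b) / (n_a + n_b)
se_d = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = ((p_a - p_b) - 0) / se_d              # null hypothesis: equal proportions


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


p_two_sided = 2 * (1 - normal_cdf(abs(z)))
print(f"pooled p = {p_pool:.4f}, SE = {se_d:.4f}, z = {z:.4f}, P = {p_two_sided:.4f}")
```

The script carries full precision through the calculation, so its z value can differ in the fourth decimal place from a hand computation that first rounds the standard error to 0.0399.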
Example. Researchers studied coliform bacteria
counts among particles found in wastewater samples.
Of 161 particles that were 75-80 μm in diameter, 19
contained coliform bacteria. Of 95 particles that were
90-95 μm in diameter, 22 contained coliform bacteria.
Can we conclude that the larger particles are more
likely to contain coliform bacteria?
R. Emerick et al., Water Environment Research pp. 432-438 (2000)