Transcript Document

381
Hypothesis Testing
(Testing with Two Samples-III)
QSCI 381 – Lecture 32
(Larson and Farber, Sects 8.3 – 8.4)
Independent and Dependent Samples
Two samples are independent if the sample selected from one population is not related to the sample selected from the second population. The two samples are dependent if each member of one sample corresponds to a member of the other sample. Dependent samples are also called paired or matched samples.
Examples
Which of the following are independent samples, and which are dependent?

25 fish in each of two ponds are weighed.
Weights of 25 fish in a pond on two successive days.
Weights and lengths of 30 fish.
Heights of 25 males and 25 females.
t-test for the Difference Between Means-I
(Conditions)
A t-test can be used to test the difference of two population means when a sample is randomly selected from each population. For dependent samples, the following conditions are required:

The samples must be randomly selected.
The samples must be dependent (paired).
Both populations must be normally distributed.
t-test for the Difference Between Means-II
The approaches in lectures 30 and 31 only apply to independent samples.
Dependent data are analyzed by considering the difference for each pair:
$d_i = x_{1,i} - x_{2,i}$
The test statistic is the mean difference:
$\bar{d} = \frac{1}{n} \sum_i d_i$
t-test for the Difference Between Means-III

The test statistic is:
$\bar{d} = \frac{1}{n} \sum_i d_i$
and the standardized test statistic is:
$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
where
$s_d = \sqrt{\frac{n (\sum d_i^2) - (\sum d_i)^2}{n(n-1)}}$
$\mu_d$ is the hypothesized mean of the differences of the paired data in the population.
d.f. = n - 1.
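The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `paired_t_statistic` and the five before/after values are hypothetical, chosen only to show the arithmetic.

```python
import math

def paired_t_statistic(x1, x2, mu_d=0.0):
    """Return (d_bar, s_d, t, df) for paired samples x1 and x2.

    mu_d is the hypothesized mean of the paired differences.
    """
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    n = len(x1)
    if n < 2:
        raise ValueError("need at least two pairs")
    # d_i = x_{1,i} - x_{2,i}
    d = [a - b for a, b in zip(x1, x2)]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    d_bar = sum_d / n
    # s_d = sqrt((n * sum(d_i^2) - (sum d_i)^2) / (n (n - 1)))
    s_d = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
    # Standardized test statistic; undefined if all differences are equal (s_d = 0).
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

# Hypothetical illustrative data (not from the lecture).
after = [12.0, 15.0, 11.0, 18.0, 14.0]
before = [10.0, 11.0, 10.0, 13.0, 12.0]
d_bar, s_d, t, df = paired_t_statistic(after, before)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.3f}, d.f. = {df}")
```

The resulting t is compared with the critical value from a t table with n - 1 degrees of freedom.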
Example-I

You are evaluating a program that aims to recover
degraded streams. The data available are “environmental
scores” before and after the recovery program. Prior to the
start of the recovery program, the contractors claimed that
the “environmental score” would increase by an average of
more than 5 points. Evaluate the claim at the 5% level of
significance.
Example-II
1. H0: $\mu_d \le 5$; Ha: $\mu_d > 5$.
2. The level of significance is 0.05, the d.f. = 15 - 1 = 14, and we have a right-tailed test. The rejection region is therefore t > 1.76.
3. The standard deviation of the differences, $s_d$, is given by:
$s_d = \sqrt{\frac{n (\sum d_i^2) - (\sum d_i)^2}{n(n-1)}} = 3.622$
4. The standardized test statistic is:
$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{5.82 - 5}{3.622 / \sqrt{15}} = 0.877$
5. We fail to reject the null hypothesis.
Note that $\bar{d} > 5$, but we still fail to reject the null hypothesis. Why?
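The calculation in this example can be checked from the summary values quoted on the slide (mean difference 5.82, standard deviation 3.622, n = 15). The sketch below uses only those numbers; the variable names are illustrative, and the critical value 1.761 is the t-table value that the slide rounds to 1.76.

```python
import math

# Summary values quoted in the example.
n = 15
d_bar = 5.82   # mean of the paired differences
s_d = 3.622    # standard deviation of the paired differences
mu_d = 5.0     # hypothesized mean difference under H0
t_crit = 1.761 # right-tailed critical value, alpha = 0.05, d.f. = 14 (table value)

standard_error = s_d / math.sqrt(n)
t_stat = (d_bar - mu_d) / standard_error
reject_h0 = t_stat > t_crit

print(f"standard error = {standard_error:.3f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

One way to read the result: the sample mean exceeds 5 by only 0.82, which is less than one standard error (about 0.935), so the excess is well within what sampling variation alone could produce.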
Constructing a c-confidence Interval
To construct a confidence interval for $\mu_d$, use the following inequality:
$\bar{d} - t_c \frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_c \frac{s_d}{\sqrt{n}}$

Construct a 90% confidence interval for $\mu_d$ for Example I.
$5.82 - 1.76 \frac{3.622}{\sqrt{15}} < \mu_d < 5.82 + 1.76 \frac{3.622}{\sqrt{15}}$
$4.173 < \mu_d < 7.467$
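The interval can be reproduced with a short helper. This is a sketch under stated assumptions: `paired_mean_ci` is a hypothetical name, and the critical value is entered as 1.761 (the table value for 14 d.f.), which reproduces the endpoints shown on the slide; rounding it to 1.76 shifts the third decimal place.

```python
import math

def paired_mean_ci(d_bar, s_d, n, t_c):
    """Confidence interval for mu_d: d_bar -/+ t_c * s_d / sqrt(n)."""
    margin = t_c * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Summary values from the example; t_c for a 90% interval with 14 d.f.
low, high = paired_mean_ci(d_bar=5.82, s_d=3.622, n=15, t_c=1.761)
print(f"{low:.3f} < mu_d < {high:.3f}")  # 4.173 < mu_d < 7.467
```

The interval contains 5, which agrees with the earlier decision not to reject the null hypothesis.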
Two sample z-test for the difference
between proportions-I
We can test the difference between two population proportions $p_1$ and $p_2$ based on samples from each population. We can use the z-test if the following conditions are true:

The samples are randomly selected.
The samples are independent.
The sample sizes are large enough to use a normal sampling distribution assumption, i.e.:
$n_1 p_1 \ge 5$; $n_1 q_1 \ge 5$; $n_2 p_2 \ge 5$; $n_2 q_2 \ge 5$
Two sample z-test for the difference
between proportions-II
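The sampling distribution for $\hat{p}_1 - \hat{p}_2$, the difference between the sample proportions, is a normal distribution with mean difference:
$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
and standard error:
$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
The standard error can be approximated by:
$\sigma_{\hat{p}_1 - \hat{p}_2} \approx \sqrt{\bar{p} \bar{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
where $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$ and $\bar{q} = 1 - \bar{p}$.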
Two sample z-test for difference between
proportions-III
1. State H0 and Ha.
2. Identify $\alpha$ and find the critical value(s) and rejection region(s).
3. Find the weighted estimate of $p_1$ and $p_2$:
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
4. Calculate the standardized test statistic:
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p} \bar{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
5. Make a decision to reject or fail to reject the null hypothesis.
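Steps 3 and 4 can be sketched as a small function. This is an illustration only: the name `two_proportion_z` and the counts in the usage lines are hypothetical, and the large-sample check uses the weighted estimate in place of the unknown population proportions, which is an assumption of this sketch.

```python
import math

def two_proportion_z(x1, n1, x2, n2, null_diff=0.0):
    """Return (p_bar, z) for the two-sample z-test for proportions.

    null_diff is the value of p1 - p2 stated in H0 (usually 0).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Step 3: weighted estimate of p1 and p2.
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1.0 - p_bar
    # Large-sample condition, checked with p_bar standing in for p1 and p2.
    for n in (n1, n2):
        if n * p_bar < 5 or n * q_bar < 5:
            raise ValueError("sample too small for the normal approximation")
    # Step 4: standardized test statistic.
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - null_diff) / standard_error
    return p_bar, z

# Hypothetical counts, chosen so the arithmetic is easy to follow.
p_bar, z = two_proportion_z(x1=45, n1=150, x2=30, n2=150)
print(f"p_bar = {p_bar:.3f}, z = {z:.3f}")
```

Step 5 then compares z with the critical value(s) found in step 2.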
Example-I

One expectation of creating a marine reserve is that
the fraction of “large” fish should increase. 100 fish
are sampled from each of two areas (one a reserve
and another actively fished). Test whether the
fraction of “large” fish in the reserve and the fished
area differ at the 1% level of significance.
Reserve: n1 = 100, x1 = 17
Non-reserve: n2 = 100, x2 = 6
Example-II
H0: $p_1 = p_2$; Ha: $p_1 \ne p_2$.
$\alpha = 0.01$; rejection region |z| > 2.576.
The weighted proportion estimate is:
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{17 + 6}{200} = 0.115$
The standardized test statistic:
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p} \bar{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} = \frac{(0.17 - 0.06) - 0}{\sqrt{0.115 \times 0.885 \times (1/100 + 1/100)}} = 2.438$
We fail to reject the null hypothesis at the 1% level of significance.
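The numbers in this example can be verified directly from the counts. The sketch below also reports a two-sided p-value from the standard normal distribution; the p-value is an addition for illustration and is not on the slide.

```python
import math

# Counts from the example: "large" fish out of 100 sampled in each area.
x1, n1 = 17, 100   # reserve
x2, n2 = 6, 100    # non-reserve
z_crit = 2.576     # two-tailed critical value for alpha = 0.01

p_bar = (x1 + x2) / (n1 + n2)
standard_error = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / standard_error

# Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
p_value = math.erfc(abs(z) / math.sqrt(2))
reject_h0 = abs(z) > z_crit

print(f"p_bar = {p_bar:.3f}, z = {z:.3f}, reject H0: {reject_h0}")
print(f"two-sided p-value = {p_value:.4f}")
```

The statistic falls just short of the critical value, so the difference is not significant at the 1% level, although the p-value is below 0.05.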
Constructing a c-confidence Interval

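To construct a confidence interval for $p_1 - p_2$, use the following inequality:
$(\hat{p}_1 - \hat{p}_2) - z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Construct a 95% confidence interval for $p_1 - p_2$ for Example I.
$0.11 - 1.96 \sqrt{\frac{0.17 \times 0.83}{100} + \frac{0.06 \times 0.94}{100}} < p_1 - p_2 < 0.11 + 1.96 \sqrt{\frac{0.17 \times 0.83}{100} + \frac{0.06 \times 0.94}{100}}$
$0.023 < p_1 - p_2 < 0.197$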
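A short sketch of the interval calculation follows, using the counts from Example I. The function name `proportion_diff_ci` is hypothetical. Note that the interval uses the unpooled standard error (each sample's own proportion), unlike the test statistic, which uses the weighted estimate.

```python
import math

def proportion_diff_ci(x1, n1, x2, n2, z_c):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    margin = z_c * standard_error
    return diff - margin, diff + margin

# Example I counts with z_c = 1.96 for a 95% interval.
low, high = proportion_diff_ci(x1=17, n1=100, x2=6, n2=100, z_c=1.96)
print(f"{low:.3f} < p1 - p2 < {high:.3f}")  # 0.023 < p1 - p2 < 0.197
```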
Review
Are the samples independent?
  No: Use t-test for dependent samples.
  Yes: Are both samples large?
    Yes: Use z-test for large independent samples.
    No: Are both populations normal?
      No: Cannot use any of the tests.
      Yes: Are both population standard deviations known?
        Yes: Use z-test.
        No: Use a t-test for small independent samples.
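The review flowchart can also be written as a chain of conditions. This is only a sketch of the decision order: the function name and argument names are hypothetical, and each answer must still be judged from the data and study design.

```python
def choose_two_sample_test(independent, both_large=False,
                           both_normal=False, sigmas_known=False):
    """Walk the review flowchart and return the suggested test."""
    if not independent:
        return "t-test for dependent samples"
    if both_large:
        return "z-test for large independent samples"
    if not both_normal:
        return "cannot use any of the tests"
    if sigmas_known:
        return "z-test"
    return "t-test for small independent samples"

# Paired before/after scores (dependent samples).
print(choose_two_sample_test(independent=False))
# Two small independent samples from normal populations, sigmas unknown.
print(choose_two_sample_test(independent=True, both_normal=True))
```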