381
Testing for Independence
QSCI 381 – Lecture 41
(Larson and Farber, Sect 10.2)
Independence
Two variables are independent if the
occurrence of one variable does not
affect the probability of the other.
We often wish to examine whether two
variables are independent:
Age and having a “high” heavy metal
concentration.
Concerns regarding the most important factors
influencing a fishery and occupation.
Contingency Tables
An r × c contingency table shows the
observed frequencies for two variables.
The observed frequencies are arranged
in r rows and c columns. The
intersection of a row and a column is
called a cell.
Example-A-1
                      Age-class
High heavy metals?    1-10    11-20    21-30    31-40    41+
Yes                   12      16       22       21       16
No                    219     180      232      190      75
We wish to examine whether having a high
concentration of heavy metals is independent of age.
Expected Frequencies
The expected frequency for a cell, E_{r,c}, in a
contingency table is:
\[ E_{r,c} = \frac{(\text{sum of row } r) \times (\text{sum of column } c)}{\text{sample size}} \]
                      Age-class
High heavy metals?    1-10      11-20     21-30     31-40     41+      Total
Yes                   20.44     17.35     22.48     18.67     8.05     87
No                    210.56    178.65    231.52    192.33    82.95    896
Total                 231       196       254       211       91       983
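The expected frequencies for the heavy-metal example can be reproduced with a short Python sketch. The function name `expected_frequencies` and the list-of-rows layout are illustrative choices, not part of the lecture.

```python
def expected_frequencies(observed):
    """Return expected cell counts under independence.

    Each expected count is (row total * column total) / sample size.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Example A (rows: Yes, No; columns: age classes).
observed = [
    [12, 16, 22, 21, 16],
    [219, 180, 232, 190, 75],
]

expected = expected_frequencies(observed)
for label, row in zip(["Yes", "No"], expected):
    print(label, [round(e, 2) for e in row])
```

The expected counts sum to the sample size (983), which is a quick sanity check on the calculation.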
The Chi-square Test for Independence-I
A chi-square independence test is used
to test the independence of two variables.
The conditions for use of this test are:
the observed frequencies must be obtained from a
random sample; and
each expected frequency must be greater than or
equal to 5.
The null hypothesis for the test is that the
variables are independent and the alternative
hypothesis is that they are dependent.
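The expected-frequency condition can be checked before running the test. The sketch below is one possible way to do so; the helper name `meets_expected_count_condition` and its `minimum` parameter are hypothetical. The random-sample condition depends on how the data were collected and cannot be checked from the table itself.

```python
def meets_expected_count_condition(observed, minimum=5):
    """Return True if every expected cell count is at least `minimum`."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return all(
        r * c / n >= minimum
        for r in row_totals
        for c in col_totals
    )


# Observed counts from Example A (rows: Yes, No; columns: age classes).
observed = [
    [12, 16, 22, 21, 16],
    [219, 180, 232, 190, 75],
]

print(meets_expected_count_condition(observed))
```

For Example A the smallest expected count is about 8.05 (Yes, age class 41+), so the condition holds.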
The Chi-square Test for Independence-II
The way this test works is to compare the
observed frequencies with the expected
frequencies (these expected frequencies are
calculated assuming that the two variables
are independent).
If the value of the test statistic is high then
we reject the null hypothesis of
independence.
The Chi-square Test for Independence-III
The test statistic for the chi-square
independence test is:
\[ \chi^2 = \sum_i \sum_j \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} \]
where O_{i,j} represents the observed frequencies
and E_{i,j} represents the expected frequencies.
The sampling distribution for the test statistic is
a chi-square distribution with degrees of
freedom (r-1)(c-1).
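The test statistic and its degrees of freedom follow directly from the observed table. The following is a minimal sketch applied to the Example A counts; the function name `chi_square_independence` is an illustrative choice.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            statistic += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Observed counts from Example A (rows: Yes, No; columns: age classes).
observed = [
    [12, 16, 22, 21, 16],
    [219, 180, 232, 190, 75],
]

statistic, df = chi_square_independence(observed)
print(f"chi-square = {statistic:.3f}, df = {df}")
```

With 2 rows and 5 columns the degrees of freedom are (2-1)(5-1) = 4.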
Example-A-2
Cell contributions, (O - E)^2 / E:

                      Age-class
High heavy metals?    1-10     11-20    21-30    31-40    41+
Yes                   3.488    0.105    0.010    0.290    7.840
No                    0.339    0.010    0.001    0.028    0.761
\[ \chi^2 = \sum_i \sum_j \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} = 12.871 \]
The value of the test statistic is in the rejection
region for α = 0.05 but not for α = 0.01.
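The decision can be illustrated by comparing the statistic with critical values for 4 degrees of freedom. The critical values below are taken from a standard chi-square table and are an assumption of this sketch; they are not given in the lecture.

```python
# Critical values of the chi-square distribution with 4 degrees of freedom,
# from a standard chi-square table (assumed, not stated in the lecture).
CRITICAL_VALUES_DF4 = {0.05: 9.488, 0.01: 13.277}

statistic = 12.871  # test statistic reported for Example A

for alpha, critical in CRITICAL_VALUES_DF4.items():
    decision = "reject H0" if statistic > critical else "fail to reject H0"
    print(f"alpha = {alpha}: critical value = {critical}, {decision}")
```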
Using EXCEL to conduct Chi-square Tests.
EXCEL includes a function CHITEST which can
be used to test for independence.
CHITEST(observed range, expected range)
CHITEST returns the probability associated
with the test statistic, i.e. it returns
CHIDIST(χ², (r-1)(c-1)).
The result of applying CHITEST to the data
for the example is 0.011922, i.e. a probability
less than 0.05 and greater than 0.01.
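For readers without EXCEL, the same upper-tail probability can be approximated in Python. The sketch below uses a closed-form expression that is valid only for an even number of degrees of freedom, which covers Example A (4 degrees of freedom). The function name is hypothetical, and this is not how CHITEST is implemented internally.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    Uses the identity P(X >= x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to even, positive df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Test statistic and degrees of freedom reported for Example A.
p_value = chi_square_upper_tail_even_df(12.871, 4)
print(f"p-value = {p_value:.4f}")
```

The result should be close to the value reported by CHITEST; small differences in the last digits arise because the statistic used here is rounded to three decimals.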
Example-B-1
We sample 150 animals and count the number in
each of four categories:

Category           Count
Mature female      30
Mature male        40
Immature female    32
Immature male      48

Test the null hypothesis that sex and maturity
state are independent (α = 0.01).
Example-B-2
Observed frequencies (expected frequencies in parentheses):

          Mature        Immature
Female    30 (28.93)    32 (33.07)
Male      40 (41.07)    48 (46.93)

χ² = 0.1256
We cannot reject the null hypothesis of independence. We
did reject the null hypothesis that these data are consistent
with a “healthy” marine mammal population.
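Example B can be worked end to end in a few lines. This sketch assumes the 2 x 2 layout with rows Female and Male and columns Mature and Immature; for 1 degree of freedom the upper-tail probability of the chi-square distribution equals erfc(sqrt(χ²/2)), which the standard `math` module provides.

```python
import math

# Observed counts from Example B (rows: Female, Male; columns: Mature, Immature).
observed = [
    [30, 32],
    [40, 48],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with E computed under independence.
statistic = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

df = (2 - 1) * (2 - 1)
p_value = math.erfc(math.sqrt(statistic / 2.0))  # valid for df = 1 only

alpha = 0.01
print(f"chi-square = {statistic:.4f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

The p-value is far above 0.01, which agrees with the conclusion that independence cannot be rejected.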
Homogeneity of Proportions
The chi-square test can be used to test
the null hypothesis that proportions in
various categories are equal among
several populations.
The alternative hypothesis for this test
is that at least one proportion differs
among populations.