2-D Chi-Square

1
Two-dimensional Chi-square
• Sometimes, we want to classify cases on two
dimensions at the same time – for example, we
might want to classify newly qualified physicians
on the basis of their choice of type of practice
and their sex.
• If we did this, we could ask whether there is any
relationship between the two – that is, are
women and men equally likely to choose each
type?
2
Two-dimensional Chi-square
• If we classify a set of cases on two dimensions,
and the two dimensions are independent of each
other, then the proportions of events in the
categories on one dimension should be the same
in all the categories on the other dimension:
• Thus, if choice of type of medical practice is
independent of sex, then the proportions of men
choosing various types of practice should be the
same as the proportions of women…
3
Two-dimensional Chi-square
Sex \ Specialty     Rural GP     City GP  Specialist           Σ
Male                       5          20          15          40
Female                    20          80          60         160
In this data set, there are four times as many
women in the sample as men. There are also
four times as many women in each specialty –
thus, choice of specialty appears to be
independent of sex.
4
Two-dimensional Chi-square
Sex \ Specialty     Rural GP     City GP  Specialist           Σ
Male                      16         100          44         160
Female                     4          25          11          40
In this data set, there are four times as many men
as women – but again the proportions are constant
across specialties. Again, choice of specialty
appears to be independent of sex.
5
Two-dimensional Chi-square
Sex \ Specialty     Rural GP     City GP  Specialist           Σ
Male                      20          45          35         100
Female                     5          65          30         100
In this data set, there are equal numbers of women
and men. But the proportions vary across
specialties – thus, choice of specialty appears to be
dependent on sex.
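The proportion comparison on the last three slides can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it assumes each table is stored as a list of rows (Male, then Female) with columns Rural GP, City GP and Specialist, and the names `TABLES`, `row_proportions` and `same_proportions` are hypothetical. Exact fractions are used so that 5/40 and 20/160 compare as equal without floating-point noise.

```python
from fractions import Fraction

# Observed counts from the three physician tables above (hypothetical layout).
# Rows: Male, Female. Columns: Rural GP, City GP, Specialist.
TABLES = {
    "slide 3": [[5, 20, 15], [20, 80, 60]],
    "slide 4": [[16, 100, 44], [4, 25, 11]],
    "slide 5": [[20, 45, 35], [5, 65, 30]],
}


def row_proportions(table):
    """Return each row's counts as exact fractions of that row's total."""
    return [[Fraction(count, sum(row)) for count in row] for row in table]


def same_proportions(table):
    """True if every row splits across the columns in the same proportions."""
    props = row_proportions(table)
    return all(p == props[0] for p in props[1:])


for name, table in TABLES.items():
    print(name, same_proportions(table))
```

Run as written, this prints True for the first two tables and False for the third. Exact equality is only a teaching device: real samples almost never split in exactly the same proportions, which is why the chi-square test asks whether the departure from equal proportions is larger than chance alone would produce.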
6
Two-dimensional Chi-square
• The null hypothesis in the two-dimensional chi-square test is that the two dimensions are not
related (that is, they are independent). To test
this hypothesis, we need to compute expected
values for each of the cells defined by the two
dimensions.
• If there were 25 rural GPs in our sample, and type of practice were independent of sex, then, because the sample contains equal numbers of men and women, half of the rural GPs (12.5) should be men and half women.
7
Two-dimensional Chi-square
• Our expected values are based on two proportions: the proportion of the sample in each sex category and the proportion in each practice category:
Sex \ Specialty     Rural GP     City GP  Specialist           Σ
Male                    12.5          55        32.5         100
Female                  12.5          55        32.5         100
Σ                         25         110          65         200
8
Two-dimensional Chi-square
• We’ll step through the calculations:
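Each cell uses its row total (100 men, 100 women), its column total (25 rural GPs, 110 city GPs, 65 specialists) and the grand total of 200:
Male, Rural GP: (100 × 25) / 200 = 12.5
Male, City GP: (100 × 110) / 200 = 55
Male, Specialist: (100 × 65) / 200 = 32.5
Female, Rural GP: (100 × 25) / 200 = 12.5
Female, City GP: (100 × 110) / 200 = 55
Female, Specialist: (100 × 65) / 200 = 32.5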
10
Two-dimensional Chi-square
Thus, expected values are computed as
$$\text{Expected value} = \frac{\text{row total} \times \text{column total}}{\text{sum of observations}}$$
If you can do that, you can do the 2-dimensional
chi-square.
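The rule can be written as a small function. A minimal sketch, assuming the observed counts are a list of rows; `expected_counts` is a hypothetical name, and the physician counts are the ones from the third data set (rows Male, Female; columns Rural GP, City GP, Specialist):

```python
def expected_counts(observed):
    """Expected count for every cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Male, Female; columns: Rural GP, City GP, Specialist.
physicians = [[20, 45, 35], [5, 65, 30]]
for row in expected_counts(physicians):
    print(row)  # [12.5, 55.0, 32.5] for both rows
```

Because both sexes have a row total of 100, the two rows of expected values are identical, matching the expected-value table for the physicians.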
11
Two-dimensional Chi-square
For the physicians example, we compute:
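$$\chi^2 = \frac{[20-12.5]^2}{12.5} + \frac{[5-12.5]^2}{12.5} + \frac{[45-55]^2}{55} + \frac{[65-55]^2}{55} + \frac{[35-32.5]^2}{32.5} + \frac{[30-32.5]^2}{32.5}$$
$$= 4.5 + 4.5 + 1.8182 + 1.8182 + 0.1923 + 0.1923 \approx 13.021$$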
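The same sum can be computed directly from the observed table, forming each expected value as row total × column total / n inside the loop. A sketch under the same assumptions (list-of-rows layout, hypothetical function name `chi_square_statistic`):

```python
def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)**2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two classifications are independent.
            expected = row_totals[i] * col_totals[j] / n
            total += (count - expected) ** 2 / expected
    return total


# Rows: Male, Female; columns: Rural GP, City GP, Specialist.
physicians = [[20, 45, 35], [5, 65, 30]]
print(round(chi_square_statistic(physicians), 3))  # 13.021
```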
12
Two-dimensional Chi-square
For the 2-D chi-square, degrees of freedom are:
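$$\text{d.f.} = (r-1)(c-1)$$
where r = number of rows and c = number of columns.
Here, r = 2 and c = 3, so d.f. = 1 × 2 = 2.
Thus, $\chi^2_{crit} = \chi^2_{(.05,\,2)} = 5.99147$. Because our obtained value (about 13.02) is larger than this, our decision is to reject the null hypothesis (that the two dimensions are independent).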
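The critical value 5.99147 comes from a chi-square table. As an illustration only, the upper-tail probability can also be computed with the Python standard library. This sketch is swapped in for the table lookup and is not from the slides: it assumes integer degrees of freedom (1 or more) and uses the recurrence for the regularized upper incomplete gamma function, Q(a + 1, y) = Q(a, y) + y^a e^(−y) / Γ(a + 1), starting from Q(1, y) = e^(−y) or Q(1/2, y) = erfc(√y). `chi2_sf` and `degrees_of_freedom` are hypothetical names; in practice `scipy.stats.chi2.sf(x, df)` and `scipy.stats.chi2.ppf(0.95, df)` give the tail probability and the critical value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Computes the regularized upper incomplete gamma function Q(df/2, x/2)
    by the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)  # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1) for an r x c classification."""
    return (n_rows - 1) * (n_cols - 1)


df = degrees_of_freedom(2, 3)
print(df)                              # 2
print(round(chi2_sf(5.99147, df), 4))  # 0.05: the tabled critical value
p_value = chi2_sf(13.021, df)
print(p_value < 0.05)                  # True, so reject H0
```

For 2 degrees of freedom the tail probability at 5.99147 is 0.05, which is what makes it the α = .05 critical value; the obtained value of about 13.02 has a tail probability near 0.0015, far below .05, which leads to the same decision to reject.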
13
Formula for computing expected values
More generally, the rule for working out expected values in two-dimensional classifications is:
$$\hat{E}(n_{ij}) = \frac{r_i \, c_j}{n}$$
where $r_i$ = total of row i, $c_j$ = total of column j, and n = total number of observations (cases in the sample).
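The rule follows from the definition of independence: if the two classifications are independent, the probability of falling in row i and column j is the product of the two marginal proportions. A short derivation, estimating each probability by the corresponding sample proportion:

$$
\begin{aligned}
\hat{P}(\text{row } i) &= \frac{r_i}{n}, \qquad \hat{P}(\text{column } j) = \frac{c_j}{n} \\
\hat{P}(\text{row } i \text{ and column } j) &= \hat{P}(\text{row } i)\,\hat{P}(\text{column } j) = \frac{r_i\,c_j}{n^2} \\
\hat{E}(n_{ij}) &= n \cdot \frac{r_i\,c_j}{n^2} = \frac{r_i\,c_j}{n}
\end{aligned}
$$

For the physicians, the Male, Rural GP cell gives 200 × (100/200) × (25/200) = 12.5.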
14
2-D Chi-square – Example 1
1. At a recent meeting of the Coin Flippers Society,
each member flipped three coins simultaneously
and the number of tails occurring was recorded.
b) Subsequently, the number of tails each member flipped was determined separately for coins of different values. The data are shown on the next slide as the number of members throwing each number of tails with each coin value.
15
Chi-square – Example 1b
                Coin value
Number of tails      .05     .10     .25
0                     20      24      21
1                     55      70      57
2                     72      70      52
3                     15      24      20

Is there evidence that the number of tails is affected by coin value? (α = .05)
16
Chi-square – Example 1b
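$H_0$: The two classifications are independent
$H_A$: The two classifications are dependent
Test statistic:
$$\chi^2 = \sum_i \sum_j \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2_{(.05,\,6)} = 12.5916$, with d.f. = (4-1)(3-1) = 6 (4 numbers of tails, 3 coin values).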
17
Chi-square – Example 1b
The first step is to compute the expected values
for each cell, using the formula:
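$$\hat{E}(n_{ij}) = \frac{r_i \, c_j}{n}$$
For the top left cell (0 tails, .05 coin), the row total is 65 and the column total is 162, so we get:
$$\hat{E}(n_{11}) = \frac{(65)(162)}{500} = 21.06$$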
18
Chi-square – Example 1b
Using the formula for all the other cells gives:
Number of tails      .05     .10     .25
0                  21.06   24.44   19.50
1                  58.97   68.43   54.60
2                  62.86   72.94   58.20
3                  19.12   22.18   17.70

We are now ready to compute $\chi^2_{obt}$.
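The expected-value table can be checked with a short script. A sketch, assuming the observed coin data are entered row by row (0 to 3 tails) with columns for the .05, .10 and .25 coins; `expected_counts` is a hypothetical helper that applies Ê(n_ij) = r_i c_j / n to every cell:

```python
def expected_counts(observed):
    """Expected count for every cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Coin data: rows are 0, 1, 2, 3 tails; columns are the .05, .10, .25 coins.
coins = [
    [20, 24, 21],
    [55, 70, 57],
    [72, 70, 52],
    [15, 24, 20],
]
for tails, row in enumerate(expected_counts(coins)):
    print(tails, [round(e, 2) for e in row])
```

For example, the 1-tail, .05-coin cell is 182 × 162 / 500 = 58.968, which rounds to 58.97; the row totals are 65, 182, 194 and 59 and the column totals are 162, 188 and 150.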
19
Chi-square – Example 1b
$$\chi^2_{obt} = \frac{[20-21.06]^2}{21.06} + \dots + \frac{[20-17.7]^2}{17.7} \approx 4.03$$
Decision: because 4.03 < 12.5916, do not reject $H_0$; there is no evidence that the number of tails is affected by coin value.
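The whole test for Example 1b fits in a few lines. This is a sketch, assuming the same row-by-row layout of the coin data and hard-coding the tabled critical value rather than computing it; `chi_square_test` and `CRITICAL_05_DF6` are hypothetical names.

```python
def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # expected count under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Coin data: rows are 0, 1, 2, 3 tails; columns are the .05, .10, .25 coins.
coins = [[20, 24, 21], [55, 70, 57], [72, 70, 52], [15, 24, 20]]
CRITICAL_05_DF6 = 12.5916  # tabled chi-square(.05, 6)

stat, df = chi_square_test(coins)
reject = stat > CRITICAL_05_DF6
print(round(stat, 3), df, reject)  # 4.028 6 False
```

Computed from unrounded expected values, the statistic comes out at about 4.028; hand calculations that use expected values rounded to two decimals can differ slightly in the third decimal place. Either way it is far below 12.5916, so the decision is the same: do not reject H0.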
20
Chi-square – Example 2b
There is an “old wives’ tale” that babies don’t tend to
be born randomly during the day but tend more to be
born in the middle of the night, specifically between
the hours of 1 AM and 5 AM. To investigate this, a
researcher collects birth-time data from a large
maternity hospital. The day was broken into 4 parts:
Morning (5 AM to 1 PM), Mid-day (1 PM to 5 PM),
Evening (5 PM to 1 AM), and Mid-night (1 AM to 5 AM).
21
Chi-square – Example 2b
The numbers of births at these times for the last
three months (January to March) are shown
below:
    Morning    Mid-day    Evening  Mid-night
        110         50        100        100
22
Chi-square – Example 2b
• A question can certainly be raised as to whether
the pattern reported above is peculiar to births in
the winter months or reflects births at other
times of the year as well.
• The data obtained from the same hospital during
the hottest summer months last year are shown
on the next slide, along with the original data.
23
Chi-square – Example 2b
                    Morning    Mid-day    Evening  Mid-night          Σ
Cold (Jan-Mar)          110         50        100        100        360
Hot (summer)             90         40         80         70        280
Σ                       200         90        180        170        640
Are the two patterns different? (α = .05)
24
Chi-square – Example 2b
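$H_0$: The two classifications are independent
$H_A$: The two classifications are dependent
Test statistic:
$$\chi^2 = \sum_i \sum_j \frac{[n_{ij} - \hat{E}(n_{ij})]^2}{\hat{E}(n_{ij})}$$
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2_{(.05,\,3)} = 7.81$, with d.f. = (2-1)(4-1) = 3 (2 seasons, 4 times of day).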
25
Chi-square – Example 2b
The first step is to compute the expected values for
each cell, using the formula:
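$$\hat{E}(n_{ij}) = \frac{r_i \, c_j}{n}$$
For the top left cell (Cold, Morning), the row total is 360 and the column total is 200, so we get:
$$\hat{E}(n_{11}) = \frac{(200)(360)}{640} = 112.5$$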
26
Chi-square – Example 2b
Using the formula for the other cells we get:
          Morning    Mid-day    Evening  Mid-night
Cold        112.5     50.625     101.25     95.625
Hot          87.5     39.375      78.75     74.375
27
Chi-square – Example 2b
$$\chi^2_{obt} = \frac{[110-112.5]^2}{112.5} + \dots + \frac{[70-74.375]^2}{74.375} = 0.6374$$
Decision: because 0.6374 < 7.81, do not reject $H_0$; there is no evidence that the pattern of births in the hot summer months differs from the pattern in January to March.
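As a final check, the whole test for Example 2b can be run in a few lines. A sketch, assuming the Cold and Hot rows are entered as lists in the column order Morning, Mid-day, Evening, Mid-night; the critical value 7.81 is hard-coded from the table rather than computed, and `chi_square_statistic` and `CRITICAL_05_DF3` are hypothetical names:

```python
def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)**2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (count - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for count, c in zip(row, col_totals)
    )


# Columns: Morning, Mid-day, Evening, Mid-night.
births = [
    [110, 50, 100, 100],  # Cold: January to March
    [90, 40, 80, 70],     # Hot: summer months
]
CRITICAL_05_DF3 = 7.81  # tabled chi-square(.05, 3)

stat = chi_square_statistic(births)
df = (len(births) - 1) * (len(births[0]) - 1)
print(round(stat, 4), df, stat > CRITICAL_05_DF3)  # 0.6374 3 False

# Transposing the table (times of day as rows) leaves the statistic unchanged.
transposed = [list(col) for col in zip(*births)]
print(abs(chi_square_statistic(transposed) - stat) < 1e-12)  # True
```

The statistic is far below the critical value, so H0 is not rejected. Swapping rows and columns gives the same statistic and the same degrees of freedom, so it does not matter which classification is written as the rows.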