Applied Statistics Using SAS
and SPSS
Topic: Chi-square tests
By Prof Kelly Fan, Cal. State Univ., East Bay
Outline
• ALL variables must be categorical
• Goal one: verify a distribution of Y
  One-sample Chi-square test (SPSS lesson 40; SAS handout)
• Goal two: test the independence between two categorical variables
  Chi-square test for two-way contingency table (SPSS lesson 41; SAS section 3.G)
  McNemar’s test for paired data (SPSS lesson 44; SAS section 3.L)
• Measure the dependence (Phi and Kappa coefficients) (SPSS lesson 41, 44; SAS section 3.G, 3.M)
Example: Postpartum Depression Study
Are women equally likely to show an
increase, no change, or a decrease in
depression as a function of childbirth?
Are the proportions associated with a
decrease, no change, and an increase in
depression from before to after childbirth
the same?
Example: Postpartum Depression Study
Depression after birth in comparison with before birth   Observed frequencies   Hypothesized proportions   Expected frequencies
Less depressed (-1)                                      14                     1/3                        20
Neither less nor more depressed (0)                      33                     1/3                        20
More depressed (1)                                       13                     1/3                        20
From a random sample of 60 women
One-sample Chi-Square Test
Must be a random sample
The sample size must be large enough so
that expected frequencies are greater than
or equal to 5 for 80% or more of the
categories
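The sample-size rule above can be checked mechanically. What follows is a minimal Python sketch, not part of the SAS/SPSS material in these slides. The function name `expected_counts_ok` and its parameters are hypothetical, and it encodes only the rule as stated here: at least 80% of the categories must have an expected frequency of 5 or more.

```python
def expected_counts_ok(n, proportions, min_count=5, min_share=0.80):
    """Check the slide's rule: expected frequency >= min_count
    in at least min_share of the categories.

    n           -- sample size
    proportions -- hypothesized proportion for each category
    Returns (rule_satisfied, list_of_expected_frequencies).
    """
    expected = [n * p for p in proportions]
    share = sum(e >= min_count for e in expected) / len(expected)
    return share >= min_share, expected


# Postpartum depression example: 60 women, three equally likely categories
ok, expected = expected_counts_ok(60, [1 / 3, 1 / 3, 1 / 3])
print(ok, expected)
```

For the postpartum example each expected frequency is about 20, so the rule is satisfied.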
One-sample Chi-Square Test
• Test statistic:
\[ \chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i} \]
o_i = the observed frequency of the i-th category
e_i = the expected frequency of the i-th category
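As a worked instance of the test statistic, here is a short Python sketch applied to the postpartum depression counts (observed 14, 33, 13 against expected 20, 20, 20). It is an illustration only; the function name `chi_square_gof` is hypothetical. The p-value line uses a shortcut that holds only for 2 degrees of freedom, where the upper-tail probability of the chi-square distribution equals exp(-x/2).

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [14, 33, 13]   # less depressed, same, more depressed
expected = [20, 20, 20]   # 60 women * 1/3 for each category

stat = chi_square_gof(observed, expected)
df = len(observed) - 1

# Shortcut valid for df = 2 only: P(chi-square > x) = exp(-x / 2)
p_value = math.exp(-stat / 2)

print(round(stat, 3), df, round(p_value, 3))   # 12.7 2 0.002
```

The result agrees with the SPSS values reported for this example (Chi-Square 12.700, df 2, Asymp. Sig. .002).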
SPSS Output
1. Weight your data by count first
2. Analyze >> Nonparametric Tests >> Legacy Dialogs >> Chi Square, count as test variable
Postpartum Depression
                 Observed N   Expected N   Residual
less depressed   14           20.0         -6.0
same             33           20.0         13.0
more depressed   13           20.0         -7.0
Total            60
Test Statistics
                 Postpartum Depression
Chi-Square (a)   12.700
df               2
Asymp. Sig.      .002
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 20.0.
Conclusion
Reject H0
The proportions associated with a decrease, no change, and an increase in depression from before to after childbirth are significantly different from 1/3, 1/3, 1/3.
Example: Postpartum Depression Study
Are the proportions associated with a
change and no change from before to after
childbirth the same?
Example: Postpartum Depression Study
Depression after birth in comparison with before birth   Observed frequencies   Hypothesized proportions   Expected frequencies
Same amount of depression (0)                            33                     1/2                        30
More or less depressed (1)                               27                     1/2                        30
From a random sample of 60 women
SPSS Output
Postpartum Depression--Recoded
                         Observed N   Expected N   Residual
same                     33           30.0         3.0
more or less depressed   27           30.0         -3.0
Total                    60
Test Statistics
                 Postpartum Depression--Recoded
Chi-Square (a)   .600
df               1
Asymp. Sig.      .439
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.
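The recoded two-category test can be reproduced by hand. The sketch below is an illustration in Python, not the SPSS procedure. It relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability can be written with the complementary error function; this shortcut applies to df = 1 only.

```python
import math

observed = [33, 27]   # same, more or less depressed
expected = [30, 30]   # 60 women * 1/2 for each category

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Shortcut valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))

print(round(stat, 3), round(p_value, 3))   # 0.6 0.439
```

Both numbers match the SPSS table (.600 and .439). At the conventional 0.05 level, a p-value of this size would not lead to rejecting the hypothesis that change and no change are equally likely.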
Two-way Contingency Tables
Report frequencies on two variables
Such tables are also called crosstabs.
Contingency Tables (Crosstabs)
1991 General Social Survey
Frequency
         Party Identification
Race     Democrat   Independent   Republican
White    341        105           405
Black    103        15            11
Crosstabs Analysis (Two-way Chi-square test)
• Chi-square test for testing the independence between two variables. Independence means:
1. For a fixed column, the distribution of frequencies over rows stays the same regardless of the column
2. For a fixed row, the distribution of frequencies over columns stays the same regardless of the row
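Under independence, the expected count in each cell is (row total × column total) / n, and the test statistic compares observed and expected counts cell by cell. The following Python sketch illustrates this with the 1991 General Social Survey race-by-party table from this deck. The function name `chi_square_independence` is hypothetical, and the code is a hand computation for illustration, not the SAS or SPSS procedure.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


gss_1991 = [
    [341, 105, 405],   # White: Democrat, Independent, Republican
    [103, 15, 11],     # Black: Democrat, Independent, Republican
]
stat, df, expected = chi_square_independence(gss_1991)
print(round(stat, 4), df)
```

The statistic should come out close to the Chi-Square value of 79.4310 with DF 2 shown in the SAS output for this table.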
Measure of dependence for 2x2 tables
• The phi coefficient measures the association between two categorical variables
• -1 ≤ phi ≤ 1
• | phi | indicates the strength of the association
• If the two variables are both ordinal, then the sign of phi indicates the direction of association
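For a 2x2 table with cells a, b (first row) and c, d (second row), phi can be computed as (ad - bc) divided by the square root of the product of the four marginal totals. The Python sketch below is an illustration only; the function name `phi_2x2` is hypothetical, and the counts are borrowed from the Prime Minister approval table that appears in the matched-pair part of this deck.

```python
import math


def phi_2x2(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# First survey (rows) by second survey (columns): Approve, Disapprove
phi = phi_2x2(794, 150, 86, 570)
print(round(phi, 4))

# The extremes are attained when one diagonal is empty
print(phi_2x2(1, 0, 0, 1), phi_2x2(0, 1, 1, 0))   # 1.0 -1.0
```

Squaring phi and multiplying by the sample size gives the Pearson chi-square statistic of the same 2x2 table, which is why | phi | serves as a sample-size-free measure of strength.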
SPSS Output
• P. 332 - 333
SAS Output
Statistic                     DF   Value     Prob
Chi-Square                    2    79.4310   <.0001
Likelihood Ratio Chi-Square   2    90.3311   <.0001
Mantel-Haenszel Chi-Square    1    79.3336   <.0001
Phi Coefficient                    0.2847
Contingency Coefficient            0.2738
Cramer's V                         0.2847
Sample Size = 980
Measure of dependence for non-2x2 tables
• Cramer's V
  Ranges from 0 to 1
  V may be viewed as the association between two variables as a percentage of their maximum possible variation.
  V = phi for 2x2, 2x3 and 3x2 tables
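The association measures in the SAS output can be recovered from the chi-square statistic and the sample size. The Python sketch below assumes the standard definitions: phi = sqrt(chi-square / n), Cramer's V = sqrt(chi-square / (n × min(r - 1, c - 1))), and contingency coefficient = sqrt(chi-square / (chi-square + n)). The function names are hypothetical; the inputs 79.4310 and 980 are taken from the SAS output for the 2x3 race-by-party table.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V for an r x c table."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient."""
    return math.sqrt(chi_square / (chi_square + n))


chi_square, n = 79.4310, 980          # from the SAS output

phi = math.sqrt(chi_square / n)
v = cramers_v(chi_square, n, n_rows=2, n_cols=3)
c = contingency_coefficient(chi_square, n)

print(round(phi, 4), round(v, 4), round(c, 4))   # 0.2847 0.2847 0.2738
```

Because the table has only two rows, min(r - 1, c - 1) is 1 and V coincides with phi, as the slide states.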
Fisher’s Exact Test for Independence
The Chi-square tests are ONLY for large samples:
The sample size must be large enough so that expected frequencies are greater than or equal to 5 for 80% or more of the categories
When this condition is not met, use Fisher's exact test instead.
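To show what "exact" means here, the Python sketch below enumerates every 2x3 table that shares the observed row and column totals, computes each table's probability under independence (a multivariate hypergeometric probability), and sums the probabilities of all tables that are no more likely than the observed one. This is one common way to define the exact p-value for tables larger than 2x2; whether SAS or SPSS use exactly this definition is an assumption here. The function name `fisher_exact_2x3` is hypothetical, and the sketch handles only tables with two rows and three columns.

```python
from math import comb


def fisher_exact_2x3(table):
    """Exact test for a 2 x 3 table by full enumeration.

    Returns (probability of the observed table, exact p-value).
    """
    c1, c2, c3 = [a + b for a, b in zip(*table)]   # column totals
    n = c1 + c2 + c3
    m = sum(table[1])                              # second-row total
    denom = comb(n, m)

    def prob(x1, x2, x3):
        # Probability that the second row equals (x1, x2, x3)
        # given the fixed margins
        return comb(c1, x1) * comb(c2, x2) * comb(c3, x3) / denom

    p_obs = prob(*table[1])
    p_value = 0.0
    for x1 in range(min(c1, m) + 1):
        for x2 in range(min(c2, m - x1) + 1):
            x3 = m - x1 - x2
            if x3 > c3:
                continue                           # violates column total
            p = prob(x1, x2, x3)
            if p <= p_obs * (1 + 1e-9):            # as or less likely
                p_value += p
    return p_obs, p_value


gss_1991 = [
    [341, 105, 405],   # White: Democrat, Independent, Republican
    [103, 15, 11],     # Black: Democrat, Independent, Republican
]
p_table, p_value = fisher_exact_2x3(gss_1991)
print(p_table, p_value)
```

Full enumeration is practical here because only a few thousand tables share these margins; larger tables call for the network algorithms built into statistical packages.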
SAS/SPSS Output
• SAS output:
Fisher's Exact Test
Table Probability (P)   3.823E-22
Pr <= P                 2.787E-20
• SPSS output: in “crosstabs” window, click “exact”, then tick “exact”:
Matched-pair Data
• Comparing categorical responses for two “paired” samples
When either
• Each sample has the same subjects (that is, subjects are measured twice)
Or
• A natural pairing exists between each subject in one sample and a subject from the other sample (e.g., twins)
Example: Rating for Prime Minister
               Second Survey
First Survey   Approve   Disapprove
Approve        794       150
Disapprove     86        570
Marginal Homogeneity
The probabilities of “success” for both samples are identical
E.g., the probability of approval is the same at the first and second surveys
McNemar Test (for 2x2 Tables only)
SAS: Section 3.L; SPSS: Lesson 44
H0: marginal homogeneity
Ha: no marginal homogeneity
Exact p-value
Approximate p-value (when n12 + n21 > 10)
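McNemar's test uses only the discordant pairs: n12 (approve then disapprove) and n21 (disapprove then approve). The Python sketch below assumes the usual form of the statistic, (n12 - n21)^2 / (n12 + n21), and an exact p-value from the Binomial(n12 + n21, 1/2) distribution. The function name `mcnemar` is hypothetical, and the counts 150 and 86 come from the Prime Minister example.

```python
from math import comb


def mcnemar(n12, n21):
    """Return (chi-square statistic, exact two-sided p-value)."""
    n = n12 + n21                     # number of discordant pairs
    stat = (n12 - n21) ** 2 / n

    # Under H0 each discordant pair falls in either cell with
    # probability 1/2, so the smaller count is Binomial(n, 1/2)
    k = min(n12, n21)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return stat, min(1.0, 2 * tail)


stat, p_exact = mcnemar(150, 86)
print(round(stat, 4), p_exact)
```

The statistic reproduces the value 17.3559 in the SAS output for this example, and the exact p-value should be of the same order as the one SAS reports.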
SAS Output
McNemar's Test
Statistic (S)       17.3559
DF                  1
Asymptotic Pr > S   <.0001
Exact Pr >= S       3.716E-05
Simple Kappa Coefficient
Kappa                  0.6996   (level of agreement)
ASE                    0.0180
95% Lower Conf Limit   0.6644
95% Upper Conf Limit   0.7348
Sample Size = 1600
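The kappa coefficient compares the observed proportion of agreement (the diagonal of the table) with the agreement expected by chance from the marginal totals. The Python sketch below assumes the standard simple (Cohen's) kappa formula, (observed - chance) / (1 - chance); the function name `cohen_kappa` is hypothetical, and the data are the Prime Minister approval counts.

```python
def cohen_kappa(table):
    """Simple kappa for a square table of paired ratings."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Chance agreement: product of matching marginal proportions
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    return (observed - chance) / (1 - chance)


approval = [
    [794, 150],   # first survey Approve: second Approve, Disapprove
    [86, 570],    # first survey Disapprove: second Approve, Disapprove
]
print(round(cohen_kappa(approval), 4))   # 0.6996
```

This matches the Kappa value of 0.6996 in the SAS output; the standard error and confidence limits are not computed in this sketch.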
SPSS Output
• SPSS: p. 361; in the “two-samples tests” window tick McNemar and click “exact”, then tick “exact”: