Transcript Chi Square

ANOVA Knowledge Assessment
1. In what situation should you use ANOVA (the F stat) instead of doing a t test?
2. What information does the F statistic give you?
3. For the ANOVA, the dependent variable should be what level of measurement? What about the IV?
4. Why is the F test called “exploratory”?
ANOVA Knowledge Assessment
1. In what situation should you use ANOVA (the F stat) instead of doing a t test?
   • When your independent variable has 3 or more categories/attributes.
2. What information does the F statistic give you?
   • The F statistic tells you the ratio of between-group variance to within-group variance.
3. For the ANOVA, the dependent variable should be what level of measurement? What about the IV?
   • The dependent variable should be interval-ratio. The independent variable can be either nominal or ordinal.
4. Why is the F test called “exploratory”?
   • Because a significant F statistic doesn’t allow you to identify which difference(s) in means are statistically significant.
Contingency Tables (cross tabs)

• Generally used when variables are nominal and/or ordinal
• Even here, the variables should have a limited number of attributes (categories)
• Some find these very intuitive…others struggle
• It is very easy to misinterpret these critters
Interpreting a Contingency Table

WHAT IS IN THE INDIVIDUAL CELLS?
• The number of cases that fit in that particular cell
• In other words, frequencies (number of cases that fit criteria)

For small tables and/or small sample sizes, it may be possible to detect relationships by “eyeballing” frequencies. For most tables:
• Convert to percentages: a way to standardize cells and make relationships more apparent
Example 1

Is there an even distribution of membership across 4 political parties? (N=40 UMD students)

Categories     F     %
Republican     12    30%
Democrat       14    35%
Independent    9     22.5%
Green          5     12.5%
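
The percentage column is each frequency divided by N. A minimal sketch of that conversion, assuming the frequencies above (the variable names are illustrative only):

```python
# Frequencies from Example 1 (N = 40 UMD students).
counts = {"Republican": 12, "Democrat": 14, "Independent": 9, "Green": 5}

n = sum(counts.values())

# Percentage for each category = 100 * frequency / N.
percentages = {party: 100 * f / n for party, f in counts.items()}

for party, pct in percentages.items():
    print(f"{party:<12}{counts[party]:>4}{pct:>8.1f}%")
```

The percentages should sum to 100, which is a quick check on a table like this one.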
Example 2

A survey of 10,000 U.S. residents
• Is one’s political view related to attitudes towards police?
• What are the DV and IV?

Convention for bivariate tables
• The IV is on the top of the table (dictates columns)
• The DV is on the side (dictates rows)
Example 2 Continued

Attitude                        Political Party
Towards Police    Repub    Democrat    Libertarian    Socialist    Total
Favorable         2900     2100        180            30           5210
Unfav.            1900     1800        160            28           3888
Total             4800     3900        340            58           9098
The Percentages of Interest

Attitude                             Political Party
Towards Police    Repub         Democrat      Libertarian    Socialist    Total
Favorable         2900 (60%)    2100 (54%)    180 (53%)      30 (52%)     5210
Unfav             1900          1800          160            28           3888
Total             4800          3900          340            58           9098
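
The percentages in parentheses are column percentages: each favorable count divided by its own column total, so that the categories of the IV can be compared directly. A rough sketch, assuming the counts in the table above (the helper name `column_percentages` is made up for illustration):

```python
# Counts from Example 2, one entry per political party (the IV, in columns).
parties = ["Repub", "Democrat", "Libertarian", "Socialist"]
favorable = [2900, 2100, 180, 30]
unfavorable = [1900, 1800, 160, 28]


def column_percentages(row, other_row):
    """Percent of each column's cases that fall in `row` (two-row table)."""
    return [100 * a / (a + b) for a, b in zip(row, other_row)]


pct_favorable = column_percentages(favorable, unfavorable)

for party, pct in zip(parties, pct_favorable):
    print(f"{party:<12}{pct:5.1f}% favorable")
```

Percentaging within columns (within the IV) and comparing across them is what makes the table readable at a glance.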
The Test Statistic for Contingency Tables

Chi Square, or χ²

Calculation
• Observed frequencies (your sample data)
• Expected frequencies (UNDER NULL)

Intuitive: how different are the observed cell frequencies from the expected cell frequencies?

Degrees of Freedom (K = number of categories):
• 1-way = K - 1
• 2-way = (# of Rows - 1)(# of Columns - 1)
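
Written out in symbols, the comparison of observed and expected frequencies is the standard chi-square formula. The notation here is an assumption chosen to match the fo / fe labels used in the worked tables, with r rows and c columns:

```latex
\chi^2 = \sum_{\text{cells}} \frac{(f_o - f_e)^2}{f_e},
\qquad
df_{\text{1-way}} = K - 1,
\qquad
df_{\text{2-way}} = (r - 1)(c - 1)
```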
CHI SQUARE
The simplest form of the Chi square is the one-way Chi square test
• Used to determine whether the observed frequencies differ significantly from an even (expected under null) distribution
Chi Square: Steps
1. Find the expected (under null hypothesis) cell frequencies
2. Compare expected & observed frequencies cell by cell
3. If the null hypothesis is true, expected and observed frequencies should be close in value
4. The greater the difference between the observed and expected frequencies, the greater the possibility of rejecting the null
1-WAY CHI SQUARE

1-way Chi Square Example: There is an even
distribution of membership across 4 political parties
(N=40 UMD students)

Find the expected cell frequencies (fe = N / K)

Categories    fo    fe
Republican    12    10
Democrat      14    10
Independ.     9     10
Green         5     10

Compare observed & expected frequencies cell-by-cell
Categories    fo    fe    fo - fe
Republican    12    10    2
Democrat      14    10    4
Independ.     9     10    -1
Green         5     10    -5

Square the difference between observed & expected frequencies
Categories    fo    fe    fo - fe    (fo - fe)²
Republican    12    10    2          4
Democrat      14    10    4          16
Independ.     9     10    -1         1
Green         5     10    -5         25

Divide each squared difference by the expected frequency, then sum across categories

Categories    fo    fe    fo - fe    (fo - fe)²    (fo - fe)² / fe
Republican    12    10    2          4             0.4
Democrat      14    10    4          16            1.6
Independ.     9     10    -1         1             0.1
Green         5     10    -5         25            2.5
                                                   ∑ = 4.6
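
The column-by-column arithmetic above can be sketched in a few lines. This is an illustration under the same null hypothesis of an even distribution; the function name `one_way_chi_square` is hypothetical:

```python
def one_way_chi_square(observed):
    """One-way chi-square against an even split: sum of (fo - fe)^2 / fe."""
    n = sum(observed)   # total number of cases (N)
    k = len(observed)   # number of categories (K)
    fe = n / k          # expected frequency per category under the null
    chi2 = sum((fo - fe) ** 2 / fe for fo in observed)
    return chi2, k - 1  # statistic and degrees of freedom (K - 1)


# Observed frequencies: Republican, Democrat, Independent, Green.
observed = [12, 14, 9, 5]
chi2, df = one_way_chi_square(observed)
print(f"chi-square = {chi2:.1f}, df = {df}")
```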
Interpreting Chi-Square

• Chi-square has no intuitive meaning; it can range from zero to very large
• As with other test statistics, the real interest is the “p value” associated with the calculated chi-square value
  • Conventional testing = find χ² (critical) for the stated “alpha” (.05, .01, etc.); reject the null if χ² (observed) is greater than χ² (critical)
  • SPSS: find the exact probability of obtaining a χ² at least this large under the null (reject if less than alpha)
The Chi-Square Sampling Distribution
(Assuming Null is True)
Interpreting χ²
The old fashioned way

• Chi square = 4.6
• df (1-way Chi square) = K - 1 = 3
• χ² (critical) (alpha = .05) = 7.815 (from Appendix C)
• Obtained (4.6) < critical (7.815)
• Decision: Fail to reject the null hypothesis. The distribution of political party membership at UMD does not differ significantly from an even distribution
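
For comparison with the table lookup, the exact probability can also be computed directly. The sketch below assumes only the Python standard library; the function name `chi2_sf` is hypothetical, and it relies on the standard recurrence for the upper tail of the chi-square distribution rather than on any statistics package:

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from Q(1, z) = exp(-z) for even df
    or Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q


alpha = 0.05
p_value = chi2_sf(4.6, 3)       # obtained chi-square and df from the example
reject_null = p_value < alpha   # same decision rule as SPSS output
print(f"p = {p_value:.3f}, reject null: {reject_null}")
```

The p value comes out near .20, well above .05, so both routes lead to the same decision: fail to reject the null.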
2-WAY CHI SQUARE

For use with BIVARIATE Contingency Tables
• Display the scores of cases on two different variables at the same time (rows are always DV & columns are always IV)
• Intersections of rows & columns are called “cells”
• Column & row marginal totals (a.k.a. “subtotals”) should always add up to N

N=40                Packers Fan    Vikings Fan    TOTALS
Like Brett Favre    14             7              21
Don’t Like Favre    6              13             19
TOTALS:             20             20             40
Null Hypothesis for 2-Way χ²

The two variables are independent

Independence:
• Classification of a case into a category on one variable has no effect on the probability that the case will be classified into any category of the second variable
2-WAY CHI SQUARE

Find the expected frequencies

fe = (Row Marginal x Column Marginal) / N
• “Like Favre” row = (21 x 20)/40 = 420/40 = 10.5
• “Don’t Like” row = (19 x 20)/40 = 380/40 = 9.5

N=40                Packers Fan    Vikings Fan    TOTALS
Like Brett Favre    14 (10.5)      7 (10.5)       21
Don’t Like Favre    6 (9.5)        13 (9.5)       19
TOTALS:             20             20             40
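
The marginal-product rule generalizes to a table of any size. A minimal sketch, assuming the observed table is stored as a list of rows (the function name `expected_frequencies` is illustrative):

```python
def expected_frequencies(table):
    """Expected count per cell = (row marginal * column marginal) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Like Favre, Don't Like Favre. Columns: Packers fan, Vikings fan.
observed = [[14, 7], [6, 13]]
expected = expected_frequencies(observed)

for row in expected:
    print(row)
```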
2-WAY CHI SQUARE
• Compare expected & observed frequencies cell by cell
• χ²(obtained) = 4.912 (4.92 if each cell's term is rounded to two decimals before summing)
• df = (r-1)(c-1) = 1 x 1 = 1
• χ²(critical) = 3.841 (Healey Appendix C)
• Obtained > Critical
• CONCLUSION: Reject the null. There is a relationship between the team that students root for and their opinion of Brett Favre (p<.05).
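
The full two-way calculation can be sketched end to end as follows. The function name `two_way_chi_square` is hypothetical, and the critical value is simply the one quoted above for df = 1. Without intermediate rounding the statistic comes to about 4.91.

```python
def two_way_chi_square(table):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected under independence
            chi2 += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)     # (r - 1)(c - 1)
    return chi2, df


# Rows: Like Favre, Don't Like Favre. Columns: Packers fan, Vikings fan.
chi2, df = two_way_chi_square([[14, 7], [6, 13]])
critical = 3.841  # table value for df = 1 at the .05 level

print(f"chi-square = {chi2:.3f}, df = {df}, reject null: {chi2 > critical}")
```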
Chi Square – Example #2

Is quality of a school system significantly related to a community’s per capita income?

           Per Capita Income
Quality    Low    High    Totals
Low        18     6       24
High       12     14      26
Totals     30     20      50
Chi Square – Example #2
• First, calculate expected frequencies…

           Per Capita Income
Quality    Low          High         Totals
Low        18 (14.4)    6 (9.6)      24
High       12 (15.6)    14 (10.4)    26
Totals     30           20           50