Transcript Chi-Sq

Practical Statistics
Chi-Square Statistics
There are six statistics that will
answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression
Chi-square:
Chi-square is a simple test for counts,
which means nominal data
and, in some cases,
ordinal data.
Chi-square:
There are three types:
1. Test for population variance
2. Test of “goodness-of-fit”
3. Contingency table analysis
Which is essentially a measure of association!
Chi-square:
1. Test for population variance
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$
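As a minimal sketch of the variance test, the statistic can be computed directly from the sample size n, the sample variance S^2, and the hypothesized population variance. The function name and the numbers used below (n = 25, S^2 = 12, hypothesized variance = 8) are hypothetical and are not from the slides.

```python
def chi_square_variance(n, sample_var, hypothesized_var):
    """Chi-square statistic for a test of a population variance.

    Returns (statistic, degrees of freedom), where the statistic is
    (n - 1) * S^2 / sigma^2 and the degrees of freedom are n - 1.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    statistic = (n - 1) * sample_var / hypothesized_var
    return statistic, n - 1


# Hypothetical values, chosen only for illustration.
stat, df = chi_square_variance(n=25, sample_var=12.0, hypothesized_var=8.0)
print(stat, df)
```

The resulting statistic would then be compared with a chi-square table at df = n - 1.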
Chi-square:
2. Test of “goodness-of-fit”
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
Where o = frequency of actual observation, and
e = frequency you expected to find
Coin thrown 100 times:
Expect (e): heads = 50, tails = 50
Observed (o):
heads = 40, tails = 60
Is this a “fair” coin?
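The coin question can be worked through with the goodness-of-fit formula. The sketch below is one way to do it, using only the Python standard library; the function names are illustrative. For 1 degree of freedom the upper-tail probability has a closed form via the complementary error function, and the usual table value at the .05 level is 3.84.

```python
import math


def goodness_of_fit(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Heads and tails: observed 40 and 60, expected 50 and 50.
coin_stat = goodness_of_fit(observed=[40, 60], expected=[50, 50])
coin_p = p_value_df1(coin_stat)
print(coin_stat, round(coin_p, 4))
```

The statistic is 4.0 with df = 1, which is just above 3.84, so at the .05 level the coin would be judged not fair, though only narrowly.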
According to marketing research, the clientele
of a Monkey Shine Restaurant is made up of
30% Western businessmen,
30% women who stop in while shopping,
30% Chinese businessmen, and
10% tourists.
A random sample of 600 customers at the Kowloon Monkey Shine found
150 Western businessmen,
190 Chinese businessmen,
100 tourists, and
160 women who were shopping.
Is the clientele at this establishment different
from the norm for this company?
Type               Percent   Expected   Observed
Western Business   30%       180        150
Chinese Business   30%       180        190
Women Shoppers     30%       180        160
Tourist            10%       60         100
Total              100%      600        600
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(150-180)^2}{180} + \frac{(190-180)^2}{180} + \frac{(160-180)^2}{180} + \frac{(100-60)^2}{60}$$
= 5.00 + 0.56 + 2.22 + 26.67 = 34.45
With (4-1) = 3 degrees of freedom
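The restaurant calculation can be reproduced in a few lines. This is a sketch using the observed counts from the table (including 160 women shoppers); the variable names are illustrative. Summing the unrounded terms gives about 34.44, while adding the four rounded terms as on the slide gives 34.45.

```python
categories = ["Western business", "Chinese business", "Women shoppers", "Tourist"]
shares = [0.30, 0.30, 0.30, 0.10]      # company-wide clientele mix
observed = [150, 190, 160, 100]        # Kowloon sample of 600 customers

n = sum(observed)
expected = [share * n for share in shares]

# Each category's contribution to chi-square: (o - e)^2 / e
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = len(categories) - 1

for name, e, o, c in zip(categories, expected, observed, contributions):
    print(f"{name:18s}{e:8.1f}{o:6d}{c:8.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

Keeping the per-category contributions in a list makes it easy to see which category drives the result.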
The chi-square distribution is highly skewed,
and its shape depends on how many degrees of
freedom (df) a problem has.
The chi-square for the restaurant problem was:
Chi-square = 34.45, df = 3
By looking in a table, the critical value of
chi-square with df = 3 is 7.82 (at the .05 level).
Since 34.45 is well above 7.82, the difference is significant.
The probability of a sample this far from the marketing
research frequencies arising by chance is p < .001.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
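Instead of a printed table or an online calculator, the upper-tail probability can be computed directly. The sketch below uses a closed-form expression that holds for whole-number degrees of freedom and needs only the standard library; the function name `chi_square_sf` is an assumption for illustration, not a library call.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_k (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


p_restaurant = chi_square_sf(34.45, df=3)
p_at_critical = chi_square_sf(7.82, df=3)
print(p_restaurant < 0.001, round(p_at_critical, 3))
```

Evaluating the function at the critical value 7.82 returns a probability of about .05, which is a quick check that the table value and the function agree.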
By looking at the analysis, it is obvious that
the largest contribution to chi-square came from
the tourists.
$$\chi^2 = \frac{(150-180)^2}{180} + \frac{(190-180)^2}{180} + \frac{(160-180)^2}{180} + \frac{(100-60)^2}{60}$$
= 5.00 + 0.56 + 2.22 + 26.67 = 34.45, df = 3
Hence, the Kowloon property is attracting more
tourists than would be expected at a Monkey
Shine restaurant.
Chi-square:
3. Contingency table analysis
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
Where o = frequency of actual observation, and
e = frequency you expected to find
A contingency table
is a table with numbers grouped by frequency.
Consider a study:
There are three groups: brand loyal customers,
regular buyers, and occasional buyers.
Each is asked whether they prefer the taste of the new
product over the old. They answer with a
“yes” or a “no.”
A contingency table would look like this:
             YES    NO     Totals
Loyal        50     40     90
Regular      60     40     100
Occasional   40     40     80
Total        150    120    270
All the numbers in the table are “observed”
frequencies (o).
So, what are the expected values?
The expected values (e) are the frequencies we would
see if the answers were unrelated to customer type.
The expected values (e) can be calculated
by multiplying the row total by the column
total and dividing by the total number of
observations.
For example, the expected value (e) of “loyal”
and “yes” would be (150 X 90)/270 = 50
For example, the expected value (e) of “regular”
and “no” would be (120 X 100)/270 = 44.4
The expected values (e) for the entire table
would be:
             YES     NO      Totals
Loyal        50.0    40.0    90
Regular      55.6    44.4    100
Occasional   44.4    35.6    80
Total        150     120     270
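The expected table follows mechanically from the row and column totals. A minimal sketch, with the observed counts from the study entered as a list of rows (the variable names are illustrative):

```python
rows = ["Loyal", "Regular", "Occasional"]
cols = ["YES", "NO"]
observed = [
    [50, 40],   # Loyal
    [60, 40],   # Regular
    [40, 40],   # Occasional
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

for name, row in zip(rows, expected):
    print(f"{name:11s}" + "".join(f"{value:8.1f}" for value in row))
```

Rounded to one decimal place, the values match the table of expected frequencies.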
The chi-square value is calculated for every cell,
and then summed over all the cells.
The chi-square value is calculated for every cell:
For Cell A: (50-50)^2/50 = 0
For Cell D: (40-44.4)^2/44.4 = 0.44
             YES       NO        Totals
Loyal        A 50.0    40.0      90
Regular      55.6      D 44.4    100
Occasional   44.4      35.6      80
Total        150       120       270
The chi-square value is calculated for every cell:
             YES    NO
Loyal        0      0
Regular      .35    .44
Occasional   .44    .54
Chi-square = 0 + 0 + .35 + .44 + .44 + .54 = 1.77
The df = (r-1)(c-1) = (3-1)(2-1) = 2 X 1 = 2
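The whole contingency-table calculation can be sketched as one short function (the name `contingency_chi_square` is illustrative). Carried out without rounding the expected values, it gives a statistic of 1.80 rather than 1.77; the slide's figure comes from expected values rounded to one decimal place, and the conclusion is the same either way.

```python
def contingency_chi_square(observed):
    """Return (chi-square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            statistic += (o - e) ** 2 / e

    # df = (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


taste_table = [
    [50, 40],   # Loyal: yes, no
    [60, 40],   # Regular
    [40, 40],   # Occasional
]
chi_square, df = contingency_chi_square(taste_table)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```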
A chi-square with df = 2 has a critical value
of 5.99. This chi-square = 1.77, so the results
are nonsignificant.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
The probability = 0.4127.
This means that the distribution is consistent with chance, and
there is no evidence of an association between customer type
and taste preference.
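For df = 2 the chi-square distribution has a particularly simple form: the upper-tail probability is exp(-x/2), so both the probability and the critical value can be checked by hand. A small sketch (function names are illustrative):

```python
import math


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


def critical_value_df2(alpha):
    """Value that a chi-square with 2 df exceeds with probability alpha."""
    return -2 * math.log(alpha)


p = p_value_df2(1.77)
critical = critical_value_df2(0.05)
print(round(p, 4), round(critical, 2))
```

This reproduces the probability of 0.4127 and the critical value of 5.99 quoted for this problem.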
Note: This type of chi-square is a test of
association using nothing but
counts (frequency);
VERY useful in business research.
Service Encounter and Personality
Normally, 60% of our shoppers are women.
Is our sample correct?
0.6 X 271 = 163 women
0.4 X 271 = 108 men
Service Encounter and Personality
Do men and women shop at different times?