Transcript Chi Sq

Practical Statistics
Chi-Square Statistics
There are six statistics that will answer 90% of all questions!
1. Descriptive
2. Chi-square
3. Z-tests
4. Comparison of Means
5. Correlation
6. Regression
Chi-square:
Chi-square is a simple test for counts, which means nominal data and, in some cases, ordinal data.
Chi-square:
There are three types:
1. Test for population variance
2. Test of “goodness-of-fit”
3. Contingency table analysis
1. Test for population variance:

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

where n is the sample size, S² is the sample variance, and σ² is the hypothesized population variance; df = n - 1.
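A minimal Python sketch of the population-variance statistic above. The slides give no data for this test, so the sample values and the hypothesized variance `sigma2_0` are invented for illustration, and the function name `variance_chi_square` is hypothetical. It uses only the standard library; `statistics.variance` divides by n - 1, which is the S² in the formula.

```python
import statistics

def variance_chi_square(sample, sigma2_0):
    """Return (chi_square, df) for H0: population variance == sigma2_0.

    chi_square = (n - 1) * S^2 / sigma2_0, with df = n - 1.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s2 = statistics.variance(sample)  # sample variance S^2 (divides by n - 1)
    return (n - 1) * s2 / sigma2_0, n - 1

# Hypothetical sample and hypothesized variance, for illustration only.
sample = [4, 8, 6, 5, 7]
stat, df = variance_chi_square(sample, sigma2_0=2.0)
print(stat, df)  # prints: 5.0 4
```

The resulting statistic would then be compared with a chi-square table at df = n - 1, just like the other chi-square tests in this deck.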
2. Test of “goodness-of-fit”:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Where o = frequency of actual observation, e = frequency you expected to find, and k = the number of categories.
Coin thrown 100 times:
Expected (e): heads = 50, tails = 50
Observed (o): heads = 40, tails = 60
Is this a “fair” coin?

            Heads   Tails
Observed      40      60
Expected      50      50

$$\chi^2 = \frac{(40-50)^2}{50} + \frac{(60-50)^2}{50} = 2 + 2 = 4$$
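Chi-square = 4.0, df = 1, p = ?
https://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
p = 0.0455
But this is the probability of what? It is the probability of getting a split at least this far from 50/50 (a chi-square of 4.0 or more) if the coin really were fair. Because 0.0455 is below .05, the difference is significant at the .05 level.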
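The coin calculation can be checked with a short Python sketch. `chi_square_gof` is a hypothetical helper name, not a library function. For df = 1 the upper-tail probability of chi-square has the exact closed form erfc(√(x/2)), so the standard library's `math.erfc` is enough to reproduce the p-value without a table or the online calculator.

```python
import math

def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (o - e)^2 / e over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 60]  # heads, tails in 100 throws
expected = [50, 50]  # what a fair coin should give
stat = chi_square_gof(observed, expected)
df = len(observed) - 1

# For df = 1, P(chi-square >= x) = erfc(sqrt(x / 2)) exactly.
p = math.erfc(math.sqrt(stat / 2))
print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.4f}")
# prints: chi-square = 4.0, df = 1, p = 0.0455
```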
According to marketing research, the clientele of a Monkey Shine Restaurant is made up of 30% Western businessmen, 30% women who stop in while shopping, 30% Chinese businessmen, and 10% tourists. A random sample of 600 customers at the Kowloon Monkey Shine found 150 Western businessmen, 190 Chinese businessmen, 100 tourists, and 160 women who were shopping.
Is the clientele at this establishment different from this company's norm?
Type               Percent   Expected   Observed
Western Business     30%        180        150
Chinese Business     30%        180        190
Women Shoppers       30%        180        160
Tourist              10%         60        100
Total                           600        600
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(150-180)^2}{180} + \frac{(190-180)^2}{180} + \frac{(160-180)^2}{180} + \frac{(100-60)^2}{60}$$

= 5.00 + 0.56 + 2.22 + 26.67 = 34.45
With (4 - 1) = 3 degrees of freedom
The chi-square distribution is skewed to the right, and its shape depends on how many degrees of freedom (df) a problem has.
The chi-square for the restaurant problem was:
Chi-square = 34.45, df = 3
From a chi-square table, the critical value of chi-square with df = 3 at the .05 level is 7.82. Our value of 34.45 is far beyond it: p < .001. In other words, if the Kowloon clientele really matched the company norm, a discrepancy this large would occur by chance less than 0.1% of the time.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
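A Python sketch of the whole restaurant test, using the observed counts from the table (150, 190, 160, 100). Expected counts come from the company's percentages. Instead of a printed table, the p-value uses the exact closed form of the chi-square upper tail for df = 3, P = erfc(√(x/2)) + √(2x/π)·e^(−x/2). Carried at full precision the statistic is 310/9 ≈ 34.44; the 34.45 above comes from adding the four rounded terms. If SciPy is installed, `scipy.stats.chi2.sf(stat, 3)` should return the same p-value.

```python
import math

# Company norm (percent of clientele) and the Kowloon sample, in the same order.
categories = ["Western business", "Chinese business", "Women shoppers", "Tourists"]
norm_percent = [30, 30, 30, 10]
observed = [150, 190, 160, 100]
n = sum(observed)  # 600 customers

# Expected counts under the company norm: 180, 180, 180, 60.
expected = [pct * n / 100 for pct in norm_percent]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(categories) - 1

# Exact upper-tail probability of a chi-square with df = 3:
# P(X >= x) = erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2)
p = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)

CRITICAL_05_DF3 = 7.815  # tabled .05 critical value for df = 3
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.2g}")
print("Reject H0" if stat > CRITICAL_05_DF3 else "Do not reject H0")
```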
Looking at the individual terms, the largest contribution to chi-square clearly came from the tourists: 26.67 of the total of 34.45 (= 5.00 + 0.56 + 2.22 + 26.67, df = 3).
Hence, the Kowloon property is attracting more tourists (100 observed vs. 60 expected) than the company norm would predict.
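To see which category drives the result, one can list each term of the sum together with the direction of the deviation. A minimal Python sketch using the same observed and expected counts (the category labels are just strings chosen for readability):

```python
# Each category's share of the chi-square statistic, plus whether the
# Kowloon sample had more or fewer customers than the company norm predicts.
categories = ["Western business", "Chinese business", "Women shoppers", "Tourists"]
observed = [150, 190, 160, 100]
expected = [180, 180, 180, 60]

contributions = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(categories, observed, expected)
}
largest = max(contributions, key=contributions.get)

for name, o, e in zip(categories, observed, expected):
    direction = "more" if o > e else "fewer"
    print(f"{name:<17} {contributions[name]:6.2f}  ({direction} than expected)")
print("Largest contribution:", largest)
```

Because each term is squared, the statistic alone does not say which way a category deviates; comparing o with e, as above, supplies the direction.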
3. Contingency table analysis uses the same formula, with the sum taken over every cell of the table:

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

Where o = frequency of actual observation, e = frequency you expected to find, and k = the number of cells.
A contingency table (cross-tabs) is a table of counts (frequencies) grouped by the categories of two variables.
There are three groups: brand-loyal customers, regular buyers, and occasional buyers. Each is asked whether they like the taste of the new product better than the old one. They answer “yes” or “no.”
A contingency table would look like this:

              YES    NO    Totals
Loyal          50    40       90
Regular        60    40      100
Occasional     40    40       80
Total         150   120      270
All the numbers in the table are “observed”
frequencies (o).
So, what are the expected values?
The expected values (e) are what a random distribution of frequencies would look like, i.e., the counts you would expect if customer type and taste preference were unrelated. They are calculated by multiplying a cell's row total by its column total and dividing by the total number of observations.
For example, the expected value (e) for “loyal” and “yes” would be (150 × 90)/270 = 50.
Similarly, the expected value (e) for “regular” and “no” would be (120 × 100)/270 = 44.4.
The expected values (e) for the entire table would be:

              YES     NO    Totals
Loyal         50.0   40.0      90
Regular       55.6   44.4     100
Occasional    44.4   35.6      80
Total          150    120     270
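The expected table can be reproduced with a few lines of Python. `expected_counts` is a hypothetical helper; it applies the rule stated above (row total × column total ÷ grand total) to every cell of a table given as a list of rows.

```python
def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts; rows = loyal, regular, occasional; columns = YES, NO.
observed = [[50, 40], [60, 40], [40, 40]]
expected = expected_counts(observed)
for label, row in zip(["Loyal", "Regular", "Occasional"], expected):
    print(f"{label:<11}", [round(x, 1) for x in row])
```

Rounded to one decimal, the printed rows match the table: 50.0/40.0, 55.6/44.4, 44.4/35.6.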
The chi-square value is calculated for every cell,
and then summed over all the cells.
For example:
For Cell A (Loyal, YES): (50 - 50)^2/50 = 0
For Cell D (Regular, NO): (40 - 44.4)^2/44.4 = 0.44
The chi-square value is calculated for every cell:

              YES    NO
Loyal           0     0
Regular       .35   .44
Occasional    .44   .54
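Summing over all the cells:
Chi-square = 0 + 0 + .35 + .44 + .44 + .54 = 1.77
(These cell values use the expected values rounded to one decimal; carrying full precision gives 1.80.)
The df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2 × 1 = 2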
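Putting the pieces together, here is a sketch of the full contingency-table calculation in Python (the function name `chi_square_contingency` is hypothetical). It carries full precision, so it reports 1.80 rather than the 1.77 obtained from expected values rounded to one decimal; the conclusion is the same. SciPy users can cross-check with `scipy.stats.chi2_contingency`, which should report the same statistic for this 3 × 2 table (its continuity correction applies only when df = 1).

```python
def chi_square_contingency(table):
    """Return (chi_square, df) for a test of association on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count for this cell
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df

# Rows = loyal, regular, occasional; columns = YES, NO.
observed = [[50, 40], [60, 40], [40, 40]]
stat, df = chi_square_contingency(observed)
print(f"chi-square = {stat:.2f}, df = {df}")  # prints: chi-square = 1.80, df = 2
```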
With df = 2, the critical value of chi-square at the .05 level is 5.99. This chi-square is only 1.77, so the results are not significant.
http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html
The probability = 0.4127.
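For df = 2 the chi-square upper tail is exactly e^(−x/2), so both the p-value and the .05 critical value can be checked with the standard library alone. A minimal sketch (`p_value_df2` is a hypothetical helper), evaluated both at the slide's 1.77 and at the full-precision 1.80:

```python
import math

def p_value_df2(x: float) -> float:
    """P(chi-square >= x) for df = 2, which is exactly exp(-x / 2)."""
    return math.exp(-x / 2)

p_slide = p_value_df2(1.77)  # statistic as computed on the slide
p_full = p_value_df2(1.80)   # statistic carried at full precision
# The .05 critical value solves exp(-x / 2) = 0.05, i.e. x = -2 ln(0.05).
critical_05 = -2 * math.log(0.05)

print(f"p (1.77) = {p_slide:.4f}, p (1.80) = {p_full:.4f}, critical value = {critical_05:.2f}")
# prints: p (1.77) = 0.4127, p (1.80) = 0.4066, critical value = 5.99
```

Either way the p-value is around 0.41, far above .05.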
This means the frequencies are consistent with a random distribution: there is no evidence of an association between customer type and taste preference.
Note: This type of chi-square is a test of association that uses nothing but counts (frequencies). It is VERY useful in business research.
Example:
Suppose that a study of service providers and personality had two questions (among many):
Your favorite shopping time:   Day   Evening
..
Sex:   M   F
Results:

Your favorite shopping time:
Day          79    29%
Evening     192    71%
Total       271

Sex:
Male        122    45%
Female      149    55%
Total       271
Service Encounter and Personality
Normally, 60% of our shoppers are women. Does our sample match that norm?
Expected: 0.6 × 271 = 162.6 ≈ 163 women
0.4 × 271 = 108.4 ≈ 108 men
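The question “does our sample match the 60% norm?” is itself a goodness-of-fit test with two categories. A Python sketch using the unrounded expected counts 162.6 and 108.4; the slides do not carry this test through, so the result below is computed here rather than taken from the source.

```python
import math

observed = {"women": 149, "men": 122}
n = sum(observed.values())  # 271 respondents
norm = {"women": 0.6, "men": 0.4}  # "normally, 60% of our shoppers are women"
expected = {k: norm[k] * n for k in observed}  # 162.6 women, 108.4 men (unrounded)

stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
p = math.erfc(math.sqrt(stat / 2))  # exact upper tail for df = 1
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

On these counts the statistic is about 2.84 with df = 1 (p ≈ 0.09), below the .05 critical value of 3.84, so the sample would not be judged different from the norm at that level.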
Service Encounter and Personality
Do men and women shop at different times?
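Answering this requires a 2 × 2 contingency table (sex × shopping time), but the slides give only the marginal totals, not the cell counts, so the test cannot be finished here. What the marginals alone do determine is the expected count for each cell if sex and shopping time were unrelated. A minimal Python sketch; once the observed cell counts are available, they go into the same sum of (o - e)²/e with df = (2 - 1)(2 - 1) = 1.

```python
# Marginal totals from the survey results above.
time_totals = {"Day": 79, "Evening": 192}
sex_totals = {"Male": 122, "Female": 149}
n = sum(sex_totals.values())
if n != sum(time_totals.values()):
    raise ValueError("both questions should cover the same respondents")

# Expected count for each (sex, time) cell if the two answers were unrelated:
# row total * column total / grand total.
expected = {
    (sex, time): s * t / n
    for sex, s in sex_totals.items()
    for time, t in time_totals.items()
}
df = (len(sex_totals) - 1) * (len(time_totals) - 1)

for (sex, time), e in expected.items():
    print(f"{sex:<6} {time:<8} {e:6.1f}")
```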