Transcript Chi-square

S519: Evaluation of
Information Systems
Social Statistics
Inferential Statistics
Chapter 15: Chi-square
Last week
This week




What is chi-square
CHIDIST
CHITEST
Nonparametric statistics
Parametric statistics

A main branch of statistics





Assumes the data follow a particular type of probability distribution (e.g. normal distribution).
Makes inferences about the parameters of that distribution (e.g. mean, variance).
Most of the well-known elementary statistical methods are parametric.
Assumption: the sample is large enough to represent the population (e.g. sample size around 30).
Not distribution-free (these methods require a probability distribution).
Nonparametric statistics

Nonparametric statistics (distribution-free statistics)






Do not rely on the assumption that the data are drawn from a given probability distribution (the data model is not specified).
They are the opposite of parametric statistics.
They have their own nonparametric statistical models, inference, and statistical tests.
They are widely used for studying populations that take on a ranked order (e.g. movie reviews from one to four stars, opinions about hotel rankings), so they suit ordinal data.
They make fewer assumptions, so they can be applied in situations where less is known about the application.
They may require a larger sample size than parametric statistics to draw a conclusion with the same degree of confidence.
Nonparametric statistics

Nonparametric statistics (distribution-free
statistics)

Data with frequencies or percentage



Number of kids in different grades
The percentage of people receiving social security
Chi-square allows you to test whether a sample of
data came from a population with a specific
distribution.
One-sample chi-square

One-sample chi-square or goodness of fit test
includes only one dimension



Whether the number of respondents is equally
distributed across all levels of education.
Whether the voting for the school voucher has a
pattern of preference.
Two-sample chi-square includes two
dimensions

Whether preference for the school voucher is
independent of political party affiliation and gender
Example
Level of Education    Number of respondents
No College            25
Some College          42
College Degree        17
Total                 84
Question: whether the number of respondents is equally
distributed across all levels of education?
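Approach:
1. Compute the expected frequency for each category under an equal distribution: 84/3 = 28.
2. Calculate the difference between the observed and the expected frequency in each of the three categories.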
Compute chi-square
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
One-sample chi-square test
O: the observed frequency
E: the expected frequency
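As a minimal sketch, the chi-square formula can be applied in Python to the level-of-education counts from the earlier example (25, 42, 17), assuming equal expected frequencies of 84/3 = 28. The function name `chi_square_statistic` is illustrative and not part of the course material.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts: No College, Some College, College Degree
observed = [25, 42, 17]
# Equal distribution under the null hypothesis: 84 / 3 = 28 per category
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 11.64
```

The squared differences are 9, 196, and 121, so the statistic is 326/28, or about 11.64.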
Example
Question: Whether the number of respondents is
equally distributed across all opinions
One-sample chi-square
Preference for School Voucher    Frequency
for                              23
maybe                            17
against                          50
total                            90
Chi-square steps

Step1: a statement of null and research
hypothesis
Null hypothesis: there is no difference in the frequency or proportion in each category.
H0: P1 = P2 = P3
Research hypothesis: there is a difference in the frequency or proportion in each category.
H1: P1 ≠ P2 ≠ P3
Chi-square steps

Step2: setting the level of risk (or the level of
significance or Type I error) associated with
the null hypothesis

0.05
Chi-square steps

Step3: selection of proper test statistic

Frequency → nonparametric procedures → chi-square
Chi-square steps

Step4. Computation of the test statistic value
(called the obtained value)
Category   Observed frequency (O)   Expected frequency (E)   D (difference)   (O-E)^2   (O-E)^2/E
for        23                       30                       7                49        1.63
maybe      17                       30                       13               169       5.63
against    50                       30                       20               400       13.33
Total      90                       90                                                  20.60
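The Step 4 computation can be reproduced row by row with a short Python sketch. It uses the school voucher counts from the example (23, 17, 50) and assumes an equal expected frequency of 90/3 = 30 per category; the variable names are illustrative.

```python
categories = ["for", "maybe", "against"]
observed = [23, 17, 50]
expected_each = sum(observed) / len(observed)  # 90 / 3 = 30

obtained = 0.0
for name, o in zip(categories, observed):
    diff = abs(o - expected_each)            # D (difference)
    squared = diff ** 2                      # (O-E)^2
    contribution = squared / expected_each   # (O-E)^2 / E
    obtained += contribution
    print(f"{name:8s} O={o:3d} E={expected_each:.0f} D={diff:.0f} "
          f"(O-E)^2={squared:.0f} (O-E)^2/E={contribution:.2f}")

print(f"obtained chi-square = {obtained:.2f}")  # obtained chi-square = 20.60
```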
Chi-square steps

Step5: Determination of the value needed for
rejection of the null hypothesis using the
appropriate table of critical values for the
particular statistic




Table B5
df = r - 1 (r = number of categories)
If the obtained value > the critical value → reject the null hypothesis
If the obtained value < the critical value → do not reject the null hypothesis
Chi-square steps

Step6: a comparison of the obtained value
and the critical value is made

Obtained value 20.6 vs. critical value 5.99 (df = 2, level of significance 0.05)
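The comparison can be sketched in Python. For the special case df = 2, the right-tail probability of the chi-square distribution has the closed form exp(-x/2), so the critical value at level of significance alpha is -2 ln(alpha). This shortcut is an assumption that holds only for df = 2; for other degrees of freedom a table of critical values or a statistics library is needed. Variable names are illustrative.

```python
import math

alpha = 0.05      # level of significance from Step 2
obtained = 20.6   # obtained value from Step 4
df = 3 - 1        # r - 1, with r = 3 categories

# Valid only for df == 2: P(X > x) = exp(-x / 2), so x_crit = -2 * ln(alpha)
assert df == 2
critical = -2 * math.log(alpha)

print(round(critical, 2))   # 5.99
reject_null = obtained > critical
print(reject_null)          # True
```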
Chi-square steps

Step 7 and 8: decision time

What is your conclusion, why and how to
interpret?
Excel functions


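CHIDIST(x, degrees_freedom): returns the right-tailed probability of the chi-square distribution for the value x
CHITEST(actual_range, expected_range): returns the probability (p-value) of the chi-square test comparing observed and expected frequencies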
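For readers working outside Excel, a rough Python analogue of these two functions can be sketched with the standard library alone. This is not Excel's implementation: the hypothetical helper `chi_square_right_tail` approximates the right-tail probability with a series expansion of the incomplete gamma function, and `chi_square_test` assumes a single row or column of counts, so that df = number of categories - 1.

```python
import math

def chi_square_right_tail(x, df):
    """Approximate P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z)
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_square_test(observed, expected):
    """p-value for one dimension of counts, with df = number of categories - 1."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square_right_tail(statistic, len(observed) - 1)

# School voucher example: observed 23, 17, 50 against an expected 30 each
p_value = chi_square_test([23, 17, 50], [30, 30, 30])
print(p_value < 0.05)  # True
```

A p-value below the 0.05 level of significance leads to the same decision as comparing the obtained value with the critical value.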
More non parametric statistics

Table 15.1 (P297)
Exercises




S-p298: 1, 2, 3