Lecture 41 - Test of Goodness of Fit
Section 14.1 – 14.3
Wed, Nov 14, 2007
Count Data
Count data – Data that counts the number
of observations that fall into each of
several categories.
Count Data
The data may be univariate or bivariate.
Univariate example – Observe a person’s
opinion on a subject (strongly agree,
agree, etc.).
Bivariate example – Observe a person’s opinion on
a subject and their education level (< high
school, high school, etc.).
Univariate Example
Observe a person’s opinion on a question.
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
     100          120      80         50             50
Bivariate Example
Observe each person’s opinion and
education level.
                 Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
< High School          35           25       30         5              5
High School            30           40       15        10              5
College                20           40       15        15             20
> College              15           15       20        20             20
The Two Basic Questions
For univariate data, do the data fit a
specified distribution?
For example, could these data have come
from a uniform distribution?
Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
     100          120      80         50             50
The Two Basic Questions
For bivariate data, for the various values of
one of the variables, does the other
variable show the same distribution?
Could each row have come from the same
distribution?
                 Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree
< High School          35           25       30         5              5
High School            30           40       15        10              5
College                20           40       15        15             20
> College              15           15       20        20             20
Observed and Expected Counts
Observed counts – The counts that were
actually observed in the sample.
Expected counts – The counts that would
be expected if the null hypothesis were
true.
Tests of Goodness of Fit
The goodness-of-fit test applies only to
univariate data.
The null hypothesis specifies a discrete
distribution for the population.
We want to determine whether a sample
from that population supports this
hypothesis.
Examples
If we roll a fair die 60 times, we expect 10 of
each number.
If we get frequencies 8, 10, 14, 12, 9, 7, does
that indicate that the die is not fair?
What would the distribution be if the die were fair?
Examples
If we toss a fair coin twice, we should get two
heads ¼ of the time, two tails ¼ of the
time, and one of each ½ of the time.
Suppose we toss a coin twice, repeat this 100 times, and get
two heads 16 times, two tails 36 times, and
one of each 48 times. Is the coin fair?
Examples
If we selected 20 people from a group that
was 60% male and 40% female, we would
expect to get 12 males and 8 females.
If we got 15 males and 5 females, would that
indicate that our selection procedure was not
random (i.e., discriminatory)?
What if we selected 100 people from the
group and got 75 males and 25 females?
Null Hypothesis
The null hypothesis specifies the
probability (or proportion) for each
category.
Each probability is the probability that a
random observation would fall into that
category.
Null Hypothesis
To test a die for fairness, the null hypothesis
would be
H0: p1 = 1/6, p2 = 1/6, …, p6 = 1/6.
The alternative hypothesis will always be a
simple negation of H0:
H1: At least one of the probabilities is not 1/6.
or more simply,
H1: H0 is false.
Level of Significance
Let α = 0.05.
The test statistic will involve the expected
counts.
Expected Counts
To find the expected counts, we apply the
hypothetical probabilities to the sample
size.
For example, if the hypothetical
probabilities are 1/6 and the sample size is
60, then the expected counts are
(1/6) × 60 = 10.
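The rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `expected_counts` is hypothetical, and exact fractions are used so that the arithmetic matches the hand calculation. It is applied to the die example and to the two-coin example from earlier in the lecture.

```python
from fractions import Fraction

def expected_counts(probabilities, sample_size):
    """Multiply each hypothesized probability by the sample size."""
    return [p * sample_size for p in probabilities]

# Fair die, 60 rolls: each face has probability 1/6.
die_expected = expected_counts([Fraction(1, 6)] * 6, 60)

# Fair coin tossed twice, 100 repetitions:
# two heads 1/4, one of each 1/2, two tails 1/4.
coin_expected = expected_counts(
    [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)], 100
)

print([int(e) for e in die_expected])
print([int(e) for e in coin_expected])
```

Expected counts need not be whole numbers in general; they happen to be integers in these two examples.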
Example
The test statistic will be the χ² statistic.
Make a chart showing both the observed
and expected counts (in parentheses).
Face       1         2         3         4         5         6
Count    8 (10)   10 (10)   14 (10)   12 (10)    9 (10)    7 (10)
The Chi-Square Statistic
Denote the observed counts by O and the
expected counts by E.
Define the chi-square (χ²) statistic to be
\[
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
\]
The Chi-Square Statistic
Clearly, if all of the deviations O – E are
small, then χ² will be small.
But if even a few of the deviations O – E are
large, then χ² will be large.
\[
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
\]
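As a rough illustration of how the size of the deviations drives the statistic, the sketch below applies the formula to the selection example from earlier in the lecture (60% male, 40% female). The function name `chi_square_statistic` is hypothetical. The observed proportions are the same in both samples, but the deviations O – E are larger in the sample of 100, so the statistic is larger.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 20 people selected: observed 15 males, 5 females; expected 12 and 8.
stat_20 = chi_square_statistic([15, 5], [12, 8])

# 100 people selected: observed 75 males, 25 females; expected 60 and 40.
stat_100 = chi_square_statistic([75, 25], [60, 40])

print(stat_20)   # 9/12 + 9/8 = 1.875
print(stat_100)  # 225/60 + 225/40 = 9.375
```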
The Value of the Test Statistic
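Now calculate χ².
\[
\begin{aligned}
\chi^2 &= \frac{(8 - 10)^2}{10} + \frac{(10 - 10)^2}{10} + \frac{(14 - 10)^2}{10} + \frac{(12 - 10)^2}{10} + \frac{(9 - 10)^2}{10} + \frac{(7 - 10)^2}{10} \\
&= 0.4 + 0.0 + 1.6 + 0.4 + 0.1 + 0.9 \\
&= 3.4
\end{aligned}
\]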
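The same calculation can be checked cell by cell. This is a minimal sketch using the observed and expected counts from the die example; the variable names are illustrative only.

```python
observed = [8, 10, 14, 12, 9, 7]   # counts for faces 1 through 6
expected = [10] * 6                # 60 rolls of a fair die

# Contribution of each cell: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

for face, c in enumerate(contributions, start=1):
    print(f"face {face}: {c:.1f}")
print(f"chi-square = {chi_square:.1f}")
```

Listing the contributions separately shows which cells account for most of the statistic; here face 3 contributes 1.6 of the total 3.4.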
Compute the p-Value
To compute the p-value of the test statistic,
we need to know more about the
distribution of χ².
Chi-Square Degrees of Freedom
The chi-square distribution has an
associated degrees of freedom, just like
the t distribution.
Each chi-square distribution has a slightly
different shape, depending on the number
of degrees of freedom.
In this test, df is one less than the number
of cells.
Chi-Square Degrees of Freedom
[Figure: density curves of χ²(2), χ²(5), and χ²(10); horizontal axis from 0 to 20, vertical axis from 0 to 0.5.]
Properties of χ²
The chi-square distribution with df degrees
of freedom has the following properties.
χ² ≥ 0.
It is unimodal.
It is skewed right (not symmetric!)
Mean: μ = df.
Standard deviation: σ = √(2df).
Properties of χ²
If df is large, then χ²(df) is approximately
normal with mean df and standard
deviation √(2df).
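The mean and standard deviation can be checked informally by simulation. The sketch below relies on a standard fact that is not stated in this lecture: a χ²(df) variable can be generated as the sum of df squared independent standard normal values. The helper name `draw_chi_square`, the seed, and the number of draws are arbitrary choices made for illustration.

```python
import math
import random

df = 128
normal_mean = df                 # mean of the approximating normal
normal_sd = math.sqrt(2 * df)    # standard deviation: sqrt(256) = 16

def draw_chi_square(df, rng):
    """One chi-square(df) draw: sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(0)           # fixed seed so the run is repeatable
samples = [draw_chi_square(df, rng) for _ in range(2000)]

sample_mean = sum(samples) / len(samples)
sample_sd = math.sqrt(
    sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
)

print(f"theory:     mean = {normal_mean}, sd = {normal_sd}")
print(f"simulation: mean = {sample_mean:.1f}, sd = {sample_sd:.1f}")
```

The simulated mean and standard deviation should land close to 128 and 16, though the exact figures depend on the seed.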
Chi-Square vs. Normal
[Figure: density curves of χ²(128) and N(128, 16) plotted together; horizontal axis from 100 to 160, vertical axis from 0 to 0.025.]
TI-83 – Chi-Square Probabilities
To find a chi-square probability (p-value) on the
TI-83,
Press DISTR.
Select χ²cdf (item #7).
Press ENTER.
Enter the lower endpoint, the upper endpoint, and the
degrees of freedom.
Press ENTER.
The probability appears.
Computing the p-value
The number of degrees of freedom is 1 less than
the number of categories in the table.
In this example, df = 5.
To find the p-value, use the TI-83 to calculate the
probability that χ²(5) would be at least as large
as 3.4.
p-value = χ²cdf(3.4, E99, 5) = 0.6386.
Since the p-value 0.6386 is greater than α = 0.05, we do not reject H0 (accept H0); the data are consistent with a fair die.
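For readers without a TI-83, the upper-tail probability can be approximated with the Python standard library alone. This is a sketch, not the calculator's algorithm: the function `chi_square_upper_tail` is a hypothetical helper that evaluates the chi-square cumulative probability through a power series for the regularized lower incomplete gamma function, which is adequate for moderate values such as the one here.

```python
import math

def chi_square_upper_tail(x, df):
    """Approximate P(chi-square(df) >= x) using a series expansion."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series: sum over n of z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # Multiply by z^a e^(-z) / Gamma(a) to get the lower-tail probability.
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

observed = [8, 10, 14, 12, 9, 7]
expected = [10] * 6
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1           # one less than the number of cells
p_value = chi_square_upper_tail(statistic, df)

alpha = 0.05
print(f"chi-square = {statistic:.1f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The printed p-value should agree with the TI-83 result of 0.6386 to four decimal places. If SciPy is installed, `scipy.stats.chisquare(observed, f_exp=expected)` is an alternative that returns both the statistic and the p-value.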