The mystery of the CHI SQUARE
Is it CHEE square
Or
CHAI Square?!
χ² (Chi-Square)
Goodness of fit
There is a single test that can be
applied to see if the observed
sample distribution is significantly
different in some way from the
hypothesized population
distribution
Accidents on Cellphones
Are you more likely to have a motor vehicle collision when using
a cell phone? A study of 699 drivers who were using a cell phone
when they were involved in a collision examined this question.
These drivers made 26,798 cell phone calls during a 14-month
study period. Each of the 699 collisions was classified in various
ways. Here are the counts for each day of the week:
Hypotheses:
H0: Motor vehicle accidents involving cell
phone use are equally likely to occur on
each of the seven days of the week.
Ha: The probabilities of a motor vehicle
accident involving cell phone use vary from
day to day (that is, they are not all the same).
Chi-square procedure:
In general, the expected count for any categorical variable
is obtained by multiplying the proportion of the distribution
for each category by the sample size.
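The expected-count rule can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `expected_counts` is an assumption made for this example, and the equal 1/7 proportions come from H0 of the cell phone study.

```python
def expected_counts(proportions, sample_size):
    """Multiply each hypothesized proportion by the sample size."""
    return [p * sample_size for p in proportions]

# H0 for the cell phone study: each of the 7 days is equally likely.
day_proportions = [1 / 7] * 7
cell_expected = expected_counts(day_proportions, 699)

# Every day gets the same expected count, 699 / 7 (about 99.86).
print([round(e, 2) for e in cell_expected])
```

The expected counts do not need to be whole numbers, and they always add up to the sample size.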
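Chi-square test statistic
χ² = Σ (observed − expected)² / expected, summed over all categories
Under H0, the expected count for each day is 699 / 7 ≈ 99.86.
For Sunday: (observed Sunday count − 99.86)² / 99.86
For Monday: (observed Monday count − 99.86)² / 99.86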
Finding the p-value
Degrees of freedom: number of categories − 1
df: 7 − 1 = 6
Calculator syntax: 2nd - VARS - 8 (enter)
χ²cdf( test statistic, 1E99, df )
χ²cdf( 208.84, 1E99, 6 )
p = 2.48 × 10^-42
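As a cross-check on the calculator result, the upper-tail area can also be computed directly. The sketch below relies on a closed form that holds only when df is even (here df = 6). The function name `chi2_upper_tail_even_df` is made up for this example and is not a library call.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x), valid for even df only.

    For even df the tail equals exp(-x/2) * sum of (x/2)^k / k! for k < df/2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series

# Same inputs as the calculator call: test statistic 208.84, df = 6.
p_cell = chi2_upper_tail_even_df(208.84, 6)
print(f"{p_cell:.2e}")
```

The result should be on the order of 10^-42, in line with the calculator output. For odd df a different formula or a statistics library would be needed.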
Conclusion
H0: Motor vehicle accidents involving cell phone use are equally
likely to occur on each of the seven days of the week.
Ha: The probabilities of a motor vehicle accident involving cell
phone use vary from day to day (that is, they are not all the same).
Since the p-value is extremely small (p =
2.48 × 10^-42), there is sufficient evidence to
reject H0 and conclude that these types of
accidents are not equally likely to occur on
each of the seven days of the week.
Red Eye Fruit Fly
Any offspring receiving an R gene will have red eyes,
and any offspring receiving a C gene will have straight
wings. So based on this Punnett square, the biologists
predict a ratio of 9 red-eyed, straight-winged (x) : 3 red-eyed, curly-winged (y) : 3 white-eyed, straight-winged
(z) : 1 white-eyed, curly-winged (w) offspring. To test
their hypothesis about the distribution of offspring, the
biologists mate the fruit flies. Of 200 offspring, 99 had
red eyes and straight wings, 42 had red eyes and curly
wings, 49 had white eyes and straight wings, and 10 had
white eyes and curly wings. Do these data differ
significantly from what the biologists have predicted?
Given Distribution
                               Parents (ratio)   Proportion   Offspring
Red-eyed, straight-winged             9            0.5625         99
Red-eyed, curly-winged                3            0.1875         42
White-eyed, straight-winged           3            0.1875         49
White-eyed, curly-winged              1            0.0625         10
Total                                16            1             200
H0: these proportions are correct for the offspring of the 2 parents
Ha: at least one of these proportions is incorrect
Conditions and calculations:
We can use a chi-square goodness of fit test to measure the strength of the
evidence against the hypothesized distribution, provided that the expected cell
counts are large enough.
Sample                         Ratio   Proportion   Observed   Expected
Red-eyed, straight-winged        9       0.5625        99      (200)(0.5625) = 112.5
Red-eyed, curly-winged           3       0.1875        42      (200)(0.1875) = 37.5
White-eyed, straight-winged      3       0.1875        49      (200)(0.1875) = 37.5
White-eyed, curly-winged         1       0.0625        10      (200)(0.0625) = 12.5
Total                           16                    200
χ²cdf(6.187, 1E99, 3)
p = 0.1029
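The fruit fly calculation can be reproduced step by step in Python. This is a sketch under stated assumptions: the helper `chi2_upper_tail_df3` is a hypothetical name, and its formula is valid for df = 3 only. The counts and the 9:3:3:1 ratio are taken from the table.

```python
import math

observed = [99, 42, 49, 10]   # offspring counts from the table
ratio = [9, 3, 3, 1]          # predicted 9:3:3:1 ratio
n = sum(observed)             # 200 offspring in total

# Expected count = sample size times hypothesized proportion.
expected = [n * r / sum(ratio) for r in ratio]

# Each category contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

def chi2_upper_tail_df3(x):
    """Upper-tail area for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_upper_tail_df3(chi_square)
print(round(chi_square, 3), round(p_value, 4))
```

Printing `contributions` shows that the white-eyed, straight-winged category accounts for more than half of the statistic, so that is where the sample departs most from the prediction.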
Interpretations
The P-value of 0.1029 indicates that the probability of
obtaining a sample of 200 fruit fly offspring in which
the proportions differ from the hypothesized values by
at least as much as the ones in our sample is over
10%, assuming that the null hypothesis is true. This is
not sufficient evidence to reject the biologists'
predicted distribution.
Your Turn
Course grades: Most students in a large college statistics course are taught by
teaching assistants (TAs). One section is taught by the course supervisor, a full-time professor. The distribution of grades for the hundreds of students taught by
TAs this semester was
A: 32%, B: 41%, C: 20%, D/F: 7%
The grades assigned by the
professor to the 91 students in
his section were
A: 22, B: 38, C: 20, D/F: 11
(a) What percents of students in the professor's section earned A, B, C, and D/F? In what ways
does this distribution of grades differ from the TA distribution?
(b) Because the TA distribution is based on hundreds of students, we are willing to regard it as
a fixed probability distribution. If the professor's grading follows this distribution, what are the
expected counts of each grade in his section?
(c) Does the chi-square test for goodness of fit give good evidence that the professor's grade
distribution differs from the TA distribution? Use the Inference Toolbox.
Answers:
(a) “A”: 24.2%, “B”: 41.8%, “C”: 22.0%, “D/F”: 12.1%. Fewer A's and
more D/F's than the TA sections.
(b) “A”: 29.12, “B”: 37.31, “C”: 18.20, “D/F”: 6.37.
(c) H0: p1 = 0.32, p2 = 0.41, p3 = 0.20, p4 = 0.07 vs. Ha: at least one of these
proportions is different. All the expected counts are greater than 5, so the
condition for χ² is satisfied.
χ² = 5.297 (df = 3), so the P-value = 0.1513; there is not enough evidence
to conclude that the professor's grade distribution was different from the
TA grade distribution.
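These answers can be checked with a short script. The observed counts 22, 38, 20, and 11 are an assumption: they are back-calculated from the percentages in answer (a) and the section size of 91. The TA proportions are the ones stated in H0.

```python
ta_proportions = {"A": 0.32, "B": 0.41, "C": 0.20, "D/F": 0.07}
# Assumed counts, reconstructed from the percents in answer (a) times 91.
professor_counts = {"A": 22, "B": 38, "C": 20, "D/F": 11}

n = sum(professor_counts.values())

# (a) percent of the professor's students earning each grade
percents = {g: 100 * c / n for g, c in professor_counts.items()}

# (b) expected counts if the professor followed the TA distribution
expected = {g: n * p for g, p in ta_proportions.items()}

# (c) large-counts condition, chi-square statistic, degrees of freedom
condition_ok = all(e >= 5 for e in expected.values())
chi_square = sum(
    (professor_counts[g] - expected[g]) ** 2 / expected[g] for g in expected
)
df = len(expected) - 1

print({g: round(v, 1) for g, v in percents.items()})
print({g: round(v, 2) for g, v in expected.items()})
print(condition_ok, round(chi_square, 3), df)
```

Most of the statistic comes from the D/F category, where the professor gave 11 grades against an expected 6.37, yet the total is still too small to reject H0 at the usual levels.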
Chi-Sq. Practice (with probability model)
Thai, the manager of a car dealership, did not want
to stock cars that were bought less frequently
because of their unpopular color. The five colors
that she ordered were red, yellow, green, blue, and
white. According to Thai, the expected frequencies,
or numbers of customers choosing each color,
should follow the percentages of last year. She felt
20% would choose yellow, 30% would choose red,
10% would choose green, 10% would choose blue,
and 30% would choose white. She then took a
random sample of 150 customers and asked them
their color preferences.
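Before running the test, it helps to turn the probability model into expected counts and confirm they are large enough. The sketch below is illustrative only. The variable names are assumptions, and the percentages and sample size come from the problem statement.

```python
# Last year's percentages, used as the hypothesized probability model.
last_year = {"yellow": 0.20, "red": 0.30, "green": 0.10, "blue": 0.10, "white": 0.30}
sample_size = 150

# The model must be a full probability distribution.
assert abs(sum(last_year.values()) - 1.0) < 1e-9

# Expected count for each color = sample size times model proportion.
expected_by_color = {c: sample_size * p for c, p in last_year.items()}
smallest_expected = min(expected_by_color.values())

for color, count in expected_by_color.items():
    print(f"{color}: {count:.0f}")
print("all expected counts at least 5:", smallest_expected >= 5)
```

With five colors there are 5 − 1 = 4 degrees of freedom for the chi-square test.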
Hypotheses:
H0: the proportions of customers' car color preferences
follow last year's percentages.
H0: p(yellow) = 0.20, p(red) = 0.30, p(green) = 0.10, p(blue) = 0.10, p(white) = 0.30
Ha: at least one of the proportions of customers' car color
preferences differs from last year's percentages.
Chi-square procedure:
χ² = 26.95
P-value = 2.03 × 10^-5
Conclusion
Since our p-value is small, we have
sufficient reason to reject the null
hypothesis; the test result is
significant. Therefore, customers' car
color preferences differ significantly
from last year's percentages.