Lecture 5 - West Virginia University
Lecture 12
Dan Piett
STAT 211-019
West Virginia University
Last Week
Hypothesis Tests on a difference in means
Hypothesis Tests on a difference in proportions
The 2-sided alternative
Overview
Chi-Squared Goodness of Fit Test
Chi-Squared Test of Independence
Section 12.1
Chi-Squared Goodness of Fit Test
Multinomial Data
Previously we have looked at data coming from a binomial
distribution
2 Outcomes (Success, Failure)
Example: Flipping a coin (Heads, Tails)
Suppose we are interested in data with more than 2
outcomes
Example: Rolling a die
6 Outcomes (1, 2, 3, 4, 5, 6)
We obtain multinomial data from a multinomial experiment
Multinomial Experiments
Multinomial Experiments follow these properties
1. Fixed number of trials, n
2. Each trial results in exactly one of K possible outcomes
3. Probability pi is the probability of getting outcome i on a
single trial, with p1 + p2 + p3 + … + pK = 1
4. Trials are independent
Finding Expected Frequencies
Remembering back to the binomial distribution
Expected Value = n*p
For our multinomial distribution we will have K expected
counts
Each Expected Count; Ei = n*pi
Example: Rolling a fair 6-sided die 600 times (pi = 1/6)
Outcome           1     2     3     4     5     6
Probability       1/6   1/6   1/6   1/6   1/6   1/6
Expected Counts   100   100   100   100   100   100
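The rule Ei = n*pi can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the use of `Fraction` (to avoid rounding 1/6) are assumptions, not part of the lecture.

```python
from fractions import Fraction


def expected_counts(n, probabilities):
    """Return the expected count E_i = n * p_i for each outcome."""
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return [n * p for p in probabilities]


# Fair six-sided die rolled 600 times: every p_i is 1/6.
die_probabilities = [Fraction(1, 6)] * 6
die_expected = expected_counts(600, die_probabilities)
print([int(e) for e in die_expected])
```

Exact fractions keep each expected count at exactly 100 rather than a value such as 99.99999999999999.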
Observed Frequencies
When we do our multinomial experiment, we will not
always get exactly our expected counts.
Example:
We expected 100 4’s on our dice experiment. Suppose we only get 85.
85 is our Observed Frequency; Oi
Our Observed Frequencies (Counts) are our actual data
Suppose on our 600 dice throws, these are our observed counts
Outcome           1     2     3     4     5     6
Expected Counts   100   100   100   100   100   100
Observed Counts   97    113   102   85    109   94
Chi-Squared Goodness of Fit Test
So the question to be asked when looking at a table like this is
“are our observed counts far enough from our expected
counts to determine that the expected counts are wrong?”
Outcome           1     2     3     4     5     6
Expected Counts   100   100   100   100   100   100
Observed Counts   97    113   102   85    109   94
This is what the Chi-Squared Goodness of Fit Test attempts to
answer.
Note that our test will follow the 7 step procedure
Chi-Squared Goodness of Fit Test
1. H0: p1 = #1, p2 = #2, … pK = #K
2. HA: At least one pi ≠ #i
3. Alpha is .05 if not specified
4. Test Statistic: $\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$
5. P-value will come from the Chi-Squared Table with df = K-1
P-value = P(Chi-Squared with K-1 df > Test Statistic)
There is only 1 alternative hypothesis
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that
at least one of our probabilities is incorrect.
We require that our expected counts at each cell are at least 5 and that our sample
is independent and random.
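As a rough illustration of the procedure, the sketch below applies it to the 600 dice throws shown earlier. The statistic is the usual sum of (Oi - Ei)^2 / Ei over the K outcomes. Instead of looking up a p-value, this sketch compares the statistic with the tabled critical value at alpha = .05, which leads to the same decision as "reject H0 if p-value < alpha". The names `chi_squared_statistic` and `CRITICAL_05` are assumptions made for this example.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Upper-tail chi-squared critical values at alpha = .05 for df = 1..5,
# copied from a standard chi-squared table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

observed = [97, 113, 102, 85, 109, 94]   # observed counts from the 600 throws
expected = [100] * 6                     # E_i = 600 * (1/6) under H0

# Requirement from the slide: every expected count is at least 5.
if min(expected) < 5:
    raise ValueError("expected counts must all be at least 5")

statistic = chi_squared_statistic(observed, expected)
df = len(observed) - 1                   # df = K - 1
reject_h0 = statistic > CRITICAL_05[df]  # same decision as p-value < .05

print(f"statistic = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

With these counts the statistic falls below the critical value, so the data do not give enough evidence at the .05 level that the die is unfair.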
Example:
For Fall 2013, 99 STAT 211 students were given a choice of 3
section times (A,B,C) to take the final exam. The data that
follows shows the number of students who selected each
section. Does the data indicate that the students exhibit a
preference, or that all sections are equally likely to
be chosen? Use alpha = .05 (Hint: If all 3 are equally likely, all
pi’s will be 1/3)
Observed Counts:
A – 40
B – 30
C – 29
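One way to work this example is sketched below. It is not the lecture's own solution. With K = 3 sections there are 2 degrees of freedom, and for df = 2 specifically the upper-tail chi-squared area has the closed form exp(-x/2), so no table or external library is needed. All variable names are assumptions.

```python
import math

observed = {"A": 40, "B": 30, "C": 29}
n = sum(observed.values())            # 99 students
expected = n / len(observed)          # 33 per section under H0 (p_i = 1/3)

# Test statistic: sum of (O_i - E_i)^2 / E_i over the three sections.
statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # K - 1 = 2

# Upper-tail area of a chi-squared distribution; this closed form
# holds only when df = 2.
p_value = math.exp(-statistic / 2)

alpha = 0.05
reject_h0 = p_value < alpha

print(f"statistic = {statistic:.3f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Here the p-value is well above .05, so the sketch does not reject H0: the counts are consistent with all three sections being equally likely.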
Section 12.2
Chi-Squared Test for Independence
Association of Categorical Variables
Thus far, all of our confidence intervals and hypothesis tests
have been done on numeric variables.
We will now shift our attention to categorical variables
Ex: Eye Color, Class Rank
The question we wish to answer is, “is there an association
between two categorical variables?”
Ex: Is there an association between Eye Color and Hair Color?
We will use a Chi Squared Test to answer this question, but
first we need to discuss contingency tables.
Contingency Tables (Observed)
We can organize categorical data in a contingency table, with
r rows and c columns. This is known as an r x c (r by c)
contingency table. Note that a contingency table contains
observed counts
Example: Some Possible Values for Hair Color vs. Eye Color
Hair x Eye   Brown   Blue   Green
Black        110     20     8
Brown        65      22     9
Blonde       33      75     12
Contingency Tables (Expected)
Much like the goodness of fit test, we will need to calculate our
expected counts.
The formula for the expected counts is
Expected Count = (Row Total × Column Total) / Grand Total
So for the previous example
Hair x Eye   Brown        Blue        Green      Total
Black        110 (81.1)   20 (45.6)   8 (11.3)   138
Brown        65 (??)      22 (??)     9 (??)     96
Blonde       33 (??)      75 (??)     12 (??)    120
Total        208          117         29         354
We now have Observed and Expected counts, so we can do a Chi-
Squared Test for independence
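The expected counts for a contingency table are row total × column total / grand total for each cell. The sketch below applies that to the hair and eye color table, as one possible way to check the values in parentheses and to fill in the remaining cells. The dictionary layout and variable names are assumptions.

```python
# Observed counts: hair color (rows) by eye color (columns).
observed = {
    "Black":  {"Brown": 110, "Blue": 20, "Green": 8},
    "Brown":  {"Brown": 65,  "Blue": 22, "Green": 9},
    "Blonde": {"Brown": 33,  "Blue": 75, "Green": 12},
}

row_totals = {hair: sum(row.values()) for hair, row in observed.items()}
eye_colors = list(next(iter(observed.values())))
col_totals = {eye: sum(row[eye] for row in observed.values()) for eye in eye_colors}
grand_total = sum(row_totals.values())

# Expected count for each cell = row total * column total / grand total.
expected = {
    hair: {eye: row_totals[hair] * col_totals[eye] / grand_total for eye in eye_colors}
    for hair in observed
}

for hair, row in expected.items():
    print(hair, {eye: round(value, 1) for eye, value in row.items()})
```

The Black row reproduces the 81.1, 45.6 and 11.3 shown in the table.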
Chi-Squared Test for Independence
1. H0: Variable 1 and Variable 2 are independent
2. HA: Variable 1 and Variable 2 are not independent (dependent)
3. Alpha is .05 if not specified
4. Test Statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$
5. P-value will come from the Chi-Squared Table with df = (r-1)(c-1)
P-value = P(Chi-Squared with (r-1)(c-1) df > Test Statistic)
There is only 1 alternative hypothesis
6. Our decision rule will be to reject H0 if p-value < alpha
7. We have (do not have) enough evidence at the .05 level to conclude that the
variables are dependent.
We require that our expected counts at each cell are at least 5 and that our sample
is independent and random.
Example
Does “test failure” reduce academic aspirations and thereby
contribute to a decision to drop out of school? A random sample of 283
students is surveyed at schools with low graduation
rates. The contingency table below reports the responses to the
question “Do tests required for graduation discourage students
from staying in school?” Does there appear to be a relationship
between the schools’ location and the students’ responses?
Response x School   Urban   Suburban   Rural
Yes                 57      27         47
No                  23      16         12
Unsure              45      25         31
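A possible way to carry out the test on this table is sketched below. It is not the lecture's own solution. A 3 x 3 table gives df = (3-1)(3-1) = 4, and for df = 4 specifically the upper-tail chi-squared area has the closed form exp(-x/2) * (1 + x/2), so the p-value can be computed without a table. Variable names are assumptions.

```python
import math

# Rows: Yes, No, Unsure. Columns: Urban, Suburban, Rural.
observed = [
    [57, 27, 47],
    [23, 16, 12],
    [45, 25, 31],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)                          # 283 students

# Expected count = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Requirement from the procedure: every expected count is at least 5.
if min(min(row) for row in expected) < 5:
    raise ValueError("expected counts must all be at least 5")

# Test statistic: sum over all cells of (O - E)^2 / E.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)   # (r-1)(c-1) = 4

# Upper-tail chi-squared area; this closed form holds only when df = 4.
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

alpha = 0.05
reject_h0 = p_value < alpha

print(f"statistic = {statistic:.2f}, df = {df}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the p-value is far above .05, so the sketch does not find enough evidence that school location and response are dependent.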