1-D Chi-square

1
Outline
1. Count data
2. Properties of the multinomial experiment
3. Testing the null hypothesis
4. Examples
2
Count data
• Sometimes, the data we have to analyze are
produced by counting things.
– How many people choose each of Brands A, B,
and C of coffee?
3
Count data
• Usually, we count things in a sample in order
to make an inference to a population.
– E.g., are the proportions of people choosing each
brand different from one another?
– Or, are the proportions of people choosing each
brand different from some hypothetical values in
the population?
4
Count data
• To answer such questions, we need to know
approximately how much difference between
the various counts could be produced by
sampling error.
• We determine that quantity using the
‘multinomial probability distribution,’ an
extension of the binomial probability
distribution.
Properties of the Multinomial Experiment
1. There are n identical trials.
2. There are k possible outcomes on each trial.
3. The probabilities of the k outcomes are the
same across trials.
4. The trials are all independent of each other.
5. The multinomial random variables are the k
counts n1, n2, …, nk (the number of trials that
produce each outcome), so n1 + n2 + … + nk = n.
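To make the link to the binomial distribution concrete, here is a minimal Python sketch of the multinomial probability of one particular set of counts. The function name `multinomial_probability` and the two-flip example are illustrative assumptions, not part of the slides.

```python
import math

def multinomial_probability(counts, probs):
    """P(n1, ..., nk) = n! / (n1! ... nk!) * p1**n1 * ... * pk**nk."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for c in counts:
        # Each partial quotient is itself an integer, so // is exact here.
        coefficient //= math.factorial(c)
    probability = float(coefficient)
    for c, p in zip(counts, probs):
        probability *= p ** c
    return probability

# With k = 2 the formula reduces to the binomial:
# exactly 1 head and 1 tail in 2 flips of a fair coin.
two_flip = multinomial_probability([1, 1], [0.5, 0.5])
```

With k = 2 outcomes the function gives the familiar binomial answer, which is the sense in which the multinomial extends the binomial.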
5
6
Testing the null hypothesis
• We often want to test the null hypothesis that all
the categories are equal in frequency.
• If we asked 60 people which of Brands A, B, and C
they prefer, equal frequency would look like this:
A    B    C
20   20   20
7
Testing the null hypothesis
• At other times, we might want to test a specific
null hypothesis, such as that B and C are equally
popular, but A is twice as popular as either:
A    B    C
30   15   15
• In both cases, we call the values shown the
“expected values.”
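The two sets of expected values above can be reproduced with a short Python sketch. It assumes the null hypothesis is written as relative weights (1:1:1 for equal popularity, 2:1:1 when A is twice as popular); the helper name `expected_counts` is hypothetical.

```python
def expected_counts(n, weights):
    """Split n observations in proportion to the null-hypothesis weights."""
    total = sum(weights)
    return [n * w / total for w in weights]

equal = expected_counts(60, [1, 1, 1])     # all brands equally popular
a_double = expected_counts(60, [2, 1, 1])  # A twice as popular as B or C
```

Here `equal` holds 20 for each brand and `a_double` holds 30, 15, 15, matching the two tables.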
8
Testing the null hypothesis
• The null hypothesis can be tested using the
statistic χ²:
$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$
• χ² increases as the observed values, n_i, get
further from the expected values, E(n_i).
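A minimal Python sketch of the statistic shows how it grows as the observed counts move away from the expected ones. The function name `chi_square` is an assumption, and the observed counts below are hypothetical numbers chosen only for illustration.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)**2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [20, 20, 20]                           # equal-frequency null, n = 60
perfect_fit = chi_square([20, 20, 20], expected)  # observed equals expected
small_gap = chi_square([22, 20, 18], expected)    # hypothetical counts
large_gap = chi_square([30, 20, 10], expected)    # hypothetical counts
```

A perfect fit gives 0, and the statistic increases from `small_gap` to `large_gap` as the counts drift further from 20.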
9
Chi-square – example
• Suppose we want to know whether there is
any population preference for brands of coffee
among brands A, B, and C.
• We need to know two things:
– How should choices among the brands be
distributed in a sample if there is no preference
(all are equally popular)?
– How are choices distributed in our sample?
10
Chi-square – example
• We ask a sample of 90 people for their
preference
• If there is no preference, each brand should
be chosen by ⅓ of the people asked:
A    B    C
30   30   30
These are the “expected values”: the values
expected if the null hypothesis is true.
11
Chi-square – example
• We ask a sample of 90 people for their
preference
• The actual choices look like this:
A    B    C
15   42   33
These are the “observed values.”
12
Expected vs. Observed Values
            A    B    C
Expected    30   30   30   (each value = ⅓ × 90)
Observed    15   42   33
13
Chi-square – example
$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$
$$\chi^2 = \frac{(15-30)^2}{30} + \frac{(42-30)^2}{30} + \frac{(33-30)^2}{30} = \frac{225 + 144 + 9}{30} = 12.6$$
Chi-square – the formal hypothesis test
H0: PA = PB = PC = ⅓
HA: Something different: at least one P differs from ⅓
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$
where d.f. = k − 1 (k = number of categories)
14
Chi-square – the formal hypothesis test
• Rejection region: χ²obt > χ²crit = χ²(.05, 2) = 5.9915
• (Note: the rejection region is always χ²obt > χ²crit,
the upper tail.)
• Decision: since χ²obt = 12.6 > χ²crit = 5.9915, reject H0.
The brands are not equally popular.
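The critical value above comes from a chi-square table. As a cross-check, the sketch below computes the upper-tail probability P(χ² > x) for integer d.f. using only Python's standard library; rejecting when this probability is below α is equivalent to rejecting when χ²obt > χ²crit. The function name `chi2_upper_tail` is hypothetical, and the p-value framing is an addition to the slides, which work only with critical values.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h) = erfc(sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

p_coffee = chi2_upper_tail(12.6, 2)     # coffee example, d.f. = 2
p_at_crit = chi2_upper_tail(5.9915, 2)  # tabled critical value for alpha = .05
```

`p_at_crit` comes out very close to .05, confirming the tabled value, and `p_coffee` is well below .05, in line with the decision to reject.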
15
16
Chi-square – Example 1
At a recent meeting of the Coin Flippers Society, each member
flipped three coins simultaneously and the number of tails
occurring was recorded. Shown below are the numbers of
members who had certain numbers of tails. Is there evidence
that the coin flipping outcomes were different from what would
be expected if all the coins used were fair? (α = .01)
Number of Tails    Number of Members
0                  65
1                  182
2                  194
3                  59
17
Chi-square – Example 1
• Shown below are the numbers of members who
had certain numbers of tails.
• Number of tails = the categories people fall into
• Number of members = number of people in each
category.
• Number of members is the dependent variable.
Do you see why?
18
Chi-square – Example 1
• To begin, we need to compute the expected
values for each of the categories. That is, we
need to figure out how many of our 500
members would fall into each category if all
the coins used were fair.
• Wait a minute! How do we know there are
500 members?
19
Chi-square – Example 1
Add up the members in the table: 65 + 182 + 194 + 59 = 500.
Number of Tails    Number of Members
0                  65
1                  182
2                  194
3                  59
Σ = 500
20
Chi-square – Example 1
• How many possible outcomes are there for
one trial (one member flipping three coins)?
HHH  HHT  HTH  THH
HTT  THT  TTH  TTT
There are 8 possible outcomes.
21
Chi-square – Example 1
• Of these eight possible outcomes, how many
involve getting 0 tails? Just one – HHH.
• How many involve getting 1 tail? 3 – HHT,
HTH, THH.
• How many involve getting 2 tails? 3 – HTT,
THT, TTH.
• How many involve getting 3 tails? 1 - TTT
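The counting above can be checked by enumerating every outcome in Python. This is a sketch that assumes fair coins, so all eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# All sequences of H/T for three coins: HHH, HHT, ..., TTT.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# Tally how many outcomes contain 0, 1, 2, or 3 tails.
tails_tally = Counter(outcome.count("T") for outcome in outcomes)

# With fair coins each outcome has probability 1/8.
null_probs = {tails: tally / len(outcomes)
              for tails, tally in sorted(tails_tally.items())}
```

The tallies 1, 3, 3, 1 out of 8 give the probabilities .125, .375, .375, .125 for 0 to 3 tails.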
22
Chi-square – Example 1
H0: P0 = .125, P1 = .375, P2 = .375, P3 = .125
HA: At least one P is different from the value specified
in H0.
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$
Rejection region: χ²obt > χ²crit = χ²(.01, 3) = 11.3449
23
Chi-square – Example 1
Now we compute the expected values using (a)
the probabilities in H0 and (b) our sample n = 500:
P0 × 500 = .125 × 500 = 62.5
P1 × 500 = .375 × 500 = 187.5
P2 × 500 = .375 × 500 = 187.5
P3 × 500 = .125 × 500 = 62.5
24
Chi-square – Example 1
$$\chi^2 = \frac{[65-62.5]^2}{62.5} + \frac{[182-187.5]^2}{187.5} + \frac{[194-187.5]^2}{187.5} + \frac{[59-62.5]^2}{62.5} = 0.68267$$
Decision: Do not reject H0, since χ²obt = 0.68267 is less than
χ²crit = 11.3449. There is no evidence that the coin
flipping outcomes were different from what would be
expected if all the coins used were fair.
25
Chi-square – Example 2
There is an “old wives’ tale” that babies don’t tend to be born
randomly during the day but tend more to be born in the middle
of the night, specifically between the hours of 1 AM and 5 AM.
To investigate this, a researcher collects birth-time data from a
large maternity hospital. The day was broken into 4 parts:
Morning (5 AM to 1 PM), Mid-day (1 PM to 5 PM), Evening (5 PM
to 1 AM), and Mid-night (1 AM to 5 AM). The number of births at
these times for the last three months (January to March) are
shown on the next slide.
26
Chi-square – Example 2
Morning    Mid-day    Evening    Mid-night
110        50         100        100
Does it appear that births are not randomly distributed
throughout the day? (α = .01)
27
Chi-square – Example 2
• The critical thing about a chi-square question is
usually the expected values. In the previous
example, we computed the expected values on
the basis of probabilities of various outcomes for
a fair coin.
• In this question, expected values for the number
of births in each segment of the day will be based
on one variable: how long in hours is each
segment.
28
Chi-square – Example 2
Morning: 5 AM to 1 PM = 8 hours
Mid-day: 1 PM to 5 PM = 4 hours
Evening: 5 PM to 1 AM = 8 hours
Mid-night: 1 AM to 5 AM = 4 hours
These periods are not all equal in length!
29
Chi-square – Example 2
If time of day were irrelevant to when babies are
born, we would expect every period of, say, 4 hours
to produce the same number of babies. Since the
Morning and Evening segments each contain two 4-hour
periods and the Mid-day and Mid-night
segments each contain one 4-hour period, the
expected proportions of births will be:
Morning    Mid-day    Evening    Mid-night
1/3        1/6        1/3        1/6
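These proportions are simply each segment's share of the 24-hour day. A small Python sketch using exact fractions makes that explicit; the dictionary layout and variable names are assumptions for illustration.

```python
from fractions import Fraction

# Segment lengths in hours, taken from the problem statement.
segment_hours = {"Morning": 8, "Mid-day": 4, "Evening": 8, "Mid-night": 4}
total_hours = sum(segment_hours.values())

# Under the null hypothesis, a segment's share of births equals
# its share of the day.
null_proportions = {name: Fraction(hours, total_hours)
                    for name, hours in segment_hours.items()}
```

Using `Fraction` keeps the values as exact thirds and sixths and makes it easy to confirm that the proportions sum to 1.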
30
Chi-square – Example 2
Our sample totals 360 babies. In 1/6 of a day (4
hours) we would expect 360/6 = 60 babies to be
born, under the null hypothesis, giving these
expected values for the four segments of the
day:
Morning    Mid-day    Evening    Mid-night
120        60         120        60
31
Chi-square – Example 2
H0: Pmorn = 1/3, Pmidday = 1/6, Peven = 1/3, Pmidnight = 1/6
HA: At least one P is different from the value specified in H0.
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$
Rejection region: χ²obt > χ²crit = χ²(.01, 3) = 11.3449
32
Chi-square – Example 2
$$\chi^2_{obt} = \frac{[110-120]^2}{120} + \dots + \frac{[100-60]^2}{60}$$
$$= \frac{100}{120} + \frac{100}{60} + \frac{400}{120} + \frac{1600}{60}$$
33
Chi-square – Example 2
χ²obt = 32.50
Decision: since χ²obt > χ²crit, reject H0. Births are not
randomly scattered throughout the day.
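The whole of Example 2 can be retraced in a few lines of Python. This is a sketch under stated assumptions: the critical value is the tabled χ²(.01, 3) = 11.3449 quoted in Example 1, chosen to match the α = .01 stated in the question, and all variable names are illustrative.

```python
observed = [110, 50, 100, 100]  # Morning, Mid-day, Evening, Mid-night
hours = [8, 4, 8, 4]            # length of each segment in hours

n = sum(observed)               # total births in the sample
# Expected births are proportional to segment length under the null.
expected = [n * h / sum(hours) for h in hours]

chi2_obt = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2_crit = 11.3449             # assumed: tabled value for alpha = .01, d.f. = 3
reject_null = chi2_obt > chi2_crit
```

The statistic works out to 32.5, dominated by the Mid-night term (1600/60), and the null hypothesis is rejected.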