Chapter 8 Slides


ANALYZING MORE
GENERAL SITUATIONS
UNIT 3
Unit Overview



In the first unit we explored tests of significance,
confidence intervals, generalization, and
causation mostly in terms of a single proportion.
In the second unit we compared two proportions,
two averages (independent), and paired data. All
of these had a binary categorical explanatory
variable.
In the third unit we will expand on this to
compare multiple proportions, multiple averages,
and two quantitative variables. The explanatory
variable will be categorical (not necessarily
binary) or quantitative.
Unit Overview
Throughout this unit we can still express
the null hypothesis in terms of no
association between the response and
the explanatory variable and the
alternative hypothesis in terms of an
association between the response and
the explanatory variable.
 We can also be more specific in our
hypotheses to reflect the type of data
(and hence parameters) we are working
with.

Comparing More Than Two
Groups Using Proportions
Chapter 8
Chapter Overview



In chapter 5, our explanatory variable had
two outcomes (like parents smoke or not)
and the response variable also had two
outcomes (like baby is boy or girl).
In this chapter we will allow the explanatory
variable to have more than two outcomes
(like both parents smoke, only mother
smokes, only father smokes, or neither
parent smokes).
We will still focus on the response variable
having two outcomes, but it doesn’t have to.
It can have many as well.
Section 8.1:
Simulation-Based Approach to
Compare Multiple Proportions
Example 8.1
Coming to a Stop
Stopping
Virginia Tech students investigated which
vehicles came to a stop at an intersection
where there was a four-way stop.
 While the students examined many factors
for an association with coming to a
complete stop, we will investigate whether
intersection arrival patterns are associated
with coming to a complete stop:




Vehicle arrives alone
Vehicle is the lead in a group of vehicles
Vehicle is a follower in a group of vehicles
Stopping
Null hypothesis: There is no association
between the arrival pattern of the vehicle
and whether it comes to a complete stop.
 Alternative hypothesis: There is an
association between the arrival pattern of
the vehicle and whether it comes to a complete
stop.

Stopping



Another way to write the null: the long-term
probability that a single vehicle will stop is
the same as the long-term probability that a lead
vehicle will stop, which is the same as the
long-term probability that a following vehicle
will stop.
Or 𝜋_Single = 𝜋_Lead = 𝜋_Follow, where 𝜋 is the
long-term probability that a vehicle will stop.
The alternative hypothesis is that not all
these probabilities are the same (at least
one is different).
Stopping

Percentage of vehicles
stopping:



85.8% of single
vehicles
90.5% of lead vehicles
77.6% of vehicles
following in a group
Arrival pattern     Complete Stop    Not Complete Stop    Total
Single Vehicle      151 (85.8%)      25 (14.2%)           176
Lead Vehicle        38 (90.5%)       4 (9.5%)             42
Following Vehicle   76 (77.6%)       22 (22.4%)           98
Total               265              51                   316
Stopping
Remember that no association implies the
proportion of vehicles that stop in each
category should be the same.
 Our question is, if the same proportion of
vehicles come to a complete stop in all the
three categories, how unlikely would it be to
get proportions at least as far apart as we
did?

Stopping
Applying the 3S Strategy
 We need to find a statistic that will describe
how far apart our proportions are from each
other.
 This is more complicated than when we just
had two groups.
 We also need to decide what types of values
for that statistic (e.g., large or small, positive
or negative) we would consider evidence
against the null hypothesis.
Stopping

To find a statistic we start by finding the
three differences in proportions.
Stopping
How can we combine 3 differences
(-0.047, -0.082 and 0.129) into a single
statistic?
 Add them up? Average them?




-0.047 + (-0.082) + 0.129 = 0.
[-0.047 + (-0.082) + 0.129]/3 = 0.
What could we do so we don’t always get
a sum of zero?
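A short worked equation shows why adding (or averaging) the signed differences always gives zero. The subscripted symbols for the three sample proportions are notation assumed here for illustration, not taken from the slides. Because the three differences run in a cycle, each proportion appears once with a plus sign and once with a minus sign:

```latex
(\hat{p}_{\text{Single}} - \hat{p}_{\text{Lead}})
+ (\hat{p}_{\text{Follow}} - \hat{p}_{\text{Single}})
+ (\hat{p}_{\text{Lead}} - \hat{p}_{\text{Follow}}) = 0

(0.858 - 0.905) + (0.776 - 0.858) + (0.905 - 0.776)
= -0.047 - 0.082 + 0.129 = 0
```

The cancellation happens for any three proportions, so the sum carries no information about how far apart the groups are.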
Stopping
1. Statistic
 We are going to use the mean of the
absolute value of the differences (MAD)
 (0.047 + 0.082 + 0.129)/3 = 0.086.
 What would have to be true for the
average of absolute differences to equal 0?
 What types of values of this statistic (e.g.,
large or small) would provide evidence in
favor of the alternative hypothesis?
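As a rough check of the arithmetic, the MAD statistic can be computed directly from the counts in the stopping table. This is a minimal sketch; the function name `mad_statistic` and the dictionary layout are illustrative choices, not part of the applet.

```python
from itertools import combinations

# Counts from the stopping table: (complete stops, total vehicles) per arrival pattern.
counts = {
    "single": (151, 176),
    "lead": (38, 42),
    "follow": (76, 98),
}

def mad_statistic(props):
    """Mean of the absolute pairwise differences between group proportions."""
    diffs = [abs(a - b) for a, b in combinations(props, 2)]
    return sum(diffs) / len(diffs)

proportions = [stops / total for stops, total in counts.values()]
observed_mad = mad_statistic(proportions)
print(round(observed_mad, 3))  # 0.086
```

Calling `mad_statistic([0.5, 0.5, 0.5])` returns 0, which illustrates that the statistic is 0 only when all the group proportions are identical.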
Stopping
2. Simulate
 If there is no association between arrival
pattern and whether or not a vehicle stops
it basically means it doesn’t matter what
the arrival pattern is. Some vehicles will
stop no matter what the arrival pattern
and some vehicles won’t.
 We can model this by shuffling either the
explanatory or response variables. (The
applet will shuffle the response.)
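One shuffle of the response can be sketched in a few lines. This is only an illustration of the idea under stated assumptions (the applet's internals are not described in the slides): the 265 "stop" and 51 "no stop" outcomes are pooled, shuffled, and dealt back to groups of the original sizes. The seed and variable names are arbitrary.

```python
import random

# Group sizes and the overall number of complete stops, taken from the table.
group_sizes = {"single": 176, "lead": 42, "follow": 98}
total_stops = 265
total_vehicles = sum(group_sizes.values())  # 316

# Response for every vehicle: 1 = complete stop, 0 = not a complete stop.
responses = [1] * total_stops + [0] * (total_vehicles - total_stops)

rng = random.Random(1)  # seed chosen arbitrarily so the run is repeatable
rng.shuffle(responses)  # breaks any link between arrival pattern and response

# Deal the shuffled responses back to the groups, keeping group sizes fixed.
shuffled_props = {}
start = 0
for name, size in group_sizes.items():
    group = responses[start:start + size]
    shuffled_props[name] = sum(group) / size
    start += size

print(shuffled_props)
```

Shuffling keeps both margins of the table fixed: the group sizes and the overall number of stops never change, only which group each outcome lands in.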
Stopping

Use the Multiple Proportions Applet
Stopping

The results of one shuffle of the response
Stopping
Simulated values of the statistic for 1000
shuffles
 Bell-shaped?
 Centered at 0?

Stopping
3. Strength of evidence
 Do we use a 1-sided or 2-sided alternative
hypothesis to compute a p-value?



By finding the absolute values we have lost
direction in terms of which proportion is
smaller than another.
When we are looking for more of a difference,
the MAD statistic will be larger.
Hence to calculate our p-value we will
always count the simulated statistics that are as
large as or larger than the observed MAD statistic.
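The whole 3S procedure for the stopping data can be sketched as follows. This is a minimal sketch, not the applet's code: it uses 1000 shuffles as on the slide, an arbitrary seed, and hypothetical helper names. The simulated p-value will vary somewhat from run to run and from seed to seed.

```python
import random
from itertools import combinations

group_sizes = [176, 42, 98]   # single, lead, follow
group_stops = [151, 38, 76]   # complete stops in each group

def mad_statistic(props):
    """Mean of the absolute pairwise differences between proportions."""
    diffs = [abs(a - b) for a, b in combinations(props, 2)]
    return sum(diffs) / len(diffs)

def shuffled_mad(responses, sizes, rng):
    """Shuffle the pooled responses, re-deal them to the groups, return the MAD."""
    rng.shuffle(responses)
    props, start = [], 0
    for size in sizes:
        props.append(sum(responses[start:start + size]) / size)
        start += size
    return mad_statistic(props)

observed = mad_statistic([s / n for s, n in zip(group_stops, group_sizes)])

rng = random.Random(2024)  # arbitrary seed for repeatability
responses = [1] * sum(group_stops) + [0] * (sum(group_sizes) - sum(group_stops))
num_shuffles = 1000
simulated = [shuffled_mad(responses, group_sizes, rng) for _ in range(num_shuffles)]

# Count shuffles at least as large as the observed statistic.
# The tiny tolerance guards against floating-point ties.
p_value = sum(m >= observed - 1e-12 for m in simulated) / num_shuffles
print(p_value)
```

Only the upper tail is counted because the MAD statistic cannot be negative and larger values indicate proportions that are farther apart.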
Stopping


We had a p-value of 0.0854, so there is moderate
evidence against the null hypothesis.
Do the results generalize to intersections beyond
the one used?
Probably not, since intersections have different factors
that influence stopping.
Can we draw any cause-and-effect conclusions?
No, this was an observational study.
If we had stronger evidence of a difference in
groups, we could follow with pairwise tests to see
which proportions are significantly different from
each other. (We will save this until the next
section.)
Recruiting Organ Donors
Exploration 8.1
Theory-Based Approach to
Compare Multi-Category
Categorical Variables
Section 8.2
Theory based vs. Simulation based
Just as always:
Simulation-based methods
  Always work
  Require the ability to simulate (computer).
Theory-based methods
  Avoid the need for simulation
  Additional ‘validity conditions’ must be met
Other Distributions





The null distributions in the past few chapters
were bell-shaped and centered at 0.
For these, we used normal and t-distributions
to predict what the null distribution would
look like.
In this chapter our null distribution was
neither bell-shaped nor centered at 0.
We can, however, use a chi-squared
distribution to predict the shape of the null
distribution.
When doing this, we will not use the MAD
statistic, but a chi-square statistic.
Sham Acupuncture

A randomized experiment was conducted
exploring the effectiveness of acupuncture
in treating chronic lower back pain (Haake et
al. 2007).

Acupuncture inserts needles into the skin
of the patient at acupuncture points to
treat a variety of ailments.
Sham Acupuncture
3 treatment groups
1. Verum acupuncture: traditional Chinese acupuncture
2. Sham acupuncture: needles inserted into
the skin, but not deeply and not at
acupuncture points
3. Traditional, non-acupuncture therapy of
drugs, physical therapy, and exercise.
 1162 patients were randomly assigned to
the three treatment groups:
 387 patients in each of groups 1 and 2, and 388 in
group 3.

Sham Acupuncture
Null hypothesis - no association between
type of treatment received and reduction in
back pain.
Alternative hypothesis - there is an
association between type of treatment and
reduction in back pain.
Sham Acupuncture

Here are the results.
Treatment     Pain reduced    Pain not reduced    Total
Real Acup.    184 (0.475)     203 (0.525)         387
Sham Acup.    171 (0.442)     216 (0.558)         387
Non-Acup.     106 (0.273)     282 (0.727)         388
Total         461             701                 1162
Sham Acupuncture
Remember that the MAD statistic is the
mean of the absolute values of the differences
in proportions for all 3 pairs of groups:
Real vs. Sham: 0.476 - 0.442 = 0.034
Real vs. None: 0.476 - 0.274 = 0.202
Sham vs. None: 0.442 - 0.274 = 0.168
 The statistic (MAD) is
(0.034 + 0.202 + 0.168)/3 = 0.135
 Larger MAD statistics give more evidence
against the null

Sham Acupuncture
The other statistic we saw, chi-square, is
calculated using the formula:

$$\chi^2 = \frac{1}{\hat{p}(1-\hat{p})} \sum_i n_i (\hat{p}_i - \hat{p})^2$$

$\hat{p}_i$ is the proportion of successes in
category i
$\hat{p}$ is the overall proportion of successes in
the dataset.
$n_i$ is the sample size of category i

Sham Acupuncture



Don’t worry about the formula. Just realize it
is another way to measure how far apart the
proportions are from each other (or the
overall proportion of successes).
Just like the MAD statistic, the larger the chi-square
statistic the more evidence there is
against the null.
Let’s test this in the Multiple Proportions
Applet using both MAD and chi-square
simulation.
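For readers who do want to check the number by hand, the chi-square formula above can be evaluated from the counts in the results table. This is a minimal sketch; the function name and data layout are illustrative, and the applet may organize the calculation differently.

```python
# (pain reduced, group size) for each treatment, from the results table.
groups = {"real": (184, 387), "sham": (171, 387), "none": (106, 388)}

def chi_square_statistic(groups):
    """Sum of n_i * (p_i - p)^2 over groups, divided by p * (1 - p)."""
    total_successes = sum(s for s, _ in groups.values())
    total_n = sum(n for _, n in groups.values())
    p_overall = total_successes / total_n  # overall proportion of successes
    weighted = sum(n * (s / n - p_overall) ** 2 for s, n in groups.values())
    return weighted / (p_overall * (1 - p_overall))

chi2 = chi_square_statistic(groups)
print(round(chi2, 2))
```

The result should agree, up to rounding, with the chi-square value of about 38.05 reported for this study.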
Sham Acupuncture
Strength of evidence:
 In both cases, we got p-values of 0.
 Nothing as large as a MAD statistic of 0.135
or larger ever occurred in this simulation.
 Likewise nothing as large as a chi-square
statistic of 38.05 or larger ever occurred in
this simulation.
 Hence we have very strong evidence against
the null and in support of an association
between the type of treatment received and pain
reduction.
Sham Acupuncture
The theory-based method will predict the
chi-square null distribution.
 We can use the applet to overlay a
theoretical chi-square distribution and find
the theory-based p-value.
 The theory based method only works well
when each cell in the 2-way table has at
least 10 observations.
 This is easily met here. The smallest
count was 106.
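The validity check and the theory-based p-value can be sketched without the applet. Two assumptions here are not stated on the slides: that a table with 3 groups and 2 outcomes has (3 - 1) × (2 - 1) = 2 degrees of freedom, and the fact that for 2 degrees of freedom the chi-square upper-tail area is exactly exp(-x/2). The observed statistic 38.05 is the value reported for this study.

```python
import math

# Observed 2-way table: [pain reduced, pain not reduced] for each treatment.
table = {"real": [184, 203], "sham": [171, 216], "none": [106, 282]}

# Validity condition from the slides: every cell has at least 10 observations.
smallest_cell = min(min(row) for row in table.values())
conditions_met = smallest_cell >= 10

# Assumption: 3 groups x 2 outcomes gives (3 - 1) * (2 - 1) = 2 degrees of freedom.
# For 2 degrees of freedom, the upper-tail area of chi-square is exp(-x / 2).
chi2_observed = 38.05  # value reported on the slides
p_value = math.exp(-chi2_observed / 2)

print(smallest_cell, conditions_met)
print(p_value)
```

The resulting p-value is far below any conventional cutoff, which is consistent with the simulated p-value of 0.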

Results
Sham Acupuncture
What if you find evidence of an
association?
 What happens after a chi-squared test?



No standard approach to follow
Could describe the association with confidence
intervals on each of the differences in
proportions.
Sham Acupuncture

95% confidence intervals (found in the applet) for
the differences in the proportions of patients with
reduced pain, comparing:






Real to sham (-0.0366, 0.1038)
Real to none (0.1356, 0.2689)*
Sham to none (0.1022, 0.2351)*
There is evidence that the probability of pain
reduction is different (lower) for no
acupuncture treatment than the other two.
There is no significant difference between real
and sham acupuncture however.
Let’s see all this in the applet.
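The intervals above can be approximated with the usual two-sample z-interval for a difference in proportions. This is a sketch under the assumption that the applet uses this standard formula with a multiplier of 1.96, which the slides do not state; the function name `diff_ci` is hypothetical.

```python
import math

# (pain reduced, group size) for each treatment, from the results table.
groups = {"real": (184, 387), "sham": (171, 387), "none": (106, 388)}

def diff_ci(group_a, group_b, z=1.96):
    """Approximate 95% interval for p_a - p_b (two-sample z-interval)."""
    (s_a, n_a), (s_b, n_b) = group_a, group_b
    p_a, p_b = s_a / n_a, s_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se

for a, b in [("real", "sham"), ("real", "none"), ("sham", "none")]:
    low, high = diff_ci(groups[a], groups[b])
    print(f"{a} - {b}: ({low:.4f}, {high:.4f})")
```

An interval that contains 0 (real vs. sham) is consistent with no difference between those two groups, while intervals entirely above 0 point to a higher probability of pain reduction in the first group of the pair.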
Exploration 8.2: Conserving Hotel Towels



Hotels are encouraging guests to practice
conservation by not having their towels
washed.
Goldstein et al. (2008) conducted a randomized
experiment to investigate how different
phrasings on signs placed on bathroom towel
racks impacted towel reuse behavior.
Researchers were interested in how messages
that communicated different types of “social
norms” impacted towel reuse.
Exploration 8.2: Conserving Hotel Towels
Rooms at a single
hotel were randomly
assigned to receive 1
of 5 messages on the
sign on the towel bar
 Is there an association
between the message
left and whether or
not towels get reused?
