JMM13 - Hope College | Department of Mathematics

Requisite Knowledge for Teachers,
Assessment Questions, Technology
Allan Rossman, Soma Roy, Beth Chance
Dept of Statistics, Cal Poly – San Luis Obispo
Cal Poly

STAT 217, 221
  Algebra-based, wide variety of majors
  Several instructors (Chance, McGaughey, Rossman, Roy) use simulation/randomization-based curriculum
  Curricular materials (ISI) developed with Tintle, Cobb, Swanson, VanderStoep (Wiley, to appear)

STAT 301
  Calculus-based, mostly Math and Stat majors
  Use simulation/randomization-based ISCAM materials (Chance and Rossman, 2nd ed.)
JMM 2013
Question on faculty interviews

Most challenging topic to teach in Stat 101?

“Right” answer: Sampling distributions



Central to understanding statistical inference:
What would happen with repeated random sampling?
Very difficult cognitive step for students:
Seeing proportion or average not as a number but as a
(random) variable with a distribution
Equally “right” answer: Randomization distributions


Analogous to sampling distributions:
What would happen with repeated random assignment?
Many teachers have not studied them
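A minimal sketch of the "repeated random sampling" question above, showing the sample proportion behaving as a variable with its own distribution. The population proportion (0.6), the sample size (50), and the function name are illustrative assumptions, not values from the talk.

```python
import random
import statistics

def sample_proportions(pop_p, n, reps, seed=0):
    """Draw `reps` random samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)
    return [sum(rng.random() < pop_p for _ in range(n)) / n for _ in range(reps)]

# Hypothetical population with 60% successes; each sample has 50 units.
props = sample_proportions(pop_p=0.6, n=50, reps=5000)

center = statistics.mean(props)        # where the sampling distribution is centered
spread = statistics.stdev(props)       # how much the statistic varies from sample to sample
theory_sd = (0.6 * 0.4 / 50) ** 0.5    # sqrt(p(1-p)/n), for comparison

print(f"mean of sample proportions: {center:.3f}")
print(f"SD of sample proportions:   {spread:.3f} (theory: {theory_sd:.3f})")
```

Each entry of `props` is one value of the statistic; the list as a whole approximates its sampling distribution.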
1. What is a randomization distribution?

Sleep deprivation study
Challenges in understanding this graph

What’s the variable?


Difference in group means
What are the observational units?

Random assignments
What to look for in this graph?

Shape



We really don’t care
Only relevant to justify using theoretical approximation (t)
Center


We really don’t care
Only relevant to confirm that we correctly
simulated under the null (0)
What to look for? (cont)

Variability



This we care about
This tells us what values of the statistic are
typical, what values are unusual, what values are
rare when the null model is true
Seeing where observed value of statistic falls in
the randomization (null) distribution determines
p-value, strength of evidence
Exact randomization distribution

Distribution of differences in group means for
all 352,716 possible random assignments

Appropriate for calculus-based class
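The count 352,716 equals the number of ways to choose 10 subjects out of 21, which is consistent with 21 subjects split into groups of 10 and 11 (an inference from the count; the slide does not state the group sizes). The sketch below checks that count and enumerates an exact randomization distribution for a small set of made-up responses. The data and the function name are hypothetical.

```python
from itertools import combinations
from math import comb
from statistics import mean

# Number of ways to choose which 10 of 21 subjects form one group.
print(comb(21, 10))  # 352716

def exact_randomization_distribution(responses, n_first):
    """Difference in group means (first minus second) for every possible assignment."""
    total = sum(responses)
    n = len(responses)
    diffs = []
    for idx in combinations(range(n), n_first):
        s1 = sum(responses[i] for i in idx)
        diffs.append(s1 / n_first - (total - s1) / (n - n_first))
    return diffs

# Hypothetical responses for 8 subjects; the first 4 were observed in group 1.
data = [12.1, 9.8, 15.3, 11.0, 7.2, 8.9, 10.4, 6.5]
observed = mean(data[:4]) - mean(data[4:])

diffs = exact_randomization_distribution(data, 4)
# One-sided exact p-value: share of assignments at least as extreme as observed.
p_value = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(len(diffs), p_value)
```

With 8 subjects there are only 70 assignments, so full enumeration is instant; the same loop over 352,716 assignments is still feasible, just slower.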
2. Why simulate?


Study: 14 of 16 infants chose “nice” over
“mean” toy
Two possible explanations



Infants have genuine preference for nice toy
Infants choose randomly
Why simulate?


To investigate what could have happened by
chance alone (random choices), and so …
To assess plausibility of “randomly choose”
hypothesis
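A short sketch of the simulation described above: under the "choose randomly" explanation each of the 16 infants is a fair coin flip, so we repeat 16 flips many times and see how often 14 or more land on the nice toy. The number of repetitions, the seed, and the function name are arbitrary choices for illustration.

```python
import random
from math import comb

def simulate_null_counts(n=16, reps=10000, seed=1):
    """Each repetition: n infants choose by coin flip; record how many pick the nice toy."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(reps)]

counts = simulate_null_counts()

# Approximate p-value: how often chance alone gives a result at least as extreme as 14 of 16.
approx_p = sum(c >= 14 for c in counts) / len(counts)

# Exact binomial probability, for comparison: (C(16,14) + C(16,15) + C(16,16)) / 2^16.
exact_p = sum(comb(16, k) for k in range(14, 17)) / 2 ** 16

print(f"simulated P(14 or more of 16): {approx_p:.4f}")
print(f"exact binomial value:          {exact_p:.4f}")
```

The simulation is run assuming the null ("randomly choose") model is true; it does not replicate the study or enlarge the sample.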
Why simulate? (cont)

Non-trivial for students to understand, articulate
motivation for simulation



Especially understanding that simulation is conducted
assuming null to be true, in order to assess plausibility
Some students think that simulation means
replicating the study, generating a larger sample
Some students think any use of software (e.g.,
to calculate a t-statistic) constitutes a simulation
3. Four more examples


A:
                            Lied    Did not lie    Total
Praised for intelligence      11             18       29
Praised for effort             4             26       30

B:
              Native born    Born elsewhere    Total
Year 1950             219               281      500
Year 2000             258               242      500

C:
          Donated blood    Did not donate    Total
Male                126               497      623
Female              104               609      713

D:
                        Patient death    No death    Total
Gilbert on shift                   40         217      257
Gilbert not on shift               34        1350     1384
Four more examples (cont)

Identical structure



Two binary variables, 2×2 table of counts
Could apply two-proportion z-test, chi-square test
But there’s a very important difference …
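As a small illustration of the two tests just mentioned, the sketch below applies both to table A (11 of 29 vs. 4 of 30 lied) and shows that the chi-square statistic, computed without a continuity correction, equals the square of the pooled z-statistic. The function names are our own; nothing here is specific to the talk's software.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table A: rows are praise type, columns are lied / did not lie.
z = two_proportion_z(11, 29, 4, 30)
chi2 = chi_square_2x2(11, 18, 4, 26)
two_sided_p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value, 2 * (1 - Phi(|z|))

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi-square = {chi2:.3f}, p = {two_sided_p:.4f}")
```

The same two functions run unchanged on tables B, C, and D, which is exactly why the arithmetic alone cannot reveal how the data were collected.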
Four more examples (cont)

Very different uses of
randomness




A: Random assignment
B: Independent random samples
C: One random sample
D: No randomness
Four more examples (cont)

Very different scope of
conclusions to be drawn

A: Cause/effect

B: Generalize to two popns
C: Generalize to one popn
D: Rule out “random chance”
explanation


Four more examples: Key question

Should inference method mimic the
randomness in data collection?




A: Randomization test
B: Random sample/bootstrap with fixed margin
C: Random sample/bootstrap with only total fixed
D: ???


Permutation test
I’ve answered yes for calc-based course, no
for algebra-based course

But I think instructors should be aware of issue
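One way to see the distinction among schemes A, B, and C is to look at which table margins stay fixed in a single simulated table. The sketch below is an assumption-laden illustration, not the authors' applets: the function names are hypothetical, and for B the draws use the pooled proportion, which for binary responses matches resampling with replacement from the pooled data.

```python
import random

def scheme_a(table, rng):
    """Random assignment: reshuffle observed responses among groups.
    Row totals AND column totals stay fixed."""
    (a, b), (c, d) = table
    responses = [1] * (a + c) + [0] * (b + d)
    rng.shuffle(responses)
    x1 = sum(responses[: a + b])          # successes dealt to the first group
    x2 = (a + c) - x1
    return [[x1, a + b - x1], [x2, c + d - x2]]

def scheme_b(table, rng):
    """Independent random samples: row totals fixed, column totals free."""
    (a, b), (c, d) = table
    p = (a + c) / (a + b + c + d)          # pooled success proportion under the null
    rows = []
    for n in (a + b, c + d):
        x = sum(rng.random() < p for _ in range(n))
        rows.append([x, n - x])
    return rows

def scheme_c(table, rng):
    """One random sample: only the grand total is fixed."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_row1, p_col1 = (a + b) / n, (a + c) / n
    out = [[0, 0], [0, 0]]
    for _ in range(n):
        # Under the null, row and column memberships are independent.
        r = 0 if rng.random() < p_row1 else 1
        col = 0 if rng.random() < p_col1 else 1
        out[r][col] += 1
    return out

rng = random.Random(2013)
sim_a = scheme_a([[11, 18], [4, 26]], rng)        # table A
sim_b = scheme_b([[219, 281], [258, 242]], rng)   # table B
sim_c = scheme_c([[126, 497], [104, 609]], rng)   # table C
print(sim_a, sim_b, sim_c, sep="\n")
```

Repeating any one of these many times and recording a statistic such as the difference in proportions gives the corresponding null distribution.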
4. New example, three approaches

Halloween treat study:



148 chose candy, 135 chose toy
Two-sided test of equal likeliness
Three approaches



Simulation-based approximation
Normal-based approximation
Exact binomial calculation
Simulation-based approximation

Produces various approximate p-values
Normal-based z-test

Produces one approximate p-value: .4396
Exact binomial p-value

Produces one exact answer: .4758
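The three approaches for the Halloween data (148 of 283 chose candy, null value .5) can be sketched side by side as follows. This is an illustrative sketch using only the standard library; the repetition count and function name are assumptions, and "two-sided" is taken to mean counts at least as far from 141.5 as the observed 148.

```python
import random
from math import comb, erfc, sqrt

n, x = 283, 148                 # 148 candy + 135 toy
dist = abs(x - n / 2)           # observed distance from the null expectation

def simulated_p(reps=2000, seed=None):
    """Simulation-based p-value: differs from run to run unless a seed is fixed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        hits += abs(heads - n / 2) >= dist
    return hits / reps

# Normal-based z-test (no continuity correction).
z = (x / n - 0.5) / sqrt(0.25 / n)
normal_p = erfc(abs(z) / sqrt(2))

# Exact binomial: add up probabilities of all counts at least as extreme.
exact_p = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= dist) / 2 ** n

print(f"simulation (two runs): {simulated_p(seed=1):.4f}, {simulated_p(seed=2):.4f}")
print(f"normal approximation:  {normal_p:.4f}")
print(f"exact binomial:        {exact_p:.4f}")
```

The simulated values scatter around the exact answer, while the uncorrected normal approximation sits noticeably below it.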
Simulation vs. theory vs. exact

Expect algebra-based students to understand
simulation and theory



Maybe it’s enough to expect them to select, apply
any relevant test procedure for one-proportion setting
But expect calculus-based students to
understand relationships among 3 approaches
Notice that simulation “beats” theory here
5. How to estimate parameter?


Do not reject the value .5 for parameter in
Halloween treat example
What other potential values of parameter
would not be rejected?




Perform more tests, using either simulation or
normal approximation or exact binomial
Using some pre-specified significance level
Define as plausible the values not rejected
Confidence interval for parameter: values not
rejected by test
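The test-inversion idea above can be sketched by scanning candidate parameter values and keeping those the test does not reject. Here the test is the normal-based z-test at the .05 level; the grid step of 0.001 and the function names are illustrative assumptions, and a simulation or exact binomial test could be substituted in `z_test_p_value`.

```python
from math import erfc, sqrt

def z_test_p_value(x, n, p0):
    """Two-sided normal-based p-value for testing that the parameter equals p0."""
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    return erfc(abs(z) / sqrt(2))

def plausible_values(x, n, alpha=0.05, step=0.001):
    """Candidate parameter values NOT rejected at significance level alpha."""
    steps = round(1 / step)
    grid = [i / steps for i in range(1, steps)]   # 0.001, 0.002, ..., 0.999
    return [p0 for p0 in grid if z_test_p_value(x, n, p0) > alpha]

# Halloween treat data: 148 of 283 chose candy.
kept = plausible_values(148, 283)
lo, hi = min(kept), max(kept)
print(f"plausible values run from about {lo:.3f} to {hi:.3f}")
```

The endpoints of the retained values form an approximate 95% confidence interval, and .5 falls inside it, in agreement with the test not rejecting .5.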
Summary: What teachers need to know


Randomization distribution
Motivation for simulation


Connections among data collection, inference
procedure, scope of conclusions



Common student misconceptions
Subtle distinctions among simulation methods
Simulation vs. theory vs. exact
Confidence interval as inversion of test;
interval of plausible values
Assessment:
Our Favorite Questions – on homework,
quizzes, project reports, exams

Describe how to use simulation/randomization to calculate
a p-value

Interpret a p-value in the context of the study

State an appropriate conclusion in the context of the study
Note: You are welcome to use any or all of the example
questions. If you do, we would be thrilled if you could share
your results with us. Thanks!
JMM 2013, San Diego
Describe how to use simulation to calculate a
p-value


Example 1: “A research question of interest is whether
financial incentives can improve performance. Alicia designed
a study to test whether video game players are more likely to
win on a certain video game when offered a $5 incentive
compared to when simply told to “do your best.” Forty subjects
were randomly assigned to one of two groups, with one group
being offered $5 for a win and the other group simply being
told to “do your best.” She collected the following data from her
study:
                $5 incentive    “Do your best”    Total
Win                       16                 8       24
Lose                       4                12       16
Total                     20                20       40
Explain in detail how you would conduct a simulation (say with
pennies or dice or index cards) to obtain a p-value for this
research question.”
Describe how to use simulation to calculate a p-value (contd.)

What we look for in the answer:

Simulation assumes null is true

Simulation mimics the randomness of the study design
(depending on course)

Records the value of an appropriate statistic for many
repetitions

Compares value of observed statistic to simulated randomization distribution, to calculate the proportion of simulated values of the statistic at least as extreme as the observed statistic
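The index-card simulation for Example 1 can be mirrored in code, with each comment tied to one of the points above. This is one possible sketch, not a model answer from the talk; the function name, seed, and number of repetitions are arbitrary.

```python
import random

def card_shuffle_p_value(reps=5000, seed=7):
    """Approximate one-sided p-value for the video game study by shuffling cards."""
    rng = random.Random(seed)
    # Null is true: the 24 wins and 16 losses would have occurred regardless of group.
    cards = ["win"] * 24 + ["lose"] * 16
    observed = 16 / 20 - 8 / 20              # observed difference in win proportions
    as_extreme = 0
    for _ in range(reps):
        # Mimic the study design: randomly re-deal 20 cards to each group.
        rng.shuffle(cards)
        wins_incentive = cards[:20].count("win")
        wins_best = 24 - wins_incentive
        # Record the statistic for this repetition.
        diff = wins_incentive / 20 - wins_best / 20
        # Compare to the observed statistic (small tolerance for float rounding).
        if diff >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / reps

print(f"approximate p-value: {card_shuffle_p_value():.4f}")
```

A student answer that covers the four commented steps, in the context of incentives and wins, addresses everything on the list.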
Interpret a p-value

Example 2: In an article published in Psychology of Music
(2010), researchers reported the results of a study conducted to
investigate the effects of “romantic lyrics on compliance with a
courtship request.” … When a participant came in for the study,
she was randomly assigned to listen to either a romantic song or
a neutral song. After three minutes, she was greeted by a male
“confederate” … who … asked for her phone number so that he
could call her up to ask her out.

Of the 44 women who listened to the romantic song, 23 gave
their phone numbers, whereas of the 43 who listened to the
neutral song, only 12 did.

The p-value is computed to be 0.0103. Interpret this p-value in
the context of the study.
Interpret a p-value (contd.)

What we look for in the answer:
• Similar to what we look for when describing simulation
• What the p-value is a probability of: four components
  • Assuming the null hypothesis is true
  • Randomness due to study design
  • Compares observed statistic to simulated randomization distribution
  • Direction (as or more extreme)

All in context
State an appropriate conclusion

Example 3: To investigate whether there is an association
between happiness and income level, we will use data from
the 2002 General Social Survey (GSS), cross-classifying a
person’s perceived happiness with their family income
level. The GSS is a survey of randomly selected U.S.
adults who are not institutionalized.
The p-value is found to be < 0.001.

State your conclusion in the context of the study.

State an appropriate conclusion (contd.)



We check that the answer addresses each of the following (where applicable) and includes appropriate justification:
• (S) Statistical significance
• (E) Estimation, i.e., statistical confidence
• (C) Causation (was random assignment used?)
• (G) Generalizability (whom does the sample represent?)
All in context
Student response:
More Examples

Example 4 (from AP Statistics):
• New statistic = mean / median.
• Give simulation results for values of mean / median, based on a normal population.
• Give the observed value of mean / median for given sample data.
• Do the simulation results suggest that the underlying population is skewed to the right? Explain.
Example 5: Present a graph of the null distribution and ask
• At what number should the graph center? Why?
• Shade the region of the graph that represents the p-value.
• Calculate the approximate p-value.
• Which aspect of the graph tells you whether the study results are statistically significant: (a) Shape, (b) Center, or (c) Variability?
For example,
Do incentives work? …A national sample of 735 households was
randomly selected, … 368 households were randomly assigned
to receive a monetary incentive along with the advance letter,
and the other 367 households were assigned to receive only the
advance letter; 286 in the incentive and 245 in the no-incentive
group responded to the telephone survey.

Context: Difference in proportions
• Center makes sense? Why?
• Shade region denoting p-value.
• Are the results from the study
statistically significant? How are you
deciding? (a) Shape, (b) Center, or (c)
Variability
Example 6: Present 2+ graphs and ask which represents the null distribution, where
• One graph centers at the hypothesized value
• One graph centers at the observed statistic
• Perhaps include a third graph that centers at some commonly used null value (e.g., 0.5 for proportions, and 0 for difference in means) which is not the correct null value for the current study.
For example,
In playing Rock-Paper-Scissors against the instructor, 8 out of
42 students picked scissors. To test whether these data provide
evidence that such players tend to pick scissors less often than
would be expected by random chance, which graph (A, B, or C)
is the appropriate null distribution?
Center @ 14 (1/3 of 42)
Center @ 21 (1/2 of 42)
Center @ 8 (observed count)
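To see why the null distribution belongs at 14, here is a minimal sketch that simulates 42 random picks with probability 1/3 of scissors and compares the result with an exact binomial calculation. The seed and repetition count are arbitrary, and the variable names are our own.

```python
import random
from math import comb
from statistics import mean

n, p0, observed = 42, 1 / 3, 8   # 42 students, null probability 1/3, 8 picked scissors

# Simulate under the null: each student picks scissors with probability 1/3.
rng = random.Random(42)
null_counts = [sum(rng.random() < p0 for _ in range(n)) for _ in range(5000)]

sim_center = mean(null_counts)   # should land near n * p0 = 14, not 21 or 8
# One-sided p-value: "less often than expected" means counts of 8 or fewer.
sim_p = sum(c <= observed for c in null_counts) / len(null_counts)

# Exact binomial probability of 8 or fewer scissors picks.
exact_p = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(observed + 1))

print(f"center of simulated null distribution: {sim_center:.2f}")
print(f"simulated p-value: {sim_p:.4f}, exact p-value: {exact_p:.4f}")
```

The center is fixed by the hypothesized value, so neither the common default of 1/2 nor the observed count determines where the graph sits.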

Example 7: Consider the following output.
Make up a research question for which this is plausible output.
Clearly specify what the observational units and variable(s) in
the study are. Do not use any of the contexts discussed in your
lecture notes, labs, assignments, or any practice material.
Goal of question: Can students correctly identify the setting?
How many variables and what type of variables? Direction of
alternative (one-sided vs. two-sided)?
Multiple Choice

Example 8:
                   Sample size    Mean score
$5 incentive                20            98
“do your best”              20            80
• Randomized experiment
• Could the difference in sample means (18) have occurred by chance alone?
• Describes a tactile (card shuffling) simulation process and gives the following graph based on 1000 repetitions
Example 8 (contd.)
What does the histogram tell you about whether $5 incentives are effective in improving performance on the video game?
a) The incentive is not effective because the distribution of differences generated is centered at 0.
b) The incentive is effective because the distribution of differences generated is centered at 0.
c) The incentive is not effective because the p-value is greater than .05.
d) The incentive is effective because the p-value is less than .05.

Example 8 (contd.)
1. What is the explanation for the simulation process?
a) It allows her to determine whether the normal distribution fits the data.
b) It allows her to compare the actual result to what could have happened by chance if gamers’ performances were not affected by the treatment.
c) It allows her to determine the % of the time the $5 incentive strategy would outperform the “do your best” strategy for all possible scenarios.
d) It allows her to determine how many times she needs to replicate the experiment for valid results.
Example 8 (contd.)
2. Which of the following was used as a basis for simulating the data 1000 times?
a) The $5 incentive is more effective than verbal encouragement for improving performance.
b) The $5 incentive and verbal encouragement are equally effective at improving performance.
c) Verbal encouragement is more effective than a $5 incentive for improving performance.
d) Both (a) and (b) but not (c).

Example 8 (contd.)
3. What is the approximate p-value in this situation? Recall that the research hypothesis is that the incentive improves performance.
a) 0.220 → divided by 100 instead of 1000
b) 0.047 → 2-sided p-value instead of 1-sided
c) 0.022 → correct answer
d) 0.001 → plain wrong
Example 8 (contd.)
5. Which of the following is the appropriate interpretation of the p-value?
a) The p-value is the probability that the $5 incentive is not really helpful.
b) The p-value is the probability that the $5 incentive is really helpful.
c) The p-value is the probability that she would get a result as extreme as she actually found, if the $5 incentive is really not helpful.
d) The p-value is the probability that a student wins on the video game.
Example 9: You want to investigate a claim that women are more
likely than men to dream in color. You take a random sample of
men and a random sample of women (in your community) and ask
whether they dream in color, and compare the proportions of each
gender that dream in color.
1) If the difference in the proportions (who dream in color) between
the two samples turns out not to be statistically significant, which of
the following is the best conclusion to draw? (Circle one.)
a) You have found strong evidence that there is no difference between the proportions of men and women in your community that dream in color.
b) You have not found enough evidence to conclude that there is a difference between the proportions of men and women in your community that dream in color.
c) Because the result is not significant, the study does not support any conclusion.
Example 9 (contd.):
2) If the difference in the proportions (who dream in color) between
the two samples does turn out to have a small p-value, which one
of the following is the best interpretation? (Circle one.)
a) It would not be very surprising to obtain the observed sample results if there is really no difference between men and women in your community.
b) It would be very surprising to obtain the observed sample results if there is really no difference between men and women.
c) It would be very surprising to obtain the observed sample results if there is really a difference between men and women.
d) The probability is very small that there is no difference between men and women in your community on this issue.
e) The probability is very small that there is a difference between men and women in your community on this issue.
3) Suppose that the difference between the sample groups turns out not to
be significant, even though your review of the research suggested that there
really is a difference between men and women. Which conclusion is most
reasonable?
a) Something went wrong with the analysis.
b) There must not be a difference after all.
c) The sample size might have been too small to detect a difference even if there is one.
4) Suppose that two different studies are conducted on this issue.
• Study A finds that 40 of 100 women sampled dream in color, compared to 20 of 100 men.
• Study B finds that 35 of 100 women dream in color, compared to 25 of 100 men.
Which study (A or B) provides stronger evidence that there is a difference between men and women on this issue?
a) Study A
b) Study B
c) The strength of evidence would be similar for these two studies.
5) Suppose that two more studies are conducted on this issue. Both
studies find that 30% of women sampled dream in color, compared to
20% of men. But Study C consists of 100 people of each sex, whereas
Study D consists of 40 people of each gender. Which study provides
stronger evidence that there is a difference between men and women on
this issue?
a) Study C
b) Study D
c) The strength of evidence would be similar for these two studies.
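For instructors who want to check the reasoning behind questions 4 and 5 numerically, the sketch below computes a pooled two-proportion z-statistic for each of the four studies. The choice of the z-test is our assumption (the questions themselves do not require any calculation), and the counts for Study D follow from 30% and 20% of 40.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-statistic for x1/n1 versus x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# (women who dream in color, women sampled, men who dream in color, men sampled)
studies = {
    "A": (40, 100, 20, 100),
    "B": (35, 100, 25, 100),
    "C": (30, 100, 20, 100),
    "D": (12, 40, 8, 40),
}

results = {}
for name, counts in studies.items():
    z = two_proportion_z(*counts)
    results[name] = (z, erfc(abs(z) / sqrt(2)))   # (z, two-sided p-value)
    print(f"Study {name}: z = {z:.2f}, two-sided p = {results[name][1]:.3f}")
```

A larger difference with the same sample sizes (A vs. B) and the same difference with larger samples (C vs. D) both produce a larger z-statistic and hence a smaller p-value.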
6) If the difference in the proportions (who dream in color) between the
two samples does turn out to be statistically significant, which of the
following is a possible explanation for this result?
a) Men and women in your community do not differ on this issue but there is a small probability that random chance alone led to the difference we observed between the two samples.
b) Men and women in your community differ on this issue.
c) Either (a) or (b) are possible explanations for this result.
7) Reconsider the previous question. Now think not about possible
explanations but plausible (believable) explanations. If the difference in
the proportions (who dream in color) between the two samples does turn
out to be statistically significant, which of the following is the more
plausible explanation for this result?
a) Men and women in your community do not differ on this issue but there is a small chance that random sampling alone led to the difference we observed between the two groups.
b) Men and women in your community differ on this issue.
c) They are both equally plausible explanations for this result.
Technology

Desired features
• Transparency: lets students “visualize” what the simulation/randomization is doing
• Easy to use: aids student learning and does not become an obstacle in the path to learning statistics. Supervision is useful, but not imperative.
• Easily available: can be run on different platforms, and is affordable
• Provides choices to the user: e.g., lets the user pick the direction to look at when calculating the p-value
• Interactive: e.g., pop-up messages asking “Are you sure that’s the direction?” or “Entry does not match table”
Technology as a teaching tool

How much of the thinking do we want the students to do
when they are using technology?
That’s all folks!