Allan Rossman
Beth Chance
Overview
What do we want students to know and do at the end of the course?
Our dream content
What would we cut to have time to get there
Assumptions about current content in many courses
Reality vs. fantasy
Are we there yet?
JSM 2010
Top ten essentials
No client disciplines
Example assessment items
#1 Understand the statistical process of investigation
Repeatedly experience the process as a
whole
1. Formulate research question
2. Collect data
3. Examine the data
4. Draw inferences from the data
5. Communicate the results
#1 So what to cut?
Compartmentalizing the topics in the course
Data analysis, data collection, statistical inference
Instead: one categorical variable, compare two
groups on quantitative response…
Some specific techniques
Examples: chi-square, ANOVA, regression
Possible out-of-class explorations
#2 Describe how to collect relevant data to answer a research question
Research question vs. variable
Do the data answer the question?
Example: Songs about the heart
“Worry questions”
>> Make sure students have an opportunity to
write their own research questions and to
critique measurement/data collection
methods
#2 So what to cut?
Ordinal, nominal, interval, ratio scales
Specifics of different sampling methods
(cluster, stratified) and experimental designs
Though do make sure students realize that not every study uses an SRS (simple random sample) or a CRD (completely randomized design)
Acronyms!
Shorthand terminology (e.g., sampling distributions) and symbols (e.g., H0/Ha)
#2 Assessment question
Pose a research question of interest to you that involves comparing two groups (but not one we discussed this quarter).
Identify the observational units and the explanatory and response variable(s).
Describe a detailed plan to collect data to investigate this question.
Be sure to provide a detailed enough plan that someone else
could carry out the actual data collection.
Explain whether (and why) your plan will involve
random sampling and/or random assignment, or
neither.
#3 Determine scope of conclusions based on data collection methods
Random sampling: generalize to population
Random assignment: cause/effect between
explanatory and response variables
Some studies use only one, some (few) use
both, some (many) use neither
>> Get students in habit of always commenting
on both of these issues whenever they
summarize the conclusions of a study.
#3 So what to cut?
Nothing; this point is too important
Move: Data collection issues to beginning of
course, descriptive analysis of bivariate
quantitative data to end of course
Students can discuss confounding variables in
context of observational studies
#3 Assessment question
Students using cursive writing on the essay
portion of the SAT in 2005-06 scored
significantly higher, on average, than those who
used printed block letters.
Can you conclude that cursive writing causes higher
scores? Explain.
Different study: Identical essays were given to
graders, some with cursive writing and some
with printed block letters. Those with cursive
writing scored significantly higher.
Can you conclude that cursive writing causes higher
scores? Explain.
#4 Appreciate the value/necessity of graphing data
Always start with a graph
Explain what you see
Example: number of letters memorized
Make sure statements/conclusions about the
data follow from the graph
Sometimes the graph is enough!
#4 So what to cut?
Pie charts
Choice of histogram bin width
But use technology to explore different choices
Normal probability plots
Stemplots…
Boxplots!!
#4 Assessment question
Did distribution of inter-eruption times of Old Faithful
change between 1978 and 2003?
If so, how?
How are changes favorable for tourists?
How are changes less favorable for tourists?
What other interesting features are apparent? Which have changed?
#5 Use proportional thinking
Especially important with categorical data,
two-way tables
Conditional proportions
Proportion vs. percentage vs. percentage change
vs. baseline risk vs. relative risk
Don’t need equal sample sizes to compare
proportions or averages
Summary already takes sample size into account
to produce a “fair” comparison
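The distinctions on this slide (proportion, percentage change, baseline risk, relative risk) and the point about unequal sample sizes can be illustrated with a small Python sketch. The counts below are hypothetical, chosen only for illustration and not taken from the talk; the two groups deliberately have different sizes.

```python
# Hypothetical counts (not from the talk); the two groups deliberately differ in size
control_events, control_n = 10, 200
treatment_events, treatment_n = 21, 500

# Proportions already adjust for group size, so the comparison is "fair"
baseline_risk = control_events / control_n        # 10/200
treatment_risk = treatment_events / treatment_n   # 21/500

risk_difference = treatment_risk - baseline_risk  # absolute change, in proportion units
relative_risk = treatment_risk / baseline_risk    # ratio of the two risks
percent_change = 100 * risk_difference / baseline_risk

print(f"baseline risk {baseline_risk:.3f}, treatment risk {treatment_risk:.3f}")
print(f"risk difference {risk_difference:+.3f} ({100 * risk_difference:+.1f} percentage points)")
print(f"relative risk {relative_risk:.2f}, percent change {percent_change:+.0f}%")
```

With these made-up numbers, a "16% decrease in risk" (relative risk 0.84) corresponds to an absolute drop of less than one percentage point, which is why students need to know which quantity a report is quoting.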
#5 So what to cut?
Formal probability rules, counting rules
Instead use two-way tables of counts, proportions
Bayes’ rule
Simpson’s paradox
#5 Assessment question
Data from murder trial of nurse Kristen Gilbert:
                                Death occurred on shift   Death did not occur on shift
Gilbert working on shift                  40                          217
Gilbert not working on shift              34                         1350
Of the 74 shifts with a death, 40 (54.1%) were
Gilbert shifts, not significantly more than half.
Is this a reasonable calculation to perform here, to assess
the evidence against Gilbert? Explain. If not, perform a
more relevant calculation and explain why it’s more
relevant.
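One possible way to work through the Gilbert question is sketched below in Python, using only the counts from the two-way table. This is an illustration of conditional proportions, not the presenters' answer key. It contrasts the slide's calculation (share of death shifts that were Gilbert shifts, compared with one half) with the share of all shifts she worked and with the death rate conditional on whether she was on shift.

```python
# Counts from the Gilbert two-way table
deaths_gilbert, no_deaths_gilbert = 40, 217
deaths_other, no_deaths_other = 34, 1350

shifts_gilbert = deaths_gilbert + no_deaths_gilbert   # shifts she worked
shifts_other = deaths_other + no_deaths_other         # shifts she did not work

# The slide's calculation: share of death shifts that were Gilbert shifts
share_of_deaths = deaths_gilbert / (deaths_gilbert + deaths_other)

# A baseline for that share: fraction of all shifts that Gilbert worked
share_of_shifts = shifts_gilbert / (shifts_gilbert + shifts_other)

# Conditional proportions: death rate given whether Gilbert was on shift
rate_gilbert = deaths_gilbert / shifts_gilbert
rate_other = deaths_other / shifts_other
relative_risk = rate_gilbert / rate_other

print(f"share of death shifts: {share_of_deaths:.3f} vs. share of all shifts: {share_of_shifts:.3f}")
print(f"death rate: {rate_gilbert:.3f} (Gilbert) vs. {rate_other:.3f} (not Gilbert)")
print(f"relative risk: {relative_risk:.2f}")
```

Comparing 54% to one half ignores that Gilbert worked only about 16% of all shifts; conditioning on who was working shows a death rate of roughly 15.6% on her shifts versus about 2.5% otherwise.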
#6 Develop distributional thinking
Conjecture how a variable will behave
Not everything follows a normal distribution
Example: Matching variables to graphs (à la ABS)
Appreciate the nature of variability
Think in terms of the distribution as an
“aggregate”
Don’t let one value (data value or summary statistic)
drive a conclusion
Focus on tendency, effects of outliers
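To show why one value should not drive a conclusion, here is a minimal Python sketch with a hypothetical data set (not from the talk): adding a single extreme value moves the mean substantially while the median barely changes.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added
values = [2, 3, 3, 4, 5]
with_outlier = values + [40]

for label, data in (("original", values), ("with outlier", with_outlier)):
    # The mean uses every value's size; the median depends only on the middle of the ordered data
    print(f"{label:>12}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```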
#6 So what to cut?
Mode
Relative frequency distributions
Cumulative distributions
1.5×IQR criterion for outliers
Details on calculating mean and median
Have to start making students responsible for
having seen this before
#6 Assessment question
Which would have more variability: ages of customers at a McDonald’s near a freeway or ages of customers at a snack bar on campus? Explain.
#6 Assessment question
Are pamphlets containing information for
cancer patients written at an appropriate level
that cancer patients can understand?
Analyze these data to address the research
question. Summarize and explain your
conclusions.
#7 Consider variability in data when making comparisons
Comparing a particular outcome to a constant
Comparing outcomes in two different groups
Standardization can be a special case
Using a measure of variability to produce a “ruler” by which we judge distances
Standard deviation (z-score)
Box lengths…
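The standard deviation as a "ruler" can be sketched in a few lines of Python. The scores, means, and standard deviations below are hypothetical; the point is that a raw difference only becomes interpretable once it is measured in units of the relevant variability.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: two scores from classes with different centers and spreads
score_a = z_score(85, mean=70, sd=10)
score_b = z_score(90, mean=80, sd=5)

# B is only 10 points above its mean, but that is more SDs than A's 15 points
print(f"z for A: {score_a:.1f}, z for B: {score_b:.1f}")
```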
#7 So what to cut?
Calculation of standard deviation by hand
Short-cut calculation formulas (SD,
correlation)
ANOVA table calculations
Linear transformations on summary statistics
#7 Assessment question
[Graph: traffic deaths (vertical axis) by year (horizontal axis)]
Sketch a graph of data from 1950-1960 where the
change observed between 1955 and 1956 would be
considered noteworthy.
Now sketch a graph where the change observed
between 1955 and 1956 would not be considered
noteworthy.
#8 Consider variation of statistics when making comparisons
Averages vary less than individual values
Less and less with larger and larger samples
Larger samples give more precise estimates
Precision must be considered when making
conclusions
Example: Three coin flips is not enough to decide
whether a coin is fair
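The coin example can be made concrete with an exact binomial calculation in Python. The 30-flip comparison (25 or more heads) is an illustrative addition, not from the slide.

```python
from math import comb

def prob_at_least(k, flips, p=0.5):
    """P(at least k heads in `flips` independent flips with P(heads) = p)."""
    return sum(comb(flips, j) * p**j * (1 - p) ** (flips - j) for j in range(k, flips + 1))

# Even a perfectly fair coin lands heads 3 times out of 3 with probability 1/8
print(prob_at_least(3, flips=3))
# With 30 flips, 25 or more heads would be very surprising for a fair coin
print(prob_at_least(25, flips=30))
```

Three heads in three flips happens one time in eight with a fair coin, so it is weak evidence of bias; a similarly lopsided result over 30 flips is far less likely to arise by chance.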
#8 So what to cut?
Rules for means and variances
σ/√n
Central Limit Theorem
Instead use simulations, graphs
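A simulation of the kind suggested here can be sketched in Python. The population (exponential with mean 10, so also SD 10) is a hypothetical, deliberately skewed choice; the sketch estimates how much sample means vary for several sample sizes, including n = 2 as in the rodeo question. The spread of the means shrinks roughly like 10/√n, which students can discover from the output rather than from a formula.

```python
import random
import statistics

def sd_of_sample_means(n, reps=5_000, rng_seed=1):
    """Simulated standard deviation of the mean of n draws from a skewed population."""
    rng = random.Random(rng_seed)
    means = []
    for _ in range(reps):
        # Hypothetical population: exponential with mean 10 (and SD 10)
        sample = [rng.expovariate(1 / 10) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

for n in (1, 2, 4, 16, 64):
    print(f"n = {n:>2}: SD of sample means ~ {sd_of_sample_means(n):.2f}")
```

Plotting the simulated means as histograms for each n would also show the shape becoming more symmetric as n grows, without stating the Central Limit Theorem.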
#8 Assessment question
In a rodeo roping contest, a contestant’s
score is the average of two times. Explain
why it is more fair to use this combination of
two scores instead of relying only on one
score.
#9 Understand the logic of inference
When can “chance” be eliminated as a plausible
explanation?
Consider chance variability due to random sampling
or random assignment
Strength of evidence vs. proof
Cobb (2007) argued that the reasoning process
of statistical significance can best be introduced
via simulation of randomization tests rather than
normal-based models
“What if” distribution
#9 So what to cut?
Rejection region approaches
Tables of probability distributions
Randomization approach does not require
probability distributions
Even with traditional tests, technology can
calculate p-values, critical values
But still focus on well-labeled sketches of “what if”
distributions
Technical conditions
20-100% of specific (parametric) procedures
#9 Assessment question
MythBusters: Is yawning contagious?
                        Subject yawned   Subject did not yawn   Total   Proportion yawned
Yawn seed planted             10                  24              34      10/34 ≈ 29%
Yawn seed not planted          4                  12              16      4/16 = 25%
Total                         14                  36              50
Was MythBusters justified in concluding that
the data provide strong evidence that
yawning is contagious?
Conduct your own analysis
Explain reasoning process behind your conclusion
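One way to carry out the "what if" analysis is a randomization test, sketched here in Python as an illustration rather than the presenters' solution. It holds the table's margins fixed (34 seeded, 16 not seeded, 14 yawners), repeatedly re-randomizes which subjects were seeded, and counts how often the seeded group gets at least the observed 10 yawners by chance alone. The exact version of the same tail probability (a one-sided Fisher's exact test) is included as a check.

```python
import random
from math import comb

# Margins from the MythBusters table
N_SEEDED, N_NOT_SEEDED, N_YAWNERS, OBSERVED = 34, 16, 14, 10

def simulated_p_value(reps=10_000, rng_seed=2010):
    """Share of re-randomizations giving the seeded group >= OBSERVED yawners."""
    rng = random.Random(rng_seed)
    subjects = [1] * N_YAWNERS + [0] * (N_SEEDED + N_NOT_SEEDED - N_YAWNERS)  # 1 = yawned
    hits = 0
    for _ in range(reps):
        rng.shuffle(subjects)                     # random assignment if the seed has no effect
        if sum(subjects[:N_SEEDED]) >= OBSERVED:  # first 34 play the seeded group
            hits += 1
    return hits / reps

def exact_p_value():
    """Same tail probability computed exactly (hypergeometric distribution)."""
    total = N_SEEDED + N_NOT_SEEDED
    tail = sum(comb(N_YAWNERS, k) * comb(total - N_YAWNERS, N_SEEDED - k)
               for k in range(OBSERVED, N_YAWNERS + 1))
    return tail / comb(total, N_SEEDED)

print(f"simulated p-value: {simulated_p_value():.3f}, exact: {exact_p_value():.3f}")
```

The p-value comes out around one half: a difference at least as large as 29% vs. 25% would arise from random assignment alone about half the time, so these data do not provide strong evidence that yawning is contagious.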
#10 Consider margin of error
Importance of an interval estimate, not only a point estimate
Focus on idea of interval of plausible values
More than simply assessing statistical significance
Estimate ± 2 SE
Understand what parameter is being estimated
Issues that do/do not affect margin of error
Random sampling
Sample size
Population size
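The "estimate ± 2 SE" idea for a proportion, and which of the listed issues actually affect the margin of error, can be sketched in Python. This assumes a simple random sample and uses the conservative value p̂ = 0.5. The optional finite population correction is an addition not on the slide, included only to show that population size has a negligible effect when the population is huge.

```python
from math import sqrt

def margin_of_error(n, p_hat=0.5, population=None):
    """Approximate 95% margin of error (2 SE) for a sample proportion from a random sample."""
    se = sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        # Finite population correction; close to 1 unless n is a sizable share of the population
        se *= sqrt((population - n) / (population - 1))
    return 2 * se

for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: margin of error {margin_of_error(n):.4f} "
          f"(with population 305,000,000: {margin_of_error(n, population=305_000_000):.4f})")
```

Sample size drives the margin of error (it shrinks like 1/√n), while the population size barely matters, and neither calculation means anything unless the sample was random.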
#10 So what to cut?
Solving algebraically for sample size
Any level other than 95% confidence
Any multiplier other than 2!
Interpretation of “confidence level”
#10 Assessment question
Suppose you want to estimate the proportion of the over 305,000,000 Americans who prefer cats to dogs within a 3% margin of error. Approximately what sample size would you need with a random sample?
10
1,000
100,000
1,000,000
10,000,000
#1 Assessment question
What type of study was this? Advantages and disadvantages?
What graph could you examine to summarize these data?
What is meant by “a 16 percent decreased risk of death”?
What does it mean for the average life expectancy to be “significantly” longer?
Is this an appropriate headline? Explain.
Conclusions
Fun to start from ground zero
Make sure “stat methods” courses don’t
prevent “stat literacy”
Take advantage of computer/calculator
power
What is your bare minimum of essential content?
Emphasize interpretation over calculation
Assess what you value
Questions?
Allan Rossman
Beth Chance
http://www.rossmanchance.com/jsm2010.ppt