A Data-Oriented Active Learning Post-Calculus
CAUSE Webinar:
Introducing Math Majors to Statistics
Allan Rossman and Beth Chance
Cal Poly – San Luis Obispo
April 8, 2008
Outline
Goals
Guiding principles
Content of an example course
Assessment
Examples (four)
Goals
Redesign introductory statistics course for
mathematically inclined students in order to:
Provide balanced introduction to the practice
of statistics at appropriate mathematical level
Better alternative than “Stat 101” or “Math
Stat” sequence for math majors’ first statistics
course
Guiding principles (Overview)
1. Put students in role of active investigator
2. Motivate with real studies, genuine data
3. Repeatedly experience entire statistical process from data collection to conclusion
4. Emphasize connections among study design, inference technique, scope of conclusions
5. Use variety of computational tools
6. Investigate mathematical underpinnings
7. Introduce probability “just in time”
Principle 1: Active investigator
Curricular materials consist of investigations
that lead students to discover statistical
concepts and methods
Students learn through constructing own
knowledge, developing own understanding
Need direction, guidance to do that
Students spend class time engaged with
these materials, working collaboratively, with
technology close at hand
Principle 2: Real studies, genuine data
Almost all investigations focus on a recent
scientific study, existing data set, or
student-collected data
Statistics as a science
Frequent discussions of data collection issues and
cautions
Wide variety of contexts, research questions
Real studies, genuine data
Popcorn and lung cancer
Historical smoking studies
Night lights and myopia
Effect of observer with vested interest
Kissing the right way
Do pets resemble their owners
Who uses shared armrest
Halloween treats
Heart transplant mortality
Lasting effects of sleep deprivation
Sleep deprivation and car crashes
Fan cost index
Drive for show, putt for dough
Spock legal trial
Hiring discrimination
Comparison shopping
Computational linguistics
Principle 3: Entire statistical process
First two weeks:
Data collection
Descriptive analysis
Segmented bar graph
Conditional proportions, relative risk, odds ratio
Inference
Observation vs. experiment (Confounding, random assignment vs.
random sampling, bias)
Simulating randomization test for p-value, significance
Hypergeometric distribution, Fisher’s exact test
Repeat, repeat, repeat, …
Random assignment → dotplots/boxplots/means/medians → randomization test
Sampling → bar graph → binomial → normal approximation
Principle 4: Emphasize connections
Emphasize connections among study design,
inference technique, scope of conclusions
Appropriate inference technique determined by
randomness in data collection process
Simulation of randomization test (e.g., hypergeometric)
Repeated sampling from population (e.g., binomial)
Appropriate scope of conclusion also determined
by randomness in data collection process
Causation
Generalizability
Principle 5: Variety of computational tools
For analyzing data, exploring statistical concepts
Assume that students have frequent access to
computing
Not necessarily every class meeting in computer lab
Choose right tool for task at hand
Analyzing data: statistics package (e.g., Minitab)
Exploring concepts: Applets (interactivity,
visualization)
Immediate updating of calculations: spreadsheet
(Excel)
Principle 6: Mathematical underpinnings
Primary distinction from “Stat 101” course
Some use of calculus but not much
Assume some mathematical sophistication
E.g., function, summation, logarithm, optimization, proof
Often occurs as follow-up homework exercises
Examples
Counting rules for probability
Principle of least squares, derivatives to find minimum
Hypergeometric, binomial distributions
Univariate as well as bivariate setting
Margin-of-error as function of sample size, population
parameters, confidence level
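As an illustration of the "derivatives to find minimum" item in the univariate setting, one possible worked derivation is sketched here. The symbols $x_1, \dots, x_n$ (the data), $m$ (a candidate summary value), and $S(m)$ (the sum of squared deviations) are notation assumed for this sketch, not taken from the slides.

```latex
% Univariate least squares: which single value m minimizes the
% sum of squared deviations from the data x_1, ..., x_n?
\begin{align*}
S(m)   &= \sum_{i=1}^{n} (x_i - m)^2 \\
S'(m)  &= -2 \sum_{i=1}^{n} (x_i - m) = -2\Bigl(\sum_{i=1}^{n} x_i - n m\Bigr) \\
S'(m) = 0 \;&\Longrightarrow\; m = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} \\
S''(m) &= 2n > 0 \quad \text{so } \bar{x} \text{ is a minimum.}
\end{align*}
```

The same argument, applied to two parameters with partial derivatives, leads to the least squares line in the bivariate setting.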
Principle 7: Probability “just in time”
Whither probability?
Not the primary goal
Studied as needed to address statistical issues
Often introduced through simulation
Tactile and then computer-based
Addressing “how often would this happen by chance?”
Examples
Hypergeometric distribution: Fisher’s exact test for 2×2
table
Binomial distribution: Sampling from random process
Continuous probability models as approximations
Content of Example Course (ISCAM)
Data Collection
Chapter 1: Observation vs. experiment, confounding, randomization
Chapter 3: Random sampling, bias, precision, nonsampling errors
Chapter 4: Paired data
Chapter 5: Independent random samples
Chapter 6: Bivariate

Descriptive Statistics
Chapter 1: Conditional proportions, segmented bar graphs, odds ratio
Chapter 2: Quantitative summaries, transformations, z-scores, resistance
Chapter 3: Bar graph
Chapter 4: Models, probability plots, trimmed mean
Chapter 6: Scatterplots, correlation, simple linear regression

Probability
Chapter 1: Counting, random variable, expected value
Chapter 2: Empirical rule
Chapter 3: Bernoulli processes, rules for variances, expected value
Chapter 4: Normal, Central Limit Theorem

Sampling/Randomization Distribution
Chapter 1: Randomization distribution for p̂1 − p̂2
Chapter 2: Randomization distribution for x̄1 − x̄2
Chapter 3: Sampling distribution for X, p̂
Chapter 4: Large-sample sampling distributions for x̄, p̂
Chapter 5: Sampling distributions of p̂1 − p̂2, odds ratio (OR), x̄1 − x̄2
Chapter 6: Chi-square statistic, F statistic, regression coefficients

Model
Chapter 1: Hypergeometric
Chapter 3: Binomial
Chapter 4: Normal, t
Chapter 5: Normal, t, lognormal
Chapter 6: Chi-square, F, t

Statistical Inference
Chapter 1: p-value, significance, Fisher’s Exact Test
Chapter 2: p-value, significance, effect of variability
Chapter 3: Binomial tests and intervals, two-sided p-values, type I/II errors
Chapter 4: z-procedures for proportions, t-procedures, robustness, bootstrapping
Chapter 5: Two-sample z- and t-procedures, bootstrap, CI for OR
Chapter 6: Chi-square for homogeneity, independence, ANOVA, regression
Assessments
Investigations with summaries of conclusions
Worked out examples
Practice problems
Homework exercises
Technology explorations (labs)
Quick practice, opportunity for immediate feedback,
adjustment to class discussion
e.g., comparison of sampling variability with stratified
sampling vs. simple random sampling
Student projects
Student-generated research questions, data collection
plans, implementation, data analyses, report
Example 1: Friendly Observers
Psychology experiment
Butler and Baumeister (1998) studied the effect of
observer with vested interest on skilled
performance
                         A: vested interest   B: no vested interest   Total
Beat threshold                    3                     8               11
Do not beat threshold             9                     4               13
Total                            12                    12               24

p̂_A = .250
p̂_B = .667
How often would such an extreme experimental difference occur by
chance, if there was no vested interest effect?
Example 1: Friendly Observers
Students investigate this question through
Hands-on simulation (playing cards)
Computer simulation (Java applet)
Mathematical model
counting techniques
\[
\text{p-value} = P(X \le 3) = \frac{\binom{11}{3}\binom{13}{9} + \binom{11}{2}\binom{13}{10} + \binom{11}{1}\binom{13}{11} + \binom{11}{0}\binom{13}{12}}{\binom{24}{12}} \approx .0498
\]
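The card simulation and the counting argument for the Friendly Observers table can be mirrored in a few lines of code. This is a minimal sketch, not the applet used in the course: the function names, the number of repetitions, and the seed are assumptions chosen for illustration. It shuffles 11 "beat threshold" cards among 24 subjects, deals 12 to group A, and compares the simulated proportion of results with 3 or fewer successes in group A against the exact hypergeometric probability.

```python
import random
from math import comb

# Counts from the 2x2 table: 24 subjects, 11 beat the threshold,
# 12 were assigned to group A, and 3 of those 12 beat the threshold.
N_TOTAL, N_SUCCESS, N_GROUP_A, OBSERVED_A = 24, 11, 12, 3


def exact_p_value(observed=OBSERVED_A):
    """Hypergeometric P(X <= observed) under the no-effect model."""
    favorable = sum(
        comb(N_SUCCESS, k) * comb(N_TOTAL - N_SUCCESS, N_GROUP_A - k)
        for k in range(observed + 1)
    )
    return favorable / comb(N_TOTAL, N_GROUP_A)


def simulated_p_value(reps=10_000, seed=1):
    """Re-randomize the 24 outcomes to two groups of 12, many times.

    reps and seed are illustrative choices, not values from the slides.
    """
    rng = random.Random(seed)
    cards = [1] * N_SUCCESS + [0] * (N_TOTAL - N_SUCCESS)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        # The first 12 cards play the role of group A.
        if sum(cards[:N_GROUP_A]) <= OBSERVED_A:
            extreme += 1
    return extreme / reps


print(round(exact_p_value(), 4))  # 0.0498
print(simulated_p_value())        # an empirical estimate near the exact value
```

The empirical value changes with the seed and the number of repetitions, which is one way to motivate the exact calculation.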
Example 1: Friendly Observers
Focus on statistical process
Data collection, descriptive statistics, inferential analysis
Connection between the randomization in the design and the
inference procedure used
Scope of conclusions depends on study design
Arising from genuine research study
Cause/effect inference is valid
Use of simulation motivates the derivation of the
mathematical probability model
Investigate/answer real research questions in first two weeks
Example 2: Sleep Deprivation
Physiology Experiment
Stickgold, James, and Hobson (2000) studied the
long-term effects of sleep deprivation on a visual
discrimination task (3 days later!)
Sleep condition    n     Mean     StDev    Median    IQR
deprived          11     3.90     12.17     4.50     20.7
unrestricted      10    19.82     14.73    16.55     19.53

How often would such an extreme experimental difference occur by
chance, if there was no sleep deprivation effect?
Example 2: Sleep Deprivation
Students investigate this question through
Hands-on simulation (index cards)
Computer simulation (Minitab)
Mathematical model
Observed difference in group means: 15.92
p-value: .002
p-value = .0072
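The index-card simulation for a difference in means can be sketched in code as follows. The raw improvement scores are not reproduced in these slides, so the example runs on a tiny hypothetical data set that is NOT the study data; the function names, repetition count, and seed are also assumptions. The sketch contrasts an empirical (simulated) p-value with an exact one obtained by enumerating every possible assignment.

```python
import random
from itertools import combinations
from statistics import mean


def exact_p_value(group_a, group_b):
    """One-sided exact randomization p-value for mean(b) - mean(a)."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = mean(group_b) - mean(group_a)
    total = sum(pooled)
    count = extreme = 0
    # Enumerate every way of choosing which subjects land in group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return extreme / count


def simulated_p_value(group_a, group_b, reps=5000, seed=1):
    """Empirical p-value from repeated random re-assignment."""
    rng = random.Random(seed)
    pooled = group_a + group_b  # new list, inputs are left untouched
    n_a = len(group_a)
    observed = mean(group_b) - mean(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / reps


# Hypothetical toy values for illustration only, not the sleep study data.
toy_a = [1.0, 2.0, 3.0]
toy_b = [4.0, 5.0, 6.0]
print(exact_p_value(toy_a, toy_b))      # 0.05 (1 of 20 assignments)
print(simulated_p_value(toy_a, toy_b))  # empirical estimate near 0.05
```

With group sizes of 11 and 10, as in the study, full enumeration would cover 352,716 assignments, which is still feasible on a computer and makes the empirical vs. exact comparison concrete.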
Example 2: Sleep Deprivation
Experience the entire statistical process
again
Tools change, but reasoning remains same
Develop deeper understanding of key ideas
(randomization, significance, p-value)
Tools based on research study, question – not for
their own sake
Simulation as a problem solving tool
Empirical vs. exact p-values
Example 3: Infants’ Social Evaluation
Sociology study
Hamlin, Wynn, Bloom (2007) investigated whether infants
would prefer a toy showing “helpful” behavior to a toy
showing “hindering” behavior
Infants were shown a video with these two kinds of toys,
then asked to select one
14 of 16 10-month-olds selected helper
Is this result surprising enough (under null model of
no preference) to indicate a genuine preference for
the helper toy?
Example 3: Infants’ Social Evaluation
Simulate with coin flipping
Then simulate with applet
Example 3: Infants’ Social Evaluation
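Then learn binomial distribution, calculate exact p-value
\[
\text{p-value} = P(X \ge 14) = \binom{16}{14}(.5)^{14}(1-.5)^{2} + \binom{16}{15}(.5)^{15}(1-.5)^{1} + \binom{16}{16}(.5)^{16}(1-.5)^{0} \approx .0021
\]
[Figure: distribution plot, Binomial, n=16, p=0.5; vertical axis: Probability; horizontal axis: X = number who choose helper toy; region from X = 14 upward marked with probability 0.00209]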
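The coin-flipping simulation and the binomial calculation for the infant study can be checked side by side. This is a minimal sketch under the null model of no preference (p = 0.5); the function names, repetition count, and seed are assumptions made for illustration.

```python
import random
from math import comb

N_INFANTS, N_HELPER = 16, 14  # 14 of 16 infants chose the helper toy


def exact_p_value(n=N_INFANTS, observed=N_HELPER, p=0.5):
    """Binomial P(X >= observed) when each infant picks helper with prob p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1)
    )


def simulated_p_value(n=N_INFANTS, observed=N_HELPER, reps=10_000, seed=1):
    """Flip n fair coins per repetition and count results >= observed."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if heads >= observed:
            extreme += 1
    return extreme / reps


print(round(exact_p_value(), 5))  # 0.00209
print(simulated_p_value())        # a small proportion, varies with the seed
```

Changing `observed` or `n` in the calls gives a quick way to explore the follow-up questions about a different number of successes or a different sample size.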
Example 3: Infants’ Social Evaluation
Learn probability distribution to answer inference
question from research study
Again the analysis is completed with
Modeling process of statistical investigation
Tactile simulation
Technology simulation
Mathematical model
Examination of methodology, further questions in study
Follow-ups
Different number of successes
Different sample size
Example 4: Sleepless Drivers
Sociology case-control study
Connor et al (2002) investigated whether those in
recent car accidents had been more sleep
deprived than a control group of drivers
                                No full night’s sleep   At least one full night’s   Sample sizes
                                in past week            sleep in past week
“case” drivers (crash)                   61                       510                   571
“control” drivers (no crash)             44                       544                   588
Example 4: Sleepless Drivers
Sample proportion that were in a car crash
Sleep deprived: .581
Not sleep deprived: .484
Odds ratio: 1.48
[Figure: segmented bar graph (0% to 100%) of crash vs. no crash for two groups: no full night’s sleep in past week; at least one full night’s sleep in past week]
How often would such an extreme observed odds ratio occur by
chance, if there was no sleep deprivation effect?
Example 4: Sleepless Drivers
Students investigate this question through
Computer simulation (Minitab)
Empirical sampling distribution of odds-ratio
Empirical p-value
Approximate mathematical model
[Figure: empirical sampling distribution of the odds ratio, with the observed value 1.48 marked]
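One way to build an empirical sampling distribution of the odds ratio is sketched below. The slides do not spell out the simulation design used in Minitab, so this sketch makes an assumption: under the null model, case and control drivers share a common probability of sleep deprivation, estimated by pooling the two samples. The function names, repetition count, and seed are likewise illustrative.

```python
import random

CASES = (61, 510)     # crash drivers: sleep deprived, not sleep deprived
CONTROLS = (44, 544)  # no-crash drivers: sleep deprived, not sleep deprived


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with rows (a, b) and (c, d)."""
    return (a * d) / (b * c)


def simulate_odds_ratios(reps=2000, seed=1):
    """Sample both groups with a common P(sleep deprived) (assumed null model)."""
    rng = random.Random(seed)
    n_cases, n_controls = sum(CASES), sum(CONTROLS)
    p_deprived = (CASES[0] + CONTROLS[0]) / (n_cases + n_controls)
    ratios = []
    for _ in range(reps):
        a = sum(rng.random() < p_deprived for _ in range(n_cases))
        c = sum(rng.random() < p_deprived for _ in range(n_controls))
        if c == 0 or a == n_cases:
            continue  # odds ratio undefined; essentially never happens here
        ratios.append(odds_ratio(a, n_cases - a, c, n_controls - c))
    return ratios


observed = odds_ratio(*CASES, *CONTROLS)
ratios = simulate_odds_ratios()
empirical_p = sum(r >= observed for r in ratios) / len(ratios)

print(round(observed, 2))  # 1.48
print(empirical_p)         # empirical one-sided p-value, varies with the seed
```

A histogram of `ratios` is right-skewed, while a histogram of their logarithms looks much closer to normal, which motivates working on the log scale.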
Example 4: Sleepless Drivers
\[
SE(\text{log-odds}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}
\]
where a, b, c, d are the four cell counts of the 2×2 table
Confidence interval for population log odds:
sample log-odds ± z* SE(log-odds)
Back-transformation
90% CI for odds ratio: 1.05 – 2.08
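The interval on this slide can be reproduced from the four cell counts. This is a minimal sketch; the function name and the use of Python's `statistics.NormalDist` to obtain z* are choices made here, not part of the course materials.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.90):
    """Odds ratio with a confidence interval built on the log scale."""
    ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # z* is the standard normal quantile leaving (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(ratio) - z_star * se_log)  # back-transform the endpoints
    upper = exp(log(ratio) + z_star * se_log)
    return ratio, lower, upper


# Cell counts from the case-control table:
# cases (61 deprived, 510 not), controls (44 deprived, 544 not).
ratio, lower, upper = odds_ratio_ci(61, 510, 44, 544)
print(round(ratio, 2), round(lower, 2), round(upper, 2))  # 1.48 1.05 2.08
```

Because the interval is symmetric on the log scale, the back-transformed interval is not symmetric around 1.48.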
Example 4: Sleepless Drivers
Students understand process through which
they can investigate statistical ideas
Students piece together powerful statistical
tools learned throughout the course to derive
new (to them) procedures
Concepts, applications, methods, theory
For more information
Investigating Statistical Concepts,
Applications, and Methods (ISCAM),
Cengage Learning, www.cengage.com
Instructor resources:
www.rossmanchance.com/iscam/
Solutions to investigations, practice problems,
homework exercises
Instructor’s guide
Sample syllabi
Sample exams