But…does it work? Do students truly learn the material better?






Small liberal arts college: 1350 undergraduate students
Statistician within Department of Math, Stat and CS
Class size: Stat 131 (30 students), 5-6 sections per year
3 hours per week in a computer or tech-enabled classroom

What we know about randomization approaches

What we don’t

What it means

Tintle et al. flavor (2013 version)
◦ Unit 1. Inference (single proportion)
◦ Unit 2. Comparing two groups
 Means, proportions, paired data
 Descriptives, simulation/randomization, asymptotic
◦ Unit 3. Other data contexts
 Multiple means, multiple proportions, two quantitative variables
 Descriptives, simulation/randomization, asymptotic

Qualitative
◦ Momentum:
 Attendance at conference sessions, workshops
 Publishers agreeing to publish the books
 Class testers/inquiries
 People doing this in their classrooms (clients, colleagues)
 Repeat users
◦ Appealing “in principle” and based on testimonials to date

Quantitative assessment

Tintle et al. (2011, 2012)
◦ Compared an early version of the curriculum (2009) to the traditional curriculum at the same institution as well as a national sample
◦ 40-question CAOS test
◦ Results
 Better student learning outcomes in some areas (design and inference); little evidence of declines
Example #1. Proportion of students correctly identifying that researchers want small p-values if they hope to show statistical significance (pre-test: 50-60% correct):

Cohort            Post-test correct
National sample   68%
Hope 2007         86%
Hope 2009         96%

Sample sizes: Hope ~200 per group; national sample 760. p < 0.001 between cohorts.
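As a rough illustration of that cohort comparison, here is a minimal Python sketch of a chi-square test on counts reconstructed from the rounded percentages and sample sizes above; the reconstructed counts, and the choice of chi-square as the test, are assumptions for illustration rather than the authors' actual analysis.

```python
from scipy.stats import chi2_contingency

# Counts reconstructed from the rounded slide figures (assumed:
# Hope cohorts ~200 students each, national sample 760)
totals = {"National": 760, "Hope 2007": 200, "Hope 2009": 200}
rates = {"National": 0.68, "Hope 2007": 0.86, "Hope 2009": 0.96}

# Contingency table: [correct, incorrect] counts per cohort
table = [[round(rates[g] * totals[g]), round((1 - rates[g]) * totals[g])]
         for g in totals]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.1f}, df = {dof}, p = {p:.2g}")
```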

2012-13 results: 14 instructors, 7 institutions, total combined sample size of 783.

Instructor (Inst, Class size)   Pre-test   Post-test   Change   Sample size
1 (LA, Med)                        70%        97%        27%        33
2 (LA, Med)                        73%        95%        22%        26
3 (Univ, Med)                      23%        95%        72%        40
4 (LA, Med)                        70%        96%        26%       127
5 (LA, Sm)                         28%        92%        64%        11
6 (Univ, Med)                      37%        96%        59%        49
7 (Univ, Sm)                       39%        73%        34%        23
8 (LA, Med)                        60%        97%        37%        35
9 (LA, Med)                        29%        96%        67%        95
10 (HS, Med)                       24%        74%        50%        38
11 (Univ, Large)                   68%        97%        29%       101
12 (LA, Med)                       63%        93%        30%        92
13 (LA, Med)                       28%        95%        68%        18
14 (LA, Med)                       56%        97%        41%        78



Institutional diversity in student background (pre-test)
Post-test performance very good for most (over 90%)
A couple of exceptions
◦ Both were first-time instructors with the curriculum who will use it again this year




Example 1 (continued).
First quiz, 2.5 weeks into the course; simulation for a single proportion.
119 people played RPS; 11.8% picked scissors.
Is this evidence that scissors are picked less than 1/3 of the time in the long run?

The following graph shows the 1000 different “could have been” sample proportions choosing scissors for samples of 119 people, assuming scissors is chosen 1/3 of the time in the long run.
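The class explores this with an applet; a minimal Python sketch of the same null simulation (1000 binomial samples of size 119 under a long-run proportion of 1/3) might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 119           # number of RPS players observed
p_null = 1 / 3    # long-run proportion of scissors under the null
observed = 0.118  # observed proportion picking scissors

# 1000 "could have been" sample proportions, assuming scissors
# really is chosen 1/3 of the time in the long run
sims = rng.binomial(n, p_null, size=1000) / n

# One-sided p-value: how often chance alone produces a proportion
# at least as small as the one observed
p_value = np.mean(sims <= observed)
print(f"estimated p-value: {p_value:.4f}")
```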

Would you consider the results of this study to be convincing evidence that scissors are chosen less often in the long run than expected?
◦ No, the p-value is going to be large: 8%
◦ No, the p-value is going to be small: 2%
◦ Yes, the p-value is going to be small: 77%
◦ Yes, the p-value is going to be large: 9%
◦ No, the distribution is centered at 1/3: 4%

Suppose the study had only involved 50 people but with the same sample proportion picking scissors. How would the p-value change?
◦ It would not change, the sample proportion was the same: 22%
◦ It would be smaller: 11%
◦ It would be larger: 66%
◦ Not enough information: 1%
Single instructor (me), 92 students, across 4 sections and 2 semesters.
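Continuing the sketch above (reusing rng, p_null, and observed from the earlier snippet), the sample-size question can be checked directly: with n = 50 the null distribution of sample proportions is wider, so the same 11.8% is less surprising and the estimated p-value comes out larger, which is why “it would be larger” is the intended answer.

```python
# Re-run the simulation with the smaller sample size of 50
sims_50 = rng.binomial(50, p_null, size=1000) / 50
p_value_50 = np.mean(sims_50 <= observed)
print(f"estimated p-value with n=50: {p_value_50:.4f}")
```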


Example #2. Moving beyond a specific item to sets of related items and retention
Tintle et al. 2012 (SERJ) + JSE
◦ Improvement in Data Collection and Design, Tests of Significance, and Probability (Simulation) on the post-test
◦ Data Collection and Design and Tests of Significance improvements were retained significantly better than in the consensus curriculum

Retention of knowledge about tests of significance (6 items from CAOS)
[Figure: percent correct (50-75% axis) at pre-test, post-test, and 4 months later for randomization vs. consensus curricula. Retention significantly better for randomization (p = 0.02).]

Example #3. How are weak students doing?
Performance on CAOS for lowest 1/3 of students (2007 vs. 2009)
[Figure: percent correct (25-45% axis) at pre-test and post-test for consensus vs. randomization curricula.]

2012-2013 results by pre-test score**

Group                          Pre-test   Post-test   Change
Lowest (n=210; 13 or less)       38%        55%        17%
Middle (n=329; 14-17)            52%        60%         8%
Highest (n=250; 18+)             66%        69%         3%

All changes are highly significant using paired t-tests (p < 0.001).
**Among those who completed the course; anecdotally we’re seeing a lower dropout rate now than with the consensus curriculum.
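For readers unfamiliar with the reported analysis, here is a minimal sketch of a paired t-test in Python; the pre/post scores below are simulated stand-ins (roughly matching the lowest group's averages on a 40-item test), not the study data.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(3)

# Simulated stand-in scores (out of 40) for n = 210 students:
# pre-test around 15/40 (~38%), average gain around 7 points (~17%)
pre = rng.normal(15, 3, size=210).round()
post = pre + rng.normal(7, 3, size=210).round()

# Paired t-test on each student's pre/post change
t_stat, p_value = ttest_rel(post, pre)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```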

Example #4. Understand new data contexts?

Old AP Statistics question: 10 randomly selected laptop batteries were tested, and the hours they lasted were measured.

To investigate whether the shape of the sample data distribution was simply due to chance or if it actually provides evidence that the population distribution of battery lifetimes is skewed to the right, the engineers at the company decided to take 100 random samples of lifetimes, each of size 10, sampled from a perfectly symmetric, normally distributed population with a mean of 2.6 hours and a standard deviation of 0.29 hours. For each of those 100 samples, the statistic sample mean divided by sample median was calculated. A dotplot of the 100 simulated skewness ratios is shown below.
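Here is a minimal Python sketch of the engineers' simulation as described; the observed skewness ratio is not given on this slide, so the value used below is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(4)

mu, sigma, n, reps = 2.6, 0.29, 10, 100

# 100 random samples of 10 lifetimes each from a perfectly
# symmetric, normally distributed population
samples = rng.normal(mu, sigma, size=(reps, n))

# Skewness ratio for each sample: sample mean / sample median
ratios = samples.mean(axis=1) / np.median(samples, axis=1)

# Compare an observed ratio to the simulated null distribution
observed_ratio = 1.03  # hypothetical placeholder, not from the slide
print(f"fraction of null ratios >= observed: "
      f"{np.mean(ratios >= observed_ratio):.2f}")
```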

What is the explanation for why the engineers carried out the process above?
◦ This process allows them to determine the percentage of the time the sample distribution would be skewed to the right: 3%
◦ This process allows them to compare their observed skewness ratio to what could have happened by chance if the population distribution was really symmetric/normally distributed: 64%
◦ This process allows them to determine how many times they need to replicate the experiment for valid results: 10%
◦ This process allows them to compare their observed skewness ratio to what could have happened by chance if the population distribution was really right-skewed: 23%


Analysis of all (free-response) class tests is ongoing.
Can students integrate the observed statistic and simulated values to draw a conclusion?

Summary
◦ Preliminary and current versions showed improved performance in understanding of tests of significance, design, and probability (simulation) post-course, and improved retention in these areas
◦ These results appear stable across lower-performing students with older and newer versions of the curriculum
◦ Some evidence of student ability to apply the framework of inference (3-S) to novel situations

Summary
◦ Some instructor differences, but also preliminary validation of “transferability” of findings across different institutions/instructors; new instructors?
◦ **Note: Some evidence of weaker performance in descriptive stats in this earlier curriculum; substantial changes to the descriptive statistics approach to combat this.



What’s making the change?
◦ Content?
◦ Pedagogy?
◦ Repetition?
How much randomization before you see a change?
Are there differences in student performance based on curricula? Are they important?





What are the developmental learning trajectories for inference (do they understand what we mean by ‘simulation’)? Other topics?
Low-performing students: promising so far (ACT, GPA)
Does improved performance transfer across institutions/instructors? What kind of instructor training/support is needed to be successful?
Using CAOS (or adapted CAOS) questions, but do we still all agree these are the “right” questions? Is knowing what a small p-value means enough? What level of understanding are they attaining?
Why do students in both curricula tend to do poorly on descriptive statistics questions? Or in areas where we see little difference between curricula?

Preliminary indications continue to be positive.

Tag line for peers and clients:
◦ We are improving some areas (the important ones?) and doing no harm elsewhere
◦ You can cite similar or improved performance on nationally standardized/accepted/normed tests for the approach

Still lots of room for better understanding and continued improvement of the approach
◦ Student engagement (talk yesterday)

Next steps: a larger, more comprehensive assessment effort coordinated between users of the randomization-based curriculum and those who don't use it. If you are interested, let me know.

Author team (Beth Chance, George Cobb, Allan Rossman, Soma Roy, Todd Swanson, and Jill VanderStoep)

Class testers

NSF funding


Tintle NL, VanderStoep J, Holmes V-L, Quisenberry B, and Swanson T. “Development and assessment of a preliminary randomization-based introductory statistics curriculum.” Journal of Statistics Education 19(1), 2011.
Tintle NL, Topliff K, VanderStoep J, Holmes V-L, and Swanson T. “Retention of statistical concepts in a preliminary randomization-based introductory statistics curriculum.” Statistics Education Research Journal, 2012.