#### Transcript A Data-Oriented Active Learning Post-Calculus

##### CAUSE Webinar: Introducing Math Majors to Statistics

Allan Rossman and Beth Chance, Cal Poly – San Luis Obispo, April 8, 2008

##### Outline

- Goals
- Guiding principles
- Content of an example course
- Assessment
- Examples (four)

##### Goals

Redesign introductory statistics course for mathematically inclined students in order to:

- Provide balanced introduction to the practice of statistics at appropriate mathematical level
- Better alternative than “Stat 101” or “Math Stat” sequence for math majors’ first statistics course

##### Guiding principles (Overview)

1. Put students in role of active investigator
2. Motivate with real studies, genuine data
3. Repeatedly experience entire statistical process from data collection to conclusion
4. Emphasize connections among study design, inference technique, scope of conclusions
5. Use variety of computational tools
6. Investigate mathematical underpinnings
7. Introduce probability “just in time”

##### Principle 1: Active investigator

- Curricular materials consist of investigations that lead students to discover statistical concepts and methods
- Students learn through constructing own knowledge, developing own understanding
  - Need direction, guidance to do that
- Students spend class time engaged with these materials, working collaboratively, with technology close at hand

##### Principle 2: Real studies, genuine data

- Almost all investigations focus on a recent scientific study, existing data set, or student collected data
- Statistics as a science
- Frequent discussions of data collection issues and cautions
- Wide variety of contexts, research questions

##### Real studies, genuine data

- Popcorn and lung cancer
- Historical smoking studies
- Night lights and myopia
- Effect of observer with vested interest
- Kissing the right way
- Do pets resemble their owners
- Who uses shared armrest
- Halloween treats
- Heart transplant mortality
- Lasting effects of sleep deprivation
- Sleep deprivation and car crashes
- Fan cost index
- Drive for show, putt for dough
- Spock legal trial
- Hiring discrimination
- Comparison shopping
- Computational linguistics

##### Principle 3: Entire statistical process

- First two weeks:
  - Data collection
    - Observation vs. experiment (confounding, random assignment vs. random sampling, bias)
  - Descriptive analysis
    - Segmented bar graph
    - Conditional proportions, relative risk, odds ratio
  - Inference
    - Simulating randomization test for p-value, significance
    - Hypergeometric distribution, Fisher’s exact test
- Repeat, repeat, repeat, …
  - Random assignment: dotplots/boxplots/means/medians, randomization test
  - Sampling: bar graph, binomial, normal approximation

##### Principle 4: Emphasize connections

- Emphasize connections among study design, inference technique, scope of conclusions
- Appropriate inference technique determined by randomness in data collection process
  - Simulation of randomization test (e.g., hypergeometric)
  - Repeated sampling from population (e.g., binomial)
- Appropriate scope of conclusion also determined by randomness in data collection process
  - Causation
  - Generalizability

##### Principle 5: Variety of computational tools

- For analyzing data, exploring statistical concepts
- Assume that students have frequent access to computing
  - Not necessarily every class meeting in computer lab
- Choose right tool for task at hand
  - Analyzing data: statistics package (e.g., Minitab)
  - Exploring concepts: applets (interactivity, visualization)
  - Immediate updating of calculations: spreadsheet (Excel)

##### Principle 6: Mathematical underpinnings

- Primary distinction from “Stat 101” course
- Some use of calculus but not much
- Assume some mathematical sophistication
  - E.g., function, summation, logarithm, optimization, proof
- Often occurs as follow-up homework exercises
- Examples
  - Counting rules for probability
  - Principle of least squares, derivatives to find minimum
  - Hypergeometric, binomial distributions
  - Univariate as well as bivariate setting
  - Margin-of-error as function of sample size, population parameters, confidence level

##### Principle 7: Probability “just in time”

- Whither probability?
  - Not the primary goal
  - Studied as needed to address statistical issues
  - Often introduced through simulation
    - Tactile and then computer-based
    - Addressing “how often would this happen by chance?”
- Examples
  - Hypergeometric distribution: Fisher’s exact test for 2×2 table
  - Binomial distribution: Sampling from random process
  - Continuous probability models as approximations

##### Content of Example Course (ISCAM)

| | Chapter 1 | Chapter 2 | Chapter 3 | Chapter 4 | Chapter 5 | Chapter 6 |
|---|---|---|---|---|---|---|
| Data Collection | Observation vs. experiment, confounding, randomization | | Random sampling, bias, precision, nonsampling errors | Paired data | Independent random samples | Bivariate |
| Descriptive Statistics | Conditional proportions, segmented bar graphs, odds ratio | Quantitative summaries, transformations, z-scores, resistance | Bar graph | Models, probability plots, trimmed mean | | Scatterplots, correlation, simple linear regression |
| Probability | Counting, random variable, expected value | Empirical rule | Bernoulli processes, rules for variances, expected value | Normal, Central Limit Theorem | | |
| Sampling/Randomization Distribution | Randomization distribution for $\hat{p}_1 - \hat{p}_2$ | Randomization distribution for $\bar{x}_1 - \bar{x}_2$ | Sampling distribution for $X$, $\hat{p}$ | Large-sample sampling distributions for $\bar{x}$, $\hat{p}$ | Sampling distributions of $\hat{p}_1 - \hat{p}_2$, OR, $\bar{x}_1 - \bar{x}_2$ | Chi-square statistic, F statistic, regression coefficients |
| Model | Hypergeometric | | Binomial | Normal, t | Normal, t, lognormal | Chi-square, F, t |
| Statistical Inference | p-value, significance, Fisher’s Exact Test | p-value, significance, effect of variability | Binomial tests and intervals, two-sided p-values, type I/II errors | z-procedures for proportions, t-procedures, robustness, bootstrapping | Two-sample z- and t-procedures, bootstrap, CI for OR | Chi-square for homogeneity, independence, ANOVA, regression |

##### Assessments

- Investigations with summaries of conclusions
- Worked out examples
- Practice problems
  - Quick practice, opportunity for immediate feedback, adjustment to class discussion
- Homework exercises
- Technology explorations (labs)
  - e.g., comparison of sampling variability with stratified sampling vs. simple random sampling
- Student projects
  - Student-generated research questions, data collection plans, implementation, data analyses, report

##### Example 1: Friendly Observers

- Psychology experiment
- Butler and Baumeister (1998) studied the effect of observer with vested interest on skilled performance

| | A: vested interest | B: no vested interest | Total |
|---|---|---|---|
| Beat threshold | 3 | 8 | 11 |
| Do not beat threshold | 9 | 4 | 13 |
| Total | 12 | 12 | 24 |

$\hat{p}_A = .250$, $\hat{p}_B = .667$

- How often would such an extreme experimental difference occur by chance, if there was no vested interest effect?

##### Example 1: Friendly Observers

Students investigate this question through:

- Hands-on simulation (playing cards)
- Computer simulation (Java applet)
- Mathematical model: counting techniques

$$\text{p-value} = P(X \le 3) = \frac{\binom{11}{3}\binom{13}{9} + \binom{11}{2}\binom{13}{10} + \binom{11}{1}\binom{13}{11} + \binom{11}{0}\binom{13}{12}}{\binom{24}{12}} \approx .0498$$

##### Example 1: Friendly Observers

- Focus on statistical process
  - Data collection, descriptive statistics, inferential analysis
  - Arising from genuine research study
- Connection between the randomization in the design and the inference procedure used
- Scope of conclusions depends on study design
  - Cause/effect inference is valid
- Use of simulation motivates the derivation of the mathematical probability model
- Investigate/answer real research questions in first two weeks

##### Example 2: Sleep Deprivation

- Physiology experiment
- Stickgold, James, and Hobson (2000) studied the long-term effects of sleep deprivation on a visual discrimination task (3 days later!)

| Sleep condition | n | Mean | StDev | Median | IQR |
|---|---|---|---|---|---|
| Deprived | 11 | 3.90 | 12.17 | 4.50 | 20.7 |
| Unrestricted | 10 | 19.82 | 14.73 | 16.55 | 19.53 |

- How often would such an extreme experimental difference occur by chance, if there was no sleep deprivation effect?

##### Example 2: Sleep Deprivation

Students investigate this question through:

- Hands-on simulation (index cards)
- Computer simulation (Minitab)
- Mathematical model

[Figure: distribution of the difference in group means under random assignment, with the observed difference 15.92 marked; p-values shown: .002 and .0072]

##### Example 2: Sleep Deprivation

- Experience the entire statistical process again
- Tools change, but reasoning remains same
- Develop deeper understanding of key ideas (randomization, significance, p-value)
- Tools based on research study and question, not for their own sake
- Simulation as a problem solving tool
- Empirical vs. exact p-values

##### Example 3: Infants’ Social Evaluation

- Sociology study
- Hamlin, Wynn, Bloom (2007) investigated whether infants would prefer a toy showing “helpful” behavior to a toy showing “hindering” behavior
  - Infants were shown a video with these two kinds of toys, then asked to select one
  - 14 of 16 10-month-olds selected helper
- Is this result surprising enough (under null model of no preference) to indicate a genuine preference for the helper toy?
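
The question about the 14 of 16 infants can be explored by simulation before any formula is introduced. The following is a minimal sketch in Python, assuming a null model in which each of the 16 infants picks the helper toy with probability 0.5, independently of the others. The function name, the 10,000 repetitions, and the seed are illustrative choices and do not come from the webinar.

```python
import random


def simulate_null_pvalue(n_infants=16, observed=14, reps=10_000, seed=1):
    """Estimate P(X >= observed) under the no-preference (coin flip) model.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        # One simulated study: count how many infants "choose" the helper.
        helpers = sum(rng.random() < 0.5 for _ in range(n_infants))
        if helpers >= observed:
            extreme += 1
    return extreme / reps


estimate = simulate_null_pvalue()
print(f"Empirical p-value: {estimate:.4f}")
```

The estimate varies with the seed and the number of repetitions; it only approximates the probability that the no-preference model produces a result at least as extreme as the one observed.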
##### Example 3: Infants’ Social Evaluation

- Simulate with coin flipping
- Then simulate with applet

##### Example 3: Infants’ Social Evaluation

Then learn binomial distribution, calculate exact p-value:

$$\text{p-value} = P(X \ge 14) = \binom{16}{14}(.5)^{14}(1-.5)^{2} + \binom{16}{15}(.5)^{15}(1-.5)^{1} + \binom{16}{16}(.5)^{16}(1-.5)^{0} \approx .0021$$

[Figure: Distribution plot, Binomial, n=16, p=0.5; horizontal axis: X = number who choose helper toy; vertical axis: Probability; probability 0.00209 marked at X = 14]

##### Example 3: Infants’ Social Evaluation

- Learn probability distribution to answer inference question from research study
- Modeling process of statistical investigation
  - Tactile simulation
  - Technology simulation
  - Mathematical model
- Again the analysis is completed with examination of methodology, further questions in study
- Follow-ups
  - Different number of successes
  - Different sample size

##### Example 4: Sleepless Drivers

- Sociology case-control study
- Connor et al (2002) investigated whether those in recent car accidents had been more sleep deprived than a control group of drivers

| | No full night’s sleep in past week | At least one full night’s sleep in past week | Sample sizes |
|---|---|---|---|
| “case” drivers (crash) | 61 | 510 | 571 |
| “control” drivers (no crash) | 44 | 544 | 588 |

##### Example 4: Sleepless Drivers

- Sample proportion that were in a car crash
  - Sleep deprived: .581
  - Not sleep deprived: .484
- Odds ratio: 1.48

[Figure: segmented bar graph (0% to 100%) of crash vs. no crash for the two groups: no full night’s sleep in past week, at least one full night’s sleep in past week]

- How often would such an extreme observed odds ratio occur by chance, if there was no sleep deprivation effect?
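
The descriptive statistics for Example 4, and one way to approach the "by chance" question, can be sketched in Python. The counts are those in the Example 4 table. The null model used here is an assumption: the case and control groups are simulated as two independent binomial samples sharing the pooled proportion of sleep-deprived drivers. The webinar's own simulation was done in Minitab and may have been set up differently. All function names are hypothetical.

```python
import random

# Counts from the Example 4 table.
CASE_DEPRIVED, CASE_RESTED = 61, 510        # "case" drivers (crash), n = 571
CONTROL_DEPRIVED, CONTROL_RESTED = 44, 544  # "control" drivers (no crash), n = 588


def odds_ratio(case_dep, case_rest, ctrl_dep, ctrl_rest):
    """Odds of sleep deprivation among cases divided by odds among controls."""
    denom = case_rest * ctrl_dep
    return float("inf") if denom == 0 else (case_dep * ctrl_rest) / denom


def crash_proportions():
    """Sample proportion in a crash, for deprived and non-deprived drivers."""
    deprived_total = CASE_DEPRIVED + CONTROL_DEPRIVED
    rested_total = CASE_RESTED + CONTROL_RESTED
    return CASE_DEPRIVED / deprived_total, CASE_RESTED / rested_total


def empirical_p_value(reps=2000, seed=1):
    """Share of simulated odds ratios at least as large as the observed one.

    Assumed null model: both groups share the pooled deprivation rate.
    """
    rng = random.Random(seed)
    n_case = CASE_DEPRIVED + CASE_RESTED
    n_ctrl = CONTROL_DEPRIVED + CONTROL_RESTED
    p_pooled = (CASE_DEPRIVED + CONTROL_DEPRIVED) / (n_case + n_ctrl)
    observed = odds_ratio(CASE_DEPRIVED, CASE_RESTED,
                          CONTROL_DEPRIVED, CONTROL_RESTED)
    extreme = 0
    for _ in range(reps):
        a = sum(rng.random() < p_pooled for _ in range(n_case))
        b = sum(rng.random() < p_pooled for _ in range(n_ctrl))
        if odds_ratio(a, n_case - a, b, n_ctrl - b) >= observed:
            extreme += 1
    return extreme / reps


p_dep, p_rest = crash_proportions()
print(f"Proportions in a crash: {p_dep:.3f} vs {p_rest:.3f}")
print(f"Odds ratio: {odds_ratio(61, 510, 44, 544):.2f}")
print(f"Empirical p-value: {empirical_p_value():.3f}")
```

The odds ratio is the same whether it is computed from the odds of a crash within each sleep group or from the odds of sleep deprivation within the case and control groups, which is why the simulation can work with the two groups of fixed size.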
##### Example 4: Sleepless Drivers

Students investigate this question through:

- Computer simulation (Minitab)
  - Empirical sampling distribution of odds ratio
  - Empirical p-value
- Approximate mathematical model

[Figure: empirical sampling distribution of the odds ratio, with the observed value 1.48 marked]

##### Example 4: Sleepless Drivers

$$SE(\text{log-odds}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

- Confidence interval for population log odds: sample log-odds ± z* SE(log-odds)
- Back-transformation
- 90% CI for odds ratio: 1.05 to 2.08

##### Example 4: Sleepless Drivers

- Students understand process through which they can investigate statistical ideas
- Students piece together powerful statistical tools learned throughout the course to derive new (to them) procedures
  - Concepts, applications, methods, theory

##### For more information

- Investigating Statistical Concepts, Applications, and Methods (ISCAM), Cengage Learning, www.cengage.com
- Instructor resources: www.rossmanchance.com/iscam/
  - Solutions to investigations, practice problems, homework exercises
  - Instructor’s guide
  - Sample syllabi
  - Sample exams
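
As a numerical check on the Example 4 confidence interval, the log-scale calculation can be sketched in Python. The cell labels a, b, c, d follow the SE formula on the slide and are mapped here to the four counts in the Example 4 table. The critical value z* = 1.645 for 90% confidence is the usual standard normal value and is an assumption, since the slide does not state it. The function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z_star):
    """Confidence interval for the odds ratio (a*d)/(b*c) via the log scale."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the interval for the log odds ratio, then back-transform.
    return math.exp(log_or - z_star * se), math.exp(log_or + z_star * se)


# a, b: sleep-deprived drivers (crash, no crash)
# c, d: non-deprived drivers (crash, no crash)
low, high = odds_ratio_ci(61, 44, 510, 544, z_star=1.645)
print(f"90% CI for odds ratio: {low:.2f} to {high:.2f}")  # 1.05 to 2.08
```

Working on the log scale is what makes the normal approximation reasonable; the interval is not symmetric around 1.48 after back-transformation.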