Transcript PowerPoint
What Every Math Professor
Needs to Know About Statistics
Rick Cleary
Babson College
For AMATYC Denver, 11/19/16
Thanks!
• To AMATYC and President Jane
Tanner for the invitation!
• To all of the excellent teachers who
helped me think about these topics.
• To all of you for the really important
work you do. I am glad I got to
experience it!
Three friends of mine…
• Prof. Ken Mann… Ken's Website
-Mohawk Valley Community College
• Prof. Jim Hobert … Jim's Website
-SUNY Ulster
• Prof. Marty Wells … Marty's website
-Shasta College
A bit about Babson…
• Located in Wellesley, Massachusetts
• 2200 undergraduates, all business
majors with different concentrations.
• All students take at least two semesters
of a quant methods sequence with
statistics, a little calculus, math of
finance and operations research.
• Quite a few two-year college transfers!
A fun puzzle!
• Find the next number in each sequence
A.) 91, 72, 45, 9, 81 …
B.) 13, 19, 60, 51, 9 …
(This is the one previewed for the
conference.)
Sequence A
• The next number is …
63
-Sequence was 91, 72, 45, 9, 81 …
Treat each term as a two-digit number (so
9 is 09), reverse the digits, then take the
absolute value of the difference to get the
next term.
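The rule for Sequence A can be checked mechanically. The following is a minimal Python sketch, assuming a one-digit term such as 9 is padded to 09 before reversing; the function names are illustrative and not from the talk.

```python
def next_term(n: int) -> int:
    """Reverse the two digits of n (padding with a leading zero) and return |n - reversed|."""
    reversed_n = int(f"{n:02d}"[::-1])
    return abs(n - reversed_n)


def sequence_a(start: int, length: int) -> list:
    """Generate `length` terms of the digit-reversal sequence beginning at `start`."""
    terms = [start]
    while len(terms) < length:
        terms.append(next_term(terms[-1]))
    return terms


print(sequence_a(91, 6))  # [91, 72, 45, 9, 81, 63]
```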
Sequence B
• 13, 19, 60, 51, 9, …
• The next number is …
Last call!
B.) 13, 19, 60, 51, 9, …
How about π? Probably not?
1,346,700?
-215?
What’s NOT next?
• Thinking about what the answer is NOT
gives us quite a bit of information ...
• OK … on to the solution:
Solution
• I don’t know!
…B is a list of five numbers randomly
generated using Minitab on a uniform
distribution on {1, 2, …, 100}.
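The talk generated Sequence B in Minitab. As a rough equivalent, here is a Python sketch that draws five values uniformly from {1, 2, …, 100} and estimates how often a fixed guess matches the next draw. The function names, the guess of 42, and the seeds are assumptions made for illustration.

```python
import random


def draw_sequence(count=5, low=1, high=100, seed=None):
    """Draw `count` independent integers uniformly from {low, ..., high}."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


def guess_hit_rate(guess=42, trials=100_000, seed=0):
    """Estimate how often a fixed guess equals a fresh uniform draw from 1..100."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 100) == guess for _ in range(trials))
    return hits / trials


print(draw_sequence(seed=2016))  # five values with no pattern to find
print(guess_hit_rate())          # close to the theoretical 1/100
```

Whatever rule a guesser invents, the long-run hit rate stays near 1 in 100, which is the sense in which a correct guess "would have to be lucky."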
LESSON NUMBER 1
• Some things are inherently random.
Somebody might correctly guess the next
number in the sequence, but they would
have to be lucky.
• Or to put it another way:
The world is more random than
most people think!
Other examples:
• Stars in the night sky… pretty random
but people attached stories to them!
• Stock market results … analysts on the
news offer reasons every day for an
inherently random process!
Another fun experiment
• Step 1: Draw a square
• Step 2: Place five points (x’s) at
random in the square
• Step 3: Draw nine equal sub-squares in
your original, i.e. a tic-tac-toe board in
your original square.
A guess…
• Your five points are in five different sub-squares, aren’t they? (And if yours
aren’t, your students will be!)
• THAT’S NOT RANDOM!!! That’s a
pattern. Real randomness is ‘clumpier’
than people expect.
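To see how "clumpy" true randomness is here, one can compute the chance that five independent, uniformly placed points land in five different sub-squares of the 3 × 3 grid. The sketch below, with illustrative function names and an arbitrary seed, gives the exact value and a simulation check.

```python
import math
import random


def prob_all_different(points=5, cells=9):
    """Exact chance that `points` uniform points fall in distinct cells: 9*8*7*6*5 / 9**5."""
    return math.perm(cells, points) / cells ** points


def simulate(points=5, trials=100_000, seed=0):
    """Monte Carlo check: drop points in the unit square and bin them on a 3 x 3 grid."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        occupied = {(int(rng.random() * 3), int(rng.random() * 3)) for _ in range(points)}
        hits += len(occupied) == points
    return hits / trials


print(prob_all_different())  # 15120 / 59049, about 0.256
print(simulate())            # should land close to the exact value
```

So truly random points share a sub-square roughly three times out of four, while hand-placed points almost never do.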
A quick representation of
Lesson #1
• MATH: y = f(x)
• STAT: y = f(x) + error
Lesson #2 - Models
• Quick quiz …
1.) True/False: Mathematics is a useful
tool for modeling real world problems.
2.) Name one of the problems.
Lesson #2 - Models
Let’s see what the crowd likes for math
models:
What about stat models?
They are everywhere … like in any
comparative sentence!
“This is a pretty good sandwich for a fast
food place!”
Lesson #2 in language…
“This is a pretty good sandwich for a fast
food place!”
Quality = f(venue) + error
“Pedroia has a lot of power for such a
small guy.”
Power = f(size) + error
Power = f(size) + error
• A mathematician modeling this relationship
might develop an equation or system that
has inputs like swing plane, bat speed,
pitch speed and location and an output of
a trajectory…
• A statistician would be interested in the
strength of the association between the
inputs and the outputs. How much of
power can be explained by size?
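One common way to answer "how much of power can be explained by size?" is the coefficient of determination, R², from a straight-line fit. The sketch below uses entirely simulated, hypothetical numbers (not real player data) generated as f(size) + error; the slope, noise level, and variable names are assumptions chosen only to illustrate the calculation.

```python
import random


def r_squared(xs, ys):
    """Share of the variance in ys explained by a least-squares line in xs (squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


rng = random.Random(0)
# Hypothetical "size" values and a made-up relationship: power = 2 * size + error.
sizes = [rng.uniform(65, 78) for _ in range(200)]
power = [2.0 * s + rng.gauss(0, 10) for s in sizes]

print(r_squared(sizes, power))  # fraction of simulated "power" explained by "size"
```

With no error term the function returns 1; the noisier the error, the closer R² gets to 0.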
To Recap:
• Lesson 1: The world is more random
than people think.
• Lesson 2: Statements that can be
interpreted as statistical models are
common.
Models to decisions:
• Statistical models are easy to find, but
studying them and applying them takes
a careful approach. This is where
statistical expertise comes in!
-What model?
-What population?
-What sample?
-What measures?
-Outliers?
-Presentation?
A timely aside…
• Statisticians' reputations took a hit with
the recent election. But a sophisticated
look at the question paints a much less
negative picture.
• Election eve www.fivethirtyeight.com
had the probability of Trump winning the
election at about 29%. Events like that
happen all the time!
A timely aside:
• By and large the election results
nationally were well within the margins
of error of the polls.
• Media outlets and their consumers want
clear-cut results like “Clinton will win”
when a statement like “Clinton appears
likely to win, but Trump has a
reasonable chance” tells the story!
A timely aside:
• That said, the fact that most polling data
had a small but consistent bias is
something that statisticians within those
polling firms will need to address.
• Opinion: The most likely causes are
non-response (not false response) and
incorrect predictions of who was likely to vote.
Probability and Statistics
• Many math departments have a course
with this title.
Inside joke: Why are most Prob and Stat
courses like BWI, the Baltimore-Washington International Airport?
Answer: BWI is a lot closer to Baltimore.
Lesson 3…
• Statisticians teaching probability should
try to back up the theoretical probability
models with real data and statistical
thinking.
Speaking of sports…
Sports media and sports fans love streaks
…
Famous examples?
HOWEVER, sports media and sports fans
tend to underestimate the probability of
streaks by taking them out of context.
More on streaks…
• Google the phrase “Sports Probability
Streaks” and the resulting link is:
http://www.mathaware.org/mam/2010/essays/ClearyStreaks.pdf
Streak Example:
• In a 2007 game, the Boston Red Sox hit
four home runs in a row.
• The next day, the Boston Globe quoted
a local math professor as saying that the
chance of this was about one in two million.
• That figure comes from raising the MLB
home run rate (.027) to the fourth power.
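The quoted figure is easy to reproduce. This short Python sketch repeats the arithmetic, assuming four independent plate appearances that each end in a home run at the league rate of .027; the variable names are illustrative.

```python
hr_rate = 0.027              # MLB home run rate per plate appearance, as given in the talk
p_four_in_a_row = hr_rate ** 4

print(p_four_in_a_row)              # roughly 5.3e-07
print(round(1 / p_four_in_a_row))   # roughly 1.9 million, i.e. "about one in two million"
```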
Surprising streaks
• Seems reasonable, right?
• EXCEPT this was the fourth time it had
happened in about 140,000 MLB
games.
• So how was the expert opinion off by
two orders of magnitude???
Two reasons…
• Small but significant reason… the four
players (Manny Ramirez, Mike Lowell,
JD Drew, Jason Varitek) all had much
higher than average home run rates.
• Big reason: Context! Four home runs
in a row during the game vs. four home
runs in a row RIGHT NOW!
Context explained…
• About 80 hitters come up during a major
league baseball game so the Red Sox
and opponent (Yankees) had MANY
CHANCES to start such a streak!
• This is why streak probabilities are
underestimated; people forget there are
many chances to start a streak.
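The effect of "many chances" can be quantified with a small dynamic program. The sketch below is an illustration under stated assumptions, not the speaker's calculation: each plate appearance is an independent trial at the league rate of .027, the roughly 80 hitters split evenly into 40 per team, and a streak may span innings. The function name is hypothetical.

```python
def prob_run(n, k, p):
    """P(at least one run of k consecutive successes in n independent trials)."""
    # state[j] = probability of a current streak of j successes with no run of k so far
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        new[0] = sum(state) * (1 - p)      # a failure resets the streak
        for j in range(k - 1):
            new[j + 1] = state[j] * p      # a success extends the streak
        # the mass state[k - 1] * p completes a run of k and is dropped from `state`
        state = new
    return 1 - sum(state)


hr_rate = 0.027
per_team = prob_run(40, 4, hr_rate)        # assumed 40 plate appearances per team
per_game = 1 - (1 - per_team) ** 2         # either team produces the streak
expected = 140_000 * per_game              # about 140,000 MLB games, per the talk

print(per_game)   # far larger than hr_rate ** 4
print(expected)   # expected number of such games under these assumptions
```

Under these assumptions the expected count over 140,000 games is a handful, the same order of magnitude as the four occurrences cited in the talk, and that is before accounting for the above-average hitters involved.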
Classroom fun…
1.) Have some students actually flip a
coin about 80 times, and tell others to just
simulate by writing H’s/T’s. The ones with
streaks of five or more H’s or T’s really did
the flips. (You can look like a genius! …
OR not!)
Classroom fun…
2.) Sports-interested students: Find some
NBA teams with records of about .500 at
www.basketball-reference.com
See if they had a winning streak or losing
streak of at least five in a row. (About
85% chance that yes they did…)
Does ANY fan, writer, or blogger think a
five-game winning streak is random?
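Both classroom activities rest on the same calculation: the chance of at least one streak of five or more identical outcomes in a sequence of independent trials. The sketch below computes it exactly for a fair coin, assuming independent outcomes and an 82-game NBA season for the .500 team; real teams are only approximately coin flips, and the function name is illustrative.

```python
def prob_streak_either(n, k, p=0.5):
    """P(some run of at least k identical outcomes in n independent trials), k >= 2."""
    # win[j] / loss[j]: probability the current streak is j + 1 wins / losses, no run of k yet
    win = [0.0] * (k - 1)
    loss = [0.0] * (k - 1)
    win[0], loss[0] = p, 1 - p                 # outcome of the first trial
    for _ in range(n - 1):
        new_win = [0.0] * (k - 1)
        new_loss = [0.0] * (k - 1)
        new_win[0] = sum(loss) * p             # a win ends a losing streak
        new_loss[0] = sum(win) * (1 - p)       # a loss ends a winning streak
        for j in range(k - 2):
            new_win[j + 1] = win[j] * p        # winning streak grows by one
            new_loss[j + 1] = loss[j] * (1 - p)
        win, loss = new_win, new_loss          # streaks reaching k are dropped
    return 1 - sum(win) - sum(loss)


print(prob_streak_either(80, 5))   # 80 coin flips
print(prob_streak_either(82, 5))   # 82-game season for a .500 team
```

In this idealized model both probabilities come out above 90 percent, so the talk's "about 85%" is, if anything, a cautious figure.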
Lesson 3 → Lesson 1
• Recapping again:
1.) The world is more random than you
think.
2.) Stat models are everywhere and take
that randomness into account.
3.) Check claims with real data … and
back to 1!
What About the Other Way?
• What can statisticians learn from
mathematicians?
• Sometimes statisticians need to
evaluate integrals in high dimensional
spaces.
• Examples: Variance calculations, joint
probabilities, Bayesian analysis…
Stat learns from Math…
• These high dimensional integrals are
sometimes easiest to evaluate in a
particular order ...
• A key question: WHEN CAN WE
CHANGE THE ORDER OF
INTEGRATION?
Math to The Rescue!
Fubini’s Theorem! Suppose A and B are
complete measure spaces. Suppose f(x,y)
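is A × B measurable. If
$$\int_{A \times B} |f(x,y)| \, d(x,y) < \infty,$$
where the integral is taken with respect to a
product measure on the space over A × B,
then we can change the order of
integration… i.e.
Fubini continued…
$$\int_A \left( \int_B f(x,y) \, dy \right) dx = \int_B \left( \int_A f(x,y) \, dx \right) dy = \int_{A \times B} f(x,y) \, d(x,y).$$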
A slight modification to Fubini
• A statistician has a slightly different
answer to the question, “When can we
change the order of integration?”
Whenever we want!
Neat Streak #2…
• Barton College basketball…
1/21/13 … Barton 76, Pfeiffer 68
1/24/13 … Barton 76, Queens 68
1/26/13 … Barton 76, Erskine 68
What’s the next score in THIS
sequence??
(Alas, they lost to Mount Olive, 80-75.)
What are the chances?
• Of a team winning three straight games
by the same score?
• For YOUR favorite or hometown team,
this season, the probability is very
small!
• For SOME team in somebody’s town
during the next 20 years? There’s a
pretty good chance it will happen again!
Neat Streak #3…
• The Chicago Blackhawks opened the
2013 National Hockey League season
with 24 straight games without a
regulation-time loss.
• Again, what are the chances? And again,
the media was interested:
http://www.usatoday.com/story/sports/nhl/blackhawks/2013/03/04/blackhawks-streakodds/1963659/
Getting quoted is (usually)
fun…
• Later in the year, this appeared in
USA Today:
• http://www.usatoday.com/story/sports/2013/05/23/home-iceadvantage/2354541/
The lesson here is…
A business implication: Mathematicians
rarely get consulting jobs solving
problems from Calculus I. But
statisticians routinely get paid (and
praised) for solving problems from Stat
101.
• THANKS for your attention and for all
the good work you do!