Statistics – Making Sense of Data

History of Math – Fall 2006
Fred Stenger
Larry L. Harman
Where did the term statistics come from?
• Derived in the 18th century from the Latin
term statisticum collegium (“council of
state”), because statistics represented the
scientific study of state affairs.
• The state affairs included herd sizes, grain
supplies, army strength, etc. – information
the government would use to predict and
prepare for military action, famine, plague,
etc.
• Some scholars say these needs drove
the invention of numbers themselves.

Early Developments
• John Graunt – London, 1662 – studied the Bills of
Mortality from 1604 to 1661.
• Observed that more males than females are born
and that women live longer.
• Observed that the annual death rate is constant, barring
epidemics – an early example of data tracking.
• Edmond Halley – 1693 – famous astronomer (Halley’s
Comet). Began actuarial tables (actuarial science), the
beginning of data tracking for insurance companies.

Statistics vs. Probability

• In the 1700s, probability and statistics developed
together as two related fields of the mathematics of
uncertainty.
• Probability explores what can be said about an
unknown sample of a known collection. For example:
knowing all possible combinations of a pair of dice,
what is the likelihood of rolling a seven?
• Statistics explores what can be said about an
unknown collection by investigating a small sample.
For example: knowing the life spans of 100 Americans,
we can estimate how long Americans are likely to live.
• Astronomy drove some of the later developments,
when astronomers used the mean (average) of their
measurements to predict minute changes in planet and
star positions – looking for a way to handle outliers
and unexplained differences.
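The two examples above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the life spans are invented numbers, not real data.

```python
from fractions import Fraction
from itertools import product
from statistics import mean

def probability_of_sum(target, sides=6):
    """Exact probability that two fair dice add up to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))  # all 36 pairs
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))

def estimate_mean(sample):
    """Estimate a population mean from a sample drawn from it."""
    return mean(sample)

# Probability: the collection is known (36 equally likely pairs),
# the outcome of the next roll is not.
p_seven = probability_of_sum(7)

# Statistics: the collection is unknown, only a sample is in hand.
# These life spans are invented for illustration only.
hypothetical_life_spans = [72, 81, 68, 90, 77, 85, 79, 74]
estimated_life_span = estimate_mean(hypothetical_life_spans)

print("P(seven) =", p_seven)
print("Estimated life span =", estimated_life_span)
```

The first function counts outcomes in a fully known collection, while the second can only generalize from the sample it is given.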
More Developments
• Jakob Bernoulli (1654-1705), “Ars conjectandi” (the art of
conjecture).
• “Law of Large Numbers” - the larger the sample, the
better it represents the actual data or circumstances. This
seems intuitive to us now, but back then the required
sample size was unknown.
• Abraham de Moivre (1733) established the famous bell-shaped
curve that approximates the binomial distribution (with
probabilities of 0.5) - what we now call the normal curve.
• De Moivre used this idea (later rediscovered by Gauss and
Laplace) to improve on Bernoulli’s estimates.
• His ultimate goal was to apply probability and statistics
to society’s practical questions.
• The process came together with Pierre Simon Laplace’s
publication in 1812 of the “Analytical Theory of
Probabilities”.
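Both ideas on this slide can be checked numerically. The sketch below is an illustration under simple assumptions (a fair coin, a fixed random seed, function names invented here): it simulates coin tosses to show the fraction of heads settling near 0.5, then compares exact binomial probabilities with the normal curve that has the same mean and variance.

```python
import math
import random

def head_fraction(n_flips, rng):
    """Fraction of heads in n_flips simulated fair-coin tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

def binomial_half(n, k):
    """Exact probability of k heads in n fair-coin tosses."""
    return math.comb(n, k) / 2 ** n

def normal_curve(n, k):
    """Normal density with the binomial's mean (n/2) and variance (n/4)."""
    mean, var = n / 2, n / 4
    return math.exp(-((k - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Law of Large Numbers: larger samples track the true value 0.5 more closely.
rng = random.Random(2006)  # fixed seed so the run is repeatable
for n_flips in (10, 100, 10_000, 100_000):
    print(n_flips, head_fraction(n_flips, rng))

# De Moivre: for p = 0.5 the binomial is close to the normal curve.
n = 100
for k in (40, 45, 50):
    print(k, round(binomial_half(n, k), 4), round(normal_curve(n, k), 4))
```

For 100 tosses the two columns of the second loop already agree to about three decimal places, which is the sense in which the normal curve "improves on" direct counting.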
More Developments...cont.
• Adrien Marie Legendre (1752 - 1833)
published in 1805 “la méthode des
moindres quarrés”, or the method of least
squares. His method rivaled that of
Laplace in that he used the errors to help
with overall predictions. “By this method,
a kind of equilibrium is established among
the errors which, since it prevents the
extremes from dominating, is appropriate
for revealing the state of the system which
most nearly approaches the truth.”
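A minimal sketch of least squares for a straight line, assuming the familiar textbook form of the method (choose the slope and intercept that minimize the sum of squared errors). The function name and the data points are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print("slope =", round(slope, 4), "intercept =", round(intercept, 4))
print("sum of errors =", round(sum(residuals), 10))
```

The errors of the fitted line sum to zero (up to rounding), which is one way to read the "equilibrium among the errors" in Legendre's description.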
Statistics in Social Sciences
• Statistics then began making inroads into the
social sciences (roughly where it had started)
in 1835, when Lambert Quetelet of Belgium
published a book called “Social Physics”.
In this book he attempted to apply the
laws of probability to the study of human
characteristics.
• Unlike other social sciences, psychology
embraced this method of statistical
analysis.
Statistics Emerges
With many advances in the 19th century, statistics
emerged from the shadow of probability to become a
mathematical discipline in its own right. The advances
centered on data collection and processing. The
major contributors were:
• Sir Francis Galton (1860s), a first cousin of Charles
Darwin, tried to use statistics to improve the human
race through selective breeding (the eugenics movement).
The two methods of data analysis he is credited with are
regression and correlation. He used these methods to
predict hereditary traits in humans.
• Karl Pearson and his student G. Udny Yule refined Galton’s
work into an effective methodology of regression
analysis using a subtle variant of Legendre’s method
of least squares. This paved the way for widespread
use of statistics throughout the biological and social
sciences.
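As a rough illustration of correlation, the sketch below computes the correlation coefficient in its standard modern form (commonly named after Pearson). The function name and the parent and child heights are invented for this example; they are not Galton's data.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented heights in inches, for illustration only.
parent_heights = [64, 66, 68, 70, 72]
child_heights = [66, 66, 68, 69, 70]

r = pearson_r(parent_heights, child_heights)
print("correlation =", round(r, 4))
```

A value near 1 means the two traits rise together almost in a straight line, a value near 0 means no linear relationship, and a value near -1 means one falls as the other rises.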

Modern Developments
• Some modern advances came from William S. Gosset,
working as a statistician for the Guinness Brewery, where
he studied sample size and how to derive reliable results
from small samples.
• Ronald A. Fisher (1890 – 1962), widely considered the most
important statistician of the early 20th century, wrote the
books “Statistical Methods for Research Workers” and
“The Design of Experiments”.
• With computers and significantly larger data sets,
statisticians can provide more accurate predictions. John
Tukey of Bell Labs and Princeton University invented
(1960s) what he called “Exploratory Data Analysis” – a
collection of methods for dealing with today’s large data
sets. He also coined the words “software” and “bit”.
Where are statistics now?
• Stephen Stigler sums up the ascent of
statistics in modern society – “modern
statistics … is a logic and methodology for the
measurement of uncertainty and for an
examination of the consequences of that
uncertainty in the planning and interpretation of
experimentation and observation”.
• The work of these statisticians illustrates the evolution
of statistics and data analysis from the social sciences
(life expectancy, actuarial tables, politics, etc.) to
company-related tools like quality assurance,
design of experiments, and product failure charts.
Penny Flip
• Probability that the first head arrives on an odd
toss.
• Flip a penny.
• Head - first head is on an odd throw.
• Tail - flip again.
• Second flip:
• Head - throw is even.
• Tail - flip again.
• Third try:
• Head - throw is odd.
• Tail - flip again, etc.
• Do this 5 times. Keep track of results.
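Penny Flip …. cont.
• Sum of Probabilities (Success on
Odd Throw) = $\frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{2k-1} = \frac{2}{3} \approx .67$
• Sum of Probabilities (Success on
Even Throw) = $\frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots = \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^{2k} = \frac{1}{3} \approx .33$
• Check Statistically
• 1-Proportion Z-Test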
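The penny-flip exercise can also be run on a computer. The sketch below is one possible version, assuming a fair coin, a fixed random seed, and the usual one-proportion z statistic; all function names are invented here. It adds up the two series, simulates many rounds of the game, and computes a z statistic for the observed proportion against the expected 2/3.

```python
import math
import random

def first_head_toss(rng):
    """Flip a fair coin until a head appears; return the toss number."""
    toss = 1
    while rng.random() >= 0.5:  # tail: flip again
        toss += 1
    return toss

def odd_fraction(trials, rng):
    """Fraction of trials in which the first head came on an odd toss."""
    odd = sum(first_head_toss(rng) % 2 == 1 for _ in range(trials))
    return odd / trials

def one_proportion_z(p_hat, p0, n):
    """z statistic comparing an observed proportion with a claimed one."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Partial sums of the two series (30 terms each is plenty).
odd_sum = sum(0.5 ** k for k in range(1, 60, 2))    # 1/2 + 1/8 + 1/32 + ...
even_sum = sum(0.5 ** k for k in range(2, 61, 2))   # 1/4 + 1/16 + 1/64 + ...
print("odd:", round(odd_sum, 4), "even:", round(even_sum, 4))

# Simulated check of the odd-toss probability.
rng = random.Random(2006)  # fixed seed so the run is repeatable
trials = 10_000
p_hat = odd_fraction(trials, rng)
z = one_proportion_z(p_hat, 2 / 3, trials)
print("observed proportion:", p_hat, "z:", round(z, 2))
```

A z value close to zero means the observed proportion is consistent with 2/3. With only 5 rounds, as in the classroom version, the observed proportion can easily land far from .67.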

Design of Experiments
• Widely used in industry by medical
manufacturers, software developers, food
manufacturers, electronics manufacturers, etc. –
Six Sigma, IPC, ISO, etc.
• Not widely used in education, but could it be?
• Planning and set-up is the most important part of
DOE (determining independent variables,
dependent variables, control variables, random
control variables, etc.).
• Drag car racing DOE.
• Math teaching DOE exercise.
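One common way to plan such an experiment is a full factorial design, in which every combination of the independent variables is run once. The sketch below is a hypothetical illustration: the factor names and levels are invented and are not taken from the drag racing study.

```python
import random
from itertools import product

def full_factorial(factors):
    """List every combination of factor levels as a dict, one per run."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]

# Hypothetical independent variables, two levels each.
factors = {
    "tire_pressure": ["low", "high"],
    "fuel_grade": ["regular", "premium"],
    "launch_rpm": [3000, 4000],
}

runs = full_factorial(factors)

# Randomizing the run order helps keep uncontrolled variables
# from lining up with any one factor.
run_order = list(range(len(runs)))
random.Random(2006).shuffle(run_order)

for position in run_order:
    print(position + 1, runs[position])
```

Three factors at two levels each give 2 x 2 x 2 = 8 runs. The dependent variable (for example, elapsed time) would be recorded for each run and is not part of this sketch.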
Timeline
• 1662 – John Graunt published a pamphlet entitled Natural and Political
Observations Made upon the Bills of Mortality.
• Around the 1660s – John Graunt and William Petty founded the field of “Political
Arithmetic”.
• 1693 – Edmond Halley founded actuarial science.
• 1713 - Jakob Bernoulli, “Ars conjectandi” (the art of conjecture).
• “Law of Large Numbers”.
• 1733 - Abraham de Moivre established the famous binomial distribution curve.
• 1805 - Adrien Marie Legendre published “la méthode des moindres quarrés”,
or the method of least squares.
• 1812 - Pierre Simon Laplace’s “Analytical Theory of
Probabilities” is published.
• 1835 - Lambert Quetelet published a book called “Social Physics”.
• 1860s - Sir Francis Galton developed regression and correlation.
• 1890s - Karl Pearson and G. Udny Yule refined Galton’s work into an effective
methodology of regression analysis.
• Early 1900s - William S. Gosset studied sample size and how to derive reliable
results from small samples.
• 1925 and 1935 - R.A. Fisher wrote the books “Statistical Methods for Research
Workers” (1925) and “The Design of Experiments” (1935).
• Mid 1960s - John Tukey invented what he called “Exploratory Data Analysis”.
References
• Berlinghoff, William P. and Gouvêa, Fernando Q.
Math through the Ages: A Gentle History for
Teachers and Others. Oxton House Publishers, 2002.
• Katz, Victor J. A History of Mathematics. Pearson
Education, 2004.
• http://en.wikipedia.org/wiki/statistics
• http://cm.bell-labs.com/cm/ms/departments/sia/tukey/index.html
• www.statease.com/pubs/dragracing.pdf