“Adolphe Quetelet:
Statistics and Social Science in the Early
19th Century”
Evan Brott
February 3, 2003
Quetelet: 1796-1874
• Today, Quetelet is nearly unknown
• But, he made major contributions to statistics
• Also one of his era’s greatest social scientists
Main Works
• 1835 – Publishes Physique Sociale: A Treatise on
Man, and the Development of His Faculties, which
introduces the concept of the ‘Average Man,’
a basic concept in the
Social Sciences.
(That’s him on the right)
• 1846 – Is the first to fit a normal curve to a
distribution of human traits
Outline
1) Science in the 1830’s
2) The Early Life of Quetelet
3) The Average Man: a Study of Mortality
4) Comparisons of Average Men:
a Look at European Sex Ratios
5) Statistical Morality and Early ANOVA:
Crime and Punishment in 1820’s France
6) Fitting a Normal Curve:
the Chest Size of a Scotsman
7) Quetelet’s Legacy
Part I: Science in the 1830’s
or: They Thought WHAT!?
State of the Arts
• Quetelet’s research was from 1820-1850.
• MANY theories we take for granted were
not yet developed.
Biology
• 1859 – Darwin publishes The Origin of Species
• 1860s – Pasteur develops Germ Theory of
Disease
• 1865 – Mendel discovers basics of Genetics
Quetelet’s Environment
• Spontaneous Generation not disproved
• Quetelet believes Miasmic Theory of
Disease
• Many results seemed strange without
understanding heredity
Social Science
• Quetelet one of the
first mathematical
social scientists
• 1830’s beliefs seem
very strange today
• Ex: Phrenology:
Personality read by the
shape of the skull
Early Statistical History
• Beginnings in 17th century
• Studied Laws of Probability through Gambling
More Proto-Statistics
• 1680s: Newton and Leibniz independently
develop Theory of Calculus
• 1689: Bernoulli first states the
Law of Large Numbers
Normal Distribution
• 1733: De Moivre finds Normal Distribution
arises as a limit of the Binomial
• 1778-1812: Laplace develops the
Central Limit Theorem
• 1809: Gauss finds that most random errors
are distributed normally
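De Moivre's limit can be checked numerically. The sketch below compares exact binomial probabilities with the matching normal density and reports the largest gap; the choices p = 0.5 and n = 10, 100, 1000 are illustrative assumptions, not values from the slides.

```python
import math

def binomial_pmf(n, k, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mean, sd):
    """Density of the normal curve with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def max_gap(n, p=0.5):
    """Largest absolute gap between the binomial pmf and the matching normal density."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(n, k, p) - normal_pdf(k, mean, sd))
               for k in range(n + 1))

# The gap shrinks as n grows, which is the limit De Moivre described.
for n in (10, 100, 1000):
    print(n, max_gap(n))
```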
Future Statistical Knowledge
• 1890s: Pearson develops his correlation
coefficient
• 1908: Gosset (a.k.a. ‘Student’) develops the
t-distribution
• 1920s: Fisher’s work starts the modern era
of statistics
Part II: The Early Life of
Quetelet
or: how to build an observatory
without really trying
Origins
• Born on 2/22/1796
in Ghent, Belgium
• Doctorate in conic
sections from
University of
Ghent in 1819
Astronomy
• Initial post-doctoral work in astronomy
under Arago and Bouvard
• Famous story about founding Belgium’s
first observatory: traveled to France at age
26, and got funding despite having NO
experience at all.
Astronomical Statistics
• Galileo first showed astronomical
measurement errors were:
- random
- symmetric
- small errors occur more often than large
errors.
Hypothesized Error Distributions
• Thomas Simpson
(1756)
• Daniel Bernoulli
(1777)
• Carl Friedrich Gauss
(1809)
More Statistical Exposure
• Met the 75-year old Laplace while getting
funding for his observatory
• Post-doctoral mathematical work with
Fourier
The Census
• 1826: began work with the Belgian Department
of the Census- was in charge by 1829.
• All censuses at that time were total population
counts; Laplace thought of a simpler method
• Count the number of births in the whole country;
then multiply by the ratio of population/births measured in several sampled regions
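A minimal sketch of the ratio method, reading the ratio as people per birth in the sampled regions (an assumption about the slide's wording). The function name and all numbers are hypothetical and chosen only to show the arithmetic.

```python
def ratio_estimate(total_births, sample_population, sample_births):
    """Estimate total population as national births times sampled people-per-birth."""
    people_per_birth = sample_population / sample_births
    return total_births * people_per_birth

# Hypothetical figures: 100,000 births observed among 2,800,000 sampled people,
# and 1,000,000 births registered nationwide.
estimate = ratio_estimate(total_births=1_000_000,
                          sample_population=2_800_000,
                          sample_births=100_000)
print(estimate)  # 28000000.0
```

The estimate is only as good as the assumption that the sampled regions have the same birth rate as the rest of the country, which is the point de Keverberg's letter attacks.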
Quetelet’s Plan
• Quetelet was interested in Laplace’s method
• Received a letter from Baron de Keverberg
• The letter argued that social science has far too
many variables for random sampling to be trusted
• Quetelet was convinced, and conducted a full
census instead
PART III:
THE
AVERAGE
MAN
Physique Sociale
• Newton’s mechanical physics was highly
esteemed in Quetelet’s time
• Quetelet envisioned a similar Social Physics
• Central to this was the idea of
The Average Man – which was likened to a
social ‘center of gravity’
What is the Average Man?
• It’s exactly what you think it is
• Consider human size:
Small
AVERAGE
Large
Influential
• Quetelet was obviously not the first to think
of this sort of thing
• But he popularized it and, as we will see,
carried the concept much further
• It is a VERY common concept today
Nutritional Example
“The average man needs 250g of
carbohydrates each day”
Common Example
• “The Average Family has 2.4 Children”
(Here, we see the Average man doesn’t necessarily exist)
Political Example
- “The Average American will save $278
with my tax plan”
- “But 50% goes to the top 1% of Americans”
- “The bottom 20% pays no taxes”
- “The top 1% makes over $300,000 already”
- And so on . . .
Silly Example
• “The Average Man has less than 2 legs”
(Out of the world’s 6 billion people at least
10,000 have only 1 leg . . .)
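The arithmetic behind the joke, as a small sketch. It assumes, for illustration only, that everyone outside the 10,000 has exactly two legs.

```python
population = 6_000_000_000
one_legged = 10_000

# Everyone else is assumed to have exactly two legs (an illustrative simplification).
total_legs = 2 * (population - one_legged) + 1 * one_legged
mean_legs = total_legs / population

print(mean_legs)      # slightly below 2
print(mean_legs < 2)  # True
```

The mean drops below 2 even though almost every individual has exactly 2, so here the "average man" describes nobody.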
What Quetelet Thought
“If an individual at any given epoch of
society possessed all the qualities of the
average man, he would represent all that is
great, good, or beautiful.”
Cournot’s Critique
• “A totally average man, if forced to exist,
would be an unviable monstrosity: just as
the averages of several different right
triangles will not be a right triangle.”
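Cournot's triangle remark can be checked directly. The sketch below averages the side lengths of two right triangles; averaging side by side is one assumed reading of "average", and the two triangles are illustrative choices.

```python
def is_right_triangle(a, b, c, tol=1e-9):
    """True if the three sides satisfy the Pythagorean relation."""
    a, b, c = sorted((a, b, c))
    return abs(a * a + b * b - c * c) < tol

def average_triangle(triangles):
    """Average the triangles side by side (shortest with shortest, and so on)."""
    n = len(triangles)
    return tuple(sum(t[i] for t in triangles) / n for i in range(3))

triangles = [(3, 4, 5), (5, 12, 13)]  # both are right triangles
avg = average_triangle(triangles)

print(avg)                      # (4.0, 8.0, 9.0)
print(is_right_triangle(*avg))  # False, since 16 + 64 = 80, not 81
```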
Quetelet’s First Example
• The beginnings of Survival Analysis came
from Mortality Tables
• These listed the expected times of death
• In short, the Age of the Average Man
Quetelet’s Work
• Mortality- P(dying this year)*10,000
• Viability- 1/P(dying this year)
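The two quantities as defined on this slide, in a short sketch. The function names and the probability 0.02 are hypothetical, used only to show the scale of each measure.

```python
def mortality(p_dying_this_year):
    """Deaths per 10,000 people in a year."""
    return p_dying_this_year * 10_000

def viability(p_dying_this_year):
    """Reciprocal of the yearly death probability."""
    return 1 / p_dying_this_year

p = 0.02  # hypothetical yearly death probability
print(mortality(p), viability(p))
```

A lower death probability gives a lower mortality and a higher viability, so the two measures move in opposite directions.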
Part IV: Many Average Men
or: Where Male Babies Come From
Categories
• Quetelet did not only envision the Average
Man as a ‘global average’
• Rather, there was:
An Average Man – and Woman – for every
“race, location, age, and epoch – and all
combinations of these”
• Allowed between-group comparisons
Categories
• This was also understood before his time
• The mortality tables were divided by gender,
location, and occupation
• Still, Quetelet popularized and greatly
refined the notion
The Sex Ratio
• It is a biological fact that 1.06 male babies
are born for every female baby.
• Known as early as the 17th Century
• Why?
1.06 : 1.00
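For comparison with a 50/50 split, the ratio can be converted to a share of births. A minimal sketch; the function name is hypothetical.

```python
def male_share(males_per_female):
    """Convert a males-per-female birth ratio into the fraction of births that are male."""
    return males_per_female / (males_per_female + 1)

print(round(male_share(1.06), 4))  # 0.5146
```

So a ratio of 1.06 : 1.00 means roughly 51.5% of births are boys.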
Current Thought
• Evolutionary: men are more expendable
• Sources of variation:
- Prenatal diseases disproportionately affect boys
- First birth, younger women have more boys
- Effects of family planning
• Quetelet noticed most of these!
The Mind of God
• 1710: John Arbuthnot believes probability
evidences the Divine Mind:
• Sees sex ratio as evidence – more men die
in war, but still enough left to evenly match
with women
• One of the first applications of probability
outside of pure math / gaming
Quetelet: by Country
• Shows global average; evidence of variation
Sources of Variation
• Tried to explain why different countries had
different ratios
• Decided on racial differences (e.g. Russians
naturally have more boys than Swedes)
• Showed many other possible causes
South Africa
• Climate, Race, Lifestyle, Small Samples
Legitimacy
• The following page shows a table of births
by marital status
• Quetelet never said WHY this effect was
there – surely he didn’t think church
sanction ‘blessed’ the couple with more
boys?
• Proxy for age? Or social status?
Legitimacy
Age
• Quetelet presented other theories, this one
from Hofacker:
• Overstates effect
Other Theories
• Dismisses Bicke’s family planning theory
• Shows first marriages (not births) lead to
more boys
• Town vs. Country also considered
• Decides on Race
Still Births
• Several Chapters later, demonstrates that
Stillbirths are predominately male
• Does not realize that differing levels of
healthcare can exaggerate this effect, accounting for variation
Part V: Analysis of Crime
or: “If you must murder, try to be a well-educated woman over 30”
Victorian STAT 410
• Ordinary Least Squares had been known for
only a few decades (Legendre, 1805)
• ‘Regression’ would not be called such until
Galton in the 1870’s
• Hypothesis Testing, ANOVA still in
extremely vague state
Criminology
• Data collected from the French Courts of
Assize from 1825-1830
• Avg. Probability of Conviction: 0.614
Question
• P(conviction) = 0.614 for THE average man.
• Is this probability different for different groups
of people (different “ average ‘men’ ”)?
Answer: YES!
New Question: How can we
Explain this Variation?
• From the table, it appears that gender, age,
type of crime, appearance at trial, and
educational status are important.
• How can we tell which of these are
significantly different from 0.614?
• Which of these variations are more
significant than the yearly variation?
• Can we make multiple comparisons?
Quetelet’s Paradigm
• 3 sources of variation
- Constant
(e.g. women always have a lower rate)
- Variable
(e.g. conviction rate decreases w/ time)
- Accidental
(e.g. a change in alcohol policy at the
university causes more arrests, but not
convictions, in 1828.)
Analysis of Variation
Relative Degree of Influence
• Calculated as
| P(conviction | status) - 0.614 | / 0.614
• For instance, for crimes against property we
get |0.655-0.614|/0.614 = 0.067
• Thus, property crimes are ‘average crimes’
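The relative degree of influence as a small function, using the slide's overall rate of 0.614 and the property-crime rate of 0.655. The function and constant names are hypothetical.

```python
OVERALL_RATE = 0.614  # average conviction probability from the slides

def relative_degree_of_influence(group_rate, overall_rate=OVERALL_RATE):
    """Absolute gap between a group's rate and the overall rate, relative to the overall rate."""
    return abs(group_rate - overall_rate) / overall_rate

print(round(relative_degree_of_influence(0.655), 3))  # 0.067
```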
How to Assess Variability
without knowledge of 2
• Quetelet used (xmax-xavg)/xavg and
(xavg-xmin)/xavg
to give limits on variability.
• Hence for superior education we get a range
of (0.40-0.35)/0.40 = 0.125 and
(0.48-0.40)/0.40 = 0.200
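The two limits in code, reproducing the superior-education figures above (minimum 0.35, average 0.40, maximum 0.48). The function name is hypothetical.

```python
def variability_limits(x_min, x_avg, x_max):
    """Relative distance from the average down to the minimum and up to the maximum."""
    lower = (x_avg - x_min) / x_avg
    upper = (x_max - x_avg) / x_avg
    return lower, upper

lower, upper = variability_limits(0.35, 0.40, 0.48)
print(round(lower, 3), round(upper, 3))  # 0.125 0.2
```

Only the extreme years enter the calculation, so a single unusual year sets the limit; this is part of why the measure is a weak substitute for a variance.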
What does all this mean?
• Higher ‘relative degree of influence’ means
the cause is more likely to be constant,
i.e. P(conviction|status) ≠ P(conviction)
• If ‘variability’ is less than R.D.I., then
variation by year (variable cause) is less
important than the constant cause
Example: No Shows
• Average Conviction Rate = 0.960
• Relative Degree of Influence = 0.563
• Lower Variability = 0.031
• Upper Variability = 0.010
• High RDI -> significant
• Small variability
-> same across years
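A sketch of the comparison for the no-show figures. Taking the larger of the two variability limits as the yardstick is an assumption about how the rule is applied, and the function names are hypothetical.

```python
OVERALL_RATE = 0.614  # average conviction probability from the slides

def relative_degree_of_influence(group_rate, overall_rate=OVERALL_RATE):
    """Absolute gap between a group's rate and the overall rate, relative to the overall rate."""
    return abs(group_rate - overall_rate) / overall_rate

def constant_cause_dominates(rdi, lower_variability, upper_variability):
    """True when the yearly variability on both sides is smaller than the R.D.I."""
    return max(lower_variability, upper_variability) < rdi

no_show_rdi = relative_degree_of_influence(0.960)
print(round(no_show_rdi, 2))                                # 0.56
print(constant_cause_dominates(no_show_rdi, 0.031, 0.010))  # True
```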
Comparisons
• Can we compare groups’ conviction rates?
• No, not really. We have a very poor grasp
on variability, and cannot conduct
hypothesis testing.
• Nevertheless, Quetelet states that the best
position to be in was “a well-educated
female over thirty, appearing voluntarily to
answer a crime against persons.”
Primitive ANOVA
• Can we decide which causes are more
variable or influential?
• Well, sort of. Quetelet has the basic
framework of ANOVA set up
• Lacks consistency and optimality properties;
ANOVA will be refined by Fisher in early
20th century
Multiple Comparisons
• Many data groupings highly dependent (e.g.
gender and higher education in the 1820’s)
• Basic, modern ANOVA would fail in these
circumstances too!
• So the ‘well-educated, voluntarily appearing
woman over 30’ comment is not valid
Poisson
• Quetelet’s most famous contemporary (by
today’s standards, anyway) was Poisson.
• Poisson also analyzed this same dataset
• Summary:
- Using corrected data for 1825, refutes
Quetelet’s claim of decreasing rates
- Modeling jury selections as a binomial
random variable, gets a rate distribution
- Comes up with pseudo-Bayesian
probabilities on conviction.
Part VI: Fitting a Normal Curve
or: Statistics and the 48-inch chest
What is Normally Distributed?
• Laplace’s CLT (1778-1812) showed that the
Normal is the limit of many distributions
• Gauss (1809) shows it is a very common
error distribution
• Quetelet is the first to show human
physiology can be normally distributed
• Thinks ALL natural variables are normal
Scottish Army Uniforms in 1819
• Data on the following page collected by
Scottish army
• Needed to fit shirts to soldiers – so tried to
estimate soldiers’ shirt sizes
Average Soldier?
• Can’t just clothe the ‘Average Soldier’ –
gotta clothe ‘em all.
• Possibility – Average soldier of each height
1846
• Instead, decides to fit a normal curve to his
data.
• Did not have a normal table – used a
binomial with n=999 (1,000 outcomes)
• Created a table by realizing
y_{n+1} = y_n * (999-n)/(n+1)
for the binomial
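The recurrence can be used to build the whole binomial table step by step. A minimal sketch: it starts from a weight of 1 for the first outcome and normalizes at the end, both of which are assumptions about the bookkeeping, not Quetelet's documented procedure. The loop index is called k here to keep it distinct from the 999 trials.

```python
import math

def binomial_weights(trials=999):
    """Relative weights y_0..y_trials from y_{k+1} = y_k * (trials - k) / (k + 1)."""
    weights = [1.0]
    for k in range(trials):
        weights.append(weights[-1] * (trials - k) / (k + 1))
    return weights

weights = binomial_weights()
total = sum(weights)
probabilities = [w / total for w in weights]  # normalize so the table sums to 1

print(len(weights))  # 1000
print(math.isclose(weights[499], math.comb(999, 499), rel_tol=1e-9))  # True
```

Each weight equals the binomial coefficient C(999, k), so the table needs only one multiplication and one division per row.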
Odd fit
• 1) Split data at median
• 2) Find upper/lower cumulative frequencies
• 3) Transform to rank scale through inverse
binomial
• 4) ‘Match ranks to transformed ranks
through trial and error’ (???)
• 5) Transform fitted ranks through inverse
normal.
Influence
• This gave Quetelet mathematical
justification for the average man
• He asks: can we tell the difference between
these measurements, and very inaccurate
measurements on a single soldier?
• Normal can only arise through
Accidental causes:
All is NORMAL!
Part VII: Quetelet’s Fallout
or: The Good, the Bad, and the Statistical
Francis Galton (1822-1911)
• Primary work in the 1870s
• Discovered Genetics independently of
Mendel
• Coined the phrase ‘regression to the mean’
• Developed several intelligence tests
• Mentor to Karl Pearson; Cousin to Darwin
• Found direct precursor to Pearson’s r²
• Often considered the father of
social science
• Often mistakenly credited for
Quetelet’s work on the Normal
Theory of Heredity
• Firmly believed that performance was based
solely on genetics
• Severely discounted education/life
experience
• Concerned with intelligence, strength and
beauty- thought all were dependent
on each other
Fallacy
• Armed with:
- his belief in heredity
- Darwin’s theory of evolution
- Quetelet’s many Average Men
• Reached startling conclusion:
groups of people can be mathematically
shown to be inferior to others!
Eugenics
• Therefore, we must ‘improve the human stock’
• Galton’s methods:
- encourage matings between desirable people
- forced sterilization of the truly unfit
(criminals, the insane, etc.)
• Science largely accepted in late 19th century
England
• Pearson was Chair of Eugenics at University College London!
Theory to Practice
• Most infamously adopted in Germany,
1933-1945
• Justified concept of ‘Aryan Master Race’
• Sterilization upgraded to genocide
• Obviously, today Eugenics is widely
condemned
• Galton’s Eugenics merely ‘bad’, not
‘monstrous’
Quetelet’s Fault?
• Made few value judgments in comparison
(e.g. only found one highly qualified
mention of racial intelligence)
• Considered the Average Man to be
‘beautiful,’ not ‘mediocre’
• Advocated social reform (education,
increased government spending) – not the
gradual breeding out of the inferiors
• That is: NO!!!
Florence Nightingale (1820-1910)
• Studied statistics extensively under her
friends Quetelet and William Farr
• Strong believer that statistics was evidence
of the Divine Mind: Statistics
was her religion
• Worked extensively in wartime
hospitals, saving many lives
• Used statistics to do so!
Hospital Sanitation
• Germ Theory of disease not understood
• Hospitals – especially at war – lacked even
basic methods of sterilization
• Demonstrated that Dr. Lister’s
antiseptic surgical implements
saved many lives- using
Quetelet’s statistical methods
Eulogy for Quetelet
• “Quetelet has shown us the path we must go on if
we are to discover the laws of the Divine
Government of the Moral World.”
• “It is not understood that human actions are – not
subordinate, but – reducible to general laws . . . Of
these at present, we know hardly any. Our object
in life is to ascertain what they are.”
• “A fitting memorial to Quetelet would therefore be
the introduction of his science in the studies of
Oxford”.
Overview of Quetelet’s Statistical
Contributions
• Did much to firmly establish statistics as a
reputable science, and to mathematicize the
Social Sciences
• The Average Man is an enduring paradigm
for statistical and social reasoning
• Showed basics of data analysis, hypothesis
testing, and analysis of variance
• Demonstrated that natural human traits are
normally distributed
THE END