After Dinner Talk at SLDS Meeting, 2016
Some Musings on Life
& “Data Science”
Statistical Learning and Data Science
Friday Center, UNC, Chapel Hill
J. S. Marron
Dept. of Statistics and Operations Research
University of North Carolina
Some Views of Statistics
Statistics
$\bar{X} \pm z_{1-\alpha/2} \, \dfrac{s}{\sqrt{n}}$
$H_0: \mu_1 = \mu_2$
$EY = X\beta$
Most People
Some Views of Statistics
Statistics
Bootstrap
HDLSS
$\bar{X} \pm z_{1-\alpha/2} \, \dfrac{s}{\sqrt{n}}$
Bayes
$H_0: \mu_1 = \mu_2$
$EY = X\beta$
Kernels
Survival Analysis
Sparsity
Functional Data
Machine Learning
MCMC
Time Series
Mixed Models
Etc. Etc. Etc. …
Reality
Some Views of Statistics
Medicine
Biology
Agriculture
Physics
Statistics
Geology
Economics
Psychology
Statistics in Science
Some Views of Statistics
John Tukey Quote:
From: http://www.morris.umn.edu/~sungurea/introstat/history/w98
Statistics in Science
Some Views of Statistics
John Tukey Quote:
“The best thing about being a statistician
is that you get to play in everyone's
backyard”
From: http://www.york.ac.uk/depts/maths/histstat/tukey_nytimes.htm
Statistics in Science
Some Views of Statistics
Words coined by John Tukey:
Bit (0 – 1 data unit)
Software (mention to Computer Science friends…)
Some Views of Statistics
Another Prescient Statistician:
Bill Cleveland
Coined the Term “Data Science”
Cleveland, W. S. (2001). Data science: an action plan
for expanding the technical areas of the field of
statistics. International Statistical Review.
Some Views of Statistics
Statistics
$\bar{X} \pm z_{1-\alpha/2} \, \dfrac{s}{\sqrt{n}}$
$H_0: \mu_1 = \mu_2$
$EY = X\beta$
Most People
Some Views of Statistics
“Data Science (Analytics)”
Computer Science
Math (Applied)
Bus. / Finance
Others (Info. Sci., Psych, …)
Statistics
$\bar{X} \pm z_{1-\alpha/2} \, \dfrac{s}{\sqrt{n}}$
$H_0: \mu_1 = \mu_2$
$EY = X\beta$
Caution: ∃ a desire to replace old ideas with exciting new ones
Some Views of Statistics
What is (should be) the relationship?
Statistics
Data Science
Machine Learning
…
(Cleveland View)
Some Views of Statistics
What is (should be) the relationship?
Data Science
Machine Learning
…
Statistics
The Big Question
What are the Boundaries of Statistics?
NSF/DMS Program Director (late 2004):
“That is not statistics”
The Big Question
What are the Boundaries of Statistics?
OK, then where are they?
We should discuss this much more…
Openly, not in the “Rejection Process
(Publications, Grants, etc.)”
Variation
Thoughts From Business Statistics Course
Variation
A Fundamental Concept:
Sounds Obvious
Easy to Not Consider (Forget)
{Surprisingly So}
Variation
Easy to Not Consider (Forget)
E.g. An Explorer Drowned in a Lake That
Averaged 6 Inches in Depth…
o Hard to visualize?
Thanks to N. I. Fisher
Lake Eyre, Australia, from Wikipedia
Lake Eyre, Australia, from www.airadventure.com.au
o Key is Variation About “Average”
o Simple Idea Takes a Minute to Recall
(happens a lot)
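The lake example can be made concrete with a few lines of Python. This is only an illustrative sketch: the depth soundings below are invented numbers chosen so that the average comes out at 6 inches, not measurements of Lake Eyre or any real lake, and the 6-foot threshold is an arbitrary stand-in for "deep enough to drown in".

```python
from statistics import mean, pstdev

# Hypothetical depth soundings in inches (made-up values, not real lake data):
# a wide, very shallow flat plus a narrow deep channel.
depths_in = [1.0] * 95 + [101.0] * 5

avg_depth = mean(depths_in)      # the "average depth" quoted in the story
spread = pstdev(depths_in)       # variation about that average
max_depth = max(depths_in)       # the part that matters to the explorer

# Share of soundings deeper than 6 feet (72 inches), an assumed threshold.
share_deep = sum(d > 72 for d in depths_in) / len(depths_in)

print(f"average depth: {avg_depth:.1f} in")
print(f"std. deviation: {spread:.1f} in")
print(f"maximum depth: {max_depth:.1f} in")
print(f"share deeper than 6 ft: {share_deep:.0%}")
```

The average alone reports 6 inches, while the spread and the maximum show that a small part of this hypothetical lake is over 8 feet deep.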
Variation
A Fundamental Concept:
Sounds Obvious
U.S. Presidential
Politics ?!?
Common Gross Oversimplification:
Group of people:
Political, Religious,
Ethnic Origin, …
They are going to …
They all want to …
Variation
Homework C0.1
Find an Example of Ignoring Variation.
Send me an email, with: text, and attribution.
Plan to discuss in class.
Variation
Homework C0.1
Results:
Out of First 10 Quotes
9 Were From
Donald Trump
Ideas on Human Relationships
Common Question:
“How Are Dep’t Politics Going?”
Background:
Long Dubious History
Merger of Statistics & OR
(More Diverse Interests)
Rapidly Changing University
Ideas on Human Relationships
Response:
“Best I’ve Seen in Chapel Hill”
Reason:
Respect
Key to Current Interactions
Moved Beyond “Politics of Disrespect”
Ideas on Human Relationships
Fundamental Observation:
Human Interactions Work Best In An
Atmosphere of Respect
Day to Day Interactions w/ Colleagues
Reviews of Papers / Grant Proposals
US Congress
US Presidential Politics…
Special Thanks
UNC, Stat & OR
Department of Statistics and Applied Prob.
National University of Singapore
For Many Discussions
This Talk
BIG DATA Models & Concepts
UNC, Stat & OR
Challenge from the Recent Media:
Mayer-Schönberger and Cukier (2014)
“Big Data: A Revolution That Will Transform
How We Live, Work, and Think”
BIG DATA Models & Concepts
Challenge from the Recent Media:
Mayer-Schönberger and Cukier (2014)
Major Premise:
Differing Data Analytic Goals
“Correlational” vs. “Causal”
BIG DATA Models & Concepts
“Causal” Data Analysis:
Goal: Underlying Causes of Phenomena
Approach: Classical “Scientific Method”
Formulate Hypothesis
Collect Data
Test Hypothesis
Consequences:
Solid Knowledge w/ Measurable Certainty
BIG DATA Models & Concepts
“Correlational” Data Analysis:
Goal: Find (and Use) Mere Correlations
Motivation: Correlations are
Useful (e.g. ___ Recognition Software)
Valuable (Buying and Selling of Data…)
Insightful????
Consequences:
Automatic Solutions to Some Hard Problems
Correlation vs. Causation
How New Is This Discussion?
Naïve Readers [of Mayer-Schönberger and Cukier (2014)]:
This is Exciting!!!
Great New Ideas!!!
Change Statistics Curricula!!!
Start Up “Data Analytics”!!!
Correlation vs. Causation
How New Is This Discussion?
Time
Statistics
Pattern Recognition
Artificial Intelligence
Neural Networks
Data Mining
Machine Learning
???
Correlation vs. Causation
How New Is This Discussion?
Time
Statistics
Pattern Recognition
Artificial Intelligence
Neural Networks
Data Mining
Machine Learning
Big Data – Data Science
A Small Aside
A Personal Apology to
Xiaotong Shen
For My Skepticism About
ASA Section on Data Mining
My (Wrong) Idea: Name Would Change,
So Not Appropriate as “Section”
{Great to See Recent Name Change}
Correlation vs. Causation
How New Is This Discussion?
Some Came With Major New Ideas
Pattern Recognition
Artificial Intelligence
Neural Networks
Data Mining
Machine Learning
Big Data
Correlation vs. Causation
How New Is This Discussion?
Pattern Recognition
Less So For Others, But More Focus On
Artificial Intelligence
Neural Networks
Data Mining
Machine Learning
Big Data
Correlation vs. Causation
How New Is This Discussion?
Data Mining
Great Correlational Discovery:
Super Market Scanner Data
Baby Diapers (aka Nappies) & Beer
Correlation vs. Causation
How New Is This Discussion?
Data Mining
Baby Diapers (aka Nappies) & Beer
Some Perspective:
Correlational Discovery
Makes Causational Sense
(Too Soon To Totally Dump Causation)
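A "correlational discovery" of the diapers-and-beer kind can be sketched as a co-occurrence count over shopping baskets. The baskets below are invented for illustration and are not scanner data; support, confidence, and lift are standard association-rule measures that the talk itself does not name, so treat them as one assumed way of quantifying the pattern.

```python
# Hypothetical shopping baskets (invented for illustration, not real scanner data).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
    {"bread", "chips"},
    {"milk"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


s_diapers = support({"diapers"}, baskets)
s_beer = support({"beer"}, baskets)
s_both = support({"diapers", "beer"}, baskets)

# Confidence: estimated share of diaper baskets that also contain beer.
confidence = s_both / s_diapers
# Lift: ratio of observed co-occurrence to what independence would predict.
lift = s_both / (s_diapers * s_beer)

print(f"support(diapers & beer) = {s_both}")
print(f"confidence(diapers -> beer) = {confidence}")
print(f"lift = {lift}")
```

A lift above 1 only says the two items appear together more often than independence would predict. It says nothing about why, which is where the causal explanation on the slide comes in.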
Correlation vs. Causation
Relative Emphasis???
Classical Statistics:
Correlation
vs.
Causation
Correlation vs. Causation
Relative Emphasis???
Mayer-Schönberger and Cukier:
Correlation
vs.
Causation
Correlation vs. Causation
Relative Emphasis???
Suggested Actual Future Course:
Correlation & Causation
Note: Changes Are Needed in Curricula, Etc.
The Big Question
What are the Boundaries of Statistics?
NSF/DMS Program Director (late 2004):
“That is not statistics”
The Big Question
What are the Boundaries of Statistics?
We Should Openly Discuss Much More…
Statistics
Data Science
Machine Learning
…
Data Science
OR
Machine Learning
…
Statistics
The Big Question
What are the Boundaries of Statistics?
We Should Openly Discuss Much More…
How Much Leadership Should We Take?
Let’s Embrace Our Wide Diversity
of Opinions on This Point
Challenges for You
Lead Statistics (D. S.) into the Future
Promote Increasing Breadth
Embrace New Ideas
Advocate Them While Reviewing
Speak Up Serving On Panels
Openly Discuss Boundaries