Strategies of Modern Statistics
The Surprising Consequences
of Randomness
LS 829
Mathematics in Science and Civilization
Feb 6, 2010
7/17/2015
LS 829 - 2010
1
Sources and Resources
• Peck, R., et al. (2006) Statistics: A Guide to the Unknown, 4th ed. Duxbury.
• Taleb, N. N. (2008) Fooled by Randomness: The Hidden Role of Chance in the Markets and Life, 2nd ed. Random House.
• Mlodinow, L. (2008) The Drunkard’s Walk. Vintage Books, New York.
• Rosenthal, J. S. (2005) Struck by Lightning. Harper Perennial, Toronto.
• www.stat.sfu.ca/~weldon
Introduction
• Randomness concerns Uncertainty - e.g. the outcome of a coin toss
• Does Mathematics concern Certainty? - P(H) = 1/2
• Probability can help to Describe Randomness &
“Unexplained Variability”
• Randomness & Probability are key concepts for
exploring implications of “unexplained variability”
Abstract
[Diagram: Real World, Mathematics, Applications of Mathematics, Probability, Applied Statistics, Useful Principles, Surprising Findings]
Nine Findings and Associated Principles
Example 1 - When is Success just Good Luck?
An example from the world of
Professional Sport
Sports League - Football
Success = Quality or Luck?
2007 AFL LADDER
Team               Played  Win  Draw  Loss  Points For  Points Against  Ratio  Points
Geelong                22   18     0     4        2542            1664    153      72
Port Adelaide          22   15     0     7        2314            2038    114      60
West Coast Eagles      22   15     0     7        2162            1935    112      60
Kangaroos              22   14     0     8        2183            1998    109      56
Hawthorn               22   13     0     9        2097            1855    113      52
Collingwood            22   13     0     9        2011            1992    101      52
Sydney Swans           22   12     1     9        2031            1698    120      50
Adelaide               22   12     0    10        1881            1712    110      48
St Kilda               22   11     1    10        1874            1941     97      46
Brisbane Lions         22    9     2    11        1986            1885    105      40
Fremantle              22   10     0    12        2254            2198    103      40
Essendon               22   10     0    12        2184            2394     91      40
Western Bulldogs       22    9     1    12        2111            2469     86      38
Melbourne              22    5     0    17        1890            2418     78      20
Carlton                22    4     0    18        2167            2911     74      16
Richmond               22    3     1    18        1958            2537     77      14
(Ratio = 100 × Points For / Points Against; Points = 4 per win + 2 per draw)
Recent News Report
“A crowd of 97,302 has witnessed Geelong break
its 44-year premiership drought by crushing a hapless
Port Adelaide by a record 119 points in Saturday's
grand final at the MCG.” (2007 Season)
Are there better teams?
• How much variation in the total points table
would you expect IF
every team had the same chance of winning
every game? i.e. every game is 50-50.
• Try the experiment with 5 teams.
H=Win T=Loss (ignore Ties for now)
5 Team Coin Toss Experiment
•Win=4, Tie=2, Loss=0 but we ignore ties. P(W)=1/2
•5 teams (1,2,3,4,5) so 10 games as follows
•1-2,1-3,1-4,1-5,2-3,2-4,2-5,3-4,3-5,4-5
My experiment …
• T T H T T H H H H T (H = first-named team wins, T = second-named team wins)
Experiment result (but all teams are of equal quality, i.e. equal chance to win):
Team   Points
3      16
2      12
5       8
1       4
4       0
Implications?
• Points spread due to chance?
• Top team may be no better than the
bottom team (in chance to win).
Simulation: 16 teams, equal chance to win, 22 games
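A simulation of this kind can be sketched as below. It is a minimal sketch, not the author's program: it assumes a fresh random pairing of the 16 teams in each of the 22 rounds (the real AFL fixture is not random), 4 points per win and no draws. `ACTUAL_2007` is the Points column of the 2007 ladder.

```python
import random

def simulate_season(n_teams=16, n_rounds=22, seed=None):
    """Ladder points, sorted high to low, for one season in which every
    game is a 50-50 coin toss (4 points per win, draws ignored)."""
    rng = random.Random(seed)
    teams = list(range(n_teams))
    points = [0] * n_teams
    for _ in range(n_rounds):
        # Assumed fixture: a new random pairing every round.
        rng.shuffle(teams)
        for i in range(0, n_teams, 2):
            winner = teams[i] if rng.random() < 0.5 else teams[i + 1]
            points[winner] += 4
    return sorted(points, reverse=True)

ACTUAL_2007 = [72, 60, 60, 56, 52, 52, 50, 48, 46, 40, 40, 40, 38, 20, 16, 14]

def spread(ladder):
    """Gap between top and bottom of a ladder."""
    return ladder[0] - ladder[-1]

print("one simulated ladder:", simulate_season(seed=1))
sims = [spread(simulate_season(seed=s)) for s in range(1000)]
print("actual 2007 spread:", spread(ACTUAL_2007))
print("share of equal-quality seasons at least that wide:",
      sum(s >= spread(ACTUAL_2007) for s in sims) / len(sims))
```

Comparing the simulated spreads with the real one is a way to judge how much of the ladder could be luck alone.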
Does it Matter?
Avoiding foolish predictions
Managing competitors (of any kind)
Understanding the business of sport
Appreciating the impact of uncontrolled variation
in everyday life
Point of this Example?
Need to discount “chance”
in making inferences from
everyday observations.
Example 2 - Order from
Apparent Chaos
An example from some
personal data collection
Gasoline Consumption
Each fill-up: record kms driven and litres of fuel used
Smoothing the record ---> a seasonal pattern ….
Why?
Pattern Explainable?
Air temperature?
Rain on roads?
Seasonal Traffic Pattern?
Tire Pressure?
Info Extraction Useful for Exploration of Cause
Smoothing was key technology in info extraction
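A centred moving average is one simple smoother of the kind described here. The sketch below runs it on synthetic data, not the author's fuel records: it assumes 40 fill-ups a year and a small seasonal cycle buried in larger random noise. The `window` argument plays the role of the smoothing parameter.

```python
import math
import random

def moving_average(values, window):
    """Centred moving average; near the ends, average whatever neighbours exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Synthetic data (assumption): litres/100 km for 3 years of fill-ups,
# a +/-0.5 yearly cycle plus random noise with SD 1.0.
rng = random.Random(42)
FILLS_PER_YEAR = 40
raw = [8.0 + 0.5 * math.cos(2 * math.pi * i / FILLS_PER_YEAR) + rng.gauss(0, 1.0)
       for i in range(3 * FILLS_PER_YEAR)]
smooth = moving_average(raw, window=9)

for i in range(0, len(raw), 10):
    print(f"fill {i:3d}: raw {raw[i]:5.2f}  smoothed {smooth[i]:5.2f}")
```

A wider window suppresses more noise but also flattens the seasonal cycle, which is why the choice of smoothing parameter depends on the purpose of the display.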
Intro to smoothing with context …
Jan 12, 2010
STAT 100
22
Optimal Smoothing Parameter?
• Depends on Purpose of Display
• Choice Ultimately Subjective
• Subjectivity is a necessary part
of good data analysis
Summary of this Example
• Surprising? Order from Chaos …
• Principle - Smoothing and Averaging reveal
patterns encouraging investigation of cause
3. Weather Forecasting
Chaotic Weather
• 1900 – equations too complicated to solve
• 2000 – solvable but still poor predictors
• 1963 – The “Butterfly Effect”: small changes in initial conditions ->
large effects later on
• today – ensemble forecasting see p 173
• Rupert Miller p 178 – stats for short term …
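The butterfly effect can be seen in a toy system. This sketch uses the logistic map x -> 4x(1 - x), a standard textbook example of chaos, not a weather model: two starting values that differ by 1e-10 give nearly identical values for a few steps and completely different ones a few dozen steps later.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x); chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)   # a "butterfly-sized" change in the start

for n in range(0, 61, 10):
    print(f"step {n:2d}: difference {abs(a[n] - b[n]):.2e}")
```

Roughly speaking, the gap doubles each step, so extra precision in the starting value buys only a few extra steps of useful forecast.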
Conclusion from
Weather Example?
• It may not be true that weather forecasting
will improve dramatically in the future
• Some systems have inherent instability, and
increased computing power may not be
enough to break through this barrier
Example 4 - Obtaining
Confidential Information
• How can you ask an individual for data on
– Incomes
– Illegal Drug use
– Sex modes
– etc.
in a way that will get an honest response?
• There is a need to protect confidentiality of answers.
Example: Marijuana Usage
• Randomized Response Technique
Pose two Yes-No questions and have coin
toss determine which is answered
Head 1. Do you use Marijuana regularly?
Tail 2. Is your coin toss outcome a tail?
Randomized Response
Technique
• Suppose 60 of 100 answer Yes. About 50 people toss a tail and all of them
say Yes, so the other 10 Yes answers come from the roughly 50 who answered
Question 1: 10 of 50 are users, i.e. 20%.
• It is a way of using randomization to protect
Privacy. Public Data banks have used this.
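The arithmetic above generalizes to a simple estimator. With a fair coin, P(Yes) = p/2 + 1/2, where p is the usage rate, so p = 2 × P(Yes) - 1. The sketch below applies this to the slide's numbers and then checks it on a simulated survey; the assumed true usage rate of 20% and the function names are illustrative.

```python
import random

def estimate_usage(yes_count, n):
    """Heads -> 'Do you use marijuana regularly?'; tails -> 'Is your toss a tail?'
    (always Yes). P(Yes) = p/2 + 1/2, so p = (2*yes_count - n) / n."""
    return (2 * yes_count - n) / n

def simulate_survey(n, true_p, rng):
    """Count Yes answers when each respondent privately tosses a fair coin."""
    yes = 0
    for _ in range(n):
        if rng.random() < 0.5:          # heads: the sensitive question
            if rng.random() < true_p:   # this respondent is a user
                yes += 1
        else:                           # tails: the answer is always Yes
            yes += 1
    return yes

print(estimate_usage(60, 100))          # the slide's example: 0.2
yes = simulate_survey(10_000, true_p=0.2, rng=random.Random(0))
print(estimate_usage(yes, 10_000))
```

No single Yes reveals anything about the person who gave it, yet the proportion of users is still recoverable from the total.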
Summary of Example 4
• Surprising that people can be induced to
provide sensitive information in public
• The key technique is to make use of the
predictability of certain empirical
probabilities.
5. Randomness in the Markets
• 5A. Trends That Deceive
• 5B. The Power of Diversification
• 5C. Back-the-winner fallacy
5A. Trends That Deceive
People often fail to appreciate the
effects of randomness
The Random Walk
Trends that do not persist
Longer Random Walk
Recent Intel Stock Price
Things to Note
• The random walk has no patterns useful for
prediction of direction in future
• Stock price charts are well modeled by
random walks
• Advice about future direction of stock
prices – take with a grain of salt!
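One way to check the first point is to simulate many symmetric random walks and ask whether the "trend" over the first half predicts the change over the second half. This is a sketch under that setup (200 steps of +1 or -1, 2000 walks); the correlation helper is written out to keep the example self-contained.

```python
import random

def random_walk(steps, rng):
    """Symmetric random walk: each step is +1 or -1 with probability 1/2."""
    path = [0]
    for _ in range(steps):
        path.append(path[-1] + (1 if rng.random() < 0.5 else -1))
    return path

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(7)
first_half, second_half = [], []
for _ in range(2000):
    path = random_walk(200, rng)
    first_half.append(path[100] - path[0])     # the trend an observer sees
    second_half.append(path[200] - path[100])  # what happens next

print("correlation of past trend with future trend:",
      round(correlation(first_half, second_half), 3))
```

Individual walks still show long runs up or down; they simply carry no information about the next stretch.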
5B. The Power of
Diversification
People often fail to appreciate the
effects of randomness
Preliminary Proposal
I offer you the following “investment opportunity”
You give me $100. At the end of one year, I will
return an amount determined by tossing a fair
coin twice, as follows:
$0 ……… 25% of the time (TT)
$50 ……. 25% of the time (TH)
$100 …… 25% of the time (HT)
$400 …… 25% of the time (HH)
Would you be interested?
Stock Market Investment
• Risky Company - example in a known context
• Return in 1 year for 1 share costing $1:
$0.00 … 25% of the time
$0.50 … 25% of the time
$1.00 … 25% of the time
$4.00 … 25% of the time
i.e. Lose Money 50% of the time
Only Profit 25% of the time
“Risky” because high chance of loss
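The mean and spread of the one-year return R on a single $1 share follow directly from the four equally likely outcomes. On average the share returns $1.375 (a 37.5% profit), even though it loses money half the time, and its standard deviation is larger than its mean:

```latex
\begin{aligned}
E[R] &= \tfrac14\,(0 + 0.50 + 1.00 + 4.00) = 1.375 \\
E[R^2] &= \tfrac14\,(0 + 0.25 + 1.00 + 16.00) = 4.3125 \\
\operatorname{Var}(R) &= 4.3125 - 1.375^2 = 2.421875,
\qquad \operatorname{SD}(R) \approx 1.556
\end{aligned}
```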
Independent Outcomes
• What if you have the chance to put $1 into
each of 100 such companies, where the
companies are all in very different markets?
• What sort of outcomes then? Use coin tossing (by computer) to explore
Diversification
Unrelated Companies
• Choose 100 unrelated companies, each one
risky like this. Outcome is still uncertain
but look at typical outcomes ….
One-Year Returns to a $100 investment
Looking at Profit only
Avg Profit approx 38%
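The computer coin-tossing can be sketched as follows: $1 goes into each of 100 companies, each paying $0, $0.50, $1 or $4 with equal probability. The sketch assumes the companies' outcomes are fully independent ("unrelated companies"), and the 5000 repetitions are an arbitrary choice.

```python
import random

PAYOFFS = [0.00, 0.50, 1.00, 4.00]   # one-year value of a $1 share, each 25%

def portfolio_value(n_companies, rng):
    """$1 in each of n independent risky companies; total value after one year."""
    return sum(rng.choice(PAYOFFS) for _ in range(n_companies))

rng = random.Random(2010)
outcomes = [portfolio_value(100, rng) for _ in range(5000)]
mean_profit = sum(outcomes) / len(outcomes) - 100
share_losing = sum(v < 100 for v in outcomes) / len(outcomes)

print(f"average profit on $100: ${mean_profit:.1f}")
print(f"share of portfolios that lose money: {share_losing:.3f}")
print(f"range of outcomes: ${min(outcomes):.0f} to ${max(outcomes):.0f}")
```

Each share alone loses money half the time, but a portfolio of 100 independent ones rarely does.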
Gamblers like
Averages and Sums!
• The sum of 100 independent investments in
risky companies is very predictable!
• Sums (and averages) are more stable than
the things summed (or averaged).
Variance -----> Variance/n
• Square root law for variability of averages: SD -----> SD/√n
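Applied to $1 in each of 100 independent companies, using the single-share mean $1.375 and SD of about $1.556 implied by the four equally likely payoffs, the square root law gives the following (T is the total one-year value and R̄ = T/100 is the average return per dollar):

```latex
\begin{aligned}
E[T] &= 100 \times 1.375 = 137.5 \\
\operatorname{SD}(T) &= \sqrt{100} \times 1.556 \approx 15.6 \\
\operatorname{SD}(\bar R) &= \frac{1.556}{\sqrt{100}} \approx 0.156
\end{aligned}
```

Break-even ($100) lies about (137.5 - 100)/15.6 ≈ 2.4 standard deviations below the expected value, which is why the diversified portfolio is so predictable. This assumes independence; correlated companies would not shrink the spread this much.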
Summary - Diversification
• Variability is not Risk
• Stocks with volatile prices can be good
investments
• Criteria for Portfolio of Volatile Stocks
– profitable on average
– independence (or not severe dependence)
5C - Back-the-winner fallacy
• Mutual Funds - a way of diversifying a
small investment
• Which mutual fund?
• Look at past performance?
• Experience from symmetric random walk
…
Implication from Random Walk
…?
• Stock market trends may not persist
• Past might not be a good guide to future
• Some fund managers better than others?
• A small difference can result in a big
difference over a long time …
A simulation experiment to
determine the value of past
performance data
• Simulate good and bad managers
• Pick the best ones based on 5 years data
• Simulate a future 5-yrs for these select
managers
How to describe good and bad
fund managers?
• Use TSX Index over past 50 years as a
guide ---> annualized return is 10%
• Use a random walk with a slight upward
trend to model each manager.
• Daily change positive with probability p
Good manager: ROR = 13% pa, p = .56
Medium manager: ROR = 10% pa, p = .55
Poor manager: ROR = 8% pa, p = .54
Simulation to test
“Back the Winner”
• 100 managers assigned various p
parameters in .54 to .56 range
• Simulate for 5 years
• Pick the top-performing managers (top 15%)
• Use the same 100 p-parameters to simulate
a new 5 year experience
• Compare new outcome for “top” and
“bottom” managers
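The "Back the Winner" steps can be sketched as below. The slide does not give the daily step size or the number of trading days, so this sketch assumes 250 trading days a year and a daily move of ±0.4%. Those values were chosen because they give roughly the 8%, 10% and 13% annual returns quoted for p = .54, .55 and .56. The uniform spread of p over .54 to .56 and the function names are also assumptions.

```python
import random

DAYS_PER_YEAR = 250  # assumed number of trading days
STEP = 0.004         # assumed daily move: up or down 0.4%

def simulate_fund(p, years, rng, start=100.0):
    """Fund value after `years`; each day it rises STEP with prob p, else falls STEP."""
    value = start
    for _ in range(years * DAYS_PER_YEAR):
        value *= (1 + STEP) if rng.random() < p else (1 - STEP)
    return value

def back_the_winner(n_managers=100, years=5, top_frac=0.15, seed=None):
    """Rank managers on one simulated period, re-simulate a new period with the
    same p values, and return the average new value of the top and bottom groups."""
    rng = random.Random(seed)
    ps = [rng.uniform(0.54, 0.56) for _ in range(n_managers)]
    past = [simulate_fund(p, years, rng) for p in ps]
    future = [simulate_fund(p, years, rng) for p in ps]
    ranked = sorted(range(n_managers), key=lambda i: past[i], reverse=True)
    k = round(top_frac * n_managers)

    def avg_future(group):
        return sum(future[i] for i in group) / len(group)

    return avg_future(ranked[:k]), avg_future(ranked[-k:])

top, bottom = back_the_winner(seed=1)
print(f"next 5 years, past top 15%:    {top:.1f}")
print(f"next 5 years, past bottom 15%: {bottom:.1f}")
```

Running it with several seeds shows how much of the past ranking survives into the next period under these assumptions.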
[Figure: simulation results, each fund starting at 100 (START=100)]
Mutual Fund Advice?
Don’t expect past relative performance to be a
good indicator of future relative
performance.
Again - need to give due allowance for
randomness (i.e. LUCK)
Summary of Example 5C
• Surprising that Past Performance is such a
poor indicator of Future Performance
• Simulation is the key to exploring this issue
6. Statistics in the Courtroom
• Kristen Gilbert Case
• Data p 6 of article – 10 years data needed!
• Table p 9 of article – rare outcome if only
randomness involved. P-value logic.
• Discount randomness but not quite proof
• Prosecutor’s Fallacy P[E|I] ≠ P[I|E]
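Bayes' rule shows why P[E|I] and P[I|E] can be very different. The numbers below are hypothetical and are not from the Gilbert case. They describe evidence that is rare for an innocent person (1 in 100), certain for a guilty one, and a group in which only 1 in 100 people is guilty to begin with.

```python
def prob_innocent_given_evidence(p_e_given_innocent, p_e_given_guilty, prior_innocent):
    """Bayes' rule: P(I|E) = P(E|I)P(I) / [P(E|I)P(I) + P(E|G)P(G)]."""
    prior_guilty = 1 - prior_innocent
    numerator = p_e_given_innocent * prior_innocent
    return numerator / (numerator + p_e_given_guilty * prior_guilty)

# Hypothetical inputs, chosen only to illustrate the fallacy.
posterior = prob_innocent_given_evidence(0.01, 1.0, 0.99)
print(f"P[E|I] = 0.01 but P[I|E] = {posterior:.3f}")
```

A small P[E|I] (the p-value logic) says the evidence would be surprising under innocence; it is not the probability of innocence.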
Lesson from Gilbert Case
• Statistical logic is subtle
• Easy to misunderstand
• Subjectivity necessary in some decision-making
Example 7 - Lotteries:
Expectation and Hope
• Cash flow
– Ticket proceeds in (100%)
– Prize money out (50%)
– Good causes (35%)
– Administration and Sales (15%)
• $1.00 ticket worth 50 cents, on average
• Typical lottery P(jackpot) = .00000007 (about 1 in 14 million)
How small is .00000007?
• Buy 10 tickets every week for 60 years
• Cost is $31,200.
• Chance of winning the jackpot is ….
1/5 of 1 percent!
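The figures on this slide can be reproduced directly. This sketch assumes a 6-from-49 lottery (13,983,816 possible tickets, the "14 million"), $1.00 tickets, and treats each ticket as an independent chance, which is a close approximation.

```python
import math

JACKPOT_ODDS = math.comb(49, 6)   # assumed 6/49 lottery: 13,983,816 combinations
p_jackpot = 1 / JACKPOT_ODDS      # about 7e-8

tickets = 10 * 52 * 60            # 10 tickets a week for 60 years
cost = tickets * 1.00             # at $1.00 per ticket
# P(at least one jackpot) = 1 - P(no jackpot on any ticket)
p_at_least_one = 1 - (1 - p_jackpot) ** tickets

print(f"{tickets} tickets cost ${cost:,.0f}")
print(f"chance of at least one jackpot: {p_at_least_one:.4%}")
```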
Summary of Example 7
•Surprising that lottery tickets provide
so little hope!
•Key technology is simple use of
probabilities
Nine Surprising Findings
1. Sports Leagues - Lack of Quality Differentials
2. Gasoline Mileage - Seasonal Patterns
3. Weather - May be too unstable to predict
4. Marijuana – Can get Confidential info
5A. Random Walk – Trends that are not there
5B. Risky Stocks – Possibly a Reliable Investment
5C. Mutual Funds – Past Performance not much help
6. Gilbert Case – Finding Signal amongst Noise
7. Lotteries - Lightning Seldom Strikes
Nine Useful
Concepts & Techniques?
1. Sports Leagues - Unexplained variation can
cause illusions - simulation can inform
2. Gasoline Mileage - Averaging (and smoothing)
amplifies signals
3. Weather – Beware the Butterfly Effect!
4. Marijuana – Randomized Response Surveys
5A. Random Walks – Simulation can inform
5B. Risky Stocks - Simulation can inform
5C. Mutual Funds - Simulation can inform
6. Gilbert Case – Extracts Signal from Noise
7. Lotteries – 14 million is a big number!
Role of Math?
• Key background for
– Graphs
– Probabilities
– Simulation models
– Smoothing Methods
• Important for constructing theory of
inference
Limitation of Math
• Subjectivity Necessary in Decision-Making
• Extracting Information from Data is still
partly an “art”
• Context is suppressed in a mathematical
approach to problem solving
• Context is built into a statistical approach
to problem solving
The End
Questions, Comments, Criticisms…..