JKaplan_2010_MOP_Presentation
Financial Noise and Overconfidence
Ubiquitous News and Advice
CNBC
WSJ
Jim Cramer
Assorted talking heads
How good can it be?
Thought Experiment
Listen to CNBC every day
Choose ten stocks from recommendations
Buy stocks
How good can this strategy be?
Thought Experiment
10 stocks x 260 business days = 2,600 bets
Assume the bets are independent
How good does the information have to be such that the probability
of losing money is 1 in 10,000?
Detour: Gaussian Distribution
Random Variable X depends upon two parameters:
‣ mean value: μ
‣ uncertainty (standard deviation): σ
\[ \mathrm{Prob}(a \le X \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \]
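As a quick illustration of this formula, here is a minimal Python sketch (standard library only; the mean and standard deviation are made-up values):

```python
from statistics import NormalDist

# Hypothetical parameters: mean (mu) and standard deviation (sigma).
mu, sigma = 1.0, 2.0
X = NormalDist(mu, sigma)

# Prob(a <= X <= b) is the CDF evaluated at b minus the CDF evaluated at a.
a, b = 0.0, 3.0
print(X.cdf(b) - X.cdf(a))

# Prob(X < 0) depends only on the ratio mu/sigma (next slide):
print(X.cdf(0.0), NormalDist(0.0, 1.0).cdf(-mu / sigma))  # same number twice
```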
Detour: Gaussian Distribution
Universal “Bell Curve” Distribution
Prob(X < 0) depends only on the ratio of /
ProbX 0
1
2
/
x2
exp 2 dx
The Central Limit Theorem
{X_i}, i = 1..N, a set of r.v.:
‣ Identically distributed
‣ Independent
‣ All moments are finite
If X = x1 + x2 + … + xN then:
‣ X has a Gaussian distribution with:
• μ = N * average(x1)
• σ = √N * stddev(x1)
\[ \mathrm{Prob}(a \le X \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \]
The Central Limit Theorem
X = x1 + x2 + … + xN is a Gaussian r.v. with:
‣ mean(X) = μ = N * average(x1)
‣ stddev(X) = σ = √N * stddev(x1)
For a Gaussian r.v. the probability that it is greater than zero is just
a function of the ratio of the mean to the stddev:
Prob(X > 0)
In our experiment, our annual profit/loss will have a ratio of mean to
stddev that is √2,600 ≈ 50 times greater than that of just one of our stock bets.
This is the advantage of aggregation.
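As a one-line check of that factor:

```python
import math

# 10 stocks x 260 business days = 2,600 independent bets; the mean/stddev
# ratio of the annual total improves by sqrt(2,600) relative to a single bet.
print(math.sqrt(10 * 260))   # ~51, the ~50x improvement quoted above
```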
How Good is the Information?
Assume Prob(Xannual < 0) ≈ 1 / 10,000
mean(Xannual) ≈ 3.7 * stddev(Xannual)
mean(Xind) ≈ (3.7 / 50) * stddev(Xind) ≈ 0.07 * stddev(Xind)
Prob(Xind < 0) ≈ 47%
Probability That Advice Is Right?
Any given piece of advice can’t be much more than 53% reliable,
compared to 50% for a random flip of a coin
Otherwise, we could make a strategy that almost never loses
money by just aggregating all the different pieces of advice
Contrast “my price target for this stock is 102 by mid-November”
with “I think it’s 53–47% that this goes up vs. down”
Precision here greatly exceeds possible accuracy
Overconfidence
No error bars
Probability that Advice is Right?
If we want Prob(Xannual < 0) ≈ 1 / 1,000,000 then:
‣ Prob(Xind < 0) ≈ 46%, instead of 47%
A computer algorithm (or team of analysts) predicting something
every day for all 2,000 U.S. stocks, instead of just 10 stocks, would
need only Prob(Xind < 0) ≈ 49.75% to get Prob(Xannual < 0) to 1 in
1,000,000!
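The same arithmetic for the 2,000-stock, one-in-a-million case, as a minimal sketch:

```python
from statistics import NormalDist
import math

std = NormalDist()
n_bets = 2_000 * 260                      # one call per U.S. stock per business day
target = 1 / 1_000_000                    # desired P(annual loss)

per_bet_ratio = -std.inv_cdf(target) / math.sqrt(n_bets)
print(round(std.cdf(-per_bet_ratio), 4))  # ~0.497: barely better than a coin flip
```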
The existence of such a strategy seems unlikely
Statements expressed with certainty about the market need to be
viewed with skepticism
Low signal-to-noise ratio
Where Are Our Assumptions Wrong?
Transaction costs matter
Individual stock bets almost certainly not independent
By placing bets on linear combinations of individual stocks, one can
partially overcome correlations between bets, but this reduces the
effective number of bets.
These aren’t just technical details; they can make the difference
between making and losing money.
Honesty Is The Best Policy
Despite caveats, the big picture is correct
‣ Only small effects exist
‣ The data is very noisy
‣ One can find statistical significance only by aggregating lots of data
As a result there is great value in:
‣ Clean data
‣ Quantitative sophistication
‣ Discipline
‣ Honesty
Overconfidence and Overfitting
Humans are good at learning patterns when:
‣ Patterns are persistent
‣ Signal-to-noise ratio isn’t too bad
Examples include: language acquisition, human motor control,
physics, chemistry, biology, etc.
Humans are bad at finding weak patterns that come and go as
markets evolve
They tend to overestimate their abilities to find patterns
Further Consequences of Aggregation
If profit/loss distribution is Gaussian then only the mean and the
standard deviation matter
Consider 3 strategies:
‣ Invest all your money in the S&P 500
‣ Invest all your money in the Nasdaq
‣ 50% in the Nasdaq and 50% in short-term U.S. Government
Bonds
Over the past 40 years, these strategies returned approximately:
‣ 7% +/- 20%
‣ 8% +/- 35%
‣ 5.5% +/- 17.5%
Diversifying Risk
Imagine we can lend or borrow at some risk-free rate r
Suppose we have an amount of capital c, which we distribute:
‣ X in S&P 500
‣ Y in Nasdaq
‣ Z in “cash” (where c = X + Y + Z)
Our investment returns are:
‣ rP = X*rSP + Y*rND + Z*r
‣ rP = X*(rSP – r) + Y*(rND – r) + c*r
‣ E(rP) = X*E(rSP – r) + Y*E(rND – r) + c*r
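As a sketch of how these formulas combine with volatilities and a correlation, here is a small Python example using the approximate 40-year figures quoted earlier. The ~3% risk-free rate is an assumption backed out of the 50/50 strategy's return, and the 0.85 correlation is the figure the next slide assumes:

```python
import math

r = 0.03                        # assumed risk-free rate
mu_sp, sd_sp = 0.07, 0.20       # S&P 500: ~7% +/- 20%
mu_nd, sd_nd = 0.08, 0.35       # Nasdaq:  ~8% +/- 35%
rho = 0.85                      # assumed S&P/Nasdaq correlation

def portfolio(X, Y, c=1.0):
    """Expected return and volatility with X in S&P, Y in Nasdaq, rest in cash."""
    mean = X * (mu_sp - r) + Y * (mu_nd - r) + c * r
    var = (X * sd_sp) ** 2 + (Y * sd_nd) ** 2 + 2 * X * Y * rho * sd_sp * sd_nd
    return mean, math.sqrt(var)

print(portfolio(1.0, 0.0))      # all S&P:              ~7.0% +/- 20%
print(portfolio(0.0, 0.5))      # 50% Nasdaq, 50% cash: ~5.5% +/- 17.5%
```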
Consequences of Diversification
As long as E(rSP) and/or E(rND) exceeds the risk-free rate, r, we
can target any desired return
The price for high return is high volatility
One measure of a strategy is the ratio of the return above the
risk-free rate divided by the volatility of the strategy
Investing everything in Nasdaq gave the best return of the three
strategies in the original list
Assuming the correlation between ND and SP is 0.85, the optimal
mixture of investments gives:
Y/X ≈ -0.2
Consequences of Diversification
For our example, we want Y ≈ -0.2*X, but we get to choose the
value of X. We can choose the level of risk!
If we choose X ≈ 2.4 and Y ≈ -0.5, we get the same risk as
investing everything in the Nasdaq, but our return is 10.1%
rather than 8%.
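Continuing the earlier sketch (same assumed inputs: ~3% risk-free rate, 0.85 correlation), the optimal mix and the leveraged version with Nasdaq-level risk come out close to the figures above:

```python
import math

r = 0.03
mu = [0.07 - r, 0.08 - r]            # excess returns: S&P, Nasdaq
sd = [0.20, 0.35]
rho = 0.85

cov = [[sd[0] ** 2,          rho * sd[0] * sd[1]],
       [rho * sd[0] * sd[1], sd[1] ** 2]]

# The mix with the best excess-return-to-volatility ratio is proportional to
# inverse(cov) applied to the excess returns (2x2 case solved by hand).
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
w_sp = (cov[1][1] * mu[0] - cov[0][1] * mu[1]) / det
w_nd = (cov[0][0] * mu[1] - cov[1][0] * mu[0]) / det
print("Y/X ~", round(w_nd / w_sp, 2))                      # ~ -0.2

# Scale the mix up until its volatility matches the Nasdaq's 35%.
vol = math.sqrt(w_sp ** 2 * cov[0][0] + w_nd ** 2 * cov[1][1]
                + 2 * w_sp * w_nd * cov[0][1])
k = sd[1] / vol
X, Y = k * w_sp, k * w_nd
print("X ~", round(X, 1), " Y ~", round(Y, 1),
      " return ~", round(X * mu[0] + Y * mu[1] + r, 3))     # ~2.4, -0.5, ~0.10
```

Re-running the same calculation with the 20-year figures quoted later flips the signs of the weights, in line with the X ≈ -2.6 and Y ≈ 1.9 mentioned there.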
Returns are meaningless if not quoted with:
‣ Volatility
‣ Correlations
Why does everyone just quote absolute returns?
Past Results Do Not Guarantee Future Results
If we only look back over the past 20 years, the numbers change:
‣ 3.5% +/- 20% for S&P
‣ 7% +/- 35% for Nasdaq
‣ 5.0% +/- 17.5% for 50% Nasdaq and 50% risk-free
The same optimal weighting calculation gave X ≈ -2.6 and Y ≈ 1.9,
which gave the same risk as the Nasdaq but with a return of 9.3%!
40 years of data suggests that an investor should go long S&P and
short Nasdaq, but 20 years of data suggests the opposite. If we had
used the 40-year weights over the last 20 years we'd have ended up
making 2%.
Application to the Credit Crisis
Securitization
‣ Bundles of loans meant to behave independently
‣ CDOs are slices of securitized bundles
‣ Rating agencies suggested 1 in 10,000 chance of investments
losing money
‣ All the loans defaulted together, rather than independently!
Credit Models Were Not Robust
Even if mortgages were independent, the process of securitization
can be very unstable.
Thought Experiment:
‣ Imagine a mortgage that pays out $1 unless it defaults, in which
case it pays $0.
‣ All mortgages are independent and have a default probability
of 5%.
‣ What happens to default probabilities when one bundles
mortgages?
Mortgage Bundling Primer: Tranches
Combine the payout of 100 mortgages to make 100 new
instruments called “Tranches.”
Tranche 1 pays $1 if no mortgage defaults.
Tranche 2 pays $1 if only one mortgage defaults.
Tranche i pays $1 if the number of defaulted mortgages < i.
So far, all we have done is transform the cash flow.
What are the default rates for the new Tranches?
Mortgage Bundling Primer: Tranches
If each mortgage has default rate p = 0.05 then the ith Tranche has
default rate:
\[ p_T(i) = \sum_{j=i}^{100} \binom{100}{j}\, p^j (1-p)^{100-j} \]
where pT(i) is the default probability for the ith Tranche.
For the first Tranche pT(1) = 99.4%, i.e., it’s very likely to default.
But for the 10th Tranche pT(10) = 2.8%.
By the 10th Tranche we have created an instrument that is safer
than the original mortgages.
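A minimal Python sketch of this binomial-tail calculation (standard library only), reproducing the two figures above:

```python
from math import comb

def tranche_default_prob(i, p=0.05, n=100):
    """P(at least i of the n independent mortgages default), each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(i, n + 1))

print(round(tranche_default_prob(1), 3))    # ~0.994: the first tranche almost always defaults
print(round(tranche_default_prob(10), 3))   # ~0.028: the 10th tranche is safer than one mortgage
```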
Securitizing Tranches: CDOs
That was fun, let’s do it again!
Take 100 type-10 Tranches and bundle them up together using the
same method.
The default rate for the kth type-10 Tranche is then:
\[ p_{CDO}(k) = \sum_{j=k}^{100} \binom{100}{j}\, p_T(10)^j \bigl(1 - p_T(10)\bigr)^{100-j} \]
The default probability on the 10th type-10 Tranche is then
pCDO(10) ≈ 0.05%. Only about 1/100 the default probability of the original
mortgages!
Why Did We Do This: Risk Averse Investors
After making these manipulations we still only have the same cash
flow from 10,000 mortgages, so why did we do it?
Some investors will pay a premium for very low risk investments.
They may even be required by their charter to only invest in
instruments rated very low risk (AAA) by rating agencies.
They do not have direct access to the mortgage market, so they
can’t mimic securitization themselves.
Are They Really So Safe?
These results are very sensitive to the assumptions.
If the underlying mortgages actually have a default probability of
6% (a 20% increase), then the 10th Tranches have a default
probability of 7.8%, nearly three times the original 2.8%.
Worse, the 10th type-10 Tranches will have a default probability of
25%, roughly 500 times the original 0.05%!
These models are not robust to errors in the assumptions!
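The two-stage construction and its sensitivity can be checked with the same binomial tail; here is a minimal sketch (the exact outputs differ slightly from the rounded figures quoted above):

```python
from math import comb

def tail(i, p, n=100):
    """P(at least i of n independent events occur), each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(i, n + 1))

for p in (0.05, 0.06):                 # baseline vs. a 20% higher default rate
    p_tranche = tail(10, p)            # 10th tranche of 100 mortgages
    p_cdo = tail(10, p_tranche)        # 10th tranche of 100 such tranches (the CDO)
    print(f"p={p:.0%}: tranche ~{p_tranche:.1%}, CDO ~{p_cdo:.2%}")
# p=5%: tranche ~2.8%, CDO ~0.04%
# p=6%: tranche ~7.9%, CDO ~26%
```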
Application to the Credit Crisis
Connections with thought experiment:
‣ Overconfidence (bad estimate of error bars)
‣ Insufficient independence between loans (bad use of Central
Limit Theorem)
‣ Illusory diversification (really just one big bet that houses wouldn’t
lose value)
About the D. E. Shaw Group
Founded in 1988
Quantitative and qualitative investment strategies
Offices in North America, Europe, and Asia
1,500 employees worldwide
Managing approximately $22 billion (as of April 1, 2010)
About What I Do
Quantitative Analyst (“Quant”)
Most Quants have a background in math, physics, EE, CS, or
statistics
‣ Many hold a Ph.D. in one of these subjects
We work on:
‣ Forecasting future prices
‣ Reducing transaction costs
‣ Modeling/managing risk