
Data Mining
Prof. Chris Volinsky
Rajesh Shekhar
◦ Use data mining techniques to build a portfolio with superior return/risk characteristics using technical indicators
   • Maximize return
   • Minimize risk
◦ Build different momentum-based strategies
◦ Risk diversification
   • Select stocks across sectors for a natural diversification
   • Virtual sectors created using the k-means clustering algorithm
◦ Return maximization
   • Use momentum-based indicators to predict future returns
   • Try different trading algorithms



Investment Universe: Large Market Cap Stocks (Top 100/300/500)
Daily stock prices collected from WRDS (CRSP database) for the entire stock universe from 1999 to 2009.
A custom benchmark of the top 100/300/500 stocks was created, because the composition of the S&P 500 over the period was not known.
Issue: Large dataset (entire stock universe from 1999 to 2009; more than 5 GB)
   Approach: Use a database (SQL Server), query to get a subset of the data, and create proper indexes.
Issue: Ticker name changes
   Approach: Use permno
Issue: Dividends: the price change for a stock does not give the true return, as it ignores the dividend paid.
   Approach: Use the daily adjusted return, which adjusts for the dividend.
Issue: Missing returns
   Approach: Use average to fill the returns
Issue: Duplicates
   Approach: Use a 'select distinct' SQL query to filter the data
Issue: Null values
   Approach: Use average to fill the returns
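A minimal sketch of the cleaning steps listed above (index, 'select distinct', fill with average). Assumptions: an in-memory SQLite database stands in for SQL Server, the table and column names (`daily_returns`, `permno`, `date`, `ret`) and the sample rows are hypothetical, and "average" is read as the stock's own mean return.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_returns (permno INTEGER, date TEXT, ret REAL)")
rows = [
    (10001, "2005-01-03", 0.010),
    (10001, "2005-01-03", 0.010),   # exact duplicate row
    (10001, "2005-01-04", None),    # missing (NULL) return
    (10001, "2005-01-05", 0.030),
    (10002, "2005-01-03", -0.020),
]
conn.executemany("INSERT INTO daily_returns VALUES (?, ?, ?)", rows)
# Index on the stable identifier (permno) and date, so subset queries avoid full scans.
conn.execute("CREATE INDEX idx_permno_date ON daily_returns (permno, date)")

def load_clean_returns(conn, permno):
    """Return [(date, ret)] for one permno: duplicates dropped, gaps filled with the mean."""
    cur = conn.execute(
        "SELECT DISTINCT date, ret FROM daily_returns WHERE permno = ? ORDER BY date",
        (permno,),
    )
    data = cur.fetchall()
    known = [r for _, r in data if r is not None]
    # Falling back to 0.0 when no return is known is an assumption of this sketch.
    mean_ret = sum(known) / len(known) if known else 0.0
    return [(d, mean_ret if r is None else r) for d, r in data]

cleaned = load_clean_returns(conn, 10001)
```

Keying on permno rather than ticker keeps a stock's history together across ticker name changes.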

Used k-means clustering to create virtual sectors
   • 11 clusters for the 300/500 stock universes and 10 clusters for the 100 stock universe
   • Input: β, Market Cap (Liquidity), P/E (Price/Earnings)
   • $\beta_{stock} = \mathrm{cov}(R_{stock}, R_{market}) / \mathrm{var}(R_{market})$
   • β captures long-term adjusted equilibrium rate of return
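The clustering step can be sketched as follows. Assumptions: population moments for β, z-score standardization of the three inputs, a plain Lloyd's k-means with farthest-first initialization (chosen only to keep the sketch deterministic), and made-up sample values. The slides do not say which k-means implementation or scaling was used.

```python
from statistics import mean, pstdev

def beta(stock_returns, market_returns):
    """beta = cov(R_stock, R_market) / var(R_market), using population moments."""
    ms, mm = mean(stock_returns), mean(market_returns)
    cov = mean([(s - ms) * (m - mm) for s, m in zip(stock_returns, market_returns)])
    var = mean([(m - mm) ** 2 for m in market_returns])
    return cov / var

def standardize(rows):
    """Z-score each column so beta, market cap and P/E share one scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by zero
    return [[(x - mu) / sd for x, mu, sd in zip(r, mus, sds)] for r in rows]

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-first initialization
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda i: sqdist(p, centers[i])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [mean(col) for col in zip(*members)]
    return labels

# Hypothetical features per stock: [beta, market cap in $bn, P/E]
stocks = [[0.8, 150, 12], [0.9, 140, 13], [1.6, 20, 35], [1.7, 25, 33]]
labels = kmeans(standardize(stocks), k=2)
```

Each resulting label plays the role of a virtual sector; the slides use 10 or 11 clusters rather than the 2 shown here.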
◦ Different models tried for capturing momentum indicators (linear models based on APT)
   • Best model to capture momentum was based on time decay of historical returns:
     $r = \sum_{j} k^{j} r_{j}$
     where r = predicted stock return
     j = time period (j = 0 for the current time)
     k = constant obtained from calibration
   • More weight on recent data
◦ Two-year moving window for prediction
◦ Portfolio analysis and rebalancing every two weeks
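A small sketch of the time-decay predictor, assuming the formula is read as a geometrically weighted sum of past returns, r = Σ k^j · r_j, with 0 < k < 1 so that recent periods weigh more. The grid-search calibration is a hypothetical stand-in; the slides do not describe how k was calibrated.

```python
def predict_return(past_returns, k):
    """r = sum over j of k**j * r_j, where past_returns[0] is the most recent period (j = 0)."""
    return sum((k ** j) * r for j, r in enumerate(past_returns))

def calibrate_k(histories, realized, grid=None):
    """Pick the k on a grid that minimizes squared prediction error (hypothetical method)."""
    grid = grid or [i / 100 for i in range(1, 100)]

    def sse(k):
        return sum((predict_return(h, k) - y) ** 2 for h, y in zip(histories, realized))

    return min(grid, key=sse)

# Made-up numbers: two return histories and the returns that followed them.
histories = [[0.02, 0.04], [0.01, 0.02]]
realized = [0.04, 0.02]
best_k = calibrate_k(histories, realized)
prediction = predict_return([0.02, 0.04], best_k)
```

In the setting described above, the histories would come from the two-year moving window and the prediction would be refreshed at each rebalancing date.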

Long Only

Short Only

Long-Short

Sector Rotation

Sector Portfolio Optimization

Long Only
• Basic Idea: Long top “n” performing stocks in each sector based on market cap
• Portfolio Weights: All selected stocks are equally weighted in portfolio

Short Only
• Basic Idea: Short bottom “n” performing stocks in each sector based on market cap
• Portfolio Weights: All selected stocks are equally weighted in portfolio

Long-Short
• Basic Idea: Combination of Long and Short
• Portfolio Weights: All selected stocks are equally weighted in portfolio
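The three equally weighted strategies can be sketched with one selection function. Assumptions: stocks are ranked by predicted return within each sector, short positions carry negative weights, and absolute weights sum to 1. The function and argument names are hypothetical.

```python
from collections import defaultdict

def select_equal_weight(predicted, sector_of, n, side="long"):
    """Pick n stocks per sector by predicted return and weight them equally.

    side: "long" (top n), "short" (bottom n) or "long_short" (both).
    If a sector has fewer than 2n stocks, "long_short" may pick a stock twice;
    this sketch does not handle that case.
    """
    by_sector = defaultdict(list)
    for stock, r in predicted.items():
        by_sector[sector_of[stock]].append((r, stock))
    longs, shorts = [], []
    for members in by_sector.values():
        ranked = sorted(members, reverse=True)  # best predicted return first
        if side in ("long", "long_short"):
            longs += [s for _, s in ranked[:n]]
        if side in ("short", "long_short"):
            shorts += [s for _, s in ranked[-n:]]
    total = len(longs) + len(shorts)
    weights = {s: 1.0 / total for s in longs}
    weights.update({s: -1.0 / total for s in shorts})  # negative weight = short
    return weights

# Made-up predictions for six stocks in two virtual sectors.
predicted = {"A": 0.05, "B": 0.01, "C": -0.02, "D": 0.03, "E": -0.01, "F": 0.00}
sector_of = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
long_only = select_equal_weight(predicted, sector_of, n=1, side="long")
long_short = select_equal_weight(predicted, sector_of, n=1, side="long_short")
```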
Sector Rotation
• Basic Idea: Long top performing sectors and short bottom performing ones
• Portfolio Weights: Weight in each sector is proportional to return (more weight on the more outperforming sector; shorting allowed)
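One possible reading of "weight proportional to return", as a sketch. Assumptions: each sector's return is measured relative to the cross-sector average, so outperformers are long and underperformers short, and absolute weights are scaled to sum to 1. The slides do not give the exact normalization.

```python
def sector_rotation_weights(sector_returns):
    """Weights proportional to each sector's return in excess of the cross-sector mean."""
    avg = sum(sector_returns.values()) / len(sector_returns)
    excess = {s: r - avg for s, r in sector_returns.items()}
    gross = sum(abs(e) for e in excess.values())
    if gross == 0:  # all sectors performed identically: no tilt
        return {s: 0.0 for s in sector_returns}
    return {s: e / gross for s, e in excess.items()}

# Made-up sector returns.
weights = sector_rotation_weights({"s1": 0.06, "s2": 0.02, "s3": -0.02})
```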


Sector Portfolio Optimization
• Basic Idea: Select stocks using the long only strategy.
• Portfolio Weights: Decided by Markowitz portfolio optimization techniques
◦ Sector constraints: weights vary from 0.9 to 1.1 of the target sector weights
◦ Asset constraints (shorting and leverage allowed): weights vary from -0.1 to 1.1
◦ Allocation on the efficient frontier
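The constraints above can be written as a single optimization problem. This is a sketch under assumptions not stated on the slide: a mean-variance objective with risk-aversion parameter $\lambda$, a full-investment constraint, $t_s$ as the target weight of sector $s$, and $\mu$ and $\Sigma$ as the expected-return vector and covariance matrix of the selected stocks.

```latex
\begin{aligned}
\min_{w}\quad & w^{\top}\Sigma\,w - \lambda\,\mu^{\top}w \\
\text{s.t.}\quad & \sum_{i} w_i = 1, \\
& -0.1 \le w_i \le 1.1 \quad \text{for every stock } i, \\
& 0.9\,t_s \le \sum_{i \in s} w_i \le 1.1\,t_s \quad \text{for every sector } s.
\end{aligned}
```

Varying $\lambda$ traces out the constrained efficient frontier, from which an allocation is then chosen.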
• MATLAB (Object Oriented)
• SQL Server database (more than 5 GB of raw data; 12 GB with indexes)
[Architecture diagram. Boxes: Database (SQL Server); Portfolio Engine (MATLAB Code); Portfolio; Reports & Graphs; Risk Analysis (MATLAB Code); Performance & Risk Report]

Vary Input parameters
◦ Stock universe (100/300/500)
◦ Stocks selected (10/20/40)
◦ Running time window (2001-2002, 2005-2007)
◦ Rebalancing period (15/21/30/45 days)
Results: Performance Measurement and Risk Metrics
Tables: 100-stock and 300-stock universes, each for 2001-2003 and 2005-2007
Columns: Return (annual), Sigma (annual), Alpha (annual), Sharpe Ratio (annual), Info Ratio (annual), VaR (95% Daily), CVaR (95% Daily), Max DD (Daily)
Rows: Long Only, Short Only, Long Short, Sector Rotation, Sector Port Opt
Values in extracted order (cell positions not recoverable):
20.16%, 25.55%, 70.02%, 37.91%, 32.93%, 15.29%, 21.68%, 42.74%, 37.35%, 15.43%,
58.18%, 54.26%, 0.53233, 1.3871, 37.10%, 0.53%, 25.05%, 90.17%, 55.44%, 15.71%,
19.03%, 66.97%, 54.29%, -9.59%, 13.22%, 106.53%, 0.62196, -0.24379, 16.18%, 6.00%,
45.07%, 57.09%, 36.73%, 12.49%, 17.62%, 48.95%, 33.36%, -4.12%, 33.24%, 73.44%,
0.36859, 0.13134, 52.15%, 22.43%, 51.91%, 105.86%, 47.04%, 16.77%, 22.98%, 54.23%,
69.34%, 12.30%, 40.08%, 122.21%, 1.0532, 1.0782, 14.49%, 33.30%, 69.17%, 61.34%,
30.01%, 16.23%, 23.96%, 42.79%, 31.68%, 23.18%, 57.33%, 77.69%, 0.395, 1.7846,
0.8257, 3.0318, 1.599, 1.3748, 1.308, 1.0882, 0.73514, -0.40856, 1.113, 2.3126,
0.67099, -0.25663, 1.9049, 1.3728, 2.0707, 1.2174, 2.7072, 1.1749, 0.67326, 1.7481,
1.5872, -2.91%, 3.2054, -1.31%, 1.2666, -3.93%, 0.5056, -1.51%, 1.2248, -3.28%,
1.6552, -1.22%, 1.891, 2.1169, -3.96%, 1.6635, -2.81%, 2.7043, -1.52%, -1.37%,
-3.72%, -3.58%, -1.71%, -1.44%, -7.23%, -4.69%, -2.15%, -1.54%, -5.56%, -4.40%,
-1.52%, -1.29%, -5.97%, -4.17%, -3.52%, -1.91%, -1.79%, -1.72%, -3.69%, -1.61%,
15.39%, 8.64%, -4.53%, -1.89%, 43.56%, 8.04%, -8.36%, -2.24%, 27.95%, 5.90%,
-6.78%, -1.83%, 30.36%, 13.40%, 8.59%, 7.78%, -6.33%, -2.43%, -4.48%, -2.34%,
17.55%, 10.99%, 45.84%, 10.09%, 27.92%, 8.27%, 33.22%, 13.61%, 24.54%, 15.98%
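For reference, several of the risk metrics reported in the results (Sharpe ratio, VaR, CVaR, maximum drawdown) can be computed from a series of daily returns roughly as below. Assumptions: 252 trading days per year, historical (empirical) VaR and CVaR at a 5% tail, and losses expressed as negative numbers. The slides do not state which estimators or sign conventions were used.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor

def sharpe_ratio(daily, rf_annual=0.0):
    """Annualized excess return divided by annualized volatility."""
    return (mean(daily) * TRADING_DAYS - rf_annual) / (stdev(daily) * math.sqrt(TRADING_DAYS))

def var_cvar(daily, tail=0.05):
    """Historical daily VaR and CVaR: boundary and mean of the worst `tail` share of days."""
    ordered = sorted(daily)
    count = max(1, int(tail * len(ordered)))  # number of days in the tail
    return ordered[count - 1], mean(ordered[:count])

def max_drawdown(daily):
    """Largest peak-to-trough loss of cumulative wealth (zero or negative)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily:
        wealth *= 1 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1)
    return worst

# Made-up daily returns from -5.0% to +4.9% in 0.1% steps.
sample = [i / 1000 for i in range(-50, 50)]
var95, cvar95 = var_cvar(sample)
```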


Benchmark : Custom Benchmark
Value-added Return = Pure sector allocation
+ Allocation/Selection interaction
+ Within-sector selection
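$$R_V = \sum_{j=1}^{S} (w_{P,j} - w_{B,j})(R_{B,j} - R_B) + \sum_{j=1}^{S} (w_{P,j} - w_{B,j})(R_{P,j} - R_{B,j}) + \sum_{j=1}^{S} w_{B,j}(R_{P,j} - R_{B,j})$$
where
$R_V$ = the value-added return
$w_{P,j}$ = portfolio weight of sector j
$w_{B,j}$ = benchmark weight of sector j
$R_{P,j}$ = portfolio return of sector j
$R_{B,j}$ = benchmark return of sector j
$R_B$ = return of the portfolio's benchmark
$S$ = number of sectors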
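A short sketch of the three-term attribution, assuming the standard decomposition: pure sector allocation Σ(wP,j − wB,j)(RB,j − RB), allocation/selection interaction Σ(wP,j − wB,j)(RP,j − RB,j), and within-sector selection Σ wB,j(RP,j − RB,j). The function name and sample numbers are hypothetical.

```python
def value_added_attribution(w_p, w_b, r_p, r_b):
    """Return (allocation, interaction, selection) from per-sector weights and returns."""
    bench = sum(w * r for w, r in zip(w_b, r_b))  # R_B: total benchmark return
    allocation = sum((wp - wb) * (rb - bench) for wp, wb, rb in zip(w_p, w_b, r_b))
    interaction = sum((wp - wb) * (rp - rb) for wp, wb, rp, rb in zip(w_p, w_b, r_p, r_b))
    selection = sum(wb * (rp - rb) for wb, rp, rb in zip(w_b, r_p, r_b))
    return allocation, interaction, selection

# Made-up two-sector example.
w_p, w_b = [0.6, 0.4], [0.5, 0.5]
r_p, r_b = [0.10, 0.02], [0.08, 0.04]
allocation, interaction, selection = value_added_attribution(w_p, w_b, r_p, r_b)
value_added = allocation + interaction + selection
```

When both weight vectors sum to 1, the three terms add up to the portfolio return minus the benchmark return, which is a convenient check on the inputs.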

Transaction Costs:
◦ Slippage cost and explicit costs are taken into
account
◦ Market impact and other implicit costs are ignored

Leverage costs are not taken into account

Portfolio Turnover not taken into account



Virtual sectors work reasonably well.
Time-decayed returns are a decent predictor of future returns in stable markets over short time periods.
Results are statistically relevant for large market caps.