Data Mining in Finance
Andreas S. Weigend
Leonard N. Stern School of Business, New York University
Nonlinear Models
8 February 1999
RiskTeam/ Zürich, 6 July 1998
Andreas S. Weigend, Data Mining Group, Information Systems Department, Stern School of Business, NYU
The seven steps of model building
 1. Task
• Predict distribution of portfolio returns, understand structure in
yield curves, find profitable time scales, discover trade styles, …
 2. Data
• Which data to use, and how to code/ preprocess/ represent them
 3. Architecture
 4. Objective/ Cost function (in-sample)
 5. Search/ Optimization/ Estimation
 6. Evaluation
 7. Analysis and Interpretation
How to make predictions?
 “Pattern” = Input + Output Pair
Keep all data
 Nearest neighbor lookup
 Local constant model
 Local linear model
Throw away data, only keep model
 Global linear model
 Global nonlinear model
• Neural network with hidden units
- Sigmoids or hyperbolic tangents (tanh)
• Radial basis functions
Keep only a few representative data points
• Support vector machines
Training data: Inputs and corresponding outputs
What is the prediction for a new input?
Nearest neighbor
 Use output value of nearest neighbor in input space as prediction
[Figure: the nearest neighbor to the new input supplies the prediction]
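The nearest-neighbor rule can be sketched in a few lines of Python (the patterns and the helper `nearest_neighbor_predict` are illustrative, not from the slides; inputs are scalars here for simplicity):

```python
def nearest_neighbor_predict(patterns, new_input):
    """patterns: list of (input, output) pairs.
    Return the output of the stored pattern whose input is closest."""
    nearest = min(patterns, key=lambda p: abs(p[0] - new_input))
    return nearest[1]

patterns = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5)]
print(nearest_neighbor_predict(patterns, 0.8))  # nearest input is 1.0, so prints 2.0
```

With vector inputs, `abs(p[0] - new_input)` would be replaced by a Euclidean distance over the input components.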
Local constant model
 Use average of the outputs of nearby points in input space
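A minimal sketch of the local constant model (illustrative data; the parameter `k`, the number of nearby points to average, is an assumption, not from the slides):

```python
def local_constant_predict(patterns, new_input, k=2):
    """Average the outputs of the k nearest stored patterns in input space."""
    nearest = sorted(patterns, key=lambda p: abs(p[0] - new_input))[:k]
    return sum(p[1] for p in nearest) / len(nearest)

patterns = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.5)]
print(local_constant_predict(patterns, 0.8, k=2))  # average of outputs 2.0 and 1.0
```

Nearest-neighbor lookup is the special case k = 1.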
Local linear model
 Find best-fitting plane (linear model) through nearby points in input space
[Figure: local linear fit through the points near the new input]
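For a one-dimensional input the "plane" is a line; a minimal sketch fits a least-squares line through the k nearest points and evaluates it at the new input (data and `k` are illustrative assumptions):

```python
def local_linear_predict(patterns, new_input, k=3):
    """Least-squares line through the k nearest points, evaluated at new_input."""
    pts = sorted(patterns, key=lambda p: abs(p[0] - new_input))[:k]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n          # mean of the nearby inputs
    my = sum(y for _, y in pts) / n          # mean of the nearby outputs
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx if sxx else 0.0        # flat line if all inputs coincide
    return my + slope * (new_input - mx)
```

Unlike the local constant model, this can extrapolate the local trend between data points.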
Nonlinear regression surface
 Minimize “energy” stored in the “springs”
Throw away the data… just keep the surface!
Modeling – an iterative process
 Step 1: Task/ Problem definition
 Step 2: Data and Representation
 Step 3: Architecture
 Step 4: Objective/ Cost function (in-sample)
 Step 5: Search/ Optimization/ Estimation
 Step 6: Evaluation (out-of-sample)
 Step 7: Analysis and Interpretation
Modeling issues
 Step 1: Task and Problem definition
 Step 2: Data and Representation
 Step 3: Architecture
• What are the “primitives” that make up the surface?
 Step 4: Objective/ Cost function (in-sample)
• How flexible should the surface be?
- Too rigid model: stiff board (global linear model)
- Too flexible model: cellophane going through all points
- Penalize too flexible models (regularization)
 Step 5: Search/ Optimization/ Estimation
• How do we find the surface?
 Step 6: Evaluation (out-of-sample)
 Step 7: Analysis and Interpretation
Step 3: Architecture – Example of neural networks
 Project the input vector x onto a weight vector w
• w*x
 This projection is then nonlinearly “squashed” to give a hidden unit activation
• h = tanh (w * x)
 Usually, a constant c in the argument allows the shifting of the
location
• h = tanh (w * x + c)
 There are several such hidden units, responding to different
projections of the input vectors
 Their activations are combined with weights v to form the
output (and another constant b can be added)
• output = v * h + b
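The forward pass described above, h_j = tanh(w_j · x + c_j) and output = v · h + b, can be sketched directly (the function name `forward` and the example weights are illustrative):

```python
import math

def forward(x, W, c, v, b):
    """One-hidden-layer network:
    h_j = tanh(w_j . x + c_j), output = v . h + b."""
    h = [math.tanh(sum(wji * xi for wji, xi in zip(wj, x)) + cj)
         for wj, cj in zip(W, c)]                      # hidden unit activations
    return sum(vj * hj for vj, hj in zip(v, h)) + b    # linear combination

# Two inputs, one hidden unit with zero weights: tanh(0) = 0, so output = b
print(forward([1.0, 2.0], [[0.0, 0.0]], [0.0], [1.0], 0.5))  # prints 0.5
```

Each row of `W` is one hidden unit's projection direction; adding rows adds hidden units without changing the code.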
Neural networks compared to standard statistics
 Comparison between neural nets and standard statistics
• Complexity
- Statistics: Fix order of interactions
- Neural nets: Fix number of features
• Estimation
- Statistics: Find exact solution
- Neural nets: Focus on path
 Dimensionality
• Number of inputs: Curse of dimensionality
- Points far away in input space
• Number of parameters: Blessing of dimensionality
- Many hidden units make it easier to find good local minimum
- But need to control for model complexity
Step 4: Cost function
 Key problem:
• Want to be good on new data...
• ...but we only have data from the past
 Always: observation y = f(input) + noise
 Assume
• Large sudden variations in output are due to noise
• Small (systematic) variations are signal, expressed as f(input)
 Flexible models
- Good news: can fit any signal
- Bad news: can also fit any noise
 Requires modeling decisions:
• Assumptions about model complexity
- Weight decay, weight elimination, smoothness
• Assumptions about noise: error model or noise model
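A minimal sketch of such a cost function, combining an in-sample squared error (a Gaussian noise assumption) with a weight-decay penalty that charges for model flexibility (the function name `cost` and the value of `lam` are illustrative assumptions):

```python
def cost(targets, outputs, weights, lam=0.01):
    """In-sample squared error plus weight decay: sum (t - o)^2 + lam * sum w^2.
    The penalty term discourages overly flexible (large-weight) models."""
    err = sum((t - o) ** 2 for t, o in zip(targets, outputs))
    penalty = lam * sum(w ** 2 for w in weights)
    return err + penalty
```

With lam = 0 the cellophane-like fit wins; raising lam stiffens the surface toward the rigid-board extreme.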
Step 5: Determining the parameters
Search with gradient descent: iterative
 Vice to virtue: path important
 Guide network through solution space
• Hints
• Weight pruning
• Early stopping
• Weight-elimination
• Pseudo-data
• Add noise
• …
Alternative approaches:
 Model to match the local noise level of the data
• Local error bars
• Gated experts architecture with adaptive variances
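One of the guidance techniques above, early stopping, can be sketched for the simplest possible model, y ≈ w·x (the data split, learning rate, and helper name `train` are illustrative assumptions):

```python
def train(train_data, val_data, lr=0.1, max_steps=100):
    """Gradient descent on in-sample squared error for y = w * x,
    stopping when the held-out (validation) error stops improving."""
    w, best_w, best_val = 0.0, 0.0, float("inf")
    for _ in range(max_steps):
        grad = sum(2 * (w * x - y) * x for x, y in train_data)
        w -= lr * grad                                 # one step along the path
        val = sum((w * x - y) ** 2 for x, y in val_data)
        if val < best_val:
            best_val, best_w = val, w                  # keep best point on path
        else:
            break                                      # validation error rose: stop
    return best_w
```

This is why the iterative path is a virtue: stopping partway along it is itself a form of complexity control.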