Data Mining in Finance
Andreas S. Weigend
Leonard N. Stern School of Business, New York University
Nonlinear Models
8 February 1999
The seven steps of model building
1. Task
• Predict distribution of portfolio returns, understand structure in yield curves, find profitable time scales, discover trade styles, …
2. Data
• Which data to use, and how to code/ preprocess/ represent them
3. Architecture
4. Objective/ Cost function (in-sample)
5. Search/ Optimization/ Estimation
6. Evaluation
7. Analysis and Interpretation
How to make predictions?
“Pattern” = Input + Output Pair
Keep all data
Nearest neighbor lookup
Local constant model
Local linear model
Throw away data, only keep model
Global linear model
Global nonlinear model
• Neural network with hidden units
- Sigmoids or hyperbolic tangents (tanh)
• Radial basis functions
Keep only a few representative data points
• Support vector machines
Training data: Inputs and corresponding outputs
What is the prediction for a new input?
Nearest neighbor
Use output value of nearest neighbor in input space as prediction
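A minimal sketch of nearest-neighbor prediction (illustrative code, not from the original slides; the data and names are hypothetical):

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x_new):
    """Return the output of the stored pattern closest to x_new in input space."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # distance to every stored input
    return y_train[np.argmin(distances)]                 # output of the nearest neighbor

# Purely illustrative data: three input/output pairs, one new input
X_train = np.array([[0.1], [0.5], [0.9]])
y_train = np.array([1.0, 2.0, 0.5])
print(nearest_neighbor_predict(X_train, y_train, np.array([0.55])))  # prints 2.0
```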
Local constant model
Use average of the outputs of nearby points in input space
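Correspondingly, the local constant model can be sketched as a k-nearest-neighbor average (again purely illustrative; k is a hypothetical choice):

```python
import numpy as np

def local_constant_predict(X_train, y_train, x_new, k=5):
    """Average the outputs of the k nearest stored patterns in input space."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest inputs
    return y_train[nearest].mean()        # local constant: a plain average
```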
Local linear model
Find best-fitting plane (linear model) through nearby points in input space
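A rough sketch of the local linear idea, assuming a least-squares fit to the k nearest points (names and the choice of k are hypothetical):

```python
import numpy as np

def local_linear_predict(X_train, y_train, x_new, k=10):
    """Fit a plane through the k nearest stored patterns and evaluate it at x_new."""
    idx = np.argsort(np.linalg.norm(X_train - x_new, axis=1))[:k]
    A = np.column_stack([X_train[idx], np.ones(len(idx))])      # inputs plus intercept column
    coeffs, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)   # least-squares plane
    return np.append(x_new, 1.0) @ coeffs                       # evaluate the plane at the new input
```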
Nonlinear regression surface
Minimize “energy” stored in the “springs”
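One standard reading of the spring metaphor (not spelled out on the slide) is the sum of squared vertical distances between surface and data; a small illustrative sketch:

```python
import numpy as np

def spring_energy(f, X_train, y_train):
    """Sum of squared vertical 'spring' lengths between the surface f and the data points."""
    residuals = y_train - np.array([f(x) for x in X_train])
    return np.sum(residuals ** 2)   # smallest when the surface passes close to all points
```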
Throw away the data… just keep the surface!
Modeling – an iterative process
Step 1: Task/ Problem definition
Step 2: Data and Representation
Step 3: Architecture
Step 4: Objective/ Cost function (in-sample)
Step 5: Search/ Optimization/ Estimation
Step 6: Evaluation (out-of-sample)
Step 7: Analysis and Interpretation
Modeling issues
Step 1: Task and Problem definition
Step 2: Data and Representation
Step 3: Architecture
• What are the “primitives” that make up the surface?
Step 4: Objective/ Cost function (in-sample)
• How flexible should the surface be?
- Too rigid model: stiff board (global linear model)
- Too flexible model: cellophane going through all points
- Penalize too flexible models (regularization)
Step 5: Search/ Optimization/ Estimation
• How do we find the surface?
Step 6: Evaluation (out-of-sample)
Step 7: Analysis and Interpretation
Step 3: Architecture – Example of neural networks
Project the input vector x onto a weight vector w
• w*x
This projection is then nonlinearly “squashed” to give a hidden unit activation
• h = tanh (w * x)
Usually, a constant c in the argument allows the location to be shifted
• h = tanh (w * x + c)
There are several such hidden units, responding to different projections of the input vectors
Their activations are combined with weights v to form the output (and another constant b can be added)
• output = v * h + b
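A minimal sketch of this one-hidden-layer architecture in code (the shapes and random weights are purely illustrative):

```python
import numpy as np

def network_output(x, W, c, v, b):
    """One-hidden-layer network as above: h = tanh(W x + c), output = v . h + b."""
    h = np.tanh(W @ x + c)   # hidden unit activations, one per projection direction
    return v @ h + b         # weighted combination of the activations plus a constant

# Hypothetical shapes: 3 inputs, 4 hidden units, random illustrative weights
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(4, 3)); c = rng.normal(size=4)   # one weight vector w and shift c per hidden unit
v = rng.normal(size=4); b = 0.0
print(network_output(x, W, c, v, b))
```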
Neural networks compared to standard statistics
Comparison between neural nets and standard statistics
• Complexity
- Statistics: Fix order of interactions
- Neural nets: Fix number of features
• Estimation
- Statistics: Find exact solution
- Neural nets: Focus on path
Dimensionality
• Number of inputs: Curse of dimensionality
- Points far away in input space (illustrated in the sketch below)
• Number of parameters: Blessing of dimensionality
- Many hidden units make it easier to find a good local minimum
- But need to control for model complexity
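A small numerical illustration of the curse of dimensionality (illustrative numbers only): with a fixed number of points, the nearest neighbor drifts further away as the input dimension grows.

```python
import numpy as np

# With a fixed number of points, the nearest neighbor moves further away
# as the input dimension grows -- the curse of dimensionality.
rng = np.random.default_rng(0)
for d in (1, 10, 100):
    X = rng.uniform(size=(1000, d))                # 1000 random points in the d-dimensional unit cube
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from the first point to all others
    print(d, dists.min().round(3))                 # nearest-neighbor distance increases with d
```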
Step 4: Cost function
Key problem:
• Want to be good on new data...
• ...but we only have data from the past
Always
observation y = f(input) + noise
Assume
• Large sudden variations in output are due to noise
• Small variations (systematic) are signal, expressed as f(input)
Flexible models
- Good news: can fit any signal
- Bad news: can also fit any noise
Requires modeling decisions:
• Assumptions about model complexity
- Weight decay, weight elimination, smoothness
• Assumptions about noise: error model or noise model
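As an illustrative sketch, these two kinds of assumptions often enter the in-sample cost as squared error (a Gaussian noise model) plus a weight-decay penalty (lam is a hypothetical regularization strength):

```python
import numpy as np

def in_sample_cost(y_true, y_pred, weights, lam=0.01):
    """Squared error (Gaussian noise assumption) plus a weight-decay penalty on complexity."""
    error = np.mean((y_true - y_pred) ** 2)   # how well the surface fits the observed outputs
    penalty = lam * np.sum(weights ** 2)      # discourages overly flexible (large-weight) surfaces
    return error + penalty
```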
Step 5: Determining the parameters
Search with gradient descent: iterative
Vice to virtue: path important
Guide network through solution space
• Hints
• Weight pruning
• Early stopping
• Weight-elimination
• Pseudo-data
• Add noise
• …
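A rough sketch of the “path matters” idea: gradient descent guided by early stopping on held-out data (the grad and loss callables and all names are hypothetical):

```python
import numpy as np

def train(grad, loss, w0, train_data, val_data, lr=0.01, max_steps=10000, patience=50):
    """Plain gradient descent, stopped early when the held-out loss stops improving."""
    w, best_w, best_val, since_best = w0.copy(), w0.copy(), np.inf, 0
    for _ in range(max_steps):
        w = w - lr * grad(w, *train_data)     # one step along the in-sample gradient
        val = loss(w, *val_data)              # monitor out-of-sample loss along the path
        if val < best_val:
            best_w, best_val, since_best = w.copy(), val, 0
        else:
            since_best += 1
            if since_best >= patience:        # early stopping: keep the best point on the path
                break
    return best_w
```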
Alternative approaches:
Model to match the local noise level of the data
• Local error bars
• Gated experts architecture with adaptive variances