Transcript Handout

Special Challenges With Large Data
Mining Projects
CAS PREDICTIVE MODELING SEMINAR
Beth Fitzgerald
ISO
October 2006

Agenda
• Project Overview
• Prior to Modeling
• Modeling
• Business Issues
Development of a Model - Project
Overview
• Data
• Statistical Tools
• Computer Capacity
• Team Skills
– Data management
– Analytical/statistical
– Technology
– Business Knowledge
Prior to Modeling
• Formulate the Problem
• Evaluate Possible Data Sources
• Prepare the Data
• Develop Understanding of Modeling
Procedures and Diagnostics
• Explore the Data with Simple Modeling
Techniques
What percent of a model-building
project is data preparation and
data management?
• 25%
• 50%
• 75%
• 85%
Formulate the Problem
• What problem are you trying to solve?
• What results do you expect to see?
• How will you know if the results are
reasonable?
Prepare the Data
• Do quality checks at the level of detail needed
for the project
• Understand how to prepare individual
variables for use in models
• Need to be practical about number of
classification categories models can
handle
• Need to decide on truncation and
bucketing of continuous variables
• Create new variables
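
The truncation and bucketing decision above can be sketched in a few lines. This is only an illustration: the variable (building age), the cap of 100, and the cut points are hypothetical and would in practice come from the data and from business knowledge.

```python
from bisect import bisect_right


def truncate(value, lower, upper):
    """Cap a continuous value to the range [lower, upper]."""
    return max(lower, min(value, upper))


def bucket(value, edges):
    """Return the index of the bucket that value falls in.

    edges are ascending cut points: bucket 0 holds values below
    edges[0], and bucket len(edges) holds values >= edges[-1].
    """
    return bisect_right(edges, value)


# Hypothetical example: building age in years, capped at 100,
# then grouped into four bands.
AGE_EDGES = [10, 25, 50]   # assumed cut points, for illustration only
ages = [3, 10, 37, 250]

capped = [truncate(a, 0, 100) for a in ages]
bands = [bucket(a, AGE_EDGES) for a in capped]
```

Keeping the cut points in one named list makes it easy to try coarser or finer groupings when the model cannot handle many categories.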
Develop Understanding of Modeling
Procedures and Diagnostics
• Basic modeling training – GLM, Data
Mining
• What software is available?
• What software/models work for my data
investigation, modeling problem, etc.?
• What computer capacity do I need?
• Learn how to use software
• Learn how to interpret the diagnostics
Development of a Model
• Analyze historical policy and loss data
– Policy level detail
– Location level detail
• Link policy and loss data with external and/or
internal data:
– Specific business risk data – operational,
financial
– Specific location data – demographic,
weather
– Other data – building, vehicle, agency
• Need link between policy detail and other data
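
As a rough sketch of the linking step, the example below attaches loss totals and location attributes to policy records and reports the keys that fail to match. All field names (`policy_id`, `location_id`, `county`) and records are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical records; field names are assumptions for illustration.
policies = [
    {"policy_id": "P1", "location_id": "L1", "premium": 1000.0},
    {"policy_id": "P2", "location_id": "L2", "premium": 1500.0},
]
losses = [
    {"policy_id": "P1", "incurred": 400.0},
    {"policy_id": "P1", "incurred": 100.0},
    {"policy_id": "P9", "incurred": 250.0},   # no matching policy
]
locations = {"L1": {"county": "A"}}           # external data keyed by location


def link(policies, losses, locations):
    """Attach total losses and location attributes to each policy.

    Also returns the keys that failed to match, so unmatched
    records can be investigated instead of silently dropped.
    """
    loss_by_policy = defaultdict(float)
    for loss in losses:
        loss_by_policy[loss["policy_id"]] += loss["incurred"]

    known = {p["policy_id"] for p in policies}
    orphan_losses = sorted(set(loss_by_policy) - known)

    linked, missing_locations = [], []
    for p in policies:
        row = dict(p, incurred=loss_by_policy.get(p["policy_id"], 0.0))
        extra = locations.get(p["location_id"])
        if extra is None:
            missing_locations.append(p["policy_id"])
        else:
            row.update(extra)
        linked.append(row)
    return linked, orphan_losses, missing_locations


linked, orphan_losses, missing_locations = link(policies, losses, locations)
```

Returning the unmatched keys alongside the linked rows is a design choice: on a large file, the match rate is itself a data quality check.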
Explore the Data with Simple
Modeling Techniques
• Start with sample of data
• Try different classical analyses on the
sample, such as:
– regression
– linear models
– correlation matrices
• Make use of graphical options to
explore data
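
A minimal sketch of the "start with a sample" idea: draw a random sample from a large file and compute a correlation matrix on it. The data here is synthetic and the column names (`exposure`, `premium`) are assumptions; a statistical package would normally do this, and the pure-Python version is only meant to show the steps.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """columns: dict of name -> list of values. Returns a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Synthetic data standing in for a large policy file (illustrative only).
rng = random.Random(0)
full = []
for _ in range(10_000):
    exposure = rng.uniform(1, 100)
    full.append({"exposure": exposure,
                 "premium": 2.0 * exposure + rng.gauss(0, 1)})

# Explore a random sample first, rather than the full file.
sample = rng.sample(full, 500)
columns = {name: [row[name] for row in sample]
           for name in ("exposure", "premium")}
corr = correlation_matrix(columns)
```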
Data Management Issues
• Matching additional internal policy
information to premium/loss data
– Different points in time
– Tracking & balancing audited exposures
• Different summarization keys – handling of
mid-term endorsements
• Address scrubbing
• Matching to external data for correct point in
time
• Significance of missing values within a variable
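
Matching to external data "for correct point in time" can be sketched as an as-of lookup: for each policy, use the latest external snapshot effective on or before the policy effective date. The snapshot dates and the `demographic_band` attribute below are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical external snapshots for one location, sorted by effective date.
snapshots = [
    (date(2004, 1, 1), {"demographic_band": "B"}),
    (date(2005, 1, 1), {"demographic_band": "A"}),
    (date(2006, 1, 1), {"demographic_band": "C"}),
]


def as_of(snapshots, when):
    """Return the latest snapshot effective on or before `when`.

    Returns None when no snapshot is old enough, so the caller can
    treat the value as missing instead of using later information.
    """
    dates = [d for d, _ in snapshots]
    i = bisect_right(dates, when)
    return snapshots[i - 1][1] if i else None


policy_effective = date(2005, 6, 15)
matched = as_of(snapshots, policy_effective)
too_early = as_of(snapshots, date(2003, 7, 1))
```

Returning `None` rather than the nearest later snapshot avoids leaking information that was not available when the policy was written.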
Modeling Activities
• Selection of Predictors – variable
elimination, variable transformation
• Start with classical models prior to
evaluating more complex models
• Methodology Understanding and
Evaluation
• Evaluation of Model Performance
Data Mining Techniques
Balance good fit with explanatory power
• Generalized Linear Models
• Classification Trees
• Regression Trees
• Multivariate Adaptive Regression
Splines
• Neural Networks
Data Mining Process
• Data Linking
• Data Gathering
• Data Cleansing
• Evaluation
• Business Knowledge
• Determine Predictive Variables
• Analyze Variables
• Data Mining
Model Performance
• Lift Curve Analysis
– Score all risks in sample
– Rank risks by score from Bad to
Good
– Compare loss ratio of risks in each
decile to loss ratio for all risks
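
The three lift-curve steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes a higher score means a worse risk, forms deciles with equal counts of risks (equal-premium deciles are another option), and uses made-up data.

```python
def decile_lift(risks, n_bins=10):
    """Loss ratio relativity by decile, worst risks first.

    risks: list of dicts with 'score', 'premium' and 'loss'.
    Assumes a higher score means a worse risk and that there are
    at least n_bins risks, so that no bin is empty.
    """
    # Rank risks by score from bad to good.
    ranked = sorted(risks, key=lambda r: r["score"], reverse=True)
    total_lr = (sum(r["loss"] for r in ranked)
                / sum(r["premium"] for r in ranked))

    n = len(ranked)
    relativities = []
    for k in range(n_bins):
        group = ranked[k * n // n_bins:(k + 1) * n // n_bins]
        lr = (sum(r["loss"] for r in group)
              / sum(r["premium"] for r in group))
        # Compare the decile loss ratio to the loss ratio for all risks.
        relativities.append(lr / total_lr)
    return relativities


# Made-up scored sample: 20 risks with equal premium, where the
# loss grows with the score.
risks = [{"score": s, "premium": 100.0, "loss": 5.0 * s}
         for s in range(1, 21)]
lift = decile_lift(risks)
```

A model with useful lift shows relativities above 1 in the worst deciles and below 1 in the best, falling steadily across the deciles.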
Sample Lift Curve Analysis
[Chart: Relative Loss Ratio Lift, Optimal Model. Y-axis: Loss Ratio Relativity (0.7 to 1.3). X-axis: Decile of Worst to Best Risk (1 to 10). Series: LR Relativity by Decile.]
Business Issues
• Model uses information from a third-party vendor
• Model needs to be accessible
electronically
• Technology Issues
• Implementation Decisions
Technology Issues
• Develop/Modify Systems
• Integrate into underwriting/rating workflow
– Decision process
– Agency system
• Decide on technology
– Web-based interface
– API, FTP, MQ, TCP/IP, HTTPS web services
Implementation of Model
Solution focus/usage:
• Suitability of risk for underwriting
decision
• Source for additional pricing factors
• Consistency in underwriting/pricing
decisions
• Compliance with regulations based on
implementation decision
• Consider model alone or model with
other information available from
application
Implementation of Model
Workflows:
• Underwriting
– New Business
– Renewal Business
• Rating
– Pricing
– Coverage Adjustment
Business Implementation of Model
• Strategic Plan – need management
involvement
• Prepare Announcement/Training Material
for Internal & External Customers
• Coordinate Implementation
• Monitor Feedback/Adjust Implementation
Future Plans
• Determine Process for Updates to
Model
– Use of Updated Data
– Use of New Data Variables
– Use of New Techniques
