Transcript JAMES

Data Quality: Opportunities, Data, and Examples
Data Woes
We are agents of CHANGE!
The Kübler-Ross grief cycle
…roller-coaster ride of activity and passivity as the person wriggles and turns in their desperate efforts to avoid the change.
Better and More Data
– Level of analysis
  • Take a quick look at what data to use and why
  • Linking data from disparate and third-party sources
– Explore data types
– Typical issues & tricks
  • Cross validation and sourcing
  • Reverse look-up
  • GIS layering
  • Backfill from text correlated to codes
– Information from operations
  • Text analytics
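One of the tricks above, backfilling from text correlated to codes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method: records are hypothetical dicts with a free-text `text` field and a coded `code` field, and the `min_support` / `min_share` thresholds are arbitrary. Word-to-code co-occurrence is learned from records that carry both, and a missing code is filled only when the note's informative words agree on a single code; filled values are flagged so they are never mistaken for keyed data.

```python
import re
from collections import Counter, defaultdict

def tokens(text):
    """Lower-cased set of words in a free-text note."""
    return set(re.findall(r"[a-z]+", text.lower()))

def learn_word_code_counts(records):
    """Count how often each word appears alongside each code in coded records."""
    counts = defaultdict(Counter)
    for rec in records:
        if rec.get("code"):
            for word in tokens(rec["text"]):
                counts[word][rec["code"]] += 1
    return counts

def backfill(record, counts, min_support=2, min_share=0.8):
    """Fill a missing code only when the note's informative words agree on one code.

    min_support and min_share are illustrative thresholds (assumptions)."""
    if record.get("code"):
        return record  # never overwrite a keyed value
    votes = Counter()
    for word in tokens(record["text"]):
        seen = counts.get(word)
        if not seen:
            continue
        code, n = seen.most_common(1)[0]
        if n >= min_support and n / sum(seen.values()) >= min_share:
            votes[code] += 1
    filled = dict(record)
    if len(votes) == 1:
        filled["code"] = next(iter(votes))
        filled["code_source"] = "text"  # flag as derived, not keyed
    return filled

# Hypothetical history of coded notes (illustrative only).
HISTORY = [
    {"text": "strained lower back lifting box", "code": "BACK"},
    {"text": "back pain after lifting", "code": "BACK"},
    {"text": "twisted knee on stairs", "code": "KNEE"},
    {"text": "knee swelling after fall", "code": "KNEE"},
]
COUNTS = learn_word_code_counts(HISTORY)
```

A note such as "hurt back while lifting" would be backfilled with `BACK`, while "knee and back pain" is left blank because its words point to two different codes.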
General Organizational Overview
An information business focused on risk taking.
Make. Sell. Serve.
– Sales and Distribution
  • Producer Segmentation
  • Market Planning
  • Revenue Forecasting
  • Cross-sell and Up-sell
  • Retention and Profitability
– Underwriting
  • Risk Selection and Pricing
  • Portfolio Management
  • Premium Adequacy
  • Billing and Collections Management
– Claims
  • Payment Accuracy
  • Claim Collaboration
    > Fraud Detection
    > Subrogation
    > Risk Transfer
    > 3rd Party Deductible
    > Reinsurance Recoverable
Same Problems – Different Lines of Business
• Personal – Auto, HO, Umbrella
• Small Commercial – BOP, CPP
• Middle Market Commercial – CPP w/GL, CP, Crime, CIM, B&M, WC, Auto
• Large Commercial Accounts
• Commercial Auto
• Workers Comp
• Umbrella/Excess
• Specialty Lines – D&O, EPL, E&O, Farm, FI
Data Types and Forms
– Structured data
– Semi-structured data
– Unstructured data
  • Text
  • Spatial
  • Pictographic
  • Graphic
  • Voice
  • Video
Multiple data systems must be pulled together for analysis: a great opportunity for cross-validation and sourcing.
• Vendors/Partners
• Archive, Legacy Systems
• Current System
• Claim Data
  – Medical Data: Bill Review, PPO, Case Management, Paradigm
• External Data
• Policy
  – Multiple Underwriting Systems
• Multiple States
• Billing Systems
• Finance Systems
• CRM Systems, other data
ACTIONS
• Identify data systems
• Get the right data from the right systems
• Overcome internal organizational barriers
• Bridge to legacy systems and archived data
• Augment to create a rich data mining environment
• Expect the need to negotiate for resources
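Cross-validation between systems can start as simply as joining two extracts on a shared key and listing disagreements and orphans. The sketch below assumes hypothetical extracts (lists of dicts) from a policy system and a claim system keyed by a made-up `policy_no` field; it is an illustration of the idea, not a reference implementation.

```python
def cross_validate(source_a, source_b, key, fields):
    """Compare records from two systems that share a key.

    Returns (mismatches, keys only in A, keys only in B), where each mismatch
    is (key, field, value_in_a, value_in_b)."""
    index_a = {rec[key]: rec for rec in source_a}
    index_b = {rec[key]: rec for rec in source_b}
    mismatches = []
    for k in sorted(index_a.keys() & index_b.keys()):
        for field in fields:
            va, vb = index_a[k].get(field), index_b[k].get(field)
            if va != vb:
                mismatches.append((k, field, va, vb))
    only_a = sorted(index_a.keys() - index_b.keys())
    only_b = sorted(index_b.keys() - index_a.keys())
    return mismatches, only_a, only_b

# Hypothetical extracts (illustrative values only).
POLICY = [
    {"policy_no": "P1", "state": "NJ", "insured": "ACME"},
    {"policy_no": "P2", "state": "NY", "insured": "Beta"},
]
CLAIMS = [
    {"policy_no": "P1", "state": "NY", "insured": "ACME"},
    {"policy_no": "P3", "state": "PA", "insured": "Gamma"},
]
MISMATCHES, ONLY_IN_POLICY, ONLY_IN_CLAIMS = cross_validate(
    POLICY, CLAIMS, "policy_no", ["state", "insured"]
)
```

Mismatches point to fields that need a sourcing rule (which system wins), and orphans on either side point to keys that did not survive the bridge between systems.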
Some typical external data sources and vendors
Dun & Bradstreet
Experian
Bureau of Labor Statistics
Market Stance
AM Best
Equifax
US Census
Claritas
Melissa Data
ISO
GIS vendors
U&C Data sets
Code Sets for ICDs and CPTs
…
Data Glitches – historical and on-going
Systemic changes to data (not process-related)
– Changes in data layout / data types
– Changes in scale / format
– Temporary reversion to defaults
– Missing and default values
– Gaps in time series
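Two of these glitches, gaps in a time series and temporary reversion to defaults, can be detected mechanically. A minimal sketch, assuming monthly data keyed by hypothetical `(year, month)` tuples and a made-up default value such as `"99"`; the `min_len` threshold is an arbitrary assumption.

```python
def month_gaps(months):
    """Return the (year, month) tuples missing between the first and last month seen."""
    present = set(months)
    end = max(present)
    year, month = min(present)
    gaps = []
    while (year, month) <= end:
        if (year, month) not in present:
            gaps.append((year, month))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return gaps

def default_runs(values, default, min_len=3):
    """Return (start_index, length) for runs of at least min_len consecutive defaults,
    a common signature of a system temporarily reverting to defaults."""
    runs, start = [], None
    for i, v in enumerate(list(values) + [object()]):  # sentinel closes a trailing run
        if v == default:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs
```

For example, monthly feeds for Nov 2023, Dec 2023, Feb 2024, and Mar 2024 report a single gap at January 2024.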
Process Reasons for poor data entry
Defining Issues – sample
Source Data
1-Define Issues
Constants
Definition Mismatches
Filler Containing Data
Inconsistent Cases
Inconsistent Data Types
Inconsistent Null Rules
Invalid Keys
Invalid Values
Miscellaneous
Missing Values
Orphans
Out of Range
Pattern Exceptions
Potential Constants
Potential Defaults
Potential Duplicates
Potential Invalids
Potential Redundant Values
Potential Unused Fields
Rule Exceptions
Unused Fields
MORE ISSUES…
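Several of the issue types listed above (Missing Values, Constants, Potential Defaults, Out of Range, Pattern Exceptions, Potential Duplicates) can be flagged by a simple column profiler. This is a hedged sketch: the thresholds, the normalization used for duplicates, and the function names are assumptions chosen for illustration, not standard definitions of these issue types.

```python
import re
from collections import Counter

def profile_column(values, pattern=None, valid_range=None, default_share=0.5):
    """Flag a few issue types for one column. Thresholds are illustrative assumptions."""
    issues = []
    non_missing = [v for v in values if v not in (None, "")]
    if len(non_missing) < len(values):
        issues.append(("Missing Values", len(values) - len(non_missing)))
    counts = Counter(non_missing)
    if len(counts) == 1:
        issues.append(("Constants", next(iter(counts))))
    elif counts:
        value, n = counts.most_common(1)[0]
        if n / len(non_missing) >= default_share:
            issues.append(("Potential Defaults", value))
    if valid_range is not None:
        low, high = valid_range
        bad = [v for v in non_missing if not low <= v <= high]
        if bad:
            issues.append(("Out of Range", bad))
    if pattern is not None:
        rx = re.compile(pattern)
        bad = [v for v in non_missing if not rx.fullmatch(str(v))]
        if bad:
            issues.append(("Pattern Exceptions", bad))
    return issues

def potential_duplicates(rows, key_fields):
    """Keys that collide once case and surrounding whitespace are ignored."""
    seen = Counter(tuple(str(r[f]).strip().upper() for f in key_fields) for r in rows)
    return [k for k, n in seen.items() if n > 1]
```

A claimant-age column full of zeros, for instance, surfaces both as a Potential Default and as Out of Range, which is usually the cue to ask how the field was keyed.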
Mapping across sources: Same Fact, Different Terms
Data Element Concept
Name: Country Identifiers
Context:
Definition:
Unique ID: 5769
Conceptual Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others
Algeria, Belgium, China, Denmark, Egypt, France, ..., Zimbabwe
Data Elements
Name:
Context:
Definition:
Unique ID: 4572
Value Domain:
Maintenance Org.:
Steward:
Classification:
Registration Authority:
Others
ISO 3166 English Name | ISO 3166 French Name | ISO 3166 2-Alpha Code | ISO 3166 3-Alpha Code | ISO 3166 3-Numeric Code
Algeria | L'Algérie | DZ | DZA | 012
Belgium | Belgique | BE | BEL | 056
China | Chine | CN | CHN | 156
Denmark | Danemark | DK | DNK | 208
Egypt | Egypte | EG | EGY | 818
France | La France | FR | FRA | 250
... | ... | ... | ... | ...
Zimbabwe | Zimbabwe | ZW | ZWE | 716
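Mapping "same fact, different terms" usually comes down to a crosswalk from every known form of a value to one canonical form. A minimal sketch using only the ISO 3166 rows listed above; choosing the alpha-3 code as the canonical key and the `normalize` rules (upper-case, collapse spaces, zero-pad numeric codes) are assumptions for illustration.

```python
# ISO 3166 rows from the slide: English name, French name, alpha-2, alpha-3, numeric.
ISO_3166_ROWS = [
    ("Algeria", "L'Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]

def normalize(term):
    """Upper-case, collapse whitespace, and zero-pad numeric codes to 3 digits."""
    text = " ".join(str(term).strip().upper().split())
    return text.zfill(3) if text.isdigit() else text

def build_crosswalk(rows):
    """Map every known form of each country to its alpha-3 code (assumed canonical)."""
    crosswalk = {}
    for row in rows:
        for term in row:
            crosswalk[normalize(term)] = row[3]
    return crosswalk

CROSSWALK = build_crosswalk(ISO_3166_ROWS)

def to_alpha3(term):
    """Return the canonical alpha-3 code, or None when the term is unknown."""
    return CROSSWALK.get(normalize(term))
```

With this in place, a source that stores `"dz"`, another that stores the numeric `12`, and a third that stores the French name all resolve to `DZA`; unknown terms return `None` so they can be routed to manual review.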
Data Filling
• Manual
• Statistical Imputation
• Temporal
• Spatial
• Spatial-temporal
Geographic Hierarchy
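Three of these filling approaches can be sketched in a few lines each: statistical imputation (here, the median), temporal filling (linear interpolation between known points in an evenly spaced series), and a geographic hierarchy fallback (use the most specific level with a known value). The level names `zip` / `county` / `state` and the lookup values are hypothetical; this is an illustration, not a recommended imputation policy.

```python
from statistics import median

def fill_statistical(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

def fill_temporal(series):
    """Linearly interpolate interior gaps (None) in an evenly spaced series.
    Leading and trailing gaps are left as None."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            filled[i] = filled[a] + (filled[b] - filled[a]) * (i - a) / (b - a)
    return filled

def fill_geo_hierarchy(record, lookups, levels=("zip", "county", "state")):
    """Return (value, level) from the most specific geographic level with a known value."""
    for level in levels:
        value = lookups[level].get(record.get(level))
        if value is not None:
            return value, level
    return None, None

# Hypothetical lookup tables, e.g. a rate known at each geographic level.
LOOKUPS = {"zip": {"07001": 0.8}, "county": {"Middlesex": 0.6}, "state": {"NJ": 0.5}}
```

Returning the level alongside the value keeps track of how coarse each filled value is, which matters when the filled field later feeds a model.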
Deriving Data = Power
• Totals: Household Income
• Trends: Rate of Medical Bill Increases
• Ratios: Claims/Premium, Target/Median
• Friction: Level of inconvenience, ratio of rental to damage
• Sequences: Lawyer-Doctor, Auto-Life Policy
• Circumstances: Minimal Impact, Severe Trauma
• Temporal: Loss shortly after adding collision
• Spatial: Distance to Service, proximity of stakeholders
• Logged: Progress Notes, Diaries, Who did it, When, “Why”
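A few of these derived variables can be sketched directly: a ratio (claims/premium), a temporal flag (loss shortly after adding collision), and a spatial distance (claimant to service provider, via the haversine great-circle formula). The function names and the 30-day window are hypothetical choices for illustration.

```python
import math
from datetime import date

def loss_ratio(incurred_losses, earned_premium):
    """Ratio: claims / premium (None when premium is zero or missing)."""
    return incurred_losses / earned_premium if earned_premium else None

def loss_soon_after_change(change_date, loss_date, window_days=30):
    """Temporal: True when the loss falls within window_days after a coverage change."""
    days = (loss_date - change_date).days
    return 0 <= days <= window_days

def haversine_miles(lat1, lon1, lat2, lon2):
    """Spatial: great-circle distance in miles between two lat/lon points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Example: collision added on March 1, loss reported on March 12.
FLAG = loss_soon_after_change(date(2024, 3, 1), date(2024, 3, 12))
```

Each function returns a single value per record, so the results can be appended to the record as new columns for modeling.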
Deriving Data = Power (Cont’d)
• Behavioral: Deviation from past usage, spike buying
• Experience Profiles: Vendor, Doctor, Premium Audit
• Channel: How applied, How reported, Service Chain
• Legal Jurisdiction: Venue Disposition, Rules
• Demographics: Working, Weekly wage, lost income
• Firmographics: Industry Class Code Vs Injuries Claimed
• Inflation: Wage, Medical, Goods, Auto, COLA
• Gov’t Statistics: Crime Rate, Employment, Traffic
• Other Stats: Rents, Occupancy, Zoning, Mgd Care
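Two of these, behavioral deviation and inflation adjustment, reduce to short formulas. The sketch below scores a current value against the same entity's own history with a z-score, and restates an amount with a price index (for example a medical or wage index). The choice of z-score and the index values are assumptions for illustration; the slide does not specify a method.

```python
from statistics import mean, stdev

def deviation_score(history, current):
    """Behavioral: z-score of the current value against this entity's own past values.
    Returns None when there is too little history or no variation."""
    if len(history) < 2:
        return None
    sd = stdev(history)
    if sd == 0:
        return None
    return (current - mean(history)) / sd

def to_constant_dollars(amount, index_then, index_now):
    """Inflation: restate an amount from one index period to another."""
    return amount * index_now / index_then
```

For instance, monthly usage of 100, 110, 90, and 100 followed by 140 scores about 4.9 standard deviations above the entity's own norm, which is the kind of spike the "Behavioral" line describes.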
“Search” versus “Discover”
 | Search (goal-oriented) | Discover (opportunistic)
Structured Data | Data Retrieval | Data Mining
Unstructured Data (Text) | Information Retrieval | Text Mining
Searching
Input value: [Jim]
Word replacement lists: Jimmy → JAMES, Jim → JAMES, James → JAMES
Transformed input value: [JAMES]
Returns “Similar Matches”. All records found: Jimmy, Jim, James
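The word-replacement search above can be sketched as: transform both the query and each stored value to a canonical form through a replacement list, then match on the canonical form. The `REPLACEMENTS` table here is a hypothetical, tiny nickname list; real lists would be much larger and maintained as reference data.

```python
# Hypothetical word-replacement list: each variant maps to one canonical form.
REPLACEMENTS = {"JIM": "JAMES", "JIMMY": "JAMES"}

def canonical(name):
    """Upper-case the name and apply the replacement list (unlisted names map to themselves)."""
    key = name.strip().upper()
    return REPLACEMENTS.get(key, key)

def similar_matches(query, records):
    """Return every record whose canonical form equals the query's canonical form."""
    target = canonical(query)
    return [r for r in records if canonical(r) == target]

RECORDS = ["Jimmy", "Jim", "James", "Jane"]
```

Searching for "Jim" then returns Jimmy, Jim, and James, while Jane is excluded because it has no replacement entry pointing to JAMES.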
Motivation for Text Mining
• Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation).
• Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
10%: Structured Numerical or Coded Information
90%: Unstructured or Semi-structured Information
Convergence of Disciplines Example
Techniques for attacking text data:
Rules-based
Statistical Text Analysis and Clustering
Linguistic and Semantic Clustering
Support Vector Machines
Pattern Matching or other statistical algorithms
Neural Networks
Combination of methods from above
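The first technique in this list, a rules-based approach, can be as simple as a table of named patterns applied to each note. The rule names and patterns below are hypothetical examples for illustration; real rule sets would come from claims and SIU subject-matter experts and would need tuning against actual notes.

```python
import re

# Hypothetical rules: flag name -> case-insensitive pattern.
RULES = {
    "attorney_involved": re.compile(r"\b(attorney|lawyer|counsel|letter of representation)\b", re.I),
    "prior_injury": re.compile(r"\bprior (injury|claim|accident)\b", re.I),
    "soft_tissue": re.compile(r"\b(whiplash|sprain|strain)\b", re.I),
}

def flag_note(note):
    """Return the sorted names of all rules whose pattern appears in the note."""
    return sorted(name for name, rx in RULES.items() if rx.search(note))
```

Rules like these are transparent and easy to audit, but brittle; that trade-off is one reason the list above ends with combining methods.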
Text is like a data iceberg
Claims processing – Progress notes and Diaries
CLAIMS ADJUSTER
Service:
• Medical Management Staff
• Special Investigation Unit
• NICB
• Vendor Management
• Consulting Engineers
• Hearing Representative
• Structured Settlement Unit
• Recovery Staff
• Legal Staff

• Home Office Staff
• Field Office Claim Staff
• Insured Risk Manager
• Agent or Broker

• Diary forward – “call Dr Jones next week”
• Business Rule – large loss review
• System Reminder – update case reserves
• Correspondence Tracking – legal letter sent
Semantic processing: Named Entity Extraction
• Identify and type language features
• Examples:
  • People names
  • Company names
  • Geographic location names
  • Dates
  • Monetary amounts
  • Phone numbers, ZIP codes, SSNs, FEINs
  • Others… (domain specific)
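The patterned entity types in this list (dates, monetary amounts, phone numbers, ZIP codes, SSNs, FEINs) can be extracted with regular expressions; people, company, and location names generally need dictionaries or statistical models instead, which this sketch does not attempt. The patterns are simplified, US-centric assumptions (for example, dates only in m/d/yyyy form) and will miss many real-world variants.

```python
import re

# Simplified, US-centric patterns (assumptions for illustration).
PATTERNS = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "FEIN": r"\b\d{2}-\d{7}\b",
    "PHONE": r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b",
    "ZIP": r"\b\d{5}(?:-\d{4})?\b",
    "DATE": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
}

def extract_entities(text):
    """Return (type, value) pairs in the order they appear in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]

NOTE = ("On 3/14/2024 insured (FEIN 12-3456789) at 07001 called from "
        "(973) 555-1234; SSN 123-45-6789; paid $12,500.00.")
```

Typing each match (rather than just finding it) is what lets extracted values be linked back to structured fields, for example checking an SSN in a note against the one on the claim record.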
Feedback to UW
Forklift Hits Ladder
– Ladder in Doorway
  • No Barrier Signs
    > No Policy
– Forklift Couldn’t Stop (Or)
  • Forklift Brakes Defective
    > Brake Maintenance Delayed
      > Lack of Personnel
  • Cooking Oil on Floor
    > Housekeeping Inadequate
      > No Enforcement
  • Forklift Going Too Fast
    > Speed Limits Not Enforced
      > No Enforcement