CAB Algorithms Presentation


BIWA SIG Wednesday TechCast Series
START TIME: 12 NOON Eastern
Data Mining Made Easy!
Introducing Oracle Data Miner 11g Release 2
New "Workflow" GUI
Charlie Berger - Oracle
AUDIO DIAL-IN NUMBERS
US Toll-Free Number: 866 682 4770
Conference ID: 1683901
Security Code: 334451
International Toll-Free Numbers:
http://www.intercall.com/oracle/access_numbers.htm
Copyright 2010 Oracle Corporation
BIWA SIG Wednesday TechCast Series
• Welcome to BIWA’s 21st TechCast!
• Visit www.oraclebiwa.org for updates
on our future TechCasts
• Future TechCasts will include top-rated
presentations from COLLABORATE 10 – IOUG
Forum’s BIWA Training Days
• Everyone invited
to present!
Oracle BIWA SIG Basics
• Worldwide association of 2000+ professionals interested
in Oracle Database-centric BI, data warehousing, and
analytical products, features and options.
• Membership is still FREE!
• Open forum to foster success in use and development of
Oracle BIWA products.
• BIWA’s goals include sharing best practices and novel
and interesting use cases of Oracle BIWA-centric
technology
• Search the BIWA knowledgebase of past presentations/TechCasts
• See Mission Statement and Charter at oraclebiwa.org.
• National conferences in 2007, 2008, 2010
BIWA Training Days
at COLLABORATE 10 - IOUG Forum
April 18-22, 2010 Las Vegas, NV
• COLLABORATE 10 – IOUG, OAUG, Quest:
5,000 attendees, 200+ Exhibits
• BIWA presented a conference within a conference called
“Get Analytical with BIWA Training Days”
• Hands on Labs, BI Boot Camp, BI Deep Dives, Reception
60+ Sessions with topics covering:
• Data Warehousing:
• Optimizer, Partitioning, ETL
• OBIEE
• Oracle Data Mining
• OLAP
• Essbase
• Data Visualization
• BI Publisher
SUBMITTING a BIWA TechCast
• Any Oracle user or professional may submit
abstracts for 45-min webcasts to IOUG Oracle BIWA SIG
Community (Visit: www.oraclebiwa.org)
• Audience is technical
• Presenters are encouraged to include a significant amount of
technical detail.
• Live demos are strongly encouraged
Today’s Presenter: Charlie Berger
• Senior Director for Data Mining Technologies at Oracle
• Heads product management for Oracle Database's data
mining and predictive analytics technology:
• Oracle Data Mining
• Text mining
• Statistical functions
• Winner of IOUG’s 2010 Ken Jacobs Award for User
Community Contribution
• 20+ years experience in Data Mining and Analytics
in technology leaders including Thinking Machines
and BBN Software
Data Mining Made Easy!
Introducing Oracle Data Miner 11g Release 2 New "Workflow" GUI
Charlie Berger
Sr. Director Product Management, Data Mining Technologies
Oracle Corporation
[email protected]
www.twitter.com/CharlieDataMine
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Oracle Data Miner 11gR2 New
• Workflow
• Multi-attribute
statistics and graphs
• Multiple model build
and evaluation
• SQL Developer
• SQL model deploy
• Ability to save/share
analytical workflows
• Beta release available
by OOW 2010
Oracle Data Miner “Classic”
• Mining Activities
• Wizard-driven
• Univariate statistics
and graphs
• Single model build
and evaluation
• Available now for
10g through 11gR2
Oracle Data Miner 11gR2 Availability
• Customers
• Beta release to coincide with SQL Developer 3.0 release—
OOW 2010
• Download from OTN
• Watch ODM OTN pages, ODM Blog and
Twitter for announcements
• Internal Oracle Employees
• Available to download GUI and use
hosted environment for customer demos
• Contact Product Management, [email protected]
• Available on Amazon Cloud under special arrangement
What is Data Mining?
• Automatically sifts through data to
find hidden patterns, discover new insights,
and make predictions
• Data Mining can provide valuable results:
  • Predict customer behavior (Classification)
  • Predict or estimate a value (Regression)
  • Segment a population (Clustering)
  • Identify factors more associated with a business problem (Attribute Importance)
  • Find profiles of targeted people or items (Decision Trees)
  • Determine important relationships and “market baskets” within the population (Associations)
  • Find fraudulent or “rare events” (Anomaly Detection)
Analytics: Strategic and Mission Critical
• Competing on Analytics, by Tom Davenport
• “Some companies have built their very businesses
on their ability to collect, analyze, and act on data.”
• “Although numerous organizations are embracing analytics, only a
handful have achieved this level of proficiency. But analytics
competitors are the leaders in their varied fields—consumer products,
finance, retail, and travel and entertainment among them.”
• “Organizations are moving beyond query and reporting”
- IDC 2006
• Super Crunchers, by Ian Ayres
• “In the past, one could get by on intuition and experience.
Times have changed. Today, the name of the game is data.”
—Steven D. Levitt, author of Freakonomics
• “Data-mining and statistical analysis have suddenly become
cool.... Dissecting marketing, politics, and even sports, stuff this
complex and important shouldn't be this much fun
to read.” —Wired
• 11 years “stem celling analytics” into Oracle
• Designed advanced analytics into database kernel to leverage relational
database strengths
• Naïve Bayes and Association Rules—1st algorithms added
• Leverages counting, conditional probabilities, and much more
• Now, analytical database platform
• 12 cutting edge machine learning algorithms and 50+ statistical functions
• A data mining model is a schema object in the database, built via a PL/SQL API
and scored via built-in SQL functions.
• When building models, leverage existing scalable technology
• (e.g., parallel execution, bitmap indexes, aggregation techniques) and add new core
database technology (e.g., recursion within the parallel infrastructure, IEEE float, etc.)
• True power of embedding within the database is evident when scoring models
using built-in SQL functions (incl. Exadata)
select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod, 'Y' using *) > 0.8;
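To illustrate the "counting and conditional probabilities" point above: the quantities a Naïve Bayes model needs can be gathered with ordinary SQL aggregation. The query below is only a sketch of that idea, not Oracle Data Mining's internal implementation. It reuses the CUSTOMERS table and REGION column from the scoring query and assumes a hypothetical CHURNED column holding 'Y' or 'N'.

```sql
-- Sketch only: estimate P(region | churned) by counting rows.
-- CHURNED is a hypothetical column ('Y' / 'N'); it is not named in the slides.
select churned,
       region,
       count(*) as n,
       -- share of each region within its churned / not-churned group
       count(*) / sum(count(*)) over (partition by churned) as p_region_given_churned
from customers
group by churned, region
order by churned, region;
```

Because this work is plain grouping and counting, it can presumably use the same parallel execution and aggregation machinery the slide mentions.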
The Forrester Wave™: Predictive Analytics And
Data Mining Solutions, Q1 2010
Oracle Data Mining Cited as a Leader; 2nd place in Current Offering
• Ranks 2nd place in
Current Offering
• “Oracle focuses on in-database mining in the Oracle Database, on
integration of Oracle Data Mining into the kernel of that database, and on
leveraging that technology in Oracle’s branded applications.”
The Forrester Wave is copyrighted by Forrester Research, Inc. Forrester and Forrester Wave are trademarks of Forrester Research, Inc. The Forrester Wave is a
graphical representation of Forrester's call on a market and is plotted using a detailed spreadsheet with exposed scores, weightings, and comments. Forrester
does not endorse any vendor, product, or service depicted in the Forrester Wave. Information is based on best available resources. Opinions reflect judgment at
the time and are subject to change.
In-Database Data Mining
[Figure: Traditional Analytics compared with Oracle Data Mining]
Traditional Analytics (Hours, Days or Weeks)
• Stages shown: Data Extraction, Data Preparation and Transformation, Data Mining Model Building, Data Mining Model “Scoring”, Data Import
• Data path shown: Source Data, SAS Work Area, SAS Processing, Process Output, Target
Oracle Data Mining (Secs, Mins or Hours)
• Stages shown: Data Prep & Transformation, Model Building, Model “Scoring”
• Data remains in the Database
• Embedded data preparation
• Cutting edge machine learning algorithms inside the SQL kernel of Database
• SQL: most powerful language for data preparation and transformation
Savings
• Faster time for “Data” to “Insights”
• Lower TCO: eliminates data movement and data duplication
• Maintains Security
Oracle Data Mining Algorithms
Problem | Algorithm | Applicability
Classification | Logistic Regression (GLM) | Classical statistical technique
Classification | Decision Trees | Popular / Rules / transparency
Classification | Naïve Bayes | Embedded app
Classification | Support Vector Machine | Wide / narrow data / text
Regression | Multiple Regression (GLM) | Classical statistical technique
Regression | Support Vector Machine | Wide / narrow data / text
Anomaly Detection | One Class SVM | Lack examples
Attribute Importance | Minimum Description Length (MDL) | Attribute reduction; Identify useful data; Reduce data noise
Association Rules | Apriori | Market basket analysis; Link analysis
Clustering | Hierarchical K-Means; Hierarchical O-Cluster | Product grouping; Text mining; Gene and protein analysis
Feature Extraction | NMF | Text analysis; Feature reduction
Oracle Data Mining + Exadata
• In 11gR2, SQL predicates and Oracle Data Mining models are
pushed to storage level for execution
For example, find the US customers likely to churn:
select cust_id
from customers
where region = 'US'
-- Scoring function executed in Exadata
and prediction_probability(churnmod, 'Y' using *) > 0.8;
Predictive Analytics Applications
Powered by Oracle Data Mining
CRM OnDemand—Sales Prospector
(Partial List as of March 2010)
Oracle Communications Data Model
Oracle Open World - Schedule Builder
Oracle Retail Data Model
Spend Classification
Example: Simple, Predictive SQL
Select customers who are more than 85% likely to be HIGH VALUE
customers & display their AGE & MORTGAGE_AMOUNT
SELECT * FROM (
  SELECT A.CUST_ID, A.AGE, MORTGAGE_AMOUNT,
         PREDICTION_PROBABILITY(CUST_INSUR_LT46939_DT, 'VERY HIGH' USING A.*) prob
  FROM CBERGER.CUST_INSUR_LTV A)
WHERE prob > 0.85;
Fraud Prediction Demo
-- Clean up any previous run
drop table CLAIMS_SET;
exec dbms_data_mining.drop_model('CLAIMSMODEL');
-- Settings table: choose the SVM algorithm and turn on automatic data preparation
create table CLAIMS_SET (setting_name varchar2(30), setting_value varchar2(4000));
insert into CLAIMS_SET values ('ALGO_NAME','ALGO_SUPPORT_VECTOR_MACHINES');
insert into CLAIMS_SET values ('PREP_AUTO','ON');
commit;
-- A null target with the CLASSIFICATION function builds a one-class SVM (anomaly detection)
begin
  dbms_data_mining.create_model('CLAIMSMODEL', 'CLASSIFICATION',
    'CLAIMS2', 'POLICYNUMBER', null, 'CLAIMS_SET');
end;
/
-- Top 5 most suspicious fraud policy holder claims
-- (in a one-class SVM model, class '0' is the anomalous class)
select * from
  (select POLICYNUMBER, round(prob_fraud*100,2) percent_fraud,
          rank() over (order by prob_fraud desc) rnk from
    (select POLICYNUMBER, prediction_probability(CLAIMSMODEL, '0' using *) prob_fraud
     from CLAIMS2
     where PASTNUMBEROFCLAIMS in ('2 to 4', 'more than 4')))
where rnk <= 5
order by percent_fraud desc;
POLICYNUMBER PERCENT_FRAUD        RNK
------------ ------------- ----------
        6532         64.78          1
        2749         64.17          2
        3440         63.22          3
         654          63.1          4
       12650         62.36          5
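After a build like the one in this demo, it can help to confirm what was actually created. The queries below are a minimal sketch. They assume the 11g data dictionary views USER_MINING_MODELS and USER_MINING_MODEL_SETTINGS are available, and that CLAIMSMODEL was built in the current schema.

```sql
-- Sketch: confirm the model exists and see which algorithm was used.
select model_name, mining_function, algorithm, creation_date
from user_mining_models
where model_name = 'CLAIMSMODEL';

-- Sketch: list the settings in effect. SETTING_TYPE distinguishes values
-- supplied through the settings table from system defaults.
select setting_name, setting_value, setting_type
from user_mining_model_settings
where model_name = 'CLAIMSMODEL'
order by setting_name;
```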
Real-time Prediction
On-the-fly, single record apply with new data (e.g. from call center)
with records as (select
  78000 SALARY,
  250000 MORTGAGE_AMOUNT,
  6 TIME_AS_CUSTOMER,
  12 MONTHLY_CHECKS_WRITTEN,
  55 AGE,
  423 BANK_FUNDS,
  'Married' MARITAL_STATUS,
  'Nurse' PROFESSION,
  'M' SEX,
  4000 CREDIT_CARD_LIMITS,
  2 N_OF_DEPENDENTS,
  1 HOUSE_OWNERSHIP from dual)
select s.prediction prediction, s.probability probability
from (
  select PREDICTION_SET(CUST_INSUR_LT46939_DT, 1 USING *) pset
  from records) t, TABLE(t.pset) s;
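When only the single most likely class and its probability are needed, the same single-record scoring can probably be written without unnesting PREDICTION_SET. This is a sketch that reuses the model and attribute values from the slide. The output aliases PREDICTED_CLASS and PROBABILITY are illustrative names, not part of the original demo.

```sql
-- Sketch: score one ad hoc record with PREDICTION and PREDICTION_PROBABILITY.
with records as (select
  78000 SALARY,
  250000 MORTGAGE_AMOUNT,
  6 TIME_AS_CUSTOMER,
  12 MONTHLY_CHECKS_WRITTEN,
  55 AGE,
  423 BANK_FUNDS,
  'Married' MARITAL_STATUS,
  'Nurse' PROFESSION,
  'M' SEX,
  4000 CREDIT_CARD_LIMITS,
  2 N_OF_DEPENDENTS,
  1 HOUSE_OWNERSHIP from dual)
select PREDICTION(CUST_INSUR_LT46939_DT USING *) predicted_class,
       -- with no class argument, this is the probability of the predicted class
       PREDICTION_PROBABILITY(CUST_INSUR_LT46939_DT USING *) probability
from records;
```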
Integration with Oracle BI EE
Oracle Data Mining results
available to Oracle BI EE
administrators
Oracle BI EE defines
results for end user
presentation
Example
Better Information for OBI EE Reports and Dashboards
ODM’s predictions & probabilities are available in the Database for
reporting using Oracle BI EE and other tools
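One way to make predictions and probabilities available to a reporting tool is to wrap the scoring functions in a view, so the tool sees them as ordinary columns. The following is a sketch, not the integration shown in the presentation. The view name CUST_INSUR_LTV_SCORED and the column aliases are hypothetical; the table and model names come from the earlier predictive SQL example.

```sql
-- Sketch: expose model output as ordinary columns for reporting tools.
-- The view name and aliases are hypothetical.
create or replace view CUST_INSUR_LTV_SCORED as
select A.CUST_ID,
       A.AGE,
       A.MORTGAGE_AMOUNT,
       PREDICTION(CUST_INSUR_LT46939_DT USING A.*) as PREDICTED_LTV,
       PREDICTION_PROBABILITY(CUST_INSUR_LT46939_DT USING A.*) as PREDICTED_LTV_PROB
from CBERGER.CUST_INSUR_LTV A;
```

Because the view is evaluated at query time, reports built on it would reflect the current rows in the underlying table.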
Demo(s)
Getting Started
Oracle By Example—Online Course
Cue Cards—Just-in-time Assistance
Additional Information
• Preview of the new Oracle Data Miner 11g R2 “work flow” New GUI
• Oracle Data Mining 11gR2 presentation at Oracle Open World 2009
• Oracle Data Mining Blog
• Funny YouTube video that features Oracle Data Mining
• Oracle Data Mining on the Amazon Cloud
• Oracle Data Mining 11gR2 data sheet
• Oracle Data Mining 11gR2 white paper
• New TechCast (audio and video recording): ODM overview and several demos
• Fraud and Anomaly Detection using Oracle Data Mining 11g presentation
• Algorithm technical summary with links to Documentation
• Getting Started w/ ODM page w/ instructions to download
• Oracle Data Miner graphical user interface (GUI),
• ODM Step-by-Step Tutorial
• Demo datasets
• ODM Discussion Forum on OTN (great for posting questions/answers)
• ODM 11g Sample Code (examples of ODM SQL and Java APIs applied in several use cases; great for developers)
• Oracle’s 50+ SQL based statistical functions (t-test, ANOVA, Pearson’s, etc.)
Oracle Data Mining
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”