Using ORCL as an Oracle


Going from Zero to 60 with Oracle
Advanced Analytics
- an intro to how YOU can get gold from your data
Audience
 I’m addressing
those who:
 Haven’t used Oracle Advanced Analytics or similar products
before (RapidMiner, MATLAB, Actian, Weka, etc.)
 Want to find out if/how predictive analytics can benefit them
We’ll start at the beginning and point you in the right direction (hopefully)
Image courtesy of marcolm at FreeDigitalPhotos.net
What do I mean by “gold”?
 Ore > Refining Process > Gold > Value
 Data > Patterns > Models > Value
Img: courtesy Mcstrother at wikimedia commons
Where’s the Value?
Answers!
*Just make sure they’re right…*
Answers to WHAT???
 What is the output value given a set of inputs?
 What kinds of groups could this data be organized into?
 What similarities exist inside this data set?
 What is the sentiment
of this sentence?
 What inputs to this
model really matter?
 What data points in this set
just don’t fit?
When you leave…
 Ask yourselves: what can YOU do with Advanced Analytics?
 Can you start asking some of the types of
questions we’ll ask today?
 Can your company benefit from the answers
to those questions?
 Try some of these capabilities out in
a sandbox:
 Use your own enterprise sandbox
 VirtualBox
 apex.oracle.com => this is awesome!
Oracle Advanced Analytics
 Available as an extra-cost add-on to the Enterprise Edition
 Sorry, ONLY the Enterprise Edition (afaik…)
 Integrated with architectures of several Oracle Applications
 OBIEE – Generic and customizable integration
 HCM Fusion – Employee Performance Predictions
 CRM Fusion – Sales Opportunity predictions – what to sell,
when, and for how much
 ORCA – Market Basket and Next Offer Analysis
 Industry Specific Models – Communications, Airline, etc
 The GUI is SQL Developer
We’ll focus on this today
IMG: © Oracle
Getting things running…
Images courtesy of ecee.colorado.edu
…it’s NOT that complicated
 Oracle By Example (OBE) has some great
tutorials
 Google “Oracle Data Mining 12c OBE
Series” and you’ll find it
 These are great “0-60” tutorials that show you
exactly how to get SQL Developer, Oracle
Advanced Analytics, and the Oracle R Extension
up and running.
The steps… (50,000 ft view)
Get your data
Feed it to a model
Tweak it until it’s accurate
Use your model
Does order matter?
 Everyone has an opinion…
 Lots of Paradigms
 KDD (www.kdd.org)
 SEMMA (Wikipedia : SEMMA)
 Five A’s ( Google SPSS )
 CRISP-DM (Wikipedia : Cross Industry Standard Process for Data Mining)
All are similar and contain the phases we’ll talk about today.
Just remember about the NFL…
NFL
 No Free Lunch Theorem:
No one algorithm (or defined process) is always better
than another.
“Sometimes one process is better, sometimes it’s not”
“IT DEPENDS”
Common DM Steps
Pre-Process Data
Create a Model
Evaluate Performance
Use the Model
Tune the Model
Steps of Data Mining
Pre-Process
 This can be a headache…
 Pre-Processing involves
getting your data ready for
analysis.
 PL/SQL and SQL can be used to further prepare your
data.
We’ll go over how Oracle Advanced Analytics makes Pre-Processing
fast and easy
Common Pre-Processing tasks
 Get your data (It’s closer than you think)
 Format it (Use SQL and PL/SQL)
 Sample it
 Bin it
 Normalize it
 “Outlier” it
(ok… no more made up words)
Pre-Processing Overview
Pre-Process Demo
 It doesn’t have to be a headache anymore!
 Sampling - Sometimes you don’t want it all
 Binning – Group numbers into Categories
 Normalization – Put data on the same scale
 Deal with Outliers
 Deal with Missing Values
Sample It (who doesn’t love samples?)
 Sometimes you don’t want it all
 OAA provides several sampling options
Sampling at the
field level
 Sample Size can be % or a given #
 Sample Types can be Random, Stratified, or Top N
 Sampling creates many smaller data sets from a single big one
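To make the three sample types concrete, here is a minimal Python sketch. It is not how OAA does it internally; the loan records, field names, and helper functions are all made up for illustration.

```python
import random
from collections import defaultdict

def random_sample(rows, n, seed=0):
    """Pick n rows uniformly at random, without replacement."""
    return random.Random(seed).sample(rows, n)

def top_n(rows, n, key):
    """Keep the n rows with the largest key value."""
    return sorted(rows, key=key, reverse=True)[:n]

def stratified_sample(rows, fraction, strata_key, seed=0):
    """Sample the same fraction from every group so small groups survive."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[strata_key(row)].append(row)
    picked = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, k))
    return picked

# Hypothetical data: 90 "good" loans and 10 "written_off" loans.
loans = [
    {"id": i, "status": "written_off" if i % 10 == 0 else "good", "amount": 1000 + i}
    for i in range(100)
]

sample = stratified_sample(loans, 0.2, lambda r: r["status"])
biggest = top_n(loans, 3, key=lambda r: r["amount"])
```

A plain random sample of 20 loans could easily miss most of the write-offs; the stratified version keeps 20% of each status, which is why it is often preferred when one group is rare.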
Bin it
 Binning takes scalar values
(say 0.1 through 99.0)
and groups them into discrete
“bins” or categories
 For example: 10 FICO scores (-999, 403, 428, 446, 698, 700, 740, 782, 812, 849)
 Bin them into 3 categories, “Yes”, “No”, and “Maybe”:
 “No”: -999, 403, 428, 446
 “Maybe”: 698, 700, 740
 “Yes”: 782, 812, 849
 These bins can be used in algorithms (Models) that can’t work on scalar values
Pre-Processing > Bin data into chunks
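The FICO example above can be sketched in a few lines of Python. The slide does not say where the bin edges are, so the cut points of 600 and 750 below are assumptions chosen only because they reproduce the three groups shown.

```python
def bin_fico(score, maybe_from=600, yes_from=750):
    """Map a numeric score to a category. Cut points are assumed, not OAA defaults."""
    if score >= yes_from:
        return "Yes"
    if score >= maybe_from:
        return "Maybe"
    return "No"

scores = [-999, 403, 428, 446, 698, 700, 740, 782, 812, 849]

# Group the scores by the bin they land in.
bins = {}
for s in scores:
    bins.setdefault(bin_fico(s), []).append(s)
```

Note that -999 lands in “No” here even though it is probably a bad value rather than a bad score; that is a job for outlier handling, not binning.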
Normalize it
 This isn’t your 3NF normalization
 Normalization means adjusting values measured on different
scales to a common one (usually 0-1)
 Example: two fields called Rate and Amount
 Rate has a scale of 1% to 29%
 Amount has a scale of -9,999,999 to 9,999,999
A change of .10 in the Rate scale has a bigger impact than a change
of .10 in the Amount scale
OAA has several methods built in (Min Max, Z Score, Linear, and others)
Pre-Processing > Make your data Normal
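As a rough illustration of two of the methods named above, here is a Python sketch of Min Max and Z Score scaling. The sample Rate and Amount values are invented; only the ranges come from the slide.

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale so the smallest value becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to scale")
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Rescale to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; nothing to scale")
    return [(v - mu) / sigma for v in values]

# Hypothetical values within the ranges from the slide.
rates = [1.0, 8.0, 15.0, 29.0]
amounts = [-9_999_999, 0, 5_000_000, 9_999_999]

scaled_rates = min_max(rates)
scaled_amounts = min_max(amounts)
```

After scaling, both fields run from 0 to 1, so a model no longer treats Amount as more important just because its raw numbers are bigger.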
Outlier It
 OAA will detect outliers for you
 You can define outliers in several ways: standard deviations,
percent ranges, or arbitrary value ranges
 You can replace outlier values with null, edge values, etc.
 Example:
FICO scores usually fall between “about” 300 and “about” 850,
but sometimes they come in as negatives, 999, or some
(seemingly) randomly generated very large number.
Pre-Processing > Single out the odd ones
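Here is a small Python sketch of the “arbitrary value range” approach applied to the FICO example, with the two replacement options mentioned above (null or edge values). The 300 to 850 range comes from the slide; the function name and sample values are made up.

```python
def treat_outliers(scores, low=300, high=850, mode="null"):
    """Replace values outside [low, high] with None ("null") or the nearest edge ("edge")."""
    if mode not in ("null", "edge"):
        raise ValueError("mode must be 'null' or 'edge'")
    cleaned = []
    for s in scores:
        if low <= s <= high:
            cleaned.append(s)
        elif mode == "null":
            cleaned.append(None)
        else:
            cleaned.append(low if s < low else high)
    return cleaned

# Hypothetical raw scores, including the kinds of junk values described above.
raw = [-999, 403, 698, 849, 999, 1234567]

as_nulls = treat_outliers(raw, mode="null")   # [None, 403, 698, 849, None, None]
as_edges = treat_outliers(raw, mode="edge")   # [300, 403, 698, 849, 850, 850]
```

Which option is better depends on the model: nulls say “we don't know”, while edge values quietly pretend the score was the lowest or highest legal one.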
Automatic Data Preparation
 Some algorithms need data
put into certain formats
 OAA has options to prepare this
data for you automatically
 OAA supports Binning, Normalization, Missing Value
Replacement, etc.
 When you test a model or apply it to new data, ADP applies the
same transformations that were used to build it
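The “same transformations” point is easy to get wrong by hand, so here is a hedged Python sketch of the idea. This is not ADP itself; the class name and values are hypothetical. The fill value and the scaling range are learned once from the training data and then reused on anything new.

```python
from statistics import mean

class SimplePrep:
    """Learns fill and scaling values once, then reuses them on any later data."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        self.fill = mean(present)                      # missing-value replacement
        self.lo, self.hi = min(present), max(present)  # min-max range
        return self

    def transform(self, values):
        span = self.hi - self.lo
        filled = [self.fill if v is None else v for v in values]
        return [(v - self.lo) / span for v in filled]

# Hypothetical training rates, with one missing value.
train_rates = [1.0, None, 15.0, 29.0]
prep = SimplePrep().fit(train_rates)

# New data is scaled with the training range, not its own min and max.
new_rates = [None, 15.0, 29.0]
scaled_new = prep.transform(new_rates)
```

If the new data were rescaled with its own minimum and maximum, a rate of 15 would become 0 instead of 0.5, and the model would see a different number than the one it was trained on.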
Create a Model
 You have your questions
 What kinds of answers do you want?
Answer Type                                  Model
Generate Grouping or Organization            Clustering
Discrete Value or Predefined Category        Classification
A Number                                     Regression
Free form text details, Comment Sentiment    Text Processing
Anomalies or data sets that aren't normal    Anomaly Detection
I’ve found 14 different model types
(and subtypes) that Advanced Analytics
offers natively
Model Types
Clustering – Automated Grouping
 Feed a Clustering model data and it will sort the records into groups and
tell you:
 Which groups exist inside the data you gave it
 How the groups differ from each other
 Why it put any given data point in the group it did
 Once you’ve got a model you like, you can use Advanced Analytics to
assign a new data point to a group
 Let’s use this to segment members (Account Holders)
Phases of Data Mining > Creating Models
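To show what “automated grouping” means mechanically, here is a bare-bones k-means sketch in Python. It is a generic textbook algorithm used as a stand-in, not a claim about which clustering algorithm OAA runs. The member data (age, balance in thousands) is invented, and starting from the first k points is a simplification.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))

def mean_point(points):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the middle of its cluster (keep it if the cluster is empty).
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

# Hypothetical members: (age, balance in thousands).
members = [(22, 1.0), (25, 2.0), (24, 1.5), (60, 50.0), (65, 55.0), (62, 52.0)]

centroids = kmeans(members, k=2)
labels = [nearest(m, centroids) for m in members]

# Assign a brand-new member to one of the existing groups.
new_member_group = nearest((58, 48.0), centroids)
```

On real data you would normalize the columns first; otherwise the field with the biggest numbers dominates the distance calculation.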
Demo 1 : Member Segmentation
 Question: What groups do Members (account holders) fall into?
Demos: Product Suggestions
Classification – Supervised Grouping
 Similar to Clustering, but you pick the group(s) you
want.
 Predicts one column from your dataset by looking at the
other columns.
 Let’s use this to predict loans more likely to be written off
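Here is a minimal sketch of “predict one column from the others”, using a k-nearest-neighbors vote in Python. This is a simple stand-in chosen for readability, not one of OAA's classification algorithms. The feature names (a scaled credit score and a debt-to-income ratio) and all the rows are hypothetical.

```python
from collections import Counter

def predict(training, features, k=3):
    """Label a new row by majority vote among its k closest training rows."""
    by_distance = sorted(
        training,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical loans: (scaled credit score, debt-to-income ratio) -> outcome.
training = [
    ((0.90, 0.10), "paid"),
    ((0.80, 0.20), "paid"),
    ((0.85, 0.15), "paid"),
    ((0.20, 0.80), "written_off"),
    ((0.30, 0.70), "written_off"),
    ((0.25, 0.90), "written_off"),
]

risky = predict(training, (0.28, 0.75))
safe = predict(training, (0.88, 0.12))
```

The “supervised” part is the outcome column: unlike clustering, the groups (“paid” and “written_off”) are chosen up front and the model learns to reproduce them.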
Demo 2: Write Off Classification
 Question: Given details from
loan applications, which
loans are more likely to be
written off than others?
Demos: Anomaly Detection
Regression – Predicting X because you know Y
 Similar to Classification, but the predicted value is a scalar
(number) value, not a discrete (group) value.
 Attempts to find a function that fits data being given to it
 Training Data builds the model, Testing Data sees how good
the model really is
 Let’s use this to look at a simple Payment Amount Function
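The “find a function that fits the data” idea, plus the training and testing split, can be sketched with a straight-line least-squares fit in Python. The loan amounts (in thousands) and payment figures are invented and deliberately lie on a perfect line so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_abs_error(slope, intercept, xs, ys):
    """Average size of the prediction miss on a data set."""
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data builds the model...
train_amounts, train_payments = [5, 10, 15, 20], [110, 210, 310, 410]
# ...and held-back testing data shows how good it really is.
test_amounts, test_payments = [25, 30], [510, 610]

slope, intercept = fit_line(train_amounts, train_payments)
error = mean_abs_error(slope, intercept, test_amounts, test_payments)
```

Real data is never this tidy; the point of scoring on the test rows is that a model can look great on the data it was built from and still miss badly on data it has not seen.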
Classic Regression
[Chart: data points with one linear and two polynomial trend lines; x-axis 0 to 18, y-axis -400 to 1200]
Demos: Member Segmentation
Demo 3 : Payment Amount Model
 Question: Given details from loan
applications, what payment amount range
can be expected?
Association Discovery – Correlating “stuff”
 A.K.A. Market Basket Analysis/Discovery
 Give this model groups of items (for example, shopping baskets) and it will output the patterns it detects
Examples:
 Amazon : “Items Recommended for You”
 Netflix : “Movies you Might Like”
 Wal-Mart’s classic (and untrue) finding that
people buy Beer and Diapers on Thursdays
 Target’s famous (and true) ability to detect
pregnant women based upon purchases
 Let’s use this to build a
“Next Product Suggestion” model
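A “Next Product Suggestion” model boils down to counting which items show up together. The Python sketch below computes support and confidence for item pairs and picks the best follow-up product. The baskets and product names are made up, and real association algorithms handle larger item sets and prune far more cleverly than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the products each member holds.
baskets = [
    {"checking", "debit card", "savings"},
    {"checking", "debit card"},
    {"checking", "savings", "auto loan"},
    {"savings", "auto loan"},
    {"checking", "debit card", "credit card"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    frozenset(pair) for b in baskets for pair in combinations(sorted(b), 2)
)

def rule(antecedent, consequent):
    """Support and confidence for the rule 'antecedent -> consequent'."""
    both = pair_counts[frozenset((antecedent, consequent))]
    support = both / len(baskets)               # how often the pair occurs at all
    confidence = both / item_counts[antecedent]  # how often consequent follows antecedent
    return support, confidence

def suggest(item):
    """Suggest the product most often held alongside the given one."""
    others = [o for o in item_counts if o != item]
    return max(others, key=lambda o: rule(item, o)[1])

next_for_debit = suggest("debit card")
next_for_auto = suggest("auto loan")
```

Notice that the rule is not symmetric: everyone with a debit card has checking, but only three of the four checking customers have a debit card.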
Demo 4 : Product Suggestion
 Question: Which products commonly go
together?
Anomaly Detection – Find the oddballs
 Ever played “One of these things is not like the others?”
 This model type finds data points that are outside the norm
 Useful for fraud detection
 Sorry no demo for this one: Check out YouTube for one though…
(Search for Oracle Advanced Analytics Anomaly Detection)
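Since there is no demo, here is a tiny Python sketch of the general idea: flag values that sit far from the rest. It uses a plain z-score test as a stand-in and is not the algorithm OAA uses for anomaly detection. The transaction amounts and the threshold of 2 standard deviations are assumptions.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transactions; one of these things is not like the others.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]

anomalies = find_anomalies(amounts)
```

A single-column test like this is the simplest case; for fraud detection the interesting oddballs are usually odd across several columns at once, which is where a proper model earns its keep.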
Text Analysis – Get the Gist
 Text strings can be broken down into Tokens and Themes
 Example:
“When I started Oracle, what I wanted to do was to create an
environment where I would enjoy working. That was my primary
goal” – Larry Ellison
[started],[Oracle],[wanted],[create],[environment],[enjoy],[working],[primary],[goal]
[when],[I],[what],[to],[do],[was],[to],[an],[where],[would],[that],[my]
 These can be stemmed using dictionary operations
work, worked, working, works => [work]
 Let’s use this to get a general satisfaction level from surveys
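The tokenizing and stemming steps above can be approximated in a few lines of Python. The stop-word list is just the second token list from the example, and the suffix-stripping “stemmer” is a deliberately naive assumption; a real dictionary-based stemmer is much smarter.

```python
import re

# Words to throw away, taken from the example above.
STOP_WORDS = {"when", "i", "what", "to", "do", "was", "an", "where", "would", "that", "my"}

def tokenize(text):
    """Split text into words, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", text)

def content_tokens(text):
    """Keep only the tokens that are not stop words."""
    return [t for t in tokenize(text) if t.lower() not in STOP_WORDS]

def stem(word):
    """Naive stemmer: strip a few common English suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

quote = (
    "When I started Oracle, what I wanted to do was to create an "
    "environment where I would enjoy working. That was my primary goal"
)

tokens = content_tokens(quote)
stems = {stem(w) for w in ["work", "worked", "working", "works"]}
```

Run against the quote, `tokens` matches the first bracketed list above, and the four forms of “work” collapse to a single stem.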
Demo 5 : Comment Sentiment
 Question: What is the sentiment of
comments given in feedback surveys?
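For a feel of what a sentiment answer looks like, here is a toy word-counting scorer in Python. It is only a sketch: the positive and negative word lists are invented, and a trained text model weighs words and themes rather than simply counting them.

```python
import re

# Hypothetical word lists; a real model would learn these from labeled comments.
POSITIVE = {"great", "helpful", "friendly", "fast", "love"}
NEGATIVE = {"slow", "rude", "confusing", "terrible", "hate"}

def sentiment(comment):
    """Score a comment by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical survey comments.
comments = [
    "The teller was friendly and fast",
    "Online banking is slow and confusing",
    "I opened an account",
]
results = [sentiment(c) for c in comments]
```

Word counting falls over on things like “not helpful” or sarcasm, which is exactly why you would hand this job to a model instead.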
Maintaining your Model
 Models become outdated as they “age”
 New data should be added and the model
reprocessed
 If the data structure changes or new fields
are used, reprocess your model
Phases of Data Mining > Maintain the Model
Questions?
 What will your next steps be?
 What questions can you ask?
 Check out YouTube for some great tutorials!
 Oracle Docs:
 Oracle® Data Mining Concepts
Credits
 All images, where not attributed, courtesy of
istockphoto.com or are otherwise used with permission.
 Attributed images are copyright © or trademark (TM) of
their respective owner.
 No sponsorship or endorsement shall be implied by the
educational fair use of these images.