Using ORCL as an Oracle
Going from Zero to 60 with Oracle
Advanced Analytics
- an intro to how YOU can get gold from your data
Audience
I’m addressing
those who:
Haven’t used Oracle Advanced Analytics or similar products
before (RapidMiner, MATLAB, Actian, Weka, etc.)
Want to find out if/how predictive analytics can benefit them
We’ll start at the beginning and point you in the right direction (hopefully)
Image courtesy of marcolm at FreeDigitalPhotos.net
What do I mean by “gold”?
Ore > Refining Process > Gold > Value
Data > Patterns > Models > Value
Img: courtesy Mcstrother at wikimedia commons
Where’s the Value?
Answers!
*Just make sure they’re right…*
Answers to WHAT???
What is the output value given a set of inputs?
What kinds of groups could this data be organized into?
What similarities exist inside this data set?
What is the sentiment
of this sentence?
What inputs to this
model really matter?
What data points in this set
just don’t fit?
When you leave…
Ask yourselves what can YOU do with Advanced Analytics?
Can you start asking some of the types of
questions we’ll ask today?
Can your company benefit from the answers
to those questions?
Try some of these capabilities out in
a sand box:
Use your own enterprise sand box
VirtualBox
apex.oracle.com => this is awesome!
Oracle Advanced Analytics
Available as an extra-cost add-on to the Enterprise Edition
Sorry, ONLY the Enterprise Edition (afaik…)
Integrated with architectures of several Oracle Applications
OBIEE – Generic and customizable integration
HCM Fusion – Employee Performance Predictions
CRM Fusion – Sales Opportunity predictions – what to sell,
when, and for how much
ORCA – Market Basket and Next Offer Analysis
Industry Specific Models – Communications, Airline, etc
The GUI is SQL Developer
We’ll focus on this today
IMG: © Oracle
Getting things running…
Images courtesy of ecee.colorado.edu
…it’s NOT that complicated
Oracle By Example (OBE) has some great
tutorials
Google “Oracle Data Mining 12c OBE
Series” and you’ll find it
These are great “0-60” tutorials that show you
exactly how to get SQL Developer, Oracle
Advanced Analytics, and the Oracle R Extension
up and running.
The steps… (50,000 ft view)
Get your data
Feed it to a model
Tweak it until it’s accurate
Use your model
Does order matter?
Everyone has an opinion…
Lots of Paradigms
KDD (www.kdd.org)
SEMMA (Wikipedia : SEMMA)
Five A’s ( Google SPSS )
CRISP-DM (Wikipedia : Cross Industry Standard Process for Data Mining)
All are similar and contain the phases we’ll talk about today.
Just remember about the NFL…
NFL
No Free Lunch Theorem:
No one algorithm (or defined process) is always better
than another.
“Sometimes one process is better, sometimes it’s not”
“IT DEPENDS”
Common DM Steps
Pre-Process Data
Create a Model
Evaluate Performance
Use the Model
Tune the Model
Steps of Data Mining
Pre-Process
This can be a headache…
Pre-Processing involves
getting your data ready for
analyses.
PL/SQL and SQL can be used to further prepare your
data.
We’ll go over how Oracle Advanced Analytics makes Pre-Processing
fast and easy
Common Pre-Processing tasks
Get your data (It’s closer than you think)
Format it (Use SQL and PL/SQL)
Sample it
Bin it
Normalize it
“Outlier” it
(ok… no more made up words)
Pre-Processing Overview
Pre-Process Demo
It doesn’t have to be a headache anymore!
Sampling - Sometimes you don’t want it all
Binning – Group numbers into Categories
Normalization – Put data on the same scale
Deal with Outliers
Deal with Missing Values
Sample It (who doesn’t love samples?)
Sometimes you don’t want it all
OAA provides several sampling options
Sampling at the
field level
Sample Size can be % or a given #
Sample Types can be Random, Stratified, or Top N
Sampling creates many smaller data sets from a single big one
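The three sample types above can be sketched in a few lines of plain Python. This only illustrates the idea and is not how OAA implements sampling. The `loans` rows, the `status` field, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def random_sample(rows, fraction, seed=0):
    """Random: keep a given fraction of the rows, chosen at random."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

def stratified_sample(rows, key, fraction, seed=0):
    """Stratified: take the same fraction from every group, so rare groups survive."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def top_n(rows, n):
    """Top N: the first n rows in their current order."""
    return rows[:n]

# Hypothetical data: 90 "current" loans and 10 "written_off" loans.
loans = [
    {"id": i, "status": "written_off" if i % 10 == 0 else "current"}
    for i in range(100)
]

by_status = stratified_sample(loans, "status", 0.2)
written_off = [row for row in by_status if row["status"] == "written_off"]
print(len(random_sample(loans, 0.2)), len(by_status), len(written_off))
```

A stratified sample is the one to reach for when the group you care about (here, written-off loans) is rare enough that a purely random sample could miss it.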
Bin it
Binning takes scalar values
(say 0.1 through 99.0)
and groups them into discrete
“bins” or categories
For example: 10 FICO scores (-999, 403, 428, 446, 698, 700, 740, 782, 812, 849)
Bin them into 3 categories, “Yes”, “No”, and “Maybe”:
“No” : -999, 403, 428, 446
“Maybe” : 698, 700, 740
“Yes” : 782, 812, 849
These bins can be used in algorithms (Models) that can’t work on scalar values
Pre-Processing > Bin data into chunks
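A minimal sketch of the FICO binning example in Python. The cut points (650 and 760) are an assumption; they were picked only because they reproduce the three bins shown on the slide, which does not say where the boundaries are.

```python
from bisect import bisect_right

# Assumed cut points (not from the slide): below 650 is "No",
# 650 up to 759 is "Maybe", 760 and above is "Yes".
CUT_POINTS = [650, 760]
LABELS = ["No", "Maybe", "Yes"]

def bin_score(score):
    """Return the bin label for one numeric score."""
    return LABELS[bisect_right(CUT_POINTS, score)]

ficos = [-999, 403, 428, 446, 698, 700, 740, 782, 812, 849]

bins = {}
for score in ficos:
    bins.setdefault(bin_score(score), []).append(score)

for label in LABELS:
    print(label, bins[label])
```

Note that the bad value -999 lands in “No” along with the genuinely low scores, which is one reason to deal with outliers before binning.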
Normalize it
This isn’t your 3NF normalization
Normalization means adjusting values measured on different
scales to a common one (usually 0-1)
Example: two fields called Rate and Amount
Rate has a scale of 1% to 29%
Amount has a scale of -9,999,999 to 9,999,999
A change of .10 in the Rate scale has a bigger impact than a change
of .10 in the Amount scale
OAA has several methods built in (Min Max, Z Score, Linear, and others)
Pre-Processing > Make your data Normal
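To make two of the named methods concrete, here is a small Python sketch of Min Max and Z Score normalization. The sample `rates` and `amounts` values are made up, and the sketch assumes each column has at least two distinct values (otherwise the divisions below fail).

```python
from statistics import mean, pstdev

def min_max(values):
    """Min Max: rescale so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z Score: express each value as standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample values for the two fields.
rates = [1.0, 8.0, 15.0, 29.0]                    # percent
amounts = [-9_999_999, 0, 5_000_000, 9_999_999]

print(min_max(rates))
print(min_max(amounts))
print(z_score(rates))
```

After Min Max scaling, both fields run from 0 to 1, so a model no longer treats Amount as more important just because its raw numbers are bigger.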
Outlier It
OAA will detect outliers for you
You can define outliers in various ways: standard deviations,
percent ranges, or arbitrary value ranges
You can replace outlier values with null, edge values, etc.
Example:
FICO scores usually fall in a
range from “about” 300
to “about” 850, but sometimes they
come in as negatives, 999, or
some (seemingly) randomly
generated very large number.
Pre-Processing > Single out the odd ones
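The outlier options above can be sketched in Python as follows. This is an illustration only, not OAA's implementation. The 300 to 850 range comes from the FICO example; the value 72451 stands in for the "randomly generated very large number" and is invented.

```python
from statistics import mean, pstdev

def null_out_of_range(values, low, high):
    """Arbitrary value range: replace anything outside [low, high] with null (None)."""
    return [v if low <= v <= high else None for v in values]

def clip_to_edges(values, low, high):
    """Arbitrary value range: replace outliers with the nearest edge value."""
    return [min(max(v, low), high) for v in values]

def beyond_n_sigma(values, n_sigma):
    """Standard deviations: flag values more than n_sigma deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) > n_sigma * sigma for v in values]

# 72451 is a made-up example of a garbage score.
scores = [-999, 403, 698, 782, 849, 999, 72451]

print(null_out_of_range(scores, 300, 850))
print(clip_to_edges(scores, 300, 850))
print(beyond_n_sigma([10] * 9 + [100], n_sigma=2))
```

One caution with the standard-deviation definition: a single huge value inflates the standard deviation itself, so for data like FICO scores with a known valid range, a fixed range check is often the simpler choice.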
Automatic Data Preparation
Some algorithms need data
put into certain formats
OAA has options to prepare this
data for you automatically
OAA supports Binning, Normalization, Missing Value
Replacement, etc.
When you test a model or apply it to new data, ADP applies the
same transformations that were used to build it
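The "same transformations" point is worth a small illustration. In the Python sketch below, the preparation settings (fill value, minimum, maximum) are learned once from the training data and then reused unchanged on new data. The `Preparer` class is hypothetical and only mimics the idea behind ADP; it is not an Oracle API.

```python
from statistics import mean

class Preparer:
    """Learns preparation settings from training data, then reuses them as-is."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        self.fill = mean(present)   # missing value replacement
        self.lo = min(present)      # Min Max normalization bounds
        self.hi = max(present)
        return self

    def transform(self, values):
        filled = [self.fill if v is None else v for v in values]
        return [(v - self.lo) / (self.hi - self.lo) for v in filled]

train = [10.0, None, 20.0, 30.0]
new_data = [None, 25.0, 40.0]

prep = Preparer().fit(train)
print(prep.transform(train))
print(prep.transform(new_data))
```

The new value 40.0 scales to 1.5 because it is measured against the training range, not its own. That is intended: the model only understands values on the scale it was built with.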
Create a Model
You have your questions
What kinds of answers do you want?
Answer Type
Generate Grouping or Organization
Discrete Value or Predefined Category
A Number
Free form text details, Comment Sentiment
Anomalies or data sets that aren't normal
Model
Clustering
Classification
Regression
Text Processing
Anomaly Detection
I’ve found 14 different model types
(and sub types) Advanced Analytics
natively has to offer
Model Types
Clustering – Automated Grouping
Feed a Clustering model data and it will sort the records into groups and
tell you:
Which groups exist inside the data you gave it
How the groups differ from each other
Why it put any given data point in the group it did
Once you’ve got a model you like, you can use Advanced Analytics to
assign a new data point to a group
Let’s use this to segment members (Account Holders)
Phases of Data Mining > Creating Models
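To show what "automated grouping" means in practice, here is a bare-bones k-means sketch in Python. K-means is used here only because it is short to write; treat it as an illustration of clustering in general, not as OAA's implementation. The member features (balance in thousands, transactions per week) are hypothetical.

```python
import random

def distance_sq(a, b):
    """Squared straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(point, centroids):
    """Index of the closest group centre: how a new data point gets its group."""
    return min(range(len(centroids)), key=lambda c: distance_sq(point, centroids[c]))

def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points to centres and moving the centres."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [assign(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centre if a group ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical members: (balance in thousands, transactions per week)
members = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

labels, centroids = k_means(members, k=2)
print(labels)
print(assign((1.5, 1.5), centroids))  # group for a brand-new member
```

Nobody told the model what the groups were; it found the low-activity and high-activity members on its own, and `assign` then places a new member in one of them.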
Demo 1 : Member Segmentation
Question: What groups do Members (account holders) fall into?
Demos: Product Suggestions
Classification – Supervised Grouping
Similar to Clustering, but you pick the group(s) you
want.
Predicts one column from your dataset by looking at the
other columns.
Let’s use this to predict loans more likely to be written off
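A toy version of "predict one column by looking at the other columns" can be written as a nearest-neighbour vote in Python. The technique was chosen for brevity and is not necessarily one of the algorithms OAA offers. The two features (debt-to-income ratio and a scaled credit score) and all the rows are invented.

```python
from collections import Counter

def knn_predict(training, query, k=3):
    """Predict the label of `query` by majority vote among its k closest rows."""
    def distance_sq(row):
        features, _label = row
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(training, key=distance_sq)[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical history: ((debt_to_income, scaled_credit_score), outcome)
training = [
    ((0.10, 0.80), "paid"),
    ((0.15, 0.75), "paid"),
    ((0.20, 0.85), "paid"),
    ((0.55, 0.40), "written_off"),
    ((0.60, 0.35), "written_off"),
    ((0.50, 0.45), "written_off"),
]

print(knn_predict(training, (0.58, 0.38)))
print(knn_predict(training, (0.12, 0.78)))
```

The difference from clustering is the outcome column: here the groups ("paid", "written_off") are picked up front, and the model learns which applications look like which group.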
Demo 2: Write Off Classification
Question: Given details from
loan applications, which
loans are more likely to be
written off than others?
Demos: Anomaly Detection
Regression – Predicting X because you know Y
Similar to Classification, but the predicted value is a scalar
(number) value, not a discrete (group) value.
Attempts to find a function that fits data being given to it
Training Data builds the model, Testing Data sees how good
the model really is
Let’s use this to look at a simple Payment Amount Function
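Here is a minimal Python sketch of the training and testing idea, using a straight-line least-squares fit. The loan amounts and payments are made-up numbers, and a real payment model would use more inputs than one.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(slope, intercept, xs, ys):
    """Average size of the miss on data the model has not seen."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: loan amount (thousands) and monthly payment.
train_x, train_y = [5, 10, 15, 20], [150, 250, 350, 450]
test_x, test_y = [12, 18], [295, 405]

slope, intercept = fit_line(train_x, train_y)     # training builds the model
error = mean_abs_error(slope, intercept, test_x, test_y)  # testing scores it
print(slope, intercept, error)
```

The test rows are held back during fitting on purpose: a model always looks good on the data it was built from, so only unseen data tells you how good it really is.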
Classic Regression
[Chart: data points with a linear trendline and two polynomial trendlines; x-axis 0 to 18, y-axis -400 to 1200]
Demos: Member Segmentation
Demo 3 : Payment Amount Model
Question: Given details from loan
applications, what payment amount range
can be expected?
Association Discovery – Correlating “stuff”
A.K.A. Market Basket Analysis/Discovery
Give this model data groups and it will output patterns it detects
Examples:
Amazon : “Items Recommended for You”
Netflix : “Movies you Might Like”
Wal-Mart’s classic (and untrue) finding that
people buy Beer and Diapers on Thursdays
Target’s famous (and true) ability to detect
pregnant women based upon purchases
Let’s use this to build a
“Next Product Suggestion” model
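The core of market basket analysis is counting which items show up together. The Python sketch below computes two standard measures for item pairs: support (how often the pair appears) and confidence (how often B appears given A). It is a simplified illustration rather than OAA's algorithm, and the product baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): confidence} for rules "a => b" that meet min_support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n >= min_support:
            rules[(a, b)] = count / item_counts[a]  # share of a-holders with b
            rules[(b, a)] = count / item_counts[b]  # share of b-holders with a
    return rules

# Hypothetical products held by five members.
baskets = [
    ["checking", "savings"],
    ["checking", "savings", "credit_card"],
    ["checking", "credit_card"],
    ["savings", "auto_loan"],
    ["checking", "savings", "auto_loan"],
]

rules = pair_rules(baskets)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} => {b}: {confidence:.2f}")
```

Read a rule such as `credit_card => checking: 1.00` as "every member here with a credit card also has checking", which is the raw material for a next product suggestion.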
Demo 4 : Product Suggestion
Question: Which products commonly go
together?
Anomaly Detection – Find the oddballs
Ever played “One of these things is not like the others?”
This model type finds data points that are outside the norm
Useful for fraud detection
Sorry no demo for this one: Check out YouTube for one though…
(Search for Oracle Advanced Analytics Anomaly Detection)
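In place of a demo, here is a tiny Python sketch of one way to find the oddball: score each point by its average distance to its nearest neighbours and flag the highest score. This is an assumed, simplified technique, not how OAA does it. The points are invented and assumed to be already normalized. `math.dist` requires Python 3.8 or later.

```python
import math

def knn_scores(points, k=2):
    """Average distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical, already-normalized data points.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (5.0, 5.0)]

scores = knn_scores(points)
oddball = points[scores.index(max(scores))]
print(oddball)
```

Points that sit in a crowd get low scores; the one with no close neighbours stands out, which is the intuition behind using this kind of model for fraud detection.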
Text Analysis – Get the Gist
Text strings can be broken down into Tokens and Themes
Example:
“When I started Oracle, what I wanted to do was to create an
environment where I would enjoy working. That was my primary
goal” – Larry Ellison
[started],[Oracle],[wanted],[create],[environment],[enjoy],[working],[primary],[goal]
[when],[I],[what],[to],[do],[was],[to],[an],[where],[would],[that],[my]
These can be stemmed using dictionary operations
work, worked, working, works => [work]
Let’s use this to get a general sense of satisfaction from surveys
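The token and stemming steps above can be reproduced with a short Python sketch. The stop-word list is taken from the second token line in the example. The suffix-stripping `stem` function is a deliberately naive stand-in for real dictionary-based stemming and will get many English words wrong.

```python
import re

# Low-value words from the example (the second token line above).
STOP_WORDS = {"when", "i", "what", "to", "do", "was", "an",
              "where", "would", "that", "my"}

def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Naive stemming: strip a common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def keywords(text):
    """Tokens with stop words removed, then stemmed."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]

quote = ("When I started Oracle, what I wanted to do was to create an "
         "environment where I would enjoy working. That was my primary goal")

print(keywords(quote))
print([stem(w) for w in ["work", "worked", "working", "works"]])
```

Once comments are reduced to stemmed keywords like these, they become ordinary columns that the other model types can work with.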
Demo 5 : Comment Sentiment
Question: What is the sentiment of
comments given in feedback surveys?
Maintaining your Model
Models become outdated as they “age”
New data should be added and the model
reprocessed
If the data structure changes or new fields
are used, reprocess your model
Phases of Data Mining > Maintain the Model
Questions?
What will your next steps be?
What questions can you ask?
Check out YouTube for some great tutorials!
Oracle Docs:
Oracle® Data Mining Concepts
Credits
All images, where not attributed, courtesy of
istockphoto.com or are otherwise used with permission.
Attributed images are copyright © or trademark (TM) of
their respective owner.
No sponsorship or endorsement shall be implied by the
educational fair use of these images.