Beyond_Opportunity Enterprise Miner
Beyond Opportunity:
Enterprise Miner
Ronalda Koster, Data Analyst
Agenda
Introduction
SAS EM at Dalhousie University
Exploring SAS EM
Discussion
Introduction
Teaching Assistant with Dalhousie University
Analyst, Precision BioLogic Inc.
Consultant
Informatics at Dalhousie
Informatics
The study of the application of computer and statistical techniques to the management of information (HGSC glossary)
Dalhousie University
First marketing informatics MBA major in
North America
The first to use SAS EM for teaching purposes
Health Informatics program
New Bachelor of Informatics
Success story
Other courses required for
Informatics major
Multivariate statistics
Direct marketing
Marketing research
Marketing strategy
Database design
Internet marketing
Our students
Work for:
Small consulting companies
Large financial institutions
Not for profit organizations
Telecommunications companies
Insurance companies
Hospitals
Loyalty program companies
Travel companies
Oil and gas industry
Publishing houses
What they have in common: they all work with information
SEMMA Process
Sample
Input, partition and sample data
Explore
View distributions and associations
Modify
Transform data, filter outliers, cluster to derive new variables
Model
Develop models, e.g. decision trees and regression
Assess
Assess models
Business Problem
Have you ever wanted to
understand things that occur
together or in sequence?
Market Basket Analysis: Association Node
Broad applications
Basket data analysis, cross-marketing, catalog design, campaign sales analysis
Web log (click stream) analysis, DNA sequence analysis, etc.
Associations Node
Support: probability that a transaction contains both X and Y
How frequently the combination occurs
Confidence: conditional probability that a transaction containing X also contains Y
Percentage of cases in which Y occurs, given that X has occurred
Sequential Association
Y occurs some time period after X occurs
Associations Node
If a customer purchases avocado, then 80% of the time they will also purchase steak
Avocado: antecedent
Steak: consequent
8,000 transactions
1,000 contain avocado
2,000 contain steak
800 contain both avocado and steak
Confidence = 800 / 1,000 = 80%
Support = 800 / 8,000 = 10%
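
The avocado and steak arithmetic can be reproduced with a short Python sketch. This is not how the Associations node is implemented; the function names `support` and `confidence` and the synthetic transaction list are assumptions made for illustration, built only to match the counts on the slide.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= set(t))
    return both / len(with_antecedent)


# Synthetic data (an assumption) matching the slide's counts:
# 8,000 transactions, 1,000 with avocado, 2,000 with steak, 800 with both.
transactions = (
    [{"avocado", "steak"}] * 800
    + [{"avocado"}] * 200
    + [{"steak"}] * 1200
    + [{"bread"}] * 5800
)

rule_support = support(transactions, {"avocado", "steak"})
rule_confidence = confidence(transactions, {"avocado"}, {"steak"})
print(f"support={rule_support:.0%} confidence={rule_confidence:.0%}")
```

Support is measured against all transactions, while confidence is measured only against those that already contain the antecedent, which is why the same 800 baskets give two different percentages.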
Business Problem
Have you ever wanted to classify or
segment data on the basis of similar
attributes so that each segment or
cluster differs from another and all
objects within a cluster share traits?
Segmentation: Clustering Node
Broad Applications
Demographic / psychographic
segmentation, campaign segmentation
etc.
Clustering Example
Identify similar objects or groups that
are dissimilar from other clusters
through disjoint cluster analysis on
the basis of Euclidean distances
Profile clusters graphically within EM
Use derived segments for further
analysis / algorithms (as an input
variable or a target)
Customize clusters based on
standardization method, clustering
method and clustering criterion
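
As a rough illustration of disjoint clustering on Euclidean distances, the sketch below runs a plain k-means loop in Python. The slides do not say which algorithm or settings the Clustering node uses, so k-means, the z-score standardization, and the naive "first k points" initialization are all assumptions chosen to keep the example short.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(points):
    """Z-score each column so no single variable dominates the distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points]


def k_means(points, k, iterations=100):
    """Assign each point to exactly one of k clusters (disjoint clusters)."""
    centroids = [points[i] for i in range(k)]  # naive initialization (assumption)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Made-up customers: (visits per month, average spend)
customers = [(1, 1), (8, 8), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = k_means(standardize(customers), k=2)
print(labels)
```

The returned labels can then be attached to each record and used as an input variable or target in later modeling steps.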
Business Problem
Have you ever wanted to predict the
likelihood of an event (and assign a
cost to it)?
Decision Tree Node
Broad Applications
Classify observations, predict outcomes
based on decision alternatives
Decision Tree Example
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class
distribution
Handles missing data well
Represents knowledge in the form of IF-THEN
rules
Decision tree generation consists of two phases
Tree construction
At start, all the training examples are at the root
Partition examples recursively based on selected
attributes
Tree pruning
Identify and remove branches that reflect noise or
outliers
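
The construction phase described above can be sketched in a few lines of Python. The slides do not name a splitting criterion, so Gini impurity and equality tests on categorical attributes are assumptions here, as are all function names and the tiny data set; pruning is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, target):
    """Return the (attribute, value) equality test that most reduces the
    weighted Gini impurity, or None if no test improves on the parent."""
    best, best_score = None, gini([r[target] for r in rows])
    for attr in rows[0]:
        if attr == target:
            continue
        for value in sorted({r[attr] for r in rows}):
            yes = [r[target] for r in rows if r[attr] == value]
            no = [r[target] for r in rows if r[attr] != value]
            if not yes or not no:
                continue
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (attr, value), score
    return best


def build_tree(rows, target):
    """Start with all examples at the root and partition them recursively."""
    split = best_split(rows, target)
    if split is None:
        return Counter(r[target] for r in rows)  # leaf: class distribution
    attr, value = split
    yes = [r for r in rows if r[attr] == value]
    no = [r for r in rows if r[attr] != value]
    return {"test": split, "yes": build_tree(yes, target), "no": build_tree(no, target)}


def rules(tree, conditions=()):
    """Flatten the tree into IF-THEN rules, one per leaf."""
    if isinstance(tree, Counter):
        label = tree.most_common(1)[0][0]
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {label}"]
    attr, value = tree["test"]
    return rules(tree["yes"], conditions + (f"{attr} = {value}",)) + rules(
        tree["no"], conditions + (f"{attr} != {value}",)
    )


# Made-up training examples
examples = [
    {"prior_customer": "yes", "region": "east", "bought": "yes"},
    {"prior_customer": "yes", "region": "west", "bought": "yes"},
    {"prior_customer": "no", "region": "east", "bought": "no"},
    {"prior_customer": "no", "region": "west", "bought": "no"},
]
tree_rules = rules(build_tree(examples, "bought"))
print("\n".join(tree_rules))
```

Each internal node holds one test, each branch is an outcome of that test, and each leaf keeps the class distribution of the examples that reached it, mirroring the structure listed on the slide.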
Business Problem
Have you ever wanted to ensure that you
target the people most likely to purchase
from a campaign, even those you have never
contacted previously?
Scoring Node
Broad applications:
Testing model scalability, applying
learning for subsequent events, etc.
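
Scoring means applying an already fitted model to records it has not seen. The Python sketch below assumes a logistic regression whose coefficients were estimated earlier; the coefficient values, field names, and prospects are hypothetical and are not taken from the presentation or from SAS EM.

```python
import math

# Hypothetical coefficients from a previously fitted logistic regression.
COEFFICIENTS = {"intercept": -2.0, "past_purchases": 0.8, "visits_last_month": 0.3}


def score(record):
    """Predicted purchase probability for one record."""
    z = COEFFICIENTS["intercept"]
    for name, weight in COEFFICIENTS.items():
        if name != "intercept":
            z += weight * record[name]
    return 1.0 / (1.0 + math.exp(-z))


def rank_prospects(records):
    """Return record ids ordered from most to least likely to purchase."""
    return [r["id"] for r in sorted(records, key=score, reverse=True)]


# Made-up prospects who were never contacted before
prospects = [
    {"id": "A", "past_purchases": 3, "visits_last_month": 2},
    {"id": "B", "past_purchases": 0, "visits_last_month": 1},
    {"id": "C", "past_purchases": 1, "visits_last_month": 4},
]
ranking = rank_prospects(prospects)
print(ranking)
```

A campaign would then contact prospects from the top of the ranking down, stopping wherever the expected return no longer covers the cost of contact.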
EM Diagram
Lessons learned
Data cleansing and transformation take most of the time
Data analysis done using EM gives interpretable results
Data modeling techniques are very robust
SAS EM works well with huge datasets
Knowledge obtained is transferred easily
Learning never stops: EM reference, tutorial examples
You can analyze almost any kind of data
You can use SAS EM regardless of the industry and the size of the dataset
You need: a good computer, SAS support, and patience
While not all students use SAS in their careers, the analytical principles they learn remain extremely useful
Discussion