Beyond_Opportunity Enterprise Miner


Beyond Opportunity: Enterprise Miner
Ronalda Koster, Data Analyst
Agenda

Introduction

SAS EM at Dalhousie University

Exploring SAS EM

Discussion
Introduction
Teaching Assistant with Dalhousie University
Analyst, Precision BioLogic Inc.
Consultant

Informatics at Dalhousie

Informatics: the study of the application of computer and statistical techniques to the management of information (HGSC glossary)

Dalhousie University
First marketing informatics MBA major in North America
The first to use SAS EM for teaching purposes
Health Informatics program
New Bachelor of Informatics
Success story

Other courses required for Informatics major
Multivariate statistics
Direct marketing
Marketing research
Marketing strategy
Database design
Internet marketing

Our students

Work for:
Small consulting companies
Large financial institutions
Not-for-profit organizations
Telecommunications companies
Insurance companies
Hospitals
Loyalty program companies
Travel companies
Oil and gas industry
Publishing houses
What they have in common: they all work with information
SEMMA Process

Sample: input, partition and sample data
Explore: view distributions and associations
Modify: transform data, filter outliers, cluster to derive new variables
Model: develop models, e.g. decision trees and regression
Assess: assess models
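
The partitioning done in the Sample step can be sketched outside EM as a random split into training, validation and test sets. This is only an illustration: the 60/20/20 proportions, the seed and the function name `partition` are assumptions, not settings taken from the slides or from SAS EM.

```python
import random

def partition(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle the rows and split them into train / validation / test lists."""
    shuffled = rows[:]                      # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],       # whatever remains is the test set
    )

# Hypothetical data: 100 record ids standing in for customer rows.
records = list(range(100))
train_set, valid_set, test_set = partition(records)
print(len(train_set), len(valid_set), len(test_set))
```

Models are then built on the training set, tuned on the validation set, and judged on the test set.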
Business Problem

Have you ever wanted to understand things that occur together or in sequence?

Market Basket Analysis: Association Node
Broad applications
Basket data analysis, cross-marketing, catalog design, campaign sales analysis
Web log (click stream) analysis, DNA sequence analysis, etc.
Associations Node

Support: probability that a transaction contains both X and Y, i.e. how frequently the combination occurs
Confidence: conditional probability that a transaction having X also contains Y, i.e. the percentage of cases in which Y occurs, given that X has occurred
Sequential Association

Y occurs some time period after X occurs
Associations Node

If a customer purchases avocado, then 80% of the time they will also purchase steak
Avocado is the antecedent, steak is the consequent

8,000 transactions
1,000 contain avocado
2,000 contain steak
800 contain both avocado and steak

Confidence = 800 / 1,000 = 80%
Support = 800 / 8,000 = 10%
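
The same arithmetic can be checked with a short sketch that counts baskets directly. The transaction list below is synthetic, built only to reproduce the counts on the slide; the filler item "bread" and the function name `rule_metrics` are assumptions for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n_total                                   # share of all baskets with both items
    confidence = n_both / n_antecedent if n_antecedent else 0.0  # share of antecedent baskets with the consequent
    return support, confidence

# Synthetic baskets: 8,000 in total, 1,000 with avocado, 2,000 with steak, 800 with both.
transactions = (
    [{"avocado", "steak"}] * 800
    + [{"avocado"}] * 200
    + [{"steak"}] * 1200
    + [{"bread"}] * 5800   # hypothetical filler baskets with neither item
)

support, confidence = rule_metrics(transactions, "avocado", "steak")
print(f"support={support:.0%} confidence={confidence:.0%}")
```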
Business Problem

Have you ever wanted to classify or segment data on the basis of similar attributes, so that each segment or cluster differs from the others and all objects within a cluster share traits?

Segmentation: Clustering Node
Broad applications
Demographic / psychographic segmentation, campaign segmentation, etc.
Clustering Example
Identify groups of similar objects that are dissimilar from other clusters, through disjoint cluster analysis on the basis of Euclidean distances
Profile clusters graphically within EM
Use derived segments for further analysis / algorithms (as an input variable or a target)
Customize clusters based on standardization method, clustering method and clustering criterion
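
A minimal sketch of disjoint clustering on Euclidean distances, using a plain k-means loop after standardizing the inputs. This is an illustration of the idea, not the Clustering node's actual algorithm or options; the customer data, the choice of k = 2 and the function names are assumptions.

```python
import math
import random

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, iterations=20, seed=0):
    """Assign every row to exactly one of k clusters by Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)            # start from k randomly chosen rows
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(row, centers[j]))
                  for row in rows]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Hypothetical customers: (age, annual spend).
customers = [[22, 300], [25, 350], [24, 280], [61, 2100], [58, 1900], [65, 2300]]
labels, centers = kmeans(standardize(customers), k=2)
print(labels)
```

The resulting labels can then be carried forward as a derived segment variable, as an input or a target for later models.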

Business Problem
Have you ever wanted to predict the likelihood of an event (and assign a cost to it)?
Decision Tree Node
Broad applications
Classify observations, predict outcomes based on decision alternatives
Decision Tree Example
A flow-chart-like tree structure
Internal node denotes a test on an attribute
Branch represents an outcome of the test
Leaf nodes represent class labels or class distributions
Handles missing data well
Represents the knowledge in the form of IF-THEN rules
Decision tree generation consists of two phases
Tree construction
 At start, all the training examples are at the root
 Partition examples recursively based on selected attributes
Tree pruning
 Identify and remove branches that reflect noise or outliers
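
The construction phase and the IF-THEN reading of a tree can be sketched as follows. Several things here are assumptions rather than anything stated on the slides: the split criterion (Gini impurity), the toy campaign data, and the function names. Pruning and missing-value handling are not implemented; the depth cap is only a crude stand-in for pruning.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, attributes, max_depth=3):
    """Recursively partition the examples, starting with all of them at the root."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, no attributes remain, or the depth cap is hit.
    if len(set(labels)) == 1 or not attributes or max_depth == 0:
        return majority                      # leaf: a class label

    def weighted_impurity(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(attributes, key=weighted_impurity)   # attribute giving the purest split
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels),
                                     remaining, max_depth - 1)
    return (best, branches)                  # internal node: a test on one attribute

def to_rules(tree, conditions=()):
    """Flatten the tree into IF-THEN rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {tree}"]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += to_rules(subtree, conditions + (f"{attr} = {value}",))
    return rules

# Hypothetical campaign responses.
rows = [
    {"prior_customer": "yes", "region": "urban"},
    {"prior_customer": "yes", "region": "rural"},
    {"prior_customer": "no", "region": "urban"},
    {"prior_customer": "no", "region": "rural"},
    {"prior_customer": "no", "region": "urban"},
    {"prior_customer": "yes", "region": "urban"},
]
labels = ["buy", "buy", "no_buy", "no_buy", "no_buy", "buy"]

tree = build_tree(rows, labels, ["prior_customer", "region"])
for rule in to_rules(tree):
    print(rule)
```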
Business Problem
Have you ever wanted to ensure that you target those most likely to purchase from a campaign, among people you have never contacted previously?
Scoring Node
Broad applications
Testing model scalability, applying learning to subsequent events, etc.
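
Scoring means applying an already fitted model to new records. A minimal sketch, assuming a logistic regression was fitted earlier on past campaign data: the coefficients, variable names and prospects below are all hypothetical and serve only to show prospects being ranked by predicted purchase probability.

```python
import math

# Hypothetical coefficients, standing in for a model fitted on past campaign data.
INTERCEPT = -2.0
WEIGHTS = {"visits_last_month": 0.5, "is_loyalty_member": 1.2}

def score(prospect):
    """Predicted purchase probability from the logistic model above."""
    z = INTERCEPT + sum(w * prospect[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical prospects who have never been contacted before.
prospects = [
    {"id": "A", "visits_last_month": 0, "is_loyalty_member": 0},
    {"id": "B", "visits_last_month": 4, "is_loyalty_member": 1},
    {"id": "C", "visits_last_month": 2, "is_loyalty_member": 0},
]

# Rank prospects from most to least likely to purchase.
ranked = sorted(prospects, key=score, reverse=True)
for p in ranked:
    print(p["id"], round(score(p), 3))
```

A campaign would then contact prospects from the top of the ranking down, stopping where the expected return no longer covers the cost of contact.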
EM Diagram
Lessons learned
Data cleansing and transformation take most of the time
Data analysis done using EM gives interpretable results
Data modeling techniques are very robust
SAS EM works well with huge datasets
Knowledge obtained is transferred easily
Learning never stops: EM reference, tutorial examples
You can analyze almost any kind of data
You can use SAS EM regardless of the industry and size of dataset
You need: a good computer, SAS support, and patience
While not all students use SAS in their careers, the analytical principles they learn are extremely useful for their careers
Discussion