Introduction to Pattern Discovery

Chapter 8: Introduction to Pattern Discovery
8.1 Introduction
8.2 Cluster Analysis
8.3 Market Basket Analysis (Self-Study)

8.1 Introduction
Pattern Discovery
The Essence of Data Mining?
“…the discovery of interesting, unexpected, or valuable structures in large data sets.”
– David Hand
“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”
– Herb Edelstein
Pattern Discovery Caution
- Poor data quality
- Opportunity
- Interventions
- Separability
- Obviousness
- Non-stationarity
Pattern Discovery Applications
- Data reduction
- Novelty detection
- Profiling
- Market basket analysis
- Sequence analysis

8.2 Cluster Analysis
Unsupervised Classification
[Figure: cases grouped by their input values into cluster 1, cluster 2, and cluster 3]
Unsupervised classification: grouping of cases based on similarities in input values.
k-means Clustering Algorithm
[Figure: Training Data]
1. Select inputs.
2. Select k cluster centers.
3. Assign each case to the closest center.
4. Update each cluster center to the mean of the cases assigned to it.
5. Reassign cases.
6. Repeat steps 4 and 5 until convergence (no case changes clusters).
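The six steps above can be sketched in plain Python. This is a minimal illustration of the standard k-means iteration, not the algorithm used internally by the SAS Enterprise Miner Cluster tool. The helper names `k_means` and `closest_center`, the use of the first k cases as initial centers, Euclidean distance, and the toy points are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def closest_center(case: Point, centers: Sequence[Point]) -> int:
    """Return the index of the center nearest to `case` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(case, centers[j]))


def k_means(cases: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper. Step 1 (select inputs) is assumed done:
    each case is already a tuple of numeric inputs."""
    # Step 2: select k cluster centers (here simply the first k cases, an assumption).
    centers: List[Point] = [tuple(c) for c in cases[:k]]
    # Step 3: assign each case to its closest center.
    labels = [closest_center(c, centers) for c in cases]
    for _ in range(max_iter):
        # Step 4: move each center to the mean of the cases assigned to it.
        new_centers: List[Point] = []
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(vals) / len(members) for vals in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        centers = new_centers
        # Step 5: reassign cases to the updated centers.
        new_labels = [closest_center(c, centers) for c in cases]
        # Step 6: stop at convergence, when no case changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 9.5), (8.5, 8.0)]
centers, labels = k_means(points, k=2)
print(labels)   # [0, 0, 1, 0, 1, 1]
print(centers)
```

With these toy points the loop converges after a few passes: the three points near (1, 1) form one cluster and the three points near (8.5, 8.5) form the other. Practical implementations differ mostly in how they choose the initial centers, and that choice can change the final clusters.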
Segmentation Analysis
[Figure: Training Data]
When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.
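To see what partitioning data with no natural clusters looks like, the sketch below runs a one-dimensional k-means on evenly spaced values. It is a hypothetical illustration, not Enterprise Miner output; the helper `k_means_1d`, the twelve values, and the starting centers 0, 5, and 11 are assumptions. The algorithm still returns three groups, and each group is a contiguous run of values.

```python
def k_means_1d(values, centers, max_iter=100):
    """Hypothetical 1-D k-means; returns the final groups as lists of values."""
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        # Assign each value to its nearest center (ties go to the lower-index center).
        new_groups = [[] for _ in centers]
        for v in values:
            j = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            new_groups[j].append(v)
        if new_groups == groups:
            break  # no value changed groups: converged
        groups = new_groups
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


values = list(range(12))  # evenly spread cases: no natural clusters
groups = k_means_1d(values, centers=[0, 5, 11])
print(groups)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```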
Demographic Segmentation Demonstration
Analysis goal:
Group geographic regions into segments based on income, household size, and population density.
Analysis plan:
- Select and transform segmentation inputs.
- Select the number of segments to create.
- Create segments with the Cluster tool.
- Interpret the segments.
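Because k-means assigns cases by distance, an input measured in dollars can swamp one measured in persons unless the inputs are put on a common scale. One common transformation is the z-score, sketched below. The function name `standardize` and the three regions' values are made up for illustration, and this is not necessarily the transformation used in the demonstration.

```python
import math
import statistics


def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


# Hypothetical regions (made-up values): median household income in dollars
# and mean household size in persons.
income = [30000.0, 45000.0, 60000.0]
hh_size = [2.0, 2.5, 3.0]

# On the raw scale, the distance between regions 0 and 1 is driven almost
# entirely by income; the household-size difference barely registers.
raw_gap = math.dist((income[0], hh_size[0]), (income[1], hh_size[1]))

z_income = standardize(income)
z_hh_size = standardize(hh_size)

print(round(raw_gap, 2))
print([round(z, 3) for z in z_income])
print([round(z, 3) for z in z_hh_size])
```

After standardization both inputs have mean 0 and standard deviation 1, so a difference of one standard deviation in either input contributes equally to the distance.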
Segmenting Census Data
This demonstration introduces SAS Enterprise Miner tools and techniques for cluster and segmentation analysis.
Exploring and Filtering Analysis Data
This demonstration introduces SAS Enterprise Miner tools and techniques that explore and filter analysis data, particularly data source exploration and case filtering.
Setting Cluster Tool Options
This demonstration illustrates how to use the Cluster tool to segment the cases in the CENSUS2000 data set.
Creating Clusters with the Cluster Tool
This demonstration illustrates how the Cluster tool determines the number of clusters in the data.
Specifying the Segment Count
This demonstration illustrates how you can change the number of clusters created by the Cluster node.
Exploring Segments
This demonstration illustrates how to use graphical aids to explore the segments.
Profiling Segments
This demonstration illustrates how to use the Segment Profile tool to interpret the composition of clusters.
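The idea behind segment profiling can be sketched by comparing a summary of an input within each segment to the same summary over all cases. The sketch below compares means only, which simplifies the comparison of full distributions; the case data and segment labels are hypothetical, and the output is not meant to reproduce the Segment Profile tool's statistics.

```python
from statistics import fmean

# Hypothetical cases: (segment ID, median household income in dollars).
cases = [(1, 25000.0), (1, 30000.0), (2, 70000.0), (2, 80000.0), (2, 75000.0)]

overall_mean = fmean(income for _, income in cases)

profile = {}
for seg in sorted({s for s, _ in cases}):
    seg_mean = fmean(income for s, income in cases if s == seg)
    # Ratio above 1: the segment sits above the overall average on this input.
    profile[seg] = seg_mean / overall_mean
    print(f"segment {seg}: mean={seg_mean:,.0f} overall={overall_mean:,.0f} ratio={profile[seg]:.2f}")
```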
Exercises
This exercise reinforces the concepts discussed previously.

8.3 Market Basket Analysis (Self-Study)
Market Basket Analysis
Transactions:
A B C
A C D
B C D
A D E
B C E

Rule         Support   Confidence
A ⇒ D        2/5       2/3
C ⇒ A        2/5       2/4
A ⇒ C        2/5       2/3
B & C ⇒ D    1/5       1/3
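The support and confidence values in the table follow from two ratios: support(X ⇒ Y) is the fraction of transactions that contain every item in X and Y, and confidence(X ⇒ Y) is support(X ⇒ Y) divided by the fraction of transactions that contain X. The sketch below recomputes the four rules from the five transactions; the helper names are illustrative.

```python
from fractions import Fraction

# The five transactions from the slide, one set of items per transaction.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    return Fraction(sum(items <= t for t in transactions), len(transactions))


def confidence(lhs, rhs):
    """support(lhs together with rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)


rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    rule = " & ".join(sorted(lhs)) + " => " + " & ".join(sorted(rhs))
    print(f"{rule:10} support={support(lhs | rhs)} confidence={confidence(lhs, rhs)}")
```

`Fraction` reduces 2/4 to 1/2, which is the same value the table shows as 2/4 for C ⇒ A.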
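Implication?
                         Checking Account
                         No       Yes      Total
Savings Account   No     500      3,500    4,000
                  Yes    1,000    5,000    6,000
                  Total  1,500    8,500    10,000
Support(SVG ⇒ CK) = 5,000/10,000 = 50%
Confidence(SVG ⇒ CK) = 5,000/6,000 = 83%
Expected Confidence(SVG ⇒ CK) = 8,500/10,000 = 85%
Lift(SVG ⇒ CK) = 0.83/0.85 ≈ 0.98 < 1
Although 83% of savings-account holders also have a checking account, 85% of all customers do, so having a savings account slightly lowers the likelihood of having a checking account.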
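The four measures above can be recomputed from the 2×2 counts. In this sketch, expected confidence is the overall proportion of customers with a checking account, and lift is confidence divided by expected confidence; the variable names are illustrative.

```python
# Counts from the 2x2 table of 10,000 customers.
both = 5000       # savings = yes, checking = yes
svg_only = 1000   # savings = yes, checking = no
ck_only = 3500    # savings = no,  checking = yes
neither = 500     # savings = no,  checking = no
total = both + svg_only + ck_only + neither

support = both / total                          # P(SVG and CK)
confidence = both / (both + svg_only)           # P(CK | SVG)
expected_confidence = (both + ck_only) / total  # P(CK)
lift = confidence / expected_confidence

print(f"support={support:.0%} confidence={confidence:.0%} "
      f"expected={expected_confidence:.0%} lift={lift:.2f}")
# support=50% confidence=83% expected=85% lift=0.98
```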
Barbie Doll ⇒ Candy
1. Put them closer together in the store.
2. Put them far apart in the store.
3. Package candy bars with the dolls.
4. Package Barbie + candy + poorly selling item.
5. Raise the price on one, and lower it on the other.
6. Offer Barbie accessories for proofs of purchase.
7. Do not advertise candy and Barbie together.
8. Offer candies in the shape of a Barbie doll.
Data Capacity
[Figure: example transactions built from items A, B, C, and D]
Association Tool Demonstration
Analysis goal:
Explore associations between retail banking services used by customers.
Analysis plan:
- Create an association data source.
- Run an association analysis.
- Interpret the association rules.
- Run a sequence analysis.
- Interpret the sequence rules.
Market Basket Analysis
This demonstration illustrates how to conduct market basket analysis.
Sequence Analysis
This demonstration illustrates how to conduct a sequence analysis.
Pattern Discovery Tools: Review
- Generate cluster models using automatic settings and segmentation models using user-defined settings.
- Compare within-segment distributions of selected inputs to overall distributions. This helps you understand how each segment is defined.
- Conduct market basket and sequence analysis on transaction data. The data source must have one target, one ID, and (optionally) one sequence variable.
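A sequence rule adds order to an association rule: X ⇒ Y counts only when X occurs before Y for the same ID. The sketch below assumes transaction rows of (ID, sequence value, item), mirroring the ID and sequence variables named above, and counts a customer as supporting X ⇒ Y when some occurrence of X has a smaller sequence value than some occurrence of Y. The data, the item codes, and this counting rule are assumptions for illustration, not a description of how SAS Enterprise Miner counts sequences.

```python
from collections import defaultdict

# Hypothetical transactions: (customer ID, sequence value, item). Made-up data.
rows = [
    (1, 1, "CK"), (1, 2, "SVG"), (1, 3, "ATM"),
    (2, 1, "SVG"), (2, 2, "CK"),
    (3, 1, "CK"), (3, 4, "ATM"),
]

# Collect each customer's (sequence value, item) events.
history = defaultdict(list)
for cust_id, seq, item in rows:
    history[cust_id].append((seq, item))


def sequence_support(first, then):
    """Fraction of customers with `first` at an earlier sequence value than `then`."""
    hits = 0
    for events in history.values():
        first_times = [s for s, it in events if it == first]
        then_times = [s for s, it in events if it == then]
        # Some `first` precedes some `then` exactly when min(first) < max(then).
        if first_times and then_times and min(first_times) < max(then_times):
            hits += 1
    return hits / len(history)


print(sequence_support("CK", "SVG"))   # only customer 1 qualifies
print(sequence_support("CK", "ATM"))   # customers 1 and 3 qualify
```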