FAssociationRules

Association Rules
Spring 2010
Data Mining: What is it?
 Two definitions:
 The first one, classic and well-known, says
that data mining is the nontrivial extraction of
implicit, previously unknown, and potentially
useful information from data (W. Frawley)
 The second one, irreverent and rebellious, says that
data mining is nothing else than torturing the
data until it confesses… and if you torture it
enough, you can get it to confess to anything
(Fred Menger).
Data mining techniques
 Association Rules
 Classification
 Prediction
 Clustering
What is Association mining?
Finding frequent patterns, associations, or causal structures
among sets of items or objects in transaction databases, relational
databases, and other information repositories.
Association mining searches for relationships between variables. For
example, a supermarket might gather data on customer purchasing habits.
Using association rule learning, the supermarket can determine
which products are frequently bought together and use this
information for marketing purposes. This is sometimes referred to
as market basket analysis.
Applications
Basket Data Analysis
Cross-marketing
Catalog design
…
Introduction to Association Rules (AR)
 Ideas came from market basket analysis (MBA)
 What do customers buy?
 Which products are bought together?
AIM:
 Find associations and correlations
between the different items that
customers place in their shopping basket.
Some definitions in AR
Pattern:
 A particular data behavior, arrangement,
or form that might be of business
interest.
Itemset:
 A set of items: a group of elements that
together represents a single entity. It is
actually a type of pattern.
Some definitions in AR (contd.)
 Transaction database T
 A set of transactions T = {T1, T2, …, Tn}
 Itemset
 Each transaction contains a set of items (an itemset)
 An itemset is a collection of items I = {I1, I2, ..., Im}
AR General Aim
 Find frequent/interesting patterns,
associations, correlations, or causal
structures among sets of items or
elements in databases or other
information repositories.
 An AR is an implication between two itemsets:
 X => Y
AR (contd.)

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk

Frequent itemsets: items that frequently appear together.
Example
 Bread => peanut-butter
 I = {bread, peanut-butter}
An Interesting Rule
 Support count (σ):
 Frequency of occurrence of an itemset
 σ({bread, peanut-butter}) = 3
 Support (S):
 Fraction of transactions that contain an itemset
 S({bread, peanut-butter}) = 3/5
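As a minimal sketch of the two definitions above, the support count and support of an itemset can be computed directly from the five example transactions. The function names `support_count` and `support` are illustrative choices, not part of the slides.

```python
# Toy transaction database from the slides (item names lower-cased).
transactions = [
    {"bread", "peanut-butter", "jelly"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "peanut-butter", "milk"},   # T3
    {"bread", "soda"},                    # T4
    {"soda", "milk"},                     # T5
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"bread", "peanut-butter"}, transactions))        # 0.6
```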
AR (contd.)
The two most used measures of interest:
 Support (S): how frequently the rule occurs,
i.e. the fraction of transactions that
contain both X and Y
 S = σ(X ∪ Y) / # of transactions
 Confidence (C): the strength of the
association, i.e. how often
items in Y appear in transactions that
contain X
 C = σ(X ∪ Y) / σ(X)
AR (contd.)

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk

Rule                      S            C
Bread => peanut-butter    3/5 = .6     3/4 = .75
peanut-butter => Bread    3/5 = .6     3/3 = 1
Soda => Bread             1/5 = .2     1/2 = .5
peanut-butter => jelly    1/5 = .2     1/3 = .33
Jelly => peanut-butter    1/5 = .2     1/1 = 1
Jelly => milk             0            0
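The support and confidence values for these rules can be checked with a short script. This is a sketch only: the helper names `count` and `rule_metrics` are hypothetical, and exact fractions are used so the results can be compared with the hand-computed values.

```python
from fractions import Fraction

# The five example transactions (item names lower-cased).
transactions = [
    {"bread", "peanut-butter", "jelly"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "peanut-butter", "milk"},   # T3
    {"bread", "soda"},                    # T4
    {"soda", "milk"},                     # T5
]


def count(items):
    """Support count: transactions containing every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def rule_metrics(x, y):
    """Return (support, confidence) of the rule X => Y as exact fractions."""
    both = count(set(x) | set(y))
    n_x = count(x)
    s = Fraction(both, len(transactions))
    # Confidence is undefined when X never occurs; report 0 in that case.
    c = Fraction(both, n_x) if n_x else Fraction(0)
    return s, c


rules = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"soda"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]
for x, y in rules:
    s, c = rule_metrics(x, y)
    print(f"{sorted(x)} => {sorted(y)}: S = {s}, C = {c}")
```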
Types of AR
 Binary Association Rules
 Quantitative Association Rules
 Fuzzy Association Rules
Let’s start from the beginning:
 Binary Association Rules, A-priori
A-priori algorithm
A-priori is the most influential AR miner.
It consists of two steps:
1. Generate all frequent itemsets, i.e. those whose support
>= minimum support.
2. Use the frequent itemsets to generate association
rules.
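Step 2 can be sketched as follows, assuming step 1 has already produced the frequent itemsets with their support counts. The counts below are taken from the bread and peanut-butter example; the function name `generate_rules` and the confidence threshold `min_conf = 0.8` are assumptions made for illustration, since the slides do not fix a minimum confidence.

```python
from itertools import combinations


def generate_rules(frequent, min_conf):
    """Turn frequent itemsets into rules X => Y with confidence >= min_conf.

    `frequent` maps each frequent itemset (a frozenset) to its support count.
    Every subset of a frequent itemset is itself frequent, so its count is
    expected to be present in the same mapping.
    """
    rules = []
    for itemset, n_xy in frequent.items():
        # Try every non-empty proper subset X as the left-hand side.
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                x = frozenset(lhs)
                y = itemset - x
                conf = n_xy / frequent[x]
                if conf >= min_conf:
                    rules.append((x, y, conf))
    return rules


# Frequent itemsets and counts from the bread/peanut-butter example.
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"peanut-butter"}): 3,
    frozenset({"bread", "peanut-butter"}): 3,
}
rules = generate_rules(frequent, min_conf=0.8)  # 0.8 is an assumed threshold
for x, y, conf in rules:
    print(sorted(x), "=>", sorted(y), f"confidence = {conf:.2f}")
```

With this threshold only peanut-butter => bread (confidence 1) is kept; bread => peanut-butter has confidence 0.75 and is dropped.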
A-priori (contd.)
 Key Idea:
 Downward closure property:
 Every subset of a frequent itemset is also
a frequent itemset, so an itemset with an
infrequent subset cannot be frequent.
 The algorithm iteratively does:
 Create itemsets
 Only continue exploration of those whose support
>= minimum support
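The iteration described above can be sketched as a level-wise search. This is a simplified illustration rather than a tuned implementation: candidates are formed by pairwise unions instead of the usual prefix-based join, and the names `apriori` and `min_count` (minimum support expressed as a count) are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: single items.
    level = count_and_filter({frozenset([i]) for t in transactions for i in t})
    found = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count_and_filter(candidates)
        found.update(level)
        k += 1
    return found


transactions = [
    {"bread", "peanut-butter", "jelly"},
    {"bread", "peanut-butter"},
    {"bread", "peanut-butter", "milk"},
    {"bread", "soda"},
    {"soda", "milk"},
]
result = apriori(transactions, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Only the itemsets that survive one level are used to build the next, which is what keeps the search from enumerating every possible combination of items.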
Back to our example (minsup = 3)

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk

Items            Count
Bread            4
peanut-butter    3
jelly            1
milk             2
soda             2

Itemset                 Count
Bread, peanut-butter    3
Example (minsup = 2)

TID   Items
10    A,C,D
20    B,C,E
30    A,B,C,E
40    B,E

Items   sup
A       2
B       3
C       3
D       1
E       3

Itemset   sup
{A,B}     1
{A,C}     2
{A,E}     1
{B,C}     2
{B,E}     3
{C,E}     2

Itemset   sup
{A,B,C}   1
{B,C,E}   2

Itemset   sup
{B,C,E}   2
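The step from 2-itemsets to 3-itemsets in this example can be sketched in code. The helper `candidates_from` is a hypothetical name, and the join by pairwise union is a simplification. Note that when the downward closure prune is applied before counting, {A,B,C} is never counted at all, because its subset {A,B} has support 1 < 2; only {B,C,E} remains as a candidate.

```python
from itertools import combinations

# Transaction database from the example, keyed by TID.
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}

# Frequent 2-itemsets (sup >= 2) read off the table above.
frequent_2 = {frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")}


def candidates_from(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune by downward closure."""
    joined = {a | b for a in frequent_prev for b in frequent_prev if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }


c3 = candidates_from(frequent_2, 3)
counts = {c: sum(1 for t in transactions.values() if c <= t) for c in c3}
for c, n in counts.items():
    print(sorted(c), n)  # ['B', 'C', 'E'] 2
```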