Association Rule.
Association rule mining
It is an important data mining model studied
extensively by the database and data mining
community.
Assume all data are categorical.
Initially used for Market Basket Analysis to
find how items purchased by customers are
related.
Transaction data: supermarket data
Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
…
…
tn: {biscuit, eggs, milk}
Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it
may have TID (transaction ID)
• A transactional dataset: A set of transactions
The model: rules
A transaction t contains X, a set of items
(itemset) in I, if $X \subseteq t$.
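A transaction and an itemset are both sets of items, so "t contains X" is a plain subset check. Here is a minimal Python sketch. It assumes transactions are stored as frozensets keyed by TID. The `transactions` dict and the `contains` helper are illustrative names and are not part of the original material:

```python
# Illustrative transactional dataset: TID -> set of items.
# frozenset is used because it is immutable and hashable.
transactions = {
    1: frozenset({"bread", "cheese", "milk"}),
    2: frozenset({"apple", "eggs", "salt", "yogurt"}),
}


def contains(t: frozenset, X: frozenset) -> bool:
    """Return True if transaction t contains itemset X, i.e. X is a subset of t."""
    return X <= t


print(contains(transactions[1], frozenset({"bread", "milk"})))  # True
print(contains(transactions[2], frozenset({"bread"})))          # False
```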
An association rule is an implication of the
form:
$X \to Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$.
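A rule must satisfy two conditions: both sides are drawn from I, and X and Y share no items. Both can be enforced when a rule is built. The sketch below is only illustrative. The `Rule` class and the `is_valid_rule` function are hypothetical helpers, and `I` is named after the item set in the definition:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Association rule X -> Y with antecedent X and consequent Y."""
    X: frozenset
    Y: frozenset

    def __post_init__(self) -> None:
        # The definition requires X and Y to share no items.
        overlap = self.X & self.Y
        if overlap:
            raise ValueError(f"X and Y must be disjoint; shared items: {sorted(overlap)}")


def is_valid_rule(rule: Rule, I: frozenset) -> bool:
    """Check that both sides of the rule are drawn from the item set I."""
    return rule.X <= I and rule.Y <= I


I = frozenset({"bread", "cheese", "milk"})
r = Rule(frozenset({"bread"}), frozenset({"milk"}))
print(is_valid_rule(r, I))  # True
```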
An itemset is a set of items.
• E.g., X = {milk, bread, cereal} is an itemset.
A k-itemset is an itemset with k items.
• E.g., {milk, bread, cereal} is a 3-itemset
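Listing every k-itemset over a set of items shows why exhaustive search gets expensive: there are C(|I|, k) of them. The following sketch uses the standard-library `itertools.combinations`. The name `k_itemsets` is illustrative:

```python
from itertools import combinations


def k_itemsets(items, k):
    """Return every k-itemset (itemset with exactly k items) formed from `items`."""
    return [frozenset(c) for c in combinations(sorted(items), k)]


three = k_itemsets({"milk", "bread", "cereal", "eggs"}, 3)
print(len(three))                                       # 4, i.e. C(4, 3)
print(frozenset({"milk", "bread", "cereal"}) in three)  # True
```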
Rule strength measures
Support: The rule holds with support sup in T (the
transaction data set) if sup% of transactions
contain $X \cup Y$.
$\text{sup} = \Pr(X \cup Y) = \text{count}(X \cup Y)/n$, where $n$ is the total number of transactions.
Confidence: The rule holds in T with confidence
conf if conf% of transactions that contain X also
contain Y.
• $\text{conf} = \Pr(Y \mid X) = \text{sup}(X \cup Y)/\text{sup}(X)$.
An association rule is a pattern stating that when X
occurs, Y occurs with a certain probability.
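Both measures translate directly into code. Here is a minimal sketch, assuming transactions are given as a list of sets. The names `support` and `confidence` are illustrative, and the four baskets are made-up toy data:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(X, Y, transactions):
    """conf(X -> Y) = sup(X ∪ Y) / sup(X), an estimate of Pr(Y | X)."""
    sup_x = support(X, transactions)
    if sup_x == 0:
        return 0.0  # X never occurs; treat the undefined ratio as 0
    return support(frozenset(X) | frozenset(Y), transactions) / sup_x


# Made-up toy baskets
T = [
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
print(support({"bread", "milk"}, T))       # 0.5
print(confidence({"bread"}, {"milk"}, T))  # about 0.667
```

Returning 0.0 when X never occurs is a convention chosen for this sketch. Mathematically, the confidence is undefined in that case.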
Goal of Association Rule.
Find all rules that satisfy the user-specified minimum support (minsup) and
minimum confidence (minconf).
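The slides do not say which algorithm finds these rules. One well-known choice is Apriori. It finds all frequent itemsets level by level, pruning any candidate that has an infrequent subset. It then splits each frequent itemset into rules and keeps those that meet minconf. The following is a compact, unoptimized sketch of that idea. It assumes the transactions fit in memory as a list of frozensets. The function names are illustrative, and the toy baskets are made up:

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Level-wise (Apriori-style) search for all itemsets with support >= minsup."""
    n = len(transactions)

    def sup(s):
        return sum(1 for t in transactions if s <= t) / n

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
    frequent = {s: sup(s) for s in level}
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if sup(c) >= minsup}
        frequent.update({s: sup(s) for s in level})
        k += 1
    return frequent


def association_rules(transactions, minsup, minconf):
    """All rules X -> Y with sup(X ∪ Y) >= minsup and conf(X -> Y) >= minconf."""
    freq = frequent_itemsets(transactions, minsup)
    rules = []
    for itemset, sup_xy in freq.items():
        for r in range(1, len(itemset)):  # skipped for 1-itemsets
            for X in map(frozenset, combinations(itemset, r)):
                conf = sup_xy / freq[X]  # X is frequent, since subsets of frequent sets are
                if conf >= minconf:
                    rules.append((X, itemset - X, sup_xy, conf))
    return rules


# Made-up toy baskets
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread"}),
    frozenset({"milk", "eggs"}),
]
for X, Y, s, c in association_rules(baskets, minsup=0.5, minconf=0.6):
    print(sorted(X), "->", sorted(Y), f"sup={s:.2f}", f"conf={c:.2f}")
```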
An Example.
Transaction data
• t1: Beef, Chicken, Milk
• t2: Beef, Cheese
• t3: Cheese, Boots
• t4: Beef, Chicken, Cheese
• t5: Beef, Chicken, Clothes, Cheese, Milk
• t6: Chicken, Clothes, Milk
• t7: Chicken, Milk, Clothes
Assume:
minsup = 30%
minconf = 80%
An example frequent itemset:
{Chicken, Clothes, Milk}
[sup = 3/7]
Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
…
…
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
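The numbers in the example can be checked by counting directly. This sketch uses `fractions.Fraction`, so supports print in the same k/n form as the slide. Note that Fraction reduces conf = 3/3 to 1. The helper names `count`, `sup`, and `conf` are illustrative:

```python
from fractions import Fraction

# The seven transactions from the example
T = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]


def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in T if set(itemset) <= t)


def sup(itemset):
    return Fraction(count(itemset), len(T))


def conf(X, Y):
    return Fraction(count(set(X) | set(Y)), count(X))


print(sup({"Chicken", "Clothes", "Milk"}))                       # 3/7
print(conf({"Clothes"}, {"Milk", "Chicken"}))                    # 1
print(conf({"Clothes", "Chicken"}, {"Milk"}))                    # 1
print(sup({"Chicken", "Clothes", "Milk"}) >= Fraction(30, 100))  # True
```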
Data set.
This data set relates to the retail industry.
It contains the items bought in each transaction,
together with the transaction IDs.
Each row represents a single transaction, i.e., the
purchases of a single customer.
For example, a row such as {Bread sandwich, Milk, Egg, Butter} means that this
customer bought those items in a
single transaction.
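If the rows are stored as text in the form shown above, each one can be converted into a set of items. The exact file format is not given, so this sketch assumes the brace-and-comma layout of the example row. `parse_basket` is a hypothetical helper:

```python
def parse_basket(row: str) -> frozenset:
    """Parse a row such as '{Bread sandwich,Milk,Egg,Butter}' into a set of items."""
    inner = row.strip().strip("{}")
    # Split on commas, trim spaces, and drop empty entries
    return frozenset(item.strip() for item in inner.split(",") if item.strip())


basket = parse_basket("{Bread sandwich,Milk,Egg,Butter}")
print(sorted(basket))  # ['Bread sandwich', 'Butter', 'Egg', 'Milk']
```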
Objective.
Our main objective is to find the purchasing patterns
in this large database.
Discovering such association rules can help
develop marketing strategies by revealing
which items customers frequently
purchase together.
The following thresholds were used:
Minsup = 0.08 (8%)
Minconf = 0.40 (40%)
Mincorr = 0.30 (minimum correlation)
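A rule is kept only if it clears all three thresholds. The slides do not define the correlation measure. This sketch assumes it is sup(X ∪ Y) / sqrt(sup(X) · sup(Y)), and that assumption should be checked against the software's documentation. The function name and the sample support values in the usage lines are made up for illustration:

```python
from math import sqrt

# Thresholds used in this analysis
MINSUP, MINCONF, MINCORR = 0.08, 0.40, 0.30


def passes_thresholds(sup_xy: float, sup_x: float, sup_y: float) -> bool:
    """Return True if rule X -> Y meets minsup, minconf and mincorr.

    sup_xy = sup(X ∪ Y), sup_x = sup(X), sup_y = sup(Y), all as fractions.
    """
    conf = sup_xy / sup_x
    # ASSUMPTION: correlation defined as sup(X ∪ Y) / sqrt(sup(X) * sup(Y)).
    corr = sup_xy / sqrt(sup_x * sup_y)
    return sup_xy >= MINSUP and conf >= MINCONF and corr >= MINCORR


# Made-up support values, for illustration only
print(passes_thresholds(0.10, 0.20, 0.30))  # True
print(passes_thresholds(0.05, 0.20, 0.30))  # False: support below 0.08
```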
Analysis.
The spreadsheet shows the frequent
itemsets with their support values.
The table shows that Fluid milk has the highest
support, followed by Bananas, Salad
vegetables, Eggs, etc.
In other words, these are the items that appear
most often in customers' baskets.
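Ranking single items by support can be reproduced with `collections.Counter`. In this sketch, made-up baskets stand in for the real data, which is not included here. `item_supports` is an illustrative name:

```python
from collections import Counter


def item_supports(transactions):
    """Support of each single item, ordered from most to least frequent."""
    # set(t) guards against an item being listed twice in one basket
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return [(item, c / n) for item, c in counts.most_common()]


# Made-up baskets standing in for the real data set
baskets = [
    {"Fluid milk", "Bananas"},
    {"Fluid milk", "Eggs"},
    {"Fluid milk"},
    {"Bananas", "Eggs"},
]
for item, s in item_supports(baskets):
    print(f"{item}: {s:.0%}")
```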
The fifth rule has the highest confidence value, 58.83424%, which means that
about 58.8% of customers who buy Eggs also buy Fluid milk.
Similarly, 54% of customers who buy Tomatoes also buy Salad
vegetables.
Likewise, 52% of customers who buy Bread sandwiches also
buy Fluid milk.
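Reading a confidence value as "X% of customers who buy A also buy B" amounts to formatting conf(A → B) as a percentage. This sketch ranks rules by confidence. It uses the three rules and confidence values quoted above, where 54% and 52% are the rounded figures from the text. `rank_by_confidence` and `describe` are illustrative names:

```python
def rank_by_confidence(rules):
    """Sort (antecedent, consequent, confidence) triples, highest confidence first."""
    return sorted(rules, key=lambda rule: rule[2], reverse=True)


def describe(rule):
    """Phrase a rule the way the analysis reads it."""
    antecedent, consequent, conf = rule
    return f"{conf:.1%} of customers who buy {antecedent} also buy {consequent}"


# Rules and confidences as quoted in the analysis (54% and 52% are rounded there)
rules = [
    ("Tomatoes", "Salad vegetables", 0.54),
    ("Bread sandwiches", "Fluid milk", 0.52),
    ("Eggs", "Fluid milk", 0.5883424),
]
for rule in rank_by_confidence(rules):
    print(describe(rule))
```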
Rule Graph.
This graph represents all the association rules
graphically, so the entire result can be
understood in a single snapshot.
In this graph, the sizes and colors of the circles
indicate the support values of the Body (antecedent, X)
and Head (consequent, Y) of each association rule.
The thickness of each line indicates the
confidence value (the conditional probability of Head
given Body) of that rule.
The sizes and colors of the circles in the center,
above the Implies label, indicate the joint support
(the support of the co-occurrence) of the Body
and Head of each rule.
In the graphical summary, the strongest support was found for
Fluid milk together with Bananas, Bread sandwiches, and Eggs.
The graph also shows that the rule linking Eggs and Fluid milk has the
highest confidence (its line is the thickest).
3D Rule Graph.
This graph is a 3D version of the rule graph above.
It confirms that the rule linking Eggs and Fluid milk has the
highest confidence of all the rules.
Conclusion.
According to the rules, Fluid milk, Bananas, Bread
sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice
are the items customers most frequently put in their
baskets.
The rules also suggest that more than 50% of customers
who buy Eggs, and more than 50% of customers who buy Bread
sandwiches, also buy Fluid milk.
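Confidence is directional. conf(Eggs → Fluid milk) = sup({Eggs, Fluid milk}) / sup({Eggs}), and this generally differs from conf(Fluid milk → Eggs). The small sketch below shows the asymmetry using made-up baskets, not the real data:

```python
from fractions import Fraction

# Made-up baskets, chosen only to show that confidence depends on direction
baskets = [
    {"Eggs", "Fluid milk"},
    {"Eggs", "Fluid milk"},
    {"Eggs"},
    {"Fluid milk"},
    {"Fluid milk"},
    {"Fluid milk"},
]


def conf(X, Y):
    """conf(X -> Y) = count(X ∪ Y) / count(X)."""
    count_x = sum(1 for t in baskets if X <= t)
    count_xy = sum(1 for t in baskets if (X | Y) <= t)
    return Fraction(count_xy, count_x)


print(conf({"Eggs"}, {"Fluid milk"}))  # 2/3
print(conf({"Fluid milk"}, {"Eggs"}))  # 2/5
```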
All of this information can be used to design better
marketing strategies.
For example, a retailer can place frequently bought
items close to each other in the supermarket so that
customers can find them easily.
New products related to these items can also be
placed nearby to attract customers.
Thank You.
Krishnendu Kundu
(Statistician)
StatSoft India.
Mobile - +919873119520