Association Rule.

Association rule mining
▪ An important data mining model, studied extensively by the database and data mining communities.
▪ Assumes all data are categorical.
▪ Initially used for Market Basket Analysis, to find how items purchased by customers are related.
Transaction data: supermarket data
▪ Market basket transactions:
t1: {bread, cheese, milk}
t2: {apple, eggs, salt, yogurt}
…
…
tn: {biscuit, eggs, milk}
▪ Concepts:
• An item: an item/article in a basket
• I: the set of all items sold in the store
• A transaction: items purchased in a basket; it may have a TID (transaction ID)
• A transactional dataset: a set of transactions
The model: rules
A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
An association rule is an implication of the form:
X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
An itemset is a set of items.
• E.g., X = {milk, bread, cereal} is an itemset.
A k-itemset is an itemset with k items.
• E.g., {milk, bread, cereal} is a 3-itemset
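A minimal sketch of these definitions, assuming Python sets stand in for itemsets. The helper names `contains` and `is_valid_rule` and the small item set `I` are hypothetical, chosen only for illustration.

```python
# Hypothetical item universe I and one transaction from the market-basket slide.
I = frozenset({"bread", "cheese", "milk", "cereal", "eggs"})
t1 = frozenset({"bread", "cheese", "milk"})


def contains(transaction, itemset):
    """A transaction t contains X when X is a subset of t."""
    return itemset <= transaction


def is_valid_rule(X, Y, all_items):
    """X -> Y requires X and Y to be subsets of I with no items in common."""
    return X <= all_items and Y <= all_items and X.isdisjoint(Y)


X = frozenset({"milk", "bread"})   # a 2-itemset
Y = frozenset({"cheese"})          # a 1-itemset

print(len(X), contains(t1, X))     # size k of the itemset, and whether t1 contains it
print(is_valid_rule(X, Y, I))      # disjoint subsets of I, so a well-formed rule
```

Using `frozenset` rather than `set` keeps itemsets hashable, so they can later serve as dictionary keys when counting support.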
Rule strength measures
▪ Support: The rule holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
• sup = Pr(X ∪ Y) = count(X ∪ Y) / n, where n is the total number of transactions.
▪ Confidence: The rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
• conf = Pr(Y | X) = sup(X ∪ Y) / sup(X).
▪ An association rule is a pattern stating that when X occurs, Y occurs with a certain probability.
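The two measures can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the three transactions are the ones written out on the market-basket slide, treated here as if they were the whole data set.

```python
from fractions import Fraction

# Toy data: only the three transactions shown on the earlier slide (an assumption;
# the elided transactions are not reconstructed).
transactions = [
    frozenset({"bread", "cheese", "milk"}),
    frozenset({"apple", "eggs", "salt", "yogurt"}),
    frozenset({"biscuit", "eggs", "milk"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(X, Y, transactions):
    """sup(X u Y) / sup(X); returns 0 when X never occurs."""
    sup_x = support(X, transactions)
    if sup_x == 0:
        return Fraction(0)
    return support(frozenset(X) | frozenset(Y), transactions) / sup_x


# Rule {eggs} -> {milk} on the toy data.
print(support({"eggs", "milk"}, transactions))       # 1/3
print(confidence({"eggs"}, {"milk"}, transactions))  # 1/2
```

Exact fractions are used instead of floats so that comparisons against thresholds such as 80% do not suffer from rounding.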
▪ Goal of association rule mining:
• Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
An Example.
▪ Transaction data:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
▪ Assume:
minsup = 30%
minconf = 80%
▪ An example frequent itemset:
{Chicken, Clothes, Milk} [sup = 3/7]
▪ Association rules from the itemset:
Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
…
…
Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
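The numbers in this example can be checked with a short brute-force script. This is a sketch, not the method used in the slides: it simply enumerates every possible itemset, which is only practical for tiny data sets (a real miner would use an algorithm such as Apriori). All variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# The seven transactions from the example slide.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]
MINSUP = Fraction(30, 100)
MINCONF = Fraction(80, 100)


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


items = sorted(set().union(*transactions))

# Step 1: brute-force search for all itemsets with support >= minsup.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MINSUP:
            frequent[frozenset(combo)] = s

# Step 2: split each frequent itemset into body -> head and keep confident rules.
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for body in combinations(sorted(itemset), r):
            body = frozenset(body)
            conf = sup / frequent[body]  # every subset of a frequent itemset is frequent
            if conf >= MINCONF:
                rules.append((body, itemset - body, sup, conf))

for body, head, sup, conf in rules:
    print(sorted(body), "->", sorted(head), f"[sup = {sup}, conf = {conf}]")
```

Among the rules printed are the two shown on the slide, Clothes → Milk, Chicken and Clothes, Chicken → Milk, each with support 3/7 and confidence 1 (that is, 3/3).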
Data set.
▪ This data set comes from the retail industry.
▪ It records each transaction together with its transaction ID.
▪ Each row represents a single transaction, i.e. the purchases of a single customer.
▪ For example, a row such as {Bread sandwich, Milk, Egg, Butter} means that the customer bought those items in a single transaction.
Objective.
▪ Our main objective is to find buying patterns in this large database.
▪ Discovering such association rules helps in developing marketing strategies by showing which items customers frequently purchase together.
▪ We used the following parameters:
minsup = 0.08
minconf = 0.40
mincorr = 0.30
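A rough sketch of how the three thresholds might be applied to a candidate rule. The slides do not define the correlation measure behind mincorr, so the cosine-style formula below, sup(Body ∪ Head) / sqrt(sup(Body) · sup(Head)), is an assumption made for illustration; the function names and the input values in the usage lines are hypothetical as well.

```python
from math import sqrt

MINSUP, MINCONF, MINCORR = 0.08, 0.40, 0.30  # thresholds from the slide


def rule_metrics(sup_body, sup_head, sup_joint):
    """Return (confidence, correlation) for a rule Body -> Head.

    ASSUMPTION: correlation is taken as sup_joint / sqrt(sup_body * sup_head);
    the slides do not state which definition was used.
    """
    conf = sup_joint / sup_body
    corr = sup_joint / sqrt(sup_body * sup_head)
    return conf, corr


def passes(sup_body, sup_head, sup_joint):
    """True when the rule meets all three minimum thresholds."""
    conf, corr = rule_metrics(sup_body, sup_head, sup_joint)
    return sup_joint >= MINSUP and conf >= MINCONF and corr >= MINCORR


# Hypothetical support values, not taken from the analysed data set.
print(passes(0.20, 0.30, 0.10))  # joint support, confidence and correlation all high enough
print(passes(0.20, 0.30, 0.05))  # joint support below minsup
```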
Analysis.
▪ The spreadsheet shows the frequent itemsets with their support values.
▪ From the table it is clear that Fluid milk has the highest frequency, followed by Bananas, Salad vegetables, Eggs, etc.
▪ These are the items that customers most often put into their baskets.
▪ The fifth rule has the highest confidence, 58.83424%, which means that about 59% of customers who buy Eggs also buy Fluid milk.
▪ Similarly, 54% of customers who buy Tomatoes also buy Salad vegetables.
▪ Likewise, 52% of customers who buy Bread sandwiches also buy Fluid milk.
Rule Graph.
▪ The rule graph presents all the association rules graphically, so the whole result can be read in a single snapshot.
▪ In this graph, the support values for the Body and Head of each association rule are indicated by the sizes and colors of the circles.
▪ The thickness of each line indicates the confidence (the conditional probability of Head given Body) of the respective rule.
▪ The sizes and colors of the circles in the center, above the Implies label, indicate the joint support (co-occurrence) of the Body and Head of each rule.
▪ In the graphical summary, the strongest support was found for Fluid milk in association with Bananas, Bread sandwiches, and Eggs.
▪ The graph also shows that the rule linking Eggs and Fluid milk has the highest confidence (its line is the thickest).
3D Rule Graph.
▪ The above graph is the 3D version of the earlier graph.
▪ It again shows that the rule linking Fluid milk and Eggs has the highest confidence of all the rules.
Conclusion.
▪ According to the rules, Fluid milk, Bananas, Bread sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice are the items customers most frequently put into their baskets.
▪ The rules also show that more than 50% of customers who buy Eggs or Bread sandwiches also buy Fluid milk.
▪ This information can be used to improve marketing strategies.
▪ For example, a retailer can place frequently bought items close to each other in the supermarket so that customers can find them easily.
▪ New products related to these items can also be placed nearby to attract customers.
Thank You.
Krishnendu Kundu
(Statistician)
StatSoft India.
Email- [email protected]
Mobile - +919873119520