267_25_Association_Rules_Presentation

Download Report

Transcript 267_25_Association_Rules_Presentation

Association Rules
Presented by:
Panicker
Anilkumar
What is Data Mining??
• Search for valuable information in large
volumes of data.
• A step in knowledge discovery in
databases.
• It enables companies to focus on
customer satisfaction, corporate profits,
and determining the impact of various
parameters on the sales.
Association Rule
• Association rules are used to show the
relationships between data items.
• Association rules detect common usage of
data items.
• E.g. The purchasing of one product when
another product is purchased represents
an association rule.
Example 1
• Grocery store.
• Association rules have most direct
application in the retail businesses.
• Association rules used to assist in
marketing, advertising, floor placements
and inventory control.
• From the transaction history several
association rules can be derived.
• E.g. 100% of the time that PeanutButter is
purchased, so is bread.
• 33% of the time PeanutButter is
purchased, Jelly is also purchased.
Example 2
• A Telephone Company.
• A telephone company must ensure that all
calls are completed and in acceptable
period of time.
• In this environment, a potential data
mining problem would be to predict a
failure of a node.
• This can be done by finding association
rules of the type XFailure.
• If these types of rules occur with a high
confidence, Failures can be predicted.
• Even though the support might be low
because the X condition does not
frequently occur.
Association rule
• Given a set of items I = {I1,I2,….Im} and a
database of transactions D = {t1,t2,….tm}
where ti = { Ii1,Ii2,….Iik} and IiJ € I , an
association rule is an implication of the
form X  Y where X,Y C I are sets of
items called itemsets and X∩Y =ø.
• Support (s):
The support (s) for an association rule
XY is the percentage of transactions in
the database that contain X U Y.
E.g. If bread along with peanutbutter occurs
in 60% of the total transactions, then the
support for breadpeanutbutter is 60%
• Confidence or Strength (α):
The confidence or strength (α) for an
association rule XY is the ratio of the
number of transactions that contain X U Y
to the number of transactions that contain
X.
Eg.if support for breadpeanutbutter is
60% and bread occurs in 80% of total
transactions then confidence for
breadpeanutbutter is 75%.
Selecting Association rules
• The selection of association rules is based on
Support and Confidence.
• Confidence measures the strength of the rule,
Whereas support measures how often it should
occur in the database.
• Typically large confidence values and a smaller
support are used.
• Rules that satisfy both minimum support and
minimum confidence are called strong rules.
Association Rule Problem
• Given a set of Items I = {I1,I2,….Im} and a
database of transactions D = {t1,t2,….tn}
where ti = { Ii1,Ii2,….Iik} and IiJ € I . The
association rule problem is to identify all
association rules XY with a minimum
support and confidence. These values
(s,α) are given as input to the problem.
Large Itemsets
• A Large Itemset / frequent Itemset is an itemset
whose number of occurrences is above a
threshold, s (Support)
• Finding large Itemsets generally is quite easy
but very costly.
• The naive approach would be to count all
itemsets that appear in any transaction.
• Given a set of items of size m, there are 2m
subsets. Ignoring the empty set we are still left
with 2m – 1 subsets.
• For e.g. In the retail store example if have
set of items of size 5, i.e the store sells 5
products. Then the possible number of
itemsets is 25 – 1 = 31.
• If the 5 products sold are
bread,peanutbutter,milk,beer and jelly.
then the 31 possible itemsets are
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Bread
Peanutbutter
Milk
Beer
Jelly
Bread,peanutbutter
Bread,milk
Bread,beer
Bread,jelly
Peanutbutter,milk
Peanutbutter,beer
Peanutbutter,jelly
Milk,beer
Milk,jelly
Beer, jelly
Bread,peanutbutter,milk
Bread, Peanutbutter, beer
and so on.
• For m = 30 the number of potential
itemsets become 1073741823.
• The challenge in solving an association
problem is hence to efficiently determining
all large itemsets.
• Most association rule algorithms are
based on smart ways to reduce the
number of itemsets to be counted.
Large Itemsets
•
The most common approach to finding
association rules is to breakup the
problem into two parts
1. Finding large Itemsets and
2. Generating rules from these itemsets.
• Subset of any large itemset is also large.
• Once the large Itemsets have been found,
we know that any interesting association
rule, XY ,must have X U Y in this set of
frequent itemsets.
• When all large itemsets are found,
generating the association rules is
straightforward.
Apriori Algorithm
• Apriori algorithm is the most well known
association rule algorithm.
• Apriori algorithm is used to efficiently
discover large itemsets.
• Apriori algorithm uses the property that
any subset of a large itemset must be
large.
• Inputs: Itemsets, Database of transactions,
support and the output is large itemsets.
Apriori Algorithm Example
ITEM SET
SUPPORT
1,3,4
{1}
2
200
2,3,5
{2}
3
300
1,2,3,5
{3}
3
400
2,5
{4}
1
{5}
3
T.I.D.
Items
100
Support threshold = 2
ITEM SET
SUPPORT
{1}
2
{2}
3
{3}
3
{5}
3
ITEM SET
{1,2}
{1,3}
{1,5}
{2,3}
{2,5}
{3,5}
Threshold Support = 2
ITEM SET
SUPPORT
ITEM SET
SUPPORT
{1,2}
1
{1,3}
2
{1,3}
2
{2,3}
2
{1,5}
1
{2,3}
2
{2,5}
3
{2,5}
3
{3,5}
2
{3,5}
2
ITEM SET
ITEM SET
{2,3,5}
{2,3,5}
SUPPORT
2
References
• Data Mining by Margaret Dunham.
• Wikipedia
Q&A
…… Thanks..