Frequent Itemset
Elective-I
Examination Scheme:
In-semester assessment: 30
End-semester assessment: 70
Text Books:
Data Mining: Concepts and Techniques - Jiawei Han and Micheline Kamber
Introduction to Data Mining with Case Studies - G. K. Gupta
Reference Books:
Mining the Web: Discovering Knowledge from Hypertext Data - Soumen Chakrabarti
Reinforcement and Systemic Machine Learning for Decision Making - Parag Kulkarni
Market Basket Analysis
Frequent itemset, closed itemset
Association Rules
Mining multilevel Association Rules
Constraint based association rule mining
Apriori Algorithm
FP growth Algorithm
Itemset: A set of items. Each transaction is itself an itemset.
Confidence: A measure of the trustworthiness of a discovered pattern (rule), i.e. how often the rule's consequent appears in the transactions that contain its antecedent.
Support: A measure of how often the items in an association occur together, expressed as a percentage of all transactions.
Frequent itemset: An itemset that satisfies the minimum support threshold.
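The definitions above can be written compactly in symbols. This is a sketch in the usual textbook notation, which is an assumption here: σ(X) is the number of transactions containing itemset X, N is the total number of transactions, and X → Y is a rule with antecedent X and consequent Y.

```latex
\[
s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
\]
\[
X \text{ is frequent} \iff \frac{\sigma(X)}{N} \ge \mathit{minsup}
\]
```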
Def: Market Basket Analysis (Association Analysis) is a mathematical modeling technique based on the theory that if you buy a certain group of items, you are likely to buy another group of items.
It is used to analyze customer purchasing behavior, and it helps increase sales and maintain inventory by focusing on point-of-sale transaction data.
Identify purchase patterns
what items tend to be purchased together
▪ obvious: steak-potatoes; diapers-baby lotion
what items are purchased sequentially
▪ obvious: house-furniture; car-tires
what items tend to be purchased by season
Categorize customer purchase behavior
purchase profiles
profitability of each purchase profile
Use it for marketing
▪ layout or catalogs
▪ select products for promotion
▪ space allocation
Customer 1: beer, pretzels, potato chips, aspirin
Customer 2: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 3: soda, potato chips, milk
Customer 4: soup, beer, milk, ice cream
Customer 5: soda, coffee, milk, bread
Customer 6: beer, potato chips
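To make "what items tend to be purchased together" concrete, the six baskets above can be scanned for pairs of items that co-occur. The following is a minimal sketch, not a full market basket analysis; the function name `pair_counts` and the cut-off of two baskets are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# The six example customer baskets listed above.
baskets = [
    {"beer", "pretzels", "potato chips", "aspirin"},
    {"diapers", "baby lotion", "grapefruit juice", "baby food", "milk"},
    {"soda", "potato chips", "milk"},
    {"soup", "beer", "milk", "ice cream"},
    {"soda", "coffee", "milk", "bread"},
    {"beer", "potato chips"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)

# Keep only pairs bought together in at least two baskets
# (the threshold 2 is an arbitrary choice for this illustration).
repeated_pairs = {pair: n for pair, n in counts.items() if n >= 2}
print(repeated_pairs)
```

On this data only two pairs appear in two or more baskets: beer with potato chips (Customers 1 and 6) and milk with soda (Customers 3 and 5).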
Purchase profiles:
beauty conscious
health conscious
sports conscious
smoker
casual drinker
new family
kids’ play
pet lover
gardener
automotive
photographer
tv/stereo enthusiast
convenience food
women’s fashion
kids’ fashion
hobbyist
student/home office
illness (prescription)
illness over-the-counter
casual reader
home handyman
men’s image conscious
sentimental
seasonal/traditional
homemaker
home comfort
fashion footwear
men’s fashion
personal care
Beauty conscious (example items):
cotton balls
hair dye
cologne
nail polish
BENEFITS:
simple computations
can be undirected (don’t have to have hypotheses before analysis)
different data forms can be analyzed
Itemset:
A collection of one or more items
▪ Example: {Milk, Bread, Diaper}
Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
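The support count, support, and frequent-itemset definitions can be checked directly against the five transactions above. This is a minimal sketch; the helper names and the minsup value of 0.4 are assumptions chosen for illustration, not values given in the slides.

```python
# The five market-basket transactions (TID 1 to 5) from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent if its support is at least minsup."""
    return support(itemset, transactions) >= minsup


itemset = {"Milk", "Bread", "Diaper"}
print(support_count(itemset, transactions))     # 2 (TID 4 and 5)
print(support(itemset, transactions))           # 0.4, i.e. 2/5
print(is_frequent(itemset, transactions, 0.4))  # True for the assumed minsup
```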
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Market-Basket transactions
TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication here means co-occurrence, not causality.
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
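The six example rules and the observations above can be reproduced by enumerating every binary partition of {Milk, Diaper, Beer} and computing support and confidence for each. This is a minimal sketch under the definitions used in these slides; the function names are illustrative assumptions.

```python
from itertools import combinations

# The five market-basket transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def rules_from_itemset(itemset, transactions):
    """Return (antecedent, consequent, support, confidence) for every
    binary partition of the itemset into two non-empty parts."""
    items = sorted(itemset)
    joint = support_count(items, transactions)
    s = joint / len(transactions)  # identical for all rules from this itemset
    rules = []
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            antecedent_count = support_count(antecedent, transactions)
            if antecedent_count == 0:
                continue  # confidence is undefined; skip such rules
            rules.append((antecedent, consequent, s, joint / antecedent_count))
    return rules


rules = rules_from_itemset({"Milk", "Diaper", "Beer"}, transactions)
for antecedent, consequent, s, c in rules:
    print(f"{antecedent} -> {consequent}  s={s:.2f}, c={c:.2f}")
```

Because the support is computed once from the whole itemset, every rule shares it, while the confidence varies with the antecedent. This is what allows mining to proceed in two steps: first find the frequent itemsets, then generate high-confidence rules from each one.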