Frequent Itemset


Elective-I
Examination Scheme:
In-semester Assessment: 30
End-semester Assessment: 70
Text Books:
Data Mining: Concepts and Techniques - Micheline Kamber
Introduction to Data Mining with Case Studies - G. K. Gupta
Reference Books:
Mining the Web: Discovering Knowledge from Hypertext Data - Soumen Chakrabarti
Reinforcement and Systemic Machine Learning for Decision Making - Parag Kulkarni






Market Basket Analysis
Frequent Itemset, Closed Itemset
Association Rules
Mining Multilevel Association Rules
Constraint-Based Association Rule Mining
Apriori Algorithm
FP-Growth Algorithm

Itemset: A set of items. Each transaction is itself an itemset.

Confidence: A measure of the trustworthiness of a discovered
rule: of the transactions that contain the rule's left-hand side,
the fraction that also contain its right-hand side.

Support: A measure of how often the items in an association
occur together, as a percentage of all transactions.

Frequent itemset: An itemset that satisfies the minimum
support threshold.
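
Written as formulas for a rule X → Y, these measures are commonly expressed as below. The symbols are assumptions introduced for illustration: σ(·) is the number of transactions that contain an itemset, N is the total number of transactions, and minsup is the minimum support threshold.

```latex
s(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
c(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)},
\qquad
X \text{ is frequent} \iff \frac{\sigma(X)}{N} \ge \mathit{minsup}
```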

Def: Market Basket Analysis (Association
Analysis) is a mathematical modeling
technique based upon the theory that if you
buy a certain group of items, you are likely to
buy another group of items.

It is used to analyze customer purchasing behavior. By
focusing on point-of-sale transaction data, it helps
increase sales and maintain inventory.


Identify purchase patterns
• what items tend to be purchased together
▪ obvious: steak-potatoes; diaper-baby lotion
• what items are purchased sequentially
▪ obvious: house-furniture; car-tires
• what items tend to be purchased by season

Categorize customer purchase behavior
• purchase profiles
• profitability of each purchase profile
• use it for marketing
▪ layout or catalogs
▪ select products for promotion
▪ space allocation

Customer 1: beer, pretzels, potato chips, aspirin
Customer 2: diapers, baby lotion, grapefruit juice, baby food, milk
Customer 3: soda, potato chips, milk
Customer 4: soup, beer, milk, ice cream
Customer 5: soda, coffee, milk, bread
Customer 6: beer, potato chips
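
A minimal sketch of how "what items tend to be purchased together" could be checked on the six baskets above: count, for every pair of items, how many customers bought both. The function name `pair_counts` and the cut-off of two customers are assumptions made for illustration, not part of the slides.

```python
from collections import Counter
from itertools import combinations

# The six example baskets listed above, one set of items per customer.
baskets = [
    {"beer", "pretzels", "potato chips", "aspirin"},
    {"diapers", "baby lotion", "grapefruit juice", "baby food", "milk"},
    {"soda", "potato chips", "milk"},
    {"soup", "beer", "milk", "ice cream"},
    {"soda", "coffee", "milk", "bread"},
    {"beer", "potato chips"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, so (a, b) and (b, a)
        # are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)

# Keep only pairs bought together by at least two customers (assumed cut-off).
repeated = {pair: n for pair, n in counts.items() if n >= 2}
print(repeated)
```

On this small sample, only two pairs recur: beer with potato chips, and milk with soda.
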

Purchase profiles:
beauty conscious
health conscious
sports conscious
smoker
casual drinker
new family
kids’ play
pet lover
gardener
automotive
photographer
tv/stereo enthusiast
convenience food
women’s fashion
kids’ fashion
hobbyist
student/home office
illness (prescription)
illness (over-the-counter)
casual reader
home handyman
men’s image conscious
sentimental
seasonal/traditional
homemaker
home comfort
fashion footwear
men’s fashion
personal care

Beauty conscious
• cotton balls
• hair dye
• cologne
• nail polish

Benefits:
• simple computations
• can be undirected (don’t have to have hypotheses before analysis)
• different data forms can be analyzed

Itemset:
• A collection of one or more items
▪ Example: {Milk, Bread, Diaper}

Support count (σ)
• Frequency of occurrence of an itemset
• E.g. σ({Milk, Bread, Diaper}) = 2

Support
• Fraction of transactions that contain an itemset
• E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
• An itemset whose support is greater than or equal to a minsup threshold

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke
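
The definitions above can be sketched in a few lines of Python over the five transactions in the table. The helper names (`support_count`, `support`, `is_frequent`) and the threshold `minsup=0.4` are assumptions chosen for illustration.

```python
# The five market-basket transactions from the table above (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent when its support reaches the minsup threshold."""
    return support(itemset, transactions) >= minsup


itemset = {"Milk", "Bread", "Diaper"}
print(support_count(itemset, transactions))            # 2 (TID 4 and 5)
print(support(itemset, transactions))                  # 0.4, i.e. 2/5
print(is_frequent(itemset, transactions, minsup=0.4))  # True for the assumed minsup
```
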

Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
Implication means co-occurrence, not causality.
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
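
The support and confidence values listed for these rules can be reproduced with a short sketch that splits {Milk, Diaper, Beer} into every antecedent/consequent pair. The function name `rules_from` is hypothetical. Support is taken as σ(X ∪ Y) / N and confidence as σ(X ∪ Y) / σ(X), which matches the numbers shown.

```python
from itertools import combinations

# The five market-basket transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from(itemset):
    """Yield (antecedent, consequent, support, confidence) for each binary partition."""
    items = sorted(itemset)
    whole = support_count(items)
    for k in range(1, len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            antecedent_count = support_count(antecedent)
            if antecedent_count == 0:
                continue  # confidence is undefined when the antecedent never occurs
            yield (antecedent, consequent,
                   whole / len(transactions), whole / antecedent_count)


for antecedent, consequent, s, c in rules_from({"Milk", "Diaper", "Beer"}):
    print(f"{list(antecedent)} -> {list(consequent)}  s={s:.2f}, c={c:.2f}")
```
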
Observations:
• All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
can have different confidence
• Thus, we may decouple the support and confidence requirements
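
One way to read the last observation is as a two-step procedure: first find all itemsets that meet the support requirement, then generate rules from those itemsets and keep the ones that meet the confidence requirement. The sketch below is a brute-force illustration only. The function names and the thresholds `minsup=0.4` and `minconf=0.6` are assumptions, and enumerating every candidate itemset is practical only for tiny examples like this one.

```python
from itertools import combinations

# The five market-basket transactions (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)


def frequent_itemsets(minsup):
    """Step 1: enumerate every candidate itemset and keep those with support >= minsup."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate)
            if s >= minsup:
                frequent[frozenset(candidate)] = s
    return frequent


def confident_rules(frequent, minconf):
    """Step 2: split each frequent itemset into X -> Y and keep rules with confidence >= minconf."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                x = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                conf = s / frequent[x]
                if conf >= minconf:
                    rules.append((x, itemset - x, s, conf))
    return rules


frequent = frequent_itemsets(minsup=0.4)          # assumed threshold
rules = confident_rules(frequent, minconf=0.6)    # assumed threshold
for x, y, s, conf in rules:
    print(f"{sorted(x)} -> {sorted(y)}  s={s:.2f}, c={conf:.2f}")
```

Because every rule from the same itemset shares that itemset's support, the support check is done once per itemset in step 1, and step 2 only has to compare confidences.
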