Chap 6: Association Rules
Rule Rules!
 Motivation ~ recent progress in data mining and warehousing
has made it possible to collect HUGE amounts of data.
Example: supermarket transactions => barcode scanners and websites
automatically record purchase data.
 These data reveal POSSIBLE interactions among the
items.
 Supermarket transaction data might reveal
consumer buying patterns!
Association Analysis
 Association analysis is a popular data mining technique
aimed at discovering novel and interesting relationships
between the data objects present in a database.
 Is used to estimate the probability that a person will
purchase a product given that they own a particular product
or group of products.
 “Market Basket Analysis” looks at transactions to see which
products get bought together.
Association Rules
 Known as “market-basket analysis”.
 Aims  to find regularities behaviors ~ to find set of
products that are frequently be bought together!
 Rules structure: { X ^ Y }  Z
consequence
Antecedent @
@ apriori
 Example: priori
“if costumer bought milk and eggs, they often bought sugar
too!”
association rules: (milk ^ eggs)  {sugar}
Apriori Algorithm
 A method to find frequent patterns, associations and causal structures among
sets of items.
 Main concept ~ frequent itemsets → itemsets that appear often and
together with other items. How? Using support and confidence values.
 Given the rule X ^ Y → Z:
 Support (S): probability that a transaction contains all of the given
items (X, Y, Z).
Measures how often the rule occurs in the database.
 Confidence (C): conditional probability that a transaction containing
X and Y also contains Z. Sometimes known as “accuracy”.
Measures the strength of the rule.
Support and Confidence
Concept
 Given a rule A → B over T transactions:
 Support
= (transactions that contain every item in A and B)
/ (total number of transactions)
 Confidence
= (transactions that contain every item in A and B)
/ (transactions that contain the items in A)
 The support and confidence values range between 0 and 1.
 Only rules that exceed the minimum support will be generated.
Support and Confidence
Concept
 Example: (A ^ B) → D

ID | Items
 1 | A, D, E
 2 | A, B, C
 3 | A, B, C, D
 4 | A, B, E, C
 5 | A, C, B, D

support = σ(A, B, D) / 5 = 2/5 = 40%
confidence = σ(A, B, D) / σ(A, B) = 2/4 = 50%
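The calculation above can be checked programmatically. A minimal sketch (the function and variable names are my own, not from the slides):

```python
# Transactions from the worked example above.
transactions = [
    {"A", "D", "E"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "E", "C"},
    {"A", "C", "B", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent + consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"A", "B", "D"}, transactions))       # -> 0.4 (40%)
print(confidence({"A", "B"}, {"D"}, transactions))  # -> 0.5 (50%)
```

The `itemset <= t` test is Python's subset check on sets, which directly mirrors "transaction contains every item in the itemset".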
Illustrating Apriori Algorithm
Principles
 Collect single-item counts, then generate candidate k-itemsets
from the frequent (k−1)-itemsets and evaluate, repeating until no
new frequent itemsets are found.
ID | Items
 1 | Bread, Milk
 2 | Cheese, Diaper, Bread, Eggs
 3 | Cheese, Coke, Diaper, Milk
 4 | Cheese, Bread, Diaper, Milk
 5 | Coke, Bread, Diaper, Milk

Given minimum support count, s = 3
Illustrating Apriori Algorithm
Principles (cont.)
 Convert into single itemsets, then pairs. (with s = 3)

(1st itemsets)
Item   | Count
Bread  | 4
Coke   | 2
Cheese | 3
Milk   | 4
Diaper | 4
Eggs   | 1

(2nd itemsets)
Itemset        | Count
Bread, Milk    | 3
Bread, Cheese  | 2
Bread, Diaper  | 3
Milk, Cheese   | 2
Milk, Diaper   | 3
Cheese, Diaper | 3

Keep itemsets with count >= 3.
* Prune items COKE & EGGS
because the count < 3
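The count-prune-join loop above can be sketched as follows. This is an assumed minimal implementation (not the slides' own code); it counts candidates directly and skips the classic subset-pruning refinement of Apriori, which only affects efficiency, not the result:

```python
from itertools import combinations

# Dataset from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Cheese", "Diaper", "Bread", "Eggs"},
    {"Cheese", "Coke", "Diaper", "Milk"},
    {"Cheese", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]

def apriori(transactions, min_count=3):
    """Return {frozenset(itemset): count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Prune candidates below the minimum support count.
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        # Join step: build (k+1)-item candidates from frequent k-itemsets.
        keys = list(survivors)
        candidates = list({a | b for a, b in combinations(keys, 2)
                           if len(a | b) == k + 1})
        k += 1
    return frequent

for itemset, count in sorted(apriori(transactions).items(),
                             key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

Running this reproduces the tables above: Coke and Eggs are pruned at k = 1, four pairs survive at k = 2, and no 3-itemset reaches the minimum count of 3.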
Illustrating Apriori Algorithm
Principles (cont.)
 Support and Confidence. (with s = 3)

Rule              | Items | Transaction Count | Support (%) | Confidence (%) | Lift
Milk ==> Diaper   |   2   |         3         |     60      |       75       | 0.9
Diaper ==> Milk   |   2   |         3         |     60      |       75       | 0.9
Milk ==> Bread    |   2   |         3         |     60      |       75       | 0.9
Bread ==> Milk    |   2   |         3         |     60      |       75       | 0.9
Diaper ==> Cheese |   2   |         3         |     60      |       75       | 1.3
Cheese ==> Diaper |   2   |         3         |     60      |      100       | 1.3
Diaper ==> Bread  |   2   |         3         |     60      |       75       | 0.9
Bread ==> Diaper  |   2   |         3         |     60      |       75       | 0.9
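The rule table above can be derived from the frequent pairs. A sketch (helper names are my own), where lift = confidence(X ==> Y) / support(Y); note the slides round lift to one decimal, so 0.94 appears as 0.9 and 1.25 as 1.3:

```python
# Same dataset as the itemset tables above.
transactions = [
    {"Bread", "Milk"},
    {"Cheese", "Diaper", "Bread", "Eggs"},
    {"Cheese", "Coke", "Diaper", "Milk"},
    {"Cheese", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
n = len(transactions)

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(antecedent, consequent):
    both = count(antecedent | consequent)
    support = both / n
    confidence = both / count(antecedent)
    # Expected confidence is just support(consequent).
    lift = confidence / (count(consequent) / n)
    return support, confidence, lift

frequent_pairs = [("Milk", "Diaper"), ("Milk", "Bread"),
                  ("Diaper", "Cheese"), ("Diaper", "Bread")]
for x, y in frequent_pairs:
    for a, c in ((x, y), (y, x)):  # emit the rule in both directions
        s, conf, lift = rule_metrics({a}, {c})
        print(f"{a} ==> {c}: support={s:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

Only Cheese ==> Diaper reaches 100% confidence, because every one of Cheese's 3 transactions also contains Diaper.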
Interpreting Support and
Confidence
 Confidence measures the strength of a rule, whereas
support measures how often it occurs in the
database. For example, look at Diaper ==> Cheese.
With a confidence of 75%, this rule
holds 75% of the time it could: in 3 of the 4 transactions in which
Diaper occurs, Cheese occurs too. The support value of
60% indicates that the rule's items appear together in 60% of
all transactions.
Example(Association Rules)
Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

Rule      | Support | Confidence
A → D     |   2/5   |    2/3
C → A     |   2/5   |    2/4
A → C     |   2/5   |    2/3
B & C → D |   1/5   |    1/3
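These fractions can be verified with a short script (a sketch; exact arithmetic via the standard-library `fractions.Fraction`):

```python
from fractions import Fraction

# Transactions from the example above.
transactions = [{"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"},
                {"A", "D", "E"}, {"B", "C", "E"}]

def support(itemset):
    """Exact fraction of transactions containing every item in `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t),
                    len(transactions))

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for ante, cons in rules:
    print(sorted(ante), "->", sorted(cons),
          "support:", support(ante | cons),
          "confidence:", confidence(ante, cons))
```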
Implication?

                |  Checking Account  |
                |    No   |   Yes    | Total
Saving    No    |    500  |  3,500   |  4,000
Account   Yes   |  1,000  |  5,000   |  6,000
                                      10,000

Support(SVG → CK) = 5,000 / 10,000 = 50%
Confidence(SVG → CK) = 5,000 / 6,000 = 83%
Lift(SVG → CK) = 0.83 / 0.85 < 1
Apriori Algorithm Principles
(cont.)
•Lift is equal to the confidence factor divided by the
expected confidence.
•Lift is the factor by which the likelihood of the consequent
increases given the antecedent.
•Expected confidence is equal to the number of
transactions containing the consequent divided by the total
number of transactions.
•A credible rule has a large confidence factor, a large
level of support, and a lift value greater than 1.
•Rules having a high level of confidence but little
support should be interpreted with caution.
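Applying these definitions to the savings/checking table above (a sketch; variable names are my own):

```python
# Counts read off the savings/checking contingency table.
total = 10_000
svg_and_ck = 5_000   # saving = Yes AND checking = Yes
svg = 6_000          # saving = Yes (row total)
ck = 8_500           # checking = Yes (column total: 3,500 + 5,000)

support = svg_and_ck / total             # 0.50
confidence = svg_and_ck / svg            # ~0.83
expected_confidence = ck / total         # 0.85
lift = confidence / expected_confidence  # < 1: savings-account holders are
                                         # slightly LESS likely than average
                                         # to hold a checking account
print(support, confidence, lift)
```

Despite the high confidence (83%), the lift below 1 shows the rule adds no predictive value here, which is exactly the cautionary point of the bullets above.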