DataMining05

Database Management Systems:
Data Mining
Market Baskets
Association Rules
Jerry Post
Copyright © 2003
Association/Market Basket
• Examples
  - What items are customers likely to buy together?
  - What Web pages are closely related?
  - Others?
• Classic (early) example:
  - Analysis of convenience store data showed that customers often buy diapers and beer together.
  - Importance: Consider placing the two together to increase cross-selling.
Association Challenges
• If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
Item            Freq.    Item            Freq.
1" nails        2%       Hardware        15%
2" nails        1%       Dim. Lumber     20%
3" nails        1%       Plywood         15%
4" nails        2%       Finish lumber   15%
                         Lumber          50%
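Rolling items up into categories can be sketched in a few lines of Python. The mapping below is an assumption made for illustration (the slide does not say which category each item belongs to), and the baskets are made up.

```python
# Hypothetical item-to-category mapping, loosely following the table above.
# Assumption: nails roll up to "Hardware" and the wood products to "Lumber".
CATEGORY = {
    '1" nails': "Hardware",
    '2" nails': "Hardware",
    '3" nails': "Hardware",
    '4" nails': "Hardware",
    "Dim. lumber": "Lumber",
    "Plywood": "Lumber",
    "Finish lumber": "Lumber",
}


def to_categories(basket):
    """Replace each item with its category; unmapped items are kept as-is."""
    return {CATEGORY.get(item, item) for item in basket}


# Made-up baskets, for illustration only.
baskets = [
    {'1" nails', "Plywood"},
    {'3" nails', "Dim. lumber"},
    {'4" nails', "Finish lumber", "Paint"},
]
rolled_up = [to_categories(b) for b in baskets]
```

After the roll-up, every basket contains both Hardware and Lumber, so the pair shows up at the category level even though no single nail size and lumber type appear together more than once.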
• Some relationships are obvious.
  - Burger and fries.
• Some relationships are meaningless.
  - Hardware store found that toilet rings sell well only when a new store first opens. But what does it mean?
Association Measure: Confidence
• Does A ⇒ B?
• If a customer purchases A, will they purchase B?

\[
\text{confidence}(A \Rightarrow B) = \frac{\#\ \text{baskets containing both } A \text{ and } B}{\#\ \text{baskets containing } A}
\]
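A minimal Python sketch of the confidence ratio, assuming each basket is stored as a set of item names. The function name `confidence` and the baskets are hypothetical and not taken from the slides.

```python
def confidence(baskets, a, b):
    """Share of the baskets containing `a` that also contain `b`."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        raise ValueError(f"no basket contains {a!r}")
    with_both = [basket for basket in with_a if b in basket]
    return len(with_both) / len(with_a)


# Made-up baskets, for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
]
conf = confidence(baskets, "diapers", "beer")  # 2 of the 3 diaper baskets
```

The basket holding only beer does not affect the result, because the denominator counts only baskets that contain A.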
Association Measure: Support
• Does the existing data support the rule?
• What percentage of baskets contain both A and B?

\[
\text{Support}(A \Rightarrow B) = \frac{\#\ \text{baskets containing both } A \text{ and } B}{\#\ \text{baskets}}
\]
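Support can be sketched the same way, again assuming set-valued baskets. The helper name `support` and the data are hypothetical; passing a single item gives that item's own frequency.

```python
def support(baskets, *items):
    """Fraction of all baskets that contain every one of `items`."""
    if not baskets:
        raise ValueError("need at least one basket")
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= set(basket))
    return hits / len(baskets)


# Made-up baskets, for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
]
both = support(baskets, "diapers", "beer")  # 2 of 4 baskets
```

Unlike confidence, the denominator here is the total number of baskets, so a rule built on a rarely purchased item gets low support even when its confidence is high.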
Association Measure: Lift
• How does the association rule compare to the null hypothesis (A and B are purchased independently of each other)?
• What is the likelihood of finding the second item (B) in any random basket?

\[
\text{Lift}(A \Rightarrow B) = \frac{\text{Support}(A \text{ and } B)}{\text{Support}(A) \cdot \text{Support}(B)} = \frac{P(A \cap B)}{P(A)\,P(B)} = \frac{P(B \mid A)}{P(B)}
\]
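The two probability forms of lift agree because of the definition of conditional probability, P(B | A) = P(A ∩ B) / P(A). A short derivation, with the independence case added as a reference point:

```latex
\[
\frac{P(A \cap B)}{P(A)\,P(B)}
  = \frac{P(A \cap B) / P(A)}{P(B)}
  = \frac{P(B \mid A)}{P(B)}
\]
% If A and B are independent, P(A \cap B) = P(A)\,P(B), so
\[
\text{Lift}(A \Rightarrow B) = \frac{P(A)\,P(B)}{P(A)\,P(B)} = 1
\]
```

A lift of 1 is therefore the baseline. Because the first form is symmetric in A and B, Lift(A ⇒ B) equals Lift(B ⇒ A), which is not true of confidence in general.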
Association Details (two items)
• Rule evaluation (A implies B)
  - Support for the rule is measured by the percentage of all transactions containing both items: P(A ∩ B)
  - Confidence of the rule is measured by the share of transactions with A that also contain B: P(B | A)
  - Lift is the potential gain attributed to the rule: the effect compared to other baskets without the effect. If it is greater than 1, the effect is positive:
    - P(A ∩ B) / ( P(A) P(B) )
    - P(B|A) / P(B)
• Example: Diapers implies Beer
  - Given: P(D) = .7, P(B) = .5
  - Support: P(D ∩ B) = .6
  - Confidence: P(B|D) = P(D ∩ B)/P(D) = .6/.7 = .857
  - Lift: P(B|D) / P(B) = .857 / .5 = 1.714
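The arithmetic in the diapers and beer example can be checked directly from the three probabilities given on the slide. The variable names are chosen here for illustration.

```python
# Probabilities taken from the slide's example.
p_d = 0.7        # P(D): basket contains diapers
p_b = 0.5        # P(B): basket contains beer
p_d_and_b = 0.6  # P(D ∩ B): basket contains both

support = p_d_and_b            # P(D ∩ B)
confidence = p_d_and_b / p_d   # P(B | D)
lift = confidence / p_b        # P(B | D) / P(B)

print(round(support, 3), round(confidence, 3), round(lift, 3))
```

The lift is above 1, so in this example beer is more likely to appear in a basket with diapers than in a random basket.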
Example (Marakas)
Transaction data
1. Frozen pizza, cola, milk
2. Milk, potato chips
3. Cola, frozen pizza
4. Milk, pretzels
5. Cola, pretzels
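As a sketch, the three measures can be computed for every ordered pair of items in these five transactions. The helper names `prob` and `measures` are hypothetical, and the choice to list all pairs is made here for illustration; the slide itself gives only the transaction data.

```python
from itertools import permutations

# The five transactions listed above, stored as sets.
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def prob(*items):
    """Fraction of transactions containing all of the given items."""
    wanted = set(items)
    return sum(wanted <= t for t in transactions) / len(transactions)


def measures(a, b):
    """Return (support, confidence, lift) for the rule a => b."""
    support = prob(a, b)
    confidence = support / prob(a)
    lift = confidence / prob(b)
    return support, confidence, lift


items = sorted(set().union(*transactions))
for a, b in permutations(items, 2):
    s, c, l = measures(a, b)
    if s > 0:  # skip pairs that never occur together
        print(f"{a} => {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

For example, frozen pizza ⇒ cola has support 2/5, confidence 2/2, and lift 1 / (3/5), about 1.67, since both pizza baskets also contain cola.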