DataMining05
Database Management Systems:
Data Mining
Market Baskets
Association Rules
Jerry Post
Copyright © 2003
Association/Market Basket
Examples
What items are customers likely to buy together?
What Web pages are closely related?
Others?
Classic (early) example:
Analysis of convenience store data showed customers often buy
diapers and beer together.
Importance: Consider placing the two items together to increase cross-selling.
Association Challenges
If an item is rarely purchased, any other item bought with it
seems important. So combine items into categories.
Item        Freq.    Item            Freq.
1" nails    2%       Hardware        15%
2" nails    1%       Dim. Lumber     20%
3" nails    1%       Plywood         15%
4" nails    2%       Finish lumber   15%
                     Lumber          50%
Some relationships are obvious.
Burger and fries.
Some relationships are meaningless.
Hardware store found that toilet rings sell well only when a new
store first opens. But what does it mean?
Association Measure: Confidence
Does A → B?
If a customer purchases A, will they purchase B?
confidence(A → B) = (# baskets containing both A and B) / (# baskets containing A)
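The confidence ratio can be sketched in a few lines of Python. The function name `confidence` and the sample baskets are assumptions made for illustration; each basket is modeled as a set of item names.

```python
def confidence(baskets, a, b):
    """Share of the baskets containing `a` that also contain `b`."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return None  # the rule cannot be evaluated: A never occurs
    with_both = [basket for basket in with_a if b in basket]
    return len(with_both) / len(with_a)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
]

# Two of the three diaper baskets also contain beer.
print(confidence(baskets, "diapers", "beer"))
```

Returning `None` when A never appears is one possible design choice; it avoids a division by zero and signals that the data says nothing about the rule.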
Association Measure: Support
Does the existing data support the rule?
What percentage of baskets contain both A and B?
Support(A → B) = (# baskets containing both A and B) / (# baskets)
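As a rough sketch, support is a count over all baskets rather than only those containing A. The function name `support` and the baskets below are hypothetical.

```python
def support(baskets, *items):
    """Fraction of all baskets that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in baskets if wanted <= set(basket))
    return hits / len(baskets)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer"},
]

print(support(baskets, "diapers", "beer"))  # 2 of 4 baskets hold both items
print(support(baskets, "milk"))             # support of a single item
```

Accepting any number of items lets the same function return P(A), P(B), or P(A ∩ B), which is convenient when the other measures are built from those quantities.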
Association Measure: Lift
How does the association rule compare to the null hypothesis
(that item A has no effect on whether item B is purchased)?
What is the likelihood of finding the second item (B) in any
random basket?
Lift(A → B) = Support(A and B) / ( Support(A) * Support(B) )
            = P(A ∩ B) / ( P(A) P(B) )
            = P(B | A) / P(B)
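A small sketch of the lift calculation, assuming baskets are sets of item names. The function name `lift` and both data sets are hypothetical; they are chosen so that one shows no association (lift of 1) and the other a positive one.

```python
def lift(baskets, a, b):
    """P(A and B) / (P(A) * P(B)), estimated from basket counts."""
    n = len(baskets)
    p_a = sum(a in basket for basket in baskets) / n
    p_b = sum(b in basket for basket in baskets) / n
    p_ab = sum(a in basket and b in basket for basket in baskets) / n
    if p_a == 0 or p_b == 0:
        return None  # lift is undefined when either item never occurs
    return p_ab / (p_a * p_b)


# Hypothetical data: A and B co-occur exactly as often as chance predicts.
independent = [{"A", "B"}, {"A"}, {"B"}, set()]

# Hypothetical data: A and B always appear together.
associated = [{"A", "B"}, {"A", "B"}, set(), set()]

print(lift(independent, "A", "B"))  # 0.25 / (0.5 * 0.5)
print(lift(associated, "A", "B"))   # 0.5 / (0.5 * 0.5)
```

The formula is symmetric in A and B, so `lift(baskets, a, b)` and `lift(baskets, b, a)` give the same value, unlike confidence.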
Association Details (two items)
Rule evaluation (A implies B)
Support for the rule is measured by the percentage of all
transactions containing both items: P(A ∩ B)
Confidence of the rule is measured by the transactions with A that
also contain B: P(B | A)
Lift is the potential gain attributed to the rule: the effect compared
to other baskets without the effect. If it is greater than 1, the effect
is positive:
Lift = P(A ∩ B) / ( P(A) P(B) ) = P(B|A) / P(B)
Example: Diapers implies Beer
Given: P(D) = .7, P(B) = .5
Support: P(D ∩ B) = .6
Confidence: P(B|D) = P(D ∩ B)/P(D) = .6/.7 = .857
Lift: P(B|D) / P(B) = .857 / .5 = 1.714
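The arithmetic in the diapers and beer example can be checked with a short script. The variable names are assumptions; the three input probabilities are the ones given on the slide. The script also tests whether the inputs are mutually consistent, since a joint probability can never exceed either individual probability; here P(D ∩ B) = .6 is larger than P(B) = .5, so the figures are best read as illustrative rather than as real data.

```python
# Inputs as given in the example.
p_d = 0.7         # P(D): baskets containing diapers
p_b = 0.5         # P(B): baskets containing beer
p_d_and_b = 0.6   # P(D and B): baskets containing both

support = p_d_and_b
confidence = p_d_and_b / p_d          # P(B | D)
lift = confidence / p_b               # P(B | D) / P(B)
alt_lift = p_d_and_b / (p_d * p_b)    # equivalent form of lift

# A joint probability cannot exceed either marginal probability.
consistent = p_d_and_b <= min(p_d, p_b)

print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
print("inputs mutually consistent:", consistent)
```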
Example (Marakas)
Transaction data
1. Frozen pizza, cola, milk
2. Milk, potato chips
3. Cola, frozen pizza
4. Milk, pretzels
5. Cola, pretzels
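As a sketch of how the three measures apply to these five transactions, the script below evaluates every ordered pair of items. The function name `rule_metrics` and the lower-cased item names are assumptions; the transactions themselves are the ones listed above.

```python
from itertools import permutations

# The five transactions listed above, one set per basket.
transactions = [
    {"frozen pizza", "cola", "milk"},
    {"milk", "potato chips"},
    {"cola", "frozen pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]


def rule_metrics(transactions, a, b):
    """Support, confidence, and lift for the rule a -> b."""
    n = len(transactions)
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    return {
        "support": n_ab / n,
        "confidence": n_ab / n_a,
        "lift": (n_ab * n) / (n_a * n_b),
    }


# Every item appears in at least one transaction, so n_a and n_b are nonzero.
items = sorted(set().union(*transactions))
rules = {
    (a, b): rule_metrics(transactions, a, b)
    for a, b in permutations(items, 2)
}

# Print the rules that occur at least once, strongest lift first.
for (a, b), m in sorted(rules.items(), key=lambda kv: -kv[1]["lift"]):
    if m["support"] > 0:
        print(f"{a} -> {b}: support={m['support']:.2f} "
              f"confidence={m['confidence']:.2f} lift={m['lift']:.2f}")
```

For example, frozen pizza appears in two baskets and both also contain cola, so the rule frozen pizza → cola has support 2/5, confidence 2/2, and lift (2 × 5) / (2 × 3).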