Data Mining & Association Rule

Association Rule
By Kenneth Leung
Data Mining

The process of extracting valid, previously
unknown, comprehensible, and actionable
information from large databases, and using
it to make crucial business decisions.

Make decisions based on previous
experience or observation.
Association Rule Mining

Formal: To find interesting associations
and/or correlation relationships among
large sets of data items. Association
rules show attribute value conditions
that occur frequently together in a given dataset.
Example: Obesity => Diabetes

Informal: An “If – Then” relationship. If this happens, what is
most likely to happen next?
Market Basket Analysis
A typical and widely-used example of
association rule mining.
Example:

Data are collected using bar-code scanners in supermarkets.

Each record will consist of all items in a single purchase
transaction.

Managers would be interested to know if certain groups of
items are consistently purchased together.

They could use this data for adjusting store layouts (placing
items optimally with respect to each other), for cross-selling,
for promotions, for catalog design and to identify customer
segments based on buying patterns.
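
As a rough illustration of how such co-purchases might be surfaced, the following Python sketch counts how often each pair of items appears in the same transaction. The sample baskets and the function name `count_item_pairs` are invented for illustration; they are not taken from any real supermarket data.

```python
from collections import Counter
from itertools import combinations


def count_item_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort the de-duplicated items so each pair has one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical baskets, invented for this sketch.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
]

pair_counts = count_item_pairs(baskets)
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Pairs with high counts are candidates for the layout and cross-selling decisions mentioned above.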
Famous & Interesting Finding

Beer & Diaper
“A number of convenience store clerks noticed that
men often bought beer at the same time they
bought diapers. The store mined its receipts and
proved the clerks' observations correct. So, the
store began stocking diapers next to the beer
coolers, and sales skyrocketed.”
Why Beer and Diapers?
Moms are stressed out by
their naughty babies, and they
need some beer for relief?
Diaper boxes are handy for holding old
beer bottles: environmentally
friendly and easy to handle.
Two Certainty Indices

Determine whether a rule is good

Support of an association rule (AR) X => Y: percentage of
all transactions that contain both X and Y (X and Y are two items)

Confidence of an AR X => Y: ratio of the number of
transactions that contain both X and Y to the
number that contain X

The higher both values are, the more reliable the rule.
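
The two definitions can be sketched in Python as follows. This is a minimal sketch, assuming each transaction is represented as a set of item names; the function names and the sample transactions are hypothetical and only illustrate the definitions.

```python
def support(transactions, x, y):
    """Fraction of all transactions that contain both x and y."""
    if not transactions:
        raise ValueError("no transactions given")
    both = sum(1 for t in transactions if x in t and y in t)
    return both / len(transactions)


def confidence(transactions, x, y):
    """Among transactions containing x, the fraction that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    both = sum(1 for t in with_x if y in t)
    return both / len(with_x)


# Hypothetical transactions, invented for this sketch.
transactions = [
    {"beer", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
]

print(support(transactions, "beer", "diapers"))     # 2 of 4 transactions
print(confidence(transactions, "beer", "diapers"))  # 2 of the 3 beer transactions
```

Note that support is symmetric in X and Y, while confidence is not: it is measured only over the transactions that contain X.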
Example: Support

Supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Support for the rule “beer->diapers” is
800/100,000 = 0.008, or 0.8%.
Example: Confidence

Supermarket has 100,000 transactions.
2,000 of the 100,000 transactions include beer.
800 of those 2,000 transactions also contain diapers.
Confidence for the rule “beer->diapers”
is 800/2,000 = 0.4, or 40%.
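
The arithmetic in the two beer and diapers examples can be checked with a few lines of Python. The counts come from the examples; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Counts taken from the supermarket example.
total_transactions = 100_000
beer_transactions = 2_000
beer_and_diapers = 800

# Support: share of ALL transactions that contain both items.
rule_support = Fraction(beer_and_diapers, total_transactions)
# Confidence: share of BEER transactions that also contain diapers.
rule_confidence = Fraction(beer_and_diapers, beer_transactions)

print(f"support    = {float(rule_support):.3f} ({float(rule_support):.1%})")
print(f"confidence = {float(rule_confidence):.3f} ({float(rule_confidence):.1%})")
```

Both measures share the numerator 800; only the denominator differs, which is why the confidence is so much larger than the support.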
Full example from Wiki
1. {Cold, Raining} => No
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good
2. {Calm, Dry} => Yes
Support: 2/5 = 40%
Confidence: 2/2 = 100%
=> Good
3. {Dry} => No
Support: 1/5 = 20%
Confidence: 1/3 = 33.3%
=> Bad
4. {Windy} => No
Support: 0/5 = 0%
Confidence: 1/1 = 100%
=> Bad
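
The slides do not say what makes a rule Good or Bad. One reading consistent with the four verdicts is that a rule must clear both a minimum support and a minimum confidence. The sketch below assumes thresholds of 30% and 50%; these values are chosen only to reproduce the verdicts and do not come from the source.

```python
from fractions import Fraction

# Assumed thresholds; the source does not state any.
MIN_SUPPORT = Fraction(30, 100)
MIN_CONFIDENCE = Fraction(50, 100)


def judge(support, confidence):
    """Label a rule Good only if it clears both assumed thresholds."""
    if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
        return "Good"
    return "Bad"


# Support and confidence values copied from the four rules in the example.
rules = {
    "{Cold, Raining} => No": (Fraction(2, 5), Fraction(2, 2)),
    "{Calm, Dry} => Yes": (Fraction(2, 5), Fraction(2, 2)),
    "{Dry} => No": (Fraction(1, 5), Fraction(1, 3)),
    "{Windy} => No": (Fraction(0, 5), Fraction(1, 1)),
}

verdicts = {rule: judge(s, c) for rule, (s, c) in rules.items()}
for rule, verdict in verdicts.items():
    print(f"{rule}: {verdict}")
```

Under this reading, rule 4 is Bad despite its 100% confidence because its support falls below the assumed minimum.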
References

http://www.resample.com/xlminer/help/Assocrules/associationrules_intro.htm
http://en.wikipedia.org/wiki/Association_rule_learning
Dr. Sin-Min Lee’s lecture 30