Data Mining: Association Rules by Thanh Truong


Data Mining: Association Rules
By: Thanh Truong
Association Rules
With association rules, we look at the
associations between different items and draw
conclusions from them.
In sales, we look at purchases.
Examples from the book:

Someone who buys bread is quite likely to also buy milk.
Someone who buys the book Database System
Concepts is quite likely also to buy the book
Operating System Concepts.
Association Uses
When a customer buys a book, an online
shop may suggest associated books.
(cont…)
Grocery stores can place
associated items next to each other.
OR
They can place them at opposite ends of the
aisle, with other associated items in
between.
The store can offer a discount on one of the
associated items, but not on the other.
Association Notation
Association rules are statements of the
form

{X1, X2,…, Xn} => Y
Meaning: if we find all of X1, X2,…, Xn, then we
will most likely find Y.
Population & Instance
An association rule must have an
associated population.
The population consists of a set of
instances.
In the grocery example:

The population may be all grocery-store
purchases.
Each instance is a single purchase.
Support
Support is the measure of what fraction of
the population satisfies both the antecedent
and the consequent of the rule.
For example, if only .0001 percent of all
purchases include both milk and screwdrivers,
then the support is low for the rule

milk => screwdrivers
If 50% of all purchases include both diapers and beer,
then the support for diapers => beer is high.
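The definition of support can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `purchases` list and the `support` helper are made up for this example and are not taken from the book.

```python
# Hypothetical population: each instance is one purchase (a set of items).
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"diapers", "beer"},
    {"bread", "butter"},
    {"milk", "screwdrivers"},
]

def support(antecedent, consequent, population):
    """Fraction of instances that contain every item on both sides of the rule."""
    items = set(antecedent) | set(consequent)
    matching = sum(1 for instance in population if items <= instance)
    return matching / len(population)

# Support of bread => milk and of milk => screwdrivers in the toy data.
bread_milk = support({"bread"}, {"milk"}, purchases)
milk_screwdrivers = support({"milk"}, {"screwdrivers"}, purchases)
print(bread_milk, milk_screwdrivers)
```

Note that support depends only on how many purchases contain all the items of the rule, so bread => milk and milk => bread have the same support.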
Confidence
Confidence is a measure of how often
the consequent is true when the
antecedent is true.
bread => milk has a confidence of 80%
if 80% of the purchases that include
bread also include milk.
A rule with low confidence is not
meaningful.
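A minimal sketch of the confidence calculation, assuming a small made-up population of purchases. The `purchases` data and the `confidence` helper are hypothetical and chosen so that bread => milk comes out at the 80% used in the example above.

```python
# Hypothetical population: five purchases include bread, four of those include milk.
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "cereal"},
]

def confidence(antecedent, consequent, population):
    """Among instances satisfying the antecedent, the fraction that also
    satisfy the consequent. Returns None if no instance has the antecedent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [p for p in population if antecedent <= p]
    if not with_antecedent:
        return None
    with_both = [p for p in with_antecedent if consequent <= p]
    return len(with_both) / len(with_antecedent)

bread_milk_conf = confidence({"bread"}, {"milk"}, purchases)
milk_bread_conf = confidence({"milk"}, {"bread"}, purchases)
print(bread_milk_conf, milk_bread_conf)
```

Unlike support, confidence is directional: in this toy data bread => milk and milk => bread share the same support but have different confidence, because five purchases include milk and five include bread, yet only four include both.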
Other Types of Association
In statistical terms, we can look for
correlations between items.
With this approach, if purchases of bread are not
correlated with purchases of cereal, the pair would
not be reported, even if there were a strong
association between the two.
Association rule: {bread, butter} => jam
Correlation (negative): Someone who buys tea is
unlikely to buy coffee
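One way to make the difference between association and correlation concrete is the lift measure: the observed fraction of purchases containing both items, divided by the fraction expected if the two items were bought independently. Lift is not named in the slides; it is used here only as one common choice, and the `purchases` data and `lift` helper are hypothetical.

```python
# Hypothetical population of ten purchases.
purchases = [
    {"tea", "bread", "cereal"},
    {"tea", "bread", "cereal"},
    {"tea", "bread"},
    {"tea", "coffee", "bread"},
    {"coffee", "bread"},
    {"coffee", "cereal"},
    {"coffee", "cereal"},
    {"coffee"},
    {"coffee"},
    {"coffee"},
]

def fraction_with(items, population):
    """Fraction of instances that contain all of the given items."""
    items = set(items)
    return sum(1 for p in population if items <= p) / len(population)

def lift(a, b, population):
    """Observed co-occurrence divided by the co-occurrence expected under
    independence. Near 1: no correlation; below 1: negative; above 1: positive."""
    expected = fraction_with({a}, population) * fraction_with({b}, population)
    return fraction_with({a, b}, population) / expected

tea_coffee = lift("tea", "coffee", purchases)      # below 1: negative correlation
bread_cereal = lift("bread", "cereal", purchases)  # about 1: no correlation
print(tea_coffee, bread_cereal)
```

In this toy data bread and cereal appear together in a fifth of all purchases, yet that is exactly what independent buying would predict, so a correlation-based search would not report the pair.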
(cont…)
Sequence associations apply to time-series
data, such as stock prices on a
sequence of days.
An example is the following rule:
“Whenever bond rates go up, the stock
prices go down within 2 days”
Rules like this can help in making investment
decisions.
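A sequence rule like the one above can be checked against historical data. The sketch below makes several assumptions: the two daily series are invented, and "go down within 2 days" is interpreted as the price on one of the next two days being lower than on the day the rate rose. The `rule_confidence` helper is hypothetical.

```python
# Hypothetical daily series (same length, one value per day).
bond_rates = [3.0, 3.1, 3.1, 3.0, 3.2, 3.2, 3.3, 3.3]
stock_prices = [100, 101, 99, 100, 102, 103, 104, 98]

def rule_confidence(rates, prices, window=2):
    """Among days on which the rate rose, the fraction followed by a lower
    price within `window` days. Returns None if the rate never rose."""
    rises = 0
    followed = 0
    for day in range(1, len(rates)):
        if rates[day] <= rates[day - 1]:
            continue  # rate did not go up on this day
        last = min(day + window, len(prices) - 1)
        later = range(day + 1, last + 1)
        if not later:
            continue  # rise on the final day: nothing to observe afterwards
        # A rise near the end is judged on the days that are available.
        rises += 1
        if any(prices[d] < prices[day] for d in later):
            followed += 1
    return followed / rises if rises else None

result = rule_confidence(bond_rates, stock_prices)
print(result)
```

Here the rate rises on three days and the price falls within the window after two of them, so the rule holds two times out of three in this toy data.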
References
Silberschatz, Korth, and Sudarshan,
Database System Concepts, Fifth Edition