DATA MINING
Using Association Rules
by
Andrew Williamson
What is Data Mining?
• A.K.A. – Knowledge Discovery in DataBases (KDD)
• “Data mining is the automated extraction of hidden predictive
information from databases. Data mining software allows users to
analyze large databases. This can enable you to solve business
decision problems, for data mining as an extension of statistics;
while it doesn't solve your problems, it is the technology that can find
your problems. You can build a predictive model of an ideal
customer from your own databases and, using this information, build
your marketing strategy or even determine the creation of new
products and services.”
• - www.biz-services-savvy.com
• In short, data mining can be defined as
“the non-trivial extraction of implicit,
previously unknown, and potentially useful
information from data”.
An example of Association
“A simple example of data mining is its use in a retail
sales department. If a store tracks the purchases of
a customer and notices that a customer buys a lot of
silk shirts, the data mining system will make a
correlation between that customer and silk shirts…
Association example cont.
…The sales department will look at that information
and may begin direct mail marketing of silk shirts to
that customer, or it may alternatively attempt to get
the customer to buy a wider range of products. In
this case, the data mining system used by the retail
store discovered new information about the
customer that was previously unknown to the
company.”
• - wikipedia.org
Association Rules
• Association rules, written X → Y, help to
identify shopping trends over a given
period of time
• Each rule is evaluated by two measures:
• 1) Support
• 2) Confidence
Association Rules Cont.
• 1) Support
– The ratio of transactions that contain both
item X and item Y to all transactions.
• P(X ∪ Y), where X ∪ Y means the transaction
contains every item of X and of Y
– Measures the significance of the given rule.
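The support calculation can be sketched in Python as follows. This is a minimal illustration only: the five baskets and the function name `support` are assumptions made for the example and do not come from the presentation.

```python
# Hypothetical data: each set is the basket of one transaction.
transactions = [
    {"sleeping bag", "tent"},
    {"sleeping bag", "tent", "stove"},
    {"sleeping bag"},
    {"tent"},
    {"stove"},
]

def support(transactions, x, y):
    """Fraction of all transactions that contain every item of x and of y."""
    both = set(x) | set(y)
    hits = sum(1 for t in transactions if both <= t)
    return hits / len(transactions)

# 2 of the 5 baskets contain both a sleeping bag and a tent.
s = support(transactions, {"sleeping bag"}, {"tent"})
print(s)
```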
Association Rules..
• 2) Confidence
– Of the transactions that contain X, the ratio
that also contain Y.
• P(X ∪ Y) / P(X)
– Measures the strength of the correlation of the
given rule.
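A minimal Python sketch of the confidence calculation, under the same caveat: the baskets and the helper names `count` and `confidence` are hypothetical and chosen only for illustration.

```python
# Hypothetical data: each set is the basket of one transaction.
transactions = [
    {"sleeping bag", "tent"},
    {"sleeping bag", "tent", "stove"},
    {"sleeping bag"},
    {"tent"},
    {"stove"},
]

def count(transactions, items):
    """Number of transactions that contain every item in items."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(transactions, x, y):
    """Of the transactions containing x, the fraction that also contain y."""
    n_x = count(transactions, x)
    if n_x == 0:
        return 0.0  # x never occurs, so the rule has no evidence
    return count(transactions, set(x) | set(y)) / n_x

# 3 baskets contain a sleeping bag; 2 of those also contain a tent.
c = confidence(transactions, {"sleeping bag"}, {"tent"})
print(c)
```

Dividing two counts gives the same value as P(X ∪ Y) / P(X), because the total number of transactions cancels out.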
Association Rules…
• Association rules X → Y with support and
confidence of 50% or more are considered
meaningful and are therefore kept; otherwise
the rule is discarded or ignored.
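As a sketch of how the 50% cutoff might be applied, the Python below scores a few candidate rules and keeps only those that pass both thresholds. The baskets, the candidate list, and the names `MIN_SUPPORT`, `MIN_CONFIDENCE`, and `measures` are assumptions for illustration.

```python
# Hypothetical data: each set is the basket of one transaction.
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt"},
]

MIN_SUPPORT = 0.5      # the 50% threshold on support
MIN_CONFIDENCE = 0.5   # the 50% threshold on confidence

def measures(transactions, x, y):
    """Return (support, confidence) for the single-item rule x -> y."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence

candidates = [("shirt", "tie"), ("shirt", "belt"), ("belt", "shirt")]
kept = []
for x, y in candidates:
    s, c = measures(transactions, x, y)
    # A rule is kept only when both measures reach their thresholds.
    if s >= MIN_SUPPORT and c >= MIN_CONFIDENCE:
        kept.append((x, y))

print(kept)
```

Here the rule belt -> shirt has perfect confidence but appears in only one of four baskets, so it fails the support test and is discarded along with shirt -> belt.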
2 Phases of Association Rules
• When mining association rules, follow
these 2 phases:
• Phase 1
– Collect all itemsets with a high support ratio.
– A.K.A. the frequent itemsets.
• Phase 2
– From the frequent itemsets, collect all rules
with a high confidence ratio.
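The two phases can be sketched in Python as below. This is a brute-force enumeration meant only to show the idea, not an optimized mining algorithm; the baskets and the threshold values are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical data: each set is the basket of one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
]
MIN_SUPPORT = 0.5      # assumed threshold for phase 1
MIN_CONFIDENCE = 0.7   # assumed threshold for phase 2

def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Phase 1: collect every itemset whose support meets the threshold.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Phase 2: split each frequent itemset into X -> Y and keep the
# rules whose confidence meets the threshold.
rules = []
for itemset, s in frequent.items():
    for size in range(1, len(itemset)):
        for left in combinations(sorted(itemset), size):
            x = frozenset(left)
            y = itemset - x
            conf = s / support(x)
            if conf >= MIN_CONFIDENCE:
                rules.append((set(x), set(y), s, conf))

print(rules)
```

With this data, phase 1 finds five frequent itemsets, and phase 2 keeps only eggs -> bread, since every basket with eggs also contains bread.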
Extended Association Rules
• Standard association rules express a
correlation between values of a
single-dimensional schema.
• More meaningful associations may be
discovered by incorporating multi-dimensional schemas.
Extended Association Rules cont.
• A Simple Example:
– Sleeping bags → Tents
• This alone seems obvious and not useful.
– Sleeping bags → Tents (region=north,
season=summer)
• More meaningful, because it incorporates the
region and season of the transactions.
Extended Association Rules cont..
• More meaningful information may be
obtained by augmenting the original
association rules with a condition Z:
– X → Y (Z)
• Transactions that satisfy Z and contain X are
more likely to contain Y as well.
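One way to picture an extended rule is to compute the confidence of X -> Y twice: once over all transactions, and once over only those that satisfy Z. The Python sketch below does this; the record layout (`items`, `region`, `season`), the sample rows, and the function name `confidence` are assumptions for illustration.

```python
# Hypothetical data: each transaction carries its items plus two extra dimensions.
transactions = [
    {"items": {"sleeping bag", "tent"}, "region": "north", "season": "summer"},
    {"items": {"sleeping bag", "tent"}, "region": "north", "season": "summer"},
    {"items": {"sleeping bag"}, "region": "south", "season": "winter"},
    {"items": {"sleeping bag"}, "region": "south", "season": "summer"},
    {"items": {"tent"}, "region": "north", "season": "winter"},
]

def confidence(rows, x, y, z=None):
    """Confidence of x -> y among rows satisfying z (all rows if z is None)."""
    if z is not None:
        rows = [r for r in rows if all(r[k] == v for k, v in z.items())]
    with_x = [r for r in rows if x in r["items"]]
    if not with_x:
        return 0.0  # no matching transaction contains x
    return sum(1 for r in with_x if y in r["items"]) / len(with_x)

plain = confidence(transactions, "sleeping bag", "tent")
conditioned = confidence(
    transactions, "sleeping bag", "tent",
    z={"region": "north", "season": "summer"},
)
print(plain, conditioned)
```

In this made-up data the rule holds for half of the sleeping-bag purchases overall, but for all of them in the north during summer, which is the kind of difference the condition Z is meant to expose.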
Works Cited
• www.wikipedia.org
• http://www.biz-services-savvy.com
• http://sba.luc.edu
• http://www-128.ibm.com