Market Basket Analysis

By Sowjanya Alaparthi
Topics to be discussed
• Introduction to Market Basket Analysis
• Apriori Algorithm
• Demo-1 (Using a self-created table)
• Demo-2 (Using the Oracle sample schema)
• Demo-3 (Using the OLAP Analytic Workspace)
Introduction to Market Basket Analysis
• Definition: Market Basket Analysis (also called Association Analysis) is a mathematical modeling technique based on the theory that if you buy a certain group of items, you are more likely to buy another group of items.
• It is used to analyze customer purchasing behavior from point-of-sale transaction data, which helps to increase sales and manage inventory.
• Given a transaction dataset, the Apriori algorithm identifies frequent product baskets (itemsets) and the association rules between products.
Definitions and Terminology
• Transaction: a set of items (an itemset) purchased together, such as one customer's basket.
• Confidence: a measure of the certainty, or trustworthiness, of each discovered rule. For a rule A => B, it is the percentage of transactions containing A that also contain B.
• Support: a measure of how often the items in an association occur together, expressed as a percentage of all transactions.
• Frequent itemset: an itemset that satisfies the minimum support threshold.
• Strong association rules: rules that satisfy both a minimum support threshold and a minimum confidence threshold.
• In association rule mining, we first find all frequent itemsets and then generate strong association rules from those frequent itemsets.
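To make the two measures concrete, here is a minimal Python sketch of support and confidence over a small, hypothetical set of five baskets (the item names and the thresholds used below are illustrative only). Support counts the transactions that contain every item of an itemset; the confidence of A => B divides the count for A and B together by the count for A.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical sample data: each transaction (basket) is a set of items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of A => B: count(A and B together) / count(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support_count(both, transactions) / support_count(a, transactions)


def is_strong(antecedent, consequent, transactions, min_sup, min_conf) -> bool:
    """A rule is strong if it meets both the support and confidence thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)
```

With min_sup = 0.5 and min_conf = 0.7, diapers => beer has support 0.6 and confidence 0.75, so it is a strong rule; bread => beer has support 0.4 and is not.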
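Definitions and Terminology (Continued)
• The Apriori algorithm is the most established algorithm for mining frequent itemsets.
• The basic principle of Apriori is “Any subset of a frequent itemset must be frequent”. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, so those supersets never need to be counted.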
• We use these frequent itemsets to
generate association rules.
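The pruning test implied by this principle can be sketched in Python as follows; the function name has_infrequent_subset and the sample L2 are hypothetical. A candidate k-itemset is discarded as soon as one of its (k-1)-item subsets is missing from the previous frequent set, without scanning the database.

```python
from itertools import combinations
from typing import FrozenSet, Set


def has_infrequent_subset(candidate: FrozenSet[str],
                          frequent_prev: Set[FrozenSet[str]]) -> bool:
    """Return True if some (k-1)-subset of the k-itemset is not frequent.

    By the Apriori property such a candidate cannot be frequent, so it can
    be pruned without counting it in the database.
    """
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


# Hypothetical frequent 2-itemsets (L2) from an earlier pass.
L2 = {frozenset(pair) for pair in [("bread", "milk"), ("bread", "diapers"),
                                   ("milk", "diapers"), ("diapers", "beer")]}
```

Here {bread, milk, diapers} is kept, because all three of its 2-item subsets are in L2, while {bread, diapers, beer} is pruned, because {bread, beer} is not.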
Apriori Algorithm
Ck: set of candidate itemsets of size k
Lk: set of frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
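The pseudocode above translates roughly into the following Python sketch. It is an illustrative in-memory version, not the implementation used by Oracle Data Miner. For brevity, the join step unions every pair of frequent k-itemsets and keeps the unions of size k+1 (rather than the sorted-prefix join usually described), then applies the Apriori pruning test before counting. The sample baskets and the min_support values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical sample baskets (same shape as point-of-sale transactions).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset mapped to its support count.

    min_support is a fraction of all transactions (0.6 means 60%).
    """
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(baskets)

    # L1: count every single item and keep those that meet min_support.
    counts: Dict[FrozenSet[str], int] = {}
    for t in baskets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 1
    while current:  # repeat while Lk is not empty
        # Join: union pairs of frequent k-itemsets that differ in one item.
        prev = list(current)
        candidates: Set[FrozenSet[str]] = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every k-subset of a frequent (k+1)-itemset must be frequent.
                if len(union) == k + 1 and all(
                        frozenset(sub) in current
                        for sub in combinations(union, k)):
                    candidates.add(union)

        # One database scan: count the candidates contained in each transaction.
        cand_counts = {c: 0 for c in candidates}
        for t in baskets:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1

        current = {s: c for s, c in cand_counts.items() if c / n >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

With min_support = 0.6 (3 of 5 baskets), apriori(TRANSACTIONS, 0.6) finds four frequent 1-itemsets and four frequent 2-itemsets. The candidate {bread, milk, diapers} survives pruning but appears in only 2 baskets, so L3 is empty and the loop stops.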
Pictorial representation of the Apriori algorithm
Step 1: Scan the transaction database to get the support S of each 1-itemset, compare S with min_sup, and get the set of frequent 1-itemsets, L1.
Step 2: Use Lk-1 join Lk-1 to generate the candidate k-itemsets, and use the Apriori property to prune the infrequent k-itemsets from this set.
Step 3: Scan the transaction database to get the support S of each candidate k-itemset in the final (pruned) set, compare S with min_sup, and get the set of frequent k-itemsets, Lk.
Step 4: Is the candidate set empty? If NO, increment k and return to Step 2. If YES, go to Step 5.
Step 5: For each frequent itemset l, generate all nonempty subsets of l.
Step 6: For every nonempty proper subset s of l, output the rule “s => (l - s)” if the confidence C of the rule, C = support(l) / support(s), is at least min_conf.
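Steps 5 and 6 can be sketched in Python as below, assuming the frequent itemsets and their support counts are already available (for example, from an Apriori pass). The FREQUENT dictionary is hypothetical sample input. Because support(l) / support(s) is a ratio over the same transactions, raw counts can be used in place of percentages.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (s, l - s, confidence)

# Hypothetical frequent itemsets with their support counts (5 transactions).
FREQUENT: Dict[FrozenSet[str], int] = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"diapers"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"bread", "milk"}): 3,
    frozenset({"bread", "diapers"}): 3,
    frozenset({"milk", "diapers"}): 3,
    frozenset({"diapers", "beer"}): 3,
}


def generate_rules(frequent: Dict[FrozenSet[str], int],
                   min_conf: float) -> List[Rule]:
    """Output s => (l - s) for each frequent l and nonempty proper subset s
    whose confidence support(l) / support(s) is at least min_conf."""
    rules: List[Rule] = []
    for l, l_count in frequent.items():
        # Step 5: enumerate the nonempty proper subsets of l.
        for size in range(1, len(l)):
            for items in combinations(l, size):
                s = frozenset(items)
                # Every subset of a frequent itemset is frequent, so s is a key.
                conf = l_count / frequent[s]
                # Step 6: keep the rule if it meets the confidence threshold.
                if conf >= min_conf:
                    rules.append((s, l - s, conf))
    return rules
```

With min_conf = 0.8 only beer => diapers (confidence 3/3 = 1.0) is kept; lowering min_conf to 0.7 also admits the seven rules whose confidence is 3/4 = 0.75.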
Demo-1 Installations
• Oracle 10g Enterprise Edition
• SQL*Plus
• Oracle Data Miner Client
Demo-1 Data Preparation
• Download the sample data, which is in an Excel sheet.
• Write a macro to convert the data in the Excel sheet into INSERT statements.
• Create a table and execute these INSERT statements in SQL*Plus.
• Because SQL*Plus is connected to the Oracle server, the table is then stored in the Oracle database.
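As an alternative to the Excel macro, the conversion step can be sketched in Python, assuming the sheet has first been saved as CSV with a header row of column names. The table name basket_items and the columns trans_id and item are hypothetical. Every value is emitted as a quoted string literal (embedded single quotes are doubled), so numeric columns rely on Oracle's implicit conversion.

```python
import csv
import io
from typing import List


def sql_literal(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def csv_to_inserts(csv_text: str, table: str) -> List[str]:
    """Convert CSV text (first row = column names) into INSERT statements."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(header)
    statements = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    # Hypothetical export of the sample sheet.
    sample = "trans_id,item\n1,bread\n1,milk\n2,beer\n"
    for stmt in csv_to_inserts(sample, "basket_items"):
        print(stmt)
```

The generated statements can be saved to a script and run in SQL*Plus once the table exists.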
Demo-1 Connections
Connect the Oracle Data Miner Client to the Oracle database:
• Make sure the Oracle listener is listening.
• Make sure the database instance ‘ora478’ is started.
• Use port 1521.
• Give the hostname as oracle.itk.ilstu.edu.
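Before connecting, a quick way to confirm that something is accepting connections on the listener port is a plain TCP check, sketched below in Python. A successful connection shows only that the port is open, not that the Oracle listener is healthy or that the ora478 instance is started. Both helper names are hypothetical; connect_descriptor simply joins host, port, and SID in the host:port:SID form.

```python
import socket


def listener_reachable(host: str, port: int = 1521, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    An open port suggests, but does not prove, that the Oracle listener is up;
    it says nothing about whether the database instance is started.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def connect_descriptor(host: str, port: int, sid: str) -> str:
    """Join host, port and SID in the host:port:SID form."""
    return f"{host}:{port}:{sid}"
```

For example, listener_reachable("oracle.itk.ilstu.edu") returns True only if port 1521 on that host accepts the connection within the timeout.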
Demo-1
• Perform the activity after the installations and connections are complete.
Demo-2 (Using the Oracle sample schema)
• Download Oracle 10g and install it on your system.
• Select the sample schema option during the custom installation.
• Launch the Oracle Data Miner Client.
• To use this sample schema for our activity, we need system administrator privileges.
• The username is SH and the password is password.
Demo-2
• An administrator must run the following grants in SQL*Plus (sqlplusw) before this activity can be built:
alter user sh account unlock;
alter user sh identified by password;
grant create table to sh;
grant create sequence to sh;
grant create session to sh;
grant create view to sh;
grant create procedure to sh;
grant create job to sh;
grant create type to sh;
grant create synonym to sh;
grant execute on ctxsys.ctx_ddl to sh;
Demo-2
Note the following before starting the activity:
• Make sure the Oracle listener is started.
• Make sure the database instance ‘ORCL’ is started.
• The port used is 1521.
• Give the hostname as 127.0.0.1, which is the loopback address of the local machine (localhost).
Demo-2
• Finally, the results from the model are
published to a table, and this table
forms the raw source for the new OLAP
product dimension.
• At this point the results contain no information about revenue, cost, or quantity, so the activity must be extended beyond association analysis to OLAP.
OLAP
• The results of the association analysis must be formatted correctly before they can be mapped to a dimension in OLAP. This can be done using OLAP DML or PL/SQL.
• In our activity we create a separate dimension to hold the results of the algorithm. For each dimension we can create levels, hierarchies, attributes, and mappings.
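One possible way to reshape the rules for dimension mapping is a parent-child table, sketched in Python below. This is a minimal sketch under stated assumptions, not the method used in the demo: the level names TOTAL, RULE and PRODUCT, the member-ID scheme, and the row layout are all hypothetical. Each rule becomes a RULE member under a single ALL_RULES total, and each product in the rule becomes a PRODUCT child; child IDs are prefixed with the rule ID so that a product appearing in several rules still gives every member exactly one parent.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (member_id, parent_id, level_name, description); all names hypothetical.
Row = Tuple[str, str, str, str]


def rules_to_dimension_rows(
        rules: Iterable[Tuple[FrozenSet[str], FrozenSet[str]]]) -> List[Row]:
    """Flatten (antecedent, consequent) rules into parent-child dimension rows."""
    rows: List[Row] = [("ALL_RULES", "", "TOTAL", "All association rules")]
    for n, (antecedent, consequent) in enumerate(rules, start=1):
        rule_id = f"RULE_{n}"
        label = " + ".join(sorted(antecedent)) + " => " + " + ".join(sorted(consequent))
        rows.append((rule_id, "ALL_RULES", "RULE", label))
        # Prefix each product with its rule ID so every member has one parent.
        for item in sorted(antecedent | consequent):
            rows.append((f"{rule_id}_{item}", rule_id, "PRODUCT", item))
    return rows
```

Rows in this shape could then be loaded into a relational table and mapped to the dimension's levels and hierarchy.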
OLAP Analytic Workspace
• Launch Analytic Workspace Manager and give the login details as:
  Username: sh
  Connection information: 127.0.0.1:1521:orcl
• This connects to the Oracle sample schema SH on port 1521 of the local host (127.0.0.1), using the orcl database instance.
Demo-3 OLAP Analytic Workspace
• Perform the activity and show the mappings.
Conclusion
• We have shown how market basket analysis with association rules determines customer buying patterns. As shown in Demo-3, the analysis can be extended with an OLAP analytic workspace, adding dimensions and cubes to capture other measures such as cost, revenue, and quantity.
Questions?