problem description
Download
Report
Transcript problem description
國立雲林科技大學
National Yunlin University of Science and Technology
Efficient Mining of Intertransaction
Association Rules
Advisor:Dr. Hsu
Graduate: Keng-Wei Chang
Author:Anthony K.H. Tung
Hongjun Lu
Jiawei Han
Ling Feng
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING VOL. 15, NO. 1,
JANUARY/FEBRUARY 2003
Intelligent Database Systems Lab
N.Y.U.S.T.
I.M.
Outline
Motivation
Objective
Introduction
PROBLEM DESCRIPTION
PRINCIPLES OF INTERTRANSACTION ASSOCIATION
MINNING METHODS
THE FITI ALGORITHM
PERFORMANCE STUDY
CONCLUSION
PERSONAL OPINION
Intelligent Database Systems Lab
Intratransaction v.s. Intertransaction
Intratransaction
same transaction
same customer
same day
Intertransaction
different transaction
Intelligent Database Systems Lab
Motivation
Most of the previous studies on mining
association rules are on mining intratransaction
associations.
same transaction
same customer
same day
Intelligent Database Systems Lab
Objective
we break the barrier of transactions and extend
the scope of mining association rules from
traditional single-dimensional, intratransaction
associations to multidimensional,
intertransaction associations.
Intelligent Database Systems Lab
Introduction
Most of the pervious studies…
There is an important form of association rules
which are useful but could not discovered with
existing association rule mining framework.
Intelligent Database Systems Lab
Introduction
Example:stock
R1: When the prices of IBM and SUN go up, at 80
percent of probability the price of Microsoft goes
up on the same day.
R2: If the prices of IBM and SUN go up,
Microsoft’s will most likely(80 percent of
probability) go up the next day and then drop
four days later.
Intelligent Database Systems Lab
Introduction
It is different from mining sequential patterns
in transaction data.
Contribution
an association mining problem that is more
general than what has been discussed in the
literature.
Developed an efficient algorithm for mining
intertransaction association rule from large
databases. (FITI)
Intelligent Database Systems Lab
PROBLEM DESCRIPTION
Definition
Let e1 , e2 ..., eu be a set of literals, called items.
Let D be an attribute and Dom(D) be the domain of D.
A transaction database is a database containing transactions in
the form of (d, E), where d Dom ( D) and E
Definition
1.
2.
A sliding window W in a transaction database T is a block of w
continuous intervals along domain D, starting from interval d 0
such that T contains a transaction at interval d 0 .
Each interval d j in W is called a subwindow of W denoted as W[j],
where j d j d 0 .
We call j the subwindow number of d j within W.
Intelligent Database Systems Lab
PROBLEM DESCRIPTION
Definition 3.
Let W be a sliding window with w intervals and u be the number
of literals in ∑ .
We define a megatransaction M contained within W to be
To distinguish the items in a megatransaction from the items in a
traditional transaction, the items in a megatransaction are called
extended-items.
Intelligent Database Systems Lab
PROBLEM DESCRIPTION
Definition 4.
intratransaction itemset is a set of items A
An intertransaction itemset is a set of extended items B such
that ei (0) B, 1 i u
Definition 5.
An intertransaction association rule is an implication of the form X Y
, where
Intelligent Database Systems Lab
PROBLEM DESCRIPTION
Definition 6.
Let S be the number of transactions in the transaction
database.
Let Txy be the set of megatransactions that contain a
set of extended-items X Y and Tx
be the set of megatransactions that contain X.
Then, the support and confidence of an
intertransaction association rule X Y are defined as
Intelligent Database Systems Lab
PRINCIPLES OF INTERTRANSACTION
ASSOCIATION MINNING METHODS
Decomposed into two subproblems:
Intelligent Database Systems Lab
THE FITI ALGORITHM
FITI consists of the following three phasses.
Intelligent Database Systems Lab
Phase 1 :
Mining and Storing Frequent IntraTransaction Itemsets
Frequent intratransaction itemsets are first
mined using the Apriori algorithm
Frequent-Itemsets Linked Table, FILT
ItemSet Hash Table
1. Lookup Links
2. Generator and Extension Links
3. Subset Links
4. Descendant Links
Intelligent Database Systems Lab
Phase 1 :
Mining and Storing Frequent IntraTransaction Itemsets
Intelligent Database Systems Lab
Phase 2 :
Database Transformation
FITI is to transform the database into a set of
Encoded Frequent-Itemset Tables, FIT
Intelligent Database Systems Lab
Intelligent Database Systems Lab
Phase 3 :
Mining frequent Intertrasaction Itemsets
Intelligent Database Systems Lab
PERFORMANCE STUDY
Generation of Synthetic Data
Relative Performance
Effect of Increasing Maxspan
Effect of Increasing the Number of
Transactions
Effect of Increasing the Average Transaction
Size
Experiment on Real-Life Data
Intelligent Database Systems Lab
Generation of Synthetic Data
TABLE4
Parameter Settings
Intelligent Database Systems Lab
Relative Performance
Compare the performance of EH-Apriori and
FITI
vary the support level on data sets one and two
maxspan equal to four and six
Intelligent Database Systems Lab
Relative Performance
Intelligent Database Systems Lab
Effect of Increasing Maxspan
vary maxspan from two to eight fro both data
sets
Intelligent Database Systems Lab
Effect of Increasing the Number
of Transactions
Vary the number of transactions in dataset one
from 10k to 1,000k
increase linearly
Intelligent Database Systems Lab
Effect of Increasing the Average
Transaction Size
Increase the average transaction size of data
set one from five to 20
Intelligent Database Systems Lab
Experiment on Real-Life Data
Intelligent Database Systems Lab
CONCLUSION
Intorduce a new problem of mining
intertransaction association rules and provided
an extensive study of it.
EH-Apriori and FITI
FITI proves to be much faster than EH-Apriori
Performance is acceptable for real life application
Useful in providing prediction capability along
a single dimension.
Intelligent Database Systems Lab