problem description

Download Report

Transcript problem description

國立雲林科技大學
National Yunlin University of Science and Technology
Efficient Mining of Intertransaction
Association Rules
Advisor:Dr. Hsu
Graduate: Keng-Wei Chang
Author:Anthony K.H. Tung
Hongjun Lu
Jiawei Han
Ling Feng
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING VOL. 15, NO. 1,
JANUARY/FEBRUARY 2003
Intelligent Database Systems Lab
N.Y.U.S.T.
I.M.
Outline









Motivation
Objective
Introduction
PROBLEM DESCRIPTION
PRINCIPLES OF INTERTRANSACTION ASSOCIATION
MINNING METHODS
THE FITI ALGORITHM
PERFORMANCE STUDY
CONCLUSION
PERSONAL OPINION
Intelligent Database Systems Lab
Intratransaction v.s. Intertransaction

Intratransaction




same transaction
same customer
same day
Intertransaction

different transaction
Intelligent Database Systems Lab
Motivation

Most of the previous studies on mining
association rules are on mining intratransaction
associations.



same transaction
same customer
same day
Intelligent Database Systems Lab
Objective

we break the barrier of transactions and extend
the scope of mining association rules from
traditional single-dimensional, intratransaction
associations to multidimensional,
intertransaction associations.
Intelligent Database Systems Lab
Introduction

Most of the pervious studies…

There is an important form of association rules
which are useful but could not discovered with
existing association rule mining framework.
Intelligent Database Systems Lab
Introduction

Example:stock

R1: When the prices of IBM and SUN go up, at 80
percent of probability the price of Microsoft goes
up on the same day.

R2: If the prices of IBM and SUN go up,
Microsoft’s will most likely(80 percent of
probability) go up the next day and then drop
four days later.
Intelligent Database Systems Lab
Introduction


It is different from mining sequential patterns
in transaction data.
Contribution


an association mining problem that is more
general than what has been discussed in the
literature.
Developed an efficient algorithm for mining
intertransaction association rule from large
databases. (FITI)
Intelligent Database Systems Lab
PROBLEM DESCRIPTION
Definition



Let   e1 , e2 ..., eu  be a set of literals, called items.
Let D be an attribute and Dom(D) be the domain of D.
A transaction database is a database containing transactions in
the form of (d, E), where d  Dom ( D) and E  
Definition



1.
2.
A sliding window W in a transaction database T is a block of w
continuous intervals along domain D, starting from interval d 0
such that T contains a transaction at interval d 0 .
Each interval d j in W is called a subwindow of W denoted as W[j],
where j  d j  d 0 .
We call j the subwindow number of d j within W.
Intelligent Database Systems Lab
PROBLEM DESCRIPTION

Definition 3.



Let W be a sliding window with w intervals and u be the number
of literals in ∑ .
We define a megatransaction M contained within W to be
To distinguish the items in a megatransaction from the items in a
traditional transaction, the items in a megatransaction are called
extended-items.
Intelligent Database Systems Lab
PROBLEM DESCRIPTION

Definition 4.



intratransaction itemset is a set of items A  
An intertransaction itemset is a set of extended items B   such
that ei (0)  B, 1  i  u
Definition 5.

An intertransaction association rule is an implication of the form X  Y
, where
Intelligent Database Systems Lab
PROBLEM DESCRIPTION

Definition 6.



Let S be the number of transactions in the transaction
database.
Let Txy be the set of megatransactions that contain a
set of extended-items X  Y and Tx
be the set of megatransactions that contain X.
Then, the support and confidence of an
intertransaction association rule X  Y are defined as
Intelligent Database Systems Lab
PRINCIPLES OF INTERTRANSACTION
ASSOCIATION MINNING METHODS

Decomposed into two subproblems:
Intelligent Database Systems Lab
THE FITI ALGORITHM

FITI consists of the following three phasses.
Intelligent Database Systems Lab
Phase 1 :
Mining and Storing Frequent IntraTransaction Itemsets


Frequent intratransaction itemsets are first
mined using the Apriori algorithm
Frequent-Itemsets Linked Table, FILT





ItemSet Hash Table
1. Lookup Links
2. Generator and Extension Links
3. Subset Links
4. Descendant Links
Intelligent Database Systems Lab
Phase 1 :
Mining and Storing Frequent IntraTransaction Itemsets
Intelligent Database Systems Lab
Phase 2 :
Database Transformation

FITI is to transform the database into a set of
Encoded Frequent-Itemset Tables, FIT
Intelligent Database Systems Lab
Intelligent Database Systems Lab
Phase 3 :
Mining frequent Intertrasaction Itemsets
Intelligent Database Systems Lab
PERFORMANCE STUDY






Generation of Synthetic Data
Relative Performance
Effect of Increasing Maxspan
Effect of Increasing the Number of
Transactions
Effect of Increasing the Average Transaction
Size
Experiment on Real-Life Data
Intelligent Database Systems Lab
Generation of Synthetic Data
TABLE4
Parameter Settings
Intelligent Database Systems Lab
Relative Performance

Compare the performance of EH-Apriori and
FITI


vary the support level on data sets one and two
maxspan equal to four and six
Intelligent Database Systems Lab
Relative Performance
Intelligent Database Systems Lab
Effect of Increasing Maxspan

vary maxspan from two to eight fro both data
sets
Intelligent Database Systems Lab
Effect of Increasing the Number
of Transactions

Vary the number of transactions in dataset one
from 10k to 1,000k
increase linearly
Intelligent Database Systems Lab
Effect of Increasing the Average
Transaction Size

Increase the average transaction size of data
set one from five to 20
Intelligent Database Systems Lab
Experiment on Real-Life Data
Intelligent Database Systems Lab
CONCLUSION


Intorduce a new problem of mining
intertransaction association rules and provided
an extensive study of it.
EH-Apriori and FITI



FITI proves to be much faster than EH-Apriori
Performance is acceptable for real life application
Useful in providing prediction capability along
a single dimension.
Intelligent Database Systems Lab