Mining Closed Frequent Itemsets in Data Streams
Download
Report
Transcript Mining Closed Frequent Itemsets in Data Streams
CFI-Stream: Mining Closed
Frequent Itemsets in Data
Streams
Nan Jiang,Le Gruenwald
SIGKDD’06
報告者:林靜怡
2006/10/04
Introduction
mining Closed frequent itemsets
computes and maintains closed
itemsets online and incrementally
perform the closure checking
output the current closed frequent
itemsets in real time based on users’
specified thresholds
Definition
D:data stream
I = { i 1 , i 2 , …, in } :a set of n elements,
called items
T: subsets of all the transactions
X: subsets of all the items appearing
in a data stream
Definition
C(X):the smallest closed set containing
X
Definition 1
An itemset X is said to be closed if and
only if C(X)= f(g(X)) = f•g(X) = X
Algorithm
CFI-Stream algorithm
DIrect Update (DIU) tree
perform the closure checking online
over a data stream sliding window
Conditions need to check for closed
itemsets
check when performing addition and
deletion operations on the DIU tree
DIU tree
maintain the current closed itemsets
k levels in the DIU tree, each level i
stores the closed i-itemsets
DIU tree
Each node in the DIU tree stores
a closed itemset
its current support information
links to its parent and children nodes
Add a Transaction to the DIU Tree
T1:original transaction set
t:new arrived transaction
Conditions to Check for Closed Itemsets
(1)
t is in the T1, if the largest itemset X it
contains is not currently in the DIU tree
->check for all X’s subsets Y, which are in
T1
(2)
when t is not in T1, for each its subset
Y, if Y is in T1, we need to check
Closure Checking for Addition
1
C,D
2
A,B
3
A,B,C
4
A,B,C
ABC
C 32
2
1
3
AB
21
11
CD
CD
Delete a Transaction in DIU
Tree
Conditions to Check for Closed Itemsets
When the number of the transactions
with same itemset of X is equal to zero,
if Y is a subset of X, and Y is a closed
itemset in the original transaction set
Closure Checking for Deletion
1
C,D
2
A,B
3
A,B,C
4
A,B,C
C
23
AB
ABC
2
32
1
CD
Experiment
Synthetic datasets
T10.I6.D100K and T5.I4.D100K
Experiment
Experiment