A Tree-based Approach for Frequent Pattern Mining from Uncertain

Download Report

Transcript A Tree-based Approach for Frequent Pattern Mining from Uncertain

A Tree-based Approach for Frequent
Pattern Mining from Uncertain Data
Carson Kai-Sang Leung, Mark Anthony F. Mateo,
and Dale A. Brajczuk
PAKDD 2008
Outline
 Motivation
 UF-Growth algorithm
 Construction of the UF-Tree
 Mining of Frequent Patterns from the UF-Tree
 Improvements to UF-Growth algo.
 Experimental Results
 Conslusion
Motivation
 Over the past decade, there have been numerous studies on
mining frequent patterns from precise data.
 However, there are situations in which users are uncertain
about the presence or absence of some items.
suspicion
UF-Growth Algorithm
 The algorithm consists of two operations:
 The construction of UF-tree
 The mining of frequent patterns from UF-tree
Construction of the UF-Tree
minsup = 1
Scan DB
a : 2.7
b: 2.625
c: 2.52429
d: 2.20875
e:2.1575
1
1
1
Mining of Frequent Patterns from the
UF-Tree
{e}-projected DB
 expSup({a,e}) = (1*0.72*0.9)+(2*0.71875*0.9) =1.94175
 expSup({d,e}) = (1*0.72*0.71875)+(2*0.71875*0.72) =1.5525
 {a,e} and {d,e} are frequent
(Cont.)
{d,e}-projected DB
{e}-projected DB
 expSup({d,e}) in {d,e}-projected DB is 0.5175=0.71875*0.72
 expSup ({a,d,e})=3*0.5175*0.9=1.39725
 {a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, and {e}
Improvements to UF-Growth Algorithm
 The UF-tree above may appear to require a large amount of
memory
 Improvement
1.
To increase the chance of path sharing, we discretize and
round the expected support of each tree node up to k dceimal
places
(Cont.)
2.
The iproved UF-growth does not need to bulid subsequent
UF-trees for any non-singleton patterns.
{e}-projected DB
{a,e},{a,d,e},
{a,d,e},{d,e}
{d,e} with their
 To enumerate
enumerateallallitsitssubsets
subsetsand{a,e},
their
accumulative
expected
supports
1.94175,
andfar.
expected
supports
equal
0.648,equal
0.46575
and1.39725
0.5175 so
1.5525
Experimental Results
(Cont.)
Conclusion
 Improvement 1. method may cause false positive.