A Tree-based Approach for Frequent Pattern Mining from Uncertain
Download
Report
Transcript A Tree-based Approach for Frequent Pattern Mining from Uncertain
A Tree-based Approach for Frequent
Pattern Mining from Uncertain Data
Carson Kai-Sang Leung, Mark Anthony F. Mateo,
and Dale A. Brajczuk
PAKDD 2008
Outline
Motivation
UF-Growth algorithm
Construction of the UF-Tree
Mining of Frequent Patterns from the UF-Tree
Improvements to UF-Growth algo.
Experimental Results
Conslusion
Motivation
Over the past decade, there have been numerous studies on
mining frequent patterns from precise data.
However, there are situations in which users are uncertain
about the presence or absence of some items.
suspicion
UF-Growth Algorithm
The algorithm consists of two operations:
The construction of UF-tree
The mining of frequent patterns from UF-tree
Construction of the UF-Tree
minsup = 1
Scan DB
a : 2.7
b: 2.625
c: 2.52429
d: 2.20875
e:2.1575
1
1
1
Mining of Frequent Patterns from the
UF-Tree
{e}-projected DB
expSup({a,e}) = (1*0.72*0.9)+(2*0.71875*0.9) =1.94175
expSup({d,e}) = (1*0.72*0.71875)+(2*0.71875*0.72) =1.5525
{a,e} and {d,e} are frequent
(Cont.)
{d,e}-projected DB
{e}-projected DB
expSup({d,e}) in {d,e}-projected DB is 0.5175=0.71875*0.72
expSup ({a,d,e})=3*0.5175*0.9=1.39725
{a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, and {e}
Improvements to UF-Growth Algorithm
The UF-tree above may appear to require a large amount of
memory
Improvement
1.
To increase the chance of path sharing, we discretize and
round the expected support of each tree node up to k dceimal
places
(Cont.)
2.
The iproved UF-growth does not need to bulid subsequent
UF-trees for any non-singleton patterns.
{e}-projected DB
{a,e},{a,d,e},
{a,d,e},{d,e}
{d,e} with their
To enumerate
enumerateallallitsitssubsets
subsetsand{a,e},
their
accumulative
expected
supports
1.94175,
andfar.
expected
supports
equal
0.648,equal
0.46575
and1.39725
0.5175 so
1.5525
Experimental Results
(Cont.)
Conclusion
Improvement 1. method may cause false positive.