Outline - EnhanceEdu

Download Report

Transcript Outline - EnhanceEdu

Outline
• Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
• Related Work
• Proposed Approaches
–
–
–
–
Methodology to specify items’ MIS values
An algorithm to mine frequent patterns effectively.
Mining frequent patterns in databases in which items’ frequencies vary widely.
Mining rare periodic-frequent patterns.
• Conclusions and Future Work
1
Mining Frequent Patterns in Which
Items’ Frequencies Vary Widely
 Minimum constraint model can
 Observation:
14
Support
MIS
12
Support count
generate uninteresting frequent
patterns if the items’ frequencies
within a database vary widely.
 Consider an uninteresting pattern
{a,e}
 S(a) = 12, S(e) = 5 and S(a,e)=2
10
8
6
4
2
 Support of a pattern is much
minsup
0
less than the support of the
maximal frequent item within it.
a
c
b
d
e
f
g
h
Items
Figure . Uninteresting frequent patterns getting
generating in minimum constraint model.
2
Minimum-Maximum Constraint Model
(DASFAA
2010)
• Basic Idea
– Limit the difference between the pattern support and the support of the maximal
frequent item within it.
• Item-to-pattern difference (ipd)
• Given a pattern X = {i1,i2,…,ik}, where s(i1) ≤s(i2) ≤ … ≤ s(ik) and MIS(i1) ≤
MIS(i2) ≤ … ≤ MIS(ik), the ipd of a pattern X, denoted as ipd(X) = S(ik)-S(X).
• Example
– In a transactional database if S(bread)=12, S(jam)=8 and S(bread,jam) =
5, ipd(bread,jam) = 7 (=12-5).
• Maximum item-to-pattern difference (mipd)
• A user-specified constraint.
• A pattern X is an interesting pattern if ipd(X) ≤ mipd.
• A pattern X is frequent if
– S(X) ≥ minimum(MIS(i1), MIS(i2),…,MIS(ik))
– Ipd(X) ≤ mipd.
3
Experimental Results
 Datasets
o Synthetic dataset (T10I4D100k)
 1,00,000 transactions
 870 items
o Real-world dataset (Retail)
 88,162 transactions
 16,470 items.
 MIS values for the items (IEEE CIDM 2009)
4
Experimental Results
 Comparing Minimum Constraint Model (MCM) and Minimum-
Maximum Constraint Model (MMCM)
T10I4D100k dataset
Retail dataset
Figure : Generation of frequent patterns in synthetic and retail datasets.
5
Outline
• Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
• Related Work
• Proposed Approaches
–
–
–
–
Methodology to specify items’ MIS values
An algorithm to mine frequent patterns effectively.
Mining frequent patterns in databases in which items’ frequencies vary widely.
Mining rare periodic-frequent patterns.
• Conclusions and Future Work
6
Mining Periodic-Frequent Patterns
• Periodic-Frequent patterns a class of user interest based frequent
patterns.
• A frequent pattern is periodic if it occurs periodically
throughout the database.
• Technically, a pattern is said to be periodic-frequent if it
satisfies the user-specified minsup and maximum periodicity
(maxprd) thresholds.
– Minsup – controls the minimum number of transactions that
a pattern must cover in a database.
– Maxprd – controls the maximum difference between two
consecutive appearances of a pattern in the database.
7
The Rare Item Problem in PeriodicFrequent Pattern Mining
• Since the basic model of periodic-frequent uses single minsup and
single maxprd values, it still suffers from “rare item problem.”
– If high minsup and/or low maxprd is specified, we miss the
periodic-frequent patterns containing rare items.
• Reason: Rare items have low support and high periodicity.
– To mine periodic-frequent patterns containing rare items, one
should specify low minsup and high maxprd value. However,
they cause combinatorial explosion.
8
Extended Model of Periodic-Frequent
Patterns
 Each item ij is specified with two constraints:
o Minimum item support (MinIS(ij))
o Maximum item periodicity (MaxIP(ij))
 A pattern X ={i1,i2,…,ik}, 1 ≤ k ≤ n, is periodic-frequent if
o S(X) ≥min.(MinIS(i1),MinIS(i2),…,MinIS(ik)) and
o P(X) ≤max.(MaxIP(i1),MaxIP(i2),…,MaxIP(ik))
 Periodic-Frequent patterns do not satisfy downward closure
property.
 A pattern-growth algorithm has been proposed for mining periodicfrequent patterns.
9
Experimental Results
• Datasets
Dataset
Type
# transactions
Items
Nature
T10I4D100K
Synthetic
100,000
870
Sparse
Retail
Real world
88,162
16,470
Sparse
Mushroom
Real world
3196
75
Dense
10
Specifying MaxIP for an Item
• Thought process:
• Are items’ MaxIP values be related to their supports?
mip(ij) = β * S(ij) + Permax
MaxIP(ij) = max.(mip(ij), Permin)
Where, S(ij) is the support of the item ij
Permax and Permin are user-specified maximum and
minimum periodicities
β is a user-specified constant.
 This methodology specifies low MaxIP for frequent items and
high MaxIP for rare items.
11
Experiment 1: Generation of periodicfrequent patterns
MinIS(ij) = max.(µ S(ij), LS)
Where, LS = user-specified least minimum item support
µ is a constant between [0, 1]
µ = 1/ α
12
Experiment 2: Performance test
13
Experiment 3: Scalability test
14
Summary of the Contributions
Topic
Existing
methodology
Performance problem
Proposed Methodology
Specifying
items’ MIS
values
Percentagebased
methodology
Causes rare item problem as it
will not maintain uniform
difference between items’
support and MIS values
Support-difference
based methodology
Patterns do
not satisfy
downward
closure
property
CFP-growth
1. Constructed tree is not
efficient
2. Search space is huge as it
searches using those items
that cannot generate
frequent pattern at higher
order.
CFP-growth++ uses
“least minimum
support”, and
“infrequent leaf node
pruning” to construct
tree effectively.
In addition, uses
“conditional minsup”
and “conditional closure
property” to effectively
reduce the search space.
15
Summary of Contributions
Topic
Existing
methodology
Not sufficient
for databases
of widely
varying items’
frequencies
Multiple
minimum
support
framework
Generates uninteresting
frequent patterns containing
both very high and very low
frequency items. The items
within the pattern are not
correlated.
A new interestingness
measure “item-topattern difference” has
been extended to prune
such interesting
frequent patterns.
Single
minimum
support and
single
maximum
periodicity
framework
The rare item problem.
1. The multiple
minimum supports
and maximum
periodicity
framework
2. A pattern growth
algorithm
Periodicfrequent
pattern
mining.
Performance problem
Proposed Methodology
16
Outline
• Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
• Related Work
• Proposed Approaches
–
–
–
–
Methodology to specify items’ MIS values
An algorithm to mine frequent patterns effectively.
Mining frequent patterns in databases in which items’ frequencies vary widely.
Mining rare periodic-frequent patterns.
• Conclusions and Future Work
17
Conclusions and Future Work
• For efficient mining of rare frequent patterns, the notion “support
difference” has been exploited to specify items’ MIS values.
• An Improved FP-growth-like approach has been proposed for
mining rare frequent patterns. Various heuristics have been
exploited to reduce the search space.
• To extract rare frequent patterns from the datasets of widely
varying items’ frequencies, we introduce a new measure, called
“item-to-pattern difference” and proposed an efficient FP-growthlike approach.
• Overall, the proposed approaches provide scope to effectively
mine interesting frequent patterns or association rules by trading
with the additional efforts from the user, especially in terms of
specifying items’ MIS values.
18
Conclusions and Future Work
• The future work is as follows:
– The interestingness of patterns discovered using various
measures needs to be studied.
– Rare item problem in various constraint-based patterns needs
to be investigated and addressed.
– The multiple minsups framework needs to be extended to other
data sets such as streams, sequential data and uncertain data
etc.
19
References
 N. Jiang and L. Gruenwald. Research issues in data stream association
rule mining. SIGMOD Rec., 35:14–19, March 2006.
 M. V. Joshi, R. C. Agarwal, and V. Kumar. Predicting rare classes: can
boosting make any weak
 learner strong? In KDD ’02: Proceedings of the eighth ACM SIGKDD
international conference on Knowledge discovery and data mining,
pages 297–306, New York, NY, USA, 2002. ACM
 B. Liu, W. Hsu, and Y. Ma. Mining association rules with multiple
minimum supports. In KDD ’99: Proceedings of the fifth ACM
SIGKDD international conference on Knowledge discovery and data
mining, pages 337–341, New York, NY, USA, 1999. ACM.
References
 P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right
interestingness measure for association patterns. In KDD ’02:
Proceedings of the eighth ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 32–41, New York, NY,
USA, 2002. ACM.
 S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee. Discovering
periodic-frequent patterns in transactional databases. In PAKDD ’09:
Proceedings of the 13th Pacific-Asia Conference on Advances in
Knowledge Discovery and Data Mining, pages 242–253, Berlin,
Heidelberg, 2009.Springer-Verlag.
 G. M. Weiss. Mining with rarity: a unifying framework. SIGKDD
Explor. Newsl., 6(1):7–19,2004.
References
 J. Yang, W. Wang, and P. S. Yu. Mining asynchronous periodic
patterns in time series data. In Proceedings of the sixth ACM SIGKDD
international conference on Knowledge discovery and data mining,
KDD ’00, pages 275–279, New York, NY, USA, 2000. ACM
 J. Yang, W. Wang, and P. S. Yu. Infominer: mining surprising periodic
patterns. In Proceedings of the seventh ACM SIGKDD international
conference on Knowledge discovery and data mining,KDD ’01, pages
395–400, New York, NY, USA, 2001. ACM.

L. Zhou and S. Yau. Association rule and quantitative association rule
mining among infrequent items. In MDM ’07: Proceedings of the 8th
international workshop on Multimedia data mining,pages 1–9, New
York, NY, USA, 2007. ACM.