Exploratory Mining and Pruning Optimization of Constrained

Download Report

Transcript Exploratory Mining and Pruning Optimization of Constrained

Exploratory Mining and Pruning
Optimization of Constrained
Associations Rules
1998년 8월 7일
Data Engineering Lab
성 유진
1
Abstract
• Standpoint of supporting human-centered
discovery of Knowledge
– lack of user exploration and control
– lack of focus
– rigid notion of relationship
• Constrained association queries
– pruning using monotonicity, succinctness
1998년 8월 7일
Data Engineering Lab
성 유진
2
Introduction
• Problem1 (Lack of User Exploration and
Control)
– Mining Process => Black Box
– (user can’t preempt and needs to wait for hours)
– establish clear breakpoints to allow user feedback
• Problem2 (Lack of Focus)
– on which to focus the mining
 to find association between sets of items whose
types do not overlap
1998년 8월 7일
Data Engineering Lab
성 유진
3
 associations from item sets whose total price is at
least $1,000
– provide a rich interface for the user to express
focus (CAQ)
• Problem3 (Rigid notion of Relationship)
– significance metrics :
– separate criteria for selecting candidates for the
antecedent and consequent:
 association from items to sets of types
pepsi => snacks
1998년 8월 7일
Data Engineering Lab
성 유진
4
1998년 8월 7일
Data Engineering Lab
성 유진
5
Architecture
• Phase 1
– user initially specifies CAQ
• includes a set of constraints C
• C is applicable to the antecedent and consequent
– output:
• pairs of candidates(Sa, Sc)
• Sa, Sc have support over thresholds
– user can add, delete, of modify the constraints
as many times as desired
1998년 8월 7일
Data Engineering Lab
성 유진
6
• Phase 2
– significance metric
– a threshold for the metric
– whatever further conditions to be imposed ont
the antecedent and consequent
 classical association mining
- confidence (as significance metric)
- confidence threshold
- require ( SaSc) be frequent
1998년 8월 7일
Data Engineering Lab
성 유진
7
1998년 8월 7일
Data Engineering Lab
성 유진
8
Constrained Association Queries
• CAQ
– S  Item : S is a set variable on the Item
domain
– {(S1, S2) |C}, C is a set of constraints on S1, S2
– frequent constraints freq(Si)
– trans(TID, Itemset), iteminfo(Item, Type, Price)
– S.price  100 : all items in S are of price less
than of equal to $100
– {snacks, sodas}  S.Type
1998년 8월 7일
Data Engineering Lab
성 유진
9
• CAQ Examples
– {(S1, S2) | S1 Item & S2  Item & count(S1) = 1 & count(S2)
= 1 & freq(S1) & freq(S2)}
• S1.Type  S2.Type   and max(S1.Price)  avg(S2.Price)
– {(S1, S2) | agg1(S1.Price)  100 & agg2(S2.Price  1000}
– {(S1, S2) | S1.Type  {Snacks} & S2.Type  {beers} &
max(S1.Price)  min(S2.Price)
• Sound/Complete
– algorithm is sound if it only finds frequent sets that satisfy the
given constraints
– algorithm is complete if all frequent sets satisfying the given
constraints are found
1998년 8월 7일
Data Engineering Lab
성 유진
10
• Goal
– to push the constraints as deeply as possible
inside the computation of frequent set
– classical algorithm + test them for constraint
satisfaction => too inefficient
– sound/complete : anti-monotone, succinctness
1998년 8월 7일
Data Engineering Lab
성 유진
11
Anti-Monotone Constraints
• Find constraints which satisfy anti-monotone
– prune away a significant num of candidates
• Definition
– A 1-var constraint C is anti-monotone iff for all sets
S, S’:
• S  S’ & S satisfies C  S’ satisfies C
• Identify which constraints are anti-monotone
– Fig3
– min(S)  v (anti-monotone) , min(S)  v (not )
1998년 8월 7일
Data Engineering Lab
성 유진
12
1998년 8월 7일
Data Engineering Lab
성 유진
13
Succinct Constraints
• once-and-for-all (before any iteration takes place)
– not generate and test paradigm
– how to
• succinctness
• member generating functions
– definition
• SATc(Item) : the set of item sets satisfying C , pruned space
– C1  S.Price 100 , pruned space for C1 contains only item sets
such that each item in the set has a price at least $100
• selection predicate, p
1998년 8월 7일
Data Engineering Lab
성 유진
14
1998년 8월 7일
Data Engineering Lab
성 유진
15
Example
 C1  S.Price 100 , let Item1 = price 100 (Item):
 C1 is succinct because its pruned space SATc1(Item)
is simply 2item1
 C2  {snacks, sodas}  S.Type : Let Item2, Item3 , Item4
be the sets type = ‘snacks’(Item), type = ‘sodas’(Item) , type  ‘snacks’ 
type  ‘sodas’
(Item)
 C2 is succint SATC2(Item) can be expressed as
2item - 2item2 - 2item3 - 2item4 - 2item2  item4 - 2item3  item4
1998년 8월 7일
Data Engineering Lab
성 유진
16
 Example
C1  S.Price 100, MGF = {X |X  Item1 & C   }
C2  {snacks, sodas}  S.Type, MGF = {X1 X2  X3|
X1 Item2 & X1   & X2 Item3 & X2   & X3 Item4}
1998년 8월 7일
Data Engineering Lab
성 유진
17
Algorithms
• Algorithm Apriori+
– computes the frequent set => among frequent set, those
which satisfy constraints become answer set
• Algorithm Hybrid(m)
– in case (C - Cfreq ) is more selective , apriori+ is
inefficient
– First check Cfreq for m iterations
– to reduce the remaining I/O cost, it switches to
checking (C- Cfreq)
1998년 8월 7일
Data Engineering Lab
성 유진
18
1998년 8월 7일
Data Engineering Lab
성 유진
19
CAP algorithm
• 4 Cases
 succinct and Anti-monotone
– Replace C1 in the Apriori Algorithm by C1c
 succinct but not anti-monotone
1998년 8월 7일
Data Engineering Lab
성 유진
20
 Anti-monotone but Non-succinct
– Define Ck as in apriori algorithm, drop the candidates S if S fails C
– constraint satisfaction is tested before counting is done
 neither
–
Induce any weaker constraint C’ from C, depending on whether
C’ is anti-monotone and /or sucinct, use the above strategies
– Once all frequent sets are generated, test them for satisfaction of C
1998년 8월 7일
Data Engineering Lab
성 유진
21
1998년 8월 7일
Data Engineering Lab
성 유진
22