Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining


The 17th Australian Joint Conference on Artificial Intelligence, Cairns, 6-10 December 2004
Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
Stefan Mutter, Mark Hall, Eibe Frank
University of Freiburg, Germany
University of Waikato, New Zealand
Motivation
• Previous work
– Association rule mining
• Run time used to compare mining algorithms
• Lack of accuracy-based comparisons
– Associative classification:
• Focus on accurate classifiers
• Idea: Think backwards
  – Use the resulting classifiers as the basis for comparing confidence-based rule miners
  – Side effect: comparison of a standard associative classifier to standard techniques
Overview
• Motivation
• Basics:
– Definitions
– Associative classification
• (Class) Association Rule Mining
– Apriori vs. predictive Apriori (by Scheffer)
• Pruning
• Classification
• Quality measures and Experiments
  – Evaluate the sort order of rules using properties of associative classifiers
• Results
• Conclusions
Basics: Definitions
• A table over n attributes a1, ..., an (an item is an attribute-value pair)
• Class association rule: an implication X → Y where
  X ⊆ {a1, ..., ai-1, ai+1, ..., an}, Y = ai, and ai is the class attribute;
  X is the body of the rule, Y the head
• Confidence of a (class) association rule:
  c(X → Y) = s(X → Y) / s(X)
  (support s(X): the number of database records that satisfy X)
  i.e. the relative frequency of a correct prediction in the (training) table of instances
  (a small sketch follows below)
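To make the definitions concrete, here is a minimal sketch (not from the paper) of support and confidence for a class association rule over a table of instances stored as dictionaries. The toy table, attribute names, and rule are hypothetical, chosen only for illustration.

```python
# Minimal sketch of support s(X) and confidence c(X -> Y) for a class
# association rule. An item is an (attribute, value) pair; the body X is a
# set of items, the head Y a single (class_attribute, class_value) item.

def support(items, table):
    """Number of records that satisfy every (attribute, value) pair in items."""
    return sum(all(row.get(a) == v for a, v in items) for row in table)

def confidence(body, head, table):
    """c(X -> Y) = s(X -> Y) / s(X): relative frequency of a correct
    prediction among the training records that satisfy the body."""
    s_x = support(body, table)
    return support(set(body) | {head}, table) / s_x if s_x else 0.0

# Hypothetical toy table and rule, for illustration only.
table = [{"outlook": "sunny", "windy": "no",  "play": "yes"},
         {"outlook": "sunny", "windy": "yes", "play": "no"},
         {"outlook": "rainy", "windy": "no",  "play": "yes"}]
print(confidence({("outlook", "sunny")}, ("play", "yes"), table))  # 1/2 = 0.5
```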
Basics: Mining - Apriori
Rules sorted according to: confidence
Apriori generates all (class) association rules with support and confidence above predefined minimum values:
1. Mine all item sets above the minimum support (frequent item sets)
2. Divide each frequent item set into rule body and head, and check whether the rule's confidence is above the minimum confidence
Adaptations to mine class association rules, as described by Liu et al. (CBA) and sketched below:
• Divide the training set into subsets, one per class
• Mine frequent item sets separately in each subset
• Take each frequent item set as the body and the class label as the head
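A rough sketch, under stated assumptions, of the CBA-style adaptation just listed: split the training data by class, mine frequent item sets in each subset (here with a deliberately naive brute-force miner rather than real Apriori candidate generation), and turn each frequent item set into a class association rule. Function and variable names are mine, not taken from CBA or WEKA.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support):
    """Naive stand-in for Apriori's frequent-item-set step: enumerate item
    combinations by size and keep those meeting the minimum support."""
    items = sorted({(a, v) for row in rows for a, v in row.items()})
    frequent = []
    for size in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, size):
            s = sum(all(row.get(a) == v for a, v in cand) for row in rows)
            if s >= min_support:
                frequent.append((frozenset(cand), s))
                found = True
        if not found:        # no frequent set of this size, so none larger
            break
    return frequent

def mine_class_association_rules(table, class_attr, min_support, min_conf):
    rules = []
    for label in {row[class_attr] for row in table}:      # one subset per class
        subset = [{a: v for a, v in row.items() if a != class_attr}
                  for row in table if row[class_attr] == label]
        for body, s_xy in frequent_itemsets(subset, min_support):
            s_x = sum(all(row.get(a) == v for a, v in body) for row in table)
            conf = s_xy / s_x if s_x else 0.0
            if conf >= min_conf:
                rules.append((body, (class_attr, label), conf))
    return sorted(rules, key=lambda r: -r[2])             # confidence order
```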
Basics: Mining – predictive Apriori
Rules sorted according to: expected predictive accuracy
• Predictive accuracy of a rule r: X → Y:
  c(X → Y) = Pr(Y is satisfied | X is satisfied)
  – Expected predictive accuracy: a support-based correction of the confidence value (a simplified sketch follows below)
• Inherent pruning strategy: output the best n rules according to
  1. Expected predictive accuracy among the n best
  2. Rule not subsumed by a rule with at least the same expected predictive accuracy (prefers more general rules)
• Adaptations to mine class association rules:
  – Generate frequent item sets from all data (class attribute deleted) as rule bodies
  – Generate a rule for each class label
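Scheffer estimates the prior over the true accuracy from the data; as a loud simplification, the sketch below assumes a uniform prior instead, under which the support-based correction reduces to the Laplace-corrected confidence. This illustrates the idea of discounting high-confidence, low-support rules; it is not Scheffer's exact formula.

```python
# Support-based correction of confidence, simplified: posterior mean of the
# true accuracy given `correct` hits among `body_support` covered training
# instances, under a uniform prior (Scheffer instead estimates the prior
# from the data, so this understates the real correction).

def expected_accuracy_uniform_prior(correct, body_support):
    return (correct + 1) / (body_support + 2)

# A rule with confidence 1.0 but body support 2 now ranks below a rule with
# confidence 0.9 and body support 100:
print(expected_accuracy_uniform_prior(2, 2))     # 0.75
print(expected_accuracy_uniform_prior(90, 100))  # ~0.89
```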
Basics: Pruning
• The number of rules is too big for direct use in a classifier
• Simple strategy:
  – Bound the number of rules
  – The sort order produced by the mining algorithm is preserved
• CBA, optional pruning step: pessimistic error-rate-based pruning
  – A rule is pruned if removing a single item from it reduces the pessimistic error rate
• CBA, obligatory pruning: database coverage method (sketched below)
  – A rule that classifies at least one instance correctly (as the highest-ranked covering rule) joins the intermediate classifier
  – Delete all covered instances
  – Take the intermediate classifier with the lowest number of errors
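A minimal sketch of the database coverage idea, under my own assumptions about the bookkeeping (CBA's actual procedure has more detail, e.g. in how default-rule errors are handled): walk the sorted rules, keep a rule if it correctly classifies at least one still-uncovered instance, remove what it covers, and finally keep the prefix of rules with the fewest total errors.

```python
from collections import Counter

def covers(body, row):
    return all(row.get(a) == v for a, v in body)

def database_coverage(sorted_rules, table, class_attr):
    """sorted_rules: (body, class_label) pairs, highest-ranked first.
    Returns the prefix of kept rules with the lowest total error count."""
    remaining = list(table)
    kept, total_errors, wrong_so_far = [], [], 0
    for body, label in sorted_rules:
        covered = [row for row in remaining if covers(body, row)]
        if not any(row[class_attr] == label for row in covered):
            continue                              # never a correct top match
        kept.append((body, label))
        wrong_so_far += sum(row[class_attr] != label for row in covered)
        remaining = [row for row in remaining if not covers(body, row)]
        # errors of this intermediate classifier: mistakes by kept rules plus
        # the errors of a majority-class default rule on what is left
        default_errors = 0
        if remaining:
            counts = Counter(row[class_attr] for row in remaining)
            default_errors = len(remaining) - max(counts.values())
        total_errors.append(wrong_so_far + default_errors)
    if not kept:
        return kept
    best = min(range(len(kept)), key=total_errors.__getitem__)
    return kept[:best + 1]
```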
Overview
• Motivation
• Basics:
– Definitions
– Associative classification
• (Class) Association Rule Mining
– Apriori vs. predictive Apriori (by Scheffer)
• Pruning
• Classification
• Quality measures
• Results
• Conclusions
Think backwards: use the properties of different classifiers to obtain accuracy-based measures for a set of (class) association rules.
Classification
• Input:
  – Pruned, sorted list of class association rules
• Two different approaches (both sketched below):
  – Weighted vote algorithm
    • Majority vote
    • Inversely weighted
  – Decision list classifier, e.g. CBA
    • Use the first rule that covers the test instance for classification
Think backwards: a mining algorithm is preferable if the resulting classifier is more accurate, more compact, and built efficiently.
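A sketch of both classification schemes on top of a pruned, sorted rule list. Reading "inversely weighted" as a 1/rank weight is my assumption about the exact weighting; the decision-list classifier simply fires the first covering rule.

```python
def covers(body, row):
    return all(row.get(a) == v for a, v in body)

def decision_list_classify(sorted_rules, row, default_label):
    """Use the first (highest-ranked) rule that covers the test instance."""
    for body, label in sorted_rules:
        if covers(body, row):
            return label
    return default_label

def weighted_vote_classify(sorted_rules, row, default_label, inverse=True):
    """Sum a weight per class over all covering rules. inverse=False gives a
    plain majority vote; inverse=True uses 1/rank (an assumed weighting)."""
    votes = {}
    for rank, (body, label) in enumerate(sorted_rules, start=1):
        if covers(body, row):
            votes[label] = votes.get(label, 0.0) + (1.0 / rank if inverse else 1.0)
    return max(votes, key=votes.get) if votes else default_label
```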
Quality measures and Experiments
Measures to evaluate confidence-based mining algorithms:
1. Accuracy on a test set (next two slides)
2. Average rank of the first rule that covers and correctly predicts a test instance (sketched after this slide)
3. Number of mined rules and number of rules after pruning
4. Time required for mining and for pruning
Comparative study for Apriori and predictive Apriori:
• 12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris, labor, led7, lenses, pima, tic-tac-toe, wine
• One run of 10-fold cross-validation
• Numeric attributes discretised
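As an illustration of measure 2, here is a small sketch (my own code, not the paper's) computing the average rank of the first rule that covers a test instance and predicts its class correctly; a low value indicates a good sort order near the top of the list.

```python
def covers(body, row):
    return all(row.get(a) == v for a, v in body)

def average_first_correct_rank(sorted_rules, test_set, class_attr):
    """Average, over the test set, of the rank of the first rule that both
    covers the instance and predicts its class correctly."""
    ranks = []
    for row in test_set:
        for rank, (body, label) in enumerate(sorted_rules, start=1):
            if covers(body, row) and row[class_attr] == label:
                ranks.append(rank)
                break
    return sum(ranks) / len(ranks) if ranks else float("inf")
```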
1a. Accuracy and Ranking
• Inversely weighted vote
• Emphasises top-ranked rules
• Shows the importance of a good rule ranking
[Bar chart: accuracy in % on each of the 12 datasets (lenses, pima, tic-tac-toe, wine, led7, labor, iris, ecoli, glass, heart-h, balance, breast-w) for Apriori vs. predictive Apriori]
Mining algorithm preferable if resulting classifier is more accurate.
1b. How many rules are necessary to be accurate?
[Line chart, ecoli, majority vote classifier: accuracy in % vs. number of rules (10 to 1000) for Apriori and predictive Apriori]
Similar results for CBA.
Mining algorithm preferable if the resulting classifier is more compact.
Comparison: CBA to standard techniques
Rules after mining:
         Apriori   Pred. Apriori
ecoli      888.2           304.4
labor    96084.3           228.9
pima      3311.4           179.5

Rules after pruning:
         CBA + Apriori   C4.5   JRip   PART   CBA + pred. Apriori
ecoli             19.8   18.3    9.0   13.6                  20.3
labor             26.7    3.6    3.6    3.4                   8.7
pima              39.7   19.2    3.3    7.5                  39.0

Accuracy (%):
         CBA + Apriori   C4.5   JRip   PART   CBA + pred. Apriori
ecoli            81.26  84.23  82.16  83.60                 80.96
labor            81.33  73.67  77.00  78.67                 79.33
pima             74.36  73.83  75.14  75.27                 72.79
Conclusions
• Use classification to evaluate the quality of confidence-based association rule miners
• Test evaluation:
  – Predictive Apriori mines a higher-quality set of rules
  – Predictive Apriori needs fewer rules
  – But: predictive Apriori is slower than Apriori
• Comparison of a standard associative classifier (CBA) to standard ML techniques:
  – CBA achieves accuracy comparable to the standard techniques
  – CBA mines more rules and is slower
• All algorithms are implemented in WEKA or as an add-on to WEKA, available from http://www.cs.waikato.ac.nz/~ml
The End
Thank you for your attention.
Questions...
Contact:
[email protected]
[email protected]
[email protected]