Transcript of a PPT file
Evaluation of classifiers:
choosing the right model
Rajashree Paul
Irina Pekerskaya
1. Introduction
Different classification algorithms – which
one is the best?
No free lunch theorem
Huge number of experimental evaluations already conducted
Idea – to ask what is important to the user
1.1. Long-term vision
The user defines the quality measures that are important
Data characteristics of the given dataset are
analyzed
Feature selection method is chosen
Classification method is chosen
The parameters of the classification method
are optimized
1.2. Project goals
Analyze possible user-defined criteria
Analyze possible data characteristics
Analyze the behavior of chosen algorithms
Give some suggestions on the algorithm choice
with respect to those criteria and characteristics
Survey some approaches for automatic selection
of the classification method
Presentation Progress
Introduction
Classification Algorithms
Specifying Classification Problems
Combining Algorithms and different
measures for a final mapping
Meta Learning and automatic selection of
the algorithm
2. Classification Algorithms:
Decision Tree (C4.5)
Neural Net
Naïve Bayes
k-Nearest Neighbor
Linear Discriminant Analysis (LDA)
2.1 Decision Trees (C4.5)
Starts with the root node holding the complete dataset
Chooses the best split using the gain ratio (information gain ratio) and partitions the dataset accordingly (a sketch of this criterion follows the list)
Recursively repeats the same procedure at each child node
Stops when no attribute is left or all records at the node belong to the same class
Applies post-pruning to avoid overfitting
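As a rough, self-contained illustration of this split criterion (our own sketch, not C4.5 itself; the helper names and the toy records are made up), the gain ratio of a candidate categorical split is the information gain divided by the split information:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (or of any categorical values)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(records, labels, attribute):
    """Gain ratio of splitting (records, labels) on a categorical attribute."""
    n = len(labels)
    groups = {}
    for rec, lab in zip(records, labels):          # group the labels by attribute value
        groups.setdefault(rec[attribute], []).append(lab)
    # information gain = entropy before the split - weighted entropy after the split
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # split information penalizes splits that scatter the data into many small branches
    split_info = entropy([rec[attribute] for rec in records])
    return gain / split_info if split_info > 0 else 0.0

# toy usage: pick the attribute with the highest gain ratio
records = [{"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"},
           {"outlook": "sunny", "windy": "no"}, {"outlook": "overcast", "windy": "yes"}]
labels = ["no", "yes", "no", "yes"]
print(max(["outlook", "windy"], key=lambda a: gain_ratio(records, labels, a)))  # "outlook"
```

C4.5 additionally handles continuous attributes, missing values and post-pruning, which this sketch leaves out.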
2.2 Neural Net
Layered network of neurons (perceptrons) connected via weighted edges
In a backpropagation network (sketched below):
1. The input is fed forward through the network to produce the actual output
2. The actual and expected outputs are compared, giving the error
3. The error is propagated back through the network to tune the weights and bias of each node
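A minimal NumPy sketch of those three steps for a one-hidden-layer sigmoid network on the XOR toy problem; this is our own illustration of backpropagation (layer sizes, learning rate and iteration count are arbitrary), not the presenters' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
t = np.array([[0], [1], [1], [0]], dtype=float)              # expected outputs (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights and bias
lr = 1.0                                        # learning rate (arbitrary)

for _ in range(10000):
    # 1. feed the input forward through the network to get the actual output
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # 2. compare actual and expected outputs (the error), scaled by the sigmoid derivative
    delta_out = (y - t) * y * (1 - y)
    # 3. propagate the error back and adjust the weights and biases of each layer
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid
    b1 -= lr * delta_hid.sum(axis=0)

print((y > 0.5).astype(int).ravel())  # typically converges to [0 1 1 0]
```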
2.3 Naïve Bayes
Assumes that the input attributes are independent of each other given the target class
Decision rule of the Naïve Bayes classifier (sketched below):
c* = argmax_{cj ∈ C} P(cj) · ∏i P(Oi | cj)
where P(Oi | cj) is the probability of attribute value Oi conditioned on class cj
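A minimal sketch of this decision rule for categorical attributes, estimating the probabilities by counting with Laplace smoothing; the function names and toy data are our own, purely illustrative:

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Estimate class counts and per-(attribute, class) value counts from the data."""
    priors = Counter(labels)
    cond = defaultdict(Counter)                    # (attribute, class) -> value counts
    for rec, c in zip(records, labels):
        for attr, value in rec.items():
            cond[(attr, c)][value] += 1
    return priors, cond, len(labels)

def classify(rec, priors, cond, n):
    """Decision rule: argmax over classes c of P(c) * prod_i P(o_i | c)."""
    def score(c):
        p = priors[c] / n                          # prior P(c)
        for attr, value in rec.items():
            counts = cond[(attr, c)]
            # Laplace-smoothed estimate of P(o_i | c)
            p *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        return p
    return max(priors, key=score)

# toy usage
records = [{"outlook": "sunny"}, {"outlook": "rain"},
           {"outlook": "sunny"}, {"outlook": "overcast"}]
labels = ["no", "yes", "no", "yes"]
priors, cond, n = train_naive_bayes(records, labels)
print(classify({"outlook": "rain"}, priors, cond, n))  # "yes"
```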
2.4 k-Nearest Neighbor
Uses a distance function as a measure of similarity/dissimilarity between two objects
Assigns an object to the majority class of its k nearest neighbors (see the sketch below)
The value of k may vary
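A minimal sketch of the rule, assuming numeric features and Euclidean distance (our own illustration; names and toy data are made up):

```python
import math
from collections import Counter

def knn_classify(query, data, labels, k=3):
    """Assign the query point to the majority class among its k nearest training points."""
    def dist(a, b):                                    # Euclidean distance
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(range(len(data)), key=lambda i: dist(query, data[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# toy usage
data = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9)]
labels = ["A", "A", "B", "B"]
print(knn_classify((1.1, 1.0), data, labels, k=3))     # "A"
```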
2.5 Linear Discriminant Analysis (LDA)
Optimal for normally distributed data and classes having equal covariance structure
What it does:
For each class it generates a linear discriminant function (sketched below)
For a problem with d features it separates the classes by (d-1)-dimensional hyperplanes
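A minimal NumPy sketch of those linear discriminant functions under the equal (pooled) covariance assumption; this is our own illustration with made-up toy data, not the presenters' code:

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, class priors and the pooled covariance matrix."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: float(np.mean(y == c)) for c in classes}
    # pooled (shared) covariance estimate: the equal-covariance assumption of LDA
    pooled = sum(np.cov(X[y == c], rowvar=False) * (np.sum(y == c) - 1) for c in classes)
    pooled /= len(y) - len(classes)
    return classes, means, priors, np.linalg.inv(pooled)

def lda_predict(x, classes, means, priors, inv_cov):
    """Evaluate the linear discriminant function of each class and pick the largest."""
    def score(c):
        m = means[c]
        return x @ inv_cov @ m - 0.5 * m @ inv_cov @ m + np.log(priors[c])
    return max(classes, key=score)

# toy usage on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = lda_fit(X, y)
print(lda_predict(np.array([2.5, 2.5]), *model))   # 1
```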
Presentation Progress
Introduction
Classification Algorithms
Specifying Classification Problems
Combining Algorithms and different
measures for a final mapping
Meta Learning and automatic selection of
the algorithm
3. Specifying Classification Problems
3.1 Quality Measures
3.2 Data Characteristics
3.1 Quality Measures
Accuracy
Training Time
Execution Time
Interpretability
Scalability
Robustness
…
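The first three of these measures can be quantified directly. Below is a minimal sketch of measuring accuracy, training time and execution time, using scikit-learn classifiers as stand-ins for three of the algorithms above (our own illustration; the dataset and the model choices are arbitrary):

```python
import time

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (DecisionTreeClassifier(), GaussianNB(), KNeighborsClassifier()):
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)                      # training time
    train_t = time.perf_counter() - t0

    t0 = time.perf_counter()
    pred = clf.predict(X_te)                 # execution (prediction) time
    exec_t = time.perf_counter() - t0

    print(f"{type(clf).__name__}: accuracy={accuracy_score(y_te, pred):.3f}, "
          f"train={train_t:.4f}s, exec={exec_t:.4f}s")
```

Interpretability, scalability and robustness are harder to measure and are usually judged qualitatively, as in the tables of Section 4.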
3.2 Data Characteristics
Size of Dataset
Number of Attributes
Amount of Training Data
Proportion of Symbolic Attributes
Proportion of Missing Values
Proportion of Noisy Data
Linearity of Data
Normality of Data
Presentation Progress
Introduction
Classification Algorithms
Specifying Classification Problems
Combining Algorithms and different
measures for a final mapping
Meta Learning and automatic selection of
the algorithm
4. Combining Algorithms and Measures
1st step toward our goal
Comparing the algorithms with respect
to different Data Characteristics
4.1 Evaluating Algorithms with respect to Data Characteristics
[Table: one row per classifier (C4.5, NNet, NB, k-NN, LDA) and one column per data characteristic (large number of attributes, large dataset, small amount of training data, lots of symbolic attributes, lots of missing values, lots of noisy data, non-linearity, non-normality); each cell rates the classifier's suitability as Yes, Ok, No, Preferably no, Contradictive, or Not enough information.]
4. Combining Algorithms and Measures …(contd.)
2nd step…
Comparing Algorithms in terms of
different Quality Measures
4.2 Evaluating Algorithms with respect to Quality Measures

Algorithm | Accuracy | Training Time | Execution Time | Interpretability | Scalability
C4.5      | Yes      | Yes           | Yes            | Yes              | Yes
NNet      | Yes      | No            | Ok             | No               | No
NB        | Yes      | Yes           | Yes            | Ok               | Yes
k-NN      | No       | Yes           | No             | No               | Yes
LDA       | Yes      | Yes           | Yes            | No               | Yes
4. Combining Algorithms and Measures …(contd.)
Finally…
Merging the information of the two tables into one table:
(Data characteristics + Quality Measures) => Algorithms
4.3 Choice of the algorithm
[Table: one row per quality measure (Accuracy, Training time, Execution time, Interpretability) and one column per data characteristic (size of data: small/large; number of attributes: small/large; prevalent type of attributes: symbolic/numeric; lots of missing values; lots of noisy data; non-linearity; non-normality); each cell lists the recommended classifiers, drawn from C4.5, NNet, NB, k-NN and LDA.]
Presentation Progress
Introduction
Classification Algorithms
Specifying Classification Problems
Combining Algorithms and different
measures for a final mapping
Meta Learning and automatic selection of
the algorithm
5. Meta-learning
Overview
We have some knowledge about the algorithms' performance on some datasets
Measure the similarity between the given dataset and previously processed ones, and choose those that are similar
Measure algorithm performance on the chosen datasets
Rank the algorithms
5. Meta-learning
Measuring the similarity between datasets
Data characteristics:
Number of examples
Number of attributes
Proportion of symbolic attributes
Proportion of missing values
Proportion of attributes with outliers
Entropy of classes
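A minimal sketch of how such meta-attributes might be computed for a dataset held in a pandas DataFrame (our own illustration; the helper and column names are made up, and the proportion of attributes with outliers is omitted since it depends on the chosen outlier test):

```python
import numpy as np
import pandas as pd

def meta_attributes(df: pd.DataFrame, class_column: str) -> dict:
    """Compute simple dataset characteristics used to compare datasets."""
    X = df.drop(columns=[class_column])
    symbolic = X.select_dtypes(include=["object", "category"])
    class_probs = df[class_column].value_counts(normalize=True)
    return {
        "n_examples": len(df),
        "n_attributes": X.shape[1],
        "prop_symbolic": symbolic.shape[1] / X.shape[1],
        "prop_missing": float(X.isna().mean().mean()),
        "class_entropy": float(-(class_probs * np.log2(class_probs)).sum()),
    }

# toy usage
df = pd.DataFrame({"a": [1.0, 2.0, None, 4.0],
                   "b": ["x", "y", "x", "y"],
                   "cls": ["p", "n", "p", "p"]})
print(meta_attributes(df, "cls"))
```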
5. Meta-learning
Measuring the similarity between datasets
K-Nearest Neighbor
Distance function:
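Following Brazdil & Soares (2003) (reference 20), on which this approach is based, the distance between two datasets d_i and d_j is, approximately, the L1 distance over the meta-attributes listed above, each scaled by its range across the stored datasets:

\[
\mathrm{dist}(d_i, d_j) = \sum_{x} \frac{\left| v_{x,d_i} - v_{x,d_j} \right|}{\max_k v_{x,d_k} - \min_k v_{x,d_k}}
\]

where v_{x,d} denotes the value of meta-attribute x measured on dataset d.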
5. Meta-learning
Evaluating algorithms
Adjusted ratio of ratios (ARR) multicriteria evaluation measure:
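Following Brazdil & Soares (2003) (reference 20), on which this measure is based, the adjusted ratio of ratios of algorithm a_p relative to algorithm a_q on dataset d_i is, approximately,

\[
ARR^{d_i}_{a_p, a_q} = \frac{SR^{d_i}_{a_p} / SR^{d_i}_{a_q}}{1 + AccD \cdot \log_{10}\!\left( T^{d_i}_{a_p} / T^{d_i}_{a_q} \right)}
\]

where SR denotes the success rate (accuracy) and T the run time of an algorithm on d_i.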
AccD – the amount of accuracy the user is willing to trade for a 10-fold speedup or slowdown
5. Meta-learning
Ranking algorithms
Calculate ARR for each algorithm
Rank them according to this measure
5. Meta-learning
Given a new dataset with certain data characteristics:
1. Find the k datasets most similar to the new problem dataset, using the k-NN algorithm on the stored data characteristics
2. Compute the ARR measure for all the algorithms on the k chosen datasets, using the time and accuracy recorded on those training datasets
3. Rank the algorithms according to ARR
(A sketch of this pipeline follows.)
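A minimal end-to-end Python sketch of this pipeline; the function, meta-features, performance numbers and the AccD default are our own and purely illustrative, and the distance and ARR follow the hedged definitions above:

```python
import numpy as np

def recommend(new_meta, meta_features, perf, acc_d=0.01, k=3):
    """Rank candidate algorithms for a new dataset with the k-NN + ARR scheme above.
    perf[name] holds one (accuracy, time) pair per stored dataset."""
    meta = np.asarray(meta_features, dtype=float)
    new = np.asarray(new_meta, dtype=float)
    # range-normalized L1 distance between the new dataset and each stored dataset
    span = meta.max(axis=0) - meta.min(axis=0)
    span[span == 0] = 1.0
    dist = (np.abs(meta - new) / span).sum(axis=1)
    neighbours = np.argsort(dist)[:k]              # the k most similar stored datasets
    algos = list(perf)
    scores = {}
    for a in algos:
        ratios = []
        for d in neighbours:
            acc_a, t_a = perf[a][d]
            for b in algos:
                if b == a:
                    continue
                acc_b, t_b = perf[b][d]
                # adjusted ratio of ratios of a versus b on stored dataset d
                ratios.append((acc_a / acc_b) / (1 + acc_d * np.log10(t_a / t_b)))
        scores[a] = float(np.mean(ratios))         # aggregated ARR for algorithm a
    return sorted(algos, key=scores.get, reverse=True)

# toy usage: four stored datasets described by two meta-features, two algorithms
meta_features = [[100, 5], [200, 8], [5000, 40], [120, 6]]
perf = {"C4.5": [(0.80, 0.1), (0.82, 0.2), (0.90, 3.0), (0.78, 0.1)],
        "k-NN": [(0.75, 0.5), (0.85, 0.9), (0.70, 20.0), (0.80, 0.4)]}
print(recommend([150, 7], meta_features, perf))    # ranking, best algorithm first
```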
5. Meta-learning
Promising approach, as it allows a quantitative measure
Needs more investigation, for example incorporating more parameters into the ARR measure
6. Conclusion
Done:
Overall mapping of five classifiers according to some user-specified parameters, along with some characteristics reflecting the type of dataset
Surveyed systems that automatically select the appropriate classifier, or rank different classifiers, given a dataset and a set of user preferences
6. Conclusion
Future research
Performing dedicated experiments
Analysis of the choice of feature selection
method and optimization of the parameters
Working on an intelligent system for choosing the classification method
7. References
1. Brazdil, P., & Soares, C. (2000). A comparison of ranking methods for classification algorithm selection. In R. de Mántaras & E. Plaza (Eds.), Machine Learning: Proceedings of the 11th European Conference on Machine Learning ECML 2000 (pp. 63–74). Berlin: Springer.
2. Brodley, C., & Utgoff, P. (1995). Multivariate decision trees. Machine Learning, 19.
3. Brown, D., Corruble, V., & Pittard, C. L. (1993). A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recognition, 26(6).
4. Curram, S., & Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45(4).
5. Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers.
6. Hilario, M., & Kalousis, A. (1999). Building algorithm profiles for prior model selection in knowledge discovery systems. In Proceedings of the IEEE SMC'99 International Conference on Systems, Man and Cybernetics. New York: IEEE Press.
7. Kalousis, A., & Theoharis, T. (1999). NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5), 319–337.
7. References
8. Keller, J., Paterson, I., & Berrer, H. (2000). An integrated concept for multi-criteria ranking of data-mining algorithms. In J. Keller & C. Giraud-Carrier (Eds.), Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination.
9. Kiang, M. (2003). A comparative assessment of classification methods. Decision Support Systems, 35.
10. Kononenko, I., & Bratko, I. (1991). Information-based evaluation criterion for classifier's performance. Machine Learning, 6.
11. Lim, T., Loh, W., & Shih, Y. (2000). A comparison of prediction accuracy, complexity and training time of thirty-three old and new classification algorithms. Machine Learning, 40.
12. Michie, D., Spiegelhalter, D., & Taylor, C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.
13. Mitchell, T. (1997). Machine Learning. New York: McGraw-Hill.
14. Quinlan, J. (1993). C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann Publishers.
7. References
15. Salzberg, S. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1.
16. Shavlik, J., Mooney, R., & Towell, G. (1991). Symbolic and neural learning algorithms: An experimental comparison. Machine Learning, 6.
17. Todorovski, L., & Džeroski, S. (2000). Combining multiple models with meta decision trees. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000) (pp. 54–64). New York: Springer.
18. Weiss, S., & Kulikowski, C. (1991). Computer Systems That Learn. San Francisco: Morgan Kaufmann Publishers.
19. Witten, I., & Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann Publishers.
20. Brazdil, P., & Soares, C. (2003). Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50, 251–277.
21. Blake, C., Keogh, E., & Merz, C. (1998). Repository of machine learning databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html
Thank you
Questions?