Distributed Database Management Systems
Evaluation of WEKA
Waikato Environment for Knowledge Analysis
Presented By:
Manoj Wartikar &
Sameer Sagade
WEKA
Outline
Introduction to the WEKA System.
Features
Pros and Cons
Enhancements
Introduction
A research project at the University of
Waikato, NZ
Weka is a collection of machine
learning algorithms for solving real-world data mining problems.
Developed in Java 2
Features
Documented features of WEKA
– Attribute Selection
– Clustering
– Classification
– Association Rules
– Filters
– Estimators
Attribute Selection
A part of the Preprocessing phase in the
Knowledge Discovery process.
Useful to specify the attributes and their
values on which data can be mined.
Attribute Selection (contd.)
Algorithms Implemented
– Best First
– Forward Selection
– Ranked Output First
Clustering
Algorithms Implemented
– Cobweb
– Expectation Maximization (EM)
– Clusterer
– Distribution Clusterer
Classification
Algorithms Implemented
– K Nearest Neighbor
– Naïve Bayes
– Bagging
– Boosting
– Multi-Class Classifier
Association Rules
Algorithms Implemented
– Apriori
Filters
Algorithms Implemented
– Attribute Filter
– Discretize Filter
– Split Dataset Filter
Estimators
Algorithms Implemented
– Discrete Estimator
– Kernel Estimator
– Normal Estimator
– Poisson Estimator
Sample Execution
java weka.associations.Apriori -t data/weather.nominal.arff -I yes
Apriori
=======
Minimum support: 0.2
Minimum confidence: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Sample Execution
Best rules found:
1. humidity=normal windy=FALSE 4 ==> play=yes 4 (1)
2. temperature=cool 4 ==> humidity=normal 4 (1)
3. outlook=overcast 4 ==> play=yes 4 (1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 (1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 (1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 (1)
7. outlook=sunny humidity=high 3 ==> play=no 3 (1)
8. outlook=sunny play=no 3 ==> humidity=high 3 (1)
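Each rule above is printed with the number of instances matching the premise, the number matching premise and consequence together, and the confidence in parentheses. As a minimal sketch of how those figures can be checked by hand, the Java below counts matches directly. It does not use the WEKA API: the class RuleCheck and its methods are hypothetical, and the 14 rows are an assumption, reproduced from the standard weather.nominal data set as distributed with WEKA.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of WEKA) for checking rule counts by hand.
final class RuleCheck {
    // Column order of the rows below.
    static final String[] ATTRS = {"outlook", "temperature", "humidity", "windy", "play"};

    // Assumed contents of data/weather.nominal.arff (14 instances).
    static final String[][] WEATHER = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    // Number of rows in which every attribute=value pair of the itemset holds.
    static int support(Map<String, String> itemset) {
        int count = 0;
        for (String[] row : WEATHER) {
            boolean matches = true;
            for (int i = 0; i < ATTRS.length; i++) {
                String wanted = itemset.get(ATTRS[i]);
                if (wanted != null && !wanted.equals(row[i])) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                count++;
            }
        }
        return count;
    }

    // Confidence = support(premise and consequence) / support(premise).
    // Yields NaN when no row matches the premise.
    static double confidence(Map<String, String> premise, Map<String, String> consequence) {
        Map<String, String> both = new HashMap<>(premise);
        both.putAll(consequence);
        return (double) support(both) / support(premise);
    }

    public static void main(String[] args) {
        // Rule 1: humidity=normal windy=FALSE ==> play=yes
        Map<String, String> premise = Map.of("humidity", "normal", "windy", "FALSE");
        Map<String, String> consequence = Map.of("play", "yes");
        System.out.println(support(premise) + " " + confidence(premise, consequence));
    }
}
```

Under these assumptions, rule 1 has a premise count of 4 and a confidence of 1, which agrees with the Apriori output. A candidate such as outlook=sunny ==> play=no reaches a confidence of only 0.6, below the 0.9 minimum, so it does not appear among the best rules.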
Boosting
AdaBoost
LogitBoost
Decision Stump
Pros and Cons of WEKA
Covers the Entire Machine Learning
Process
Easy to compare the results of the different
algorithms implemented
Accepts one of the most widely used data
formats as input, i.e., the ARFF format.
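An ARFF file is plain text: a header with an @relation line and one @attribute line per column, then an @data section with one comma-separated instance per line; lines starting with % are comments. The sketch below illustrates that layout with a deliberately simplified reader. It is not WEKA's own loader: the class MiniArff and the relation name in SAMPLE are hypothetical, and quoted names, missing values and sparse instances are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified ARFF reader for illustration only (not WEKA's loader).
final class MiniArff {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    // A small sample in ARFF layout; the relation name is made up.
    static final String SAMPLE = String.join("\n",
        "% comment lines start with a percent sign",
        "@relation weather",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature {hot, mild, cool}",
        "@attribute humidity {high, normal}",
        "@attribute windy {TRUE, FALSE}",
        "@attribute play {yes, no}",
        "@data",
        "sunny,hot,high,FALSE,no",
        "overcast,hot,high,FALSE,yes");

    static MiniArff parse(String text) {
        MiniArff arff = new MiniArff();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(Locale.ROOT); // keywords are case-insensitive
            if (inData) {
                // One instance per line, values separated by commas.
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim();
                }
                arff.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Keep only the attribute name; the type or value set follows it.
                String rest = line.substring("@attribute".length()).trim();
                arff.attributeNames.add(rest.split("\\s+", 2)[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        MiniArff arff = parse(SAMPLE);
        System.out.println(arff.relation + ": " + arff.attributeNames
            + ", " + arff.rows.size() + " instances");
    }
}
```

Because the header names and types every column, the same file can be fed unchanged to different algorithms, which is part of what makes comparing their results straightforward.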
Pros and Cons of WEKA (contd.)
Flexible APIs for programmers
Customization possible
Pros and Cons of WEKA (contd.)
Textual User Interface
Requires the Java Virtual Machine to
be installed for execution
Visualization of the mining results not
possible
Enhancements
The new version, WEKA 3.1.7, overcomes some
of the shortcomings of the previous version by adding:
– Graphical User Interface
– Visualization of results
– Mining of non-local databases