Distributed Database Management Systems


Evaluation of WEKA
Waikato Environment for Knowledge Analysis
Presented by: Manoj Wartikar & Sameer Sagade
WEKA
Outline

– Introduction to the WEKA system
– Features
– Pros and cons
– Enhancements
Introduction
– A research project at the University of Waikato, New Zealand
– A collection of machine learning algorithms for solving real-world data mining problems
– Developed in Java 2
Features
Documented features of WEKA:
– Attribute Selection
– Clustering
– Classification
– Association Rules
– Filters
– Estimators
Attribute Selection

– Part of the preprocessing phase of the knowledge discovery process
– Used to select the attributes (and their values) on which the data will be mined
Attribute Selection (contd.)

Algorithms Implemented
– Best First
– Forward Selection
– Ranked Output First
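Forward Selection, one of the search methods listed above, can be sketched as a greedy loop: start with no attributes, repeatedly add the single attribute that most improves a subset-evaluation score, and stop when no addition helps. This is a minimal illustration under stated assumptions, not WEKA's implementation: the class name is hypothetical, and the evaluator is a caller-supplied scoring function (higher is better) standing in for whatever subset evaluator is used in practice.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration class (not part of WEKA's API).
final class ForwardSelection {
    /**
     * Greedy forward selection over attribute indices 0..numAttributes-1.
     * The evaluator is an assumed, caller-supplied subset score (higher is better).
     */
    static Set<Integer> select(int numAttributes, ToDoubleFunction<Set<Integer>> evaluator) {
        Set<Integer> selected = new LinkedHashSet<>();
        double bestScore = evaluator.applyAsDouble(selected); // score of the empty subset
        while (true) {
            int bestAttr = -1;
            double bestCandidate = bestScore;
            // Try adding each unused attribute and remember the largest improvement.
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) continue;
                Set<Integer> candidate = new LinkedHashSet<>(selected);
                candidate.add(a);
                double score = evaluator.applyAsDouble(candidate);
                if (score > bestCandidate) {
                    bestCandidate = score;
                    bestAttr = a;
                }
            }
            if (bestAttr < 0) return selected; // no single addition improves the score: stop
            selected.add(bestAttr);
            bestScore = bestCandidate;
        }
    }

    public static void main(String[] args) {
        // Toy evaluator (hypothetical): attributes 1 and 3 help, every attribute costs 0.1.
        ToDoubleFunction<Set<Integer>> toy = s ->
            (s.contains(1) ? 1.0 : 0.0) + (s.contains(3) ? 0.5 : 0.0) - 0.1 * s.size();
        System.out.println(select(5, toy)); // prints [1, 3]
    }
}
```

With the toy evaluator, the search adds attribute 1, then attribute 3, and stops because every further attribute only adds cost.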
Clustering

Algorithms Implemented
– Cobweb
– Expectation Maximization (EM)
– Clusterer
– Distribution Clusterer
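The EM clusterer listed above alternates two steps: the E-step computes, for each instance, the probability (responsibility) that each cluster generated it, and the M-step re-estimates every cluster's parameters from those weighted instances. The sketch below assumes a single numeric attribute and exactly two Gaussian clusters with caller-chosen starting means; WEKA's EM clusterer is more general (several attributes and clusters), and the class and method names are hypothetical.

```java
// Hypothetical illustration class (not part of WEKA's API):
// EM for a two-component, one-dimensional Gaussian mixture.
final class EmSketch {
    // Density of a normal distribution with mean mu and variance var at x.
    static double gauss(double x, double mu, double var) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    /** Runs EM for a fixed number of iterations; returns {mu0, mu1, var0, var1, weight0}. */
    static double[] fit(double[] x, double mu0, double mu1, int iterations) {
        int n = x.length;
        double var0 = 1.0, var1 = 1.0, w0 = 0.5;
        double[] r = new double[n]; // r[i] = P(cluster 0 | x[i])
        for (int it = 0; it < iterations; it++) {
            // E-step: responsibilities under the current parameters.
            for (int i = 0; i < n; i++) {
                double p0 = w0 * gauss(x[i], mu0, var0);
                double p1 = (1 - w0) * gauss(x[i], mu1, var1);
                r[i] = p0 / (p0 + p1);
            }
            // M-step: responsibility-weighted means, variances and mixing weight.
            double n0 = 0, sum0 = 0, sum1 = 0;
            for (int i = 0; i < n; i++) {
                n0 += r[i];
                sum0 += r[i] * x[i];
                sum1 += (1 - r[i]) * x[i];
            }
            double n1 = n - n0;
            mu0 = sum0 / n0;
            mu1 = sum1 / n1;
            double sq0 = 0, sq1 = 0;
            for (int i = 0; i < n; i++) {
                sq0 += r[i] * (x[i] - mu0) * (x[i] - mu0);
                sq1 += (1 - r[i]) * (x[i] - mu1) * (x[i] - mu1);
            }
            // Floor the variances so a collapsing cluster cannot reach zero variance.
            var0 = Math.max(sq0 / n0, 1e-6);
            var1 = Math.max(sq1 / n1, 1e-6);
            w0 = n0 / n;
        }
        return new double[] {mu0, mu1, var0, var1, w0};
    }

    public static void main(String[] args) {
        double[] data = {0.8, 1.0, 1.2, 8.8, 9.0, 9.2};
        double[] p = fit(data, 0.0, 10.0, 20);
        System.out.printf("mu0=%.3f mu1=%.3f weight0=%.3f%n", p[0], p[1], p[4]);
    }
}
```

With the two well-separated groups in `main`, the means settle near 1.0 and 9.0 and the mixing weight near 0.5.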
Classification

Algorithms Implemented
– k-Nearest Neighbor
– Naïve Bayes
– Bagging
– Boosting
– Multi-Class Classifier
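k-Nearest Neighbor classification keeps the training instances and labels a new instance by majority vote among the k training instances closest to it. The sketch below assumes purely numeric attributes and plain Euclidean distance; WEKA's own nearest-neighbor classifier adds refinements that this illustration omits, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class (not part of WEKA's API).
final class KnnSketch {
    // Euclidean distance between two numeric attribute vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Predicts the label of query by majority vote among its k nearest training instances. */
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort training indices by their distance to the query.
        Arrays.sort(order, Comparator.comparingDouble(i -> distance(train[i], query)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            // On a tie, keep the label that reached the top count first (favoring closer neighbors).
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {8, 9}, {9, 8}};
        String[] labels = {"a", "a", "a", "b", "b", "b"};
        System.out.println(classify(train, labels, new double[] {2, 2}, 3)); // prints a
    }
}
```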
Association Rules

Algorithms Implemented
– Apriori
Filters

Algorithms Implemented
– Attribute Filter
– Discretize Filter
– Split Dataset Filter
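A discretize filter turns a numeric attribute into a nominal one by splitting its range into intervals. One simple way to choose the intervals is equal-width binning, sketched below as an assumption; WEKA's filter may choose its cut points differently, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration class (not part of WEKA's API): equal-width binning.
final class EqualWidthDiscretizer {
    /** Maps each value to a bin index in [0, bins) using equal-width intervals over [min, max]. */
    static int[] discretize(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0);
        double max = Arrays.stream(values).max().orElse(0);
        double width = (max - min) / bins;
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute (width 0) puts every value in bin 0;
            // the maximum value is clamped into the last bin.
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            result[i] = Math.min(bin, bins - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Range [1, 10] split into 3 intervals of width 3.
        double[] values = {1.0, 2.5, 4.0, 7.0, 10.0};
        System.out.println(Arrays.toString(discretize(values, 3))); // prints [0, 0, 1, 2, 2]
    }
}
```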
Estimators

Algorithms Implemented
– Discrete Estimator
– Kernel Estimator
– Normal Estimator
– Poisson Estimator
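The estimators are small probability models that other algorithms (Naïve Bayes, for example) use to turn an attribute value into a probability or density. The sketch below shows textbook versions of all four under simple assumptions: add-one (Laplace) counts for the discrete case, a normal density from the sample mean and (n − 1) standard deviation, a Poisson probability that uses the sample mean as its rate, and a kernel estimate that averages one Gaussian per observed value with a caller-chosen bandwidth h. It is not WEKA's estimator code, and the names are hypothetical.

```java
// Hypothetical illustration class (not part of WEKA's API).
final class EstimatorSketch {
    static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    /** Discrete estimate of P(symbol) from observed symbol indices, with add-one (Laplace) counts. */
    static double discrete(int[] symbols, int numSymbols, int symbol) {
        int count = 0;
        for (int s : symbols) if (s == symbol) count++;
        return (count + 1.0) / (symbols.length + numSymbols);
    }

    /** Normal density at x, using the sample mean and the (n - 1) sample standard deviation. */
    static double normal(double[] data, double x) {
        double m = mean(data), ss = 0;
        for (double v : data) ss += (v - m) * (v - m);
        double sd = Math.sqrt(ss / (data.length - 1));
        double z = (x - m) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** Poisson probability of count k, using the sample mean as the rate lambda. */
    static double poisson(double[] data, int k) {
        double lambda = mean(data);
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) p *= lambda / i; // builds lambda^k / k! incrementally
        return p;
    }

    /** Kernel density at x: the average of Gaussians with bandwidth h centred on each observation. */
    static double kernel(double[] data, double x, double h) {
        double sum = 0;
        for (double v : data) {
            double z = (x - v) / h;
            sum += Math.exp(-0.5 * z * z) / (h * Math.sqrt(2 * Math.PI));
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3};
        System.out.println(discrete(new int[] {0, 0, 1}, 3, 0)); // (2 + 1) / (3 + 3) = 0.5
        System.out.println(normal(data, 2));  // sample mean 2, sample sd 1: 1/sqrt(2*pi)
        System.out.println(poisson(data, 0)); // e^-2
        System.out.println(kernel(data, 2, 1.0));
    }
}
```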
Sample Execution
java weka.associations.Apriori -t data/weather.nominal.arff -I yes
Apriori
=======
Minimum support: 0.2
Minimum confidence: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Sample Execution
Best rules found:
1. humidity=normal windy=FALSE 4 ==> play=yes 4 (1)
2. temperature=cool 4 ==> humidity=normal 4 (1)
3. outlook=overcast 4 ==> play=yes 4 (1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 (1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 (1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 (1)
7. outlook=sunny humidity=high 3 ==> play=no 3 (1)
8. outlook=sunny play=no 3 ==> humidity=high 3 (1)
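Each rule above reads as: antecedent, the number of instances matching it, ==>, consequent, the number of instances matching the whole rule, and the confidence in parentheses. Confidence is the second count divided by the first, so rule 1 has confidence 4/4 = 1. The sketch below recomputes these numbers by brute force. It assumes the contents of data/weather.nominal.arff are the standard 14-instance weather data distributed with WEKA (the slides do not reproduce the file), and it also counts the single attribute=value items whose support reaches the minimum support of 0.2, i.e. at least 0.2 × 14 = 2.8 instances, which should match L(1): 12. It illustrates the support and confidence measures only; it does not implement Apriori's level-wise candidate generation, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class (not part of WEKA's API):
// support and confidence counts over the weather.nominal data.
final class RuleStats {
    static final List<String> ATTRS =
        Arrays.asList("outlook", "temperature", "humidity", "windy", "play");

    // Assumed contents of data/weather.nominal.arff (14 instances); the slides do not show the file.
    static final String[][] DATA = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    // True if the row contains every "attribute=value" item.
    static boolean matches(String[] row, String... items) {
        for (String item : items) {
            String[] kv = item.split("=");
            if (!row[ATTRS.indexOf(kv[0])].equals(kv[1])) return false;
        }
        return true;
    }

    // Support count: number of instances containing all the given items.
    static int support(String... items) {
        int n = 0;
        for (String[] row : DATA) if (matches(row, items)) n++;
        return n;
    }

    // Confidence of antecedent ==> consequent = support(both) / support(antecedent).
    static double confidence(String[] antecedent, String[] consequent) {
        String[] both = Arrays.copyOf(antecedent, antecedent.length + consequent.length);
        System.arraycopy(consequent, 0, both, antecedent.length, consequent.length);
        return (double) support(both) / support(antecedent);
    }

    // Number of single attribute=value items whose count is at least minSupport * (number of instances).
    static int largeOneItemsets(double minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] row : DATA)
            for (int c = 0; c < row.length; c++)
                counts.merge(ATTRS.get(c) + "=" + row[c], 1, Integer::sum);
        int large = 0;
        for (int count : counts.values()) if (count >= minSupport * DATA.length) large++;
        return large;
    }

    public static void main(String[] args) {
        String[] antecedent = {"humidity=normal", "windy=FALSE"};
        String[] consequent = {"play=yes"};
        System.out.println(support(antecedent) + " ==> "
            + support("humidity=normal", "windy=FALSE", "play=yes")
            + " conf " + confidence(antecedent, consequent)); // prints 4 ==> 4 conf 1.0
        System.out.println("L(1) = " + largeOneItemsets(0.2)); // prints L(1) = 12
    }
}
```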
Boosting

– AdaBoost
– LogitBoost
– Decision Stump (a one-level decision tree, commonly used as the base learner)
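AdaBoost builds its ensemble in rounds. After each round it measures the weighted error ε of the new base classifier (a decision stump, for example), gives that classifier a vote α = ½ ln((1 − ε)/ε), and raises the weights of the misclassified instances so the next round concentrates on them. The sketch below shows only this reweighting step, using the textbook discrete AdaBoost update rather than WEKA's exact code; the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration class (not part of WEKA's API): one AdaBoost reweighting step.
final class AdaBoostStep {
    /** Weighted error: total weight of the misclassified instances (weights assumed to sum to 1). */
    static double error(double[] w, boolean[] correct) {
        double e = 0;
        for (int i = 0; i < w.length; i++) if (!correct[i]) e += w[i];
        return e;
    }

    /** Vote given to a base classifier with weighted error e (assumes 0 < e < 0.5). */
    static double alpha(double e) {
        return 0.5 * Math.log((1 - e) / e);
    }

    /** Returns new normalized weights: misclassified instances scaled up, correct ones scaled down. */
    static double[] reweight(double[] w, boolean[] correct) {
        double a = alpha(error(w, correct));
        double[] next = new double[w.length];
        double total = 0;
        for (int i = 0; i < w.length; i++) {
            next[i] = w[i] * Math.exp(correct[i] ? -a : a);
            total += next[i];
        }
        for (int i = 0; i < w.length; i++) next[i] /= total;
        return next;
    }

    public static void main(String[] args) {
        double[] w = {0.25, 0.25, 0.25, 0.25};
        boolean[] correct = {true, true, true, false};
        // Weighted error is 0.25; afterwards the misclassified instance carries half the total weight.
        System.out.println(Arrays.toString(reweight(w, correct)));
    }
}
```

With four equally weighted instances and one mistake, ε = 0.25, and after reweighting the misclassified instance holds half of the total weight (0.5) while the other three hold 1/6 each.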
Pros and Cons of WEKA

Pros:
– Covers the entire machine learning process, from preprocessing to mining
– Easy to compare the results of the different implemented algorithms
– Accepts input in ARFF (Attribute-Relation File Format), a simple plain-text data format
Pros and Cons of WEKA

– Flexible APIs for programmers
– Customization is possible
Pros and Cons of WEKA
Cons:
– Text-based user interface only
– Requires a Java Virtual Machine (JVM) to be installed
– Mining results cannot be visualized
Enhancements
The new version, WEKA 3.1.7, overcomes some of the shortcomings of the previous version by adding:
– A graphical user interface
– Visualization of results
– Mining of non-local databases