WEKA



Mark Hall
Department of Computer Science,
University of Waikato, New Zealand


WEKA - what is it?
History
WEKA’s GUI, old and new
Impact
 Downloads
 Projects based on WEKA
The project today
Plans for the future
Weka: the bird
Copyright: Martin Kramer
7/17/2015
University of Waikato
2
Hamilton
WEKA: the software

Waikato Environment for Knowledge Analysis
Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
 Released under the GPL
Support for the whole process of experimental data mining
 Preparation of input data
 Statistical evaluation of learning schemes
 Visualization of input data and the result of learning
Used for education, research and applications
Complements “Data Mining” by Witten & Frank

Main Features

49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
15 attribute/subset evaluators + 10 search algorithms for feature selection
3 algorithms for finding association rules
3 graphical user interfaces
 “The Explorer” (exploratory data analysis)
 “The Experimenter” (experimental environment)
 “The KnowledgeFlow” (new process model inspired interface)

History

Project funded by the NZ government since 1993
 Develop state-of-the-art workbench of data mining tools
 Explore fielded applications
 Develop new fundamental methods

History - timeline

Late 1992 - funding was applied for by Ian Witten
1993 - development of the interface and infrastructure
 WEKA acronym coined by Geoff Holmes
 WEKA’s file format “ARFF” was created by Andrew Donkin
  ARFF was rumored to stand for Andrew’s Ridiculous File Format
Sometime in 1994 - first internal release of WEKA
 TCL/TK user interface + learning algorithms written mostly in C
 Very much beta software
  Changes for the b1 release included (among others):
  “Ambiguous and Unsupported menu commands removed.”
  “Crashing processes handled (in most cases :-)”
October 1996 - first public release of WEKA (v 2.1)

History - timeline

July 1997 - WEKA 2.2
 Schemes: 1R, T2, K*, M5, M5Class, IB1-4, FOIL, PEBLS, support for C5
 Included a facility (based on Unix makefiles) for configuring and running large-scale experiments
Early 1997 - decision was made to rewrite WEKA in Java
 Originated from code written by Eibe Frank for his PhD
 Originally codenamed JAWS (JAva Weka System)
May 1998 - WEKA 2.3
 Last release of the TCL/TK-based system
Mid 1999 - WEKA 3 (100% Java) released
 Version to complement the Data Mining book
 Development version (including GUI)

Back then…
Today:
Explorer: pre-processing the data

Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from an SQL database using JDBC
Pre-processing tools in WEKA are called “filters”
WEKA contains filters for:
 Discretization, normalization, resampling, attribute selection, attribute combination, …
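
To make the filter idea concrete, here is a minimal plain-Java sketch of two of the operations listed above: min-max normalization and equal-width discretization of a single numeric attribute. It does not use WEKA's own filter classes and is not how WEKA implements them; the class name `FilterSketch`, the method names, the toy temperature values and the choice of three bins are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical stand-in for two pre-processing steps,
// written without any WEKA dependency.
public class FilterSketch {

    // Rescale values linearly into [0, 1]; a constant column maps to all zeros.
    public static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Equal-width binning: split [min, max] into `bins` intervals
    // and return the bin index of each value.
    public static int[] discretize(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        double[] scaled = normalize(values);
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // The maximum value (scaled == 1.0) is clamped into the last bin.
            out[i] = Math.min((int) (scaled[i] * bins), bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temperature = {64, 68, 72, 80, 85}; // assumed toy data
        System.out.println(Arrays.toString(normalize(temperature)));
        System.out.println(Arrays.toString(discretize(temperature, 3))); // prints [0, 0, 1, 2, 2]
    }
}
```

Both methods return a new array and leave the input untouched, which mirrors the general idea of a filter as a transformation from one dataset to another.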
Explorer: Building classification models

“Classifiers” in WEKA are models for predicting nominal or numeric quantities
Implemented schemes include:
 Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
“Meta”-classifiers include:
 Bagging, boosting, stacking, error-correcting output codes, data cleansing, …
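
As a rough illustration of what a classifier predicting a nominal quantity does, the sketch below implements the simplest instance-based scheme, one-nearest-neighbour, in plain Java. It is not WEKA code and does not use WEKA's classifier classes; the class name `NearestNeighbourSketch`, the use of squared Euclidean distance, and the toy training data are assumptions chosen for the example.

```java
// Illustrative only: a hypothetical one-nearest-neighbour classifier
// written without any WEKA dependency.
public class NearestNeighbourSketch {

    private final double[][] trainX; // stored training instances
    private final String[] trainY;   // their nominal class labels

    public NearestNeighbourSketch(double[][] x, String[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("need one label per training instance");
        }
        this.trainX = x;
        this.trainY = y;
    }

    // Squared Euclidean distance; the square root is not needed for ranking.
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Predict the label of the closest stored instance (ties go to the earliest one).
    public String classify(double[] query) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double dist = squaredDistance(trainX[i], query);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return trainY[best];
    }

    public static void main(String[] args) {
        double[][] x = {{1.0, 1.0}, {1.5, 2.0}, {8.0, 9.0}, {9.0, 8.5}}; // assumed toy data
        String[] y = {"no", "no", "yes", "yes"};
        NearestNeighbourSketch model = new NearestNeighbourSketch(x, y);
        System.out.println(model.classify(new double[]{2.0, 1.0})); // prints "no"
        System.out.println(model.classify(new double[]{8.5, 8.0})); // prints "yes"
    }
}
```

"Training" here only stores the instances, and all the work happens at prediction time. A meta-classifier such as bagging would wrap a base learner like this one and combine the votes of several copies trained on resampled data.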
Explorer: classification

KnowledgeFlow: process flows

KnowledgeFlow: batch processing

KnowledgeFlow: incremental processing

Experimenter

Impact - downloads

Projects based on WEKA

Incorporate/wrap WEKA
 GRB Tool Shed - a tool to aid gamma ray burst research
 YALE - facility for large scale ML experiments
 GATE - NLP workbench with a WEKA interface
 Judge - document clustering and classification
Extend/modify WEKA
 BioWeka - extension library for knowledge discovery in biology
 WekaMetal - meta learning extension to WEKA
 Weka-Parallel - parallel processing for WEKA
 Grid Weka - grid computing using WEKA
 Weka-CG - computational genetics tool library

The WEKA Project Today

FRST funding for the next two years
Goal of the project remains the same
People
 6 staff
 2 postdocs
 3 PhD students
 3 MSc students
 2 research programmers

The Future

Continue to develop and support WEKA
MOA (Massive Online Analysis)
 Framework that supports learning from data streams
 Facilities for data generation, experimental analysis, learning algorithms, etc.
 The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct
 First public release, probably this Christmas, or perhaps Thanksgiving (as it’s just another turkey)
MILK
 Multi-Instance Learning Kit
Proper
 Propositionalization toolbox for WEKA