Machine Learning with WEKA

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand

• WEKA: A Machine Learning Toolkit
• The Explorer
   • Classification and Regression
   • Clustering
   • Association Rules
   • Attribute Selection
   • Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird
WEKA 3.3
Copyright: Martin Kramer
7/16/2015
University of Waikato
2
WEKA: the software
• Machine learning/data mining software written in
  Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
   • Comprehensive set of data pre-processing tools,
     learning algorithms and evaluation methods
   • Graphical user interfaces (incl. data visualization)
   • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
   • WEKA 3.0: “book version” compatible with
     description in data mining book
   • WEKA 3.2: “GUI version” adds graphical user
     interfaces (book version is command-line only)
   • WEKA 3.3: “development version” with lots of
     improvements
• This talk is based on the latest snapshot of WEKA 3.5.6
Launching WEKA (1)
Program
• LogWindow: Opens a log window that captures all
  that is printed to stdout or stderr. Useful for
  environments like MS Windows, where WEKA is
  normally not started from a terminal.
• Exit: Closes WEKA.
Launching WEKA (2)
Applications: Lists the main applications within WEKA.
• Explorer: An environment for exploring data with WEKA
  (the rest of this documentation deals with this application in
  more detail).
• Experimenter: An environment for performing experiments
  and conducting statistical tests between learning schemes.
• KnowledgeFlow: This environment supports essentially the
  same functions as the Explorer but with a drag-and-drop
  interface. One advantage is that it supports incremental
  learning.
• SimpleCLI: Provides a simple command-line interface that
  allows direct execution of WEKA commands for operating
  systems that do not provide their own command line
  interface.
Launching WEKA (3)
Tools: Other useful applications.
• ArffViewer: An MDI (“multiple document interface”)
  application for viewing ARFF files in spreadsheet
  format.
• SqlViewer: An SQL worksheet for querying
  databases via JDBC.
• EnsembleLibrary: An interface for generating
  setups for Ensemble Selection [5] (a contribution
  by Robert Jung and David Michael from Cornell
  University, Ithaca, NY, USA).
Launching WEKA (4)
Visualization: Ways of visualizing data with WEKA.
• Plot: For plotting a 2D plot of a dataset.
• ROC: Displays a previously saved ROC (receiver operating
  characteristic) curve (true positive rate vs false positive
  rate).
• TreeVisualizer: For displaying directed graphs, e.g., a
  decision tree.
• GraphVisualizer: Visualizes XML BIF or DOT format
  graphs, e.g., for Bayesian networks.
• BoundaryVisualizer: Allows the visualization of classifier
  decision boundaries in two dimensions.
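The ROC entry in the list above plots true positive rate against false positive rate. As a rough illustration of where the points of such a curve come from, here is a small Python sketch that is independent of WEKA. The function name `roc_points` and the sample scores are hypothetical, and WEKA's own implementation may differ in detail.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: predicted score for the positive class, one per instance
    labels: 1 for a positive instance, 0 for a negative instance
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    # Rank instances from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Instances with tied scores are counted together and give one point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


# Hypothetical scores for four instances, two of them positive.
example_points = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Each point corresponds to one threshold: every instance scoring at or above it is predicted positive.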
Launching WEKA (5)
Windows: All open windows are listed here.
• Minimize: Minimizes all current windows.
• Restore: Restores all minimized windows again.
Launching WEKA (6)
Help: Online resources for WEKA can be found here.
• Weka homepage: Opens a browser window with WEKA’s
  homepage.
• Online documentation: Directs to the WekaDoc Wiki.
• HOWTOs, code snippets, etc.: The general WekaWiki,
  containing lots of examples and HOWTOs around the
  development and use of WEKA.
• Weka on Sourceforge: WEKA’s project homepage on
  Sourceforge.net.
• SystemInfo: Lists some internals about the Java/WEKA
  environment, e.g., the CLASSPATH.
• About: The infamous “About” box.
Applications – WEKA Explorer
• Data can be imported from a file in various
  formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL
  database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
   • Discretization, normalization, resampling, attribute
     selection, transforming and combining attributes, …
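To make two of the filter types named above concrete, the following Python sketch shows the idea behind normalization (rescaling a numeric attribute to the range 0 to 1) and equal-width discretization (mapping numeric values to bin indices). It is only an illustration written for this transcript: the function names are hypothetical, and WEKA's actual filters are Java classes with their own options and defaults.

```python
def normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no range information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_equal_width(values, bins):
    """Replace each numeric value with the index of its equal-width bin (0 .. bins-1)."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical attribute values.
scaled = normalize([2.0, 4.0, 6.0])
binned = discretize_equal_width([0, 5, 10], 2)
```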
Section Tabs
• Preprocess. Choose and modify the data being
  acted on.
• Classify. Train and test learning schemes that
  classify or perform regression.
• Cluster. Learn clusters for the data.
• Associate. Learn association rules for the data.
• Select attributes. Select the most relevant
  attributes in the data.
• Visualize. View an interactive 2D plot of the data.
Status Box
• Right-clicking the mouse anywhere inside the
  status box brings up a little menu. The menu gives
  two options:
   • Memory information. Display in the log box the
     amount of memory available to WEKA.
   • Run garbage collector. Force the Java garbage
     collector to search for memory that is no longer
     needed and free it up, allowing more memory for
     new tasks. Note that the garbage collector is
     constantly running as a background task anyway.
Log Button
• Clicking on this button brings up a separate window
  containing a scrollable text field.
• Each line of text is stamped with the time it was entered
  into the log.
• As you perform actions in WEKA, the log keeps a record of
  what has happened.
• For people using the command line or the SimpleCLI, the
  log now also contains the full setup strings for
  classification, clustering, attribute selection, etc., so that it
  is possible to copy/paste them elsewhere.
WEKA Status Icon
• When no processes are running, the bird sits down and
  takes a nap.
• The number beside the × symbol gives the number of
  concurrent processes running.
• When the system is idle it is zero, but it increases as the
  number of processes increases.
• When any process is started, the bird gets up and starts
  moving around.
• If it’s standing but stops moving for a long time, it’s sick:
  something has gone wrong! In that case you should restart
  the WEKA Explorer.
Graphical output
• Most graphical displays in WEKA, e.g., the
  GraphVisualizer or the TreeVisualizer, support
  saving the output to a file.
• A dialog for saving the output can be brought up
  with Alt+Shift+left-click.
• Supported formats are currently Windows Bitmap,
  JPEG, PNG and EPS (Encapsulated PostScript).
• The dialog also allows you to specify the
  dimensions of the generated image.
WEKA deals with “flat” files
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
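The file above shows the three parts of an ARFF file: a relation name, attribute declarations (numeric, or nominal with a list of allowed values), and comma-separated data rows in which `?` marks a missing value. The Python sketch below reads exactly that subset. It is a hypothetical illustration, not WEKA's own loader, and it assumes no quoted values, string or date attributes, or sparse rows.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is the string
    "numeric" or a list of allowed nominal values. Missing values ("?")
    are returned as None.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            if len(cells) != len(attributes):
                raise ValueError("wrong number of values: " + line)
            row = []
            for cell, (name, kind) in zip(cells, attributes):
                if cell == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(cell))
                elif cell in kind:
                    row.append(cell)
                else:
                    raise ValueError("unknown value %r for %s" % (cell, name))
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError("unsupported attribute type: " + spec)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# The rows shown on the slide, without the trailing "..." line.
SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

The file is “flat” in the sense that every row has one value per declared attribute, which is why the sketch rejects rows of any other length.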
Simple examples - weather data
Simple examples - iris data
Simple examples – CPU performance data
Simple examples – Labor
Simple examples – soybean data
Conclusion: try it yourself!
• WEKA is available at
  http://www.cs.waikato.ac.nz/ml/weka
• The site also has a list of projects based on WEKA
• WEKA contributors:
  Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard
  Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger,
  Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg,
  Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert,
  Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy,
  Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang