Weka: An open-source tool for data analysis and mining with machine learning
Quantitative Data Analysis Colloquium
Centenary College of Louisiana
Mark Goadrich
4/17/2008
Regression lines and correlation
• Find relationship between two attributes
• Correlation coefficient
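As a rough illustration of the two ideas on this slide, the sketch below computes a Pearson correlation coefficient and a least-squares regression line using only the Python standard library. The function names and the sample measurements are made up for this example; Weka computes these quantities through its own interface.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def regression_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical measurements of two attributes, for illustration only
attribute_a = [1.0, 2.0, 3.0, 4.0, 5.0]
attribute_b = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(attribute_a, attribute_b)
slope, intercept = regression_line(attribute_a, attribute_b)
print("r = %.3f, line: y = %.2f * x + %.2f" % (r, slope, intercept))
```

A value of r near +1 or -1 means the points lie close to a straight line; a value near 0 means a line explains little of the relationship.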
Categorization
• Can we learn one category based on the others?
• This search for classification lines is called machine learning
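One very small instance of "searching for a classification line" is finding a single cut on one numeric attribute. The sketch below is a toy, not one of Weka's algorithms: the function name `best_threshold` and the data are invented here, and it assumes the category is `True` above the cut and `False` below it.

```python
def best_threshold(values, labels):
    """Find the cut on one numeric feature that misclassifies the fewest examples.

    Predicts True when value > cut. Returns (cut, errors), or None when
    all values are identical and no cut exists.
    """
    ordered = sorted(set(values))
    # Candidate cuts sit halfway between neighbouring distinct values
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for cut in candidates:
        errors = sum((v > cut) != label for v, label in zip(values, labels))
        if best is None or errors < best[1]:
            best = (cut, errors)
    return best

# Hypothetical measurements and categories, for illustration only
lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
is_large = [False, False, False, True, True, True]

print(best_threshold(lengths, is_large))  # (3.0, 0)
```

Real learners search over many attributes at once and over more flexible boundaries, but the loop of "propose a line, count the mistakes, keep the best" is the same idea.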
Data Sets
• House of Representatives Votes
• Labor Relations
• Iris (plant) Discrimination
• Breast Cancer
• Many more at http://archive.ics.uci.edu/ml/
• Table of Features
– Example is a row
– Features are discrete or continuous
Weka Time - Explore
• http://www.cs.waikato.ac.nz/ml/weka/
• Open Explorer
• Open Data File
– ARFF or CSV
• Visualize All
• Visualize Crosstabs
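To make the "table of features" and the ARFF format concrete, here is a minimal sketch that renders rows as ARFF text. The helper `to_arff`, the relation name, and the sample rows are assumptions made for this example. It handles only numeric and nominal attributes and does not quote names or values that contain spaces.

```python
def to_arff(relation, names, rows):
    """Render rows (lists of values) as ARFF text.

    A column is declared numeric if every value is an int or float,
    otherwise nominal with its sorted set of distinct values.
    """
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        if all(isinstance(v, (int, float)) for v in column):
            lines.append("@attribute %s numeric" % name)
        else:
            values = ",".join(sorted({str(v) for v in column}))
            lines.append("@attribute %s {%s}" % (name, values))
    lines += ["", "@data"]
    for row in rows:
        # Each example is one comma-separated row
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical rows: two continuous features and one discrete category
rows = [
    [5.1, 1.4, "setosa"],
    [7.0, 4.7, "versicolor"],
]
print(to_arff("iris_sample", ["sepal_length", "petal_length", "species"], rows))
```

The header declares each feature as continuous (`numeric`) or discrete (a list of allowed values in braces). A plain CSV file carries no such declarations.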
Discrete: Decision Trees
• Reduce confusion (entropy) in the data by drawing recursive lines
• Result is comprehensible to humans
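The "confusion" a decision tree reduces can be measured as Shannon entropy, and the benefit of a split as the drop in entropy (information gain). The sketch below illustrates that measure under stated assumptions: the feature names and votes are invented, and Weka's J48 (an implementation of C4.5) uses a related criterion, gain ratio, rather than this exact quantity.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one discrete feature.

    `rows` is a list of dicts mapping feature name to value.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the groups left after the split
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical voting records, for illustration only
rows = [
    {"budget": "y", "crime": "y"},
    {"budget": "y", "crime": "n"},
    {"budget": "n", "crime": "y"},
    {"budget": "n", "crime": "n"},
]
party = ["D", "D", "R", "R"]

for feature in ("budget", "crime"):
    print(feature, round(information_gain(rows, party, feature), 3))
```

In this toy table the `budget` vote separates the two parties perfectly, so it would be chosen for the first split. The `crime` vote tells us nothing about party. A tree repeats this choice recursively inside each group.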
Continuous: ANN and SVM
• Artificial Neural Networks simulate activating and thresholding neurons
• Support Vector Machines use a kernel to transform data to higher dimensions
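Both bullets can be shown with a toy example. The sketch below defines a single thresholding neuron and an explicit quadratic feature map, then shows that 1-D data a single cut cannot separate becomes separable after the map. All names and numbers are assumptions for illustration. A real SVM such as Weka's SMO works with kernel values rather than explicit maps, and it learns its weights instead of having them set by hand.

```python
def neuron(inputs, weights, bias):
    """Weighted sum followed by a hard threshold: fires (1) or stays silent (0)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def quadratic_map(x):
    """Explicit feature map for a 1-D input: x -> (x, x^2)."""
    return (x, x * x)

def quadratic_kernel(a, b):
    """Inner product of the two mapped points, computed without building them."""
    return a * b + (a * b) ** 2

# Hypothetical 1-D data: positive when far from zero, so no single cut on x works
points = [-2.0, -0.5, 0.5, 2.0]
labels = [1, 0, 0, 1]

# In the mapped space (x, x^2) the rule "x^2 - 1 > 0" is a straight line.
# These weights are hand-picked for the example, not learned.
weights, bias = (0.0, 1.0), -1.0
predictions = [neuron(quadratic_map(x), weights, bias) for x in points]
print(predictions == labels)

# The kernel matches the inner product in the higher-dimensional space
mapped = sum(p * q for p, q in zip(quadratic_map(2.0), quadratic_map(3.0)))
print(quadratic_kernel(2.0, 3.0) == mapped)
```

The kernel function gives the same number as mapping both points and taking their inner product, which is why an SVM can work in a higher-dimensional space without ever constructing it.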
Weka Time - Classify
• Choose Algorithm
– J48, Multilayer Perceptron, SMO
• Validate Learning
– Training set
– Cross validation
• Visualize output
– ROC Curves
– Precision-Recall Curves
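The validation and visualization steps above can be sketched without Weka. The code below splits example indices into folds for cross-validation and sweeps a threshold over classifier scores to produce the points behind ROC and precision-recall curves. The function names, scores, and labels are made up for this example. The sketch assumes distinct scores and at least one example of each class, and unlike Weka's cross-validation it neither shuffles nor stratifies the data.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; each example is tested exactly once."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test

def curve_points(scores, labels):
    """Sweep a threshold over the scores, highest first.

    Returns one (false_positive_rate, true_positive_rate, precision) tuple
    per example. The true positive rate is the same quantity as recall.
    Labels are 1 for positive and 0 for negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives, tp / (tp + fp)))
    return points

# Train on some folds, test on the held-out fold
for train, test in k_fold_indices(5, 2):
    print("train", train, "test", test)

# Hypothetical classifier scores and true labels, for illustration only
scores = [0.9, 0.8, 0.6, 0.3]
truth = [1, 0, 1, 0]
for fpr, tpr, precision in curve_points(scores, truth):
    print("FPR %.2f  TPR/recall %.2f  precision %.2f" % (fpr, tpr, precision))
```

Plotting true positive rate against false positive rate gives the ROC curve, and plotting precision against recall gives the precision-recall curve. Testing only on the training set tends to overstate accuracy, which is why held-out folds are used.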
Future Topics
• Clustering
– Number and makeup of categories unknown
• Relational Data
– Features are related within examples
– Features are related across examples