R - Shengdong Zhao
Application of Data Mining Techniques on Survey Data using R and Weka
Supunmali Ahangama
29/11/2013
Outline
Introduction to data mining in R
Introduction to data mining in Weka
Example
What is R?
Why Learn R?
R offers more analytical methods, and over 1000 add-on packages are now available
R is far more flexible in the type of data it can analyze
R’s procedures (functions) are open for you to see and modify
R is free
The Popularity of R is Growing Fast
#1 most used data mining tool (in both 2010 and 2011), up from #5 in 2007
An increasing number of data miners consider R their primary tool
#2 in primary tool rankings (in 2011), up from #7 in 2008
Reference: Rexer Analytics, 2011 Data Miner Survey Summary Report
Data Mining Software
Reference: Rexer Analytics, 2011 Data Miner Survey Summary Report
Graphical User Interface (GUI)
R Studio
R Commander
Rattle
Deducer
Revolution Analytics
Reference: Rexer Analytics, 2011 Data Miner Survey Summary Report
Rattle 2.6.26 - Installation
Start R (v3.0.2) and then run:
> install.packages("rattle")
> library("rattle")
> rattle()
Artificial Neural Network (ANN) package: neuralnet 1.32
Weka
Waikato Environment for Knowledge Analysis
A collection of machine learning algorithms and visualization tools
Written in Java
RWeka – An R interface for Weka
Data set (Y. Hayashi & R. Setiono 2010)
Aim: To discover factors that could be used to distinguish consumers who eat out frequently from those who do not.
The survey was conducted in major cities in Taiwan in 2003 among consumers aged 15 to 64 years
Target:
Class 1 – if the respondent’s eating-out frequency is less than 25 times per month on average
Class 2 – otherwise
Predictor variables – the respondent’s eating-out considerations and personal characteristics (socio-demographic and psychological information)
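The target definition above can be sketched in base R. This is a minimal illustration only: the variable names and the six frequency values are hypothetical and are not taken from the survey.

```r
# Hypothetical monthly eating-out counts for six respondents (not survey data).
eat_out_per_month <- c(4, 30, 25, 12, 24, 41)

# Class 1: fewer than 25 times per month on average; Class 2: otherwise.
eat_out_class <- factor(ifelse(eat_out_per_month < 25, 1, 2),
                        levels = c(1, 2),
                        labels = c("Class 1", "Class 2"))

# Count how many respondents fall into each class.
table(eat_out_class)
```

Note that a respondent with exactly 25 visits per month falls into Class 2, because the Class 1 condition is a strict "less than".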
Methodology: ANN
ANN - Artificial Neural Network
A set of connected input and output units in which each connection has a weight associated with it
The network learns by adjusting the weights so that it can predict the correct class label of the input tuples
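To make "learning by adjusting the weights" concrete, here is a minimal base R sketch of a network with one hidden layer trained by gradient descent. It is not the neuralnet or Weka implementation. The toy data, the number of hidden nodes, the learning rate and all names are assumptions chosen for illustration.

```r
set.seed(1)

sigmoid <- function(z) 1 / (1 + exp(-z))

# Toy data (hypothetical): two inputs and a binary class label.
x <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- c(0, 0, 0, 1)

n_hidden <- 3
w_hidden <- matrix(rnorm(2 * n_hidden, sd = 0.5), nrow = 2)  # input -> hidden
b_hidden <- rep(0, n_hidden)
w_out <- rnorm(n_hidden, sd = 0.5)                           # hidden -> output
b_out <- 0

# Forward pass: hidden activations h and predicted class probability p.
forward <- function(x) {
  h <- sigmoid(sweep(x %*% w_hidden, 2, b_hidden, "+"))
  list(h = h, p = as.vector(sigmoid(h %*% w_out + b_out)))
}

# Cross-entropy loss between predictions p and labels y.
loss <- function(p, y) -mean(y * log(p) + (1 - y) * log(1 - p))

initial_loss <- loss(forward(x)$p, y)
rate <- 0.5  # learning rate (assumed value)

for (epoch in 1:2000) {
  out <- forward(x)
  # Error signal at the output node, averaged over the training tuples.
  delta_out <- (out$p - y) / nrow(x)
  # Error signal propagated back to the hidden nodes.
  delta_hidden <- (delta_out %o% w_out) * out$h * (1 - out$h)
  # Adjust every weight against its gradient.
  w_out <- w_out - rate * as.vector(t(out$h) %*% delta_out)
  b_out <- b_out - rate * sum(delta_out)
  w_hidden <- w_hidden - rate * t(x) %*% delta_hidden
  b_hidden <- b_hidden - rate * colSums(delta_hidden)
}

final_loss <- loss(forward(x)$p, y)
c(initial = initial_loss, final = final_loss)
```

Comparing the two loss values shows the effect of the weight updates on the training tuples.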
ANN
Black box
Tools
Rattle GUI
Weka
Techniques for Ferreting Out Information from a Trained ANN
Sensitivity analysis
Probe the ANN with test inputs and record the outputs
Determine the impact or effect of an input variable on the output
Hold the other inputs at a fixed value (e.g. the mean or median) and vary only the input of interest while monitoring the change in the outputs
This gives a measure of the degree to which each input contributes to the output error
The larger the error, the larger the impact
Rule Extraction
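The sensitivity analysis procedure described above can be sketched in base R as follows. This is an illustration under stated assumptions: predict_ann is a hypothetical stand-in for a trained network, the predictor names and values are invented, and the impact is summarised as the spread of the output rather than the change in output error.

```r
# Stand-in for a trained network (hypothetical): any function that maps a
# data frame of inputs to a predicted probability could be used instead.
predict_ann <- function(inputs) {
  1 / (1 + exp(-(0.08 * inputs$age - 1.5 * inputs$price_sensitivity - 2)))
}

# Hypothetical predictor data (not the survey data).
survey <- data.frame(
  age = c(18, 25, 33, 41, 52, 60),
  price_sensitivity = c(0.2, 0.9, 0.4, 0.7, 0.1, 0.5)
)

sensitivity <- function(predict_fn, data, variable, n_points = 20) {
  # Hold every input at its mean value.
  baseline <- as.data.frame(lapply(data, mean))
  probe <- baseline[rep(1, n_points), , drop = FALSE]
  # Vary only the chosen input across its observed range.
  probe[[variable]] <- seq(min(data[[variable]]), max(data[[variable]]),
                           length.out = n_points)
  # Record the outputs and summarise the impact as their spread.
  output <- predict_fn(probe)
  max(output) - min(output)
}

impact <- sapply(names(survey),
                 function(v) sensitivity(predict_ann, survey, v))
sort(impact, decreasing = TRUE)
```

Using the median instead of the mean as the fixed value only requires replacing mean with median in the baseline line.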
Sensitivity Analysis
Relative Importance
Where
w_ji = weight from the ith input node to the jth hidden node
w_kj = weight from the jth hidden node to the kth output node.
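The relative importance formula itself is not reproduced in this transcript. As a hedged illustration only, one commonly cited weight-based measure, often attributed to Garson, combines the two weight sets defined above as sketched below. It is an assumption that the slide uses this measure or anything like it, and the weights in the example are hypothetical.

```r
# Hypothetical weights for 3 inputs, 2 hidden nodes and 1 output node.
# w_ji: rows are hidden nodes j, columns are input nodes i.
w_ji <- matrix(c( 0.8, -0.2, 0.1,
                 -0.4,  0.9, 0.3), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("x1", "x2", "x3")))
# w_kj: weight from each hidden node j to the single output node k.
w_kj <- c(1.5, -0.7)

relative_importance <- function(w_ji, w_kj) {
  # Share of each input in every hidden node's incoming weight magnitude.
  share <- abs(w_ji) / rowSums(abs(w_ji))
  # Scale each hidden node's shares by its outgoing weight magnitude,
  # then add up the contributions per input.
  contribution <- colSums(share * abs(w_kj))
  # Normalise so the importances sum to 1.
  contribution / sum(contribution)
}

ri <- relative_importance(w_ji, w_kj)
round(ri, 3)
```

Because the measure uses absolute weights, it ranks inputs by strength of influence but does not show whether an input pushes the output up or down.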
In a nutshell
Appreciation of R as a leading statistical tool
How the Rattle GUI and Weka can be used for data mining
How an ANN can be applied to a consumer behaviour study
Identification of the relationship between the predictors and the dependent variable through sensitivity analysis
Thank You.