R for data analysis and data mining_RUG

©UFS
R for Data Analysis and Data Mining
Jianping Liu
Mar 19, 2014
Outline
• R and RStudio installation
• Basics of R: data types and operators
• R for Statistical Analysis and Data Mining
What is R?
• “A language and environment for statistical computing and graphics”: a combination of a statistical package (interactive statistical analysis) and a programming language
• A dialect of the S language, which was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks beginning in the 1970s
• Runs on multiple platforms and various devices: MacOS, Windows, Linux, PC, iPhone …
• Frequent releases and bug fixes; active development
• Free
Installation of R and Resources online
• http://www.r-project.org/  # R download & installation
• http://cran.r-project.org/doc/manuals/R-intro.html
• http://www.rstudio.com/  # RStudio installation
• http://www.rseek.org/  # web-based R search
• http://www.ats.ucla.edu/stat/r/  # statistical analysis examples
• http://www.rdatamining.com/  # data mining examples
• http://www.coursera.org  # R Programming course, starting 4/7/2014
RStudio: an integrated development environment for R
The uses of R
• R may be used as a calculator
• R provides numerical or graphical summaries of data
• R has extensive graphical abilities
• R will handle a variety of specific analyses
• R is an interactive programming language
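A few of these uses can be sketched in a handful of lines. This is only an illustration: the built-in mtcars data set and the helper name center are choices made here, not part of the slides.

```r
# R as a calculator
calc <- (2 + 3) * 4 / 5
calc

# Numerical summary of a variable from the built-in mtcars data set
summary(mtcars$mpg)
avg_mpg <- mean(mtcars$mpg)

# R as a programming language: a small user-defined function
# ("center" is a hypothetical helper name chosen for this sketch)
center <- function(x) x - mean(x)
centered <- center(mtcars$mpg)
```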
• Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
• S Programming (Statistics and Computing) by Brian D. Ripley and William N. Venables (Springer)
Packages
• install.packages("name of the package")
• library(pkg)
• detach("package:pkg")
• update.packages()
Example:
install.packages("sos")
library(sos)
Alert: R is case sensitive
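The attach/detach cycle and the case-sensitivity warning can be tried without downloading anything. The sketch below assumes a standard R installation and uses splines, a package that ships with R, as a stand-in for a package installed from CRAN.

```r
# Attach a package that ships with R, then check the search path
library(splines)
attached <- "package:splines" %in% search()

# Detach it again; it no longer appears on the search path
detach("package:splines")
still_attached <- "package:splines" %in% search()

# Case sensitivity: library exists, Library does not
lower_ok <- exists("library")
upper_ok <- exists("Library")
```

Calling Library(sos) or Install.packages("sos") therefore fails with a "could not find function" error.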
Getting help and info
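• help(package = "sos")  # documentation for a package
• ?'&&'  # help page for an operator
• ??audit  # search the installed help pages for a term
• help.search("time series")  # same search, written as a function call
• library(sos)
• findFn("time series")  # search contributed packages (from package sos)
• example(data.frame)  # run the examples from a help page
• demo(lm.glm, package = "stats", ask = TRUE)  # run a packaged demo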
Data Types and Basic Operations
R has five “atomic” classes of Objects:
• Character
• Numeric (real numbers)
• Integer
• Complex
• Logical (TRUE/FALSE)
The most basic object is a vector
• A vector contains objects of the same class: c()
• A list can contain objects of various classes: list()
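The difference between the atomic classes, vectors and lists can be seen directly at the console. A minimal sketch; the variable names are made up for illustration.

```r
# One value of each atomic class
x_chr <- "hello"
x_num <- 3.14
x_int <- 7L       # the L suffix requests an integer
x_cpl <- 2 + 3i
x_lgl <- TRUE
classes <- sapply(list(x_chr, x_num, x_int, x_cpl, x_lgl), class)

# c() builds a vector; mixed inputs are coerced to a single class
mixed_vec <- c(1, "a", TRUE)
vec_class <- class(mixed_vec)

# list() keeps each element's own class
mixed_list <- list(1, "a", TRUE)
list_classes <- sapply(mixed_list, class)
```

Here mixed_vec becomes a character vector, while mixed_list keeps a numeric, a character and a logical element.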
Data Types and Basic Operations
Matrices are vectors with a dimension attribute.
• The dimension attribute is itself an integer vector of length 2
(nrow, ncol)
• Matrices are filled column-wise by default; specify byrow = TRUE to fill them row-wise
Factors are used to represent categorical data.
• Factors can be unordered or ordered.
• One can think of a factor as an integer vector where each
integer has a label.
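Both ideas are easy to check interactively. The following sketch uses made-up values; only matrix(), dim() and factor() from base R are involved.

```r
# Column-wise fill (the default) versus row-wise fill
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
dims <- dim(m_col)              # the dimension attribute: nrow, ncol

# A matrix is a vector plus a dimension attribute
v <- 1:6
dim(v) <- c(2, 3)
same <- identical(v, m_col)

# An ordered factor: integer codes with labels
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"),
            ordered = TRUE)
codes <- as.integer(f)          # position of each value in levels(f)
labels <- levels(f)
```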
Data Types and Basic Operations
Data frames are used to store tabular data
•They are fundamental to the use of the R modelling and graphics
functions
•They are represented as a special type of list where every element
of the list has to have the same length
•Unlike matrices, data frames can store different classes of objects in
each column (just like lists); matrices must have every element be
the same class
•Data frames are usually created by calling read.table() or read.csv()
•Can be converted to a matrix by calling data.matrix()
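As a small illustration of these points, the sketch below reads an invented three-row table from inline text (so that no file is needed) and converts part of it to a matrix. The column names and values are made up.

```r
# Inline CSV text stands in for a file on disk
csv_text <- "name,age,smoker
Ann,34,TRUE
Bob,51,FALSE
Cy,27,FALSE"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Each column keeps its own class
col_classes <- sapply(df, class)

# A data frame is a list whose elements all have the same length
is_list <- is.list(df)
col_lengths <- lengths(df)

# data.matrix() converts the columns to one numeric matrix
dm <- data.matrix(df[, c("age", "smoker")])
```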
R for Regression Analysis
• Regression analysis is the analysis of the relationship
between a response or outcome variable and another set
of variables
• The relationship is expressed through a statistical model
equation that predicts a response variable (also called a
dependent variable or criterion) from a function of
explanatory variables (also called independent variables,
predictors, factors, or carriers) and parameters
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
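A simple linear regression can be sketched with lm() from the stats package. The built-in cars data set (stopping distance versus speed) is chosen here purely for illustration; it is not taken from the slides.

```r
# Response: dist; explanatory variable: speed
fit <- lm(dist ~ speed, data = cars)

# Estimated parameters (intercept and slope)
coefs <- coef(fit)

# Proportion of variance explained by the model
r2 <- summary(fit)$r.squared

# Predict the response for new values of the explanatory variable
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
```

summary(fit) prints the full table of estimates, standard errors and p-values.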
R for Time Series Analysis
• Introductory Time Series with R
http://elena.aut.ac.nz/~pcowpert/ts/#RScripts
• Time Series Analysis and Its Applications: With R Examples (3rd ed) by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2011 (package: astsa)
http://www.stat.pitt.edu/stoffer/tsa3/
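For a first look at time series objects, base R is enough. The sketch below uses the built-in AirPassengers series rather than the astsa package, so it is an assumption-laden starting point and not an example from either book.

```r
# Monthly airline passenger counts shipped with R
ap <- AirPassengers
freq <- frequency(ap)       # observations per year
first_obs <- start(ap)      # year and month of the first observation

# Split the series into trend, seasonal and random components
parts <- decompose(ap, type = "multiplicative")

# Autocorrelations up to lag 24, computed without drawing the plot
acf_vals <- acf(ap, lag.max = 24, plot = FALSE)
```

plot(parts) and plot(acf_vals) display the decomposition and the correlogram.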
R Reference Card
Data Mining with Rattle
# to install package rattle and load the GUI
install.packages("rattle", dependencies = c("Depends", "Suggests"))
library(rattle)
rattle()
• Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) by Graham Williams
• http://www.r-project.org/doc/bib/R-books.html
Drawbacks of R
•Limited support for dynamic or interactive graphics
•Objects must generally be stored in physical memory
•Functionality is based on consumer demand and user contributions
•Not ideal for all situations
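The memory point can be made concrete with object.size(), which reports how much memory an object occupies. A rough sketch; the vector length is arbitrary.

```r
# One million doubles take about 8 bytes each
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))

# The same size in a human-readable unit
size_label <- format(object.size(x), units = "MB")
```

A data set with many such columns has to fit in RAM all at once, which is what limits the size of data R can handle comfortably.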
Thank you!