Introduction to R

A. Di Bucchianico
Types of statistical software
• command-line software
– requires knowledge of syntax of commands
– reproducible results through scripts
– detailed analyses possible
• GUI-based software
– does not require knowledge of commands
– not reproducible actions
• hybrid types (both command-line and GUI)
Well-known statistical software
• SAS
• SPSS
• Minitab
• Statgraphics
• S-Plus
• R
• …
R
• free
• language almost the same as S
• maintained by top quality experts
• available on all platforms
• continuous improvement
Available through www.r-project.org
Contents
• Basic operations
• Data creation + I/O
• Component extraction
• Plots
• Basic statistics
• Libraries
• Regression analysis
• Survival analysis
Basic operations
• assignment operation: a <- 2+sqrt(5)
• help function:
– help(pnorm)
– help.search("normal distribution")
• probability functions:
– d (density): dgamma(x, n, rate)
– p (probability = cdf): pweibull(x, 3, 2)
– q (quantile): qnorm(0.95)
– r (random numbers): rexp(10, rate)
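The d/p/q/r naming scheme above can be tried out directly. This is a minimal sketch: the parameter values (shape 2, rate 1, and so on) and the seed are arbitrary choices for illustration and are not taken from the slides.

```r
# Assignment: store the value of an expression in a variable
a <- 2 + sqrt(5)

# d: density of a Gamma(shape = 2, rate = 1) distribution at x = 1.5
dens <- dgamma(1.5, shape = 2, rate = 1)

# p: cumulative distribution function of a Weibull(shape = 3, scale = 2) at x = 1
prob <- pweibull(1, shape = 3, scale = 2)

# q: 95% quantile of the standard normal distribution
q95 <- qnorm(0.95)

# r: ten random draws from an exponential distribution with rate 2
set.seed(1)                  # arbitrary seed, only to make the draws repeatable
draws <- rexp(10, rate = 2)

# p and q are inverse to each other: this returns 0.95 (up to rounding)
pnorm(q95)
```

The same prefixes work for the other built-in distributions, for example dbinom, pbinom, qbinom and rbinom for the binomial distribution.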
Data creation + I/O
• create
– vectors: c(1,2,3)
– matrices: matrix(c(1,2,3,4,5,6),2,3,byrow=TRUE) (2 = number of rows, 3 = number of columns)
– lists: list(1,"a",TRUE)
• patterns:
– ":" operator: 1:3 gives 1, 2, 3
– seq: seq(1,3,by=1) gives 1, 2, 3
• working directories and files:
– setwd
– getwd
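– attach (adds a data frame or saved workspace file to the search path)
• read data
– from file: read.table("file.txt",header=TRUE)
– from web: read.table also accepts a URL in place of the file name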
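The creation and input commands above can be combined into a short session. This is a sketch under assumptions: the column names age and weight and the temporary file are made up for illustration, and the temporary file stands in for "file.txt".

```r
# Vector, matrix (filled row by row) and list
v <- c(1, 2, 3)
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
l <- list(numbers = v, label = "example")

# Two ways to generate the pattern 1, 2, 3
p1 <- 1:3
p2 <- seq(1, 3, by = 1)

# Current working directory (setwd("some/path") would change it)
wd <- getwd()

# Write a small table to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")   # hypothetical file, stands in for "file.txt"
write.table(data.frame(age = c(23, 35, 41), weight = c(70, 82, 65)),
            file = tmp, row.names = FALSE)
d <- read.table(tmp, header = TRUE)
```

With header=TRUE the first line of the file supplies the column names, so d has the columns age and weight.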
Component extraction
• d[r,]: rth row of object d
• d[,c]: cth column of object d
• d[r,c]: entry in row r and column c of object d
• length(d): length of d
• d[d<20]: extract all elements of d that are smaller than 20
• d["age"]: extract column "age" from object d
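The indexing forms listed above might look as follows on a small data frame. The data frame and its column names are hypothetical and serve only as an illustration.

```r
# Hypothetical data frame used only for illustration
d <- data.frame(age = c(18, 25, 42, 19), height = c(170, 182, 165, 176))

d[2, ]        # second row
d[, 1]        # first column, returned as a vector
d[3, 2]       # entry in row 3, column 2
d["age"]      # column "age", returned as a one-column data frame
d$age         # column "age", returned as a vector

# Logical indexing on a vector: keep the elements smaller than 20
ages <- d$age
young <- ages[ages < 20]
length(young) # number of elements kept
```

Note that for a data frame length(d) returns the number of columns, while for a vector it returns the number of elements.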
Plots
• plot: both 1D and 2D plots
• hist: histogram
• qqnorm: normal probability plot ("quantile-quantile" plot)
Save graphics by choosing File -> Save as
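The three plotting commands can also be run from a script, with the graphics saved by code rather than through the menu. This is a sketch on simulated data; the output file name and the use of the pdf device are assumptions for illustration.

```r
set.seed(42)                      # arbitrary seed
x <- rnorm(100)                   # simulated data, for illustration only

out <- file.path(tempdir(), "plots.pdf")   # hypothetical output file
pdf(out, width = 9, height = 3)   # open a PDF graphics device
par(mfrow = c(1, 3))              # three panels side by side
plot(x)                           # values plotted against their index
hist(x)                           # histogram
qqnorm(x)                         # normal probability plot
qqline(x)                         # reference line for the normal plot
dev.off()                         # close the device, which writes the file
```

Saving through a graphics device keeps the figure reproducible, in line with the script-based workflow described at the start.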
Basic statistics
• summary
• mean
• sd (standard deviation)
• t.test
• boxplot
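Applied to a small sample, the commands above could be used as follows. The measurements and the hypothesized mean of 4 are made up for illustration.

```r
# Hypothetical measurements
x <- c(4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7)

summary(x)                # minimum, quartiles, mean, maximum
mean(x)                   # sample mean
sd(x)                     # sample standard deviation

res <- t.test(x, mu = 4)  # two-sided one-sample t-test of H0: mu = 4
res$p.value               # p-value of the test
res$conf.int              # 95% confidence interval for the mean

boxplot(x)                # box-and-whisker plot of the sample
```

The object returned by t.test is a list, so individual results such as the p-value can be extracted with the $ operator.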
Packages
• specialized functions available through packages and libraries
• in the Windows interface choose Packages -> Load package..., or call library(name)
• examples of packages:
– qcc (quality control)
– survival
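From the command line a package can be loaded as sketched below. The example assumes the survival package is installed; the check with requireNamespace is one possible way to avoid an error if it is not.

```r
# install.packages("survival")   # one-time installation from CRAN, if needed

if (requireNamespace("survival", quietly = TRUE)) {
  library(survival)              # attach the package for this session
  loaded <- "package:survival" %in% search()
} else {
  loaded <- FALSE                # package not installed on this machine
}
loaded
```

Once a package is attached, help(package = "survival") lists its functions.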
Functions
Analyses that have to be performed often can be put in the form of functions.
Example:
simple <- function(data, mean = 0, alpha = 0.05) {
  hist(data)
  t.test(data, conf.level = 1 - alpha, mu = mean, alternative = "two.sided")
}
simple(data,4) uses the default value alpha = 0.05 and tests the null hypothesis mu=4.
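A sketch of how such a function might be called is given here. The function is written out again so that the block runs on its own, with conf.level set to 1 - alpha; the simulated sample and the seed are assumptions for illustration.

```r
# Histogram plus two-sided one-sample t-test, as in the example above
simple <- function(data, mean = 0, alpha = 0.05) {
  hist(data)
  t.test(data, conf.level = 1 - alpha, mu = mean, alternative = "two.sided")
}

set.seed(123)                          # arbitrary seed
data <- rnorm(30, mean = 4, sd = 1)    # simulated sample, for illustration only

res1 <- simple(data, 4)                # alpha keeps its default value 0.05
res2 <- simple(data, 4, alpha = 0.01)  # override the default: 99% interval

attr(res1$conf.int, "conf.level")      # confidence level used in the first call
```

Arguments with default values can be left out of the call, and naming an argument (alpha = 0.01) makes the call independent of the argument order.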
Regression analysis
• general command: lm (linear model)
• requires data to be available in the form of a data frame
– more general than a matrix because columns need not have the same type
– use command data.frame for conversion
• other types of regression also possible (see also dedicated packages)
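A simple linear regression with lm could be set up as follows. The data are simulated from a line with intercept 3 and slope 2; these values, the noise level and the seed are assumptions chosen for illustration.

```r
# Simulated data: y depends linearly on x plus random noise
set.seed(1)                            # arbitrary seed
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 0.5)

d <- data.frame(x = x, y = y)          # put the variables into a data frame

fit <- lm(y ~ x, data = d)             # fit the model y = b0 + b1 * x
coef(fit)                              # estimated intercept and slope
summary(fit)                           # standard errors, t-tests, R-squared
```

The formula y ~ x names the response on the left and the predictor on the right; further predictors are added with +, as in y ~ x1 + x2.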
Survival analysis
• through the function Surv of the package survival
• Cox proportional hazards: coxph
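A Cox model might be fitted as sketched below. The example assumes the survival package is installed and uses its bundled lung data set; the choice of age and sex as covariates is only for illustration.

```r
library(survival)

# lung: example data set shipped with the survival package
# time = follow-up time in days, status = censoring indicator
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

coef(fit)      # estimated log hazard ratios
summary(fit)   # hazard ratios, confidence intervals and tests
```

Surv combines the follow-up time and the event indicator into a single response object, which is then used on the left-hand side of the model formula.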
Useful web sites
• www.r-project.org
• http://cran.r-project.org/doc/contrib/Short-refcard.pdf
• http://www.uni-muenster.de/ZIV/Mitarbeiter/BennoSueselbeck/shtml/shelp.html
• http://www.maths.lth.se/help/R/
• http://www.mas.ncl.ac.uk/~ndjw1/teaching/sim/Rintro.html
• http://stats.math.uni-augsburg.de/JGR/
• http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/index.html