Introduction to Programming in R
Department of Statistical Sciences and Operations Research
Computation Seminar Series
Speaker: Edward Boone
Email: [email protected]
What is R?
R is a free, open-source statistical programming
language based on the S language developed at
Bell Labs.
The language is very powerful for writing programs.
Many statistical functions are already built in.
Contributed packages extend the functionality to
cutting-edge research methods.
Because R is a programming language, you complete
tasks by writing code.
Getting Started
Where to get R?
Go to www.r-project.org
Downloads: CRAN
Set your Mirror: Anyone in the USA is fine.
Select Windows 95 or later.
Select base.
Select R-2.4.1-win32.exe
The others are if you are a developer and wish to
change the source code.
Getting Started
The R GUI?
Getting Started
Opening a script.
This gives you a script window.
Getting Started
Submitting a program:
Use the Submit Selection button, or
Right mouse click and run selection.
Getting Started
Basic assignment and operations.
Arithmetic Operations:
+, -, *, /, ^ are the standard arithmetic operators.
Matrix Arithmetic:
* is element-wise multiplication
%*% is matrix multiplication
Assignment:
To assign a value to a variable use "<-"
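A minimal sketch of the difference between * and %*%, using two small matrices. The names A, B, elementwise and matprod are made up for this illustration and are not part of the seminar code.

```r
# Assign values with <- (A and B are illustrative matrices only)
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: rows (1,3) and (2,4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)   # swaps the two coordinates

elementwise <- A * B     # multiplies matching entries
matprod     <- A %*% B   # row-by-column matrix product

elementwise
matprod
2^3 + 10 / 4             # ordinary scalar arithmetic
```

The two results differ: * keeps only the entries where B is 1, while %*% swaps the columns of A.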
Getting Started
How to use help in R?
R has a very good help system built in.
If you know which function you want help with,
simply use ?_______ with the function name in the
blank.
Ex: ?hist
If you don't know which function to use, then use
help.search("_______").
Ex: help.search("histogram")
Importing Data
How do we get data into R?
Remember we have no point and click…
First make sure your data is in an easy to
read format such as CSV (Comma Separated
Values).
Use code:
D <- read.table("path", sep=",", header=TRUE)
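As a self-contained sketch of the import step, the example below first writes a tiny CSV to a temporary file and then reads it back. The column names Gender and Score and the values are invented for illustration; with real data you would pass the path to your own file.

```r
# Write a small CSV to a temporary file so the example runs anywhere.
# The columns Gender and Score are made up for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Score", "M,81", "F,92", "M,77"), path)

# header = TRUE tells R that the first line holds the column names
D <- read.table(path, sep = ",", header = TRUE)

str(D)     # column names and types
head(D)    # first few rows
```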
Working with data.
Accessing columns.
D has our data in it…. But you can’t see it
directly.
To select a column use D$column.
Working with data.
Subsetting data.
Use a logical (comparison) operator to do this.
==, !=, >, <, <=, >= are the comparison operators.
Note that the "equals" operator is two = signs and "not equal" is !=.
Example:
D[D$Gender == "M",]
This will return the rows of D where Gender is "M".
Remember R is case sensitive!
This code does nothing to the original dataset.
D.M <- D[D$Gender == "M",] gives a dataset with the
appropriate rows.
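The subsetting idea can be sketched on a small made-up data frame. The columns Gender and Score and the object D.hi are assumptions for this example only.

```r
# A small illustrative data frame (values are invented)
D <- data.frame(Gender = c("M", "F", "M", "F"),
                Score  = c(81, 92, 77, 68))

D.M  <- D[D$Gender == "M", ]                  # rows where Gender is "M"
D.hi <- D[D$Gender == "M" & D$Score > 80, ]   # combine conditions with &

D.M
D.hi
nrow(D)   # the original data frame is unchanged
```

The comma inside the brackets matters: the condition selects rows, and the empty slot after the comma keeps all columns.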
Creating a Vector
To create a vector use the c() function
b <- c(3,1,0.3,0.1)
This creates the column vector
$b = (3, 1, 0.3, 0.1)'$
Random Number Generation
Random number generation is important
in simulations as well as some model
fitting techniques.
Consider:
X1 <- rnorm(100,5,2)
This generates a vector of 100 normal
random variables with mean 5 and
standard deviation 2.
Random Number Generation
Generate two more vectors:
X2 <- rnorm(100,15,3)
X3 <- rnorm(100,22,5)
This gives us two more vectors of
normally distributed values.
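Random draws change every time the code is run. A common way to make a simulation repeatable is set.seed(); the sketch below assumes an arbitrary seed value and the illustrative names first and second.

```r
set.seed(2024)                  # the seed value itself is arbitrary
first  <- rnorm(100, 5, 2)

set.seed(2024)                  # resetting the seed replays the same draws
second <- rnorm(100, 5, 2)

identical(first, second)        # same seed, same numbers
mean(first)                     # should be near 5
sd(first)                       # should be near 2
```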
Determining the Size of a Vector
Use the length function.
n1 <- length(X1)
Use this only for vectors. On a matrix,
length() returns the total number of
elements; use nrow(), ncol() or dim() instead.
Creating a Vector of Repeated Values
Often we want a vector of ones around.
Use the rep() function.
ones <- rep(1,n1)
This creates a vector of ones of length
n1.
Creating a Matrix from Vectors
Use the cbind() function.
X <- cbind(ones,X1,X2,X3)
This binds the column vectors together into
a matrix.
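The steps above (length, rep, cbind) can be sketched together, which also shows why length() should be used with care on a matrix. The seed is an arbitrary choice and only one predictor is used to keep the example short.

```r
set.seed(1)                   # arbitrary seed, only for reproducibility
X1   <- rnorm(100, 5, 2)
n1   <- length(X1)            # number of elements in the vector
ones <- rep(1, n1)            # vector of ones of the same length

X <- cbind(ones, X1)          # columns take the names of the vectors

length(X1)    # number of elements in the vector
length(X)     # total number of entries in the matrix, not the number of rows
dim(X)        # rows and columns
colnames(X)
```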
Create a Regression Relationship
Using our randomly generated data create a
regression relationship.
$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, I)$
Use the code:
Y <- X%*%b + rnorm(100,0,1)
Estimate a Regression Model
Find the normal equations
$X'X\hat{\beta} = X'Y$
Use the code
XtX <- t(X)%*%X
XtY <- t(X)%*%Y
Solve the normal equations
To estimate the regression parameters solve the
normal equations.
$\hat{\beta} = (X'X)^{-1} X'Y$
Use the following code.
bhat <- solve(XtX)%*%XtY
Check it
bhat
lm(Y ~ X1 + X2 + X3)
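The check can be made explicit by comparing the two sets of estimates numerically. This is a sketch that regenerates the data so it runs on its own; the seed and the names n and fit are assumptions, and Y is stored as a plain vector here so that coef() returns a plain vector.

```r
set.seed(42)                              # arbitrary seed
n  <- 100
X1 <- rnorm(n, 5, 2)
X2 <- rnorm(n, 15, 3)
X3 <- rnorm(n, 22, 5)
X  <- cbind(ones = rep(1, n), X1, X2, X3)
b  <- c(3, 1, 0.3, 0.1)
Y  <- as.vector(X %*% b + rnorm(n, 0, 1))

# Normal equations solution
bhat <- solve(t(X) %*% X) %*% t(X) %*% Y

# Built-in least squares fit (lm adds the intercept itself)
fit <- lm(Y ~ X1 + X2 + X3)

cbind(normal_eq = as.vector(bhat), lm = as.vector(coef(fit)))
all.equal(as.vector(bhat), as.vector(coef(fit)), tolerance = 1e-6)
```

The two columns should agree up to rounding error, and both should be reasonably close to the true b.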
Create a Regression Function
Use the function() format
reg1 <- function(Y,X){
res <- solve(t(X)%*%X)%*%t(X)%*%Y
return(res)
}
Don’t forget to return the result.
Remember the code in braces is the function.
Try the function
Use the data already created.
reg1(Y,X)
Add to the function
Use the list function to return more than one
result. Essentially, you are adding components
to the object that reg2 returns.
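reg2 <- function(Y,X){
  # least squares coefficients from the normal equations
  coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
  # residuals: observed minus fitted values
  resid <- Y - X%*%coeff
  # X already contains the column of ones, so the residual df is n - length(coeff)
  mse <- t(resid)%*%resid/(length(Y)-length(coeff))
  res <- list(coeff,resid,mse)
  return(res)
}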
Try the function
Use the data already created.
reg2(Y,X)
Add names to the function properties
The names function allows you to name
the components of the returned list.
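reg3 <- function(Y,X){
  # least squares coefficients from the normal equations
  coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
  # residuals: observed minus fitted values
  resid <- Y - X%*%coeff
  # X already contains the column of ones, so the residual df is n - length(coeff)
  mse <- t(resid)%*%resid/(length(Y)-length(coeff))
  res <- list(coeff,resid,mse)
  # name the components so they can be accessed with $
  names(res) <- c('coeff','residuals','mse')
  return(res)
}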
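Once the list has names, each component can be pulled out with $ or [[ ]]. The sketch below uses a trimmed, hypothetical helper called fit_ols (coefficients and residuals only) and invented data, so it runs on its own; it is not the seminar's reg3.

```r
# Hypothetical helper, trimmed to two components to keep the sketch short
fit_ols <- function(Y, X) {
  coeff <- solve(t(X) %*% X) %*% t(X) %*% Y
  resid <- Y - X %*% coeff
  res <- list(coeff, resid)
  names(res) <- c("coeff", "residuals")
  return(res)
}

set.seed(7)                                   # arbitrary seed
X <- cbind(ones = rep(1, 20), X1 = rnorm(20))
Y <- X %*% c(2, 0.5) + rnorm(20)

out <- fit_ols(Y, X)
names(out)                 # the component names
out$coeff                  # access by name with $
out[["residuals"]][1:3]    # or with [[ ]]
sum(out$residuals)         # essentially zero because X has a column of ones
```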
Programming Goal: PRESS
PRESS will give us the ability to
demonstrate basic programming
constructs in an application.
Matrix Operations
Creating Functions
Loops
Data subsetting and storage
Programming Goal: PRESS
PRESS is the prediction sum of squares of a
regression model. It is computed via:
$\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2$
where $\hat{y}_{(i)}$ is the predicted value of $y_i$ using a model
fit with all of the data except observation i.
Loops
To construct a for loop use the following structure
for(i in 1:n){
Operations…
}
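A minimal sketch of the loop structure, filling a preallocated vector one element at a time. The names n and squares are illustrative only.

```r
n <- 5
squares <- rep(0, n)        # preallocate storage before the loop

for (i in 1:n) {
  squares[i] <- i^2         # store the result for iteration i
}

squares
```

When n could be 0, seq_len(n) is a safer range than 1:n, because 1:0 counts down through 1 and 0 instead of skipping the loop.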
PRESS
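PRESS <- function(Y,X){
  n1 <- length(Y)
  ind1 <- 1:n1
  presshold <- rep(0,n1)          # storage for the squared prediction errors
  for(i in 1:n1){
    X1 <- X[ind1 != i,]           # training data: every row except i
    Y1 <- Y[ind1 != i]
    coef1 <- reg3(Y1,X1)$coeff
    X2 <- X[ind1==i,]             # the held-out observation
    Y2 <- Y[ind1==i]              # no trailing comma, so Y may be a vector or a one-column matrix
    Yp <- X2%*%coef1              # prediction for the held-out observation
    presshold[i] <- (Y2 - Yp)^2
  }
  res <- sum(presshold)           # PRESS is defined as a sum over the observations
  return(res)
}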
Try the function
Use the data already created.
PRESS(Y,X)
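One way to sanity-check a leave-one-out loop is the standard least squares identity $y_i - \hat{y}_{(i)} = e_i / (1 - h_{ii})$, where $e_i$ is the ordinary residual and $h_{ii}$ is the i-th diagonal entry of the hat matrix $H = X(X'X)^{-1}X'$. The sketch below is not part of the seminar code; the data, the seed and the names loo, H, e and shortcut are assumptions made for illustration.

```r
set.seed(123)                                  # arbitrary seed
n <- 50
X <- cbind(ones = rep(1, n), X1 = rnorm(n, 5, 2))
Y <- as.vector(X %*% c(3, 1) + rnorm(n))

# Leave-one-out loop: refit without observation i, then predict it
loo <- rep(0, n)
for (i in 1:n) {
  Xi <- X[-i, ]
  coef_i <- solve(t(Xi) %*% Xi) %*% t(Xi) %*% Y[-i]
  loo[i] <- (Y[i] - X[i, ] %*% coef_i)^2
}

# Shortcut: one fit on all the data plus the hat matrix diagonal
H <- X %*% solve(t(X) %*% X) %*% t(X)
e <- Y - H %*% Y
shortcut <- (e / (1 - diag(H)))^2

all.equal(sum(loo), sum(shortcut))
```

Both sums are the leave-one-out sum of squared prediction errors, so they should agree up to rounding error.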
If…then constructs
If you are interested in an if… then statement
on a vector use the ifelse() function.
ifelse(condition, True action, False action)
Example
X1 <- runif(15,0,1)
X2 <- ifelse(X1<.5,1,0)
cbind(X1,X2)
Did it work?
If…then constructs
If you are testing a single value rather than a vector,
then use the if(){}else{} construct.
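A short sketch contrasting the two constructs. The names x, label and v are illustrative only.

```r
x <- 0.3                      # a single value, not a vector

if (x < 0.5) {
  label <- "low"
} else {
  label <- "high"
}
label

# For a whole vector, ifelse() tests every element at once
v <- c(0.2, 0.7, 0.5)
ifelse(v < 0.5, 1, 0)
```

Note that else has to follow the closing brace on the same line when the code is run at the top level of a script.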
Source Files
Source files allow you to store all of your
created functions in a single file and have
all those functions available to you.
To load a self-created library use:
source(Path)
Don't forget that each \ in the path needs to be
replaced with \\ (or with /).
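As a runnable sketch, the example below writes a one-function source file to a temporary location and loads it. The function square and the Windows-style paths in the comments are hypothetical.

```r
# Create a tiny source file in a temporary location
path <- tempfile(fileext = ".R")
writeLines("square <- function(x) { return(x^2) }", path)

source(path)      # runs the file, which defines square()
square(4)

# With a file of your own on Windows, either form of the path works
# (these paths are hypothetical):
#   source("C:\\myfiles\\functions.R")
#   source("C:/myfiles/functions.R")
```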
Writing to a file
To write to a file use the write.table()
function.
write.table(dataset, path, sep=",", col.names=TRUE, row.names=FALSE)
This will produce a comma separated value
(csv) file.
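A round-trip sketch: write a small data frame to a temporary CSV file and read it back. The data frame and the name back are invented for illustration. As far as I am aware, write.table() controls the header row through col.names (header is an argument of read.table()), and row.names = FALSE keeps the row labels out of the file.

```r
dataset <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))   # made-up data
path <- tempfile(fileext = ".csv")

write.table(dataset, path, sep = ",", col.names = TRUE, row.names = FALSE)

readLines(path)                                   # inspect the raw file
back <- read.table(path, sep = ",", header = TRUE)
all.equal(dataset, back)                          # same data after the round trip
```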
Linear Algebra Extras
For eigenvalues and eigenvectors use the eigen() function.
This gives an object that contains both the eigenvalues and
the eigenvectors.
Example:
eigen(XtX)
$values
[1] 77901.567997  1375.036486   456.253787     1.847225

$vectors
            [,1]        [,2]        [,3]        [,4]
[1,] -0.03534617 -0.02144023 -0.02084214  0.99892771
[2,] -0.18185911 -0.21669130 -0.95864785 -0.03108754
[3,] -0.54097195 -0.79158676  0.28253402 -0.03023690
[4,] -0.82038239  0.57094272  0.02710023 -0.01620879
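The two components can be accessed by name. The sketch below uses a small symmetric matrix, S, chosen only because its eigenvalues are easy to verify by hand; it then rebuilds S from its eigenvalues and eigenvectors.

```r
S <- matrix(c(2, 1, 1, 2), nrow = 2)   # small symmetric example matrix

e <- eigen(S)
e$values      # eigenvalues, largest first (3 and 1 for this matrix)
e$vectors     # one eigenvector per column

# For a symmetric matrix, S = V diag(values) V'
S_rebuilt <- e$vectors %*% diag(e$values) %*% t(e$vectors)
all.equal(S, S_rebuilt)
```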
Summary
R is a programming environment with many
standard programming structures already
included.
Easy to create functions.
No support.
Allows users to create a library of functions.
Summary
All of the R code and files can be found at:
www.people.vcu.edu/~elboone2/CSS.htm