R Lecture 5 - Penn State Statistics Department

R Lecture 5
Naomi Altman
Department of Statistics
Example: Regression
The data are available at
http://www.stat.psu.edu/~jls/stat511/homework/body.dat
?read.table
body=read.table("body.txt",header=TRUE)
plot(body$hips,body$weight)
plot(body$waist,body$weight)
?formula
lm.out=lm(weight~hips+waist,data=body)
attributes(lm.out)
Formulas
lm fits the regression of Y on a set of X variables.
The response Y and the predictors are denoted
by a formula of the form Y ~ X1 + X2 + ...
You can also use formulas in other contexts, e.g.
plot(weight~waist, data=body)
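A minimal sketch of the same formula syntax, assuming the built-in mtcars data as a stand-in for the body data (the variable names below come from mtcars, not from the lecture):

```r
# mtcars ships with R; it stands in for the body data here (an assumption)
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)                       # intercept plus one slope per predictor

# A formula is an ordinary R object that can be stored and reused
f <- mpg ~ cyl
class(f)                        # "formula"
all.vars(f)                     # "mpg" "cyl"

# Another context that accepts a formula: group means of mpg by cyl
aggregate(f, data = mtcars, FUN = mean)
```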
Object Oriented Programming in R
or how a bunch of smart programming types
made R easier to use and harder to
program - at least in the eyes of a
statistician
In the bad old days
If I wanted to write a function similar to something already in R, I
would edit the R code:
myFun=edit(Rfun)
myDensity=edit(density)
Sometimes the R code would call a C or C++ program, but the
code for that is also available.
But now ...
plot
boxplot
rnorm
Classes and Generic Functions
I have already mentioned that one of the
attributes an R object can have is a class.
A generic function is a function that captures the
class of an object and then calls another
function to do the actual work. If the function is
called fun and the class is called cls, the
function that does the work is (almost always)
called fun.cls.
If there is no suitable fun.cls, then fun.default is
used.
e.g.
plot(body$hips,body$weight)
plot(lm.out)
plot.default
plot.lm
methods(plot)
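A short sketch of how to look at a generic and the methods behind it. In current versions of R some methods (plot.lm among them) may not be visible when you type their name, so this sketch uses getS3method, which retrieves a method whether or not it is exported. The class name "noSuchClass" is made up for illustration.

```r
# The body of a generic is usually just a call to UseMethod
body(plot)

# List the methods registered for the generic
methods(plot)

# Fetch the method that does the work for class "lm"
plot_lm <- getS3method("plot", "lm")
is.function(plot_lm)               # TRUE

# optional = TRUE returns NULL instead of an error when no method exists
missing_method <- getS3method("plot", "noSuchClass", optional = TRUE)
is.null(missing_method)            # TRUE
```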
Classes
Actually, a class can be a vector such as
c("first","second"), in which "first" "inherits
from", i.e. is a special case of, "second". In
practice, this means that the object has all the
components of class "second" objects but possibly
some additional ones.
If there is no fun.first, then the generic function will
search for fun.second. Only if there is also no
fun.second will fun.default be used.
e.g. plot
uses plot.lm on an object with class "lm"
and also on an object with class c("glm","lm")
inherits() indicates whether its first argument inherits
from any of the classes specified in the 'what' argument
glm.out=glm(weight~hips+waist,data=body)
class(glm.out)
# [1] "glm" "lm"
inherits(lm.out,"lm")
inherits(glm.out,"lm")
inherits(lm.out,"glm")
inherits(glm.out,"glm")
plot.lm
plot.glm
plot(glm.out)
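The search order described above can be sketched with a toy generic. The generic "describe" and the classes "first", "second" and "third" are hypothetical names used only for illustration.

```r
# A generic with a method for "second" and a default, but none for "first"
describe <- function(object, ...) UseMethod("describe")
describe.second  <- function(object, ...) "handled by describe.second"
describe.default <- function(object, ...) "handled by describe.default"

x <- structure(list(), class = c("first", "second"))
y <- structure(list(), class = "third")

describe(x)   # no describe.first, so describe.second is used
describe(y)   # no describe.third, so describe.default is used

inherits(x, "second")              # TRUE
inherits(x, c("third", "first"))   # TRUE: any match in 'what' is enough
inherits(y, "second")              # FALSE
```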
unclass
If you remove the class, most objects are just lists.
lm.out
unclass(lm.out)
For example, the "lm" objects are lists with the following components:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign"
"qr" "df.residual" "xlevels" "call" "terms" "model"
Some of these components are obvious.
Some of them are matrix computations (e.g. "qr") that can be used to compute, e.g.
the leverages and Cook's Distance (notice that these have not been
stored).
Some of them are often empty - they are used primarily when a predictor
variable is a factor (ANOVA).
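A sketch of these points, again assuming mtcars as a stand-in for the body data. The leverages and Cook's distances are not stored in the fit, but hatvalues and cooks.distance compute them from it.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for lm.out (an assumption)

is.list(unclass(fit))        # TRUE: without its class, the fit is a plain list
names(unclass(fit))          # the stored components

lev  <- hatvalues(fit)       # leverages, computed on demand
cook <- cooks.distance(fit)  # Cook's distances, computed on demand
head(cbind(leverage = lev, cooks = cook))

length(fit$xlevels)          # 0: no factor predictors in this model
```

The leverages sum to the rank of the model matrix, which gives a quick sanity check: compare sum(lev) with fit$rank.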
Why use classes
For the user: less to think about
e.g. you can try generic functions like plot and
summary with any output
For the programmer: provides a framework
e.g. you might think about having a plot.myfun and
summary.myfun for the function you are writing
also, you can use inheritance so that you do not need
to write every method yourself
Generic Functions
Functions that act on many different types of
objects are termed "generic functions".
Examples include:
plot
summary
anova
print
coefficients
residuals
Generic Functions
We have already seen that generic functions
behave differently for different classes. The idea
is that the user should not have to remember a
lot of different function names.
Generic functions are a "good thing" when you
want R to do what someone else thinks it should
do and can be a "bad thing" when you are trying
to do something else with your data.
Generic Functions
The form of the generic function "genfun" is
genfun=function (object, ...) {
UseMethod("genfun")
}
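A minimal sketch of this form in use, with one method for the class "cls" and a default. The method bodies are placeholders for illustration.

```r
genfun <- function(object, ...) {
  UseMethod("genfun")
}

# The functions that do the actual work
genfun.cls     <- function(object, ...) "genfun.cls did the work"
genfun.default <- function(object, ...) "genfun.default did the work"

obj <- structure(list(value = 1), class = "cls")

genfun(obj)   # dispatches to genfun.cls
genfun(42)    # no genfun method for a plain number, so genfun.default is used
```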
Generic Functions
We can use UseMethod to give aliases to
the same function.
genfun=function (object, ...) {
UseMethod("genfun")
}
gen=function (object, ...) {
UseMethod("genfun")
}
gfun=function (object, ...) {
UseMethod("genfun")
}
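A small sketch showing that an alias defined this way reaches the same methods. The method bodies are placeholders for illustration.

```r
genfun <- function(object, ...) UseMethod("genfun")
gen    <- function(object, ...) UseMethod("genfun")   # alias for genfun

genfun.cls     <- function(object, ...) "genfun.cls did the work"
genfun.default <- function(object, ...) "genfun.default did the work"

obj <- structure(list(), class = "cls")

gen(obj)                           # reaches genfun.cls through the alias
identical(gen(obj), genfun(obj))   # TRUE
gen("no class")                    # falls back to genfun.default
```

Note that only genfun.* methods are needed; there is no gen.cls.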
Generic Functions
If you want an argument other than the first to
be the one whose class controls the generic
function, then that argument must be passed
to UseMethod as its second argument
genfun=function(x,y,z,...){
UseMethod("genfun",z)
}
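A sketch of dispatch on the third argument. The method bodies are placeholders for illustration.

```r
genfun <- function(x, y, z, ...) {
  UseMethod("genfun", z)   # the class of z, not x, selects the method
}

genfun.cls     <- function(x, y, z, ...) "dispatched on z: cls"
genfun.default <- function(x, y, z, ...) "dispatched on z: default"

z_obj <- structure(list(), class = "cls")

genfun(1, 2, z_obj)   # z has class "cls", so genfun.cls is used
genfun(z_obj, 2, 3)   # x has class "cls" but z does not, so genfun.default is used
```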
Generic Functions
If UseMethod finds that the object it dispatches on
has a class, it searches for a
function "genfun.class", starting with the first
class. If there is no function that matches, it works
through the rest of the inheritance list in order.
If there is no match, or no class, the function
"genfun.default" is used.
Generic Functions
There is a lot more on this in the
"S Poetry" manual - it looks very complete to me.
I have been writing programs in S/R since
1981, and have not needed to create classes or
methods but ...
Generic Functions
I have often used an existing function as the starting
point for a new function, and I have been confused
when I failed to understand generic functions (especially
"summary" and "print").
One way to become well-known is to distribute
your methodology as an R package. To be
distributed from CRAN or other project
repositories, your package must adhere to R
programming standards.
Generic Functions
Some of the newer packages (particularly
packages for bioinformatics) rely heavily on the
use of generic functions, and you cannot
understand what they are doing without
understanding at least the basics of this
material.
Slots
I was not able to find an intuitive definition for "slot" so this is my
own heuristic.
An object is a list with a class.
A slot is a function that extracts data from an
object.
It may be one of the elements stored in the object,
or a derived data element.
(Formally, in the S4 protocol a slot is a named component
declared in the class definition and accessed with @.)
Slots
For example: an lm object includes the list:
"coefficients" "residuals" "effects" "rank" "fitted.values" "assign"
"qr" "df.residual" "xlevels" "call" "terms" "model"
We might build a new class, "Elm" (extended
"lm")
Slots
Suppose we wanted to write a method that
draws a histogram of any of the dependent
variable, residuals, studentized residuals, or fitted
values.
We could have a method of the form:
hist.Elm=function(object,slot)
Our slots would be: dependent, residuals,
student, fitted
Slots
If we set class(lm.out)=c("Elm","lm")
then
hist(lm.out,"residuals") would extract the residuals
from the list and draw the histogram.
hist(lm.out,"student") would compute the
studentized residuals (which are not stored)
and draw the histogram.
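One possible sketch of such a method, not a definitive implementation. It assumes mtcars as a stand-in for the body data, takes the slot name as a character string, and uses rstudent for the studentized residuals (one of several possible definitions). The first argument is named x to match the hist generic.

```r
# Hypothetical method for the "Elm" class described above
hist.Elm <- function(x, slot = c("dependent", "residuals", "student", "fitted"), ...) {
  slot <- match.arg(slot)
  values <- switch(slot,
    dependent = model.response(model.frame(x)),  # stored in the "model" component
    residuals = residuals(x),                    # stored
    student   = rstudent(x),                     # derived, not stored
    fitted    = fitted(x)                        # stored
  )
  hist(values, ...)   # values is a plain numeric vector, so hist.default is used
}

fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for lm.out (an assumption)
class(fit) <- c("Elm", "lm")

# plot = FALSE returns the bin counts without drawing; drop it to see the histogram
h <- hist(fit, "student", plot = FALSE)
h$counts
```

Because "lm" is still in the class vector, everything that worked on the original fit (summary, residuals, plot) continues to work.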
Slots
By convention, the slots of an object can be
extracted either by:
objectname@slotname
or, when the class provides an accessor function
of that name, by
slotname(objectname)
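A minimal sketch of both forms using the S4 protocol. The class "Fit" and the accessor "coefs" are hypothetical names chosen for illustration.

```r
library(methods)

# A hypothetical S4 class with two declared slots
setClass("Fit", slots = c(coefficients = "numeric", residuals = "numeric"))

obj <- new("Fit", coefficients = c(a = 1, b = 2), residuals = c(0.1, -0.1))

obj@coefficients          # direct access with @
slot(obj, "residuals")    # the same kind of access, with the slot name as a string
slotNames("Fit")          # "coefficients" "residuals"

# The accessor form works only once an accessor function has been defined
setGeneric("coefs", function(object) standardGeneric("coefs"))
setMethod("coefs", "Fit", function(object) object@coefficients)

coefs(obj)                # same values as obj@coefficients
```

An accessor can also return a derived quantity rather than a stored slot, which keeps the storage layout hidden from the user.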
Slots
Again, I have used S/R for many years without
writing or even encountering slots.
But some of the recent packages use this
programming concept, so it is important to
understand it.
My understanding is that slots are used primarily
in areas like data-mining and microarrays,
where the data storage requirements are large.
Learning to Use Objects and other Extensions
Calling C or C++ from R: Writing R Extensions
Object oriented programming in R:
(S3 protocol) R Language Definition
(S4 protocol) R Internals