R workspaces
Sihua Peng, PhD
Shanghai Ocean University
2016.10
Four VIPs in statistics
Gosset
Pearson
Fisher
Neyman
William Sealy Gosset
William Sealy Gosset (1876–1937)
was an English statistician.
He published under the pen name
Student, and developed the
Student's t-distribution.
Karl Pearson
Karl Pearson (1857–1936) was an English
mathematician and biostatistician. He has
been credited with establishing the
discipline of mathematical statistics.
In 1911 he founded the world's first university
statistics department at University College
London.
Many familiar statistical terms, such as
standard deviation, principal component analysis,
and the chi-square test, were proposed by him.
Ronald Fisher
Sir Ronald Aylmer Fisher (1890–1962)
was an English statistician and
biologist.
Many familiar statistical terms, such as the
F-distribution, Fisher's linear
discriminant, Fisher's exact test, Fisher's
permutation test, and the von Mises–Fisher
distribution, bear his name or stem from his work.
The F-distribution arises frequently as the
null distribution of a test statistic, most
notably in the analysis of variance.
Jerzy Neyman
Jerzy Neyman (1894–1981) was a Polish
mathematician and statistician who
spent most of his professional career at
the University of California, Berkeley.
Neyman was the first to introduce the
modern concept of a confidence interval
into statistical hypothesis testing.
References
Dr. Murray Logan
He is the author of our textbook and
an associate lecturer in the
School of Biological Sciences, Monash
University, Australia.
http://users.monash.edu.au/~murray/index.html
The data sets in this book:
http://users.monash.edu.au/~murray/BDAR/index.html
Contents
1. Introduction to R
2. Data sets
3. Introductory Statistical Principles
4. Sampling and experimental design with R
5. Graphical data presentation
6. Simple hypothesis testing
7. Introduction to Linear models
8. Correlation and simple linear regression
9. Single factor classification (ANOVA)
10. Nested ANOVA
11. Factorial ANOVA
12. Simple Frequency Analysis
1. Introduction to R
R was initially written by Ross Ihaka and
Robert Gentleman at the Department of Statistics,
University of Auckland, New Zealand, during
the 1990s.
VIPs of R
Ross Ihaka
Robert Gentleman
https://www.stat.auckland.ac.nz/~ihaka/
https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)
What R does and does not do
o data handling and storage:
numeric, textual
o matrix algebra
o hash tables and regular expressions
o high-level data analytic and
statistical functions
o classes (“OO”)
o graphics
o programming language: loops,
branching, subroutines
• is not a database, but connects to
DBMSs
• the language interpreter can be very
slow, but allows calling your own C/C++
code
• no spreadsheet view of data, but
connects to Excel/MS Office
• no professional / commercial
support
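A few of the capabilities listed above can be tried directly at the R prompt. The following is only a minimal sketch using base R; the object names (m, object_names, lookup) are made up for illustration, and an environment is used here as one possible hash-table-like structure.

```r
# Matrix algebra: multiply a 2 x 2 matrix by its inverse
m <- matrix(c(2, 0, 0, 4), nrow = 2)
m_inv <- solve(m)                 # matrix inverse
identity_check <- m %*% m_inv     # should be the identity matrix

# Regular expressions: which names start with "ANS"?
object_names <- c("ANS1", "VAR1", "VAR2")
starts_with_ans <- grepl("^ANS", object_names)

# Hash-table-like lookup: an environment maps names to values
lookup <- new.env()
assign("shade", "full", envir = lookup)
shade_value <- get("shade", envir = lookup)

print(identity_check)
print(starts_with_ans)
print(shade_value)
```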
Download R
https://www.r-project.org/
Install R
The R environment
Once R is installed, you can run it.
Object:
R is an object oriented language and everything in R is an object.
For example, a single number is an object, a variable is an object,
output is an object, a data set is an object that is itself a collection
of objects, etc.
Vector:
A collection of one or more objects of the same type (e.g. all
numbers or all characters).
Function:
A set of instructions carried out on one or more objects.
Functions are typically used to perform specific and common
tasks that would otherwise require many instructions.
Parameter:
The kind of information that can be passed to a
function.
Argument:
The specific information passed to a function to
determine how the function should perform its task.
Operator:
A symbol that has a pre-defined meaning. Familiar
operators include +, -, * and /, which respectively
perform addition, subtraction, multiplication and
division.
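The terms above can be seen together in one short example. This is an illustrative sketch only; the object names (single_number, measurements, average) are hypothetical, while mean() and its parameters x and na.rm are part of base R.

```r
# Objects: a single number and a vector are both objects
single_number <- 4
measurements <- c(2, 4, 9)        # a numeric vector

# mean() is a function; `x` and `na.rm` are two of its parameters.
# The values supplied for them (measurements, TRUE) are the arguments.
average <- mean(x = measurements, na.rm = TRUE)

# `+` is an operator; in R an operator can also be called like a function
sum_via_operator <- 2 + 3
sum_via_function <- `+`(2, 3)

print(average)                                  # 5
print(sum_via_operator == sum_via_function)     # TRUE
```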
Expressions, Assignment and Arithmetic
> 2 + 3                  ← an expression
[1] 5                    ← the evaluated output
> VAR1 <- 2 + 3          ← assign expression to the object VAR1
> VAR2 <- 9              ← assign expression to the object VAR2
> VAR2 - 1               ← print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2    ← evaluated expression assigned to ANS1
> ANS1                   ← print the contents of ANS1
[1] 45                   ← the evaluated output
Objects can be concatenated (joined together) to
create objects with multiple entries using the c()
(concatenation) function.
> c(1, 2, 6) ←concatenate 1, 2 and 6
[1] 1 2 6 ←printed output
> c(VAR1, ANS1) ←concatenate VAR1 and ANS1 contents
[1] 5 45 ←printed output
R workspaces
> ls()                 ← list current objects in R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)       ← remove the VAR1 and VAR2 objects
> rm(list = ls())      ← remove all user defined objects
Workspaces:
Throughout an R session, all objects that have been
added are stored within the R global environment,
called the workspace.
save.image() saves the workspace and thus all
its objects (vectors, functions, etc.).
load() loads a previously saved workspace
and thus all its objects.
q() quits R.
getwd() displays the current working folder.
setwd() sets the working folder.
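A save-and-restore round trip might look like the sketch below. It assumes the commands are run at the top level of an R session (so objects live in the global environment) and writes to a temporary file rather than the default .RData; the names demo_value and workspace_file are hypothetical.

```r
# Create an object, save the whole workspace to a temporary file,
# remove the object, then restore it from the saved file.
demo_value <- c(1, 2, 6)

workspace_file <- tempfile(fileext = ".RData")   # hypothetical location
save.image(file = workspace_file)

rm(demo_value)
removed <- !exists("demo_value")    # TRUE: the object is gone

load(workspace_file)
restored <- exists("demo_value")    # TRUE: the object is back

print(getwd())                      # current working folder
print(removed)
print(restored)
```

Calling save.image() with no arguments writes to a file named .RData in the current working folder instead.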
help()
> help(mean)
> ?mean
Vectors - variables
The basic data storage unit in R is called a vector. A vector is a
collection of one or more entries of the same class (type).
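The "same class" rule can be checked with class(). The sketch below uses made-up vectors (numbers, labels, mixed) and shows that mixing numbers and text in one vector forces every entry to the character class.

```r
numbers <- c(16.92, 24.03, 7.61)
labels <- c("no", "full")

numbers_class <- class(numbers)   # "numeric"
labels_class <- class(labels)     # "character"

# Mixing types forces a common class: the numbers become character strings
mixed <- c(1, "full", 2)
mixed_class <- class(mixed)       # "character"

print(numbers_class)
print(labels_class)
print(mixed)
print(length(numbers))            # 3
```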
Factors
To properly accommodate factorial (categorical)
variables, R has an additional class of vector called a
factor which stores the vector along with a list of the
levels of the factorial variable. The factor() function
converts a vector into a factor vector.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
[1] "no" "no" "no" "no" "no" "full" "full" "full"
[9] "full" "full"
> SHADE <- factor(SHADE)
> SHADE
[1] no no no no no full full full full full
Levels: full no
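By default the levels are sorted alphabetically, which is why "full" is listed before "no" above. If a different order is wanted, the levels can be given explicitly. The following sketch uses a shortened version of SHADE; SHADE_ORDERED is a hypothetical name.

```r
SHADE <- factor(c("no", "no", "full", "full"))

default_levels <- levels(SHADE)   # alphabetical: "full" "no"
codes <- as.integer(SHADE)        # underlying integer codes: 2 2 1 1

# Supply the levels explicitly to control their order
SHADE_ORDERED <- factor(SHADE, levels = c("no", "full"))
ordered_levels <- levels(SHADE_ORDERED)

print(default_levels)
print(codes)
print(ordered_levels)
print(table(SHADE_ORDERED))       # number of entries per level
```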
Matrices
A vector has only a single dimension – it has length.
However, a vector can be converted into a matrix (2
dimensional array).
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY1 <- cbind(X, Y)
XY2 <- rbind(X, Y)
To access the data in matrices:
XY1[1,] First row
XY1[,2] Second column
XY1[2,2] The value in the second row and second column
XY1[1:3,] Rows 1 to 3
XY1[,1:2] Columns 1 to 2
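The difference between cbind() and rbind() is easiest to see from the dimensions of the result. The sketch below reuses X and Y from above; the object M and the use of matrix() to reshape a single vector are additions for illustration.

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

XY1 <- cbind(X, Y)    # vectors become columns: 5 rows, 2 columns
XY2 <- rbind(X, Y)    # vectors become rows: 2 rows, 5 columns

print(dim(XY1))
print(dim(XY2))
print(XY1[2, 2])      # second row, second column
print(XY1[, "Y"])     # a column can also be selected by name

# A single vector can be reshaped with matrix(); it is filled column by column
M <- matrix(1:6, nrow = 2)
print(M)
```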
Data frames
Data frames are generated by combining
multiple vectors together such that each
vector becomes a separate column in the
data frame. In this way, a data frame is
similar to a matrix in which each column
can represent a different vector type.
We will discuss data frames in detail in the
next chapter.
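As a brief preview, a data frame can hold a factor and a numeric vector side by side. This is a minimal sketch with hypothetical data; the names GROWTH, PLANTS and FULL_SHADE are not from the text.

```r
SHADE <- factor(c("no", "no", "full", "full"))
GROWTH <- c(16.92, 24.03, 7.61, 15.49)    # hypothetical measurements

# Each vector becomes one column of the data frame
PLANTS <- data.frame(SHADE, GROWTH)

str(PLANTS)                 # one factor column, one numeric column
print(PLANTS$GROWTH)        # access a column by name

# Select the rows where SHADE is "full"
FULL_SHADE <- PLANTS[PLANTS$SHADE == "full", ]
print(FULL_SHADE)
```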
Working with scripts
A collection of one or more commands is called a
script.
In R, a script is a plain text file with a separate
command on each line and can be created and read in
any text editor.
A script is read into R by providing the full filename of
the script file as an argument in the source() function.
>source("filename.R")
A typical script may look like the following:
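The example script itself is not reproduced in this transcript. As a stand-in, the sketch below writes a small hypothetical script to a temporary file and then reads it in with source(); the file location and the object names X and MEAN_X are assumptions made for illustration.

```r
# Write a small script to a temporary file (hypothetical location)
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "# Summarise a vector of measurements",
  "X <- c(16.92, 24.03, 7.61, 15.49, 11.77)",
  "MEAN_X <- mean(X)",
  "print(MEAN_X)"
), script_file)

# Read the script into R; each command is run in order
source(script_file)
```

After source() finishes, the objects created by the script (here X and MEAN_X) are available in the workspace.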
References
Biostatistical Design and Analysis Using R: A
Practical Guide. By Murray Logan. Wiley-Blackwell.
Introduction to Data Analysis and Graphical
Presentation in Biostatistics with R. By Thomas W.
MacFarland. Springer.