R workspaces

Sihua Peng, PhD
Shanghai Ocean University
2016.10
Four VIPs in statistics
Gosset
Pearson
Fisher
Neyman
William Sealy Gosset
• William Sealy Gosset (1876–1937) was an English statistician.
• He published under the pen name Student and developed Student's t-distribution.
Karl Pearson
• Karl Pearson (1857–1936) was an English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics.
• In 1911 he founded the world's first university statistics department at University College London.
• Many familiar statistical terms, such as standard deviation, principal component analysis, and the chi-square test, were proposed by him.
Ronald Fisher
• Sir Ronald Aylmer Fisher (1890–1962) was an English statistician and biologist.
• Many familiar statistical terms, such as the F-distribution, Fisher's linear discriminant, Fisher's exact test, Fisher's permutation test, and the von Mises–Fisher distribution, are named after him.
• The F-distribution arises frequently as the null distribution of a test statistic, most notably in the analysis of variance.
Jerzy Neyman
• Jerzy Neyman (1894–1981) was a Polish mathematician and statistician who spent most of his professional career at the University of California, Berkeley.
• Neyman was the first to introduce the modern concept of a confidence interval into statistical hypothesis testing.
References
Dr. Murray Logan
• He is the author of our textbook and an associate lecturer in the School of Biological Sciences, Monash University, Australia.
• http://users.monash.edu.au/~murray/index.html
• The data sets in this book: http://users.monash.edu.au/~murray/BDAR/index.html
Contents
1. Introduction to R
2. Data sets
3. Introductory Statistical Principles
4. Sampling and experimental design with R
5. Graphical data presentation
6. Simple hypothesis testing
7. Introduction to Linear models
8. Correlation and simple linear regression
9. Single factor classification (ANOVA)
10. Nested ANOVA
11. Factorial ANOVA
12. Simple Frequency Analysis
1. Introduction to R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand, during the 1990s.
VIPs of R
Ross Ihaka
Robert Gentleman
https://www.stat.auckland.ac.nz/~ihaka/
https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)
What R does and does not do
o data handling and storage: numeric, textual
o matrix algebra
o hash tables and regular expressions
o high-level data analytic and statistical functions
o classes (“OO”)
o graphics
o programming language: loops, branching, subroutines
• is not a database, but connects to DBMSs
• the language interpreter can be very slow, but R can call your own C/C++ code
• no spreadsheet view of data, but connects to Excel/MS Office
• no professional / commercial support
Download R
• https://www.r-project.org/
Install R
The R environment
• After installation, you can run R.
• Object:
R is an object-oriented language and everything in R is an object. For example, a single number is an object, a variable is an object, output is an object, a data set is an object that is itself a collection of objects, etc.
• Vector:
A collection of one or more objects of the same type (e.g. all numbers or all characters).
• Function:
A set of instructions carried out on one or more objects. Functions are typically used to perform specific and common tasks that would otherwise require many instructions.
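The three terms above can be illustrated with a short sketch. The object names x and v are illustrative only and do not come from the slides.

```r
x <- 42              # a single number is an object ...
class(x)             # "numeric"
is.vector(x)         # TRUE: ... and is itself a vector of length 1

v <- c(2, 4, 6)      # a vector: several entries of the same type
length(v)            # 3

mean(v)              # a function carries out instructions on an object: 4
class(mean)          # "function": functions are objects too
```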
• Parameter:
The kind of information that can be passed to a function.
• Argument:
The specific information passed to a function to determine how the function should perform its task.
• Operator:
A symbol that has a predefined meaning. Familiar operators include + - * and /, which respectively perform addition, subtraction, multiplication and division.
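As a minimal sketch of the difference between parameters and arguments, the standard sd() function can be inspected and then called. The vector VALUES is a made-up example, not data from the slides.

```r
# The parameters of sd() are the names it accepts: x and na.rm
names(formals(sd))          # "x" "na.rm"

# Arguments are the specific values supplied for those parameters
VALUES <- c(2, 4, NA, 6)    # hypothetical data with one missing value
sd(VALUES)                  # NA, because na.rm defaults to FALSE
sd(VALUES, na.rm = TRUE)    # 2, the missing value is dropped first

# Operators follow the usual arithmetic precedence
2 + 3 * 4                   # 14
```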
Expressions, Assignment and Arithmetic
> 2 + 3                  # an expression
[1] 5                    # the evaluated output
> VAR1 <- 2 + 3          # assign the expression to the object VAR1
> VAR2 <- 9              # assign the value 9 to the object VAR2
> VAR2 - 1               # print the contents of VAR2 minus 1
[1] 8
> ANS1 <- VAR1 * VAR2    # evaluated expression assigned to ANS1
> ANS1                   # print the contents of ANS1
[1] 45
• Objects can be concatenated (joined together) to create objects with multiple entries using the c() (concatenation) function.
> c(1, 2, 6)             # concatenate 1, 2 and 6
[1] 1 2 6                # printed output
> c(VAR1, ANS1)          # concatenate VAR1 and ANS1 contents
[1]  5 45                # printed output
R workspaces
> ls()                   # list current objects in R environment
[1] "ANS1" "VAR1" "VAR2"
> rm(VAR1, VAR2)         # remove the VAR1 and VAR2 objects
> rm(list = ls())        # remove all user-defined objects
Workspaces:
• Throughout an R session, all objects that have been added are stored within the R global environment, called the workspace.
• save.image(): saves the workspace, and thus all its objects (vectors, functions, etc.)
• load(): loads a previously saved workspace, and thus all its objects
• q(): quits R
• getwd(): displays the current working folder
• setwd(): sets the working folder
• help(): opens the documentation for a function
> help(mean)
> ?mean
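A minimal sketch of saving and restoring a workspace, assuming the commands are run at the top level of an R session. The temporary file name ws_file is an assumption made here so that no existing .RData file is overwritten.

```r
ws_file <- tempfile(fileext = ".RData")   # hypothetical temporary file

VAR1 <- 2 + 3
save.image(file = ws_file)    # save every object in the workspace

rm(VAR1)                      # VAR1 is now gone from the workspace
load(ws_file)                 # restore the saved objects
VAR1                          # 5

getwd()                       # the folder R reads from and writes to by default
```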
Vectors - variables
• The basic data storage unit in R is called a vector. A vector is a collection of one or more entries of the same class (type).
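The "same class" rule can be sketched as follows. The vectors TEMPERATURE, SITE and MIXED are made-up examples.

```r
TEMPERATURE <- c(36.1, 30.6, 31.0)   # numeric vector
SITE <- c("A", "B", "C")             # character vector

class(TEMPERATURE)                   # "numeric"
class(SITE)                          # "character"
length(TEMPERATURE)                  # 3

# Entries of different types are coerced to one common class
MIXED <- c(1, "a", TRUE)
class(MIXED)                         # "character"
MIXED                                # "1" "a" "TRUE"
```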
Factors
To properly accommodate factorial (categorical) variables, R has an additional class of vector called a factor, which stores the vector along with a list of the levels of the factorial variable. The factor() function converts a vector into a factor.
> SHADE <- c("no", "no", "no", "no", "no", "full", "full", "full", "full", "full")
> SHADE
 [1] "no"   "no"   "no"   "no"   "no"   "full" "full" "full"
 [9] "full" "full"
> SHADE <- factor(SHADE)
> SHADE
 [1] no   no   no   no   no   full full full full full
Levels: full no
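By default the levels are ordered alphabetically, which is why "full" is listed before "no" above. As a sketch, the levels argument of factor() can set a different order. The name SHADE2 is introduced here only for illustration.

```r
SHADE <- factor(c("no", "no", "no", "no", "no",
                  "full", "full", "full", "full", "full"))
levels(SHADE)                 # "full" "no" (alphabetical by default)
as.integer(SHADE)             # 2 2 2 2 2 1 1 1 1 1 (internal level codes)

# Put "no" first, e.g. so that it acts as the reference level
SHADE2 <- factor(SHADE, levels = c("no", "full"))
levels(SHADE2)                # "no" "full"
table(SHADE2)                 # 5 observations in each level
```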
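Matrices
• A vector has only a single dimension: its length. However, a vector can be converted into a matrix (2-dimensional array), for example by binding vectors together as columns or as rows.
> X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
> Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
> XY1 <- cbind(X, Y)     # bind X and Y as columns (5 rows, 2 columns)
> XY2 <- rbind(X, Y)     # bind X and Y as rows (2 rows, 5 columns)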
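A single vector can also be reshaped directly with the matrix() function. The sketch below uses a made-up vector V and checks the dimensions of the matrices built with cbind() and rbind().

```r
V <- 1:6
M <- matrix(V, nrow = 2)      # filled column by column
dim(M)                        # 2 3
M[1, ]                        # 1 3 5

X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
dim(cbind(X, Y))              # 5 2
dim(rbind(X, Y))              # 2 5
```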
To access the data in matrices
• XY1[1, ]: the first row
• XY1[, 2]: the second column
• XY1[2, 2]: the value in the second row and second column
• XY1[1:3, ]: rows 1 to 3
• XY1[, 1:2]: columns 1 to 2
Data frames
Data frames are generated by combining multiple vectors together such that each vector becomes a separate column in the data frame. In this way, a data frame is similar to a matrix in which each column can represent a different vector type.
We will discuss data frames in detail in the next chapter.
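As a brief sketch ahead of the next chapter, a factor and a numeric vector can be combined with data.frame(). The object names SHADE, GROWTH and PLANTS and the values are hypothetical.

```r
SHADE <- factor(c("no", "no", "full", "full"))   # categorical column
GROWTH <- c(16.92, 24.03, 7.61, 15.49)           # numeric column (made-up values)

PLANTS <- data.frame(SHADE, GROWTH)              # each vector becomes a column
dim(PLANTS)                                      # 4 2
sapply(PLANTS, class)                            # SHADE "factor", GROWTH "numeric"
PLANTS$GROWTH                                    # access one column by name
```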
Working with scripts
• A collection of one or more commands is called a script.
• In R, a script is a plain text file with a separate command on each line and can be created and read in any text editor.
• A script is read into R by providing the full filename of the script file as an argument in the source() function.
> source("filename.R")
A typical script may look like the following:
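A minimal sketch of such a script, assuming it is saved as filename.R. The object names XY and MEANS are hypothetical, and the data are the X and Y vectors from the matrix slide.

```r
# filename.R: a short example script with one command per line

# Enter the data as two vectors
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)

# Combine the vectors into a matrix, one column per variable
XY <- cbind(X, Y)

# Calculate and print the mean of each column
MEANS <- colMeans(XY)
print(MEANS)
```

Saving these lines in the working folder and running source("filename.R") would execute the commands in order and leave X, Y, XY and MEANS in the workspace.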
References
• Biostatistical Design and Analysis Using R: A Practical Guide. By Murray Logan. Wiley-Blackwell.
• Introduction to Data Analysis and Graphical Presentation in Biostatistics with R. By Thomas W. MacFarland. Springer.