Introduction to R / sma /
Bioconductor
Statistics for Microarray Data Analysis
The Fields Institute for Research in
Mathematical Sciences
May 25, 2002
Web sites + References
• http://www.R-project.org/
An introduction to R
W.N.Venables, D.M.Smith and the R Development
Core Team
• http://lib.stat.cmu.edu/R/CRAN/
• http://www.bioconductor.org/
Files such as "swirl1.spot" or "samples.swirl" need to be read into R.
Functions:
- read.table
- scan
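As a minimal sketch of the two functions, the following writes a small tab-delimited file and reads it back both ways. The file contents and column names are invented for illustration and are not the actual Spot output format.

```r
# Write a small tab-delimited file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name\tRmean\tGmean",
             "gene1\t1200\t800",
             "gene2\t450\t900"), tmp)

# read.table returns a data frame; header = TRUE takes column names from line 1.
dat <- read.table(tmp, header = TRUE, sep = "\t")
dim(dat)

# scan reads the same file into one flat vector; here every field is kept as text.
raw <- scan(tmp, what = "character", sep = "\t", quiet = TRUE)
length(raw)
```

read.table is usually the more convenient choice for rectangular files, while scan gives lower-level control over how the fields are read.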
Save your workspace in R
using the function save.image.
In your directory the workspace appears as a single file,
name.RData or .RData
Download ?
Download SetupR.exe from http://cran.r-project.org/
A few basics
• Working Directory
- getwd()
- setwd(), or click on File, then on Change Dir, and
use Browse to choose your working directory.
• Workspace
- save(a, b, file="my.RData") : save objects a and b into
the workspace file "my.RData"
- save.image("my.RData") : or click on File and then click
on Save Workspace
- load("my.RData") : or click on File and then click on
Load Workspace
• Help
- help.start()
- help(): e.g. help(plot)
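The working-directory and workspace commands above can be tried together as in the sketch below. It works in a temporary directory so nothing in a real working directory is overwritten; the objects a and b are placeholders.

```r
# Switch to a temporary directory; setwd() returns the previous directory.
old <- setwd(tempdir())

a <- 1:3
b <- "hello"

save(a, b, file = "my.RData")   # save only a and b
rm(a, b)                        # remove them from the workspace
load("my.RData")                # bring them back from the file
restored <- exists("a") && exists("b")

unlink("my.RData")              # clean up the file
setwd(old)                      # return to the original working directory
```

save.image("my.RData") works the same way but saves every object in the workspace rather than a chosen few.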
Search paths + packages
> search()
[1] ".GlobalEnv"    "package:ctest" "Autoloads"     "package:base"
> library(cluster)
Loading required package: mva
> search()
[1] ".GlobalEnv"      "package:mva"     "package:cluster" "package:ctest"
[5] "Autoloads"       "package:base"
ls() : list objects in .GlobalEnv
ls(3) : list objects in search position number 3; in the above example, this is
package:cluster
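A short sketch of how ls() relates to the search path. It uses package:base only because that entry is always attached; the positions on a current R installation will differ from the 2002 session shown above.

```r
sp <- search()
sp[1]                          # the global environment comes first

# Find the position of an attached package on the search path.
pos <- match("package:base", sp)

# ls() accepts either the position or the name; both list the same objects.
by_pos  <- ls(pos)
by_name <- ls("package:base")
identical(by_pos, by_name)
```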
R Base packages:
base
ctest
mva
tcltk
etc…
Contributed packages:
ellipse
cluster
sma
GeneSOM
hdarray
affy
GeneClust
bioconductor
etc …
Submit to CRAN
mypackage
An introduction to R
based on the documents produced by
W.N. Venables, D.M. Smith and the R Development Core Team
Vectors and assignment
R operates on named data structures. The simplest such
structure is the numeric vector, which is a single entity
consisting of an ordered collection of numbers.
To set up a vector named x, say, consisting of five
numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R
command
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
or
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
This is an assignment statement using the function c().
The result, x, is a numeric vector:
> is.numeric(x)
[1] TRUE
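The two forms of assignment can be compared directly. In this sketch the second vector is named y (a name chosen here for illustration) so that both results can be checked side by side.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)           # assignment with <-
assign("y", c(10.4, 5.6, 3.1, 6.4, 21.7))   # same effect through assign()

identical(x, y)    # both forms create the same vector
is.numeric(x)
length(x)          # five numbers
```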
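Types of vector:
numeric: X <- c(1:5, 6, 9, 3, 10)
character: X <- c("a", "b", "c3", "4")
logical: X <- c(1, 1, 0, TRUE, FALSE)
(When numbers are mixed with TRUE and FALSE, R coerces the result to a
numeric vector; a purely logical vector is, e.g., c(TRUE, FALSE).)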
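To see which type each of these vectors actually has, class() can be applied to each one. The variable names below are chosen for illustration; the last vector is an added example of a purely logical vector.

```r
v_num <- c(1:5, 6, 9, 3, 10)
v_chr <- c("a", "b", "c3", "4")
v_mix <- c(1, 1, 0, TRUE, FALSE)
v_lgl <- c(TRUE, TRUE, FALSE)

class(v_num)   # "numeric"
class(v_chr)   # "character"
class(v_mix)   # "numeric": TRUE and FALSE are coerced to 1 and 0
class(v_lgl)   # "logical"
```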
Other types of objects
matrices or more generally arrays are multi-dimensional
generalizations of vectors.
lists provide a convenient way to return the results of a statistical
computation.
data frames are matrix-like structures, in which the columns can be of
different types. Think of data frames as 'data matrices' with one row
per observational unit but with (possibly) both numerical and
categorical variables.
functions are themselves objects in R which can be stored in the
project's workspace. This provides a simple and
convenient way to extend R.
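Each of these object types can be created in a line or two. The names and values in this sketch are invented for illustration.

```r
m   <- matrix(1:6, nrow = 2)                     # a 2 x 3 matrix
res <- list(mean = 3.5, values = 1:6)            # a list holding mixed results
df  <- data.frame(gene  = c("g1", "g2", "g3"),   # a categorical column
                  ratio = c(1.2, 0.8, 2.5))      # a numeric column
square <- function(v) v^2                        # a function is an ordinary object

dim(m)
res$mean
sapply(df, class)
square(1:3)
```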
Introduction to Bioconductor
(taken from http://www.bioconductor.org)
The packages in the initial release include tools which
facilitate:
- annotation (AnnBuilder, annotate)
- data management and organization through the use of the S4
class structure (Biobase, marrayClasses)
- identification of differentially expressed genes and clustering (edd,
genefilter, geneplotter, multtest, ROC)
- analysis of Affymetrix expression array data (affy)
- diagnostic plots and normalization for cDNA array data
(marrayInput, marrayNorm, marrayPlots)
- storage and retrieval of large datasets (rhdf5).
Most packages rely on the class/method mechanism
provided by John Chambers’ R methods package, which
allows object-oriented programming in R.
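A minimal sketch of the class/method mechanism in the methods package. The class InfoSketch and the generic nLabels are hypothetical names invented for this example; they are not part of any Bioconductor package.

```r
library(methods)

# Hypothetical class with three typed slots.
setClass("InfoSketch",
         representation(labels = "character",
                        info   = "data.frame",
                        notes  = "character"))

# A generic function and a method that dispatches on the class.
setGeneric("nLabels", function(object) standardGeneric("nLabels"))
setMethod("nLabels", "InfoSketch", function(object) length(object@labels))

obj <- new("InfoSketch",
           labels = c("g1", "g2"),
           info   = data.frame(id = 1:2),
           notes  = "toy example")

nLabels(obj)   # number of labels stored in the object
```

Slots are accessed with @, and new() checks that each value supplied matches the declared slot type.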
Classes and slots
Each class has slots of a given type, e.g. logical, character, numeric.
marrayInfo
Slots:
- maLabels: character
- maInfo: data.frame
- maNotes: character
This class can be used to store either gene name
information or sample information.
marrayLayout
Slots:
- maNgr: numeric
- maNgc: numeric
- maNsr: numeric
- maNsc: numeric
- maNspots: numeric
- maSub: logical
- maPlate: factor
- maControls: factor
Methods for quantities that are not slots of marrayLayout:
- maSpotRow: numeric
- maSpotCol: numeric
- maGridRow: numeric
- maGridCol: numeric
- maPrintTip: numeric
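Quantities such as grid row, spot column and print tip can be derived from the layout dimensions alone, which is presumably why they are methods rather than slots. The sketch below uses plain vectors rather than the marrayLayout class, with made-up dimensions, and it assumes that spots are ordered grid by grid (grids taken row by row) and row by row within each grid. That ordering is an assumption, not something stated on the slide.

```r
# Toy layout (made-up dimensions): 2 x 2 grids, each holding 3 x 2 spots.
ngr <- 2; ngc <- 2; nsr <- 3; nsc <- 2
per_grid <- nsr * nsc                  # spots in one grid
nspots   <- ngr * ngc * per_grid       # spots on the whole array

# Assumed ordering: grids row by row, spots within a grid row by row.
grid_row  <- rep(1:ngr, each = ngc * per_grid)
grid_col  <- rep(rep(1:ngc, each = per_grid), times = ngr)
spot_row  <- rep(rep(1:nsr, each = nsc), times = ngr * ngc)
spot_col  <- rep(1:nsc, times = nsr * ngr * ngc)
print_tip <- (grid_row - 1) * ngc + grid_col   # one print tip per grid

head(data.frame(grid_row, grid_col, spot_row, spot_col, print_tip), 8)
```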
marrayRaw
Slots:
- maRf: matrix
- maGf: matrix
- maRb: matrix
- maGb: matrix
- maW: matrix
- maLayout: marrayLayout
- maGnames: marrayInfo
- maTargets: marrayInfo
- maNotes: character
Methods for quantities that are not slots of marrayRaw:
- maLR: matrix
- maLG: matrix
- maM: matrix
- maA: matrix
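The derived quantities can be illustrated with plain matrices. This sketch assumes the usual two-colour definitions: background-corrected log2 intensities for red and green, M as their difference and A as their average. Those definitions are an assumption here, the slide does not spell them out. The intensities are made up and the variable names only mirror the slot names.

```r
# Made-up intensities: 3 spots (rows) on 2 arrays (columns).
Rf <- matrix(c(1100, 600, 2100,  900, 450, 4100), nrow = 3)  # red foreground
Rb <- matrix(100, nrow = 3, ncol = 2)                        # red background
Gf <- matrix(c( 600, 600, 1100, 1700, 250, 1100), nrow = 3)  # green foreground
Gb <- matrix(100, nrow = 3, ncol = 2)                        # green background

LR <- log2(Rf - Rb)      # background-corrected log red
LG <- log2(Gf - Gb)      # background-corrected log green
M  <- LR - LG            # log-ratio of red to green
A  <- (LR + LG) / 2      # average log-intensity

round(M, 2)
round(A, 2)
```

In this toy data the first spot on the first array has corrected intensities 1000 (red) and 500 (green), so its M value is log2(2) = 1.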
marrayNorm
Slots:
- maA: matrix
- maM: matrix
- maMloc: matrix
- maMscale: matrix
- maW: matrix
- maLayout: marrayLayout
- maGnames: marrayInfo
- maTargets: marrayInfo
- maNotes: character
- maNormCall: call
Swirl data
Data (Spot Files)
• swirl.1.spot
• swirl.2.spot
• swirl.3.spot
• swirl.4.spot
Layout:
Grid size: 4 by 4
Spot matrix: 22 by 24
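The layout figures above determine how many spots each array holds. The short calculation below makes the arithmetic explicit; the variable names are chosen for illustration, and the total assumes all four spot files share the same layout.

```r
ngr <- 4;  ngc <- 4      # grid size: 4 by 4
nsr <- 22; nsc <- 24     # spot matrix: 22 by 24

spots_per_grid  <- nsr * nsc                    # 528
spots_per_array <- ngr * ngc * spots_per_grid   # 8448
n_arrays <- 4                                   # four spot files listed above

c(per_grid  = spots_per_grid,
  per_array = spots_per_array,
  total     = n_arrays * spots_per_array)
```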
Target information files
• SwirlSample.txt
Gene List
• fish.gal