R-project: A free environment for statistical computing and graphics

Basic Statistical Analyses and Contributed Packages in R
FISH 397C
Winter 2009
© R Foundation, from http://www.r-project.org
Evan Girvetz
Basic Statistical Analysis in R
• correlation – cor.test()
• linear modeling – lm()
• t-test – t.test()
• ANOVA – aov()
• Chi-squared test – chisq.test()
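lm() and aov() get their own slides below. Here is a minimal sketch of the other three tests. It uses R's built-in mtcars data as a stand-in for the course data, and the variable pairings are only illustrative:

```r
# Built-in mtcars data used as a stand-in for the course data set

# Pearson correlation between car weight and fuel economy
ct <- cor.test(mtcars$wt, mtcars$mpg)

# Two-sample (Welch) t-test: mpg for automatic (am = 0) vs manual (am = 1)
tt <- t.test(mpg ~ am, data = mtcars)

# Chi-squared test of independence: transmission type vs engine shape (vs)
chi <- chisq.test(table(mtcars$am, mtcars$vs))

# Each result is an "htest" object; printing one shows the full report
print(ct)
c(cor = unname(ct$estimate), t_p = tt$p.value, chi_p = chi$p.value)
```

Each test's p-value, estimate and statistic can be pulled out with `$`, as in the last line.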
Linear Regression: lm()
> lm(taill ~ totlngth, data = possum)
> taill.lm <- lm(taill ~ totlngth, data = possum)
> summary(taill.lm)
Call:
lm(formula = taill ~ totlngth, data = possum)
Residuals:
    Min      1Q  Median      3Q     Max
-3.3143 -0.9620  0.1680  0.9544  2.6120
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.31880    4.84844   2.953  0.00519 **
totlngth     0.25920    0.05509   4.705 2.88e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.493 on 41 degrees of freedom
Multiple R-Squared: 0.3506,  Adjusted R-squared: 0.3348
F-statistic: 22.14 on 1 and 41 DF,  p-value: 2.883e-05
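With a single predictor, the F-statistic is the square of the slope's t value (4.705² ≈ 22.14 in the possum output). Multiple R-squared is the squared correlation between the two variables. The sketch below checks both facts on the built-in cars data, which stands in for possum. It also shows coef() and confint() for pulling numbers out of the fit:

```r
# Built-in cars data (stopping distance vs speed) as a stand-in for possum
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

coef(s)        # Estimate, Std. Error, t value, Pr(>|t|) for each term
confint(fit)   # 95% confidence intervals for the coefficients

t_slope <- coef(s)["speed", "t value"]
f_stat  <- unname(s$fstatistic["value"])

# With one predictor, F = t^2 and R^2 = r^2
c(t_squared = t_slope^2, F = f_stat)
c(R2 = s$r.squared, r_squared = cor(cars$speed, cars$dist)^2)
```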
ANOVA
> sexTaill.aov <- aov(taill ~ sex, data = possum)
> summary(sexTaill.aov)
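aov() fits the same linear model as lm() and presents it as an ANOVA table. The sketch below is a one-way example on the built-in PlantGrowth data (plant weight under a control and two treatments), which stands in for possum. TukeyHSD() is added here as one common follow-up for pairwise comparisons; it is not part of the slide:

```r
# One-way ANOVA: does mean plant weight differ among the three groups?
pg.aov <- aov(weight ~ group, data = PlantGrowth)
summary(pg.aov)

# The same F test obtained from an lm() fit
anova(lm(weight ~ group, data = PlantGrowth))

# Tukey's honest significant differences for all pairs of groups
TukeyHSD(pg.aov)
```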
ANOVA: interactions
> sexPopTaill.aov <- aov(taill ~ sex * Pop, data = possum)  # same as sex + Pop + sex:Pop
> summary(sexPopTaill.aov)
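In an R formula, a * b already expands to a + b + a:b, so listing the main effects next to the * term changes nothing. The sketch below confirms this on the built-in warpbreaks data, where wool type and tension level stand in for possum's sex and Pop:

```r
# Main effects written out explicitly alongside the interaction
a1 <- aov(breaks ~ wool + tension + wool * tension, data = warpbreaks)

# Shorthand: wool * tension means wool + tension + wool:tension
a2 <- aov(breaks ~ wool * tension, data = warpbreaks)

attr(terms(a2), "term.labels")   # "wool" "tension" "wool:tension"
summary(a2)
```

Both fits have identical terms and fitted values, so the shorter formula is the usual choice.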
Contributed Packages
• Go to http://www.r-project.org
– Click on CRAN
– Select a nearby CRAN mirror (e.g., USA-WA)
• Click on “packages” to look at descriptions
and more information about contributed
packages
• Go look at the vegan package
– Open up the reference manual link
Contributed Packages
• First you must install a package on your machine (and re-install it when R is updated)
– The easiest way is the pull-down menu in the R GUI
– Or use the command install.packages()
– Or download the package as a .zip file and install it manually
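The command-line route can be made safe to re-run by installing only the packages that are missing. Here is a minimal sketch. The helper name is_installed is hypothetical and not part of base R. The install step is limited to interactive sessions because it needs network access:

```r
# Packages used in these notes
pkgs <- c("vegan", "Hmisc")

# Hypothetical helper: TRUE if the package is installed and its namespace
# can be loaded (requireNamespace() does not attach it to the search path)
is_installed <- function(p) requireNamespace(p, quietly = TRUE)

# Names of the packages that still need installing
to_install <- pkgs[!vapply(pkgs, is_installed, logical(1))]

# install.packages() needs network access and may ask for a CRAN mirror,
# so it is only run in an interactive session
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
to_install
```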
Contributed Packages
• Once a package is installed on your computer, you must load it into each new R session.
– This can be done from the GUI pull-down menu (under Packages)
– Or from the command line:
> library(vegan)
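Loading a package attaches it to the search path for the current session only. The sketch below uses splines, which ships with R, so it runs without installing anything. It also shows how library() and require() behave differently when a package is missing. noSuchPackage12345 is a deliberately non-existent name used only for that demonstration:

```r
# splines ships with R, so this works without installing anything
library(splines)
"package:splines" %in% search()   # TRUE: attached for this session

# library() stops with an error when the package is not installed ...
# (noSuchPackage12345 is a deliberately non-existent name)
res <- tryCatch(library("noSuchPackage12345", character.only = TRUE),
                error = function(e) "not installed")

# ... whereas require() returns FALSE, so a script can test the result
ok <- require("noSuchPackage12345", character.only = TRUE, quietly = TRUE)
ok   # FALSE
```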
Hands-on Exercise
• Install the following packages on your machine:
– vegan
– Hmisc
• Now load these packages into your R
session (and add the code to your script
for the class)
Cluster Analysis Example
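• Select only possums older than 5, keeping those whose age is missing (NA):
> possum5 <- possum[(possum$age > 5) | is.na(possum$age), ]
• Calculate a Jaccard distance matrix from columns 6 to 14 with vegan's vegdist():
> possum5.jac <- vegdist(possum5[, 6:14], method = "jaccard")
• Run a hierarchical cluster analysis on the distance matrix:
> possum5.jac.hclust <- hclust(possum5.jac, method = "ward.D")  # named "ward" before R 3.1.0; see also "ward.D2"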
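The three steps above (subset, distance matrix, hclust()) can be tried without vegan or the possum data. In this sketch:

- A small made-up data frame shows why is.na() is needed in the subset.
- Base R's dist() with Euclidean distance on scaled mtcars columns replaces vegdist() with Jaccard distance.
- hclust() uses method = "ward.D2", which applies Ward's criterion to squared distances.
- cutree() assigns each car to one of k = 4 groups.

```r
# --- Subsetting: why is.na() is needed ---
# Toy data (made up for illustration); one age is missing
toy <- data.frame(id = 1:5, age = c(3, 6, NA, 7, 2))

# A bare comparison gives NA for the missing age, which becomes a row of NAs
bad <- toy[toy$age > 5, ]

# OR-ing with is.na() keeps the missing-age row with its real values
good <- toy[(toy$age > 5) | is.na(toy$age), ]

# --- Distance matrix -> hierarchical clustering -> k groups ---
# Stand-in data: four numeric mtcars columns, scaled to mean 0 and sd 1
X <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Euclidean distances (base R) in place of vegdist(..., method = "jaccard")
d <- dist(X, method = "euclidean")

# Agglomerative clustering with Ward's criterion on squared distances
hc <- hclust(d, method = "ward.D2")

# Cut the tree into k = 4 groups: one group number per car
groups <- cutree(hc, k = 4)
table(groups)
```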
Plotting Cluster Analysis
> plot(possum5.jac.hclust, xlab = "Possum Individuals", sub = "")
• Draw thick dashed rectangles around k = 4 groups, then reset the line settings:
> par(lwd = 3, lty = 2)
> rect.hclust(possum5.jac.hclust, k = 4)  # add rectangles to show groups
> par(lwd = 1, lty = 1)
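Here is a self-contained version of the plotting steps, again on stand-in data (the scaled USArrests data with Ward.D2 clustering). It saves and restores the graphical settings with op <- par(...) and par(op) instead of resetting them by hand. It also keeps the list of group members that rect.hclust() returns:

```r
# In a non-interactive run (e.g. Rscript), draw on a null device
# instead of writing Rplots.pdf to the working directory
if (!interactive()) pdf(NULL)

# Stand-in tree: Ward.D2 clustering of the scaled USArrests data
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
plot(hc, xlab = "US states", sub = "")

# Thick dashed boxes; par() returns the old settings so they can be restored
op <- par(lwd = 3, lty = 2)
members <- rect.hclust(hc, k = 4)   # list: row indices of each of the 4 groups
par(op)                             # back to the previous lwd and lty

lengths(members)                    # number of states in each group
```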
Writing to graphic files
• Remember that this plot can be written to a graphics file: open the device, draw, then close it with dev.off():
> png("dendrogram.png", width = 1500, height = 1000, pointsize = 30)
> # ... put the plotting commands here ...
> dev.off()
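Here is a runnable version of the file-writing pattern. The file goes to tempdir() so nothing in the working directory is overwritten. The stand-in dendrogram comes from the USArrests data rather than possum. This assumes the R build has PNG support, which capabilities("png") reports:

```r
# Write to a temporary file so nothing in the working directory is overwritten
out <- file.path(tempdir(), "dendrogram.png")

# Open a 1500 x 1000 pixel PNG device with a large base font size
png(out, width = 1500, height = 1000, pointsize = 30)

# Everything drawn between png() and dev.off() goes into the file
hc <- hclust(dist(scale(USArrests)), method = "ward.D2")
plot(hc, xlab = "US states", sub = "")
rect.hclust(hc, k = 4)

# Close the device so the file is finished and written to disk
invisible(dev.off())
file.exists(out)
```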
Adding Error Bars to Graphics
• There are many ways to do this.
– The Hmisc package provides errbar() for this
> library(Hmisc)
> ?errbar
Hands-on Exercise
• Create a new data table called
hdlngthBySite, with three columns:
– The site number
– The mean hdlngth for each site
– The standard deviation of hdlngth for each site
– (Remember that aggregate() can do this)
• Then plot hdlngth vs. site (a scatter plot is fine)
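One way to build such a table is a single aggregate() call whose summary function returns a named vector. Flattening the result with do.call(data.frame, ...) then gives columns named variable.mean and variable.sd, the same pattern as hdlngth.mean and hdlngth.sd in the error-bar code. The sketch below uses the built-in ToothGrowth data, with dose playing the role of site and len the role of hdlngth. The names mean_sd and lenByDose are hypothetical:

```r
# Hypothetical helper: named vector, so the names become column suffixes
mean_sd <- function(v) c(mean = mean(v), sd = sd(v))

# aggregate() stores both statistics in a single matrix column called "len"
agg <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean_sd)

# Flatten the matrix column into ordinary columns: dose, len.mean, len.sd
lenByDose <- do.call(data.frame, agg)
lenByDose
```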
Adding Error Bars to Graphics
> ?errbar
> yplus <- hdlngthBySite$hdlngth.mean + hdlngthBySite$hdlngth.sd
> yminus <- hdlngthBySite$hdlngth.mean - hdlngthBySite$hdlngth.sd
Adding Error Bars to Graphics
> plot(hdlngth.mean ~ site, data = hdlngthBySite, ylim = c(80, 105))
> errbar(x = hdlngthBySite$site, y = hdlngthBySite$hdlngth.mean, yplus = yplus, yminus = yminus, add = TRUE, ylim = c(80, 105))
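If Hmisc is not available, base graphics can draw the same kind of error bars with arrows(); this is a swapped-in technique, not errbar() itself. Setting angle = 90 turns the arrowheads into flat caps. The sketch uses a ToothGrowth summary, with dose standing in for site and len for hdlngth. It takes the y-axis limits from the data rather than hard-coding them:

```r
# In a non-interactive run, draw on a null device instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# Mean and SD of tooth length per dose (stand-in for hdlngth per site)
agg <- aggregate(len ~ dose, data = ToothGrowth,
                 FUN = function(v) c(mean = mean(v), sd = sd(v)))
lenByDose <- do.call(data.frame, agg)

# Upper and lower ends of the bars: mean +/- one SD
yplus  <- lenByDose$len.mean + lenByDose$len.sd
yminus <- lenByDose$len.mean - lenByDose$len.sd

# Points first, with limits wide enough to show the whole bar
plot(len.mean ~ dose, data = lenByDose, pch = 16,
     ylim = range(yminus, yplus), ylab = "mean len (+/- 1 SD)")

# code = 3 draws a head at both ends; angle = 90 makes them flat caps
arrows(x0 = lenByDose$dose, y0 = yminus, x1 = lenByDose$dose, y1 = yplus,
       angle = 90, code = 3, length = 0.05)
```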