R, PostGIS, and Sweave: Reproducible Research
Philip M. Hurvitz, PhD
Urban Form Lab
College of Built Environments
University of Washington
1 of 12
Acknowledgments
NIH
5R01HL091881-04 (PI B. Saelens)
2 of 12
Disclaimers
I rarely make maps
I use GIS for analytics (usually I want numbers and tables at the end of analyses)
So this talk might not be for you . . .
3 of 12
Background
Analysis is fun
Writing is not fun
Documentation is the most important, yet the most difficult, part of research
Typical workflows are not conducive to efficient documentation
4 of 12
The way most of us work
Perform a bunch of analysis
Generate summary tables as files
    Spreadsheet
    SQL database
    Programming (e.g., MATLAB, R, NumPy)
Generate graphics as files
    ArcGIS
    Mapserver, QGIS, OpenJUMP
Compile files in a word processed document
“If it works, don’t fix it.”
5 of 12
But does it really work?
What if you need to change something (who has ever needed to do that)?
…Lather, rinse, repeat…
[Slide graphic: the workflow steps from the previous slide (perform a bunch of analysis; generate summary tables as files; generate graphics as files; compile files in a word processed document), repeated several times in overlapping copies]
6 of 12
But does it really work?
What if you need to revisit your analysis or results (say, during peer review)?
If your results were copied into Excel & Word, how will you track down what you did?
If you programmed your analyses, your workflow should be stored in scripts …
But where are your scripts (“I swear they were here somewhere!”)? Or which version was used?
7 of 12
How to fix it?
1. Store your data in a format that can be accessed programmatically
    PostGIS (others?)
2. Program your analyses so they are replicable
    R with rgdal, RPostgreSQL
3. Use Sweave to run your R code and place results in a LaTeX (and PDF) document
4. Use RStudio server to streamline the process
    Integrated development environment
    Persistent sessions
    Single-click from Rnw file to PDF file
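Steps 1 and 2 can be sketched in a few lines of R. This is only a minimal illustration, not the code used in the talk: the table names (polygons, points), the column names (poly_class, pt_class, geom), and the helper fetch_counts() are hypothetical, and a running PostGIS database plus the DBI and RPostgreSQL packages are assumed.

```r
# Hypothetical schema: tables "polygons" and "points", each with a class
# column and a geometry column named "geom". Adjust to your own database.
density_query <- "
  SELECT poly.poly_class,
         pt.pt_class,
         count(*) AS n_points
  FROM   polygons AS poly
  JOIN   points   AS pt
    ON   ST_Intersects(poly.geom, pt.geom)
  GROUP  BY poly.poly_class, pt.pt_class;"

# Hypothetical helper: open a connection, run the query, always disconnect.
fetch_counts <- function(dbname, host = "localhost", user = Sys.getenv("USER")) {
  con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                        dbname = dbname, host = host, user = user)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, density_query)   # returns a data.frame
}
```

Calling fetch_counts("mydb") (with a database name of your own) would return one row per polygon class and point class. Because both the query and the call live in a script, the same numbers can be regenerated whenever the data change.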
8 of 12
How to do it
Code your analysis in R
Reformat your R code as an Sweave file (“.Rnw”)
    R code in “chunks” delimited with
        <<>>=
        R code
        @
    Manuscript in LaTeX syntax
Pass through Sweave and pdflatex
    R> Sweave("file.Rnw")
    $> pdflatex file.tex; pdflatex file.tex
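The steps above can be tried end to end with a throwaway file. The sketch below is an assumption-laden illustration, not the demo from the talk: it writes a tiny Rnw document (the name demo.Rnw and its contents are made up) to a temporary directory and passes it through Sweave. The pdflatex step is left as a comment because it needs a LaTeX installation.

```r
# A minimal Rnw document: LaTeX manuscript text plus one R code chunk.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the simulated values is computed below.",
  "<<summary, echo=TRUE>>=",
  "set.seed(1)",
  "x <- rnorm(100)",
  "mean(x)",
  "@",
  "\\end{document}"
)

work_dir <- tempdir()
rnw_path <- file.path(work_dir, "demo.Rnw")
writeLines(rnw, rnw_path)

# Sweave writes demo.tex into the current working directory.
old_wd <- setwd(work_dir)
Sweave("demo.Rnw")
setwd(old_wd)

tex_path <- file.path(work_dir, "demo.tex")
tex_lines <- readLines(tex_path)

# With LaTeX installed, the PDF could then be built from the shell
# (pdflatex demo.tex) or from R with tools::texi2pdf(tex_path).
```

The resulting demo.tex contains the manuscript text with the chunk replaced by its echoed code and its output, which is what makes the document regenerate itself when the analysis changes.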
9 of 12
Some details
Use xtable library to format tabular data in LaTeX format
Generate graphics using native R or the lattice/grid graphical environment
(Optional): Save tables and graphics as separate files (for publication)
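As a rough sketch of these details, the code below formats a small summary table with xtable and saves both the table and a base-R graphic as separate files. It assumes the xtable package is installed, uses R's built-in mtcars data purely as a stand-in, and the output file names are made up for illustration.

```r
library(xtable)

# Stand-in summary table from a built-in dataset (not data from the talk).
tab <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
xt <- xtable(tab, caption = "Mean mpg by cylinder count", digits = 2)

# Inside an Rnw chunk declared with results=tex, this prints LaTeX
# straight into the manuscript.
print(xt, include.rownames = FALSE)

# Optional: also save the table and a graphic as separate files.
out_dir  <- tempdir()
tex_file <- file.path(out_dir, "table_mpg.tex")   # hypothetical file name
pdf_file <- file.path(out_dir, "fig_mpg.pdf")     # hypothetical file name

print(xt, include.rownames = FALSE, file = tex_file)

pdf(pdf_file, width = 5, height = 4)
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")
dev.off()
```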
10 of 12
A sample workflow
Data:
    Simulated overlapping polygons (2 classes)
    Simulated point observations (2 classes)
Analysis:
    Generate point density values for the 2 x 2 combination from intersected data
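A base-R sketch of a workflow along these lines follows. It is not the demo code. It assumes the 2 x 2 combination means polygon class by point class, it uses two overlapping axis-aligned rectangles in place of real polygons so that intersection needs no GIS library, and every name and coordinate is invented for illustration.

```r
set.seed(42)

# Two overlapping rectangles stand in for the two polygon classes.
polys <- data.frame(
  poly_class = c("A", "B"),
  xmin = c(0, 4), xmax = c(6, 10),
  ymin = c(0, 2), ymax = c(6, 8)
)
polys$area <- (polys$xmax - polys$xmin) * (polys$ymax - polys$ymin)

# Simulated point observations in two classes over a 10 x 10 study area.
n <- 500
pts <- data.frame(
  x = runif(n, 0, 10),
  y = runif(n, 0, 10),
  pt_class = sample(c("p1", "p2"), n, replace = TRUE)
)

# "Intersect": cross join points with polygons, keep pairs where the
# point falls inside the rectangle (points in the overlap match both).
pairs <- merge(pts, polys, by = NULL)
inside <- with(pairs, x >= xmin & x <= xmax & y >= ymin & y <= ymax)
hits <- pairs[inside, ]

# Count points per polygon class x point class, then divide by polygon area.
counts <- table(poly_class = hits$poly_class, pt_class = hits$pt_class)
areas <- polys$area[match(rownames(counts), polys$poly_class)]
density <- sweep(counts, 1, areas, "/")
print(round(density, 3))
```

In the PostGIS version of this workflow the cross join and point-in-rectangle test would be replaced by a spatial join in SQL, with only the counting and division left to R.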
11 of 12
A demo
Rnw code:
http://gis.washington.edu/phurvitz/presentations/2012/cugos_spring_fling/cugos.Rnw
Resultant PDF:
http://gis.washington.edu/phurvitz/presentations/2012/cugos_spring_fling/cugos.pdf
12 of 12