R, PostGIS, and Sweave: Reproducible Research

Philip M. Hurvitz, PhD
Urban Form Lab
College of Built Environments
University of Washington
1 of 12
Acknowledgments
• NIH 5R01HL091881-04 (PI B. Saelens)
2 of 12
Disclaimers
• I rarely make maps
• I use GIS for analytics (usually I want numbers and tables at the end of analyses)
So this talk might not be for you…
3 of 12
Background
• Analysis is fun
• Writing is not fun
• Documentation is the most important, yet the most difficult, part of research
• Typical workflows are not conducive to efficient documentation
4 of 12
The way most of us work
• Perform a bunch of analysis
• Generate summary tables as files
  – Spreadsheet
  – SQL database
  – Programming (e.g., MATLAB, R, NumPy)
• Generate graphics as files
  – ArcGIS
  – MapServer, QGIS, OpenJUMP
• Compile files in a word-processed document
• “If it works,” don’t fix it.
5 of 12
But does it really work?
• What if you need to change something (who has ever needed to do that)?
• …Lather, rinse, repeat…
[Graphic: the workflow from the previous slide repeated in several overlapping copies]
6 of 12
But does it really work?
• What if you need to revisit your analysis or results (say, during peer review)?
• If your results were copied into Excel & Word, how will you track down what you did?
• If you programmed your analyses, your workflow should be stored in scripts…
  – But where are your scripts (“I swear they were here somewhere!”)? Or which version was used?
7 of 12
How to fix it?
1. Store your data in a format that can be accessed programmatically
   – PostGIS (others?)
2. Program your analyses so they are replicable
   – R with rgdal, RPostgreSQL
3. Use Sweave to run your R code and place the results in a LaTeX (and PDF) document
4. Use RStudio Server to streamline the process
   – Integrated development environment
   – Persistent sessions
   – Single click from Rnw file to PDF file
8 of 12
How to do it
• Code your analysis in R
• Reformat your R code as a Sweave file (“.Rnw”)
  – R code in “chunks” delimited with
      <<>>=
      R code
      @
  – Manuscript in LaTeX syntax
• Run the file through Sweave (which writes file.tex), then pdflatex (the second pass resolves cross-references)
      R> Sweave("file.Rnw")
      $> pdflatex file.tex; pdflatex file.tex
9 of 12
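The Sweave step can be tried without a real manuscript. The following is a minimal sketch, not the author's file: it writes a hypothetical `demo.Rnw` (one code chunk plus one `\Sexpr{}` inline value) to a temporary directory, runs `Sweave()` from R's bundled utils package, and prints the generated `demo.tex`. The file name, chunk label, and numbers are illustrative assumptions; compiling the result to PDF still requires a TeX installation.

```r
# Minimal Sweave round trip using only base R: Sweave() ships in utils.
# "demo.Rnw", the chunk label, and the numbers are hypothetical examples.
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<summary, echo=TRUE>>=",                  # chunk header: <<label, options>>=
  "x <- c(2, 4, 6)",
  "mean(x)",
  "@",                                        # '@' ends the chunk, back to LaTeX
  "The mean is \\Sexpr{mean(c(2, 4, 6))}.",   # inline R result inside the prose
  "\\end{document}"
)

old_wd <- setwd(tempdir())          # Sweave writes demo.tex to the working directory
writeLines(rnw_lines, "demo.Rnw")
Sweave("demo.Rnw", quiet = TRUE)    # evaluates the chunk and writes demo.tex
tex_lines <- readLines("demo.tex")
setwd(old_wd)

cat(tex_lines, sep = "\n")
# Then, in a shell with a TeX distribution installed:
#   pdflatex demo.tex; pdflatex demo.tex
```

In the generated `.tex`, the chunk's input and printed output sit inside `Schunk`, `Sinput`, and `Soutput` environments, and Sweave adds `\usepackage{Sweave}` to the preamble when it is missing, so pdflatex can typeset them.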
Some details
• Use the xtable package to format tabular data as LaTeX tables
• Generate graphics using native R or the lattice/grid graphical environment
• (Optional): Save tables and graphics as separate files (for publication)
10 of 12
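A minimal sketch of the optional last bullet, saving a table and a figure as standalone files that can accompany the PDF. To stay runnable on a stock R installation it uses `write.csv()` for the table and the lattice package (a recommended package distributed with R) for the graphic; the data frame, column names, and file names are hypothetical. Where xtable is installed, `print(xtable(tab), file = "table1.tex")` would write the same table as a LaTeX table instead of CSV.

```r
# Save a summary table and a figure as separate files (base R + lattice).
# The data, column names, and output file names are hypothetical.
library(lattice)

set.seed(1)
obs <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  value = c(rnorm(50, mean = 10), rnorm(50, mean = 12))
)

# Summary table: one row per group with the group mean
tab <- aggregate(value ~ group, data = obs, FUN = mean)
names(tab)[2] <- "mean_value"

out_dir     <- tempdir()
table_file  <- file.path(out_dir, "table1.csv")
figure_file <- file.path(out_dir, "figure1.pdf")

write.csv(tab, table_file, row.names = FALSE)

# lattice functions return a trellis object and draw nothing until it is
# printed; print() is explicit so this also works inside functions, loops,
# and source()d scripts, where auto-printing is skipped.
pdf(figure_file, width = 5, height = 4)
print(bwplot(value ~ group, data = obs, ylab = "Value"))
invisible(dev.off())
```

The same `pdf()`/`print()`/`dev.off()` pattern can live inside a Sweave chunk, so the figure embedded in the manuscript and the file sent to a publisher come from one piece of code.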
A sample workflow
• Data:
  – Simulated overlapping polygons (2 classes)
  – Simulated point observations (2 classes)
• Analysis:
  – Generate point density values for each of the 2 × 2 combinations of polygon class and point class, using the intersected data
11 of 12
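The sample workflow can be sketched in base R. This is not the code from the demo (that is in the Rnw file linked on the next slide); it is a minimal stand-in that simplifies each polygon class to a single circle, so point-in-polygon tests and areas are exact without PostGIS or rgdal. The names, coordinates, and seed are hypothetical, and points in the overlap are counted for both polygon classes.

```r
# Hypothetical stand-in for the sample workflow: 2 polygon classes x 2 point
# classes, density = points per unit area. Assumption: each polygon class is a
# single circle, so containment and area are exact without a GIS library.

# Two overlapping "polygons" (circles), one per class
polys <- data.frame(
  class = c("A", "B"),
  cx = c(0.4, 0.6), cy = c(0.5, 0.5), r = c(0.25, 0.25),
  stringsAsFactors = FALSE
)

# Point density for every polygon-class x point-class combination
point_density <- function(pts, polys) {
  pt_levels <- levels(pts$class)
  dens <- matrix(NA_real_, nrow = nrow(polys), ncol = length(pt_levels),
                 dimnames = list(polygon = polys$class, point = pt_levels))
  for (i in seq_len(nrow(polys))) {
    # "Intersect": which points fall inside polygon i?
    inside <- (pts$x - polys$cx[i])^2 + (pts$y - polys$cy[i])^2 <= polys$r[i]^2
    # table() on a factor keeps zero counts for point classes not present
    counts <- table(pts$class[inside])
    dens[i, ] <- as.numeric(counts[pt_levels]) / (pi * polys$r[i]^2)
  }
  dens
}

# Simulated point observations (2 classes) in the unit square
set.seed(2012)
n <- 500
pts <- data.frame(
  x = runif(n), y = runif(n),
  class = factor(sample(c("p", "q"), n, replace = TRUE))
)

print(round(point_density(pts, polys), 1))
```

With real polygons, the containment test and the areas would come from the spatial database (or an R spatial package) instead of the circle formulas; the 2 × 2 table of counts divided by area stays the same.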
A demo
• Rnw code:
  http://gis.washington.edu/phurvitz/presentations/2012/cugos_spring_fling/cugos.Rnw
• Resultant PDF:
  http://gis.washington.edu/phurvitz/presentations/2012/cugos_spring_fling/cugos.pdf
12 of 12