4-cushing-eim2008 - Environmental Information Management

Integrating Ecological Data
Notes from the Grasslands ANPP Data Integration Project
http://canopy.evergreen.edu
Computer Scientists, The Evergreen State College: Judith B. Cushing, Juli Mallett, Lee Zeman, Natalie Kopytko
Ecologist: Carri Leroy
LTER and iLTER Ecologists and Information Managers: Daniel Milchunas (SGS), Nicole Kaplan (SGS), Esteban Muldavin (JRN), Kristin Vanderbilt (SEV), Judith Kruger (Kruger NP), Ken Ramsey (JRN), Christine Laney (JRN), Jincheng Gao (KNZ)
Cushing – EIM 2008
LTER Network Office, NSF CISE and BIO 04-0417311, 03-019309, 01-31952, 01-9309
Integrating Ecological Data
Notes from the Grasslands ANPP* Data Integration Project
* Above ground Net Primary Productivity
1. Motivation
• ANPP is important!
• Case study for CS
2. Challenges
• Sampling methods, idiosyncratic formats, species codes
3. The Data Model
4. Results & Products
• The database
• Preliminary scientific results
• Species code mappings – a web application: Specifik
5. Conclusions, Future Work
• Response variables “different”
• “Best Practices” and advice for curation
Motivation
1. ANPP is important!
Broken down by plant species and life forms,
the data could assess community and population
responses to global change.
2. Case study for CS
Challenges:
Sampling methods differ – even over grasslands!
Site | Sampling Method | Times Measured per Year | Years of Data | Number of Vegetation Types or Other Relevant Treatments | Number of SubSites | Number of Sampling Units (replicates) | Experimental Units* in Each Sampling Unit (plots per rep) | Total Number of Experimental Units (plots)
Kruger National Park (Kruger) | Regression relationship | 1 | 17 | 35 | 35 | 35 | 9-41 | 315-1435
Konza Prairie (KNZ) | Biomass harvest | 2 | 5 | 1 | 1 | 2 | 40 | 80
Jornada Basin (JRN) | Regression relationship | 3 | 17 | 5 | 15 | 15 | 49 | 735
Sevilleta Wildlife Refuge (SEV) | Regression relationship | 3 | 8 | 3 | 3 | 15 | 16 | 720
Shortgrass Steppe (SGS) | Biomass harvest | 1 | 23 | 1 | 6 | 3 | 5 | 90
Challenges (cont)
2. Idiosyncratic data formats …
• Robust, repeatable data integration process
• Tools (scripts) for integration
• “Best practices” dictate using CSV, no blank fields, etc.
• Correct data errors closer to collection
• Validate at curation
3. Site-specific species codes
• Used the PLANTS database codes
• A tool to map site codes to the PLANTS database …
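The "no blank fields" and "validate at curation" points above lend themselves to a small automated check. The following is only a minimal sketch: the column names, the sample rows, and the function name find_blank_fields are assumptions made for illustration and are not taken from the project's own scripts.

```python
import csv
import io


def find_blank_fields(csv_text):
    """Return (line_number, column_name) pairs for every blank field.

    Line numbers refer to lines of the CSV text, with the header on line 1.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line_number = reader.line_num  # line of the row just read
        for column, value in row.items():
            if column is None:
                # Row has more fields than the header declares.
                problems.append((line_number, "<extra field>"))
            elif value is None or value.strip() == "":
                # Missing (short row) or empty/whitespace-only value.
                problems.append((line_number, column))
    return problems


# Hypothetical sample data; the layout is an assumption for illustration.
SAMPLE = """site,year,plot,species_code,biomass
JRN,1995,12,BOER4,31.5
SEV,1995,3,,12.0
SGS,1996,7,BOGR2,
"""

if __name__ == "__main__":
    for line_number, column in find_blank_fields(SAMPLE):
        print(f"line {line_number}: blank value in column '{column}'")
```

Running a check like this at the site, before data are harvested, is one way to act on "correct data errors closer to collection".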
The GDI Data Model
Results: The GDI Database
5 LTERs (JRN, SEV, SGS, KNZ, Kruger NP)
20 years’ data
1697 distinct plots
160,000 distinct measurements
1600 species
[Figure: Plots per LTER. Site labels: SGS, SEV, JRN, KRG, KNZ. Values shown: 79, 105, 240, 536, 735.]
Results: The GDI Database (cont)
This database supports aggregation by species, family, growth form, vegetation
biome type, physical location, etc., and cross-site analysis of abiotic drivers
of ANPP, e.g., temperature and precipitation.
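As an illustration of the kind of aggregation described above, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema, the column names, the units, and all values are assumptions invented for this example; the actual GDI data model is not reproduced in this transcript.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE species (
            plants_code TEXT PRIMARY KEY,
            family      TEXT,
            growth_form TEXT
        );
        CREATE TABLE measurement (
            site        TEXT,
            year        INTEGER,
            plot        TEXT,
            plants_code TEXT REFERENCES species(plants_code),
            anpp        REAL  -- assumed unit: g/m^2/yr; values are made up
        );
    """)
    # Illustrative rows only, not project data.
    conn.executemany("INSERT INTO species VALUES (?, ?, ?)", [
        ("BOGR2", "Poaceae", "graminoid"),
        ("BOER4", "Poaceae", "graminoid"),
        ("GUSA2", "Asteraceae", "shrub"),
    ])
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?)", [
        ("SEV", 2000, "p1", "BOGR2", 40.0),
        ("SEV", 2000, "p1", "GUSA2", 10.0),
        ("JRN", 2000, "p1", "BOER4", 30.0),
        ("JRN", 2000, "p1", "GUSA2", 30.0),
    ])
    return conn


def anpp_by_site_and_growth_form(conn):
    """Sum ANPP per (site, growth form) by joining measurements to species."""
    return conn.execute("""
        SELECT m.site, s.growth_form, SUM(m.anpp)
        FROM measurement AS m
        JOIN species AS s ON s.plants_code = m.plants_code
        GROUP BY m.site, s.growth_form
        ORDER BY m.site, s.growth_form
    """).fetchall()


if __name__ == "__main__":
    for site, growth_form, total in anpp_by_site_and_growth_form(build_demo_db()):
        print(site, growth_form, total)
```

Swapping growth_form for family or plants_code in the SELECT and GROUP BY clauses gives the other aggregation levels; this only works across sites once species codes have been mapped to a shared vocabulary.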
[Figure: chart by site (KNZ, KRG, SGS, SEV, JRN); year axis from 1980 to 2005.]
Preliminary Scientific Results
CART model explains 64% of ANPP variation
over 23 years at SEV, SGS, and JRN
Palmer Drought Severity Index (PDSI),
temperatures (max, mean),
precipitation.
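For readers unfamiliar with CART: a regression tree is grown by repeatedly choosing the split on a predictor that most reduces the squared error of the response, and "variation explained" is 1 - SSE_after / SSE_before. The sketch below shows a single such split on one predictor in pure Python. It is not the project's model, and the precipitation and ANPP numbers are made up for illustration.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Find the threshold on x that minimizes the summed SSE of y in both groups.

    Returns (threshold, r_squared), where
    r_squared = 1 - SSE_after_split / SSE_before_split.
    threshold is None if no split reduces the error.
    """
    total = sum_squared_error(y)
    best_threshold, best_sse = None, total
    xs = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if sse < best_sse:
            best_threshold, best_sse = threshold, sse
    r_squared = 1 - best_sse / total if total > 0 else 0.0
    return best_threshold, r_squared


# Made-up toy numbers: annual precipitation (mm) and ANPP (g/m^2).
precip = [100, 150, 200, 250, 300, 350]
anpp = [20, 30, 25, 80, 90, 85]

if __name__ == "__main__":
    threshold, r2 = best_split(precip, anpp)
    print(f"split at precip <= {threshold}: R^2 = {r2:.3f}")
```

A full CART model applies the same step recursively across several predictors (here, drought index, temperature, and precipitation) and then prunes the tree.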
Preliminary Scientific Results (cont)
[Figure: Graminoid NPP Percentage (y-axis, 0 to 1) by year (x-axis, 1983 to 2005). Legend entries: Jornada Basin, Five Points Grassland, Sevilleta Blue Grama, Konza fl1, SGS ESA 1.]
Results: Web App Specifik
Species Table
• LTER
• LTER Code
• PLANTS Code(s) (a code for each species that matches the LTER code)
• Accepted PLANTS code(s) (one synonym for each PLANTS code)
Perl Script
PLANTS data
• Family
• Rank
• Genus
• Genus Author
• Species
• Species Author
• Trinomial Rank
• Variety or Subspecies
• Variety Author
• Scientific Name
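The species table above implies a two-step lookup: a site-specific code leads to one or more candidate PLANTS codes, and each PLANTS code leads to an accepted code. The slide mentions a Perl script for this; the Python sketch below is only a stand-in for the idea. Every code in it is a fake placeholder rather than a real PLANTS record, and the names SITE_TO_PLANTS, ACCEPTED, and accepted_codes are hypothetical.

```python
# Placeholder data: (site, site_code) -> candidate PLANTS codes.
# All codes here are invented for illustration.
SITE_TO_PLANTS = {
    ("JRN", "OLDN"): ["SYN1"],            # one candidate, which is a synonym
    ("SGS", "AMBI"): ["AAA1", "AAA2"],    # ambiguous: two distinct candidates
    ("SEV", "DUPL"): ["SYN1", "SYN2"],    # two synonyms of the same species
}

# Placeholder data: PLANTS code -> accepted PLANTS code.
ACCEPTED = {
    "SYN1": "ACC1",
    "SYN2": "ACC1",
    "AAA1": "AAA1",
    "AAA2": "AAA2",
}


def accepted_codes(site, site_code):
    """Return the sorted list of accepted codes a site code can resolve to.

    An empty list means the code is unknown; more than one entry means a
    person still has to pick the right species.
    """
    candidates = SITE_TO_PLANTS.get((site, site_code), [])
    return sorted({ACCEPTED[c] for c in candidates if c in ACCEPTED})


if __name__ == "__main__":
    for key in SITE_TO_PLANTS:
        print(key, "->", accepted_codes(*key))
```

Keying the lookup on the (site, code) pair rather than the code alone reflects the point that species codes are site specific: the same short code may mean different species at different sites.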
Results: Web App Specifik (cont)
http://alala.evergreen.edu/~mallettj/specifik/
• Database with species data
• Export species table as CSV
• Create a new species table in Specifik
• Upload species table CSV
• Answer questions about table schema
• Import table to database
• Select correct PLANTS code for each species in your table from list
• Download new CSV table with PLANTS codes appended
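The last step of the workflow, producing a new CSV with PLANTS codes appended, can be sketched with Python's csv module. This is not Specifik's code: the function name append_plants_codes, the column names, the "UNMAPPED" marker, and the sample table are assumptions for illustration.

```python
import csv
import io


def append_plants_codes(csv_text, code_column, chosen, missing="UNMAPPED"):
    """Return CSV text with a 'plants_code' column appended to every row.

    `chosen` maps a site code to the PLANTS code a person selected for it.
    Codes without a selection get the `missing` marker instead of a blank
    field, so they are easy to find and fix later.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["plants_code"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        row["plants_code"] = chosen.get(row[code_column], missing)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical exported species table and user selections.
TABLE = "site_code,name\nBOGR,blue grama\nZZZZ,unknown forb\n"
CHOSEN = {"BOGR": "BOGR2"}

if __name__ == "__main__":
    print(append_plants_codes(TABLE, "site_code", CHOSEN), end="")
```

The original columns are left untouched and the new column is added at the end, so the result can be imported back into the site's own database alongside the existing codes.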
Conclusions
1. Response (aka biotic) variables are “different”
2. Data integration prep. should be done at site
• Data Validation Tools
• Web Application – Specifik
3. Data integration will find data errors… and that is good
4. Curator will probably be needed
• harvest coordination
• maintain integration tools
5. Interdisciplinary collaboration important!
Future Work
1. Work up the metadata… and release the database
2. Find a home and caretaker for it (SEV?)
3. Extend to other ecosystems and methodologies
4. Provide (or find) contextual data (ANPP drivers)
5. Determine appropriate analysis
• Drivers → response variables (connectivity)
6. Develop tools to make 3 & 4 easier
Take-Home Messages
• Collaboration between Information Managers, Ecologists,
and Computer Scientists was good.
• Compare methodologies and identify sampling units early.
• Wherever possible, standardize units of measurement
and derivations.
• Standardize species codes, vegetative characteristics
and other metadata to facilitate analysis
• Exploratory analysis aids quality assurance
• Design the data model to support important & interesting