The Canopy Database Project
Tools for Research & Information Integration
http://canopy.evergreen.edu
Judy Cushing, Nalini Nadkarni
Mike Finch, Anne Fiala
Youngmi Kim, Aaron Crosland and others
The Evergreen State College
Collaborating Ecologists: Van Pelt, Bond, Dial, Ishii, Keim, Parker, Shaw, Sillett, Sumida, et al.
Collaborating Computer Scientists: Dave Maier, Lois Delcambre, Travis Brooks (OHSU)
Collaborating LTER Information Managers: Eda, Nicole, Kristin, Ken, Jonathan, James Brunt and others?
NSF CISE and BIO 04-xxx, 03-xxx, 01-31952, 01-9309, 99-75510, 9630316, 93-07771
Cushing; LTER IM 2004
Canopy DB Vision
PI & IM use of database technology & components can ease metadata provision, data validation and archiving, and data mining for synthesis.
BUT researchers aren’t programmers. The technology must be easy to use & increase research productivity.
The Underlying Idea
Database Design with Domain Specific Components
• Stem Model
– Upright linear: height only
– Upright cylinder: height, DBH
– Upright cone: height, DBH
– Upright stepped cylinder: multiple girth measures
• Branch Length Measurement
– Branch length perpendicular to stem
– Branch length along branch
• Branch Foliage Model
– Foliage start, stop
– Foliage inner, mid, outer
– Foliage length and width
Validate generated databases with rules, e.g., Stem:
• depends on study area, plot
• includes species table
Capitalize on core components for tools: visualization, metadata provision, data acquisition & validation, research protocol, statistical analysis….
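The rule-based validation idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataBank's actual implementation: the component names (`study_area`, `plot`, `species`, `stem`) and the `RULES` table are hypothetical, and "includes species table" is treated here simply as one more prerequisite of Stem.

```python
# Hypothetical rule table: each component lists the components it requires.
# Names are illustrative only; they mirror the "Stem" example on the slide.
RULES = {
    "study_area": [],
    "plot": ["study_area"],
    "species": [],
    "stem": ["study_area", "plot", "species"],
}


def missing_dependencies(selected):
    """Return {component: [missing prerequisites]} for a chosen set of components."""
    chosen = set(selected)
    problems = {}
    for component in selected:
        missing = [dep for dep in RULES.get(component, []) if dep not in chosen]
        if missing:
            problems[component] = missing
    return problems


def is_valid_design(selected):
    """A design is valid when every selected component has its prerequisites."""
    return not missing_dependencies(selected)
```

A design tool could call `missing_dependencies` each time a component is added, and prompt the researcher to add whatever is still missing before a database is generated.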
Approach
• Pathfinder Projects
– Ecologists design & carry out field research at several sites.
– Find research, archiving and data mining bottlenecks.
– Determine [spatial] data structures.
– Reverse-engineer components.
• Database Tools for the Field Ecologist
– Design field databases – DataBank.
– Visualize data using those databases – CanopyView.
– Lab-specific metadata acquisition.
– Hand-held (Palm Pilot) field data acquisition.
• Reality-check with LTER Information Managers.
• Web Accessible Research Reference -- BCD
Research Bottlenecks
Database Technology for Researcher Productivity
Research stages: Study Design → Field Work → Data Entry & Verification → Data Analysis → Data Sharing (w/in Group) → Journal Pub → Data Archive → Data Mining
• Metadata Generation
• Archive in Lab (common types)
• Data Visualization
• Statistical analysis
• Data validation (against metadata)
• Data and metadata capture
• Database and Protocol Design
• Research Reference Tools
• Information Synthesis
Recent Work
• Finding & maintaining the components
– Ecology Theory – spatial categorization of the Canopy
– Template Editor
• Refine existing software
– Template-embedded semantic metadata, carried forward…
– DataBank now stand alone
– Generate Excel, as well as Access and other RDBMS
– New visualizations
• Collaborate with other eco-informatics projects
– Closer integration with EML, Morpho
– LTER IM Collaboration – Kaplan, Melendez-Colom, Ramsey, Vanderbilt, Walsh.
• Outreach to computer science community & agencies
– NSF/USGS/NASA/EPA/ – JIIS special issue – dg.o
Future Work
• Carry out collaborative field studies
– Develop and test synthesis hypotheses
– Develop theoretical constructs on canopy structure-function
– Develop statistical protocols that guide study design
• Create and enhance informatics tools
– Build theory-based components
– Build better UIs, data import & validation, more visualization
– Build parameterized queries for standard statistical scripts
– Develop better metadata capture and evolution
– Develop or adapt warehouse & interface to other tools
• Field test tools from the get-go
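One way to read "parameterized queries for standard statistical scripts" is a fixed SQL statement whose filter values are supplied at run time, feeding a standard summary statistic. The sketch below uses Python's built-in `sqlite3` and `statistics` modules as stand-ins. The `stem` table, its column names, and the sample rows are assumptions made up for illustration, not the project's schema or data.

```python
import sqlite3
import statistics


def mean_dbh_by_plot(conn, species, min_height):
    """Run a parameterized query, then compute mean DBH per plot."""
    rows = conn.execute(
        "SELECT plot_id, dbh_cm FROM stem "
        "WHERE species = ? AND height_m >= ? ORDER BY plot_id",
        (species, min_height),  # values are bound, never pasted into the SQL
    ).fetchall()
    by_plot = {}
    for plot_id, dbh in rows:
        by_plot.setdefault(plot_id, []).append(dbh)
    return {plot: statistics.mean(values) for plot, values in by_plot.items()}


def demo_connection():
    """Build an in-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stem (plot_id INTEGER, species TEXT, height_m REAL, dbh_cm REAL)"
    )
    conn.executemany(
        "INSERT INTO stem VALUES (?, ?, ?, ?)",
        [
            (1, "PSME", 40.0, 100.0),
            (1, "PSME", 50.0, 120.0),
            (1, "TSHE", 30.0, 60.0),
            (2, "PSME", 10.0, 20.0),
            (2, "PSME", 45.0, 90.0),
        ],
    )
    return conn
```

Because the query text is fixed and only the parameters vary, the same script can be reused across studies whose databases were generated from the same components.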
How DataBank Works
Mike Finch
Research Bottlenecks
Database Technology & Research Productivity Gain
Research stages: Study Design → Field Work → Data Entry & Verification → Data Analysis → Data Sharing (w/in Group) → Journal Pub → Data Archive → Data Mining
• EML Generation
• CanopyView
• DataBank Database Generator
• BCD
Conclusions
• Database design is a complex web app
• Sociological aspects are important
– Proprietary data
– Technology adoption
– Integrative ecology is new
• Defining an intuitive & adequate set of templates is hard
• Spatial is special….
• Visualization is cool….
DataBank Workflow
[Figure: DataBank workflow] Database Components (shopping cart) → DB design → Database Design (schema element dependencies, entities, observations, attributes) → convert → Empty DB (SQL, MSSQL, MSAccess)
DataBank Software Architecture
• Internet Browser (IE 5+, Netscape 6+)
• Web Server (Apache)
• Enhydra (Middleware)
• Viz Toolkit
• JDK
• DataBank Backend (Java)
• Access Field DB
• SQL Server DB
Canopy DataBank
• What is it
– End-user database design with components (aka templates)
– Variable & table level metadata inherent
– Study-level metadata available from the BCD
• Technology
– HTML, Java, Enhydra, SQLServer, Access, JTK
– Aim to produce XML/EML for exchange and archive
• Status
– Some templates (mostly spatial tree structure)
– About 5 field studies
– Some visualization
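Variable-level metadata that travels with a template can also drive data validation. The following Python sketch shows the idea. The metadata layout, variable names, and range limits are hypothetical placeholders, not values taken from DataBank templates.

```python
# Hypothetical variable-level metadata, as a template might carry it.
# Types and limits are placeholders chosen for illustration.
STEM_METADATA = {
    "height_m": {"type": float, "min": 0.0, "max": 120.0},
    "dbh_cm": {"type": float, "min": 0.0, "max": 2000.0},
    "species": {"type": str},
}


def validate_row(row, metadata):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    for name, spec in metadata.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{name}: below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{name}: above maximum {spec['max']}")
    return problems
```

An empty list means the row is consistent with the metadata. Anything else can be shown to the researcher at data-entry time rather than discovered during analysis.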
DataBank Architecture (workflow)
[Figure: DataBank architecture workflow] template.xml, descr.xml, pic.gif, bigpc.gif → shopping cart → ‘TEOF’ internal object representation (schema element dependencies, entities, observation, attributes) → DB design → ‘TDM’ convert → Empty DB (SQL, MSSQL, MSAccess)
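The workflow above (selected templates, an internal object representation, then conversion to an empty database) can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the `Entity` and `Attribute` classes, the `_id` key convention, and the generic SQL emitted are not the project's TEOF or TDM formats.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    sql_type: str


@dataclass
class Entity:
    name: str
    attributes: list
    depends_on: list = field(default_factory=list)  # names of parent entities


def order_by_dependency(entities):
    """Order entities so each table is created after the tables it references."""
    by_name = {e.name: e for e in entities}
    ordered, seen = [], set()

    def visit(entity):
        if entity.name in seen:
            return
        seen.add(entity.name)
        for parent in entity.depends_on:
            visit(by_name[parent])
        ordered.append(entity)

    for entity in entities:
        visit(entity)
    return ordered


def to_ddl(entities):
    """Emit generic CREATE TABLE statements for an empty database."""
    statements = []
    for entity in order_by_dependency(entities):
        columns = [f"{entity.name}_id INTEGER PRIMARY KEY"]
        columns += [f"{a.name} {a.sql_type}" for a in entity.attributes]
        columns += [
            f"{parent}_id INTEGER REFERENCES {parent}({parent}_id)"
            for parent in entity.depends_on
        ]
        statements.append(f"CREATE TABLE {entity.name} ({', '.join(columns)});")
    return statements
```

Keeping the design in an intermediate object form, rather than writing SQL directly, is what would let one design be converted for several targets such as SQL Server or Access.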
Next Steps
• XML/EML for data exchange
• Outreach to CS community
– VLDB Panel on Ecosystem Informatics (August)
– NSF BDEI PI’s Meetings & Forum (May, Nov)
• Further define & support spatial data structures -- additional collaborator(s)?
• Visualization (!!!)
Discussion
Are we on the “right track” with visualization?
What off the shelf viz. tools are available?
Who might consult with us on visualization?
How about spatial scaling?
How to refine our spatial categorization scheme?
What collaborators (data sets) should we seek?
How is modeling linked to visualization?
Comments about DataBank?