Biodiversity Data Retrieval and Integration
Distributed species, data, computation and credit
James H. Beach
Biodiversity Research Center
University of Kansas
[email protected]
SINE Workshop, 29-31 Oct 2001, SDSC
Museums and their Data
• 3 B specimens – and data – documenting the distribution of life on earth
• 2 M species
• 300 years of biological exploration
• Data are held in dynamic, autonomous, self-organizing and spatially distributed collections
Paris Museum Mexican Birds
British Museum Mexican Birds
Field Museum Mexican Birds
KU Museum Mexican Birds
“World Museum” Mexican Birds
The Species Analyst Network
• Direct access to live primary data
• Ownership and control maintained locally
• Z39.50, HTTP, XML data, XML Query
[Diagram labels: Desktop Applications; Broadcast query; Data Resources; Client API]
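The broadcast-query idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the Species Analyst implementation: the providers are hypothetical in-memory lists standing in for remote Z39.50 servers, the names `PROVIDERS`, `query_provider` and `broadcast_query` are invented for the example, and the coordinates are made-up sample values.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for autonomous museum data providers. In the real
# network each provider would be a remote server returning XML records.
PROVIDERS = {
    "Museum A": [
        {"species": "Icterus gularis", "lat": 19.4, "lon": -99.1},
        {"species": "Momotus mexicanus", "lat": 18.9, "lon": -99.2},
    ],
    "Museum B": [
        {"species": "Icterus gularis", "lat": 16.7, "lon": -93.1},
    ],
    "Museum C": [],
}


def query_provider(name, species):
    """Return matching records from one provider, tagged with their source."""
    return [dict(rec, source=name)
            for rec in PROVIDERS[name] if rec["species"] == species]


def broadcast_query(species):
    """Send the same query to every provider in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
        futures = [pool.submit(query_provider, name, species)
                   for name in PROVIDERS]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged


records = broadcast_query("Icterus gularis")
for rec in records:
    print(rec["source"], rec["lat"], rec["lon"])
```

Each record keeps a `source` field, which mirrors the slide's point that ownership stays with the local collection even after results are merged.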
Species Analyst HTML Gateway
Results of Species Analyst Query
GARP: Genetic Algorithm for Rule-set Production
• Developed by David Stockwell, San Diego Supercomputer Center
• Takes advantage of multiple algorithms (BIOCLIM, logistic regression, etc.)
• Different decision rules may apply to different sectors of species’ distributions
• Uses a genetic algorithm for choosing rules
• Implemented on WWW, and open for public use
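To make "a genetic algorithm for choosing rules" concrete, here is a toy sketch. It is not GARP: GARP combines several rule types, whereas this example evolves only a single range rule (a lower and upper temperature bound) against invented presence/absence samples. All names and parameter values are assumptions chosen for illustration.

```python
import random

# Invented training data: (mean temperature in deg C, species present?)
SAMPLES = [(5, False), (9, False), (14, True), (17, True), (21, True),
           (24, True), (28, False), (32, False)]


def predict(rule, temp):
    """A rule is a (low, high) temperature envelope."""
    low, high = rule
    return low <= temp <= high


def fitness(rule):
    """Fraction of samples the rule classifies correctly."""
    hits = sum(predict(rule, t) == present for t, present in SAMPLES)
    return hits / len(SAMPLES)


def crossover(a, b):
    """Take the lower bound from one parent and the upper bound from the other."""
    low, high = a[0], b[1]
    return (min(low, high), max(low, high))


def mutate(rule, rng):
    """Nudge both bounds by a small random amount."""
    low = rule[0] + rng.uniform(-2, 2)
    high = rule[1] + rng.uniform(-2, 2)
    return (min(low, high), max(low, high))


def evolve(generations=50, size=20, seed=0):
    rng = random.Random(seed)
    population = [tuple(sorted((rng.uniform(0, 35), rng.uniform(0, 35))))
                  for _ in range(size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: size // 2]  # elitism: keep the better half
        children = [mutate(crossover(rng.choice(survivors),
                                     rng.choice(survivors)), rng)
                    for _ in range(size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness), history


best_rule, history = evolve()
print(best_rule, fitness(best_rule))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.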
Species Analyst + GARP: A Powerful Tool
• Integrates distributed biodiversity data
• Provides current information on species’ ranges
• Models species’ ecological niches
• Predicts geographic distributions
• Integrates niche models with environmental change scenarios, e.g. global climate change and biodiversity, invasive species, emerging diseases
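Projecting a niche model onto an environmental change scenario can be sketched with a simple BIOCLIM-style envelope. This is only an illustration under stated assumptions: one environmental variable, a made-up 2 x 3 temperature grid, invented occurrence values, and a uniform +3 degree shift standing in for a climate scenario. The function names are hypothetical.

```python
def fit_envelope(values):
    """Envelope model: the range of conditions observed at known localities."""
    return min(values), max(values)


def project(envelope, grid):
    """Mark each grid cell as suitable if it falls inside the envelope."""
    low, high = envelope
    return [[low <= cell <= high for cell in row] for row in grid]


# Invented temperatures (deg C) at georeferenced specimen localities.
occurrence_temps = [14.0, 17.5, 21.0, 23.5]
envelope = fit_envelope(occurrence_temps)

# Invented environmental layer, and a warmer scenario derived from it.
current = [[10.0, 15.0, 20.0],
           [18.0, 22.0, 26.0]]
warmer = [[cell + 3.0 for cell in row] for row in current]

suitable_now = project(envelope, current)
suitable_future = project(envelope, warmer)
print(suitable_now)
print(suitable_future)
```

The model is fitted once in environmental space and then applied to any layer with the same variables, which is what allows the same niche model to be reused for a different region or a future scenario.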
Asian Longhorn Beetle (Anoplophora glabripennis)
Longhorn Beetle - Modeled Asian Distribution
Asian Longhorn Beetle – Predicted U.S. Distribution
A Global Encyclopedia of Life or The World According to GARP
• Research
– Biogeographic analysis on distributions
– Invasive species predictions
– Monitoring and conservation planning
– Global climate change impacts on Biota
• Outreach, Education and Training
– Backyard biodiversity, spatial data queries, GIS functions
– Interactive data entry, observational data
• Data Analysis Services for Museums
– Uniqueness and value of collections holdings
– Data quality issues
– Summary statistics and analyses
A Global Encyclopedia of Life or The World According to GARP (2)
• Every documented species with georeferenced localities in the Species Analyst Network
• North America, Western Hemisphere, World
• Resolution 1 Km grid NA, 10 Km elsewhere
• 1 M+ species in collections with data?
• Computational Requirements
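A back-of-envelope way to reason about the computational requirements is to count grid cells at each resolution. The sketch below is only arithmetic; the area figure used for North America (about 24.7 million km²) is a rough approximation supplied for illustration, not a number from the talk, and `cell_count` is a hypothetical helper.

```python
import math


def cell_count(area_km2, resolution_km):
    """Number of square grid cells needed to cover an area at a given resolution."""
    return math.ceil(area_km2 / resolution_km ** 2)


# Rough, assumed land area for North America in square kilometres.
NORTH_AMERICA_KM2 = 24_700_000

cells_1km = cell_count(NORTH_AMERICA_KM2, 1)
cells_10km = cell_count(NORTH_AMERICA_KM2, 10)
print(cells_1km, cells_10km, cells_1km // cells_10km)
```

Moving from a 10 km grid to a 1 km grid multiplies the number of cells per map by 100, and that cost is then repeated for every species modeled.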
Metacomputing Museum Data
• Global species distributions: parallel computation
• SETI@home
– Collaborative computing
– 1 M simultaneous users
• Port GARP to Win32 to run in background or foreground
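The SETI@home-style arrangement can be sketched as a coordinator that hands out independent work units, here batches of species to model, to volunteer clients. This is a single-process illustration with hypothetical names (`make_work_units`, `Coordinator`, `run_client`); a real deployment would add networking, retries and result validation, none of which is shown.

```python
from collections import deque


def make_work_units(species_ids, unit_size):
    """Split the species list into independent batches."""
    return [species_ids[i:i + unit_size]
            for i in range(0, len(species_ids), unit_size)]


class Coordinator:
    """Hands out work units and collects results keyed by unit id."""

    def __init__(self, work_units):
        self.pending = deque(enumerate(work_units))
        self.results = {}

    def checkout(self):
        """Return the next (unit_id, species_batch), or None when finished."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        self.results[unit_id] = result


def run_client(coordinator, model_fn):
    """Volunteer client loop: fetch a unit, model each species, report back."""
    completed = 0
    while True:
        job = coordinator.checkout()
        if job is None:
            return completed
        unit_id, batch = job
        coordinator.submit(unit_id, {sp: model_fn(sp) for sp in batch})
        completed += 1


coordinator = Coordinator(make_work_units(list(range(7)), unit_size=3))
# Placeholder for a real niche-modeling run on one species.
units_done = run_client(coordinator, model_fn=lambda species_id: species_id * 2)
print(units_done, coordinator.results)
```

Species models do not depend on one another, so the work divides cleanly into units that can run on separate desktop machines.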
Lifemapper =
Georeferenced Species Data
+ Distributed Query Architecture
+ Predictive Modeling
+ Distributed Computation
+ Spatial Map and Model Archive
+ Open Access Web Portal
Lifemapper Demonstration
• Server
• GARP client
Lifemapper Future Directions
• Diversify modeling options, add interactivity, 3D analysis and visualization
• Add new classes of data layers, remote sensing, human impacts element, ecological models
• Add observational species data
• Embed dispersion models, temporal dimension
• Add internet services API, UDDI, SOAP
• Add more value-added services for data providers
• Embed LM data and analysis tools within a semantic research and decision support network
• Integrate LM into informal and formal science education
Lifemapper Social Scaling
• Distributed authorship
• Desktop computing
• User preferences
• Value-added collections data analysis
• Acknowledgement and accreditation of contributions, ranks and statistics
Museums as Sensor Networks
• Data, servers and connections are all dynamic
– Deborah Estrin: adaptive self-organization of the network, unattended and untethered; parallels to curators and collection managers
• Self-assembling, observational data
• Usually no real-time requirement
• Changes are as important as the data values
– Source data (West Nile virus), model outputs
– Frank Vernon mentioned that in many cases it is not the data values per se but the change that is of importance
• People as part of the Network
– Doug Goodin: “people are part of the technological system.” Museums are sensors, they are observatories, but the latency of bringing the data into analysis engines is measured not in milliseconds but in field seasons, or in the decades it takes to get formal publication of new scientific concepts. Many specimens and data are centuries old.
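The point that changes matter as much as the values themselves can be illustrated by comparing two snapshots of a collection's records. This is a minimal sketch: the records are invented, keyed by a hypothetical catalog number, and `diff_snapshots` is an illustrative helper rather than part of any system described in the talk.

```python
def diff_snapshots(old, new):
    """Report which record keys were added, removed or modified between snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(key for key in old.keys() & new.keys()
                     if old[key] != new[key])
    return {"added": added, "removed": removed, "changed": changed}


# Invented snapshots of one provider's records, keyed by catalog number.
last_season = {
    "KU-001": {"species": "Icterus gularis", "lat": 19.4, "lon": -99.1},
    "KU-002": {"species": "Momotus mexicanus", "lat": 18.9, "lon": -99.2},
}
this_season = {
    "KU-001": {"species": "Icterus gularis", "lat": 19.4, "lon": -99.1},
    "KU-002": {"species": "Momotus mexicanus", "lat": 18.8, "lon": -99.2},
    "KU-003": {"species": "Icterus gularis", "lat": 16.7, "lon": -93.1},
}

changes = diff_snapshots(last_season, this_season)
print(changes)
```

Only the species touched by added or corrected records would need their models rerun, which fits a network whose latency is measured in field seasons rather than milliseconds.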
Acknowledgements
• University of Kansas
– Dave Vieglais, Ricardo Pereira, Aimee Stewart, Greg Vorontsov, Town Peterson, BRC
• SDSC
– David Stockwell, Environmental Computing
• University of Massachusetts-Boston
– Bob Morris, CS, Rob Stevenson, Biology
• UC Berkeley
– John Wieczorek, Museum of Vertebrate Zoology
– Dan Werthimer, Space Sciences Laboratory
• Agriculture Canada
– Derek Munro, ITIS Canada Office
• California Academy of Sciences
– Stan Blum, Informatics