Organising data flows and modelling for the Essential Biodiversity Variables
Hannu Saarenmaa – University of Eastern Finland
• GEO BON, WG8 – Data Integration and Interoperability
• EU BON, WP2 – Data Integration and Interoperability
• BioVeL, WP2 – Workflows for Scientific Research
GEO-X Plenary
Geneva, 14 January 2014
Essential Biodiversity Variables
• Conceived by GEO BON collaborators (Pereira et al. (2013) “Essential Biodiversity Variables”, Science, Vol. 339, 18 Jan 2013).
• EBVs facilitate data integration by providing an intermediate abstraction layer between primary observations and indicators.
• EBVs are computed from a large number of inputs (monitoring and incidental data).
• EBVs aim to help observation communities harmonise monitoring by identifying how variables should be sampled and measured.
• EBVs standardise an ontology for biodiversity and harmonise measurements, observations, and protocols.
• Endorsed by the Convention on Biological Diversity (CBD) and in line with the 2020 Aichi Targets.
• EBVs provide a focus for GEO BON, and hence for the interoperability thrust within GEO BON.
• A use case that GEO BON, EU BON and BioVeL focus on.
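To make the "intermediate abstraction layer" concrete, here is a minimal sketch under a toy schema (an assumption, not a GEO BON specification): primary occurrence records are aggregated into an occupancy-style variable (number of distinct occupied grid cells per species and year), and an indicator is then derived from that variable instead of from the raw records. `Observation`, `occupancy_ebv` and `mean_occupancy_change` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One primary occurrence record (hypothetical minimal schema)."""
    species: str
    year: int
    cell: str  # identifier of the grid cell where the species was recorded


def occupancy_ebv(observations):
    """EBV layer (toy): number of distinct occupied cells per (species, year).

    Duplicate records of the same species in the same cell and year count once.
    """
    cells = defaultdict(set)
    for obs in observations:
        cells[(obs.species, obs.year)].add(obs.cell)
    return {key: len(occupied) for key, occupied in cells.items()}


def mean_occupancy_change(ebv, year_from, year_to):
    """Indicator layer (toy): mean relative change in occupancy across the
    species that have EBV values for both years; None if there are none."""
    species = {s for s, _ in ebv}
    ratios = [
        ebv[(s, year_to)] / ebv[(s, year_from)]
        for s in species
        if (s, year_from) in ebv and (s, year_to) in ebv
    ]
    return sum(ratios) / len(ratios) - 1.0 if ratios else None
```

The point of the layering is that the indicator function never sees individual records, so new data sources only have to be mapped onto the EBV, not onto every indicator.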
Where does the data come from?
• In Europe there are about 2000 biodiversity
observation networks (only 643 listed by EUMON).
• GBIF has 10,000 data sets, openly accessible,
conforming to GEOSS Data Sharing Principles.
• LTER/DataONE has thousands of biodiversity datasets.
• EU BON is carrying out a gap analysis:
– There is a massive duplication of effort in data management,
and lack of data sharing.
– There are very few data sets whose “quality” (coverage,
accuracy, etc.) has been documented and guaranteed.
– The so-called “data core” in biodiversity has not yet been
defined.
Biodiversity Virtual e-Laboratory
BioVeL processing services and workflows
• “Workflows” (series of data analysis steps)
make it possible to process vast amounts of data.
• Build your own workflow: select and apply
successive “services” (data processing
techniques).
• Import data from one’s own research
and/or from existing libraries (e.g. GBIF,
Catalogue of Life).
• Access a library of workflows and re-use
existing workflows.
• Cut down research time and
overhead expenses.
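The idea of a workflow as "successive services" can be illustrated with plain function composition. This is a minimal Python sketch, not how the BioVeL portal implements workflows; the two services and the `(species_name, lat, lon)` record layout are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable

# A "service" here is any function that takes a list of records and returns a new list.
Service = Callable[[list], list]


def build_workflow(services: Iterable[Service]) -> Service:
    """Chain services so that the output of each one becomes the input of the next."""
    services = list(services)

    def run(records: list) -> list:
        return reduce(lambda data, service: service(data), services, records)

    return run


# Hypothetical services working on (species_name, lat, lon) tuples.
def normalise_names(records):
    """Trim whitespace and capitalise only the genus (e.g. 'ips TYPOGRAPHUS' -> 'Ips typographus')."""
    return [(name.strip().capitalize(), lat, lon) for name, lat, lon in records]


def drop_missing_coordinates(records):
    """Discard records without a latitude or longitude."""
    return [r for r in records if r[1] is not None and r[2] is not None]


workflow = build_workflow([normalise_names, drop_missing_coordinates])
```

Re-using a workflow then amounts to re-using the list of services, and a new workflow is a new list.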
[Figure: part of a workflow to study the ecological niche of the horseshoe crab]
Aim: Predictive modelling of biodiversity change
Available tools from a growing family of ENM workflows, released to the public at www.biovel.eu
1. Data assembly, cleaning, and refinement
Ecological Niche Modelling Workflow (ENM)
– Classic ENM with 15 algorithms
– Separate BioClim workflow (requires special inputs)
3. Data discovery
Data Refinement Workflow (DRW) for pre-processing
– Taxonomic Name Resolution / Occurrence retrieval
– Geo-temporal data selection using ‘BioSTIF’.
– Data quality checks / filtering using ‘Google Refine’.
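In the DRW these checks are done interactively with BioSTIF and Google Refine. As a rough illustration of the kind of rule-based filtering involved, here is a minimal Python sketch; the specific rules (valid coordinate ranges, the 0/0 placeholder, a year window, exact duplicates) and the `Occurrence` record type are assumptions chosen for illustration, not the DRW's actual checks.

```python
from typing import NamedTuple, Optional


class Occurrence(NamedTuple):
    """Hypothetical minimal occurrence record."""
    species: str
    lat: Optional[float]
    lon: Optional[float]
    year: Optional[int]


def quality_filter(records, year_from, year_to):
    """Keep records with plausible coordinates and a year inside the window,
    dropping exact duplicates while preserving the original order."""
    seen = set()
    kept = []
    for r in records:
        if r.lat is None or r.lon is None or r.year is None:
            continue  # incomplete record
        if not (-90 <= r.lat <= 90 and -180 <= r.lon <= 180):
            continue  # impossible coordinates
        if r.lat == 0 and r.lon == 0:
            continue  # frequently used as a placeholder for "unknown"
        if not (year_from <= r.year <= year_to):
            continue  # outside the geo-temporal selection
        if r in seen:
            continue  # exact duplicate of a record already kept
        seen.add(r)
        kept.append(r)
    return kept
```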
2. The analytical cycle
ENM Statistical Workflow (ESW) for post-processing
– DIFF: Extent and intensity of change
– STACK: Extent, intensity, and a cumulated potential
– SHIFT: shift of the centre of gravity (direction, and length in
kilometers)
[Diagram labels: Ecological Niche Modeling; Statistical analysis]
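The three ESW products can be sketched on suitability grids. This is a minimal Python illustration with assumed definitions, not the ESW's own code: a grid is a dict mapping a cell centre `(lat, lon)` to a suitability value in [0, 1]; DIFF is the per-cell difference, STACK is the per-cell sum over species, and SHIFT is the move of the suitability-weighted centre of gravity, reported as an initial compass bearing and a great-circle (haversine) distance in kilometers. Averaging latitude and longitude directly is a simplification that is only reasonable for regional extents.

```python
import math

Grid = dict  # maps (lat, lon) of a cell centre -> suitability in [0, 1]


def diff(before: Grid, after: Grid) -> Grid:
    """DIFF: per-cell change in suitability (positive = gain, negative = loss)."""
    return {cell: after.get(cell, 0.0) - before.get(cell, 0.0)
            for cell in before.keys() | after.keys()}


def stack(grids) -> Grid:
    """STACK: per-cell sum over several species' grids (a cumulated potential)."""
    total: Grid = {}
    for grid in grids:
        for cell, value in grid.items():
            total[cell] = total.get(cell, 0.0) + value
    return total


def centre_of_gravity(grid: Grid):
    """Suitability-weighted mean latitude and longitude (grid must not be all zeros)."""
    weight = sum(grid.values())
    lat = sum(cell[0] * v for cell, v in grid.items()) / weight
    lon = sum(cell[1] * v for cell, v in grid.items()) / weight
    return lat, lon


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def initial_bearing_deg(p, q):
    """Direction of travel from p to q, in degrees clockwise from north."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    y = math.sin(lon2 - lon1) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return (math.degrees(math.atan2(y, x)) + 360) % 360


def shift(before: Grid, after: Grid):
    """SHIFT: (direction in degrees, length in km) of the centre-of-gravity move."""
    p, q = centre_of_gravity(before), centre_of_gravity(after)
    return initial_bearing_deg(p, q), haversine_km(p, q)
```

For example, moving all suitability from a cell at 60° N to one at 61° N on the same meridian gives a SHIFT of about 0° (due north) and roughly 111 km.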
Seamless exchange of data layers (openModeller: http://openmodeller.cria.org.br/)
Use case: disturbance of forest ecosystems by the spruce bark beetle, Ips typographus
[Maps: pre-2002, year 2050, difference]
• Statistical processing of the difference in Finland indicates that the susceptibility of
spruce forests to Ips typographus damage will increase five-fold by 2050.
• Policy advice: stricter forest hygiene through tougher legislation, so that
Ips populations are kept to a minimum because of the increased risk.
• Papers for Silva Fennica and for the INTECOL session proceedings in the Journal of Ecology.
Outline of the use case
• Running the Ecological Niche Modeling (ENM) workflow for a large
number of species
– Process data points for hundreds of species (e.g. plants, butterflies, …)
– Use data mostly from GBIF, but also from elsewhere
– Each individual species may have 10^5 data points
– Run openModeller-based ENM for all the data points
– Choose predictive layers from WorldClim and GEOSS sources
• Generate summary statistics that can answer questions such as:
– How many species are increasing? How many are decreasing?
Does the flora/fauna move in any direction? Is the distribution
fragmenting? Is it shrinking? How many populations are
becoming marginalised? (EBVs?)
– Prototype automatic data processing for computing the Essential
Biodiversity Variables (EBVs)
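As one example of such a summary statistic, the sketch below counts how many species are predicted to increase, decrease or stay stable. It is a minimal Python illustration under stated assumptions: range size is the number of cells whose suitability reaches a threshold, and the `threshold=0.5` and `tolerance=0.05` defaults are arbitrary illustrative values, not parameters from the BioVeL workflows.

```python
def range_size(grid, threshold=0.5):
    """Number of cells whose predicted suitability reaches the threshold."""
    return sum(1 for v in grid.values() if v >= threshold)


def summarise_trends(predictions, threshold=0.5, tolerance=0.05):
    """Count species whose predicted range grows, shrinks or stays stable.

    predictions maps species name -> (current_grid, future_grid), where each
    grid maps a cell id to a suitability value in [0, 1]. A relative change
    within +/- tolerance is counted as stable.
    """
    counts = {"increasing": 0, "decreasing": 0, "stable": 0}
    for current, future in predictions.values():
        now = range_size(current, threshold)
        later = range_size(future, threshold)
        if now == 0:
            # No current range: any future range counts as an increase.
            counts["increasing" if later > 0 else "stable"] += 1
            continue
        change = (later - now) / now
        if change > tolerance:
            counts["increasing"] += 1
        elif change < -tolerance:
            counts["decreasing"] += 1
        else:
            counts["stable"] += 1
    return counts
```

Questions about direction of movement or fragmentation would need further per-species measures (for example a centre-of-gravity shift) aggregated in the same way.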
Status of the current BioVeL ENM workflow
• Current openModeller based ENM workflows work at a
smaller scale – focus on one or a few selected species
• Current workflow requires frequent interaction with the user
(many clicks if we simply multiply runs)
• We need a system that is scalable and automated to run ENM
for hundreds of species
• We need a system that can perform a summary analysis
across all the species based on the individual ENM runs
• The 2nd generation BioVeL portal will provide the required
capabilities.
• To be released publicly in January 2014 (currently in beta mode)
Envisaged application structure
[Diagram: selected species and their ENM parameter sets drive GBIF, LTER, EUMON, … queries; each query feeds an ENM workflow that writes an ENM output file; all output files feed a summary analysis.]
• Multiple species may use the same ENM
parameter set (e.g. Mediterranean dryland plants)
• Parameter sets are generated and tested with
another workflow (see next slide)
• Some species may need other
offline data, or private data
(uploaded from the user side).
• One ENM workflow predicts the impact of
environmental changes on the distribution of
one species.
• The portal offers the ENM output files for download
• The summary analysis is performed with an R-based custom tool outside the portal
• EBV production by combining data from different models
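The fan-out in this structure (many species, fewer shared parameter sets, one ENM run per species, outputs collected for a summary analysis) can be sketched in Python. This is not the portal's job scheduler: `ThreadPoolExecutor` stands in for parallel job submission, and `run_enm(species, params)` is a hypothetical placeholder for one ENM workflow run that returns, say, the path of its output file.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_parameter_set(assignments):
    """Invert species -> parameter-set name into parameter-set name -> [species]."""
    groups = defaultdict(list)
    for species, param_set in assignments.items():
        groups[param_set].append(species)
    return dict(groups)


def run_sweep(assignments, parameter_sets, run_enm, max_workers=4):
    """Run one ENM job per species with the parameters assigned to it.

    assignments maps species -> parameter-set name; parameter_sets maps each
    name -> parameters. run_enm(species, params) is a placeholder for one ENM
    workflow run. Returns species -> that run's output.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            species: pool.submit(run_enm, species, parameter_sets[name])
            for species, name in assignments.items()
        }
        # Collect results; any exception raised inside a run is re-raised here.
        return {species: future.result() for species, future in futures.items()}
```

Keeping the parameter sets separate from the species list means a set such as "Mediterranean dryland plants" is defined once and shared by all species assigned to it.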
ENM parameter optimisation workflow
[Diagram: for the selected species, parameter test and selection jobs run over a parameter matrix and produce ENM parameter sets for species.]
• Parameter matrix: possible parameter combinations.
• ENM parameter sets for species: the optimal parameter input for the large
ENM workflow (see previous slide)
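A parameter matrix of "possible parameter combinations" followed by test-and-select jobs is essentially an exhaustive grid search. A minimal Python sketch, assuming parameters are given as `{name: [candidate values]}` and that `evaluate(species, params)` is a hypothetical placeholder for one parameter test job returning a score where higher is better (for example a validation AUC):

```python
from itertools import product


def parameter_matrix(options):
    """Expand {name: [values]} into every combination (the parameter matrix)."""
    names = sorted(options)
    return [dict(zip(names, combo)) for combo in product(*(options[n] for n in names))]


def select_parameters(species, options, evaluate):
    """Test every combination for one species and return the best-scoring one.

    evaluate(species, params) stands in for one parameter test and selection
    job; ties are resolved in favour of the first combination encountered.
    """
    return max(parameter_matrix(options), key=lambda params: evaluate(species, params))
```

The matrix grows multiplicatively with each parameter added, which is one reason to run this optimisation as a separate workflow and re-use its results across species.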
Initialising the data sweep on the portal
Results of the data sweep, ready to be mapped and statistically analysed
Example product: Accumulated invasive potential for ecological groups
20 blacklisted species divided into 4 ecological regimes
[Maps: Zoobenthos, Phytobenthos, Zoopelagial, Phytopelagial]
Example: stack of combined macrozoobenthic invasion heatmaps
Slide by Matthias Obst, BioVeL
QUESTIONS?
www.earthobservations.org/geobon.shtml
www.eubon.eu
www.biovel.eu