Accessing Biodiversity Resources in
Computational Environments from Workflow
Applications
J. S. Pahwa, R. J. White, A. C. Jones, M. Burgess, W. A. Gray,
N. J. Fiddian, T. Sutton, P. Brewer, C. Yesson, N. Caithness,
A. Culham, F. A. Bisby, M. Scoble, P. Williams and S. Bhagwat
WORKS 2006, Paris
Overview
• The Biodiversity World (BDW) Project
• The three exemplars chosen for BDW
• BDW Architectural Components
a) Resource Wrappers
b) BiodiversityWorld-GRID Interface (BGI) Communications Layer
c) BDW Datatypes
d) The Metadata Repository (MTR)
• Using BDW for bioclimatic modelling
• Access to computational resources in BDW environment
• Further Work & Conclusions
The BDW System
• A framework for biodiversity problem-solving
• Provides access to widely dispersed, disparate
data sources and analytical tools
• Intended particularly for analysis and modelling of
biodiversity patterns
• Provides access to resources originally
designed for use in isolation
• Resources may be composed into complex
workflows
BDW Exemplars
A. Biodiversity richness analysis and
conservation evaluation
B. Bioclimatic modelling and global climate
change
C. Phylogenetic analysis and biogeography
Biodiversity Richness Analysis and
Conservation Evaluation
Aim:
• analysis of biodiversity richness patterns for a
particular taxon (e.g. group of species) around the
world
The BDW System enables:
• Taxonomic verification using the Species 2000
Catalogue of Life service
• Composition of distribution datasets for the chosen
taxon from various sources around the world
• Use of the WorldMap System to
• visualise the distribution datasets, and
• help identify priority areas for biodiversity conservation
Bioclimatic Modelling and Global Climate
Change
Aim:
• Understand impact of global climate change on
distribution and diversity of plant & animal species
• Identify climatic & ecological conditions under which
a single species lives, extrapolating from known
occurrences
• Hence calculate a potentially wider set of areas
where the species might occur, or predict future
distribution under anticipated climatic conditions
• A bioclimatic modelling workflow example follows
later
Phylogenetic Analysis and Biogeography
Aim:
• Discover ancestral relationships between groups of
organisms using methods of phylogenetic analysis
• Estimate ages of species
• Use estimates of historical climate to produce
plausible estimates of geographical distributions
• Assess historical relationships between changing
climate and development of new species
The BDW System provides (1):
• A flexible and extensible problem solving
environment (PSE)
• Means of
• bringing together heterogeneous, globally distributed,
biodiversity-related resources & analytical tools
• assembling resources into workflows to perform complex
scientific analyses
• Consistent mechanisms to achieve interoperability
of system components
The BDW System provides (2):
• Uniform interfaces for heterogeneous
resources (resource wrappers)
• Mechanism for data packaging & transfer
• Compatibility with the Triana Workflow
System for assembling and executing
workflows
• Web Services-based Grid middleware for
accessing remote computational resources
The BDW System Architecture
BDW architectural components (1)
Resource Wrappers
• Provide consistent interface to local & remote resources, and
standard resource access/invocation mechanism
• Insulate the core BDWorld System from resource
heterogeneity
• Wrap various kinds of resources and analytical tools, and can
be deployed in a Grid/Web Services environment
• Give consistent form to data retrieved by encapsulating them
into BDWorld data types
• Resources wrapped include AVH, GBIF, OpenModeller, etc.
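
The wrapper idea above can be sketched in a few lines. This is a minimal illustration only, not the BDW implementation: the class names, the `invoke` signature, the `getOccurrences` operation and the record layout are all hypothetical, and a local dictionary stands in for a remote data source.

```python
from abc import ABC, abstractmethod


class ResourceWrapper(ABC):
    """Hypothetical uniform interface that every wrapped resource exposes."""

    @abstractmethod
    def operations(self) -> list[str]:
        """Names of the operations this resource supports."""

    @abstractmethod
    def invoke(self, operation: str, **params) -> dict:
        """Run an operation and return the result in a common record form."""


class InMemoryOccurrenceWrapper(ResourceWrapper):
    """Toy wrapper around a local dict; stands in for a remote data source."""

    def __init__(self, records: dict[str, list[tuple[float, float]]]):
        self._records = records

    def operations(self) -> list[str]:
        return ["getOccurrences"]

    def invoke(self, operation: str, **params) -> dict:
        if operation not in self.operations():
            raise ValueError(f"unsupported operation: {operation}")
        points = self._records.get(params["species"], [])
        # Every wrapper returns the same record shape, whatever the source looks like.
        return {"type": "OccurrenceSet", "species": params["species"], "points": points}


# Illustrative coordinates only
wrapper = InMemoryOccurrenceWrapper({"Trifolium patens": [(45.1, 7.7), (47.5, 19.0)]})
result = wrapper.invoke("getOccurrences", species="Trifolium patens")
```

Client code depends only on `ResourceWrapper`, so a different data source can be added by writing another subclass without changing the caller.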
Resource Wrapper Architecture
BDW architectural components (2)
BDW-GRID Interface (BGI) Layer
• Provides standard mechanisms for invoking operations on
heterogeneous resources
• Acts as an integrated mechanism for accessing all resource
wrappers
• Isolates resource wrapper implementation to a separate layer
to enable the use of web services/grid technologies
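
One way to picture a single invocation layer sitting in front of all wrappers is a small dispatch table. The sketch below is an assumption for illustration: `InvocationLayer`, `register` and the `names`/`verify` handler are invented names, and a local function stands in for what would be a web service or Grid call.

```python
from typing import Callable

Handler = Callable[..., object]


class InvocationLayer:
    """Hypothetical single entry point that routes calls to registered wrappers."""

    def __init__(self) -> None:
        self._handlers: dict[tuple[str, str], Handler] = {}

    def register(self, resource: str, operation: str, handler: Handler) -> None:
        self._handlers[(resource, operation)] = handler

    def invoke(self, resource: str, operation: str, **params) -> object:
        try:
            handler = self._handlers[(resource, operation)]
        except KeyError:
            raise LookupError(f"no handler for {resource}.{operation}") from None
        return handler(**params)


layer = InvocationLayer()
# A local function stands in for a remote wrapper; the caller cannot tell the difference.
layer.register("names", "verify", lambda name: name.strip().capitalize())
verified = layer.invoke("names", "verify", name="  trifolium patens ")
```

Because callers only see `invoke`, the transport behind a handler can change without touching workflow code.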
BDW architectural components (3)
BDW Datatypes
• Encapsulate different types of data and sub-datatypes for
transporting data between end points
• Can be transformed into XML representations which can be
easily serialised
• Flexible enough to encapsulate user-defined XML documents
or data in a string representation
• Extensible; new datatypes can be incorporated
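
A datatype that converts to and from XML might look like the sketch below. The `OccurrenceSet` class and its element and attribute names are hypothetical and do not reflect the actual BDW schema; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class OccurrenceSet:
    """Hypothetical datatype: occurrence points for one species."""
    species: str
    points: list[tuple[float, float]]

    def to_xml(self) -> str:
        root = ET.Element("OccurrenceSet", species=self.species)
        for lat, lon in self.points:
            ET.SubElement(root, "Point", lat=str(lat), lon=str(lon))
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "OccurrenceSet":
        root = ET.fromstring(text)
        points = [(float(p.get("lat")), float(p.get("lon"))) for p in root.findall("Point")]
        return cls(species=root.get("species"), points=points)


# Illustrative coordinates only
original = OccurrenceSet("Trifolium patens", [(45.1, 7.7), (47.5, 19.0)])
restored = OccurrenceSet.from_xml(original.to_xml())
```

The round trip through `to_xml` and `from_xml` is what lets the same object travel between end points as text.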
BDW Datatypes
BDW architectural components (4)
BDW Metadata Repository
• A specialised BDWorld resource
• Provides information such as:
• Available resources
• Operations supported by each resource
• Data types used by operations
• Location of resource wrapper
• Stores semantic information in the BDWorld ontology,
to answer questions such as
• ‘Which resources can provide me with species data?’
• ‘Which available operations can accept the outputs from a
specific operation?’
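
The two example questions can be answered by simple lookups over input and output types. The sketch below is a simplification under stated assumptions: the registry contents, service names and type names are invented, and plain string matching replaces the ontology-based reasoning the slide describes.

```python
# Hypothetical metadata records: resource -> operation -> (input type, output type)
REGISTRY = {
    "NameService": {"verifyName": ("String", "SpeciesName")},
    "OccurrenceService": {"getOccurrences": ("SpeciesName", "OccurrenceSet")},
    "ModelService": {"buildModel": ("OccurrenceSet", "Model")},
}


def resources_producing(datatype: str) -> list[str]:
    """Which resources have an operation whose output is `datatype`?"""
    return sorted(
        name for name, ops in REGISTRY.items()
        if any(out == datatype for _, out in ops.values())
    )


def operations_accepting_output_of(resource: str, operation: str) -> list[tuple[str, str]]:
    """Which operations take as input what the given operation outputs?"""
    _, produced = REGISTRY[resource][operation]
    return sorted(
        (name, op)
        for name, ops in REGISTRY.items()
        for op, (accepted, _) in ops.items()
        if accepted == produced
    )


providers = resources_producing("OccurrenceSet")
successors = operations_accepting_output_of("OccurrenceService", "getOccurrences")
```

A workflow editor could use the second query to suggest which task may follow the one just placed.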
Bioclimatic Modelling (1)
• By using the known localities of a species, a
climate preference profile is produced by
cross-referencing with present day climate
data
• This climate preference profile is then used to
locate other areas where such a climate
exists, indicating areas climatically suitable for
the species
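
The two steps above (build a profile from known localities, then search for matching climate) can be illustrated with a simple rectangular climate envelope. This is a deliberately simpler stand-in, not GARP or any algorithm used in BDW; the variable names and all values are invented for illustration.

```python
def climate_profile(occurrence_climates):
    """Min/max range per climate variable across the known localities."""
    variables = occurrence_climates[0].keys()
    return {
        v: (min(c[v] for c in occurrence_climates), max(c[v] for c in occurrence_climates))
        for v in variables
    }


def is_suitable(cell_climate, profile):
    """A cell is suitable when every variable falls inside the profile range."""
    return all(lo <= cell_climate[v] <= hi for v, (lo, hi) in profile.items())


# Illustrative values only: mean temperature (deg C) and annual rainfall (mm)
known_sites = [
    {"temp": 11.0, "rain": 650.0},
    {"temp": 14.5, "rain": 820.0},
    {"temp": 12.2, "rain": 700.0},
]
profile = climate_profile(known_sites)

# Candidate grid cells with their climate; swap in future or past climate to project
grid = {
    "cell_a": {"temp": 13.0, "rain": 750.0},
    "cell_b": {"temp": 22.0, "rain": 300.0},
}
suitable = [name for name, climate in grid.items() if is_suitable(climate, profile)]
```

Replacing the values in `grid` with a different climate layer reuses the same profile to project suitability under another scenario.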
Bioclimatic Modelling (2)
• Using present-day climate:
• assess areas under threat from invasive species,
or
• those that may benefit from the introduction of a
new crop
• Using climate predictions for the future:
• assess possible effects of global climate change
on the distribution of study species
• Using climate predictions for the past:
• assess changes caused by natural factors in the
past
Bioclimatic Modelling Workflow performed by
Triana workflow package in BDW system
Example model output for the clover species Trifolium patens Schreber (a member of the bean
family). The map shows areas (shaded regions across Central and Eastern Europe, South America,
Asia and Australia) predicted to be suitable for the species in the 2050s, using the bioclimatic
modelling algorithm GARP with the Hadley Centre climate model under the SRES A1F climate
scenario.
The Current BDW Architecture:
Enables execution of BDW workflow tasks on
remote nodes, but with limited scope.
- Does not give the user sufficient control and
flexibility.
- Does not distribute user jobs across several
nodes.
- Depends on libraries at the client side.
The new BDW System architecture (1):
• Provides user with access to:
- Biodiversity resources.
- Computational resources.
• Uses the existing mechanism of invoking
operations on remote resources via resource
wrapper web services.
• Also uses Condor middleware to utilise
computational resources and distribute
workload across available nodes.
The new BDW System architecture (2):
• Provides access to the Condor pool via the web
service interface.
• Gives the user the flexibility to choose an available
computational node using the Ganglia cluster
monitoring toolkit.
• Enables matching of workflow task with preferred
resource(s).
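
Matching a task to a node can be pictured as filtering on requirements and then ranking by load. The sketch below is an assumption for illustration only: it calls neither Condor nor Ganglia, and the `Node` fields, the `pick_node` function and the sample figures are hypothetical stand-ins for the metrics a monitoring tool might report.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float        # recent load average, as a monitoring tool might report
    free_mem_mb: int


def pick_node(nodes, min_mem_mb, preferred=None):
    """Return the least-loaded node with enough memory, honouring a user preference."""
    eligible = [n for n in nodes if n.free_mem_mb >= min_mem_mb]
    if preferred is not None:
        for n in eligible:
            if n.name == preferred:
                return n
    # Fall back to the least-loaded eligible node, or None if nothing qualifies.
    return min(eligible, key=lambda n: n.load, default=None)


# Illustrative pool only
pool = [
    Node("node1", load=0.9, free_mem_mb=4096),
    Node("node2", load=0.2, free_mem_mb=1024),
    Node("node3", load=0.4, free_mem_mb=8192),
]
chosen = pick_node(pool, min_mem_mb=2048)
```

Here `node2` is the least loaded but is excluded for lack of memory, so `node3` is chosen unless the user names another eligible node.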
The new BDW System architecture (2):
Conclusions and Further Work
• BDW brings together varied, distributed resources and
analytical tools so that biodiversity researchers can analyse
biodiversity patterns
• Disparate resources can be accessed in the Web Service-enabled
BDW PSE
• The BDW PSE provides uniform access to heterogeneous
resources
• BDW allows linking of tools and resources in a workflow to
automate different activities of an experiment
• Three current exemplar study areas
• The new BDW architecture also provides access to
computational resources.
• Security – Shibboleth/chroot
Acknowledgements
• BDW team
• Species 2000
• OpenModeller Community (including CRIA)
• BBSRC
• …