Global Biodiversity Information Facility

Biodiversity Data Publishing
May 2011
Core publishing focus
Primary Biodiversity Data (Specimens & Observations, Ecological Data)
- Core data type is an occurrence of a taxon
Taxonomic Catalogues* and Annotated Species Checklists
- Core data type is a taxon
Enriched resource metadata – primarily focused on occurrence and taxon datasets
* To distinguish our efforts from COL: GBIF provides the means, not the ends
Data Publishing Platform for whom?
• Institutional publishers in developed countries. A large proportion of current publishers fall into this category.
• Smaller institutions with less technical capacity. Many are in highly biodiverse regions.
• Small (individual scientist) data holders
• “Disenfranchised” potential publishers who currently don’t recognise GBIF as a publishing option
Data publishing strategy
How
• Consolidate
• Strengthen
• Simplify
• Accelerate
• Extend
Consolidated data standards
Primary Biodiversity Data and Taxonomic Data: Darwin Core
• 172 Terms
• Ratified in 2009
• Text files
• Extensible
Metadata: Ecological Metadata Language (EML)
• Rich dataset descriptions
• GBIF Profile
Darwin Core Archive
Primary Biodiversity Data
Taxonomic Data
Metadata
http://www.someplace.org/data.zip
Darwin Core Archive
• Complete package of data
– One file
– Multiple files
• Text Files…
• Self-documenting
• Intended to be shared/distributed
Archives always have a ‘core’ data file
My_data.txt
The core data file is a text file.
Rule 1: Archives always have a ‘core’ data file
Records are based on taxa – one species per row
Each species appears only once in a species list
OR
Records are based on species occurrences – one occurrence per row
The same species may be recorded many times
The core file contains a column with a
“core ID”
This identifier must be unique for each
row in the core file.
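The uniqueness requirement can be checked before packaging. The following is a minimal sketch, assuming a tab-delimited core file with a header line and the core ID in the first column; the helper name duplicate_core_ids and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(text, id_column=0, delimiter="\t"):
    """Return the core IDs that occur on more than one row (header skipped)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header line
    counts = Counter(row[id_column] for row in reader if row)
    return sorted(core_id for core_id, n in counts.items() if n > 1)


# Hypothetical core file content: ID column first, then a name column.
sample = (
    "id\tscientificName\n"
    "1\tPuma concolor\n"
    "2\tPanthera leo\n"
    "2\tPanthera onca\n"
)
print(duplicate_core_ids(sample))  # prints ['2']
```

An empty result means every row has its own core ID.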
Columns are mapped to corresponding
Darwin Core terms
There are two ways to map columns to Darwin Core terms
Rename columns to match Darwin Core
terms
In this case an archive consists of
a single text file
Taxon.txt
Describe the mappings in a separate file
This file is called “meta.xml”
Columns that do not map to a term
can be included – but are ignored
“Wingspan” is not a Darwin Core term
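To illustrate the renaming approach, the sketch below sorts a header row into columns that match Darwin Core term names and columns that would be ignored. It assumes a small, deliberately incomplete subset of term names; a real check would load the full term list. The helper name split_columns and the sample header are hypothetical.

```python
# A deliberately incomplete subset of Darwin Core term names (assumption:
# a real check would load the complete list of terms).
KNOWN_TERMS = {"taxonID", "scientificName", "kingdom", "family", "genus", "taxonRank"}


def split_columns(header):
    """Split column names into (mapped, ignored) by exact term-name match."""
    mapped = [column for column in header if column in KNOWN_TERMS]
    ignored = [column for column in header if column not in KNOWN_TERMS]
    return mapped, ignored


header = ["taxonID", "scientificName", "family", "Wingspan"]
mapped, ignored = split_columns(header)
print(mapped)   # prints ['taxonID', 'scientificName', 'family']
print(ignored)  # prints ['Wingspan']
```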
Darwin Core Archive (two files)
meta.xml describes the mappings in the
core data file (species.txt)
More on how we make the meta.xml file later…
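For orientation, here is a minimal sketch of what a mapping file for species.txt could look like, built with Python's standard library. The element and attribute names follow the Darwin Core text guidelines as understood here and should be checked against the current specification. The build_meta helper, the column list, and the choice of the first column as the core ID are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"


def build_meta(filename, columns, row_type=DWC + "Taxon"):
    """Build a minimal meta.xml string.

    columns: Darwin Core term names in file order, or None for a column
    that has no matching term. Column 0 is assumed to hold the core ID.
    """
    archive = ET.Element("archive", xmlns="http://rs.tdwg.org/dwc/text/")
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal backslash-t, meaning tab
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = filename
    ET.SubElement(core, "id", index="0")
    for index, term in enumerate(columns):
        if index == 0 or term is None:
            continue  # the ID column and unmapped columns get no <field>
        ET.SubElement(core, "field", index=str(index), term=DWC + term)
    return ET.tostring(archive, encoding="unicode")


# Hypothetical column layout: ID, name, an unmapped "Wingspan" column, family.
print(build_meta("species.txt", ["taxonID", "scientificName", None, "family"]))
```

The unmapped third column simply has no field entry, which matches the rule that such columns may be present but are ignored.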
The core file can be extended with
extension files
Species.txt
Extensions share a common core ID
Common_names.txt
Extensions allow multiple records to be linked to the core record.
Multiple extensions are available
Columns in extensions are mapped to Darwin Core using the
meta.xml file
Many extensions are available
http://rs.gbif.org/extension/
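The link between a core file and an extension file can be pictured as a one-to-many lookup on the shared core ID. The following is a minimal sketch with made-up rows standing in for Species.txt and Common_names.txt; the helper name group_by_core_id is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (core ID, scientific name) and (core ID, common name).
core = [("1", "Puma concolor"), ("2", "Panthera leo")]
common_names = [("1", "cougar"), ("1", "mountain lion"), ("2", "lion")]


def group_by_core_id(extension_rows):
    """Collect extension values under the core ID they refer to."""
    grouped = defaultdict(list)
    for core_id, value in extension_rows:
        grouped[core_id].append(value)
    return grouped


names = group_by_core_id(common_names)
for core_id, scientific_name in core:
    print(scientific_name, names.get(core_id, []))
```

One core record can carry any number of extension records, including none.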
Global Names Architecture
A Darwin Core-based profile using the GBIF network
to share taxonomic information.
Evaluation underway – 16 reviewers / 39 checklists
Rule 3: Document your dataset
A Darwin Core Archive contains a resource metadata document.
Ecological Metadata Language (EML)
For describing datasets – even if you don’t publish data
• Title and Abstract
• Citation and Attribution
• Contact and Authors
• Geographic Scope
• Sampling Methods
• Bibliography
• and more…
Note: XML knowledge not required!
Darwin Core Archive
All files are stored in a single folder
Darwin Core Archive
The folder is zipped.
• Your data
• Data mapping file
• Dataset documentation
This is a Darwin Core Archive
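The zipping step can be done with any zip tool. As one possibility, the sketch below uses Python's standard library and places every file from the folder at the top level of the zip. The helper name zip_archive and the file names (species.txt, meta.xml, eml.xml) are placeholders chosen for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_archive(folder, zip_path):
    """Zip every file in `folder` at the top level of `zip_path`."""
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.iterdir()):
            if path.is_file():
                zf.write(path, arcname=path.name)
    return zip_path


# Demo in a temporary directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my_data"
    folder.mkdir()
    for name in ("species.txt", "meta.xml", "eml.xml"):
        (folder / name).write_text("placeholder\n", encoding="utf-8")
    archive = zip_archive(folder, Path(tmp) / "my_data.zip")
    with zipfile.ZipFile(archive) as zf:
        print(sorted(zf.namelist()))  # prints ['eml.xml', 'meta.xml', 'species.txt']
```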
A Published Darwin Core Archive
http://www.organisation.org/my_data.zip
When the file is stored on a web server it gets a URL.
This URL is shared with others to “publish” your data!
How to create Darwin Core Archives
Integrated Publishing Toolkit
Metadata Authoring
Primary Biodiversity Data
Species Checklists
Data Hosting Centers
Coming soon…
Endangered Wildlife Trust
SABIF EIA Data Center
INBIF EIA Data Center
Publish with spreadsheets
Metadata
Primary Biodiversity data
Species Checklists
Publishing via email
For biologists and database managers
No special software required
Darwin Core Mapping Assistant
Metafile
http://tools.gbif.org/dwca-assistant/
Suite of publishing options
Data Publishing documentation
• Full documentation for all aspects of data publishing
• Living documents
GBIF Schema Repository
Darwin Core Terms
List of Extensions
Vocabularies
http://rs.gbif.org/
A schema repository for developers and trainers