- Global Biodiversity Information Facility

Transcript - Global Biodiversity Information Facility

Experts Workshop on the IPT, v. 2, Copenhagen, Denmark
The Pathway to the Integrated Publishing Toolkit version 2
Tim Robertson
Systems Architect
Global Biodiversity Information Facility (GBIF)
[email protected]
20 June 2011
Agenda
‣ Why an IPT?
‣ The project history
‣ IPT version 1.0
‣ The rationale for version 2.0
‣ Key functionality of the IPT v2.0
Who has used an IPT?
Who has installed an IPT?
The IPT Vision
‣ A single platform allowing the sharing of
‣ Primary biodiversity data
‣ Species name information
‣ Dataset descriptions (metadata)
The IPT Vision
‣ The ability to register with GBIF
‣ Technical contact information
‣ E.g. Internet URLs
‣ Physical contact information
‣ E.g. telephone details
‣ Institutional affiliations
‣ Accurate attribution
The IPT Vision
‣ Connect databases
‣ Upload text files
‣ Lower the technical threshold for participation
The IPT Vision
‣ Flexibility to accommodate data extensions
‣ Support efficient and simple transfer of content
‣ An open source project
Why an IPT?
‣ Biodiversity provider tools existed
‣ DiGIR
‣ PHP implementation
‣ BioCASe
‣ Python implementation
‣ TAPIR
‣ PHP / .NET implementation
Why an IPT?
‣ Limitations in existing tools
‣ Checklist content lacking
‣ No formally recognized metadata standards
‣ No automatic registration with GBIF
‣ Schemas either simple or very complex
‣ Data transfer sub-optimal (e.g. speed)
‣ No ability to upload data
Who has used the IPT v1.0?
Who had trouble using the IPT v1.0?
IPT v1.0
‣ First released 2009
‣ Java based web application
IPT v1.0: Feature rich
‣ Administration
‣ Users, organisations, extensions, vocabularies
‣ Datasets
‣ Text files, connect a database
‣ Discovery of content
‣ Graphs, metrics, maps, search, browse
‣ Interfaces
‣ DwC Archive, TAPIR, OGC WMS
Consequences of features
‣ Required an embedded database
‣ Limited performance
‣ Required a mapping server
‣ Significant resources (memory)
Community Feedback
‣ Server requirements too high for many
‣ Performance unsatisfactory
‣ Dataset size limitations a barrier
‣ Stability unacceptable
‣ Data loss in 2 instances
‣ Complexity too high for some
The concept was sound!
…rationale for
Who has used the IPT v2.0?
Who has installed the IPT v2.0?
v2.0: Key functionality
‣ User management
‣ Extension management
‣ Institution management
‣ Configuring datasets
‣ Managing dataset state
‣ Interfaces
User management
‣ Administrator
‣ Manager (different trust levels)
‣ With registration permissions
‣ Without registration permissions
‣ General user
Extension management
‣ By communicating with the GBIF registry, automatically discover
‣ Data extensions
‣ Vocabularies
Institution management
‣ No ability to create institutions
‣ By communicating with the GBIF registry, select
‣ Institution hosting the IPT
‣ Institutions that will share datasets in the IPT
Configure Datasets
‣ Author metadata
‣ GBIF Metadata profile
‣ Upload text files
‣ CSV, tab-delimited, etc.
‣ Connect a database
‣ MySQL, Oracle, SQL Server, PostgreSQL, etc.
Configure Datasets
‣ Map content to extensions
‣ Manage user permissions
‣ Shared dataset management
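Mapping content to an extension can be pictured as renaming the columns of an uploaded text file to the terms that the extension defines. The following is a minimal sketch of that idea, not the IPT's own code: the source column names (`species`, `lat`, `lon`, `notes`), the `COLUMN_TO_TERM` dictionary and the `map_rows` helper are hypothetical. The target names are Darwin Core terms.

```python
import csv
import io

# Hypothetical mapping from source column names to Darwin Core terms.
COLUMN_TO_TERM = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def map_rows(text, mapping, delimiter=","):
    """Yield one dict per source row, keyed by the mapped term names.

    Source columns that have no entry in the mapping are dropped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        yield {term: row[col] for col, term in mapping.items() if col in row}


# A tiny illustrative source file with one unmapped column ("notes").
SAMPLE = "species,lat,lon,notes\nPuma concolor,10.5,-84.0,seen at dusk\n"
records = list(map_rows(SAMPLE, COLUMN_TO_TERM))
```

Passing `delimiter="\t"` would handle a tab-delimited file in the same way.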
Configure Datasets
‣ Manage dataset state
‣ Private: visible only to the managers
‣ Public: visible to anybody
‣ Registered: registered on the GBIF network
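The three dataset states can be sketched as a small enumeration with a visibility check. This is an illustration only: the `DatasetState` and `can_view` names are hypothetical, and the sketch assumes that a registered dataset is at least as visible as a public one, which the slide does not state.

```python
from enum import Enum


class DatasetState(Enum):
    PRIVATE = "private"        # visible only to the dataset's managers
    PUBLIC = "public"          # visible to anybody
    REGISTERED = "registered"  # registered on the GBIF network


def can_view(state, is_manager):
    """Return True if a user may see a dataset in the given state.

    Assumption: REGISTERED datasets are viewable by anybody, like PUBLIC.
    """
    if state is DatasetState.PRIVATE:
        return is_manager
    return True
```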
Interfaces
‣ Darwin Core Archive
‣ Ecological Metadata Language
‣ From v2.0.2 onwards, also as a manuscript
Differences:
‣ Reduced functionality (no longer included in v2.0)
‣ TAPIR
‣ Geoserver
‣ Visualisations
‣ Search and browse
Differences:
‣ Reduced server requirements
‣ Memory: 1-2 GB (v1.0), now 256 MB (v2.0)
Differences:
‣ Increased performance
‣ 24 million records
‣ 50 minutes
‣ MySQL
‣ 256MB memory
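Taking the figures above at face value, the implied average throughput can be worked out directly. This is simple arithmetic on the slide's numbers, assuming the 50 minutes covers the whole run of 24 million records.

```python
# Figures from the slide: 24 million records processed in 50 minutes.
records = 24_000_000
minutes = 50

# Average throughput over the whole run, in records per second.
records_per_second = records / (minutes * 60)
print(records_per_second)  # 8000.0
```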
Differences:
‣ No internal database
‣ Increase robustness with simple files
