Imaging Goals and Components


SERNEC Image/Metadata Database Goals and Components
Steve Baskauf
2009-11-04
Overall goals
• To create a metadata database structure that
is flexible and can handle specimen data,
specimen images, and live plant images. The
database will be designed to easily output to
consumers including Morphbank, GBIF, and a
SERNEC web portal.
• To create contributor interface(s) that will
allow rapid data entry or transfer with
minimal contributor effort.
Conceptual scheme: players
[Diagram: contributors (contributors without institutional infrastructure, and institutional databases via a conversion utility) supply the SERNEC database, which serves the consumers: the SERNEC web portal, Morphbank, and GBIF.]
General Principles
• SERNEC acts as a facilitator.
– Participation in the SERNEC database doesn’t prevent
contributors from doing anything that they were already doing
– SERNEC doesn’t “own” anything
– SERNEC sets minimum standards for participation that will allow
the system to operate and that will ensure the quality of the
metadata served
• Components in the system are “black boxes” that don’t
require participants to understand other parts
• Interactions among components are governed by
generally recognized standards for communication: XML,
LSIDs or LSID-based HTTP URIs, Darwin Core, MRTG
• The system should not collapse if any one component disappears.
Facts About Persistent Identifiers
• Persistent identifiers (universally unique
identifiers, i.e. UUIDs or GUIDs) are coming.
• In a complex system, unique identifiers are
needed to determine whether a resource
exists already (to prevent creation of duplicate
records)
• Use comes with responsibilities:
– Must guarantee uniqueness
– Must guarantee persistence
– Should be actionable (provide metadata to users)
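The duplicate-prevention role of unique identifiers can be sketched as a registry that refuses a second record under an identifier it already holds. This is a minimal illustration, assuming an in-memory dictionary stands in for the database's identifier index; the class name, method names, and the example LSID are hypothetical.

```python
class IdentifierRegistry:
    """Hypothetical in-memory stand-in for the database's identifier index."""

    def __init__(self):
        self._records = {}  # persistent identifier -> metadata dict

    def exists(self, identifier):
        """True if a resource with this identifier is already registered."""
        return identifier in self._records

    def register(self, identifier, metadata):
        """Store a new record; refuse (return False) if the identifier is taken."""
        if self.exists(identifier):
            return False
        self._records[identifier] = dict(metadata)
        return True

    def __len__(self):
        return len(self._records)


registry = IdentifierRegistry()
registry.register("urn:lsid:example.org:ABC:00123", {"scientificName": "Quercus alba"})
# A second submission of the same resource is recognized, not duplicated.
registry.register("urn:lsid:example.org:ABC:00123", {"scientificName": "Quercus alba"})
```

This only works if the identifier really is unique and persistent, which is why those two responsibilities come first.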
LSID (or HTTP URI) assignment
• urn:lsid:<authority>:<namespace>:<objectID> or
http://authority.org/urn:lsid:<authority>:<namespace>:<objectID>
• It appears likely that a resolution service will be provided
centrally by a big player such as GBIF, i.e. GBIF would be the
authority (gbif.org).
• Individual users will be responsible for making sure
that their resources have unique string identifiers.
• SERNEC is probably going to have to be the party
ensuring that the namespace is unique (by negotiation
with the authority)
• Some users may generate their own persistent
identifiers; SERNEC will have to accommodate them.
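The two identifier forms above can be sketched as a pair of string builders plus a parser. This assumes the HTTP form is served from the authority's own domain (the slide's `http://authority.org/` placeholder); the function names and example values (`gbif.org`, `ABC`, `00123`) are hypothetical. The parser accepts the optional trailing revision that the LSID specification allows.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def make_lsid(authority, namespace, object_id):
    """Build urn:lsid:<authority>:<namespace>:<objectID>."""
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


def make_http_uri(authority, namespace, object_id):
    """LSID-based HTTP URI; assumes the resolver lives at the authority's domain."""
    return f"http://{authority}/{make_lsid(authority, namespace, object_id)}"


def parse_lsid(text):
    """Split an LSID into its parts; raise ValueError if it is not an LSID."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


uri = make_http_uri("gbif.org", "ABC", "00123")
parsed = parse_lsid(make_lsid("gbif.org", "ABC", "00123"))
```

Because colons separate the parts, neither the namespace nor the object ID may contain one.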
Strategy for Generating Internal Unique IDs
• Each participating institution MUST have unique IDs within
each of their collections (this is the <objectID>)
• SERNEC keeps a list of institution codes checked with biocol.org
for uniqueness.
• If IDs are unique within the institution, <namespace> is the
institution code
• If IDs are unique within a collection but not within the
institution, <namespace> is institutioncode_collectioncode
• Internal Unique ID = <namespace>:<objectID>
• When an authority is willing to handle our GUIDs, we check to
make sure that each SERNEC namespace is unique within their
authority, then concatenate internal unique ID to authority part
of LSID.
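The namespace strategy above is simple string composition. Here is a minimal sketch of it. The function names, the flag `unique_within_institution`, and the example codes are hypothetical, and the collision check assumes the authority can supply the set of namespaces it already uses.

```python
def make_namespace(institution_code, collection_code=None,
                   unique_within_institution=True):
    """<namespace> per the strategy above."""
    if unique_within_institution:
        return institution_code
    if not collection_code:
        raise ValueError("collection code needed when IDs are only unique per collection")
    return f"{institution_code}_{collection_code}"


def internal_unique_id(namespace, object_id):
    """Internal Unique ID = <namespace>:<objectID>."""
    return f"{namespace}:{object_id}"


def namespace_conflicts(sernec_namespaces, authority_namespaces):
    """SERNEC namespaces already in use at the authority (must be resolved first)."""
    return set(sernec_namespaces) & set(authority_namespaces)


def to_lsid(authority, internal_id):
    """Concatenate the internal unique ID to the authority part of the LSID."""
    return f"urn:lsid:{authority}:{internal_id}"


ns = make_namespace("ABC", "HERB", unique_within_institution=False)
internal = internal_unique_id(ns, "000123")
lsid = to_lsid("gbif.org", internal)
```

Keeping the internal ID separate from the authority means SERNEC can assign IDs now and attach an authority later without renumbering anything.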
System component: the database
SERNEC database
• Structure needs to be able to handle both specimen
and live plant images
• Must keep track of the status of resources
– Are they new with non-redundant IDs?
– Have they been updated?
– Has the data/metadata been passed on to the consumers?
• Should be simple enough or exportable enough to
outlive SERNEC if necessary
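One way to track whether a resource has been updated and whether its current version has reached each consumer is to store a version counter and the version last sent to each consumer. This is a sketch under that assumption, not a prescribed schema; `TrackedResource` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedResource:
    resource_id: str
    metadata: dict
    version: int = 1
    sent_versions: dict = field(default_factory=dict)  # consumer name -> version last sent

    def update(self, new_metadata):
        """Bump the version only when the metadata actually changes."""
        if new_metadata != self.metadata:
            self.metadata = dict(new_metadata)
            self.version += 1

    def is_updated(self):
        return self.version > 1

    def mark_sent(self, consumer):
        self.sent_versions[consumer] = self.version

    def needs_export(self, consumer):
        """True if this consumer has never received, or has an older, version."""
        return self.sent_versions.get(consumer) != self.version


r = TrackedResource("ABC:00123", {"scientificName": "Quercus alba"})
r.mark_sent("Morphbank")
r.update({"scientificName": "Quercus montana"})
```

Tracking per consumer lets one consumer (e.g. GBIF) lag behind another without losing track of what still needs to be sent.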
[Diagram: relationship types. An individual yields a herbarium specimen, which is documented by one or more specimen images; an individual may also be documented directly by one or more live plant images.]
• Relevant occurrence types are specimens & images
• Record fields governed by:
– Darwin Core (general specimen & live-plant image metadata)
– MRTG (image-specific specimen & live-plant metadata)
• Individuals may be represented by a composite of the relationship
types shown if the plant is both imaged directly and collected.
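The relationship types, including the composite case, can be sketched as nested records. This is a minimal sketch assuming one class per node in the diagram; the class names, fields, and the `kind` labels are hypothetical and are not Darwin Core or MRTG terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Image:
    image_id: str
    kind: str  # hypothetical label: "specimen" or "live plant"


@dataclass
class Specimen:
    specimen_id: str
    images: List[Image] = field(default_factory=list)


@dataclass
class Individual:
    individual_id: str
    specimens: List[Specimen] = field(default_factory=list)
    live_images: List[Image] = field(default_factory=list)

    def all_images(self):
        """Live plant images plus images of any specimens taken from the plant."""
        imgs = list(self.live_images)
        for s in self.specimens:
            imgs.extend(s.images)
        return imgs

    def is_composite(self):
        """True if the plant was both imaged directly and collected."""
        return bool(self.specimens) and bool(self.live_images)


plant = Individual(
    "I1",
    specimens=[Specimen("S1", [Image("img-s1", "specimen")])],
    live_images=[Image("img-l1", "live plant"), Image("img-l2", "live plant")],
)
```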
[Diagram: an individual (I) with determination 1 (D1) and determination 2 (D2), linked to taxon 1 (T1) and taxon 2 (T2) respectively, with resources linked to the determinations.]
• Determination structure compatible with annotations
• Determination structure compatible with taxonomic
concept mapping (multiple possible names)
• Determination structure capable of tracking resources
used to make determination
• Determinations linked to standardized taxon units (ITIS
TSNs and/or LSIDs)
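The determination structure can be sketched as a list of dated determinations per individual, each pointing at a taxon ID and the resources consulted. This is a minimal sketch assuming annotations append rather than overwrite and that "current" means most recent; the class names and the IDs `T1`/`T2` (taken from the diagram) are placeholders, not real ITIS TSNs or LSIDs.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Determination:
    taxon_id: str           # standardized taxon unit (ITIS TSN or LSID)
    determiner: str
    determined_on: date
    resources_used: List[str] = field(default_factory=list)  # IDs of images/specimens consulted


@dataclass
class IndividualDeterminations:
    individual_id: str
    determinations: List[Determination] = field(default_factory=list)

    def annotate(self, det):
        """Add an annotation; earlier determinations are kept, not overwritten."""
        self.determinations.append(det)

    def current(self):
        """Most recent determination (assumed rule), or None if there are none."""
        if not self.determinations:
            return None
        return max(self.determinations, key=lambda d: d.determined_on)

    def candidate_taxa(self):
        """Every name applied so far, useful for taxonomic concept mapping."""
        return {d.taxon_id for d in self.determinations}


ind = IndividualDeterminations("I")
ind.annotate(Determination("T1", "collector", date(1998, 5, 2), ["S1"]))
ind.annotate(Determination("T2", "annotator", date(2009, 3, 10), ["img-s1"]))
```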
SERNEC database/consumer relationships
[Diagram: the SERNEC database feeds three consumers: the SERNEC web portal, Morphbank, and GBIF.]
• SERNEC web portal: regional data, end-user
educational resources, facilitation of collaboration
• Morphbank: permanent image repository, provider
to downstream secondary consumers (e.g. EOL)
• GBIF: primary biodiversity database, possible future
resolution service for persistent identifiers
SERNEC database/web portal
• Support Flora of the Southeast or successor
web documentation efforts
• Provide user-friendly mechanisms for
searching for data and images, and organize
“courtesy requests” for non-commercial use of
large numbers of images
• Provide access to data-driven
educational/research applications, e.g. visual
keys, iPhone data apps, teacher lesson plans
SERNEC database/Morphbank
• Capable of generating XML needed by
Morphbank for image submission.
• Query Morphbank services to determine
whether a contributor has already uploaded the
image to Morphbank
• Update Morphbank image records if a
contributor changes metadata.
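The three Morphbank tasks reduce to building an XML record and deciding whether to submit, update, or skip it. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names (`imageRecord`, `imageId`, `specimenId`, `field`) are hypothetical placeholders, **not** the Morphbank submission schema, and the set `already_uploaded` stands in for the result of querying Morphbank's services.

```python
import xml.etree.ElementTree as ET


def image_record_xml(image_id, specimen_id, metadata):
    """Serialize one image record; the element names are placeholders."""
    root = ET.Element("imageRecord")
    ET.SubElement(root, "imageId").text = image_id
    ET.SubElement(root, "specimenId").text = specimen_id
    fields = ET.SubElement(root, "metadata")
    for name, value in sorted(metadata.items()):
        ET.SubElement(fields, "field", name=name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def morphbank_action(image_id, already_uploaded, metadata_changed):
    """Submit new images, update changed ones, and leave the rest alone."""
    if image_id not in already_uploaded:
        return "submit"
    return "update" if metadata_changed else "skip"


xml_text = image_record_xml("ABC:img-00123", "ABC:00123", {"view": "whole specimen"})
```

Checking before submitting is what keeps an image that a contributor has already uploaded directly from appearing twice in Morphbank.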
SERNEC database/GBIF
• Provide primary biodiversity records to GBIF
using the IPT/TAPIR protocols for institutions not
capable of maintaining their own services.
• Assuming at some point in the future GBIF or
another organization provides resolution
services for organizations not capable of
acting as LSID authorities, data from the
SERNEC database would be passed to the
resolution provider to be used for LSID
resolution.
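Whatever publishing tool is used, the records passed to GBIF are expressed in Darwin Core terms. Here is a minimal sketch that writes a flat CSV whose headers are real Darwin Core term names. It is an illustration only. It does not produce a full Darwin Core Archive (no `meta.xml`) and does not implement TAPIR. The choice of terms and the example record are assumptions.

```python
import csv
import io

# A small, assumed subset of Darwin Core terms.
DWC_TERMS = ["occurrenceID", "basisOfRecord", "institutionCode",
             "collectionCode", "catalogNumber", "scientificName"]


def to_dwc_csv(records):
    """Write records as CSV with Darwin Core term names as column headers."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Missing terms become empty cells; unknown keys are dropped.
        writer.writerow({term: rec.get(term, "") for term in DWC_TERMS})
    return buf.getvalue()


out = to_dwc_csv([{
    "occurrenceID": "urn:lsid:gbif.org:ABC:00123",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Quercus alba",
}])
```

Using the LSID as `occurrenceID` is what would let a resolution provider map GBIF records back to SERNEC identifiers.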
SERNEC database/provider relationships
[Diagram: contributors without institutional infrastructure submit directly to the SERNEC database; institutional databases reach it through a conversion utility.]
• Contributors without institutional infrastructure: SERNEC-created
web-based tools would allow users having limited record-keeping
capabilities and IT infrastructure to submit metadata and images
• Contributors with institutional infrastructure: SERNEC would create
customized conversion utilities that would accept database output of
various formats and convert them to a form that can be recognized by
the SERNEC database
SERNEC/Contributors without IT infrastructure
• Users would be responsible for:
– Collecting and organizing their own metadata using
software (e.g. Specify or Excel) capable of simple text (CSV
or tab-delimited) or Excel output.
– Maintaining identifiers (strings) that are unique within
their institution.
• SERNEC-provided software would generate LSIDs and
convert metadata to fit the SERNEC database data model,
as well as facilitate the association of images with
metadata
• It is assumed that contributors will have little or no
interaction with consumers (GBIF, Morphbank)
outside of that facilitated by SERNEC
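The SERNEC-provided import step for small contributors might look like the following sketch. It reads a CSV or tab-delimited export, renames the contributor's columns to SERNEC field names, builds an LSID from each contributor-maintained unique ID, and attaches image files by name. Every convention here is a hypothetical assumption: the `column_map` argument, the `catalogNumber` target field, and the rule that an image file's name starts with the catalog number followed by an optional `_n` suffix.

```python
import csv
import io
from pathlib import PurePath


def import_contributor_file(text, column_map, authority, namespace,
                            image_files, delimiter=","):
    """Convert a contributor's CSV/tab-delimited export into SERNEC-style records.

    column_map maps contributor column names to SERNEC field names and must
    map some column to "catalogNumber" (the contributor's unique ID).
    """
    records = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        rec = {target: (row.get(source) or "").strip()
               for source, target in column_map.items()}
        object_id = rec["catalogNumber"]
        rec["lsid"] = f"urn:lsid:{authority}:{namespace}:{object_id}"
        # Assumed naming rule: "<catalogNumber>_<n>.jpg" belongs to that record.
        rec["images"] = sorted(f for f in image_files
                               if PurePath(f).stem.split("_")[0] == object_id)
        records.append(rec)
    return records


export = "Cat No\tTaxon\n00123\tQuercus alba\n00124\tAcer rubrum\n"
records = import_contributor_file(
    export, {"Cat No": "catalogNumber", "Taxon": "scientificName"},
    "gbif.org", "ABC", ["00123_1.jpg", "00123_2.jpg", "00999_1.jpg"],
    delimiter="\t",
)
```

The filename rule would break for catalog numbers that themselves contain underscores, so a real tool would need a matching rule agreed on with each contributor.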
SERNEC/contributors with IT infrastructure
• Contributors may have their own system for:
– maintaining a complex database for metadata
– generating LSIDs and either maintaining their own
authority or transmitting metadata directly to
another institution acting as the authority (e.g. GBIF)
– managing specimen and live-plant images and
associating them with the appropriate metadata in
their database
• A conversion utility enables the SERNEC database
to “talk” to the contributor’s system and update the
SERNEC database
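A conversion utility has two jobs. It normalizes an institution's field names and works out which records are new or changed since the last exchange. The sketch below assumes the institution's export has already been parsed into dictionaries, and that a per-institution `field_map` (hypothetical) handles renaming. It updates `stored` in place so that the next run compares against the latest data.

```python
def normalize(record, field_map):
    """Rename an institution's field names to SERNEC names; unknown keys pass through."""
    return {field_map.get(k, k): v for k, v in record.items()}


def sync_from_institution(stored, incoming, field_map=None, id_field="catalogNumber"):
    """Classify incoming records as new, updated, or unchanged, then store them."""
    field_map = field_map or {}
    changes = {"new": [], "updated": [], "unchanged": []}
    for raw in incoming:
        rec = normalize(raw, field_map)
        rid = rec[id_field]
        if rid not in stored:
            changes["new"].append(rid)
        elif stored[rid] != rec:
            changes["updated"].append(rid)
        else:
            changes["unchanged"].append(rid)
        stored[rid] = dict(rec)  # keep the latest version for the next comparison
    return changes


stored = {"1": {"catalogNumber": "1", "scientificName": "A"}}
incoming = [{"cat": "1", "scientificName": "B"}, {"cat": "2", "scientificName": "C"}]
changes = sync_from_institution(stored, incoming, field_map={"cat": "catalogNumber"})
```

The "new" and "updated" lists feed directly into the status tracking the database needs for passing changes on to consumers.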
Main points
• All the necessary components (standards, contributors,
consumer organizations) exist or will exist within the next
year.
• SERNEC has established relationships with all of the
required players.
• Players are willing to participate and have a vested interest
in seeing it succeed.
• SERNEC has the human, financial, and IT resources to pull
this off.
• Participants take care of themselves to the maximum
extent possible; SERNEC “helps” smaller institutions
participate on the same level as bigger players.