NDG Vocabulary Server Outline Description
Vocabulary Workshop, RAL, February 25, 2009
NERC DataGrid
Vocabulary Server
Description
Outline
Vocabulary Server:
Data model
Implementation
Content
Usage
Development path
Vocabulary Server Data Model
The fundamental building block of the data model is a term, which is equivalent to a SKOS “concept”
Each term has:
Key: a semantically neutral string that forms the basis of a URN
Label: a human-readable name for the concept
Alternative label: used for abbreviations
Definition: more verbose explanation of the concept
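The four term attributes map naturally onto a small record type. The Python sketch below is illustrative only: the class name `Term`, the field names and the sample values are assumptions made for this example and are not taken from the server's actual schema. The comments note the SKOS properties each field most closely resembles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One vocabulary term, equivalent to a SKOS concept (hypothetical layout)."""
    key: str                  # semantically neutral string; basis of the URN
    label: str                # human-readable name (compare skos:prefLabel)
    alt_label: Optional[str]  # abbreviation, if any (compare skos:altLabel)
    definition: str           # more verbose explanation (compare skos:definition)


# Sample values are invented for illustration.
term = Term(
    key="TERM0001",
    label="Sea water temperature",
    alt_label="Temp",
    definition="The temperature of the water column.",
)
print(term.key, "->", term.label)
```

Keeping the key free of meaning, as the slide describes, lets the label and definition be corrected later without changing the identifier that other systems refer to.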
Vocabulary Server Data Model
The terms are aggregated into lists equivalent to SKOS ‘collections’
Each list is given a semantically neutral identifier (4-byte string)
Lists may be aggregated in ‘Superlists’
Each ‘Superlist’ is given a semantically opaque identifier (bytes 1-3 of the component list identifiers)
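The relationship between list and superlist identifiers can be sketched in a few lines of Python. This assumes the identifiers are ASCII, so that bytes and characters coincide; the function names and the sample identifiers are invented for illustration.

```python
from collections import defaultdict


def superlist_id(list_id: str) -> str:
    """Return the superlist identifier: bytes 1-3 of a 4-byte list identifier."""
    if len(list_id) != 4:
        raise ValueError(f"list identifier must be 4 characters: {list_id!r}")
    return list_id[:3]


def group_by_superlist(list_ids):
    """Group list identifiers under their derived superlist identifiers."""
    groups = defaultdict(list)
    for list_id in list_ids:
        groups[superlist_id(list_id)].append(list_id)
    return dict(groups)


# Identifiers below are invented for illustration.
print(group_by_superlist(["A011", "A012", "B021"]))
```

Because the superlist identifier is derived rather than stored separately, any two lists sharing their first three bytes fall into the same superlist.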
Vocabulary Server Data Model
The ‘Superlist’ concept was inherited from 1980s BODC infrastructure
It has no parallel in any knowledge representation standard
It has the unpleasant side effect of giving terms alternative possible URNs
Its deprecation is becoming a priority
Vocabulary Server Implementation
Server back end is an Oracle relational database
All terms are stored in a single table
List and superlist aggregations implemented as a 2-level indexing table hierarchy
Heavily defended by constraints and triggers
Fully automated timestamps and update ‘fingerprints’
Fully automated audit trails
Fully automated list and superlist versioning
Vocabulary Server Implementation
Term URLs, list URLs and API calls invoke Java applications that submit SQL queries and wrap up the output as XML documents
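The query-then-wrap pattern described here can be sketched as follows. This is not the server's code: it uses Python rather than Java, an in-memory SQLite database stands in for Oracle, and the table names (`term`, `list_member`), column names and XML element names are all hypothetical. The schema follows the outline above only loosely, with one term table and one indexing table for list membership.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE term (term_key TEXT PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE list_member (list_id TEXT, term_key TEXT)")
    conn.executemany(
        "INSERT INTO term VALUES (?, ?)",
        [("K1", "First term"), ("K2", "Second term"), ("K3", "Third term")],
    )
    conn.executemany(
        "INSERT INTO list_member VALUES (?, ?)",
        [("A011", "K1"), ("A011", "K2"), ("B021", "K3")],
    )
    return conn


def list_as_xml(conn: sqlite3.Connection, list_id: str) -> str:
    """Query the terms of one list and wrap the rows in an XML document."""
    root = ET.Element("list", {"id": list_id})
    rows = conn.execute(
        "SELECT t.term_key, t.label FROM term t "
        "JOIN list_member m ON m.term_key = t.term_key "
        "WHERE m.list_id = ? ORDER BY t.term_key",
        (list_id,),
    )
    for term_key, label in rows:
        term = ET.SubElement(root, "term", {"key": term_key})
        ET.SubElement(term, "label").text = label
    return ET.tostring(root, encoding="unicode")


conn = build_demo_db()
print(list_as_xml(conn, "A011"))
```

The list identifier is passed as a bound parameter rather than interpolated into the SQL string, which matters for any service that builds queries from URL components.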
Vocabulary Server Implementation
Why not XML?
Grew out of an integral part of the BODC Oracle infrastructure
Experiments with XML technology, particularly OWL, did not go well
Maintenance tools seem less effective
Navigation difficulties through very large XML documents
Performance issues with lists containing 20000+ terms
XML has benefits such as access to inference engines, so worth persevering
Answer might be to have operational XML builds from a relational back end
Vocabulary Server Content
Server Contents (2009-02-10)
76 public superlists
125 public lists
124701 public terms
80987 public mappings (RDF triples)
Some of the subject areas covered
Parameters
Platforms
Instruments
Coverage terms
Geographic keywords
Vocabulary Server Usage
Server Usage for 2008 (2009 to 2009-02-10 in brackets)
4793116 (607172) total hits
56232 (7134) vocabulary catalogue downloads
78708 (10233) vocabulary term/list downloads
1367 (433) vocabulary map downloads
2479 (73) term searches
1501 (74) term verifications
Rest of total is robots mining semantic links (getRelatedRecordByTerm method)
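The size of that remainder follows from the figures above by subtraction. The short Python calculation below uses only the numbers on this slide; the variable names are arbitrary. In both periods the remainder comes to roughly 97% of total hits.

```python
# Figures copied from the usage slide: (2008, 2009 to 2009-02-10).
periods = ("2008", "2009 to 2009-02-10")
total_hits = (4793116, 607172)
itemised = {
    "catalogue downloads": (56232, 7134),
    "term/list downloads": (78708, 10233),
    "map downloads": (1367, 433),
    "term searches": (2479, 73),
    "term verifications": (1501, 74),
}

remainders = {}
for i, period in enumerate(periods):
    itemised_sum = sum(counts[i] for counts in itemised.values())
    remainders[period] = total_hits[i] - itemised_sum
    share = 100 * remainders[period] / total_hits[i]
    print(f"{period}: {remainders[period]} remaining hits ({share:.1f}% of total)")
```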
VS Development Path
Version 1.1 is the current operational version
Version 1.2 is currently under development
Transparent upgrade (no change to WSDL)
Bug fix and activation of versioned list serving
Additional service API providing list content upgrade functionality to authenticated, authorised external users
VS Development Path
Version 2.0 is currently being designed
Revisit back end design
Governance labelling
Deprecation support
Introduce more XML technology?
Introduce formally-registered, truly permanent URNs
Single RESTful API giving both read and write access through appropriate HTTP methods
Output document revision to SKOS 2008
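The idea of a single API offering read and write access through HTTP methods can be illustrated with a toy dispatcher. Everything here is an assumption made for the example: the V2.0 design is not specified in these slides, so the function name `handle`, the choice of GET for reads and PUT for writes, the status codes returned and the in-memory store are all hypothetical. Authentication and authorisation are omitted.

```python
# Hypothetical in-memory store: {list_id: {term_key: label}}.
STORE = {"A011": {"K1": "First term"}}


def handle(method, list_id, term_key, label=None):
    """Dispatch one request on a term resource; returns (status, body)."""
    terms = STORE.get(list_id, {})
    if method == "GET":  # read access
        if term_key in terms:
            return 200, terms[term_key]
        return 404, None
    if method == "PUT":  # write access: create or replace the term
        created = term_key not in terms
        STORE.setdefault(list_id, {})[term_key] = label
        return (201 if created else 200), label
    return 405, None  # any other method is not allowed in this sketch


print(handle("GET", "A011", "K1"))
print(handle("PUT", "A011", "K2", "Second term"))
print(handle("GET", "A011", "K2"))
```

The same resource address serves both operations, and the HTTP method alone decides whether the call reads or writes.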
VS Development Path
Whatever happens with V2.0 we will not annoy a large and very active user base through change
Both versions will therefore run in parallel until V1.2 calls are no longer logged