National Biological Information
Infrastructure:
Status of the Biocomplexity Thesaurus
Vivian Hutchison
NBII Metadata Program Coordinator
Ecoterm Meeting: Berlin
April 15, 2005
NBII: What is it?
• A broad, collaborative program to
provide access to data and information
on our Nation’s biological resources.
NBII Node Structure
Regional
Thematic
Infrastructure
Regional Node Structure
GLOBAL
REGIONAL
NATIONAL
World Data Centers (WDC),
Global Biodiversity Information Facility (GBIF),
Clearinghouse Mechanism (CHM)
Pacific Biodiversity Information Forum (PBIF),
The Inter-American Biodiversity Information
Network (IABIN)
NBII (US), CBIN (Canada),
ERIN (Australia)
LOCAL
State Heritage Programs,
GAP Analysis,
County Park Information
NBII partners comprise a
variety of organizations
GOVERNMENTS
PRIVATE
UNIVERSITIES
MUSEUMS
NBII Metadata Standardization
• Dublin Core (Web Resources Catalog)
• Biocomplexity Thesaurus (biological terminology)
• Integrated Taxonomic Information System (ITIS)
(taxonomies)
• FGDC Biological Data Profile (data set metadata)
• UDDI & OGC Catalog Services Registry (services)
• Darwin Core (species collections)
Why a Biocomplexity Thesaurus?
NBII mission to bring together the nation’s
biological resources requires a common
language to describe resources
Thesauri provide a common language for:
More efficient searches
More precise searches
More relevant results
Bottom Line: controlled vocabulary increases
data discovery
Creation of the Biocomplexity Thesaurus
NBII partnered with Cambridge
Scientific Abstracts (CSA)
Five (5) existing thesauri combined into
one Biocomplexity Thesaurus:
CSA Aquatic Sciences and Fisheries
Thesaurus
CSA Life Sciences Thesaurus
CSA Pollution Thesaurus
CSA Sociological Thesaurus
California Environmental Resources
Evaluation System (CERES)/NBII
Biocomplexity Thesaurus: The Basics
• Over 9,500 terms
• Different kinds of relationships:
7,200 Broader / narrower term pairs (BT / NT)
27,000 Related terms (RT)
12,700 Subject categories (SC)
500 Scope notes (SN)
2,200 Preferred term pairs (USE / UF)
• All terms in English
• CSA’s copyright prohibits redistribution of complete
product
• Visible at: http://thesaurus.nbii.gov
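The relationship types listed above can be pictured with a small in-memory model. This is only a sketch for illustration: the class names, field names, and sample terms are assumptions, not the layout of the actual NBII database.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (illustrative field names)."""
    label: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF: non-preferred synonyms
    scope_note: str = ""                         # SN


class Thesaurus:
    def __init__(self):
        self.terms = {}   # preferred label -> Term
        self.use = {}     # non-preferred label -> preferred label (USE)

    def add_term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_broader(self, narrow, broad):
        """Record a BT/NT pair; both directions are kept in sync."""
        self.add_term(narrow).broader.add(broad)
        self.add_term(broad).narrower.add(narrow)

    def add_related(self, a, b):
        """Record an RT link, which is symmetric."""
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def add_use(self, non_preferred, preferred):
        """Record a USE/UF pair."""
        self.add_term(preferred).used_for.add(non_preferred)
        self.use[non_preferred] = preferred

    def preferred(self, label):
        """Resolve any label to its preferred term, or None if unknown."""
        if label in self.terms:
            return label
        return self.use.get(label)


# Hypothetical sample entries, not taken from the real thesaurus.
t = Thesaurus()
t.add_broader("Beetles", "Insects")
t.add_related("Insects", "Pest control")
t.add_use("Bugs", "Insects")
print(t.preferred("Bugs"))
print(sorted(t.terms["Insects"].narrower))
```

Storing BT and NT as two views of the same pair keeps the hierarchy consistent, which is one plausible reading of why the slide counts them together as "term pairs".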
Biocomplexity Thesaurus: The Website
Biocomplexity Thesaurus: The Basics
Look-up tool automatically
stems for prefixes and
suffixes
Thesaurus terms can be
rotated by clicking on the
linked term to further
examine facets of that term
Technical Details:
Microsoft SQL server
database
Active Server Page
(ASP) scripts
A Need to Share the Thesaurus…
• Need to share
thesaurus:
– Thesaurus Web site
– Clearinghouse search
– BioBot (Web search)
– Geospatial applications
– Partner use, for
applications unknown
Use Case: NBII Metadata Clearinghouse
• Over 17,000 records
• Contributions from 25
partners
• All records contain keywords
How can a thesaurus web
service be applied to the
Clearinghouse?
NBII Biocomplexity Thesaurus and
Web Services: Possible Uses
In the NBII Clearinghouse:
A user searches on “bug”
The web service extends the search to all synonyms of “bug” – all in
the background
User gets more precision-oriented results
OR
A user searches on a term and gets no hits
Web service can offer a broader or narrower term suggestion
OR
Indexing: if a metadata record contains a term, all broader or narrower
terms could be added to that record by using the thesaurus
Example: “Smoky Mountains” is in record – BT=“Tennessee”
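The three possible uses on this slide (synonym expansion, broader/narrower suggestions, and keyword enrichment at indexing time) can be sketched as follows. The lookup tables and function names are hypothetical stand-ins for calls to the thesaurus web service; only the "bug" and Tennessee examples come from the slide.

```python
# Toy data. In practice these lookups would be answered by the thesaurus
# web service; the tables and names here are assumptions for illustration.
SYNONYMS = {"insects": {"bug", "bugs"}}   # preferred term -> non-preferred labels
BROADER = {                                # term -> broader terms (BT)
    "Smoky Mountains": {"Tennessee"},
    "beetles": {"insects"},
}


def preferred_for(label):
    """Map a user-entered label to its preferred term, if the thesaurus knows it."""
    low = label.lower()
    for pref, alts in SYNONYMS.items():
        if low == pref or low in alts:
            return pref
    return None


def expand_query(label):
    """Use 1: search on the term plus all of its synonyms, in the background."""
    pref = preferred_for(label)
    if pref is None:
        return {label.lower()}
    return {pref} | SYNONYMS[pref]


def suggest(term):
    """Use 2: after a search with no hits, offer broader or narrower terms."""
    narrower = sorted(n for n, bts in BROADER.items() if term in bts)
    return {"broader": sorted(BROADER.get(term, set())), "narrower": narrower}


def enrich_keywords(keywords):
    """Use 3: at indexing time, add each keyword's broader terms to the record."""
    enriched = set(keywords)
    for kw in keywords:
        enriched |= BROADER.get(kw, set())
    return sorted(enriched)


print(sorted(expand_query("bug")))
print(suggest("insects"))
print(enrich_keywords(["Smoky Mountains"]))
```

The sketch adds only one level of broader terms per keyword; whether indexing should walk the full hierarchy is a design choice the slide leaves open.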
NBII Biocomplexity Thesaurus and Web Services
Web services are the best
method for interoperability
Independent of hardware and
programming language
NBII Thesaurus is now
available via a basic web
services (SOAP) interface
Development began with a
format from the National
Agricultural Library (NAL)
Used the biocomplexity
thesaurus database on the
backend
Tested functionality
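To make the hardware and language independence concrete, the sketch below builds and parses a SOAP 1.1 message using only the Python standard library. The SOAP envelope namespace is the standard one; the service namespace, the operation name `getSynonyms`, and the `term` element are hypothetical and do not describe the actual NBII or NAL interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:thesaurus"                     # hypothetical service namespace


def build_request(operation, term):
    """Build a SOAP envelope asking the (hypothetical) operation about one term."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    ET.SubElement(op, f"{{{SERVICE_NS}}}term").text = term
    return ET.tostring(envelope, encoding="unicode")


def parse_terms(message_xml):
    """Collect the text of every <term> element in the service namespace."""
    root = ET.fromstring(message_xml)
    return [el.text for el in root.iter(f"{{{SERVICE_NS}}}term")]


request = build_request("getSynonyms", "bug")
print(request)
print(parse_terms(request))
```

Because the message is plain XML, any client that can build and parse such an envelope can talk to the service, whatever platform it runs on.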
NBII Biocomplexity Thesaurus and Web
Services: Next Steps
Recognize need for international interoperability
Move from NAL model to the SKOS (Simple Knowledge
Organization System) international thesaurus standard model
SKOS has multi-lingual capacity and is a recognized standard
Once completed, NBII can then write one client to query all web
services that are developed in the SKOS standard model
Biocomplexity Thesaurus and SKOS:
Implementation Process
Biocomplexity Thesaurus database tables are
structured differently from the SKOS data model
Need to develop a mapping to SKOS from current
database layout
Migrating a MS SQL structured thesaurus database
into Resource Description Framework (RDF) using a
custom built utility application
Able to create an RDF repository with multilingual
support; working on retrieval of multilingual data
through the SKOS API
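A mapping of the kind described above might look like the following sketch, which turns relational rows into SKOS statements serialized as N-Triples. The row layouts, the base URI, and the function names are assumptions; the actual utility and table design are not shown in the slides. The SKOS properties used (`prefLabel`, `altLabel`, `broader`, `narrower`, `related`, `scopeNote`) are part of the SKOS vocabulary.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"   # hypothetical base URI for concepts

# Thesaurus relationship code -> SKOS property linking two concepts.
REL_TO_SKOS = {"BT": "broader", "NT": "narrower", "RT": "related"}


def concept_uri(label):
    """Derive a concept URI from a preferred label (an illustrative scheme)."""
    return f"<{BASE}{quote(label.replace(' ', '_'))}>"


def literal(text, lang="en"):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def rows_to_ntriples(term_rows, relation_rows):
    """term_rows: (label, scope_note); relation_rows: (subject, code, object)."""
    lines = []
    for label, scope_note in term_rows:
        s = concept_uri(label)
        lines.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{s} <{SKOS}prefLabel> {literal(label)} .")
        if scope_note:
            lines.append(f"{s} <{SKOS}scopeNote> {literal(scope_note)} .")
    for subject, code, obj in relation_rows:
        if code == "UF":      # preferred UF non-preferred
            lines.append(f"{concept_uri(subject)} <{SKOS}altLabel> {literal(obj)} .")
        elif code == "USE":   # non-preferred USE preferred (inverse of UF)
            lines.append(f"{concept_uri(obj)} <{SKOS}altLabel> {literal(subject)} .")
        elif code in REL_TO_SKOS:
            prop = REL_TO_SKOS[code]
            lines.append(f"{concept_uri(subject)} <{SKOS}{prop}> {concept_uri(obj)} .")
    return list(dict.fromkeys(lines))   # drop duplicate statements, keep order


# Hypothetical sample rows.
terms = [("Insects", "Class Insecta"), ("Beetles", "")]
relations = [("Beetles", "BT", "Insects"), ("Insects", "UF", "Bugs"),
             ("Bugs", "USE", "Insects")]
for line in rows_to_ntriples(terms, relations):
    print(line)
```

The main shift is the one the slides call the concept approach: a non-preferred term stops being a row of its own and becomes an `altLabel` on the concept it points to.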
CSA/NBII Biocomplexity Thesaurus: Recap
Current Features:
Term/concept lookup available on web
Recommend new terms/concepts
Basic web service
Future Enhancements:
Alphabetical term browser
Topic trees
Convert current web service to SKOS
Multilingual capabilities
Will be available for use by other organizations (may require
free registration)
Potential new partnerships to expand content in areas of:
Forestry
Fire Ecology & Management
Questions? Comments?
Thank you!
Viv Hutchison
NBII Program Office
[email protected]
Technical Questions?
Contact Tim Rhyne
Oak Ridge National Lab
[email protected]
Backup Tim Rhyne Slides…
Why a Web Service?
• Today’s best technology for interoperability
• Extensible Markup Language (XML)
• Simple Object Access Protocol (SOAP)
• Hardware and programming language
independent
Approach
• Nothing off the shelf found
• Other Web Service experience: Web
Map Server (OGC-compatible), Perl
SOAP-lite
• Reviewed a couple of Java frameworks
for Web services
– AXIS (Apache) – most widely used,
selected this
– JibxSoap (SourceForge) – impressive
benchmark performance but in an early
alpha release. Will keep an eye on this.
• “Googled” for thesaurus web service
implementations
Test Setups
• National Agricultural Library (NAL) protocol
– Simple
– Supported thesaurus Web site needs
• SKOS
– More complex, new terminology - concept
approach instead of term approach
– Difficulty setting up and running examples
– Differences in back-end implementation with
RDF – Need an RDF repository beneath the web
service
– Sample Implementation - Sesame acts as the
backend RDF repository on top of an RDBMS
and the web service takes the client request,
communicates with Sesame (using RDF Query
Language) and responds back to the client
“Are we there yet?”
NAL had limitations
• International
interoperability—we need
to share with and from
others
• Multilingual—identified a
near-term need to support
Latin America
SKOS II, Return of the API
• Working through examples
• Migrating a MS SQL structured
thesaurus database into RDF using a
custom built utility application
• Currently using MySQL as the Sesame
backend RDBMS; Sesame also supports
MS SQL Server and Oracle databases
• Able to create an RDF repository with
multilingual support; working on retrieval of
multilingual data through the SKOS API
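Multilingual retrieval in a concept-based model can be sketched as one concept carrying a label per language, with a fallback when a translation is missing. The concept identifiers, labels, and function names below are illustrative assumptions, not data from the thesaurus or calls from the SKOS API.

```python
# Concept id -> {language tag -> preferred label}. Hypothetical sample data.
LABELS = {
    "c1": {"en": "Insects", "es": "Insectos", "pt": "Insetos"},
    "c2": {"en": "Beetles"},
}


def pref_label(concept_id, lang, fallback="en"):
    """Return the label in the requested language, else in the fallback language."""
    labels = LABELS.get(concept_id, {})
    return labels.get(lang) or labels.get(fallback)


def find_concepts(text, lang):
    """Find concepts whose label in the given language matches, ignoring case."""
    needle = text.casefold()
    return sorted(
        cid for cid, labels in LABELS.items()
        if labels.get(lang, "").casefold() == needle
    )


print(pref_label("c1", "es"))
print(pref_label("c2", "pt"))
print(find_concepts("insetos", "pt"))
```

Falling back to English fits the current state described in the slides, where all terms are in English and translations would arrive incrementally.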
Status
• Beta release by end of month
• Production release by end of
May
• Possible project to translate to
Spanish and Portuguese for
Western Hemisphere use
• Will be available for use by
other organizations (may
require free registration)