No Slide Title - Eionet Projects

National Biological Information Infrastructure:
Status of the Biocomplexity Thesaurus
Vivian Hutchison
NBII Metadata Program Coordinator
Ecoterm Meeting: Berlin
April 15, 2005
NBII: What is it?
• A broad, collaborative program to
provide access to data and information
on our Nation’s biological resources.
NBII Node Structure
• Regional
• Thematic
• Infrastructure
Regional Node Structure
• GLOBAL: World Data Centers (WDC), Global Biodiversity Information Facility (GBIF), Clearinghouse Mechanism (CHM)
• REGIONAL: Pacific Biodiversity Information Forum (PBIF), The Inter-American Biodiversity Information Network (IABIN)
• NATIONAL: NBII (US), CBIN (Canada), ERIN (Australia)
• LOCAL: State Heritage Programs, GAP Analysis, County Park Information
NBII partners include a variety of organizations:
• Governments
• Private organizations
• Universities
• Museums
NBII Metadata Standardization
• Dublin Core (Web Resources Catalog)
• Biocomplexity Thesaurus (biological terminology)
• Integrated Taxonomic Information System (ITIS) (taxonomies)
• FGDC Biological Data Profile (data set metadata)
• UDDI & OGC Catalog Services Registry (services)
• Darwin Core (species collections)
Why a Biocomplexity Thesaurus?
• NBII's mission to bring together the nation's biological resources requires a common language to describe those resources
• Thesauri provide a common language that enables:
– More efficient searches
– More precise searches
– More relevant results
• Bottom line: a controlled vocabulary increases data discovery
Creation of the Biocomplexity Thesaurus
• NBII partnered with Cambridge Scientific Abstracts (CSA)
• Five existing thesauri were combined into one Biocomplexity Thesaurus:
– CSA Aquatic Sciences and Fisheries Thesaurus
– CSA Life Sciences Thesaurus
– CSA Pollution Thesaurus
– CSA Sociological Thesaurus
– California Environmental Resources Evaluation System (CERES)/NBII
Biocomplexity Thesaurus: The Basics
• Over 9,500 terms
• Different kinds of relationships:
– 7,200 broader/narrower term pairs (BT/NT)
– 27,000 related terms (RT)
– 12,700 subject categories (SC)
– 500 scope notes (SN)
– 2,200 preferred term pairs (USE/UF)
• All terms in English
• CSA's copyright prohibits redistribution of the complete product
• Available at: http://thesaurus.nbii.gov
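The relationship types listed above can be pictured as a small data structure. The following is a minimal Python sketch, not the actual CSA/NBII schema: the class name `Term`, its field names, the helper `missing_reciprocals`, and the sample terms are all hypothetical. Each relationship is stored as a set, and the helper checks that every broader-term (BT) link has the matching narrower-term (NT) link that a thesaurus normally maintains.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus entry; hypothetical field names, one per relationship type."""
    name: str
    broader: set[str] = field(default_factory=set)     # BT
    narrower: set[str] = field(default_factory=set)    # NT
    related: set[str] = field(default_factory=set)     # RT
    categories: set[str] = field(default_factory=set)  # SC
    scope_note: str = ""                               # SN
    use: str | None = None                             # USE: preferred term to use instead
    used_for: set[str] = field(default_factory=set)    # UF: non-preferred synonyms


def missing_reciprocals(terms: dict[str, Term]) -> list[tuple[str, str]]:
    """List (term, broader) pairs whose reciprocal NT link is absent."""
    missing = []
    for term in terms.values():
        for parent_name in sorted(term.broader):
            parent = terms.get(parent_name)
            if parent is None or term.name not in parent.narrower:
                missing.append((term.name, parent_name))
    return missing


# Illustrative entries only (not taken from the CSA/NBII data).
terms = {
    "Insects": Term("Insects", narrower={"Beetles"}, used_for={"Bugs"}),
    "Beetles": Term("Beetles", broader={"Insects"}),
    "Bugs": Term("Bugs", use="Insects"),
}
print(missing_reciprocals(terms))  # []

# Break the NT side of the pair to show what the check reports.
terms["Insects"].narrower.discard("Beetles")
print(missing_reciprocals(terms))  # [('Beetles', 'Insects')]
```

Keeping USE and UF as separate fields mirrors the paired notation on the slide: the non-preferred entry points to the preferred one, and the preferred entry lists what it is used for.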
Biocomplexity Thesaurus: The Website
Biocomplexity Thesaurus: The Basics
• The look-up tool automatically applies stemming to handle prefixes and suffixes
• Terms can be "rotated": clicking a linked term makes it the focus, so its facets can be examined further
• Technical details:
– Microsoft SQL Server database
– Active Server Pages (ASP) scripts
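The look-up behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the site's ASP code: it uses a naive suffix-stripping stemmer (the `SUFFIXES` list is hypothetical and far cruder than a real stemmer such as Porter's) and treats the query stem as a truncation prefix, so a search for "fishes" also finds "Fishing" and "Fisheries".

```python
# Hypothetical suffix list; longer suffixes come first so "ings" is tried before "s".
SUFFIXES = ("ings", "ing", "es", "ed", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w


def lookup(query: str, terms: list[str]) -> list[str]:
    """Return terms whose stem starts with the query's stem (truncation match)."""
    q = stem(query)
    return [t for t in terms if stem(t).startswith(q)]


TERMS = ["Fish", "Fishing", "Fisheries", "Fisheries management", "Wetlands"]
print(lookup("fishes", TERMS))  # ['Fish', 'Fishing', 'Fisheries', 'Fisheries management']
print(lookup("wetland", TERMS))  # ['Wetlands']
```

A production look-up would more likely precompute stems in the database rather than stemming every term on each request.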
A Need to Share the Thesaurus…
• The thesaurus needs to serve:
– Thesaurus Web site
– Clearinghouse search
– BioBot (Web search)
– Geospatial applications
– Partners, for applications not yet known
Use Case: NBII Metadata Clearinghouse
• Over 17,000 records
• Contributions from 25 partners
• All records contain keywords
How can a thesaurus web service be applied to the Clearinghouse?
NBII Biocomplexity Thesaurus and Web Services: Possible Uses
• In the NBII Clearinghouse:
– A user searches for “bug”
– In the background, the web service extends the search to all synonyms of “bug”
– The user gets more relevant results, including records that use a synonym rather than the search term
OR
– A user searches on a term and gets no hits
– The web service can suggest a broader or narrower term
OR
– Indexing: if a metadata record contains a term, the thesaurus can be used to add that term's broader or narrower terms to the record
– Example: “Smoky Mountains” appears in a record; BT = “Tennessee”
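The three uses above can be sketched against a toy in-memory thesaurus. Everything here is hypothetical: the `SYNONYMS`, `BROADER`, and `NARROWER` dictionaries stand in for calls to the thesaurus web service, the function names are illustrative, and the sample terms (including the "bug" and Smoky Mountains/Tennessee examples from the slide) are not real thesaurus content.

```python
# Toy stand-ins for thesaurus web service responses (hypothetical data).
SYNONYMS = {"bug": {"insect", "insects"}}
BROADER = {"smoky mountains": {"tennessee"}, "beetles": {"insects"}}
NARROWER = {"insects": {"beetles", "ants"}}


def expand_query(term: str) -> set[str]:
    """Use 1: the search term plus all of its synonyms."""
    t = term.lower()
    return {t} | SYNONYMS.get(t, set())


def search(term: str, records: dict[str, set[str]]) -> list[str]:
    """IDs of records whose keywords match any term in the expanded query."""
    wanted = expand_query(term)
    return sorted(rid for rid, keywords in records.items() if wanted & keywords)


def suggest(term: str) -> set[str]:
    """Use 2: broader and narrower terms to offer when a search returns no hits."""
    t = term.lower()
    return BROADER.get(t, set()) | NARROWER.get(t, set())


def enrich_keywords(keywords: set[str]) -> set[str]:
    """Use 3: add every broader term (transitively) to a record's keywords."""
    enriched = {k.lower() for k in keywords}
    pending = list(enriched)
    while pending:
        for parent in BROADER.get(pending.pop(), set()):
            if parent not in enriched:
                enriched.add(parent)
                pending.append(parent)
    return enriched


RECORDS = {"rec-1": {"insects", "wetlands"}, "rec-2": {"soils"}}
print(search("bug", RECORDS))                        # ['rec-1']
print(sorted(suggest("insects")))                    # ['ants', 'beetles']
print(sorted(enrich_keywords({"Smoky Mountains"})))  # ['smoky mountains', 'tennessee']
```

Query expansion runs at search time, while keyword enrichment would run once when a record is indexed; the second approach trades index size for faster searches.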
NBII Biocomplexity Thesaurus and Web Services
• Web services are currently the best method for interoperability
• They are independent of hardware and programming language
• The NBII Thesaurus is now available via a basic web services (SOAP) interface
– Development began with a format from the National Agricultural Library (NAL)
– The Biocomplexity Thesaurus database is used on the back end
– Functionality has been tested
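For readers unfamiliar with SOAP, a request is an XML envelope posted over HTTP. The sketch below builds a SOAP 1.1 envelope with Python's standard library. The operation name `getNarrowerTerms`, the service namespace `urn:example:thesaurus`, and the `term` element are hypothetical placeholders; the slides do not give the actual NAL-derived interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:thesaurus"                     # hypothetical service namespace


def build_request(operation: str, term: str) -> bytes:
    """Build a SOAP 1.1 request envelope for a (hypothetical) thesaurus operation."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("th", SERVICE_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    ET.SubElement(op, f"{{{SERVICE_NS}}}term").text = term
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_request("getNarrowerTerms", "Wetlands")
print(request.decode("utf-8"))
```

In SOAP 1.1 the envelope is sent as the body of an HTTP POST with `Content-Type: text/xml`; because the message is plain XML, a Java (Axis) server and a client in any other language can exchange it.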
NBII Biocomplexity Thesaurus and Web Services: Next Steps
• Recognize the need for international interoperability
• Move from the NAL model to the SKOS (Simple Knowledge Organization System) model for thesauri
• SKOS supports multiple languages and is being developed as an international (W3C) standard
• Once this is complete, NBII can write a single client that queries any web service built on the SKOS model
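The shift from a term-based model to the concept-based SKOS model can be illustrated in plain Python. In SKOS, a concept carries at most one preferred label (`skos:prefLabel`) per language plus any number of alternative labels (`skos:altLabel`), so the USE/UF term pairs of the current thesaurus become labels on a single concept. The `Concept` class, the `find_concept` helper, the URI, and the Spanish and Portuguese labels are illustrative assumptions, not NBII data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-style concept: labels hang off one concept instead of being separate terms."""
    uri: str
    pref_labels: dict[str, str]  # language tag -> skos:prefLabel (one per language)
    alt_labels: dict[str, set[str]] = field(default_factory=dict)  # language tag -> skos:altLabel values
    broader: set[str] = field(default_factory=set)  # URIs of skos:broader concepts

    def label(self, lang: str, fallback: str = "en") -> str:
        """Preferred label in the requested language, else in the fallback language."""
        if lang in self.pref_labels:
            return self.pref_labels[lang]
        return self.pref_labels[fallback]


def find_concept(text: str, concepts: list[Concept]) -> Concept | None:
    """Resolve a string to a concept via any preferred or alternative label, in any language."""
    needle = text.lower()
    for concept in concepts:
        labels = set(concept.pref_labels.values())
        for alts in concept.alt_labels.values():
            labels |= alts
        if needle in {lab.lower() for lab in labels}:
            return concept
    return None


# Hypothetical concept; the URI and the non-English labels are illustrative.
wetlands = Concept(
    uri="http://example.org/concept/wetlands",
    pref_labels={"en": "Wetlands", "es": "Humedales", "pt": "Áreas úmidas"},
    alt_labels={"en": {"Wetland areas"}},
)
print(find_concept("humedales", [wetlands]).label("en"))  # Wetlands
print(wetlands.label("fr"))  # Wetlands (no French label, falls back to English)
```

Because a Spanish-speaking user's query resolves to the same concept as an English one, a single client can work against any SKOS-based service regardless of the language of the query.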
Biocomplexity Thesaurus and SKOS: Implementation Process
• The Biocomplexity Thesaurus database tables are structured differently from the SKOS model
• A mapping from the current database layout to SKOS must be developed
• Migrating the MS SQL thesaurus database into the Resource Description Framework (RDF) with a custom-built utility application
• Able to create an RDF repository that supports multiple languages; retrieval of multilingual data through the SKOS API is in progress
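The mapping step can be sketched with an in-memory SQLite database standing in for the MS SQL Server tables. The two-table layout (`terms`, `relations`), the base URI, the `migrate` and `literal` helpers, and the rule that non-preferred (USE) terms become `skos:altLabel` values are assumptions for illustration; the slides do not describe the real database layout or the custom utility. Output is N-Triples, one RDF statement per line, with a language tag on every label so that labels in other languages can be added later.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/bt/concept/"  # hypothetical base URI for concepts


def literal(text: str, lang: str) -> str:
    """Language-tagged N-Triples literal; escapes backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def migrate(conn: sqlite3.Connection, lang: str = "en") -> list[str]:
    """Map the (hypothetical) terms/relations tables onto SKOS statements."""
    triples = []
    # Preferred terms become skos:Concept resources with a skos:prefLabel.
    for term_id, name in conn.execute(
        "SELECT id, name FROM terms WHERE preferred = 1 ORDER BY id"
    ):
        subject = f"<{BASE}{term_id}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.append(f"{subject} <{SKOS}prefLabel> {literal(name, lang)} .")
    # USE rows (non-preferred -> preferred) become skos:altLabel on the preferred concept.
    for pref_id, alt_name in conn.execute(
        "SELECT r.target, t.name FROM relations r JOIN terms t ON t.id = r.source "
        "WHERE r.kind = 'USE' ORDER BY r.target"
    ):
        triples.append(f"<{BASE}{pref_id}> <{SKOS}altLabel> {literal(alt_name, lang)} .")
    # BT rows (narrower -> broader) map directly onto skos:broader.
    for child, parent in conn.execute(
        "SELECT source, target FROM relations WHERE kind = 'BT' ORDER BY source"
    ):
        triples.append(f"<{BASE}{child}> <{SKOS}broader> <{BASE}{parent}> .")
    return triples


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT, preferred INTEGER);
    CREATE TABLE relations (source INTEGER, target INTEGER, kind TEXT);
    INSERT INTO terms VALUES (1, 'Insects', 1), (2, 'Beetles', 1), (3, 'Bugs', 0);
    INSERT INTO relations VALUES (3, 1, 'USE'), (2, 1, 'BT');
""")
for line in migrate(conn):
    print(line)
```

NT links need no separate rows here because `skos:narrower` is the inverse of `skos:broader`, and a repository or client can derive it.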
CSA/NBII Biocomplexity Thesaurus: Recap
• Current features:
– Term/concept lookup available on the Web
– Recommend new terms/concepts
– Basic web service
• Future enhancements:
– Alphabetical term browser
– Topic trees
– Convert current web service to SKOS
– Multilingual capabilities
• Will be available for use by other organizations (may require free registration)
• Potential new partnerships to expand content in:
– Forestry
– Fire Ecology & Management
Questions? Comments?
Thank you!
Viv Hutchison
NBII Program Office
[email protected]
Technical Questions?
Contact Tim Rhyne
Oak Ridge National Lab
[email protected]
Backup Slides (Tim Rhyne)…
Why a Web Service?
• Today's best technology for interoperability
• Extensible Markup Language (XML)
• Simple Object Access Protocol (SOAP)
• Hardware and programming language independent
Approach
• No off-the-shelf solution found
• Other web service experience: Web Map Server (OGC-compatible), Perl SOAP::Lite
• Reviewed a couple of Java frameworks for web services:
– Axis (Apache): most widely used; selected
– JibxSoap (SourceForge): impressive benchmark performance, but in an early alpha release; will keep an eye on it
• “Googled” for thesaurus web service implementations
Test Setups
• National Agricultural Library (NAL) protocol
– Simple
– Supported the thesaurus Web site's needs
• SKOS
– More complex, with new terminology: a concept approach instead of a term approach
– Difficulty setting up and running examples
– Different back-end implementation: RDF requires an RDF repository beneath the web service
– Sample implementation: Sesame acts as the back-end RDF repository on top of an RDBMS; the web service takes a client request, communicates with Sesame using an RDF query language, and responds to the client
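The request flow in the sample implementation (client request, query to the RDF repository, response) can be sketched as a handler that turns a term lookup into an RDF query. Two assumptions apply: the query is written in SPARQL for readability, although Sesame 1.x, current at the time, was usually queried with SeRQL instead; and `run_query` is a hypothetical callable standing in for the repository connection, faked here by `fake_repository`.

```python
from typing import Callable

SKOS = "http://www.w3.org/2004/02/skos/core#"


def narrower_query(pref_label: str, lang: str = "en") -> str:
    """SPARQL query for labels of concepts narrower than the concept labelled pref_label."""
    escaped = pref_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX skos: <{SKOS}>\n"
        "SELECT ?label WHERE {\n"
        f'  ?c skos:prefLabel "{escaped}"@{lang} .\n'
        "  ?n skos:broader ?c ;\n"
        "     skos:prefLabel ?label .\n"
        f'  FILTER(lang(?label) = "{lang}")\n'
        "}"
    )


def handle_request(term: str, run_query: Callable[[str], list[str]], lang: str = "en") -> list[str]:
    """Web service side: build the query, send it to the repository, return sorted labels."""
    return sorted(run_query(narrower_query(term, lang)))


def fake_repository(query: str) -> list[str]:
    # Stand-in for the RDF repository: returns canned labels for one known concept.
    return ["Beetles", "Ants"] if '"Insects"@en' in query else []


print(handle_request("Insects", fake_repository))  # ['Ants', 'Beetles']
```

Passing the repository in as a callable keeps the web service layer independent of the storage underneath, which matches the slide's point that Sesame can sit on different RDBMS back ends.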
“Are we there yet?”
NAL had limitations:
• International interoperability: we need to share data with others and receive it from them
• Multilingual: identified a near-term need to support Latin America
SKOS II, Return of the API
• Working through examples
• Migrating the MS SQL thesaurus database into RDF with a custom-built utility application
• Currently using MySQL as Sesame's back-end RDBMS; Sesame also supports Microsoft SQL Server and Oracle databases
• Able to create an RDF repository that supports multiple languages; working on retrieval of multilingual data through the SKOS API
Status
• Beta release by end of month
• Production release by end of May
• Possible project to translate the thesaurus into Spanish and Portuguese for Western Hemisphere use
• Will be available for use by other organizations (may require free registration)