Thesaurus-based access to multimedia collections


Reflections from the FACET Project
Doug Tudhope
Hypermedia Research Unit
University of Glamorgan
NKOS Workshop, JCDL 2005
Presentation
• FACET Project
– Faceted Knowledge Organisation Systems (KOS)
– Semantic expansion
– Web Demonstrator
• Reflections / Current work
– Need for standard representations and API
– Pilot Terminology Services
– KOS and Semantic Web
– Cost/Benefit issues
FACET - Faceted Access to Cultural hEritage Terminology
FACET - a collaborative project investigating the potential of
semantic term expansion in retrieval
Aims:
• Integration of thesaurus into the interface
• Semantic term expansion and matching function
taking advantage of facet structure
http://www.comp.glam.ac.uk/~FACET/
FACET Collaborators
• Research Council Funding: EPSRC 3 years
• National Museum of Science and Industry (NMSI):
National Railway Museum and Science Museum Collections Database
• J. Paul Getty Trust
Art and Architecture Thesaurus (AAT)
• Museum Documentation Association (MDA)
Railway Thesaurus
• Canadian Heritage Information Network (CHIN)
Advisors
NRM Collection
examples of free text object descriptor fields
• Chair, London Midland & Scottish Railway, straight wooden back initials
carved on back, green leatherette seat.
• Chair, Railway Clearing House, Curved back with blue leather inset &
blue leather seat. R. C.H. carved on back
• Chair, M.S. & L.R., Straight back, blue leather seat with M.S. & L.R.
carved across back
• Armchair, Pullman, green plush, fringed from Pullman section.
• Carver chair, Oak with oval brocade seat. Prince of Wales crest on
back from Royal Saloon of 1876
• Armchair, Upholstered in blue maquette with curved, buttoned back &
scroll arms. Wooden legs
• Occasional table, Oak with drawer, ornately carved. From Royal Saloon
of 1876
• Set of 4 chairs, High-backed carver chairs upholstered in floral
maquette
• Clock, made by Jno Walker, 250 Regent Street. Metal face/Roman
numerals. Carved wooden square case. 20"x18"x10"
Semantic Term Expansion
Reasoning over thesaurus semantic relationships
allows the system to play an active role
• Ranking of matching items in a result set
• Automatic suggestion of terms to be considered for query
• Query reformulation and ‘more like this’ option
• Augmented Browsing tools – semantic expansion
Underpinning technologies:
• Measures of distance over the semantic index space
• Matching Function for sets of terms
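A distance measure over the thesaurus can be sketched roughly as follows. This is an illustration only, not the FACET engine (which was written in C++): the toy thesaurus fragment, the traversal cost assigned to each relationship type (NT narrower, BT broader, RT related) and the function name `expand` are all assumptions made for the example.

```python
import heapq

# Hypothetical traversal costs per relationship type (assumed values).
COST = {"NT": 1.0, "BT": 1.5, "RT": 2.0}

# Toy thesaurus fragment (invented): term -> list of (relationship, target).
THESAURUS = {
    "chairs": [("NT", "armchairs"), ("NT", "carver chairs"),
               ("BT", "seating furniture")],
    "armchairs": [("BT", "chairs"), ("RT", "sofas")],
    "carver chairs": [("BT", "chairs")],
    "seating furniture": [("NT", "chairs"), ("NT", "sofas")],
    "sofas": [("BT", "seating furniture"), ("RT", "armchairs")],
}


def expand(term, threshold):
    """Return {term: distance} for all terms within `threshold` of `term`.

    Distance is the cheapest sum of traversal costs along any path of
    thesaurus relationships (a shortest-path search).
    """
    best = {term: 0.0}
    queue = [(0.0, term)]
    while queue:
        dist, current = heapq.heappop(queue)
        if dist > best.get(current, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for rel, target in THESAURUS.get(current, []):
            new_dist = dist + COST[rel]
            if new_dist <= threshold and new_dist < best.get(target, float("inf")):
                best[target] = new_dist
                heapq.heappush(queue, (new_dist, target))
    return best


# Terms closest to the starting term come first, which gives a ranking.
ranked = sorted(expand("armchairs", 3.0).items(), key=lambda item: item[1])
```

The threshold bounds how far expansion travels from the starting term, so a smaller value keeps suggestions tightly related while a larger one trades precision for recall.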
FACET Prototype
• SQLServer database: collections DB and Thesaurus
• C++ thesaurus term expansion engine
• Dual thesaurus representations
– database
– in-memory data structure
• Visual Basic and Web client interfaces
– ‘Find Term’ mapping to terms, alternates, scope notes
– Browse hierarchies
– Semantic browsing
– Query Builder
– Ranked results
Faceted Knowledge Organisation Systems
Faceted classifications based on primary division
into fundamental, high-level categories (facets)
Compound descriptors (multi-concept headings) are synthesised
by combination of terms from limited number of fundamental facets
In constructing AAT, adjectival noun phrases very common:
e.g. painted oak furniture
“Rather than enumerate the nearly infinite number of object and
subject descriptions needed by thesaurus users, the AAT decided to
pursue the building blocks of these descriptors in the form of a faceted
vocabulary”
(Guide to Indexing and Cataloging with the Art & Architecture Thesaurus)
Matching Problem
“The major problem lies in developing a system whereby individual parts of
subject headings containing multiple AAT terms are broken apart, individually
exploded hierarchically, and then reintegrated to answer a query with
relevance”
(Toni Petersen, AAT Director)
Query: mahogany, dark yellow, brocading, Edwardian, armchair
Descriptor: oak, light yellow, crests, ovals, brocade, Victorian, Carver chair
Potentially extra / missing / partially and non-matching terms
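One simple way to score the query against the descriptor above is sketched here. It is a simplified stand-in, not the matching function published by the project: each query term takes its best closeness to any descriptor term, and the scores are averaged over the query. The closeness values in the table are invented for illustration; in a real system they would be derived from semantic expansion over the thesaurus.

```python
# Assumed pairwise closeness values in [0, 1] (invented for this sketch).
CLOSENESS = {
    ("mahogany", "oak"): 0.6,
    ("dark yellow", "light yellow"): 0.8,
    ("brocading", "brocade"): 0.7,
    ("Edwardian", "Victorian"): 0.5,
    ("armchair", "Carver chair"): 0.75,
}


def term_closeness(query_term, index_term):
    """Closeness of two terms: 1.0 if identical, table lookup otherwise."""
    if query_term == index_term:
        return 1.0
    return CLOSENESS.get(
        (query_term, index_term),
        CLOSENESS.get((index_term, query_term), 0.0),
    )


def match(query, descriptor):
    """Mean, over query terms, of the best closeness to any descriptor term."""
    if not query:
        return 0.0
    total = 0.0
    for query_term in query:
        total += max(
            (term_closeness(query_term, d) for d in descriptor), default=0.0
        )
    return total / len(query)


query = ["mahogany", "dark yellow", "brocading", "Edwardian", "armchair"]
descriptor = ["oak", "light yellow", "crests", "ovals", "brocade",
              "Victorian", "Carver chair"]
score = match(query, descriptor)
```

With these made-up values the descriptor scores roughly 0.67 despite sharing no exact term with the query. Extra descriptor terms (crests, ovals) are simply ignored in this sketch; a fuller function might penalise them or weight facets differently.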
System Architecture
FACET standalone system
http://www.comp.glam.ac.uk/~facet/webdemo/
FACET Web Demonstrator
• Illustrates thesaurus content and semantic expansion in a fairly realistic
Web prototype application
• Intended more as an exploration of FACET research outcomes as
dynamically generated Web components than as a general interface, but
suggestive of possible interface components
• Does not rely on pre-built static HTML pages: thesaurus content is generated dynamically
http://www.comp.glam.ac.uk/~FACET/webdemo/
FACET Web Demonstrator implementation
• Browser-based interface (ASP application), using a combination
of server-side scripting and compiled components
• Persistence of state information between page requests is a
problematic issue - HTTP protocol is (by design) stateless
• Solution adopted for current demonstrator involved small
'scriptlet' interface components to communicate with server
without causing a browser to refresh the entire page.
• But side effect of introducing some (IE) platform dependence
FACET Web Demonstrator
Some lessons learned
• Results from FACET show potential of faceted KOS for
– Query expansion (ranked results based on semantic closeness)
– Semantic expansion as a browsing tool when wishing to use KOS
behind the scenes
• Web demonstrator first step
– Based on custom API
– KOS and database on same server (but need not be)
– How to generalise these techniques?
→ need for
• Common KOS representations and APIs
for general terminology (KOS) services
KOS integration into DL services
from Hill et al Research Agenda (SigCR Workshop 2002)
Taxonomy of KOS - KOS types linked to DL service protocols
Registries of KOS and KOS-level metadata to represent them
RDF/XML KOS representations - customisable
Core set of relationship types across all KOS
General KOS service protocol
from which protocols for specific types of KOS can be derived
Robust linking model in which DL entities (collections, objects, and
services) can refer to KOS entities (concepts, labels, and
relationships)
Visualization tools that fully use and display the rich semantics
embedded in KOS
Towards Terminology Services
• KOS-based services as elements of applications with some form
of search/indexing component
• Next phase of work looks at common KOS representation
formats and API protocols - making content available via
programmatic interfaces
• Eg SKOS Core (RDF/XML) Schema and SKOS API deliverables
of SWAD-Europe Thesaurus Activity - http://www.w3.org/2001/sw/Europe/reports/thes
• Experiments with XPATH-based KOS interfaces (using XML and
SKOS schemas) promising for relatively small KOS held within
the web browser
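The kind of path-based access described above can be sketched as follows. The original experiments ran inside the web browser; Python's standard `xml.etree.ElementTree` (which supports a limited XPath subset) stands in here. The SKOS fragment, the `example.org` URIs and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{%s}" % NS["rdf"]  # prefix for namespaced attribute names

# Invented SKOS fragment, for illustration only.
SKOS_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/kos/chairs">
    <skos:prefLabel>chairs</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/kos/armchairs"/>
    <skos:narrower rdf:resource="http://example.org/kos/carver-chairs"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/armchairs">
    <skos:prefLabel>armchairs</skos:prefLabel>
    <skos:altLabel>arm chairs</skos:altLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/carver-chairs">
    <skos:prefLabel>carver chairs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
</rdf:RDF>
"""

root = ET.fromstring(SKOS_XML)

# Select every concept with a path expression and index it by its URI.
concepts = {c.get(RDF + "about"): c for c in root.findall("skos:Concept", NS)}


def pref_label(uri):
    """Preferred label of the concept identified by `uri`."""
    return concepts[uri].findtext("skos:prefLabel", namespaces=NS)


def narrower_labels(uri):
    """Preferred labels of the concept's narrower concepts, in file order."""
    links = concepts[uri].findall("skos:narrower", NS)
    return [pref_label(link.get(RDF + "resource")) for link in links]


children = narrower_labels("http://example.org/kos/chairs")
```

Holding the whole document in memory like this is only practical for relatively small KOS, which matches the caveat in the slide.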
Pilot KOS Browser Client Web Service
• SKOS API designed to provide programmatic access to thesauri
and related KOS via the web
– Builds on Zthes, ADL Protocols
• DREFT demonstration web services server based on SKOS API
available(?) at ILRT http://www.w3.org/2001/sw/Europe/reports/thes/dreft/
• Only a subset of SKOS API calls was available at the time of this work,
so we investigated the possibilities with just 2 API calls
• Pilot SKOS API browsing client
demonstrates browsing of an online thesaurus (GEMET - GEneral
Multilingual Environmental Thesaurus) via web service calls
• See also the GEMET thesaurus's own work on a web service API
Pilot SKOS API Web Service Browser
getConcept
getAllConceptRelatives
show semantically connected
concepts but not relationships
Navigation history and
local cache of retrieved concepts
implemented
API needs more work
but is a basis for web services
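A browsing client built on just these two calls, with a local cache and navigation history, might look roughly like the sketch below. The call names mirror getConcept and getAllConceptRelatives, but their signatures and return shapes are assumptions, and `StubService` is a hypothetical in-memory stand-in for the remote web service with invented data.

```python
class StubService:
    """Hypothetical stand-in for the remote service; counts calls made."""

    DATA = {  # invented illustrative data
        "soil": ["soil erosion", "soil pollution"],
        "soil erosion": ["soil"],
        "soil pollution": ["soil"],
    }

    def __init__(self):
        self.calls = 0

    def get_concept(self, concept_id):
        self.calls += 1
        return {"id": concept_id, "label": concept_id}

    def get_all_concept_relatives(self, concept_id):
        # Returns connected concepts only, not the relationship types.
        self.calls += 1
        return list(self.DATA[concept_id])


class Browser:
    """Browsing client with a local cache and a navigation history."""

    def __init__(self, service):
        self.service = service
        self.cache = {}    # concept id -> concept record with relatives
        self.history = []  # visited concept ids, oldest first

    def visit(self, concept_id):
        if concept_id not in self.cache:  # only call the service on a miss
            concept = self.service.get_concept(concept_id)
            concept["relatives"] = self.service.get_all_concept_relatives(
                concept_id
            )
            self.cache[concept_id] = concept
        self.history.append(concept_id)
        return self.cache[concept_id]

    def back(self):
        """Return to the previously visited concept, if there is one."""
        if len(self.history) < 2:
            return None
        self.history.pop()               # drop the current concept
        previous = self.history.pop()    # visit() appends it again
        return self.visit(previous)


service = StubService()
browser = Browser(service)
browser.visit("soil")
browser.visit("soil erosion")
previous = browser.back()  # served from the cache, no service call
```

Each first visit costs two service calls, which illustrates why a composite call returning a concept together with its relatives would suit this usage pattern better.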
Semantic Expansion Service
• API should reflect use patterns and include composite calls in
addition to returning atomic KOS data elements
• Ongoing work - semantic expansion as a service
– as an API protocol element
would yield
• different configurations of KOS interface displays from a single call
• novel interfaces, such as navigation via semantic expansion
• Query expansion for various ranked result query services
• Term suggestion to assist indexing/annotation
• More details:
KOS at your Service: Programmatic Access to Knowledge Organisation
Systems http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Binding/
Future work - KOS and Semantic Web?
• Important to provide a bridge/migration between KOS and
Ontologies. KOS can be an element of higher level ontologies
and schemas and can help leverage them.
Eg utilising SKOS RDF/XML Schemas
Eg DELOS JPA semantic interoperability project
mapping a thesaurus to CRM Upper Ontology
• Ontologies, as formal, precise definitions of relationships,
can be combined with inference rules and automated systems
many useful applications (eg e-Science)
where objects and operations are well defined
but also
• Take advantage of existing KOS in Semantic Web
Some confusion as to how KOS intended to be used
Need for education as to KOS design context/purpose
The ‘ontological ideology’ (Adorno)
• Assumption that allocation of instances to categories is
unproblematic (in everyday life)
– tendency to make invisible the ‘interpretive work’ in assigning
objects to concepts, the bending of categories and evolution of the
meaning of concepts through use
• DL application of concepts to ‘documents’ in indexing/search is
also not unproblematic
– Related via “aboutness” not clear-cut instance relationship
– Indexer - Searcher (and Indexer) variation in concept selection
– Use of results based on probable relevance judgements
KOS (intellectual) usually
• Designed in order to assist generalised retrieval
• Basis of construction is perceived assistance in indexing/
searching/browsing as much as logical properties of attributes
• Recognition that the semantic structure is to some extent
‘conventional’ with different possible cognitive viewpoints
but that users can be assisted to explore a given structure
and make use of it for own purposes
How to apply KOS?
• Domain dependent level of precision in concept use
Important to take into account how applications will process concepts
• Current KOS relationships at a useful level of generality
for many applications (with some specialisation?)
where results are based on probable relevance judgements
Eg Thesaurus pragmatic tool
includes semantics, domain lexicon (UF/ALTs, Scope Notes)
• Cost/benefit issues for KOS applications in granularity of
relationships and degree of formalisation
• Role for knowledge-based interactive tools in semantic web
– old debates on Expert Systems Vs Systems for Experts
NKOS Workshop at ECDL 2005
on related theme to this workshop
• NKOS Workshop –
Mapping Knowledge Organisation Systems:
User-centred Strategies
ECDL 2005, September 22nd, Vienna
see http://www2.db.dk/nkos2005/
• Selected papers from the NKOS workshop
will be considered for forthcoming special issue
of journal New Review of Hypermedia and Multimedia
along with an open call for papers.
References
Binding C., Tudhope D. 2004. KOS at your Service: Programmatic Access to Knowledge
Organisation Systems. JoDI 4(4), http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Binding/
FACET Case Study. DigiCult Thematic Issue 6: Resource Discovery Technologies for the
Heritage Sector, http://www.digicult.info/pages/Themiss.php
FACET website. http://www.comp.glam.ac.uk/~FACET/
FACET Web demonstrator http://www.comp.glam.ac.uk/~FACET/webdemo/
FACET Xpath work http://www.comp.glam.ac.uk/~FACET/formats/
Hill et al. 2002. Integration of Knowledge Organization Systems into Digital Library
Architectures. ASIST SigCR - http://www.lub.lu.se/SEMKOS/docs/Hill_KOSpaper7-2final.doc
Tudhope D., Binding C., Blocks D., Cunliffe D. 2002. Compound Descriptors in Context: A
Matching Function for Classifications and Thesauri. JCDL 2002, 84-93.
Contact Information
Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK
http://www.comp.glam.ac.uk/pages/staff/dstudhope