Semantic annotation and search of large virtual heritage collections

Guus Schreiber
Free University Amsterdam
Overview
• A non-technical view on the Semantic Web
• Work on Semantic-Web deployment
– SKOS, RDFa
• Semantic annotation and search in virtual
collections: the E-Culture example
The Web:
resources and links
URL
Web link
URL
The Semantic Web:
typed resources and links
Painting
“Femme au chapeau”
SFMOMA
URL
Dublin Core
ULAN
creator
Henri Matisse
Web link
URL
Principle 1: semantic annotation
• Description of
web objects with
“concepts” from
a shared
vocabulary
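A minimal sketch of the annotation pictured on the preceding slides, written as plain Python tuples of (subject, property, object). The identifiers and prefixes are illustrative placeholders, not the real SFMOMA, Dublin Core, or ULAN URIs, and the helper `objects` is hypothetical.

```python
# Illustrative placeholder identifiers (assumptions, not real URIs).
PAINTING = "sfmoma:femme-au-chapeau"
MATISSE = "ulan:henri-matisse"

# Each statement is a (subject, property, object) triple.
triples = [
    (PAINTING, "rdf:type", "vocab:Painting"),
    (PAINTING, "dc:title", "Femme au chapeau"),
    (PAINTING, "dc:creator", MATISSE),  # typed link to a shared-vocabulary concept
    (MATISSE, "rdfs:label", "Henri Matisse"),
]


def objects(subject, prop):
    """Return all objects of statements with the given subject and property."""
    return [o for s, p, o in triples if s == subject and p == prop]


creator = objects(PAINTING, "dc:creator")[0]
print(objects(creator, "rdfs:label"))
```

Because the creator is a concept identifier rather than a free-text string, any other object annotated with the same identifier is linked to this painting.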
Principle 2: semantic search
• Search for objects
which are linked via
concepts (semantic
link)
• Use the type of
semantic link to
provide meaningful
presentation of the
search results
ape
great ape
orang-utan
orange
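A rough sketch of semantic search over the concepts named on this slide. The object names, the "broader" links, and the choice to treat "orange" as an unrelated concept are assumptions made for illustration; the point is that each hit carries the link path that explains it.

```python
# Hypothetical mini-thesaurus: concept -> broader concept.
BROADER = {
    "great ape": "ape",
    "orang-utan": "great ape",
}

# Hypothetical annotations: object -> concept it is annotated with.
ANNOTATIONS = {
    "photo-17": "orang-utan",
    "drawing-03": "great ape",
    "poster-42": "orange",  # similar string, but no semantic link to "ape"
}


def path_to(concept, target):
    """Return the chain of broader links from concept up to target, or None."""
    path = [concept]
    while path[-1] != target:
        parent = BROADER.get(path[-1])
        if parent is None:
            return None
        path.append(parent)
    return path


def semantic_search(query):
    """Map each matching object to the link path that explains the match."""
    results = {}
    for obj, concept in ANNOTATIONS.items():
        path = path_to(concept, query)
        if path is not None:
            results[obj] = path
    return results


for obj, path in semantic_search("ape").items():
    print(obj, "matched via", " -> broader -> ".join(path))
```

Returning the path, not just the hit, is what allows results to be grouped or explained by the type of link that produced them.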
Principle 3: multiple vocabularies, or:
the myth of a unified vocabulary
• In large virtual collections there are always multiple
vocabularies
– In multiple languages
• Every vocabulary has its own perspective
– You can’t just merge them
• But you can use vocabularies jointly by defining a
limited set of links
– “Vocabulary alignment”
• It is surprising what you can do with just a few links
Example
“Tokugawa”
AAT style/period
Edo (Japanese period)
Tokugawa
SVCN period
Edo
SVCN is a local in-house thesaurus
A link between two thesauri
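One way to read this example in code: a single alignment link lets a query for "Tokugawa" reach objects indexed with the SVCN period "Edo". The concept identifiers and the `expand` helper are hypothetical; the labels come from the slide.

```python
# Labels per vocabulary concept (identifiers are hypothetical).
LABELS = {
    "aat:edo-period": ["Edo (Japanese period)", "Tokugawa"],
    "svcn:edo": ["Edo"],
}

# A single alignment link between the two thesauri, stored in both directions.
ALIGNED = {
    "aat:edo-period": {"svcn:edo"},
    "svcn:edo": {"aat:edo-period"},
}


def expand(term):
    """Return concepts whose label matches term, plus their aligned concepts."""
    hits = {c for c, labels in LABELS.items() if term in labels}
    for concept in list(hits):
        hits |= ALIGNED.get(concept, set())
    return hits


print(sorted(expand("Tokugawa")))
```

Neither thesaurus is changed or merged; the only shared knowledge is the one link.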
RDF/OWL language constructs
• classes and individuals
• subclasses
• properties
• subproperties
• domain/range of properties
• XML Schema datatypes
• equality, inequality
• inverse, transitive, symmetric, functional properties
• property constraints: cardinality, allValuesFrom, someValuesFrom
• conjunction, disjunction, negation of classes
• hasValue, enumerated type
How useful are RDF and OWL?
• RDF: basic level of interoperability
• Some constructs of OWL are key:
– Logical characteristics of properties: symmetric,
transitive, inverse
– Identity: sameAs
• OWL pitfalls
– Bad: if it is written in OWL it is an ontology
– Worse: if it is not in OWL, then it is not an
ontology
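To make the "key constructs" concrete, here is a naive sketch of what declaring a property symmetric, transitive, or inverse buys: extra statements that follow from the stated ones. The `closure` function and all `ex:` names are hypothetical, and `owl:sameAs` is treated only as a symmetric, transitive property; full sameAs semantics (substituting one identifier for the other everywhere) is not implemented.

```python
def closure(triples, transitive=(), symmetric=(), inverse=None):
    """Naive fixpoint: apply the three rules until no new triple appears."""
    inverse = inverse or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # nothing new was derived
            return facts
        facts |= new


facts = closure(
    [
        ("ex:room", "ex:partOf", "ex:wing"),
        ("ex:wing", "ex:partOf", "ex:museum"),
        ("ex:a", "ex:teacherOf", "ex:b"),
        ("ulan:artist-1", "owl:sameAs", "local:artist-1"),
    ],
    transitive={"ex:partOf", "owl:sameAs"},
    symmetric={"owl:sameAs"},
    inverse={"ex:teacherOf": "ex:studentOf"},
)

print(("ex:room", "ex:partOf", "ex:museum") in facts)
```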
W3C Semantic Web Deployment Working
Group
making vocabularies/thesauri/ontologies available
on the Web
• Schema for interoperable RDF/OWL
representation of vocabularies
– SKOS
• Publication guidelines:
– URI management, representation of versions
• Embedding RDF in (X)HTML pages
– RDFa
SKOS:
pattern for thesaurus modeling
• Based on ISO standard
• RDF representation
• Documentation:
http://www.w3.org/TR/swbp-skos-coreguide/
• Base class: SKOS Concept
Multi-lingual labels for concepts
Semantic relation:
broader and narrower
• No subclass semantics assumed!
Indexing a resource with a SKOS
concept
• primarySubject is
defined as a
subproperty of subject
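A small sketch of what the subproperty declaration means for retrieval: a query on the general indexing property also finds statements made with the more specific one. The document and concept identifiers are hypothetical, and the lookup below is a hand-rolled stand-in for RDFS subproperty reasoning, not a real reasoner.

```python
# skos:primarySubject is declared a subproperty of skos:subject.
SUBPROPERTY_OF = {"skos:primarySubject": "skos:subject"}

# Hypothetical indexing statements.
triples = [
    ("ex:doc1", "skos:primarySubject", "ex:concept/painting"),
    ("ex:doc1", "skos:subject", "ex:concept/portrait"),
]


def generalizations(prop):
    """Yield prop and every superproperty above it."""
    while prop is not None:
        yield prop
        prop = SUBPROPERTY_OF.get(prop)


def values(subject, prop):
    """Objects stated with prop or with any of its subproperties."""
    return sorted(
        o for s, p, o in triples
        if s == subject and prop in generalizations(p)
    )


print(values("ex:doc1", "skos:subject"))
print(values("ex:doc1", "skos:primarySubject"))
```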
Adding semantics
• Adding OWL statements
• Interpretations of thesaurus relations such as
narrower as subclass-of are often imprecise
(but can still be useful)
• Learning relations between thesauri is an
important form of additional semantics
– Example: AAT contains styles; ULAN contains
artists, but there is no link
– Availability of this kind of alignment knowledge is
extremely useful
W3C standardization process
• Input: draft specification
• Collect use cases
• Derive requirements
• Create issues list: requirements that cannot be handled by the draft spec
• Propose resolutions for issues
• Continuously: ask for public feedback/comments
• Get consensus on amended spec
• Find two independent implementations for each feature in the spec
Example issue:
relationships between lexical labels
• In draft SKOS spec lexical labels of concepts are
represented as datatype properties
• Use cases require relations between labels, e.g.
“AAT” is an acronym of “Art & Architecture
Thesaurus”
• This is a problem because literals have no URI (so
cannot be subject of an RDF property)
• Possible resolutions:
– Labels/terms as classes
– Relaxing constraints on label property
– …..
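The problem can be sketched with plain tuples. In the first list a label is a literal and can only appear in object position; in the second, each label gets its own identifier, so a statement about a label becomes possible. This is only one possible shape for such a resolution; the `ex:` property names and label identifiers are invented for illustration and are not taken from any SKOS draft.

```python
# With plain literals, a label is only ever the object of a statement.
literal_style = [
    ("ex:aat", "ex:prefLabel", "Art & Architecture Thesaurus"),
    ("ex:aat", "ex:altLabel", "AAT"),
]

# Giving each label its own identifier lets a label be a subject as well.
resource_style = [
    ("ex:aat", "ex:prefLabel", "ex:label/1"),
    ("ex:aat", "ex:altLabel", "ex:label/2"),
    ("ex:label/1", "ex:literalForm", "Art & Architecture Thesaurus"),
    ("ex:label/2", "ex:literalForm", "AAT"),
    ("ex:label/2", "ex:acronymOf", "ex:label/1"),  # label-to-label relation
]


def literal_form(label):
    """Look up the text of a label resource."""
    return next(
        o for s, p, o in resource_style
        if s == label and p == "ex:literalForm"
    )


def acronyms():
    """Return (acronym, expansion) text pairs."""
    return [
        (literal_form(s), literal_form(o))
        for s, p, o in resource_style
        if p == "ex:acronymOf"
    ]


print(acronyms())
```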
Recipes for vocabulary URIs
• Simplified rule:
– Use “hash” variant for vocabularies that are
relatively small and require frequent access
http://www.w3.org/2004/02/skos/core#Concept
– Use “slash” variant for large vocabularies, where
you do not always want the whole vocabulary to
be retrieved
http://xmlns.com/foaf/0.1/Person
• For more information and other recipes, see:
http://www.w3.org/TR/swbp-vocab-pub/
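The reasoning behind the rule can be checked with the Python standard library: the fragment after "#" is kept by the client and is not part of the URL requested from the server, so every term of a hash vocabulary resolves to the same document. The helper name `http_target` is hypothetical.

```python
from urllib.parse import urldefrag


def http_target(uri):
    """Return (URL requested over HTTP, fragment kept on the client side)."""
    url, fragment = urldefrag(uri)
    return url, fragment


# Hash variant: the request goes to the whole vocabulary document.
print(http_target("http://www.w3.org/2004/02/skos/core#Concept"))

# Slash variant: each term has its own requestable URL.
print(http_target("http://xmlns.com/foaf/0.1/Person"))
```

With slash URIs the server sees which term was asked for and can answer with a description of just that term.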
Query for WordNet URI returns
“concise bounded description”
RDFa: embedding RDF metadata in
an (X)HTML file
Regular HTML
HTML with RDFa
Resulting RDF statements
More information
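The slide figures are not preserved in this transcript, so the following is a stand-in rather than the example that was shown. It sketches the idea with a deliberately tiny reader that turns `about` and `property` attributes into statements. The page markup and the `TinyRDFaReader` class are assumptions, and this is not a conformant RDFa processor.

```python
from html.parser import HTMLParser


class TinyRDFaReader(HTMLParser):
    """Collect (subject, property, literal) triples from about/property attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent 'about' value seen
        self.pending = None   # property waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


# Hypothetical page: ordinary HTML plus two kinds of extra attributes.
PAGE = """
<div about="http://example.org/painting/1">
  <h1 property="dc:title">Femme au chapeau</h1>
  <p>Painted by <span property="dc:creator">Henri Matisse</span>.</p>
</div>
"""

reader = TinyRDFaReader()
reader.feed(PAGE)
reader.close()
for triple in reader.triples:
    print(triple)
```

The visible page stays the same for a human reader; the statements are recovered from the attributes alone.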
E-Culture
demonstrator
• Part of large Dutch
knowledge-economy project
MultimediaN
• Partners: VU, CWI, UvA,
DEN, ICN
• People:
– Alia Amin, Lora Aroyo, Mark
van Assem, Victor de Boer,
Lynda Hardman, Michiel
Hildebrand, Laura Hollink,
Marco de Niet, Borys
Omelayenko, Marie-France van
Orsouw, Jos Taekema, Annemiek
Teesing, Anna Tordai, Jan
Wielemaker, Bob Wielinga
• Artchive.com, ICN: Rijksmuseum
Amsterdam, Dutch ethnology
museums (Amsterdam, Leiden),
National Library (Bibliopolis)
Use case: painting style
Find paintings of
a similar style
KLIMT, Gustav
Portrait of Adele Bloch-Bauer I
1907
Oil and gold on canvas
138 x 138 cm
Austrian Gallery, Vienna
How can we find this other
‘Art nouveau’ painting?
MUNCH, Edvard
The Scream
1893
Oil, tempera and pastel on
cardboard
91 x 73.5 cm
National Gallery, Oslo
Issues w.r.t. the use case
• Parse annotation to find matches with thesauri terms
– E.g. match artists to ULAN individuals
• Artist-style links
– AAT contains styles; ULAN contains artists, but there is no
link
• Learn link from corpora
• Derive it from other annotations
– Domain-specific rules/reasoning needed
• see example in SWRL doc
• Painters may have painted in multiple styles
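A sketch of the use case once an artist-style link is available. The artist-to-style table below is exactly the knowledge that is missing between ULAN and AAT; its contents, the identifiers, and the third painting are illustrative assumptions, not data from either thesaurus.

```python
# Hypothetical artist -> styles links (illustrative data only).
ARTIST_STYLES = {
    "ulan:klimt": {"aat:art-nouveau"},
    "ulan:munch": {"aat:art-nouveau", "aat:expressionism"},
    "ulan:artist-x": {"aat:impressionism"},
}

# Painting -> artist, as obtained by matching annotations to ULAN individuals.
PAINTINGS = {
    "Portrait of Adele Bloch-Bauer I": "ulan:klimt",
    "The Scream": "ulan:munch",
    "Untitled landscape": "ulan:artist-x",
}


def styles_of(painting):
    """Styles inferred for a painting from its artist."""
    return ARTIST_STYLES.get(PAINTINGS[painting], set())


def similar_style(painting):
    """Other paintings whose artist shares a style, with the shared styles."""
    mine = styles_of(painting)
    return {
        other: mine & styles_of(other)
        for other in PAINTINGS
        if other != painting and mine & styles_of(other)
    }


print(similar_style("Portrait of Adele Bloch-Bauer I"))
```

The rule assigns every style of an artist to every work by that artist, which is the over-generalization the last bullet warns about; a real system would need domain-specific rules to narrow it down.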
Example enrichment
• Learning relations between art styles in AAT
and artists in ULAN through NLP of
art-historical texts
• But don’t learn things that already exist!
Culture Web demonstrator
http://e-culture.multimedian.nl
16 Nov 2006
Perspectives
• Basic Semantic Web technology is ready for
deployment
– in open knowledge-rich domains
– Important research issues: scalability, vocabulary alignment,
metadata extraction
• Web 2.0 features:
– Involving community experts in annotation
– Personalization, myArt
• Social barriers have to be overcome!
– “open door” policy
– Involvement of general public => issues of “quality”
• Importance of using open standards
– Away from custom-made flashy web sites