E-Culture: Challenging Use Cases for the Semantic Web


Tearing down walls
and
Building bridges
Principles and
pragmatics of a
Semantic Culture Web
Overview
• Virtual collections and Semantic Web
• Semantic collection-search demonstrator
– For cultural heritage objects
• Metadata & vocabulary representation and
enrichment
• Principles for knowledge engineering on the
Web
Acknowledgements
• Part of large Dutch
knowledge-economy
project MultimediaN
• Partners: VU, CWI, UvA,
DEN, ICN
• People:
Alia Amin, Lora Aroyo, Mark van
Assem, Victor de Boer, Lynda
Hardman, Michiel Hildebrand, Laura
Hollink, Marco de Niet, Borys
Omelayenko, Marie-France van
Orsouw, Jacco van Ossenbruggen,
Guus Schreiber, Jos Taekema,
Annemiek Teesing, Anna Tordai, Jan
Wielemaker, Bob Wielinga
• Artchive.com, Rijksmuseum
Amsterdam, Dutch ethnology
musea (Amsterdam, Leiden),
National Library (Bibliopolis)
Hypothesis
• Semantic Web technology is in particular
useful in knowledge-rich domains
or formulated differently
• If we cannot show added value in
knowledge-rich domains, then it may have no
value at all
The Web:
resources and links
Web link
URL
URL
The Semantic Web:
typed resources and links
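[Figure: a resource of type Painting, “Woman with hat” (SFMOMA), linked through the Dublin Core property “creator” to the ULAN entry “Henri Matisse”; underneath, the same two URLs joined by a plain Web link.]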
Principle 1: semantic annotation
• Description of
web objects with
“concepts” from
a shared
vocabulary
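A semantic annotation can be pictured as a small set of subject-predicate-object statements. The sketch below is only an illustration in Python: the `ex:` and `ulan:` identifiers are hypothetical placeholders, not the URIs used by the demonstrator.

```python
# Triples as (subject, predicate, object); all identifiers are hypothetical.
annotation = [
    ("ex:woman-with-hat", "rdf:type", "ex:Painting"),
    ("ex:woman-with-hat", "dc:title", "Woman with hat"),
    ("ex:woman-with-hat", "dc:creator", "ulan:henri-matisse"),
    ("ulan:henri-matisse", "rdfs:label", "Henri Matisse"),
]

def objects(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# The creator is a concept from a shared vocabulary, not a free-text string.
creator = objects(annotation, "ex:woman-with-hat", "dc:creator")[0]
creator_label = objects(annotation, creator, "rdfs:label")[0]
```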
Principle 2: semantic search
• Search for objects
which are linked via
concepts (semantic
link)
• Use the type of
semantic link to
provide meaningful
presentation of the
search results
Query
“Paris”
Paris
PartOf
Montmartre
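A rough sketch of this idea, assuming a toy data set: the query “Paris” also retrieves an object annotated with Montmartre, and the type of link is kept so it can be shown with the result. The `depicts` relation and all identifiers are assumptions made for illustration.

```python
# Hypothetical data: concept labels, one part-of link, and object annotations.
labels = {"ex:paris": "Paris", "ex:montmartre": "Montmartre"}
part_of = {"ex:montmartre": "ex:paris"}          # Montmartre partOf Paris
depicts = {"ex:painting-1": "ex:montmartre",     # object -> depicted place
           "ex:painting-2": "ex:paris"}

def semantic_search(query):
    """Return (object, explanation) pairs for objects linked to the query concept."""
    hits = []
    concepts = [c for c, label in labels.items() if label.lower() == query.lower()]
    for concept in concepts:
        for obj, place in sorted(depicts.items()):
            if place == concept:
                hits.append((obj, "depicts " + labels[concept]))
            elif part_of.get(place) == concept:
                # Reached through a semantic link; keep its type for presentation.
                hits.append((obj, "depicts " + labels[place]
                             + ", part of " + labels[concept]))
    return hits

results = semantic_search("paris")
```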
The myth of a unified vocabulary
• In large virtual collections there are always multiple
vocabularies
– In multiple languages
• Every vocabulary has its own perspective
– You can’t just merge them
• But you can use vocabularies jointly by defining a
limited set of links
– “Vocabulary alignment”
• It is surprising what you can do with just a few links
Principle 3: vocabulary alignment
“Tokugawa”
AAT style/period
Edo (Japanese period)
Tokugawa
AAT is Getty’s
Art & Architecture Thesaurus
SVCN period
Edo
SVCN is local in-house
ethnology thesaurus
A link between two thesauri
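To illustrate how far a single alignment link can go, here is a minimal sketch in which one link between the AAT and SVCN concepts lets a query on either concept return objects annotated with the other. The concept and object identifiers are hypothetical.

```python
# Hypothetical identifiers for the two thesaurus concepts.
alignment = {("aat:edo-period", "svcn:edo")}     # one cross-vocabulary link

annotations = {
    "ex:print-1": "aat:edo-period",   # annotated with the AAT concept
    "ex:mask-7": "svcn:edo",          # annotated with the SVCN concept
    "ex:vase-3": "aat:ming",
}

def equivalents(concept):
    """Return the concept plus everything aligned with it, in either direction."""
    found = {concept}
    for a, b in alignment:
        if a == concept:
            found.add(b)
        if b == concept:
            found.add(a)
    return found

def find_objects(concept):
    """Objects annotated with the concept or with an aligned concept."""
    targets = equivalents(concept)
    return sorted(o for o, c in annotations.items() if c in targets)

objs = find_objects("aat:edo-period")
```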
Levels of interoperability
• Syntactic interoperability
– using data formats that you can share
– XML family is the preferred option
• Semantic interoperability
– How to share meaning / concepts
– Technology for finding and representing semantic
links
Distributed vs. centralized collection
data
• Minimal requirement: collection object has
image URI
• Preference for external metadata, accessed
through protocol such as OAI
• In practice, external metadata access is still
cumbersome
http://e-culture.multimedian.nl/demo/search
Search strategies
• Basic search: keyword-oriented
• Advanced search:
– Tweaking default search parameters
– Time-related queries
• Faceted search
• Relation search
– How are two URIs related?
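One plausible way to answer “how are two URIs related?” is a shortest-path search over the graph. The sketch below uses breadth-first search on a tiny hand-made graph; it is an assumption about how such a search could work, not a description of the demonstrator, and the edges are illustrative.

```python
from collections import deque

# Hypothetical graph as (node, link, node); links are followed in both directions.
edges = [
    ("ulan:derain", "related-to", "aat:fauve"),
    ("ulan:matisse", "related-to", "aat:fauve"),
    ("ex:woman-with-hat", "creator", "ulan:matisse"),
]

def find_path(start, goal):
    """Breadth-first search; returns the shortest chain of links, or None."""
    neighbours = {}
    for s, p, o in edges:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((p, s))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for link, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [link, nxt]))
    return None

path = find_path("ex:woman-with-hat", "ulan:derain")
```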
Keyword search with semantic
clustering
1. Btree of literals plus Porter stem and
metaphone index
2. Find resources with matching labels
• Default resources are “Work”s
3. Find related resources by one-way graph
traversal
• owl:inverseOf is used
• Threshold used for constraining search
4. Cluster results (group instances)
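The four steps above can be sketched roughly as follows. This is a simplified illustration: lowercasing stands in for the Porter stem and metaphone index, a plain dictionary stands in for the B-tree, the threshold is modelled as a maximum number of traversal steps, and all data are hypothetical.

```python
from collections import defaultdict

# Hypothetical graph: (subject, predicate, object).
triples = [
    ("ex:work-1", "dc:creator", "ulan:matisse"),
    ("ex:work-2", "dc:subject", "ex:matisse-portrait"),
    ("ex:work-3", "dc:creator", "ulan:derain"),
]
labels = {"ulan:matisse": "Henri Matisse",
          "ex:matisse-portrait": "Portrait of Matisse",
          "ulan:derain": "André Derain"}
works = {"ex:work-1", "ex:work-2", "ex:work-3"}

def normalise(token):
    """Stand-in for the stemming/metaphone step: lowercase only."""
    return token.lower()

# Step 1: index the tokens of all labels.
index = defaultdict(set)
for resource, label in labels.items():
    for token in label.split():
        index[normalise(token)].add(resource)

def search(keyword, max_steps=1):
    # Step 2: resources whose label matches the keyword.
    matched = index.get(normalise(keyword), set())
    # Step 3: walk one way (object to subject) towards works, at most max_steps.
    clusters = defaultdict(list)
    frontier = [(m, ()) for m in sorted(matched)]
    for _ in range(max_steps):
        next_frontier = []
        for node, path in frontier:
            for s, p, o in triples:
                if o == node:
                    new_path = (p,) + path
                    if s in works:
                        # Step 4: cluster works by the property path that led to them.
                        clusters[new_path].append(s)
                    next_frontier.append((s, new_path))
        frontier = next_frontier
    return dict(clusters)

clusters = search("Matisse")
```

Clustering on the property path separates works created by Matisse from works that merely have Matisse as a subject, which is the kind of grouping the result page presents.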
Search: WordNet patterns that increase
recall without sacrificing precision
Term disambiguation is key issue in
semantic search
• Post-query
– Sort search results based on different meanings
of the search term
– Mimics Google-type search
• Pre-query
– Ask user to disambiguate by displaying list of
possible meanings
– Interface is more complex, but more search
functionality can be offered
Faceted search
• Use Dublin Core scheme to formulate
complex queries
• Navigate through relevant metadata
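A minimal sketch of faceted navigation over Dublin Core style records: count the values of a facet, select one value, and recount the remaining facets. The records are invented for illustration.

```python
from collections import Counter

# Hypothetical Dublin Core style records.
records = [
    {"id": "ex:work-1", "dc:creator": "Henri Matisse", "dc:format": "oil on canvas"},
    {"id": "ex:work-2", "dc:creator": "Henri Matisse", "dc:format": "lithograph"},
    {"id": "ex:work-3", "dc:creator": "André Derain", "dc:format": "oil on canvas"},
]

def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items if facet in item)

def refine(items, facet, value):
    """Keep only the items whose facet has the selected value."""
    return [item for item in items if item.get(facet) == value]

by_creator = facet_counts(records, "dc:creator")
selected = refine(records, "dc:creator", "Henri Matisse")
remaining_formats = facet_counts(selected, "dc:format")
```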
What do you need to do to make
your collection part of a Semantic
Culture Web?
Four activities
From metadata to
semantic metadata
1. Make vocabulary interoperable
2. Align metadata schema
3. Enrich metadata
4. Align vocabulary
Activity 1: syntactic vocabulary
interoperability
• Making vocabularies available in the Web
standard RDF
• Many organizations already do this
• W3C provides the SKOS template to make
this almost straightforward
• Effort required: at most a few days
Multi-lingual labels for concepts
Semantic relation:
broader and narrower
• No subclass semantics assumed!
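As a rough illustration of what a SKOS representation contains, the sketch below writes one concept with multi-lingual labels and a broader link as Turtle-style statements. The concept URI, the Dutch label, and the parent concept are hypothetical, and prefix declarations are left out for brevity.

```python
# A concept with multilingual labels and a broader link; URIs are hypothetical.
concept = {
    "uri": "ex:edo",
    "prefLabel": {"en": "Edo", "nl": "Edo-periode"},
    "broader": ["ex:japanese-periods"],
}

def to_turtle(c):
    """Serialise the concept as SKOS statements in Turtle syntax."""
    lines = [c["uri"] + " a skos:Concept ;"]
    for lang, text in sorted(c["prefLabel"].items()):
        lines.append('    skos:prefLabel "%s"@%s ;' % (text, lang))
    for parent in c["broader"]:
        # skos:broader is a thesaurus relation; it does not imply rdfs:subClassOf.
        lines.append("    skos:broader %s ;" % parent)
    lines[-1] = lines[-1][:-1] + "."   # close the last statement
    return "\n".join(lines)

turtle = to_turtle(concept)
```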
Activity 2: aligning the metadata
schema
• Specify your collection metadata scheme as
a specialization of Dublin Core
• With RDF/OWL this is easy/trivial!
• Cf. DC Application Profiles
Aligning VRA with Dublin Core
• VRA is specialization of Dublin Core for
visual resources
• VRA properties “material.medium” and
“material.support” are specializations of
Dublin Core property “format”
vra:material.medium rdfs:subPropertyOf
dc:format .
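The practical effect of such a statement is that a query on the Dublin Core property also finds values recorded under the more specific VRA properties. A minimal sketch, assuming a hypothetical work and handling only direct sub-properties:

```python
# Schema and data as triples; the work URI and its values are hypothetical.
schema = [
    ("vra:material.medium", "rdfs:subPropertyOf", "dc:format"),
    ("vra:material.support", "rdfs:subPropertyOf", "dc:format"),
]
data = [
    ("ex:work-1", "vra:material.medium", "oil"),
    ("ex:work-1", "vra:material.support", "canvas"),
]

def values(subject, prop):
    """Values of prop for subject, including values of its direct sub-properties."""
    props = {prop} | {s for s, p, o in schema
                      if p == "rdfs:subPropertyOf" and o == prop}
    return sorted(o for s, p, o in data if s == subject and p in props)

formats = values("ex:work-1", "dc:format")
```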
Activity 3: enriching the metadata
• Extracting additional concepts from an
annotation
– Matching the string “Paris” to a vocabulary term
• Information-extraction techniques exist (and
continue to be developed)
• Effort required can be up to a few weeks
– The more concepts, the better, but no need to be
perfect!
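The simplest form of enrichment is matching words in a textual annotation against vocabulary labels. The sketch below shows only that naive string match on an invented vocabulary and sentence; real information extraction would also need disambiguation.

```python
import re

# Hypothetical vocabulary: lowercase label -> concept identifier.
vocabulary = {"paris": "ex:paris", "montmartre": "ex:montmartre"}

def extract_concepts(text):
    """Return vocabulary concepts whose label occurs as a word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return sorted({vocabulary[t] for t in tokens if t in vocabulary})

concepts = extract_concepts("View of Montmartre, Paris, at dusk")
```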
Example textual annotation
Resulting semantic annotation
(rendered as HTML with RDFa)
RDFa: embedding RDF in (X)HTML
Regular HTML
HTML with RDFa
Resulting RDF statements
Activity 4: aligning the vocabulary
• Find semantic links between vocabulary terms
– Derain (ULAN) related-to Fauve (AAT)
• Automatic techniques exist, but performance varies
• Often combination of automatic and manual
alignment
• Effort strongly dependent on vocabularies
– But “a little semantics goes a long way” (Hendler)
Learning alignments
• Learning relations between art styles in AAT
and artists in ULAN through NLP of art
historic texts
– “Who are Impressionist painters?”
Extracting additional knowledge
from scope notes
Principles for
knowledge engineering
on the Web
Principle 1: Be modest!
• Ontology engineers should refrain from
developing their own idiosyncratic ontologies
• Instead, they should make the existing rich
vocabularies, thesauri and databases
available in web format
• Initially, only add the originally intended
semantics
Principle 2: Think large!
Doug Lenat
"Once you have a truly massive amount of
information integrated as knowledge, then the
human-software system will be superhuman, in
the same sense that mankind with writing is
superhuman compared to mankind before
writing."
Principle 3: Develop and use
patterns!
• Don’t try to be (too) creative
• Ontology engineering should not be an art
but a discipline
• Patterns play a key role in methodology for
ontology engineering
• See for example patterns developed by the
W3C Semantic Web Best Practices group
http://www.w3.org/2001/sw/BestPractices/
• SKOS can also be considered a pattern
Principle 4: Don’t recreate, but enrich
and align
• Techniques:
– Learning ontology relations/mappings
– Semantic analysis, e.g. OntoClean
– Processing of scope notes in thesauri
Principle 5: Beware of ontological
over-commitment!
Principle 6: Specifying a data model
in OWL does not make it an ontology!
• Papers about your own idiosyncratic
“university ontology” should be rejected at
SW conferences
• The quality of an ontology does not depend on
the number of OWL constructs used
Principle 7: Required level of formal
semantics depends on the domain!
• In our semantic search we use three OWL
constructs:
– owl:sameAs, owl:TransitiveProperty,
owl:SymmetricProperty
• But cultural heritage is very different from
medicine and bioinformatics
– Don’t over-generalize on requirements for e.g.
OWL
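To give a feel for how little reasoning these three constructs demand, the sketch below computes the closure of a relation under symmetry and transitivity. It is a naive fixed-point loop over hypothetical facts, not the demonstrator's implementation; `owl:sameAs` is treated here as symmetric and transitive, which follows from its OWL semantics.

```python
def closure(pairs, symmetric=False, transitive=False):
    """Close a set of (a, b) pairs under symmetry and/or transitivity."""
    result = set(pairs)
    changed = True
    while changed:
        changed = False
        new = set()
        if symmetric:
            new |= {(b, a) for a, b in result}
        if transitive:
            new |= {(a, d) for a, b in result for c, d in result if b == c}
        if not new <= result:
            result |= new
            changed = True
    return result

# Hypothetical facts: a transitive part-of relation and one sameAs link.
part_of = closure({("ex:montmartre", "ex:paris"), ("ex:paris", "ex:france")},
                  transitive=True)
same_as = closure({("aat:edo", "svcn:edo")}, symmetric=True, transitive=True)
```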
Perspectives
• Basic Semantic Web technology is ready for
deployment
• Research themes:
– Scalability, vocabulary alignment, metadata
extraction
• Web 2.0 facilities fit well:
– Involving community experts in annotation
– Personalization
• Social barriers have to be overcome!