Transcript hider2008
Towards a semantic web
Philip Hider
This talk
The Semantic Web vision
Scenarios
Standards
Semantic Web & RDA
Web 1.0, 2.0, 3.0
Internet to WWW (Web 1.0)
Web 1.0 allows people to navigate the Internet easily,
through hyperlinks
Web 2.0 allows people to collaborate more on the Web
Web 3.0 allows computers to find and use the data
contained in Web documents
Web 3.0 = the Semantic Web vision
The Semantic Web vision
It will allow computers to make sense of the content of
Web documents, so that they can find and use this data
independently
Basis of SW already developed, with standards such as
XML and RDF
Like Web 1.0, it represents a bottom-up, distributed
approach
How would it work?
Computers would be able to identify and ‘understand’
particular data in a Web document according to the
metadata associated with that data
metadata could be inside or outside the document
Computers (agents) would then be able to relate that data to
other data in other documents (or the same document)
according to specified schemas, ontologies and rules
They could then independently integrate data and process
information according to tasks set by their human users
A Semantic Web scenario
User asks ‘Trip Agent’ to purchase the ‘best’ deal for a
trip to New Zealand with date range x, family members
y, time of day z, etc. etc.
‘Trip agent’ searches the Web for flights and
accommodation, and is able to look up databases and
specify conditions according to what it ‘knows’ about
user’s preferences
Semantic Web scenario
Agent is able to ‘understand’ the deals available on
different websites by integrating data from different
sources, e.g. looking up geographic information systems
(how far from the sea, shops, etc.), weather forecasts,
family members’ calendars, etc., and ultimately
suggesting the optimal combination of flight, hotel,
tours, etc.
Another scenario
User asks if the latest Stephen King
book is available in a nearby library,
can’t remember what it’s called
‘Library Agent’ searches the Web for nearby libraries
with books by ‘Stephen King’, finds a few different
Stephen Kings, confirms with user which Stephen
King, then identifies the latest novel via the official
Stephen King website, but chooses the second-nearest
library (by car) which holds it because of
availability/format/library opening hours, etc.
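The selection step in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Library class, its fields, the sample libraries and the title are all hypothetical, and a real agent would gather this data from library catalogues and other Web sources rather than from a hard-coded list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Library:
    # All fields are hypothetical stand-ins for data an agent would look up.
    name: str
    drive_minutes: int
    open_now: bool
    available_titles: List[str] = field(default_factory=list)


def choose_library(libraries: List[Library], title: str) -> Optional[Library]:
    """Pick the nearest library that is open and has the title available."""
    candidates = [
        lib for lib in libraries
        if lib.open_now and title in lib.available_titles
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda lib: lib.drive_minutes)


# Invented sample data: the nearest library does not have the book available.
LIBRARIES = [
    Library("Nearest Library", 5, True, []),
    Library("Second Library", 12, True, ["Example Title"]),
    Library("Far Library", 30, True, ["Example Title"]),
]

chosen = choose_library(LIBRARIES, "Example Title")
```

With this sample data the nearest library is skipped because the title is not available there, which mirrors the agent's choice of the second-nearest library in the scenario.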
What do SW agents need?
Information about the data, i.e. metadata,
in a machine-readable format
Including a shared understanding of the structure of that
metadata and its relationship to other knowledge
structures (ontologies)
Some clever programming
Standards for the Semantic Web
Resource Description Framework
Uniform Resource Identifiers
XML
Unicode
Schemas (such as XML schemas)
Ontologies written in e.g. OWL
Rules written in RIF, etc.
SPARQL
Resource Description Framework
W3C standard
A model used to structure resource descriptions
Can be used to structure data about any kind of resource
could be a book, or a car, or a flight ticket, or an experiment, etc.
Based on ‘triples’, i.e.
Resource – Property – Value
(Subject – Predicate – Object)
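As a rough sketch of the triple model, the Python below stores a few subject, predicate, object statements and matches them against a pattern, loosely in the spirit of a SPARQL query. The "ex:" identifiers and the sample values are hypothetical, and a real system would use an RDF store rather than a Python list.

```python
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object).
Triple = Tuple[str, str, str]

# Hypothetical sample data; the "ex:" identifiers are invented for illustration.
TRIPLES: List[Triple] = [
    ("ex:book1", "dc:title", "Example Novel"),
    ("ex:book1", "dc:creator", "ex:author1"),
    ("ex:author1", "foaf:name", "A. N. Author"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def creator_names(store: List[Triple], book: str) -> List[str]:
    """Follow book -> dc:creator -> foaf:name across two triples."""
    names = []
    for _, _, creator in match(store, s=book, p="dc:creator"):
        for _, _, name in match(store, s=creator, p="foaf:name"):
            names.append(name)
    return names
```

Because the object of one triple can be the subject of another, the second function can chain two statements to answer a question that neither statement answers alone.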
Uniform Resource Identifiers
For example, URLs
And ISBNs
People don’t have them yet
OCLC working on ‘work identifiers’
Properties and some values are referenced as part of
particular schemas, ontologies, etc.
eXtensible Markup Language (XML)
Another W3C standard
More flexible than HTML, XHTML
Can be used to encode any data
Data can be in the same Web document or another document
Can be used to express RDF, i.e. RDF/XML
RDF/XML basis for metadata structures such as schemas and
ontologies
Schemas
Standardised structures of resource description that
define property elements in a taxonomic way
Mostly based on a particular domain,
e.g. pertaining to bibliographic data, or geospatial data,
or flight booking data, or used car data, etc.
Schemas
Two main groups of schemas –
XML schemas and RDFS (RDF schemas)
Superseding Document Type Definitions (DTDs)
Specific well-known schemas include
Dublin Core
ONIX
RSS
Some metadata encoded in RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
      rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
    <foaf:primaryTopic>
      <foaf:Person>
        <foaf:name>Tony Benn</foaf:name>
      </foaf:Person>
    </foaf:primaryTopic>
  </rdf:Description>
</rdf:RDF>
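To show how a program might read the description above, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It is not a full RDF parser: it assumes the exact element layout of this one example, and the describe function and its output format are illustrative choices. A dedicated RDF library would be the usual way to handle RDF/XML in general.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
    <foaf:primaryTopic>
      <foaf:Person>
        <foaf:name>Tony Benn</foaf:name>
      </foaf:Person>
    </foaf:primaryTopic>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-namespace map, copied from the xmlns declarations in the sample.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def describe(rdf_xml):
    """Return (resource, property, value) tuples for the sample's layout."""
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall("rdf:Description", NS):
        # Namespaced attributes are keyed as "{namespace}localname".
        subject = desc.get("{%s}about" % NS["rdf"])
        for prop in ("dc:title", "dc:publisher"):
            element = desc.find(prop, NS)
            if element is not None and element.text:
                found.append((subject, prop, " ".join(element.text.split())))
        name = desc.find("foaf:primaryTopic/foaf:Person/foaf:name", NS)
        if name is not None and name.text:
            found.append((subject, "foaf:name of primaryTopic",
                          " ".join(name.text.split())))
    return found


statements = describe(RDF_XML)
```

Each tuple pairs the described resource with a property and a value, which is the Resource, Property, Value pattern in code form.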
Ontologies
More sophisticated than schemas, formalising
more complex relationships between elements
Also usually domain-specific
Use extra languages, such as OWL, on top of
RDF/XML etc.
Ontologies give more scope for agents to be ‘clever’
Dublin Core can be expressed as an ontology or a
schema
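One way an ontology lets an agent be 'clever' is by supporting inference. The Python sketch below assumes a tiny, hypothetical class hierarchy (the "ex:" names are invented) and derives facts that were never stated directly, similar in spirit to subclass reasoning in RDFS or OWL. Real ontology reasoners do far more than this.

```python
# Hypothetical class hierarchy: each class maps to its direct superclass.
SUBCLASS_OF = {
    "ex:Novel": "ex:Book",
    "ex:Book": "ex:Resource",
}

# Hypothetical statement: item1 is described only as a novel.
TYPES = {"ex:item1": "ex:Novel"}


def superclasses(cls, subclass_of):
    """Walk up the hierarchy and collect every superclass, nearest first."""
    found = []
    while cls in subclass_of and subclass_of[cls] not in found:
        cls = subclass_of[cls]
        found.append(cls)  # the membership test above guards against cycles
    return found


def is_a(item, cls, types, subclass_of):
    """True if the item has the class directly or through a superclass."""
    direct = types.get(item)
    if direct is None:
        return False
    return direct == cls or cls in superclasses(direct, subclass_of)
```

Here the agent can conclude that item1 is a book even though no statement says so, because the hierarchy says every novel is a book.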
What about MARC?
MARC files are rather flat and do not readily define
relationships between elements
But can be expressed as an XML schema,
i.e. MARCXML
MODS is a lite version of MARCXML
Mappings between MARCXML and other schemas
(e.g. DC)
Mappings
Lots of them!
Between different schemas, ontologies, languages, etc.
AKA crosswalks
By UKOLN, LC, OCLC, etc. etc.
The more standards and adaptations, the more
crosswalks
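A crosswalk can be pictured as a lookup table from one schema's elements to another's. The Python sketch below maps a few MARC tags to Dublin Core elements. The table is a deliberately simplified illustration that works at tag level and ignores subfields, not a published crosswalk, and the sample record is invented.

```python
# Simplified, illustrative mapping from MARC tags to Dublin Core elements.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
}

# Hypothetical record as (tag, value) pairs; real MARC fields have subfields.
RECORD = [
    ("100", "Author, A. N."),
    ("245", "Example title"),
    ("300", "200 p."),
    ("650", "Apples"),
    ("650", "Orchards"),
]


def crosswalk(record, mapping):
    """Convert (tag, value) pairs into a dict of target elements."""
    result = {}
    for tag, value in record:
        element = mapping.get(tag)
        if element is None:
            continue  # unmapped fields are dropped, so the conversion is lossy
        result.setdefault(element, []).append(value)
    return result


dc_record = crosswalk(RECORD, MARC_TO_DC)
```

The dropped 300 field shows why crosswalks between a richer and a simpler schema usually lose some detail.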
Value sets
Resource – Property – Value
Schemas and ontologies may point to particular value sets,
e.g.
Book A has a subject called DCterms:LCSH Apples
where Apples is a value in the set of values known as LCSH
In other words, they may point to controlled vocabularies
SKOS
Simple Knowledge Organization System
SW standard for expressing controlled vocabularies
such as subject thesauri
http://www.w3.org/2004/02/skos
Might promote use of LCSH, etc.
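A controlled vocabulary expressed in the SKOS style can be sketched as concepts with a preferred label, alternative labels and a broader concept. In the Python below, the keys prefLabel, altLabels and broader echo the SKOS properties skos:prefLabel, skos:altLabel and skos:broader, but the concepts themselves are hypothetical and are not taken from LCSH.

```python
# Hypothetical mini-vocabulary; the "ex:" concept identifiers are invented.
VOCAB = {
    "ex:apples": {"prefLabel": "Apples", "altLabels": ["Apple"],
                  "broader": "ex:fruit"},
    "ex:fruit": {"prefLabel": "Fruit", "altLabels": [],
                 "broader": None},
}


def lookup(term, vocab):
    """Resolve a free-text term to a concept identifier, or None."""
    wanted = term.strip().lower()
    for concept_id, concept in vocab.items():
        labels = [concept["prefLabel"]] + concept["altLabels"]
        if wanted in (label.lower() for label in labels):
            return concept_id
    return None


def broader_chain(concept_id, vocab):
    """List the identifiers of all broader concepts, nearest first."""
    chain = []
    current = vocab[concept_id]["broader"]
    while current is not None and current not in chain:
        chain.append(current)
        current = vocab[current]["broader"]
    return chain
```

An agent could use such a structure to turn a user's wording into the vocabulary's preferred term, and to widen a search to broader subjects when a narrow one finds nothing.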
Semantic Web & cataloguing
More sophisticated use of library catalogues if they can
be understood by Semantic Web agents
Library resources more likely to be used in conjunction
with non-library web resources
SW about agents using cataloguing,
not replacing cataloguing
Semantic Web & RDA
RDA is therefore aligning itself with DC and RDF
RDA elements mapped to DC, ONIX, etc.
DCMI/RDA Task Group
RDA-DC application profile
http://dublincore.org/dcmirdataskgroup
Prospects for SW
Examples of Semantic Web developments:
http://www.w3.org/2001/sw/sweo/public/UseCases
A lot of standards now in place, technology not so
much of an issue
With RDA, bibliographic domain ripe for SW take-up
Pre-SW library work
Post-SW library work
Thank you.