Transcript hider2008

Towards a semantic web
Philip Hider
This talk
 The Semantic Web vision
 Scenarios
 Standards
 Semantic Web & RDA
Web 1.0, 2.0, 3.0
 Internet to WWW (Web 1.0)
 Web 1.0 allows people to navigate the Internet easily,
through hyperlinks
 Web 2.0 allows people to collaborate more on the Web
 Web 3.0 allows computers to find and use the data
contained in Web documents
 Web 3.0 = the Semantic Web vision
The Semantic Web vision
 It will allow computers to make sense of the content of
Web documents, so that they can find and use this data
independently
 Basis of SW already developed, with standards such as
XML and RDF
 Like Web 1.0, it represents a bottom-up, distributed
approach
How would it work?
 Computers would be able to identify and ‘understand’
particular data in a Web document according to the
metadata associated with that data
 metadata could be inside or outside the document
 Computers (agents) would then be able to relate that data to
other data in other documents (or the same document)
according to specified schemas, ontologies and rules
 They could then independently integrate data and process
information according to tasks set by their human users
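The step of relating data in one document to data in another can be sketched in a few lines of Python. This is only an illustration of the idea, not a real agent: the two "documents", the example.org URIs and the property names are all hypothetical.

```python
# Hypothetical metadata from two separate Web documents, written as
# (resource, property, value) statements. All URIs are made up.
hotel_site = [
    ("http://example.org/hotel/1", "name", "Bay View Hotel"),
    ("http://example.org/hotel/1", "locatedIn", "http://example.org/place/nelson"),
]
weather_site = [
    ("http://example.org/place/nelson", "forecast", "sunny"),
]


def integrate(*sources):
    """Merge statements from several sources into one index keyed by resource."""
    index = {}
    for source in sources:
        for resource, prop, value in source:
            index.setdefault(resource, {})[prop] = value
    return index


def forecast_for_hotel(index, hotel_uri):
    """Follow the hotel's 'locatedIn' link to find the forecast for its place."""
    place_uri = index[hotel_uri]["locatedIn"]
    return index[place_uri]["forecast"]


merged = integrate(hotel_site, weather_site)
print(forecast_for_hotel(merged, "http://example.org/hotel/1"))
```

The two sources only connect because both use the same identifier for the place, which is why shared identifiers matter so much to the vision.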
A Semantic Web scenario
 User asks ‘Trip Agent’ to purchase the ‘best’ deal for a
trip to New Zealand with date range x, family members
y, time of day z, etc. etc.
 ‘Trip agent’ searches the Web for flights and
accommodation, and is able to look up databases and
specify conditions according to what it ‘knows’ about
user’s preferences
Semantic Web scenario
 Agent is able to ‘understand’ the deals available on
different websites by integrating data from different
sources, e.g. looking up geographic information systems
(how far from the sea, shops, etc.), weather forecasts,
family members’ calendars, etc., and ultimately
suggesting the optimal combination of flight, hotel,
tours, etc.
Another scenario
User asks if the latest Stephen King
book is available in a nearby library,
can’t remember what it’s called
‘Library Agent’ searches the Web for nearby libraries
with books by ‘Stephen King’, finds a few different
Stephen Kings, confirms with user which Stephen
King, then identifies the latest novel via the official
Stephen King website, but chooses the second-nearest
library (by car) which holds it because of
availability/format/library opening hours, etc.
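The final choice in this scenario could look something like the following Python sketch. The library records and field names are hypothetical, and a real agent would weigh format and opening hours as well as availability.

```python
# Hypothetical library data an agent might have gathered; none of it is real.
libraries = [
    {"name": "Library 1", "distance_km": 2.0, "holds_title": True, "copy_available": False},
    {"name": "Library 2", "distance_km": 5.5, "holds_title": True, "copy_available": True},
    {"name": "Library 3", "distance_km": 9.0, "holds_title": True, "copy_available": True},
]


def choose_library(candidates):
    """Pick the nearest library that holds the title and has a copy available."""
    usable = [c for c in candidates if c["holds_title"] and c["copy_available"]]
    return min(usable, key=lambda c: c["distance_km"]) if usable else None


choice = choose_library(libraries)
print(choice["name"] if choice else "No library has a copy available")
```

Here the nearest library is passed over because its copy is out, so the second-nearest is chosen.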
What do SW agents need?
 Information about the data, i.e. metadata,
in a machine-readable format
 Including a shared understanding of the structure of that
metadata and its relationship to other knowledge
structures (ontologies)
 Some clever programming
Standards for the Semantic Web
Resource Description Framework
Uniform Resource Identifiers
XML
Unicode
Schemas (such as XML schemas)
Ontologies written in e.g. OWL
Rules written in RIF, etc.
SPARQL
Resource Description Framework
 W3C standard
 A model used to structure resource descriptions
 Can be used to structure data about any kind of resource
 could be a book, or a car, or a flight ticket, or an experiment, etc.
 Based on ‘triples’, i.e.
Resource – Property – Value
(Subject – Predicate – Object)
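A triple is simple enough to model directly. The Python sketch below is an illustration only: the `Triple` type, the `match` helper, the identifiers and the property names are all hypothetical and are not part of the RDF standard.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property
    obj: str        # the value


# Hypothetical descriptions of two different kinds of resource.
triples = [
    Triple("urn:example:book-a", "title", "Book A"),
    Triple("urn:example:book-a", "creator", "A. Author"),
    Triple("urn:example:ticket-1", "destination", "New Zealand"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that match the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


for t in match(triples, subject="urn:example:book-a"):
    print(t.subject, "-", t.predicate, "-", t.obj)
```

Because every statement has the same three-part shape, a book and a flight ticket can sit in the same store and be queried the same way.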
Uniform Resource Identifiers
For example, URLs
And ISBNs
People don’t have them yet
OCLC working on ‘work identifiers’
Properties and some values are referenced as part of
particular schemas, ontologies, etc.
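To show that a URL and an ISBN can both be treated as URIs, the short Python sketch below splits each into its scheme and the rest. The ISBN shown is a placeholder, not a real book, and the `describe` helper is hypothetical.

```python
from urllib.parse import urlparse

identifiers = [
    "http://en.wikipedia.org/wiki/Tony_Benn",  # a URL
    "urn:isbn:9780000000002",                  # placeholder ISBN written as a URN
]


def describe(uri):
    """Split a URI into (scheme, host, path) using the standard library."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


for uri in identifiers:
    scheme, host, path = describe(uri)
    print(scheme, "|", host or "(no host)", "|", path)
```

The URL says where to fetch something, while the URN only names it; either can serve as the subject of a triple.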
eXtensible Markup Language (XML)
 Another W3C standard
 More flexible than HTML, XHTML
 Can be used to encode any data
 Data can be in the same Web document or another document
 Can be used to express RDF, i.e. RDF/XML
 RDF/XML basis for metadata structures such as schemas and
ontologies
Schemas
 Standardised structures of resource description that
define property elements in a taxonomic way
 Mostly based on a particular domain,
e.g. pertaining to bibliographic data, or geospatial data,
or flight booking data, or used car data, etc.
Schemas
 Two main groups of schemas –
XML schemas and RDFS (RDF schemas)
 Superseding Document Type Definitions (DTDs)
 Specific well-known schemas include
 Dublin Core
 ONIX
 RSS
Some metadata encoded in RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
    <foaf:primaryTopic>
      <foaf:Person>
        <foaf:name>Tony Benn</foaf:name>
      </foaf:Person>
    </foaf:primaryTopic>
  </rdf:Description>
</rdf:RDF>
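As a rough illustration of how a program might read this record, the Python sketch below pulls the two Dublin Core properties out as triples using only the standard library. It treats the record as plain XML and ignores the nested FOAF description; a dedicated RDF library such as rdflib would normally be used for real RDF processing.

```python
import xml.etree.ElementTree as ET

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
    <foaf:primaryTopic>
      <foaf:Person>
        <foaf:name>Tony Benn</foaf:name>
      </foaf:Person>
    </foaf:primaryTopic>
  </rdf:Description>
</rdf:RDF>
"""

# Prefix-to-namespace map, copied from the xmlns declarations above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(RDF_XML)
triples = []
for desc in root.findall("rdf:Description", NS):
    subject = desc.get("{%s}about" % NS["rdf"])
    # Only the simple literal-valued properties are read in this sketch.
    for prop in ("dc:title", "dc:publisher"):
        element = desc.find(prop, NS)
        if element is not None:
            triples.append((subject, prop, element.text))

for triple in triples:
    print(triple)
```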
Ontologies
More sophisticated than schemas, formalising
more complex relationships between elements
Also usually domain-specific
Use extra languages, such as OWL, on top of
RDF/XML etc.
Ontologies give more scope for agents to be ‘clever’
Dublin Core can be expressed as an ontology or a
schema
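One way an ontology lets an agent be "clever" is by supporting inference. The Python sketch below imitates a subclass hierarchy of the kind an ontology language can express; the class names, item identifiers and helper functions are hypothetical, and no OWL tooling is involved.

```python
# Hypothetical class hierarchy: each class maps to its direct superclass.
SUBCLASS_OF = {
    "Novel": "Book",
    "Book": "LibraryResource",
    "DVD": "LibraryResource",
}

# Hypothetical instance data: what each item was explicitly described as.
INSTANCE_OF = {"item-1": "Novel", "item-2": "DVD"}


def all_classes(cls):
    """Return cls plus every superclass reachable via SUBCLASS_OF."""
    found = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def instances_of(target):
    """Find items that belong to target, directly or by inference."""
    return sorted(
        item for item, cls in INSTANCE_OF.items() if target in all_classes(cls)
    )


print(instances_of("Book"))
print(instances_of("LibraryResource"))
```

Nothing states that item-1 is a Book, yet a search for books finds it, because the hierarchy says every Novel is a Book.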
What about MARC?
 MARC files are rather flat and do not readily define
relationships between elements
 But can be expressed as an XML schema,
i.e. MARCXML
 MODS is a lite version of MARCXML
 Mappings between MARCXML and other schemas
(e.g. DC)
Mappings
 Lots of them!
 Between different schemas, ontologies, languages, etc.
 AKA crosswalks
 By UKOLN, LC, OCLC, etc. etc.
 The more standards and adaptations, the more
crosswalks
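At its simplest, a crosswalk is a lookup table from one schema's elements to another's. The Python sketch below uses a deliberately tiny and simplified MARC-to-DC mapping for illustration; it is not a published crosswalk, and the record and the `crosswalk` helper are hypothetical.

```python
# A tiny, simplified mapping from MARC tags to Dublin Core elements.
# Real crosswalks work at subfield level and cover many more fields.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
}

# Hypothetical flattened MARC record: (tag, value) pairs.
marc_record = [
    ("100", "Author, A."),
    ("245", "Book A"),
    ("650", "Apples"),
    ("999", "local data with no mapping"),
]


def crosswalk(record, mapping):
    """Convert (tag, value) pairs to a dict of DC element -> list of values."""
    dc = {}
    for tag, value in record:
        element = mapping.get(tag)
        if element is None:
            continue  # unmapped fields are dropped in this sketch
        dc.setdefault(element, []).append(value)
    return dc


print(crosswalk(marc_record, MARC_TO_DC))
```

The dropped 999 field shows the usual cost of a crosswalk: whatever the target schema cannot express is lost.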
Value sets
 Resource – Property – Value
 Schemas and ontologies may point to particular value sets,
e.g.
Book A – has a subject called – DCterms:LCSH Apples
where Apples is a value in the set of values known as LCSH
 In other words, they may point to controlled vocabularies
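The effect of pointing to a value set can be sketched as a simple membership check. In the Python below, the three headings are only a stand-in for LCSH, and the `make_subject_statement` helper and its property name are hypothetical.

```python
# A tiny stand-in for a controlled vocabulary; real LCSH is far larger.
VALUE_SETS = {"LCSH": {"Apples", "Orchards", "Fruit"}}


def make_subject_statement(resource, scheme, value):
    """Build a (resource, property, value) statement, rejecting uncontrolled values."""
    if value not in VALUE_SETS.get(scheme, set()):
        raise ValueError(f"{value!r} is not a value in {scheme}")
    return (resource, "subject", f"{scheme}:{value}")


print(make_subject_statement("Book A", "LCSH", "Apples"))
```

A misspelling such as "Appels" would raise an error here rather than quietly becoming a new subject, which is the point of a controlled vocabulary.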
SKOS
 Simple Knowledge Organization System
 SW standard for expressing controlled vocabularies
such as subject thesauri
 http://www.w3.org/2004/02/skos
 Might promote use of LCSH, etc.
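The kind of thesaurus structure SKOS describes can be imitated in plain Python. The sketch below borrows three real SKOS property names (prefLabel, altLabel, broader), but the concepts, their identifiers and the helper functions are hypothetical, and no SKOS file is being read.

```python
# Hypothetical concepts described with three SKOS-style properties.
# The concept ids are made up for this sketch.
CONCEPTS = {
    "c1": {"prefLabel": "Fruit", "altLabel": [], "broader": None},
    "c2": {"prefLabel": "Apples", "altLabel": ["Apple"], "broader": "c1"},
}


def find_concept(term):
    """Return the id of the concept whose preferred or alternative label matches."""
    wanted = term.lower()
    for concept_id, concept in CONCEPTS.items():
        labels = [concept["prefLabel"], *concept["altLabel"]]
        if wanted in (label.lower() for label in labels):
            return concept_id
    return None


def broader_label(concept_id):
    """Return the preferred label of the broader concept, if there is one."""
    parent = CONCEPTS[concept_id]["broader"]
    return CONCEPTS[parent]["prefLabel"] if parent else None


concept = find_concept("apple")
print(CONCEPTS[concept]["prefLabel"], "->", broader_label(concept))
```

An agent given the user's word "apple" can land on the preferred heading and then widen the search to the broader term if too little is found.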
Semantic Web & cataloguing
 More sophisticated use of library catalogues if they can
be understood by Semantic Web agents
 Library resources more likely to be used in conjunction
with non-library web resources
 SW about agents using cataloguing,
not replacing cataloguing
Semantic Web & RDA
 RDA is therefore aligning itself with DC and RDF
 RDA elements mapped to DC, ONIX, etc.
 DCMI/RDA Task Group
 RDA-DC application profile
 http://dublincore.org/dcmirdataskgroup
Prospects for SW
 Examples of Semantic Web developments:
http://www.w3.org/2001/sw/sweo/public/UseCases
 A lot of standards now in place, technology not so
much of an issue
 With RDA, bibliographic domain ripe for SW take-up
Pre-SW library work
Post-SW library work
Thank you.