Introduction to the
Semantic Web and
Linked Open Data
Dramatis Personae
Christopher Gutteridge
Nick Gibbins
(in spirit)
Goals
• Overview of issues relating to the publication and use of
linked data in HEIs
• The lessons that we’ve learned!
• Pragmatism rather than perfection
• General guidelines rather than detailed specifications
• Coining cool URIs
• Publication alongside existing resources
• Licensing
http://is.gd/dqiJc
(The only URL you need to write down)
Non-Goals
• Detailed tutorial on the finer points of:
• RDF
• RDFa
• RDF Schema
• OWL
• SPARQL
• …
(an hour and a half isn’t enough for this – and there are good tutorials
available online)
“If HP knew what HP knows, we’d be three times more
profitable”
Lew Platt
Hewlett-Packard Chairman and CEO
Linked Data in a Nutshell
http://www.flickr.com/photos/arielarielariel/322301228/
• Linked Data is about providing structured data on the Web
• Doesn’t necessarily require RDF (though it usually uses it)
The triple
• Underlying model of triples used to describe the relations
between entities in linked data
• This is the basis of the RDF data model
• (subject, predicate, object)
• e.g. “The Hobbit”, “created by”, “JRR Tolkien”
The Hobbit (subject) → created by (predicate) → JRR Tolkien (object)
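A minimal sketch of the triple as a data structure, using only the Python standard library. The names `Triple` and `describe` are illustrative assumptions, not part of any RDF toolkit.

```python
from collections import namedtuple

# A triple is an ordered (subject, predicate, object) statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The example from the slide.
hobbit = Triple("The Hobbit", "created by", "JRR Tolkien")


def describe(triple):
    """Render a triple as a readable arrow-style string."""
    return f"{triple.subject} --{triple.predicate}--> {triple.object}"


print(describe(hobbit))
```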
Example
• Take a citation:
• Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic
Web. Scientific American, May 2001
• We can identify a number of distinct statements in this
citation:
• There is an article titled “The Semantic Web”
• One of its authors is a person named “Tim Berners-Lee” (etc)
• It appeared in a publication titled “Scientific American”
• It was published in May 2001
Example
• We can represent these statements graphically:
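[Figure: graph of the citation. The article node has the title “The Semantic Web”, the date “2001-05”, a publishedIn link to a node titled “Scientific American”, and three creator links to nodes named “Tim Berners-Lee”, “James Hendler” and “Ora Lassila”.]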
Example
• There are two types of node in this graph:
• Literals, which have a value but no identity
(a string, a number, a date), e.g. “Scientific American”
• Resources, which represent objects with identity
(a web page, a person, a journal)
Example
• Resources are identified by URIs
• Property labels are also identified by URIs, and are drawn
from a vocabulary or ontology
http://www.sciam.com/ (subject) → http://purl.org/dc/elements/1.1/title (predicate) → “Scientific American” (object)
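As a rough illustration, the same statement can be written as one line of N-Triples, a plain-text RDF format in which URIs go in angle brackets and literals in double quotes. The helper names `escape_literal` and `ntriple` are assumptions made for this sketch, and only the most common escapes are handled.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines in a literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def ntriple(subject_uri, predicate_uri, obj, obj_is_uri=False):
    """Format one statement as an N-Triples line.

    The object is written as a URI (a resource) or as a quoted
    string (a literal), depending on obj_is_uri.
    """
    obj_term = f"<{obj}>" if obj_is_uri else f'"{escape_literal(obj)}"'
    return f"<{subject_uri}> <{predicate_uri}> {obj_term} ."


line = ntriple("http://www.sciam.com/", DC_TITLE, "Scientific American")
print(line)
```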
Mixing Vocabularies
• The triple-based graph model makes it possible to mix
terms from different vocabularies in the same graph
• Simplifies the task of information integration
[Figure: the citation graph from the earlier example, with its properties labelled by the vocabulary they are drawn from: foaf, dc and bibo.]
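A small sketch of how terms from several vocabularies can sit in one graph: each property is written as a prefixed name and expanded to a full URI. The three namespace URIs are the published ones for FOAF, Dublin Core elements and BIBO. The `_:sw` and `_:tbl` node labels and the `expand` helper are assumptions for illustration. BIBO is listed but not used, because this sketch avoids guessing a BIBO property name.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


# One graph, two vocabularies (dc and foaf) side by side.
graph = [
    ("_:sw", "dc:title", "The Semantic Web"),
    ("_:sw", "dc:creator", "_:tbl"),
    ("_:tbl", "foaf:name", "Tim Berners-Lee"),
]

expanded = [(s, expand(p), o) for s, p, o in graph]
for triple in expanded:
    print(triple)
```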
Linked Data Principles
A set of publishing practices for Semantic Web data:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information
4. Include links to other URIs, so that they can discover more
things
Effectively, putting the hypertext back into the Semantic Web
Simplifies integration between datasets while maintaining
loose coupling
Example
[Figure: the citation graph split into five linked graphs, one per resource. The graph describing ‘sw’ holds its title (“The Semantic Web”), its date (“2001-05”), a publishedIn link to ‘sciam’ and creator links to ‘tbl’, ‘jh’ and ‘ora’. The graphs describing ‘tbl’, ‘jh’ and ‘ora’ hold the names “Tim Berners-Lee”, “James Hendler” and “Ora Lassila”. The graph describing ‘sciam’ holds the title “Scientific American”.]
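A sketch of why separately published graphs are easy to integrate: if two graphs use the same URI for the same thing, merging them is a set union, and a query can then follow links across the join. The `example.org` URIs, the bare property names and the `creator_names` helper are assumptions made for illustration.

```python
# Hypothetical URIs standing in for 'sw' and 'tbl' in the figure.
SW = "http://example.org/doc/sw"
TBL = "http://example.org/person/tbl"

# Graph published alongside the article.
graph_sw = {
    (SW, "title", "The Semantic Web"),
    (SW, "date", "2001-05"),
    (SW, "creator", TBL),
}

# Graph published alongside the person.
graph_tbl = {
    (TBL, "name", "Tim Berners-Lee"),
}

# Merging graphs is a set union; the shared URI is what joins them.
merged = graph_sw | graph_tbl


def creator_names(graph, document):
    """Follow creator links from a document, then read each creator's name."""
    creators = {o for s, p, o in graph if s == document and p == "creator"}
    return sorted(o for s, p, o in graph if s in creators and p == "name")


print(creator_names(merged, SW))
```

Run against `graph_sw` alone, `creator_names` finds no names: the answer only becomes available once the second graph has been fetched and merged.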
Person ≠ Document
• URIs must only identify one concept. Ever.
• I am not my homepage.
Publishing Example
• URI represents a person.
• Requesting the URI over the web gets a “303 See Other” response.
• Requester is redirected to the most appropriate document URL,
usually HTML or RDF/XML.
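The redirect step can be sketched as a function that inspects the request's Accept header and picks a document URL. This is a deliberately simplified assumption: it ignores q-values and wildcard types, and the `.rdf` / `.html` suffix convention and the function name `choose_document` are illustrative rather than a description of any particular server.

```python
RDF_XML = "application/rdf+xml"


def choose_document(thing_uri, accept_header):
    """Return (status, location) for a request to a URI that names a thing.

    The thing itself (e.g. a person) cannot be sent over HTTP, so the
    server answers 303 See Other and points at a document about it.
    """
    wants_rdf = RDF_XML in accept_header
    suffix = ".rdf" if wants_rdf else ".html"
    return 303, thing_uri + suffix


person = "http://mysite.org/person/username/t23"
print(choose_document(person, "application/rdf+xml"))
print(choose_document(person, "text/html"))
```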
Publishing RDF
<<>><<><>><>>><>><>><>><>><><>>>><<><><<<<<><
><><><><><><><><><><><><<<<>>><><<><><>><>
• DON’T worry about understanding the XML. It’s the
equivalent of “view-source” in a webpage!
• Use a tool to convert it to something less icky!
(http://graphite.ecs.soton.ac.uk/browser/ for example)
Access Control
• Worry about it later!
• Start with data you can make freely available
Licensing
• You want your data to be used & reused, right?
• Don’t prevent commercial use.
• Don’t prevent derivative works (prevents people using it at
all!)
• If there are things your data should not be used for,
why are you publishing it?
Licensing Options
• Must-Attribute license
• Public Domain license
(your info still can’t be used in illegal ways, of course)
• Procrastinate and worry about it later
(much better than not publishing your data)
Breakout
Task
• What datasets does your organisation already maintain?
• What is the business case for making them available?
• in a machine readable form
• to all members
• without bureaucracy or restriction.
• What are the barriers to putting them online and
maintaining them?
• What are the benefits to the wider community?
• What are the risks?
Task
• List your 3 easiest wins - the lowest hanging fruit.
• Starting suggestion: Every building & campus in your
organisation with:
• Number
• Building Name
• Site (Campus)
• Lat & Long
This data changes very slowly and is already freely available.
ECS Demo
• http://id.ecs.soton.ac.uk/docs/
• http://rdf.ecs.soton.ac.uk/person/1248
• http://rdf.ecs.soton.ac.uk/project/42
Cool URIs
Beauty
• http://domain/classOfThing/scheme/identifier
• http://domain/classOfThing/scheme/identifier.rdf
• http://domain/classOfThing/scheme/identifier.html
• http://mysite.org/person/username/t23
• http://mysite.org/person/username/t23.rdf
• http://mysite.org/person/username/t23.html
Scheme is optional but futureproofs you against the next time
the university reorganises everything.
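A sketch of generating the three related URIs from the pattern above: one for the thing itself, plus one each for its RDF and HTML documents. The function name `cool_uris` and the decision to percent-encode each path segment are assumptions for illustration.

```python
from urllib.parse import quote


def cool_uris(domain, class_of_thing, scheme, identifier):
    """Build the thing URI and its .rdf and .html document URLs."""
    parts = [class_of_thing, scheme, identifier]
    # Percent-encode each segment so odd identifiers cannot break the path.
    path = "/".join(quote(part, safe="") for part in parts)
    base = f"http://{domain}/{path}"
    return {"thing": base, "rdf": base + ".rdf", "html": base + ".html"}


uris = cool_uris("mysite.org", "person", "username", "t23")
for kind, uri in uris.items():
    print(kind, uri)
```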
And The Beast
http://www.diy.com/diy/jsp/bq/nav.jsp?action=detail&fh_
oneslice=true&fh_view_size=10&fh_reffacet=styleStyle&fh
_location=%2f%2fcatalog01%2fen_GB%2fcategories%3C{
9372014}%2fcategories%3C{9372039}%2fcategories%3C{
9372150}%2fspecificationsProductType%3done_hole_taps
%2fstyleStyle%3E{adelaide}&fh_refview=summary&fh_ref
path=facet_159017215&fh_secondid=10507747&fh_eds=%
C3%9F&ts=1279018688652
Further Reading
http://www.flickr.com/photos/markhillary/337685031/
W3C Specifications
• http://www.w3.org/standards/semanticweb/
• http://www.w3.org/standards/techs/rdf
• http://www.w3.org/standards/techs/owl
• http://www.w3.org/TR/swbp-vocab-pub/
Tools
• Graphite Browser
• http://graphite.ecs.soton.ac.uk/browser/
• Tabulator
• http://www.w3.org/2005/ajar/tab
Linked Data Help
• Linked Data Website
• http://linkeddata.org/
• The Patterns Book
• http://patterns.dataincubator.org/book/
• Semantic Overflow
• http://www.semanticoverflow.com/
Common Namespaces
• SKOS (Simple Knowledge Organization System)
• Taxonomies and thesauri
• SIOC (Semantically Interlinked Online Communities)
• Web forums, mailing lists, etc
• FOAF (Friend of a Friend)
• People, social networks
• DC (Dublin Core)
• Basic bibliographic information
• BIBO (Bibliographic Ontology)
• Advanced bibliographic information
• GEO
• Simple geolocation (lat/long) ontology
Cool URIs
• Cool URIs don't change (by TimBL)
• http://www.w3.org/Provider/Style/URI
• Cool URIs for the Semantic Web
• http://www.w3.org/TR/cooluris/
• ECS URI scheme documentation
• http://id.ecs.soton.ac.uk/docs/
Infrastructure Namespaces
• RDF & RDFS
• These describe classes & predicates which are used to tie everything
together. rdf:type is used to give a URI a class
<http://id.ecs.soton.ac.uk/person/1248> rdf:type foaf:Person .
• OWL
• Used to describe the meaning of predicates & classes in machine-readable form.
• Start with human-readable documentation; OWL is not widely
consumed (yet?)
• XSD
• Describes datatypes like String, Positive Integer etc.
Take Home Messages
http://www.flickr.com/photos/71894657@N00/2696793132/
Good URI Selection
• ‘Cool URIs don’t change’ – once you’ve chosen a URI
convention for your organisation, it’s a pain to change it
• Getting this right is key to having your linked data used
more widely
We think that we got this one mostly right…
…but we still had too many anonymous nodes around
Start with the easy stuff
• Go for an incremental approach
• …but keep an eye on possible avenues for future expansion
• RDFa is not for beginners!
• Don’t do as we did: we tried to build linked data for all of
our internal data in one go
Don’t reinvent the wheel
• Regardless of your application domain, there is probably
already an ontology that does some of what you want
• …but don’t be afraid to invent relationships and classes if
you can’t find any suitable
• Don’t do as we did: we wrote a new ontology from scratch
(rather than reusing FOAF+DC)
Eat your own dogfood
• Build linked data for your own consumption first
• You know what your use cases are – better to support these
than to second guess those of unknown future users
• Don’t do as we did: we overcomplicated our data by trying
to support all of the plausible scenarios that we could think
of, rather than concentrating on what mattered to us
(be glad I couldn't find any clip art for this slide)
Don’t underestimate CSV
• You should aim to publish as RDF
• Publishing as CSV may get your data out there faster as an
interim measure
We used CSV as a ‘glue’ data format between different
systems, but chose not to expose data until we could do so
as RDF.
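A sketch of the CSV-to-RDF step, using the buildings dataset suggested in the breakout task. The sample rows, the `example.org` base URI and the function name `building_triples` are made up for illustration. The property URIs are `rdfs:label` and the W3C Basic Geo `lat` / `long`. Literal values are written without escaping or datatypes, which real data would need.

```python
import csv
import io

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Made-up sample rows; a real export would come from an estates system.
SAMPLE_CSV = """number,name,site,lat,long
1,Example Hall,Main Campus,50.93,-1.39
2,Sample Library,Main Campus,50.94,-1.40
"""


def building_triples(csv_text, base="http://example.org/building/"):
    """Turn each CSV row into N-Triples lines about one building."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}{row['number']}>"
        name, lat, lon = row["name"], row["lat"], row["long"]
        lines.append(f'{subject} <{RDFS_LABEL}> "{name}" .')
        lines.append(f'{subject} <{GEO}lat> "{lat}" .')
        lines.append(f'{subject} <{GEO}long> "{lon}" .')
    return lines


for line in building_triples(SAMPLE_CSV):
    print(line)
```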
Thanks
• [email protected]
• @cgutteridge
• http://blogs.ecs.soton.ac.uk/webteam/
http://is.gd/dqiJc