Creating Linked Data

Juan F. Sequeda
Semantic Technology Conference
June 2011
Linked Data is a set of best practices
to publish and interlink data
on the web
Linked Data Principles
1. Use URIs as names for
things
2. Use HTTP URIs so that
people can look up
(dereference) those
names.
3. When someone looks up
a URI, provide useful
information.
4. Include links to other
URIs so that they can
discover more things.
1) Use URIs as names for things
• Uniform Resource Identifiers identify real
world objects and abstract concepts
– Not only web documents and digital content
– People, places, locations, my car
– Know somebody, from somewhere
1) Use URIs as names for things
http://juansequeda.com/foaf.rdf#me
http://www.w3.org/People/Berners-Lee/card#i
http://xmlns.com/foaf/0.1/knows
1) Use URIs as names for things
• http://juansequeda.com/foaf.rdf#me
– Identifies the person
• http://juansequeda.com/foaf.rdf
– Identifies an RDF document
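The relationship between the two URIs above can be sketched in a few lines of Python. This is only an illustration, assuming the usual hash-URI convention that a client drops the fragment before making the HTTP request; the helper name document_uri is made up for this example.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri: str) -> str:
    """Return the URI a client would actually request: the URI minus its fragment.

    Assumption: hash-URI convention, where the fragment names the thing and
    the remainder names the document that describes it.
    """
    document, _fragment = urldefrag(thing_uri)
    return document


person = "http://juansequeda.com/foaf.rdf#me"
print(document_uri(person))  # http://juansequeda.com/foaf.rdf
```

The person and the document keep distinct names, yet the document URI can always be derived from the person URI.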
2) Use HTTP URIs so that people can
look up (dereference) those names.
• HTTP protocol is the Web’s universal access
mechanism
• Linked Data only uses HTTP URIs
– URI: unique name
– HTTP URI: universal means of access to the URI
• HTTP URIs should be dereferenceable
Dereference a URI?
What’s with the redirection?
RDFa
<html>
…
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
<h2 property="dc:title">The trouble with Bob</h2>
<h3 property="dc:creator">Alice</h3>
….
</div>
…
</html>
Minting HTTP URIs
• If you own the domain name and run a web
server at that location, mint URIs in this
namespace
• I own the domain mycompany.com
• I run a webserver http://mycompany.com
• I now can mint URIs in this namespace:
– http://mycompany.com/person/Juan-Sequeda
Create Cool URIs
• If you don’t control a namespace, don’t
misuse it
– http://www.imdb.com/title
• Avoid implementation details
– http://foo.mycompany.com:8080/person.php?id=123&format=rdf
• Use Natural Keys within URI
– http://mycompany.com/person/Juan-Sequeda
– http://mycompany.com/person/123
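A rough sketch of minting such a URI from a natural key, in Python. The function name mint_uri and the slug rule (runs of non-alphanumeric characters become a single hyphen) are assumptions made for this example, not a rule from the talk.

```python
import re


def mint_uri(namespace: str, kind: str, natural_key: str) -> str:
    """Build a URI of the form <namespace>/<kind>/<slug>.

    Assumption: the slug is the natural key with every run of
    non-alphanumeric characters replaced by one hyphen. No port numbers,
    file extensions, or query strings leak into the result.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", natural_key.strip()).strip("-")
    return f"{namespace.rstrip('/')}/{kind}/{slug}"


print(mint_uri("http://mycompany.com", "person", "Juan Sequeda"))
# http://mycompany.com/person/Juan-Sequeda
```

A real deployment would also need a policy for name collisions, which is one reason the numeric form (person/123) is sometimes preferred.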
Three different URIs
• URI for the real-world object (non-information resource)
– http://dbpedia.org/resource/London
– http://id.mycompany.com/person/Juan-Sequeda
– http://mycompany.com/person/Juan-Sequeda
– http://www.juansequeda.com/foaf.rdf#me
• URI for the HTML document (information resource) that describes
the real world object
– http://dbpedia.org/page/London
– http://pages.mycompany.com/person/Juan-Sequeda
– http://mycompany.com/person/Juan-Sequeda.html
• URI for the RDF document (information resource) that describes the
real-world object
– http://dbpedia.org/data/London
– http://data.mycompany.com/Juan-Sequeda
– http://mycompany.com/person/Juan-Sequeda.rdf
– http://www.juansequeda.com/foaf.rdf
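How a server might choose between the HTML and RDF documents can be sketched as follows, using the DBpedia-style layout (resource, page, data) from the list above. This is a simplified illustration: the function name negotiate is hypothetical, the Accept header is checked by substring rather than parsed with q-values, and the 303 See Other response is the commonly used redirect for non-information resources.

```python
from typing import Tuple

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def negotiate(resource_uri: str, accept: str) -> Tuple[int, str]:
    """Return (status code, Location) for a request on a real-world-object URI.

    Assumption: a naive substring test on the Accept header; a real server
    would parse media ranges and quality values.
    """
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a resource URI in this namespace")
    name = resource_uri[len(RESOURCE_PREFIX):]
    wants_rdf = "application/rdf+xml" in accept or "text/turtle" in accept
    target = "data" if wants_rdf else "page"
    # 303 See Other: the thing itself cannot be sent, so point at a document about it.
    return 303, f"http://dbpedia.org/{target}/{name}"


print(negotiate("http://dbpedia.org/resource/London", "text/html"))
print(negotiate("http://dbpedia.org/resource/London", "application/rdf+xml"))
```

A browser asking for text/html is sent to the page URI, while an RDF-aware client is sent to the data URI.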
3) Provide useful information
• How do we provide useful information in
document form on the web? HTML
• How do we provide useful information in data
form on the web? RDF
• Different ways of serializing RDF
– RDF/XML
– RDFa
– N3
– Turtle
RDF
subject – predicate – object
Coldplay is the artist of Viva la Vida
http://dbpedia.org/resource/Coldplay
http://dbpedia.org/ontology/artist
http://dbpedia.org/resource/Viva_la_Vida
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix dbprop: <http://dbpedia.org/property/>
prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
[Figure: RDF graph. Nodes: http://dbpedia.org/resource/Coldplay, http://dbpedia.org/resource/London, http://dbpedia.org/resource/Viva_la_Vida. Literals: 51.507778, -0.128056, "Coldplay". Edge labels: geo:lat, geo:long, dbprop:origin, dbpedia-owl:artist, foaf:name.]
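The prefix declarations above are just shorthand: a prefixed name such as geo:lat stands for the namespace URI followed by the local part. A small Python sketch of that expansion follows; the PREFIXES table copies the four declarations from this slide, and the function name expand is made up for the example.

```python
PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbprop": "http://dbpedia.org/property/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using the PREFIXES table."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


print(expand("geo:lat"))             # http://www.w3.org/2003/01/geo/wgs84_pos#lat
print(expand("dbpedia-owl:artist"))  # http://dbpedia.org/ontology/artist
```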
N-Triples
<http://dbpedia.org/resource/Coldplay> <http://dbpedia.org/ontology/artist> <http://dbpedia.org/resource/Viva_la_Vida> .
<http://dbpedia.org/resource/Coldplay> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Band> .
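The line-per-triple shape of this format makes it easy to generate. Below is a minimal Python sketch that writes subject, predicate, object tuples in this form. It assumes every term is a URI; literals and blank nodes need quoting and escaping rules that are left out here, and the function name to_ntriples is hypothetical.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI tuples, one statement per line.

    Assumption: all three terms are URIs. Literals and blank nodes are not handled.
    """
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = [
    ("http://dbpedia.org/resource/Coldplay",
     "http://dbpedia.org/ontology/artist",
     "http://dbpedia.org/resource/Viva_la_Vida"),
    ("http://dbpedia.org/resource/Coldplay",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dbpedia.org/ontology/Band"),
]
print(to_ntriples(triples))
```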
RDF/XML
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<dbpedia-owl:Band xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
rdf:about="http://dbpedia.org/resource/Coldplay">
<dbpedia-owl:artist rdf:resource="http://dbpedia.org/resource/Viva_la_Vida"/>
</dbpedia-owl:Band>
</rdf:RDF>
Turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://dbpedia.org/resource/Coldplay>
a <http://dbpedia.org/ontology/Band> ;
<http://dbpedia.org/ontology/artist> <http://dbpedia.org/resource/Viva_la_Vida> .
HTML
<div>
My name is Bob Smith, but people call me Smithy. Here is my home page:
<a href="http://www.example.com">www.example.com</a>.
I live in Albuquerque, NM and work as an engineer at ACME Corp.
My friends:
<a href="http://darryl-blog.example.com">Darryl</a>,
<a href="http://edna-blog.example.com">Edna</a>
</div>
RDFa (RDF in HTML)
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
My name is <span property="v:name">Bob Smith</span>,
but people call me <span property="v:nickname">Smithy</span>.
Here is my homepage:
<a href="http://www.example.com" rel="v:url">www.example.com</a>.
I live in
<span rel="v:address">
<span typeof="v:Address">
<span property="v:locality">Albuquerque</span>,
<span property="v:region">NM</span>
</span>
</span>
and work as an <span property="v:title">engineer</span>
at <span property="v:affiliation">ACME Corp</span>.
My friends:
<a href="http://darryl-blog.example.com" rel="v:friend">Darryl</a>,
<a href="http://edna-blog.example.com" rel="v:friend">Edna</a>
</div>
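To get a feel for what a consumer sees in this markup, here is a rough Python sketch that pulls out the text of every element carrying a property attribute. It is not a conforming RDFa processor: it ignores rel, typeof, and subject resolution, and it assumes well-nested markup without void elements. The class name PropertyCollector is made up for this example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []  # one entry per open element: its property value or None

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Record text only when the innermost open element has a property.
        if text and self._stack and self._stack[-1]:
            self.pairs.append((self._stack[-1], text))


snippet = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  My name is <span property="v:name">Bob Smith</span>,
  but people call me <span property="v:nickname">Smithy</span>.
</div>
"""

collector = PropertyCollector()
collector.feed(snippet)
print(collector.pairs)
```

The surrounding prose ("My name is", "but people call me") is skipped, so the same HTML serves human readers and data consumers.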
What to publish?
• Literal Triples
<http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist>
<foaf:name>
“Coldplay”
• Outgoing Links
<http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist>
<owl:sameAs>
<http://dbpedia.org/resource/Coldplay>
• Incoming Link
<http://www.bbc.co.uk/music/artists/18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist>
<mo:member_of>
<http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist>
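The three kinds of triples on this slide can be told apart mechanically. The Python sketch below groups triples around one resource into literal triples, outgoing links, and incoming links. The Literal wrapper and the function name describe are assumptions for this example, and the prefixed predicates from the slide are written out as full URIs.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks an object as a literal value rather than a URI (illustrative only)."""
    value: str


BBC = "http://www.bbc.co.uk/music/artists/"
COLDPLAY = BBC + "cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist"
MEMBER = BBC + "18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist"

TRIPLES = [
    (COLDPLAY, "http://xmlns.com/foaf/0.1/name", Literal("Coldplay")),
    (COLDPLAY, "http://www.w3.org/2002/07/owl#sameAs", "http://dbpedia.org/resource/Coldplay"),
    (MEMBER, "http://purl.org/ontology/mo/member_of", COLDPLAY),
]


def describe(resource, triples):
    """Group the triples that mention `resource` by the role it plays in them."""
    groups = {"literal": [], "outgoing": [], "incoming": []}
    for s, p, o in triples:
        if s == resource and isinstance(o, Literal):
            groups["literal"].append((s, p, o))
        elif s == resource:
            groups["outgoing"].append((s, p, o))
        elif o == resource:
            groups["incoming"].append((s, p, o))
    return groups


for kind, found in describe(COLDPLAY, TRIPLES).items():
    print(kind, len(found))
```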
What to publish?
• Description of the data set
– Semantic Sitemaps
– voiD (Vocabulary of Interlinked Datasets)
• Provenance Metadata
• License Information
Vocabularies (or Schemas or
Ontologies)
• Create your own using
– Simple Knowledge Organization System (SKOS)
• Taxonomy
– RDF Vocabulary Description Language (RDF
Schema)
• Lightweight vocabularies
– Web Ontology Language (OWL)
• Highly expressive and capable of inferencing
Vocabularies (or Schemas or
Ontologies)
• Reuse vocabularies
– Dublin Core: metadata attributes
– Friend of a Friend (FOAF): persons and relationships
– Semantically-Interlinked Online Communities (SIOC):
describing users, posts, blogs, etc.
– Description of a Project (DOAP)
– Music Ontology
– Programmes Ontology: TV and radio programs
– GoodRelations: describing products and services
– Review Vocabulary
– Basic Geo (WGS84) Vocabulary
4) Include links to other things
• Set external RDF links into other data sources
on the Web
– Subject of the triple is in the namespace of one
data set
– Object of the triple is a URI in the namespace of
another data set
• Connect siloed data islands
• Enable discovery
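The definition above (subject in one namespace, object in another) translates directly into a check. A minimal Python sketch, assuming both subject and object are URI strings and that a data set's namespace is a simple URI prefix; the function name is_external_link is made up for this example.

```python
def is_external_link(triple, namespace):
    """True when the subject lives in `namespace` and the object does not.

    Assumption: subject and object are both URI strings, and a data set's
    namespace can be recognized by a plain string prefix.
    """
    s, _p, o = triple
    return s.startswith(namespace) and not o.startswith(namespace)


BBC_NS = "http://www.bbc.co.uk/music/artists/"
artist = BBC_NS + "cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist"
member = BBC_NS + "18690715-59fa-4e4d-bcf3-8025cf1c23e0#artist"

external = (artist, "http://xmlns.com/foaf/0.1/based_near", "http://dbpedia.org/resource/London")
internal = (member, "http://purl.org/ontology/mo/member_of", artist)

print(is_external_link(external, BBC_NS))
print(is_external_link(internal, BBC_NS))
```

Counting such triples gives a quick measure of how well a data set is connected to the rest of the Web.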
4) Include links to other things
• Relationship Links
<http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist>
<http://xmlns.com/foaf/0.1/based_near>
<http://dbpedia.org/resource/London>
• Identity Link
<http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist>
<http://www.w3.org/2002/07/owl#sameAs>
<http://dbpedia.org/resource/Coldplay>
• Vocabulary Links
<http://purl.org/ontology/mo/image>
<http://www.w3.org/2000/01/rdf-schema#subPropertyOf>
<http://xmlns.com/foaf/0.1/depiction>
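A vocabulary link pays off when a consumer applies it. With the rdfs:subPropertyOf statement above, any triple that uses mo:image also implies the same triple with foaf:depiction. The Python sketch below applies that rule once; it does not compute a transitive closure, the function name is hypothetical, and the image URI is an invented example value.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
MO_IMAGE = "http://purl.org/ontology/mo/image"
FOAF_DEPICTION = "http://xmlns.com/foaf/0.1/depiction"


def infer_superproperty_triples(triples):
    """Return the new triples implied by one application of rdfs:subPropertyOf."""
    triples = set(triples)
    super_of = {}
    for s, p, o in triples:
        if p == RDFS_SUBPROPERTY_OF:
            super_of.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for super_p in super_of.get(p, ()):
            inferred.add((s, super_p, o))
    return inferred - triples


artist = "http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist"
image = "http://example.org/coldplay.jpg"  # hypothetical image URI

data = [
    (MO_IMAGE, RDFS_SUBPROPERTY_OF, FOAF_DEPICTION),
    (artist, MO_IMAGE, image),
]
print(infer_superproperty_triples(data))
```

A client that only understands FOAF can then still find the image, which is the practical reason to relate a new predicate to a widely used one.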
Which predicate for linking to choose?
• Depends on your domain
• Is it widely used?
– owl:sameAs
– foaf:knows
– foaf:based_near
–…
• If you create your own, relate it to a widely
used predicate
How to create the links?
• Manually
– Works for small and static data sets
– I want to find another URI that identifies the same
real-world object that I have
• Sindice and Falcons provide indexes of URIs searchable by keyword
• (Semi) Automatic
– Record Linkage/Identity Resolution/Co-reference
– Silk: http://www4.wiwiss.fu-berlin.de/bizer/silk/
– LIMES: http://aksw.org/Projects/limes
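The idea behind (semi) automatic linking can be sketched with nothing more than label comparison. The Python example below proposes owl:sameAs candidates when two labels are similar enough. It is a toy illustration and not a description of how Silk or LIMES work internally: the labels, the 0.9 threshold, and the function name candidate_links are all assumptions, and real tools combine several properties and metrics.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def candidate_links(source, target, threshold=0.9):
    """Propose owl:sameAs triples for label pairs whose similarity meets the threshold.

    `source` and `target` map URIs to human-readable labels (assumed inputs).
    """
    links = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


source = {
    "http://www.bbc.co.uk/music/artists/cc197bad-dc9c-440d-a5b5-d52ba2e14234#artist": "Coldplay",
}
target = {
    "http://dbpedia.org/resource/Coldplay": "Coldplay",
    "http://dbpedia.org/resource/London": "London",
}
for link in candidate_links(source, target):
    print(link)
```

Candidates produced this way still need review, since identical labels do not guarantee identical things.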