Linked Open Data
Linking Open Data
Linking the world of data
from the LOD mailing list
Acknowledgement: Tom Heath (Talis)
Ying Ding
http://info.slis.indiana.edu/~dingying/
What is happening now
User-generated content is growing
tremendously
Isolated content badly needs to be
connected.
The world is connected, and so are data,
information, and knowledge
Old terms
Data – sensing the world
What you sense (see, hear, smell, touch…)
Information – perceiving the world
Perceive the sensed data
Knowledge – contextualizing information
Comprehend the perceived information
Add context
Context ultimately determines what’s actually
what.
What is our daily life
Access data
Manipulate data (add, delete, change)
Process data
Generate information (tables, forms)
Create knowledge (reports, papers..)
Data is our life
Data is our daily bread
Do we have identifiers for data?
Not really important if the data are small and individual
Really important if the data are huge and connected
Do we need identifiers for our data?
Why do we need names or social security numbers?
Can you refer to someone without an identifier?
e.g., "a person with a good heart"
Make our busy life less messy
We have only 24 hours per day, no more
Add identifiers to our data
Give each piece of data a unique identifier that everyone agrees on:
the perfect world of our dreams
We would have no integration problems, and most IT
departments could be closed
Different groups give different identifiers to the same
data. We can live with that; it is closer to our daily
life, and standardization bodies and IT staff are helping us.
We are happy as long as we can refer to data
Where are our data
In computer
On the Web
In my paper notes
In printed books
…
Data are being digitized and made available online
Web Data
Web data
Data on the Web
Online journal
Blog
Wiki
…
Data in physical world
Yourself
Table
Book in library
Computer you are using
…
The boundary is blurring
Paper is both in your hand and on the Web
How to refer to data
Web data
DOI (Digital Object Identifier)
OpenID (people, …)
URI (blog, wiki, homepage, …)
…
URI (Uniform Resource Identifier)
To identify or name a resource on the Internet
The main purpose is to enable interaction with
representations of the resource over a network,
typically the World Wide Web, using specific protocols
–from Wikipedia
URN – like a person’s name
urn:isbn:0-486-27557-4 – the book “Romeo and Juliet”
URL – like a street address
http://www.slis.indiana.edu
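The URN/URL distinction above can be checked mechanically from the URI scheme. A minimal sketch using Python's standard library; the function name `classify_uri` and the short list of "locator" schemes are illustrative assumptions, not part of any standard API.

```python
from urllib.parse import urlparse

def classify_uri(uri: str) -> str:
    """Return 'URN', 'URL', or 'other URI' based only on the scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "URN"   # a name: says what the thing is, not where it is
    if scheme in ("http", "https", "ftp"):
        return "URL"   # a locator: says where to fetch a representation
    return "other URI"

# The two examples from the slide
for uri in ("urn:isbn:0-486-27557-4", "http://www.slis.indiana.edu"):
    print(uri, "->", classify_uri(uri))
```

Both kinds are URIs; only the second can be looked up directly over HTTP, which is why the Linked Data rules later in the deck prefer HTTP URIs.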
Linked Data
A term coined by Tim Berners-Lee
It describes HTTP-based Data Access by
Reference for the Web
The current Web is shifting from hypertext links
(linking documents) to hyperdata links (linking data)
Data are small components of the resources
Linked Data drills down into the details of the resources
Linked data provides a powerful mechanism for
meshing disparate and heterogeneous data
Vision from Sir Tim Berners-Lee
“The Semantic Web isn’t just about putting data on the
web. It is about making links”.
Four Rules for linking data
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information
(URI dereferencing)
Include links to other URIs, so that they can discover more
things
“Breaking them does not destroy anything, but misses an
opportunity to make data interconnected. This in turn
limits the ways it can later be reused in unexpected
ways. It is the unexpected re-use of information which is
the value added by the web”
W3C SWEO Linking Open Data Project
Project aims to
Publish existing open license datasets as linked
data on the web
Interlink things between different data sources
Develop clients and applications that consume
linked data from the web
Bubbles in May 2007
Over 500M RDF triples
Around 120K RDF links between data sources
Bubbles in April 2008
>2B RDF triples
Around 3M RDF links
2011
What are Linked Data?
Linked Data require RDF
Why not XML?
Different model theory
But not all RDF data are linked data
Your RDF data have to comply
with the four rules stated by
Berners-Lee
Do you have linked data
Linked data are just RDF triples
<rdf:Description rdf:about="http://example.org/smith#albert">
  <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
  <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
</rdf:Description>
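To see that the description above really is just two triples, here is a minimal sketch that reads it with Python's standard XML parser. It wraps the snippet in an rdf:RDF root and declares a `fam` namespace URI, which the slide does not give, so `FAM_NS` is an assumption. It only handles the simple rdf:about / rdf:resource form shown here; a real RDF/XML parser covers far more syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FAM_NS = "http://example.org/family#"  # assumed; the slide does not declare it

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:fam="{FAM_NS}">
  <rdf:Description rdf:about="http://example.org/smith#albert">
    <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
    <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
  </rdf:Description>
</rdf:RDF>"""

def extract_triples(rdf_xml: str):
    """Return (subject, predicate, object) tuples for simple rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree stores tags as "{namespace}local"; join them into one URI
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            triples.append((subject, namespace + local, obj))
    return triples

for triple in extract_triples(DOC):
    print(triple)
```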
How can I get RDF triples
Relational database:
D2R tools can convert them for you
RDFizers from SIMILE:
Can convert JPEG, MARC/MODS, OAI-PMH, OCW (MIT
OpenCourseWare), Email, BibTeX, Java, Javadoc, etc. to RDF
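The idea behind such converters can be sketched in a few lines: each row becomes a subject URI, each column a predicate, each cell a literal. This is not D2R's actual mapping language, only an illustration of the principle; the base URI, the table, and its contents are made up for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI

def rows_to_ntriples(conn, table, key_column):
    """Turn every row of `table` into N-Triples lines (one per non-key column).

    `table` is interpolated into SQL, so pass trusted names only.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = f"<{BASE}vocab/{table}#{column}>"
            # Escape backslashes and quotes; newlines are not handled in this sketch
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines

# Made-up sample table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 'Berlin')")
for line in rows_to_ntriples(conn, "person", "id"):
    print(line)
```

In a real conversion the `city` column would more usefully become a link to an existing URI (for example a DBpedia or Geonames resource) rather than a plain literal.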
Rules of thumb
Understand your data
What do you want to have in your data
Do not reinvent – REUSE!
Potential ontologies/vocabularies
• FOAF, Dublin Core, SKOS
URI Aliases
Different URIs for the same non-information resource
(Berlin, etc.)
Use owl:sameAs to link these URI aliases
More principles
Linked Data is simply about using the Web
to create typed links between data from
different sources.
The principles of Linked Data are to:
Use the RDF data model to publish structured
data on the web
Use RDF links to interlink data from different
data sources
Use HTTP URIs to identify resources
Avoid other URI schemes (URNs or DOIs)
Power of Linked Data
[Figure: example RDF graph. Node labels: ying, foaf:Person, dblp:publications, "Ying Ding", Stefan, db:Galway, 72K, dp:Dublin, dp:Cities_in_Ireland. Edge labels: rdf:type, foaf:name, foaf:publication, foaf:knows, foaf:based_near, dp:population, skos:subject.]
How to become a bubble
Publishing your bubble
Are you ready?
Dereferencing HTTP URIs
Information resources (resources available on the
web):
• HTTP GET returns HTTP response code 200 OK
Non-information resources (real-world objects that
exist outside of the web):
• HTTP GET returns HTTP 303 See Other (303 redirect)
You are not your homepage, but you can be
dereferenced by your homepage
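The 200 versus 303 behaviour can be sketched without a network, as a lookup function plus a client that follows the redirect. All URIs and the two dictionaries are hypothetical; a real server would implement the same decision inside its HTTP handler.

```python
# Hypothetical URI space: documents are information resources,
# "thing" URIs name real-world objects that a document describes.
DOCUMENTS = {"http://example.org/doc/ying": "<html>Homepage of Ying</html>"}
THINGS = {"http://example.org/id/ying": "http://example.org/doc/ying"}

def dereference(uri):
    """Return (status, headers, body) roughly as a Linked Data server might."""
    if uri in DOCUMENTS:
        return 200, {}, DOCUMENTS[uri]             # information resource
    if uri in THINGS:
        return 303, {"Location": THINGS[uri]}, ""  # non-information resource
    return 404, {}, ""

def fetch(uri, max_hops=5):
    """Follow 303 redirects; return (final status, final URI, body)."""
    for _ in range(max_hops):
        status, headers, body = dereference(uri)
        if status != 303:
            return status, uri, body
        uri = headers["Location"]
    raise RuntimeError("too many redirects")

print(dereference("http://example.org/id/ying"))
print(fetch("http://example.org/id/ying"))
```

The person's URI never returns a body itself; it only points to a document about the person, which keeps the thing and its description distinct.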
Publish your bubble
Step 1: Choosing URIs
Use HTTP URIs for everything (http://)
Make them dereferenceable
Try to use existing dereferenceable URIs to represent
common things (city, music, artist, etc.):
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
For instance: Geonames, DBpedia, MusicBrainz, DBTune, RDF
Book Mashup
Keep implementation info out of your URIs
Keep your URIs stable and persistent
http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Berlin
http://dbpedia.org/data/Berlin
http://id.dbpedia.org/Berlin
http://pages.dbpedia.org/Berlin
http://data.dbpedia.org/Berlin
http://dbpedia.org/Berlin
http://dbpedia.org/Berlin.html
http://dbpedia.org/Berlin.rdf
Reference: Sauermann et al.: Cool URIs
for the Semantic Web (tutorial on URI
dereferencing and content-negotiation)
Publish your bubble
Step 2: Choose the vocabularies to represent information
Reuse terms from well-known vocabularies wherever possible
Friend of a Friend (FOAF)
Dublin Core (DC)
Semantically-Interlinked Online Communities (SIOC)
Description of a Project (DOAP)
Simple Knowledge Organization System (SKOS)
Creative Commons (CC)
More:
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
You should only define new terms yourself if you cannot find
required terms in existing vocabularies
If you really have to define your own vocabularies:
Do not define new vocabularies from scratch
Provide documentation for both humans and machines (rdfs:comment,
rdfs:label)
Make term URIs dereferenceable
Make use of other people’s terms
State all important information explicitly
Do not create over-constrained, brittle models; leave some
flexibility for growth
Publish your bubble
Step 3: Link your bubble with other bubbles
RDF links enable browsers and crawlers to
navigate between data sources and to discover
additional data.
foaf:knows, foaf:based_near, foaf:topic_interest
owl:sameAs (map different URI aliases)
Auto-generating RDF Links:
ISBN for books (e.g., RDF Book Mashup)
<http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince>
owl:sameAs <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088>
More complex property-based algorithms
Interlinking DBpedia and Geonames
Interlinking Jamendo and MusicBrainz
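The simplest case above, linking on a shared key such as an ISBN, can be sketched as a join between two lookup tables. The two dictionaries stand in for real datasets and are assumptions for illustration, as is the hyphenated spelling of the ISBN; the URIs are the ones from the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize_isbn(isbn):
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()

def link_by_isbn(source_a, source_b):
    """source_*: dict mapping ISBN string -> resource URI. Returns N-Triples lines."""
    index_b = {normalize_isbn(k): v for k, v in source_b.items()}
    links = []
    for isbn, uri_a in source_a.items():
        uri_b = index_b.get(normalize_isbn(isbn))
        if uri_b is not None:
            links.append(f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> .")
    return links

# Stand-ins for two datasets that both record ISBNs
dbpedia = {"0-7475-8108-8": "http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince"}
bookmashup = {"0747581088": "http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088"}

for line in link_by_isbn(dbpedia, bookmashup):
    print(line)
```

Property-based linking (for example DBpedia with Geonames) needs fuzzier matching on names, coordinates, and types, which this exact-key join does not attempt.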
Publish your bubble
Recipes for publishing different information as
Linked Data on the Web
Things must be identified with dereferenceable HTTP
URIs
If such a URI is dereferenced asking for the MIME-type
application/rdf+xml, a data source must return an
RDF/XML description of the identified resource
URIs that identify non-information resources should
return an HTTP 303 redirect
Besides RDF links to resources within the same data
source, RDF descriptions should also contain RDF
links to resources in other data sources, so that clients can browse
the web of data.
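The recipe above amounts to content negotiation on the Accept header. A minimal sketch, assuming the /resource/, /page/, and /data/ URI layout listed in Step 1; the function names and the simplified Accept parsing are illustrative, and the exact redirect targets of any real server may differ.

```python
def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            ranked.append((-q, position, media_type))
    return [m for _, _, m in sorted(ranked)]

def negotiate(resource_uri, accept):
    """Answer a GET on a /resource/ URI with a 303 to /data/ (RDF) or /page/ (HTML)."""
    base, name = resource_uri.rsplit("/resource/", 1)  # assumes this URI layout
    for media_type in parse_accept(accept):
        if media_type == "application/rdf+xml":
            return 303, f"{base}/data/{name}"
        if media_type in ("text/html", "*/*"):
            return 303, f"{base}/page/{name}"
    return 406, None  # nothing acceptable on offer

print(negotiate("http://dbpedia.org/resource/Berlin", "application/rdf+xml"))
print(negotiate("http://dbpedia.org/resource/Berlin", "text/html,application/rdf+xml;q=0.9"))
```

A browser that prefers HTML ends up on the page; a Linked Data client asking for application/rdf+xml ends up on the RDF description of the same thing.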
Test your bubble
Step 4: Test and debug Linked Data
Vapour: a Linked Data validation service
(http://vapour.sourceforge.net/)
Use Linked Data browsers to check whether your
information displays correctly and your RDF links
work
Tabulator, Marbles, OpenLink RDF Browser, Disco
Welcome to the bubble world
Very excited!
Then what is my contribution and benefit?
Add more data to RDF data
Increase semantic content
…
…
Bring Web to its full potential!
Create your own LOD
Step 1: Select >2 datasets/tables (e.g., music
data + Freebase, YAGO, or DBpedia)
Step 2: Define the URI naming convention
Step 3: Reuse popular existing metadata
schemas via namespaces (FOAF, Dublin Core,
schema.org, etc.)
Step 4: Convert them into RDF triples
Step 5: Add owl:sameAs to connect the dots
Step 6: Browse your newly created LOD using
D2R Server or other tools
Creating your own LOD
URI naming convention
For example: Chem2Bio2RDF
Entity type lists
(http://chem2bio2owl.wikispaces.com/Version+1.0)
For each entity:
http://chem2bio2rdf.org/databasename/resource/databasename_entity/entityIDfromdatabase
For example:
<http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00333>
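A naming convention like this is easiest to keep consistent when every URI is minted by one small function. A minimal sketch following the pattern above; the function name `mint_uri` and the percent-encoding of identifiers are assumptions, not something the Chem2Bio2RDF project documents here.

```python
from urllib.parse import quote

BASE = "http://chem2bio2rdf.org"

def mint_uri(database, entity, entity_id):
    """Build <base>/<database>/resource/<database>_<entity>/<id>."""
    db = quote(database, safe="")
    kind = quote(entity, safe="")
    # Encode the ID fully so characters like "/" cannot change the path structure
    ident = quote(str(entity_id), safe="")
    return f"{BASE}/{db}/resource/{db}_{kind}/{ident}"

print(mint_uri("drugbank", "drug", "DB00333"))
```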
Creating your own LOD
Add owl:sameAs
For example: Metformin (drug)
<http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00331>
    owl:sameAs <http://bio2rdf.org/drugbank_drugs:DB00331> ;
    owl:sameAs <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00331> ;
    owl:sameAs <http://www.dbpedia.org/resource/Metformin> ;
    owl:sameAs <http://www4.wiwiss.fu-berlin.de/dailymed/resource/ingredient/Metformin> ;
    owl:sameAs <http://www.freebase.com/guid/9202a8c04000641f8000000000194e39> .
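Because owl:sameAs is symmetric and transitive, a set of such links partitions URIs into groups of aliases. A minimal sketch that computes those groups with a union-find structure; the function name is an assumption, and the sample pairs use three of the Metformin URIs above plus one unrelated made-up pair.

```python
def same_as_groups(pairs):
    """Group URIs connected by owl:sameAs links (given as (a, b) pairs)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

METFORMIN = "http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00331"
pairs = [
    (METFORMIN, "http://bio2rdf.org/drugbank_drugs:DB00331"),
    (METFORMIN, "http://www.dbpedia.org/resource/Metformin"),
    ("http://a.example/x", "http://b.example/y"),  # unrelated, made-up pair
]
for group in same_as_groups(pairs):
    print(sorted(group))
```

An application can then pick one URI per group as the canonical name and merge the statements made about all of its aliases.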
What can LOD bring?
It will lift the current document web up to a data web
LOD browsers let you navigate between
different data sources by following RDF links.
It can drill down to finer-grained
information
allowing more fine-grained search on the web
making question-answering search on the Web
possible
mashing up different data through RDF links
making it easier to build applications on top
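Navigating by following RDF links can be sketched as a breadth-first walk in which each visited URI is looked up and the URIs in its triples are queued next. The `WEB` dictionary is a made-up stand-in for dereferencing over HTTP, and treating every object that starts with "http" as a link is a simplifying assumption.

```python
from collections import deque

def follow_links(start, lookup, max_resources=100):
    """Breadth-first "follow your nose" walk over RDF links.

    lookup(uri) returns the (subject, predicate, object) triples
    obtained by dereferencing uri.
    """
    seen, queue, order = {start}, deque([start]), []
    while queue and len(order) < max_resources:
        uri = queue.popleft()
        order.append(uri)
        for _subject, _predicate, obj in lookup(uri):
            # Only URI objects are links; literals are skipped
            if obj.startswith("http") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order

# Three hypothetical data sources, each describing one resource
WEB = {
    "http://a.example/ying": [("http://a.example/ying", "foaf:knows", "http://b.example/stefan")],
    "http://b.example/stefan": [
        ("http://b.example/stefan", "foaf:name", "Stefan"),
        ("http://b.example/stefan", "foaf:based_near", "http://c.example/Galway"),
    ],
}

print(follow_links("http://a.example/ying", lambda uri: WEB.get(uri, [])))
```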
Document Web vs. Data Web
Document Web
Glued by hyperlinks
Data are HTML pages
Query result is HTML pages, which cannot be further processed
Data are just interlinked, but not integrated
Data access through different APIs
Data Web
Glued by RDF links
Data are RDF triples
Query result is RDF triples, which can easily be further processed (e.g., by web services)
Data are interlinked and integrated, and links are typed
Data access through a single, standardized access mechanism (maybe it will be called the LOD API in the future?)