Linked Open Data

Linking Open Data
Linking the world of data
from the LOD mailing list
Acknowledgement: Tom Heath (Talis)
Ying Ding
http://info.slis.indiana.edu/~dingying/
What is now
User-generated content is growing tremendously
Isolated content badly needs to get connected
The world is connected, and so are data, information and knowledge
Old terms
• Data – sensing the world
What you sense (see, hear, smell, touch…)
• Information – perceiving the world
Perceive the sensed data
• Knowledge – contextualizing information
Comprehend the perceived information
Add context
Context ultimately determines what’s actually what.
What is our daily life
Access data
Manipulate data (add, delete, change)
Process data
Generate information (tables, forms)
Create knowledge (reports, papers..)
Data is our life
• Data is our daily bread
• Do we have identifiers for data?
Not really important if the data is small and individual
Really important if the data is huge and connected
  • Do we need identifiers for our data?
  • Why do we need our names or social security numbers?
  • Can you refer to someone without an identifier?
  • "A person with a good heart" …
Make our busy life less messy
• We only have 24 hours per day, not more
• Add identifiers to our data
Give a universally agreed unique identifier to each piece of data: the perfect world of our dreams
• We would not have any integration problems, and most IT departments could be closed
Different groups give different identifiers to the same data: we can live with that, it is closer to our daily life, and standardization bodies and IT people are helping us.
We are happy that we can refer to data
Where are our data
• In computers
• On the Web
• In my paper notes
• In printed books
…
Data are being digitized and made available online
Web Data
Web data
• Data on the Web
  • Online journal
  • Blog
  • Wiki
…
• Data in the physical world
  • Yourself
  • Table
  • Book in a library
  • Computer you are using
…
• The boundary is blurring
  • A paper is both in your hand and on the Web
How to refer to data
Web data
DOI (Digital Object Identifier)
OpenID (people, …)
URI (blog, wiki, homepage, …)
…
URI (Uniform Resource Identifier)
• Identifies or names a resource on the Internet
• The main purpose is to enable interaction with representations of the resource over a network, typically the WWW, using specific protocols (from Wikipedia)
URN – like a person’s name
• urn:isbn:0-486-27557-4 – the book “Romeo and Juliet”
URL – like a street address
• http://www.slis.indiana.edu
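The URN/URL distinction above can be seen by splitting the two example identifiers into their parts. This is only a sketch: the helper name describe_uri and the rough "URN / URL / other" labelling rule are assumptions made for illustration, not part of any standard.

```python
from urllib.parse import urlparse


def describe_uri(uri: str) -> dict:
    """Split a URI into its scheme and the remainder, and label it roughly."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        kind = "URN"    # a name: says what the thing is called
    elif parts.netloc:
        kind = "URL"    # a location: says where to fetch it
    else:
        kind = "other URI"
    return {"scheme": parts.scheme, "kind": kind,
            "rest": parts.netloc or parts.path}


for example in ("urn:isbn:0-486-27557-4", "http://www.slis.indiana.edu"):
    print(describe_uri(example))
```

Both strings are URIs; only the second one tells a client which server to contact.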
Linked Data
• A term coined by Tim Berners-Lee
• It describes HTTP-based data access by reference for the Web
• The current Web is changing from hypertext links (linking documents) to hyperdata links (linking data)
Data are small components of the resources
Linked data drills deep into the details of the resources
• Linked data provides a powerful mechanism for meshing disparate and heterogeneous data
Vision from Sir Tim Berners-Lee
• “The Semantic Web isn’t just about putting data on the web. It is about making links”.
• Four rules for linking data
  • Use URIs as names for things
  • Use HTTP URIs so that people can look up those names
  • When someone looks up a URI, provide useful information (URI dereferencing)
  • Include links to other URIs, so that they can discover more things
• “Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web”
W3C SWEO Linking Open Data Project
Project aims to
Publish existing open license datasets as linked
data on the web
Interlink things between different data sources
Develop clients and applications that consume
linked data from the web
Bubbles in May 2007
Over 500M RDF triples
Around 120K RDF links between data sources
Bubbles in April 2008
>2B RDF triples
Around 3M RDF links
2011
What are Linked Data?
Linked Data require RDF
Why not XML?
Different model theory
But not all RDF data are linked data
Your RDF data must comply with the four rules stated by Berners-Lee
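A rough way to see the difference between "some RDF" and "linked data" is to test a set of triples against the rules that can be checked offline. The sketch below assumes triples are plain (subject, predicate, object) string tuples; the function names and the other.example.net URI are hypothetical. Rule 3 (dereferencing) needs real HTTP requests and is not covered.

```python
from urllib.parse import urlparse

Triple = tuple[str, str, str]


def is_http_uri(term: str) -> bool:
    """True if the term is an absolute http(s) URI."""
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_linked_data(triples: list[Triple]) -> dict[str, bool]:
    """Offline approximation of rules 1, 2 and 4 of the four rules."""
    subject_hosts = {urlparse(s).netloc for s, _, _ in triples}
    return {
        # Rules 1 and 2: things and properties are named with HTTP URIs.
        "http_uris_as_names": all(
            is_http_uri(s) and is_http_uri(p) for s, p, _ in triples
        ),
        # Rule 4: at least one object points to a URI on another host.
        "links_to_other_sources": any(
            is_http_uri(o) and urlparse(o).netloc not in subject_hosts
            for _, _, o in triples
        ),
    }


local_only = [
    ("http://example.org/smith#albert",
     "http://example.org/family#hasChild",
     "http://example.org/smith#brian"),
]
linked = local_only + [
    ("http://example.org/smith#albert",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://other.example.net/people/albert"),  # hypothetical external URI
]
print(check_linked_data(local_only))
print(check_linked_data(linked))
```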
Do you have linked data?
• Linked data are just RDF triples
<rdf:Description rdf:about="http://example.org/smith#albert">
  <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
  <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
</rdf:Description>
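To make the "just triples" point concrete, the description above can be unpacked into (subject, predicate, object) tuples. This sketch uses only the Python standard library and handles just this simple shape of RDF/XML; the rdf:RDF wrapper and the namespace URI bound to the fam prefix are assumptions, since the slide does not show them. A full RDF parser would be the better choice for real data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The fam namespace URI below is hypothetical.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:fam="http://example.org/family#">
  <rdf:Description rdf:about="http://example.org/smith#albert">
    <fam:hasChild rdf:resource="http://example.org/smith#brian"/>
    <fam:hasChild rdf:resource="http://example.org/smith#carol"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml: str) -> list[tuple[str, str, str]]:
    """Read rdf:Description blocks whose properties carry rdf:resource."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores tags as "{namespace}local"; the predicate
            # URI is the namespace followed by the local name.
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is not None:
                triples.append((subject, namespace + local, obj))
    return triples


for triple in extract_triples(RDF_XML):
    print(triple)
```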
• How can I get RDF triples?
Relational databases:
• D2R tools can convert them for you
RDFizers from SIMILE:
• Can convert JPEG, MARC/MODS, OAI-PMH, OCW (MIT OpenCourseWare), Email, BibTeX, Java, Javadoc, etc. to RDF
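The idea behind converting a relational table can be sketched by hand: each row becomes a subject URI, each column a predicate, each cell a literal. This is not how D2R is implemented or configured; the base URI, the table, its rows and the function name are all made up for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI


def rows_to_triples(conn, table: str, key_column: str) -> list[str]:
    """Emit one N-Triples line per non-key, non-NULL cell of the table.

    The table name is interpolated into SQL, so pass trusted names only.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = f"<{BASE}{table}#{column}>"
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{escaped}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Albert"), (2, "Brian")])
for line in rows_to_triples(conn, "person", "id"):
    print(line)
```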
Rules of thumb
Understand your data
What do you want to have in your data
Do not reinvent – REUSE!
Potential ontologies/vocabularies
• FOAF, Dublin Core, SKOS
URI Aliases
Different URIs for the same non-information resource
(Berlin, etc.)
owl:sameAs to link these URI aliases
More principles
Linked Data is simply about using the Web to create typed links between data from different sources.
The principles of Linked Data are to:
Use the RDF data model to publish structured data on the web
Use RDF links to interlink data from different data sources
Use HTTP URIs to identify resources
Avoid other URI schemes (URNs or DOIs)
Power of Linked Data
[Figure: RDF graph around the node "ying". Node labels: foaf:Person, dblp:publications, "Ying Ding", Stefan, db:Galway, 72K, dp:Dublin, dp:Cities_in_Ireland. Edge labels: rdf:type, foaf:name, foaf:publication, foaf:knows, foaf:based_near, dp:population, skos:subject.]
How to become a bubble
Publishing your bubble
Are you ready?
Dereferencing HTTP URIs
Information resources (resources available on the web):
• HTTP GET returns HTTP response code 200 OK
Non-information resources (real-world objects that exist outside of the web):
• HTTP GET returns HTTP 303 See Other (303 redirect)
You are not your homepage, but your URI can be dereferenced to your homepage
Publish your bubble
• Step 1: Choosing URIs
Use HTTP URIs for everything (http://)
Make them dereferenceable
• Try to use existing dereferenceable URIs to represent common things (city, music, artist, etc.): http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• For instance: Geonames, DBpedia, MusicBrainz, DBTune, RDF Book Mashup
Keep implementation info out of your URIs
Keep your URIs stable and persistent
Publish your bubble
• Step 1: Choosing URIs
http://dbpedia.org/resource/Berlin
http://dbpedia.org/page/Berlin
http://dbpedia.org/data/Berlin
http://id.dbpedia.org/Berlin
http://pages.dbpedia.org/Berlin
http://data.dbpedia.org/Berlin
http://dbpedia.org/Berlin
http://dbpedia.org/Berlin.html
http://dbpedia.org/Berlin.rdf
Reference: Sauermann et al.: Cool URIs for the Semantic Web (tutorial on URI dereferencing and content negotiation)
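The first three URIs in the list suggest one pattern: a /resource/ URI names the thing, and a client is sent to /page/ (HTML) or /data/ (RDF) depending on what it asks for. A minimal sketch of that decision follows; the function name is an assumption, and the Accept header handling is deliberately simplified (q-values are ignored).

```python
def redirect_target(resource_uri: str, accept: str) -> str:
    """Pick the document URI a 303 redirect should point to."""
    prefix = "http://dbpedia.org/resource/"
    if not resource_uri.startswith(prefix):
        raise ValueError("not a /resource/ URI: " + resource_uri)
    name = resource_uri[len(prefix):]
    # Reduce "type/subtype;q=0.9, ..." to a list of bare media types.
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept.split(",")]
    if "application/rdf+xml" in media_types:
        return "http://dbpedia.org/data/" + name   # RDF description
    return "http://dbpedia.org/page/" + name       # HTML page for people


berlin = "http://dbpedia.org/resource/Berlin"
print(redirect_target(berlin, "application/rdf+xml"))
print(redirect_target(berlin, "text/html,application/xhtml+xml;q=0.9"))
```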
Publish your bubble
• Step 2: Choose the vocabularies to represent information
• Reuse terms from well-known vocabularies wherever possible
  • Friend of a Friend (FOAF)
  • Dublin Core (DC)
  • Semantically-Interlinked Online Communities (SIOC)
  • Description of a Project (DOAP)
  • Simple Knowledge Organization System (SKOS)
  • Creative Commons (CC)
  • More: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• You should only define new terms yourself if you cannot find the required terms in existing vocabularies
Publish your bubble
• Step 2: Choose the vocabularies to represent information
If you really have to define your own vocabulary:
• Do not define new vocabularies from scratch
• Provide descriptions for both humans and machines (rdfs:comment, rdfs:label)
• Make term URIs dereferenceable
• Make use of other people’s terms
• State all important information explicitly
• Do not create over-constrained, brittle models; leave some flexibility for growth
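As a small illustration of these guidelines, a new property can be described with a label and a comment and tied to an existing term. The sketch prints N-Triples lines; the hasChild term, its texts, the function name and the other.example.net parent property are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe_term(term_uri, label, comment, sub_property_of=None):
    """Return N-Triples lines documenting a new property."""
    lines = [
        f"<{term_uri}> <{RDF}type> <{RDF}Property> .",
        f"<{term_uri}> <{RDFS}label> {literal(label)} .",      # for humans
        f"<{term_uri}> <{RDFS}comment> {literal(comment)} .",  # for humans
    ]
    if sub_property_of:
        # Reuse: relate the new term to somebody else's term.
        lines.append(
            f"<{term_uri}> <{RDFS}subPropertyOf> <{sub_property_of}> ."
        )
    return lines


term_lines = describe_term(
    "http://example.org/family#hasChild",
    "has child",
    "Relates a person to one of their children.",
    sub_property_of="http://other.example.net/vocab#parentOf",  # hypothetical
)
print("\n".join(term_lines))
```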
Publish your bubble
Step 3: Link your bubble with other bubbles
RDF links enable browsers and crawlers to
navigate between data sources and to discover
additional data.
foaf:knows, foaf:based_near, foaf:topic_interest
owl:sameAs (map different URI aliases)
Publish your bubble
Step 3: Link your bubble with other bubbles
Auto-generating RDF Links:
ISBN for books (e.g., RDF Book Mashup)
<http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince>
owl:sameAs <http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088>
More complex property-based algorithms
Interlinking DBpedia and Geonames
Interlinking Jamendo and MusicBrainz
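The ISBN case is the simplest kind of link generation: two sources are joined on a shared key, and every match yields an owl:sameAs triple. The sketch assumes each source is a dict from resource URI to ISBN; the function names and the input dicts are illustrative, with the ISBN taken from the Book Mashup URI shown above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def generate_same_as_links(source_a: dict, source_b: dict) -> list:
    """Return (uri_a, owl:sameAs, uri_b) for resources sharing an ISBN."""
    by_isbn = {normalize_isbn(isbn): uri for uri, isbn in source_b.items()}
    links = []
    for uri, isbn in source_a.items():
        match = by_isbn.get(normalize_isbn(isbn))
        if match is not None:
            links.append((uri, OWL_SAME_AS, match))
    return links


# Illustrative input: URI -> ISBN as each source might record it.
dbpedia_books = {
    "http://dbpedia.org/resource/Harry_Potter_and_the_Half-Blood_Prince":
        "0-7475-8108-8",
}
book_mashup = {
    "http://www4.wiwiss.fu-berlin.de/bookmashup/books/0747581088":
        "0747581088",
}
links = generate_same_as_links(dbpedia_books, book_mashup)
for link in links:
    print(link)
```

Property-based matching for places or music works on the same principle but compares several properties (names, coordinates, durations) instead of one exact key.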
Publish your bubble
• Recipes for publishing different information as Linked Data on the Web
Things must be identified with dereferenceable HTTP URIs
If such a URI is dereferenced asking for the MIME type application/rdf+xml, the data source must return an RDF/XML description of the identified resource
URIs that identify non-information resources should return an HTTP 303 redirect
Besides RDF links to resources within the same data source, RDF descriptions should also contain RDF links to resources in other data sources, so that you can browse the web of data.
Test your bubble
Step 4: Test and debug linked data
Vapour: a Linked Data validation service (http://vapour.sourceforge.net/)
Use Linked Data browsers to check whether your information displays correctly and your RDF links work
Tabulator, Marbles, OpenLink RDF Browser, Disco
Welcome to the bubble world
Very excited!
Then what is my contribution and benefit?
Add more data to RDF data
Increase semantic content
…
…
Bring Web to its full potential!
Create your own LOD
• Step 1: Select >2 datasets/tables (e.g., music data + Freebase, YAGO or DBpedia)
• Step 2: Define the URI naming convention
• Step 3: Try to use existing popular metadata schemas via namespaces (FOAF, Dublin Core, schema.org, etc.)
• Step 4: Convert them into RDF triples
• Step 5: Add owl:sameAs to connect the dots
• Step 6: Browse your just-created LOD using D2R Server or other tools
Creating your own LOD
• URI naming convention
• For example: Chem2Bio2RDF
Entity type lists (http://chem2bio2owl.wikispaces.com/Version+1.0)
For each entity:
• http://chem2bio2rdf.org/databasename/resource/databasename_entity/entityIDfromdatabase
For example:
• <http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00333>
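A naming convention like this is easiest to keep consistent when every URI is minted by one small function. The sketch below follows the pattern shown above; the function name and the percent-encoding of the ID are assumptions.

```python
from urllib.parse import quote


def mint_uri(database: str, entity: str, entity_id: str) -> str:
    """Build a URI of the form
    http://chem2bio2rdf.org/<db>/resource/<db>_<entity>/<id>."""
    return (
        f"http://chem2bio2rdf.org/{database}/resource/"
        f"{database}_{entity}/{quote(entity_id, safe='')}"
    )


print(mint_uri("drugbank", "drug", "DB00333"))
```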
Creating your own LOD
• Add owl:sameAs
• For example: Metformin (drug)
<http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00331>
owl:sameAs <http://bio2rdf.org/drugbank_drugs:DB00331>
owl:sameAs <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB00331>
owl:sameAs <http://www.dbpedia.org/resource/Metformin>
owl:sameAs <http://www4.wiwiss.fu-berlin.de/dailymed/resource/ingredient/Metformin>
owl:sameAs <http://www.freebase.com/guid/9202a8c04000641f8000000000194e39>
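Once owl:sameAs links exist, a consumer typically wants to know which URIs denote the same thing, including aliases that are only connected indirectly. One common way to compute this is a union-find over the sameAs pairs; the sketch below is a generic illustration, not a description of any particular LOD tool.

```python
def alias_groups(same_as_pairs):
    """Group URIs into sets of aliases connected by owl:sameAs links."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


metformin = "http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00331"
pairs = [
    (metformin, "http://bio2rdf.org/drugbank_drugs:DB00331"),
    (metformin, "http://www.dbpedia.org/resource/Metformin"),
]
for group in alias_groups(pairs):
    print(sorted(group))
```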
What can LOD bring?
• It will lift the current document web up to a data web
• LOD browsers let you navigate between different data sources by following RDF links.
• It can drill down to a finer granularity of information:
allowing finer-grained search on the web
making question-answering search on the Web possible
meshing up different data through RDF links
making it easier to build applications on top
Document Web vs. Data Web
• Document Web
  • Glued by hyperlinks
  • Data are HTML pages
  • Query results are HTML pages, which cannot be further processed
  • Data are just interlinked, but not integrated
  • Data access through different APIs
• Data Web
  • Glued by RDF links
  • Data are RDF triples
  • Query results are RDF triples, which can easily be further processed (e.g., by web services)
  • Data are interlinked and integrated, and links are typed
  • Data access through a single, standardized access mechanism (maybe it will be called the LOD API in the future?)
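The point that query results on the data web are themselves triples, and so can feed the next query, can be shown with a toy pattern matcher. The triples below are made up (they loosely echo labels used earlier in the slides), the ex namespace is hypothetical, and a real system would use a query language such as SPARQL.

```python
EX = "http://example.org/"  # hypothetical namespace


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = [
    (EX + "Galway", EX + "subject", EX + "Cities_in_Ireland"),
    (EX + "Dublin", EX + "subject", EX + "Cities_in_Ireland"),
    (EX + "Galway", EX + "population", "72000"),
]

# First query: which things are filed under Cities_in_Ireland?
cities = [s for s, _, _ in
          match(triples, p=EX + "subject", o=EX + "Cities_in_Ireland")]

# Second query reuses the first result: look up each city's population.
populations = {
    city: [o for _, _, o in match(triples, s=city, p=EX + "population")]
    for city in cities
}
print(populations)
```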