Slide 1 - VideoLectures.NET

Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann
Freie Universität Berlin, Universität Leipzig
Querying Wikipedia like a Database
Domain-specific Data
Title
Images
Description
Languages
Infoboxes
Web Links
Categorization
Infobox Extraction
dbpedia:Albert_Einstein p:name "Albert Einstein"
dbpedia:Albert_Einstein p:birth_place dbpedia:Ulm
dbpedia:Albert_Einstein p:birth_date "1879-03-14"
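The extraction step can be pictured as turning each "key = value" row of an infobox into one triple about the article. The following is only a minimal sketch under simplifying assumptions: the infobox source text, the function name extract_triples, and the rule "a value that is a single wiki link becomes a resource, anything else a literal" are illustrative and not taken from the DBpedia extraction code.

```python
import re

# Hypothetical, heavily simplified infobox source; real templates are messier.
INFOBOX = """{{Infobox scientist
| name        = Albert Einstein
| birth_place = [[Ulm]]
}}"""

# Matches a value that consists of exactly one wiki link, e.g. [[Ulm]] or [[Ulm|the city]].
LINK = re.compile(r"^\[\[([^\]|]+)(?:\|[^\]]*)?\]\]$")


def extract_triples(page_title, wikitext):
    """Return (subject, predicate, object) strings for each infobox row."""
    subject = "dbpedia:" + page_title.replace(" ", "_")
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        # Only rows of the form "| key = value" carry data.
        if not line.startswith("|") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        key, value = key.strip(), value.strip()
        if not value:
            continue
        match = LINK.match(value)
        if match:
            # A wiki link points at another article, so it becomes a resource.
            obj = "dbpedia:" + match.group(1).strip().replace(" ", "_")
        else:
            # Everything else is kept as a plain literal.
            obj = '"' + value + '"'
        triples.append((subject, "p:" + key, obj))
    return triples


triples = extract_triples("Albert Einstein", INFOBOX)
for triple in triples:
    print(" ".join(triple))
```

The sketch ignores nested templates, units, and multi-valued rows, which are the cases the rule-based parsers mentioned on the following slides deal with.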
Property Synonyms
Structuring Wikipedia's Knowledge
• Structuring actual data, not modeling the
world
• Bound to Wikipedia Templates, parsers handle
template values based on rules (property
splitting, merging, transformation)
DBpedia Ontology
• DBpedia Ontology built from scratch
• 170 classes, 900 properties
No living things
Class Hierarchy
„Select all TV Episodes …“
Template Mapping
Class TV Episode (Work)
Wikipedia Templates:
Television Episode
UK Office Episode
Simpsons Episode
DoctorWhoBox
Template Mapping
Infobox Cricketer
Infobox Historic Cricketer
Infobox Recent Cricketer
Infobox Old Cricketer
Infobox Cricketer Biography
=> Class Cricketer (Athlete)
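One way to read the two mapping slides is as a lookup table from template names to ontology classes, plus a superclass relation. The sketch below uses only the template and class names shown on the slides; the dictionary layout and the helper names classes_for and templates_for are assumptions made for illustration, not the project's actual mapping format.

```python
# Template -> ontology class, as listed on the slides.
TEMPLATE_TO_CLASS = {
    "Television Episode": "TV Episode",
    "UK Office Episode": "TV Episode",
    "Simpsons Episode": "TV Episode",
    "DoctorWhoBox": "TV Episode",
    "Infobox Cricketer": "Cricketer",
    "Infobox Historic Cricketer": "Cricketer",
    "Infobox Recent Cricketer": "Cricketer",
    "Infobox Old Cricketer": "Cricketer",
    "Infobox Cricketer Biography": "Cricketer",
}

# Superclasses, taken from the parentheses on the slides.
SUPERCLASS = {"TV Episode": "Work", "Cricketer": "Athlete"}


def classes_for(template):
    """Return the class of a template followed by its known superclasses."""
    chain = []
    cls = TEMPLATE_TO_CLASS.get(template)
    while cls is not None:
        chain.append(cls)
        cls = SUPERCLASS.get(cls)
    return chain


def templates_for(cls):
    """Return all templates whose articles count as instances of cls."""
    return sorted(t for t in TEMPLATE_TO_CLASS if cls in classes_for(t))


print(classes_for("Simpsons Episode"))
print(templates_for("Athlete"))
```

With such a table, a request like "select all TV Episodes" no longer needs to know which of the four templates an article happens to use.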
People
Actors
Athlete
Journalist
MusicalArtist
Politician
Scientist
Writer
Places
Airport
City
Country
Island
Mountain
River
Organisations
Band
Company
Educational Institution
Radio Station
Sports Team
Event
Convention
Military Conflict
Music Event
Sport Event
Work
Book
Broadcast
Film
Software
Television
More structured data
• Categories in SKOS
• Intra-wiki links
• Disambiguation
• Redirects
• Links to Images (and Flickr)
• Links to external webpages
• Data about 2.6 million “things”
• 274 million pieces of information (RDF triples)
Multilingual Abstracts
– English: 2,613,000
– German: 391,000
– French: 383,000
– Dutch: 284,000
– Polish: 256,000
– Italian: 286,000
– Spanish: 226,000
– Japanese: 199,000
– Portuguese: 246,000
– Swedish: 144,000
– Chinese: 101,000
DBpedia as Linked Data Hub
Semantic Web
“My document can point at your document on
the Web, but my database can't point at
something in your database without writing
special purpose code. The Semantic Web aims
at fixing that.”
Prof. James Hendler
Web of Documents
[Figure: search engines and web browsers access HTML documents A, B, C and D over HTTP; the documents are connected by hyperlinks.]
Web of Data
[Figure: Linked Data mashups, search engines and Linked Data browsers access data sources A, B, C, D and E over HTTP; the things they describe are connected by data links.]
Linked Data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs so that they can discover more things.
Wikipedia Article URI:
http://en.wikipedia.org/wiki/Madrid
DBpedia Resource URI
http://dbpedia.org/resource/Madrid
HTTP URIs
Real-World Resources
http://dbpedia.org/resource/Madrid
HTTP GET -> 303 See Other
Information Resources
http://dbpedia.org/page/Madrid
http://dbpedia.org/data/Madrid
HTTP GET -> 200 OK
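The lookup behaviour on this slide can be modelled without touching the network. The sketch below is an assumption-laden illustration: the function name dereference is hypothetical, and choosing between the /page/ and /data/ document by inspecting the Accept header is an assumed policy; the slide itself only shows that the resource URI answers with 303 See Other and that the two document URIs answer with 200 OK.

```python
RESOURCE_PREFIX = "http://dbpedia.org/resource/"
PAGE_PREFIX = "http://dbpedia.org/page/"
DATA_PREFIX = "http://dbpedia.org/data/"


def dereference(uri, accept="text/html"):
    """Model a GET on uri and return (status_code, redirect_target_or_None)."""
    if uri.startswith(RESOURCE_PREFIX):
        # A real-world thing cannot be sent over HTTP, so the server
        # redirects to a document that describes it.
        name = uri[len(RESOURCE_PREFIX):]
        # Assumed policy: RDF clients get the data document, others the HTML page.
        wants_rdf = "application/rdf+xml" in accept
        prefix = DATA_PREFIX if wants_rdf else PAGE_PREFIX
        return 303, prefix + name
    if uri.startswith(PAGE_PREFIX) or uri.startswith(DATA_PREFIX):
        # Information resources are served directly.
        return 200, None
    return 404, None


status, location = dereference("http://dbpedia.org/resource/Madrid")
print(status, location)
print(dereference(location))
```

Keeping the thing and the documents about it under separate URIs is what lets statements about Madrid the city stay distinct from statements about a web page on Madrid.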
Music
Online Activities
Publications
Geographic
Cross-Domain
Life Sciences
4.5 billion triples
180 million data links
Use Cases
1. Data Source for Web-Applications
2. Querying Wikipedia like a database
3. Tag Web content with concepts instead of
free-text tags
4. Vocabulary and semantic backbone for
enterprise linked data integration
DBpedia as data source
• Embed structured information from
Wikipedia into your web applications
• Build (mobile) maps applications using
DBpedia data about places
• Display multilingual titles &
descriptions in 15 languages
DBpedia Mobile
Sparql Endpoint
http://dbpedia.org/sparql
Wikipedia Query
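A query against the endpoint is an ordinary HTTP GET with the SPARQL text in a "query" parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL and does not send it. The prefix expansions and the property name p:birth_place mirror the notation of the extraction slide and are assumptions; the live dataset may name properties differently.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed prefix URIs; the slides only show the short forms dbpedia: and p:.
QUERY = """PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT ?place WHERE {
  dbpedia:Albert_Einstein p:birth_place ?place .
}"""


def build_request_url(query, endpoint=ENDPOINT):
    """Return the GET URL that submits a SPARQL query to the endpoint."""
    # urlencode percent-escapes the query text so it is safe inside a URL.
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(QUERY)
print(url)
```

The resulting URL can be opened with any HTTP client; the result format depends on the Accept header or on endpoint-specific parameters that are not covered here.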
Annotating Documents
• Use DBpedia concepts to annotate documents
instead of free-text tags
• Named Entity Extraction Systems already use DBpedia URIs
(OpenCalais, Muddy Boots)
• Social Bookmarking with DBpedia URIs as tags
www.faviki.com
„Apple“
http://dbpedia.org/resource/Apple_Inc.
http://dbpedia.org/resource/Apple_(fruit)
http://dbpedia.org/resource/Apple_Records
Annotating Documents
• BBC editors tag news articles with DBpedia
concepts
• DBpedia Lookup Service
http://lookup.dbpedia.org
Linking Enterprise Data
Take the Linking Open Data approach to the enterprise
Linking Enterprise Data
• Connect data sets with DBpedia as shared vocabulary
• Enable meaningful navigation paths across BBC websites
• Browsing Madonna-related information across BBC News,
BBC Music, BBC Programmes, …
• Make use of the rich background information:
relate the release of a music album to a news article about
the artist
The Future of DBpedia
Improve Information Extraction
Crowd-source Information Extraction
Crowd Sourced Extraction
Where's the user benefit?
Data Fusion
Cross-Language Data Fusion
• 264 Wikipedia Editions in different languages
– Italian Wikipedians know more about Italian
villages
– German Wikipedia contains more person
infoboxes
• Augment the infobox dataset with facts from
other Wikipedia editions.
Augment DBpedia with External Data
• Linking Open Data cloud provides more data than
Wikipedia
– EuroStat provides additional statistical information
about countries.
– Musicbrainz contains additional information about
other bands.
– Geonames provides additional information about
locations.
• Idea
– Augment DBpedia with additional data from external
sources.
Contribute back to Wikipedia
• Opportunity
– Feed data back to Wikipedia
• Extend the Wikipedia authoring environment
with
– Suggestions for infobox values
– Cross-language consistency checking for infoboxes
• Currently going on
– New maps in Wikipedia based on DBpedia Mobile code (OpenStreetMap)
Contribute back to Wikipedia
• Initialize Wikipedia Clean-Up Cycles
– Data-driven search interfaces expose the
weaknesses of Wikipedia template system.
– Preferred items not showing up in end-user
interfaces may motivate Wikipedia editors to use
templates more stringently.
Live Update
• Current Situation
– DBpedia update cycle: 3 months
– Wikipedia provides us with access to the live
update stream
• Opportunity
– Increase the currency of the DBpedia dataset
using this update stream
• Result
– DBpedia in synchronization with Wikipedia.
Open Source
Open Data
What is the Wikipedia for Data?
Wikipedia is the Wikipedia for Data
Summary
http://dbpedia.org
[email protected]