Introduction to the Semantic Web


Ivan Herman, W3C,
W3C Brazil Office Meeting
São Paulo, Brazil, 2010-10-15



You had to consult a large number of sites, all
different in style, purpose, possibly language…
You had to mentally integrate all that
information to achieve your goals
We all know that, sometimes, this is a long and
tedious process!

All those pages are
only tips of
respective icebergs:
◦ the real data is hidden
in databases, XML files,
Excel sheets, …
◦ you only have access to
what the Web page
designers allow you to
see

Specialized sites (Expedia, TripAdvisor) do a bit
more:
◦ they gather and combine data from other sources
(usually with the approval of the data owners)
◦ but they still control how you see those sources

But sometimes you want to personalize: access
the original data and combine it yourself!


I have to type the
same data again
and again…
This is even worse: I
feed the icebergs…

The raw data should be available on the Web
◦ let the community figure out what applications are
possible…

Mashup sites are forced to do very ad-hoc jobs
◦ various data sources expose their data via Web
Services and APIs
◦ each with a different API, a different logic, a different
structure
◦ mashup sites are forced to reinvent the wheel many
times because there is no standard way of getting to the
data!

The raw data should be available in a standard
way on the Web
◦ i.e., using URIs to access data
◦ dereferencing that data should lead to something
useful
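
As an illustration of "dereferencing" a URI to get data, here is a minimal sketch using only the Python standard library. It prepares (but does not send) an HTTP request that asks for a data format before HTML. The URI and the exact list of media types are assumptions made for this example, not something prescribed by the slides.

```python
from urllib.request import Request


def build_data_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET that prefers data over HTML."""
    # Ask for RDF serializations first and accept HTML only as a fallback.
    accept = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.1"
    return Request(uri, headers={"Accept": accept})


# Hypothetical URI, used only for illustration.
request = build_data_request("http://example.org/data/amsterdam")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server answers with data depends on how that server is set up.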

What makes the current (document) Web work?
◦ people create different documents
◦ they give each one an address (i.e., a URI) and make it
accessible to others on the Web


Others discover the site and they link to it
The more they link to it, the more important
and well known the page becomes
◦ remember, this is what, e.g., Google exploits!
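
The idea that inbound links signal importance can be sketched as a simple count. This is a deliberate simplification for illustration, not the algorithm of any real search engine, and the page names are hypothetical.

```python
from collections import Counter

# Each pair is (linking page, linked page); all names are made up.
LINKS = [
    ("blog", "institute"),
    ("news", "institute"),
    ("wiki", "institute"),
    ("blog", "news"),
]

# Count how many links point at each page.
inbound = Counter(target for _source, target in LINKS)

# The page with the most inbound links is treated as the most "important".
print(inbound.most_common(1))  # [('institute', 3)]
```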

This is the “network effect”: some pages
become important, and others begin to rely on
them even if their authors did not expect it…

The same network effect works on the raw data
◦ Many people link to the data, use it
◦ Many more (and more diverse) applications will be created
than the “authors” would ever dream of!
Photo credit “nepatterson”, Flickr

A “Web” where
◦ documents are available for download on the Internet
◦ but there would be no hyperlinks among them

This is certainly not what we want!


The raw data should be available in a standard
way on the Web
There should be links among datasets
Photo credit “kxlly”, Flickr


On the traditional Web, humans are implicitly
taken into account
A Web link has a “context” that a person may
use


A human understands that this is where my
office is, i.e., the institution’s home page
He/she knows what it means
◦ realizes that it is a research institute in Amsterdam

When handling data, something is missing;
machines can’t make sense of the link alone

New lesson learned:
◦ extra information (“label”) must be added to a link: “this
links to my institution, which is a research institute”
◦ this information should be machine readable
◦ this is a characterization (or “classification”) of both the
link and its target
◦ in some cases, the classification should allow for some
limited “reasoning”
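
The lesson above can be sketched in a few lines of Python: each link carries a machine-readable label, the target is classified, and a small loop does some limited "reasoning" over the classification. The URIs, the label names (`worksAt`, `type`, `subClassOf`, `locatedIn`) and the class names are hypothetical choices for this sketch, not a standard vocabulary.

```python
# Each statement is a (subject, label, target) triple; all names are hypothetical.
TRIPLES = {
    ("http://example.org/people/me", "worksAt", "http://example.org/org/institute"),
    ("http://example.org/org/institute", "type", "ResearchInstitute"),
    ("http://example.org/org/institute", "locatedIn", "Amsterdam"),
    ("ResearchInstitute", "subClassOf", "Organization"),
}


def types_of(resource, triples):
    """Return the declared types of a resource plus all their superclasses."""
    found = {o for s, p, o in triples if s == resource and p == "type"}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        # Follow subClassOf links upward, skipping classes already seen.
        for s, p, o in triples:
            if s == cls and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(types_of("http://example.org/org/institute", TRIPLES)))
# ['Organization', 'ResearchInstitute']
```

A machine reading these statements can conclude that the link target is an organization, even though only "research institute" was stated explicitly.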




The raw data should be available in a standard
way on the Web
Datasets should be linked
Links, data, sites, should be characterized,
classified, etc.
The result is a Web of Data
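
A minimal sketch of why linked datasets combine so easily, assuming two independently published datasets that happen to use the same URI for the same city. The datasets, URIs and label names are invented for this example.

```python
# Two independently published datasets (hypothetical content), each a set of triples.
HOTELS = {
    ("http://example.org/hotel/1", "name", "Hotel Central"),
    ("http://example.org/hotel/1", "city", "http://example.org/city/sao-paulo"),
}
CITIES = {
    ("http://example.org/city/sao-paulo", "name", "São Paulo"),
    ("http://example.org/city/sao-paulo", "country", "Brazil"),
}


def value(triples, subject, label):
    """Return one target of `label` links leaving `subject`, or None."""
    for s, p, o in triples:
        if s == subject and p == label:
            return o
    return None


# Because both datasets share the city URI, merging is a plain set union.
web_of_data = HOTELS | CITIES

# Follow the link from the hotel to its city, then ask about the city.
city = value(web_of_data, "http://example.org/hotel/1", "city")
print(value(web_of_data, city, "country"))  # Brazil
```

Neither dataset alone can say in which country the hotel is; the answer only appears once the link between them can be followed.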


It is that simple…
Of course, the devil is in the details
◦ a common data model has to be provided
◦ the “classification” of the terms can become very
complex
◦ but these details are fleshed out by experts as we
speak!


A set of core technologies is in place
Lots of data (billions of relationships) are
available in standard format
◦ often referred to as “Linked Open Data Cloud”

There is a vibrant community of
◦ academics: universities of Southampton, Oxford,
Stanford, PUC
◦ small startups: Garlik, Talis, C&P, TopQuadrant,
Cambridge Semantics, OpenLink, …
◦ major companies: Oracle, IBM, SAP, …
◦ users of Semantic Web data: Google, Facebook, Yahoo!
◦ publishers of Semantic Web data: New York Times, US
Library of Congress, open governmental data (US, UK,
France,…)

Companies, institutions begin to use the
technology:
◦ BBC, Vodafone, Siemens, NASA, BestBuy, Tesco, Korean
National Archives, Pfizer, Chevron, …
◦ see http://www.w3.org/2001/sw/UseCases

Truth be told, we still have a way to go
◦ deployment may still be experimental, or in some
specific places only



Help in finding the best drug regimen for a
specific case, per patient
Integrate data from various sources (patients,
physicians, Pharma, researchers, ontologies, etc.)
Data (e.g., regulation, drugs) change often, but
the tool is much more resistant against change
Courtesy of Erick Von Schweber, PharmaSURVEYOR Inc. (SWEO Use Case)


Integration of relevant data in Zaragoza
Use rules to provide a proper itinerary
Courtesy of Jesús Fernández, Mun. of Zaragoza, and Antonio Campos, CTIC (SWEO Use Case)

More and more data
should be “published”
on the Web
◦ this can lead to the
“network effect” on data

New breeds of
applications come to
the fore
◦ “mashups on steroids”
◦ better representation
and usage of community
knowledge
◦ new customization
possibilities
◦ …


A huge amount of data (“information”) is
available on the Web
Sites struggle with the dual task of:
◦ providing quality data
◦ providing usable and attractive interfaces to access that
data

Semantic Web technologies allow a
separation of tasks:
1. publish quality, interlinked datasets
2. “mash-up” datasets for a better user experience
“Raw Data Now!”
Tim Berners-Lee, TED Talk, 2009
http://bit.ly/dg7H7Z



The “network effect” is also valid for data
There are unexpected usages of data that
authors may not even have thought of
“Curating”, using, exploiting the data requires a
different expertise

W3C
◦ was one of the initiators of the Semantic Web (Tim
Berners-Lee and others)
◦ is the place where Semantic Web Standards are
developed and defined
◦ is an integral part of the Semantic Web community


The standardization work is done by groups, with W3C members
delegating experts
Each group has at least one W3C staff member
to help the process and contribute to the
technology
◦ there is a formal process that has to be followed
◦ the price to pay…


The public can comment at specific points in
the process
Groups must take all comments into account
◦ the number of comments can be in the hundreds...





Regular teleconferences (usually once a week)
Possibly 1-2 face-to-face meetings a year
Lots of email discussions
Editorial work to get everything properly written
down
Average life-span: 2-3 years
Thank you for your attention!
These slides are also available on the Web:
http://www.w3.org/2010/Talks/1015-SaoPaulo-Office-IH/