Semantic Web - Columbia University


Web Enhanced Information Management
Semantic Web
COMS 6135 Class Presentation
Jian Pan
Department of Computer Science
Columbia University
Semantic Web




An extension of the World Wide Web that makes the web readable by machines (computers or software agents).
The goal is achieved by adding metadata to web pages that describes the data on each page, making the pages “machine understandable”.
It will not necessarily make computers fully intelligent or self-aware, but it will make machines better able to find and integrate information and to make inferences.
It derives from Tim Berners-Lee’s vision of the web as a universal medium for data, information, and knowledge exchange.
Motivation of Semantic Web




Data in everyday applications is scattered across different formats that computers can’t understand or use together:
Spreadsheets,
Data in a relational DB,
Data on your own desktop
Information on the web today is represented in natural language, which computers can’t understand.
Many tasks today require combining data in different formats from across the Web:
Hotel and car rental info may come from different sites
If data is represented in a machine-understandable language, machines can integrate it, create associations within it, and make inferences on it:
Metadata: visible to machines, invisible to humans
An Example Application: Buying a DVD online
[Figure: two diagrams side by side. Traditional Web: searches run against Site1 ... Site on the Web. Semantic Web: a machine agent searches the metadata of Site1 ... Site, automatically processes and makes inferences on the data, and returns a final result.]
Components: XML

XML: complements HTML by adding tags that describe the data.
Programs such as web crawlers can then understand the data.
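As a rough illustration of why descriptive tags help a program, the sketch below reads named fields from a small XML record using Python's standard library. The record and its tag names (`dvd`, `title`, `price`, `format`) are hypothetical and are not taken from any real store.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record for a DVD; the tag names are illustrative only.
DVD_XML = """
<dvd>
  <title>Spiderman 3</title>
  <price currency="USD">14.99</price>
  <format>DVD</format>
</dvd>
"""

def read_dvd(xml_text: str) -> dict:
    """Extract the named fields of the record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    price = root.find("price")
    return {
        "title": root.findtext("title"),
        "price": float(price.text),
        "currency": price.get("currency"),
        "format": root.findtext("format"),
    }

record = read_dvd(DVD_XML)
print(record["title"], record["price"], record["currency"])
```

Because each value is labeled by its tag, the program does not have to guess which piece of text is the price and which is the title.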
Components: RDF


RDF (Resource Description Framework): identifies a resource with its location and its relations to other resources on the Web.
Everything on the Web is a resource and can be described in RDF terms.
Components: RDF(1)
Example
RDF snippet:
<rdf:Description rdf:ID="Spiderman3">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#BlockbusterMovie"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Movie"/>
</rdf:Description>
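A minimal sketch of how a program might read such a snippet, using only Python's standard XML parser. It assumes the snippet is wrapped in an `rdf:RDF` root element that declares the `rdf:` prefix, and it handles only this one shape of RDF/XML (an `rdf:Description` with resource-valued properties); it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's snippet, wrapped in an rdf:RDF root so the rdf: prefix is declared.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:ID="Spiderman3">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#BlockbusterMovie"/>
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Movie"/>
  </rdf:Description>
</rdf:RDF>
"""

def extract_triples(xml_text):
    """Return (subject, property, value) tuples for resource-valued properties."""
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = "#" + desc.get(f"{{{RDF_NS}}}ID")
        for prop in desc:
            value = prop.get(f"{{{RDF_NS}}}resource")
            if value is not None:
                # ElementTree stores tags as "{namespace}local"; join them into a URI.
                prop_uri = prop.tag[1:].replace("}", "")
                triples.append((subject, prop_uri, value))
    return triples

for triple in extract_triples(RDF_XML):
    print(triple)
```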
Components: RDF(2)
An RDF statement has three components: subject, property, and value.
Subject: Spiderman 3
Property: Is a kind of
Value: Blockbuster movie
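The subject, property, value structure can be sketched as a small data type with a pattern-matching lookup. The `Triple` class and the `match` helper are hypothetical names chosen for illustration; the second statement is borrowed from the class hierarchy used elsewhere in these slides.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    prop: str
    value: str

triples = [
    Triple("Spiderman 3", "is a kind of", "Blockbuster movie"),
    Triple("Blockbuster movie", "is a kind of", "Movie"),
]

def match(store, subject=None, prop=None, value=None):
    """Return the triples that agree with every component that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (value is None or t.value == value)
    ]

# "What is Spiderman 3 a kind of?"
print([t.value for t in match(triples, subject="Spiderman 3", prop="is a kind of")])
```

Leaving a component as `None` turns it into a wildcard, which is roughly how queries over statements of this kind are usually phrased.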
Components: RDF(3)

Resources described in RDF, linked to one another through their properties, form a resource network.
Components: RDF(4)

The Semantic Web will enable software agents to make inferences on resources described using RDF.
[Figure: graph with nodes Movie, Blockbuster Movie, and Spiderman 3, connected by edges labeled rdf:IsAClassOf (twice) and rdf:type.]
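The kind of inference an agent could make over these statements can be sketched as simple forward chaining. This is an illustration under assumptions, not the method of any particular reasoner: it uses the standard RDF Schema property `rdfs:subClassOf` in place of the slide's `rdf:IsAClassOf` label, and it assumes the class link runs from Blockbuster Movie to Movie.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"  # standard RDFS property, used here for the slide's class link

facts = {
    ("Spiderman 3", TYPE, "Blockbuster Movie"),
    ("Blockbuster Movie", SUBCLASS, "Movie"),
}

def infer(facts):
    """Apply two rules repeatedly until no new fact appears."""
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == TYPE:
                    # x type C, C subClassOf D  =>  x type D
                    new.add((s, TYPE, o2))
                elif p == SUBCLASS:
                    # C subClassOf D, D subClassOf E  =>  C subClassOf E
                    new.add((s, SUBCLASS, o2))
        if new <= known:
            return known
        known |= new

closure = infer(facts)
print(("Spiderman 3", TYPE, "Movie") in closure)
```

The statement that Spiderman 3 is a Movie was never written down; the agent derives it from the two stated facts.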
Components: URI



URI (Uniform Resource Identifier): the computer needs to be directed to where the resource is located.
Every resource can be identified by a URI.
A URI is not necessarily a URL: it doesn’t have to be a Web link; it can identify the computer in your refrigerator!
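To make the URI versus URL distinction concrete, the sketch below inspects a few URIs with Python's `urllib.parse`. The `urn:` and `mailto:` examples are illustrative choices, and the `describe` helper is a hypothetical name.

```python
from urllib.parse import urlparse, urldefrag

uris = [
    "http://www.w3.org/2000/01/rdf-schema#Movie",  # a URI that is also a Web link
    "urn:isbn:0451450523",                          # a URI that names something without locating it
    "mailto:someone@example.org",                   # hypothetical address
]

def describe(uri):
    """Report the scheme and whether the URI carries a network location."""
    parts = urlparse(uri)
    return {"scheme": parts.scheme, "has_network_location": bool(parts.netloc)}

for uri in uris:
    print(uri, describe(uri))

# The fragment after "#" names one term inside the document at the base address.
base, fragment = urldefrag(uris[0])
print(base, fragment)
```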
Components: URI(2)
URI
<rdf:Description rdf:ID="Spiderman3">
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#BlockbusterMovie"/>
  <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Movie"/>
</rdf:Description>
To Enable Further Inference Capabilities



We need to describe the data in more detail:
Ontology: a vocabulary that describes resources and how they relate to each other.
Humans understand words, but a computer needs a dictionary to look up what the words mean and how they connect to each other in order to build its logical connections.
Components: Ontology
[Figure: ontology graph with nodes Movie, BlockBuster Movie, Spiderman3, Kirsten Dunst, and Tobey Maguire, connected by edges labeled "Is a kind of" and "Stars in".]
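A minimal sketch of how a program could use such an ontology to answer a question that no single statement answers directly: "who stars in a Movie?". The edge directions are assumptions read from the labels (Spiderman3 is a kind of BlockBuster Movie, which is a kind of Movie; each actor stars in Spiderman3), and the helper names are hypothetical.

```python
# "Is a kind of" links, child -> parent (direction assumed from the diagram labels).
KIND_OF = {"Spiderman3": "BlockBuster Movie", "BlockBuster Movie": "Movie"}

# "Stars in" links, actor -> films (direction assumed from the diagram labels).
STARS_IN = {"Kirsten Dunst": ["Spiderman3"], "Tobey Maguire": ["Spiderman3"]}

def is_a(thing, category):
    """Follow "is a kind of" links upward until the category is found or the chain ends."""
    seen = set()
    while thing in KIND_OF and thing not in seen:
        seen.add(thing)
        thing = KIND_OF[thing]
        if thing == category:
            return True
    return False

def stars_of(category):
    """Actors who star in at least one item belonging to the category."""
    return sorted(
        actor for actor, films in STARS_IN.items()
        if any(is_a(film, category) for film in films)
    )

print(stars_of("Movie"))
```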
Example Revisited: Buying a DVD online

The entire workflow of buying a DVD online on the Semantic Web would be as follows:

Websites add metadata to each DVD item so that it is readable by computers or software agents

The metadata would be RDF tags in XML format. RDF will describe all properties of the DVD item in full

A coherent industry ontology, shared by all online DVD stores, will be created to describe each DVD item


Software agents on the client side traverse the DVD item metadata from the different stores and return the desired search results to the user
The results returned by the agent improve because the agent infers what the user intends to search for.
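The workflow above can be sketched as a small client-side agent. Everything in the data is hypothetical: the store names, the prices, the second title, and the category hierarchy standing in for the shared industry ontology. The metadata is assumed to have been fetched and parsed already.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Offer:
    store: str
    title: str
    category: str
    price: float

# Stand-in for the shared ontology: category -> broader category.
BROADER = {"Blockbuster Movie": "Movie", "Documentary": "Movie"}

# Metadata gathered from three invented stores.
OFFERS = [
    Offer("store-a.example", "Spiderman 3", "Blockbuster Movie", 14.99),
    Offer("store-b.example", "Spiderman 3", "Blockbuster Movie", 12.49),
    Offer("store-c.example", "Example Documentary", "Documentary", 19.99),
]

def in_category(category, wanted):
    """True if the category equals the wanted one or is narrower than it."""
    while category is not None:
        if category == wanted:
            return True
        category = BROADER.get(category)
    return False

def search(offers, wanted_category, max_price):
    """Return matching offers from all stores, cheapest first."""
    hits = [
        o for o in offers
        if in_category(o.category, wanted_category) and o.price <= max_price
    ]
    return sorted(hits, key=lambda o: o.price)

for offer in search(OFFERS, "Movie", 15.00):
    print(offer.store, offer.title, offer.price)
```

A search for "Movie" finds items that the stores labeled only as "Blockbuster Movie", which is the inference step: the agent uses the ontology to work out that those items satisfy the request.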
Typical Applications


Symptom diagnosis: doctors informed of a patient’s symptoms will not always be able to determine the disease. The Semantic Web, using inference based on other online resources, can offer better suggestions for the doctor to choose from.
Biomedical applications: data mining that uses inference to build connections between certain genes, or the protein structures of medicines, and clinical trial results.
Challenges





Many ontologies are very difficult to create: use modularization (create only part of the ontology)
Debate over whether rule-based inference or ontology-based reasoning is better
Trust: can we trust the metadata provided? Who is authorized to publish what metadata?
Improving the efficiency of the inference algorithms
Governance: no one can monitor all the rules and data in the Semantic Web
Thank You !