Transcript semanticWeb

cs236607
1
The Need
“Most of the Web's content today is designed for
humans to read, not for computer programs to
manipulate meaningfully.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American,
May 2001
Semantic Processing
 We want to be able to pose complex search tasks that
use the semantics of pieces of information, e.g.
I want to purchase a DVD
of “Dora the Explorer” at
a price lower than $10.
Is such a DVD available at
amazon.com?
Current search agents are not suitable for such tasks
Current Solution
Use “intelligent” agents
The Semantic-Web Approach
Content is machine-understandable by being bound to
some formal description of itself (i.e. metadata)
Goals
 Web of data: provides a common data-representation
framework that facilitates integrating multiple sources to
draw new conclusions
 Increase the utility of information by connecting it to
its definitions and to its context
 More efficient information access and analysis
Applications
 Agents that search the Web and retrieve valuable
information for the end user
 Web services that publish their information
 Programs that merge data from different Web
services and create new results from it
Ontologies & Inference Engines
“For the semantic web to function, computers must
have access to structured collections of information
and sets of inference rules that they can use to conduct
automated reasoning.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American,
May 2001
The Four Building Blocks
1. XML
2. RDF
3. Ontologies
4. Agents
XML
“XML allows users to add arbitrary structure to their
documents but says nothing about what the structures
mean”
RDF – Resource Description Framework
 Meaning encoded in sets of ‘triples’: entities have
properties which have values
 Entities, properties and values all have distinct URIs
“imagine that we have access to a variety of databases with information about
people, including their addresses. If we want to find people living in a specific zip
code, we need to know which fields in each database represent names and which
represent zip codes. RDF can specify that "(field 5 in database A) (is a field of
type) (zip code)," using URIs rather than phrases for each term.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American, May 2001
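The triple idea in the quote can be sketched in a few lines of Python. This is only an illustration, not an RDF library: every URI below is a hypothetical example.org name invented for the sketch, and a statement is modeled as a plain (subject, predicate, object) tuple.

```python
# Hypothetical URIs, invented for this sketch (not from any real vocabulary).
IS_FIELD_OF_TYPE = "http://example.org/terms/isFieldOfType"
ZIP_CODE = "http://example.org/terms/zipCode"
NAME = "http://example.org/terms/name"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    ("http://example.org/dbA/field5", IS_FIELD_OF_TYPE, ZIP_CODE),
    ("http://example.org/dbA/field1", IS_FIELD_OF_TYPE, NAME),
    ("http://example.org/dbB/field3", IS_FIELD_OF_TYPE, ZIP_CODE),
}


def subjects_with(predicate, obj):
    """Return every subject s for which (s, predicate, obj) has been stated."""
    return sorted(s for (s, p, o) in triples if p == predicate and o == obj)


# Which fields, in any database, hold zip codes?
zip_fields = subjects_with(IS_FIELD_OF_TYPE, ZIP_CODE)
print(zip_fields)
```

Because the terms are URIs rather than phrases, two databases that use the same URI for "zip code" can be queried together without guessing at field names.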
Ontologies
 Database A and Database B may use different
fields to contain ‘zip code’
 Ontologies sort this out
 Ontology = ‘a document or file that formally
defines the relations among terms’
 Ontologies for the web normally have
 A taxonomy
 A set of inference rules
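How an ontology "sorts out" differing field names can be sketched as follows. The field names, the shared concepts and the sample rows are all hypothetical; the point is only that once each local field is mapped to a shared term, data from both databases can be queried uniformly.

```python
# Hypothetical ontology fragment: which shared concept each local field denotes.
FIELD_CONCEPT = {
    ("A", "field1"): "name",
    ("A", "field5"): "zip_code",
    ("B", "full_name"): "name",
    ("B", "postal"): "zip_code",
}


def normalize(db, row):
    """Rewrite a row so that its keys are shared concepts, not local field names."""
    return {FIELD_CONCEPT[(db, field)]: value for field, value in row.items()}


# Hypothetical sample data from two differently structured databases.
rows_a = [{"field1": "Dana", "field5": "32000"}]
rows_b = [
    {"full_name": "Avi", "postal": "32000"},
    {"full_name": "Noa", "postal": "61000"},
]

people = [normalize("A", r) for r in rows_a] + [normalize("B", r) for r in rows_b]

# One query now covers both sources.
in_zip = sorted(p["name"] for p in people if p["zip_code"] == "32000")
print(in_zip)
```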
Agents
“Agent based computing appears to be the appropriate
paradigm to work in a complex world with multiple
ontologies, fragments and multiple inferencing engines.”
Stork, Hans-Georg and Mastroddi, Franco, Semantic Web Technologies
- a New Action Line in the European Commission’s IST Programme,
2001
The Power of Agents - Integration
“The real power of the Semantic Web will be
realized when people create many programs
that collect Web content from diverse sources,
process the information and exchange the
results with other programs. The effectiveness
of such software agents will increase
exponentially as more machine-readable Web
content and automated services (including
other agents) become available.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific
American, May 2001
‘Ambient Intelligence’
“In the next step, the Semantic Web will break out of
the virtual realm and extend into our physical world.
URIs can point to anything, including physical entities,
which means we can use the RDF language to describe
devices such as cell phones and TVs.”
Berners-Lee, T, Hendler, J & Lassila, O ‘The semantic web’, Scientific American,
May 2001
What is RDF?
 A part of the semantic-Web activity
 RDF is a general-purpose language for representing
information on the Web
 Specifically, objects and relationships
 Designed to allow computer applications to process data
based on its semantics
 Rather than displaying data to humans (as opposed to
RSS)
 An RDF document is actually a labeled graph that is
represented in XML
 The specific language is called RDF/XML
 W3C recommendation (Feb. 2004)
RDF Data
The basic element: triple (labeled edge)
Statement: Subject --predicate--> Object
RDF document: edge-labeled graph
[Figure: example graph]
Person#845 --address--> #1002
#1002 --street--> Herzel
#1002 --city--> Haifa
#1002 --postalCode--> 6941
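The example graph can be sketched as a list of triples, with a lookup that follows a chain of edge labels. The edge structure below is an assumption read off the figure (the person points to an address node, which carries street, city and postal code); the helper names are invented for this sketch.

```python
# Triples assumed from the figure: (subject, predicate, object).
triples = [
    ("Person#845", "address", "#1002"),
    ("#1002", "street", "Herzel"),
    ("#1002", "city", "Haifa"),
    ("#1002", "postalCode", "6941"),
]


def objects(subject, predicate):
    """All objects reachable from `subject` over one edge labeled `predicate`."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def follow(start, path):
    """Follow a sequence of edge labels from `start`; return the nodes reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier


city = follow("Person#845", ["address", "city"])
print(city)  # ['Haifa']
```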
The XML Syntax of RDF
[Figure: page.html --dc:Creator--> John Smith; page.html --dc:Title--> John’s Home Page]
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
xmlns:dc="http://purl.org/metadata/dublin_core#">
<rdf:Description about="page.html">
<dc:Creator>John Smith</dc:Creator>
<dc:Title>John’s Home Page</dc:Title>
</rdf:Description>
</rdf:RDF>
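To see that this document really encodes triples, it can be read back with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not a full RDF/XML parser: it assumes the namespace URIs used on the slide, handles only flat Description elements with literal values, and the function name is invented here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as written on the slide.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/metadata/dublin_core#"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="page.html">
    <dc:Creator>John Smith</dc:Creator>
    <dc:Title>John's Home Page</dc:Title>
  </rdf:Description>
</rdf:RDF>"""


def triples_from(xml_text):
    """Turn each property element of each Description into one triple."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get("about")
        for prop in desc:
            # ElementTree expands prefixes: dc:Creator becomes "{namespace}Creator".
            result.append((subject, prop.tag, prop.text))
    return result


triples = triples_from(DOC)
for triple in triples:
    print(triple)
```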
Structured Values
[Figure: page.html --dc:Title--> John’s Home Page; page.html --dc:Creator--> (unnamed node); (unnamed node) --Name--> John Smith; (unnamed node) --Email--> [email protected]]
...
<Description about="page.html">
<dc:Creator>
<Description>
<corp:Name>John Smith</corp:Name>
<corp:Email>[email protected]</corp:Email>
</Description>
</dc:Creator>
<dc:Title>John’s Home Page</dc:Title>
</Description>
</RDF>
Dublin Core
 A set of fifteen basic properties for describing
generalized Web resources
 The “obvious” mapping of Dublin Core properties
into RDF properties has not yet been approved by
the Dublin Core initiative, but is generally a good
example
Dublin Core
 “Title”: the name given to the resource
 “Creator”: the person or organization primarily
responsible for the resource
 “Subject”: what the resource is about
 “Description”: a description of the content
 “Publisher”: the person or organization responsible for
making the resource available
 “Contributor”: someone who has provided content to
the resource other than the creator
 “Date”: date of creation or publication
Dublin Core
 “Type”: type of resource, such as home page, technical report,
novel, photograph…
 “Format”: data format of the resource
 “Identifier”: URL, ISBN number, …
 “Source”: another resource that this resource is derived from
 “Language”: the language of the content
 “Relation”: another resource and its relationship to this one
 “Coverage”: the portion of time or space described by this
resource (atlases, histories, etc.)
 “Rights”: the intellectual property rights adhering to this
resource, or a pointer to them
Advanced RDF
 Containers: bags, sequences, alternatives
 aboutEach, aboutEachPrefix
 Reification (higher order statements)
 Namespaces and Vocabularies
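Reification can be sketched with plain triples: a statement is itself described by four triples (its type, subject, predicate and object), after which other statements can refer to it. The rdf: property names below come from the standard RDF vocabulary; the statement identifier "_:stmt1", the source "Mary" and the property "ex:claims" are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, triple):
    """Describe `triple` as a resource using the RDF reification vocabulary."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]


base = ("page.html", "dc:Creator", "John Smith")
graph = reify("_:stmt1", base)

# A higher-order statement: a hypothetical source asserts the base statement.
graph.append(("Mary", "ex:claims", "_:stmt1"))

for triple in graph:
    print(triple)
```

Note that the graph now says Mary claims the statement; it does not by itself assert that John Smith created page.html.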
Creating RDF documents
 Manually from HTML or “user domain XML”
 With special assisting tools, such as Protégé, Reggie, DC-dot,
RDF for XML
 Ideally – with some automated procedure from
HTML/XML documents
 Can we use XSLT there?
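As one possible automated procedure, metadata that has already been extracted from a page could be serialized to RDF/XML with Python's standard library. This is a sketch under assumptions: the extraction step is not shown, the helper name `describe` is invented here, and the namespaces are the current RDF and Dublin Core element-set URIs rather than the older ones on the earlier slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Make the serializer emit the familiar rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(resource_uri, metadata):
    """Build an RDF/XML description with one Dublin Core property per entry."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for name, value in metadata.items():
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical metadata, e.g. taken from an HTML page's title and author tags.
rdf_xml = describe("page.html", {"creator": "John Smith", "title": "Home Page"})
print(rdf_xml)
```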
RDF Schema
 Not an XML Schema!
 A “companion” specification to the RDF specification
 Class, Type, subClassOf
 domain, range
 Misc: label, comment, isDefinedBy, etc.
Example
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf= "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base= "http://www.animals.fake/animals#">
<rdfs:Class rdf:ID="animal" />
<rdfs:Class rdf:ID="horse">
<rdfs:subClassOf rdf:resource="#animal"/>
</rdfs:Class>
</rdf:RDF>
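What a reasoner can do with such a schema can be sketched with two RDFS-style rules: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The subclass fact below is taken from the schema above; the individual "silver" is hypothetical instance data, and the short strings "rdf:type" and "rdfs:subClassOf" stand in for the full property URIs.

```python
BASE = "http://www.animals.fake/animals#"
TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

facts = {
    (BASE + "horse", SUBCLASS_OF, BASE + "animal"),  # from the schema
    (BASE + "silver", TYPE, BASE + "horse"),         # hypothetical instance
}


def infer(stated):
    """Apply the two rules repeatedly until no new triple is derived."""
    closed = set(stated)
    while True:
        new = set()
        for (a, p1, b) in closed:
            for (c, p2, d) in closed:
                if b != c or p2 != SUBCLASS_OF:
                    continue
                if p1 == SUBCLASS_OF:
                    new.add((a, SUBCLASS_OF, d))  # subclass transitivity
                elif p1 == TYPE:
                    new.add((a, TYPE, d))         # type inheritance
        if new <= closed:
            return closed
        closed |= new


entailed = infer(facts)
print((BASE + "silver", TYPE, BASE + "animal") in entailed)  # True
```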
RDF Schema is Limited
 We cannot express facts such as
 Two classes are disjoint
 A class is the union of two other classes
 Cardinality restrictions
 Scope of properties
 Characteristics of properties, such as
transitive, unique, inverse
OWL
 A Web ontology language that is more expressive than
RDF and RDF Schema
 Written in XML on top of RDF
 Using OWL we want to provide exact descriptions of
items and the relationships between them
 Basically, built upon Description Logics
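Two of the things an ontology language adds over RDF Schema, disjoint classes and transitive properties, can be sketched in plain Python. This is not OWL syntax and not a Description Logic reasoner; the class names, the property names and the facts are all hypothetical, chosen only to show what such declarations let a program check or derive.

```python
from itertools import product

# Hypothetical axioms of the kind RDF Schema cannot state.
DISJOINT_PAIRS = [{"Horse", "Fish"}]
TRANSITIVE_PROPERTIES = {"ancestorOf"}


def violates_disjointness(classes_of_individual):
    """True if an individual belongs to two classes declared disjoint."""
    return any(pair <= set(classes_of_individual) for pair in DISJOINT_PAIRS)


def close_transitive(facts):
    """Add every triple implied by the transitive properties."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, p, b), (c, q, d) in product(tuple(closed), repeat=2):
            if p == q and p in TRANSITIVE_PROPERTIES and b == c:
                if (a, p, d) not in closed:
                    closed.add((a, p, d))
                    changed = True
    return closed


facts = {
    ("a", "ancestorOf", "b"),
    ("b", "ancestorOf", "c"),
    ("a", "owns", "b"),
    ("b", "owns", "c"),
}
closed = close_transitive(facts)
print(("a", "ancestorOf", "c") in closed)  # True: ancestorOf is transitive
print(("a", "owns", "c") in closed)        # False: owns is not declared transitive
```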