Lecture Notes - University of Huddersfield


The Semantic Web –
introduction to the
basic technology
Week 2 - XML
Lee McCluskey
Recap




The Semantic Web is the vision (not a current reality) of
having an internet with resources that are machine
understandable or accessible to automated processes. Machines
should do much more than present the information visually or
do human-consumable IR (information retrieval).
Central idea – we agree on a way of SPECIFYING
vocabularies rather than agreeing on one particular
vocabulary/language. Then in communication,
processes only need to point to the language (vocabulary)
they are using. This is much more flexible than a common
language.
XML is like a “machine code” in the SW.
Processes on the SW will need to perform reasoning to fully
exploit the SW to do Knowledge Acquisition etc.
Artform Research Group
WWW
A tool for people to access information
Interface to certain (online) databases, and to businesses
Human interface to some services (info retrieval, weather,
train timetables etc)
The WWW is successful largely through the use of layers of
internationally accepted standards (TCP/IP, HTML) and now
the fact that it is
- Ubiquitous
- Organic + Distributed
- Dynamic + Unbounded
WWW - a standard
- ‘first generation’ - hand written html pages
- ‘second generation’ - dynamic web - pages created
by programs to display the results of a process, or
the output of a query of an accessed database.
Web pages used as an interface to networked
processes (services) as well as for general
information display.
WWW +
Much R&D has been directed at writing programs/services
that utilise HTML web info
E.g. the University of Southern California's (ISI) travel assistant - a web
service that uses other web services (weather, timetables,
hotel) to make travel plans in response to a high level
directive
“I need to be in X on days Y using budget Z”
BUT: this is very hard because of the web's unstructured data,
e.g. ISI's travel assistant has to use a learning program to
induce web page 'wrappers' before it can reliably extract
data.
WWW html example
<html>
<head><title> Lee McCluskey </title></head>
<body bgcolor="#ffffff">
<h1> McCluskey, Thomas Leo </h1>
<br> BSc (Maths), MSc (Maths), PhD (Computer Science), MBCS, C.Eng
<br> Professor of Software Technology
<br>
<br> School of Computing and Engineering,
<br> University of Huddersfield,
<br> Huddersfield,
<br> West Yorkshire,
<br> HD1 3DH,
<br> United Kingdom.
<p> <b>email:</b> t.l.mccluskey followed by @hud.ac.uk
<br> <b>telephone (direct):</b> (+44) (0) 1484 472247
<br> <b>telephone (internal):</b> 2247
<br> <b>telephone (messages):</b> (+44) (0) 1484 472150
<br> <b>fax:</b> (+44) (0) 1484 421106
<br> <b>room number:</b> CW2/09
</p>
</body>
</html>
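To see why extracting data from a page like this is fragile, here is a minimal Python sketch of a hand-written 'wrapper'. It assumes the data always follows a bold label of the form <b>label:</b> on the same line; the function name extract_field is hypothetical, and the fragment is copied from the page above.

```python
import re

# A few lines copied from the HTML page above.
HTML_FRAGMENT = """<br> <b>telephone (direct):</b> (+44) (0) 1484 472247
<br> <b>telephone (internal):</b> 2247
<br> <b>fax:</b> (+44) (0) 1484 421106"""


def extract_field(html, label):
    """Return the text that follows <b>label:</b> up to the end of that line.

    This relies purely on layout: nothing in the HTML says "this is a fax number".
    """
    match = re.search(r"<b>" + re.escape(label) + r":</b>\s*(.*)", html)
    return match.group(1).strip() if match else None


fax = extract_field(HTML_FRAGMENT, "fax")
direct = extract_field(HTML_FRAGMENT, "telephone (direct)")

# If the author restyles the page (say <strong> instead of <b>), the wrapper breaks.
restyled = HTML_FRAGMENT.replace("<b>", "<strong>").replace("</b>", "</strong>")
fax_after_restyle = extract_field(restyled, "fax")
```

The wrapper works only while the page keeps its current layout, which is the motivation for marking up the data itself.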
Metadata and XML



We can start giving 'meaning' to info on the web
using META-DATA, e.g. using tags around data to
describe its content.
In XML - eXtensible Mark-up Language - tags are
not fixed - one can invent new tags to structure the
information in a web page.
XML is considered to be the basis for all semantic
web languages - the “machine code” of the new
generation web
Rough Hierarchy of Languages in
the Semantic Web
OWL .. Ontology language
DAML .. gives logic
RDFS .. gives classes
RDF .. gives tuples
XML .. gives content
XML Overview


XML is a subset of SGML (Standard Generalized Markup Language), which was written originally for
electronic documents and publications
XML has the advantages of HTML – it is platform-independent and a standardised language
see http://www.w3.org/TR/REC-xml/
But HTML has a FIXED set of tags, and holds no
MEANING about the data in its document.
Rough syntax of XML
= list of <name attributes> content </name>

XML structures information using TAGS in a
composite fashion, e.g.
<someTag> …… </someTag>
<someTag Attribute = "Value"> …… </someTag>

A start tag, its matching end tag and the info between
them together form an "element"; the info between the
tags is the element's content.
XML



XML allows the content to be structured so that it
is easy for a machine to extract meaningful data
from an XML page. It is a meta-language – a
language used in the description of other
languages.
It can be used to structure data in a database, or
as a communication language
It can be formatted using a style sheet language
called XSL (like CSS for HTML)
Example
<?xml version="1.0"?>
<email date="30/09/04">
<to>fred</to>
<from>sue</from>
<subject>xml example</subject>
<message>This is the message</message>
</email>
- All tags have a start and end
- Tags must be correctly nested, forming a tree
- Tags can have attributes
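As a rough illustration of how a program sees this example, the following Python sketch parses it with the standard library's xml.etree.ElementTree and lists the root tag, its attributes and its child elements. The function name summarise is hypothetical; the XML text is the example above.

```python
import xml.etree.ElementTree as ET

# The email example from above, with straight quotes around the attribute value.
EMAIL_XML = """<?xml version="1.0"?>
<email date="30/09/04">
<to>fred</to>
<from>sue</from>
<subject>xml example</subject>
<message>This is the message</message>
</email>"""


def summarise(xml_text):
    """Parse the text and describe the tree: root name, attributes, children."""
    root = ET.fromstring(xml_text)
    return {
        "root": root.tag,
        "attributes": dict(root.attrib),
        "children": {child.tag: child.text for child in root},
    }


summary = summarise(EMAIL_XML)
```

Because the tags name the data, the program can ask for "to" or "subject" directly instead of guessing from layout.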
Example - better
<?xml version="1.0"?>
<email>
<to>fred</to>
<from>sue</from>
<date>
<day>30</day>
<month>9</month>
<year>2004</year>
</date>
<subject>xml example</subject>
<message>this is the message</message>
</email>
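One way to see why this version is 'better': with the date split into <day>, <month> and <year>, a program can read each part directly rather than splitting a string like "30/09/04" and guessing the order and century. A minimal Python sketch, where the function name email_date is hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date

# The "better" email example from above.
BETTER_XML = """<?xml version="1.0"?>
<email>
<to>fred</to>
<from>sue</from>
<date>
<day>30</day>
<month>9</month>
<year>2004</year>
</date>
<subject>xml example</subject>
<message>this is the message</message>
</email>"""


def email_date(xml_text):
    """Build a date object from the structured <date> element."""
    date_element = ET.fromstring(xml_text).find("date")
    return date(
        int(date_element.findtext("year")),
        int(date_element.findtext("month")),
        int(date_element.findtext("day")),
    )


sent_on = email_date(BETTER_XML)
```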
Elements ..
Logically every element has four key pieces:
- A name
- The attributes of the element
- The namespaces in scope on the element
- The content of the element
The content can be text, comments, more tagged info or
processing instructions, e.g.
<?xml-stylesheet type="text/xml" href="limited.xsl"?>
This is meta info about the document
DTDs



XML is self-describing – it can use a DTD
(Document Type Definition) to formally describe
the structure of its contents
An XML doc is well-formed if its syntax is OK
according to the XML standard. It is VALID if
additionally it conforms to its DTD
DTDs are formed so that we can share our
document structures with other parties. Knowing
our DTD, they can write programs to process our
XML documents.
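The well-formed half of this distinction can be checked with any XML parser. Below is a minimal Python sketch using the standard library; the function name is_well_formed is hypothetical. Note that xml.etree.ElementTree is a non-validating parser, so it does NOT check a document against its DTD: checking validity needs a validating parser, which is outside this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, i.e. its syntax is OK.

    This says nothing about validity against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = is_well_formed("<email><to>fred</to></email>")
badly_nested = is_well_formed("<email><to>fred</email></to>")
unclosed = is_well_formed("<email><to>fred</to>")
```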
Example with DTD
<?xml version="1.0"?>
<!DOCTYPE email [
<!ELEMENT email (to,from,subject,message)>
<!ATTLIST email date CDATA #IMPLIED>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (#PCDATA)> ]>
<email date="30/09/04">
<to>fred</to>
<from>sue</from>
<subject>xml example</subject>
<message>this is the message</message>
</email>
DTDs are like grammars..
<!ELEMENT address_book (listing+) >
<!ELEMENT listing (name, address) >
<!ELEMENT name (last_name, first_name) >
<!ELEMENT last_name (#PCDATA) >
<!ELEMENT first_name (#PCDATA) >
<!ELEMENT address (street, city, (state|province), zip) >
<!ELEMENT street (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT province (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
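To make the grammar analogy concrete, the sketch below treats each content model as a regular expression over the names of an element's children. It is only an illustration under simplifying assumptions (it handles sequence, choice and the +, * and ? operators, and ignores #PCDATA and attributes); it is not a DTD validator, and the names model_to_regex and children_match are hypothetical.

```python
import re

# Content models copied from the address-book DTD above (#PCDATA leaves omitted).
CONTENT_MODELS = {
    "address_book": "(listing+)",
    "listing": "(name, address)",
    "name": "(last_name, first_name)",
    "address": "(street, city, (state|province), zip)",
}


def model_to_regex(model):
    """Translate a DTD content model into a regex over space-terminated names."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus a trailing space.
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # "," means "followed by" in a DTD; in a regex that is plain concatenation.
    return pattern.replace(",", "")


def children_match(parent, child_names):
    """Check a list of child element names against the parent's content model."""
    sequence = "".join(name + " " for name in child_names)
    return re.fullmatch(model_to_regex(CONTENT_MODELS[parent]), sequence) is not None


with_state = children_match("address", ["street", "city", "state", "zip"])
with_province = children_match("address", ["street", "city", "province", "zip"])
missing_region = children_match("address", ["street", "city", "zip"])
```

The choice (state|province) and the repetition listing+ carry over unchanged, which is why a DTD reads much like a grammar for the document tree.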
DOMs
“.. The promise of the Internet is very much
tied to interoperability and the value
proposition of e-business depends on the
ability to truly collaborate with partners and
customers in a meaningful and efficient
way..”
http://www.4infinitesolutions.com/course%20XML%20DTDs_Schema_DOM.htm
DOMs


Document Object Models (DOMs) give an
(abstract) program interface for constructing,
querying, accessing, and manipulating XML
documents.
Concrete DOMs define methods and properties
(instantiated for each programming language)
which can be used to access/change XML
documents from programs
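As one concrete DOM, Python's standard library provides xml.dom.minidom. The sketch below queries, changes and extends the earlier email document; the new <cc> element and its value "lee" are invented purely for illustration.

```python
from xml.dom import minidom

# The email example from earlier in these notes (without the date).
EMAIL_XML = """<?xml version="1.0"?>
<email>
<to>fred</to>
<from>sue</from>
<subject>xml example</subject>
<message>this is the message</message>
</email>"""

doc = minidom.parseString(EMAIL_XML)

# Query/access: find the <subject> element and read its text node.
subject = doc.getElementsByTagName("subject")[0]
original_subject = subject.firstChild.data

# Manipulate: change the text held by that node.
subject.firstChild.data = "DOM example"

# Construct: build a new (hypothetical) <cc> element and attach it to the root.
cc = doc.createElement("cc")
cc.appendChild(doc.createTextNode("lee"))
doc.documentElement.appendChild(cc)

# List the element children of the root, skipping whitespace text nodes.
child_names = [
    node.tagName
    for node in doc.documentElement.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
```

Calling doc.toxml() afterwards would serialise the modified tree back to XML text.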
The Uniform Resource Identifier (URI)
A "URI" is fundamental to the SW – it identifies a
unique resource: it is a string that uniquely identifies
something.
Often (but not always) a URI points to a webpage or an
XML document.
In XML, element type names (tags) and attribute
names may be qualified with a URI – so that the
name can be understood globally.
The Uniform Resource Identifier (URI)
Example: you need to refer to an ELEMENT annotated by
<email> in the document
http://scom.hud.ac.uk/scomtlm/namespaces/example
You would set up a "namespace" in your XML document, say
tlm = http://scom.hud.ac.uk/scomtlm/namespaces/example
Then in your document you would use
tlm:email
to denote that this <email> tag is the same as the one in
http://scom.hud.ac.uk/scomtlm/namespaces/example
Namespaces - xmlns examples
<tlm:email
xmlns:tlm="http://scom.hud.ac.uk/scomtlm/namespaces/example">
… <tlm:message …. >
</tlm:email>
You can also define a default namespace:
<email xmlns="http://scom.hud.ac.uk/scomtlm/namespaces/example">
</email>
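A short Python sketch of what namespace qualification means to a program: the prefix itself is irrelevant, and the parser reports each name qualified by the namespace URI. The message content "hi" is invented for illustration, since the examples above elide it.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://scom.hud.ac.uk/scomtlm/namespaces/example"

# Same element written with a prefix, with a default namespace, and with none.
PREFIXED = '<tlm:email xmlns:tlm="%s"><tlm:message>hi</tlm:message></tlm:email>' % NS_URI
DEFAULTED = '<email xmlns="%s"><message>hi</message></email>' % NS_URI
PLAIN = "<email><message>hi</message></email>"

# ElementTree writes a qualified name as "{namespace-uri}local-name".
prefixed_tag = ET.fromstring(PREFIXED).tag
defaulted_tag = ET.fromstring(DEFAULTED).tag
plain_tag = ET.fromstring(PLAIN).tag

# When searching, any prefix may be mapped to the URI; "t" here is arbitrary.
message = ET.fromstring(PREFIXED).find("t:message", {"t": NS_URI})
message_text = message.text
```

The prefixed and default-namespace forms give the same qualified name, while the unqualified <email> is a different name as far as the parser is concerned.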
Exercises
Read through some XML tutorials from relevant sites on the web, e.g.
http://www.ddj.com/documents/s=2803/nam1012432263/
http://www.ddj.com/documents/s=2799/nam1012432259/
http://www.xmlfiles.com/xml/
http://www.dcs.napier.ac.uk/~andrew/xml/ (this has some nice
tutorial questions and answers!)
Try the following exercises:
1. Write a small XML Bibliography, and then write a DTD for it.
2. Write a small XML Address book, and then write a DTD for it.
3. Cut and paste an XSL style-sheet from one of the example
websites and try to use it to present your XML files.
For the Week ahead:
Continue to read through the tutorials, and write down some notes on the
meaning and different roles of DTD, XSL, DOM and all the other
jargon you come across!