XML & XML Schema
Semantic Web - Spring 2008
Computer Engineering Department
Sharif University of Technology
Outline
• Markup Languages
– SGML, HTML, XML
• XML Building Blocks
• XML Applications
• Namespaces
• XML Schema
Semantic web - Computer Engineering Dept. - Spring 2008
SGML (ISO 8879)
• Standard Generalized Markup Language
• The international standard for defining descriptions of
structure and content in text documents
• Interchangeable: device-independent, system-independent
• Tags are not predefined
• Uses a DTD to validate the structure of the document
• Large, powerful, and very complex
• Heavily used in industrial and commercial applications for over
a decade
HTML (RFC 1866)
• HyperText Markup Language
• A small SGML application used on the Web (a DTD
and a set of processing conventions)
• Uses only a predefined set of tags
What is XML?
• eXtensible Markup Language
• A simplified version of SGML
• Maintains the most useful parts of SGML
• Designed so that SGML can be delivered over the Web
• More flexible and adaptable than HTML
• XHTML: a reformulation of HTML 4 in XML 1.0
HTML vs. XML
HTML:
• HTML is used to mark up text so it can be displayed to users.
• HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>).
• HTML uses a fixed, unchangeable set of tags.
XML:
• XML is used to mark up data so it can be processed by computers.
• XML describes only content, or “meaning”.
• In XML, you make up your own tags.
HTML vs. XML (2)
• HTML is for humans
– HTML describes web pages
– You don’t want to see error messages about the web pages you
visit
– Browsers ignore and/or correct as many HTML errors as they can,
so HTML is often sloppy
• XML is for computers
– XML describes data
– The rules are strict and errors are not allowed
• In this way, XML is like a programming language
– Current versions of most browsers can display XML
• However, browser support of XML is spotty at best
XML-related technologies
• DTD (Document Type Definition) and XML
Schemas are used to define legal XML tags and
their attributes for particular purposes
• XSLT (eXtensible Stylesheet Language
Transformations) and XPath are used to translate
from one form of XML to another
• SAX (Simple API for XML) is an event-based API for reading XML documents
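As a rough illustration of the SAX style of processing, the sketch below uses Python's standard xml.sax module to count start tags as the parser reports them. The TagCounter class, the count_tags helper, and the sample document are made up for this example; they are not part of the slides.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts how often each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # SAX calls this once for every start tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_text):
    handler = TagCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


# Illustrative document, not taken from the slides.
SAMPLE = "<note><to>Tove</to><from>Jani</from><to>Ann</to></note>"
counts = count_tags(SAMPLE)
```

The handler never sees the whole document at once; it only reacts to events, which is what distinguishes SAX from tree-based APIs.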
XML Building blocks - Elements
• Delimited by angle brackets
• Identify the nature of the content they surround
• General format: <element> … </element>
• Empty element: <empty-Element />
• XML Elements have Relationships
– Elements are related as parents and children
• Elements have Content
– Elements can have different content types:
• Element, mixed, simple, empty
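The parent/child relationships and content types can be explored with a small sketch using Python's standard xml.etree.ElementTree module. The library/book/title/note document is invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative document: <library> is the parent of <book>,
# which in turn is the parent of <title> and the empty <note/>.
DOC = "<library><book><title>XML Basics</title><note/></book></library>"

root = ET.fromstring(DOC)
book = root[0]                                # first child of the root
child_names = [child.tag for child in book]   # element content of <book>
title_text = book.find("title").text          # simple (text) content
note = book.find("note")
note_is_empty = note.text is None and len(note) == 0  # empty content
```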
XML Building blocks - Attributes
• Name-value pairs that occur inside start-tags after the element name, like:
<element attribute="value" />
• Provide additional information about elements that often is not part of the
data.
• Attributes and elements are somewhat interchangeable
• Should I use an element or an attribute?
• Example using just elements:
<name>
<first>David</first>
<last>Matuszek</last>
</name>
• Example using attributes:
<name first="David" last="Matuszek"></name>
• Rule of thumb: metadata (data about data) should be stored as attributes, and the
data itself should be stored as elements
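To see how the two styles differ for a program reading the data, here is a minimal sketch with Python's standard xml.etree.ElementTree module. It parses both forms of the name example; the variable names are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')

# Child elements are looked up by tag name and carry text content ...
first_from_elements = as_elements.findtext("first")
# ... while attributes are looked up by key on the element itself.
first_from_attributes = as_attributes.get("first")
```

Both forms carry the same information; the difference is that child elements can later hold structure of their own, while attribute values are always plain strings.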
XML Building blocks - Entities
Five special characters must be written as entities:
&amp; for & (almost always necessary)
&lt; for < (almost always necessary)
&gt; for > (not usually necessary)
&quot; for " (necessary inside double quotes)
&apos; for ' (necessary inside single quotes)
These entities can be used even in places where they
are not absolutely required.
These are the only predefined entities in XML.
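A short sketch of escaping in practice, using the escape function from Python's standard xml.sax.saxutils module. The sample strings are invented; by default escape handles &, < and >, and the optional second argument is used here to add the double-quote entity.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw_text = "AT&T says 1 < 2"          # illustrative text with special characters
escaped_text = escape(raw_text)        # replaces &, < and > with entities

# A parser turns the entities back into the original characters.
round_trip = ET.fromstring("<msg>" + escaped_text + "</msg>").text

# Extra entities can be requested, e.g. for text placed inside double quotes.
quoted = escape('say "hi"', {'"': "&quot;"})
```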
XML Building blocks - Declaration
The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>
– The XML declaration is optional in XML 1.0 and browsers do not need it, but
including it is good practice (so include it!)
– If present, the XML declaration must be first--not even
whitespace should precede it
– Note that the brackets are <? and ?>
– version is required whenever a declaration is present; "1.0" is by far the most common value (XML 1.1 exists but is rarely used)
– encoding can be "UTF-8" (an ASCII-compatible Unicode encoding) or "UTF-16" (also Unicode), or
something else, or it can be omitted
– standalone="yes" states that the document does not depend on external markup declarations, such as an external DTD
XML Building blocks - Processing instructions
• PIs (Processing Instructions) may occur anywhere in the
XML document (but usually first)
• A PI is a command to the program processing the XML
document to handle it in a certain way
• XML documents are typically processed by more than one
program
• Programs that do not recognize a given PI should just
ignore it
• General format of a PI: <?target instructions?>
• Example: <?xml-stylesheet type="text/css"
href="mySheet.css"?>
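The target/instructions split can be observed with Python's standard xml.dom.minidom module, which keeps processing instructions as nodes of the document. This is a sketch only; the surrounding <doc/> element is made up for the example.

```python
from xml.dom import minidom

# The stylesheet PI from the slide, followed by a made-up root element.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="mySheet.css"?>'
    "<doc/>"
)

document = minidom.parseString(DOC)
pis = [
    node
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
target = pis[0].target   # the name right after "<?"
data = pis[0].data       # everything else, passed through uninterpreted
```

Note that the XML declaration does not show up in the list: although it uses the same <? ... ?> brackets, it is not a processing instruction.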
XML Building blocks - Comments
• <!-- This is a comment in both HTML and XML -->
• Comments can be put almost anywhere in an XML document (but not inside a tag or before the XML declaration)
• Comments are useful for:
– Explaining the structure of an XML document
– Commenting out parts of the XML during development and testing
• The character sequence -- cannot occur in the comment
• Comments are not displayed by browsers, but can be
seen by anyone who looks at the source code
CDATA
• By default, all text inside an XML document is parsed
• You can force text to be treated as unparsed character
data by enclosing it in <![CDATA[ ... ]]>
• Any characters, even & and <, can occur inside a CDATA
• Whitespace inside a CDATA is (usually) preserved
• The only real restriction is that the character sequence ]]>
cannot occur inside a CDATA
• CDATA is useful when your text has a lot of illegal
characters (for example, if your XML document contains
some HTML text)
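A minimal sketch of the effect, using Python's standard xml.etree.ElementTree module: the HTML fragment inside the CDATA section is delivered as plain text rather than parsed as markup. The snippet element and its content are illustrative.

```python
import xml.etree.ElementTree as ET

# The < and & inside the CDATA section need no escaping.
DOC = "<snippet><![CDATA[<p>Tom & Jerry</p>]]></snippet>"

snippet = ET.fromstring(DOC)
cdata_text = snippet.text          # the raw characters, markup included
child_count = len(snippet)         # no <p> child element was created
```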
XML Syntax
• All XML elements must have a closing tag
• XML tags are case sensitive
• All XML elements must be properly nested
• All XML documents must have a root tag
• Attribute values must always be quoted
• With XML, white space is preserved
• With XML, a new line is always stored as LF
• Comments in XML: <!-- This is a comment -->
Well-formed XML
• Every element must have both a start tag and an end tag, e.g.
<name> ... </name>
– But empty elements can be abbreviated: <break />.
– XML tags are case sensitive
– XML tags may not begin with the letters xml, in any combination of
cases
• Elements must be properly nested, e.g. not <b><i>bold and
italic</b></i>
• Every XML document must have one and only one root element
• The values of attributes must be enclosed in single or double quotes,
e.g. <time unit="days">
• Character data cannot contain a literal < or & (write &lt; and &amp; instead)
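These rules can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module, which raises ParseError for input that is not well-formed; the is_well_formed helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "<name>David</name>": is_well_formed("<name>David</name>"),
    "bad nesting": is_well_formed("<b><i>bold and italic</b></i>"),
    "two roots": is_well_formed("<a/><b/>"),
    "unquoted attribute": is_well_formed("<time unit=days/>"),
    "case mismatch": is_well_formed("<Name></name>"),
    "bare ampersand": is_well_formed("<p>a & b</p>"),
}
```

Only the first entry is accepted; each of the others breaks one of the rules listed above.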
Displaying XML
• XML documents do not carry information about how to display the
data
• We can add display information to XML with
– CSS (Cascading Style Sheets)
– XSL (eXtensible Stylesheet Language) --- preferred
XML Applications (1)
Separate data
XML can Separate Data from HTML
• Store data in separate XML files
• Using HTML for layout and display
• Using Data Islands
• Data Islands can be bound to HTML elements
Benefits:
• Changes in the underlying data will not require any changes to your HTML
XML Applications (2)
Exchange data
XML is used to Exchange Data
• Text format
• Software-independent, hardware-independent
• Exchange data between incompatible systems, given that they agree on
the same tag definition.
• Can be read by many different types of applications
Benefits:
• Reduce the complexity of interpreting data
• Easier to expand and upgrade a system
XML Applications (3)
Store Data
XML can be used to Store Data
• Plain text file
• Store data in files or databases
• Applications can be written to store and retrieve information from the store
• Other clients and applications can access your XML files as data sources
Benefits:
• Accessible to more applications
XML Applications (4)
Create new language
XML can be used to Create new Languages, e.g.:
• WML (Wireless Markup Language), used to mark up Internet applications
for handheld devices like mobile phones (WAP)
• MusicXML, used for publishing musical scores
Names in XML
• Names (as used for tags and attributes) must begin with a
letter or underscore, and can consist of:
– Letters, both Roman (English) and foreign
– Digits, both Roman and foreign
– . (dot)
– - (hyphen)
– _ (underscore)
– : (colon), which should be used only for namespaces
– Combining characters and extenders (not used in English)
Namespaces
• Namespaces are a simple mechanism for creating
globally unique names for the elements and attributes of
your markup language.
• Benefits:
– De-conflicts the meaning of identical names in different markup
languages.
– Allows different markup languages to be mixed together without
ambiguity.
• Namespaces are implemented by requiring every XML
name to consist of two parts: a prefix and a local part:
<xsd:integer>
Namespaces and URIs
• A namespace is defined as a unique string
– To guarantee uniqueness, typically a URI (Uniform
Resource Identifier) is used, because the author
“owns” the domain
– The URI doesn't have to point to a “real” resource; it just has to be
a unique string
– Example: http://ce.sharif.edu/sw
• There are two ways to use namespaces:
– Declare a default namespace
– Associate a prefix with a namespace, then use the
prefix in the XML to refer to the namespace
Namespace syntax
• In any start tag you can use the reserved attribute name xmlns:
<book xmlns="http://ce.sharif.edu/sw">
– This namespace will be used as the default for all elements up to
the corresponding end tag
– You can override it with a specific prefix
• You can use almost this same form to declare a prefix:
<book xmlns:dave="http://ce.sharif.edu/sw">
– Use this prefix on every tag and attribute you want to use from
this namespace, including end tags--it is not a default prefix
<dave:chapter dave:number="1">To Begin</dave:chapter>
• You can use the prefix in the start tag in which it is defined:
<dave:book xmlns:dave="http://ce.sharif.edu/sw">
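The following sketch shows how a namespace-aware parser sees these declarations, using Python's standard xml.etree.ElementTree module. ElementTree replaces the prefix with the namespace URI in braces, so the prefix dave and a default namespace declaration end up producing the same element names. The two small documents are assembled from the fragments above for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://ce.sharif.edu/sw"

PREFIXED = (
    '<dave:book xmlns:dave="http://ce.sharif.edu/sw">'
    '<dave:chapter dave:number="1">To Begin</dave:chapter>'
    "</dave:book>"
)
DEFAULTED = '<book xmlns="http://ce.sharif.edu/sw"><chapter/></book>'

book = ET.fromstring(PREFIXED)
# The prefix used in a search is local to this call; only the URI matters.
chapter = book.find("dave:chapter", {"dave": NS})
chapter_number = chapter.get("{%s}number" % NS)

default_book = ET.fromstring(DEFAULTED)
```

The prefix itself is just shorthand: what identifies a name is the pair of namespace URI and local part.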
Review of XML rules
• Start with <?xml version="1.0"?>
• XML is case sensitive
• You must have exactly one root element that
encloses all the rest of the XML
• Every element must have a closing tag
• Elements must be properly nested
• Attribute values must be enclosed in double or
single quotation marks
• There are only five pre-declared entities
XML as a tree
• An XML document represents a hierarchy; a hierarchy is
a tree
novel
  foreword
    paragraph: This is the great American novel.
  chapter number="1"
    paragraph: It was a dark and stormy night.
    paragraph: Suddenly, a shot rang out!
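The hierarchy can be walked recursively, as in this sketch using Python's standard xml.etree.ElementTree module. The XML text is a reconstruction of the novel diagram (the exact markup is an assumption), and the outline function is a helper written for this example.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the novel tree shown in the diagram.
NOVEL = """<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(NOVEL))
```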
Extended document standards
• You can define your own XML tag sets, but here are some
already available:
– XHTML: HTML redefined in XML
– SMIL: Synchronized Multimedia Integration Language
– MathML: Mathematical Markup Language
– SVG: Scalable Vector Graphics
– DrawML: Drawing MetaLanguage
– ICE: Information and Content Exchange
– ebXML: Electronic Business with XML
– cXML: Commerce XML
– CBL: Common Business Library
XML Schema
XML Validation
• "Well Formed" XML document
– correct XML syntax
• "Valid" XML document
– “well formed”
– Conforms to the rules of a DTD (or an XML Schema)
• XML DTD
– defines the legal building blocks of an XML document
– Can be inline in XML or as an external reference
• XML Schema
– an XML-based alternative to DTD, more powerful
– Supports namespaces and data types
An Example XML with DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
XML Schemas
• “Schema” is a general term
– DTDs are a form of XML schemas
• When we say “XML Schemas,” we usually mean
the W3C XML Schema Language
– This is also known as “XML Schema Definition”
language, or XSD.
XSD vs. DTD
• DTDs provide a very weak specification language
– You can’t put any restrictions on text content
– You have very little control over mixed content (text plus elements)
– You have little control over ordering of elements
• DTDs are written in a strange (non-XML) format
– You need separate parsers for DTDs and XML
• The XML Schema Definition language solves these
problems
– XSD gives you much more control over structure and content
– XSD is written in XML
Referring to a schema
• To refer to a DTD in an XML document, the reference goes before
the root element:
– <?xml version="1.0"?>
<!DOCTYPE rootElement SYSTEM "url">
<rootElement> ... </rootElement>
• To refer to an XML Schema in an XML document, the reference
goes in the root element:
– <?xml version="1.0"?>
<rootElement
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="url.xsd">
...
</rootElement>
– The xmlns:xsi attribute (the XML Schema Instance reference) is required
– The xsi:noNamespaceSchemaLocation attribute is where your XML Schema definition can be found
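A program can read the schema reference like any other namespaced attribute. The sketch below uses Python's standard xml.etree.ElementTree module; it only looks the location up and does not fetch or apply the schema. The document is the placeholder example from above, written as an empty element.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = (
    '<rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="url.xsd"/>'
)

root = ET.fromstring(DOC)
# Namespaced attributes are addressed as {namespace-URI}localName.
schema_location = root.get("{%s}noNamespaceSchemaLocation" % XSI)
```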
The XSD document
• Since the XSD is written in XML, it can be confusing
which document we are talking about: the schema or the XML it describes
• The file extension is .xsd
• The root element is <schema>
• The XSD starts like this:
<?xml version="1.0"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<schema>
• The <schema> element may have attributes:
– xmlns:xs="http://www.w3.org/2001/XMLSchema"
• This is necessary to specify where all our XSD tags are
defined
– elementFormDefault="qualified"
• This means that all XML elements must be qualified (use a
namespace)
• It is highly desirable to qualify all elements, or problems will
arise when another schema is added
“Simple” and “complex” elements
• A “simple” element is one that contains text and nothing
else
– A simple element cannot have attributes
– A simple element cannot contain other elements
– A simple element cannot be empty
– However, the text can be of many different types, and may have
various restrictions applied to it
• If an element isn’t simple, it’s “complex”
– A complex element may have attributes
– A complex element may be empty, or it may contain text, other
elements, or both text and other elements
Defining a simple element
• A simple element is defined as
<xs:element name="name" type="type" />
where:
– name is the name of the element
– the most common values for type are
xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time
• Other attributes a simple element may have:
– default="default value" (used if no other value is specified)
– fixed="value" (no other value may be specified)
Defining an attribute
• Attributes themselves are always declared as simple types
• An attribute is defined as
<xs:attribute name="name" type="type" />
where:
– name and type are the same as for xs:element
• Other attributes an attribute declaration may have:
– default="default value" (used if no other value is specified)
– fixed="value" (no other value may be specified)
– use="optional" (the attribute is not required; this is the default)
– use="required" (the attribute must be present)
Restrictions, or “facets”
• The general form for putting a restriction on a
text value is:
– <xs:element name="name">
<xs:simpleType>
<xs:restriction base="type">
... the restrictions ...
</xs:restriction>
</xs:simpleType>
</xs:element>
(or xs:attribute)
• For example:
– <xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="140"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
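To make the meaning of the facets concrete, here is a hand-rolled sketch in Python that reads the two bounds out of an age declaration (wrapped in xs:simpleType, as XSD requires) and applies them to candidate values. It is not a schema validator: Python's standard library has none, and read_bounds and age_is_allowed are hypothetical helpers written only for this illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="140"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def read_bounds(schema_text, element_name):
    """Return (minInclusive, maxInclusive) declared for the named element."""
    root = ET.fromstring(schema_text)
    for element in root.iter(XS + "element"):
        if element.get("name") == element_name:
            restriction = element.find(XS + "simpleType/" + XS + "restriction")
            low = int(restriction.find(XS + "minInclusive").get("value"))
            high = int(restriction.find(XS + "maxInclusive").get("value"))
            return low, high
    raise KeyError(element_name)


def age_is_allowed(text):
    """Emulate the two facets by hand; a real validator does much more."""
    low, high = read_bounds(SCHEMA, "age")
    return low <= int(text) <= high
```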
Restrictions on numbers
• minInclusive -- number must be ≥ the given value
• minExclusive -- number must be > the given value
• maxInclusive -- number must be ≤ the given value
• maxExclusive -- number must be < the given value
• totalDigits -- number must have no more than value digits in total
• fractionDigits -- number must have no more than value
digits after the decimal point
Restrictions on strings
• length -- the string must contain exactly value characters
• minLength -- the string must contain at least value characters
• maxLength -- the string must contain no more than value characters
• pattern -- the value is a regular expression that the string must match
• whiteSpace -- not really a “restriction”--tells what to do with
whitespace
– value="preserve": Keep all whitespace
– value="replace": Change all whitespace characters to spaces
– value="collapse": Remove leading and trailing whitespace, and
replace all sequences of whitespace with a single space
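The three whiteSpace settings can be imitated in a few lines of Python. This is a sketch of the behaviour described above, not code from any XSD library; apply_white_space is a name invented for the example.

```python
import re


def apply_white_space(value, mode):
    """Imitate the XSD whiteSpace facet for the three documented modes."""
    if mode == "preserve":
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # After replacing, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError("unknown whiteSpace value: " + mode)


sample = "  a\tb\n\nc  "
preserved = apply_white_space(sample, "preserve")
replaced = apply_white_space(sample, "replace")
collapsed = apply_white_space(sample, "collapse")
```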
Enumeration
• An enumeration restricts the value to be one of a
fixed set of values
• Example:
– <xs:element name="season">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Spring"/>
<xs:enumeration value="Summer"/>
<xs:enumeration value="Autumn"/>
<xs:enumeration value="Fall"/>
<xs:enumeration value="Winter"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
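Because an XSD is itself XML, the permitted values can be read with an ordinary XML parser. The sketch below does so with Python's standard xml.etree.ElementTree module; the enclosing xs:schema element is added so the fragment parses on its own, and allowed_values is a helper written for this example, not a validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="season">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Spring"/>
        <xs:enumeration value="Summer"/>
        <xs:enumeration value="Autumn"/>
        <xs:enumeration value="Fall"/>
        <xs:enumeration value="Winter"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def allowed_values(schema_text):
    """Collect the value of every xs:enumeration facet, in document order."""
    root = ET.fromstring(schema_text)
    return [facet.get("value") for facet in root.iter(XS + "enumeration")]


seasons = allowed_values(SCHEMA)
```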
Complex elements
• A complex element is defined as
<xs:element name="name">
<xs:complexType>
... information about the complex type...
</xs:complexType>
</xs:element>
• Example:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
• <xs:sequence> says that elements must occur in this order
• Remember that attributes are always simple types
Declaration and use
• So far we’ve been talking about how to
declare types, not how to use them
• To use a type we have declared, use it as
the value of type="..."
– Examples:
• <xs:element name="student" type="person"/>
• <xs:element name="professor" type="person"/>
– Scope is important: you cannot use a type if it is
local to some other type
xs:sequence
• We’ve already seen an example of a
complex type whose elements must occur in
a specific order:
• <xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
xs:all
• xs:all allows elements to appear in any order
• <xs:element name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:all>
</xs:complexType>
</xs:element>
• Despite the name, the members of an xs:all group can
occur once or not at all
• You can use minOccurs="0" to specify that an element is
optional (default value is 1)
– In this context, maxOccurs is always 1
Empty elements
• Empty elements are (ridiculously) complex
• <xs:complexType name="counter">
<xs:complexContent>
<xs:restriction base="xs:anyType">
<xs:attribute name="count" type="xs:integer"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
Mixed elements
• Mixed elements may contain both text and elements
• We add mixed="true" to the xs:complexType element
• The text itself is not mentioned in the element, and
may go anywhere (it is basically ignored)
• <xs:complexType name="paragraph" mixed="true">
<xs:sequence>
<xs:element name="someName" type="xs:anyType"/>
</xs:sequence>
</xs:complexType>
Extensions
• You can base a complex type on another complex
type
• <xs:complexType name="newType">
<xs:complexContent>
<xs:extension base="otherType">
...new stuff...
</xs:extension>
</xs:complexContent>
</xs:complexType>
Predefined string types
• Recall that a simple element is defined as:
<xs:element name="name" type="type" />
• Here are a few of the possible string types:
– xs:string -- a string
– xs:normalizedString -- a string that doesn’t contain tabs,
newlines, or carriage returns
– xs:token -- a string that doesn’t contain any whitespace
other than single spaces
• Allowable restrictions on strings:
– enumeration, length, maxLength, minLength, pattern,
whiteSpace
Predefined date and time types
• xs:date -- A date in the format CCYY-MM-DD, for
example, 2002-11-05
• xs:time -- A time in the format hh:mm:ss (hours,
minutes, seconds)
• xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
– The T is part of the syntax
• Allowable restrictions on dates and times:
– enumeration, minInclusive, minExclusive,
maxInclusive, maxExclusive, pattern, whiteSpace
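As a rough illustration of the CCYY-MM-DD format, the Python sketch below accepts only text of that shape and rejects impossible dates. It covers just the basic form shown above (the full xs:date type also allows an optional time zone, which is ignored here), and parse_xs_date is a hypothetical helper, not part of any XSD library.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_xs_date(text):
    """Return a date for CCYY-MM-DD text, or None if it does not fit."""
    match = XS_DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # for example month 13, or day 31 in a 30-day month
        return None
```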
Predefined numeric types
• Here are some of the predefined numeric types:
xs:decimal, xs:byte, xs:short, xs:int, xs:long
xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger
• Allowable restrictions on numeric types:
– enumeration, minInclusive, minExclusive, maxInclusive,
maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace
Questions?
References
• http://www.w3.org/XML/
• http://www.w3.org/XML/Schema