5. Data Description and Transformation
1. XML
2. XPath
3. XSL / XSLT
4. DTD
5. XSD
6. DOM
SWE 444 - Internet & Web Application Development
5.1 XML
What is XML?
Why XML?
Brief History and Versions
Sample XML Documents
XML Namespaces
What is XML?
XML stands for EXtensible Markup Language
A meta-language for descriptive markup: you invent your own tags
XML uses a Document Type Definition (DTD) or an XML Schema to
describe the data
XML with a DTD or XML Schema is designed to be self-descriptive
Built-in internationalization via Unicode
Built-in error-handling
A forgotten tag, or an attribute without quotes renders an XML
document unusable
Tons of support from the big IT companies
Why XML?
Much shareable data resides in computer systems and databases that
use incompatible formats and
conflicting hardware and/or software.
One of the most time-consuming challenges for developers has
been to exchange data between such systems over the Internet
Converting the data to XML can greatly reduce the complexity and
create data that can be read by many different applications
XML data is stored in plain text format – hardware and software
independent
XML can be used to create new languages
Allows us to define our own markup languages
Brief XML History
SGML (Standard Generalized Markup Language)
ISO Standard, 1986, for data storage & exchange
Metalanguage for defining languages (through DTDs)
A famous SGML language: HTML
Separation of content and display
Used by the U.S. government and its contractors, large manufacturing
companies, technical information publishers, ...
SGML reference is 600 pages long
XML
W3C recommendation in 1998
Simple subset (80/20 rule) of SGML: “ASCII of the Web”,
“Semantic Web”
XML specification is 26 pages long
… Brief XML History
1986: SGML becomes a standard
1989: Tim Berners-Lee creates the WWW
1994: W3C established
1998: XML 1.0 W3C Recommendation
Jan 2000: XHTML becomes W3C Recommendation
A Reformulation of HTML 4 in XML 1.0
Feb 2004: W3C XML 1.0 (Third Edition) Recommendation
http://www.w3.org/TR/2004/REC-xml-20040204/
Feb 2004: XML 1.1 Recommendation
http://www.w3.org/TR/2004/REC-xml11-20040204/
updates XML to use Unicode 3
XML and HTML
XML is not a replacement for HTML
In future Web development, XML is likely to be used to describe
data while HTML will be used to format and display the same
data (one interpretation of XML)
XML and HTML were designed with different goals
XML was designed to describe data and to focus on what data is
HTML was designed to display data and to focus on how data
looks.
XML describes only content, or “meaning”
HTML describes both structure (e.g. <p>, <h2>, <em>) and
appearance (e.g. <br>, <font>, <i>)
XML is for computers while HTML is for humans
XML is used to mark up data so it can be processed by
computers
HTML is used to mark up text so it can be displayed to users
XML does not DO anything
XML was not designed to DO anything
A piece of software must be written to do something (send, receive or
display the document)
The following example is book information stored as XML:
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981'
        ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  …
</bookstore>
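The point that software must be written to do something with the XML can be sketched as follows. This is only an illustration using Python's standard xml.etree.ElementTree module, which the slides do not mention; the function name summarize_books is hypothetical, and the elided part of the bookstore is simply left out.

```python
import xml.etree.ElementTree as ET

# The bookstore example from the slide, without the elided entries.
BOOKSTORE_XML = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
</bookstore>"""


def summarize_books(xml_text):
    """Return one (title, author, price) tuple per <book> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        title = book.findtext("title")
        first = book.findtext("author/first-name")
        last = book.findtext("author/last-name")
        price = float(book.findtext("price"))
        rows.append((title, first + " " + last, price))
    return rows


# The XML itself does nothing; this loop is the "piece of software".
for title, author, price in summarize_books(BOOKSTORE_XML):
    print(f"{title} by {author}: {price:.2f}")
```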
XML is Free and Extensible
XML tags are not predefined
You must "invent" your own tags
The tags used to mark up HTML documents and the
structure of HTML documents are predefined
The author of HTML documents can only use tags
that are defined in the HTML standard
XML allows authors to define their own tags
and their own document structure
XML Future
XML is going to be everywhere
A large number of software vendors adopted the XML standard very quickly
XML is a cross-platform, software and hardware independent tool for
transmitting information.
[Figure: Application X connected via XML to Documents, Repository, Configuration, and Database]
Benefits of XML
Open W3C standard – non-proprietary
Representation of data across heterogeneous environments
Cross platform
Allows for high degree of interoperability
E.g., ability to exchange data between incompatible applications with
incompatible data formats
Strict rules that make it relatively easy to write XML parsers
Syntax
Structure
Case sensitive
XML can make data more useful
The software, hardware, and application independence of XML makes data available
to more users, not only HTML browsers
Components of an XML Document
XML declaration
Processing instructions
Encoding specification (Unicode by default)
Namespace declaration
Schema declaration
Elements
Each element has a beginning and ending tag
<TAG_NAME>...</TAG_NAME>
Elements can be empty (<TAG_NAME />)
Attributes
Describe an element; e.g. data type, data range, etc.
Can only appear on beginning tag
Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>
Processing Instructions: the <?...?> lines at the top
Elements: ELEMENT1 and ELEMENT2
Elements with Attributes: ELEMENT3 and ELEMENT4
XML Declaration
The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The XML declaration is optional in XML 1.0 (browsers do not require it),
but the specification recommends it (so include it!)
If present, the XML declaration must be first--not even
whitespace should precede it
Note that the brackets are <? and ?>
The version attribute is required
encoding can be "UTF-8" or "UTF-16" (both are Unicode encodings; UTF-8 is ASCII-compatible),
or something else, or it can be omitted
An XML document is standalone if it makes use of no
external markup (DTD) declarations
Default value for this attribute is no
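The rule that nothing may precede the XML declaration can be checked with a small sketch. It assumes Python's standard xml.etree.ElementTree parser (not something the slides use); the helper name parses and the note element are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


WITH_DECLARATION = '<?xml version="1.0" standalone="yes"?><note>hi</note>'
LEADING_SPACE = " " + WITH_DECLARATION   # whitespace before the declaration
NO_DECLARATION = "<note>hi</note>"       # declaration omitted entirely

print(parses(WITH_DECLARATION))
print(parses(LEADING_SPACE))
print(parses(NO_DECLARATION))
```

With this parser, the document without any declaration is accepted, while the one with a single space in front of the declaration is rejected.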
Processing Instructions
A PI is a command to the program processing the XML document to
handle it in a certain way
PIs (Processing Instructions) may occur anywhere in the XML document
(but usually first)
XML documents are typically processed by more than one program
Programs that do not recognize a given PI should just ignore it
General format of a PI:
<?target instructions?>
Example:
<?xml-stylesheet type="text/css" href="mySheet.css"?>
XML Elements
An XML element is everything from the element's start
tag to the element's end tag
XML Elements are extensible and they have
relationships
Related as parents and children
XML Elements have simple naming rules
Names can contain letters, numbers, and other characters
Names must not start with a number or punctuation character
Names must not start with the letters xml (or XML or Xml ..)
Names cannot contain spaces
XML Attributes
XML elements can have attributes
Data can be stored in child elements or in attributes
Should you avoid using attributes?
Here are some of the problems using attributes:
attributes cannot contain multiple values (child elements can)
attributes are not easily expandable (for future changes)
attributes cannot describe structures (child elements can)
attributes are more difficult to manipulate by program code
attribute values are not easy to test against a Document Type Definition
(DTD) - which is used to define the legal elements of an XML document
Experience shows that attributes are handy in HTML but child
elements should be used in their place in XML
Use attributes only to provide information that is not relevant to the
data
An XML Document
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981'
        ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
Another XML Document
<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <city>North Place</city>, <state>NX</state>
  <country>USA</country>
  High Temp: <high scale="F">103</high>
  Low Temp: <low scale="F">70</low>
  Morning: <morning>Partly cloudy, Hazy</morning>
  Afternoon: <afternoon>Sunny &amp; hot</afternoon>
  Evening: <evening>Clear and Cooler</evening>
</weatherReport>
XML Validation
There is a difference between a well-formed XML
document and a valid XML document
A well-formed XML document is one with correct XML
syntax
See next slide for well-formedness rules
XML syntax is constrained by a grammar (DTD or
Schema) that governs the permitted tag names,
attachment of attributes to tags, and so on.
A well-formed XML document that also conforms to a
given DTD or schema is said to be valid.
Every valid XML document is well-formed but the reverse is not
necessarily the case
Rules For Well-Formed XML
There must be one, and only one, root element
All XML elements must have a closing tag
Sub-elements must be properly nested
Attributes are optional
Defined by an optional schema
Attribute values must be enclosed in "" or ''
Processing instructions are optional
XML is case-sensitive
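The well-formedness rules above can be tried out with a short sketch. It assumes Python's standard xml.etree.ElementTree parser, which is non-validating and therefore checks only well-formedness; the sample documents and the helper name is_well_formed are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One good document, then one violation per rule from the list above.
SAMPLES = {
    "well-formed": "<note><to>Ann</to></note>",
    "two root elements": "<a/><b/>",
    "missing closing tag": "<note><to>Ann</note>",
    "badly nested": "<b><i>text</b></i>",
    "unquoted attribute value": "<note id=1/>",
    "case mismatch": "<Note></note>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {is_well_formed(text)}")
```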
XML DTD
A DTD defines the legal elements of an XML document
It defines the document structure with a list of legal elements
XML Schema
XML Schema is an XML based alternative to DTD
Errors in XML documents will stop the XML program
The W3C XML specification states that a program should not continue
to process an XML document if it finds a well-formedness error
Processing an XML document requires a software program called
an XML Parser (or XML Processor)
http://www.xml.com/xml/pub/Guide/xml_parsers
There are two flavors of parsers:
Non-validating: checks for a document’s well-formedness (e.g.,
Browsers)
Validating: checks for a document’s validity
Browsers Support for XML
Netscape 6 supports XML
Internet Explorer 5.0 supports the XML 1.0 standard
Internet Explorer 5.0 has the following XML support:
Viewing of XML documents
Displaying XML with CSS
Transforming and displaying XML with XSL
XML embedded in HTML as Data Islands
Binding XML data to HTML elements
Access to the XML DOM
Full support for W3C DTD standards
Viewing XML Documents
Raw XML files can be viewed in IE 5.0 (and higher) and
in Netscape 6
XML documents do not carry information about how to
display the data
To make them display like a web page, you have to add some
display information
Different solutions to the display problem, using CSS,
XSL, XML Data Islands, and JavaScript
Will you be writing your future Homepages in XML?
Most Microsoft pages are XML based and the server converts
them to HTML on-the-fly when requested
Displaying XML with CSS
With CSS (Cascading Style Sheets) you can
add display information to an XML document
Formatting XML with CSS is NOT the future of
the Web
Formatting with XSL will be the new standard
Example: the xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  ....
</CATALOG>
Example: the css file
CATALOG
{ background-color: white; width: 100%; }
CD
{ display: block; margin-bottom: 30pt; margin-left: 0; }
TITLE
{ color: red; font-size: 20pt; }
ARTIST
{ color: blue; font-size: 20pt; }
COUNTRY,PRICE,YEAR,COMPANY
{ display: block; color: black; margin-left: 20pt; }
Displaying XML with XSL
With XSL you can add display information to
your XML document
XSL is the preferred style sheet language of
XML
XSL (the eXtensible Stylesheet Language) is far
more sophisticated than CSS
One way to use XSL is to transform XML into HTML
before it is displayed by the browser
Example: the xml file
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  …
</breakfast_menu>
Example: the xsl file
<?xml version="1.0" encoding="ISO-8859-1"?>
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/TR/xhtml1/strict">
  <body style="font-family:Arial,helvetica,sans-serif;font-size:12pt; background-color:#EEEEEE">
    <xsl:for-each select="breakfast_menu/food">
      <div style="background-color:teal;color:white;padding:4px">
        <span style="font-weight:bold;color:white">
          <xsl:value-of select="name"/></span>
        - <xsl:value-of select="price"/>
      </div>
      <div style="margin-left:20px;margin-bottom:1em;font-size:10pt">
        <xsl:value-of select="description"/>
        <span style="font-style:italic">
          (<xsl:value-of select="calories"/> calories per serving)
        </span>
      </div>
    </xsl:for-each>
  </body>
</html>
View the result in IE 6
XML Embedded in HTML
XML can be embedded within HTML pages in Data Islands
Manipulated via client side script or data binding
The unofficial <xml> tag is used to embed XML data within HTML
The id attribute of the <xml> tag defines an ID for the data island, and the
src attribute points to the XML file to embed:
<html>
  <body>
    <xml id="note" src="note.xml"></xml>
  </body>
</html>
The next step is to format and display the data in the data island by binding
it to HTML elements.
Bind Data Island to HTML Elements
Data Islands can be bound to HTML elements (like HTML tables)
<html>
  <body>
    <xml id="cdcat" src="cd_catalog.xml"></xml>
    <table border="1" datasrc="#cdcat">
      <tr>
        <td> <span datafld="ARTIST"> </span> </td>
        <td> <span datafld="TITLE"> </span> </td>
      </tr>
    </table>
  </body>
</html>
An XML data island with ID "cdcat" is loaded from an external XML file
An HTML table is bound to the data Island with a datasrc attribute
The td elements are bound to the XML data with a datafld attribute inside a span.
The Microsoft XML Parser
To read and update an XML document, you need an XML parser
The Microsoft XML parser comes with Microsoft Internet Explorer
5.0
Once you have installed IE 5.0, the parser is available to scripts
inside HTML documents.
The parser features a language-neutral programming model that
supports:
JavaScript, VBScript, Perl, VB, Java, C++ and more
W3C XML 1.0 and XML DOM
DTD and validation
You can create an XML document object with the following code:
var xmlDoc=new ActiveXObject("Microsoft.XMLDOM")
Loading an XML file into the parser
XML files can be loaded into the parser using script code.
The following code loads an XML document (note.xml)
into the XML parser:
<script type="text/javascript">
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
// ....... processing the document goes here
</script>
The second line in the code above creates an instance of the
Microsoft XML parser
The third line turns off asynchronous loading, to make sure that the
script will not continue execution before the document is fully
loaded
The fourth line tells the parser to load the XML document called
note.xml
We will revisit these issues later
Namespaces
XML allows you to define a new document format by combining and
reusing other formats
This can lead to name conflicts since the document formats being combined
may have the same element names that are used for different purposes
Namespaces allow authors to differentiate between tags of the same name
(using a prefix)
That is, name conflicts are solved using a prefix
Frees author to focus on the data and decide how to best describe it
The W3C namespace specification states that a namespace should be
identified by a URI (Uniform Resource Identifier)
A URI is a string of characters which identifies an Internet resource
A URL is the most common URI used to identify resources and their location on
the Internet
Another, less common type of URI is the URN (Uniform Resource Name)
When a URL is used in a namespace declaration, the URL does NOT have to
represent a live server
The only purpose is to give the namespace a unique name. However, very often
companies use the namespace as a pointer to a real Web page containing information
about the namespace
Namespaces: Declaration
Namespace declaration examples:
xmlns:bk="http://www.example.com/bookinfo/"
xmlns:bk="urn:mybookstuff.org:bookinfo"
In xmlns:bk="http://www.example.com/bookinfo/", the whole attribute is the namespace declaration, bk is the prefix, and the quoted value is the URI (URL)
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>
    19.99</bk:PRICE>
</bk:BOOK>
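How a namespace-aware parser sees prefixed names can be sketched as follows. This assumes Python's standard xml.etree.ElementTree module, which is not part of the slides; it reports each qualified name as {URI}localname, so the prefix itself is only a shorthand. The NS mapping is a lookup table chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Second example from the slide, with the price written on one line.
BOOK_XML = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>"""

# Prefix-to-URI table used only for the lookups below (an assumption of this sketch).
NS = {"bk": "http://www.bookstuff.org/bookinfo", "money": "urn:finance:money"}

root = ET.fromstring(BOOK_XML)
title = root.find("bk:TITLE", NS).text
price = root.find("bk:PRICE", NS)
currency = price.get("{urn:finance:money}currency")

print(root.tag)    # the prefix is replaced by the namespace URI in braces
print(title, price.text, currency)
```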
Namespaces: Default Namespace
An XML namespace declared without a prefix
becomes the default namespace for all
sub-elements
All elements without a prefix will belong to the
default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Namespaces: Scope
Unqualified elements belong to the inner-most
default namespace.
BOOK, TITLE, and AUTHOR belong to the default BOOK
namespace
PUBLISHER and NAME belong to the default PUBLISHER
namespace
<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
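The scoping rule can be observed with a small sketch, again assuming Python's standard xml.etree.ElementTree module rather than anything the slides prescribe. The helper namespace_of is hypothetical; it splits the parser's {URI}localname form back into its two parts.

```python
import xml.etree.ElementTree as ET

SCOPE_XML = """<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def namespace_of(element):
    """Split a '{uri}local' tag into (uri, local); uri is None if unqualified."""
    if element.tag.startswith("{"):
        uri, local = element.tag[1:].split("}", 1)
        return uri, local
    return None, element.tag


# Map each element name to the namespace it ended up in.
scopes = {}
for element in ET.fromstring(SCOPE_XML).iter():
    uri, local = namespace_of(element)
    scopes[local] = uri

for local, uri in scopes.items():
    print(f"{local}: {uri}")
```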
Namespaces: Attributes
Unqualified attributes do NOT belong to any
namespace
Even if there is a default namespace
They don’t need to since scope of attributes is only
within the element for which they are attributes
This differs from elements, which belong to the
default namespace
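A minimal sketch of this difference, assuming Python's standard xml.etree.ElementTree module; the isbn attribute and its value are made up for illustration. The element picks up the default namespace, while the unprefixed attribute keeps its plain name.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with a default namespace and one unqualified attribute.
book = ET.fromstring(
    '<BOOK xmlns="http://www.bookstuff.org/bookinfo" isbn="0-000"/>'
)

element_name = book.tag              # qualified by the default namespace
attribute_names = list(book.attrib)  # the attribute name stays unqualified

print(element_name)
print(attribute_names)
```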
Entities
Entities provide a mechanism for textual substitution for special
characters, e.g.
Entity      Substitution
&lt;        <
&amp;       &
XML parsers normally parse all the text in an XML document
When an XML element is parsed, the text between the XML tags is
also parsed
If you place special characters like "<" inside an XML element, it will
generate an error because the parser interprets it as the start of a
new element
Entity references are used to avoid such errors
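The effect of entity references can be sketched with Python's standard library (an assumption of this example, not something the slides use): xml.sax.saxutils.escape produces the entity references, and the parser substitutes the original characters back. The condition element and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if a < b & b < c"
escaped = escape(raw_text)  # replaces & with &amp; and < with &lt;


def text_of(body):
    """Wrap body in a <condition> element and return its parsed text, or None on error."""
    try:
        return ET.fromstring("<condition>" + body + "</condition>").text
    except ET.ParseError:
        return None


print(escaped)
print(text_of(escaped))   # the parser turns the entities back into < and &
print(text_of(raw_text))  # the unescaped text is rejected
```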
CDATA
By default, all text inside an XML document is parsed
You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>
Any characters, even & and <, can occur inside a CDATA
Whitespace inside a CDATA is (usually) preserved
The only real restriction is that the character sequence ]]> cannot occur inside a CDATA
CDATA is useful when your text has a lot of illegal characters (for example, if your XML document
contains some HTML text)
Example:
<?xml version='1.0'?>
<myTag>
<![CDATA[
function matchwo(a,b){
  if (a < b && a < 0)
    return 1;
  else
    return 0;
}
]]>
</myTag>
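A short sketch of how a parser treats a CDATA section, assuming Python's standard xml.etree.ElementTree module (not used in the slides). The script text is a shortened, hypothetical variant of the function above.

```python
import xml.etree.ElementTree as ET

SCRIPT = "if (a < b && a < 0) return 1;"


def parsed_text(xml_text):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


with_cdata = parsed_text("<myTag><![CDATA[" + SCRIPT + "]]></myTag>")
without_cdata = parsed_text("<myTag>" + SCRIPT + "</myTag>")

print(with_cdata)     # the characters < and & come through unchanged
print(without_cdata)  # None: the same text outside CDATA is not well-formed
```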
5.2 XPath
What is XPath?
Sample Syntactic Elements
Path
Slashes
Brackets
Stars
Arithmetic Expressions
Some XPath Functions
What is XPath?
XPath is a syntax used for selecting parts of an XML
document
The way XPath describes paths to elements is similar to
the way an operating system describes paths to files
XPath is almost a small programming language; it has
functions, tests, and expressions
XPath is a W3C standard
http://www.w3.org/TR/xpath
XPath is not itself written as XML, but is used heavily in
XSLT
Terminology
<library>
  <book>
    <chapter>
    </chapter>
    <chapter>
      <section>
        <paragraph/>
        <paragraph/>
      </section>
    </chapter>
  </book>
</library>
library is the parent of book; book is the parent of the two chapters
The two chapters are the children of book, and the section is the child of the second chapter
The two chapters of the book are siblings (they have the same parent)
library, book, and the second chapter are the ancestors of the section
The two chapters, the section, and the two paragraphs are the descendants of the book
Paths
Operating system | XPath
/ = the root directory | /library = the root element (if named library)
/users/dave/foo = the file named foo in dave in users | /library/book/chapter/section = every section element in a chapter in every book in the library
foo = the file named foo in the current directory | section = every section element that is a child of the current element
. = the current directory | . = the current element
.. = the parent directory | .. = parent of the current element
/users/dave/* = all the files in /users/dave | /library/book/chapter/* = all the elements in /library/book/chapter
Slashes
A path that begins with a / represents an absolute path, starting
from the top of the document
Example: /email/message/header/from
Note that even an absolute path can select more than one element
A slash by itself means “the whole document”
A path that does not begin with a / represents a path starting from
the current element
Example: header/from
A path that begins with // can start from anywhere in the
document
Example: //header/from selects every from element that is a child
of a header element
This can be expensive, since it involves searching the entire
document
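The difference between a fixed path and a // search can be sketched as follows. This assumes Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to an element, so /email/... is written here as ./... from the root element. The sample document, including the attachment element and the addresses, is hypothetical.

```python
import xml.etree.ElementTree as ET

EMAIL_XML = """<email>
  <message>
    <header><from>ann@example.com</from></header>
  </message>
  <message>
    <header><from>bob@example.com</from></header>
    <attachment>
      <header><from>carol@example.com</from></header>
    </attachment>
  </message>
</email>"""

root = ET.fromstring(EMAIL_XML)  # root is the <email> element

# Equivalent of /email/message/header/from: note that it selects two elements.
fixed_path = [e.text for e in root.findall("./message/header/from")]

# Equivalent of //header/from: every <from> child of a <header>, at any depth.
anywhere = [e.text for e in root.findall(".//header/from")]

print(fixed_path)
print(anywhere)
```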
Brackets and last()
A number in brackets selects a particular matching child
Example: /library/book[1] selects the first book of the library
Example: //chapter/section[2] selects the second section of
every chapter in the XML document
Example: //book/chapter[1]/section[2]
Only matching elements are counted; for example, if a book
has both sections and exercises, the latter are ignored when
counting sections
The function last() in brackets selects the last matching
child
Example: /library/book/chapter[last()]
You can even do simple arithmetic
Example: /library/book/chapter[last()-1]
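Positional predicates can be tried with a small sketch, assuming Python's standard xml.etree.ElementTree module and its limited XPath subset; paths are written relative to the root element. The id attributes and the exercise element are hypothetical additions that make the results easy to read.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
  <book>
    <chapter id="c1"><section id="s1"/><exercise/><section id="s2"/></chapter>
    <chapter id="c2"><section id="s3"/></chapter>
    <chapter id="c3"/>
  </book>
</library>"""

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element


def ids(path):
    """Return the id attribute of every element selected by path."""
    return [e.get("id") for e in root.findall(path)]


first_chapter = ids("./book/chapter[1]")          # /library/book/chapter[1]
second_sections = ids(".//chapter/section[2]")    # //chapter/section[2]
last_chapter = ids("./book/chapter[last()]")      # /library/book/chapter[last()]
next_to_last = ids("./book/chapter[last()-1]")    # /library/book/chapter[last()-1]

print(first_chapter, second_sections, last_chapter, next_to_last)
```

The exercise element between the two sections of the first chapter is not counted, so section[2] still selects s2.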
Stars
A star, or asterisk, is a “wild card”--it means “all
the elements at this level”
Example: /library/book/chapter/* selects every
child of every chapter of every book in the library
Example: //book/* selects every child of every book
(chapters, tableOfContents, index, etc.)
Example: /*/*/*/paragraph selects every paragraph
that has exactly three ancestors
Example: //* selects every element in the entire
document
Attributes I
You can select attributes by themselves, or elements
that have certain attributes
Remember: an attribute consists of a name-value pair, for
example in <chapter num="5">, the attribute is named num
To choose the attribute itself, prefix the name with @
Example: @num will choose every attribute named num
Example: //@* will choose every attribute, everywhere in the
document
To choose elements that have a given attribute, put the
attribute name in square brackets
Example: //chapter[@num] will select every chapter element
(anywhere in the document) that has an attribute named num
Attributes II
//chapter[@num] selects every chapter
element with an attribute num
//chapter[not(@num)] selects every chapter
element that does not have a num attribute
//chapter[@*] selects every chapter element
that has any attribute
//chapter[not(@*)] selects every chapter
element with no attributes
Values of attributes
//chapter[@num='3'] selects every chapter element with
an attribute num with value 3
The normalize-space() function can be used to remove
leading and trailing spaces from a value before
comparison
Example: //chapter[normalize-space(@num)="3"]
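Attribute tests can be sketched with Python's standard xml.etree.ElementTree module (an assumption of this example). Its XPath subset supports [@num] and [@num='3'] but not not() or normalize-space(), so those two are emulated in plain Python below. The sample chapters are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<book>
  <chapter num="1"/>
  <chapter num="3"/>
  <chapter/>
  <chapter num=" 3 "/>
</book>""")

# //chapter[@num]: chapters that have a num attribute
with_num = root.findall(".//chapter[@num]")

# //chapter[@num='3']: the value must match exactly, spaces included
exactly_3 = root.findall(".//chapter[@num='3']")

# Emulation of //chapter[normalize-space(@num)="3"]
normalized_3 = [
    c for c in root.iter("chapter")
    if " ".join(c.get("num", "").split()) == "3"
]

# Emulation of //chapter[not(@num)]
without_num = [c for c in root.iter("chapter") if "num" not in c.attrib]

print(len(with_num), len(exactly_3), len(normalized_3), len(without_num))
```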
Arithmetic Expressions
+     add
-     subtract
*     multiply
div   divide (not /)
mod   modulo (remainder)
Equality Tests
=     "equals" (Notice it's not ==)
!=    "not equals"
But it's not that simple!
value = node-set will be true if the node-set contains any
node with a value that matches value
value != node-set will be true if the node-set contains any
node with a value that does not match value
Hence,
value = node-set and value != node-set may both be
true at the same time!
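The node-set comparison rule can be modelled in a few lines. This is a plain Python emulation of the semantics described above, not an XPath engine; the function names and the sample values are hypothetical, and a node-set is represented as a list of string values.

```python
def xpath_equals(value, node_values):
    """value = node-set: true if ANY node has a value equal to value."""
    return any(node == value for node in node_values)


def xpath_not_equals(value, node_values):
    """value != node-set: true if ANY node has a value different from value."""
    return any(node != value for node in node_values)


chapter_nums = ["3", "4"]  # values of a hypothetical node-set

# Both tests hold at once, because one node matches and another does not.
both_true = xpath_equals("3", chapter_nums) and xpath_not_equals("3", chapter_nums)

# With an empty node-set there is no node to compare, so neither test holds.
either_on_empty = xpath_equals("3", []) or xpath_not_equals("3", [])

print(both_true, either_on_empty)
```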
Other Boolean Operators
and   (infix operator)
or    (infix operator)
Example: count = 0 or count = 1
not() (function)
The following are used for numerical comparisons only:
<     "less than"
<=    "less than or equal to"
>     "greater than"
>=    "greater than or equal to"
Some XPath Functions
XPath contains a number of functions on node sets,
numbers, and strings; here are a few of them:
count(elem) counts the number of selected elements
Example: //chapter[count(section)=1] selects chapters with
exactly one section child
name() returns the name of the element
Example: //*[name()='section'] is the same as //section
starts-with(arg1, arg2) tests if arg1 starts with arg2
Example: //*[starts-with(name(), 'sec')]
contains(arg1, arg2) tests if arg1 contains arg2
Example: //*[contains(name(), 'ect')]
Examples
http://www.zvon.org/xxl/XPathTutorial/General/examples.html
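What these functions select can be illustrated with an emulation in Python. The standard xml.etree.ElementTree module does not implement count(), name(), starts-with(), or contains(), so the filters below reproduce their effect with ordinary Python expressions; the sample document, including the selection element, is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<book>
  <chapter><section/></chapter>
  <chapter><section/><section/></chapter>
  <selection/>
</book>""")

# Emulates //chapter[count(section)=1]
one_section = [c for c in root.iter("chapter") if len(c.findall("section")) == 1]

# Emulates //*[starts-with(name(), 'sec')]
starts_with_sec = [e.tag for e in root.iter() if e.tag.startswith("sec")]

# Emulates //*[contains(name(), 'ect')]
contains_ect = [e.tag for e in root.iter() if "ect" in e.tag]

print(len(one_section), starts_with_sec, contains_ect)
```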
5.3 XSL / XSLT
What is XSL?
Some XSLT Constructs
xsl:value-of
xsl:for-each
xsl:if
xsl:choose
xsl:sort
xsl:text
xsl:attribute
Templates
XSL on the Client
XSL on the Server
What is XSL?
XSL stands for eXtensible Stylesheet Language
a standard recommended by the W3C
http://www.w3.org/TR/xsl/
CSS was designed for styling HTML pages, and can be used to style XML
pages
XSL was designed specifically to style XML pages, and is much more
sophisticated than CSS
XSL consists of three languages:
XSLT (XSL Transformations) is a language used to transform XML documents
into other kinds of documents (most commonly HTML, so they can be
displayed)
XPath is a language to select parts of an XML document to transform with
XSLT
XSL-FO (XSL Formatting Objects) is a replacement for CSS
The future of XSL-FO as a standard is uncertain, because much of its functionality
overlaps with that provided by cascading style sheets (CSS) and the HTML tag set
How does it work?
The XML source document is parsed into an XML source
tree
You use XPath to define templates that match parts of the
source tree
You use XSLT to transform the matched part and put the
transformed information into the result tree
The result tree is output as a result document
For parts of the source document that are not matched by a
template, the built-in rules typically output their text content without the markup
Simple XPath
Here’s a simple XML document:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
XPath expressions look a lot like paths in a computer file system
/ means the document itself (but no specific elements)
/library selects the root element
/library/book selects every book element
//author selects every author element, wherever it occurs
Simple XSLT
<xsl:for-each select="//book"> loops through every
book element, everywhere in the document
<xsl:value-of select="title"/> chooses the content
of the title element at the current location
<xsl:for-each select="//book">
  <xsl:value-of select="title"/>
</xsl:for-each>
chooses the content of the title element for each book in
the XML document
Using XSL to Create HTML
Our goal is to turn this:
<?xml version="1.0"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
Into HTML that displays something like this:
Book Titles:
• XML
• Java and XML
Book Authors:
• Gregory Brill
• Brett Scott
Note that we’ve grouped titles and authors separately
What we need to do
We need to save our XML into a file (let’s call it
books.xml)
We need to create a file (say, books.xsl) that
describes how to select elements from
books.xml and embed them into an HTML page
We do this by intermixing the HTML and the XSL in
the books.xsl file
We need to add a line to our books.xml file to
tell it to refer to books.xsl for formatting
information
books.xml, revised
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<library>
  <book>
    <title>XML</title>
    <author>Gregory Brill</author>
  </book>
  <book>
    <title>Java and XML</title>
    <author>Brett Scott</author>
  </book>
</library>
The second line (the xml-stylesheet processing instruction) tells you where to find the XSL file
SWE 444 - Internet & Web Application Development
5.68
Desired HTML
<html>
<head>
<title>Book Titles and Authors</title>
</head>
<body>
<h2>Book titles:</h2>
<ul>
<li>XML</li>
<li>Java and XML</li>
</ul>
<h2>Book authors:</h2>
<ul>
<li>Gregory Brill</li>
<li>Brett Scott</li>
</ul>
</body>
</html>
On the original slide, red text (the list items) is data extracted from the XML document
Blue text (everything else) is our HTML template
We don’t necessarily know how much data we will have
XSL Outline
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html> ... </html>
</xsl:template>
</xsl:stylesheet>
Selecting Titles and Authors
<h2>Book titles:</h2>
<ul>
<xsl:for-each select="//book">
<li>
<xsl:value-of select="title"/>
</li>
</xsl:for-each>
</ul>
<h2>Book authors:</h2>
...same thing, replacing title with author
Notice the xsl:for-each loop
Notice that XSL can rearrange the data; the HTML result can present information in a different order than the XML
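To see what the two xsl:for-each loops compute, here is a plain-Python analogue. It is not XSLT and not how a browser performs the transformation; it only walks the same data twice, once for titles and once for authors. The helper name bullet_list is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>"""

def bullet_list(root, child_name):
    """Mimic <xsl:for-each select="//book"> wrapping <xsl:value-of select="..."/>."""
    items = []
    for book in root.iter("book"):             # roughly //book
        value = book.findtext(child_name, "")  # text of the first matching child
        items.append(f"<li>{value}</li>")      # no HTML escaping; fine for this data
    return "<ul>" + "".join(items) + "</ul>"

root = ET.fromstring(BOOKS_XML)
html = (
    "<h2>Book titles:</h2>" + bullet_list(root, "title")
    + "<h2>Book authors:</h2>" + bullet_list(root, "author")
)
print(html)
```

The same input is traversed twice, which is why the output can group titles and authors separately even though the XML stores them per book.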
All of books.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl"
href="books.xsl"?>
<library>
<book>
<title>XML</title>
<author>Gregory Brill</author>
</book>
<book>
<title>Java and XML</title>
<author>Brett Scott</author>
</book>
</library>
Note: if you do View Source, this is what you will see, not the resultant HTML
All of books.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>Book Titles and Authors</title>
</head>
<body>
<h2>Book titles:</h2>
<ul>
<xsl:for-each select="//book">
<li>
<xsl:value-of select="title"/>
</li>
</xsl:for-each>
</ul>
<h2>Book authors:</h2>
<ul>
<xsl:for-each select="//book">
<li>
<xsl:value-of select="author"/>
</li>
</xsl:for-each>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
How to use it
In a modern browser, such as Netscape 6,
Internet Explorer 6, or Mozilla 1.0, you can just
open the XML file
Older browsers will ignore the XSL and just show
you the XML contents as continuous text
You can use a program such as Xalan, MSXML,
or Saxon to create the HTML as a file
This can be done on the server side, so that all the
client side browser sees is plain HTML
The server can create the HTML dynamically from
the information currently in XML
The result (in IE)
[Figure: screenshot of the transformed page in Internet Explorer; not reproduced in this transcript]
XSLT
XSLT stands for eXtensible Stylesheet
Language Transformations
XSLT is used to transform XML documents into
other kinds of documents--usually, but not
necessarily, XHTML
XSLT uses two input files:
The XML document containing the actual data
The XSL document containing both the “framework”
in which to insert the data, and XSLT commands to
do so
Understanding the XSLT Process
[Figure: diagram of the XSLT process; not reproduced in this transcript]
The XSLT Processor
[Figure: diagram of the XSLT processor; not reproduced in this transcript]
The .xsl file
An XSLT document has the .xsl extension
The XSLT document begins with:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Contains one or more templates, such as:
<xsl:template match="/"> ... </xsl:template>
And ends with:
</xsl:stylesheet>
The template <xsl:template match="/"> says select the entire file
You can think of this as selecting the root node of the XML tree
Where XSLT can be used
A server can use XSLT to change XML files into
HTML files before sending them to the client
A modern browser can use XSLT to change
XML into HTML on the client side
This is what we will mostly be doing here
Most users seldom update their browsers
If you want “everyone” to see your pages, do any
XSL processing on the server side
Otherwise, think about what best fits your situation
xsl:value-of
<xsl:value-of select="XPath expression"/>
selects the contents of an element and adds it to
the output stream
The select attribute is required
Notice that xsl:value-of is not a container tag,
hence it needs to end with a slash
xsl:for-each
xsl:for-each is a kind of loop statement
The syntax is
<xsl:for-each select="XPath expression">
Text to insert and rules to apply
</xsl:for-each>
Example: to select every book (//book) and make an
unordered list (<ul>) of their titles (title), use:
<ul>
<xsl:for-each select="//book">
<li> <xsl:value-of select="title"/> </li>
</xsl:for-each>
</ul>
Filtering Output
You can filter (restrict) output by adding a
criterion to the select attribute’s value:
<ul>
<xsl:for-each select="//book">
<li>
<xsl:value-of
select="title[../author='Brett Scott']"/>
</li>
</xsl:for-each>
</ul>
This will select book titles by Brett Scott
Filter Details
Here is the filter we just used:
<xsl:value-of
select="title[../author='Brett Scott']"/>
author is a sibling of title, so from title we have
to go up to its parent, book, then back down to author
This filter requires a quote within a quote, so we need
both single quotes and double quotes
Legal filter operators are: =, !=, &lt; and &gt;
Inside the select attribute, < must be written as &lt;
Numbers may be quoted, but do not need to be
But it doesn’t work right!
Here’s what we did:
<xsl:for-each select="//book">
<li>
<xsl:value-of
select="title[../author='Brett Scott']"/>
</li>
</xsl:for-each>
This will output <li> and </li> for every book, so we will
get empty bullets for authors other than Brett Scott
There is no obvious way to solve this with just
xsl:value-of
xsl:if
xsl:if allows us to include content if a given
condition (in the test attribute) is true
Example:
<xsl:for-each select="//book">
<xsl:if test="author='Brett Scott'">
<li>
<xsl:value-of select="title"/>
</li>
</xsl:if>
</xsl:for-each>
This does work correctly!
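The difference between the two approaches can be imitated in plain Python. This is an analogue that assumes the books.xml data used throughout these slides; it is not XSLT itself. The first list emits an item for every book, as the filtered xsl:value-of does, and the second tests the author first, as xsl:if does.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>""")

# Like value-of with a filter: one <li> per book, empty when the filter fails
unfiltered = [
    "<li>" + (b.findtext("title") if b.findtext("author") == "Brett Scott" else "") + "</li>"
    for b in root.iter("book")
]

# Like xsl:if: the <li> itself is produced only when the test is true
filtered = [
    "<li>" + b.findtext("title") + "</li>"
    for b in root.iter("book")
    if b.findtext("author") == "Brett Scott"
]

print(unfiltered)  # ['<li></li>', '<li>Java and XML</li>']
print(filtered)    # ['<li>Java and XML</li>']
```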
xsl:choose
The xsl:choose ... xsl:when ... xsl:otherwise construct is XSLT’s equivalent of Java’s switch ... case ... default statement
The syntax is:
<xsl:choose>
<xsl:when test="some condition">
... some code ...
</xsl:when>
<xsl:otherwise>
... some code ...
</xsl:otherwise>
</xsl:choose>
xsl:choose is often used within an xsl:for-each loop
xsl:sort
You can place an xsl:sort inside an xsl:for-each
The select attribute of the sort tells what field to sort on
Example:
<ul>
<xsl:for-each select="//book">
<xsl:sort select="author"/>
<li> <xsl:value-of select="title"/> by
<xsl:value-of select="author"/>
</li>
</xsl:for-each>
</ul>
This example creates a list of titles and authors, sorted by author
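As a rough analogue of xsl:sort, the Python sketch below orders the same books by author before printing them. It assumes the books.xml data from these slides and uses Python's default string ordering, which may differ from the language-aware ordering an XSLT processor applies.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>""")

# <xsl:sort select="author"/>: order the selected books by the author's text
by_author = sorted(root.iter("book"), key=lambda b: b.findtext("author", ""))

# <xsl:value-of select="title"/> by <xsl:value-of select="author"/>
lines = [f'{b.findtext("title")} by {b.findtext("author")}' for b in by_author]
print(lines)  # ['Java and XML by Brett Scott', 'XML by Gregory Brill']
```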
xsl:text
Used inside templates to indicate that its contents should be output as text
Its contents are pure text, not elements, and white space is not collapsed
<xsl:text>...</xsl:text> helps deal with two common problems:
XSL isn’t very careful with whitespace in the document
This doesn’t matter much for HTML, which collapses all whitespace anyway
<xsl:text> gives you much better control over whitespace; it acts like the <pre> element in HTML
Since XML defines only five entities, you cannot readily put other entities (such as &nbsp;) in your XSL
These are &amp; (&), &lt; (<), &gt; (>), &quot; ("), &apos; (')
Others can be inserted using their decimal or hexadecimal number forms (for example, &#160; for a non-breaking space)
You may use the following secret formula for entities:
<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
A “yes” value means special characters like “<” should be output as is. “no” indicates that “<” should be output as “&lt;”. Default is “no”
Creating Tags from XML Data
Suppose the XML contains
<name>Dr. Scott's Home Page</name>
<url>http://www.kfupm.edu/~scott</url>
And you want to turn this into
<a href="http://www.kfupm.edu/~scott">
Dr. Scott's Home Page</a>
We need additional tools to do this
It doesn’t even help if the XML directly contains
<a href="http://www.kfupm.edu/~scott">
Dr. Scott's Home Page</a> -- we still can’t move it to the
output
The same problem occurs with images in the XML
A reason for the above is that attribute fields may not
contain reserved characters like < and > in XML
Creating Tags - solution 1
Suppose the XML contains
<name>Dr. Scott's Home Page</name>
<url>http://www.kfupm.edu/~scott</url>
<xsl:attribute name="..."> adds the named attribute to the
enclosing tag
The value of the attribute is the content of this tag
Example:
<a>
<xsl:attribute name="href">
<xsl:value-of select="url"/>
</xsl:attribute>
<xsl:value-of select="name"/>
</a>
Result:
<a href="http://www.kfupm.edu/~scott">
Dr. Scott's Home Page</a>
Creating Tags - solution 2
Suppose the XML contains
<name>Dr. Scott's Home Page</name>
<url>http://www.kfupm.edu/~scott</url>
An attribute value template (AVT) consists of braces { } inside the
attribute value
The content of the braces is replaced by its value
Example:
<a href="{url}">
<xsl:value-of select="name"/>
</a>
Result:
<a href="http://www.kfupm.edu/~scott">
Dr. Scott's Home Page</a>
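Both solutions do the same job: they copy element content into an attribute of a new tag. The Python sketch below shows that data flow with ElementTree; it is an analogue, not XSLT. The <page> wrapper element is hypothetical, added only so the two-element fragment parses as one document.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<page>"  # hypothetical wrapper element
    "<name>Dr. Scott's Home Page</name>"
    "<url>http://www.kfupm.edu/~scott</url>"
    "</page>"
)

# Like <a href="{url}">: the attribute value is taken from the url element
link = ET.Element("a", {"href": source.findtext("url")})

# Like <xsl:value-of select="name"/> inside the <a> element
link.text = source.findtext("name")

result = ET.tostring(link, encoding="unicode")
print(result)
```

Because the element is built as a tree rather than pasted together as text, any reserved characters in the URL or name are escaped on output.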
Modularization
Modularization, breaking up a complex program into
simpler parts, is an important programming tool
For example, suppose we have a DTD for book with
parts titlePage, tableOfContents, chapter, and
index
In programming languages modularization is often done with
functions or methods
In XSL we can do something similar with
xsl:apply-templates
We can create separate templates for each of these parts
Template rules are used to control what output is
created from what input
…Modularization
A template rule is represented by an <xsl:template>
element
The <xsl:template> element has
A match attribute that contains an XPath pattern identifying the
input it matches
A template that is instantiated and output when the pattern is
matched
Template skeleton:
<xsl:template match="person">
A Person
</xsl:template>
The above says that every time a <person> element is
seen, the stylesheet processor should emit the text “A
Person”
Book example
<xsl:template match="/">
<html> <body>
<xsl:apply-templates/>
</body> </html>
</xsl:template>
<xsl:template match="tableOfContents">
<h1>Table of Contents</h1>
<xsl:apply-templates select="chapterNumber"/>
<xsl:apply-templates select="chapterName"/>
<xsl:apply-templates select="pageNumber"/>
</xsl:template>
Etc.
xsl:apply-templates
The <xsl:apply-templates> element
applies a template rule to the current element or
to the current element’s child nodes
If we add a select attribute, it applies the
template rule only to the child that matches
If we have multiple <xsl:apply-templates>
elements with select attributes, the child
nodes are processed in the same order as the
<xsl:apply-templates> elements
When templates are ignored
Templates aren’t used unless they are applied
Exception: Processing always starts with the template
that matches the root, match="/"
If it didn’t, nothing would ever happen
If your templates are ignored, you probably
forgot to apply them
If you apply a template to an element that has
child elements, templates are not automatically
applied to those child elements
Applying templates to children
<book>
<title>XML</title>
<author>Gregory Brill</author>
</book>
<xsl:template match="/">
<html> <head></head> <body>
<b><xsl:value-of select="/book/title"/></b>
<xsl:apply-templates select="/book/author"/>
</body> </html>
</xsl:template>
<xsl:template match="/book/author">
by <i><xsl:value-of select="."/></i>
</xsl:template>
With the <xsl:apply-templates select="/book/author"/> line, the output is: XML by Gregory Brill
Without this line, the output is: XML
Built-in Templates
XSLT has a couple of built in templates, which say:
when you apply templates to an element, process its child elements
when you apply templates to a text node, give its value
Together, it means that if you apply templates to an element but don't have
an explicit template for that element, then its content gets processed and
eventually you end up with the text that the element contains.
Here are the built-in template rules for each of the seven XPath node types:
Elements: Apply templates to children
Text: Copy text to the result tree
Comments: Do nothing
Processing instructions (PIs): Do nothing
Attributes: Copy the value of the attribute to the result tree
Namespaces: Do nothing
Root: Apply templates to children
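The way the built-in rules interact with explicit templates can be sketched in Python. The function apply_templates and the dictionary of rules are hypothetical stand-ins for an XSLT processor, covering only elements and text. Attributes are never visited here, and ElementTree drops comments and processing instructions when parsing, which happens to match their "do nothing" rules.

```python
import xml.etree.ElementTree as ET

def apply_templates(element, templates):
    """Return the output for one element.

    templates maps a tag name to a function(element, recurse) -> str.
    With no explicit template, fall back to the built-in behaviour:
    copy text through and apply templates to the children.
    """
    def recurse(el):
        parts = [el.text or ""]
        for child in el:
            parts.append(apply_templates(child, templates))
            parts.append(child.tail or "")
        return "".join(parts)

    rule = templates.get(element.tag)
    return rule(element, recurse) if rule else recurse(element)

book = ET.fromstring("<book><title>XML</title><author>Gregory Brill</author></book>")

# No explicit templates: only the text content survives
plain = apply_templates(book, {})
print(plain)   # XMLGregory Brill

# One explicit template for author; title still uses the built-in rules
templates = {"author": lambda el, recurse: " by <i>" + recurse(el) + "</i>"}
styled = apply_templates(book, templates)
print(styled)  # XML by <i>Gregory Brill</i>
```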
XSL - On the Client
If your browser supports XML, XSL can be used to transform the
document to XHTML in your browser
A JavaScript Solution
Even if this works fine, it is not always desirable to include a style
sheet reference in an XML file (for example, it will not work in a
browser that is not XSL-aware)
A more versatile solution would be to use JavaScript to do the XML
to XHTML transformation
By using JavaScript, we can:
do browser-specific testing
use different style sheets according to browser and user needs
XSL transformation on the client side is bound to be a major part of
the browsers’ work in the future, as we will see a growth in the
specialized browser market (Braille, aural browsers, Web printers,
handheld devices, etc.)
Transforming XML to XHTML in Your Browser
<html>
<body>
<script type="text/javascript">
// Internet Explorer only: ActiveXObject is not available in other browsers
// Load XML
var xml = new ActiveXObject("Microsoft.XMLDOM");
xml.async = false;
xml.load("books.xml");
// Load XSL
var xsl = new ActiveXObject("Microsoft.XMLDOM");
xsl.async = false;
xsl.load("books.xsl");
// Transform
document.write(xml.transformNode(xsl));
</script>
</body>
</html>
XSL - On the Server
Since not all browsers support XML and XSL, one
solution is to transform the XML to XHTML on the server
To make XML data available to all kinds of browsers, we
have to transform the XML document on the SERVER
and send it as pure XHTML to the BROWSER
That's another beauty of XSL! One of the design goals
for XSL was to make it possible to transform data from
one format to another on a server, returning readable
data to all kinds of future browsers
Thoughts on XSL
XSL is a programming language--and not a particularly
simple one
Expect to spend considerable time debugging your XSL
These slides have been an introduction to XSL and
XSLT--there’s a lot more of it we haven’t covered
As with any programming, it’s a good idea to start simple
and build it up incrementally: “Write a little, test a little”
This is especially a good idea for XSLT, because you don’t get
a lot of feedback about what went wrong
Try jEdit with the XML plugin
write (or change) a line or two, check for syntax errors, then
jump to IE and reload the XML file
5.4 Document Type Definitions (DTDs)
What are DTDs?
Why DTDs?
DTD Syntactic Elements
ELEMENT
ATTRIBUTE
ENTITY
Types
Examples
Validation
What are DTDs?
Document Type Definition (DTD) is a grammar that
describes the structure of a class of XML documents
The structure of the documents is described via element and attribute-list declarations.
Element declarations
name the allowable set of elements within the document, and
specify whether and how declared elements and runs of character
data may be contained within each element.
Attribute-list declarations
name the allowable set of attributes for each declared element,
including the type of each attribute value, if not an explicit set of
valid value(s).
DTDs are written in EBNF-like notation
Why DTDs?
XML documents are designed to be processed by
computer programs
If you can put just any tags in an XML document, it’s very hard to
write a program that knows how to process the tags
A DTD specifies what tags may occur, when they may occur, and
what attributes they may (or must) have
A DTD allows the XML document to be verified (shown to
be legal)
A DTD that is shared across groups allows the groups to
produce consistent XML documents
Parsers
An XML parser is an API that reads the content
of an XML document
Currently popular APIs are DOM (Document Object
Model) and SAX (Simple API for XML)
A validating parser is an XML parser that
compares the XML document to a DTD and
reports any errors
An XML example
<novel>
<foreword>
<paragraph>This is a great novel.
</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy
night.</paragraph>
<paragraph>Suddenly, a shot rang
out!</paragraph>
</chapter>
</novel>
An XML document contains (and the DTD describes):
Elements, such as novel and paragraph, consisting of tags and
content
Attributes, such as number="1", consisting of a name and a value
Entities (not used in this example)
A DTD example
<!DOCTYPE novel [
<!ELEMENT novel (foreword, chapter+)>
<!ELEMENT foreword (paragraph+)>
<!ELEMENT chapter (paragraph+)>
<!ELEMENT paragraph (#PCDATA)>
<!ATTLIST chapter number CDATA #REQUIRED>
]>
A novel consists of a foreword and one or more chapters, in that order
Each chapter must have a number attribute
A foreword consists of one or more paragraphs
A chapter also consists of one or more paragraphs
A paragraph consists of parsed character data (text that cannot contain any other
elements)
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated
as markup and entities will be expanded.
CDATA, as an attribute type, means the value is plain character data: it cannot
contain markup, although entity references in it are still expanded.
This differs from a CDATA section (<![CDATA[ ... ]]>), whose text is not parsed at all.
ELEMENT descriptions
Suffixes:
?      optional          foreword?
+      one or more       chapter+
*      zero or more      appendix*
Separators:
,      both, in order    foreword?, chapter+
|      or                section|chapter
Grouping:
( )    grouping          (section|chapter)+
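The suffixes and separators behave much like regular-expression operators. The Python sketch below uses that analogy; a real validating parser does far more. The hypothetical helpers model_to_regex and children_match translate a content model into a regular expression over the sequence of child element names. #PCDATA and mixed content are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Element names, or one of the DTD operators
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")

def model_to_regex(model):
    """Translate a content model such as '(foreword, chapter+)' to a regex."""
    parts = []
    for token in TOKEN.findall(model):
        if token == ",":
            continue                        # sequence is plain concatenation
        if token == "(":
            parts.append("(?:")             # grouping
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)             # same meaning as in a regex
        else:
            parts.append("(?:" + re.escape(token) + " )")  # name plus a space
    return re.compile("".join(parts))

def children_match(element, model):
    """Check the element's child names, in order, against the content model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None

novel = ET.fromstring("<novel><foreword/><chapter/><chapter/></novel>")

print(children_match(novel, "(foreword, chapter+)"))             # True
print(children_match(novel, "(chapter+)"))                       # False
print(children_match(novel, "(foreword?, (section|chapter)+)"))  # True
```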
Elements without children
The syntax is <!ELEMENT name category>
The name is the element name used in start and end
tags
The category may be EMPTY:
In the DTD: <!ELEMENT br EMPTY>
In the XML: <br></br> or just <br />
In the XML, an empty element may not have any
content between the start tag and the end tag
An empty element may (and usually does) have
attributes
Elements with unstructured children
The syntax is <!ELEMENT name category>
The category may be ANY
This indicates that any content -- character data and any declared
elements, in any order -- may be used
Since the whole point of using a DTD is to define the structure of a
document, ANY should be avoided wherever possible
The category may be (#PCDATA), indicating that only
character data may be used
In the DTD: <!ELEMENT paragraph (#PCDATA)>
In the XML: <paragraph>A shot rang out!</paragraph>
The parentheses are required!
Note: In (#PCDATA), whitespace is kept exactly as entered
Elements may not be used within parsed character data
Entities are character data, and may be used
Elements with children
A category may describe one or more children:
<!ELEMENT novel (foreword, chapter+)>
Parentheses are required, even if there is only one child
A space must precede the opening parenthesis
Commas (,) between elements mean that all children must
appear, and must be in the order specified
“|” separators mean that any one child may be used
All child elements must themselves be declared
Children may have children
Parentheses can be used for grouping:
<!ELEMENT novel (foreword, (chapter+|section+))>
Elements with mixed content
#PCDATA describes elements with only
character data
#PCDATA can be used in an “or” grouping:
<!ELEMENT note (#PCDATA|message)*>
This is called mixed content
Certain (rather severe) restrictions apply:
#PCDATA must be first
The separators must be “|”
The group must be starred (meaning zero or more)
Names and namespaces
All names of elements, attributes, and entities, in both
the DTD and the XML, are formed as follows:
The name must begin with a letter or underscore
The name may contain only letters, digits, dots, hyphens,
underscores, and colons
The DTD doesn’t know about namespaces -- as far as it
knows, a colon is just part of a name
The following are different (and both legal):
<!ELEMENT chapter (paragraph+)>
<!ELEMENT myBook:chapter (myBook:paragraph+)>
Avoid colons in names, except to indicate namespaces
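The naming rule on this slide can be written as a regular expression. The Python sketch below is an ASCII-only approximation; real XML names also allow many non-ASCII letters. The helper is_legal_name is hypothetical.

```python
import re

# ASCII-only approximation of the rule on this slide: start with a letter or
# underscore, then letters, digits, dots, hyphens, underscores and colons.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._:\-]*")

def is_legal_name(name):
    """Return True if the whole string fits the simplified name rule."""
    return NAME.fullmatch(name) is not None

for candidate in ["chapter", "myBook:chapter", "_note", "2ndChapter", "my chapter"]:
    print(candidate, is_legal_name(candidate))
```

The first three candidates pass; 2ndChapter fails because it starts with a digit, and my chapter fails because it contains a space.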
An expanded DTD example
<!DOCTYPE novel [
<!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)>
<!ELEMENT foreword (paragraph+)>
<!ELEMENT chapter (section+|paragraph+)>
<!ELEMENT section (paragraph+)>
<!ELEMENT biography (paragraph+)>
<!ELEMENT criticalEssay (section+)>
<!ELEMENT paragraph (#PCDATA)>
]>
Attributes and entities
In addition to elements, a DTD may declare attributes
and entities
An attribute describes information that can be put within
the start tag of an element
In XML: <car name= "Toyota" model= "2001"></car>
In DTD: <!ATTLIST car
name CDATA #REQUIRED
model CDATA #IMPLIED >
An entity describes text to be substituted
In XML: &copyright;
In the DTD: <!ENTITY copyright "Copyright KFUPM">
Attributes
The format of an attribute is:
<!ATTLIST element-name
name type requirement
name type requirement>
where the name-type-requirement may be repeated as
many times as desired
Note that only spaces separate the parts, so careful counting is
essential
The element-name tells which element may have these
attributes
The name is the name of the attribute
Each attribute has a type, such as CDATA (character data)
Each attribute may be required, optional, or “fixed”
In the XML, attributes may occur in any order
Important attribute types
There are ten attribute types
These are the most important ones:
CDATA
The value is character data
(man|woman|child)
The value is one from this list
ID
The value is a unique identifier
ID values must be legal XML names and must be unique within the
document
NMTOKEN
The value is a name token: one or more name characters (letters, digits,
dots, hyphens, underscores, colons)
This is sometimes used to disallow whitespace in the value
Unlike an XML name, a name token may begin with a digit
The other six, less frequently used, are:
IDREF, IDREFS, NMTOKENS, ENTITY, ENTITIES, NOTATION
Requirements
Recall that an attribute has the form
<!ATTLIST element-name name type requirement>
The requirement is one of:
A default value, enclosed in quotes
Example: <!ATTLIST element-name degree CDATA "PhD">
#REQUIRED
The attribute must be present
#IMPLIED
The attribute is optional
#FIXED "value"
The attribute always has the given value
If specified in the XML, the same value must be used
Entities
There are exactly five predefined entities: &lt;, &gt;, &amp;,
&quot;, and &apos;
Additional entities can be defined in the DTD:
<!ENTITY copyright "Copyright KFUPM">
Entities can be defined in another document:
<!ENTITY copyright SYSTEM "MyURI">
Example of use in the XML:
This document is &copyright; 2002.
Entities are a way to include fixed text (sometimes called
“boilerplate”)
Entities should not be confused with character references, which
are numerical values between &# and ;
Example: &#233; or &#xE9; to indicate the character é
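To watch an entity and two character references being expanded, the sketch below parses a small document with Python's standard ElementTree parser. The <note> element and its text are made up for illustration; the copyright entity is the one declared on this slide, placed in an inline DTD.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY copyright "Copyright KFUPM">
]>
<note>This document is &copyright; 2002. Caf&#233; and caf&#xE9; are the same word.</note>"""

note = ET.fromstring(DOC)

# The parser has already replaced &copyright; with its declared text and
# both character references (decimal and hexadecimal) with the character é
print(note.text)
```

An entity that is used but never declared would make the parse fail, whereas character references need no declaration at all.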
Another example: XML
<?xml version="1.0"?>
<!DOCTYPE weatherReport SYSTEM
"http://www.mysite.com/mydoc.dtd">
<weatherReport>
<date>05/29/2002</date>
<location>
<city>Philadelphia</city>
<state>PA</state>
<country>USA</country>
</location>
<temperature-range>
<high scale="F">84</high>
<low scale="F">51</low>
</temperature-range>
</weatherReport>
The DTD for this example
<!ELEMENT weatherReport (date, location,
temperature-range)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT location (city, state, country)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT temperature-range
((low, high)|(high, low))>
<!ELEMENT low (#PCDATA)>
<!ELEMENT high (#PCDATA)>
<!ATTLIST low scale (C|F) #REQUIRED>
<!ATTLIST high scale (C|F) #REQUIRED>
Inline DTDs
If a DTD is used only by a single XML document,
it can be put directly in that document:
<?xml version="1.0"?>
<!DOCTYPE myRootElement [
<!-- DTD content goes here -->
]>
<myRootElement>
<!-- XML content goes here -->
</myRootElement>
An inline DTD can be used only by the document
in which it occurs
External DTDs
An external DTD (a DTD that is a separate document) is
declared with a SYSTEM or a PUBLIC command:
<!DOCTYPE myRootElement SYSTEM
"http://www.mysite.com/mydoc.dtd">
The file extension for an external DTD is .dtd
The name that appears after DOCTYPE (in this example,
myRootElement) must match the name of the XML document’s
root element
Use SYSTEM for external DTDs that you define yourself, and
use PUBLIC for official, published DTDs
External DTDs can only be referenced with a URL
External DTDs are almost always preferable to inline
DTDs, since they can be used by more than one
document
Limitations of DTDs
DTDs are a very weak specification language
You can’t put any restrictions on element contents
It’s difficult to specify:
All the children must occur, but may be in any order
This element must occur a certain number of times
There are only ten data types for attribute values
But most of all: DTDs aren’t written in XML!
If you want to do any validation, you need one parser for the
XML and another for the DTD
This makes XML parsing harder than it needs to be
There is a newer and more powerful technology: XML
Schemas
However, DTDs are still very much in use
Validators
Opera 5 and Internet Explorer 5 can validate your XML
against an internal DTD
IE provides (slightly) better error messages
Opera apparently just ignores external DTDs
IE considers an external DTD to be an error
jEdit with the XML plugin will check for well-formedness and (if the DTD is inline) will validate
your XML each time you do a Save
http://www.jedit.org/
Validate [Using Inline DTD]
http://www.stg.brown.edu/service/xmlvalid/
5.5 XML Schema Definition (XSD)
What is XSD?
An XML Document with Its Schema
Referencing A Schema from XML Document
Simple and Complex Elements
Predefined Types
Numeric types
Date and Time types
String types
Defining Schema Components
Simple Elements
Attributes
Restrictions or Facets
Enumeration
Complex Elements
What is XML Schema?
The origin of schema
XML Schema documents are used to define and
validate the content and structure of XML data
XML Schema was originally proposed by Microsoft,
but became an official W3C recommendation in May
2001
http://www.w3.org/XML/Schema
Why Schema?
Separating Information from Structure and Format
Traditional Document: Everything (information, structure, format) is clumped together
“Fashionable” Document: A document is broken into discrete parts (information, structure, format), which
can be treated separately
Why Schema?
Schema Workflow
[Figure: schema workflow diagram; not reproduced in this transcript]
DTD vs. Schema
DTD: No constraints on character data
XSD: Can constrain character data, for example by requiring a string to have a fixed number of characters
DTD: Not using XML syntax
XSD: Uses XML syntax and thus frees the developer of the need to learn another language. XML transformations can be applied, too.
DTD: No support for namespaces
XSD: Supports namespaces
DTD: Very limited for reusability and extensibility
XSD: Can reuse in other schemas, create own derived data types and reference multiple schemas from the same document
DTD: Easier to write DTD-based validators: may only need to check existence of content like PCDATA
XSD: Schema-based validators are more difficult to write because we may have to validate content detail
DTD: Easier to understand
XSD: More complex: The notion of “type” adds an extra layer of confusing complexity
XML.org Registry
The XML.org Registry offers a central clearinghouse for developers and
standards bodies to publicly submit, publish and exchange XML schemas,
vocabularies and related documents
Example 1: An XML Document Instance
<?xml version="1.0" encoding="utf-8"?>
<book isbn="0836217462">
<title> … </title>
<author> … </author>
<qualification> … </qualification>
</book>
Schema for Example 1
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="qualification" type="xs:string"/>
</xs:sequence>
<xs:attribute name="isbn" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:schema>
book.xsd
Example 2: An XML Document and Its Schema
<letter> Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid> will
be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
<xs:element name="letter">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="orderid" type="xs:integer"/>
<xs:element name="shipdate" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
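The declared types say how each value in the letter is meant to be read. Python's standard library has no XSD validator, so the sketch below does not validate anything. It only parses the letter and converts each value in the way its declared type suggests, which is roughly what a schema-aware application would do after validation.

```python
import xml.etree.ElementTree as ET
from datetime import date

LETTER = """<letter> Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid> will
be shipped on <shipdate>2001-07-13</shipdate>.
</letter>"""

letter = ET.fromstring(LETTER)

# Mixed content: text appears before, between and after the child elements
leading_text = letter.text
child_names = [child.tag for child in letter]
print(repr(leading_text))  # ' Dear Mr.'
print(child_names)         # ['name', 'orderid', 'shipdate']

# Read each value the way its declared type suggests
name = letter.findtext("name")                               # xs:string
order_id = int(letter.findtext("orderid"))                   # xs:integer
ship_date = date.fromisoformat(letter.findtext("shipdate"))  # xs:date, no time zone here
print(name, order_id, ship_date)  # John Smith 1032 2001-07-13
```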
The XSD Document
Since the XSD is written in XML, it can get
confusing which we are talking about
The file extension is .xsd
The root element is <schema>
The XSD starts like this:
<?xml version="1.0"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<schema>
The <schema> element may have attributes:
xmlns:xs="http://www.w3.org/2001/XMLSchema"
Indicates that the elements used in the schema (schema,
element, complexType, etc.) come from this namespace
elementFormDefault="qualified"
This means that any elements declared in this schema must be
namespace-qualified when they are used in an XML instance document
Referring to a Schema
To refer to a DTD in an XML document, the reference goes before
the root element:
<?xml version="1.0"?>
<!DOCTYPE rootElement SYSTEM "url">
<rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference
goes in the root element:
<?xml version="1.0"?>
<rootElement
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="namespace url.xsd">
...
</rootElement>
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Declares the XML Schema instance namespace
xsi:schemaLocation="namespace url.xsd"
This attribute holds a pair of values:
the first value is the namespace to use, and
the second value is the location of the XML schema to use for that namespace
If the schema has no target namespace, use xsi:noNamespaceSchemaLocation="url.xsd" instead
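The namespace/location pairing can be made concrete with a small JavaScript sketch. The helper name parseSchemaLocation and the sample URIs are hypothetical, chosen only for illustration; the function simply splits the attribute value on whitespace and groups the tokens two by two.

```javascript
// Hypothetical helper: split an xsi:schemaLocation value into
// (namespace, location) pairs. The value is a whitespace-separated list
// in which tokens alternate between a namespace and a schema location.
function parseSchemaLocation(value) {
  var tokens = value.trim().split(/\s+/).filter(function (t) { return t !== ""; });
  if (tokens.length % 2 !== 0) {
    throw new Error("schemaLocation needs namespace/location pairs");
  }
  var pairs = [];
  for (var i = 0; i < tokens.length; i += 2) {
    pairs.push({ namespace: tokens[i], location: tokens[i + 1] });
  }
  return pairs;
}

// Illustrative value only; the namespace URI and file name are made up.
var pairs = parseSchemaLocation("http://example.org/books  book.xsd");
console.log(pairs[0].namespace + " -> " + pairs[0].location);
```

An odd number of tokens is treated as an error here, which is a design choice of this sketch rather than a statement about how any particular validator reacts.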
“Simple” and “Complex” Elements
A “simple” element is one that contains text and
nothing else
A simple element cannot have attributes
A simple element cannot contain other elements
A simple element cannot be empty
However, the text can be of many different types,
and may have various restrictions applied to it
If an element isn’t simple, it’s “complex”
A complex element may have attributes
A complex element may be empty, or it may contain
text, other elements, or both text and other elements
Predefined Numeric Types
Here are some of the predefined numeric types:
xs:decimal
xs:byte
xs:short
xs:int
xs:long
xs:positiveInteger
xs:negativeInteger
xs:nonPositiveInteger
xs:nonNegativeInteger
Allowable restrictions on numeric types:
enumeration, minInclusive, minExclusive,
maxInclusive, maxExclusive, fractionDigits,
totalDigits, pattern, whiteSpace
Predefined Date and Time Types
xs:date - A date in the format CCYY-MM-DD, for
example, 2003-11-05
xs:time - A time in the format hh:mm:ss (hours,
minutes, seconds)
xs:dateTime - Format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times:
enumeration, minInclusive,
minExclusive, maxInclusive,
maxExclusive, pattern, whiteSpace
Predefined String Types
Recall that a simple element is defined as:
<xs:element name="name" type="type" />
Here are a few of the possible string types:
xs:string - a string
xs:normalizedString - a string that doesn’t contain
tabs, newlines, or carriage returns
xs:token - a string that doesn’t contain any whitespace other
than single spaces
Allowable restrictions on strings:
enumeration, length, maxLength, minLength,
pattern, whiteSpace
Defining a Simple Element
A simple element is defined as
<xs:element
name="name"
type="type" />
where:
name is the name of the element
the most common values for type are
xs:boolean
xs:date
xs:decimal
xs:integer
xs:string
xs:time
Other attributes a definition of a simple element may
have:
default="default value" - used if no other value is specified
fixed="value" - no other value may be specified
Defining an Attribute
Attributes themselves are always declared as simple
types
An attribute is defined as
<xs:attribute
name="name"
type="type" />
where:
name and type are the same as for xs:element
Other attributes an attribute definition may
have:
default="default value" - used if no other value is specified
fixed="value" - no other value may be specified
use="optional" - the attribute is not required (default)
use="required" - the attribute must be present
Restrictions, or “Facets”
The general form for putting a restriction on a text value is:
<xs:element name="name">   (or xs:attribute)
<xs:simpleType>
<xs:restriction base="type">
... the restrictions ...
</xs:restriction>
</xs:simpleType>
</xs:element>
For example:
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="20"/>
<xs:maxInclusive value="100"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Restrictions, or “Facets”
The “age” element is a simple type with a
restriction. The acceptable values are the integers from 20 to 100, inclusive
The example above could also have been
written like this:
<xs:element name="age" type="ageType"/>
<xs:simpleType name="ageType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="20"/>
<xs:maxInclusive value="100"/>
</xs:restriction>
</xs:simpleType>
Restrictions on numbers
minInclusive - number must be ≥ the given value
minExclusive - number must be > the given value
maxInclusive - number must be ≤ the given value
maxExclusive - number must be < the given value
totalDigits - number must have no more than value digits in total
fractionDigits - number must have no more than value digits after the decimal point
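The four bound facets can be read as plain comparisons. The JavaScript below is a minimal sketch, not a real XSD validator; the function name satisfiesNumericFacets and the shape of the facets object are assumptions made for illustration.

```javascript
// Hypothetical checker for the four bound facets; not a real XSD validator.
// `facets` may contain minInclusive, minExclusive, maxInclusive, maxExclusive.
function satisfiesNumericFacets(n, facets) {
  if (facets.minInclusive !== undefined && !(n >= facets.minInclusive)) return false;
  if (facets.minExclusive !== undefined && !(n > facets.minExclusive)) return false;
  if (facets.maxInclusive !== undefined && !(n <= facets.maxInclusive)) return false;
  if (facets.maxExclusive !== undefined && !(n < facets.maxExclusive)) return false;
  return true;
}

// The "age" restriction from the earlier slide: 20 <= age <= 100.
var ageFacets = { minInclusive: 20, maxInclusive: 100 };
console.log(satisfiesNumericFacets(20, ageFacets));  // true
console.log(satisfiesNumericFacets(101, ageFacets)); // false
```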
Restrictions on strings
length - the string must contain exactly value characters
minLength - the string must contain at least value characters
maxLength - the string must contain no more than value characters
pattern - the value is a regular expression that the string must match
whiteSpace - not really a “restriction”; tells what to do with whitespace
value="preserve" - keep all whitespace
value="replace" - change all whitespace characters to spaces
value="collapse" - remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
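The three whiteSpace values can be illustrated with a short JavaScript sketch. The function name applyWhiteSpace and the sample string are hypothetical; the code only mimics the described behaviour on a plain string and is not taken from any schema processor.

```javascript
// Sketch of the three whiteSpace behaviours (function name is hypothetical).
function applyWhiteSpace(text, mode) {
  if (mode === "preserve") return text;
  // "replace": each tab, line feed and carriage return becomes one space
  var replaced = text.replace(/[\t\n\r]/g, " ");
  if (mode === "replace") return replaced;
  if (mode === "collapse") {
    // after replacing, squeeze runs of spaces and drop leading/trailing spaces
    return replaced.replace(/ +/g, " ").replace(/^ | $/g, "");
  }
  throw new Error("unknown whiteSpace mode: " + mode);
}

var raw = "  Hello\t\tXML \n world ";
console.log(JSON.stringify(applyWhiteSpace(raw, "preserve")));
console.log(JSON.stringify(applyWhiteSpace(raw, "replace")));
console.log(JSON.stringify(applyWhiteSpace(raw, "collapse")));
```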
Restriction with Regular Expression Patterns
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="([a-z])*"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<xs:element name="password">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-zA-Z0-9]{8}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Test these and find out whether the semantics of regular
expressions is the same as that in JavaScript
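One difference worth checking is anchoring: an XSD pattern must match the whole value, whereas a JavaScript regular expression succeeds if it matches anywhere in the string. The sketch below emulates the XSD behaviour by wrapping the pattern in ^(?:...)$. The helper name matchesXsdPattern is hypothetical, and the sketch assumes the pattern uses only syntax that both dialects share.

```javascript
// Emulate whole-value XSD pattern matching in JavaScript by anchoring.
// Assumption: the pattern uses only syntax common to both dialects.
function matchesXsdPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

var password = "[a-zA-Z0-9]{8}";
// Unanchored JavaScript search: finds 8 matching characters inside the string.
console.log(new RegExp(password).test("abcd1234-extra"));   // true
// Whole-value match, as a schema validator would require.
console.log(matchesXsdPattern(password, "abcd1234-extra")); // false
console.log(matchesXsdPattern(password, "abcd1234"));       // true
console.log(matchesXsdPattern("([a-z])*", ""));             // true: zero letters
```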
Enumeration
An enumeration restricts the value to be one of a fixed
set of values
Example:
<xs:element name="season">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Spring"/>
<xs:enumeration value="Summer"/>
<xs:enumeration value="Autumn"/>
<xs:enumeration value="Fall"/>
<xs:enumeration value="Winter"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Complex Elements
A complex element is defined as
<xs:element
name="name">
<xs:complexType>
... information about the complex
type...
</xs:complexType>
</xs:element>
Example:
<xs:element
name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
Complex Elements
Another example – using a type attribute
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
xs:sequence
We’ve already seen an example of a complex
type whose elements must occur in a specific
order:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
xs:all
xs:all allows elements to appear in any order
<xs:element
name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstName" type="xs:string" />
<xs:element name="lastName" type="xs:string" />
</xs:all>
</xs:complexType>
</xs:element>
Despite the name, the members of an xs:all group can occur once
or not at all
You can use minOccurs="n" and maxOccurs="n" to specify how
many times an element may occur (default value is 1)
In this context, n may only be 0 or 1
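The rules for an xs:all group can be sketched as a small check over a list of child element names. This is an illustration only: the function name matchesAll and the decl object (element name mapped to its minOccurs of 0 or 1) are assumptions, not part of any schema API.

```javascript
// Hypothetical check for an xs:all group: children may come in any order,
// each declared name at most once, and names with minOccurs 1 must be present.
// `decl` maps element name -> minOccurs (0 or 1).
function matchesAll(childNames, decl) {
  var seen = Object.create(null);
  for (var i = 0; i < childNames.length; i++) {
    var name = childNames[i];
    if (!Object.prototype.hasOwnProperty.call(decl, name)) return false; // undeclared
    if (seen[name]) return false;                                        // occurs twice
    seen[name] = true;
  }
  for (var required in decl) {
    if (Object.prototype.hasOwnProperty.call(decl, required) &&
        decl[required] === 1 && !seen[required]) {
      return false;                                                      // missing
    }
  }
  return true;
}

var person = { firstName: 1, lastName: 1 };
console.log(matchesAll(["lastName", "firstName"], person)); // true: order is free
console.log(matchesAll(["firstName"], person));             // false: lastName missing
```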
Extensions
You can base a complex type on another
complex type
<xs:complexType name="newType">
<xs:complexContent>
<xs:extension base="otherType">
...new stuff...
</xs:extension>
</xs:complexContent>
</xs:complexType>
Text Element with Attributes
If a text element has attributes, it is no longer a
simple type
<xs:element name="population">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="year"
type="xs:integer"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Empty Elements
Empty elements are (ridiculously) complex
<xs:complexType name="counter">
<xs:complexContent>
<xs:restriction base="xs:anyType">
<xs:attribute name="count"
type="xs:integer"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
Mixed Elements
Mixed elements may contain both text and elements
We add mixed="true" to the xs:complexType
element
The text itself is not mentioned in the schema, and may
go anywhere around the child elements (the validator basically ignores it)
<xs:complexType name="paragraph" mixed="true">
<xs:sequence>
<xs:element name="someName"
type="xs:anyType"/>
</xs:sequence>
</xs:complexType>
See Example 2 at the start of this section
References
W3Schools XSD Tutorial
http://www.w3schools.com/schema/default.asp
MSXML 4.0 SDK
Several online presentations
Reading List
W3Schools XSD Tutorial
http://www.w3schools.com/schema/default.asp
5.6 XML DOM
The XML DOM
XML Parsers
DOM-based
SAX-based
Examples
Cross-browser XML DOM object creation
Creating HTML table using XML data
Some XML DOM properties and methods
Creating XML using DOM methods
XML DOM
The DOM is a collection of interfaces that parser vendors and browser
manufacturers implement
The DOM interfaces are specified in modules, making it possible for
implementations to support parts of the DOM
XML parsers, for instance, aren’t required to provide support for the HTML-specific part of the DOM
The W3C DOM is separated into different parts (Core, XML, and HTML)
and different levels (DOM Level 1/2/3):
Core DOM - defines a standard set of objects for any structured document
XML DOM - defines a standard set of objects for XML documents, to enable their creation and manipulation
HTML DOM - defines a standard set of objects for HTML documents
The HTML DOM extends the Core DOM
The Core DOM provides interface definitions for manipulating and working with any
XML document.
The HTML DOM augments this with additional interface definitions for HTML-specific
elements.
… XML DOM
The XML DOM is designed to be used with any programming
language and any operating system.
It is fully described in the W3C DOM specification
http://www.w3.org/DOM/
With the XML DOM, a programmer can create an XML document,
navigate its structure, and add, modify, or delete its elements
DOM provides generic access to DOM-compliant documents: add,
edit, delete, manipulate
DOM is language-independent
The DOM is based on a tree view of your document. Nodes! Nodes!
Nodes!
XML Parsers
As mentioned earlier, a software program called a parser is required to
process an XML document
Parsers can support the DOM and/or the SAX for accessing a document’s
content programmatically using Java, C, JavaScript etc
A DOM-based parser builds a tree structure containing the XML document’s
data in memory
A SAX (Simple API for XML)-based parser processes the document and
generates events when tags, text, comments etc are encountered
SAX and DOM are standards for XML parsers
DOM is a W3C standard
SAX is a de facto (but very popular) standard
Examples: JAXP (Java API for XML Parsing), MSXML 3.0 (Microsoft XML
parser), Xerces (Apache’s Xerces parser)
All support both SAX and DOM
SAX Callbacks
SAX works through callbacks: you call the
parser, it calls methods that you supply
Your program: main(...) calls parse(...) in the SAX parser
The SAX parser then calls back these methods in your program:
startDocument(...)
startElement(...)
characters(...)
endElement( )
endDocument( )
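The callback flow can be imitated with a toy event-driven parser in JavaScript. This is a sketch of the idea only, not a real SAX implementation: the function parseWithCallbacks is hypothetical, it recognises just start tags, end tags and text, and it ignores attributes, comments, CDATA sections, entities and the XML declaration. The sample document is made up.

```javascript
// Toy event-driven parser, NOT a real SAX implementation: it reports only
// start tags, end tags and non-blank text to the handler you supply.
function parseWithCallbacks(xml, handler) {
  // alternative 1: end tag, alternative 2: start or empty tag, alternative 3: text
  var token = /<\/([^\s>]+)\s*>|<([^\s\/>]+)[^>]*?(\/?)>|([^<]+)/g;
  var m;
  handler.startDocument();
  while ((m = token.exec(xml)) !== null) {
    if (m[1] !== undefined) {
      handler.endElement(m[1]);
    } else if (m[2] !== undefined) {
      handler.startElement(m[2]);
      if (m[3] === "/") handler.endElement(m[2]); // empty element such as <x/>
    } else if (m[4].trim() !== "") {
      handler.characters(m[4]);
    }
  }
  handler.endDocument();
}

// "Your program": supply the callbacks, then call the parser.
var events = [];
parseWithCallbacks("<note><to>Ali</to><body>Hi</body></note>", {
  startDocument: function () { events.push("startDocument"); },
  startElement: function (name) { events.push("start:" + name); },
  characters: function (text) { events.push("text:" + text); },
  endElement: function (name) { events.push("end:" + name); },
  endDocument: function () { events.push("endDocument"); }
});
console.log(events.join(" "));
```

Note that no tree is built: each piece of the document is handed to the handler as it is found and then forgotten.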
Difference between SAX and DOM
DOM: Tree-based model
SAX: Event-based model: invokes methods when markup is encountered
DOM: Data can be accessed quickly (randomly) since all data is in memory
SAX: No tree structure is created: data is passed to the application from the XML document as it is found. SAX provides only sequential access to data
DOM: Provides facilities for adding and removing nodes (i.e., modifying the document)
SAX: Provides no facilities for modifying the document
DOM: Requires more memory, because the whole document is held in memory; may be impractical for very large XML documents
SAX: Less memory overhead
DOM components
Document - top-level view of the document, with
access to all nodes (including root element)
createElement method - creates an element node
createAttribute method - creates an attribute node
createComment method - creates a comment node
getDocumentElement method - returns root element
appendChild method - appends a child node
getChildNodes method - returns child nodes
DOM components II
Node - represents a node: "A node is a reference to an
element, its attributes, or text from the document."
cloneNode method - duplicates a node
getNodeName method - returns the node's name
getNodeType method - returns the node's type
getNodeValue method - returns the node's value
getParentNode method - returns the node's parent node
hasChildNodes method - true if has child nodes
insertBefore method - stuffs child in before specified child
removeChild method - removes the child node
replaceChild method - replaces one child with another
setNodeValue method - sets node's value
DOM components III
Element - represents an element node
getAttribute method - returns the value of an attribute
getTagName method - gets element's name
removeAttribute method - deletes it
setAttribute method - sets att's value
DOM Access with JavaScript
We have seen that the DOM can be mapped against XSLT style
sheets to transform an XML document into a formatted Web page
The DOM presents an XML document as a tree-structure (a node
tree), with the elements, attributes, and text defined as nodes.
We now show how JavaScript can be used to navigate the
document tree and manipulate its data,
although this does not permit saving an XML document locally
An XML file is made accessible to scripting by loading it into a
Document Object Model.
When the MSXML parser loads an XML document, it reads it from
start to finish and creates a logical tree model of it.
Creating a DOM Object
<script type="text/javascript">
function Load_DOM() {
XMLDoc = new ActiveXObject("Microsoft.XMLDOM")
XMLDoc.async = false
XMLDoc.load("books.xml")
if (XMLDoc.parseError.errorCode != 0) {
alert("DOM Not Loaded: XML file has error(s)!")
}
}
…
</script>
By default (async = true), the load() method returns control to the caller before the download is
finished.
This asynchronous behaviour avoids unusually long waits for the Web page to load while
the DOM is loading a large document.
For small to moderate size XML files, set async = false so that load() returns only after the
whole document has been loaded and the DOM can be used immediately.
The document's parseError object returns an errorCode property as a decimal number
associated with an error.
If the error code is zero, then loading of the file into the DOM was successful.
Example 1: Cross-Browser Code
<script type="text/javascript">
var xmlDoc
function loadXML() {
//load xml file
if (window.ActiveXObject) {// code for IE
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async=false;
xmlDoc.load("note.xml");
getmessage()
}else if (document.implementation && //code for Mozilla
document.implementation.createDocument) {
xmlDoc=
document.implementation.createDocument("","",null);
xmlDoc.onload = getmessage; // register the handler before loading starts
xmlDoc.load("note.xml");
} else {
alert('Your browser cannot handle this script');
}
}
// continued …
… Cross-Browser Code
function getmessage() {
document.getElementById("to").innerHTML=
xmlDoc.getElementsByTagName("to")[0].firstChild.nodeValue
document.getElementById("from").innerHTML=
xmlDoc.getElementsByTagName("from")[0].firstChild.nodeValue
document.getElementById("message").innerHTML=
xmlDoc.getElementsByTagName("body")[0].firstChild.nodeValue
}
</script>
</head><body onload="loadXML()" bgcolor="silver">
<h1>W3Schools Internal Note</h1>
<p><b>To:</b> <span id="to"></span><br />
<b>From:</b> <span id="from"></span>
<hr />
<b>Message:</b> <span id="message"></span>
Example 2: Creating HTML Table Using XML Data
<script type="text/javascript">
function Show_Node() {
…
Load_DOM();
OutString = "<table border='1'>"
OutString += "<tr style='background-color:#E6E6E6'>"
OutString += " <th>Title</th>"
OutString += " <th>Author</th>"
OutString += " <th>Price</th>"
OutString += "</tr>"
BookNode = XMLDoc.selectSingleNode("/library/book[@edition]")
OutString += "<tr>"
NodeList = BookNode.childNodes
for (i=0; i < NodeList.length; i++) {
OutString += "<td>" + NodeList(i).text + "</td>"
}
OutString += "</tr>"
OutString += "</table>"
document.all.Output.innerHTML = OutString
}
</script>
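The row-building step of this example does not depend on the browser, so it can be sketched as a plain function. The helper names escapeHtml and buildRow and the sample values are hypothetical; the escaping step is an addition of this sketch and is not part of the slide's code.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build one table row from an array of cell texts.
function buildRow(cells) {
  var out = "<tr>";
  for (var i = 0; i < cells.length; i++) {
    out += "<td>" + escapeHtml(cells[i]) + "</td>";
  }
  return out + "</tr>";
}

// Made-up sample values standing in for the text of the book's child nodes.
console.log(buildRow(["XML & You", "A. Author", 9.5]));
```

Keeping the string building separate from the DOM access makes the logic easy to check outside Internet Explorer, which the original example requires.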
Example 3: Some XML DOM Properties/Methods
<script type="text/javascript">
…
// get the root element
var element = xmlDocument.documentElement;
document.writeln("<p>Here is the root node of the document:" );
document.writeln( "<strong>" + element.nodeName + "</strong>" );
document.writeln(
"<br>The following are its child elements:" );
document.writeln( "</p><ul>" );
// traverse all child nodes of root element
for ( i = 0; i < element.childNodes.length; i++ ) {
var curNode = element.childNodes.item( i );
// print node name of each child element
document.writeln( "<li><strong>" + curNode.nodeName
+ "</strong></li>" );
}
document.writeln( "</ul>" );
// get the first child node of root element
var currentNode = element.firstChild;
…
</script>
Example 4: Creating XML Using DOM Methods
<script type="text/javascript">
var comment, stud1, stud2, studs
function mkRecord(){
comment = document.createComment('My Tullab records');
studs = document.createElement('tullab');
stud1 = createStudent("G. Al-Good", 4.0);
stud2 = createStudent("P. Al-Probation", 1.7);
studs.appendChild(comment);
studs.appendChild(stud1);
studs.appendChild(stud2);
showObject(studs);
}
function createStudent(name, gpa){
var result = document.createElement('talib');
var nm = document.createElement('name');
var GPA = document.createElement('gpa');
nm.appendChild(document.createTextNode(name));
GPA.appendChild(document.createTextNode(gpa));
result.appendChild(nm);
result.appendChild(GPA);
return result;
}
</script>