Transcript chapter 1

Chapter 24
XML
CHAPTER GOALS
• Understanding XML elements and
attributes
• Understanding the concept of an XML
parser
• Being able to read and write XML
documents
• Being able to design Document Type
Definitions for XML documents
XML
• Stands for Extensible Markup
Language
• Lets you encode complex data in a
form that the recipient can parse easily
• Is independent from any programming
language
XML Encoding of Coin Data
<coin>
<value>0.5</value>
<name>half dollar</name>
</coin>
Advantages of XML
• XML files are readable by both computers
and humans
• XML formatted data is resilient to change
o It is easy to add new data elements
o Old programs can process the old
information in the new data format
Differences Between XML and
HTML
• Both are descendants of SGML (Standard
Generalized Markup Language)
• XML is a simplified version of SGML
• XML is very strict but HTML (as used today)
is not
• XML tells what the data means; HTML tells
how to display data
Differences Between XML and
HTML
• XML tags are case-sensitive
o <LI> is different from <li>
• Every XML start tag must have a matching end
tag
• If a tag has no end-tag, it must end in />
o <img src="hamster.jpeg"/>
• XML attribute values must be enclosed in quotes
o <img src="hamster.jpeg" width="400" height="300"/>
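The strictness rules above can be checked with any XML parser. The following is a minimal sketch, assuming the standard JAXP DOM parser that ships with Java; the class name WellFormedDemo and the method isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedDemo
{
   /** Returns true if the parser accepts the text as well-formed XML. */
   static boolean isWellFormed(String xml)
   {
      try
      {
         DocumentBuilder builder
               = DocumentBuilderFactory.newInstance().newDocumentBuilder();
         // Replace the default error handler, which prints to System.err
         builder.setErrorHandler(new DefaultHandler());
         builder.parse(new ByteArrayInputStream(
               xml.getBytes(StandardCharsets.UTF_8)));
         return true;
      }
      catch (SAXException e)
      {
         return false; // the parser rejected the document
      }
      catch (IOException | ParserConfigurationException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      // quoted attribute value, tag closed with />
      System.out.println(isWellFormed("<img src=\"hamster.jpeg\"/>"));
      // attribute value without quotes
      System.out.println(isWellFormed("<img src=hamster.jpeg/>"));
      // start and end tag differ in case
      System.out.println(isWellFormed("<LI>item</li>"));
      // missing end tag
      System.out.println(isWellFormed("<p>text"));
   }
}
```

Only the first call should report true; the other three inputs each break one of the rules listed above.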
Structure of an XML Document
• An XML data set is called a document
• The document starts with a header
<?xml version="1.0"?>
• The data are contained in a root element
<?xml version="1.0"?>
<purse>
more data
</purse>
• The document contains elements and text
Structure of an XML Document
• An XML element has one of two forms
<elementTag optional attributes> contents </elementTag>
or
<elementTag optional attributes/>
• The contents can be elements or text or both
• An example of an element with both elements and text
(mixed content):
<p>Use XML for <strong>robust</strong> data
formats.</p>
• Avoid mixed content for data descriptions
Structure of an XML Document
• An element can have attributes
• The a element in HTML has an href attribute
<a href="http://java.sun.com"> ... </a>
• An attribute has a name (such as href) and a value
• The attribute value is enclosed in either single or double quotes
• An attribute is intended to provide information about the element content
<value currency="USD">0.5</value>
or
<value currency="EUR">0.5</value>
• An element can have multiple attributes
Parsing XML Documents
• A parser is a program that
o Reads a document
o Checks whether it is syntactically correct
o Takes some action as it processes the
document
• There are two kinds of XML parsers
o SAX (Simple API for XML)
o DOM (Document Object Model)
Parsing XML Documents
• SAX parser
o Event-driven
o It calls a method you provide to process each
construct it encounters
o More efficient for handling large XML
documents
• DOM parser
o Builds a tree that represents the document
o When the parser is done, you can analyze the tree
o Easier to use for most applications
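To make the event-driven style concrete, here is a small sketch of a SAX handler, assuming the SAXParser and DefaultHandler classes from the standard Java library. The class SaxTagLister and its method listTags are hypothetical names; the handler simply records each start tag as the parser reports it, without ever building a tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxTagLister
{
   /** Returns the tag names in the order the SAX parser reports them. */
   static List<String> listTags(String xml) throws Exception
   {
      final List<String> tags = new ArrayList<>();
      DefaultHandler handler = new DefaultHandler()
      {
         // The parser calls this method once for every start tag
         @Override
         public void startElement(String uri, String localName,
               String qName, Attributes attributes)
         {
            tags.add(qName);
         }
      };
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
      parser.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)), handler);
      return tags;
   }

   public static void main(String[] args) throws Exception
   {
      String xml
            = "<coin><value>0.5</value><name>half dollar</name></coin>";
      System.out.println(listTags(xml)); // [coin, value, name]
   }
}
```

A DOM parser would instead return the whole document as a tree, which is usually more convenient but keeps the entire document in memory.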
JAXP
• Stands for Java API for XML Processing
• Provides a standard mechanism for DOM
parsers to read and create documents
• Part of Java 1.4 and above
• With earlier versions, you need to download additional
libraries
Parsing XML Documents
• Document interface describes the tree structure of an XML
document
• A DocumentBuilder can generate an object of a class that
implements Document interface
• Get a DocumentBuilder by calling the static newInstance
method of the DocumentBuilderFactory class
• Call newDocumentBuilder method of the factory to get a
DocumentBuilder
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Parsing XML Documents
• To read a document from a file
String fileName = . . . ;
File f = new File(fileName);
Document doc = builder.parse(f);
• To read a document from a URL on the Internet
String urlName = . . . ;
URL u = new URL(urlName);
Document doc = builder.parse(u.openStream()); // parse has no overload that takes a URL
• To read from an input stream
InputStream in = . . . ;
Document doc = builder.parse(in);
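For quick experiments, the input stream does not have to come from a file or a network connection. The sketch below assumes the document text is already in a string and wraps it in a ByteArrayInputStream; the class ParseFromStream and the method parseString are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseFromStream
{
   /** Parses XML text held in memory and returns the DOM document. */
   static Document parseString(String xml) throws Exception
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      // Any InputStream works here; this one reads from a byte array
      InputStream in = new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8));
      return builder.parse(in);
   }

   public static void main(String[] args) throws Exception
   {
      Document doc = parseString("<purse><coin/></purse>");
      System.out.println(doc.getDocumentElement().getTagName()); // purse
   }
}
```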
Parsing XML Documents
• You can inspect or modify the document
• The document tree consists of nodes
• Two node types are Element and Text
• Element and Text are subinterfaces of the Node
interface
An XML Document
<?xml version="1.0"?>
<items>
<item>
<product>
<description>Ink Jet Refill Kit</description>
<price>29.95</price>
</product>
<quantity>8</quantity>
</item>
<item>
<product>
<description>4-port Mini Hub</description>
<price>19.95</price>
</product>
<quantity>4</quantity>
</item>
</items>
Tree View of XML Document
Parsing XML Documents
• Start inspection of the tree by getting the root element
Element root = doc.getDocumentElement();
• To get the child elements of an element
o Use the getChildNodes method of the Element interface
o The nodes are stored in an object of a class that implements the
NodeList interface
• Use a NodeList to visit the child nodes of an element
o getLength method gives the number of nodes
o item method gets an item in the node list
• Code to get a child node
NodeList nodes = root.getChildNodes();
int i = . . . ; // a value between 0 and getLength() - 1
Node child = nodes.item(i);
• The XML parser keeps all white spaces if you don't use a DTD
o You can include a test to ignore the white space
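The effect of the retained white space is easy to see on a small document. This sketch, with the hypothetical class ChildElementCounter, parses an indented document without a DTD and compares the raw child count with the number of children that pass an instanceof Element test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ChildElementCounter
{
   /** Counts only those children that are elements, skipping text nodes. */
   static int countChildElements(Element parent)
   {
      NodeList nodes = parent.getChildNodes();
      int count = 0;
      for (int i = 0; i < nodes.getLength(); i++)
      {
         if (nodes.item(i) instanceof Element) { count++; }
      }
      return count;
   }

   /** Parses XML text and returns its root element. */
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                  xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
   }

   public static void main(String[] args) throws Exception
   {
      Element root = parseRoot("<items>\n  <item/>\n  <item/>\n</items>");
      // 5 nodes: three white space Text nodes and two item elements
      System.out.println(root.getChildNodes().getLength());
      // 2 elements once the white space is filtered out
      System.out.println(countChildElements(root));
   }
}
```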
Parsing XML Documents
• Get an element name with the getTagName method
Element priceElement = . . . ;
String name = priceElement.getTagName();
• To find the value of the currency attribute
String attributeValue = priceElement.getAttribute("currency");
• You can also iterate through all attributes
o Use a NamedNodeMap
o Each attribute is stored in a Node
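Iterating through all attributes might look like the following sketch. The class AttributeLister and the method attributesOf are hypothetical names; the attributes are collected into a sorted map because a NamedNodeMap does not promise any particular order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class AttributeLister
{
   /** Collects all attributes of an element as name/value pairs. */
   static Map<String, String> attributesOf(Element e)
   {
      Map<String, String> result = new TreeMap<>();
      NamedNodeMap attributes = e.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
      {
         // Each attribute is a Node with a name and a value
         Node attribute = attributes.item(i);
         result.put(attribute.getNodeName(), attribute.getNodeValue());
      }
      return result;
   }

   /** Parses XML text and returns its root element. */
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                  xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
   }

   public static void main(String[] args) throws Exception
   {
      Element img = parseRoot(
            "<img src=\"hamster.jpeg\" width=\"400\" height=\"300\"/>");
      // {height=300, src=hamster.jpeg, width=400}
      System.out.println(attributesOf(img));
      System.out.println(img.getAttribute("width")); // 400
   }
}
```

Note that getAttribute returns an empty string, not null, when the element has no attribute of the given name.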
Parsing XML Documents
• Some elements have children that contain text
• Document builder creates nodes of type Text
• If you don't use mixed content elements
o Any element containing text has a single Text child node
o Use getFirstChild method to get it
o Use getData method to read the text
• To determine the price stored in the price element
Element priceNode = . . . ;
Text priceData = (Text)priceNode.getFirstChild();
String priceString = priceData.getData();
double price = Double.parseDouble(priceString);
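Put together as a runnable sketch, reading the price looks like this. The class PriceReader and its methods are hypothetical names, and the sample element is assumed to contain only text (no mixed content), so that its first child is the single Text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class PriceReader
{
   /** Reads the numeric value stored as text inside a price element. */
   static double readPrice(Element priceElement)
   {
      // Without mixed content, the only child is the Text node
      Text priceData = (Text) priceElement.getFirstChild();
      return Double.parseDouble(priceData.getData());
   }

   /** Parses XML text and returns its root element. */
   static Element parseRoot(String xml) throws Exception
   {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                  xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
   }

   public static void main(String[] args) throws Exception
   {
      Element price = parseRoot("<price currency=\"USD\">29.95</price>");
      System.out.println(readPrice(price)); // 29.95
   }
}
```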
File ItemListParser.java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element
      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList getItems(Element e)
   {
      ArrayList items = new ArrayList();

      // get the <item> children
      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Node childNode = children.item(i);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            if (childElement.getTagName().equals("item"))
            {
               Item c = getItem(childElement);
               items.add(c);
            }
         }
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();
      Product p = null;
      int quantity = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            String tagName = childElement.getTagName();
            if (tagName.equals("product"))
               p = getProduct(childElement);
            else if (tagName.equals("quantity"))
            {
               Text textNode = (Text)childElement.getFirstChild();
               String data = textNode.getData();
               quantity = Integer.parseInt(data);
            }
         }
      }
      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();
      String name = "";
      double price = 0;
      for (int j = 0; j < children.getLength(); j++)
      {
         Node childNode = children.item(j);
         if (childNode instanceof Element)
         {
            Element childElement = (Element)childNode;
            String tagName = childElement.getTagName();
            Text textNode = (Text)childElement.getFirstChild();

            String data = textNode.getData();
            if (tagName.equals("description"))
               name = data;
            else if (tagName.equals("price"))
               price = Double.parseDouble(data);
         }
      }
      return new Product(name, price);
   }

   private DocumentBuilder builder;
}
File ItemListParserTest.java
import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   It prints out the items that are described in the XML file.
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         System.out.println(anItem.format());
      }
   }
}
Creating XML Documents
• We can build a Document object in a Java program
and then save it as an XML document
• We need a DocumentBuilder object to create a new,
empty document
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder =
factory.newDocumentBuilder();
Document doc = builder.newDocument();
//empty document
• The Document interface has methods to create elements
and text nodes
Creating XML Documents
• To create an element use createElement method and
pass it a tag
Element itemElement = doc.createElement("item");
• To create a text node, use createTextNode and pass it
a string
Text quantityText = doc.createTextNode("8");
• Use the setAttribute method to add an attribute to an element
priceElement.setAttribute("currency", "USD");
Creating XML Documents
• To construct the tree structure of a document
o start with the root
o add children with appendChild
• To build an XML tree that describes an item
// create elements
Element itemElement = doc.createElement("item");
Element productElement = doc.createElement("product");
Element descriptionElement =
doc.createElement("description");
Element priceElement = doc.createElement("price");
Element quantityElement =
doc.createElement("quantity");
Text descriptionText =
doc.createTextNode("Ink Jet Refill Kit");
Text priceText = doc.createTextNode("29.95");
Text quantityText = doc.createTextNode("8");
// add elements to the document
doc.appendChild(itemElement);
itemElement.appendChild(productElement);
itemElement.appendChild(quantityElement);
productElement.appendChild(descriptionElement);
productElement.appendChild(priceElement);
descriptionElement.appendChild(descriptionText);
priceElement.appendChild(priceText);
quantityElement.appendChild(quantityText);
Creating XML Documents
• Use a Transformer to write an XML document to a stream
• Create a transformer
Transformer t =
TransformerFactory.newInstance().newTransformer();
• Create a DOMSource from your document
• Create a StreamResult from your output stream
• Call the transform method of your transformer
t.transform(new DOMSource(doc),
new StreamResult(System.out));
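The same steps also work when the output should end up in a string rather than on System.out. The following sketch assumes a StringWriter as the target and turns off the XML declaration so that only the elements appear; the class DocumentWriterDemo and its methods are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentWriterDemo
{
   /** Builds a tiny document: an item with a single quantity child. */
   static Document buildQuantityDocument(int quantity) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
      Element itemElement = doc.createElement("item");
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(doc.createTextNode("" + quantity));
      itemElement.appendChild(quantityElement);
      doc.appendChild(itemElement);
      return doc;
   }

   /** Serializes a DOM document to a string with a Transformer. */
   static String toXmlString(Document doc) throws Exception
   {
      Transformer t = TransformerFactory.newInstance().newTransformer();
      // Leave out the <?xml ...?> header so only the elements are written
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      System.out.println(toXmlString(buildQuantityDocument(8)));
   }
}
```

A StreamResult can wrap a Writer, an OutputStream, or a File, so the same transform call can write to any of them.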
File ItemListBuilder.java
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

/**
   Builds a DOM document for an array list of items.
*/
public class ItemListBuilder
{
   /**
      Constructs an item list builder.
   */
   public ItemListBuilder()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      builder = factory.newDocumentBuilder();
   }

   /**
      Builds a DOM document for an array list of items.
      @param items the items
      @return a DOM document describing the items
   */
   public Document build(ArrayList items)
   {
      doc = builder.newDocument();
      Element root = createItemList(items);
      doc.appendChild(root);
      return doc;
   }

   /**
      Builds a DOM element for an array list of items.
      @param items the items
      @return a DOM element describing the items
   */
   private Element createItemList(ArrayList items)
   {
      Element itemsElement = doc.createElement("items");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         Element itemElement = createItem(anItem);
         itemsElement.appendChild(itemElement);
      }
      return itemsElement;
   }

   /**
      Builds a DOM element for an item.
      @param anItem the item
      @return a DOM element describing the item
   */
   private Element createItem(Item anItem)
   {
      Element itemElement = doc.createElement("item");
      Element productElement
            = createProduct(anItem.getProduct());
      Text quantityText = doc.createTextNode(
            "" + anItem.getQuantity());
      Element quantityElement = doc.createElement("quantity");
      quantityElement.appendChild(quantityText);

      itemElement.appendChild(productElement);
      itemElement.appendChild(quantityElement);
      return itemElement;
   }

   /**
      Builds a DOM element for a product.
      @param p the product
      @return a DOM element describing the product
   */
   private Element createProduct(Product p)
   {
      Text descriptionText
            = doc.createTextNode(p.getDescription());
      Text priceText = doc.createTextNode("" + p.getPrice());

      Element descriptionElement
            = doc.createElement("description");
      Element priceElement = doc.createElement("price");

      descriptionElement.appendChild(descriptionText);
      priceElement.appendChild(priceText);

      Element productElement = doc.createElement("product");

      productElement.appendChild(descriptionElement);
      productElement.appendChild(priceElement);

      return productElement;
   }

   private DocumentBuilder builder;
   private Document doc;
}
File ItemListBuilderTest.java
import java.util.ArrayList;
import org.w3c.dom.Document;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

/**
   This program tests the item list builder. It prints the
   XML file corresponding to a DOM document containing a list
   of items.
*/
public class ItemListBuilderTest
{
   public static void main(String[] args) throws Exception
   {
      ArrayList items = new ArrayList();
      items.add(new Item(new Product("Toaster", 29.95), 3));
      items.add(new Item(new Product("Hair dryer", 24.95), 1));

      ItemListBuilder builder = new ItemListBuilder();
      Document doc = builder.build(items);
      Transformer t = TransformerFactory
            .newInstance().newTransformer();
      t.transform(new DOMSource(doc),
            new StreamResult(System.out));
   }
}
Document Type Definitions
• A DTD is a set of rules for correctly formed documents of a particular type
o Describes the legal attributes for each element type
o Describes the legal child elements for each element type
• Legal child elements are described with an ELEMENT rule
<!ELEMENT items (item*)>
• The items element (the root in this case) can have 0 or more item elements
• Definition of an item node
<!ELEMENT item (product, quantity)>
• Children of the item node must be a product node followed by a quantity
node
Document Type Definitions
• Definition of product node
<!ELEMENT product (description, price)>
• The other nodes
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
• #PCDATA stands for parsed character data, which is just text
o Can contain any characters
o Special characters have to be encoded when they occur in
character data
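The encoding of special characters is handled by the library in both directions. The sketch below, with the hypothetical class SpecialCharacterDemo, reads a description whose text uses the predefined entities &amp;amp;, &amp;lt; and &amp;gt;, and then writes a text node that contains the raw characters. It uses getTextContent so that the result does not depend on how the parser splits the text into nodes.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SpecialCharacterDemo
{
   /** Parses a one-element document and returns the decoded text. */
   static String readText(String xml) throws Exception
   {
      DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new ByteArrayInputStream(
            xml.getBytes(StandardCharsets.UTF_8)));
      return doc.getDocumentElement().getTextContent();
   }

   /** Wraps text in a description element and returns the encoded XML. */
   static String writeText(String text) throws Exception
   {
      Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
      Element description = doc.createElement("description");
      // The text node holds the raw characters; encoding happens on output
      description.appendChild(doc.createTextNode(text));
      doc.appendChild(description);

      Transformer t = TransformerFactory.newInstance().newTransformer();
      t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      t.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
   }

   public static void main(String[] args) throws Exception
   {
      // The parser decodes the entities: AT&T <fast>
      System.out.println(readText(
            "<description>AT&amp;T &lt;fast&gt;</description>"));
      // The transformer encodes & and < again when writing
      System.out.println(writeText("AT&T <fast>"));
   }
}
```

Program code therefore works with the plain characters; the encoded forms appear only in the XML text itself.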
Encodings for Special Characters
DTD for Item List
<!ELEMENT items (item)*>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
Regular Expressions for Element Content
Document Type Definitions
• A DTD gives you control over the allowed attributes
of an element
<!ATTLIST Element Attribute Type Default>
• The type CDATA allows any sequence of character data
• Type can also specify a finite number of choices
<!ATTLIST price currency (USD | EUR | JPY )
#REQUIRED >
Common Attribute Types
Attribute Defaults
Document Type Definitions
• The #IMPLIED keyword means you can supply an
attribute or not.
<!ATTLIST price currency CDATA #IMPLIED >
• If you omit the attribute, the application
processing the XML data implicitly assumes
some default value
• You can specify a default to be used if the
attribute is not specified
<!ATTLIST price currency CDATA "USD" >
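A parser that has read the ATTLIST rule fills in the default for you. The following sketch places the rule in an internal DTD so that the example is self-contained; the class AttributeDefaultDemo and the method currencyOf are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class AttributeDefaultDemo
{
   // An internal DTD that declares "USD" as the default currency
   static final String INTERNAL_DTD
         = "<!DOCTYPE price [\n"
         + "<!ELEMENT price (#PCDATA)>\n"
         + "<!ATTLIST price currency CDATA \"USD\">\n"
         + "]>\n";

   /** Parses a price element and returns its currency attribute. */
   static String currencyOf(String priceXml) throws Exception
   {
      DocumentBuilder builder
            = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new ByteArrayInputStream(
            (INTERNAL_DTD + priceXml).getBytes(StandardCharsets.UTF_8)));
      return doc.getDocumentElement().getAttribute("currency");
   }

   public static void main(String[] args) throws Exception
   {
      // Attribute omitted: the parser supplies the default
      System.out.println(currencyOf("<price>29.95</price>")); // USD
      // Attribute present: the explicit value wins
      System.out.println(
            currencyOf("<price currency=\"EUR\">19.95</price>")); // EUR
   }
}
```

With #IMPLIED instead of a default, getAttribute would return an empty string for the first document, and the application would have to decide what that means.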
Parsing with Document Type
Definitions
• Specify a DTD with every XML document
• Instruct the parser to check that the document
follows the rules of the DTD
• Then the parser can be more intelligent about
parsing
• If the parser knows that the children of an
element are elements, it can suppress white spaces
Parsing with Document Type
Definitions
• An XML document can reference a DTD in one of
two ways
• The document may contain the DTD
• The document may refer to a DTD stored elsewhere
• A DTD is introduced with a DOCTYPE declaration
Parsing with Document Type
Definitions
• If the document contains the DTD, the declaration looks like this:
<!DOCTYPE rootElement [ rules ]>
• Example
<?xml version="1.0"?>
<!DOCTYPE items [
<!ELEMENT items (item*)>
<!ELEMENT item (product, quantity)>
<!ELEMENT product (description, price)>
<!ELEMENT quantity (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<items>
<item>
<product>
<description>Ink Jet Refill Kit</description>
<price>29.95</price>
</product>
<quantity>8</quantity>
</item>
<item>
<product>
<description>4-port Mini Hub</description>
<price>19.95</price>
</product>
<quantity>4</quantity>
</item>
</items>
Parsing with Document Type
Definitions
• If the DTD is stored outside the document, use the SYSTEM
keyword inside the DOCTYPE declaration
• This indicates that the system must locate the DTD
• The location of the DTD follows the SYSTEM keyword
• A DOCTYPE declaration can point to a local file
<!DOCTYPE items SYSTEM "items.dtd" >
• A DOCTYPE declaration can point to a URL
<!DOCTYPE items
SYSTEM "http://www.mycompany.com/dtds/items.dtd">
Parsing with Document Type
Definitions
• When your XML document has a DTD, use validation when
parsing
• Then the parser will check that all child elements and attributes
conform to the ELEMENT and ATTLIST rules in the DTD
• The parser reports an error if the document is invalid; install an
ErrorHandler that rethrows the error if you want parsing to stop with
an exception
• Use the setValidating method of the DocumentBuilderFactory
before calling newDocumentBuilder method
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(. . .);
Parsing with Document Type
Definitions
• If the parser validates the document with a DTD, you
can avoid validity checks in your code
• You can tell the parser to ignore white space in non-text
elements
factory.setValidating(true);
factory.setIgnoringElementContentWhitespace(true);
• If the parser has access to a DTD, it can fill in defaults
for attributes
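The two factory settings can be tried on a small document with an internal DTD. This is a sketch only: the class ValidatingParseDemo and its methods are hypothetical names, and the error handler that rethrows validity errors is an assumption about how you want invalid documents to be treated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidatingParseDemo
{
   // A simplified DTD: items holds any number of text-only item elements
   static final String INTERNAL_DTD
         = "<!DOCTYPE items [\n"
         + "<!ELEMENT items (item*)>\n"
         + "<!ELEMENT item (#PCDATA)>\n"
         + "]>\n";

   /** Parses the body against INTERNAL_DTD with validation turned on. */
   static Document parseValidated(String body) throws Exception
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setIgnoringElementContentWhitespace(true);
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new DefaultHandler()
      {
         // Turn validity errors into exceptions instead of ignoring them
         @Override
         public void error(SAXParseException e) throws SAXException
         {
            throw e;
         }
      });
      return builder.parse(new ByteArrayInputStream(
            (INTERNAL_DTD + body).getBytes(StandardCharsets.UTF_8)));
   }

   /** Returns true if the body is valid according to INTERNAL_DTD. */
   static boolean isValid(String body) throws Exception
   {
      try
      {
         parseValidated(body);
         return true;
      }
      catch (SAXException e)
      {
         return false;
      }
   }

   public static void main(String[] args) throws Exception
   {
      Document doc = parseValidated(
            "<items>\n  <item>a</item>\n  <item>b</item>\n</items>");
      // 2: the white space between the item elements was dropped
      System.out.println(
            doc.getDocumentElement().getChildNodes().getLength());
      // false: bogus is not declared in the DTD
      System.out.println(isValid("<items><bogus/></items>"));
   }
}
```

Because the white space is gone, code that walks the tree can index the children directly instead of testing each node with instanceof.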
File ItemListParser.java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

/**
   An XML parser for item lists
*/
public class ItemListParser
{
   /**
      Constructs a parser that can parse item lists
   */
   public ItemListParser()
         throws ParserConfigurationException
   {
      DocumentBuilderFactory factory
            = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setIgnoringElementContentWhitespace(true);
      builder = factory.newDocumentBuilder();
   }

   /**
      Parses an XML file containing an item list
      @param fileName the name of the file
      @return an array list containing all items in the XML file
   */
   public ArrayList parse(String fileName)
         throws SAXException, IOException
   {
      File f = new File(fileName);
      Document doc = builder.parse(f);

      // get the <items> root element
      Element root = doc.getDocumentElement();
      return getItems(root);
   }

   /**
      Obtains an array list of items from a DOM element
      @param e an <items> element
      @return an array list of all <item> children of e
   */
   private static ArrayList getItems(Element e)
   {
      ArrayList items = new ArrayList();

      // get the <item> children
      NodeList children = e.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
      {
         Element childElement = (Element)children.item(i);
         Item c = getItem(childElement);
         items.add(c);
      }
      return items;
   }

   /**
      Obtains an item from a DOM element
      @param e an <item> element
      @return the item described by the given element
   */
   private static Item getItem(Element e)
   {
      NodeList children = e.getChildNodes();

      Product p = getProduct((Element)children.item(0));

      Element quantityElement = (Element)children.item(1);
      Text quantityText
            = (Text)quantityElement.getFirstChild();
      int quantity = Integer.parseInt(quantityText.getData());

      return new Item(p, quantity);
   }

   /**
      Obtains a product from a DOM element
      @param e a <product> element
      @return the product described by the given element
   */
   private static Product getProduct(Element e)
   {
      NodeList children = e.getChildNodes();

      // the DTD guarantees the order: description first, then price
      Element descriptionElement = (Element)children.item(0);
      Text descriptionText
            = (Text)descriptionElement.getFirstChild();
      String description = descriptionText.getData();

      Element priceElement = (Element)children.item(1);
      Text priceText
            = (Text)priceElement.getFirstChild();
      double price = Double.parseDouble(priceText.getData());

      return new Product(description, price);
   }

   private DocumentBuilder builder;
}
File ItemListParserTest.java
import java.util.ArrayList;

/**
   This program parses an XML file containing an item list.
   The XML file should reference the items.dtd
*/
public class ItemListParserTest
{
   public static void main(String[] args) throws Exception
   {
      ItemListParser parser = new ItemListParser();
      ArrayList items = parser.parse("items.xml");
      for (int i = 0; i < items.size(); i++)
      {
         Item anItem = (Item)items.get(i);
         System.out.println(anItem.format());
      }
   }
}