SAX and parsing examples

SAX: Simple API for XML 1.0
Showing the structure of XML with a Java program
The Java program Tree.java runs the SAX parser on an XML file to
display its “tree” structure; it can also be used to show parsing errors.
The program appears in the notes and in the Deitel XML text, chapter 9.
An XML file: spacing1.xml
<?xml version = "1.0"?>
<!-- Fig. 9.4 : spacing1.xml -->
<!-- Whitespaces in nonvalidating parsing -->
<!-- XML document without DTD -->
<test name = " spacing 1 ">
   <example><object>World</object></example>
</test>
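SAX: Simple API for XML
Tree.java is a Java program that shows the tree structure of an XML document.
Run it on the command line. The first argument (yes or no) turns validation on or off, and f.xml is the file to parse:
java Tree yes/no f.xml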
The XML file: notvalid.xml
<?xml version = "1.0"?>
<!-- Fig. 9.6 : notvalid.xml -->
<!-- Validation and non-validation -->
<!DOCTYPE test [
   <!ELEMENT test (example)>
   <!ELEMENT example (#PCDATA)>
]>
<test>
   <?test message?>
   <example><item><![CDATA[Hello & Welcome!]]></item></example>
</test>
Run Tree on notvalid.xml
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>java Tree no notvalid.xml
URL: file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/notvalid.xml
[ document root ]
+-[ element : test ]
  +-[ ignorable ]
  +-[ proc-inst : test ] "message"
  +-[ ignorable ]
  +-[ element : example ]
    +-[ element : item ]
      +-[ text ] "Hello & Welcome!"
  +-[ ignorable ]
[ document end ]
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>
Running again, this time showing SAX parse errors
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>java Tree yes notvalid.xml
URL: file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/notvalid.xml
[ document root ]
+-[ element : test ]
  +-[ ignorable ]
  +-[ proc-inst : test ] "message"
  +-[ ignorable ]
  +-[ element : example ]
Parse Error: Element type "item" must be declared.
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>
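A rough sketch of how a Tree-style program can be written follows. It is not the Deitel Tree.java; the class name TreeSketch, its methods and the indentation are made up for illustration. It assumes the JAXP parser bundled with the JDK and reads the XML from a string rather than a file. Unlike the run above, it collects validation errors and keeps parsing instead of stopping at the first one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for Tree.java: one indented line per SAX event.
class TreeSketch extends DefaultHandler {
    private final StringBuilder tree = new StringBuilder();
    private final List<String> errors = new ArrayList<>();
    private int depth = 0;

    private void line(String text) {
        for (int i = 0; i < depth; i++) tree.append("  ");
        tree.append(text).append('\n');
    }

    @Override public void startDocument() { line("[ document root ]"); }
    @Override public void endDocument() { line("[ document end ]"); }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        line("+-[ element : " + qName + " ]");
        depth++;                      // children are indented one level deeper
    }
    @Override public void endElement(String uri, String local, String qName) { depth--; }

    // One line per characters() call; a parser may deliver text in several chunks.
    @Override public void characters(char[] ch, int start, int length) {
        line("+-[ text ] \"" + new String(ch, start, length) + "\"");
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        line("+-[ ignorable ]");
    }
    @Override public void processingInstruction(String target, String data) {
        line("+-[ proc-inst : " + target + " ] \"" + data + "\"");
    }

    // Validity problems arrive here when validation is on; we record them and go on.
    @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }

    String getTree() { return tree.toString(); }
    List<String> getErrors() { return errors; }

    static TreeSketch parse(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate);
        TreeSketch handler = new TreeSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE test [\n<!ELEMENT test (example)>\n<!ELEMENT example (#PCDATA)>\n]>\n"
            + "<test>\n<example><item>Hello</item></example>\n</test>";
        boolean validate = args.length > 0 && args[0].equals("yes");
        TreeSketch t = parse(xml, validate);
        System.out.print(t.getTree());
        for (String e : t.getErrors()) System.out.println("Parse Error: " + e);
    }
}
```

Run it as java TreeSketch yes or java TreeSketch no to compare the two modes on a small document shaped like notvalid.xml.";
TreeSketch lax = TreeSketch.parse(xml, false);
assert lax.getErrors().isEmpty();
assert lax.getTree().startsWith("[ document root ]");
assert lax.getTree().contains("+-[ element : item ]");
TreeSketch strict = TreeSketch.parse(xml, true);
assert !strict.getErrors().isEmpty();
</test>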
The pastry XML file: pastry.xml
<?xml version = "1.0"?>
<!-- pastry.xml -->
<!-- Using an external subset -->
<!DOCTYPE donuts SYSTEM "pastry.dtd">
<donuts>
   <jelly>grape</jelly>
   <lemon>sour</lemon>
   <lemon>real sour</lemon>
   <glazed>chocolate</glazed>
</donuts>
Running a SAX parser on this file (Java code in the notes)
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>java MySAXApp pastry.xml
Start document
Start element: donuts
Start element: jelly
Characters: "grape"
End element: jelly
Start element: lemon
Characters: "sour"
End element: lemon
Start element: lemon
Characters: "real sour"
End element: lemon
Start element: glazed
Characters: "chocolate"
End element: glazed
End element: donuts
End document
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>
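The MySAXApp source is in the notes, not on these slides. As a minimal sketch of the same idea, the handler below logs one line per SAX event. The class name EventLogSketch and its events() method are assumptions for illustration. It uses the JDK's built-in JAXP parser and an inline document with no DTD, so it skips whitespace-only text itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical event logger in the spirit of MySAXApp.
class EventLogSketch extends DefaultHandler {
    private final List<String> log = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // characters() may arrive in pieces, so text is buffered and
    // written out when the next element boundary is reached.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) log.add("Characters: \"" + s + "\"");
        text.setLength(0);
    }

    @Override public void startDocument() { log.add("Start document"); }
    @Override public void endDocument() { log.add("End document"); }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        flushText();
        log.add("Start element: " + qName);
    }
    @Override public void endElement(String uri, String local, String qName) {
        flushText();
        log.add("End element: " + qName);
    }
    @Override public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static List<String> events(String xml) throws Exception {
        EventLogSketch handler = new EventLogSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<donuts>\n<jelly>grape</jelly>\n<lemon>sour</lemon>\n</donuts>";
        for (String event : events(xml)) System.out.println(event);
    }
}
```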
A day planner using the SAX parser for XML and Java
SAX: Simple API for XML 2.0
• SAX 2.0 was recently released.
• The Xerces parser is available at
http://xml.apache.org/#xerces
• You may need to search the Apache site. I
found the latest version, a zip file, at
http://apache.cs.utah.edu/xml/xerces-j/
• This SAX 2.0 parser is needed to run the
PrintXML.java example in chapter 9 of Deitel.
Sun App Server – an aside
This server may be needed for the WSDP
(Web Services Developer Pack), which
contains a tutorial, the Xerces parser, the Xalan
XSL processor, and classes for SOAP.
C:\Sun\AppServer\docs\about.html
Xerces parser
• Deitel’s XML How to Program comes with
a release of the Xerces parser. You can
unzip these files; you’ll need them for
SOAP in any case. You can also find the
Xerces parser on the Apache site.
Xerces
• Xerces-J is packaged as a ZIP file for all platforms and operating systems.
You can run the Java jar command to unpack the distribution:
jar xf Xerces-J-bin.1.2.0.zip
jar xf Xerces-J-src.1.2.0.zip
• This command creates a "xerces-1_2_0" sub-directory in the current directory
containing all the files.
Files in the binary package release
• LICENSE: License for Xerces-J
• Readme.html: Web page redirect to docs/html/index.html
• xerces.jar: Jar file containing all the parser class files
• xercesSamples.jar: Jar file containing all sample class files
• data/: Directory containing sample XML data files
• docs/html/: Directory containing documentation
• docs/apiDocs/: Directory containing Javadoc API for parser framework
To use Xerces-J you do not need the source files.
Xerces
• I just ran the .exe files in the release, which
seemed to unpack things OK.
Xerces
• Running Xerces: Xerces is a Java
program that comes as a jar file. You’ll
need your path set to your java/bin directory
and your classpath set to wherever
xerces.jar is. I created batch files to run the
SAXCount and DOMCount Java programs,
which show the parser at work on XML files.
The next few examples
• These examples come with the distribution
and are at
• C:\Xerces\xerces1_2_0\docs\html\domwriter.html
The xml files: personal.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE personnel SYSTEM "personal.dtd">
<personnel>
<person id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>[email protected]</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"/>
</person>
<person id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"/>
</person>
<person id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"/>
</person>
<person id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"/>
</person>
<person id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"/>
</person>
<person id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"/>
</person>
</personnel>
personal.dtd
<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person note CDATA #IMPLIED>
<!ATTLIST person contr (true|false) 'false'>
<!ATTLIST person salary CDATA #IMPLIED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>
<!ATTLIST link subordinates IDREFS #IMPLIED>
<!NOTATION gif PUBLIC '-//APP/Photoshop/4.0' 'photoshop.exe'>
Batch files: DOMCount.bat
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xerces-1_2_0\xercesSamples.jar
cd c:\xerces-1_2_0
java dom.DOMCount data/personal.xml
C:\xerces-1_2_0>java dom.DOMCount data/personal.xml
data/personal.xml: 170 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
Batch files: SAXTest.bat
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xerces-1_2_0\xercesSamples.jar
cd c:\xerces-1_2_0
java sax.SAXCount data/personal.xml
SAXCount output
C:\XERCES~1>SAXTest.bat
C:\XERCES~1>set PATH=C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\PROGRA~1\COMMON~1\ADAPTE~1\System;C:\jakarta\JAKART~1.28\bin;c:\progra~1\java\jdk15~1.0_0\bin;c:\jakarta\jakart~1.28\common\lib;;C:\Progra~1\Java\jdk15.0_0\bin
C:\XERCES~1>set CLASSPATH=c:\progra~1\java\jdk15~1.0_0\lib\tools.jar;c:\progra~1\java\jdk15~1.0_0\bin;c:\progra~1\java\jdk15~1.0_0\bin\hello;c:\jakarta\jakart~1.28\common\lib;;c:\xerces-1_2_0\xerces.jar;c:\xerces-1_2_0\xercesSamples.jar
C:\XERCES~1>cd c:\xerces-1_2_0
C:\xerces-1_2_0>java sax.SAXCount data/personal.xml
data/personal.xml: 181 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
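The counts in that last line can be gathered with a very small SAX handler. The sketch below is not the Xerces sax.SAXCount source; CountSketch and its field names are invented here. It assumes the JDK's built-in JAXP parser, and the way it counts (attributes per start tag, lengths of character and ignorable-whitespace events) is a guess at what the sample reports.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical counter: tallies SAX events instead of printing them.
class CountSketch extends DefaultHandler {
    long elems, attrs, spaces, chars;

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        elems++;
        attrs += atts.getLength();   // attributes on this start tag
    }
    @Override public void characters(char[] ch, int start, int length) {
        chars += length;             // chunking does not matter for a sum
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) {
        spaces += length;            // only reported when a DTD marks it ignorable
    }

    static CountSketch count(String xml) throws Exception {
        CountSketch handler = new CountSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    @Override public String toString() {
        return elems + " elems, " + attrs + " attrs, " + spaces + " spaces, " + chars + " chars";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<a x=\"1\"><b>hi</b><b y=\"2\" z=\"3\"/></a>"));
    }
}
```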
SAXWriter.bat: writing out the XML content
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xerces-1_2_0\xercesSamples.jar
cd c:\xerces-1_2_0
java sax.SAXWriter data/personal.xml
The output is on the next slide; I cut some blank lines to get it to fit.
DOMWriter
C:\xerces-1_2_0>java dom.DOMWriter data/personal.xml
data/personal.xml:
<?xml version="1.0" encoding="UTF-8"?>
<personnel>
<person contr="false" id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>[email protected]</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"></link>
</person>
<person contr="false" id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
C:\xerces-1_2_0>java sax.SAXWriter data/personal.xml
data/personal.xml:
<?xml version="1.0" encoding="UTF-8"?>
<personnel>
<person contr="false" id="Big.Boss">
<name><family>Boss</family> <given>Big</given></name>
<email>[email protected]</email>
<link subordinates="one.worker two.worker three.worker four.worker five.worker"></link>
</person>
<person contr="false" id="one.worker">
<name><family>Worker</family> <given>One</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="two.worker">
<name><family>Worker</family> <given>Two</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="three.worker">
<name><family>Worker</family> <given>Three</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="four.worker">
<name><family>Worker</family> <given>Four</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
<person contr="false" id="five.worker">
<name><family>Worker</family> <given>Five</given></name>
<email>[email protected]</email>
<link manager="Big.Boss"></link>
</person>
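Note that contr="false" appears in both outputs although personal.xml never sets it; the value comes from the default declared in personal.dtd. A small sketch of the same effect with the JDK's own DOM parser and an identity transform is given below. DomWriteSketch is a made-up name, this is not the Xerces dom.DOMWriter code, and the tiny inline DTD is an assumption standing in for personal.dtd.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical DOM round trip: parse to a tree, then write the tree back out.
class DomWriteSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    static String write(Document doc) throws Exception {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE personnel [\n"
            + "<!ELEMENT personnel (person)+>\n"
            + "<!ELEMENT person EMPTY>\n"
            + "<!ATTLIST person id ID #REQUIRED>\n"
            + "<!ATTLIST person contr (true|false) 'false'>\n"
            + "]>\n"
            + "<personnel><person id=\"Big.Boss\"/></personnel>";
        // The person element in the output should carry contr="false".
        System.out.println(write(parse(xml)));
    }
}
```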
Iterator view
• Not running: some classes are missing (org.w3c.dom.).
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xerces-1_2_0\xercesSamples.jar
cd c:\xerces-1_2_0
java dom.traversal.IteratorView data/personal.xml
Xalan: XSL
• Xalan can be downloaded in the same way as Xerces. It also
comes on the Deitel XML CD. Xalan contains
software to process XSL. The distribution also
contains Xerces.
• There are documents to help you get started at:
C:\Xalan\Xalan Getting Started.htm
• You’ll have to set your path and classpath as we
did for Xerces.
• I’ve provided an example, but your version will
depend on where you unpack the zip files.
SimpleTransform: output
C:\Xalan\xalan-j_1_2_D02\samples\SimpleTransform>java SimpleTransform
<?xml version="1.0" encoding="UTF-8"?>
<out>Hello</out>
C:\XALAN\XALAN-~1\SAMPLES>
SimpleTransform xml and xsl files
• The xml:
<?xml version="1.0"?>
<doc>Hello</doc>
• The xsl:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="doc">
<out><xsl:value-of select="."/></out>
</xsl:template>
</xsl:stylesheet>
Batch file to run SimpleTransform
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xalan\xalan-j_1_2_D02\xalan.jar;c:\xalan\xalan-j_1_2_D02\samples\xalansamples.jar
cd c:\xalan\xalan-j_1_2_D02\samples\SimpleTransform
java SimpleTransform
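For comparison, here is a minimal sketch of the same transformation using the javax.xml.transform API that ships with the JDK. The Xalan 1.2 sample is written against Xalan's own older classes, so this is not its source; TransformSketch is a made-up name, and the XML and XSL are passed as strings rather than read from the sample files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical counterpart of SimpleTransform using the JDK's XSLT support.
class TransformSketch {
    static String transform(String xml, String xsl) throws Exception {
        // Compile the stylesheet, then apply it to the XML source.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><doc>Hello</doc>";
        String xsl = "<?xml version=\"1.0\"?>"
            + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
            + "<xsl:template match=\"doc\">"
            + "<out><xsl:value-of select=\".\"/></out>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```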
ApplyXPath: looking for a particular item in the XML
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xalan\xalan-j_1_2_D02\xalan.jar;c:\xalan\xalan-j_1_2_D02\samples\xalansamples.jar
cd c:\xalan\xalan-j_1_2_D02\samples\ApplyXPath
java ApplyXPath foo.xml /doc/name/@first
ApplyXPath: output
C:\Xalan\XALAN-~1\samples>cd c:\xalan\xalan-j_1_2_D02\samples\ApplyXPath
C:\Xalan\xalan-j_1_2_D02\samples\ApplyXPath>java ApplyXPath foo.xml /doc/name/@first
<output>
DavidDavidDonaldEmilyJackMyriamPaulRobertScottShane</output>
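The same kind of lookup can be sketched with the javax.xml.xpath API in the JDK. This is not the Xalan ApplyXPath sample; XPathSketch and select() are names made up here, and the two-name document is an invented stand-in because the contents of foo.xml are not shown on the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical XPath lookup: returns the string value of every selected node.
class XPathSketch {
    static List<String> select(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());   // attribute nodes carry their value here
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample data, not the real foo.xml.
        String xml = "<doc><name first=\"David\"/><name first=\"Emily\"/></doc>";
        for (String value : select(xml, "/doc/name/@first")) System.out.println(value);
    }
}
```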
PureSAX
• The PureSAX class uses SAX
DocumentHandlers and the Xerces SAX
parser to produce a stylesheet tree, an
XML input tree, and the transformation
result tree.
Batch file for PureSAX
set PATH=%PATH%;C:\Progra~1\Java\jdk15.0_0\bin
set CLASSPATH=%CLASSPATH%;c:\xerces-1_2_0\xerces.jar;c:\xalan\xalan-j_1_2_D02\xalan.jar;c:\xalan\xalan-j_1_2_D02\samples\xalansamples.jar
cd c:\xalan\xalan-j_1_2_D02\samples\PureSAX
java PureSAX
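A loose sketch of a SAX-driven transformation with the JDK's javax.xml.transform.sax classes follows. It is not the Xalan PureSAX sample, which uses DocumentHandlers and Xalan's own classes; SaxPipelineSketch is an invented name. Here only the XML input is pushed through as SAX events, while the stylesheet is read from a string in the usual way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical SAX pipeline: parser events feed the transformer directly.
class SaxPipelineSketch {
    static String transform(String xml, String xsl) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("this TransformerFactory has no SAX support");
        }
        // The TransformerHandler is a SAX ContentHandler that runs the stylesheet.
        TransformerHandler handler = ((SAXTransformerFactory) factory)
            .newTransformerHandler(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true);           // XSLT needs namespace information
        XMLReader reader = parsers.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>Hello</doc>";
        String xsl = "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
            + "<xsl:template match=\"doc\"><out><xsl:value-of select=\".\"/></out></xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```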
An Applet to transform XML into HTML
• The applet uses a stylesheet to transform an XML document into HTML. It
displays the XML document, the stylesheet, and the HTML output.
• How to use the Xalan-Java applet wrapper:
1. Include the XSLTProcessorApplet class in an HTML client.
2. Specify the XML source document and XSL stylesheet.
You can use the DocumentURL and StyleURL PARAM tags or the
XSLTProcessorApplet setDocumentURL() and setStyleURL() methods.
If the XML document contains a stylesheet Processing Instruction (PI),
you do not need to specify an XSL stylesheet.
3. Call the XSLTProcessorApplet transformToHTML() method, which performs
the transformation and returns the new document as a String.
The applet: remarks on classes and running it
• This applet transforms XML into HTML. Given the
restrictions imposed by the applet sandbox, the local
copy of this applet does not load and run correctly in
some environments and with some versions of
IE/Netscape. Run the applet from an HTTP server, and
these problems disappear.
• To run the applet from one of our Domino servers, click
here.
• The local copy of client.html assumes that xalan.jar and
xerces.jar are in the Xalan root directory, two directories
above the samples/applet subdirectory. If these JAR files
are located elsewhere, you must edit the applet archive
attribute in client.html to point to xalan.jar and xerces.jar.
• To run the applet locally, click here.
The applet: before hitting button
Running applet (in IE window)
A Servlet example: delivering XML as HTML
• The client (which you must set up) specifies an XML document and a stylesheet. The servlet
performs the transformation and returns the output to the client. You can use media.properties to
specify which stylesheet is to be used depending on the client browser/device.
• How to run it:
Configure your application server (WebSphere or JServ, for example) so it can find the classes (in
xalansamples.jar) as well as the stylesheets and properties file in the servlet subdirectory.
Set up an HTML client to call DefaultApplyXSL with arguments as illustrated below.
• Examples:
http://localhost/servlet/DefaultApplyXSL?URL=/data.xml&xslURL=/style.xsl
– ...applies the style.xsl stylesheet to the data.xml data. Both files are
served from the Web server's HTTP document root.
http://localhost/servlet/DefaultApplyXSL?URL=/data.xml&xslURL=/style.xsl&debug=true
– ...ensures that XML and XSL processor messages are returned in the event of problems applying style.xsl to data.xml.
http://localhost/servlet/DefaultApplyXSL/data.xml?xslURL=/style.xsl
– ...applies the style.xsl stylesheet to the data.xml data, just like the first example. This is an alternative way of specifying the XML
document by utilizing the HTTP request's path information.
More servlet examples
• http://localhost/servlet/DefaultApplyXSL/data.xml
– ...examines data.xml for an associated XSL stylesheet. If multiple XSLs are
associated with the data, the stylesheet whose media attribute maps to your
browser type will be chosen. If no mapping is successful, the primary associated
stylesheet is used.
• http://localhost/servlet/data.xml
– ...provides the same function as the previous example, but this example
assumes that /servlet/data.xml has been mapped to be executed by this servlet.
The servlet engine may be configured to map all or some *.xml files to this
servlet through the use of servlet aliases or filters.
• http://localhost/servlet/data.xml?catalog=http://www.xml.org/dtds/oag.xml
– ...supplements any servlet-configured XCatalog with a catalog of supply chain
DTDs residing at the XML.ORG DTD repository.
For more information, see the comments in DefaultApplyXSL.java.