Transcript XML

DT228/3 Web Development
Introduction to XML
XML Parsers
Uses of XML
• To exchange data between incompatible systems
(just send an XML document, with an agreed definition
of the tags)
• For B2B e-commerce: exchange of business documents
between businesses. XML is flexible enough to describe
any logical text structure, e.g. purchase order, invoice
• To store data, as plain text files or in databases
• To create new markup languages (i.e. languages that use tags).
XML can be used to agree what the tags mean. Many markup
languages have already been created based on XML,
e.g. JSTL, WML, VoiceXML, XHTML
Using an XML document
An XML parser is needed to “use”, i.e.
parse out, the data held in the XML document
XML Parsers
An XML parser does the following:
• Retrieves and reads an XML document, i.e.
“parses” the document to figure out what’s in it
• Ensures the document adheres to specific standards
(e.g. is it well formed? does it conform to its DTD?)
• Makes the document contents available to your
application
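The three steps above can be sketched in Python using the standard-library `xml.dom.minidom` parser. This is only an illustration: the `load_weather` helper and the two inline documents are assumptions made for the example. The standard-library parser is non-validating, so it checks well-formedness only, not conformance to a DTD.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

GOOD = b'<WEATHER><CITY NAME="Hong Kong"><HI>87</HI></CITY></WEATHER>'
BAD = b'<WEATHER><CITY NAME="Hong Kong"><HI>87</CITY></WEATHER>'  # <HI> is never closed


def load_weather(xml_bytes):
    """Hypothetical helper: return the parsed document, or None if it is not well formed."""
    try:
        # Steps 1 and 2: read the document and check that it is well formed.
        return minidom.parseString(xml_bytes)
    except ExpatError:
        return None


doc = load_weather(GOOD)
# Step 3: the document contents are now available to the application.
hi_text = doc.getElementsByTagName("HI")[0].firstChild.data
broken = load_weather(BAD)
print(hi_text, broken)
```

A validating parser would be needed to also enforce the DTD; that is outside what this sketch shows.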
XML Document parsers
• If your application is going to use XML
documents, you could write your own parser
• But it makes sense to use a pre-built parser
• E.g. Java provides an XML parser API that can
be used in any Java application that
processes XML documents
• Saves on development work
XML Document Parsers
• Hundreds of parsers available
• Most parsers are based on two main
interfaces:
– Tree based – Document Object Model
(DOM)
– Event based – Simple API for XML (SAX)
XML Parsers: Tree based DOM interface
• Uses Document Object Model (DOM)
• Tree based interface (navigates
through the document)
• Developed by W3C
• XML parsers that use DOM exist for
Java, JavaScript, Perl, C++
Tree based DOM parser - example
Object/Tree Interface (DOM)
Definition: The parser reads the XML
document and creates an in-memory
“tree” of data, i.e. an object model of the data
For example:
Given a sample XML document on the next
slide, what kind of tree would be
produced?
Tree based DOM parser - example
Sample XML Document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
<CITY NAME="Hong Kong">
<HI>87</HI>
<LOW>78</LOW>
</CITY>
</WEATHER>
Tree based DOM parser - example
Tree produced (element and text nodes only):
WEATHER
  CITY (NAME="Hong Kong")
    HI
      87
    LOW
      78
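As a rough illustration of the tree a DOM parser builds for the sample document, the following Python sketch uses the standard-library `xml.dom.minidom` module. The DOCTYPE line is left out here (an assumption for the example) so that no `Weather.dtd` file is needed, and the variable names are illustrative only.

```python
from xml.dom import minidom

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""

doc = minidom.parseString(SAMPLE)            # the whole tree is now held in memory
root = doc.documentElement                   # the <WEATHER> element
city = root.getElementsByTagName("CITY")[0]  # first <CITY> below the root
hi = city.getElementsByTagName("HI")[0]
low = city.getElementsByTagName("LOW")[0]

city_name = city.getAttribute("NAME")
hi_value = int(hi.firstChild.data)           # text node under <HI>
low_value = int(low.firstChild.data)         # text node under <LOW>

# The tree can be walked in any direction, e.g. back up from <HI> to <CITY>.
parent_tag = hi.parentNode.tagName
print(root.tagName, city_name, hi_value, low_value, parent_tag)
```

Because the full tree stays in memory, the program can revisit any node at any time, which is the main convenience of the tree based interface.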
XML Parsers: Event based SAX parser
• Simple API for XML
• Event based
• Developed by volunteers on the XML-DEV mailing list
• http://www.megginson.com/SAX/
Event based SAX parser
Event Based Parser
Definition: The parser reads the XML
document and generates an event for
each item it encounters (e.g. the start of an
element, character data, the end of an element).
Event based parsers don’t create an in-memory object model of the
document; it’s up to the programmer to write the
code to interpret the events
For example:
Given the same XML document, what kind
of events would be produced?
Event based SAX parser: example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WEATHER SYSTEM "Weather.dtd">
<WEATHER>
<CITY NAME="Hong Kong">
<HI>87</HI>
<LOW>78</LOW>
</CITY>
</WEATHER>
Event based SAX parser: example
Events generated:
1. Start of <WEATHER> Element
2. Start of <CITY> Element
3. Start of <HI> Element
4. Character Event: 87
5. End of </HI> Element
6. Start of <LOW> Element
7. Character Event: 78
8. End of </LOW> Element
9. End of </CITY> Element
10. End of </WEATHER> Element
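The event sequence listed above can be reproduced with a small Python sketch using the standard-library `xml.sax` module. The `EventLogger` class is an illustrative name, not part of SAX. Two assumptions are made: the DOCTYPE line is omitted so no DTD file is needed, and whitespace-only character data between elements (which a real parser also reports) is dropped so the list matches the ten events shown.

```python
import xml.sax

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""


class EventLogger(xml.sax.ContentHandler):
    """Illustrative handler that records each parsing event as a line of text."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # character data may arrive in several pieces

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:  # skip whitespace-only runs between elements
            self.events.append(f"Character event: {text}")

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(f"Start of <{name}> element")

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(f"End of </{name}> element")


logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for number, event in enumerate(logger.events, start=1):
    print(f"{number}. {event}")
```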
Event based parsers
For each of these events, your
application implements “event
handlers.”
Each time an event occurs, the
corresponding event handler is called.
Your application intercepts these
events, and handles them in any way
you want.
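A minimal sketch of application-specific event handlers, again in Python with the standard-library `xml.sax` module. The `WeatherHandler` class and its `readings` dictionary are hypothetical names chosen for this example; the handlers simply pick out the values the application cares about and ignore everything else. The DOCTYPE line is omitted so no DTD file is needed.

```python
import xml.sax

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<WEATHER>
  <CITY NAME="Hong Kong">
    <HI>87</HI>
    <LOW>78</LOW>
  </CITY>
</WEATHER>"""


class WeatherHandler(xml.sax.ContentHandler):
    """Hypothetical handlers that collect HI/LOW readings per city."""

    def __init__(self):
        super().__init__()
        self.readings = {}   # city name -> {"HI": int, "LOW": int}
        self._city = None    # name of the <CITY> currently being parsed
        self._text = []      # character data for the current element

    def startElement(self, name, attrs):
        self._text = []
        if name == "CITY":
            self._city = attrs.getValue("NAME")
            self.readings[self._city] = {}

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name in ("HI", "LOW") and self._city is not None:
            self.readings[self._city][name] = int("".join(self._text))
        elif name == "CITY":
            self._city = None
        self._text = []


handler = WeatherHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.readings)
```

Note that the handler has to track its own state (the current city, the text seen so far), since the parser keeps no document model for it.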
Comparing tree based DOM parser with event based SAX parser
Questions:
• Which parser is faster?
• Which parser is more efficient?
• Which parser is suitable for which type of XML
documents?
Comparing tree based DOM parser with event based SAX parser
Tree based:
• Slower
• Takes up more memory
• Simpler to use
• More suitable for documents that are less structured,
with less repetition of tags
• More suitable where the program needs to move around
the document a lot, and so needs easy access to the
full document at all times
Event based:
• Faster
• Takes up much less memory
• But more complex to implement
• Good for large, machine-generated, structured documents,
e.g. book contents (the repetitive nature of the tags allows
re-use of event handling code, and therefore less work
for the programmer)
• Good where only parts of the document are needed at any
one time (event based parsers cannot “skip around” from
one part of the document to another)
Comparing tree based DOM parser with event based SAX parser
Performance and Memory
Therefore, when high performance and
low memory use are the most important
criteria, use an event-based parser.
Examples:
• Java applets
• Palm Pilot Applications
• Parsing Huge Data files
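To illustrate the huge-data-file case, here is a hedged Python sketch that feeds a document to the standard-library SAX parser piece by piece, so the full document is never held in memory. The `HiTracker` class and the `generate_chunks` function are hypothetical; the generator merely stands in for reading a very large file in blocks.

```python
import xml.sax


class HiTracker(xml.sax.ContentHandler):
    """Keeps only a city count and the highest HI value seen so far."""

    def __init__(self):
        super().__init__()
        self.cities = 0
        self.max_hi = None
        self._in_hi = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "CITY":
            self.cities += 1
        elif name == "HI":
            self._in_hi = True
            self._text = []

    def characters(self, content):
        if self._in_hi:
            self._text.append(content)

    def endElement(self, name):
        if name == "HI":
            value = int("".join(self._text))
            if self.max_hi is None or value > self.max_hi:
                self.max_hi = value
            self._in_hi = False


def generate_chunks(n_cities):
    """Hypothetical stand-in for a huge file: yields the document in pieces."""
    yield b"<WEATHER>"
    for i in range(n_cities):
        city = f'<CITY NAME="City{i}"><HI>{60 + i % 40}</HI><LOW>50</LOW></CITY>'
        yield city.encode("ascii")
    yield b"</WEATHER>"


parser = xml.sax.make_parser()
tracker = HiTracker()
parser.setContentHandler(tracker)
for chunk in generate_chunks(10000):
    parser.feed(chunk)   # each piece is parsed and can then be discarded
parser.close()
print(tracker.cities, tracker.max_hi)
```

The handler's state stays a few values in size however many cities the input contains, whereas a tree based parser would build a node for every element.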
Storing XML documents
• Can use XML for data storage – e.g. to
store news headlines, business
documents
• Q: How to store XML documents in a
database?
Storing XML documents
• Choices:
• Keep as XML files (filename.xml)
• Put into a relational database and
convert to/from XML format
• Use a native XML database
Storing XML documents
• Keep as XML files (filename.xml)
– Fast for a small number of users
– Eliminates the overhead of database connections
– Large number of users leads to concurrency issues
– Poor for high-volume read/write
– Security/visibility
Storing XML documents
• Put into a relational database and convert
to/from XML format as needed
– Provides “ACID” support to ensure
integrity of access to the data
– Assumes the data can be made “tabular” in
format (usually the case for data used for transport)
– Poor for data that is not easily
transformed into table-based structures, e.g.
word processor documents
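A minimal sketch of the relational option, assuming an in-memory SQLite database and a hypothetical `weather` table whose columns mirror the weather sample used earlier. It converts the XML into rows and then rebuilds XML from the rows using only the Python standard library.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = '<WEATHER><CITY NAME="Hong Kong"><HI>87</HI><LOW>78</LOW></CITY></WEATHER>'

conn = sqlite3.connect(":memory:")
# Hypothetical table layout: one row per <CITY> element.
conn.execute("CREATE TABLE weather (city TEXT PRIMARY KEY, hi INTEGER, low INTEGER)")

# XML -> table rows
root = ET.fromstring(SAMPLE)
with conn:  # commits on success, rolls back if an error occurs
    for city in root.findall("CITY"):
        conn.execute(
            "INSERT INTO weather (city, hi, low) VALUES (?, ?, ?)",
            (city.get("NAME"), int(city.findtext("HI")), int(city.findtext("LOW"))),
        )

# Table rows -> XML
rows = conn.execute("SELECT city, hi, low FROM weather ORDER BY city").fetchall()
out = ET.Element("WEATHER")
for name, hi, low in rows:
    city = ET.SubElement(out, "CITY", NAME=name)
    ET.SubElement(city, "HI").text = str(hi)
    ET.SubElement(city, "LOW").text = str(low)
rebuilt = ET.tostring(out, encoding="unicode")
print(rows)
print(rebuilt)
```

The mapping works here because each city reduces cleanly to one row; a document with mixed text and nested markup would not fit a table this easily.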
Storing XML documents
• Store in a Native XML database
– Native XML databases are databases designed
especially to store XML documents
– A native XML database is one that treats XML
documents and elements as the fundamental structures,
rather than tables, records, and fields
– Good for XML documents that are for human
consumption, i.e. “content” (e.g. books, emails)
– Provides “ACID” support to ensure integrity of
access to the data
Storing XML documents
• Store in a Native XML database
(continued)
– Good when XML documents need to be returned
(but most applications need data returned in other
formats)
– Query languages are evolving (e.g. XQuery), but there is no
equivalent yet of SQL update/insert/delete
– New technology
(e.g. the open source database eXist)