SCHOOL OF LIBRARY, ARCHIVE AND INFORMATION STUDIES
LIS1510
Library and Archives Automation Issues
XML and extensible systems
Andy Dawson
School of Library, Archive & Information Studies, UCL
(University of Malta 2008)
What we will be covering today
• Shortcomings of HTML
• Generalised markup languages
• How XML works
• XML document types
• Other related extensible technologies
Limitations of (X)HTML
• Fixed tag set (specifications determined by the W3C)
• Intended for display of documents on the Web
• Doesn’t do everything everyone wants
• Not easy to use for other purposes
– searching in documents
– analysis of documents
Principles of Generalized Markup
• Descriptive markup – encodes features within a document
• Says what those features are, not what to do with them
• Need to define your own tags
• Creates machine-independent data
• Data can then be used for different purposes
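As a rough illustration of "data can then be used for different purposes", the sketch below marks up one article descriptively and then uses the same data for searching, analysis and display. The tag names, the sample text and the to_html helper are all invented for this example; Python's standard xml.etree.ElementTree module is just one convenient way to read XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup: the tags say what each feature is,
# not how it should be displayed.
ARTICLE = """
<article>
  <headline>Library catalogue goes online</headline>
  <author>A. Reporter</author>
  <paragraph>The new catalogue opened to the public today.</paragraph>
  <paragraph>Staff expect searching to become much quicker.</paragraph>
</article>
"""

article = ET.fromstring(ARTICLE)

# Purpose 1: searching - pull out specific features by name.
headline = article.findtext("headline")
author = article.findtext("author")

# Purpose 2: analysis - count the words in the body text only.
word_count = sum(len(p.text.split()) for p in article.findall("paragraph"))

# Purpose 3: display - convert the same data to simple HTML.
def to_html(art):
    parts = ["<h1>" + art.findtext("headline") + "</h1>",
             "<p><i>" + art.findtext("author") + "</i></p>"]
    parts += ["<p>" + p.text + "</p>" for p in art.findall("paragraph")]
    return "\n".join(parts)

html = to_html(article)
print(headline, "|", author, "|", word_count, "words")
print(html)
```

None of the three uses needed the markup to change; only the program reading it differs.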
SGML
• SGML – Standard Generalized Markup Language
– International standard (ISO 8879) in 1986
– Metalanguage (syntactic framework) for defining markup tags
– Parts of SGML are rather complex
– Used by large projects
– Not particularly easy to get started with
XML
• XML (Extensible Markup Language)
– Published as a World Wide Web Consortium (W3C) Recommendation in 1998
– Cut-down version (subset) of SGML
– Based on same principles
– Designed to be easy to implement on the Web
Advantages of XML
• Machine-independent plain text files (Unicode, not limited to ASCII)
• Potential longevity
• Multi-purpose use
• Ability to analyse/manipulate content
• BUT need to define tag set!
• Not a replacement for HTML unless analysis/manipulation of data is required
• However, XHTML has become a ‘reliable’ alternative option for simple web publishing
Defining Your Own Tags
• Need to undertake document analysis
– Identify key features in document
– Identify structure of document
– Choose names for tags
• Only then can we apply the tag scheme
Example of a Newspaper
Name of newspaper
Issue
Article
Headline
Author
Paragraphs
Pictures
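One possible tag scheme for the newspaper features listed above can be sketched as follows. The tag names, the nesting (issue inside newspaper, article inside issue), the attributes and all sample values are assumptions made for illustration; a real document analysis might well choose differently.

```python
import xml.etree.ElementTree as ET

# Tag names below are hypothetical choices for the features listed above.
newspaper = ET.Element("newspaper", {"name": "The Daily Example"})
issue = ET.SubElement(newspaper, "issue", {"date": "2008-01-01"})
article = ET.SubElement(issue, "article")
ET.SubElement(article, "headline").text = "XML explained"
ET.SubElement(article, "author").text = "A. Reporter"
ET.SubElement(article, "paragraph").text = "First paragraph of the story."
ET.SubElement(article, "paragraph").text = "Second paragraph of the story."
ET.SubElement(article, "picture", {"file": "photo1.jpg"})  # empty element

# Serialise the tree to XML text.
xml_text = ET.tostring(newspaper, encoding="unicode")
print(xml_text)
```

Whether the name of the newspaper is better held as an attribute (as here) or as an element of its own is exactly the kind of decision document analysis has to settle.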
Basics of XML Syntax
• Documents are composed of elements
• Start and end tags for every element - unlike HTML, end tags must be present
– also “Empty elements”, which have no content and can be written as a single tag (e.g. <br/>)
• Attributes
– modify an element
– have a name and a value
– Value must be enclosed in matching quotes (single or double)
– An element may have several attributes
• Documents can be “Well-formed” or “Valid”
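The syntax points above can be seen in a small fragment. This is only a sketch: the element and attribute names are made up, and Python's standard xml.etree.ElementTree module is used simply to show how a parser sees the pieces.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: every element has a start and end tag,
# except <picture/>, an empty element written as a single tag.
# Attribute values are quoted, with either double or single quotes.
SAMPLE = """<article id="a1" section='news'>
  <headline>XML explained</headline>
  <picture file="photo1.jpg"/>
</article>"""

article = ET.fromstring(SAMPLE)

print(article.tag)       # the element name
print(article.attrib)    # its attributes as name/value pairs
picture = article.find("picture")
print(picture.text, len(picture))  # empty element: no text and no children
```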
Well-formed Documents
• Well-formed documents follow XML syntax, i.e.
– start and end tags for every element
– attribute values in quotes
– properly nested structure within a single root element
• But they have no pre-defined structure!
• Therefore:
– Can only check the syntax
– Cannot validate the structure of documents that are only well-formed
• Prepares documents for potential use/conversion
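A well-formedness check can be sketched with any XML parser, since a parser must reject text that breaks the syntax rules. The is_well_formed helper and the sample strings below are invented for this example; the last sample shows that a syntax check has nothing to say about whether the structure is sensible.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, i.e. it obeys the syntax rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<article id="a1"><headline>News</headline></article>'
missing_end_tag = '<article id="a1"><headline>News</article>'
unquoted_attribute = '<article id=a1><headline>News</headline></article>'
bad_nesting = '<article><b><i>News</b></i></article>'
# Well-formed, although the structure makes little sense:
# syntax checking alone cannot tell.
odd_structure = '<headline><article>News</article></headline>'

for sample in (good, missing_end_tag, unquoted_attribute, bad_nesting, odd_structure):
    print(is_well_formed(sample), sample)
```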
Valid Documents
• A Valid XML document is well-formed, contains (or refers to) a Document Type Definition (DTD) and conforms to it
• The DTD is a specification of the document structure, identifying
– which elements are allowed
– where they are allowed
– which attributes they may take
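To give a feel for what a DTD looks like, the sketch below holds a document with a small internal DTD in a Python string. Python's standard xml.etree.ElementTree module reads past the DTD but does not validate against it, so the RULES table and the check function are a hand-written, much simplified stand-in for a validator. They are not part of any library, and they ignore element order and repetition, which a real DTD enforces. All names and sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD (the part between [ and ]).
DOCUMENT = """<!DOCTYPE article [
  <!ELEMENT article (headline, author?, paragraph+)>
  <!ELEMENT headline (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST article id CDATA #REQUIRED>
]>
<article id="a1">
  <headline>XML explained</headline>
  <paragraph>Only one paragraph here.</paragraph>
</article>"""

# ElementTree checks well-formedness only; it does NOT validate.
article = ET.fromstring(DOCUMENT)

# Simplified stand-in for the DTD: which elements are allowed, where
# they are allowed, and which attributes they may take.
RULES = {
    "article": {"children": {"headline", "author", "paragraph"},
                "attributes": {"id"}},
    "headline": {"children": set(), "attributes": set()},
    "author": {"children": set(), "attributes": set()},
    "paragraph": {"children": set(), "attributes": set()},
}

def check(element, rules=RULES):
    """Return a list of problems found in element and its descendants."""
    if element.tag not in rules:
        return ["element not allowed: " + element.tag]
    rule = rules[element.tag]
    problems = ["attribute not allowed: " + name + " on " + element.tag
                for name in element.attrib if name not in rule["attributes"]]
    for child in element:
        if child.tag not in rule["children"]:
            problems.append(child.tag + " not allowed inside " + element.tag)
        else:
            problems.extend(check(child, rules))
    return problems

invalid = ET.fromstring("<article id='a2'><footnote>?</footnote></article>")
print(check(article))
print(check(invalid))
```

For real validation against a DTD, a validating parser is needed; the standard-library module used here is not one.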
Related technologies
• CSS – Cascading Style Sheets
– As used with HTML
– Concentrate only on appearance
• XHTML
– Version of HTML conformant with XML syntax
• XSL – eXtensible Stylesheet Language
– XML language for style sheets
– Controls the appearance of the elements within the document and defines templates for processing elements
• XML Schemas
– Another way of defining document structure, itself written in XML
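What "conformant with XML syntax" means for XHTML can be hinted at with two tiny fragments. They are illustrative snippets, not complete XHTML documents (a real one also needs a root html element and the XHTML namespace), and the parses_as_xml helper is made up for this example.

```python
import xml.etree.ElementTree as ET

# HTML habit: the <br> tag is never closed.
html_style = "<p>First line<br>Second line</p>"
# XHTML: <br /> is written as an empty element, so XML syntax is obeyed.
xhtml_style = "<p>First line<br />Second line</p>"

def parses_as_xml(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(html_style), parses_as_xml(xhtml_style))
```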
That’s all folks…
• Any questions?
• Optional XML exercise is available… anyone?
• Otherwise – carry on with your coursework
• Next Tuesday: Website management and last chance to finish off your website!
…and have a nice weekend