New Developments For Managing Information on the Web

Brian Kelly
UKOLN
University of Bath
Bath, BA2 7AY
URL: http://www.ukoln.ac.uk/
email: [email protected]
UKOLN is funded by the British Library Research and Innovation Centre,
the Joint Information Systems Committee of the Higher Education Funding
Councils, as well as by project funding from the JISC’s Electronic Libraries
Programme and the European Union.
UKOLN also receives support from the University of Bath where it is based.
Contents
Introduction
The Problems Facing
Information Providers
Some Solutions
• Protocol Developments
• Application Solutions
• Other Solutions
Conclusions
Seminar Aims
• To briefly review new protocol developments
• To describe significant developments to the Web architecture
• To ensure information providers are aware of the implications of current approaches to website management and to flag possible new approaches
UK Web Focus
UK Web Focus:
• JISC funded post based at UKOLN (Bath Univ)
• Advises UK HE community on web issues
• Represents JISC on W3C
• Organiser of 2 national Web Management Workshops (July 1997, Sept 98)
W3C (World Wide Web Consortium):
• International consortium which coordinates development of web protocols
• Four domains:
– Architecture
– User Interface
– Technology & Society
– Web Accessibility
What Are The Problems?
What problems do you face?
You Are Not Alone
"What are the main problems facing web
managers" asked on website-info-mgt list
on 17 Feb 99
• 50 replies sent in 6 days between 17-22 Feb
• 22 contributors to thread
• Many long, well thought-out replies
• See <URL: http://www.mailbase.ac.uk/lists/website-info-mgt/1999-02/index.html>
Problem Areas
Key problem areas (diagram labels):
• Legal issues
• Technical issues
• Management issues
• Keeping informed
• Browser compatibility
• Authentication
• Resourcing
• Other web sites: What are they doing? Why don't we do that?
• Indexing (multiple servers)
• Content management
• Document management systems
• Workflow
• Design
• Navigation
• Management support
• Role(s) of web editors
• Lack of a "web strategy"
Protocol Developments
Protocol development work in progress to:
• Address web problem areas e.g. performance
• Provide new functionality e.g. e-commerce
See list of Technical Reports at <URL: http://www.w3.org/TR/>
Recommendations
• Resource Description Framework (RDF) Model and Syntax Specification
• WebCGM Profile
• Namespaces in XML
• Document Object Model (DOM) Level 1
• Synchronized Multimedia Integration Language (SMIL) 1.0 Specification
Working Drafts
• XHTML™ 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4.0 in XML
Protocol Developments: HTTP
HTTP
• HTTP/1.1: Performance benefits e.g. better support for caching
• HTTP/NG: Radical redesign to enable applications (e.g. legacy applications, new applications such as distributed searching) to be easily integrated with the web
Solutions will emerge as server and client support becomes available
Proxies (e.g. web caches) may provide initial support
NOTE: Newer browsers support HTTP/1.1
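As a rough illustration of the caching support mentioned above, the sketch below decides whether a cached copy may still be served, based on the HTTP/1.1 Cache-Control header. The function names and sample header values are assumptions made for this example. A real cache also takes Expires, Age, validators and other directives into account.

```python
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """Treat a cached copy as fresh while its age is below max-age.

    Responses marked no-cache or no-store are never served without
    going back to the origin server (a simplification for this sketch).
    """
    names = {d.strip().split("=")[0].lower() for d in cache_control.split(",")}
    if "no-cache" in names or "no-store" in names:
        return False
    limit = max_age_seconds(cache_control)
    return limit is not None and age_seconds < limit
```

For example, a response sent with "public, max-age=3600" could be served from a proxy cache for up to an hour without contacting the origin server.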
Protocol Developments: Addressing
URLs:
• Break
• Limited (provide location, not unique name)
URNs (Uniform Resource Names):
• Assigning unique identifiers is solvable (cf. National Insurance number, telephone no.)
• Resolution is very difficult
Solutions:
• DOIs, PURLs, ... - business use?
• "URLs don't break - people break them". Think about URL persistency and naming guidelines
• Read Jakob Nielsen's Alertbox column
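The idea behind PURL-style persistent names can be sketched as a lookup table that maps a stable name to the resource's current location and answers with a redirect. This is only an illustration: the registry contents, the example.org URLs and the function names are hypothetical, not part of any real PURL or DOI service.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: persistent name -> current location.
REGISTRY: Dict[str, str] = {
    "/reports/annual": "http://www.example.org/docs/1999/annual-report.html",
}


def resolve(name: str, registry: Dict[str, str]) -> Optional[Tuple[int, str]]:
    """Return an HTTP redirect (status, location) for a persistent name."""
    location = registry.get(name)
    if location is None:
        return None  # unknown name: the caller would answer 404
    return (302, location)


def move(name: str, new_location: str, registry: Dict[str, str]) -> None:
    """Record that a resource has moved; the persistent name stays the same."""
    registry[name] = new_location
```

When a document is reorganised on the server, only the registry entry changes. Links that cite the persistent name keep working.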
Data Formats
HTML 4.0 used in conjunction with CSS 2.0 (Cascading Style Sheets) and the DOM provides an architecturally pure, yet functionally rich environment
HTML 4.0 - W3C-Rec
• Improved forms
• Hooks for stylesheets
• Hooks for scripting languages
• Table enhancements
• Better printing
CSS 2.0 - W3C-Rec
• Support for all HTML formatting
• Positioning of HTML elements
• Multiple media support
DOM - W3C-Rec
• Document Object Model
• Hooks for scripting languages
• Permits changes to HTML & CSS properties and content
CSS Problems
• Changes during CSS development
• Netscape & IE incompatibilities
• Continued use of browsers with known bugs
• Microsoft patent
Why You Need Stylesheets
Stylesheets:
• Key part of W3C's architectural view of Web
• Separation of structure and appearance reduces maintenance workload
• Designed to support accessibility
• Should be deployed now (today's nicely designed website is tomorrow's maintenance nightmare)!
But:
• Browser compatibility problems
• Not insurmountable
Recommendation:
• Use stylesheets now - don't continue to create tomorrow's maintenance problems
XML
HTML 4.0 / CSS 2.0 have limitations:
• Difficulties in introducing new elements
– Time-consuming standardisation process (<ABBREV>)
– Dictated by browser vendor (<BLINK>, <MARQUEE>)
• Area may be inappropriate for standardisation:
– Covers specialist area (maths, music, ...)
– Application-specific (<STUD-NUM>)
• HTML is a display (output) format
• HTML's lack of arbitrary structure limits functionality:
– Find all memos copied to John Smith
– How many unique tracks on Jackson Browne CDs
XML
XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Addresses HTML's lack of evolvability
• Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc)
• Agreement achieved quickly - XML 1.0 became W3C Recommendation in Feb 1998
• Support from industry (SGML vendors, Microsoft, etc.)
• Support in Netscape 5 and IE 5
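To show what arbitrary elements make possible, here is a minimal sketch of the "find all memos copied to John Smith" query from the earlier slide. The memo vocabulary (<memos>, <memo>, <to>, <cc>, <subject>) and the sample data are invented for this example. The parser is Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo collection using application-specific elements.
MEMOS = """<memos>
  <memo id="m1"><to>Ann Lee</to><cc>John Smith</cc><subject>Budget</subject></memo>
  <memo id="m2"><to>John Smith</to><subject>Web strategy</subject></memo>
  <memo id="m3"><to>Raj Patel</to><cc>Ann Lee</cc><cc>John Smith</cc><subject>Caching</subject></memo>
</memos>"""


def memos_copied_to(xml_text: str, name: str) -> list:
    """Return the ids of memos that list `name` in a <cc> element."""
    root = ET.fromstring(xml_text)
    return [
        memo.get("id")
        for memo in root.iter("memo")
        if any(cc.text == name for cc in memo.findall("cc"))
    ]
```

The same question cannot be answered reliably from HTML, where the distinction between "to" and "cc" exists only in the visual layout.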
XML Support
XML support:
• Can be provided at backend
• (Partial) XML support in IE 5
• Also in Netscape 5?
[Figure: XML document with no style sheet - XML tree displayed]
[Figure: XML document with style sheet]
XLink and XPointer
XLink will provide sophisticated hyperlinking missing in HTML:
• Links that lead user to multiple destinations
• Bidirectional links
• Links with special behaviours:
– Expand-in-place / Replace / Create new window
– Link on load / Link on user action
• Link databases
[Figure labels: England, France]
<commentary xml:link="extended" inline="false">
  <locator href="smith2.1" role="Essay"/>
  <locator href="jones1.4" role="Rebuttal"/>
  <locator href="robin3.2" role="Comparison"/>
</commentary>
XPointer will provide access to arbitrary portions of XML resource
XSL stylesheet language will provide extensibility and transformation facilities (e.g. create a table of contents)
XML and HTML
HTML 4.0 is being expressed in XML - XHTML
Issues:
• Documents must be well-formed
• Tags in lowercase
• Quote attributes: <img src="foo" height="10" />
• <li>End tags required</li>
• Empty elements: <img src="foo" /> <br />
• Tidy utility
• See <URL: http://www.w3.org/TR/WD-html-in-xml/>
Note:
Time to produce XHTML documents?
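Well-formedness, the first requirement listed above, can be checked with any XML parser. The following sketch uses Python's standard xml.etree.ElementTree. The function name is an assumption, and the check covers well-formedness only, not lowercase tag names or validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True
```

Markup such as <ul><li>item</ul> (missing end tag) or <img src=foo> (unquoted attribute, unclosed empty element) is rejected, which is the kind of problem the Tidy utility is meant to repair.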
Metadata
Metadata - the missing architectural component from the initial implementation of the web
• Addressing: URL
• Transport: HTTP
• Data format: HTML
• Metadata: RDF
Metadata Needs:
• Resource discovery
• Content filtering
• Authentication
• Improved navigation
• Multiple format support
• Rights management
Resource Discovery
"AltaVista" Metadata
<META NAME="Description" CONTENT="…">
Dublin Core
• Need for common vocabulary for finding resources
• DC.Creator, DC.Date, etc.
• Need for structured metadata & naming schemes led to RDF
RDF
• Addresses multiplicity of incompatible metadata standards
• Provides metadata framework
<rdf:RDF
  xmlns:rdf="http://www.w3.org/TR/.."
  xmlns:dc="http://purl.org/dc/..">
  <rdf:Description RDF:HREF="page.html">
    <dc:Creator>John Smith</dc:Creator>
    <dc:Title>John’s Home Page</dc:Title>
  </rdf:Description>
</rdf:RDF>
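Embedding Dublin Core in HTML pages is usually done with META elements named DC.Creator, DC.Title and so on. The sketch below generates such tags from a dictionary of fields. The function name and the sample values are assumptions for illustration. A tool such as DC-dot performs a fuller version of this task.

```python
from html import escape


def dc_meta_tags(fields: dict) -> str:
    """Build one <META NAME="DC.x" CONTENT="..."> line per metadata field."""
    return "\n".join(
        '<META NAME="DC.{}" CONTENT="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in fields.items()
    )


# Example: print tags ready to paste into the <HEAD> of a page.
print(dc_meta_tags({"Creator": "John Smith", "Date": "1999-03-01"}))
```

Escaping the values matters because titles and descriptions often contain quotation marks or ampersands, which would otherwise break the attribute.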
Deployment Issues:
• Need for DC-aware services
• Little point in developing services until DC metadata available
• Coordinated actions (e.g. Queensland) are addressing such difficulties
• Dangers of metadata management
Metadata Today
Resource Discovery
• Use "AltaVista" and DC metadata in key pages
• Use metadata management tools such as DC-dot (http://www.ukoln.ac.uk/metadata/dcdot/)
Administration
• Who owns a page?
• When should the page be reviewed / deleted?
• Until standards are agreed, organisational guidelines are needed for machine-understandable administrative metadata
Recommendation:
Introduce local guidelines and tools based on them (cf. ht://dig)
Exploit national indexing work?
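A local guideline for administrative metadata might, for instance, require every page to carry an owner and a review date in META elements. The sketch below reads such tags and flags pages that are due for review. The tag names "Owner" and "Review-Date" and the ISO date format are hypothetical conventions chosen for this example, not an agreed standard.

```python
from datetime import date
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def admin_metadata(html_text: str) -> dict:
    """Return the page's META name/content pairs, keyed by lowercase name."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.meta


def needs_review(html_text: str, today: date) -> bool:
    """Flag pages whose review date has passed or is missing altogether."""
    review = admin_metadata(html_text).get("review-date")
    if review is None:
        return True
    return date.fromisoformat(review) <= today
```

Run over a whole site, a check like this gives web editors a list of pages with no recorded owner or an overdue review.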
Enough of The Protocols
Experienced web authors:
• Appreciate new web developments will help them
• Realise need for document management systems
• Know that file-based view of management causes
maintenance problems
• Have hit limitations of working directly with
operating system (permissions, naming, FTP, etc.)
But:
• Are uncertain how to progress
• Face political and resourcing problems which
technology won't solve
Organisational Culture
What is Your Organisational Culture?
Unix:
• Hacking
• Open source
• Toolkit
• Programming
• Extensible
• Recurrent expenditure
• Hidden costs
• Sandals
MS Windows:
• Shrink-wrapped
• Proprietary
• Closed
• Capital expenditure
• Open costs
• Suits
What approach would you take to:
• Indexing tools
• Link checkers
• Logging
Site Analysis of UKC
Analysis of UKC web site carried out using MS SiteServer tool
http://www.ukoln.ac.uk/web-focus/events/seminars/kent-mar1999/analysis/ukc-full.ac.ukbypage.html
Integration of Web Services
Web based services for information providers (and end users) can be integrated with a web browser for:
• Analysing pages (size, HTML, links)
• Checking accessibility
• Spell-checking, and even translation!
Note importance of APIs for web-based services
See article at http://www.ariadne.ac.uk/issue19/web-focus/
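The kind of page analysis listed above can be sketched with Python's standard html.parser module: report the page size, the links it contains, and images lacking ALT text (a basic accessibility check). The class and function names and the shape of the report are assumptions for this example, not the interface of any particular web-based service.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect link targets and images that have no ALT text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images_without_alt = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and not attributes.get("alt"):
            self.images_without_alt.append(attributes.get("src"))


def analyse_page(html_text: str) -> dict:
    """Return size, links and missing-ALT images for one HTML page."""
    scanner = PageScanner()
    scanner.feed(html_text)
    return {
        "size_bytes": len(html_text.encode("utf-8")),
        "links": scanner.links,
        "images_without_alt": scanner.images_without_alt,
    }
```

Exposing a function like this behind a simple URL-based interface is what allows it to be called from a browser button or from another service.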
Browser Issues
Browser Version
• Old: Mature, supported, small size
• New: Possibly buggy, large?, supports new standards, installation costs
Manufacturer
• Netscape: Widely used, was v. proprietary.
• Microsoft: Bill Gates, DoJ, better support for standards
Support
Need for browser administration software
• IEAK: See http://www.microsoft.com/windows/ieak/en/default.asp
• Netscape: See http://home.netscape.com/communicator/cck/v4.5/
Deploying New Technologies
More sophisticated deployment techniques can be adopted to overcome deficiencies in simple model
Simple model: HTML resource -> Web server -> browser
• Web server simply sends file to client
• File contains redundant information (for old browsers) plus client interrogation support
More sophisticated model: HTML / XML / database resource -> Intelligent Web server -> intermediaries (server proxy, client proxy) -> browser
Intermediaries can provide functionality not available at client:
• DOI support
• XML support / format conversion
• Authentication
[Figure: Example of an intermediary]
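Format conversion by an intermediary can be sketched as follows: if the client's Accept header says it can handle XML, the resource is passed through unchanged; otherwise it is converted to simple HTML. The function names, the use of the Accept header for the decision, and the trivial conversion rule are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_to_html(xml_text: str) -> str:
    """Render a flat XML record as an HTML list (a deliberately trivial conversion)."""
    root = ET.fromstring(xml_text)
    items = "".join(
        "<li>{}: {}</li>".format(escape(child.tag), escape(child.text or ""))
        for child in root
    )
    return "<html><body><h1>{}</h1><ul>{}</ul></body></html>".format(
        escape(root.tag), items
    )


def respond(xml_text: str, accept_header: str) -> tuple:
    """Return (content type, body) suited to what the client says it accepts."""
    accepted = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if "text/xml" in accepted or "application/xml" in accepted:
        return ("text/xml", xml_text)
    return ("text/html", xml_to_html(xml_text))
```

Because the decision is made at the proxy, older browsers keep working while the stored resource stays in its richer format.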
The National Picture
University website managers need to be aware of national developments
[Diagram labels:]
• JISC
• Datasets
• Subject gateways
• Mirroring Service
• Caching Service
• RDNC (Resource Discovery Network Centre)
• DNER (Distributed National Electronic Resource)
• JTAP
• Teaching & Learning
• Universities
• Local projects
• User-driven initiative
• Others
Technological Approaches
Key technological approaches:
• Think big:
– Management of web site, not HTML pages
– Commercial document management systems or
– Open source approach (PHP / Apache / …)
– National initiatives
• Backend databases
• XML and CSS / XSL
• Reusable software (API and web interface)
• Management policy for upgrading browsers
• Proxies (intermediaries)
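The backend database approach can be illustrated with a small sketch in which page content lives in a database and HTML is generated on request, so the site is managed as data, not as individual files. The table layout, sample rows and function names are hypothetical. SQLite (via Python's standard sqlite3 module) stands in for whatever database a site actually uses.

```python
import sqlite3
from html import escape


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding page content (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            ("/about", "About Us", "Who we are & what we do"),
            ("/contact", "Contact", "Email the web editor"),
        ],
    )
    return conn


def render_page(conn: sqlite3.Connection, path: str):
    """Generate the HTML for one page, or None if the path is unknown."""
    row = conn.execute(
        "SELECT title, body FROM pages WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        return None
    title, body = row
    return (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>"
    ).format(t=escape(title), b=escape(body))
```

A change to the page template is then made once in render_page, without editing every HTML file on the server.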
Other Issues
Other issues important for providing quality institutional web services include (to be addressed elsewhere):
• Need for a web strategy (What is your web service for?)
• Management model
• Adequate resourcing
• Guidelines for information providers
• Web site engineering (see <URL: http://www.webnetjrl.com/column1-2.htm>)
• Tensions between short-term fixes of today's problems and longer-term solutions