Approaches To The Validation Of Dublin Core Metadata

Approaches To The Validation Of
Dublin Core Metadata Embedded
In (X)HTML Documents
Background
Dublin Core is the name given to a standard set
of core metadata elements used for resource
discovery.
Metadata has an important role to play in many
digital library applications, and the Dublin Core
standard has been widely adopted in them.
The Problem
Lack of compliance with standards is well-known
in Web applications, particularly with HTML.
Despite the availability of a range of HTML
validation tools, these do not appear to be
widely used and many Web authors appear to
check their documents simply by viewing in Web
browsers.
There is a danger that Dublin Core metadata
embedded in HTML will fail to comply with
standards, a possibility made more likely by
the lack of a visual display of Dublin Core
metadata.
A centre of expertise in digital information management
A Simple Approach To Validation
Use of DC-dot
DC-dot is a popular Web-based tool for creating
and managing Dublin Core metadata. DC-dot can
also be used to carry out simple validation of
Dublin Core embedded in HTML resources.
Limitations of DC-dot
DC-dot has several limitations:
• It only performs basic validation
• It was not designed primarily as a validation tool
• It cannot be easily extended (e.g. applied with other
application profiles)
Findings
Use of DC-dot across a digital library programme
showed that the entry points contained various
errors in the representation of Dublin Core:
• Use of DC.Author rather than DC.Creator
• Incorrect format of date field
• Incorrect use of delimiters
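The first two kinds of error listed above can be detected mechanically. The following is a minimal sketch in Python, not the DC-dot implementation: the function name find_problems, the sample page and the assumption that dates should follow the W3CDTF pattern (YYYY, YYYY-MM or YYYY-MM-DD) are illustrative only. Delimiter errors are not covered, since the poster does not say which delimiters were misused.

```python
import re
from html.parser import HTMLParser

# Assumption: dates are expected in W3CDTF form (YYYY, YYYY-MM or YYYY-MM-DD).
W3CDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


class DCMetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta name="DC.*"> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.upper().startswith("DC."):
            self.pairs.append((name, attr.get("content") or ""))


def find_problems(html):
    """Return a list of human-readable problems found in the DC metadata."""
    collector = DCMetaCollector()
    collector.feed(html)
    problems = []
    for name, content in collector.pairs:
        if name.lower() == "dc.author":
            problems.append("DC.Author used; DC.Creator expected")
        if name.lower() == "dc.date" and not W3CDTF_DATE.match(content):
            problems.append(f"DC.Date value {content!r} is not in W3CDTF form")
    return problems


# Hypothetical page reproducing two of the errors described above.
SAMPLE = """
<html><head>
<meta name="DC.Author" content="A. N. Other">
<meta name="DC.Date" content="12/03/2004">
<meta name="DC.Title" content="Project home page">
</head><body></body></html>
"""

for problem in find_problems(SAMPLE):
    print(problem)
```

A check of this kind only finds the errors it has been told about, which is the same limitation noted for DC-dot.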
Using An RDF Validator
Use of An RDF Validator
An alternative tested was to make use of W3C's online Dublin
Core to RDF XSLT transformation service and the RDF
validator. This approach chained together several online
services:
• Tidy to convert the project home page to XHTML format
• Dublin Core to RDF XSLT transformation service to
convert embedded Dublin Core elements to RDF format
• RDF validation service to validate the RDF format
Comments
This approach helped by providing a visual display of
the Dublin Core metadata.
It was noticed, for example, that one page contained an
invalid identifier: http:/www.foo.ac.uk/... rather
than http://www.foo.ac.uk/...
However, since the RDF validation service has no
understanding of the semantics of the Dublin Core
metadata, this approach has its limitations.
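The malformed identifier mentioned above was spotted by eye, but the same check can be automated. The sketch below is illustrative only and is not part of any of the services described: the function name has_http_authority is hypothetical, and it assumes that identifiers are expected to be http or https URIs with a host name.

```python
from urllib.parse import urlparse


def has_http_authority(value):
    """Return True if value is an http(s) URI that includes a host part."""
    parts = urlparse(value)
    # "http:/www.foo.ac.uk/" parses with an empty netloc, because the host
    # is only recognised after a double slash.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for identifier in ("http:/www.foo.ac.uk/", "http://www.foo.ac.uk/"):
    status = "ok" if has_http_authority(identifier) else "malformed"
    print(identifier, status)
```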
dcmeta: An XSLT Approach
Use of XSLT
We have pioneered use of XSLT to provide validation of
Dublin Core metadata embedded in HTML resources.
The XSLT approach:
• Creates a report on DC metadata embedded in an
XHTML document
• Is designed with knowledge of the Dublin Core
semantics: it checks the metadata against an application
profile of the DC Metadata Element Set
The profile is a set of rules which specify:
• Permitted DC properties (e.g. only the 15 core DC
elements are allowed)
• Minimum/maximum permitted occurrences of a
specified property (e.g. only one occurrence of
DC.Title permitted)
• Permitted encoding schemes (e.g. DC.Subject
properties should have the scheme "LCSH")
• Permitted values (e.g. DC.Publisher must have the
value "UKOLN")
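The four kinds of rule above can be pictured as a small data structure plus a checking routine. The sketch below is written in Python for brevity and is not the dcmeta XSLT stylesheet. The PROFILE layout, the function name check_against_profile and the representation of each metadata item as a (name, scheme, value) tuple are all assumptions made for illustration; the example rules are the ones given in the list above.

```python
from collections import Counter

# Hypothetical profile mirroring the four rule types described above.
PROFILE = {
    # Permitted properties: the 15 core DC elements.
    "permitted": {
        "DC.Title", "DC.Creator", "DC.Subject", "DC.Description",
        "DC.Publisher", "DC.Contributor", "DC.Date", "DC.Type",
        "DC.Format", "DC.Identifier", "DC.Source", "DC.Language",
        "DC.Relation", "DC.Coverage", "DC.Rights",
    },
    # (minimum, maximum) occurrences per property.
    "occurrences": {"DC.Title": (1, 1)},
    # Permitted encoding schemes per property.
    "schemes": {"DC.Subject": {"LCSH"}},
    # Permitted values per property.
    "values": {"DC.Publisher": {"UKOLN"}},
}


def check_against_profile(records, profile=PROFILE):
    """Check (name, scheme, value) tuples against the profile rules."""
    problems = []
    counts = Counter(name for name, _, _ in records)
    for name, scheme, value in records:
        if name not in profile["permitted"]:
            problems.append(f"{name}: property not permitted")
            continue
        allowed_schemes = profile["schemes"].get(name)
        if allowed_schemes is not None and scheme not in allowed_schemes:
            problems.append(f"{name}: scheme {scheme!r} not permitted")
        allowed_values = profile["values"].get(name)
        if allowed_values is not None and value not in allowed_values:
            problems.append(f"{name}: value {value!r} not permitted")
    for name, (low, high) in profile["occurrences"].items():
        if not low <= counts[name] <= high:
            problems.append(
                f"{name}: {counts[name]} occurrence(s), expected {low} to {high}"
            )
    return problems


# Hypothetical metadata that breaks each of the four rule types once.
records = [
    ("DC.Title", None, "Project home page"),
    ("DC.Title", None, "Home"),
    ("DC.Subject", None, "Metadata"),
    ("DC.Publisher", None, "Foo University"),
    ("DC.Author", None, "A. N. Other"),
]
for problem in check_against_profile(records):
    print(problem)
```

Keeping the rules as data rather than code is what allows a different application profile to be substituted without changing the checking routine.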
Conclusions
Summary
This poster summarises a number of approaches
to validating Dublin Core metadata embedded in
HTML resources. The poster also describes
initial work in the development of an XSLT-based
tool for validation.
Future Work
The XSLT stylesheet is available as open source,
and we invite interested parties to develop this
work further.
Areas in which the tool could be developed
include:
• Development of the Web interface to the tool
• Allowing local rules to be included
• Deploying the tool as a bookmarklet
• Deploying the tool as a "Web Service"
Contact Details
For further information please contact Pete
Johnston, UKOLN by sending email to
<[email protected]>
Implementation
The service is available at
<http://www.ukoln.ac.uk/metadata/dcmeta/>