Dublin Core Application Profile Guidelines – Draft CWA


Dublin Core and Emerging Conventions for a Semantic Web
Thomas Baker
Fraunhofer-Gesellschaft, Bonn
ELPUB 2003, Guimaraes, Portugal
26 June 2003
A particular set of metadata terms
• Dublin Core as a simple and semantically generic
lingua franca
– Fifteen “core” elements: Subject, Description, Title…
– A metadata "pidgin" for "digital tourists" on a culturally
diverse global Web
– Limited grammar, easy to learn and use
– Enough "as is" for many needs
– 33 "element refinements" and 17 "encoding schemes"
to qualify the elements for specialized purposes
– A small set of 12 resource types for use with dc:type
A simple data model
(resource with properties)
• 1996-1998: Collective realization that machine-processability requires a coherent data model
• 1996: “Warwick Framework” proposed at DC-2 workshop:
DC as one specialized module (“resource discovery”)
• 1997: “Qualifiers” proposed for specifying meanings
– Some early adopters took this to unintended extremes:
“DC.Creator.telephone-number”
• 1998: DCMI involvement in emerging Resource
Description Framework, clarification of simple data model
• 2000: First set of qualifiers approved
A typology of metadata terms
("grammar")
• Elements
– (core) properties of resources
• Element Refinements
– properties that semantically refine elements
• Encoding Schemes
– give context to a metadata value
• Vocabulary Terms
– constitute controlled lists of possible values
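The four kinds of term can be illustrated with a small Python sketch. It assumes that a statement is a property URI, a string value and an optional encoding-scheme URI. The refinement table is an illustrative subset, and the values are examples. Reducing a refined statement to the element it refines, so that a Simple DC application can still use it, is shown here as one assumed way a consumer might process qualified metadata. It is not quoted from a DCMI specification.

```python
from dataclasses import dataclass
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"

# Illustrative subset: element refinement -> the element it refines.
REFINES = {
    DCTERMS + "created": DC + "date",
    DCTERMS + "modified": DC + "date",
    DCTERMS + "alternative": DC + "title",
    DCTERMS + "abstract": DC + "description",
    DCTERMS + "isPartOf": DC + "relation",
}


@dataclass(frozen=True)
class Statement:
    prop: str                     # element or element refinement (URI)
    value: str                    # a string, or the URI of a vocabulary term
    scheme: Optional[str] = None  # encoding scheme giving the value context


def to_simple_dc(stmt: Statement) -> Statement:
    """Replace a refinement by the element it refines and drop the scheme."""
    return Statement(REFINES.get(stmt.prop, stmt.prop), stmt.value)


record = [
    Statement(DC + "title", "Dublin Core and Emerging Conventions for a Semantic Web"),
    Statement(DCTERMS + "created", "2003-06-26", DCTERMS + "W3CDTF"),  # refinement + scheme
    Statement(DC + "subject", "Metadata", DCTERMS + "LCSH"),           # element + scheme
    Statement(DC + "type", DCMITYPE + "Text", DCTERMS + "DCMIType"),   # vocabulary term
]
simple = [to_simple_dc(s) for s in record]
```

Each kind of term plays a distinct role here. The refinement `created` becomes plain `dc:date`, and the encoding schemes are dropped. The DCMI Type term survives as the value of `dc:type`.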
An emergent approach to
"structured values"
• Implementers sometimes "shoehorn" complex sets
of information into a single value
– Creator: "name=Tom, affiliation=FHG, shoesize=47"
• In practice, a large variety of "structured values"
– Labelled strings
– Unlabelled strings
– Marked-up strings (e.g., LaTeX, HTML)
– Secondary resource descriptions (as above)
– Post-processing ad-hoc constructs is messy and does not scale
• Andy Powell's model:
– Elements can have string values (Simple DC)
– A further requirement to point to linked metadata?
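The difference between shoehorning and Andy Powell's model can be sketched in Python. The `k=v, k=v` convention, the `Description` class and the property names below are hypothetical. An ad-hoc parser silently depends on one string convention. A value that is either a plain string or a linked description needs no parsing at all.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


def parse_labelled(value: str) -> Dict[str, str]:
    """Ad-hoc parser for one labelled-string convention ("k=v, k=v").

    Pieces without "=" are silently dropped, so unlabelled strings and
    values that themselves contain commas lose information.
    """
    pairs: Dict[str, str] = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if sep:
            pairs[key.strip()] = val.strip()
    return pairs


@dataclass
class Description:
    """A secondary resource description: properties of a linked resource."""
    properties: Dict[str, str] = field(default_factory=dict)


# Powell-style value: a plain string (Simple DC) or a pointer to linked metadata.
Value = Union[str, Description]


def display_name(value: Value) -> str:
    """Show a creator value whichever form it takes."""
    if isinstance(value, Description):
        return value.properties.get("name", "")
    return value


shoehorned = "name=Tom, affiliation=FHG, shoesize=47"
structured = Description({"name": "Tom", "affiliation": "FHG"})
```

With this parser, `parse_labelled("name=Baker, Tom")` returns `{"name": "Baker"}`. The given name is lost without any error, which is the kind of post-processing problem the slide describes.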
A process for community standardization
• 1995-1999: open workshops, unruly but
stimulating meetings of minds, rough consensus
• 2000: qualifier vote: circa 25 voting members of
an ad-hoc "Usage Committee"
• 2001: smaller Usage Board
– Codification of formal process for editorial control
– Two two-day face-to-face meetings per year
– Mandate and responsibility to maintain standard,
approve extensions and clarifications
...based editorial review by
a Usage Board
• Term set must evolve as implementors coin new
terms and usage patterns emerge
– Working groups propose new terms or clarifications
– Evaluate in light of grammatical principle, usefulness,
clarity of definition, overlap with existing terms
– Review application profiles based on Dublin Core
• Tiered model of approval status: conforming,
recommended, obsolete, registered
• Meeting materials, mailing lists, and decisions
archived and accessible on the open Web
• DCMI as maintenance agency for ISO 15836
A bias towards simple and generic
• DCMI Usage Board bias
– Strength and value of DC lies in simplicity and generic
applicability
– Keep the core standard small, generic, and lightweight
– Resist temptation to "complexify" – people want and
need distinctions, but not in a "small standard"
– DCMI Type Vocabulary has just 12 terms: user
communities should invent or re-use their own more
specific sub-types
A bias towards cooperation and re-use
• Help user communities define and use their
own extensions
– Cooperate with maintainers of specialized
vocabularies on forms of mutual recognition
– Provide a model for re-use
"Good neighbor" policies
• MARC Relators (roles such as "adapter", "artist")
– DCMI: "use MARC Relators to refine dc:contributor"
– LoC's RDF schema: "MARC Relators (identified with
URIs) are sub-properties of dc:contributor"
• Encoding Schemes
– DCMI term designates Library of Congress Subject
Headings (http://purl.org/dc/terms/LCSH)
– If LoC coins own term, DCMI should promote its use
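The practical effect of declaring relators as "sub-properties of dc:contributor" can be sketched as a one-rule inference in Python. The relator namespace URI below is a placeholder and not the one LoC actually assigned. The codes `adp` (adapter) and `art` (artist) are MARC relator codes. The resource and the name are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
REL = "http://example.org/marc-relators/"  # placeholder namespace, not LoC's real URIs

# Declarations in the spirit of "MARC Relators are sub-properties of dc:contributor".
SUBPROPERTY_OF = {
    REL + "adp": DC_CONTRIBUTOR,  # adapter
    REL + "art": DC_CONTRIBUTOR,  # artist
}


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Add (s, super, o) for each (s, sub, o), following declared sub-property chains.

    Assumes the declarations contain no cycles.
    """
    result = set(triples)
    for s, p, o in triples:
        parent = SUBPROPERTY_OF.get(p)
        while parent is not None:
            result.add((s, parent, o))
            parent = SUBPROPERTY_OF.get(parent)
    return result


record = {("http://example.org/poster", REL + "art", "Jane Doe")}
expanded = infer(record)
```

A consumer that knows only Simple DC then finds "Jane Doe" as a contributor to the poster. A relator-aware consumer still sees the more precise role.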
A "namespace policy"
• All DCMI metadata terms are given unique identity within three namespaces:
– http://purl.org/dc/elements/1.1/ - the core elements
– http://purl.org/dc/terms/ - all other elements/qualifiers
– http://purl.org/dc/dcmitype/ - a Type vocabulary
– Example: http://purl.org/dc/elements/1.1/title
• Policy on long-term stability of namespace URIs
– Changes not substantially “semantic” (i.e., corrections)
will not result in change of namespace URIs
– “Semantic” changes must trigger a change of name
– Version turnover of a “document management” nature
will have no effect on namespace URIs
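The three namespaces can turn the short prefixed names common in metadata records, such as `dc:title`, into full term URIs and back again. The following is a minimal Python sketch. The prefixes `dc`, `dcterms` and `dcmitype` are conventional choices assumed here, because the policy fixes only the namespace URIs.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",    # the core elements
    "dcterms": "http://purl.org/dc/terms/",      # all other elements/qualifiers
    "dcmitype": "http://purl.org/dc/dcmitype/",  # the Type vocabulary
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as "dc:title" into a full term URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return NAMESPACES[prefix] + local


def compact(uri: str) -> str:
    """Turn a full term URI back into a prefixed name, if a namespace matches."""
    # Try longer namespace URIs first in case one namespace is a prefix of another.
    for prefix, ns in sorted(NAMESPACES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(ns):
            return prefix + ":" + uri[len(ns):]
    return uri
```

Here `expand("dc:title")` yields the example URI on the slide, and `compact("http://purl.org/dc/terms/audience")` yields `dcterms:audience`. URIs outside the three namespaces are returned unchanged.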
A typology of metadata vocabularies
• Term declarations
– Declare a unique set of elements and definitions
– Each DCMI term is identified with a URI
– Documented in HTML pages, formally declared as
RDF schemas
• Application profiles
– Declare how an application uses which terms in its
metadata
– May mix-and-match from multiple namespaces
Why application profiles?
• People want them!
– Most standards have them: IEEE/LOM, MARC, DOI...
– As focus of dialogue and semantic negotiation
– Deep human need to resist total standardization?
– To identify emerging semantics "at the edges" of a standard
– To know how colleagues and peers are designing
metadata – and avoid "reinventing the wheel"
• To harmonize metadata usage within domains:
– User communities (DC-Libraries, DC-Government)
– Subject gateways (Renardus)
Dublin Core application profiles
• Declaration specifying which metadata terms an
information provider uses in metadata
– Identifies source of terms used
– May provide additional documentation
• Designed to promote interoperability within constraints of
Dublin Core model
• Draft guidelines sponsored by the European Committee for Standardization
(CEN) to be progressed through DCMI process
– http://www.cenorm.be/isss/Workshop/MMI-DC/applicationprofile-for-comment.pdf
• Caution: a documentary format cannot itself guarantee
interoperability
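One way to make such a declaration machine-checkable is sketched below in Python. The profile contents, the `mandatory`/`optional` obligations and the local namespace are all hypothetical. The check only confirms that a record uses the terms the profile declares. As the caution on this slide notes, that does not guarantee interoperability.

```python
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/myterms/"  # hypothetical local namespace

# Hypothetical profile: terms mixed from several namespaces, each with an obligation.
PROFILE: Dict[str, str] = {
    DC + "title": "mandatory",
    DC + "creator": "optional",
    DCTERMS + "audience": "optional",
    LOCAL + "shelfMark": "optional",
}


def check(record: Dict[str, List[str]]) -> List[str]:
    """List ways in which a record departs from the profile."""
    problems = [f"undeclared term: {term}" for term in record if term not in PROFILE]
    problems += [
        f"missing mandatory term: {term}"
        for term, obligation in PROFILE.items()
        if obligation == "mandatory" and not record.get(term)
    ]
    return problems
```

A record containing a title and a local shelf mark passes. A record that uses an undeclared term such as `dc:subject`, or that omits the title, is reported.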
A set of encoding practices
• Guidelines for encoding metadata records (or
embedded metadata) in HTML, XML, RDF
– Use of rdf:value and rdfs:label allows nesting of
secondary resource descriptions
• A model for declaring terms "machine-processably" in RDF
– Namespace Policy mandates this, though not
specifically RDF
• Work item: a model for declaring application
profiles machine-processably
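The nesting idea can be sketched with Python's standard `xml.etree.ElementTree`. The layout follows the general RDF/XML pattern of a nested node carrying `rdf:value` (from the RDF namespace) and, optionally, `rdfs:label`. This is a sketch and does not reproduce the DCMI encoding guidelines. The resource URI, subject heading and name values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

for prefix, uri in [("rdf", RDF), ("rdfs", RDFS), ("dc", DC), ("dcterms", DCTERMS)]:
    ET.register_namespace(prefix, uri)


def q(ns: str, local: str) -> str:
    """ElementTree's {namespace}local notation."""
    return f"{{{ns}}}{local}"


def describe(about: str, title: str, lcsh: str, name_value: str, name_label: str) -> str:
    root = ET.Element(q(RDF, "RDF"))
    desc = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): about})
    ET.SubElement(desc, q(DC, "title")).text = title
    # Encoding scheme as the type of a nested node whose rdf:value is the heading.
    scheme_node = ET.SubElement(ET.SubElement(desc, q(DC, "subject")), q(DCTERMS, "LCSH"))
    ET.SubElement(scheme_node, q(RDF, "value")).text = lcsh
    # Secondary resource description nested inside dc:creator.
    person = ET.SubElement(ET.SubElement(desc, q(DC, "creator")), q(RDF, "Description"))
    ET.SubElement(person, q(RDF, "value")).text = name_value
    ET.SubElement(person, q(RDFS, "label")).text = name_label
    return ET.tostring(root, encoding="unicode")


xml_text = describe("http://example.org/slides",
                    "Dublin Core and Emerging Conventions for a Semantic Web",
                    "Metadata", "Baker, Thomas", "Thomas Baker")
```

A consumer that does not understand the nested nodes can still fall back on the `rdf:value` strings. The `rdf:`, `rdfs:`, `dc:` and `dcterms:` prefixes appear in the output only because they are registered up front.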
CORES Resolution
Shared conventions for declaring namespaces?
• Cross-community consensus-building
– W3C metadata standards and URIs as a basis
for interoperability among different standards?
• EU CORES Project (2002-2003)
– Identify and explore areas of possible
agreement among major standards initiatives
– Interoperability Forum meeting in Brussels,
November 2002
CORES Resolution on
Identifying Metadata Elements
• http://www.cores-eu.net/interoperability/cores-resolution/
• Whereas
– Our metadata standards have “elements” – units of
meaning comparable and mappable to elements of
other standards,
• We agree:
– To assign Uniform Resource Identifiers to our
elements;
– To articulate and publish specific policies regarding
the stability, persistence, and maintenance of the URIs
assigned to the elements.
Clarifications to the
CORES Resolution
• URIs not necessarily used in applications "as is"
– In metadata records, maybe dc:contributor instead of
http://purl.org/dc/elements/1.1/contributor
• Signatories decide what to identify with URIs
– An individual element? An entire set of elements? A
specific historical version of an element?
• No implication that URIs will "resolve" to anything
– URIs may "get" something with HTTP on Web – or not!
– E.g., resolve to a database query?
– Resolve to an RDF schema?
– Or even resolve to nothing at all ("file not found")!!
Signatories
• Eliot Christian, USGS, for GILS
• Brian Green, EDItEUR, for ONIX
• Rebecca Guenther, Library of Congress, for
MARC21
• Keith Jeffery, EuroCRIS, for CERIF
• Norman Paskin, Int’l DOI Foundation, for DOI
• Robby Robson, IEEE LTSC, for IEEE/LOM
• Stuart Weibel, DCMI, for Dublin Core
Signatories’ Action Plan
• Action plan, November 2002 – May 2003:
– Define and publish URI assignment mechanisms
– Assign URIs to elements
– Publish URI persistence policies
• Article on follow-up scheduled for the July 2003 issue
of D-Lib Magazine
– Taken as a whole, corpus of good-practice policies
for others to discuss and emulate
Beyond the CORES Resolution
• Benefits for signatories:
– Important first step towards future interoperability
applications (e.g., mapping, conversion)
– Improve "citability" of elements between standards
• Potential areas of further work:
– Provide persistent URIs for terms in taxonomies and
ontologies
– Shared conventions on declaring URIs in machine-processable forms
– Shared conventions for application profiles and
mapping constructs
– Shared ontologies as targets for mapping
What exactly is being identified?
• Is a particular term the same when used in
different contexts?
• A single term in a flat namespace?
– http://ltsc.ieee.org/LOM/Identifier
• Or two terms in a flat namespace?
– http://ltsc.ieee.org/LOM/GeneralIdentifier
– http://ltsc.ieee.org/LOM/MetadataIdentifier
• Or two terms in a hierarchical namespace?
– http://ltsc.ieee.org/LOM/General/Identifier
– http://ltsc.ieee.org/LOM/Metadata/Identifier
What exactly is being identified?
• For purposes of identification, is a term "the same"
through successive versions?
• At first, DC reflected the version in the URI:
– http://purl.org/dc/elements/1.1/title
• Then decided to keep URIs stable and define the
limits of change in the Namespace Policy
– http://purl.org/dc/terms/audience
• URIs for DC 1.1 kept for legacy reasons
• URIs for successive versions of a term used
"behind the scenes" for tracking changes
Publishing and documenting
a vocabulary
A method for maintaining (and
versioning) a vocabulary
• Assume that vocabularies must evolve:
– Anticipate need to understand discrete states of the
standard
– All documents, decisions, and term declarations must
evolve
– Versioning to support future automated methods for
processing legacy metadata
• Numbered decisions linked to:
– A specific historical version of a term
– Supporting documentation for the decision
– Historical record of the Usage Board meeting
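The following minimal Python sketch illustrates how a stable term URI can stand for a sequence of versions, each tied to a numbered decision. Legacy metadata can then be read against the version that applied when it was created. The term URI, version URIs, dates and decision numbers are placeholders. They are not DCMI's actual history or identifier scheme.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class TermVersion:
    version_uri: str  # hypothetical "behind the scenes" URI for this state of the term
    issued: date      # date from which this version applies
    decision: str     # hypothetical Usage Board decision number
    definition: str


TERM = "http://example.org/terms/exampleTerm"  # stable URI (hypothetical)

# Stable term URI -> versions in chronological order (placeholder data).
HISTORY: Dict[str, List[TermVersion]] = {
    TERM: [
        TermVersion(TERM + "-001", date(2001, 1, 1), "decision-1", "original definition"),
        TermVersion(TERM + "-002", date(2002, 1, 1), "decision-2", "clarified definition"),
    ],
}


def version_at(term: str, when: date) -> TermVersion:
    """Return the version of `term` that applied on `when`."""
    versions = HISTORY[term]
    i = bisect_right([v.issued for v in versions], when)
    if i == 0:
        raise LookupError(f"{term} was not yet defined on {when}")
    return versions[i - 1]
```

The stable URI is what appears in metadata records. The per-version URIs and decision numbers serve only for tracking changes, in line with the namespace policy described earlier.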
Modes for publishing a vocabulary
• Multiple publication formats needed
– Web pages for human use
– RDF schemas for expressing relationships between
terms in machine-processable form
– OWL ontologies and rules languages will improve
expressivity of these constructs
– Future schemas may need to express versioning
machine-processably
• Workflow
– Web pages and schemas from a common source
– XML data + XSLT scripts – simple, effective
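The common-source workflow can be sketched in Python. The workflow described on the slide uses XML data plus XSLT. Python's standard library has no XSLT processor, so `xml.etree.ElementTree` stands in for the transformation step here. The source format is hypothetical. The two definitions are the Dublin Core 1.1 definitions of Title and Creator.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

# Common source (hypothetical format): one record per term.
SOURCE = """<terms ns="http://purl.org/dc/elements/1.1/">
  <term name="title" label="Title">A name given to the resource.</term>
  <term name="creator" label="Creator">An entity primarily responsible for making the content of the resource.</term>
</terms>"""


def to_html(src: ET.Element) -> str:
    """Web page for human readers: a definition list of labels and definitions."""
    items = "".join(
        f"<dt>{escape(t.get('label', ''))}</dt><dd>{escape(t.text or '')}</dd>"
        for t in src.iter("term")
    )
    return f"<dl>{items}</dl>"


def to_rdf_schema(src: ET.Element) -> str:
    """Machine-processable declarations: one rdf:Property per term."""
    ns = src.get("ns", "")
    out = ET.Element(f"{{{RDF}}}RDF")
    for t in src.iter("term"):
        prop = ET.SubElement(out, f"{{{RDF}}}Property", {f"{{{RDF}}}about": ns + t.get("name", "")})
        ET.SubElement(prop, f"{{{RDFS}}}label").text = t.get("label")
        ET.SubElement(prop, f"{{{RDFS}}}comment").text = t.text
    return ET.tostring(out, encoding="unicode")


source = ET.fromstring(SOURCE)
html_page = to_html(source)
schema = to_rdf_schema(source)
```

Because both outputs are derived from one source, a correction made once reaches both the human-readable page and the schema.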
A searchable "registry" of terms
• DCMI Registry
– Searchable database of metadata terms
– Terms translated into various languages
– Goal: application interface for Web services
– Goal: harvest schemas directly from their maintainers
• An ecology of registries?
– Harvest and merge element sets, vocabularies, profiles
• For general overviews: SCHEMAS, CORES
• Specific domains: MEG, GEM (education), FAO (agriculture)
– Publication environment for information models
– Tool for harmonization, mapping, conversion, merging
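The following Python sketch shows term sets being harvested and merged, then searched by label in a given language. The two input sets and their labels are hypothetical. A real registry would also need to record provenance and handle conflicting definitions. This sketch ignores both issues and simply lets later sources override earlier labels.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Term URI -> {language code: label}
TermSet = Dict[str, Dict[str, str]]


def merge(*term_sets: TermSet) -> TermSet:
    """Merge harvested term sets by URI; later sets add or override labels."""
    merged: Dict[str, Dict[str, str]] = defaultdict(dict)
    for term_set in term_sets:
        for uri, labels in term_set.items():
            merged[uri].update(labels)
    return dict(merged)


def search(registry: TermSet, query: str, lang: Optional[str] = None) -> List[str]:
    """Return URIs of terms whose label (in `lang`, or any language) contains `query`."""
    q = query.casefold()
    return sorted(
        uri
        for uri, labels in registry.items()
        if any(q in label.casefold() for code, label in labels.items() if lang in (None, code))
    )


DC = "http://purl.org/dc/elements/1.1/"
maintainer = {DC + "title": {"en": "Title"}, DC + "creator": {"en": "Creator"}}
translations = {DC + "title": {"de": "Titel", "fr": "Titre"}}
registry = merge(maintainer, translations)
```

Keying every entry by term URI lets sets harvested from different maintainers and translators combine without relying on matching labels.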
The evolving Web context
The Web as a new social context
• Something new in history
– Not just an historical set of technologies (HTTP, URLs,
HTML)
– Platform for historically unprecedented forms of social
and intellectual interaction
• Metadata as language for the Web
– A language for statements about Web resources
– Statements created and used both by humans and by
machines
– "Semantic Web" is about describing how resources
relate to each other
Scale and automation
• The Web is too big to control
– Metadata statements are expensive to make and
maintain
– Shift away from the metaphor of "library"?
– NSF workshop on "Post Digital Library Futures"
• http://www.sis.pitt.edu/~dlwkshop/
• Automated resource discovery (e.g. Google)
– Using contextual information (e.g., URL structures) to
infer "aboutness"
– Natural-language technology, e.g. summarization
An evolving role for metadata
• Balance between human and machine
– Automated methods to generate metadata
– "Let Google do it" versus expert intervention
• Granularity of metadata
– Describe each item or entire collections?
– How much metadata is "enough" to improve
discovery?
– Semantic precision or tolerance of fuzziness?
Which aspects of Dublin Core will prove most useful over time?
• The elements and related sets of terms
• Open processes for community standardization
• Editorial review by a Usage Board
• A bias toward simple and generic metadata
• A bias toward cooperative re-use of vocabularies
– The etiquette of mutual recognition
• A namespace policy for using URIs
• A typology of vocabularies (e.g. application profiles)
• A set of encoding practices (HTML, XML, RDF)
• Methods for maintaining and versioning a vocabulary
• Publishing a vocabulary for humans and machines
• Searchable registries of metadata terms