Global Digital Format Registry
An Update
July 2006
Harvard University Library
Global Digital Format Registry
• “The Global Digital Format Registry (GDFR) will
provide sustainable services to collect, review,
store, discover, and deliver significant
representation information about digital formats.”
– Centrally-organized collection and review
– Distributed storage, discovery, and delivery via a
peer-to-peer network
The GDFR project
• Harvard University Library (HUL) funded for 2
years by the Mellon Foundation
• Staffing and technical work subcontracted by
HUL to OCLC (June 2006)
• Project oversight
– Steering Committee (SC) for policy oversight
– Technical Working Group (TWG) for technical
oversight
– Active solicitation of the international stakeholder
community for review and comment
Deliverables
• Functional requirements
• Technical specifications
• Implementation plan (technology platform)
• Inter-nodal protocol
• Reference software implementation for nodes
– Released under LGPL
• Editorial process
• Initial population
• Succession plan
Schedule
• Month 1: Staffing, establish public web site
• Months 2-6: Consultation, design, prototyping
– Public discussion planned for DLF Fall Forum, Boston, November 2006
• Months 7-12: Protocol, node implementation
• Months 13-18: Initial population, inter-nodal testing
• Months 19-24: Integration testing
What is a format?
• “A serialization of an abstract information model”
– A set of syntactic and semantic rules for mapping
from an information model to a byte stream (and, in
most instances, for mapping back)
• Encompasses the nominal sense of “file format”
as well as a range of conceptual models from
the micro to the macro level
– IEEE 754 floating point number … File system
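The IEEE 754 end of that range makes the definition concrete. The following is a minimal sketch using Python's standard `struct` module: the information model is a real number, the format rules fix a 64-bit layout, and the mapping back recovers the value. The function names, the sample value, and the choice of big-endian byte order are assumptions made for illustration.

```python
import struct

def serialize(value: float) -> bytes:
    """Map a number to an 8-byte, big-endian IEEE 754 binary64 byte stream."""
    return struct.pack(">d", value)

def deserialize(stream: bytes) -> float:
    """Map the byte stream back to the number it encodes."""
    return struct.unpack(">d", stream)[0]

encoded = serialize(0.15625)
decoded = deserialize(encoded)
print(encoded.hex(), decoded)
```

The same pair of mappings, at a much larger scale, is what a file format or a file system defines.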
GDFR network
• Peer-to-peer network communicating over a
common protocol
• Structured delegation for distribution
– DNS analogy
• “Root” node
• Top-level nodes
– Distribution classes
• Local data
• Unvetted data
• Vetted data
[Diagram labels: root GDFR node, GDFR nodes, GDFR protocol, editorial process, submissions for technical vetting, data vetted for propagation]
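The three distribution classes suggest a simple routing rule for each record a node holds. The sketch below is one reading of the slide, not a stated specification: it assumes local data stays on its node, unvetted data is submitted to the editorial process, and only vetted data is propagated. The names `DistributionClass` and `route` are hypothetical.

```python
from enum import Enum

class DistributionClass(Enum):
    LOCAL = "local"
    UNVETTED = "unvetted"
    VETTED = "vetted"

def route(dist_class: DistributionClass) -> str:
    """Return the assumed handling for a record of the given class."""
    if dist_class is DistributionClass.VETTED:
        return "propagate to other nodes"
    if dist_class is DistributionClass.UNVETTED:
        return "submit for technical vetting"
    return "keep on this node only"

for dist_class in DistributionClass:
    print(dist_class.value, "->", route(dist_class))
```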
Representation Information
• Identifiers
• Responsibility
• Classification
• Relationships
• Specifications
• Signatures
• Grammar
• Tools
• Assessment
Identifiers
• Canonical and alias identifiers in a variety of
naming systems
– Common usage: “TIFF”
– MIME: “image/tiff”
– PRONOM PUID: “fmt/10”
– LC FDD: “fdd000022”
• Canonical GDFR-defined identifier in the “info”
URI scheme
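Alias resolution of this kind can be sketched as a lookup keyed by naming system and value. The four aliases are the examples from this slide. The slide does not give the form of the canonical “info” URI, so `CANONICAL_TIFF` below is a placeholder string, and the system keys and the `resolve` function are hypothetical names.

```python
from typing import Optional

# Placeholder only: the real canonical identifier would be an "info" URI
# whose form is not given on the slide.
CANONICAL_TIFF = "canonical-id-for-tiff (placeholder)"

ALIASES = {
    ("common", "TIFF"): CANONICAL_TIFF,
    ("mime", "image/tiff"): CANONICAL_TIFF,
    ("puid", "fmt/10"): CANONICAL_TIFF,
    ("fdd", "fdd000022"): CANONICAL_TIFF,
}

def resolve(system: str, value: str) -> Optional[str]:
    """Return the canonical identifier for an alias, or None if unknown."""
    return ALIASES.get((system, value))

print(resolve("mime", "image/tiff"))
print(resolve("puid", "fmt/999999"))
```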
Responsibility
• Creator
• Owner
• Maintenance agency and process
• Legal conditions for use
Classification
Ontological CLASSES, abstract families, concrete formats,
and relationships
BYTESTREAM
  IMAGE
    STILL
      RASTER
        GIF
          GIF87a
          GIF89a is-new-version-of GIF87a
        JPEG
          ISO 10918-1
          JFIF is-subtype-of ISO 10918-1
        TIFF
          TIFF 4.0
          TIFF 5.0 is-new-version-of TIFF 4.0
          TIFF 6.0 is-new-version-of TIFF 5.0
          TIFF/IT is-subtype-of TIFF 6.0
          TIFF/IT/CT is-subtype-of TIFF/IT
          TIFF/IT/CT/P1 is-subtype-of TIFF/IT/CT
Relationships
• Subtype
– ASCII is-subtype-of UTF-8
– UTF-8 has-subtype ASCII
• Version
– TIFF 6.0 is-version-of TIFF 5.0
– TIFF 5.0 has-version TIFF 6.0
• Encapsulation
– WAVE can-contain μ-law
– μ-law is-contained-by WAVE
• Affinity
– JPEG is-similar-to SPIFF
– SPIFF is-similar-to JPEG
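Each relationship on this slide is paired with an inverse, and affinity is its own inverse. A minimal sketch, assuming relationships are stored as (subject, predicate, object) triples and that asserting one direction should record the other automatically; `INVERSE` and `add_relationship` are hypothetical names.

```python
INVERSE = {
    "is-subtype-of": "has-subtype",
    "has-subtype": "is-subtype-of",
    "is-version-of": "has-version",
    "has-version": "is-version-of",
    "can-contain": "is-contained-by",
    "is-contained-by": "can-contain",
    "is-similar-to": "is-similar-to",  # symmetric relationship
}

def add_relationship(graph: set, subject: str, predicate: str, obj: str) -> None:
    """Record a triple together with its inverse."""
    graph.add((subject, predicate, obj))
    graph.add((obj, INVERSE[predicate], subject))

graph = set()
add_relationship(graph, "ASCII", "is-subtype-of", "UTF-8")
add_relationship(graph, "TIFF 6.0", "is-version-of", "TIFF 5.0")
add_relationship(graph, "WAVE", "can-contain", "μ-law")
add_relationship(graph, "JPEG", "is-similar-to", "SPIFF")

for triple in sorted(graph):
    print(*triple)
```

Storing both directions keeps lookups from either format simple, at the cost of having to keep the pair in step when a relationship is removed.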
Specifications
• Bibliographic citation, including descriptive (e.g.
ISBN) and actionable (e.g. URI) identifiers
• IP considerations probably prohibit the free
distribution of specification documents
Signatures
• External
– Generally indicative
– File extension(s)
• Internal
– Generally dispositive
– Magic number
– Other well-defined internal syntactic structures
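The difference between indicative and dispositive signatures can be sketched as an identification routine that trusts a magic number when one matches and falls back to the file extension only as a hint. The magic numbers below are the well-known TIFF and GIF headers. The lookup tables, function names, and fallback policy are assumptions for illustration, not a GDFR design.

```python
from pathlib import PurePath
from typing import Optional

# External signatures: indicative only, since an extension can be wrong.
EXTENSION_HINTS = {".tif": "TIFF", ".tiff": "TIFF", ".gif": "GIF"}

# Internal signatures: magic numbers at the start of the byte stream.
MAGIC_NUMBERS = [
    (b"II*\x00", "TIFF"),   # little-endian TIFF header
    (b"MM\x00*", "TIFF"),   # big-endian TIFF header
    (b"GIF87a", "GIF87a"),
    (b"GIF89a", "GIF89a"),
]

def external_hint(filename: str) -> Optional[str]:
    """Guess a format from the file extension."""
    return EXTENSION_HINTS.get(PurePath(filename).suffix.lower())

def internal_match(header: bytes) -> Optional[str]:
    """Identify a format from its magic number, if one matches."""
    for magic, name in MAGIC_NUMBERS:
        if header.startswith(magic):
            return name
    return None

def identify(filename: str, header: bytes) -> Optional[str]:
    """Prefer the internal signature; use the extension only as a fallback."""
    return internal_match(header) or external_hint(filename)

print(identify("scan.gif", b"II*\x00" + bytes(4)))
```

In the usage line, the file is named `scan.gif` but begins with a TIFF header, so the internal signature overrides the extension.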
Grammar
• Formal notation of a format
• Typed to permit multiple parallel formulations,
e.g. BNF, ABNF, BSDL, DFDL, EAST
• May be feasible only for relatively simple formats
Tools
• Services, systems, and tools using formats as
inputs or outputs
• Described in terms of some functional taxonomy,
e.g. edit, transform, render
Assessment
• Format-specific risk assessment
• Typed to permit multiple parallel formulations
– LC Sustainability/Quality & Functionality (SQF)
– OCLC INFORM
– DSTC PANIC
– Cornell Virtual Remote Control (VRC)
General development goals
• First create a generalized registry framework,
then specialize it for the GDFR application
– To the extent that this does not affect other goals and
schedules
• Platform/network transport independent
• Full information content of GDFR is expressible
in XML form
• GDFR network is re-instantiatable from its XML
expression
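The last two goals amount to a round-trip requirement: export everything to XML, then rebuild an equivalent registry from that XML alone. A minimal sketch of the idea, using Python's standard `xml.etree.ElementTree`; the element and attribute names (`registry`, `format`, `identifier`, `system`) and the sample record are hypothetical, since the slides do not define a GDFR schema.

```python
import xml.etree.ElementTree as ET

def to_xml(registry: dict) -> str:
    """Express the whole registry as one XML document."""
    root = ET.Element("registry")
    for format_id in sorted(registry):
        fmt = ET.SubElement(root, "format", {"id": format_id})
        for system, value in sorted(registry[format_id].items()):
            ET.SubElement(fmt, "identifier", {"system": system}).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(document: str) -> dict:
    """Re-instantiate the registry from its XML expression."""
    root = ET.fromstring(document)
    return {
        fmt.get("id"): {
            ident.get("system"): ident.text
            for ident in fmt.findall("identifier")
        }
        for fmt in root.findall("format")
    }

registry = {"tiff-example": {"mime": "image/tiff", "puid": "fmt/10"}}
restored = from_xml(to_xml(registry))
print(restored == registry)
```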
Related Work
• PRONOM
www.nationalarchives.gov.uk/pronom/
• Representation Information Registry/Repository
dev.dcc.ac.uk/twiki/bin/view/Main/DCCRegRepV04
• LC Digital Formats Web
www.digitalpreservation.gov/formats/
• NARA GDFR governance investigation