Transcript URL - DOIs

doi>
Digital Object Identifier
Charles Ellis: Chairman, International DOI Foundation
Norman Paskin: Director, International DOI Foundation
Steve Stone: Director, Microsoft eBook Product Group
Eric Swanson: Chairman, CrossRef
Outline
• Background: why DOI
• What the DOI system consists of
• DOI explained: what it does
• Applications
Background: why now?
• Identifiers enable us to manage content
• Physical world: ISBN, ISSN, ISMN, SICI, etc.
– good systems for publishers
• Digital world: ? URL?
– poor systems for publishers
– how to use existing identifier systems?
• Make WWW transactions as invisible as telephone transactions
– machine to machine,
– not machine to people to machine
The intellectual property background
Digital world enables both use and misuse
• Publishers' aim is to maximise the value of information objects:
- reduce copy infringement and
- increase accessibility;
- we need to identify in order to manage content
• Mass production → mass customisation
- à la carte / on-demand publishing
- components must be clearly identifiable
- and their rights properties automated
Background: the organisation
• International DOI Foundation: founded 1998
– following demonstration of prototype in 1997
• Not-for-profit; paid membership support
– similar principles to World Wide Web Consortium
• Open to all interested parties
• Democratic: board elected from members
• Full-time Director
• 35+ organisations (growing)
– Content owners (text publishers, music)
– Technology companies
– Content intermediaries (etc.)
DOI: requirements
• Identification of content
– intellectual property in any form
• Actionable identification
– automation; “click to do something”
• Interoperability
– existing identification systems
– future developments
• Open standard
– compatible with other standards
DOI: the aim
• Establish a way of identifying content in the
digital environment
– actionable identifier
• Which can be the basis of rights management
– extensible; can be developed further
Components of an identifier
• A number (or “name”)
– assign a number to something
– (compare: telephone number)
• A description
– say what the number is assigned to
– (compare: directory entry)
• An action
– make the number able to do something
– (compare: the telephone system)
• Policies
– (compare: social/business structures)
• NUMBERING: Syntax, e.g. 10.1234/5678
• DESCRIPTION: Metadata; pieces of data which uniquely describe that which is identified
• ACTION: Resolution; a system able to link the number to something useful
• POLICIES
1. Numbering
• DOI syntax: how the number is made up
- NISO standard (Z39.84)
- 10.1000/12345
• 10.1000 = prefix (e.g. a publisher, a journal, etc)
• 12345 = suffix (combination is unique)
• An opaque string (“a dumb number”)
– once assigned, parts of the number do not have
separate meaning
• Permanent
– stays the same even if ownership changes
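The prefix/suffix structure described above can be sketched in a few lines of Python. This is only an illustration, not part of the Z39.84 standard: the function name `split_doi` and the `Doi` type are hypothetical, and the only rules assumed are those on the slide (a prefix and a suffix separated by the first "/"), plus tolerance for an optional "doi:" label in front of the string.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # e.g. "10.1000" (a publisher, a journal, etc.)
    suffix: str  # e.g. "12345"; treated as an opaque string


def split_doi(text: str) -> Doi:
    """Split a DOI string into prefix and suffix at the first "/"."""
    value = text.strip()
    # Assumption: an optional, case-insensitive "doi:" label may precede the string.
    if value.lower().startswith("doi:"):
        value = value[4:]
    prefix, sep, suffix = value.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix pair: {text!r}")
    return Doi(prefix, suffix)


doi = split_doi("10.1000/12345")
print(doi.prefix, doi.suffix)
```

Because the string is opaque, the sketch only separates the two parts; it never tries to read meaning into the suffix.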
2. Description
• “What is numbered?”
• Not as simple as you might think:
1. Not only digital files, but physical things
and intangible things!
2. Not only things, but parts of things!
• Let’s explain these:
Not only digital things...
[Diagram: a single “intangible abstraction” linked to its forms: a manuscript (MS, mss #ABC123), a paper (journal/volume/page; ISBN; SICI, etc.) and a URL]
Not only things, but parts of things
• Components
• Book
– Chapter
• Section
– Figure
• “Granularity”
• Must be able to identify at whatever level is appropriate: functional granularity
Description is by metadata
• Metadata is: data about other data
- Book: ISBN 0864426437 (data)
- Price: $12.95 (metadata)
- Subject: Buenos Aires (metadata)
• One man’s metadata is another man’s data:
Description is by metadata
• Data about other data
- Subject: Buenos Aires (data)
- Book: ISBN 0864426437 (metadata)
- Price: $12.95 (metadata)
• Part of an infinite web:
– interconnected
– infinite in extent
• inextricable from “identification”
Description is by metadata
• Not sufficient to assign an identifier without specifying precisely what the entity is
– “a paper” or “a book” is not precise enough;
– must be precise, because:
• In an automated world, that specification must be by metadata (able to be used by machines)
• In an interoperable world, that metadata must be
– unambiguous (“well-formed”)
– follow a data model
(able to be used consistently by machines)
DOI uses <indecs> framework
Interoperability of data in e-commerce systems
• Focus is generic intellectual property management
• Enabling, not replacing, other schemes
• Broad in scope
– description, transaction, rights
• Based on tested “real world” models, wide support
– CIS (music industry); IFLA (library cataloguing)
• Now in use in real applications
– Muze (audiovisuals), EPICS/ONIX (books & serials)
• Extensible, structured, open standard
DOI metadata is very simple
• A few (7-8) key pieces of data
– title, type of content, origin, etc
– varies according to what is needed (video, book, etc)
• about the object
– does not include rights metadata
• but interoperates with rights data
– because based on same data model
– uses the same terms to mean the same thing
• analogy: telephone bill = rights information
– the telephone number ≠ your bank account
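As a rough illustration of how small such a descriptive record is, here is a sketch of one as a Python dataclass. The field names are hypothetical (the slide names only "title, type of content, origin, etc"), the sample values are made up, and rights information is deliberately left out, as the slide describes.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DoiMetadata:
    """A few descriptive elements about the identified object.

    Field names are illustrative assumptions, not the actual DOI element set.
    """
    doi: str
    title: str
    content_type: str
    origin: str


def missing_elements(record: DoiMetadata) -> list:
    """Return the names of elements that were left empty."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Made-up sample record with one element left blank.
record = DoiMetadata(
    doi="10.1000/123",
    title="Example title",
    content_type="book",
    origin="",
)
print(missing_elements(record))
```

A check of this kind is one simple way to keep records "well-formed" enough for machines to use consistently; a real scheme would follow the data model of the underlying framework.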
3. Actions
[Diagram: a user's Web browser passes the DOI 10.1000/123 to doi>, which, as an actionable identifier, leads to a specified action, etc.]
DOI uses Handle System®
• Open standard using the Internet
• Distributed, scalable, fast and reliable
• In use now in several places (e.g. Library of Congress)
• Very simple concept, powerful applications
• Fits with other standards (URL, URN, etc.)
• Associates a name with “values” (e.g. URL)
– input DOI
– output URL (or some other defined value)
Using Handle, DOIs Resolve to Multiple Data Types
INPUT: Handle (DOI) 10.1004/123456
OUTPUT: extensible data types
Data type   DOI data
URL         http://www.pub.com/.
URL         http://www.pub2.com/.
DLS         loc/repository
XYZ         1001110011110
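The input/output behaviour listed above (one DOI in, several typed values out) can be mimicked with a small lookup table. This is a toy stand-in, not the Handle System protocol or its API: the `RECORDS` table and the `resolve` function are hypothetical names, and the values are copied from the slide as printed.

```python
# Toy record store: each DOI maps to a list of (data type, value) pairs.
# Assumption: a plain in-memory dict stands in for the distributed resolver.
RECORDS = {
    "10.1004/123456": [
        ("URL", "http://www.pub.com/."),
        ("URL", "http://www.pub2.com/."),
        ("DLS", "loc/repository"),
        ("XYZ", "1001110011110"),
    ],
}


def resolve(doi, data_type=None):
    """Return all values for a DOI, optionally restricted to one data type."""
    if doi not in RECORDS:
        raise KeyError(f"unknown DOI: {doi}")
    return [
        value
        for kind, value in RECORDS[doi]
        if data_type is None or kind == data_type
    ]


print(resolve("10.1004/123456", "URL"))
print(resolve("10.1004/123456", "DLS"))
```

Adding a new kind of output is just another (type, value) pair in the record, which is what makes the set of data types extensible.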
For convenience we re-draw like this:
[Diagram: INPUT 10.1000/123 → doi> → OUTPUT: URL, URL2, RAP, XYZ, etc.]
4. Policies
• DOI free to use
– costs paid by assigner
• DOI applies to any Intellectual Property entity
– copyright focus (Berne/WCT etc)
• Registration agencies to deal with assigning DOIs
(and metadata/resolution) for publishers etc
• Business models determined by agencies
• Policies for agencies are now evolving
• ENUMERATION: allocation of an identifier (DOI)
• DESCRIPTION: <indecs> framework allows a DOI to describe any form of intellectual property, at any level of granularity
• RESOLUTION: Handle System allows a DOI to resolve to any piece of current data
• POLICIES
What is DOI?
Digital Object Identifier
• A unique identifier….
- of a piece of intellectual property
- in any form (tangible, intangible)
- defined by some key metadata
- an opaque string e.g. DOI:10.1000/123
What is DOI?
• “resolvable...”
- routing, via proven internet technology,
• “to associated state data”...
- one or more current values of specified types of data (e.g. URL);
- these data may be, or link to, services
What is DOI?
• “in an information management substrate…”
- once the (meta)data has been obtained, it can interoperate with other data
- e.g. about context (subscription etc.)
- to construct services and transactions
- because (meta)data follows a generic interoperable architecture
What is DOI?
“A unique resolvable identifier and multiple pieces
of associated state data in an information
management substrate” achieved by:
• Technical implementation + policies
• Two underlying technical tools:
1. intellectual property: <indecs> framework
2. resolution: Handle System
What are the advantages?
1. Identify the item of intellectual property
• not its location, because:
• if the location changes the identifier should stay the
same (persistence)
• the same “resource” can be at several locations at the
same time (“multiple copies”)
DOI does this
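The two points above, persistence and multiple copies, can be shown with a toy registry in which the identifier is the key and the locations are replaceable values. The `Registry` class and the example URLs are hypothetical and exist only for illustration.

```python
class Registry:
    """Toy name-to-locations table; a stand-in for a real resolver."""

    def __init__(self):
        self._locations = {}

    def register(self, doi, *urls):
        """Record one or more current locations for a DOI."""
        self._locations[doi] = list(urls)

    def move(self, doi, old_url, new_url):
        """Record that one copy has moved; the DOI itself is untouched."""
        urls = self._locations[doi]
        urls[urls.index(old_url)] = new_url

    def locations(self, doi):
        """Return a copy of the current locations for a DOI."""
        return list(self._locations[doi])


registry = Registry()
# The same resource held at two locations at once ("multiple copies").
registry.register("10.1000/123", "http://a.example/paper", "http://b.example/paper")
# One copy moves; anyone holding the DOI is unaffected.
registry.move("10.1000/123", "http://a.example/paper", "http://c.example/paper")
print(registry.locations("10.1000/123"))
```

The design choice to note is that only the table changes when content moves; every citation of 10.1000/123 remains valid.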
The problem illustrated on the Web
1. URL is not a persistent identifier
- it refers to location, not content
2. Same content at two different URLs has two different identifiers
- cannot use as common reference
[Diagram: User → Web Browser → URL → ? (“404 not found”, or “...has moved to…” another URL)]
“One in five Web links more than one year old may be out of date”
(Alta Vista)
Identifiers on the Web
1. Don’t change the URL; “persistence is a social, not a technology, problem”
[Diagram: User → Web Browser → URL]
• People do change URLs
• There are good reasons to change URLs
• Does not deal with multiple copies
Identifiers on the Web: making identifiers persistent
2. Assign a Name (= identifier) and redirect for “has moved to..”
[Diagram: User → Web Browser → name → (http) → URL → URL]
• Bookmarks and caches save the end point, not the name (in current browsers)
• Still does not deal with multiple copies
Identifiers on the Web
3. Assign a Name (DOI) and use a better resolver
[Diagram: User → Web Browser → doi> → URL, URL]
• DOI provides name
• One point of management
• Multiple resolution
This is the DOI: initial implementation
[Diagram: User → Web Browser → 10.1000/123 → doi> (resolution) → URL]
1. DOI is a persistent identifier
2. DOI identifies the content, irrespective of the location
Full DOI implementation: adding multiple resolution
[Diagram: User → Web Browser → 10.1000/123 (actionable identifier) → doi> → URL, URL2, URL, Data 1, Data 2, etc.]
Identifier resolves to any piece of data
Multiple resolution for performance (e.g. D-Lib Magazine)
[Diagram: User → Web Browser → 10.1000/123 → doi> → URL1, URL2, URL3, URL4, etc.]
Identifier resolves to all URLs; the first to respond is chosen
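The "first to respond is chosen" idea can be sketched with Python's standard thread pool. Everything specific here is an assumption made for illustration: the mirror URLs are placeholders, the latencies are invented, and `probe` only sleeps where a real client would contact the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Simulated response times in seconds for each copy (made-up values).
SIMULATED_LATENCY = {
    "http://mirror1.example/": 0.30,
    "http://mirror2.example/": 0.05,
    "http://mirror3.example/": 0.20,
}


def probe(url):
    """Pretend to contact a URL; a real client would issue a request here."""
    time.sleep(SIMULATED_LATENCY[url])
    return url


def first_to_respond(urls):
    """Contact every URL in parallel and return the one that answers first."""
    if not urls:
        raise ValueError("no URLs given")
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = [pool.submit(probe, url) for url in urls]
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures):
            # Note: leaving the "with" block still waits for the slower probes.
            return future.result()


print(first_to_respond(list(SIMULATED_LATENCY)))
```

In this sketch the caller never needs to know which copy is nearest or least loaded; the identifier resolves to all of them and the race decides.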
Multiple resolution for intelligence: “services”
[Diagram: User → Web Browser → 10.1000/123 (actionable identifier) → doi> → URL, URL2, URL, Data 1, Data 2, Service 1 @ 10.1000/123 (specified action), etc.]
What are the advantages?
2. Able to deal with relationships:
– “this item is a manifestation of that work”
– “this item is a part of that item”
DOI does this:
• DOIs can resolve to other DOIs
• Metadata can express relationships
– “is part of…” etc
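Relationships of the "is part of" kind can be pictured as links between records, each pointing at another DOI. The records, the `is_part_of` key and the `containers` function below are all hypothetical; the sketch only shows how such links might be followed from a component up to the item that contains it.

```python
# Hypothetical records: each DOI carries metadata, including an optional
# "is_part_of" relationship that points at another DOI.
RECORDS = {
    "10.1000/1": {"type": "book", "is_part_of": None},
    "10.1000/2": {"type": "chapter", "is_part_of": "10.1000/1"},
    "10.1000/3": {"type": "figure", "is_part_of": "10.1000/2"},
}


def containers(doi):
    """Follow "is part of" links upward and return the chain of parent DOIs."""
    chain = []
    seen = {doi}
    parent = RECORDS[doi]["is_part_of"]
    while parent is not None:
        if parent in seen:
            # Guard against records that point at each other in a loop.
            raise ValueError(f"cycle in is_part_of links at {parent}")
        seen.add(parent)
        chain.append(parent)
        parent = RECORDS[parent]["is_part_of"]
    return chain


print(containers("10.1000/3"))
```

Note that the suffixes stay opaque: the fact that 10.1000/3 is a figure within a chapter comes from the metadata, not from the shape of the number.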
DOI networks can reflect the complex relationships of publishing
[Diagram: a network in which doi> nodes resolve to other doi> nodes, URLs (URL, URL2, ...) and services (Service A, Service B, ...)]
What are the advantages?
3. Apply to any intellectual property entity
– any format (digital convergence)
– any granularity (any part of something)
4. Enable complex actions
– can express relationships between entities
– interact with data from other sources
– enables services (automated, predictable) to be
constructed
What are the advantages?
5. Extensible
• resolution system has capability for trusted
transactions
• metadata framework has capability for full rights
management architecture
6. Not limited to current environments
• not just the Web (other Internet applications)
• not just digital (intangibles etc)
DOI: development in three tracks
[Diagram labels: Metadata; Single redirection; Initial implementation; Multiple resolution; Full implementation; Standards tracking (W3C, WIPO, NISO, ISO, etc., other initiatives)]
A continuing development activity
Applications
• Reference linking of articles
- CrossRef (full scale DOI implementation, not run
by IDF); metadata, single resolution
• E-books
– currently being worked on (with ONIX/EPICS)
• Images
– BioImage; others
• Books
• Audiovisuals
• etc.
DOI Deployment
• DOI Foundation to provide governance
– using a federation of registration agencies
– agencies follow agreed rules (policies)
• minimum criteria for registration agencies:
– technical; information management; $
• does not prescribe details of individual
businesses
• comparable models:
– Bar codes (EAN/UPC); Visa; ISBN etc.
Summary
• A general purpose identifier system
– number, description, action and policies
• Any item, at any desired level
– using a metadata framework
• Linking to any service or data
– using resolution (multiple resolution)
• Simple to use
– registration agencies
• Applications and agencies now happening
Further information
• DOI background papers & DOI Annual Review,
FAQs, gallery, etc
– www.doi.org
• <indecs>
– www.indecs.org
• Handle system
– www.handle.net
• [email protected]