Transcript Lecture 4

CS 502: Computing Methods for Digital Libraries
Lecture 4
Identifiers and Reference Links
Desirable Properties of Identifiers
• Location-independent name
• Globally unique
• Persistent across time
• Choice of human-generated or automatically generated names
• Fast resolution
• Decentralized administration
• Supported from standard user interfaces
Syntax of Handles
Syntax
<naming_authority>/<locally_unique_string>
or
hdl:<naming_authority>/<locally_unique_string>
Examples
10.1234/1995.02.12.16.42.21;9 (date-time stamp)
cornell.cs/cstr-94.45 (mnemonic name)
loc/a43v-8940cgr (random string)
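The syntax above can be checked mechanically. This is a minimal sketch that assumes the naming authority is everything before the first "/" and that the optional "hdl:" prefix is simply stripped. The function name `parse_handle` is hypothetical.

```python
from typing import Tuple

def parse_handle(text: str) -> Tuple[str, str]:
    """Split a handle into (naming_authority, locally_unique_string).

    Assumes <naming_authority>/<locally_unique_string>, optionally
    prefixed by "hdl:"; the split is made at the first "/".
    """
    if text.startswith("hdl:"):
        text = text[len("hdl:"):]
    authority, sep, local = text.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not a handle: {text!r}")
    return authority, local

for h in ["10.1234/1995.02.12.16.42.21;9",
          "hdl:cornell.cs/cstr-94.45",
          "loc/a43v-8940cgr"]:
    print(parse_handle(h))
```

Because this sketch splits at the first "/", any later "/" characters remain part of the local string. Only the naming authority is constrained.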
Examples of DOIs
Publisher ID (assigned by DOI Agency) / Item ID (assigned by Publisher)
10.1048 / 872
10.156 / catalog-96
10.1532 / PII
10.18698 / SICI
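In these examples the publisher ID is a naming authority under "10", and the publisher chooses the item ID. The sketch below assumes exactly that rule, so the prefix must be "10" followed by at least one more dot-separated part. The helper `split_doi` is hypothetical.

```python
from typing import Tuple

def split_doi(doi: str) -> Tuple[str, str]:
    """Return (publisher_id, item_id) for a DOI written as prefix/suffix.

    Assumption for this sketch: the publisher ID is "10" followed by one
    or more non-empty dot-separated parts, e.g. "10.1048".
    """
    prefix, sep, suffix = doi.partition("/")
    prefix, suffix = prefix.strip(), suffix.strip()  # tolerate "10.1048 / 872"
    parts = prefix.split(".")
    if not sep or not suffix or parts[0] != "10" or len(parts) < 2 or not all(parts):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

print(split_doi("10.1048 / 872"))      # ('10.1048', '872')
print(split_doi("10.156/catalog-96"))  # ('10.156', 'catalog-96')
```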
Elements of the Handle System
• Handle services:
– global handle service
– local handle services
– caching services
• Clients:
– client libraries
– browser extension
– WWW proxy servers
• Handle administration
• System utilities
Hierarchy of Naming Authorities
loc
  loc.cords
10
  10.1234
cornell
  cornell.cs
    cornell.cs.d
  cornell.temp
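The names in this hierarchy suggest that you find a naming authority's parent by dropping its last dot-separated component. The sketch below implements that reading, which is inferred from the names in the figure rather than taken from a documented API. The function names are hypothetical.

```python
from typing import List, Optional

def parent_authority(na: str) -> Optional[str]:
    """Parent of a naming authority, or None if it is top-level."""
    head, sep, _ = na.rpartition(".")
    return head if sep else None

def ancestors(na: str) -> List[str]:
    """The authority itself followed by each parent up to the root."""
    chain = [na]
    parent = parent_authority(na)
    while parent is not None:
        chain.append(parent)
        parent = parent_authority(parent)
    return chain

print(ancestors("cornell.cs.d"))  # ['cornell.cs.d', 'cornell.cs', 'cornell']
print(ancestors("10.1234"))       # ['10.1234', '10']
```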
Handle Servers and Handle Service
• The Global Handle Service provides central coordination
for all handle services.
• Each naming authority has a home handle service
(which may be Global) where its handles are maintained.
• Each handle service may be implemented as several
handle servers.
• A hashing algorithm determines the server used to store a
given handle.
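The last point can be illustrated with a small sketch. It assumes an MD5 digest of the handle reduced modulo the number of servers. The Handle System's actual hashing rules are defined by its protocol specification, and the server names here are hypothetical.

```python
import hashlib
from typing import List

def select_server(handle: str, servers: List[str]) -> str:
    """Pick the server responsible for `handle` by hashing it.

    Every client that uses the same hash and the same server list
    computes the same answer, so no central index is consulted.
    """
    digest = hashlib.md5(handle.encode("utf-8")).digest()
    index = int.from_bytes(digest[-4:], "big") % len(servers)
    return servers[index]

servers = ["hs1.example.org", "hs2.example.org", "hs3.example.org"]
print(select_server("cnri.dlib/arms-09", servers))
```

With a plain modulus, adding or removing a server changes `len(servers)` and reassigns many handles. Any scheme like this therefore has to redistribute records whenever the set of servers changes.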
Handle Record for a Digital Object
Handle: cnri.dlib/arms-09
Type    Data
Adm     Admin Data
Adm     Admin Data
URL     http://www.cnri/xyz
RAP     merlin.dlib.org
NEW     orb:#cornell[]norb
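A handle record is a set of typed values. A client can pick out the values whose type it understands; for example, a web browser needs the URL. The sketch below models the record as a list of (type, data) pairs. The class and function names are hypothetical, and the data is copied from the record above.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class HandleValue:
    value_type: str   # e.g. "Adm", "URL", "RAP", "NEW"
    data: str

# The record for cnri.dlib/arms-09, as a list of typed values.
record = [
    HandleValue("Adm", "Admin Data"),
    HandleValue("Adm", "Admin Data"),
    HandleValue("URL", "http://www.cnri/xyz"),
    HandleValue("RAP", "merlin.dlib.org"),
    HandleValue("NEW", "orb:#cornell[]norb"),
]

def values_of_type(record: List[HandleValue], wanted: str) -> List[str]:
    """Return the data of every value whose type matches `wanted`."""
    return [v.data for v in record if v.value_type == wanted]

print(values_of_type(record, "URL"))  # ['http://www.cnri/xyz']
```

In this sketch types are plain strings, so a client looking for "URL" simply skips values of types it does not recognize, such as "NEW".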
Address Rules
The Global Handle Service stores:
– a record for each naming authority
– a record for each local handle service
The record for each naming authority includes:
– the home handle service for that naming authority
For each handle, the home handle service stores:
– the handle record
Resolving a Handle Without Caches
Handle cnri.dlib/wya in Global G
• The client sends the query cnri.dlib/wya to the Global Handle Service, G.
• G holds cnri.dlib/wya and returns the handle data.
Resolving a Handle Without Caches
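Handle cnri.dlib/wya in Home Service abc
• The client sends the query cnri.dlib/wya to the Global Handle Service, G.
• G returns a pointer to abc, the home handle service for cnri.dlib.
• The client sends the query cnri.dlib/wya to abc.
• abc holds cnri.dlib/wya and returns the handle data.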
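Both cases follow the address rules. The Global Handle Service either holds the handle itself or knows the home service of the handle's naming authority. The in-memory sketch below assumes at most one referral step and uses hypothetical class and function names. A real client talks to these services over the network.

```python
from typing import Any, Dict, List, Tuple

class HandleService:
    """In-memory stand-in for a handle service (hypothetical class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, List[str]] = {}       # handle -> handle data
        self.homes: Dict[str, "HandleService"] = {}   # naming authority -> home service

    def query(self, handle: str) -> Tuple[str, Any]:
        """Return ("data", handle_data) or ("referral", home_service)."""
        if handle in self.records:
            return "data", self.records[handle]
        authority = handle.split("/", 1)[0]
        home = self.homes.get(authority)
        if home is not None and home is not self:
            return "referral", home
        raise KeyError(handle)


def resolve(handle: str, global_service: HandleService) -> List[str]:
    """Resolve without caches: ask Global first, then follow one referral."""
    kind, answer = global_service.query(handle)
    if kind == "referral":
        kind, answer = answer.query(handle)
    if kind != "data":
        raise LookupError(f"no handle data for {handle}")
    return answer


# Second case: the naming authority cnri.dlib has home service abc.
G = HandleService("Global")
abc = HandleService("abc")
G.homes["cnri.dlib"] = abc
abc.records["cnri.dlib/wya"] = ["handle data"]
print(resolve("cnri.dlib/wya", G))  # ['handle data']
```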
Caching Handle Service
Figure: a client queries a caching server, which contains a hash, a hash table, and a cache.
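A caching server answers repeated queries from its own cache, so only the first query for a handle goes on to the handle services. The sketch below assumes an unbounded cache with no expiry; a real cache would need some expiry rule so that changed records are eventually seen. A Python dict serves as the hash table, and the names are hypothetical.

```python
from typing import Callable, Dict, List

class CachingResolver:
    """Hypothetical caching server: a dict (a hash table) maps handles to
    previously fetched handle data."""

    def __init__(self, upstream: Callable[[str], List[str]]) -> None:
        self.upstream = upstream          # slow path, e.g. Global -> home service
        self.cache: Dict[str, List[str]] = {}
        self.misses = 0                   # queries that had to go upstream

    def resolve(self, handle: str) -> List[str]:
        if handle not in self.cache:      # miss: fetch once and remember
            self.misses += 1
            self.cache[handle] = self.upstream(handle)
        return self.cache[handle]


records = {"cnri.dlib/wya": ["handle data"]}   # stand-in for the handle services
server = CachingResolver(lambda h: records[h])
server.resolve("cnri.dlib/wya")
server.resolve("cnri.dlib/wya")
print(server.misses)  # 1
```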
Handle Servers
Replication
All data is replicated at several sites for performance and reliability.
Figure: handle server sites in Los Angeles, CA and Washington, DC.
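Because every site holds a full copy of the data, a client can fall back to another site when one is unreachable. The sketch below assumes replicas are tried in a fixed order; a client might instead prefer the nearest site for performance. The site functions stand in for network calls, and their behavior is hypothetical.

```python
from typing import Callable, List, Sequence

Replica = Callable[[str], List[str]]   # stand-in for a query to one site

def query_replicas(handle: str, replicas: Sequence[Replica]) -> List[str]:
    """Ask each replica in order and return the first answer received."""
    failures: List[str] = []
    for replica in replicas:
        try:
            return replica(handle)
        except ConnectionError as exc:    # this site is unreachable; try the next
            failures.append(str(exc))
    raise ConnectionError(f"all replicas failed for {handle}: {failures}")


def los_angeles(handle: str) -> List[str]:   # hypothetical site that is down
    raise ConnectionError("Los Angeles unreachable")

def washington(handle: str) -> List[str]:    # hypothetical site that answers
    return ["handle data"]

print(query_replicas("cnri.dlib/wya", [los_angeles, washington]))  # ['handle data']
```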
Applications of Identifiers
The challenges:
• Persistent, unique identifiers
• Eliminate broken links
• Control duplicates
Applications:
• On-line publication
• Registration
• Citation (reference links)
• Collection management
• Archives
DOIs and URNs in Action
Figure: a user presents a DOI, and the Handle System resolves it to the publisher.
Flexibility for Publisher
Every publisher can have a different system.
Figure: DOIs pointing to a database, a warehouse, and a repository.
Reorganization by Publisher
The publisher can create a new system.
Figure: the same DOIs, now pointing to a database and repositories.
Change of Publisher
Figure: a user, a DOI, the Handle System, and two publishers, Halfmoon and Millenium.
Citation
Figure: User 1 and User 2 both use a DOI, resolved by the Handle System, to reach the publisher.
Catalogs and Indexes
Figure: a user, a search system, a DOI, the Handle System, and the publisher.
Copyright Registration
Figure: a user, a copyright registry, a DOI, the Handle System, and the publisher Halfmoon.
Multiple Copies
Figure: a user's DOI, resolved by the Handle System to one of two copies, at Halfmoon Europe or Halfmoon USA.
Archives
Figure: a user's DOI, resolved by the Handle System to an archive.
Reference Linking: The Problem
Generic: Given the information in a standard citation, how does one get to the thing to which the citation refers?
Specific: Given the information in a citation to a journal article, how does a user get from the citation to an appropriate copy of the article?
The General Model
Components: publisher, reference database, location database, content, and client.
• The publisher places information in the databases.
• The client sends a citation to the reference database, which returns identifiers.
• The client sends an identifier to the location database, which returns URLs.
• The client uses a URL to retrieve the content.
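The general model is a two-step lookup: citation to identifier, then identifier to location. The sketch below uses in-memory dictionaries for the two databases. The citation, identifier, and URL are hypothetical placeholders, and matching on an exact citation string is a simplifying assumption.

```python
from typing import Dict, List, Optional

# Hypothetical contents placed by a publisher in the two databases.
reference_db: Dict[str, str] = {          # citation -> identifier
    "Smith, J. Journal of Examples 12(3), 45 (1999)": "10.1234/example-1",
}
location_db: Dict[str, List[str]] = {     # identifier -> URLs of the content
    "10.1234/example-1": ["http://publisher.example/articles/1"],
}

def link_citation(citation: str) -> Optional[str]:
    """Follow the general model: citation -> identifier -> URL."""
    identifier = reference_db.get(citation)      # step 1: reference database
    if identifier is None:
        return None                              # citation not recognized
    urls = location_db.get(identifier, [])       # step 2: location database
    return urls[0] if urls else None             # step 3: URL used to fetch content

print(link_citation("Smith, J. Journal of Examples 12(3), 45 (1999)"))
```

Keeping the two databases separate means the location database can change, for example when content moves, without touching the identifiers stored in the reference database.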
Target of Citations
IFLA model
• Work
• Expression
• Manifestation
• Item
Citations can refer to any of these levels, but citations to journal articles usually refer to the work.
Identifiers
• Are identifiers necessary?
– Persistence
– Flexible targets
• Examples:
– PubMed ID, BibCode, DOI, etc.
How are Identifiers Obtained?
Often the client knows the citation, but not the identifier.
• In the general model, identifiers are obtained by searching the reference database.
• In limited domains, identifiers can be calculated from metadata.
• The identifier may be embedded in the citation.
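Calculating identifiers from metadata works when the identifier is a deterministic function of the citation's metadata, as with the BibCode listed among the examples of identifiers. The sketch below loosely follows the BibCode layout. The real scheme has further rules that are not modeled here, `bibcode_like` is a hypothetical name, and the metadata in the usage line is invented.

```python
def bibcode_like(year: int, journal: str, volume: str, page: str,
                 author_surname: str, qualifier: str = ".") -> str:
    """Compute a 19-character key from citation metadata.

    Simplified BibCode-style layout, padded with '.':
    year(4) journal(5, left-justified) volume(4, right-justified)
    qualifier(1) page(4, right-justified) initial of first author(1).
    """
    return (f"{year:04d}"
            f"{journal[:5]:.<5}"
            f"{volume[-4:]:.>4}"
            f"{qualifier[:1] or '.'}"
            f"{page[-4:]:.>4}"
            f"{author_surname[:1].upper()}")

print(bibcode_like(1999, "ApJ", "512", "100", "Smith"))  # 1999ApJ...512..100S
```

Anyone who holds the same metadata computes the same key, so no database lookup is needed. However, the key is only as reliable as the metadata it is built from.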
Reference Database Lookup
• Static: Reference links are established once and for all.
– Current model in journal publishing
– Not suitable for general user queries
• Dynamic: Reference links are established on demand.
– Provides links based on the most recent information
– Success cannot be guaranteed
Quality of metadata in reference database(s) is crucial.
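A dynamic lookup searches the reference database at the moment of the request. The sketch below assumes matching on journal, volume, and first page after simple normalization. The records and field names are hypothetical. Because the match depends on normalized metadata, a misspelled journal title in the database would defeat it, which is why metadata quality is crucial.

```python
import re
from typing import Dict, List, Optional

def normalize(text: str) -> str:
    """Lower-case and collapse punctuation so trivial differences in a
    citation's spelling do not prevent a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

# Hypothetical reference database: each entry carries an identifier.
REFERENCE_DB: List[Dict[str, str]] = [
    {"journal": "Journal of Examples", "volume": "12", "page": "45",
     "identifier": "10.1234/example-1"},
]

def lookup(journal: str, volume: str, page: str) -> Optional[str]:
    """Dynamic lookup: search the database when the request arrives.
    Returns None when no record matches, since success is not guaranteed."""
    key = (normalize(journal), normalize(volume), normalize(page))
    for record in REFERENCE_DB:
        candidate = (normalize(record["journal"]), normalize(record["volume"]),
                     normalize(record["page"]))
        if candidate == key:
            return record["identifier"]
    return None

print(lookup("journal of examples.", "12", "45"))  # 10.1234/example-1
```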
Metadata in Reference Database
• Existing schemes
– Considerable agreement on minimal elements
– Considerable differences in details and syntax
Minimal Metadata Elements for a Journal Article
• Title of journal article
• Creator(s)
• Journal title
• Date of publication
• Enumeration (e.g., volume and issue)
• Location (e.g., page or article number)
• Type (e.g., "journal article")
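The record below has one field for each minimal element and stores dates and enumeration as plain strings. A real scheme would fix the syntax of each element, and that is where existing schemes differ. The class name is hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple

@dataclass
class JournalArticleMetadata:
    """The minimal elements for a journal article, one field each."""
    title: str                      # title of journal article
    creators: Tuple[str, ...]       # creator(s)
    journal_title: str
    date: str                       # date of publication
    enumeration: str                # e.g. volume and issue
    location: str                   # e.g. page or article number
    resource_type: str = "journal article"

    def missing(self) -> List[str]:
        """Names of elements left empty, which cannot be used for matching."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

rec = JournalArticleMetadata(title="An Example", creators=("Smith, J.",),
                             journal_title="Journal of Examples", date="1999",
                             enumeration="12(3)", location="")
print(rec.missing())  # ['location']
```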
Resolution of Identifier
• Choice of resolver (distributed resolution)
– Simple model: identifier determines resolver
• Selection from multiple copies (selective resolution)
– Performance criteria
– Economic and related criteria
– User requirements
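Selective resolution filters the available copies by user requirements and economic criteria, then ranks the survivors by a performance criterion. The fields, the filter-then-rank order, and the example URLs below are assumptions for illustration; other policies are equally plausible.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Copy:
    url: str
    latency_ms: float    # performance criterion
    cost: float          # economic criterion
    formats: Set[str]    # user requirement, e.g. {"pdf", "html"}

def select_copy(copies: List[Copy], wanted_format: str,
                max_cost: float) -> Optional[Copy]:
    """Keep copies meeting the user's format and budget, then pick the fastest."""
    eligible = [c for c in copies
                if wanted_format in c.formats and c.cost <= max_cost]
    return min(eligible, key=lambda c: c.latency_ms, default=None)

copies = [
    Copy("http://europe.example/a", 40.0, 0.0, {"pdf"}),
    Copy("http://usa.example/a", 120.0, 0.0, {"pdf", "html"}),
]
print(select_copy(copies, "pdf", 0.0).url)   # http://europe.example/a
print(select_copy(copies, "html", 0.0).url)  # http://usa.example/a
```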
Interoperability
Several reference linking services under development:
• PubMed
• Astrophysics Data System
• DOI reference service
• Los Alamos National Laboratory internal reference service
What levels of agreement and tools are needed for crosslinking?