Implementing Reference Linking


Implementing Reference Linking in PROLA
Mark Doyle
Manager, Product Development
The American Physical Society
http://prola.aps.org/
September 26, 2002
CrossRef - Boston, MA
The American Physical Society
- 40,000+ members
- Founded in 1899
- Mission: “diffusion and advancement of knowledge of physics”
- Publisher of Physical Review journals and Reviews of Modern Physics
- 14,500 articles per year (100,000 pages per year)
What is PROLA?
- Physical Review Online Archive
- Covers all APS journals from 1893-present, but only 1893-1998 available
- Separate subscription from current content journals
- 1 year “migrated” each year
- APS corpus is 330,000 articles
The Basic Problem
- References in an article’s bibliography need to be linked to the full-text article
- Citation metadata given: author, journal, volume, page (or other enumeration)
- Identify metadata, query linking partners, store results, create links for end users
- Keep links up to date, keep system robust and fast, keep costs low
Three General Approaches
- Static - query for links at time of publication, create a static HTML file with the appropriate links, serve that.
- Dynamic - store linking information in a live database, which is queried at the time the user requests the web page
- Semi-dynamic - pre-query links, update them periodically, generate HTML with links dynamically
Semi-Dynamic Approach
- Lower investment in database technology
- Lower costs to mirror
- Fast for the user
- High availability
- Scales well with usage
APS Process Overview
[Diagram of the processing pipeline. Labeled components: Full Text SGML/XML, Bibliographic XML, Linking Database (Oracle), XREF, CAS, AIP, ADS, XML Link Metadata, Filesystem, Apache, mod_perl, HTML, End User.]
XML File
<references> ….
<citation cid="C3">
<ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
<ref abbrev="prevau"><article><refauth>J. J. Boland</refauth>, <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref>
</citation>
…..
Parse XML Bibliographic Record
- Parse XML tagged references
- Article’s DOI suffix becomes the primary key
- Journal, volume, page information becomes a reference ID (J. Vac. Sci. Technol. A 10, 2458 gets mapped to JVacSciTechnolA.10.2458)
- Table for DOI, reference id, citation number, reference number
- Second table with article metadata for querying process.
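The parsing step can be sketched in Python as follows. This is an illustration only, not the APS code: the rule in make_ref_id (keep letters and digits of the journal abbreviation, join with dots) is inferred from the two examples in the slides, and the function names and the handling of the "C" prefix on cid are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Sample adapted from the "XML File" slide (elisions removed so it parses).
SAMPLE = """<references>
<citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal>
<volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
<ref abbrev="prevau"><article><refauth>J. J. Boland</refauth>, <journal>J. Vac. Sci. Technol. A</journal>
<volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref></citation>
</references>"""


def make_ref_id(journal, volume, page):
    """Assumed rule: strip everything but letters and digits from the journal
    abbreviation, then join journal, volume, and page with dots."""
    key = re.sub(r"[^A-Za-z0-9]", "", journal)
    return f"{key}.{volume.strip()}.{page.strip()}"


def parse_references(xml_text):
    """Yield (citation number, reference number, reference ID) tuples."""
    root = ET.fromstring(xml_text)
    for citation in root.iter("citation"):
        # Assumption: cid "C3" corresponds to citation number 3.
        cid = int(citation.get("cid").lstrip("C"))
        for rid, ref in enumerate(citation.iter("ref"), start=1):
            article = ref.find("article")
            if article is None:
                continue  # non-article references are skipped in this sketch
            journal = article.findtext("journal")
            volume = article.findtext("volume")
            page = article.findtext("pages")
            if not (journal and volume and page):
                continue  # incomplete metadata cannot form a reference ID
            yield cid, rid, make_ref_id(journal, volume, page)


rows = list(parse_references(SAMPLE))
```

Each tuple in rows corresponds to one row of the first table (citation number, reference number, reference id); the citing article's DOI suffix would be added by the caller.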
Database Schema
- ARTICLES (Phys. Rev. DOI, citation number, reference number, reference id)
- ARTICLE_DATA (ref_id, first author, journal, volume, issue, enumeration, year)
- ARTICLE_LINKS (ref_id, link type, link data)
- QUERY_DATES (ref_id, link type, query date).
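For readers who want to experiment, the four tables might be declared roughly as below. The production database is Oracle; this sketch uses Python's built-in sqlite3 as a stand-in, and the column names, types, and key choices are assumptions based only on the field lists above.

```python
import sqlite3

# Column names, types, and keys are assumptions; only the field lists
# come from the slide.
SCHEMA = """
CREATE TABLE articles (
    doi_suffix       TEXT NOT NULL,   -- Phys. Rev. DOI suffix of the citing article
    citation_number  INTEGER NOT NULL,
    reference_number INTEGER NOT NULL,
    ref_id           TEXT NOT NULL,
    PRIMARY KEY (doi_suffix, citation_number, reference_number)
);
CREATE TABLE article_data (
    ref_id       TEXT PRIMARY KEY,
    first_author TEXT,
    journal      TEXT,
    volume       TEXT,
    issue        TEXT,
    enumeration  TEXT,
    year         INTEGER
);
CREATE TABLE article_links (
    ref_id    TEXT NOT NULL,
    link_type TEXT NOT NULL,          -- e.g. XREF, ADS, CAS, SPIN, INSPEC
    link_data TEXT NOT NULL,
    PRIMARY KEY (ref_id, link_type)
);
CREATE TABLE query_dates (
    ref_id     TEXT NOT NULL,
    link_type  TEXT NOT NULL,
    query_date TEXT NOT NULL,         -- ISO date of the last unmatched query
    PRIMARY KEY (ref_id, link_type)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Store one link taken from the slides and read it back.
conn.execute(
    "INSERT INTO article_links VALUES (?, ?, ?)",
    ("JVacSciTechnolA.10.2458", "XREF", "10.1116/1.577984"),
)
links = conn.execute(
    "SELECT link_type, link_data FROM article_links WHERE ref_id = ?",
    ("JVacSciTechnolA.10.2458",),
).fetchall()
```

Keying ARTICLE_LINKS on (ref_id, link type) means each reference is resolved once and shared by every article that cites it.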
Query CrossRef and others
- Nightly query of CrossRef for new references that don’t have a DOI
- Track batches in a Scheduler application
- Table tracks link source (XREF, ADS, CAS, SPIN, INSPEC) and linking data (DOI for XREF) for each reference ID.
- Query dates table tracks when we last queried something that didn’t match
- Periodically rerun queries which haven’t matched
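One way to pick the references for a nightly batch is sketched below: take every reference with no link of the given type that has either never been queried or was last queried before a cutoff. The 30-day retry interval, the function name, the sample reference IDs, and the placeholder DOI are all assumptions for illustration; sqlite3 again stands in for Oracle.

```python
import sqlite3
from datetime import date, timedelta


def refs_to_query(conn, link_type, today, retry_after_days=30):
    """Return reference IDs lacking a link of link_type that were never
    queried or were last queried at least retry_after_days ago."""
    cutoff = (today - timedelta(days=retry_after_days)).isoformat()
    rows = conn.execute(
        """
        SELECT d.ref_id
        FROM article_data AS d
        LEFT JOIN article_links AS l
               ON l.ref_id = d.ref_id AND l.link_type = :t
        LEFT JOIN query_dates AS q
               ON q.ref_id = d.ref_id AND q.link_type = :t
        WHERE l.ref_id IS NULL
          AND (q.query_date IS NULL OR q.query_date <= :cutoff)
        ORDER BY d.ref_id
        """,
        {"t": link_type, "cutoff": cutoff},
    )
    return [row[0] for row in rows]


# Minimal tables with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article_data (ref_id TEXT PRIMARY KEY);
CREATE TABLE article_links (ref_id TEXT, link_type TEXT, link_data TEXT);
CREATE TABLE query_dates (ref_id TEXT, link_type TEXT, query_date TEXT);
INSERT INTO article_data VALUES ('A.1.1'), ('B.2.2'), ('C.3.3'), ('D.4.4');
INSERT INTO article_links VALUES ('A.1.1', 'XREF', '10.0000/placeholder');
INSERT INTO query_dates VALUES ('B.2.2', 'XREF', '2002-09-20');
INSERT INTO query_dates VALUES ('C.3.3', 'XREF', '2002-06-01');
""")

pending = refs_to_query(conn, "XREF", today=date(2002, 9, 26))
```

Here A.1.1 is skipped because it already has a link and B.2.2 because it was queried recently; C.3.3 (stale) and D.4.4 (never queried) are selected.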
Links in the Database
SQL> select link_type,link_data from article_links where ref_id='JVacSciTechnolA.10.2458';

LINK_TYPE  LINK_DATA
---------  ------------------------------
XREF       10.1116/1.577984
INSPEC     JVTAD600001000000400245800000B
SPIN       JVTAD6000010000004002458000001
ADS        1992JVST...10.2458B
CAS        1:CAS:528:DyaK38XltlygtLg%3D
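Rows like these can be serialized into a per-article XML file of the kind shown on the XML Linking File slide. The following is a rough sketch using Python's standard library; the function name and the shape of its inputs are assumptions, and only the element and attribute names come from the slides.

```python
import xml.etree.ElementTree as ET


def build_link_file(references, links):
    """Build an <apslinks> document.

    references: list of (citation number, reference number, ref_id)
    links: dict mapping ref_id to a list of (link_type, link_data)
    """
    root = ET.Element("apslinks")
    for cid, rid, ref_id in references:
        citlink = ET.SubElement(root, "citlink", cid=str(cid), rid=str(rid))
        for link_type, link_data in links.get(ref_id, []):
            link = ET.SubElement(citlink, "link", ref_id=ref_id, type=link_type)
            link.text = link_data
    return ET.tostring(root, encoding="unicode")


xml_text = build_link_file(
    [(3, 2, "JVacSciTechnolA.10.2458")],
    {
        "JVacSciTechnolA.10.2458": [
            ("XREF", "10.1116/1.577984"),
            ("INSPEC", "JVTAD600001000000400245800000B"),
        ]
    },
)
```

Regenerating this file whenever the periodic queries find new matches keeps the served pages current without a live database lookup per request.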
Statistics
- 330,000 articles (1893-present)
- 6.4 million (journal) references
- 3 million Phys. Rev. references
- 1.4 million unique non-APS references
- 210,000 CrossRef links (1.8 million links total)
- Folding in the APS references which are also in CrossRef, about 30% of our references are in CrossRef
XML Linking File
<?xml version="1.0"?>
<apslinks>
<citlink cid="1" rid="1">
<link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link></citlink> …
<citlink cid="3" rid="2">
<link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
<link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
….</apslinks>
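At request time, a linking file like this can be turned into HTML anchors. The production system does this in mod_perl; the Python sketch below only illustrates the idea. The URL_TEMPLATES table and render_links function are hypothetical, and only the XREF type is mapped (to the public DOI resolver at doi.org) because the URL patterns for the other link types are not given in the slides.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Complete sample based on the slide (elisions removed so it parses).
LINK_FILE = """<apslinks>
<citlink cid="3" rid="2">
<link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
<link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
</citlink>
</apslinks>"""

# Hypothetical mapping from link type to URL pattern. Only XREF is filled in;
# other types are left out rather than guessed.
URL_TEMPLATES = {"XREF": "https://doi.org/{}"}


def render_links(link_file_xml, cid, rid):
    """Return HTML anchors for one reference of one citation."""
    root = ET.fromstring(link_file_xml)
    anchors = []
    for citlink in root.iter("citlink"):
        if citlink.get("cid") != str(cid) or citlink.get("rid") != str(rid):
            continue
        for link in citlink.iter("link"):
            link_type = link.get("type")
            template = URL_TEMPLATES.get(link_type)
            if template is None:
                continue  # no known URL pattern for this link type
            url = template.format(quote(link.text.strip(), safe="/"))
            label = html.escape(link_type)
            anchors.append(f'<a href="{html.escape(url, quote=True)}">{label}</a>')
    return " ".join(anchors)


snippet = render_links(LINK_FILE, cid=3, rid=2)
```

Adding a new linking target then amounts to adding a link type to the database and one entry to the template table.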
Rendered Links
Conclusions
- Simple and pragmatic solutions work
- Marked up content makes it all fit together (obviates the need for extensive labor)
- Modest resources are needed to implement and maintain the system
- Scheme is easily expanded to include other linking targets
Contact information


http://prola.aps.org/
[email protected]