From eprint archives to open archives and OAI

From eprint archives to
open archives and OAI:
the Open Citation project
By The Open Citation Project team
Presented by Steve Hitchcock, Southampton University
These slides prepared for the JISC/NSF Digital Libraries Initiative (DLI)
All Projects Meeting, Edinburgh, 24-25th June 2002
OpCit is a joint JISC-NSF
International Digital Libraries Project 1999-2002
About this presentation
The aims are to:
• Show progress since the Stratford All-Projects meeting in 2000
• Demonstrate new services developed by the project
• Highlight the relationship between the project and the Open
Archives Initiative
• Outline key tasks remaining and which services will continue
beyond the Open Citation Project
Recap 1: principal partners
• Southampton University, IAM (Intelligence, Agents, Multimedia)
Research Group, PI Stevan Harnad
Citation-ranked search, EPrints.org, user surveys
• Cornell University, Digital Library Research Group, PI Carl Lagoze
Architecture for reference linking, experiments with the ACM
Digital Library and D-Lib magazine, OAI technical support center
• arXiv.org, Paul Ginsparg
Now based at Cornell University. Still the largest archive of freely
accessible author-deposited scientific papers
The Open Citation Project:
deliverables
The Open Citation Project (OpCit) is developing software and
services to support the Open Archives Initiative (OAI). OpCit offers
OAI data providers and service providers:
• Citebase: citation-ranked search
• EPrints.org software: free software to build and manage OAI-compliant eprint archives
• API for reference linking, an interface on which reference
linking applications can be built
Recap 2: last time at Stratford
Reference links on pdf copies of papers
Citebase, a new interface to the
scholarly literature
Citebase, a citation-ranked
search engine
http://citebase.eprints.org/
“Google for the refereed literature”
Citebase is based on an open citation database
• Harvests metadata using OAI-PMH
• Extracts reference lists from arXiv papers
• Provides search ranked by impact (and other measures), based on reference data
• Re-exports metadata + references
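The idea of ranking by impact from reference data can be sketched as follows. This is a minimal illustration, not Citebase's actual code: the class and method names are hypothetical, "impact" is assumed here to mean a raw citation count, and the paper identifiers are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derive citation counts from reference lists,
// then order search hits by those counts.
class CitationRankSketch {

    // references maps a paper id to the ids of the papers it cites.
    static Map<String, Integer> citationCounts(Map<String, List<String>> references) {
        Map<String, Integer> counts = new HashMap<>();
        for (String paper : references.keySet()) {
            counts.putIfAbsent(paper, 0); // uncited papers still get a score
        }
        for (Map.Entry<String, List<String>> e : references.entrySet()) {
            // A set ensures a reference repeated in one list counts once.
            for (String cited : new HashSet<>(e.getValue())) {
                if (!cited.equals(e.getKey())) { // skip self-references
                    counts.merge(cited, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Most-cited first; ties are broken by id so the order is stable.
    static List<String> rank(List<String> hits, Map<String, Integer> counts) {
        List<String> ranked = new ArrayList<>(hits);
        ranked.sort(Comparator
                .comparingInt((String id) -> counts.getOrDefault(id, 0))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B", "C"));
        refs.put("B", Arrays.asList("C"));
        refs.put("C", new ArrayList<>());
        refs.put("D", Arrays.asList("C", "B"));
        // Prints [C, B, A, D]
        System.out.println(rank(Arrays.asList("A", "B", "C", "D"), citationCounts(refs)));
    }
}
```

A real service would combine such a score with text matching and could rank by other measures; only the counting step is shown here.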
Evaluating Citebase
The evaluation is aimed at users of arXiv, and all others who use
bibliographic services to access the refereed journal literature.
How you can contribute. Find the evaluation form at
http://www.ecs.soton.ac.uk/~aw01r/citebase/evalForm1.htm
Aims of the evaluation:
• Discover the user’s awareness of related services
• Assess usability with a practical exercise
• Invite the user’s views on the main features
• Assess the level of user satisfaction with the service
Citebase: further developments
• OpenURL-enabled: pointing Citebase links at library and journal
services
• Google interface using DP9: getting Citebase results, and open
archives, into Google
• Metadata format and XML schema for citations: making
citation metadata harvestable via OAI-PMH. Possible formats
include:
– Academic Metadata Format: a ‘local profile’ format, some
collaborative experiments performed within OpCit
– OpenURL metadata, moving towards NISO standardisation
Recap 3: API for reference linking
getLinkedText – contents of the paper, reference-linked plus lots
of metadata for the paper
getReferenceList – this paper’s references
getCurrentCitationList – the list of works citing this paper
(best knowledge)
getMyData – metadata for this paper
Surrogates in the API
Based on an automatic analysis of the work, a surrogate for a
scholarly work (and for other works, for citations) consists of
the following three XML files:
• Bibliographic data for the scholarly work
• References contained in that work, and their contexts
within the full text
• Citations of that work
API: progress and evaluation
New features
• Citation interface added: surrogates can now collect citations
• Graphic citeref tool (demoed on ResearchIndex)
API tested on D-Lib Magazine and the ACM Digital Library. Try the demo at
http://cs-tr.cs.cornell.edu/RefLinkingDemo/
Performance (in terms of accuracy of data extracted):
• Reference analysis: 86.7%
• Item analysis (bib data, contexts, and references for a given paper): 82.42%
Implementability
• Simple interface: Surrogate s = new Surrogate (some-url)
• Portable: written in Java, has run in Solaris, Win2K, and NT4
• Installation: API source code plus public domain jar files
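To make the "simple interface" point concrete, here is a toy stand-in for a surrogate. It is not the OpCit API itself: the class name ToySurrogate, the addReference helper, and the return types are all assumptions made for illustration (the real surrogates are described as XML files). Only the method names getReferenceList, getCurrentCitationList and getMyData come from the slides, and the URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in, NOT the Cornell implementation.
class ToySurrogate {
    private final String url;
    private final Map<String, String> metadata = new LinkedHashMap<>();
    private final List<String> references = new ArrayList<>();
    private final List<String> citations = new ArrayList<>();

    // Mirrors the one-line construction: a surrogate is created from a URL.
    ToySurrogate(String url) {
        this.url = url;
        metadata.put("url", url);
    }

    // Assumed helper: recording a reference also lets the cited
    // surrogate collect the corresponding citation.
    void addReference(ToySurrogate cited) {
        references.add(cited.url);
        cited.citations.add(this.url);
    }

    // This paper's references.
    List<String> getReferenceList() {
        return Collections.unmodifiableList(references);
    }

    // Works known so far to cite this paper (best knowledge).
    List<String> getCurrentCitationList() {
        return Collections.unmodifiableList(citations);
    }

    // Metadata for this paper.
    Map<String, String> getMyData() {
        return Collections.unmodifiableMap(metadata);
    }

    public static void main(String[] args) {
        ToySurrogate citing = new ToySurrogate("http://example.org/paper-a");
        ToySurrogate cited = new ToySurrogate("http://example.org/paper-b");
        citing.addReference(cited);
        System.out.println(citing.getReferenceList());
        System.out.println(cited.getCurrentCitationList());
    }
}
```

The sketch shows why references and citations are two views of the same link: adding one reference updates the citing work's reference list and the cited work's citation list together.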
EPrints.org software
http://www.eprints.org/
Generates eprints archives that are compliant with the Open
Archives Protocol for Metadata Harvesting. EPrints is free (GPL)
software. It is aimed at organisations and communities.
EPrints v. 2.0 released February 2002 (now on v. 2.0.1, which fixes
bugs and typos). Features:
• Internationalised metadata stored as Unicode
• Support for multiple archives on one server
• Improved user interface
OpCit and OAI
• OAI aggregator: collecting and caching the results from OAI
data providers to improve the efficiency of data harvesting
• OAI infrastructure: proxies, caches, gateways. Improve
interoperability, scalability and reliability of OAI services.
Joint work with Old Dominion University, see paper A Scalable
Architecture for Harvest-Based Digital Libraries - The
ODU/Southampton Experiments
http://arxiv.org/abs/cs.DL/0205071
• OAI Registration and Validation work is performed at Cornell
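Whether performed by an aggregator, a cache or an end service, harvesting comes down to HTTP requests against a data provider's base URL. The sketch below builds two such requests using the ListRecords verb and the metadataPrefix, from and resumptionToken arguments of OAI-PMH 2.0. The class name, the base URL and the token value are hypothetical, and fetching and parsing the XML response is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that only builds OAI-PMH request URLs.
class OaiRequestSketch {

    // First request of a harvest: ask for records in a given metadata
    // format, optionally only those changed since the "from" date.
    static String listRecordsUrl(String baseUrl, String metadataPrefix, String from) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListRecords");
        params.put("metadataPrefix", metadataPrefix);
        if (from != null) {
            params.put("from", from);
        }
        return baseUrl + "?" + encode(params);
    }

    // Follow-up request: in OAI-PMH 2.0 resumptionToken is an exclusive
    // argument, so only the verb is sent alongside it.
    static String resumeUrl(String baseUrl, String resumptionToken) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListRecords");
        params.put("resumptionToken", resumptionToken);
        return baseUrl + "?" + encode(params);
    }

    // Joins key=value pairs with '&', URL-encoding each value.
    private static String encode(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/oai"; // placeholder base URL
        System.out.println(listRecordsUrl(base, "oai_dc", "2002-06-01"));
        System.out.println(resumeUrl(base, "token/42")); // placeholder token
    }
}
```

An aggregator would issue these requests once against each data provider and answer downstream harvesters from its cache, which is where the efficiency gain comes from.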
EPrints and OAI
• EPrints feeds repository URLs straight into the OAI
registration process (if so desired by the EPrints administrator)
• A scan of the OAI database of registered sites shows many
sites use EPrints software to create repositories
www.openarchives.org/Register/BrowseSites.pl
A repository administrator’s view
of OAI
“As we have introduced our repository to our faculty and staff, we
have emphasized the point that because they would be depositing
their material in an OAI-compliant archive, it would automatically
and painlessly be discoverable from various other points around the
globe. Luckily, we were right.”
Roy Tennant, eScholarship, California Digital Library, June 2002
OpCit user surveys and data mining
Maximising impact
Maximising access
Results from Mining the Social Life of an Eprint Archive
http://opcit.eprints.org/tdb198/opcit/
When interoperability is not enough: show authors what users do when
open access services are available
Key project tasks remaining
• Evaluation and reporting of the results
• Programmer's guide to using the API
• Journal and conference papers
• Final reports to JISC and NSF
After OpCit
OpCit formally ends in September 2002, but the
following services will continue to be developed
• Citebase
• EPrints.org
• OAI
What we have achieved; what we
have learned
• OAI is gathering momentum
• Software for building repositories is available
• Institutional archives are being created, but need to be filled by
authors
• Attracting authors requires evidence of real services that will improve
the visibility and impact of their works
• Such services are now available. Citation-ranked search and
reference linking are examples of OAI services that offer this
• The infrastructure supporting OAI services continues to be enhanced
• Resource discovery and current awareness are exemplar OAI services
now. Future services may include preservation risk management, and
personalization
Credits
Other contributors to the project include
• Technical development at Southampton is directed by Les Carr
• Research at Cornell by Donna Bergmark
• EPrints.org software is being developed by Chris Gutteridge
• Citebase is produced and managed by Tim Brody
• Project manager is Steve Hitchcock
A copy of these slides can be found on the OpCit Web site
http://opcit.eprints.org/. Look for Papers and Presentations
Contact Steve Hitchcock: [email protected]