
The OAI-PMH Harvester Plugin
for
The Omeka Content Management System
LIS 654
BUILDING DIGITAL LIBRARIES
FALL 2011
NOVEMBER 03, 2011
JAMES R. GRIFFIN III
100356891
Defining the OAI-PMH
• "The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is
a low-barrier mechanism for repository interoperability. Data Providers are
repositories that expose structured metadata via OAI-PMH. Service Providers
then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a
set of six verbs or services that are invoked within HTTP."¹
• Thus, the OAI-PMH is a means by which digital
repositories can openly and freely exchange and share
metadata detailing their collections with the world.
¹ Open archives initiative protocol for metadata harvesting. (2011). Retrieved from http://www.openarchives.org/pmh/
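As a rough illustration of the "six verbs invoked within HTTP", the sketch below lists the verbs defined by OAI-PMH 2.0 and builds the request URL for one of them. The helper name build_request_url and the example.org base URL are assumptions made for this example; they do not come from the slides or from Omeka.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Return the URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    # The verb and its arguments travel as ordinary HTTP query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder base URL; any OAI-PMH data provider could stand in here.
BASE_URL = "http://www.example.org/oai-pmh-repository/request"
print(build_request_url(BASE_URL, "Identify"))
print(build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Because every request is just a URL, the same strings can be pasted into a web browser to inspect the response by hand.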
Installing the OAI-PMH Harvester Plugin for Omeka
1. Download the plug-in from the following source:
http://omeka.org/add-ons/plugins/oai-pmh-harvester/
(Note: This is a ZIP archive [like other plug-ins for Omeka])
2. Upload the ZIP archive to the server wotan
(Note: This can be done using any scp client such as WinSCP)
3. Decompress the archive into the appropriate directory for your
installation of Omeka
(Note: This is typically the path /home/[USER NAME]/omeka/plugins/)
4. Using the web interface, install the harvester plug-in
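Step 3 can be carried out with any unzip tool. For those who prefer to script it, the following is a minimal sketch using Python's standard zipfile module. The function name unpack_plugin is an assumption made for this example, and the archive and directory paths must be supplied by the reader; nothing here is part of Omeka itself.

```python
import zipfile
from pathlib import Path


def unpack_plugin(archive_path, plugins_dir):
    """Extract a plug-in ZIP archive into an Omeka plugins directory.

    Returns the sorted names of the top-level entries in the archive.
    (Hypothetical helper; not part of Omeka.)
    """
    plugins_dir = Path(plugins_dir)
    if not plugins_dir.is_dir():
        raise FileNotFoundError(f"no such plugins directory: {plugins_dir}")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(plugins_dir)
        names = archive.namelist()
    # Report which folder(s) now sit directly under plugins_dir.
    return sorted({name.split("/", 1)[0] for name in names})


# Usage (paths are placeholders to be replaced with one's own):
#   unpack_plugin("/path/to/downloaded-plugin.zip",
#                 "/home/[USER NAME]/omeka/plugins/")
```

Step 4 still has to be completed in the Omeka web interface; extracting the archive only makes the plug-in available to install.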
The Purpose Behind the OAI-PMH
• Metadata shared using the OAI-PMH is structured in a uniform
manner, ensuring that metadata for all collections shared on the World
Wide Web can be harvested regardless of the specific application that hosts them
• For example, one institution can archive content using the Drupal
application as a repository, while another institution can archive
content using Omeka
• Using the OAI-PMH, both repositories can be configured to
exchange information detailing the contents of their archived
collections.
Repository Interoperability
• Unfortunately, not every digital repository has been
developed using the same framework (or even the
same programming language[s])
• Thus, if OAI-PMH were to attempt to institute
language-specific standards for exchanging metadata,
inevitably some repository application would be
developed in an unsupported language
• The solution to this is the software object
OAI-PMH Metadata Objects
• For the purposes of this presentation, a software object
is a means by which to structure data in a language-independent manner
• Because the Open Archives Initiative seeks to establish the
OAI-PMH as the definitive standard for the
exchange of repository metadata, language independence increases the
likelihood that future repository applications (some of
which will be written in languages that do not yet exist)
will still employ this protocol
OAI-PMH Metadata Objects
• The metadata objects are transferred over the HyperText
Transfer Protocol (HTTP)
• This means that no platform-specific binaries must be
employed in order to harvest OAI-PMH-compliant
metadata
• (e.g. Anyone can access information detailing the contents
of these archived collections using a web browser – you do
not need to purchase or install any additional software)
OAI-PMH Metadata Objects
• The metadata objects are serialized using the
eXtensible Markup Language (XML)
• This is mentioned for the sake of those who are enrolled in
LIS650, those who have previously taken LIS650, or those
who are familiar with web design
• For those unfamiliar with XML or web design itself, this
simply means that this metadata can be extended and
manipulated easily by web designers as well as developers
An Instance of an OAI-PMH Metadata Object
• In order to generate OAI-PMH-compliant metadata objects for one’s
collection, one must first install and configure another plugin:
The OAI-PMH Repository
(http://omeka.org/add-ons/plugins/oai-pmh-repository/)
• Retrieving metadata from the repository:
http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc
• The parameter “verb” specifies to wotan precisely what is being requested
  (e.g. A list of the records in my collections – “ListRecords”)
• The parameter “metadataPrefix” specifies to wotan precisely which
metadata format to use in the formatting of the response
  (e.g. “oai_dc” is the OAI’s format, which is based upon the Dublin Core element set)
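To make the roles of the two parameters concrete, the short sketch below splits the request URL shown above into the base URL of the repository and the arguments passed to it. It uses only Python's standard urllib.parse module; the variable names are chosen for this example.

```python
from urllib.parse import parse_qs, urlsplit

# The request URL from the slide above.
REQUEST_URL = (
    "http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request"
    "?verb=ListRecords&metadataPrefix=oai_dc"
)

parts = urlsplit(REQUEST_URL)

# parse_qs maps each parameter name to a list of values; each OAI-PMH
# argument appears once here, so the first value is taken.
arguments = {name: values[0] for name, values in parse_qs(parts.query).items()}
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

print(base_url)
# http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request
print(arguments)
# {'verb': 'ListRecords', 'metadataPrefix': 'oai_dc'}
```

Everything before the question mark identifies the repository; everything after it tells the repository what to return and in which format.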
An Instance of an OAI-PMH Metadata Object
This was retrieved by requesting the following resource:
http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc
<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2011-11-03T19:46:59Z</responseDate> <!-- When I requested this object -->
<request verb="ListRecords" metadataPrefix="oai_dc"> <!-- Which parameters were passed to wotan -->
http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request
</request>
<ListRecords> <!-- A detailed listing of the collection records -->
<record>
<header>
<identifier>oai:wotan.liu.edu/omeka/jgriffin/:5</identifier>
<datestamp>2011-10-22T00:48:49Z</datestamp> <!-- When this record was created or last modified -->
<setSpec>6</setSpec>
</header>
<metadata>
<oai_dc:dc
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.ope[...]">
<!-- The Dublin Core Elements -->
<dc:title>/src/bin/psql/psql.c</dc:title>
<dc:creator>Regents of the University of California</dc:creator>
<dc:publisher>
[…]
</metadata>
</record>
</ListRecords>
</OAI-PMH>
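A response like the one above can be read with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. SAMPLE_RESPONSE is an assumption: a cut-down, well-formed stand-in for the response above, with the namespace declarations written out and the elided elements simply left out. The function name list_records is likewise chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses in the oai_dc format.
NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed stand-in for the response above (elided parts omitted).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2011-11-03T19:46:59Z</responseDate>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:wotan.liu.edu/omeka/jgriffin/:5</identifier>
        <datestamp>2011-10-22T00:48:49Z</datestamp>
        <setSpec>6</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>/src/bin/psql/psql.c</dc:title>
          <dc:creator>Regents of the University of California</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records(xml_text):
    """Return one dict per <record> with a few header and Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NAMESPACES):
        header = record.find("oai:header", NAMESPACES)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NAMESPACES),
            "datestamp": header.findtext("oai:datestamp", namespaces=NAMESPACES),
            "title": record.findtext(".//dc:title", namespaces=NAMESPACES),
            "creator": record.findtext(".//dc:creator", namespaces=NAMESPACES),
        })
    return records


for item in list_records(SAMPLE_RESPONSE):
    print(item["identifier"], "|", item["title"])
```

The element names must be qualified with their namespaces when searching, which is why the lookup paths carry the oai: and dc: prefixes.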
Harvesting Metadata from Remote Repositories in Omeka
• The plugin’s utility lies in its ability to directly import data
detailing items archived in a remote repository into one’s
own repository
• Conceptually, the mechanisms underlying this process are
similar to those used in the practice of “copy cataloging”
Harvesting Metadata from Remote Repositories in Omeka
• As previously specified, the server being harvested must be running an
OAI-PMH repository for the archived collections
• In order to demonstrate this, I can harvest from my own
OAI-PMH repository:
http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request
• …as well as from the Bibliothèque Numérique of l’Université Rennes 2*:
http://bibnum.univ-rennes2.fr/oai-pmh-repository/request?verb=ListRecords&metadataPrefix=oai_dc
*This source was specified by Sheila Brennan of the Roy Rosenzweig Center for History and New Media.
Please see http://omeka.org/blog/2011/08/29/do-you-share-your-data/
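For those curious about what a harvest involves at the protocol level, here is a minimal sketch in Python. It is not the Omeka plugin's implementation (the plugin is written in PHP); it only illustrates the general pattern of issuing ListRecords requests and following the resumptionToken that a repository returns when a list is split across several responses. The function names and the injectable fetch parameter are assumptions made for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_over_http(url):
    """Default fetcher: return the body of an HTTP GET request as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_records(base_url, metadata_prefix="oai_dc", fetch=fetch_over_http):
    """Yield every <record> element of a ListRecords harvest (sketch only)."""
    arguments = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(arguments)}"))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            raise RuntimeError(f"{error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.findtext(
            "oai:ListRecords/oai:resumptionToken", default="", namespaces=OAI_NS
        ).strip()
        if not token:
            return  # No token, or an empty one: the list is complete.
        # A resumptionToken is sent on its own, without metadataPrefix.
        arguments = {"verb": "ListRecords", "resumptionToken": token}


# Usage (performs real HTTP requests, so it is left commented out):
#   for record in harvest_records(
#           "http://wotan.liu.edu/omeka/jgriffin/oai-pmh-repository/request"):
#       print(record.findtext("oai:header/oai:identifier", namespaces=OAI_NS))
```

Passing the fetcher in as a parameter keeps the sketch testable without a network connection; a real harvester would also need to handle timeouts and retries.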
Harvesting Metadata from Remote Repositories in Omeka
• Metadata sets can be re-harvested or deleted
• While a set of records is being harvested, one is offered the
ability to “kill” the process
• Should there be problems regarding the memory required by
the harvester, one can modify the settings of the plugin
  • The “Memory Limit” field should only be modified if a harvest fails due
  to a memory error.
  • The path for the PHP binary should always be ‘/usr/bin/php5’ on wotan
The OAI-PMH Harvester Plug-In for the Omeka Digital Archive
• Questions?
• Comments?