Transcript Harvesting

Digital Libraries:
Harvesting:
Open Archives Initiative (OAI)
I. OAI
• aims and concepts
• history
• organisation
II. Harvesting
• definitions
• data provider and service provider
• OAI-PMH
OAI: Aims and concepts
• increasing the availability of
scholarly communication by
providing easier access to digital
material
• easier access is provided by developing
technological frameworks and standards (for example
OAI-PMH)
• access should be independent of the type of
content offered and the economic mechanisms
surrounding the content.
• OAI is committed to exploring and enabling a
new and broader range of applications
• world-wide consolidation of scholarly archives
• free access to the archives (at least: metadata)
• consistent interfaces for archives and service providers
• low barrier protocol / effortless implementation (e.g.,
because it is based on HTTP, XML, and Dublin Core)
History
• independent development of different solutions
• Santa Fe Meeting 1999
• OAI-PMH = mechanism for harvesting
metadata records from one system to another,
i.e. from a Data Provider to a Service
Provider; it's not a search protocol, but a base
layer on which to build other services
Organisation
OAI is supported by the Digital Library Federation, the Coalition for
Networked Information, and the National Science Foundation
• Steering Committee
• Executive Committee
• Set of Technical Committees
Harvesting: definition
Harvesting is the gathering of metadata from a number
of distributed repositories into a combined data store
Data Provider: deposits and publishes resources
in a repository and exposes the metadata of those
resources for harvesting
Service Provider: harvests metadata from Data
Providers and uses it to provide one or
more services across all the data
Multiple Service Providers can harvest from multiple Data Providers
An Aggregator is both a Data Provider and a Service Provider
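The "combined data store" in the definition above can be pictured with a minimal sketch. Everything here is an assumption for illustration: the HarvestedRecord type, the combine function, the provider names, and the rule of keeping the most recently changed copy are not prescribed by OAI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarvestedRecord:
    identifier: str  # e.g. an OAI identifier
    datestamp: str   # ISO 8601 date, so plain string comparison sorts by time
    title: str


def combine(harvests):
    """Merge per-provider record lists into one store keyed by identifier."""
    store = {}
    for provider, records in harvests.items():
        for record in records:
            kept = store.get(record.identifier)
            # Assumed policy: keep the most recently changed copy of a record.
            if kept is None or record.datestamp > kept[1].datestamp:
                store[record.identifier] = (provider, record)
    return store


# Hypothetical harvest results from two Data Providers.
harvests = {
    "provider-a": [HarvestedRecord("oai:example.org:1", "2001-12-14", "First paper")],
    "provider-b": [
        HarvestedRecord("oai:example.org:1", "2002-03-01", "First paper (revised)"),
        HarvestedRecord("oai:example.net:7", "2002-01-20", "Second paper"),
    ],
}
combined = combine(harvests)
for identifier, (provider, record) in sorted(combined.items()):
    print(identifier, provider, record.title)
```

A Service Provider would build its services (search, browsing, alerting) on top of such a store rather than querying each repository live.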
Harvesting and Searching
Harvesting is defined by the OAI Protocol for
Metadata Harvesting (OAI-PMH)
• OAI-PMH is based on HTTP
• 6 request types (= verbs)
• responses are encoded in XML
Flow control
Request Types:
• Identify
• ListMetadataFormats
• ListSets
• ListIdentifiers
• ListRecords
• GetRecord
example
archive.org/oai-script?verb=Identify
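Since every request is an HTTP URL with a verb parameter, building one can be sketched as below. The helper name build_request_url and the http:// scheme added to the base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode

# The six OAI-PMH request types (verbs).
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL: base URL plus verb and its arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


base = "http://archive.org/oai-script"  # scheme assumed; the slide gives none
print(build_request_url(base, "Identify"))
print(build_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

Rejecting unknown verbs locally is a design choice of this sketch; a repository would otherwise answer such a request with an error response.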
Response to ListIdentifiers request
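A ListIdentifiers response carries record headers only, not the metadata itself. The sketch below reads such a response; the sample XML is hand-written for illustration (example.org and its identifiers are assumptions, not output from a real repository), and parse_list_identifiers is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2001-12-14</datestamp>
      <setSpec>physics</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2002-01-20</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def parse_list_identifiers(xml_text):
    """Return one dict per <header> element in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iterfind("oai:ListIdentifiers/oai:header", NS):
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.iterfind("oai:setSpec", NS)],
            # A status attribute of "deleted" marks a withdrawn record.
            "deleted": header.get("status") == "deleted",
        })
    return headers


for entry in parse_list_identifiers(SAMPLE_RESPONSE):
    print(entry)
```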
Response to GetRecord request
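A GetRecord response returns a single record: a header plus the metadata in the requested format. The sketch below pulls Dublin Core fields out of a hand-written sample; the record content, example.org, and the helper name parse_get_record are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:1"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2001-12-14</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_get_record(xml_text):
    """Extract the header and a few Dublin Core fields from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        raise ValueError("response contains no record")
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # Dublin Core elements are repeatable, so collect them as lists.
        "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
        "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
    }


print(parse_get_record(SAMPLE_RESPONSE))
```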
Example: Data Provider
Resumption token
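A Data Provider may split a long list into several partial responses, each ending with a resumptionToken that the harvester sends back to get the next part. The loop below is a minimal sketch of that flow control; harvest_identifiers, fetch_url, the token value, and the example.org pages are assumptions, and the demo uses canned pages instead of a live repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Default fetcher: plain HTTP GET, returning the raw response body."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix="oai_dc", fetch=fetch_url):
    """Yield every identifier, following resumption tokens until the list ends."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(f".//{OAI}resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


# Offline demo: two canned pages stand in for a real Data Provider.
WRAP = '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>{}</ListIdentifiers></OAI-PMH>'
PAGES = {
    "http://example.org/oai?verb=ListIdentifiers&metadataPrefix=oai_dc": WRAP.format(
        "<header><identifier>oai:example.org:1</identifier></header>"
        "<resumptionToken>tok1</resumptionToken>"
    ),
    "http://example.org/oai?verb=ListIdentifiers&resumptionToken=tok1": WRAP.format(
        "<header><identifier>oai:example.org:2</identifier></header>"
        "<resumptionToken/>"
    ),
}
for identifier in harvest_identifiers("http://example.org/oai", fetch=PAGES.__getitem__):
    print(identifier)
```

Passing the fetcher as a parameter keeps the loop testable without network access; a real harvester would also need to handle error responses and retry delays, which this sketch leaves out.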
• http://www.openarchives.org/
• http://www.oaforum.org/tutorial/