Open Archives Initiative –
Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by
Stefaan Ternier, KUL
Bram Vandenputte, KUL
Joris Klerkx, KUL
What is OAI?
Harvesting standard, documented at
http://www.openarchives.org/OAI/openarchivesprotocol.html
Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets
Allows multiple metadata formats
– DC (Dublin Core) format mandatory
How OAI works
OAI "verbs": Identify, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords, ListSets
[Figure: the Service Provider's OAI harvester sends an HTTP request (OAI verb) to the Metadata Provider's OAI repository; the repository answers with an HTTP response (valid XML).]
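The request half of this exchange is just a URL carrying a verb and its arguments. A minimal sketch in Java of how a harvester might assemble it; the class OaiRequestUrl and its build method are illustrative names, not part of OAICat, and the base URL is the demo address used in these slides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OAICat): builds the URL for one OAI-PMH request.
final class OaiRequestUrl {

    // The six verbs defined by OAI-PMH 2.0.
    enum Verb { Identify, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords, ListSets }

    // Appends the verb and the URL-encoded arguments to the repository base URL.
    static String build(String baseUrl, Verb verb, Map<String, String> arguments) {
        StringBuilder url = new StringBuilder(baseUrl).append("?verb=").append(verb.name());
        for (Map.Entry<String, String> arg : arguments.entrySet()) {
            url.append('&')
               .append(URLEncoder.encode(arg.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(arg.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("metadataPrefix", "oai_lom");
        arguments.put("set", "exercises");
        System.out.println(build("http://localhost:8080/OAILREApp/oaiHandler",
                Verb.ListRecords, arguments));
    }
}
```

The resulting string can be passed to any HTTP client; the response body is the OAI XML document.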
Try it
Install Apache Tomcat or any other Java servlet container
Download the WAR file from
http://fire.eun.org/Iztok/OAILREApp.war
Deploy the WAR
Open the demo HTML page at
http://localhost:8080/OAILREApp/
Or type a service verb directly, e.g.
http://localhost:8080/OAILREApp/oaiHandler?verb=Identify
The raw XML
By default, the resulting XML has a stylesheet attached for pretty rendering
To remove the stylesheet, comment out the line
OAIHandler.styleSheet=testoai/oaicat.xsl
in the file oaicat.properties (in the WAR file or the web-app dir)
OAI XML example
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
  <responseDate>2007-06-11T06:48:58Z</responseDate>
  <request metadataPrefix="oai_lom"
           verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
        <datestamp>2007-06-09T22:38:28Z</datestamp>
        <setSpec>exercises</setSpec>
      </header>
      <metadata>
        <lom xmlns=...> ... </lom>
      </metadata>
    </record>
    ....
    <resumptionToken expirationDate="2007-06-11T07:48:58Z"
                     completeListSize="42" cursor="10">1181544538265</resumptionToken>
  </ListRecords>
</OAI-PMH>
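On the harvester side, a response like this can be read with the JDK's built-in DOM parser. The sketch below is one possible approach, not something shipped with OAICat; OaiResponseReader and its method names are made up for illustration. It pulls out the record identifiers and the resumption token, matching elements by the OAI namespace so that identifier elements inside the metadata payload are not picked up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical harvester-side helper: reads identifiers and the resumption token.
final class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Parses the response text into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("response is not well-formed XML", e);
        }
    }

    // Collects the text of every <identifier> element in the OAI namespace.
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    // Returns the token, or null when the element is missing or empty (last packet).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) {
            return null;
        }
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }
}
```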
OAICat - a Java implementation
OAICat home at
http://www.oclc.org/research/software/oai/cat.htm
Takes care of
– web service details
– OAI XML specification
The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (lom, ...) - usually more than one
A sample implementation
(Source code and libs in
http://fire.eun.org/Iztok/OAILREApp.zip)
Create a new web module
Add servlet oaiHandler to web.xml
<servlet>
  <servlet-name>LreOAIHandler</servlet-name>
  <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
  <load-on-startup>5</load-on-startup>
</servlet>
<servlet-mapping>
  <servlet-name>LreOAIHandler</servlet-name>
  <url-pattern>/oaiHandler</url-pattern>
</servlet-mapping>
A sample implementation (cont.)
Define properties file location
<context-param>
  <param-name>properties</param-name>
  <param-value>oaicat.properties</param-value>
</context-param>
Welcome file for testing
<welcome-file-list>
  <welcome-file>testoai/index.html</welcome-file>
</welcome-file-list>
Sample record
A record with basic fields
– id, url, title, descr and date
SampleOAICatalog contains an array with 3 sample records
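For orientation, such a record and the three-element array could look roughly like the sketch below. This is a reconstruction from the field list above, not the code in OAILREApp.zip; the wrapper class SampleData, the findById helper and all field values are invented for illustration.

```java
// Hypothetical reconstruction of the sample data; the real classes are in OAILREApp.zip.
final class SampleData {

    // A record with the five basic fields named on the slide.
    static final class SampleRecord {
        final String id;
        final String url;
        final String title;
        final String descr;
        final String date; // ISO 8601 datestamp

        SampleRecord(String id, String url, String title, String descr, String date) {
            this.id = id;
            this.url = url;
            this.title = title;
            this.descr = descr;
            this.date = date;
        }
    }

    // Three made-up records, standing in for the array held by SampleOAICatalog.
    static final SampleRecord[] RECORDS = {
        new SampleRecord("id001", "http://example.org/res/1", "Fractions",
                "Exercises on fractions", "2007-06-09T22:38:28Z"),
        new SampleRecord("id002", "http://example.org/res/2", "Decimals",
                "Exercises on decimals", "2007-06-10T08:00:00Z"),
        new SampleRecord("id003", "http://example.org/res/3", "Percentages",
                "Exercises on percentages", "2007-06-11T06:00:00Z"),
    };

    // Linear lookup by local id; returns null when nothing matches.
    static SampleRecord findById(String id) {
        for (SampleRecord rec : RECORDS) {
            if (rec.id.equals(id)) {
                return rec;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (SampleRecord rec : RECORDS) {
            System.out.println(rec.id + " " + rec.title + " " + rec.date);
        }
    }
}
```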
SampleOAICatalog.listIdentifiers
Parameters
– from: date to harvest from (String in ISO 8601 format)
  - date or datetime, depending on granularity
– to: date to harvest to
– set: a set name; list only records from this set (if null, list all records)
  - set names classify objects in natural groups
  - every record may belong to multiple sets (or none)
– metadataPrefix: list only records that support this format (sample formats: oai_dc, oai_lom, ...)
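How these parameters narrow the result can be sketched as a plain filter. The code below is an illustration only: HarvestFilter, Rec and select are hypothetical names, the metadataPrefix check is left out, and sets are treated as flat names. It assumes all datestamps are UTC ISO 8601 strings, which compare correctly as text; a record datestamp is cut to the length of the bound so that a day-granularity bound such as 2007-06-10 covers the whole day.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical selective-harvesting filter (not OAICat code).
final class HarvestFilter {

    static final class Rec {
        final String id;
        final String datestamp; // UTC, ISO 8601
        final List<String> sets;

        Rec(String id, String datestamp, String... sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = Arrays.asList(sets);
        }
    }

    // Cuts the datestamp to the granularity (length) of the bound it is compared with.
    private static String clip(String datestamp, String bound) {
        return datestamp.length() > bound.length()
                ? datestamp.substring(0, bound.length())
                : datestamp;
    }

    // from and to are inclusive; a null argument means "no restriction".
    static List<String> select(List<Rec> records, String from, String to, String set) {
        List<String> ids = new ArrayList<>();
        for (Rec rec : records) {
            if (from != null && clip(rec.datestamp, from).compareTo(from) < 0) continue;
            if (to != null && clip(rec.datestamp, to).compareTo(to) > 0) continue;
            if (set != null && !rec.sets.contains(set)) continue;
            ids.add(rec.id);
        }
        return ids;
    }
}
```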
SampleOAICatalog.listIdentifiers (cont.)
Must return a map with two entries
– headers: a String iterator of OAI headers
– identifiers: a String iterator of OAI identifiers
Both are created by the call (rec is a SampleRecord)
String[] header = getRecordFactory().createHeader(rec);
headers.add(header[0]);
identifiers.add(header[1]);
Create result
Map<String, Object> listIdMap = new HashMap<String, Object>();
listIdMap.put("headers", headers.iterator());
listIdMap.put("identifiers", identifiers.iterator());
return listIdMap;
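Put together, the fragments above amount to a loop over the matching records. The following self-contained sketch shows the shape of that loop without depending on OAICat: createHeader here is a simplified stand-in for getRecordFactory().createHeader(rec), the records are plain {localId, datestamp} pairs, and the identifier prefix oai:sample.example.org: is invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free outline of listIdentifiers.
final class ListIdentifiersSketch {

    // Stand-in for the record factory call: [0] = header XML, [1] = OAI identifier.
    static String[] createHeader(String localId, String datestamp) {
        String oaiId = "oai:sample.example.org:" + localId; // made-up prefix
        String xml = "<header><identifier>" + oaiId + "</identifier>"
                + "<datestamp>" + datestamp + "</datestamp></header>";
        return new String[] { xml, oaiId };
    }

    // Each element of records is {localId, datestamp}.
    static Map<String, Object> listIdentifiers(String[][] records) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (String[] rec : records) {
            String[] header = createHeader(rec[0], rec[1]);
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Object> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }
}
```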
getRecordFactory().createHeader(rec)
Creates header by calling the methods in SampleRecordFactory
String getOAIIdentifier(Object rec)
– returns full OAI identifier “oai:oay.rep.com:id001”
String getDatestamp(Object rec)
– returns date in ISO 8601 format
Iterator<String> getSetSpecs(Object rec)
  ArrayList<String> list = new ArrayList<String>();
  list.add(...);
  return list.iterator();
Iterator<String> getAbouts(Object rec)
String fromOAIIdentifier(String id)
– helper method: converts id to a local id
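The identifier and datestamp methods are simple string conversions. A minimal sketch, assuming the repository prefix from the example identifier above; OaiIdentifiers is a hypothetical stand-alone class that takes a local id and an Instant instead of an Object record, and rejecting foreign identifiers with an IllegalArgumentException is a choice made here, not documented OAICat behaviour.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical stand-alone versions of the record factory's conversion helpers.
final class OaiIdentifiers {
    // Prefix taken from the example identifier on the slide.
    static final String PREFIX = "oai:oay.rep.com:";

    // Local id -> full OAI identifier.
    static String getOAIIdentifier(String localId) {
        return PREFIX + localId;
    }

    // Full OAI identifier -> local id.
    static String fromOAIIdentifier(String oaiId) {
        if (oaiId == null || !oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }

    // UTC datestamp with seconds granularity, e.g. 2007-06-09T22:38:28Z.
    static String getDatestamp(Instant modified) {
        return modified.truncatedTo(ChronoUnit.SECONDS).toString();
    }
}
```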
SampleOAICatalog.listSets
takes no parameters, returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set
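A rough sketch of what such a method might produce: one XML set element per set, with the setSpec and setName children defined by the OAI-PMH specification. Returning the elements as an iterator under the map key "sets" is an assumption modelled on the other list methods in these slides, not a confirmed OAICat contract, and ListSetsSketch and its escape helper are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free outline of listSets.
final class ListSetsSketch {

    // setNamesBySpec maps each setSpec (e.g. "exercises") to a human-readable name.
    static Map<String, Object> listSets(Map<String, String> setNamesBySpec) {
        List<String> sets = new ArrayList<>();
        for (Map.Entry<String, String> entry : setNamesBySpec.entrySet()) {
            sets.add("<set><setSpec>" + escape(entry.getKey()) + "</setSpec>"
                    + "<setName>" + escape(entry.getValue()) + "</setName></set>");
        }
        Map<String, Object> listSetsMap = new HashMap<>();
        listSetsMap.put("sets", sets.iterator()); // key name is an assumption
        return listSetsMap;
    }

    // Minimal escaping for XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```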
SampleOAICatalog.getSchemaLocations
like getRecord, but returns a Vector with the schema locations of all metadata formats the record supports
– to obtain them, just call
  getRecordFactory().getSchemaLocations(rec);
SampleOAICatalog.getRecord
String getRecord(String id, String metadataPrefix)
– find the record and convert it to an XML string (<record> element)
– id is in global format; to get the local value call getRecordFactory().fromOAIIdentifier(id)
– throw IdDoesNotExistException if the record is not found
– to generate XML use constructRecord
  constructRecord(rec, metadataPrefix)
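The control flow of these steps, reduced to a self-contained sketch. Everything here is a stand-in: the nested IdDoesNotExistException is a local unchecked class that only borrows the name of OAICat's exception, the lookup table replaces the real repository, and constructRecord emits a placeholder metadata element instead of calling a crosswalk.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, library-free outline of getRecord.
final class GetRecordSketch {

    // Local stand-in; not the OAICat class of the same name.
    static final class IdDoesNotExistException extends RuntimeException {
        IdDoesNotExistException(String id) {
            super("no such record: " + id);
        }
    }

    static final String PREFIX = "oai:oay.rep.com:";
    static final Map<String, String> TITLES_BY_LOCAL_ID = new HashMap<>();
    static {
        TITLES_BY_LOCAL_ID.put("id001", "Fractions"); // made-up sample data
    }

    // Global OAI identifier -> local id, or null if the prefix does not match.
    static String fromOAIIdentifier(String id) {
        return id.startsWith(PREFIX) ? id.substring(PREFIX.length()) : null;
    }

    static String getRecord(String id, String metadataPrefix) {
        String localId = fromOAIIdentifier(id);
        String title = localId == null ? null : TITLES_BY_LOCAL_ID.get(localId);
        if (title == null) {
            throw new IdDoesNotExistException(id);
        }
        return constructRecord(id, title, metadataPrefix);
    }

    // Placeholder for the XML generation; a real catalog delegates to a crosswalk here.
    static String constructRecord(String id, String title, String metadataPrefix) {
        return "<record><header><identifier>" + id + "</identifier></header>"
                + "<metadata><!-- " + metadataPrefix + " --><title>" + title
                + "</title></metadata></record>";
    }
}
```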
SampleOAICatalog.listRecords
just like listIdentifiers, only generates a list of XML <record> elements
return a map with one element
Map<String, Object> listRecMap = new HashMap<String, Object>();
listRecMap.put("records", records.iterator());
return listRecMap;
Crosswalks
Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc
Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec)
  SampleRecord record = (SampleRecord) rec;
  return LOMFormat.writeStringWithSchema(record.toLOM());
throw CannotDisseminateFormatException if the metadata is not available in this format
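For comparison with the LOM one-liner, a Dublin Core crosswalk can be written by hand. The sketch below is an assumption-laden illustration, not the Sample2oai_dc class from the sample application: it builds the oai_dc element by string concatenation, omits the schemaLocation attribute for brevity, maps the five sample fields to DC elements in one plausible way, and throws an IllegalStateException where a real crosswalk would throw CannotDisseminateFormatException.

```java
// Hypothetical hand-written Dublin Core crosswalk (not the sample application's class).
final class Sample2DcSketch {

    static final class SampleRecord {
        final String id, url, title, descr, date;

        SampleRecord(String id, String url, String title, String descr, String date) {
            this.id = id;
            this.url = url;
            this.title = title;
            this.descr = descr;
            this.date = date;
        }
    }

    // DC output needs at least a title here; that rule is an illustrative choice.
    static boolean isAvailableFor(Object rec) {
        return rec instanceof SampleRecord && ((SampleRecord) rec).title != null;
    }

    static String createMetadata(Object rec) {
        if (!isAvailableFor(rec)) {
            // stands in for CannotDisseminateFormatException
            throw new IllegalStateException("cannot disseminate oai_dc");
        }
        SampleRecord r = (SampleRecord) rec;
        return "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<dc:title>" + escape(r.title) + "</dc:title>"
                + "<dc:description>" + escape(r.descr) + "</dc:description>"
                + "<dc:identifier>" + escape(r.url) + "</dc:identifier>"
                + "<dc:date>" + escape(r.date) + "</dc:date>"
                + "</oai_dc:dc>";
    }

    // Minimal escaping for XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```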
SampleRecord.toLOM
uses the LOM-j lib to quickly hack together LOM
http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats
Example
lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
lom.newTechnical().newLocation(-1).setString(url);
lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
lom.newGeneral().newTitle().newString(0).setString(title);
Resumption
A repository usually has a fixed limit on the number of records to return in one call
– if more are available, it returns a resumption token, which lets the harvester request the next packet
– implemented by the functions listIdentifiers(String resumptionToken), listRecords(String resumptionToken)
– see XYZOAICatalog for details
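The paging idea can be shown with a deliberately naive sketch in which the resumption token is simply the offset of the next packet. This is an assumption made for brevity: real catalogs usually hand out opaque tokens and let them expire, and the names ResumptionSketch, Page and PAGE_SIZE are invented here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging logic with the next offset used as resumption token.
final class ResumptionSketch {
    static final int PAGE_SIZE = 10; // fixed limit per call (illustrative value)

    static final class Page {
        final List<String> items;
        final String resumptionToken; // null when this is the last packet

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    // First call: start from the beginning.
    static Page listIdentifiers(List<String> all) {
        return page(all, 0);
    }

    // Follow-up call: continue where the token says.
    static Page listIdentifiers(List<String> all, String resumptionToken) {
        int offset;
        try {
            offset = Integer.parseInt(resumptionToken);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("bad resumption token: " + resumptionToken, e);
        }
        if (offset < 0 || offset > all.size()) {
            throw new IllegalArgumentException("bad resumption token: " + resumptionToken);
        }
        return page(all, offset);
    }

    private static Page page(List<String> all, int offset) {
        int end = Math.min(offset + PAGE_SIZE, all.size());
        String next = end < all.size() ? String.valueOf(end) : null;
        return new Page(new ArrayList<>(all.subList(offset, end)), next);
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            all.add("id" + i);
        }
        // Harvest loop: keep asking until no token comes back.
        Page page = listIdentifiers(all);
        while (true) {
            System.out.println(page.items.size() + " items, token=" + page.resumptionToken);
            if (page.resumptionToken == null) {
                break;
            }
            page = listIdentifiers(all, page.resumptionToken);
        }
    }
}
```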
References
http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.fmf.uni-lj.si/~kavkler/
http://www.oclc.org/research/software/oai/cat.htm
http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
http://sourceforge.net/projects/lom-j/
SIO/Trubar OAI url
http://sio.edus.si/LreTomcat/