Technology and Metadata for Libraries


OCLC and FRBR: directions and research results
OCLC Online Computer Library Center
Lorcan Dempsey
with contributions from
Diane Vizine-Goetz, Ed O’Neill, Thom Hickey
and Eric Childress
Revolution or Evolution? The impact of FRBR (Functional Requirements for Bibliographic Records)
Organized by the Australian Committee on Cataloging.
Melbourne Convention Centre, 2 February 2004
Overview

FRBR and OCLC

OCLC research work

OCLC production plans

Some issues
FRBR and OCLC

Long-standing interest in work-based approaches
– The Humphry Clinker problem

Strong practical interest
– End-user presentation
– ILL
– Cataloging: help find records
– Collection analysis
– Data enrichment
OCLC Research and FRBR

Mining the data …
– Ed O’Neill

Algorithmically FRBRizing
– Thom Hickey

Work-based prototypes
– Diane Vizine-Goetz
– Thom Hickey
Mining the data
Analyzing representations of a single work in detail.
Tested the OCLC Research conversion algorithm against 1,000 works.
Types of Works



Elemental Works have only a single manifestation (78%)
Simple Works have only a single expression but multiple manifestations (16%)
Complex Works have multiple expressions (6%)
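The three categories amount to a simple classification rule over expression and manifestation counts. This Python sketch illustrates the definitions only; it is not OCLC's code, and it assumes each work is represented as a mapping from a hypothetical expression identifier to the number of manifestations of that expression.

```python
from collections import Counter


def work_type(expressions: dict[str, int]) -> str:
    """Classify a work from a mapping of expression id -> number of manifestations."""
    if not expressions:
        raise ValueError("a work needs at least one expression")
    if len(expressions) > 1:
        return "complex"  # more than one expression
    (manifestations,) = expressions.values()
    # Exactly one expression: simple if it has several manifestations, else elemental.
    return "simple" if manifestations > 1 else "elemental"


def type_shares(works: list[dict[str, int]]) -> dict[str, float]:
    """Percentage of works falling into each type."""
    counts = Counter(work_type(w) for w in works)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


# Hypothetical sample: three elemental works, one simple, one complex.
sample = [{"e1": 1}, {"e1": 1}, {"e1": 1}, {"e1": 4}, {"en": 3, "fr": 2}]
print(type_shares(sample))  # {'elemental': 60.0, 'simple': 20.0, 'complex': 20.0}
```

Under this rule The Expedition of Humphry Clinker, with 48 expressions, is complex regardless of how its 114 manifestations are distributed.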
Principal Types of Complex Works

Translations

Augmented

Revised

Collected/Selected
Translations


All translations are expressions
Other types of complex works frequently
include translations
Typical Augmented Work
The Expedition of Humphry Clinker

48 Expressions

114 Manifestations

Expressions created by augmentation with notes, introductions, illustrations, bibliographies, glossaries, etc.
Typical Revised Work

– 1st and 2nd editions are by John Phillip Immroth
– 3rd and 4th editions are by Lois Mai Chan, and “Immroth’s” was added to the title
Collected Works

A collection of items, each of which is a distinct intellectual or artistic creation; a collection of works
50% of ‘collected works’ explicitly list component works.
And …

Expressions not clear.

Bring out the differences that matter.


Retrospective activity is constrained by available bibliographic data.
Empirical work will support ongoing clarification of the model (Working Group on the Expression Entity).
Algorithmically ‘FRBRizing’
The OCLC Research work set algorithm
Our Approach

Concentrating on the work level
– Problems with expression-level clusters

Efficient, maintainable, understandable

Useful matches with correct cataloging
– Err on the side of missed matches
– Some accommodation of frequent variants (e.g., Shakespeare’s Hamlet = Hamlet)

Compared with manually clustered records
– Reliable at the work level; the expression level is not clear enough
The Algorithm

A key is generated for each record

Extract author and title
– Look up in NACO authority file
– Added entry information as needed

Form a key from the bibliographic record
– Author, title, added entry information
– Keys can be sorted and compared
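The key-building step can be sketched in a few lines of Python. This is not the OCLC Research implementation: the normalization rules (case folding, stripping diacritics and punctuation, dropping a leading article, and a possessive-author rule for variants like Shakespeare’s Hamlet) are assumptions inferred from the sample keys under "Most widely held fiction works", the `AUTHORITY` dictionary is a hypothetical stand-in for the NACO authority file, and added-entry information is omitted.

```python
import re
import unicodedata

# Hypothetical stand-in for the NACO name authority file: maps a folded
# variant name to the authorized heading and dates. Entries are illustrative.
AUTHORITY = {
    "clemens, samuel langhorne": ("Twain, Mark", "1835-1910"),
    "twain, mark": ("Twain, Mark", "1835-1910"),
    "carroll, lewis": ("Carroll, Lewis", "1832-1898"),
    "bronte, charlotte": ("Brontë, Charlotte", "1816-1855"),
    "shakespeare, william": ("Shakespeare, William", "1564-1616"),
}

LEADING_ARTICLES = ("the ", "a ", "an ")


def fold(text: str, keep_commas: bool = False) -> str:
    """Lowercase, strip diacritics and apostrophes, turn other punctuation into spaces."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("'", "").replace("\u2019", "")
    pattern = r"[^\w\s,]" if keep_commas else r"[^\w\s]"
    return " ".join(re.sub(pattern, " ", text).split())


def make_key(author: str, title: str) -> str:
    """Build an 'author\\dates/title' key from a record's author and title."""
    # Authority lookup; unknown names fall back to the transcribed form, no dates.
    name, dates = AUTHORITY.get(fold(author, keep_commas=True), (author, ""))
    name_key = fold(name, keep_commas=True)
    title_key = fold(title)
    for article in LEADING_ARTICLES:
        if title_key.startswith(article):
            title_key = title_key[len(article):]
            break
    # Assumed variant rule: drop a possessive author prefix ("shakespeares hamlet").
    possessive = name_key.split(",")[0] + "s "
    if title_key.startswith(possessive):
        title_key = title_key[len(possessive):]
    return f"{name_key}\\{fold(dates)}/{title_key}"


print(make_key("Clemens, Samuel Langhorne", "The Adventures of Huckleberry Finn"))
```

Because the authority lookup runs before the key is formed, a record entered under Clemens and one entered under Twain produce the same key, and sorting the keys brings candidate work clusters together.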
Results

Manual estimate: 1.5 manifestations/work in WorldCat

Algorithm: ~1.27

25,000 clusters have >20 records

415,000 clusters have >4 records

30% of records and 50% of holdings are in a cluster
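Figures like these are simple aggregates once every record carries a work key. The Python sketch below assumes each record has been reduced to a `(record_id, work_key, holdings)` triple; reading "in a cluster" as "in a cluster with at least two records" is an assumption, since every record trivially belongs to a cluster of one.

```python
from collections import defaultdict
from typing import Iterable


def cluster_stats(records: Iterable[tuple[str, str, int]], large: int = 4) -> dict[str, float]:
    """records: (record_id, work_key, holdings) triples; records sharing a key form a cluster."""
    clusters: dict[str, list[int]] = defaultdict(list)
    for _record_id, key, holdings in records:
        clusters[key].append(holdings)
    if not clusters:
        raise ValueError("no records")

    n_records = sum(len(h) for h in clusters.values())
    n_holdings = sum(sum(h) for h in clusters.values())
    # Assumption: a record is "in a cluster" when its cluster has 2+ records.
    shared = [h for h in clusters.values() if len(h) > 1]
    return {
        "records_per_cluster": n_records / len(clusters),
        "share_of_records_clustered": sum(len(h) for h in shared) / n_records,
        "share_of_holdings_clustered": sum(sum(h) for h in shared) / n_holdings,
        "clusters_larger_than_threshold": sum(1 for h in clusters.values() if len(h) > large),
    }


# Hypothetical records: two share a key, two stand alone.
records = [("r1", "k1", 10), ("r2", "k1", 5), ("r3", "k2", 1), ("r4", "k3", 4)]
print(cluster_stats(records))
```

The holdings share can exceed the record share, as in the WorldCat figures, whenever the records that cluster are the more widely held ones.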
Work-based prototypes
FictionFinder
xISBN
FictionFinder


A prototype system of 2.6+ million bibliographic records for fiction, clustered according to the OCLC FRBR work-set algorithm
Uses the FRBR model to organize, index, and display bibliographic elements of potential interest to users
Fiction Subset

2,665,662 WorldCat records (fiction indicator)

1,758,479 work clusters

1.5 records/cluster

3,866 clusters have 20 or more records

50,540 clusters have 5 or more records
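As a check, the records-per-cluster figure follows directly from the first two counts:

```latex
\frac{2{,}665{,}662\ \text{records}}{1{,}758{,}479\ \text{clusters}} \approx 1.52\ \text{records per cluster}
```

which rounds to the 1.5 given above.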
Most widely held fiction works
Holdings   Manifestations   Key
29,043     692              twain, mark\1835 1910/adventures of huckleberry finn
26,088     1,267            carroll, lewis\1832 1898/alices adventures in wonderland
20,843     640              twain, mark\1835 1910/adventures of tom sawyer
19,410     1,341            defoe, daniel\1661 1731/robinson crusoe
18,566     983              cervantes saavedra, miguel de\1547 1616/don quixote
18,492     836              stevenson, robert louis\1850 1894/treasure island
18,123     526              dickens, charles\1812 1870/christmas carol
18,100     278              crane, stephen\1871 1900/red badge of courage
17,761     525              bronte, charlotte\1816 1855/jane eyre
17,499     332              chekhov, anton pavlovich\1860 1904/short stories
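Each key packs the author heading, dates, and normalized title into one string, so the rows can be taken apart again for analysis. This minimal Python sketch assumes the separators are always a backslash before the dates and the first slash before the title, as in every key above.

```python
from collections import Counter
from typing import NamedTuple


class WorkKey(NamedTuple):
    author: str
    dates: str
    title: str


def parse_key(key: str) -> WorkKey:
    """Split an 'author\\dates/title' key into its parts."""
    heading, _, title = key.partition("/")
    author, _, dates = heading.partition("\\")
    return WorkKey(author, dates, title.strip())


# (holdings, manifestations, key) rows copied from the table above.
ROWS = [
    (29_043, 692, "twain, mark\\1835 1910/adventures of huckleberry finn"),
    (26_088, 1_267, "carroll, lewis\\1832 1898/alices adventures in wonderland"),
    (20_843, 640, "twain, mark\\1835 1910/adventures of tom sawyer"),
    (19_410, 1_341, "defoe, daniel\\1661 1731/robinson crusoe"),
]

works_per_author = Counter(parse_key(key).author for _, _, key in ROWS)
print(works_per_author.most_common(1))  # [('twain, mark', 2)]

# The row with the most manifestations, i.e. the most separately cataloged editions.
most_editions = max(ROWS, key=lambda row: row[1])
print(parse_key(most_editions[2]).title)  # robinson crusoe
```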
FictionFinder & FRBR


Information that applies to all expressions of a given work, such as summaries, genre terms, and subjects, is given precedence in work/expression-level screen displays.
Because of the difficulty of consistently identifying expressions, manifestations are organized by language of expression.
[Screenshots: work display; work/expression display]
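Organizing by language of expression sidesteps the need to identify expressions precisely: manifestations only have to be bucketed by a field most records already carry. A Python sketch of that grouping, with a hypothetical `Manifestation` record type; the sample data are Eucalyptus editions listed later in the deck, with English assumed for the editions not marked with a language.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    title: str
    year: int
    publisher: str
    language: str


def by_language(manifestations: list[Manifestation]) -> dict[str, list[Manifestation]]:
    """Group a work's manifestations by language, languages in alphabetical order."""
    groups: dict[str, list[Manifestation]] = defaultdict(list)
    for m in manifestations:
        groups[m.language].append(m)
    return dict(sorted(groups.items()))


work = [
    Manifestation("Eucalyptus", 1998, "Text Pub.", "English"),
    Manifestation("Eucalyptus", 1998, "Harvill Press", "English"),
    Manifestation("Eucalyptus", 1999, "R. Laffont", "French"),
    Manifestation("Eukaliptus", 1999, "Meandar", "Croatian"),
]
for language, items in by_language(work).items():
    print(language, [m.publisher for m in items])
```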
FictionFinder & FRBR

Some characteristics of an expression, such as the expression title (e.g., Harry Potter and the Philosopher’s Stone vs. Harry Potter and the Sorcerer’s Stone), are presented at the Work/Expression level.

Other, less clear-cut distinctions between expressions and manifestations, such as Braille and electronic book versions, are presented at both the Work/Expression level and the Manifestation level.
[Screenshot: work/expression/manifestation display]
xISBN

An experimental web service:
– xISBN server receives a single ISBN and returns a list of all ISBNs for the work cluster
– Designed for machine-to-machine data exchange
– Can return the list in XML or XHTML

Supports automatic expansion of ISBN searches:
– Check user ILL requests against all editions/versions in the OPAC
– Use the xISBN bookmarklet to find the local library’s editions when a user finds any edition of an item on Amazon, etc.
– Quickly check the OPAC for all editions/versions during selection/acquisitions/gift book processing
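The server side reduces to a lookup table from each ISBN to its work cluster. This Python sketch is an illustration rather than the xISBN implementation: the cluster data are hypothetical, returning the requested ISBN first mirrors the sample response in this deck, and returning an empty list for an unknown ISBN is an assumption.

```python
from xml.sax.saxutils import escape


def normalize(isbn: str) -> str:
    """Strip hyphens and spaces; lowercase a trailing 'X' check digit."""
    return isbn.replace("-", "").replace(" ", "").lower()


def build_table(clusters: dict[str, list[str]]) -> dict[str, str]:
    """Table builder: map every ISBN to the id of the work cluster that contains it."""
    return {normalize(isbn): cid for cid, isbns in clusters.items() for isbn in isbns}


def lookup(isbn: str, table: dict[str, str], clusters: dict[str, list[str]]) -> list[str]:
    """Return the requested ISBN followed by the other ISBNs in its work cluster."""
    wanted = normalize(isbn)
    cid = table.get(wanted)
    if cid is None:
        return []  # assumption: unknown ISBNs yield an empty list
    others = [normalize(i) for i in clusters[cid] if normalize(i) != wanted]
    return [wanted, *others]


def to_xml(isbns: list[str]) -> str:
    """Serialize in the shape of the sample <idlist> response."""
    lines = ['<?xml version="1.0" encoding="UTF-8" ?>', "<idlist>"]
    lines += [f"<isbn>{escape(i)}</isbn>" for i in isbns]
    lines.append("</idlist>")
    return "\n".join(lines)


# Hypothetical cluster table with a single work cluster.
clusters = {"eucalyptus": ["1875847634", "1860464947", "1860464955", "963859313X"]}
table = build_table(clusters)
print(to_xml(lookup("1-86046-494-7", table, clusters)))
```

Precomputing the table once from the work-set clusters keeps each request to a dictionary lookup, which suits a service designed for machine-to-machine traffic.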
xISBN
Eucalyptus / Murray Bail 1998 Melbourne : Text Pub.
ISBN: 1875847634
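[Diagram: the OCLC FRBR work-set algorithm feeds the xISBN table builder, which groups ISBNs into work clusters (work cluster 1: ISBNs 1, 2, 3; work cluster 2: ISBNs 5, 6, 7; work cluster 3: ISBNs 8, 9, 10); the xISBN server answers a request such as http://labs.oclc.org/xisbn/1875847634 with the ISBNs in the matching cluster.]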
<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
<isbn>1875847634</isbn>
<isbn>1860464947</isbn>
<isbn>1860464955</isbn>
<isbn>963859313x</isbn>
<isbn>2221087615</isbn>
<isbn>9532060065</isbn>
<isbn>9657120055</isbn>
</idlist>
Eucalyptus 1998 Melbourne : Text Pub.
Eucalyptus 1998 London : Harvill Press
Eucalyptus 1999 London : Panther
Eukaliptusz 1999 Budapest : Ulpius-ház [Hungarian]
Eucalyptus 1999 Paris : R. Laffont [French]
Eukaliptus 1999 Zagreb : Meandar [Croatian]
Ekaliptus 2001 Tel Aviv : Hargol [Hebrew]
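A client needs only to build the request URL and read the `<isbn>` elements out of the response. This Python sketch uses the standard library and the URL pattern shown on the slide; `fetch_editions` is included for completeness but not exercised, since the 2004 experimental endpoint may no longer respond.

```python
import urllib.request
import xml.etree.ElementTree as ET

XISBN_BASE = "http://labs.oclc.org/xisbn/"  # experimental endpoint shown on the slide


def xisbn_url(isbn: str) -> str:
    return XISBN_BASE + isbn


def parse_idlist(xml_bytes: bytes) -> list[str]:
    """Extract the ISBNs from an <idlist> response, in document order."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter("isbn") if el.text]


def fetch_editions(isbn: str, timeout: float = 10.0) -> list[str]:
    """Query the service over HTTP (the prototype endpoint may be gone)."""
    with urllib.request.urlopen(xisbn_url(isbn), timeout=timeout) as response:
        return parse_idlist(response.read())


# The sample response for Eucalyptus, as bytes so the UTF-8 declaration is honored.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
<isbn>1875847634</isbn>
<isbn>1860464947</isbn>
<isbn>1860464955</isbn>
<isbn>963859313x</isbn>
<isbn>2221087615</isbn>
<isbn>9532060065</isbn>
<isbn>9657120055</isbn>
</idlist>"""

print(parse_idlist(SAMPLE))
```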
Searching for the book on Amazon
LibraryLookup bookmarklet
[Diagram: LibraryLookup takes the single ISBN from the Amazon page URL http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/qid=1075134526/sr=11/ref=sr_1_10_1/202-6426661-8213436 and asks the library catalog "Is the book at my library?"]
xISBN bookmarklet
[Diagram: the same ISBN is first sent to the xISBN server, which adds the other ISBNs in the work cluster; LibraryLookup then checks the library catalog for all of them: "Is the book at my library?"]
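The two bookmarklets differ by one step between reading the ISBN from the page and searching the catalog. This Python sketch models that flow; the real bookmarklets run as JavaScript in the browser, and both the OPAC search template and the `expand` callable (standing in for an xISBN lookup) are hypothetical.

```python
import re
from typing import Callable, Optional
from urllib.parse import quote

# A 10-character ISBN following /ASIN/ in an Amazon product URL.
ASIN_RE = re.compile(r"/ASIN/([0-9]{9}[0-9Xx])")


def isbn_from_amazon_url(url: str) -> Optional[str]:
    match = ASIN_RE.search(url)
    return match.group(1).lower() if match else None


def library_links(url: str, expand: Callable[[str], list[str]], opac_template: str) -> list[str]:
    """Turn the ISBN on a bookseller page into one OPAC search URL per edition."""
    isbn = isbn_from_amazon_url(url)
    if isbn is None:
        return []
    # Fall back to the single ISBN, as the plain LibraryLookup bookmarklet uses.
    isbns = expand(isbn) or [isbn]
    return [opac_template.format(isbn=quote(i)) for i in isbns]


OPAC = "https://opac.example.org/search?isbn={isbn}"  # hypothetical catalog search URL
url = ("http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/"
       "qid=1075134526/sr=11/ref=sr_1_10_1/202-6426661-8213436")
single = library_links(url, lambda isbn: [], OPAC)
expanded = library_links(url, lambda isbn: [isbn, "1875847634", "1860464947"], OPAC)
print(single)
print(expanded)
```

Passing the expansion step in as a function keeps the catalog-search logic identical for both bookmarklets; only the list of ISBNs changes.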
OCLC production plans

FRBR in FirstSearch (end-user searching)
– End of 2004, as part of a broader searching enhancement
– Present users with the view most relevant to them (work, manifestation, …)

FRBR and cataloging
– Interested in potential for ‘FRBRization’ services
– Use FRBR as aid to finding cataloging copy
– FRBR view of cataloging yet to be discussed.
Some issues




Data. Variations in cataloging practice, and errors or omissions in transcription and input, lead to false clusters.
Systems. Support in library management and other systems.
Agreement and shared practice. Theoretical discussion needs to be informed by practice. The detail!
Communications format. How to share works, etc. Different internal implementations.
Further information
www.oclc.org/research
Projects
Publications
ResearchWorks (soon)
Software (algorithm)