Transcript ppt

Making data work harder
Lorcan Dempsey
OCLC Members Council
17 May 2005
May 2005 Members’ Council
Web hub services                          | OWC            | Presentation | Examples
A comprehensive discovery experience      | Yes            |              |
Predictable, often immediate, fulfilment  | In progress    |              |
Data works hard                           | Being improved | Yes          | Curioser, FAST
Open to intermediate consumers            | In progress    |              |
Co-created with users                     | Not yet        | Yes          | WorldCat Wiki
Making data work hard
• The user experience: from search to rich browse
• Capturing user contribution
• Data mining
Context: value
• Amazoogle (Amazon and Google): we can add significant value. We should be looking for organizational frameworks within which we can do this.
• ROI: libraries invest in data but do not extract as much value from it as they might. Unless we release more value, the argument for this investment becomes weaker.
• The user experience
• Management intelligence

Top Sets for Fiction (Records)
Records | Key
  1,296 | defoe, daniel\1661 1731/robinson crusoe
  1,267 | carroll, lewis\1832 1898/alices adventures in wonderland
    971 | cervantes saavedra, miguel de\1547 1616/don quixote
    828 | stevenson, robert louis\1850 1894/treasure island
    689 | twain, mark\1835 1910/adventures of huckleberry finn
    624 | twain, mark\1835 1910/adventures of tom sawyer
    618 | swift, jonathan\1667 1745/gullivers travels
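The keys in the table group records into work sets using a normalized "author\dates/title" string. A minimal sketch of that grouping follows, assuming a deliberately simple, hypothetical normalization (lowercase, drop apostrophes, turn hyphens into spaces); it reproduces the shape of the keys shown but is not OCLC's actual work-set algorithm.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, drop apostrophes, hyphens to spaces."""
    text = text.lower()
    text = text.replace("'", "").replace("\u2019", "")  # "Alice's" -> "alices"
    text = text.replace("-", " ")                         # "1661-1731" -> "1661 1731"
    return " ".join(text.split())                         # collapse runs of whitespace


def work_key(author: str, dates: str, title: str) -> str:
    """Build an author\\dates/title key in the style of the table above."""
    return f"{normalize(author)}\\{normalize(dates)}/{normalize(title)}"


def top_sets(records, n=10):
    """Group (author, dates, title) records by work key; return the n largest sets."""
    counts = Counter(work_key(*record) for record in records)
    return counts.most_common(n)
```

Records that differ only in capitalization or punctuation of the heading collapse into one set, which is what lets a count such as 1,296 accumulate under a single key.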
FRBR & FAST
• FRBR
  • ‘Interim FRBR’ in OWC
  • FRBR in research projects: FictionFinder, Curioser, xISBN, Algorithm, Top 1000
  • FRBR in FirstSearch – late this year
  • Curioser …
• FAST
  • Moving FAST headings into OpenWorldCat
  • Experiment: mapping Yahoo! categories to FAST headings
Recognized value …
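xISBN-style services build on FRBR grouping: given one ISBN, return the other ISBNs in the same work set. A minimal in-memory sketch, assuming each edition already carries a work key (the class name and placeholder identifiers are hypothetical, not the xISBN interface):

```python
from collections import defaultdict


class WorkSetIndex:
    """Hypothetical index from ISBNs to FRBR-style work sets."""

    def __init__(self, editions):
        # editions: iterable of (isbn, work_key) pairs
        self.key_of = {}                 # isbn -> work key
        self.members = defaultdict(set)  # work key -> all isbns in that set
        for isbn, key in editions:
            self.key_of[isbn] = key
            self.members[key].add(isbn)

    def related(self, isbn):
        """Other ISBNs sharing this ISBN's work key (empty if unknown)."""
        key = self.key_of.get(isbn)
        if key is None:
            return []
        return sorted(self.members[key] - {isbn})
```

The heavy lifting is in assigning work keys; once that is done, the lookup itself is a simple two-step dictionary traversal.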
Wiki in WorldCat
• Capture user input in structured ways
Extending Wiki’s utility
Feature             | Wiki                | MetaWiki
Supported markup    | wikitext            | wikitext; structured data (e.g., MARC, METS, DC…)
Page editing        | a single text block | a single text block, or field level
Searches            | full text searching | full text searching; fielded searching
Collections managed | one per wiki        | one or multiple per MetaWiki
MetaWiki is built on top of standards (OAI, OpenURL, SRU).
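The practical difference between the two columns is field-level editing and fielded searching. A minimal sketch, assuming a hypothetical page model with a free-text block plus named fields (the Dublin Core-style field names are illustrative only; this is not the MetaWiki implementation):

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical MetaWiki-style page: a text block plus named fields."""
    title: str
    text: str = ""
    fields: dict = field(default_factory=dict)  # e.g. {"dc:creator": "..."}

    def edit_field(self, name, value):
        # Field-level edit: change one element without rewriting the page
        self.fields[name] = value


def full_text_search(pages, term):
    """Match the term anywhere: in the text block or in any field value."""
    term = term.lower()
    return [p.title for p in pages
            if term in p.text.lower()
            or any(term in str(v).lower() for v in p.fields.values())]


def fielded_search(pages, name, term):
    """Match the term only within one named field."""
    term = term.lower()
    return [p.title for p in pages
            if term in str(p.fields.get(name, "")).lower()]
```

A full-text search for "defoe" would also hit a review that merely mentions Defoe, while a fielded search on the creator field returns only pages where Defoe is the recorded creator.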
Management intelligence: data mining
• Data
  • Bibliographic data
  • Transaction logs
  • …
• We need to mine this data for intelligence that creates value for libraries and users
• OCLC Research is undertaking a number of data-mining projects aimed at:
  • Knowing more about the characteristics of library collections
  • Creating interesting and useful data displays
  • Generating intelligence to support library decision-making
Know Your Audience!
Holdings represent selection decisions by librarians … which implies there are about 1 billion individual selection decisions in the WorldCat holdings file
Selections are made to serve the interests of a library’s target community …
• Associate target community (audience level) with particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school …
Implies: we can infer materials’ audience level from holdings patterns, which in turn can support:
• Collection management
• Readers’ advisory services
• Reference services
• Information retrieval
Paper forthcoming!
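One way to turn holdings patterns into an audience level is to give each library profile a weight and average the weights over the libraries holding an item. The sketch below assumes hypothetical, illustrative weights; the actual method is the subject of the forthcoming paper.

```python
# Hypothetical weights: closer to 1.0 means a more specialist/research profile.
# Illustrative values only, not taken from the forthcoming paper.
PROFILE_WEIGHTS = {
    "k12": 0.0,
    "public": 0.33,
    "academic_non_arl": 0.67,
    "arl": 1.0,
}


def audience_level(holding_profiles):
    """Average the profile weights of the libraries that hold an item.

    Low values suggest a general or juvenile audience, high values a
    research audience. Profiles missing from PROFILE_WEIGHTS are skipped;
    returns None if no holding has a known profile.
    """
    weights = [PROFILE_WEIGHTS[p] for p in holding_profiles if p in PROFILE_WEIGHTS]
    if not weights:
        return None
    return sum(weights) / len(weights)
```

An item held mostly by public and school libraries scores low; one held only by ARL libraries scores 1.0. The score could then feed the collection management, readers’ advisory, reference, and retrieval uses listed above.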
The Implications of Google Libraries …
• Potentially covers about one third of print books in WorldCat
• ~60 percent of total Google 5 (G5) books held by only one of the Google 5
• Less than 5 percent held by all of the Google 5
• ~20 percent of total G5 print books out of copyright
Paper forthcoming …
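The G5 overlap figures come down to counting, for each title, how many of the five libraries hold it. A minimal sketch, assuming each library’s holdings are available as a set of record identifiers (the function name and input shape are hypothetical, not the method used in the forthcoming paper):

```python
from collections import Counter


def overlap_profile(holdings):
    """holdings: dict mapping library name -> set of record identifiers.

    Returns the size of the combined (union) collection and the shares of it
    held by exactly one library and by all libraries.
    """
    counts = Counter()
    for records in holdings.values():
        counts.update(records)  # +1 for each library holding the record
    total = len(counts)         # distinct records across all libraries
    if total == 0:
        return {"union": 0, "share_unique": 0.0, "share_all": 0.0}
    n_libs = len(holdings)
    held_by_one = sum(1 for c in counts.values() if c == 1)
    held_by_all = sum(1 for c in counts.values() if c == n_libs)
    return {
        "union": total,
        "share_unique": held_by_one / total,
        "share_all": held_by_all / total,
    }
```

Dividing the union size by the number of print books in WorldCat would give a coverage figure comparable to the "one third" estimate.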
“Last Copy”: Identifying At-Risk Materials
~23 million WorldCat records have only a single holding attached
Libraries need to know what portions of their collections are:
• Rare … Rare and valuable …
• “Last copy” (artifact and/or content)
Identification of rare materials is essential intelligence in support of storage, digitization, and preservation decision-making
Data-mining study of Vanderbilt holdings in WorldCat:
• Identified 23,000 items held uniquely by Vanderbilt
• ~60 % are print books
• ~60 % produced prior to 1950; ~25 % produced after 1970
Paper forthcoming!
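A study like the Vanderbilt one can be approximated in two steps: select records whose only holding belongs to the library, then profile them by format and date. A minimal sketch, assuming a hypothetical record shape and placeholder library symbols (not the WorldCat record structure):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, simplified WorldCat-style record."""
    oclc_number: int
    fmt: str                # e.g. "print book", "map" (illustrative values)
    year: Optional[int]     # publication year, if known
    holders: frozenset      # symbols of libraries with holdings attached


def uniquely_held(records, library):
    """Records whose only holding is the given library ("last copy")."""
    return [r for r in records if r.holders == frozenset({library})]


def profile(records):
    """Share of records by format and date; undated records count as neither."""
    n = len(records)
    if n == 0:
        return {}
    dated = [r for r in records if r.year is not None]
    return {
        "count": n,
        "share_print_books": sum(r.fmt == "print book" for r in records) / n,
        "share_pre_1950": sum(r.year < 1950 for r in dated) / n,
        "share_post_1970": sum(r.year > 1970 for r in dated) / n,
    }
```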
Looking at Library Print Book Collections … Systematically
OCLC/Ithaka collaboration: use WorldCat to characterize the “system-wide” print book collection, i.e., aggregate print book holdings in WorldCat
• 32 million print books, representing 26 million distinct works
• Only about 120,000 works had both print book and e-book manifestations
• Half of print books published after 1977; more than 80% still “in copyright”
• Rareness is common! Only a third of print books have more than five holdings; half have two or fewer
Intelligence of this kind can help establish digitization priorities and inform preservation planning
More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt
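Statements such as "half published after 1977" and "only a third have more than five holdings" are simple distribution statistics over per-book holdings counts and publication years. A minimal sketch, assuming a hypothetical input of (holdings_count, publication_year) pairs, one per print book:

```python
from statistics import median


def describe_collection(items):
    """items: list of (holdings_count, publication_year) pairs, one per print book.

    Publication year may be None; undated books are left out of the median.
    """
    if not items:
        raise ValueError("no items to describe")
    counts = [c for c, _ in items]
    years = [y for _, y in items if y is not None]
    n = len(items)
    return {
        "share_more_than_5_holdings": sum(c > 5 for c in counts) / n,
        "share_2_or_fewer_holdings": sum(c <= 2 for c in counts) / n,
        # Half of the dated books were published after this year
        "median_publication_year": median(years) if years else None,
    }
```

The median publication year is the natural statistic behind a "half published after" claim, since it splits the dated books into two equal halves.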
Thank you!
OCLC Research:
http://www.oclc.org/research/