Prospecting in the library data mines


OCLC Programs & Research
Prospecting in the library data mines
Brian Lavoie
Consulting Research Scientist
OCLC Programs & Research
Annual Partners Meeting
Washington, DC
June 4, 2007
Making data work harder
• Data is an asset
  - Informs planning and decision-making
  - Drives new forms of services
• Libraries have many data assets
  - Bibliographic, holdings, usage, reference inquiries, …
• Opportunities to collect data increase in network spaces …
  - Web site traffic, click-through patterns, e-usage, …
• Make data work harder
  - Use library data in innovative ways to create value
Data mining & OCLC Research
• Networks of collaboration and coordination
  - Decisions taken in “system-wide context”
  - Focus on resources of “system”
  - Mass digitization, cooperative print storage, shared discovery environments, …
• As library networks develop and expand, opportunities arise to create value through:
  - Collective action
  - Aligning local collections with system-wide environment
• Data is context
• Research area focused on data mining activities
  - Aggregate collections
  - “System-wide collection” (as represented in WorldCat)
Managing the collective collection
• Mass digitization
• “Last copies”
• Long tail
Mass digitization
Google Book Search (aka Google Print for Libraries)
Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
Data-mining to provide empirical context to inform communitywide dialog
http://www.dlib.org/dlib/september05/lavoie/09lavoie.html
“Rareness is common”
System-wide print book collection: ~32 million print books

Number of holding libraries    Share of print books
1                              37%
2 - 5                          30%
6 - 25                         20%
26 - 50                        5%
51 - 100                       3%
> 100                          5%

Identify rare & unique materials in system-wide collection (“last copies”)
Data-mining to better understand nature of the “collective collection”
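On the slide's own figures, 37% of roughly 32 million is about 11.8 million print books held by a single library. A breakdown of this kind could be computed along the following lines. This is only a minimal sketch: the bucket boundaries are taken from the slide, but the function names and the sample holdings counts are hypothetical, and nothing here reflects how the WorldCat analysis was actually run.

```python
from collections import Counter

# Bucket boundaries as shown on the slide: (label, low, high).
# high=None marks the open-ended top bucket.
BUCKETS = [
    ("1", 1, 1),
    ("2 - 5", 2, 5),
    ("6 - 25", 6, 25),
    ("26 - 50", 26, 50),
    ("51 - 100", 51, 100),
    ("> 100", 101, None),
]


def bucket_label(holdings):
    """Return the slide's bucket label for one record's holdings count."""
    for label, low, high in BUCKETS:
        if holdings >= low and (high is None or holdings <= high):
            return label
    raise ValueError(f"holdings count must be at least 1, got {holdings}")


def holdings_distribution(holdings_counts):
    """Share of records falling into each holdings bucket."""
    counts = Counter(bucket_label(h) for h in holdings_counts)
    total = len(holdings_counts)
    return {label: counts[label] / total for label, _, _ in BUCKETS}


# Hypothetical holdings counts, one per record (not real WorldCat data).
sample = [1, 1, 1, 2, 4, 7, 30, 75, 150, 3]
dist = holdings_distribution(sample)
for label, share in dist.items():
    print(f"Held by {label}: {share:.0%}")
```

Records in the "1" bucket are the candidates for "last copy" status in the sense used on the slide.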
The Library Long Tail
[Chart: number of holdings (using holdings as measure of popularity), plotted against items ranked by system-wide popularity]
HEAD: Top 10% of WorldCat records (ranked by holdings) account for 80% of total WorldCat holdings. Small proportion of items account for lion’s share of collecting activity.
LONG TAIL: Bottom 90% of WorldCat records (ranked by holdings) account for 20% of total WorldCat holdings. Everything else spread out across Long Tail of diffuse collecting activity.
Data-mining to inform strategies/policies aimed at optimizing
system-wide supply & demand for library materials
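The head/tail split can be illustrated with a short calculation: rank records by holdings, take the top 10%, and compare their holdings to the total. The sketch below assumes one holdings count per record. The function name and the sample data are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def head_share(holdings_counts, head_fraction=0.10):
    """Fraction of all holdings accounted for by the most widely held records.

    holdings_counts: one holdings count per record.
    head_fraction: share of records (ranked by holdings) treated as the head.
    """
    ranked = sorted(holdings_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no holdings to analyze")
    # Always keep at least one record in the head.
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / total


# Hypothetical holdings counts for ten records (not real WorldCat data).
sample = [80, 5, 4, 3, 2, 2, 1, 1, 1, 1]
share = head_share(sample)
print(f"Top 10% of records hold {share:.0%} of all holdings")
print(f"Bottom 90% of records hold {1 - share:.0%} of all holdings")
```

Varying `head_fraction` shows how quickly the share of holdings falls off along the tail.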
Others …
• Registry of Copyright Evidence
• New York Art Museum study
Shared print storage
• Use library data to inform decision-making:
  - Data about library assets (bibliographic)
  - Data about choices involving these assets (holdings, circ., ILL)
  - System-wide aggregation (larger aggregation = richer context)
• Shared print storage decision-making:
  - Data about assets (local inventories of print materials)
  - Data about system-wide availability (holdings)
  - Data about usage (local & system-wide)
• Role of Research:
  - Data collection
  - Data-mining analysis in support of project needs
  - Inform community dialog on shared print storage issues
  - Analyze “collective collection” in shared print context
  - Support development of effective print storage strategies
  - Standardize analysis to maximize applicability/re-use
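One way to picture how the three kinds of data named above (local inventory, system-wide holdings, usage) might be combined is sketched below. The slides do not define any decision rule, so everything here is an assumption made for illustration: the `Item` record, the function names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One title in a local print inventory (hypothetical record layout)."""
    title_id: str
    systemwide_holdings: int  # number of libraries holding the title
    local_circulations: int   # local checkouts over some review period


def storage_candidates(inventory, min_holdings=25, max_circulations=0):
    """Titles widely held elsewhere and little used locally.

    The thresholds are illustrative assumptions, not recommendations.
    """
    return [
        item.title_id
        for item in inventory
        if item.systemwide_holdings >= min_holdings
        and item.local_circulations <= max_circulations
    ]


def last_copies(inventory, max_holdings=1):
    """Titles held by so few libraries that they may be 'last copies'."""
    return [
        item.title_id
        for item in inventory
        if item.systemwide_holdings <= max_holdings
    ]


# Hypothetical local inventory.
inventory = [
    Item("A", systemwide_holdings=120, local_circulations=0),
    Item("B", systemwide_holdings=1, local_circulations=0),
    Item("C", systemwide_holdings=60, local_circulations=14),
    Item("D", systemwide_holdings=30, local_circulations=0),
]

print("Storage candidates:", storage_candidates(inventory))
print("Possible last copies:", last_copies(inventory))
```

Keeping the thresholds as parameters fits the slide's point about standardizing the analysis so it can be re-used across projects with different local policies.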