Prospecting in the library data mines
OCLC Programs & Research
Prospecting in the library data mines
Brian Lavoie
Consulting Research Scientist
OCLC Programs & Research
Annual Partners Meeting
Washington, DC
June 4, 2007
Making data work harder
Data is an asset
Informs planning and decision-making
Drives new forms of services
Libraries have many data assets
Bibliographic, holdings, usage, reference inquiries, …
Opportunities to collect data increase in network spaces …
Web site traffic, click-through patterns, e-usage, …
Make data work harder
Use library data in innovative ways to create value
Data mining & OCLC Research
Networks of collaboration and coordination
Decisions taken in “system-wide context”
Focus on resources of “system”
Mass digitization, cooperative print storage, shared discovery environments, …
As library networks develop and expand,
opportunities arise to create value through:
Collective action
Aligning local collections with system-wide environment
Data is context
Research area focused on data mining activities
Aggregate collections
“System-wide collection” (as represented in WorldCat)
Managing the collective collection
Mass digitization
“Last copies”
Long tail
Mass digitization
Google Book Search (aka Google Print for Libraries)
Aggregate collection of digitized print books (combined holdings of Harvard, Michigan, Oxford, NYPL, and Stanford)
Data-mining to provide empirical context to inform community-wide dialog
http://www.dlib.org/dlib/september05/lavoie/09lavoie.html
“Rareness is common”
System-wide print book collection: ~32 million print books
Held by 1: 37%
Held by 2 - 5: 30%
Held by 6 - 25: 20%
Held by 26 - 50: 5%
Held by 51 - 100: 3%
Held by > 100: 5%
Identify rare & unique materials in system-wide collection (“last copies”)
Data-mining to better understand nature of the “collective collection”
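The banding on this slide can be reproduced from any list of per-title holdings counts. The following is a minimal sketch, not the analysis actually run against WorldCat: the band boundaries are copied from the slide, while the function names and the sample counts are assumptions made up for illustration.

```python
from collections import Counter

# Band boundaries copied from the slide; None means "no upper limit".
BANDS = [
    (1, 1, "1"),
    (2, 5, "2 - 5"),
    (6, 25, "6 - 25"),
    (26, 50, "26 - 50"),
    (51, 100, "51 - 100"),
    (101, None, "> 100"),
]


def band_label(holdings):
    """Return the slide's band label for one title's holdings count."""
    for low, high, label in BANDS:
        if holdings >= low and (high is None or holdings <= high):
            return label
    raise ValueError(f"holdings count must be at least 1, got {holdings}")


def holdings_distribution(holdings_counts):
    """Percentage of titles falling in each band, in slide order."""
    counts = Counter(band_label(h) for h in holdings_counts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no titles supplied")
    return {label: 100.0 * counts[label] / total for _, _, label in BANDS}


# Made-up holdings counts for ten titles (not WorldCat data).
sample = [1, 1, 1, 1, 2, 3, 10, 30, 75, 200]
for label, share in holdings_distribution(sample).items():
    print(f"Held by {label}: {share:.0f}%")
```

Titles landing in the "1" band are the candidates for “last copy” review; the other bands give the context for how widely everything else is held.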
The Library Long Tail
Chart: Number of Holdings (using holdings as measure of popularity) plotted against items ranked by system-wide popularity
HEAD: Top 10% of WorldCat records (ranked by holdings) account for 80% of total WorldCat holdings
LONG TAIL: Bottom 90% of WorldCat records (ranked by holdings) account for 20% of total WorldCat holdings
HEAD: Small proportion of items account for lion’s share of collecting activity
LONG TAIL: Everything else spread out across Long Tail of diffuse collecting activity
Data-mining to inform strategies/policies aimed at optimizing system-wide supply & demand for library materials
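The head/tail split on this slide is a concentration measure: rank records by holdings, take the top fraction, and ask what share of all holdings they account for. A minimal sketch of that calculation follows; the function name `head_share`, the rounding rule for the head size, and the sample counts are all assumptions for illustration, not the method used for the WorldCat figures.

```python
def head_share(holdings_counts, head_fraction=0.10):
    """Share of total holdings accounted for by the top-ranked records.

    Records are ranked by holdings count, largest first. The head is the
    top `head_fraction` of records, rounded to a whole number of records
    with a minimum of one (an assumed convention).
    """
    ranked = sorted(holdings_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("total holdings must be greater than zero")
    head_size = max(1, round(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / total


# Made-up holdings counts for ten records (not WorldCat data).
sample = [80, 5, 5, 2, 2, 2, 1, 1, 1, 1]
share = head_share(sample)
print(f"Head (top 10% of records): {share:.0%} of holdings")
print(f"Long tail (bottom 90%):    {1 - share:.0%} of holdings")
```

Changing `head_fraction` shows how quickly the share falls off as the head is widened.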
Others …
Registry of Copyright Evidence
New York Art Museum study
Shared print storage
Use library data to inform decision-making:
Data about library assets (bibliographic)
Data about choices involving these assets (holdings, circ., ILL)
System-wide aggregation (larger aggregation = richer context)
Shared print storage decision-making:
Data about assets (local inventories of print materials)
Data about system-wide availability (holdings)
Data about usage (local & system-wide)
Role of Research:
Data collection
Data-mining analysis in support of project needs
Inform community dialog on shared print storage issues
Analyze “collective collection” in shared print context
Support development of effective print storage strategies
Standardize analysis to maximize applicability/re-use
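The three kinds of data named for shared print storage decision-making (local inventory, system-wide holdings, usage) can be combined into a simple screening rule. The sketch below is purely illustrative: the record fields, the thresholds, and the category labels are assumptions, and a real storage policy would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class PrintItem:
    title_id: str            # hypothetical local identifier
    system_holdings: int     # libraries in the system holding the title
    local_circulations: int  # local loans during the review period


def classify(item, widely_held=6, low_use=0):
    """Screen one local item using assumed, illustrative thresholds."""
    # Held nowhere else in the system: treat as a possible "last copy".
    if item.system_holdings <= 1:
        return "last copy: retain"
    # Widely available elsewhere and little local use: storage candidate.
    if item.system_holdings >= widely_held and item.local_circulations <= low_use:
        return "storage candidate"
    return "keep on site"


# Made-up local inventory (not real library data).
inventory = [
    PrintItem("A", system_holdings=1, local_circulations=0),
    PrintItem("B", system_holdings=40, local_circulations=0),
    PrintItem("C", system_holdings=40, local_circulations=12),
]
for item in inventory:
    print(item.title_id, classify(item))
```

Keeping the thresholds as parameters fits the slide's point about standardizing the analysis so it can be re-used across projects with different policies.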