What usage statistics say about online user behaviour
Philip Davis, Librarian
Cornell University
Presented at the 5th Fiesole Collection Development Retreat, Oxford University, July 24, 2003.
What do these stats mean?
A. Monthly Statistics by Journal
Subscribed Journal Usage

Journal   Jan 2002   Feb 2002   Mar 2002   Apr 2002   May 2002   Jun 2002   Total Use
A              100        131        136        193         80         56         696
B                0          0          0          0         10          0          10
C                8          6         39          5         88        368         514
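As a minimal sketch, the totals in the table above can be recomputed from the monthly counts, together with the share of each journal's use that falls in its busiest month. The function and variable names are illustrative and do not come from the talk.

```python
# Monthly use for Jan-Jun 2002, copied from the table above.
monthly_use = {
    "A": [100, 131, 136, 193, 80, 56],
    "B": [0, 0, 0, 0, 10, 0],
    "C": [8, 6, 39, 5, 88, 368],
}


def summarize(counts):
    """Return (total use, share of the total contributed by the busiest month)."""
    total = sum(counts)
    peak_share = max(counts) / total if total else 0.0
    return total, peak_share


summary = {journal: summarize(counts) for journal, counts in monthly_use.items()}

for journal, (total, peak_share) in summary.items():
    print(f"Journal {journal}: total {total}, busiest month = {peak_share:.0%} of use")
```

Journal C's total of 514 comes mostly from a single month, and all of Journal B's use comes from one month, which the totals alone do not reveal.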
Usage statistics do not tell us…
- what is being downloaded
- who did the downloading
- why an article was downloaded
- how many individuals are responsible for the statistics
Why we can’t know everything
- Patron confidentiality
- Use IP address as a surrogate for “user”
- Some IPs represent aggregate users
  - Library proxy server
  - Public computers in libraries and labs
  - Dial-in modem users
- Some IPs are assigned dynamically
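Using the IP address as a surrogate for "user" can be sketched roughly as follows: group download records by address, then count downloads and distinct journals for each one. The record layout and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and are not the format of the actual logs.

```python
from collections import defaultdict

# Hypothetical download records: (ip_address, journal). Made-up sample data.
records = [
    ("192.0.2.10", "Journal A"),
    ("192.0.2.10", "Journal A"),
    ("192.0.2.10", "Journal C"),
    ("192.0.2.25", "Journal B"),
    ("192.0.2.99", "Journal A"),  # could be a proxy server standing in for many people
    ("192.0.2.99", "Journal B"),
    ("192.0.2.99", "Journal C"),
    ("192.0.2.99", "Journal C"),
]


def per_ip_usage(rows):
    """Map each IP to (number of downloads, number of distinct journals)."""
    downloads = defaultdict(int)
    journals = defaultdict(set)
    for ip, journal in rows:
        downloads[ip] += 1
        journals[ip].add(journal)
    return {ip: (downloads[ip], len(journals[ip])) for ip in downloads}


usage = per_ip_usage(records)
for ip, (n_downloads, n_journals) in sorted(usage.items()):
    print(ip, n_downloads, n_journals)
```

An address that fronts a proxy server or a public computer still counts as a single "user" in this tally, which is the limitation listed above.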
Results of two studies (ACS)
- Article downloads by IP address
  - Previous studies have reported only aggregate use analysis
- How scientists find the articles they read
  - Using referral URL data: the location from which users were referred to the ACS site
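A minimal sketch of working with referral URL data: extract the referring host from each URL and tally referrals per host. The URLs below are invented examples on reserved example domains, and the study's actual classification rules are not given in these slides.

```python
from collections import Counter
from urllib.parse import urlsplit

# Invented referral URLs, for illustration only.
referrers = [
    "http://catalog.library.example.edu/record=b1234",
    "http://catalog.library.example.edu/record=b9876",
    "http://search.example.com/results?q=acs+journals",
    "http://www.example.org/chem/course101.html",
]


def referring_host(url):
    """Return the lower-cased host name of a referral URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


host_counts = Counter(referring_host(u) for u in referrers)
for host, n in host_counts.most_common():
    print(host, n)
```

Grouping hosts into referral types (catalog, database, web search, and so on) would need a hand-built mapping on top of this step.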
Most users download few articles
[Figure: histogram. x-axis: Number of Downloads (1 to 20); y-axis: 0 to 300.]
…from few journals
[Figure: histogram. x-axis: Number of Journals (1 to 30); y-axis: 0 to 800.]
The relationship is quadratic
[Figure: scatter plot. x-axis: Number of Journals (0 to 30); y-axis: 0 to 1200. Each point represents a “user”; one point is labeled Library Proxy Server. N = 1283, Rsq = 0.6798.]
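To illustrate what a quadratic fit with an R-squared value involves, here is a minimal sketch that fits downloads = a * journals^2 by least squares. The one-parameter form through the origin is an assumption, since the slide does not state the fitted equation, and the data points are made up rather than taken from the ACS logs.

```python
# Made-up (journals, downloads) pairs, one per IP address; not the ACS data.
points = [(1, 2), (2, 5), (3, 8), (5, 30), (8, 60), (10, 110)]


def fit_quadratic_through_origin(data):
    """Least-squares fit of y = a * x**2; returns (a, r_squared)."""
    sum_x2y = sum(x * x * y for x, y in data)
    sum_x4 = sum(x ** 4 for x, _ in data)
    a = sum_x2y / sum_x4
    mean_y = sum(y for _, y in data) / len(data)
    ss_res = sum((y - a * x * x) ** 2 for x, y in data)  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for _, y in data)     # total variation
    return a, 1 - ss_res / ss_tot


a, r_squared = fit_quadratic_through_origin(points)
print(f"downloads ~ {a:.2f} * journals^2, R^2 = {r_squared:.3f}")
```

R-squared is computed here as one minus the ratio of residual to total variation; statistical packages may use a different convention for fits forced through the origin.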
In fact, it’s an inverse square law
[Figure: scatter plot. x-axis: Number of Journals (0 to 30); y-axis: 0 to 40. Annotation: x/y^2. One point is labeled Library Proxy Server. N = 1283, Rsq = 0.6858.]
Population size may be estimated
[Figure: scatter plot on logarithmic axes. x-axis: Number of Users. Each point represents a journal. Rsq = 0.9169.]
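One way to estimate population size from total use is a straight-line fit on logarithmic scales, sketched below. This assumes a power-law relationship between a journal's total downloads and its number of users; the fitted form used in the study is not given in the slides, and the data points are made up.

```python
import math

# Made-up (total downloads, number of users) pairs, one per journal.
journals = [(10, 4), (50, 15), (200, 45), (1000, 180), (5000, 700)]


def fit_log_log(data):
    """Ordinary least squares on log10 values; returns (intercept, slope)."""
    xs = [math.log10(d) for d, _ in data]
    ys = [math.log10(u) for _, u in data]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_users(total_downloads, intercept, slope):
    """Predict the number of users for a journal from its total downloads."""
    return 10 ** (intercept + slope * math.log10(total_downloads))


intercept, slope = fit_log_log(journals)
print(round(estimate_users(500, intercept, slope)))
```

With a fit like this, a journal's total downloads alone give a rough estimate of how many IP addresses used it.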
Analysis of individual use illustrates
- Most users download few articles from few journals
- A small number of users have a very large effect on total downloads
- User population size may be estimated by total use
Different paths to same destination

Referral Type        Total Referrals   Unique IPs   Referrals per IP
library catalog                2,482          552                4.5
bib database                   2,372          324                7.3
e-journal list                 1,813          405                4.5
web page                       1,108          190                5.8
web search                       996          491                2.0
email (web based)                592           79                7.5
article link                     571          204                2.8
other                             15            9                1.7
Total Referrals                9,949        1,591                6.3
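The referrals-per-IP column is the total referrals divided by the unique IPs for each referral type. A minimal sketch that recomputes it from the figures in the table above (the names are illustrative):

```python
# (total referrals, unique IPs) per referral type, copied from the table above.
referral_table = {
    "library catalog": (2482, 552),
    "bib database": (2372, 324),
    "e-journal list": (1813, 405),
    "web page": (1108, 190),
    "web search": (996, 491),
    "email (web based)": (592, 79),
    "article link": (571, 204),
    "other": (15, 9),
}


def referrals_per_ip(table):
    """Divide total referrals by unique IPs for each referral type."""
    return {kind: total / ips for kind, (total, ips) in table.items()}


ratios = referrals_per_ip(referral_table)
for kind, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{kind:18s} {ratio:.1f}")
```

The unique IP counts add up to more than the overall 1,591 because the same address can appear under several referral types.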
Web page referrals

Web Page Referral      Frequency   Percent
ACS Journal Web Page         366        33
News                         272        25
Department/lab               200        18
Faculty                       75         7
Course web page               43         4
Commercial                    31         3
Organization                  21         2
Personal                      19         2
Other                         81         7
Total                      1,108       100
Most users referred infrequently
[Figure: histogram. x-axis: Number of Referrals per IP (1 to 20); y-axis: Frequency (0 to 800).]
…from few sources
[Figure: histogram. x-axis: Number of Unique Domain Referrals per IP (1 to 10); y-axis: Frequency (0 to 1200).]
Yielding same inverse square law
[Figure: scatter plot. x-axis: Number of Domains per IP (0 to 30); y-axis: Total Referrals (0 to 500). Each point represents a “user”; one point is labeled library proxy server. N = 1591, Rsq = 0.4107.]
In summary
- Scientists will use many different pathways to the same literature
- But use few and consistent methods of referral
- Underestimated the use of e-mail and bookmarking as a source of referral
- Underestimated bibliographic indexes
- Overestimated importance of library catalog
Implications
Libraries
- Develop redundant tools to facilitate access to literature
Publishers
- Facilitate direct linking to articles
- Adopt linking standards
“Save the time of the reader”
-- S.R. Ranganathan, from the Five Laws of Library Science
P. Davis and L. Solla. An IP-level analysis of usage statistics for electronic journals in chemistry: Making inferences about user-behavior. JASIST 54(11), 2003, in press.
P. Davis. Information seeking behavior of scientists: a transaction log analysis of referral URLs. In review, JASIST, June 19, 2003.
http://people.cornell.edu/pages/pmd8/