Measuring Scholarly Communication on the Web
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
Bibliometric Analysis in Science and Research: Jülich 2003
By Olle Persson, Inforsk, Dep of Sociology Umeå, SE-901 87 Umeå
Contents
1. Data collection
2. Data processing
3. Analysis
4. Results
Why analyse scholarly
communication on the Web?
Ensure that the Web is efficiently used for
research communication
Identify trends in informal scholarly
communication
Suggest improvements in search tools
Exploratory research: the Web is important and a
valid object for scientific study
How can Web scholarly
communication be tracked?
Web server logs
Good source, but restricted to individual sites
Hyperlinks
Secondary source of information – few users actually create hyperlinks
Commercial search engines can be used for raw data about the ‘whole’ Web
Analogies with bibliometric citation studies
Hyperlink studies will be discussed in this article
Methodologies: Data collection
Web crawler
AltaVista advanced queries
link:fz-juelich.de AND host:ac.uk
AllTheWeb advanced queries
Google
Does not support the same level of Boolean
querying (even with the API)
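The query pattern on this slide can be generated for many site pairs at once. The sketch below only assembles query strings in the link:/host: form shown above; it does not contact any search engine, and the function name and the second host value are illustrative assumptions, not part of the original talk.

```python
def link_query(target_domain, source_host):
    """Build a query in the form shown on the slide:
    pages under source_host that link to target_domain."""
    return f"link:{target_domain} AND host:{source_host}"


# "ac.uk" comes from the slide; "edu" is a hypothetical second source domain.
source_hosts = ["ac.uk", "edu"]
queries = [link_query("fz-juelich.de", host) for host in source_hosts]

for query in queries:
    print(query)
```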
Methodologies: Data processing 1
Link counts to target universities
Colink counts
Inter-site links only
B and C are colinked
Couplings
D and E are coupled
[Diagram: example link graph with nodes A, B, C, D, E and F]
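As a rough illustration of colink and coupling counts, the sketch below works from a list of (source site, target site) links. The edge list is an assumption chosen to be consistent with the statements above (B and C colinked, D and E coupled); the actual edges of the slide's diagram are not recoverable from the transcript, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def colink_and_coupling(links):
    """links: iterable of (source_site, target_site) pairs.

    Returns two dicts keyed by alphabetically sorted site pairs:
    colinks[(x, y)]   = number of sites that link to both x and y
    couplings[(x, y)] = number of sites that both x and y link to
    """
    outlinks = defaultdict(set)   # source -> set of targets
    inlinkers = defaultdict(set)  # target -> set of sources
    for source, target in links:
        if source == target:
            continue  # inter-site links only: ignore links within a site
        outlinks[source].add(target)
        inlinkers[target].add(source)

    colinks = defaultdict(int)
    for targets in outlinks.values():
        for pair in combinations(sorted(targets), 2):
            colinks[pair] += 1

    couplings = defaultdict(int)
    for sources in inlinkers.values():
        for pair in combinations(sorted(sources), 2):
            couplings[pair] += 1

    return dict(colinks), dict(couplings)


# Assumed edges: A links to B and C; D and E both link to F; B links to itself.
example_links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F"), ("B", "B")]
colinks, couplings = colink_and_coupling(example_links)
print(colinks)    # B and C are colinked by A
print(couplings)  # D and E are coupled through F
```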
Methodologies: Data processing 2
Alternative Document Models
E.g. count links between domains (ignoring
multiple links) instead of pages
[Diagram: pages P1, P2 and P3 on www.wlv.ac.uk; pages P4, P5 and P6 on www.albany.edu]
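The difference between counting at the page level and at the domain level can be sketched as follows. The page URLs and the function name are invented for illustration (only the two host names appear on the slide), and this is one possible reading of the domain model, not necessarily the exact procedure used in the research.

```python
from urllib.parse import urlsplit


def count_links(page_links):
    """page_links: iterable of (source_url, target_url) pairs.

    Returns (page_level_count, domain_level_count), where the domain-level
    count treats multiple links between the same pair of domains as one.
    """
    page_level = 0
    domain_pairs = set()
    for source_url, target_url in page_links:
        source_domain = urlsplit(source_url).hostname
        target_domain = urlsplit(target_url).hostname
        if source_domain == target_domain:
            continue  # skip links that stay within one domain
        page_level += 1
        domain_pairs.add((source_domain, target_domain))
    return page_level, len(domain_pairs)


# Hypothetical page URLs on the two domains named on the slide.
example = [
    ("http://www.wlv.ac.uk/p1.html", "http://www.albany.edu/p4.html"),
    ("http://www.wlv.ac.uk/p2.html", "http://www.albany.edu/p5.html"),
    ("http://www.wlv.ac.uk/p3.html", "http://www.albany.edu/p5.html"),
    ("http://www.wlv.ac.uk/p1.html", "http://www.wlv.ac.uk/p2.html"),
]
pages, domains = count_links(example)
print(pages, domains)
```

With these assumed links, three page-level links collapse into a single domain-level link, which is why the choice of document model can change link counts considerably.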
Methodologies: Data analysis
Statistical techniques for evaluating results
Correlation with known research performance
measures
Factor analysis, Multi-Dimensional Scaling,
Cluster analysis for patterns
Simple graphical techniques
Techniques from Communication
Networks research / Geography
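One way to check link counts against a known research performance measure is a rank correlation. The sketch below implements Spearman's coefficient with the standard library only; the choice of Spearman is an assumption (the slide does not name a specific coefficient), and the numbers are made up purely to show the call, not real results.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented illustrative values, one entry per hypothetical university.
inlink_counts = [120, 340, 90, 560, 210]
research_scores = [2.1, 3.8, 2.5, 4.9, 3.0]
print(round(spearman(inlink_counts, research_scores), 3))
```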
Results
For inter-university linking, interdepartmental linking and journal Web
sites
Inter-university links
Counts of links to universities within a
country can correlate significantly with
measures of research productivity
Geographic factors also play a part
Links between universities are created for a
wide range of reasons, rarely to cite
research
Links to UK universities against their
research productivity
The reason for the
strong correlation is
the quantity of Web
publication, not its
quality
This is different to
citation analysis
Expected link counts against distance between UK
universities
Link Creation Motivations
For links between UK university Web sites
Link counts mainly represent a wide range of
types of informal scholarly communication
Fewer than 1% are equivalent to citations
About 90% are related to scholarly activity, including
research and teaching
The rest include recreational and administrative links
But not cognitive connections or online impact of the
research itself
It is difficult to interpret link count values
Journal Web Sites
Can Web versions of Journal
Impact Factors be calculated?
Web impact factors correlate with Journal
Impact Factors within a discipline
Also affected by Web site age and contents
This is complicated by the existence of
publishers’ digital libraries and digital
meta-libraries
Disciplinary Research Linking
In the US, links to chemistry and psychology
departments from other departments associate
with total research impact
No evidence of a significant geographic trend
Disciplinary differences in the extent of
interlinking: history Web use is very low
{Research with Rong Tang}
Linguistic Factors in EU
Communication
English the dominant language for Web sites in
the Western EU
In a typical country, 50% of pages are in the
national language(s) and 50% in English
Non-English-speaking countries interlink extensively in
English
{Research with Rong Tang}
Mapping patterns of international
communication
Counts of links between Asia-Pacific universities
are represented by arrow thickness.
{Research with
Alastair Smith,
VUW, NZ}
The Future
Mapping patterns of academic Web
communication
Individuals exploiting AltaVista to investigate
online perceptions of their site
Developing data mining tools to extract
information and predict based upon link patterns
Combining links with text-based approaches
(Computer Science)
Improved understanding of informal scholarly
communication on the Web
More effective use of the Web by scholars, e.g. via
PhD training