Analysing the link structures of
the Web sites of national
university systems
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
Why analyse university link
structures?



Obtain evidence of online impact of work
Identify trends in informal scholarly
communication
Basic research: the Web is important and a
valid object for scientific study
Methodology: Data collection


Web crawler
AltaVista advanced queries
host:york.ac.uk AND link:wlv.ac.uk


AllTheWeb advanced queries
Google

Does not support same level of Boolean
querying
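A minimal sketch of how the pairwise queries above could be generated for a list of university domains. The query syntax is copied from the slide example; the function name is hypothetical, and nothing here contacts a search engine.

```python
from itertools import permutations


def build_link_queries(domains):
    """Return {(source, target): query} for every ordered pair of distinct domains."""
    queries = {}
    for source, target in permutations(domains, 2):
        # Pages hosted at `source` that contain a link to `target`.
        queries[(source, target)] = f"host:{source} AND link:{target}"
    return queries


queries = build_link_queries(["york.ac.uk", "wlv.ac.uk"])
for pair, query in sorted(queries.items()):
    print(pair, query)
```

With n domains this produces n(n-1) queries, one per direction of each pair.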
Methodology: Data analysis 1

Link counts to target universities


Inter-site links only
Colink counts
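A sketch of both counts from a list of (source URL, target URL) pairs, as a crawler might produce. Two assumptions are made for illustration: a site is identified by the last three labels of the host name (which suits names such as wlv.ac.uk but not every country), and colinks are counted at site level, one per source site that links to both targets. The URLs are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations
from urllib.parse import urlparse


def site_of(url):
    """Assumed site rule: last three labels of the host name, e.g. wlv.ac.uk."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-3:])


def inter_site_links(links):
    """Map URL pairs to site pairs and drop links that stay within one site."""
    pairs = [(site_of(s), site_of(t)) for s, t in links]
    return [(s, t) for s, t in pairs if s != t]


def inlink_counts(links):
    """Number of inter-site links pointing at each target site."""
    return Counter(t for _, t in inter_site_links(links))


def colink_counts(links):
    """For each pair of target sites, count the source sites linking to both."""
    targets_by_source = defaultdict(set)
    for s, t in inter_site_links(links):
        targets_by_source[s].add(t)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


links = [
    ("http://www.york.ac.uk/a.html", "http://www.wlv.ac.uk/"),
    ("http://www.york.ac.uk/a.html", "http://www.man.ac.uk/x"),
    ("http://www.york.ac.uk/b.html", "http://www.york.ac.uk/c.html"),  # internal, ignored
    ("http://www.man.ac.uk/y", "http://www.wlv.ac.uk/z"),
]
print(inlink_counts(links))
print(colink_counts(links))
```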
Methodology: Data analysis 2

Alternative Document Models



Aggregate Web pages into documents based
upon site, domains or directories for link
counting
Can’t be done easily from search engine
data
Produces better results in some situations
than simple link counting
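A sketch of the aggregation idea: each URL is mapped to a document identifier under the chosen model, and a link is counted once per distinct pair of documents. The exact rules below (what counts as a directory, domain or site) are assumptions for illustration, not the definitions used in the research, and the URLs are invented.

```python
from urllib.parse import urlparse


def document_id(url, model):
    """Map a URL to the document that contains it under the given model."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if model == "page":
        return host + parts.path
    if model == "directory":
        # Everything up to and including the last slash of the path.
        return host + parts.path.rsplit("/", 1)[0] + "/"
    if model == "domain":
        return host
    if model == "site":
        # Assumed rule: last three labels of the host name.
        return ".".join(host.split(".")[-3:])
    raise ValueError(f"unknown model: {model}")


def count_links(links, model):
    """Count distinct document-to-document links, ignoring self-links."""
    pairs = set()
    for source, target in links:
        s, t = document_id(source, model), document_id(target, model)
        if s != t:
            pairs.add((s, t))
    return len(pairs)


links = [
    ("http://www.wlv.ac.uk/dept/a.html", "http://www.york.ac.uk/lib/x.html"),
    ("http://www.wlv.ac.uk/dept/b.html", "http://www.york.ac.uk/lib/y.html"),
    ("http://scit.wlv.ac.uk/c.html", "http://www.york.ac.uk/lib/x.html"),
]
for model in ("page", "directory", "domain", "site"):
    print(model, count_links(links, model))
```

Coarser models collapse replicated links, such as the same link repeated on every page of a directory, into a single count.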
Methodology: Data analysis 3

Statistical techniques for evaluating results



Correlation with known research performance
measures
Factor analysis, Multi-Dimensional Scaling,
Cluster analysis for patterns
Techniques from Communication
Networks research
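A sketch of the correlation step. The slides do not say which coefficient was used; Spearman's rank correlation is assumed here because link counts tend to be highly skewed. The numbers are invented purely to exercise the code and are not results.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical inlink counts and research scores for four institutions.
inlinks = [10, 200, 35, 4000]
research = [1.0, 3.5, 2.0, 9.0]
print(spearman(inlinks, research))
```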
Methodology: Data analysis 4

Simple graphical techniques


Display linkages above a certain threshold
Community identification techniques from
computer science
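A sketch of the threshold idea: keep only site pairs whose link count reaches a chosen value and write them out in Graphviz DOT format for drawing. The counts and the threshold are invented, and the choice of DOT as the output format is an assumption.

```python
def edges_above(link_counts, threshold):
    """Keep only the linkages whose count is at least the threshold."""
    return {pair: n for pair, n in link_counts.items() if n >= threshold}


def to_dot(edges):
    """Render directed, labelled edges as a Graphviz DOT digraph."""
    lines = ["digraph links {"]
    for (source, target), n in sorted(edges.items()):
        lines.append(f'  "{source}" -> "{target}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical inter-site link counts.
link_counts = {
    ("york.ac.uk", "wlv.ac.uk"): 12,
    ("wlv.ac.uk", "york.ac.uk"): 3,
    ("man.ac.uk", "york.ac.uk"): 40,
}
strong = edges_above(link_counts, threshold=10)
print(to_dot(strong))
```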
Results 1: Links associate with
research



Counts of links to UK, Australian,
Taiwanese universities correlate
significantly with measures of research
productivity
Counts of links in China appear not to
Results are better with ADMs for the UK
but not Taiwan
Results 2: Most links are only
loosely related to research


A random sample of links between UK
university sites revealed over 90% had
some connection with scholarly activity,
including teaching and research.
Less than 1% were equivalent to citations
Results 3: Links are related to
geography

Interlinking between universities in the UK
decreases with geographic distance
Results 4: Universities cluster
by geographic region


This is clearest for Scotland but also for
other groupings, including Manchester-based universities
Coherent clusters are difficult to extract
because of overlapping trends
Results 5: Linguistic factors in
EU communication
English is the dominant language for Web sites in
the Western EU
In a typical country, 50% of pages are in the
national language(s) and 50% in English
Non-English speaking countries extensively interlink in
English
{Research with Rong Tang, SUNY Albany}

Results 6: Power laws in the Web

Academic Webs have a topology
dominated by power laws, including




Inlink counts
Outlink counts
Directed component sizes
Undirected component sizes
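A sketch of how three of these quantities could be tabulated from a list of directed links, using a simple union-find for the undirected components. Directed (strongly connected) components are left out for brevity. The edge list is invented. If a quantity follows a power law, its frequency table plotted on log-log axes should lie close to a straight line.

```python
from collections import Counter


def degree_counts(edges):
    """Return (inlink counts, outlink counts) per node."""
    inlinks = Counter(t for _, t in edges)
    outlinks = Counter(s for s, _ in edges)
    return inlinks, outlinks


def undirected_component_sizes(edges):
    """Sizes of connected components when link direction is ignored."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in edges:
        root_s, root_t = find(s), find(t)
        if root_s != root_t:
            parent[root_s] = root_t
    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


def frequency_table(values):
    """Sorted (value, number of occurrences) pairs, ready for a log-log plot."""
    return sorted(Counter(values).items())


# Hypothetical directed links between pages.
edges = [("a", "b"), ("c", "b"), ("d", "b"), ("e", "f")]
inlinks, outlinks = degree_counts(edges)
print(frequency_table(inlinks.values()))
print(undirected_component_sizes(edges))
```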
Results 7: Academic Web
Topology
Criticism




What do the statistics mean?
A variety of factors influence link creation,
mainly informal
About 90% of inter-site links have some
connection to research
Links are an informal scholarly
communication soup, from which patterns
can be sieved out
The future

Results of research leading into:




Improved Web-related policy making in the
EU
Improved Web information retrieval
algorithms
Improved understanding of informal
scholarly communication on the Web
It is easy to get some statistics, but very
hard to get meaningful statistics