Microsoft PowerPoint - NCRM EPrints Repository


Virtual Knowledge Studio (VKS)
Information Studies
What is web link mining?
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
1. Definition and scope
Link analysis is:
  mapping and measuring hyperlink networks for collections of web pages or sites
  a flexible toolkit of methods and software rather than a field or single technique
A new source of information about:
  relationships between people, organisations and information - via the web
  the impact of information and ideas
Used in:
  media studies, information science, politics, marketing, sociology
Link Analysis: Motivation
Individual hyperlinks reflect concrete creation reasons such as connections between web page contents or creators
Counts of large numbers of hyperlinks may reflect wider underlying social processes
Links may reflect phenomena that have previously been difficult to study; e.g.,
  informal scholarly communication
  informal news discussions
  friendship patterns
  “amateur” politics
But link patterns vary by context…
Commercial web sites tend not to link much
Academic and government web sites link more
Disciplinary differences: e.g., History Web use is very low, Chemistry is very high
Individual projects/resources can have an enormous impact upon web sites
  E.g. Arts web sites are often for specific exhibitions or for digital media projects
Links often not frequent enough to reliably reveal underlying patterns
Link Type Definitions
[Figure: diagram with nodes A and B]
Inlink – a hyperlink to a web page from anywhere
Site inlink – a hyperlink to a web page from a different web site
Outlink – a hyperlink from a web page to any other
Site outlink – a hyperlink from a web page to a page in a different site
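The four definitions above can be sketched in a few lines of Python. This is only an illustration: it assumes that a "web site" can be approximated by a URL's hostname (real studies may group by domain instead), and the URLs and the function names `site_of` and `classify_links` are hypothetical.

```python
from urllib.parse import urlsplit

def site_of(url):
    """Return the hostname, used here as a stand-in for 'web site' (an assumption)."""
    return urlsplit(url).hostname

def classify_links(links, page):
    """Split the links touching `page` into the four link types."""
    inlinks = [(s, t) for s, t in links if t == page]
    outlinks = [(s, t) for s, t in links if s == page]
    return {
        "inlinks": inlinks,
        "site_inlinks": [(s, t) for s, t in inlinks if site_of(s) != site_of(t)],
        "outlinks": outlinks,
        "site_outlinks": [(s, t) for s, t in outlinks if site_of(s) != site_of(t)],
    }

# Hypothetical example links as (source page, target page) pairs.
links = [
    ("http://a.example/index.html", "http://b.example/paper.html"),
    ("http://b.example/home.html", "http://b.example/paper.html"),
    ("http://b.example/paper.html", "http://c.example/data.html"),
]
result = classify_links(links, "http://b.example/paper.html")
```

Here the page has two inlinks but only one site inlink, because one of the links comes from the same site.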
Indirect link types - colinks
Useful when direct links rare
  Indirect connection
Co-inlinks
  B and C co-inlinked
Co-outlinks
  D and E co-outlinked
[Figure: colink diagram with nodes A to F]
Lennart Björneborn’s terminology
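A minimal sketch of counting colinks from a list of direct links. The diagram's edges are not preserved in this transcript, so the example assumes that A links to both B and C, and that D and E both link to F. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_inlink_counts(links):
    """Count, for each pair of targets, how many sources link to both."""
    targets_by_source = {}
    for source, target in links:
        targets_by_source.setdefault(source, set()).add(target)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts

def co_outlink_counts(links):
    """Count, for each pair of sources, how many targets both link to."""
    reversed_links = [(target, source) for source, target in links]
    return co_inlink_counts(reversed_links)

# Assumed edges (the original figure is lost): A -> B, A -> C, D -> F, E -> F.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]
co_in = co_inlink_counts(links)
co_out = co_outlink_counts(links)
```

Under these assumed edges, B and C are co-inlinked once (by A) and D and E are co-outlinked once (both link to F).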
What to count?
Links between individual pages
Links between entire web sites
  Site A links to site B if any page in site A links to any page in site B
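The site-level rule can be sketched as a simple aggregation of page-level links. As an assumption, a site is again approximated by hostname, and the URLs and the name `site_links` are hypothetical.

```python
from urllib.parse import urlsplit

def site_links(page_links):
    """Collapse page-level links into a set of (source site, target site) pairs."""
    pairs = set()
    for source, target in page_links:
        a = urlsplit(source).hostname
        b = urlsplit(target).hostname
        if a != b:  # links inside one site are not links between sites
            pairs.add((a, b))
    return pairs

# Hypothetical page-level links.
page_links = [
    ("http://a.example/p1.html", "http://b.example/x.html"),
    ("http://a.example/p2.html", "http://b.example/y.html"),
    ("http://a.example/p1.html", "http://a.example/p2.html"),
]
between_sites = site_links(page_links)
```

However many pages in site A link to site B, the result holds a single A-to-B site link, which is the counting choice the slide describes.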
2. Link Networks – Methods
Draw a network diagram
  LexiURL Searcher, Issue Crawler, SocSciBot (web networks)
  Pajek, UCINET, NetMiner (generic networks)
  About 10-50 sites/pages is recommended
  Diagrams should reveal patterns in the data
Social Network Analysis statistics
  E.g., density, degree centrality
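The two statistics named above can be computed directly for a small directed link network. This sketch uses the standard definitions (density as the share of possible directed links that exist; in-degree centrality as inlinks divided by n - 1) and assumes no self-links. The node names are hypothetical, and tools such as Pajek or UCINET report these and many more measures.

```python
def density(nodes, edges):
    """Fraction of the n * (n - 1) possible directed links that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(set(edges)) / (n * (n - 1))

def in_degree_centrality(nodes, edges):
    """Inlink count of each node divided by n - 1 (needs at least two nodes)."""
    n = len(nodes)
    indegree = {node: 0 for node in nodes}
    for source, target in set(edges):
        indegree[target] += 1
    return {node: count / (n - 1) for node, count in indegree.items()}

# Hypothetical three-site network.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("C", "B"), ("B", "A")]
d = density(nodes, edges)
centrality = in_degree_centrality(nodes, edges)
```

In this toy network three of the six possible links exist, and B, linked to by both other sites, has the highest in-degree centrality.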
Direct link networks
Start with list of web sites (or pages)
Build from many linkdomain:A site:B Yahoo searches
  Powerful and free way to scan the entire web for links!
  Returns pages in web site B that link to web site A
  Can be automated with LexiURL Searcher
  Or use SocSciBot to crawl web sites and get links
e.g., linkdomain:ox.ac.uk site:pku.edu.cn
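Building a direct link network needs one such query per ordered pair of sites. The sketch below only generates the query strings in the form shown on the slide; it does not submit them to any search engine, and the function name `direct_link_queries` is hypothetical.

```python
from itertools import permutations

def direct_link_queries(domains):
    """Map each (source, target) pair to a query for pages in source linking to target."""
    return {
        (source, target): f"linkdomain:{target} site:{source}"
        for source, target in permutations(domains, 2)
    }

queries = direct_link_queries(["ox.ac.uk", "pku.edu.cn"])
for pair, query in queries.items():
    print(pair, query)
```

For n sites this yields n * (n - 1) queries, which is one reason automation and a limit of roughly 10-50 sites are practical.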
Direct links example: Top ASEAN universities network (with Han Woo Park)
[Figure: network diagram; unconnected universities removed; arrows represent > 100 links]
Co-inlink networks
Start with a list of web sites or pages
Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches
  can be automated in LexiURL Searcher
Suitable for commercial or competitive web sites that do not interlink
  normally better than direct link diagrams
A web environment (co-inlink) network for a single web site
  finds web sites that link to it
  picks the top 50 web sites linked to by these web sites
  draws a co-inlink diagram of these web sites
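The three steps above can be sketched on a list of site-level links that has already been collected. This is not the LexiURL Searcher implementation: the function name `web_environment`, the choice to count co-inlinks only from sites that link to the starting site, and the toy data are all assumptions, and the drawing step is left to a network tool.

```python
from collections import Counter
from itertools import combinations

def web_environment(site_links, seed, top_n=50):
    """Return the top linked-to sites and their co-inlink counts for `seed`."""
    site_links = set(site_links)
    # Step 1: web sites that link to the seed site.
    linkers = {source for source, target in site_links if target == seed}
    # Step 2: the sites most often linked to by those linking sites.
    popularity = Counter(t for s, t in site_links if s in linkers)
    top_sites = [site for site, _ in popularity.most_common(top_n)]
    top = set(top_sites)
    # Step 3: co-inlink counts between the top sites (input for a diagram).
    co_inlinks = Counter()
    for linker in linkers:
        linked = sorted(t for s, t in site_links if s == linker and t in top)
        for pair in combinations(linked, 2):
            co_inlinks[pair] += 1
    return top_sites, co_inlinks

# Hypothetical site-level links as (source site, target site) pairs.
example_links = [
    ("x", "seed"), ("x", "p"), ("x", "q"),
    ("y", "seed"), ("y", "p"),
    ("z", "q"),
]
top_sites, co_inlinks = web_environment(example_links, "seed", top_n=2)
```

The starting site itself normally appears among the top sites, since every linking site links to it by construction.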
Indirect links example: the web environment of ZigZagMag
[Figure: co-inlink network diagram]
Another example: no patterns but interesting
[Figure: co-inlink network diagram]
3. Link Impact - Methods
Inlink counts often used as an impact/visibility indicator
  Impact = “The effect or impression of one thing on another”, “to have an effect” *
Compare links to web sites to assess which site/organisation has the most online impact
* http://www.thefreedictionary.com/impact, definition 3
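A minimal sketch of such a comparison. It ranks a set of web sites by the number of distinct other sites linking to them; counting linking sites rather than linking pages is an assumption made here, as is treating a hostname as a site. The URLs and the name `rank_by_inlinking_sites` are hypothetical.

```python
from urllib.parse import urlsplit

def rank_by_inlinking_sites(page_links, sites):
    """Rank `sites` by how many distinct other sites link to them."""
    linking = {site: set() for site in sites}
    for source, target in page_links:
        s = urlsplit(source).hostname
        t = urlsplit(target).hostname
        if t in linking and s != t:
            linking[t].add(s)
    counts = {site: len(sources) for site, sources in linking.items()}
    # Sort by count (highest first), then by name so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical page-level links.
page_links = [
    ("http://x.example/1", "http://b.example/"),
    ("http://x.example/2", "http://b.example/about"),
    ("http://y.example/1", "http://b.example/"),
    ("http://x.example/1", "http://a.example/"),
    ("http://a.example/1", "http://a.example/2"),
]
ranking = rank_by_inlinking_sites(page_links, ["a.example", "b.example"])
```

Counting linking sites instead of pages stops one heavily linking site from dominating the comparison.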
Link Impact Reports
Standardised comparative analysis of the link impact of web sites
Example audit: http://cybermetrics.wlv.ac.uk/audit/101/
Similar reports can be created for non-link impact (citation impact)
http://cybermetrics.wlv.ac.uk/audit/books/
Total impact example
Impact spread example
4. Tools
E.g., …
5. Statistical analyses…
[Figure: links to UK universities against their research productivity]
The reason for the strong correlation is the quantity of Web publication, not its quality
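A sketch of how a correlation between inlink counts and a productivity measure might be computed. The slide does not say which coefficient was used; a Spearman rank correlation is assumed here because link counts tend to be highly skewed. The numbers are made up purely to show the calculation and are not results from the study.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up illustrative values, one entry per university.
inlinks = [120, 45, 300, 80]
productivity = [2.0, 1.1, 3.5, 1.9]
rho = spearman(inlinks, productivity)
```

A high coefficient alone does not say why the two measures move together, which is the point the slide makes about quantity versus quality.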
More statistical analyses…
Universities tend to link to neighbours
6. Content analysis
Content analysis of random sample of links recommended to get context
Example of usefulness of content analysis results:
  90% of links between UK university sites relate to scholarly activity
    But less than 1% are equivalent to citations
  Link counts do not measure research but are a natural by-product of scholarly activity
    Use link counts to track (an aspect of) communication
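Drawing the random sample for content analysis can be sketched as follows. The fixed seed (so that the same sample can be re-drawn for coding checks), the removal of duplicate links, and the example URLs are assumptions for illustration.

```python
import random

def sample_links(links, k, seed=0):
    """Draw a reproducible random sample of up to k distinct links."""
    rng = random.Random(seed)
    unique = sorted(set(links))  # sort so the result depends only on the seed
    return rng.sample(unique, min(k, len(unique)))

# Hypothetical collected links as (source page, target page) pairs.
links = [(f"http://a.example/{i}", f"http://b.example/{i}") for i in range(100)]
sample = sample_links(links, 10)
```

Each sampled link would then be visited and classified by hand, for example by the apparent reason for its creation.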
7. Summary
Link networks
  To investigate relationship patterns within collections of web sites
Link impact
  Compare impact of web sites using inlinks
Methods
  Toolkit of visual and statistical methods
  Specialist software like LexiURL Searcher & Issue Crawler
Use to investigate web phenomena or offline phenomena reflected online in web sites
Books
Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.
http://lexiurl.wlv.ac.uk
http://webometrics.wlv.ac.uk
http://www.issuecrawler.net