Finding information through Internet and the WWW

1
Finding information through the Internet
in 2003
• Vrije Universiteit Brussel
• Information and Library Science, University of Antwerp
Belgium
Presented at the bi-annual conference
organised by VVBAD Informatie 2003
in Brussels, September 2003
2
These slides are available from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
and from the conference organisers
3
4
PLAT du JOUR
Entrée
• Online access to information
sources and services
»introduction
»types of information sources
5
PLAT du JOUR
Plat principal (main course)
• A systematic overview of information sources and services that
are accessible through the Internet, such as
»directories and text search engines,
»current awareness systems
»the “invisible WWW” to find:
books
journal articles
images/pictures
newsgroup messages
e-journals
…
6
PLAT du JOUR
Dessert
• Evaluating the quality of information sources
7
PLAT du JOUR
Pousse café
• Reception at the end of this conference
;-)
8
Online access to information
sources and services
Introduction
9
Growing importance of computer
network information resources
• Networked information resources are growing at a high
rate, not only in volume but also in importance. There are
many sources there which are vital to research and many
others which are useful generally.
• To keep abreast of their field, most academics and
researchers will find an increasing need to use the
network for fast and efficient communication and for
access to information. If they don’t, they are likely to be
left behind, because most of their colleagues in
institutions around the world will be doing just that.
10
Internet based information sources:
problems / difficulties (Part 1)
• Redundancy and overlap:
On the one hand, there is too much information on some
topics; in other words, the redundancy and overlap are high in
many cases.
• Too few information sources:
On the other hand, there are too few information sources on
some topics.
11
Internet based information sources:
problems / difficulties (Part 2)
• No order is imposed on most sources.
Quality checks / quality controls are generally not performed.
Related to this: new information that is offered
does not have to be registered anywhere.
Is the information that you find real, honest, authentic?
12
Internet based information sources:
problems / difficulties (Part 3)
• Change is the only constant:
Information sources are constantly changing, growing,
but sometimes disappearing.
13
Internet based information sources:
problems / difficulties (Part 4)
• Scattering:
There is no single simple but powerful system to find
relevant information through the Internet.
In other words:
integration / aggregation is still far from perfect.
14
Internet based information sources:
problems / difficulties (Part 5)
• Slow:
The Internet is in many places and for many applications not
yet fast enough.
15
Internet based information sources:
problems / difficulties (Part 6)
• In conclusion:
Surfing, using the
Internet, the WWW,
can be a time sink instead
of a productive activity.
16
Internet based information sources:
how many? how much information?
In 2001:
• More than 10 terabytes (= 10 000 gigabytes) of text data
In 2002:
• More than 2 000 million (= 2 billion) unique URLs on the
whole Internet
17
Online access to information
sources and services
Dictionaries and encyclopaedias
accessible through the WWW
18
Dictionaries and encyclopedias
through the WWW: introduction
• Dictionaries and encyclopedias are the first choice among
many types of information sources,
»when we do not need detailed information on a common
topic
»when we want to prepare a more detailed search on an
unfamiliar topic, by searching for the right spelling,
synonyms, context,…
• Some dictionaries and encyclopedias are available
through the WWW free of charge.
19
Example
Dictionaries accessible through
Internet and the WWW: example
• The American Heritage® Dictionary of the English
Language
»Over 200,000 entries,
70,000 audio word pronunciations,
900 full-page color illustrations
»Available free of charge from
http://education.yahoo.com/reference/dictionary/
Example
Dictionaries accessible through
Internet and the WWW: compilation
• A compilation/collection of dictionaries can be searched
simultaneously and free of charge:
http://www.onelook.com/
20
Example
Encyclopedias accessible through
Internet and the WWW: examples
• Encarta Concise Free Encyclopedia
»http://encarta.msn.com/
»Available in English and in some other languages
21
Example
Encyclopedias accessible through
Internet and the WWW: examples
• Encyclopædia Britannica
only a small part is available free of charge
+ links to selected WWW sites
»http://www.britannica.com/
• Encyclopædia Britannica Concise
»http://education.yahoo.com/reference/encyclopedia/
22
Example
Encyclopedias accessible through
Internet and the WWW: examples
• The Canadian Encyclopedia
(in English and in French):
»http://thecanadianencyclopedia.com/
23
Example
Encyclopedias accessible through
Internet and the WWW: overviews
• A list / overview of encyclopedias on the Internet:
http://www.internetoracle.com/encyclop.htm
• Other lists of encyclopedias on the Internet
can be found as a part of more general directories of
Internet-based information sources.
24
25
Online access to information
sources and services
Internet directories and indexes
26
Internet global subject directories:
introduction
• They are virtual libraries with open shelves, for browsing.
• They are compiled manually, by many
people.
• They can be browsed following a tree structure or a more
complicated variation.
• The most famous of these systems are among the most
popular and most visited sites on the WWW, e.g. Yahoo!
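As a minimal sketch of browsing such a tree, a subject directory can be modelled as nested Python dictionaries, so that browsing becomes following a path of category names. The category and site names are taken from the Yahoo! pediatrics examples in these slides; the data structure and the `browse` function are hypothetical and say nothing about how Yahoo! actually stores its directory.

```python
# Hypothetical miniature subject directory: a category maps either to
# subcategories (a dict) or to a list of listed sites (a leaf).
DIRECTORY = {
    "Health": {
        "Medicine": {
            "Pediatrics": {
                "Organizations": ["American Academy of Pediatrics"],
                "Schools, Departments, and Programs": [
                    "University of Rochester",
                    "Tohoku University",
                ],
            },
        },
    },
}

def browse(tree, path):
    """Follow a category path such as 'Health > Medicine > Pediatrics' and
    return what is found there: subcategory names (sorted) or listed sites."""
    node = tree
    for step in (part.strip() for part in path.split(">")):
        node = node[step]  # raises KeyError if the category does not exist
    return sorted(node) if isinstance(node, dict) else list(node)

print(browse(DIRECTORY, "Health > Medicine > Pediatrics"))
# ['Organizations', 'Schools, Departments, and Programs']
```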
27
Internet global subject directories:
structure
The structure corresponds to a classification that is in most
cases specific to that particular directory.
In other words: the well-known and classical universal
classification systems are not used in most Internet
directories.
28
Internet global subject directories:
pros and cons
• They cover a small number of selected WWW sites,
in comparison with the total number of sites that are
accessible.

+ The selected, included sites should be better than average.
- They are not suitable for deep, detailed, specific searches
with a high coverage.
29
Internet global subject directories:
why use one?
• They are suitable mainly for broad searches that can be
difficult to formulate in words,
but NOT for more specific searches that require
combinations of several concepts.
Example
Internet global subject directories:
Yahoo!
• A hypertext global subject directory can be found at
http://www.yahoo.com/
and at many other sites, including
http://www.yahoo.co.uk/
• Entries are NOT rated.
• Accessible free of charge.
30
Example
Internet global subject directories:
Yahoo! links in pediatrics
• Health > Medicine > Pediatrics:
• International Pediatric Chat - for professionals to share information and education
regarding children's health care.
• National Med/Peds Residents' Association - organization for residents, practitioners and
medical students interested in combined internal medicine and pediatrics.
• Neonatology Network - information and communication platform for neonatologists
and pediatricians.
• Pediatria OnLine - (in Italian) here we talk about children, among pediatricians and with families.
• Pediatric Critical Care
• Pediatric Database (PEDBASE) - containing descriptions of over 500 childhood
illnesses.
• Pediatric Endocrinology Conference - LWPES/ESPE joint meeting occurring July 6-10
2001.
• Pediatric Endoscopic Photos - illustrating intestinal problems in children.
31
Example
Internet global subject directories:
Yahoo! for pediatrics
• Health
> Medicine
> Pediatrics:
link to a digital library
(health sciences)
for young patients
32
Example
Internet global subject directories:
Yahoo! to pediatrics organisations
• Health
> Medicine
> Pediatrics
> Organizations:
link to the
American Academy
of Pediatrics
33
Example
Internet global subject directories:
Yahoo! links to pediatrics schools
• Health > Medicine > Pediatrics
>Schools, Departments, and Programs
• University of Rochester - partnership between pediatric residents and
community-based agencies that serve children and their families.
• Michigan State University@
• Royal College of Paediatrics and Child Health - responsible for training,
examinations, professional standards, and organisation of child health
services for the UK.
• Tohoku University
• University of Alabama at Birmingham - programs and training opportunities
in pediatrics. Also contains faculty information and sub-specialty
descriptions.
• …
34
Example
Internet global subject directories:
searching with a query in Yahoo! (1)
• The directory of Yahoo! can not only be browsed, but can
also be searched with a query.
• However, in this way the hierarchical structure is not well
exploited.
• For the formulation of a search query, Yahoo! can provide
automatic assistance related to spelling and word
variations.
For instance:
After searching for “Capetown”, Yahoo! answers:
Other Spellings: Try searching for cape town instead.
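How Yahoo! actually produces such suggestions is not described here. As one simple, hypothetical technique, a system could test whether an unknown query word splits into two known words. The vocabulary and the `suggest_split` function below are assumptions for illustration only.

```python
def suggest_split(word, vocabulary):
    """Return 'w1 w2' if the query word can be split into two known words,
    else None. A toy stand-in for a spelling-suggestion feature."""
    word = word.lower()
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in vocabulary and right in vocabulary:
            return f"{left} {right}"
    return None

VOCABULARY = {"cape", "town", "cap", "paris"}  # hypothetical word list
print(suggest_split("Capetown", VOCABULARY))  # cape town
```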
35
Example
Internet global subject directories:
searching with a query in Yahoo! (2)
• When such a query does not provide results, then Yahoo!
uses a much larger external Internet index to execute a
query based on textual search statements.
The chosen Internet index has varied over time.
• This mechanism is not made very clear and may confuse
the user.
36
Example
Internet global subject directories:
Google directory
• A hypertext global subject directory can be found at
http://directory.google.com/
• Accessible free of charge.
• Based on the Netscape DMOZ
Open Directory Project.
• Do not confuse this with the famous Google WWW search
engine.
37
Example
Internet global subject directories:
Open Directory Project
• A hypertext global subject directory can be found at
http://www.dmoz.org/
• The contents are also used in the
Google Directory system.
• Accessible free of charge.
38
Example
Internet global subject directories:
Resource Discovery Network
• A collection of hypertext subject directories that focus on
academic information sources can be found at
http://www.rdn.ac.uk/
• Together these lead to more than 30 000 selected WWW
sites.
• Accessible free of charge.
39
40
Internet global subject directories:
evaluation criteria - desiderata (1)
• Usage free of charge?
• Wide coverage?
• Up to date? Frequent updates?
Only a few dead / broken links?
• Good coverage of the sources in that part of the world in
which you are interested?
• Does the manager of the directory refuse to give priority
to sites that want to pay to get a prominent place in the
directory?
41
Internet global subject directories:
evaluation criteria - desiderata (2)
• Easy user interface?
• Short response times?
• Are mirror sites available closer to you for faster
response?
• Good presentation, description of each site?
• Is a rating, appreciation, review offered for each listed
site?
• Is translation of documents offered free of charge?
42
Internet global subject directories:
evaluation criteria - desiderata (3)
• Good documentation and online help?
• Good help desk available?
• High stability and reliability?
43
Internet global subject directories:
evaluation criteria - desiderata (4)
• Are other services offered from the same site or with the
same interface?
Is the subject directory integrated with other services?
Additional services can be
»an Internet index or a WWW index or a gateway to such an
index for searching with a query
»weather, travel guides, flight and hotel reservations,
maps,...
»WWW-based e-mail and e-mail address directories
»auctions through WWW
44
Internet subject directories:
non-global, more specific systems
Scheme: starting from the complete WWW, a global subject directory
can lead to
»a directory limited to sources in/of a country or region
»a directory restricted to a specific subject domain (“portal”)
Examples
Internet subject directories focusing
on a specific subject domain (Part 1)
• Marine science and oceanography:
»http://oceanportal.org/ = http://ioc.unesco.org/oceanportal/
• Engineering, mathematics, computing:
»http://www.eevl.ac.uk/
»http://www.ub.lu.se/eel/
• Civil engineering:
»http://www.icivilengineer.com/
• Fishing:
»http://www.onefish.org/
45
Examples
Internet subject directories focusing
on a specific subject domain (Part 2)
• Medicine and healthcare, general:
»http://www.achoo.com/
»http://www.medmatrix.org/
»http://www.medscape.com/
»http://www.omni.ac.uk
• Medicine and healthcare, general pediatrics:
»http://GeneralPediatrics.com
»http://www.medscape.com/pediatricshome
»http://www.pedinfo.com/
46
47
Internet indexes:
automated search tools
• Several systems allow users to search for and locate many
items (addressable resources) on the Internet in a more
systematic, direct way than by only browsing/navigating.
• These systems do NOT search the contents of all the computers
on the Internet completely and in real time
when a user makes a query.
Searching in that way would be much too slow, given the
limitations of the technology.
48
Internet indexes:
scheme of the mechanism
Scheme: the user searching for Internet-based information works through
Internet client hardware and software and the user interface to a search engine;
the Internet index search engine searches a database of Internet files,
including an index; that database is filled by an Internet crawler and
indexing system that collects files from Internet information sources.
49
Internet indexes:
description of the mechanism
Each of these search systems is based on:
• a database of links to pages / URLs that can be retrieved
by searching with queries through a big index that is built
automatically from the contents, the texts, of
these pages
(to build this database and to keep it up to date, pages are
continuously collected from the Internet by a “robot”
computer software system)
• a search system with a user interface in a WWW form, to
allow the user to search through that database
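The “robot” that collects pages for the database can be sketched, under simplifying assumptions, as a breadth-first crawler. Here the Web is a hypothetical in-memory dictionary (`WEB`) instead of real HTTP requests, so the sketch runs offline; a real robot would also respect robots.txt, handle errors and revisit pages periodically.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to (page text, outgoing links).
WEB = {
    "http://a.example/": ("ocean pollution news", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("sea contamination report", ["http://a.example/"]),
    "http://c.example/": ("pascal programming language", []),
    "http://d.example/": ("a page that no other page links to", []),
}

def crawl(seeds, fetch, max_pages=100):
    """Breadth-first robot: fetch each page once, store its text,
    and queue the links it contains, up to max_pages pages."""
    collected = {}
    queue = deque(seeds)
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        if url in collected:
            continue  # already fetched via another link
        text, links = fetch(url)
        collected[url] = text
        queue.extend(link for link in links if link not in collected)
    return collected

pages = crawl(["http://a.example/"], fetch=lambda url: WEB[url])
print(sorted(pages))  # ['http://a.example/', 'http://b.example/', 'http://c.example/']
```

Page `http://d.example/` is never reached because no collected page links to it: a simple illustration of why a crawler-built index cannot cover everything.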
50
Internet indexes:
building their database
Scheme: Internet documents are fed into the database management system;
indexing derives records from the input and stores them in the database;
an inverted file (full text index, register of the database)
is used for retrieval by the user.
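An inverted file of this kind can be sketched as a mapping from each word to the set of documents that contain it. The tokenizer (lowercase letters and digits only) and the three sample documents are assumptions for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Inverted file: map each word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

pages = {
    "doc1": "Sea pollution in the North Sea",
    "doc2": "Ocean contamination by plastics",
    "doc3": "Pascal, the philosopher",
}
index = build_inverted_index(pages)
print(sorted(index["sea"]))  # ['doc1']
```

A query word is then looked up directly in the index instead of scanning every document, which is why searching does not have to visit the Internet in real time.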
51
Internet indexes:
AltaVista
• The primary search interface can be found in the US.
The following addresses all lead to the same information:
»http://www.altavista.com/
»http://www.av.com/
»http://av.com/
• Mirror site in the UK:
»http://uk.altavista.com/
»http://www.altavista.co.uk/
52
Internet indexes:
AltaVista: features
• Allows full text searching of the WWW
• Allows advanced Boolean searching
(in “Advanced” mode)
• Offers relevance ranking of search results
• Offers a link to an Internet subject directory (Looksmart)
• Offers links to systems to find
images, sounds,… (multimedia) in the Internet
53
Internet indexes:
All the Web
• The search interface can be found at:
http://www.alltheweb.com/
• You can search the WWW and ftp servers.
• The database is one of the biggest.
• Not only HTML and plain text files, but also the full text
of many Adobe PDF files is indexed.
54
Internet indexes:
Google (Part 1)
• http://www.google.com/
• Full-text searching is possible for many files that are
available through the WWW.
• Not only HTML and plain text pages are covered: for many
files in other file formats, the first part is indexed as well,
such as
»Adobe PDF,
»Microsoft Word, Microsoft Excel, Microsoft PowerPoint
»Rich Text Format…
55
Internet indexes:
Google (Part 2)
• One of the most popular systems in 2001, 2002, 2003…
For retrieval an algorithm is used that takes into account
the links between WWW pages.
A retrieved page is ranked higher when
»many sites/pages point to it
»“important” sites/pages point to it
• Another famous search system, Netscape Search, is based
on Google (at least in 2003): http://search.netscape.com
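The link-based idea above, that a page ranks higher when many or “important” pages point to it, is captured by the published PageRank formula. The sketch below is a simplified textbook power-iteration version on a hypothetical four-page graph; the damping factor 0.85 and the fixed number of iterations are assumptions, and Google's actual ranking is not fully public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.
    links: dict page -> list of pages it links to (every target must be a key)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share  # each link passes on part of the page's rank
            else:  # page without outgoing links: spread its rank over all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C", "A"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Page C, linked from three pages, gets the highest score; A, linked from the important page C and from D, comes second.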
56
Internet indexes:
Google refers to a thesaurus
• In Google, the words used in a search query are returned
to the user with hyperlinks to a dictionary and to a
thesaurus on the WWW, that can be used partly free of
charge.
• The thesaurus can of course show the user synonyms,
narrower terms, related terms for the word.
So this system can be used to expand a search query, so
that the query better covers the search concept.
57
Example
Internet indexes:
from Google into a thesaurus
58
Internet indexes:
Google limitations (Part 1)
• Google does NOT offer/allow
»an unlimited number of search terms in a search query
»manual or automatic truncation of words in a query
»manual or automatic stemming of words in a query
»full Boolean search formulations (OR, AND, brackets…)
like in
(sea OR ocean) AND (pollution OR contamination)
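For comparison, the example query above is straightforward to evaluate in a system that does support full Boolean logic: with an inverted index, OR becomes a set union and AND a set intersection. The postings below are hypothetical.

```python
# Hypothetical postings: word -> set of document numbers (as in an inverted index).
postings = {
    "sea": {1, 2},
    "ocean": {3},
    "pollution": {2, 4},
    "contamination": {3, 5},
}

def docs(word):
    """Documents containing the word (empty set for unknown words)."""
    return postings.get(word, set())

# (sea OR ocean) AND (pollution OR contamination)
result = (docs("sea") | docs("ocean")) & (docs("pollution") | docs("contamination"))
print(sorted(result))  # [2, 3]
```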
59
Internet indexes:
Google limitations (Part 2)
• Google does NOT offer/allow
»a proximity/nearby operator in the queries (such as NEAR)
»more or less automatic expansion of a search query to
include synonyms, narrower terms, translations…
»full-text searching of complete text in the case of very long
documents
»a relevance feedback mechanism
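A proximity operator needs a positional index, which records where each word occurs in each document, not only whether it occurs. A minimal sketch, assuming a window of 5 words for NEAR and simple lowercase tokenization; the exact semantics of NEAR and ADJ differ from system to system, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def positional_index(docs):
    """word -> {doc_id: [positions of the word in that document]}"""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def near(index, w1, w2, window=5):
    """Documents in which w1 and w2 occur at most `window` words apart."""
    p1s, p2s = index.get(w1, {}), index.get(w2, {})
    return {d for d in p1s.keys() & p2s.keys()
            if any(abs(a - b) <= window for a in p1s[d] for b in p2s[d])}

def adj(index, w1, w2):
    """Documents in which w2 directly follows w1, as in the phrase "w1 w2"."""
    p1s, p2s = index.get(w1, {}), index.get(w2, {})
    return {d for d in p1s.keys() & p2s.keys()
            if any(a + 1 in p2s[d] for a in p1s[d])}

docs = {
    "d1": "oil pollution of the sea",
    "d2": "the sea was calm; much later we discussed air pollution in cities",
    "d3": "sea pollution",
}
idx = positional_index(docs)
print(sorted(near(idx, "sea", "pollution")))  # ['d1', 'd3']
```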
60
Internet indexes:
Google limitations (Part 3)
• Google does NOT offer/allow
»powerful searching to find WWW documents that link to
some document in a given WWW site (WWW site citation
searching), as truncation is not possible in a Google query;
only searching is possible to find documents that link to a
particular WWW document;
in other words, the URL of the WWW document as written
in the query must be perfect and cannot be truncated
(AltaVista is superior in this application,
because it allows truncations in the search queries)
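The difference between exact and truncated citation searching can be sketched with hypothetical link data: an exact search only finds pages that link to one URL written exactly, while a truncated (prefix) search finds pages that link to any document in a site. The URLs and function names are illustrative assumptions.

```python
# Hypothetical link data: page -> URLs of the documents it links to.
OUTLINKS = {
    "http://x.example/p1": ["http://site.example/index.html"],
    "http://y.example/p2": ["http://site.example/reports/2003.html"],
    "http://z.example/p3": ["http://other.example/"],
}

def pages_linking_to(url):
    """Exact citation search: the target URL must be written perfectly."""
    return {page for page, links in OUTLINKS.items() if url in links}

def pages_linking_to_site(prefix):
    """Truncated citation search: any link whose URL starts with the prefix."""
    return {page for page, links in OUTLINKS.items()
            if any(link.startswith(prefix) for link in links)}

print(sorted(pages_linking_to_site("http://site.example/")))
# ['http://x.example/p1', 'http://y.example/p2']
```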
61
Internet indexes:
Google limitations (Part 4)
• Google does NOT offer/allow
»automatic classification/clustering/categorization of
retrieved WWW pages, to cope with the problem of the
natural ambiguity of meaning of the terms that were used
in the search query
»any evaluation of documents retrieved and offered as
results
62
Internet indexes:
Google limitations (Part 5)
• Google does NOT offer/allow
»fact extraction from the information sources, in an attempt
to answer the query more directly than by offering only
links to documents
»a current awareness service (by email for instance)
63
Internet indexes:
Google additional features
• Besides a system to search for WWW pages,
Google also offers
»a subject directory
»searching for images/pictures on the WWW
»searching an archive of Usenet messages +
posting to Usenet groups
»searching for news
• Thus Google has become a great integrator / aggregator.
64
Internet indexes:
MSN Web Search
• Offered free of charge by Microsoft.
• You can search for WWW content.
• Since 1998.
• Famous system, because the search interface can be found
among the search functions built into one of
the most widespread Internet browsers, Microsoft Internet
Explorer, and because it is offered at
http://search.msn.com/
65
Internet indexes:
MSN Web Search
• Is based on an Internet index created by another
company.
In 2003, however, Microsoft started building its own
WWW crawler.
66
Internet indexes:
Scirus
• Allows you to search for manually selected scientific
information (only) on the WWW.
This includes
»the peer-reviewed articles in the journals that are published
in ScienceDirect by Elsevier, that can be downloaded in
full-text format only when a fee has been paid to the
publisher
»scientific open archives files, that contain scientific
research articles that can be downloaded free of charge.
• The search interface: http://www.scirus.com
67
Internet indexes:
Scirus features
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system that is
also used by Alltheweb.
• Offers access to information ordered according to some
classification system / taxonomy.
• Offers not only access to files in html format, but also to
files in PDF, PostScript and other formats.
68
Internet indexes:
Teoma
• Allows you to search for information on the WWW.
• Offers a feature that is not offered by most other search
systems:
categorization = classification = refinement
= clustering of search results,
to help the user cope with the problem of ambiguity of
meaning of the search query that was made
• The search interface: http://www.teoma.com/
69
Internet indexes:
Teoma example
Example of coping with ambiguity: searching for pascal
gives results related to the philosopher and to the
computer programming language.
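How Teoma builds its refinement groups is not described here. As a much simpler, hypothetical illustration of the idea, results can be grouped under a given list of refinement terms; a real clustering system would derive the groups automatically from the result texts. The snippets and the term list are assumptions.

```python
# Hypothetical result snippets for the query "pascal".
results = [
    ("r1", "Blaise Pascal, French philosopher and mathematician"),
    ("r2", "Pascal programming language tutorial"),
    ("r3", "Pensees by the philosopher Pascal"),
    ("r4", "Turbo Pascal compiler for the programming course"),
    ("r5", "Pascal's triangle explained"),
]
refinements = ["philosopher", "programming"]  # hypothetical, fixed in advance

def group_results(results, refinements):
    """Put each (id, snippet) result under the first refinement term that
    occurs in its snippet, or under 'other' if none occurs."""
    groups = {term: [] for term in refinements}
    groups["other"] = []
    for result_id, snippet in results:
        words = set(snippet.lower().replace(",", " ").split())
        for term in refinements:
            if term in words:
                groups[term].append(result_id)
                break
        else:  # no refinement term found in this snippet
            groups["other"].append(result_id)
    return groups

print(group_results(results, refinements))
```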
70
Internet indexes:
coverage / size of each index
The indexes grow and their “size ranking” is variable.
Biggest systems in 2002:
• Google!
• AltaVista
• (Fast =) All the Web (serving also Lycos)
• Systems based on the INKTOMI database of WWW
pages
71
Internet indexes:
specialised systems
• More specialised search engines / systems can yield better
result sets:
»higher recall
»higher precision
• Specialised Internet indexes / search engines can be found
for instance in the directory
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/
72
Internet indexes:
non-global, regional systems
Scheme: within the complete WWW, one part is covered by
a global / international Internet index, and a part is covered by
an index limited to sources in/of a country or region
73
Internet indexes:
subject-specific systems
Scheme: within the complete WWW, one part is covered by
a global / international Internet index, and a part is covered by
an Internet index limited to sources related to a specific subject
74
Internet indexes:
comparison with library catalogues
• Most Internet indexes have a larger database than most
catalogues.
• Internet index databases do not correspond as well to the
Internet as a normal, good catalogue corresponds to the
collection, because the documents on the Internet change
more often and their number is growing fast.
• Most Internet indexes contain all the words of the
documents that they index, whereas catalogues only
contain short descriptions of the documents.
75
Internet indexes:
variations among various systems
• Besides their common aims and characteristics, we can
nevertheless see differences, variations among the
searchable Internet index systems.
• To illustrate these variations and to assist Internet users
to make a decision on which search system to use, the
following list of some features and evaluation criteria can
be useful.
76
Internet indexes:
evaluation criteria - desiderata (1)
• Is usage free of charge?
• How complete is the coverage?
• Is the coverage good (or poor) for a particular geographic
region?
• Is the coverage good (or poor) for a particular type of
documents?
• Is the searchable database up to date? Is the database
updated frequently? Do the search results contain only
a few dead (broken) links?
77
Internet indexes:
evaluation criteria - desiderata (2)
• Is spamming filtered out, to give other pages a better
chance of turning up in the result set?
Can the system cluster presumed duplicate documents in
the results?
Or does the system simply eliminate presumed duplicate
documents from its database?
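One well-known technique for spotting presumed duplicates is to compare documents as sets of overlapping word sequences (“shingles”) using the Jaccard coefficient. Which method a particular search engine uses is not stated in these slides; the shingle length of 3 words, the threshold of 0.5 and the sample texts are arbitrary assumptions.

```python
def shingles(text, k=3):
    """Set of all runs of k consecutive words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b) if a | b else 1.0

def presumed_duplicates(docs, threshold=0.5):
    """All pairs of document ids whose shingle sets are similar enough."""
    ids = sorted(docs)
    sh = {d: shingles(docs[d]) for d in ids}
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if jaccard(sh[a], sh[b]) >= threshold]

docs = {
    "mirror1": "the quick guide to searching the internet in 2003",
    "mirror2": "the quick guide to searching the internet in 2003 edition",
    "other": "encyclopedias accessible through the world wide web",
}
print(presumed_duplicates(docs))  # [('mirror1', 'mirror2')]
```

A system could then either show only one document of each pair (clustering) or drop one copy from its database altogether.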
78
Internet indexes:
evaluation criteria - desiderata (3)
• Does the database system work with full text indexing of
each document that has a place in the database, so that
full text searching is possible?
Is the complete text indexed and searchable, even for very
long documents?
79
Internet indexes:
evaluation criteria - desiderata (4)
• Are the contents of meta-fields also indexed to make them
searchable?
80
Internet indexes:
evaluation criteria - desiderata (5)
• Does the system also index the text in files on the web that
are not plain ASCII text, to make these also searchable
and retrievable?
For instance files in the format of the various versions of
»Microsoft Word (DOC),
Microsoft PowerPoint (PPT, PPS),
Microsoft Excel
»Adobe Acrobat Portable Document Format (PDF)
81
Internet indexes:
evaluation criteria - desiderata (6)
• Field indexing, so that searching limited to the contents of
a particular field is possible?
for instance:
HTML title,
HTML keywords,
URL,
date,
link,
Java applet,
text,
image file,
sound file,
video file...
82
Internet indexes:
evaluation criteria - desiderata (7)
• Does the system offer powerful search options like
»searching for terms composed of several words, in queries
like “word1 word2” with the words enclosed in double
quote characters
»truncation of words in a query?
»Boolean search combinations?
»an unlimited number of search terms in a query?
»proximity/nearby/adjacency searching, with operators like
“word1 NEAR word2” or “word1 ADJ word2”
83
Internet indexes:
evaluation criteria - desiderata (8)
»spelling check of search terms in the query, and suggesting
spelling variations?
»automatic expansion of the search terms in the initial
user’s query, to achieve a higher recall,
for instance by
—automatic stemming of words in a query
—including synonyms
—including narrower terms
—including translations into several other languages
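Automatic expansion can be sketched as replacing each query term by a set of alternatives (the term plus its synonyms), each reduced by a stemmer so that, for instance, “oceans” and “ocean” match. The mini-thesaurus and the suffix list are hypothetical; the toy stemmer is not the Porter stemmer or any other published algorithm, and document words would have to be stemmed with the same function.

```python
# Hypothetical mini-thesaurus; a real system would use a full thesaurus.
SYNONYMS = {
    "sea": ["ocean"],
    "pollution": ["contamination"],
}
SUFFIXES = ("ations", "ation", "ing", "es", "s")  # toy list, checked in this order

def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_query(terms):
    """For each query term, the set of stems to search for (combined with OR)."""
    return [{crude_stem(v) for v in [term] + SYNONYMS.get(term, [])}
            for term in terms]
```

The expanded query for `["sea", "pollution"]` is then evaluated as (sea OR ocean) AND (pollution OR contamin), with all document words stemmed the same way.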
84
Internet indexes:
evaluation criteria - desiderata (9)
• Can the results be limited to a certain time period?
For instance based on the date
»of the file as noted by the server computer, or
»of the most recent indexing of the file
• Is the user interface easy to understand and efficient to
use?
• Is a user interface offered in your own language?
• Does the system rank the items in the result set according
to their presumed relevance?
85
Internet indexes:
evaluation criteria - desiderata (10)
• Possibility to combine Boolean retrieval with relevance
ranking of results?
• Can the results be ordered according to date
»of the file as noted by the server computer, or
»of the most recent indexing of the file
• Can the results be ordered according to size?
86
Internet indexes:
evaluation criteria - desiderata (11)
• Can the system rank the results (documents) on the basis
of the number of WWW hyperlinks to that document?
• Does the system refrain from placing/ranking some results
(documents) higher in the results list on the basis of
payments by the producer of those documents to the
search system company?
• Are advertisements / sponsored links / sponsored results
clearly distinguished from normal (not sponsored) search
results?
• Good and detailed summary of each result available?
87
Internet indexes:
evaluation criteria - desiderata (12)
• Short response times?
• Are mirror sites available closer to you for faster
response?
• Does the system offer a good presentation format of each
result (document/page/item)?
For instance: are search terms indicated / highlighted in
the results?
• Is any evaluation (automatic?) offered of the quality of each
result, besides ranking in an order related to probable
relevance and importance of the results?
88
Internet indexes:
evaluation criteria - desiderata (13)
• Can all the results (documents) from the same site be
grouped together (clustered)?
• Are results (retrieved documents)
grouped / classified / categorized / clustered
by the search system, on the basis of the subjects of the
documents and are these presented as groups / clusters /
classes / categories to the user of the search system, to
assist the user in coping with the problems that can be
caused for instance by multiple meanings of words used
in a search query?
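Grouping the results from the same site can be sketched in a few lines, assuming that “the same site” means the same host name in the URL (a simplification: one organisation may use several host names). The sample URLs are hypothetical.

```python
from urllib.parse import urlsplit

def group_by_site(ranked_urls):
    """Group a ranked result list by host name. Sites appear in the order of
    their best-ranked page; pages keep their rank order within each site."""
    groups = {}  # dicts keep insertion order (Python 3.7+)
    for url in ranked_urls:
        host = urlsplit(url).hostname or ""
        groups.setdefault(host, []).append(url)
    return groups

results = [
    "http://site-a.example/page1",
    "http://site-b.example/",
    "http://site-a.example/page2",
]
grouped = group_by_site(results)
print(list(grouped))  # ['site-a.example', 'site-b.example']
```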
89
Internet indexes:
evaluation criteria - desiderata (14)
• Is translation of documents offered free of charge?
• Is any fact extraction from the information sources
offered, in an attempt to answer the query more directly
than by offering only links to documents?
• High stability and reliability?
• No large variations/fluctuations in the results from
identical searches made at different times?
90
Internet indexes:
evaluation criteria - desiderata (15)
• Term suggestion:
Does the system analyse the search results of the first
query, to find frequently occurring terms and to suggest
these to the user as new and potentially interesting
additional query terms?
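Term suggestion can be sketched by counting in how many of the first results each word occurs, ignoring the query words and a small stopword list. The stopword list, the sample texts and the `top_n` parameter are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on"}  # tiny illustrative list

def suggest_terms(query, result_texts, top_n=3):
    """Suggest the words that occur in the most result texts,
    excluding query words and stopwords."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in result_texts:
        words = set(re.findall(r"[a-z]+", text.lower()))  # count once per result
        counts.update(w for w in words if w not in STOPWORDS and w not in query_terms)
    return [term for term, _ in counts.most_common(top_n)]

texts = [
    "Pascal, the programming language",
    "Programming in Pascal and Delphi",
    "Delphi programming tips",
]
print(suggest_terms("pascal", texts, top_n=2))  # ['programming', 'delphi']
```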
91
Internet indexes:
evaluation criteria - desiderata (16)
• Relevance feedback:
Can the user indicate among the search results of a first
query
“good, relevant” results
and
“bad, irrelevant” results,
so that the system can use this information to offer better
results in a second query?
92
Internet indexes:
evaluation criteria - desiderata (17)
• Relevance feedback 2: even better:
Can the user indicate among the search results of a first
query
+ “good, relevant” results,
- as well as “bad, irrelevant” results,
so that the system can use this information to suggest
+ additional, new interesting query terms that can be
included in a second query,
- as well as query terms that should be excluded in a
second query?
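Both forms of relevance feedback can be sketched with the classic Rocchio formula: the new query is the original query, plus a weighted average of the “good” documents, minus a weighted average of the “bad” ones. Terms that end up with a positive weight but were not in the query are candidates to add; terms with a negative weight are candidates to exclude. The weights alpha = 1.0, beta = 0.75 and gamma = 0.15 are commonly quoted textbook values, used here as assumptions.

```python
from collections import Counter

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: alpha*query + beta*mean(relevant) - gamma*mean(irrelevant).
    query is a list of terms; each document is a list of terms (raw term counts)."""
    weights = Counter({term: alpha for term in query})
    for docs, factor in ((relevant, beta), (irrelevant, -gamma)):
        for doc in docs:
            for term, tf in Counter(doc).items():
                weights[term] += factor * tf / len(docs)
    return weights

def suggest(query, weights):
    """Positive-weight terms not yet in the query -> add; negative-weight terms -> exclude."""
    add = sorted(t for t, w in weights.items() if w > 0 and t not in query)
    exclude = sorted(t for t, w in weights.items() if w < 0)
    return add, exclude

weights = rocchio(
    ["pascal"],
    relevant=[["pascal", "programming", "language"]],
    irrelevant=[["pascal", "philosopher"]],
)
print(suggest(["pascal"], weights))  # (['language', 'programming'], ['philosopher'])
```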
93
Internet indexes:
evaluation criteria - desiderata (18)
• Good documentation and online help?
• Good help desk available?
• Can the search system provide updated results through
electronic mail, as a current awareness tool?
94
Internet indexes:
evaluation criteria - desiderata (19)
• Is the search/query also submitted to another database to
obtain more results?
for instance:
to a book database to obtain book descriptions
besides WWW documents
95
Internet indexes:
evaluation criteria - desiderata (20)
• Other services available besides the normal WWW index:
»index to news resources, that is more frequently updated?!
»Internet subject directory?!
»anonymous ftp file index?
»gopher index?
»searchable Usenet newsgroups archive?
»White pages = people finder = addresses = ...
»WWW-based e-mail and e-mail address directories
»auctions through WWW
96
Coverage of Internet directories and
Internet indexes
Scheme: within all Internet information sources, a global Internet directory
covers a small selection, and a global Internet index covers a larger part.
97
Global Internet search tools:
a comparison
Global Internet directories:
• Only a limited selection of Internet sources
• Browsing information sources is easy
• Good for broad searches
Global Internet indexes:
• About 1/3 of the Internet is covered by an index
• Searching requires some skills and knowledge
• Good for specific, narrow searches
Multi-threaded search systems:
• These get information from directories and indexes
• Searching requires some skills and knowledge
• Good when even 1 index does not yield information
98
Internet indexes cover only a part of
the Internet: introduction (1)
The “visible” part of the Internet
The “hidden, invisible” part of the Internet and the WWW
(that is not searchable using a global index
like Alltheweb, AltaVista, Google...)
99
Internet indexes cover only a part of
the Internet: introduction (2)
Why can Internet indexes find only a part of what is in fact
available through the Internet?
1. Quantitative technical limitations:
Each Internet search system has indexed only a part of the
static WWW pages that are available for indexing.
2. Qualitative technical limitations:
Besides the static WWW pages that Internet search engines
try to cover, many other, quite different sources exist, that
are also available through the Internet, but that are not
incorporated in those search engines.
100
Internet indexes cover only a part of
the Internet: scheme
Scheme: the Internet offers telnet, ftp, ... and the WWW.
Within the WWW, the static indexable texts ( = on HTTP server computers)
are covered partly by Internet indexes.
Other parts shown in the scheme: dynamic pages (CGI, ASP,...);
rapidly changing information, such as news; databases and file archives
accessible through the Internet; Word files; PDF files;
information accessible only when passwords are used.
Example
Database accessible over the Internet:
a famous example: Medline/PubMed
101
102
Internet indexes cover only a part of
the Internet: conclusion for users
When you want to retrieve information about a particular
subject from the Internet, use not only WWW indexes,
but use also other sources accessible through the Internet
»databases!
(book and journal bibliographies, library catalogues,
archives of group messages, directories, atlases,…)
»rapidly changing information, such as news
»information accessible only when passwords are used
»anonymous ftp file archives
»e-mail based interest groups; Usenet newsgroups
103
Gateways to Internet databases
accessible free of charge
• Most Internet search engines search classical, static
WWW pages and not databases accessible through the
WWW.
• However, some systems offer a gateway to search
databases on the Internet. Examples:
»http://invisibleweb.com/
»http://www.completeplanet.com/
»http://www.invisible-web.net/
(See also other more general directories/overviews/lists of
Internet information sources.)
Example
Gateways to Internet databases
accessible free of charge: invisibleweb
104
105
Hybrid systems to find information on
the Internet
• Some systems require the searcher to enter words, as a
search engine does, but they do not rely on classical Internet
indexes.
• Example:
Ask Jeeves
Example
Hybrid systems to find information on
the Internet: Ask Jeeves
• Ask Jeeves tries to “answer questions” of searchers,
by analysing the natural language queries and
by referring to selected sources on the Internet.
• http://www.askjeeves.com/
• http://www.ask.com/
• http://www.aj.com/
106
107
Multi-threaded
Internet search systems: scheme 1
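Scheme (figure): the user works on a client computer with a WWW client program (In / Out); through the Internet and the WWW it contacts a WWW server computer, which in turn contacts WWW server computers with Internet search systems.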
108
Multi-threaded
Internet search systems: vocabulary
• “multi-threaded Internet search systems”
• “multiple search systems”
• “multi-search systems”
• “meta-search systems”
• “intelligent Internet search agents”
• “Internet meta-search tools”
• ...
109
Multi-threaded
Internet search systems: relations
Scheme (figure): User → an Internet meta-search system → Internet search system 1 (with its collected database 1) and Internet search system 2 (with its collected database 2) → WWW pages
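The relations above can be illustrated with a minimal Python sketch of the “multi-threaded” idea: the meta-search system sends the same query to several primary search systems at the same time, one thread per system, and collects the separate result lists. The functions `engine_one` and `engine_two` are hypothetical stand-ins that return fixed URLs; a real system would send HTTP requests to the primary search engines instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A "search system" is modelled as a function from a query to a ranked list of URLs.
SearchSystem = Callable[[str], List[str]]


def engine_one(query: str) -> List[str]:
    # Hypothetical stand-in for a primary Internet search system.
    return [f"http://example.org/{query}/a", f"http://example.org/{query}/b"]


def engine_two(query: str) -> List[str]:
    # Hypothetical stand-in for a second primary search system.
    return [f"http://example.net/{query}/x"]


def meta_search(query: str, systems: Dict[str, SearchSystem]) -> Dict[str, List[str]]:
    """Send the same query to every search system at once, one thread per system."""
    with ThreadPoolExecutor(max_workers=len(systems)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in systems.items()}
        # Wait for every thread and collect each system's result list by name.
        return {name: fut.result() for name, fut in futures.items()}


results = meta_search("medline", {"one": engine_one, "two": engine_two})
for name, urls in results.items():
    print(name, urls)
```

Because the systems are queried in parallel, the total waiting time is roughly that of the slowest system rather than the sum of all of them.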
110
Examples
Multi-threaded Internet search
systems: server-based systems
• http://www.all4one.com
• http://www.bytesearch.com
• http://www.cyber411.com
• http://www.dogpile.com
• http://www.go2net.com = http://www.metacrawler.com
• http://www.kartoo.com
• http://www.mamma.com
• http://www.profusion.com
• http://www.search.com
• http://www.vivisimo.com
111
Examples
Multi-threaded Internet search
systems: server-based systems
• An overview of meta-search systems that are based on a server in
the Internet is available via
http://directory.google.com/Top/Computers/Internet/Searching/Metasearch/
112
Example
Multi-threaded Internet search
systems: server-based: example
113
Example
Multi-threaded Internet search
systems: server-based: example
114
Example
Multi-threaded Internet search
systems: server-based: example
• Vivisimo adds value by analysing the retrieved
results / hits / links / WWW documents,
in order to
cluster / group / categorize / classify / map
these under headings / classes / categories,
to make further selections by the user / searcher easier
and faster.
• Vivisimo can accomplish this on the fly, that is, WITHOUT
pre-processing the documents before the search.
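The kind of on-the-fly grouping described above can be sketched in a few lines of Python. This is not Vivisimo's own method, which the slide does not describe; it is a simple illustrative heuristic that labels each retrieved title with the non-query word it shares with the most other titles. The stop-word list and the sample titles are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

STOPWORDS = {"the", "a", "of", "and", "in", "for", "on", "to"}  # hypothetical, tiny list


def terms(title: str) -> List[str]:
    # Lower-case words, minus very common words that say nothing about the topic.
    return [w for w in title.lower().split() if w not in STOPWORDS]


def cluster_titles(titles: List[str], query: str) -> Dict[str, List[str]]:
    """Group result titles under the most widely shared term (query words excluded)."""
    query_terms = set(query.lower().split())
    # Count in how many titles each term occurs (set() avoids double counting per title).
    df = Counter(t for title in titles for t in set(terms(title)) if t not in query_terms)
    clusters: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        candidates = [t for t in set(terms(title)) if t not in query_terms and df[t] > 1]
        # Label = the candidate shared by most titles; ties broken alphabetically.
        label = min(candidates, key=lambda t: (-df[t], t)) if candidates else "other"
        clusters[label].append(title)
    return dict(clusters)


titles = ["Jaguar cars for sale", "Jaguar cars history",
          "Jaguar habitat in the rainforest", "Jaguar rainforest conservation",
          "Jaguar guitar"]
print(cluster_titles(titles, "jaguar"))
```

Like the clustering described on the slide, this needs no pre-processing of the documents: it only looks at the results of the current search.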
115
Example
Multi-threaded Internet search
systems: server-based: example
• In a test search for a family name, Vivisimo succeeded in
clustering documents related to different persons with the
same family name.
For comparison:
the clustering search engine Teoma did not accomplish
this.
116
Example
Multi-threaded Internet search
systems: server-based: example
117
Example
Multi-threaded Internet search
systems: server-based: example
• Kartoo offers an advanced graphical user interface.
• Reading the manual before you start using the system is
recommended.
118
Multi-threaded Internet search
systems: advantages (Part 1)
+ Saves time when otherwise more than one Internet
index would have to be used one after the other;
for instance when searching for specific information that
is hard to find in any single Internet index.
+ Some meta-search systems provide a useful integration of
the results they get from the various primary search
systems, removing duplicate results.
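Integrating results and removing repeated ones could work roughly as in the minimal sketch below. It assumes that two URLs differing only in the letter case of the scheme and host name, or in a trailing “/”, point to the same page, and it orders the merged list by summing 1/rank over the systems that returned each page. This scoring rule is one simple choice for illustration, not the rule of any particular meta-search system.

```python
from typing import Dict, List
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    # Lower-case scheme and host, drop a trailing "/", keep the query; fragments (#...) are ignored.
    parts = urlsplit(url)
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def merge_results(result_lists: Dict[str, List[str]]) -> List[str]:
    """Merge ranked lists from several systems, dropping duplicates.

    Each occurrence adds 1 / rank to a page's score (rank counted from 1),
    so pages ranked high by several systems come first.
    """
    scores: Dict[str, float] = {}
    first_seen: Dict[str, str] = {}
    for urls in result_lists.values():
        for rank, url in enumerate(urls, start=1):
            key = normalise(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)  # keep the first spelling met for display
    # sorted() is stable, so pages with equal scores keep their first-seen order.
    ordered = sorted(scores, key=lambda k: -scores[k])
    return [first_seen[k] for k in ordered]
```

For example, `merge_results({"one": ["http://A.org/x/", "http://b.org/"], "two": ["http://a.org/x", "http://c.org/"]})` lists the page returned by both systems once, and first.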
119
Multi-threaded Internet search
systems: advantages (Part 2)
+ Some server-based and client-based meta-search systems
show links among retrieved pages.
+ Some client-based meta-search systems allow a search query
to be stored on the client computer for later, repeated use;
such a system can even exclude documents
that were already retrieved in an earlier search.
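A client-based system that stores a query and excludes documents retrieved earlier might keep a local list of already-seen URLs, as in this minimal Python sketch. The class name `SavedSearch` and the JSON file used as storage are assumptions for illustration; the result lists themselves would come from the search systems.

```python
import json
from pathlib import Path
from typing import Iterable, List


class SavedSearch:
    """A search query stored on the client computer, with the URLs already retrieved."""

    def __init__(self, query: str, store: Path) -> None:
        self.query = query  # kept so the same query can be resubmitted later
        self.store = store  # hypothetical local file, e.g. Path("saved_search.json")
        self.seen = set(json.loads(store.read_text())) if store.exists() else set()

    def new_results(self, results: Iterable[str]) -> List[str]:
        """Return only the URLs not retrieved in an earlier run, and remember them."""
        fresh: List[str] = []
        for url in results:
            if url not in self.seen:
                self.seen.add(url)
                fresh.append(url)
        # Save the updated list so a later run, even after a restart, excludes them too.
        self.store.write_text(json.dumps(sorted(self.seen)))
        return fresh
```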
120
Multi-threaded Internet search
systems: advantages (Part 3)
+ Can add value, for instance by analysing the results / hits
so that they can be
clustered / grouped / categorized / classified,
to make further selections by the user / searcher easier
and faster.
Example: http://www.vivisimo.com
121
Multi-threaded Internet search
systems: disadvantages (Part 1)
- It is not always clear through which Internet indexes the
meta-search system will search.
- Not all meta-search systems can search all the major
primary search systems; for instance Google is normally
NOT included.
- The systems are often slower than a direct, primary
search system.
- Only a limited number of the results that can be obtained
from the various Internet indexes are shown.
122
Multi-threaded Internet search
systems: disadvantages (Part 2)
- Some specific or advanced features of the individual
search systems cannot be used through every meta-search system, such as:
»Boolean searching,
»proximity searching,
»field searching,
»categorization / clustering of search results,
»...
123
Internet:
who owns the search tools?
In 2003:
• The company Yahoo! owns
»the most famous global Internet subject directory
»3 (!) Internet full-text search engines:
All the Web, AltaVista, Inktomi
• The company Google owns
»the most famous Internet full-text search engine
»one of the best Internet image search engines
»a gateway to old and new Usenet news messages
124
Current awareness services focusing
on WWW pages: introduction
• Tracking changes in one or more public access pages on
the WWW, or finding new pages, is possible in an automated way,
»by using a suitable program installed on
your client workstation;
example: the advanced version of Copernic, which is not
available free of charge
»through “alert” services based on a server on the WWW
—that track updates for the user/subscriber
—and send alerts by email to the user/subscriber
125
Current awareness services focusing
on WWW pages: modified versus new
• Several systems exist that can track changes /
modifications / updates in a particular existing WWW
page for you, even free of charge.
• Some systems can find new pages on the WWW for you.
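A tracking service can detect that an existing page was modified by storing a fingerprint (hash) of its content and comparing it at the next visit. The minimal sketch below assumes that the page is text (for example HTML) in UTF-8 and ignores differences in whitespace only; actual services may also need to ignore parts that change on every visit, such as dates, counters or advertisements.

```python
import hashlib
import urllib.request


def fetch(url: str) -> str:
    # Download the page; assumes the server returns UTF-8 (or compatible) text.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fingerprint(page_text: str) -> str:
    # Collapse runs of whitespace so mere re-indentation does not count as a change.
    normalised = " ".join(page_text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint: str, page_text: str) -> bool:
    # A different fingerprint means the (normalised) content differs from the stored version.
    return fingerprint(page_text) != old_fingerprint
```

In use, a service would store `fingerprint(fetch(url))` once, then periodically call `has_changed(stored, fetch(url))` and send an alert when it returns `True`.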
126
Current awareness services focusing
on WWW pages: ProFusion
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on a few external Internet indexes;
unfortunately, Google is NOT included.
• Works with search queries you provide, which are stored on
its server computer.
• Free of charge, at least up to 2003.
• Available via http://www.profusion.com/
• The user interface to set up saved searches is not very
clear.
127
Current awareness services focusing
on WWW pages: Google Alert
128
Current awareness services focusing
on WWW pages: Google Alert
• Can discover relevant changed or new WWW pages for
you in the future.
• Is based on the external Internet index Google.
• Works with search queries you provide, which are stored on
its server computer.
• Free of charge, at least up to 2003.
• http://www.googlealert.com/
129
Online access information
sources and services
Public access book databases
130
Public access book databases:
introduction
• Even in this age of Internet-based information sources, a
lot of information is still distributed in the form of printed
books.
• The contents of most books are (still) not available on the
Internet.
• Most general Internet search tools do NOT allow you
to find out about the existence of books that may be
interesting for you.
• So, specific search tools to find books can be useful.
131
Public access book databases:
an overview
• (Databases by publishers.)
• Fee-based databases by commercial providers
• Databases by book distributors / bookshops!
• Online public access catalogues of
»local libraries,
»national libraries (which normally produce and offer
their national bibliography)!
»big, famous libraries!!
• (Databases of computer-based versions of books.)
132
Public access book databases:
which one to use?
• For years, the market of bibliographic information
on books was limited to the services and databases of
subscription-based bibliographic providers.
• Nowadays, the WWW provides a key to unlock many
possibilities to find bibliographic information.
• Which book database should be preferred for
particular applications is not clear to most
librarians or end-users.
133
Suitable book databases?
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | ?
To search for book titles published before 1990 | ?
Book title search in general | ?
To find the price of a book | ?
To be informed regularly about new books | ?
134
Public access book databases
by commercial producers
• To find currently available books, some databases
assembled by commercial producers can be
interesting.
• Example: Global Books in Print
• These databases offer formal descriptions of books,
prices of the books, short descriptions of the contents
with subject terms…
• However, access to such a database is not free of
charge and can be expensive
(in comparison with alternatives).
135
Public access book databases
provided by bookshops
• To find currently available books, the bibliographic
databases assembled by big bookshops are interesting.
• Several offer a good coverage and
are accessible free of charge.
• The added price information can be useful for the
acquisition and accounting department of a library or if
an individual user wants to buy a book.
• Some provide a current awareness service,
also free of charge.
Examples
Book databases accessible free of
charge: examples in U.S.A.
• Amazon.com (US):
http://www.amazon.com/
http://www.amazon.co.uk/
note: amazon, NOT amazone
Subject description is poor.
• Barnes and Noble (US):
http://www.bn.com/
136
Examples
Book databases accessible free of
charge: examples in Europe
• Blackwell’s on the Internet
(International, academic books):
http://www.blackwell.co.uk/
• VLB for books in German
http://www.buchhandel.de/
• For books in French
http://www.chapitre.com
• Boeknet - De Nederlandse Internet Boekhandel (Dutch)
http://www.boeknet.nl/
137
Examples
Book databases accessible free of
charge: for old books
To find
used,
secondhand,
rare,
hard-to-find, and
out-of-print
books around the world:
abebooks
http://www.abebooks.com/
138
139
Free public access bibliographic book
database + price comparisons
• Even comparisons of the catalogues of bookshops
(as well as of shops for music, movies and many other goods)
are available free of charge.
• See for instance
»http://www.bookfinder.com/
»http://www.dealtime.com/
Examples
Example of an international
public access dissertation database
• The dissertation database of UMI is available from:
http://wwwlib.umi.com/dissertations/
• The most current two years are available without charge.
140
141
Online Public Access Catalogues of
libraries
• Mainly to find older books, the catalogues of libraries can
be useful.
• Most are accessible online and free of charge.
142
Online Public Access Catalogues
of the big famous libraries
• For instance:
Library of Congress (USA)
• Their coverage is good.
• They offer the best subject descriptions.
• Access is free of charge.
• So they form excellent sources to find books about a
particular subject/topic.
143
Example
Online Public Access Catalogues:
The Library of Congress, U.S.A.
• More than 15 million books
and more than 10 million other documents
• Located in Washington DC, U.S.A.
• Accessible online via WWW
• Access free of charge
• Offers good subject descriptions with the famous
Library of Congress Subject Headings (LCSH)
144
Example
Online Public Access Catalogues:
The British Library
• Accessible online via WWW:
Since 2000: http://blpc.bl.uk/
• Access free of charge
145
Example
Online Public Access Catalogues:
The British Library: screenshot
146
Online Public Access Catalogues:
catalogues of national libraries
• National libraries are first of all an outstanding source for
the publications of their own country.
• The national libraries are the most reliable source for
bibliographic searching and verification.
147
Recommended book databases
AIM | RECOMMENDED SYSTEMS
To find book titles about a specific subject / topic | Library of Congress, British Library, (Amazon)
To search for book titles published before 1990 | National libraries, Barnes&Noble, Infoball, Alapage, Abebooks
Book title search in general | Library of Congress, British Library, Infoball
To find the price of a book | Global Books in Print, Infoball, online bookshops
To be informed regularly about new books | Amazon, Alapage, Bol
148
Public access book databases:
evaluation criteria - desiderata (1)
• Is usage free of charge?
• Wide coverage?
Specialized coverage of books
»in your preferred language?
»on particular subjects / topics?
»published in a specific country?
»published in a particular time period?
»of particular types (such as conference proceedings)?
• Up to date? Frequent updates?
149
Public access book databases:
evaluation criteria - desiderata (2)
• Besides the formal description of each book, does the
database also offer
»an abstract / summary / description of the contents?
»a table of contents?
»reviews by readers?
»the price?
»information about the publisher?
»titles of related books?
150
Public access book databases:
evaluation criteria - desiderata (3)
• Full text indexing of each item in the database,
so that full text searching is possible?
• Field indexing, so that searching for the contents of a
particular field is possible? For instance
»the title
»the date of publication
»the author
»the publisher
»the language
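Field indexing can be pictured as an index whose keys combine a field name and a word, so that a search for “jansen” in the author field does not also return books that merely have “jansen” in the title. Below is a minimal Python sketch with hypothetical sample records; the full-text search simply combines all fields.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]                  # a book description with named fields
FieldIndex = Dict[Tuple[str, str], Set[int]]


def build_field_index(records: List[Record]) -> FieldIndex:
    """Map (field, word) to the numbers of the records whose field contains that word."""
    index: FieldIndex = defaultdict(set)
    for rec_id, record in enumerate(records):
        for field, value in record.items():
            for word in value.lower().split():
                index[(field, word)].add(rec_id)
    return index


def field_search(index: FieldIndex, field: str, word: str) -> Set[int]:
    # Search one field only; empty set when the word never occurs in that field.
    return set(index.get((field, word.lower()), set()))


def full_text_search(index: FieldIndex, word: str) -> Set[int]:
    # Search every field: the union of the matches in all fields.
    w = word.lower()
    return set().union(*(ids for (_, term), ids in index.items() if term == w))


books = [  # hypothetical records
    {"title": "Searching the web", "author": "Jansen", "language": "English"},
    {"title": "Jansen on databases", "author": "Peeters", "language": "Dutch"},
]
index = build_field_index(books)
print(field_search(index, "author", "Jansen"), full_text_search(index, "Jansen"))
```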
151
Public access book databases:
evaluation criteria - desiderata (4)
• Does the database producer improve retrieval by
»adding subject terms, or
»classifying the books in categories?
152
Public access book databases:
evaluation criteria - desiderata (5)
• Powerful search options:
»truncation?
»stemming?
»Boolean search combinations? combined field searching?
»proximity searching?
»spelling check of your search terms?
»translation of your search terms into several other
languages?
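Truncation and Boolean combinations can both be expressed as operations on sets of record numbers taken from an index of words. In this minimal sketch a trailing `*` marks right truncation (a common convention, but the truncation symbol differs from system to system), and AND / OR become set intersection and union. The small index is hypothetical.

```python
from typing import Dict, Iterable, Set

Index = Dict[str, Set[int]]  # word -> numbers of the records containing it


def expand(term: str, index: Index) -> Set[int]:
    """Records matching one search term; a trailing '*' means right truncation."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        # Truncation: combine the records of every indexed word starting with the prefix.
        return set().union(*(ids for word, ids in index.items() if word.startswith(prefix)))
    return set(index.get(term, set()))


def boolean_and(terms: Iterable[str], index: Index) -> Set[int]:
    # AND: records that match every term.
    sets = [expand(t, index) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(terms: Iterable[str], index: Index) -> Set[int]:
    # OR: records that match at least one term.
    return set().union(*(expand(t, index) for t in terms))


index = {"computer": {0, 1}, "computing": {2}, "library": {1, 2}, "catalogue": {0}}
print(boolean_and(["comput*", "library"], index))
```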
153
Public access book databases:
evaluation criteria - desiderata (6)
• Can the user browse through subject categories that are
used in the book database?
• Is a user interface offered in your own language?
• Easy user interface?
• Relevance ranking of results?
• Possibility to combine Boolean retrieval with relevance
ranking of results?
• Can results be limited to a certain time period?
• Short response times?
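Combining Boolean retrieval with relevance ranking can mean: first keep only the records that satisfy the Boolean condition, then order them by a relevance score. In this minimal sketch the condition is an AND of all search terms and the score is simply the number of occurrences of the search terms in each record; actual systems use more refined scores.

```python
from typing import List


def search_ranked(docs: List[str], terms: List[str]) -> List[int]:
    """Boolean AND first (every term must occur), then rank the survivors by term frequency."""
    terms = [t.lower() for t in terms]
    hits = []
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        if all(t in words for t in terms):             # Boolean filter
            score = sum(words.count(t) for t in terms)  # crude relevance score
            hits.append((score, doc_id))
    # Highest score first; equal scores keep the original record order.
    return [doc_id for score, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


docs = ["library catalogue", "library library catalogue catalogue", "library only"]
print(search_ranked(docs, ["library", "catalogue"]))
```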
154
Public access book databases:
evaluation criteria - desiderata (7)
• Can the results be ordered according to
date, size, origin...?
• Good presentation of each result?
For instance: Are search terms highlighted?
• Can results be downloaded well structured with field
tags?
(For instance to allow incorporation of the data in
another database.)
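As an example of a well structured download with field tags, the sketch below writes a book record in the RIS tagged format, one widely used exchange format for bibliographic data, chosen here for illustration only (the slide does not name a format). Only a few RIS tags are covered: TY (type), AU (author), TI (title), PY (year), PB (publisher) and ER (end of record); the input field names are assumptions.

```python
from typing import Dict, List

# RIS tags for a few book fields (a small subset of the RIS tagged format).
RIS_TAGS = {"author": "AU", "title": "TI", "year": "PY", "publisher": "PB"}


def to_ris(book: Dict[str, object]) -> str:
    """Write one book record as RIS lines: TAG, two spaces, hyphen, space, value."""
    lines: List[str] = ["TY  - BOOK"]
    for field, tag in RIS_TAGS.items():
        value = book.get(field)
        if value is None:
            continue  # missing fields are simply left out
        # Repeatable fields such as authors become one tagged line per value.
        values = value if isinstance(value, list) else [value]
        lines.extend(f"{tag}  - {v}" for v in values)
    lines.append("ER  - ")
    return "\n".join(lines)


print(to_ris({"title": "Example title", "author": ["Doe, J.", "Roe, R."], "year": 2003}))
```

Because every value carries its own tag, another database or a reference manager can import the record field by field.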
155
Public access book databases:
evaluation criteria - desiderata (8)
• Does the system offer a current awareness service,
sending information on new titles that may be of interest
to you?
156
Public access book databases:
evaluation criteria - desiderata (9)
• Are other services offered from the same site or with the
same interface?
Is the system integrated with other services?
Additional services can be
»searchable databases of videos, music CDs, CD-ROMs and
DVDs, all also for sale
»WWW-based e-mail and e-mail address directories
»auctions through WWW
157
Online access information
sources and services
Fee-based online public access
information services
158
Types of online access information
systems: “free” versus “fee”
• A lot of the information on the Internet is available free of
charge, but another part is only accessible when a fee is
paid to the producer and / or the distributor.
• Some organisations pay these fees for some sources and
then organise access, so that the members of the
organisation can retrieve and exploit the information as if
it were free of charge.
• The first commercial computer systems that made
information available online appeared around 1975.
• Most of them are now also available through the Internet.
159
Examples
Fee-based online access services:
examples (Part 1)
Name | Location of the computer(s)
America On Line | U.S.A.
OCLC | U.S.A.
Ovid Technologies | U.S.A.
CompuServe | U.S.A.
Cambridge | U.S.A., Taiwan, UK
Data-Star | Switzerland
Dialog | U.S.A.
EBSCO | U.S.A.
160
Examples
Fee-based online access services:
examples (Part 2)
Name
Location of the computer(s)
Elsevier ScienceDirect
Factiva
ISI (Web of Science, JCR,…)
LexisNexis
MSN (Microsoft)
Prodigy
Silver Platter
STN
Swets-Blackwell (e-journals)
...
U.S.A.
U.S.A.
U.S.A.
U.S.A., The Netherlands,...
Germany - U.S.A. - Japan
The Netherlands
...
161
Online information services:
various names for similar systems
• (fee-based) online (access) information service
• (fee-based) online (access) computer service
• databank
• database vendor
• host computer
• aggregator
• ...
162
Online information services:
access methods
• Using generic, common communications software
»through the telephone network (telephone + modem)
»through X.25 data communication networks
»through the Internet, using client-server systems:
—telnet
—WAIS or Z39.50
—http (WWW)! (Examples: http://www.dialogweb.com;
http://www.datastarweb.com)
• (Using client software dedicated to the particular service)
163
Online information services:
total size of their databases
In 1999:
The big host systems and the public access WWW pages
offer a comparable quantity of information:
• WWW offered about 8 terabytes (= 8 000 gigabytes) of
text data
(according to Lawrence and Lee Giles, Nature, 1999, Vol. 400, pp. 107-109.)
• Dialog offered about 9 terabytes (= 9 000 gigabytes)
(in 1998)
»6 billion pages of text
»3 million images
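Using the figures on this slide, and assuming decimal units (1 TB = 1 000 GB as above, 1 GB = 10^9 bytes), the comparison and the average size per Dialog page work out as follows. Since the 9 TB also includes the 3 million images, about 1 500 bytes is an upper bound for the average amount of text per page.

```latex
\[
\frac{9\ \text{TB (Dialog, 1998)}}{8\ \text{TB (WWW text, 1999)}} = 1.125
\quad\text{(Dialog about 12\% larger)}
\]
\[
\frac{9\,000\ \text{GB}}{6 \times 10^{9}\ \text{pages}}
  = \frac{9 \times 10^{12}\ \text{bytes}}{6 \times 10^{9}\ \text{pages}}
  = 1\,500\ \text{bytes per page}
\]
```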
164
Database hosts / distributors:
evaluation criteria - desiderata (1)
• Contract not required?
• A priori payment not required?
• Satisfactory stability / history / evolution / future of host?
• Low costs of data communication?
• Many databases available?
• Whole records available (or only parts)?
• Frequent updates?
• Whole database available? As one file or fragmented?
165
Database hosts / distributors:
evaluation criteria - desiderata (2)
• Low price of access? Low price of information?
• Good searching facilities?
(cf. desiderata for Internet indexes)
• Can the indexes of more than one database be searched
simultaneously?
166
Database hosts / distributors:
evaluation criteria - desiderata (3)
• Online indication of costs?
• Practice free of charge?
• Good manuals, documentation and online help?
• Training courses available? Quality?
• Good help desk available?
• Gateway service offered?
• ...
167
Databases of
online public access databases
• Example
»Gale directory of databases !
• Their coverage:
»online access databases
»(databases accessible on CD-ROM)
»...
168
Databases of databases:
Gale
• Produced in U.S.A.
• Not free of charge
• Available in various formats:
»printed
»on CD-ROM
»online via the host systems Data-Star, Dialog,
with a payment required for each use
»online through the Internet through various hosts,
for a fixed price per year to be paid in advance
169
Online access information
sources and services
Online access databases about journal articles
170
Online access databases
about journal articles: overview
• Thousands of fee-based online access databases offer
bibliographies or full-texts of journal articles in
particular subject domains and published by many
publishers.
• Many publishers offer searchable bibliographies, but only
of their own publications. (for instance Emerald, Elsevier)
• Only a few large databases offer access to bibliographies of
articles published in journals from many publishers, free
of charge.
Example
Online access databases
about journal articles: Ingenta (1)
• Ingenta Journals allows you to search a bibliographic
database of millions of journal articles,
including titles, authors, in many cases abstracts.
• Searching is free of charge.
171
Example
Online access databases
about journal articles: Ingenta (2)
• Payment is required to receive the full text of an article.
• Ingenta acquired Uncover in 2000.
• Available from
»http://www.ingenta.co.uk/
»http://www.ingenta.com/
172
Example
Online access databases
about journal articles: Article@INIST
• Article@INIST allows you to search a bibliographic
database (NOT full-text) of journal articles, journal issues,
books, reports, conferences and doctoral dissertations
at the Institut de l'Information Scientifique et Technique,
France.
• Does not offer searching by classification or thesaurus.
• Searching is free of charge.
• Available from http://form.inist.fr/public/eng/conslt.htm
• Payment is required to receive the full text of an article.
173
174
Example
Online access databases
about journal articles: Infotrieve
• Infotrieve allows you to search free of charge in a
bibliographic database of the articles of more than 20 000
journal titles and conference proceedings, NOT full-text.
• Available from http://www3.infotrieve.com/
• Payment is required to receive the full text of a document.
• Current awareness services are also offered free of
charge:
the table of contents of each new issue of the journals that you
have selected is sent to you by email.
175
Example
Online access databases
about journal articles: Scirus
• This is a specialised Internet index that allows you to
search for selected scientific information (only) on the
WWW.
This includes the peer-reviewed articles in the journals
that are published in ScienceDirect by Elsevier.
• An article can be downloaded in full-text format only
when a fee has been paid to the publisher.
• The search interface: http://www.scirus.com
Example
Online access databases
about journal articles: Scirus features
• Offered free of charge by Elsevier.
• Is partly based on the Fast WWW search system that is
also used by Alltheweb.
• Offers access to information ordered according to some
classification system / taxonomy.
176
177
Online access databases:
Web of Knowledge
• The Web of Science or more recently
the Web of Knowledge offers access through the WWW to
a database of bibliographic descriptions of scientific
journal articles in all subject domains.
• This database is (only) available to members of
organisations / institutes / companies / consortia
that pay a yearly fee to the producer/publisher of the
database.
• This database is not only suitable for subject searching,
but also for citation searching.
178
Online access information
sources and services
Electronic newsletters and journals
179
Electronic newsletters and journals:
introduction
Since the end of the 1990s, electronic journals have become
a new communication medium that cannot be neglected.
Scheme (figure): Author / Sender, Editor, Reader / Receiver
180
Electronic newsletters and journals:
various types and the price of access
• We can distinguish various types:
»equivalents of a version printed on paper
—published almost simultaneously
—print version published a long time before the electronic
version
= deliberate long delay for the electronic version
»purely electronic publications
• Price of access: from free of charge to very expensive
181
Electronic newsletters and journals:
access and distribution methods
Many different methods are
used:
»anonymous ftp
»gopher
»WAIS / Z39.50
»electronic mail, listserv,...
»Usenet News
»loaded on local systems
in universities or
institutes
»http, WWW !
»Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) +
http, WWW
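With the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a harvester sends ordinary HTTP requests to a repository and receives XML records. The minimal sketch below only builds a `ListRecords` request URL for Dublin Core (`oai_dc`) records, optionally limited to records changed since a given date. The base URL is hypothetical, and a complete harvester would also fetch and parse the XML and follow each `resumptionToken` until the list is exhausted.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core records."""
    params: Dict[str, str]
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # selective harvesting, e.g. "2003-01-01"
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("http://example.org/oai", "2003-01-01"))  # hypothetical repository
```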
182
Electronic newsletters and journals
through the WWW
• The WWW has become the most important platform for
access to electronic newsletters and journals.
Example
Electronic newsletters and journals:
example
183
184
Electronic newsletters and journals:
problems and challenges
• There is no central database
with all article titles, summaries, and full contents.
There is not even a central, complete and up to date
directory of journal titles.
• There is no standard licensing/pricing method.
• Not all electronic journals are accessible through a single user
interface.
• Many passwords must be used.
• Archiving (By whom? Forever?)
185
Find out
how you can efficiently access electronic journals
from your institute.
186
Directory of
Open Access Journals
• The Directory of Open Access Journals
is a directory of electronic journals that can be accessed
free of charge.
• Available since May 2003.
• http://www.doaj.org/
187
Directory of Open Access Journals:
screenshot
188
Online access information
sources and services
Finding multimedia files on the Internet
189
Finding multimedia files on the
Internet: introduction
Several public access search systems are available
free of charge to search the Internet for multimedia files:
»images / pictures (artwork, photos, or both)
»sound / audio files (music, speeches,...)
»video
190
Finding images on the Internet:
introduction
• Several public access search systems are available free of
charge to search for
images / pictures (artwork, photos, or both)
on the Internet.
• When searching for images, the search results from such
a system offer not only links to the image files on the
Internet, but also directly small versions of the images
(so-called “thumbnails”).
191
Examples
Finding images on the Internet:
examples of search engines (1)
• http://alltheweb.com/ !!
• http://gallery.yahoo.com/ !
• http://images.google.com/ !!!
or through http://www.google.com/
The largest database in this category (at least in 2002 and
2003). For each result, not only a thumbnail but also
the readable URL is shown directly;
this makes it easy to judge the relevance of the document.
192
Examples
Finding images on the Internet:
examples of search engines (2)
• http://multimedia.lycos.com/
• http://www.altavista.com/ !!
(also audio and video; in the user interface, choose
IMAGES instead of the normal text search.)
193
Examples
Finding images on the Internet:
examples of search engines (3)
• http://www.ask.com/ or
http://www.aj.com/ or
http://aj.com/
Ask Jeeves.
Offers no indication of the number of images retrieved,
which is a disadvantage when many pictures are found
but only a few can be seen at a time.
• http://www.ditto.com/ !
Examples
Finding images on the Internet:
screen shot of a Google image search
194
195
Examples
Finding images on the Internet:
directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Images/
Examples
Finding sounds and video on the
Internet: directories of search engines
A collection of links to suitable Internet search engines:
http://directory.google.com/Top/Computers/Internet/Searching/Search_Engines/Specialized/Multimedia/
196
197
Online access information:
future trends
• An increasing amount of information becomes available
online.
• A growing amount of this online information becomes
available free of charge.
• The quality of server and client software is improving.
A consequence is:
• An increasing number of end-users searching for
information online.
198
Online access information:
conclusion
• In the case of simple information needs, the WWW and the
search tools can work like “magic”.
• However, in the case of more complicated information
needs, there is still no “magic button” that brings you
immediately to all the required information.
199
Evaluating the quality
of information
Documentary information sources:
evaluating their quality
200
Documentary information sources:
evaluating their quality
• We should always be critical when using information
sources, in view of
»the widely varying degrees of quality of information
sources, and of
»the costs associated with searching, finding, using
information.
201
Documentary information sources:
evaluation criteria (1)
• Is the information valid, reliable, trustworthy, genuine,
authentic?
Is the author honest?
Is the source objective, not subjective, without cultural or
political or ideological or commercial bias?
Is the origin an individual or a company or an
organisation?
Is the publication sponsored by some company or
organisation?
202
Documentary information sources:
evaluation criteria (2)
• Is the information accurate, correct?
Who is the author or producer?
Does the source have an author or a producer with high
expertise, a good reputation, good qualifications?
Can the author be contacted for clarification or
discussion?
Was the information reviewed, edited, improved,
corrected, censored, approved, verified, before
publication?
Do experts agree on the information provided?
203
Documentary information sources:
evaluation criteria (3)
• Is the information source unique?
Does it offer a great amount of primary information,
which is not obtainable from other sources?
• Is the information complete?
Is the work available in its entirety?
• Does the source offer a wide coverage?
Is the source comprehensive, substantive?
• Is the information current enough, up to date?
Is a publication date provided?
Is an expiration date provided?
204
Documentary information sources:
evaluation criteria (4)
• Does the document provide suitable references, so that
you can verify statements and find older suitable
information sources?
• Good clear format and lay-out of the information /
User-friendly information system /
Easy for users to orientate themselves within the resource
and to find their way around it?
• Good user support / Good customer support?
• Is the type of distribution medium appropriate?
(print, e-mail, online,...)
205
Documentary information sources:
evaluation criteria (5)
• Is the information what you want?
If not, then reassess your needs and consider other types
of information as well.
206
Documentary information sources:
evaluation criteria (6)
• Is the information suitable for your level of
understanding of the subject?
Is the document popular, suitable for the general public,
for students, for professionals, for scholarly/academic
use…?
Does it report new, primary research (survey, experiment,
observation, measurement, invention) or is it a review of
sources published earlier?
• Does the information repeat or confirm what you already
know, or is it complementary, contradictory, new?
207
Evaluating the quality
of information
Computer-based information sources:
evaluating their quality
208
Computer-based information sources:
The Internet Detective
• A tutorial in English about how to assess the quality of
WWW-based information resources can be accessed
online free of charge through the WWW:
http://www.netskills.ac.uk/TonicNG/cgi/sesame?detective
• Also translated into Dutch:
http://www.kb.nl/coop/detective/
209
•These slides are available through the WWW from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
•References to publications about this subject
and more slides are available through the WWW from
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/courses/
210
Questions?
Suggestions for discussion?