The Search for quality - LIR HEAnet User Group for Libraries
The Search for Quality:
productive Web searching
John Cox
James Hardiman Library
NUI, Galway
The Problem
7.3 million new Web pages daily
Quality varies, mainly due to ease of publication and lack of checks
Quality is in the eye of the beholder
Over-dependence on general search engines
Simplistic use of search tools
Some Usage Findings
NUI, Galway Library survey, March 2000:
Search engines cited by 79 out of 167 respondents
Used exclusively for topics such as Nazism, defamation law, hepatitis C
Fewer than 50% satisfied
Other surveys show very simplistic use:
33% of users enter one word only
A further 33% of users enter two words only
UK survey indicates 80% of searchers waste some time
US survey shows “search rage” within 12 minutes
Key Question
“How much better than users are information staff at finding high-quality information on the Web and what leadership do we provide?”
5 key actions needed
5 Key Actions
Get the best from the search engines
Go vertical: subject-specific sources
Take time to experiment, eg helper software
Exploit the invisible Web
Actively promote quality searching
1: Get the Best from the Search Engines
Understand how they work
Know their limitations
Use advanced features
Search more than one
Know when not to use them
Search Engine Components
Crawler: follows links
Indexer: builds database
Query processor: lets us search
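The three components above can be illustrated with a minimal sketch. It assumes a hypothetical four-page "web" held in memory (the `PAGES` dictionary, its URLs and its text are invented for illustration) and is not how any real search engine is implemented.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("library search tips", ["a.example/guide"]),
    "a.example/guide": ("search engines index pages",
                        ["a.example/home", "a.example/faq"]),
    "a.example/faq": ("library opening hours", []),
    "a.example/orphan": ("unlinked library report", []),
}

def crawl(seed):
    """Crawler: follow links breadth-first from a seed URL."""
    seen, queue = [], deque([seed])
    while queue:
        url = queue.popleft()
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        queue.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Indexer: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Query processor: return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

crawled = crawl("a.example/home")
index = build_index(crawled)
print(sorted(search(index, "library")))
```

Because nothing links to `a.example/orphan`, the crawler never reaches it and a search for "library" cannot return it, even though the page contains the word.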
Common Limitations
Profit-oriented
Paid entries listed at top
Out of date
Partial site indexing
Technically must exclude many sites, eg
Password-protected
Registration needed
Database-driven
Hidden search facilities
Understanding Google
Strengths
Coverage
Cached pages
File types, eg PDF, .doc, .ppt
Relevance: link popularity
Beyond pages: images, newsgroups
Weaknesses
Poor Boolean support
No truncation
Limited date searching
Invisible search facilities
Two pages per site displayed by default
Google: coverage
Google: search modes
Basic
Advanced
Google: file types
Google: newsgroup search
Google: cached pages 1
Google: cached pages 2
Google: Boolean limitations 1
Correct syntax: medline OR embase
Google: Boolean limitations 2
Correct syntax: medline -embase (or use Advanced Search)
Google: no truncation
Use clinton (tax OR taxes OR taxation)
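The workarounds on these slides (uppercase OR, a minus sign to exclude a term, and a list of spelled-out variants in place of truncation) can be assembled programmatically. The sketch below is illustrative only: `or_group` and `build_query` are hypothetical helper names, and the output is simply a query string to paste into the search box.

```python
def or_group(variants):
    """Join word variants with uppercase OR; parenthesise if more than one."""
    variants = [v.strip() for v in variants if v.strip()]
    if not variants:
        raise ValueError("at least one variant is required")
    if len(variants) == 1:
        return variants[0]
    return "(" + " OR ".join(variants) + ")"

def build_query(required=(), any_of=(), excluded=()):
    """Build a query string from required terms, an OR group and exclusions."""
    parts = list(required)
    if any_of:
        parts.append(or_group(any_of))
    # A leading minus sign marks a term that must not appear.
    parts.extend("-" + term for term in excluded)
    return " ".join(parts)

# Spelled-out variants stand in for the unsupported truncation "tax*".
print(build_query(required=["clinton"], any_of=["tax", "taxes", "taxation"]))
print(build_query(required=["medline"], excluded=["embase"]))
```

The first call reproduces the slide's example, clinton (tax OR taxes OR taxation).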
Google: few date limits
Google: hidden features 1
Discovered at www.searchengineshowdown.com (buried in Google help)
Google: hidden features 2
Partial URL v Specific Site Search:
Not possible on Advanced Search despite “Domains” limit
Other Search Engines
Always worth searching more than one, eg
All the Web (FAST)
AltaVista
Lycos/HotBot
Northern Light (?)
Overlap may be limited
Different ranking criteria
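One way to see how limited the overlap is for a given query is to compare the result lists from two engines. The sketch below assumes the result URLs have already been collected by hand; the two lists are invented placeholders, not real search results.

```python
def overlap(results_a, results_b):
    """Return the shared URLs and the Jaccard overlap of two result lists."""
    a, b = set(results_a), set(results_b)
    shared = a & b
    union = a | b
    # Jaccard overlap: shared results divided by all distinct results.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard

# Hypothetical top results from two engines for the same query.
engine_one = ["x.example/1", "x.example/2", "x.example/3"]
engine_two = ["x.example/3", "x.example/4", "x.example/5"]

shared, score = overlap(engine_one, engine_two)
print(sorted(shared), round(score, 2))
```

A low score means the second engine contributed mostly new hits, which is the case for searching more than one.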
2. Go Vertical: specific tools
Type        Example(s)
Region      Doras, Yahoo Australia & NZ
Domain      SearchEdu.com
Genre       Newsindex
Discipline  EEVL, LawCrawler
Subject     Politicalinformation.com
Horses for Courses 1
Horses for Courses 2
Horses for Courses 3
3. Experimentation
Try out “add-on” search software, eg
BullsEye Pro
Copernic
Copernic Summariser
BullsEye Pro: searching
BullsEye Pro: Webliographies
Copernic
Copernic Summariser
4: Explore the “Invisible Web”
Material, often of high quality, that general
search engines can’t or won’t index
Unlinked pages
Non-HTML file types, eg audio, video, PDF
Authenticated sites
Databases
Much greater in size than visible Web
invisibleweb.com
invisible-web.net
WebData
Librarians’ Index to the Internet
5. Promote Quality Searching
Old sources
Old habits
New media
Old Sources
Old Habits
Search strategy formulation
Concept analysis
Flexibility
Patience
Critical appraisal of search hits
Critical source selection
New Media
Library Web Site
Enewsletter
Weblog
http://www.hw.ac.uk/libWWW/irn/irn.html
Towards a Brighter Future
Automatically-generated, accurate metadata
Smarter search engines
More quality-sensitive
More penetrative
XML: structured data
References
• Sherman, Chris and Price, Gary. The invisible Web: uncovering information sources search engines can't see. Medford, N.J.: Information Today, 2001. ISBN 091096551X. (Accompanying database at http://invisible-web.net)
• Search Engine Watch: http://www.searchenginewatch.com
• Search Engine Showdown: www.searchengineshowdown.com