The Search for quality - LIR HEAnet User Group for Libraries

The Search for Quality:
productive Web searching
John Cox
James Hardiman Library
NUI, Galway
The Problem
7.3 million new Web pages daily
Quality varies, mainly due to ease of publication and lack of checks
Quality is in the eye of the beholder
Over-dependence on general search engines
Simplistic use of search tools
Some Usage Findings
NUI, Galway Library survey, March 2000:
Search engines cited by 79 out of 167 respondents
Exclusively used for, eg Nazism, defamation law, hepatitis C
Less than 50% satisfied
Other surveys show very simplistic use:
33% of users enter one word only
A further 33% of users enter two words only
UK survey indicates 80% of searchers waste some time
US survey shows “search rage” within 12 minutes
Key Question
“How much better than users are information staff at finding high-quality information on the Web and what leadership do we provide?”
5 key actions needed
5 Key Actions
Get the best from the search engines
Go vertical: subject-specific sources
Take time to experiment, eg helper software
Exploit the invisible Web
Actively promote quality searching
1. Get the Best from the Search Engines
Understand how they work
Know their limitations
Use advanced features
Search more than one
Know when not to use them
Search Engine Components
Crawler: follows links
Indexer: builds database
Query processor: lets us search
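The three components above can be illustrated with a toy example. This is a minimal sketch, not how any real search engine is built: the `PAGES` dictionary, the example URLs and the function names are all assumptions made for illustration, and the "Web" here is just a small in-memory table.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("library search tips", ["a.example/guides", "b.example/news"]),
    "a.example/guides": ("search engines and databases", ["a.example/home"]),
    "b.example/news": ("library news", []),
    "c.example/orphan": ("unlinked database report", []),  # nothing links here
}

def crawl(seed):
    """Crawler: follow links breadth-first from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Indexer: map each word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Query processor: return the URLs that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

crawled = crawl("a.example/home")
index = build_index(crawled)
print(sorted(search(index, "library")))
```

In this sketch the page that nothing links to is never crawled, so a search for a word that appears only on it returns nothing. The word matching is also exact, so a query for "database" does not find a page that says "databases".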
Common Limitations
Profit-oriented
Paid entries listed at top
Out of date
Partial site indexing
Technically must exclude many sites, eg
Password-protected
Registration needed
Database-driven
Hidden search facilities
Understanding Google
Strengths
Coverage
Cached pages
File types, eg PDF, .doc, .ppt
Relevance: link popularity
Beyond pages: images, newsgroups
Weaknesses
Poor Boolean support
No truncation
Limited date searching
Invisible search facilities
Two pages per site displayed by default
Google: coverage
Google: search modes
Basic
Advanced
Google: file types
Google: newsgroup search
Google: cached pages 1
Google: cached pages 2
Google: Boolean limitations 1
Correct syntax: medline OR embase
Google: Boolean limitations 2
Correct syntax: medline -embase (or use Advanced Search)
Google: no truncation
Use clinton (tax OR taxes OR taxation)
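The query syntax shown on the slides above (upper-case OR, a minus sign to exclude a term, and a bracketed list of word variants in place of truncation) can be assembled programmatically. This is a minimal sketch: the helper names are hypothetical, and the syntax is taken from the slides as given, so it may not match how any search engine behaves today.

```python
def any_of(*terms):
    """Join alternatives with an upper-case OR, as in 'medline OR embase'."""
    return " OR ".join(terms)

def expand(*variants):
    """Stand in for truncation by listing word variants in a bracketed OR group."""
    return "(" + any_of(*variants) + ")"

def exclude(term):
    """Prefix a term with a minus sign, as in 'medline -embase'."""
    return "-" + term

def build_query(*parts):
    """Join the parts of a query with single spaces."""
    return " ".join(parts)

print(any_of("medline", "embase"))
print(build_query("medline", exclude("embase")))
print(build_query("clinton", expand("tax", "taxes", "taxation")))
```

Listing the variants by hand is tedious, but it makes explicit which word forms are actually being searched.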
Google: few date limits
Google: hidden features 1
Discovered at www.searchengineshowdown.com (buried in Google help)
Google: hidden features 2
Partial URL v Specific Site Search:
Not possible on Advanced Search despite “Domains” limit
Other Search Engines
Always worth searching more than one, eg
All the Web (FAST)
AltaVista
Lycos/HotBot
Northern Light (?)
Overlap may be limited
Different ranking criteria
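One way to see why searching more than one engine pays off is to measure how little two result lists share. The sketch below uses invented result lists (`engine_a`, `engine_b`) and hypothetical function names. It computes the overlap as shared results divided by all distinct results, and merges the lists without duplicates.

```python
def overlap(results_a, results_b):
    """Share of distinct results that appear in both lists (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def merge(*result_lists):
    """Combine result lists in first-seen order, dropping duplicates."""
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Invented example data: each list stands for one engine's top hits.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u5", "u6", "u1"]

print(round(overlap(engine_a, engine_b), 2))
print(merge(engine_a, engine_b))
```

With these invented lists only two of the six distinct results are returned by both engines, so relying on either engine alone would miss a third of the available hits.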
2. Go Vertical: specific tools
Type: Example(s)
Region: Doras, Yahoo Australia & NZ
Domain: SearchEdu.com
Genre: Newsindex
Discipline: EEVL, LawCrawler
Subject: Politicalinformation.com
Horses for Courses 1
Horses for Courses 2
Horses for Courses 3
3. Experimentation
Try out “add-on” search software, eg
BullsEye Pro
Copernic
Copernic Summariser
BullsEye Pro: searching
BullsEye Pro: Webliographies
Copernic
Copernic Summariser
4. Explore the “Invisible Web”
Material, often of high quality, that general
search engines can’t or won’t index
Unlinked pages
Non-HTML file types, eg audio, video, PDF
Authenticated sites
Databases
Much greater in size than visible Web
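The four categories above can be read as a simple checklist. The sketch below is an illustration only: the `Resource` record, its fields and the function name are assumptions, and real engines differ in what they can index (some handle certain non-HTML file types, for example).

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """Hypothetical description of a Web resource."""
    url: str
    has_inbound_links: bool = True
    content_type: str = "text/html"
    needs_login: bool = False
    database_driven: bool = False

def invisibility_reasons(resource):
    """List which of the four categories apply to a resource."""
    reasons = []
    if not resource.has_inbound_links:
        reasons.append("unlinked page")
    if resource.content_type != "text/html":
        reasons.append("non-HTML file type")
    if resource.needs_login:
        reasons.append("authenticated site")
    if resource.database_driven:
        reasons.append("database")
    return reasons

report = Resource("a.example/report.pdf", content_type="application/pdf", needs_login=True)
print(invisibility_reasons(report))
```

An empty list means none of the four categories applies. It does not guarantee that a general search engine has indexed the resource.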
invisibleweb.com
invisible-web.net
WebData
Librarians’ Index to the Internet
5. Promote Quality Searching
Old sources
Old habits
New media
Old Sources
Old Habits
Search strategy formulation
Concept analysis
Flexibility
Patience
Critical appraisal of search hits
Critical source selection
New Media
Library Web site
Enewsletter
Weblog
http://www.hw.ac.uk/libWWW/irn/irn.html
Towards a Brighter Future
Automatically-generated, accurate metadata
Smarter search engines
More quality-sensitive
More penetrative
XML: structured data
References
• Sherman, Chris and Price, Gary. The invisible Web: uncovering information sources search engines can't see. Medford, N.J.: Information Today, 2001. ISBN 091096551X. (Accompanying database at http://invisible-web.net)
• Search Engine Watch: http://www.searchenginewatch.com
• Search Engine Showdown: http://www.searchengineshowdown.com