Neverending Search: Taming the Internet Search Engines

Neverending Search: What you really need to know about online searching and search tools!
Search, don’t surf!
It’s a trillion-page Web!
We already know how to use the Web!
Just because you live on the Web doesn’t mean you can’t learn how to use it more effectively and more powerfully!
Effective searching
• Brainstorming/Questioning/Planning
• Choosing the right type of search tool
• Staying up to date
• Understanding strategy/syntax
• Evaluating results!
Four tips: FSRE (for sure?)
• Focus: What is your mission or question?
• Strategize: Which search tools will you use?
Which keywords and search terms will you use
and how will you express them?
• Refine: How might you improve your search results?
• Evaluate: Which results will you visit? Which
sites or documents are worthy enough to use?
Did you do good work?
Good searchers also:
• Use peripheral vision—they mine their results
for additional search terms
• Consult several search tools
• Make use of advanced search screens
• Search the free Web and subscription
databases
• Use appropriate syntax (the language specific
to the search tool they are using)
• Use search strategies
• Modify or refine their searches (Searching is
recursive!)
• Web search engines can
locate every page on the
Web
• Search engines are the
only search tools on the
Web
• Webmasters can fool a
search engine into ranking
a page more highly in its
search results
Eileen Stec, Rutgers, ALA
Preconference 1/03
One Possible Scenario
• Pre-process search terms
– Brainstorm
• Use Boolean techniques
– Advanced search
• Phrase searching
• Other search “tricks”
• Use specialized search engines
• Plumb the Deep Web
• Use online subscription databases
Pre-process your search terms
Bernie Dodge
Step Zero: Seven Steps to Better Searching
http://edweb.sdsu.edu/webquest/searching/stepzero.html
Recognize the importance of
brainstorming and strategy
Research Question: How effective are drug abuse
prevention programs for young people?
Connect with “ANDs”
Concept 1    Concept 2       Concept 3    Concept 4
teen*        “drug abuse”    prevent*     effectiv*
or
adolesc*     marijuana       program*     success
or
child*       alcohol         treat*
How important AND is!!!
When do you really need OR?
OR is generally used for synonyms or related words.
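The AND/OR pattern can be sketched in a few lines of Python. This is only an illustration: the function name build_boolean_query is made up here, the grouping of the slide's terms into four concepts is our reading of the chart, and real search tools differ in whether they accept parentheses, uppercase operators, or the * truncation symbol.

```python
def build_boolean_query(concepts):
    """Join alternatives with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts:
        joined = " OR ".join(terms)
        # Parenthesize only when a concept has more than one alternative.
        groups.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(groups)


# Terms from the drug abuse prevention example, grouped by concept (assumed grouping).
concepts = [
    ["teen*", "adolesc*", "child*"],
    ['"drug abuse"', "marijuana", "alcohol"],
    ["prevent*", "program*", "treat*"],
    ["effectiv*", "success"],
]

query = build_boolean_query(concepts)
print(query)
```

Each OR widens one concept with synonyms or related words, while each AND requires another concept to be present, which is why adding ANDs shrinks the result list so quickly.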
Use NOT as a refinement
technique when problem words
are likely to come up
eagles NOT Philadelphia
“Martin Luther” NOT King
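One way to picture the three operators is as set operations on the pages that contain each word. The sketch below uses invented page titles and a naive substring match, so it shows the logic only, not how any particular search engine works.

```python
# Made-up page titles, purely for illustration.
pages = {
    1: "Bald eagles nesting in Alaska",
    2: "Philadelphia Eagles win the playoff game",
    3: "Golden eagles and their habitat",
    4: "Philadelphia museum guide",
}


def matching(term):
    """Return the ids of pages whose title contains the term (case-insensitive)."""
    return {pid for pid, title in pages.items() if term.lower() in title.lower()}


eagles = matching("eagles")
philadelphia = matching("Philadelphia")

and_result = eagles & philadelphia  # AND narrows: both words required
or_result = eagles | philadelphia   # OR broadens: either word is enough
not_result = eagles - philadelphia  # NOT removes pages with the problem word

print(sorted(and_result), sorted(or_result), sorted(not_result))
```

Here eagles NOT Philadelphia keeps only the bird pages, which is the refinement the slide describes.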
Rockwell Schrock’s Boolean Machine
http://kathyschrock.net/rbs3k/boolean/
Let’s play Boolean Aerobics!
• Stand up if you have brown hair AND
brown eyes
• Remain standing if you have brown hair
AND brown eyes AND are wearing
glasses
• Remain standing if you have brown hair
AND brown eyes AND are wearing
glasses AND are wearing something
blue
“Phrase searching”
• One of your best searching tools!
• Use only for legitimate phrases, names, titles
• “vitamin A”
• “John Quincy Adams”
• Titles “An Officer and a Gentleman”
• Phrase searching is sometimes overused:
Remember: not every group of words is a
phrase
• Sometimes “ANDing” or “NEARing” are better
strategies
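The difference between phrase searching, ANDing, and NEARing can be sketched as three small matching functions. The function names, the sample sentences, and the default window of 10 words are assumptions made for this illustration; each search tool defines NEAR in its own way.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def and_match(text, terms):
    """True when every term appears somewhere in the text."""
    tokens = set(words(text))
    return all(term.lower() in tokens for term in terms)


def phrase_match(text, phrase):
    """True when the words of the phrase appear together, in order."""
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def near_match(text, a, b, window=10):
    """True when a and b occur within `window` words of each other."""
    tokens = words(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == a.lower()]
    pos_b = [i for i, tok in enumerate(tokens) if tok == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


text_a = "John Quincy Adams was the sixth president."
text_b = "In Quincy, John met a man named Adams."

print(phrase_match(text_a, "John Quincy Adams"))
print(and_match(text_b, ["John", "Quincy", "Adams"]))
print(phrase_match(text_b, "John Quincy Adams"))
```

The second sentence contains all three words, so an AND search retrieves it, but a phrase search correctly skips it because the words do not appear together as a name.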
Advanced Search Screens
• Google
• All the Web
• AltaVista
• HotBot
Tricks for advanced searchers
seeking a needle in a haystack
• Word stemming and wildcards:
– wom*n
– lesson* NEAR plan*
• Search within
– Google, AlltheWeb
– Also use “find” to search within a page full of text!
• Field Searching
– Search for keywords in titles, subject tags, file formats rather
than just words anywhere in the text
• Search Engine Features Chart
http://searchenginewatch.com/facts/ataglance.html
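To see what a wildcard such as wom*n or lesson* would pick up, the pattern can be translated into a regular expression. This is a sketch under one assumption: that * stands for zero or more letters inside a single word. Search tools that support wildcards may define the symbol differently, and the helper names and sample sentence are invented for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a search wildcard such as wom*n into a whole-word regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def find_matches(pattern, text):
    """Return every word in the text that the wildcard pattern matches."""
    return wildcard_to_regex(pattern).findall(text)


sample = "Women and a woman planned lessons; the lesson plan worked."

print(find_matches("wom*n", sample))
print(find_matches("lesson*", sample))
print(find_matches("plan*", sample))
```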
Field searching is usually easier in the
Advanced Search area
• title:
• Link check (Google, AltaVista): helps in
evaluating sites!
– link:mciu.org/~spjvweb
• Media or filetype:pdf or ppt (Google): great
for finding documents, papers, and
presentations!
• domain:
– domain:jp +edu
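Field operators are just prefixes attached to a value, so a query that uses them can be assembled mechanically. The helper below, field_query, is a hypothetical name for this sketch. The operator names come from the slide; which ones a given search tool accepts, and how it expects values to be quoted, is an assumption to check on that tool's advanced search or help page.

```python
def field_query(keywords="", **fields):
    """Build a query string: plain keywords followed by field:value operators."""
    parts = [keywords] if keywords else []
    for name, value in fields.items():
        parts.append(f"{name}:{value}")
    return " ".join(parts)


pdf_search = field_query('"drug abuse" prevention', filetype="pdf")
link_check = field_query(link="mciu.org/~spjvweb")
title_search = field_query(title='"Civil War"')

print(pdf_search)
print(link_check)
print(title_search)
```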
Just as I wouldn’t ask my
contractor friend to prepare my
will, I wouldn’t ask my lawyer
friend to build my new kitchen.
Search tools have
specialties too.
A hammer won’t do it all . . .
“People who are only good
with hammers see every
problem as a nail.”
Abraham Maslow,
psychologist
Choosing the right search
tool is an important strategy!
search engines
annotated/rated directories
subject directories
subject guides/gateways
meta-search tools
specialized directories
A field guide to the search tools
Search engines
Databases of billions of Web pages, gathered by automated “robots,”
allowing broad, often overwhelming searches. Search engines vary in the
ways they collect sites and organize results
Metasearch Engines
Search across a variety of search tools and organize the collected results.
Good for a broad, sweeping search.
Subject directories
Links to resources arranged in subject hierarchies, encouraging users to
both browse through, and often search for, results. Subject directories are
often annotated. They are selected, evaluated, and maintained by humans.
Specific Subject Guides or Gateways
The work of subject specialists, subject gateways usually offer
carefully selected and annotated links.
Specialized search engines
Search engines that focus their searching in a particular area of
knowledge or interest.
Subscription Databases
Pay services, often provided by states or libraries,
offering premium content in the form of reference materials, journal and
newspaper articles, broadcast transcripts, etc.
Subject directories:
When to use them
• When you are just starting out, or have a broad
topic or one major keyword or phrase (example:
“Civil War”)
• When you want to get to the best sites on a topic
quickly
• When you value annotations and assigned subject
headings which may help retrieve more relevant
material
• When you want to avoid viewing the many noise
documents picked up by search engines
Two Essential Directories
Librarians’ Index to the Internet
http://lii.org
Well-organized, selective, and continually
updated collection, also known as “the
thinking person’s Yahoo.” Maintained by a
team of librarians at Berkeley Public Library
Kids Click
http://kidsclick.org/
Great starting point for kids. Annotations are
carefully written. Offers grade levels and
describes how illustrated a site is.
Subject directories to count on
INFOMINE: Scholarly Internet Resource Collections
http://infomine.ucr.edu/
A large collection of scholarly Internet resources
About.com http://www.about.com
Offers a surprising number of guide pages, maintained by paid
experts. Not scholarly but very handy for everyday, practical topics
Academic Info: Your Gateway to Quality Educational Resources
http://www.academicinfo.net/
Great for high school and college research
BUBL Link http://bubl.ac.uk/link/
This UK project leads to carefully selected and annotated resources
WWW Virtual Library http://www.vlib.org/
The first subject directory on the Web. Features comprehensive,
well-annotated subject collections maintained by experts around
the world
Subject directories—Popular
• Google Directory http://directory.google.com/
• Yahoo! Directory http://dir.yahoo.com
Both Yahoo! and Google offer popular directories. They are not very selective,
but they offer some wonderful subject collections.
Examples:
Yahoo! Full Coverage
http://fullcoverage.yahoo.com/fc/
Google Social Issues
http://directory.google.com/Top/Society/Issues/
Search Engines:
When to use them
• When you have a narrow topic or several keywords
• When you are looking for a specific site
• When you want to do a comprehensive search and
retrieve a large number of documents on your topic
• When you want to make use of the features in an
advanced search screen or search for particular
types of documents, file types, source locations,
languages, date last modified, etc.
• When you want to take advantage of newer retrieval
technologies, such as concept clustering, ranking by
popularity, link ranking, etc.
Search engines are powerful but they
have limitations!
• They do not crawl the web in “real time”
• If a site is not linked or submitted it may not
be accessible
• Not every page of a site is always searchable
• Few search engines truly search the full text
of Web pages
• Special tools needed for the Invisible/Deep
Web
• Paid placement/sponsored results distract
from real results
When using a search engine
Your goal is to get the best
stuff to appear on the
first two or three pages.
Relevance rocks!
Search engines determine relevance
in different ways.
Second Gen Search Tools
Approach relevance in helpful ways:
• Google ranks by link popularity
• Teoma ranks by subject-specific popularity
• Vivisimo offers concept-clustered results
• SurfWax uses human-generated indexes (Focus
Words and summaries)
• Ixquick Metasearch uses the ranking schemes (top
ten lists) of other search tools
U. Albany Laura Cohen
http://library.albany.edu/internet/second.html
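Ranking by link popularity can be sketched as counting how many other pages point to each result. This is a deliberately simplified illustration with an invented link graph and placeholder page names. Real link analysis, including Google's, also weighs how important the linking pages themselves are, which this plain count does not attempt.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.example": ["c.example", "d.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": ["c.example"],
}


def inbound_counts(link_graph):
    """Count how many distinct pages link to each target page."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def rank_by_links(candidates, link_graph):
    """Order candidate pages by inbound link count, most-linked first."""
    counts = inbound_counts(link_graph)
    return sorted(candidates, key=lambda page: (-counts[page], page))


ranking = rank_by_links(["a.example", "c.example", "d.example"], links)
print(ranking)
```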
Some search tools present
results horizontally, not in
long lists!
• Query Server (metasearch)
http://www.queryserver.com/web.htm
• Vivisimo (metasearch)
http://vivisimo.com
Your Goal as a Searcher:
“Upping” the best results
Traditional
• Text relevance
Second generation
• Link analysis
• Popularity
• Thesauri
• Visualization/Mapping
• “More like this”
• Concept clustering/
Autocategorization
Trends to look for
• SurfWax and Ask.com use indexes or thesauri. The
burden of coming up with precise or extensive
terminology shifts from the searcher to the engine.
• Google and WiseNut rank results based on the
behavior of millions of Web users.
• Vivisimo and WiseNut use concept clustering/autocategorization/horizontal display
• KartOO maps results visually
• Ixquick Metasearch compiles “top ten” lists of the
major engines
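The idea of compiling "top ten" lists from several engines can be sketched as a simple vote: a result listed by more engines rises to the top, with ties broken by the best position it reached. This is an assumed scheme for illustration only, not a description of how Ixquick or any other metasearch tool actually scores results, and the engine names and URLs are placeholders.

```python
from collections import defaultdict


def merge_top_lists(results_by_engine):
    """Rank URLs by how many engines list them, then by best position."""
    votes = defaultdict(int)
    best_rank = {}
    for ranked_urls in results_by_engine.values():
        for position, url in enumerate(ranked_urls, start=1):
            votes[url] += 1
            best_rank[url] = min(position, best_rank.get(url, position))
    return sorted(votes, key=lambda url: (-votes[url], best_rank[url], url))


# Placeholder engine names and URLs, not real search results.
results = {
    "engine_one": ["x.example", "y.example", "z.example"],
    "engine_two": ["y.example", "x.example"],
    "engine_three": ["y.example", "w.example"],
}

merged = merge_top_lists(results)
print(merged)
```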
Specialized Search Tools
• Scirus (science search)
• Search.edu (searches only edu domains)
• Biography Center (profile aggregator)
• SearchEric.org (education)
http://searcheric.org
• SOSIG (Social Sciences)
http://www.sosig.ac.uk/
• HUMBUL (Humanities)
http://www.humbul.ac.uk/
Invisible/Deep/Hidden Web
• The Web’s largest growing resource
• Estimated to be 40 times the size of the
visible Web
• Most not subject to fees
• Includes topic-specific databases
Why is some of the Web invisible? I
• The material is on the Web but it is a
proprietary database
• The material is on the Web but is in a free
database
• Content appears past the page size reach of
the crawler
• The crawler does not search a particular file
format or non-text interface
• The page is available only after registration
• The page is available through some engines but not
others. No two engines are the same
Why is some of the Web invisible? II
• Time lag exists between posting, crawling,
and searching (Spiders do not crawl in real
time). Site may have been unavailable during
the last crawl
• Firewall prevents access
• Page must be accessed or searched in a
special way
• Page is not linked to from any other page
• Page was not submitted to the search engine
you are using
Tools for seeing the Invisible Web
• Invisible Web Directory
http://invisible-web.net/
• Complete Planet
http://completeplanet.com
• Librarians’ Index to the Internet
http://lii.org
• Pinakes
http://www.hw.ac.uk/libWWW/irn/pinakes/pinakes.html
• OAIster
http://oaister.umdl.umich.edu
Examples of Free Databases
• Find Articles
http://www.findarticles.com
• MagPortal
http://magportal.com
• ERIC
http://searcheric.org
• American Memory Collection Finder
http://lcweb2.loc.gov/ammem/collections/finder.html
• NARA
http://www.archives.gov/search/index.html
• Perry-Castañeda Library Map Collection
http://www.lib.utexas.edu/maps/index.html
The free Web is
not enough!
What’s not on the free Web?
• Copyrighted fiction and nonfiction:
biographies
• High quality reference: including
literary criticism, science biography
• Full, searchable archives of
journals, magazines, newspapers
• Most of our OPAC
Bessie Chin Library Subscription Databases
EBSCO databases (Magazines & Newspapers, Health, History, Literature and
Science resources, and Columbia Encyclopedia). Teacher journals, the ERIC
database, and Advanced Placement Search are accessed by clicking on Teacher
Resources.
Britannica Online
Columbia International Affairs Online
CountryWatch
CQ Researcher
InfoTrac (Opposing Viewpoints, Gale Virtual Reference Library)
NoveList
Oxford Reference Online
Questia
World Book Online
Public Library Databases
• Marin County Free Library
databases
• MARINet databases
Don’t forget to use online
encyclopedias and databases as
subject directories!
They select great links!
Tools to help you make search engine choices:
• Debbie Abilock’s Choosing the Best Search
For Your Purpose
http://www.noodletools.com/debbie/literacies/information/5locate/adviceengine.html
• How to Choose a Search Engine or Directory
(U. Albany)
http://library.albany.edu/internet/choose.html
For more information (and for people who love
searching):
Search Engine Watch
http://searchenginewatch.com
Okay, what did we learn?
Let’s review