2-Search Engines

Internet Research
Search Engines & Subject Directories
Search engines
• Search engines are the means by which most people search the Web.
• Common examples are Google, AltaVista, and Direct Hit.
Yet they don’t search the Internet
• Yet a search engine does not actually search the Web during your search.
• A search engine searches itself.
• It’s a three-step process.
1) Bots index words
• Search engines continually send out hundreds of “robots” or “bots” (also called “spiders” or “crawlers”).
• Bots visit web sites, read them word by word, and then index those words.
2) A database is created
• A database of Web sites is thus gathered and indexed by word.
• These databases can be huge, with millions of links.
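As a rough illustration of "indexed by word," here is a minimal Python sketch of an inverted index. The page URLs and text are made up, and real search engines use far more elaborate (and undisclosed) structures; this only shows the basic idea of mapping each word to the pages that contain it.

```python
import re
from collections import defaultdict

# Hypothetical pages a bot might have fetched (URL -> page text).
pages = {
    "http://example.com/a": "Search engines index words",
    "http://example.com/b": "Bots read pages word by word",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Split the text into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index(pages)
print(sorted(index["word"]))
```

Looking up a word is then a single dictionary access, which is why the engine never needs to revisit the pages themselves at search time.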
3) The Interface gives you access
• Using the keywords you give it, a search engine then searches its own current index.
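A keyword search against a stored index might look something like the following Python sketch. The index contents are invented, and the rule used here (a page must contain every keyword) is an assumption chosen for simplicity, not a description of any particular engine.

```python
# A tiny hand-built index (word -> set of site names); contents are made up.
index = {
    "search": {"a.example", "b.example"},
    "engine": {"a.example"},
    "directory": {"c.example"},
}

def search(index, query):
    """Return the sites whose indexed words include every keyword in the query."""
    keywords = query.lower().split()
    if not keywords:
        return set()
    # Start with the sites for the first keyword, then narrow with each further one.
    results = set(index.get(keywords[0], set()))
    for word in keywords[1:]:
        results &= index.get(word, set())
    return results

print(search(index, "search engine"))
```

Note that the function only consults the index; a page that was never indexed, or that changed after the last bot visit, cannot be found this way.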
Interfaces are based on rankings
• Search engines return results based on a ranking system.
• Ranking is the order in which files are listed when they are retrieved.
The ranking system is secret
• These systems are proprietary and often “secret.” In general:
• AltaVista ranks web pages higher if your search terms are found in the first few words of the page.
• Google ranks by document “popularity” with other similar searches.
• Direct Hit ranks by the length of time other users spent at the site.
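To make the idea of a ranking rule concrete, here is a toy Python sketch of the "search terms in the first few words" heuristic. It is not any engine's actual algorithm (those are proprietary); the function names, the `window` parameter, and the sample pages are all hypothetical.

```python
def early_term_score(text, keywords, window=10):
    """Count how many keywords occur within the first `window` words of the text."""
    first_words = text.lower().split()[:window]
    return sum(1 for k in keywords if k.lower() in first_words)

def rank(pages, keywords, window=10):
    """Order site names by descending score; ties keep their original order."""
    return sorted(
        pages,
        key=lambda url: early_term_score(pages[url], keywords, window),
        reverse=True,
    )

# Made-up pages (site name -> page text).
pages = {
    "late.example": "Welcome to our home page where we eventually talk about search engines",
    "early.example": "Search engines explained for beginners",
}

print(rank(pages, ["search", "engines"], window=3))
```

Both pages contain the keywords, but only one has them near the top, so it is listed first under this rule. A different rule would produce a different order from the same index.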
Not even half the Web
• Even with all of this software and sophistication, the best search engines cover only 40-50% of the Web.
• And they miss much else on the Internet.
Bots hit and miss
Bots miss:
• XML pages and PDF files
• Dynamically created HTML pages
• Frames-based pages
• New pages or recently updated text
• Some say the Invisible Web is 500 times larger than the visible Web.
Subject Directories
• A subject directory is also a database of web sites and references.
• But a subject directory is organized not by keywords but by category or subject.
Yahoo!
• Yahoo! is the most popular subject directory.
• www.about.com takes the idea a step further with subject guides for selected topics.
Subjects are organized by people.
• Information is selected, organized, and cataloged by a person, not software.
• You can usually be more confident that the search results will make sense.
You get an index of sites.
• Subject directories usually do not provide ranked web sites.
• Instead, you will get a broad index related to your topic, divided further by subheadings.
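The contrast with a keyword index can be sketched in Python as a nested structure of categories and subheadings. The categories, subheadings, and site names below are invented for illustration; an actual directory is compiled by human editors and is much larger.

```python
# A hypothetical hand-curated directory: category -> subheading -> sites.
directory = {
    "Science": {
        "Astronomy": ["planets.example", "telescopes.example"],
        "Biology": ["cells.example"],
    },
    "Arts": {
        "Music": ["jazz.example"],
    },
}

def browse(directory, category):
    """Return the subheadings and their sites for a category, or {} if absent."""
    return directory.get(category, {})

def outline(directory):
    """Render the directory as an indented index, in the curator's order."""
    lines = []
    for category, subheadings in directory.items():
        lines.append(category)
        for subheading, sites in subheadings.items():
            lines.append(f"  {subheading}")
            lines.extend(f"    {site}" for site in sites)
    return "\n".join(lines)

print(outline(directory))
```

There is no scoring step here: sites appear in the order and under the headings an editor chose, which is why the result is an index to browse instead of a ranked list.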
Use for early searching.
• Use a subject directory early in your search process to learn about your subject.
• You will get fewer links of higher quality.
• When you have more specific questions, you should use a search engine.