Searching the web - Computer and Information Science


Searching the web
MG 25th March '02
TABLE OF CONTENTS
• Description and explanation of search engines on the web.
• Table 1: Engine Comparison Chart - describes methods that are
beneficial for successful retrieval of information from the web.
• Table 2: Methods various search engines use to gather and
format information for their databases.
• Description of some of the more popular search engines on the
web today.
• Guidelines for the best search engine for your information needs.
• Search engines have evolved into a dynamic and powerful
means of gathering, sorting and selecting a wide range of
information that would not otherwise be accessible to the
general public. Today there are hundreds of search engines, also
called ‘web location services’, vying for a competitive edge in
the market. These search engines have made the gathering and
retrieval of information very simple for the user. Everyone who
has had contact with the Internet has encountered, used or
heard about a search engine. Some claim that search engines
have put a great amount of power in the hands of the user free
of cost, but others question whether all this abundant information
actually comes with a price.
• The creation of the search engine can be credited to Alan
Emtage, a graduate student, who wrote a program that automatically
searched for keywords appearing in archives served over the File
Transfer Protocol (FTP, the protocol created to transfer files
across the net). He later released this program, which he called
‘Archie’, in 1989. Other search tools such as Veronica and Wide
Area Information Servers (WAIS) were created later. These search
tools basically took general keywords describing the topic you
entered, along with the sources where the tools should look. They
then found the best matches by looking into the index and counting
how many times each item contained the selected keywords.
Unfortunately, Emtage did not file a patent application, and
an Anchover, Mass. company has since claimed patent rights
and is disputing patent infringement with other
search tool companies. The battle continues, but that does not
prevent scores of new search engines from being created
every day.
• Today we are bombarded with hundreds of search tools
crawling the Internet. To access information on any
search engine, you must type in a keyword. The search
engine then takes the keyword(s) and searches for documents that
contain them. The number of times the keyword(s) occur
in each document determines which
documents are chosen. A search engine usually displays the
results page by page. There are usually thousands of documents
on the Web that seem to relate to an inquiry. As it is
impossible to look through so many documents, it is best to
confine your inquiry to a particular topic in order to achieve
satisfactory results. To find what you need from a
search engine, it is best to analyze your information needs,
create queries and select appropriate search strategies.
However specific your topic, it is best to
have a distinct idea of what you are searching for.
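The keyword matching described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular engine works: the function names, the three sample pages and the scoring rule (a plain count of keyword occurrences) are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return (name, score) pairs sorted by how often the query keywords occur.

    `documents` maps a hypothetical document name to its text. Documents
    that contain none of the keywords are left out of the results.
    """
    keywords = set(tokenize(query))
    results = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in keywords)
        if score > 0:
            results.append((name, score))
    # Highest keyword count first; ties are broken alphabetically by name.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# A hypothetical three-page "web", used only for illustration.
pages = {
    "a.html": "Search engines index the web. Search tools help users.",
    "b.html": "Cooking recipes for the whole family.",
    "c.html": "A web search engine ranks web pages.",
}
ranking = rank_documents("web search", pages)
print(ranking)
```

Here the page about cooking is dropped because it contains neither keyword, which is why a focused query returns a shorter and more useful list.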
– There are tricks and pitfalls in search engines that a
user should be aware of. Due to the increasingly
competitive nature of the World Wide Web, site
owners have taken drastic steps to ensure that their
sites are seen by as many people as possible. Usually
corporations and businesses try to attract potential
customers by using tricks to guarantee that their sites
are seen. A key example of this type of practice is
“keyword spamming”. Keyword spamming is the repeated
use of keywords in a document to ensure that
the site is placed at the top of the hit list. Another
method is the use of hidden text. For example, black text
is hidden against a black background in a web page, where
visitors cannot see it, to push the page to the top of the
list. Also, some site
owners simply pay the search engines to place
their sites at the top. The abuse of keywords and
other aggressive means are only a few of the
techniques used to ensure popularity.
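One rough way to spot the keyword spamming described above is to measure how much of a page is taken up by a single keyword. The sketch below is an assumption-laden illustration: the function names and the 0.3 threshold are hypothetical, and real search engines do not publish how they detect spam.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return the fraction of words in `text` that equal `keyword`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def looks_like_keyword_spam(text, keyword, threshold=0.3):
    """Flag text whose keyword density exceeds an assumed threshold."""
    return keyword_density(text, keyword) > threshold


# Two hypothetical page texts, used only for illustration.
honest = "We sell used cars and offer repairs for every make and model."
stuffed = "cars cars cheap cars best cars buy cars cars cars"

print(looks_like_keyword_spam(honest, "cars"))
print(looks_like_keyword_spam(stuffed, "cars"))
```

A check like this only catches the crudest repetition; hidden text and paid placement would need different tests.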
– These questionable practices are employed because they
bring in hits. This does not guarantee, however, that
these sites are indeed the best ones. To benefit
effectively from the information on the web, a user needs to
know what to look for and what to discard. Search engines are vast
in number, and it is best to use well-thought-out keywords and
prior knowledge of exactly where to search for the information. The
primary purpose of most search engines is to get as many visitors to
the sites listed as possible. Many web sites are caught up in the
grand idea of getting as many hits as possible and fail to
realize that in order to maintain the best possible quality of
service they must provide quality information.
– Bruceclay.com provided excellent ideas on how to improve and
design your site to best suit the general needs of the user.
One suggestion was to use a ‘follow the leader’ method when
selecting keywords and page wording. Meta tags in the
source code of a site declare its keywords to search engines,
and the right keywords are the backbone of a site's
popularity. Knowing the keywords of competitors,
and improving and maintaining quality information, is the next
step to maintaining a popular ranking on a search engine. This
is by no means similar to ‘keyword spamming’, as proper
selection of keywords will enable a site to maintain popularity.
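To make the role of meta tags concrete, the following sketch reads the keywords meta tag from a page using Python's standard `html.parser` module. The class name, helper function and sample page are hypothetical, and the code assumes the common `<meta name="keywords" content="...">` form with comma-separated keywords.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of <meta name="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_keywords(html):
    """Return the comma-separated keywords declared in the page, if any."""
    parser = MetaTagParser()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    return [word.strip() for word in raw.split(",") if word.strip()]


# A hypothetical page, used only for illustration.
sample_page = """<html><head>
<title>Example Garden Shop</title>
<meta name="keywords" content="garden tools, seeds, planters">
<meta name="description" content="A hypothetical shop page.">
</head><body>Welcome.</body></html>"""

keywords = extract_keywords(sample_page)
print(keywords)
```

Running the same function over a competitor's page is one simple way to see which keywords that site has chosen.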
• There are different types of web resources that can help you find
the answers to your questions:
• Subject tree: A hierarchically organized set of topics
with lists of web sites and online documents relevant to each
topic. Also called a directory.
• Clearinghouse: A collection of web sites and online documents
about a specific topic. Clearinghouses are similar to subject
trees but on a larger scale.
• General search engine: Indexes a large collection of web pages
that users retrieve by entering keywords. General search engines
rely heavily on web spiders to do most of the sorting and
gathering. These databases are huge, and sometimes the relevant
information may be hidden deep in the list. Providing specific
and relevant keywords in your search is ideal for retrieving
information.
• Specialized search engine: Similar to a general search engine
but limited to specific web pages. It takes the concept of the
clearinghouse but does more than just provide links to the
documents: it provides the actual documents. These are
handpicked by a person who has selected them as relevant to
the topic.
– The best search engines offer a simple query option
where you type full sentences or questions that describe
your information needs. The engine with the most pages
in its database is not necessarily the best search engine.
The chart attached indicates the best search engines to
date, even though this is always subject to change.
There are many types of search engines, ranging from
general information to specialized information. The
compilation of databases of documents and the indexing of
these documents provide users with on-the-spot results if
queries are successful.
– Again, there are hundreds of search engines that specialize in
different levels of searches. By using web spiders, search engines
create and update their document databases automatically. A web
spider is a computer program that searches for Web pages and
collects, updates and replaces old pages or finds new web
pages. This program keeps a list of all the Uniform Resource
Locators (URLs) and returns all the information to the search engine.
There are numerous techniques used to index these documents, and
large search engines constantly run web spiders to index as much
of the Web as possible. Much of what is on the
Internet is legitimate, but quite a bit of the information found is
also very unreliable. Therefore, it is best to be critical of the
information received. The bottom line, however, is that in order to
get satisfaction from a search engine, users must be aware of the
nature of their query.
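The spider behaviour described above (follow links, keep a list of URLs, hand the pages back for indexing) can be sketched as a breadth-first crawl. To keep the example self-contained it does not touch the network: `FAKE_WEB` is a hypothetical in-memory stand-in for real pages, and the function and class names are illustrative rather than taken from any actual search engine.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns {url: html} for every page reached.

    `fetch` is any function that maps a URL to its HTML, or to None
    when the page cannot be retrieved.
    """
    seen = {start_url}          # the spider's list of known URLs
    queue = deque([start_url])
    index = {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue            # broken link: nothing to index
        index[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


# Hypothetical in-memory "web"; a real spider would fetch over HTTP.
FAKE_WEB = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.test/b.html": "no links here",
    "http://example.test/c.html": '<a href="/missing.html">broken</a>',
}

index = crawl("http://example.test/", FAKE_WEB.get)
print(list(index))
```

The `seen` set is what stops the spider from revisiting the home page when another page links back to it.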
• http://www.submitcorner.com
• http://webreference.com/search/background.html
• http://ariade.ac.uk/issue10/search engines
• http://searchability.com/about.htm
Crawling
Yes
No
Deep Crawl
All but...
Excite
Frames Support
Image Maps
robots.txt
Meta Robots Tag
Link Popularity Helps Deep
Crawl
Learns Frequency
Paid Inclusion
All but...
Excite, FAST
AltaVista, Excite, FAST, Google,
NLight
Inktomi
All
n/a
All but
Excite
n/a
All
n/a
AltaVista, Excite, FAST, Google,
Inktomi
NLight
AltaVista,
Inktomi,
FAST
(coming
9/01)
Excite, Google
Notes
Indexing
Yes
No
Notes
Some stop words
may not be
Full Body Text
All
n/a
indexed
AltaVist
a,
Excite,
Inktomi
,
Stop Words
Google
FAST, NLight
All
Google,
Meta Description
but...
NLight
All
Excite, FAST,
Meta Keywords
but...
Google, NLight
AltaVist
a,
Excite, FAST,
ALT text
Google
Inktomi, NLight
Comments
Inktomi
Stemming
Ranking
Others
-- See Search Features Chart --
Yes
No
Notes
AltaVista,
Excite, FAST,
Meta Tags
Boost Ranking
Google,
Inktomi
NLight
Link Popularity
Boosts Ranking
Very important
All
n/a
Boost Ranking
HotBot
Others
Spam
Yes
No
AltaVist
Google, Inktomi,
Meta Refresh
a
NLight
Invisible Text
Excite, FAST
Others
AltaVist
a,
Excite, FAST,
Tiny Text
Inktomi
NLight
at Google
Direct Hit
Excite, FAST,
Notes
Web Search Engine Comparison Chart
• AltaVista
– Connector terms (Boolean): Yes. Can use AND, OR, AND NOT, and
(...) in Advanced Search. Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - in the Simple Search. Not
available in Advanced Search.
– Proximity searching: Yes. Use NEAR to specify that terms be
within 10 words of each other.
– Truncation or wildcards: Yes. Use the * for truncation or as a
wildcard.
• Excite
– Connector terms (Boolean): Yes. Can use AND, OR, AND NOT, and
(...). Must be in ALL CAPS. Default connector is OR.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.
• Google
– Connector terms (Boolean): Yes. Can use AND. Default connector
is AND.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.
• HotBot
– Connector terms (Boolean): Yes. Specify Boolean Phrase in the
drop-down box. Can use AND, OR, NOT, and (...). Default connector
is AND.
– Phrase searching: Yes. Put phrase in quotation marks or specify
in the drop-down box.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Yes. Use the * for truncation or as a
wildcard.
• InfoSeek
– Connector terms (Boolean): Use of connector terms not available.
Default connector is OR.
– Phrase searching: Yes. Put phrase in quotation marks.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Not available.
– Truncation or wildcards: Not available.
• Lycos Pro
– Connector terms (Boolean): Yes. Can use AND, OR, NOT, and (...).
Can specify AND, OR in the drop-down box. Default connector is AND.
– Phrase searching: Yes. Put phrase in quotation marks or specify
in the drop-down box.
– Search modifiers: Can use + and - to require or exclude terms.
– Proximity searching: Available in the drop-down box.
– Truncation or wildcards: Not available.
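The chart's two recurring ideas, a default connector (AND or OR) and the + and - modifiers that require or exclude terms, can be illustrated with a small matcher. This is a hedged sketch of the general behaviour the chart describes, not the actual query parser of any engine listed; the function name and sample text are hypothetical, and phrase, proximity and wildcard searching are deliberately left out.

```python
import re


def matches(query, text, default_connector="AND"):
    """Return True if `text` satisfies a simple keyword query.

    Terms prefixed with + must appear, terms prefixed with - must not.
    Remaining terms are combined with the default connector.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    required, excluded, plain = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            plain.append(term)
    if any(term in words for term in excluded):
        return False
    if not all(term in words for term in required):
        return False
    if not plain:
        return True
    if default_connector == "AND":
        return all(term in words for term in plain)
    return any(term in words for term in plain)


# Hypothetical document text, used only for illustration.
text = "Jaguar cars are fast and expensive"

print(matches("jaguar animal", text))                          # AND default
print(matches("jaguar animal", text, default_connector="OR"))  # OR default
print(matches("+jaguar -cars", text))
```

The same query can therefore return different results on engines with different default connectors, which is worth keeping in mind when comparing hit lists.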
Choose the Best Search for
Your Information Needs
• http://www.searchenginewatch.com/
• http://nuevaschool.org/~debbie/library/researh/adviceengine.html
» End