Commercial Systems

by Sylvia King
Overview

Crawler-based Search Engines
  3 Main Components
  Google's PigeonRank Technology
  Advantages and Disadvantages
Directory-based Search Engines
  Human-powered Directory
  Natural Language Processing – AskJeeves
  Advantages and Disadvantages
MetaCrawler Search Engine
Conclusion

Crawler-based Search Engines

3 Main Components:

Spider
Also known as a crawler, the spider visits a web page, reading it as it goes along, and then follows links to other pages within that site.
It returns to the site every month or two and checks for changes. All amendments it detects are transferred into the index.

The index
Sometimes known as the catalogue, the index holds a copy of every web page that the spider finds.
The index is updated with all new changes.
A web page may have been "spidered" but not yet "indexed". Until it is added to the index, it will not be available to people searching with a crawler-based search engine.
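A minimal sketch of the spider and index described above. It assumes an in-memory dictionary (the hypothetical WEB) in place of real HTTP fetches, and the function names spider, add_to_index and search are illustrative only. The spider follows links within the starting site only, and a page can be found by searchers only after it has been copied into the index.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory web standing in for real HTTP fetches:
# URL -> (page text, outgoing links).
WEB = {
    "http://example.com/": ("welcome to the shop", ["http://example.com/about", "http://example.org/"]),
    "http://example.com/about": ("about the shop owners", ["http://example.com/"]),
    "http://example.org/": ("another site entirely", []),
}

def spider(start_url, web):
    """Breadth-first crawl that only follows links within the start site."""
    host = urlparse(start_url).netloc
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text  # the spider reads each page as it goes along
        for link in links:
            # Skip links to other sites and pages already queued.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def add_to_index(index, pages):
    """Copy spidered pages into the index (the 'catalogue')."""
    index.update(pages)

def search(index, word):
    """Searchers only see pages that have reached the index."""
    return sorted(url for url, text in index.items() if word in text.split())

index = {}
spidered = spider("http://example.com/", WEB)
before = search(index, "owners")  # spidered but not yet indexed: no results
add_to_index(index, spidered)
after = search(index, "owners")   # now the About page is found
```

The link to example.org is never followed because it points outside the starting site, which mirrors the "other pages within that site" behaviour described for the spider.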
Location & Frequency

Search engines follow a set of rules called algorithms. They concentrate on
the location and frequency of keywords on a web page.
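To illustrate how location and frequency might combine into a score, here is a toy sketch. The weights are hypothetical assumptions, not any real engine's algorithm: a keyword in the page title earns a fixed bonus, and each occurrence in the body adds more the nearer it is to the top of the page, so repeated and early mentions both raise the score.

```python
def location_frequency_score(title: str, body: str, keyword: str,
                             title_weight: float = 3.0) -> float:
    """Toy keyword score combining location and frequency.

    Hypothetical weights: a title match earns `title_weight`; each body
    occurrence adds between 1.0 (first word) and just above 0.0 (last word).
    """
    keyword = keyword.lower()
    score = title_weight if keyword in title.lower().split() else 0.0
    words = body.lower().split()
    for position, word in enumerate(words):
        if word == keyword:
            # Earlier occurrences (nearer the top of the page) count for more.
            score += 1.0 - position / len(words)
    return score

# Hypothetical pages: name -> (title, body text).
pages = {
    "page_a": ("Cheap hotels", "hotels in the city centre and more hotels"),
    "page_b": ("City guide", "museums parks and a list of hotels"),
}
ranked = sorted(pages,
                key=lambda p: location_frequency_score(*pages[p], "hotels"),
                reverse=True)
print(ranked)  # ['page_a', 'page_b']
```

page_a wins because "hotels" appears in its title, at the very start of the body, and twice overall; page_b mentions it once, at the end.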
Crawler-based Search Engines (continued)

PigeonRank Technology







Firstly, a user submits a query to Google.
The query is then routed to what is known as the data coop.
When one of the pigeons in the cluster locates a relevant result, it strikes a rubber-coated steel bar, which gives the page a PigeonRank value of one.
Each further peck increases the page's PigeonRank value.
The pages that receive the most pecks are prioritised and shown at the top of the user's results page.
The remaining results are displayed in order of this pecking system.
The PigeonRank method makes results difficult to manipulate. Aside from Location & Frequency tricks, some site owners try to boost their rankings by including images on their pages, but Google's PigeonRank technology is not fooled by such techniques.
Crawler-based Search Engines (continued)

Advantages




Offers much larger databases of web sites for searches.
The full text of individual web pages is often searchable.
Great for searching very obscure terms or phrases.
Disadvantages



There are no humans to weed out problems such as duplicates and rubbish.
The huge size of the database can lead to very large numbers of search results.
Search command languages can often be complex and confusing.
Directory-based Search Engines

Human-powered directory




These directories depend on humans to compile their listings.
Directories point to sites rather than compiling databases containing pages.
You submit a short description of your site to the directory, and a search then looks for matches only within the submitted descriptions.
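A minimal sketch of why a directory search only "sees" the submitted description. The DIRECTORY listings and the directory_search function are hypothetical: each entry stores a site's URL and its short description, never the pages themselves, so a word that appears only on the site cannot produce a match.

```python
# Hypothetical directory: each listing is a site URL plus the short
# description its owner submitted. Page contents are never stored.
DIRECTORY = [
    ("http://example.com/", "family bakery selling fresh bread and cakes"),
    ("http://example.org/", "guide to birdwatching in the countryside"),
]

def directory_search(directory, query):
    """Return sites whose submitted description contains every query word."""
    terms = query.lower().split()
    return [url for url, description in directory
            if all(term in description.lower().split() for term in terms)]

print(directory_search(DIRECTORY, "bread"))    # ['http://example.com/']
print(directory_search(DIRECTORY, "recipes"))  # [] even if the site has recipes
```

Requiring every query word to appear is an assumed matching rule chosen for simplicity; the point is that matching is limited to the description text.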
Natural Language Processing – AskJeeves



Using Teoma technology, AskJeeves guides the user with questions that help narrow the search.
It also searches up to six other search sites for relevant web pages.
This technique spares searchers from having to use Boolean or other query languages.
Teoma technology & AskJeeves

Teoma technology places strong emphasis on the popularity of web
sites in its ranking algorithm. It ranks a site based on the following:



Subject-Specific Popularity: the number of web pages about the subject that reference the page.
General Popularity: the number of all other web pages that reference the page.
Teoma technology also uses what are known as "communities" of expert sites.

Communities are relevant knowledge hubs that are used to guide the user through their search.
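The two popularity measures above can be sketched as link counts over a small link graph. The LINKS graph, the subject labels, and the choice to sort by subject-specific popularity first with general popularity as a tie-breaker are all hypothetical assumptions for illustration, not Teoma's actual algorithm.

```python
# Hypothetical link graph: page -> (subject of that page, pages it links to).
LINKS = {
    "a": ("cooking", ["target"]),
    "b": ("cooking", ["target", "other"]),
    "c": ("football", ["target"]),
    "d": ("football", ["other"]),
}

def general_popularity(links, page):
    """Number of all other pages that reference `page`."""
    return sum(1 for src, (_, out) in links.items()
               if src != page and page in out)

def subject_popularity(links, page, subject):
    """Number of pages about `subject` that reference `page`."""
    return sum(1 for src, (topic, out) in links.items()
               if src != page and topic == subject and page in out)

# Assumed ordering: subject-specific popularity first, general as tie-breaker.
candidates = ["target", "other"]
ranked = sorted(candidates,
                key=lambda p: (subject_popularity(LINKS, p, "cooking"),
                               general_popularity(LINKS, p)),
                reverse=True)
print(ranked)  # ['target', 'other']
```

Here "target" is referenced by three pages in total (general popularity 3) but by only two cooking pages (subject-specific popularity 2 for "cooking").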
Directory-based Search Engines (continued)

Advantages



They are helpful when the user is uncertain which keywords to use.
Because these directories use human editors, their general standards are higher than those found in crawler-based search engines.
Disadvantages




It could take the user longer to locate a suitable website.
Directories tend to be smaller than search engine databases.
Because directories are maintained by people rather than spiders, and because they point to sites rather than compiling databases containing pages, the content of a site or page can change without the directory being updated.
Dead links (links that do not lead to their intended pages but instead produce an error message) are a problem because the responsibility for maintaining the directory's content rests on human editors.
MetaCrawler Search Engine

A meta search engine acts as an agent between the user
and other search engines.



Meta search engines do not build or maintain their own web indexes;
they use the indexes built by others.
Meta search engines generally present the first 10-30 results from
each search engine's results page.
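A minimal sketch of how a meta search engine might combine other engines' results. The engine functions are hypothetical stubs standing in for real search engines, and the round-robin merge (first result from each engine, then the second from each, and so on, dropping duplicates) is one assumed strategy among many; per_engine plays the role of the 10-30 result cut-off.

```python
from itertools import zip_longest
from typing import Callable, List

# Hypothetical stand-ins for the underlying search engines' indexes.
def engine_a(query: str) -> List[str]:
    return ["a1", "shared", "a3"]

def engine_b(query: str) -> List[str]:
    return ["b1", "shared"]

def meta_search(query: str, engines: List[Callable[[str], List[str]]],
                per_engine: int = 10) -> List[str]:
    """Query every engine, keep its first `per_engine` results, and merge
    them round-robin by rank, skipping URLs already returned."""
    top_lists = [engine(query)[:per_engine] for engine in engines]
    merged, seen = [], set()
    for same_rank in zip_longest(*top_lists):  # rank 1 from each, then rank 2...
        for url in same_rank:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(meta_search("pigeons", [engine_a, engine_b]))               # ['a1', 'b1', 'shared', 'a3']
print(meta_search("pigeons", [engine_a, engine_b], per_engine=2))  # ['a1', 'b1', 'shared']
```

The second call shows the disadvantage mentioned below: cutting each engine's list short limits how many hits reach the user.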
Advantages and Disadvantages


The advantage is that a meta search engine can search several databases for the required topic with a single query.
The disadvantage is that it may return a limited number of hits.
Conclusion

Why does Google hold the number one status?


Google processes its search queries much faster
than traditional search engines; it accomplishes this by
collecting pigeons in thick clusters.
Suggestions for minimising duplicates and rubbish within
crawler-based search engines?