search engine - School of Library and Information Science


SEARCH ENGINE
By
Ms. Preeti Patel
Lecturer
School of Library and Information Science
DAVV, Indore
E mail: [email protected]
Search Engine
 Introduction
 Components
 Types
 Functions
 Subject directories vs. search engines

Introduction: Search engine


Search engines came into existence in 1994.
According to the Yahoo search engine directory (2003), there are over 448 major search engines.
A search engine (SE) is a searchable database of Internet files collected by a computer program (called a wanderer, crawler, robot, worm or spider). An index is created from the collected files, e.g. title, full text, size, URL, etc. There are no selection criteria for the collection of files. An SE allows the user to enter keywords and retrieves Web documents from its database that match the keywords entered by the searcher.
An SE doesn’t wait for someone to submit information about a site. It sends a spider (also called a crawler or web crawler) to visit publicly accessible websites, following all the links it comes across and collecting data for the search engine’s indexes.
A spider discovers new sites and updates information from sites previously visited. A spider can also be used to check links within websites.
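The link-following behaviour of a spider can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the dictionary TOY_WEB is a hypothetical stand-in for the Web (no network access is made), and the names LinkExtractor and crawl are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the Web: URL -> HTML source of that page.
TOY_WEB = {
    "a.example/": '<a href="b.example/">B</a> <a href="c.example/">C</a>',
    "b.example/": '<a href="a.example/">A</a> <a href="d.example/">D</a>',
    "c.example/": "no links here",
    "d.example/": '<a href="c.example/">C</a>',
    "e.example/": "no page links to this one",
}


def crawl(start_url, web):
    """Visit pages breadth-first, following each discovered link once."""
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:  # broken link: nothing to fetch
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


pages = crawl("a.example/", TOY_WEB)
print(pages)
```

In this toy Web the page e.example/ is never visited, because no other page links to it and the spider can only reach pages by following links.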
Components of SE
An SE might well be called a search engine service or a search service. The components of an SE are the following:
 Spider: A program that traverses the Web from link to link, identifying and reading pages.
 Index: A Web database containing a copy of each web page gathered by the spider.
 SE mechanism: Software that enables users to query the index and that usually returns results in relevancy-ranked order.
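How the index and the SE mechanism fit together can be sketched as follows. This is a simplified illustration under stated assumptions: PAGES is a hypothetical set of pages already gathered by a spider, the index stores only word counts per URL rather than a full copy of each page, and "relevancy" is reduced to a simple count of matching words.

```python
import re
from collections import defaultdict

# Hypothetical pages as a spider might have gathered them: URL -> text.
PAGES = {
    "birds.example/migration": "Bird migration routes and bird migration seasons",
    "birds.example/feeding": "Feeding a garden bird in winter",
    "travel.example/": "Human migration and travel history",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map each word to the URLs containing it, with occurrence counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing any query word, highest total count first."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically by URL so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "bird migration")
print(results)
```

The query never touches the pages themselves; it is answered entirely from the index, which is why an SE can respond quickly even though crawling is slow.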
Types: SE

An SE downloads all the information that a page contains and then examines that information to index keywords and phrases that can be used to categorize the sites. SEs can be categorized into three types on the basis of the indexing techniques they employ:

Active SE: It collects all information by itself. It uses a program called a ‘spider’ or ‘Web robot’ to index and categorize web pages as well as websites. The spider travels around the WWW in search of new sites and adds entries to the engine’s catalogue.

Passive search engines or subject directories: This type of SE is possibly more accurately referred to as a directory. It doesn’t seek out information by itself but relies on WWW users to submit details of their favorite sites in order to build up a database. For example, the Yahoo directory has 14 main subject categories; each category has many subcategories, which have their own subcategories, and so on almost ad infinitum.
Due to the size of the Web and its constant transformation, keeping up with important sites in all subject areas is humanly impossible.

Meta Search engine:
The increasing number of search engines has led to the creation of the ‘meta’ search tool. A meta search engine does not catalogue any web pages by itself. It searches multiple search engines simultaneously. When a query is put to this type of search engine, it forwards that query to other search engines.
Types of meta search engine
There are two types of meta search engine:
1. One type provides a separate list of results from each engine that was searched. With this type of meta SE, one can retrieve comprehensive, and sometimes overwhelming, results.
2. The other type is more common and returns a single list of results, often with the duplicate hits removed. This type of meta SE always brings the results back to its own site for viewing.
Examples:
 Metacrawler (www.metacrawler.com)
 SurfWax (www.surfwax.com)
 Zapmeta (www.zapmeta.com)
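The difference between the two types of meta search engine can be sketched as below. Everything here is hypothetical: engine_a and engine_b stand in for real search engines and simply return fixed lists, and the merged list is built by plain concatenation with duplicates dropped, which is only one of many possible merging strategies.

```python
# Hypothetical engines: each maps a query to a ranked list of URLs.
def engine_a(query):
    return ["x.example/", "y.example/", "z.example/"]


def engine_b(query):
    return ["y.example/", "w.example/"]


ENGINES = {"Engine A": engine_a, "Engine B": engine_b}


def metasearch_separate(query, engines):
    """First type: forward the query and keep one result list per engine."""
    return {name: engine(query) for name, engine in engines.items()}


def metasearch_merged(query, engines):
    """Second type: forward the query and return one de-duplicated list."""
    merged = []
    seen = set()
    for engine in engines.values():
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


separate = metasearch_separate("bird migration", ENGINES)
merged = metasearch_merged("bird migration", ENGINES)
print(separate)
print(merged)
```

Note that neither function keeps an index of its own; both only forward the query and post-process what the underlying engines return.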
According to scope, SEs can be divided into the following categories:
 General search engine: It covers a range of services and facilities and supports Boolean search. Examples: Google, AltaVista, etc.
 Regional search engine: It refers to a country-specific search engine for locating varied resources region-wise. Examples: Euro Ferret (Europe), Excite UK, etc.

Subject-specific search engine:
It does not attempt to index the entire Web. It focuses on searching for websites or pages within a defined subject area, geographical area or type of resource. Because of this focus, a subject-specific search engine aims for depth of coverage within its subject area.
Examples:
1. www.123india.com: Regional
2. www.in.altavista.com: Regional
3. www.naukri.com: Employment
4. www.zipcode.com: Weather
5. www.khoj.com: India specific

Features of SE

When a Web search engine is used with more than one word, the space between the words has a logical meaning that directly affects the results of the search. This is known as the default syntax. Example: in AltaVista, Infoseek and Excite, a search for the words ‘bird migration’ means that the searcher will get back documents that contain either the word ‘bird’ or the word ‘migration’, or both.

The space between the words defaults to the Boolean OR. This is probably not what the searcher wants. Where the space defaults to the Boolean AND instead, the searcher will get back documents that contain both the words ‘bird’ and ‘migration’.
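The effect of the default syntax can be seen in a small sketch. The index below is hypothetical (three made-up documents), and the function names search_or and search_and are invented for this illustration; they show only the set logic, not any engine's actual query parser.

```python
# Hypothetical index: word -> set of document ids containing that word.
INDEX = {
    "bird": {"doc1", "doc2"},
    "migration": {"doc1", "doc3"},
}


def search_or(index, query):
    """Documents containing at least one query word (Boolean OR)."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return hits


def search_and(index, query):
    """Documents containing every query word (Boolean AND)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


or_hits = search_or(INDEX, "bird migration")
and_hits = search_and(INDEX, "bird migration")
print(sorted(or_hits))
print(sorted(and_hits))
```

With OR as the default, all three documents come back; with AND, only the one document that mentions both words does.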
SEs return results in a ranked order. Most SEs use various criteria to construct a term relevancy rating for each hit and present the search results in this order.
Criteria can include: search terms in the title, URL, first heading or HTML META tag; the number of times the search terms appear in the document; search terms appearing early in the document; search terms appearing close together; etc.
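A relevancy rating built from a few of these criteria might look like the sketch below. The weights are illustrative assumptions chosen for this example, not values used by any real engine, and only four criteria are covered (term in title, term in URL, term frequency, term appearing early); headings, META tags and term proximity are left out to keep the sketch short.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Illustrative weights only (assumptions made for this sketch).
TITLE_WEIGHT = 5
URL_WEIGHT = 3
EARLY_WEIGHT = 2   # bonus when a term appears among the first words
EARLY_WINDOW = 10  # how many leading words count as "early"


def relevance(page, query):
    """Score one page (dict with 'url', 'title', 'body') against a query."""
    title_words = tokenize(page["title"])
    body_words = tokenize(page["body"])
    url = page["url"].lower()
    score = 0
    for term in tokenize(query):
        if term in title_words:
            score += TITLE_WEIGHT
        if term in url:                      # substring match on the URL
            score += URL_WEIGHT
        score += body_words.count(term)      # term frequency in the body
        if term in body_words[:EARLY_WINDOW]:
            score += EARLY_WEIGHT
    return score


PAGES = [
    {"url": "news.example/today", "title": "Daily News",
     "body": "Local news and weather. " * 3 + "A bird was seen downtown."},
    {"url": "birds.example/migration", "title": "Bird Migration",
     "body": "Migration is the seasonal movement of a bird population."},
]

ranked = sorted(PAGES, key=lambda p: relevance(p, "bird migration"), reverse=True)
ranked_urls = [p["url"] for p in ranked]
scores = [relevance(p, "bird migration") for p in ranked]
print(ranked_urls, scores)
```

The page with the search terms in its title and URL, and early in its text, is ranked well above the page that merely mentions one term near the end.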
 SE technology is still developing. Today, SE technology includes the organization of search results by concept, site, domain, popularity and linking rather than by relevancy.





The following services are provided by SEs:
Direct Hit ranks according to the sites other searchers have chosen from their results for similar queries.
Google ranks by the number of links from pages ranked high by the service.
Inference Find ranks by concept and top-level domain.
MetaFind sorts results by keyword, alphabetically or by domain.


SEs do not index all the documents available on the Web. For example, most SEs cannot index files on password-protected sites, behind firewalls, or on sites configured by the host server to be left alone. Other web pages may not be picked up if no other pages link to them.
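One common way for a host to ask spiders to leave parts of a site alone is a robots.txt file; the slides do not name the mechanism, so treating it as the one meant here is an assumption. The sketch below uses Python's standard urllib.robotparser module on a hypothetical robots.txt, with a made-up spider name and example URLs, and makes no network requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleSpider" is a made-up crawler name used only for this sketch.
allowed = parser.can_fetch("ExampleSpider", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleSpider", "http://www.example.com/private/report.html")
print(allowed, blocked)
```

A well-behaved spider checks these rules before fetching a page, so anything under the disallowed path never reaches the index.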
SEs rarely contain the most recent documents posted to the Internet; do not look for yesterday’s news on a search engine.


The contents of databases will generally not show up in search engine results, even though a growing amount of valuable information on the Web is generated from databases.
Some SEs allow users to view the retrieved Web sites and Web pages clustered under different topics related to the search terms.
FUNCTIONS OF SE


They search the Internet by using specialized software, called a crawler or robot; these software agents find web pages by following hyperlinks.
These agents send a cached version of each web page to the repository of the search engine, and the SE keeps an index of the words they find and where (at which URL) they find them.

They allow users to look for words or combinations of words found in that index.
Diagrammatic representation of Search Engine
[Diagram labels: Different websites; Crawlers; Indexing software in search engine; Database of search engine; Switch; Search]
Subject Directories Vs Search Engine
A subject directory is a service that offers a collection of links to Internet resources, submitted by site creators or evaluators and organized into subject categories.