Transcript Searching

Searching The Web
• Search engines rely on computer programs
(variously called robots, crawlers, spiders, or
worms) that automatically visit Web sites.
Starting with the home page, they follow all
the internal links in the site, visit every Web
page, read every word on every page, and
create an index of these words.
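The crawl-and-index process described above can be sketched in a few lines. This is only an illustration, not how any particular search engine works: the SITE dictionary is a made-up in-memory Web site (no network access), and the page paths and the function name crawl_and_index are assumptions chosen for the example.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory Web site: page path -> (page text, internal links).
SITE = {
    "/": ("Welcome to the home page", ["/about", "/courses"]),
    "/about": ("About the university library", ["/"]),
    "/courses": ("Courses on web searching", ["/about", "/courses/search"]),
    "/courses/search": ("Searching the web with keywords", []),
}


def crawl_and_index(site, start="/"):
    """Start at the home page, follow internal links, and index every word."""
    index = defaultdict(set)  # word -> set of pages that contain it
    queue = deque([start])
    seen = {start}
    while queue:
        page = queue.popleft()
        text, links = site[page]
        # Read every word on the page and record where it was found.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page)
        # Follow each internal link that has not been visited yet.
        for link in links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl_and_index(SITE)
print(sorted(index["web"]))  # ['/courses', '/courses/search']
```

Looking up a keyword in the resulting index returns the pages that contain it, which is the basis of the keyword searches discussed in the rest of these slides.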
Running Effective Searches
• Browsing and Searching are not the same.
• When you browse, you navigate from one
Web page to another by following links.
• When you search, you enter keywords in a
search engine to display a list of pages that
match the keywords.
Word Index versus Subject
Directory
• A Word Index database contains billions of pages and,
from each page, hundreds or even thousands of words,
since a Word Index contains every main word from every
page it finds at a Website (small words such as in, at,
and on are not indexed). For example, the Google database
contains every main word for 228,000 pages at the
Başkent University websites.
• A Subject Directory is much smaller: it contains only
basic subject headings for a few main pages at each
Website. For example, the Google Subject Directory
database contains only the subject category and page
titles of about 54 pages at the Başkent University
websites.
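To make the difference concrete, here is a small sketch that builds both structures from the same pages. The PAGES data, the STOP_WORDS list, and the function names are assumptions made up for this illustration; real services choose their own stop words and categories.

```python
# Assumed list of "small words" that a word index skips.
STOP_WORDS = {"in", "at", "on", "the", "a", "of", "and", "to"}

# Hypothetical pages with a title, a subject category, and body text.
PAGES = {
    "/library": {
        "title": "Library",
        "category": "Education",
        "text": "Students search the catalog in the library on campus",
    },
    "/admissions": {
        "title": "Admissions",
        "category": "Education",
        "text": "Apply to the university at the admissions office",
    },
}


def build_word_index(pages):
    """Index every main word of every page (stop words are skipped)."""
    index = {}
    for url, page in pages.items():
        for word in page["text"].lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(url)
    return index


def build_subject_directory(pages):
    """Store only the subject category and the page title of each page."""
    directory = {}
    for url, page in pages.items():
        directory.setdefault(page["category"], []).append((page["title"], url))
    return directory


word_index = build_word_index(PAGES)
directory = build_subject_directory(PAGES)
print(len(word_index))   # number of distinct main words indexed
print(directory)         # one category holding two (title, url) entries
```

Even for two short pages the word index holds many more entries than the directory, which is the size difference the slide describes.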
Searching for Data
Use a search engine to find data by keying in a word or phrase.
The word or phrase is called a keyword and represents the topic
you are searching for.
[Figure: results page, with callouts for search expression, keyword, query, hits, and sponsored links]
Ranking
• The position of a Web page on the
results page is called that page’s ranking.
– The order of the ranking will vary according
to which search engine is used.
– Search engines only examine their own
databases.
Search Engines Differ
Because they:
– use different Web robots (spiders) to collect
information
– choose different Web pages to index
– interpret search expressions differently
– store a different amount of text from a Web
page in the database
Word Limiters
• The minus ( - ) sign before a word means the word
must not appear in the pages returned.
• If you want to be sure that a word is found in the
results, put a plus ( + ) sign before the word.
• Phrase Matching (" "): Putting quotes around a set
of words finds only results that match the words in
that exact sequence.
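The three limiters can be sketched as a simple filter over page text. This is a minimal illustration under stated assumptions, not any search engine's actual query parser: the function names are hypothetical, and the rule that a page must contain at least one of the plain (unmarked) keywords is an assumption made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into +required, -excluded, plain words, and "phrases"."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]*)"', query)]
    remainder = re.sub(r'"[^"]*"', " ", query)
    required, excluded, plain = [], [], []
    for token in remainder.split():
        word = token.lstrip("+-").lower()
        if not word:
            continue
        if token.startswith("+"):
            required.append(word)
        elif token.startswith("-"):
            excluded.append(word)
        else:
            plain.append(word)
    return required, excluded, plain, phrases


def contains_phrase(words, phrase):
    """True if the phrase words occur consecutively, in the same order."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matches(text, query):
    required, excluded, plain, phrases = parse_query(query)
    words = tokenize(text)
    present = set(words)
    if any(w in present for w in excluded):
        return False  # a minus word appears in the page
    if any(w not in present for w in required):
        return False  # a plus word is missing
    if any(not contains_phrase(words, p) for p in phrases):
        return False  # a quoted phrase is not matched exactly
    if plain and not any(w in present for w in plain):
        return False  # assumption: at least one plain keyword must appear
    return True


page = "Search engines index web pages"
print(matches(page, "+web -directories"))  # True
print(matches(page, '"pages web"'))        # False: wrong word order
```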
Document Section Limiters
• intitle: Finds pages that contain one specified word in
the page title, which appears in the title bar of the
browser.
• allintitle: Finds pages containing all of several words in
the title, e.g. allintitle: ataturk education requires both
words to be in the page title.
• inurl: Finds pages with one specific word in the URL.
• allinurl: If you start a query with allinurl:, Google will
restrict the results to those pages with all of the query
words in the URL. (google-search)
• allintext: Searches only the Text in the BODY of the web
page for the words.
• filetype: Finds only a specified file type, such as MS Word (.doc) or MS Excel (.xls).
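As a rough illustration of how these limiters restrict matching to one part of a document, the sketch below checks a single page record against a limiter query. The page record, its example.edu URL, and the function apply_limiter are hypothetical; the sketch mimics the behavior described above and is not Google's implementation.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def apply_limiter(page, query):
    """Check one page against a query such as 'allintitle: ataturk education'."""
    operator, _, rest = query.partition(":")
    operator = operator.strip().lower()
    terms = rest.lower().split()
    if not terms:
        return False
    title_words = set(tokenize(page["title"]))
    body_words = set(tokenize(page["body"]))
    url = page["url"].lower()
    if operator == "intitle":      # one word in the page title
        return terms[0] in title_words
    if operator == "allintitle":   # every word in the page title
        return all(t in title_words for t in terms)
    if operator == "inurl":        # one word in the URL
        return terms[0] in url
    if operator == "allinurl":     # every word in the URL
        return all(t in url for t in terms)
    if operator == "allintext":    # every word in the body text
        return all(t in body_words for t in terms)
    if operator == "filetype":     # file extension at the end of the URL
        return url.endswith("." + terms[0].lstrip("."))
    raise ValueError(f"unknown limiter: {operator}")


# Hypothetical page record used only for this example.
page = {
    "title": "Ataturk and Education",
    "url": "http://www.example.edu/ataturk/education.doc",
    "body": "Reforms in the education system",
}
print(apply_limiter(page, "allintitle: ataturk education"))  # True
print(apply_limiter(page, "filetype:xls"))                   # False
```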
Web Directory
Search engines index words in Web pages and then add them to their
databases by employing automated programs, such as Web robots.
Real people develop Web directories and decide which Web sites
should be added to the directory.
The content in Yahoo’s Web Site Directory is organized by topic.
Drill down through directory levels to find Web sites.
Some Web directories also include search engine features.
Natural Language Searches
A conceptual query is one where the search
engine returns only Web pages that are
relevant to the topic, even if the words don’t
precisely match your keywords.
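One very simplified way to picture a conceptual match is to expand each keyword into a set of related words before comparing it with the page. The CONCEPTS table and the function conceptual_match below are invented for this illustration; real concept-based engines use far richer techniques than a hand-written synonym table.

```python
# Hypothetical table of related words, made up for this example.
CONCEPTS = {
    "car": {"car", "automobile", "vehicle"},
    "doctor": {"doctor", "physician"},
}


def conceptual_match(query, text):
    """True if every keyword, or a word related to it, appears in the text."""
    text_words = set(text.lower().split())
    for word in query.lower().split():
        related = CONCEPTS.get(word, {word})
        if not (related & text_words):
            return False
    return True


print(conceptual_match("car prices", "Automobile prices rose"))  # True
print(conceptual_match("car prices", "Bicycle prices rose"))     # False
```

The first page matches even though the word "car" never appears on it, which is the behavior the slide describes.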
Concept-based Search Engines
www.excite.com
www.askjeeves.com
These engines can also be queried in natural language.
Metasearch Engines
• Metasearch engines will query
several engines simultaneously
– the search will pull results from several
search engines
– www.infospace.com
– www.mamma.com
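The idea of querying several engines at once and pooling their results can be sketched as follows. The two engine functions and their result lists are stand-ins invented for this example (they do not contact any real service), and the merging rule, summing 1/rank across engines, is just one assumed way to combine rankings, not the method used by the sites listed above.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical engines: each returns its own ranked list of results.
def engine_a(query):
    return ["a.example/1", "b.example/2", "c.example/3"]


def engine_b(query):
    return ["b.example/2", "d.example/4"]


def metasearch(query, engines):
    """Query all engines simultaneously and merge their ranked results."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            # Assumed merging rule: a higher rank in any engine scores more.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Best score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("web searching", [engine_a, engine_b])
print(merged)  # ['b.example/2', 'a.example/1', 'd.example/4', 'c.example/3']
```

A page returned by both engines rises to the top of the merged list, while each page still appears only once.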
Other Electronic Research
Resources
• The Web is not the only electronic source of
information.
• Among other sources is the Başkent library
website, which provides students with access to
hundreds of quality databases that cannot be
found using search services like Google or
Yahoo because they are for registered
subscribers only. Başkent pays a fee for these
services, which are then offered at no cost to our
students.