Unit 3
Web Search Engines
Can You Find the Answers?
Connect to Google
Search for items on Iran
Records ________
Combine Iran with nuclear weapons
Records _______
Combine Iran with the phrase nuclear weapons
Records _______
Use Advanced Search:
Combine Iran with the phrase nuclear weapons
so that all the words appear in title of
Records ___
Unit 3
Web Search Engines
How People Search on the Web
 What Are Search Engines?
 How Search Engines Work
 What’s in Search Engines?
 How to Find Search Engines
 Search Basics
Three Ways People Search
 No direction, clear idea, or issue
 Consult people, news, magazines, Web for ideas
 Have some idea, but vague, flexible, unclear
 Consult reference sources, Web directories for direction, topic, theme
Search in-depth
 Have defined topic, narrow focus
 Consult databases, search and metasearch engines
What Are Search Engines?
 Captures web sites, pages
Indexes full-text of web page
Provides interface to search web pages
 Large, billions of pages (unlike directories)
 Computer built (robots, spiders)
 No selectivity, no evaluation
How Search Engines Work
Spiders comb, “capture” web pages
 Software builds database
 Words from web pages “indexed”
 Search interface finds words on pages
 Engine ranks, describes results
 How engines and directories differ
Spiders Comb, Capture Web Pages
Software decides which web pages to
 Spiders check for updated pages
 Spiders remove dead sites
Spider Software Builds Database
Current web size: over 15 billion pages
 No engine’s database covers it all
 Google covers 22% (3.3 billion+)
 AlltheWeb covers 21% (3.2 billion+)
 HotBot (Inktomi) covers 20% (3 billion+)
 Teoma covers 10% (1.5 billion)
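The coverage percentages above follow from dividing each engine's database size by the estimated web size. A minimal sketch of that arithmetic, using only the figures quoted on the slide (the function name `coverage_percent` is an illustrative choice, not part of the course material):

```python
# Figures quoted on the slide; the web-size estimate is the slide's own.
WEB_SIZE = 15_000_000_000

# Pages held in each engine's database, as listed above.
indexed_pages = {
    "Google": 3_300_000_000,
    "AlltheWeb": 3_200_000_000,
    "HotBot": 3_000_000_000,
    "Teoma": 1_500_000_000,
}

def coverage_percent(pages, web_size=WEB_SIZE):
    """Return the share of the web an engine covers, as a whole percent."""
    return round(100 * pages / web_size)

coverage = {name: coverage_percent(n) for name, n in indexed_pages.items()}
for name, percent in coverage.items():
    print(f"{name} covers about {percent}% of the web")
```

Even the largest database here covers under a quarter of the estimated web, which is why no single engine can be treated as complete.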
Words from Web Pages “Indexed”
“Index” means creating lists of words for
database and linking words to web pages
 Some index full text in document
 Some index part of text
 First 100 words in document
 Words in abstract, or title of document
How indexing works affects search
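To make the idea of an index concrete, here is a minimal sketch of a word list that links each word to the pages containing it. It is an illustration under stated assumptions, not how any particular engine is built: the page URLs and texts are invented, and `max_words` is a hypothetical parameter standing in for "index only part of the text".

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages, max_words=None):
    """Map each word to the set of page URLs that contain it.

    max_words=None indexes the full text; an integer indexes only the
    first max_words words of each page (partial-text indexing).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        words = tokenize(text)
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            index[word].add(url)
    return index

# Hypothetical pages, invented for illustration.
pages = {
    "a.example/1": "Russian history overview. Later sections discuss the mafia.",
    "b.example/2": "Mafia trials in Italy",
}

full_index = build_index(pages)
partial_index = build_index(pages, max_words=3)

print(sorted(full_index["mafia"]))     # both pages mention the word somewhere
print(sorted(partial_index["mafia"]))  # only the page that mentions it early
```

The same search word finds different pages depending on how much of each document was indexed, which is the sense in which indexing affects search.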
Search Interface Finds Web Pages
Provides keyword search box
Offers simple or advanced searching
Offers search options to affect results:
 Most assume AND between words: Russian mafia
 Most accept “quotes” to search a PHRASE: “Russian mafia”
 Most allow FIELD searches: ti:Russian mafia
Engine Ranks, Describes Results
Software lists most “relevant” items first
 Word popularity: word repetitions, location
Site popularity – visitations of web site
Link popularity – how often link cited
 Results described
 Few words to a paragraph
 Sometimes stars, other indicators of
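The ranking signals above can be combined into a single score. The sketch below is only an illustration: real engines do not publish their formulas, the weight of 3 for title words is arbitrary, the pages and link counts are invented, and site popularity (visits) is left out for brevity.

```python
def score(page, query_words, inbound_links):
    """Combine word popularity and link popularity into one number."""
    body = page["body"].lower().split()
    title = page["title"].lower().split()
    word_score = 0
    for word in query_words:
        word_score += body.count(word)       # repetitions in the text
        word_score += 3 * title.count(word)  # location: title words weigh more (arbitrary weight)
    # Link popularity: number of other pages linking to this one.
    return word_score + inbound_links.get(page["url"], 0)

def rank(pages, query, inbound_links):
    """Return page URLs ordered from highest to lowest score."""
    query_words = query.lower().split()
    ordered = sorted(
        pages,
        key=lambda p: score(p, query_words, inbound_links),
        reverse=True,
    )
    return [p["url"] for p in ordered]

# Hypothetical pages and link counts, invented for illustration.
pages = [
    {"url": "a.example", "title": "Travel notes", "body": "one mention of the mafia"},
    {"url": "b.example", "title": "Russian mafia", "body": "russian mafia history and mafia trials"},
]
inbound_links = {"a.example": 2, "b.example": 1}

results = rank(pages, "Russian mafia", inbound_links)
print(results)
```

The page that repeats the query words and carries them in its title outranks the page with more inbound links, showing how the choice of weights decides what counts as "relevant".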
How Engines and Directories Differ
Computers vs people
 Engine spiders, not editors, select documents
Quantity vs quality
 Engines big: want all, accept anything
 Directories small: want “best” “important”
Technology vs human judgment
 Engine software ranks, no human evaluation
Top Search Engines
 Google: 3.3 billion+
 AlltheWeb: 3.2 billion+
 HotBot: 3 billion+
 Teoma: 1.5 billion+
Directories, Search Engines and
If directories find little, they default to engines
 Yahoo defaults to Google
 Open Directory defaults to Google
 Looksmart defaults to Inktomi (HotBot)
Some search engines borrow directories
 Google uses “Open Directory”
Learn the source of information when using a
directory search box or search engine’s
Metasearch Engines
Technologies that search several search
engines at the same time
Increase results when search engines produce little
 Save time by searching several engines at
 Show results of several engines on one
Retrieve too many hits
 Retrieve less relevant results
 Do not individualize search syntax for each
 Do not know whether to use and, AND, +, OR, or; cannot interpret phrases, etc.
Exclude certain large engines like Google
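A metasearch engine can be pictured as a loop that sends one query to several engines and merges what comes back. The sketch below assumes two stand-in engines (`engine_a` and `engine_b`, hypothetical functions returning fixed lists) rather than any real service.

```python
from itertools import zip_longest

# Hypothetical stand-ins for real engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["x.example", "y.example", "z.example"]

def engine_b(query):
    return ["y.example", "w.example"]

def metasearch(query, engines):
    """Send the same query string to every engine and merge the results.

    Results are interleaved by rank and duplicates are dropped, so the
    hits from all engines appear in one list.
    """
    result_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

hits = metasearch("Russian mafia", [engine_a, engine_b])
print(hits)
```

Note that the query string is passed to every engine unchanged. That is the weakness listed above: nothing here adapts AND, OR, or phrase syntax to what each engine expects.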
Top Metasearch Engines
 Vivisimo: results, narrows topics
 Ez2find: most major engines
 Dogpile: results, covers major engines
What’s In Search Engines?
Business, commercial information
 Organizational publications
 Government resources
 Magazine, newspaper excerpts
 Some scholarly information
 Teaching materials, unpublished articles
Books, articles whose copyright expired
What’s Not in Search Engines
Books under copyright
 Most journal, magazine, newspaper articles
 Most current and past research
Reference books
 Most fiction, non-fiction in existence
Recent, quality publications
In short
 Bulk of human knowledge and research
How to Find Search Engines
Word of mouth, hearsay
 Newspaper, magazine articles
 Library web pages
 Guides to search engines
 HyperResearch
Search Basics
Identify, select keywords
 Effects of internet use on children
 Internet, children, effects
Combine keywords to focus results
 Use
 Use phrase searching
 Limit search to field like title or URL
 OR retrieves an article if it contains either keyword
 Use to connect similar words
 Use to increase results
Expands Results
 Internet
 Internet OR Web
 Internet OR Web OR digital
 Use to connect two different ideas
 AND between keywords means both terms must be in record
 Use to decrease results
Reduces Results
2,956,000
 Children AND Internet
 Children AND Internet AND Homework
 Children internet homework
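OR and AND correspond to set union and set intersection over the records each keyword retrieves. A minimal sketch, assuming a small invented index in which each keyword maps to a set of record numbers:

```python
# Hypothetical index: each keyword maps to the set of records containing it.
index = {
    "internet": {1, 2, 3, 4},
    "web": {3, 5},
    "children": {2, 3, 6},
    "homework": {3, 7},
}

def search_or(index, *keywords):
    """OR: records containing any of the keywords (set union)."""
    hits = set()
    for word in keywords:
        hits |= index.get(word, set())
    return hits

def search_and(index, *keywords):
    """AND: records containing all of the keywords (set intersection)."""
    sets = [index.get(word, set()) for word in keywords]
    return set.intersection(*sets) if sets else set()

or_hits = search_or(index, "internet", "web")
and_hits = search_and(index, "children", "internet")
narrower = search_and(index, "children", "internet", "homework")

print(len(or_hits), len(and_hits), len(narrower))
```

Each added OR term can only keep the result set the same size or grow it, while each added AND term can only keep it the same size or shrink it.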
AND, OR and Search Engine Syntax
Use help or tips
 AND, and, OR, or, “+”, “-”?
 Does engine default to AND or OR?
 Do AND or OR have to be upper case?
 Use ADVANCED SEARCH to learn options
Is there a pull-down menu box?
 AND can mean “All the words”
 OR can mean “Any of the words”
Phrase Searching
Two words in consecutive order
 Juvenile delinquency
 Russian mafia
How does computer recognize “phrase”?
 Pull-down
 Quotation marks: “drug abuse”
Phrase searches reduce hits, improve
Russian mafia
 “Russian mafia”
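The difference between an all-the-words search and a phrase search can be sketched as follows. The two documents are invented for illustration, and the matching is deliberately simple (whitespace-separated words, case ignored).

```python
def contains_all_words(text, query):
    """True if every query word occurs somewhere in text (implied AND)."""
    words = set(text.lower().split())
    return all(word in words for word in query.lower().split())

def contains_phrase(text, phrase):
    """True if the words of phrase occur consecutively, in order, in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical documents, invented for illustration.
docs = [
    "the russian mafia in new york",
    "a russian journalist wrote about the sicilian mafia",
]

and_matches = [d for d in docs if contains_all_words(d, "Russian mafia")]
phrase_matches = [d for d in docs if contains_phrase(d, "Russian mafia")]

print(len(and_matches), len(phrase_matches))
```

Both documents contain the two words, but only the first has them side by side, so the phrase search returns fewer and more relevant hits.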
Field Searching
Common document “fields”
 Author, Title, Subject, Abstract, Text, URL
Limits search to words in particular fields
 Learn syntax: title: ti: url:
 ti: Russian mafia
 url: russianmafia
 Use pull-down menu (in the title, in the URL)
Reduces hits, improves relevancy
 Russian mafia (all the words) = 23,234
 Russian mafia (in title) = 254
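Field searching restricts the match to one part of each record. The sketch below assumes records stored as dictionaries with `title` and `text` fields and a simplified `ti:` prefix; both the record layout and the helper names are illustrative assumptions, since real engines each define their own field syntax.

```python
def parse_query(raw):
    """Split a 'ti: words' query into (field, words).

    Assumes the prefixes 'ti' and 'title' mean the title field; anything
    else is treated as a search of the full text.
    """
    prefixes = {"ti": "title", "title": "title"}
    head, sep, rest = raw.partition(":")
    if sep and head.strip().lower() in prefixes:
        return prefixes[head.strip().lower()], rest.strip()
    return "text", raw.strip()

def field_search(records, field, query):
    """Return records whose given field contains every query word."""
    wanted = query.lower().split()
    hits = []
    for record in records:
        field_words = record.get(field, "").lower().split()
        if all(word in field_words for word in wanted):
            hits.append(record)
    return hits

# Hypothetical records, invented for illustration.
records = [
    {"title": "Russian mafia in Brooklyn", "text": "russian mafia activity in brooklyn"},
    {"title": "Organized crime report", "text": "the russian mafia is discussed"},
]

all_hits = field_search(records, *parse_query("Russian mafia"))
title_hits = field_search(records, *parse_query("ti: Russian mafia"))

print(len(all_hits), len(title_hits))
```

Limiting the search to the title drops the record that only mentions the words in passing, the same effect as the fall from 23,234 to 254 hits above.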
Use major search engines
 AlltheWeb, Google, Teoma
Use a metasearch engine: Vivisimo
 Practice using AND, OR, phrase, field