Selling an Idea or a Product - Plattsburgh State Faculty and

Unit 3
Web Search Engines
Can You Find the Answers?

Connect to Google
Search for items on Iran
Records ________
Combine Iran with nuclear weapons
Records ________
Combine Iran with the phrase "nuclear weapons"
Records ________
Use Advanced Search: combine Iran with the phrase "nuclear weapons" so that all the words appear in the title of documents
Records ________
Unit 3
Web Search Engines
- How People Search on the Web
- What Are Search Engines?
- How Search Engines Work
- What’s in Search Engines?
- How to Find Search Engines
- Search Basics

Three Ways People Search

Surf
- No direction, no clear idea or issue
- Consult people, news, magazines, the Web for ideas

Browse
- Have some idea, but vague, flexible, unclear
- Consult reference sources and Web directories for direction, topic, theme

Search in-depth
- Have a defined topic, narrow focus
- Consult databases, search engines, and metasearch engines
What Are Search Engines?

Software
- Captures web sites and pages
- Indexes the full text of web pages
- Provides an interface to search web pages

Database
- Large: billions of pages (unlike directories)
- Computer-built (robots, spiders)
- No selectivity, no evaluation
How Search Engines Work
- Spiders comb and "capture" web pages
- Software builds the database
- Words from web pages are "indexed"
- Search interface finds words on pages
- Engine ranks and describes results
- How engines and directories differ

Spiders Comb, Capture Web Pages
- Software decides which web pages to collect
- Spiders check for updated pages
- Spiders remove dead sites
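A spider's job of following links from page to page can be sketched as a breadth-first walk over a link graph. This is a minimal illustration under stated assumptions: `TOY_WEB` and `crawl` are hypothetical names, the "web" is an in-memory dictionary rather than real HTTP fetches, and real spiders also honor robots.txt and revisit pages on a schedule to catch updates.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to the links found on that page.
# None stands in for a dead page (for example, one that now returns 404).
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "dead.example/"],
    "c.example/": [],
    "dead.example/": None,
}

def crawl(seed, web, max_pages=100):
    """Breadth-first crawl from seed; returns (captured pages, dead pages)."""
    captured, dead = [], []
    seen = {seed}
    queue = deque([seed])
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        links = web.get(url)        # unknown URLs are treated as dead too
        if links is None:
            dead.append(url)        # note it, but keep it out of the database
            continue
        captured.append(url)        # "capture" the page for indexing
        for link in links:
            if link not in seen:    # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return captured, dead

pages, dead = crawl("a.example/", TOY_WEB)
print(pages)  # ['a.example/', 'a.example/about', 'b.example/', 'c.example/']
print(dead)   # ['dead.example/']
```

The `seen` set is what keeps the spider from looping forever on pages that link back to each other, as `a.example/` and `a.example/about` do here.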

Spider Software Builds Database
- Current web size: over 15 billion pages
- No engine's database covers it all:
  - Google covers 22% (3.3 billion+)
  - AlltheWeb covers 21% (3.2 billion+)
  - HotBot (Inktomi) covers 20% (3 billion+)
  - Teoma covers 10% (1.5 billion+)
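The coverage percentages are simply each index size divided by the estimated size of the web. The short Python check below uses only the slide's own figures (the names `WEB_SIZE`, `INDEX_SIZES`, and `coverage_percent` are illustrative); it also confirms that 21% of 15 billion corresponds to roughly 3.2 billion pages.

```python
# Index sizes and the ~15 billion page web estimate are the slide's figures.
WEB_SIZE = 15_000_000_000

INDEX_SIZES = {
    "Google": 3_300_000_000,
    "AlltheWeb": 3_200_000_000,
    "HotBot (Inktomi)": 3_000_000_000,
    "Teoma": 1_500_000_000,
}

def coverage_percent(pages, web_size=WEB_SIZE):
    """Share of the estimated web held in one engine's index, as a whole percent."""
    return round(100 * pages / web_size)

for engine, pages in INDEX_SIZES.items():
    print(f"{engine}: {coverage_percent(pages)}%")
# Google: 22%
# AlltheWeb: 21%
# HotBot (Inktomi): 20%
# Teoma: 10%
```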
Words from Web Pages "Indexed"
"Index" means creating lists of words for the database and linking the words to web pages
- Some engines index the full text of a document
- Some index only part of the text:
  - The first 100 words in the document
  - Words in the abstract or title of the document
How indexing works affects search results
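A common way to build "lists of words linked to web pages" is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure, uses made-up page texts, and the names `tokenize` and `build_index` are hypothetical. It shows why an engine that indexes only the first 100 words can miss a page that a full-text engine finds.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages, max_words=None):
    """Map each word to the set of page IDs containing it.

    max_words=None indexes the full text; an integer indexes only the
    first max_words words, mimicking engines that index part of the text.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        words = tokenize(text)
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            index[word].add(page_id)
    return index

# Hypothetical pages: p2 mentions the mafia only after 100+ words.
pages = {
    "p1": "Russian mafia activity in Europe",
    "p2": "A long history of organized crime " + "filler " * 100
          + "including the Russian mafia",
}
full = build_index(pages)
partial = build_index(pages, max_words=100)
print(sorted(full["mafia"]))     # ['p1', 'p2']
print(sorted(partial["mafia"]))  # ['p1']
```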
Search Interface Finds Web Pages
- Provides a keyword search box
- Offers simple or advanced searching
- Offers search options that affect results:
  - Most assume AND between words: Russian mafia
  - Most accept "quotes" to search a PHRASE: "Russian mafia"
  - Most allow FIELD searches: ti:Russian mafia
AlltheWeb
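The three query conventions above (implicit AND, quoted phrases, a field prefix) can be illustrated with a small parser. This is a hypothetical sketch, not any engine's real grammar: `CLAUSE` and `parse_query` are illustrative names, and the sketch assumes a field prefix such as `ti:` applies only to the word or quoted phrase directly after it, so `ti:"Russian mafia"` restricts the whole phrase to the title while `ti:Russian mafia` restricts only `Russian`.

```python
import re

# Hypothetical grammar: optional "field:" prefix, then a "quoted phrase" or a bare word.
CLAUSE = re.compile(r'(?:(\w+):)?(?:"([^"]+)"|(\S+))')

def parse_query(query):
    """Split a query into (field, kind, text) clauses that are all ANDed together."""
    clauses = []
    for field, phrase, word in CLAUSE.findall(query):
        field = field.lower() or "any"   # no prefix: search every field
        if phrase:
            clauses.append((field, "phrase", phrase.lower()))
        else:
            clauses.append((field, "word", word.lower()))
    return clauses

print(parse_query('Russian mafia'))
# [('any', 'word', 'russian'), ('any', 'word', 'mafia')]
print(parse_query('"Russian mafia"'))
# [('any', 'phrase', 'russian mafia')]
print(parse_query('ti:"Russian mafia"'))
# [('ti', 'phrase', 'russian mafia')]
```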
Engine Ranks, Describes Results

Software lists the most "relevant" items first, based on:
- Word popularity: word repetitions, location
- Site popularity: visits to the web site
- Link popularity: how often other pages link to the page

Results described
- From a few words to a paragraph
- Sometimes stars or other indicators of relevancy
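The three popularity signals can be combined into a single score and sorted, as a rough sketch of how "most relevant first" might work. The signals come from the slide, but the weights, the log damping of visit and link counts, and the names `relevance`, `rank`, and `PAGES` are invented for illustration; real engines' ranking formulas are not described here.

```python
import math

# Hypothetical scoring: the three signals come from the slide, but the weights
# and the log damping of popularity counts are invented for illustration.
def relevance(page, terms, w_word=1.0, w_title=3.0, w_site=0.5, w_link=2.0):
    body = page["text"].lower().split()
    title = page["title"].lower().split()
    word_popularity = sum(body.count(t) for t in terms)   # repetitions
    title_hits = sum(1 for t in terms if t in title)       # location
    site_popularity = math.log1p(page["visits"])           # visits to the site
    link_popularity = math.log1p(page["inbound_links"])    # pages linking to it
    return (w_word * word_popularity + w_title * title_hits
            + w_site * site_popularity + w_link * link_popularity)

def rank(pages, query):
    """Return pages sorted so the highest-scoring ("most relevant") come first."""
    terms = query.lower().split()
    return sorted(pages, key=lambda p: relevance(p, terms), reverse=True)

PAGES = [
    {"title": "Cooking", "text": "recipes", "visits": 5, "inbound_links": 0},
    {"title": "Travel in Russia", "text": "russian food and mafia films",
     "visits": 100, "inbound_links": 5},
    {"title": "Russian mafia", "text": "russian mafia history russian mafia",
     "visits": 10, "inbound_links": 1},
]

print([p["title"] for p in rank(PAGES, "Russian mafia")])
# ['Russian mafia', 'Travel in Russia', 'Cooking']
```

Changing the weights changes the order, which is one reason the same query can rank differently from engine to engine.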
How Engines and Directories Differ

Computers vs. people
- Engines: spiders, not editors, select documents

Quantity vs. quality
- Engines are big: want all, accept anything
- Directories are small: want the "best," the "important"

Technology vs. human judgment
- Engines: software ranks, no human evaluation
Top Search Engines (pages indexed)
- Google: 3.3 billion+
- AlltheWeb: 3.2 billion+
- HotBot (Inktomi): 3 billion+
- Teoma: 1.5 billion+
Directories, Search Engines and Defaults

If directories find little, they default to engines
- Yahoo defaults to Google
- Open Directory defaults to Google
- LookSmart defaults to Inktomi (HotBot)

Some search engines borrow directories
- Google uses the "Open Directory"

Learn the source of information when using a directory search box or a search engine's directory
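The "default to an engine when the directory finds little" behavior can be sketched as a simple fallback rule. Everything here is an assumption for illustration: the threshold `min_results`, the stub functions `directory_search` and `engine_search`, and the name `search_with_fallback` are hypothetical, since the slides do not describe the actual rules each portal used. Returning the source alongside the results mirrors the advice to learn where the information came from.

```python
# Hypothetical fallback rule: the threshold and both search functions are
# stand-ins, not the real logic of any directory or engine.
def search_with_fallback(query, directory_search, engine_search, min_results=5):
    """Try the directory first; if it finds too little, use the engine instead.

    Returns (source, results) so the caller can see where results came from.
    """
    results = directory_search(query)
    if len(results) >= min_results:
        return "directory", results
    return "engine", engine_search(query)

DIRECTORY = {"russian mafia": ["directory/crime/russia"]}

def directory_search(query):
    return DIRECTORY.get(query.lower(), [])

def engine_search(query):
    return [f"engine/result{i}" for i in range(10)]

source, hits = search_with_fallback("Russian mafia", directory_search, engine_search)
print(source, len(hits))  # engine 10
```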
Metasearch Engines
Technologies that search several search engines at the same time

Pros
- Increase results when a single search engine produces little
- Save time by searching several engines at once
- Show results of several engines on one page

Cons
- Retrieve too many hits
- Retrieve less relevant results
- Do not individualize search syntax for each engine:
  - Do not know whether to use and, AND, +, OR, or
  - Cannot interpret phrases, etc.
- Exclude certain large engines such as Google
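Showing results of several engines on one page means merging several ranked lists. One simple approach, assumed here for illustration and not necessarily what any real metasearch engine does, is round-robin interleaving with duplicates removed; `merge_results`, `engine_a`, and `engine_b` are hypothetical names with made-up results.

```python
from itertools import zip_longest

def merge_results(*ranked_lists):
    """Interleave ranked result lists round-robin, keeping the first copy of each URL."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):   # 1st of each list, then 2nd, ...
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]
print(merge_results(engine_a, engine_b))
# ['x.example', 'y.example', 'w.example', 'z.example']
```

Note that the merge only sees each engine's output; it cannot repair a query that one engine misread because its AND/OR or phrase syntax differs, which is the syntax problem listed above.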
Top Metasearch Engines
- Vivisimo: categorizes results, narrows topics
- Ez2find: includes most major engines
- Dogpile: refines results, covers major engines
What’s In Search Engines?
- Business, commercial information
- Organizational publications
- Government resources
- Magazine, newspaper excerpts
- Some scholarly information
- Teaching materials, unpublished articles
- Books, articles whose copyright has expired
What’s Not in Search Engines
- Books under copyright: most fiction and non-fiction in existence; recent, quality publications
- Journal, magazine, newspaper articles: most current and past research
- Reference books: most
- In short: the bulk of human knowledge and research
How to Find Search Engines
- Word of mouth, hearsay
- Newspaper, magazine articles
- Library web pages
- Guides to search engines
- HyperResearch
Search Basics

Identify, select keywords
- Topic: Effects of internet use on children
- Keywords: Internet, children, effects

Combine keywords to focus results
- Use OR, AND
- Use phrase searching
- Limit search to a field such as title or URL
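Turning a topic sentence into keywords, as in the "Effects of internet use on children" example, mostly means dropping small connecting words. The sketch below assumes a tiny hypothetical stop-word list tuned to this example (real lists are longer, and "use" is included only because it adds little to this topic); `STOP_WORDS` and `keywords` are illustrative names.

```python
import re

# Hypothetical stop-word list, tuned to this example: real lists are longer,
# and "use" is dropped here only because it adds little to this topic.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "use"}

def keywords(topic):
    """Reduce a topic sentence to its content words, in their original order."""
    words = re.findall(r"[a-z]+", topic.lower())
    return [w for w in words if w not in STOP_WORDS]

print(keywords("Effects of internet use on children"))
# ['effects', 'internet', 'children']
```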
OR Broadens
- Retrieves an article if it contains either keyword
- Use to connect similar words
- Use to increase results
OR Expands Results
- Internet = 15
- Internet OR Web = 50
- Internet OR Web OR digital = 90
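On an inverted index, OR is a set union: a document qualifies if it appears in any term's list, so each added term can only keep or grow the result count. The document IDs below are a hypothetical toy index (not the slide's search data), and `INDEX` and `search_or` are illustrative names.

```python
# Toy inverted index with hypothetical document IDs, not real search data.
INDEX = {
    "internet": {1, 2, 3},
    "web": {3, 4, 5},
    "digital": {5, 6},
}

def search_or(*terms):
    """OR: a document matches if it contains any of the terms (set union)."""
    result = set()
    for term in terms:
        result |= INDEX.get(term.lower(), set())
    return result

for query in (["internet"], ["internet", "web"], ["internet", "web", "digital"]):
    print(" OR ".join(query), "=", len(search_or(*query)))
# internet = 3
# internet OR web = 5
# internet OR web OR digital = 6
```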
AND Narrows
- Use to connect two different ideas
- AND between keywords means both terms must be in the record
- Use to decrease results

AND Reduces Results
- Children = 2,956,000
- Children AND Internet = 1,756
- Children AND Internet AND Homework (or simply: Children internet homework) = 26
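AND is the mirror image of OR: a set intersection, so each added term can only keep or shrink the result count. The toy index below is hypothetical (not the slide's data), and `INDEX` and `search_and` are illustrative names. Splitting a plain query on spaces gives the same result, which is what an engine that assumes AND between words is doing.

```python
# Toy inverted index with hypothetical document IDs, not real search data.
INDEX = {
    "children": {1, 2, 3, 4, 5, 6},
    "internet": {2, 3, 4, 7},
    "homework": {3, 8},
}

def search_and(*terms):
    """AND: a document matches only if it contains every term (set intersection)."""
    sets = [INDEX.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

for query in (["children"], ["children", "internet"], ["children", "internet", "homework"]):
    print(" AND ".join(query), "=", len(search_and(*query)))
# children = 6
# children AND internet = 3
# children AND internet AND homework = 1

# An engine that assumes AND treats plain spaces the same way:
print(search_and(*"Children internet homework".split()))  # {3}
```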
AND, OR and Search Engine Syntax

Use help or tips
- AND, and, OR, or, "+", "-"?
- Does the engine default to AND or OR?
- Do AND and OR have to be upper case?

Use ADVANCED SEARCH to learn options
- Is there a pull-down menu box?
- AND can mean "All the words"
- OR can mean "Any of the words"
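An advanced-search form's "All the words", "Any of the words", and exact-phrase boxes can be translated into a single typed query. This sketch assumes an engine that treats spaces as AND, accepts upper-case OR, double-quoted phrases, and parentheses for grouping; those are assumptions to check in each engine's help pages, and `build_query` is a hypothetical name.

```python
def build_query(all_words="", any_words="", exact_phrase=""):
    """Translate advanced-search form boxes into one query string (assumed syntax)."""
    parts = []
    if all_words:
        parts.extend(all_words.split())                  # "All the words": implicit AND
    if any_words:
        alternatives = any_words.split()
        group = " OR ".join(alternatives)                # "Any of the words": OR
        parts.append(f"({group})" if len(alternatives) > 1 else group)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')                # exact phrase: quoted
    return " ".join(parts)

print(build_query(all_words="children", any_words="internet web", exact_phrase="drug abuse"))
# children (internet OR web) "drug abuse"
```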
Phrase Searching
- Two or more words in consecutive order: juvenile delinquency, Russian mafia

How does the computer recognize a "phrase"?
- Pull-down menu: EXACT PHRASE
- Quotation marks: "drug abuse"

Phrase searches reduce hits, improve relevancy
- Russian mafia = 23,234
- "Russian mafia" = 789
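One common way to support phrase searching is a positional index, which records where each word occurs in each document. A phrase search is then an AND search followed by a check that the words sit in consecutive positions. The documents and the names `build_positional_index` and `phrase_search` are hypothetical; this is an illustration of the technique, not how any particular engine implements it.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map word -> {doc_id: [positions of the word in that doc]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase):
    """IDs of docs where the phrase's words occur consecutively, in order."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Step 1: an AND search, so every word must appear somewhere in the doc.
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    # Step 2: keep docs where word i sits at position start + i.
    hits = set()
    for doc in candidates:
        for start in index[words[0]][doc]:
            if all(start + i in index[w][doc] for i, w in enumerate(words)):
                hits.add(doc)
                break
    return hits

DOCS = {
    "d1": "The Russian mafia in Moscow",
    "d2": "Russian cooking and the mafia films",
    "d3": "A mafia movie with Russian actors",
}
index = build_positional_index(DOCS)
print(sorted(phrase_search(index, "Russian mafia")))  # ['d1']
print(sorted(phrase_search(index, "mafia films")))    # ['d2']
```

All three documents contain both words, so the plain AND search matches three; the phrase check keeps one, which is the "reduce hits, improve relevancy" effect in miniature.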
Field Searching

Common document "fields"
- Author, Title, Subject, Abstract, Text, URL

Limits search to words in particular fields
- Learn the syntax: title: ti: url:
  - ti:Russian mafia
  - url:russianmafia
- Use ADVANCED SEARCH
- Use a pull-down menu (in the title, in the URL)

Reduces hits, improves relevancy
- Russian mafia (all the words) = 23,234
- Russian mafia (in title) = 254
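Field searching amounts to checking only one part of each record. The sketch below uses hypothetical documents and the illustrative names `DOCS`, `field_matches`, and `field_search`; it assumes title and text fields are matched word by word while URLs are matched as substrings, since URLs such as `russianmafia` run words together.

```python
import re

# Hypothetical documents split into the kinds of fields listed above.
DOCS = {
    "d1": {"title": "The Russian Mafia Today",
           "text": "history of the russian mafia",
           "url": "www.example.com/russianmafia"},
    "d2": {"title": "Organized Crime in Europe",
           "text": "russian mafia groups in europe",
           "url": "www.example.org/crime"},
}

def field_matches(value, term, field):
    """True if term occurs in the field: substring for URLs, whole word otherwise."""
    value = value.lower()
    if field == "url":
        return term in value
    return term in re.findall(r"[a-z0-9]+", value)

def field_search(docs, field, terms):
    """IDs of documents whose given field contains every term."""
    wanted = [t.lower() for t in terms]
    return [doc_id for doc_id, fields in docs.items()
            if all(field_matches(fields.get(field, ""), t, field) for t in wanted)]

print(field_search(DOCS, "text", ["Russian", "mafia"]))   # ['d1', 'd2']
print(field_search(DOCS, "title", ["Russian", "mafia"]))  # ['d1']
print(field_search(DOCS, "url", ["russianmafia"]))        # ['d1']
```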
Can You Find the Answers?

Connect to Google
Search for items on Iran
Records 1,100,000
Combine Iran with nuclear weapons
Records 790,000
Combine Iran with the phrase "nuclear weapons"
Records 428,000
Use Advanced Search: combine Iran with the phrase "nuclear weapons" so that all the words appear in the title of documents
Records 274
Homework
- Use major search engines: AlltheWeb, Google, Teoma
- Use a metasearch engine: Vivisimo
- Practice using AND, OR, phrase, and field searching
