Stop Searching and Start
FINDING: Strategies for
Effective Web Research
a presentation by
Ken Wiseman
& IMSA
Our goals today ...

• Discover the biggest mistakes made by most Internet users:
  • Typing search terms in the wrong box.
  • Using the wrong tool at the wrong time.
• Talk about the differences between directories and search engines (and when to use each).
• Learn some advanced Google searching techniques.
• DO ALL OF THIS IN ENGLISH!
To Start
Put the right request in the right
place
Put Web addresses in the address box
(this is the URL stuff that begins with
http://)
Put search terms (the stuff you
are looking for) in the search
box
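As a rough illustration of this rule, the short Python sketch below guesses which box a piece of text belongs in. The function names are hypothetical, and the check is deliberately simple: it only recognizes text that begins with http:// or https://. It is not how any browser actually decides.

```python
from urllib.parse import urlparse


def looks_like_web_address(text: str) -> bool:
    """Return True if text looks like a Web address rather than search terms."""
    text = text.strip()
    if " " in text:
        # Web addresses do not contain spaces; search terms usually do.
        return False
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def where_to_type(text: str) -> str:
    """Name the box the text should go in."""
    return "address box" if looks_like_web_address(text) else "search box"


print(where_to_type("http://www.about.com/"))   # address box
print(where_to_type("treaty of westphalia"))    # search box
```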
The Biggest Mistake
Thinking that search tools are
card catalogs of the web
9 Billion pages reside on the Web
(4/02)
• No search tool indexes all of the
web.
• The largest, Google, indexes less
than 30% of the total (3 billion).
• Each engine indexes a different set
of web pages.
Search Engine Size (millions of pages indexed)
Google       3033
AlltheWeb    2106
AltaVista    1689
Wisenut      1453
Hotbot       1147
MSN          1018
Teoma        1015
NL            733
Gigablast     275
Source: Search Engine Showdown - 12/02
The Second Biggest
Mistake
Using the wrong tool at the wrong
time
Three questions

• Where would you find the telephone number or address of the Woodfield theatre?
  • A telephone book
• Where would you find the definition of the word “pestilence”?
  • A dictionary (or in your school yearbook)
• Where would you find the name of the war that the Treaty of Westphalia ended?
  • An encyclopedia
What would happen if
you tried to look up the
definition of the word
“pestilence” in the
telephone book?
YAHOO ISN’T A
SEARCH
ENGINE!
... it is a directory.
(but this may be changing)
Directories

• Usually human-compiled guides to the web, where sites are organized by category
• Major directories:
  • MSN
  • Yahoo
  • Netscape ODP
Directories
How Internet Directories Work
[Diagram: a directory employee visits Web sites and Internet/Web pages, evaluates them, and adds and catalogs them in the directory's searchable index and browsable categories. A user seeking information searches the index or browses the categories, and receives info from the directory server.]
What directories are good for ...
• “What is the Web page address for some company, organization, or entity?” (or “who makes product X?”)
• “Where can I find a list of Web pages that focus on a particular, ‘universal’ topic?”
• In other words, directories are GREAT for “telephone book” searches.
What directories AREN’T good for ...
• Directories are horrible for “encyclopedia” or “dictionary” searches.
• The only exception is if the topic is so universal that the directories have no choice but to link to a page or two that discuss that topic (and even then the selection will be slim).
Search Engines have three parts:
1. A spider (also called a "crawler" or a "bot") that goes to every page or
   representative pages on every Web site that wants to be searchable and
   reads it, using hypertext links on each page to discover and read a
   site's other pages.
2. A program that creates a huge index (sometimes called a "catalog") from
   the pages that have been read.
3. A program that receives your search request, compares it to the entries
   in the index, and returns results to you.
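The three parts can be sketched in a few lines of Python. This is a toy model under stated assumptions: the four-page PAGES "web", its page names, and the function names crawl, build_index and search are all made up for illustration, and real engines are vastly more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical four-page "web": page name -> (text, links to other pages).
PAGES = {
    "home": ("Welcome to the Disney fan site", ["rides", "parks"]),
    "rides": ("Pirates of the Caribbean is a boat ride", ["home"]),
    "parks": ("Fantasyland and Adventureland are park areas", ["rides"]),
    "orphan": ("No page links here, so the spider never reads it", []),
}


def crawl(start):
    """Part 1, the spider: follow links from the start page, reading each page once."""
    seen, queue = {}, [start]
    while queue:
        name = queue.pop(0)
        if name in seen or name not in PAGES:
            continue
        text, links = PAGES[name]
        seen[name] = text
        queue.extend(links)
    return seen


def build_index(pages_read):
    """Part 2, the indexer: map each lowercase word to the pages containing it."""
    index = defaultdict(set)
    for name, text in pages_read.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Part 3, the query program: return pages containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(crawl("home"))
print(sorted(search(index, "pirates caribbean")))  # ['rides']
print(sorted(search(index, "spider")))             # []
```

Note that the page nothing links to never reaches the index, so no query can find it.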
Directories vs Search Engines
• Directories are human-compiled and have a small number of pages in their databases (usually in the low millions)
• Search engines are machine-compiled and have a HUGE number of pages in their databases (usually in the hundreds of millions or even the billions)
The Second Biggest
Mistake -- Restated
Using a directory as if it were a
search engine ... and then not
understanding why you can’t find
anything!
Top search sites – January 2002
1. MSN
2. Yahoo
3. Google
4. AOL
5. Ask Jeeves
6. LookSmart
7. Infospace
8. Overture
9. Netscape
10. AltaVista
-- Courtesy Jupiter Media Metrix
Which ones are directories?
1. MSN
2. Yahoo
3. Google
4. AOL
5. Ask Jeeves
6. LookSmart
7. Infospace
8. Overture
9. Netscape ODP
10. AltaVista
Why Use a Search Engine?
[Chart: Web Pages Indexed by Directories, in millions (scale 0 to 3), by directory name: Open Directory, LookSmart, MSN, Yahoo, NBCi/Snap, Britannica, Google. Chart label: 6 Billion+]
Secondary results
• Most directories use a search engine as a backup (Yahoo and Netscape use Google; almost everyone else uses Inktomi)
• Why add the extra step?
How the sites stack up
• Most directories (like MSN and AOL) link to 2 or 3 million pages.
• Most search engines (like AlltheWeb and Google) link to billions of pages.
-- Courtesy searchenginewatch.com
Why do people
predominantly use
directories when search
engines have more
stuff? Because no one
ever takes the time to
teach us how to use a
search engine!
The Third Biggest
Mistake
Not knowing how to use
directories or search engines to
actually FIND stuff
Search engine rule #1
Be specific ... because if
you aren’t specific, you’ll
end up with a bunch of
garbage!
Preparing to Search
• Formulation of the research question
• Identification of important concepts within the question
• Identification of search terms to describe those concepts
• Consideration of synonyms and variations of those terms
• Take a look at Vivisimo clustered results for help
• Preparation of the search logic
Search engine rule #2
Use quotes to search for
phrases.
“ken wiseman”
Use quotes for phrases

• To search for phrases, just put your phrase in quotes.
• For example, disney fantasyland “pirates of the caribbean”
• This would show you all the pages in Google’s index that contain the word disney AND the word fantasyland AND the phrase pirates of the caribbean
• By the way, while this search is technically OK, my choice of keywords contains a (deliberate) factual mistake. Can you spot it?
Arr, She Blows!
• Pirates of the Caribbean isn’t in Fantasyland; it’s in Adventureland in Orlando and New Orleans Square in Anaheim.
• So searching for disney AND fantasyland AND “pirates of the caribbean” probably isn’t a good idea.
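To make the difference between keywords and a quoted phrase concrete, here is a minimal Python sketch. The two sample pages and the function names are invented for illustration; the matching is a simplification and not Google's actual method.

```python
import re


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all_words(page_text, query):
    """Keyword search: every query word appears somewhere on the page."""
    words = set(tokenize(page_text))
    return all(word in words for word in tokenize(query))


def contains_phrase(page_text, phrase):
    """Phrase search: the query words appear side by side, in order."""
    words, target = tokenize(page_text), tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


scattered = "Stories of pirates in the Caribbean"
exact = "Ride Pirates of the Caribbean at Disney"

print(contains_all_words(scattered, "pirates of the caribbean"))  # True
print(contains_phrase(scattered, "pirates of the caribbean"))     # False
print(contains_phrase(exact, "pirates of the caribbean"))         # True
```

Without quotes, the first page matches even though it never mentions the ride; with quotes, only the second page does.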

Search engine rule #3
Use the + sign to
require.
apple +computer
Search engine math:
+ means AND
Apple AND Computer
Only returns pages with both of
these terms on them
Limits your search
Search engine rule #4
Use the - sign to
exclude.
apple -computer
Search Engine Math:
- means NOT
Women NOT History
Only returns pages that contain
the first term but not the second
Limits your search
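The + and - signs can be sketched as a simple page filter. This is an illustrative toy, not any engine's real code: the function name and the two sample pages are made up, and words without a sign are treated as required too.

```python
def matches(page_text, query):
    """Apply +term (must appear) and -term (must not appear) to one page."""
    words = set(page_text.lower().split())
    for token in query.lower().split():
        if token.startswith("-"):
            # Excluded word: reject the page if the word is present.
            if token[1:] in words:
                return False
        else:
            # A leading + marks a required word; in this sketch plain
            # words are treated as required as well.
            if token.lstrip("+") not in words:
                return False
    return True


pages = {
    "orchard": "apple pie recipes from the orchard",
    "laptop": "apple computer laptop reviews",
}

print([name for name, text in pages.items() if matches(text, "apple -computer")])
# ['orchard']
print([name for name, text in pages.items() if matches(text, "apple +computer")])
# ['laptop']
```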
Boolean OR
Sometimes the default AND gets in the way.
That’s where OR comes in.
• The Boolean operator OR is always in caps and goes between keywords.
• For example, an improvement over our earlier search would be disney fantasyland OR “pirates of the caribbean”
• This would show you all the pages in Google’s index that contain the word disney AND either the word fantasyland OR the phrase pirates of the caribbean (without the quotes)
Three Ways to OR at Google

• Just type OR between keywords
  disney fantasyland OR “pirates of the caribbean”
• Put your OR statement in parentheses
  disney (fantasyland OR “pirates of the caribbean”)
• Use the | (“pipe”) character in place of the word OR
  disney (fantasyland | “pirates of the caribbean”)
• All three methods yield the exact same results.
Search engine math:
OR
Broadens your search
Women OR History
Returns every page with either of
these terms on them
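"Search engine math" maps neatly onto set operations. The sketch below assumes a made-up index in which each keyword or phrase points to a set of page numbers; the numbers are invented purely to show how AND, NOT and OR narrow or broaden a result.

```python
# Hypothetical index: keyword or phrase -> set of page IDs containing it.
index = {
    "disney": {1, 2, 3, 4},
    "fantasyland": {2, 5},
    "pirates of the caribbean": {3, 6},
    "history": {4, 6, 7},
    "women": {6, 7, 8},
}

# AND limits: only pages found in both sets.
women_and_history = index["women"] & index["history"]

# NOT limits: pages in the first set that are not in the second.
women_not_history = index["women"] - index["history"]

# OR broadens: pages found in either set.
women_or_history = index["women"] | index["history"]

# disney (fantasyland OR "pirates of the caribbean")
disney_either = index["disney"] & (
    index["fantasyland"] | index["pirates of the caribbean"]
)

print(sorted(women_and_history))  # [6, 7]
print(sorted(women_not_history))  # [8]
print(sorted(women_or_history))   # [4, 6, 7, 8]
print(sorted(disney_either))      # [2, 3]
```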
OR, She Blows!

• Just remember, Google’s Boolean default is AND.
• Sometimes the default AND gets in the way. That’s where OR comes in.
How Insensitive!


• Google is not case sensitive.
• So, the following searches all yield exactly the same results:
  disney    fantasyland    pirates
  Disney    Fantasyland    Pirates
  DISNEY    FANTASYLAND    PIRATES
  DiSnEy    FaNtAsYlAnD    pIrAtEs
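Case insensitivity is usually achieved by folding every term to one case before comparing. A minimal Python sketch of the idea follows; the normalize function is hypothetical and says nothing about how Google itself is built.

```python
def normalize(term: str) -> str:
    """Fold a search term to a single case so capitalization is ignored."""
    return term.casefold()


variants = ["disney", "Disney", "DISNEY", "DiSnEy"]

# Every spelling collapses to the same normalized term.
print({normalize(v) for v in variants})  # {'disney'}
```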
Search engine rule #5
Combine symbols as often as
possible
(see rule #1).
+“Martha Washington” -george +revolution
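Combining the symbols is mechanical enough to sketch in code. The helper below is hypothetical (its name and parameters are not part of any search engine's interface); it simply assembles a query string from words to require and words to exclude, adding quotes around anything that contains a space.

```python
def build_query(required=(), excluded=()):
    """Assemble a query string using +, - and quotes for multi-word phrases."""
    def quote(term):
        # Multi-word terms are phrases, so they need quotes.
        return f'"{term}"' if " " in term else term

    parts = [f"+{quote(term)}" for term in required]
    parts += [f"-{quote(term)}" for term in excluded]
    return " ".join(parts)


print(build_query(required=["Martha Washington", "revolution"],
                  excluded=["george"]))
# +"Martha Washington" +revolution -george
```

The order of the terms differs from the slide's example, but the same words are required and excluded.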
The five rules
1. Be specific ... because if you aren’t specific, you’ll end up with a bunch of garbage!
2. Use quotes to search for phrases.
3. Use the + sign to require.
4. Use the - sign to exclude.
5. Combine symbols as often as possible (see rule #1).
6. Don’t forget OR
Did You Know…
• That large chunks of the Web are invisible to most search engines?
• That no one has a good handle on the magnitude of the invisible web?
• That much of the invisible web is of great value to educators & students?
So What?
• Would you intentionally exclude large chunks of the Library of Congress’ 12 million documents from your searches?
• How about the US Census Bureau?
• How about health and medical databases?
• Many newspapers?
What Today’s Search Tools
Can and Cannot find
• These limitations are not specific to any one search tool.
• Search tools were created to handle flat HTML pages.
• When confronted with a search box, the search tool is stopped unless it has specific instructions on how to handle that input box.
• Dynamically created web pages have unusual URLs.
What Today’s Search Tools
Can and Cannot find

• The LII page for Automobile (http://lii.org/search/file/automobiles) is in Google;
• The LII page for Motorcycles (http://lii.org/search?title=Motorcycles;query=Motorcycles;searchtype=subject) is not.
Do you see why NOT??
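One way to see the difference: the second address carries a query string (everything after the ?), a common sign of a dynamically created page. The sketch below checks for that with Python's standard urllib.parse module. Treating "has a query string" as "may be skipped" is an assumption made for illustration; whether a given engine actually skips such a page depends on that engine. The Motorcycles address is written here without the spaces that appear in the slide text.

```python
from urllib.parse import urlparse


def has_query_string(url: str) -> bool:
    """Return True if the address has a ?key=value part after the path."""
    return bool(urlparse(url).query)


flat_page = "http://lii.org/search/file/automobiles"
dynamic_page = (
    "http://lii.org/search"
    "?title=Motorcycles;query=Motorcycles;searchtype=subject"
)

print(has_query_string(flat_page))     # False
print(has_query_string(dynamic_page))  # True
```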
Simple Examples
Ken Wiseman - 602 Hits
None contain my contact info
But…
Typical Search Pages
Finding Specialized databases in Google…
Adding “searchable
database” reduced the
hits from 14,000 to 140
and added additional
resources not found in
the standard search!
Index of Specialized Databases

• Beaucoups
  • Listing of specialized search tools
  • Most of the information is invisible to general search engines
  • Over 2500 search tools listed.
Other Invisible Web Recovery
sites
Don’t Forget
Portals…
http://www.homeworkspot.com/
http://www.skewlsites.com/
http://www.awesomelibrary.org/
http://www.about.com/