Search Tips - Freedom Forum

Search Tips
or, in competition with search robots
Inspired by
Mary Ellen Bates’ workshop
“Tips From a Super Searcher: Getting the Most From
the Web and Online Sources”, Prague, 2003.
Toshka Borisova
AUBG Freedom Forum Journalism Library Coordinator
Search Tips
 The World Wide Web contains more
information than any other single resource in
existence today. Finding the information you
are looking for among billions of web pages
can be tough. These search tips will put you
on the road to finding information quickly
and effectively.
 Web search tips
 The invisible web
19 and 26 June 2003
Toshka Borisova
Online Search Strategies
What are you looking for:
 Full text or abstracts?
 Current material or 10 years back?
 Basic or advanced material?
 Short or in-depth articles?
 Any "validating" sources?
 Exact match or something close?
 Leads to identify experts to call?
 White papers (official sets of proposals in specific policy
areas), statistics and other information more likely to
be on web sites?
Online Search Tips
 Use "advanced search" option
 http://www.aubg.bg/library/text.php?i=680
 Google: Well known as the "king of search," this engine
has one of the largest databases of web pages in the
world. Fast, accurate results are common here, and
chances are good that if you can't find it in Google, it's
not meant to be found.
Online Search Tips
 Plan on two separate search sessions
 Be sure to value your time
– White paper on the true cost of searching
the open web vs. the professional online
services: www.factiva.com/infopro/BusIntellletter.pdf
 Assume you will find something
 We have higher relevance expectations
than our patrons
 Watch for what's not online
Online Search Tips
 Watch for references to "grey literature":
"That which is produced on all levels of government, academics,
business and industry in print and electronic formats, but which is
not controlled by commercial publishers."
 Include www or http in your search strategies
to find mentions of web sites
 Always use several tools for the same search
 Watch for alternate spellings and
phrasings
 Use same words in different order
Web Search Tips
 Use tools, not search engines. There is
absolutely no pattern
 Wayback Machine
http://www.archive.org/
 Purge your "assumptions cache" regularly
 Keep a trail of where you have been
 Be sure to value your time
Web Search Tips
 When exploring a site, use the Site Map or Site Index
 Use the [Search This Site] feature to find hidden pages
 Know the "power tools" of each search engine
– Field searches
– File-type searches
– Limits by date, language, site
– Truncation
– Boolean
Search Tips
 Keyword Search
Many search engines by default offer a keyword search
 Phrase Search.
 Boolean Operators
Named after mathematician George Boole, Boolean logic
involves the operators AND, OR, NOT, and occasionally NEAR
Online Search Tips
Keyword Search
 Use KWIC (Key Word In Context)
Try to find synonyms, acronyms
http://www.keyworddensity.com/
http://www.wordtracker.com/
 Search for key words in title
 Use the "at least X times" feature
(DJI/Factiva, LexisNexis, Dialog)
Web Search Tips
Phrase Searching
Requires the terms to appear in the exact order that
they are typed. Most systems that allow phrase
searching have the user enter the phrase in quotes.
"national endowment for the arts"
 Phrase searching - Supported by all engines
 Google - Phrases may not be on page
 Teoma - "Not always exact matches" (FIXED)
 Openfind - Debuted in beta form on July 5, 2002.
Openfind is a new, large, independently built search engine, initially
claiming 3.5 billion pages. It is based on research in Taiwan and has a
Chinese version as well. None available now
Web Search Tips
Boolean operators
 Just use it wisely
– Simple ANDs, ORs
– Narrows results
 Boolean NOT ( - )
– Exclude meaning
– Exclude domains
 Boolean OR
– Crucial synonyms
– Need more pages
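The three operators can be pictured as set operations on the pages that contain each term. The sketch below is only an illustration: the tiny index, its page numbers and the term "bills" are made up, and real search engines do far more than this.

```python
# Toy inverted index: each term maps to the set of page IDs that contain it.
# All terms and page IDs are invented for illustration.
index = {
    "yellowstone": {1, 2, 3, 5},
    "bison": {1, 4},
    "buffalo": {2, 4, 6},
    "bills": {6},
}

def pages_with(term):
    """Return the set of page IDs containing the term (empty if unknown)."""
    return index.get(term, set())

# AND narrows results: pages must contain both terms.
and_hits = pages_with("yellowstone") & pages_with("bison")

# OR broadens results: pages may contain either synonym.
or_hits = pages_with("bison") | pages_with("buffalo")

# NOT excludes a meaning: drop pages about the Buffalo Bills.
not_hits = pages_with("buffalo") - pages_with("bills")

print(sorted(and_hits))  # [1]
print(sorted(or_hits))   # [1, 2, 4, 6]
print(sorted(not_hits))  # [2, 4]
```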
Web Search Tips
To OR or not to OR
 Google: OR in CAPS, advanced
– Does not always work right
– yellowstone bison OR buffalo
 AlltheWeb: use ( ) or Advanced Boolean Box
– yellowstone (bison buffalo)
 AltaVista: normal
– yellowstone AND (bison OR buffalo)
 Gigablast: Use + (but not the same)
– +yellowstone bison buffalo
 Teoma
– yellowstone bison OR buffalo
– Becomes (yellowstone AND bison) OR buffalo
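Why the grouping matters can be shown with sets. This is a minimal sketch with invented page IDs; it mirrors the two readings of the Yellowstone query above, not the internals of any particular engine.

```python
# Invented page IDs for each term, for illustration only.
yellowstone = {1, 2, 3, 5}
bison = {1, 4}
buffalo = {2, 4, 6}

# Reading 1: (yellowstone AND bison) OR buffalo
# Any page mentioning buffalo gets in, even without yellowstone.
ungrouped = (yellowstone & bison) | buffalo

# Reading 2: yellowstone AND (bison OR buffalo)
# Every page must mention yellowstone plus one of the synonyms.
grouped = yellowstone & (bison | buffalo)

print(sorted(ungrouped))  # [1, 2, 4, 6]
print(sorted(grouped))    # [1, 2]
```

If the result list looks suspiciously large after adding an OR, the first reading may be what the engine actually ran.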
Web Search Tips
 Proximity
– Text matching
– citation hunt
– plagiarism check
– Q&A
 NEAR and Other Proximity
– AltaVista only
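A proximity operator such as NEAR can be sketched as a check on word positions. The function name `near` and the default window of 10 words are assumptions for illustration, not a description of how AltaVista implements it.

```python
def near(text, term_a, term_b, window=10):
    """Return True if term_a and term_b occur within `window` words of each other."""
    # Split on whitespace and strip common punctuation from each word.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in positions_a for j in positions_b)

sentence = "The bison herd grazes in Yellowstone every summer."
print(near(sentence, "bison", "Yellowstone", window=5))  # True
print(near(sentence, "bison", "Yellowstone", window=3))  # False
```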
Web Search Tips
 Truncation
Searches for variants of a word by using a symbol to
represent one or more characters. The most common
symbols are * (asterisks), ? (question marks), and !
(exclamation marks). If truncation is not supported by
the search engine use the Boolean operator OR to
combine like terms.
– AltaVista truncation
– HotBot & MSN truncation
 Another term: "stemming". MSN, e.g., finds "movies" if your
search word is "movie"
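The truncation idea, and the OR fallback for engines that do not support it, can be sketched as follows. The word list and the helper names `truncation_match` and `or_query` are hypothetical.

```python
def truncation_match(pattern, vocabulary):
    """Return the words matching a right-truncated pattern such as 'librar*'."""
    if pattern.endswith("*"):
        stem = pattern[:-1].lower()
        return [w for w in vocabulary if w.lower().startswith(stem)]
    # No truncation symbol: exact match only.
    return [w for w in vocabulary if w.lower() == pattern.lower()]

def or_query(words):
    """Combine like terms with OR for engines without truncation."""
    return " OR ".join(words)

vocabulary = ["library", "libraries", "librarian", "liberty", "movie", "movies"]
matches = truncation_match("librar*", vocabulary)
print(matches)            # ['library', 'libraries', 'librarian']
print(or_query(matches))  # library OR libraries OR librarian
```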
Web Search Tips
 Case Sensitive (e.g., "alaskan pipeline", with the incorrect lowercase "a")
– AltaVista Advanced or Quoted Simple
– MIT vs. mit or IT vs. it
Web Search Tips
 Wild Card Word in Phrase
Wild Card characters represent undefined letters or numerals in a
search term. Wild Card characters allow for retrieval of:
- Singular and plural word forms
- Spelling variations (e.g., British/American spellings)
- Word stems with prefixes and suffixes
* - Represents zero to any number of characters at the
beginning or end of a term.
*GROW* - Possible retrievals: GROW, GROWS, OUTGROWTH
? - Represents exactly one character within a term...
T??TH - Possible retrievals: TEETH, TOOTH, TRUTH
...or one character at the end of a term.
AMIN? - Possible retrievals: AMINE, AMINO
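The wild card rules above happen to match the pattern syntax of Python's standard `fnmatch` module, where `*` stands for zero or more characters and `?` for exactly one. The sketch below uses it to check the slide's examples; the word list and the helper name `wildcard_hits` are made up for illustration.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, words):
    """Return the words that match a wild card pattern (* = any run, ? = one character)."""
    return [w for w in words if fnmatchcase(w, pattern)]

words = ["GROW", "GROWS", "OUTGROWTH", "GREW",
         "TEETH", "TOOTH", "TRUTH", "TOUCH",
         "AMINE", "AMINO", "AMIN", "AMINES"]

print(wildcard_hits("*GROW*", words))  # ['GROW', 'GROWS', 'OUTGROWTH']
print(wildcard_hits("T??TH", words))   # ['TEETH', 'TOOTH', 'TRUTH']
print(wildcard_hits("AMIN?", words))   # ['AMINE', 'AMINO']
```

Note that `AMIN?` does not retrieve AMIN or AMINES, because `?` stands for exactly one character.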
Web Search Tips
 Field Searching
Field searching allows the searcher to designate
where a specific search term will appear. Rather than
searching for words anywhere on a Web page, the searcher
limits the search to a field, a specific structural unit of a
document. The title, the URL, an image tag, or a hypertext
link are common fields on a Web page.
 How search engines work
Spidering program - Collect links
Indexing program - Include metatags
Search/retrieval program - Sort results
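The three programs can be pictured with a toy pipeline. This is a rough sketch under heavy assumptions: the "crawled" pages are hard-coded stand-ins for what a spider would collect, the URLs are invented, and ranking by simple word count is only one of many possible sort orders.

```python
from collections import defaultdict

# Stand-in for the spidering program's output (no network access here).
crawled = {
    "http://example.org/a": "Bison roam Yellowstone. Bison are large.",
    "http://example.org/b": "Yellowstone has geysers.",
}

def build_index(docs):
    """Indexing program: map each word to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in docs.items():
        for word in text.lower().split():
            word = word.strip(".,")
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, term):
    """Search/retrieval program: return URLs sorted by occurrence count."""
    postings = index.get(term.lower(), {})
    return sorted(postings, key=lambda url: postings[url], reverse=True)

index = build_index(crawled)
print(search(index, "bison"))  # ['http://example.org/a']
```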
Web Search Tips
 Link Searching
Pages include a link to the specified URL.
Link Updates, Impact Analysis
- Best at AltaVista, AlltheWeb
– Can have different results for
http://www.name.org/
Example: http://www.freedomforum.org/ - finds pages with links to
this site
 Title:searching will look for the word 'searching' in the
title of a Web page. Hits have the term(s) in the HTML
title element.
title:"search engines"
Web Search Tips
Field Searching
 IP: Page is in the specified IP range. Incomplete numbers
are truncated.
ip:216.32.120 finds any computer in 216.32.120.*
 Site: Results are only from the specified site.
site:nasa.gov - finds pages at NASA's Web site
 Suburl: Pages have the term(s) somewhere in the URL
(host name, path, or filename).
suburl:searchenginewatch
 URL: Result must be exactly this URL and nothing
else.
url: www.slashdot.com/index.html
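How a field operator restricts a search can be sketched on a few page records. Everything here is an assumption for illustration: the page records are invented, `field_search` is a hypothetical helper, and only `site:`, `title:` and `suburl:` are handled, in a much simpler way than a real engine would.

```python
from urllib.parse import urlparse

# Invented page records standing in for a search engine's database.
pages = [
    {"url": "http://www.nasa.gov/missions/index.html", "title": "NASA Missions"},
    {"url": "http://searchenginewatch.com/facts/", "title": "Search Engines at a Glance"},
    {"url": "http://www.example.org/nasa-review.html", "title": "A Review of Search Tools"},
]

def field_search(query, pages):
    """Return URLs of pages matching a 'field:value' query."""
    field, _, value = query.partition(":")
    value = value.strip().strip('"').lower()
    hits = []
    for page in pages:
        host = urlparse(page["url"]).netloc.lower()
        if field == "site" and (host == value or host.endswith("." + value)):
            hits.append(page["url"])      # host name belongs to the site
        elif field == "title" and value in page["title"].lower():
            hits.append(page["url"])      # term(s) appear in the title
        elif field == "suburl" and value in page["url"].lower():
            hits.append(page["url"])      # term appears anywhere in the URL
    return hits

print(field_search("site:nasa.gov", pages))
print(field_search('title:"search engines"', pages))
print(field_search("suburl:nasa", pages))
```

The `site:` query returns only the NASA page, while `suburl:nasa` also picks up the example.org page, because its file name contains "nasa".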
Web Search Tips
– Field Searching
title:    AltaVista, AlltheWeb, HotBot, Lycos, Gigablast
intitle:  Google, Teoma
url:      AltaVista, AlltheWeb, Lycos, Gigablast
inurl:    Google, Teoma
site:     AlltheWeb, Gigablast, Google, Teoma
link:     AltaVista, Google, AlltheWeb, HotBot, Gigablast
anchor:   AltaVista
image:    AltaVista
Web Search Tips
 Selected Limits
Usually on advanced search form
Language: Offered at most engines; the languages vary
Date: AlltheWeb, AltaVista, Google, Inktomi
– Cut out old material, focus search
standard
– Or to find old information
File Type: AlltheWeb, AltaVista, Google, Inktomi
– PDFs at all, Flash at AlltheWeb
Media Type: HotBot, MSN, AlltheWeb
Page Size: AlltheWeb
IP Range: AlltheWeb
Web Search Tips
 Diacritics: é
Does e find é? - Sometimes
 Not at Google
– Exact match on diacritics only
 At other search engines
– e usually finds e OR é
– é usually finds only é
 Use English equivalents for special letters and
omit diacritics
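Omitting diacritics from a search term can be done mechanically. The sketch below uses Python's standard `unicodedata` module to decompose each letter and drop the combining marks; the helper name `strip_diacritics` is hypothetical, and letters that have no decomposition (for example ø) are left unchanged.

```python
import unicodedata

def strip_diacritics(text):
    """Return text with combining accent marks removed (é becomes e)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("café"))    # cafe
print(strip_diacritics("résumé"))  # resume
```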
Web Search Tips
Counting Complexities
 Search Engines Can’t Count
Only the big search engines count (the top 10 search engines)
 Numbers constantly change
– From one page of results to the next
– From one minute to the next
 Try reloading for more
Web Search Tips
Feature Inconsistencies
 Database Changes
– Constant
– If they don’t . . .
• They get old, out-of-date, dead links
– Size Changes Often Sudden
– Database Reversions
– Searching Failures And Other Unexpected Results
On the Fly Analysis
 Always Question Results
 Evaluate and Compare
 Find one unique, low-posted term
– Use for search engine comparisons
– Evaluate change over time
 “On-the-Fly Search Engine Analysis.” ONLINE 23(5):63-66, Sept.
1999. onlinemag.net/OL1999/net9.html
Web Search Tips
SEO - Search Engine Optimization
 SearchEngineShowdown.com
More on Advanced Features
Feature Chart
Detailed Reviews
 Search Engine Watch
http://www.searchenginewatch.com/facts/ataglance.html
Inconsistencies
Low Recall or "I am not finding any sites on my
topic!!"
 Have I chosen the correct database?
 Have I been too specific in formulating the search?
 Have I included all possible terms and word forms? Should I
use truncation?
 Was Boolean logic used correctly?
 Did I make a technical error, e.g., spelling, or command
syntax?
Low Precision or "I found hundreds of citations and
many are not on my topic!!"
 Delete less specific synonyms and ambiguous terms
 Search fewer fields, e.g., just the title field or URL
 Add additional facets with AND or NOT
 Add restrictions, e.g., date of publication
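Recall and precision can be put into numbers when you know which pages are actually relevant. This is a minimal sketch using the standard definitions (recall = relevant pages found / all relevant pages, precision = relevant pages found / all pages found); the page IDs are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant pages."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# The search returned 4 pages; 5 relevant pages exist; 2 pages are in both sets.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 6, 8, 10})
print(precision)  # 0.5
print(recall)     # 0.4
```

Broadening a search (more synonyms, truncation) tends to raise recall, while narrowing it (fewer fields, extra AND or NOT facets) tends to raise precision.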
The Invisible Web
What is it?
It consists of searchable information resources whose
contents cannot be indexed by traditional search
engines.
 Content in databases
 Professional online services
 Non-ASCII files
 Sites that require log-in or registration
 Real-time information
 Dynamically-created web pages
 Discussion forums and BBSs
Searching the Invisible Web
 Much "invisible" content has a
"visible web" front
 Some databases are opening up
Google searches PDF, XLS, RTF, DOC files
Searching the Invisible Web
 Use directories and portals
-Open Directory Project http://www.dmoz.org is the
largest, most comprehensive human-edited directory of
the Web. It is constructed and maintained by a vast,
global community of volunteer editors.
-Librarian’s Index to the Internet http://www.lii.org
-Subject-specific directories http://www.econ.bg
 Experts and info pros watch for this material
Experts.com www.experts.com
A reliable and diverse source of experts, many of whom
are outside the academic arena.
 Yahoo - http://groups.yahoo.com/
 Search for database or forum along with subject terms
Searching the Invisible Web
 Use meta-search engines
 DogPile.com
 MetaCrawler.com
 Use Teoma.com's "Experts' Links"
 Scan the libraries of relevant
discussion groups
 Lurk on lists
Searching the Invisible Web
 Use reverse link look-up to find "more
like this"
 Google and AltaVista:
link:www.BatesInfo.com
 HotBot: http://www.hotbot.com/
link:www.aubg.bg/fforum - use [Links to this URL]
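What a reverse link look-up does can be sketched on a small link graph. The graph below is invented for illustration (the example.* pages do not describe real sites), and `pages_linking_to` is a hypothetical helper, not any engine's API.

```python
# Invented link graph: each page maps to the list of URLs it links to.
link_graph = {
    "http://a.example/": ["http://www.aubg.bg/fforum", "http://b.example/"],
    "http://b.example/": ["http://www.BatesInfo.com"],
    "http://c.example/": ["http://www.aubg.bg/fforum"],
}

def pages_linking_to(target, graph):
    """Return the pages whose outgoing links include the target URL."""
    return sorted(source for source, targets in graph.items() if target in targets)

print(pages_linking_to("http://www.aubg.bg/fforum", link_graph))
# ['http://a.example/', 'http://c.example/']
```

Pages that link to a site you already trust often cover the same subject, which is why this works as a "more like this" search.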
The Invisible Web
Invisible Web Directories
 http://www.invisibleweb.com/
The InvisibleWeb Catalog™ contains over 10,000 databases and
searchable sources that have been frequently overlooked by traditional
searching.
 CompletePlanet.com Contains 103 searchable databases
 DirectSearch Difficult to use but extensive
 http://www.internets.com/ The site says it has assembled the
largest filtered collection of useful search engines and newswires
anywhere on the World Wide Web. There are 1-2 billion documents on
the "surface web". The deep web is estimated to be approximately 500
billion documents.
 Good hierarchy of databases
Web Search Tips
Set aside one afternoon every two
weeks for your web reading!
More info
http://www.BatesInfo.com