Searching the Internet

Without a clear search
strategy, using a search engine
is like wandering aimlessly in the
stacks of a library trying to find
a particular book.
-Debbie Flanagan
Two Key Steps for Successful Searching
• You must Prepare your search.
– identify the main concepts in your topic
• You need to know how to use the
search tools available on the
Internet.
– What are these tools?
– How are they different?
– How to best make use of these tools?
What are the search tools?
An INDEX
• It is not possible to search the WWW directly
– You search a search tool's database (its collection of sites). The
search tool provides you with links to the pages you wish to
retrieve.
• It is not possible to search the entire Web
– Every search tool’s database contains a relatively small subset of
the entire World Wide Web.
• 2 categories of useful search tools
– Search Engines
– Directories
To see the accompanying teacher’s notes: select
“view” from the tool bar, followed by “notes page”
Subject Directories
• Called subject "trees"
– They start with a few main
categories and then branch out
into subcategories, topics, and
subtopics.
– Many large directories include a
keyword search option, which
usually eliminates the need to work
through numerous levels of topics
and subtopics.
For example, if you are looking
for the Atlanta Braves:
Recreation & Sports > Sports > Baseball > Major League Baseball > Teams > Atlanta Braves
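The branching described above can be sketched as a nested structure that is walked from the main category down to the topic. This is only an illustration: the tiny tree below is hypothetical and reuses the category names from the slide, and `find_path` is an invented helper, not part of any real directory.

```python
# Hypothetical miniature subject tree; category names follow the slide's example.
SUBJECT_TREE = {
    "Recreation & Sports": {
        "Sports": {
            "Baseball": {
                "Major League Baseball": {
                    "Teams": {"Atlanta Braves": {}},
                },
            },
        },
    },
}


def find_path(tree, target, trail=()):
    """Return the list of categories leading to target, or None if absent."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return list(here)
        found = find_path(children, target, here)
        if found:
            return found
    return None


print(" > ".join(find_path(SUBJECT_TREE, "Atlanta Braves")))
```

A keyword search option in a directory plays the role of `find_path`: it jumps to the topic without clicking through each level.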
Subject Directories
• Subject directories are created and
maintained by humans
• Because the web sites are carefully evaluated and hand-selected, they have quality content
• Trees are most effective for finding
general information
• Directories are useful for finding information on a topic
when you don't have a precise idea of what you need.
• Directories cover only a small fraction
of the pages available on the Web
Subject Directories
• There are 2 types of directories:
– directories for popular web sites
• Looksmart, Yahoo, About.com
– directories for scholarly subjects
• Infomine, AcademicInfo, Internet Public Library,
Librarian’s Index to the Internet, The WWW Library,
UBC Internet Resources by Academic Discipline
Search engines
• Search engines rely on spiders (robots)
– Spiders are computer programs that crawl the Web and log
the words on each page that they encounter.
– They find new pages following the links in the pages they
already have in their catalogue (i.e., already "know about").
They cannot "decide" to go look up topics.
• Search engines scan their catalogue of web
sites to match the keywords provided. They
return a list of links to web sites containing
the word/s specified.
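A toy sketch of the spider-and-catalogue idea, under heavy simplification: the "web" here is a hypothetical in-memory dictionary, and `crawl` and `search` are invented names. It shows that a spider only reaches pages that other known pages link to, and that a search is a lookup in the catalogue rather than in the live Web.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("atlanta braves baseball", ["b.example"]),
    "b.example": ("baseball teams and scores", ["a.example", "c.example"]),
    "c.example": ("planet saturn rings", []),
    "d.example": ("orphan page about baseball", []),  # nothing links here
}


def crawl(start):
    """Follow links breadth-first from start and log the words on each page."""
    index = {}          # word -> set of URLs containing it
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, *keywords):
    """Return URLs whose logged words include every keyword."""
    hits = [index.get(k, set()) for k in keywords]
    return set.intersection(*hits) if hits else set()


catalogue = crawl("a.example")
print(sorted(search(catalogue, "baseball")))
```

The page `d.example` mentions baseball but never appears in the results, because no crawled page links to it.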
Search Engines
• Use a search engine to look for something
specific
• Search engine catalogues are usually very
large (they often return thousands of results)
• Different search engines vary (in size,
accuracy, features, and flexibility), so...
USE MORE THAN ONE (Teoma, AltaVista, Google)
• Avoid using Metacrawlers
– They search several individual search engines simultaneously,
but the idea of meta-searching is much better than the reality.
Comparing Subject Directories to Search Engines
Subject Directories
• Use to find general information
• built by human selection
• often carefully evaluated and annotated (but not always!!)
• organised into subject categories, classification of pages by subjects
– subjects not standardised and vary according to the scope of each directory
• small and specialised to large, but smaller than most search engines
Search Engines
• Use to look for something specific
• built by computer robot programs
• UNevaluated -- contain the good, the bad, and the ugly -- YOU must evaluate everything
• all pages are ranked by a computer
• huge and often retrieve a lot of information
– For complex searches use ones that allow you to search within results (subsearching)
– Without search strategies or techniques, finding what you need can be like finding a needle in a haystack.
Two Key Steps for Successful Searching
• You must Prepare your search.
– identify the main concepts in your topic
• You need to know how to use the
search tools available on the
Internet.
– What are these tools?
– How are they different?
– How to best make use of these tools?
Do you REALLY know how to search the Internet?
You have surfed the internet before, but do you
really know how to search?
– Most people simply type a few words into the query
box and then scroll through whatever comes up.
• Sometimes their choice of words ends up narrowing the
search unduly and causing them not to find what they're
looking for.
• More often the end result of the search is a haystack of
off-target web pages that must be combed through.
You can do better than that!
How to Search the Internet
The perfect page is out there
somewhere. It's the page that
has exactly the information
you're looking for and to you
it's beautiful and unattainable
like a faraway star.
If only you had a super-sized net for capturing it!
-Bernie Dodge
4 NETS for better searching
Narrow, Exact, Trim, Search 4 Similar
Net 1: Start Narrow
• The WWW contains ~3 billion documents
– Do you want broad or specific (a unique term or phrase)
information?
– Are you looking for a narrow aspect of a topic with a
huge web presence?
– Are you searching by title, subject, keyword, or author?
• If you know what you're after, why not
start by asking for it as precisely as you
can? Start Narrow!
Net 1: Start Narrow
KEYWORDS:
Break down the topic of your search into key
concepts. Think of all the words that would
always appear on the perfect page
• The WWW is not indexed in any standard
vocabulary. You must guess:
– what words will be in the pages you want to find
– what subject terms were chosen to organise the
directory covering the topic
– any synonyms, alternate spellings, or variant
word forms for the concepts
Net 1: Start Narrow
DISTRACTING WORDS:
• Think of all the distracting pages that
might also turn up because one or more of
your search terms has multiple meanings.
• What words might help you eliminate
those pages?
Net 1: Start Narrow
COMBINE KEYWORDS AND EXCLUDE DISTRACTING WORDS
using Boolean operators
+ = COMBINE: WITH, WITH ALL, WITH ANY, AND, OR, alternate spelling/truncation
- = EXCLUDE: WITHOUT, AND NOT
NOTE:
• Search engines express this in various ways, which are all equivalent.
• Boolean operators must be typed in CAPITALS
• AND is often implied (default)
Net 1: Start Narrow
Examples of Boolean combining operators
• WITH ANY, OR
(OR expands your search results)
• +, WITH ALL, AND
(+ is the equivalent of AND; retrieves ONLY documents containing ALL the words, which restricts your search results)
For example,
• sustainable (agriculture OR farming) =
pages with sustainable and agriculture, as well as pages with sustainable and farming
• the operators change the number of results per search:
genetics AND databases < genetics < genetics OR databases
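A small sketch of why AND narrows and OR widens a result set, using Python sets. The index below is hypothetical: the page IDs are made up purely to show the effect on result counts.

```python
# Hypothetical index: search term -> set of page IDs that contain it.
INDEX = {
    "genetics": {1, 2, 3, 4},
    "databases": {3, 4, 5, 6, 7},
}

both = INDEX["genetics"] & INDEX["databases"]    # genetics AND databases
either = INDEX["genetics"] | INDEX["databases"]  # genetics OR databases

# AND can never return more pages than one term alone; OR never fewer.
print(len(both), len(INDEX["genetics"]), len(either))
```

Whatever the index contains, the AND result is a subset of each single-term result, and each single-term result is a subset of the OR result.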
Net 1: Start Narrow
• Alternate spelling/truncation
• Truncate to expand word endings and
results by applying * to the end of the
search word
– you retrieve that word plus all words with
that word stem which have different endings
• for example:
chem* retrieves chemistry, chemists, chemicals
rivers* retrieves rivers, Riverside, Riverside's
industr* retrieves industries, industry, industrial,
industrialization
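Truncation behaves like a prefix match on the word stem. The sketch below assumes a simple case-insensitive comparison and a made-up word list; `truncation_matches` is a hypothetical helper, and real search engines may apply their own rules.

```python
def truncation_matches(pattern, vocabulary):
    """Return words matching a trailing-* pattern, compared case-insensitively."""
    if not pattern.endswith("*"):
        return [w for w in vocabulary if w.lower() == pattern.lower()]
    stem = pattern[:-1].lower()
    return [w for w in vocabulary if w.lower().startswith(stem)]


# Hypothetical vocabulary for illustration only.
WORDS = ["chemistry", "chemists", "chemicals", "rivers", "Riverside", "river", "industry"]

print(truncation_matches("chem*", WORDS))
print(truncation_matches("rivers*", WORDS))
```

Note that `rivers*` also picks up `Riverside`, since it begins with the same letters; a stem that is too short can pull in unrelated words.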
Net 1: Start Narrow
Examples of Boolean excluding operators
-, WITHOUT, AND NOT are all equivalent terms
Use excluding operators to filter out the undesired web
sites when you have a keyword that has multiple meanings.
For example:
Saturn + planet -car = Saturn the planet rather than Saturn the car
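A minimal sketch of how a query with required and excluded words might be checked against a page. It assumes plain words are treated as required (AND implied by default) and that `+` and `-` are attached directly to their words; `matches` is a hypothetical function, not any engine's actual behaviour.

```python
def matches(query, page_text):
    """Check a page against a query of plain, +required and -excluded words."""
    words = set(page_text.lower().split())
    for term in query.lower().split():
        if term.startswith("-"):
            # Excluded word: its presence disqualifies the page.
            if term[1:] in words:
                return False
        else:
            # Plain or +word: the page must contain it.
            word = term.lstrip("+")
            if word and word not in words:
                return False
    return True


print(matches("Saturn +planet -car", "Saturn is the sixth planet from the sun"))
print(matches("Saturn +planet -car", "the Saturn car dealership on planet Earth"))
```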
Activity #2
Net 2:Find Exact Phrases
• Typing an EXACT PHRASE or word between
quotation marks ("word") or between "pipe"
symbols (|word|) locates only pages that contain:
1- the exact phrase, 2- the complete phrase, 3- those words together, 4- in that order
For example:
• "new mexico" = new must appear next to mexico (#3) to get New
Mexico in that order (#4). It does not retrieve Mexico and New
Zealand or new fashions in Mexico. It retrieves New Mexico but
does not retrieve University of New Mexico (#2).
• crime and |states| finds web sites about crime in individual states
rather than the many more sites about crime in the United States
nation-wide.
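The "together, in that order" requirement can be sketched as a check for consecutive words. This is an illustrative assumption about how a phrase match works in general; `contains_phrase` is a hypothetical helper and ignores case and punctuation.

```python
import re


def contains_phrase(phrase, text):
    """True if the words of phrase occur next to each other, in order, in text."""
    want = re.findall(r"\w+", phrase.lower())
    have = re.findall(r"\w+", text.lower())
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


print(contains_phrase("new mexico", "Welcome to New Mexico"))
print(contains_phrase("new mexico", "new fashions in Mexico"))
print(contains_phrase("new mexico", "Mexico and New Zealand"))
```

Both words appear in the second and third texts, but not side by side in the right order, so only the first text counts as a phrase match.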
Net 2:Find Exact Phrases
• Searching for exact phrases is obviously useful for
finding things that have a proper name consisting of
several words (e.g., places, book titles, people).
• It's also useful if you wish to locate something you
cannot remember
• Do you suspect something was plagiarised, or at least
heavily borrowed without attribution? Type in a
phrase or two from the document and see if it turns up
elsewhere!
• Searching for exact matches is also very useful when
trying to evaluate the authenticity and integrity of an
electronic source.
Net 3: Trim Back the URL
• Often you'll find a terrific page nestled deep down
inside a folder inside a folder inside a folder. You
suspect that there are other pages you'd find
interesting nearby. How do you find them? Trim the URL
step by step.
You found this Romeo & Juliet WebQuest that you really like. There are more
like that where this one came from. Start here to find them:
http://www.richmond.edu/academics/a&s/education/projects/webquests/shakespeare
http://www.richmond.edu/academics/a&s/education/projects/webquests/
http://www.richmond.edu/academics/a&s/education/projects/
http://www.richmond.edu/academics/a&s/education/
http://www.richmond.edu/academics/a&s/
http://www.richmond.edu/academics/
http://www.richmond.edu/
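The trimming shown above can be sketched with Python's standard `urllib.parse` module. The function name `trim_steps` is hypothetical; it simply drops the last path segment repeatedly until only the host remains.

```python
from urllib.parse import urlsplit, urlunsplit


def trim_steps(url):
    """Return the URLs obtained by removing one trailing path segment at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    steps = []
    while segments:
        segments.pop()
        # Keep the trailing slash, as in the slide's list of trimmed addresses.
        path = "/" + "/".join(segments) + ("/" if segments else "")
        steps.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return steps


start = "http://www.richmond.edu/academics/a&s/education/projects/webquests/shakespeare"
for step in trim_steps(start):
    print(step)
```

Each printed address is one level higher in the site; visiting them in turn is the manual equivalent of browsing up through the folders.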
Net 4: Search and look for similar pages
• Searching further
– Use a search engine to find an appropriate
subject specific directory
– Search for the words web +directory and your topic.
• You can also try directory but you may wind up with membership
directory, phone directory, etc.
– Search by title, subject, keyword, author
e.g.: +title:"George Washington" +President +Martha
The above TITLE SEARCH example instructs the
search engine to return web pages where the phrase
George Washington appears in the title and the words
President and Martha appear somewhere on the page.
Net 4: Search and look for similar pages
• Searching further
– A domain search allows you to
limit results to that domain:
e.g.: +domain:uk +title:"Queen Elizabeth"
+domain:edu +"lung cancer" +smok*
– Searching by host URL (address) can narrow very
broad results to web pages devoted to the keyword topic.
e.g.: +url:halloween +title:stories
e.g: http://home.sprintmail.com/~debflanagan/main.html
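A rough sketch of what domain and title restrictions do, assuming a hypothetical list of page records. The field names, the example hosts, and the `field_search` helper are all invented for illustration; the `+domain:` and `+title:` syntax itself is engine-specific and is not reproduced here.

```python
from urllib.parse import urlsplit

# Hypothetical page records; a real engine stores far more than this.
PAGES = [
    {"url": "http://www.example.ac.uk/royals", "title": "Queen Elizabeth II"},
    {"url": "http://www.example.com/royals", "title": "Queen Elizabeth II"},
    {"url": "http://www.example.edu/health", "title": "Lung cancer and smoking"},
]


def field_search(pages, domain=None, title=None):
    """Keep pages whose host ends in .domain and whose title contains the phrase."""
    results = []
    for page in pages:
        host = urlsplit(page["url"]).hostname or ""
        if domain and not host.endswith("." + domain):
            continue
        if title and title.lower() not in page["title"].lower():
            continue
        results.append(page["url"])
    return results


print(field_search(PAGES, domain="uk", title="Queen Elizabeth"))
```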
Net 4: Search and look for similar pages
• Use the LINK SEARCH when you want to know
which Web sites are linked to a particular site of
interest, or you want to find similar pages.
– Use this tool to find more of a good thing. Use it to find pages
that link to a page that you find useful. Chances are,
those pages might be useful to you, too. Researchers use link
searches for conducting backward citations.
• There is also ego surfing
– if you've uploaded a page of your own to a public server and
it's been there for a while, find out who else is linking to it.
link:www.pepsi.com
link:www.ipl.org/ref/
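Conceptually, a link search looks up which known pages point at a target. The sketch below uses a hypothetical link graph with made-up source sites; `linked_from` is an invented helper, not a real search-engine call.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "fan-site.example": ["www.ipl.org/ref/", "www.pepsi.com"],
    "library.example": ["www.ipl.org/ref/"],
    "www.pepsi.com": [],
}


def linked_from(target, links):
    """Return the pages that link to target, like a link: search."""
    return sorted(src for src, outs in links.items() if target in outs)


print(linked_from("www.ipl.org/ref/", LINKS))
```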
Searching the Internet
• Prepare your search
• Choose whether you want to use a
search engine or a subject directory
• Start Narrow
– use Boolean operators
– Truncate
• Find Exact Phrases
• Trim Back the URL
• Search and look for similar pages
Examples
• complex carbohydrates
suggestions: (complex -shopping -sports -movie) +legume* AND
(bio* OR nutrition OR metabol*) AND "complex carbohydrates"
= complex, but not a shopping, sports or movie complex (as in
centre), and legume/s, and something related to the field of
biochemistry/biology or nutrition or
metabolism/metabolites
• The cause of heart disease
suggestions: medic* OR health "web directory"
retrieves (Heart failure, Heart attack, Heart murmur,
Heartburn, Heat stroke and exhaustion). You could
also try with a search engine: (medic* OR health) AND
(web +directory) AND (patholog* OR cause) AND
(heart OR coronary) AND "heart disease".
Special Thanks to:
Bernie Dodge. "4 NETS for better searching Google."
http://webquest.sdsu.edu/searching/fournets.htm Last
updated March 12, 2002 (April 1, 2002)
Debbie Flanagan. "Web Search Strategies."
http://home.sprintmail.com/~debflanagan/main.html Fort
Lauderdale, FL (April 2002)
Joe Barker. "Finding Information on the
Internet." Library, University of California, Berkeley.
http://www.lib.berkeley.edu/ Last update 10/3/00.
Server manager: [email protected] (April 1,
2002)
Evaluating Web pages
Assessing the Authenticity and
Integrity of Sources
Tricks for Evaluating Web pages
• What can the URL (address) tell you?
• Who wrote the page? Is the author a
qualified authority?
• Is the page dated? Current, timely?
• Is the information cited authentic?
• Is the page a reliable source?
• What's the bias of the web site?
• Could the page or site be ironic, a spoof?
What can the URL (address) tell you?
• What type of domain does it come from?
• Is it appropriate for the content?
– .com = a commercial business
– .edu = an educational institution
– .gov or .gouv = a governmental institution
– .org = a non-profit organization
– .mil = a military site
– .net = a network site
• Who "published" the page?
• the agency or person operating the “server”
computer from which the document is issued
• Is it somebody's personal page?
• Look for a personal name following a tilde ( ~ ) or
the word "users" or "people".
E.g: http://home.sprintmail.com/~debflanagan/main.html
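These URL checks can be sketched in a few lines. The domain table copies the meanings listed on the slide; `inspect_url` is a hypothetical helper, and the tilde/"users"/"people" test is only the rough signal the slide describes, not a reliable classifier.

```python
from urllib.parse import urlsplit

# Domain meanings as listed on the slide.
DOMAIN_TYPES = {
    "com": "a commercial business",
    "edu": "an educational institution",
    "gov": "a governmental institution",
    "org": "a non-profit organization",
    "mil": "a military site",
    "net": "a network site",
}


def inspect_url(url):
    """Return (domain type, looks-like-a-personal-page) for a URL."""
    parts = urlsplit(url)
    tld = (parts.hostname or "").rsplit(".", 1)[-1]
    segments = parts.path.lower().split("/")
    personal = "~" in parts.path or "users" in segments or "people" in segments
    return DOMAIN_TYPES.get(tld, "unknown"), personal


print(inspect_url("http://home.sprintmail.com/~debflanagan/main.html"))
```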
Who wrote the page? Is the
author a qualified authority?
• Look for a name and e-mail.
– Often in the "About us" or "Contact us" section
– Except in personal pages, the author is usually not the same
person as the page designer (the person hired to put the content
on the web)
– Are the author's credentials provided? Is he or she a
reliable authority on the subject?
– If there's no credentials section, learn what you can by
truncating elements of the address: delete the end
characters of the URL, stopping just before each slash (/)
(leave the slash). See if you can find out about the
origins/nature of the site providing the page. Repeat, one
slash at a time...
Is the page dated? Current, timely?
• When was the page updated last? Can you tell
how much was updated? Is this important for
the timeliness of what you want to know?
– Look at Netscape's "Page Info" (right click in the page,
or look under View | Page Info). If this is out of date,
be suspicious of a stale page. If current, be suspicious
that Page Info was changed but nothing else.
• Is the date appropriate for the content? Is it
"stale" information on a time-sensitive or
evolving topic?
– CAUTION: Undated factual or statistical information
is no better than anonymous information. Don't use it.
Is the information cited authentic?
• If the page claims to be from an established
newspaper, journal, organisation, institution, or
agency, is it the real one?
– Check if the domain name corresponds to the source
(gov, org, mil).
– If it claims to be a reproduction of a published
piece, is it unmodified?
– It is really easy to copy and tinker with the content
of a page and put it back on the web with copies of
the original logo, banners, credits, and other
information. Locate the original to be sure, either
elsewhere on the Web (exact phrase search) or at
the library (ask at a reference desk for help).
Is the information cited authentic?
(continued)
• Is the source of factual or attributed
information well documented?
– Unless a known, reputable publisher or institution
vouches for the content, require for your
research that sources and claims be
substantiated by links to reliable sources or
references (notes or footnotes as in a published
work).
– CAUTION: Standards for footnoting and citing
where people obtained information (and when) are
very lax on the Web. If you don't have the source
and date for attributed information, find it or
don't use the information.
Is the page a reliable source?
• What's the purpose of the page? Why was it
created?
– To inform? Explain? Give facts or data? Persuade, promote?
Sell? Share, disclose? Rant? Entice?
• Who else links to the page? Where is it "cited"?
– Look for awards (or links to an "Awards" page). Do not
take the award at face value without checking it out.
– Use a search engine to see who links to the page.* Then
visit some of those sites to see what they say about the
page in question.
– *precede the URL with the term "link:", e.g.
link:www.whitehouse.net
What's the bias of the web site?
• Who sponsors the page? Might the sponsors
have a vested interest in the viewpoint
presented?
– Look for links to "Sponsors" or "About us". Advertisers
can also be sponsors. Could the points of view be
constrained or bent to keep or attract advertisers?
Are there links to other viewpoints? Balanced?
Annotated? Anything not said that could be said? Try
to think of alternative viewpoints, and ask if they are
represented or linked to.
• Look for your own bias:
– Are you being completely fair? Too harsh? Totally
objective? Requiring the same degree of "proof" you
would from a print publication? Is the site good for
some things and not for others? Are your hopes biasing
your interpretation?
Could the page or site be ironic, a spoof?
• Think about the "tone" of the page.
– Humorous? Parody? Exaggerated? Overblown
arguments? Outrageous photographs or
juxtaposition of unlikely images? Arguing a
viewpoint with examples that suggest that
what is argued is ultimately not possible?
• Perform all the other tests above on the page,
and, if you do not find other information to
explain the tone, question whether the page is
an irony that you might feel foolish to cite as
if it were factual or straightforward.
ALWAYS, ALWAYS,
ALWAYS
BE CRITICAL!
Copyright (C) 1995-2001 by the Library, University
of California, Berkeley. All rights reserved.
Document created & maintained on server:
http://www.lib.berkeley.edu/ by Joe Barker
Last update 26Nov2001. Server manager:
[email protected].
Types of On-line References
• The World Wide Web (WWW)
– (web sites)
• Electronic Publications and Online Databases
• Publications available through the library, MEDLINE
• Online Reference Sources
– Online dictionaries, online encyclopedias
The World Wide Web (WWW)
Scientific Style
• Give the author's last name and initials (if known) and the date
of publication in parentheses.
• Next, list the full title of the work, capitalizing only the first
word and any proper nouns; the title of the complete work or
site (if applicable) in italics, again capitalizing only the first
word and any proper nouns; any version or file numbers, enclosed
in parentheses
• the protocol and address, including the path or directories
necessary to access the document
• the date accessed, enclosed in parentheses.
Burka, L. P. (1993). A hypertext history of multi-user dimensions. MUD
history.
http://www.utopia.com/talent/lpb/muddex/essay (2 Aug. 1996).
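As a rough sketch, the order of elements described above can be assembled with a small helper. The function `cite_web` and its parameter names are hypothetical, italics are not represented in plain text, and optional version or file numbers are left out for brevity.

```python
def cite_web(author, year, title, site, url, accessed):
    """Assemble a WWW reference in the order the slide describes."""
    return f"{author} ({year}). {title}. {site}. {url} ({accessed})."


print(cite_web(
    "Burka, L. P.",
    1993,
    "A hypertext history of multi-user dimensions",
    "MUD history",
    "http://www.utopia.com/talent/lpb/muddex/essay",
    "2 Aug. 1996",
))
```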
Electronic Publications and Online Databases
Scientific Style
• List the author's last name and initials
• the date of publication, in parentheses
• the title of the article or file and, enclosed in parentheses, any
identifying file or version numbers or other identifying information (if
applicable)
• the title of the electronic database, in italics
• the name of the online service, in italics
• access information or the protocol and address and any directory
paths
• the date accessed, in parentheses
Warren, C. (1996). Working to ensure a secure and comprehensive peace in
the Middle East (U.S. Dept. of State Dispatch 7:14). FastDoc. OCLC
(File #9606273898). (12 Aug. 1996).
Online Reference Sources
Scientific Style
• Give the author's last name and initials
• the publication date (if known and if different from the date
accessed); and the title of the article
• cite the word "In," followed by the name(s) of the author(s) or
editor(s) (if applicable) and, in italics, the title of the complete work;
any previous print publication information (if applicable); identification
of the online edition (if applicable)
• the name of the online service, in italics, or the protocol and address
and the path followed to access the material
• the date accessed, in parentheses
Fine arts. (1993). In E. D. Hirsch, Jr., J. F. Kett, & J. Trefil (Eds.),
Dictionary of cultural literacy. Boston: Houghton Mifflin. INSO Corp.
America Online. Reference Desk/Dictionaries/Dictionary of Cultural
Literacy (20 May 1996).