ITIS 1210 Introduction to Web
ITIS 1210
Introduction to Web-Based
Information Systems
Internet Research One
Introduction
Internet
Global network of interconnected networks
Infrastructure supporting the WWW
World Wide Web
Repository of information
Typically in an HTML format
Accessible via browsers
Introduction
Background:
The Internet literally spans the planet
The WWW contains information stored on millions of computers
Problem:
How do you find the right information in such a vast storehouse?
Introduction
What do you need to know?
What search tools are available
How to create an Internet search strategy
How to select the proper search terms
Simple
Complex
How to perform a basic search
How to analyze the results
How to cite online resources properly
Search Tools
Four types
Search engines
Metasearch engines
Subject guides
Specialized search tools
No tool searches the entire Internet!
Some are better than others for certain uses
Search Tools
[Diagram: search engines, metasearch engines, subject guides, and specialty search tools, shown against the WWW and the Internet]
Search Tools
[Diagram: search engines and subject guides cover the surface Web; specialty search tools reach into the deep Web]
Unsearched “deep” Web
500 times larger than the “surface” Web
Search Engines
Locate Web pages based on keywords
Keyword
Nouns, verbs
Describe page in terms of major concepts
Spider programs
“Crawl” the Web
Return results
Results indexed
Search Engines
Index consists of
Keyword
Link to Web page containing that keyword
Precise process but
Narrow view of Web
Can become outdated
Only works for specific content
Slow
Remember: no engine searches the entire Web
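The keyword-plus-link index described above can be sketched as an inverted index. This is a minimal Python illustration, assuming the spider has already fetched each page's text into a dictionary; the function names `build_index` and `lookup` and the example URLs are hypothetical, and real engines add ranking, stemming, and much more.

```python
from collections import defaultdict

PUNCTUATION = ".,!?;:\"'()"

def build_index(pages):
    """Map each keyword to the set of URLs of pages that contain it.

    `pages` is a dict of {url: page_text}, standing in for what a
    spider program has already fetched while crawling the Web.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            word = token.strip(PUNCTUATION)
            if word:
                index[word].add(url)
    return index

def lookup(index, *keywords):
    """Return the URLs of pages containing every keyword (empty set if none)."""
    matches = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*matches) if matches else set()

# Hypothetical crawled pages; the URLs are placeholders, not real sites
pages = {
    "http://example.edu/solar": "Solar panels turn sunlight into energy.",
    "http://example.com/wind": "Wind turbines generate energy, too.",
    "http://example.org/news": "Today's weather: sunny and windy.",
}
index = build_index(pages)
print(sorted(lookup(index, "solar", "energy")))
```

A page the spider never visited has no entries in the index, so no query can return it; this is one reason no engine searches the entire Web.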
Metasearch Engines
Enter keyword(s)
Results are links to Web pages
Source is not the Web itself but other search engines
Useful for finding the highest-ranked results from multiple search engines at once
Duplicates can be eliminated
Ranked by relevancy
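Merging, eliminating duplicates, and ranking by relevancy can be illustrated with a minimal Python sketch. The scoring rule here (a simple Borda-style count: points by rank, summed across engines) is an assumption chosen for clarity; actual metasearch engines use their own formulas. The engine result lists are hypothetical, and `top_n` stands in for the per-engine cutoff on how many hits are retrieved.

```python
from collections import defaultdict

def merge_results(result_lists, top_n=10):
    """Combine ranked result lists from several search engines.

    Only each engine's top `top_n` hits are used. A URL earns points by
    rank (first place = top_n points, second = top_n - 1, ...), and points
    from different engines are added, so a duplicate collapses into one
    entry that ranks higher.
    """
    scores = defaultdict(int)
    for results in result_lists:
        for rank, url in enumerate(results[:top_n]):
            scores[url] += top_n - rank
    # Highest combined score first; ties broken alphabetically
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from three engines
engine_a = ["a.com", "b.org", "c.edu"]
engine_b = ["b.org", "a.com", "d.gov"]
engine_c = ["b.org", "e.com"]
print(merge_results([engine_a, engine_b, engine_c]))
```

Here `b.org` appears in all three lists, so its points add up and it rises to the top of the merged list even though one engine ranked it second.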
Metasearch Engines
Not an optimum source
Subject to timeouts
Retrieve only top 10-50 hits from each site
May not have advanced search features
May exclude major databases
Then there’s the “haystack problem”
“Just what are you looking for, anyway?”
Metasearch Engines
A known needle in a known haystack
A known needle in an unknown haystack
An unknown needle in an unknown haystack
Any needle in a haystack
The sharpest needle in a haystack
Most of the sharpest needles in a haystack
All the needles in a haystack
Affirmation of no needles in the haystack
Things like needles in any haystack
Let me know whenever a new needle shows up
Where are the haystacks?
Needles, haystacks -- whatever
Subject Guides
Hierarchically organized directories
User navigates through the hierarchy to find relevant links
Good for a broad view of a topic
Can follow hierarchy to narrow search results
Typically prepared by hand
May be academic, professional, commercial
May provide keyword searches of their database
Specialized Search Tools
Useful in finding information “invisible” to traditional search engines and subject guides
Data in “deep” Web usually stored in
Proprietary databases
Specialty directories
Newsgroups
Reference sites
Specialized Search Tools
Access to these resources may be restricted to
Authorized users
Subscribers
Intelligent search agents
Specialized software
Queries these sites
Googlewhacks
Two words that yield exactly one hit
Example:
Antimatter easterness
Internet Search Strategy
Effective searching is planned searching
Most users do what?
Enter a single keyword
Get back thousands (millions!) of hits
Too many hits is just as bad as too few
What’s needed?
Effective
Efficient
Internet Search Strategy
Seven steps to finding the information you need in a timely manner:
Define topic and initial keywords
Locate background information and identify additional keywords
Choose proper search tool
Translate questions into effective search query
Perform search
Evaluate results (Satisfied? If NO, start over)
If YES, note information for citation
Internet Search Strategy
Define your topic, note keywords
What do you want to end up with?
Write down your topic
What are the keywords?
Might be phrases too
Try to identify all the necessary concepts
Internet Search Strategy
Background information & additional keywords
Look up the topic somewhere else
Encyclopedia, periodical, reference source
Add important keywords to your list
Ask someone who might know
Check spelling!
Internet Search Strategy
Choose search tool
Web is a big place
Lots of tools available
Not all tools are good for every query
Some rules:
For specific content, use search engines or metasearch engines
For broad concepts, use subject guides
If all else fails, use specialized search tools
Internet Search Strategy
Search engines
Best for: general or specific searches
Where: own indexes compiled from data gathered from the Web
How to search: enter keywords, phrases, complex searches
Example tools: GOOGLE, ALTAVISTA
Metasearch engines
Best for: general or specific searches
Where: indexes of multiple search engines, simultaneously
How to search: enter keywords, phrases, complex searches
Example tools: IXQUICK, VIVISIMO
Subject guides
Best for: more general searches
Where: own files or database
How to search: click through subject categories (may allow keyword searches)
Specialized search tools
Best for: more specific searches
Where: “invisible” databases, directories, reference sites, newsgroups
How to search: enter keywords, phrases, complex searches
Example tools: LII.ORG, VLIB.ORG
Internet Search Strategy
Create effective search query
Query
Word(s)/phrases/symbols that a search engine can interpret
Effective query
Keywords that best describe the topic
Internet Search Strategy
Perform search
Engines use search forms
Fields where you enter information about your search
(Subject guides usually require actively clicking on links to navigate the hierarchy)
Internet Search Strategy
Evaluate search results
Results are typically a list of links that match your query
Quantity, quality, and format vary from engine to engine
Evaluation based on some criteria you select
Source of Web page
Currency of information
Past experience
Internet Search Strategy
Refine search
Results may be too broad
Quality/quantity may not be optimum
Possible steps
Refine query
Different keywords/phrases
Use a different tool
If still not satisfied, re-evaluate keywords
Too specific?
Too obscure?
Identifying Keywords
A topic is usually too broad to be useful
Must identify the major elements of your topic
What best describes the topic?
What makes this topic different from similar ones?
These are the keywords
How should you create a list of keywords?
Identifying Keywords
Write a sentence or two that summarizes your topic
I want to find Web sites about alternative energy
Identifying Keywords
Look for uncommon words that are unique to this topic
I want to find Web sites about alternative energy
You’re looking for words that a search tool can use effectively
Identifying Keywords
Identify words that
You expect will appear on a Web page
That will be useful to your research
Not all words will be useful
Some words will appear in all sites
And, or, the, etc.
Look for unique words closely related to your topic
That won’t be found on sites you aren’t interested in
Identifying Keywords
Parts of speech: examples
Articles: a, an, the
Conjunctions & prepositions: and, or, but, in, of, for, on, into, from, than, at, to
Adjectives & adverbs: quick, fine, happy, as, also, probably, however, very
Pronouns & verbs: this, that, these, those, is, be, see, do
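Dropping these common words from a topic sentence is easy to sketch. This minimal Python example uses the example words from the parts-of-speech list above; the extra `TASK_WORDS` set (words like "want" and "find" that describe the search task rather than the topic) is an assumption added for this illustration, and the function name is hypothetical.

```python
import re

# Example words from the parts-of-speech list above; they appear on almost every page
COMMON_WORDS = {
    "a", "an", "the",
    "and", "or", "but", "in", "of", "for", "on", "into", "from", "than", "at", "to",
    "quick", "fine", "happy", "as", "also", "probably", "however", "very",
    "this", "that", "these", "those", "is", "be", "see", "do",
}
# Assumption for this example: words that describe the search task, not the topic
TASK_WORDS = {"i", "want", "find", "web", "sites", "about"}

def candidate_keywords(sentence):
    """Return the words of `sentence` that survive after common words are dropped."""
    ignore = COMMON_WORDS | TASK_WORDS
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in ignore]

print(candidate_keywords("I want to find Web sites about alternative energy"))
```

What is left ("alternative", "energy") is the starting keyword list; background reading then adds to it.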
Identifying Keywords
If necessary:
Define keywords
Find background information on your topic
Useful if this topic is new to you
Use:
Encyclopedias
Dictionaries
Identifying Keywords
Example:
Dictionary says alternative energy comes from non-fossil fuels
Solar and wind are examples
Encyclopedia says water and geothermal energy can be used as power sources
So can something called biomass
Identifying Keywords
Updated list of keywords is now:
Keywords
alternative
energy
solar
wind
water
biomass
geothermal
Identifying Keywords
Identify synonyms
Words that have the same or nearly the same meaning
Why?
Web pages are created by individuals
They won’t all use the same words to describe every topic
An expanded list should be broad enough to include Web pages indexed under a variety of similar terms
Identifying Keywords
Keywords: synonyms & related terms
alternative: renewable, sustainable
energy: power
solar: panels, photovoltaic
wind: turbines, windmills
water: hydropower, hydroelectric
biomass: waste-to-energy, bioenergy
geothermal: heat, pumps
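One way to put a synonym list to work is to group each keyword with its synonyms using OR. This minimal Python sketch assumes an engine that accepts an uppercase OR between terms and treats space-separated terms as AND; syntax varies between engines, so check the Help section. The parentheses only show the grouping and not every engine honors them. The function names are hypothetical, and only a few rows of the list above are included.

```python
# A few rows of the keyword/synonym list above
SYNONYMS = {
    "solar": ["panels", "photovoltaic"],
    "energy": ["power"],
    "water": ["hydropower", "hydroelectric"],
}

def expand(keyword):
    """Group a keyword with its synonyms using OR, e.g. (energy OR power)."""
    terms = [keyword] + SYNONYMS.get(keyword, [])
    if len(terms) == 1:
        return keyword
    return "(" + " OR ".join(terms) + ")"

def build_query(*keywords):
    """Expand each keyword; groups are separated by spaces (an implied AND)."""
    return " ".join(expand(k) for k in keywords)

print(build_query("solar", "energy"))
```

A page that says "photovoltaic power" instead of "solar energy" can still match, which is the point of an expanded list.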
Basic Search
Search engines go about searches differently
Spiders don’t always crawl the same parts of the Web
Therefore they return different results
Using one engine restricts you to only those parts of the Web indexed by that engine
Also, different ranking algorithms are used
So relevancy may be different for different engines
Basic Search
Check out the engine’s Help section before starting
An optimum search with one engine might not be optimum for another
Comparing results from different engines can be useful
Basic Search
Some engines accept payments from sponsors to rank their results higher
Sometimes they indicate this, sometimes not
Check out http://searchenginewatch.com/
Basic Search
Try it: search for “solar energy”
http://search.aol.com/
www.google.com
www.ask.com
Using Phrases
Sometimes a single word is too broad
Sometimes word order is important
Phrase searching
Usually accomplished by placing keywords within quotation marks:
“solar energy”
Using Phrases
Search for: solar energy
Search for: energy solar
Two-word searches have identical results
[Diagram: Web pages without both words vs. Web pages with both words]
Using Phrases
Search for: “solar energy”
Phrase search has different results
[Diagram: Web pages without both words; Web pages with both words; within those, Web pages with the phrase]
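The difference between a two-word search and a phrase search can be sketched in a few lines of Python. This assumes an engine that treats a plain two-word query as "both words appear anywhere, in any order" and a quoted query as "these words appear side by side, in this order"; real engines also rank, stem, and sometimes relax these rules. The page texts and function names are hypothetical.

```python
import re

def words(text):
    """Split text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(text, query):
    """Plain two-word search: every query word appears somewhere, in any order."""
    page = set(words(text))
    return all(w in page for w in words(query))

def matches_phrase(text, query):
    """Phrase search: the query words appear side by side, in the same order."""
    page, phrase = words(text), words(query)
    n = len(phrase)
    return any(page[i:i + n] == phrase for i in range(len(page) - n + 1))

# Hypothetical page texts
pages = [
    "Solar energy is clean.",
    "We need energy; the solar system is big.",
    "Wind power is growing.",
]
for query in ("solar energy", "energy solar"):
    print(query, [matches_all_words(p, query) for p in pages])
print('"solar energy"', [matches_phrase(p, "solar energy") for p in pages])
```

Both word orders match the first two pages, while the phrase search keeps only the first page; the second page mentions both words but is about the solar system.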
Using Phrases
How do you specify phrase searching?
Usually with quotation marks
Sometimes not necessary
Drop-down menu
Check box option
Look for an Advanced search link if there is one
For example, Yahoo
Analyzing Results
A word about domains
Academic sites - .edu
Commercial sites sell or advocate - .com
Professional or other organizations - .org
Government - .gov
Other countries - country codes such as .uk or .au
As you scan results, pay attention to the domain names of the Web pages returned
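Reading the domain off a result URL can be sketched with Python's standard `urllib.parse`. This minimal example maps only the suffixes in the list above; the function name is hypothetical, and the URLs must include `http://` or `https://` for `urlparse` to find the host name.

```python
from urllib.parse import urlparse

# Domain suffixes from the list above
DOMAIN_TYPES = {
    "edu": "academic",
    "com": "commercial",
    "org": "professional or organization",
    "gov": "government",
}

def domain_type(url):
    """Label a result URL by the last part of its host name (needs http:// or https://)."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return DOMAIN_TYPES.get(suffix, f"other (.{suffix})")

for url in ("https://www.nasa.gov/solar", "http://www.example.edu/",
            "http://www.bbc.co.uk/news"):
    print(url, "->", domain_type(url))
```

Treat the label as one clue among several: anyone can register a .com or .org name, so the suffix alone says little about a page's quality.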
Analyzing Results
Look for your search terms in the results
Number of times a keyword shows up may indicate its relevance
Google displays keywords in bold
Proximity may indicate relevance
Keyword in URL
Decipher the URL
Mnemonic URLs contain the keyword
Easy to remember the URL
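The three clues above (keyword count, proximity, keyword in URL) can be collected for a single result with a short Python sketch. These are clues for a human reader, not any engine's actual relevance formula; the function name, URL, and page text are hypothetical, and the URL check is a simple substring test.

```python
import re
from urllib.parse import urlparse

def relevance_clues(url, text, keywords):
    """Collect the clues listed above for one search result."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kws = [k.lower() for k in keywords]
    # 1. How many times each keyword shows up
    counts = {k: words.count(k) for k in kws}
    # 2. Proximity: smallest gap, in words, between two different keywords
    hits = [(i, w) for i, w in enumerate(words) if w in kws]
    gaps = [j - i for (i, a), (j, b) in zip(hits, hits[1:]) if a != b]
    # 3. Keyword in URL: simple substring test on host name and path
    parts = urlparse(url)
    url_text = (parts.hostname or "") + parts.path.lower()
    return {
        "counts": counts,
        "closest_gap": min(gaps) if gaps else None,
        "in_url": [k for k in kws if k in url_text],
    }

print(relevance_clues("http://www.example.org/solar-energy",
                      "Solar energy basics. Solar panels collect solar energy.",
                      ["solar", "energy"]))
```

A gap of 1 means two different keywords sit next to each other somewhere on the page; `None` means they never both appear.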
Analyzing Results
Note result ranking
Many engines rank order results
Mathematical formula or algorithm
Different engines use different algorithms
Most relevant sites displayed first
Generally, the best results are within three or four pages of the top
Google’s “I’m Feeling Lucky” button goes straight to the top-ranked result
Analyzing Results
Did results return a directory or subject guide?
If so, it’s probably relevant
Does the engine use cached pages?
Links break, pages go down
Previous versions of pages are usually still maintained in the index
Useful in finding a newer version
Analyzing Results
Navigate within results
How many hits did you get?
Search within to narrow results
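Searching within results can be sketched as filtering a result list by an extra term. In this minimal Python example the results are hypothetical dictionaries with `title` and `snippet` fields; an engine's own "search within results" feature typically re-runs the query with the added term rather than filtering locally, so this is only a stand-in for the idea.

```python
def search_within(results, extra_term):
    """Keep only the results whose title or snippet mentions `extra_term`."""
    term = extra_term.lower()
    return [r for r in results
            if term in (r["title"] + " " + r["snippet"]).lower()]

# Hypothetical results from a first search for "solar energy"
results = [
    {"title": "Solar energy basics", "snippet": "How photovoltaic panels work"},
    {"title": "Solar energy news", "snippet": "Policy and prices this week"},
    {"title": "Solar energy at home", "snippet": "Photovoltaic roof kits"},
]
narrowed = search_within(results, "photovoltaic")
print(f"{len(results)} hits narrowed to {len(narrowed)}")
```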
Citing Online Resources
Two reasons to cite resources:
So you can find them again if you need to
To avoid plagiarism
Assume everything on the Web is copyrighted
Must have the author’s permission to use material for profit
Citing Online Resources
Fair Use exemption to copyright law
Students & researchers
Use small amounts
Educational purposes
No permission required
Must always give credit for work not your own
Citing Online Resources
Some schools/professors require specific citation formats
Find out which is needed for your situation
Citations have a format
Maintains consistency
Recognizable and easy to understand parts
Two common kinds
MLA – Modern Language Association
APA – American Psychological Association
Citing Online Resources
MLA citation elements
Author
Web page title
Web site title
Date Web page was created or last revised
Internet address
Date you visited the Web page
Citing Online Resources
MLA citation format
Author last name, author first name.
“Web PAGE title.”
Web SITE title.
Date created or revised.
<Full Internet address>
(Date you viewed the Web page)
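Because the format is a fixed sequence of elements, it can be assembled mechanically. This minimal Python sketch assumes the element order and punctuation exactly as listed on this slide (check which edition of the MLA style your instructor requires); the function name and all citation details are hypothetical, and any element left blank is simply skipped.

```python
def mla_citation(author_last="", author_first="", page_title="", site_title="",
                 date_revised="", url="", date_viewed=""):
    """Assemble a citation in the order shown above, skipping blank elements."""
    parts = []
    if author_last and author_first:
        parts.append(f"{author_last}, {author_first}.")
    elif author_last:  # a corporation or organization as author
        parts.append(f"{author_last}.")
    if page_title:
        parts.append(f'"{page_title}."')
    if site_title:
        parts.append(f"{site_title}.")
    if date_revised:
        parts.append(f"{date_revised}.")
    if url:
        parts.append(f"<{url}>")
    if date_viewed:
        parts.append(f"({date_viewed})")
    return " ".join(parts)

# Hypothetical page details
print(mla_citation("Doe", "Jane", "Solar Basics", "Example Energy Site",
                   "1 Jan. 2008", "http://www.example.org/solar", "15 Feb. 2008"))
```

Passing only an organization name as `author_last` and leaving `page_title` blank gives the form used when citing an entire site by a corporate author.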
Citing Online Resources
Omit items that cannot be located on the Web page
Author might be a corporation or organization
Skip the Web page title if you’re citing the entire site
Don’t underline the URL; remove the underline if the word processor adds one automatically
Citing Online Resources
Creation/revision date may be anywhere on the page
Omit this if you can’t locate one
Most browser print functions include the URL and current date on the hardcopy
Make a note of them if you don’t print the page yourself