Class4Search

Exploring the Internet
91.113-021
Instructor: Michael Krolak
Tonight
• Roll Call
• Class Announcements
• Searching the Internet
• Making your web page searchable.
• Assignments
“Intelligence is not the ability to store
information, but to know where to find it.”
- Albert Einstein
Follow up from the last class
How you can protect yourself from Identity Theft
• Never enter personal information (Acquired
Characteristics) into a web site that uses only http (as
opposed to https).
• Never send acquired characteristics (except your name)
by email.
• Unless you encrypt your email, expect that anyone can
read it.
• Always pay close attention to the spelling of the URL
(web address) when paying for anything online.
• Do not respond to unsolicited emails.
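The advice about https and URL spelling can be checked mechanically. Below is a minimal Python sketch; the helper name `looks_safe_for_payment` and the host `www.example-bank.com` are made up for illustration. It only checks the scheme and the exact spelling of the host, so passing it does not prove that a site is trustworthy.

```python
from urllib.parse import urlparse

def looks_safe_for_payment(url, expected_host):
    """Return True only if the URL uses https and the host is spelled exactly as expected."""
    parts = urlparse(url)
    uses_https = parts.scheme == "https"
    # hostname is None for malformed URLs, so fall back to an empty string
    host_matches = (parts.hostname or "").lower() == expected_host.lower()
    return uses_https and host_matches

# Hypothetical addresses: the second uses plain http, the third misspells the host.
print(looks_safe_for_payment("https://www.example-bank.com/pay", "www.example-bank.com"))
print(looks_safe_for_payment("http://www.example-bank.com/pay", "www.example-bank.com"))
print(looks_safe_for_payment("https://www.examp1e-bank.com/pay", "www.example-bank.com"))
```

Only the first address passes both checks.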
Follow Up (cont.)
• Shred all snail mail that contains personal
information (especially credit card offers!!)
• Expect that once you throw something away, you
are giving it to the public.
Feedback on Blogs
• Keep up with your blogs! Minimum 2 blogs
per week.
• Keep the entries of the blog relevant to the
class.
How do we find information?
• Memory
• Media
– Books
– Movies
– Music
– Art
• Observe
• Ask other people
The Problem with the Internet
• The “Surface Web” contains 2.5 billion pages.
• Each day 7.5 million web pages are added to the World Wide Web.
• Information is submitted to the web without any context or test of
validity.
What is a Search Engine?
search engine
n.
1. A software program that searches a database and
gathers and reports information that contains or is
related to specified terms.
2. A website whose primary function is providing a
search engine for gathering and reporting information
available on the Internet or a portion of the Internet.
Source: The American Heritage® Dictionary
Copyright © 2002, 2001, 1995 by Houghton Mifflin Company. Published by Houghton
Mifflin Company.
Examples of Popular Search Engines
• www.google.com (Google)
• www.av.com (Alta Vista)
• www.lycos.com (Lycos)
• Search.msn.com (Microsoft)
• www.Excite.com (Excite)
• www.northernlight.com (Northern Light)
What is a Subject Directory?
subject directory
n.
1. An Internet research tool on the World Wide
Web that organizes Internet resources by
subject headings and subheadings. Subject
directories are usually compiled by human
beings who apply some selection criteria to
resources included in the database.
Examples of Subject Directories
• www.yahoo.com Yahoo!
• http://bubl.ac.uk/ BUBL
• http://www.ipl.org/ Internet Public Library
• www.about.com About.com
• www.jumpcity.com Jump City
• http://www.joeant.com/ Joe Ant
What is a Meta Search Engine?
meta search engine
n.
1. Meta search engines are search engines that
use their own database as well as sending the
query to many other search engines
simultaneously (called spawning) and report the
unique responses from other search engines.
2. Meta search engines that are limited to only the
web, newsgroups, newspapers, and scientific
journals.
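The first definition above (send one query to several engines, then report the unique responses) can be sketched in Python. This is only an illustration: `engine_a` and `engine_b` are hypothetical stand-ins that return canned results, and the engines are queried one after another rather than simultaneously.

```python
def engine_a(query):
    # Hypothetical search engine returning canned results for any query.
    return ["http://example.com/a", "http://example.com/b"]

def engine_b(query):
    # A second hypothetical engine whose results overlap with the first.
    return ["http://example.com/b", "http://example.com/c"]

def meta_search(query, engines):
    """Send the query to every engine and return the unique results in first-seen order."""
    seen = set()
    merged = []
    for engine in engines:
        for url in engine(query):
            if url not in seen:  # report each response only once
                seen.add(url)
                merged.append(url)
    return merged

results = meta_search("red sox", [engine_a, engine_b])
print(results)
```

The page returned by both engines appears only once in the merged list.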
Examples of Meta Search Engines
• Ask Jeeves -- frequently gets the answer in the first
pass. Jeeves allows queries in natural language.
• Dogpile -- for its variety of sources (web,
newsgroups, newspapers)
• Ixquick
• Metacrawler
• ProFusion
What is the Deep Web?
• Estimated to be 500 times the size of the surface web
(about 1.25 trillion pages, given 2.5 billion surface pages).
What is a spider?
n.
1. An automated program
which crawls over the World
Wide Web, gathering web
pages for search engines.
Spiders will ignore sites that
explicitly state that they should
not be indexed by the search engines.
Also referred to as a
webcrawler, crawler, or bot.
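One common way for a site to state that it should not be indexed is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show how a well-behaved spider might consult such rules before fetching a page. The rules, the agent name `ExampleSpider`, and the URLs are invented for illustration, and no network access is involved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every spider is asked to stay out of /private/.
robots_txt = [
    "User-agent: *",
    "Disallow: /private/",
]

rules = RobotFileParser()
rules.parse(robots_txt)

def may_crawl(url, agent="ExampleSpider"):
    """Return True if the parsed rules allow this agent to fetch the URL."""
    return rules.can_fetch(agent, url)

print(may_crawl("http://example.com/index.html"))
print(may_crawl("http://example.com/private/diary.html"))
```

The first page is allowed and the second is not, so a polite spider would skip it.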
What are Meta Tags?
meta tags
n.
1. Attributes that describe information about
the content of the document. Some spiders
use these tags to determine the relevance of a
site to future queries.
Example
<META NAME="keywords" CONTENT="red sox world champions schilling manny damon">
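To give a rough idea of how a spider might read such a tag, here is a minimal Python sketch using the standard `html.parser` module. The class name `MetaKeywordParser` is hypothetical, and real spiders weigh many signals besides keywords.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the words listed in a page's keywords meta tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so META becomes meta.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            self.keywords.extend((attr.get("content") or "").split())

page = ('<html><head><META NAME="keywords" '
        'CONTENT="red sox world champions schilling manny damon"></head></html>')
parser = MetaKeywordParser()
parser.feed(page)
print(parser.keywords)
```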
How do search engines work?
What is Boolean Logic?
We use Boolean Logic to evaluate the
truth of one or more propositions. There
are three important operators: AND, OR,
NOT
• AND – true only if A and B are both
true.
• OR – true if A is true, B is true, or both are true.
• NOT – true only when A is false.
When searching for information, we use
Boolean logic to find results that are
relevant to our search terms. If a web
page is relevant to a search term, the
search engine evaluates the page as true.
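The three operators can be tabulated with a few lines of Python. This is only a sketch to make the definitions concrete; the function name `truth_table` is our own.

```python
from itertools import product

def truth_table():
    """Return rows of (A, B, A AND B, A OR B, NOT A) for every combination of A and B."""
    rows = []
    for a, b in product([True, False], repeat=2):
        rows.append((a, b, a and b, a or b, not a))
    return rows

print("A      B      A AND B  A OR B  NOT A")
for a, b, both, either, not_a in truth_table():
    print(f"{a!s:<6} {b!s:<6} {both!s:<8} {either!s:<7} {not_a!s}")
```

Note that OR is also true when both A and B are true.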
Examples of Searching
with Boolean Logic
• Yankees and Choke
– All web pages that contain the
terms Yankees and Choke.
• Yankees or Choke
– All web pages that contain the
word Yankees.
– All web pages that contain the
word Choke
– All web pages that contain the
terms Yankees and Choke
• Choke and not Yankees
– All web pages that contain the
word Choke, but don’t contain the
word Yankees
More Advanced Uses
of Boolean Logic
• If you are looking for a proper name, a
phrase, or another collection of words
that are normally found together, then
enclose them in double quotes, e.g.
"President Gerald Ford".
• If one or more words must be on the
web page, then use the logical And,
e.g. President And Ford And
"United States".
• If the web page may have different forms
of the name, or titles, etc., then use the
logical Or, e.g. President Or "Vice President"
Or Representative And "Gerald Ford".
• If the document should exclude a word or
phrase, then use the logical Not, e.g.
"Gerald Ford" Not "Ford automotive" and Not
"Ford car" and Not "Ford truck".
Other Helpful Hints
• While not Boolean logic, some search engines
also allow operators like NEAR and FOLLOWED BY
to indicate the relationship of words or phrases
to other words and phrases. Normally these
relations specify which word comes first or
whether a word is within a certain number of
words of the first word. This concept is called
proximity logic.
• Not all search engines use the AND, OR, NOT
notation. Some, like Alta Vista, use "+" for AND
and "-" for NOT.
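Proximity logic can be sketched in Python by comparing word positions. The function names and the `max_distance` parameter are our own assumptions. Each search engine defines NEAR and FOLLOWED BY in its own way, so treat this as an illustration rather than any engine's actual rule.

```python
def positions(tokens, word):
    """Return every index at which the word occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word.lower()]

def near(text, first, second, max_distance):
    """True if the two words occur within max_distance words of each other, in either order."""
    tokens = text.lower().split()
    return any(abs(i - j) <= max_distance
               for i in positions(tokens, first)
               for j in positions(tokens, second))

def followed_by(text, first, second, max_distance):
    """True if second occurs after first, at most max_distance words later."""
    tokens = text.lower().split()
    return any(0 < j - i <= max_distance
               for i in positions(tokens, first)
               for j in positions(tokens, second))

text = "Gerald Ford became president after serving as vice president"
print(near(text, "president", "Ford", 2))         # order does not matter
print(followed_by(text, "president", "Ford", 2))  # order matters
print(followed_by(text, "Ford", "president", 2))
```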
The Way Back Machine
• Frustrated by dead links? There is an answer:
The WayBack Machine at
http://www.archive.org/
• Just fill in the URL of the dead link, and the
WayBack Machine will show the history of the link
and allow you to view the page as it was archived.
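Instead of typing the dead link into the form, you can also build the archive address yourself. The Python sketch below relies on the `web.archive.org/web/<timestamp>/<url>` address pattern, which is our understanding of how the archive lays out its pages and is not taken from the slides. The helper names and the example URL are hypothetical, and the code only builds strings; it does not contact the archive.

```python
def wayback_history_url(dead_url):
    """Address that lists every archived copy of the given URL."""
    # A "*" in place of a timestamp asks for the full capture history.
    return "https://web.archive.org/web/*/" + dead_url

def wayback_snapshot_url(dead_url, timestamp):
    """Address of the archived copy closest to a YYYYMMDDhhmmss timestamp."""
    return "https://web.archive.org/web/" + timestamp + "/" + dead_url

dead_link = "http://www.example.com/old-page.html"
print(wayback_history_url(dead_link))
print(wayback_snapshot_url(dead_link, "20050101000000"))
```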
Tips for Using Search Engines
• When searching a large-scale
database, it is important to be extremely
precise.
• Avoid using vague or common words that
will only produce millions of pages.
• Read the instructions for each new search
engine you use. Search methods differ
among the search engines and subject
directories.
Finding Audio and Video
• Images.google.com – Good source of images
• www.dogpile.com – One of the few search
engines that provides searches for video.
• www.fazzle.com – Provides limited video and
image searching capabilities