General & Specialized Search Engines


How Search Engines Work
General Search Strategies
Dr. Dania Bilal
IS 587
SIS Fall 2007
Fun Quiz
Take the search engine quiz located at
http://websearch.about.com/library/quizzes/search_engine_quiz/blsearchenginequiz.htm
Record the number of incorrect answers
Share the results of the quiz with a
classmate.
How Do Search Engines Work?
They collect information from selected web sites
They employ special software robots, called spiders,
to crawl web pages
Spiders build lists of the words found in Web sites.

When a spider is building its lists, the spider is Web
crawling.
Spiders store the lists in the engine’s database
The engine’s indexing software builds an index of
words
Information is matched against query input and
retrieved (processing algorithm)
How Do Spiders and Crawlers Work?
They begin with popular and heavily used
web servers.
Starting from a popular site, they collect the
words on its pages and follow every link
found within the site.

Spiders travel across pages and the most
widely used portions of the Web
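
The crawl described above can be sketched in a few lines. This is a minimal illustration, not how any real engine is built: the `PAGES` dictionary is a made-up, in-memory stand-in for the Web, and the function name `crawl` is hypothetical. It starts at one popular page, collects the words on it, and follows every link it has not yet seen.

```python
from collections import deque
import re

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "http://popular.example/": (
        "search engines crawl the web",
        ["http://popular.example/a", "http://other.example/"],
    ),
    "http://popular.example/a": (
        "spiders build lists of words",
        ["http://popular.example/"],
    ),
    "http://other.example/": ("the web is large", []),
}

def crawl(seed, pages):
    """Breadth-first crawl from a seed URL, collecting the words on each page."""
    word_lists = {}
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        text, links = pages[url]
        # Build the list of words found on this page.
        word_lists[url] = re.findall(r"[a-z]+", text.lower())
        # Follow every link that has not been visited yet.
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return word_lists

word_lists = crawl("http://popular.example/", PAGES)
print(sorted(word_lists))
```

A real spider would fetch pages over the network and run many such crawls in parallel; the queue-and-seen-set pattern is the part this sketch is meant to show.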
How Do Spiders and Crawlers Work?
A dedicated server of URLs is built by a
search engine company (e.g., Google) so
that spiders collect information quickly
More than one spider is used to crawl web
pages at a time

Google uses 3-4 spiders and collects over 100
pages per second
How Do Spiders and Crawlers Work?
When no dedicated URL server is used, the
search engine company relies on an ISP to
translate domain names into addresses
for crawling the Web, which can cause:

Delay in gathering information
Delay in updating information
Lack of control over URL addresses
Google Spider and How it Works
A spider looks at the HTML, XML, or other
coding used to build a web page and collects
information from the meta-tags
It indexes words within the actual text of a
page
It indicates where the words were found
(URL, title, headings, etc.)
It disregards initial articles
It disregards pages that should not be
crawled or indexed
Google Spider and How it Works
It uses the Robots Exclusion Protocol to disregard
pages

Implemented in the meta-tag section at the beginning
of a Web page
Tells a spider to leave the page alone: neither index
the words on the page nor try to follow its links
Franklin, C. How Internet Search Engines Work.
http://computer.howstuffworks.com/searchengine.htm
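
As an illustration of the meta-tag form of robot exclusion, the sketch below reads a page's `robots` meta tag with Python's standard `html.parser` module. The class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are hypothetical. Note that, in practice, exclusion rules are also commonly published in a site-wide `robots.txt` file, which this sketch does not cover.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

# Hypothetical page that asks spiders to leave it alone.
page = ('<html><head><meta name="robots" content="noindex, nofollow">'
        '</head><body>Private</body></html>')
directives = robots_directives(page)
may_index = "noindex" not in directives     # may the words be indexed?
may_follow = "nofollow" not in directives   # may the links be followed?
print(may_index, may_follow)
```

A well-behaved spider would check these two flags before indexing the page or queuing its links.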
How Do Search Engines Store Indexed Words?
The process varies among engines
Words are stored with the number of times they
appear on a page (posting)
Weight is assigned to each word.
Words appearing near the top of a page, in
subheadings, in links, in meta tags, or in the
title may be given more weight than other words.
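
A small sketch may help make postings and weights concrete. The weights in `FIELD_WEIGHTS` are invented for illustration (engines do not publish theirs), and the function name `index_page` and the sample pages are hypothetical. For each word, the index records the pages it appears on, how many times, and an accumulated weight that depends on where the word was found.

```python
from collections import defaultdict
import re

# Assumed, illustrative weights; real engines use their own schemes.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def index_page(index, url, fields):
    """Add one page to the index. `fields` maps a field name to its text."""
    for field, text in fields.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            posting = index[word].setdefault(url, {"count": 0, "weight": 0.0})
            posting["count"] += 1                     # number of occurrences
            posting["weight"] += FIELD_WEIGHTS[field]  # location-based weight

index = defaultdict(dict)
index_page(index, "http://example.com/a", {
    "title": "Squirrel facts",
    "body": "A squirrel stores nuts. Squirrel nests are called dreys.",
})
index_page(index, "http://example.com/b", {
    "title": "Garden birds",
    "body": "A squirrel may raid bird feeders.",
})

print(index["squirrel"])
```

Here the word "squirrel" gets a higher weight on the first page because it occurs more often there and also appears in the title.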
How Do Search Engines Store Indexed Words?
Information is encoded to save space
Information is indexed

An index of words is built by the automatic
indexer (indexing software)
A hash table is created with an assigned
weight or value for each word indexed
Hashing distributes entries evenly, so that
popular entries (e.g., words starting with M) and
less popular ones (e.g., words starting with X)
can all be retrieved quickly
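
To see why hashing helps, compare two ways of filing words into buckets: by first letter, and by a hash of the whole word. This is a toy sketch; the word list, the bucket count of 8, and the choice of CRC-32 (via Python's standard `zlib.crc32`) as the hash function are all assumptions made for illustration.

```python
import zlib
from collections import Counter

def bucket_by_letter(word):
    """File a word under its first letter, as in a printed dictionary."""
    return word[0]

def bucket_by_hash(word, n_buckets=8):
    """File a word under a number computed from all of its characters."""
    return zlib.crc32(word.encode("utf-8")) % n_buckets

words = ["map", "mail", "music", "movie", "market", "money", "media",
         "xml", "xray"]

by_letter = Counter(bucket_by_letter(w) for w in words)
by_hash = Counter(bucket_by_hash(w) for w in words)

print("by first letter:", dict(by_letter))
print("by hash:", dict(by_hash))
```

Filing by first letter puts seven of the nine words in the "m" bucket. A hash function ignores spelling patterns, so over a large vocabulary the buckets tend to fill at similar rates and no single bucket becomes slow to search.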
Using General Directories
Yahoo and its family
Browsing directory


Directory database
Small and human-selected and indexed
Searching using keywords



Search database
Larger and non-selective database
Spider and machine indexing
Yahoo
Yahoo.com



Works like a search engine rather than a
directory
Searches the web
Exercise: search under my name and see
how Yahoo processes the query while you are
typing it
The directory is found under "More" or at

http://search.yahoo.com/dir
Yahoo Search Engine
Search






Web
Images
Videos
Local information
Shopping
More…
Yahoo Advanced Search
Advanced Search feature



Shown on screen after you perform a search,
or by going directly to
http://search.yahoo.com/web/advanced?ei=UTF-8&p=dr+dania+bilal&fr=yfp-t-471
Lots of search features to explore
Yahoo Advanced Search Features
Boolean
Phrase
Currency
Domain
File format
Country
Language
Other
Yahoo Advanced Search Features
Exercise


Perform a search on a topic of your choice
Use Boolean equivalents
All the words = AND
The exact phrase = phrase (proximity) search
Any of these words = OR
None of these words = NOT



Choose part of page to search
Choose language other than English
Report results in class
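
The Boolean equivalents in this exercise map directly onto set operations. The sketch below assumes a tiny, made-up index (`INDEX`) from words to the IDs of pages that contain them; the function names are hypothetical. Phrase searching is left out because it needs word positions, not just page IDs.

```python
# Hypothetical tiny index: word -> set of page IDs containing it.
INDEX = {
    "search": {1, 2, 3},
    "engine": {1, 3},
    "directory": {2, 4},
    "yahoo": {2, 3, 4},
}
ALL_PAGES = {1, 2, 3, 4}

def all_words(words):
    """'All the words' = AND: pages containing every word."""
    result = set(ALL_PAGES)
    for word in words:
        result &= INDEX.get(word, set())
    return result

def any_words(words):
    """'Any of these words' = OR: pages containing at least one word."""
    result = set()
    for word in words:
        result |= INDEX.get(word, set())
    return result

def none_of(words, candidates):
    """'None of these words' = NOT: drop pages containing any of the words."""
    return candidates - any_words(words)

print(all_words(["search", "engine"]))          # pages 1 and 3
print(any_words(["engine", "directory"]))       # pages 1, 2, 3 and 4
print(none_of(["yahoo"], all_words(["search"])))  # page 1 only
```

AND narrows a result set, OR widens it, and NOT removes pages from whatever the other terms retrieved.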
Yahoo Search Services
For searching specific content areas, such as:

Search Services
Web Search: Find anything from across the Web
Answers: Ask questions and get answers from real people
Audio Search: Find over 50 million audio files from across the Web
Creative Commons Search: Find Creative Commons content that you can share or re-use in your own works
Directory Search: Search or browse Yahoo!'s categorized guide to the Web
Image Search: Find over 1.6 billion photos and illustrations from all over the Web
Job Search: Search for jobs, post your resume and more on Yahoo! HotJobs
Local: Find everything in your area from dry cleaners to day spas
Maps: Find maps and driving directions for anywhere you want to go
Mobile Search: Find whatever, wherever you are
My Web (Beta): The newest way to save, share and organize any page you want on the Web
News Search: Search for news stories and related photos, videos and audio clips
Yahoo Next
http://next.yahoo.com/


Cutting-edge technology at Yahoo
Blogs, Web 2.0, use of AlltheWeb, Yahoo
Maps, podcasts, audio, and all other features
that are in beta testing
Yahoo Preferences
Customize Yahoo to fit your needs
Go to Preferences from the Web search
page
Edit preferences based on your needs
Edited preferences are saved in the browser
on your desktop
General Search Strategies
in Search Engines
Strategies
Boolean
Boolean equivalents
Proximity and phrase searching
Searching within a field
Search limits
Yahoo Search Strategies
Explore Yahoo’s help page
Read the Search Tips
Read about the search limit parameters, such as

intitle:
url:
inurl:
Read how to use Boolean equivalents and
other search parameters
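
Field-limit parameters can be pictured as prefixes that tell the engine which part of a document to look in. The sketch below is a toy model, not Yahoo's actual query processing: the function names, the sample documents, and the exact matching rules (substring match for `intitle:` and `inurl:`, whole-URL match for `url:`) are assumptions made for illustration.

```python
KNOWN_FIELDS = ("intitle", "inurl", "url")

def parse_query(query):
    """Split a query into (field, term) pairs; unprefixed terms use 'any'."""
    parsed = []
    for token in query.split():
        field, sep, term = token.partition(":")
        if sep and term and field.lower() in KNOWN_FIELDS:
            parsed.append((field.lower(), term.lower()))
        else:
            parsed.append(("any", token.lower()))
    return parsed

def matches(doc, parsed):
    """Return True if the document satisfies every (field, term) pair."""
    title = doc["title"].lower()
    url = doc["url"].lower()
    text = title + " " + doc["text"].lower()
    for field, term in parsed:
        if field == "intitle":
            ok = term in title
        elif field == "inurl":
            ok = term in url
        elif field == "url":
            ok = term == url      # assumed: url: names one exact document
        else:
            ok = term in text
        if not ok:
            return False
    return True

# Hypothetical documents.
DOCS = [
    {"url": "http://example.edu/squirrels", "title": "Squirrel Hibernation",
     "text": "Do squirrels hibernate in winter"},
    {"url": "http://example.com/birds", "title": "Garden Birds",
     "text": "Squirrels raid feeders"},
]

parsed = parse_query("intitle:squirrel inurl:edu hibernate")
hits = [d["url"] for d in DOCS if matches(d, parsed)]
print(hits)
```

Each engine documents its own operators and matching rules on its help page, so check those before relying on any particular prefix.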
General Search Engines
Besides Yahoo Search
Engines and Information Need
Several general search engines on the
Web
Select engine(s) that best fit your need
Visit the Web Search Guide for latest
information:

http://websearch.about.com/od/generalsearchengines/General_AllPurpose_Search_Engines.htm
Hands-on Activity
Browse the list of general search engines in the Web
Search Guide
Explore 4 of the engines listed

Wisenut, Snap.com, Lycos, Exalead
Search under my name in each engine
Compare the results by viewing the first two pages
retrieved
How many overlaps were found among the four
engines?
How many unique results were found in each engine?
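
Once the result URLs from each engine are written down, overlaps and unique results can be counted with set operations. In the sketch below the URLs are made-up placeholders, not real results from these engines; replace them with what you actually retrieve.

```python
from collections import Counter

# Placeholder result sets; substitute the URLs you actually retrieved.
results = {
    "Wisenut": {"http://a.example/", "http://b.example/", "http://c.example/"},
    "Snap.com": {"http://a.example/", "http://d.example/"},
    "Lycos": {"http://a.example/", "http://b.example/", "http://e.example/"},
    "Exalead": {"http://a.example/", "http://f.example/"},
}

# URLs returned by every engine.
found_by_all = set.intersection(*results.values())

# How many engines returned each URL.
counts = Counter(url for urls in results.values() for url in urls)

# URLs returned by two or more engines (overlaps).
overlapping = {url for url, n in counts.items() if n >= 2}

# URLs returned by exactly one engine, grouped by engine.
unique = {engine: {u for u in urls if counts[u] == 1}
          for engine, urls in results.items()}

print(len(found_by_all), len(overlapping))
for engine, urls in unique.items():
    print(engine, len(urls))
```

Normalize the URLs before comparing (for example, trailing slashes and "www." prefixes), since otherwise the same page can be counted as two different results.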
Specialized Search Engines
Web Search Guide has a listing of
specialized search engines
The Web companion to the textbook, chapter 3,
describes a variety of specialized engines
Explore chapter 3 to familiarize yourself with
the engines described
Hands-on Activity
Find the answer or relevant information for
these two queries using an appropriate,
specialized search engine:


Do squirrels hibernate?
Find me a list of foreign-owned companies
based in the U.S., organized by state.