Mastering the Internet, XHTML, and JavaScript

Chapter 7
Searching the Internet
Outline
Goals and Objectives
Chapter Headlines
Introduction
Directories
Open Directory Project
Search Engines
Metasearch Engines
Search Techniques
Intelligent Agents
Invisible Web
Summary
Chapter 7 - Searching the Internet
Goals and Objectives
Goals
  Understand and master searching the internet to find relevant information fast; know what information to search for and how to search for it
Objectives
  Subject directories
  Open Directory Project
  Search and metasearch engines
  Search techniques
  Intelligent agents
  The visible web
  The invisible web
  Search techniques for the invisible web

Chapter Headlines
7.1 Introduction
  Use directories or search engines to search the internet
7.2 Directories
  Search a subject tree manually or use its search engine
7.3 Open Directory Project
  Users rank search results according to their expertise
7.4 Search Engines
  How do search engines do their amazing job?
7.5 Metasearch Engines
  Use multiple search engines at once
7.6 Search Techniques
  Search the internet more effectively
7.7 Intelligent Agents
  Search the internet more effectively
7.8 Invisible Web
  Search the internet more effectively
Introduction
Search engines are a primary tool for finding the information we want on the internet
Internet searching can be time-consuming
To formulate a strategic search, the user must know what to search for and how to search for it
Search results are generally web pages
The two main tools for searching the internet are:
  Directory – a subject guide organized by major topics and subtopics
  Search engine – software that searches the internet
Each tool uses a database:
  A directory's database is compiled by humans
  A search engine's database is generated automatically by software
Directories
Directories are human-powered search engines
A directory organizes information by subject in a hierarchical tree
Subjects at the top of the tree are very general; subjects at the bottom are specialized
There are two ways to search a directory:
  Manual – browse the directory subjects hierarchically
  Search engine – type the search words in a search field
Example – in the Yahoo Directory at http://www.yahoo.com, the user can browse the subjects by clicking on them or use the search field at the top of the directory to type in search words, phrases, etc.
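The two ways of searching a directory can be sketched in Python. This is only an illustration, not how the Yahoo Directory works internally: the subject tree, the site names, and the functions `browse` and `search` are all hypothetical.

```python
# A tiny subject tree: general subjects at the top, specialized ones below.
# All subjects and site names are made up for this illustration.
DIRECTORY = {
    "Science": {
        "Astronomy": {"_sites": ["Planet Guide", "Star Charts"]},
        "Biology": {"_sites": ["Cell Atlas"]},
    },
    "Recreation": {
        "Sports": {"_sites": ["Soccer News"]},
    },
}


def browse(tree, path):
    """Manual search: walk down the tree one subject at a time."""
    node = tree
    for subject in path:
        node = node[subject]
    # At a leaf, list the sites; otherwise list the subtopics to click next.
    return node.get("_sites", sorted(node))


def search(tree, word, trail=()):
    """Search-field search: find every site whose name contains the word."""
    hits = []
    for key, value in tree.items():
        if key == "_sites":
            location = " > ".join(trail)
            hits += [(location, site) for site in value if word.lower() in site.lower()]
        else:
            hits += search(value, word, trail + (key,))
    return hits
```

Calling `browse(DIRECTORY, ["Science"])` lists the subtopics under Science, while `search(DIRECTORY, "star")` goes straight to the matching site and reports where it sits in the tree.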
Open Directory Project
Search results from the internet are ranked
Directories use human editors to rank web pages
As the number of pages on a topic increases, ranking them becomes more time-consuming and costly
The Open Directory Project hands the ranking over to the users
Users become editors and evaluate web sites in their area of expertise
As a result of this project, directory services have a lot more content
Visit http://dmoz.org for more information about the Open Directory Project
Search Engines
Examples of some of the important search engines are:
  1. Google: http://www.google.com
  2. Yahoo: http://www.yahoo.com
  3. MSN Search: http://search.msn.com
  4. AOL Search: http://search.aol.com
  5. Ask Jeeves: http://www.askjeeves.com
  6. AltaVista: http://www.altavista.com
Search engines provide comprehensive coverage and great relevancy
To use a search engine, type a search string in the search field and then click the search button
Tips on using a search engine are available at http://www.searchenginewatch.com
Search Engines
Search engines are crawler-based, i.e. their listings are created automatically
A search engine has three important parts:
  1. Spider: a robot program that finds web pages by following the links already in the engine's database
  2. Indexer: identifies the content of each web page and stores it in the database files of the search engine
  3. Searcher: sifts through the engine's index to find matches to a search string and ranks the matches
Every result of a search is known as a hit
The Title/Frequency method is used to rank the results
Search engine results may not always be relevant
Different search engines produce different results
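The three parts of a search engine can be sketched in Python as follows. This is a toy illustration, not the design of any real engine: the in-memory `WEB` and its `.example` addresses are invented. The slides do not define the Title/Frequency method, so the scoring here is an assumption: a page scores one point for each occurrence of a keyword in its text, plus a bonus (the hypothetical `title_weight`) for each keyword in its title.

```python
from collections import Counter, deque

# A made-up, in-memory "web": URL -> (title, body text, outgoing links).
WEB = {
    "a.example": ("Search engines", "search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("Cooking tips", "search for recipes and more recipes", ["a.example"]),
    "c.example": ("Web directories", "a directory lists web sites by subject", []),
}


def spider(seed):
    """Spider: find pages by following links, starting from a known URL."""
    seen, queue = [], deque([seed])
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.append(url)
        queue.extend(WEB[url][2])
    return seen


def indexer(urls):
    """Indexer: store the title words and the word counts of each page."""
    index = {}
    for url in urls:
        title, body, _links = WEB[url]
        index[url] = (set(title.lower().split()), Counter(body.lower().split()))
    return index


def searcher(index, search_string, title_weight=5):
    """Searcher: match keywords against the index and rank the hits."""
    words = search_string.lower().split()
    hits = []
    for url, (title_words, counts) in index.items():
        frequency = sum(counts[w] for w in words)
        in_title = sum(1 for w in words if w in title_words)
        score = frequency + title_weight * in_title  # assumed scoring rule
        if score > 0:
            hits.append((score, url))
    return [url for _score, url in sorted(hits, reverse=True)]
```

With `index = indexer(spider("a.example"))`, the call `searcher(index, "search")` ranks `a.example` ahead of `b.example` because the keyword appears in the first page's title.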
Metasearch Engines
Metasearch engines perform a multi-engine search, i.e. they search other search engines
A metasearch engine expands the internet search by using multiple search engines
A metasearch engine skips any search engine that is down
Metasearch engines do not have their own databases; they use the databases maintained by other engines
Examples are:
  http://www.dogpile.com
  http://www.metacrawler.com
  http://www.profusion.com
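The behavior described above can be sketched in Python. The three engine functions are hypothetical stand-ins that return fixed results; a real metasearch engine would send the query over the network, and how it merges and orders the hits is an assumption made here for illustration.

```python
# Stand-ins for real search engines; names and results are hypothetical.
def engine_one(query):
    return ["x.example", "y.example"]


def engine_two(query):
    raise ConnectionError("engine is down")


def engine_three(query):
    return ["y.example", "z.example"]


def metasearch(query, engines):
    """Send one query to every engine and merge the hits without duplicates."""
    merged = []
    for engine in engines:
        try:
            hits = engine(query)
        except ConnectionError:
            continue  # skip an engine that is down
        for url in hits:
            if url not in merged:
                merged.append(url)
    return merged
```

Here `metasearch("xhtml", [engine_one, engine_two, engine_three])` skips the second engine and returns the combined hits of the other two, each listed once. The function keeps no database of its own; it only relays what the engines return.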


Search Techniques
A basic search involves typing a search string in a search field; it usually gives satisfactory results
Searching guidelines:
  Search engines list the best results first
  Change the search string to get better results
  The search string is treated as keywords, not as an exact phrase
Common advanced searching techniques are:
  Words and exact phrase
  Boolean search – uses Boolean operators such as AND, OR, NOT
  Title search – uses the web page title
  Site search – limits the search to a particular host name
  URL search, link search
  Wildcard (fuzzy) search – uses the * symbol
  Features search – uses special features of search engines
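A few of these techniques can be illustrated with a short Python sketch over three made-up pages. The page texts and function names are hypothetical. Real search engines each have their own query syntax for these features, which is not shown here.

```python
from fnmatch import fnmatchcase

# Hypothetical page texts used only for this illustration.
PAGES = {
    "p1": "xhtml tutorial for beginners",
    "p2": "javascript tutorial with examples",
    "p3": "searching the internet with javascript",
}


def words(page):
    return set(PAGES[page].split())


def boolean_and(a, b):
    """Pages that contain both keywords."""
    return sorted(p for p in PAGES if a in words(p) and b in words(p))


def boolean_or(a, b):
    """Pages that contain at least one of the keywords."""
    return sorted(p for p in PAGES if a in words(p) or b in words(p))


def boolean_not(a, b):
    """Pages that contain a but NOT b."""
    return sorted(p for p in PAGES if a in words(p) and b not in words(p))


def exact_phrase(phrase):
    """Pages that contain the words in exactly this order (simple substring test)."""
    return sorted(p for p in PAGES if phrase in PAGES[p])


def wildcard(pattern):
    """Pages with any word matching a pattern such as 'search*'."""
    return sorted(p for p in PAGES if any(fnmatchcase(w, pattern) for w in words(p)))
```

For example, `boolean_and("javascript", "tutorial")` matches only `p2`, while `boolean_or("xhtml", "javascript")` matches all three pages, which shows how AND narrows a search and OR widens it.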
Intelligent Agents
New searching paradigms are needed to cope with the explosive growth of online information and databases
Three retrieval paradigms exist:
  Statistical – correlations of word counts in documents
  Semantic – natural language processing and artificial intelligence
  Contextual – uses a thesaurus and encoded relationships
An intelligent agent is a program that gathers information or performs services based on human input
It uses the three paradigms above together with other algorithms
An example of an intelligent agent is the Spider part of a search engine
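The statistical paradigm can be illustrated with a small Python sketch that compares two texts by their word counts. The slides do not say which statistic an agent uses; cosine similarity is chosen here as one simple, common measure, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    """Count how often each word occurs in a text."""
    return Counter(text.lower().split())


def similarity(text_a, text_b):
    """Cosine similarity of two word-count vectors.

    0 means the texts share no words; values near 1 mean a very similar mix of words.
    """
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0
```

An agent could use a score like `similarity(query, page_text)` to decide which pages are worth gathering. A purely statistical measure knows nothing about meaning, which is where the semantic and contextual paradigms come in.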
Intelligent Agents
Advantages of intelligent agents are:
  More intelligent search
  Create and update their own knowledge database
  Perform tasks more quickly
  Communicate and cooperate with other agents
  Available all the time
  Customizable
  Continuously scan the internet for information
Invisible Web
A search engine cannot find all content because of format problems
The visible web is the content that we can see in search results
The invisible web is the hidden web content
The invisible web is estimated to be bigger than the visible web
A user can search the invisible web by using the following:
  Directories (Invisible Web Catalogue)
  Databases
  Search engines – Google and AllTheWeb
Summary
• The internet is a vast repository of information
• Search engines are used to search the internet
• Directories maintain hierarchical trees organized by subject
• The Open Directory Project is based on volunteer work and leads to a superior method of organizing web content
• Google is the most popular of the search engines
• Metasearch engines are convenient for searching multiple search engines at once
• A basic search is enough most of the time
• Intelligent agents allow intelligent searching
• The invisible web may contain useful information