INFORMATION ON THE INTERNET

WHAT HAVE WE DONE SO FAR?
 Weeks 1 – 8 : various components of an information retrieval system
 Now – look at various examples of information retrieval systems
• Internet
• Digital library
• OPAC
• Bibliographical systems
WMES3103 : INFORMATION RETRIEVAL
WEEK 12
SEARCHING THE WEB
INTERNET
 Different types of information on the Internet
• Journals, magazines, newspapers
• Databases
• Software
• Multimedia
• Organisational information
 Different types of Web sites
• Entertainment
• Business & marketing
• Reference/information
• News
• Personal web sites
Information Source (http://www.clearinghouse.net)
 Dictionaries & lists of acronyms
 Telephone & email directories
 Encyclopedias
 Thesauri
 Dictionaries of other languages
 Articles
 E-journals
 Contents pages of journals
 Directories
 TV & radio
 Newspapers
 etc
WEB
 The Web is a portion of the Internet
 Uses hypertext
 3 methods of searching for information on the Web
• Use a search engine
• Use a Web directory that classifies sites by subject
• Follow hyperlinks from page to page
PROBLEMS WITH THE WEB
 Data
a. Distributed data
b. High % of volatile data
c. Large volume
d. Unstructured and redundant data
e. Quality of data
f. Heterogeneous data
PROBLEMS WITH THE WEB
 User’s interaction with the IRS (information retrieval system)
a. How to specify a query?
b. How to interpret the answer provided by the system?
SEARCH ENGINES
 Single – use crawlers to find and retrieve information, store descriptors in their own index database, search that database and rank the results (AltaVista, Infoseek, Excite, Google, DirectHit, HotBot)
 Specialised – search for specific information only (Thomas, SOSIG, ERIC)
 Meta – use other search engines concurrently (Metacrawler, SavvySearch)
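The index-and-rank step of a single search engine can be sketched roughly as follows. This is a minimal illustration only: the pages, the example.org URLs and the function names are hypothetical, and ranking by raw word counts is an assumed simplification, not the method of any engine named above.

```python
from collections import Counter, defaultdict

# Hypothetical pages standing in for what a crawler would have fetched.
PAGES = {
    "http://example.org/a": "information retrieval on the web",
    "http://example.org/b": "web directories classify web sites by subject",
    "http://example.org/c": "digital library catalogue",
}

def build_index(pages):
    """Map each word to a Counter of {url: occurrences of that word}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Rank URLs by the total number of query-word occurrences (assumed scoring)."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken by URL to keep the order stable.
    return sorted(scores, key=lambda u: (-scores[u], u))

INDEX = build_index(PAGES)
print(search(INDEX, "web sites"))
```

The query is answered from the index alone, without rereading the page text, which is why an index database is built first.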
USER INTERFACE
 Query interface
• Basic : a box where the user can type in one or more words
• Complex : uses a command language with Boolean operators, phrase searching, proximity searching and wild cards (example : HotBot and Northern Light)
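As a rough sketch of what a complex query interface evaluates, Boolean operators can be mapped onto set operations, with separate helpers for wild cards and phrases. The documents and helper names below are hypothetical, and proximity searching is left out to keep the example short.

```python
import fnmatch

# Hypothetical document collection: id -> lower-case text.
DOCS = {
    1: "searching the web with a search engine",
    2: "web directory arranged by subject",
    3: "search engines rank web pages",
}

def docs_with(term):
    """IDs of documents containing a word; '*' in the term acts as a wild card."""
    pattern = term.lower()
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if any(fnmatch.fnmatchcase(word, pattern) for word in text.split())
    }

def docs_with_phrase(phrase):
    """IDs of documents containing the words of the phrase as a consecutive run."""
    wanted = phrase.lower().split()
    hits = set()
    for doc_id, text in DOCS.items():
        words = text.split()
        if any(words[i:i + len(wanted)] == wanted for i in range(len(words))):
            hits.add(doc_id)
    return hits

print(docs_with("web") & docs_with("search"))      # web AND search
print(docs_with("directory") | docs_with("rank"))  # directory OR rank
print(docs_with("web") - docs_with("search"))      # web NOT search
print(docs_with("search*"))                        # wild card
print(docs_with_phrase("search engine"))           # phrase search
```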
ANSWER INTERFACE
 Lists the 10 most relevant sites by ranking
 Ranking is based on the index and not on the full text
 Information shown – URL, size, date the page was indexed, page indexed, title and a few lines from the document, or descriptors, or a sentence
 Example : AltaVista, HotBot, Northern Light, Excite
 Arranged by ranking and relevance
 If there are too many hits, refine and resubmit the query
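A result page of this kind could be assembled roughly as below. The `Hit` record, its field names and the sample data are assumptions made for illustration; real engines differ in which fields they show and how they cut the snippet.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One indexed page, with hypothetical fields matching the slide's list."""
    url: str
    title: str
    size_bytes: int
    indexed_on: str
    text: str
    score: float

def answer_page(hits, page_size=10, snippet_len=60):
    """Return display entries for the top-ranked hits, best score first."""
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:page_size]
    entries = []
    for rank, h in enumerate(top, start=1):
        snippet = h.text[:snippet_len]  # first few characters as a stand-in for "a few lines"
        entries.append(
            f"{rank}. {h.title}\n"
            f"   {h.url} ({h.size_bytes} bytes, indexed {h.indexed_on})\n"
            f"   {snippet}"
        )
    return entries

# Twelve made-up hits; only the ten highest-scoring ones reach the page.
HITS = [
    Hit(f"http://example.org/{i}", f"Page {i}", 1000 + i, "2001-01-01",
        f"Sample text of page {i} about searching the web.", float(i))
    for i in range(12)
]
PAGE = answer_page(HITS)
print("\n".join(PAGE))
```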
WEB DIRECTORY
 Numerous search engines provide categorization of subjects
 Also known as catalogs, yellow pages or subject directories
 Web sites are submitted to the Web directory for checking; if accepted, a site is classified and added to the directory
 Example : Yahoo
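One simple way to picture a subject directory is as a tree of categories with sites attached to the nodes. The sketch below assumes a nested dictionary, and the category names, URLs and the `_sites` key are all hypothetical; the human checking step is not modelled.

```python
def add_site(directory, path, url):
    """File an accepted site under a category path such as ['Reference', 'Libraries']."""
    node = directory
    for category in path:
        node = node.setdefault(category, {})
    node.setdefault("_sites", []).append(url)

def browse(directory, path):
    """Return the sites filed directly under the given category path."""
    node = directory
    for category in path:
        node = node[category]
    return node.get("_sites", [])

DIRECTORY = {}
add_site(DIRECTORY, ["Reference", "Libraries"], "http://example.org/opac")
add_site(DIRECTORY, ["Reference", "Dictionaries"], "http://example.org/dict")
add_site(DIRECTORY, ["News"], "http://example.org/daily")

print(browse(DIRECTORY, ["Reference", "Libraries"]))
```

Users reach a site by walking down the categories rather than by typing a query.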
USER - PROBLEMS
 Unable to search for words
 Unable to find suitable words because they do not understand how the system looks for the selected words
 Do not understand the proper use of Boolean operators
EVALUATION OF WEB SITES - CRITERIA
 Accuracy
 Authority
 Objectivity
 Currency
 Coverage
 www.gvsu.edu/library/
TEN C’s FOR EVALUATING INTERNET SOURCES
 Content
 Credibility
 Critical thinking
 Copyright
 Citation
 Continuity
 Censorship
 Connectivity
 Comparability
 Context
 www.uwec.edu/library/guides/tencs.html
METASEARCHERS
 Web servers that send a query to several search engines, Web directories and other databases, then collect and collate the answers
 Example : Metacrawler, SavvySearch, Copernic
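The send, collect and collate steps might be sketched as follows. The two engine functions are stand-ins that return fixed lists instead of calling real services, and scoring merged results by summed reciprocal rank is an assumption chosen for illustration, not the documented method of the metasearchers named above.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    """Hypothetical engine: returns a ranked list of URLs for the query."""
    return ["http://example.org/1", "http://example.org/2", "http://example.org/3"]

def engine_b(query):
    """Second hypothetical engine with a different ranking."""
    return ["http://example.org/2", "http://example.org/4"]

def metasearch(query, engines):
    """Query all engines concurrently, then merge their ranked lists."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            # Assumed collation rule: a URL earns 1/rank from every engine listing it.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Duplicates collapse into one entry; best combined score comes first.
    return sorted(scores, key=lambda u: (-scores[u], u))

MERGED = metasearch("information retrieval", [engine_a, engine_b])
print(MERGED)
```

A page returned by several engines rises in the merged list, and each URL appears only once.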
INTERNET
 An information retrieval system
 Has input, process and output
 Has a relevance feedback cycle
 Components
• Retrieval evaluation
• Query language/operation
• Text operations
• Indexing & searching
• User interface