Transcript ird9-1_aia
Internet Resources Discovery (IRD)
Intelligent IRD
T.Sharon-A.Frank
Motivation for Intelligence
“We are drowning in information but starved for knowledge.”
John Naisbitt
Content
• Classical IRD characteristics and the
Information food chain
• Agents - Softbots family
• Meta SE - Metacrawler
• Homepage finder - Ahoy!
• ILA – Internet Learning Agent
• Shopbot – Jango et al.
See Oren Etzioni’s Web site at:
http://www.cs.washington.edu/research/projects/WebWare1/www/softbots/softbots.html
Classical IRD Characteristics
• Massive memory and network resources
required.
• Amortized over millions of queries per day.
• Minimal cycles devoted to each individual.
• No memory of previous requests.
• Least common denominator service.
No Time for Intelligence!
Classical Information Food Chain
Intelligent Information Food Chain
Definition: Softbots
• Softbots are intelligent agents that use
software tools and services on a
person’s behalf.
• Make intensive use of artificial
intelligence (AI) techniques:
planning, scheduling, learning, etc.
Softbot Family Tree
BargainFinder
Rodney
Sims
Simon
MetaCrawler
InfoManifold
Occam
Ahoy!
ShopBot
ILA
General problems to be solved
• Discovery
– How to find new information sources (IS)?
• Extraction
– What to send and how to parse the response?
• Translation
– How to interpret the response in terms of internal concepts?
• Evaluation
– How to evaluate the quality of an IS?
Main Focus of the Robots
Discovery, Evaluation:
Extraction:
Ahoy!
Translation:
Metacrawler
ILA
Meta Search Engine
(Figure: MetaCrawler and the search services it queries: Yahoo, WebCrawler, Open Text, Lycos, InfoSeek, Inktomi, Galaxy, Excite.)
Search Service - Motivation
1. The number and variety of Search services.
2. Each service provides an incomplete snapshot of the Web.
3. Users are forced to try and retry their queries across
different indices.
4. Each service has its own interface.
5. Irrelevant, outdated or unavailable responses.
6. There is no time for intelligence.
7. Each query is independent.
8. No individual customization.
9. The result is not homogenized.
The Web Community Demands
• Robustness
– A working system, accessible 24 hours a day.
• Speed
– Transmitting useful information within seconds.
• Added Value
– Any increase in sophistication had better yield a
tangible benefit to users.
Premises of MetaCrawler
• No single search is sufficient.
• Problem in expressing the query.
• Low quality references can be detected.
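The third premise can be illustrated with a minimal sketch of a verification pass: a reference is dropped if its page cannot be retrieved or if it does not mention every query term. This is only an assumed heuristic, not MetaCrawler's actual test. The names `verify`, `fake_fetch`, and `PAGES` are hypothetical, and the in-memory `PAGES` table stands in for real network downloads.

```python
# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://example.org/softbots": "Softbots are intelligent software agents.",
    "http://example.org/cooking": "Recipes for soup and bread.",
}

def fake_fetch(url):
    """Return the page text, or raise OSError if the page is unavailable."""
    if url not in PAGES:
        raise OSError("page unavailable: " + url)
    return PAGES[url]

def verify(urls, query, fetch):
    """Keep only URLs whose page loads and contains every query term."""
    terms = query.lower().split()
    kept = []
    for url in urls:
        try:
            text = fetch(url).lower()
        except OSError:
            continue  # dead or unreachable reference: drop it
        if all(term in text for term in terms):
            kept.append(url)
    return kept

candidates = [
    "http://example.org/softbots",
    "http://example.org/cooking",   # reachable but irrelevant
    "http://example.org/missing",   # unavailable
]
verified = verify(candidates, "intelligent agents", fake_fetch)
print(verified)
```

Passing the fetch function as a parameter keeps the filter independent of how pages are actually downloaded.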
MetaCrawler
MetaCrawler is a Meta-Service
• It doesn’t use a database of its own.
• It uses other external search services that
provide the information necessary to fulfill user
queries.
MetaCrawler Advantages
• It accesses multiple databases and provides a large
number of higher-quality references.
• It does not depend upon the implementation or
existence of any specific search service.
• It accesses the search services simultaneously.
• Users need not remember the address,
interfaces, … of each search service.
How Does It Work?
• It currently accesses a few services: InfoSeek,
Lycos, WebCrawler, Yahoo, etc.
• It submits a query to every search service it
knows in parallel.
• It collates the results by merging all hits
returned.
• It offers sorting and verification options.
• It presents a results page consisting of a list of
references.
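The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not MetaCrawler's implementation: `service_a` and `service_b` are hypothetical stand-ins for real search services, and the merge rule (normalize each service's scores, then sum the scores of duplicate URLs) is an assumed choice for collating hits.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real search services. Each takes a query and
# returns (url, score) pairs, with scores on that service's own scale.
def service_a(query):
    return [("http://example.org/a", 90.0), ("http://example.org/b", 45.0)]

def service_b(query):
    return [("http://example.org/b", 0.8), ("http://example.org/c", 0.4)]

def normalize(hits):
    """Scale one service's scores so that its best hit scores 1.0."""
    top = max((score for _, score in hits), default=0.0)
    if top <= 0:
        return [(url, 0.0) for url, _ in hits]
    return [(url, score / top) for url, score in hits]

def meta_search(query, services):
    """Query every service in parallel, merge hits by URL, sort by score."""
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        per_service = list(pool.map(lambda service: service(query), services))
    merged = {}
    for hits in per_service:
        for url, score in normalize(hits):
            # A URL returned by several services accumulates their scores.
            merged[url] = merged.get(url, 0.0) + score
    # Highest combined score first; ties broken alphabetically by URL.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

results = meta_search("softbots", [service_a, service_b])
for url, score in results:
    print(f"{score:.2f}  {url}")
```

Because each service is just a callable, adding or removing a search service only changes the list passed to `meta_search`.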
Meta-Search
• http://www.metacrawler.com
Meta Search Results