FindUR: Knowledge Assisted Search

Ontological Issues in
Knowledge-Enhanced Search
Deborah McGuinness
AT&T Labs Research
Florham Park, NJ 07932 USA
contact email: [email protected]
FindUR homepage:
http://www.research.att.com/~dlm/findur
FOIS ’98
Contributors
Tom Beattie
Beth Cataldo
Ihung (Kyle) Chang
Curtis Chen
Lisa Croel
Martha Desmond
Paul Fuoss
Karrie Hanson
Pam Kirkbride
Dave Kormann
Harley Manning
Russ Maulitz
Mark Plotnick
Lori Alperin Resnick
Beth Robinson
Steve Solomon
Outline
• Motivation
• Basic Solution
• Ontological Issues
• Future Work
• Conclusion
Motivation
Queries miss relevant documents because:
• queries are naive
• documents do not contain “perfect” content
Solutions
• Augment Documents
- tag all pages with (controlled vocabulary) meta tags
- labor is distributed, and every contributor must be trained
- approach does not scale, especially if content changes
• Augment Index
- centralized labor cost
- must re-index every time meta tag language changes
• Augment Query (manually)
- requires user training
• Augment Query (automatically)
- no user training or content provider training
- centralized labor cost, no rework needed
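Automatic query augmentation, the last option above, can be illustrated with a minimal sketch. It assumes a small hand-built synonym table; the `SYNONYMS` entries and the plain OR output format are hypothetical and are not FindUR's actual topic sets or Verity's query syntax.

```python
# Hypothetical background knowledge: a preferred term mapped to equivalent phrasings.
SYNONYMS: dict[str, list[str]] = {
    "artificial intelligence": ["ai", "machine intelligence"],
    "knowledge representation": ["kr"],
}


def augment_query(query: str) -> list[str]:
    """Return the normalized query followed by every equivalent term from SYNONYMS."""
    q = query.strip().lower()
    terms = [q]
    for head, alternatives in SYNONYMS.items():
        group = [head, *alternatives]
        # Match in either direction, so "ai" also pulls in the full phrase.
        if q in group:
            for term in group:
                if term not in terms:
                    terms.append(term)
    return terms


def to_or_query(terms: list[str]) -> str:
    """Render terms as a generic quoted OR expression (illustrative format only)."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_or_query(augment_query("AI")))
```

Because the expansion happens centrally at query time, editing the table affects every later search without re-tagging pages or re-indexing, which is the "no rework needed" property listed above.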
FindUR
• Address issues of recall, rank ordering, and
browsing
• utilizing available knowledge
• in a standard search platform
• deploy, test, and maintain on websites
Background Knowledge
Supports:
• Retrievals of previously missed relevant
documents
• More relevant retrievals scored higher than
less relevant documents
• Simple user generation and refinement of
queries
• User expectation setting
FindUR Architecture
Content to Search:
• Research Site
• Technical Memorandum
• Calendars (Summit 2005, Research)
• Yellow Pages (Directory Westfield)
• Newspapers (Leader)
• Internal Sites (Rapid Prototyping)
• AT&T Solutions
• Worldnet Customer Care
• Medical Information
Search Technology:
• Content (Web Pages or Databases)
• Content Classification
• CLASSIC Knowledge Representation System
• Domain Knowledge
• Search Engine: Verity (and topic sets)
• Collaborative Topic Set Tool
• GUI supporting browsing and selection
User Interface:
• Query Input
• Results (standard format)
• Results (domain specific)
Implemented with Verity SearchScript, JavaScript, HTML, CGI, and CLASSIC.
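The Content Classification step can be pictured with a small sketch, assuming that classifying a page simply means tagging it with every topic whose evidence terms it mentions. The `TOPICS` table and the whole-word matching rule are hypothetical; in FindUR the topic knowledge was maintained with CLASSIC and Verity topic sets.

```python
import re

# Hypothetical topic sets: topic name -> terms that count as evidence for it.
TOPICS: dict[str, set[str]] = {
    "knowledge representation": {"description logic", "classic", "ontology"},
    "calendars": {"event", "schedule", "meeting"},
}


def classify(text: str) -> set[str]:
    """Return every topic with at least one evidence term in the text."""
    lowered = text.lower()
    found: set[str] = set()
    for topic, terms in TOPICS.items():
        for term in terms:
            # Word boundaries keep "event" from matching inside "prevent".
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                found.add(topic)
                break
    return found


print(classify("A tutorial on description logic and the CLASSIC system"))
```

A page may land in several topics at once, which is one plausible basis for grouping results for browsing and selection.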
FindUR improves search by:
• Retrieving previously missed relevant
documents
• More appropriately ordering search results
• Facilitating simple user generation and
refinement of queries
• Setting user expectations about the content
domain
Selected FindUR implementations:
Electronic Yellow Pages: www.quintillion.com/westfield
Event Calendars: www.quintillion.com/calendar/[summit |westfield]
Medical Information (P-CHIP, POS)
Computer Science Research Information
Competitive Intelligence Sites
Staff Augmentation and Vendor Procurement Info
Network Service Realization
Rapid Prototyping Info and Services
Technical Memorandum Access
Online Newspapers
Hometown Sites
Intellectual Capital
Common Site Conditions
Short Document Length
Few related content words per document
Unfamiliar vocabularies
Variability in specificity of documents
Inconsistent or irregular meta tagging
Higher (relevance) value for general documents
over specific documents
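The last condition, valuing general documents over specific ones, suggests a breadth-based score. The sketch below is one hypothetical way to do it, not FindUR's actual ranking: a page earns one point per distinct sub-topic it mentions, so an overview that touches many sub-topics outranks a page that repeats a single term many times. The `SUBTOPICS` list and sample pages are assumptions.

```python
# Hypothetical sub-topics of one broad category.
SUBTOPICS = ["knowledge representation", "machine learning", "planning", "natural language"]


def breadth_score(text: str) -> int:
    """Count distinct sub-topics mentioned; repeated mentions add nothing."""
    lowered = text.lower()
    return sum(1 for sub in SUBTOPICS if sub in lowered)


def rank(docs: dict[str, str]) -> list[str]:
    """Order document ids by breadth score, highest first; ties keep input order."""
    return sorted(docs, key=lambda doc_id: breadth_score(docs[doc_id]), reverse=True)


docs = {
    "reference-manual": "Planning planning planning: reference manual",
    "bibliography": "Papers on knowledge representation, machine learning, and planning",
}
print(rank(docs))
```

A plain term-frequency score would favor the manual here, since it repeats "planning" three times; counting distinct sub-topics instead lets the broader bibliography rise to the top.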
Evidence types
Synonyms
Subclasses
Subparts
Products
Companies
Associated Standards
Key People
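One way to use the evidence types listed above is to weight each piece of evidence by its type. In this sketch the seven type names come from the slide, while the `WEIGHTS` values, the sample topic set, and the additive scoring rule are hypothetical.

```python
from dataclasses import dataclass

# Evidence types from the slide; the numeric weights are hypothetical.
WEIGHTS: dict[str, float] = {
    "synonym": 1.0,
    "subclass": 0.8,
    "subpart": 0.6,
    "product": 0.5,
    "company": 0.4,
    "standard": 0.4,
    "key person": 0.3,
}


@dataclass(frozen=True)
class Evidence:
    term: str
    kind: str  # must be one of the WEIGHTS keys


def topic_score(text: str, evidence: list[Evidence]) -> float:
    """Sum the weights of the evidence terms that occur in the text."""
    lowered = text.lower()
    return sum((WEIGHTS[e.kind] for e in evidence if e.term in lowered), 0.0)


# Hypothetical topic set for "internet telephony".
TELEPHONY = [
    Evidence("voip", "synonym"),
    Evidence("h.323", "standard"),
    Evidence("softphone", "product"),
]

print(topic_score("Deploying VoIP with H.323 gateways", TELEPHONY))
```

An additive score lets several weaker clues, such as a standard plus a product, accumulate toward the strength of a direct synonym.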
Issues
Topic Set Inclusion
Topic Exposure
Description Logic Usage
FindUR/Smart Search Benefits
• Retrieves documents otherwise missed
• More appropriately organizes documents according to
relevance (useful for large number of retrievals)
• Browsing support (navigation, highlighting)
• Simple User Query building and refinement
• Full Query Logging and Trace
• Facilitates use of advanced search functions without
requiring knowledge of a search language
• Automatically searches the right knowledge sources
according to information about the context of the query
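The highlighting half of browsing support can be sketched as marking every expanded query term in a result snippet. The square-bracket markers and the `highlight` helper are illustrative assumptions; the sketch tries longer terms first so that a phrase is marked whole rather than word by word.

```python
import re


def highlight(text: str, terms: list[str]) -> str:
    """Wrap each case-insensitive whole-word match of any term in square brackets."""
    if not terms:
        return text
    # Longest terms first, so "knowledge representation" wins over "knowledge".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[" + m.group(0) + "]", text)


print(highlight("Knowledge Representation and planning",
                ["knowledge", "knowledge representation", "planning"]))
```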
FindUR Future Work
• Topic Set Generation
– Distributed Collaborative Topic Set Building Environment
– Use tagged content to generate candidate topic sets
– Information Retrieval (use clustering to analyze documents and
suggest topic definitions)
– Machine Learning (use query logs as training data)
– Reuse topic sets for different purposes using views of knowledge
• Knowledge Representation Integration
– Use knowledge base to check definitions and determine overlaps
– Expand beyond subclass, instance, and synonym relationships and
incorporate more structured information
– Maintain information about how and when to use topic information
– Maintain descriptions of content sources
• Evaluation and Interface Evolution
– Evaluate effectiveness of retrievals, relevance ranking, ease of
query refinement, and ease of content input into the category scheme
– Java-based interface for scalability, rapid change, and understandability
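Checking definitions and determining overlaps is planned for CLASSIC. As a much simpler stand-in, plain set comparison already finds topic sets that share terms and topic sets wholly contained in another. The topic names and terms below are hypothetical, and set containment is only a rough proxy for description-logic subsumption.

```python
from itertools import combinations, permutations


def shared_terms(topics: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of topics that share terms to the terms they share."""
    overlaps: dict[tuple[str, str], set[str]] = {}
    for a, b in combinations(sorted(topics), 2):
        common = topics[a] & topics[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


def contained_in(topics: dict[str, set[str]]) -> list[tuple[str, str]]:
    """List (narrow, broad) pairs where narrow's terms are a strict subset of broad's."""
    return [(a, b) for a, b in permutations(sorted(topics), 2) if topics[a] < topics[b]]


# Hypothetical topic sets.
topics = {
    "ai": {"expert systems", "neural networks", "planning"},
    "ml": {"neural networks", "clustering"},
    "planning": {"planning"},
}
print(shared_terms(topics))
print(contained_in(topics))
```

A strictly contained topic set may be redundant, or may belong as a sub-category of the broader one; deciding which is exactly where a knowledge base would help.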
AT&T Labs Research Site
• FindUR has a taxonomy of background
information which includes “knowledge
representation” as a sub-category of
“artificial intelligence.”
• The category/sub-category relationships are
displayed in the user interface. Users can
construct queries by simply clicking
categories and sub-categories, invoking
background knowledge in the process.
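Clicking a category can be modeled as collecting that category together with everything beneath it in the taxonomy. The `TAXONOMY` fragment (including "description logics" as a child of "knowledge representation") is a hypothetical illustration built around the slide's example, and the traversal skips categories already visited in case one appears under more than one parent.

```python
# Hypothetical fragment of a research-site taxonomy: category -> direct sub-categories.
TAXONOMY: dict[str, list[str]] = {
    "artificial intelligence": ["knowledge representation", "machine learning"],
    "knowledge representation": ["description logics"],
}


def descendants(category: str) -> list[str]:
    """Return the category and all sub-categories below it, depth first, without repeats."""
    order: list[str] = []
    seen: set[str] = set()
    stack = [category]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(TAXONOMY.get(node, [])))
    return order


print(descendants("artificial intelligence"))
```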
AT&T Labs Research Site
• With background knowledge the search
returns 696 relevant listings.
• Documents of a more general nature such as
bibliographies and departmental overviews
float higher in the list. Without background
knowledge, a reference manual was the first
retrieval.
General Nature of Descriptions
concept: a WINE
superconcepts (general categories): a LIQUID, a POTABLE
roles (structured components), with value restrictions on their
fillers and a number restriction [>= 1] on grape:
• grape: chardonnay, ... [>= 1]
• sugar-content: dry, sweet, off-dry
• color: red, white, rose
• price: a PRICE
• winery: a WINERY
interconnections between parts:
• grape dictates color (modulo skin)
• harvest time and sugar are related
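The WINE description can be mimicked in a few lines to show what role, value-restriction, and number-restriction annotations mean operationally. This is a hypothetical Python stand-in, not CLASSIC syntax: it checks only value and number restrictions, ignores the PRICE/WINERY type restrictions and the interconnection rules, and "riesling" is an invented filler standing in for the slide's "...".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoleRestriction:
    allowed: Optional[set[str]] = None  # value restriction; None means any filler
    at_least: int = 0                   # number restriction, e.g. grape [>= 1]


@dataclass
class Concept:
    name: str
    superconcepts: set[str] = field(default_factory=set)
    roles: dict[str, RoleRestriction] = field(default_factory=dict)

    def admits(self, fillers: dict[str, set[str]]) -> bool:
        """True if the role fillers satisfy every number and value restriction."""
        for role, restriction in self.roles.items():
            values = fillers.get(role, set())
            if len(values) < restriction.at_least:
                return False
            if restriction.allowed is not None and not values <= restriction.allowed:
                return False
        return True


WINE = Concept(
    name="WINE",
    superconcepts={"LIQUID", "POTABLE"},
    roles={
        "grape": RoleRestriction({"chardonnay", "riesling"}, at_least=1),  # "riesling" is hypothetical
        "sugar-content": RoleRestriction({"dry", "sweet", "off-dry"}),
        "color": RoleRestriction({"red", "white", "rose"}),
    },
)

print(WINE.admits({"grape": {"chardonnay"}, "color": {"white"}}))
```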
URLs
FindUR Home Page:
http://www.research.att.com/~dlm/findur
Description Logic Home Page:
http://dl.kr.org/dl
Implemented Description Logic-based systems:
http://www.ida.liu.se/labs/iislab/people/patla/DL/systems.html
The CLASSIC Knowledge Representation System:
http://www.research.att.com/sw/tools/classic
Deborah McGuinness:
http://www.research.att.com/info/dlm
Observations and Needs
• Users demand “smarter search”, browsing support,
and personalization
• Developers demand standard platforms
• Web sites evolve to include information in many
formats
• Content may be dynamically generated
• Content may be of varying form and quality (may
be limited, inconsistent, tagged, formatted, etc.)
• Sites may cater to limited domains
• Background knowledge may be available
Current and Future Work
• Collaborative topic set generation
– interactive environment for updating knowledge
– could incorporate other techniques using IR, ML, dictionaries, etc.
– about to be deployed on research and AT&T Solutions sites
• Knowledge representation integration for topic maintenance
– CLASSIC maintenance of definitions, overlaps, and inconsistencies
– CLASSIC-VERITY translator
– beyond subclass, instance, synonym relationships
– content source description
• Evaluation and interface evolution
– effectiveness of retrievals, relevance ranking, ease of query
refinement, ease of content generation and maintenance
• Evaluation of data logs and incorporation of findings
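The CLASSIC-VERITY translator amounts to turning a topic hierarchy into a nested query. The sketch below assumes a hypothetical `Topic` structure and emits a generic parenthesized OR expression; it does not reproduce Verity's actual topic-set language or any CLASSIC API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    synonyms: list[str] = field(default_factory=list)
    children: list["Topic"] = field(default_factory=list)


def to_query(topic: Topic) -> str:
    """Render a topic as a nested OR over its name, synonyms, and sub-topics."""
    parts = [f'"{topic.name}"'] + [f'"{s}"' for s in topic.synonyms]
    parts += [to_query(child) for child in topic.children]
    # A leaf with no synonyms needs no parentheses.
    if len(parts) == 1:
        return parts[0]
    return "(" + " OR ".join(parts) + ")"


ai = Topic(
    "artificial intelligence",
    ["ai"],
    [Topic("knowledge representation", ["kr"], [Topic("description logics")])],
)
print(to_query(ai))
```

Keeping each sub-topic as its own nested group preserves the hierarchy in the output, so a later stage could, for example, weight direct matches differently from sub-topic matches.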
FindUR improves search by:
• Retrieving previously missed relevant documents
• More appropriately ordering search results
• Facilitating simple user generation and refinement of
queries
• Setting user expectations about the content domain
• Providing an environment to maintain evolving
background knowledge sets
• Integrating multiple forms of documents
• Browsing support (navigation, highlighting)
• Automatically searching the correct knowledge sources
(according to information about the context of the query)
Status
• Platform is robust and integrated into a Business Unit-endorsed
environment.
• FindUR is deployed on five sites including the Summit 2005 Calendar site,
The Westfield Leader (newspaper) site, and the Technical Memorandum
Research database as well as Directory Westfield and the Research Web
Site.
• AT&T Solutions is funding a FindUR based solution to contractor and
vendor management.
• FindUR contains some topic sets that could be leveraged in other projects
• The collaborative topic set building environment could be used
independently of the rest of FindUR to generate and maintain
background knowledge sets
Extras
Editing Evidence
AT&T Labs Research Site
with Gilmore (main site) & Malkhi (tm database)
• A simple text search for artificial
intelligence retrieves only listings for pages
that contain the phrase “artificial
intelligence.” Pages about branches of AI,
such as description logics, typically do not
contain this more general phrase.
• The results list for the phrase “artificial
intelligence” misses 582 listings relevant to
the concept of artificial intelligence.
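The recall gap described above can be reproduced on a toy scale. The four pages and the `AI_TERMS` list are hypothetical; the point is only that pages about branches of AI never contain the broader phrase, so a literal phrase search cannot find them.

```python
# Hypothetical toy corpus standing in for research-site pages.
PAGES: dict[str, str] = {
    "ai-overview": "Artificial intelligence research at the lab",
    "dl-paper": "Subsumption in description logics",
    "kr-tools": "The CLASSIC knowledge representation system",
    "net-ops": "Network operations report",
}

# Hypothetical background knowledge: terms that signal the broader concept.
AI_TERMS = ["artificial intelligence", "description logics", "knowledge representation"]


def matches(terms: list[str]) -> set[str]:
    """Ids of pages whose lowercased text contains at least one term."""
    return {pid for pid, text in PAGES.items() if any(t in text.lower() for t in terms)}


phrase_only = matches(["artificial intelligence"])
expanded = matches(AI_TERMS)
print(sorted(expanded - phrase_only))
```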