ICDL 2004, New Delhi, India
Ontology-Driven
Information Retrieval
Nicola Guarino
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology (ISTC-CNR)
Trento-Roma, Italy
Summary
• The role of ontologies in (generalized) IR
• Semantic Matching
• Increasing precision
• Increasing recall
• Problems with WordNet
• The role of Foundational Ontologies
The key problem
Semantic matching
Simple queries: need more knowledge about what
the user wants
• Search for “Washington” (the person)
– Google: 26,000,000 hits
– 45th entry is the first relevant
– Noise: places
• Search for “George Washington”
– Google: 2,200,000 hits
– 3rd entry is relevant
– Noise: institutions, other people, places
Solution: ontology+semantic markup
• Ontology
– Person
• George Washington
• George Washington Carver
– Place
• Washington, D.C.
– Artifact
• George Washington Bridge
– Organization
• George Washington University
• Semantic disambiguation/markup of questions and corpora
– What Washington are you talking about?
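A minimal sketch of how such an ontology could narrow an ambiguous term. The dictionary simply restates the entries on this slide; the function name `candidates` and the substring matching are assumptions made for illustration, not part of the talk.

```python
# Toy ontology restating the slide's categories and instances.
ONTOLOGY = {
    "Person": ["George Washington", "George Washington Carver"],
    "Place": ["Washington, D.C."],
    "Artifact": ["George Washington Bridge"],
    "Organization": ["George Washington University"],
}


def candidates(term, category=None):
    """Return (category, entity) pairs whose name contains `term`.

    Passing `category` plays the role of the semantic markup:
    it answers "What Washington are you talking about?".
    """
    term = term.lower()
    return [
        (cat, name)
        for cat, names in ONTOLOGY.items()
        if category is None or cat == category
        for name in names
        if term in name.lower()
    ]


if __name__ == "__main__":
    print(candidates("Washington"))                      # every sense
    print(candidates("Washington", category="Person"))   # people only
```

Without a category the query matches all five entries; restricting to Person leaves only the two people, which is the noise reduction the slide argues for.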
The role of taxonomy and lexical knowledge
• Search for “Artificial Intelligence Research”
– Misses subfields of the general field
– Misses references to “AI” and “Machine Intelligence” (synonyms)
– Noise: non-research pages, other fields…
Solution:
• Extra knowledge
– Ontology: Sub-fields (of AI)
• Knowledge Representation
• Machine Vision etc.
• Neural networks
– Lexicon: Synonyms (for AI)
• Artificial Intelligence
• Machine Intelligence
• Techniques
– Query Expansion
• Add disjuncted sub-fields to search
• Add disjuncted synonyms to search
– Semantic Markup of question and corpora
• Add “general terms” (categories)
• Add “synonyms”
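Query expansion as described above can be sketched in a few lines. The sub-field and synonym tables restate the slide; the function name `expand_query` and the `OR` syntax of the target search engine are assumptions.

```python
# Extra knowledge from the slide: sub-fields (ontology) and synonyms (lexicon).
SUBFIELDS = {
    "artificial intelligence": [
        "knowledge representation",
        "machine vision",
        "neural networks",
    ],
}
SYNONYMS = {
    "artificial intelligence": ["AI", "machine intelligence"],
}


def expand_query(field, suffix=""):
    """Build a disjunctive query from a field, its synonyms and its sub-fields.

    `suffix` holds the remaining query words (e.g. "research"), which are
    left untouched. The OR syntax is hypothetical.
    """
    key = field.lower()
    terms = [field] + SYNONYMS.get(key, []) + SUBFIELDS.get(key, [])
    disjunction = " OR ".join(f'"{t}"' for t in terms)
    return f"({disjunction}) {suffix}".strip()


if __name__ == "__main__":
    print(expand_query("Artificial Intelligence", "research"))
```

Disjoining synonyms and sub-fields raises recall; it does nothing for the noise problem, which is why the slide pairs it with semantic markup.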
Ontology-driven search engines
• Idealized view
– Ontology-driven search engines act as virtual librarians
• Determine what you “really mean”
• Discover relevant sources
• Find what you “really want”
• Requires common knowledge on all ends
– Semantic linkage between questioning agent, answering agent
and knowledge sources
• Hence the “Semantic Web”?
Two main roles of ontologies in IT
• Semantic Interoperability
– Generalized database integration
– Virtual Enterprises, Concurrent Engineering
– e-commerce
– Web services
• Information Retrieval
– Documents
– Facts (query answering)
– Products
– Services
[Not mentioning IS analysis and design…]
The role of linguistic ontologies
coupled with structured representation
formalisms
• Why not just simple thesauri based on fixed keyword hierarchies?
– The data’s intrinsic dynamics require keeping track of new terms
– Users need to understand a rigid set of terms
– Heterogeneous descriptions need a broad-coverage vocabulary
• Linguistic ontologies with structured representation formalisms
– Decouple user vocabulary from data vocabulary
– Increase recall (synonyms, generalizations)
– Increase precision (disambiguation, ontology navigation)
– Further increase precision (by capturing the structure of queries and data)
Using WordNet as an ontology
• Unclear semantic interpretation of hyperonymy
• Instantiation vs. subsumption
• Object-level vs. meta-level
• Hyperonymy used to account for polysemy (“law” as both a document and a rule)
• Unclear taxonomic structure
• Glosses not consistent with taxonomic structure
• Heterogeneous levels of generality
• Formal constraint violations (especially concerning roles)
• Polysemous use of antonymy (child/parent vs. daughter/son)
• Poor ontology of adjectives and qualities
• Shallow taxonomy of verbs
When subtle distinctions are
important
• “Trying to engage with too many partners too
fast is one of the main reasons that so many
online market makers have foundered. The
transactions they had viewed as simple and
routine actually involved many subtle
distinctions in terminology and meaning”
Harvard Business Review, October 2001
Ontologies and intended meaning
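[Figure: a Conceptualization C is linked to a Language L by the Commitment K = <C, I>, where I is the Interpretation. Among all Models M_D(L) of the language, the commitment singles out the Intended models I_K(L); the Ontology approximates the intended models.]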
Levels of Ontological Precision
[Figure: the same domain described at increasing levels of ontological precision]
Ontological precision (increasing): Catalog, Glossary, Taxonomy, Thesaurus, DB/OO scheme, Axiomatized theory

Catalog / Glossary:
tennis, football, game, field game, court game, athletic game, outdoor game

Taxonomy:
game
  athletic game
    court game
      tennis
    outdoor game
      field game
        football

Thesaurus:
game
  NT athletic game
    NT court game
      RT court
      NT tennis
        RT double fault

Axiomatized theory:
game(x) → activity(x)
athletic game(x) → game(x)
court game(x) ↔ athletic game(x) ∧ ∃y. played_in(x,y) ∧ court(y)
tennis(x) → court game(x)
double fault(x) → fault(x) ∧ ∃y. part_of(x,y) ∧ tennis(y)
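To make the lower end of this scale concrete, here is a minimal sketch that treats the subsumption links of the example (tennis under court game, court game under athletic game, athletic game under game, game under activity) as a parent map. The names `PARENT`, `ancestors` and `is_a` are hypothetical. The richer axioms, such as "played in a court", are exactly what this representation cannot express.

```python
# Subsumption links from the example, child -> parent.
PARENT = {
    "game": "activity",
    "athletic game": "game",
    "court game": "athletic game",
    "tennis": "court game",
}


def ancestors(term):
    """Return all broader terms of `term`, nearest first."""
    broader = []
    while term in PARENT:
        term = PARENT[term]
        broader.append(term)
    return broader


def is_a(term, category):
    """True if `term` equals `category` or is subsumed by it (transitively)."""
    return term == category or category in ancestors(term)


if __name__ == "__main__":
    print(ancestors("tennis"))
    print(is_a("tennis", "game"))
```

A retrieval system can use `ancestors` to add general terms to a document's markup, so that a query for "game" also reaches pages that only mention tennis.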
Ontology Quality:
Precision and Coverage
• Good: high precision, max coverage
• Less good: low precision, max coverage
• Bad: max precision, low coverage
• Worse: low precision and coverage
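One way to read the four cases is as set overlap between the intended models and the models an ontology admits. The ratio definitions below are an assumption made for illustration (the slide only names the cases), and the model identifiers are invented toy data.

```python
def precision(intended, admitted):
    """Share of admitted models that are intended (assumed definition)."""
    return len(intended & admitted) / len(admitted) if admitted else 0.0


def coverage(intended, admitted):
    """Share of intended models that are admitted (assumed definition)."""
    return len(intended & admitted) / len(intended) if intended else 0.0


# Toy data: four intended models, plus unintended ones named x1..x6.
INTENDED = {"m1", "m2", "m3", "m4"}
CASES = {
    "good": INTENDED | {"x1"},
    "less good": INTENDED | {"x1", "x2", "x3", "x4", "x5", "x6"},
    "bad": {"m1"},
    "worse": {"m1", "x1", "x2", "x3"},
}

if __name__ == "__main__":
    for label, admitted in CASES.items():
        p = precision(INTENDED, admitted)
        c = coverage(INTENDED, admitted)
        print(f"{label}: precision={p:.2f} coverage={c:.2f}")
```

Under this reading, an ontology that admits too many models keeps full coverage but loses precision, while one that admits too few stays precise but misses intended models.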
Why precision is important
[Figure: within the set of models M_D(L), two sets of intended models, I_A(L) and I_B(L); label: “False agreement!”]
DOLCE
a Descriptive Ontology for Linguistic and
Cognitive Engineering
• Strong cognitive bias: descriptive (as
opposed to prescriptive) attitude
• Emphasis on cognitive invariants
• Categories as conceptual containers: no
“deep” metaphysical implications wrt
“true” reality
• Clear branching points to allow easy
comparison with different ontological
options
• Rich axiomatization
DOLCE’s basic taxonomy
Endurant
  Physical
    Amount of matter
    Physical object
    Feature
  Non-Physical
    Mental object
    Social object
  …
Perdurant
  Static
    State
    Process
  Dynamic
    Achievement
    Accomplishment
Quality
  Physical
    Spatial location
    …
  Temporal
    Temporal location
    …
  Abstract
Abstract
  Quality region
    Time region
    Space region
    Color region
    …
  …
Research priorities at LOA-CNR
• Foundational ontologies and ontological
analysis
• Domain ontologies
– Physical objects
– Information and information processing
– Social interaction
– Ontology of legal and financial entities
• Ontology, language, cognition
• Ontology-driven information systems
– Ontology-driven conceptual modeling
– Ontology-driven information access
– Ontology-driven information integration
www.loa-cnr.it