Ontology-Based Computing

Kenneth Baclawski
Northeastern University and Jarg
The Onslaught
 Increasingly large amounts of information are
becoming accessible electronically.
 The information sources are increasingly
complicated.
 The diversity of types of information sources
is also increasing.
 Technologies are emerging to cope with this
onslaught: ontology-based computing.
Ontologies
 Shared understanding within a community
of people
 Declarative specification of entities and
their relationships with each other
 Constraints and rules that permit reasoning
within the ontology
 Behavior associated with stated or inferred
facts
Relational Database Schemas
 Well-established technique for specifying
the structure of shared data, not for
communication between people or agents
 Declarative specification but of tables, not
of entities and relationships
 Some constraints are expressible but no
significant rules (such as inheritance)
 No explicit behavior
 Standard language is SQL.
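A minimal sketch of the points above, written in Python with the standard sqlite3 module so that the SQL can be run directly. The chemical table, its columns, and the sample rows are hypothetical. The schema declares a table and a few constraints; there is no construct in it for saying that one kind of entity inherits from another.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Declarative specification of a table (hypothetical columns),
# with the kinds of constraints SQL can express directly.
conn.executescript("""
CREATE TABLE chemical (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE,
    formula TEXT NOT NULL,
    weight  REAL CHECK (weight > 0)
);
""")

insert = "INSERT INTO chemical (name, formula, weight) VALUES (?, ?, ?)"
conn.execute(insert, ("water", "H2O", 18.015))

# A row that breaks the CHECK constraint is rejected by the database.
try:
    conn.execute(insert, ("bad", "X", -1.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM chemical").fetchone()[0]
conn.close()
```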
Object-Oriented Schemas
 Emerging technology for communication
between software components
 Declarative specifications
 Constraints and some rules
 Several ways to specify behavior
 The Unified Modeling Language (UML) is
the standard OO modeling language.
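[Figure: UML class diagram with classes Pathway (name : string), Reaction (description : string), Chemical (name : string, formula : string, weight : number), and Enzyme (sequence : string), linked by associations labeled "consists of", "input", "output", and "catalyzed by", with multiplicities 0..1, 1..1, 0..*, 1..*, and 2..*.]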
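The class diagram can be approximated in code. The following is a minimal sketch, assuming that a Pathway consists of Reactions, that a Reaction has input and output Chemicals, and that a Reaction is catalyzed by at most one Enzyme. These pairings, the Python attribute names, and the multiplicity check are assumptions read from the labels on the slide, not a definitive rendering of the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chemical:
    name: str
    formula: str
    weight: float


@dataclass
class Enzyme:
    sequence: str


@dataclass
class Reaction:
    description: str
    inputs: List[Chemical]
    outputs: List[Chemical]
    # Assumed to correspond to the 0..1 "catalyzed by" association.
    catalyzed_by: Optional[Enzyme] = None

    def __post_init__(self) -> None:
        # Assumed multiplicity: at least one input and one output (1..*).
        if not self.inputs or not self.outputs:
            raise ValueError("a reaction needs at least one input and one output")


@dataclass
class Pathway:
    name: str
    # Assumed to correspond to the "consists of" association.
    reactions: List[Reaction] = field(default_factory=list)
```

The attributes and the multiplicity check are declared with the classes, while any further behavior would be added as ordinary methods.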
Logic
 Very expressive but very difficult to use.
Not designed for communication.
 Most logical languages are not based on
entities and relationships.
 Very powerful inferencing capabilities.
 Do not usually have any associated
behavior.
 Many examples: Prolog, KIF, Slang, ...
XML DTDs and XML Schema
 Defines a hierarchical document type.
XML Schema defines data types. Designed
for communication over the Web.
 Good support for entities and hierarchical
relationships; awkward for others.
 Constraints can be imposed on the
hierarchical structure and on data types.
 Behavior can be specified procedurally.
Knowledge Representations
 Very well developed branch of AI. Many
tools, but mostly academic. Not yet used
for communication over the Web.
 Powerful language for specifying entities
and their relationships.
 Most are linked with inference engines.
 Behavior is typically handled in an ad hoc
manner.
RDF and DAML
 Resource Description Framework (RDF) is
a knowledge representation language
represented in XML. It is a World Wide Web
Consortium (W3C) Recommendation.
 The DARPA Agent Markup Language
(DAML) is an extension of RDF to serve as
the basis for ontology-based computing
over the Web: the Semantic Web.
Ontological Reasoning in RDF
[Figure: RDF graph with nodes Property, Class, Person, Fish, Wendy, Wanda, and owns, connected by edges labeled type, domain, range, and owns.]
Type constraint violation: The range of owns is Fish.
OR There is no inconsistency: Wanda is a fish!
Mermaid?
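The two readings above can be sketched over a small set of triples. This is a minimal sketch, not an RDF implementation: the triples are read off the slide, the statement that Wanda is a Person is an assumption made so that the "Mermaid?" question arises, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Facts read off the slide. ("Wanda", "type", "Person") is an assumption.
TRIPLES: Set[Triple] = {
    ("owns", "type", "Property"),
    ("owns", "domain", "Person"),
    ("owns", "range", "Fish"),
    ("Person", "type", "Class"),
    ("Fish", "type", "Class"),
    ("Wendy", "type", "Person"),
    ("Wanda", "type", "Person"),
    ("Wendy", "owns", "Wanda"),
}


def types_of(triples: Set[Triple], node: str) -> Set[str]:
    """Return every class the node is stated to belong to."""
    return {o for s, p, o in triples if s == node and p == "type"}


def check_range_as_constraint(triples: Set[Triple]) -> List[Triple]:
    """First reading: report statements whose object lacks the range type."""
    violations = []
    for prop, _, rng in [t for t in triples if t[1] == "range"]:
        for s, p, o in triples:
            if p == prop and rng not in types_of(triples, o):
                violations.append((s, p, o))
    return sorted(violations)


def infer_range_types(triples: Set[Triple]) -> Set[Triple]:
    """Second reading: conclude that every object has the range type."""
    inferred = set(triples)
    for prop, _, rng in [t for t in triples if t[1] == "range"]:
        for s, p, o in triples:
            if p == prop:
                inferred.add((o, "type", rng))
    return inferred
```

Under the first reading, the statement that Wendy owns Wanda is flagged as a violation. Under the second, nothing is flagged and Wanda simply acquires the additional type Fish.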
DAML
[Figure: DAML graph with nodes Property, Class, Restriction, Student, College, George, Engineering, Arts & Sciences, and majors, connected by edges labeled type, domain, range, subClassOf, onProperty, maxCardinality (value 1), majors, and equivalentTo.]
Cardinality constraint violation: George can’t have two majors
OR There is no inconsistency: Engineering = Arts & Sciences
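The same pair of readings applies to the cardinality restriction. The sketch below is a simplified illustration, not DAML semantics in full: the facts come from the slide, while the dictionary encoding of the restriction and the function names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed encoding of the slide's Restriction: majors has maxCardinality 1.
MAX_CARDINALITY: Dict[str, int] = {"majors": 1}

FACTS: List[Triple] = [
    ("George", "majors", "Engineering"),
    ("George", "majors", "Arts & Sciences"),
]


def values(facts: List[Triple], subject: str, prop: str) -> List[str]:
    """Return the distinct values of prop for subject, sorted by name."""
    return sorted({o for s, p, o in facts if s == subject and p == prop})


def check_cardinality(facts: List[Triple]) -> List[Tuple[str, str, List[str]]]:
    """First reading: distinct names denote distinct things."""
    violations = []
    for s, p in sorted({(s, p) for s, p, _ in facts}):
        limit = MAX_CARDINALITY.get(p)
        if limit is not None and len(values(facts, s, p)) > limit:
            violations.append((s, p, values(facts, s, p)))
    return violations


def infer_equivalences(facts: List[Triple]) -> List[Triple]:
    """Second reading, for a limit of 1: the values must denote the same thing."""
    equivalences = []
    for s, p in sorted({(s, p) for s, p, _ in facts}):
        if MAX_CARDINALITY.get(p) == 1:
            for a, b in combinations(values(facts, s, p), 2):
                equivalences.append((a, "equivalentTo", b))
    return equivalences
```

Which function a system runs depends on whether it assumes that different names always refer to different things.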
Representing information
 Relational database: records
 OO database: objects and links
 Logic: facts
 XML: documents
 Knowledge Representations: annotations
 All of these are graph structures: entities
related to other entities by relationships.
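As an illustration of the graph view, a relational record can be rewritten as edges from one entity to its values. This is a minimal sketch; the subject naming scheme (table name plus key) and the function name are assumptions.

```python
from typing import Any, Dict, List, Tuple


def record_to_triples(table: str, key: str, record: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """Turn one row into (subject, relationship, value) edges."""
    # Hypothetical naming scheme: the entity is identified by table and key.
    subject = f"{table}/{record[key]}"
    triples: List[Tuple[str, str, Any]] = [(subject, "type", table)]
    for column, value in record.items():
        # Skip the key itself and NULL-like values: they add no edge.
        if column != key and value is not None:
            triples.append((subject, column, value))
    return triples


row = {"id": 7, "name": "water", "formula": "H2O", "weight": None}
edges = record_to_triples("Chemical", "id", row)
```

Objects with links, logical facts, and XML elements with children can be flattened in the same way, which is what makes a common graph treatment possible.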
Where is the meaning?
 Databases: select-project-join queries
 Logic: rules determined by unification
 XML: XSLT patterns
 Knowledge Representations: templates
 All of these are forms of graph matching.
The units of meaning are small connected
subgraphs that I call motifs.
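Graph matching of this kind can be sketched as matching a small pattern of triples, with variables, against a set of facts. This is a minimal sketch in the spirit of a select-project-join query; the "?" variable convention, the function name, and the sample facts are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(pattern: List[Triple], facts: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield variable bindings under which every pattern triple is a fact."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for fact in facts:
        new = dict(binding)
        ok = True
        for term, value in zip(first, fact):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            # Shared variables across pattern triples act as the join.
            yield from match(rest, facts, new)


# Hypothetical facts for illustration.
FACTS: List[Triple] = [
    ("r1", "input", "glucose"),
    ("r1", "catalyzed by", "hexokinase"),
    ("r2", "input", "g6p"),
]

# A two-edge motif: a reaction with an input that is catalyzed by something.
MOTIF: List[Triple] = [("?r", "input", "?c"), ("?r", "catalyzed by", "?e")]
results = list(match(MOTIF, FACTS))
```

The pattern here is a small connected subgraph: both of its edges share the node bound to ?r.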
Ontology Infrastructure
Simply introducing a language is not enough.
There must be an infrastructure to support
ontology-based computing, including:
 Ontology development tools
 Content creation systems
 Storage and retrieval systems
 Ontology reasoning, mediation, ...
 Integration with applications
Ontology Development
 Ontologies can be developed using
graphical tools specifically for ontologies or
by adapting existing tools such as CASE
tools.
 Testing ontologies is not easy because they
include constraints and inference rules.
 Ontology testing is analogous to type
checking in programming languages.
Content Creation
 Databases: Data warehousing technology
 Text: Natural Language Processing (NLP)
 Image processing
 Direct creation of content
 No matter how the content is created, it must
be tested using consistency checking.
Storage and Retrieval
 Scaling up will require high-performance,
distributed storage and indexing technology.
 The natural units for indexing are the motifs
(precomputed joins), but the number of
motifs is large.
 Jarg Corporation has developed a scalable,
high-performance indexing technology for
ontology-based knowledge representations.
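To make the idea of indexing by motif concrete, the sketch below builds a toy inverted index from motifs to document identifiers. It is not Jarg's technology, whose design is not described here. The definition of a motif (a single edge, or two edges sharing a node), the scoring rule, and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Motif = Tuple[Triple, ...]


def motifs(triples: Iterable[Triple]) -> Set[Motif]:
    """Return single edges and pairs of edges that share a node."""
    ordered = sorted(set(triples))
    found: Set[Motif] = {(t,) for t in ordered}
    for a in ordered:
        for b in ordered:
            # a < b keeps each unordered pair once; the set intersection
            # tests whether the two edges have a node in common.
            if a < b and {a[0], a[2]} & {b[0], b[2]}:
                found.add((a, b))
    return found


def build_index(documents: Dict[str, List[Triple]]) -> Dict[Motif, Set[str]]:
    """Map every motif to the documents whose graph contains it."""
    index: Dict[Motif, Set[str]] = defaultdict(set)
    for doc_id, triples in documents.items():
        for motif in motifs(triples):
            index[motif].add(doc_id)
    return index


def lookup(index: Dict[Motif, Set[str]], query: List[Triple]) -> List[str]:
    """Rank documents by the number of query motifs they share."""
    scores: Dict[str, int] = defaultdict(int)
    for motif in motifs(query):
        for doc_id in index.get(motif, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Even with motifs limited to two edges, the number of pairs grows roughly quadratically with the number of edges around a node, which illustrates why the number of motifs is large.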
Jarg Architecture
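[Figure: architecture diagram with two paths into the Distributed Index Engine. Document path: Document, NLP, Knowledge Representation, fragmentation, Knowledge Fragments. Query path: Query, NLP, Knowledge Representation, fragmentation, Knowledge Motifs. Output: Matching Documents.]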
Conclusion
 Ontology-based computing is emerging as a
natural evolution of existing technologies to
cope with the information onslaught.
 Ontology-based technology must be
scalable if it is to contribute to the solution
rather than add to the problem.
 Consistency checking is important for the
development of ontologies and content.
Bibliography
 Semantic Web: www.w3.org/2001/sw
 Ontologies: www.ontology.org
 Unified Modeling Language: www.omg.org/uml
 Knowledge Interchange Format: logic.stanford.edu/kif
 Specware and Slang: www.kestrel.edu
 XML and XML Schema: www.w3.org/xml
 RDF and RDFS: www.w3.org/rdf
 DAML: www.daml.org
 Notation 3: www.w3.org/DesignIssues/Notation3.html
 Consistency checking: vis.home.mindspring.com
 Jarg Knowledge Engine: www.jarg.com