Ontology Learning
Μπαλάφα Κάσσυ
Πλασταρά Κατερίνα
Contents
 Introduction – Ontologies, Ontology learning
 Technical description
 Ontology learning in the Semantic Information
description
 Ontology Learning – Process
 Ontology Learning - Architecture
 Ontology Learning data sources
 Methods used in ontology learning
 Tools of ontology learning
 Uses of ontology learning
Ontologies
 Provide a formal, explicit specification of a shared
conceptualization of a domain that can be communicated
between people and heterogeneous, widely spread
application systems.
 They have been developed in Artificial Intelligence and
Machine Learning to facilitate knowledge sharing and
reuse.
 Unlike knowledge bases, ontologies have “all in one”:
 formal or machine readable representation
 full and explicitly described vocabulary
 full model of some domain
 consensus knowledge: common understanding of a domain
 easy to share and reuse
Ontology learning - General
Machine learning of ontologies
Main task: to automatically learn
complicated domain ontologies
Applies knowledge discovery techniques to
different data sources (HTML documents,
dictionaries, free text, legacy ontologies,
etc.) in order to support the task of
engineering and maintaining ontologies
Ontology learning –
Technical description
 The manual building of ontologies is a tedious
task, which can easily result in a knowledge
acquisition bottleneck. In addition, human expert
modeling by hand is biased, error-prone and
expensive
 Fully automatic machine knowledge acquisition
remains in the distant future
 Most systems are semi-automatic and require
human (expert) intervention and balanced
cooperative modeling for constructing ontologies
Semantic Information Integration
Ontology Engineering
Ontology learning –
Process (1/2)
Ontology learning –
Process (2/2)
 Stages analysis:
 Merging existing structures or defining mapping rules between
these structures allows importing and reusing existing ontologies
 Ontology extraction models major parts of the target ontology,
with learning support fed from various input sources
 The target ontology’s rough outline, which results from import,
reuse and extraction is pruned to better fit the ontology to its
primary purpose
 Ontology refinement profits from the pruned ontology but
completes the ontology at a fine granularity (in contrast to
extraction)
 The target application serves as a measure for validating
the resulting ontology
 The ontology engineer can begin this cycle again, for
example to include new domains in the ontology under
construction or to maintain and update its scope
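The stages above can be read as a pipeline over a growing set of concepts. The following is a minimal sketch, not the method of any particular tool: the concept-to-parent dictionary, the function names and the frequency threshold are all illustrative assumptions, and the validation stage against the target application is left out.

```python
from collections import Counter

def import_and_reuse(*ontologies):
    """Merge existing ontologies (concept -> parent); later ones win on conflict."""
    merged = {}
    for onto in ontologies:
        merged.update(onto)
    return merged

def extract(ontology, domain_terms, root="thing"):
    """Add every unseen domain term as a new concept directly under the root."""
    extended = dict(ontology)
    for term in domain_terms:
        extended.setdefault(term, root)
    return extended

def prune(ontology, term_counts, min_count=2):
    """Drop concepts that are rare in the corpus and are not a parent of anything."""
    parents = set(ontology.values())
    return {c: p for c, p in ontology.items()
            if c in parents or term_counts[c] >= min_count}

def refine(ontology, placements):
    """Re-attach concepts to more specific parents (fine-grained completion)."""
    refined = dict(ontology)
    refined.update(placements)
    return refined

# Toy corpus and legacy ontology, invented for illustration.
counts = Counter("hotel room hotel suite room beach suite hotel".split())
legacy = {"hotel": "accommodation", "accommodation": "thing"}

onto = import_and_reuse(legacy)
onto = extract(onto, counts)
onto = prune(onto, counts)                      # "beach" occurs once and is dropped
onto = refine(onto, {"room": "hotel", "suite": "room"})
```

Running the four calls again with a new corpus or a new legacy ontology corresponds to starting the cycle again.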
Ontology learning –
Architecture (1/5)
Ontology learning –
Architecture (2/5)
Ontology Engineering Workbench: A
sophisticated means for manual modeling
and refining of the final ontology. The
ontology engineer can browse the
resulting ontology from the ontology
learning process and decide to follow,
delete or modify the proposals as the task
requires.
Ontology learning –
Architecture (3/5)
 Management component: The ontology engineer
uses the management component to select input
data, that is, relevant resources such as HTML
and XML documents, DTDs, databases or
existing ontologies that the discovery process
can further exploit. Then, using the management
component, the engineer chooses from a set of
resource-processing methods available in the
resource-processing component and from a set
of algorithms available in the algorithm library.
Ontology learning –
Architecture (4/5)
 Resource processing Component: Depending on the
available data the engineer can choose various
strategies for resource processing:
 Index and reduce HTML documents to free text
 Transform semi-structured documents such as dictionaries into
a predefined relational structure
 Handle semi-structured and structured schema data by
following different strategies for import
 Process free natural text
After first preprocessing the data according to one of
these or similar strategies, the resource processing
module transforms the data into an algorithm-specific
relational representation.
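As an illustration of the first strategy (reducing HTML documents to free text), here is a minimal sketch using Python's standard html.parser module. The class name, the helper function and the sample page are assumptions made for the example; the slides do not describe how the architecture implements this step.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text and skips the content of <script> and <style>."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Reduce an HTML document to a single line of free text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = ("<html><head><style>p {color: red}</style></head>"
        "<body><h1>Hotels</h1><p>A hotel offers <b>rooms</b> and suites</p>"
        "</body></html>")
text = html_to_text(page)
```

The resulting free text can then be handed to whichever natural-language strategy the engineer selected.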
Ontology learning –
Architecture (5/5)
Algorithm library: A collection of various
algorithms that work on the ontology
definition and the preprocessed input data.
Although specific algorithms can vary
greatly from one type of input to the next,
a considerable overlap exists for
underlying learning approaches such as
association rules, formal concept analysis
or clustering.
Ontology Learning from
Natural Language
 Natural language texts exhibit morphological, syntactic,
semantic, pragmatic and conceptual constraints that
interact in order to convey a particular meaning to the
reader. Thus, the text transports information to the
reader and the reader embeds this information into his
background knowledge
 Through the understanding of the text, data is associated
with conceptual structures and new conceptual
structures are learned from the interacting constraints
given through language
 Tools that learn ontologies from natural language exploit
the interacting constraints on the various language levels
(from morphology to pragmatics and background
knowledge) in order to discover new concepts and
stipulate relationships between concepts
Ontology Learning from
Semi-structured Data
 HTML data, XML data, XML DTDs, XML Schemata
and the like add more or less expressive
semantic information to documents
 A number of approaches understand ontologies
as a common generalizing level that may
communicate between the various data types
and data descriptions. Ontologies play a major
role for allowing semantic access to these vast
resources of semi-structured data
 Learning ontologies from these data and data
descriptions may considerably promote the
application of ontologies and, thus, facilitate
access to these data
Ontology Learning from
Structured Data
The learning of ontologies from metadata,
such as database schemata, in order to
derive a common high-level abstraction of
underlying data descriptions can be an
important precondition for data
warehousing or intelligent information
agents
Methods for learning
ontologies (1/8)
Clustering
The elaboration of any clustering method
involves the definition of two main elements:
a distance metric and a classification
algorithm
A workbench that supports the development
of conceptual clustering methods for the
(semi-) automatic construction of ontologies
of a conceptual hierarchy type from parsed
corpora is the Mo’K workbench
Methods for learning
ontologies (2/8)
Clustering
Ontologies are organized as multiple
hierarchies that form an acyclic graph where
nodes are term categories described by
intension and links represent inclusion.
Learning through hierarchical classification of
a set of objects can be performed in two
main ways: top-down, by incremental
specialization of classes, and bottom-up, by
incremental generalization
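The two elements of a clustering method, a distance metric and a classification algorithm, can be sketched for the bottom-up case as follows. This is an illustrative example only: the Jaccard distance, the stopping threshold and the toy contexts are assumptions, not the measures used by the Mo’K workbench.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Distance metric: 1 - |a & b| / |a | b| over two sets of contexts."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def bottom_up(contexts, max_distance=0.7):
    """Classification algorithm: repeatedly merge the two closest clusters.

    contexts maps each term to the set of contexts it occurs in.
    Returns the merges as (members_a, members_b, distance), in merge order.
    """
    # Each cluster is keyed by its member terms and stores their pooled contexts.
    clusters = {frozenset([t]): set(c) for t, c in contexts.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: jaccard_distance(clusters[p[0]], clusters[p[1]]))
        d = jaccard_distance(clusters[a], clusters[b])
        if d > max_distance:
            break  # remaining clusters are too far apart to generalize
        pooled = clusters.pop(a) | clusters.pop(b)
        clusters[a | b] = pooled
        merges.append((set(a), set(b), d))
    return merges

contexts = {
    "car":   {"drive", "park", "repair"},
    "truck": {"drive", "park", "load"},
    "apple": {"eat", "peel"},
    "pear":  {"eat", "peel", "pick"},
}
merges = bottom_up(contexts)
```

Each merge is one step of incremental generalization; a top-down method would instead start from one class and split it.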
Methods for learning
ontologies (3/8)
 Information Extraction Rules
Methods for learning
ontologies (4/8)
Information Extraction Rules
We start with:
 An initial hand-crafted seed ontology of
reasonable quality which already contains the
relevant types of relationships between ontology
concepts in the given domain
 An initial set of documents which informally
exemplify substantial parts of the
knowledge represented in the seed ontology
Methods for learning
ontologies (5/8)
Information Extraction Rules
Compared to other ontology learning
approaches, this technique is not restricted to
learning taxonomic relationships, but can learn
arbitrary relationships in an application domain.
A project that uses this technique is the
FRODO project.
Methods for learning
ontologies (6/8)
 Association Rules
Association-rule-learning algorithms are used for
prototypical applications of data mining and for finding
associations that occur between items in order to
construct ontologies (extraction stage)
‘Classes’ are expressed by the expert as a free text
conclusion to a rule. Relations between these ‘classes’
may be discovered from existing knowledge bases and
a model of the classes is constructed (ontology) based
on user-selected patterns in the class relations
This approach is useful for solving classification
problems by creating classification taxonomies
(ontologies) from rules
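To make "associations that occur between items" concrete, the sketch below computes support and confidence for pairwise rules over small sets of co-occurring concepts. The thresholds, the function name and the toy data are assumptions for illustration; the slides do not specify which association-rule algorithm is used.

```python
from itertools import permutations

def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        both = sum(1 for t in transactions if x in t and y in t)
        count_x = sum(1 for t in transactions if x in t)
        support = both / n                      # share of transactions with x and y
        confidence = both / count_x if count_x else 0.0  # share of x that also has y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules

# Each set lists the concepts found together in one document (invented data).
transactions = [
    {"hotel", "room", "price"},
    {"hotel", "room"},
    {"hotel", "beach"},
    {"room", "price"},
]
rules = association_rules(transactions)
```

A rule such as price → room is only a candidate relation between two concepts; whether it enters the ontology is still decided by the expert.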
Methods for learning
ontologies (7/8)
Association Rules – Example
A classification knowledge-based system with
experimental results based on medical data (Suryanto
& Compton, Australia)
Ripple Down Rules (RDR) were used to describe
classes and their attributes:
Satisfactory lipid profile previous raised LDL noted ←
((LDL <= 3.4) AND (Triglyceride is NORMAL) AND (Max(LDL) > 3.4)) OR
((LDL is NORMAL) AND (Triglyceride is NORMAL) AND (Max(LDL) is HIGH))
Experts were allowed to modify or add conclusions in
order to correct errors
The conclusions of the rules formed the classes of the
classification ontology
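The rule quoted above can be read as a Boolean test on a patient case whose conclusion becomes a class. The sketch below is only an illustration of that reading: the case layout (a list of LDL values plus categorical labels) and the field names are assumptions, and the exception structure of Ripple Down Rules is not modeled.

```python
CONCLUSION = "Satisfactory lipid profile previous raised LDL noted"

def rule_fires(case):
    """Evaluate both disjuncts of the rule on one patient case."""
    ldl_now = case["ldl"][-1]   # most recent LDL measurement
    ldl_max = max(case["ldl"])  # Max(LDL) over the recorded history
    numeric = (ldl_now <= 3.4
               and case["triglyceride_label"] == "NORMAL"
               and ldl_max > 3.4)
    symbolic = (case["ldl_label"] == "NORMAL"
                and case["triglyceride_label"] == "NORMAL"
                and case["max_ldl_label"] == "HIGH")
    return numeric or symbolic

def classify(case):
    """Return the rule's free-text conclusion (the class) or None."""
    return CONCLUSION if rule_fires(case) else None

# Two invented cases: LDL was raised and is now low / LDL was never raised.
previously_raised = {"ldl": [4.1, 3.0], "ldl_label": "NORMAL",
                     "max_ldl_label": "HIGH", "triglyceride_label": "NORMAL"}
never_raised = {"ldl": [3.0, 2.9], "ldl_label": "NORMAL",
                "max_ldl_label": "NORMAL", "triglyceride_label": "NORMAL"}
```

Here classify(previously_raised) returns the conclusion text and classify(never_raised) returns None.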
Methods for learning
ontologies (8/8)
Association Rules – Example
Ontology learning methodology used:
 Firstly, class relations between rules were discovered. There
were three basic relations: subsumption/intersection, mutual
exclusivity and similarity
 Secondly, compound relations that appeared interesting were
specified in terms of the three basic relations
 Finally, instances of these compound relations or patterns
were extracted and the class model was assembled
Problems that occurred:
 Very similar conclusions were sometimes identified as
mutually exclusive in cases where there were different values for
the same attribute
 The method did not consider any other information about the
classes themselves
Ontology learning tools –
ASIUM (1/8)
 Acronym for "Acquisition of Semantic knowledge Using
Machine learning method"
 The main aim of Asium is to help the expert in the
acquisition of semantic knowledge from texts and to
generalize the knowledge of the corpus
 Asium provides the expert with an interface which will
first help him or her to explore the texts and then to learn
knowledge which is not in the texts
 During the learning step, Asium helps the expert to
acquire semantic knowledge from the texts, like
subcategorization frames and an ontology. The ontology
represents an acyclic graph of the concepts of the
studied domain. The subcategorization frames represent
the use of the verbs in these texts
Ontology learning tools –
ASIUM (2/8)
Methodology:
The input to Asium consists of
syntactically parsed texts from a
specific domain. It then extracts
these triplets: verb,
preposition/function (if there is no
preposition), lemmatized head
noun of the complement. Next,
using factorization, Asium
groups together all the head nouns
occurring with the same pair
(verb, preposition/function). These
lists of nouns are called basic
clusters. They are linked with the
(verb, preposition/function) pairs
they come from.
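The factorization step described above amounts to grouping head nouns by the pair they occur with. A minimal sketch, assuming the triplets have already been extracted from parsed text; the function name and the sample triplets are invented for illustration, and any frequency information is ignored.

```python
from collections import defaultdict

def basic_clusters(triplets):
    """Group head nouns by the (verb, preposition/function) pair they occur with."""
    clusters = defaultdict(set)
    for verb, prep_or_function, head_noun in triplets:
        clusters[(verb, prep_or_function)].add(head_noun)
    return dict(clusters)

# (verb, preposition or syntactic function, lemmatized head noun)
triplets = [
    ("travel", "by", "car"),
    ("travel", "by", "train"),
    ("drive", "object", "car"),
    ("drive", "object", "truck"),
    ("travel", "subject", "tourist"),
]
clusters = basic_clusters(triplets)
```

Each key keeps the link from a basic cluster back to the pair it comes from.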
Ontology learning tools –
ASIUM (3/8)
Methodology:
Asium then computes the
similarity between all pairs of
basic clusters. The nearest
ones are aggregated and this
aggregation is suggested to the
expert for creating a new
concept. The expert defines a
minimum threshold for gathering
clusters into concepts. A
learned concept can contain
noise (e.g. mistakes in the
parsing), can hide sub-concepts the
expert wants to identify, or can be an
over-generalization due to
aggregation, so the expert’s
contribution is necessary.
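The threshold-based proposal of aggregations can be sketched as below. The slides do not give Asium's similarity measure, so a plain Jaccard overlap is used here as a stand-in; the function names and the sample clusters are likewise assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Stand-in similarity: shared nouns divided by all nouns of both clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propose_aggregations(clusters, threshold):
    """Return candidate concepts for the expert, most similar pair first.

    Each proposal is (similarity, key_1, key_2, merged_nouns).
    """
    proposals = []
    for (k1, n1), (k2, n2) in combinations(clusters.items(), 2):
        sim = jaccard(n1, n2)
        if sim >= threshold:
            proposals.append((sim, k1, k2, n1 | n2))
    proposals.sort(key=lambda p: p[0], reverse=True)
    return proposals

clusters = {
    ("travel", "by"): {"car", "train", "bus"},
    ("drive", "object"): {"car", "bus", "truck"},
    ("eat", "object"): {"apple", "bread"},
}
proposals = propose_aggregations(clusters, threshold=0.4)
```

The single proposal merges the nouns of the two vehicle clusters; "train" in a driving context shows the kind of over-generalization the expert has to check.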
Ontology learning tools –
ASIUM (4/8)
Methodology:
After this, Asium will have learned
the first level of the ontology. Asium
computes similarity again, this time
among all the clusters, both the old and
the new ones, in order to learn the
next level of the ontology. The
cooperative process runs until there
are no more possible aggregations.
The output of the learning process is
an ontology and subcategorization
frames. The ontology represents an
acyclic graph of the concepts of the
studied domain. The
subcategorization frames represent
the use of the verbs in these texts.
Ontology learning tools –
ASIUM (5/8)
Methodology
The advantages of this method are twofold:
First, the similarity measure identifies all concepts of
the domain and the expert can validate or split them.
Next, the learning process is partly based on
these new concepts and suggests more relevant and
more general concepts.
Second, the similarity measure will offer the expert
aggregations between already validated concepts
and new basic clusters in order to get more
knowledge from the corpus.
Ontology learning tools –
ASIUM (6/8)
The interface
This window allows the
expert to validate the
concepts learned by
Asium.
Ontology learning tools –
ASIUM (7/8)
The interface
This window displays the
list of all the examples
covered for the learned
concept.
This display allows the
expert to visualize all the
sentences which will be
allowed if this class is
validated.
Ontology learning tools –
ASIUM (8/8)
The interface
This window displays the ontology as it actually is in memory, i.e.
learned concepts and concepts to be proposed for a level (each blue
circle represents a class).
Ontology learning tools –
TEXT-TO-ONTO (1/8)
It provides semi-automatic ontology
learning from text
It tries to overcome the knowledge
acquisition bottleneck
It is based on a general architecture for
discovering conceptual structures and
engineering ontologies from text
Ontology learning tools –
TEXT-TO-ONTO (2/8)
Ontology learning tools –
TEXT-TO-ONTO (3/8)
 Architecture
Ontology learning tools –
TEXT-TO-ONTO (4/8)
Architecture - Main components
Text & Processing Management Component
The ontology engineer uses this component to
select the domain texts to be exploited in the subsequent
discovery process. The engineer can choose among a set of
text (pre-)processing methods available on the
Text Processing Server and among a set of
algorithms available in the Learning &
Discovering component. The former module
returns text annotated with XML, and this
XML-tagged text is fed to the Learning & Discovering
component
Ontology learning tools –
TEXT-TO-ONTO (5/8)
Architecture - Main components
Text Processing Server
It contains a shallow text processor based on the
core system SMES. SMES is a system that
performs syntactic analysis on natural language
documents
It is organized in modules, such as a tokenizer,
morphological and lexical processing, and chunk
parsing, that use lexical resources to produce
mixed syntactic/semantic information
The results are stored as annotations in XML-tagged
text
Ontology learning tools –
TEXT-TO-ONTO (6/8)
Architecture - Main components
Lexical DB & Domain Lexicon
SMES accesses a lexical database with more
than 120,000 stem entries and more than 12,000
subcategorization frames that are used for lexical
analysis and chunk parsing
The domain-specific part of the lexicon
associates word stems with concepts available in
the concept taxonomy and links syntactic
information with semantic knowledge that may be
further refined in the ontology
Ontology learning tools –
TEXT-TO-ONTO (7/8)
Architecture - Main components
Learning & Discovering component
Applies various discovery methods to the annotated
texts, e.g. term extraction methods for concept
acquisition.
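As a rough illustration of term extraction for concept acquisition, the sketch below ranks words by frequency over a few sentences. It is a simplified stand-in: the slides do not say which extraction methods the component offers, it works on plain text rather than on the XML annotations, and the stopword list and threshold are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "has", "to", "each"}

def candidate_terms(texts, min_count=2):
    """Return (term, count) for words seen at least min_count times, most frequent first."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_count]

texts = [
    "The hotel has a restaurant and a pool.",
    "Each room in the hotel has a balcony.",
    "The restaurant of the hotel is open.",
]
terms = candidate_terms(texts)
```

The frequent terms are candidates that the ontology engineer may accept as new concepts.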
Ontology learning tools –
TEXT-TO-ONTO (8/8)
Architecture - Main components
Ontology Engineering Environment (ONTOEDIT)
Supports the ontology engineer in semi-automatically
adding newly discovered conceptual structures to the
ontology
Internally stores modeled ontologies using an XML
serialization
Uses of ontology learning –
Knowledge sharing (1/2)
 Identifying candidate relations between
expressive, diverse ontologies using concept
cluster integration in multi-agent systems
 Agents with diverse ontologies should be able to
share knowledge by automated learning
methods and agent communication strategies
 Agents that do not know the relationships of their
concepts to each other need to be able to teach
each other these relationships (ontology
learning)
Uses of ontology learning –
Knowledge sharing (2/2)
Concept
representation and
learning on each
agent:
 Process: an agent sends a query to another agent
and receives a response with new concepts. A
new category is created from these concepts. The
agent re-learns the ontology rules and if the new
concept relation rules are verified, they are stored
in the agent.
Uses of ontology learning –
Interest matching (1/2)
 Designing a general algorithm for interest
matching is a major challenge in building online
communities and agent-based communication
networks.
 These algorithms can be applied to user
categorization for an online community. Users’
behavior can be analyzed and matched against
other users to provide collaborative
categorization and recommendation services to
tailor and enhance the online experience.
 The process of finding similar users based on
data from logged behavior is called interest
matching.
Uses of ontology learning –
Interest matching (2/2)
User interests can be
described by ontologies
as weighted tree hierarchies of concepts
 Each node has a weight attribute to represent the
importance of the concept
 These weights can be used to calculate similarities
between users
 Learning process: a standard ontology is used and the
websites the user visits can be classified and entered
into the standard ontology to personalize it. If a user
frequents websites of a category (instance of a class), it
is likely that he or she is interested in other instances of the class
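The weighted hierarchy and the similarity between users can be sketched as follows. This is an illustration under stated assumptions: nodes are represented as path tuples, a visit adds weight to the visited category and all of its ancestors, and cosine similarity is used as the comparison; none of these choices is given in the slides.

```python
import math

def personalize(profile, category_path, amount=1.0):
    """Add weight to a visited category and to all of its ancestors.

    category_path is a tuple such as ("sports", "football").
    """
    for depth in range(1, len(category_path) + 1):
        node = category_path[:depth]
        profile[node] = profile.get(node, 0.0) + amount
    return profile

def similarity(p, q):
    """Cosine similarity between the node weights of two user profiles."""
    dot = sum(w * q.get(node, 0.0) for node, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Three invented users and the categories of the sites they visited.
alice, bob, carol = {}, {}, {}
for path in [("sports", "football"), ("sports", "tennis")]:
    personalize(alice, path)
personalize(bob, ("sports", "football"))
personalize(carol, ("music", "jazz"))
```

Because weight also flows to the parent node, two users who visit different instances of the same class still come out as similar.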
Uses of ontology learning –
Web Directory Classification
 Ontologies and ontology learning can be used to
create information extraction tools for collecting
general information from the free text of web
pages and classifying them into categories
 The goal is to collect indicator terms from the
web pages that may assist the classification
process. These terms can be derived from the
directory headings of a web page as well as its
content.
 The indicator terms along with a collection of
interpretation rules can result in a hierarchy
(ontology) of web pages.
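A minimal sketch of classification by indicator terms and interpretation rules follows. The rule table, the heading weight and the sample page are hypothetical; the slides do not describe the actual rules or scoring.

```python
import re

# Hypothetical interpretation rules: indicator term -> category path.
RULES = {
    "hotel": ("travel", "accommodation"),
    "flight": ("travel", "transport"),
    "recipe": ("food", "cooking"),
}

def classify_page(headings, content, heading_weight=2):
    """Score categories by indicator terms; terms in headings count more."""
    scores = {}
    for text, weight in ((headings, heading_weight), (content, 1)):
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in RULES:
                category = RULES[token]
                scores[category] = scores.get(category, 0) + weight
    # The best-scoring category path places the page in the hierarchy.
    return max(scores, key=scores.get) if scores else None

category = classify_page("Cheap hotel deals", "Book a hotel or a flight today")
```

The returned path places the page under a node of the hierarchy; pages without any indicator term stay unclassified.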
Uses of ontology learning –
E-mail classification (1/2)
KMi Planet
A web-based news server for communicating
stories between members of the Knowledge
Media Institute
Main goal: To classify an incoming story,
obtain the relevant objects within the story,
deduce the relationships between them and
populate the ontology
Integrates a template-driven information
extraction engine with an ontology engine to
supply the necessary semantic content
Uses of ontology learning –
E-mail classification (2/2)
 KMi Planet
 There are three tools:
 PlanetOnto
 MyPlanet
 an IE tool
 PlanetOnto supports several activities. One of them is
ontology editing, which is where ontology learning comes in.
 A tool called WebOnto provides Web-based
visualisation, browsing and editing support for the
ontology. The “Operational Conceptual Modelling
Language”, OCML, is a language designed for
knowledge modeling. WebOnto uses OCML and
allows the creation of classes and instances in the
ontology, along with easier development and
maintenance of the knowledge models
Bibliography
 M. Sintek, M. Junker, L. van Elst, A. Abecker, Using Information
Extraction Rules for Extending Domain Ontologies, German Research
Center for Artificial Intelligence (DFKI)
 M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta, S. Buckingham Shum,
Template-Driven Information Extraction for Populating Ontologies,
Knowledge Media Institute (UK)
 G. Bisson, C. Nedellec, Designing clustering methods for ontology building,
University of Paris
 A. Maedche, S. Staab, The TEXT-TO-ONTO Ontology Learning
Environment, University of Karlsruhe
 A. Maedche, S. Staab, Ontology Learning for the Semantic Web, University
of Karlsruhe
 H. Suryanto, P. Compton, Learning classification taxonomies from a
classification knowledge based system, University of New South Wales
(Australia)
 Proceedings of the First Workshop on Ontology Learning OL'2000
Berlin, Germany, August 25, 2000
 Proceedings of the Second Workshop on Ontology Learning OL'2001
Seattle, USA, August 4, 2001
 ASIUM web page
http://www.lri.fr/~faure/Demonstration.UK/Presentation_Demo.html