Transcript Phase 1

A Framework to Experiment with Different NLP Techniques
Ricardo Gacitua1, Pete Sawyer1, Paul Rayson1, Scott Piao2
1 Computing Department, Lancaster University, Lancaster, UK
2 School of Computer Science, Manchester University, UK
Workshop - Issues in Ontology Development and Use
Nottingham, UK.
2007
• Context
• Problem
• Research Question
• Background
• Framework
• Results
• Demo
• Conclusions
• Further Work
Index
Context
Problems
Research Question
Objectives
Framework
Brief Demo – OntoLancs Workbench
Further Work
Context
Focus:
Most initiatives for Ontology Learning combine techniques to find concepts and relationships between them.
• Extracting the relevant domain terminology and synonyms from a text collection
• Discovering concepts which can be regarded as abstractions of human thought
• Deriving a concept hierarchy organizing these concepts
• Extending an existing concept hierarchy with new concepts
• Learning taxonomic relations between concepts
• Learning non-taxonomic relations between concepts
• Populating the ontology with instances of relations and concepts
• Discovering other axiomatic relationships or rules involving concepts and relations.
Unsupervised clustering techniques known from Machine Learning. [Cimiano et al. 2005, Faure & Nedellec 1999, Caraballo, 1999]
Methods for term extraction can be as simple as:
• counting raw frequency,
• applying information retrieval methods such as TFIDF (Baeza-Yates & Ribeiro-Neto, 1999) or
• applying sophisticated methods such as the C-value / NC-value method [Frantzi & Ananiadou 1999]
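The two simplest term extraction methods named on this slide, raw frequency counting and TFIDF, can be sketched as follows. This is a minimal illustration under stated assumptions, not the OntoLancs implementation: the toy corpus, the whitespace tokenizer and the function names `raw_frequency` and `tf_idf` are all hypothetical, and the TFIDF variant shown (relative term frequency times the natural log of inverse document frequency) is only one of several common formulations.

```python
import math
from collections import Counter

# Made-up toy corpus for illustration; not the corpus used in the presentation.
DOCS = [
    "the striker scored a goal",
    "the goalkeeper saved the penalty",
    "the referee stopped the match",
]

def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]

def raw_frequency(docs):
    """Count how often each token occurs across the whole collection."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def tf_idf(docs):
    """Return one {term: score} dict per document.

    score = (count in doc / doc length) * ln(number of docs / docs containing term)
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # each document counts once per term
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores

if __name__ == "__main__":
    print(raw_frequency(DOCS).most_common(3))
    print(sorted(tf_idf(DOCS)[0].items(), key=lambda kv: -kv[1]))
```

In this toy corpus "the" has the highest raw frequency but a TFIDF score of zero, because it occurs in every document. That is the usual motivation for preferring TFIDF over raw counts when looking for domain terms.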
Context
Focus:
Most initiatives for Ontology Learning combine techniques to find concepts and relationships between them.
However, researchers have realised that the output of the ontology learning process is far from perfect [Cimiano, 2005].
Philipp Cimiano, Johanna Völker, Rudi Studer. Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text. Information, Wissenschaft und Praxis 57 (6-7): 315-320. October 2006.
Problem
A key issue not yet addressed:
In most cases, it is not obvious how to use, configure and combine techniques from different fields for a specific domain.
A challenging issue is to quantitatively evaluate the usefulness and accuracy of techniques and combinations of techniques when applied to ontology learning [1].
Reinberger and Spyns (2005) point out the importance of evaluating the effectiveness of techniques for ontology learning: “To our knowledge no comparative study has been published yet on the efficiency and effectiveness of the various techniques applied to ontology learning” (page 2).
[1] Reinberger, M. L. and P. Spyns (2005). Unsupervised Text Mining for the Learning of DOGMA-inspired Ontologies. Ontology Learning from Text: Methods, Evaluation and Applications, Advances in Artificial Intelligence. P. Buitelaar, P. Cimiano, B. Magnini (eds.). Amsterdam, IOS Press. vol. 24: pages 305-339.
Research Question
Can shallow semantic analysis of the kind enabled by semantic tagging, together with a range of other statistical NLP techniques, identify key domain concepts?
Can it do so with sufficient confidence in the correctness and completeness of the result?
Background
A number of frameworks that support the ontology learning process have been reported:
• ASIUM
• Text2Onto
• OntoLearn
• OntoLT
• DODDLE
They implement several techniques from different fields such as knowledge acquisition, machine learning, information retrieval, natural language processing, artificial intelligence reasoning and database management.
Most frameworks use a pre-defined combination of techniques. Thus, they do not include any mechanism for carrying out experiments with combinations or the ability to include new ones.
Text2Onto is based on the GATE framework, which is flexible with respect to the set of algorithms.
A Flexible Framework
Phase 1: Part-of-Speech (POS) and Semantic annotation of corpus: Domain texts are tagged morphosyntactically and semantically.
Phase 2: Extraction of concepts: The domain terminology is extracted from the tagged domain corpus by identifying a list of domain candidate terms. The system provides a set of statistical and linguistic techniques which an ontology engineer can combine.
Phase 3: Domain Ontology Construction: Concepts extracted during the previous phase are then added to a concept hierarchy.
Phase 4: Domain Ontology Edition: the bootstrap ontology is turned into OWL. It is then processed using an ontology editor (Protégé) to manage the versioning of the domain ontology and to modify or improve it.
An existing DAML ontology can be used as a reference and to calculate precision and recall.
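The slide mentions calculating precision and recall against a reference ontology. The sketch below shows one plausible way to do that, assuming the extracted concepts and the reference concepts can each be reduced to a set of term strings and compared by exact, case-insensitive match. The function name `precision_recall` and the example terms are hypothetical; the matching criterion actually used in the framework is not stated in the slides.

```python
def precision_recall(extracted, reference):
    """Compare extracted candidate concepts with a reference concept set.

    precision = |extracted ∩ reference| / |extracted|
    recall    = |extracted ∩ reference| / |reference|
    Matching is exact after lowercasing (an assumption of this sketch).
    """
    extracted = {term.lower() for term in extracted}
    reference = {term.lower() for term in reference}
    matched = extracted & reference
    precision = len(matched) / len(extracted) if extracted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall

if __name__ == "__main__":
    # Made-up example terms, not results from the presentation.
    candidates = ["player", "goal", "match", "yesterday"]
    gold = ["Player", "Goal", "Match", "Referee", "Team"]
    p, r = precision_recall(candidates, gold)
    print(f"precision={p:.2f} recall={r:.2f}")
```

With the made-up terms above, three of the four candidates appear in the reference set of five, giving a precision of 0.75 and a recall of 0.60.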
Preliminary Results
Some researchers use different text processing techniques such as stopword filtering, lemmatization or stemming:
StopWord Filtering: [Bloehdorn et al., 2006]
Lemmatization: [Buitelaar and Ramaka, 2005]
Stemming: [Kietz et al., 2000]
From the preliminary experiments, we can conclude that lemmatization (Group 3) produces better results than stemming (Group 2) for the domain concept acquisition process.
Our results are consistent with other studies. For instance, Alkula [3] suggests that lemmatization may be a better approach than stemming.
• S. Bloehdorn, P. Cimiano and A. Hotho: Learning Ontologies to Improve Text Clustering and Classification. Proc. of GFKL, 2005.
• Paul Buitelaar, Srikanth Ramaka: Unsupervised Ontology-based Semantic Tagging for Knowledge Markup. In: Proc. of the Workshop on Learning in Web Search at the International Conference on Machine Learning, Bonn, Germany, August 2005.
• J. Kietz, et al.: A Method for semi-automatic ontology acquisition from a corporate intranet. In: Proc. EKAW-2000, France, 2000.
[3] Alkula, R. 2001. From Plain Character Strings to Meaningful Words: Producing Better Full Text Databases for Inflectional and Compounding Languages with Morphological Analysis Software. Inf. Retr. 4, 3-4 (Sep. 2001), 195-208.
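One reason lemmatization might help concept acquisition more than stemming can be shown with a toy example. Both normalizers below are deliberately crude, hypothetical stand-ins (a suffix stripper and a four-entry lookup table), not the tools used in the experiments, and the token list is made up. The point is only that a stemmer can leave variants of one word under different strings, while a lemmatizer maps them to a single dictionary form.

```python
from collections import Counter

SUFFIXES = ("ies", "es", "s", "ing", "ed")  # checked in this order

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # do not turn "class" into "clas"
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Tiny hand-written lemma table, for illustration only.
LEMMAS = {
    "ontologies": "ontology",
    "classes": "class",
    "studies": "study",
    "running": "run",
}

def toy_lemmatize(word):
    """Look the word up in the lemma table; unknown words are unchanged."""
    return LEMMAS.get(word, word)

def count_terms(tokens, normalize):
    """Count candidate terms after applying a normalization function."""
    return Counter(normalize(tok.lower()) for tok in tokens)

if __name__ == "__main__":
    tokens = ["ontology", "ontologies", "class", "classes"]
    print("stemmed:   ", dict(count_terms(tokens, toy_stem)))
    print("lemmatized:", dict(count_terms(tokens, toy_lemmatize)))
```

Here the toy stemmer splits "ontology" and "ontologies" into two candidate terms ("ontology" and "ontolog"), whereas the toy lemmatizer counts them as one. A real stemmer and lemmatizer would behave differently in detail.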
Brief Demo
Ontology Framework
Conclusions
Main challenge:
Our research project addresses an important challenge of ontology research, i.e. how to evaluate quantitatively the usefulness and accuracy of both techniques and combinations of techniques when they are applied to ontology learning.
1. This framework is designed as a cyclical process to experiment with different techniques. Techniques are included as plug-ins.
2. It provides support to determine which techniques, or combinations of techniques, provide optimal performance for ontology learning.
Our ontology learning environment is unique in providing not only a framework for integrating linguistic techniques, but also an experimental platform for identifying the most effective technique or combinations.
Further Work
Our Project:
OntoLancs – A Flexible Framework For Ontology Learning
Including new techniques (plug-ins) from different tools.
Experimenting with techniques in a Supervised and Unsupervised Mode
Future Work
A graphical workflow engine will provide support for the composition of complex ensemble techniques
Integration with Protégé (Editor)
The End
OntoLancs
Computing Department
Lancaster University
2006, UK
Text2Onto vs. OntoLancs
Text2Onto defines user interaction as a core aspect, whereas our framework provides support to process algorithms in an unsupervised mode.
Our framework provides a graphical workflow engine to support the composition of complex ensemble techniques.
Our framework uses a plug-in-based structure, as Text2Onto does. In contrast, however, it can include techniques from existing linguistic and ontology tools by using Java APIs.
Techniques included into OntoLancs
1. Grouping by POS
2. Raw Frequency Filtering
3. POS Filtering
4. Lemmatization
5. Stemming
6. StopWord Filtering
7. Frequency Profiling
8. Syntactic Pattern Co-occurrences
9. Window-based Collocations
10. Semantic Filter (soon)
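The idea of combining techniques from this list as interchangeable plug-ins can be sketched as a chain of functions over a token list. This is an assumption about the general shape of such a design, not the OntoLancs architecture or API: the names `stopword_filter`, `raw_frequency_filter` and `run_pipeline`, the stopword list and the threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, List

# A "technique" here is any function from a token list to a token list.
Technique = Callable[[List[str]], List[str]]

# Tiny illustrative stopword list, not a real linguistic resource.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}

def stopword_filter(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the stopword list."""
    return [tok for tok in tokens if tok not in STOPWORDS]

def raw_frequency_filter(min_count: int) -> Technique:
    """Build a technique that keeps tokens occurring at least min_count times."""
    def technique(tokens: List[str]) -> List[str]:
        counts = Counter(tokens)
        return [tok for tok in tokens if counts[tok] >= min_count]
    return technique

def run_pipeline(tokens: List[str], techniques: List[Technique]) -> List[str]:
    """Apply each technique in order, feeding its output to the next one."""
    for technique in techniques:
        tokens = technique(tokens)
    return tokens

if __name__ == "__main__":
    tokens = "the ontology of the domain and the ontology editor".split()
    combo = [stopword_filter, raw_frequency_filter(2)]
    print(run_pipeline(tokens, combo))
```

Because every technique shares the same signature, a different combination or ordering can be tried by changing only the list passed to `run_pipeline`, which is the kind of experimentation the framework is described as supporting.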