Validation Tasks Performed by the Czech Team


Parallel Corpora for Multilingual Ontology Learning
Pavel Smrz
Brno University of Technology
Czech Republic
Motivation

- Ontologies (in computer science) are formal, machine-readable representations of concepts and the relations among them
- Ontologies can be used for efficient semantic querying, relevance measurement, sharing of knowledge representations and many other purposes
- Ontologies also play a major role in the Semantic Web vision (data on the web understandable by machines; Berners-Lee; OWL, the Web Ontology Language)

; Example: SUO-KIF axioms defining the class Weapon in the SUMO upper ontology
(subclass Weapon Device)
(documentation Weapon "The &%Class of &%Devices that are designed
primarily to damage or destroy &%Humans/&%Animals,
&%StationaryArtifacts or
the places inhabited by &%Humans/&%Animals.")

; Every weapon can be the instrument of a Damaging process
(=>
  (instance ?WEAPON Weapon)
  (capability Damaging instrument ?WEAPON))

; Every weapon is intended to damage artifacts, animals or their habitats
(=>
  (instance ?WEAPON Weapon)
  (hasPurpose ?WEAPON
    (exists (?DEST ?PATIENT)
      (and
        (instance ?DEST Damaging)
        (patient ?DEST ?PATIENT)
        (or
          (instance ?PATIENT StationaryArtifact)
          (instance ?PATIENT Animal)
          (exists (?ANIMAL)
            (and
              (instance ?ANIMAL Animal)
              (inhabits ?ANIMAL ?PATIENT))))))))
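
As a minimal sketch of the semantic querying mentioned above, the subclass links of an ontology can be stored as a graph and queried transitively. The Python below is illustrative only: `build_hierarchy` and `superclasses` are hypothetical helpers, and apart from Weapon and Device (taken from the axioms above) the class links are assumptions.

```python
from collections import defaultdict


def build_hierarchy(subclass_pairs):
    """Map each class to the set of its direct superclasses."""
    parents = defaultdict(set)
    for child, parent in subclass_pairs:
        parents[child].add(parent)
    return parents


def superclasses(cls, parents):
    """Return all direct and indirect superclasses of cls (depth-first walk)."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:  # also prevents endless loops on cycles
                found.add(parent)
                stack.append(parent)
    return found


# (subclass Weapon Device) comes from the axioms above; the other links are illustrative.
pairs = [("Weapon", "Device"), ("Device", "Artifact"), ("Artifact", "Object")]
parents = build_hierarchy(pairs)
print(sorted(superclasses("Weapon", parents)))  # ['Artifact', 'Device', 'Object']
```

A query such as "is every Weapon an Object?" then reduces to a membership test on the returned set.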
Motivation

- Three levels of generality of ontologies:
  - foundational (top, upper) ontology
  - core ontology
  - specific domain ontology
- Domain ontologies provide a common understanding of particular application domains
- Creating ontologies is an extremely demanding, labor-intensive and time-consuming task
- Automatic acquisition (learning) of ontologies (semantic relations) from text
Motivation

- Pattern-based extraction of semantic relations (M. A. Hearst, 1992)
- Methods for hyponymy (is-a) were later adopted for other kinds of relations
- Token co-occurrence techniques gather sets of concepts belonging to the same class
- Various combinations and modifications (D. Lin; A. Kilgarriff, Word Sketches)
- The same methods have been applied to subjective language identification and opinion mining
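
A toy illustration of the token co-occurrence idea, assuming the sentence is the co-occurrence window and raw counts are used (real systems typically rely on association measures instead). Words that share sentences with a seed word are ranked as candidate members of its class; the helper names and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, vocabulary):
    """Count how often two vocabulary words occur in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = {token.strip(".,").lower() for token in sentence.split()}
        present = sorted(tokens & vocabulary)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def neighbours(word, counts):
    """Rank words by how often they co-occur with `word` (most frequent first)."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == word:
            scores[b] += n
        elif b == word:
            scores[a] += n
    return [other for other, _ in scores.most_common()]


sentences = [
    "Apples, pears and plums are fruit.",
    "We ate apples and pears.",
    "Plums and apples grow here.",
    "The hammer and saw were in the shed.",
]
vocabulary = {"apples", "pears", "plums", "hammer", "saw"}
counts = cooccurrence_counts(sentences, vocabulary)
print(neighbours("apples", counts))  # ['pears', 'plums']
```

The fruit words cluster together while the tools never co-occur with them, which is the signal such techniques exploit.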
OLE – Ontology Learning Platform

- Based on the pattern-extraction technique
- Used for ontology extraction from biomedical texts
- Explicit representation of uncertainty (BayesOWL)
OLE – Ontology Learning Platform

- OLITE processes plain text and creates mini-ontologies from the extracted data
- PALEA is responsible for learning new semantic patterns
- OLEMAN merges the mini-ontologies produced by the OLITE module and updates the base domain ontology
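
How OLEMAN combines the mini-ontologies is not detailed here; the following is only a sketch under stated assumptions: relations are (relation, subject, object) triples weighted by relevance, candidates below a hypothetical threshold are discarded, and a triple found several times keeps its highest relevance. All names and triples are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (relation, subject, object)


def merge_miniontologies(base: Dict[Triple, float],
                         minis: Iterable[Dict[Triple, float]],
                         threshold: float = 0.5) -> Dict[Triple, float]:
    """Merge weighted mini-ontologies into a copy of the base ontology.

    Candidate triples from the mini-ontologies are used only if their
    relevance reaches `threshold`; a triple seen several times keeps its
    highest relevance.
    """
    merged = dict(base)  # leave the caller's base ontology untouched
    for mini in minis:
        for triple, relevance in mini.items():
            if relevance < threshold:
                continue  # discard low-relevance candidates
            merged[triple] = max(relevance, merged.get(triple, 0.0))
    return merged


base = {("is_a", "enzyme", "protein"): 0.9}
minis = [
    {("is_a", "enzyme", "protein"): 0.7, ("part_of", "exon", "gene"): 0.8},
    {("is_a", "gene", "molecule"): 0.3},
]
print(merge_miniontologies(base, minis))
# {('is_a', 'enzyme', 'protein'): 0.9, ('part_of', 'exon', 'gene'): 0.8}
```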
Example lexico-syntactic patterns (Hearst):
NP1 [“,”] “such as” NPList2
NP1 (“and”|“or”) “other” NP2
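
The two patterns can be approximated with regular expressions. This sketch assumes every NP is a single word (a real system would use a noun-phrase chunker) and reads the first pattern as "every item of NPList2 is_a NP1" and the second as "NP1 is_a NP2", following Hearst's interpretation. Function and variable names are hypothetical.

```python
import re

WORD = r"[A-Za-z][\w-]*"                              # one-word stand-in for an NP
SEP = r"(?:,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)"   # ", " / ", and " / " or "
NP_LIST = rf"{WORD}(?:{SEP}{WORD})*"

# NP1 [","] "such as" NPList2  ->  every item of NPList2 is_a NP1
SUCH_AS = re.compile(rf"\b({WORD}),?\s+such\s+as\s+({NP_LIST})")
# NP1 ("and"|"or") "other" NP2  ->  NP1 is_a NP2
AND_OR_OTHER = re.compile(rf"\b({WORD})\s+(?:and|or)\s+other\s+({WORD})")


def extract_is_a(text):
    """Return (hyponym, hypernym) pairs matched by the two patterns."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hypernym = m.group(1).lower()
        for item in re.split(SEP, m.group(2)):
            pairs.append((item.lower(), hypernym))
    for m in AND_OR_OTHER.finditer(text):
        pairs.append((m.group(1).lower(), m.group(2).lower()))
    return pairs


text = ("The museum shows weapons such as swords, daggers and axes. "
        "Visitors may not bring knives or other tools.")
print(extract_is_a(text))
# [('swords', 'weapons'), ('daggers', 'weapons'), ('axes', 'weapons'), ('knives', 'tools')]
```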
The ontology of HATs
hat
  Red hat
  small hat
  flying hat
  brown hat
  leather hat
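
One simple way to obtain such a tree, assumed here for illustration and not necessarily the method behind the figure, is to treat the last word of a multi-word noun phrase as its head and place the phrase under that head. The function name is hypothetical.

```python
from collections import defaultdict


def head_modifier_tree(noun_phrases):
    """Group multi-word noun phrases under their head word.

    Naive assumption: the last token is the head, so 'leather hat' is_a 'hat'.
    """
    tree = defaultdict(list)
    for phrase in noun_phrases:
        tokens = phrase.split()
        if len(tokens) > 1:  # single words have no modifier to strip
            tree[tokens[-1].lower()].append(phrase)
    return dict(tree)


phrases = ["Red hat", "small hat", "flying hat", "brown hat", "leather hat"]
print(head_modifier_tree(phrases))
# {'hat': ['Red hat', 'small hat', 'flying hat', 'brown hat', 'leather hat']}
```

A rule this naive attaches every modified phrase to hat; it cannot tell whether a modifier names a genuine subclass (leather hat), a mere property (small hat), or whether the phrase is actually a name.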
Extracted Relations
Table columns: type of relation | subject | object | relevance
tool_for | SCFG | RNA secondary structure prediction | 0.66
Other extracted relations involve the relation types described_in, function_of, abbr_means and is_a; the terms CKY, Cocke-Kasami-Younger algorithm, ribosomal RNA, frameshifting, HMM, Hidden Markov Models, RNA, protein and molecule; and the relevance scores 0.81, 0.69, 0.45, 0.45 and 0.73.
Multilingual Pattern Learning

- Extraction patterns are usually defined manually
- The task must be repeated for each new language in a multilingual environment
- Parallel corpora for pattern acquisition:
  1. Patterns defined for one language (English)
  2. Semantically related expressions extracted from the given part of the corpus
  3. Translation equivalents found in all other languages
  4. Patterns for the respective languages automatically derived from the corpus data
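
Steps 2 to 4 can be sketched for a sentence-aligned English-Czech corpus as follows. Assumptions: the English pattern matches from step 2 are already available, a small hypothetical bilingual dictionary supplies the translation equivalents of step 3, and Czech word forms are matched literally, ignoring inflection (a real system would need lemmatization). The words between the translated hypernym and hyponym become a candidate Czech pattern (step 4).

```python
import re
from collections import Counter


def tokenize(sentence):
    # \w matches Unicode letters in Python 3, so Czech diacritics survive.
    return re.findall(r"\w+", sentence.lower())


def derive_patterns(examples, dictionary):
    """Count candidate Czech patterns.

    examples: (czech_sentence, english_hypernym, english_hyponym) triples, where
    the English pair was matched by an English pattern in the aligned sentence.
    dictionary: hypothetical English -> Czech translation equivalents.
    """
    counts = Counter()
    for cs_sentence, hypernym, hyponym in examples:
        tokens = tokenize(cs_sentence)
        cs_hyper = dictionary.get(hypernym)
        cs_hypo = dictionary.get(hyponym)
        if cs_hyper in tokens and cs_hypo in tokens:
            i, j = tokens.index(cs_hyper), tokens.index(cs_hypo)
            if i < j:  # only the hypernym-before-hyponym order is handled
                counts[" ".join(["NP1", *tokens[i + 1:j], "NP2"])] += 1
    return counts


dictionary = {"weapons": "zbraně", "swords": "meče", "fruit": "ovoce", "apples": "jablka"}
examples = [
    ("Prodávali zbraně jako meče.", "weapons", "swords"),  # They sold weapons such as swords.
    ("Pěstujeme ovoce jako jablka.", "fruit", "apples"),   # We grow fruit such as apples.
]
print(derive_patterns(examples, dictionary))  # Counter({'NP1 jako NP2': 2})
```

Counting across the corpus lets frequent in-between strings surface as the Czech counterparts of the English pattern.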
Subjectivity Clue Extraction

- CLUMSY: clue mining system for subjective language
- Subjectivity clues and opinion extraction patterns defined for English
- Data from a parallel English-Czech corpus used for automatic acquisition of Czech patterns
- Integrated into a prototype of EPOS (Electronic Poll System), a new multilingual opinion mining system
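
A minimal sketch of how English clues might be carried over to Czech, assuming word-aligned sentence pairs (links produced beforehand by some word aligner) and a simple frequency cutoff. This is not necessarily how CLUMSY works; all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def project_clues(clues, corpus, min_count=1):
    """Project English subjectivity clues onto Czech words via word alignments.

    corpus: (english_tokens, czech_tokens, links) triples, where links are
    (english_index, czech_index) pairs from a word aligner (assumed given).
    Returns, for each clue seen, its Czech counterparts aligned at least
    min_count times, most frequent first.
    """
    aligned = defaultdict(Counter)
    for en_tokens, cs_tokens, links in corpus:
        for i, j in links:
            if en_tokens[i] in clues:
                aligned[en_tokens[i]][cs_tokens[j]] += 1
    return {clue: [word for word, n in counts.most_common() if n >= min_count]
            for clue, counts in aligned.items()}


corpus = [
    (["the", "film", "is", "great"], ["film", "je", "skvělý"], [(1, 0), (2, 1), (3, 2)]),
    (["the", "food", "was", "terrible"], ["jídlo", "bylo", "hrozné"], [(1, 0), (2, 1), (3, 2)]),
    (["a", "great", "idea"], ["skvělý", "nápad"], [(1, 0), (2, 1)]),
]
print(project_clues({"great", "terrible"}, corpus))
# {'great': ['skvělý'], 'terrible': ['hrozné']}
```

Raising `min_count` trades coverage for precision: with `min_count=2`, only skvělý survives in this toy corpus.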
Future Directions

- Automatically extracted domain ontologies for PortaGe, an e-learning portal generator
- Opinionated parallel texts for the development of EPOS
- Ontology learning from the Acquis Communautaire parallel corpus: automatic acquisition of legal ontologies?