Use of WordNet and on-line
dictionaries to build EN-SK
synsets
(experimental tool)
Ján GENČI
Technical University of Košice, Slovakia
Plan
• WordNet, EuroWordNet + Slovak language
• Motivation
• Solution
• Results
• Future plans
WordNet, EuroWordNet
• Well-known projects
• WordNet defines the meanings of English words
and the relationships between them (it defines synsets)
• EuroWordNet (EWN) is a very similar
multilingual project
• EWN does not include the Slovak language
(a Slovak WN)
Motivation
• Text classification tasks, which require dimensionality
reduction, and intelligent search both call for:
– A morphological database
– Something like WordNet
Our approach
• We decided to try using on-line dictionaries
to map Slovak meanings to WordNet synset
entries
• Two approaches:
– Intersection of the translations of each member of
an EN synset
– Intersection of the translations of related words
Architecture (diagram components)
• Input word
• Synset Builder
• Internet: on-line dictionaries
• WordNet DB
• Local DB
Synset “members” translation
• According to WN, the word “computer” has 2
meanings, specified by 2 synsets
– {computer, computing machine, computing device, data
processor, electronic computer, information processing
system}
– {calculator, reckoner, figurer, estimator, computer}
• The result is formed as the intersection of the
translations of the synset members
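The member-intersection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dictionary, its Slovak entries, and the names `translate` and `translate_synset` are invented here and are not taken from the tool. Skipping members that the dictionary does not know is also an assumption.

```python
from typing import Callable, Iterable, Optional, Set

# Toy EN->SK dictionary, invented for illustration only.
# The real tool queries on-line dictionaries instead.
TOY_DICTIONARY = {
    "computer": {"počítač", "kalkulačka"},
    "computing machine": {"počítač", "počítací stroj"},
    "data processor": {"počítač", "procesor"},
}


def translate(word: str) -> Set[str]:
    """Return all translations offered for one English word."""
    return set(TOY_DICTIONARY.get(word, set()))


def translate_synset(members: Iterable[str],
                     translate: Callable[[str], Set[str]]) -> Set[str]:
    """Intersect the translation sets of all synset members."""
    result: Optional[Set[str]] = None
    for member in members:
        translations = translate(member)
        if not translations:
            # Assumption: a member missing from the dictionary is skipped
            # rather than forcing an empty result.
            continue
        result = translations if result is None else result & translations
    return result or set()


synset = ["computer", "computing machine", "data processor", "electronic computer"]
print(translate_synset(synset, translate))
```

With a single-member synset the function simply returns every translation of that one word, since there is nothing to intersect with.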
Translation of related words
• Based on the hyponym/hyperonym relationships
between words:
– Related words are translated
– The result is formed as the intersection of the partial
translations
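One possible reading of this step, sketched in Python. The slides do not say whether the word itself takes part in the intersection, so here the caller passes the full list of words to translate. The toy dictionary, its entries, and the name `translate_related` are hypothetical.

```python
from typing import Callable, Iterable, Set

# Toy EN->SK dictionary, invented for illustration only.
TOY_DICTIONARY = {
    "desk": {"stôl", "písací stôl", "pult"},
    "table": {"stôl", "tabuľka"},
}


def lookup(word: str) -> Set[str]:
    """Return all translations offered for one English word."""
    return set(TOY_DICTIONARY.get(word, set()))


def translate_related(words: Iterable[str],
                      translate: Callable[[str], Set[str]]) -> Set[str]:
    """Translate each related word and intersect the partial translations."""
    partial = [translate(word) for word in words]
    # Assumption: words without any translation are ignored.
    partial = [translations for translations in partial if translations]
    if not partial:
        return set()
    return set.intersection(*partial)


# "desk" together with a hypothetical hyperonym "table"
print(translate_related(["desk", "table"], lookup))
```

In this toy example the hyperonym filters out the translations of "desk" that do not also translate "table".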
Results
• The tool supports 4 Slovak and 2 Czech on-line
dictionaries (the Slovak dictionaries seem to come
from a single source)
• The result depends on:
– The number of members in the synset (a single
member is a problem)
– The related words
– The quality (?) of the dictionary
Results (cont.)
• Parts of speech are sometimes mixed (nouns
and adjectives)
• We implemented a “multilingual view”
• The approach is time-consuming (quite slow), so
results are stored in the database
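Storing results so that each slow on-line lookup happens only once could look roughly like the sketch below. The slides do not describe the tool's database, so the SQLite schema and the names `open_cache` and `cached_translate` are assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Set


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local cache; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translations ("
        "word TEXT NOT NULL, translation TEXT NOT NULL, "
        "PRIMARY KEY (word, translation))"
    )
    # Remembers every word already looked up, including words
    # for which the dictionary returned nothing.
    conn.execute("CREATE TABLE IF NOT EXISTS looked_up (word TEXT PRIMARY KEY)")
    return conn


def cached_translate(conn: sqlite3.Connection, word: str,
                     fetch: Callable[[str], Set[str]]) -> Set[str]:
    """Return translations from the cache, calling fetch only on a miss."""
    seen = conn.execute(
        "SELECT 1 FROM looked_up WHERE word = ?", (word,)
    ).fetchone()
    if seen is None:
        translations = fetch(word)  # the slow on-line lookup
        with conn:  # commits both inserts as one transaction
            conn.execute("INSERT INTO looked_up (word) VALUES (?)", (word,))
            conn.executemany(
                "INSERT OR IGNORE INTO translations (word, translation) "
                "VALUES (?, ?)",
                [(word, t) for t in translations],
            )
    rows = conn.execute(
        "SELECT translation FROM translations WHERE word = ?", (word,)
    ).fetchall()
    return {row[0] for row in rows}
```

Passing a file path instead of `":memory:"` would keep the cache between runs.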
Examples
• Word “computer”
Example
• Word “table”
Future work (plans)
• Deal with the “dictionary problem”
• Eliminate mixed parts of speech in the
results (at least for the Slovak language, using
a morphological database)
• Connect other languages
• Local copy of new webpage
• Addresses
– http://ruzin.fei.tuke.sk/~laposp
– http://ruzin.fei.tuke.sk/~sudynova (new one)
Thank you!