Creation of English and Hindi Verb Hierarchies and their Application to Hindi WordNet Building and English-Hindi MT
Debasri Chakrabarti, Gajanan Krishna Rane, Pushpak Bhattacharyya.
Computer Science and Engineering Department,
Indian Institute of Technology, Bombay,
Mumbai, 40076, India.
debasri,gkrane,[email protected]
7/17/2015
C.F.I.L.T., I.I.T. BOMBAY
Introduction

Verb hierarchy
  creation of the verb hierarchy for English and Hindi verbs
  organized according to semantics and syntax
  semantic hierarchy: through the super-ordinate terms and the inbuilt ontology of the UNL KB
  syntactic information: through UNL case relations

System is based on
  English verb classes and their alternation (Levin)
  UNL System: UW Manual, Knowledge base (KB) & specification
  Semantic relations of English WordNet
Levin’s Class of English verbs

Classification of the English verbs
Adopted from Beth Levin’s English Verb Classes and Alternations.

Details of Levin’s work
Levin’s classification is one of the most significant and widely cited classifications of English verbs.

Assumption underlying Levin’s work
The syntactic behavior of a verb is semantically determined.

Levin investigated and exploited this hypothesis
for about 3200 English verbs.
Details of Levin’s work
Verb Classes

Preliminary Investigation
considerable correlation between some facets of the semantics of verbs and their syntactic behavior

200 semantic classes defined in Levin’s system
the verbs in each class share a number of alternations

Example of verb classes
verbs of putting, verbs of communication, correspond verbs etc.
The Universal Networking Language (UNL)

Universal Networking Language (UNL)
an electronic language for computers to express and exchange information.

The UNL system consists of
Universal words (UW): vocabulary of UNL
Relations, attributes: syntax of UNL
UNL knowledge base (KB): semantics of UNL
The Universal Networking Language

UNL represents information
  sentence-by-sentence as a hyper-graph
  concepts as nodes and relations as arcs

Sentence is a hyper-graph
  a node in the structure can itself be a graph
  the node is called a compound word (CW)
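Graphical representation in UNL
Sentence: John eats rice with a spoon
[Figure: UNL graph. Entry node eat(icl>do), marked @entry @present, with three outgoing arcs.]
agt: eat(icl>do) -> John(iof>person)
obj: eat(icl>do) -> rice(icl>food)
ins: eat(icl>do) -> spoon(icl>artifact)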
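
The graph above can also be written down as a list of labelled arcs. The following is a minimal Python sketch of that reading; the `Relation` tuple, the `GRAPH` list and the `outgoing` helper are hypothetical names chosen for illustration and are not part of the UNL specification.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One labelled arc of a UNL graph (hypothetical container, for illustration)."""
    label: str    # UNL relation label, e.g. "agt"
    source: str   # universal word the arc starts from
    target: str   # universal word the arc points to


# Arcs for "John eats rice with a spoon", as read off the slide.
ENTRY = "eat(icl>do)"
ATTRIBUTES = {ENTRY: ["@entry", "@present"]}
GRAPH = [
    Relation("agt", ENTRY, "John(iof>person)"),
    Relation("obj", ENTRY, "rice(icl>food)"),
    Relation("ins", ENTRY, "spoon(icl>artifact)"),
]


def outgoing(graph, node):
    """Return {relation label: target} for every arc leaving `node`."""
    return {rel.label: rel.target for rel in graph if rel.source == node}


if __name__ == "__main__":
    for label, target in outgoing(GRAPH, ENTRY).items():
        print(f"{label}({ENTRY}, {target})")
```

A compound word (a node that is itself a graph) could be modelled in the same sketch by letting `source` or `target` name a nested list of relations; that extension is left out here.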
Verbal Concepts in UNL

Verbal concepts in the UNL system are organized
into three categories

(icl>do) for defining the concept of an event which is
caused by something or someone
change (icl>do) : as in She changed the dress

(icl>occur) for defining the concept of an event that
happens of its own accord
change (icl>occur) : as in The weather will change

(icl>be) for defining the concept of a state verb
remember (icl>be) : as in Do you remember me?
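
The three categories can be read directly off the icl> restriction of a universal word. The sketch below assumes simple UWs of the form headword(relation>value) with a single restriction; `UW_PATTERN` and `verbal_category` are hypothetical names, and the nested entries of the UNL KB would need a fuller parser than this.

```python
import re

# Matches simple universal words such as "change (icl>do)".
# Assumption: exactly one restriction and no nested parentheses.
UW_PATTERN = re.compile(
    r"^\s*(?P<head>[^()]+?)\s*\(\s*(?P<rel>\w+)\s*>\s*(?P<value>[^()]+?)\s*\)\s*$"
)

# The three verbal categories named on the slide.
CATEGORY = {
    "do": "event caused by something or someone",
    "occur": "event that happens of its own accord",
    "be": "state",
}


def verbal_category(uw):
    """Return the verbal category of a simple UW, or None if it is not one."""
    match = UW_PATTERN.match(uw)
    if match is None or match.group("rel") != "icl":
        return None
    return CATEGORY.get(match.group("value"))


if __name__ == "__main__":
    for uw in ("change (icl>do)", "change (icl>occur)", "remember (icl>be)"):
        print(uw, "->", verbal_category(uw))
```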
Verbal Concepts in UNL
do(agt>thing{,^gol>thing,icl>do,^obj>thing,^ptn>thing,^src>thing})
do(agt>volitional thing{,icl>do(agt>thing)})
do(agt>living thing{,icl>do(agt>volitional thing)})
do(agt>human{>living thing,icl>do(agt>living thing)})
do(agt>thing,gol>thing{,icl>do, ^obj>thing,^ptn>thing,^src>thing})
Partial hierarchical structure for do
do in UNL KB

Semantic hierarchy in terms of the inbuilt ontology in KB
do(agt>thing,gol>thing{,icl>do},obj>thing{,^ptn>thing,^src>thing})
do({icl>do(}agt>thing{,gol>thing,obj>thing)},gol>abstract thing,obj>abstract thing)
do({icl>do(}agt>thing{,gol>abstract thing,obj>abstract thing)},gol>custom{>abstract thing},obj>custom{>abstract thing})
do(gol>thing)
do(gol>abstract thing)
do(gol>custom)
Creation of the verb hierarchy

First, a particular verb class is selected from Levin.
Next, the class is categorized according to the UNL format.
Then the parent node of the class is obtained through the English WordNet and various dictionaries.
Creation of the verb hierarchy
“put”
‘Put your clothes in the cupboard’.
(to put something into a certain place)
(icl>move(agt>person,obj>concrete thing,gol>place))
(loc_prep{in/on/into/under/over})
[VTRANS, VOA-ACT]
“hang”
‘He hung the wallpaper on the wall’.
(to suspend or fasten something so that it is held up from above and not supported from below)
(icl>put{>move}(agt>person,obj>concrete thing,gol>place))
(loc_prep{from/on})
[VTRANS, VOA-ACT]
Partial hierarchy of the put class
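
The two entries above already encode a small hierarchy: put is placed under move, and hang under put. A minimal Python sketch of how such entries might be stored and walked follows. The `VerbEntry` record layout and the `ancestors` helper are assumptions made for illustration, not the authors' data format; the field values are copied from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VerbEntry:
    """One node of the hierarchy (hypothetical record layout, for illustration)."""
    verb: str
    parent: Optional[str]          # immediate super-ordinate, from the icl> restriction
    arguments: Tuple[str, ...]     # UNL case relations with their restrictions
    loc_prep: Tuple[str, ...]      # locative prepositions listed in the frame
    attributes: Tuple[str, ...]    # e.g. VTRANS, VOA-ACT


# Values copied from the two entries on the slide.
ENTRIES: Dict[str, VerbEntry] = {
    "put": VerbEntry(
        verb="put",
        parent="move",
        arguments=("agt>person", "obj>concrete thing", "gol>place"),
        loc_prep=("in", "on", "into", "under", "over"),
        attributes=("VTRANS", "VOA-ACT"),
    ),
    "hang": VerbEntry(
        verb="hang",
        parent="put",
        arguments=("agt>person", "obj>concrete thing", "gol>place"),
        loc_prep=("from", "on"),
        attributes=("VTRANS", "VOA-ACT"),
    ),
}


def ancestors(verb: str, entries: Dict[str, VerbEntry] = ENTRIES) -> List[str]:
    """Follow parent links upward and return the chain of super-ordinates."""
    chain: List[str] = []
    seen = {verb}
    current = entries.get(verb)
    while current is not None and current.parent is not None:
        parent = current.parent
        if parent in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle in hierarchy at {parent!r}")
        seen.add(parent)
        chain.append(parent)
        current = entries.get(parent)
    return chain


if __name__ == "__main__":
    print(ancestors("hang"))  # ['put', 'move']
```

Storing only the immediate parent keeps each entry small; the full chain written on the slide as put{>move} is then derived rather than repeated.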
Verb Hierarchy in Hindi
रखना; rakhanaa
‘put’ ‘Put your things here.’ (to put something into a certain place)
(icl>act(agt>person,obj>concrete thing,gol>place))
अपना सामान यहाँ पर रखो। (apanaa saamaana yahaa par rakho)
{adv_plc (यहाँ/वहाँ ‘yahaa/vahaa’) + loc_postp (पर ‘par’)}
रखना, सजाना; rakhanaa, sajaanaa
‘arrange’ ‘He arranged the books here.’ (to put into a proper or systematic manner)
(icl>put{>act}(agt>person,obj>thing))
उसने किताबों को यहाँ पर सजाकर रखा। (usne kitabo ko yahaa par sajaakar rakhaa.)
{adv_man (सजाकर ‘sajaakar’; क्रम से ‘kram se’) + adv_plc (यहाँ/वहाँ ‘yahaa/vahaa’) + loc_postp (पर ‘par’)}
Verb Hierarchy in Hindi

Syntax frames specified for the put class in English
  (adv_plc{here/there})
  (loc_prep)

Sentence frames for put in Hindi
  adv_man
  adv_plc + adv_man
  loc_postp + adv_man

English: adv_plc (here/there); loc_prep (in, inside, on etc.)
Hindi: adv_man (सजाकर ‘sajaakar’; क्रम से ‘kram se’ etc.)
Hindi: adv_plc (यहाँ/वहाँ ‘yahaa/vahaa’) + loc_postp (पर ‘par’) + adv_man (सजाकर ‘sajaakar’; क्रम से ‘kram se’ etc.)
Hindi: loc_postp (के ऊपर ‘ke upar’ etc.) + adv_man (सजाकर,
Verb hierarchy and the Hindi WordNet

Application of the hierarchy in the Hindi WordNet will help in determining
  semantic relations like hypernymy and troponymy
  syntactic frames

Application of the hierarchy in the Hindi WordNet revealed facts like
  a difference in the representations of troponyms in Hindi and English
  reclassification of the verbs in Hindi
Representations of Troponyms
English | sentence | Hindi | sentence
put | Put your things here. | रखना | अपना सामान यहाँ पर रखो।
pile | Pile your books up on the shelves. | ----- | उसने खाने में एक के ऊपर एक सामान रखा।
cram | She crammed the books into the suitcase. | ----- | उसने बक्से के अन्दर सारी किताब ठाँसकर रखी।
Classification of Hindi Verbs
[Figure: classification tree]
Verbs: simple, conjunct, compound
conjunct: noun + verb, adjective + verb, adverb + verb
Classification of the Hindi Verbs

Simple verbs
  खाना (khaanaa) ‘to eat’
Compound verbs
  गिर पड़ना (gir padanaa) ‘to fall down’
Conjunct verbs
  noun + verb: आरंभ करना (aarambh karanaa) ‘to start’
  adjective + verb: शांत करना (shaant karanaa) ‘to calm down’
  adverb + verb: उठाकर रखना (uthaakar rakhanaa) ‘to lift’
Reclassification of the Hindi verbs

Sentence frames of the verbs reveal that
  only the noun + verb conjunct is a true conjunct
Hence, a re-classification of the verbs is needed
Application in NLP

The application of the verb hierarchy in NLP
  gives the semantic hierarchy of a verbal concept
  enumerates the syntactic details of a verb

UNL-based MT will benefit greatly
  the possible UNL relations that appear with a concept are specified
Application in MT
Verb | Sentence | Frame | UNL Relations
fight | Sam and Sue fought. | conj_and | agt>person
fight | Sam was fighting with Sue. | prep_accompaniment{with} | agt>person, ptn>person
fight | The tribesmen fought each other. | -prep_with | agt>person, obj>person
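
The table above suggests a simple lookup during analysis: once the syntactic frame of a sentence is recognized, the verb entry tells which UNL relations to generate. Below is a minimal sketch of that lookup in Python; the nested-dictionary layout and the `relations_for` function are assumptions for illustration, and frame detection itself is not shown.

```python
from typing import Dict, List

# Frame -> UNL relations for "fight", copied from the table.
# The layout and names here are hypothetical, not the system's actual format.
LEXICON: Dict[str, Dict[str, List[str]]] = {
    "fight": {
        # Sam and Sue fought.
        "conj_and": ["agt>person"],
        # Sam was fighting with Sue.
        "prep_accompaniment{with}": ["agt>person", "ptn>person"],
        # The tribesmen fought each other.
        "-prep_with": ["agt>person", "obj>person"],
    }
}


def relations_for(verb: str, frame: str,
                  lexicon: Dict[str, Dict[str, List[str]]] = LEXICON) -> List[str]:
    """Return the UNL relations listed for `frame` of `verb`, or [] if unknown."""
    return list(lexicon.get(verb, {}).get(frame, []))


if __name__ == "__main__":
    print(relations_for("fight", "prep_accompaniment{with}"))
```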
Conclusion

System statistics
  Common English verbs are dealt with
    approximately 3000 English verbs
    approximately 5500 UWs
    tested against the British National Corpus
  Coverage of both English and Hindi verbs is increasing every day
  A visualizer and an application programming interface for the verb knowledge bases in both languages are under construction