PowerPoint Presentation - LTRC

Transcript PowerPoint Presentation - LTRC

Constraint Based
Hindi Parser
LTRC, IIIT Hyderabad
Introduction

Broad coverage parser


Very crucial
IL-IL MT systems, IE, co-reference resolution, etc.
Why Dependency ?

Phrase Structures




Intrinsically presumes order
Context Free Grammar (CFG) not well-suited for
free-word order languages (Shieber, 1985)
Particularly ill suited to Indian Languages
Dependency Structures



Gives flexibility
Common structures
With appropriate labels, closer to Semantics
Computational Paninian
Grammar (CPG)



Based on Panini’s Grammar (500 BC)
Inspired by Inflectionally rich language
(Sanskrit)
A dependency based analysis
Computational Paninian Grammar
(The Basic Framework)

Treats a sentence as a set of modifiermodified relations


Sentence has a primary modified or the root
(which is generally a verb)
Gives us the framework to identify these
relations



Relations between noun constituent and verb
called ‘karaka’
karakas are syntactico-semantic in nature
Syntactic cues help us in identifying the karakas
karta – karma karaka

The boy opened the lock



karta, karma usually
correspond to agent, theme


k1 – karta
k2 – karma
But not always
karakas are direct participants
in the activity denoted by the
verb
open
k1
boy
k2
lock
Basic karaka relations







karta – agent/doer/force
 Relation label – k1
karma – object/patient
 Relation label – k2
karana – instrument
 Relation label – k3
sampradaan – beneficiary
 Relation label – k4
apaadaan – source
 Relation label – k5
adhikarana – location in place/time/other
 Relation label – k7p/k7t/k7
For complete list of dependency relations: (Begum et al., 2008)
Basic karaka relations
raama phala khaataa hai
‘Ram eats fruit’
Basic karaka relations
raama chaaku se saiv kaatataa hai
‘Ram cuts the apple with knife’
Basic karaka relations
raama ne mohana ko pustaka dii
‘Ram gave a book to Mohan’
Why Paninian Labels

Other choices for labels could be

Grammatical relations



Thematic roles




Subject, Object, etc.
Behavioral tests (Mohanan, 1994)
Agent, patient, etc.
No concrete cues
Difficult to extract them automatically
Karakas can be computationally exploited


Syntactically grounded, Semantically loaded
Gives a level of interface
Levels of Language Analysis





Morphological analysis (Morph Info.)
Analysis in local context (POS tagging)
Sentence analysis (Chunking, Parsing)
Semantic analysis (Word sense
disambiguation, etc.)
Discourse processing (Anaphora resolution,
Informational Structure, etc.)
Example

rAma ne mohana ko puswaka xI |
Example – Parsed Output
xI ‘give’
k1
rAma
k4
k2
mohana
puswaka
‘book’
Parser

Two stage strategy


Stage I (Intra-clausal relations)



Appropriate constraints formed
Dependency relations marked
Relations such as k1, k2, k3, etc. for each verb
Stage II (Inter-clausal relations & conjunct
relations)

Conjuncts, relative clauses, kriya mula, etc
Demand Frame for Verb



A demand frame or karaka frame for a verb
indicates the demands the verb makes
It depends on the verb and its tense, aspect
and modality (TAM) label.
A mapping is specified between karaka
relations and vibhaktis (post-positions, suffix).
Karaka Frame


It specifies what karakas are mandatory or
optional for the verb and what vibhaktis (postpositions) they take respectively
Each verb belongs to a specific verb class


Each class has a basic karaka frame
Each TAM specifies a transformation rule
Example

rAma mohana ko puswaka xewA hE |
xewA hE ‘give is’
k1
rAma
k4
k2
mohana
puswaka
‘book’
Parsed Dependency Tree
Transformations

Based on the TAM of the verb



rAma ne mohana ko KilOnA xiyA |
rAma ko mohana ko KilOnA xenA padZA |
Appropriate transformation applied
Example

rAma ne mohana ko puswaka xI |
Karaka Frame – xe (give)
Transformation Rule – yA
(TAM)
Karaka Frame
yA TAM
rAma ne mohana ko KilOnA xiyA |
Transformed frame for xe after applying the yA trasformation
---------------------------------------------------------------------------------------arc-label
necessity
vibhakti
lextype
src-pos
arc-dir
---------------------------------------------------------------------------------------k1
m
ne
n
l
c
k2
m
0|ko
n
l
c
k3
d
se
n
l
c
k4
d
ko
n
l
c
---------------------------------------------------------------------------------------0  ne
Parsed Output
xI ‘give’
k1
rAma
k4
k2
mohana
puswaka
‘book’
Other frames

Adjectives
Steps in Parsing
SENTENCE
Morph, POS tagging,
Chunking
Identify
Demand
Groups
Load Frames
&
Transform
Find Candidates
Apply
Constraints
& Solve
Final Parse
Example:

rAma ne mohana ko KilOnA xiyA |
Identify the demand group,
Load and Transform DF

xiyA


Only verb
Transformed frame

Use ‘yA’ TAM info.
---------------------------------------------------------------------------------------arc-label
necessity
vibhakti
lextype
src-pos
arc-dir
---------------------------------------------------------------------------------------k1
m
ne
n
l
c
k2
m
0|ko
n
l
c
k3
d
se
n
l
c
k4
d
ko
n
l
c
----------------------------------------------------------------------------------------
Candidates
k1
main

rAma ne mohana ko KilOnA
k2
k4
k2
xiyA _ROOT_ |
Constraints

C1: For each of the mandatory demands in a
demand frame for each demand group, there should
be exactly one outgoing edge labeled by the
demand from the demand group.

C2: For each of the optional demands in a demand
frame for each demand group, there should be at
most one outgoing edge labeled by the demand
from the demand group.

C3: There should be exactly one incoming arc into
each source group.
Constraints



A parse of a sentence is obtained by
satisfying all the above constraints
Ambiguous sentences have multiple parses
Ill formed sentences have no parse.
Parse - I
k1
main

rAma ne mohana ko KilOnA
k2
k4
xiyA _ROOT_ |
Parse - I
_ROOT_
main
xiyA
k1
rAma
k4
k2
mohana
KilOnA
Integer Programming Constraints



Xijk represents a possible arc from word group
i to j with karaka label k
It takes a value 1 if the solution has that arc
and 0 otherwise. It cannot take any other
values.
The constraint rules are formulated into
constraint equations.
Constraint Equations
C1: For each demand group i, for each of its mandatory
demands k, the following equalities must hold:
Mik :
Sj xikj = 1
C2: For each demand group i, for each of its optional or
desirable demands k, the following inequalities must hold:
Oik :
Sj xikj <= 1
C3: For each of the source groups j, the following equalities
must hold:
Sj
:
Sik xikj = 1
Multiple Frames

If more than one karaka frame for a verb


Call Integer Programming package for each
frame
If more than one demand groups (e.g.,
multiple verbs) in the sentence with multiple
demand frames

Call Integer Programming package for each
combination of such frames
Other frames

Common karaka frame



Attached to each karaka frame
Preference given to main frame if there are
clashes
Fallback karaka frame


required karaka frame is missing
Graceful degradation
Stage I: Types being handled


Simple Verbs
Non-finite verbs







wA_huA
wA_hI
nA
kara
0_rahe, etc.
Copula
Genitive
Example (Complex Sentence)

rAma ne phala khaakara
mohana ko
Ram ‘ERG’ fruit ‘having eaten’ Mohan ‘DAT’
KilOnA xiyA
toy
gave
‘Having eaten the fruit Ram gave the toy to Mohan’
Candidates
X1: k1
X8: main
X4: k2
X7: vmod

rAma ne phala khaakara mohana ko KilOnA
X6: k2
X3: k2
X5: k4
X2: k2
xiyA _ROOT_ |
Constraint Equations

Verb ‘xe’
 Mandatory Demands (C1)



Optional Demands (C2)


k4  x5 <= 1
Verb ‘khaa’
 Mandatory Demands (C1)



k1  x1 = 1
k2  x2 + x3 + x4 = 1
k2  x6 = 1
vmod  x7 = 1
_ROOT_
 C1

Main  x8 = 1
Constraint Equations (contd.)

Incoming Arcs into Source (C3)






rAma
 x1 = 1
phala
 x4 + x6 = 1
khaa
 x7 = 1
mohana
 x3 + x5 = 1
KilOnA
 x2 = 1
xe
 x8 = 1
Solution Graph
_ROOT_
main
xiyA
k1
rAma
vmod
k4
khaakara
k2
phala
k2
mohana
KilOnA
References

Akshar Bharati and Rajeev Sangal. 1993. Parsing free word order languages in
Paninian Framework. ACL:93, Proc.of Annual Meeting of Association of
Computational Linguistics, Association of Computational Linguistics, New Jersey.
USA.

Akshar Bharati, Rajeev Sangal, T Papi Reddy. 2002. A Constraint Based Parser
Using Integer Programming In Proc. of ICON-2002: International Conference on
Natural Language Processing.

Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai and
Rajeev Sangal. 2008. Dependency Annotation Scheme for Indian Languages. In
Proceedings of The Third International Joint Conference on Natural Language
Processing (IJCNLP). Hyderabad, India.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. In
Linguistics and Philosophy, p. 8, 334–343.

Tara Mohanan, 1994. Arguments in Hindi. CSLI Publications.
THANKS!!

PowerPoint Presentation - LTRC

Transcript PowerPoint Presentation - LTRC

Directory