Partial Dependency Parsing for Irish
Elaine Uí Dhonnchadha &
Josef Van Genabith
Aims of the Research
To be able to parse and/or chunk unrestricted
Irish text
To account for as much of the syntactic
phenomena of Irish as possible in an efficient
and principled way
To use open-source software as far as
possible
Outline of the Talk
Background
Stages of Development for Dependency
Parser
Chunker
Future Work
Irish Language – some facts
Celtic Language
Goidelic (Irish, Manx, Scottish Gaelic)
Brittonic (Breton, Cornish, Welsh)
Verb – Subject – Object sentence word order
Chaith  Seán  an liathróid.
Threw   Seán  the ball.
V       S     O
‘Seán threw the ball’
Fixed word order
Irish Language
Inflectional language
gender: fem/masc
case: common/genitive/vocative
verbs inflected for number and person
chuala mé ‘I heard’ (analytic)
chualamar ‘we heard’ (synthetic)
Initial mutation of words
cailín ‘girl’, an chailín ‘the girl’
arán ‘bread’, an t-arán ‘the bread’
seachtain ‘week’, an tseachtain ‘the week’
bord ‘table’, ar an mbord ‘on the table’
Irish Language
Prepositions inflected for person and number.
Labhair  sé  liom     faoi
Spoke    he  with-me  about-it
‘He spoke to me about it’
Tabhair  dom    é
Give     to-me  it
‘Give it to me’
Full paradigm for every preposition
liom ‘with-me’, leat ‘with-you’, leis ‘with-him/it’, etc.
Irish Language
Verbal noun: used in progressives, perfects, infinitives, etc.
De-verbal nouns: bris(v) ‘break’, briseadh(vn) ‘breaking’
De-agentive nouns: feirmeoir(n) ‘farmer’, feirmeoireacht(vn)
‘farming’
Progressive
Tá  mé  ag  oscailt      an   dorais
Is  me  at  opening(vn)  the  door(gen)
‘I am opening the door’
After Perfect
Tá  mé  tar_éis  an   doras  a    oscailt
Is  me  after    the  door   PRT  opening(vn)
‘I am after opening the door’
Parsing Methodology
Dependency Analysis & Constituency Analysis
Dependency Analysis
Relationships between pairs of words
Grammatical Functions and Head-Modifier dependencies
Root and terminal nodes
Constituency Analysis
Phrase Structure Rules, e.g. S → NP VP
Hierarchical structure: root, phrase categories,
leaf/terminal nodes
Dependency Analysis
Issues in the theoretical syntax of Irish on which
there is no clear consensus …
The non-adjacency of verb and object in a VSO language,
i.e. difficulties with the VP
Some periphrastic aspectual constructions in Irish, e.g. the
progressive aspect has more nominal than verbal
characteristics …
Dependency Analysis includes semantic as well
as syntactic information
Dependency Parsing
A dependency analysis looks at dependencies between
pairs of words (which do not have to be adjacent) in a
sentence
The tokens present in the input string are annotated
without introducing any abstract categories (e.g. phrasal
nodes)
i.e. a dependency analysis consists of a root and leaf nodes,
without intermediate levels
Grammatical functions such as subject, object, predicate,
as well as various types of prepositional phrase, e.g.
adverbial, aspectual, predicative, etc. are annotated
Clauses and head-modifier dependencies are identified
Dependency Parsing
Surface-oriented, bottom-up parsing
Dependency relations between pairs of tokens
Grammatical functions
Head-modifier relations
Tokens not necessarily adjacent.
Bhris  an   fear  a    rúitín
Broke  the  man   his  ankle
V      Det  N     Det  N
S: fear depends on Bhris; DO: rúitín depends on Bhris
‘The man broke his ankle’
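As a minimal sketch (not the authors' data format), the analysis of Bhris an fear a rúitín can be stored as one head index per token: the words are the only nodes, and the root is marked with head 0. The S and DO labels come from the example; the ROOT and Det labels, and the attachment of a ‘his’ to rúitín, are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Dep(NamedTuple):
    word: str
    pos: str
    head: int   # 1-based index of the head token; 0 marks the root
    label: str  # grammatical function (ROOT and Det are placeholder labels)


# "Bhris an fear a rúitín" ('The man broke his ankle')
SENTENCE: List[Dep] = [
    Dep("Bhris", "V", 0, "ROOT"),
    Dep("an", "Det", 3, "Det"),
    Dep("fear", "N", 1, "S"),
    Dep("a", "Det", 5, "Det"),
    Dep("rúitín", "N", 1, "DO"),
]


def non_adjacent_arcs(sent: List[Dep]) -> List[Tuple[str, str, str]]:
    """Return (dependent, head, label) for every arc that spans other tokens."""
    return [
        (d.word, sent[d.head - 1].word, d.label)
        for i, d in enumerate(sent, start=1)
        if d.head != 0 and abs(d.head - i) > 1
    ]


print(non_adjacent_arcs(SENTENCE))
```

Both the subject and the object attach to Bhris across intervening tokens, which is why the parser cannot rely on adjacency.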
Previous NLP Work
Tokenization & Morphological Analysis
POS Tagging and Parsing
Finite-State Morphology: (Karttunen, Beesley, 1999; 2003)
Finite-State Morphological Analyser & Generator for Irish: (Uí
Dhonnchadha, 2002)
Constraint Grammar (CG): Karlsson et al. (1995),
Constraint Grammar Parser CG-2 (Tapanainen, 1996),
VISL CG3 (Bick et al., 2003 ...) http://visl.sdu.dk
Chunking:
Partial Parsing via Finite-State Cascades (Abney, 1996)
Stages of Development
Define the Syntactic Phenomena to include
Gather Test Data
Decide on Parsing Methodology
Decide on a Tag-Set for dependency and
grammatical relations
Develop Linguistic Rules for dependency
analysis
Test the rules
Evaluate the results
Syntactic Phenomena
Sources of Information
Grammar books
Previous research on aspects of Irish Syntax
Simple declarative sentences (incl. neg. and
interrogative)
Relative clauses
Copular constructions
Non-finite complements
Adjuncts
Test Data (Gold Standard)
Sample Sentences
Short invented grammatical sentences (225) based on
grammar books etc.
Automatically POS tagged and manually checked and
corrected
Dependency tagged and manually checked and corrected
Chunked and manually checked and corrected
Corpus Data
Corpus data – 250 real sentences
randomly selected from the 3,000-sentence Gold
Standard POS Tagged Corpus
Dependency tagged, chunked and manually checked
and corrected
Tag Set
Grammatical Functions
@SUBJ, @OBJ, @FMV, @FAUX, @CLB, etc.
Unlabelled dependencies
@>N, @N<, @P<, etc.
Tags start with the @ symbol, by convention, to distinguish them from
morphosyntactic tags
“Fuair” faigh+Verb+VTI+PastInd+Len+@FMV
This tagset follows the style of tags described for English
(Karlsson, 1995) and for Danish (Bick, 2003).[1]
However, CG does not prescribe a fixed list of tags, which
allows us to tailor the tagset to the language.
[1] Other languages are also detailed on the VISL website:
http://visl.sdu.dk/corpus_linguistics.html
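Because of the @ convention, an analysis string can be split mechanically into lemma, morphosyntactic tags and dependency tags. A minimal sketch, assuming the '+'-joined format used in this talk (lemma first, tags after); split_analysis is a hypothetical helper, not part of CG or the authors' tools.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str], List[str]]:
    """Split 'lemma+Tag+...+@SYN' into (lemma, morph tags, dependency tags).

    Hypothetical helper: assumes the lemma comes first and that every
    dependency tag starts with '@', following the tagset convention above.
    """
    lemma, *tags = analysis.split("+")
    morph = [t for t in tags if not t.startswith("@")]
    deps = [t for t in tags if t.startswith("@")]
    return lemma, morph, deps


print(split_analysis("faigh+Verb+VTI+PastInd+Len+@FMV"))
```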
Dependency Tags: Verbs and Copulas
@FMV
finite main verb
rith 'run'
@FMV_SUBJ
finite main verb including subject
ritheamar 'we ran'
@FMV_REL
relative finite main verb
a chuala mé, 'that I heard'
@FMV_REL_SUBJ
relative finite main verb incl. subject
a chualamar, 'that we heard'
@FAUX
finite auxiliary verb
Tá sé ag cócaireacht 'He is cooking'
@FAUX_SUBJ
finite auxiliary verb including subject
táimid 'we are'
@FAUX_REL
relative finite auxiliary verb
atá siad 'that/which they are'
@FAUX_REL_SUBJ
relative finite auxiliary verb including subject
atáimid 'that/which we are'
@COP
copula
Is
@COP_SUBJ
copula including subject
Seo an fear...'This is the man...'
@COP_WH
interrogative copula
cé leis an leabhar 'whose is the book'
@INF
bare infinitive
Ba mhaith liom fanacht 'I would like to stay'
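The finite-verb tags above are built compositionally: a base (@FMV or @FAUX), then _REL for a relative form, then _SUBJ when the subject is incorporated in a synthetic form. The hypothetical function below only spells out that naming scheme; in the parser the tags are assigned by CG mapping rules, not by code like this.

```python
from itertools import product


def finite_verb_tag(auxiliary: bool, relative: bool, synthetic: bool) -> str:
    """Build a finite-verb dependency tag from three binary features.

    Hypothetical illustration of the naming scheme only.
    """
    tag = "@FAUX" if auxiliary else "@FMV"
    if relative:
        tag += "_REL"    # relative form, e.g. a chuala mé, atá siad
    if synthetic:
        tag += "_SUBJ"   # subject expressed by the verb ending, e.g. táimid
    return tag


# Every combination of the three features yields one of the eight verb tags.
ALL_TAGS = {finite_verb_tag(*f) for f in product([False, True], repeat=3)}
print(sorted(ALL_TAGS))
```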
Dependency Tags: Grammatical Relations
@SUBJ
subject
Chonaic Seán Máire, 'Seán saw Máire'
@SUBJ_ASP
subject of aspectual phrase
bhí sé ag obair 'he was working'
@SUBJ_INF
subject of infinitive (intrans)
an obair a bheith déanta 'the work to be done'
@SUBJ_OR_OBJ
subject or obj. of relative clause
a chonaic an bhean, 'that the woman saw' OR
'that saw the woman'
@SUBJ_REL
subject of relative clause
a rinne sé 'that he made'
@OBJ
object
Chonaic Seán Máire, 'Seán saw Máire'
@OBJ_ASP
object of aspectual
ag déanamh oibre, 'doing work'
@OBJ_INF
object of infinitive
bainne a ól, 'to drink milk'
@PRED
predicate
Tá sé mór 'It is big'
@NP
unlabelled noun head, e.g. list item,
apposition, or fragment
1) dathuithe, 2) leasaithigh, '1) colours, 2)
additives'
@CC
co-ordinating conjunction
agus 'and'
@CLB
clause boundary
e.g. agus ‘and’ when followed by a verb, and
subordinating conjunctions, etc.
Dependency Tags: Head Modifiers
(Unlabelled Dependencies)
@>ADJ
adverbial particle dependent on the adjective to the
right
go ciúin 'quietly'
@>N
pre-modifier dependent on the first noun to the right
an 'the'
@>V
pre-verbal particle dependent on a verb to the right
ní 'not'
@ADVL<
adverbial post modifier
@N<
noun post-modifier
teach mór 'big house'
@P<
noun dependent on the preceding prep.
ag an doras 'at the door'
@PC<
noun (in the genitive case) dependent on a compound
preposition
tar éis na Nollag 'after Christmas'
@PN<
pronoun post-mod.
é féin 'himself'
@PRED<
dependent on predicate
Is deas an lá é 'It is a nice day' i.e. Is nice the
day it
@ADVL
adverbial
anocht 'tonight'
@AUG>SUBJ
augment pronoun dependent on the subject to the right
Is é Seán … 'It/He, Seán is …'
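In the unlabelled tags, the arrow gives the direction of the head: @>N points right to a noun, and @P< points left to a preposition. A minimal sketch of resolving such tags to head positions follows. The subset of arrows covered, the POS labels, and the rule that @N< and @P< attach to the nearest candidate on the left are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Token = Tuple[str, str, str]  # (word, POS, dependency tag)

# Assumed reading of the arrow tags: (search direction, POS of the head)
ARROWS: Dict[str, Tuple[int, Set[str]]] = {
    "@>N": (+1, {"N"}),      # first noun to the right
    "@>V": (+1, {"V"}),      # verb to the right
    "@>ADJ": (+1, {"Adj"}),  # adjective to the right
    "@N<": (-1, {"N"}),      # nearest noun to the left (assumption)
    "@P<": (-1, {"Prep"}),   # nearest preposition to the left (assumption)
}


def resolve_heads(tokens: List[Token]) -> List[Optional[int]]:
    """Return the 0-based head index for each arrow-tagged token, else None."""
    heads: List[Optional[int]] = []
    for i, (_, _, tag) in enumerate(tokens):
        head: Optional[int] = None
        if tag in ARROWS:
            step, head_pos = ARROWS[tag]
            j = i + step
            while 0 <= j < len(tokens):   # walk outward in the arrow's direction
                if tokens[j][1] in head_pos:
                    head = j
                    break
                j += step
        heads.append(head)
    return heads


print(resolve_heads([("ag", "Prep", "@PP_ADVL"), ("an", "Det", "@>N"), ("doras", "N", "@P<")]))
```

Labelled functions such as @PP_ADVL are left unresolved here; they would attach via the verb or clause-level rules.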
Dependency Tags: Prepositional Phrases
@PP_ADVL
head of an adverbial adjunct
ag an doras 'at the door'
@PP_ASP
head of an aspectual
ag rith '(at) running'
@PP_HAS
‘at X’ meaning ‘X has’
ag Seán, 'Seán has' i.e. at Seán (possession)
@PP_NEG
negative
gan dul 'without going'
@PP_OBL
oblique PP head
do Mháire ‘to Máire’
@PP_SUBJ
prep + subj pronoun
D'éirigh liom 'I succeeded', i.e. 'success was with me'
@PP_PRED
predicative
Is liom é 'It is mine' i.e. Is with me it (ownership)
@PP_STAT
stative
ina rí 'is a king' i.e. 'in his king(hood)'
Parsing Methodology: Constraint Grammar
Aims (Karlsson et al., 1995)
assign the appropriate morphological and syntactic
information according to the context of each token or
larger structure in the text;
assign an analysis to every string in the input, bearing in
mind that unrestricted text will contain typographical errors,
non-sentential fragments, dialectal and colloquial material;
if an ambiguity cannot be resolved, the alternative
analyses are retained rather than forcing a (possibly
incorrect) choice
Constraint Grammar Principles
Differences between CG and other parsing
methodologies (Karlsson, 1995, p. 37):
Unlike a context-free grammar, a Constraint Grammar
does not attempt to define the set of grammatical
sentences in a language.
‘... everything is licensed which is not explicitly ruled
out’: this makes it more robust in handling unrestricted text
It does not aim to produce a minimal set of general
rules: a CG grammar can contain many lexically specific
rules to handle special cases.
It does not attempt to determine constituent structure.
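One consequence of these principles is cautious disambiguation: a constraint may discard readings, but if it would rule out every reading, the ambiguity is kept rather than forcing a choice. A minimal sketch of that behaviour follows. The reading strings and the rule are hypothetical, and real CG-3 rules are written in the grammar formalism, not in Python.

```python
from typing import Callable, List


def remove_readings(readings: List[str],
                    ruled_out: Callable[[str], bool]) -> List[str]:
    """Discard readings a constraint rules out, but never empty the cohort.

    If every reading would be removed, all alternatives are retained,
    mirroring the 'keep the ambiguity' principle described above.
    """
    kept = [r for r in readings if not ruled_out(r)]
    return kept if kept else list(readings)


# Hypothetical ambiguous cohort for "sin": copula vs. demonstrative pronoun
cohort = ["sin+Cop+Pro+Dem", "sin+Pron+Dem"]
print(remove_readings(cohort, lambda r: "+Pron+" in r))  # one reading ruled out
print(remove_readings(cohort, lambda r: True))           # nothing left: keep both
```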
CG Dependency Rules
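MAP (@TAG) TARGET (POS) IF (CONDITIONS) ;
e.g.
SETS
# synthetic verb forms (subject expressed in the ending)
LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ;
# auxiliary-type verbs, matched by lemma
LIST AUX = ("bí") ("téigh") ("tosaigh")
           ("tosnaigh") ("féad") ("caith") ("féach") ;
# relative particles
LIST RELPART = (Vb Rel) (Prep Rel) ;
MAPPINGS
# Finite main verb: neither synthetic nor auxiliary, and no
# relative particle one or two positions to the left
MAP (@FMV) TARGET (Verb) IF
    (NOT 0 VSYNTH OR AUX)
    (NOT -1 RELPART)
    (NOT -2 RELPART) ;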
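To make the logic of this single MAP rule concrete, here is a small Python emulation over toy cohorts. It is a sketch under simplifying assumptions (one reading per token, tags as plain sets, out-of-range positions treated as non-matching) and is not how VISL CG-3 evaluates grammars internally.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

# Sets copied from the grammar fragment above.
VSYNTH: List[FrozenSet[str]] = [frozenset({"Verb", p}) for p in ("1P", "2P", "3P", "Auto")]
AUX = {"bí", "téigh", "tosaigh", "tosnaigh", "féad", "caith", "féach"}
RELPART: List[FrozenSet[str]] = [frozenset({"Vb", "Rel"}), frozenset({"Prep", "Rel"})]


@dataclass
class Cohort:
    word: str
    lemma: str
    tags: Set[str] = field(default_factory=set)


def in_set(c: Cohort, tag_sets: List[FrozenSet[str]]) -> bool:
    """True if the cohort carries every tag of at least one composite tag."""
    return any(ts <= c.tags for ts in tag_sets)


def map_fmv(sentence: List[Cohort]) -> None:
    """Add @FMV in place, following the MAP rule above."""
    for i, c in enumerate(sentence):
        if "Verb" not in c.tags:                      # TARGET (Verb)
            continue
        if in_set(c, VSYNTH) or c.lemma in AUX:       # NOT 0 VSYNTH OR AUX
            continue
        if any(i + k >= 0 and in_set(sentence[i + k], RELPART) for k in (-1, -2)):
            continue                                  # NOT -1 / NOT -2 RELPART
        c.tags.add("@FMV")


s = [Cohort("Fuair", "faigh", {"Verb", "VT", "PastInd"}),
     Cohort("sé", "sé", {"Pron", "Pers", "3P", "Sg"})]
map_fmv(s)
print([c.word for c in s if "@FMV" in c.tags])
```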
Order of Implementation of Rules
Dependency Analysis is carried out in the following
order:
Clause Boundaries
Verbs and/or Copulas
Preposition Heads
All Dependent Modifiers
Subject
Predicates of Copular Constructions
Object(s)
Adverbials
Other
Example (1)
Fuair  sé     leabhar  ins       an   siopa
Got    he     book     in        the  shop
V      Pro    N        Prep      Det  N
@FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
‘He got a book in the shop’
root: Fuair
faigh+Verb+VT+PastInd
sé+Pron+Pers+3P+Sg+Masc+Sbj
leabhar+Noun+Masc+Com+Sg
i+Prep+Art+Sg
an+Art+Sg+Def
siopa+Noun+Masc+Com+Sg+DefArt
Example (2)
Chonaic  Máire  an   fear       a     bhí    ag       ithe
Saw      Máire  the  man        that  was    at       eating
V        N      Det  N          Rel   V      Prep     VN
@FMV     @SUBJ  @>N  @SUBJ_REL  @>V   @FAUX  @PP_ASP  @P<
‘Máire saw the man that was eating’
FORM:      ag (Prep) + ithe (VN)
FUNCTION:  @PP_ASP + @P<, ‘eating’
Development/Test Cycle
POS Tagged Text
CG Mapping Rules
Dependency Analysis
Test against Gold Std.
Evaluation of Dependency Analysis
Sample Sentences: 225 short grammatical sentences
Precision (Test Suite): $\text{Precision} = \frac{\text{CorrectAutoTags}}{\text{TotalAutoTags}} \times 100 = \frac{1{,}212}{1{,}241} \times 100 = 97.66\%$
Gold Standard Dependency Analysis Corpus
250 sentences randomly selected from the 3,000-sentence Gold
Standard POS Tagged Corpus
Set | Tot. Tokens | Punct. Tokens | Tokens (excl. Punct.) | Correct | Incorrect | F-Score
Gold Standard Development Set (150 Sentences) | 4,403 | 444 | 3,959 | 3,706 | 253 | 93.60
Gold Standard Test Set (100 Sentences) | 2,555 | 282 | 2,273 | 2,143 | 130 | 94.28
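The percentages can be re-derived from the raw counts. A quick check follows, assuming (as the counts suggest) that the "F-Score" column is correct tokens divided by non-punctuation tokens, i.e. it behaves like token accuracy.

```python
def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


# Test-suite precision: correct automatic tags / all automatic tags
precision = percent(1212, 1241)

# Gold standard sets: tokens exclude punctuation (total - punctuation)
dev_tokens, test_tokens = 4403 - 444, 2555 - 282
dev_score = percent(3706, dev_tokens)
test_score = percent(2143, test_tokens)

print(f"precision={precision:.2f} dev={dev_score:.2f} test={test_score:.2f}")
# precision=97.66 dev=93.61 test=94.28
```

The development figure rounds to 93.61; the reported 93.60 matches it when truncated to two decimals.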
Chunking
Using the Dependency Annotations and a
Regular Expression Grammar (implemented
using Xerox Finite-State Tools[1]) we can
identify phrase-like structures, described by
Abney (1991) as 'chunks'.
[1] For details see
http://www.cis.upenn.edu/~cis639/docs/xfst.html
Implementation
Regular expressions and Xerox FST
Chunks
[NP .. ], [V .. ], etc.
PP with embedded NP
[PP .. [NP .. ] ]
Conjunction with embedded conjoint
[CJ2 .. [?] ]
[NP úlla ] [CJ2 agus [NP oráistí NP] ]
‘apples and oranges’
Aspectual phrases
[ASP [PP-ASP .. [NP ..] ] ([OA ..]) ]
[ASP [PP-ASP ag [NP dúnadh ] ] [OA an dorais] ]
‘closing the door’
Example (3)
Tá  sé  ag  rith
Is  he  at  running
‘He is running’
"<Tá>"
    "bí" Verb VI PresInd @FAUX
"<sé>"
    "sé" Pron Pers 3P Sg Masc Sbj @SUBJ
"<ag>"
    "ag" Prep Simp @PP_ASP
"<rith>"
    "rith" Verbal Noun VTI @P<
[S
[V Tá bí+Verb+VI+PresInd+@FAUX ]
[NP sé sé+Pron+Pers+3P+Sg+Masc+Sbj+@SUBJ NP]
[ASP
[PP-ASP ag ag+Prep+Simp+@PP_ASP
[NP rith rith+Verbal+Noun+VTI+@P< NP]
PP-ASP]
ASP]
S]
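The bracketed lines in Example (3) write each token as the word form followed by its CG reading joined with '+'. A minimal sketch of that conversion from the VISL CG cohort format follows. It assumes each cohort has already been reduced to a single reading (any further readings are ignored), and cg_to_tokens is a hypothetical helper.

```python
import re
from typing import List, Optional

COHORT = re.compile(r'"<(.+)>"')         # word-form line, e.g. "<Tá>"
READING = re.compile(r'"(.+?)"\s+(.*)')  # reading line, e.g. "bí" Verb VI PresInd @FAUX


def cg_to_tokens(stream: str) -> List[str]:
    """Turn CG cohorts into 'word lemma+Tag+...+@TAG' strings."""
    tokens: List[str] = []
    word: Optional[str] = None
    for raw in stream.splitlines():
        line = raw.strip()
        m = COHORT.fullmatch(line)
        if m:
            word = m.group(1)
            continue
        m = READING.fullmatch(line)
        if m and word is not None:
            lemma, tags = m.group(1), m.group(2).split()
            tokens.append(word + " " + "+".join([lemma] + tags))
            word = None  # keep only the first reading of each cohort
    return tokens


STREAM = '''"<Tá>"
\t"bí" Verb VI PresInd @FAUX
"<sé>"
\t"sé" Pron Pers 3P Sg Masc Sbj @SUBJ
"<ag>"
\t"ag" Prep Simp @PP_ASP
"<rith>"
\t"rith" Verbal Noun VTI @P<
'''
print(cg_to_tokens(STREAM))
```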
Regular Expression Chunker
###########################################################
# Verb Chunk Dependency Tags
###########################################################
# TokLemMTag (presumably token, lemma and morphological tags) and
# SP (separator) are assumed to be defined elsewhere in the grammar.
define VTag
[%@FAUX|%@FAUX%_REL|%@FMV|%@FMV%_REL];
define VSTag
[%@FAUX%_SUBJ|%@FAUX%_REL%_SUBJ|
%@FMV%_SUBJ|%@FMV%_REL%_SUBJ];
define PreVTag
[%@%>V];
# Verb pre-modifiers (pre-verbal particles tagged @>V)
define PreVStr
[TokLemMTag PreVTag SP];
# Verb Chunk: any pre-verbal particles followed by a finite verb
define VStr
[TokLemMTag VTag SP];
define VChunk
[PreVStr* VStr];
# Bracket each left-to-right longest match as "[V ... ]"
define VChunkBr
[VChunk @-> "[V " ... " ] "];
# Verb_Subject Chunk (finite verb with incorporated subject)
define VSStr
[TokLemMTag VSTag SP];
define VSChunk
[PreVStr* VSStr];
define VSChunkBr
[VSChunk @-> "[VS " ... " ] "];
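For readers without the Xerox tools, the effect of VChunkBr and VSChunkBr can be approximated in Python. The approach is to scan left to right and, at each position, take any run of @>V particles followed by one finite-verb tag, then wrap it in brackets. This sketch assumes one 'word lemma+tags' string per token with the dependency tag last; it is not the authors' implementation.

```python
from typing import List

V_TAGS = {"@FAUX", "@FAUX_REL", "@FMV", "@FMV_REL"}                       # VTag
VS_TAGS = {"@FAUX_SUBJ", "@FAUX_REL_SUBJ", "@FMV_SUBJ", "@FMV_REL_SUBJ"}  # VSTag
PREV_TAG = "@>V"                                                          # PreVTag


def dep_tag(token: str) -> str:
    """Dependency tag of a 'word lemma+...+@TAG' token (assumed to be last)."""
    return token.rsplit("+", 1)[-1]


def chunk_verbs(tokens: List[str]) -> str:
    out: List[str] = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and dep_tag(tokens[j]) == PREV_TAG:
            j += 1                        # PreVStr*
        if j < len(tokens) and dep_tag(tokens[j]) in V_TAGS | VS_TAGS:
            label = "V" if dep_tag(tokens[j]) in V_TAGS else "VS"
            out.append(f"[{label} " + " ".join(tokens[i:j + 1]) + " ]")
            i = j + 1                     # continue after the match
        else:
            out.append(tokens[i])         # no chunk starts here
            i += 1
    return " ".join(out)


print(chunk_verbs(["Tá bí+Verb+VI+PresInd+@FAUX", "sé sé+Pron+Pers+3P+Sg+Masc+Sbj+@SUBJ"]))
```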
Example (4)
Tá  mé  ag  déanamh  cáca  .
Is  I   at  making   cake  .
‘I am making a cake’
Tá       bí+Verb+VI+PresInd+@FAUX
mé       mé+Pron+Pers+1P+Sg+@SUBJ_ASP
ag       ag+Prep+Simp+@PP_ASP
déanamh  déanamh+Verbal+Noun+VTI+@P<
cáca     cáca+Noun+Masc+Gen+Sg+@OBJ_ASP
.        .+Punct+Fin
[S
[V Tá bí+Verb+VI+PresInd+@FAUX V]
[NP mé mé+Pron+Pers+1P+Sg+@SUBJ_ASP NP]
[ASP
[PP-ASP ag ag+Prep+Simp+@PP_ASP
[NP déanamh déanamh+Verbal+Noun+@P< NP] PP-ASP]
[OA cáca cáca+Nn+Msc+Gen+Sg+@OBJ_ASP OA]
ASP]
. .+Punct+Fin
S]
Corpus Data
Ach sin an toradh is measa a fhéadfadh tarlú don
pháirtí agus déarfaidís leat nár cóir an iomad airde
a thabhairt do na pobalbhreitheanna nach raibh
riamh fabhrach do na páirtithe beaga.
'But that is the worst possible result for the party
and they would say to you that it is not right to pay
too much attention to the opinion polls that were
never favourable to small parties.'
Dependency Analysis
[S
[CONJ Ach ach+Conj+Subord+@CLB ]
[COP Sin sin+Cop+Pro+Dem+@COP_SUBJ ]
[NP
  an an+Art+Sg+Def+@>N
  toradh toradh+Noun+Msc+Com+Sg+DefArt+@PRED
  is is+Part+Sup+@>ADJ
  measa olc+Adj+Comp+@N< NP]
[VP
  a a+Part+Vb+Rel+Direct+@CLB
  fhéadfadh féad+Verb+VTI+Cond+Len+@FAUX_REL ]
[INF
  tarlú tarlú+Verbal+Noun+VTI+@INF INF]
[PP
  don do+Prep+Art+Sg+@PP_ADVL
  [NP
    pháirtí páirtí+Noun+Masc+Com+Sg+Len+@P< NP]
PP]
[CB
  agus agus+Conj+Coord+@CLB ]
[VS
  déarfaidís abair+Verb+VTI+Cond+3P+Pl+@FMV_SUBJ ]
[PP
  leat le+Pron+Prep+2P+Sg+@PP_ADVL PP]
[COP nár is+Cop+Past+Rel+Neg+@CLB ]
[PRED cóir cóir+Adj+Base+@PRED ]
[INF
  [I
    an an+Art+Sg+Def+@>N
    iomad iomad+Subst+Noun+Sg+@OBJ_INF
    airde aird+Noun+Fem+Gen+Sg+@N<
    a a+Prep+Simp+@PP_INF
    thabhairt tabhairt+Verbal+Noun+VTI+Len+@P<
  I] INF]
[PP
  do do+Prep+Simp+@PP_ADVL
  [NP
    na na+Art+Pl+Def+@>N
    pobalbhreitheanna pobalbhreith+Noun+Fem+Com+Pl+@P<
  NP] PP]
[V
  nach nach+Part+Vb+Neg+Rel+@CLB
  raibh bí+Verb+PastInd+Neg+Len+@FMV_REL ]
[PRED
  riamh riamh+Adv+Its+@>ADJ
  fabhrach fabhrach+Adj+Base+@PRED ]
[PP
  do do+Prep+Simp+@PP_ADVL
  [NP
    na na+Art+Pl+Def+@>N
    páirtithe páirtí+Noun+Masc+Com+Pl+DefArt+@P<
    beaga beag+Adj+Com+NotSlen+Pl+@N<
  NP] PP]
. .+Punct+Fin
S]
Evaluation of Chunker
Evalb program used to evaluate bracketing of 250 sentences.
150 Development Set Sentences
Measure | All Sentences | Sentences Len<40
Number of sentences | 150 | 120
Bracketing Recall | 96.26 | 97.31
Bracketing Precision | 98.15 | 98.57
Bracketing F-Measure | 97.20 | 97.94
100 Test Set Sentences
Measure | All Sentences | Sentences Len<40
Number of sentences | 100 | 85
Bracketing Recall | 92.89 | 94.09
Bracketing Precision | 94.12 | 94.09
Bracketing F-Measure | 93.50 | 94.09
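Evalb's bracketing F-measure is the harmonic mean of bracketing precision and recall, F = 2PR/(P + R). Recomputing it from the precision and recall rows above reproduces the reported F-measures to two decimals:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs from the Evalb tables above
RESULTS = {
    "dev, all": (98.15, 96.26),
    "dev, len<40": (98.57, 97.31),
    "test, all": (94.12, 92.89),
    "test, len<40": (94.09, 94.09),
}

for name, (p, r) in RESULTS.items():
    print(f"{name}: {f_measure(p, r):.2f}")
```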
Future Work
Parsing to date is partial, as we have not yet addressed:
Co-ordination
He packed his [clothes] and [shoes]
[He packed his clothes] and [left]
PP-attachment
[He] [stabbed] [the man with the knife]
[He] [stabbed] [the man] [with the knife]
PP-function
locative vs. stative
adjunct vs. indirect object
Adding information to the FS lexicons, e.g. noun subclasses and subcategorisation frames for verbs
Irish Text Processing Tools:
http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm