The Syntax-Morphology Interface
and Natural Language Processing
Veronika Vincze
University of Szeged
Hungary
[email protected]
Thematic Training Course on Processing
Morphologically Rich Languages
11-15 April 2011
Outline
• Introduction
• Syntax vs. morphology from a linguistic
viewpoint
• Morphological coding systems in Hungarian
• Morphosyntactic information in Hungarian
corpora
• Language-specific morphosyntactic problems
• Effects on IE, NER and MT
Syntax vs. morphology
• Typological differences among languages
• Agglutinative languages: the role of morphology is
stronger (a lot of information in morphemes)
• Isolating languages: the role of syntax is stronger (fewer
morphemes, more constructions)
• Focus on Hungarian (agglutinative) and
English (fusional/isolating)
Basic Hungarian syntax
• A lot of information is encoded in morphemes
• No fixed word order
• Information structure is reflected in word order (theme-rheme, old-new)
Péter szereti Marit.
Peter love-3SgObj Mary-ACC
‘Peter loves Mary.’
Péter Marit szereti. ‘It is Mary who Peter loves.’
Marit szereti Péter. ‘It is Mary who Peter loves.’
Marit Péter szereti. ‘It is Peter who loves Mary.’
Szereti Péter Marit. ‘Peter LOVES Mary (and not hates).’
Szereti Marit Péter. ‘Peter LOVES Mary (and not hates).’
Morphosyntactic features of
Hungarian
• Nominal declension (nouns, adjectives,
numerals)
• Verbal conjugation
• Several hundred word forms for each
lemma
• Grammatical relations are encoded primarily
by morphemes -> morpho + syntactic
Nominal suffixes
A stem can be extended by:
• Derivational suffixes
• Plural
• Possessive
• Case suffixes
hat-ás-a-i-nak ‘to its effects’
stem-DERIV.SUFF-POSS-POSS.PL-DAT
egész-ség-ed-re ‘cheers’
stem-DERIV.SUFF-POSS.Sg2-SUB
Case suffixes in Hungarian
• ~20 cases ("rare" cases are not always
counted: distributive-temporal (-nte),
associative (-stul/-stül…))
• always at the right end of the word form
• grammatical relations are encoded:
– Arguments of the verb
– Adjuncts (temporal and locative adverbials)
…and in English
Pisti szerdánként edzésre jár.
Steve Wednesday-DIST-TEMP training-SUB go-3Sg
‘Each Wednesday Steve goes to training.’
Szerdánként – each Wednesday
Edzésre – to training
Pisti bort iszik.
Steve wine-ACC drink-3Sg
Steve is drinking wine.
Pisti-NOM – Steve – subject
Bort – wine - object
Possessive in Hungarian
• A fiú kutyája
  The boy dog-POSS
  ‘The boy’s dog’
• A(z ő) kutyája
  The (he) dog-POSS
  ‘His dog’
  – Possessor in nominative
  – Possessed with a possessive marker
• A fiúnak a kutyája
  The boy-DAT the dog-POSS
  – Possessor in dative
  – Possessed with a possessive marker
…and in English
• The boy’s dog
• His dog
• Possessor with a
possessive marker
(pronoun)
• Possessed with no
marker
• The dog of the boy
• Possessive relation is
marked by a
preposition
Hungarian vs. English - nouns
• Number of word forms: several hundred
(HU) vs. 2-3 (EN)
• Means to express grammatical relations:
– Suffixes (HU)
– Preposition, fixed position (word order), suffix,
determiner (EN)
• Methods for morphological parsing are
very different for Hungarian and English
Verbal suffixes
A stem can be extended by:
• Derivational suffixes
• Mood markers
• Tense markers
• Person/number suffixes
• Objective markers
Vág-at-ná-k
Cut-CAUS-COND-3PlObj
‘they would have it cut’
Mood and tense in Hungarian
• Mood:
– Indicative: default (not marked)
– Conditional: suffixes (present) – analytic form
(past)
– Imperative: suffixes
• Tense:
– Present: default (not marked)
– Past: suffixes
– Future: analytic (auxiliary fog)
…and in English
• Mood:
– Indicative: default (not marked)
– Conditional: past tense forms + analytic forms
(auxiliary would)
– Imperative: auxiliaries + grammatical structure
• Tense:
– Present: default (not marked)
– Past: suffix / irregular forms (suppletives or ablaut
(vowel change))
– Future: analytic (auxiliary will)
Person & Number
• Hungarian: suffixes
  Fut-ok
  Fut-sz
  Fut
  Fut-unk
  Fut-tok
  Fut-nak
• 3Sg is the default (not marked!)
• English: 3Sg + pronouns / obligatory subject
  I run
  You run
  He runs
  We run
  You run
  They run
• 3Sg marked!
Derivational suffixes in
Hungarian
• Possibility/permission:
fut-hat-ok
run-MOD-1Sg
‘I may run’
• Reflexive:
mos-akod-unk
wash-REFL-1Pl
‘we wash ourselves’
• Frequentative:
üt-öget-sz
hit-FREQ-2Sg
‘you hit sg repeatedly’
• Causative:
csinál-tat-nak
do-CAUS-3Pl
‘they have sg done’
… and in English
• Possibility/permission: auxiliaries
• Reflexive: pronominal objects
• Frequentative: adverb
• Causative: construction
Hungarian vs. English - verbs
• Number of word forms: several hundred
(HU) vs. 4-5 (EN)
• Means to express grammatical relations:
– Suffixes + auxiliaries (HU)
– Auxiliaries + reflexive pronouns +
constructions (EN)
• A lot of syntactic information is encoded in
Hungarian morphemes
Morphology        Syntax                    English
Nominal suffix    verb-argument relation    word order, preposition
                  possessive                suffix, preposition
Verbal suffix     tense                     suffix
                  agreement                 pronoun, suffix
                  modality                  auxiliary
                  causation                 construction
                  aspect                    construction
                  reflexivity               pronoun
Morphosyntactic coding systems
• Language independent (?)
• Language dependent
• (dis)advantages:
– comparability
– considering language-specific features
– complexity
• Different information is necessary for each
language
Hungarian coding systems
• HUMOR
– recall Thursday Session 1
– in the Hungarian National Corpus
• MSD
– In Szeged Treebank
– Parser and POS-tagger available at: http://www.inf.u-szeged.hu/rgai/magyarlanc
• KR
– No database
– Parser and POS-tagger available at:
http://mokk.bme.hu/resources/hunmorph/index_html
http://code.google.com/p/hunpos/
MSD
• Morphosyntactic Description
• International coding system:
– English
– Romanian
– Slovenian
– Czech
– Bulgarian
– Estonian
– Hungarian
MSD - 2
• Positional codes
• A given position encodes a given type of
information
• Position 0: part-of-speech
• Position 1: (sub)type within POS
• Further positions: other grammatical information
(person, number, case, etc.)
• Irrelevant positions are marked with a hyphen (-)
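The positional scheme can be illustrated with a small decoder. This is a minimal sketch, not the official MSD specification: the slot names in `SCHEMAS` are assumptions inferred from the example codes on these slides (for instance `Nc-sa---s1` and `Vmip1s---n`), and slots named `unused_N` are placeholders for positions that those examples leave empty.

```python
# Hypothetical slot names, inferred from the example codes on these slides;
# slots named "unused_N" are placeholders, not official attribute names.
SCHEMAS = {
    "N": ["type", "unused_2", "number", "case", "unused_5", "unused_6",
          "unused_7", "owner_number", "owner_person"],
    "V": ["type", "mood", "tense", "person", "number", "unused_6",
          "unused_7", "unused_8", "definiteness"],
}


def decode_msd(code):
    """Split a positional MSD code into a {slot_name: value} dictionary."""
    pos, values = code[0], code[1:]
    names = SCHEMAS.get(pos, [])
    features = {"pos": pos}          # position 0 is the part of speech
    for index, value in enumerate(values):
        if value == "-":             # a hyphen marks an irrelevant position
            continue
        # Fall back to a generic name for slots the schema does not cover.
        name = names[index] if index < len(names) else f"slot_{index + 1}"
        features[name] = value
    return features


print(decode_msd("Nc-sa---s1"))
print(decode_msd("Vmip1s---n"))
```

Because every position has a fixed meaning, a lookup table per part of speech is enough; no parsing of the code string beyond indexing is needed.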
KR
• Created for Hungarian
• Hierarchical attribute-value matrices
• Default values (3Sg, singular…)
• Derivational information is encoded
• Compounds are also segmented
MSD vs. KR
• Differences between the two systems:
– derivation
– compounds
• Harmonization efforts in order to build a
morphological parser the output of which
is in total harmony with the Szeged
Treebank (magyarlanc) (Farkas et al.
2010)
Nouns in MSD
Word form        Lemma      MSD code     Gloss
kutya            kutya      Nc-sn        ‘dog’
kutyámat         kutya      Nc-sa---s1   ‘my dog-ACC’
kutyaházaikról   kutyaház   Nc-ph---p3   ‘about their doghouses’
Obamához         Obama      Np-st        ‘to Obama’
Verbs in MSD
Word form       Lemma    MSD code     Gloss
futok           fut      Vmip1s---n   ‘I run’
futhatsz        fut      Voip2s---n   ‘you can run’
ütögették       üt       Vfis3p---y   ‘they were hitting it’
csináltattunk   csinál   Vsis1p---n   ‘we had sg made’
Morphosyntactically annotated
Hungarian corpora
• Hungarian National Corpus
– 100-million-word balanced reference corpus of
present-day Hungarian
– Word forms automatically annotated for stem, part of
speech and inflectional information
– http://corpus.nytud.hu/mnsz/index_eng.html
• Szeged Treebank
– 1 million words, 82K sentences
– Manually annotated for lemma, POS-tags
– Constituency and dependency trees
– http://www.inf.u-szeged.hu/rgai/nlp
Szeged Treebank
• Manually annotated treebank for Hungarian
– Covers various linguistic styles
• literature, newspapers, laws, student essays,
computer books, etc.
• multilingual connection:
Orwell’s 1984; Win2000 manual in Hungarian
– Available free of charge for research
• Developed by
– University of Szeged, HLT group
– MorphoLogic Ltd.
– Academy of Sciences, Research Institute for
Linguistics
Szeged Treebank 2.
• TEI XML format
• Manually annotated
– sentence split & word segmentation
– morphological analysis
– PTB-style syntactic structure
– Verb argument structure
– converted / extended to Dependency
Grammar format manually
Szeged Treebank 3.
• Several versions
• Constituency and dependency versions
• Old MSD codes
• New (harmonized) MSD codes
• (dependency) parser under development
• Being extended with folklore texts
Dependency vs. constituency
• Each node corresponds to a word -> no virtual
nodes (CP, I’…) in dependency trees
• Constituency grammars are said to be good for
languages with fixed word order
• Syntactic relations are determined
– by the position in the tree (constituency grammar)
– by dependency relations (labeled edges)
(dependency)
Constituency trees in SzT2.0
• Based on generative syntax (É. Kiss et al. 1999)
• Syntactic features of Hungarian also considered
(i.e. not hardcore Chomskyan trees)
• Verb-argument relations are encoded by labels
• Very detailed information: different grammatical
role for each case suffix
• Semantic information also can be found
(temporal and locative adverbials)
Aggie all relative-POSS-ACC the day before yesterday see-PAST-3Sg-Obj guest-ESS
‘Aggie received all of her relatives the day before yesterday.’
Dependency trees in Szeged
Dependency Treebank
• Based on SzT2.0
• Automatic conversion and manual
correction
• Word forms are the nodes of the tree
• Simplified relations for nominal arguments:
SUBJ, OBJ, DAT, OBL, ATT
• Semantic information kept
• Sentences without 3Sg copula are
distinctively marked
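A dependency analysis of this kind can be stored as a list of labeled edges between word forms. The sketch below is only an illustration: the `Edge` structure, the `ROOT` label and the hand-made analysis of "Pisti bort iszik." are assumptions for this example, not the treebank's actual file format.

```python
from collections import namedtuple

# One labeled edge: the dependent word form, its head, and the relation label.
Edge = namedtuple("Edge", ["dependent", "head", "relation"])

# "Pisti bort iszik." ('Steve is drinking wine.'), analysed by hand for this sketch.
edges = [
    Edge("iszik", None, "ROOT"),   # ROOT is an assumed label for the top node
    Edge("Pisti", "iszik", "SUBJ"),
    Edge("bort", "iszik", "OBJ"),
]


def dependents(edges, head, relation):
    """Return the word forms attached to `head` with the given relation."""
    return [e.dependent for e in edges if e.head == head and e.relation == relation]


# Word order does not matter: the relation lives on the edge, not in the position.
shuffled = [edges[2], edges[0], edges[1]]
print(dependents(edges, "iszik", "SUBJ"))
print(dependents(shuffled, "iszik", "OBJ"))
```

Since the grammatical role is read off the edge label, reordering the words leaves the analysis unchanged, which suits a language without fixed word order.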
Winston Smith, his chin nuzzled into his breast in an effort to escape the
vile wind, slipped quickly through the glass doors of Victory Mansions.
Virtual nodes
• No overt copula in present tense 3Sg
• Only subject and predicative
noun/adjective manifest
• No syntactic structure in SzT (grammatical
roles are not marked)
• Virtual nodes in SzDT
I like to go to school because it is good to be at school though not always.
Szeged Treebank vs. Szeged
Dependency Treebank
• Labeled relations in both cases -> not so sharp
contrast
• Virtual nodes in SzDT -> grammatical structure
marked for every sentence (IE, MT)
• No word order constraints in SzDT
• Word forms are marked
• Other possibilities: morpheme-based syntax
(Prószéky et al. (1989), Koutny, Wacha (1991))
Language-specific
morphosyntactic problems
• Morphology vs. syntax:
– Pseudo-subjects
– Pseudo-objects
– Pseudo-datives
• Morphological analysis of unknown words
• Lemmatization of named entities
Pseudo-subjects
• A noun in the nominative is not necessarily the subject of the sentence -> special
attention is required when parsing
• Possessor: a kisfiú labdája
the boy ball-3SgPOSS
the boy’s ball
• Predicative noun: István juhász maradt.
Stephen shepherd remain-PAST
Stephen remained a shepherd.
• Object: A kutyám kergeti a macska.
The dog-POSS chase-3SgObj the cat
‘The cat is chasing my dog.’ (garden path sentence)
A fiam szereti a lányod.
The son-1SgPOSS love-3SgObj the daughter-2SgPOSS
‘My son loves your daughter’ or ‘Your daughter loves my son’
Solutions
• Possessor:
– SzT: one NP includes the possessor and the
possessed ((a kisfiú) labdája)
– SzDT: ATT relation
• Predicative noun: PRED relation
– Virtual node in SzDT
• Object: OBJ relation
– Sometimes contextual information is needed
even for humans…
Pseudo-objects
Adverbials with an apparently accusative ending:
Futottam egy jót.
Run-PAST-1Sg a good-ACC
I have had a good run.
Nagyot aludtam.
Big-ACC sleep-PAST-1Sg
I have slept a lot.
Intransitive verbs -> cannot be an object -> MODE
relation
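The rule on this slide can be sketched as a lookup on the verb's transitivity. The `INTRANSITIVE` set is a hypothetical toy lexicon made up for this example; a real system would need a proper valency dictionary.

```python
# Hypothetical toy lexicon of intransitive verb lemmas (not from any real resource).
INTRANSITIVE = {"fut", "alszik"}


def label_accusative(verb_lemma):
    """Pick a relation label for an accusative-marked dependent of the verb."""
    # An intransitive verb cannot take an object, so the accusative form
    # is treated as a manner adverbial (MODE) instead of OBJ.
    return "MODE" if verb_lemma in INTRANSITIVE else "OBJ"


print(label_accusative("fut"))    # Futottam egy jót.
print(label_accusative("iszik"))  # Pisti bort iszik.
```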
Pseudo-datives
Not all (semantic) subjects are in nominative:
• Dative subject:
Sándornak kell elrendeznie az ügyeket.
Alexander-DAT must arrange-INF-3Sg the issue-PL
Alexander has to arrange the issues.
• DAT in both corpora
• Certain auxiliaries with dative subjects
(exceptions)
• Dative-nominative parallelism in possessive as
well
Unknown words
• Unknown words can be:
– Compounds
– Named entities
– Derivations
• Examples:
– fémkapunk
– félmillió
– csokinyúl
– NATO-hoz
• Methods for analysis (Zsibrita et al. 2010):
– Segmentation into two or more analyzable parts
– Expert rules to filter impossible combinations (*V+N)
– Analysis of the last part goes to the whole word
– Substitution for hyphenated words (pre-defined patterns for each morphological class)
félmillió
• fél: N ‘half’, ADJ ‘half’, NUM ‘half’, V ‘be afraid’
• millió: NUM ‘million’
• fél+millió -> Mc-snl
• Expert rules:
– NUM + NUM
– * non-NUM + NUM
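The segmentation step and the expert-rule filter for this example can be sketched as follows. The lexicon holds only the two parts shown on the slide, and the fallback that lets every combination not mentioned on the slide pass is an assumption of this sketch, not a documented property of the analyzer.

```python
# Toy lexicon: word part -> set of possible POS tags (taken from the slide).
LEXICON = {
    "fél": {"N", "ADJ", "NUM", "V"},
    "millió": {"NUM"},
}


def allowed(first_pos, second_pos):
    """Expert rule from the slide: before a NUM, only a NUM is accepted."""
    if second_pos == "NUM":
        return first_pos == "NUM"
    return True  # assumption: combinations not mentioned on the slide pass


def segment(word):
    """Try every split into two known parts and keep the permitted POS pairs."""
    analyses = []
    for cut in range(1, len(word)):
        first, second = word[:cut], word[cut:]
        if first in LEXICON and second in LEXICON:
            for p1 in sorted(LEXICON[first]):
                for p2 in sorted(LEXICON[second]):
                    if allowed(p1, p2):
                        # The POS of the last part goes to the whole word.
                        analyses.append((first, p1, second, p2, p2))
    return analyses


print(segment("félmillió"))
```

Of the four readings of fél, only the numeral survives the filter, so the whole word is analysed as a numeral.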
fémkapunk
• fém: N ‘metal’
• kap: V ‘get’
• kapu: N ‘gate’
• unk: S, 1Pl (verb)
• nk: S, 1PlPoss (noun)
• fém+kap+unk -> Vmip1p---n
• fém+kapu+nk -> Nc-sn---p1
• Expert rules:
– N+N
– N-nonNOM + V
– * N-NOM + V
csokinyúl
• csoki: N ‘chocolate’
• nyúl: N ‘rabbit’, V ‘stretch’
• kinyúl: V ‘stretch out’
• csoki+nyúl -> Vmip3s---n, Nc-sn
• cso+kinyúl (?) -> Vmip3s---n
• Expert rules:
– N+N
– N-nonNOM + V
– * N-NOM + V
NATO-hoz
• NATO: ?
• hoz: V ‘bring’, S ‘to’
• NATO-hoz with NATO as V -> Vmip3s---n
• NATO-hoz (kalaphoz) with NATO as N -> Np-st
• Expert rules:
– N+-+S
– N-nonNOM + - + V
– * N-NOM + - + V
– V+-+V
• Substitution: NATO- -> kalap ‘hat’
• Ordering of rules:
1. substitution
2. segmentation
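The substitution step for hyphenated words can be sketched like this. The pattern word kalap and the code Np-st come from the slide; the `KNOWN_ANALYSES` dictionary and the function name are hypothetical stand-ins for a real morphological analyzer, and tagging the unknown stem as a proper noun is an assumption of this sketch.

```python
# Toy analysis table for one known word form; a stand-in for a real analyzer.
# Value: (lemma, number + case part of the MSD code).
KNOWN_ANALYSES = {"kalaphoz": ("kalap", "st")}


def analyse_hyphenated(word, pattern="kalap"):
    """Analyse 'STEM-suffix' by substituting a known pattern word for STEM."""
    stem, _, suffix = word.partition("-")
    if not suffix:
        return None                         # not a hyphenated form
    substitute = pattern + suffix           # NATO-hoz -> kalaphoz
    known = KNOWN_ANALYSES.get(substitute)
    if known is None:
        return None                         # caller falls back to segmentation
    _, features = known
    # Keep the original stem as lemma; assume it is a proper noun (Np).
    return stem, "Np-" + features


print(analyse_hyphenated("NATO-hoz"))
```

Returning `None` when substitution fails mirrors the rule ordering on the slide: substitution is tried first and segmentation second.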
Lemmatization
• Lemmatization (i.e. dividing the word form
into its root and affixes) is not a trivial task
in morphologically rich languages such as
Hungarian
• common nouns: relying on a good
dictionary
• NEs: cannot be listed
• Problem: the NE ends in an apparent
suffix
Lemmatization of NEs
Each ending that looks like a possible suffix
is cut off the NE step by step:
Citroenben
Citroenben (lemma)
Citroen + ben ‘in (a) Citroen’
Citroenb + en ‘on (a) Citroenb’
Citroenbe + n ‘on (a) Citroenbe’
• Each possible lemma undergoes a Google and
a Yahoo search – the most frequent one is
chosen (Farkas et al. 2008)
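The candidate generation and the frequency-based choice can be sketched as below. The `HIT_COUNTS` dictionary is an invented stand-in for the web search counts, and the short suffix list is a toy sample; neither reproduces the actual system of Farkas et al. (2008).

```python
# A few Hungarian case endings; a toy list, not a complete inventory.
SUFFIXES = ["ben", "en", "n", "ot", "t"]

# Stand-in for search-engine hit counts; the numbers are invented for illustration.
HIT_COUNTS = {"Citroen": 1000, "Citroenben": 40, "Citroenb": 0, "Citroenbe": 1}


def candidate_lemmas(word):
    """Return the word itself plus every form left after cutting a listed suffix."""
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidates.append(word[: -len(suffix)])
    return candidates


def best_lemma(word, counts):
    """Choose the candidate with the highest frequency."""
    return max(candidate_lemmas(word), key=lambda c: counts.get(c, 0))


print(candidate_lemmas("Citroenben"))
print(best_lemma("Citroenben", HIT_COUNTS))
```

With these invented counts the chosen lemma is Citroen; with real data the outcome depends entirely on the frequencies returned by the search.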
NLP applications
• NER
– NEs with suffixes
• Information extraction
– Modality, uncertainty
– Causation
• Machine translation
– Morphemes vs. structures
Named Entities
• NEs should be recognized
• They should be morphosyntactically
tagged -> proper syntactic/semantic
analysis
A Citroenben a Peugeot meghatározó
tulajdonhányadot szerez.
• Mini dictionary + suffix list + semantic
frame
Entry          POS    Gloss
a              DET    the
ben            S      in
Citroenben     ?
en             S      on
meghatározó    ADJ    dominant
n              S      on
ot             S      ACC
Peugeot        ?
szerez         V      acquire
t              S      ACC
tulajdonrész   N      interest
Possible analyses
• Citroenben
– Citroenben
– Citroen + ben ‘Citroen-INE’
– Citroenb + en ‘Citroenb-SUP’
– Citroenbe + n ‘Citroenbe-SUP’
• Peugeot
– Peugeot
– Peugeo + t ‘Peugeo-ACC’
– Peuge + ot ‘Peuge-ACC’
A semantic frame
<event frame=transaction.ownerchange>
[1=V("szerez"|"vásárol"|"vesz"|"megvesz"|"megvásárol"|"felvásárol")+subject=2+direct_object=3]
<rv role=buyer>[2=N]</rv>
[3=N("részesedés"|"tulajdon"|"tulajdonrész"|"rész"|"tulajdonhányad")+compl1=4+modified_by_adj=5]
<rv role=product>[4=N+case=ine+ceg]</rv>
<rv role=newshare>[5=A+measure+modified_by_number=6]
[6=NB]</rv>
</event>
Analysis
A Citroenben a Peugeot meghatározó
tulajdonhányadot szerez.
Tulajdonhányadot -> ACC/OBJ (3)
Citroenben -> INE (4)
Peugeot -> NOM/SUBJ (2)
‘Peugeot acquires a dominant interest in
Citroen.’
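The way the frame selects among the competing analyses can be sketched by enumerating all combinations and keeping those that fill every slot. The candidate lists follow the slides; the function name, the dictionary layout and the role name `object` are hypothetical, and the real frame matcher is certainly more elaborate.

```python
from itertools import product

# Candidate (lemma, case) analyses per token, as listed on the slides.
CANDIDATES = {
    "Citroenben": [("Citroenben", "NOM"), ("Citroen", "INE"),
                   ("Citroenb", "SUP"), ("Citroenbe", "SUP")],
    "Peugeot": [("Peugeot", "NOM"), ("Peugeo", "ACC"), ("Peuge", "ACC")],
    "tulajdonhányadot": [("tulajdonhányad", "ACC")],
}

# Nouns the frame accepts in the direct object slot.
OBJECT_NOUNS = {"részesedés", "tulajdon", "tulajdonrész", "rész", "tulajdonhányad"}

ROLE_BY_CASE = {"NOM": "buyer", "INE": "product", "ACC": "object"}


def fill_frame(candidates):
    """Return every role assignment that satisfies the ownership-change frame."""
    solutions = []
    for combo in product(*candidates.values()):
        cases = [case for _, case in combo]
        objects = [lemma for lemma, case in combo if case == "ACC"]
        # The frame needs one NOM subject, one INE complement, and one ACC
        # object whose lemma is in the frame's noun list.
        if (cases.count("NOM") == 1 and cases.count("INE") == 1
                and len(objects) == 1 and objects[0] in OBJECT_NOUNS):
            solutions.append({ROLE_BY_CASE[case]: lemma for lemma, case in combo})
    return solutions


print(fill_frame(CANDIDATES))
```

Only one of the twelve combinations fits, so the frame disambiguates both named entities at once: Peugeot is the nominative buyer and Citroen the inessive complement.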
Uncertainty
• Text Mining:
– derive facts from free text
– uncertainty and negation have an impact on the
quality/nature of the information extracted
• applications have to treat sentences /
clauses containing uncertain or negated
information differently from factual
information
• Uncertainty: possible existence of a thing
(neither its existence nor its non-existence
is claimed)
Uncertainty detection
• Uncertainty detection in English: cues
(words with uncertain content)
• One typical means to express uncertainty
in Hungarian: -hat/het
High school grades may influence health.
A középiskolai jegyek kihathatnak az
egészségre.
• Morphological analysis should reflect
modality (Voip3s---n)
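If the morphological code marks the modal subtype, detecting this kind of uncertainty cue reduces to a check on the code. The sketch assumes, based on the examples on these slides, that the second character `o` of a verbal MSD code marks a -hat/-het verb; the function name is hypothetical.

```python
def is_modal_verb(msd):
    """True if the MSD code marks a verb of the modal subtype (V + 'o')."""
    return len(msd) > 1 and msd[0] == "V" and msd[1] == "o"


# (word form, MSD code) pairs; the codes are the ones shown on the slides.
tagged = [
    ("futok", "Vmip1s---n"),
    ("futhatsz", "Voip2s---n"),
    ("csinálhattátok", "Vois2p---y"),
]

uncertain = [word for word, msd in tagged if is_modal_verb(msd)]
print(uncertain)
```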
Causation
• Semantic/thematic relations to be determined properly
• AGENT != SUBJECT
Varrattam egy ruhát.
sew-CAUS-PAST-1Sg a dress-ACC
‘I had a dress sewn.’
Varrattam Marival egy ruhát.
sew-CAUS-PAST-1Sg Mari-INS a dress-ACC
‘I had Mary sew a dress.’
Varrtam Marival egy ruhát.
sew-PAST-1Sg Mari-INS a dress-ACC
‘I sewed a dress with Mary.’
• Causative information should be encoded (Vsip3s---n)
Argument structure of causative
verbs
Sentence                       Agent                  Beneficiary   Patient
Varrattam egy ruhát.           ?                      I (NOM)       ruha (ACC)
Varrattam Marival egy ruhát.   Mari (INS)             I (NOM)       ruha (ACC)
Varrtam Marival egy ruhát.     I (NOM) + Mari (INS)   ?             ruha (ACC)
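The role assignment in the table above can be sketched as a mapping from case to thematic role that depends on whether the verb is causative. The codes `Vsis1s---n` and `Vmis1s---n` used below are built by analogy with the codes on the earlier slides and are assumptions, as are the function name and the input format.

```python
def assign_roles(verb_msd, arguments):
    """Map case-marked arguments to thematic roles, following the table above.

    `arguments` maps a case label (NOM, INS, ACC) to the argument's lemma.
    """
    causative = verb_msd.startswith("Vs")   # 's' subtype = causative on the slides
    roles = {"Agent": [], "Beneficiary": [], "Patient": []}
    for case, lemma in arguments.items():
        if case == "ACC":
            roles["Patient"].append(lemma)
        elif case == "INS":
            roles["Agent"].append(lemma)
        elif case == "NOM":
            # With a causative verb the nominative argument is not the agent.
            roles["Beneficiary" if causative else "Agent"].append(lemma)
    return roles


# Varrattam Marival egy ruhát. 'I had Mary sew a dress.'
print(assign_roles("Vsis1s---n", {"NOM": "I", "INS": "Mari", "ACC": "ruha"}))
# Varrtam Marival egy ruhát. 'I sewed a dress with Mary.'
print(assign_roles("Vmis1s---n", {"NOM": "I", "INS": "Mari", "ACC": "ruha"}))
```

The two sentences differ only in the causative suffix, yet the nominative argument ends up in a different role, which is why the verb's code has to carry that information.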
Machine translation
• Morpheme-based translation would be
ideal
• Easier alignment of translational units
• Good morphological parser needed
• Easier to execute in dependency grammar
• Morpheme-based dependency structures
Alignments
[Figure: morpheme-level alignment of two Hungarian-English pairs]
at | varr | t | ruha      have | sewn | dress
ban | ház | am            in | house | my
Problems
• Not practical: no corpus available at the moment
• Portmanteau morphs – alignment problems
• Zero morphs – how many of them?
• 3 zero morphs in Hungarian nouns:
könyv-Ø-Ø-Ø vs. könyveit
book-Ø-Ø-Ø book-POSS-POSS.PL-ACC
• (Mel’cuk 2006)
• Morphosyntactic codes might help
• Csinálhattátok: Vois2p---y
• Reordering rules

Code   Morpheme        Gloss
V      csinál          do
o      hat             can
i      (no morpheme)
s      t               PAST
2p     tok             you
y      á               it

csinálhattátok -> ‘you could do it’

An example
[Figure: dependency trees of the morphemes]
Hungarian: hat | csinál | (t, á, tok)
English, morpheme by morpheme: can | do | (d, Ø, you)
English, surface form: could | (you, do)
Problem                                Solution
Syntax vs. case suffix
– Pseudo-subject                       Extra rules; PRED, OBJ difficult for humans
– Pseudo-object                        List of adverbs with accusative ending
– Pseudo-dative                        List of verbs with dative subject
Unknown words (lemmas+suffixes)        Guessing (rules)
Information extraction
– Thematic/semantic relations          Proper morphosyntactic codes + rules
– Uncertainty detection                Proper morphosyntactic codes
Machine translation (morpheme-based)   Proper morphosyntactic codes
Summary
• Syntax-morphology interface in Hungarian
• Morphological coding systems
• Syntactic annotation in Hungarian corpora
• Morphosyntactic problems:
– NER
– IE
– MT
References
É. Kiss K., Kiefer F., Siptár P.: Új magyar nyelvtan, Osiris Kiadó, Bp., 1999.
Farkas Richárd, Szeredi Dániel, Varga Dániel, Vincze Veronika 2010: MSD-KR
harmonizáció a Szeged Treebank 2.5-ben. In: Tanács Attila, Vincze Veronika (szerk.):
VII. Magyar Számítógépes Nyelvészeti Konferencia. Szeged, Szegedi
Tudományegyetem, pp. 349-353.
Farkas, Richárd; Vincze, Veronika; Nagy, István; Ormándi, Róbert; Szarvas, György;
Almási, Attila 2008: Web-based lemmatisation of Named Entities. In: Horák, Ales;
Kopeček, Ivan; Pala, Karel; Sojka, Petr (eds.): Proceedings of the 11th International
Conference on Text, Speech and Dialogue (TSD2008), Berlin, Heidelberg, Springer
Verlag, LNCS 5246, pp. 53-60.
Koutny I., Wacha B.: Magyar nyelvtan függőségi alapon. Magyar Nyelv Vol. 87 No. 4.
(1991) 393–404.
Mel’cuk, Igor 2006: Aspects of the Theory of Morphology. Mouton de Gruyter.
Prószéky, G., Koutny, I., Wacha, B.: Dependency Syntax of Hungarian. In: Maxwell, Dan;
Klaus Schubert (eds.) Metataxis in Practice (Dependency Syntax for Multilingual
Machine Translation), Foris, Dordrecht, The Netherlands (1989) 151–181
Zsibrita János, Vincze Veronika, Farkas Richárd 2010: Ismeretlen kifejezések és a szófaji
egyértelműsítés. In: Tanács Attila, Vincze Veronika (szerk.): VII. Magyar
Számítógépes Nyelvészeti Konferencia. Szeged, Szegedi Tudományegyetem, pp.
275-283.