Prague Arabic Dependency Treebank
Development in Data and Tools
Faculty of Mathematics and Physics
Faculty of Philosophy and Arts
Charles University in Prague
Jan Hajič
Otakar Smrž
Petr Zemánek
Jan Šnaidauf
Emanuel Beška
Project Release – PADT 1.0
December 2004, Linguistic Data Consortium
148 000 Morpho, 113 500 Syntax

Set   Syntax   Morpho   Source           Corpus
AFP   13 000   N/A      France Presse    Penn ATB 1
UMH   38 500   N/A      Ummah Press      Penn ATB 2
XIN   13 500   N/A      Xinhua News      Arabic Gigaword
ALH   10 000   73 500   Al-Hayat News    Arabic Gigaword
ANN   12 500   25 500   An-Nahar News    Arabic Gigaword
XIA   26 500   49 500   Xinhua News      Arabic Gigaword

September 23, 2004
Prague Arabic Dependency Treebank:
Development in Data and Tools
Open-Source Tools
TrEd Tree Editor
Multi-purpose annotation environment
Suite of programming utilities
Netgraph Search Engine
Server/Client system architecture
Easy-to-learn query language
Encode::Arabic Perl Module
Extension for processing of Arabic script
ArabTeX, Buckwalter, Unicode, …
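The kind of conversion such a module handles can be illustrated with a minimal Python sketch of the Buckwalter notation. This is not the Encode::Arabic interface (that is a Perl module); the function names `decode_buckwalter` and `encode_buckwalter` are hypothetical, and only the standard Buckwalter symbols are covered. Anything outside the table passes through unchanged.

```python
# Standard Buckwalter symbols paired with Arabic code points.
# Illustrative table only; not taken from Encode::Arabic.
_PAIRS = [
    ("'", "\u0621"), ("|", "\u0622"), (">", "\u0623"), ("&", "\u0624"),
    ("<", "\u0625"), ("}", "\u0626"), ("A", "\u0627"), ("b", "\u0628"),
    ("p", "\u0629"), ("t", "\u062A"), ("v", "\u062B"), ("j", "\u062C"),
    ("H", "\u062D"), ("x", "\u062E"), ("d", "\u062F"), ("*", "\u0630"),
    ("r", "\u0631"), ("z", "\u0632"), ("s", "\u0633"), ("$", "\u0634"),
    ("S", "\u0635"), ("D", "\u0636"), ("T", "\u0637"), ("Z", "\u0638"),
    ("E", "\u0639"), ("g", "\u063A"), ("_", "\u0640"), ("f", "\u0641"),
    ("q", "\u0642"), ("k", "\u0643"), ("l", "\u0644"), ("m", "\u0645"),
    ("n", "\u0646"), ("h", "\u0647"), ("w", "\u0648"), ("Y", "\u0649"),
    ("y", "\u064A"), ("F", "\u064B"), ("N", "\u064C"), ("K", "\u064D"),
    ("a", "\u064E"), ("u", "\u064F"), ("i", "\u0650"), ("~", "\u0651"),
    ("o", "\u0652"), ("`", "\u0670"), ("{", "\u0671"),
]

BW_TO_AR = dict(_PAIRS)
AR_TO_BW = {arabic: latin for latin, arabic in _PAIRS}


def decode_buckwalter(text: str) -> str:
    """Turn Buckwalter notation into Arabic script, symbol by symbol."""
    return "".join(BW_TO_AR.get(ch, ch) for ch in text)


def encode_buckwalter(text: str) -> str:
    """Turn Arabic script back into Buckwalter notation."""
    return "".join(AR_TO_BW.get(ch, ch) for ch in text)


word = decode_buckwalter("kitAb")  # kaf, kasra, ta, alif, ba
```

Because the mapping is one-to-one, encoding a decoded string gives back the original notation.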
PADT Functional Views
Functional Generative Description
Theory of linguistic meaning and its expression
Prague Dependency Treebank for Czech
Independence of representation levels
Tectogrammatical – linguistic meaning
Analytical – surface dependency syntax
Morphological – categories and lexical units
Abstraction of the relations across levels
Strict distinction between form and function
Different units of description on each level
Functional Morphology
Provides the syntactic levels with their abstract language, not just the letters in tokens
Revives multiple senses of categories
Completeness of generation
Strict modeling of grammatical control
MorphoTrees – ‘human tagging’
Successful prototype feature-based tagger
Syntactic Levels of Description
Analytical level
Pragmatically motivated, close to surface syntax
Every single token resulting from the morphological level forms one node
Tree-like dependency structure for every sentence
Tectogrammatical level
Linguistic (literal) meaning, deep relations, TFA
Initial structures transformed from AL
Nodes for autosemantic words only
Decisive role of valency frames
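One step of deriving initial tectogrammatical structures from analytical trees can be sketched as follows. The sketch assumes that nodes whose analytical function starts with "Aux" (such as AuxP, AuxC, AuxM) are the non-autosemantic ones, and that their dependents are re-attached to the nearest remaining governor. The `Node` class, the function name `drop_auxiliaries`, and the attachments in the example tree are illustrative assumptions, not the project's actual procedure, which also involves valency frames and TFA.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    form: str    # word form in transliteration
    afun: str    # analytical function, e.g. "Pred", "Sb", "AuxP"
    children: List["Node"] = field(default_factory=list)


def drop_auxiliaries(node: Node) -> List[Node]:
    """Return the nodes that replace `node` once Aux* nodes are removed.

    An auxiliary node disappears and hands its (already cleaned)
    dependents up to its governor; any other node is copied.
    """
    cleaned: List[Node] = []
    for child in node.children:
        cleaned.extend(drop_auxiliaries(child))
    if node.afun.startswith("Aux"):   # assumption: Aux* = not autosemantic
        return cleaned
    return [Node(node.form, node.afun, cleaned)]


# Hypothetical analytical tree: "proposal ... on colleagues lasted two hours"
analytical = Node("dAma", "Pred", [
    Node("iqtirAHu", "Sb", [
        Node("EalA", "AuxP", [Node("zumalA'i", "Obj")]),
    ]),
    Node("sAEatayni", "Adv"),
])

initial = drop_auxiliaries(analytical)[0]
```

In the result the preposition node is gone and its noun hangs directly under the governing noun.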
Logic of Analytical Trees
Concepts of dependency and valency
Reduction: a sentence must remain grammatically correct if its leaves (terminal nodes) are chopped off
Trees: clause components, clauses, sentences, paragraphs, etc.
Subtrees of clauses exchangeable for non-clauses
Nodes: words, tokenized parts of words, punctuation marks, each marked by a function
Edges: syntactic relations between a governing node and its dependent node/subtree
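The node, edge, and reduction notions above can be made concrete with a small Python sketch. The `Node` class and the helper names are hypothetical, and the code only performs the structural step of reduction: it removes terminal nodes and leaves the judgment of grammatical correctness to the annotator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    form: str    # word, tokenized part of a word, or punctuation mark
    afun: str    # function marking the node, e.g. "Pred", "Sb", "Atr"
    children: List["Node"] = field(default_factory=list)  # dependents


def chop_leaves(node: Node) -> Node:
    """Return a copy of the tree with every terminal node removed once.

    The root is always kept, even if it has no dependents.
    """
    kept = [chop_leaves(child) for child in node.children if child.children]
    return Node(node.form, node.afun, kept)


def forms(node: Node) -> List[str]:
    """List word forms, each governor before its dependents."""
    out = [node.form]
    for child in node.children:
        out.extend(forms(child))
    return out


# "his proposal lasted two hours" (tree shape chosen for illustration)
tree = Node("dAma", "Pred", [
    Node("iqtirAHu", "Sb", [Node("-hu", "Atr")]),
    Node("sAEatayni", "Adv"),
])

reduced = chop_leaves(tree)   # drops "-hu" and "sAEatayni"
```

Applying `chop_leaves` repeatedly shrinks the tree toward its predicate, which matches the idea that the head node cannot be omitted.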
Some Syntax Issues of Arabic
Non-verbal predication of several types
Subordinate non-verbal clauses / modification
Verb-like behavior of many nominal forms
Mostly VSO in verbal sentences, but…
vice-versa in non-verbal clauses
different, depending on context boundness
Compound verbs, fixed composite prepositions
Grammatical co-reference, accusative of inner object, complex referencing, etc.
Problem I: Predication
Head node of tree: PREDICATE
Why? Steady role in sentence, cannot be omitted
Verbal predicate: I-go to school
Non-verbal predicate
Nominal: The-house a-big (=the house is big)
Existential: There a-city (=there is a city)
Prepositional
Possessive: For him a-house (=he has a house)
Adverbial: The-mosque in the-city (=…is…)
Conjunctional: The-problem that (=…is that)
Predication Types in Trees
(Indentation shows dependency: each node is governed by the nearest less-indented node above it.)
Verbal, with verb-like behavior of a noun (object of noun?)
  dAma [Pred] lasted
    iqtirAHu [Sb] proposal
      -hu [Atr] his
      al-EamalIyata [Obj] the-operation [acc.]
      EalA [AuxP] on
        zumalA’i [Obj] colleagues
          -hi [Atr] his
    sAEatayni [Adv] two-hours [acc.]
Nominal
  kabIrun [Pnom] a-big [nom.]
    al-baytu [Sb] the-house [nom.]
Existential
  vam~ata [PredE] there-is
    madInatun [Sb] a-city [nom.]
Prepositional (possessive)
  la- [PredP] for
    -hu [Obj] him
    baytun [Sb] a-house [nom.]
Prepositional (adverbial, locative)
  fI [PredP] in
    al-jAmiEu [Sb] the-mosque [nom.]
    al-madInati [Adv] the-city [gen.]
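Since the head node of every tree is the predicate, the predication type can be read off the function label of the root. The Python sketch below assumes the label-to-type mapping suggested by the example trees (Pred, Pnom, PredE, PredP). The split of prepositional predicates into possessive and adverbial by the presence of an Obj or Adv dependent is a guess based on the two examples only, and all names in the code are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    form: str
    afun: str
    children: List["Node"] = field(default_factory=list)


# Assumed mapping from the root's function label to the predication type.
PREDICATE_TYPES = {
    "Pred": "verbal",
    "Pnom": "nominal",
    "PredE": "existential",
    "PredP": "prepositional",
}


def predication_type(root: Node) -> str:
    """Classify a clause by the function label of its head node."""
    kind = PREDICATE_TYPES.get(root.afun)
    if kind is None:
        raise ValueError(f"{root.form!r} is not labeled as a predicate")
    if kind == "prepositional":
        # Heuristic from the two example trees, not a stated rule.
        dependents = {child.afun for child in root.children}
        if "Obj" in dependents:
            return "prepositional (possessive)"
        if "Adv" in dependents:
            return "prepositional (adverbial)"
    return kind


nominal = Node("kabIrun", "Pnom", [Node("al-baytu", "Sb")])
existential = Node("vam~ata", "PredE", [Node("madInatun", "Sb")])
possessive = Node("la-", "PredP", [Node("-hu", "Obj"), Node("baytun", "Sb")])
locative = Node("fI", "PredP",
                [Node("al-jAmiEu", "Sb"), Node("al-madInati", "Adv")])
```

Calling `predication_type` on a node that is not a clause head raises an error, which keeps the classification tied to the head-of-tree convention.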
Problem II: Clauses & Co-reference
Recursiveness: subordinate clause is contained as subtree in place of simple element
Head-node of clause gets the same function
Problem: non-verbal structures – clauses or not?
Compound verbs (mA zAla etc.) treated equally
Grammatical co-reference: Personal pronoun formally required by another element
Pronoun must be marked to be treated as such
Target of reference is unambiguously identifiable
Often in subordinate clauses, mostly attributive
Ex.: He-wrote a-book number its-pages hundred
Clauses & Co-reference in Trees
(Indentation shows dependency: each node is governed by the nearest less-indented node above it.)
Attributive clause, prepositional predicate (adverbial); referencing pronoun as adverbial in clause
  kataba [Pred] he-wrote
    al-rajulu [Sb] the-man [nom.]
    kitAban [Obj] a-book
      fI [Atr_PredP] in
        -hi [Adv_Ref] it
        mi’atu [Sb] hundred [nom.]
          SafHatin [Atr] pages [gen.]
Compound verb, formed as main verb and its complement; objective clause, verbal predicate; attributive clause, nominal predicate; referencing pronoun as attribute in clause
  zAlat [Pred] she-stopped
    mA [AuxM] not
    zaynabu [Sb] Zaynab
    tuHis~u [Atv] she-feels
      anna [AuxC] that
        tuEjibu [Obj_Pred] they-impress
          jumalan [Sb] sentences [acc.]
            wADiHun [Atr_Pnom] clear [nom.]
              naHwu [Sb] grammar [nom.]
                -hA [Atr_Ref] their
          -hA [Obj] her
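The claim that the target of a grammatical co-reference is unambiguously identifiable can be illustrated with a short Python sketch. It assumes a simple heuristic: starting from a pronoun marked with the `_Ref` suffix, climb to the head of the enclosing attributive clause (a label starting with `Atr_`), and take that head's governor as the antecedent. The heuristic, the `Node` class, and `find_antecedent` are assumptions made for illustration, not the treebank's documented procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    form: str
    afun: str
    children: List["Node"] = field(default_factory=list)


def find_antecedent(root: Node, pronoun: Node) -> Optional[Node]:
    """Return the node modified by the attributive clause containing `pronoun`."""
    # Record each node's governor, keyed by object identity.
    parent: Dict[int, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            parent[id(child)] = node
            stack.append(child)

    # Climb from the pronoun's governor up to the attributive clause head.
    current = parent.get(id(pronoun))
    while current is not None:
        is_clause_head = (current.afun.startswith("Atr_")
                          and not current.afun.endswith("_Ref"))
        if is_clause_head:
            return parent.get(id(current))
        current = parent.get(id(current))
    return None


# "the man wrote a book in which are a hundred pages"
hi = Node("-hi", "Adv_Ref")
book = Node("kitAban", "Obj", [
    Node("fI", "Atr_PredP", [
        hi,
        Node("mi'atu", "Sb", [Node("SafHatin", "Atr")]),
    ]),
])
sentence = Node("kataba", "Pred", [Node("al-rajulu", "Sb"), book])

antecedent = find_antecedent(sentence, hi)   # the node for kitAban
```

The same climb works when the pronoun is an attribute nested one level deeper, as in a clause headed by a node labeled Atr_Pnom.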
Future Prospects
Implementation of Functional Morphology
Tectogrammatical annotation
Lexicons of valency frames
Re-training the feature-based tagger on MorphoTrees
Machine-learning on the treebank data for various purposes
Thank you
Questions welcome!
http://ckl.mff.cuni.cz/padt/