Surface Syntax Example - Institute of Formal and Applied Linguistics
The PDT: Morphology and Surface Syntax
Jan Hajič
Institute of Formal and Applied Linguistics
School of Computer Science
Faculty of Mathematics and Physics
Charles University, Prague
Czech Republic
March 5, 2008
Companions Semantic Representation and Dialog
Interfacing Workshop - Morphology and Surface Syntax
Morphology (m-layer)
Prerequisites for the manual annotation process:
Tokenized data
Annotation guidelines
Annotation tool
Manual decision making support
Offline (or online) morphological analyzer
Quality checking tool
Process description
Results (manually annotated data) to be used for tagger training, linguistic research, as a basis for further annotation, ...
Morphological Attributes
Tag: 13 categories
Example: AAFP3----3N----
Ex.: nejnezajímavějším, “(to) the most uninteresting”
Position by position:
 1  A  Adjective
 2  A  Regular
 3  F  Feminine
 4  P  Plural
 5  3  Dative
 6  -  no poss. gender
 7  -  no poss. number
 8  -  no person
 9  -  no tense
10  3  superlative
11  N  negated
12  -  no voice
13  -  reserve1
14  -  reserve2
15  -  base var.
Lemma: POS-unique identifier
Books/verb -> book-1, went -> go, to/prep. -> to-1
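The positional reading of a tag can be sketched in a few lines of Python. This is only an illustration: the category names are the ones used in the tagset overview, the names RESERVE1 and RESERVE2 for the two reserve slots are assumptions, and the helper names `decode_tag` and `set_categories` are hypothetical.

```python
# Names for the 15 tag positions: 13 categories plus two reserve slots.
# RESERVE1/RESERVE2 are illustrative names for the reserve positions.
POSITIONS = [
    "POS", "SUBPOS", "GENDER", "NUMBER", "CASE",
    "POSSGENDER", "POSSNUMBER", "PERSON", "TENSE",
    "GRADE", "NEGATION", "VOICE", "RESERVE1", "RESERVE2", "VAR",
]


def decode_tag(tag):
    """Map each character of a positional tag to its category name."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return dict(zip(POSITIONS, tag))


def set_categories(tag):
    """Keep only the categories whose value is not '-' (not applicable)."""
    return {name: value for name, value in decode_tag(tag).items() if value != "-"}


decoded = set_categories("AAFP3----3N----")
print(decoded)
```

For the example tag, only POS, SUBPOS, GENDER, NUMBER, CASE, GRADE and NEGATION carry a value; every other position is "-".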
Morphological Tagset
13 categories, 4452 plausible tags (combinations):
Category     # of values  Example(s)
POS          10           N (noun), Z (punctuation)
SUBPOS       75           P (personal pron.), U (possessive adj.)
GENDER       8            I (masc. inanimate), X (any), - (N.A.)
NUMBER       4            P (plural), D (dual)
CASE         9            1 (nominative), 6 (locative)
POSSGENDER   4            M (masc. animate), F (feminine)
POSSNUMBER   3            S (singular), P (plural)
PERSON       5            1 (first), ...
TENSE        4            P (present), M (past)
GRADE        5            3 (superlative)
NEGATION     3            A (affirmative), N (negative)
VOICE        3            A (active), P (passive)
VAR          11           1 (1st variant), 6 (colloq. style), 8 (abbrev.)
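To put the figure of 4452 plausible tags in perspective, the short calculation below multiplies the per-category value counts from the tagset overview. It assumes the counts line up with the categories in the order listed; the variable names are illustrative.

```python
from math import prod

# Number of values per category, in the order given in the tagset overview.
VALUE_COUNTS = {
    "POS": 10, "SUBPOS": 75, "GENDER": 8, "NUMBER": 4, "CASE": 9,
    "POSSGENDER": 4, "POSSNUMBER": 3, "PERSON": 5, "TENSE": 4,
    "GRADE": 5, "NEGATION": 3, "VOICE": 3, "VAR": 11,
}
PLAUSIBLE_TAGS = 4452

# Upper bound if every value could combine freely with every other value.
raw_combinations = prod(VALUE_COUNTS.values())
print(raw_combinations)  # 25660800000
print(f"plausible share: {PLAUSIBLE_TAGS / raw_combinations:.1e}")
```

Free combination would allow roughly 25.7 billion tags, so the 4452 plausible ones are a tiny fraction of the combinatorial space.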
Morphological Analysis
Formally: MA: A+ → Pow(L × T)
MA(f) = { [l, t] }, where
f ∈ A+ (the token),
l ∈ L (lemma),
t ∈ T (tag)
tokens taken in isolation
no attempt to solve e.g. auxiliaries vs. full verbs
Ex.: MA(“má”) (lit. “has” or “my”) =
{ [mít,VB-S---3P-AA---], (lemma lit. “to have”)
[můj,PSFS1-S1------1], (lemma lit. “my”)
[můj,PSFS5-S1------1],
[můj,PSNP1-S1------1],
[můj,PSNP4-S1------1],
[můj,PSNP5-S1------1] }
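The definition above treats analysis as a function from an isolated token to a set of lemma/tag pairs. A minimal sketch, assuming a toy dictionary that holds only the “má” entry from the example (the names `TOY_DICTIONARY` and `morphological_analysis` are hypothetical, not the actual analyzer's API):

```python
from typing import FrozenSet, Tuple

Analysis = Tuple[str, str]  # (lemma, tag)

# Toy dictionary containing only the example entry; a real analyzer
# would cover the whole lexicon.
TOY_DICTIONARY = {
    "má": frozenset({
        ("mít", "VB-S---3P-AA---"),
        ("můj", "PSFS1-S1------1"),
        ("můj", "PSFS5-S1------1"),
        ("můj", "PSNP1-S1------1"),
        ("můj", "PSNP4-S1------1"),
        ("můj", "PSNP5-S1------1"),
    }),
}


def morphological_analysis(token: str) -> FrozenSet[Analysis]:
    """Return all (lemma, tag) readings of a token taken in isolation."""
    return TOY_DICTIONARY.get(token, frozenset())


analyses = morphological_analysis("má")
lemmas = sorted({lemma for lemma, _ in analyses})
print(len(analyses), lemmas)
```

The result is a set, so no reading is preferred over another; choosing among them is left to the disambiguation step.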
Morphological Analysis:
Implementation
Dictionary-based
covers 800k words (lemmas), ~20 million forms (with tags)
C code implementation
standard (regular) derivations on-the-fly; ex.:
spojit “join”
  spojený “joined”, spojený “joinedly”, spojenost “joinedliness”
  spojitelný “joinable”, spojitelný “joinably”, spojitelnost “joinability”
irregular forms listed in dictionary (w/tags)
no phonological processing (concatenation only)
grammatical prefixes only: negation, superlative
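The handling of grammatical prefixes by plain concatenation might look roughly like the sketch below. It is not the C implementation described above. The one-entry form dictionary, the comparative tag with GRADE value 2, and the lemma zajímavý are assumptions made for illustration; only the prefixes ne- (negation) and nej- (superlative) and the resulting tag AAFP3----3N---- come from the material itself.

```python
# Toy form dictionary: form -> list of (lemma, tag). The single entry and
# its comparative tag (GRADE = 2) are assumptions for illustration.
FORMS = {"zajímavějším": [("zajímavý", "AAFP3----2A----")]}

GRADE_POS, NEGATION_POS = 9, 10  # zero-based positions in the 15-character tag


def set_position(tag, index, value):
    """Return a copy of the tag with one position overwritten."""
    return tag[:index] + value + tag[index + 1:]


def analyze(form):
    """Look up a form, also trying the prefixes ne- and nej- (concatenation only)."""
    results = list(FORMS.get(form, []))
    # Negation: strip "ne" and mark affirmative readings as negated.
    if form.startswith("ne"):
        for lemma, tag in FORMS.get(form[2:], []):
            if tag[NEGATION_POS] == "A":
                results.append((lemma, set_position(tag, NEGATION_POS, "N")))
    # Superlative: strip "nej" and promote comparative readings to grade 3.
    if form.startswith("nej"):
        for lemma, tag in analyze(form[3:]):
            if tag[GRADE_POS] == "2":
                results.append((lemma, set_position(tag, GRADE_POS, "3")))
    return results


print(analyze("nejnezajímavějším"))
```

Under these assumptions the form nejnezajímavějším is analyzed as nej- + ne- + zajímavějším and receives the tag shown in the earlier tag example.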
The Morphological Annotation Tool (LAW)
The Process of Morphological Annotation
From tokenized to annotated text:
tokenized text (auto, w-layer)
→ (auto) morphological analysis
→ text w/ morph. interpretations
→ manual morphological disambiguation (DA)
→ text w/ selected interpretation
→ manual adjudication
→ annotated text (m-layer)
Supporting resources: morphological dictionary, annotation guidelines
PDT – Syntactic Annotation
Surface syntax annotation
Dependency surface syntax
Comparable to Penn Treebank annotation
Convertible: dependency ↔ parse trees
Deep syntactic/semantic annotation
Dependency trees
Different topology
High level of generalization and formalization
Many node attributes
Analytical Syntax (a-layer)
Dependency + Analytical Function
[Figure: dependency tree with a governor node and a dependent node marked, for the sentence below]
The influence of the Mexican crisis on Central and Eastern Europe has apparently been underestimated.
Analytical Syntax: Functions
Main (for [main] semantic lexemes):
Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom
“Double” dependency: AtrAdv, AtrObj, AtrAtr
Special (function words, punctuation,...):
Reflexives, particles: AuxT, AuxR, AuxO, AuxZ, AuxY
Prepositions/Conjunctions: AuxP, AuxC
Punctuation, Graphics: AuxX, AuxS, AuxG, AuxK
Structural
Ellipsis: ExD, Coordination etc.: Coord, Apos
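The grouping of analytical functions listed above can be captured as a simple lookup. This is a sketch: the group labels are shorthand for the slide's headings, `group_of` is a hypothetical helper, and “Atv(V)” is read here as the two functions Atv and AtvV, which is an assumption.

```python
# Analytical functions grouped as on the slide. "Atv(V)" is expanded to
# Atv and AtvV, which is an interpretation rather than something stated.
AFUN_GROUPS = {
    "main": {"Pred", "Sb", "Obj", "Adv", "Atr", "Atv", "AtvV", "AuxV", "Pnom"},
    "double dependency": {"AtrAdv", "AtrObj", "AtrAtr"},
    "special": {"AuxT", "AuxR", "AuxO", "AuxZ", "AuxY",   # reflexives, particles
                "AuxP", "AuxC",                           # prepositions, conjunctions
                "AuxX", "AuxS", "AuxG", "AuxK"},          # punctuation, graphics
    "structural": {"ExD", "Coord", "Apos"},
}


def group_of(afun):
    """Return the group an analytical function belongs to."""
    for group, members in AFUN_GROUPS.items():
        if afun in members:
            return group
    raise KeyError(f"unknown analytical function: {afun}")


print(group_of("Sb"), "|", group_of("AuxP"), "|", group_of("Coord"))
```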
Example
All came from Cray Research.
Surface Syntax Example
Complete sentence: Sb, Pred, Obj
Resistance needs courage.
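The tree for this sentence can be written down as a small data structure in which every node stores its governor and its analytical function. This is a simplified sketch: the `Node` class and `edges` helper are hypothetical, and the final punctuation and any technical root node are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    form: str
    afun: str
    # The parent link is excluded from repr/compare to avoid infinite recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def attach(self, child):
        """Make `child` a dependent of this node."""
        child.parent = self
        self.children.append(child)
        return child


def edges(node):
    """Yield (governor, dependent, analytical function) triples, depth first."""
    for child in node.children:
        yield (node.form, child.form, child.afun)
        yield from edges(child)


needs = Node("needs", "Pred")
needs.attach(Node("Resistance", "Sb"))
needs.attach(Node("courage", "Obj"))

print(list(edges(needs)))
```

The predicate governs both the subject and the object, so the sentence yields two dependency edges.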
Surface Syntax Example
Analytical verb form:
he would be allowed to be enrolled
Surface Syntax Example
Predicate with copula (state)
you were fired
Surface Syntax Example
Passive construction (action)
(The) book has been translated [by Mr. X]
Surface Syntax Example
Complement
she left crying
Surface Syntax Example
Object
he gave Mary a book
Surface Syntax Example
Object used for infinitive of analytical verb forms
he wants to learn
Surface Syntax Example
Relative clause (embedded)
the woman, who had a French accent, was very pretty
Surface Syntax Example
Coordination
... (to) magic, mysticism(,) etc.
Surface Syntax Example
Apposition
cheap, i.e. under five dollars
Surface Syntax Example
Incomplete phrases
Peter works well, but Paul badly
Surface Syntax Example
Variants (equality)
he bought shoes for his son
XML Annotation
Layers (English)
Strictly top-down links
w+m+a can be easily “knitted”
API for cross-layer access (programming)
PML Schema / Relax NG
[With slight modification, can be used for spoken data (audio as layer “-1”)]
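The idea of “knitting” layers through strictly top-down links can be sketched with plain dictionaries standing in for the XML files. Everything concrete here is an assumption for illustration: the identifiers (w1, m1, a1, ...), the field names, and the lemmas are hypothetical, and the real data is stored in PML rather than Python structures.

```python
# Each layer refers only to the layer directly below it (a -> m -> w).
W_LAYER = {"w1": "Resistance", "w2": "needs", "w3": "courage"}
M_LAYER = {
    "m1": {"w": "w1", "lemma": "resistance"},
    "m2": {"w": "w2", "lemma": "need"},
    "m3": {"w": "w3", "lemma": "courage"},
}
A_LAYER = {
    "a1": {"m": "m1", "afun": "Sb", "parent": "a2"},
    "a2": {"m": "m2", "afun": "Pred", "parent": None},
    "a3": {"m": "m3", "afun": "Obj", "parent": "a2"},
}


def form_of(a_id):
    """Follow the a -> m -> w links down to the word form."""
    return W_LAYER[M_LAYER[A_LAYER[a_id]["m"]]["w"]]


def knit(a_id):
    """Merge the w-, m- and a-layer information for one a-layer node."""
    a_node = A_LAYER[a_id]
    m_node = M_LAYER[a_node["m"]]
    parent = a_node["parent"]
    return {
        "form": W_LAYER[m_node["w"]],
        "lemma": m_node["lemma"],
        "afun": a_node["afun"],
        "governor": form_of(parent) if parent else None,
    }


print(knit("a1"))
```

Because links only point downward, the lower layers never need to change when a higher layer is added or revised.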