Surface Syntax Example - Institute of Formal and Applied Linguistics

The PDT
Morphology
and Surface Syntax
Jan Hajič
Institute of Formal and Applied Linguistics
School of Computer Science
Faculty of Mathematics and Physics
Charles University, Prague
Czech Republic
March 5, 2008
Companions Semantic Representation and Dialog
Interfacing Workshop - Morphology and Surface Syntax
Morphology (m-layer)

Prerequisites for the manual annotation process:
Tokenized data
Annotation guidelines
Annotation tool
 Manual decision making support
 Offline (or online) morphological analyzer
Quality checking tool
Process description

Results (manually annotated data) to be used for tagger training, linguistic research, basis for further annotation, ...
Morphological Attributes

Tag: 13 categories

Example: AAFP3----3N----
Ex.: nejnezajímavějším “(to) the most uninteresting”

Pos.  Value  Meaning
1     A      Adjective
2     A      Regular
3     F      Feminine
4     P      Plural
5     3      Dative
6     -      no poss. Gender
7     -      no poss. Number
8     -      no person
9     -      no tense
10    3      superlative
11    N      negated
12    -      no voice
13    -      reserve1
14    -      reserve2
15    -      base var.

Lemma: POS-unique identifier
Books/verb -> book-1, went -> go, to/prep. -> to-1
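
The positional reading of the example tag can be sketched in a few lines of Python. This is only an illustration: the field names are taken from the category list of the tagset plus the two reserve slots named in the example, and the lowercase spelling and the helper names decode_tag and set_fields are our own assumptions.

```python
# Position names: the 13 tagset categories plus the two reserve slots
# (lowercase keys and function names are illustrative assumptions).
FIELDS = [
    "pos", "subpos", "gender", "number", "case",
    "possgender", "possnumber", "person", "tense",
    "grade", "negation", "voice", "reserve1", "reserve2", "var",
]


def decode_tag(tag):
    """Split a 15-character positional tag into a {field: value} dict."""
    if len(tag) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} positions, got {len(tag)}")
    return dict(zip(FIELDS, tag))


def set_fields(tag):
    """Keep only the positions that carry a value (anything but '-')."""
    return {name: value for name, value in decode_tag(tag).items() if value != "-"}


example = set_fields("AAFP3----3N----")
print(example)
```

For the example tag, only seven positions are set: POS, SUBPOS, GENDER, NUMBER, CASE, GRADE and NEGATION.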
Morphological Tagset

13 categories, 4452 plausible tags (combinations):
Category     # of values  Example(s)
POS          10           N (noun), Z (punctuation)
SUBPOS       75           P (personal pron.), U (possessive adj.)
GENDER       8            I (masc. inanimate), X (any), - (N.A)
NUMBER       4            P (plural), D (dual)
CASE         9            1 (nominative), 6 (locative)
POSSGENDER   4            M (masc. animate), F (feminine)
POSSNUMBER   3            S (singular), P (plural)
PERSON       5            1 (first), ...
TENSE        4            P (present), M (past)
GRADE        5            3 (superlative)
NEGATION     3            A (affirmative), N (negative)
VOICE        3            A (active), P (passive)
VAR          11           1 (1st variant), 6 (colloq. style), 8 (abbrev.)
Morphological Analysis

Formally: MA: A+ → Pow(L × T)
 MA(f) = { [l, t] }, where
  f ∈ A+ (the token),
  l ∈ L (lemma),
  t ∈ T (tag)
tokens taken in isolation
no attempt to solve e.g. auxiliaries vs. full verbs
Ex.: MA(“má”), lit. “has” or “my”, = {
 [mít, VB-S---3P-AA---], lit. “to have”
 [můj, PSFS1-S1------1], lit. “my”
 [můj, PSFS5-S1------1],
 [můj, PSNP1-S1------1],
 [můj, PSNP4-S1------1],
 [můj, PSNP5-S1------1] }
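
As a minimal sketch, MA can be read as a function from a token to a set of (lemma, tag) pairs. The toy lexicon below holds only the readings of “má” listed on this slide. The names TOY_LEXICON and ma, and the choice to return an empty set for unknown tokens, are assumptions made for illustration and are not taken from the actual analyzer.

```python
# Toy lexicon: surface form -> set of (lemma, tag) pairs.
# Only the entry for "má" comes from the slide; a real analyzer would be
# backed by the full morphological dictionary.
TOY_LEXICON = {
    "má": {
        ("mít", "VB-S---3P-AA---"),
        ("můj", "PSFS1-S1------1"),
        ("můj", "PSFS5-S1------1"),
        ("můj", "PSNP1-S1------1"),
        ("můj", "PSNP4-S1------1"),
        ("můj", "PSNP5-S1------1"),
    },
}


def ma(form):
    """MA(f): every (lemma, tag) reading of a token taken in isolation.

    Unknown tokens yield an empty set (an assumption of this sketch).
    """
    return set(TOY_LEXICON.get(form, set()))


readings = ma("má")
lemmas = {lemma for lemma, _ in readings}
print(len(readings), sorted(lemmas))
```

Because the token is analyzed in isolation, all six readings are returned; choosing among them is left to the disambiguation step.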
Morphological Analysis: Implementation

Dictionary-based
 covers 800k words (lemmas), ~20 million forms (w/tag)
C code implementation
standard (regular) derivations on-the-fly; ex.:
 spojit (join)
  spojený (joined), spojeně (joinedly), spojenost (joinedliness)
  spojitelný (joinable), spojitelně (joinably), spojitelnost (joinability)
irregular forms listed in dictionary (w/tags)
no phonological processing (concatenation only)
grammatical prefixes only: negation, superlative
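
The prefix handling can be illustrated with a small sketch that works by concatenation only: it peels an optional superlative prefix “nej” and an optional negation prefix “ne” off the token, looks the remainder up in a dictionary, and adjusts the GRADE and NEGATION positions of the tag. This is an assumption-laden illustration and not the C implementation. The dictionary entry for the comparative form, the GRADE value 2 for comparatives, and all names (TOY_FORMS, analyze, with_value) are hypothetical.

```python
GRADE, NEGATION = 9, 10  # 0-based positions of GRADE and NEGATION in the tag

# Hypothetical dictionary entry: an unprefixed comparative form.
# The GRADE value "2" for comparatives is an assumption of this sketch.
TOY_FORMS = {
    "zajímavějším": {("zajímavý", "AAFP3----2A----")},
}


def with_value(tag, position, value):
    """Return a copy of the tag with one position overwritten."""
    return tag[:position] + value + tag[position + 1:]


def analyze(form):
    """Try every split into optional 'nej' + optional 'ne' + dictionary form."""
    results = set()
    for superlative in (False, True):
        rest = form
        if superlative:
            if not rest.startswith("nej"):
                continue
            rest = rest[len("nej"):]
        for negated in (False, True):
            stem = rest
            if negated:
                if not stem.startswith("ne"):
                    continue
                stem = stem[len("ne"):]
            for lemma, tag in TOY_FORMS.get(stem, ()):
                if superlative:
                    if tag[GRADE] != "2":  # assume nej- attaches to comparatives
                        continue
                    tag = with_value(tag, GRADE, "3")
                if negated:
                    if tag[NEGATION] != "A":
                        continue
                    tag = with_value(tag, NEGATION, "N")
                results.add((lemma, tag))
    return results


print(analyze("nejnezajímavějším"))
```

Under these assumptions the token nejnezajímavějším receives the tag AAFP3----3N----, the same tag used as the example for the morphological attributes.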
The Morphological Annotation Tool (LAW)
The Process of Morphological Annotation

From tokenized to annotated text:
tokenized text (auto, w-layer)
 → (Auto) morphological analysis, using the morphological dictionary
text w/morph. interpretations
 → Manual morphological disambiguation (DA), using the annotation guidelines
text w/select. interpretation
 → Manual adjudication
annotated text (m-layer)
PDT – Syntactic Annotation

Surface syntax annotation
 Dependency surface syntax
 Comparable to Penn Treebank annotation
  Convertible: dependency ↔ parse trees
Deep syntactic/semantic annotation
 Dependency trees
 Different topology
 High level of generalization and formalization
 Many node attributes
Analytical Syntax (a-layer)

Dependency + Analytical Function
[Figure: dependency tree; labels: governor, dependent]
The influence of the Mexican crisis on Central and Eastern Europe has apparently been underestimated.
Analytical Syntax: Functions

Main (for [main] semantic lexemes):
 Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom
 “Double” dependency: AtrAdv, AtrObj, AtrAtr
Special (function words, punctuation,...):
 Reflexives, particles: AuxT, AuxR, AuxO, AuxZ, AuxY
 Prepositions/Conjunctions: AuxP, AuxC
 Punctuation, Graphics: AuxX, AuxS, AuxG, AuxK
Structural
 Ellipsis: ExD, Coordination etc.: Coord, Apos
Example

All came from Cray Research.
Surface Syntax Example

Complete sentence: Sb, Pred, Obj

Resistance needs courage.
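
One way to picture the a-layer annotation of this sentence is as a list of nodes, each carrying the index of its governor and its analytical function. The sketch below is an illustration only. The Node class, the use of index 0 for an artificial root, and the helper edges are assumptions and do not reproduce the PDT data format. Sentence-final punctuation is left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    index: int  # 1-based position in the sentence
    form: str
    head: int   # index of the governor; 0 marks an artificial root (assumption)
    afun: str   # analytical function


sentence = [
    Node(1, "Resistance", 2, "Sb"),
    Node(2, "needs", 0, "Pred"),
    Node(3, "courage", 2, "Obj"),
]


def edges(nodes):
    """List (governor form, analytical function, dependent form) triples."""
    by_index = {n.index: n.form for n in nodes}
    return [(by_index.get(n.head, "ROOT"), n.afun, n.form) for n in nodes]


for governor, afun, dependent in edges(sentence):
    print(f"{governor} --{afun}--> {dependent}")
```

Both the subject and the object depend directly on the predicate, which is the only node attached to the root.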
Surface Syntax Example

Analytical verb form:

he would be allowed to be enrolled
Surface Syntax Example

Predicate with copula (state)

you were fired
Surface Syntax Example

Passive construction (action)

(The) book has been translated [by Mr. X]
Surface Syntax Example

Complement

she left crying
Surface Syntax Example

Object

he gave Mary a book
Surface Syntax Example

Object used for infinitive of analytical verb forms

he wants to learn
Surface Syntax Example

Relative clause (embedded)

the woman, who had a French accent, was very pretty
Surface Syntax Example

Coordination

... (to) magic, mysticism(,) etc.
Surface Syntax Example

Apposition

cheap, i.e. under five dollars
Surface Syntax Example

Incomplete phrases

Peter works well, but Paul badly
Surface Syntax Example

Variants (equality)

he bought shoes for his son
XML Annotation Layers (English)

Strictly top-down links
w+m+a can be easily “knitted”
API for cross-layer access (programming)
PML Schema / Relax NG
[With slight modification, can be used for spoken data (audio as layer “-1”)]
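
The “knitting” of the w, m and a layers can be sketched as following references from a higher layer down to the lower ones. The sketch below reads “top-down” as each layer pointing only to the layer directly beneath it, which is an assumption. It uses plain Python dictionaries instead of PML files, and the identifiers, the field names w_ref and m_ref, and the function knit are hypothetical.

```python
# Three toy layers keyed by node id. Each higher layer points only to the
# layer directly below it (a -> m -> w). Ids and field names are hypothetical.
w_layer = {"w1": {"token": "má"}}
m_layer = {"m1": {"w_ref": "w1", "lemma": "mít", "tag": "VB-S---3P-AA---"}}
a_layer = {"a1": {"m_ref": "m1", "afun": "Pred"}}


def knit(a_id):
    """Follow the downward links from an a-layer node and merge the attributes."""
    a_node = a_layer[a_id]
    m_node = m_layer[a_node["m_ref"]]
    w_node = w_layer[m_node["w_ref"]]
    return {
        "token": w_node["token"],
        "lemma": m_node["lemma"],
        "tag": m_node["tag"],
        "afun": a_node["afun"],
    }


print(knit("a1"))
```

Since links only ever point downward, a lower layer can be stored and processed without any knowledge of the layers above it.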