Elicitation Corpus

April 12, 2003
Agenda
• Tagging with feature vectors or feature
structures
• Combinatorics
• Extensions
Annotating the corpus
• Feature Vectors:
– Maria saw the girls.
– snum-sg, stype-prop, sanim-an, scount-na, sdef-def,
vtype-perc, vtime-past, onum-pl, odef-def, etc.
• Feature Structures:
– ((SUBJ ((num sg) (type prop) (anim an) (count na) (def
def))) (vtype perc) (vtime past) (OBJ ((etc.
– These are easy: they come right out of the parser.
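The two annotation styles carry the same information, so one can be derived from the other. The following is a minimal sketch, assuming the feature structure is held as a nested Python dict and that the prefixes "s" and "o" stand for SUBJ and OBJ as in the example tags above. The function name flatten and the PREFIXES table are illustrative, not part of the parser.

```python
# Hypothetical mapping from grammatical functions to tag prefixes,
# inferred from the example tags (snum-..., onum-...).
PREFIXES = {"SUBJ": "s", "OBJ": "o"}


def flatten(fs):
    """Flatten a nested feature structure into a list of 'name-value' tags."""
    tags = []
    for key, value in fs.items():
        if isinstance(value, dict):
            # Nested structure: prefix each inner feature name.
            prefix = PREFIXES.get(key, key.lower())
            tags.extend(f"{prefix}{feat}-{val}" for feat, val in value.items())
        else:
            # Top-level atomic feature such as vtype or vtime.
            tags.append(f"{key}-{value}")
    return tags


# "Maria saw the girls." using only the features listed on the slide.
feature_structure = {
    "SUBJ": {"num": "sg", "type": "prop", "anim": "an", "count": "na", "def": "def"},
    "vtype": "perc",
    "vtime": "past",
    "OBJ": {"num": "pl", "def": "def"},
}

feature_vector = flatten(feature_structure)
print(", ".join(feature_vector))
```

Going the other way (vector to structure) would need the same prefix table, which is one reason to treat the feature structure as the primary form.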
Adapting parser output
• Do we need to filter out irrelevant features?
– E.g., features about “have” and “be” to make the
English auxiliary system work.
• E.g., (AUX-TYPE) = have
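If filtering turns out to be needed, it could be a small post-processing step on the parser output. This is a sketch only, assuming the output is a nested dict and that the set of irrelevant feature names is maintained by hand. IRRELEVANT and filter_features are hypothetical names.

```python
# Hypothetical list of parser-internal features to drop from the annotation.
IRRELEVANT = {"AUX-TYPE"}


def filter_features(fs, irrelevant=IRRELEVANT):
    """Return a copy of fs without irrelevant features, at any nesting depth."""
    cleaned = {}
    for key, value in fs.items():
        if key in irrelevant:
            continue  # skip features that only serve the English grammar
        if isinstance(value, dict):
            cleaned[key] = filter_features(value, irrelevant)
        else:
            cleaned[key] = value
    return cleaned


parsed = {"SUBJ": {"num": "sg"}, "AUX-TYPE": "have", "vtime": "past"}
cleaned = filter_features(parsed)
print(cleaned)
```

The original parse is left unchanged, so the filtered-out features remain available if they prove useful later.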
Not covered by the parser
• Derived features:
– does the subject outrank the object in animacy?
• Constructional features:
– Counterfactual conditional: If I had gone, I would have seen him.
– Do we want to extend the parsing grammar to label these
automatically?
• Discourse/semantic/context features:
– Context: Who saw John?
• Elicitation sentence: Bill saw John.
• Feature: subject is new information.
– Elicitation sentence: He must see it.
• Feature: evidential or deontic (obligation)
• Features that aren’t used in English
– Context: we=you and me (inclusive ‘we’)
– Elicitation sentence: We are tall.
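A derived feature such as "does the subject outrank the object in animacy?" can be computed from features the parser already supplies. The sketch below assumes a simple two-level ranking and the value name "inan" for inanimate. Both are illustrative assumptions, and a real hierarchy would be language-specific.

```python
# Assumed animacy ranking (higher number = higher rank). "inan" is a
# hypothetical value name; only "an" appears in the example annotation.
ANIMACY_RANK = {"an": 1, "inan": 0}


def subject_outranks_object(fs, rank=ANIMACY_RANK):
    """Derived feature: True if SUBJ is strictly higher in animacy than OBJ."""
    return rank[fs["SUBJ"]["anim"]] > rank[fs["OBJ"]["anim"]]


# "Maria saw the girls." Both arguments are animate.
same_rank = {"SUBJ": {"anim": "an"}, "OBJ": {"anim": "an"}}
# A sentence with an animate subject and an inanimate object.
subj_higher = {"SUBJ": {"anim": "an"}, "OBJ": {"anim": "inan"}}

print(subject_outranks_object(same_rank))
print(subject_outranks_object(subj_higher))
```

Constructional and discourse features cannot be derived this way, since they depend on information that is not in the parse.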
Example of Combinatorics: subject-verb agreement
• five numbers (singular, plural, dual, trial, paucal)
• three genders (masculine, feminine, and neuter; more
for Bantu languages)
• four persons (first, second, third, and fourth)
• several levels of animacy (animate, inanimate, first and
second person, third person)
• two levels of definiteness (definite and indefinite)
• huge number of tenses and aspects (present, past, future,
non-past, non-future, near past, remote past, near future,
remote future, continuous, perfective, etc.).
Two steps? (1) Which features are involved? (2) Which values
are involved?
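To see why two steps may help, the size of the full cross product can be counted directly. This sketch uses only the values named in the list above (so the count ignores "etc." and the extra Bantu genders), and the dictionary keys are illustrative labels. With these values the full product is 5 × 3 × 4 × 4 × 2 × 11 = 5,280 combinations, while a step-1 selection of two features gives far fewer.

```python
from itertools import product
from math import prod

# Values as listed on the slide; the key names are illustrative.
FEATURES = {
    "number": ["singular", "plural", "dual", "trial", "paucal"],
    "gender": ["masculine", "feminine", "neuter"],
    "person": ["first", "second", "third", "fourth"],
    "animacy": ["animate", "inanimate", "first and second person", "third person"],
    "definiteness": ["definite", "indefinite"],
    "tense_aspect": [
        "present", "past", "future", "non-past", "non-future", "near past",
        "remote past", "near future", "remote future", "continuous", "perfective",
    ],
}


def count_combinations(features, selected):
    """Number of value combinations for the selected feature names."""
    return prod(len(features[name]) for name in selected)


# Testing everything at once: every feature crossed with every other.
all_count = count_combinations(FEATURES, FEATURES)

# Step 1 has found (hypothetically) that only number and person matter;
# step 2 then enumerates just those value combinations.
involved = ["number", "person"]
combos = list(product(*(FEATURES[name] for name in involved)))

print(all_count, len(combos))
```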
Example of combinatorics:
determiners and possessive pronouns
• See handout.
Current Coverage of the Elicitation Corpus
• Basic word order: intransitive verb and subject; transitive verb with subject
and object; noun phrase with determiners, adjectives, and possessors.
• Definiteness and animacy: special treatment of indefinite subjects, inanimate
subjects, definite direct objects, animate direct objects, and sentences where
the object outranks the subject in definiteness or animacy.
• Agreement (in number, gender, person, etc.): subject and verb; object and
verb; determiner and noun; adjective and noun; possessor and noun; relative
pronoun and noun.
• Possessive NPs: with inalienable possession (body parts); kinship terms;
alienable possession; pronominal possessors; full NP possessors.
• Inflectional features: gender, number, person, case, tense.
Not covered by the elicitation corpus
• Subcategorization frames for major verb classes: stative,
change of state, change of location, change of possession,
creation, filling and covering, experience, cognition,
perception, saying and telling, causatives, etc.
• Voice: active, passive, and oblique voices.
• Negation: sentences and noun phrases
• Relative clauses: inflectional features of the relative
pronoun; possible locations of the gap; headed or
unheaded, etc.
• Embedded clauses: argument clauses; adjunct clauses;
nominalized clauses.
Not Covered
• Coordination: sentences (switch reference and same subject), noun
phrases, and other constituents.
• Questions: yes-no questions (positive answer expected and negative
answer expected); open questions (possible locations of gaps).
• Other constructions: comparatives, conditionals, causatives,
desideratives, imperatives, possessor ascension, quantifier float, noun
incorporation (polysynthesis).
– Each of these has a few parameters to check: e.g., does the causee come
out in dative or accusative case; can the incorporated noun take an
unincorporated modifier; which NPs can possessors ascend
from/quantifiers float from, etc.
• Further coverage of tense, aspect, and modality: present, past, and
future time; ongoing and completed actions; punctual and non-punctual
activities; habituality; iteration; realized and non-realized.
– Cross product of these with lexical aspect: state, activity, accomplishment,
punctual.
Not Covered
• Information structure: treatment of topic (given
information) and focus (new information),
including clefted and topicalized sentences.
• Other meanings that are typically
grammaticalized: yet, still, only, distributive
(each), etc.
• Other noun phrase phenomena: quantification,
deictic determiners, classifiers, etc.