Toward the Articulation of Lexicon and Constructicon


Toward the Linking of Text
Annotations, the FrameNet
Lexicon, and an Intended
Future Constructicon
C. J. Fillmore
Berkeley
Change of Emphasis
• Departing slightly from promises made in the
abstract, I’ll be adding some discussion on
– what it would take to discover and record the
constructions found in a large English Text that is
also lexically annotated, in “Frame Semantics”
terms, and
– how one could construct an Open Source online
directory of partial descriptions of grammatical
constructions for English,
• without ignoring the promised concern for
– indicating in Lexical Entries information about the
constructions in which the words participate, and
– indicating in Construction Entries, information
about the lexical items that participate in them.
• This obviously requires constructing a single
articulated database that includes text
annotations, a frame-based Lexicon, and a
register of constructions - a Constructicon.
“FrameNet”
• Our goal in FrameNet is to document the use and
meaning of lexical units in English - especially
“frame-bearing” words - by careful examination of
attested examples taken from a very large text
Corpus.
• This means we need to find good examples of each
of the words we describe, and that requires some
attention.
Criteria for choosing
examples
• FrameNet lexicographers are told that when
they choose examples for illustrating the
meaning and use of lexical units,
– The example sentences should be structurally
simple.
– Their lexical content should illustrate the
semantic frames they realize.
– Enough examples should be collected to illustrate
all of each word’s valences - its basic
combinatorial affordances.
Why use simple examples?
Suppose we’re working on the verb accuse.
• Simple example:
– Their publisher accused me of plagiarism.
• Complex example:
– Plagiarism is something I would hate to be
accused of.
Point: The second is a perfectly good example of an
English sentence, but its complexity has nothing to
do with relevant facts about the verb accuse. It would
not be a good dictionary example of the verb.
Finding “Frame Elements”
• For words in the frame containing accuse, the
annotations we produce recognize three main roles,
and our job is to show how these are expressed in
sentences headed by the verb. We can refer to these
three roles as
– accuser [the person who does the accusing]
– accused [the person accused of wrongdoing]
– charge [the offense]
Their publisher accused me of plagiarism.
Plagiarism is something I would hate to be accused
of. (accuser unexpressed)
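As a rough illustration of what such an annotation records, the two example sentences can be represented as below. This is only a sketch: the role names come from the slide, but `Annotation`, `annotate`, and the substring check are hypothetical conveniences, not FrameNet's actual data model.

```python
from dataclasses import dataclass, field

# Role names follow the slide; everything else here is illustrative.
ROLES = ("accuser", "accused", "charge")

@dataclass
class Annotation:
    sentence: str
    target: str                                   # the frame-bearing word form
    elements: dict = field(default_factory=dict)  # role -> phrase

    def unexpressed(self):
        """Roles of the frame that have no phrase in this sentence."""
        return [role for role in ROLES if role not in self.elements]

def annotate(sentence, target, **elements):
    """Build an annotation after checking that each labeled phrase occurs in the sentence."""
    if target not in sentence:
        raise ValueError(f"target {target!r} not found")
    for role, phrase in elements.items():
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r}")
        if phrase not in sentence:
            raise ValueError(f"phrase {phrase!r} not found")
    return Annotation(sentence, target, dict(elements))

simple = annotate("Their publisher accused me of plagiarism.", "accused",
                  accuser="Their publisher", accused="me", charge="plagiarism")
complex_ = annotate("Plagiarism is something I would hate to be accused of.", "accused",
                    accused="I", charge="Plagiarism")

print(simple.unexpressed())    # []
print(complex_.unexpressed())  # ['accuser']
```

The simple sentence expresses all three roles; the complex one leaves the accuser unexpressed and scatters the other two across clauses.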
Why “frame relevant”
contexts?
They accused me of it.
Pronouns don’t tell us much about what is going
on in this sentence. Our examples are always
single sentences, and even if we could find the
antecedents of they and it in the surrounding text
that would not tell us much about the verb itself.
Their publisher accused me of plagiarism.
This has more information about the context of an
accusation and provides information about the
charge.
Why representative?
• We want examples of each valence possibility we
discover. The verb accuse has VPs of two types:
V + NP + PP[of NP]
– They accused me of theft.
• burglary, arson, perjury, murder
V + NP + PP[of VP-ing]
– They accused me of stealing their car.
• lying to the judge,
• killing their dog,
• insulting their mother
V + NP
Problems with the criteria
• Most “simple” sentences illustrating the use of a verb
are not frame-revealing, since the arguments are
mainly pronouns.
• Sentences in which all of the frame-relevant elements
are expressed in a single clause are unnatural-sounding -- the kinds of sentences linguists and
psycholinguists make up.
(“The publisher accused the author of plagiarism.”)
• Many words do not occur often enough (even in a
very large corpus) to provide simple and clear
examples of all of their affordances.
Full Text Annotation
• In general, for our lexicographic work, we tried to
steer clear of syntactically complex structures, while
knowing that we were missing the possibilities of
giving good explanations of certain lexical units.
• For reasons related to the interests of our later
funders, FrameNet activities have moved from “mere”
lexicon building, with the use of a vast Research
Corpus, to the annotation of continuous texts, letting
the examples found there provide material for lexical
analysis.
• This means that we now have to deal with
– mistakes
– ambiguities
– sentence fragments
– repetitions
– and, especially, “non-core” grammatical constructions
Constructions?
• If we’re going to start dealing with
constructions in our work, we need strategies
and principles for
– recognizing a construction when we see one,
– discovering and recording its properties, and
– convincing ourselves and our colleagues that what
we’ve found really does need the kind of
description and explanation that requires the
positing of a special construction.
• As grammarians, we feel the need to
incorporate each new construction within a
consistent and coherent generative
construction grammar; but as text analysts,
we can be (temporarily) satisfied with partial
descriptions.
• This is normal linguistics: we’ve always been
able to recognize (clear cases of), say, the
“tough construction,” but it’s taking forever to
come up with a satisfying account of it.
The Strategy
• If you find something that looks as if it can’t be
described within the framework provided by the
current state of your theory, keep trying to make it fit.
• If you have to give up, then try to see it, not as a
lonely idiom, but as an instance of some general
grammatical phenomenon, and explore such
phenomena as thoroughly as you can.
• If nothing works, then call it an idiom and add it to the
lexicon - at least for now.
Valence and Grammar
• Familiar FrameNet valences presuppose a portion of
the basic grammar.
• That is, information they provide about grammatical
functions (subject, object, complement, head,
modifier, determiner, etc.) is taken as meaning that
we know how these words behave in sentences built
up with such construction types as predication,
complementation, modification, determination, and
the like.
• [ILLUSTRATE WITH “accuse”]
• Comment on that word “core”.
THE PLAN OF THIS TALK
1. To examine a few construction types.
a) one that has fixed slots and fixed words
b) one that’s pure syntactic form
c) one whose properties are mostly hidden
2. To suggest ways of connecting lexical and
constructional information.
3. To suggest ways of annotating texts for their
constructions.
4. To propose cooperatively building a public online
construction registry for English.
1a
Case: next week
• My account will be a little fussy, since I want
to illustrate the reasons for deciding that
something is a construction, and the need to
look for its “boundaries.” So, suppose you
come upon the phrase next week in a
sentence like
Let’s finish this job next week.
• First impression: What’s the problem?
– This is a case of simple modification:
adjective next + noun week
• But wait!
– why doesn’t next week have an article?
– why doesn’t it come with a preposition?
– why does it mean what it means?
1a
What does it mean?
• The phrase next week, by itself, refers to the
calendar week which comes immediately
after the calendar week which includes ‘now’,
i.e., the moment of speaking.
• It is a deictically anchored time expression.
• Compare it to the next week. This phrasing is
anaphorically anchored and is much more
regular.
1a
Is it a simple idiom?
• If it’s an idiom, just add it to the lexicon and look for a
more interesting problem.
• But wait! We find completely analogous
interpretations with
– next month
– next year
– next semester
• So maybe it’s a construction that uses the word next
followed by a noun naming a temporal period.
1a
Restrictions
• It works fine with week, month, year, and a
few special words like semester, but
– it doesn’t work with day:
*next day
– and it doesn’t seem to work with calendric units
that are too big to figure in the life experiences of
a single individual:
*next millennium
• So we have to formulate all these restrictions
too. (Maybe.)
1a
Wait! We’re not finished.
• There are semantically and formally analogous
patterns that use, instead of next, the words this and
last,
– and they too are deictically anchored expressions,
– and they too exclude day.
– this X: the X which contains ‘now’
this week, this month, this year, *this day
– last X: the X which precedes the X containing ‘now’
last week, last month, last year, *last day
1a
What have we got so far?
• Special use of this, next and last.
– notice: this is a demonstrative, next and last are
adjectives
• combining, without prepositions or articles,
with specific words that name calendric time
periods
• forming meanings that relate these time
periods as identical to, following, or
preceding, the named period containing
‘now’.
1a
Descriptive Choices
• We could state the conditions for the
construction as generally as possible,
– regarding the exclusion of the day unit as
explained by a pre-emption:
in order to express these meanings, the words
today, yesterday and tomorrow are required,
– regarding the exclusion of century and millennium
by describing the function of the construction in
terms of the practical limits of human planning,
and
– regarding the inclusion of non-calendric terms like
semester or hour (meaning ‘class hour’ in a school
setting), as an exploitation of the system,
something that might not need to be described in
the grammar.
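The deictic interpretation and the pre-emption by today, yesterday and tomorrow can be sketched in code. This is a minimal illustration under stated assumptions: the function names are hypothetical, a calendar week is assumed to run Monday through Sunday, and only week, month and year are handled.

```python
from datetime import date, timedelta

OFFSETS = {"last": -1, "this": 0, "next": 1}

# Pre-emption: with "day", dedicated words replace the regular pattern.
PREEMPTED = {("last", "day"): "yesterday",
             ("this", "day"): "today",
             ("next", "day"): "tomorrow"}

def surface_form(modifier, unit):
    """Return the expression actually used for modifier + unit."""
    return PREEMPTED.get((modifier, unit), f"{modifier} {unit}")

def resolve(modifier, unit, now):
    """Return (first day, last day) of the calendar period, anchored at 'now'."""
    k = OFFSETS[modifier]
    if unit == "week":
        # Assumption: a calendar week runs Monday through Sunday.
        start = now - timedelta(days=now.weekday()) + timedelta(weeks=k)
        return start, start + timedelta(days=6)
    if unit == "month":
        index = now.year * 12 + (now.month - 1) + k  # months counted from year 0
        start = date(index // 12, index % 12 + 1, 1)
        following = date((index + 1) // 12, (index + 1) % 12 + 1, 1)
        return start, following - timedelta(days=1)
    if unit == "year":
        return date(now.year + k, 1, 1), date(now.year + k, 12, 31)
    raise ValueError(f"no regular deictic reading sketched for unit {unit!r}")

now = date(2024, 1, 31)  # a Wednesday, chosen only for illustration
print(resolve("next", "week", now))
print(resolve("last", "month", now))
print(surface_form("next", "day"))
```

The point of the sketch is that the period is computed from the calendar unit containing ‘now’, not from ‘now’ itself, which is what distinguishes next week from an anaphoric expression like the next week.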
1a
Are we there yet?
• No. Here are some more facts about these words:
• If we want to talk about the X that follows next X, or
the X that precedes last X, we say:
– the X after next
– the X before last
• Notice that here the words next and last, by
themselves, mean next X and last X
– the week after next, the week before last
– the month after next, the month before last
– COMPARE:
the day before yesterday, the day after tomorrow
1a
And there’s still more.
• The words this, last and next also occur with
the names of members of temporal cycles,
like
– weekday names (Monday, etc.),
– month names (January, etc.),
– season names (summer, etc.), and
– day part names (morning, etc.)
1a
• And these have regular but complicated
interpretations:
– last Friday is ‘the Friday of last week’;
– next summer is ‘the summer of next year’;
– this March is ‘the March of this year’;
– last night is ‘the night of yesterday [last day]’;
and there are various extensions, pre-emptions, and
exceptions.
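The regular part of these interpretations can be sketched as a two-step lookup: find the containing period (week or year) selected by this, last or next, then pick the named member inside it. The code below implements only the literal glosses given above; the function names are hypothetical, weeks are assumed to start on Monday, and none of the extensions, pre-emptions or exceptions are modeled.

```python
from datetime import date, timedelta

OFFSETS = {"last": -1, "this": 0, "next": 1}
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def resolve_weekday(modifier, name, now):
    """'last Friday' = the Friday of last week (weeks assumed to start on Monday)."""
    monday = now - timedelta(days=now.weekday()) + timedelta(weeks=OFFSETS[modifier])
    return monday + timedelta(days=WEEKDAYS.index(name))

def resolve_month(modifier, name, now):
    """'this March' = the March of this year; returns (year, month number)."""
    return now.year + OFFSETS[modifier], MONTHS.index(name) + 1

now = date(2024, 1, 31)  # a Wednesday, chosen only for illustration
print(resolve_weekday("last", "Friday", now))
print(resolve_month("this", "March", now))
```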
1a
Conclusions so far
• We have here a family of constructions that
make clear use of particular lexical items, in
particular combinations, that combine with
words of particular semantic types, and that
have semantic interpretations that do not
follow from anything else that we know about
the grammar of English.
– A lexicon of English has to show that these words
can have these functions in these constructions.
– A constructicon of English has to show what
words, or classes of words, can participate in each
of its constructions for which lexical membership is
specified.
– Text annotations for English should link each
word to the relevant lexical entry, and each
construction instance should be linked to the
relevant entry in a constructicon.
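One way to picture the two-way linking between lexicon and constructicon is as a pair of tables that must agree with each other. The sketch below is purely illustrative: the construction name `deictic_calendar_period`, the class label `calendar_unit`, and the entry layout are all hypothetical, not an existing FrameNet format.

```python
# All identifiers and entries below are hypothetical illustrations.
CONSTRUCTICON = {
    "deictic_calendar_period": {
        "lexical_items": {"this", "next", "last"},  # words named by the construction
        "word_classes": {"calendar_unit"},          # semantic class of the noun slot
    },
}

LEXICON = {
    "this": {"constructions": {"deictic_calendar_period"}, "classes": set()},
    "next": {"constructions": {"deictic_calendar_period"}, "classes": set()},
    "last": {"constructions": {"deictic_calendar_period"}, "classes": set()},
    "week": {"constructions": set(), "classes": {"calendar_unit"}},
    "month": {"constructions": set(), "classes": {"calendar_unit"}},
    "year": {"constructions": set(), "classes": {"calendar_unit"}},
}

def unreciprocated_links(lexicon, constructicon):
    """List (word, construction) pairs that are recorded on one side only."""
    problems = set()
    for name, cx in constructicon.items():
        for word in cx["lexical_items"]:
            if name not in lexicon.get(word, {}).get("constructions", set()):
                problems.add((word, name))
    for word, entry in lexicon.items():
        for name in entry["constructions"]:
            if word not in constructicon.get(name, {}).get("lexical_items", set()):
                problems.add((word, name))
    return sorted(problems)

def fillers(name, lexicon, constructicon):
    """Words whose semantic class is admitted by the construction."""
    classes = constructicon[name]["word_classes"]
    return sorted(w for w, e in lexicon.items() if e["classes"] & classes)

print(unreciprocated_links(LEXICON, CONSTRUCTICON))
print(fillers("deictic_calendar_period", LEXICON, CONSTRUCTICON))
```

A check like `unreciprocated_links` is the kind of consistency test a single articulated database would make possible: a word listed in a construction entry should point back to that construction, and vice versa.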
Intermediate Cases
• There are lots of constructions that people
(some of them in this room) have described
that have both lexical and grammatical-pattern requirements.
Traditional and Special
• Questions, imperatives, relative clauses,
comparatives - each of these with many types.
• Serial verbs (Goldberg), WXDY (Kay +), Let_alone
(Fillmore +), MadMagazine (Lambrecht),
Presentatives (Lakoff), Nominal Extraposition
(Michaelis +), Way Construction (Goldberg +), Away
Construction (Jackendoff), Correlative Conditional
(Lots of people), Tautologies (Wierzbicka), Just
Because (Hirose, Bender & Kathol), and dozens
more.
Adjective Negation with “no”
• It seems that the only adjectives that can be
“negated” with no are fair, good, and different.
• And these seem to be different from the
structure that has no modifying a comparative
adjective:
– no bigger than a bug
– no taller than my baby sister
– *no older than Methuselah
– *no younger than Chuck
Presentatives
• Here comes Harry, wearing my shirt.
• Here he comes, wearing my shirt.
• First part: here or there
• Second part: V+NP or Pron+V
• Verb: go, come, be, sit, stand, lie, hang
• Third part (optional): secondary predicate
1b
By contrast,
• There are some constructions that have no specified
lexical components.
• One of these is “Right Node Raising”, so-called. We
might want to call it the Shared Completion
Construction.
• Description
– a final phrase “completes” each of two truncated phrases,
– these connected by some kind of conjoining or adjoining
device
– associated with paired foci
1b
y or x…y is a conjoining (adjoining, subjoining) device
form is 0+(x)+1+y+2+3
1 and 2 offer paired foci
Interpretation: 0+1+3 {and} 0+2+3 where ‘{and}’ is
the meaning of the conjoining device
0 (Preceding Context): I wouldn’t
(x): -
1 (Part-1): touch
y: let alone
2 (Part-2): eat
3 (Completion): anything that ugly.
1b
preceding context: I wouldn’t
pre-conjunction: -
first trunc phrase: touch
conjunction: let alone
second trunc phrase: eat
completion: anything that ugly
1b
preceding context: -
pre-conjunction: -
first trunc phrase: I cooked
conjunction: and
second trunc phrase: she ate
completion: the soup
1b
preceding context: Libya has
pre-conjunction: -
first trunc phrase: shown interest in
conjunction: and
second trunc phrase: taken steps to acquire
completion: weapons of mass destruction
1b
preceding context: thus
pre-conjunction: -
first trunc phrase: reducing
conjunction: if not
second trunc phrase: eliminating
completion: the chances of detection
1b
preceding context: to pursue
pre-conjunction: not only
first trunc phrase: a more cost-effective
conjunction: but also
second trunc phrase: possibly the only real
completion: line of defense against these threats
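The form 0+(x)+1+y+2+3 and its interpretation 0+1+3 {and} 0+2+3 can be made concrete with a small sketch. The class and method names are hypothetical, an empty slot is written as an empty string, and punctuation is ignored.

```python
from dataclasses import dataclass

@dataclass
class SharedCompletion:
    context: str      # slot 0, preceding context ("" when empty)
    part1: str        # slot 1, first truncated phrase
    conj: str         # slot y, conjoining device
    part2: str        # slot 2, second truncated phrase
    completion: str   # slot 3, shared completion
    pre: str = ""     # slot (x), optional pre-conjunction such as "not only"

    @staticmethod
    def _join(*parts):
        """Join the non-empty slots with single spaces."""
        return " ".join(p for p in parts if p)

    def surface(self):
        """Form: 0 + (x) + 1 + y + 2 + 3."""
        return self._join(self.context, self.pre, self.part1,
                          self.conj, self.part2, self.completion)

    def expand(self):
        """Interpretation: 0+1+3 {and} 0+2+3, returned as (left, device, right)."""
        left = self._join(self.context, self.part1, self.completion)
        right = self._join(self.context, self.part2, self.completion)
        device = f"{self.pre}…{self.conj}" if self.pre else self.conj
        return left, device, right

soup = SharedCompletion("", "I cooked", "and", "she ate", "the soup")
ugly = SharedCompletion("I wouldn't", "touch", "let alone", "eat", "anything that ugly")

print(soup.surface())   # I cooked and she ate the soup
print(soup.expand())    # ('I cooked the soup', 'and', 'she ate the soup')
print(ugly.expand())
```

Nothing in the sketch mentions a particular word: the slots are filled by position alone, which is the sense in which the construction has no specified lexical components.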
1b
Conjunctions
• “Conjunctions” observed participating in
this construction include:
– and, or, but, both…and, either…or,
not_only…but_also, if_not, but_not,
if_even, rather_than, instead_of, let_alone.
1b
Conclusions so far
• This last construction seems to operate very
generally, interacting with almost any
grammatical device that permits the
expression of contrasting foci.
• There appears to be no reason to associate
this construction with anything in the lexicon.
• And it doesn’t seem possible to blame the
construction itself for the meaning of any of
the expressions it inhabits. It’s pure syntax.
1c
A “hidden” construction?
• Now here’s something I think is a special
construction, but it’ll be hard to convince
most of my grammarian friends.
• Components:
A. a predicate with meaning related to ‘having’
B. the word “the”
C. a noun construable as the name of a resource
D. an infinitive complement controlled by whoever is
interpreted as the subject of the ‘having’ relation,
or alternatively a Purpose phrase with “for”
1c
Examples
• I don’t have the money to take a vacation.
• We lack the staff to take on such a project.
• Where can I find the cash to buy something
that expensive?
• Do we have the resources to manage that?
• We don’t have the fuel to make it to the next
town.
• Who’ll give us the funds to do that?
1c
In each of the examples above:
• (verb with ‘having’ semantics) have, lack, find,
have, have, give
• (noun construable as resource) money, staff, cash,
resources, fuel, funds
• (complement controlled by ‘haver’) to take a
vacation; to take on such a project; to buy
something that expensive; to manage that; to
make it to the next town; to do that
1c
Mystery
• The construction allows us to explain the fact that the
sequence [the N to VP] is not a self-standing
constituent with a bounded meaning independent
of its context.
• Evidence
*I lost [the money to take a vacation].
*We spilled [the fuel to get us to the next town].
*She just fired [the staff to complete the
project].
1c
Observation
• The infinitive complement can be omitted under
conditions of definite zero anaphora (“definite null
instantiation” in FrameNet terms).
• Usually DNI is possible only when it corresponds to
an argument of some lexical unit.
– We lost.
– I’ve got an explanation.
– Let me explain.
– Who’s the father?
– When did they arrive?
1c
DNI without a lexical host?
– Are you going to take on the new project?
--No, can’t. We lack the staff.
– Can you drive me to Tokyo?
--Sorry, I don’t have the fuel.
– Can you join us in the trip to Hawaii?
--Where am I going to find the cash?
– Do you think he’s ready to face down the boss?
--Nah, he doesn’t have the guts.
1c
Mystery
• This construction allows us to explain
the definite NP in DNI-omitted cases.
Consider the ambiguity of
I don’t have the cash.
– Situation 1: there is some contextually
understood amount of cash
– Situation 2: complement omitted in
reference to some contextually understood
use to which the cash could be put
1c
Claims
• Many of the properties of this construction are
shared by enough in place of the, though
enough has possibilities not shared by the.
• Evidence
– We don’t have {enough/the} money to do that.
– We don’t have {enough/the} money for such a
project.
– We don’t have {enough/the} money.
1c
Is there a lexical solution?
• Well, we could say that the, like enough,
is a lexical item that participates in a
discontinuous modifier of a noun:
– {enough/the}…to pay for a vacation,
– {enough/the} … for that project.
How to study constructions
in texts
• A kind of annotation currently used in
FrameNet can be adapted for identifying
the phrasal and lexical members of
grammatical constructions, and new
software can be created for linking such
identifications to named entities in a
growing constructicon.
There he was, standing in the
snow with no clothes on.
LU stand: There [he] was, standing [in
the snow] [with no clothes on].
Cx#373112: [There] [he] [was], [standing
in the snow with no clothes on].
Cx#764632: There [he] was, standing in
the snow [with] [no clothes] [on].
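The three bracketed lines above are layers over one and the same sentence, so they can be converted into stand-off spans (character offsets into the plain text) that share a single text. The sketch below assumes brackets are never nested; the function names are hypothetical, and the layer labels are simply copied from the slide.

```python
def parse_brackets(annotated):
    """Split a bracketed line into plain text and (start, end) character spans."""
    plain, spans, start = [], [], None
    for ch in annotated:
        if ch == "[":
            start = len(plain)
        elif ch == "]":
            spans.append((start, len(plain)))
            start = None
        else:
            plain.append(ch)
    return "".join(plain), spans

LAYERS = {
    "LU stand": "There [he] was, standing [in the snow] [with no clothes on].",
    "Cx#373112": "[There] [he] [was], [standing in the snow with no clothes on].",
    "Cx#764632": "There [he] was, standing in the snow [with] [no clothes] [on].",
}

def standoff(layers):
    """Return the shared text and, per layer, (start, end, substring) triples."""
    text, result = None, {}
    for label, line in layers.items():
        plain, spans = parse_brackets(line)
        if text is None:
            text = plain
        elif plain != text:
            raise ValueError(f"layer {label!r} annotates a different sentence")
        result[label] = [(a, b, plain[a:b]) for a, b in spans]
    return text, result

text, marks = standoff(LAYERS)
print(text)
for label, spans in marks.items():
    print(label, [s for _, _, s in spans])
```

Keeping the spans separate from the text is what lets one lexical-unit layer and any number of construction layers coexist, each linked by its label to an entry in the lexicon or the constructicon.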
Toward an Open Source
Construction Inventory
• We’re going to seek funding for adding
construction analyses to the texts we
annotate, hoping to end up with an inventory
of the kinds of constructions that can be
found in the texts we’re working with.
• We want, of course, to be able to give names
to the constructions, and to index examples
from the annotations to the constructicon
itself.
• The description of the constructions in the
constructicon will indicate what words, or
what semantic or morphological classes of
words, are privileged participants in the
construction, and we’ll want the lexicon to
indicate for each such word which
constructions it is available for, and what
semantic or morphological class it belongs to.
• Since we’ll want to be doing this anyway, I’m
hoping to include in the proposal the means
of having people from outside the project
contribute construction descriptions of their
own, because otherwise there’s no chance
that we’ll end up with a rich enough collection
to be of use to the research community.
• The criteria for submission would include
providing one or more clear examples, from
attested and documented sources, together
with the contributor’s observations about the
details of the construction.
• This would require building an online tutorial
to explain what we’re doing, providing a web
form that would cover the things that are
needed in a construction entry, including a
place for prototypical examples.
• Since this will be a “wiki”-like resource, other
(authorized) contributors will be able to critique and
supplement existing descriptions at any time,
and there will be a forum for discussions of
the individual constructions and their
properties.
• This is all a dream, of course, but wouldn’t it
be nice if we could pull it off?
thank you