Convention, Metaphors, and Similes
How people use words to make meanings
How to compute the meaning of natural language utterances
Patrick Hanks
Professor in Lexicography
University of Wolverhampton
1
Goals of the tutorial
• To explore the relationship between meaning and
phraseology.
• To explore the relationship between conventional uses of
words and creative uses such as freshly coined metaphors.
• To discover factors that contribute to the dynamic power of
natural language, including anomalous arguments, ellipsis,
and other “exploitations” of normal usage.
2
Procedure
• We shall focus on verbs.
• We shall not invent examples.
– Instead, we shall analyse data.
– When tempted to invent an example, we try to remember to check
for a comparable one in a corpus.
• We shall look at large numbers of actual uses of a verb,
using concordances to a very large corpus.
• We shall ask questions such as:
– What patterns of normal use of this verb can we detect?
– What is the nature of a “pattern”?
– Does each pattern have a different meaning?
– What is the nature of lexical ambiguity, and why has it been so
troublesome for NLP?
3
Patterns in Corpora
• When you first open a concordance, very often some
patterns of use leap out at you.
– Collocations make patterns: one word goes with another
– To see how words make meanings, we need to analyse collocations
• The more you look, the more patterns you see.
• BUT THEN
• When you try to formalize the patterns, you start to see
more and more exceptions.
• The boundaries of ‘correct usage’ are fuzzy and there are
many outlying cases.
4
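Since collocations are the raw material of patterns, a minimal sketch may help make “collocations make patterns” concrete. The snippet below counts the words that co-occur with a node word within a fixed window; the short concordance list, the node prefix, and the window size are illustrative assumptions, and real CPA work uses a full corpus and statistical association measures rather than raw counts.

from collections import Counter
import re

# Hypothetical concordance lines for the node word 'abate' (illustrative only).
concordance = [
    "after dawn the storm suddenly abated",
    "much of the fuss has abated",
    "the dispute showed no sign of abating yesterday",
    "street violence abated and the ruling party stopped",
]

def collocates(lines, node_prefix="abat", window=3):
    """Count words occurring within `window` tokens of the node word."""
    counts = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        for i, tok in enumerate(tokens):
            if tok.startswith(node_prefix):
                neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts

print(collocates(concordance).most_common(5))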
Analysis of Meaning in Language
• Analysis based on predicate logic is doomed to failure:
– Words are NOT building blocks in a ‘Lego set’
– A word does NOT denote ‘all and only’ members of a set
– Word meaning is NOT determined by necessary and sufficient
conditions for set membership
– But precise meanings for natural language terms can be stipulated
(using the loose, vague terminology of a natural language).
• For analysing the lexicon of a natural language, a
prototype-based approach is necessary:
– mapping prototypical interpretations (‘implicatures’) onto
prototypical phraseology
– computing similarity to a prototype
5
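“Computing similarity to a prototype” can be sketched very simply: represent both the prototype and a candidate use as bags of collocates and compare them. The cosine measure and the toy collocate profiles below are assumptions for illustration, not part of the CPA apparatus itself.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two collocate frequency profiles."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical prototype profile for [[Event = Storm]] abate [NO OBJ]
storm_prototype = Counter({"storm": 8, "wind": 3, "rain": 3, "gale": 2})

# Collocate profile of a new, unseen clause (invented for illustration).
candidate = Counter({"storm": 1, "rain": 1, "evening": 1})

print(f"similarity to storm prototype: {cosine(storm_prototype, candidate):.2f}")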
The linguistic ‘double-helix’
hypothesis
• A language is a system of rule-governed behaviour.
• Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make
meanings
2. Rules governing the exploitation of norms
6
Lexicon and prototypes
• Each word is typically used in one or more
patterns of usage (valency + collocations)
• Each pattern is associated with a meaning:
– a meaning is a set of prototypical beliefs
– In CPA, meanings are expressed as ‘anchored implicatures’.
– few patterns are associated with more than one meaning.
• Corpus data enables us to discover the patterns that are
associated with each word.
7
What is a pattern? (1)
• The verb is the pivot of the clause.
• A pattern is a statement of the clause structure
(valency) associated with a meaning of a verb
– together with the typical semantic values of each
argument.
– arguments of verbs are populated by lexical sets of
collocates
• Different semantic values of arguments activate
different meanings of the verb.
8
What is a pattern? (2)
• [[Human]] fire [[Firearm]]
• [[Human]] fire [[Projectile]]
• [[Human 1]] fire [[Human 2]]
• [[Anything]] fire [[Human]] {with enthusiasm}
• [[Human]] fire [NO OBJ]
• Etc.
9
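One way to make this notation machine-usable is to store each pattern as valency slots filled by semantic types and to match a shallowly analysed clause against them. The pattern strings come from the slide above; the tiny type lexicon, the dataclass, and the implicature wordings are hypothetical illustrations, not CPA’s actual implementation.

from dataclasses import dataclass
from typing import Optional

# A toy lexicon of semantic types (an illustrative assumption, not CPA data).
SEMANTIC_TYPE = {
    "soldier": "Human", "manager": "Human", "clerk": "Human",
    "rifle": "Firearm", "pistol": "Firearm",
    "bullet": "Projectile", "missile": "Projectile",
}

@dataclass
class Pattern:
    subject: str            # semantic type expected in the subject slot
    verb: str
    obj: Optional[str]      # semantic type expected in the object slot, or None
    implicature: str

FIRE_PATTERNS = [
    Pattern("Human", "fire", "Firearm", "discharge a weapon"),
    Pattern("Human", "fire", "Projectile", "launch a projectile"),
    Pattern("Human", "fire", "Human", "dismiss from employment"),
    Pattern("Human", "fire", None, "shoot (no object stated)"),
]

def match(subject: str, verb: str, obj: Optional[str]) -> str:
    """Return the implicature of the first pattern whose slots fit the clause."""
    s_type = SEMANTIC_TYPE.get(subject)
    o_type = SEMANTIC_TYPE.get(obj) if obj else None
    for p in FIRE_PATTERNS:
        if p.verb == verb and p.subject == s_type and p.obj == o_type:
            return p.implicature
    return "no pattern matched (possible exploitation)"

print(match("manager", "fire", "clerk"))   # dismiss from employment
print(match("soldier", "fire", "rifle"))   # discharge a weapon
print(match("soldier", "fire", None))      # shoot (no object stated)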
Semantic Types and Ontology
• Items in double square brackets are semantic
types.
• Semantic types are being gathered together into a
shallow ontology.
– (This is work in progress in the current CPA project)
• Each type in the ontology will (eventually) be
populated with a set of lexical items on the basis
of what’s in the corpus under each relevant
pattern.
10
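A shallow ontology of semantic types can be sketched as a small tree in which each node may later be populated with the lexical items found under the relevant corpus patterns. The node names follow types mentioned in this tutorial; the parent-child links and the populated items are illustrative assumptions only.

# A minimal sketch of a shallow ontology: each semantic type has a parent
# and a (still empty) set of lexical items to be filled from corpus data.
ONTOLOGY = {
    "TopType":     {"parent": None,       "lexical_items": set()},
    "Event":       {"parent": "TopType",  "lexical_items": set()},
    "PhysObj":     {"parent": "TopType",  "lexical_items": set()},
    "Human":       {"parent": "PhysObj",  "lexical_items": set()},
    "Institution": {"parent": "TopType",  "lexical_items": set()},
    "Location":    {"parent": "TopType",  "lexical_items": set()},
    "Firearm":     {"parent": "PhysObj",  "lexical_items": set()},
}

def is_a(child: str, ancestor: str) -> bool:
    """True if `child` is the same as, or a descendant of, `ancestor`."""
    node = child
    while node is not None:
        if node == ancestor:
            return True
        node = ONTOLOGY[node]["parent"]
    return False

# Populate one node from corpus evidence (illustrative items).
ONTOLOGY["Firearm"]["lexical_items"].update({"rifle", "pistol", "gun"})

print(is_a("Firearm", "PhysObj"))   # True
print(is_a("Event", "Human"))       # False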
Shimmering lexical sets
• Lexical sets are not stable – not ‘all and
only’.
• Example:
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not thunderstorm, suicide.
11
Meanings and boundaries
• Boundaries of all linguistic and lexical categories
are fuzzy.
– There are many borderline cases.
• Instead of fussing about boundaries, we should
focus on identifying prototypes.
• Then we can decide what goes with what.
– Many decisions will be obvious.
– Some decisions – especially about boundary cases –
will be arbitrary.
12
Computing meaning
• Each user of a language has a “corpus” of uses stored inside
his or her head
– These are traces of utterances that the person has seen, heard, or uttered
• Each person’s mental corpus of English (etc.) is different
• What all these “mental corpora” have in common is patterns
• By analysing a huge corpus of texts computationally, we can
create a pattern dictionary that represents patterns shared by
many (most?) users of the language.
– for use by computers as well as by people.
• In a pattern dictionary, each pattern is associated with a
meaning (or a translation, or some other interpretation or fact)
13
Now comes the long, slow,
hard bit
• We haven’t built the inventory (the pattern
dictionary) yet. It is work in progress.
• We have to compile an inventory of normal
patterns.
– a Pattern Dictionary of English Verbs (pdev)
14
What are the components of a
normal context? – verbs
The apparatus for corpus pattern analysis of verbs:
• Valencies (NOT “NP VP” BUT “SPOCA”).
• Semantic values for the lexical sets in each valency slot:
[[Event]], [[Phys Obj]], [[Human]], [[Institution]],
[[Location]], etc.
– Lexical sets can be populated by cluster analysis of
corpora.
• Subvalency items (quantifiers, determiners, etc.):
– ‘Something took place’ vs.
– ‘Something took its place’.
15
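The remark that lexical sets can be populated by cluster analysis of corpora can be sketched with a toy example: group the words that occur in the same valency slot by the overlap of their other contexts. The data and the simple Jaccard-based grouping below are assumptions for illustration; real work would use much larger counts and a proper clustering algorithm.

# Toy data: for each noun seen as subject of 'abate', the set of other verbs
# it was also seen with in the corpus sample (hypothetical, for illustration).
contexts = {
    "storm":     {"rage", "hit", "pass"},
    "gale":      {"rage", "blow", "pass"},
    "rain":      {"fall", "pass", "stop"},
    "inflation": {"rise", "fall", "soar"},
    "violence":  {"erupt", "escalate", "flare"},
    "tension":   {"rise", "escalate", "ease"},
}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

def greedy_clusters(items: dict, threshold: float = 0.2) -> list:
    """Greedily group items whose context overlap exceeds the threshold."""
    clusters = []
    for word, ctx in items.items():
        for cluster in clusters:
            representative = cluster[0]
            if jaccard(ctx, items[representative]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

# Expected grouping: a weather-like set and a problem-like set emerge.
print(greedy_clusters(contexts))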
Clause Roles
• S – Subject
• P – Predicator (the verb + auxiliaries, etc.)
• O – Object (one, two, or none)
• C – Complement (co-referential with S or O)
• I am happy | a lexicographer: C(S)
• Being here makes me happy | a tourist: C(O)
• A – Adverbial
• I came to Leiden | on Wednesday
• They treated me well | respectfully |with respect
16
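To keep the SPOCA terminology concrete, here is a minimal sketch of a clause represented by its roles, using the examples from the slide. The dataclass and field names are illustrative assumptions, not part of the CPA apparatus.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Clause:
    subject: str                                          # S
    predicator: str                                       # P (verb + auxiliaries)
    objects: List[str] = field(default_factory=list)      # O (one, two, or none)
    complement: Optional[str] = None                      # C (co-referential with S or O)
    adverbials: List[str] = field(default_factory=list)   # A

# "I am happy": complement co-referential with the subject, C(S)
c1 = Clause(subject="I", predicator="am", complement="happy")

# "Being here makes me happy": complement co-referential with the object, C(O)
c2 = Clause(subject="Being here", predicator="makes",
            objects=["me"], complement="happy")

# "I came to Leiden on Wednesday": adverbials of direction and time
c3 = Clause(subject="I", predicator="came",
            adverbials=["to Leiden", "on Wednesday"])

print(c1, c2, c3, sep="\n")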
Implicatures:
taking stereotypes seriously
When a pilot files {a flight plan}, he or she informs ground
control of the intended route and obtains permission to
begin flying.
…If someone files {a lawsuit}, they activate a procedure
asking a court for justice.
When a group of people file {into a room or other place},
they walk in one behind the other.
(There are 14 such patterns for file, verb.)
17
The CPA method
• Create a sample concordance for each word
– 300-500 examples
– from a ‘balanced’ corpus (i.e. general language)
[We use the British National Corpus, 100 million words, and the
Associated Press Newswire for 1991-3, 150 million words]
– Classify every line in the sample, on the basis of its
context.
• Take further samples if necessary to establish that
a particular phraseology is conventional
• Check results against corpus-based dictionaries.
• Use introspection to interpret data, but not to
create data.
18
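The sampling step of the method can be sketched as drawing a random sample of 300-500 concordance lines and keeping a tally of how each line is classified. The generated corpus lines, the invented labels, and the 2% conventionality threshold below are hypothetical stand-ins purely to show the bookkeeping; in real CPA the classification is done by the analyst.

import random
from collections import Counter

# Hypothetical concordance: one line per corpus hit for the target verb.
all_lines = [f"concordance line {i}" for i in range(12_000)]

random.seed(0)
sample = random.sample(all_lines, 300)        # 300-500 lines, per the method

# After manual classification, each sampled line carries a pattern label.
# These labels are invented for illustration.
labels = random.choices(["pattern_1", "pattern_2", "exploitation"],
                        weights=[70, 25, 5], k=len(sample))
tally = Counter(labels)

def looks_conventional(pattern: str, counts: Counter, min_share: float = 0.02) -> bool:
    """Treat a phraseology as conventional if it accounts for at least
    `min_share` of the classified sample (the threshold is an assumption)."""
    return counts[pattern] / sum(counts.values()) >= min_share

print(tally)
print(looks_conventional("pattern_2", tally))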
In CPA, every line in the sample
must be classified
The classes are:
• Norms (normal uses in normal contexts)
• Exploitations (e.g. ad-hoc metaphors)
• Alternations
– e.g. [[Doctor]] treat [[Patient]] <> [[Medicine]] treat [[Patient]]
• Names (Midnight Storm: name of a horse, not a storm)
• Mentions (to mention a word or phrase is not to use it)
• Errors
• Unassignables
19
Sample from a concordance
(unsorted)
incessant noise and bustle had abated. It seemed everyone was up
after dawn the storm suddenly abated. Ruth was there waiting when
Thankfully, the storm had abated, at least for the moment, and
storm outside was beginning to abate, but the sky was still ominous
Fortunately, much of the fuss has abated, but not before hundreds of
, after the shock had begun to abate, the vision of Benedict's
been arrested and street violence abated, the ruling party stopped
he declared the recession to be abating, only hours before the
‘soft landing’ in which inflation abates but growth continues moderate
the threshold. The fearful noise abated in its intensity, trailed
ability. However, when the threat abated in 1989 with a ceasefire in
bag to the ocean. The storm was abating rapidly, the evening sky
ferocity of sectarian politics abated somewhat between 1931 and
storm. By dawn the weather had abated though the sea was still angry
the dispute showed no sign of abating yesterday. Crews in
20
Sorted (1): [[Event = Storm]] abate [NO OBJ]
dry kit and go again.The storm abates a bit, and there is no problem in
ling.Thankfully, the storm had abated, at least for the moment, and the
sting his time until the storm abated but also endangering his life, Ge
storm outside was beginning to abate, but the sky was still ominously o
bag to the ocean.The storm was abating rapidly, the evening sky clearin
after dawn the storm suddenly abated.Ruth was there waiting when the h
t he wait until the rain storm abated.She had her way and Corbett went
storm.By dawn the weather had abated though the sea was still angry, i
lcolm White, and the gales had abated: Yachting World had performed the
he rain, which gave no sign of abating, knowing her options were limite
n became a downpour that never abated all day.My only protection was
ned away, the roar of the wind abating as he drew the hatch closed behi
21
Sorted (2): [[Event = Problem]] abate [NO OBJ]
‘soft landing’ in which inflation abates but growth continues modera
Fortunately, much of the fuss has abated, but not before hundreds of
the threshold. The fearful noise abated in its intensity, trailed
incessant noise and bustle had abated. It seemed everyone was up
ability. However, when the threat abated in 1989 with a ceasefire in
the Intifada shows little sign of abating. It is a cliche to say that
h he declared the recession to be abating, only hours before the pub
he ferocity of sectarian politics abated somewhat between 1931 and 1
been arrested and street violence abated, the ruling party stopped b
the dispute showed no sign of abating yesterday. Crews in
22
Part of the lexical set [[Event =
Problem]] as subject of ‘abate’
From BNC: {fuss, problem, tensions, fighting, price war, hysterical
media clap-trap, disruption, slump, inflation, recession, the Mozart
frenzy, working-class militancy, hostility, intimidation, ferocity of
sectarian politics, diplomatic isolation, dispute, …}
From AP: {threat, crisis, fighting, hijackings, protests, tensions, anti-Japan fervor, violence, bloodshed, problem, crime, guerrilla attacks,
turmoil, shelling, shooting, artillery duels, fire-code violations, unrest,
inflationary pressures, layoffs, bloodletting, revolution, murder of
foreigners, public furor, eruptions, bad publicity, outbreak, jeering,
criticism, infighting, risk, crisis, …}
(All these are kinds of problem.)
23
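Lexical sets like the ones above can be harvested semi-automatically from a parsed corpus by collecting the subjects of the target verb. The sketch below assumes spaCy with the en_core_web_sm model installed; the sentences are invented stand-ins for real BNC/AP lines.

import spacy
from collections import Counter

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Invented example sentences standing in for real corpus lines.
sentences = [
    "The storm abated before dawn.",
    "Public criticism of the plan abated slowly.",
    "Inflation abated during the following year.",
]

subjects = Counter()
for doc in nlp.pipe(sentences):
    for token in doc:
        # Keep nominal subjects whose head verb is 'abate'.
        if token.dep_ == "nsubj" and token.head.lemma_ == "abate":
            subjects[token.lemma_] += 1

print(subjects)   # candidate members of the lexical sets for 'abate'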
Part of the lexical set [[Emotion =
Negative]] as subject of ‘abate’
From BNC: {anxiety, fear, emotion, rage, anger, fury, pain,
agony, feelings, …}
From AP: {rage, anger, panic, animosity, concern, …}
24
A domain-specific norm:
[[Person | Action]] abate [[Nuisance]]
(DOMAIN: Law. Register: Jargon)
o undertake further measures to abate the odour, and in Attorney Ge
us methods were contemplated to abate the odour from a maggot farm
s specified are insufficient to abate the odour then in any further
as the inspector is striving to abate the odour, no action will be
t practicable means be taken to abate any existing odour nuisance,
ll equipment to prevent, and or abate odour pollution would probabl
rmation alleging the failure to abate a statutory nuisance without
t I would urge you at least to abate the nuisance of bugles forthw
way that the nuisance could be abated, but the decision is the dec
otherwise the nuisance is to be abated.They have full jurisdiction
ion, or the local authority may abate the nuisance and do whatever
25
A more complicated verb: ‘take’
• 61 phrasal verb patterns, e.g.
[[Person]] take [[Garment]] off
[[Plane]] take off
[[Human Group]] take [[Business]] over
• 105 light verb uses (with specific objects), e.g.
[[Event]] take place
[[Person]] take {photograph | photo | snaps | picture}
[[Person]] take {the plunge}
• 18 ‘heavy verb’ uses, e.g.
[[Person]] take [[PhysObj]] [Adv[Direction]]
• 13 adverbial patterns, e.g.
[[Person]] take [[TopType]] seriously
[[Human Group]] take [[Child]] {into care}
• TOTAL: 204 patterns, and growing (but slowly)
26
Finer distinctions: ‘take + place’
Presence or absence of a determiner can determine the
meaning of the verb.
• [[Event]] take {place}: A meeting took place.
• [[Person 1]] take {[[Person 2]]’s place}:
– George took Bill’s place.
• [[Person]] take {[REFLDET] place}: Wilkinson took
his place among the greats of the game.
• [[Person=Competitor]] take {[ORDINAL] place}: The
Germans took first place.
27
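The determiner-sensitive distinctions for ‘take + place’ can be sketched as a few ordered tests on the shape of the object noun phrase. The rules and the crude string checks below are illustrative assumptions; a real implementation would work on parsed clauses rather than raw strings.

import re

ORDINALS = {"first", "second", "third", "last"}
REFL_DETS = {"his", "her", "its", "their", "my", "our", "your"}

def take_place_pattern(object_np: str) -> str:
    """Classify a 'take + place' use by the shape of its object NP."""
    tokens = object_np.lower().split()
    if tokens == ["place"]:
        return "[[Event]] take {place}  (an event happens)"
    if len(tokens) >= 2 and tokens[-1] == "place":
        det = tokens[0]
        if det in ORDINALS:
            return "[[Person=Competitor]] take {[ORDINAL] place}  (finish in that position)"
        if det in REFL_DETS:
            return "[[Person]] take {[REFLDET] place}  (assume one's position)"
        if re.fullmatch(r"[a-z]+'s", det):
            return "[[Person 1]] take {[[Person 2]]'s place}  (replace someone)"
    return "no 'take place' pattern recognised"

print(take_place_pattern("place"))          # A meeting took place.
print(take_place_pattern("Bill's place"))   # George took Bill's place.
print(take_place_pattern("his place"))      # Wilkinson took his place ...
print(take_place_pattern("first place"))    # The Germans took first place.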
Semantic type vs. contextual role
• Mr Woods sentenced Bailey to seven years | life
imprisonment
PATTERN: [[Human 1]] sentence [[Human 2]] {to [[Time Period
| Punishment]]}
• Semantic type: [[Human]]
• Contextual roles: [[Human 1 = Judge]], [[Human 2 =
Convicted Criminal]], seven years [[Time Period =
Punishment in jail]]
– Semantic type is an intrinsic semantic property of a
lexical item.
– Contextual role is extrinsic; the meaning is
imposed (activated, selected) by the context in
which the word is used.
28
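The intrinsic/extrinsic distinction can be made concrete with a tiny sketch: the semantic type comes from the lexicon entry for the word itself, while the contextual role comes from the pattern slot that the word happens to fill. The structures and names below are illustrative assumptions; only the ‘sentence’ pattern and the roles come from the slide.

from dataclasses import dataclass

# Intrinsic property: stored with the lexical item itself.
LEXICON_TYPE = {"Woods": "Human", "Bailey": "Human", "seven years": "Time Period"}

@dataclass
class Slot:
    semantic_type: str     # what the pattern requires
    contextual_role: str   # meaning imposed by this pattern on whatever fills the slot

# [[Human 1]] sentence [[Human 2]] {to [[Time Period | Punishment]]}
SENTENCE_PATTERN = {
    "subject":   Slot("Human", "Judge"),
    "object":    Slot("Human", "Convicted Criminal"),
    "adverbial": Slot("Time Period", "Punishment"),
}

def describe(filler: str, slot_name: str) -> str:
    slot = SENTENCE_PATTERN[slot_name]
    return (f"'{filler}': intrinsic type = {LEXICON_TYPE[filler]}, "
            f"contextual role (from the pattern) = {slot.contextual_role}")

print(describe("Woods", "subject"))
print(describe("Bailey", "object"))
print(describe("seven years", "adverbial"))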
Exploitations
• People don’t just say the same thing, using the
same words repeatedly.
• They also exploit normal usage in order to say
new things, or in order to say old things in new
and interesting ways.
• Exploitations include: 1) anomalous arguments; 2)
metaphor and other figures of speech; 3) ellipsis;
4) word creation.
• Exploitations are a form of linguistic creativity.
29
Exploitation type 1:
Anomalous arguments
• A rally driver urging his car through the
forest
– Normally, you urge someone (e.g. a politician)
to do something OR you urge a horse (not a
car) in a certain direction
30
Exploitation type 2:
Metaphor
• The uninspiring, fragment-bearing wind
that blows through much of Eliot's poetry
31
Norms and exploitations
• We need a theory (and a procedure) that
distinguishes the normal, conventional,
idiomatic phraseology of each word from
exploitations of those phraseological norms.
32
The biggest challenges currently
facing CPA
• Finding quick, efficient, reliable, automatic
or semi-automatic ways of populating the
lexical sets.
• Distinguishing patterns from “the general
mush of goings-on”.
• Predicting patterns by computational
analysis of data, for verbs not yet analysed.
• Dealing with freshly created metaphors and
other exploitations.
33
How is CPA different from
FrameNet?
CPA:
• finds syntagmatic criteria (patterns) for distinguishing different
meanings of polysemous words, in a “semantically shallow” way;
• proceeds word by word;
• When a verb has been analysed, the patterns are ready for use.
FrameNet:
• proceeds frame by frame, not word by word;
• analyses situations in terms of frame elements;
• does not explicitly study meaning differences of polysemous words;
• does not analyse corpus data systematically, but goes fishing in
corpora for examples in support of hypotheses;
• has no established inventory of frames;
• has no criteria for completeness of a lexical entry.
34
Goals of CPA
• To create an inventory of semantically motivated
syntagmatic patterns, so as to reduce the ‘lexical entropy’
of each word.
– A benchmark for assigning meaning to patterns in previously
unseen, untagged text
• To develop procedures for populating lexical sets by
computational cluster analysis of text corpora.
• To collect evidence for the principles that govern the
exploitations of norms.
35