Convention, Metaphors, and Similes

How to Compute the Meaning of Natural Language Utterances
Patrick Hanks
Research Institute of Information and Language Processing, University of Wolverhampton
Goals of the tutorial
• To explore the relationship between meaning and
phraseology.
• To explore the relationship between conventional uses of
words and creative uses such as freshly coined metaphors.
• To discover factors that contribute to the dynamic power of natural language, including anomalous arguments, ellipsis, and other “exploitations” of normal usage.
Procedure
• We shall focus on verbs.
– We shall not assume that the analytic procedure developed for verbs is equally suitable for nouns.
• We shall not invent examples.
– Instead, we shall analyse data.
• We shall look at large numbers of actual uses of a verb, using concordances to a very large corpus.
• We shall ask questions such as:
– What patterns of normal use of this verb can we detect?
– What is the nature of a “pattern”?
– Does each pattern have a different meaning?
– What is the nature of lexical ambiguity, and why has it been so troublesome for NLP?
Patterns in Corpora
• When you first open a concordance, very often some
patterns of use leap out at you.
– Collocations make patterns: one word goes with another
– To see how words make meanings, we need to analyse collocations
• The more you look, the more patterns you see.
• BUT
• When you try to formalize the patterns, you start to see
more and more exceptions.
• The boundaries are fuzzy and there are many outlying
cases.
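The patterns that leap out of a concordance can be approximated mechanically. Below is a minimal sketch in Python, assuming an already tokenized text and a hand-listed set of inflected forms for the verb (no lemmatizer or tagger); the functions `kwic` and `collocates` are hypothetical helpers written for illustration, not part of any CPA tool, and the sample sentence is invented.

```python
from collections import Counter

def kwic(tokens, forms, width=4):
    """Return one keyword-in-context line per occurrence of any word form in `forms`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() in forms:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in one column.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

def collocates(tokens, forms, window=3):
    """Count the words found within `window` tokens on either side of each node occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() in forms:
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j].lower()] += 1
    return counts

# Invented two-sentence text, for illustration only (not corpus data):
tokens = "The soldiers fired their rifles . The manager fired the new clerk .".split()
fire_forms = {"fire", "fires", "fired", "firing"}
for line in kwic(tokens, fire_forms):
    print(line)
print(collocates(tokens, fire_forms, window=2).most_common(3))
```

Raw window counts like these are dominated by function words such as “the”; corpus linguists normally rank collocates with an association measure (for example mutual information or log-likelihood) rather than by raw frequency.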
Analysis of Meaning in Language
• Analysis based on predicate logic is doomed to failure:
– Words are NOT building blocks in a ‘Lego set’
– A word does NOT denote ‘all and only’ members of a set
– Word meaning is NOT determined by necessary and sufficient
conditions for set membership
• Instead, a prototype-based approach to the lexicon is
necessary:
– mapping prototypical interpretations onto prototypical phraseology
The linguistic ‘double-helix’ hypothesis
• A language is a system of rule-governed behaviour.
• Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make
meanings
2. Rules governing the exploitation of norms
Exploitations
• People exploit the rules of normal usage for various
purposes:
• For economy and speed:
– Conversation is quick
– Listeners (and readers) get bored easily
– Words that are ‘obvious’ can sometimes be omitted
• To say new things (reporting discoveries)
• To say old things in new ways
• For rhetoric, humour, poetry, politics …
Lexicon and prototypes
• Each word is typically used in one or more
patterns of usage (valency + collocations)
• Each pattern is associated with a meaning:
– a meaning is a set of prototypical beliefs
– In CPA (Corpus Pattern Analysis), meanings are expressed as ‘anchored implicatures’.
– Few patterns are associated with more than one meaning.
• Corpus data enables us to discover the patterns that are
associated with each word.
What is a pattern? (1)
• The verb is the pivot of the clause.
• A pattern is a statement of the clause structure
(valency) associated with a meaning of a verb,
– together with the typical semantic values of each
argument.
– arguments of verbs are populated by lexical sets of
collocates
• Different semantic values of arguments activate
different meanings of the verb.
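The claim that different semantic values in the argument slots activate different meanings of the same verb can be sketched as a lookup keyed on the verb plus the semantic type of each argument. This is a minimal Python sketch under simplifying assumptions: only subject and direct object are modelled, the argument types are assumed to be known already, and the glosses and the name `activated_meaning` are illustrative, not CPA’s own implicatures.

```python
from typing import Dict, Optional, Tuple

# (verb, subject type, object type); None in the object position means no direct object.
PatternKey = Tuple[str, str, Optional[str]]

# The verb is the pivot; the semantic types in its argument slots select the meaning.
# Glosses are illustrative paraphrases, not CPA's published implicatures.
PATTERNS: Dict[PatternKey, str] = {
    ("fire", "Human", "Firearm"): "Human causes Firearm to discharge",
    ("fire", "Human", "Projectile"): "Human causes Projectile to be propelled",
    ("fire", "Human", "Human"): "Human 1 dismisses Human 2 from employment",
}

def activated_meaning(verb: str, subject_type: str, object_type: Optional[str]) -> Optional[str]:
    """Return the meaning activated by this combination of argument types, or None."""
    return PATTERNS.get((verb, subject_type, object_type))

print(activated_meaning("fire", "Human", "Human"))  # Human 1 dismisses Human 2 from employment
```

An exact-key lookup like this fails on any combination it has not listed, which is why matching real text has to score partial matches rather than demand identity.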
What is a pattern? (2)
• [[Human]] fire [[Firearm]]
• [[Human]] fire [[Projectile]]
• [[Human 1]] fire [[Human 2]]
• [[Anything]] fire [[Human]] {with enthusiasm}
• [[Human]] fire [NO OBJ]
• Etc.
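The pattern notation above is regular enough to parse mechanically. A minimal Python sketch, assuming that every pattern is written as subject slot, verb, then an optional object slot and optional lexical constraints in braces; `parse_pattern` and `slot_type` are hypothetical names, and the numeric index in [[Human 1]] and [[Human 2]] (which marks two different people of the same type) is simply dropped here.

```python
import re

# One token per element: a [[Type]] slot, a {lexical constraint}, [NO OBJ], or a plain word.
TOKEN = re.compile(r"\[\[[^\]]+\]\]|\{[^}]*\}|\[NO OBJ\]|\S+")
# Captures the type name, ignoring an index such as the 2 in [[Human 2]].
SLOT = re.compile(r"\[\[(.+?)(?:\s\d+)?\]\]")

def slot_type(token):
    """'[[Human 2]]' -> 'Human'; None if the token is not a semantic-type slot."""
    m = SLOT.fullmatch(token)
    return m.group(1) if m else None

def parse_pattern(text):
    """Split a subject-verb-object pattern string into its slots."""
    tokens = TOKEN.findall(text)
    subject, verb, rest = tokens[0], tokens[1], tokens[2:]
    obj, constraints = None, []
    for tok in rest:
        if tok.startswith("{"):
            constraints.append(tok[1:-1])      # e.g. 'with enthusiasm'
        elif tok.startswith("[["):
            obj = slot_type(tok)
        # '[NO OBJ]' falls through, leaving obj as None (an intransitive pattern)
    return {"subject": slot_type(subject), "verb": verb, "object": obj, "constraints": constraints}

for p in ["[[Human]] fire [[Firearm]]",
          "[[Human 1]] fire [[Human 2]]",
          "[[Anything]] fire [[Human]] {with enthusiasm}",
          "[[Human]] fire [NO OBJ]"]:
    print(parse_pattern(p))
```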
Semantic Types and Ontology
• Items in double square brackets are semantic
types.
• Semantic types are being gathered together into a
shallow ontology.
– (This is work in progress in the current CPA project.)
• Each type in the ontology will (eventually) be
populated with a set of lexical items on the basis
of what’s in the corpus under each relevant
pattern.
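A shallow ontology of semantic types can be modelled as a small is-a hierarchy whose nodes are populated with lexical items. The sketch below assumes a toy hierarchy rooted in [[Anything]]; the type names below the root, the member words, and the helpers `is_a`, `types_of` and `fits_slot` are hypothetical and are not taken from the CPA ontology.

```python
# Hypothetical fragment of a shallow ontology: each semantic type points to its parent.
PARENT = {
    "Human": "Animate",
    "Animate": "Anything",
    "Event": "Anything",
    "Artifact": "Anything",
    "Weapon": "Artifact",
    "Firearm": "Weapon",
    "Projectile": "Artifact",
}

# Each type is populated with lexical items. In CPA these would come from corpus lines
# under the relevant patterns; here they are invented for illustration.
MEMBERS = {
    "Human": {"manager", "soldier", "clerk"},
    "Event": {"meeting", "wedding", "funeral"},
    "Firearm": {"gun", "rifle", "pistol"},
    "Projectile": {"bullet", "arrow"},
}

def is_a(t, ancestor):
    """True if semantic type t is ancestor itself or lies below it in the hierarchy."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False

def types_of(word):
    """Every semantic type whose lexical set contains word (a word may belong to several)."""
    return {t for t, words in MEMBERS.items() if word in words}

def fits_slot(word, slot_type):
    """Could word fill an argument slot typed slot_type, directly or via a subtype?"""
    return any(is_a(t, slot_type) for t in types_of(word))

print(fits_slot("rifle", "Weapon"))   # True: Firearm is a kind of Weapon
print(fits_slot("wedding", "Human"))  # False
```

Keeping the hierarchy shallow means a slot typed [[Weapon]] or [[Artifact]] can still accept a word listed only under [[Firearm]], without committing to a deep taxonomy.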
Shimmering lexical sets
• Lexical sets are not stable; they are not ‘all and only’.
• Example:
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not thunderstorm or suicide, although both are events.
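One way to make a shimmering lexical set concrete is to treat membership as graded by corpus evidence rather than as a yes/no list. This Python sketch assumes the input is a list of (verb, object) pairs already extracted from concordance lines; the function names are hypothetical and the sample pairs are invented for illustration, not corpus data.

```python
from collections import Counter

def lexical_set(observations, verb):
    """Rank the object collocates of `verb` by their share of its observed uses.

    observations: (verb, object head) pairs extracted from concordance lines.
    Membership is graded: frequent fillers are prototypical, rare ones peripheral.
    """
    counts = Counter(obj for v, obj in observations if v == verb)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.most_common()}

def typicality(ranked, word):
    """Share of the slot filled by `word`; 0.0 when the corpus offers no evidence for it."""
    return ranked.get(word, 0.0)

# Invented observations for illustration only (not corpus data):
obs = [("attend", "meeting"), ("attend", "meeting"),
       ("attend", "funeral"), ("attend", "wedding")]
ranked = lexical_set(obs, "attend")
print(typicality(ranked, "meeting"))       # 0.5
print(typicality(ranked, "thunderstorm"))  # 0.0
```

No cut-off is imposed: a word such as “thunderstorm” is not formally excluded from [[Event]] in the object slot of “attend”; it simply has no evidence behind it.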
Meanings and boundaries
• Boundaries of all linguistic and lexical categories
are fuzzy.
– There are many borderline cases.
• Instead of fussing about boundaries, we should focus on identifying prototypes.
• Then we can decide what goes with what:
– Many decisions will be obvious.
– Some decisions, especially about boundary cases, will be arbitrary.
The importance of phraseology
• “Many, if not most, meanings depend on the
presence of more than one word for their
realization.” – John Sinclair
The Idiom Principle (Sinclair)
• In word use, there is tension between the “terminological tendency” and the “phraseological tendency”:
– The terminological tendency: the tendency for words
to have meaning in isolation
– The phraseological tendency: the tendency for the
meaning of a word to be activated by the context in
which it is used.
Computing meaning (1)
• Each user of a language has a “corpus” of uses stored
inside his or her head
– These are traces of utterances that the person has seen, heard, or
uttered
• Each person’s mental corpus of English (or of any other language) is different.
• What all these “mental corpora” have in common is
patterns
• By analysing a huge corpus of texts computationally, we
can create a pattern dictionary for use by computers as well
as by people.
• In a pattern dictionary, each pattern is associated with a
meaning (or a translation, or other implicature)
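Building a pattern dictionary from corpus data can be sketched as counting how often each verb’s concordance lines were assigned to each pattern. The sketch assumes the lines have already been classified by hand (a list of (verb, pattern) pairs) and that an implicature has been written for each pattern; `build_pattern_dictionary` is a hypothetical name, the sample annotations are invented, and the resulting shares are only relative frequencies within that sample.

```python
from collections import Counter, defaultdict

def build_pattern_dictionary(tagged_lines, implicatures):
    """Map each verb to its patterns, most frequent first, each with an implicature and share.

    tagged_lines: (verb, pattern) pairs, one per concordance line already classified by hand.
    implicatures: pattern -> implicature (or translation) written for that pattern.
    """
    by_verb = defaultdict(Counter)
    for verb, pattern in tagged_lines:
        by_verb[verb][pattern] += 1
    dictionary = {}
    for verb, counts in by_verb.items():
        total = sum(counts.values())
        dictionary[verb] = [
            {"pattern": p, "implicature": implicatures[p], "share": n / total}
            for p, n in counts.most_common()
        ]
    return dictionary

# Usage with invented annotations (not real corpus counts):
lines = [("fire", "[[Human]] fire [[Firearm]]")] * 3 + [("fire", "[[Human 1]] fire [[Human 2]]")]
gloss = {
    "[[Human]] fire [[Firearm]]": "[[Human]] causes [[Firearm]] to discharge",
    "[[Human 1]] fire [[Human 2]]": "[[Human 1]] dismisses [[Human 2]] from employment",
}
entry = build_pattern_dictionary(lines, gloss)["fire"]
print(entry[0]["pattern"], entry[0]["share"])   # [[Human]] fire [[Firearm]] 0.75
```

The share stored with each pattern is what later lets a matcher prefer the more probable reading when two patterns fit a new sentence equally well.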
Computing meaning (2)
• When processing unseen text, the computer compares the actual use of each verb in the text with the inventory of patterns in the pattern dictionary, using information about a) valency and b) semantic types of collocates.
• Exact matches are not to be expected.
• Best match wins: the pattern dictionary provides the most probable meaning (or translation) of the word in context.
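The best-match procedure can be sketched as scoring each candidate pattern by how many of its constraints (valency, subject type, object type) the actual clause satisfies, with the pattern’s corpus frequency breaking ties. This is a minimal Python sketch, not the CPA matcher: the toy ontology, the equal weighting of constraints, the prior values, and the names `match_score` and `best_match` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy ontology: child type -> parent type, rooted in "Anything".
PARENT = {
    "Human": "Anything",
    "Artifact": "Anything",
    "Weapon": "Artifact",
    "Firearm": "Weapon",
    "Projectile": "Artifact",
}

def is_a(t: Optional[str], ancestor: str) -> bool:
    """True if type t is ancestor or a descendant of it."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False

@dataclass(frozen=True)
class Pattern:
    subject: str            # expected semantic type of the subject
    obj: Optional[str]      # expected semantic type of the object; None means [NO OBJ]
    meaning: str
    prior: float            # hypothetical relative frequency of the pattern

def match_score(p: Pattern, subj: Optional[str], obj: Optional[str]) -> int:
    """Number of constraints satisfied: valency, subject type, object type (0 to 3)."""
    score = 0
    if (p.obj is None) == (obj is None):      # object present in both, or absent in both
        score += 1
    if subj is not None and is_a(subj, p.subject):
        score += 1
    if p.obj is not None and obj is not None and is_a(obj, p.obj):
        score += 1
    return score

def best_match(patterns: List[Pattern], subj: Optional[str], obj: Optional[str]) -> Pattern:
    """Highest constraint score wins; the prior breaks ties. No exact match is required."""
    return max(patterns, key=lambda p: (match_score(p, subj, obj), p.prior))

FIRE = [
    Pattern("Human", "Firearm", "discharge a firearm", 0.4),
    Pattern("Human", "Projectile", "shoot a projectile", 0.2),
    Pattern("Human", "Human", "dismiss from employment", 0.3),
    Pattern("Human", None, "shoot (no object)", 0.1),
]

print(best_match(FIRE, "Human", "Human").meaning)   # dismiss from employment
print(best_match(FIRE, None, "Weapon").meaning)     # discharge a firearm (inexact; prior decides)
```

Because the winner is the highest-scoring pattern rather than an exact match, a clause whose subject type is unknown and whose object is typed only as [[Weapon]] still receives a reading. A clause that scores below the maximum against every pattern could plausibly be flagged as a candidate exploitation, but that threshold is likewise an assumption.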