CPA for verbs - Patrick Hanks

Corpus Pattern Analysis
(CPA)
Patrick Hanks
Research Institute of Information and
Language Processing,
University of Wolverhampton
1
Patterns in Corpora
• When you first open a concordance, very often some
patterns of use leap out at you.
– Collocations make patterns: one word goes with another
– Each pattern is associated with a meaning
– To see how words make meanings, we need to analyse collocations
• The more you look, the more patterns you see.
• BUT
• When you try to formalize the patterns, you start to see
more and more exceptions.
• The boundaries are fuzzy and there are many outlying
cases.
2
Analysis of Meaning in Language
• Analysis based on predicate logic is doomed to failure:
– Words are NOT building blocks in a ‘Lego set’
– A word does NOT denote ‘all and only’ members of a set
– Word meaning is NOT determined by necessary and sufficient
conditions for set membership
• Instead, a prototype-based approach to the lexicon is
necessary:
– mapping prototypical interpretations onto prototypical phraseology
– classifying unusual uses (unusual syntax, unusual collocations) for
what they are: exploitations of normal patterns of word use.
3
The linguistic ‘double-helix’
hypothesis
• A language is a system of rule-governed behaviour.
BUT:
• Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make
meanings
2. Rules governing the exploitation of norms
4
Exploitations
• People exploit the rules of normal usage for
various purposes:
• For economy and speed:
– Conversation is quick
– Listeners (and readers) get bored easily
– Words that are ‘obvious’ are often omitted
• So ellipsis is also a form of exploitation
• To say new things (reporting discoveries)
• To say old things in new ways
• For rhetoric, humour, poetry, politics …
– To grab the listeners’ (or readers’) attention
5
Lexicon and prototypes
• Each word in a language (more precisely: each
content word) is typically used in one or more
patterns of usage (valency + collocations)
– Function words and inflections are the ‘glue’ that holds
the content words together.
• Each pattern is associated with a meaning:
– A meaning is a set of prototypical beliefs.
– In CPA, meanings are expressed as ‘anchored implicatures’.
– Few patterns are associated with more than one meaning.
• Corpus data enables us to discover the patterns that are
associated with each word.
6
What is a pattern? (1)
• The verb is the pivot of the clause.
– A verb pattern is a statement of the clause structure
(valency) associated with a meaning of a verb
– Clause structure: SPOCA
– Subject, Predicator, Object, Complement (co-referential with
S or O), and/or Adverbial [a.k.a. Adjunct, a.k.a.
Prepositional Object]
– together with the typical (prototypical, stereotypical)
semantic values of each argument.
• Different semantic values of arguments (subject, object,
prepositional object) activate different meanings of the verb.
• To get the meaning of a clause, it is necessary to correlate
the arguments, then map them onto patterns.
7
What is a pattern? (2)
• Some patterns for the verb fire:
– [[Human]] fire [[Firearm]]
– [[Human]] fire [[Projectile]]
– [[Firearm]] fire [[Projectile]]
– [[Human 1]] fire [[Human 2]]
– [[Anything]] fire [[Human]] {with {enthusiasm}}
– [[Human]] fire [NO OBJ]
• Etc. (PDEV has 14 patterns for the verb fire)
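One way to make the patterns above concrete is to store each one as a small record and look up which records fit the semantic types of a clause's subject and object. The sketch below is illustrative only: the Pattern class, the match function and the gloss strings are assumptions made for this example, not PDEV's data model or wording, and it assumes the semantic types of the arguments are already known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pattern:
    subject: str         # semantic type of the subject
    obj: Optional[str]   # semantic type of the object; None stands for [NO OBJ]
    gloss: str           # rough paraphrase (hypothetical wording, not PDEV's)


# Five of the patterns for 'fire' listed above; the {with {enthusiasm}}
# pattern is left out because it also needs an adverbial slot.
FIRE_PATTERNS = [
    Pattern("Human", "Firearm", "discharge a weapon"),
    Pattern("Human", "Projectile", "shoot a projectile"),
    Pattern("Firearm", "Projectile", "a weapon discharges a projectile"),
    Pattern("Human", "Human", "dismiss from employment"),
    Pattern("Human", None, "shoot, with no object stated"),
]


def match(subject_type: str, object_type: Optional[str]) -> List[Pattern]:
    """Return every stored pattern whose subject and object types both fit."""
    return [
        p for p in FIRE_PATTERNS
        if p.subject == subject_type and p.obj == object_type
    ]
```

For example, match("Human", "Human") returns only the 'dismiss from employment' pattern, while match("Event", "Human") returns an empty list, which flags the clause as a candidate exploitation or an unlisted pattern.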
8
Semantic Types and the CPA
Shallow Ontology
• Items in double square brackets are semantic types.
• Semantic types are arranged hierarchically in a shallow
ontology.
• Each type in the ontology is populated with a set of lexical
items on the basis of what’s found in the corpus under each
relevant pattern.
• The ontology is corpus-driven, not speculative.
– (This is work in progress in the PDEV project)
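A shallow ontology of this kind can be sketched as a child-to-parent table plus a lexical set for each type. The hierarchy and most of the example nouns below are assumptions made for illustration; they are not copied from the PDEV ontology.

```python
from typing import Dict, List, Optional, Set

# Child type -> parent type (hypothetical fragment of a shallow hierarchy).
PARENT: Dict[str, str] = {
    "Phys Obj": "Anything",
    "Event": "Anything",
    "Location": "Anything",
    "Human": "Phys Obj",
    "Firearm": "Phys Obj",
    "Projectile": "Phys Obj",
}

# Nouns assigned to each type on the basis of corpus evidence (hypothetical).
LEXICAL_SETS: Dict[str, Set[str]] = {
    "Firearm": {"gun", "rifle", "pistol"},
    "Projectile": {"bullet", "arrow", "missile"},
    "Event": {"meeting", "wedding", "funeral"},
}


def is_a(sem_type: Optional[str], ancestor: str) -> bool:
    """True if sem_type is the ancestor itself or lies below it."""
    while sem_type is not None:
        if sem_type == ancestor:
            return True
        sem_type = PARENT.get(sem_type)  # None once the root is passed
    return False


def types_of(noun: str) -> List[str]:
    """All semantic types whose lexical set contains the noun."""
    return sorted(t for t, nouns in LEXICAL_SETS.items() if noun in nouns)
```

Because the lexical sets are filled from what the corpus shows, a noun with no evidence (for instance types_of("thunderstorm") here) simply returns an empty list rather than being forced into a type.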
9
Shimmering lexical sets
• Lexical sets are not stable – not ‘all and only’.
• Example:
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not thunderstorm, suicide.
– ALSO, people attend a school, a clinic, etc.
• School and clinic are [[Location]]s not [[Event]]s, but:
• You attend a school or a clinic because of the
[[Event]]s that take place there.
10
Meanings and boundaries
• Boundaries of linguistic and lexical categories are
fuzzy.
– There are many borderline cases.
• Instead of fussing about boundaries, we need to
focus on identifying prototypes.
• Then we can decide what goes with what:
– Many decisions will be obvious.
– Some decisions – especially about boundary cases –
will be arbitrary.
11
The Idiom Principle (Sinclair)
• According to John Sinclair, in word use there is
tension between the “terminological tendency”
and the “phraseological tendency”:
– The terminological tendency: the tendency for words
to have meaning in isolation
– The phraseological tendency: the tendency for the
meaning of a word to be activated by the context in
which it is used.
12
Verbs vs. nouns
• “Many, if not most, meanings depend on the
presence of more than one word for their
realization.” – John Sinclair
• Semi-prefabricated chunks (Alison Wray:
formulaic language)
– The meaning of a verb is largely determined by the
semantic values of its arguments.
– Predicative adjectives (glad, afraid) and event nouns
(distribution, blow) operate like verbs
– The meanings of noun-y nouns and attributive
adjectives are determined very differently.
• A plug is not a socket.
13
A crucial difference
• Scientific concepts and stipulative terminology:
– Neat, tidy, orderly, lifeless.
– If word meanings were governed by necessary
conditions, you couldn’t use existing words to say new
things.
• Word meanings:
– Messy, chaotic, dynamic.
– It’s the ‘looseness of fit’ that enables us to use existing
words to say new things.
14
What are the components of a
normal context? – Verbs
Apparatus for corpus pattern analysis of verbs:
• Valencies (NOT “NP VP” BUT “SPOCA”).
• Semantic types for the lexical sets in each valency slot:
[[Event]], [[Phys Obj]], [[Human]], [[Location]], etc.
– Lexical sets are populated by nouns – through cluster
analysis of large corpus samples.
• Subvalency items (quantifiers, determiners, etc.) may be
part of the pattern – determining the meaning of the clause:
– ‘Something took place’ [= an event] vs.
– ‘Something took its place’ [= a physical or abstract object]
15
SPOCA
• For CPA of verbs and predicative adjectives, we need a
grammar of clause roles (also known as “lexical
functions”). This is SPOCA:
• Subject (noun): 1
• Predicator (verb): the pivot of the clause.
• Object (noun): 0, 1, or [with verbs of giving] 2
• Complement: noun or adj. [co-ref. with Subj. or Obj.]
– EG She is happy; she is president; they elected her president.
• Adverbial [also known as Adjunct]: 0, 1, or many
– Some Adverbials are meaning-determining [EG They treated her
badly / with respect]
– Others are optional extras [EG They treated her in hospital / with
penicillin]
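The slot counts above can be written down as a small record type. This is a minimal sketch, assuming a hypothetical Clause class with plain strings for each role; it encodes only the cardinalities stated on this slide (one Subject, one Predicator, up to two Objects, an optional Complement, any number of Adverbials).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    subject: str                      # exactly one Subject (required field)
    predicator: str                   # the verb, pivot of the clause
    objects: List[str] = field(default_factory=list)      # 0, 1 or 2
    complement: Optional[str] = None  # co-referential with Subject or Object
    adverbials: List[str] = field(default_factory=list)   # 0, 1 or many

    def __post_init__(self) -> None:
        # Only verbs of giving take two objects; more than two is rejected.
        if len(self.objects) > 2:
            raise ValueError("a clause has at most two objects")


# Two of the slide's own examples, analysed by hand.
elected = Clause("they", "elected", objects=["her"], complement="president")
treated = Clause("they", "treated", objects=["her"], adverbials=["with respect"])
```

Whether an Adverbial such as 'with respect' is meaning-determining is not something this record can decide; that judgement belongs to the pattern, not to the clause structure.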
16
Do words have meaning?
• What’s the meaning of blow?
• What’s the meaning of file?
• What’s the meaning of abate?
• What’s the meaning of treat?
17
Implicatures:
taking prototypes seriously
When a pilot files a flight plan, he or she informs ground
control of the intended route and obtains
permission to begin flying.
…If someone files a lawsuit, they activate a procedure asking
a court for justice to make a decision about some action.
When a group of people file into a room or other place, they
walk in one behind the other.
(PDEV identifies 14 prototypical patterns for file, verb, but
the distinctions are arbitrary. It would be equally plausible
to argue in favour of twice as many patterns for file.)
18
Implicatures vary according to
context
• Peter treated Mary. [He’s a doctor (or a generous chap)]
• Peter treated Mary with antibiotics. [Definitely a doctor]
• Peter treated Mary badly. [May or may not be a doctor]
• Peter treated Mary with respect. [Probably not a doctor]
• Peter treated Mary to a fancy dinner. [Generous chap]
• Peter treated Mary to his views on Jeremy Corbyn. [Ironic
implication of generosity]
• Peter treated the woodwork with creosote. [None of the
above]
19
Cognitive salience and
social salience
What is the primary implicature of Peter
treated Mary?
Cognitively salient interpretation: he bought
her lunch.
Socially salient interpretation: he was a health
professional, attending to her injuries or
illness.
20
Sample from a concordance
(unsorted)
incessant noise and bustle had abated. It seemed everyone was up
after dawn the storm suddenly abated. Ruth was there waiting when
Thankfully, the storm had abated, at least for the moment, and
storm outside was beginning to abate, but the sky was still ominous
Fortunately, much of the fuss has abated, but not before hundreds of
, after the shock had begun to abate, the vision of Benedict's
been arrested and street violence abated, the ruling party stopped
he declared the recession to be abating, only hours before the
‘soft landing’ in which inflation abates but growth continues moderate
the threshold. The fearful noise abated in its intensity, trailed
ability. However, when the threat abated in 1989 with a ceasefire in
bag to the ocean. The storm was abating rapidly, the evening sky
ferocity of sectarian politics abated somewhat between 1931 and
storm. By dawn the weather had abated though the sea was still angry
the dispute showed no sign of abating yesterday. Crews in
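A first mechanical pass over an unsorted sample like this one is to split each line around the node word and sort on a neighbouring word. The sketch below is an assumption about how such a helper might look, not the tool used in CPA; the regular expression covers only the four forms of 'abate' that appear in the sample.

```python
import re
from typing import List, Tuple

# Matches abate, abates, abated, abating as whole words.
NODE = re.compile(r"\babat(?:e|es|ed|ing)\b")

SAMPLE = [
    "after dawn the storm suddenly abated. Ruth was there waiting when",
    "Fortunately, much of the fuss has abated, but not before hundreds of",
    "ability. However, when the threat abated in 1989 with a ceasefire in",
    "the dispute showed no sign of abating yesterday. Crews in",
]


def split_kwic(line: str) -> Tuple[str, str, str]:
    """Split a concordance line into left context, node and right context."""
    m = NODE.search(line)
    if m is None:
        raise ValueError("no form of 'abate' in line")
    return line[:m.start()].rstrip(), m.group(), line[m.end():].lstrip()


def left_word(line: str) -> str:
    """Lower-cased word immediately to the left of the node."""
    left, _, _ = split_kwic(line)
    words = re.findall(r"[A-Za-z'-]+", left)
    return words[-1].lower() if words else ""


def sort_by_left(lines: List[str]) -> List[str]:
    """Order lines alphabetically by the word left of the node."""
    return sorted(lines, key=left_word)
```

Sorting on one neighbouring word brings some subjects together (threat abated) but not others (storm suddenly abated, fuss has abated), so grouping lines by the semantic type of the subject still needs human judgement.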
21
Sorted (1): [[Event = Storm]] abate [NO OBJ]
DOMAIN: Weather
dry kit and go again.The storm abates a bit, and there is no problem in
ling.Thankfully, the storm had abated, at least for the moment, and the
sting his time until the storm abated but also endangering his life, Ge
storm outside was beginning to abate, but the sky was still ominously o
bag to the ocean.The storm was abating rapidly, the evening sky clearin
after dawn the storm suddenly abated.Ruth was there waiting when the h
t he wait until the rain storm abated.She had her way and Corbett went
storm.By dawn the weather had abated though the sea was still angry, i
lcolm White, and the gales had abated: Yachting World had performed the
he rain, which gave no sign of abating, knowing her options were limite
n became a downpour that never abated all day.My only protection was
ned away, the roar of the wind abating as he drew the hatch closed behi
22
Sorted (2): [[Event = Problem]] abate [NO OBJ]
Domain: Social Interaction
‘soft landing’ in which inflation abates but growth continues modera
Fortunately, much of the fuss has abated, but not before hundreds of
the threshold. The fearful noise abated in its intensity, trailed
incessant noise and bustle had abated. It seemed everyone was up
ability. However, when the threat abated in 1989 with a ceasefire in
the Intifada shows little sign of abating. It is a cliche to say that
h he declared the recession to be abating, only hours before the pub
he ferocity of sectarian politics abated somewhat between 1931 and 1
been arrested and street violence abated, the ruling party stopped b
the dispute showed no sign of abating yesterday. Crews in
23
Sorted (3): [[Emotion = Negative]] abate [NO OBJ]
DOMAIN: Human Emotion
ript on the table and his anxiety abated a little.This talented, if
that her initial awkwardness had abated # for she had never seen a
es if some inner pressure doesn't abate.He wanted to play at the fun
Baker in the foyer and my anxiety abated.He seemed disappointed and
hained at the time.When the agony abated he was prepared to laugh wi
self; the pain gradually began to abate spontaneously, a great relie
ght, after the shock had begun to abate, the vision of Benedict's sn
y calm, control it!) The fear was abating, the trembling beginning t
his dark eyes. That fear did not abate when, briefly, he halted. For
AN EXPLOITATION OF THIS NORM:
isapproval, his kindlier feelings abated, to be replaced by a resurg
(“kindlier feelings” are normally positive, not negative.)
24
A domain-specific norm:
[[Person | Action]] abate [[Nuisance]]
DOMAIN: Law, REGISTER: Jargon
o undertake further measures to abate the odour, and in Attorney Ge
us methods were contemplated to abate the odour from a maggot farm
s specified are insufficient to abate the odour then in any further
as the inspector is striving to abate the odour, no action will be
t practicable means be taken to abate any existing odour nuisance,
ll equipment to prevent, and or abate odour pollution would probabl
rmation alleging the failure to abate a statutory nuisance without
t I would urge you at least to abate the nuisance of bugles forthw
way that the nuisance could be abated, but the decision is the dec
otherwise the nuisance is to be abated.They have full jurisdiction
ion, or the local authority may abate the nuisance and do whatever
25
Part of the lexical set [[Event =
Problem]] as subject of ‘abate’
From BNC: {fuss, problem, tensions, fighting, price war, hysterical
media clap-trap, disruption, slump, inflation, recession, the Mozart
frenzy, working-class militancy, hostility, intimidation, ferocity of
sectarian politics, diplomatic isolation, dispute, …}
From AP: {threat, crisis, fighting, hijackings, protests, tensions, anti-Japan fervor, violence, bloodshed, problem, crime, guerrilla attacks,
turmoil, shelling, shooting, artillery duels, fire-code violations, unrest,
inflationary pressures, layoffs, bloodletting, revolution, murder of
foreigners, public furor, eruptions, bad publicity, outbreak, jeering,
criticism, infighting, risk, crisis, …}
(All these are kinds of problem.)
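The two lists can be compared directly as sets. The sketch below uses only a subset of the nouns quoted above (both source lists are themselves partial), so the figures are illustrative rather than a result; the function name jaccard is just a label chosen for this example.

```python
from typing import Set

# Subsets of the subjects of 'abate' quoted above for each corpus.
BNC: Set[str] = {
    "fuss", "problem", "tensions", "fighting", "price war", "disruption",
    "slump", "inflation", "recession", "hostility", "intimidation", "dispute",
}
AP: Set[str] = {
    "threat", "crisis", "fighting", "hijackings", "protests", "tensions",
    "violence", "bloodshed", "problem", "crime", "turmoil", "unrest",
    "layoffs", "criticism", "risk",
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


shared = BNC & AP        # nouns attested in both corpora
only_bnc = BNC - AP      # nouns attested only in the BNC subset
overlap = jaccard(BNC, AP)
```

In this subset only problem, tensions and fighting occur in both lists, yet every item is still a kind of problem: the set is held together by its semantic type, not by a fixed membership list.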
26
The CPA method
• Create a sample concordance for each word
– 250-500 examples
– from a ‘balanced’ corpus (i.e. general language)
[We use the British National Corpus, 100 million words]
– Classify every line in the sample, on the basis of its
context.
• Take further samples if necessary to establish that
a particular phraseology is conventional
• Check results against corpus-based dictionaries.
• Use introspection to interpret data, but not to
create data.
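The sampling step can be sketched as follows, assuming the concordance lines for a verb have already been extracted into a list. The function name, the default size of 250 and the fixed seed are assumptions chosen for the example; the seed only makes the sample reproducible so that later annotators see the same lines.

```python
import random
from typing import List, Sequence


def sample_concordance(lines: Sequence[str], size: int = 250, seed: int = 0) -> List[str]:
    """Return a reproducible random sample of concordance lines.

    If the verb has no more than `size` lines, every line is returned.
    """
    if len(lines) <= size:
        return list(lines)
    rng = random.Random(seed)          # private generator, fixed seed
    return rng.sample(list(lines), size)
```

Taking a further sample, as the method allows, then amounts to calling the function again with a different seed.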
27
In CPA, classification of every line
in the sample must be attempted
The classes are:
• Norms (normal uses in normal contexts)
• Exploitations (e.g. ad-hoc metaphors)
• Alternations
– e.g. [[Doctor]] treat [[Patient]] <> [[Medicine]] treat [[Patient]]
• Not classified:
– Names (Midnight Storm: name of a horse, not a storm)
– Mentions (to mention a word or phrase is not to use it)
– Errors
– Unassignables
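The classes above map naturally onto an enumeration, with a tally to check that every line in the sample has received a tag. This is a minimal sketch: the names LineClass, NOT_CLASSIFIED and summarise are assumptions for the example, and the tags themselves would come from a human annotator.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class LineClass(Enum):
    NORM = "norm"
    EXPLOITATION = "exploitation"
    ALTERNATION = "alternation"
    NAME = "name"
    MENTION = "mention"
    ERROR = "error"
    UNASSIGNABLE = "unassignable"


# The four tags grouped under 'Not classified' on the slide.
NOT_CLASSIFIED = {
    LineClass.NAME, LineClass.MENTION, LineClass.ERROR, LineClass.UNASSIGNABLE,
}


def summarise(tags: Iterable[LineClass]) -> Tuple[Counter, int]:
    """Count tags and return (per-class counts, number of classified lines)."""
    counts = Counter(tags)
    classified = sum(n for c, n in counts.items() if c not in NOT_CLASSIFIED)
    return counts, classified
```

For a hypothetical sample tagged as eight norms, one exploitation and one name, summarise reports nine classified lines out of ten.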
28
Corpus analysis of ‘shower’,
verb
• Go to corpus and select ‘shower’ v.
• Does Sketch Engine help?
___
• Look at PDEV, ‘shower’ v.
• Compare the entries in existing dictionaries: OED,
(N)ODE, COED, OALDCE.
• Are they all mutually compatible?
– Do they have to be?
29
The Pattern Dictionary of English
Verbs
• http://www.pdev.org.uk/
– freely available – no login, no subscription.
– There are approximately 5600 verbs (“base verbs”) in
normal use in English.
– Phrasal verbs and idioms are analysed simply as
patterns of the base verb.
– At the time of writing we have completed pattern
analysis of 1200 English verbs.
30