PowerPoint Presentation - When Corpus Meets Theory

The DVC project: Disambiguation
of Verbs by Collocation
____
an introduction to the linguistic theory of
norms and exploitations
Patrick Hanks
Research Institute of Information and Language Processing,
University of Wolverhampton
1
Words are very ambiguous;
dictionaries are misleading
• In any dictionary, more than one sense is usually given for each
word.
– Often, many senses.
– For example, in MWALED (Merriam-Webster’s Advanced Learner’s English
Dictionary), the verb blow has 12 senses, plus 6 subsenses, plus 7 phrasal verbs
(each with between 1 and 6 senses), plus 15 idiomatic phrases.
– The noun is even more complicated.
• Dictionaries do not tell the user (a learner or a programmer) how
to distinguish one sense of a word from another.
• WSD (word sense disambiguation) projects in NLP, using
dictionaries, have failed, according to leaders in the field (e.g.
Ide and Wilks 2006).
2
Phraseological patterns of word use
• Most utterances consist of words used in familiar patterns, e.g.:
– The wind was blowing from the east;
– the wind blew the napkin off the table;
– the referee blew his whistle for the end of the match;
– he blew his nose.
– They blew up the bridge;
– the bridge blew up.
• These are examples of phraseological ‘norms’ associated with blow.
• Unconsciously, ordinary language users repeat the same norms
(patterns) over and over again, with minor variations in the
various slots in the patterns.
– e.g. ‘east’ alternates with ‘west’, ‘north’, ‘south’, etc.
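The idea of a fixed pattern with a variable slot can be sketched in a few lines of Python. This is only an illustration, not the DVC project's own tooling: the names `DIRECTIONS`, `build_norm_regex`, and `direction_slot` are hypothetical, the lexical set contains just the four words named on the slide, and the list of verb forms is an assumption.

```python
import re

# Hypothetical lexical set for the direction slot. The slide names only
# 'east', 'west', 'north', 'south' and leaves the rest as "etc."
DIRECTIONS = {"east", "west", "north", "south"}


def build_norm_regex(lexical_set):
    """Compile a regex for 'the wind <form of blow> from the <direction>'."""
    slot = "|".join(sorted(re.escape(word) for word in lexical_set))
    return re.compile(
        r"\bthe wind (?:blew|blows|was blowing|is blowing) from the "
        rf"(?P<direction>{slot})\b",
        re.IGNORECASE,
    )


WIND_NORM = build_norm_regex(DIRECTIONS)


def direction_slot(sentence):
    """Return the slot filler if the sentence instantiates the norm, else None."""
    match = WIND_NORM.search(sentence)
    return match.group("direction").lower() if match else None
```

Calling `direction_slot("The wind was blowing from the east;")` picks out the slot filler, while a sentence built on a different norm, such as "He blew his nose.", yields `None`.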
3
Patterns are unambiguous
• Unlike words, patterns are unambiguous.
• ‘He blew up a bridge’ and ‘He blew up a balloon’ have quite
distinct, unambiguous meanings
– even though the words blow, bridge, and balloon can all be ambiguous
when taken in isolation, out of context.
– The verb is the pivot of the clause.
– Each verb is associated with one or more stereotypical phraseological
patterns.
• For NLP and language teaching alike, there is a great need for a
dictionary or inventory of normal phraseological patterns.
• A pattern is a statistical probability, not a cut-and-dried certainty.
• The aim must be to inventorize all normal usage, not all possible
usage.
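One way to read "a pattern is a statistical probability" is as a relative frequency over labelled corpus lines. The sketch below assumes each concordance line for blow has already been hand-labelled with a pattern name. The labels, the counts, and the 0.15 cut-off are invented for illustration and are not taken from the DVC project.

```python
from collections import Counter

# Invented toy sample: one pattern label per concordance line for 'blow'.
# Neither the labels nor the counts come from real corpus data.
LABELLED_LINES = (
    ["wind blows"] * 4
    + ["blow up [structure]"] * 3
    + ["blow [nose]"] * 2
    + ["unclassified"] * 1
)


def pattern_probabilities(labels):
    """Relative frequency of each pattern label in the sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def normal_patterns(labels, threshold=0.15):
    """Labels whose relative frequency reaches the (assumed) threshold,
    most frequent first. The threshold is an illustrative choice."""
    probs = pattern_probabilities(labels)
    kept = [(label, p) for label, p in probs.items() if p >= threshold]
    return [label for label, _ in sorted(kept, key=lambda item: -item[1])]
```

With this toy sample, rare or one-off uses fall below the cut-off and stay out of the inventory, which mirrors the aim of recording normal usage rather than every possible usage.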
4
Norms and exploitations
• The DVC project at RIILP is developing a method (Corpus
Pattern Analysis) for identifying and building an inventory of
prototypical phraseological norms.
• www.pdev.org.uk
• Each pattern consists of a syntagmatic structure plus lexical
sets of collocations.
• Understanding meaning depends on matching the wording of an
actual utterance with a pattern.
– Best match wins!
• Speakers and writers sometimes exploit norms in various ways,
for example to create new metaphors.
• The DVC project is also studying the rules governing
exploitations of phraseological norms.
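The "best match wins" idea can be sketched as a small scoring function. This is a minimal illustration under stated assumptions, not the Corpus Pattern Analysis procedure itself: the clause is assumed to be already parsed into verb, particle, and object; the class names, the scoring scheme, the glosses, and the extra set members 'building' and 'tyre' are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Pattern:
    name: str
    particle: Optional[str]      # syntagmatic structure: phrasal particle, if any
    object_set: FrozenSet[str]   # lexical set of typical direct objects
    gloss: str                   # illustrative meaning label (an assumption)


@dataclass(frozen=True)
class Clause:
    verb: str
    particle: Optional[str]
    obj: Optional[str]


# Hypothetical mini-inventory for 'blow'.
PATTERNS = [
    Pattern("blow up [structure]", "up",
            frozenset({"bridge", "building"}), "destroy by explosion"),
    Pattern("blow up [inflatable]", "up",
            frozenset({"balloon", "tyre"}), "fill with air"),
    Pattern("blow [nose]", None,
            frozenset({"nose"}), "clear the nose"),
]


def score(pattern, clause):
    """One point for matching structure, one for a collocate in the lexical set."""
    points = 0
    if pattern.particle == clause.particle:
        points += 1
    if clause.obj in pattern.object_set:
        points += 1
    return points


def best_match(clause, patterns=PATTERNS):
    """Return the highest-scoring pattern; ties go to the first one listed."""
    return max(patterns, key=lambda p: score(p, clause))
```

For example, `best_match(Clause("blow", "up", "balloon")).gloss` selects the inflation reading and `best_match(Clause("blow", "up", "bridge")).gloss` the explosion reading, even though both clauses share the same verb and particle. An object outside every lexical set produces a tie on structure alone, which is where a fuller treatment of exploitations would be needed.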
5