WordNet: Connecting words and concepts

Peng.Huang
What is WordNet?
• A large lexical database, or “electronic
dictionary”, for the English language
• Started in 1985 by George A. Miller at Princeton
• Covers most English nouns, verbs,
adjectives, adverbs
• Electronic format makes it amenable to
automatic manipulation
What’s so special about WordNet?
• Traditional paper dictionaries are organized
alphabetically, so words that are grouped
together (on the same page) are unrelated
• WordNet is organized by meaning, so
words in close proximity are related
Basic Design of WordNet
WordNet entries are word-concept mappings
Natural languages map words to concepts many-to-many:
One concept can be expressed by many words
(synonymy):
{car, auto, automobile}
{close, shut}
Basic Design of WordNet
One word can express many concepts (polysemy):
{club, stick}
{club, nightclub}
{club, playing card}
Basic Design of WordNet
WordNet’s building blocks: sets of synonyms (synsets)
--{hit, beat}, {big, large}, {queue, line}
Each synset expresses a distinct concept.
A gloss is a textual definition of the synset
-- “band -- (a range of frequencies between two limits)”
Currently, WordNet 3.0 contains approximately 117,000 synsets
Basic Design of WordNet
• Groups the meanings of English words into
five categories
– Nouns
– Verbs
– Adjectives
– Adverbs
– Function words (prepositions, pronouns,
determiners)
Basic Design of WordNet
WordNet stores, and allows one to retrieve,
--all concepts that a given word can express
--all words that express a given concept
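The two lookups above can be sketched with a pair of tables. This is only a toy illustration built from the examples on the earlier slides; the synset labels (such as "club-stick") and the function names are made up here and are not WordNet's own identifiers or API.

```python
from collections import defaultdict

# Toy synsets from the slides; the labels on the left are invented.
SYNSETS = {
    "car": ["car", "auto", "automobile"],
    "close": ["close", "shut"],
    "club-stick": ["club", "stick"],
    "club-nightclub": ["club", "nightclub"],
    "club-card": ["club", "playing card"],
}

# Invert the synset -> words table to get word -> synsets.
WORD_INDEX = defaultdict(list)
for synset_id, words in SYNSETS.items():
    for word in words:
        WORD_INDEX[word].append(synset_id)


def concepts_for(word):
    """All concepts (synsets) that a given word can express."""
    return list(WORD_INDEX.get(word, []))


def words_for(synset_id):
    """All words that express a given concept (synset)."""
    return list(SYNSETS.get(synset_id, []))


print(concepts_for("club"))  # ['club-stick', 'club-nightclub', 'club-card']
print(words_for("car"))      # ['car', 'auto', 'automobile']
```

Keeping both directions indexed is what makes the mapping many-to-many: one word can fan out to several synsets, and one synset to several words.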
But there’s more!
• Words and synsets are connected via meaning-based
relations
– Synonymy (Pipe, Tube)
– Antonymy (Wet, Dry)
– Hyponymy (Tree, Plant)
– Meronymy (Ship, Fleet)
– Morphological relations
• Result: a large semantic network
(as opposed to a flat list in a paper dictionary)
Relations among WordNet noun
synsets
• Hyperonymy/hyponymy relates super/subordinate synsets (denoting more/less
general concepts):
              {vehicle}
             /         \
{car, automobile}   {bicycle, bike}
    /       \              \
{convertible} {SUV}   {mountain bike}
Transitivity:
A car is a kind of vehicle
An SUV is a kind of car
=> An SUV is a kind of vehicle
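A minimal sketch of this transitivity, using only the toy hierarchy from the slide. It assumes every concept has a single direct superordinate, which is a simplification made for the example; the function names are made up.

```python
# Direct "is a kind of" links from the slide (child -> parent).
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def hypernym_chain(concept):
    """Follow direct links upward; return every ancestor, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if `candidate` is a direct or inherited superordinate."""
    return candidate in hypernym_chain(concept)


print(hypernym_chain("SUV"))    # ['car', 'vehicle']
print(is_a("SUV", "vehicle"))   # True
print(is_a("SUV", "bicycle"))   # False
```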
Relations among noun synsets
• Meronymy/holonymy (part/whole)
   {car, automobile}
          |
       {engine}
       /      \
{spark plug} {cylinder}
Inheritance:
A car has an engine
An engine has spark plugs
=> A car has spark plugs
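Part inheritance can be sketched the same way, again with only the slide's toy data. The sketch assumes the part/whole links contain no cycles; `all_parts` is a name made up for the example.

```python
# Direct "has part" links from the slide (whole -> parts).
HAS_PART = {
    "car": ["engine"],
    "engine": ["spark plug", "cylinder"],
}


def all_parts(whole):
    """Return direct and inherited parts of `whole`, depth-first."""
    parts = []
    for part in HAS_PART.get(whole, []):
        parts.append(part)
        parts.extend(all_parts(part))  # a part's parts are inherited
    return parts


print(all_parts("car"))     # ['engine', 'spark plug', 'cylinder']
print(all_parts("engine"))  # ['spark plug', 'cylinder']
```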
Relations among verb synsets
Verbs denote events
Related by a “manner” relation
 {communicate}
      |
   {talk}
   /     \
{stammer} {whisper}
Relations among verb synsets
Semantics of events (verbs) are very different
from semantics of entities (nouns)
WordNet captures this fact with different
relations
Relations refer to temporal properties of events
--partial and complete overlap of two events
--prior or posterior events
WordNet
Relations among synsets create an interconnected network
Different senses of polysemous words are members of
distinct synsets that are related to different synsets (i.e.,
occupy different locations in the network)
e.g., {stock, broth} has superordinate synset {dish}
{stock, breed} has superordinate {variety}
These different synsets are also linked to different part/whole
synsets
WordNet
A word’s meaning can be defined in terms of
its position in the network
club1 is a kind of association/has members
club2 is a kind of stick
Relatedness between words or synsets can be
quantified in terms of path length (number
of connections among synsets)
WordNet
• How closely related are {zebra} and {horse}?
Very: both share the direct superordinate {equine}
• What about {horse, sawhorse} and {horse,
gymnastic horse}?
Related, but less so: joint superordinate {artifact} is
4-5 levels up
• What about {zebra} and {horse, gymnastic horse}?
Unrelated: the trees containing them never intersect!
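The three cases above can be reproduced with a breadth-first search over a toy graph. Only {equine} and {artifact} come from the slide; the "intermediate" nodes are placeholders that compress the 4-5 real levels into one step each, so the numbers are illustrative, not real WordNet distances.

```python
from collections import deque

# child -> parent links; "intermediate A/B" are invented placeholders.
HYPERNYM = {
    "zebra": "equine",
    "horse": "equine",
    "sawhorse": "intermediate A",
    "intermediate A": "artifact",
    "gymnastic horse": "intermediate B",
    "intermediate B": "artifact",
}


def neighbors(node):
    """Nodes one hypernym/hyponym link away from `node`."""
    linked = [child for child, parent in HYPERNYM.items() if parent == node]
    if node in HYPERNYM:
        linked.append(HYPERNYM[node])
    return linked


def path_length(start, goal):
    """Number of links on the shortest path, or None if unconnected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(path_length("zebra", "horse"))               # 2
print(path_length("sawhorse", "gymnastic horse"))  # 4
print(path_length("zebra", "gymnastic horse"))     # None
```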
WordNet for Word Sense
Disambiguation
• WSD is a major problem in Natural
Language Processing
• Assumption: words in a context (phrase,
sentence, discourse) are semantically related
• So, horse in the neighborhood of zebra is
likely to mean “equine”; in the
neighborhood of gym it likely means
“gymnastic horse.”
WordNet for WSD
If you want to disambiguate “horse” in the
context of “zebra,” look for all WordNet
paths from “zebra” to “horse.” The shortest
one is likely to give you the correct sense of
“horse.”
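A hedged sketch of this shortest-path heuristic on a toy graph. The sense labels ("horse#equine", "horse#gymnastic") and the function names are invented for the example, and the artifact branch is shortened to a single link.

```python
import math

# child -> parent links; the sense labels are made up for illustration.
HYPERNYM = {
    "zebra": "equine",
    "horse#equine": "equine",
    "horse#gymnastic": "artifact",  # simplified: the real chain is longer
}
SENSES = {"horse": ["horse#equine", "horse#gymnastic"]}


def ancestors(node):
    """Map `node` and each of its ancestors to its distance from `node`."""
    dist = {node: 0}
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
        dist[node] = steps
    return dist


def path_length(a, b):
    """Links from a to b through their nearest shared ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    shared = set(up_a) & set(up_b)
    if not shared:
        return math.inf  # the two trees never intersect
    return min(up_a[n] + up_b[n] for n in shared)


def disambiguate(word, context_node):
    """Pick the sense of `word` with the shortest path to the context."""
    return min(SENSES[word], key=lambda s: path_length(s, context_node))


print(disambiguate("horse", "zebra"))  # horse#equine
```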
WordNet for WSD
• Can take advantage of WordNet classes (trees of
hierarchically related synsets)
• e.g., run1 co-occurs with nouns that are all
hyponyms (subordinate, more specific concepts)
of office (mayor, congresswoman, President,...)
• run2 co-occurs with nouns that are hyponyms of
machine (computer, washer, printing press,
engine,...)
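A small sketch of class-based disambiguation along these lines. The noun lists come from the slide; the lookup table and `sense_of_run` are hypothetical names, and a real system would gather the classes from co-occurrence data rather than hard-code them.

```python
# Hyponym -> class links taken from the slide's examples.
HYPERNYM = {
    "mayor": "office",
    "congresswoman": "office",
    "President": "office",
    "computer": "machine",
    "washer": "machine",
    "printing press": "machine",
    "engine": "machine",
}

# Which sense of "run" goes with which WordNet class.
CLASS_TO_SENSE = {"office": "run1", "machine": "run2"}


def sense_of_run(noun):
    """Walk up from `noun` until a class with a known sense is found."""
    node = noun
    while node is not None:
        if node in CLASS_TO_SENSE:
            return CLASS_TO_SENSE[node]
        node = HYPERNYM.get(node)
    return None  # noun is outside both classes


print(sense_of_run("mayor"))   # run1
print(sense_of_run("washer"))  # run2
print(sense_of_run("zebra"))   # None
```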
Topics/Domain in WordNet
• Hierarchical organization leaves many
related concepts unconnected
• Solution: link synsets across “trees” in
terms of their membership in a “domain” or
topic
• E.g., synsets {contraindication}, {surgery},
{physician}, ... are all linked to {medicine},
the concept that defines a domain or topic
Topics/Domain in WordNet
• Customizable: user can define new topics
• Topics can be as coarse- or fine-grained as
desired
• By using synsets as topic labels, the
concepts subsumed under the new topic(s)
will continue to be part of the network
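Domain links can be pictured as a second table laid over the hierarchy. In this sketch the {medicine} members come from the earlier slide, while the "car repair" topic is a hypothetical user-defined one; all function names are made up.

```python
from collections import defaultdict

# topic synset -> set of member synsets
TOPIC_MEMBERS = defaultdict(set)


def link_to_topic(synset, topic):
    """Attach `synset` to a topic, creating the topic if needed."""
    TOPIC_MEMBERS[topic].add(synset)


def topics_of(synset):
    """All topics that `synset` belongs to, in alphabetical order."""
    return sorted(t for t, members in TOPIC_MEMBERS.items() if synset in members)


def share_topic(a, b):
    """True if the two synsets are linked to at least one common topic."""
    return bool(set(topics_of(a)) & set(topics_of(b)))


for synset in ("contraindication", "surgery", "physician"):
    link_to_topic(synset, "medicine")

# A user-defined topic (hypothetical example).
for synset in ("engine", "spark plug"):
    link_to_topic(synset, "car repair")

print(share_topic("surgery", "physician"))  # True
print(share_topic("surgery", "engine"))     # False
print(topics_of("spark plug"))              # ['car repair']
```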
Current and Future Work
• Increase density of WordNet
• More links, new relations
• E.g. “role” relation among nouns:
distinguish {poodle}-{dog} (a “type” relation)
from {poodle}-{pet} (a “role” relation)
poodle is a type of dog, but not a type of pet
poodle can (but need not) play the “role” of pet
Work just completed...
(sponsored by ARDA/AQUAINT)
Manually link nouns, verbs, adjectives,
adverbs in the definitions (“glosses”) to the
appropriate synset:
{bank (a financial institution that accepts
deposits...)}
{bank (sloping land..)}
Gloss Disambiguation
{bank (a financial institution that accepts
deposits...)}
{financial, fiscal} {institution, establishment}
{institution, custom}
{bank (sloping land..)}
{slope, incline} {land, ground, earth}
{land, country}
Gloss Disambiguation: Results
• A closed system linking glosses and synsets (and a
more densely connected network)
• Each gloss is more informative as it adds synset
information for the words in the gloss
• Glosses are examples of contexts for many word-sense
pairs, telling us how words with specific
senses are being used in context
• Glosses can be used as training data for machine
learning systems that want to “learn” to
disambiguate words automatically
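As a rough sketch of how disambiguated glosses could become training data, the two {bank} glosses can be stored with each content word pointing at one synset. The choice of {institution, establishment} and {land, ground, earth} is an assumed reading of the earlier slide, and the data layout and function name are invented for the example.

```python
# gloss label -> {gloss word: synset chosen for it}
# The chosen synsets are an assumed reading of the slide.
TAGGED_GLOSSES = {
    "bank (financial institution)": {
        "financial": ("financial", "fiscal"),
        "institution": ("institution", "establishment"),
    },
    "bank (sloping land)": {
        "sloping": ("slope", "incline"),
        "land": ("land", "ground", "earth"),
    },
}


def training_pairs(tagged_glosses):
    """Flatten tagged glosses into (context, word, synset) examples."""
    return [
        (context, word, synset)
        for context, tags in tagged_glosses.items()
        for word, synset in tags.items()
    ]


pairs = training_pairs(TAGGED_GLOSSES)
print(len(pairs))  # 4
print(pairs[3])    # ('bank (sloping land)', 'land', ('land', 'ground', 'earth'))
```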
Summary
• A Google search returns about 1,190,000 items
for WordNet
• There is more than what you see…But
less than what you imagine!!!
Where to find WordNet
Freely downloadable:
http://wordnet.princeton.edu/
Database, browser, documentation
Global WordNet
Currently, wordnets exist for some 40
languages, including
Arabic, Basque, Bulgarian, Estonian, Hebrew,
Icelandic, Italian, Kannada, Latvian, Persian,
Romanian, Sanskrit, Tamil, Thai, Turkish,...
http://www.globalwordnet.org
Thank you!
Q&A