New Developments in WordNet:
An Electronic Lexical Resource
Helen Langone
Columbia University
Princeton Cognitive Science Laboratory
4/22/04
“Classic” WordNet
● A lexical semantic network relating word forms and lexicalized concepts (i.e., concepts that speakers have adopted word forms to express)
● Main relations: hyponymy/troponymy (kind-of/way-to), meronymy (part-whole), synonymy, antonymy
● Predominantly hierarchical, few relations across grammatical class, glosses & example sentences do not participate in network
● Nouns organized under 9 unique beginners
● Command-line interface & C library
● Prehistoric (but greppable!) db format
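[Figure: hypernym chains for the three noun senses of "fake" (most general term first)]
act, human action > action > choice, selection > decision > move > tactical maneuver > feint > { juke, fake }
entity > physical object > whole thing, unit > artifact > creation > representation > copy > imitation > { fake, sham, postiche }
entity > physical object > living thing > organism, being > person > bad person > wrongdoer > deceiver > { imposter, impostor, pretender, fake, faker, sham, shammer, pseud, pseudo, role player, fraud }
entity > causal agent > person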
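[Figure: hypernym chains for the three verb senses of "fake" (most general term first)]
make, create > re-create > { forge, counterfeit, fake }
act, move > interact > treat > wrong > victimize > deceive > cheat, chisel > { cook, misrepresent, fudge, falsify, wangle, fake }
express, verbalize > represent > misrepresent > feign > { bull, bullshit, fake }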
[Figure: the two adjective senses of "fake"]
{ faux, false, simulated, imitation, fake }
{ bastard, bogus, phony, phoney, fake }
Unique beginner synsets
What's ahead/new...
● Instance-of pointers
● “Morphosemantic” links
● Ontological reorg
● Glosses disambiguated, parsed and translated into logical forms
● SQL database access & Perl library
● Alternative forms, inflected forms
Instance-of pointer
● Classic WordNet: hyponymy relation made no distinction between subsumption (type or kind of x) and instantiation (is an instance of an x)
● e.g. George Washington is an Instance-of {United_States_President}, not a Kind-of
● e.g. a playwright is a Kind-of {writer, author}, while Shakespeare is an Instance-of the type {dramatist, playwright}
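
The distinction above can be sketched as two separate pointer tables. This is a toy illustration only: the dictionary layout and the helper names `types_of` and `is_instance` are assumptions made for the example, not WordNet's storage format or API.

```python
# Two pointer types kept apart, using the examples from the slide.
KIND_OF = {
    "playwright": "writer",  # a playwright is a kind of writer
}
INSTANCE_OF = {
    "Shakespeare": "playwright",
    "George Washington": "United_States_President",
}

def is_instance(name):
    """True if the entry is an individual rather than a type."""
    return name in INSTANCE_OF

def types_of(name):
    """Follow at most one instance-of pointer, then climb kind-of pointers."""
    result = []
    node = INSTANCE_OF.get(name, name)
    if node != name:
        result.append(node)
    while node in KIND_OF:
        node = KIND_OF[node]
        result.append(node)
    return result

print(types_of("Shakespeare"))    # types reachable from an instance
print(is_instance("playwright"))  # a type is not an instance
```

Keeping the two tables separate means a query for "all kinds of writer" never returns individual people, which a single undifferentiated hyponymy pointer could not guarantee.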
“Morphosemantic” links
● Links between word forms motivated by both form and meaning.
  – verb-noun (write/writer), noun-adj (duplicity/duplicitous), verb-adj (talk/talkative), noun-noun (friend/friendship)
● Not the sense of “morphosemantics” used in morphology for form-meaning correspondences.
● For polysemous forms, relates only sense-specific derivationally-related forms: e.g. digest/digestion. 2 distinct senses of the verb digest (physiological, psychological) correspond with 2 distinct senses of the noun digestion.
● How is this different from Porter & the like?
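
The sense-specific linking described for digest/digestion can be sketched as links between sense keys rather than between bare strings. The key format `lemma%pos:n` and the sense numbering below are hypothetical, chosen only for this illustration.

```python
# Each link joins one specific verb sense to one specific noun sense.
# Sense keys and numbers are made up for the example.
DERIV_LINKS = {
    ("digest%v:1", "digestion%n:1"),  # physiological sense
    ("digest%v:2", "digestion%n:2"),  # psychological sense
}

def related(sense_a, sense_b):
    """True if a derivational link joins the two senses (links are symmetric)."""
    return (sense_a, sense_b) in DERIV_LINKS or (sense_b, sense_a) in DERIV_LINKS

def lemma(sense_key):
    """Strip the sense suffix to recover the word form."""
    return sense_key.split("%")[0]

# The same two word forms are linked twice, once per matching sense pair,
# but the physiological verb is not linked to the psychological noun.
print(related("digest%v:1", "digestion%n:1"))
print(related("digest%v:1", "digestion%n:2"))
```

A form-only resource would store a single digest/digestion pair and lose the information about which sense goes with which.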
Porter stemmer
● Pattern-matches on word endings
● Misses many regular form-form correspondences
  – biography/biographer/biographical
  – song/sing/songster
  – deception/deceive
● Conflates unrelated forms
  – amor-->amorous, amorously, amorousness, amoral, amorally, amoralism, amorality...!
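
Both failure modes follow from matching on endings alone, and a deliberately crude suffix stripper is enough to show them. The sketch below is not the Porter algorithm: the suffix list and the three-letter minimum are assumptions invented for the illustration.

```python
# Illustrative suffix list, longest first. NOT the Porter rule set.
SUFFIXES = ["ousness", "ously", "ality", "alism", "ally", "ous", "al", "er", "y"]

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

AMOR_FORMS = ["amorous", "amorously", "amorousness",
              "amoral", "amorally", "amoralism", "amorality"]

# Conflation: love-related and morality-related forms share one stem.
print({crude_stem(w) for w in AMOR_FORMS})

# Misses: related forms whose spelling differs inside the word stay apart.
print(crude_stem("song"), crude_stem("sing"))
print(crude_stem("deception"), crude_stem("deceive"))
```

No amount of tuning the suffix list separates amorous from amoral, because the information needed (which meaning each form carries) is not in the spelling.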
Derivationally-related forms of myth
mythological
mythologic
mythology
mythicise
mythologize
myth
mythicize
mythologization
mythologise
mythologisation
mythic
mythical
mythologist
Porter-stemmed forms of myth*
Stems: mytholog-, mythicis-, mythologis-, myth-, mythic-, mythologist-
Forms: mythological, mythologic, mythology, mythicise, mythologize, myth, mythologization, mythicize, mythologise, mythologisation, mythic, mythical, mythologist
mythologization
mythologisation
mythicize
mythologist
mythicise
mythological
mythologic
mythology, #1
mythical
mythologize
mythologise
mythology, #2
myth, n
mythic, #2
mythic, #1
Why should we care about the “semantic” part?
● Catvar (Habash and Dorr, 2003) clusters over 100,000 word forms into 63,000 clusters based on morphological relatedness.
● Viegas et al.'s (1996) lexical rules automatically derive related words from a shared stem.
● Neither considers polysemy: like Porter, they lump together all forms having the same stem.
● Relating forms that derive from different senses will affect language understanding.
[Figure: hypernym hierarchy for the verb senses of "demonstrate". Node labels: think, cogitate, cerebrate; show; evaluate, judge; reject; renounce, repudiate; refute, rebut; oppose, controvert, contradict; protest, resist, dissent; present; demo; exhibit; show; demonstrate, v; march]
[Figure: hypernym hierarchy for the noun senses of "demonstration". Node labels: entity; abstract_entity; psychological feature; abstraction; communication; event; act; group_action; visual communication; resistance; activity; diversion, recreation; entertainment, amusement; protest; show; presentment; manifestation; demo; demonstration, n; presentation]
Why should we care about the “semantic” part?
{ demonstration, demo }
  a visual presentation showing how something works; "the lecture was accompanied by dramatic demonstrations"; "the lecturer shot off a pistol as a demonstration of the startle response"
{ show, demo, exhibit, present, demonstrate }
  show or demonstrate something to an interested audience; "She shows her dogs frequently"; "We will demo the new software in Washington"
{ demonstrate, march }
  march in protest; take part in a demonstration; "Thousands demonstrated against globalization during the meeting of the most powerful economic nations in Seattle"
{ demonstration, manifestation }
  a public display of group feelings (usually of a political nature); "there were violent demonstrations against the war"
{ presentation, presentment, demonstration }
  a show or display; the act of presenting something to sight or view; "the presentation of new data"; "he gave the customer a demonstration"
Why should we care about the “semantic” part?
{ demonstrator }
  a teacher or teacher's assistant who demonstrates the principles that are being taught
{ show, demo, exhibit, present, demonstrate }
  show or demonstrate something to an interested audience; "She shows her dogs frequently"; "We will demo the new software in Washington"
{ demonstrate, march }
  march in protest; take part in a demonstration; "Thousands demonstrated against globalization during the meeting of the most powerful economic nations in Seattle"
{ demonstrator, protester }
  someone who participates in a public display of group feeling
Reorg of the top levels
● All noun hierarchies are now subsumed under a single synset, { entity }.
● The reorganization brings WordNet more in alignment with ontologies that would map into it (e.g., SUMO)
● The noun file is now structured more like a general-purpose ontology, but be aware multiple inheritance exists.
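
The practical consequence of a single root plus multiple inheritance is that a synset can have more than one path to { entity }, so code that assumes a tree will miss some ancestors. A minimal sketch, assuming a toy fragment in which "person" has two parents (the parent assignments are illustrative, not a dump of the actual noun file):

```python
# Toy hypernym graph: a DAG with one root, not a tree.
HYPERNYMS = {
    "person": ["organism", "causal agent"],  # two parents
    "organism": ["living thing"],
    "living thing": ["physical object"],
    "physical object": ["entity"],
    "causal agent": ["entity"],
    "entity": [],
}

def paths_to_root(synset, root="entity"):
    """Return every hypernym path from synset up to the root."""
    if synset == root:
        return [[root]]
    paths = []
    for parent in HYPERNYMS.get(synset, []):
        for tail in paths_to_root(parent, root):
            paths.append([synset] + tail)
    return paths

for path in paths_to_root("person"):
    print(" > ".join(path))
```

Every path ends at the same root, but measures such as depth or path length are no longer single-valued and have to be defined over the set of paths.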
Disambiguation & parsing of the glosses
● Classic WordNet: only synsets and entry word forms participate in the network of relations
● Sense-tagging is a process of disambiguation: a word form is linked to its context-appropriate sense (e.g., run a company vs. run a race)
● Sense-tagging will do the equivalent of hyper-linking every open-class word in the glosses to every other semantically-related word/concept in WordNet
● Will add an additional 800,000 direct links to the 120,000 bidirectional links currently in place; the number of words/synsets indirectly reachable will be far greater.
● Disambiguated glosses will be parsed and translated into first-order predicate logic (by Jerry Hobbs at USC/ISI)
WNDEV
● Current database format poses some limitations
  – Restricts the number of kinds of relations & searches possible
  – Separate files for each part of speech mean that changes affecting more than one file require coordination among lexicographers
  – Unstructured format means that formatting errors may not be caught
  – Byte offset access means maintaining separate editable and compiled versions of the data, and a lengthy grind-and-fix-parse-errors process
  – Adding to the lexicon must be done manually using an arcane syntax; no automatic means exists for loading entries
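
The byte-offset limitation in the list above can be illustrated with a toy index. The two-line data file below is invented for the example and is not WordNet's actual file layout; the point is only that any edit changing an entry's length invalidates every offset after it, which is why a separate compile step is needed.

```python
def build_index(data: bytes) -> dict:
    """Map the first field of each line to the byte offset where the line starts."""
    index, offset = {}, 0
    for line in data.splitlines(keepends=True):
        index[line.split()[0].decode()] = offset
        offset += len(line)
    return index

def read_at(data: bytes, offset: int) -> str:
    """Read one line starting at the given byte offset."""
    end = data.index(b"\n", offset)
    return data[offset:end].decode()

# Made-up entries: word, part of speech, gloss.
original = b"fake n something counterfeit\nmyth n a traditional story\n"
index = build_index(original)

# A lexicographer lengthens the first entry; the old index is now stale.
edited = original.replace(b"counterfeit", b"that is a counterfeit")
stale = read_at(edited, index["myth"])               # lands mid-entry
fresh = read_at(edited, build_index(edited)["myth"])  # after re-indexing

print(repr(stale))
print(repr(fresh))
```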
WNDEV
● WordNet development environment: a suite of tools for developing, editing, and using WordNet
● SQL database & Perl library
● Graphical interface for editing synsets, linking, creating new relations, etc.
● Tools for automatic loading of externally-created entry sets, database patches, and format conversions.
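
To give a feel for what a relational layout makes possible, here is a minimal sketch using an in-memory SQLite database. The schema (tables `synset`, `word`, `relation`) and the helper `related_lemmas` are hypothetical and are not the WNDEV schema; Python stands in for the Perl library mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synset (id INTEGER PRIMARY KEY, pos TEXT NOT NULL, gloss TEXT);
    CREATE TABLE word (synset_id INTEGER NOT NULL REFERENCES synset(id),
                       lemma TEXT NOT NULL);
    CREATE TABLE relation (source_id INTEGER NOT NULL REFERENCES synset(id),
                           kind TEXT NOT NULL,
                           target_id INTEGER NOT NULL REFERENCES synset(id));
""")
conn.executemany("INSERT INTO synset VALUES (?, ?, ?)",
                 [(1, "n", None), (2, "n", None), (3, "n", None)])
conn.executemany("INSERT INTO word VALUES (?, ?)", [
    (1, "dramatist"), (1, "playwright"),
    (2, "writer"), (2, "author"),
    (3, "Shakespeare"),
])
# A new kind of relation is just a new value in the kind column.
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    (1, "kind-of", 2),
    (3, "instance-of", 1),
])

def related_lemmas(lemma, kind):
    """Lemmas of every synset reached from `lemma` by one `kind` pointer."""
    rows = conn.execute("""
        SELECT DISTINCT t.lemma
        FROM word s
        JOIN relation r ON r.source_id = s.synset_id
        JOIN word t ON t.synset_id = r.target_id
        WHERE s.lemma = ? AND r.kind = ?
        ORDER BY t.lemma
    """, (lemma, kind)).fetchall()
    return [row[0] for row in rows]

print(related_lemmas("playwright", "kind-of"))
print(related_lemmas("Shakespeare", "instance-of"))
```

In a layout like this, all parts of speech live in the same tables, constraints catch malformed rows at insert time, and bulk loading is an ordinary `INSERT`, which addresses several of the limitations listed for the flat-file format.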