WordNet and eXtended WordNet
Sriram Rajaraman
23 November 2009
Objective
Introduce the idea of a semantic lexicon or ontology, especially WordNet and eXtended WordNet
University of Texas at Dallas
Erik Jonsson School of Engineering and Computer Science
Focus
Introduction
WordNet
eXtended WordNet
Summary
Reference
1. WordNet: http://wordnet.princeton.edu/
2. eXtended WordNet: http://xwn.hlt.utdallas.edu/
3. Christiane Fellbaum, "WordNet: An Electronic Lexical Database", MIT Press, 1999, c1998.
4. George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, "Introduction to WordNet: An On-line Lexical Database", core working paper
5. Rada Mihalcea, Dan I. Moldovan, "eXtended WordNet: Progress Report", Proceedings of NAACL Workshop on WordNet and Other Lexical Resources, 2001
6. Sanda M. Harabagiu, George A. Miller, Dan I. Moldovan, "WordNet 2 - A Morphologically and Semantically Enhanced Resource", SIGLEX 1999
Introduction
Traditional Dictionary
What is available:
spelling
pronunciation
inflected and derivative forms
etymology
part of speech
definitions
illustrative uses of alternative senses
synonyms and antonyms
special usage notes
Tree
Ref: http://www.merriam-webster.com/dictionary/Tree
Main Entry: tree
Pronunciation: \ˈtrē\
Function: noun
Etymology: Middle English, from Old English
trēow; akin to Old Norse trē tree, Greek drys,
Sanskrit dāru wood
Date: before 12th century
- a woody perennial plant having a single
usually elongate main stem generally with few
or no branches on its lower part
Drawbacks of a traditional dictionary
What is missing:
It does not say, for example, that trees have roots, that they consist of cells with cellulose walls, or even that they are living organisms
The "sense" of the superordinate term, i.e. the hypernym (living plant or industrial plant)
Coordinate terms (bushes, shrubs, ...)
Hyponyms: types of trees (pine, tropical, deciduous, ...)
Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis); this is probably information for an encyclopedia
How can we improve?
The missing information is structural: every word points upward to its superordinate (hypernym), but not sideways to its coordinates or downward to its hyponyms.
These restrictions come from alphabetical ordering, budget and size constraints, all of which can be overcome in an electronic lexical database
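As a minimal sketch of what an electronic database makes possible, the snippet below stores only the upward (hypernym) pointer for each word and derives the downward (hyponym) and sideways (coordinate) links from it. The tiny vocabulary and the function names are assumptions made for illustration; they are not WordNet data or a WordNet API.

```python
from collections import defaultdict

# Toy data (an assumption for illustration): each word points up to its hypernym.
HYPERNYM = {
    "pine": "tree",
    "oak": "tree",
    "tree": "woody plant",
    "shrub": "woody plant",
    "bush": "woody plant",
}


def build_hyponym_index(hypernym_of):
    """Invert the upward pointers to obtain the downward (hyponym) links."""
    index = defaultdict(set)
    for word, parent in hypernym_of.items():
        index[parent].add(word)
    return index


HYPONYMS = build_hyponym_index(HYPERNYM)


def hyponyms(word):
    """Words directly below `word`."""
    return sorted(HYPONYMS.get(word, set()))


def coordinates(word):
    """Words that share a hypernym with `word`, excluding the word itself."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return sorted(HYPONYMS[parent] - {word})


print(hyponyms("tree"))     # ['oak', 'pine']
print(coordinates("tree"))  # ['bush', 'shrub']
```

A printed dictionary would have to repeat this information under every entry; here the sideways and downward views are computed from a single stored pointer per word.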
What is WordNet?
WordNet is a lexical database for the English language.
WordNet 3.0 has [1]:
– 117,097 nouns (the average noun has 1.23 senses)
– 11,488 verbs (the average verb has 2.16 senses)
– 22,141 adjectives
– 4,601 adverbs
Created and maintained at the Cognitive Science Laboratory of Princeton University
Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable)
Interfaces are available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RiTa WordNet, PyWordNet, ...)
WordNet Structure
Words are organized as synsets in WordNet
There are four disjoint kinds of synsets, containing either
Nouns
Verbs
Adjectives
Adverbs
What is a synset?
Basic unit of WordNet
A group of synonymous words that refer to a common semantic concept
A word may belong to more than one synset; its first sense is the most frequent sense
Words also include collocations ("eye contact", "mix up")
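The idea that one word can belong to several synsets can be sketched with a small in-memory index. This is a toy model, not WordNet's storage format: the two word lists are the "car" synsets used in these slides, while the list ordering (taken here as sense order) and the function names are assumptions.

```python
# Toy synsets; list position stands in for sense order (sense 1 first).
SYNSETS = [
    ["car", "auto", "automobile", "machine", "motorcar"],
    ["car", "railcar", "railway car", "railroad car"],
]


def senses(word):
    """Return every synset that contains `word`, most frequent sense first."""
    return [synset for synset in SYNSETS if word in synset]


def synonyms(word, sense=1):
    """Other members of the synset for the given 1-based sense number."""
    synset = senses(word)[sense - 1]
    return [w for w in synset if w != word]


print(len(senses("car")))       # 2
print(synonyms("car", 1))       # ['auto', 'automobile', 'machine', 'motorcar']
print(synonyms("railway car"))  # ['car', 'railcar', 'railroad car']
```

Collocations such as "railway car" are stored as single entries, so they are looked up exactly like one-word members.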
Synset example
“car” as in
{car, auto, automobile, machine, motorcar}
{car, railcar, railway car, railroad car}
“Chocolate” as in-
How are synsets related?
A list of pointers is associated with each synset to express its relationships to other synsets
WordNet defines 17 relations
10 between synsets
5 between word senses
"gloss" (between a synset and a sentence, i.e. a textual definition for each synset)
"frame" (between a synset and a verb construction pattern)
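One way to picture a synset with its gloss and pointer list is the sketch below. The class layout, the helper functions and the two glosses are assumptions made for illustration; only the relation names "hypernym" and "hyponym" come from the slides, and this is not how the WordNet database files are actually laid out.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    words: list
    gloss: str = ""
    # Each pointer is a (relation name, target synset_id) pair.
    pointers: list = field(default_factory=list)


def add_pointer(db, source, relation, target, inverse=None):
    """Record a pointer and, optionally, the inverse pointer on the target."""
    db[source].pointers.append((relation, target))
    if inverse is not None:
        db[target].pointers.append((inverse, source))


def related(db, synset_id, relation):
    """Follow all pointers of one relation type from a synset."""
    return [db[t] for r, t in db[synset_id].pointers if r == relation]


db = {
    "tree": Synset("tree", ["tree"], "a tall perennial woody plant"),
    "pine": Synset("pine", ["pine", "pine tree"], "a coniferous tree"),
}
add_pointer(db, "pine", "hypernym", "tree", inverse="hyponym")

print([s.words for s in related(db, "pine", "hypernym")])  # [['tree']]
print([s.words for s in related(db, "tree", "hyponym")])   # [['pine', 'pine tree']]
```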
WordNet relations
Applications of WordNet
Information Extraction
Information Retrieval
Question Answering
Word Sense Disambiguation
Text Inference
Coreference, coherence and metonymy
Knowledge acquisition
Internet search engines
Limitations of WordNet
Designed as a semantic lexicon, not a knowledge base
Limited connections between topically related words
Lack of morphological relationships (a special algorithm handles these)
Lack of selectional restrictions
And more ... [6]
eXtended WordNet [2]
A project at the Human Language Technology Research Institute at The University of Texas at Dallas (http://xwn.hlt.utdallas.edu)
Provides several important enhancements (over WordNet 2.0) intended to remedy the present limitations of WordNet
Current version: eXtended WordNet 2.0 (xwn 2.0-1.1)
Objective of eXtended WordNet
Exploit the rich information available in synset glosses (a gloss is a textual definition of a synset)
Make semantic and logical enhancements to WordNet
Increase the connectivity among synsets by at least one order of magnitude
Enable access to a broader context for each concept
What does eXtended WordNet do? [5]
Preprocessing and Parsing
Separation of glosses into definition and examples, tokenization and identification of compound words
Word Sense Disambiguation
All words in a gloss are tagged with the appropriate senses and linked to the corresponding synsets
Logical Form Transformation
Gloss logical forms
Topical Relations
Connections are established between words, based on context/topic
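The preprocessing step (separating a gloss into definition and examples, then tokenizing with compound words kept together) can be sketched as follows. This is a rough illustration under stated assumptions, not the eXtended WordNet implementation: it assumes examples are the double-quoted segments of a gloss, the compound list is a hand-made stand-in for a real lexicon lookup, and the example sentence is invented.

```python
import re

# Hypothetical compound list; a real system would consult the lexicon itself.
COMPOUNDS = {("tennis", "court"), ("lawn", "tennis")}


def split_gloss(gloss):
    """Split a gloss into (definition, examples).

    Assumes the definition is the text before the first double quote and
    every double-quoted segment is an example sentence.
    """
    examples = re.findall(r'"([^"]*)"', gloss)
    definition = gloss.split('"', 1)[0].strip().rstrip(";").strip()
    return definition, examples


def tokenize(text):
    """Lowercase word tokens, merging adjacent words that form a known compound."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in COMPOUNDS:
            tokens.append(" ".join(pair))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


gloss = 'a court on which tennis is played; "they met at the tennis court"'
definition, examples = split_gloss(gloss)
print(definition)             # a court on which tennis is played
print(examples)               # ['they met at the tennis court']
print(tokenize(examples[0]))  # ['they', 'met', 'at', 'the', 'tennis court']
```

Word sense disambiguation and logical form transformation would then operate on these tokens; they are not attempted here.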
eXtended WordNet example
“Tennis court: A court on which tennis is played.”
Diagram: tennis court --def--> court; court --location-of--> play; play --object--> tennis; tennis is linked to the synset {“tennis”, “lawn tennis”}
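The tennis court diagram can be read as a small labelled graph. The sketch below encodes it as (head, relation, tail) triples and walks the links outward from "tennis court". The edge directions are an interpretation of the diagram, and the triple encoding and function name are assumptions for illustration; this is not the eXtended WordNet file format.

```python
# Edges read off the diagram (directions are an assumed interpretation).
EDGES = [
    ("tennis court", "def", "court"),
    ("court", "location-of", "play"),
    ("play", "object", "tennis"),
]

# The word "tennis" in the gloss is linked to this synset.
SYNSET_OF = {"tennis": ("tennis", "lawn tennis")}


def reachable(start):
    """Concepts reachable from `start` by following edges, in discovery order."""
    seen, frontier = [], [start]
    while frontier:
        node = frontier.pop(0)
        for head, _relation, tail in EDGES:
            if head == node and tail not in seen:
                seen.append(tail)
                frontier.append(tail)
    return seen


print(reachable("tennis court"))  # ['court', 'play', 'tennis']
print(SYNSET_OF["tennis"])        # ('tennis', 'lawn tennis')
```

Following gloss words into their own synsets in this way is what gives each concept access to a broader context than the original WordNet pointers provide.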
eXtended WordNet format
Consists of four XML files, one for each part of speech:
Noun
Verb
Adjective
Adverb
The XML tags contain attributes that specify the relationships
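Reading attribute-bearing XML of this kind is straightforward with Python's standard library. The record below is entirely hypothetical: its tag names, attribute names and sense numbers are invented for illustration and are not copied from the eXtended WordNet distribution, so the real files should be consulted before adapting this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag and attribute names are made up for illustration.
SAMPLE = """
<entries pos="noun">
  <entry synset="tennis_court">
    <gloss>a court on which tennis is played</gloss>
    <word lemma="court" sense="1"/>
    <word lemma="tennis" sense="1"/>
    <word lemma="play" sense="1"/>
  </entry>
</entries>
"""


def tagged_words(xml_text):
    """Map each entry's synset id to its (lemma, sense number) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter("entry"):
        result[entry.get("synset")] = [
            (word.get("lemma"), int(word.get("sense")))
            for word in entry.iter("word")
        ]
    return result


print(tagged_words(SAMPLE))
# {'tennis_court': [('court', 1), ('tennis', 1), ('play', 1)]}
```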
eXtended WordNet- Applications
Core knowledge base for applications:
Question Answering
Information Retrieval
Information Extraction
Summarization
Natural Language Generation
Inferences
Other knowledge-intensive applications
Further Reading
W3C RDF/OWL Representation of WordNet
http://www.w3.org/TR/wordnet-rdf/
eXtended WordNet Format/algorithm
http://xwn.hlt.utdallas.edu/wsd.html
Current research at Princeton
http://wordnet.cs.princeton.edu/projects.html
Related Projects (APIs, Web Interface, Extension)
http://wordnet.princeton.edu/wordnet/related-projects/