WordNet and eXtended WordNet
Sriram Rajaraman
23 November 2009
Objective

Introduce the idea of a semantic lexicon (a lexical ontology), especially WordNet and eXtended WordNet
University of Texas at Dallas
Erik Jonsson School of Engineering and Computer Science
Focus
Introduction
WordNet
eXtended WordNet
Summary
References
1. WordNet: http://wordnet.princeton.edu/
2. eXtended WordNet: http://xwn.hlt.utdallas.edu/
3. Christiane Fellbaum (ed.), "WordNet: An Electronic Lexical Database", MIT Press, 1999 (c1998).
4. George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, "Introduction to WordNet: An On-line Lexical Database", core working paper.
5. Rada Mihalcea and Dan I. Moldovan, "eXtended WordNet: Progress Report", Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, 2001.
6. Sanda M. Harabagiu, George A. Miller, and Dan I. Moldovan, "WordNet 2 - A Morphologically and Semantically Enhanced Resource", SIGLEX 1999.
Introduction
Traditional Dictionary
What is available:
spelling
pronunciation
inflected and derivative forms
etymology
part of speech
definitions
illustrative uses of alternative senses
synonyms and antonyms
special usage notes
Tree
Ref: http://www.merriam-webster.com/dictionary/Tree
Main Entry: tree
Pronunciation: \ˈtrē\
Function: noun
Etymology: Middle English, from Old English
trēow; akin to Old Norse trē tree, Greek drys,
Sanskrit dāru wood
Date: before 12th century
- a woody perennial plant having a single
usually elongate main stem generally with few
or no branches on its lower part
Drawbacks of a traditional dictionary
What is missing:
It does not say, for example, that trees have roots, that they consist of cells with cellulose walls, or even that they are living organisms
Which sense of the superordinate term (hypernym) is meant: "plant" as a living plant or as an industrial plant
Coordinate terms (bushes, shrubs, ...)
Hyponyms, i.e. types of trees (pine, tropical, deciduous, ...)
Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis); arguably this belongs in an encyclopedia!
How can we improve?
The missing information is structural: each word points upward to its superordinate (hypernym), but not sideways to its coordinate terms or downward to its hyponyms.
These restrictions come from alphabetical ordering and from budget and size constraints, all of which an electronic lexical database can overcome.
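To make the "upward only" limitation concrete, here is a minimal sketch, assuming a hypothetical toy lexicon in which each word stores only a pointer to its hypernym, as a printed definition implicitly does. The words come from the tree example above; the helper names `build_hyponyms` and `coordinates` are illustrative, not part of any WordNet API. Inverting the upward pointers recovers the downward (hyponym) and sideways (coordinate) links that a printed dictionary never lists.

```python
from collections import defaultdict

# Hypothetical toy lexicon: like a dictionary entry, each word stores only an
# upward pointer to its superordinate term (hypernym).
HYPERNYM = {
    "tree": "plant",
    "shrub": "plant",
    "bush": "plant",
    "pine": "tree",
    "deciduous tree": "tree",
    "tropical tree": "tree",
}


def build_hyponyms(hypernym):
    """Invert the upward pointers to obtain downward (hyponym) pointers."""
    hyponyms = defaultdict(set)
    for word, parent in hypernym.items():
        hyponyms[parent].add(word)
    return hyponyms


def coordinates(word, hypernym, hyponyms):
    """Sideways pointers: the other words that share this word's hypernym."""
    parent = hypernym.get(word)
    if parent is None:
        return set()
    return hyponyms.get(parent, set()) - {word}


HYPONYMS = build_hyponyms(HYPERNYM)
print(sorted(HYPONYMS["tree"]))                         # ['deciduous tree', 'pine', 'tropical tree']
print(sorted(coordinates("tree", HYPERNYM, HYPONYMS)))  # ['bush', 'shrub']
```

The inversion is only possible because the whole lexicon is in memory at once, which is the point of moving from an alphabetical printed book to an electronic database.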
What is WordNet?
WordNet is a lexical database for the English language.
WordNet 3.0 has [1]:
– 117,097 nouns (the average noun has 1.23 senses)
– 11,488 verbs (the average verb has 2.16 senses)
– 22,141 adjectives
– 4,601 adverbs
Created and maintained at the Cognitive Science Laboratory of Princeton University
Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable)
Interfaces are available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RiTa WordNet, pywordnet, ...)
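The "average number of senses" figures on this slide are a ratio: total word senses divided by distinct words. A minimal sketch, assuming a hypothetical toy inventory (the sense counts below are made up and are not WordNet data), computes that ratio. It also reads the slide's noun figures backwards, which yields only an approximate total because 1.23 is itself rounded.

```python
def average_polysemy(sense_counts):
    """Average number of senses per distinct word: total senses / distinct words."""
    if not sense_counts:
        raise ValueError("sense inventory is empty")
    return sum(sense_counts.values()) / len(sense_counts)


# Hypothetical toy inventory (word -> number of noun senses), not real WordNet data.
toy_nouns = {"car": 5, "tree": 3, "pine": 2, "railcar": 1}
print(average_polysemy(toy_nouns))  # 2.75

# Reading the slide's figures backwards gives only a rough estimate of the
# total number of noun senses, because 1.23 is itself a rounded average.
print(round(117_097 * 1.23))  # 144029
```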
WordNet Structure
Words in WordNet are organized into synsets
There are four disjoint kinds of synsets, containing either:
Nouns
Verbs
Adjectives
Adverbs
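Because every synset belongs to exactly one part of speech, a lookup is naturally keyed by part of speech as well as by word. The sketch below assumes hypothetical synset ids of the form `word.pos.n` and uses "fast", which can be a noun, a verb, an adjective, or an adverb. The one-letter codes mirror WordNet's n/v/a/r convention; everything else is illustrative.

```python
from enum import Enum


class POS(Enum):
    # One-letter codes follow WordNet's n/v/a/r convention.
    NOUN = "n"
    VERB = "v"
    ADJECTIVE = "a"
    ADVERB = "r"


# Hypothetical toy synsets: (synset id, part of speech, member words).
# "fast" is used as a noun, a verb, an adjective and an adverb.
SYNSETS = [
    ("fast.n.1", POS.NOUN, {"fast"}),
    ("fast.v.1", POS.VERB, {"fast"}),
    ("fast.a.1", POS.ADJECTIVE, {"fast"}),
    ("fast.r.1", POS.ADVERB, {"fast"}),
]


def index_by_pos(synsets):
    """Build one word -> synset-ids index per part of speech.

    Each synset lands in exactly one partition, so the four indexes are disjoint.
    """
    index = {pos: {} for pos in POS}
    for sid, pos, words in synsets:
        for word in words:
            index[pos].setdefault(word, []).append(sid)
    return index


INDEX = index_by_pos(SYNSETS)
for pos in POS:
    print(pos.name, INDEX[pos].get("fast", []))
```

The same spelling appears in all four partitions, but each entry points to a different synset. That is what "disjoint kinds of synsets" means in practice.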
What is a synset?
Basic unit of WordNet
A group of synonymous words that refer to a common semantic concept
A word may belong to more than one synset; its first sense is the most frequent one
Words also include collocations ("eye contact", "mix up")
Example
Synset example
"car" as in
{car, auto, automobile, machine, motorcar}
{car, railcar, railway car, railroad car}
"Chocolate" as in-
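The two "car" synsets can be modeled directly. In this minimal sketch, the ids `car-1` and `car-2` are hypothetical labels, not WordNet offsets. The sketch also simplifies sense ordering: it assumes synsets are supplied in the sense order of every word they contain, so the first entry for a word stands for its most frequent sense. Real WordNet orders senses per word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    sid: str      # hypothetical identifier, not a WordNet offset
    words: tuple  # synonymous words (including collocations) for one concept


# The two "car" synsets from the slide.
AUTOMOBILE = Synset("car-1", ("car", "auto", "automobile", "machine", "motorcar"))
RAILCAR = Synset("car-2", ("car", "railcar", "railway car", "railroad car"))


def build_sense_index(synsets_in_sense_order):
    """Map each word to the synsets it belongs to.

    Simplifying assumption: synsets are supplied in the sense order of every
    word they contain, so index[word][0] is that word's first (most frequent) sense.
    """
    index = {}
    for synset in synsets_in_sense_order:
        for word in synset.words:
            index.setdefault(word, []).append(synset)
    return index


SENSES = build_sense_index([AUTOMOBILE, RAILCAR])


def synonyms(word, sense=0):
    """Other members of the word's synset for the given sense (0 = first sense)."""
    return [w for w in SENSES[word][sense].words if w != word]


print([s.sid for s in SENSES["car"]])  # ['car-1', 'car-2']
print(synonyms("car"))                 # ['auto', 'automobile', 'machine', 'motorcar']
print(synonyms("car", 1))              # ['railcar', 'railway car', 'railroad car']
```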
How are synsets related?
Each synset has an associated list of pointers that express its relationships with other synsets
WordNet defines 17 relations:
10 between synsets
5 between word senses
"gloss" (between a synset and a sentence, i.e. a textual definition for each synset)
"frame" (between a synset and a verb construction pattern)
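The split between synset-level and word-sense-level relations can be captured with a single pointer type that optionally names the specific words involved. The sketch below uses hypernymy as a relation between synsets and antonymy as a relation between word senses, which is how WordNet classifies them. It also attaches the gloss and frame relations as plain fields. The ids, the toy glosses, and the frame string are hypothetical illustrations, and the class layout is not WordNet's file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pointer:
    relation: str                      # e.g. "hypernym" or "antonym"
    target: str                        # id of the target synset
    source_word: Optional[str] = None  # set only for word-sense (lexical) pointers
    target_word: Optional[str] = None

    @property
    def is_lexical(self) -> bool:
        return self.source_word is not None


@dataclass
class Synset:
    sid: str
    words: List[str]
    gloss: str  # "gloss": the synset's textual definition
    pointers: List[Pointer] = field(default_factory=list)
    frames: List[str] = field(default_factory=list)  # "frame": verb synsets only


def semantic_pointers(synset):
    """Pointers that relate whole synsets."""
    return [p for p in synset.pointers if not p.is_lexical]


def lexical_pointers(synset):
    """Pointers that relate specific word senses."""
    return [p for p in synset.pointers if p.is_lexical]


# Hypothetical toy entries; ids, glosses and the frame string are illustrative.
car = Synset("car-1", ["car", "auto", "automobile"], "a motor vehicle with four wheels",
             pointers=[Pointer("hypernym", "motor_vehicle-1")])
hot = Synset("hot-1", ["hot"], "having a high temperature",
             pointers=[Pointer("antonym", "cold-1", source_word="hot", target_word="cold")])
run = Synset("run-1", ["run"], "move fast using one's feet", frames=["Somebody ----s"])

print([p.relation for p in semantic_pointers(car)])    # ['hypernym']
print([p.target_word for p in lexical_pointers(hot)])  # ['cold']
print(run.frames)                                      # ['Somebody ----s']
```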
WordNet relations
Applications of WordNet
Information Extraction
Information Retrieval
Question Answering
Word Sense Disambiguation
Text Inference
Coreference, coherence and metonymy
Knowledge acquisition
Internet search engines
Limitations of WordNet
Designed as a semantic lexicon, not a knowledge base
Limited connections between topically related words
Lack of morphological relations (a separate algorithm handles morphology)
Lack of selectional restrictions
And more... [6]
eXtended WordNet [2]
A project at the Human Language Technology Research Institute at the University of Texas at Dallas (http://xwn.hlt.utdallas.edu)
Provides several important enhancements over WordNet 2.0, intended to remedy the current limitations of WordNet
Current version: eXtended WordNet 2.0 (xwn 2.0-1.1)
Objectives of eXtended WordNet
Exploit the rich information available in synset glosses (a gloss is a sentence, i.e. a textual definition for each synset)
Add semantic and logical enhancements to WordNet
Increase the connectivity among synsets by at least one order of magnitude
Enable access to a broader context for each concept
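"Connectivity" can be read as the average number of links per synset. The goal is to raise it by turning every sense-tagged content word in a gloss into a link. The sketch below assumes a hypothetical four-synset toy graph and reuses the tennis court gloss from a later slide. It only illustrates how gloss-derived links raise the ratio; it makes no claim about eXtended WordNet's actual counts.

```python
def edge_count(graph):
    """Total number of links in a synset -> linked-synsets graph."""
    return sum(len(targets) for targets in graph.values())


def add_gloss_links(pointers, gloss_links):
    """Return a new graph with gloss-derived links merged in; inputs are not modified."""
    merged = {node: set(targets) for node, targets in pointers.items()}
    for node, targets in gloss_links.items():
        merged.setdefault(node, set()).update(t for t in targets if t != node)
    return merged


# Hypothetical toy graph: existing WordNet-style pointers between synsets.
POINTERS = {"tennis_court": {"court"}, "court": set(), "tennis": set(), "play": set()}
# Hypothetical sense-tagged content words of the gloss
# "A court on which tennis is played", each already linked to a synset.
GLOSS_LINKS = {"tennis_court": {"court", "tennis", "play"}}

merged = add_gloss_links(POINTERS, GLOSS_LINKS)
print(edge_count(POINTERS) / len(POINTERS))  # 0.25 links per synset
print(edge_count(merged) / len(merged))      # 0.75 links per synset
```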
What does eXtended WordNet do? [5]
Preprocessing and Parsing: separation of glosses into definitions and examples, tokenization, and identification of compound words
Word Sense Disambiguation: all words in a gloss are tagged with the appropriate senses and linked to the corresponding synsets
Logical Form Transformation: glosses → logical forms
Topical Relations: connections are established between words based on their context/topic
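The preprocessing step can be sketched in a few small functions. This sketch rests on several assumptions: the gloss parts are separated by ';', examples are double-quoted and contain no ';', and compound words are found by greedy longest match against a known list. The compound list and the gloss string are hypothetical toys, and this is not the project's actual preprocessor. Word sense disambiguation, the next step, is out of scope here.

```python
import re


def split_gloss(gloss):
    """Separate a gloss into its definition and its quoted examples.

    Assumes ';'-separated parts, with examples in double quotes and no ';' inside them.
    """
    parts = [p.strip() for p in gloss.split(";") if p.strip()]
    examples = [p.strip('"') for p in parts if p.startswith('"')]
    definition = "; ".join(p for p in parts if not p.startswith('"'))
    return definition, examples


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", text.lower())


def join_compounds(tokens, compounds, max_len=3):
    """Greedy longest match: merge token runs that form a known compound word."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in compounds:
                out.append(candidate)
                i += n
                break
    return out


# Hypothetical compound list and toy gloss, for illustration only.
COMPOUNDS = {"tennis court", "lawn tennis"}
gloss = 'a court on which tennis is played; "they met at the tennis court"'

definition, examples = split_gloss(gloss)
print(definition)  # a court on which tennis is played
print(join_compounds(tokenize(examples[0]), COMPOUNDS))
# ['they', 'met', 'at', 'the', 'tennis court']
```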
eXtended WordNet example
Gloss: "Tennis court: A court on which tennis is played."
Figure: tennis court -def-> court -location-of-> play -object-> tennis {"tennis", "lawn tennis"}
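The diagram's relations can be stored as (head, relation, tail) triples and searched. The search shows how the gloss connects "tennis court" to the topically related concept "tennis", including through its synonym "lawn tennis". The triple format and the breadth-first search are illustrative choices for this sketch, not eXtended WordNet's logical-form notation or algorithm.

```python
from collections import deque

# Relations read off the slide's diagram, stored as (head, relation, tail) triples.
# This triple format is an illustrative choice, not eXtended WordNet's notation.
TRIPLES = [
    ("tennis court", "def", "court"),
    ("court", "location-of", "play"),
    ("play", "object", "tennis"),
]
# The synset {tennis, lawn tennis}: both words resolve to the same node.
NODE_OF = {"tennis": "tennis", "lawn tennis": "tennis"}


def find_path(triples, start, goal):
    """Breadth-first search for the shortest chain of relations from start to goal."""
    graph = {}
    for head, rel, tail in triples:
        graph.setdefault(head, []).append((rel, tail))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, tail in graph.get(node, []):
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, path + [(node, rel, tail)]))
    return None


for step in find_path(TRIPLES, "tennis court", NODE_OF["lawn tennis"]):
    print(step)
# ('tennis court', 'def', 'court')
# ('court', 'location-of', 'play')
# ('play', 'object', 'tennis')
```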
eXtended WordNet format
Consists of four XML files, one for each part of speech:
Noun
Verb
Adjective
Adverb
The XML tags contain attributes that specify the relationships
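Since relationships are encoded as attributes on XML tags, the standard library's ElementTree is enough to pull them out. The fragment below is entirely hypothetical. The element and attribute names (`entry`, `word`, `synset`, `lemma`, `sense`) and the sense numbers are placeholders chosen for illustration and are not the real eXtended WordNet schema. Consult the project's format documentation before parsing the actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element and attribute names ("entry", "word",
# "synset", "lemma", "sense") are illustrative, not the real XWN schema.
SAMPLE = """
<entry synset="tennis_court" pos="noun">
  <gloss>a court on which tennis is played</gloss>
  <word lemma="court" sense="1">court</word>
  <word lemma="tennis" sense="1">tennis</word>
  <word lemma="play" sense="1">played</word>
</entry>
"""


def sense_links(xml_text):
    """Return (source synset, lemma, sense number) for every tagged word."""
    root = ET.fromstring(xml_text.strip())
    source = root.get("synset")
    return [(source, w.get("lemma"), int(w.get("sense"))) for w in root.iter("word")]


for link in sense_links(SAMPLE):
    print(link)
```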
eXtended WordNet: Applications
Core knowledge base for applications such as:
Question Answering
Information Retrieval
Information Extraction
Summarization
Natural Language Generation
Inferences
Other knowledge-intensive applications
Further Reading
W3C RDF/OWL Representation of WordNet: http://www.w3.org/TR/wordnet-rdf/
eXtended WordNet format/algorithm: http://xwn.hlt.utdallas.edu/wsd.html
Current research at Princeton: http://wordnet.cs.princeton.edu/projects.html
Related projects (APIs, web interface, extensions): http://wordnet.princeton.edu/wordnet/related-projects/