
CSCE 771
Natural Language Processing
Lecture 18
Ontologies and WordNet
Topics
• Ontologies
• WordNet
• Overview of Meaning
Readings:
• Text 13.5
• NLTK book Chapter 2
March 25, 2013
Overview
Last Time (Programming)
• Chunking
• Chunking with NLTK
• HW 5
• Project Ideas
Today
• app.ChunkParser under NLTK
Readings:
• Chapter 7
• http://www.nltk.org/howto
• http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Next Time:
CSCE 771 Spring 2013
Ontologies – the old meaning
http://www.merriam-webster.com/dictionary/ontology
1. : a branch of metaphysics concerned with the nature
and relations of being
2. : a particular theory about the nature of being or the
kinds of things that have existence
Ontologies – the new (CS) meaning
http://en.wikipedia.org/wiki/Ontology_(information_science)
“In computer science and information science, an ontology
formally represents knowledge as a set of concepts
within a domain, and the relationships between pairs of
concepts.”
"Toward Principles for the Design of Ontologies Used for
Knowledge Sharing" by Tom Gruber 1993
• “An ontology is a formal, explicit specification of a
shared conceptualization.”
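To make "a set of concepts within a domain, and the relationships between pairs of concepts" concrete, here is a minimal sketch, assuming a hypothetical `Ontology` class that stores each named relation as a set of (source, target) concept pairs. The concepts and the relation names `is_a` and `part_of` are illustrative only; real ontology languages such as OWL add typing, axioms and reasoning on top of this bare structure.

```python
from collections import defaultdict


class Ontology:
    """Hypothetical toy ontology: a set of concepts plus named binary relations."""

    def __init__(self):
        self.concepts = set()
        # relation name -> set of (source, target) concept pairs
        self.relations = defaultdict(set)

    def add(self, source, relation, target):
        """Record that `relation` holds from `source` to `target`."""
        self.concepts.update((source, target))
        self.relations[relation].add((source, target))

    def related(self, source, relation):
        """Concepts reachable from `source` by one `relation` edge."""
        return {t for (s, t) in self.relations[relation] if s == source}


onto = Ontology()
onto.add("dog", "is_a", "canine")
onto.add("canine", "is_a", "mammal")
onto.add("window", "part_of", "building")

print(sorted(onto.concepts))        # ['building', 'canine', 'dog', 'mammal', 'window']
print(onto.related("dog", "is_a"))  # {'canine'}
```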
http://en.wikipedia.org/wiki/Ontology_(information_science)
Gruber elaborating
"An ontology is a description (like a formal
specification of a program) of the concepts and
relationships that can formally exist for an agent or a
community of agents. This definition is consistent
with the usage of ontology as set of concept
definitions, but more general. And it is a different
sense of the word than its use in philosophy." (Gruber 2001)
Focus Levels of Ontologies
Generic
Core
Domain
Task
Application
Examples of in-use Ontologies
Medical
• UMLS
• SNOMED-RT
• GALEN
• MEDLINE
Linguistics
• WordNet (George Miller, Princeton, 1990s)
• GOLD (General Ontology for Linguistic Description) http://linguistics-ontology.org/
Early OWL versions
OWL provides three increasingly expressive
sublanguages:
1. OWL Lite supports those users primarily needing a
classification hierarchy and simple constraints
2. OWL DL supports those users who want the
maximum expressiveness while retaining
computational completeness (all conclusions are
guaranteed to be computable) and
decidability (all computations will finish in finite time).
3. OWL Full is meant for users who want maximum
expressiveness and the syntactic freedom of RDF
with no computational guarantees
http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.3
OWL 2
The OWL 2 Web Ontology Language, informally OWL 2,
is an ontology language for the Semantic Web with
formally defined meaning.
OWL 2 ontologies provide classes, properties,
individuals, and data values and are stored as
Semantic Web documents.
OWL 2 ontologies can be used along with information
written in RDF, and OWL 2 ontologies themselves are
primarily exchanged as RDF documents.
http://www.w3.org/TR/owl2-overview/
OWL 2 relationships to other languages
http://www.w3.org/TR/owl2-overview/#Semantics
Ontology tools: editors
Protégé: http://protege.stanford.edu/
Semantic Web
Web: static web pages
Web 2.0 (~1999): http://en.wikipedia.org/wiki/Web_2.0
Semantic Web
"The Semantic Web is not a separate Web but an extension of the
current one, in which information is given well-defined
meaning, better enabling computers and people to work in
cooperation." It is a source for retrieving information from the web
(using web spiders that read RDF files) and for accessing the data
through Semantic Web agents or Semantic Web services.
Source: "The Semantic Web" by Tim Berners-Lee, James
Hendler, and Ora Lassila, Scientific American, 2001
Basic NLTK Corpus Functionality
Example                       Description
fileids()                     the files of the corpus
fileids([categories])         the files of the corpus corresponding to these categories
categories()                  the categories of the corpus
categories([fileids])         the categories of the corpus corresponding to these files
raw()                         the raw content of the corpus
raw(fileids=[f1,f2,f3])       the raw content of the specified files
raw(categories=[c1,c2])       the raw content of the specified categories
words()                       the words of the whole corpus
words(fileids=[f1,f2,f3])     the words of the specified fileids
words(categories=[c1,c2])     the words of the specified categories
sents()                       the sentences of the whole corpus
sents(fileids=[f1,f2,f3])     the sentences of the specified fileids
sents(categories=[c1,c2])     the sentences of the specified categories
abspath(fileid)               the location of the given file on disk
encoding(fileid)              the encoding of the file (if known)
open(fileid)                  open a stream for reading the given corpus file
root()                        the path to the root of locally installed corpus
readme()                      the contents of the README file of the corpus
Reference: NLTK Book Chapter 2
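The reader methods in the table can be mimicked with a small in-memory class to see how the `fileids`/`categories` arguments select data. This is a hypothetical `ToyCorpusReader`, not NLTK's implementation: it assumes each file is a plain string, splits sentences naively on '.', and tokenizes on whitespace, so (unlike NLTK's readers) punctuation is discarded rather than returned as tokens.

```python
class ToyCorpusReader:
    """Hypothetical in-memory stand-in for a categorized NLTK corpus reader."""

    def __init__(self, files, categories):
        self._files = files        # fileid -> raw text
        self._cats = categories    # fileid -> list of category names

    def fileids(self, categories=None):
        if categories is None:
            return sorted(self._files)
        if isinstance(categories, str):
            categories = [categories]
        return sorted(f for f, cs in self._cats.items()
                      if any(c in cs for c in categories))

    def categories(self, fileids=None):
        if fileids is None:
            fileids = list(self._files)
        elif isinstance(fileids, str):
            fileids = [fileids]
        return sorted({c for f in fileids for c in self._cats[f]})

    def _select(self, fileids, categories):
        """Resolve the fileids/categories arguments to a list of fileids."""
        if categories is not None:
            return self.fileids(categories)
        if fileids is None:
            return self.fileids()
        return [fileids] if isinstance(fileids, str) else list(fileids)

    def raw(self, fileids=None, categories=None):
        return "\n".join(self._files[f] for f in self._select(fileids, categories))

    def sents(self, fileids=None, categories=None):
        # Naive: split on '.', then on whitespace; punctuation is discarded.
        return [s.split()
                for f in self._select(fileids, categories)
                for s in self._files[f].split(".") if s.strip()]

    def words(self, fileids=None, categories=None):
        return [w for sent in self.sents(fileids, categories) for w in sent]


corpus = ToyCorpusReader(
    {"news1.txt": "Stocks rose today. Markets cheered.",
     "fic1.txt": "The dragon slept."},
    {"news1.txt": ["news"], "fic1.txt": ["fiction"]},
)
print(corpus.fileids("news"))                # ['news1.txt']
print(corpus.categories())                   # ['fiction', 'news']
print(corpus.words(categories=["fiction"]))  # ['The', 'dragon', 'slept']
```

The made-up file names and texts are placeholders; with a real NLTK corpus such as Brown the same call patterns apply, but tokenization comes from the corpus reader itself.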
More from Chapter 2 of NLTK Book
2.2 Conditional Frequency Distributions
• Conditions and Events
• Counting Words by Genre
• Plotting and Tabulating Distributions
• Generating Random Text with Bigrams
2.3 More Python: Reusing Code
• Functions
• Modules
2.4 Lexical Resources
• Wordlist Corpora
• A Pronouncing Dictionary
• Comparative Wordlists
• Shoebox and Toolbox Lexicons
2.5 WordNet
Reference: NLTK Book Chapter 2
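NLTK's `nltk.ConditionalFreqDist` counts (condition, event) pairs. The sketch below reproduces the idea with `collections.Counter` only, for two of the uses listed under 2.2: counting words by genre, and using bigram counts to generate text. The tiny corpora are made up, and the hypothetical `generate` helper always moves to the most frequent successor, so its output is deterministic rather than random.

```python
from collections import Counter, defaultdict


def conditional_freq_dist(pairs):
    """Map each condition to a Counter of the events seen with it."""
    cfd = defaultdict(Counter)
    for condition, event in pairs:
        cfd[condition][event] += 1
    return cfd


# Counting words by genre: (genre, word) pairs from a made-up mini corpus.
pairs = [("news", w) for w in "the market fell and the market rose".split()]
pairs += [("romance", w) for w in "she said she could not".split()]
by_genre = conditional_freq_dist(pairs)
print(by_genre["news"]["market"], by_genre["romance"]["she"])  # 2 2

# Bigrams: condition = previous word, event = the word that follows it.
words = "the cat sat on the mat the cat sat".split()
bigrams = conditional_freq_dist(zip(words, words[1:]))


def generate(cfd, word, n):
    """Emit up to n words, always moving to the most frequent successor."""
    out = []
    for _ in range(n):
        out.append(word)
        if word not in cfd:
            break
        word = cfd[word].most_common(1)[0][0]
    return out


print(generate(bigrams, "the", 5))  # ['the', 'cat', 'sat', 'on', 'the']
```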
WordNet
George Miller, Princeton University
NLTK includes the English WordNet, with 155,287 words
and 117,659 synonym sets
Links:
• http://en.wikipedia.org/wiki/WordNet
• http://wordnet.princeton.edu/
• http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Reference: NLTK Book Chapter 2
WordNet
WordNet distinguishes between nouns, verbs,
adjectives and adverbs; it does not include
prepositions, determiners, etc.
Every synset contains a group of synonymous words or
collocations
Different senses of a word are in different synsets.
Nouns in WordNet
hypernyms: Y is a hypernym of X if every X is a (kind of)
Y (canine is a hypernym of dog, because every dog
is a member of the larger category of canines)
hyponyms: Y is a hyponym of X if every Y is a (kind of)
X (dog is a hyponym of canine)
coordinate terms: Y is a coordinate term of X if X and Y
share a hypernym (wolf is a coordinate term of dog,
and dog is a coordinate term of wolf)
holonym: Y is a holonym of X if X is a part of Y (building
is a holonym of window)
meronym: Y is a meronym of X if Y is a part of X
(window is a meronym of building)
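The noun relations above can be modelled with plain dictionaries. A minimal sketch, assuming a hand-built toy lexicon that contains only the examples from this slide (not real WordNet data): hyponyms and holonyms are obtained by inverting the hypernym and meronym maps, and coordinate terms are the other children of a word's hypernym.

```python
from collections import defaultdict

# Toy data taken from the slide's examples (not real WordNet data).
hypernym_of = {"dog": "canine", "wolf": "canine"}   # X -> its hypernym Y
meronyms_of = {"building": ["window"]}              # whole -> its parts


def invert(mapping):
    """Invert key -> value(s) into value -> set of keys."""
    inverse = defaultdict(set)
    for key, values in mapping.items():
        for value in ([values] if isinstance(values, str) else values):
            inverse[value].add(key)
    return inverse


hyponyms_of = invert(hypernym_of)   # canine -> {dog, wolf}
holonyms_of = invert(meronyms_of)   # window -> {building}


def coordinate_terms(x):
    """Other words that share x's hypernym."""
    parent = hypernym_of.get(x)
    return hyponyms_of[parent] - {x} if parent else set()


print(coordinate_terms("dog"))  # {'wolf'}
print(holonyms_of["window"])    # {'building'}
```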
Verbs in WordNet
hypernym: the verb Y is a hypernym of the verb X if the
activity X is a (kind of) Y (to perceive is a hypernym
of to listen)
troponym: the verb Y is a troponym of the verb X if the
activity Y is doing X in some manner (to lisp is a
troponym of to talk)
entailment: the verb Y is entailed by X if by doing X you
must be doing Y (to sleep is entailed by to snore)
coordinate terms: those verbs sharing a common
hypernym (to lisp and to yell)
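The verb relations can be checked the same way. The toy table below uses this slide's examples; giving *yell* the hypernym *talk* is an assumption made only so that *lisp* and *yell* come out as coordinate terms. Troponymy is treated here as the inverse of a one-step verb hypernym link, and entailment as a separate lookup, which makes the direction of each definition easy to test.

```python
# Toy verb relations from the slide's examples; yell -> talk is an assumption.
verb_hypernym = {"listen": "perceive", "lisp": "talk", "yell": "talk"}
entailments = {"snore": {"sleep"}}  # doing X -> must also be doing Y


def is_troponym(y, x):
    """y is a troponym of x: doing y is doing x in some manner."""
    return verb_hypernym.get(y) == x


def is_entailed_by(y, x):
    """y is entailed by x: if you are doing x you must be doing y."""
    return y in entailments.get(x, set())


def coordinate_verbs(v):
    """Verbs sharing v's hypernym, excluding v itself."""
    parent = verb_hypernym.get(v)
    if parent is None:
        return set()
    return {w for w, p in verb_hypernym.items() if p == parent and w != v}


print(is_troponym("lisp", "talk"), is_entailed_by("sleep", "snore"))  # True True
print(coordinate_verbs("lisp"))                                       # {'yell'}
```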
Adjectives/Adverbs in WordNet
Adjectives
• related nouns
• similar to
• participle of verb
Adverbs
• root adjectives
Knowledge Structure Example
defined by hypernym or IS A relationships
Example:
dog, domestic dog, Canis familiaris
=> canine, canid
=> carnivore
=> placental, placental mammal, eutherian mammal
=> mammal
=> vertebrate, craniate
=> chordate
=> animal, animate being, beast, brute, creature, fauna
=> ...
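The hypernym chain above is just repeated IS-A lookups. A minimal sketch, assuming each concept has a single hypernym (real WordNet synsets can have several) and keeping only the first lemma of each synset from the example; the chain stops at "animal" because the example elides the rest.

```python
# First lemma of each synset in the chain above; a single hypernym is assumed.
hypernym = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(concept):
    """Follow IS-A links upward until a concept with no recorded hypernym."""
    chain = [concept]
    while chain[-1] in hypernym:
        chain.append(hypernym[chain[-1]])
    return chain


# Print in the same indented "=>" layout as the example.
for depth, c in enumerate(hypernym_chain("dog")):
    print("  " * depth + ("=> " if depth else "") + c)
```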
Hypernym/Hyponym
Inverse relations
Hyponym == ISA
Hypernym == “contains the subset”
Examples
• car is a hyponym of vehicle; equivalently, vehicle is a hypernym of car
• dog is a hyponym of animal; equivalently, animal is a hypernym of dog
• “superordinate” is sometimes used instead of “hypernym”
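Because hyponymy and hypernymy are inverses, and IS-A is transitive, one upward walk answers both questions. A sketch over a hypothetical toy hierarchy; the intermediate `motor_vehicle` node is an assumption added only to show that car is still a hyponym of vehicle when they are not directly linked.

```python
# Hypothetical toy hierarchy (child -> parent); motor_vehicle is an assumed
# intermediate node, included to show that IS-A is transitive.
hypernym = {"car": "motor_vehicle", "motor_vehicle": "vehicle",
            "dog": "canine", "canine": "animal"}


def is_hyponym_of(x, y):
    """True if x IS-A y, i.e. y is reachable from x via hypernym links."""
    while x in hypernym:
        x = hypernym[x]
        if x == y:
            return True
    return False


def is_hypernym_of(y, x):
    """Inverse relation: y is a hypernym of x exactly when x is a hyponym of y."""
    return is_hyponym_of(x, y)


print(is_hyponym_of("car", "vehicle"), is_hypernym_of("vehicle", "car"))  # True True
print(is_hyponym_of("vehicle", "car"))                                    # False
```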
WordNet as an ontology
Hyponym == ISA
Meronymy: the part-of relation
wheel is part of car, so wheel is a meronym of car
Holonymy is the inverse of meronymy (car is a holonym of wheel)
Senses and Synonyms
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
one meaning: the first (01) noun (n) sense of car
>>> wn.synset('car.n.01').lemma_names
['car', 'auto', 'automobile', 'machine', 'motorcar']
synonymous words (or "lemmas")
Reference: NLTK Book Chapter 2
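Synset identifiers such as `car.n.01` follow the pattern lemma.pos.sense-number. A small hypothetical parser, assuming that pattern and splitting from the right so that any dots inside the lemma are kept; the codes n, v, a, s and r are WordNet's noun, verb, adjective, adjective-satellite and adverb tags.

```python
from typing import NamedTuple

# WordNet's part-of-speech codes as they appear in synset names.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective",
             "s": "adjective satellite", "r": "adverb"}


class SynsetName(NamedTuple):
    lemma: str
    pos: str
    sense: int


def parse_synset_name(name):
    """Split 'lemma.pos.NN' from the right so dots inside the lemma survive."""
    lemma, pos, sense = name.rsplit(".", 2)
    if pos not in POS_NAMES:
        raise ValueError(f"unknown part of speech {pos!r} in {name!r}")
    return SynsetName(lemma, pos, int(sense))


p = parse_synset_name("car.n.01")
print(p.lemma, POS_NAMES[p.pos], p.sense)  # car noun 1
```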
Definitions and examples
>>> wn.synset('car.n.01').definition
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> wn.synset('car.n.01').examples
['he needs a car to get to work']
Reference: NLTK Book Chapter 2
>>> wn.synsets('car')
[Synset('car.n.01'), Synset('car.n.02'),
Synset('car.n.03'), Synset('car.n.04'),
Synset('cable_car.n.01')]
>>> for synset in wn.synsets('car'):
...     print synset.lemma_names
...
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
Reference: NLTK Book Chapter 2
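The listing above shows one word (car) appearing in several synsets, one per sense, while each synset groups several synonymous lemmas. Inverting a synset-to-lemmas table gives a word-to-senses index, which is the kind of lookup `wn.synsets('car')` answers. A sketch that hard-codes the lemma lists printed above as its data:

```python
from collections import defaultdict

# Synset -> lemma names, copied from the session output above.
synset_lemmas = {
    "car.n.01": ["car", "auto", "automobile", "machine", "motorcar"],
    "car.n.02": ["car", "railcar", "railway_car", "railroad_car"],
    "car.n.03": ["car", "gondola"],
    "car.n.04": ["car", "elevator_car"],
    "cable_car.n.01": ["cable_car", "car"],
}

# Invert: each word maps to every synset (sense) it appears in.
senses = defaultdict(list)
for synset, lemmas in synset_lemmas.items():
    for lemma in lemmas:
        senses[lemma].append(synset)

print(senses["car"])       # all five synsets listed above, in order
print(senses["motorcar"])  # ['car.n.01']
```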
The WordNet Hierarchy
Hypernyms (up)
Hyponyms (down)
Meronyms: components
Holonyms: things they are contained in
Reference: NLTK Book Chapter 2
Synonyms and Lemmas
>>> motorcar = wn.synset('car.n.01')
>>> types_of_motorcar = motorcar.hyponyms()
>>> types_of_motorcar[26]
Synset('ambulance.n.01')
>>> sorted([lemma.name for synset in
... types_of_motorcar for lemma in synset.lemmas])
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer',
'ambulance', 'beach_waggon', … ]
Reference: NLTK Book Chapter 2
Meronyms and Holonyms
>>> wn.synset('tree.n.01').part_meronyms()
[Synset('burl.n.02'), Synset('crown.n.07'),
Synset('stump.n.01'), Synset('trunk.n.01'),
Synset('limb.n.02')]
>>> wn.synset('tree.n.01').substance_meronyms()
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
>>> wn.synset('tree.n.01').member_holonyms()
[Synset('forest.n.01')]
Reference: NLTK Book Chapter 2
>>> for synset in wn.synsets('mint', wn.NOUN):
...     print synset.name + ':', synset.definition
...
batch.n.02: (often followed by `of') a large number or amount or
extent
mint.n.02: any north temperate plant of the genus Mentha with
aromatic leaves and small mauve flowers
mint.n.03: any member of the mint family of plants
mint.n.04: the leaves of a mint plant used fresh or candied
mint.n.05: a candy that is flavored with a mint oil
mint.n.06: a plant where money is coined by authority of the
government
Reference: NLTK Book Chapter 2
Entailments
walking entails stepping
>>> wn.synset('walk.v.01').entailments()
[Synset('step.v.01')]
>>> wn.synset('eat.v.01').entailments()
[Synset('swallow.v.01'), Synset('chew.v.01')]
>>> wn.synset('tease.v.03').entailments()
[Synset('arouse.v.07'), Synset('disappoint.v.01')]
Reference: NLTK Book Chapter 2
Antonyms
Reference: NLTK Book Chapter 2
Semantic Similarity
>>> right = wn.synset('right_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')
>>> right.lowest_common_hypernyms(minke)
[Synset('baleen_whale.n.01')]
>>> right.lowest_common_hypernyms(orca)
[Synset('whale.n.02')]
>>> right.lowest_common_hypernyms(tortoise)
[Synset('vertebrate.n.01')]
>>> right.lowest_common_hypernyms(novel)
[Synset('entity.n.01')]
Reference: NLTK Book Chapter 2
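A lowest common hypernym is the most specific concept that is an ancestor of both synsets. The sketch below uses a simplified, hypothetical single-parent tree whose intermediate nodes are not WordNet's; NLTK's `lowest_common_hypernyms` returns a list because WordNet allows multiple hypernyms, which this sketch ignores. On this toy tree the four results line up with the session above.

```python
# Hypothetical single-parent tree; intermediate nodes are simplified, not WordNet's.
parent = {
    "right_whale": "baleen_whale", "minke_whale": "rorqual",
    "rorqual": "baleen_whale", "baleen_whale": "whale",
    "orca": "toothed_whale", "toothed_whale": "whale", "whale": "vertebrate",
    "tortoise": "turtle", "turtle": "vertebrate", "vertebrate": "entity",
    "novel": "fiction", "fiction": "entity",
}


def ancestors(s):
    """s followed by its hypernyms, nearest first, up to the root."""
    chain = [s]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """Nearest ancestor of a that is also an ancestor of b (None if none)."""
    b_ancestors = set(ancestors(b))
    for s in ancestors(a):
        if s in b_ancestors:
            return s
    return None


for other in ["minke_whale", "orca", "tortoise", "novel"]:
    print(other, lowest_common_hypernym("right_whale", other))
```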
Generality/Specificity and Depth
>>> wn.synset('baleen_whale.n.01').min_depth()
14
>>> wn.synset('whale.n.02').min_depth()
13
>>> wn.synset('vertebrate.n.01').min_depth()
8
>>> wn.synset('entity.n.01').min_depth()
0
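`min_depth` measures how far a synset sits below the root along its shortest hypernym path; a larger value means a more specific concept. With multiple inheritance the shortest and longest routes to the root can differ, so a breadth-first search is a natural way to compute the minimum. A sketch on a hypothetical small hierarchy in which `person` has two hypernyms (the node names are illustrative, not copied from WordNet):

```python
from collections import deque

# Hypothetical DAG: concept -> list of hypernyms ('person' has two).
hypernyms = {
    "entity": [],
    "physical_entity": ["entity"],
    "object": ["physical_entity"],
    "whole": ["object"],
    "living_thing": ["whole"],
    "organism": ["living_thing"],
    "causal_agent": ["physical_entity"],
    "person": ["organism", "causal_agent"],
}


def min_depth(s):
    """Length of the shortest hypernym path from s up to a root."""
    queue, seen = deque([(s, 0)]), {s}
    while queue:
        node, depth = queue.popleft()
        if not hypernyms.get(node):  # no hypernyms: node is a root
            return depth
        for h in hypernyms[node]:
            if h not in seen:
                seen.add(h)
                queue.append((h, depth + 1))
    return None  # only reached if no root is reachable (e.g. a cycle)


print(min_depth("entity"), min_depth("organism"), min_depth("person"))  # 0 5 3
```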
Similarity Scores from Right Whale
>>> right.path_similarity(minke)
0.25
>>> right.path_similarity(orca)
0.16666666666666666
>>> right.path_similarity(tortoise)
0.076923076923076927
>>> right.path_similarity(novel)
0.043478260869565216
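Path similarity scores two synsets as 1 / (length of the shortest hypernym/hyponym path between them + 1), so a synset compared with itself scores 1 and distant pairs approach 0. A sketch on a hypothetical toy tree: its shape was chosen so that the right whale/minke path (length 3) and right whale/orca path (length 5) reproduce the first two scores above, but it is not WordNet's actual hierarchy. Pairs with no common ancestor return `None`.

```python
# Hypothetical toy tree (child -> parent), chosen so that the path lengths
# right_whale-minke_whale (3) and right_whale-orca (5) match the scores above.
parent = {
    "right_whale": "baleen_whale", "minke_whale": "rorqual",
    "rorqual": "baleen_whale", "baleen_whale": "whale",
    "orca": "dolphin", "dolphin": "toothed_whale", "toothed_whale": "whale",
}


def ancestor_distances(s):
    """Map s and each of its ancestors to its distance (in edges) from s."""
    dist, d = {}, 0
    while s is not None:
        dist[s] = d
        s = parent.get(s)
        d += 1
    return dist


def path_similarity(a, b):
    """1 / (shortest path length between a and b + 1); None if unconnected."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return 1 / (min(da[c] + db[c] for c in common) + 1)


print(path_similarity("right_whale", "minke_whale"))  # 0.25
print(path_similarity("right_whale", "orca"))         # 0.16666666666666666
```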
Googlecode - HowTo
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
WordNet Interface
>>> from nltk.corpus import wordnet as wn
Reference: http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html