Presentation_Hao_Li - Programming Systems Lab

WordNet ® and its Java API

♦ Introduction to WordNet

♦ WordNet API for Java


Name: Hao Li
Uni: hl2489
Introduction to WordNet ®
1. WordNet® is a large lexical database of English, similar in some ways to a dictionary. It was
developed by the Cognitive Science Laboratory of Princeton University.
2. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of
cognitive synonyms (synsets), each expressing a distinct concept.
3. In WordNet, synsets are interlinked by means of conceptual-semantic and
lexical relations.
4. WordNet is freely and publicly available for download and has APIs for
different programming languages. WordNet's structure makes it a useful tool for
computational linguistics and natural language processing.
WordNet API for JAVA(1)

Method Summary of Class WordNetDatabase
― abstract String[] getBaseFormCandidates(String inflection, SynsetType type)
Returns lemmas representing word forms that might be present in WordNet.
√ static WordNetDatabase getFileInstance()
Returns an implementation of this class that can access the WordNet database by searching files
on the local file system.
― Synset[] getSynsets(String wordForm)
Returns all synsets that contain the specified word form or a morphological variation of that
word form.
√ Synset[] getSynsets(String wordForm, SynsetType type)
Returns only the synsets of a particular type (e.g., noun) that contain a word form or
morphological variation of that form.
― abstract Synset[] getSynsets(String wordForm, SynsetType type, boolean useMorphology)
Returns only the synsets of a particular type (e.g., noun) that contain a word form matching the
specified text or one of that word form's variants.
WordNet API for JAVA(2)
Method Summary of Class Synset
― WordSense[] getAntonyms(String wordForm)
Returns the antonyms (words with the opposite meaning), if any, associated with a word form in this synset.
√ String getDefinition()
Retrieve a short description / definition of this concept.
― WordSense[] getDerivationallyRelatedForms(String wordForm)
Returns word forms that are derivationally related to the one specified.
√ int getTagCount(String wordForm)
Returns a number that's intended to provide an approximation of how frequently the specified word form is
used to represent this meaning relative to how often it's used to represent other meanings.
― SynsetType getType()
Retrieve the type of synset this object represents.
― String[] getUsageExamples()
Retrieve sentences showing examples of how this synset is used.
√ String[] getWordForms()
Retrieve the word forms.
Method used in the project(1)
WordNetDatabase.getSynsets(String wordForm, SynsetType type)

Take the word “pig” as an example:
Synset[0]=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine
Synset[1]=Noun@10612210[slob,sloven,pig,slovenly person] - a coarse obnoxious person
Synset[2]=Noun@10179649[hog,pig] - a person regarded as greedy and pig-like
Synset[3]=Noun@9879144[bull,cop,copper,fuzz,pig] - uncomplimentary terms for a policeman
Synset[4]=Noun@3935116[pig bed,pig] - mold consisting of a bed of sand in which pig iron is cast
Synset[5]=Noun@3934998[pig] - a crude block of metal (lead or iron) poured from a smelting furnace
Method used in the project(2)
Synset.getDefinition()
Take Synset[0] of the word “pig” as an example:
domestic swine
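
The two lookups above can be sketched without a WordNet installation. This is a minimal illustration, not the JAWS implementation: SynsetLookupSketch and its nested MiniSynset are hypothetical stand-ins that borrow the method names getSynsets, getWordForms and getDefinition from the API summary, and the data is limited to three of the noun synsets listed for “pig”. Unlike the real API, the stand-in matches word forms exactly and does no morphological processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory sketch; it does not read the WordNet database files.
class SynsetLookupSketch {

    // Stand-in for a synset: a set of word forms sharing one definition.
    static final class MiniSynset {
        private final String definition;
        private final List<String> wordForms;

        MiniSynset(String definition, String... wordForms) {
            this.definition = definition;
            this.wordForms = Arrays.asList(wordForms);
        }

        List<String> getWordForms() { return wordForms; }

        String getDefinition() { return definition; }
    }

    // Three noun synsets copied from the "pig" example in the slides.
    private static final List<MiniSynset> NOUNS = Arrays.asList(
        new MiniSynset("domestic swine",
            "hog", "pig", "grunter", "squealer", "Sus scrofa"),
        new MiniSynset("a person regarded as greedy and pig-like",
            "hog", "pig"),
        new MiniSynset("uncomplimentary terms for a policeman",
            "bull", "cop", "copper", "fuzz", "pig"));

    // Returns every stored synset that contains the exact word form.
    static List<MiniSynset> getSynsets(String wordForm) {
        List<MiniSynset> hits = new ArrayList<>();
        for (MiniSynset synset : NOUNS) {
            if (synset.getWordForms().contains(wordForm)) {
                hits.add(synset);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print each matching synset as "word forms - definition".
        for (MiniSynset synset : getSynsets("pig")) {
            System.out.println(synset.getWordForms() + " - " + synset.getDefinition());
        }
    }
}
```

With JAWS itself, the corresponding calls would be WordNetDatabase.getFileInstance(), then getSynsets(wordForm, type) on the returned database, and getDefinition() on each resulting Synset, as listed in the method summaries.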
Method used in the project(3)
Synset.getTagCount(String wordForm)

It is a very useful method. It returns an approximation of how frequently the specified word form is
used to represent this meaning, relative to how often it is used to represent other meanings.

As I understand it, this method has two uses:
(1) Analyse the different synsets of the same word.
(2) Analyse the different words of the same synset.
Analyse the different synsets of the same word.
Synset.getTagCount(String wordForm)
The result shows which meaning of the word is used more frequently.
For example:
The frequency of the word “bridge” in the following synset is 4:
Synset[0]=Noun@2898711[bridge,span] - a structure that allows people or vehicles to cross an
obstacle such as a river or canal or railway etc.
And in another synset of “bridge” it is 1:
Synset[4]=Noun@490569[bridge] - any of various card games based on whist for four players
The above example means that when people talk about “bridge”, they are more likely to mean the
structure “bridge” than the card game “bridge”.
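
The comparison in this example can be sketched as follows. This is a minimal illustration under stated assumptions: MostFrequentSenseSketch and its nested Sense class are hypothetical stand-ins for a JAWS Synset, holding only the two tag counts reported for “bridge” (4 and 1). Returning 0 for a word form with no recorded count is a choice made for this sketch, not documented JAWS behaviour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick the sense of a word with the highest tag count.
class MostFrequentSenseSketch {

    // Stand-in for a synset; stores only the tag count given in the slides.
    static final class Sense {
        private final String definition;
        private final Map<String, Integer> tagCounts;

        Sense(String definition, String wordForm, int tagCount) {
            this.definition = definition;
            this.tagCounts = Collections.singletonMap(wordForm, tagCount);
        }

        String getDefinition() { return definition; }

        // Assumption of this sketch: an unrecorded word form counts as 0.
        int getTagCount(String wordForm) {
            Integer count = tagCounts.get(wordForm);
            return count == null ? 0 : count;
        }
    }

    // The two senses of "bridge" and their tag counts from the example.
    static List<Sense> bridgeSenses() {
        return Arrays.asList(
            new Sense("a structure that allows people or vehicles to cross an "
                + "obstacle such as a river or canal or railway etc.", "bridge", 4),
            new Sense("any of various card games based on whist for four players",
                "bridge", 1));
    }

    // Returns the sense with the highest tag count for wordForm.
    // On a tie the earlier sense wins; an empty list yields null.
    static Sense mostFrequentSense(String wordForm, List<Sense> senses) {
        Sense best = null;
        for (Sense sense : senses) {
            if (best == null || sense.getTagCount(wordForm) > best.getTagCount(wordForm)) {
                best = sense;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Sense best = mostFrequentSense("bridge", bridgeSenses());
        System.out.println(best.getTagCount("bridge") + ": " + best.getDefinition());
    }
}
```

The same loop should carry over to real Synset objects, since it relies only on getTagCount(String) and getDefinition().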
Analyse the different words of the same synset.
Synset.getTagCount(String wordForm)
The result shows which word expresses a given definition most
accurately, without causing word sense ambiguity.
For example:
In a synset of the word “java”
Synset=Noun@7929519[coffee,java] - a beverage consisting of an infusion of ground coffee beans
The frequency of the word “coffee” is 46 and that of the word “java” is 1.
It means “coffee” is more representative of the meaning “a beverage consisting of an infusion
of ground coffee beans” than the word “java”.
When people talk about “coffee”, you understand that they mean “a beverage
consisting of an infusion of ground coffee beans” and not some other meaning. But when people talk
about “java”, they may mean the beverage or the programming language “Java”.
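
This second use can be sketched in the same way. Again this is only an illustration: RepresentativeWordSketch and its nested MiniSynset are hypothetical stand-ins that mirror the getWordForms and getTagCount method names from the Synset summary, and the only data is the pair of tag counts quoted above (46 for “coffee”, 1 for “java”).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: find the word form that best represents a synset.
class RepresentativeWordSketch {

    // Stand-in for a synset: word forms mapped to their tag counts.
    static final class MiniSynset {
        private final Map<String, Integer> tagCounts = new LinkedHashMap<>();

        // Adds a word form with its tag count and returns this for chaining.
        MiniSynset with(String wordForm, int tagCount) {
            tagCounts.put(wordForm, tagCount);
            return this;
        }

        String[] getWordForms() {
            return tagCounts.keySet().toArray(new String[0]);
        }

        // Assumption of this sketch: an unrecorded word form counts as 0.
        int getTagCount(String wordForm) {
            Integer count = tagCounts.get(wordForm);
            return count == null ? 0 : count;
        }
    }

    // The synset [coffee, java] with the tag counts from the example.
    static MiniSynset coffeeSynset() {
        return new MiniSynset().with("coffee", 46).with("java", 1);
    }

    // Returns the word form with the highest tag count in the synset.
    // On a tie the earlier word form wins; an empty synset yields null.
    static String mostRepresentativeWord(MiniSynset synset) {
        String best = null;
        int bestCount = -1;
        for (String wordForm : synset.getWordForms()) {
            int count = synset.getTagCount(wordForm);
            if (count > bestCount) {
                best = wordForm;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostRepresentativeWord(coffeeSynset()));
    }
}
```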
Conclusion
1. WordNet has two purposes: one is to produce a combination of
dictionary and thesaurus that is more intuitively usable, and the other is to support
automatic text analysis and artificial intelligence applications.
2. Because of its features, WordNet is now widely used in information systems, for tasks including
word sense disambiguation, information retrieval, automatic text classification,
automatic text summarization, and even automatic crossword puzzle generation.
And it is also used in our project!
I will tell you what our WordNet-based algorithm is in the demo next week.
Thank you!