Tamil - cfilt - Resource Centre for Indian Language
29 April 2013
DRAVIDIAN WORDNET
S.Arulmozi
Dravidian University
Tamil Thesaurus
• Preliminary work on lexical semantics.
• Monumental work on Tamil Thesaurus.
• Ontological classification of Tamil vocabulary
• Rajendran, S. (2001). tamizhc coRkaLanjciyam (in Tamil). Tamil University Publication.
Domains in Tamil Thesaurus
• Tamil vocabulary is classified into four major domains:
• Entities
• Abstracts
• Events
• Relationals
Lexical Hierarchy of the Domain 'Construction'
parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'
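The chain above, from the most general class down to 'constructed', can be sketched as an ordered list. This is only an illustration: the names `HIERARCHY` and `path_to` are hypothetical and are not taken from the Tamil Thesaurus or from any WordNet software.

```python
# Hypothetical illustration of the hierarchy for the domain 'Construction'.
# Each entry is (transliterated Tamil term, English gloss), most general first.
HIERARCHY = [
    ("parumaippeyarkaL", "concrete nouns"),
    ("aHRinaippeyarkaL", "irrational nouns"),
    ("uyirillaatavai", "non-living beings"),
    ("uruvaakkiya maRRum patananjceyta poruTkaL", "manufactured and processed items"),
    ("kaTTappaTTavai", "constructed"),
]


def path_to(term):
    """Return the glosses from the top of the hierarchy down to `term`."""
    glosses = []
    for tamil, gloss in HIERARCHY:
        glosses.append(gloss)
        if tamil == term:
            return glosses
    raise KeyError(term)


print(" > ".join(path_to("kaTTappaTTavai")))
```

A flat list is enough here because the slide shows a single path; a full thesaurus would need a tree in which each node can have several children.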
Nouns
Relation              Example
Synonymy              viiTu 'house' - illam 'house'
Hypernymy-Hyponymy    paLLi 'school' - kalviccaalai 'educational institution'
Hyponymy-Hypernymy    kalluuri 'college' - aracukkalluuri 'govt college'
Holonymy-Meronymy     ndaaRkaali 'chair' - kaal 'leg'
Meronymy-Holonymy     cakkaram 'wheel' - vaNTi 'cart'
Related Verb          paTittal 'reading' - paTi 'read'
Coordinate terms      kooyil 'temple' - macuuti 'mosque'
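One way to picture how such noun relations might be stored and queried is a mapping from relation name to word pairs. The sketch below is an assumption for illustration only: `NOUN_RELATIONS`, `add_relation` and `related` are hypothetical names, and the actual Tamil WordNet stores its data differently. The word pairs and relation labels are the examples listed on the slide.

```python
from collections import defaultdict

# Hypothetical in-memory store: relation name -> list of (first word, second word).
# The pairs are the noun examples from the slide; the structure is an assumption.
NOUN_RELATIONS = defaultdict(list)


def add_relation(relation, first, second):
    NOUN_RELATIONS[relation].append((first, second))


add_relation("synonymy", "viiTu", "illam")
add_relation("hypernymy-hyponymy", "paLLi", "kalviccaalai")
add_relation("hyponymy-hypernymy", "kalluuri", "aracukkalluuri")
add_relation("holonymy-meronymy", "ndaaRkaali", "kaal")
add_relation("meronymy-holonymy", "cakkaram", "vaNTi")
add_relation("related verb", "paTittal", "paTi")
add_relation("coordinate terms", "kooyil", "macuuti")


def related(word):
    """List (relation, other word) pairs in which `word` takes part."""
    found = []
    for relation, pairs in NOUN_RELATIONS.items():
        for first, second in pairs:
            if first == word:
                found.append((relation, second))
            elif second == word:
                found.append((relation, first))
    return found


print(related("ndaaRkaali"))
```

The lookup scans both positions of each pair, so a word is found whether it appears as the first or the second member of a relation.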
Verbs
Relation        Example
Synonym         paTi 'read' - payilu 'read'
Hypernymy       cuvai 'taste' - uNar
Troponymy       keeL 'ask' - kenjcu 'plead'
Nominal         paruku 'drink' - parukutal 'drinking'
Related Noun    kaNTupiTi 'discover' - kaNTupiTippu 'discovery'
Tamil WordNet
Objective: To build a WordNet for Tamil to enhance machine translation
Resources: Tamil Thesaurus, Technical Glossaries (Tamil University Publications), Princeton English WordNet
Funding Agency: Tamil Software Development Fund, Tamil Virtual University - 4 lacs
Time Frame: 18 months
Details
Software used
– Front-end: Java
– Back-end: MySQL Database
Project Deliverables
– 50k root words
– Relationships coded
– Stand-alone and web-based interface
– Embedded morphological analyser
Statistics
Total Words: 50497
Unique Senses: 41013
Nouns: 46710
Verbs: 2881
Adjectives: 416
Adverbs: 490
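As a quick consistency check on these figures, the four part-of-speech counts can be added up and compared with the reported total. The snippet below is just an arithmetic sketch over the numbers on the slide; the variable names are illustrative.

```python
# Figures copied from the Statistics slide.
total_words = 50497
by_category = {"Nouns": 46710, "Verbs": 2881, "Adjectives": 416, "Adverbs": 490}

# The four part-of-speech counts add up to the reported total.
assert sum(by_category.values()) == total_words

# Share of each category in the total word count.
for name, count in by_category.items():
    share = 100 * count / total_words
    print(f"{name}: {count} ({share:.1f}%)")
```

The counts sum exactly to 50497, and nouns account for over 90% of the entries.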
[Bar chart of the statistics above: Total Words, Unique Senses (Tokens), Nouns, Verbs, Adjectives, Adverbs]
Project Completed (2004)
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
Standalone version – Tamil WordNet (Snapshot)
Web-version – Tamil WordNet (Snapshot)
First Effort on Dravidian Languages
• National Workshop on WordNet for Dravidian Languages
• 2-3 June 2003
• Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University
• Hands-on experience on a specified domain: construction
• Report available on the Global WordNet website
MHRD Project
Creation of Machine Translation tools and resources for English to Dravidian Languages: Pilot Study
The pilot study aims to develop a Machine Translation (MT) system and the linguistic resources needed for translation from English to the Dravidian languages (Tamil, Malayalam, Telugu and Kannada). This would facilitate the creation of rich educational content in Indian languages.
The research effort bases all the tools and the translation system on Machine Learning methodologies, so that computer graduates and other non-linguists can immediately participate in the national mission on literacy by contributing additional tools for language translation.
Modules
• Module 1: Machine Translation
  • Aims at developing teaching material corresponding to the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining/machine learning.
  • This will ensure the critical amount of manpower required to sustain the translation effort needed for the national mission on education.
• Module 2: Training
  • Aims at training 500 faculty members selected from across the country in machine translation methodologies using machine learning techniques.
• Module 3: Dravidian WordNet
  • Aims at developing the Dravidian WordNet required for translation.
Total Budget
• IIT Bombay – 15 lacs
• Amrita University – 40 lacs
• Tamil University – 15 lacs
• University of Hyderabad – 15 lacs
• Dravidian University – 15 lacs
• Time Frame: 12 months (March 30, 2009 – March 29, 2010)
Work done
• Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
• Funding Agency: Ministry of HRD
• Duration: 18 months (July 2009-Dec 2010)
• Deliverable: 13k synsets
• 7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
Statistics on Dravidian WordNet
Publications
'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran)
'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran, S. Gopakumar, V. Dhanalakshmi)
'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S. Arulmozi)
'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi & Panchanan Mohanty)
'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi)
First IndoWordNet Workshop
• Amrita University
• 11-14 June 2009
• The necessity of developing linked WordNets for the different languages of India was stressed
• Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be addressed
• Participation from groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
• Proposal on Indhradhanush
Dravidian WordNet
• Present Project
• Funded by DIT.
Links
Tamil WordNet – Open Source
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
VerbNet (English)
http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
Princeton English WordNet
http://wordnet.princeton.edu/
Global WordNet Association
http://www.globalwordnet.org/
WordNets in the World
http://www.globalwordnet.org/gwa/wordnet_table.htm
WordNet Bibliography
http://lit.csci.unt.edu/~wordnet/
IndoWordNet
http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
Thank you!