Machine Translation: Marathi to UNL

Machine Translation
Marathi to UNL
Presented by
Ashwini, Salil
Center for Indian Language Technology Solutions
CSE, IIT Powai
Characteristics of Marathi
a. Syntactic structure
– Subject-object-verb
e.g. राम भात खातो. (Ram eats rice.)
– Similarity with Hindi
b. Morphology
– प्रत्यय (suffixes)
– Differences with Hindi
Main tasks
1. Marathi-UW dictionary building
2. Rulebase building for converting Marathi language phenomena to UNL expressions
3. Testing using corpus sentences
4. Verification with Hindi and Marathi deconverters
Analysis consists of
• Morphology
• Syntax
• Semantics
• Pragmatics
Marathi analysis done so far
We focus on Marathi morphology
• Noun morphology
• Pronoun morphology
• Verb morphology
• Relation label morphology
• Adjective morphology
Types of adjectives in Marathi
1. Pronominal adjectives
1.1 Pronoun adjectives: the nine pronouns used as adjectives
1.2 Adjectives derived from the nine pronouns
2. Qualitative adjectives
2.1 Adjectives ending with the vowel आ
2.2 Adjectives ending with vowels other than आ
2.3 Postposition adjectives
Types of adjectives [contd.]
3. Numerical adjectives
• 3.1 Cardinal
3.1.1 (whole number)
3.1.2 (fractional number)
3.1.3 (entirety, totality, completeness)
• 3.2 Ordinal
• 3.3 Occurrencial
6 types
• 3.4 Distinctive
[pAvaNedonashe]: does it mean 175 or 199.75?
- There is no word assigned to 199.75, 299.75, etc.
- The problems with paun, pauvane and savva.
- The reading is (pAvaNedon) times 100 (she), i.e. 175. she and shambhar both mean 100. pAUNashe means 75. pAvaNeshambhar means 99.75.
- The powers of ten for which there is a distinct word in Marathi need to be stored separately.
- The pronunciation is not pAvaNedona-[pause]-she but pAvaNe-[pause]-donashe.
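The two candidate readings of pAvaNedonashe can be compared with a minimal Python sketch. The lookup tables and function names are hypothetical, made up for illustration; only the arithmetic (a quarter less than two, scaled by one hundred) comes from the slide.

```python
from fractions import Fraction

# Hypothetical lookup tables using the slide's romanized spellings.
UNITS = {"don": 2}                       # 'don' = 2
POWERS = {"she": 100, "shambhar": 100}   # both mean 100, per the slide
QUARTER = Fraction(1, 4)                 # 'pAvaNe' = a quarter less than


def quarter_less_then_scale(unit: str, power: str) -> Fraction:
    """Grouping (pAvaNe + unit) x power: subtract a quarter, then scale."""
    return (UNITS[unit] - QUARTER) * POWERS[power]


def scale_then_quarter_less(unit: str, power: str) -> Fraction:
    """Grouping pAvaNe + (unit x power): scale, then subtract a quarter."""
    return UNITS[unit] * POWERS[power] - QUARTER


accepted = quarter_less_then_scale("don", "she")   # (2 - 1/4) * 100
rejected = scale_then_quarter_less("don", "she")   # 2 * 100 - 1/4
print(accepted, float(rejected))
```

Keeping the powers of ten in their own table mirrors the slide's point that they need to be stored separately from the unit words.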
Tables of numbers: continuous and random access
• Some forms of numbers are used for verbalizing
the tables of numbers: ºÉÉiÉ / ºÉÉiÉÉ / ºÉÉiÉä / ºÉÉiÉÒä /
ºÉiiÉä.
• Marathi: A, B times, (is C), occurring in the table
for A. English: B A’s (are C).
• Usage of forms: 1. only for the expression ‘A’
2. only for ‘B times’ 3. only while recalling the
number directly without going through the table.
• Some forms occur especially for square. The
repetition is emphasized.
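The difference in reading order can be illustrated with a small Python sketch. It produces English glosses only, not Marathi word forms, and all function names are hypothetical.

```python
def marathi_order(a: int, b: int) -> str:
    """Gloss of the Marathi reading order: A, B times, (is C)."""
    return f"{a}, {b} times, (is {a * b})"


def english_order(a: int, b: int) -> str:
    """Gloss of the English reading order: B A's (are C)."""
    return f"{b} {a}'s (are {a * b})"


def table_for(a: int, upto: int = 10) -> list[str]:
    """Continuous access: recite the whole table for A in Marathi order."""
    return [marathi_order(a, b) for b in range(1, upto + 1)]


# Random access: look up a single row without reciting the table.
single_row = marathi_order(7, 2)
whole_table = table_for(7)
```

A fuller version would pick the number-word form according to its role (the 'A' slot, the 'B times' slot, or direct recall), as the usage bullet describes.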
Words used to familiarise a child with numbers
• Some words are used mostly to familiarise a
child with numbers: BEÒ BE, nÖEÔ nÉäxÉ, ÊiÉEÔ
iÉÒxÉ, etc. The similarity of each word with
the number is used to help a child remember
the number. The words used as familiarisers
are: BEÒ, nÖEÔ, ÊiÉEÔ, SÉÉèEÒ, {ÉÉSÉÒ, ºÉɽÒ,
ºÉÉiÉÒä, +É`Ò, xÉ´Éä, nɽÒ.
Playing cards and game of cricket
1. playing cards:
ekka, durri / durra, tirri / tirra, chavvi / chouka,
panji / panja, chhakki / chhakka, satti / satta,
atthi / attha, navvi / nashsha, dashshi / dashsha.
2. shots scoring multiple runs in the game of cricket: चौकार (a four), षटकार (a six).
The current status of dictionary
Number of entries: 375
• Dictionary
• Nouns
• Noun morphology suffixes
• Verbs
• Verb morphology suffixes
The current status of rulebase
Number of rules is 1050.
• Verb morphology (Simple and conjunct verbs)
– Tense (Past, Present, Future)
– Aspect of tense (Progress, complete, custom)
– Voice (Passive voice)
– अर्थ (mood: imperative, should, negative)
– Ability, intention etc. for conjunct verbs only.
The current status of rulebase [contd.]
• Noun morphology
– Number
– With case marker (सामान्यरूप, the oblique form)
• Case when the penultimate vowel is either ऊ or ई
e.g. मूल - मुले (Plural)
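The penultimate-vowel case can be sketched in Python. The sketch assumes the pattern suggested by the example pair (mūl 'child' becomes mule 'children'): a long ī or ū sign just before the final consonant is shortened when a suffix is attached. It is not the rulebase's actual rule format, the function names are hypothetical, and it covers only stems that end in a bare consonant.

```python
# Long vowel signs (matras) mapped to their short counterparts.
SHORTEN = {
    "\u0942": "\u0941",  # long u sign -> short u sign
    "\u0940": "\u093f",  # long i sign -> short i sign
}


def shorten_penultimate(stem: str) -> str:
    """Shorten a long i/u matra sitting just before the final consonant."""
    if len(stem) >= 2 and stem[-2] in SHORTEN:
        return stem[:-2] + SHORTEN[stem[-2]] + stem[-1]
    return stem


def attach(stem: str, suffix: str) -> str:
    """Apply the shortening, then append the suffix (assumed to be given)."""
    return shorten_penultimate(stem) + suffix


# mūl (म + ू + ल) plus the plural sign े
plural = attach("\u092e\u0942\u0932", "\u0947")
print(plural)
```

Stems with independent vowel letters, conjuncts, or a final vowel sign would need additional handling that is not attempted here.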
The current status of rulebase [contd.]
• Relation labels used so far: agt, obj, gol, aoj, and, or
e.g. मुलांनी आंबे खाल्ले नव्हते. (The children had not eaten mangoes.)
obj(eat(icl>do).@entry.@pred.@past.@not.@complete, mango(icl>fruit):08.@pl)
agt(eat(icl>do).@entry.@pred.@past.@not.@complete, child(icl>person):00.@pl)
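The shape of these two UNL relations can be reproduced with a short Python sketch. The `Node` class and `relation` function are hypothetical helpers for illustration and are not part of the EnConverter; only the output format is taken from the example on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One side of a binary relation: universal word, optional id, attributes."""
    uw: str
    instance: str = ""
    attrs: list[str] = field(default_factory=list)

    def render(self) -> str:
        text = self.uw
        if self.instance:
            text += ":" + self.instance
        return text + "".join("." + a for a in self.attrs)


def relation(label: str, head: Node, dependent: Node) -> str:
    """Format a binary relation as label(head, dependent)."""
    return f"{label}({head.render()}, {dependent.render()})"


eat = Node("eat(icl>do)", attrs=["@entry", "@pred", "@past", "@not", "@complete"])
mango = Node("mango(icl>fruit)", "08", ["@pl"])
child = Node("child(icl>person)", "00", ["@pl"])

lines = [relation("obj", eat, mango), relation("agt", eat, child)]
print("\n".join(lines))
```

Here the verb carries the tense, negation and aspect attributes, while each noun carries its own number attribute, matching the division between verb and noun morphology described in the rulebase slides.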
Plans
• Adjective morphology
• Pronoun morphology
• Relation label handling for corpus sentences.
For simple sentences only.
THANK YOU
References:
• Damle, Moro Keshav (1970). Shastriya marathi vyakarana [SaswrIya marATI vyAkaraNa]. (Ed: K. S. Arjunwadkar). Pune: Deshmukh & Co.
• Meying, Zhu (2000). EnConverter specifications, version 2.1. Tokyo: UNU/IAS/UNL Center.
• Meying, Zhu (2002). UNL specifications, version 3 edition 1. Tokyo: UNU/IAS/UNL Center.
• Valambe, M. R. (2001). Sugam marathi vyakaran lekhan [sugama marATI vyAkaraNa leKana]. Pune: Nitin Prakashan.