Creation of a Russian-English Translation Program
Karen Shiells
Purpose

Object-oriented approach

Interactive machine translation

Designed to aid a human translator, not to translate independently

Explore algorithms used in machine translation

Identify grammatical obstacles to translation

Create a base to expand later
Scope of Study

Machine translation is and will remain imperfect

Modern translation uses statistical methods

Project is limited to:

Separating base words from morphological endings

Constructing syntax trees from source text

Generating simple English output from tree

Identifying words already known to the program
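
A minimal sketch of separating a base word from its morphological ending, assuming transliterated input and a deliberately tiny ending table. The table, the function name split_ending, and the example words are illustrative assumptions, not the project's actual data or code.

```python
# Hypothetical, simplified ending table for feminine nouns in -a
# (transliterated). Real Russian declension needs many more endings.
FEMININE_A_ENDINGS = {
    "a": ("nominative", "singular"),     # lampa
    "u": ("accusative", "singular"),     # lampu
    "oy": ("instrumental", "singular"),  # lampoy
}


def split_ending(word, endings=FEMININE_A_ENDINGS):
    """Return (base, ending, features); ending is "" if nothing matches."""
    # Try longer endings first so "oy" is preferred over any one-letter ending.
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending, endings[ending]
    return word, "", None
```

Calling split_ending("lampu") separates the base "lamp" from the accusative ending "u"; a word with no listed ending is returned unchanged.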
Other Research


Part-of-speech tagging:

Uses probability to identify parts of speech

Applied to unknown words and structures

Complex labeling systems, beyond the conventional parts of speech
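
One simple way probability can be applied to unknown words is to count which tags known words with the same final letters carry. The sketch below assumes a two-letter suffix and a toy training list; the function names and data are hypothetical, and real taggers use much richer models.

```python
from collections import Counter, defaultdict


def train(tagged_words, suffix_len=2):
    """Count how often each word-final suffix occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word[-suffix_len:]][tag] += 1
    return counts


def guess(word, counts, suffix_len=2):
    """Return (most likely tag, estimated probability) for an unknown word."""
    seen = counts.get(word[-suffix_len:])
    if not seen:
        return None, 0.0
    tag, n = seen.most_common(1)[0]
    return tag, n / sum(seen.values())


# Toy training data in transliteration (hypothetical).
KNOWN = [
    ("chitaet", "verb"), ("delaet", "verb"), ("znaet", "verb"),
    ("poet", "noun"),  # "poet" (the noun) shares the suffix "et"
    ("novaya", "adjective"), ("staraya", "adjective"),
]
COUNTS = train(KNOWN)
```

With this data, an unseen word ending in "et" is guessed to be a verb with probability 3/4, because three of the four known "et" words are verbs.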
Translation algorithms:

Massive dictionaries store words and information

Aided by verb categorization

Omit unknown words and translate without them

Usually comprehensible, but require human revision
Old Methods

Direct Translation




First method
Rearranges sentences without parsing
Based on rules of transfer for specific languages
Interlingua




From era of international languages
Uses one representation as an intermediary
Intermediary is usually a constructed language
Easier to add language pairs
Syntactic Transfer








Similar to interlingua
Generates syntax tree using specific parser
Rearranges tree to fit target structure
Uses specific generation method to form output
Entire algorithm specific to one language pair
Best quality translations
Relatively new
Not as common in commercial software
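
The three stages of syntactic transfer (parse, rearrange, generate) can be sketched as below. This is only an illustration under strong assumptions: roles are supplied by hand instead of by a real parser, the tree is a flat dictionary, and the glosses are hypothetical.

```python
# Hypothetical glosses; a real system would consult its dictionary.
GLOSSES = {"malchik": "the boy", "chitaet": "reads", "knigu": "the book"}


def parse(tagged_words):
    """Stand-in for a Russian parser: build a tiny tree from (word, role) pairs."""
    tree = {"subject": None, "verb": None, "object": None}
    for word, role in tagged_words:
        tree[role] = word
    return tree


def transfer(tree):
    """Rearrange the tree into English subject-verb-object order."""
    return [tree[role] for role in ("subject", "verb", "object") if tree[role]]


def generate(ordered_words):
    """Produce English output, leaving unknown words untranslated."""
    return " ".join(GLOSSES.get(word, word) for word in ordered_words)


# Russian allows object-verb-subject order; English output must be reordered.
source = [("knigu", "object"), ("chitaet", "verb"), ("malchik", "subject")]
english = generate(transfer(parse(source)))
```

Because each stage is written for one language pair, supporting another pair would mean replacing all three functions.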
Alternative Structures

Valency




Stores number of complements for each word
Type of complements not specified
Occupies less space in dictionary
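
A valency dictionary can be as small as one integer per word. The sketch below assumes the count excludes the subject and uses hypothetical entries; the slides do not say which convention the project would follow.

```python
# Hypothetical valency table: number of complements each verb takes,
# not counting the subject (an assumption made for this sketch).
VALENCY = {
    "spat'": 0,    # to sleep
    "chitat'": 1,  # to read (something)
    "dat'": 2,     # to give (something to someone)
}


def missing_complements(verb, found):
    """How many complements are still unaccounted for after parsing."""
    return max(VALENCY[verb] - len(found), 0)
```

Since only a count is stored, the parser learns that dat' still needs one more complement after finding "knigu", but not what kind of complement it should be.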
Phrase-Structure Representation




Most familiar: noun phrase, verb phrase, etc.
Breaks sentence into superstructures
Puts terminal symbols only in leaves
Non-terminal symbols for branches
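
As an illustration, a phrase-structure tree can be modeled with one node type whose leaves hold the words. The class and labels below are assumptions chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # non-terminal (S, NP, VP, V) or, for a leaf, the word itself
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def leaves(node):
    """Collect the terminal symbols (words) from left to right."""
    if node.is_leaf():
        return [node.label]
    words = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# "malchik chitaet knigu" (the boy reads the book)
tree = Node("S", [
    Node("NP", [Node("malchik")]),
    Node("VP", [Node("V", [Node("chitaet")]), Node("NP", [Node("knigu")])]),
])
```

Reading the leaves from left to right recovers the original sentence, while every branch carries only a category label.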
Dependency Trees


Uses words as nodes, not just leaves
Examples:






Verb dependent on subject
Objects dependent on verb
Adjectives dependent on nouns
Prepositions vary by type of prepositional phrase
Easier to verify agreement between words
Occupies less space
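
In a dependency tree every node is a word, so agreement can be checked directly between a word and its dependents. The sketch below assumes each word carries gender, number, and case; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    gender: str
    number: str
    case: str
    dependents: list = field(default_factory=list)


def agrees(head, dependent):
    """True when a dependent matches its head in gender, number, and case."""
    return ((head.gender, head.number, head.case)
            == (dependent.gender, dependent.number, dependent.case))


# "novuyu knigu" (new book, accusative): the adjective depends on the noun.
book = Word("knigu", "feminine", "singular", "accusative")
new = Word("novuyu", "feminine", "singular", "accusative")
book.dependents.append(new)
```

No separate NP or VP nodes are needed, which is one reason this representation occupies less space than a phrase-structure tree.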
Object Orientation





Object-oriented approach allows more flexibility
Endings, cases, and declensions are classes
Fewer hard-coded rules
Methods for locating dependents are in classes
Modular design allows gradual changes


Changes in lexical analysis do not affect parsing
Changes in dictionary do not affect translation
Verb Typing

Divides verbs into categories, for example:




Transitive
Intransitive
Directional or Non-directional motion
Condenses structure storage



Dictionary stores only type of a verb
Structures for a particular verb are derived from its general type
Code can apply to general structures, not to specific verbs
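
One way to picture verb typing: the dictionary records only a type per verb, and a single shared table maps each type to its expected structure. The type names follow the categories listed above; the structures and verb entries are illustrative assumptions.

```python
from enum import Enum


class VerbType(Enum):
    TRANSITIVE = "transitive"
    INTRANSITIVE = "intransitive"
    DIRECTIONAL_MOTION = "directional motion"
    NONDIRECTIONAL_MOTION = "non-directional motion"


# One general structure per type (hypothetical role names).
STRUCTURES = {
    VerbType.TRANSITIVE: ["subject", "direct object"],
    VerbType.INTRANSITIVE: ["subject"],
    VerbType.DIRECTIONAL_MOTION: ["subject", "destination"],
    VerbType.NONDIRECTIONAL_MOTION: ["subject"],
}

# The dictionary stores only the type of each verb (hypothetical entries).
VERB_TYPES = {
    "chitat'": VerbType.TRANSITIVE,
    "spat'": VerbType.INTRANSITIVE,
    "idti": VerbType.DIRECTIONAL_MOTION,
    "khodit'": VerbType.NONDIRECTIONAL_MOTION,
}


def expected_structure(verb):
    """Look up the general structure that applies to this verb's type."""
    return STRUCTURES[VERB_TYPES[verb]]
```

Adding a new transitive verb then costs one dictionary entry, and any code written against the transitive structure applies to it automatically.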
Dictionary


Open, save, add, remove, and search functions
Stores:





Russian nominative
English nominatives
Part of speech
Noun/pronoun attributes
Verb types
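
A sketch of a dictionary with the listed operations and fields. The JSON file format, the class names, and the method name load (standing in for "open") are assumptions; the slides do not describe the project's actual storage format.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class Entry:
    russian: str         # Russian nominative (transliterated)
    english: list        # one or more English nominatives
    part_of_speech: str
    attributes: dict = field(default_factory=dict)  # noun/pronoun attributes
    verb_type: str = ""  # empty for non-verbs


class Dictionary:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.russian] = entry

    def remove(self, russian):
        self.entries.pop(russian, None)

    def search(self, russian):
        return self.entries.get(russian)

    def save(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump([asdict(e) for e in self.entries.values()], handle)

    @classmethod
    def load(cls, path):
        dictionary = cls()
        with open(path, encoding="utf-8") as handle:
            for raw in json.load(handle):
                dictionary.add(Entry(**raw))
        return dictionary


# Usage: round-trip one entry through a temporary file.
dictionary = Dictionary()
dictionary.add(Entry("kniga", ["book"], "noun", {"gender": "feminine"}))
path = os.path.join(tempfile.mkdtemp(), "dictionary.json")
dictionary.save(path)
reloaded = Dictionary.load(path)
```

Keying entries by the Russian nominative keeps search a single lookup, provided the translator first reduces an inflected word to that form.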
Translator



Uses transliteration for ease of testing
Can be easily converted to Unicode Cyrillic
Debugging output to terminal window
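
Converting transliterated text to Unicode Cyrillic can be done with a longest-match lookup table. The transliteration scheme below is an assumption (the slides do not specify one) and covers only part of the alphabet.

```python
# Assumed transliteration scheme; multi-letter keys must win over single letters.
TABLE = {
    "shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "yu": "ю", "ya": "я", "yo": "ё",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "y": "ы",
    "'": "ь",
}
MAX_KEY = max(len(key) for key in TABLE)


def to_cyrillic(text):
    """Convert transliterated text, trying the longest key at each position."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # pass through spaces and unknown characters
            i += 1
    return "".join(out)
```

For example, to_cyrillic("mal'chik chitaet knigu") yields "мальчик читает книгу".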
Results

Subject, verb, direct object translated

Subject is first nominative
Verb matched by gender, number, and person
Direct object is first accusative
Adjectives matched to nouns

Matched by case, number, and gender
Word order not considered
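
The matching rules in these results can be sketched as follows, assuming each word has already been tagged with its grammatical features. The Token class, its defaults, and the glosses are hypothetical; gender is compared only when the verb form marks it.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str
    gloss: str
    case: str = ""
    gender: str = ""  # empty when the form does not mark gender
    number: str = "singular"
    person: int = 3


def first_noun_in_case(tokens, case):
    return next((t for t in tokens if t.pos == "noun" and t.case == case), None)


def verb_agrees(verb, subject):
    return (verb.number == subject.number
            and verb.person == subject.person
            and verb.gender in ("", subject.gender))


def translate_clause(tokens):
    """Subject = first nominative, object = first accusative, verb = first match."""
    subject = first_noun_in_case(tokens, "nominative")
    obj = first_noun_in_case(tokens, "accusative")
    verb = None
    if subject is not None:
        verb = next((t for t in tokens
                     if t.pos == "verb" and verb_agrees(t, subject)), None)
    return " ".join(t.gloss for t in (subject, verb, obj) if t is not None)


# "knigu chitaet malchik": object-verb-subject order in the source.
sentence = [
    Token("knigu", "noun", "the book", case="accusative", gender="feminine"),
    Token("chitaet", "verb", "reads"),
    Token("malchik", "noun", "the boy", case="nominative", gender="masculine"),
]
```

Because roles are assigned by case alone, the source word order has no effect on the output here.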
Word order should be accounted for, but isn't

Adjectives should attach to the nearest noun, not just any matching one
Prepositional objects should be near their prepositions
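
A proximity rule for adjectives could look like the sketch below: among the nouns that agree with the adjective, choose the closest one. The tuple layout and the tie-break toward the following noun are assumptions made for illustration.

```python
def nearest_agreeing_noun(tokens, adj_index):
    """tokens: list of (text, pos, features); features = (case, number, gender)."""
    adj_features = tokens[adj_index][2]
    candidates = [
        i for i, (_, pos, features) in enumerate(tokens)
        if pos == "noun" and features == adj_features
    ]
    if not candidates:
        return None
    # Closest noun wins; on a tie, prefer the noun that follows the adjective.
    best = min(candidates, key=lambda i: (abs(i - adj_index), i < adj_index))
    return tokens[best][0]


FEM_NOM = ("nominative", "singular", "feminine")
# "novaya kniga i staraya lampa" (a new book and an old lamp)
phrase = [
    ("novaya", "adjective", FEM_NOM),
    ("kniga", "noun", FEM_NOM),
    ("i", "conjunction", None),
    ("staraya", "adjective", FEM_NOM),
    ("lampa", "noun", FEM_NOM),
]
```

Both nouns agree with both adjectives, so matching on features alone cannot tell them apart; distance is what pairs "staraya" with "lampa".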
Conclusions

Part-of-speech guessing could be added easily

When a subordinate is not found, add it to a list
For each unmatched word, prompt the user
Allow selection among the subordinates not found
Verb typing would be harder, but helpful

Restricting complements makes matching more precise
More efficient: no search for every possible complement
Prepositions could be associated with nouns
Even in inflecting languages, word order matters


Subordinates should be located by proximity
Multiple functions use the same inflections
Bibliography






Allen, James. Natural Language Understanding. New York:
Benjamin/Cummings Publishing Company, 1995.
Arnold, Doug, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa
Sandler. Machine Translation: An Introductory Guide. London: NCC
Blackwell, 1994. Available Online:
http://www.essex.ac.uk/linguistics/clmt/MTbook/PostScript.
Barber, Charles. The English Language: A Historical Introduction.
Cambridge: Cambridge University Press, 1993.
Beard, Robert. “Russian: An Interactive On-Line Reference Grammar”.
November 1, 2005. Available Online:
http://www.alphadictionary.com/rusgrammar/.
Comrie, Bernard, ed. The World's Major Languages. Oxford: Oxford
University Press, 1990.
Hutchins, John and Harold Somers. An Introduction to Machine Translation.
London: Academic Press, 1992. Available Online:
http://ourworld.compuserve.com/hompages/WJHutchins/IntroMT-TOC.htm.