www.tjhsst.edu
Creation of a Russian-English
Translation Program
Karen Shiells
Purpose
Object-oriented approach
Interactive machine translation
Designed for aid, not independent translation
Explore algorithms used in machine translation
Identify grammatical obstacles to translation
Create a base to expand later
Scope of Study
Machine translation is and will be imperfect
Modern translation uses statistical methods
Project is limited to:
Separating base words from morphological endings
Constructing syntax trees from source text
Generating simple English output from tree
Identifying words already known to the program
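As a rough illustration of the first item, the sketch below strips an ending from a transliterated noun by trying longer endings first. The ending table, the feature labels, and the name `split_ending` are assumptions made for this example, not the project's actual data or code. Real Russian endings are ambiguous across cases and declensions, so a fuller analyzer would return several candidates rather than one.

```python
# Hypothetical, heavily simplified ending table for transliterated nouns.
# Each ending maps to one (case, number) reading; real endings are ambiguous.
NOUN_ENDINGS = {
    "ami": ("instrumental", "plural"),
    "ov": ("genitive", "plural"),
    "u": ("accusative", "singular"),
    "a": ("nominative", "singular"),
    "y": ("nominative", "plural"),
}


def split_ending(word, endings=NOUN_ENDINGS):
    """Return (stem, ending, features), preferring the longest matching ending."""
    for ending in sorted(endings, key=len, reverse=True):
        # Require a non-empty stem so a one-letter word is not stripped away.
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending, endings[ending]
    # No known ending: treat the whole word as the stem.
    return word, "", None
```

With this table, `split_ending("knigu")` yields the stem `knig`, the ending `u`, and the accusative singular reading.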
Other Research
Part-of-speech tagging:
Uses probability to identify parts of speech
Applied to unknown words and structures
Complex labeling systems, beyond the conventional parts of speech
Translation algorithms:
Massive dictionaries store words and information
Aided by verb categorization
Omit unknown words and translate without them
Usually comprehensible, but require human revision
Old Methods
Direct Translation
First method
Rearranges sentences without parsing
Based on rules of transfer for specific languages
Interlingua
From era of international languages
Uses one representation as an intermediary
Intermediary is usually a constructed language
Easier to add language pairs
Syntactic Transfer
Similar to interlingua
Generates syntax tree using specific parser
Rearranges tree to fit target structure
Uses specific generation method to form output
Entire algorithm specific to one language pair
Best quality translations
Relatively new
Not as common in commercial software
Alternative Structures
Valency
Stores number of complements for each word
Type of complements not specified
Occupies less space in dictionary
Phrase-Structure Representation
Most familiar: noun phrase, verb phrase, etc.
Breaks sentence into superstructures
Puts terminal symbols only in leaves
Non-terminal symbols for branches
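One possible way to picture a phrase-structure tree is as nested tuples, with non-terminal labels on the branches and words only in the leaves. The labels (`S`, `NP`, `VP`), the sample sentence, and the helper `leaves` are assumptions chosen for illustration.

```python
# (label, child, child, ...) for branches; (label, "word") for leaves.
TREE = (
    "S",
    ("NP", ("Adj", "krasivaya"), ("N", "devushka")),
    ("VP", ("V", "chitaet"), ("NP", ("N", "knigu"))),
)


def leaves(node):
    """Collect the terminal symbols (words) from left to right."""
    label, *children = node
    # A pre-terminal node holds exactly one string: the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    words = []
    for child in children:
        words.extend(leaves(child))
    return words
```

Reading the leaves back gives the original word sequence, which shows that the branches carry structure only.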
Dependency Trees
Uses words as nodes, not just leaves
Examples:
Verb dependent on subject
Objects dependent on verb
Adjectives dependent on nouns
Prepositions vary by type of prepositional phrase
Easier to verify agreement between words
Occupies less space
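A dependency tree along these lines could be sketched as follows, with every node holding a word and a list of dependents. The class name `Node`, its fields, and the role labels are hypothetical. The attachment directions follow the examples above: the verb hangs from the subject, the object from the verb, and the adjective from its noun.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    role: str = "root"
    dependents: list = field(default_factory=list)

    def add(self, word, role):
        """Attach a new dependent and return it so it can take dependents too."""
        child = Node(word, role)
        self.dependents.append(child)
        return child

    def words(self):
        """Yield each word in the tree, a head before its dependents."""
        yield self.word
        for dependent in self.dependents:
            yield from dependent.words()


# "krasivaya devushka chitaet knigu" (a beautiful girl reads a book)
subject = Node("devushka")
subject.add("krasivaya", "adjective")
verb = subject.add("chitaet", "verb")
verb.add("knigu", "object")
```

Because every node is a word, no extra non-terminal nodes are stored, which is one reading of the "occupies less space" point.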
Object Orientation
Object-oriented approach allows more flexibility
Endings, cases, and declensions are classes
Fewer hard-coded rules
Methods for locating dependents are in classes
Modular design allows gradual changes
Changes in lexical analysis do not affect parsing
Changes in dictionary do not affect translation
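The idea of keeping the dependent-locating methods inside the classes might look something like the sketch below. The class names, fields, and the `find_dependents` method are assumptions for illustration and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    stem: str
    case: str = ""
    number: str = ""
    gender: str = ""

    def find_dependents(self, words):
        """By default a word has no dependents; subclasses override this."""
        return []


class Adjective(Word):
    pass


class Noun(Word):
    def find_dependents(self, words):
        """A noun claims the adjectives that agree with it in case, number, gender."""
        mine = (self.case, self.number, self.gender)
        return [
            w for w in words
            if isinstance(w, Adjective) and (w.case, w.number, w.gender) == mine
        ]
```

The parser then only calls `find_dependents` on each word, so a new part of speech can be added as a new class without touching the parsing loop.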
Verb Typing
Divides verbs into categories, for example:
Transitive
Intransitive
Directional or Non-directional motion
Condenses structure storage
Dictionary stores only type of a verb
Particular structures taken from general
Code can apply to general structures, not specific
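A minimal sketch of verb typing, assuming each type maps to a general list of expected complements: the dictionary stores only the type name per verb, and the structure is looked up from the type. The type names, the frames, and the sample verbs are illustrative assumptions.

```python
# General structures, stored once per verb type (hypothetical frames).
VERB_FRAMES = {
    "transitive": ["subject", "direct_object"],
    "intransitive": ["subject"],
    "directional_motion": ["subject", "destination"],
    "nondirectional_motion": ["subject"],
}

# The dictionary stores only the type of each verb.
VERB_TYPES = {
    "chitat'": "transitive",
    "spat'": "intransitive",
    "idti": "directional_motion",
    "khodit'": "nondirectional_motion",
}


def expected_complements(verb):
    """Derive a verb's particular structure from its general type."""
    return VERB_FRAMES[VERB_TYPES[verb]]
```

Adding a verb then costs one short entry, and code that handles, say, direct objects is written once for the whole transitive type.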
Dictionary
Open, save, add, remove, and search functions
Stores:
Russian nominative
English nominatives
Part of speech
Noun/pronoun attributes
Verb types
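The listed functions and stored fields could be sketched as below. JSON is used as the file format purely as an assumption, since the slides do not say how the dictionary is stored. The names `Entry` and `Dictionary` and the field names are likewise hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Entry:
    russian: str                 # Russian nominative (transliterated)
    english: list                # one or more English nominatives
    part_of_speech: str
    attributes: dict = field(default_factory=dict)  # noun/pronoun attributes
    verb_type: str = ""


class Dictionary:
    def __init__(self):
        self.entries = {}

    def add(self, entry):
        self.entries[entry.russian] = entry

    def remove(self, russian):
        self.entries.pop(russian, None)

    def search(self, russian):
        """Return the entry for a known word, or None if it is unknown."""
        return self.entries.get(russian)

    def save(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([asdict(e) for e in self.entries.values()], f,
                      ensure_ascii=False, indent=2)

    def open(self, path):
        # Inside a method, the bare name open still refers to the builtin.
        with open(path, encoding="utf-8") as f:
            self.entries = {d["russian"]: Entry(**d) for d in json.load(f)}
```

Because `search` returns `None` for unknown words, the same call also covers identifying which words are already known to the program.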
Translator
Uses transliteration for ease of testing
Can be easily converted to Unicode Cyrillic
Debugging output to terminal window
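Converting transliterated text to Unicode Cyrillic can be sketched as a longest-match table lookup. The slides do not specify the transliteration scheme, so the mapping below is an assumed, partial one, and `to_cyrillic` is a hypothetical helper. Some sequences are inherently ambiguous (for example `ts` could also stand for two separate letters), which this sketch ignores.

```python
# Assumed transliteration table (lowercase only, incomplete).
TRANSLIT = {
    "shch": "щ", "zh": "ж", "kh": "х", "ts": "ц", "ch": "ч", "sh": "ш",
    "yu": "ю", "ya": "я", "yo": "ё",
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "y": "ы",
    "'": "ь",
}

# Try multi-letter keys first so "sh" is not read as "s" + "h".
_KEYS = sorted(TRANSLIT, key=len, reverse=True)


def to_cyrillic(text):
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(TRANSLIT[key])
                i += len(key)
                break
        else:
            # Unknown character (space, punctuation): copy it unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `to_cyrillic("devushka chitaet")` gives `девушка читает` under this table.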
Results
Subject, verb, direct object translated
Adjectives matched to nouns
Subject is first nominative
Verb matched by gender, number, and person
Direct object is first accusative
Matched by case, number, and gender
Word order not considered
Word order should be accounted for, but is not
Adjectives should attach to the nearest agreeing noun, not just any matching one
Prepositional objects should be nearby
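The selection rules listed under Results (first nominative as subject, first accusative as direct object, verb chosen by agreement) can be sketched roughly as follows. The word representation as dictionaries, the feature names, and the functions `agrees` and `pick_roles` are assumptions for illustration. As in the results described, word order plays no part beyond "first".

```python
def agrees(verb, subject):
    """Compare only the features this verb form actually carries."""
    return all(
        verb[key] == subject.get(key)
        for key in ("gender", "number", "person")
        if key in verb
    )


def pick_roles(words):
    """Return (subject, verb, direct_object); any of them may be None."""
    nouns = [w for w in words if w["pos"] in ("noun", "pronoun")]
    subject = next((w for w in nouns if w["case"] == "nominative"), None)
    direct_object = next((w for w in nouns if w["case"] == "accusative"), None)
    verb = None
    if subject is not None:
        verb = next(
            (w for w in words if w["pos"] == "verb" and agrees(w, subject)),
            None,
        )
    return subject, verb, direct_object
```

Returning the roles in subject, verb, object order gives the skeleton for simple English output even when the Russian source puts the object first.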
Conclusions
Part-of-speech guessing could be added easily
Verb typing would be harder, but helpful
When a subordinate is not found, add to list
For each unmatched word, prompt user
Allow selection between subordinates not found
Restricting complements makes matching more precise
More efficient: no need to search for every possible complement
Prepositions could be associated with nouns
Even in inflecting languages, word order matters
Subordinates should be located by proximity
Multiple functions use the same inflections
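Locating subordinates by proximity could be sketched as picking the closest agreeing noun for an adjective, instead of the first agreeing one. The word dictionaries and the function name are hypothetical, and breaking ties in favour of the following noun is an assumption of this sketch.

```python
def nearest_agreeing_noun(adj_index, words):
    """Return the index of the closest noun that agrees with the adjective."""
    adjective = words[adj_index]
    candidates = [
        i for i, w in enumerate(words)
        if w["pos"] == "noun"
        and all(w[k] == adjective[k] for k in ("case", "number", "gender"))
    ]
    if not candidates:
        return None  # a caller could add the adjective to a list for prompting
    # Smallest distance wins; on a tie, prefer the noun after the adjective.
    return min(candidates, key=lambda i: (abs(i - adj_index), i < adj_index))
```

Since several functions share the same inflections, agreement alone can leave more than one candidate, and distance is one simple way to choose among them.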
Bibliography
Allen, James. Natural Language Understanding. New York:
Benjamin/Cummings Publishing Company, 1995.
Arnold, Doug, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa
Sandler. Machine Translation: An Introductory Guide. London: NCC
Blackwell, 1994. Available Online:
http://www.essex.ac.uk/linguistics/clmt/MTbook/PostScript.
Barber, Charles. The English Language: A Historical Introduction.
Cambridge: Cambridge University Press, 1993.
Beard, Robert. “Russian: An Interactive On-Line Reference Grammar”.
November 1, 2005. Available Online:
http://www.alphadictionary.com/rusgrammar/.
Comrie, Bernard, ed. The World's Major Languages. Oxford: Oxford
University Press, 1990.
Hutchins, John and Harold Somers. An Introduction to Machine Translation.
London: Academic Press, 1992. Available Online:
http://ourworld.compuserve.com/hompages/WJHutchins/IntroMT-TOC.htm.