Corpora and Translation for Language Learners


Translating into the L2:
Corpus tools and resources
Federico Zanettin
Università di Perugia
Outline
Translation into the L2
Corpus resources and tools
Sample translation activity
  Role of corpora
  Role of students
  Role of teacher
Conclusions
Translation into the L2

Is translation into the L2 to be avoided?
Should the teacher be a native speaker of the L2?
standard practice in translator training
textbooks and manuals
actual professional practice
actual practice in L2 learning environments
Many are not…
A number of studies challenge these views:
e.g. Campbell 1998, Stewart 1999, 2000, forthcoming, Grosman et al. 2000, Kelly et al. 2003, Pokorn 2005, Kearns forthcoming, …
Corpus resources and tools

Corpora
  The Web as corpus
  The Web as a source for DIY corpora
  Online corpora
    Monolingual
    Parallel
  ‘Traditional’ corpora (i.e. non-native electronic texts)
    Monolingual
    Multilingual/Parallel
Tools
  General-purpose search engines (e.g. Google)
  Online corpus analysis services
  Stand-alone corpus analysis software (e.g. WordSmith Tools, TextStat, ParaConc)
  Custom software (e.g. Xaira, ENPC Explorer, etc.)
The Web as corpus


Search engines’ advanced options
Specialized “sub-webs”
  Google Scholar
  Google Books
Online concordancers
  WebCorp
  WebCONC
  KwikFinder
The Web as a source for DIY corpora
Manual DIY corpora
  Download + corpus analysis software (e.g. WordSmith Tools, TextStat, etc.)
(Semi-)automatic DIY corpora
  Sketch Engine
Sketch Engine



Create your instant DIY web corpora
Add linguistic annotation to your corpora
Consult very large corpora for many languages
Word lists
Concordances
Word profiles (Word Sketch)
…
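A concordance lists every occurrence of a search word together with its immediate context, in the “keyword in context” (KWIC) layout that the tools above display. The following is a minimal, hypothetical Python illustration: the function name `kwic` and the sample sentences are invented, and real concordancers add sorting, annotation and far larger corpora.

```python
import re

def kwic(text, node, width=30):
    """Return keyword-in-context (KWIC) lines for each occurrence of `node`.

    Toy concordancer: case-insensitive, whole-word matches, with `width`
    characters of context on each side of the node word.
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that node words line up in one column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Invented sample text
sample = ("Mosquitoes can transmit disease to humans. "
          "The disease spread quickly through the village.")
for line in kwic(sample, "disease", width=20):
    print(line)
```

Aligning the node words in one column is what makes recurrent left and right neighbours easy to spot by eye.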
Word Sketch for ‘Disease’
Word Sketch



A Word Sketch is a corpus-based summary of a word's grammatical and collocational behaviour.
Each column shows the words that typically combine with disease in a particular grammatical relation. For example, “object_of” lists, in order of statistical significance rather than raw frequency, the verbs that most typically occupy the verb slot in cases where disease is the object of a verb.
Switching between Concordance mode and Word Sketch mode is a useful way of getting more information about a particular word combination. Thus, if you want to look at examples of the string “transmit + disease”, simply click on the number next to “transmit” in the object_of list (93) and you will be taken directly to a concordance showing all instances of this combination.
Adapted from the Sketch Engine website
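Ranking “in order of statistical significance rather than raw frequency” can be illustrated with a small calculation. This sketch assumes the logDice association score, 14 + log2(2·f(x,y) / (f(x) + f(y))), as the ranking statistic; the slide itself does not name the measure. Apart from the 93 co-occurrences of transmit + disease quoted above, all counts are invented, and the helper names are hypothetical.

```python
import math

def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2(2 * f_xy / (f_x + f_y)); higher means a stronger link."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(pair_freqs, word_freqs, node_freq):
    """Rank collocates by logDice instead of raw co-occurrence frequency.

    pair_freqs: collocate -> co-occurrences with the node in one relation
    word_freqs: collocate -> total corpus frequency of the collocate
    node_freq:  total corpus frequency of the node word
    """
    scored = [(coll, f_xy, log_dice(f_xy, node_freq, word_freqs[coll]))
              for coll, f_xy in pair_freqs.items()]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Invented counts for verbs taking "disease" as object (object_of relation)
object_of = {"have": 400, "transmit": 93, "cure": 80}
verb_totals = {"have": 2_000_000, "transmit": 1_500, "cure": 900}
for verb, freq, score in rank_collocates(object_of, verb_totals, node_freq=10_000):
    print(f"{verb:<10}{freq:>5}{score:>7.2f}")
```

With these figures “have” co-occurs most often but ranks last, because it is frequent with almost any object; “transmit” and “cure” rise to the top because they are specific to disease.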
Online corpora

Monolingual
  Leeds Internet corpora
  The Corpus of Contemporary American English (COCA)
  etc.
Bilingual
  OPUS
  Compara
  …
Internet Corpora at Leeds
al-luġatu l-’arabiyyatu l-fuṣḥā
OPUS (Europarl parallel corpus)
“in modo sistematico”
“systematically” vs. “in a systematic way”
The Web vs. well-constructed corpora
Corpora = reliability, core patterns of language use
The Web
  Lexical and terminological richness
  Multi-word expressions
“naked eye”
“to the naked eye”: Google = 2.5 million hits; BNC = 884 hits
“visible to the naked eye”: Google = 1.2 million hits; BNC = 18 hits
“barely visible to the naked eye”: Google = 83,000 hits; BNC = none
“be barely visible to the naked eye”: Google = 49,000 hits; BNC = none
“grains that are so small as to be barely visible to the naked eye”: Google = 5 hits (2 different results, duplicated)
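The hit counts shrink as the query grows because every occurrence of a longer string also contains the shorter string it extends, so its frequency can never exceed that of the shorter one; the smaller the corpus, the sooner the counts reach zero. A minimal sketch of this reasoning on an invented toy text (the function `count_phrase` is hypothetical; search engines report estimated hit counts rather than exact token counts):

```python
def count_phrase(tokens, phrase):
    """Count exact occurrences of a multi-word phrase in a list of tokens."""
    words = phrase.lower().split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


# Invented toy corpus; a real comparison would query Google and the BNC
corpus = ("The star is visible to the naked eye . "
          "The comet was barely visible to the naked eye . "
          "Most galaxies are invisible to the naked eye .").lower().split()

queries = ["to the naked eye",
           "visible to the naked eye",
           "barely visible to the naked eye",
           "be barely visible to the naked eye"]
for q in queries:
    print(f"{q!r}: {count_phrase(corpus, q)}")
```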
Sample activity
Revise the output of an online machine translation system
Source text: specialized text in a curricular field (e.g. history, economics, politics)
Tools for revision
  Dictionaries
  Corpus resources and tools
An example
• Source text: I puritani della Nuova Inghilterra furono i primi fra tutti i coloni inglesi d'America ad elaborare in modo sistematico una teoria originale dello Stato e della società.
• MT output: The puritani of New England were the first between all coloni English of America to elaborate in systematic way a theory originate them of the State and the society.
• Revised version: New England Puritans were the first among all English colonists of America to elaborate systematically an original theory of State and society.
Google advanced search
New England Puritans
The New England Puritans
(The) Puritans of New England
(The) New England’s Puritans
How the students worked
Doubts about the MT output
  Unknown words and expressions
  Too literal renderings
“in some cases, it was just a matter of verifying the accuracy of the MT output, whereas in others there were good reasons to improve the overall quality of the text.”
Use of corpus resources
Does something exist?
Are there better alternatives?
Doubts confirmed (MT wrong)
Doubts disconfirmed (MT right)
Specific terminology
Need to:
  ask the right questions
  formulate queries properly
  analyse results successfully
Example 1
Can globalization “exercise an effect” on income redistribution?
Google search = no results
Search for “an effect”
  Something can “have” or “produce” an effect
“globalization has (a number of) effects on income redistribution”
Example 2
“the central theme of the debate”
A very literal rendering of “il tema centrale del dibattito”
Google search = many results
EU proceedings parallel corpus = many results
Example 3
“processi (economici) in corso” = “(economic) processes Corsican”?
Search for “processes” (COCA)
  ongoing + processes (frequent collocates)
“ongoing (economic) processes”
Attested in comparable texts (sources of concordance lines)
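Finding that “ongoing” is a frequent left collocate of “processes” amounts to counting which words occur just before the node word across the concordance lines. This sketch counts raw left-hand neighbours within a small window in an invented text; the function name and the window size are assumptions, and corpus interfaces such as COCA's let users set the span and may rank collocates by association scores rather than raw counts.

```python
from collections import Counter

def left_collocates(tokens, node, window=2):
    """Count alphabetic words within `window` tokens to the left of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), i):
                if tokens[j].isalpha():  # skip punctuation tokens
                    counts[tokens[j]] += 1
    return counts


# Invented sample text, already tokenized on whitespace
text = ("the ongoing processes of reform . ongoing economic processes matter . "
        "these processes are slow . natural processes continue")
tokens = text.split()
print(left_collocates(tokens, "processes").most_common(3))
```

A window of two tokens is what lets “ongoing economic processes” count towards “ongoing” as well.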
Example 4
“la diffusione di nuove tecnologie” = “the spread of new technologies”?
Google search = attested expression
But: what about “diffusion”?
Search for “spread” vs “diffusion” (Web + COCA)
Search for “the spread of * technologies” vs “the diffusion of * technologies”
Spread = general English
Diffusion = academic English
“the diffusion of new technologies”
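The wildcard queries in this example can be approximated with a regular expression. In this sketch `*` is assumed to stand for exactly one word (search engines may treat it more loosely), the helper names are hypothetical, and the sample text is invented, so the counts illustrate the procedure rather than the actual Web or COCA figures.

```python
import re

def wildcard_to_regex(query):
    """Compile a phrase query in which `*` stands for exactly one word."""
    parts = [r"\w+" if word == "*" else re.escape(word) for word in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def count_query(text, query):
    """Number of non-overlapping matches of the wildcard query in `text`."""
    return len(wildcard_to_regex(query).findall(text))


# Invented sample text
sample = ("The diffusion of digital technologies reshaped trade. "
          "Critics noted the spread of new technologies in schools. "
          "Scholars study the diffusion of new technologies and "
          "the diffusion of communication technologies.")
for q in ["the spread of * technologies", "the diffusion of * technologies"]:
    print(q, "->", count_query(sample, q))
```

Raw counts alone do not reveal register; the contrast between general and academic English emerges from comparing where the matches occur, e.g. across the Web and the academic section of COCA.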
Example 5
“avere i requisiti per votare” = “have the requirements to vote”?
Dictionary: “fulfil/satisfy/comply with/suit/match the requirements”
Corpora: “meet the requirements”
Role of corpora
“dictionary items + combinatory rules” vs “corpora + rules for querying and analyzing them”
Focus on language units larger than the single word
Multiple local grammars
  grammars for 1, 2, 3… word combinations
Unanalyzed knowledge
Acquisition vs learning
Corpora used to produce generalizations
  Gerund + “is not a duty”
Role of students
Serendipity/discovery learning
A corpus is not necessarily “expected to provide the right answers … but constantly presents new challenges and stimulates new questions, renewing the user’s curiosity and offering ample opportunity for researching aspects of language and culture” (Bernardini 2002:166).
Role of teacher
Guide, facilitator vs “walking dictionary”
Can only L2 native speakers be good translation teachers?
Native speakers
  More knowledgeable about the target language
Non-native speakers
  More knowledgeable about the source language
  Same directionality of translation as the students
  Better understanding of translation difficulties
  Better able to evaluate the translation process
Risks
Insufficient expertise in the use of software will result in clumsy and superfluous searches,
  so enough time should be devoted to teaching search techniques, which are often specific to the corpora used
Insufficient expertise in the analysis of the data (concordances) will result in wrong conclusions and, in turn, in bad translations,
  so enough time should be devoted to teaching how to manipulate and interpret corpus data
However, something can also be learned from less successful learners, whose comments highlight areas of difficulty.
Conclusions
By using corpora in L2 translation, learners can heighten their awareness of contrastive aspects and of the variety of possible translations.
Even if equipped with limited formal linguistic knowledge, learners are given the opportunity to discover language rules and conventions by themselves.
The use of corpus resources in a translation task fosters reading and writing skills and encourages self-confidence and autonomy.
Teachers do not necessarily have to be native speakers of the target language, but rather experts in using resources, formulating queries and evaluating findings.