Current Trends in
Documentation of Endangered
Languages
Peter K. Austin
ELAP, Department of Linguistics
SOAS
Thanks to Oliver Bond, Lise Dobrin, Lenore
Grenoble, David Nash and David Nathan for
discussion of the ideas in this presentation;
they are absolved of responsibility for errors
Outline
Documentary linguistics and language
documentation
Components and skills for documentation
Some current issues and future concerns
Conclusions
Documentary linguistics
 new field of linguistics “concerned with the methods,
tools, and theoretical underpinnings for compiling a
representative and lasting multipurpose record of a
natural language or one of its varieties” (Himmelmann
1998, 2006)
 has developed over the last decade in large part in
response to the urgent need to make an enduring record
of the world’s many endangered languages and to
support speakers of these languages in their desire to
maintain them, fuelled also by developments in
information and communication technologies
 essentially concerned with the role of language speakers
and their rights and needs
Features of documentary linguistics
 Himmelmann (2006:15) identifies important new features of documentary
linguistics:
 Focus on primary data – language documentation concerns the collection and
analysis of an array of primary language data to be made available for a wide
range of users;
 Explicit concern for accountability – access to primary data and
representations of it makes evaluation of linguistic analyses possible and
expected;
 Concern for long-term storage and preservation of primary data – language
documentation includes a focus on archiving in order to ensure that
documentary materials are made available to potential users into the distant
future;
 Work in interdisciplinary teams – documentation requires input and expertise
from a range of disciplines and is not restricted to linguistics alone;
 Close cooperation with and direct involvement of the speech community –
language documentation requires active and collaborative work with
community members both as producers of language materials and as co-researchers.
A contrast
language documentation: activity of systematic
recording, transcription, translation and analysis
of the broadest possible variety of spoken (and
written) language samples collected within their
appropriate social and cultural context
language description: activity of writing
grammar, dictionary, text collection, typically for
linguists
Ref: Himmelmann 1998, Woodbury 2003
Uses of documentation
 documentation outputs are multifunctional for:
 linguistic research - phonology, grammar, discourse,
sociolinguistics, typology, historical reconstruction
 folklore - oral literature and folklore
 poetics - metrical and musical aspects of oral literature
 anthropology - cultural aspects, kinship, interaction styles,
ritual
 oral history, and
 education - applications in teaching
 language revitalisation
Users of documentation
 collection, analysis and presentation of data
 useful not only for linguistics but also for research into the
socio-cultural life of the community
 analysed and processed so it can be understood by
researchers of other disciplines and does not require any
prior knowledge of the language in question
 usable by members of the speaker community
 respects intellectual property rights, moral rights, individual
and cultural sensitivities about access and use and is done
in the most ethical manner possible
The documentation record
 core of a documentation project is usually understood to be a
corpus of audio and/or video materials with transcription, multi-tier annotation, translation into a language of wider
communication, and relevant metadata on context and use of
the materials
 the corpus will ideally be large, cover a diverse range of genres
and contexts, be expandable, opportunistic, portable,
transparent, ethical and preservable (Woodbury 2003)
 as a result documentation is increasingly done by teams rather
than ‘lone wolf linguists’
 need to see grammatical analysis and description as a tertiary-level activity contingent on and emergent from the
documentation corpus
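As a minimal sketch of what one item in such a corpus might look like as a data structure, the following models a recorded session with its media, metadata and annotation tiers. The class and field names (`Session`, `Tier`, `Annotation`, `missing_tiers`) are hypothetical illustrations, not the schema of any archive or annotation tool, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One time-aligned annotation on a tier (times in seconds)."""
    start: float
    end: float
    value: str


@dataclass
class Tier:
    """A named annotation layer, e.g. transcription or free translation."""
    name: str
    annotations: list[Annotation] = field(default_factory=list)


@dataclass
class Session:
    """Hypothetical record for one recorded event in a documentation corpus."""
    media_files: list[str]
    metadata: dict[str, str]
    tiers: dict[str, Tier] = field(default_factory=dict)

    def add(self, tier_name, start, end, value):
        """Append an annotation, creating the tier on first use."""
        tier = self.tiers.setdefault(tier_name, Tier(tier_name))
        tier.annotations.append(Annotation(start, end, value))

    def missing_tiers(self, required=("transcription", "translation")):
        """List required tiers that are absent or still empty."""
        return [name for name in required
                if name not in self.tiers or not self.tiers[name].annotations]


# Invented example data, for illustration only.
session = Session(
    media_files=["story01.wav"],
    metadata={"genre": "narrative", "speaker": "S1"},
)
session.add("transcription", 0.0, 2.5, "(utterance in the documented language)")
print(session.missing_tiers())  # ['translation']
```

A check like `missing_tiers` only reports whether the expected layers exist; it says nothing about the quality of the transcription or translation themselves.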
Phases in documentation project
Project conceptualisation and design
Establishment of field site and permissions
Funding application
Data collecting and processing (including
archiving)
Creation of outputs
Monitoring, evaluation and reporting
Phases in data collection and analysis
Recording – of media and text (including
metadata)
Capture – analogue to digital transfer
Analysis – transcription, translation, annotation,
notation of metadata
Archiving – creating archival objects, assigning
access and usage rights
Mobilisation – publication and distribution of
materials
Some current issues and challenges
Documentation versus description
The ‘representative’ record
Quality of language documentation
Commodification
Interdisciplinarity
Training for language documentation
Communicating with the wider world
Documentation vs description
Himmelmann and others have tried to distinguish language
documentation from language description, but it is unclear whether
such a separation is truly meaningful and, even if it is, where the
boundaries between the two might lie.
Documentation projects must rely on application of theoretical and
descriptive linguistic techniques, if only to ensure that they are
usable (i.e. have accessible entry points via transcription, translation
and annotation) as well as to ensure that they are comprehensive.
It is only through linguistic analysis that we can discover that some
crucial speech genre, lexical form, grammatical paradigm or
sentence construction is missing or under-represented in the
documentary record.
Without good analysis, recorded audio and video materials do not
serve as data for any community of potential users. Similarly,
linguistic description without documentary support is sterile, opaque
and untestable.
The “representative” record
 On a theoretical level, one can define “representative”
documentation as the collection of sample texts of all discourse
types, all registers and genres, from speakers representing all ages,
generations, socioeconomic classes, and so on. On a practical level,
however, there are concrete limitations to the range and number of
texts which can be collected, transcribed and analysed. Most
linguists cannot devote their entire careers to time in the field, which
would be required for a truly thorough collection and analysis of
data.
 A solution (proposed by Seifart in LDD 5) is sampling, ie.
identification of some subset of types that is representative of the
language as a whole – but how do we do this in a meaningful way
(i) for an individual language and (ii) cross-linguistically in a comparable
manner?
Sampling criteria
Criteria for differentiation of communicative events:
 “Ways of speaking” as distinguished in a specific culture /
speech community (Ethnography of Communication)
 Medium: spoken / written
 Plannedness: unplanned / planned
 Register: formal / informal
 Manner of obtaining data: spontaneous (‘natural’) vs.
elicitation vs. stimulated
 Target: child-directed / adult-directed / foreigner-directed
 It is clear that the success of a documentation project rests on
intimate collaboration with community members. In the ideal, they
can be trained to be engaged in data collection themselves,
thereby expediting the process (eg. Florey 2004). Even if this is
not possible, community members can direct (external) linguists to
varying discourse types and to differing speech patterns.
 Note however that this could result in a focus on
rare/unusual/unique discourse types that are in no sense
‘representative’
 Himmelmann (2006:66) identifies five major types of
communicative events ranged along a continuum from unplanned
to planned (next slide). However, it is not clear that this typology is
applicable to all languages and all speech communities. Just
what counts as a ‘representative’ account of language in use remains
unclear, and perhaps the notion should be abandoned
Himmelmann genres
Parameter    Major types      Examples
Unplanned    exclamative      Ouch! Fire! Jishin da!
    |        directive        Scalpel! Sit! Achi ike!
    |        conversational   greetings, small talk, chat, discussion, interview
    |        monological      narrative, description, speech, formal address
Planned      ritual           prayer, ceremonial address
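One way to make the sampling question concrete is to cross-tabulate the sessions in a corpus against the criteria listed earlier and see which combinations are unattested. The sketch below is illustrative only: the function name `coverage_gaps` is hypothetical, only three of the criteria are used, and the session records are invented.

```python
from collections import Counter
from itertools import product

# Criteria and values taken from the sampling criteria; the session
# records further down are invented purely for illustration.
CRITERIA = {
    "medium": ["spoken", "written"],
    "plannedness": ["unplanned", "planned"],
    "register": ["formal", "informal"],
}


def coverage_gaps(sessions, criteria=CRITERIA):
    """Return combinations of criterion values with no session in the corpus."""
    keys = list(criteria)
    seen = Counter(tuple(s[k] for k in keys) for s in sessions)
    return [combo for combo in product(*(criteria[k] for k in keys))
            if seen[combo] == 0]


sessions = [
    {"medium": "spoken", "plannedness": "unplanned", "register": "informal"},
    {"medium": "spoken", "plannedness": "planned", "register": "formal"},
    {"medium": "spoken", "plannedness": "unplanned", "register": "informal"},
]

gaps = coverage_gaps(sessions)
print(len(gaps))  # 8 possible combinations, 2 attested, so 6 gaps
```

An unattested combination is not automatically a defect in the corpus: some cells (for example written, informal language) may simply not exist in a given speech community, which is one reason a fixed typology may not transfer across languages.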
Quality of documentation
 There is a tendency among some researchers to equate
documentation outcomes with archival objects (part of what David
Nathan has termed ‘archivism’), that is, the number and volume of
recorded digital audio and/or video files and their related
transcription, annotation, translation and metadata.
 Mere quantity of objects is not a good proxy for quality of research.
 Equally, some would argue that outcomes which contribute to
language maintenance and revitalization are the true measure of the
quality of a documentation project (what better success of an
endangered language project than that the language continues to be
used?).
 So how could we measure ‘quality’ of a documentary corpus? What
parameters might be included?
Possible metrics
 volume (quantity) as a proxy
 form
 media – audio, video, stills – how measured?
 text – explicit, transparent, well-structured,
standardised, richly detailed, machine-readable
 links (relations, hypertext, multimedia) – explicit,
well-structured, machine readable
More possible metrics
 content:
new – never inscribed before
unique – not readily replicable
interesting
…
 organisation and management (workflow,
transformations, archiving)
 relevance and use of outputs for stakeholders
 impact on community of speakers (or other
stakeholders)
 impact on future of language
Commodification
 reduction of languages to things and their treatment as if they
were a tradeable commodity
 reflected in language documentation through the transformation of
languages into bounded objects, indices, technical encodings, and
exchangeable goods
 results from forces of objectification, standardisation and audit that
shape the management of information in contemporary Western
culture, especially academic culture with its focus on outputs and
counting (eg. RAE, RQF, citation indices, research impact
statements etc)
 also reflects a theoretical and methodological vacuum that has
been filled not by linguistics but by preservationists, archives and
technologists
Languages as bounded objects
selections of phenomena crystallised into a
singular “language”
languages placed within boundaries, on maps
etc.
Languages as indices
 language vitality indicators: Unesco defines 9 criteria
with 6 scoring levels; SIL uses 8 indicators
 these objectify languages: the vitality of an individual
language can be quantified, and languages can be
ranked according to degree of endangerment
 Unesco presents a deterministic relationship between
the 9 factors and the vitality and function of languages:
“taken together, these nine factors can determine the
viability of a language, its function in society and the
type of measures required for its maintenance or
revitalization”
Languages as exchangeable goods
 goal of research is for languages to be ‘preserved’ as
‘resources’ that ‘consumers’ (linguists et al) discover and
access via ‘service providers’ (OLAC publicity)
 linguists’ professional obligations to speaker communities
now often formulated in grant applications and elsewhere
in terms of transacted objects (language primers, CDs,
books) rather than knowledge sharing, joint engagement
in language maintenance activities or other interactions
 granting agencies require a linguist’s bona fides to be
distilled into a ‘letter of support’ from ‘an appropriate
representative of the language community’ thus turning a
complex of social and political dynamics into an object that
is used to legitimise the research
Languages as technical encodings
 quantifiable properties (recording hours, data volume, file
parameters) and technical desiderata (‘archival quality’,
‘portability’, standardised ontologies) have become
reference points in discussing and assessing the methods
and goals of documentation
 results in grant applications by formula: 100 hours of 16-bit
44.1 kHz audio, 25 hours MPEG-2 video, 10% ELAN .eaf
files and Toolbox annotations
 technical parameters replace balanced discussion of
documentation methods; eg. video recordings proposed
without reference to hypotheses, goals or methodology;
avoidance of data compression substitutes for knowledge
of art of audio recording; file formats named rather than
corpus structure described
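How readily such a formula reduces to a single figure can be seen from a short calculation. The sketch below assumes uncompressed PCM audio at the conventional CD sample rate of 44.1 kHz and ignores file-header overhead; the function name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(hours, sample_rate_hz=44_100, bit_depth=16, channels=1):
    """Size in bytes of uncompressed PCM audio (header overhead ignored)."""
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * (bit_depth // 8) * channels)


mono = pcm_bytes(100)
stereo = pcm_bytes(100, channels=2)
print(f"{mono / 1e9:.2f} GB mono, {stereo / 1e9:.2f} GB stereo")
# 31.75 GB mono, 63.50 GB stereo
```

The number is easy to compute and easy to audit, but it says nothing about what was recorded, why, or how the corpus is structured.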
Interdisciplinarity
 Himmelmann and others have pointed to the importance of
taking a multidisciplinary perspective in language
documentation and drawing in researchers, theories and
methods from a wide range of areas, including
anthropology, musicology, psychology, ecology, applied
linguistics etc (see Harrison 2005, Coelho 2005, Eisenbeiss
2005).
 True interdisciplinary research is difficult to achieve, both
because of different theoretical orientations and because of practical
differences in approach (ranging from linguists’ and
anthropologists’ traditionally different practices concerning
payments for consultants to more significant differences
in academic paradigm that make communication and
understanding fraught).
Mainstream linguistics has tended to turn away
from other disciplines and to emphasise its
‘independence’ by concentrating on theoretical
concerns that are of internal interest to linguists
only (minimalism, OT phonology – see
Liberman 2007).
Documentary linguistics opens new doors to
interdisciplinary collaboration but we need to
work out how to achieve it.
Reaching the wider world
 There are great opportunities for communicating about
language and language issues to the general
community
 At SOAS we have run “Endangered Languages Week”
in 2007 and 2008, as well as film showings, public lectures,
exhibitions (“Disappearing Voices”) and David Crystal’s
play (“Living On”)
 We see part of your work as ELDP grantees as
including outreach and communication activities – we
will encourage you to contribute “stories” and images
for things like the HRELP annual report, the website
etc.
Exhibition
Identifying the gaps
 The discourse of endangered languages and language documentation
has a strong moral and emotional power which has not been matched
by conceptual guidance on what linguistics and linguists can do in
response
 publications and debates about effective and appropriate documentary
methodologies for linguists have been slow to develop, resulting in
many unanswered questions:
are the goals of documentary linguistics social or formal?
are its data symbolic or digital recordings of events?
what role(s) should archives play?
how could we decide between competing interests?
 we lack a framework for assessing quality, value, effectiveness and
progress of our work so documentary linguists fall back on established
patterns like quantifiable indices and technical standards
Setting some agendas
 recognising that some of the challenges described here
derive from bureaucratic and technological contexts and
should not be taken for granted as defining the discipline
 we need to develop a new approach to language
documentation that implements the moral and ethical vision
that has attracted new participants
 replacing the rhetoric that documentation is a separate
discipline from descriptive linguistics with a better
understanding of their respective goals, methodologies and
evaluative criteria
 and locating documentation within a wide range of
interdisciplinary approaches to human language
 with development of appropriate training and outreach
Our goals for the training course
 To expose you to good practices in documentation
(recording, analysis, archiving, mobilisation, ethics and
IPR)
 To raise issues that we see as theoretical and practical
challenges and to share experiences and ideas (a two-way process)
 To begin what we hope is a long-term on-going
relationship between you as researchers and us as
trainers, archivists, researchers and all round good
guys
The end