lin3098-genre-register
LIN 3098 – Corpus Linguistics
Albert Gatt
In this lecture
Corpora for the study of
genre/register variation
revisit the concept of representativeness
and balance
external vs. internal criteria: Biber (1993)
introduce the multi-dimensional
approach to register/genre variation
(Biber 1988)
Part 1
The concept of register/genre
A preliminary example
Compare the following:
It is hard to resolve this problem.
I find it hard to resolve this problem.
Is one intuitively more “formal”?
Why?
Extraposed to-clause
It is hard to resolve this problem.
It (expletive)
Verb be
An adjective (hard) or participle (boring)
Clause starting with to + infinitive verb
Tends to be associated with a formal,
“anonymous” style.
Tends to be “static”:
Adjective or participle denotes a state, not a
dynamic event.
If our intuitions are correct, we would expect
the distribution of this clause to vary across
genres and registers.
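One way to test this intuition is to count the construction in texts from different genres and compare normalized rates. The Python sketch below is only an illustration: the regular expression is a hypothetical, deliberately crude heuristic (a real study would work from a tagged or parsed corpus), and the function names are invented for this example.

```python
import re

# Hypothetical heuristic: "it" + a form of BE + one word (the adjective or
# participle) + "to" + one more word (the infinitive verb). It will miss
# many instances and catch some false positives ("it is going to rain").
EXTRAPOSED_TO = re.compile(
    r"\bit(?:'s|\s+(?:is|was|will\s+be|would\s+be))\s+\w+\s+to\s+\w+",
    re.IGNORECASE,
)
WORD = re.compile(r"[A-Za-z']+")


def rate_per_1000(text: str) -> float:
    """Matches of the pattern per 1,000 words of `text`."""
    n_words = len(WORD.findall(text))
    if n_words == 0:
        return 0.0
    return 1000 * len(EXTRAPOSED_TO.findall(text)) / n_words


def compare_genres(samples: dict[str, list[str]]) -> dict[str, float]:
    """Mean rate per genre; `samples` maps a genre label to its texts."""
    return {
        genre: sum(rate_per_1000(t) for t in texts) / len(texts)
        for genre, texts in samples.items()
        if texts
    }
```

Normalizing per 1,000 words matters because texts differ in length; raw counts would simply favour longer texts. With the two example sentences, the "formal" variant matches the pattern while "I find it hard to resolve this problem." does not.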
What is a register?
Would you consider the following to
be registers?
1. recipe English
2. legal Maltese
3. specialised language used by shipbuilders
What are the crucial characteristics
of register?
Defining register
Possible definitions (see overview in
Paolillo 2000):
register = “a field of discourse” or “topic”
register = “a combination of all the
parameters of the communicative
situation”
register = “an occupationally determined
variety of language”
Defining genre
In discourse analysis and related
fields, genre is given a “sociologically
oriented” definition:
“A socially ratified way of using
language in connection with a
particular type of social activity”
suggests “typical” settings in which
language is used
e.g. interview, lecture, story…
Why is this relevant?
Reminder (see lecture 2):
general-purpose corpora aim for balance
and representativeness
how genre/register are defined affects the
structure and the uses of the corpus
corpus-based studies of variation
across/within registers need a well-defined notion of register/genre
Balance and representativeness
Balance:
refers to the range of types of text in the corpus
e.g. the BNC’s construction was based on an a
priori classification of texts by domain, time and
medium
Representativeness:
refers to the extent to which the corpus contains
the full range of variation in the language.
Representativeness depends on balance as
a prerequisite
Biber (1993) on achieving balance
Biber distinguishes:
external criteria:
social and communicative contexts in which
a particular sample of text/speech is
produced
external criteria define registers or genres
internal criteria:
linguistic (e.g. lexico-grammatical) features
that distinguish texts
internal criteria define text types
External vs. internal
Example: academic writing vs. spoken
conversation
Some external criteria of differentiation:
primary channel (spoken/written/…)
type of addressee
factuality
Some internal criteria of differentiation:
more frequent use of personal pronouns in spoken
discourse
more frequent use of passives in academic writing
…
Which should come first?
Biber’s argument:
“in defining the population for a
corpus, register/genre distinctions
[i.e. external criteria] take
precedence over text-type
distinctions. […] identification of the
salient text-type distinctions in a
language requires a representative
corpus of texts…”
Biber’s external criteria
1. Primary channel:
written/spoken/scripted
2. Format:
published/unpublished
includes various publication formats
3. Setting:
institutional/other/private-personal
4. Addressee/receiver:
a. Plurality: unenumerated/
plural/individual/self
b. Presence: present/absent
c. Interactiveness: none/little/extensive
d. Shared knowledge: general/ specialised/
personal
5. Addressor:
a. Demographic variation: age, sex, etc.
b. Acknowledgement: acknowledged
individual/institution
6. Factuality: factual-informational /
intermediate / imaginative
7. Purposes: persuade, entertain, edify,
inform, instruct…
8. Topics: [cf. the “Domain” definition in BNC
texts]
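Because external criteria describe the situation a text comes from rather than its language, they can be recorded as metadata before any linguistic analysis. The Python sketch below encodes three of the criteria above (primary channel, setting, factuality) and counts how many samples fall into each channel × setting cell, a first check on balance. The class name, field names and the choice of criteria are assumptions made for illustration, not a published corpus schema.

```python
from collections import Counter
from dataclasses import dataclass

# Permitted values for three of the external criteria listed above
# (1. primary channel, 3. setting, 6. factuality).
CHANNELS = {"written", "spoken", "scripted"}
SETTINGS = {"institutional", "other", "private-personal"}
FACTUALITY = {"factual-informational", "intermediate", "imaginative"}


@dataclass(frozen=True)
class TextSample:
    """Externally defined metadata for one corpus sample (hypothetical schema)."""
    sample_id: str
    channel: str
    setting: str
    factuality: str
    topic: str = "unspecified"

    def __post_init__(self) -> None:
        # Reject values outside the a priori classification.
        checks = [(self.channel, CHANNELS), (self.setting, SETTINGS),
                  (self.factuality, FACTUALITY)]
        for value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {sorted(allowed)}")


def design_cells(samples: list[TextSample]) -> Counter:
    """Number of samples in each (channel, setting) cell of the design."""
    return Counter((s.channel, s.setting) for s in samples)
```

Empty or sparsely filled cells in the resulting counter point to parts of the design that are under-sampled.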
The logic behind genre/register
comparison
A priori distinction between different
genres/registers, each
adequately sampled to be representative
Given these externally-based
distinctions, the question is:
what linguistic features are characteristic
of (give rise to) different genres?
Part 2
The multi-feature/multi-dimensional (MF/MD)
framework (Biber 1988, Biber 1995)
Biber (1988, 1995)
Compared twenty-one genres in spoken
and written British English
Used a precompiled list of 67 linguistic
features, examining:
the extent to which these features “cluster
together” across genres
e.g. a high relative frequency of personal pronouns
goes with a high relative frequency of questions
the extent to which these clusters are more
clearly present in some genres than in others
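Whether two features “cluster together” can be checked by correlating their per-text rates across a set of texts: a strong positive correlation means that texts rich in one feature tend to be rich in the other. A minimal Python sketch (the function names are hypothetical; any per-text rates, such as frequencies per 1,000 words, will do as input):

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two features' per-text rates."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 texts")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a feature with no variation cannot be correlated")
    return cov / (sx * sy)


def correlation_matrix(rates: dict[str, list[float]]) -> list[list[float]]:
    """Pairwise correlations; `rates` maps feature name -> per-text rates."""
    cols = list(rates.values())
    return [[pearson(a, b) for b in cols] for a in cols]
```

The full feature-by-feature correlation matrix is the starting point for factor analysis.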
Primary goals
1. identify the main dimensions
(clusters of features) of variation
underlying all registers
2. find similarities and differences
between different registers
Dimensions
Dimension:
group of features that are empirically
determined to co-occur in text
Functional interpretation:
given a set of features forming a dimension
e.g. pers. pronouns + questions
the crucial question is: how do we interpret it
functionally?
e.g. the cluster containing pers. pronouns and
questions shows a high level of interpersonal
focus in the text
Factor analysis
The MF/MD approach uses factor
analysis
a statistical technique that groups together
related features based on their co-occurrence
the resulting clusters of features (“factors”)
are then interpreted and given a label
this is the process of identification and
functional interpretation of dimensions
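A full factor analysis, with factor extraction and rotation, is beyond a short example. As a simplified stand-in, the Python sketch below extracts the first principal component of a feature correlation matrix by power iteration and returns each feature's loading on it. This is a swapped-in technique (principal components, not the factor analysis used in MF/MD studies), and the function name is hypothetical. Features with large loadings of opposite sign would form the two “poles” of a dimension.

```python
from math import sqrt


def first_factor(corr: list[list[float]], iterations: int = 500):
    """Eigenvalue and loadings of the first principal component of `corr`.

    Uses power iteration; assumes the largest eigenvalue is well separated
    from the next one and that the all-ones start vector is not orthogonal
    to the leading eigenvector.
    """
    n = len(corr)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: v has unit length, so v'Cv is the eigenvalue.
    eig = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    # Loadings on a principal component = eigenvector * sqrt(eigenvalue).
    loadings = [x * sqrt(eig) for x in v]
    # The sign of an eigenvector is arbitrary: make the largest loading positive.
    if max(loadings, key=abs) < 0:
        loadings = [-x for x in loadings]
    return eig, loadings
```

For example, if pronouns and questions correlate positively with each other and negatively with nouns, the first two features load positively and nouns load negatively, giving one dimension with two opposed poles.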
Biber’s methodology
1. Identify the grammatical features, based on a review of the existing literature
2. Tag all relevant features in the corpus texts
3. Post-edit the tagged texts to ensure accuracy
4. Count the frequency of each feature in each text
5. Apply factor analysis to compute co-occurrence patterns among features
6. Interpret the resulting dimensions functionally
7. Compare different registers to see how strongly each dimension is represented in them
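Step 4 turns each tagged text into a row of normalized feature frequencies, and the resulting texts × features table is what step 5 analyses. The Python sketch below assumes the texts have already been tagged (steps 2 and 3) as (word, tag) pairs using Penn Treebank tags; the four features and their tag mapping are hypothetical placeholders for the much richer set of 67 features in Biber's study.

```python
from collections import Counter

# Hypothetical feature-to-tag mapping using Penn Treebank tags.
FEATURE_TAGS = {
    "personal_pronouns": {"PRP"},
    "nouns": {"NN", "NNS", "NNP", "NNPS"},
    "past_tense": {"VBD"},
    "modals": {"MD"},
}


def feature_rates(tagged_text: list[tuple[str, str]],
                  per: int = 1000) -> dict[str, float]:
    """Step 4: frequency of each feature per `per` tokens of one tagged text."""
    if not tagged_text:
        raise ValueError("empty text")
    tag_counts = Counter(tag for _word, tag in tagged_text)
    n = len(tagged_text)
    return {
        feature: per * sum(tag_counts[t] for t in tags) / n
        for feature, tags in FEATURE_TAGS.items()
    }


def feature_matrix(corpus: list[list[tuple[str, str]]]):
    """Texts x features table (the input to step 5), plus column names."""
    names = list(FEATURE_TAGS)
    rows = []
    for text in corpus:
        rates = feature_rates(text)
        rows.append([rates[f] for f in names])
    return names, rows
```

Every token, including punctuation if the tagger emits it, counts towards the denominator here; a real study would fix its counting unit explicitly.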
Types of features
Lexical features
type-token ratio (the number of different
word types divided by the number
of tokens)
word length
lexical semantic features
e.g. word classes like hedges (probably,
possibly…) and speech act verbs (declare),
etc.
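The first two lexical features are simple to compute. In the Python sketch below, the tokenizer is a crude stand-in for a real one, and the function names are hypothetical. Because the type-token ratio falls as texts get longer (common words keep repeating), it is computed over a fixed-size window of initial tokens. The 400-token default is an assumption chosen for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str, window: int = 400) -> float:
    """Distinct word types divided by tokens, over the first `window` tokens."""
    tokens = tokenize(text)[:window]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_word_length(text: str) -> float:
    """Average length of word tokens, in characters."""
    tokens = tokenize(text)
    return sum(map(len, tokens)) / len(tokens) if tokens else 0.0
```

For "The cat saw the dog", there are 5 tokens but only 4 types ("the" occurs twice), so the ratio is 0.8.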
Grammatical feature classes
nouns, prepositional phrases, attributive
and predicative adjectives, etc.
Syntactic features:
relative clauses, that-complements,
pied-piping constructions (Which car
does he like?), conditional subordination
(should you ever…)
The dimensions identified
Involved vs. informational production
Narrative vs. non-narrative
production
Elaborated vs. situation-dependent
reference
Overt expression of persuasion
Abstract vs. non-abstract style
NB. Many of these dimensions define
“poles of opposition”
Dimension 1: involved vs.
informational
Features at the involved pole:
1st & 2nd person pronouns, questions,
reductions, stance verbs, hedges,
emphatics, adverbial subordination
Typical of conversations, letters
(high personal involvement)
Features at the informational pole:
nouns, adjectives, prepositional
phrases, long words
Typical of informational
exposition, e.g. in official
documents and academic
writing
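To place each text on a dimension such as this one, the per-text rates of each feature are standardized across the corpus (as z-scores). The standardized values of the features at one pole are then added, and those at the opposite pole are subtracted. The Python sketch below follows that scheme. Its assumptions are the use of the population standard deviation and the hypothetical feature names in the usage example.

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize one feature's per-text rates across the whole corpus."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0] * len(values)  # feature never varies: no information
    return [(v - m) / s for v in values]


def dimension_scores(rates: dict[str, list[float]],
                     positive: list[str],
                     negative: list[str]) -> list[float]:
    """Per-text score: sum of z-scores at one pole minus those at the other."""
    z = {feature: z_scores(vals) for feature, vals in rates.items()}
    n_texts = len(next(iter(rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]
```

Usage (hypothetical feature names): `dimension_scores(rates, positive=["pronouns_1st_2nd", "questions"], negative=["nouns", "long_words"])`. A high score places a text towards the involved pole, and a low score towards the informational pole.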
Dimension 2: Narrative vs. non-narrative
Features at the narrative pole:
past tense, perfect aspect,
3rd person pronouns, speech act verbs
Typical of fiction
Features at the non-narrative pole:
present tense, attributive adjectives
Typical of broadcasts,
telephone conversations,
professional letters
Dimension 3: elaborated vs.
situation-dependent reference
Features at the elaborated pole:
wh-relative clauses, pied-piping,
phrasal coordination
Typical of “elaborated”, situation-independent
language: official documents, professional
letters, written exposition
Features at the situation-dependent pole:
time adverbials, place adverbials
Typical of “situation-dependent language”, e.g.
broadcasts, fiction, personal
letters
Dimension 4: Overt expression of
persuasion
Features at the persuasive pole:
modals, conditional subordination
Typical of overtly persuasive language,
e.g. editorials, professional letters
Opposite pole: absence of the
above features
Typical of language which does not
overtly seek to persuade
Dimension 5: Abstract vs. non-abstract style
Features at the abstract pole:
agentless passives, by-passives, …
Typical of an “abstract style”: technical
prose, academic prose, official
documents
Opposite pole: absence of the
above features
Language which is typically
not abstract: conversation,
public speeches, broadcasts…
Biber’s main argument
No one dimension is enough to
characterise the properties of a
particular register
dimensions are coherent, correlated
groupings of features
every register could be defined in terms
of the relative prominence of all 5
dimensions
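Under this view, a register is described by a profile: its mean score on every dimension, not on one dimension alone. The Python sketch below averages per-text dimension scores by register and compares two registers' profiles. The short dimension labels are hypothetical. Using Euclidean distance to compare profiles is an assumption made for illustration, not a step prescribed by the framework.

```python
from collections import defaultdict
from math import dist
from statistics import mean

# Hypothetical short labels for the five dimensions listed above.
DIMENSIONS = ["involved", "narrative", "elaborated", "persuasion", "abstract"]


def register_profiles(texts):
    """Mean score on every dimension for each register.

    `texts` is an iterable of (register, scores) pairs, where `scores`
    maps each name in DIMENSIONS to that text's dimension score.
    """
    by_register = defaultdict(list)
    for register, scores in texts:
        by_register[register].append(scores)
    return {
        register: [mean(s[d] for s in rows) for d in DIMENSIONS]
        for register, rows in by_register.items()
    }


def profile_distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two registers' five-dimensional profiles."""
    return dist(p, q)
```

Two registers can be close on one dimension yet far apart overall, which is why the whole profile is compared.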
Biber finds no evidence of an absolute
difference between spoken and written
language
e.g. conversations often share characteristics
with some written genres
Better to identify different types of speech
(broadcast, scripted, spontaneous)
and examine their similarities to and differences
from different types of writing
Summary
Biber’s MF/MD approach has proved
highly influential in the study of
register and genre
Crucially, it relies on an a priori definition
of:
features (“what to look for”)
registers (“situationally-defined uses of
language”)
References
Biber, D. (1993). Representativeness in
corpus design. Literary and Linguistic
Computing, 8 (4): 243-258.
Biber, D. (1995). On the role of
computational, statistical and interpretive
techniques in multi-dimensional analysis of
register variation. Text, 15 (3): 314-370.
Paolillo, J. C. (2000). Formalising formality.
Journal of Linguistics, 36: 215-259.