lin3098-genre-register

LIN 3098 – Corpus Linguistics
Albert Gatt
In this lecture
 Corpora for the study of
genre/register variation
 revisit the concept of representativeness
and balance
 external vs. internal criteria: Biber (1993)
 introduce the multi-dimensional
approach to register/genre variation
(Biber 1988)
Part 1
The concept of register/genre
A preliminary example
 Compare the following:
 It is hard to resolve this problem.
 I find it hard to resolve this problem.
 Is one intuitively more “formal”?
 Why?
A preliminary example
 Extraposed to-clause
 It is hard to resolve this problem.
 It (expletive)
 Verb be
 An adjective (hard) or participle (boring)
 Clause starting with to + infinitive verb
 Tends to be associated with a formal,
“anonymous” style.
 Tends to be “static”:
 Adjective or participle denotes a state, not a
dynamic event.
 If our intuitions are correct, we would expect
the distribution of this clause to vary across
genres and registers.
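A rough way to check this expectation is to count how often the pattern occurs per 1,000 words in samples from different registers. The sketch below is only an illustration: the regular expression is a crude surface heuristic (it has no part-of-speech information, so it cannot verify that the word after be is an adjective or participle), and the function name and the short list of be forms are assumptions made for the example.

```python
import re

# Crude surface pattern: "it" + a form of "be" + one word + "to" + one word.
# Assumption: only a handful of "be" forms are listed; a real study would use
# a part-of-speech tagged corpus instead of a regular expression.
EXTRAPOSED_TO = re.compile(
    r"\bit\s+(?:is|was|will\s+be|would\s+be|has\s+been)\s+(\w+)\s+to\s+(\w+)",
    re.IGNORECASE,
)

def rate_per_1000(text: str) -> float:
    """Matches of the pattern per 1,000 whitespace-separated tokens."""
    n_tokens = len(text.split())
    if n_tokens == 0:
        return 0.0
    return 1000 * len(EXTRAPOSED_TO.findall(text)) / n_tokens

formal = "It is hard to resolve this problem."
personal = "I find it hard to resolve this problem."

# The first sentence matches the pattern once; the second does not match,
# because "it" is not followed by a form of "be".
print(rate_per_1000(formal), rate_per_1000(personal))
```

Applied to larger samples from different genres, the same rate gives a first impression of whether the clause type is unevenly distributed.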
What is a register?
 Would you consider the following to
be registers?
1. recipe English
2. legal Maltese
3. specialised language used by shipbuilders
 What are the crucial characteristics
of register?
Defining register
 Possible definitions (see overview in
Paolillo 2000):
 register = “a field of discourse” or “topic”
 register = “a combination of all the
parameters of the communicative
situation”
 register = “an occupationally determined
variety of language”
Defining genre
 In discourse analysis and related
fields, genre is given a “sociologically
oriented” definition:
 “A socially ratified way of using
language in connection with a
particular type of social activity”
 suggests “typical” settings in which
language is used
 e.g. interview, lecture, story…
Why is this relevant?
 Reminder (see lecture 2):
 general-purpose corpora aim for balance
and representativeness
 how genre/register are defined affects the
structure and the uses of the corpus
 corpus-based studies of variation
across/within registers need a well-defined notion
of register/genre
Balance and representativeness
 Balance:
 refers to the range of types of text in the corpus
 e.g. the BNC’s construction was based on an a
priori classification of texts by domain, time and
medium
 Representativeness:
 refers to the extent to which the corpus contains
the full range of variation in the language.
 Representativeness depends on balance as
a prerequisite
Biber (1993) on achieving balance
 Biber distinguishes:
 external criteria:
 social and communicative contexts in which
a particular sample of text/speech is
produced
 external criteria define registers or genres
 internal criteria:
 linguistic (e.g. lexico-grammatical) features
that distinguish texts
 internal criteria define text types
External vs. internal
 Example: academic writing vs. spoken
conversation
 Some external criteria of differentiation:
 primary channel (spoken/written/…)
 type of addressee
 factuality
 Some internal criteria of differentiation:
 more uses of personal pronouns in spoken
discourse
 more use of passives in academic writing
 …
Which should come first?
 Biber’s argument:
“in defining the population for a
corpus, register/genre distinctions
[i.e. external criteria] take
precedence over text-type
distinctions. […] identification of the
salient text-type distinctions in a
language requires a representative
corpus of texts…”
Biber’s external criteria
1. Primary channel:
 written/spoken/scripted
2. Format:
 published/unpublished
 includes various publication formats
3. Setting:
 institutional/other/private-personal
Biber’s external criteria
4. Addressee/receiver
a. Plurality: unenumerated/
plural/individual/self
b. Presence: present/absent
c. Interactiveness: none/little/extensive
d. Shared knowledge: general/ specialised/
personal
Biber’s external criteria
5. Addressor:
a. Demographic variation: age, sex etc
b. Acknowledgement: acknowledged
individual/institution
6. Factuality: factual-informational /
intermediate / imaginative
7. Purposes: persuade, entertain, edify,
inform, instruct…
8. Topics: [cf. the “Domain” definition in BNC
texts]
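One way to picture how such criteria structure a corpus is to record them as metadata for each text and count how many texts fall into each combination. The sketch below is a hypothetical illustration only: the class name, the selection of five criteria and the sample values are assumptions for the example, not Biber's own coding scheme.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class TextSituation:
    """A few of the external criteria, recorded for one text (hypothetical)."""
    channel: str     # written / spoken / scripted
    format: str      # published / unpublished
    setting: str     # institutional / other / private-personal
    factuality: str  # factual-informational / intermediate / imaginative
    purpose: str     # persuade, entertain, edify, inform, instruct

# Invented sample: two texts share a situation, one differs.
samples = [
    TextSituation("written", "published", "institutional",
                  "factual-informational", "inform"),
    TextSituation("spoken", "unpublished", "private-personal",
                  "intermediate", "entertain"),
    TextSituation("written", "published", "institutional",
                  "factual-informational", "inform"),
]

# Each distinct combination of values is one cell of the sampling frame;
# the counts show how evenly the corpus covers the cells.
cells = Counter(samples)
for situation, n_texts in cells.items():
    print(n_texts, situation)
```

Empty or thinly filled cells would point to situations that the corpus under-represents.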
The logic behind genre/register
comparison
 A priori distinction between different
genres/registers
 adequately sampled to be representative
 Given these externally-based
distinctions, the question is:
 what linguistic features are characteristic
of (or give rise to) different genres?
Part 2
The multifeature/multidimensional
framework (Biber 1988, Biber 1995)
Biber (1988, 1995)
 Compared twenty-one genres in spoken
and written British English
 Used a precompiled list of 67 linguistic
features, comparing:
 the extent to which these features “cluster
together” across genres
 high relative frequency of personal pronouns
=> high relative frequency of questions
 the extent to which these clusters are more
clearly present in different genres
Primary goals
1. identify the main dimensions
(clusters of features) of variation
underlying all registers
2. find similarities and differences
between different registers
Dimensions
 Dimension:
 group of features that are empirically
determined to co-occur in text
 Functional interpretation:
 given a set of features forming a dimension
 e.g. pers. pronouns + questions
 the crucial question is: how do we interpret it
functionally?
 e.g. the cluster containing pers. pronouns and
questions shows a high level of interpersonal
focus in the text
Factor analysis
 The MF/MD approach uses factor
analysis
 statistical technique to group together
related features based on their co-occurrence
 resulting clusters of features (“factors”)
are then interpreted and given a label
 this is the process of identification and
functional interpretation of dimensions
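Factor analysis starts from the pairwise correlations between feature frequencies across texts. The sketch below shows only that first step, not factor extraction itself: it computes Pearson correlations for three features using invented per-1,000-word rates for five hypothetical texts. The feature names and all numbers are assumptions for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented rates per 1,000 words; one value per text, same text order.
rates = {
    "pers_pronouns": [62.0, 55.0, 48.0, 12.0, 9.0],
    "questions": [8.0, 6.5, 5.0, 0.5, 0.2],
    "nouns": [140.0, 155.0, 170.0, 290.0, 310.0],
}

features = list(rates)
corr = {(a, b): pearson(rates[a], rates[b]) for a in features for b in features}

for a in features:
    print(a, [round(corr[(a, b)], 2) for b in features])
```

In this toy data, personal pronouns and questions rise and fall together, while nouns move in the opposite direction; a factor analysis would group such features into one factor with positive and negative loadings.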
Biber’s methodology
1. identify the grammatical features
 based on review of existing literature
2. tag all relevant features in the corpus texts
3. post-edit the texts to ensure accuracy
4. count frequency of each feature in each text
5. apply factor analysis to compute co-occurrence patterns
among features
6. interpret the resulting dimensions functionally
7. compare different registers to see how much each
dimension is represented in them
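The counting step can be sketched as follows. This is a simplified illustration, not the procedure used in the original studies: the two tiny word lists stand in for properly tagged features, and normalising to a rate per 1,000 words is an assumption made here so that texts of different lengths are comparable.

```python
import re

# Hypothetical, deliberately tiny word lists standing in for tagged features.
FEATURES = {
    "first_second_pronouns": {"i", "me", "my", "we", "us", "our", "you", "your"},
    "hedges": {"probably", "possibly", "maybe", "perhaps"},
}

def feature_rates(text, per=1000):
    """Frequency of each feature, normalised to occurrences per `per` tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in FEATURES}
    return {
        name: per * sum(tok in words for tok in tokens) / len(tokens)
        for name, words in FEATURES.items()
    }

text = "I think you probably know what we mean."
print(feature_rates(text))
```

Running this over every text in the corpus yields the text-by-feature table that the factor analysis takes as input.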
Types of features
 Lexical features
 type-token ratio (the number of different
word types divided by the number of
tokens; higher values indicate a more
varied vocabulary)
 word length
 lexical semantic features
 e.g. word classes like hedges (probably,
possibly…); speech act verbs (declare),
etc
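As a minimal sketch, the type-token ratio can be computed as below. The tokenisation rule and the optional `window` parameter are assumptions for the example; the window is included because the ratio tends to fall as texts get longer, so comparisons are usually made over samples of equal size.

```python
import re

def type_token_ratio(text, window=None):
    """Distinct word types divided by tokens, optionally over the first
    `window` tokens only so that texts of different length are comparable."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if window is not None:
        tokens = tokens[:window]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Six tokens, five types ("the" occurs twice).
print(type_token_ratio("The cat sat on the mat"))
```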
Types of features
 Grammatical feature classes
 nouns, prepositional phrases, attributive
and predicative adjectives, etc.
 Syntactic features:
 relative clauses, that-complements,
pied-piping constructions (Which car
does he like?), conditional subordination
(should you ever…)
The dimensions identified
 Involved vs. informational production
 Narrative vs. non-narrative
production
 Elaborated vs. situation-dependent
reference
 Overt expression of persuasion
 Abstract vs. non-abstract style
NB. Many of these dimensions define
“poles of opposition”
Dimension 1: involved vs. informational
 Features of the involved pole:
 1st & 2nd person pronouns
 questions
 reductions
 stance verbs
 hedges
 emphatics
 adverbial subordination
 Typical of conversations, letters (high personal involvement)
 Features of the informational pole:
 nouns
 adjectives
 prepositional phrases
 long words
 Typical of informational exposition, e.g. in official documents and academic writing
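A text's position on such a dimension can be expressed as a single score. The sketch below assumes one common way of doing this: standardise each feature's rate across the corpus (z-scores), then add up the features of one pole and subtract the features of the other. The feature subsets, text labels and rates are invented for illustration.

```python
from statistics import mean, pstdev

# Assumed subsets of the features of each pole.
POSITIVE = ["pers_pronouns", "questions", "hedges"]   # involved pole
NEGATIVE = ["nouns", "prepositional_phrases"]         # informational pole

# Invented rates per 1,000 words for three hypothetical texts.
corpus = {
    "conversation": {"pers_pronouns": 60.0, "questions": 8.0, "hedges": 4.0,
                     "nouns": 150.0, "prepositional_phrases": 80.0},
    "letter": {"pers_pronouns": 40.0, "questions": 3.0, "hedges": 2.0,
               "nouns": 200.0, "prepositional_phrases": 100.0},
    "academic": {"pers_pronouns": 5.0, "questions": 0.5, "hedges": 0.5,
                 "nouns": 300.0, "prepositional_phrases": 140.0},
}

def zscores(feature):
    """Standardise one feature's rate across all texts in the corpus."""
    values = [rates[feature] for rates in corpus.values()]
    mu, sigma = mean(values), pstdev(values)
    return {name: (rates[feature] - mu) / sigma for name, rates in corpus.items()}

def dimension_scores():
    """Sum of positive-pole z-scores minus sum of negative-pole z-scores."""
    z = {feature: zscores(feature) for feature in POSITIVE + NEGATIVE}
    return {
        name: sum(z[f][name] for f in POSITIVE) - sum(z[f][name] for f in NEGATIVE)
        for name in corpus
    }

scores = dimension_scores()
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:>12}: {score:+.2f}")
```

With these invented figures the conversation text scores highest (most involved) and the academic text lowest (most informational), with the letter in between.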
Dimension 2: Narrative vs. non-narrative
 Narrative features:
 past tense
 perfect aspect
 3rd person pronouns
 speech act verbs
 Typical of fiction
 Non-narrative features:
 present tense
 attributive adjectives
 Typical of broadcasts, telephone conversations, professional letters
Dimension 3: elaborated vs. situation-dependent reference
 Features of elaborated reference:
 wh-relative clauses
 pied-piping
 phrasal coordination
 Typical of “elaborated” text or “situation-independent language”: official documents, professional letters, written exposition
 Features of situation-dependent reference:
 time adverbials
 place adverbials
 Typical of “situation-dependent language”, e.g. broadcasts, fiction, personal letters
Dimension 4: Overt expression of persuasion
 Features:
 modals
 conditional subordination
 Defines an “overt expression of persuasion” type, e.g. editorials, professional letters
 Lack of any of the above:
 Language which does not overtly seek to persuade
Dimension 5: Abstract vs. non-abstract style
 Features:
 agentless passives
 by-passives
 …
 An “abstract style”: technical prose, academic prose, official documents
 Lack of any of the above:
 Language which is typically not abstract: conversation, public speeches, broadcasts…
Biber’s main argument
 No one dimension is enough to
characterise the properties of a
particular register
 dimensions are coherent, correlated
groupings of features
 every register could be defined in terms
of the relative prominence of all 5
dimensions
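One simple way to represent such a definition is a register profile: the mean score of a register's texts on each of the five dimensions. The sketch below is an illustration under stated assumptions; the function name is hypothetical and all scores are invented, chosen only to show the shape of the data.

```python
from collections import defaultdict
from statistics import mean

# Invented data: (register, [scores on dimensions 1 to 5]) for each text.
texts = [
    ("conversation", [30.0, -1.0, -4.0, 0.0, -3.0]),
    ("conversation", [40.0, 0.0, -2.0, -1.0, -3.0]),
    ("academic prose", [-16.0, -3.0, 4.0, -1.0, 6.0]),
    ("academic prose", [-14.0, -2.0, 4.0, 0.0, 5.0]),
]

def register_profiles(scored_texts):
    """Mean score per dimension for each register."""
    grouped = defaultdict(list)
    for register, scores in scored_texts:
        grouped[register].append(scores)
    return {
        register: [mean(column) for column in zip(*rows)]
        for register, rows in grouped.items()
    }

profiles = register_profiles(texts)
for register, profile in profiles.items():
    print(register, profile)
```

Comparing whole profiles, rather than a single dimension, is what allows two registers to be similar on one dimension and different on another.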
Biber’s main argument
 Biber finds no evidence of an absolute
difference between spoken and written
language
 e.g. conversations often display similar
characteristics to other non-spoken genres
 Better to identify different types of speech
(broadcast, scripted, spontaneous)
 and examine their similarities to and differences
from different types of writing
Summary
 Biber’s MF/MD approach has proved
highly influential in the study of
register and genre
 Crucially, it relies on an a priori definition
of:
 features (“what to look for”)
 registers (“situationally-defined uses of
language”)
References
 Paolillo, J. C. (2000). Formalising formality.
Journal of Linguistics, 36: 215-259.
 Biber, D. (1993). Representativeness in
corpus design. Literary and Linguistic
Computing, 8 (4): 243-258.
 Biber, D. (1995). On the role of
computational, statistical and interpretive
techniques in multi-dimensional analysis of
register variation. Text, 15 (3): 314-370.