Corpus linguistics and the study of English grammar

Doug Biber
Northern Arizona University
©2010 Doug Biber
Major goals of the talk

Introduce corpus linguistics and survey the methodologies used for corpus analysis
Illustrate corpus-based descriptions of linguistic variation
  from the Longman Grammar of Spoken and Written English
  from University Language
Discuss theoretical implications:
  The unreliability of intuitions
  The centrality of register for descriptions of language use and language teaching
  Grammatical complexity in conversation versus academic writing

What is corpus linguistics?

A research approach for describing language use:
  How do speakers and writers actually use the vocabulary and grammar resources available in a language?

What is a corpus?

A large, principled collection of ‘natural’ texts stored on computer
A corpus should ‘represent’ particular language varieties or registers (e.g., conversation or university textbooks)
  Design is important: texts must be sampled from particular target registers
  Size is equally important: some language features are rare but still have systematic patterns of use

Characteristics of corpus-based analysis (I)

Relies on computer-assisted techniques
  Concordancers (‘KWIC’ displays = ‘Key Word In Context’)
  Computer programs
    Automatic (e.g., grammatical ‘taggers’)
    Interactive (to code grammatical variants)

Characteristics of corpus-based analysis (II)

Analyses are empirical
Uses both quantitative and qualitative / interpretive techniques
Meaningful analyses must be motivated by linguistic research questions (not simply by the availability of a corpus)

New perspectives offered by corpus linguistic research

Word use: collocations, semantic prosody, function words in context
Register distribution of linguistic features and variants (words and grammatical structures): most features have a skewed distribution across registers
Frequency of linguistic features and variants: rare or common
Lexico-grammatical patterns: how a grammatical feature is associated with a particular set of words
Discourse factors influencing the choice among grammatical variants

Case studies

Word use: pronouns this versus that
Grammatical features: common verbs in conversation, verb aspect
Grammatical complexity in conversation versus academic writing
Lexico-grammar: verbs controlling that-clauses versus to-clauses

Composition of the Longman Spoken and Written English (LSWE) Corpus

Register            # of “texts”    # of words
Conversation BrE           3,436     3,929,500
Conversation AmE             329     2,480,800
Fiction                      139     4,980,000
News BrE                  20,395     5,432,800
News AmE                  11,602     5,246,500
Academic prose               408     5,331,800

Composition of the T2K-SWAL Corpus

Register                  # of text files    # of words
Spoken:
  Class sessions                      176     1,248,800
  Classroom management               (40)        39,300
  Labs/In-class groups                 17        88,200
  Office hours                         11        50,400
  Study groups                         25       141,100
  Service encounters                   22        97,700
  Total speech:                 251 (+40)     1,665,500
Written:
  Textbooks                            87       760,600
  Course packs                         27       107,200
  Course management                    21        52,400
  Institutional writing                37       151,500
  Total writing:                      172     1,071,700
TOTAL CORPUS:                         423     2,737,200

Word use: demonstrative pronouns

Many simple analyses can be carried out using only concordancing software

KWIC Screen from MonoConc
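A KWIC display of the kind a concordancer produces can be approximated in a few lines. The sketch below is only an illustration, not how MonoConc works internally: the function name `kwic`, the `width` parameter, and the bracketed display format are assumptions made here, and the sample text reuses the conversation examples quoted in this talk.

```python
import re


def kwic(text, keyword, width=30):
    """Return one KWIC line per whole-word, case-insensitive hit of `keyword`."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so every hit sits in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Sample text: conversation examples quoted in this talk.
sample = (
    "A: We see those cactus a lot. B: Yeah, that's nice. "
    "A: I had a good sleep last night. B: Well that's good."
)

if __name__ == "__main__":
    for line in kwic(sample, "that"):
        print(line)
```

Because the keyword is always printed in the same column, the words immediately to its left and right can be scanned (or sorted) to spot recurring patterns.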
Demonstrative pronouns this versus that

The traditional description of the difference:
  This refers to a thing near the speaker
  That refers to something that is not near the speaker
Which is more frequent?
  in conversation
  in academic writing

Demonstrative pronouns that versus this
[Figure: frequency per million words (axis 0 to 12,000) of that and this in Conversation and Academic writing]

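Frequencies are reported per million words so that corpora of different sizes can be compared. The sketch below shows that normalization on a toy sample taken from the conversation examples in this talk. The helper names (`count_form`, `count_words`, `rate_per`) are made up for illustration, and counting the bare word form is a simplification: it does not separate the demonstrative pronoun that from that as a determiner or complementizer, which needs a tagged corpus.

```python
import re


def count_form(text, form):
    """Count whole-word, case-insensitive occurrences of a word form."""
    return len(re.findall(r"\b" + re.escape(form) + r"\b", text, re.IGNORECASE))


def count_words(text):
    """Count orthographic words, keeping contractions such as that's whole."""
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))


def rate_per(count, total_words, basis=1_000_000):
    """Normalize a raw count to a rate per `basis` words of running text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * basis / total_words


# Toy sample only; a rate from 19 words is not a meaningful estimate.
sample = (
    "We see those cactus a lot. Yeah, that's nice. "
    "I had a good sleep last night. Well that's good."
)
hits = count_form(sample, "that")
size = count_words(sample)
rate = rate_per(hits, size)
```

The same `rate_per` call with `basis=1000` gives the per-1,000-word rates used for the university register charts.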
Demonstrative pronouns that versus this (cont.)

Examples of that in conversation for evaluation
  A: We see those cactus a lot.
  B: Yeah, that’s nice.
  A: I had a good sleep last night.
  B: Well that’s good.

Examples of this in academic writing (text deixis)
  GAAP requires that a business use the accrual basis. This means that the accountant records revenues as they are earned…

Grammatical features:
centrality of register
common verbs in conversation,
verb aspect

Corpus analyses that require a ‘tagged’ corpus and custom software

Excerpt from a tagged text
Any ^dti++++
attempt ^nn++++
to ^to++++
interpret ^vbi++++
mass ^nn++++
or ^cc++++
contaminant ^jj+atrb+++
distributions ^nns+nom+++
or ^cc++++
to ^to++++
analyze ^vbi++++
a ^at++++
problem ^nn++++
quantitatively ^rb++++
requires ^vbz++++
estimates ^nns++++
of ^in++++
the ^ati++++
important ^jj+atrb+++
transport ^nn++++
parameters ^nns++++
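Custom software for a tagged corpus starts by reading each line back into a word and its tag. The sketch below assumes, from the excerpt alone, that each line has the form `word ^tag+field+field+field+field` and that empty fields can be dropped; the tagger's actual field definitions are not given in the talk, and the names `Token` and `parse_tagged_line` are invented here.

```python
from typing import NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    tag: str
    features: Tuple[str, ...]


def parse_tagged_line(line):
    """Split a line like 'contaminant ^jj+atrb+++' into word, tag, features."""
    word, sep, code = line.strip().partition(" ^")
    if not sep or not code:
        raise ValueError(f"no tag field in line: {line!r}")
    tag, *rest = code.split("+")
    # Assumption: empty '+' slots carry no information, so they are dropped.
    return Token(word, tag, tuple(f for f in rest if f))


# First eight lines of the tagged excerpt above.
excerpt = """\
Any ^dti++++
attempt ^nn++++
to ^to++++
interpret ^vbi++++
mass ^nn++++
or ^cc++++
contaminant ^jj+atrb+++
distributions ^nns+nom+++
"""
tokens = [parse_tagged_line(line) for line in excerpt.splitlines() if line.strip()]
```

Once the text is a list of `Token` records, counting a grammatical feature becomes a matter of filtering on `tag` or `features`.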
Noticing register differences
Ecology Textbook
Make two lists: Nouns and verbs
Notice the use of modal verbs and pronouns
Wildlife photography represents the
nonconsumptive use of wildlife, which is the
use, without removal or alteration, of natural
resources. For much of this century, the
management of wildlife for the hunter has
been emphasized by wildlife managers. In
recent years, however, management for
nonconsumptive uses such as wildlife
photography and bird-watching has received
more attention.
Classroom Teaching
Make two lists: Nouns and verbs
Notice the use of modal verbs and pronouns
Uh, one of the U.S. Court District Judges, I think it was
W. C., in the CITY U.S. District Court, made a
statement one time that in his opinion, one half of
the lawyers who were, uh presenting cases before
him were incompetent. And he wasn't saying
mentally incompetent, he was just saying they
weren't practicing law with a skill that was
professional. Now, now, I'm not trying to scare you,
you know what I'm trying to do? I'm trying to let
you know that you, you better pay attention to who
your lawyer is, and get someone who has respect.
Course Syllabus
Notice the use of modal verbs and pronouns
In this course you will learn how to develop instructional
software and articulate the issues involved in using the
computer for instruction. This can only be achieved if you
are actively engaged in lesson development. Therefore, most
of your instruction will consist of reading relevant
documentation on the computer and completing assigned
projects on your own. […] While this instructional format is
interesting and rewarding, it requires that you be more
responsible for your own learning than in the lecture-test
format you may be used to. Not everything you need to
know will be told to you. You will need to access available
resources to find answers to your questions and be willing to
ask when you can't find them.
Ecology Textbook, lower division
Nouns are underlined; verbs in bold italics
[Passage repeated from the Ecology Textbook slide; the highlighting is not reproduced here]

Business Classroom Teaching
Nouns are underlined; verbs in bold italics
Notice the dense use of I and you
[Passage repeated from the Classroom Teaching slide; the highlighting is not reproduced here]

Course Syllabus
2nd person pronouns are underlined
[Passage repeated from the Course Syllabus slide; the highlighting is not reproduced here]

Course Syllabus
Prediction modal verbs are underlined
[Passage repeated from the Course Syllabus slide; the highlighting is not reproduced here]

Content word classes across university registers
[Figure: frequency per 1,000 words (axis 0 to 350) of nouns, verbs, adjectives, and adverbs across university registers: textbooks, classroom teaching, office hours, and syllabi, etc.]

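Rates like these can be computed from a tagged text by mapping tags to word classes and normalizing per 1,000 words. The sketch below uses the 21 tokens of the tagged excerpt shown earlier in the talk. The mapping from the first two letters of a tag to a word class (`nn`, `vb`, `jj`, `rb`) is an assumption read off that excerpt, not the tagger's documented scheme, and `content_class_rates` is a name invented here.

```python
from collections import Counter

# Assumed mapping from the first two letters of a tag to a content word class.
CLASS_BY_PREFIX = {"nn": "noun", "vb": "verb", "jj": "adjective", "rb": "adverb"}


def content_class_rates(tagged, basis=1000):
    """Return the rate per `basis` words of each content word class."""
    total = len(tagged)
    if total == 0:
        raise ValueError("no tokens to count")
    counts = Counter()
    for _word, tag in tagged:
        word_class = CLASS_BY_PREFIX.get(tag[:2])
        if word_class is not None:
            counts[word_class] += 1
    return {c: counts[c] * basis / total for c in CLASS_BY_PREFIX.values()}


# (word, tag) pairs from the tagged excerpt shown earlier in the talk.
tagged = [
    ("Any", "dti"), ("attempt", "nn"), ("to", "to"), ("interpret", "vbi"),
    ("mass", "nn"), ("or", "cc"), ("contaminant", "jj"), ("distributions", "nns"),
    ("or", "cc"), ("to", "to"), ("analyze", "vbi"), ("a", "at"),
    ("problem", "nn"), ("quantitatively", "rb"), ("requires", "vbz"),
    ("estimates", "nns"), ("of", "in"), ("the", "ati"), ("important", "jj"),
    ("transport", "nn"), ("parameters", "nns"),
]
rates = content_class_rates(tagged)
```

A single sentence is far too small a sample for a stable rate; the point is only the counting and normalization step, which is applied per register in the chart.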
Nouns and pronouns across university registers
[Figure: frequency per 1,000 words (axis 0 to 350) of nouns and of 1st, 2nd, and 3rd person pronouns across university registers: textbooks, classroom teaching, office hours, and syllabi, etc.]

Modal verb classes across registers
[Figure: frequency per 1,000 words (axis 0 to 18) of possibility, necessity, and prediction modals across university registers: textbooks, classroom teaching, office hours, and syllabi, etc.]

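A rough version of this count can be made on untagged text by looking up each word in a list of modals. The sketch below assumes a three-way grouping of the nine central modals that matches the chart's labels, leaves out semi-modals such as have to, and will miscount homographs (can or will used as nouns, May as a month), which is why a tagged corpus is preferable. All names are invented for illustration, and the sample is two sentences from the course syllabus excerpt.

```python
import re
from collections import Counter

# Assumed grouping of the nine central modals into the chart's three classes.
MODAL_CLASS = {
    "can": "possibility", "could": "possibility",
    "may": "possibility", "might": "possibility",
    "must": "necessity", "should": "necessity",
    "will": "prediction", "would": "prediction", "shall": "prediction",
}
IRREGULAR_NEGATIVES = {"can't": "can", "cannot": "can", "won't": "will", "shan't": "shall"}


def base_modal(token):
    """Map contracted forms (couldn't, she'll) back to a base form."""
    if token in IRREGULAR_NEGATIVES:
        return IRREGULAR_NEGATIVES[token]
    if token.endswith("n't"):
        return token[:-3]
    if token.endswith("'ll"):
        return "will"
    return token


def modal_class_rates(text, basis=1000):
    """Return the rate per `basis` words of each modal class in `text`."""
    normalized = text.lower().replace("\u2019", "'")  # curly to straight apostrophe
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    if not tokens:
        raise ValueError("no words found")
    counts = Counter()
    for token in tokens:
        modal_class = MODAL_CLASS.get(base_modal(token))
        if modal_class is not None:
            counts[modal_class] += 1
    classes = ("possibility", "necessity", "prediction")
    return {c: counts[c] * basis / len(tokens) for c in classes}


syllabus = (
    "Not everything you need to know will be told to you. You will need to "
    "access available resources to find answers to your questions and be "
    "willing to ask when you can't find them."
)
rates = modal_class_rates(syllabus)
```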
Checking your intuitions
Common verbs in conversation
What is the most common lexical verb in conversation?
(excluding the primary verbs be, have, do)

Frequencies in Conversation of the most common lexical verbs
[Figure: frequency per million words (axis 0 to 10,000) of get, go, say, know, think, see, want, come, mean, take, make, give]

Selected uses of GET in conversation
Obtaining something (activity):
  See if they can get some of that beer. (Conv)
  How much are you getting a pay raise for? (Conv)
Moving to or away from something (activity):
  Get in the car. (Conv)
Causing something to move (causative):
  Jessie get your big bum here. (Conv)
  We ought to get these wedding pictures into an album of some sort. (Conv)
Causing something to happen (causative):
  Uh, I got to get Max to sign one, too (Conv)
  It gets people talking again, right. (Conv)
Changing from one state to another (occurrence):
  She's getting ever so grubby looking now. (Conv)
  So I'm getting that way now. (Conv)
Understanding something (mental):
  Do you get it? (Conv)
Get in the perfect aspect with a stative meaning similar to have:
  The Amphibicar - It's got little propellers in the back. (Conv)
  You got your homework done, Jason? (Conv)

Proportion of most common lexical verbs across four registers
[Figure: frequency per million words (axis 0 to 140,000) of the 12 most common lexical verbs versus other lexical verbs in Conversation, Fiction, News, and Academic]


Verb aspect in conversation:

Simple: He works very hard
Progressive (or ‘continuous’): Tom is writing a letter
Perfect: Charlie has gone home

Which verb aspect is most common in conversation?

Progressive aspect verb phrases across registers
(based on LGSWE, Figure 6.4)
Frequency of simple, perfect, and progressive aspect in four registers (based on LGSWE Fig 6.2)
[Figure: frequency per million words (axis 0 to 140,000) of simple, perfect, and progressive aspect in Conversation, Fiction, News, and Academic]

Simple aspect verbs in conversation
B: What do you do at Dudley Allen then?
A: What the school?
B: Yeah. Do you -
A: No I'm, I'm only on the PTA.
B: You're just on the PTA?
A: That's it.
B: You don't actually work?
A: I work at the erm -
B: I know you work at Crown Hills, don't you?
A: Yeah.

Grammatical complexity:
Dependent clauses

Common in academic writing?
Grammatical complexity: Dependent clauses

More common in speech than in academic writing!
  Adverbial clauses are more common in speech
    So she can blame someone else if it doesn't work.
  Complement clauses are more common in speech
    I don’t know how they do it
  Post-nominal clauses are more common in academic writing
    the quantity of waste that falls into this category …
    The results shown in Tables IV and V add to the picture …

Figure 1. Common finite clause types functioning as clausal constituents
[Figure: rate per 1,000 words (axis 0 to 12) in Conversation and Academic Writing for finite adverbial clauses, V+THAT, and V+WH]

Examples from conversation

I just don’t know
  [if that’s
    [what he wants] ]

But I don't think
  [we would want
    [to have it
      [sound like
        [it's coming from us] ] ] ]

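The depth of embedding in bracketed examples like these can be measured mechanically. The sketch below is a minimal bracket counter, assuming that every embedded clause or phrase is marked with a matched pair of square brackets as in the examples above; `max_embedding_depth` is a name invented here, not a measure defined in the talk.

```python
def max_embedding_depth(bracketed):
    """Return the deepest level of [ ... ] nesting in a bracketed example."""
    depth = 0
    deepest = 0
    for char in bracketed:
        if char == "[":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without an opening bracket")
    if depth != 0:
        raise ValueError("opening bracket without a closing bracket")
    return deepest


shallow = "I just don't know [if that's [what he wants] ]"
deep = (
    "But I don't think [we would want [to have it "
    "[sound like [it's coming from us] ] ] ]"
)
depths = (max_embedding_depth(shallow), max_embedding_depth(deep))
```

The same counter works on the bracketed noun phrases from academic writing, where the brackets mark prepositional phrases rather than clauses.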
Conversation
Gayle: And Dorothy said Bob's getting terrible with, with the
smoking. Uh, he's really getting defiant about it because there
are so many restaurants where you can't smoke and he just
gets really mad and won't go to them.
[…]
Peter: Well they, they had a party. I forget what it was. They had it
at a friend's house. I can't remember why it wasn't at their
house any way. And they had bought a bottle of Bailey's because
they knew I liked Bailey's.
[…]
Gayle: I can't remember who it was. One of us kids.
[…]
Peter: Oh. I'll tell you I think the biggest change in me is since I
had my heart surgery.
Gayle: Really? Yeah I guess my, I mean I know my surgery was a
good thing but
Peter: <?> It makes you think. You realize it can happen to you.
Grammatical complexity in academic writing:
Dependent phrases, not clauses

For example:
  Each new level [of system differentiation] opens up space [for further increases [in complexity] ].

Attributive adjectives
  new level, further increases
Nouns as pre-modifiers of a head noun
  system differentiation
Post-nominal prepositional phrases
  level [of system differentiation]
  space [for further increases [in complexity] ]

Figure 2. Common dependent phrasal types functioning as constituents in a noun phrase
[Figure: rate per 1,000 words (axis 0 to 70) in Conversation and Academic Writing for attributive adjectives in NPs, premodifying nouns in NPs, and postmodifying prepositional phrases in NPs]

Academic writing: Noun phrases with phrasal modifiers

This patterning [of behavior] [by households] [on other households] takes time.
Each new level [of system differentiation] opens up space [for further increases [in complexity] ].
This may indeed be part [of the reason [for the statistical link [between schizophrenia and membership [in the lower socioeconomic classes] ] ] ].

Pedagogical implications

Academic writing
  Need to understand phrasal embeddings for academic reading
    meaning relationships are not explicit
  T-units and other embedded clauses are not good measures of writing development
Conversation
  A teaching dilemma:
    Conversational skills are needed at the lowest levels
    BUT
    Dependent clauses, an ‘advanced’ topic, are extremely frequent in conversation

The lexico-grammatical perspective:
Verbs that control that-clauses

Almost 200 verbs are attested in the LSWE Corpus (e.g., feel, realize, hear, assume, suggest, ensure, indicate, imply, propose)
But only 4 verbs are extremely common in conversation:
  think, say, know, guess

Verbs controlling that-clauses in conversation
Selected common lexical bundles with simple present tense verbs in conversation

I don’t know + what, how, if, why, where, who
I don’t think + he/she, I, it’s, you, they

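Lexical bundles are found by counting recurrent word sequences. The sketch below counts three-word sequences in a handful of utterances quoted in this talk and keeps those that recur. The function names and the `min_count` cutoff are assumptions for illustration; a real study would apply a frequency threshold normalized to corpus size, which is not shown here.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bundle_counts(utterances, n=3):
    """Count n-word sequences, never crossing an utterance boundary."""
    counts = Counter()
    for utterance in utterances:
        normalized = utterance.lower().replace("\u2019", "'")
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
        counts.update(ngrams(tokens, n))
    return counts


def recurrent(counts, min_count=2):
    """Keep only the sequences seen at least `min_count` times."""
    return {seq: c for seq, c in counts.items() if c >= min_count}


# Utterances quoted in the conversation examples of this talk.
utterances = [
    "I can't remember why it wasn't at their house any way.",
    "I can't remember who it was.",
    "I don't know how they do it",
]
counts = bundle_counts(utterances, n=3)
bundles = recurrent(counts, min_count=2)
```

With only three utterances the single recurrent sequence is I can't remember; on a full conversation corpus the same procedure surfaces sequences such as I don't know and I don't think.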
Conclusion

What’s clear: Corpus-based research
  Enables the description of complex patterns of use
  Often results in surprising findings, running counter to prior intuitions
What’s not yet clear:
  What are the most effective applications of corpus research findings to classroom teaching?
  We need corpus-based materials and empirical research on their effectiveness!