HawkinsEnglishAustralia2010 - University of California, Davis

Criterial Features in the
Learning of English
John A. Hawkins
University of Cambridge ESOL Examinations and
Research Centre for English and Applied Linguistics
University of California Davis Linguistics Department
Several decades of practical work on
language testing and teaching involving
English and other languages have led to the
six proficiency levels of the Common
European Framework of Reference (CEFR).
Levels:
C2 Mastery
C1 Effective Operational Proficiency
B2 Vantage
B1 Threshold
A2 Waystage
A1 Breakthrough
These different levels have been defined in
functional terms, i.e. in terms of the uses to
which language can be put and the various
functions that learners can perform as they
gradually master a second language (L2).
See e.g. the "can-do" statements of the Council of
Europe 2001 document, pp.244-257.
Learners at B1 “can express opinions on
abstract/cultural matters in a limited way or offer
advice within a known area”.
Learners at B2 “can follow or give a talk on a
familiar topic”.
Learners at C1 “can contribute effectively to
meetings and seminars within own area of work or
keep up a casual conversation with a good deal of
fluency”.
By focusing on functions the CEFR attempts
to define these levels in a way that is
independent of the different grammatical
and lexical details of the languages of
Europe.
A given level of functional proficiency in L2
German can then be compared with a
corresponding level in Spanish, Finnish and
English.
But learners who perform each of these
functions may be using a wide variety of
grammatical constructions and words in
order to do so.
And the ability to do each of the tasks does
not tell us with precision which grammatical
and lexical properties of English (and of
other target languages) the learner actually
knows and uses at each level.
We need this precision and added
specificity, because the grammatical and
lexical details of each language are key
properties that examiners look for when
they assign candidates to a proficiency level
and score,
because teachers need to teach these
properties
and because learners need to learn them.
I.e. it isn’t sufficient to define the CEFR
levels in functional terms only; see Milanovic
(2009).
So in this talk I ask:
1) how much of the grammar and lexicon of
English do learners actually know and/or
produce at each of these CEFR levels?
2) what patterns and principles are there in these
developing second language acquisition stages
of English? and
3) what are the practical benefits, for learning,
teaching, assessment and publishing, of
gathering this information?
If we can answer these questions we
contribute to a major goal of the “English
Profile Programme” (EPP).
The EPP was initiated by the Cambridge ESOL group
of Cambridge Assessment in collaboration with
Cambridge University Press, the Cambridge
Research Centre for English and Applied Linguistics,
the Cambridge Computer Laboratory, and other
stakeholders in 2005.
The EPP aims to provide “reference level
descriptions” and to add grammatical and
lexical details of English to CEFR’s functional
characterization of the different levels.
The way we have chosen to do this is
through the search for “criterial features”.
These are properties of English
(constructions, words, rule types, errors and
their frequencies, etc) that are distinctive
and characteristic of L2 proficiency at the
different levels.
See Hawkins & Filipović (2011), Hawkins & Buttery (2009,
2010).
Criterial features can be found in all areas of
English: syntax, morphology, phonology, the
lexicon, semantics, and discourse. They
enable us to distinguish higher proficiency
levels from lower levels in an efficient way.
In this talk I illustrate some of these features and
show how we can make practical use of them.
An analogy for Criterial Features
Consider the defining characteristics used for
recognising faces in a police identikit. You don’t
need to see all the features of a person’s face in
order to distinguish that person from others, just
the important defining characteristics that capture
essential qualities.
Criterial features are similar: they capture essential
distinguishing properties of the CEFR proficiency
levels.
Another example:
Languages change over time, and when historians of
English examine Old English, Middle English and
Early Modern English, they focus on important
differences between these stages, not on what
stayed the same.
So by the Great Vowel Shift words like [mūs] acquired
their modern pronunciation in Standard English,
mouse, and [mijs] became mice.
In historical linguistics it is the changes that matter
from stage to stage, and that define each stage, not
what was carried forward unchanged from previous
stages.
So too in second language learning. We need to
discover what is characteristic of each level as
learners progress.
If we can identify these criterial differences and
give them to learners, teachers, examiners and
publishers, we can make their tasks more efficient
and more focussed.
Learning and teaching can be better calibrated to
the target levels;
assessment and the diagnosis of levels and scores
can become more accurate;
and teaching materials and syllabus design can
focus on what is distinctive and needs to be
learned at each level.
Electronic corpora of learner English make it
possible for us to discover criterial features.
The collaborative work I report on is based on the
Cambridge Learner Corpus (CLC), a corpus of 40
million words of examination scripts at A2 - C2
levels from learners of English around the world
who speak many different first languages.
The CLC gives us empirical evidence for
developmental stages in the learning of new
constructions, words and word meanings.
It gives us quantitative data on learner errors in
syntax, morpho-syntax and lexical choice.
More generally, this research programme is centered
on the notion of criteriality and on criterial
differences between levels.
I.e. we are trying to capture key changes from level
to level in what has been learned and in what can be
produced, based NOT on what we hope learners have
learned and on what they have been taught, but
based on an empirical investigation of what they
actually do, in the CLC.
This corpus has been tagged for parts of
speech and parsed using a sophisticated
automatic parser (Briscoe et al. 2006) permitting
numerous grammatical and lexical searches
to be conducted. Between a third and one
half has been error-coded using codes
devised by researchers at CUP.
Sample Error Codes in the CLC
RN   Replace noun               Have a good travel (journey)
RV   Replace verb               I existed last weekend in London (spent)
MD   Missing determiner         I spoke to President (the); I have car (a)
AGV  Verb agreement error       The three birds is singing (are)
IV   Incorrect verb inflection  I spended last week in London (spent)
FJ   Wrong adjective form       The situation got worst (worse)
UQ   Unnecessary quantifier     A little bit quite common (quite common)
DY   Derivation of adverb       It happened fastly (fast)
Briscoe’s RASP (Robust Accurate Statistical
Parser)
• identifies parts of speech (PoS) by probabilistic
tagging
• generates a parse forest representation containing
all possible subanalyses with associated
probabilities
• yields weighted Grammatical Relations from the
n-best parses of the input.
There are different types of criterial features.
Here I focus on just two:
Positive Linguistic Features
These refer to positive, i.e. correct, linguistic
properties of English that have been acquired at a
certain L2 level and that generally persist at all
higher levels. A property P (e.g. a new construction
type) acquired at B2 may differentiate that level and
higher levels from [A1, A2, B1] and will be criterial
for the former. Or P may be acquired at C2 and
differentiate this level from all lower levels.
Negative Linguistic Features
These are incorrect properties of English, or errors,
that occur at a certain level or levels and with a
characteristic frequency. Both the presence versus
absence of the errors, and especially their
frequency (the "error bandwidth"), can be criterial
for the level(s).
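One way to make the notion of a positive criterial feature concrete is to record the level at which a property is first attested and to treat it as criterial for that level and every higher one. The sketch below is illustrative only: the names `CEFR_LEVELS`, `criterial_levels` and `excluded_levels` are hypothetical and are not part of any English Profile tooling.

```python
# Illustrative sketch; all names here are hypothetical.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # lowest to highest


def criterial_levels(first_attested):
    """Levels for which a positive feature first attested at `first_attested` is criterial."""
    start = CEFR_LEVELS.index(first_attested)
    return CEFR_LEVELS[start:]


def excluded_levels(first_attested):
    """Lower levels that the feature distinguishes the criterial levels from."""
    start = CEFR_LEVELS.index(first_attested)
    return CEFR_LEVELS[:start]


print(criterial_levels("B2"))  # ['B2', 'C1', 'C2']
print(excluded_levels("B2"))   # ['A1', 'A2', 'B1']
```

A property acquired at C2 would, on this representation, be criterial for C2 alone and distinguish it from all five lower levels.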
Examples of Positive Criterial Features
The A levels (A1 and A2)
Simple intransitive clauses (NP-V) and the
slightly more complex transitive (NP-V-NP)
sentence types are present from the
beginning:
He went. (NP-V) A1
He loved her. (NP-V-NP) A1
Modal auxiliary verbs like may, might, can
and must appear first at A1 or A2, but only
in some of their senses.
Can is first attested in the PERMISSION sense at A1
and in the POSSIBILITY sense at A2:
And if you want, you can bring pencils or pens. (PERMISSION) A1
In this magazine you can see all the new C.D.[s] and all the
dates of the concerts. (POSSIBILITY) A2
Noun Phrase sequences of Pronoun plus
Infinitive are found at A2:
something to eat A2
nothing to do A2
as are postnominal modifiers with participial –ed:
beautiful paintings [painted by famous Iranian painters] A2
Lexical verbs appearing at the A levels are
typically among the most basic and frequent
verbs of English; they appear first in their
most basic and frequent senses.
Verbs attested at A1 include:
catch, eat, give, put, take and walk
New lexical verbs appearing at A2 include:
break, cut, hit, push, stand, and fall
again typically in their most basic and literal senses.
For break this includes its primary physical sense:
I broke a beautiful glass. A2
for cut it includes the following example in its
primary sense:
First I cut the cake with my mother. A2
The B Levels (B1 and B2)
The new features at B1 involve more complex
syntax, e.g. an “Object Control” structure such as:
I ordered him [to gather my men to the hall] B1
him is both the object of ordered and the logical
subject of gather here.
This is a criterial construction for B1 and higher
levels which distinguishes them from the A levels.
Structures like the following with finite or
non-finite subordinate clauses and
movement of the WH-word (how, where, etc)
to the front of its clause are also first
attested at B1:
I don’t know [how I could have done it] B1
I did not know [where to look for it] B1
And postnominal modifiers in participial –ing
become productive at B1:
I received your mail [asking for the sales report] B1
Structures with a finite subordinate clause
positioned to the right of predicates like is
true and seems with a subject it are also
criterial for B1 and higher levels:
It’s true [that I don’t need a ring to make me remember you] B1
i.e. so-called “Extraposition” structures.
A large number of new lexical verbs appear
for the first time at B1 including:
divide, fit, grab, spill, stick and tear
And the meanings of the verbs that appeared first
at A1 and A2 begin to expand from their basic
senses.
break appears for the first time in the extended
sense of INTERRUPT at B1:
At last I managed to break the routine of the city … B1
Constructions that are criterial for B2 and
higher levels include “secondary predications”:
go and paint the houses yellow and blue B2
with yellow and blue predicated of the direct object
houses.
“Extraposition” structures with a non-finite
subordinate clause positioned to the right of
its predicate are also B2 features:
It would be helpful [to work in your group as well] B2
And so-called “Pseudocleft” structures with
an initial what functioning as subject of its
verb:
What fascinated me was [that I was able to lie on the sea surface] B2
“Subject-to-Subject Raising” constructions
appear first at B2 with most of the higher
verbs and adjectives that trigger this rule,
for example prove:
The car has proved [to be one of the most important
inventions of our century] B2
Similar examples are found at B2 with other raising verbs and
adjectives (The car happened to be …, The car appeared to be
…, The car turned out to be …, The car is likely to be …, etc)
New lexical verbs at B2 include:
acquire, capture, drag, rush, spread, swallow
and new meanings and uses are attested for the
verbs that appeared earlier.
For break, first attested at A2, these include new
collocations such as break a promise or break the
law.
For cut, also an A2 verb, they include new meanings
at B2 such as REDUCE in cut the cost.
The C Levels (C1 and C2)
“Subject-to-Object Raising” constructions with the
verb believe appear first at C1 and are criterial for C
levels:
I believe her [to be this country’s best representative] C1
Passivized Subject-to-Object Raising
constructions such as the following with
assumed are also criterial for C1:
the low cost of membership and entry was assumed to be an
advantage. C1
Sequences of two prenominal –s genitives are
found at C1:
in the bride’s family’s house C1
Structurally:
in [[[the bride’s] family’s] house]
New lexical verbs appearing first at C1
include:
accumulate, boast, quote, reassure, shape and
stain
along with new meaning possibilities for the verbs
already introduced. E.g. break appears first in the
idiomatic sense of break the bank at C1.
New features appearing at C2 include less
common Subject-to-Object Raising
constructions with higher predicates such as
presume, declare and remember:
He presumed work [to be the way to live] C2
New lexical verbs at C2 include:
stagger, sway, limp, saunter, raid, squander
New meanings for break at C2 include original
figurative senses such as the attested break the
wall that surrounds him.
Negative Linguistic Features
One major distinguishing feature of the C levels can
be seen in the low frequencies for “negative
features” or error types such as those illustrated
above.
There are significant improvements in ALL of the
syntactic and morpho-syntactic error types at the C
levels.
At the B levels, by contrast, improvements are
relatively modest, and for many error types the
scores actually get worse, especially at B2, before
they get better again at C1.
The error codes defined and assigned by
our CUP colleagues involve morpho-syntactic errors of inflection, derivation and
grammatical form, and syntactic errors of
omission, positioning and co-occurrence. It
is clear that learners at the C levels are
increasingly mastering these rules of
English, whereas B-level learners are not.
The following 50 grammatical or lexical error
types show significant improvements at C1 or
C2 compared with the immediately preceding
level (see Hawkins & Filipović 2010):
AG (Agreement), AGA (Anaphor Agreement), AGD (Determiner
Agreement), AGN (Noun Agreement), AGV (Verb Agreement), AS
(Argument Structure), CD (Countability of Determiner), CE
(Complex Error), CN (Countability of Noun), CQ (Countability of
Quantifier), DA (Derivation of Anaphor), DC (Derivation of
Conjunction), DD (Derivation of Determiner), DI (Inflection of
Determiners), DJ (Derivation of Adjective), DQ (Derivation of
Quantifier), DT (Derivation of Preposition), DY (Derivation of
Adverb), FA (Form of Anaphor), FD (Form of Determiner), FJ (Form
of Adjective), FN (Form of Noun), FQ (Form of Quantifier),
FV (Form of Verb), FY (Form of Adverb), FFN (False Friend
Noun), FFV (False Friend Verb), FFY (False Friend Adverb), IA
(Inflection of Anaphor), IJ (Inflection of Adjective), IQ
(Inflection of Quantifier), IY (Inflection of Adverb), MC
(Missing Conjunction), MD (Missing Determiner), MJ (Missing
Adjective), MN (Missing Noun), MQ (Missing Quantifier), MT
(Missing Preposition), MV (Missing Verb), MY (Missing
Adverb), RQ (Replace Quantifier), RV (Replace Verb), RY
(Replace Adverb), UC (Unnecessary Conjunction), UD
(Unnecessary Determiner), UN (Unnecessary Noun), UQ
(Unnecessary Quantifier), UV (Unnecessary Verb), X (Negative
Formation).
By contrast, just 14 error types improve at
B1 or B2 relative to the immediately
preceding level:
AGA (Anaphor Agreement), DC (Derivation of Conjunction),
FD (Form of Determiner), FFN (False Friend Noun), FFV (False
Friend Verb), DI (Inflection of Determiners), IQ (Inflection of
Quantifier), IV (Inflection of Verb), MC (Missing Conjunction),
MJ (Missing Adjective), MQ (Missing Quantifier), MY (Missing
Adverb), RQ (Replace Quantifier), UQ (Unnecessary Quantifier).
Most strikingly, all but one of the error
types whose scores improve significantly at
B1 or B2 also improve significantly at C1 or
C2, whereas the converse fails.
A full 37 of the error types that improve
significantly at C1 or C2 do not improve
significantly at B1 or B2 (compare the two
lists above).
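The comparison of the two lists can be checked mechanically with set operations. The sketch below uses the error codes exactly as they are reproduced in this transcript; the variable names are illustrative. Note that the C-level list as reproduced here contains 49 codes rather than the stated 50, so the last count printed is 36 rather than 37.

```python
# Error codes copied from the two lists above (variable names are illustrative).
improve_at_c = set("""
AG AGA AGD AGN AGV AS CD CE CN CQ DA DC DD DI DJ DQ DT DY
FA FD FJ FN FQ FV FY FFN FFV FFY IA IJ IQ IY MC MD MJ MN MQ MT MV MY
RQ RV RY UC UD UN UQ UV X
""".split())
improve_at_b = set("AGA DC FD FFN FFV DI IQ IV MC MJ MQ MY RQ UQ".split())

shared = improve_at_b & improve_at_c   # improve at both the B and the C levels
b_only = improve_at_b - improve_at_c   # improve at B1/B2 but not at C1/C2
c_only = improve_at_c - improve_at_b   # improve at C1/C2 but not at B1/B2

print(sorted(b_only))  # ['IV']
print(len(shared))     # 13
print(len(c_only))     # 36 with the 49 codes transcribed here
```

The single B-only code is IV (Inflection of Verb), which is the "all but one" exception mentioned above.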
We must now ask: WHY do we see these
patterns in the data and why do we see the
criterial features changing the way they do
at the different levels? In particular, WHAT
is it about the features of the higher
proficiency levels that makes them late
acquired rather than early?
It cannot simply be that learners are
imitating the words and constructions they
are explicitly taught in their textbooks.
First, because there are many different textbooks
and teaching methods around the world.
Second, because learners learn more than they
are explicitly taught, from their reading materials,
papers, magazines, movies, TV, conversations, and
so on.
I.e. second language learning shares many
similarities with first language learning, though
obviously not all.
For example, more frequently occurring
words and constructions are learned before
less frequent ones,
and simpler words, constructions and
meanings are learned before more complex
ones,
in both first and second language
acquisition.
E.g. learning English nouns and verbs with high
frequencies of use is easier than learning those
with lower frequencies, because they are
encountered more frequently (greater exposure);
frequent lexical items are overrepresented at first
in L2 English, moving gradually to L1 English
norms (see Hawkins & Buttery 2009, Hawkins & Filipović 2011)
More frequent construction types
(“subcategorization” frames or “verb co-occurrences”)
are acquired earlier than less frequent ones, in
general (see C. Williams 2007, Hawkins & Filipović 2011).
The constructions of English that are
learned earliest are those that occur most
frequently in the input, as reflected in e.g.
the British National Corpus (BNC). This could be
established by comparing the CLC with the
BNC; see Williams (2007).
In fact, the new constructions that are
criterial for A2, B1 and B2, in Williams’ data,
appear to be learned in direct proportion to
their frequency in the input, as reflected in
the BNC. The more exposure, the earlier the
acquisition and the easier the learning.
This is shown in Tables 1 and 2.
Table 1 Frequencies for Verb Co-occurrence
Frames in English Corpora (including BNC)
Average Token Frequencies in the BNC for
the new Verb Co-occurrence Frames
appearing at the learner levels
A2          1,041,634
B1             38,174
B2/C1/C2       27,615

Table 2 Frequency Ranking
Average Frequency Ranking in the BNC for
the new Verb Co-occurrence Frames
appearing at the learner levels
A2           8.2
B1          38.6
B2/C1/C2    55.6
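The pattern in Tables 1 and 2 can be stated as a simple check: average token frequency should fall, and average frequency rank should rise, as the level of first appearance rises. The sketch below copies the figures from the two tables; the helper names are illustrative, and it assumes that a lower rank number means a more frequent frame.

```python
# Figures copied from Tables 1 and 2; helper names are illustrative.
levels = ["A2", "B1", "B2/C1/C2"]
avg_token_frequency = [1_041_634, 38_174, 27_615]
avg_frequency_rank = [8.2, 38.6, 55.6]  # assumption: rank 1 = most frequent frame


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


print(strictly_decreasing(avg_token_frequency))  # True
print(strictly_increasing(avg_frequency_rank))   # True
```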
These kinds of data enable us to set up the
following principle of second language
learning (for which there are also well-attested
parallels in first language learning, see e.g.
Tomasello 2003, Diessel 2004, MacWhinney 2005):
Maximize Frequently Occurring Properties (MaF)
Properties of the L2 are learned in proportion to
their frequency of occurrence (as measured, for
example, in the BNC): more frequent exposure of a
property to the learner facilitates its learning and
reduces learning effort.
I.e. more frequent properties will result in earlier L2
acquisition, more of the relevant properties learned, and
fewer errors, in general. Infrequency makes learning more
effortful, with precise predictions depending on other factors.
Another principle of second language learning
(shared with first language learning) involves the
relative simplicity or complexity of structures and
meanings. The criterial grammatical features of
earlier levels are, in general, simpler than those of
later levels.
Also, in phonology simpler consonants and
consonantal distinctions are acquired earlier
than more complex ones (see e.g. Eckman
1984).
Simpler and more basic meanings for verbs
are acquired earlier than more complex and
derived extensions in meaning, figurative
uses, etc.
The verb break:
break in its basic physical sense A2
break in the sense of INTERRUPT (break the routine) B1
break an agreement, promise, etc. B2
break the bank (idiomatic) C1
break the wall that surrounds him (original figurative) C2
Maximize Structurally and Semantically Simple
Properties (MaS)
Properties of the L2 are learned in proportion to
their structural and semantic simplicity: simplicity
means there are fewer properties to be learned and
less learning effort is required.
I.e. simpler properties will result in earlier L2 acquisition,
more of the relevant properties learned, and fewer errors.
Complexity makes learning more effortful, in general, since
there are more properties to be learned, with precise
predictions depending on other factors.
In second language learning we also see
“transfer” effects from the first language,
either positive (when the transfer results in a
correct L2 property) or negative (when it
results in an error).
This is one thing that differentiates second from
first language acquisition.
E.g. speakers of languages with definite and
indefinite articles find it easier to acquire
the article system of English than do
speakers of languages without articles (see
Hawkins & Buttery 2009, 2010)
Errors involving missing definite and
indefinite articles in the L2 English of the
CLC are consistently low when the L1s also
have articles.
Recall the MD (Missing determiner) error code:
I spoke to President (the)
I have car (a)
Table 3 (next slide) shows missing
determiner error rates for “the” and “a” at all
proficiency levels for French, German and
Spanish as first languages. All three
languages have an article system. (Data from
Hawkins & Buttery 2009)
The figures indicate the percentage of errors with respect to
the total number of correct uses. For instance a percentage of
10.0% would indicate that a determiner was omitted 1 in every
10 times that it should have appeared.
We see generally low error rates for these languages, without
significant deviation between levels.
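The worked example above (10.0% meaning one omission in every ten places where the determiner should have appeared) corresponds to dividing omissions by obligatory contexts, i.e. correct uses plus omissions. The sketch below follows that reading, which is an assumption; the function name and the counts are hypothetical. Dividing by correct uses alone would give about 11.1% for the same counts.

```python
def omission_rate(omitted, correct):
    """Percentage of obligatory contexts in which the determiner was left out.

    Assumption: the denominator is omissions plus correct uses, which is
    the reading that matches the "1 in every 10 times" example.
    """
    obligatory_contexts = omitted + correct
    if obligatory_contexts == 0:
        raise ValueError("no obligatory contexts observed")
    return 100.0 * omitted / obligatory_contexts


# Hypothetical counts: 1 omission and 9 correct uses, i.e. omitted 1 time in 10.
print(omission_rate(omitted=1, correct=9))  # 10.0
```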
Table 3 Missing Determiner Error Rates for L1s with
Articles

Missing “the”
          A2     B1     B2     C1     C2
French    4.76   4.67   5.01   3.11   2.13
German    0.00   2.56   4.11   3.11   1.60
Spanish   3.37   3.62   4.76   3.22   2.21

Missing “a”
          A2     B1     B2     C1     C2
French    6.60   4.79   6.56   4.76   3.41
German    0.89   2.90   3.83   3.62   2.02
Spanish   4.52   4.28   7.91   5.16   3.58
Table 4 (next slide) shows missing
determiner error rates for “the” and “a” at all
levels for Turkish, Japanese, Korean,
Russian and Chinese as first languages.
These languages do not have an article
system.
There is a general linear improvement, i.e. a decline, in error
rates across the levels with increasing proficiency (shown
from left to right).
Chinese shows an interesting inverted U-shaped progression,
especially in the case of missing “a”.
Table 4 Missing Determiner Error Rates for L1s
without Articles

Missing “the”
           A2      B1      B2      C1      C2
Turkish    22.06   20.75   21.32   14.44    7.56
Japanese   27.66   25.91   18.72   13.80    9.32
Korean     22.58   23.83   18.13   17.48   10.38
Russian    14.63   22.73   18.45   14.62    9.57
Chinese    12.41    9.15    9.62   12.91    4.78

Missing “a”
           A2      B1      B2      C1      C2
Turkish    24.29   27.63   32.48   23.89   11.86
Japanese   35.09   34.80   24.26   27.41   15.56
Korean     35.29   42.33   30.65   32.56   22.23
Russian    21.71   30.17   26.37   20.82   12.69
Chinese     4.09    9.20   20.69   26.78    9.79
One of the learning principles proposed in Hawkins &
Filipović (2011) to account for these data is
Maximize Positive Transfer:
Maximize Positive Transfer (MaPT)
Properties of the L1 which are also present in the
L2 are learned more easily and with less learning
effort, and are readily transferred, on account of
pre-existing knowledge in L1.
Shared L1/L2 properties should result, in general, in earlier L2
acquisition, in more of the relevant properties being learned,
and in fewer errors, unless these shared properties involve e.g.
high complexity and are impacted by other factors. Dissimilar
L1/L2 properties will be harder to learn by virtue of the
additional learning that is required, again in general.
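The transfer effect can be illustrated directly from the missing “the” figures in Tables 3 and 4. The sketch below assumes that each L1's values run in level order from A2 to C2 (an assumption about the layout of the flattened tables) and checks, level by level, whether the lowest error rate among the L1s without articles still exceeds the highest rate among the L1s with articles. All variable names are illustrative.

```python
LEVELS = ["A2", "B1", "B2", "C1", "C2"]

# Missing "the" error rates (%) as read from Tables 3 and 4, A2 through C2.
with_articles = {
    "French":  [4.76, 4.67, 5.01, 3.11, 2.13],
    "German":  [0.00, 2.56, 4.11, 3.11, 1.60],
    "Spanish": [3.37, 3.62, 4.76, 3.22, 2.21],
}
without_articles = {
    "Turkish":  [22.06, 20.75, 21.32, 14.44, 7.56],
    "Japanese": [27.66, 25.91, 18.72, 13.80, 9.32],
    "Korean":   [22.58, 23.83, 18.13, 17.48, 10.38],
    "Russian":  [14.63, 22.73, 18.45, 14.62, 9.57],
    "Chinese":  [12.41, 9.15, 9.62, 12.91, 4.78],
}

gaps = []
for i, level in enumerate(LEVELS):
    highest_with = max(rates[i] for rates in with_articles.values())
    lowest_without = min(rates[i] for rates in without_articles.values())
    gaps.append(lowest_without > highest_with)
    print(level, highest_with, lowest_without)

print(all(gaps))  # True: the two groups do not overlap at any level
```

The same check does not hold for missing “a” at A2, where the Chinese rate (4.09) falls inside the range of the article languages; this is the low starting point of the inverted U-shaped progression noted above.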
More generally, Hawkins & Filipović (2010)
provide a multi-factor model of learning,
supported and informed by data in the CLC,
and comprising a set of interacting principles
such as those illustrated.
The model is a type of “complex adaptive
system” (see Gell-Mann 1992) and is called the
“CASP” model, short for “complex adaptive
system principles of SLA”.
The principles interact, sometimes reinforcing each
other (e.g. early acquired frequent items are also
often simple), sometimes competing to produce
variable outputs and alternative interlanguages.
Some of the principles are more general, others more
specific. Two of the more general principles are:
Minimize Learning Effort and Minimize Processing
Effort.
Minimize Learning Effort (MiL)
Learners of a second language (L2) prefer to
minimize learning effort when they learn the
grammatical and lexical properties of the L2.
Learning effort is minimized when shared properties of the L1
can be transferred directly into the L2 (MaPT), when properties
of the L2 are frequently occurring in the L2 input (MaF), when
structural and semantic properties of the L2 are simple rather
than complex (MaS), and when there are fewer linguistic items
to be learned in a given grammatical or lexical domain.


Minimize Processing Effort (MiP)
Learners of a second language (L2) prefer to
minimize processing effort when they use the
grammatical and lexical properties of the L2, just
as native speakers do.
E.g. even when more complex properties have been learned
at an acquisition stage, L2 learners will still prefer to use
simpler properties, just like native speakers do.
Some Practical Applications of this Research
Once criterial features and transfer effects
have been identified at the different
proficiency levels they can be put to use for
learning, teaching and assessment
purposes.
Remember that the CLC gives us empirical evidence
for developmental stages in the learning of new
constructions, words and word meanings, and
quantitative data on errors.
This project analyses what learners have actually
learned and what they produce, i.e. it looks at the
data that examiners must assess.
NB! Our criterial features are taken only from
candidates who scored passing grades at each level.
Hence learners who are studying for the relevant
level can now be told explicitly what their successful
peers have mastered, and teachers can incorporate
these features in their teaching and materials,
thereby optimizing the learners’ chances of success.
For Learning and Teaching
Teaching materials and methods can now be
calibrated to the criterial features of each
level, making learning more efficient.
Grammatical and lexical properties of
English can be presented to learners in
ways that are level-appropriate.
Learners can be encouraged to focus on
both positive and negative features of the
target level(s), thereby optimizing their
chances of success.
Learners striving for B1 can be introduced to Object
Control structures that are first attested at B1 like
I ordered him [to gather my men to the hall] B1
and to subordinate clauses with WH-movement:
I don’t know [how I could have done it] B1
I did not know [where to look for it] B1
They can be introduced to the lexical verbs that
successful candidates in B1 exams know, e.g.
divide, fit, grab, spill, stick, tear
and to the expanding meanings of verbs learned
earlier:
e.g. break appears for the first time in the
extended sense of INTERRUPT at B1:
At last I managed to break the routine of the city … B1
Teachers wanting to help learners attain B1
can focus on the error types that improve
significantly from A2 to B1, e.g. errors with
quantifier words:
MQ  Missing quantifier        I’ll call in the next days (next few days)
UQ  Unnecessary quantifier    a little bit quite common (quite common)
RQ  Replace quantifier        There were people of any age (all ages)
IQ  Inflection of quantifier  I have been learning it for fours years (four years)
Learners striving for B2 can be introduced
to the lexical verbs and meanings produced
at that level (see slides above)
and to the constructions that are first
attested at B2, e.g. “secondary predications”
go and paint the houses yellow and blue B2
and “Extraposition” structures with a non-finite
subordinate clause:
It would be helpful [to work in your group as well] B2
Most Subject-to-Subject Raising constructions are
B2:
The car has proved to be one of the most
important inventions of our century. B2
Teachers can introduce new lexical verbs and their
extended meaning possibilities to learners
preparing for C1, e.g.:
accumulate, boast, quote, reassure, shape, stain
E.g. break appears first in the idiomatic sense of break the
bank at C1.
Different ‘raising’ structures characteristic of C1
need to be mastered at this level such as:
I believe her to be this country’s best representative C1
the low cost of membership and entry was assumed to be an
advantage. C1
And most syntactic and morpho-syntactic error
types need to show significant improvements at C1
from B2, and again at C2 from C1.
Explicit exercises and grammar points can now
target precisely those features that are criterial for
the different levels, making the teaching of
grammar efficient and level-appropriate.
Both grammar and lexicon can be introduced in a sequence
that reflects their frequency in the input and inherent
complexity, as revealed through native-speaker corpora like
the BNC and also through the CLC.
Written and spoken materials can also be selected
that encourage implicit learning of the syntactic and
morpho-syntactic structures and rules that are
criterial for the different levels.
Study guides and tips can be written for learners
preparing for exams that incorporate the criterial
features of each level.
More generally this research gives added
specificity to the functional descriptors of
CEFR.
It describes what learners can and can’t do
grammatically and lexically at each level.
Some of these new grammatical and lexical items
actually enable users to express the functions in
question.
Recall that learners at B1 “can express opinions on
abstract/cultural matters … or offer advice within a
known area” (Council of Europe 2001: 244-257)
Some of the new grammatical structures at this level
are particularly well-suited for this, e.g. I advise you
to go to the doctor.


Other grammatical and lexical properties are
simply correlated with particular levels and are
characteristic of sentences that express a variety of
functions.
E.g. constructions with broad meanings and usage
possibilities, or improvements in various error
types.
For Assessment
This research provides content that can
help to validate the scores that examiners
of English provide.
The assignment of a level and a grade to a sample of learner
English currently relies on judgments that examiners make
based on their experience and training. Examiners have
learned to assign scores with good inter-examiner
agreement, but there is still a certain amount of intuition that
they bring to the task. Examiners are implicitly rather than
explicitly aware of what to look for in many cases.
This empirical research describes and makes
explicit the properties of English that examiners are
evidently sensitive to, and that underlie their
practical assessments and scores.
I.e. I believe that examiners are implicitly aware of
many of our criterial features, and laying them out
explicitly can be helpful.
Diagnosis and examiner training can become even
more accurate as a result.
An individual script, let us abbreviate it as S, by a
candidate taking an exam at level X can be searched
for the presence versus absence of criterial features
derived from all passing scripts at X, and from
those at the immediately lower level X-1 or at the
immediately higher level X+1.
Script S may contain several constructions and
lexical items that are features of B2 and higher
levels. This establishes that S is at least B2. The
script might contain no uniquely C-level features,
however. These levels are eliminated, therefore,
and B2 is supported. S may even contain a unique
B2 feature. This all supports B2.
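The elimination reasoning above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the English Profile research: the feature inventory, the feature names and the function estimate_level are hypothetical. Only the B1 status of the "advise + object + to-infinitive" structure is taken from these slides; the B2 and C1 entries are placeholders.

```python
# CEFR levels in ascending order.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical inventory: each criterial feature is mapped to the level
# at which it first appears. Only the first entry reflects the slides
# ("I advise you to go to the doctor" at B1); the rest are placeholders.
FIRST_LEVEL = {
    "advise_object_to_infinitive": "B1",
    "placeholder_b2_feature": "B2",
    "placeholder_c1_feature": "C1",
}


def estimate_level(found_features):
    """Return the highest level supported by the features found in a script.

    A feature that first appears at level X shows the script is at least X.
    If no feature of a higher level is present, the higher levels are
    eliminated, so the estimate is the maximum first-appearance level.
    Returns None when no known criterial feature is found.
    """
    ranks = [
        LEVELS.index(FIRST_LEVEL[feature])
        for feature in found_features
        if feature in FIRST_LEVEL
    ]
    if not ranks:
        return None
    return LEVELS[max(ranks)]


# Script S contains a B1 and a B2 feature but no C-level feature.
script_s = {"advise_object_to_infinitive", "placeholder_b2_feature"}
print(estimate_level(script_s))
```

In practice a single feature would rarely settle the matter; a fuller version might weight features by frequency or require several features per level before committing to a judgment.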
Criterial features can also be used in the
preparation of diagnostic grammar tests that
assign students to their appropriate levels of
instruction based on their command of
English grammar.
For Publishing
There are practical benefits for publishing
as well. New publishing materials can now
be written, incorporating the criterial
features of each level.
Publishers can also develop market-specific
ELT materials for different groups of
learners.
For learners whose first languages have no definite
and indefinite articles, English language materials
can be written that encourage explicit and implicit
learning in this area.
The learning stages, transfer effects and
error types characteristic of Spanish learners
of English can be reflected in textbooks and
teaching materials designed specifically for
them.
Similarly for Chinese, Japanese and Russian
learners, etc.
Theoretical interest of this work
The criterial features we are extracting from the
corpus are also of interest to theoreticians studying
language acquisition. They provide a new set of
empirical patterns that can be used to inform
predictive and multi-factor theories of learning,
based on principles such as frequency, complexity
and transfer. See the CASP model of Hawkins &
Filipović (2011).
References
Briscoe, E., J. Carroll and R. Watson (2006) ‘The second release of the
RASP system’. In Proceedings of the COLING/ACL 2006 Interactive
Presentation Sessions, Sydney, Australia.
Council of Europe (2001) Common European Framework of
Reference for Languages: Learning, teaching, assessment. CUP,
Cambridge.
Diessel, H. (2004) The Acquisition of Complex Sentences. CUP,
Cambridge.
Eckman, F.R. (1984) ‘Universals, typologies, and interlanguage’.
In W.E. Rutherford, ed., Language Universals and Second Language
Acquisition, John Benjamins, Amsterdam, 79-105.
Gell-Mann, M. (1992) ‘Complexity and complex adaptive systems’. In
J.A. Hawkins & M. Gell-Mann, eds., The Evolution of Human
Languages, Addison-Wesley, Redwood City, CA.
Hawkins, J.A. (2004) Efficiency and Complexity in Grammars. OUP,
Oxford.
Hawkins, J.A. & P. Buttery (2009) ‘Using learner language from
corpora to profile levels of proficiency: Insights from the English
Profile Programme’. In L. Taylor & C.J. Weir, eds., Language
Testing Matters, Proceedings of the 3rd ALTE Conference 2008,
CUP, Cambridge, 158-175.
Hawkins, J.A. & P. Buttery (2010) ‘Criterial features in learner corpora:
Theory and illustrations’, English Profile Journal 1.
Hawkins, J.A. & L. Filipović (2011) Criterial Features in L2 English:
Specifying the Reference Levels of the Common European
Framework. CUP, Cambridge.
MacWhinney, B. (2005) ‘A unified model of language acquisition’. In
J.F. Kroll & A.M.B. de Groot, eds., Handbook of Bilingualism:
Psycholinguistic Approaches, OUP, Oxford.
Milanovic, M. (2009) ‘Cambridge ESOL and the CEFR’, University of
Cambridge ESOL Examinations Research Notes 37: 2-5.
Tomasello, M. (2003) Constructing a Language: A Usage-based Theory
of Language Acquisition, Harvard University Press, Cambridge,
Mass.
Williams, C.A.M. (2007) ‘A preliminary study into verbal
subcategorisation frame usage in the CLC’, MS, RCEAL, University of
Cambridge.
Acknowledgments
The findings reported here would not have been possible without the
assistance of many collaborators. Special thanks to:
My co-authors Luna Filipović and Paula Buttery (see Hawkins & Filipović 2011,
Hawkins & Buttery 2009, 2010);
Annette Capel of CUP for help with the wordlist searches;
Ted Briscoe of the Cambridge Computer Lab and his colleagues for use of the
RASP parser;
Mike Milanovic & Nick Saville of Cambridge ESOL for theoretical and practical
guidance and financial support (see below);
Mike McCarthy of CUP and Penn State U for advice and input;
Roger Hawkey of Cambridge ESOL for advice and English Profile Programme coordination;
Lu Gram of the Computer Lab for help with error calculations and other searches;
Caroline Williams of Cambridge University for verb subcategorisation data;
and to
CUP’s computational linguists who prepared "The <#S>
Compleat|Complete</#S> Learner Corpus Document" 2006, from which the error
codes and example sentences in slide 18 are taken.
Financial Support
The work reported here was made possible by
generous financial support from Cambridge
Assessment and from Cambridge University Press,
within the context of the Cambridge English Profile
Programme. This support is gratefully
acknowledged here.