Celebrating 60 years of Vocabulary Learning


Dr. Charles Browne
Professor of Applied Linguistics
Meiji Gakuin University, Tokyo
[email protected]
A few current Corpus Projects…
1. Business English Word List for NHK TV Show in Japan
2. EnglishCentral (a HUGE video corpus of authentic English)
3. New General Service List (CEC)
4. New Academic Word List (CEC)
5. TOEIC Vocabulary Study List (using past test materials)
A few of my many online
vocabulary learning projects…
EFL Vocabulary Learning in Japan…
The Negative Effect of “Test English”
• PROBLEM: Students NEED to learn the first 5000 words of English to use English in the real world…
• But entrance exams and high school textbooks force students to memorize hundreds of low-frequency words…
• RESULT? High school students can’t deal with real world English because they don’t know hundreds of the most important high frequency words…
[Figure: word frequency scale running from 1 (the), 2 (of), 3 (and) up to 600,000, with the high frequency words (HFW) at the low end. Intermediate marks: 2,289; 2,566; 4,441; 5,000; 14,641; 23,371; 25,537; 42,024; 84,168. Words marked along the scale: sum, bid, permission, ace, chaos, torment, emigrate, abstain, digress, exasperate.]
When reading or listening to a text, students will
of course not know many words…
What percentage of words
do you think must be known
for them to be able to read
easily?
50% ?
75% ?
85% ?
95% ?
75% Coverage (1000 high frequency words)
[19 missing words]
…another possible problem with _____ _____ is how to
_____ learner _____ although research suggests that
_____ are a very _____ way to learn new words (Leitner,
1972, Mondria, 1994, Nation, 1990, 2001), students may
lose interest if _____ are the _____ _____ of doing _____
_____. There is a _____ _____ in the _____ classroom of
using games with a _____ purpose to increase and
_____ learner _____ (Ersoz, 2000, Uberman 1988,
Wright, Betteridge & Buckby, 1984), as well as lower the
learner _____ _____ (Asher, 1965, 1977, Dulay, Krashen
& Burt, 1982)
85% Coverage (2000 high frequency words)
[13 missing words]
…another possible problem with _____ _____ is how to
_____ learner _____ although research suggests that
_____ are a very efficient way to learn new words
(Leitner, 1972, Mondria, 1994, Nation, 1990, 2001),
students may lose interest if _____ are the _____ method
of doing _____ _____. There is a rich tradition in the
_____ classroom of using games with a communicative
purpose to increase and maintain learner _____ (Ersoz,
2000, Uberman 1988, Wright, Betteridge & Buckby, 1984),
as well as lower the learner _____ _____ (Asher, 1965,
1977, Dulay, Krashen & Burt, 1982)
95% Coverage (5000 high frequency words)
[4 missing words]
…another possible problem with vocabulary _____ is
how to sustain learner motivation although research
suggests that _____ are a very efficient way to learn new
words (Leitner, 1972, Mondria, 1994, Nation, 1990, 2001),
students may lose interest if _____ are the sole method
of doing vocabulary review. There is a rich tradition in
the _____ classroom of using games with a
communicative purpose to increase and maintain learner
motivation (Ersoz, 2000, Uberman 1988, Wright,
Betteridge & Buckby, 1984), as well as lower the learner
affective filter (Asher, 1965, 1977, Dulay, Krashen & Burt,
1982)
Vocabulary Thresholds:
• Below 80%, reading comprehension is
almost impossible
(Hu & Nation, 2001)
• 95% coverage is the point at which learners
can read without the help of dictionaries
(Laufer, 1989)
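The coverage figures on these slides are the share of running words in a text that the learner already knows. A minimal sketch of that calculation follows. The function names `coverage` and `cloze`, the crude tokenizer, and the sample sentence and known-word set are all illustrative assumptions, not part of any NGSL tool.

```python
import re

WORD = re.compile(r"[A-Za-z']+")  # crude tokenizer, assumed for this sketch only


def coverage(text, known_words):
    """Share of running words (tokens) in `text` that appear in `known_words`."""
    tokens = [t.lower() for t in WORD.findall(text)]
    if not tokens:
        return 0.0
    return sum(t in known_words for t in tokens) / len(tokens)


def cloze(text, known_words, blank="_____"):
    """Blank out every unknown word, in the style of the coverage slides."""
    def keep_or_blank(match):
        word = match.group(0)
        return word if word.lower() in known_words else blank
    return WORD.sub(keep_or_blank, text)


# Hypothetical sentence and known-word list (not taken from the talk).
sample = "The learners read the short story easily without a dictionary"
known = {"the", "learners", "read", "short", "story", "easily", "without", "a"}

print(f"coverage: {coverage(sample, known):.0%}")
print(cloze(sample, known))
```

With a real frequency list, `known_words` would be the first 1000, 2000 or 5000 entries, which is how the three versions of the passage above differ.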
Goals of the NGSL Project…
1. to update and greatly expand the size of the corpus used
   (273 million words) compared to the limited corpus
   behind the original GSL (about 2.5 million words), with
   the hope of increasing the generalizability and validity of
   the list
2. to create an NGSL of the most important high-frequency
   words useful for second language learners of English
   which gives the highest possible coverage of English
   texts with the fewest words possible
3. to make an NGSL that is based on a clearer definition of
   what constitutes a word
4. to be a starting point for discussion among interested
   scholars and teachers around the world, with the goal of
   updating and revising the list based on this input (in
   much the same way that West did with the original
   Interim version of the GSL)
Original GSL in a nutshell…
• West’s 1953 GSL was actually a more fully developed version of
  Faucett’s 1936 “Interim Report on Vocabulary Selection” (sponsored by
  the Carnegie Corporation)
• Contributors included many famous linguists such as Thorndike, Horn,
  Maki, Palmer and West
• Based on a 2.5 million word hand collected corpus (later increased to 5
  million words)
• Combined objective (frequency) and subjective (teacher intuition) criteria
• Approximately 2200 words giving about 80% coverage in general texts
• No systematic attempt to define what a word was:
  “no attempt has been made to be rigidly consistent in the method used for displaying the words:
  each word has been treated as a separate problem, and the sole aim has been clearness” (West,
  1953, page viii)
General Service Lists GSL (West, 1953)
http://jbauman.com/aboutgsl.html#1953
Academic Word List AWL (Coxhead 2000)
http://www.victoria.ac.nz/lals/resources/academicwordlist/
Getting AWL/GSL lists w/definitions & sound files…
I made a few GSL/AWL apps
and have made all the content
available for free to teachers and
researchers. Please contact me
if you need any of the following
for the GSL or AWL:
- Word lists
- Parts of speech
- Definitions in easy English
- Definitions in Japanese
- Sound files for pronunciation of words
[email protected]
Original GSL created in 1930s…
2.5m corpus may have had too many agriculture and religion texts?
AGRICULTURE
  plow, mill, spade, cultivator
SEA TRAVEL
  sailor, oar, vessel, merchant
RELIGION
  kingdom, god, devil, mercy, bless, fellowship, preach, sacred,
  worship, holy, pray, heaven, grace, pupil, church, Lord
NOT AS IN USE?
  telegraph, chimney, coal, cottage, gaiety, shilling, headdress,
  saucer, woolen, amongst
Starting Point for NGSL….
Access to Cambridge’s more modern 2 BILLION word corpus
CEC corpora used for preliminary analysis of NGSL
Corpus             Tokens
Newspaper     748,391,436
Academic      260,904,352
Learner        38,219,480
Fiction        37,792,168
Journals       37,478,577
Magazines      37,329,846
Non-Fiction    35,443,408
Radio          28,882,717
Spoken         27,934,806
Documents      19,017,236
TV             11,515,296
Total       1,282,909,322
Problems…
• Newspaper subsection was too large
  and dominated the frequencies
• Newspaper subsection in CEC had too
  much of a bias towards financial terms
• Academic subcorpus of CEC not really
  related to needs of General English for
  2nd language learners

Corpus Development & WYPIIWYGO….
Balancing the NGSL Corpus…
CEC corpora included in final analysis for NGSL
Corpus            Tokens
Learner       38,219,480
Fiction       37,792,168
Journals      37,478,577
Magazines     37,329,846
Non-Fiction   35,443,408
Radio         28,882,717
Spoken        27,934,806
Documents     19,017,236
TV            11,515,296
Total        273,613,534*
*273 million word subsection used is 100x larger than original GSL corpus…
Next steps…
• Removed proper nouns
• Removed numbers, days of the week,
  months of the year, etc.
• Used statistical procedures to combine
  the frequencies from the various subcorpora
  while adjusting for differences in
  their relative sizes
• Had meetings with Paul Nation to review the
  list in relation to other frequency lists and
  add/delete words as deemed appropriate
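The talk does not say which statistical procedure was used, so the following is only one plausible sketch of adjusting for subcorpus size: convert each raw count to a per-million rate within its own subcorpus, then average the rates so that a large subcorpus cannot dominate. The token totals come from the CEC table; the word counts and the function names are hypothetical.

```python
# Token totals for three of the CEC subcorpora listed in the talk.
SUBCORPUS_TOKENS = {
    "Learner": 38_219_480,
    "Fiction": 37_792_168,
    "TV": 11_515_296,
}


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def averaged_rate(raw_counts):
    """Mean per-million rate across subcorpora (each subcorpus weighted equally)."""
    rates = [
        per_million(raw_counts.get(name, 0), size)
        for name, size in SUBCORPUS_TOKENS.items()
    ]
    return sum(rates) / len(rates)


def pooled_rate(raw_counts):
    """Per-million rate if the subcorpora are simply merged (size-weighted)."""
    total = sum(raw_counts.get(name, 0) for name in SUBCORPUS_TOKENS)
    return per_million(total, sum(SUBCORPUS_TOKENS.values()))


# Hypothetical raw counts for one word that is frequent only on TV.
counts = {"Learner": 100, "Fiction": 100, "TV": 5_000}

print(f"averaged: {averaged_rate(counts):.1f} per million")
print(f"pooled:   {pooled_rate(counts):.1f} per million")
```

The two figures differ because pooling lets the larger subcorpora outweigh the small TV subcorpus, which is the kind of imbalance the newspaper subsection caused in the preliminary analysis.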

Input from Paul Nation – Thanks!
Comparing the GSL and NGSL:
Apples and Oranges?
“To be or not to be, that is the question.”
• 10 Tokens
to, to, be, be, or, not, that, is, the, question
• 8 Types
to, be, or, not, that, is, the, question
• 7 Lemmas
to, be, or, not, that, the, question
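The three counts can be reproduced in a few lines. This is a sketch only: the tokenizer is a simple regular expression and the lemma table `LEMMA_OF` is hand-made for this one sentence, both assumptions for illustration.

```python
import re

# Hand-made lemma table, sufficient for this sentence only (an assumption).
LEMMA_OF = {"is": "be"}

sentence = "To be or not to be, that is the question."

tokens = re.findall(r"[a-z]+", sentence.lower())               # every running word
types = list(dict.fromkeys(tokens))                            # distinct word forms
lemmas = list(dict.fromkeys(LEMMA_OF.get(t, t) for t in tokens))  # distinct base forms

print(len(tokens), "tokens:", ", ".join(tokens))
print(len(types), "types: ", ", ".join(types))
print(len(lemmas), "lemmas:", ", ".join(lemmas))
```

The only difference between types and lemmas here is that "is" folds into "be".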
“To be or not to be, that is the question.”
Rank  Word      Tokens  Coverage
1     be             3       30%
2     to             2       20%
3     not            1       10%
3     or             1       10%
3     question       1       10%
3     that           1       10%
3     the            1       10%
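A ranked coverage table of this kind can be derived by counting lemmas and dividing by the total number of tokens. The sketch below assumes the same hand-made lemma table (`LEMMA_OF`) and a simple regular-expression tokenizer; tied words share a rank and are listed alphabetically.

```python
import re
from collections import Counter

LEMMA_OF = {"is": "be"}  # hand-made lemma table, enough for this one sentence

sentence = "To be or not to be, that is the question."
tokens = re.findall(r"[a-z]+", sentence.lower())
counts = Counter(LEMMA_OF.get(t, t) for t in tokens)
total = sum(counts.values())

# Sort by descending count, then alphabetically; tied words share a rank.
rows = []
for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    rank = 1 + sum(1 for other in counts.values() if other > n)
    rows.append((rank, word, n, n / total))

for rank, word, n, share in rows:
    print(f"{rank:<6}{word:<10}{n:>6}{share:>10.0%}")
```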
The assumption in Word Families is that if the
headword is known, so are all derived forms…
ACCEPT
ACCEPTABILITY
ACCEPTABLE
UNACCEPTABLE
ACCEPTANCE
ACCEPTED
ACCEPTING
ACCEPTS
But are they?
Word          BNCf  Difficulty
accept         202      -2.923
acceptable      36      -0.510
unacceptable    12      -0.216
acceptance      27       0.570
THE WORD FAMILY APPROACH (Bauer and Nation, 1993)
Level 1
A different form is a different word. Capitalization is ignored.
Level 2
Regularly inflected words are part of the same family.
Level 3 (10 affixes)
-able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, all with restricted uses
Level 4 (10 affixes)
-al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-, all with restricted uses.
Level 5 (48 affixes)
-age (leakage), -al (arrival), -an (American), -ance (clearance), -ant
(consultant), -ary (revolutionary), -atory (confirmatory), -dom (kingdom;
officialdom), -eer (black marketeer), -en (wooden), -en (widen), -ence
(emergence), -ent (absorbent), -ery (bakery; trickery), -ese (Japanese;
officialese), -esque (picturesque), -ette (usherette; roomette), -hood
(childhood), -i (Israeli), -ian (phonetician; Johnsonian), -ite (Paisleyite; also
chemical meaning), -let (coverlet), -ling (duckling), -ly (leisurely), -most
(topmost), -ory (contradictory), -ship (studentship), -ward (homeward),
-ways (crossways), -wise (endwise; discussion-wise), anti- (anti-inflation),
ante- (anteroom), arch- (archbishop), bi- (biplane), circum-
(circumnavigate), counter- (counter-attack), en- (encage; enslave), ex-
(ex-president), fore- (forename), hyper- (hyperactive), inter- (interweave),
mid- (mid-week), mis- (misfit), neo- (neo-colonialism), post- (post-date),
pro- (pro-British), semi- (semi-automatic), sub- (subclassify; subterranean).
Level 6 (10 affixes)
-able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y
Level 7
Classical roots
However, the GSL is not consistent in defining what
to count as a word.
“no attempt has been made to be rigidly consistent in the method used for
displaying the words: each word has been treated as a separate problem,
and the sole aim has been clearness” (West, 1953, page viii)
To get some consistency, Bauman and Culligan (1995) grouped the
original GSL headwords using Level 4 affixes. Then they ranked the words
according to frequencies from the Brown Corpus.
Subsequently, Nation released a word list with the program Range that
grouped words up to Level 6 affixes, and also included numbers, days of
the week, months of the year, and metric units of measurement.
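How much a word family grows depends on the level chosen as the cut-off, which is why a list grouped at Level 4 (Bauman and Culligan) and one grouped up to Level 6 (the Range list) are hard to compare directly. The sketch below illustrates this with the ACCEPT family. The level assigned to each form is this sketch's own reading of the affix lists, not data taken from either list, and `members_up_to` is a hypothetical helper.

```python
# Lowest Bauer and Nation level at which each form joins the ACCEPT family.
# These assignments are an assumption made for illustration.
ACCEPT_FAMILY = {
    "accept": 1,
    "accepts": 2, "accepted": 2, "accepting": 2,  # regular inflections
    "acceptable": 3, "unacceptable": 3,           # -able, un-
    "acceptability": 4,                           # -ity
    "acceptance": 5,                              # -ance
}


def members_up_to(level, family=ACCEPT_FAMILY):
    """Forms counted as one 'word' when families are built up to `level`."""
    return sorted(word for word, lvl in family.items() if lvl <= level)


for cutoff in (2, 4, 6):
    forms = members_up_to(cutoff)
    print(f"Level {cutoff}: {len(forms)} forms -> {', '.join(forms)}")
```

Under these assumptions, "acceptance" is part of the family at a Level 6 cut-off but a separate word at Level 4.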
NGSL: A Modified Lexeme Approach
• All inflected forms for all parts of speech plus
the plural of the gerund
• Includes both British & American spellings
• Examples
– accept: accepts, accepted, accepting, acceptings
– acceptable: acceptables
– paint: paints, painted, painting, paintings
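The examples above follow a regular pattern that can be sketched in code. This is an illustration only: `regular_forms` is a hypothetical helper, it assumes each headword is already tagged as verb or non-verb, and it ignores spelling changes and irregular forms, which the actual NGSL lists would need to handle.

```python
def regular_forms(headword, is_verb):
    """Naive regular inflections for a headword that needs no spelling change."""
    if is_verb:
        # third person -s, past -ed, gerund -ing, plus the plural of the gerund
        return [headword + "s", headword + "ed", headword + "ing", headword + "ings"]
    return [headword + "s"]


# Headwords from the slide, with an assumed verb / non-verb tag.
for word, is_verb in [("accept", True), ("acceptable", False), ("paint", True)]:
    print(f"{word}: {', '.join(regular_forms(word, is_verb))}")
```

Unlike a word family, "acceptable" stays a separate headword here because derivational affixes such as -able are never stripped.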
Apples and Oranges no longer…
When both lists are lemmatized, the NGSL provides far more coverage
with far fewer words, one of the chief goals of this project…
A Dedicated Website…
www.newgeneralservicelist.org
List downloadable in many forms:
• Headword list
• Lemmatized list
• List with definitions in easy English
• List with raw data (coming soon!)
Now available on free Quizlet Program…
www.quizlet.com
Quizlet both intuitive and fun…
Soon to be available on WordEngine…
www.wordengine.com
New Cambridge Text Series Using NGSL
(both in text and online)
Links to NGSL Resources…
Free Graded Text Editor & Analysis Tool
www.er-central.com/ogte/
Free Text Helper Tool
identifies/gets meanings/gives learning
tools for words out of your level…
Text Helper in Action…
much more to come…
Thank you!
Dr. Charles Browne
Professor of Applied Linguistics
Meiji Gakuin University, Tokyo
[email protected]