ENG 626
CORPUS APPROACHES TO LANGUAGE STUDIES
FL, AWL
Bambang Kaswanti Purwo
http://www.ed2go.com/elt_demo/3tv_demo/L02.htm
Word Levels
a 10-year-old native speaker of English has a vocabulary of
around 10,000 word families
A word family describes the base form of a word
plus its closely related inflected and derived forms.
For example, here's the word family for absent:
absent
absented
absenting
absents
absentee
absentees
absenteeism
absently
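A word family can be treated as a lookup from each member to its base form, so that all members are counted together. The sketch below is only an illustration: the HEADWORD table and the count_families helper are hypothetical names, and the sample sentence is invented.

```python
from collections import Counter

# Word family for "absent", copied from the list above.
ABSENT_FAMILY = [
    "absent", "absented", "absenting", "absents",
    "absentee", "absentees", "absenteeism", "absently",
]

# Hypothetical lookup table: every family member points to its headword.
HEADWORD = {member: "absent" for member in ABSENT_FAMILY}


def count_families(tokens):
    """Count tokens per word family; unlisted words count under themselves."""
    return Counter(HEADWORD.get(t.lower(), t.lower()) for t in tokens)


# Invented sample sentence, not taken from the slides.
sample = "Absentees were absent again and absenteeism rose".split()
family_counts = count_families(sample)
print(family_counts["absent"])  # 3 tokens belong to the family "absent"
```

A real family list (such as the GSL or AWL) would supply many such families; only one is shown here.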
rough estimate:
the vocabulary size of native speakers can be approximated
by adding 1,000 word families for each year of their life
up to the age of about 20
a native speaker of English (a university graduate)
probably knows at least 20,000 words
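The rough estimate above is simple arithmetic, sketched here as a small function. The function name and its parameters are assumptions made for illustration; the figures (1,000 word families per year, levelling off at about age 20) come from the slide.

```python
def estimated_word_families(age_years, per_year=1_000, cap_age=20):
    """Rough estimate: 1,000 word families per year of life, up to about age 20."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return per_year * min(age_years, cap_age)


print(estimated_word_families(10))  # 10000, the 10-year-old mentioned earlier
print(estimated_word_families(35))  # 20000, the estimate stops growing at 20
```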
goals for a learner of English as a second language
[20,000 words – very ambitious]
split up the vocabulary they need to learn into four levels:
high-frequency, low-frequency, academic, and technical
Word Frequency
frequency of a word: how often it occurs in a text
word most frequently used in written English: the
a frequency of around seven in every 100 words of text
= the occurs in almost every line of a written text
[when Paul Nation started studying vocabulary teaching]
he counted a 1,000-word text word by word
to see how often each word occurred
• manually: a whole weekend
• now with computers: less than a second
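To show why a computer needs less than a second, here is a minimal sketch of a word-frequency count. The tokenising pattern (lowercase letters with an optional apostrophe part) is an assumption, the sample text is invented, and real frequency counts may treat hyphens, numbers, and capitals differently.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lowercase word tokens in `text`."""
    # Assumed tokenisation: runs of letters, optionally followed by 's, n't, etc.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Invented sample text, not the text counted in the slides.
sample_text = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample_text)
for word, count in freqs.most_common(3):
    print(word, count)
```

Sorting the counter with most_common gives a frequency list, most frequent word first.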
[original text: 1,906 words long, 532 different word types]
Word          Frequency    Word        Frequency
the           100          wide        1
of            74           will        1
to            58           without     1
and           56           work        1
words         46           working     1
a             41           write       1
in            39           yet         1
vocabulary    38           you         1
is            30           young       1
are           25           yourself    1
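A small number of words covers a lot of the text.
• “running words” or “tokens”: all the words in a text,
including repeated words
• e.g. a sentence of 11 running words in which a and of each occur twice
has 11 tokens but only 9 different word types (if no other word repeats)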
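The difference between tokens and word types, and the idea that a few words cover much of a text, can be sketched as follows. The sentence is invented to match the description (11 running words, with a and of occurring twice); it is not the slide's original example, and coverage_of_top_words is a hypothetical helper.

```python
from collections import Counter


def coverage_of_top_words(tokens, top_n):
    """Share of running words (tokens) covered by the top_n most frequent types."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Invented 11-token sentence in which "a" and "of" each occur twice.
tokens = "a cup of tea and a slice of cake for lunch".split()
types = set(tokens)
print(len(tokens), len(types))                     # 11 tokens, 9 types
print(round(coverage_of_top_words(tokens, 2), 2))  # 0.36: two types cover 4 of 11 tokens
```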
High- and Low-Frequency Words
• a relatively small group of words (around 2,000) is used
much more frequently than other words in the language
• the 2,000 high-frequency words include both function words
and content words.
Function words: articles (a, the), conjunctions (because, but,
although, and), prepositions (in, below, above), determiners
(each, every, this, those), numbers
Content words: nouns, verbs, adjectives, and adverbs
General Service List of English Words (GSL) by Michael West
• 2,000 word families
• lots of useful information about frequency and meanings
• it's been proven to work in graded readers.
graded readers: books specially written in a limited vocabulary
so that they are easy to read for learners of English (e.g. some books
may use 300 words or fewer)
• the rest of the vocabulary is made up of low-frequency words
• most conservative estimate: 120,000 low-frequency English
words (not including proper names)
• low-frequency words are always a problem for language learners
and teachers (it is unpredictable when they will occur in a text)
• teachers need to deal with low- and high-frequency words differently
Academic and Technical Words
• academic vocabulary: additional high-frequency word list
known as the Academic Word List (AWL)
• to be learned after students acquire
the 2,000 high-frequency words
• AWL (developed by Averil Coxhead): 570 word families
(not in the most frequent 2,000 words); for anyone doing
academic study in almost any subject area
• technical vocabulary of particular subject areas
e.g., in computing: mouse, pixel, rom, and retrieve
Vocabulary Level    Number of Words    Text Coverage
high-frequency      2,000              70%
academic            570                5%
technical           1,000              20%
low-frequency       6,000              5%
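The figures above can be added up level by level to see how coverage grows as each list is learned. This is a minimal sketch using the numbers exactly as given in the table; the order of learning (the order the table lists the levels in) is an assumption.

```python
# (level, number of words, text coverage in percent), as given in the table above
LEVELS = [
    ("high-frequency", 2_000, 70),
    ("academic", 570, 5),
    ("technical", 1_000, 20),
    ("low-frequency", 6_000, 5),
]

cumulative_rows = []
words_so_far = 0
coverage_so_far = 0
for level, n_words, coverage in LEVELS:
    words_so_far += n_words
    coverage_so_far += coverage
    cumulative_rows.append((level, words_so_far, coverage_so_far))
    # Columns: level, cumulative number of words, cumulative coverage
    print(f"{level:<15}{words_so_far:>7,}{coverage_so_far:>5}%")
```

With these figures, the first 2,570 words (high-frequency plus academic) account for 75% of the running words.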
Academic Word List (AWL)
– Averil Coxhead (1998) An Academic Word List.
English Language Institute Occasional Publication No. 18.
• developed at the School of Linguistics and Applied
Language Studies at Victoria University of Wellington, NZ
• a list of 570 word families, excluding words in the most frequent
2,000 words of English
• intended for learners in tertiary-level study
• the headwords = the stem form of the words
• the headwords of the AWL are listed on pp. 7–11
• the word families of the AWL are listed in sublists 1–10
• the word family analyse, for example, includes
the regular inflections of the verb: analysed, analysing, analyses
the derivations of the word: analysis, analyst, analysts,
analytical, analytically, etc.
the American spellings: analyze, analyzed, analyzes, analyzing
• the most frequently used member of the family is in italics
e.g. analysis is the most common form in the word family analyse
• the word families of the AWL were selected from the words
in the Academic Corpus (AC), approx. 3,500,000 words
• the AC is a written corpus of academic English: journal
articles, book chapters, course workbooks, laboratory
manuals, and course notes
• four faculty sections: ▪ Arts
▪ Commerce
▪ Law
▪ Science
• each faculty section: approx. 875,000 running words
• each faculty section is divided into seven subject areas
of approx. 125,000 running words each
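The corpus figures are consistent with each other, as a quick check shows. The constant names are chosen for illustration, and all sizes are the approximate values from the slides.

```python
FACULTY_SECTIONS = ["Arts", "Commerce", "Law", "Science"]
SUBJECT_AREAS_PER_SECTION = 7
WORDS_PER_SUBJECT_AREA = 125_000  # approximate running words

words_per_section = SUBJECT_AREAS_PER_SECTION * WORDS_PER_SUBJECT_AREA
corpus_size = len(FACULTY_SECTIONS) * words_per_section

print(words_per_section)  # 875000 running words per faculty section
print(corpus_size)        # 3500000 running words in the Academic Corpus
```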