key word analysis
ENG 626
CORPUS APPROACHES TO LANGUAGE STUDIES
lexico-grammatical profiles
Bambang Kaswanti Purwo
[Hunston ch.1]
» corpus by itself – can do nothing at all
just a store of used language
» corpus access software
▪ re-arrange the store
▪ enable observations of various kinds to be made
▪ process data from corpus in three ways:
showing frequency
phraseology
collocation
[O’Keeffe 1.5]
basic corpus linguistic techniques
▪ concordancing
▪ word frequency counts
▪ key word analysis
▪ cluster analysis
using standard software such as
Wordsmith Tools (Scott 1999)
Monoconc Pro (2000)
» concordancing
▪ a core tool in CL
▪ to find every occurrence of a particular word or phrase
(the search word or phrase “node”)
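A minimal sketch of what a concordancer does, in Python. It is not how Wordsmith Tools or Monoconc Pro work internally; the tokenizer, the function names and the sample sentence are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=4):
    """Return one (left context, node, right context) triple per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, not taken from any corpus.
sample = "The boy is interested in music. It is interesting to see what is interesting."
for left, node, right in concordance(tokenize(sample), "interesting"):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>30}  [{node}]  {right}")
```

Aligning the node in a central column is what makes recurrent patterns to its left and right easy to spot by eye.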
» word frequency counts or wordlists
▪ rapid calculation of word frequency lists (wordlists)
for any batch of texts
▪ with a rank ordering of all the words in order of
frequency
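A ranked wordlist can be sketched in a few lines of Python. The function name, the tokenizing rule and the two-text batch are assumptions for illustration only.

```python
import re
from collections import Counter

def wordlist(texts):
    """Count every word across a batch of texts and return (rank, word, frequency) rows."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # most_common() orders the words from most to least frequent.
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

# Invented batch of texts.
batch = ["the cat sat on the mat", "the dog sat"]
for rank, word, freq in wordlist(batch)[:3]:
    print(rank, word, freq)
```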
» key word analysis
▪ words whose frequency is unusually high in comparison
with some norm
▪ not usually the most frequent words in a text,
but the more “unusually frequent”
▪ useful way of characterizing a text or a genre
[key word analysis]
▪ potential applications in the areas of
forensic linguistics
stylistics
content analysis
text retrieval
[in ELT] to create word lists (LSP Programs)
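The slides do not say which statistic measures "unusually frequent". One commonly used option is the log-likelihood score, which compares a word's frequency in the text under study with its frequency in a reference corpus (the "norm"). The sketch below assumes that statistic; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study text than in the norm."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only positively key words
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented toy data: a tiny "study text" and a tiny "reference corpus".
study_text = "the corpus shows the corpus data".split()
reference = "the cat and the dog and the bird".split()
print(keywords(study_text, reference))
```

In the toy data "the" is the most frequent word in both texts, yet it is not a key word, because it is no more frequent in the study text than in the norm.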
» cluster analysis
▪ how language systematically clusters into
combinations of words or “chunks”
(e.g. I mean, this that and the other, etc.)
▪ how this contributes to the description of the
vocabulary of a language
(to help Ls acquire vocab n develop fluency)
▪ 2-, 3-, 4-, 5-, or 6-word combinations
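Cluster extraction can be sketched as counting every run of n consecutive words and keeping the runs that recur. The function name, the frequency cut-off and the sample string are assumptions for illustration.

```python
from collections import Counter

def clusters(tokens, n, min_freq=2):
    """Return the n-word combinations that occur at least `min_freq` times."""
    grams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(gram, freq) for gram, freq in grams.most_common() if freq >= min_freq]

# Invented sample, loosely imitating spoken language.
tokens = "i mean you know i mean it was you know at the moment".split()
for n in (2, 3):
    print(n, clusters(tokens, n))
```

On a real corpus the same loop would run for n = 2 to 6, and the cut-off would be set much higher than 2.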
lexico-grammatical profiles
[when looking at concordance lines]
create a “lexico-grammatical profile”
of a word and its contexts of use
1. collocates
▪ which word(s) occur most frequently w/ statistical
significance in the word’s environment?
2. chunks/idioms
▪ does the word form part of any recurrent chunks?
▪ is the word idiom-prone?
▪ what types occur (e.g. binomials or trinomials)?
(rough n ready; ready, willing and able)
3. syntactic restrictions
▪ are there syntactic patterns that restrict the word?
(e.g. prepositions that go with the word?)
▪ what are the typical clause positions (initial/medial/final)?
▪ are there any tense/aspect restrictions?
4. semantic restrictions
▪ are there semantic restrictions?
(e.g. applied to [+HUM] only, never with an intensifier)
5. (semantic) prosody
▪ words, as well as having typical collocates (e.g. blonde
collocates w/ hair, not w/ car) tend to occur in particular
environments: positive or negative
۰ 90% of collocates of cause are negative
(accident, cancer, commotion, crisis, delay)
۰ provide collocates with words of positive connotation
(care, food, help, jobs, relief, support)
[O’Keeffe Ch. 3]
traditional view of vocabulary:
vocabulary = all the single words of language
over years, in the light of corpus analysis:
open the criteria to search for recurrences of
more than one word
(i.e. pairs and trios of words, even larger groupings)
▪ “chunks” like a couple of , at the moment, all the time
as frequent as single words (possible, alone, fun, expensive)
▪ the single word has been widely considered to be the basic unit
units of more than one word (phrasal verbs, compounds, idioms)
are left to higher levels of proficiency
exceptions:
۰ greetings and everyday expressions
how are things? see you tomorrow, thanks very much
۰ specialized functional phrases
Happy New Year, good luck
۰ common prepositional phrases
at the weekend, on the first of May
۰ high-frequency compounds
bus stop, whiteboard
» collocation
۰ groupings of more than one word
+ unity of meaning and specialized functions
۰ statistical tendency of words to co-occur (Hunston 2002:12)
۰ collocations are not absolute or deterministic, but are
probabilistic events (resulting from repeated combinations
used n encountered by speakers of any language)
e.g. strong tea, powerful cars
۰ common verbs display distinct preferences for what they
combine with: things turn or go grey, brown, white
people go (*turn) mad, insane, bald, blind
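The "statistical tendency to co-occur" can be made concrete with an association score. The sketch below assumes the Mutual Information (MI) score in its simplest form, log2 of (joint frequency × corpus size) / (node frequency × collocate frequency). Other formulations also adjust for the size of the span, and other measures such as the t-score exist. The function name, parameters and toy sentence are hypothetical.

```python
import math
from collections import Counter

def collocates(tokens, node, span=4, min_cooc=2):
    """Score words found within `span` tokens of `node` using a simple MI formula."""
    freq = Counter(tokens)
    n_tokens = len(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Collect the words to the left and right of this occurrence.
            cooc.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    results = []
    for word, joint in cooc.items():
        if joint >= min_cooc:  # ignore one-off co-occurrences
            mi = math.log2(joint * n_tokens / (freq[node] * freq[word]))
            results.append((word, joint, round(mi, 2)))
    return sorted(results, key=lambda row: row[2], reverse=True)

# Invented toy "corpus".
tokens = ("she drinks strong tea he drives powerful cars "
          "they like strong tea").split()
print(collocates(tokens, "strong", span=1))
```

A higher MI means the pair co-occurs more often than the individual frequencies of the two words would predict, which is the probabilistic sense of collocation described above.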
» strings of words in corpora
▪ CL: it is lexis, rather than syntax, which accounts for the
organization and patterning of language
▪ two fundamental principles at work in the creation of
meaning: the “idiom principle”
the “open choice principle”
▪ syntax, the slots where there are choices to be made (the
open choice principle) far from being primary; only brought
into service occasionally, a kind of “glue” to cement the
lexical chunks together
▪ form n meaning work hand in hand
Cambridge International Corpus:
[100 examples of be touched by]
14% ‘experience physical contact’
86% nonphysical meaning,
80% of which ‘emotionally affected by’
touch [+passive]: nonphysical senses
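Because the 80% applies to the nonphysical 86%, not to all 100 examples, the share of the whole sample works out as:

```latex
\[
0.86 \times 0.80 = 0.688
\]
```

So roughly 69 of the 100 examples mean 'emotionally affected by', 14 are physical contact, and the remaining 17 or so carry other nonphysical meanings.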
» phraseology and idiomaticity
▪ contributors to the understanding of multi-word vocabulary:
۰ corpus linguistics
۰ phraseology and the study of idiomaticity (for Ts n Ls)
▪ different terminologies to describe
the phenomena of multi-word
vocabulary or chunks
۰ lexical phrases
۰ prefabricated patterns
۰ routine formulae
۰ formulaic sequences
۰ lexicalized stems
۰ chunks
۰ (restricted) collocations
۰ fixed expressions
۰ multi-word units/expressions
۰ idioms
۰ etc.
multi-word phenomena: fundamental feature of language use
[Hunston Ch.1]
concordance lines – many instances of use of a word or phrase
“latent patterning” – phraseology
phraseology vs. how Ts explain “confusing adjectives” such as
interested and interesting
▪ “the minimal pair” the boy is interested n the boy is interesting
▪ concordance lines:
frequent pattern of
۰ interested: “someone is interested in something”
۰ interesting: often followed by a noun or used in a set frame:
“an interesting thing”, “what is interesting is …”,
“it is interesting to see …”
reference books have difficulty explaining between n through
a phraseological approach:
between: frequently found after nouns such as
difference, distinction, gap, contrast, conflict, n quarrel
relationship, agreement, comparison, meeting, contact,
correlation
through: frequently found after verbs such as
go, pass, come, run, fall, n lead
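Findings of this kind come from looking at one fixed position next to the node, here the word immediately to its left. A minimal sketch, with a hypothetical function name and an invented sample sentence:

```python
from collections import Counter

def left_neighbours(tokens, node):
    """Count the words that occur immediately before `node`."""
    return Counter(tokens[i - 1]
                   for i, tok in enumerate(tokens)
                   if tok == node and i > 0)

# Invented sample, not corpus data.
tokens = ("the difference between them and the gap between us "
          "we go through it and pass through it").split()
print(left_neighbours(tokens, "between"))
print(left_neighbours(tokens, "through"))
```

Tagging each neighbour for part of speech would then show the noun pattern for between and the verb pattern for through; that step is left out of the sketch.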
“semantic functions”
between has a “location” meaning
the channel between Africa n Sicily
earnings between £5 and £6 a week
through has an “instrumental meaning”
NSs often recognize if a phraseology is unusual
to explain why that is the case is not easy
“require to be done” seems wrong to Owen’s (1996) intuition
Bank of English:
“REQUIRE to be” occurs fairly frequently
the past participle that follows is a specific verb [+SPEC],
not a general verb such as do.
These roses require to be pruned each spring
“require to be done”: very few instances (3 out of 302)
Owen’s intuitions backed up by evidence of the corpus
(on phraseology, not grammatical grounds)
What’s the contribution of NS’s intuition?
make generalizations from a mass of specific info in a corpus
e.g. Bank of English: CONTACT – verb + noun (Sripicharn 1998)
• typically used with “official” persons (office, newspaper, etc.)
contact your travel agent
• also found when the person is a family member or a friend
she had no contact with her father
the difference between two kinds of noun
(travel agent n father) is important (Sripicharn 1998)
REFERENCES
Hunston, Susan. 2002. Corpora in Applied Linguistics.
Cambridge UP.
O’Keeffe, Anne; Michael McCarthy; Ronald Carter. 2007.
From Corpus to Classroom: Language Use and Language
Teaching. Cambridge UP.