key word analysis

ENG 626
CORPUS APPROACHES TO LANGUAGE STUDIES
lexico-grammatical profiles
Bambang Kaswanti Purwo
[Hunston Ch. 1]
» a corpus by itself can do nothing at all
it is just a store of used language
» corpus access software
▪ re-arranges the store
▪ enables observations of various kinds to be made
▪ processes data from the corpus in three ways:
showing frequency
phraseology
collocation
[O’Keeffe 1.5]
basic corpus linguistic techniques
using standard software such as WordSmith Tools (Scott 1999) and MonoConc Pro (2000)
▪ concordancing
▪ word frequency counts
▪ key word analysis
▪ cluster analysis
» concordancing
▪ a core tool in CL
▪ to find every occurrence of a particular word or phrase
(the search word or phrase is called the “node”)
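The concordancing step above can be sketched in a few lines of Python. This is only a minimal illustration, not how WordSmith Tools or MonoConc Pro work internally: the function names `tokenize` and `concordance`, the crude tokenizer, and the sample sentence are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=4):
    """Return (left context, node, right context) for every occurrence of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, for illustration only
sample = "The boy is interested in music. It is interesting to see what is interesting."
for left, node, right in concordance(tokenize(sample), "interesting"):
    # Right-align the left context so the node words line up in a column
    print(f"{left:>30}  [{node}]  {right}")
```

Lining up the node in the centre is what makes recurrent patterns to its left and right easy to spot by eye.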
» word frequency counts or wordlists
▪ rapid calculation of word frequency lists (wordlists)
for any batch of texts
▪ with a rank ordering of all the words in order of
frequency
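A rank-ordered wordlist for a batch of texts can be sketched as follows. The function name `wordlist`, the simple tokenizer, and the two sample texts are assumptions for illustration; real tools apply more careful tokenization.

```python
import re
from collections import Counter

def wordlist(texts):
    """Count every word across a batch of texts and rank the words by frequency."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # most_common() returns (word, frequency) pairs, highest frequency first
    return counts.most_common()

# Invented sample batch, for illustration only
batch = ["The cat sat on the mat.", "The dog barked."]
for rank, (word, freq) in enumerate(wordlist(batch), start=1):
    print(rank, word, freq)
```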
» key word analysis
▪ words whose frequency is unusually high in comparison
with some norm
▪ not usually the most frequent words in a text,
but the more “unusually frequent”
▪ useful way of characterizing a text or a genre
[key word analysis]
▪ potential applications in the areas of
forensic linguistics
stylistics
content analysis
text retrieval
[in ELT] to create word lists (for LSP programs)
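Key word analysis compares each word's frequency in a study text with its frequency in a reference "norm". The slides do not name a statistic, so the sketch below assumes the log-likelihood measure that is commonly used for keyness; the function names `log_likelihood` and `keywords` are hypothetical, and both corpora are assumed to be non-empty token lists.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a, b: frequency of the word in the study and reference corpus
    c, d: total number of tokens in the study and reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the study corpus than in the norm."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c > b / d:  # keep only "unusually frequent" words, not unusually rare ones
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

A very frequent word such as the scores close to zero when it is about equally frequent in both corpora, which is why key words are usually not the most frequent words in a text.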
» cluster analysis
▪ how language systematically clusters into
combinations of words or “chunks”
(e.g. I mean, this that and the other, etc.)
▪ how this contributes to the description of the
vocabulary of a language
(to help learners acquire vocabulary and develop fluency)
▪ 2-, 3-, 4-, 5-, or 6-word combinations
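Cluster analysis amounts to counting recurrent n-word sequences. A minimal sketch, assuming a pre-tokenized text; the function name `clusters` and the `min_freq` cut-off are hypothetical choices, and the token list is invented for illustration.

```python
from collections import Counter

def clusters(tokens, n, min_freq=2):
    """Return the n-word combinations that recur at least min_freq times, most frequent first."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), freq) for gram, freq in grams.most_common() if freq >= min_freq]

# Invented sample, for illustration only
tokens = "i mean it was i mean really all the time all the time".split()
for n in (2, 3):
    print(n, clusters(tokens, n))
```

Running the same function for n = 2 to 6 gives the 2- to 6-word combinations mentioned above.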
lexico-grammatical profiles
[when looking at concordance lines]
→ create a “lexico-grammatical profile”
of a word and its contexts of use
1. collocates
▪ which word(s) occur most frequently, with statistical
significance, in the word’s environment?
2. chunks/idioms
▪ does the word form part of any recurrent chunks?
▪ is the word idiom-prone?
▪ what types occur (e.g. binomials or trinomials)?
(rough and ready; ready, willing and able)
3. syntactic restrictions
▪ are there syntactic patterns that restrict the word?
(e.g. prepositions that go with the word?)
▪ what is the typical clause position (initial/medial/final)?
▪ are there any tense/aspect restrictions?
4. semantic restrictions
▪ are there semantic restrictions?
(e.g. applied to [+HUM] only, never with an intensifier)
5. (semantic) prosody
▪ words, as well as having typical collocates (e.g. blonde
collocates with hair, not with car), tend to occur in particular
environments: positive or negative
۰ 90% of the collocates of cause are negative
(accident, cancer, commotion, crisis, delay)
۰ provide collocates with words of positive connotation
(care, food, help, jobs, relief, support)
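The first step of such a profile, listing the words found in a node word's environment, can be sketched as below. This counts raw co-occurrences only; it does not test statistical significance, and judging whether the collocates are positive or negative is left to the analyst. The function name `collocates`, the `span` parameter, and the sample sentence are assumptions for illustration.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words occurring within `span` tokens to the left or right of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented sample, for illustration only
tokens = "smoking can cause cancer and delays cause a crisis".split()
print(collocates(tokens, "cause", span=2).most_common())
```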
[O’Keeffe Ch. 3]
traditional view of vocabulary:
vocabulary = all the single words of a language
over the years, in the light of corpus analysis:
the criteria have been opened up to search for recurrences of
more than one word
(i.e. pairs and trios of words, even larger groupings)
▪ “chunks” like a couple of, at the moment, all the time
are as frequent as single words (possible, alone, fun, expensive)
▪ single words have been widely considered to be the basic unit;
units of more than one word (phrasal verbs, compounds, idioms)
→ higher level of proficiency
exceptions:
۰ greetings and everyday expressions
how are things? see you tomorrow, thanks very much
۰ specialized functional phrases
Happy New Year, good luck
۰ common prepositional phrases
at the weekend, on the first of May
۰ high-frequency compounds
bus stop, whiteboard
» collocation
۰ groupings of more than one word
+ unity of meaning and specialized functions
۰ statistical tendency of words to co-occur (Hunston 2002:12)
۰ collocations are not absolute or deterministic, but are
probabilistic events (resulting from repeated combinations
used and encountered by speakers of any language)
e.g. strong tea, powerful cars
۰ common verbs display distinct preferences for what they
combine with: things turn or go grey, brown, white
people go (*turn) mad, insane, bald, blind
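The "statistical tendency to co-occur" can be given a number with an association measure. The slides do not specify one; the sketch below assumes pointwise mutual information (PMI) over adjacent word pairs, one of several measures in common use. The function name `pmi` and the toy token list are hypothetical.

```python
import math
from collections import Counter

def pmi(tokens, w1, w2):
    """Pointwise mutual information for the adjacent pair (w1, w2).

    Returns None if the pair never occurs. Both the pair and single-word
    probabilities are estimated over the token count n (a simplifying assumption).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair = bigrams[(w1, w2)]
    if pair == 0:
        return None
    return math.log2(pair * n / (unigrams[w1] * unigrams[w2]))

# Invented sample, for illustration only
tokens = "strong tea and strong tea with powerful cars and strong coffee".split()
print(pmi(tokens, "strong", "tea"))    # positive: the pair occurs more often than chance predicts
print(pmi(tokens, "powerful", "tea"))  # None: the pair is unattested in this sample
```

A higher score means the two words co-occur more often than their individual frequencies would predict, which matches the probabilistic (not deterministic) view of collocation above.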
» strings of words in corpora
▪ CL: it is lexis, rather than syntax, which accounts for the
organization and patterning of language
▪ two fundamental principles at work in the creation of
meaning: the “idiom principle” and
the “open choice principle”
▪ syntax, the slots where there are choices to be made (the
open choice principle), is far from being primary; it is only brought
into service occasionally, as a kind of “glue” to cement the
lexical chunks together
▪ form and meaning work hand in hand
Cambridge International Corpus:
[100 examples of be touched by]
14% ‘experience physical contact’
86% nonphysical meaning,
80% of which (roughly 69 of the 100 examples) ‘emotionally affected by’
→ touch [+passive]: nonphysical senses
» phraseology and idiomaticity
▪ contributors to the understanding of multi-word vocabulary:
۰ corpus linguistics
۰ phraseology and the study of idiomaticity (for teachers and learners)
▪ different terminologies to describe the phenomena of
multi-word vocabulary or chunks
۰ lexical phrases
۰ prefabricated patterns
۰ routine formulae
۰ formulaic sequences
۰ lexicalized stems
۰ chunks
۰ (restricted) collocations
۰ fixed expressions
۰ multi-word units/expressions
۰ idioms
۰ etc.
multi-word phenomena → a fundamental feature of language use
[Hunston Ch. 1]
concordance lines – many instances of use of a word or phrase
“latent patterning” – phraseology
phraseology vs. how teachers explain “confusing adjectives” such as
interested and interesting
▪ “the minimal pair”: the boy is interested and the boy is interesting
▪ concordance lines:
frequent pattern of
۰ interested: “someone is interested in something”
۰ interesting: typically followed by a noun or used in a set frame:
“an interesting thing”, “what is interesting is …”,
“it is interesting to see …”
reference books have difficulty explaining between and through
a phraseological approach:
between: frequently found after nouns such as
difference, distinction, gap, contrast, conflict, and quarrel;
relationship, agreement, comparison, meeting, contact, and
correlation
through: frequently found after verbs such as
go, pass, come, run, fall, and lead
“semantic functions”
between has a “location” meaning
the channel between Africa and Sicily
earnings between £5 and £6 a week
through has an “instrumental” meaning
native speakers often recognize if a phraseology is unusual
to explain why that is the case is not easy
“require to be done” seems wrong to Owen’s (1996) intuition
Bank of English:
“REQUIRE to be” occurs fairly frequently
the past participle that follows is [+SPEC],
not a general verb such as do:
These roses require to be pruned each spring
require to be done: very few (3 out of 302, about 1%)
→ Owen’s intuitions are backed up by the evidence of the corpus
(on phraseological, not grammatical, grounds)
What’s the contribution of the native speaker’s intuition?
to make generalizations from a mass of specific information in a corpus
e.g. Bank of English: CONTACT – verb + noun (Sripicharn 1998)
• typically used with “official” persons (office, newspaper, etc.)
contact your travel agent
• also found when the person is a family member or a friend
she had no contact with her father
→ the difference between the two kinds of noun
(travel agent and father) is important (Sripicharn 1998)
REFERENCES
Hunston, Susan. 2002. Corpora in Applied Linguistics.
Cambridge UP.
O’Keeffe, Anne; Michael McCarthy; Ronald Carter. 2007.
From Corpus to Classroom: Language Use and Language
Teaching. Cambridge UP.