FAE ESP Materials Derived from a Web-based Corpus


TESL Ontario’s 36th Annual Conference
“Celebrating the International Year of Languages”
Friday, November 14, 2008, 8.30 – 9.20 am (FAE)
Sheraton Centre Toronto, Canada
Pisamai Supatranont, Ph.D.
Rajamangala University of Technology Lanna Tak, Thailand
Presentation Outline
Background and rationale
Research questions
Research methodology
Data analysis and findings
Discussion
Background of the Study
The study is:
• Funded by
• Conducted in February – July 2008 at
• Under the supervision of Assoc. Prof. David Hall
• In consultation with Prof. Pam Peters
The researcher is from RMUTL Tak, Thailand.
Rationale of the Study
Cause
Influence of Information and Communication Technology (ICT)
in academic and professional settings
Effect
To get good jobs, university students in both ICT and non-ICT
fields need English to communicate in ICT working environments.
ESP Materials Development
1. Lack of relevant ESP textbooks
Although specialized ICT texts are abundant, they are too difficult
for EFL students to be used directly in ESP classes without
modification or support.
=> Need for teacher-designed materials in ESP teaching.
2. Differences in students’ background knowledge
ICT students:
• possess some specialized knowledge and skills to design
hardware and software.
• need English to communicate their knowledge in academic
and professional contexts.
Non-ICT students:
• have little knowledge of ICT.
• need ICT knowledge as computer users.
• need to learn both basic ICT concepts and English
to communicate in business companies or organizations.
Different learning needs: the same level of English,
but different levels of specialized knowledge
=> Need for different specialized content to facilitate ESP learning
3. Insufficiency of EFL students’ lexical knowledge
• Undergraduate students in EFL countries, e.g. Thailand
(Supatranont, 2005), Oman (Cobb and Horst, 2001), and Indonesia
(Nurweni and Read, 1999), have been found to have limited lexical
knowledge and to be less proficient in English than expected of
university students.
• In Supatranont’s study (2005), the lexical knowledge of RMUTL
students was found to be below the lexical threshold for academic study.
With a limited academic vocabulary, students cannot cope well with
specialized texts, because the most frequent words in these texts are
academic and sub-technical words (Mudraya, 2006).
=> Academic and technical words should be integrated as the main
vocabulary components of language input.

The lexical threshold for academic study is composed of two wordlists
(Nation, 2001; Coxhead & Nation, 2001; Cobb & Horst, 2001; Nation & Waring, 1997):
• General Service List (GSL) = 2,000 high-frequency words (West, 1953)
• Academic Word List (AWL) = 570 academic words (Coxhead, 1998)
Knowledge of these two wordlists is estimated to provide
over 90% coverage of academic texts in all disciplines.
To read academic texts with comprehension, readers need to know at least
95% of the words in the text (Laufer, 1989).
Academic vocabulary in this study is based on the GSL and AWL
(downloaded from http://www.uefap.com/vocab/vocfram.htm)
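Text coverage here means the percentage of running words (tokens) in a text that belong to the known wordlists. The Python sketch below illustrates that arithmetic only. The two small word sets are hypothetical stand-ins for the GSL and AWL (the real lists are organized as word families, so each family would first have to be expanded to all of its forms), and the simple lowercase tokenizer is an assumption, not the one used in the study.

```python
import re

def tokenize(text: str) -> list:
    # Simplifying assumption: lowercase alphabetic tokens (optionally with an apostrophe part).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def coverage(tokens: list, known_words: set) -> float:
    """Percentage of running words (tokens) that appear in the known-word set."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return 100.0 * known / len(tokens)

# Hypothetical miniature lists standing in for the (family-expanded) GSL and AWL.
gsl = {"the", "a", "in", "and", "you", "can", "open", "file", "program"}
awl = {"data", "display", "format"}

text = "You can open the file in the program and display the data in a spreadsheet format."
tokens = tokenize(text)
pct = coverage(tokens, gsl | awl)
print(f"{pct:.2f}% coverage")   # 15 of the 16 tokens are in the lists
print("meets the 95% threshold" if pct >= 95 else "below the 95% threshold")
```

One unknown word in a 16-word sentence already pulls coverage below Laufer's 95% point, which illustrates why unfamiliar technical words matter so much for comprehension.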
Objectives of the Study
1. To identify high-frequency language items in ICT specialized
texts, focusing on three lexical areas:
• academic words: based on the GSL and AWL
• technical words: words with a particular meaning in ICT
• technical collocations: noun phrases with a particular
meaning in ICT
2. To obtain a set of language input for designing course materials
for teaching English for ICT to non-ICT EFL students,
using a corpus-based analysis method.
Research Questions
1. What are high-frequency academic words in ICT specialized texts?
2. What are high-frequency technical words in ICT specialized texts?
3. What are high-frequency technical collocations in ICT specialized texts?
Research Methodology
The methodology is divided into three main steps:
1. Text selection
2. Corpus compilation
3. Corpus-based analysis: studying the corpus with text-analysis software
Text Selection
• Texts selected exclusively from web-based ICT tutorials
• Authors: mostly lecturers in universities and tutorial centers
• 5 topics concerning fundamental ICT knowledge:
  - Computer hardware
  - Operating systems and graphical user interfaces (OS and GUIs)
  - Basic application software
  - Multimedia software
  - Internet software
• 3 text types: articles, manuals and advertisements (hardware only)
Number of Selected Texts (files per topic and text type)

Text type         Hardware   OS and GUIs   Application   Multimedia   Internet
Articles                20            20            20           20         20
Manuals                 25            20            20           20         20
Advertisements          25             -             -            -          -

Total files = 230
Number of Words (typical length: 1,500-2,000 words per article,
700-1,000 words per manual, 200-500 words per advertisement)

Text type         Hardware   OS and GUIs   Application   Multimedia   Internet
Articles            36,192        35,525        37,857       38,011     37,925
Manuals             20,171        17,535        18,889       18,597     18,353
Advertisements       8,423             -             -            -          -

Total words = 287,478
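As a quick consistency check on the two tables, the short Python sketch below adds up the transcribed cell values and computes the mean length per file for each text type. It is only arithmetic on the figures reported above. The "-" cells are entered as 0, which is an assumption that those cells are empty.

```python
# Counts transcribed from the two tables above; column order:
# Hardware, OS and GUIs, Application, Multimedia, Internet ("-" entered as 0).
files = {
    "Articles":       [20, 20, 20, 20, 20],
    "Manuals":        [25, 20, 20, 20, 20],
    "Advertisements": [25, 0, 0, 0, 0],
}
words = {
    "Articles":       [36192, 35525, 37857, 38011, 37925],
    "Manuals":        [20171, 17535, 18889, 18597, 18353],
    "Advertisements": [8423, 0, 0, 0, 0],
}

total_files = sum(sum(row) for row in files.values())
total_words = sum(sum(row) for row in words.values())
print(f"Total files: {total_files}")
print(f"Total words: {total_words:,}")

# Mean length per file for each text type, to compare with the stated ranges.
mean_length = {t: sum(words[t]) / sum(files[t]) for t in files}
for text_type, mean in mean_length.items():
    print(f"{text_type}: about {mean:,.0f} words per file")
```

The totals reproduce the reported 230 files and 287,478 words, and the mean lengths (about 1,855, 891 and 337 words per file) fall inside the ranges given for articles, manuals and advertisements.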
Design of the EICT Corpus
• Size: 287,478 words
• Text files: 230 files
• Word types: 6,064 word types
• Medium: written
• Language: texts written in English
• Authorship: texts written by experts in academic institutions, tutorial centers or manufacturers
• Contents: fundamental knowledge of ICT
• Text topics (5):
  1. Computer hardware
  2. Operating systems and graphical user interfaces (OS and GUIs)
  3. Basic application software
  4. Multimedia software
  5. Internet software
• Text types:
  1. Articles: passages including definitions and descriptions
  2. Manuals: instructions for operating hardware and software
  3. Advertisements: details of the features and quality of computer hardware products
Text-analysis Software: WordSmith Tools
• WordSmith Tools version 5.0
• Developed by Mike Scott (2007)
• University of Liverpool, UK
• www.lexically.net/wordsmith/index.html
Reference Corpus
According to Bowker and Pearson (2002), Hunston (2002), and Scott (2001):
• To establish a word’s ‘keyness’, the frequency wordlist of the study
corpus should be compared with that of a larger reference corpus.
• Using the log-likelihood formula, unusually frequent or infrequent words
can be identified by their ‘keyness’ and the significance of the difference (p value):
  - Words with positive keyness => occur unusually often (relative to the reference corpus).
  - Words with negative keyness => occur unusually rarely (relative to the reference corpus).
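The log-likelihood comparison can be sketched in Python as follows. This uses the widely cited two-corpus formulation (Rayson and Garside style): expected frequencies are derived from the combined corpora, LL = 2 × Σ O·ln(O/E), and the sign reflects which corpus uses the word relatively more often. WordSmith Tools' internal computation may differ in detail. The word counts and the 100-million-word reference size below are hypothetical illustration values.

```python
import math

def log_likelihood(a: int, c: int, b: int, d: int) -> float:
    """
    Log-likelihood (G2) for one word.
    a = frequency in the study corpus, c = size of the study corpus,
    b = frequency in the reference corpus, d = size of the reference corpus.
    """
    e1 = c * (a + b) / (c + d)   # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:                    # a zero observed count contributes 0 (0 * ln 0 -> 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll

def keyness(a: int, c: int, b: int, d: int) -> float:
    """Signed keyness: positive if relatively more frequent in the study corpus."""
    ll = log_likelihood(a, c, b, d)
    return ll if a / c > b / d else -ll

def p_value(ll: float) -> float:
    """Upper-tail p value for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(abs(ll) / 2.0))

# Hypothetical counts: a word seen 120 times in the 287,478-word EICT Corpus
# and 2,000 times in a reference corpus assumed to be 100,000,000 words.
k = keyness(120, 287_478, 2_000, 100_000_000)
print(f"keyness = {k:.1f}, p = {p_value(k):.2e}")
```

For reference, with one degree of freedom a p value of 0.000001 corresponds to a log-likelihood of roughly 23.93. Any word the study retains at that level therefore has a keyness score above about that value.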
Reference Corpus: BNC
• British National Corpus (BNC)
• A general corpus of 100 million words
• Samples of written and spoken language from a wide range of sources
• BNC website: http://www.natcorp.ox.ac.uk
• In the present study, the BNC wordlist is taken from WordSmith Tools.
Data Analysis and Findings
The method of analysis is adapted from the suggestions of
Bowker and Pearson (2002) and Scott (2001).
The method and findings are described according to the research questions.
1. What are high-frequency academic words in ICT specialized texts?
2. What are high-frequency technical words in ICT specialized texts?
3. What are high-frequency technical collocations in ICT specialized texts?
Question 1: What are high-frequency academic words in ICT specialized texts?
1.1 Download the GSL and AWL wordlists from the website of the University
of Hertfordshire, UK, at http://www.uefap.com/vocab/vocfram.htm.
Use these words as academic word candidates.
1,937 GSL Headwords
570 AWL Headwords
1.2 Build a wordlist of the EICT Corpus, resulting in a total of 6,064 word types.
1.3 Use the academic word candidates to mark all GSL and AWL words in the corpus.
Lemmatize them, resulting in 941 headwords of academic word candidates
with ≥ 5 occurrences, sorted in alphabetical order.
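Steps 1.2 and 1.3 can be pictured with the following Python sketch. It builds a frequency list, keeps only word forms that belong to the GSL/AWL, folds each form into its headword, and keeps headwords with at least five occurrences, listed alphabetically. The `family_of` mapping and the toy token list are hypothetical. In the study this work was done with WordSmith Tools and the downloaded wordlists, not with this code.

```python
from collections import Counter

def academic_candidates(tokens, family_of, min_freq=5):
    """
    tokens: running words of the corpus (lowercased).
    family_of: maps each GSL/AWL word form to its headword (hypothetical input).
    Returns {headword: frequency} for headwords with >= min_freq, alphabetically.
    """
    form_counts = Counter(tokens)            # 1.2 frequency wordlist of the corpus
    headword_counts = Counter()
    for form, n in form_counts.items():
        head = family_of.get(form)           # 1.3 mark only GSL/AWL forms ...
        if head is not None:
            headword_counts[head] += n       # ... and fold them into their headword
    kept = {h: n for h, n in headword_counts.items() if n >= min_freq}
    return dict(sorted(kept.items()))        # alphabetical order

family_of = {
    "compute": "compute", "computes": "compute", "computed": "compute",
    "access": "access", "accessed": "access",
    "window": "window", "windows": "window",
}
tokens = ["compute"] * 3 + ["computed"] * 2 + ["access"] * 4 + ["windows"] * 6 + ["pixel"] * 9
print(academic_candidates(tokens, family_of))
```

Lemmatizing matters for the frequency cut-off. In this example, "compute" reaches five occurrences only when its inflected forms are counted together. "Pixel" is frequent but is dropped because it is not in either list.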
1.4 Compare the list of academic word candidates with the BNC wordlist,
using the log-likelihood formula at p = 0.000001.
The software is set:
• To process with full lemmas
• To display only words with positive keyness
Finding 1
Of the 941 words, 343 words with ≥ 5 occurrences, positive keyness, and a
significant difference emerged as high-frequency academic words
(function words excluded). The list is shown sorted by keyness
and in alphabetical order.
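The selection criteria in Finding 1 amount to a filter followed by a sort. The Python sketch below applies them to rows that are assumed to have been exported from the keyword comparison already (headword, frequency, signed keyness, p value). The function-word list and all row values are hypothetical, since the study does not list its stop words.

```python
from typing import NamedTuple

class KeywordRow(NamedTuple):
    headword: str
    freq: int        # occurrences in the study corpus
    keyness: float   # signed log-likelihood against the reference corpus
    p: float         # p value of the log-likelihood statistic

# Hypothetical stop list; the study's actual function-word list is not given.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "you", "it", "this", "on"}

def high_frequency_keywords(rows, min_freq=5, alpha=0.000001):
    """Keep rows meeting all criteria, then sort by keyness (highest first)."""
    kept = [
        r for r in rows
        if r.freq >= min_freq and r.keyness > 0 and r.p < alpha
        and r.headword not in FUNCTION_WORDS
    ]
    return sorted(kept, key=lambda r: r.keyness, reverse=True)

rows = [
    KeywordRow("window", 400, 1800.0, 1e-12),
    KeywordRow("you", 3000, 2500.0, 1e-12),     # function word: excluded
    KeywordRow("access", 900, 4500.0, 1e-12),
    KeywordRow("society", 60, -30.0, 1e-7),     # negative keyness: excluded
    KeywordRow("enable", 4, 40.0, 1e-9),        # fewer than 5 occurrences: excluded
    KeywordRow("indicate", 20, 10.0, 0.002),    # not significant: excluded
]
print([r.headword for r in high_frequency_keywords(rows)])
```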
It was found that some general words take on a technical sense in specialized texts.
Of the 343 words in total, 95 words (e.g. burn, window, word) convey
particular meanings in ICT that differ from their meanings in general texts.
Because these words are simple and familiar, students may interpret them
incorrectly and become confused, as previous studies in ICT-related fields have found.
For example, Lam’s study (cited in Chen and Ge, 2007) reported computer science
students’ confusion when they interpreted the word ‘field’ in its agricultural sense
rather than as an option in a database program.
These words were classified as semi-technical words.
All 343 high-frequency academic words were classified into 2 groups:
1. 248 academic words
e.g. access, compute, illustrate, indicate, identify, manipulate,
term, category, feature, occurrence, symbol etc.
2. 95 semi-technical words
2.1 Words with technical senses or particular meanings
e.g. burn, drive, refresh, card, domain, engine, memory, field,
application, character, Word, document, window etc.
2.2 Words in mathematics, geometric shapes and diagrams
e.g. add, multiply, divide, axis, table, row, degree etc.
2.3 Simple words frequently used as commands or methods
e.g. edit, enable, paste, shift, help, enter, drag, drop etc.
Question 2: What are high-frequency technical words in ICT specialized texts?
The method is similar to that for Question 1:
2.1 Build a word frequency list of the whole EICT Corpus.
2.2 Exclude all function words and the academic words identified in Finding 1.
2.3 Lemmatize the remaining words, resulting in 938 headwords.
2.4 Keep only words with ≥ 5 occurrences and technical meanings.
2.5 Compare the resulting wordlist with the BNC wordlist, using
the log-likelihood formula at p = 0.000001.
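The exclusion and frequency steps (2.1 to 2.4, minus the manual check of technical meaning) can be sketched in Python as a filtered count. In this sketch each token is lemmatized before it is compared with the exclusion sets, so that an excluded headword also removes its inflected forms. The lemma map, the stop list and the token list are all hypothetical. The keyness comparison in step 2.5 would then be applied to the surviving words.

```python
from collections import Counter

def technical_candidates(tokens, function_words, academic_headwords, lemma_of, min_freq=5):
    """
    Count lemmatized tokens that are neither function words nor Finding 1
    academic words; keep those with >= min_freq occurrences.
    lemma_of is a hypothetical form -> lemma map; unlisted forms are their own lemma.
    """
    counts = Counter()
    for tok in tokens:
        lemma = lemma_of.get(tok, tok)
        if lemma in function_words or lemma in academic_headwords:
            continue
        counts[lemma] += 1
    return {lemma: n for lemma, n in counts.items() if n >= min_freq}

tokens = ["the"] * 10 + ["cookies"] * 3 + ["cookie"] * 3 + ["pixel"] * 5 + ["access"] * 8 + ["port"] * 2
result = technical_candidates(tokens, {"the"}, {"access"}, {"cookies": "cookie"})
print(result)
```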
Finding 2
Of the 938 words, 358 words/acronyms with ≥ 5 occurrences, positive keyness,
and a significant difference were selected as high-frequency technical words
(sorted by keyness).
All 358 resulting words are classified into 5 groups:
1. 106 words with particular meanings (different from their general meanings)
e.g. cache, cookies, bus, port, bitmap, chip, cursor, pixel etc.
2. 87 words referring to basic programs, devices, commands and keys
e.g. spreadsheet, database, notepad, wizard, backspace etc.
3. 55 abbreviations, acronyms, and file extensions
e.g. ASCII, WYSIWYG, ALU, ROM, RAM, OS, RGB, ESC, ALT,
txt, doc, gif, wav, http, html, www etc.
4. 17 words in mathematics, geometric shapes and diagrams
e.g. equation, ellipse, polygon, cell, column, intersection etc.
5. 92 sub-technical terms and frequent words in ICT
e.g. alignment, compression, directory, multimedia, playlist etc.
Question 3: What are high-frequency technical collocations
in ICT specialized texts?
3.1 Set the software:
• To produce concordances
• To display 2-5 word clusters with ≥ 5 co-occurrences
• To compute the strength of relation between words
using Mutual Information (MI), with MI ≥ 5.000
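Mutual Information compares how often two words actually occur together with how often they would co-occur by chance, given their individual frequencies. The Python sketch below implements the textbook pointwise formulation, MI = log2(observed / expected) with expected = f(node) × f(collocate) / corpus size. WordSmith Tools may apply further adjustments (for example for the collocation span), so this should be read as an illustration of the idea rather than the software's exact calculation. The toy counts are chosen so that the score falls exactly on the study's 5.000 cut-off.

```python
import math

def mutual_information(pair_freq: int, node_freq: int, collocate_freq: int, corpus_size: int) -> float:
    """Pointwise MI: log2 of observed co-occurrence over the count expected by chance."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)

# Toy counts chosen so the score lands exactly on the 5.000 cut-off:
# expected = 16 * 16 / 1024 = 0.25, observed / expected = 8 / 0.25 = 32, log2(32) = 5.
mi = mutual_information(8, 16, 16, 1024)
print(f"MI = {mi:.3f}")
```

An MI of 5 means the pair occurs 2^5 = 32 times more often than chance would predict. This is why the threshold favours tightly bound technical phrases over loose combinations of common words.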
3.2 On the cluster tab, select only the 2-5 word clusters that have
a technical meaning and are frequently used.
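The cluster list that this step filters can be approximated by counting every contiguous sequence of two to five words. The Python sketch below does this and keeps sequences seen at least five times. It is a simplification: unlike a full concordancer, it ignores sentence and punctuation boundaries. Selecting the clusters that carry a technical meaning remains a manual judgement, as in the study.

```python
from collections import Counter

def clusters(tokens, min_len=2, max_len=5, min_freq=5):
    """Count every contiguous 2-5 word sequence and keep those seen >= min_freq times."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {c: f for c, f in counts.items() if f >= min_freq}

tokens = ("open the operating system menu " * 5).split()
found = clusters(tokens)
print(found["operating system"])        # appears once in each of the 5 repetitions
print("menu open" in found)             # spans repetitions only 4 times, so it is dropped
```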
3.3 On the collocate tab, compute the relation value
and sort the collocations by it.
Finding 3
3.4 Select only the collocations with ≥ 5 occurrences,
MI scores ≥ 5.000, and distribution in ≥ 3 text files.
335 collocations were selected as technical collocations
=> noun phrases with technical meanings
e.g. mail merge
operating system (OS)
uniform resource locator (URL)
hypertext markup language (HTML)
random access memory (RAM)
wide area network (WAN)
etc.
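The three selection criteria in step 3.4 combine frequency, association strength and dispersion across files. The Python sketch below applies them to a hypothetical table of collocation statistics. The file names and numbers are invented for illustration. The file-count condition prevents a phrase repeated many times in a single document from being treated as typical of the field.

```python
def select_collocations(stats, min_freq=5, min_mi=5.0, min_files=3):
    """
    stats maps a collocation to (frequency, MI score, set of file names containing it).
    Returns the collocations meeting all three thresholds, alphabetically.
    """
    return sorted(
        colloc
        for colloc, (freq, mi, files) in stats.items()
        if freq >= min_freq and mi >= min_mi and len(files) >= min_files
    )

# Hypothetical statistics for illustration only.
stats = {
    "mail merge": (40, 9.7, {"app_01.txt", "app_07.txt", "man_12.txt"}),
    "operating system": (210, 7.2, {f"os_{i:02d}.txt" for i in range(30)}),
    "click here": (12, 3.1, {"int_02.txt", "int_05.txt", "int_09.txt"}),   # MI too low
    "print preview": (9, 8.0, {"app_03.txt", "man_04.txt"}),              # only 2 files
}
print(select_collocations(stats))
```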
Discussion
Significance of the study:
• Provides an overall picture of the language of English for ICT.
• Provides clear language-learning goals that serve
particular learning needs.
In materials design, teachers know which language items the lessons
should focus on and which ones the students already know.
Apart from typical teaching materials, the corpus itself can also be
a valuable learning resource.
It gives students direct access to the corpus, which
can promote data-driven learning.
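Direct access to a corpus usually means that students look at concordance lines: every occurrence of a word shown with its left and right context. The Python sketch below is a minimal keyword-in-context (KWIC) display over a hypothetical sample sentence. It only illustrates the format. Concordancers such as the one in WordSmith Tools offer sorting, span settings and much more.

```python
def kwic(tokens, keyword, width=4):
    """Return keyword-in-context lines: `width` words of left and right context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

sample = "Drag the file into the window and drop it on the window title bar".split()
for line in kwic(sample, "window", width=2):
    print(line)
```

Seeing "window" repeatedly next to words like "drag", "drop" and "title bar" helps students work out its ICT sense for themselves. This inductive process is the core of data-driven learning.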
References
Bowker, L. and Pearson, J. (2002). Working with Specialized Language: A Practical
Guide to Using Corpora. USA and UK: Routledge.
Chen, Q. and Ge, G. (2007). A corpus-based lexical study on frequency and distribution
of Coxhead’s AWL word families in medical research articles (RAs). English
for Specific Purposes, 26, pp. 502-514. Elsevier Science.
Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners across
the lexical threshold. In Flowerdew, J. and Peacock, M., (eds.) Research
Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge
University Press.
Coxhead, A. (1998). An Academic Word List. ELI Occasional Publication No. 18. Victoria
University of Wellington, New Zealand.
Coxhead, A. and Nation, P. (2001). The specialized vocabulary of English for academic
purposes. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on
English for Academic Purposes. pp. 252-267. UK: Cambridge University Press.
Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension?
Cited in Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners
across the lexical threshold. In Flowerdew, J. and Peacock, M. (eds.) Research
Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge
University Press.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model.
English for Specific Purposes, 25(2), pp. 235-256. Elsevier Science.
Nation, P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge
University Press.
Nation, P. and Waring, R. (1997). Vocabulary size, text coverage and word lists.
In Schmitt, N. and McCarthy, M. (eds.) Vocabulary: Description, Acquisition and
Pedagogy. pp. 6-19. Cambridge: Cambridge University Press.
Nurweni, A. and Read, J. (1999). The English vocabulary knowledge of Indonesian
university students. English for Specific Purposes, 18(2), pp. 161-175.
Elsevier Science.
Scott, M. (2001). Comparing corpora and identifying key words, collocations, frequency
distributions through the WordSmith Tools suite of computer programs.
In Ghadessy, M., Henry, A. and Roseberry, R.L. (eds.) Small Corpus Studies
and ELT: Theory and Practice. pp. 47-67. US: John Benjamins Publishing.
Scott, M. (2007). WordSmith Tools version 5.0. Oxford University Press.
Available at http://www.lexically.net/wordsmith/index.html.
Supatranont, P. (2005a). Classroom concordancing: Increasing vocabulary size for academic
reading. KOTESOL Proceedings 2005. pp. 35-44. South Korea.
Supatranont, P. (2005b). A Comparison of the Effects of the Concordance-based and the
Conventional Teaching Methods on Engineering Students’ English Vocabulary
Learning. Online Ph.D. Dissertation, Program of English as an International
Language, Chulalongkorn University, Thailand. Available at
http://www.arts.chula.ac.th/~ling/thesis/Pisamai2548.pdf
West, M. (1953). A General Service List of English Words. London: Longmans, Green and
Co.
Thank you
for your attention.