teachResearch - British National Corpus


Using the BNC
for teaching and research
Teaching and learning
Where do corpora fit in?

As a (teacher) reference tool

As a teaching aid in the classroom
• Replace/supplement teacher intuition
• Place native/non-native speaking teachers on equal terms

As a self-access learning aid
• Find out about the language/the culture for yourself (data-driven learning)
  • Hypothesis testing
  • Hypothesis generation
• Contradict the teacher
What teachers can do with corpora
• Create informative materials: corpus-based explanations (teacher consults corpus-based reference works or corpora)
• Create illustrative materials: corpus-based examples, e.g. concordances (teacher consults corpora)
• Create corpus-based tasks and exercises (on paper or computer)
• Use corpora as a basis for correction and feedback
• Do contrastive analyses (lexico-grammatical studies)
(cf. Mukherjee 2002)
What learners can do with corpora
• form-focused activities ('data-driven learning')
• meaning-focused activities (e.g. word meanings, synonyms, semantic prosodies, cultural associations)
• skill-focused activities (e.g. developing reading strategies)
• reference activities (e.g. corpus as writing aid)
• research activities (e.g. variation studies, cultural studies, CLIL)
(cf. Aston 2002)
Illustrations
(Thanks to Prof Guy Aston, University of Bologna)
Learners aren’t linguists

The aim is NOT to provide
• a complete description of all the data
• maximum generalisations/abstractions

The aim IS to take away
• usable partial generalisations
• memorable experiences
• enthusiasm



Can BNC-xml, used with Xaira, provide these?
Examples focussing on Xaira improvements
From experience with advanced learners of English
Example 1: Grammar

To + gerund
• used to -ing, accustomed to -ing, look forward to -ing, object to -ing

Xaira AddKey Query allows you to look for any
word with a specified POS value!

To + VVG|VBG|VDG|VHG
The AddKey query (any VVG)
Or VBG, or VDG, or VHG (multiple selections)…
QueryBuilder: To NEXT VVG
Or VBG, or VDG, or VHG …
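The QueryBuilder pattern above amounts to: find the word to and keep it only if the next token carries one of the four C5 -ing tags (VVG, VBG, VDG, VHG). A minimal Python sketch of that matching logic, assuming the corpus has already been read into a list of (word, tag) pairs; the token list and the function name `to_plus_ing` are hypothetical, and this illustrates the query rather than how Xaira evaluates it.

```python
# Hypothetical toy data: (word, C5 tag) pairs typed in by hand.
# In BNC XML the tag would come from the c5 attribute of each <w> element.
tokens = [
    ("I", "PNP"), ("look", "VVB"), ("forward", "AV0"), ("to", "PRP"),
    ("seeing", "VVG"), ("you", "PNP"), (".", "PUN"),
    ("He", "PNP"), ("wants", "VVZ"), ("to", "TO0"), ("go", "VVI"), (".", "PUN"),
    ("When", "CJS"), ("it", "PNP"), ("comes", "VVZ"), ("to", "PRP"),
    ("being", "VBG"), ("honest", "AJ0"), (".", "PUN"),
]

# -ing forms of lexical verbs, BE, DO and HAVE in the C5 tagset
ING_TAGS = {"VVG", "VBG", "VDG", "VHG"}


def to_plus_ing(tokens):
    """Return (position of 'to', 'to + next word') for each 'to' followed by an -ing tag."""
    hits = []
    for i in range(len(tokens) - 1):
        word = tokens[i][0]
        next_word, next_tag = tokens[i + 1]
        if word.lower() == "to" and next_tag in ING_TAGS:
            hits.append((i, f"{word} {next_word}"))
    return hits


print(to_plus_ing(tokens))  # [(3, 'to seeing'), (15, 'to being')]
```

Matching on the word form of to, not its tag, picks up both prepositional to (PRP) and the infinitive marker (TO0), in line with the word-based query on the slide; "to go" is skipped because go is tagged VVI.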
Too many solutions
Random 30/14,227 (sorted 1L, i.e. on the first word to the left)
look forward to / when it comes to / devoted to / well on the way to
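Sorting on 1L groups hits that share the word immediately to the left of the node, which is what makes frames like look forward to and when it comes to stand out in a random sample. A Python sketch, assuming each hit is stored as a hypothetical (left context, node, right context) triple of word lists; the sample lines are invented for illustration.

```python
# Hypothetical concordance lines: (left context, node, right context)
hits = [
    (["he", "was", "well", "on", "the", "way"], "to", ["becoming", "famous"]),
    (["she", "looked", "forward"], "to", ["meeting", "them"]),
    (["when", "it", "comes"], "to", ["buying", "houses"]),
    (["they", "were", "devoted"], "to", ["helping", "others"]),
    (["I", "look", "forward"], "to", ["hearing", "from", "you"]),
]


def sort_1l(hits):
    """Sort lines on the first word to the left of the node (1L), case-insensitively.

    Python's sort is stable, so lines with the same 1L word keep their original order.
    """
    return sorted(hits, key=lambda h: h[0][-1].lower() if h[0] else "")


# Print a simple KWIC display with the left context right-aligned
for left, node, right in sort_1l(hits):
    print(f"{' '.join(left):>30} [{node}] {' '.join(right)}")
```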
To + V.G is associated with formal written texts …
Collocates of to + V.G (1,0):
by frequency
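A span such as (1,0) means one word to the left of the node and none to the right; counting what falls in that window and ranking by frequency gives the collocate list. A Python sketch, assuming a list of (word, C5 tag) tokens and treating the position of to as the node (a simplification of a two-word node); the data and the names `is_node` and `collocates` are hypothetical, and Xaira's own collocation statistics may go beyond raw frequency.

```python
from collections import Counter

ING_TAGS = {"VVG", "VBG", "VDG", "VHG"}

# Hypothetical (word, C5 tag) stream containing three "to + V.G" hits
tokens = [
    ("I", "PNP"), ("look", "VVB"), ("forward", "AV0"), ("to", "PRP"), ("seeing", "VVG"),
    ("when", "CJS"), ("it", "PNP"), ("comes", "VVZ"), ("to", "PRP"), ("being", "VBG"),
    ("we", "PNP"), ("look", "VVB"), ("forward", "AV0"), ("to", "PRP"), ("hearing", "VVG"),
]


def is_node(tokens, i):
    """A node is 'to' immediately followed by an -ing tag."""
    return (tokens[i][0].lower() == "to" and i + 1 < len(tokens)
            and tokens[i + 1][1] in ING_TAGS)


def collocates(tokens, is_node, left=1, right=0):
    """Count word forms up to `left` tokens before and `right` tokens after each node.

    The node token itself is not counted; results come back ordered by frequency.
    """
    counts = Counter()
    for i in range(len(tokens)):
        if not is_node(tokens, i):
            continue
        for j in range(max(0, i - left), min(len(tokens), i + right + 1)):
            if j != i:
                counts[tokens[j][0].lower()] += 1
    return counts.most_common()


print(collocates(tokens, is_node))  # [('forward', 2), ('comes', 1)]
```

With left=0, right=3 and a node test for tend, the same function would approximate a (0,3) window, before any filtering of collocates by part of speech.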
Learners should take something away which is
• relevant
• memorable
• typical
• not over-general
E.g.
• The French are the meanest when it comes to sending Christmas cards
• When it comes to buying houses, the British are keenest of all
• I’m not exactly the archetypal Mills & Boon dark stranger when it comes to courting girls
Example 2: the verb tend

• Missing from textbooks (Carter & McCarthy 1995)

• Frequent (>100 per million words), widely distributed
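Frequent (>100 per million words) and widely distributed are two different measures: one normalises the raw count by corpus size, the other asks how many texts the word occurs in at all. A minimal Python sketch of both, with made-up figures rather than real BNC counts for tend; `text_range` is a hypothetical name for one simple dispersion measure, and more refined dispersion statistics exist.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def text_range(hits_per_text):
    """Share of texts with at least one hit: a simple measure of how widely a word is spread."""
    if not hits_per_text:
        return 0.0
    with_hits = sum(1 for n in hits_per_text.values() if n > 0)
    return with_hits / len(hits_per_text)


# Hypothetical figures, not real BNC counts for tend
print(per_million(12_000, 100_000_000))                              # 120.0
print(text_range({"text1": 3, "text2": 0, "text3": 1, "text4": 2}))  # 0.75
```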

How is it used?
Too many solutions? Try Collocation/Analysis
VERB collocates (0,3): by frequency
TEND to concentrate (30/96)
Colligates (0,2: lemmata)
Just what nouns?
SUBST collocates (lemmata: 0,2)
Tend * SUBST collocates (25/352)
An odd list of nouns

You can tend:
• gardens
• cattle/sheep/flocks
• fires

Things can tend:
• to unity/infinity
• to sort of VERB
Isn’t this all in the dictionary?

Perhaps, but the corpus gives
• More frequency/distribution information
• More examples
• Access to wider contexts
• Practice in working things out for yourself
• Casual encounters – did you know tend to unity?
The corpus calls for an open mind – you regularly find
the answer to a different question from the one you
started off with … but you learn a lot in the process
A problematizing resource...

Use of the BNC (or comparable resources)
• complements (and corrects) intuition
• critiques the myth of the native speaker
• increases learner autonomy

For teacher and learner alike
The ins and outs of autonomous use


Learners may need to be warned to...
• focus on patterns which recur, without necessarily trying to explain all the data
• avoid overgeneralisation
... and encouraged to
• be curious
• browse the context
• investigate exceptions
(Dis)confirming intuition

about choices
• have a problem + infinitive or gerund?
• do you make or take decisions?

about vocabulary
• which nouns collocate with hard?

about grammar
• I would be grateful if you [modal]?
Corpus use allows the learner to become a researcher
What about the researcher?
What is the starting point?

Have an answer – illustrate with corpus examples
• Plant has several meanings

Have a hypothesis – test it on the corpus
• Plant is usually a noun

Examine patterns in the corpus – corpus as starting point
• Which is the most frequent use of plant?
Know the corpus

What does it contain?
• Types of texts
• Sampling strategies
• More than text?

Potential weaknesses
How to search it
• There are different options
Corpus research
• Often form-based
• Often includes frequency data
• Has to rely on what is in the corpus + analysis of data
• Can be reproduced
• Can be part of a (larger) study
Corpus research – some components
• Idea
• Plan
• Pilot/sample searches
• Materials gathering
• Analysis
• Evaluation
• Conclusion
Start with some sample searches

How much (relevant) material is there in the corpus?
If not enough – what can you do?
• Find more data
• Reconsider research idea

If too much – what can you do?
• Only examine a part (of corpus or results)
Subcorpus

A part of the corpus selected according to defined criteria
• All texts of a particular kind
• A proportion of (certain) texts
• A mix of texts not readily available in the corpus

It may be possible to use partitions in Xaira (a subcorpus can be saved and indexed separately, or used with other tools)
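Selecting a subcorpus by defined criteria comes down to filtering texts on their metadata and keeping track of how many words the selection contains. A Python sketch, assuming a hypothetical `Text` record with only three metadata fields; real BNC headers classify texts in far more detail, and this is not Xaira's partition mechanism.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical, much simplified text-level metadata."""
    text_id: str
    mode: str    # "written" or "spoken"
    genre: str
    words: int


texts = [
    Text("t1", "written", "newspaper", 40_000),
    Text("t2", "spoken", "conversation", 15_000),
    Text("t3", "written", "newspaper", 38_000),
    Text("t4", "written", "fiction", 45_000),
]


def subcorpus(texts, **criteria):
    """Keep the texts whose metadata match every criterion, e.g. mode='written'."""
    return [t for t in texts
            if all(getattr(t, field) == value for field, value in criteria.items())]


news = subcorpus(texts, mode="written", genre="newspaper")
print([t.text_id for t in news], sum(t.words for t in news))  # ['t1', 't3'] 78000
```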
Random set of examples

• Each example equally likely to be selected
• From the whole corpus or a subset
• A random sample is (potentially) different each time
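Random sampling gives every hit the same chance of selection, which is also why two runs usually differ. A Python sketch using the standard `random` module; `random_sample` is a hypothetical helper, and fixing the seed (an illustrative choice, not a Xaira setting) makes a sample repeatable so that it can be documented and redrawn.

```python
import random


def random_sample(hits, n, seed=None):
    """Draw n hits without replacement; every hit has the same chance of being chosen.

    With seed=None each call can give a different sample; a fixed seed repeats it.
    """
    if n >= len(hits):
        return list(hits)
    rng = random.Random(seed)
    return rng.sample(hits, n)


hits = list(range(14_227))  # stand-ins for 14,227 concordance lines
first = random_sample(hits, 30, seed=42)
again = random_sample(hits, 30, seed=42)
print(first == again)  # True: same seed, same sample
```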
Gather data
Make sure to document what you do! For your own sake, and to make it possible to repeat the study.
Too many hits?

Sample the corpus or the search result.
Restrict
• what you are searching (use part of the corpus)
• what you examine (in detail)
Use annotation (or not?)

The annotation and metadata can help restrict your sample
• plant as verb
• spoken by women, written in newspapers etc.

Be aware of limitations
• errors
• unlabelled material

Check reliability
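Restricting plant to verb uses means keeping hits with a verb tag, but the BNC's ambiguity tags (such as NN1-VVB) fit neither class cleanly. A Python sketch that sets those aside for manual checking instead of guessing; the hit list is invented, and treating every hyphenated tag as "ambiguous" is a simplifying assumption, not the BNC's own counting rule.

```python
VERB_TAGS = {"VVB", "VVD", "VVG", "VVI", "VVN", "VVZ"}  # lexical verb tags in C5


def classify(tag):
    """Map a C5 tag to 'verb', 'noun', 'ambiguous' or 'other'."""
    if "-" in tag:           # ambiguity tags such as NN1-VVB
        return "ambiguous"
    if tag in VERB_TAGS:
        return "verb"
    if tag.startswith("NN") or tag == "NP0":
        return "noun"
    return "other"


# Hypothetical hits for the lemma PLANT: (word form, C5 tag)
hits = [
    ("planted", "VVD"), ("plant", "NN1"), ("plants", "NN2"),
    ("plant", "NN1-VVB"), ("planting", "VVG"), ("plants", "NN2-VVZ"),
]

verbs = [w for w, t in hits if classify(t) == "verb"]
to_check = [(w, t) for w, t in hits if classify(t) == "ambiguous"]
print(verbs)     # ['planted', 'planting']
print(to_check)  # [('plant', 'NN1-VVB'), ('plants', 'NN2-VVZ')]
```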
Look at the data!

• Qualitative and quantitative analyses
• Read/examine examples (both expected and unexpected ones)
• Evaluate reliability (teenagers talking about Norwegians)
• Look for trends and tendencies
‘Counting words’ – numbers and statistics
• Proportions
• Frequency per XXX words
• Trends vs. absolute truths
• Illustrations can help
Plant
• 8,123 or 17,178 instances (form vs lemma)
• 14,572 nouns, 2,605 verbs
• SUBST = 85%, just over 5/6
Plant figures

Noun tags (plant*)
Tag        Count     %
NN0            1    0%
NN1        6,338   43%
NP0           38    0%
NN1-VVB    1,054    7%
NN2        7,115   49%
NN2-VVZ       26    0%

[Charts: PLANT by word class (subst, verb, unc); noun tags (plant*)]
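Each percentage in the table is a tag's count divided by the 14,572 noun-tagged instances, and the SUBST figure is 14,572 divided by the 17,178 instances of the lemma. A small Python check of that arithmetic, using only the counts given above; `shares` is a hypothetical helper name, and rounding to whole percentages is why the rare tags show as 0%.

```python
noun_tags = {"NN0": 1, "NN1": 6338, "NP0": 38, "NN1-VVB": 1054, "NN2": 7115, "NN2-VVZ": 26}
LEMMA_TOTAL = 17178  # all instances of the lemma PLANT


def shares(counts):
    """Return each count as a proportion of the total."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


noun_total = sum(noun_tags.values())  # 14,572
for tag, share in shares(noun_tags).items():
    print(f"{tag:8} {noun_tags[tag]:>6,} {share:>4.0%}")
print(f"SUBST: {noun_total:,} of {LEMMA_TOTAL:,} = {noun_total / LEMMA_TOTAL:.0%}")
```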
Document
• Corpus used
• Search strategy/method
• Number of instances found
• Number of instances examined
• Selection criteria
• Evaluation
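The checklist can be kept as a small structured record saved alongside the results. A Python sketch, assuming a hypothetical `SearchRecord` with one field per checklist item; the example values reuse the to + V.G search from Example 1 purely for illustration, and the seed is invented.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SearchRecord:
    """One documented search; field names are hypothetical, one per checklist item."""
    corpus: str
    search: str
    found: int
    examined: int
    selection: str
    evaluation: str

    def __post_init__(self):
        # You cannot examine more instances than the search found.
        if not 0 <= self.examined <= self.found:
            raise ValueError("examined must be between 0 and found")


record = SearchRecord(
    corpus="BNC XML edition (illustrative entry)",
    search="to NEXT VVG|VBG|VDG|VHG (Xaira QueryBuilder)",
    found=14227,
    examined=30,
    selection="random sample, seed 42 (hypothetical)",
    evaluation="(notes on reliability go here)",
)
print(json.dumps(asdict(record), indent=2))
```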