teachResearch - British National Corpus
Using the BNC
for teaching and research
Teaching and learning
Where do corpora fit in?
As a (teacher) reference tool
As a teaching aid in the classroom
Replace/supplement teacher intuition
Place native/non-native speaking teachers on equal terms
As a self-access learning aid
Find out about the language/the culture for yourself (data-driven
learning)
• Hypothesis testing
• Hypothesis generation
Contradict the teacher
What teachers can do with corpora
• Create informative materials: corpus-based explanations (teacher consults corpus-based reference works or corpora)
• Create illustrative materials: corpus-based examples, e.g. concordances (teacher consults corpora)
• Create corpus-based tasks and exercises (on paper or computer)
• Use corpora as a basis for correction and feedback
• Do contrastive analyses (lexico-grammatical studies)
(cf. Mukherjee 2002)
What learners can do with corpora
• form-focused activities ('data-driven learning')
• meaning-focused activities (e.g. word meanings, synonyms, semantic prosodies, cultural associations)
• skill-focused activities (e.g. developing reading strategies)
• reference activities (e.g. corpus as writing aid)
• research activities (e.g. variation studies, cultural studies, CLIL)
(cf. Aston 2002)
Illustrations
(Thanks to Prof Guy Aston,
University of Bologna)
Learners aren’t linguists
The aim is NOT to provide
a complete description of all the data
maximum generalisations/abstractions
The aim IS to take away
usable partial generalisations
memorable experiences
enthusiasm
Can using BNC-xml with Xaira provide these?
Examples focussing on Xaira improvements
From experience with advanced learners of English
Example 1: Grammar
To + gerund
used to -ing, accustomed to -ing, look forward to -ing, object to -ing
Xaira AddKey Query allows you to look for any
word with a specified POS value!
To + VVG|VBG|VDG|VHG
The AddKey query (any VVG)
Or VBG, or VDG, or VHG (multiple selections)…
QueryBuilder: To NEXT VVG
Or VBG, or VDG, or VHG …
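Outside Xaira, the same pattern can be approximated over any POS-tagged token list. The following is a minimal Python sketch, not Xaira's own query mechanism: it assumes tokens are available as (word, tag) pairs using the tags named on the slide, and both the sample sentence and the function name find_to_ing are invented for illustration.

```python
# Hypothetical sketch: find "to" directly followed by an -ing verb form.
# The tag set (VVG, VBG, VDG, VHG) is taken from the slide; the data is invented.
ING_TAGS = {"VVG", "VBG", "VDG", "VHG"}

def find_to_ing(tagged_tokens):
    """Return (index, 'to', next_word) for each 'to' followed by an -ing verb."""
    hits = []
    for i in range(len(tagged_tokens) - 1):
        word, _tag = tagged_tokens[i]
        next_word, next_tag = tagged_tokens[i + 1]
        if word.lower() == "to" and next_tag in ING_TAGS:
            hits.append((i, word, next_word))
    return hits

# Invented example: one "to + -ing" and one "to + infinitive".
sample = [
    ("I", "PNP"), ("look", "VVB"), ("forward", "AV0"), ("to", "PRP"),
    ("seeing", "VVG"), ("you", "PNP"), ("and", "CJC"), ("want", "VVB"),
    ("to", "TO0"), ("go", "VVI"),
]
print(find_to_ing(sample))  # [(3, 'to', 'seeing')]
```

Matching on the tag of the following word, rather than on an "-ing" spelling, is what keeps nouns such as "thing" or "king" out of the results.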
Too many solutions
Random 30/14227 (sort 1L)
look forward to / when it comes to / devoted to / well on the way to
To + V.G is written, formal …
Collocates of to + V.G (1,0):
by frequency
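A collocate list of this kind can be sketched in a few lines of Python. This is an illustration only: it reads the span (1,0) as one word to the left and none to the right of "to", which is an assumption about the notation, it uses a crude spelling test in place of the POS query, and the text and the function names is_to_ing and collocates are invented.

```python
from collections import Counter

def is_to_ing(tokens, i):
    """Crude stand-in for a POS query: 'to' followed by a word ending in -ing."""
    return (
        tokens[i].lower() == "to"
        and i + 1 < len(tokens)
        and tokens[i + 1].lower().endswith("ing")
    )

def collocates(tokens, is_node, left=1, right=0):
    """Count words up to `left` before and `right` after each node position."""
    counts = Counter()
    for i in range(len(tokens)):
        if not is_node(tokens, i):
            continue
        before = tokens[max(0, i - left):i]
        after = tokens[i + 1:i + 1 + right]
        counts.update(w.lower() for w in before + after)
    return counts

# Invented mini-text, loosely modelled on the phrases found in the sample.
text = ("when it comes to buying houses and when it comes to sending cards "
        "I look forward to seeing you")
tokens = text.split()

# List collocates by frequency, most frequent first.
for word, n in collocates(tokens, is_to_ing, left=1, right=0).most_common():
    print(word, n)
```

On this toy text the list is headed by "comes" (twice) and then "forward" (once), which mirrors the way the real collocate list surfaces "when it comes to" and "look forward to".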
Learners should take something
away which is
relevant
memorable
typical
not over-general
E.g.
The French are the meanest when it comes to sending
Christmas cards
When it comes to buying houses, the British are keenest of
all
I’m not exactly the archetypal Mills & Boon dark stranger
when it comes to courting girls
Example 2: the verb tend
Missing from textbooks (Carter & McCarthy
1995)
Frequent (>100/M), widely distributed
How is it used?
Too many solutions? Try
Collocation/Analysis
VERB collocates (0,3): by frequency
TEND to concentrate (30/96)
Colligates (0,2: lemmata)
Just what nouns?
SUBST collocates (lemmata: 0,2)
Tend * SUBST collocates (25/352)
An odd list of nouns
You can tend:
gardens
cattle/sheep/flocks
fires
Things can tend:
to unity/infinity
to sort of VERB
Isn’t this all in the dictionary?
Perhaps, but the corpus gives
More frequency/distribution information
More examples
Access to wider contexts
Practice in working things out for yourself
Casual encounters – did you know tend to unity?
The corpus calls for an open mind – you regularly find
the answer to a different question from the one you
started off with … but you learn a lot in the process
A problematizing resource...
Use of the BNC (or comparable
resources)
complements (and corrects) intuition
critiques the myth of the native speaker
increases learner autonomy
For teacher and learner alike
The ins and outs of autonomous use
Learners may need warning to...
• focus on patterns which recur, without
necessarily trying to explain all the data
• avoid overgeneralisation
... and encouragement to
• be curious
• browse the context
• investigate exceptions
(Dis)confirming intuition
about choices
have a problem + infinitive or gerund?
do you make or take decisions?
about vocabulary
which nouns collocate with hard?
about grammar
I would be grateful if you [modal]?
Corpus use allows the learner
to become researcher
What about the researcher?
What is the starting point?
Have answer – illustrate with corpus examples
Plant has several meanings
Have hypothesis – test on corpus
Plant is usually a noun
Examine patterns in corpus – corpus as
starting point
Which is the most frequent use of plant?
Know the corpus
What does it contain?
Types of texts
Sampling strategies
More than text?
Potential weaknesses
How to search it
There are different options
Corpus research
Often form-based
Often includes frequency data
Has to rely on what is in the corpus +
analysis of data
Can be reproduced
Can be part of (larger) study
Corpus research
– some components
Idea
Plan
Pilot/sample searches
Materials gathering
Analysis
Evaluation
Conclusion
Start with some sample searches
How much (relevant) material is there in the
corpus?
If not enough – what can you do?
Find more data
Reconsider research idea
If too much – what can you do?
Only examine a part (of corpus or results)
Subcorpus
A part of the corpus selected according to
defined criteria
All texts of a particular kind
A proportion of (certain) texts
Mix of texts not readily available in corpus
May be possible to use partitions in Xaira
(subcorpus can be saved and indexed
separately or can be used with other tools)
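Selecting texts "according to defined criteria" amounts to filtering on text metadata. A minimal Python sketch of the idea follows; it is not how Xaira partitions work, and the text ids, the metadata fields (mode, medium) and the function name select_subcorpus are all hypothetical.

```python
# Hypothetical metadata records, one per text in the corpus.
texts = [
    {"id": "t1", "mode": "written", "medium": "newspaper"},
    {"id": "t2", "mode": "written", "medium": "book"},
    {"id": "t3", "mode": "spoken", "medium": "conversation"},
    {"id": "t4", "mode": "written", "medium": "newspaper"},
]

def select_subcorpus(texts, **criteria):
    """Keep only the texts whose metadata match every given criterion."""
    return [
        t for t in texts
        if all(t.get(field) == value for field, value in criteria.items())
    ]

newspapers = select_subcorpus(texts, mode="written", medium="newspaper")
print([t["id"] for t in newspapers])  # ['t1', 't4']
```

Writing the criteria down as explicit field/value pairs also gives you the selection criteria to report later.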
Random set of examples
Each example equally possible
From whole corpus or sub-set
Random sample = each time (potentially)
different
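Because a random sample is potentially different each time, it helps to make the draw repeatable. The sketch below uses Python's standard random module with a recorded seed; the hit list is a stand-in (plain numbers rather than real concordance lines), the sizes 30 and 14227 echo the earlier to + V.G example, and the function name sample_hits is invented.

```python
import random

def sample_hits(hits, k, seed=None):
    """Draw k hits without replacement; every hit is equally likely.

    Passing the same seed reproduces the same sample, so the seed
    can be noted down together with the rest of the search details.
    """
    rng = random.Random(seed)
    if k >= len(hits):
        return list(hits)  # fewer hits than requested: keep them all
    return rng.sample(hits, k)

hits = list(range(14227))            # stand-in for 14227 search results
subset = sample_hits(hits, 30, seed=42)
print(len(subset))                   # 30
```

Leaving seed as None gives a fresh sample on each run, which matches the default behaviour described on the slide.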
Gather data
Make sure to document what you do!
For your own sake, and to make it possible to repeat the study.
Too many hits?
Sample the corpus or the search result.
Restrict
what you are searching (use part of the corpus)
what you examine (in detail)
Use annotation (or not?)
The annotation and metadata can help
restrict your sample
plant as verb
spoken by women, written in newspapers etc
Be aware of limitations
errors
un-labelled material
Check reliability
Look at the data!
Qualitative and quantitative analyses
Read/examine examples (both expected
and unexpected ones)
Evaluate reliability (teenagers talking about
Norwegians)
Look for trends and tendencies
‘Counting words’ –
numbers and statistics
Proportions
Frequency per XXX words
Trends vs. absolute truths
Illustrations can help
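Normalising a raw count to a frequency per N words is simple arithmetic. A small Python sketch follows; the corpus size of 100,000,000 words is a round-number assumption used for illustration (check the exact word count of the corpus or subcorpus you actually searched), and the function name per_n_words is invented.

```python
def per_n_words(count, corpus_size, n=1_000_000):
    """Normalise a raw count to a frequency per n words (default: per million)."""
    return count * n / corpus_size

# Assumed round corpus size, for illustration only.
ASSUMED_CORPUS_SIZE = 100_000_000

# 17,178 is the lemma count for "plant" quoted in these slides.
print(per_n_words(17_178, ASSUMED_CORPUS_SIZE))
```

Normalised figures are what make counts from corpora or subcorpora of different sizes comparable.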
Plant
• 8,123 or 17,178 instances (form vs lemma)
• 14,572 nouns, 2,605 verbs
• SUBST = 85%, <5/6
Tag        Instances   Share
NN0                1      0%
NN1             6338     43%
NP0               38      0%
NN1-VVB         1054      7%
NN2             7115     49%
NN2-VVZ           26      0%
Plant figures
[Chart: Noun tags (plant*): NN0 1, NN1 6338, NP0 38, NN1-VVB 1054, NN2 7115, NN2-VVZ 26]
[Chart: PLANT by word class: subst, verb, unc]
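The noun-tag percentages for plant can be rechecked from the raw counts. The Python sketch below uses only the tag counts given in these slides; the variable names are illustrative.

```python
# Noun-tag counts for plant*, as quoted in the slides.
noun_tag_counts = {
    "NN0": 1,
    "NN1": 6338,
    "NP0": 38,
    "NN1-VVB": 1054,
    "NN2": 7115,
    "NN2-VVZ": 26,
}

total = sum(noun_tag_counts.values())  # 14572, the noun total on the slide
shares = {tag: n / total for tag, n in noun_tag_counts.items()}

for tag, n in noun_tag_counts.items():
    # Print tag, raw count and share as a whole-number percentage.
    print(f"{tag:8} {n:5d} {shares[tag]:5.0%}")
```

The counts sum to 14,572, the noun figure quoted for plant, so the table and the headline number agree.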
Document
Corpus used
Search strategy/method
Number of instances found
Number of instances examined
Selection criteria
Evaluation
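One way to keep this checklist consistent across searches is to fill in the same small record every time. The Python sketch below is only a suggestion: the class name SearchRecord and its field names are invented, and the example values reuse the to + V.G search from earlier in the slides.

```python
from dataclasses import dataclass, asdict

@dataclass
class SearchRecord:
    """One documented corpus search, mirroring the checklist on the slide."""
    corpus: str
    search_strategy: str
    instances_found: int
    instances_examined: int
    selection_criteria: str
    evaluation: str = ""  # filled in once the examples have been read

record = SearchRecord(
    corpus="BNC",
    search_strategy="to + VVG|VBG|VDG|VHG",
    instances_found=14227,
    instances_examined=30,
    selection_criteria="random sample of 30 hits",
)
print(asdict(record))
```

Keeping found and examined as separate numbers makes it obvious when conclusions rest on a sample rather than on every hit.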