Vocabulary use during conversation: a cross-sectional study of development amongst learners of Spanish and French
WORK IN PROGRESS – PLEASE DO NOT CITE
WITHOUT PERMISSION FROM THE AUTHORS
Emma Marsden, University of York,
[email protected]
Annabelle David, University of Newcastle,
[email protected]
Aims
• Document / describe progression – useful
for teaching practice and assessment
Also, indirectly and in the long term:
– Begin analysis of the use of formulaic language
– Does it interact with learning a generative grammar?
– Begin to explore the relationship between learners’ vocabulary and their morphosyntactic development
– L1 and early bilingual literature suggests a causal link
Outline
• The task, data and participants
• Results
1. General diversity of types & counts of tokens
2. Use of different word classes
– Nouns versus verbs
3. Diversity of inflections
4. Formulaic language
• Conclusions
Can we measure lexical knowledge
from oral corpus data?
• Size of vocabulary - no
– need to give them tests
• Richness, sophistication, rarity – yes, but not yet
– needs a comparison with word lists
– from a relevant corpus, i.e. oral data from L2 classroom learners – not available!
• Diversity, or lexical variation - YES!
– when they produce language, how often do they have
to repeat the same words?
– what is the balance of nouns, verbs, adjectives?
The task
• Photos task: semi-guided interview / conversation
– Descriptions of photos
– Questions about photos
– Discussion around photos, relating to past, current and future activities
The Participants
• English-speaking learners of French and Spanish
• From years 9 and 13 (approximately 230 and 600 hours of classroom instruction respectively)
• Twenty learners in each group in each language
– twenty final-year undergraduates in Spanish
– native controls: 15 age-matched Spanish natives and five adult French natives
– approximately 120 participants in total
The Data: Which bits of speech are ‘words’?
• Data excluded from the analysis
– Filled pauses (er…)
– Repeated language (with or without corrections)
– Imitations of the researcher
– Words in another language (e.g. French & English)
• Data included
– Made-up or incorrect words, e.g. mi hermano nadar (for nada)
– Final repair
– Some lemma (stem) counts: va, vamos = 1
– Some whole-word counts: va, vamos = 2
– Some counts of just the inflections
Types & tokens
• Ojos…ojos
– 1 lexical type
– 2 lexical tokens
• Mira…miran…miras…
– 1 lexical type (lemma: mirar)
– 3 lexical tokens
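The word-versus-lemma counting conventions above can be sketched in a few lines of Python. The `LEMMAS` table is a hypothetical stand-in for proper lemmatisation (the study's own counts come from CLAN), and `types_and_tokens` is an illustrative name, not a CLAN command.

```python
from collections import Counter

# Hypothetical lemma lookup, for illustration only; real transcripts
# need a full morphological analysis rather than a hand-made table.
LEMMAS = {
    "ojos": "ojo",
    "mira": "mirar", "miran": "mirar", "miras": "mirar",
    "va": "ir", "vamos": "ir",
}


def types_and_tokens(words, lemmatise=False):
    """Return (number of types, number of tokens) for a list of words.

    With lemmatise=True, inflected forms of the same lemma count as one type.
    """
    items = [LEMMAS.get(w, w) for w in words] if lemmatise else list(words)
    counts = Counter(items)
    return len(counts), sum(counts.values())


print(types_and_tokens(["ojos", "ojos"]))                            # (1, 2)
print(types_and_tokens(["mira", "miran", "miras"], lemmatise=True))  # (1, 3)
print(types_and_tokens(["va", "vamos"], lemmatise=True))             # (1, 2)
print(types_and_tokens(["va", "vamos"]))                             # (2, 2)
```

The last two calls reproduce the lemma count (va, vamos = 1) and the whole-word count (va, vamos = 2) from the data slide.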
Types and tokens based on LEMMAS
Group (n) | Tokens Sp | Tokens Fr | Different types Sp | Different types Fr | TTR Sp | TTR Fr
year 9 (20) | 194 (117) | 230 (115) | 64 (28) | 65 (24) | .387* (.122) | .311 (.092)
year 13 (20) | 523 (134) | 529 (191) | 155 (32) | 142 (38) | .300* (.028) | .279 (.038)
st. dev in brackets
• The more diverse the speech, the higher the TTR.
• According to TTR, year 9 learners have a more diverse vocabulary than year 13 learners in Spanish, and there is no difference in French!
• TTR is problematic and not a valid measure here, because it falls as the number of tokens grows (but it is standard output of the CLAN FREQ command)
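A toy Python illustration of that length effect (the inputs are invented, not study data): a speaker cycling through the same 50 words scores a lower TTR over 500 tokens than over 40, even though the longer sample contains more types.

```python
def ttr(tokens):
    """Type-token ratio: distinct items divided by total items."""
    return len(set(tokens)) / len(tokens)


# Invented example: a speaker with a fixed repertoire of 50 words.
repertoire = [f"w{i}" for i in range(50)]
short_sample = repertoire[:40]   # 40 tokens, 40 types
long_sample = repertoire * 10    # 500 tokens, 50 types

print(ttr(short_sample))  # 1.0
print(ttr(long_sample))   # 0.1
```

This is the same pattern as the year 13 learners, who produce far more tokens than year 9 and so are penalised by raw TTR.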
Compensating for influence of text length
• Guiraud index
– Types / √tokens
• D
– Uses random sampling of tokens to plot a curve of TTR against increasing token sample size
– Calculated by vocd in the CLAN software
– Usually correlates well with Guiraud
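Both corrections can be sketched in Python. The D part follows our understanding of the published vocd procedure (Malvern et al., 2004): for sample sizes of 35 to 50 tokens, average the TTR over 100 random samples, fit the curve TTR = (D/N)[(1 + 2N/D)^½ − 1] by least squares, and average D over three runs. It is not CLAN's code: the ternary-search fit, the D search range of 1 to 500, and the function names are our own assumptions, so values will not match vocd output exactly.

```python
import math
import random


def guiraud(tokens):
    """Guiraud index: number of types divided by the square root of the number of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def model_ttr(n, d):
    """TTR predicted for an n-token sample by the curve TTR = (D/N)[sqrt(1 + 2N/D) - 1]."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def fit_d(points, lo=1.0, hi=500.0, iters=200):
    """Least-squares fit of D to (n, mean TTR) points by ternary search.

    Assumes the squared error has a single minimum in [lo, hi].
    """
    def sse(d):
        return sum((t - model_ttr(n, d)) ** 2 for n, t in points)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sse(m1) < sse(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def estimate_d(tokens, sizes=range(35, 51), samples=100, trials=3, seed=0):
    """Simplified vocd-style D; tokens must be a list with at least max(sizes) items."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        points = []
        for n in sizes:
            # Mean TTR over random samples of n tokens drawn without replacement.
            mean_ttr = sum(len(set(rng.sample(tokens, n))) / n
                           for _ in range(samples)) / samples
            points.append((n, mean_ttr))
        estimates.append(fit_d(points))
    return sum(estimates) / len(estimates)
```

A higher D means more diverse vocabulary; a transcript shorter than 50 tokens makes `rng.sample` raise `ValueError`, which parallels the minimum-length requirements of the real tool.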
D
Group (n) | D based on words Sp | D based on words Fr | D based on lemmas Sp | D based on lemmas Fr
year 9 (17) | 35.40 (8.55) | 29.40 (11.41) | 24.31 (5.81) | 20.98 (6.82)
year 13 (20) | 56.62 (12.99) | 54.57 (13.77) | 38.63 (7.92) | 35.60 (57.34)
undergraduates (20) | | | |
Natives | | | |
st. dev in brackets
Results 2: Use of word class types
1. Basic descriptions of use: nouns, verbs, adjectives, interrogative pronouns, adverbs
2. How ‘nouny’ are their productions?
• What proportion of word types belong to a certain class?
• What is the density of different word classes in total productions?
3. Is the diversity of nouns different to the diversity of verbs?
4. Do these give any indication of progression?
Basic description:
Adjectives and adverbs
Types of
adjectives*
Sp
Tokens of
adjectives*
Fr
Sp
Types of
adverbs**
Fr
Sp
Tokens of
adverbs**
Fr
Sp
Year 9
(20)
1.5 1.6
2.3 2.0
4.1
2.6
(1.4)
(2.6)
(2.4)
(2.8)
Year 13
(20)
10. 8.4
(4.8)
6
14. 11.7
(6.7)
6
12.1
23.6
(2.7)
(9.7)
(4.4)
(6.6)
(1.7)
*lemmas, not colours
(2.3)
**lemmas, not y/n
Fr
Basic description:
Creo que, el hombre que…,
and interrogative pronouns
Types of Tokens of
interrogative interrogative
Tokens of que as
pronouns (lemmas) pronouns (lemmas)
conjunction +
relative
Sp
Year 9 (20)
Year 13 (20)
Fr
Sp
Fr
Sp
0.7
1.2
2.6
(1.3)
(0.9)
(2.5)
6.8
1.8
5.9
(6.6)
(0.8)
(3.2)
Fr
A nouny style
Year 9 (230 hours of instruction)
*P02: two chi eh dos chicos un camisa Southampton.
*MJA: now I would like you to ask me questions about the pictures so...
*P02: hermanos ?
*MJA: eh estos son hermanos sí mmm.
How much speech is nouns & verbs?
Group (n) | Noun tokens / total tokens Sp | Fr | Noun types / total types* Sp | Fr | Verb types / total types Sp | Fr | Verb tokens / total tokens Sp | Fr
year 9 (20) | 34% (11) | 28% (5) | 28% (11) | 17% (4) | 12% (4) | 12% (4) | 15% (5) | 16% (5)
year 13 (20) | 28% (4) | 25% (2) | 18% (4) | 12% (2) | 15% (2) | 15% (2) | 18% (3) | 19% (3)
st. dev in brackets
*all are based on lemmas
Proportion of types out of all types (see e.g. Kauschke and Hofmeister, 2002)
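The proportions in this table can be computed from a word-class-tagged transcript roughly as follows. The `(lemma, tag)` input format and the `"n"`/`"v"` tags are hypothetical, not CLAN's %mor tier format; a type here is a distinct (lemma, tag) pair, so the type proportions are lemma-based as in the table.

```python
def class_proportions(tagged, word_class):
    """Return (class tokens / all tokens, class types / all types).

    tagged is a list of (lemma, tag) pairs; a type is a distinct
    (lemma, tag) pair, so the proportions are lemma-based.
    """
    if not tagged:
        raise ValueError("empty transcript")
    class_tokens = sum(1 for _, tag in tagged if tag == word_class)
    types = set(tagged)
    class_types = sum(1 for _, tag in types if tag == word_class)
    return class_tokens / len(tagged), class_types / len(types)


# Hypothetical tagging of a short learner utterance ("n" = noun, "v" = verb).
utterance = [("chico", "n"), ("chico", "n"), ("jugar", "v"), ("camisa", "n")]
print(class_proportions(utterance, "n"))  # (0.75, 0.6666666666666666)
print(class_proportions(utterance, "v"))  # (0.25, 0.3333333333333333)
```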
How much speech is adjectives?
Group (n) | Adj tokens / all tokens Sp | Adj tokens / all tokens Fr | Adj types / all types Sp | Adj types / all types Fr
year 9 (20) | 1.0% | 0.7% | 1.9% | 2.0%
year 13 (20) | 2.7% | 2.1% | 6.7% | 5.6%
undergraduate (20) | | | |
Natives | | | |
Comparing diversity of noun types
to diversity of verb types
• Malvern et al. (2004) propose the ‘Limiting Relative Diversity’ calculation to compare the diversity of different word classes when the token samples differ in size
• Implemented by the CLAN vocd software
– Square root of the diversity of one word class divided by the diversity of the other
– Needs at least 50 noun tokens and 50 verb tokens
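A minimal Python sketch of the LRD calculation as described, assuming a D value is already available for each word class. `estimate_d` is a placeholder for whatever D estimator is used (for example vocd output); the function names are our own.

```python
import math

MIN_TOKENS = 50  # minimum tokens per word class, as stated on the slide


def limiting_relative_diversity(d_verbs, d_nouns):
    """LRD (verbs / nouns): square root of D for verbs divided by D for nouns."""
    return math.sqrt(d_verbs / d_nouns)


def lrd_for_learner(verb_tokens, noun_tokens, estimate_d):
    """Return LRD, or None if either class has fewer than MIN_TOKENS tokens.

    estimate_d is any function mapping a token list to a D value;
    it is a placeholder here, not a real library call.
    """
    if len(verb_tokens) < MIN_TOKENS or len(noun_tokens) < MIN_TOKENS:
        return None
    return limiting_relative_diversity(estimate_d(verb_tokens), estimate_d(noun_tokens))
```

The `None` case corresponds to learners excluded by the 50-token threshold, which is presumably why the year 9 groups in the LRD table are so small.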
Limiting relative diversity
Group (n)
year 9
(n Sp = 5)
(n Fr = 4)
year 13
(n Sp = 18)
(n Fr= 13)
LRD (verbs / nouns)
Sp
Fr
.366
.353
(.061)
.425
(.089)
(.095)
.341
(.073)
• NO statistically significant differences between year 9 and year 13 in the diversity of verbs relative to nouns.
• Unreliable (small sample sizes)? or
• New nouns and verbs learnt at the same rate??
• BUT LRD correlates well with:
– proportion of verb types / total types (r = .786**)
– verb tokens / total tokens (r = .862**)
– verb/noun ratio (r = .862**)
– and these all DO increase between years 9 and 13
• Need UG & native data to validate LRD
Results 3: Inflectional diversity
• Inflectional diversity
– total number of words − total stem forms = number of inflectional variations on stem forms
– i.e. how well are the learners manipulating stem forms?
• See Malvern et al. (2004).
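The subtraction can be sketched in Python in two ways: as a raw count of distinct word forms minus distinct stems, following the wording above, and as D on words minus D on lemmas, following the column heading of the Inflectional Diversity table. The `LEMMAS` table and the `estimate_d` argument are hypothetical placeholders.

```python
# Hypothetical lemma lookup, for illustration only.
LEMMAS = {"mira": "mirar", "miran": "mirar", "miras": "mirar",
          "va": "ir", "vamos": "ir", "ojos": "ojo"}


def to_lemmas(words):
    """Map each word form to its (hypothetical) lemma; unknown forms map to themselves."""
    return [LEMMAS.get(w, w) for w in words]


def inflectional_variants(words):
    """Distinct word forms minus distinct stems (lemmas)."""
    return len(set(words)) - len(set(to_lemmas(words)))


def inflectional_diversity(words, estimate_d):
    """D on word forms minus D on lemmas; estimate_d is a placeholder D function."""
    return estimate_d(words) - estimate_d(to_lemmas(words))


sample = ["mira", "miran", "miras", "va", "vamos", "ojos"]
print(inflectional_variants(sample))  # 3
```

As a check against the D table: year 9 Spanish has D words 35.40 and D lemmas 24.31, and 35.40 − 24.31 = 11.09, the inflectional diversity reported for that group.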
Inflectional Diversity
Group (n) | Inflectional diversity (D words − D lemmas) Sp | Fr
year 9 (Sp n = 17; Fr n = 20) | 11.09 (4.38) | 8.4 (5.3)
year 13 (20) | 17.99 (5.59) | 18.98 (7.49)
st. dev in brackets
Does verb use correlate with
inflectional diversity?
• Broeder, Extra and van Hout (1993) found that verb use indicates progression
– See also NSF data, and the argument by Myles (2004)
• Correlating lexical and inflectional diversity
with verb/noun proportions…
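The correlations reported here are Pearson coefficients over per-learner values (e.g. each learner's verb-type proportion against their inflectional diversity). A minimal Python sketch of the coefficient; significance testing (the ** markers) is a separate step and is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists (one value per learner)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list has no variation.
    return cov / math.sqrt(var_x * var_y)
```

In Python 3.10 and later, `statistics.correlation` in the standard library computes the same coefficient.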
Indicators of development?
• As learners use more verb types, they use
more inflections (strong positive
correlations)
• Inflectional diversity does not seem to
correlate with use of other word classes
• Nouns (tokens & types) decrease, verbs
increase (strong negative correlations)
Results 4: Formulaic language –
lexical items?
• Criteria for a ‘chunk’ (Myles et al., 1998)
– Greater length and complexity of sequence
compared with other learner output; usually
well-formed
– Often used inappropriately (syntactically,
semantically, pragmatically), e.g.
overextensions
Formulaic language (chunks): asking about people in photos
*P02: eh dónde vives ?
*MJA: mmm ellos ? …ellos viven en Southampton .
*P02: mmm cuántos años tienes ?
*MJA: eh ellos.
*P02: tú ?
*MJA: tienen doce y trece .
Chunks even when some verbs appear to be manipulated
*P03: come... lleva... están.... hacen ....están jugando...son...jugan...jugo ...tengo...voy...voy a ir
BUT THEN... eh cuánto años tienes ? (for how old is he?)
BUT later: mi hermano tiene once años y mi hermana que se llama Ellie y tiene ocho años
*MJA: qué haces un sábado normal en tu en tu vida ?
*P02: jugar al fútbol en mañana y salgo con mis amigos en tarde
• CONTEXT-DEPENDENT ACCURACY: CHUNKS, or item-by-item learning?
Conclusions
• The tasks in SPLLOC and FLLOC seemed to elicit
broadly similar language
• Greater verb density seems to indicate progression
• 450 more hours of instruction does make a significant difference
– both for vocabulary diversity and inflectional diversity
– previous comparisons across smaller gaps in instruction suggest no gains
• Formulaic language
– Evidence for item-by-item learning (constructionist)?
Limitations
• Only one measure of lexical knowledge –
productive, oral
• This quantitative approach doesn’t tell us
about accuracy of lexical or inflectional use
(e.g. gastar (spend) time)
• We can report a positive correlation between inflectional and lexical diversity, but these product data do not tell us whether an increase in vocabulary enables processing of morphosyntax
Future directions
• Comparisons with undergrads and native
controls
• A richness measure
– will be based on rarity WITHIN our own corpus
• Analysis of closed class items, using CLAN’s list
• Further exploration of the relationship between increased lexical knowledge, increased verb types and emerging morphosyntax