Tracking L2 Lexical and Syntactic Development
Xiaofei Lu
CALPER 2010 Summer Workshop
July 14, 2010
Outline
Lexical & syntactic complexity: The what and why
Syntactic complexity in EFL writing
Lexical complexity in EFL speaking
2
Lexical and Syntactic Complexity:
The What and Why
What is lexical and syntactic complexity?
Lexical complexity
A multidimensional feature of language use encompassing
lexical density, sophistication and variation (Wolfe-Quintero et
al. 1998; Read 2000)
Does not focus on errors, a dimension in Read’s (2000)
conceptualization of lexical richness
Syntactic complexity
The range of forms that surface in language production
and the degree of sophistication of such forms (Ortega 2003)
4
Why measure linguistic complexity?
First language acquisition & psycholinguistics
Studies of L1 developmental sequence
Objective measures of L1 developmental level
Ordering experimental stimuli by complexity
Relationship of complexity in childhood to symptoms of
Alzheimer’s disease (Kemper et al. 2001)
5
Why measure linguistic complexity?
Second language acquisition
Objective L2 developmental indices
Assessing cross-proficiency differences
Assessing effect of pedagogical intervention
Tracking L2 learners’ linguistic development over time
Relationship between lexical/syntactic complexity and
proficiency claimed in many test rating scales
6
Syntactic Complexity in EFL Writing
Lu, X. (forthcoming 2010). Automatic analysis of syntactic complexity in second
language writing. International Journal of Corpus Linguistics, 15.
Lu, X. (forthcoming 2010). A corpus-based evaluation of syntactic complexity
measures as indices of college-level ESL writers’ language development. TESOL
Quarterly, 44(4).
Outline
Measures of L2 syntactic complexity
L2 syntactic complexity analyzer
Syntactic complexity & EFL writing development
Summary
8
Measures of L2 syntactic complexity
Measures reviewed in two research syntheses
Wolfe-Quintero et al. (1998)
Ortega (2003)
Selection criterion
At least one previous study showed at least weak
correlation with or effect for proficiency
Issues among previous studies
Variation in measure selection and definition
Variation in experiment design
Inconsistent results reported on the same measures
9
Measures of L2 syntactic complexity
Length of production
1. Mean length of clause (MLC)
2. Mean length of sentence (MLS)
3. Mean length of T-unit (MLT)
Sentence complexity
4. Mean number of clauses per sentence (C/S)
10
Measures of L2 syntactic complexity
Subordination
5. Mean number of clauses per T-unit (C/T)
6. Mean number of complex T-units per T-unit (CT/T)
7. Mean number of dependent clauses per clause (DC/C)
8. Mean number of dependent clauses per T-unit (DC/T)
11
Measures of L2 syntactic complexity
Coordination
9. Mean number of coordinate phrases per clause (CP/C)
10. Mean number of coordinate phrases per T-unit (CP/T)
11. Mean number of T-units per sentence (T/S)
Particular grammatical structures
12. Mean number of complex nominals per clause (CN/C)
13. Mean number of complex nominals per T-unit (CN/T)
14. Mean number of verb phrases per T-unit (VP/T)
12
L2 syntactic complexity analyzer
Input: plain English text
Step 1: Parsing using Stanford parser
Step 2: Retrieving & counting occurrences of
Words, sentences, clauses, dependent clauses
T-units, complex T-units
Coordinate phrases, complex nominals, verb phrases
Step 3: Computing ratios for the 14 measures
Output: 14 syntactic complexity indices
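Step 3 can be sketched in a few lines of Python. This is only an illustration of how the 14 ratios follow from the nine unit counts listed in Step 2; the class name, field names and the zero-denominator convention are assumptions made here, not part of the analyzer described on the slide.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Raw counts for one text (hypothetical container, names are illustrative)."""
    words: int               # W
    sentences: int           # S
    clauses: int             # C
    dependent_clauses: int   # DC
    t_units: int             # T
    complex_t_units: int     # CT
    coordinate_phrases: int  # CP
    complex_nominals: int    # CN
    verb_phrases: int        # VP


def ratio(numerator, denominator):
    """Divide safely; returning 0.0 for an empty denominator is an assumption."""
    return numerator / denominator if denominator else 0.0


def complexity_indices(c):
    """Return the 14 syntactic complexity ratios keyed by their abbreviations."""
    return {
        "MLC": ratio(c.words, c.clauses),
        "MLS": ratio(c.words, c.sentences),
        "MLT": ratio(c.words, c.t_units),
        "C/S": ratio(c.clauses, c.sentences),
        "C/T": ratio(c.clauses, c.t_units),
        "CT/T": ratio(c.complex_t_units, c.t_units),
        "DC/C": ratio(c.dependent_clauses, c.clauses),
        "DC/T": ratio(c.dependent_clauses, c.t_units),
        "CP/C": ratio(c.coordinate_phrases, c.clauses),
        "CP/T": ratio(c.coordinate_phrases, c.t_units),
        "T/S": ratio(c.t_units, c.sentences),
        "CN/C": ratio(c.complex_nominals, c.clauses),
        "CN/T": ratio(c.complex_nominals, c.t_units),
        "VP/T": ratio(c.verb_phrases, c.t_units),
    }


# Made-up counts for a single short essay, for illustration only.
example = UnitCounts(words=300, sentences=15, clauses=30, dependent_clauses=10,
                     t_units=20, complex_t_units=8, coordinate_phrases=6,
                     complex_nominals=36, verb_phrases=40)
indices = complexity_indices(example)
```

Keeping the counts and the ratios separate mirrors the two-step design on the slide: the counts can be checked against manual annotation before any ratio is computed.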
13
How counting is done
Word: all non-punctuation tokens
Other units: Tregex (Levy & Andrew, 2006)
Define the units linguistically
Formulate Tregex patterns matching the unit definitions
Query the parse trees with the Tregex patterns
Retrieve/count (sub)trees matching each pattern
14
Definition and pattern examples
Clause: subject + finite verb (Polio 1997)
‘S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD)’
Dependent clause: adverbial, adjectival or nominal clause
‘SBAR < (S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD))’
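The two patterns above can be approximated in plain Python for readers without a Tregex setup. This is a rough sketch, not the analyzer's implementation: it reads a Penn-style bracketed parse, and it approximates the Tregex head relation `<#` by checking whether a VP has any immediate child tagged MD, VBP, VBZ or VBD, which is an assumption that can differ from Tregex's head finder. The function names and the punctuation test (a POS tag that does not start with a letter) are also illustrative choices.

```python
import re

CLAUSE_LABELS = {"S", "SINV", "SQ"}
FINITE_TAGS = {"MD", "VBP", "VBZ", "VBD"}


def parse_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Leaves (word tokens) are plain strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read_node()


def subtrees(tree):
    """Yield the tree and every phrase or tag node below it."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def is_clause(node):
    """Approximate 'S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD)'."""
    if node[0] not in CLAUSE_LABELS:
        return False
    for child in node[1]:
        if isinstance(child, tuple) and child[0] == "VP":
            tags = [g[0] for g in child[1] if isinstance(g, tuple)]
            if any(tag in FINITE_TAGS for tag in tags):
                return True
    return False


def is_dependent_clause(node):
    """Approximate 'SBAR < (S|SINV|SQ < (VP <# MD|VBP|VBZ|VBD))'."""
    return node[0] == "SBAR" and any(
        isinstance(child, tuple) and is_clause(child) for child in node[1])


def count_words(tree):
    """Count leaf tokens whose POS tag starts with a letter (skips punctuation)."""
    return sum(1 for label, children in subtrees(tree)
               for child in children
               if isinstance(child, str) and label[0].isalpha())


parse = parse_tree(
    "(ROOT (S (NP (PRP I)) (VP (VBP think) (SBAR (IN that) "
    "(S (NP (PRP he)) (VP (VBZ is) (ADJP (JJ right)))))) (. .)))")
n_clauses = sum(1 for node in subtrees(parse) if is_clause(node))
n_dependent = sum(1 for node in subtrees(parse) if is_dependent_clause(node))
n_words = count_words(parse)
```

For "I think that he is right." the sketch finds the main clause and the embedded clause, and treats the SBAR as the single dependent clause.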
15
Evaluation
Experiment setup
40 essays from the Written English Corpus of Chinese
Learners (Wen et al. 2005), average 315 words
Written by English majors in four-year colleges in China
20 used for training, 20 for testing
Two annotators counted unit occurrences in the essays
Inter-annotator agreement
Evaluated on 10 essays
F-score for unit identification: .907 (CN) - 1.000 (S)
Correlations of complexity ratios: .912 (CT/T) - 1.000 (MLS)
16
Unit identification results on test data
Counts: System, Manual, Identical. System-annotator agreement: Precision, Recall, F-score.
Structure  System  Manual  Identical  Precision  Recall  F-score
S          357     357     357        1.000      1.000   1.000
C          545     558     530        .972       .950    .961
DC         170     178     161        .947       .904    .925
T          376     380     369        .981       .971    .976
CT         129     136     126        .977       .926    .951
CP         138     135     125        .906       .926    .916
CN         660     572     511        .774       .893    .830
VP         750     758     698        .931       .921    .926
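The agreement figures for the test data are consistent with the usual definitions: precision is the number of identical units divided by the number the system found, recall is the number of identical units divided by the number the annotators found, and the F-score is their harmonic mean. The slides do not spell these formulas out, so the short sketch below is a reading of the numbers rather than the authors' own code; the function name is illustrative.

```python
def agreement_scores(system, manual, identical):
    """Return (precision, recall, F-score) for one structure type."""
    precision = identical / system
    recall = identical / manual
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Clause (C) counts from the test data: system 545, manual 558, identical 530.
p, r, f = agreement_scores(545, 558, 530)
print(f"{p:.3f} {r:.3f} {f:.3f}")
```

Run on the clause counts, this reproduces the reported .972, .950 and .961.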
17
Correlations of complexity ratios
Measure  Development  Test
MLC      .941         .932
MLS      1.000        1.000
MLT      .989         .987
C/S      .939         .928
C/T      .978         .961
CT/T     .903         .892
DC/T     .950         .941
CP/C     .845         .834
CP/T     .876         .871
T/S      .931         .919
CN/C     .883         .867
CN/T     .904         .896
18
Error analysis
Attachment and conjunction scope errors
e.g., benefit a lot from [the Internet in academic study]
More reliable in identifying higher-level units: S, C, T, CT
Learner errors not a major cause for problems
Advanced EFL learners
Idiomaticity vs. grammatical completeness
Some errors do not lead to structural misanalysis
19
Syntactic complexity & EFL writing development
Research questions
The WECCL corpus
Results
Summary
20
Research questions
1) Effect of sampling condition
2) Measures discriminating proficiency levels
3) Magnitudes for differences to be significant
4) Relationships between measures
5) Patterns of development for the measures
21
The WECCL corpus
School   Argumentation     Narration         Exposition        All
level    Timed   Untimed   Timed   Untimed   Timed   Untimed
1        695     395       89      0         30      0         1209
2        441     398       246     0         28      0         1113
3        504     459       91      0         30      0         1084
4        60      0         88      0         0       0         148
All      1700    1252      514     0         88      0         3554
Essay length: range=[89, 892], mean=315, sd=87
22
Effect of sampling condition
Institution: sig. inter-institution dif. for
All metrics using all data
12 metrics using Y1-3 timed arg essays
Genre: sig. dif. between arg vs. nar for
All metrics using arg & nar essays
All metrics using timed arg & nar essays
13 metrics using timed arg & nar essays from ND
23
Sampling condition effect (cont)
Timing: sig. dif. between un/timed arg for
13 measures using all arg essays
11 metrics using arg essays from ND
Data for other research questions
422 timed argumentative essays from ND
24
Measures discriminating levels
3 showed sig. dif. between first 3 levels
MLC, CN/C, and CN/T
4 showed sig. dif. between first 2 levels
MLS, MLT, CP/C, and CP/T
5 showed sig. dif. between non-adjacent levels
C/S, C/T, CT/T, DC/C, and DC/T
2 showed no sig. between-level dif.
T/S and VP/T
25
Significant magnitudes
Measure  Magnitude  Levels
MLC      .573       2-3
MLS      1.658      1-2
MLT      1.651      1-2
C/S      -.112      2-4
C/T      -.078      2-4
CT/T     -.043      2-4
DC/C     -.033      1-4
DC/T     -.071      1-4
CP/C     .040       1-2
CP/T     .061       1-2
CN/C     .133       2-3
CN/T     .178       2-3
26
Relationships between measures
Strong relationship between measures of the same type or
involving the same structure
MLS and MLT show weak-moderate correlations with
subordination measures
MLC shows low-weak negative correlations with
subordination measures
Length measures show moderate-high correlations with CN
measures and weak-moderate correlations with CP measures
CN and CP measures weakly correlated with each other
27
Developmental patterns
Measures with sig. positive changes
Linear increase Y1-4: MLC, CN/C
Increase Y1-3 (Y4=Y3): CP/C
Increase Y1-3 (Y4<Y3, insig.): MLS, MLT, CP/T, CN/T
Measures with sig. negative changes
Linear decrease Y1-4: C/S
Nonlinear Y1<Y2>Y3>Y4: DC/C, DC/T
28
Summary of findings
Important to control for the effects of relevant
learner-, task- and context-related factors
Seven measures recommended for future use
CN/C, MLC: discriminates 2+ adjacent levels, linear increases
CN/T, MLS, MLT: 2 adjacent levels; positive sig changes
CP/C, CP/T: nonadjacent levels, positive sig changes
Developmental prediction: complexification at
the phrasal level vs. the clausal level
29
Summary of findings (cont.)
Smaller magnitudes than reported previously
Clause as a potentially more informative unit of
analysis than T-unit
30
Limitations and future research
Incorporating more measures and flexible definitions of
structures into the analyzer
Other conceptualizations of proficiency level
Effect of L1 on syntactic development
Relationship between developmental measures of fluency,
accuracy and complexity at different linguistic levels
31
Lexical Complexity in EFL Speaking
Lu, X. (under review). The relationship of lexical richness to the quality of ESL
speakers’ oral narratives.
Outline
Research goals and motivation
Measures of lexical complexity
Methodology
Results
Conclusion
33
Research goals and motivation
Research goals
Automate lexical complexity analysis using 25 measures
Evaluate the relationship of these measures plus the D
measure to the quality of EFL speakers’ oral narratives
Motivation
Lexical complexity an important construct in L2 teaching
and research
Relationship between lexical complexity and proficiency
claimed in many test rating scales
34
Measures of lexical complexity
Lexical complexity measures proposed in language
acquisition studies and reviewed in
Wolfe-Quintero et al. (1998)
Read (2000)
Malvern et al. (2004)
Measures of the following three dimensions
Lexical density
Lexical sophistication
Lexical variation
35
Lexical density
Proportion of lexical words (Nlw / N) (Ure 1971)
Previous findings
Lower in spoken than written texts (Halliday 1985)
Affected by various sources (O’Loughlin 1995)
Relation to L2 writing non-significant (Engber 1995)
Inconsistent definition of lexical words
All nouns and adjectives
Adverbs with adjective base
Full verbs (excluding modal/auxiliary verbs)
36
Lexical sophistication
Five measures examined
LS1: Nslw / Nlw (Linnarud 1986; Hyltenstam 1988)
LS2: Ts / T (Laufer 1984)
VS1: Tsv / Nv (Harley & King 1989)
CVS1: Tsv / sqrt(2Nv) (Wolfe-Quintero et al. 1998)
VS2: Tsv^2 / Nv (Chaudron & Parker 1990)
37
Lexical sophistication (cont.)
Previous findings
LS1: NS-NNS dif sig (Linnarud 1986); non-sig (Hyltenstam 1988)
LS2: sig pre-and post-essay dif (Laufer 1984)
VS1: sig NS-NNS dif (Harley & King 1989)
Varying definitions of sophistication
2000-word BNC frequency list (Leech et al. 2001)
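A minimal sketch of how lexical density and the five sophistication ratios could be computed from a lemmatized, word-class-tagged sample. Several things here are assumptions for illustration: the input format (pairs of lemma and coarse word class), the function name, treating a lemma as sophisticated when it is absent from a supplied frequency list, leaving the decision about which adverbs and verbs count as lexical to the caller's tagging, and reading the "2" in VS2 as a square.

```python
import math

LEXICAL_CLASSES = {"noun", "adj", "adv", "verb"}  # coarse classes, assumed


def density_and_sophistication(tokens, frequent_lemmas):
    """Compute LD, LS1, LS2, VS1, CVS1 and VS2.

    tokens: list of (lemma, word_class) pairs for one sample.
    frequent_lemmas: set of lemmas regarded as non-sophisticated.
    The sample is assumed to contain at least one lexical word and one verb.
    """
    def sophisticated(lemma):
        return lemma not in frequent_lemmas

    lexical = [lemma for lemma, cls in tokens if cls in LEXICAL_CLASSES]
    verbs = [lemma for lemma, cls in tokens if cls == "verb"]
    types = {lemma for lemma, _ in tokens}

    n = len(tokens)                                    # N: word tokens
    n_lw = len(lexical)                                # Nlw: lexical word tokens
    n_slw = sum(1 for w in lexical if sophisticated(w))  # Nslw
    t = len(types)                                     # T: word types
    t_s = sum(1 for w in types if sophisticated(w))    # Ts
    n_v = len(verbs)                                   # Nv: verb tokens
    t_sv = len({w for w in verbs if sophisticated(w)})   # Tsv

    return {
        "LD": n_lw / n,
        "LS1": n_slw / n_lw,
        "LS2": t_s / t,
        "VS1": t_sv / n_v,
        "CVS1": t_sv / math.sqrt(2 * n_v),
        "VS2": t_sv ** 2 / n_v,
    }


# Toy sample and toy frequency list, for illustration only.
sample = [("the", "other"), ("teacher", "noun"), ("scrutinize", "verb"),
          ("every", "other"), ("essay", "noun"), ("and", "other"),
          ("give", "verb"), ("meticulous", "adj"), ("feedback", "noun"),
          ("the", "other")]
frequent = {"the", "teacher", "every", "essay", "and", "give", "feedback"}
profile = density_and_sophistication(sample, frequent)
```

In practice the frequency list would be the chosen reference list (here, the 2000-word BNC list), and changing that list changes every sophistication score, which is one reason previous findings are hard to compare.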
38
Lexical variation
20 measures examined
4 based on NDW
NDW: Number of different words
NDW-50: NDW in first 50 words of sample
NDW-ER50: mean NDW of 10 random 50-word subsamples
NDW-ES50: mean NDW of 10 random 50-word sequences
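The four NDW-based measures can be sketched as follows. The slide does not say how the random subsamples are drawn, so two assumptions are made here: a "subsample" is 50 tokens drawn without replacement from anywhere in the sample, and a "sequence" is 50 consecutive tokens starting at a random position. Function names and the seeded random generator are illustrative.

```python
import random


def ndw(tokens):
    """NDW: number of different words in the whole sample."""
    return len(set(tokens))


def ndw_first(tokens, size=50):
    """NDW-50: number of different words in the first `size` tokens."""
    return len(set(tokens[:size]))


def ndw_er(tokens, size=50, draws=10, rng=None):
    """NDW-ER50: mean NDW over random subsamples (drawn without replacement)."""
    if len(tokens) < size:
        raise ValueError("sample is shorter than the subsample size")
    if rng is None:
        rng = random.Random(0)
    return sum(len(set(rng.sample(tokens, size))) for _ in range(draws)) / draws


def ndw_es(tokens, size=50, draws=10, rng=None):
    """NDW-ES50: mean NDW over random runs of consecutive tokens."""
    if len(tokens) < size:
        raise ValueError("sample is shorter than the sequence length")
    if rng is None:
        rng = random.Random(0)
    total = 0
    for _ in range(draws):
        start = rng.randrange(len(tokens) - size + 1)
        total += len(set(tokens[start:start + size]))
    return total / draws


# Artificial sample: 200 tokens cycling through 60 distinct word forms.
words = [f"w{i % 60}" for i in range(200)]
scores = (ndw(words), ndw_first(words), ndw_er(words), ndw_es(words))
```

Fixing the sample size at 50 tokens is what makes the last three variants comparable across speakers who produce different amounts of speech.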
39
Lexical variation (cont.)
7 based on TTR for total vocabulary
Type token ratio (TTR)
Mean TTR of all 50-word segments (MSTTR)
LogTTR, Corrected TTR, Root TTR, Uber
The D measure (McKee et al. 2000)
9 based on TTR for word classes
T{LW, V, N, Adj, Adv, Mod} / Nlw
Tv / Nv, Tv^2 / Nv, Tv / sqrt(2Nv)
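The TTR-based measures for total vocabulary can be sketched with their commonly cited formulas: TTR = T/N, Corrected TTR = T/sqrt(2N), Root TTR = T/sqrt(N), LogTTR = log T / log N, and Uber = (log N)^2 / (log N - log T). The slide lists only the names, so these formulas, the use of base-10 logarithms (the base rescales Uber by a constant), and the choice to build MSTTR from consecutive non-overlapping 50-word segments while discarding any remainder are assumptions of this sketch. The D measure is not included because it relies on the vocd curve-fitting procedure.

```python
import math


def ttr_measures(tokens, segment=50):
    """Return TTR and its transformations for a list of word tokens."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    t = len(set(tokens))

    # MSTTR: mean TTR over consecutive full segments; None if the sample is too short.
    segments = [tokens[i:i + segment]
                for i in range(0, n - segment + 1, segment)]
    msttr = (sum(len(set(s)) / segment for s in segments) / len(segments)
             if segments else None)

    log_n, log_t = math.log10(n), math.log10(t)
    return {
        "TTR": t / n,
        "MSTTR": msttr,
        "CTTR": t / math.sqrt(2 * n),
        "RTTR": t / math.sqrt(n),
        "LogTTR": log_t / log_n,
        # Uber is undefined when every token is a new type (T == N).
        "Uber": log_n ** 2 / (log_n - log_t) if t < n else math.inf,
    }


# Artificial sample: 100 tokens, 4 types.
demo = ttr_measures(["a", "b", "c", "d"] * 25)
```

On this artificial sample the raw TTR is .04 while each 50-word segment has a TTR of .08, which illustrates why raw TTR falls as a sample grows and why the fixed-length and transformed variants were proposed.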
40
Lexical variation (cont.)
Previous findings
NDW and TTR useful, but affected by sample size
Transformations of NDW & TTR not equally useful
D claimed superior; results mixed (Jarvis 2002; Yu 2010)
Mixed results for word class TTR measures
No consensus on a single best measure
41
Research questions
How does LD relate to the quality of EFL speakers’ oral
narratives?
How do the LS measures compare with and relate to each
other as indices of the quality of EFL speakers’ oral narratives?
How do the LV measures compare with and relate to each
other as indices of the quality of EFL speakers’ oral narratives?
How do LD, LS and LV compare with and relate to each other as
indices of the quality of EFL speakers’ oral narratives?
42
Data
Spoken English Corpus of Chinese Learners (Wen et
al. 2005)
Transcripts of TEM-4 Spoken Test data in 1996-2002
Task 2 data used: 3-minute oral narratives
Students ranked within groups of 32-35
12 groups of data used (1999-2002; N=32-35 each)
Only rankings available, not actual scores
Example topic (2001)
Describe a teacher of yours whom you found unusual
43
Computing the measures
Preprocessing
Part-of-speech tagging (Stanford tagger)
Lemmatization (Morpha)
Measure computation
D measure: vocd utility in CLAN
Type counting: w, sw, lw, slw, v, sv, n, adj, adv
Token counting: w, lw, slw, v
Computation of the other 25 ratios
44
Analysis
Spearman’s rho computed for each group
X: test takers’ rankings within the group
Y: Values of each of the 26 measures
Meta-analysis to combine results from the 12 groups
Students divided into 4 levels based on rankings
Levels A, B, C and D
ANOVAs run to determine inter-level differences
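For a single group, the Spearman step can be sketched without external libraries: convert both variables to ranks (tied values share the mean of their ranks) and take the Pearson correlation of the ranks. This is a generic illustration of the statistic, not the software used in the study, and the function names are hypothetical. How the per-group coefficients were combined in the meta-analysis is not stated on the slide, so that step is left out. The sign of rho also depends on whether rank 1 denotes the best or the weakest speaker, which the slide does not specify.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest value); tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up rankings and measure values for five speakers in one group.
rankings = [1, 2, 3, 4, 5]
measure_values = [0.71, 0.64, 0.66, 0.52, 0.49]
rho = spearman_rho(rankings, measure_values)
```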
45
Analysis (cont.)
Alpha level = .05 / 28 = .0018
Identification of discriminative measures
Significant combined rho (p < .0018)
Significant between-level differences with linear decreases
from Level A to Level D
46
Lexical density and sophistication
Measure  Combined rho  p-value
Words    .437          .000
W/Min    .437          .000
LD       .011          .836
LS1      .048          .355
LS2      .050          .336
VS1      .133          .010
CVS1     .166          .001
VS2      .165          .001
47
Lexical density and sophistication (cont.)
Measure  A        B       C       D       F       Sig.
Words    336.16   295.95  297.76  256.34  28.335  .000
W/Min    112.052  98.650  99.252  85.446  28.335  .000
LD       .417     .415    .409    .414    .896    .443
LS1      .227     .235    .221    .225    .681    .564
LS2      .261     .272    .256    .260    2.736   .043
VS1      .072     .086    .067    .073    2.629   .050
CVS1     .343     .383    .299    .297    3.722   .042
VS2      .314     .401    .274    .262    2.760   .042
48
Lexical density and sophistication (cont.)
      LS1     LS2     VS1     CVS1    VS2
LS1   1.000
LS2   .637**  1.000
VS1   .456**  .391**  1.000
CVS1  .414**  .382**  .966**  1.000
VS2   .381**  .350**  .909**  .935**  1.000
49
Relationships among the dimensions
Low to weak correlations among measures in
different dimensions
Lexical variation demonstrated strongest
relationships to raters’ judgments of the quality
of EFL speakers’ oral narratives
50
Summary of findings
The three dimensions posited in language acquisition
literature appear to be different constructs
No/small effect for lexical density/sophistication found
Lexical variation correlated strongly with quality
9 LV measures recommended
NDW correlates strongly with length, but worth considering in
the case of timed oral narratives
Transformed TTR measures perform better than the original
TTR measures
51
Limitations and future research
A factor analysis will show patterns of relationships
No scores available, so not possible to run regression models
Division of students into 4 levels could be problematic
Replication using EFL writing data and other
conceptualizations of proficiency
Effects of task-related variables
Relations among factors determining quality
52