Diapositiva 1 - Universidad Autonoma de Madrid

The WOSLAC project:
Word Order in Second Language Acquisition Corpora
http://www.uam.es/woslac
Université catholique de Louvain (Belgium)
“Learner Corpus Colloquium”
3 April 2006
Amaya Mendikoetxea
Cristóbal Lozano
Universidad Autónoma de Madrid
1
SUMMARY
 The main purpose of this project is to determine the lexicon-syntax and
syntax-discourse properties which constrain word order in the
interlanguage of L2 learners of English (with L1 Spanish) and L2 learners
of Spanish (with L1 English).
 In particular, we wish to examine the validity of the Unaccusative
Hypothesis at the lexicon-syntax interface and the role of discourse
functions such as topic and focus at the syntax-discourse interface in L2
Spanish and L2 English.
 Our initial hypotheses are the following: (1) The Unaccusative Hypothesis
plays a role in word order in L2 learners’ interlanguage; (2) Lexicon-syntax
properties are acquired before syntax-discourse properties, i.e., properties
at the lexicon-syntax interface are present in the initial stages of
grammatical development, while properties at the syntax-discourse interface
are persistently difficult to acquire and generate deficits even at advanced
levels of proficiency; (3) Interlanguages have structures that cannot be
explained with reference to L1 or L2, but rather reflect universal properties
of languages.
 To test these hypotheses, a corpus will be compiled and appropriate
searching tools will be developed. The data obtained will be analysed both
qualitatively and quantitatively.
 The interpretation of the data will be done within a comparative framework
which will help determine the role of L1 in L2 acquisition in the grammar
areas of the study.
2
MAIN PURPOSE
 To determine the lexicon-syntax and syntax-discourse properties which constrain word
order in the interlanguage of L2 learners:
L2 English (with L1 Spanish)
L2 Spanish (with L1 English).
 To examine the validity of:
the Unaccusative Hypothesis at the lexicon-syntax
interface and
the role of discourse functions such as topic and focus
at the syntax-discourse interface in L2 Spanish and
English.
3
RESEARCH QUESTIONS
(1) The Unaccusative Hypothesis plays a role in word
order in L2 learners’ interlanguage;
(2) Lexicon-syntax > syntax-discourse:
 Lexicon-syntax properties are acquired before syntax-discourse properties,
 i.e., properties at the lexicon-syntax interface are present
in the initial stages of grammatical development, while
properties at the syntax-discourse interface are persistently
difficult to acquire and generate deficits even at advanced
levels of proficiency;
(3) Interlanguages have structures that cannot be
explained with reference to L1 or L2, but rather reflect
universal properties of languages.
4
DATA
2 written corpora:
L2 English – L1 Spanish
L2 Spanish – L1 English
Data analysis: qualitatively and
quantitatively (descriptive and inferential
statistics, SPSS).
5
L2 English – L1 Spanish
 260 academic essays in electronic format
Range: 500 words up to 2,000 words
(300.000 words)
 1st and 3rd year Spanish students in an
academic writing course on a degree in English
Philology at the Universidad Autónoma de
Madrid.
 Basic procedure for gathering the data :
Learner Profile
Essay Profile
Oxford Quick Placement Test
6
Software: UAM CorpusTool
 Software for text annotation: UAM CorpusTool
It allows an analyst to select a text from the corpus, and
annotate it in various ways.
It can highlight a segment (e.g., an it-cleft) and then
assign features to that segment.
The tool produces an XML-encoded version of the text
file, including the features assigned to the segments.
It can then automatically detect instances of the
pattern.
7
L2 Spanish – L1 English (online)
8
THEORETICAL FRAMEWORK
 Comparative framework which will help
determine the role of L1 in L2 acquisition in the
grammar areas of the study.
L1 properties
L2 properties
Universal properties
 We look at:
Properties operating at the lexicon-syntax interface
(unergatives vs. unaccusatives).
Properties operating at the syntax-discourse (topic vs.
focus).
9
(cont’d)
 →Underlying idea: formal and functional
features interact in the structures under
consideration.
 Formal and functional approaches are,
therefore, essential for the understanding of
SLA data.
 At the same time, SLA data from nonnative
grammars can be potentially significant for the
understanding of linguistic phenomena in
native grammars.
10
THEORETICAL FRAMEWORK:
INTERFACES
 Chomsky’s (1995, 2001, and so on) Minimalist Program
 Emphasis on interface conditions.
(1) LEXICON → COMPUTATIONAL SYSTEM (Syntax), which feeds two interfaces:
PF → SM (sensorimotor) systems
LF → C-I (conceptual-intentional) systems
 →A well-designed language faculty would involve nothing other than interface
conditions.
 “a theory of language that takes a linguistic expression to be
nothing other than a formal object that satisfies the interface
conditions in the optimal way.” (Chomsky 1995: 171).
11
Features of the language sample to be
studied
 1. Non-canonical word order
 clause patterns which do not conform to the S(ubject) V(erb)
O(bject) order.
 (a) Inverted subjects
 Into the room came a tiny old lady.
 Then came voices all shouting together.
 Not before in our history have so many strong influences united to produce so
large a disaster.
 As infections increased in women, so did infections in their babies.
 (b) Dislocation in the left-periphery (fronting)
 The paper Terry buys every day (not a book!)
 Why he said that I will never know.
 (c) Right-dislocation
 The teacher made clearer the standards which students should be aiming for.
 It’s a pity that he cannot speak Russian.
12
(cont’d)
 2. Variation in word order & special constructions
 (a) Passive constructions
 The paper is bought (by Terry) every day.
 (b) There-constructions
 There came a tiny old lady into the room.
 (c) Ditransitive constructions
 a. I’ll give Mary the book
 b. I’ll give the book to Mary
 (d) Object placement in phrasal verbs
 a. He picked up the telephone
 b. He picked the telephone up
 (e) Clefts
 a. It was his voice that held me.
 b. What held me was his voice
13
FIRST WOSLAC STUDY
Postverbal subjects
L1 Spa – L2 Eng
ICLE corpus
Interfaces
Lexicon-syntax
Syntax-discourse
14
Word order in native English
 Very restricted: canonical word order SV.
 Four girls sang
 Four girls arrived
 Lexicon-syntax interface (Levin & Rappaport-Hovav, etc):
 Unaccusative Hypothesis (Burzio 1986, etc)
 *There sang four girls at the opera. [unergative verb]
 There arrived four girls at the station. [unaccusative verb]
 Syntax-discourse interface (Biber et al, Birner, etc):
 Postverbal material tends to be focus (new info)
 We have complimentary soft drinks and coffee. Also complimentary is red and white wine.
 Syntax-Phonological Form (PF) interface (Arnold et al, etc)
 Heavy material is sentence-final (Principle of End-Weight, Quirk):
 That money is important is obvious.
 It is obvious that money is important.
Subjects which are focus, long and complex tend to occur postverbally in
those structures which allow them.
15
Previous L2 findings
 Production of postverbal subjects in L2 English (Rutherford 1989,
Oshita 2004):
 L1 Spanish – L2 English:
 …it arrived the day of his departure…
 And then at last comes the great day.
 In every country exist criminals
 …after a few minutes arrive the girlfriend with his family too.
 Only with unaccusative verbs (never with unergatives).
 Unaccusatives: arrive, happen, exist, come, appear, live…
 Explanation: syntax-lexicon interface (Unaccusative Hypothesis)
Previous studies focused on ERRORS, thus emphasising the differences between
native and non-native structures.
Our study emphasises the similarities between native and non-native structures:
licensing conditions are the same.
16
Hypotheses
 VS order in L1 Spa – L2 Eng…
 GENERAL HYPOTHESIS:
 Conditions licensing VS in L2 Eng are the same as those in
Native Eng, DESPITE differences in grammaticalisation.
 H1: Lexicon-syntax interface:
 Postverbal subjects with unaccs (never with unergs)
 H2: Syntax-PF interface:
 Postverbal subjects: heavy (NOT light)
 H3: Syntax-Discourse interface:
 Postverbal subjects: focus (NOT topic)
17
Method
 Learner corpus: L1 Spa – L2 Eng
 ICLE Spanish subcorpus (Granger et al. 2002)
 UAM corpus [2nd edition of ICLE]
Corpus         Number of essays   Number of words
ICLE Spanish   251                200,376
UAM            85                 63,836
TOTAL          336                264,212
 Problem: proficiency level??
 WordSmith v. 4.0 (Scott 2004)
 Excel, SPSS v. 12.0
  Concordance queries can be performed automatically with WordSmith,
BUT there is a lot of manual work (filtering out unusable data, coding data
in Excel, analysing data in SPSS, etc).
18
Data analysis
 Based on Levin (1993) and Levin & Rappaport-Hovav
(1995):
 Unergatives: cough, cry, shout, speak, walk, dance…
 [TOTAL: 41]
 Unaccusatives: exist, live, appear, emerge, happen, arrive…
 [TOTAL: 34]
 WordSmith: query searches:
 For every lemma (e.g., APPEAR, ARISE), we searched for:
 All possible native forms:
• appear, appears, appearing, appeared
• arise, arises, arising, arose, arisen
 All possible overregularised and overgeneralised learner forms:
• arised, arosed, arisened, arosened (“So arised the Saint Inquisition”)
 All possible forms with probable L1 transfer of spelling:
• apear, apears, apearing, apeared
 All other possible misspelled forms:
• appeard, apeard
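The query lists above can be sketched as one regular expression per lemma. This is only an illustration in Python, not the WordSmith queries actually used; the function names and the sample sentence are hypothetical, and only the word forms come from the slide.

```python
import re

# Word forms listed on the slide for two lemmas:
# native forms, overregularised learner forms and misspellings.
FORMS = {
    "APPEAR": ["appear", "appears", "appearing", "appeared",
               "apear", "apears", "apearing", "apeared",
               "appeard", "apeard"],
    "ARISE": ["arise", "arises", "arising", "arose", "arisen",
              "arised", "arosed", "arisened", "arosened"],
}

def build_pattern(forms):
    """Compile a whole-word, case-insensitive pattern matching any listed form."""
    # Longest forms first is a precaution; the \b anchors already force whole words.
    alternatives = sorted(forms, key=len, reverse=True)
    body = "|".join(re.escape(form) for form in alternatives)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)

PATTERNS = {lemma: build_pattern(forms) for lemma, forms in FORMS.items()}

def find_hits(text):
    """Return (lemma, matched form, character offset) for every hit, in text order."""
    hits = []
    for lemma, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((lemma, match.group(1).lower(), match.start()))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical sample: "disappeared" must not count as a form of APPEAR.
sample = "So arised the Saint Inquisition. Later apeared new problems, but nothing disappeared."
for lemma, form, offset in find_hits(sample):
    print(lemma, form, offset)
```

Whole-word matching keeps forms of other lemmas (e.g. DISAPPEAR) out of the APPEAR results; every hit would still need the manual filtering described on the following slides.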
19
UNACCUSATIVES
SEMANTIC CLASS               VERBS
EXISTENCE                    exist, flow, grow, hide, live, remain, rise, settle, spread, survive
APPEARANCE                   appear, arise, awake, begin, break, develop, emerge, flow, follow, happen, occur, rise
DISAPPEARANCE                die, disappear
INHERENTLY DIRECTED MOTION   arrive, come, drop, enter

UNERGATIVES
SEMANTIC CLASS     SEMANTIC SUBCLASS    VERBS
EMISSION           LIGHT EMISSION       beam, burn, flame, flash
EMISSION           SOUND EMISSION       bang, beat, blast, boom, clash, crack, crash, cry, knock, ring, roll, sing
EMISSION           SMELL EMISSION       smell
EMISSION           SUBSTANCE EMISSION   pour, sweat
COMMUNICATION      MANNER OF SPEAKING   cry (*), shout, sing (*)
COMMUNICATION      TALK VERBS           speak, talk
BODILY PROCESSES   BREATHE VERBS        breathe, cough, cry (*), sweat (**)
20
Data analysis (cont’d)
 CONCORDANCES: RAW OUTPUT
 Thousands of concordances, BUT approx. ¾ were unusable.
 Filtering criteria had to be applied manually.
21
Data coding/analysis: EXCEL
22
Data analysis: preliminary descriptive
stats - EXCEL
23
Data analysis – inferential stats: SPSS
24
GENERAL HYPOTHESIS: Result: types of
VS structures produced
GRAMMATICAL (36.2%)
 Locative inversion:
 In the main plot appear the main characters: Volpone and Mosca.
 There-insertion:
 There exist positive means of earning money.
 AdvP-insertion:
 … and here emerges the problem.
UNGRAMMATICAL (63.8%)
 * it-insertion:
 *In the name of religion it had occurred many important events…
 * XP-insertion:
 *In 1760 occurs the restoration of Charles II in England.
 * Ø-insertion:
 …*because exist the science technology and the industrialisation.
25
H1: Results: VS and unaccusativity
Table 1: Proportion of postverbal subjects produced
Verb type      # postverbal subjects (VS)   # usable concordances   Rate
Unergative     0                            181                     0/181 (0%)
Unaccusative   58                           820                     58/820 (7.1%)
Figure 1: Proportion of postverbal subjects produced.
a. Unergatives: postverbal subject 0.0%, preverbal subject 100.0%
b. Unaccusatives: postverbal subject 7.1%, preverbal subject 92.9%
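The rates in Table 1 can be recomputed from the raw counts, as in the sketch below. The slides do not say which inferential test was run in SPSS, so the Pearson chi-square here is an assumption added purely for illustration; function and variable names are hypothetical.

```python
# Counts from Table 1: postverbal subjects (VS) and usable concordances per verb type.
counts = {
    "unergative": {"vs": 0, "usable": 181},
    "unaccusative": {"vs": 58, "usable": 820},
}

def vs_rate(row):
    """Percentage of usable concordances that contain a postverbal subject."""
    return 100 * row["vs"] / row["usable"]

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

rates = {verb: round(vs_rate(row), 1) for verb, row in counts.items()}

# Rows: verb type. Columns: VS vs. SV, where SV = usable concordances minus VS.
unerg, unacc = counts["unergative"], counts["unaccusative"]
chi2 = chi_square_2x2(unerg["vs"], unerg["usable"] - unerg["vs"],
                      unacc["vs"], unacc["usable"] - unacc["vs"])

print(rates)
print(round(chi2, 2))
```

With one degree of freedom, the statistic would be compared against the chi-square critical value for the chosen significance level; an exact test may be preferable given the zero cell.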
26
H2: Result: VS and weight
Figure 1: Production of unaccusative postverbal subjects: heavy vs. light.
Heavy: 81.03%
Light: 18.97%
Note: syntactic weight has to be measured manually according to some theoretical criteria.
HEAVY
Against this society drama emerged an opposition headed by Oscar Wilde and Bernard Shaw.
…so came the decline of the theatre.
Then come the necessity to earn more.
LIGHT
So arised the Saint Inquisition…
…and from there began a fire.
Still today … exists the bloody fights.
27
H2: Result: SV and weight
Figure 1: Production of unaccusative preverbal subjects: heavy vs. light.
Heavy: 32.29%
Light: 67.71%
LIGHT
…but they may appear everywhere.
…since the day eventually came…
…these people should exist, …
HEAVY
…the cases of men mistreated do not appear in the media…
…a disintegration of culture, tradition and society would begin…
…the utopian societies created by the early socialists appeared.
28
H3: Result: VS and discourse
Figure 1: Production of unaccusative postverbal subjects: topic vs. focus.
Focus: 98.28%
Topic: 1.72%
Note: discourse status (topic/focus) has to be measured manually by establishing theoretical criteria and then by checking the context (or even the essay) manually.
FOCUS
…there also exists a wide variety of optional channels which have to be paid.
So arised the Saint Inquisition.
In 1880 it begun the experiments whose result was the appearance of the television some years later.
TOPIC
…our modern world, dominated by science and technology and industrialisation
…because exist the science technology and the industrialisation.
29
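The percentages for postverbal subjects (weight and discourse status) can be converted back to raw counts, given the 58 postverbal subjects in Table 1. A minimal sketch follows; the names are hypothetical and the rounding step is an assumption about how the percentages were derived.

```python
N_VS = 58  # postverbal subjects with unaccusative verbs (Table 1)

# Percentages reported for postverbal subjects on the H2 and H3 slides.
reported = {
    "weight": {"heavy": 81.03, "light": 18.97},
    "discourse": {"focus": 98.28, "topic": 1.72},
}

def to_counts(percentages, total):
    """Convert reported percentages back to whole-number counts."""
    return {label: round(pct * total / 100) for label, pct in percentages.items()}

recovered = {name: to_counts(pcts, N_VS) for name, pcts in reported.items()}

for name, counts in recovered.items():
    # Each pair of counts should add up to the 58 postverbal subjects.
    assert sum(counts.values()) == N_VS
    print(name, counts)
```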
H3: Result: SV and discourse
Figure 1: Production of unaccusative preverbal subjects: topic vs. focus.
Topic: 100%
Focus: 0%
TOPIC
I use the Internet … I find windows … if they press on any of these windows … these windows cannot appear because a child could enter easily…
…the world of drugs: mafias … problems with mafias finished … dangerous people making money … no reason why these people should exist.
30
Summary/Conclusion
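VS: Vunacc + NPsubj
Lexicon-syntax: the verb is unaccusative (Vunacc)
Syntax-discourse: the subject is FOCUS
Syntax-PF: the subject is HEAVY
SV: NPsubj + Vunacc
Syntax-discourse: the subject is TOPIC
Syntax-PF: the subject is LIGHT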
31
Thank you!
32
Heavy/Light scale
Table 1: A syntactic scale for measuring syntactic weight
Columns: SYNTACTIC WEIGHT (NOMINAL SCALE: LIGHT, HEAVY; ORDINAL SCALE: 0, 1, 2, 3) and SYNTACTIC STRUCTURE.
Structures combine an optional determiner (D), adjectives (ADJ, ADJ*), nouns (N, N*) and postmodifiers (PP, PP*, AdjP*, IP/CP).
[Table rows not recoverable from the transcript.]
Notes:
(i) The asterisk (*) represents a complex (i.e., recursive) categorical or phrasal structure.
(ii) Parentheses indicate the optional realization of the bracketed category or phrase.
33
Data analysis (cont’d)
 CONCORDANCES: 6 BASIC FILTERING CRITERIA:
 The verb must be intransitive (unergative or unaccusative).
 In the screen of the television one or two “rombos” should appear. [unac]
 Leontes cries and the statue talks. [unerg]
 This government’s movement has created several opinions. [trans]
 The verb must be finite, with(out) aux.
 …also it exists the psychological agresssions… [finite no aux]
 … the cases of men mistreated do not appear in the media. [finite aux]
 This contradiction could disappear [finite modal]
 There’s no reason for it to exist. [for clause + to inf]
 Poor people cross borders to escape from poverty. [to-inf clause]
 …let time pass… [‘let’ constructions]
 …make everyone’s life go ahead [causative + infinitive]
 Returning to the title of this paper,… [gerundive clauses]
 …they go away in order to escape to France. [‘in order to’ clauses]
 …women have to live with the agressor [have to/ought to/able to]
 …prudence was beginning to disappear. [verbal/aspectual periphrases]
 Before entering the argumentation,… [small clauses]
 …instead of following… [complement of P]
 …likely to happen… [complement of A]
 The tests to enter the army are quite difficult now. [complement of N]
34
9. Data analysis (cont’d)
 The verb must be in the active voice.
 This contradiction could disappear. [active unaccusative]
 This situation has already been happened. [passivised unaccusative]
 The subject must be an NP.
 …it arose [diverse social ranks, the rich and the poor that depended on the
property they had]. [inverted NP subject]
 …it only remains [to add that nowadays we live in a world…] [extraposition]
 It happened [that the countries which make the weapons are…] [extraposition]
 The sentence can be either grammatical or ungrammatical in native
English.
 This contradiction could disappear. [gram]
 …it won’t exist nothing of what people don’t get bored or tired. [ungram]
 The subject can appear either postverbally (VS) or preverbally (SV).
 …the real problem appears when they have to look for their first job. [SV]
 So arised the Saint Inquisition. [VS]
35
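The six basic criteria can be read as a yes/no filter over manually coded concordances. The sketch below is hypothetical: the project coded its data by hand in Excel, and the record fields, the function name and the codings given to the examples are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Concordance:
    """One manually coded concordance line (all field names are hypothetical)."""
    text: str
    verb_type: str     # "unaccusative", "unergative" or "transitive"
    finite: bool       # criterion 2: the verb is finite
    active: bool       # criterion 3: the verb is in the active voice
    np_subject: bool   # criterion 4: the subject is an NP
    grammatical: bool  # criterion 5: recorded, but never used to exclude a line
    order: str         # criterion 6: "VS" or "SV"

def is_usable(c):
    """Apply the six basic criteria; grammaticality does not exclude a line."""
    return (c.verb_type in {"unaccusative", "unergative"}
            and c.finite
            and c.active
            and c.np_subject
            and c.order in {"VS", "SV"})

examples = [
    Concordance("So arised the Saint Inquisition.",
                "unaccusative", True, True, True, False, "VS"),
    Concordance("This government's movement has created several opinions.",
                "transitive", True, True, True, True, "SV"),
    Concordance("This situation has already been happened.",
                "unaccusative", True, False, True, False, "SV"),
    Concordance("There's no reason for it to exist.",
                "unaccusative", False, True, True, True, "SV"),
]

usable = [c.text for c in examples if is_usable(c)]
print(usable)
```

Keeping grammaticality as a recorded field rather than a filter mirrors the criterion that a sentence may be either grammatical or ungrammatical in native English.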
10. Data analysis (cont’d)
OTHER FILTERING CRITERIA
 Target V + V (verbal coordination)
 Families without father exist and work well.
 Coordinator + target V
 …we can manage to obtain it and live in a better world.
 Interrogatives (only if V is the target)
 How could they live?
 Does exist then a manipulation of television?
 Formulaic & set expressions in English
 As sometimes happens…
 …fall victim to…
 …the world we live in.
 Set expressions transferred from the L1
 …it happens the same.
 …they fall into account that they have treated very badly Mr Hardcastle.
 Phrasal verbs:
 …a scientist come up with an intention…
 Quotes (literary or other):
 “To what purpose, April, do you return again?”
 “Feminism has to evolved or die”, Friedan said in 1982…
36
11. Data analysis (cont’d)
OTHER FILTERING CRITERIA (CONT’D)
 Transitive alternants (unacs):
 Rosamond lived a very comfortable life.
 …once you have passed this stage.
 …the University of Pennsylvania developed the electronic calculator.
 Causativizations (unacs):
 …how parents grew their children.
 But this idea could rise the question of…
 Verbs that do not meet the semantic criteria proposed by Levin &
Rappaport-Hovav:
 …social classes appear to be broken. [≠appearance]
 …we come to know about his personality… [≠inherently directed motion]
 Subject relative clauses:
 …those fantastic relatives that still survive.
 …events of this kind which occurred in Spain.
 Free relative clauses:
 …trying to imagine what will remain…
 Hastings realizes what is happening…
 Predicative complements:
 Theatres remained closed.
 …men appear completely subordinated to the women’s desires.
37
Result: Type of VS structures
Figure 1: Types of postverbal-subject structures produced and their frequency of production.
Frequency of production (in %), by type of postverbal-subject structure:
*It-insertion: 41.4%
Locative inversion: 15.5%
*XP-insertion: 13.8%
There-insertion: 10.4%
AdvP-insertion: 10.3%
*Ø-insertion: 8.6%
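As a consistency check, the per-type frequencies in this figure can be summed into the grammatical and ungrammatical shares reported for the general hypothesis (36.2% vs. 63.8%). The sketch assumes the bar values pair with the axis labels in the order shown; the variable and function names are hypothetical.

```python
# Frequencies (in %) read off the bar chart, paired with its labels in order.
# A leading asterisk marks a structure that is ungrammatical in native English.
frequencies = {
    "*It-insertion": 41.4,
    "Locative inversion": 15.5,
    "*XP-insertion": 13.8,
    "There-insertion": 10.4,
    "AdvP-insertion": 10.3,
    "*Ø-insertion": 8.6,
}

def share(grammatical):
    """Sum the frequencies of the grammatical (or ungrammatical) structure types."""
    total = sum(pct for label, pct in frequencies.items()
                if label.startswith("*") != grammatical)
    return round(total, 1)

grammatical_share = share(grammatical=True)
ungrammatical_share = share(grammatical=False)
print(grammatical_share, ungrammatical_share)
```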
38
Result: VS and specific unaccusative verbs
Figure 1: Production of postverbal subjects (VS) according to verb: VS/TotalConcordances ratio
Y-axis: Frequency of inversion (%), from 0.0 to 3.0.
X-axis (verbs): APPEAR, ARISE, ARRIVE, AWAKE, BEGIN, COME, DEVELOP, DIE, DISAPPEAR, DROP, EMERGE, ENTER, ESCAPE, EXIST, FALL, FLOW, FOLLOW, GO, GROW, HAPPEN, HIDE, LEAVE, LIVE, OCCUR, PASS, REMAIN, RETURN, RISE, SETTLE, SPREAD, SURVIVE.
39
Length of postverbal subject
Figure 1: Frequency of word-length of postverbal subjects
X-axis: Length (in number of words), from 0 to 25. Y-axis: Frequency, from 0 to 7.
Mean = 7.5172
Std. Dev. = 5.13414
N = 58
40
Word order in native Spanish
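Lexicon-syntax interface (neutral context):
UNERGATIVES: SV
A: ¿Qué pasó? ‘What happened?’
B: Un hombre gritó [SV] ‘A man shouted’
UNACCUSATIVES: VS
A: ¿Qué pasó? ‘What happened?’
B: Llegó un hombre [VS] ‘A man arrived’ (lit. ‘Arrived a man’)
Syntax-discourse interface (focused subject):
UNERGATIVES: VS
A: ¿Quién gritó? ‘Who shouted?’
B: Gritó un hombre [VS] ‘A man shouted’ (lit. ‘Shouted a man’)
UNACCUSATIVES: VS
A: ¿Quién llegó? ‘Who arrived?’
B: Llegó un hombre [VS] ‘A man arrived’ (lit. ‘Arrived a man’)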
 Theoretical evidence: Zubizarreta 1998, Casielles-Suárez 2004, Domínguez 2004
 Empirical evidence: Hertel 2000, 2003, Lozano 2003, 2006
41
Result: VS and (in)definiteness
Figure 1: Production of postverbal subjects according to their definiteness.
Indefinite: 58.6%
Definite: 41.4%
INDEFINITE
…some decades ago, it appeared a new invent: the television.
The play was very well performed and also appeared new elements in the stage.
…it has appeared some cases of women that have killed their husbands…
DEFINITE
…because later could appear the real evidence and the real guilty.
…and usually appears the noble young man that either waste or has wasted his fortune.
In the main plot appear the main characters: Volpone and Mosca.
42
10. Results: lexicon-syntax
¿Qué pasó? ‘What happened?’
Unaccusatives (VS): Llegó un hombre
Unergatives (SV): Un hombre gritó
[Two bar charts. Y-axis: mean acceptability judgement, from -0.5 to 2. X-axis groups: English upper-intermediate, English advanced, Spanish natives.]
Unaccusatives, neutral context: series Unac Neut VS and Unac Neut !SV; significance markers over the three groups: n.s., sig, sig.
Unergatives, neutral context: series Unerg Neut !VS and Unerg Neut SV; significance markers over the three groups: sig, n.s., sig.
43
11. Results: syntax-discourse
¿Quién llegó / gritó? ‘Who arrived / shouted?’
Unaccusatives (VS): Llegó un hombre
Unergatives (VS): Gritó un hombre
[Two bar charts. Y-axis: mean acceptability judgement, from -0.5 to 2. X-axis groups: English upper-intermediate, English advanced, Spanish natives.]
Unaccusatives, focused subject: series Unac Foc VS and Unac Foc !SV; significance markers over the three groups: n.s., n.s., sig.
Unergatives, focused subject: series Unerg Foc VS and Unerg Foc !SV; significance markers over the three groups: sig, n.s., sig.
44
Figure 1: Unaccusative verbs in neutral contexts (word order by group). Series: Unac Neut VS, Unac Neut #SV.
Figure 1: Unergative verbs in neutral contexts (word order by group). Series: Unerg Neut #VS, Unerg Neut SV.
Figure 1: Unaccusative verbs in focused subject contexts (word order by group). Series: Unac Foc VS, Unac Foc #SV.
Figure 1: Unergative verbs in focused subject contexts (word order by group). Series: Unerg Foc VS, Unerg Foc #SV.
[Four bar charts. Y-axis: Mean acceptability rate, from -0.5 to 2. X-axis groups: Gk Upp Int, Gk Low adv, Gk Upp adv, Natives.]
45