Lexical Typology


Text-based typology
Corpora, corpora of elicited texts and parallel corpora
(based on STUF 2007)
МД

Pros as compared to questionnaires
Contextualization of examples
Naturalistic discourse
Intralinguistic variation
Potentially compensates for gaps in grammars

Frog stories
(Mercer Mayer), e.g. Slobin on Talmy’s verb-framed vs. satellite-framed languages

Fish movie
Russell Tomlin: attention and word order

Pear stories
W. Chafe et al.
A six-minute film shot at UC Berkeley in 1975
Widely used in cross-linguistic research
  referential density project

Referential density (Bickel 2003)
Relative frequency of overt NPs (via Nichols 2014)
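To make the measure concrete, here is a minimal sketch, assuming referential density is operationalized as the share of available argument positions that are filled by an overt NP. The `Clause` record, its field names and the toy narrative are hypothetical and are not taken from Bickel's data.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    argument_slots: int  # argument positions licensed by the predicate (assumed annotation)
    overt_nps: int       # how many of those positions are filled by an overt NP


def referential_density(clauses):
    """Overt NPs divided by available argument positions, over a whole text."""
    slots = sum(c.argument_slots for c in clauses)
    if slots == 0:
        raise ValueError("no argument positions in the sample")
    overt = sum(c.overt_nps for c in clauses)
    return overt / slots


# Invented four-clause narrative, for illustration only.
toy_narrative = [Clause(2, 1), Clause(1, 1), Clause(2, 0), Clause(3, 2)]
rd = referential_density(toy_narrative)
print(f"referential density = {rd:.2f}")
```

Computing the ratio over the whole text rather than averaging per clause keeps long clauses from being under-weighted; either choice would need to be checked against the original study.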

Cons of elicited corpora
Not directly comparable
  different events are focused on or omitted
  mostly quantitative results
Require massive linguistic effort
  limited data for each language
Any alternative?
→ Parallel corpora

Some examples of parallel corpora
Ruscorpora.ru
Massively parallel texts
  Harry Potter
    Including subtitles (76, 21)
  Biblical translations
    Pater Noster in 1300 languages, 400 full texts, 1,000 gospels
  Marxist texts
    State and Revolution: 71 translations in 36 languages
  Legal databases:
    Proceedings of the European Parliament
    Universal Declaration of Human Rights (329)
  UNESCO online database of literary translations (1.5 million items)
    Andersen, Le Petit Prince, …
Cysouw and Wälchli 2007

Comparability (easy counts)
Parallel corpora:
  roughly comparable number of sentences (between 1,528 and 1,663 for Le Petit Prince)
Elicited texts:
  pear stories in the same language vary from 29 to 119 sentences (Bickel 2003 via Nichols 2014)
‘Free’ corpora:
  not applicable…

Comparability (methodology)
Comparison by intension
  definition of a phenomenon
  browsing grammars
Comparison by extension
  linguistic structures used for expressing a contextualized situation
  truly functional
Wälchli 2007

Extensional typology in parallel corpora
data we work with may be linguistically different but semantically identical
  cf. much looser identity in elicited texts
  rather, they are “defined as a selection of places in the parallel texts”
they may reflect linguistic variation
  at points where one language uses the same construction, another language uses several
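A minimal sketch of comparison by extension, assuming a domain is stored as a list of aligned places and each language is annotated with one construction label per place. The place IDs, language names and labels below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical domain: a selection of aligned places in the parallel texts.
domain = ["p1", "p2", "p3", "p4"]

# Hypothetical annotation: which construction each language uses at each place.
annotations = {
    "lang_A": {"p1": "X", "p2": "X", "p3": "X", "p4": "X"},
    "lang_B": {"p1": "Y1", "p2": "Y1", "p3": "Y2", "p4": "Y2"},
}


def split_profile(source, target, places, data):
    """For each construction of `source`, collect the `target` constructions
    found at the same places."""
    profile = defaultdict(set)
    for place in places:
        profile[data[source][place]].add(data[target][place])
    return dict(profile)


a_to_b = split_profile("lang_A", "lang_B", domain, annotations)
b_to_a = split_profile("lang_B", "lang_A", domain, annotations)
print(a_to_b)
print(b_to_a)
```

In this toy case the single construction of `lang_A` corresponds to two constructions in `lang_B`, which is the kind of one-to-many split the slide describes.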

Parallel corpora support conventional typology
Newmeyer against Stassen
  Classical Greek, Latin and Tibetan have the ‘exceed’ type comparative, contra Stassen 1985
Wälchli supports Stassen
  A study of parallel corpora shows not the ‘exceed’ but the ‘separative’ construction
Parallel corpora reflect dominant patterns, which is exactly where typology’s primary interest lies
But they also numerically reflect variation or competition between dominant patterns, rather than providing a yes-or-no typology

Case studies, among others:
Wälchli 2005: co-compounds
van der Auwera et al. 2004: epistemic possibility in Slavic
Wälchli 2006: ‘again’
Wälchli 2001: motion events
Wälchli & Zúñiga 2006: motion events ‘again’
Stolz 2004: total reduplication
Stolz et al. 2005: comitatives and instrumentals
Stolz et al.: absolute possessives

Stolz 2003, 2004
Le Petit Prince: quantitative
‘avec’-(de)cline
Total-reduplication-(de)cline
Does this require parallel corpora?

Stolz 2003, 2004
Le Petit Prince: qualitative?
Puis il s’épongea le front avec un mouchoir à carreaux rouges. (French)
Then he mopped his forehead with a handkerchief decorated with red squares. (English)
Zatim obriše čelo rupčićem s crvenim kvadratima. (Croatian)

Pitfalls: data analysis
Easier than raw texts
  we know what was intended and where to look
  still, like any grammatical analysis by a non-expert, subject to mistakes
Alignment issues
Anyway, same or easier than with elicited texts
Wälchli 2007

Pitfalls: sample bias
Europe overrepresented, convenience sampling:
  Europe > Indo-European > other families
In his study of comitatives, Stolz ended up with an areal study rather than a sample-based one

Pitfalls: style/variant choice
Standard language bias
‘Hagiolect’ effects
  ‘The sinners will-Evid not enter the heaven’
Style incomparability
  Better include texts reporting speech
  Bible translations are stylistically diverse
Purism
Wälchli 2007

Pitfalls: translation bias
“Incommensurability” of linguistic structures: some languages think differently…
  Australian languages prefer an absolute over a relative frame of reference
  In Australian Gospels, occurrences of the absolute frame of reference (AFR) are found, but they are significantly less frequent than in natural discourse from this area
“Inert” construction: a construction that tends to be imported from the source language
Wälchli 2007

Case study: MVC in ‘bring’ and ‘run’ events
Bible-based, Bernhard Wälchli
Multi-verb construction (MVC): a clause that contains more than one lexical verb
BRING and RUN events may be expressed by an MVC or by “solitarizing” verbs

BRING and RUN events (Wälchli)
Examples:

Minnin  ti-bouay    la   ban   mouin.  (Haitian Creole)
lead    little-boy  def  give  I

Ač-i-ne            man    pat-ăm-a        il-se      kil-ĕr.      (Chuvash)
child-ps3-dat/acc  I.gen  to-poss1sg-dat  take-conv  come-imp2pl

‘… bring him unto me.’ (solitarizing)
Data usually unavailable from grammars…

BRING and RUN events (Wälchli)
Does the choice of construction (MVC or solitarizing) for one event correlate with the choice for the other?

BRING and RUN events (Wälchli)

             BRING: Solit               BRING: MVC
RUN: Solit   Dinka, Navajo, Russian     Ainu, Ewe, Khasi
RUN: MVC     English, Guarani, Maltese  Choctaw, Chuvash, Khoekhoe

BRING and RUN events (Wälchli)
165 languages (Eurasia over-represented)
18 BRING events, six RUN events
Correlation between MVC in BRING and RUN is highly significant (Fisher’s exact test)

             BRING: Solit   BRING: MVC
RUN: Solit   65             12
RUN: MVC     46             42
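The test can be rerun from the four counts on this slide. The sketch below is a plain-Python two-sided Fisher's exact test built on the hypergeometric distribution; it is not Wälchli's own script. It assumes that rows are RUN and columns are BRING, although the p-value is the same if the table is transposed. The function name is ours.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts from the slide (assumed layout: rows = RUN Solit/MVC, columns = BRING Solit/MVC).
a, b, c, d = 65, 12, 46, 42
p_value = fisher_exact_two_sided(a, b, c, d)
odds_ratio = (a * d) / (b * c)
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.2g}")
```

The odds ratio well above 1 reflects that languages tend to make the same choice (solitarizing or MVC) for both event types.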

BRING and RUN events (Wälchli)
Is a language consistently MVC or consistently solitarizing?
Surely not. Then is this a typological parameter at all?

BRING and RUN events (Wälchli)
But: the distribution is bimodal

BRING and RUN events (Wälchli)
If we consider only LOW and HIGH, fewer languages (14) are inconsistent
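The slides do not say how LOW and HIGH are defined. The sketch below assumes they are bands of the per-language share of MVC among the sampled events, with hypothetical cut-offs at 1/3 and 2/3. A language counts as inconsistent when one event type falls in LOW and the other in HIGH.

```python
def band(mvc_count, total, low=1 / 3, high=2 / 3):
    """Assign a language to LOW / MID / HIGH by its share of MVC encodings.
    The cut-offs are illustrative assumptions, not values from the study."""
    share = mvc_count / total
    if share <= low:
        return "LOW"
    if share >= high:
        return "HIGH"
    return "MID"


def is_inconsistent(bring_band, run_band):
    """Inconsistent = LOW for one event type and HIGH for the other."""
    return {bring_band, run_band} == {"LOW", "HIGH"}


# Invented language: MVC in 2 of the 18 BRING events and 5 of the 6 RUN events.
bring_band = band(2, 18)
run_band = band(5, 6)
print(bring_band, run_band, is_inconsistent(bring_band, run_band))
```

Languages in the MID band for either event type are simply left out of the consistency count under this reading.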

Case study: demonstratives
Deictic demonstratives (then adverbs), Harry Potter-based, Federica da Milano 2007
Distance-oriented systems
  this near – that far
Person-oriented systems
  this with us – that far from us
Is this a real distinction, or are these two subtypes of something more general?

Demonstratives (da Milano)
48 stimuli (da Milano 2005)
  Also include reciprocal orientation of the interlocutors: face to face, face to back, side by side
83 occurrences of deictic demonstratives in “… and the Chamber of Secrets”

Demonstratives (da Milano)
One-term systems:
French: cela, ça (ceci not used)
German: der/die/das (dieser, jener not used)

Demonstratives (da Milano)
Two-term systems:
Unmarked vs. proximal: Scandinavian, English, Northern Italian
Unmarked vs. distal: Polish, Russian, Czech, Hungarian, Modern Greek
Dyad-oriented: Catalan

Demonstratives (da Milano)
Which demonstrative is used in ‘neutral’ contexts?
‘Tie that round the bars,’ said Fred, throwing the end of a rope to Harry. (English)
‘Przywiąż to do kraty’, powiedział Fred, rzucając Harry’emu koniec liny. (Polish)

Demonstratives (da Milano)
da Milano then builds a similar typology for adverbs; her conclusions are as follows:
  The map of adverbs is by and large isomorphic to the map of pronouns
  Levinson 2004 is confirmed: “perhaps one can hazard the generalization that speaker-centered degrees of distance are usually (more) fully represented in the adverbs than the pronominals”
  “It has turned out to be fruitful to use parallel texts as a control test of data obtained through the questionnaire. The results from the parallel texts mainly confirmed the prior typological generalizations.”

‘Free’ corpora!
No translations: no risk of inert categories, closer to naturalistic discourse
Massive amounts of text
  Usually literary
Vast playground for quantitative analysis

‘Free’ corpora!
Examples:
Combinatorial statistics for property words
  Lexical typology by LexTyp
Comparative occurrences
  May be useful, cf. temperature domain
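One way to read “combinatorial statistics” is as counts of which nouns each property word combines with in a corpus. The sketch below assumes adjective-noun pairs have already been extracted; the English pairs are invented toy data, and the Jaccard overlap is just one possible summary, not a method attributed to LexTyp.

```python
from collections import Counter, defaultdict

# Invented (property word, head noun) pairs standing in for corpus extractions.
pairs = [
    ("hot", "tea"), ("hot", "tea"), ("hot", "day"),
    ("warm", "day"), ("warm", "coat"),
    ("cold", "day"), ("cold", "tea"),
]


def noun_profiles(pairs):
    """Count, for every property word, the nouns it combines with."""
    profiles = defaultdict(Counter)
    for adjective, noun in pairs:
        profiles[adjective][noun] += 1
    return profiles


def jaccard(profiles, a, b):
    """Overlap of the noun sets of two property words (0 = disjoint, 1 = identical)."""
    nouns_a, nouns_b = set(profiles[a]), set(profiles[b])
    return len(nouns_a & nouns_b) / len(nouns_a | nouns_b)


profiles = noun_profiles(pairs)
print(profiles["hot"].most_common())
print(jaccard(profiles, "hot", "cold"), jaccard(profiles, "hot", "warm"))
```

Comparing such profiles across languages is where the shift towards intensional typology comes in: the words compared have to be picked by definition, since there is no shared text to align them.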

Temperature scale (Armenian)
t‘ež, šog, tak’, ĵerm, gol, paġ, saŕǝ, zov, hov, c’urt

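Temperature scale (Russian)
раскаленный ‘red-hot’, жаркий ‘hot (of weather)’, горячий ‘hot (to the touch)’, теплый ‘warm’, прохладный ‘cool’, студеный ‘very cold’, холодный ‘cold’, ледяной ‘icy’, студеный ‘very cold’, зябкий ‘chilly’
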
Comparison: texts in typology
Free corpora:
  No ‘meaning’ identity
  Massive collections: almost all kinds of phenomena
  But a shift towards intensional typology
  Natural discourse
Elicited texts:
  Weak ‘meaning’ identity
  Massive effort for transcription, poor collections
  Only frequent phenomena
  Natural discourse (with provisos)
Parallel corpora:
  Strong ‘meaning’ identity
  Natural written discourse (with provisos)

Summary (obvious):
Corpora have their limitations and cannot substitute for conventional methods, but they can go hand in hand with them