Transcript Slide 1

Verb compounds
within canonical typology:
Chinese separable verb compounds
Anna Siewierska
Jiajin Xu
Richard Xiao
1
Overview of the talk
1. Separable verb compounds (SVCs)
2. Canonical typological strategy
3. A case study of SVCs in Mandarin
2
Separable verb compounds
• Some languages have verb compounds which
are made up of two parts, a verbal stem and a
movable element standing before or after the
verb in adjacency or close proximity
– Different terms in the literature
• separable verb compounds, split words, separable
verbs, ionised words, discontinuous / detachable /
breakable / discrete words, etc
3
An example of Chinese SVC
• dan1xin1, lit. carry heart, ‘to worry’
• dan1-le yi1 shang4wu3 xin1, carry ASP one
morning heart, ‘to be worried the whole
morning’
• xin1 yi4zhi2 dan1-zhe, heart all the time carry
ASP, ‘to have been worried all the time’
4
Sound similar?
• Derivation by infixing (e.g. abso-fucking-lutely)
and syntactic interposing (e.g. of bloody
course) in English
• Separable complex verbs in Dutch (aankomen
‘arrive’) and German (ankommen ‘arrive’)
• But Chinese SVCs are …
5
…essentially different
• 1) Insertions in English infixing and interposing
– Almost exclusively restricted to expletives, euphemisms, and amplifiers
– Acting as an ‘emotive intensifier’
• In contrast, discontinuous use of Chinese SVCs has a
greater variety of insertions and discourse / pragmatic
functions
– Insertions as head / tail satellites: aspect markers, RVCs, quantifiers,
classifiers, modifiers, etc
– Providing extra information
– Acting as a mitigator / softener
– Showing casualness
– Expressing negative emotions such as disapproval
– Enhancing rhythm – important in a syllable-timed language like Chinese
6
…essentially different
• 2) A significant difference between SVCs in
Mandarin and the split prefix phenomenon in
Dutch (e.g. binnenkomen, ‘to come in’) and
German (e.g. abfahren, ‘to drive off/depart’)
• Chinese SVCs are not words with a separable
affix
– E.g. dan1xin1 ‘worry’ (dan1 = V, xin1 = O)
7
…essentially different
• 3) SVCs in Dutch and German can have a wide range of
constituents of all types as insertions, including complex
NGs and subordinate clauses as in the example below
– A Dutch example of opbellen ‘ring up’
• Ik bel op
• Ik bel hem op
I ring him up
• Ik bel hem morgen op
I ring him tomorrow up
• Ik bel de man waarvan ik houd op
I ring the man that I love up
• ...which is completely impossible in Chinese
8
Why are SVCs interesting?
• 1) SVCs are a large class of verbs in Chinese which
cannot be marginalised
• 2) They satisfy none of the ‘universal criteria’ for
wordhood (Dixon and Aikhenvald 2002: 19-20)
– ‘A grammatical word consists of a number of grammatical
elements which (a) always occur together, rather than
scattered through the clause (the criterion of cohesiveness);
(b) occur in a fixed order; (c) have a conventionalised
coherence and meaning’
• Criterion (c) means that speakers of the language ‘may talk about a
word (but are unlikely to talk about a morpheme)’
9
Why are SVCs interesting?
• 3) SVCs violate one of the most fundamental
principles of the theory of word formation
– The Principle of Lexical Integrity: Word-internal
structures are not accessible to rules of syntax
(Booij 1990: 45)
• 4) SVCs are listed as words, but they clearly
have some ‘phrasal’ properties, thus straddling
the boundary of morphology and syntax
– E.g. the analysable internal structures of Chinese
SVCs
10
Canonical typology
• To study such fuzzy and cross-border grammatical
categories, canonical typology (CT) has proved to be a
useful strategy (cf. Bond 2007; Corbett 2007;
Nikolaeva 2008), e.g.
– Suppletive forms
– Agreement
– Negation
– Syncretism
– …
11
Standard strategy in typological
research (Croft 2003: 14)
1. Determine the particular structure or situation type
of interest
2. Examine the morpho-syntactic construction(s) or
strategies used to encode that situation type
3. Search for dependencies between the constructions
used for that situation and other linguistic factors
– i.e. other structural features and external functions
expressed by the structure, or both
12
Canonical typological approach
1. Start with a linguistic phenomenon
2. Establish a general definition for identifying
that linguistic category
3. Construct a set of features or criteria for the
typical (canonical) case of the category
4. Use the criteria to investigate the relevant
categories in languages
13
Canonical typological approach
1. Start with a linguistic phenomenon
2. Establish a general definition for identifying
the linguistic category in question
3. Construct a set of features or criteria for the
canonical case of the category
4. Use the criteria to investigate the relevant
categories in languages
14
How can corpora inform CT?
• In CT, the features are usually collected from the
literature
– The collection could be selective, subjective and arbitrary
• Can the selection of features be more objective and
reliable?
– We seek to answer this question from the corpus linguistic
perspective
– The corpus-based approach makes it possible for
variational parameters of SVCs to be summarised
exhaustively and more objectively by looking at a large
amount of attested language use simultaneously
15
A case study of Chinese SVCs
• What are common types of insertions and
external patterns of discontinuous use of
SVCs in Mandarin?
• How can canonical features be identified on
the basis of frequency?
• How can the study of SVCs in Chinese
contribute to the research of similar
phenomena in other languages?
16
Prevalence of SVCs in Mandarin
• The 2002 edition of the Modern Chinese
Dictionary includes 3,236 types of SVCs (Zhu
2006: 29)
– Four categories: verb-object (97%), verb-complement, subject-predicate, and coordinative
• Given their prevalence, no grammar of Chinese
can turn a blind eye to the ‘verb-object
paradox’ (Packard 2003: 108)
17
Corpora
• Two corpora are used in this study
– The Lancaster Corpus of Mandarin Chinese (LCMC) for
written Chinese
– The Lancaster Los Angeles Spoken Chinese Corpus (LLSCC)
for spoken Chinese
• The LCMC is a balanced corpus of written Chinese
composed of one million words proportionally
sampled from fifteen genres, ranging from news and
fiction to academic prose, published in mainland China
around 1991 (see McEnery, Xiao & Mo 2003)
18
Corpora
• The LLSCC comprises one million words of
dialogues (55%) and monologues (45%) in
Chinese, covering both spontaneous (57%) and
scripted (43%) speech in six spoken genres
• The two corpora are also tokenised and POS-tagged
• They provide an empirical basis for our
quantitative and qualitative analysis of SVCs in
Chinese
19
Seed SVCs for data extraction
• A total of 1,738 commonly used SVCs listed in
A Dictionary of Split Word Usage in Modern
Chinese (Yang 1995) were used as seeds to
automatically extract all instances of possible
SVCs exhaustively when their head and tail
are separated, in either forward or backward
direction, by a span of 1-10 words
– 2,793 raw concordance lines were extracted from
the two corpora
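The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the tool actually used in the study: the function name `find_split_svcs`, the representation of each seed as a (head, tail) pair, and the toy tokenisation of the dan1xin1 examples are all assumptions. The slide only states that head and tail are matched in either direction across a span of 1-10 words.

```python
from typing import Iterable, List, Tuple

def find_split_svcs(tokens: List[str],
                    seeds: Iterable[Tuple[str, str]],
                    min_gap: int = 1,
                    max_gap: int = 10) -> List[Tuple[str, str, int, int, str]]:
    """Return candidate discontinuous uses of seed SVCs in one tokenised sentence.

    Each hit is (head, tail, head_index, tail_index, direction), where
    direction is "forward" (head before tail) or "backward" (tail before head).
    """
    hits = []
    for head, tail in seeds:
        head_positions = [i for i, tok in enumerate(tokens) if tok == head]
        tail_positions = [i for i, tok in enumerate(tokens) if tok == tail]
        for h in head_positions:
            for t in tail_positions:
                gap = abs(t - h) - 1  # number of words between head and tail
                if min_gap <= gap <= max_gap:
                    direction = "forward" if h < t else "backward"
                    hits.append((head, tail, h, t, direction))
    return hits

# Toy data modelled on the dan1xin1 slide; the tokenisation is assumed.
seeds = [("dan1", "xin1")]
forward = ["dan1", "le", "yi1", "shang4wu3", "xin1"]
backward = ["xin1", "yi4zhi2", "dan1", "zhe"]

print(find_split_svcs(forward, seeds))   # one forward hit, 3 words inserted
print(find_split_svcs(backward, seeds))  # one backward hit, 1 word inserted
```

Matching on surface forms alone over-generates, since any co-occurrence of the two characters within the window counts as a hit, so the output is only a candidate list that still needs manual checking.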
20
Human evaluation and annotation
• Each concordance line was evaluated independently
by two native Chinese speakers in order to remove
noise in automatically extracted results
• Only 565 true instances of discontinuous use of SVCs
were retained for further annotation and analysis
– Type of insertion, direction of separation, word semantics,
sentence semantics (i.e. pragmatic meaning), sentence
type, genre
21
Syntagmatic pattern of SVCs
SVCH + NEG + ASP/RVC + MC + CL + MOD + SVCT
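One way to read the template above is as an ordered sequence of optional slots between head and tail. The sketch below checks whether a list of insertion labels respects that order. The label names are taken from the slide, but the function name `fits_template`, the treatment of ASP and RVC as sharing one slot, and the decision to allow repeated labels within a slot are simplifying assumptions.

```python
# Slot positions between SVCH and SVCT, following the order on the slide.
# ASP and RVC share a position because the slide writes them as "ASP/RVC".
SLOT_ORDER = {"NEG": 0, "ASP": 1, "RVC": 1, "MC": 2, "CL": 3, "MOD": 4}

def fits_template(insertion_labels):
    """True if the labels between head and tail follow the template's slot order."""
    try:
        ranks = [SLOT_ORDER[label] for label in insertion_labels]
    except KeyError:
        return False  # a label that the template does not mention
    # Every slot is optional; labels must never step back to an earlier slot.
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

print(fits_template(["ASP", "MC", "CL", "MOD"]))  # True
print(fits_template(["CL", "ASP"]))               # False
```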
22
Head satellites of SVCs
• Aspect insertion
Pattern          SVC types (%)   SVC tokens (%)
SVCH-le SVCT     42 (25%)        74 (13%)
SVCH-guo SVCT    15 (9%)         22 (4%)
SVCH-zhe SVCT    12 (7%)         35 (6%)
Total            69 (42%)        131 (23%)
• Expanded aspect insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) ASP (?) SVCT    91 (55%)        244 (43%)
– Note: The ? slot can be filled or left blank
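The token percentages for aspect insertion can be rechecked with a few lines of arithmetic. The sketch assumes that token percentages are computed against the 565 retained instances of discontinuous use. That denominator is an inference from the earlier slide, not something the tables state, and the helper name `pct` is illustrative.

```python
TOTAL_TOKENS = 565  # assumed denominator: retained discontinuous instances

# Token counts per aspect marker, as given in the aspect insertion table.
aspect_tokens = {"le": 74, "guo": 22, "zhe": 35}

def pct(count, total=TOTAL_TOKENS):
    """Share of the total as a percentage, rounded to the nearest integer."""
    return round(100 * count / total)

for marker, count in aspect_tokens.items():
    print(f"SVCH-{marker} SVCT: {count} ({pct(count)}%)")

subtotal = sum(aspect_tokens.values())
print(f"Total: {subtotal} ({pct(subtotal)}%)")
print(f"Expanded aspect insertion: 244 ({pct(244)}%)")
```

Under this assumption the rounded values reproduce the token column: 13%, 4%, 6%, 23% and 43%.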
23
Head satellites of SVCs
• RVC insertion
Pattern          SVC types (%)   SVC tokens (%)
SVCH RVC SVCT    20 (12%)        26 (5%)
• Expanded RVC insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) RVC (?) SVCT    20 (12%)        66 (12%)
– …hardly surprising given that RVCs can be analysed as
markers of the “completive aspect” in Chinese (Xiao and
McEnery 2004)
24
Tail satellites of SVCs
• Classifier (CL)
– 21% (116 SVCs) contain a classifier
• Nominals in Mandarin are typically preceded by a classifier
• Quantifier (MC)
– 19% (108 SVCs) contain a quantifying construction
• Modifier (MOD), i.e. pre-modifiers of tails
– Possessive pronouns (64 times, 11%)
– Adjectival modifiers (63 times, 11%)
– Nominal items (59 times, 10%)
– Question word (i.e. shen2me ‘what’, 26 times, 5%)
– Also combinations of these elements
25
SVC network:
Lexical and grammatical patterning
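[Figure: network diagram linking SVCH and SVCT through optional head satellites (aspect marker ±, resultative verb complement ±) and tail satellites (quantifier, classifier, modifier); node labels: LE, DA, GE, HAO, YI]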
26
Words or phrases?
• Synchronically, located somewhere on the continuum
between words and phrases (cf. Guo and Qian 2004)
[Figure: continuum, in order: words, SVCs, idioms, phrases]
• Diachronically, wordhood subject to language change
– Many compound words in current use have evolved from
phrases (e.g. daoqian ‘apologise’, jugong ‘bow’)
– Givón (1971): ‘Today's morphology is yesterday's syntax.’
• Two criteria, depending on the type and number
of morphemes in the insertion
– Over half of discontinuous use of SVCs in our data (i.e. 54% if RVCs are
seen as quasi-aspect markers), together with their combined cognates,
can be analysed as legitimate compound words
27
Two overarching criteria
• Structural criteria
– Host dependency
• Head dependence enjoys priority over tail dependence
• Phonological criteria
– PrWd restriction (Feng 2001, 2002)
• A disyllabic unit is the typical prosodic foot in Chinese
• A trisyllabic unit can also be a prosodic word
28
Structural criteria
• According to the host dependency criterion of
the canonical typological approach
– a) SVCs with a clitic-like aspect marker alone are
compounds rather than phrases
– b) SVCs with an RVC attached to the head verb are
quasi-compounds
– c) SVCs with other elements (classifiers, modifiers, etc.)
attached to the tail (typically an
object or complement) are the least likely to be
compounds
• Priority: a > b > c
29
Phonological criteria
• Various manifestations of SVCs define a continuum of
phonological conditions which complement the
structural criteria
– a) The combined uses of head and tail are disyllabic
compounds
– b) SVCs in which the head and tail are separated by one
single morpheme are possible compounds under the
Trisyllabic Foot Rule (TFR) of prosodic morphology
(McCarthy & Prince 1993; 1995)
– c) The head and tail separated by polymorphemic
insertions like quantifiers, adjectival modifiers etc are
phrases
• Priority: a > b > c
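The structural criteria on the previous slide and the phonological criteria above can be read as two small decision procedures. The sketch below is one possible operationalisation, not the authors' implementation. The function names, the insertion labels (ASP, RVC, MC, CL, MOD), the rule that a mixed insertion receives the lowest-ranked applicable status, and the assumption that head and tail are each monosyllabic are all illustrative choices.

```python
TAIL_SATELLITES = {"MC", "CL", "MOD"}  # quantifier, classifier, modifier

def structural_status(insertion_labels):
    """Structural criterion: 'a' compound, 'b' quasi-compound, 'c' least likely compound.

    Returns None when the insertion is empty or not covered by criteria a-c.
    Assumption: if several kinds of element are inserted, the lowest rank wins.
    """
    labels = set(insertion_labels)
    if labels & TAIL_SATELLITES:
        return "c"
    if "RVC" in labels:
        return "b"
    if labels == {"ASP"}:
        return "a"
    return None

def phonological_status(inserted_morphemes):
    """Phonological criterion, assuming a monosyllabic head and tail.

    'a' = contiguous disyllabic compound, 'b' = one inserted morpheme
    (trisyllabic, possible compound), 'c' = polymorphemic insertion (phrase).
    """
    n = len(inserted_morphemes)
    if n == 0:
        return "a"
    if n == 1:
        return "b"
    return "c"

# Constructed examples built on dan1xin1; the labels are assigned by hand.
print(structural_status(["ASP"]), phonological_status(["le"]))
print(structural_status(["ASP", "MC", "MOD"]),
      phonological_status(["le", "yi1", "shang4", "wu3"]))
```

Keeping the two criteria as separate functions makes it easy to see where they agree and where a single inserted morpheme is ranked differently by structure and by prosody.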
30
Conclusions
• We have used the corpus-based approach to
generalise canonical internal structures of
Chinese SVCs
• The structural and phonological criteria we
have proposed work well to define wordhood
of SVCs in Mandarin
• The approach combining canonical typology
and corpus methodology could also be useful
in research on similar phenomena in other
languages
31
Thank you!
32