Chinese splitable verb compounds


Verb compounds within canonical typology:
Chinese splitable verb compounds
Anna Siewierska
Jiajin Xu
Richard Xiao
Overview of the talk
1. Separable verb compounds (SVCs)
2. Canonical typological strategies
3. A case study of SVCs in Mandarin
Separable verb compounds
• Some languages have verb compounds which
are made up of two parts, a verbal stem and a
movable element standing before or after the
verb in adjacency or close proximity
– Also known as split words, separable verbs, separable verb-object
compounds, ionised words, and discontinuous /
detachable / breakable / discrete words
An example of Chinese SVC
• dan1xin1, carry heart, ‘to worry’
• dan1-le yi1 shang4wu3 xin1, carry ASP one
morning heart, ‘to be worried the whole
morning’
• xin1 yi4zhi2 dan1-zhe, heart all the time carry
ASP, ‘to have been worried all the time’
Sound familiar?
• Derivation by infixing (abso-fucking-lutely) and
syntactic interposing (of bloody course) in
English
• Separable complex verbs in Dutch (aankomen
‘arrive’) and German (ankommen ‘arrive’)
• But Chinese SVCs are …
…essentially different
• Insertions in English infixing and interposing
– Almost exclusively restricted to expletives, euphemisms and amplifiers
– Acting as an ‘emotive intensifier’
• Discontinuous use of Chinese SVCs has a greater variety
of insertions and pragmatic functions
– Insertions as head / tail satellites: aspect markers, resultative verb
complements (RVCs), quantifiers, modifiers, etc.
– Providing extra information
– Acting as a mitigator or softener
– Showing casualness
– Expressing negative emotions such as disapproval
– Enhancing rhythm
…essentially different
• Significant difference between SVCs in Mandarin and
the split prefix phenomenon in Dutch and German
– German (e.g. abfahren, ‘to drive off/depart’) and Dutch (e.g.
binnenkomen, ‘to come in’)
– Structurally similar to Chinese SVCs of the verb-complement type
(e.g. kanzhun ‘observe or judge accurately’)
• Chinese SVCs are not words with a separable affix
– Chinese: dan1xin1 ‘worry’ (dan1 = carry = V, xin1 = heart = O)
…essentially different
• SVCs in Dutch and German can have a wide range of
constituents of all types, including complex NPs and
subordinate clauses as insertions in between, which is
completely impossible in Mandarin
– A Dutch example of opbellen ‘ring up’
• Ik bel op
• Ik bel hem op
I ring him up
• Ik bel hem morgen op
I ring him tomorrow up
• Ik bel de man waarvan ik houd op
I ring the man that I love up
Why are SVCs interesting?
• SVCs are a large class of verbs in Mandarin Chinese
which cannot be marginalised
• They satisfy none of the ‘universal criteria’ for
wordhood (Dixon and Aikhenvald 2002: 19-20)
– ‘A grammatical word consists of a number of grammatical
elements which (a) always occur together, rather than
scattered through the clause (the criterion of cohesiveness);
(b) occur in a fixed order; (c) have a conventionalised
coherence and meaning’
• Criterion (c) means that as a word has its own coherence and
meaning, speakers of the language “may talk about a word (but are
unlikely to talk about a morpheme)”
Why are SVCs interesting?
• They violate one of the most fundamental principles of
the lexicalist theory of word formation
– The Principle of Lexical Integrity: Word-internal structures
are not accessible to rules of syntax (Booij 1990: 45)
• SVCs are listed as words, but they apparently have
some “phrasal” properties, straddling the boundary
of morphology and syntax
– E.g. the analysable internal structures of Chinese SVCs
Canonical typological strategies
• To study such fuzzy and cross-border grammatical
categories, canonical typology has proved to be a
useful strategy (cf. Bond 2007; Corbett 2007;
Nikolaeva 2008), e.g.
– Suppletive forms
– Agreement
– Negation
– Syncretism
– …
Standard strategy in typological
research (Croft 2003: 14)
1. Determine the particular structure or situation type
we want to explore
2. Examine the morpho-syntactic construction or
strategies used to encode that situation type
3. Search for dependencies between the structures
used for that situation and other linguistic factors
– i.e. other structural features and external
functions expressed by the construction in
question, or both
Canonical typology (CT) approach
1. Starting with a linguistic phenomenon
2. Establishing a general definition for
identifying the linguistic category in question
3. Constructing a set of features or criteria for
the canonical case of the category
4. Investigating the categories in languages
against the criteria
How can corpora inform CT?
• Usually the features are summarised from the literature
• The collection of features could be selective and
arbitrary
• Can the selection of features be more objective and
reliable?
• We seek to answer this question from a corpus
linguistics perspective
– The corpus-based approach makes it possible for
variational parameters of SVCs to be summarised by
looking at a large amount of attested language use
simultaneously
A case study of Chinese SVCs
• What are common types of insertions and
external patterns of discontinuous use of
SVCs in Mandarin?
• How can typical (canonical) features be
identified on a frequency basis?
• How can the study of Chinese SVCs
contribute to the study of similar phenomena
in other languages?
Prevalence of SVCs in Mandarin
• The 2002 edition of the Modern Chinese Dictionary
includes 3,236 types of SVCs (Zhu 2006: 29)
– Four categories: verb-object, verb-complement, subject-predicate, and coordinative
– Verb-object compounds constitute 97%
• The other three categories combined make up the remaining 3%
• No grammar of Mandarin can turn a blind eye to the
“verb-object paradox” (Packard 2003: 108)
– SVCs are by no means a marginal morphological
phenomenon
Corpora
• Two corpora are used in this study
– The Lancaster Corpus of Mandarin Chinese (LCMC) for
written Chinese
– The Lancaster Los Angeles Spoken Chinese Corpus (LLSCC)
for spoken Chinese
• The LCMC is a balanced corpus of written Chinese
composed of one million words proportionally
sampled from fifteen genres, ranging from news and
fiction to academic prose, published in mainland China
around 1991 (see McEnery, Xiao & Mo 2003)
Corpora
• The LLSCC comprises one million words of
dialogues (55%) and monologues (45%) in
Chinese, covering both spontaneous (57%) and
scripted (43%) speech
• Both corpora are tokenised and
annotated with part-of-speech tags
• They form the empirical basis for our
quantitative and qualitative analysis of SVCs in
Mandarin
Seed SVCs for data extraction
• First, a total of 1,738 commonly used split words
(SVCs) listed in A Dictionary of Split Word Usage in
Modern Chinese (Yang 1995) were used as seeds to
automatically extract all possible instances of SVCs
whose two parts are separated, in either forward
or backward order, by 1-10 words
– 2,793 concordance lines of SVCs were returned from the 2 million
words of spoken and written data
• 1,348 crude (unverified) instances in the LCMC
• 1,445 crude (unverified) instances in the LLSCC
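The extraction step above can be sketched roughly as follows. This is only an illustration, not the script used in the study: the function name `find_split_candidates`, the representation of each seed as a (head, tail) pair, and the romanised tokens are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

Seed = Tuple[str, str]  # hypothetical representation: (head, tail) of a seed SVC


def find_split_candidates(
    tokens: List[str],
    seeds: List[Seed],
    min_gap: int = 1,
    max_gap: int = 10,
) -> Iterator[Tuple[Seed, str, int, int]]:
    """Yield (seed, direction, head_index, tail_index) for every candidate hit.

    A hit is any pair of positions where the two parts of a seed are
    separated by min_gap..max_gap intervening tokens, in either order.
    """
    # Index every token position once so each seed lookup is cheap.
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)

    for head, tail in seeds:
        for h in positions.get(head, []):
            for t in positions.get(tail, []):
                gap = abs(t - h) - 1  # number of tokens between the two parts
                if min_gap <= gap <= max_gap:
                    direction = "forward" if t > h else "backward"
                    yield (head, tail), direction, h, t


seeds = [("dan1", "xin1")]
forward = ["dan1", "le", "yi1", "shang4wu3", "xin1"]
backward = ["xin1", "yi4zhi2", "dan1", "zhe"]
print(list(find_split_candidates(forward, seeds)))
print(list(find_split_candidates(backward, seeds)))
```

Matching on form alone over-generates, which is why the hits at this stage are only crude candidates that still need manual checking.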
Human evaluation and annotation
• Each concordance line was evaluated independently
by two native Chinese speakers to remove noise in
automatically extracted results
• Only 565 true instances of discontinuous use of SVCs
were retained for further annotation and analysis
– Annotated for type of insertion, direction of separation, word semantics,
sentence semantics (i.e. pragmatic meaning), sentence
type, genre
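For scale, the reported counts imply that roughly one in five automatically extracted lines survived manual checking. A minimal sketch of that arithmetic, using only the figures given on the slides (the variable names are illustrative):

```python
# Counts reported on the slides
crude_hits = {"LCMC": 1348, "LLSCC": 1445}
true_instances = 565

total_crude = sum(crude_hits.values())
retention_rate = true_instances / total_crude

print(total_crude)              # 2793
print(f"{retention_rate:.1%}")  # 20.2%
```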
Syntagmatic pattern of SVCs
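SVCH + NEG + ASP/RVC + MC + CL + MOD + SVCT
(SVCH = SVC head; NEG = negation; ASP = aspect marker; RVC = resultative verb
complement; MC = quantifier; CL = classifier; MOD = modifier; SVCT = SVC tail)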
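One way to make this template operational is to check whether the tags of an insertion respect its slot order. The sketch below rests on several assumptions that the slides do not state: every slot between head and tail is optional, ASP and RVC compete for a single slot, MOD may repeat, and the tag names are simply the abbreviations used in the template.

```python
import re
from typing import List

# Slot order between head and tail; each slot is treated as optional (assumption).
SLOT_ORDER = re.compile(r"(NEG )?((ASP|RVC) )?(MC )?(CL )?(MOD )*")


def fits_template(insertion_tags: List[str]) -> bool:
    """Return True if a non-empty insertion follows the slot order of the template."""
    if not insertion_tags:
        return False  # continuous use: nothing stands between head and tail
    sequence = "".join(tag + " " for tag in insertion_tags)
    return SLOT_ORDER.fullmatch(sequence) is not None


print(fits_template(["ASP"]))                     # True
print(fits_template(["ASP", "MC", "CL", "MOD"]))  # True
print(fits_template(["CL", "ASP"]))               # False
```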
Head satellites of SVCs
• Aspect insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH-le SVCT             42 (25%)        74 (13%)
SVCH-guo SVCT            15 (9%)         22 (4%)
SVCH-zhe SVCT            12 (7%)         35 (6%)
Total                    69 (42%)        131 (23%)
• Expandable aspect insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) ASP (?) SVCT    91 (55%)        244 (43%)
– Note: The ? slot can be filled or left blank
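The token percentages above are consistent with a denominator of 565, the number of validated discontinuous instances. Assuming that is indeed the base (the slides do not say so explicitly), the shares can be recomputed as in this short sketch:

```python
TOTAL_TOKENS = 565  # assumed denominator: validated discontinuous instances

token_counts = {
    "SVCH-le SVCT": 74,
    "SVCH-guo SVCT": 22,
    "SVCH-zhe SVCT": 35,
    "SVCH (?) ASP (?) SVCT": 244,
}


def share(count: int, total: int = TOTAL_TOKENS) -> int:
    """Percentage of all validated tokens, rounded to a whole number."""
    return round(100 * count / total)


shares = {pattern: share(n) for pattern, n in token_counts.items()}
for pattern, pct in shares.items():
    print(f"{pattern}: {pct}%")
```

The base for the type percentages is not given on the slides, so it is left out here.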
Head satellites of SVCs
• RVC insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH RVC SVCT            20 (12%)        26 (5%)
• Expandable RVC insertion
Pattern                  SVC types (%)   SVC tokens (%)
SVCH (?) RVC (?) SVCT    20 (12%)        66 (12%)
– …hardly surprising given that RVCs can be analysed as
markers of the “completive aspect” in Chinese (Xiao and
McEnery 2004)
Tail satellites of SVCs
• Classifier (CL)
– Nominals in Mandarin are typically preceded by a classifier
– 116 SVCs (21%) contain a classifier
• Quantifier (MC)
– There are 108 instances of quantifying constructions in the
insertions (19%)
• Modifier (MOD), i.e. pre-modifiers of tails
– Possessive personal pronouns (64 times, 11%)
– Adjectival modifiers (63 times, 11%)
– Nominal items (59 times, 10%)
– Question word (i.e. shen2me ‘what’; 26 times, 5%)
– Also combinations of these elements
SVC network: Lexical and grammatical patterning
[Figure: network diagram connecting SVCH to SVCT via the nodes Aspect marker (±), Resultative verb complement (±), Modifier, Classifier and Quantifier, with the example items LE, HAO, DA, GE and YI]
Words or phrases?
• The answer depends on the type and number of morphemes in
the insertion
– Over half (54% if RVCs are seen as quasi-aspect markers) of
discontinuous use of SVCs, together with their continuous cognates,
can be analysed as legitimate (compound) words
• Continuum between words and phrases (Guo and
Qian 2004)
– Scale: words, SVCs, idioms, phrases
• Language change (e.g. daoqian ‘apologise’, jugong
‘bow’)
• Many compound words in current use have evolved
from phrases
– Givón (1971): "Today's morphology is yesterday's syntax"
Two overarching criteria for wordhood
• Structural criteria
– Host dependency
• Priority: head dependence > tail dependence
• Phonological criteria
– PrWd restriction (Feng 2001, 2002)
• A disyllabic unit is the typical prosodic foot in Mandarin
• A trisyllabic construction can also be a prosodic word
Structural criteria
• The host dependency criteria of the canonical
approach
– a) SVCs with a clitic-like aspect marker (e.g. -le,
-guo and -zhe) alone are compounds rather than
phrases
– b) SVCs with resultative verb complements
attached to the main verb are quasi-compounds
– c) SVCs with other satellites (classifiers, modifiers, etc.)
attached to SVCTs, which are typically nouns
or complements, are at least possibly compounds
• Priority: a > b > c
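The structural criteria can be read as a small decision procedure over the categories found in an insertion. The sketch below is one possible reading, not the authors' implementation: the function name, the category tags, and the treatment of mixed insertions (any tail satellite places the instance under c, and an RVC with or without an aspect marker places it under b) are assumptions.

```python
from typing import List


def structural_status(categories: List[str]) -> str:
    """Classify a discontinuous SVC by the categories of its inserted elements."""
    if not categories:
        raise ValueError("continuous use: there is no insertion to classify")
    if all(c == "ASP" for c in categories):
        return "compound"           # criterion a: aspect marker alone
    if all(c in ("ASP", "RVC") for c in categories):
        return "quasi-compound"     # criterion b: RVC on the head
    return "possible compound"      # criterion c: satellites on the tail


print(structural_status(["ASP"]))               # compound
print(structural_status(["RVC"]))               # quasi-compound
print(structural_status(["ASP", "MC", "MOD"]))  # possible compound
```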
Phonological criteria
• Various manifestations of SVCs define a continuum of
phonological conditions as a complement to the
structural criteria
– a) The combined uses of SVCHs and SVCTs are disyllabic
compounds
– b) SVCs in which the SVCHs and SVCTs are separated by one
single morpheme are possible compounds under the
Trisyllabic Foot Rule of prosodic morphology (McCarthy &
Prince 1993; 1995)
– c) SVCs in which the SVCHs and SVCTs are separated by polymorphemic
insertions such as quantifiers, adjectival modifiers, etc. are phrases
• Priority: a > b > c
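The phonological continuum can likewise be sketched as a function of how many morphemes separate head and tail. The function name and return labels are illustrative, and the sketch assumes a monosyllabic head and tail with one syllable per inserted morpheme, so that a single-morpheme insertion yields a trisyllabic unit.

```python
from typing import List


def phonological_status(insertion_morphemes: List[str]) -> str:
    """Classify an SVC by the number of morphemes between head and tail."""
    n = len(insertion_morphemes)
    if n == 0:
        return "disyllabic compound"  # criterion a: head and tail adjacent
    if n == 1:
        return "possible compound"    # criterion b: trisyllabic unit
    return "phrase"                   # criterion c: polymorphemic insertion


print(phonological_status([]))                                # dan1xin1
print(phonological_status(["zhe"]))                           # dan1-zhe xin1
print(phonological_status(["le", "yi1", "shang4", "wu3"]))    # dan1-le yi1 shang4wu3 xin1
```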
Conclusions
• We have used the corpus methodology to generalise
canonical internal structures between heads and tails
in Chinese SVCs
• The two overarching criteria we have proposed,
structural and phonological, work well to define
the wordhood of SVCs in Chinese
• The approach combining canonical typology and
corpus methodology can be useful in research on
similar phenomena in European languages like
German and Dutch, as well as in East and Southeast Asian
languages
Thank you!