A Comparable Corpus Driven,
Multivariate Approach to Light Verb Variations
in World Chineses
Jingxia LIN (2), Menghan JIANG (1), and Chu-Ren HUANG (1)
(1) The Hong Kong Polytechnic University, (2) Nanyang Technological University
Light verbs in Chinese
Similar to English light verbs: take a rest, give advice, give a description
• Semantically bleached: containing no eventive information
• The predicative content mainly comes from the complement it takes
進行討論 jin4xing2 tao3lun4 ‘have a discussion’
• Being semantically bleached, they do not strongly select their objects
• They can take a wide range of objects, including deverbal nouns, eventive nouns, and sometimes concrete nouns with eventive meaning
• They are sometimes interchangeable with the same nominal object
Underspecified Selectional Restriction of
Chinese Light Verbs
• 從事cong2shi4,搞gao3, 加以jia1yi3, 進行jin4xing2, 做zuo4 are
among the most frequently used (also most typical) light verbs in
Modern Chinese
• These five light verbs are sometimes interchangeable
• 從事/搞/加以/進行/ 做研究
• cong2shi4/gao3/jia1yi3/jin4xing2/zuo4 yan2jiu1
• “to do research”
Underspecified Selectional Restriction of
Chinese Light Verbs II
• Collocation constraints are sometimes found with these light verbs
• e.g., 進行/*加以/*從事/搞/*做比賽
jin4xing2/*jia1yi3/*cong2shi4/gao3/*zuo4 bi3sai4
“play a game”
• *進行/加以/*從事/*搞/*做考慮
*jin4xing2/jia1yi3/*cong2shi4/*gao3/*zuo4 kao3lv4
“give consideration”
Variations of Light Verb Usages in Mainland and
Taiwan Mandarin Variants
• Even with these very limited collocation constraints, variation still exists: Taiwan light verbs tend to take more types of NPs, and even VPs, as their complements
• 進行感恩之旅/君子之爭
jin4xing2 gan3en1zhi1lv3/jun1zi3zhi1zheng1
“to proceed with a ‘thanksgiving trip’/‘gentlemen’s dispute’”
• 進行抹黑/開票
jin4xing2 mo3hei1/kai1piao4
“to proceed with ‘mud-slinging’/‘ballot counting’”
(Huang et al. 2013)
Theoretical Challenges for Corpus-based Studies
of Chinese Light Verbs
• Can distribution-based statistical analysis identify the differences among different Chinese light verbs?
• The contrasts among the light verbs are often tendencies rather than grammaticality dichotomies; hence the distributional patterns are less prominent and harder to characterize
• Can the subtle light verb variations between different variants of Chinese be identified through statistical analysis based on comparable corpora (cf. Huang et al. 2013)?
Main Research Questions
Facing the above challenges, we address the following four research questions:
• Can light verbs be differentiated from each other by statistical methods?
• Can the grammatical differences between variants of the same language be empirically verified by distributional features?
• Are these differences statistically significant?
• If the answers to the first two questions are yes, how do the two kinds of difference compare statistically?
• That is, which is more prominent: the distributional difference between two different light verbs, or that between two variants of the same light verb?
Methodology
• A comparable-corpus-driven statistical approach
• 加以jia1yi3, 進行jin4xing2, 從事cong2shi4, 搞gao3, 做
zuo4 in Mainland Mandarin and Taiwan Mandarin
• Statistical methods and tools
• Univariate analysis + multivariate analysis
• Polytomous package in R (Arppe 2008)
Data
• Chinese Gigaword corpus (over 1.1 billion Chinese words)
• Central News Agency (Taiwan, about 700 million characters)
• Xinhua News Agency (Mainland China, about 400 million
characters)
• Random sample: 200 sentences for each of the five light
verbs in Mainland and Taiwan corpora
• 1,000 in total for Mainland Chinese
• 1,000 in total for Taiwan Chinese
• 12 factors: (e.g. Zhu 1985, Zhou 1987, Cai 1982,
Huang et al. 1995, among others)
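The sampling step described above (200 random sentences per light verb in each corpus) can be sketched as follows. This is only an illustrative sketch: the function name `sample_per_verb`, the seed, and the toy sentence pool are assumptions, not the authors' actual procedure or data.

```python
import random

VERBS = ["congshi", "gao", "jiayi", "jinxing", "zuo"]

def sample_per_verb(sentences_by_verb, n=200, seed=0):
    """Draw up to n sentences per light verb, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        verb: rng.sample(sentences, min(n, len(sentences)))
        for verb, sentences in sentences_by_verb.items()
    }

# Hypothetical pool standing in for one corpus (e.g. the Mainland portion).
pool = {verb: [f"{verb}-sentence-{i}" for i in range(500)] for verb in VERBS}

sample = sample_per_verb(pool)
total = sum(len(sentences) for sentences in sample.values())
print(total)
```

Run once per corpus, this yields the 1,000-sentence Mainland and Taiwan samples described in the list.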
Factors, examples, and value levels
• “OTHERLV”: co-occurs with other light verbs
開始進行比賽 kai1shi3 jin4xing2 bi3sai4 “start the game”
Values: yes, no
• “ASP”: takes an aspectual marker (著, 了, 過)
昨天進行了比賽 zuo2tian1 jin4xing2 le0 bi3sai4 “played the game yesterday”
Values: no, le, zhe, guo
• “EVECOMP”: event complement is in subject position
比賽在學校進行 bi3sai4 zai4 xue2xiao4 jin4xing2 “play the game at school”
Values: yes, no
• “POS”: part of speech of the complement
進行比賽 (N) jin4xing2 bi3sai4 “play the game”
進行戰鬥 (V) jin4xing2 zhan4dou4 “fight the battle”
Values: N, V
• “ARGSTR”: argument structure
進行調查 (two) jin4xing2 diao4cha2 “carry on investigation”
Values: one, two, zero
• “VOCOMP”: VO compound as argument
進行投票 jin4xing2 tou2piao4 “carry on voting”
Values: yes, no
• “SPONTEVT”: spontaneous/controllable event
進行投票 jin4xing2 tou2piao4 “carry on voting”
Values: yes, no
• “DUREVT”: durative event
進行比賽 jin4xing2 bi3sai4 “play a game”
Values: yes, no
• “FOREVT”: formal event
進行訪問 jin4xing2 fang3wen4 “pay an official visit”
Values: yes, no
• “PSYEVT”: psychological activity
加以考慮 jia1yi3 kao3lv4 “give consideration”
Values: yes, no
• “INTEREVT”: event involving interaction of agent and patient
加以溝通 jia1yi3 gou1tong1 (inflict/communicate) “do communication”
Values: yes, no
• “ACCOMPEVT”: accomplishment complement
進行修正 jin4xing2 xiu1zheng4 (proceed/correct) “make corrections/amendments”
Values: yes, no
Mainland Chinese: an overall look at the factors
> str(MLLV3)
'data.frame': 1000 obs. of 13 variables:
$ LV : Factor w/ 5 levels "congshi","gao",..: 1 1 1 1 1 1 1 1 1 1 ...
$ POS : Factor w/ 2 levels "N","V": 2 2 2 2 1 1 2 2 2 2 ...
$ ARGSTR : Factor w/ 3 levels "one","two","zero": 1 1 2 1 3 3 2 1 1 1 ...
$ VOCOMP : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ EVECOMP : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ OTHERLV : Factor w/ 1 level "no": 1 1 1 1 1 1 1 1 1 1 ...
$ ASP : Factor w/ 4 levels "guo","le","no",..: 3 3 3 3 3 3 3 3 3 3 ...
$ SPONTEVT : Factor w/ 1 level "yes": 1 1 1 1 1 1 1 1 1 1 ...
$ DUREVT : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ FOREVT : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ PSYEVT : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ INTEREVT : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ ACCOMPEVT: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
• Among the 12 independent variables, two have only one level
• OTHERLV: co-occurrence of the dependent variable (the light verb) with another light verb
• None of the five light verbs (1,000 sentences) co-occurs with another light verb
• SPONTEVT: spontaneous events as the complement of the light verb
• All five light verbs (1,000 sentences) take spontaneous events as their complements
• These two factors therefore cannot distinguish the five light verbs and are excluded from further statistical analysis
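The exclusion of single-level factors can be sketched as below. This is a minimal Python illustration rather than the R workflow used in the study; the helper name `informative_factors` and the three toy rows are assumptions made for the example.

```python
def informative_factors(rows):
    """Return the column names that take more than one distinct value."""
    columns = list(rows[0].keys())
    return [c for c in columns if len({row[c] for row in rows}) > 1]

# Hypothetical annotated sentences; OTHERLV and SPONTEVT never vary,
# mirroring the one-level factors in the Mainland data.
rows = [
    {"LV": "congshi", "POS": "V", "OTHERLV": "no", "SPONTEVT": "yes"},
    {"LV": "gao", "POS": "N", "OTHERLV": "no", "SPONTEVT": "yes"},
    {"LV": "jiayi", "POS": "V", "OTHERLV": "no", "SPONTEVT": "yes"},
]

kept = informative_factors(rows)
print(kept)
```

A factor with a single level carries no information about which light verb occurs, so dropping it before model fitting loses nothing.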
Univariate analysis of Chinese light verbs
• Chi-squared tests for the significance of the co-occurrence of each factor with individual light verbs
• The chisq.posthoc() function in the polytomous package automatically transforms the results (standardized Pearson residuals e_ij (Agresti 2002)) into signs
• “+”: e_ij > 2, statistically significant overuse of the light verb with the factor
• “-”: e_ij < -2, statistically significant underuse of the light verb with the factor
• “0”: e_ij in [-2, 2], lack of statistical significance
Mainland Chinese – a univariate analysis
factor          N      p-value
POS.N           263    0
POS.V           737    0
ARGSTR.one      214    0
ARGSTR.two      523    0
ARGSTR.zero     263    0
VOCOMP.no       983    0.6
VOCOMP.yes      17     0.6
EVECOMP.no      942    0
EVECOMP.yes     58     0
ASP.guo         5      0.403
ASP.le          129    0
ASP.no          865    0
ASP.zhe         1      0.405
DUREVT.no       22     0
DUREVT.yes      978    0
FOREVT.no       37     0.001
FOREVT.yes      963    0.001
PSYEVT.no       990    0
PSYEVT.yes      10     0
INTEREVT.no     926    0
INTEREVT.yes    74     0
ACCOMPEVT.no    937    0
ACCOMPEVT.yes   63     0
Signs listed per light verb (empty table cells omitted, so positions do not align with factor rows):
congshi: + 0 + 0 0 + 0 + 0 + 0 0 0 0 + + -
gao:     + 0 + 0 0 + 0 + 0 0 0 0 0 0 0 0 0 + -
jiayi:   + + 0 0 + 0 + 0 + + + + +
jinxing: 0 0 0 0 0 0 0 + 0 + + + 0 0 0 0 + + -
zuo:     0 0 + 0 0 0 + 0 + 0 + + 0 0 + + -
Four feature levels (VOCOMP.no, VOCOMP.yes, ASP.guo, ASP.zhe) show no significance (p-value > 0.05) in distinguishing the five light verbs.
The table also shows that each light verb has a significant preference for certain factors.
Polytomous Logistic Regression
• 加以/進行/從事/搞/做研究
• jia1yi3/jin4xing2/cong2shi4/gao3/zuo4 yan2jiu1
• “to do research”
• Five light verbs as the possible outcomes
• Estimate the probability of occurrence of each potential light verb
• Polytomous logistic regression
• An extension of standard logistic regression
• Allows for simultaneous estimation of the probability of multiple outcomes (light verbs in the current study)
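One common way to obtain probabilities for several outcomes at once is a one-vs-rest scheme: fit one binary logistic model per light verb, convert each model's odds into a probability, and rescale so that the five probabilities sum to one. The sketch below assumes that scheme purely for illustration; the odds values are invented, and the slides do not state which heuristic was actually used.

```python
def one_vs_rest_probabilities(odds_by_verb):
    """Turn per-verb odds from separate binary models into normalized probabilities."""
    raw = {verb: odds / (1.0 + odds) for verb, odds in odds_by_verb.items()}
    total = sum(raw.values())
    return {verb: p / total for verb, p in raw.items()}

# Hypothetical odds for one context (one combination of factor values).
context_odds = {"congshi": 0.25, "gao": 0.05, "jiayi": 4.0, "jinxing": 1.0, "zuo": 0.5}

probabilities = one_vs_rest_probabilities(context_odds)
best = max(probabilities, key=probabilities.get)
print(best, round(probabilities[best], 3))
```

Every verb keeps a non-zero probability in this setup, which matches the observation that more than one light verb can occur in the same context.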
Main Results of Polytomous for Mainland Chinese
               congshi   gao       jiayi     jinxing   zuo
(Intercept)    (1/Inf)   0.02271   (1/Inf)   (1/Inf)   (1/Inf)
ACCOMPEVTyes   (1/Inf)   0.09863   56.25     0.1849    (1/Inf)
ARGSTRtwo      0.2652    2.895     76.47     (1.481)   0.2177
ARGSTRzero     (1.097)   3.584     (1/Inf)   (1.179)   0.245
ASPle          (0.7487)  (0.1767)  (0.8257)  (0.9196)  (1.853)
ASPno          (Inf)     (1.499)   (Inf)     (0.2307)  (0.2389)
ASPzhe         (1.603)   (1/Inf)   (0.4571)  (Inf)     (1/Inf)
DUREVTyes      (Inf)     (2.958)   (1/Inf)   (Inf)     (Inf)
EVECOMPyes     (1/Inf)   (1.726)   (1/Inf)   3.975     (1.772)
FOREVTyes      (2.744)   (1.227)   (Inf)     (0.7457)  0.2679
INTEREVTyes    0.03255   (0.5281)  (0.5432)  18.67     0.08902
PSYEVTyes      (1/Inf)   (1/Inf)   19.87     (1/Inf)   (0.9619)
VOCOMPyes      (0.1346)  (3.043)   23.54     (1.086)   (0.5344)
• odds > 1: the chance of occurrence of a light verb is significantly increased by the feature
• odds < 1: the chance of occurrence of a light verb is significantly decreased by the feature
• Non-significant odds (p-value > 0.05) are given in parentheses
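To make the odds concrete, the sketch below applies an odds ratio to a baseline probability. The baseline of 0.2 is an assumption chosen only because five equally likely verbs would each have that probability; the two odds values are taken from the INTEREVTyes row of the Mainland table.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Multiply the baseline odds by an odds ratio and convert back to a probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

BASELINE = 0.2  # assumed baseline probability, for illustration only

# INTEREVTyes: odds 18.67 for jinxing, 0.08902 for zuo (Mainland table).
p_jinxing = apply_odds_ratio(BASELINE, 18.67)
p_zuo = apply_odds_ratio(BASELINE, 0.08902)
print(round(p_jinxing, 3), round(p_zuo, 3))
```

Under this assumed baseline, an interactive event raises the probability of jinxing well above one half and pushes zuo close to zero, which is the sense in which odds above and below 1 favour or disfavour a verb.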
Distributional Contrasts Can Differentiate Light Verb Pairs
Most pairs of light verbs can be effectively differentiated by one or more factors (i.e. those for which they have contrasting positive/negative tendencies)
congshi/gao: ARGSTRtwo
congshi/jinxing: INTEREVTyes
gao/zuo: ARGSTRtwo/ARGSTRzero
jiayi/zuo: ARGSTRtwo
congshi/jiayi: ARGSTRtwo
gao/jiayi: ACCOMPEVTyes
jiayi/jinxing: ACCOMPEVTyes
jinxing/zuo: INTEREVTyes
Only two pairs are without contrasting significant features
congshi/zuo
gao/jinxing
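The pairwise contrasts can be recomputed from the significant odds in the Mainland regression table. The sketch below is an illustration of that reading, not the authors' code: a feature counts as contrasting when both verbs have a significant estimate and one is above 1 while the other is below 1. The intercept is left out because it is not a feature.

```python
from itertools import combinations

# Significant (non-parenthesized) odds from the Mainland table, intercept excluded.
SIGNIFICANT_ODDS = {
    "congshi": {"ARGSTRtwo": 0.2652, "INTEREVTyes": 0.03255},
    "gao": {"ACCOMPEVTyes": 0.09863, "ARGSTRtwo": 2.895, "ARGSTRzero": 3.584},
    "jiayi": {"ACCOMPEVTyes": 56.25, "ARGSTRtwo": 76.47,
              "PSYEVTyes": 19.87, "VOCOMPyes": 23.54},
    "jinxing": {"ACCOMPEVTyes": 0.1849, "EVECOMPyes": 3.975, "INTEREVTyes": 18.67},
    "zuo": {"ARGSTRtwo": 0.2177, "ARGSTRzero": 0.245,
            "FOREVTyes": 0.2679, "INTEREVTyes": 0.08902},
}

def contrasting_features(odds_a, odds_b):
    """Features significant for both verbs but pointing in opposite directions."""
    shared = sorted(set(odds_a) & set(odds_b))
    return [f for f in shared if (odds_a[f] > 1) != (odds_b[f] > 1)]

contrasts = {
    (a, b): contrasting_features(SIGNIFICANT_ODDS[a], SIGNIFICANT_ODDS[b])
    for a, b in combinations(sorted(SIGNIFICANT_ODDS), 2)
}
no_contrast = [pair for pair, features in contrasts.items() if not features]
print(no_contrast)
```

This reproduces the list above: eight pairs have at least one contrasting feature, and only congshi/zuo and gao/jinxing have none.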
PROBABILITY OF OCCURRENCE OF LIGHT VERBS
• A probability model is adopted to predict the identity of light verb at its
position of occurrence.
• The overall performance of the model is good
• the most frequently predicted light verb of each column corresponds to the
light verb that actually occurs in the data (see the red figures)
Predicted (rows) vs. observed (columns)
predicted   congshi   gao   jiayi   jinxing   zuo
congshi     131       69    1       31        50
gao         1         16    1       9         5
jiayi       62        86    192     47        44
jinxing     1         16    6       62        4
zuo         5         13    0       51        97
F-score of Automatic Identification of Five Light Verbs Based on Mainland Mandarin Data
           recall   precision   F-score
congshi    0.655    0.4645      0.5436
gao        0.08     0.5         0.1379
jiayi      0.96     0.4455      0.6086
jinxing    0.31     0.6966      0.4291
zuo        0.485    0.5843      0.5300
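The recall, precision and F-score figures follow from the predicted/observed counts given earlier for the Mainland data. The sketch below recomputes them in Python, assuming rows are predicted verbs and columns are observed verbs (the reading under which the counts reproduce the reported scores); results agree with the table up to rounding.

```python
VERBS = ["congshi", "gao", "jiayi", "jinxing", "zuo"]

# Rows: predicted light verb; columns: observed light verb (Mainland data).
CONFUSION = [
    [131, 69, 1, 31, 50],
    [1, 16, 1, 9, 5],
    [62, 86, 192, 47, 44],
    [1, 16, 6, 62, 4],
    [5, 13, 0, 51, 97],
]

def scores(matrix, labels):
    """Return {label: (recall, precision, f_score)} from a confusion matrix."""
    result = {}
    for i, label in enumerate(labels):
        hits = matrix[i][i]
        predicted = sum(matrix[i])               # times this verb was predicted
        observed = sum(row[i] for row in matrix)  # times this verb actually occurred
        recall = hits / observed
        precision = hits / predicted
        f_score = 2 * precision * recall / (precision + recall)
        result[label] = (recall, precision, f_score)
    return result

mainland_scores = scores(CONFUSION, VERBS)
for verb, (recall, precision, f_score) in mainland_scores.items():
    print(verb, round(recall, 4), round(precision, 4), round(f_score, 4))
```

The low score for gao comes almost entirely from recall: it is predicted correctly in only 16 of its 200 occurrences.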
Analysis of Outcome (ML)
• Each light verb can be successfully identified with a better F-score than chance (0.2), with the exception of 搞 gao3, although performance varies from light verb to light verb
加以 jia1yi3 > 從事 cong2shi4/做 zuo4 > 進行 jin4xing2 > 搞 gao3
• 加以 jia1yi3 is the only light verb with effective differentiating factors against all other light verbs. All four of its significant factors are positive (i.e. direct evidence for its occurrence).
• 從事 cong2shi4/做 zuo4: both have only one type of significant factor, and these are negative (i.e. indirect evidence).
• 搞 gao3 and 進行 jin4xing2 have both positive and negative factors, which may have cancelled each other out. The significance of their factors is also relatively weak.
• Note that the low F-score of 搞 gao3 is consistent with the linguistic observation that it is rarely used as a light verb in ML.
F-score of Automatic Identification of Five Light Verbs Based on Taiwan Mandarin Data
           recall   precision   F-score
congshi    0.32     0.5614      0.4076
gao        0.695    0.5036      0.5840
jiayi      0.95     0.4139      0.5766
jinxing    0.335    0.5929      0.4281
zuo        0.16     0.8421      0.2689
Analysis of Outcome (TW)
• Each light verb can be successfully identified with a better F-score than chance (0.2), but performance varies from light verb to light verb
搞 gao3/加以 jia1yi3 > 進行 jin4xing2/從事 cong2shi4 > 做 zuo4
• 搞 gao3 and 加以 jia1yi3 each have only positive significant factors (i.e. direct evidence for their occurrence).
• 從事 cong2shi4 has only negative significant factors (i.e. indirect evidence).
• 進行 jin4xing2 has more positive than negative significant factors.
• 做 zuo4 has both types of significant factors, but negative ones outnumber positive ones.
• Linguistically,
Comparison of Mainland and Taiwan light verbs: univariate analysis
factor          N
POS.N           585
POS.V           1415
ARGSTR.one      376
ARGSTR.two      1039
ARGSTR.zero     585
VOCOMP.no       1939
VOCOMP.yes      61
EVECOMP.no      1919
EVECOMP.yes     81
ASP.guo         9
ASP.le          155
ASP.no          1835
ASP.zhe         1
SPONTEVT.no     1
SPONTEVT.yes    999
DUREVT.no       35
DUREVT.yes      1965
FOREVT.no       66
FOREVT.yes      1934
PSYEVT.no       1981
PSYEVT.yes      19
INTEREVT.no     1870
INTEREVT.yes    130
ACCOMPEVT.no    1904
ACCOMPEVT.yes   96
Signs listed per light verb and variety (empty table cells omitted, so positions do not align with factor rows):
congshi ML: + 0 + 0 0 + 0 + 0 + 0 0 0 0 + + -
congshi TW: + + + 0 0 + 0 + 0 0 0 0 + 0 0 + + -
gao ML:     + 0 + 0 0 + 0 + 0 0 0 0 0 0 0 0 0 + -
gao TW:     + + 0 0 0 0 0 + + 0 0 + 0 0 + + -
jiayi ML:   + + 0 0 + 0 + 0 + + + + +
jiayi TW:   + + + + 0 + 0 0 + + 0 0 0 0 +
jinxing ML: 0 0 0 0 0 0 0 + 0 + + + 0 0 0 0 + + -
jinxing TW: + + + + 0 + 0 0 0 0 0 0 0 0 + + -
zuo ML:     0 0 + 0 0 0 + 0 + 0 + + 0 0 + + -
zuo TW:     + 0 + 0 0 0 0 0 + 0 0 0 0 0 0 + 0 0 0 0
Key results:
ML and TW 做 zuo4 show opposite usage tendencies for the feature ARGSTR.two
ML and TW 進行 jin4xing2 show opposite usage tendencies for the features ASP.le and ASP.no
However, the difference is between a significant and a non-significant feature, rather than between a significantly positive and a significantly negative feature
Probability estimates of Mainland and
Taiwan light verbs by Polytomous
• In both ML and TW, the overall performance of the model is good:
• the most frequently predicted light verb of each column corresponds to the light verb that actually occurs in the data (see the red figures)
• The results also show that while one light verb has the highest probability in a particular context (a set of factors), other light verbs may also have a chance to occur. This explains why, empirically, more than one light verb can occur in the same context.
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression
               congshi             gao                 jiayi               jinxing             zuo
               ML        TW        ML        TW        ML        TW        ML        TW        ML        TW
(Intercept)    (1/Inf)   (1/Inf)   0.02271   (1/Inf)   (1/Inf)   (1/Inf)   (1/Inf)   (1/Inf)   (1/Inf)   (1/Inf)
ACCOMPEVTyes   (1/Inf)   (0.3419)  0.09863   (1/Inf)   56.25     11.33     0.1849    (0.1607)  (1/Inf)   0.2272
ARGSTRtwo      0.2652    0.1283    2.895     (0.7615)  76.47     (Inf)     (1.481)   (0.7062)  0.2177    (1.217)
ARGSTRzero     (1.097)   (0.6251)  3.584     7.177     (1/Inf)   (4.382)   (1.179)   0.5393    0.245     0.2075
ASPle          (0.7487)  (1/Inf)   (0.1767)  (1/Inf)   (0.8257)  (0.3027)  (0.9196)  (Inf)     (1.853)   32.98
ASPno          (Inf)     (0.9291)  (1.499)   (0.6946)  (Inf)     (Inf)     (0.2307)  (Inf)     (0.2389)  (0.2386)
ASPzhe         (1.603)             (1/Inf)             (0.4571)            (Inf)               (1/Inf)
DUREVTyes      (Inf)     (Inf)     (2.958)   (Inf)     (1/Inf)   (1/Inf)   (Inf)     (0.9575)  (Inf)     (Inf)
EVECOMPyes     (1/Inf)   (1/Inf)   (1.726)   (0.8534)  (1/Inf)   (1/Inf)   3.975     8.115     (1.772)   (0.5016)
FOREVTyes      (2.744)   0.08674   (1.227)   (Inf)     (Inf)     (Inf)     (0.7457)  (1.441)   0.2679    (1.467)
INTEREVTyes    0.03255   0.1896    (0.5281)  (1/Inf)   (0.5432)  (0.951)   18.67     10.46     0.08902   (0.3979)
PSYEVTyes      (1/Inf)   (1/Inf)   (1/Inf)   (1/Inf)   19.87     (1.395)   (1/Inf)   (1/Inf)   (0.9619)  (3.323)
SPONTEVTyes              (Inf)               (1/Inf)             (1/Inf)             (Inf)               (Inf)
VOCOMPyes      (0.1346)  0.18      (3.043)   (2.351)   23.54     (Inf)     (1.086)   3.16      (0.5344)  (0.5956)
ASPzhe has estimates for ML only, and SPONTEVTyes for TW only.
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression: congshi
               ML        TW
(Intercept)    (1/Inf)   (1/Inf)
ACCOMPEVTyes   (1/Inf)   (0.3419)
ARGSTRtwo      0.2652    0.1283
ARGSTRzero     (1.097)   (0.6251)
ASPle          (0.7487)  (1/Inf)
ASPno          (Inf)     (0.9291)
ASPzhe         (1.603)
DUREVTyes      (Inf)     (Inf)
EVECOMPyes     (1/Inf)   (1/Inf)
FOREVTyes      (2.744)   0.08674
INTEREVTyes    0.03255   0.1896
PSYEVTyes      (1/Inf)   (1/Inf)
SPONTEVTyes              (Inf)
VOCOMPyes      (0.1346)  0.18
Both have similar, non-contradictory distributional patterns.
They differ only in that TW congshi is less likely to take formal events as arguments (FOREVTyes). This is consistent with the intuition that jinxing is preferred in this context in TW.
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression: gao
               ML        TW
(Intercept)    0.02271   (1/Inf)
ACCOMPEVTyes   0.09863   (1/Inf)
ARGSTRtwo      2.895     (0.7615)
ARGSTRzero     3.584     7.177
ASPle          (0.1767)  (1/Inf)
ASPno          (1.499)   (0.6946)
ASPzhe         (1/Inf)
DUREVTyes      (2.958)   (Inf)
EVECOMPyes     (1.726)   (0.8534)
FOREVTyes      (1.227)   (Inf)
INTEREVTyes    (0.5281)  (1/Inf)
PSYEVTyes      (1/Inf)   (1/Inf)
SPONTEVTyes              (1/Inf)
VOCOMPyes      (3.043)   (2.351)
Both have similar, non-contradictory distributional patterns. Both ML and TW 搞 gao3 are significantly favored by ARGSTRzero.
ML 搞 gao3 is less likely to occur with an accomplishment object. This, and the fact that it is unlikely to occur with the aggregate of default variable values, suggests that it is unlikely to be used as a light verb in ML.
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression: jiayi
               ML        TW
(Intercept)    (1/Inf)   (1/Inf)
ACCOMPEVTyes   56.25     11.33
ARGSTRtwo      76.47     (Inf)
ARGSTRzero     (1/Inf)   (4.382)
ASPle          (0.8257)  (0.3027)
ASPno          (Inf)     (Inf)
ASPzhe         (0.4571)
DUREVTyes      (1/Inf)   (1/Inf)
EVECOMPyes     (1/Inf)   (1/Inf)
FOREVTyes      (Inf)     (Inf)
INTEREVTyes    (0.5432)  (0.951)
PSYEVTyes      19.87     (1.395)
SPONTEVTyes              (1/Inf)
VOCOMPyes      23.54     (Inf)
Both have similar, non-contradictory distributional patterns.
ML 加以 jia1yi3 is more likely to occur with two arguments (ARGSTRtwo), as well as to take VO compounds or psychological events as objects (VOCOMPyes and PSYEVTyes). This confirms the intuition that it is more frequently used in ML.
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression: jinxing
               ML        TW
(Intercept)    (1/Inf)   (1/Inf)
ACCOMPEVTyes   0.1849    (0.1607)
ARGSTRtwo      (1.481)   (0.7062)
ARGSTRzero     (1.179)   0.5393
ASPle          (0.9196)  (Inf)
ASPno          (0.2307)  (Inf)
ASPzhe         (Inf)
DUREVTyes      (Inf)     (0.9575)
EVECOMPyes     3.975     8.115
FOREVTyes      (0.7457)  (1.441)
INTEREVTyes    18.67     10.46
PSYEVTyes      (1/Inf)   (1/Inf)
SPONTEVTyes              (Inf)
VOCOMPyes      (1.086)   3.16
Both have similar, non-contradictory distributional patterns.
ML 進行 jin4xing2 is not likely to take accomplishment objects (ACCOMPEVTyes), while TW 進行 jin4xing2 is very likely to take VO compound objects (VOCOMPyes), consistent with Huang et al. (2013).
Comparison of Mainland and Taiwan light verbs in multivariate polytomous regression: zuo
               ML        TW
(Intercept)    (1/Inf)   (1/Inf)
ACCOMPEVTyes   (1/Inf)   0.2272
ARGSTRtwo      0.2177    (1.217)
ARGSTRzero     0.245     0.2075
ASPle          (1.853)   32.98
ASPno          (0.2389)  (0.2386)
ASPzhe         (1/Inf)
DUREVTyes      (Inf)     (Inf)
EVECOMPyes     (1.772)   (0.5016)
FOREVTyes      0.2679    (1.467)
INTEREVTyes    0.08902   (0.3979)
PSYEVTyes      (0.9619)  (3.323)
SPONTEVTyes              (Inf)
VOCOMPyes      (0.5344)  (0.5956)
Both have similar, non-contradictory distributional patterns.
Their distributional patterns are consistent with the analysis of 做 zuo4 as the most bleached of the Mandarin light verbs. (The attachment of the perfective aspect marker -le is known to be a shared grammatical potential of all light verbs.)
Conclusion
• This study compares the usage tendencies of Chinese light verbs
• (1) Among five different light verbs
• (2) Between Mainland and Taiwan Mandarin usage of the same light verb
• The comparable-corpus-driven statistical analysis is able to generalize about the similarities and differences among light verbs with different factors
• The contrast between different light verbs can be anchored by pairs of statistically significant positive vs. statistically significant negative factors
• The difference between two Chinese varieties for the same light verb, however, is between statistically significant and non-significant factors
• The above results allow us to hypothesize that
• Different light verbs, even with their weak selectional features, can be identified and differentiated by contrasting distributional tendencies
• Variants of the same language, however, do not show contrasting tendencies but can be differentiated by the existence (i.e. significant vs. non-significant) of some distributional tendencies
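The distinction drawn in the conclusion, contrasting tendencies versus a tendency that exists in only one variety, can be made explicit with a small classifier. The sketch below is illustrative only: the category labels are assumptions, and the data are the congshi estimates from the Mainland/Taiwan regression comparison, with parenthesized (non-significant) values marked as `False`.

```python
def compare(ml, tw):
    """Classify one feature given (odds, significant) pairs for ML and TW."""
    ml_odds, ml_significant = ml
    tw_odds, tw_significant = tw
    if ml_significant and tw_significant:
        same_direction = (ml_odds > 1) == (tw_odds > 1)
        return "same tendency" if same_direction else "contrasting tendency"
    if ml_significant or tw_significant:
        return "tendency in one variety only"
    return "no significant tendency"

# congshi: (odds, significant) for ML and TW, taken from the comparison table.
CONGSHI = {
    "ARGSTRtwo": ((0.2652, True), (0.1283, True)),
    "FOREVTyes": ((2.744, False), (0.08674, True)),
    "INTEREVTyes": ((0.03255, True), (0.1896, True)),
    "VOCOMPyes": ((0.1346, False), (0.18, True)),
}

summary = {feature: compare(ml, tw) for feature, (ml, tw) in CONGSHI.items()}
for feature, label in summary.items():
    print(feature, "->", label)
```

For congshi no feature falls into the "contrasting tendency" category, in line with the claim that variety differences are a matter of significant versus non-significant tendencies.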
References
• Arppe, A. (2008). Univariate, bivariate and multivariate methods in corpus-based lexicography – a study of synonymy. Publications of the Department of General Linguistics, University of Helsinki, No. 44. URN: http://urn.fi/URN:ISBN:978-952-10-5175-3.
• Arppe, A. (2009). Linguistic choices vs. probabilities – how much and what can linguistic theory explain? In: Featherston, S. & S. Winkler (eds.), The Fruits of Empirical Linguistics. Volume 1: Process. Berlin: de Gruyter, pp. 1–24.
• Arppe, A. (in prep.). Solutions for fixed and mixed effects modeling of polytomous outcome settings.
• Han, Weifeng, Arppe, Antti & Newman, John. (2013). Topic marking in a Shanghainese corpus: from observation to prediction. Corpus Linguistics and Linguistic Theory (preprint).
• Butt, M., & Geuder, W. (2001). On the (semi)lexical status of light verbs. Semi-lexical Categories, 323-370.
• Cattell, R. (1984). Composite Predicates in English. Syntax and Semantics, Volume 17. Sydney: Academic Press Australia.
• Cai, Wenlan. (1982). Issues on the Complement of ‘jinxing’ (“進行”帶賓問題). Chinese Language Learning (漢語學習), (3), 7-11.
• Huang, Chu-Ren and Jingxia Lin. (2013). The ordering of Mandarin Chinese light verbs. Proceedings of the 13th Chinese Lexical Semantics Workshop. D. Ji and G. Xiao (Eds.): CLSW 2012, LNAI 7717, pp. 728-735. Heidelberg: Springer.
• Huang, Chu-Ren, Jingxia Lin, and Huarui Zhang. (2013). World Chineses based on comparable corpus: The case of grammatical variations of jinxing. 《澳门语言文化研究》, 397-414.
• Jespersen, O. (1965). A Modern English Grammar on Historical Principles. Part VI, Morphology. London: George Allen and Unwin Ltd.
• Zhou, Gang. (1987a). Subdivision of Dummy Verbs (形式動詞的次分類). Chinese Language Learning (漢語學習), 1, 11-14.
• Zhou, Xiaobing. (1987b). Sentence Pattern Comparison of ‘jinxing’ and ‘jiayi’ (“進行”“加以”句型比較). Chinese Language Learning (漢語學習), 6, 1-5.
• Zhu, Dexi. (1985). Dummy Verbs and NV in Modern Chinese (現代書面漢語里的虛化動詞和名動詞). Journal of Peking University (Humanities and Social Sciences) (北京大學學報(哲學社會科學版)), 5, 1-6.
Thank you