From Lexical Semantics to Knowledge Systems: Cognitive Systems as Revealed in Corpora


From Lexical Semantics
to Knowledge Systems:
How to infer cognitive systems
from linguistic data
Chu-Ren Huang
Academia Sinica
http://cwn.ling.sinica.edu.tw/huang/huang.htm
Outline





A generative lexicalist approach to
grammar
From distributional data to the basic
contrasts in a semantic field (or conceptual
motivation for corpus distribution)
Lexical distribution as cognitive model
Radical as ontology
Language as a knowledge system
2007.03.09
ISLCC
Chu-Ren Huang
Introduction: A generative
lexicalist approach to grammar
Back to Aristotle (through Pustejovsky)


How do we know, and what do we know?
Through what we experience
Qualia Structure: what we experience

Formal
Constitutive
Agentive
Telic
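To make the four qualia roles concrete, here is a minimal sketch that encodes Pustejovsky's well-known analysis of "novel" (formal: book, constitutive: narrative, agentive: write, telic: read). The `Qualia` class, the `LEXICON` dictionary and the `role` helper are hypothetical names chosen for illustration, not part of any published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qualia:
    """Hypothetical container for the four qualia roles of a lexical item."""
    formal: str        # what kind of thing it is
    constitutive: str  # what it is made of, or its parts
    agentive: str      # how it comes into being
    telic: str         # what it is for


# Pustejovsky's textbook example: a novel is a book (formal), consists of a
# narrative (constitutive), is brought about by writing (agentive), and is
# meant to be read (telic).
LEXICON = {
    "novel": Qualia(formal="book", constitutive="narrative",
                    agentive="write", telic="read"),
}


def role(word: str, quale: str) -> str:
    """Look up one qualia role of a word, e.g. role('novel', 'telic')."""
    return getattr(LEXICON[word], quale)


print(role("novel", "telic"))
```

Each role answers a different "how do we know it" question, which is the sense in which the slide treats qualia as a structure of experience.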
Linguistics: What do we know
about language

Qualia Structure of Theories of Language





Formal: from Sign to Structure, Structuralism
Constitutive: from IA to IP, rule- and transformation-based theories
Agentive: UG approaches
Telic: function- and use-based theories
We need a linguistic theory that accounts
for the complete knowledge structure, not
just its individual aspects
Towards Language as
Knowledge System



Atoms of knowledge: lexicalized concepts
‘Frames’ of knowledge: lexical semantic relations
Instantiation of knowledge: corpus
Lexicon-driven and corpus-based: inferring the knowledge structure underlying linguistic structure
Three Studies

The semantic field of emotion:
(elaborated from Chang et al. 2000)

Lexicalized Model of Cognition:
(Huang and Hong 2005)

Conventionalized Ontology in Writing
(Chou and Huang 2005)
Semantic Field of Verbs of
Emotion

Issues: Methodological



Interpretation of distributional data
Measuring and interpreting lexical choices
Issues: Linguistic

Archetype via contrast
Why change-of-state:
Saliency and relevance to human cognition
Distributional Contrast of
Verbs of Emotion
高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B)






Category: intrans. vs. trans. state verb
Function: more predicative vs. more nominalized
Collocation: CAUSE complement vs. no CAUSE
Collocation: Perfect aspect vs. no -le
Collocation (modified nouns): Eventive vs. no
selection
Interpretation (Imperative): Command vs. Wish
A Natural Dichotomy of
Verbs of Emotion
Happiness
  Type A: gao1xing4高興(669), kai1xin1開心(152), tong4kuai4痛快(40)
  Type B: kuai4le4快樂(942), yu2kuai4愉快(271), xi3yue4喜悅(156), huan1le4歡樂(141), huan1xi3歡喜(107), kuai4huo2快活(48)
Depression
  Type A: nan2guo4難過(232), tong4xin1痛心(48)
  Type B: tong4ku3痛苦(443), chen2zhong4沈重(83), ju3sang4沮喪(62)
A Natural Dichotomy of
Verbs of Emotion
Sadness
  Type A: shang1xin1傷心(134)
  Type B: bei1shang1悲傷(52)
Regret
  Type A: hou4hui3後悔(102)
  Type B: yi2han4遺憾(198)
Anger
  Type A: sheng1qi4生氣(307)
  Type B: fen4nu4憤怒(112), qi4fen4氣憤(49)
Fear
  Type A: hai4pa4害怕(261)
  Type B: kong3ju4恐懼(149), wei4ju4畏懼(40)
Worry
  Type A: dan1xin1擔心(609), dan1you1擔憂(64), you1xin1憂心(46)
  Type B: fan2nao3煩惱(199), ku3nao3苦惱(45)
Some Observations

Each of the seven kinds of emotion verbs shows the same dichotomy:
change-of-state vs. homogeneous state
Each side of the dichotomy is dominated by a single verb,
in terms of both frequency and prototypicality of meaning
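The frequency half of this claim can be checked directly against the counts in the two dichotomy tables. The sketch below picks the most frequent verb on each side of each subtype; it uses frequency as the only criterion (prototypicality is left aside), and it assumes that kai1xin1, tong4kuai4, tong4xin1, dan1you1 and you1xin1 belong to Type A, which is how the deverbal-use tables classify them. `FREQ` and `dominant_pairs` are illustrative names.

```python
# Corpus frequencies from the dichotomy tables: subtype -> type -> {verb: freq}.
FREQ = {
    "Happiness": {"A": {"gao1xing4": 669, "kai1xin1": 152, "tong4kuai4": 40},
                  "B": {"kuai4le4": 942, "yu2kuai4": 271, "xi3yue4": 156,
                        "huan1le4": 141, "huan1xi3": 107, "kuai4huo2": 48}},
    "Depression": {"A": {"nan2guo4": 232, "tong4xin1": 48},
                   "B": {"tong4ku3": 443, "chen2zhong4": 83, "ju3sang4": 62}},
    "Sadness": {"A": {"shang1xin1": 134}, "B": {"bei1shang1": 52}},
    "Regret": {"A": {"hou4hui3": 102}, "B": {"yi2han4": 198}},
    "Anger": {"A": {"sheng1qi4": 307}, "B": {"fen4nu4": 112, "qi4fen4": 49}},
    "Fear": {"A": {"hai4pa4": 261}, "B": {"kong3ju4": 149, "wei4ju4": 40}},
    "Worry": {"A": {"dan1xin1": 609, "dan1you1": 64, "you1xin1": 46},
              "B": {"fan2nao3": 199, "ku3nao3": 45}},
}


def dominant_pairs(freq):
    """Return {subtype: (most frequent Type A verb, most frequent Type B verb)}."""
    return {
        subtype: tuple(max(sides[t], key=sides[t].get) for t in ("A", "B"))
        for subtype, sides in freq.items()
    }


for subtype, (a, b) in dominant_pairs(FREQ).items():
    print(f"{subtype}: {a} vs. {b}")
```

The seven resulting pairs (gao1xing4/kuai4le4, nan2guo4/tong4ku3, and so on) are the same competing pairs compared in the distribution and ratio tables.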
Semantic Field and Contrast Set

A semantic field consists of a unique covering term and a number of contrast sets (paraphrasing Grandy 1992).
The unique covering term may or may not occur in a contrast set.
All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.
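The requirement that every member be reached through contrast-set relations amounts to a connectivity check. The sketch below is one possible encoding, assuming a contrast set can be modeled as a cover term plus its mutually contrasting members; the `ContrastSet` type, the `unanchored_members` function and the toy data are illustrative and are not taken from Grandy (1992).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContrastSet:
    """Hypothetical encoding: terms that contrast with one another under a cover term."""
    cover: str
    members: frozenset


def unanchored_members(covering_term, field_members, contrast_sets):
    """Return the field members that cannot be reached from the covering term
    through a chain of contrast-set relations."""
    known = {covering_term}
    changed = True
    while changed:
        changed = False
        for cs in contrast_sets:
            terms = cs.members | {cs.cover}
            # A contrast set touching a known term makes all of its terms known.
            if terms & known and not terms <= known:
                known |= terms
                changed = True
    return set(field_members) - known


# Hypothetical toy data echoing the weight/height examples of the following slide.
SETS = [
    ContrastSet("weight", frozenset({"light", "heavy"})),
    ContrastSet("height", frozenset({"tall", "short"})),
]
print(unanchored_members("weight", {"light", "heavy", "tall"}, SETS))
```

Here "tall" is reported as unanchored: nothing links it to "weight", so under this definition it cannot be a member of the weight field.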
Observation: Chinese Defines
a Property by Contrast







qing1zhong4 light+heavy = weight
da4xiao3 big+small = size
gao1ai3 tall+short = height
shi4fei1/dui4cuo4 right+wrong = affair
xiong1di4 elder+younger = brothers
zang1pi3 praise+attack = criticize
hu1xi1 exhale+inhale = breathe
Our Proposal



The covering term T is either a single term or a privileged contrast set, called a contrast pair.
When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair.
The fundamental contrast relation defining a contrast pair may be shared by a superset of semantic fields.
Our Proposal


T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast.
The set of fundamental contrast relations is shared by all semantic fields. [cf. semantic relations]
Patterns of Distribution as
Representational Clues



Numbers don’t lie
The pattern itself is proof that generalizations based on a single lexical item are replicable.
The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.
Functional Distribution of
Type A Verbs of Emotion
Type A        Pred.     Nom.     N.M.
gao1xing4     85.05%    0.30%    1.35%
nan2guo4      86.64%    2.16%    2.59%
shang1xin1    76.12%    2.99%    11.19%
hou4hui3      94.12%    0.00%    2.94%
sheng1qi4     87.82%    0.00%    4.06%
hai4pa4       93.10%    3.07%    2.68%
dan1xin1      96.72%    1.97%    1.31%
Average       88.51%    1.50%    3.73%
Functional Distribution of
Type B Verbs of Emotion
Type B        Pred.     Nom.     N.M.
kuai4le4      37.79%    26.43%   24.84%
tong4ku3      25.73%    45.60%   20.54%
bei1shang1    40.38%    28.85%   19.23%
yi2han4       34.85%    33.84%   3.54%
fen4nu4       28.57%    37.50%   17.86%
kong3ju4      23.49%    68.46%   7.38%
fan2nao3      24.12%    69.85%   6.03%
Average       30.70%    44.36%   14.21%
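The Average row is an unweighted mean over the seven verbs. Adding a verb's Nom. and N.M. figures also reproduces its value in the Type B deverbal-use table (for example 26.43% + 24.84% = 51.27% for kuai4le4), so the sketch below treats deverbal use as Nom. + N.M.; that equation is inferred from the numbers, not stated as a definition in the slides. `TYPE_B`, `column_means` and `deverbal_share` are illustrative names.

```python
# Type B functional distribution (percent of tokens): (Pred., Nom., N.M.)
TYPE_B = {
    "kuai4le4":   (37.79, 26.43, 24.84),
    "tong4ku3":   (25.73, 45.60, 20.54),
    "bei1shang1": (40.38, 28.85, 19.23),
    "yi2han4":    (34.85, 33.84, 3.54),
    "fen4nu4":    (28.57, 37.50, 17.86),
    "kong3ju4":   (23.49, 68.46, 7.38),
    "fan2nao3":   (24.12, 69.85, 6.03),
}


def column_means(rows):
    """Unweighted mean of each column across verbs, as in the Average row."""
    cols = list(zip(*rows.values()))
    return [sum(c) / len(c) for c in cols]


def deverbal_share(rows):
    """Deverbal share per verb, assumed here to be Nom. + N.M."""
    return {verb: nom + nm for verb, (_, nom, nm) in rows.items()}


for verb, share in deverbal_share(TYPE_B).items():
    print(f"{verb}: {share:.2f}%")
```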
Preference of A verbs over B
verbs in Predicative Uses
Verbs                 Pred.-Freq.   A/B Ratio
gaoxing/kuaile        569/356       1.59
nanguo/tongku         201/114       1.76
shangxin/beishang     102/21        4.86
houhui/yihan          96/69         1.39
shengqi/fennu         238/32        7.44
haipa/kongju          243/35        6.94
danxin/fannao         589/48        12.27
Average ratio                       5.62
Preference of B verbs over A
verbs in Nominal Uses
Verbs                 Nom.-Freq.    B/A Ratio
gaoxing/kuaile        11/483        43.91
nanguo/tongku         11/293        26.64
shangxin/beishang     19/25         1.32
houhui/yihan          3/74          24.67
shengqi/fennu         11/62         5.64
haipa/kongju          15/113        7.53
danxin/fannao         20/151        7.55
Average ratio                       16.75
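Each ratio in these tables is a plain quotient of the paired frequencies (for example 483/11 ≈ 43.91), and the B/A Average ratio is the unweighted mean of the seven quotients. A minimal sketch for the nominal-use table; `NOM_FREQ` and `b_over_a` are illustrative names.

```python
from statistics import mean

# Nominal-use frequencies (Type A, Type B) for each competing pair.
NOM_FREQ = {
    "gaoxing/kuaile":    (11, 483),
    "nanguo/tongku":     (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan":      (3, 74),
    "shengqi/fennu":     (11, 62),
    "haipa/kongju":      (15, 113),
    "danxin/fannao":     (20, 151),
}


def b_over_a(pairs):
    """Preference ratio of the Type B verb over the Type A verb for each pair."""
    return {pair: b / a for pair, (a, b) in pairs.items()}


ratios = b_over_a(NOM_FREQ)
for pair, r in ratios.items():
    print(f"{pair}: {r:.2f}")
print(f"Average ratio: {mean(ratios.values()):.2f}")
```

An unweighted mean lets a single extreme pair (gaoxing/kuaile) pull the average up; a pooled ratio over summed frequencies would be a possible alternative, but it is not what the table reports.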
Summary of the Likelihood
Ratio Data



A clear lexical preference between near-synonyms is established.
Predicative preference and deverbal preference tend to compensate for each other to establish the contrast.
Overall, the deverbal preference seems to be the defining feature of the dichotomy.
[Note that all of these are verbs.]
Deverbal Use Frequency of
Type A Verbs
tong4kuai4痛快    0.00%
gao1xing4高興     1.65%
hou4hui3後悔      2.94%
dan1xin1擔心      3.28%
sheng1qi4生氣     3.58%
tong4xin1痛心     4.17%
nan2guo4難過      4.75%
hai4pa4害怕       5.75%
you1xin1憂心      6.52%
kai1xin1開心      7.89%
dan1you1擔憂      9.38%
shang1xin1傷心    14.18%
Deverbal Use Frequency of
Type B Verbs
qi4fen4氣憤       24.49%
wei4ju4畏懼       25.00%
yu2kuai4愉快      29.89%
huan1xi3歡喜      30.84%
kuai4huo2快活     33.33%
ju3sang4沮喪      33.87%
yi2han4遺憾       37.38%
ku3nao3苦惱       46.67%
bei1shang1悲傷    48.08%
chen2zhong4沈重   48.19%
kuai4le4快樂      51.27%
fen4nu4憤怒       55.36%
tong4ku3痛苦      66.14%
kong3ju4恐懼      75.84%
fan2nao3煩惱      75.88%
xi3yue4喜悅       92.20%
huan1le4歡樂      92.91%
Deverbal Use Frequency as a
Benchmark for Type A/B Verbs


A gap of more than 10 percentage points separates the lowest Type B verb (qi4fen4氣憤, 24.49%) from the highest Type A verb (shang1xin1傷心, 14.18%).
The smallest gap within a competing pair is almost 34 percentage points (shang1xin1傷心 14.18% vs. bei1shang1悲傷 48.08%).
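Both benchmark figures can be recomputed from the two deverbal-use tables: the gap between the highest Type A and the lowest Type B value, and the smallest within-pair gap among the seven competing pairs. The `classify` function uses a hypothetical cut-off of 19% chosen inside the observed gap; it is not a value proposed in the study, only an illustration that one threshold separates all listed verbs.

```python
# Deverbal-use frequencies (%) from the two Deverbal Use Frequency tables.
TYPE_A = {"tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94,
          "dan1xin1": 3.28, "sheng1qi4": 3.58, "tong4xin1": 4.17,
          "nan2guo4": 4.75, "hai4pa4": 5.75, "you1xin1": 6.52,
          "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18}
TYPE_B = {"qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89,
          "huan1xi3": 30.84, "kuai4huo2": 33.33, "ju3sang4": 33.87,
          "yi2han4": 37.38, "ku3nao3": 46.67, "bei1shang1": 48.08,
          "chen2zhong4": 48.19, "kuai4le4": 51.27, "fen4nu4": 55.36,
          "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88,
          "xi3yue4": 92.20, "huan1le4": 92.91}
# Competing (Type A, Type B) pairs compared in the ratio tables.
PAIRS = [("gao1xing4", "kuai4le4"), ("nan2guo4", "tong4ku3"),
         ("shang1xin1", "bei1shang1"), ("hou4hui3", "yi2han4"),
         ("sheng1qi4", "fen4nu4"), ("hai4pa4", "kong3ju4"),
         ("dan1xin1", "fan2nao3")]


def separation_gap(a, b):
    """Highest Type A verb, lowest Type B verb, and the gap between them (points)."""
    top_a = max(a, key=a.get)
    low_b = min(b, key=b.get)
    return top_a, low_b, b[low_b] - a[top_a]


def smallest_pair_gap(a, b, pairs):
    """Competing pair whose deverbal-use frequencies are closest."""
    return min(((x, y, b[y] - a[x]) for x, y in pairs), key=lambda t: t[2])


def classify(share, threshold=19.0):
    """Hypothetical cut-off inside the observed gap; not a value from the study."""
    return "A" if share < threshold else "B"


top_a, low_b, gap = separation_gap(TYPE_A, TYPE_B)
print(f"{top_a} vs. {low_b}: {gap:.2f} points")
x, y, g = smallest_pair_gap(TYPE_A, TYPE_B, PAIRS)
print(f"{x} vs. {y}: {g:.2f} points")
```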
The Noisy-Channel Model of
the Theory of Communication
Our Proposal
Language is an information-based communication system.
In an optimized communication system, every redundant sign (for one piece of information) also minimally differentiates another piece of information.
Re-Interpretation of the
Data


Members of the same semantic field in
general, and a near-synonym pair in
particular, are competing signs to express
information pertaining to the field.
A sign is chosen to represent a piece of
information because it expresses that piece
of information most effectively.
Re-Interpretation of the
Data


This preference for expressing certain
information can be lexicalized to establish
logical implicature.
Once that lexical preference is established, linguists can use the preferential ratio to infer the lexical information being carried.
Lexical distribution as
cognitive model: Senses



A further step building on properties defined by contrast, focusing on how senses are represented
We study the sense of hearing and the basic property term sheng-yin ‘sound/voice’
We (Huang and Hong 2005) look at the distribution of these two lexical elements in all words derived from them
聲 Sheng vs. 音 Yin

聲樂 vs. 音樂: vocal music vs. music
*噪聲 vs. 噪音: noise
大聲 vs. *大音: loudly
發聲 vs. 發音: make a sound vs. articulate
高聲 vs. 高音: loudly vs. high pitch
NN Compounds: N+聲 vs. N+音
N+聲 Sheng: source
歌, 掌, 人, 腳步, 風, 鐘, 水, …
N+音 Yin: quality
嗓, 鄉, 喉, 裝飾, 尾, 哨, …
The Semantic Contrast
聲
Production of sounds
Often refers to the manner or source of a sound's production
音
Perception of a sound
Often refers to the sound quality, or how a sound is perceived by an intelligent agent
A Lexicalized Schema for
Hearing in Chinese
From Huang and Hong 2005

Process of Hearing
聲 sheng: instigator (發動者); source, starting point (起點、來源); production, actively completed (主動完成)
音 yin: experiencer (經驗者); goal, endpoint or result (終點、結果); reception, passively received (被動接收)
A Lexicalized Schema for
Sense in Chinese
Process of Sensation
感知接收 (sensation)
word1
word0
經驗者 (experiencer)
Goal/perception: experience of sense
Lexical Sense Analysis (7)
The relations among vision (視覺), touch (觸覺), and hearing (聽覺)
Contrast of cognitive features
Feature: instigator of sensation (感覺發動者), marked | experiencer of sensation (感覺經驗者), shared and unmarked
Hearing (聽覺): 聲 (production) | 音 (perception)
Vision (視覺): 看 (inchoative) | 見 (bounded result)
Touch (觸覺): 觸 (activity) | 摸 (incremental theme)
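Read as a data structure, the chart pairs one marked, instigator-oriented term with one unmarked, experiencer-oriented term for each sense. A hypothetical lookup table, assuming that the column order of the chart gives each character's role; `SensePair`, `SENSES` and `role_of` are illustrative names.

```python
from typing import NamedTuple, Optional, Tuple


class SensePair(NamedTuple):
    """Hypothetical record: the two lexicalized poles of one sense modality."""
    instigator: str   # marked term: instigator of the sensation
    experiencer: str  # shared, unmarked term: experiencer of the sensation


# Assumed reading of the chart: first column = instigator, second = experiencer.
SENSES = {
    "hearing": SensePair("聲", "音"),
    "vision": SensePair("看", "見"),
    "touch": SensePair("觸", "摸"),
}


def role_of(char: str) -> Optional[Tuple[str, str]]:
    """Return (modality, role) for a character in the schema, or None."""
    for modality, pair in SENSES.items():
        for role, term in pair._asdict().items():
            if term == char:
                return modality, role
    return None


print(role_of("見"))
```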
Radical as ontology



The Chinese writing system has been conventionalized and shared for over three thousand years,
and has been adopted by typologically very different languages
If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology
Example: the horse radical
(from Chou 2005)


馬 is the semantic symbol for ‘horse’
Examples:
驩: 馬名 a kind of horse
驫: 眾馬 horses
騎: 騎馬 riding a horse
驍: 良馬 a good horse
驚: 馬驚 a scared horse
Research Tool and Issue

Formal Description
IEEE SUMO (Suggested Upper Merged Ontology)
http://www.ontologyportal.org
http://BOW.sinica.edu.tw

Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?
Knowledge System of the Radical 艸
/艹 (Grass, for Plants)
Plants (IS-A): 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草
Parts (constitutive): 萌莖芽茄苗蓮葉
Description (descriptive/formal): 茲蒼芳落茸茂荒薄芬蒸莊
Usage (telic): 蕃藥蔬菜薪苑藩藉茭
Conclusion I:
Corpus as Evidence




Core issue of a scientific explanation of language and cognition
Language, as a living organism, allows variation and adaptation (the evolutionary view)
The coherence of language is the shared tendency of all its users
Distributional data in corpora lead to the discovery of these shared tendencies
This should be more valuable than incidental examples
Conclusion II: Language as a
Knowledge System
The generative lexicalist approach to grammar: language as a knowledge system
All aspects of language are projected from a unified knowledge system
Lexical semantics based on distributional data offers the best window onto the underlying knowledge system of language