Re-examination of Association Measures for Identifying Verb Particle and Light Verb Constructions

Student: Hoang Huu Hung
Supervisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, (Dr. Timothy Baldwin)
Advisor: Lin Ziheng

Outline

- Verb Particle and Light Verb Constructions
- Motivations
- Association Measures
  - MI and PMI
  - The high-frequency constituents
  - Context measures
- Conclusions & Future work

Multiword Expressions

- Multiple simplex words
- Idiosyncratic
  - Lexically: ad hoc (ad ?, hoc ?)
  - Syntactically: by and large (prep. + conj. + adj. ?)
  - Semantically: spill the beans
  - Statistically: strong coffee (powerful coffee ?)
- Obstacles to language understanding, translation, generation, etc.

Verb Particle and Light Verb Constructions

Verb Particle Constructions
- Verb + particle(s)
  - bolster up, put off
  - put up with, get on with
  - cut short, let go
- Semantically
  - Non-compositional: carry out, give up
  - (Semi-?)compositional: walk off, finish up

Light Verb Constructions
- Light verb + complement
  - Light verbs: do, get, give, have, make, put, take
  - make a speech, give a demo
- Semantically
  - Compositional: meaning comes from the de-verbal noun (give a demo)
  - Subtle meaning deviations: have a read vs. read

Both are syntactically flexible (inflection, passivization, internal modification):
- Extensive research has been carried out on this topic.
- He has given many excellent speeches in his career.

Identification of VPCs and LVCs

- ≠ Free combinations
  - VPCs: free verb + preposition (Leave it to me.)
  - LVCs: free light verb + noun (make a decision ≠ make drugs)
- VPCs ≠ prepositional verbs: look after, search for

VPCs
- Joint & split configurations
  - Look up the word / Look the word up
  - Look it up
- No intervening manner adverb
  - Look up the word carefully
  - *Look carefully up the word

Prepositional verbs
- Only the joint configuration
  - Look after your mum
  - Look after her
- Flexible adverb positions
  - Look after your mum carefully
  - Look carefully after your mum

Outline

- Verb Particle and Light Verb Constructions
- Motivations
- Association Measures
  - MI and PMI
  - The high-frequency constituents
  - Context measures
- Conclusions & Future work

Motivation

- Pecina, P. and Schlesinger, P., Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
  - An exhaustive list of 82 lexical association measures for bigrams
- Lexical Association Measures (AMs)
  - Mathematical formulae devised to capture the degree of association between the words of a phrase
    - Association ~ dependence
    - Degree ~ score
  - Input: statistical information. Output: scores

A comparison

Pecina and Schlesinger
- Czech bigram "collocations"
  - Mixed idiomatic expressions, LVCs, terminologies, stock phrases, ...
- Ranking extracted bigrams
- Prague Dependency Treebank
  - 1.5 million words
- Average Precision (AP)
- Machine-learning based combination

Our project
- English bigram VPCs and LVCs
  - Separately and mixed
- Ranking VPC & LVC candidates
- Wall Street Journal corpus
  - 1 million words
- AP
- Analysis of AMs
  - Categorization
  - Modifications

Gold-standard Evaluation Data

- Bigrams with frequency ≥ 6
- 413 WSJ-attested VPC candidates
  - Candidates: (verb + particle) pairs
  - Annotations from Baldwin, T. (2005)
- 100 WSJ-attested LVC candidates
  - Candidates: (light verb + noun) pairs
  - Annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

Evaluation set | Size | Negative instances | Positive instances | % positive
VPC            | 413  | 296                | 117                | 28.33%
LVC            | 100  | 72                 | 28                 | 28.00%
Mixed          | 513  | 368                | 145                | 28.26%

- A random ranker has an AP of ~0.28

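Since Average Precision (AP) is the evaluation metric throughout, here is a minimal Python sketch of how it can be computed over a ranked candidate list; `ranked_labels` (gold-standard booleans sorted by descending AM score) is an assumed input:

```python
def average_precision(ranked_labels):
    """AP of a ranked list: mean of precision@k taken at each true positive.

    ranked_labels: iterable of booleans, candidates sorted by descending
    AM score, True = gold-standard VPC/LVC.
    """
    hits, precisions = 0, []
    for k, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

With roughly 28% positive instances, a random permutation of the labels gives an AP of about 0.28 in expectation, matching the baseline above.
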
Rank-equivalence

- Idea: group AMs that rank candidates identically, and hence have the same AP
  - Simplification
  - Categorization
- Example: the odds ratio and its monotone transform are rank-equivalent:

  \frac{ad}{bc} \ \overset{r}{=}\ \frac{ad - bc}{ad + bc}

- Rank-equivalence over a set C
  - "Ranking all members of C in the same way"
  - M(c): score assigned by measure M to instance c

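A small self-check of the rank-equivalence above, using hypothetical contingency counts (a, b, c, d are the 2×2 cells defined on the next slide): (ad − bc)/(ad + bc) is a monotone transform of ad/bc, so both order candidates identically and therefore share the same AP.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def yule_q(a, b, c, d):
    # (ad - bc) / (ad + bc) = (x - 1) / (x + 1) with x = ad/bc,
    # an increasing function of x for x > 0
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical candidates as (a, b, c, d) contingency counts
candidates = [(8, 2, 3, 100), (5, 5, 5, 90), (20, 1, 2, 80)]
by_odds = sorted(candidates, key=lambda t: odds_ratio(*t), reverse=True)
by_q = sorted(candidates, key=lambda t: yule_q(*t), reverse=True)
assert by_odds == by_q  # identical rankings, hence identical AP
```
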
Example of AMs

- Contingency table of a bigram (x y):

              y            ȳ
    x     a = f(xy)    b = f(xȳ)
    x̄     c = f(x̄y)    d = f(x̄ȳ)

  - w̄: any word except w;  *: any word;  f(.): frequency of (.);  N: total number of bigrams
- Null hypothesis of independence:

  \hat{f}(xy) = E[f(xy)] = \frac{f(x*) \cdot f(*y)}{N}

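A minimal sketch of the expected frequency under the null hypothesis, written directly from the table's cells:

```python
def expected_f_xy(a, b, c, d):
    """E[f(xy)] = f(x*) * f(*y) / N under independence of x and y."""
    N = a + b + c + d   # total number of bigrams
    f_x_any = a + b     # f(x*): x followed by any word
    f_any_y = a + c     # f(*y): any word followed by y
    return f_x_any * f_any_y / N
```
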
Outline

- Verb Particle and Light Verb Constructions
- Motivations
- Association Measures
  - MI and PMI
  - The high-frequency constituents
  - Context measures
- Conclusions & Future work

Categorization: 4 main groups

- Group 1: Dependence ~ reduction in uncertainty
  - MI and PMI, Salience, etc.
- Group 2: Dependence ~ set similarity; falls as marginal frequencies rise
  - Dice, Minimum Sensitivity, Laplace, etc.
  - Sokal-Michiner, Odds ratio, etc.
- Group 3: Compare observed frequency with expected frequency
  - Null hypothesis of independence
  - t-test, z-test; Pearson's chi-squared and Fisher's exact tests, etc.
- Group 4: Dependence ~ non-compositionality
  - Non-compositionality ~ context similarity
    - Cosine similarity in tf.idf space, idf space, etc.
  - Non-compositionality falls as context entropy rises

MI and PMI

- Mutual Information (MI)

  MI(U;V) = \sum_{u,v} p(uv) \log \frac{p(uv)}{p(u)\,p(v)}

  - Reduction in uncertainty of U given knowledge of V
  - In frequency terms:

    MI = \frac{1}{N} \sum_{i,j} f_{ij} \log \frac{f_{ij}}{\hat{f}_{ij}}

- Point-wise MI (PMI)

  PMI(x,y) = \log \frac{P(xy)}{P(x)\,P(y)} = \log \frac{N f(xy)}{f(x)\,f(y)}

  - "MI at a specific point"
  - 2 known drawbacks

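Both measures as functions of the 2×2 contingency cells, a minimal sketch (natural log; the base only rescales scores and leaves rankings unchanged):

```python
import math

def pmi(a, b, c, d):
    """PMI(x, y) = log( N * f(xy) / (f(x*) * f(*y)) )."""
    N = a + b + c + d
    return math.log(N * a / ((a + b) * (a + c)))

def mi(a, b, c, d):
    """MI(X; Y) = (1/N) * sum_ij f_ij * log(f_ij / fhat_ij)."""
    N = a + b + c + d
    row_sums = (a + b, c + d)   # f(x*), f(xbar *)
    col_sums = (a + c, b + d)   # f(* y), f(* ybar)
    total = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, f_ij in enumerate(row):
            if f_ij > 0:
                fhat_ij = row_sums[i] * col_sums[j] / N  # expected count
                total += f_ij * math.log(f_ij / fhat_ij)
    return total / N
```
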
The first drawback

- PMI favors candidates with lower-frequency constituents
- Daille (1994): raise f(xy) to the power k

  PMI^k(x,y) = \log \frac{N f(xy)^k}{f(x)\, f(y)}

  - Daille experimented with integer k from 2 to 10; best result at k = 2
- We replicate this experiment with real k in [0, 1] and integer k in [2, 100]
  - Higher performance in [0, 1] than in [2, 100]
  - In [2, 100], rapid drop of performance as k increases

  k        | VPCs  | LVCs  | Mixed
  0.13     | 0.547 | 0.538 | 0.543
  0.32     | 0.546 | 0.546 | 0.544
  0.85     | 0.525 | 0.573 | 0.528
  1 (PMI)  | 0.510 | 0.566 | 0.515

  [Remaining rows: k = 2, 3, 10, 100 and the joint probability P(xy); AP drops steeply once k exceeds 1]

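A sketch of the PMI^k variant; sweeping k over [0, 1] and [2, 100] as described above reproduces the experiment (k = 1 recovers plain PMI):

```python
import math

def pmi_k(a, b, c, d, k=1.0):
    """PMI^k(x, y) = log( N * f(xy)**k / (f(x*) * f(*y)) ).

    k < 1 damps the weight of the co-occurrence frequency f(xy);
    large k makes the measure behave like raw frequency.
    """
    N = a + b + c + d
    return math.log(N * a**k / ((a + b) * (a + c)))
```
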
The second drawback

- Besides the degree of dependence:
  - MI grows with the entropy of the constituents
  - PMI varies with the constituents' frequencies (its ceiling is higher for rarer words)
- Mathematically,

  PMI(x,y) \le \min\left( \log \frac{1}{P(x)},\ \log \frac{1}{P(y)} \right)

  MI(X;Y) \le \min\left( \log \frac{1}{P(y)} + \log \frac{1}{P(\bar{y})},\ \log \frac{1}{P(x)} + \log \frac{1}{P(\bar{x})} \right)

- Comparing raw scores across candidates is therefore not appropriate

Proposed Solution

- Normalizing scores so that they share the same unit
- Proposed normalization factor (NF):

  NF = \alpha P(x) + (1 - \alpha) P(y), \quad \alpha \in [0, 1]

  or

  NF_\alpha = \max(P(x), P(y))

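A sketch of the normalization, with one caveat: the slide only gives the factor NF, so dividing PMI by its upper bound log(1/NF) is an assumed (but natural) way to apply it, since PMI(x, y) ≤ log(1/max(P(x), P(y))); alpha is a tunable weight.

```python
import math

def normalized_pmi(a, b, c, d, alpha=0.5):
    """PMI rescaled by its NF-derived ceiling so scores share a unit.

    NF = alpha * P(x) + (1 - alpha) * P(y); the alternative form
    NF = max(P(x), P(y)) corresponds to the exact PMI upper bound.
    """
    N = a + b + c + d
    p_x, p_y = (a + b) / N, (a + c) / N
    nf = alpha * p_x + (1 - alpha) * p_y
    pmi = math.log(N * a / ((a + b) * (a + c)))
    return pmi / math.log(1 / nf)   # PMI measured in units of its ceiling
```
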
Against high-frequency constituents

AM                                          | Formula             | VPCs  | LVCs  | Mixed
[M34] Brawn-Blanquet (Minimum Sensitivity)  | a / (a + max(b, c)) | 0.478 | 0.578 | 0.486
[M35] Simpson                               | a / (a + min(b, c)) | 0.249 | 0.382 | 0.260

- Insight: penalizing the more productive constituent pays off
- Confirmed by [M49] Laplace, [M41] S cost

AM            | Formula                                                               | VPCs  | LVCs  | Mixed
[M49] Laplace | max((a+1)/(a+b+2), (a+1)/(a+c+2)) = (a + 1) / (a + min(b, c) + 2)     | 0.241 | 0.388 | 0.254
Adjusted M49  | (a + 1) / (a + max(b, c) + 2)                                         | 0.486 | 0.577 | 0.493

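The measures from these two tables as plain functions of the contingency cells, a minimal sketch (d is unused by these particular formulas but kept for a uniform signature):

```python
def brawn_blanquet(a, b, c, d):
    """[M34], rank-equal to Minimum Sensitivity: penalizes the more
    productive (higher-frequency) constituent via max(b, c)."""
    return a / (a + max(b, c))

def simpson(a, b, c, d):
    """[M35]: uses min(b, c), so the productive constituent goes unpunished."""
    return a / (a + min(b, c))

def laplace_adjusted(a, b, c, d):
    """[M49] Laplace with min(b, c) swapped for max(b, c), as in the table."""
    return (a + 1) / (a + max(b, c) + 2)
```
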
Against high-frequency constituents

AM                   | Formula                                 | VPCs  | LVCs  | Mixed
[M18] Sokal-Michiner | (a + d) / (a + b + c + d) = (a + d) / N | 0.565 | 0.453 | 0.546
-max(b, c)           |                                         | 0.540 | 0.433 | 0.519

- Since a + b + c + d = N:  (a + d)/N \ \overset{r}{=}\ a + d - b - c \ \overset{r}{=}\ -(b + c)
- Better to penalize both constituents?
- Proposed modification:  \overset{r}{=}\ -(\beta \cdot b + (1 - \beta) \cdot c)

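A sketch of the proposed weighted penalty; beta is a tunable weight (beta = 0.5 recovers Sokal-Michiner's equal penalty on both constituents):

```python
def weighted_penalty(a, b, c, d, beta=0.5):
    """Rank-equivalent generalization of Sokal-Michiner's -(b + c):
    weights the penalties on the two constituents' productivity."""
    return -(beta * b + (1 - beta) * c)
```
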
Context-based Measures

- Non-compositionality of (x y)
  - Context of (x y) ≠ contexts of x and y
  - Context of x ≠ context of y
  - E.g.: Dutch courage, hot dog
- Context as a distribution of words
  - Relative entropy (KL divergence), Jensen-Shannon divergence
  - Dice similarity
- Context as a point/vector in R^N
  - Euclidean, Manhattan, etc. distances
  - Angle distance (cosine similarity)

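For the distribution view, a minimal sketch of Jensen-Shannon divergence between two context distributions; representing each context as a dict mapping word → probability is an assumed convention:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between word distributions p and q
    (dicts mapping word -> probability)."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(r):  # KL(r || m); zero-probability terms contribute nothing
        return sum(pr * math.log(pr / m[w]) for w, pr in r.items() if pr > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```
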
Representation of context

- Context of z:  C_z = (\phi(w_1), \phi(w_2), ..., \phi(w_n))
- Common representation schemes
  - Boolean:  \phi(w_i) \in \{0, 1\}
  - Term frequency (tf):  \phi(w_i) = f(w_i)
  - Tf.idf:  \phi(w_i) = f(w_i) \cdot \frac{N}{df(w_i)},  where  df(w_i) = |\{x : w_i \in C_x\}|
- Dice similarity:

  dice(c_x, c_y) = \frac{2 \sum_i x_i y_i}{\sum_i x_i^2 + \sum_i y_i^2}

- Tf.idf only slightly better than tf
  - Under-estimation due to the independence assumption
    - Sieger, Jin and Hauptmann (1999)
  - Scale tf to [0.5, 1] before multiplying by idf
    - Salton and Buckley (1987)

AM                             | VPCs  | LVCs  | Mixed
[M82] Dice in tf.idf space     | 0.387 | 0.367 | 0.374
Dice in (scaled tf).idf space  | 0.568 | 0.488 | 0.553

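A sketch of the scaled-tf weighting and the vector Dice score from the table above. The [0.5, 1] scaling (0.5 + 0.5·tf/max-tf, the augmented tf of Salton and Buckley) and the log-free idf = N/df follow the slide; dict-based sparse vectors are an assumed representation:

```python
def scaled_tfidf(tf, df, n_contexts):
    """phi(w) = (0.5 + 0.5 * tf(w)/max_tf) * (n_contexts / df(w)).

    tf: word -> frequency in this context; df: word -> number of
    contexts containing the word; n_contexts: total number of contexts.
    """
    max_tf = max(tf.values())
    return {w: (0.5 + 0.5 * f / max_tf) * (n_contexts / df[w])
            for w, f in tf.items()}

def dice(x, y):
    """dice(x, y) = 2 * sum_i x_i*y_i / (sum_i x_i^2 + sum_i y_i^2)."""
    num = 2 * sum(x[w] * y[w] for w in set(x) & set(y))
    den = sum(v * v for v in x.values()) + sum(v * v for v in y.values())
    return num / den if den else 0.0
```
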
Outline

- Verb Particle and Light Verb Constructions
- Motivations
- Association Measures
  - MI and PMI
  - The high-frequency constituents
  - Context measures
- Conclusions & Future work

Conclusions

- The 82 AMs: 4 main groups
  - Meaning
  - Rank-equivalence
- Group 1: Dependence ~ reduction in uncertainty
  - Effective
- Group 2: Dependence ~ set similarity, marginal frequencies
  - Simple but most effective
- Group 3: Compare observed frequency with expected frequency
  - Not effective
- Group 4: Non-compositionality ~ context similarity, entropy
  - Compromised by the ubiquity of particles and light verbs

Conclusions

- Co-occurrence frequency f(xy)
  - Of little use for VPCs: best exponent k = 0.13 in f(xy)^k
  - OK for LVCs: best exponent k = 0.85
- Marginal frequencies f(x*) and f(*y)
  - Effective for discriminating against high-frequency constituents
  - Useful discriminative units:
    - VPCs: -b - c
    - LVCs: 1/(bc)
- MI and PMI
  - An indicator of independence, not dependence
    - Manning and Schutze (1999, p. 67)
  - As an indicator of dependence:
    - PMI: normalized, for VPCs
    - MI: normalized, for VPCs and LVCs
- Tf.idf
  - Effective to scale tf to [0.5, 1]
    - Salton and Buckley (1987)

Future work

- More types of particles for VPCs
  - Adjectives: cut short, put straight
  - Verbs: let go, make do
- Trigram models
  - Phrasal-prepositional verbs: verb + adverb + preposition
    - look forward to, get away with
  - Idioms
    - kick the bucket, spill the beans
  - Adaptation of bigram AMs
- A larger corpus and evaluation data set

References

- Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398–414.
- Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
- Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
- Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
- Manning, C. D. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
- Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
- Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multiword Expressions in a Multilingual Context.
- Zhai, C. (1997). Exploiting context to identify lexical atoms – A statistical view of linguistic context. In International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97).

Q&A

- Bigram idioms
  - in all, after all, later on, as such

Types of MWEs

- MWEs
  - Lexicalized phrases
    - Fixed exps
    - Semi-fixed exps: Idioms, NCs, PP-Ds
    - Syntactically-flexible exps: LVCs, VPCs
  - Institutionalized phrases

Pecina (2005, 2006, 2008)
Combination of 82 association measures

- Statistical tests:
  - Mutual information
  - Statistical independence
  - Likelihood measures
- Semantic tests:
  - Entropy of immediate context
    - Immediate context: the immediately preceding/following words
  - Diversity of empirical context
    - Empirical context: words within a certain specified window

Pecina (2005, 2006, 2008)
Combination of 82 association measures

- Results:
  - MAP: 80.81%
  - "Equivalent" measures → 17 measures
- Issues:
  - Possibly conflicting predictions
    - Is combining everything really best?
  - Unclear linguistic and/or statistical significance

Other linguistics-driven tests

- Are MWEs lexically fixed?
  - Substitutability test (Lin, 1999)
- Are MWEs order-specific?
  - Permutation entropy (PE) (Zhang et al., 2006)
  - Entropy of Permutation and Insertion (EPI) (Aline et al., 2008)
    - Not all permutations are valid
    - "Permutation" → "syntactic variants"
- Others...?
