Re-examination of Association Measures for identifying Verb Particle and Light Verb Constructions

Supervisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, (Dr. Timothy Baldwin)
Advisor: Lin Ziheng
Student: Hoang Huu Hung
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusions & Future work
Multiword Expressions
Multiple simplex words
Idiosyncratic:
Lexically: ad hoc (ad = ?, hoc = ?)
Syntactically: by and large (prep. + conj. + adj.?)
Semantically: spill the beans
Statistically: strong coffee (?powerful coffee)
Obstacles to language understanding, translation, generation, etc.
Verb Particle and Light Verb Constructions

Verb Particle Constructions: Verb + Particle(s)
• Bolster up, put off
• Put up with, get on with
• Cut short, let go

Light Verb Constructions: Light verb + Complement
• Light verbs: do, get, give, have, make, put, take
• Make a speech, give a demo

Syntactically flexible: inflections, passive, internal modifications
• Extensive research has been carried out on this topic.
• He has given many excellent speeches in his career.

Semantically:
• VPCs: non-compositional (carry out, give up) or (semi-?)compositional (walk off, finish up)
• LVCs: compositional, with meaning from the de-verbal noun (give a demo), but with subtle meaning deviations (have a read vs. read)
Identification of VPCs and LVCs

≠ Free combinations
• VPCs vs. free verb + preposition: Leave it to me.
• LVCs vs. free light verb + noun: make a decision ≠ make drugs

VPCs ≠ Prepositional verbs (look after, search for)

VPCs:
• Joint & split configurations: Look up the word / Look the word up; Look it up
• No intervening manner adverb: Look up the word carefully / *Look carefully up the word

Prepositional verbs:
• Only the joint configuration: Look after your mum; Look after her
• Flexible adverb positions: Look after your mum carefully / Look carefully after your mum
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusions & Future work
Motivation

Pecina, P. and Schlesinger, P. Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
• An exhaustive list of 82 Lexical Association Measures for bigrams

Lexical Association Measures (AMs)
• Mathematical formulae devised to capture the degree of association between the words of a phrase
• Association ~ dependence
• Degree ~ score
• Input: statistical information. Output: scores
A comparison

Pecina and Schlesinger:
• Czech bigram "collocations": mixed idiomatic expressions, LVCs, terminologies, stock phrases, …
• Ranking extracted bigrams
• Prague Dependency Treebank (1.5 million words)
• Average Precision (AP)
• Machine-learning based combination

Our project:
• English bigram VPCs and LVCs, separately and mixed
• Ranking VPC & LVC candidates
• Wall Street Journal Corpus (1 million words)
• AP
• Analysis of AMs: categorization, modifications
Gold-standard Evaluation Data

Bigrams with frequency ≥ 6
413 WSJ-attested VPC candidates
• Candidates: (verb + particle) pairs
• Annotations from Baldwin, T. (2005)
100 WSJ-attested LVC candidates
• Candidates: (light verb + noun) pairs
• Annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

Evaluation set   Size   Negative instances   Positive instances   % positive
VPC              413    296                  117                  28.33%
LVC              100    72                   28                   28%
Mixed            513    368                  145                  28.26%
A random ranker has an AP ~ 0.28
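Since every AM is evaluated by the Average Precision of the ranking it induces over these candidates, a minimal sketch of that metric may help (illustrative code, not from the slides):

```python
def average_precision(ranked_labels):
    """Average Precision over a ranked candidate list.

    ranked_labels: gold labels (True = positive instance),
    ordered from highest- to lowest-scoring candidate.
    """
    hits, precision_sum = 0, 0.0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0
```

Ranking 513 candidates with 145 positives in random order yields an AP near the 28% positive rate, matching the baseline above.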
Rank-equivalence

Idea: AMs that rank all candidates identically achieve the same AP
• Used for simplification and categorization

Rank-equivalence over a set C: "ranking all members of C in the same way"
Notation: M1 ~r M2
M(c): score assigned by measure M to instance c
Property: if M1 ~r M2 over the candidate set, then AP(M1) = AP(M2)
Example: the odds ratio ad/bc ~r log(ad/bc)
Example of AMs

Contingency table of bigram (x y):
a = f(xy), b = f(xȳ), c = f(x̄y), d = f(x̄ȳ)
w̄: any word except w; *: any word; f(·): frequency; N: total number of bigrams

Null hypothesis of independence:
f̂(xy) = E[f(xy)] = f(x*)·f(*y) / N
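For concreteness, a sketch of how the four cells and the expected frequency could be computed from raw bigram counts (a hypothetical helper, not from the slides):

```python
from collections import Counter

def contingency(bigram_counts, x, y):
    """Cells a, b, c, d and expected frequency for bigram (x y).

    bigram_counts: Counter mapping (w1, w2) -> corpus frequency.
    """
    N = sum(bigram_counts.values())   # total number of bigrams
    a = bigram_counts[(x, y)]         # f(xy)
    fx = sum(v for (w1, _), v in bigram_counts.items() if w1 == x)  # f(x*)
    fy = sum(v for (_, w2), v in bigram_counts.items() if w2 == y)  # f(*y)
    b = fx - a                        # f(x, not y)
    c = fy - a                        # f(not x, y)
    d = N - a - b - c                 # f(not x, not y)
    expected = fx * fy / N            # E[f(xy)] under independence
    return a, b, c, d, expected
```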
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusions & Future work
Categorization: 4 main groups

Group 1: Dependence ~ reduction in uncertainty
• MI and PMI, Salience, etc.
Group 2: Dependence ~ similarity of sets; varies inversely with marginal frequencies
• Dice, Minimum Sensitivity, Laplace, etc.
• Sokal-Michener, Odds ratio, etc.
Group 3: Compare observed frequency with expected frequency
• Null hypothesis of independence
• t, z tests; Pearson's chi-squared, Fisher's exact tests, etc.
Group 4: Dependence ~ non-compositionality
• Non-compositionality ~ context similarity
  • Cosine similarity in tf.idf space, idf space, etc.
• Non-compositionality varies inversely with context entropy
MI and PMI

Mutual Information (MI):
MI(U; V) = Σ_{u,v} p(uv) · log [ p(uv) / (p(u) p(v)) ]
• Reduction in uncertainty of U given knowledge of V
• Estimated from the contingency table: MI = (1/N) Σ_{i,j} f_ij · log ( f_ij / f̂_ij )

Point-wise MI (PMI):
PMI(x, y) = log [ P(xy) / (P(x) P(y)) ] = log [ N·f(xy) / (f(x)·f(y)) ]
• "MI at a specific point"

2 known drawbacks
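A minimal sketch of both measures over the contingency cells defined earlier (illustrative code; the cell names follow the hypothetical `contingency` helper above):

```python
import math

def pmi(a, fx, fy, N):
    """Point-wise MI of a bigram: log [ N*f(xy) / (f(x)*f(y)) ]."""
    return math.log(N * a / (fx * fy))

def mi(a, b, c, d, N):
    """MI of the two indicator variables: (1/N) sum f_ij log(f_ij / fhat_ij)."""
    total = 0.0
    for f_ij, row, col in [(a, a + b, a + c), (b, a + b, b + d),
                           (c, c + d, a + c), (d, c + d, b + d)]:
        if f_ij:  # 0 * log 0 = 0 by convention
            total += f_ij / N * math.log(f_ij * N / (row * col))
    return total
```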
The first drawback

PMI favors candidates with lower-frequency constituents.

PMI(x, y) = log [ P(xy) / (P(x) P(y)) ] = log [ N·f(xy) / (f(x)·f(y)) ]

Daille (1994): PMI^k(x, y) = log [ N·f(xy)^k / (f(x)·f(y)) ]
• Experimented with integer k from 2 to 10; best result at k = 2

We replicate this experiment with real k in [0, 1] and integer k in [2, 100]:
• Higher performance in [0, 1] than in [2, 100]
• In [2, 100], performance drops rapidly as k increases, approaching the ranking by joint probability alone

k      VPCs    LVCs    Mixed
0.13   0.547   0.538   0.543
0.32   0.546   0.546   0.544
0.85   0.525   0.573   0.528
1      0.510   0.566   0.515
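A sketch of the k-parameterised variant used in this experiment (illustrative; builds on the hypothetical helpers above):

```python
import math

def pmi_k(a, fx, fy, N, k=1.0):
    """Daille-style PMI^k: log [ N * f(xy)**k / (f(x) * f(y)) ].

    k < 1 down-weights co-occurrence frequency (useful for VPCs);
    large k approaches ranking by joint probability alone.
    """
    return math.log(N * a**k / (fx * fy))
```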
The second drawback

Besides the degree of dependence:
• MI grows with entropy
• PMI grows with frequency

Mathematically,
PMI(x, y) ≤ min( log 1/P(x), log 1/P(y) )
MI(X; Y) ≤ min( log 1/P(y) + log 1/P(ȳ), log 1/P(x) + log 1/P(x̄) )

Comparing these scores across candidates is therefore not appropriate!
Proposed Solution

Normalizing scores so that they share the same unit

Proposed normalization factor (NF):
NF_α = α·P(x) + (1 − α)·P(y),  α ∈ [0, 1]
or
NF = max(P(x), P(y))
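The slides do not spell out how NF is applied; one plausible reading divides PMI by log(1/NF), which turns the upper bound from the previous slide into 1. A sketch under that assumption, reusing the hypothetical `pmi` helper above:

```python
import math

def normalized_pmi(a, fx, fy, N, alpha=0.5):
    """PMI scaled by its upper bound via NF_alpha = alpha*P(x) + (1-alpha)*P(y).

    One plausible reading of the proposed normalization; the default
    alpha = 0.5 is an assumption, not from the slides.
    """
    px, py = fx / N, fy / N
    nf = alpha * px + (1 - alpha) * py
    return pmi(a, fx, fy, N) / math.log(1 / nf)
```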
Against high-frequency constituents

AMs                                                        VPCs    LVCs    Mixed
[M34] Brawn-Blanquet (Minimum Sensitivity):
      a / max(a+b, a+c)  ~r  a / (a + max(b, c))           0.478   0.578   0.486
[M35] Simpson:
      a / min(a+b, a+c)  ~r  a / (a + min(b, c))           0.249   0.382   0.260

Insight: penalizing the more productive constituent works better.
Confirmed by [M49] Laplace, [M41] S cost:
AMs                                                        VPCs    LVCs    Mixed
Adjusted [M49]: (a+1) / (a + max(b, c) + 2)                0.486   0.577   0.493
[M49] Laplace:  max( (a+1)/(a+b+2), (a+1)/(a+c+2) )
                ~r  (a+1) / (a + min(b, c) + 2)            0.241   0.388   0.254
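These measures reduce to simple arithmetic over the contingency cells; a minimal sketch (illustrative; the function names are mine):

```python
def brawn_blanquet(a, b, c):
    """Minimum Sensitivity: penalizes the more productive constituent."""
    return a / (a + max(b, c))

def simpson(a, b, c):
    """Penalizes only the less productive constituent."""
    return a / (a + min(b, c))

def adjusted_laplace(a, b, c):
    """Laplace-smoothed variant penalizing the more productive constituent."""
    return (a + 1) / (a + max(b, c) + 2)
```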
Against high-frequency constituents

[M18] Sokal-Michener: (a+d) / (a+b+c+d) = (a+d) / N
Since d = N − a − b − c, this is rank-equivalent to −(b + c).

AMs                              VPCs    LVCs    Mixed
[M18] Sokal-Michener ~r −(b+c)   0.565   0.453   0.546
−max(b, c)                       0.540   0.433   0.519

Better to penalize both constituents?
Proposed modification: −( α·b + (1 − α)·c )
Context-based Measures

Non-compositionality of (x y):
• Context of (x y) ≠ contexts of x and of y
• Context of x ≠ context of y
• E.g., Dutch courage, hot dog

Context as a distribution of words:
• Relative entropy (KL divergence), Jensen-Shannon divergence
• Dice similarity

Context as a point/vector in R^N:
• Euclidean, Manhattan, etc. distances
• Angle distance (cosine similarity)
Representation of context

Context of z: C_z = ( φ(w1), φ(w2), ..., φ(wn) )

Common representation schemes:
• Boolean: φ(wi) ∈ {0, 1}
• Term frequency (tf): φ(wi) = f(wi)
• Tf.idf: φ(wi) = f(wi) · N / df(wi),  where df(wi) = |{x : wi ∈ C_x}|

Dice similarity: dice(c_x, c_y) = 2 Σ_i x_i y_i / ( Σ_i x_i² + Σ_i y_i² )

Tf.idf only slightly better than tf:
• Under-estimation due to the independence assumption
  • Sieger, Jin and Hauptmann (1999)
Scale tf to [0.5, 1] before multiplying by idf:
• Salton and Buckley (1987)

AM                               VPCs    LVCs    Mixed
[M82] Dice in tf.idf space       0.387   0.367   0.374
Dice in (scaled tf).idf space    0.568   0.488   0.553
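A sketch of the winning variant, Dice similarity over (scaled tf).idf context vectors. Illustrative only: the 0.5 + 0.5·tf/max_tf scaling is Salton and Buckley's augmented tf, which maps tf into [0.5, 1] and is an assumption here, and the idf term follows the slide's N/df form:

```python
def scaled_tfidf_vector(context_counts, doc_freq, n_contexts):
    """Map a context's word counts to (scaled tf) * idf weights.

    context_counts: dict word -> frequency within this context
    doc_freq: dict word -> number of contexts containing the word
    """
    max_tf = max(context_counts.values())
    return {w: (0.5 + 0.5 * tf / max_tf) * (n_contexts / doc_freq[w])
            for w, tf in context_counts.items()}

def dice(u, v):
    """Dice similarity: 2*sum(x_i*y_i) / (sum(x_i^2) + sum(y_i^2))."""
    num = 2 * sum(u[w] * v[w] for w in u.keys() & v.keys())
    den = sum(x * x for x in u.values()) + sum(y * y for y in v.values())
    return num / den if den else 0.0
```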
Outline
Verb Particle and Light Verb Constructions
Motivations
Association Measures
MI and PMI
The high-frequency constituents
Context measures
Conclusions & Future work
Conclusions

The 82 AMs fall into 4 main groups, by meaning and rank-equivalence:
Group 1: Dependence ~ reduction in uncertainty
• Effective
Group 2: Dependence ~ similarity of sets, marginal frequencies
• Simple but most effective
Group 3: Compare observed frequency with expected frequency
• Not effective
Group 4: Non-compositionality ~ context similarity, entropy
• Compromised by the ubiquity of particles and light verbs
Conclusions

Co-occurrence frequency f(xy):
• Not useful for VPCs: best weight f(xy)^0.13
• OK for LVCs: best weight f(xy)^0.85

Marginal frequencies f(x*) and f(*y):
• Effective for discriminating against high-frequency constituents
• Useful discriminative units:
  • VPCs: −b − c
  • LVCs: 1/(bc)

MI and PMI:
• An indicator of independence, not dependence
  • Manning and Schutze (1999, p. 67)
• As an indicator of dependence:
  • PMI: normalized, for VPCs
  • MI: normalized, for VPCs and LVCs

Tf.idf:
• Effective to normalize tf to [0.5, 1]
  • Salton and Buckley (1987)
Future work

More particle types for VPCs:
• Adjectives: cut short, put straight
• Verbs: let go, make do

Trigram models:
• Phrasal-prepositional verbs (verb + adverb + prep.): look forward to, get away with
• Idioms: kick the bucket, spill the beans
• Adaptation of bigram AMs

A larger corpus and evaluation data set
References

Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398-414.
Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
Manning, C. D. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multi-word-expressions in a multilingual context.
Zhai, C. (1997). Exploiting context to identify lexical atoms: A statistical view of linguistic context. In International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97).
Q&A
Bigram idioms
in all, after all, later on, as such
Types of MWEs

MWEs
• Lexicalized phrases
  • Fixed exps
  • Semi-fixed exps: Idioms, NCs, PP-Ds
  • Syntactically-flexible exps: LVCs, VPCs
• Institutionalized phrases
Pecina (2005, 2006, 2008)

Combination of 82 association measures

Statistical tests:
• Mutual information
• Statistical independence
• Likelihood measures

Semantic tests:
• Entropy of immediate context
  • Immediate context: immediately preceding/following words
• Diversity of empirical context
  • Empirical context: words within a certain specified window
Pecina (2005, 2006, 2008)

Combination of 82 association measures

Results:
• MAP: 80.81%
• "Equivalent" measures: 17 measures

Issues:
• Possibly conflicting predictions
  • Is combining all measures best?
• Unclear linguistic and/or statistical significance
Other linguistics-driven tests

MWEs are lexically fixed?
• Substitutability test (Lin, 1999)

MWEs are order-specific?
• Permutation entropy (PE) (Zhang et al., 2006)
• Entropy of Permutation and Insertion (EPI) (Aline et al., 2008)
  • Not all permutations are valid
  • "Permutation" → "syntactic variants"

Others…?