
SoC
Re-examination of
Association Measures for identifying
Verb Particle and Light Verb Constructions
Advisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, Lin Ziheng
Student: Hoang Huu Hung
17-Jul-15
Outline
Motivations and Objectives
Background
 Lexical Association Measures
 Verb Particle and Light Verb Constructions
Analysis
 Categorization
 Suggested modifications
Conclusions & Future work
Motivations
 Pecina, P. and Schlesinger, P. Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
 An exhaustive list of 82 Lexical Association Measures for bigrams

Pecina and Schlesinger:
 Machine-learning combination of 82 Association Measures
 Evaluation over "collocations"
• Idiomatic expressions, terminologies, stock phrases, etc.
 Czech

What we want:
 "Understanding" of key Association Measures
 Evaluation over individual subtypes
 English
Objectives - Deliverables

| Objectives | Deliverables |
| "Understanding" of key Association Measures (AMs) | Evaluate all 82 AMs: categorization, modifications |
| Evaluating over individual subtypes | The 2 most common types of Multiword Expressions: Verb Particle Constructions, Light Verb Constructions |
| English | English |
Outline
Motivations and Objectives
Background
 Lexical Association Measures
 Verb Particle and Light Verb Constructions
Analysis
 Categorization
 Suggested modifications
Conclusions & Future work
Lexical Association Measures
 Lexical Association Measures (AM)
 Mathematical formulae devised to capture the degree of
association between words of a phrase
• Association ~ statistical, semantic dependence
• Degree ~ score
• Input: statistical information. Output: scores
 Notation
 (x y): bigram consisting of words x and y
 w̄ : any word except w
• Ex: with w = up, (pick w̄) covers pick on, pick it, etc., but not pick up
 * : any word
• Ex: (go *): go up, go upstairs, go slowly, etc.
Association Measures (cont.)
 Contingency table of bigram (x y):

  a = f(xy)    b = f(xȳ)
  c = f(x̄y)    d = f(x̄ȳ)

 Marginal frequencies: f(x*) = a + b,  f(*y) = a + c;  N = a + b + c + d
 The null hypothesis: x and y occur independently
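The table's four cells follow from three corpus counts plus the corpus size N. A minimal Python sketch (the function and argument names are illustrative, not from the slides):

```python
def contingency_table(f_xy, f_x, f_y, n):
    """Build the 2x2 contingency table of a bigram (x y) from the joint
    frequency f(xy), the marginals f(x*) and f(*y), and the corpus size N."""
    a = f_xy                   # f(x y): x followed by y
    b = f_x - f_xy             # f(x ~y): x followed by anything but y
    c = f_y - f_xy             # f(~x y): y preceded by anything but x
    d = n - f_x - f_y + f_xy   # f(~x ~y): neither x nor y
    return a, b, c, d
```

The four cells always sum to N, which gives a quick sanity check.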
Examples of Association Measures

| Bigram | f(xy) | f(x) | f(y) | PMI = log [ N·f(xy) / (f(x)·f(y)) ] |
| figure out | 16 | 320 | 1376 | 3.669 |
| describe in | 16 | 100 | 20707 | 2.120 |

Evaluation metric
 AM as a ranker: Average Precision (AP)
 The average of the precision values obtained as recall varies
 Comparable to Pecina, P. and Schlesinger, P. (2006)
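The AP of a ranked candidate list can be sketched as follows: precision is recorded at each rank that holds a positive instance and then averaged (a minimal Python sketch; the function name is illustrative):

```python
def average_precision(ranked_labels):
    """AP of a ranked candidate list: ranked_labels[i] is True/1 when the
    candidate at rank i+1 is a positive instance (a true VPC or LVC)."""
    hits, precisions = 0, []
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Sorting candidates by an AM's scores and feeding the gold labels in that order yields AP figures of the kind reported in the result tables.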
Rank-equivalence
 Motivation: some AMs rank the candidates exactly the same
 Same AP
 Different scores, so not strictly equivalent
 Rank-equivalence. Notation: M1 ≡r M2 means M1 is rank-equivalent to M2
 Property: rank-equivalent AMs have the same average precision
 Useful in categorization
 "Canonical form"
Verb Particle and Light Verb Constructions

Verb Particle Constructions (VPCs):
 Verb + particle(s); focus on VPCs with 1 particle
 Particle: prepositional particles
• in, on, off, out, up, etc.
 Bolster up, put off
 Non-compositional
 carry out, give up

Light Verb Constructions (LVCs):
 Light verb + complement
 Light verbs: do, get, give, have, make, put, take
 Complement: de-verbal nouns
• present  presentation
 Make a presentation, give a demo
 * Make a cake, have a car
 Compositional
 Meaning from the noun
 Make a presentation ~ to present

Syntactically flexible:
 Extensive research has been carried out on this topic.
 He has delivered many excellent speeches in his career.
 We have put the meeting forward by one week.
Evaluation data
 Candidate extraction
 Corpus: Wall Street Journal (1 million words)
 VPC candidates: (verb + particle) pairs
 LVC candidates: (light verb + noun) pairs

| Statistics | VPC candidates | LVC candidates |
| Total extracted | 8652 | 11465 |
| f > 5 | 1629 | 1254 |

 Annotations
 VPCs: Baldwin, T. (2005)
 LVCs: Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

| Evaluation set | Size | Negative instances | Positive instances | % of positive instances |
| VPC | 413 | 296 | 117 | 28.33% |
| LVC | 100 | 72 | 28 | 28% |
| Mixed | 513 | 368 | 145 | 28.26% |

 A random ranker has an AP ≈ 0.28
Outline
Motivations and Objectives
Background
 Lexical Association Measures
 Verb Particle and Light Verb Constructions
Analysis
 Categorization
 Suggested modifications
Conclusions & Future work
Categorization: 4 main groups
 Dependence ~ reduction in uncertainty
 MI and PMI, Salience, etc.
 Dependence ~ co-occurrence vs. marginal frequencies
 Dice, Minimum Sensitivity, Laplace, Sokal-Michener, Odds ratio, etc.
 Dependence ~ observed freq. >< expected freq.
 Null hypothesis of independence
 T, Z tests; Pearson's chi-squared, Fisher's exact tests, etc.
 Dependence ~ non-compositionality
 Non-compositionality ~ context similarity
• Cosine similarity in tf.idf space, idf space, etc.
 Non-compositionality varies inversely with context entropy
Outline
 Motivations and Objectives
 Background
 Lexical Association Measures
 Verb Particle and Light Verb Constructions
Analysis
 Categorization
 Suggested modifications
• MI and PMI
• Penalization terms
 Conclusions & Future work
MI and PMI
 Mutual Information (MI)

  MI(U; V) = H(U) − H(U|V) = Σ_{u,v} p(uv) log [ p(uv) / (p(u)·p(v)) ]

 The common information between 2 variables
 Point-wise MI (PMI)

  PMI(x, y) = log [ P(xy) / (P(x)·P(y)) ] = log [ N·f(xy) / (f(x)·f(y)) ]

 Holds between values (MI holds between random variables)
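The PMI formula above can be computed directly from the corpus counts. A minimal Python sketch (the function name is illustrative):

```python
import math

def pmi(f_xy, f_x, f_y, n):
    """Point-wise mutual information of a bigram (x y):
    PMI = log(N * f(xy) / (f(x) * f(y))), natural log."""
    return math.log(n * f_xy / (f_x * f_y))
```

For statistically independent words the ratio is 1 and the score is 0; positive scores indicate association.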
The drawback
 MI, PMI: not a good indicator of dependence
 Manning and Schutze (1999, p. 67)
 Mathematically,

  MI(X; Y) ≤ min( H(X), H(Y) )
  PMI(x, y) ≤ min( log 1/P(x), log 1/P(y) )
Proposed Solution
 Normalizing scores
 So that they share the same unit
 Proposed normalization factor (NF):

  NF(α) = α·P(x) + (1 − α)·P(y),  α ∈ [0, 1]

 or

  NF_max = max( P(x), P(y) )
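The slides do not show exactly how NF enters the score. One reading consistent with the bound PMI(x, y) ≤ min(log 1/P(x), log 1/P(y)) divides PMI by −log NF, so a perfectly associated pair scores 1. A Python sketch under that assumption (the division by −log NF, the function name, and the default α are all assumptions, not from the slides):

```python
import math

def normalized_pmi(f_xy, f_x, f_y, n, alpha=0.5):
    """PMI divided by -log NF(alpha), where
    NF(alpha) = alpha*P(x) + (1 - alpha)*P(y), alpha in [0, 1].
    NOTE: the exact incorporation of NF is an assumed reading."""
    p_x, p_y = f_x / n, f_y / n
    nf = alpha * p_x + (1 - alpha) * p_y
    raw_pmi = math.log(n * f_xy / (f_x * f_y))
    return raw_pmi / -math.log(nf)
```

With f(xy) = f(x) = f(y) (a pair that always co-occurs) and equal marginals, the score is exactly 1.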
Penalization term for VPCs

  Sokal-Michener = (a + d)/N  ≡r  a + d = N − (b + c)  ≡r  −(b + c)

 Unstable performance. Ex: for VPCs, b << c, so −b − c ≈ −c
 Scale the terms: α·b + (1 − α)·c,  α ∈ [0, 1]
 Penalization factor
  α·b + (1 − α)·c   (1)
 Incorporation: <AM> / (1)
 (1) ~ NF(α) = α·P(x) + (1 − α)·P(y)

  PMI  --(heuristic simplification)-->  log(ad) / (1) = log(ad) / [ α·b + (1 − α)·c ]
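The heuristic VPC score log(ad) / [α·b + (1 − α)·c] can be sketched in Python as follows (the function name and the default α are illustrative assumptions):

```python
import math

def pmi_vpc(a, b, c, d, alpha=0.5):
    """Heuristic VPC score from the slides: log(a*d) divided by the
    penalization factor alpha*b + (1 - alpha)*c, alpha in [0, 1]."""
    return math.log(a * d) / (alpha * b + (1 - alpha) * c)
```

Shifting α toward 1 weights the penalization toward b = f(xȳ), which matters when b << c.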
Penalization term for LVCs

| Penalization | VPCs | LVCs | Mixed |
| −b − c | 0.565 | 0.453 | 0.546 |
| 1/(bc) | 0.502 | 0.532 | 0.502 |
| 1/(b^α · c^(1−α)) | 0.443 | 0.567 | 0.456 |

 Odds ratio = ad/(bc)
 Penalization factor:  1/(b^α · c^(1−α)),  α ∈ [0, 1]
 PMI/(bc): not proper, since PMI = log [ aN / ((a + b)(a + c)) ]

  PMI  --(heuristic simplification)-->  log [ ad/(bc) ]  (the log odds ratio)
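The LVC variant, log [ ad / (b^α · c^(1−α)) ], can be sketched similarly; the exact placement of α is an assumption based on the penalization factor above, and the function name and default α are illustrative:

```python
import math

def pmi_lvc(a, b, c, d, alpha=0.5):
    """Heuristic LVC score: log(ad) minus the weighted log penalization,
    i.e. log(a*d / (b**alpha * c**(1 - alpha))), alpha in [0, 1]."""
    return math.log(a * d) - (alpha * math.log(b) + (1 - alpha) * math.log(c))
```

At α = 0.5 this reduces to the geometric-mean penalization sqrt(bc) in the denominator.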
Outline
Motivations and Objectives
Background
 Lexical Association Measures
 Verb Particle and Light Verb Constructions
Analysis
 Categorization
 Suggested modifications
Conclusions & Future work
Conclusions
 The 82 AMs: 4 main groups
 Meaning
 Rank-equivalence
 Dependence ~ Reduction in uncertainty
 Effective
 Dependence ~ co-occurrence, marginal frequencies
 Simple but most effective
 Dependence ~ observed freq. >< expected freq.
 Not effective
 Non-compositionality ~ context similarity, entropy
 Undermined by the ubiquity of particles and light verbs
Conclusions
 PMI: normalized for VPCs; MI: normalized for VPCs and LVCs
 f(xy): not a reliable indicator
 Penalization terms built from f(x*) and f(*y)
 VPCs:  α·b + (1 − α)·c
 LVCs:  1/(b^α · c^(1−α))
 In tf.idf, normalize tf to [0.5, 1]
 Salton and Buckley (1987)
Future work
 More types of particles of VPCs
 Adjective: cut short, put straight
 Verb: let go, make do
 Trigram model
 Phrasal-prepositional verbs: Verb + adverb + prep.
• Look forward to, get away with
 Idioms
• Get there first, spill the beans
 Bigram-AMs to trigram-AMs
 A larger corpus, evaluation data set
References
 Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398–414.
 Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
 Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
 Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
 Manning, C. D. and Schutze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
 Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
 Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multi-word-expressions in a multilingual context.
 Zhai, C. (1997). Exploiting context to identify lexical atoms – A statistical view of linguistic context. In International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97).
The first drawback

  PMI(x, y) = log [ P(xy) / (P(x)·P(y)) ] = log [ N·f(xy) / (f(x)·f(y)) ]

 Favors candidates with lower-frequency constituents

  PMI^k = log [ N·f(xy)^k / (f(x)·f(y)) ]

 Daille (1994)
 Experimenting with integer k in [2, 100]
• Best result at k = 3
 Replicate this experiment
 with real k in [0, 1] and integer k in [1, 100]
The first drawback

| k | VPCs | LVCs | Mixed |
| 0.13 | 0.547 | 0.538 | 0.543 |
| 0.32 | 0.546 | 0.546 | 0.544 |
| 0.85 | 0.525 | 0.573 | 0.528 |
| 1 | 0.510 | 0.566 | 0.515 |
| 2 | 0.353 | 0.505 | 0.370 |
| 3 | 0.275 | 0.431 | 0.290 |
| 10 | 0.228 | 0.313 | 0.236 |
| 100 | 0.217 | 0.289 | 0.224 |
| f(x y) | 0.170 | 0.28 | 0.175 |

 f(xy): poor indicator
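The PMI^k variant swept above is a one-line change to plain PMI. A minimal Python sketch (the function name is illustrative):

```python
import math

def pmi_k(f_xy, f_x, f_y, n, k=1.0):
    """PMI^k (Daille 1994): log(N * f(xy)**k / (f(x) * f(y))).
    k = 1 recovers plain PMI; larger k boosts the joint frequency's weight."""
    return math.log(n * f_xy ** k / (f_x * f_y))
```

Per the table, real-valued k below 1 (around 0.1–0.3) gave the best AP on VPCs, while larger k degrades sharply.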
Context-based Measures
 Non-compositionality of (x y)
 Context of (x y) ≠ context of x and of y
 Context of x ≠ context of y
 E.g.: Dutch courage, hot dog
 Context as a distribution of words
 Entropy, cross entropy
 KL divergence (relative entropy), Jensen-Shannon divergence
 Context as a point/vector in R^N
 Euclidean, Manhattan, etc. distance
 Angle distance (cosine similarity)
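Treating each context as a word distribution, the Jensen-Shannon divergence mentioned above can be sketched as follows (a minimal Python sketch; the function name is illustrative):

```python
import math
from collections import Counter

def js_divergence(ctx_a, ctx_b):
    """Jensen-Shannon divergence (in nats) between two contexts, each
    given as a list of words and treated as a probability distribution."""
    p, q = Counter(ctx_a), Counter(ctx_b)
    total_p, total_q = sum(p.values()), sum(q.values())
    vocab = set(p) | set(q)
    P = {w: p[w] / total_p for w in vocab}
    Q = {w: q[w] / total_q for w in vocab}
    M = {w: (P[w] + Q[w]) / 2 for w in vocab}   # the mixture distribution

    def kl(x, y):  # KL divergence, skipping zero-probability terms
        return sum(x[w] * math.log(x[w] / y[w]) for w in vocab if x[w] > 0)

    return 0.5 * kl(P, M) + 0.5 * kl(Q, M)
```

Identical contexts score 0; fully disjoint contexts score log 2, the maximum.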
Representation of context
 Context of z:  C_z = (φ(w1), φ(w2), ..., φ(wn))
 Common representation schemes
 Boolean:  φ(wi) = 0 or 1
 Term frequency (tf):  φ(wi) = f(wi)
 Tf.idf:  φ(wi) = f(wi) · N/df(wi),  where df(wi) = |{x : wi ∈ C_x}|
 Dice similarity:

  Dice(c_x, c_y) = 2 Σ_i x_i·y_i / (Σ_i x_i² + Σ_i y_i²)

 Tf.idf only slightly better than tf
 Under-estimation due to independence assumption
• Sieger, Jin and Hauptmann (1999)
 Scale tf to [0.5, 1] before multiplying by idf
• Salton and Buckley (1987)

| AM | VPCs | LVCs | Mixed |
| [M82] Dice in tf.idf space | 0.387 | 0.367 | 0.374 |
| Dice in (scaled tf).idf space | 0.568 | 0.488 | 0.553 |
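The scaled-tf weighting and the Dice similarity above can be sketched as follows. Function names are illustrative; the idf term follows the slide's N/df(w) form (no logarithm), and the [0.5, 1] scaling assumes the common 0.5 + 0.5·tf/max_tf scheme from Salton and Buckley (1987):

```python
def scaled_tfidf(tf, max_tf, n_contexts, df):
    """tf scaled to [0.5, 1] (0.5 + 0.5 * tf / max_tf, an assumed form
    of the Salton and Buckley scheme), times the slide's idf = N/df."""
    return (0.5 + 0.5 * tf / max_tf) * (n_contexts / df)

def dice(u, v):
    """Dice similarity of two context vectors:
    2 * sum(u_i * v_i) / (sum(u_i^2) + sum(v_i^2))."""
    num = 2 * sum(x * y for x, y in zip(u, v))
    den = sum(x * x for x in u) + sum(y * y for y in v)
    return num / den if den else 0.0
```

Identical vectors score 1; vectors with no shared nonzero coordinates score 0.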
Penalization terms

  [M34] Braun-Blanquet (Minimum Sensitivity) = a / max(a + b, a + c) = a / (a + max(b, c))
  [M35] Simpson = a / min(a + b, a + c) = a / (a + min(b, c))

| AMs | VPCs | LVCs | Mixed |
| [M34] Braun-Blanquet | 0.478 | 0.578 | 0.486 |
| [M35] Simpson | 0.249 | 0.382 | 0.260 |

 Insight: penalizing against the more productive constituent
 Confirmed by Laplace, S cost

  Laplace variants: (a + 1)/(a + b + 2), (a + 1)/(a + c + 2), (a + 1)/(a + min(b, c) + 2)
  Adjusted Laplace = (a + 1)/(a + max(b, c) + 2)

| AMs | VPCs | LVCs | Mixed |
| Adjusted Laplace | 0.577 | 0.486 | 0.493 |
| [M49] Laplace | 0.241 | 0.388 | 0.254 |
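The three measures contrasted above differ only in which marginal they penalize. A minimal Python sketch (the adjusted-Laplace form is a reconstruction from the slides, so treat its exact denominator as an assumption):

```python
def braun_blanquet(a, b, c):
    """a / (a + max(b, c)): penalizes the MORE productive constituent."""
    return a / (a + max(b, c))

def simpson(a, b, c):
    """a / (a + min(b, c)): penalizes the LESS productive constituent."""
    return a / (a + min(b, c))

def adjusted_laplace(a, b, c):
    """(a + 1) / (a + max(b, c) + 2): the max-penalizing Laplace variant
    suggested by the slides (exact form assumed)."""
    return (a + 1) / (a + max(b, c) + 2)
```

When b ≠ c, Braun-Blanquet is strictly smaller than Simpson, which is the source of their very different APs.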
Penalization terms for VPCs

  [M18] Sokal-Michener = (a + d)/(a + b + c + d) = (a + d)/N  ≡r  a + d  ≡r  −(b + c)

| AMs | VPCs | LVCs | Mixed |
| [M18] Sokal-Michener | 0.565 | 0.453 | 0.546 |
| −max(b, c) | 0.433 | 0.519 | 0.540 |

 Better to penalize both constituents:

  −[ α·f(xȳ) + (1 − α)·f(x̄y) ] = −[ α·b + (1 − α)·c ],  α ∈ [0, 1]
Rank-equivalence
 Notation: M1 ≡r M2 means M1 is rank-equivalent to M2
 Example:

  ad/(bc)  ≡r  (ad − bc)/(ad + bc)  ≡r  (√(ad) − √(bc))/(√(ad) + √(bc))

 Definition
 M(c): score assigned by AM M to candidate c
 M1 ≡r M2 iff M1 and M2 order every pair of candidates identically
 Property: rank-equivalent AMs have the same average precision
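Rank-equivalence can be checked empirically by comparing every pairwise ordering, shown here for the odds ratio and Yule's Q, the first two measures in the chain above (a minimal Python sketch; function names are illustrative):

```python
def same_ranking(scores1, scores2):
    """True iff the two score lists order every pair of candidates
    identically (the definition of rank-equivalence on this sample)."""
    n = len(scores1)
    return all(
        (scores1[i] < scores1[j]) == (scores2[i] < scores2[j])
        and (scores1[i] > scores1[j]) == (scores2[i] > scores2[j])
        for i in range(n) for j in range(i + 1, n)
    )

def odds_ratio(a, b, c, d):
    return a * d / (b * c)

def yules_q(a, b, c, d):
    # (ad - bc)/(ad + bc) is a monotone transform of ad/bc
    return (a * d - b * c) / (a * d + b * c)
```

Since Yule's Q equals (OR − 1)/(OR + 1), a monotone function of the odds ratio, the two always produce the same ranking and hence the same AP.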
Multiword Expressions
 Multiple simplex words
 Idiosyncratic
• Lexically: ad hoc (ad?, hoc?)
• Syntactically: by and large (prep. + conj. + adj.?)
• Semantically: spill the beans
• Statistically: strong coffee (powerful coffee?)
 Obstacles to language understanding, translation, generation, etc.
Types of MWEs
 MWEs
• Lexicalized phrases
  - Fixed exps
  - Semi-fixed exps: Idioms, NCs, PP-Ds
  - Syntactically-flexible exps: VPCs, LVCs
• Institutionalized phrases
Identification of VPCs and LVCs
 ≠ Free combinations
• VPCs ≠ free verb + preposition: Leave it to me.
• LVCs ≠ free light verb + noun: make decision ≠ make drugs
 VPCs ≠ prepositional verbs: look after, search for

VPCs:
 Joint & split configurations
• Look up the word / Look the word up
• Look it up
 No intervening manner adverb
• Look up the word carefully
• *Look carefully up the word

Prepositional verbs:
 Only joint configuration
• Look after your mum
• Look after her
 Flexible adverb positions
• Look after your mum carefully
• Look carefully after your mum
Other linguistics-driven tests
 MWEs are lexically fixed?
 Substitutability test (Lin, 1999)
 MWEs are order-specific?
 Permutation entropy (PE) (Yi Zhang et al., 2006)
 Entropy of Permutation and Insertion (EPI)
• (Aline et al., 2008)
• Not all permutations are valid
• “Permutation”  “Syntactic variants”
 Others…?
Pecina (2005, 2006, 2008)
Combination of 82 association measures
 Statistical tests:
 Mutual information
 Statistical independence
 Likelihood measures
 Semantic tests:
 Entropy of immediate context
• Immediate context: immediately preceding/following words
 Diversity of empirical context
• Empirical context: words within a certain specified window
A comparison

Pecina and Schlesinger:
 Czech bigram "collocations"
 Mixed idiomatic exps, LVCs, terminologies, stock phrases, …
 Ranking extracted bigrams
 Prague Dependency Treebank
• 1.5 million words
 Average Precision (AP)
 Machine-learning-based combination

Our project:
 English bigram VPCs, LVCs
 Separately and mixed
 Ranking VPC & LVC candidates
 Wall Street Journal Corpus
• 1 million words
 AP
 Analysis of AMs
 Categorization
 Modifications
Pecina (2005, 2006, 2008)
Combination of 82 association measures
 Result:
 MAP: 80.81%
 "Equivalent" measures  17 measures
 Issues:
 Possible conflicting predictions
• Is combining all measures best?
 Unclear linguistic and/or statistical significance
Q&A
 Bigram idioms
 in all, after all, later on, as such
Gold-standard Evaluation data
 Bigrams with frequency ≥ 6
 413 WSJ-attested VPC candidates
 Candidates: (verb + particle) pairs
 Annotations from Baldwin, T. (2005)
 100 WSJ-attested LVC candidates
 LVC candidates: (light verb + noun) pairs
 Annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

| Evaluation set | Size | Negative instances | Positive instances | % of positive instances |
| VPC | 413 | 296 | 117 | 28.33% |
| LVC | 100 | 72 | 28 | 28% |
| Mixed | 513 | 368 | 145 | 28.26% |

 A random ranker has an AP ≈ 0.28