School of Computing, National University of Singapore
Re-examination of
Association Measures for identifying
Verb Particle and Light Verb Constructions
Advisors: Dr. Kan Min-Yen, Dr. Su Nam Kim, Lin Ziheng
Student: Hoang Huu Hung
17-Jul-15
Outline
Motivations and Objectives
Background
Lexical Association Measures
Verb Particle and Light Verb Constructions
Analysis
Categorization
Suggested modifications
Conclusions & Future work
Motivations
Pecina, P. and Schlesinger, P. Combining Association Measures for Collocation Extraction (COLING/ACL 2006)
An exhaustive list of 82 Lexical Association Measures for bigrams

Pecina and Schlesinger:
• Machine-learning combination of 82 Association Measures
• Evaluation over "collocations" as a whole: idiomatic expressions, terminologies, stock phrases, etc.
• Czech

What we want:
• "Understanding" of key Association Measures
• Evaluation over individual subtypes
• English
Objectives - Deliverables

  Objectives                            Deliverables
  "Understanding" of key                Evaluate all 82 AMs:
  Association Measures (AMs)            categorization, modifications
  Evaluating over individual            The 2 most common types of
  subtypes                              Multiword Expressions:
                                        Verb Particle Constructions,
                                        Light Verb Constructions
  English                               English
Lexical Association Measures
Lexical Association Measures (AMs)
Mathematical formulae devised to capture the degree of association between the words of a phrase
• Association ~ statistical, semantic dependence
• Degree ~ score
• Input: statistical information. Output: scores
Notation
(x y): bigram consisting of words x and y
¬w: any word except w
• Ex: (pick ¬up): pick on, pick it, etc., but not pick up
*: any word
• Ex: (go *): go up, go upstairs, go slowly, etc.
Association Measures (cont.)
Contingency table of bigram (x y):

            y             ¬y
  x     a = f(xy)     b = f(x¬y)    f(x*)
  ¬x    c = f(¬xy)    d = f(¬x¬y)   f(¬x*)
        f(*y)         f(*¬y)        N

The row and column totals are the marginal frequencies.
The null hypothesis: x and y occur independently.
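As a minimal sketch (the names `contingency` and `expected_a` are illustrative, not from the slides), the table and the null-hypothesis expectation can be computed from the bigram and marginal frequencies:

```python
def contingency(f_xy, f_x, f_y, N):
    """Build the 2x2 contingency table of a bigram (x y).

    Returns (a, b, c, d):
      a = f(xy), b = f(x ~y), c = f(~x y), d = f(~x ~y).
    """
    a = f_xy
    b = f_x - f_xy        # x followed by a word other than y
    c = f_y - f_xy        # y preceded by a word other than x
    d = N - a - b - c     # bigrams containing neither x nor y
    return a, b, c, d

def expected_a(f_x, f_y, N):
    """Expected f(xy) if x and y occurred independently: f(x)*f(y)/N."""
    return f_x * f_y / N
```

For instance, with f(xy) = 16, f(x) = 320, f(y) = 1376 and N = 1,000,000, the expected co-occurrence count is only about 0.44, far below the observed 16.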
Examples of Association Measures
A worked example with PMI(x, y) = log( N·f(xy) / (f(x)·f(y)) ):

  AM candidate    f(xy)   f(x)   f(y)     PMI
  figure out      16      320    1376     3.669
  describe in     16      100    20707    2.120

Evaluation metric
AM as a ranker: Average Precision (AP)
The average of precision values obtained when varying recall
Comparable to Pecina, P. and Schlesinger, P. (2006)
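The AP metric above can be sketched in a few lines (illustrative Python; `ranked_labels` holds the gold label of each candidate, in the order the AM ranks them):

```python
def average_precision(ranked_labels):
    """Average of the precision values at each positive instance.

    `ranked_labels` lists gold labels (1 = true MWE, 0 = not) of the
    candidates, sorted by descending AM score.
    """
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

A perfect ranker that puts all positives first scores 1.0; burying positives lower in the list lowers each precision term and hence the AP.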
Rank-equivalence
Motivation: some AMs rank the candidates exactly the same
Same AP
Different scores, so not score-equivalent
Rank-equivalence. Notation: M1 ≡r M2
Property: rank-equivalent AMs have the same average precision
Useful in categorization: each group reduces to a "canonical form"
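The property is easy to check numerically (illustrative Python with made-up contingency counts): the odds ratio ad/bc and its logarithm order candidates identically, so any gold labelling yields the same AP for both.

```python
import math

def odds(a, b, c, d):
    return (a * d) / (b * c)

def log_odds(a, b, c, d):
    return math.log((a * d) / (b * c))

# Three made-up candidate contingency tables (a, b, c, d).
cands = [(16, 304, 1360, 998320), (16, 84, 20691, 979209), (50, 10, 20, 900)]
rank1 = sorted(range(len(cands)), key=lambda i: odds(*cands[i]), reverse=True)
rank2 = sorted(range(len(cands)), key=lambda i: log_odds(*cands[i]), reverse=True)
# Identical orderings => identical average precision.
```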
Verb Particle and Light Verb Constructions

Verb Particle Constructions (VPCs)
• Verb + Particle(s); focus on VPCs with 1 particle
• Particle: prepositional particles
  - in, on, off, out, up, etc.
• Ex: bolster up, put off
• Non-compositional: carry out, give up

Light Verb Constructions (LVCs)
• Light Verb + Complement
• Light verbs: do, get, give, have, make, put, take
• Complement: de-verbal nouns
  - present → presentation
• Ex: make a presentation, give a demo
• *make a cake, *have a car (not LVCs)
• Compositional: meaning from the noun
  - make a presentation ~ to present

Syntactically flexible:
• Extensive research has been carried out on this topic.
• He has delivered many excellent speeches in his career.
• We have put the meeting forward by one week.
Evaluation data
Candidate extraction
• Corpus: Wall Street Journal (1 million words)
• VPC candidates: (verb + particle) pairs
• LVC candidates: (light verb + noun) pairs

Statistics:

                    VPC candidates   LVC candidates
  Total extracted   8652             11465
  f > 5             1629             1254

Annotations
• VPCs: Baldwin, T. (2005)
• LVCs: Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

  Evaluation set   Size   Negative    Positive    % of positive
                          instances   instances   instances
  VPC              413    296         117         28.33%
  LVC              100    72          28          28%
  Mixed            513    368         145         28.26%

A random ranker has an AP ~ 0.28
Categorization: 4 main groups
Dependence ~ reduction in uncertainty
• MI and PMI, Salience, etc.
Dependence ~ co-occurrence vs. marginal frequencies
• Dice, Minimum Sensitivity, Laplace, etc.
• Sokal-Michener, Odds ratio, etc.
Dependence ~ observed freq. >< expected freq.
• Null hypothesis of independence
• T test, Z test; Pearson's chi-squared, Fisher's exact tests, etc.
Dependence ~ non-compositionality
• Non-compositionality ~ context similarity
  - Cosine similarity in tf.idf space, idf space, etc.
• Non-compositionality varies inversely with context entropy
Outline
Motivations and Objectives
Background
Lexical Association Measures
Verb Particle and Light Verb Constructions
Analysis
Categorization
Suggested modifications
• MI and PMI
• Penalization terms
Conclusions & Future work
MI and PMI
Mutual Information (MI)

  MI(U; V) = H(U) − H(U|V) = Σ_{u,v} p(uv) · log( p(uv) / (p(u)·p(v)) )

The common information between 2 variables

Point-wise MI (PMI)

  PMI(x, y) = log( P(xy) / (P(x)·P(y)) ) = log( N·f(xy) / (f(x)·f(y)) )

Holds between values (a concrete word pair), not between variables
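Both formulas can be sketched directly (illustrative Python; `joint` is a toy joint distribution, not corpus data):

```python
import math

def pmi(f_xy, f_x, f_y, N):
    """PMI(x, y) = log( N * f(xy) / (f(x) * f(y)) )."""
    return math.log(N * f_xy / (f_x * f_y))

def mutual_information(joint):
    """MI(U;V) = sum_{u,v} p(uv) * log( p(uv) / (p(u) * p(v)) ).

    `joint` maps (u, v) pairs to probabilities p(uv).
    """
    pu, pv = {}, {}
    for (u, v), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(p * math.log(p / (pu[u] * pv[v]))
               for (u, v), p in joint.items() if p > 0)
```

An independent joint distribution gives MI = 0, and a bigram whose observed frequency equals its expected frequency gets PMI = 0.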
The drawback
MI, PMI: not a good indicator of dependence
Manning and Schütze (1999, p. 67)
Mathematically,

  MI(X; Y) ≤ min( H(X), H(Y) )
  PMI(x, y) ≤ min( log 1/P(x), log 1/P(y) )
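The bound is easy to see numerically (illustrative counts): when the rarer word occurs only inside the bigram, PMI hits the cap min(log 1/P(x), log 1/P(y)) exactly, so the score reflects the words' marginal frequencies as much as their association.

```python
import math

N = 1_000_000
f_x, f_y = 20, 50_000
f_xy = f_x                                  # x never occurs without y
pmi_xy = math.log(N * f_xy / (f_x * f_y))   # reduces to log(N / f(y))
bound = min(math.log(N / f_x), math.log(N / f_y))
```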
Proposed Solution
Normalizing scores so that they share the same unit
Proposed normalization factor (NF):

  NF(α) = α·P(x) + (1 − α)·P(y),  α ∈ [0, 1]

or

  NF-max = max( P(x), P(y) )
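One way this normalization could be applied (a sketch; dividing PMI by −log NF is our assumption here, not a formula from the slides):

```python
import math

def normalized_pmi(f_xy, f_x, f_y, N, alpha=0.5):
    """Divide PMI by -log NF(alpha), NF(alpha) = alpha*P(x) + (1-alpha)*P(y).

    Illustrative application of the proposed factor; the exact way NF is
    incorporated is an assumption.
    """
    pmi = math.log(N * f_xy / (f_x * f_y))
    nf = alpha * (f_x / N) + (1 - alpha) * (f_y / N)
    return pmi / -math.log(nf)
```

With this choice, a perfectly dependent pair of equally frequent words scores exactly 1, whatever its raw frequency, while an independent pair still scores 0.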
Penalization term for VPCs
Several AMs reduce to penalizing the marginal counts b and c, e.g.:

  (a + d)/N ≡r a + d = N − (b + c) ≡r −b − c

Unstable performance. Ex: for VPCs b << c, so −b − c ≈ −c
Scale the terms:

  Penalization factor: α·b + (1 − α)·c,  α ∈ [0, 1]   (1)

Incorporation: divide the measure by (1)
Note: (1) ~ NF(α) = α·P(x) + (1 − α)·P(y)
PMI, after a heuristic simplification to log(ad), is penalized by (1)
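The instability and the fix can be illustrated directly (illustrative counts; `penalty` is a hypothetical helper name):

```python
def penalty(b, c, alpha):
    """Scaled penalization factor (1): alpha*b + (1 - alpha)*c."""
    return alpha * b + (1 - alpha) * c

# Typical VPC-like counts: the particle (y) is vastly more productive
# than the verb (x), so c dwarfs b.
b, c = 30, 20_000
plain = -(b + c)          # ~ -c: the verb's marginal count is drowned out
```

Tuning alpha restores the verb's contribution instead of letting the particle's marginal frequency dominate the penalty.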
Penalization term for LVCs

  AMs                 VPCs    LVCs    Mixed
  −b − c              0.565   0.453   0.546
  1/bc                0.502   0.532   0.502
  Odds ratio ad/bc    0.443   0.567   0.456

  PMI = log( aN / ((a+b)·(a+c)) ) —heuristic simplification→ log( ad/bc )

PMI / bc: not proper
Penalization factor:

  1 / ( b^α · c^(1−α) ),  α ∈ [0, 1]
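The geometric form for LVCs contrasts with the arithmetic form used for VPCs (sketch; `lvc_penalty` is an illustrative name):

```python
def lvc_penalty(b, c, alpha):
    """Penalization factor for LVCs: 1 / (b**alpha * c**(1 - alpha))."""
    return 1.0 / (b ** alpha * c ** (1 - alpha))
```

Here alpha interpolates between penalizing only b (alpha = 1) and only c (alpha = 0) on a multiplicative scale, so neither marginal count can drown out the other additively.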
Conclusions
The 82 AMs: 4 main groups, by meaning and rank-equivalence
• Dependence ~ reduction in uncertainty: effective
• Dependence ~ co-occurrence, marginal frequencies: simple but most effective
• Dependence ~ observed freq. >< expected freq.: not effective
• Non-compositionality ~ context similarity, entropy: undermined by the ubiquity of particles and light verbs
Conclusions
PMI: normalized for VPCs
MI: normalized for VPCs and LVCs
f(xy): not a reliable indicator
Penalization terms from f(x*) and f(*y):
• VPCs: α·b + (1 − α)·c
• LVCs: 1 / ( b^α · c^(1−α) )
In tf.idf, normalize tf to [0.5, 1]
• Salton and Buckley (1987)
Future work
More particle types for VPCs
• Adjective: cut short, put straight
• Verb: let go, make do
Trigram model
• Phrasal-prepositional verbs: verb + adverb + prep.
  - look forward to, get away with
• Idioms
  - get there first, spill the beans
• Bigram AMs → trigram AMs
A larger corpus and evaluation data set
References
Baldwin, T. (2005). The deep lexical acquisition of English verb-particle constructions. Computer Speech and Language, Special Issue on Multiword Expressions, 19(4):398–414.
Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Ph.D. dissertation, University of Stuttgart.
Kim, S. N. (2008). Statistical Modeling of Multiword Expressions. Ph.D. thesis, University of Melbourne, Australia.
Lin, D. (1999). Automatic identification of non-compositional phrases. In Proc. of the 37th Annual Meeting of the ACL.
Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts.
Pecina, P. and Schlesinger, P. (2006). Combining association measures for collocation extraction. COLING/ACL 2006.
Tan, Y. F., Kan, M. Y. and Cui, H. (2006). Extending corpus-based identification of light verb constructions using a supervised learning framework. EACL 2006 Workshop on Multi-word-expressions in a multilingual context.
Zhai, C. (1997). Exploiting context to identify lexical atoms – A statistical view of linguistic context. In International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-97).
The first drawback

  PMI(x, y) = log( P(xy) / (P(x)·P(y)) ) = log( N·f(xy) / (f(x)·f(y)) )

Favors candidates with lower-frequency constituents

  PMI^k = log( N·f(xy)^k / (f(x)·f(y)) )

Daille (1994)
• Experimenting with integer k in [2, 100]
  - Best result at k = 3
Replicate this experiment
• real k in [0, 1] and integer k in [1, 100]
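A sketch of the PMI^k family (illustrative Python):

```python
import math

def pmi_k(f_xy, f_x, f_y, N, k):
    """Daille-style PMI^k = log( N * f(xy)**k / (f(x) * f(y)) ).

    k > 1 boosts the weight of the raw co-occurrence frequency f(xy);
    a real k < 1 damps it. k = 1 recovers plain PMI.
    """
    return math.log(N * f_xy ** k / (f_x * f_y))
```

Each unit increase of k simply adds log f(xy) to the score, which is why large k degrades into ranking by f(xy) alone.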
The first drawback

  k       VPCs    LVCs    Mixed
  0.13    0.547   0.538   0.543
  0.32    0.546   0.546   0.544
  0.85    0.525   0.573   0.528
  1       0.510   0.566   0.515
  2       0.353   0.505   0.370
  3       0.275   0.431   0.290
  10      0.228   0.313   0.236
  100     0.217   0.289   0.224
  f(xy)   0.170   0.28    0.175

f(xy): poor indicator
Context-based Measures
Non-compositionality of (x y)
• Context of (x y) ≠ context of x and y
• Context of x ≠ context of y
• E.g.: Dutch courage, hot dog
Context as a distribution of words
• Entropy, cross entropy
• KL divergence (relative entropy), Jensen-Shannon divergence
Context as a point/vector in R^N
• Euclidean, Manhattan, etc. distance
• Angle distance (cosine similarity)
Representation of context
Context of z: C_z = ( ω(w1), ω(w2), ..., ω(wn) )
Common representation schemes
• Boolean: ω(wi) ∈ {0, 1}
• Term frequency (tf): ω(wi) = f(wi)
• Tf.idf: ω(wi) = f(wi) · N / df(wi), where df(wi) = |{x : wi ∈ C_x}|

Dice similarity:

  Dice(c_x, c_y) = 2·Σ_i x_i·y_i / ( Σ_i x_i² + Σ_i y_i² )

Tf.idf only slightly better than tf
• Under-estimation due to the independence assumption
  - Sieger, Jin and Hauptmann (1999)
• Scale tf to [0.5, 1] before multiplying by idf
  - Salton and Buckley (1987)

  AM                              VPCs    LVCs    Mixed
  [M82] Dice in tf.idf space      0.387   0.367   0.374
  Dice in (scaled tf).idf space   0.568   0.488   0.553
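The weighting and similarity above can be sketched as follows (illustrative Python; `scaled_tfidf` uses the Salton-Buckley augmented tf, 0.5 + 0.5·tf/max_tf, and the slides' idf without a log):

```python
def scaled_tfidf(tf, max_tf, N, df):
    """Weight = (0.5 + 0.5 * tf / max_tf) * (N / df):
    tf scaled into [0.5, 1] before multiplying by idf."""
    return (0.5 + 0.5 * tf / max_tf) * (N / df)

def dice(x, y):
    """Dice(cx, cy) = 2 * sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2))."""
    num = 2 * sum(a * b for a, b in zip(x, y))
    den = sum(a * a for a in x) + sum(b * b for b in y)
    return num / den if den else 0.0
```

The scaling keeps every context word's tf contribution within a factor of 2, so rare-but-present words are not wiped out before the idf weighting.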
Penalization terms

  AMs                                              VPCs    LVCs    Mixed
  [M34] Brawn-Blanquet (Minimum Sensitivity):
        a / max(a+b, a+c) ≡r a / max(b, c)         0.478   0.578   0.486
  [M35] Simpson:
        a / min(a+b, a+c) ≡r a / min(b, c)         0.260   0.249   0.382

Insight: penalize against the more productive constituent
Confirmed by Laplace, S cost:

  AMs                                              VPCs    LVCs    Mixed
  [M49] Laplace: (a+1) / (a+b+2)                   0.241   0.388   0.254
  Adjusted Laplace: (a+1) / (a + max(b, c) + 2)    0.577   0.486   0.493
Penalization terms for VPCs

  AMs                                        VPCs    LVCs    Mixed
  [M18] Sokal-Michener: (a+d)/N ≡r −b − c    0.565   0.453   0.546
  −max(b, c)                                 0.433   0.519   0.540

Better to penalize both constituents:

  α·f(x¬y) + (1 − α)·f(¬xy) = α·b + (1 − α)·c,  α ∈ [0, 1]
Rank-equivalence
Notation: M1 ≡r M2 means M1 is rank-equivalent to M2
Ex:

  ad/bc ≡r log( ad/bc ) = log(ad) − log(bc)

Definition
M(c): score assigned by M to instance c
M1 ≡r M2 iff, for all instances c1 and c2: M1(c1) ≥ M1(c2) ⇔ M2(c1) ≥ M2(c2)
Rank-equivalent AMs have the same average precision
Multiword Expressions
Multiple simplex words
Idiosyncratic
• Lexically: ad hoc (ad?, hoc?)
• Syntactically: by and large (prep. + conj. + adj.?)
• Semantically: spill the beans
• Statistically: strong coffee (powerful coffee?)
Obstacles to language understanding, translation, generation, etc.
Types of MWEs
MWEs
• Lexicalized phrases
  - Fixed exps
  - Semi-fixed exps: Idioms, NCs, PP-Ds
  - Syntactically-flexible exps: LVCs, VPCs
• Institutionalized phrases
Identification of VPCs and LVCs
≠ Free combinations
• VPCs: free verb + preposition
  - Leave it to me.
• LVCs: free light verb + noun
  - make a decision ≠ make drugs
VPCs ≠ Prepositional verbs: look after, search for

VPCs
• Joint & split configurations
  - Look up the word / Look the word up
  - Look it up
• No intervening manner adverb
  - Look up the word carefully
  - *Look carefully up the word

Prepositional verbs
• Only the joint configuration
  - Look after your mum
  - Look after her
• Flexible adverb positions
  - Look after your mum carefully
  - Look carefully after your mum
Other linguistics-driven tests
MWEs are lexically fixed?
• Substitutability test (Lin, 1999)
MWEs are order-specific?
• Permutation entropy (PE) (Yi Zhang et al., 2006)
• Entropy of Permutation and Insertion (EPI) (Aline et al., 2008)
  - Not all permutations are valid
  - "Permutation" → "syntactic variants"
Others…?
Pecina (2005, 2006, 2008)
Combination of 82 association measures
Statistical tests:
• Mutual information
• Statistical independence
• Likelihood measures
Semantic tests:
• Entropy of immediate context
  - Immediate context: immediately preceding/following words
• Diversity of empirical context
  - Empirical context: words within a certain specified window
A comparison

  Pecina and Schlesinger                    Our Project
  Czech bigram "collocations": mixed        English bigram VPCs and LVCs,
  idiomatic exps, LVCs, terminologies,      separately and mixed
  stock phrases, …
  Ranking extracted bigrams                 Ranking VPC & LVC candidates
  Prague Dependency Treebank                Wall Street Journal Corpus
  (1.5 million words)                       (1 million words)
  Average Precision (AP)                    AP
  Machine-learning based combination        Analysis of AMs:
                                            categorization, modifications
Pecina (2005, 2006, 2008)
Combination of 82 association measures
Result:
• MAP: 80.81%
• "Equivalent" measures merged → 17 measures
Issues:
• Possibly conflicting predictions
  - Is combining all measures best?
• Unclear linguistic and/or statistical significance
Q&A
Bigram idioms
in all, after all, later on, as such
Gold-standard Evaluation data
Bigrams with frequency ≥ 6
413 WSJ-attested VPC candidates
• Candidates: (verb + particle) pairs
• Annotations from Baldwin, T. (2005)
100 WSJ-attested LVC candidates
• LVC candidates: (light verb + noun) pairs
• Annotations from Tan, Y. F., Kan, M. Y. and Cui, H. (2006)

  Evaluation set   Size   Negative    Positive    % of positive
                          instances   instances   instances
  VPC              413    296         117         28.33%
  LVC              100    72          28          28%
  Mixed            513    368         145         28.26%

A random ranker has an AP ~ 0.28