Automated Suggestions for Miscollocations
Anne Li-E Liu
David Wible
Nai-lung Tsao
June 5, 2009
Overview
• Introduction
• Methodology
• Experimental Results
• Conclusion
Introduction
• Our study focuses on how to find
suggestions for miscollocations
automatically.
• In this paper, only verb-noun collocations
and miscollocations are considered.
Introduction
• Howarth’s (1998) investigation of collocations
found in L1 and L2 writers’ writing.
• Granger’s (1998) analysis of adverb-adjective
collocations.
• Liu’s (2002) lexical semantic analysis of the
verb-noun miscollocations in the English Taiwan
Learner Corpus.
Introduction
Projects using learner corpora in analyzing and
categorizing learner errors:
• NICT JLE (Japanese Learner English) Corpus
• The Chinese Learner English Corpus (CLEC)
• English Taiwan Learner Corpus (or TLC) (Wible
et al., 2003).
An example
• She tries to *improve her students’ problems.
• Candidate replacement: reduce
• V collocates of "problem" from Collocation Explorer:
1. solve
2. pose
3. tackle
4. grapple
5. alleviate
6. overcome
7. exacerbate
8. compound
9. beset
10. resolve
Method
• Three features of collocate candidates are
used:
1. Word association strength
2. Semantic similarity
3. Intercollocability (Cowie and Howarth, 1996)
Resource
• 84 VN miscollocations in TLC (Liu, 2002).
Training data: 42
Testing data: 42
• Two knowledge resources: BNC, WordNet
• Two human evaluators.
Word Association Strength
• Mutual Information (Church et al. 1991)
• Two purposes:
1. All suggested correct collocations have to be
identified as collocations.
2. The higher the word association strength the
more likely it is to be a correct substitute for
the wrong collocate.
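
As a rough illustration of word association strength, the sketch below computes pointwise mutual information for a verb-noun pair from raw counts. It assumes the measure is the usual base-2 form, log2(P(v, n) / (P(v) P(n))); the function name `pmi` and the counts are hypothetical and are not BNC figures.

```python
import math


def pmi(pair_count, verb_count, noun_count, total):
    """Pointwise mutual information (base 2) of a verb-noun pair.

    All arguments are raw corpus counts; `total` is the number of
    observations the probabilities are estimated from.
    """
    if pair_count == 0:
        # The pair was never observed, so there is no evidence of association.
        return float("-inf")
    p_pair = pair_count / total
    p_verb = verb_count / total
    p_noun = noun_count / total
    return math.log2(p_pair / (p_verb * p_noun))


# Hypothetical counts, invented for illustration only.
score = pmi(pair_count=32, verb_count=64, noun_count=128, total=1024)
print(score)
```

A candidate verb whose score with the noun is low would fail the first purpose above (it is not a collocation at all), while among the remaining candidates a higher score counts as stronger evidence.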
Semantic Similarity
• A semantic relation holds between a miscollocate
and its correct counterpart (Gitsaki et al., 2000;
Liu 2002).
Synonymous relation: *say a story → tell a story
Hypernymy relation: *say a story → think of a story
• The synsets of WordNet are treated as nodes in a graph,
and the graph-theoretic distance between them is measured.
Semantic Similarity
\[
\mathrm{sim}(w_1, w_2) = \max_{s_i \in \mathrm{synset}(w_1),\; s_j \in \mathrm{synset}(w_2)} \left( 1 - \frac{\mathrm{dis}(s_i, s_j)}{2 \times \max(L_{s_i}, L_{s_j})} \right)
\]
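
The similarity measure can be tried out on a toy hierarchy. The sketch below is only an illustration under stated assumptions: dis is read as the number of edges on the path between two synsets, L as the depth of a synset with the root at level 1, and the tiny tree in `PARENT` and the word-to-synset map `SYNSETS` are invented rather than taken from WordNet.

```python
# Toy hypernym tree (child -> parent), invented for illustration.
# It has a single root, which the distance function below relies on.
PARENT = {
    "communicate": "act",
    "tell": "communicate",
    "say": "communicate",
    "think": "act",
}

# Hypothetical word -> synsets map (one synset per word for brevity).
SYNSETS = {"say": ["say"], "tell": ["tell"], "think": ["think"]}


def level(node):
    """Depth of a node, counting the root as level 1 (an assumption)."""
    depth = 1
    while node in PARENT:
        node = PARENT[node]
        depth += 1
    return depth


def distance(a, b):
    """Edge count of the path from a to b via their lowest common ancestor."""
    steps_from_a = {}
    node, steps = a, 0
    while node is not None:
        steps_from_a[node] = steps
        node, steps = PARENT.get(node), steps + 1
    node, steps = b, 0
    while node not in steps_from_a:
        node, steps = PARENT[node], steps + 1
    return steps + steps_from_a[node]


def sim(w1, w2):
    """Maximum over synset pairs of 1 - dis / (2 * max level)."""
    return max(
        1 - distance(si, sj) / (2 * max(level(si), level(sj)))
        for si in SYNSETS[w1]
        for sj in SYNSETS[w2]
    )


print(sim("say", "tell"))
print(sim("say", "think"))
```

In this toy tree the sibling pair say/tell comes out more similar than say/think, because the path between them is shorter.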
Intercollocability
• Cowie and Howarth (1996) propose that certain
collocations form clusters on the basis of the
shared meaning.
Example collocations: convey point, get across the message, express concern, communicate concern, convey feeling
[Diagram: verbs convey, get across, express, communicate linked to nouns message, point, concern, feeling]
Intercollocability
• Collocations in a cluster show a certain degree
of intercollocability.
[Diagram: verbs convey, get across, express, communicate (?) linked to nouns message, point, concern, feeling, condolences]
Example: express + concern (express one’s concern)
Intercollocability
She tries to *improve her students’ problems.
• Starting point: *improve problem
• problem has 86 verb collocates (e.g. resolve, reduce); improve has 52 noun collocates.
• Does any of the 86 verbs co-occur with the 52 nouns?
[Diagram: resolve, reduce and improve linked to the nouns problem, situation, matter, quality, efficiency, way, effectiveness]
Intercollocability
[Diagram: verbs resolve, improve, reduce linked to nouns situation, matter, problem, way, quality, efficiency, effectiveness]
• The cluster is partially created and the link between
improve, resolve and reduce is developed by virtue of
the overlapping noun collocates.
Intercollocability
• Quantify intercollocability: the number of shared collocates
shared collocate (resolve, improve) = 3
shared collocate (reduce, improve) = 3
The more shared collocates a verb has with the wrong verb,
the more likely this verb is a good candidate.
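
Counting shared collocates amounts to a set intersection. The sketch below is a minimal illustration: the noun sets in `NOUN_COLLOCATES` are invented and merely arranged so that the counts for (resolve, improve) and (reduce, improve) come out as 3, as on the slide; the real collocate lists are not given here, and the helper names are hypothetical.

```python
# Hypothetical noun collocates per verb, invented for illustration.
NOUN_COLLOCATES = {
    "improve": {"situation", "matter", "way", "quality", "efficiency", "effectiveness"},
    "resolve": {"problem", "situation", "matter", "way"},
    "reduce": {"problem", "quality", "efficiency", "effectiveness"},
    "solve": {"problem"},
}


def shared_collocates(verb_a, verb_b, collocates):
    """Nouns that collocate with both verbs."""
    return collocates[verb_a] & collocates[verb_b]


def rank_candidates(wrong_verb, candidates, collocates):
    """Sort candidate verbs by how many noun collocates they share with the wrong verb."""
    counts = {
        verb: len(shared_collocates(verb, wrong_verb, collocates))
        for verb in candidates
    }
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(rank_candidates("improve", ["resolve", "reduce", "solve"], NOUN_COLLOCATES))
```

With these toy sets, solve shares nothing with improve and is ranked last, which shows why this feature is combined with the other two rather than used alone.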
Integrate the 3 features
• The probabilistic model
\[
P(S_c \mid F_{c,m}) = \frac{P(F_{c,m} \mid S_c)\, P(S_c)}{P(F_{c,m})} \approx \frac{\prod_{f \in F_{c,m}} P(f \mid S_c)\; P(S_c)}{\prod_{f \in F_{c,m}} P(f)}
\]
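
The approximation on the right-hand side can be evaluated directly once the per-feature probabilities are known. The sketch below reads S_c as "candidate c is a correct suggestion" and F_{c,m} as the set of discretized feature values for candidate c and miscollocate m, which is an interpretation of the symbols; the function name and all probabilities are hypothetical.

```python
def candidate_score(feature_levels, p_level_given_correct, p_level, p_correct):
    """Approximate P(S_c | F_{c,m}) as prod P(f | S_c) * P(S_c) / prod P(f).

    feature_levels maps a feature name to its discretized level; the two
    probability tables map feature name -> level -> probability.
    """
    numerator = p_correct
    denominator = 1.0
    for feature, lvl in feature_levels.items():
        numerator *= p_level_given_correct[feature][lvl]
        denominator *= p_level[feature][lvl]
    return numerator / denominator


# Hypothetical probabilities, invented for illustration.
score = candidate_score(
    feature_levels={"MI": 4, "SS": 5},
    p_level_given_correct={"MI": {4: 0.5}, "SS": {5: 0.4}},
    p_level={"MI": {4: 0.25}, "SS": {5: 0.2}},
    p_correct=0.1,
)
print(score)
```

Because the features are treated as independent, the value is best used to rank candidates against each other rather than read as a calibrated probability.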
Training
• Probability distribution of word association
strength
MI values are discretized into 5 levels
(<1.5, 1.5~3.0, 3.0~4.5, 4.5~6, >6)
P( MI level )
P(MI level | Sc)
Training
• Probability distribution of semantic
similarity
Similarity scores are discretized into 5 levels
(0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8 ~1.0 )
P(SS level )
P(SS level | Sc)
Training
• Probability distribution of intercollocability
The normalized number of shared collocates is
discretized into 5 levels
(0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8 ~1.0 )
P(SC level )
P(SC level | Sc)
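
The training step for each feature can be pictured as binning values into the five levels and counting. The sketch below assumes relative-frequency estimates with no smoothing, and it assigns a value that falls exactly on a boundary to the higher level; both choices are assumptions, as are the names and the sample MI values.

```python
from bisect import bisect_right
from collections import Counter

# Level boundaries taken from the ranges on the training slides.
MI_EDGES = [1.5, 3.0, 4.5, 6.0]
UNIT_EDGES = [0.2, 0.4, 0.6, 0.8]  # for similarity and normalized shared collocates


def to_level(value, edges):
    """Map a raw value to a level 1..5; boundary values go to the higher level."""
    return bisect_right(edges, value) + 1


def level_distribution(values, edges, n_levels=5):
    """Relative frequency of each level among the given values (no smoothing)."""
    counts = Counter(to_level(v, edges) for v in values)
    total = len(values)
    return {lvl: counts[lvl] / total for lvl in range(1, n_levels + 1)}


# Hypothetical MI values: all candidates vs. those judged correct.
all_mi = [0.5, 2.0, 2.5, 5.0, 7.0, 6.5, 3.2, 1.0]
correct_mi = [5.0, 7.0, 6.5, 3.2]

p_mi_level = level_distribution(all_mi, MI_EDGES)                  # P(MI level)
p_mi_level_given_sc = level_distribution(correct_mi, MI_EDGES)     # P(MI level | Sc)
print(p_mi_level)
print(p_mi_level_given_sc)
```

With only 42 training items some levels may never be observed, so a real implementation would probably need some form of smoothing to avoid zero probabilities.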
Experiments
• Different combinations of the three features.
Model  Feature(s) considered
M1     MI (Mutual Information)
M2     SS (Semantic Similarity)
M3     SC (Shared Collocates)
M4     MI + SS
M5     MI + SC
M6     SS + SC
M7     MI + SS + SC
Results
K-Best  M1 (MI)  M2 (SS)  M3 (SC)  M4 (MI+SS)  M5 (MI+SC)  M6 (SS+SC)  M7 (MI+SS+SC)
1       16.67    40.48    22.62    48.81       29.76       55.95       53.75
2       36.90    53.45    38.10    60.71       44.05       63.10       67.86
3       47.62    64.29    50.00    71.43       59.52       77.38       78.57
4       52.38    67.86    63.10    77.38       72.62       80.95       82.14
5       64.29    75.00    72.62    83.33       78.57       83.33       85.71
6       65.48    77.38    75.00    85.71       83.33       84.52       88.10
7       67.86    77.38    77.38    86.90       86.90       86.90       89.29
8       70.24    80.95    82.14    86.90       89.29       88.10       91.67
9       72.62    83.33    85.71    88.10       92.86       90.48       92.86
10      76.19    86.90    88.10    88.10       94.05       90.48       94.05
Results (cont.)
The K-Best suggestions for *get knowledge.
K-Best  M2        M6        M7
1       aim       obtain    acquire
2       generate  share     share
3       draw      develop   obtain
4       obtain    generate  develop
5       develop   acquire   gain
The K-Best suggestions for *reach purpose.
K-Best  M2       M6        M7
1       achieve  achieve   achieve
2       teach    account   account
3       explain  trade     trade
4       account  treat     fulfill
5       trade    allocate  serve
The K-Best suggestions for *pay time.
K-Best  M2      M6      M7
1       devote  spend   spend
2       spend   invest  waste
3       expend  devote  devote
4       spare   date    invest
5       invest  waste   date
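
One way a K-Best figure could be computed from ranked lists like these is sketched below. It assumes the figure is the percentage of miscollocations for which at least one acceptable verb appears among the top K suggestions; that definition, the function name, and the `GOLD` sets of acceptable verbs are assumptions made for illustration (the evaluators' actual judgments are not given here). The rankings are the M2 columns of the three tables.

```python
# M2 rankings copied from the three K-Best tables.
M2_RANKINGS = {
    "get knowledge": ["aim", "generate", "draw", "obtain", "develop"],
    "reach purpose": ["achieve", "teach", "explain", "account", "trade"],
    "pay time": ["devote", "spend", "expend", "spare", "invest"],
}

# Hypothetical sets of acceptable verbs, invented for illustration.
GOLD = {
    "get knowledge": {"acquire", "gain", "obtain"},
    "reach purpose": {"achieve"},
    "pay time": {"spend", "devote"},
}


def k_best_rate(rankings, gold, k):
    """Percentage of items with at least one acceptable verb in the top k."""
    hits = sum(
        1
        for item, ranking in rankings.items()
        if any(verb in gold[item] for verb in ranking[:k])
    )
    return 100.0 * hits / len(rankings)


for k in (1, 4):
    print(k, k_best_rate(M2_RANKINGS, GOLD, k))
```

Under these assumed judgments the rate rises with K, since an acceptable verb for *get knowledge only appears at rank 4 in the M2 list.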
Conclusion
• A probabilistic model to integrate features.
• Early experimental results show the
potential of this research.
Future work
• Applying such mechanisms to other types of
miscollocations.
• Miscollocation detection will be one of the main
points of this research.
• A larger number of miscollocations should be
included in order to verify our approach.
Thank you!
Q&A
Anne Li-E Liu
David Wible
Nai-Lung Tsao