PrepNet: a Framework for Describing Prepositions


PrepNet: a Framework for Describing Prepositions:
Preliminary Investigation Results
Patrick Saint-Dizier
IRIT-CNRS, France
Long-term objectives
• Construct a repository of preposition syntactic and
semantic behaviors,
• Develop a multi-level approach, from prototypical uses to
unexpected ones, that accounts for diversity of preposition
uses and for their polysemic behavior,
• Develop a relatively shallow semantic characterization
based on frames,
• Investigate the verb-preposition-NP relations: restrictions
and compositionality
• Develop a multi-lingual approach.
 Applications: MT, Knowledge extraction, QA, etc.
This paper:
basic elements of a preliminary approach
• Introduce a general characterization of preposition senses
viewed as abstract notions,
• Characterize these abstract notions by means of frames
(viewed as linguistic or conceptual macros),
• Populate preposition frames via corpus and then validate,
• Develop a multi-level characterization of preposition uses,
to organize the diversity of their uses in language,
• Raise a few questions about multilinguality (prepositions can
be realized by other categories or by morphology in some languages)
 Investigate evaluation methods, in abstracto, and via
applications.
Related work
• Very little work on prepositions in CL circles compared to verbs
and nouns, in spite of their necessity in a number of applications
(MT, IE, QA, …),
• Almost nothing in EWN, FrameNet or VerbNet,
• Some valuable work in AI: e.g. temporal, spatial reasoning,
• A few isolated works in linguistics on a given preposition,
• Quite a lot of work in psycho-linguistics.
Other resources: B. Dorr’s large description for English, with
MT in view (about 500 entries).
Why is that so?
• High polysemy (though perhaps no more than adjectives?), despite
a small inventory: 95 prepositions in French plus compounds, 32 in
Spanish; there is not always agreement on what a preposition
is…
• Linguistic realizations very difficult to predict, large
number of idiosyncratic uses and cross-linguistic
differences,
• Syntactic difficulties due to the chain V-Prep-N, e.g.: PP-attachment problems, VPC,
• Deep level in the semantic-cognitive structure:
prepositions often used in metalanguages as primitives
 Only compositional uses of prepositions are studied here
Global architecture of the proposal
Prep. senses: 3-level set of abstract notions
Shallow semantic representation with strata
Uses in language 1
Uses in language 2
etc.
General architecture (1): categorizing
preposition senses
 Preposition categorization on 3 levels:
– Family (roughly thematic roles): localization, manner,
quantity, etc.
– Facets: localization: source, position, destination, etc.
– Modalities.
 Facets viewed as abstract notions on which PrepNet is
based
 12 families defined
Families/ facets
Quantity: numerical/ frequency / proportion
Accompaniment: adjunction/ simultaneity/ inclusion/ exclusion
Manner: means/ manners and attitudes/ imitation or analogy
Localisation: source/ destination/ via/ fixed position
Choice and exchange: exchange / choice or alternative / substitution
Causality: cause/ goal or consequence/ intention
Opposition
Ordering: priority/ subordination/ hierarchy/ ranking/ degree of
importance
Minor elements: about, in spite of, comparison
(see examples in paper)
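
The family/facet table above can be sketched as a simple lookup structure. This is only an illustration: the dictionary holds the nine families that the slide enumerates (the remaining ones are not listed here), treating the items under "Minor elements" as facets is an assumption, and `family_of` is a hypothetical helper name.

```python
# Families and facets as listed on the slide. Only nine of the twelve
# families are enumerated there, so this table is deliberately incomplete.
FAMILIES = {
    "Quantity": ["numerical", "frequency", "proportion"],
    "Accompaniment": ["adjunction", "simultaneity", "inclusion", "exclusion"],
    "Manner": ["means", "manners and attitudes", "imitation or analogy"],
    "Localisation": ["source", "destination", "via", "fixed position"],
    "Choice and exchange": ["exchange", "choice or alternative", "substitution"],
    "Causality": ["cause", "goal or consequence", "intention"],
    "Opposition": [],  # no facets are given on the slide
    "Ordering": ["priority", "subordination", "hierarchy", "ranking",
                 "degree of importance"],
    # Assumption: the items under "Minor elements" are treated as facets.
    "Minor elements": ["about", "in spite of", "comparison"],
}


def family_of(facet):
    """Return the names of the families that list the given facet."""
    return [family for family, facets in FAMILIES.items() if facet in facets]


print(family_of("via"))
```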
 Conceptual/ontological status of these distinctions?
• Families → ‘superframes’: general principles and
restrictions
• Facets → frames; strata: subframes, with some general
forms of inheritance and property consistency
• Whenever appropriate: modalities → subframes
Frames are viewed as linguistic macros, to be interpreted.
They are shallow or coarse-grained representations so far.
Language realizations are a priori associated with the lower
level frame nodes.
(2): a conceptual, prelexical structure
[Diagram: the frame of an abstract notion (name + gloss, shallow restrictions, simplified LCS representation), with the strata of the abstract notion below it as subframes SF1, SF2, SF3.]
Structure of a frame
• Structure:
– Number, name, gloss,
– Frame with shallow constraints: X <Action> Y [Number] Z
– Conceptual representation in simplified LCS (kind of LST)
– In the future: inferential patterns (within a frame or among frames)
 195 senses/abstract notions described using 65 primitives
 Shallow constraints:
 (1) generic semantic types
 (2) generic verb class types from WordNet
 (3) generic semantic fields from the LCS: temp, poss, loc, psy,
epist, perc, amount, comm, prop, abs, etc.
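
As a rough illustration of the frame structure just listed, a frame could be held in a small record type. This is a sketch under assumptions, not the PrepNet implementation: the class name `Frame`, its field names, and the `accepts` method are hypothetical, the LCS representation is kept as an opaque string, and the instance is filled with the values of entry [1] from the 'via' example in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    number: str           # e.g. "1", "1.1", "1.2.1"
    name: str
    gloss: str
    constraints: dict     # shallow constraints on the slots X, ACTION, Y (and Z)
    representation: str   # simplified LCS, kept here as an opaque string
    synsets: dict = field(default_factory=dict)  # language code -> prepositions

    def accepts(self, **slot_types):
        """True if every supplied slot type equals the frame's constraint."""
        return all(self.constraints.get(slot) == value
                   for slot, value in slot_types.items())


# Values copied from entry [1] (VIA - generic).
via = Frame(
    number="1",
    name="VIA - generic",
    gloss="An entity X moving via a location Y",
    constraints={"X": "concrete entity", "ACTION": "movement verb",
                 "Y": "location"},
    representation="X : via(loc, Y)",
    synsets={"fr": {"par", "via"}},
)

print(via.accepts(ACTION="movement verb", Y="location"))
```

Exact string matching on constraint labels is the simplest possible check; a fuller version would presumably test type subsumption instead.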
Example 1: ‘via’
[1] : VIA - generic.
'An entity X moving via a location Y'
X <ACTION> [1] Y
X: concrete entity, ACTION: movement verb, Y: location
representation: X : via(loc, Y)
French synset: {par, via}
example: Jean rentre par la porte
Stratification 1:
[1.1] : VIA - narrow passage.
'An entity X moving via / an action that uses a narrow passage in an object Y'
X <ACTION> [1.1] Y
X: concrete entity, ACTION: perception verb, Y: location with a narrow passage
representation: X : through(loc or temp, Y)
French synset: {à travers, au travers de, dans}
example: Jean regarde à travers la grille / dans les jumelles.
Example 1, cont’:
Stratification 2:
[1.2.1] VIA UNDER – from generic
'An entity X moving via under a location Y'
X <ACTION> [1.2.1] Y
X: concrete entity, ACTION: movement verb,
Y: location with a form of passage under it
representation: X : via(loc, under(loc,Y))
French synset: {par dessous}
example: Jean passe par dessous le pont.
[1.2.2] VIA ABOVE – from generic
etc.
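
The stratification in Example 1 can be sketched as a table keyed by entry number, where the dotted number encodes the position of a stratum under its generic frame. The sketch below is an assumption about how such a table might be queried: the names `FRAMES`, `ancestors` and `most_specific` are hypothetical, and since no entry [1.2] appears on the slides, the lookup simply skips numbers that are not described.

```python
# Entries copied from Example 1; only the fields needed here are kept.
FRAMES = {
    "1": {"name": "VIA - generic",
          "representation": "X : via(loc, Y)",
          "fr": ["par", "via"]},
    "1.1": {"name": "VIA - narrow passage",
            "representation": "X : through(loc or temp, Y)",
            "fr": ["à travers", "au travers de", "dans"]},
    "1.2.1": {"name": "VIA UNDER",
              "representation": "X : via(loc, under(loc,Y))",
              "fr": ["par dessous"]},
}


def ancestors(number):
    """Yield the numbers of described frames above `number`, nearest first."""
    parts = number.split(".")
    for i in range(len(parts) - 1, 0, -1):
        candidate = ".".join(parts[:i])
        if candidate in FRAMES:  # skip undescribed levels such as "1.2"
            yield candidate


def most_specific(prep, lang="fr"):
    """Return the numbers of frames whose synset contains `prep`, deepest first."""
    hits = [n for n, frame in FRAMES.items() if prep in frame.get(lang, [])]
    return sorted(hits, key=lambda n: n.count("."), reverse=True)


print(list(ancestors("1.2.1")))
print(most_specific("dans"))
```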
Example 2: instruments
Stratification requires taking two relations into account,
characterized by means of primitives (Mari and Saint-Dizier 03):
– Actor/instrument: undergo (no control), select (controls
another prop.), control,
– Instrument/ V+NP object: be (passive, but participates), react
(other prop than controlled by the agent), act (full participation)
Contrast: cut the bread with a knife / eat soup with a spoon
John burned himself with boiling oil.
 A generic entry for instruments, and, potentially: 9 strata
(combinations), depends on language.
 4 strata for French
(2) cont’
[5] : MANNER - MEANS - Instrument
'Someone X doing an action Y using instrument Z.'
X <ACTION> Y [5] Z
X: human, ACTION: verb of change, Y: object, Z: instrument
representation: X: by-means-of(_, Z)
 Followed by a priori 9 Strata.
Example: Application to French:
1. Be(X,Z) ∧ Undergo(Z, Action+Y) : synset: {grâce à}, restrictions…
2. Be(X,Z) ∧ Select(Z, Action+Y) : synset: {par}, restrictions…
3. Select(X,Z) ∧ React(Z, Action+Y) : synset: {avec}, restrictions…
4. Act(X,Z) ∧ Control(Z, Action+Y) : synset: {avec, au moyen de}, …
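
The "a priori 9 strata" arise from combining the three actor/instrument primitives with the three instrument/object primitives. The sketch below enumerates them and records the four French strata. One assumption should be noted: the slide writes the two predicates in varying argument positions, so here each French stratum is keyed simply by which primitive it takes from each relation. The names `ALL_STRATA`, `FRENCH` and `french_synset` are hypothetical.

```python
from itertools import product

# Primitives for the two relations, as given on the instruments slide.
ACTOR_INSTRUMENT = ("undergo", "select", "control")
INSTRUMENT_OBJECT = ("be", "react", "act")

# The nine a priori combinations (3 x 3).
ALL_STRATA = list(product(ACTOR_INSTRUMENT, INSTRUMENT_OBJECT))

# The four strata reported for French, keyed by
# (actor/instrument primitive, instrument/object primitive).
# Assumption: the pairing ignores the argument order used on the slide.
FRENCH = {
    ("undergo", "be"): ["grâce à"],
    ("select", "be"): ["par"],
    ("select", "react"): ["avec"],
    ("control", "act"): ["avec", "au moyen de"],
}


def french_synset(actor_instrument, instrument_object):
    """Return the French synset for a stratum, or [] if none is described."""
    return FRENCH.get((actor_instrument, instrument_object), [])


print(len(ALL_STRATA), len(FRENCH))
```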
(3) The language realization level
[Diagram: each subframe SFi (= lower frame level) is linked to a multi-level partitioning of realizations derived from usage norms: direct uses, indirect uses, etc. Each level pairs restrictions with synsets (restr1/synset1, restr2/…, restr3/synset3); further levels cover derived types (synsets ??); frequency measures are added.]
Populating preposition frames from
corpora
• Conceptual frames are associated with shallow constraints
 Move on to the language level, elements of a method:
• For a given language: associate each frame stratum with
corpus and dictionary observations
• Manual analysis: identify prototypical uses, promote usage
norms → multi-level partitioning of realizations
• Contrast, if possible, direct versus indirect (mainly
metaphorical) realization levels
• Elaborate conceptual/ontological status of categorizations
and related constraints (mainly semantic types)
A few notes
• Multi-level architecture: helps to account for the large
variety of (compositional) behaviors; partitioning strategies
need to be investigated in more depth → is incremental depth,
to get a finer-grained analysis, worth pursuing?
• For each synset: develop frequency measures, identify
contexts of use (from syntactic to type of text): frequency
rates are very diverse (some uses are only found in dictionaries!)
• Populate, but then validate on new corpora: develop several
forms of corpus annotations (the frame; the relation with
the head, with the NP, etc.)
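
A frequency measure per synset member could be computed from frame-annotated sentences along the following lines. This is a minimal sketch: the three French example sentences from the 'via' slides stand in for an annotated corpus, so the resulting rates carry no empirical meaning, and the tuple layout and the function name `relative_frequencies` are assumptions.

```python
from collections import Counter

# (sentence, preposition, frame number). These are the example sentences
# from the 'via' slides, used only as a stand-in for real annotations.
ANNOTATED = [
    ("Jean rentre par la porte", "par", "1"),
    ("Jean regarde à travers la grille", "à travers", "1.1"),
    ("Jean passe par dessous le pont", "par dessous", "1.2.1"),
]


def relative_frequencies(annotations):
    """Map each (frame, preposition) pair to its share of all annotations."""
    counts = Counter((frame, prep) for _, prep, frame in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


for key, rate in relative_frequencies(ANNOTATED).items():
    print(key, round(rate, 2))
```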
Looking at other languages
• Hypothesis: given an abstract notion (interlingua),
translations are constructed on the basis of the restrictions
that hold on the corresponding synsets,
BUT:
• Large realization variations are in general observed, even
for closely related languages: up to what point is this just
a surface language contrast? Or is it also conceptual?
Regarder dans le microscope / look through the
microscope (durch; a través de)
• Some languages make little use of pre-/postpositions, relying instead on other categories, incorporation in heads, or
just case marks.
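
The interlingua hypothesis can be sketched as a pivot through the abstract notion: a source preposition is mapped to the notion it realizes under some restriction, and the target realization is read off the same entry. The table below holds only the microscope example from this slide. The restriction label "optical instrument", the use of entry number 1.1 as the notion, and the names `REALIZATIONS` and `translate` are assumptions made for illustration.

```python
# (abstract notion, restriction on Y) -> language code -> preposition.
# Assumption: the restriction label is hypothetical; the realizations are
# those given for the microscope example.
REALIZATIONS = {
    ("1.1", "optical instrument"): {
        "fr": "dans",
        "en": "through",
        "de": "durch",
        "es": "a través de",
    },
}


def translate(prep, source, target):
    """Return target-language realizations of the notions `prep` realizes."""
    return [langs[target]
            for langs in REALIZATIONS.values()
            if langs.get(source) == prep and target in langs]


print(translate("dans", "fr", "en"))
```

Returning a list rather than a single preposition leaves room for the large realization variations noted above, where one source use may map to several target candidates or to none.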
Preliminary conclusions
• Preliminary investigation to identify difficulties and
organize the research,
• The global architecture looks like an interesting approach,
• Abstract notion definitions seem to be quite stable; the status
of strata needs further investigation,
• The multi-level approach to language realizations seems a good
direction, but needs much larger testing on a number of
languages and a clearer method to organize sets of
realizations,
• Implement an open system on the Web.
Some obvious research directions
 ontological/conceptual status of categorizations and
restrictions,
 Investigate integration with other frameworks: VerbNet,
FrameNet,
 Investigate preposition polysemy and derived uses in more
depth, and ways to characterize it
 Relations Head-preposition-NP, and compositionality
(Head is often a verb, but can be any other kind of
predicate): some PPs have wider scope over the
proposition.
 Inferential patterns associated with prepositions (e.g. for
approximation notions, spatial notions, etc.)