Extracting Information from Participial Structures
Kata Gábor, Enikő Héja, Ágnes Mészáros
Research Institute for Linguistics, HAS
8th INTEX WORKSHOP, 2005
STRUCTURE
IE system and its shortcoming: the problem of participles
NPs and participles in Hungarian
a possible enhancement of the IE system
implementation in INTEX
IE system
input text (1-2 sentences of short business news)
shallow syntactic analysis
pre-defined semantic patterns (event frames)
output: event frames’ slots filled by the elements of the input text
the event, its participants and circumstances are identified
Event frames
Az ABN Amro Bank egyesül a Kereskedelmi és Hitelbankkal.
ABN Amro Bank merges with Commercial and Credit Bank.
<event schema="owner_changed.fusion.6" roles_matched="3/3">
<rv role="member_company_1" pos="N" case="NOM" sem="company|institute">
<NP id="88" sem="company countable human institute">
<w id="0" class="DET" at="1-1" lex="az" case="NOM">Az</w>
<w id="2" class="UNKNOWN" at="2-2" lex="ABN">ABN</w>
<w id="4" class="UNKNOWN" at="3-3" lex="Amro">Amro</w>
<w id="6" class="N" at="4-4" lex="bank" case="NOM">Bank</w>
</NP>
</rv>
<rv role="_1" pos="V" lemma="egyesül">
<w id="8" class="V" at="5-5" lex="egyesül">egyesül</w>
</rv>
<rv role="member_company_2" pos="N" case="INS" sem="company|institute">
<NP id="118" sem="company countable institute">
<w id="13" class="DET" at="6-6" lex="a" case="NOM">a</w>
<w id="15" class="ONADJ" at="7-7" lex="kereskedelem"
case="NOM">Kereskedelmi</w>
<w id="17" class="CONJ" at="8-8" lex="és">és</w>
<w id="19" class="N" at="9-9" lex="hitelbank" case="INS">Hitelbankkal.</w>
</NP>
</rv>
</event>
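As a reading aid, the filled frame above can be inspected programmatically. The following is a minimal sketch, assuming the output is available as a well-formed XML string; the function name `read_event` is hypothetical and not part of the system described here. It only collects, for each `rv` element, the role name and the surface words it covers.

```python
import xml.etree.ElementTree as ET

# The event frame from the slide, with the wrapped attribute line rejoined.
EVENT_XML = """
<event schema="owner_changed.fusion.6" roles_matched="3/3">
  <rv role="member_company_1" pos="N" case="NOM" sem="company|institute">
    <NP id="88" sem="company countable human institute">
      <w id="0" class="DET" at="1-1" lex="az" case="NOM">Az</w>
      <w id="2" class="UNKNOWN" at="2-2" lex="ABN">ABN</w>
      <w id="4" class="UNKNOWN" at="3-3" lex="Amro">Amro</w>
      <w id="6" class="N" at="4-4" lex="bank" case="NOM">Bank</w>
    </NP>
  </rv>
  <rv role="_1" pos="V" lemma="egyesül">
    <w id="8" class="V" at="5-5" lex="egyesül">egyesül</w>
  </rv>
  <rv role="member_company_2" pos="N" case="INS" sem="company|institute">
    <NP id="118" sem="company countable institute">
      <w id="13" class="DET" at="6-6" lex="a" case="NOM">a</w>
      <w id="15" class="ONADJ" at="7-7" lex="kereskedelem" case="NOM">Kereskedelmi</w>
      <w id="17" class="CONJ" at="8-8" lex="és">és</w>
      <w id="19" class="N" at="9-9" lex="hitelbank" case="INS">Hitelbankkal.</w>
    </NP>
  </rv>
</event>
"""


def read_event(xml_text):
    """Return (schema name, {role: surface text}) for one <event> element.

    Hypothetical helper for illustration only.
    """
    event = ET.fromstring(xml_text.strip())
    slots = {}
    for rv in event.findall("rv"):
        # Collect every word under this role, whether or not it sits in an NP.
        words = [w.text for w in rv.iter("w")]
        slots[rv.get("role")] = " ".join(words)
    return event.get("schema"), slots


schema, slots = read_event(EVENT_XML)
for role, text in slots.items():
    print(role, "=", text)
```

The role `_1` holds the verb that triggers the frame; the two `member_company_*` roles hold the participants.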
Mapping syntax to event frames
SYNTAX            EVENT FRAMES
verb              main event
arguments         participants
free modifiers    circumstances (time, location, manner...)
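The correspondence above can be pictured as a simple renaming of the parts of a shallow-parsed clause. The sketch below is only an illustration under assumed names: the dictionary keys and the function `fill_event_frame` are hypothetical and do not reflect the actual data structures of the IE system.

```python
def fill_event_frame(clause):
    """Map the parts of a shallow-parsed clause onto event-frame slots.

    `clause` is a hypothetical dict with the keys "verb", "arguments"
    and "free_modifiers"; the last two are optional.
    """
    return {
        "main_event": clause["verb"],
        "participants": list(clause.get("arguments", [])),
        "circumstances": list(clause.get("free_modifiers", [])),
    }


clause = {
    "verb": "egyesül",
    "arguments": ["Az ABN Amro Bank", "a Kereskedelmi és Hitelbankkal"],
    "free_modifiers": [],
}
frame = fill_event_frame(clause)
print(frame)
```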
Problem: secondary information (cause or antecedent of the main event) is ‘hidden’ in participial structures:
[A befektetők által tegnap eladott részvények] megnövelték a tőzsde forgalmát.
[The shares sold yesterday by the investors] increased the turnover of the stock exchange.
a befektetők / the investors /
eladott / sold /
tegnap / yesterday /
részvények / shares /
The same information expressed as a sentence with a finite predicate:
A befektetők tegnap eladtak részvényeket.
The investors sold shares yesterday.
A solution
a preprocessing module within the IE system which transforms participial structures into sentences with a finite predicate
semantic frame matching may operate on transformed sentences
1st step: past participles within NPs
• the participle preserves the meaning of its base verb
• its arguments can be derived from the internal structure of the NP
NPs in Hungarian
[Figure: structure of the Hungarian NP. Labels: DET; modifiers: ADV, NP+case, AP+case, N+Postp, V.INF, ...; Participles (past, present); head Noun]
Participles in Hungarian
ADJ – Participle homonymy is a problem:
“mérsékelt PC-chip kereslet”
modest /~moderated/ demand for PC-chips
* Valaki mérsékelte a PC-chip keresletet.
* Somebody moderated the demand for PC-chips
“ragozott szóalakok”
inflected word forms
* Valaki ragozott szóalakokat.
* Somebody inflected word forms.
only participles can be transformed
Participle or Adjective?
syntactic tests
comparative
ADV formation
predicative use
impossibility of preverb detachment
we need to decide in context whether the given word form is an ADJ or a PART:
1. If at least one of the base verb’s complements is present, then it is a participle.
2. If at least one of the base verb’s complements / adjuncts / a preverb is present, then it is a participle.
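The contextual rule can be sketched as a small predicate. This is a minimal illustration, assuming that an earlier analysis step has already labelled the elements found inside the NP next to the candidate word form; the label names and the function `is_participle` are hypothetical.

```python
# Kinds of NP-internal elements that count as evidence for a participle
# reading under rule 2 (hypothetical label names).
PARTICIPLE_EVIDENCE = {"complement", "adjunct", "preverb"}


def is_participle(context_labels):
    """Return True if at least one piece of evidence accompanies the word form.

    `context_labels` is any iterable of labels assigned to the elements
    that occur with the candidate inside the NP.
    """
    return bool(PARTICIPLE_EVIDENCE & set(context_labels))


# "a befektetők által tegnap eladott részvények": an adjunct (tegnap) is present
print(is_participle({"adjunct"}))   # True
# "a [fel nem újított] házak": a detached preverb (fel) is present
print(is_participle({"preverb"}))   # True
# "mérsékelt PC-chip kereslet": no evidence, so the form is kept as an adjective
print(is_participle(set()))         # False
```

Because the rule only fires on positive evidence, a true participle that appears without any complement, adjunct or preverb is left untransformed, which is in line with preferring precision over recall.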
Participle or Adjective?
TESTS:
• comparative: “mérsékeltebb kereslet”
more moderate demand
• predicative: “Ez a szóalak ragozott.”
This word form is inflected.
• ADV formation: mérsékelt → mérsékelten
moderate → moderately
• preverb detachment:
“a [fel nem újított] házak”
“the [re- not stored] houses” (=not restored)
* Ezek a házak [fel nem újítottak].
* These houses are [re- not stored].
THE GRAMMAR
- the correctness and informativeness of the resulting sentence depend on the correct identification of verbal arguments and modifiers within the NP
- these elements are then transformed according to their grammatical function
• past participles may be formed from both transitive and intransitive verbs
• if the base verb is intransitive, the head noun of the NP represents the subject of the base verb:
“az összedőlt épület” /the collapsed building/
• if the base verb is transitive, the head noun represents the direct object of the base verb:
“a bejelentett változások” /the announced changes/
transitivity needs to be coded
THE GRAMMAR
transformation rules are (enhanced) FSTs:
• they store relevant elements of the input NP in variables
• the output is made up of the contents of these variables, in an altered order, plus the function words needed in the sentence
• our DELAF dictionary codes the transitivity properties of verbs (on the basis of a lexicon-grammar of verbal argument structures)
• a +/- preverb feature shows whether the base verb has a preverb
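The idea of storing NP elements in variables and re-emitting them in a new order can be sketched outside INTEX as follows. This is a rough illustration only, not the actual transducers: the class `ParticipialNP` and the function `to_finite_sentence` are hypothetical, the finite verb form and the accusative form of the head noun are assumed to be supplied by the caller (no morphological generation is attempted), and the output word order simply follows the example sentence given earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParticipialNP:
    """Variables filled from an NP that contains a past participle."""
    head_nom: str                  # head noun phrase, nominative
    finite_verb: str               # finite form of the base verb (supplied)
    transitive: bool               # transitivity as coded in the dictionary
    head_acc: Optional[str] = None # head noun phrase, accusative (supplied)
    subject: Optional[str] = None  # subject taken from an "által" phrase
    complements: List[str] = field(default_factory=list)


def to_finite_sentence(np: ParticipialNP) -> str:
    if np.transitive:
        # Head noun becomes the direct object; insert "Valaki" (somebody)
        # when no subject is expressed inside the NP.
        subject = np.subject or "valaki"
        words = [subject, *np.complements, np.finite_verb, np.head_acc or np.head_nom]
    else:
        # Head noun becomes the subject of the base verb.
        words = [np.head_nom, *np.complements, np.finite_verb]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


# "a befektetők által tegnap eladott részvények"
print(to_finite_sentence(ParticipialNP(
    head_nom="részvények", head_acc="részvényeket", finite_verb="eladtak",
    transitive=True, subject="a befektetők", complements=["tegnap"])))
# "a bejelentett változások" (no expressed subject)
print(to_finite_sentence(ParticipialNP(
    head_nom="a változások", head_acc="a változásokat",
    finite_verb="bejelentette", transitive=True)))
# "az összedőlt épület" (intransitive base verb)
print(to_finite_sentence(ParticipialNP(
    head_nom="az épület", finite_verb="összedőlt", transitive=False)))
```

Keeping the transitivity flag as an explicit input mirrors the requirement that transitivity be coded in the dictionary: the same NP shape yields a subject reading or an object reading of the head noun depending on that flag alone.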
Transformation Graphs 1.
Transitive Verbs
transitive verbs without expressed subject (“somebody” insertion):
[Graph; labels as extracted: Det, (V_compl), Valaki, V_vmib, Det, VMIB, N -t, N, (V_compl), .]
transitive verbs with a subject marked by the PostP “által” (by):
[Graph; labels as extracted: Det, Nsubj, Nsubj, által, V_vmib, Det, (V_compl), N -t, VMIB, (V_compl), ., N]
Transformation Graphs 2.
Intransitive Verbs
head N becomes subject (patient)
[Graph; labels as extracted: Det, Det, (V_compl), N, V_vmib, VMIB, N, (V_compl), .]
Structure of the graphs
1 graph
3 subgraphs according to complement type: possessor / verbal complement + adjunct / nothing
each subgraph divided into two paths: transitive / intransitive verbs
Evaluation
central aspect: to what extent does it improve the efficiency of the IE system?
missing information (recall) is considered less important than incorrect information (precision)
evaluated on a 231,000-word corpus of short business news
1259 hits, 898 qualified as informative
precision: 64%
further task: recall (requires a corpus with manually annotated participial structures)
THANK YOU FOR YOUR
ATTENTION!
{gkata, eheja, magnes}@corpus.nytud.hu