Tutorial for the annotation of the Penn Discourse Treebank

Annotation Guidelines for the
Penn Discourse Treebank
Part A
Eleni Miltsakaki, Rashmi Prasad,
Aravind Joshi, Bonnie Webber
1
Discourse relations (1)
 Discourse relations hold between parts of text
 One way of marking discourse relations is by
use of explicit markers
Markers: discourse connectives
Textual spans they relate: arguments
2
Example
(1) On the one hand, John loves Barolo.
(2) So he went and ordered three cases.
(3) On the other hand, he didn’t have
much money.
(4) So then he had to cancel the order.
3
Discourse relations (2)
 Discourse relations may also hold between adjacent
textual spans with no explicit marker; these must be inferred.
 In such cases, we establish the presence of an
implicit connective
Example
(5a) You should never lend any books to
John.
(5b) He never returns them.
4
Goals of PDTB
 To produce a large-scale, reliably annotated
corpus that
encodes the discourse relations associated with
discourse connectives,
including implicit connectives
5
Corpus
 Penn Treebank
Approx. 1 million words
Wall Street Journal
25 sections
100 files in each section
6
Annotation tasks
 Annotation of explicit connectives
 Annotation of implicit connectives
7
Annotation tool
 Wordfreak
Allows you to search for specific connectives
Keeps record of connectives and arguments
 More later…
8
Explicit connectives (1)
 Subordinate conjunctions
 ‘because’, ‘although’, ‘when’, etc.
 Arguments found locally
 Subordinate clauses can be preposed
(6a) John failed the exam because he was
lazy.
(6b) Because he was lazy, John failed the
exam.
9
Explicit connectives (2)
 Coordinate conjunctions
‘and’, ‘but’, ‘or’, ‘so’
Arguments found locally
Preposing is not allowed
(7a) John is very smart but he failed the
exam.
(7b) # But he failed the exam, John is very
smart.
10
Explicit connectives (3)
 Adverbials
‘therefore’, ‘however’, ‘as a result’, etc.
One argument found locally
One argument may or may not be found locally
(1) On the one hand, John loves Barolo.
(2) So he went and ordered three cases.
(3) On the other hand, he didn’t have much
money.
(4) So then he had to cancel the order.
11
Annotation of explicit conns
We have grouped explicit connectives in sets of 10.
Your task is to:
• Identify all instances of a given set of connectives in
the corpus.
• Mark their arguments.
Proceed one file at a time.
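The search step can be pictured with a small sketch. This is not how the annotation tool works internally; it is only an illustration, assuming plain-text input and a hypothetical list `CONNECTIVE_SET` filled with a few connectives named in this tutorial. A string match only proposes candidates: the annotator still decides whether each hit is really a discourse connective.

```python
import re

# Hypothetical working set; the items are connectives mentioned in this tutorial.
CONNECTIVE_SET = ["as soon as", "as long as", "unless", "after", "until"]

def find_connectives(text, connectives):
    """Return (start, end, connective) for every whole-word, case-insensitive match."""
    # Try longer phrases first so a multi-word connective wins over a shorter overlap.
    ordered = sorted(connectives, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0).lower()) for m in pattern.finditer(text)]

text = "Unless it rains, we will leave as soon as John arrives."
hits = find_connectives(text, CONNECTIVE_SET)
for start, end, conn in hits:
    print(start, end, conn)
```

Each hit is a candidate to inspect in the file before marking its arguments.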
12
Sets of explicit connectives
 In progress: Set 3
 Adverbials
• Indeed, for example
 Coordinate conj.
• And, but, or
 Subordinate conj.
• As soon as, unless, as long as, after, until
 Already annotated:
 Adverbials
• instead, otherwise, therefore, as a result, nevertheless, in fact, then, on the other hand, however, furthermore/further
 Subordinate conjunctions
• Because, although, even though, when, so that, if, while, since
 In progress: Set 4
 Adverbials
• Though, yet, so, on the contrary, conversely
 Coordinate conj.
• nor
 Subordinate conj.
• Whereas, as, insofar as, till
 Empty
• Section 00, Section 06
13
Implicit connectives
 Implicit connectives describe relations that hold
between adjacent textual spans and that must be
inferred.
 In PDTB we will only annotate implicit connectives
between sentences in the same paragraph.
 We will initially ignore implicit connectives across
paragraphs or within a sentence.
(8) John walked across the room, waving at
everybody.
14
Annotation of implicit conns
 Relation between two adjacent sentences.
 Both sentences belong to the same paragraph.
 Second sentence does not contain a connective.
! Preposed subordinate conjunctions in the second sentence do not count
(9) Mary stayed until late. (IMPLICIT) Although she
was very tired, she had to finish the report today.
 Mark the period as a placeholder for an implicit connective.
 Mark the arguments.
 Provide an explicit connective that best expresses the relation.
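The conditions above can be sketched as a simple filter over the sentences of one paragraph. This is a rough illustration, not the project's procedure: it assumes the paragraph is already split into sentences, it only looks at the start of the second sentence, and `BLOCKING_CONNECTIVES` is a hypothetical, deliberately short list. Subordinate conjunctions are left out of that list on purpose, since a preposed subordinate conjunction does not count.

```python
# Hypothetical, partial list of connectives that rule out an implicit relation
# when they open the second sentence. Subordinate conjunctions such as
# 'although' are intentionally absent: preposed ones do not count.
BLOCKING_CONNECTIVES = ("however", "therefore", "so", "but", "and", "then", "as a result")

def starts_with_any(sentence, phrases):
    """True if the sentence begins with one of the given (possibly multi-word) phrases."""
    words = sentence.lower().replace(",", " ").split()
    return any(words[:len(p.split())] == p.split() for p in phrases)

def implicit_candidates(paragraph_sentences):
    """Return index pairs (i, i + 1) of adjacent sentences that may need an implicit connective."""
    pairs = []
    for i in range(len(paragraph_sentences) - 1):
        second = paragraph_sentences[i + 1]
        if starts_with_any(second, BLOCKING_CONNECTIVES):
            continue  # an explicit connective is already present
        pairs.append((i, i + 1))
    return pairs

example_9 = [
    "Mary stayed until late.",
    "Although she was very tired, she had to finish the report today.",
]
print(implicit_candidates(example_9))
```

For each pair returned, the annotator would mark the period after the first sentence, mark the two arguments, and supply the best-fitting explicit connective.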
15
What is a legal argument
 Multiple sentences
 Sentences
 Main clause + subordinate clauses
(10) It is cold, although the sun is shining.
(11) John walked across the hall, waving his hand
cheerfully.
 Clauses
 Grammatical unit that contains a predicate and its
arguments
 Tensed
 Non-tensed
16
Predicates and propositions
 Predicates
 Verb
 Says something about the subject
(12) John is sleeping
 May require one, two, or three arguments (‘sleep’, ‘eat’,
‘give’)
 Propositions (expressions of events or states)
 Predicate and its arguments
 Semantic objects, constant across syntactic variability
(13) John ate the banana.
(14) The banana was eaten by John.
(15) Did John eat the banana?
17
Attention!
 Discourse relations hold between propositions.
 When annotating arguments include a predicate.
(16) Everybody considered Einstein's contribution to
be a breakthrough because he discovered the
theory of relativity.
 Do not separate a predicate from its arguments.
 BUT: Implicit arguments are OK in non-tensed clauses.
(17) John crossed the hall, waving his hand
cheerfully.
 ALSO: If the only thing left from the clause that contains your selection is
a non-verbal element, include it in your selection.
(18) * In Geneva, however, [they supported Iran’s
proposal].
18
What is not a legal argument
 Textual spans that do not contain (or refer to)
propositional material (usually a verb and its
arguments at minimum).
 Verbs separated from their arguments.
 You can select a clause that is the argument of a verb,
excluding the verb.
 You cannot select the verb and leave out its arguments.
(19) John said [that Mary left]. OK
(20) [John said] that Mary left. NOT OK
(21) [John said that Mary left]. OK
19
NPs as arguments?!
 Discourse deictic expressions are NPs and they may
be selected as arguments because they may refer to
propositional material.
 Discourse deictic expressions are ‘this’ and ‘that’
when they refer to textual spans in the preceding
discourse.
(21) ABC is firing 1,000 employees. That (is)
because they have huge debts.
20
What is a legal argument: summary
 A single clause
 [John left].
 Because [John left]…
 While [watching TV]…
 John wants [to leave].
 A single sentence
 [John wants to leave because he’s sick].
 Multiple sentences
 NPs that refer to clauses
 [This] because…
 Some nominal forms expressing events or states (but make a
note)
 After [the sudden price increase]…
21
What is ARG1/ARG2
 The clause that contains the connective is always
Arg2.
 The other argument of the connective is Arg1.
 Note that with subordinate conjunctions it is
possible for Arg2 to precede Arg1.
(22) Because [Arg2 he was sick], [Arg1 John left
early].
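The labelling rule can be written down as a tiny record type. This is a sketch for illustration only; the class `Relation` and the function `label_arguments` are hypothetical names, not part of the annotation tool. The point it shows is that the labels depend on which clause hosts the connective, not on the order in which the clauses appear.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    connective: str
    arg1: str  # the other argument of the connective
    arg2: str  # the clause that contains the connective

def label_arguments(connective, clause_with_connective, other_clause):
    """Assign Arg2 to the clause hosting the connective and Arg1 to the other one."""
    return Relation(connective=connective, arg1=other_clause, arg2=clause_with_connective)

# Example (22): the 'because' clause comes first in the text but is still Arg2.
rel = label_arguments("because", "he was sick", "John left early")
print(rel)
```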
22
ARG and SUP annotations
 When deciding what to mark as an argument of the connective,
you should select what is ‘minimally’ necessary to interpret the
relation established by the connective. Mark that as ARG.
 This is a good principle to follow.
 However, sometimes you may feel you want to mark/include
material which provides useful, even if not crucial,
information about the interpretation of an argument. Mark this
as SUP. SUP annotations are optional.
23
SUP: Example 1
 Lawyers and their clients who frequently bring
business to a country courthouse can expect to appear
before the same judge year after year. [Fear of
alienating that judge is pervasive], says Maurice
Geiger, founder and director of the Rural Justice
Center in Montpelier, Vt., a public interest group that
researches rural justice issues.
As a result, lawyers think twice before appealing a
judge’s ruling, are reluctant to mount, or even
support, challenges against him for reelection and
are usually loath to file complaints that might impugn a
judge’s integrity.
24
SUP: Example 2
 While dividends have risen smartly, [their
expansion hasn’t kept pace with even stronger
advances in stock prices].
25
Connectives are not part of their
arguments
 When annotating the second argument of a
connective do not include the connective itself.
(23) He failed the exam although he had studied hard.
 Connectives may appear in an argument of a
connective that you are annotating. Include that
connective in the selection of the argument.
(24) When the stock market dropped nearly 7% Oct. 13,
for instance, the Mexico Fund plunged about 18%
and the Spain Fund fell 16%.
26
What about sentence medial
connectives?
 If a connective is sentence medial you exclude
it from your selection of the argument.
 Wordfreak allows you to select discontinuous
text and enter it as single argument.
27
Using the discontinuous text
selection feature in the tool
 On-line demo
 Basic steps
 Press and hold Control
 Select span 1
 Keeping Control held down,
 select span 2, 3, etc.
 Then click on the Arg button to enter your selection
 All selected spans will show up in the Arg window in the
order that they were selected
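One way to think about a discontinuous selection is as an ordered list of character spans. The sketch below is only an illustration of that idea and makes no claim about how the tool stores selections; `select_spans` and `argument_text` are hypothetical helper names. It uses the sentence with the medial connective 'however', which is left out of the argument.

```python
def select_spans(text, pieces):
    """Locate each piece in order and return a list of (start, end) character offsets."""
    spans, cursor = [], 0
    for piece in pieces:
        start = text.index(piece, cursor)  # raises ValueError if the piece is missing
        spans.append((start, start + len(piece)))
        cursor = start + len(piece)
    return spans

def argument_text(text, spans):
    """Join the selected spans, in selection order, into one argument string."""
    return " ".join(text[s:e] for s, e in spans)

sentence = "In Geneva, however, they supported Iran's proposal."
spans = select_spans(sentence, ["In Geneva", "they supported Iran's proposal"])
print(spans)
print(argument_text(sentence, spans))
```

The connective 'however' falls between the two spans, so it is excluded from the argument while both surrounding pieces are kept.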
28
Examples with discontinuous text
selections

Connectives
(25) In Geneva, however, they supported
Iran’s proposal.

Modifications
(26) Mary, who is a friend of mine, just arrived in
Philadelphia.

Parentheticals
(27) The price of the stock -many had
expected this- was rising.

…
29
What not to annotate!

Do not annotate connectives that are followed by a preposition.
Out (do not annotate):
(28) Instead of teaming up, GE Capital staffers and Kidder
investment bankers have bickered.
In (annotate):
(29) The Hopkinsian universal disinterested
benevolence, although holding to original sin and
the doctrine of election, inspired its adherents
to heroic endeavours for others, ...
(30) Its 1,400-member brokerage operation reported
an estimated $5 million loss last year, although
Kidder expects it to turn a profit this year.
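The exclusion rule can be sketched as a check on the word that follows the connective. This is a rough illustration under stated assumptions: `PREPOSITIONS` is a hypothetical and far from complete list, and the function name is made up for this example. A real decision would rest on the annotator's judgment, not on a word list.

```python
import re

# Hypothetical, partial preposition list used only for this illustration.
PREPOSITIONS = {"of", "to", "from", "with", "for"}

def followed_by_preposition(text, connective_end):
    """True if the first word after the connective is in PREPOSITIONS."""
    m = re.match(r"\s*(\w+)", text[connective_end:])
    return bool(m) and m.group(1).lower() in PREPOSITIONS

s28 = "Instead of teaming up, GE Capital staffers and Kidder investment bankers have bickered."
s30 = ("Its 1,400-member brokerage operation reported an estimated $5 million loss "
       "last year, although Kidder expects it to turn a profit this year.")

out_28 = followed_by_preposition(s28, len("Instead"))
out_30 = followed_by_preposition(s30, s30.index("although") + len("although"))
print(out_28, out_30)
```

Under this check, 'Instead' in (28) would be skipped because 'of' follows it, while 'although' in (30) would be kept.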
30
Connectives and relations
 Think of a discourse relation that ‘then’ can
express.
 Think of another discourse relation that ‘then’
can express.
 Think of a connective that expresses a
‘contrastive’ relation.
 Think of another connective that expresses a
‘contrastive’ relation.
 Other discourse relations?
31
Practice: Test your understanding
of legal arguments
1. When Sophie and Joanna got to the supermarket they
went their separate ways.
2. At the end of the road there was a sharp bend, known
as Captain’s Bend.
3. People seldom went that way except on the weekend.
4. Sophie tried to imagine herself shaking hands and
introducing herself as Lillemor Amundsen, but it
seemed all wrong. It was someone else who kept
introducing himself.
5. ‘I’m Sophie Amundsen,’ she said.
6. Sophie tried to beat her reflection to it with a
lightning movement but the girl was just as fast.
7. Sophie pressed her index finger to the nose in the
mirror and said, ‘You are me.’ As she got no answer to
this, she turned the sentence around and said, ‘I’m
you.’
32
Wordfreak
 On-line demonstration
33