ppt - pedagogy (main phase)


Resources: Question Classification Schemes, Graesser et al.
Automatic Factual Question Generation from Text (Chapter 3), Michael Heilman

Questions test factual knowledge of a learner
• When did Alexander invade India?
• Who invented the smallpox vaccine?

Does not involve higher order cognitive skills
like inference

Overgenerate-and-rank framework
CMU Question Generator: http://www.ark.cs.cmu.edu/mheilman/questions/




Source sentence: sentence taken directly from
the input document
Derived sentence: declarative sentence
derived in stage 1
Answer phrase: possible answer to generated
questions
Question phrase: phrase containing the
question word replacing an answer phrase

Mark clauses or phrases for
 NLP transformation (simplification, compression)
 Answer phrase marking
 Tregex

Delete clauses or phrases for
 NLP transformation
 Tsurgeon
Resources: Tregex and Tsurgeon: tools for querying and manipulating tree data structures,
Levy and Andrew
Web: http://nlp.stanford.edu/software/tregex.shtml



A Java program for identifying patterns in trees
Like regular expressions for strings
Simple example: NP < NN
tregex.sh "NP < NN" treeFilename
Example tree:
(S
  (NP (DT The) (NN firm))
  (VP (VBD stopped)
    (VP (VBG using)
      (NP (NN crocidolite))
      (PP (IN in)
        (NP (PRP its) (NN cigarette) (NNS filters))))))


The basic units of Tregex are Node
Descriptions
Descriptions match node labels of a tree
 Literal string to match: NP
▪ Disjunction of literal strings separated by ‘|’: NP|PP|VP
 Regular Expression (Java 5 regex): /NN.?/
▪ Matches NN, NNP, NNS
 Wildcard symbol: __ (two underscores)
▪ Matches any node

Descriptions can be negated with !: !NP
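
The node descriptions above can be imitated in a few lines of Python. This is only a sketch, not the Tregex implementation: the function name `matches` is hypothetical, and regex descriptions are assumed to use an unanchored search.

```python
import re


def matches(description, label):
    """Return True if `label` satisfies a simplified Tregex node description."""
    if description.startswith("!"):          # negation, e.g. !NP
        return not matches(description[1:], label)
    if description == "__":                  # wildcard: matches any node
        return True
    if len(description) > 1 and description[0] == "/" and description[-1] == "/":
        # regular expression between slashes (assumed unanchored search)
        return re.search(description[1:-1], label) is not None
    return label in description.split("|")   # literal string or disjunction


for desc, label in [("NP", "NP"), ("NP|PP|VP", "PP"), ("/NN.?/", "NNS"),
                    ("__", "VBD"), ("!NP", "VP"), ("!NP", "NP")]:
    print(desc, label, matches(desc, label))
```

Only the last pair, `!NP` against `NP`, fails to match.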


Relationships between tree nodes can be specified
There are many different relations. Here are a few:
Symbol      Description
A < B       A is the parent of B
A << B      A is an ancestor of B
A $ B       A and B are sisters
A $+ B      B is the next sister of A
A <i B      B is the ith child of A
A <: B      B is the only child of A
A <<# B     B is a head of phrase A
A <<- B     B is the rightmost descendant of A
A .. B      A precedes B in a depth-first traversal of the tree
http://nlp.stanford.edu/manning/courses/ling289/Tregex.html
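
A few of these relations can be written as small predicates over a tree with parent pointers. This is a minimal sketch; the `Node` class and the function names are hypothetical and are not part of Tregex.

```python
class Node:
    """A tree node that records its parent when attached to one."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def is_parent(a, b):        # A < B
    return b.parent is a


def is_ancestor(a, b):      # A << B
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def are_sisters(a, b):      # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def is_next_sister(a, b):   # A $+ B: B is the next sister of A
    if not are_sisters(a, b):
        return False
    sisters = a.parent.children
    return sisters.index(b) == sisters.index(a) + 1


# (S (NP (DT The) (NN firm)) (VP (VBD stopped)))
dt, nn, vbd = Node("DT"), Node("NN"), Node("VBD")
np_, vp = Node("NP", [dt, nn]), Node("VP", [vbd])
s = Node("S", [np_, vp])

print(is_parent(np_, nn), is_ancestor(s, nn), are_sisters(np_, vp), is_next_sister(dt, nn))
```

Storing the parent on each node keeps every predicate a short upward or sideways walk.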

Relations can be strung together for “and”
 All relations are relative to first node in string
 NP < NN $ VP
▪ “An NP over an NN and with sister VP”
 & symbol is optional: NP < NN & $ VP

Nodes can be grouped with parentheses
 NP < (NN < dog)
▪ “An NP over an NN that is over ‘dog’ ”
 Not the same as NP < NN < dog

Ex: NP < (NN < dog) $ (VP <<# (barks > VBZ))
 “An NP both over an NN over ‘dog’ and with a
sister VP headed by ‘barks’ under VBZ”
(X (NP (NN dog)) (VP (VBZ barks)))

Operators can be combined via “or” with |
 Ex: NP < NN | < NNS
 “An NP over NN or over NNS”

By default, & takes precedence over |
 Ex: NP < NNS | < NN & $ VP
 “NP over NNS OR both over NN and w/ sister VP”
 Equivalent operators are left-associative

Any relation can be negated with “!” prefix
 Ex: NP !<< NNP
 “An NP that does not dominate NNP”

To specify operation order, use [ and ]
 Ex: NP [ < NNS | < NN ] $ VP
 “An NP either over NNS or NN, and w/ sister VP”

Grouped relations can be negated
 Just put ! before the [

Already we can build very complex expressions!
• NP <- /NN.?/ > (PP <<# (IN ![ < of | < on]))
• “An NP with rightmost child matching /NN.?/ under a
PP headed by some preposition (IN) that is not either
‘of’ or ‘on’ ”
• Example match: the NP in (PP (IN about) (NP (NNS …)))

Sometimes we want to find which nodes
matched particular sub-expressions
 Ex: /NN.?/ $- JJ|DT
 What was the modifier that preceded the noun?

Name nodes with = and if expression matches,
we can retrieve matching sub-expr with name
 Ex: /NN.?/ $- JJ|DT=premod
 Subtree with root matching JJ|DT is stored in a map
under key “premod”

Note:
 named nodes are not allowed in scope of negation
Sometimes we want to try to match a subexpression to retrieve named nodes if they exist,
but still match root if sub-expression fails.
 Use the optional relation prefix ‘?’
 Ex: NP < (NN ?$- JJ=premod) $+ CC $++ NP

 Matches NP over NN with sisters CC and NP
 If NN is preceded by JJ, we can retrieve the JJ using
the key “premod”
 If there is no JJ, the expression will still match

Cannot be combined with negation

Tsurgeon

What?
Performs operations on a grammatical (parse) tree

How?
Based on Tregex syntax

Where?
JavaNLP: trees.tregex.tsurgeon
• utility for identifying patterns in trees
(like regular expressions for strings)
• node descriptions and relationships between nodes
Example pattern: NP < /^NN/
(S
  (NP (DT The) (NN firm))
  (VP (VBD stopped)
    (VP (VBG using)
      (NP (NN crocidolite))
      (PP (IN in)
        (NP (PRP its) (NN cigarette) (NNS filters))))))
Define a pattern to be matched on the trees
• VBZ=vbz $+ NP

Define one or several operation(s)
• relabel vbz VBZ_TRANSITIVE

delete <name1>…<nameN>
Delete the node and everything below it
Pattern: PUNCT=punct > SBARQ
Operation: delete punct
Before:
(ROOT
  (SBARQ
    (SQ (NP (NNS Cats))
      (VP (VBP do)
        (VP (WHNP what)
          (VB eat))))
    (PUNCT ?)))
After:
(ROOT
  (SBARQ
    (SQ (NP (NNS Cats))
      (VP (VBP do)
        (VP (WHNP what)
          (VB eat))))))

excise <name1> <name2>
name1 is name2 or dominates name2.
All children of name2 go into the parent of name1, where name1 was.
Pattern: SBARQ=sbarq > ROOT
Operation: excise sbarq sbarq
Before:
(ROOT
  (SBARQ
    (SQ (NP (NNS Cats))
      (VP (VBP do)
        (VP (WHNP what)
          (VB eat))))))
After:
(ROOT
  (SQ (NP (NNS Cats))
    (VP (VBP do)
      (VP (WHNP what)
        (VB eat)))))
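
The effect of delete and excise can be reproduced on a toy tree in Python. This is a sketch under simplifying assumptions: trees are nested lists `[label, child, ...]` with string leaves, nodes are selected by label rather than by a Tregex pattern, and the helper names `delete`, `excise` and `show` are hypothetical.

```python
def show(tree):
    """Render a nested-list tree in bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"


def delete(tree, label):
    """Copy `tree`, dropping every subtree whose root is labelled `label`."""
    if isinstance(tree, str):
        return tree
    kept = [delete(c, label) for c in tree[1:]
            if isinstance(c, str) or c[0] != label]
    return [tree[0]] + kept


def excise(tree, label):
    """Copy `tree`, splicing out non-root nodes labelled `label`:
    their children move up into the parent, in the same position."""
    if isinstance(tree, str):
        return tree
    out = [tree[0]]
    for child in tree[1:]:
        child = excise(child, label)
        if not isinstance(child, str) and child[0] == label:
            out.extend(child[1:])
        else:
            out.append(child)
    return out


tree = ["ROOT", ["SBARQ",
                 ["SQ", ["NP", ["NNS", "Cats"]],
                        ["VP", ["VBP", "do"],
                               ["VP", ["WHNP", "what"], ["VB", "eat"]]]],
                 ["PUNCT", "?"]]]

no_punct = delete(tree, "PUNCT")      # like: delete punct
no_sbarq = excise(no_punct, "SBARQ")  # like: excise sbarq sbarq
print(show(no_punct))
print(show(no_sbarq))
```

Both helpers return new trees, so the original stays available for comparison.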
insert <name> <position>
insert <tree> <position>
<position> := <relation> <name>
<relation>
  $+   the left sister of the named node
  $-   the right sister of the named node
  >i   the i_th daughter of the named node
  >-i  the i_th daughter, counting from the right, of the named node
Pattern: SQ=sq > ROOT !<- /PUNCT/
Operation: insert (PUNCT .) >-1 sq
After:
(ROOT
  (SQ (NP (NNS Cats))
    (VP (VBP do)
      (VP (WHNP what)
        (VB eat)))
    (PUNCT .)))

move <name> <position>
moves the named node into the specified position
Pattern: VP < (/^WH/=wh $++ /^VB/=vb)
Operation: move vb $+ wh
Before:
(ROOT
  (SQ
    (NP (NNS Cats))
    (VP (VBP do)
      (VP (WHNP what)
        (VB eat)))
    (PUNCT .)))
After:
(ROOT
  (SQ
    (NP (NNS Cats))
    (VP (VBP do)
      (VP (VB eat)
        (WHNP what)))
    (PUNCT .)))
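
The insert and move examples can be imitated on a nested-list tree. This is a rough sketch, not Tsurgeon itself: the helpers `insert_last` and `move_before` are hypothetical and select nodes by label (or label prefix) instead of by a Tregex pattern.

```python
def insert_last(tree, parent_label, new_child):
    """Like `insert <tree> >-1 <name>`: append `new_child` as the last daughter
    of nodes labelled `parent_label`, unless the last daughter already has
    that label (mirrors the `!<- /PUNCT/` guard)."""
    if isinstance(tree, str):
        return tree
    kids = [insert_last(c, parent_label, new_child) for c in tree[1:]]
    if tree[0] == parent_label:
        last = kids[-1] if kids else None
        if not (isinstance(last, list) and last[0] == new_child[0]):
            kids.append(new_child)
    return [tree[0]] + kids


def move_before(tree, moved_prefix, anchor_prefix):
    """Like `move vb $+ wh`: where a daughter whose label starts with
    `anchor_prefix` is followed by a sister whose label starts with
    `moved_prefix`, move that sister to just before the anchor."""
    if isinstance(tree, str):
        return tree
    kids = [move_before(c, moved_prefix, anchor_prefix) for c in tree[1:]]
    labels = [None if isinstance(k, str) else k[0] for k in kids]
    anchor = next((i for i, l in enumerate(labels)
                   if l and l.startswith(anchor_prefix)), None)
    if anchor is not None:
        moved = next((i for i in range(anchor + 1, len(kids))
                      if labels[i] and labels[i].startswith(moved_prefix)), None)
        if moved is not None:
            kids.insert(anchor, kids.pop(moved))
    return [tree[0]] + kids


start = ["ROOT", ["SQ", ["NP", ["NNS", "Cats"]],
                        ["VP", ["VBP", "do"],
                               ["VP", ["WHNP", "what"], ["VB", "eat"]]]]]
with_punct = insert_last(start, "SQ", ["PUNCT", "."])
moved = move_before(with_punct, "VB", "WH")
print(moved)
```

Applying `insert_last` a second time leaves the tree unchanged, which is the purpose of the guard.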
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the
named node.
The daughters of the target node will become
the daughters of the foot of the auxiliary tree.
Pattern: VP=vp > SQ !> (__ << usually)
Operation: adjoin (VP (ADVP (RB usually)) VP@) vp
The foot of the auxiliary tree is the node marked with @ (here VP@).


Input: arbitrary text
Output: simple, concise and declarative
sentences
Input: Putin, the Russian Prime Minister, visited Siberia.
Desired Output: Putin was the Russian Prime Minister.
Parse tree of the input:
(ROOT
  (S
    (NP
      (NP Putin)
      (, ,)
      (NP the Russian Prime Minister)
      (, ,))
    (VP (VBD visited)
      (NP Siberia))))
Tregex pattern:
NP < (NP=noun !$-- NP $+ (/,/ $++ NP|PP=appositive !$CC|CONJP))
>> (ROOT << /^VB.*/=mainverb)
Matched nodes: noun = (NP Putin), appositive = (NP the Russian Prime Minister),
mainverb = (VBD visited)
Steps:
1. Extract the noun and the appositive: (NP Putin), (NP the Russian Prime Minister)
2. Use the tense of the main verb (VBD) to insert the singular past tense form of be:
(NP Putin) (VBD was) (NP the Russian Prime Minister)
3. Build the new sentence tree:
(ROOT
  (S
    (NP Putin)
    (VP (VBD was)
      (NP the Russian Prime Minister))))
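
The appositive example can be sketched in Python on a nested-list tree. This is an illustration only: the helper names are hypothetical, the match is a hand-written stand-in for the Tregex pattern, and a singular subject is assumed when choosing the copula.

```python
def leaves(tree):
    """Words under a tree given as [label, child, ...] with string leaves."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


def find_appositive(tree):
    """Find an NP whose first daughters are NP , NP|PP; return (noun, appositive)."""
    if isinstance(tree, str):
        return None
    kids = [k for k in tree[1:] if not isinstance(k, str)]
    if tree[0] == "NP" and len(kids) >= 3:
        noun, comma, appositive = kids[0], kids[1], kids[2]
        if noun[0] == "NP" and comma[0] == "," and appositive[0] in ("NP", "PP"):
            return noun, appositive
    for kid in kids:
        found = find_appositive(kid)
        if found:
            return found
    return None


def first_verb_tag(tree):
    """Label of the first node (pre-order) whose label starts with VB."""
    if isinstance(tree, str):
        return None
    if tree[0].startswith("VB"):
        return tree[0]
    for kid in tree[1:]:
        tag = first_verb_tag(kid)
        if tag:
            return tag
    return None


def appositive_sentence(tree):
    """Build 'noun + form of be + appositive' (singular subject assumed)."""
    found = find_appositive(tree)
    if found is None:
        return None
    noun, appositive = found
    copula = "was" if first_verb_tag(tree) == "VBD" else "is"
    return " ".join(leaves(noun) + [copula] + leaves(appositive)) + "."


tree = ["ROOT", ["S",
                 ["NP", ["NP", "Putin"], [",", ","],
                        ["NP", "the", "Russian", "Prime", "Minister"], [",", ","]],
                 ["VP", ["VBD", "visited"], ["NP", "Siberia"]]]]
print(appositive_sentence(tree))
```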

Representation: phrase structure trees from
the Stanford Parser

Syntactic rules are written in the Tregex tree
searching language
 Tregex operators encode tree relations such as
dominance, sisterhood, etc.

Performing manipulation over identified
Tregex pattern (Tsurgeon)
Given an input sentence A that is assumed true,
we aim to extract sentences B that are also true.
Our operations are informed by two
phenomena:
• semantic entailment
• presupposition
A entails B:
B is true whenever A is true.
Levinson 1983
A: However, Jefferson did not believe the Embargo Act, which
restricted trade with Europe, would hurt the American economy.
• discourse marker: “However”
• non-restrictive relative clause: “which restricted trade with Europe”
B: Jefferson did not believe the Embargo Act would hurt the
American economy.
Entailment holds when removing
certain types of modifiers.
A: Mr. Putin built his reputation in part on his success at
suppressing terrorism, so the attacks could be considered a
challenge to his stature.
B1: Mr. Putin built his reputation in part on his success at
suppressing terrorism.
B2: The attacks could be considered a challenge to his stature.
In most clausal and verbal conjunctions, the
individual conjuncts are entailed.
A: Hamilton did not like Jefferson, the third U.S. President.
• negation of main clause: “did not like”
B: Jefferson was the third U.S. President.
In some constructions, B is true regardless of whether
the main clause of sentence A is true.
• i.e., B is presupposed to be true.
Many presuppositions have clear syntactic or
lexical associations.
Trigger                             Example
non-restrictive appositives         Jefferson, the third U.S. President, …
non-restrictive relative clauses    Jefferson, who was the third U.S. President…
participial modifiers               Jefferson, being the third U.S. President, …
temporal subordinate clauses        Before Jefferson was the third U.S. President, …
Presupposed in each case: Jefferson was the third U.S. President.

extractSimplifiedSentences
• Input
▪ Constituency parse tree 𝑡
• Output
▪ 𝑇𝑟𝑒𝑠𝑢𝑙𝑡: set of trees representing simplified sentences
• Uses
▪ extractHelper
▪ Input: one parse tree
▪ Output: simplified sentence trees, obtained by splitting over conjunctions
and checking that outputs have subjects and finite main verbs

Algorithm: extractSimplifiedSentences(𝑡)
𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ← ∅
𝑇𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑒𝑑 ← {𝑡} ∪ parse trees extracted from 𝑡 for
  • non-restrictive appositives
  • non-restrictive relative clauses
  • subordinate clauses with a subject and finite verb
  • participial phrases that modify noun phrases, verb phrases, or clauses
for each 𝑡′ ∈ 𝑇𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑒𝑑 do
  𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ← 𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ∪ extractHelper(𝑡′)
end for
return 𝑇𝑟𝑒𝑠𝑢𝑙𝑡

Algorithm: extractHelper(𝑡)
𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ← ∅
move any leading prepositional phrases and quotations in 𝑡 to be
the last children of the main verb phrase
remove the following from 𝑡:
  • noun modifiers offset by commas
  • leading modifiers of the main clause
if 𝑡 is conjoined with a conjunction then
  𝑇𝑐𝑜𝑛𝑗𝑢𝑛𝑐𝑡𝑠 ← extract new sentence trees for each conjunct
  for all 𝑡𝑐𝑜𝑛𝑗𝑢𝑛𝑐𝑡 ∈ 𝑇𝑐𝑜𝑛𝑗𝑢𝑛𝑐𝑡𝑠 do
    𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ← 𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ∪ extractHelper(𝑡𝑐𝑜𝑛𝑗𝑢𝑛𝑐𝑡)
  end for
else if 𝑡 has a subject and finite main verb then
  𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ← 𝑇𝑟𝑒𝑠𝑢𝑙𝑡 ∪ {𝑡}
end if
return 𝑇𝑟𝑒𝑠𝑢𝑙𝑡
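
The control flow of the two procedures can be sketched in Python. This is a skeleton under loud assumptions: the tree operations are passed in as callables, and the `toy_*` stand-ins below are hypothetical helpers that treat a "tree" as a plain sentence string and split only on ", so ". A real system would operate on parse trees with Tregex and Tsurgeon.

```python
def extract_helper(tree, simplify, split_conjuncts, is_complete):
    """Simplify, then recurse into conjuncts or keep the tree if it is complete."""
    tree = simplify(tree)
    conjuncts = split_conjuncts(tree)
    results = []
    if conjuncts:
        for conjunct in conjuncts:
            results.extend(
                extract_helper(conjunct, simplify, split_conjuncts, is_complete))
    elif is_complete(tree):
        results.append(tree)
    return results


def extract_simplified_sentences(tree, extract_embedded, simplify,
                                 split_conjuncts, is_complete):
    """Run extract_helper on the input and on every tree extracted from it."""
    results = []
    for t in [tree] + extract_embedded(tree):
        for r in extract_helper(t, simplify, split_conjuncts, is_complete):
            if r not in results:          # set union, keeping first-seen order
                results.append(r)
    return results


# Toy stand-ins (assumptions, for illustration only).
def toy_extract_embedded(sentence):
    return []                             # no appositives etc. in this toy


def toy_simplify(sentence):
    prefix = "However, "                  # drop one leading discourse marker
    if sentence.startswith(prefix) and len(sentence) > len(prefix):
        rest = sentence[len(prefix):]
        return rest[0].upper() + rest[1:]
    return sentence


def toy_split(sentence):
    parts = sentence.rstrip(".").split(", so ")
    if len(parts) < 2:
        return []
    return [p[0].upper() + p[1:] + "." for p in parts]


def toy_is_complete(sentence):
    return len(sentence.split()) >= 2     # crude proxy for subject + finite verb


sentence = ("Mr. Putin built his reputation in part on his success at "
            "suppressing terrorism, so the attacks could be considered a "
            "challenge to his stature.")
simplified = extract_simplified_sentences(
    sentence, toy_extract_embedded, toy_simplify, toy_split, toy_is_complete)
for s in simplified:
    print(s)
```

Passing the tree operations as parameters keeps the recursion visible without committing to any parser API.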

Input
 Declarative sentences derived in stage 1

Output
 Set of grammatically correct questions
▪ Well defined syntactic transformations
▪ Identification of answer phrases for WH-movement
▪ Marking of unmovable chunks
▪ etc
Pipeline: Declarative Sentence → Mark Unmovable Phrases →
Generate Possible Question Phrase* → (Decompose Main Verb) →
(Invert Subject and Auxiliary) → Insert Question Phrase →
Perform Post-processing → Question






Mark phrases that cannot be answer phrases
Select an answer phrase, and generate a set
of question phrases for it
Decompose the main verb
Invert the subject and auxiliary verb
Remove the answer phrase and insert one of
the question phrases at the beginning of the
main clause
Post-process to ensure proper formatting

Exceptions
 Yes-no questions
▪ no answer phrase to remove nor question phrase to
insert
 answer phrase is the subject of the declarative
sentence
▪ John met Sally → Who met Sally?
▪ decomposition of the main verb and subject-auxiliary
inversion are not necessary
▪ subject is removed and replaced by a question phrase in
the same position

Question generation involves
 WH-movement
▪ To generate WH questions
▪ Target answer phrase is transformed into WH phrase
and is moved to front (WH-fronting)
▪ Are all phrases movable?
 Subject-Auxiliary inversion
▪ To generate decision (yes-no) questions
▪ Positions of subject and auxiliary verb are swapped

An example
 Darwin studied how species evolve.
▪ ‘Species’ is a potential answer phrase
▪ *What did Darwin study how evolve?
 Mark phrases that should not undergo WH-
movement using Tregex patterns
▪ Constraints over the phrases
▪ phrases under a clause with a WH complementizer cannot
undergo WH-movement
▪ SBAR < /^WH.*P/ << NP|ADJP|VP|ADVP|PP=unmv
▪ clauses (i.e., “S” nodes) that are under verb phrases and are signalled as
adjuncts by being offset by commas cannot undergo WH-movement
▪ Pattern: VP < (S=unmv $,, /,/)
▪ Input sentence: James hurried, barely catching the bus.
▪ Question to avoid: *What did James hurry?
▪ A $,, B: A is a sister of B and follows B

Iterate over possible answer phrases
 Generate question for each


Skipped for decision questions.
Answer phrase is one of the following
• Noun phrase (“NP”): Abraham Lincoln
• Prepositional phrase (“PP”): in 1801
• Subordinate clause (“SBAR”): that Thomas
Jefferson was the 3rd U.S. President

Mapping answer phrases to question phrases
 Supersense tagger
▪ Label word tokens with high level semantic classes
▪ Noun.person, noun.location etc.
Token       Supersense tag
Richard     B-noun.person
Nixon       I-noun.person
visited     B-verb.social
China       B-noun.location
to          O
improve     B-verb.change
diplomacy   B-noun.communication
.           O
WH-word: Who
  Conditions: tag@head = noun.person or a personal pronoun
  Examples: Abraham Lincoln, him, the 16th president
WH-word: What
  Conditions: tag@head is neither noun.time nor noun.person
  Examples: The White House, the building
WH-word: Where
  Conditions: object of PP tagged with noun.location & preposition: on, in, at, over, to
  Examples: in Japan, to a small town
WH-word: When
  Conditions: tag@head = noun.time
  Examples: Wednesday, next year, 1929
WH-word: Whose NP
  Conditions: tag@head = noun.person and answer phrase is modified with a possessive
  Examples: John’s car, the president’s visit to Asia, the companies’ profits
WH-word: How many NP
  Conditions: answer phrase is modified by a cardinal number or quantifier phrase
  Examples: 10 books, two hundred years
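
The conditions above can be read as a small rule set. The sketch below is an assumption-laden simplification: the function `wh_candidates` and its flags are hypothetical, the supersense tag of the head word is passed in directly instead of coming from a tagger, and the possessive case is reduced to a single flag.

```python
LOCATION_PREPOSITIONS = {"on", "in", "at", "over", "to"}


def wh_candidates(head_tag, *, is_personal_pronoun=False, preposition=None,
                  has_possessive=False, has_quantifier=False):
    """Return candidate WH-words for an answer phrase (simplified rules)."""
    candidates = []
    if head_tag == "noun.person" or is_personal_pronoun:
        candidates.append("whose" if has_possessive else "who")
    if head_tag == "noun.time":
        candidates.append("when")
    if head_tag == "noun.location" and preposition in LOCATION_PREPOSITIONS:
        candidates.append("where")
    if has_quantifier:
        candidates.append("how many")
    if head_tag not in ("noun.time", "noun.person") and not is_personal_pronoun:
        candidates.append("what")
    return candidates


print(wh_candidates("noun.person"))                     # Abraham Lincoln
print(wh_candidates("noun.time"))                       # next year
print(wh_candidates("noun.location", preposition="in"))  # in Japan
print(wh_candidates("noun.artifact"))                   # the building
```

An answer phrase may receive more than one candidate here; a later ranking step can choose among the resulting questions.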



Situation: subject-auxiliary inversion
Condition: Auxiliary verb or modal is not
present
Action: main verb = auxiliary do + base form
of main verb
John saw Mary
John did see Mary
Who did John see?
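
Main verb decomposition can be sketched as a lookup plus a naive suffix rule. The tiny irregular-verb table and the function names are assumptions for illustration; a real system would use a proper lemmatizer.

```python
IRREGULAR_BASE = {"saw": "see", "met": "meet", "laid": "lay"}  # illustrative only
AUXILIARY = {"VBD": "did", "VBZ": "does", "VBP": "do"}


def base_form(verb, tag):
    """Very naive lemmatizer for the three tensed verb tags."""
    if verb in IRREGULAR_BASE:
        return IRREGULAR_BASE[verb]
    if tag == "VBD" and verb.endswith("ied"):
        return verb[:-3] + "y"
    if tag == "VBD" and verb.endswith("ed"):
        return verb[:-2]
    if tag == "VBZ" and verb.endswith("s"):
        return verb[:-1]
    return verb


def decompose(verb, tag):
    """Split a tensed main verb into (form of do, base form)."""
    return AUXILIARY[tag], base_form(verb, tag)


def object_question(subject, verb, tag, wh="Who"):
    """Decompose the verb, invert subject and auxiliary, and front the WH-word."""
    aux, base = decompose(verb, tag)
    return f"{wh} {aux} {subject} {base}?"


print(decompose("saw", "VBD"))
print(object_question("John", "saw", "VBD"))
```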

Identifying main verbs that need to be
decomposed
ROOT < (S=clause < (VP=mainvp [ < (/VB.?/=tensed !< is|was|were|am|are|has|have|had|do|does|did) | < /VB.?/=tensed !< VP ]))

Identifying the auxiliary (aux) and main verb (verb) of a clause (clause)
A <+(C) B: A dominates B via an unbroken chain of nodes matching description C
ROOT=root < (S=clause <+(/VP.*/) (VP < /(MD|VB.?)/=aux < (VP < /VB.?/=verb)))

Copula: word used to link the subject of a sentence with a predicate (a subject
complement)
ROOT=root < (S=clause <+(/VP.*/) (VP < (/VB.?/=copula < is|are|was|were|am) !< VP))

Replacing a subject NP with the question phrase “Who”
S < (NP=np $+ VP)
S=start < VP=vp
delete np
relabel start SBARQ
relabel vp SQ
SBARQ < SQ=ins
insert (WHNP (WP Who)) $+ ins
Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy", first published
in 1687, laid the foundations for classical mechanics.
TREE-I
TREE-II
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (PP (IN in) (NP (CD 1687)))
Insert WH subtree: (WHNP (WHADVP (WRB when)))
1. Whose book ``Mathematical Principles of Natural Philosophy'' was first published in
1687?
2. What laid the foundations for classical mechanics?
3. What did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy''
lay?
4. When was Sir Isaac Newton's book ``Mathematical Principles of Natural
Philosophy'' first published?
5. Did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay
the foundations for classical mechanics?
6. Whose book ``Mathematical Principles of Natural Philosophy'' laid the foundations
for classical mechanics?
7. Was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first
published in 1687?
8. What was first published in 1687?
Arvind Kejriwal, the AAP leader, resigned from the post of CM.
Appositive tree
TREE-I
TREE-II
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (NP (NNP Arvind) (NNP Kejriwal))
Insert WH subtree: (WHNP (WHNP (WRB who)))
1. Who resigned from the post of CM?
2. What did Arvind Kejriwal resign from?
3. Who was Arvind Kejriwal?
4. Who was the AAP leader?
5. Did Arvind Kejriwal resign from the post of CM?
6. Was Arvind Kejriwal the AAP leader?

Acceptability 𝑦 of a question 𝑥
 𝑦 = 𝑤⊤𝑓(𝑥)
▪ 𝑓(𝑥) returns a vector of real-valued numbers
pertaining to different aspects of the question
▪ 𝑤 vector of weights for each feature of a question
 Learning weight vector
▪ Penalized linear regression (Ridge regression)
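
A minimal pure-Python sketch of the scoring and learning steps follows. It assumes the standard ridge objective, the sum of squared errors plus lam times the squared norm of the weights, and minimises it with batch gradient descent as a stand-in solver. The function names, learning rate and toy data are hypothetical.

```python
def score(w, f):
    """Acceptability y = w^T f(x) for one question's feature vector f."""
    return sum(wj * fj for wj, fj in zip(w, f))


def fit_ridge(X, y, lam=1.0, lr=0.01, epochs=2000):
    """Minimise sum_i (w.x_i - y_i)^2 + lam * ||w||^2 by batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [2.0 * lam * wj for wj in w]        # gradient of the penalty
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += 2.0 * err * xij         # gradient of the squared error
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rank(items, w):
    """Sort (question, features) pairs by descending predicted acceptability."""
    return sorted(items, key=lambda qf: score(w, qf[1]), reverse=True)


w = fit_ridge([[1.0], [2.0]], [1.0, 2.0], lam=1.0)
print(w)
print(rank([("question a", [1.0]), ("question b", [3.0])], w))
```

With one feature, the learned weight should approach sum(x*y) / (sum(x*x) + lam), which makes the shrinkage caused by the penalty easy to see.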

Question features
 Length feature
▪ Length of question, source sentence, answer phrase
 WH words
▪ Boolean feature whether a question is a WH one
 N-gram log likelihood of question
 Grammatical features
 Transformation features
 etc.
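
Two of the feature groups above (length features and the WH-word feature) are easy to sketch. The function `question_features`, the feature names and the WH-word list are assumptions for illustration; the n-gram, grammatical and transformation features are not covered.

```python
WH_WORDS = {"who", "what", "when", "where", "whose", "which", "how", "why"}


def question_features(question, source_sentence, answer_phrase):
    """Length features and a boolean WH feature, using whitespace tokens."""
    q_tokens = question.rstrip("?").split()
    return {
        "question_length": len(q_tokens),
        "source_length": len(source_sentence.split()),
        "answer_length": len(answer_phrase.split()),
        "is_wh": 1.0 if q_tokens and q_tokens[0].lower() in WH_WORDS else 0.0,
    }


print(question_features("Who did John see?", "John saw Mary.", "Mary"))
print(question_features("Did John see Mary?", "John saw Mary.", ""))
```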

Term project evaluation includes
 Presentation (10 min)
 Demonstration (20 min)

Date 18.04.2015 (Saturday) from 9:30 am
• Groups 1-4

Date 18.04.2015 (Saturday) from 2:30 pm
• Groups 5-9