ppt - pedagogy (main phase)
Resources: Question Classification Schemes, Graesser et al.
Automatic Factual Question Generation from Text (Chapter 3), Michael Heilman
Questions test factual knowledge of a learner
When did Alexander invade India?
Who invented the smallpox vaccine?
Such questions do not involve higher-order cognitive skills such as inference
Overgenerate-and-rank framework
CMU Question Generator: http://www.ark.cs.cmu.edu/mheilman/questions/
Source sentence: sentence taken directly from the input document
Derived sentence: declarative sentence derived in stage 1
Answer phrase: possible answer to generated questions
Question phrase: phrase containing the question word replacing an answer phrase
Mark clauses or phrases for
NLP transformation (simplification, compression)
Answer phrase marking
Tregex
Delete clauses or phrases for
NLP transformation
Tsurgeon
Resources: Tregex and Tsurgeon: tools for querying and manipulating tree data structures,
Levy and Andrew
Web: http://nlp.stanford.edu/software/tregex.shtml
A Java program for identifying patterns in trees
Like regular expressions for strings
Simple example: NP < NN
tregex.sh "NP < NN" treeFilename
[Figure: parse tree of “The firm stopped using crocidolite in its cigarette filters”]
The basic units of Tregex are Node Descriptions
Descriptions match node labels of a tree
Literal string to match: NP
▪ Disjunction of literal strings separated by ‘|’: NP|PP|VP
Regular Expression (Java 5 regex): /NN.?/
▪ Matches NN, NNP, NNS
Wildcard symbol: __ (two underscores)
▪ Matches any node
Descriptions can be negated with !: !NP
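The node-description rules above can be imitated in a few lines of Python. This is only a sketch of the matching logic, not Tregex itself: the function name `matches_description` is hypothetical, Python's `re` module stands in for Java regex, and searching anywhere in the label (rather than requiring a full match) is an assumption about how Tregex applies regular expressions.

```python
import re


def matches_description(description: str, label: str) -> bool:
    """Return True if a node label satisfies a Tregex-style node description."""
    # A leading "!" negates the rest of the description.
    if description.startswith("!"):
        return not matches_description(description[1:], label)
    # Two underscores are the wildcard: every label matches.
    if description == "__":
        return True
    # /.../ is a regular expression, searched anywhere in the label (assumption).
    if len(description) >= 2 and description.startswith("/") and description.endswith("/"):
        return re.search(description[1:-1], label) is not None
    # Otherwise: a literal string, or a disjunction of literals separated by "|".
    return label in description.split("|")


if __name__ == "__main__":
    for desc, label in [("NP", "NP"), ("NP|PP|VP", "PP"), ("/NN.?/", "NNS"), ("!NP", "VP")]:
        print(desc, label, matches_description(desc, label))
```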
Relationships between tree nodes can be specified
There are many different relations. Here are a few:
Symbol      Description
A < B       A is the parent of B
A << B      A is an ancestor of B
A $ B       A and B are sisters
A $+ B      B is next sister of A
A <i B      B is ith child of A
A <: B      B is only child of A
A <<# B     B is a head of phrase A
A <<- B     B is rightmost descendant of A
A .. B      A precedes B in depth-first traversal of tree
http://nlp.stanford.edu/manning/courses/ling289/Tregex.html
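To make a few of these relations concrete, here is a minimal Python sketch that checks `<`, `<<`, `$` and `$+` on a small bracketed tree. The `Node` class, the `parse` helper and the function names are all hypothetical illustrations; Tregex itself is a Java library and is not used here.

```python
class Node:
    """A tree node with a label, ordered children and a parent pointer."""

    def __init__(self, label):
        self.label = label
        self.children = []
        self.parent = None


def parse(text):
    """Parse a bracketed tree such as "(X (NP (NN dog)) (VP (VBZ barks)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            node = Node(tokens[pos + 1])   # the label follows the bracket
            pos += 2
            while tokens[pos] != ")":
                child = read()
                child.parent = node
                node.children.append(child)
            pos += 1                       # skip ")"
            return node
        node = Node(tokens[pos])           # a bare token is a leaf word
        pos += 1
        return node

    return read()


def is_parent(a, b):        # A < B
    return b.parent is a


def is_ancestor(a, b):      # A << B
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def are_sisters(a, b):      # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def is_next_sister(a, b):   # A $+ B: B is the next sister of A
    if not are_sisters(a, b):
        return False
    siblings = a.parent.children
    i = next(k for k, n in enumerate(siblings) if n is a)
    return i + 1 < len(siblings) and siblings[i + 1] is b
```

With `root = parse("(X (NP (NN dog)) (VP (VBZ barks)))")`, the NP and VP children of `root` are sisters, and `root` is an ancestor (but not the parent) of the NN node.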
Relations can be strung together for “and”
All relations are relative to first node in string
NP < NN $ VP
▪ “An NP over an NN and with sister VP”
& symbol is optional: NP < NN & $ VP
Nodes can be grouped with parentheses
NP < (NN < dog)
▪ “An NP over an NN that is over ‘dog’ ”
Not the same as NP < NN < dog
Ex: NP < (NN < dog) $ (VP <<# (barks > VBZ))
“An NP both over an NN over ‘dog’ and with a
sister VP headed by ‘barks’ under VBZ”
(X (NP (NN dog)) (VP (VBZ barks)))
Operators can be combined via “or” with |
Ex: NP < NN | < NNS
“An NP over NN or over NNS”
By default, & takes precedence over |
Ex: NP < NNS | < NN & $ VP
“NP over NNS OR both over NN and w/ sister VP”
Equivalent operators are left-associative
Any relation can be negated with “!” prefix
Ex: NP !<< NNP
“An NP that does not dominate NNP”
To specify operation order, use [ and ]
Ex: NP [ < NNS | < NN ] $ VP
“An NP either over NNS or NN, and w/ sister VP”
Grouped relations can be negated
Just put ! before the [
Already we can build very complex expressions!
NP <- /NN.?/ > (PP <<# (IN ![ < of | < on]))
“An NP with rightmost child matching /NN.?/ under a
PP headed by some preposition (IN) that is not either
‘of’ or ‘on’ ”
[Figure: example match, a PP headed by the preposition “about” (IN) over an NP whose rightmost child is an NNS]
Sometimes we want to find which nodes
matched particular sub-expressions
Ex: /NN.?/ $- JJ|DT
What was the modifier that preceded the noun?
Name nodes with = and if expression matches,
we can retrieve matching sub-expr with name
Ex: /NN.?/ $- JJ|DT=premod
Subtree with root matching JJ|DT is stored in a map
under key “premod”
Note:
named nodes are not allowed in scope of negation
Sometimes we want to try to match a subexpression to retrieve named nodes if they exist,
but still match root if sub-expression fails.
Use the optional relation prefix ‘?’
Ex: NP < (NN ?$- JJ=premod) $+ CC $++ NP
Matches NP over NN with sisters CC and NP
If NN is preceded by JJ, we can retrieve the JJ using
the key “premod”
If there is no JJ, the expression will still match
Cannot be combined with negation
What? Tsurgeon performs operations on a grammatical (parse) tree
How? Its patterns are based on Tregex syntax
Where? JavaNLP: trees.tregex.tsurgeon
• utility for identifying patterns in trees
(like regular expressions for strings)
• node descriptions and relationships between nodes
NP < /^NN/
(S
  (NP (DT The) (NN firm))
  (VP (VBD stopped)
    (VP (VBG using)
      (NP (NN crocidolite))
      (PP (IN in)
        (NP (PRP its) (NN cigarette) (NNS filters))))))
Define a pattern to be matched on the trees
VBZ=vbz $+ NP
Define one or several operation(s)
relabel vbz VBZ_TRANSITIVE
(ROOT
(SBARQ
(SQ (NP (NNS Cats))
(VP (VBP do)
(VP (WHNP what)
(VB eat))))
(PUNCT ?)))
PUNCT=punct > SBARQ
delete punct
delete <name1>…<nameN>
Deletes the named node(s) and everything below them
Result:
(ROOT
(SBARQ
(SQ (NP (NNS Cats))
(VP (VBP do)
(VP (WHNP what)
(VB eat))))))
SBARQ=sbarq > ROOT
excise sbarq sbarq
(ROOT
(SQ (NP (NNS Cats))
(VP (VBP do)
(VP (WHNP what)
(VB eat)))))
excise <name1> <name2>
name1 is name2 or dominates name2.
All children of name2 go into the parent of name1, where name1 was.
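The effect of delete and excise on the “Cats do what eat” tree can be sketched in Python with nested lists standing in for tree nodes. This is an illustration under stated assumptions, not Tsurgeon: `parse`, `show`, `delete` and `excise` are hypothetical helpers, and `excise` covers only the simple case where name1 and name2 are the same node.

```python
def parse(text):
    """Parse "(A (B c))" into nested lists: ["A", ["B", "c"]]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        # tokens[i] is "("; read the label and children, return (node, next index)
        node = [tokens[i + 1]]
        i += 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
            else:
                child, i = tokens[i], i + 1
            node.append(child)
        return node, i + 1

    return read(0)[0]


def show(node):
    """Render a nested-list tree back into bracketed form."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [show(c) for c in node[1:]]) + ")"


def delete(parent, child):
    """Like Tsurgeon delete: drop the node and everything below it."""
    parent[1:] = [c for c in parent[1:] if c is not child]


def excise(parent, node):
    """Like Tsurgeon excise with name1 == name2: the node disappears
    and its children take its place under the parent."""
    i = next(k for k, c in enumerate(parent) if c is node)
    parent[i:i + 1] = node[1:]


if __name__ == "__main__":
    tree = parse("(ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) "
                 "(VP (WHNP what) (VB eat)))) (PUNCT ?)))")
    sbarq = tree[1]
    delete(sbarq, sbarq[2])      # PUNCT=punct > SBARQ; delete punct
    print(show(tree))
    excise(tree, sbarq)          # SBARQ=sbarq > ROOT; excise sbarq sbarq
    print(show(tree))
```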
SQ=sq > ROOT !<- /PUNCT/
insert (PUNCT .) >-1 sq
(here <tree> is (PUNCT .) and <position> is >-1 sq)
(ROOT
(SQ (NP (NNS Cats))
(VP (VBP do)
(VP (WHNP what)
(VB eat)))
(PUNCT .)))
insert <name> <position>
insert <tree> <position>
<position> := <relation> <name>
<relation> is one of:
$+ the left sister of the named node
$- the right sister of the named node
>i the i_th daughter of the named node
>-i the i_th daughter, counting from the right, of the named node.
(ROOT
(SQ
(NP (NNS Cats))
(VP (VBP do)
(VP (WHNP what)
(VB eat)))
(PUNCT .)))
VP < (/^WH/=wh $++ /^VB/=vb)
move vb $+ wh
(here <position> is $+ wh)
move <name> <position>
moves the named node into the specified position
Result:
(ROOT
(SQ
(NP (NNS Cats))
(VP (VBP do)
(VP (VB eat)
(WHNP what)))
(PUNCT .)))
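The insert and move examples can be mimicked in the same nested-list style. Again this is a hedged sketch rather than Tsurgeon: `insert_last` only models the `>-1` position (last daughter) and `move_before` only models `$+` (become the left sister of the target); all helper names are hypothetical.

```python
def parse(text):
    """Parse "(A (B c))" into nested lists: ["A", ["B", "c"]]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        node = [tokens[i + 1]]
        i += 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
            else:
                child, i = tokens[i], i + 1
            node.append(child)
        return node, i + 1

    return read(0)[0]


def show(node):
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [show(c) for c in node[1:]]) + ")"


def insert_last(parent, new_node):
    """Model of `insert <tree> >-1 <name>`: add as the last daughter."""
    parent.append(new_node)


def move_before(old_parent, node, new_parent, target):
    """Model of `move <name> $+ <target>`: node becomes the left sister of target."""
    old_parent[1:] = [c for c in old_parent[1:] if c is not node]
    i = next(k for k, c in enumerate(new_parent) if c is target)
    new_parent.insert(i, node)


if __name__ == "__main__":
    tree = parse("(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))")
    sq = tree[1]
    insert_last(sq, parse("(PUNCT .)"))            # insert (PUNCT .) >-1 sq
    inner_vp = sq[2][2]
    wh, vb = inner_vp[1], inner_vp[2]
    move_before(inner_vp, vb, inner_vp, wh)        # move vb $+ wh
    print(show(tree))
```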
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the
named node.
The daughters of the target node will become
the daughters of the foot of the auxiliary tree.
VP=vp > SQ !> (__ << usually)
adjoin (VP (ADVP (RB usually)) VP@) vp
(VP@ marks the foot of the auxiliary tree)
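A rough Python model of adjunction, using nested lists as trees, may help in reading the description above. It is a sketch under assumptions: the `adjoin` function is hypothetical, the foot is recognised as a bare string ending in `@`, and the target is modified in place.

```python
def adjoin(aux, target):
    """Splice the auxiliary tree `aux` in at `target`.

    The daughters of `target` become the daughters of the foot of `aux`,
    and `target` itself becomes the root of the auxiliary tree.
    """
    old_daughters = target[1:]

    def fill_foot(node):
        for i in range(1, len(node)):
            child = node[i]
            if isinstance(child, str) and child.endswith("@"):
                # Replace the foot marker by a real node holding the old daughters.
                node[i] = [child[:-1]] + old_daughters
                return True
            if isinstance(child, list) and fill_foot(child):
                return True
        return False

    if not fill_foot(aux):
        raise ValueError("auxiliary tree has no foot node marked with @")
    target[:] = aux


if __name__ == "__main__":
    vp = ["VP", ["VBP", "do"], ["VP", ["VB", "eat"], ["WHNP", "what"]]]
    tree = ["ROOT", ["SQ", ["NP", ["NNS", "Cats"]], vp]]
    # adjoin (VP (ADVP (RB usually)) VP@) vp
    adjoin(["VP", ["ADVP", ["RB", "usually"]], "VP@"], vp)
    print(tree)
```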
Input: arbitrary text
Output: simple, concise and declarative
sentences
Input: Putin, the Russian Prime Minister, visited Siberia.
Desired Output: Putin was the Russian Prime Minister.
(ROOT
  (S
    (NP
      (NP Putin)
      (, ,)
      (NP the Russian Prime Minister)
      (, ,))
    (VP
      (VBD visited)
      (NP Siberia))))
noun = (NP Putin); appositive = (NP the Russian Prime Minister); mainverb = (VBD visited)
NP < (NP=noun !$-- NP $+ (/,/ $++ NP|PP=appositive !$ CC|CONJP)) >> (ROOT << /^VB.*/=mainverb)
Extracted nodes: (NP Putin), (NP the Russian Prime Minister), (VBD visited)
Main verb replaced by a copula: (VBD was), the singular past tense form of be
(ROOT
  (S
    (NP Putin)
    (VP
      (VBD was)
      (NP the Russian Prime Minister))))
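The final assembly step can be sketched as a small Python function. The function name and the `plural_subject` flag are hypothetical, and choosing the tense of the copula from the POS tag of the main verb is an assumption that generalises the single example on the slide (VBD, hence “was”).

```python
def appositive_sentence(noun, appositive, main_verb_tag, plural_subject=False):
    """Build "<noun> <copula> <appositive>." from the matched nodes.

    The copula agrees in tense with the main verb tag (VBD = past) and in
    number with the subject; both rules are simplifying assumptions.
    """
    if main_verb_tag == "VBD":
        copula = "were" if plural_subject else "was"
    else:
        copula = "are" if plural_subject else "is"
    return f"{noun} {copula} {appositive}."


if __name__ == "__main__":
    # noun, appositive and the tag of mainverb come from the Tregex match
    print(appositive_sentence("Putin", "the Russian Prime Minister", "VBD"))
```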
Representation: phrase structure trees from
the Stanford Parser
Syntactic rules are written in the Tregex tree
searching language
Tregex operators encode tree relations such as
dominance, sisterhood, etc.
Manipulations are performed on the nodes matched by a Tregex pattern (Tsurgeon)
Given an input sentence A that is assumed true,
we aim to extract sentences B that are also true.
Our operations are informed by two
phenomena:
• semantic entailment
• presupposition
A entails B:
B is true whenever A is true.
Levinson 1983
A: However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
• discourse marker: “However,”
• non-restrictive relative clause: “which restricted trade with Europe”
B: Jefferson did not believe the Embargo Act would hurt the American economy.
Entailment holds when removing certain types of modifiers.
A: Mr. Putin built his reputation in part on his success at
suppressing terrorism, so the attacks could be considered a
challenge to his stature.
B1: Mr. Putin built his reputation in part on his success at
suppressing terrorism.
B2: The attacks could be considered a challenge to his stature.
In most clausal and verbal conjunctions, the
individual conjuncts are entailed.
A: Hamilton did not like Jefferson, the third U.S. President. (the main clause is negated)
B: Jefferson was the third U.S. President.
In some constructions, B is true regardless of whether
the main clause of sentence A is true.
• i.e., B is presupposed to be true.
Many presuppositions have clear syntactic or
lexical associations.
Trigger: Example
non-restrictive appositives: Jefferson, the third U.S. President, …
non-restrictive relative clauses: Jefferson, who was the third U.S. President…
participial modifiers: Jefferson, being the third U.S. President, …
temporal subordinate clauses: Before Jefferson was the third U.S. President, …
Presupposed in each case: Jefferson was the third U.S. President.
extractSimplifiedSentences
Input
▪ Constituency parse tree t
Output
▪ T_result: set of trees representing simplified sentences
Uses
▪ extractHelper
▪ Input: one parse tree
▪ Output: simplified trees, split over conjunctions and checked to have subjects and finite main verbs
extractSimplifiedSentences(t):
  T_result ← ∅
  T_extracted ← {t} ∪ parse trees for:
    non-restrictive appositives
    non-restrictive relative clauses
    subordinate clauses with a subject and finite verb
    participial phrases that modify noun phrases, verb phrases, or clauses
  for each t′ ∈ T_extracted do
    T_result ← T_result ∪ extractHelper(t′)
  end for
  return T_result

extractHelper(t):
  T_result ← ∅
  move any leading prepositional phrases and quotations in t to be the last children of the main verb phrase
  remove the following from t:
    noun modifiers offset by commas
    leading modifiers of the main clause
  if t is conjoined with a conjunction then
    T_conjuncts ← extract new sentence trees for each conjunct
    for all t_conjunct ∈ T_conjuncts do
      T_result ← T_result ∪ extractHelper(t_conjunct)
    end for
  else if t has a subject and finite main verb then
    T_result ← T_result ∪ {t}
  end if
  return T_result
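The control flow of the two procedures can be sketched in Python. This is a toy model, not the actual implementation: the `Clause` dataclass is a hypothetical stand-in for a parse tree, its `embedded` and `conjuncts` fields are assumed to be filled in already by the Tregex extraction rules, and the move/remove steps of extractHelper are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clause:
    """Hypothetical stand-in for a parse tree."""
    text: str
    has_subject: bool = True
    has_finite_verb: bool = True
    conjuncts: List["Clause"] = field(default_factory=list)  # conjoined clauses
    embedded: List["Clause"] = field(default_factory=list)   # appositives, relative clauses, ...


def extract_helper(t: Clause) -> List[str]:
    result: List[str] = []
    # (moving leading PPs/quotations and removing modifiers is omitted here)
    if t.conjuncts:
        for conjunct in t.conjuncts:
            result.extend(extract_helper(conjunct))
    elif t.has_subject and t.has_finite_verb:
        result.append(t.text)
    return result


def extract_simplified_sentences(t: Clause) -> List[str]:
    result: List[str] = []
    for t_prime in [t] + t.embedded:
        for sentence in extract_helper(t_prime):
            if sentence not in result:      # T_result is a set: no duplicates
                result.append(sentence)
    return result


if __name__ == "__main__":
    a = Clause(
        "Mr. Putin built his reputation in part on his success at suppressing "
        "terrorism, so the attacks could be considered a challenge to his stature.",
        conjuncts=[
            Clause("Mr. Putin built his reputation in part on his success at "
                   "suppressing terrorism."),
            Clause("The attacks could be considered a challenge to his stature."),
        ],
    )
    for sentence in extract_simplified_sentences(a):
        print(sentence)
```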
Input
Declarative sentences derived in stage 1
Output
Set of grammatically correct questions
▪ Well defined syntactic transformations
▪ Identification of answer phrases for WH-movement
▪ Marking of unmovable chunks
▪ etc
[Figure: stage 2 pipeline] Declarative Sentence → Mark Unmovable Phrases → Generate Possible Question Phrase * → (Decompose Main Verb) → (Invert Subject and Auxiliary) → Insert Question Phrase → Perform Post-processing → Question
Mark phrases that cannot be answer phrases
Select an answer phrase, and generate a set
of question phrases for it
Decompose the main verb
Invert the subject and auxiliary verb
Remove the answer phrase and insert one of
the question phrases at the beginning of the
main clause
Post-process to ensure proper formatting
Exceptions
Yes-no questions
▪ no answer phrase to remove and no question phrase to insert
Answer phrase is the subject of the declarative sentence
▪ John met Sally → Who met Sally?
▪ decomposition of the main verb and subject-auxiliary inversion are not necessary
▪ subject is removed and replaced by a question phrase in the same position
Question generation involves
WH-movement
▪ To generate WH questions
▪ Target answer phrase is transformed into WH phrase
and is moved to front (WH-fronting)
▪ Are all phrases movable?
Subject-Auxiliary inversion
▪ To generate decision (yes-no) questions
▪ Positions of subject and auxiliary verb are swapped
An example
Darwin studied how species evolve.
▪ ‘Species’ is a potential answer phrase
▪ *What did Darwin study how evolve?
Mark phrases that should not undergo WH-
movement using Tregex patterns
▪ Constraints over the phrases
▪ phrases under a clause with a WH complementizer cannot
undergo WH-movement
▪ SBAR < /^WH.*P/ << NP|ADJP|VP|ADVP|PP=unmv
clauses (i.e., “S” nodes) that are under verb phrases and are signalled as
adjuncts by being offset by commas
Pattern: VP < (S=unmv $,, /,/)
Input sentence:
James hurried, barely catching the bus.
Question to avoid:
*What did James hurry?
A $,, B: A is a sister of B and follows B
Iterate over possible answer phrases
Generate question for each
Skipped for decision questions.
Answer phrase is one of the following
Noun phrase (“NP”): Abraham Lincoln
Prepositional phrase (“PP”): in 1801
Subordinate clause (“SBAR”): that Thomas Jefferson was the 3rd U.S. President
Mapping answer phrases to question phrases
Supersense tagger
▪ Label word tokens with high level semantic classes
▪ Noun.person, noun.location etc.
Richard     B-noun.person
Nixon       I-noun.person
visited     B-verb.social
China       B-noun.location
to          O
improve     B-verb.change
diplomacy   B-noun.communication
.           O
WH-word / Conditions / Examples
Who
  Conditions: tag@head = noun.person or a personal pronoun
  Examples: Abraham Lincoln, him, the 16th president
What
  Conditions: tag@head is neither noun.time nor noun.person
  Examples: The White House, the building
Where
  Conditions: object of PP tagged with noun.location & preposition is one of: on, in, at, over, to
  Examples: in Japan, to a small town
When
  Conditions: tag@head = noun.time
  Examples: Wednesday, next year, 1929
Whose NP
  Conditions: tag@head word = noun.person and answer phrase is modified with a possessive
  Examples: John’s car, the president’s visit to Asia, the companies’ profits
How many NP
  Conditions: answer phrase is modified by a cardinal number or quantifier phrase
  Examples: 10 books, two hundred years
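A minimal Python sketch of this mapping, covering only the Who, What, Where and When rows, is given below. The function name and its parameters are hypothetical, and the supersense tag of the head word is assumed to be supplied by the tagger; the Whose and How many rows are left out because they depend on modifiers of the answer phrase.

```python
LOCATIVE_PREPOSITIONS = {"on", "in", "at", "over", "to"}


def wh_words(head_tag, preposition=None, is_personal_pronoun=False):
    """Return the candidate WH words for an answer phrase.

    head_tag is the supersense tag of the head word (e.g. "noun.person");
    preposition is given only when the answer phrase is a PP.
    """
    words = []
    if head_tag == "noun.person" or is_personal_pronoun:
        words.append("who")
    if head_tag not in ("noun.time", "noun.person"):
        words.append("what")
    if head_tag == "noun.location" and preposition in LOCATIVE_PREPOSITIONS:
        words.append("where")
    if head_tag == "noun.time":
        words.append("when")
    return words


if __name__ == "__main__":
    print(wh_words("noun.person"))                      # Abraham Lincoln
    print(wh_words("noun.location", preposition="in"))  # in Japan
    print(wh_words("noun.time"))                        # 1929
```

Because the framework overgenerates and ranks, returning several candidates for one answer phrase (for example both "what" and "where" for "in Japan") is consistent with the table.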
Situation: subject-auxiliary inversion
Condition: Auxiliary verb or modal is not present
Action: main verb = auxiliary do + base form of main verb
John saw Mary → John did see Mary → Who did John see?
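The decomposition step can be illustrated with a short Python sketch. The `BASE_FORMS` lexicon is a hypothetical toy table (a real system would use a morphological analyser), and the function names are made up for this example.

```python
# Hypothetical toy lexicon mapping inflected verbs to their base forms.
BASE_FORMS = {"saw": "see", "sees": "see", "met": "meet", "meets": "meet"}

# Form of auxiliary "do" that carries the tense and agreement of each tag.
DO_FORMS = {"VBD": "did", "VBZ": "does", "VBP": "do"}


def decompose(verb, tag):
    """Split a tensed main verb into (form of do, base form of the verb)."""
    return DO_FORMS[tag], BASE_FORMS.get(verb, verb)


def object_question(subject, verb, tag, wh_word="Who"):
    """Decompose the verb, invert subject and auxiliary, and front the WH word."""
    aux, base = decompose(verb, tag)
    return f"{wh_word} {aux} {subject} {base}?"


if __name__ == "__main__":
    print(decompose("saw", "VBD"))
    print(object_question("John", "saw", "VBD"))
```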
Identifying main verbs that need to be
decomposed
ROOT < (S=clause < (VP=mainvp [ < (/VB.?/=tensed !< is|was|were|am|are|has|have|had|do|does|did) | < /VB.?/=tensed !< VP ]))
[Figure: two parse trees with the clause, aux and verb nodes labelled]
A <+(C) B: A dominates B via an unbroken chain of (zero or more) nodes matching description C
ROOT=root < (S=clause <+(/VP.*/) (VP < /(MD|VB.?)/=aux < (VP < /VB.?/=verb)))
Copula: word used to link the subject of a sentence with a predicate (a subject complement)
ROOT=root < (S=clause <+(/VP.*/) (VP < (/VB.?/=copula < is|are|was|were|am) !< VP))
S < (NP=np $+ VP)
S=start < VP=vp
delete np
relabel start SBARQ
relabel vp SQ
SBARQ < SQ=ins
insert (WHNP (WP Who)) $+ ins
Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy", first published
in 1687, laid the foundations for classical mechanics.
[Figures: TREE-I, TREE-II]
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (PP (IN in) (NP (CD 1687)))
Insert WH subtree: (WHNP (WHADVP (WRB when)))
1. Whose book ``Mathematical Principles of Natural Philosophy'' was first published in
1687?
2. What laid the foundations for classical mechanics?
3. What did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy''
lay?
4. When was Sir Isaac Newton's book ``Mathematical Principles of Natural
Philosophy'' first published?
5. Did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay
the foundations for classical mechanics?
6. Whose book ``Mathematical Principles of Natural Philosophy'' laid the foundations
for classical mechanics?
7. Was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first
published in 1687?
8. What was first published in 1687?
Arvind Kejriwal, the AAP leader, resigned from the post of CM.
[Figures: appositive tree, TREE-I, TREE-II]
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (NP (NNP Arvind) (NNP Kejriwal))
Insert WH subtree: (WHNP (WHNP (WRB who)))
1. Who resigned from the post of CM?
2. What did Arvind Kejriwal resign from?
3. Who was Arvind Kejriwal?
4. Who was the AAP leader?
5. Did Arvind Kejriwal resign from the post of CM?
6. Was Arvind Kejriwal the AAP leader?
Acceptability 𝑦 of a question 𝑥
𝑦 = 𝑤⊤𝑓(𝑥)
▪ 𝑓(𝑥) returns a vector of real-valued numbers
pertaining to different aspects of the question
▪ 𝑤 vector of weights for each feature of a question
Learning weight vector
▪ Penalized linear regression (Ridge regression)
Question features
Length feature
▪ Length of question, source sentence, answer phrase
WH words
▪ Boolean feature whether a question is a WH one
N-gram log likelihood of question
Grammatical features
Transformation features
etc.
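The scoring step 𝑦 = 𝑤⊤𝑓(𝑥) can be sketched in Python as follows. Only the length and WH features are modelled, the weight vector `W` is a hypothetical hand-picked example rather than one learned by ridge regression, and all function names are illustrative.

```python
WH_WORDS = {"who", "what", "where", "when", "whose", "how"}


def features(question, source_sentence, answer_phrase):
    """f(x): a small real-valued feature vector for one candidate question."""
    first_word = question.split()[0].lower()
    return [
        float(len(question.split())),          # length of the question
        float(len(source_sentence.split())),   # length of the source sentence
        float(len(answer_phrase.split())),     # length of the answer phrase
        1.0 if first_word in WH_WORDS else 0.0,  # is it a WH question?
    ]


def acceptability(w, f):
    """y = w^T f(x): dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, f))


def rank(candidates, w):
    """Sort (question, source, answer) triples by decreasing acceptability."""
    return sorted(candidates, key=lambda c: acceptability(w, features(*c)), reverse=True)


# Hypothetical weights: penalise long questions, reward WH questions.
W = [-0.1, 0.0, 0.0, 1.0]

if __name__ == "__main__":
    candidates = [
        ("Did John meet Sally?", "John met Sally.", ""),
        ("Who met Sally?", "John met Sally.", "John"),
    ]
    for question, _, _ in rank(candidates, W):
        print(question)
```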
Term project evaluation includes
Presentation (10 min)
Demonstration (20 min)
Date 18.04.2015 (Saturday) from 9:30 am
Groups 1-4
Date 18.04.2015 (Saturday) from 2:30 pm
Groups 5-9