Question Generation via Overgenerating Transformations
Good Question! Statistical Ranking
for Question Generation
Michael Heilman and Noah A. Smith
The North American Chapter of the Association for
Computational Linguistics - Human Language Technologies
(NAACL HLT 2010)
Agenda
• Introduction
• Related Work
• Three-Stage AQG Framework
• Evaluation
• Conclusion
• Comments
Introduction (1/3)
• In this paper, we focus on question generation
(QG) for the creation of educational materials for
reading practice and assessment.
• Our goal is to generate fact-based questions
about the content of a given article.
• The top-ranked questions could be filtered and
revised by educators, or given directly to
students for practice.
• Here we restrict our investigation to questions
about factual information in texts.
Introduction (2/3)
• Consider the following sentence from the Wikipedia article on the history of Los Angeles:

During the Gold Rush years in northern California, Los Angeles became known as the “Queen of the Cow Counties” for its role in supplying beef and other foodstuffs to hungry miners in the north.

• From it, the system can generate a question such as:

What did Los Angeles become known as the “Queen of the Cow Counties” for?
Introduction (3/3)
• Question transformation involves complex long-distance dependencies.
• The characteristics of such phenomena are difficult to learn from corpora, but they have been studied extensively in linguistics.
• However, since many phenomena pertaining to question generation are not so easily encoded with rules, we include statistical ranking as an integral component.
• Thus, we employ an overgenerate-and-rank approach.
Related Work
• No prior QG system has involved statistical models for choosing among output candidates.
• Mitkov et al. (2006) demonstrated that automatic
generation and manual correction of questions can be
more time-efficient than manual authoring alone.
• Existing QG systems model their transformations from
source text to questions with many complex rules for
specific question types (e.g., a rule for creating a
question Who did the Subject Verb? from a sentence
with SVO word order and an object referring to a person),
rather than with sets of general rules.
Research Objectives
• We apply statistical ranking to the task of
generating natural language questions.
• We model QG as a two-step process of first
simplifying declarative input sentences and then
transforming them into questions.
• We incorporate linguistic knowledge to explicitly
model well-studied phenomena related to long
distance dependencies in WH questions.
• We develop a QG evaluation methodology,
including the use of broad-domain corpora.
Three-Stage AQG Framework
• We define a framework for generating a ranked
set of fact-based questions about the text of a
given article.
• From this set, the top-ranked questions might be
given to an educator for filtering and revision, or
perhaps directly to a student for practice.
Stage 1
Transforming Source Sentence
• Each of the sentences from the source text is
expanded into a set of derived declarative
sentences by altering lexical items, syntactic
structure, and semantics.
• In our implementation, a set of transformations derives a simpler form of the source sentence by removing phrase types such as leading conjunctions, sentence-level modifying phrases, and appositives.
Stage 1
Transforming Source Sentence
• Complex source sentence:
Prime Minister Vladimir V. Putin, the country's paramount leader, cut short a trip to Siberia, returning to Moscow to oversee the federal response.
• Extracted factual sentences:
– Prime Minister Vladimir V. Putin cut short a trip to Siberia.
– Prime Minister Vladimir V. Putin was the country's paramount leader.
– Prime Minister Vladimir V. Putin returned to Moscow to oversee the federal response.
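To make the simplification step concrete, here is a minimal sketch of one such transformation (appositive removal) over a constituency parse, using nltk. The heuristic below (an NP child flanked by commas inside an NP) is a simplified stand-in, not the authors' actual extraction rules.

```python
from nltk.tree import Tree

def remove_appositives(tree):
    """Return a copy of the parse tree with comma-delimited appositive
    NPs removed (e.g., ", the country's paramount leader,")."""
    if not isinstance(tree, Tree):
        return tree
    children = list(tree)
    if tree.label() == "NP":
        kept, i = [], 0
        while i < len(children):
            c = children[i]
            # Drop a ", NP ," sequence inside an NP.
            if (isinstance(c, Tree) and c.label() == "," and
                    i + 2 < len(children) and
                    isinstance(children[i + 1], Tree) and
                    children[i + 1].label() == "NP" and
                    isinstance(children[i + 2], Tree) and
                    children[i + 2].label() == ","):
                i += 3
                continue
            kept.append(c)
            i += 1
        children = kept
    return Tree(tree.label(), [remove_appositives(c) for c in children])

parse = Tree.fromstring(
    "(S (NP (NP (NNP Putin)) (, ,) (NP (DT the) (NN leader)) (, ,))"
    " (VP (VBD cut) (PRT (RP short)) (NP (DT a) (NN trip))) (. .))")
print(" ".join(remove_appositives(parse).leaves()))
# -> Putin cut short a trip .
```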
Stage 2
Question Transducer
• The declarative sentences derived in stage 1 are transformed into sets of questions by a sequence of well-defined syntactic and lexical transformations (subject-auxiliary inversion, WH-movement, etc.).
• The transducer identifies the answer phrases which may be targets for WH-movement and converts them into question phrases.
[Pipeline: Declarative Sentence → Mark Unmovable Phrases → Generate Possible Question Phrases (Decompose Main Verb) → Insert Question Phrase (Invert Subject and Auxiliary) → Perform Post-processing → Question]
Stage 2
Question Transducer
• In English, various constraints determine whether phrases can be involved in WH-movement and other phenomena involving long distance dependencies.
• For example, noun phrases are “islands” to movement, meaning that constituents dominated by a noun phrase typically cannot undergo WH-movement.

John liked the book that I gave him.
→ What did John like?
→ *Who did John like the book that gave him?
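A rough sketch of the "Mark Unmovable Phrases" step under the NP-island constraint just described, assuming Penn Treebank-style parses via nltk. The actual system encodes such constraints as tree-matching rules; this toy version implements only the NP-island case.

```python
from nltk.tree import Tree

def movable_answer_phrases(tree):
    """Yield NP subtrees that are NOT dominated by another NP,
    i.e., candidates that may safely undergo WH-movement."""
    def walk(t, inside_np):
        if not isinstance(t, Tree):
            return
        if t.label() == "NP":
            if not inside_np:
                yield t          # a top-level NP: movable candidate
            inside_np = True     # everything below is inside an NP island
        for child in t:
            yield from walk(child, inside_np)
    yield from walk(tree, False)

sent = Tree.fromstring(
    "(S (NP (NNP John)) (VP (VBD liked)"
    " (NP (NP (DT the) (NN book))"
    " (SBAR (WHNP (WDT that)) (S (NP (PRP I)) (VP (VBD gave) (NP (PRP him)))))))"
    " (. .))")
for np in movable_answer_phrases(sent):
    print(" ".join(np.leaves()))
# -> "John" and "the book that I gave him", but not "I" or "him",
#    which sit inside the NP island and cannot be questioned.
```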
Stage 2
Question Transducer
• After marking unmovable phrases, we iteratively
remove each possible answer phrase.
• The question phrases for a given answer phrase consist of a question word (e.g., who, what, where, when), possibly preceded by a preposition and, in the case of question phrases like whose car, followed by the head of the answer phrase.
Stage 2
Question Transducer
• The system annotates the source sentence with a set of entity types taken from the BBN Identifinder Text Suite.
• The set of labels from BBN includes those used in standard named entity recognition tasks (e.g., “PERSON,” “ORGANIZATION”) and their corresponding types for common nouns (e.g., “PER DESC,” “ORG DESC”).
Stage 2
Question Transducer
• It also includes dates, times, monetary units, and
others.
• For a given answer phrase, the system uses the
phrase’s entity labels and syntactic structure to
generate a set of zero or more possible question
phrases, each of which is used to generate a final
question sentence.
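As a hedged illustration of how entity labels can select question words, the sketch below maps BBN-style labels to candidate WH words; the specific mapping and label spellings are assumptions for illustration, not the authors' published table.

```python
# Illustrative mapping from BBN-style entity labels to WH question words.
QUESTION_WORDS = {
    "PERSON": ["who"],
    "PER_DESC": ["who", "what"],
    "ORGANIZATION": ["who", "what"],
    "GPE": ["where", "what"],   # geo-political entities
    "DATE": ["when"],
    "TIME": ["when"],
    "MONEY": ["how much"],
}

def question_phrases(entity_labels):
    """Return zero or more question words for an answer phrase,
    given its (possibly multiple) entity labels."""
    words = []
    for label in entity_labels:
        for w in QUESTION_WORDS.get(label, []):
            if w not in words:
                words.append(w)
    return words

print(question_phrases(["PERSON"]))               # ['who']
print(question_phrases(["GPE", "ORGANIZATION"]))  # ['where', 'what', 'who']
```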
Stage 2
Question Transducer
• In order to perform subject-auxiliary inversion:
– If an auxiliary verb or modal is not present, the question transducer decomposes the main verb into the appropriate form of do and the base form of the main verb.
John saw Mary. → John did see Mary. → Who did John see?
– If an auxiliary verb is already present, however, this decomposition is not necessary.
John has seen Mary. → Who has John seen?
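A minimal sketch of the do-decomposition, assuming Penn Treebank POS tags on the main verb; the toy lemma table is a stand-in for the (lemma, part of speech) → surface-form map described two slides below.

```python
# Toy do-decomposition for subject-auxiliary inversion. DO_FORMS keys are
# Penn Treebank verb tags (past, 3rd-person singular, non-3rd-person).
DO_FORMS = {"VBD": "did", "VBZ": "does", "VBP": "do"}
LEMMAS = {"saw": "see", "sees": "see", "liked": "like"}  # illustrative only

def decompose_main_verb(verb, pos):
    """'saw' (VBD) -> ('did', 'see'); callers skip this step when an
    auxiliary or modal is already present."""
    return DO_FORMS[pos], LEMMAS.get(verb, verb)

aux, base = decompose_main_verb("saw", "VBD")
print(f"John {aux} {base} Mary.")  # John did see Mary.
print(f"Who {aux} John {base}?")   # Who did John see?
```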
Stage 2
Question Transducer
• In order to convert between lemmas of verbs and the
different surface forms that correspond to different parts
of speech, we created a map from pairs of verb lemma
and part of speech to verb surface forms.
• We extracted all verbs and their parts of speech from the
Penn Treebank.
• We lemmatized each verb first by checking
morphological variants in WordNet, and if a lemma was
not found, then trimming the rightmost characters from
the verb one at a time until a matching entry in WordNet
was found.
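A sketch of the lemmatization fallback just described, using NLTK's WordNet interface (an assumption; the paper does not name a toolkit).

```python
# WordNet-based verb lemmatization with the character-trimming fallback
# described above. Requires: import nltk; nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def lemmatize_verb(verb):
    """First check WordNet's morphological variants; if no lemma is
    found, trim rightmost characters one at a time and retry."""
    form = verb.lower()
    while form:
        lemma = wn.morphy(form, wn.VERB)  # morphological lookup
        if lemma:
            return lemma
        form = form[:-1]                  # trimming fallback
    return verb                           # no match at all: keep original

print(lemmatize_verb("saw"))      # see (via WordNet's exception list)
print(lemmatize_verb("running"))  # run
```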
Stage 2
Question Transducer
• The transducer performs subject-auxiliary
inversion either when the question to be
generated is a yes-no question or when the
answer phrase is a non-subject noun phrase.
• Each possible question phrase is inserted into a
copy of the tree to produce a question.
Stage 2
Question Transducer
• Sentence-final periods are changed to question
marks.
• In the output of our system, nearly all of the questions containing pronouns were too vague (e.g., What does it have as a head of state?).
• Therefore, we filter out all questions with personal pronouns, possessive pronouns, and noun phrases consisting solely of determiners (e.g., those).
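A small sketch of this filter over POS-tagged questions (Penn Treebank tags, with PRP and PRP$ assumed as the pronoun tags); detecting determiner-only noun phrases would additionally require a parse.

```python
# Filter questions containing personal or possessive pronouns.
VAGUE_TAGS = {"PRP", "PRP$"}  # personal and possessive pronoun tags

def is_too_vague(tagged_question):
    """tagged_question: list of (token, POS-tag) pairs."""
    return any(tag in VAGUE_TAGS for _, tag in tagged_question)

question = [("What", "WP"), ("does", "VBZ"), ("it", "PRP"), ("have", "VB"),
            ("as", "IN"), ("a", "DT"), ("head", "NN"), ("of", "IN"),
            ("state", "NN"), ("?", ".")]
print(is_too_vague(question))  # True: 'it' makes the question too vague
```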
Stage 3
Question Ranker
• Different source sentences, and different transformations of them, may be more or less likely to lead to high-quality questions, so we learn to rank the candidates from human ratings.
• Fifteen native English-speaking university
students rated a set of questions produced from
stages 1 and 2.
• For a predefined training set, each question was
rated by a single annotator (not the same for
each question), leading to a large number of
diverse examples.
Stage 3
Question Ranker
• For the test set, each question was rated by three people
(again, not the same for each question) to provide a
more reliable gold standard.
• An inter-rater agreement of Fleiss’s κ = 0.42 was computed from the test set’s acceptability ratings.
Ratings data by source (counts shown as questions/articles):

Source                 Training set   Testing set
English Wikipedia      1328/12        120/2
Simple English Wiki    1195/16        118/2
Wall Street Journal    284/8          190/2
Total                  2807/36        428/6
Ranking
• Why do we overgenerate and rank questions?
– Named entity recognition errors
– Parsing errors
– Transformation errors
• Therefore, we use a discriminative ranker, specifically one based on a logistic regression model that defines a probability of acceptability.
M. Collins. 2000. Discriminative reranking for natural language parsing. In Proc. of ICML.
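A minimal sketch of such a ranker, using scikit-learn as an assumed stand-in; the toy feature vectors below (question length, negation flag, language-model log likelihood) echo the feature set on the next slides, but the data and hyperparameters are illustrative, not the authors' configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: one feature vector per generated question,
# with label 1 = human-rated acceptable, 0 = unacceptable.
X_train = np.array([[9, 0, -22.1], [31, 1, -80.4], [12, 0, -30.0],
                    [28, 0, -75.2], [10, 0, -25.3], [25, 1, -70.9]])
y_train = np.array([1, 0, 1, 0, 1, 0])

ranker = LogisticRegression().fit(X_train, y_train)

# Rank new candidate questions by P(acceptable | features).
candidates = ["Who cut short a trip to Siberia?",
              "What did it have as a head of state?"]
X_new = np.array([[7, 0, -18.0], [9, 0, -60.0]])
for prob, q in sorted(zip(ranker.predict_proba(X_new)[:, 1], candidates),
                      reverse=True):
    print(f"{prob:.2f}  {q}")
```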
Feature Set
• Length (integer): the numbers of tokens in the question, the source sentence, and the answer phrase from which the WH phrase was generated
• Negation (boolean): the presence of not, never, or no in the question
• N-Gram Language Model (real value): the log likelihoods and length-normalized log likelihoods of the question, the source sentence, and the answer phrase
• Grammatical (integer): the numbers of proper nouns, pronouns, adjectives, adverbs, conjunctions, numbers, noun phrases, prepositional phrases, and subordinate clauses in the phrase structure parse trees for the question and answer phrase
• Transformations (binary): the possible syntactic transformations (e.g., removal of appositives and parentheticals, choosing the subject of the source sentence as the answer phrase)
• Vagueness (integer): the numbers of noun phrases in the question, source sentence, and answer phrase that are potentially vague
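A hedged sketch of two of the simpler features above (Length and Negation) over whitespace-tokenized strings; the grammatical and language-model features would additionally require a parser and an n-gram language model.

```python
NEGATION_WORDS = {"not", "never", "no"}

def extract_features(question, source_sentence, answer_phrase):
    tokens = [w.lower().strip("?.,") for w in question.split()]
    return {
        "len_question": len(question.split()),        # integer
        "len_source": len(source_sentence.split()),   # integer
        "len_answer": len(answer_phrase.split()),     # integer
        "negation": any(t in NEGATION_WORDS for t in tokens),  # boolean
    }

print(extract_features(
    "What did Los Angeles become known as?",
    "Los Angeles became known as the Queen of the Cow Counties.",
    "the Queen of the Cow Counties"))
```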
Evaluation
• We report the results of experiments evaluating the quality of generated questions before and after ranking.
• The evaluation metric we employ is the
percentage of test set questions labeled as
acceptable.
• For rankings, our metric is the percentage of the
top N% labeled as acceptable, for various N.
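A small sketch of this ranking metric: the percentage of the top N% of ranked questions that were labeled acceptable.

```python
def acceptability_at_top(ranked_labels, n_percent):
    """ranked_labels: True/False acceptability labels, best-ranked first."""
    k = max(1, round(len(ranked_labels) * n_percent / 100))
    top = ranked_labels[:k]
    return 100.0 * sum(top) / len(top)

labels = [True, True, False, True, False, False, False, True, False, False]
print(acceptability_at_top(labels, 20))   # 100.0 (both of the top 2 acceptable)
print(acceptability_at_top(labels, 100))  # 40.0 (the unranked acceptability rate)
```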
Results for Unranked Questions
• 27.3% of test set questions were labeled
acceptable (i.e., having no deficiencies) by a
majority of raters.
Results for Ranking
Ablation Result
Recall
Online Demo