Transcript Green
NLG STEC Workshop
April 20-21, 2007
Arlington, VA
Nancy Green
Univ. of North Carolina Greensboro, USA
NLG Pipeline Model & STEC
Pro-STEC Assumptions:
• (All/most/worth-funding) NLG can be decomposed into
well-defined independent STEC-modules such that
improving each one will advance NLG
• Input/output representation for STEC is noncontroversial
NLG ‘Pipeline’ = Tip of Iceberg
• Discourse KR&R
• Domain Communication KR&R
• User Model KR&R
• Media/Presentation-related KR&R
Who will pay for NLG research outside of the classical pipeline?
It is essential empirical research and a major cost, but I am afraid
it would fall outside of the STEC funding model
Example NLG System KR&R
GenIE: generates letters to genetics clinic patients; goal to
justify medical experts’ conclusions such that all
arguments are comprehensible to a lay person
• Discourse: argumentation
• Domain Communication: conceptual
causal model underlying expert-lay
communication (not domain model)
• User Model: model of appraisal
• Media/Presentation: how presentation
affects argument comprehension
Lesson from GenIE
• NLG Pipeline = global control + sentence
planning/realization
• can use existing surface realizers, standard domain
ontology, and lexical resources
• Main cost has been KR&R modules; mainly
empirical work:
• Goal: find non-domain-specific principles/guidelines
to optimize lay audience’s comprehension of
arguments
• Corpus studies: very useful but not sufficient
• Controlled studies: necessary, and cannot afford to
wait for other disciplines (HCI, learning sciences, etc.)
to do them for us
GenIE Corpus Studies
• Intercoder reliability of content annotation
scheme: used to justify domain
communication model
• Argumentation schemes (non-domain-specific, both normative and affective)
• Stylistic (lexical/syntactic) features of
author perspective
• Argument presentation features (order, cue
words, explicitness)
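The intercoder reliability mentioned in the first bullet above is often reported as a chance-corrected agreement score. The slides do not say which statistic was used for GenIE, so the sketch below assumes Cohen's kappa for two coders purely as an illustration. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the GenIE corpus.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items (assumed statistic)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: fraction of items that received identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, estimated from each coder's own label distribution.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of eight text segments by two coders.
coder_1 = ["Claim", "Data", "Data", "Claim", "Data", "Claim", "Data", "Data"]
coder_2 = ["Claim", "Data", "Claim", "Claim", "Data", "Claim", "Data", "Data"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

A score near 1 indicates agreement well above chance. What threshold would justify an annotation scheme is a judgment call that the slides do not specify.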
GenIE Controlled Studies
• How multimedia layout, cross-media
cue words affect comprehension
• How argument presentation (explicit
vs. implied claim, cue words) affects
recognition of argument components
(Claim vs. Data) & dependence of
final claim on intermediate claims
NLG Pipeline Model & STEC
Pro-STEC Assumptions:
• (All/most/worth-funding) NLG can be decomposed into
well-defined independent STEC-modules such that
improving each one will advance NLG
• Input/output representation for STEC is noncontroversial
STEC Input/Output Problem
Different input representations needed for different types
of output; e.g. compare requirements for:
• Fixed-format text (original scope of NLG)
• Task-appropriate, user-friendly text format (e.g. line
length, paragraphing, headings, font)
• Text and (reported or quoted) dialogue in story
• Dialogue spoken by animated emoting conversational
agent
• Integrated text and images or data graphics
• Text referring to physical or visual properties of
presentation (‘The red line in Fig. 2 shows sales in 2002.’)
Big Challenges
Empirical research to test computation-oriented, general theories, principles,
guidelines to answer:
• What makes a “text” (i.e. including
spoken dialogue, MMPs, etc.)
• Coherent? In story dialogue, believable?
• User-friendly? Task-appropriate?
• Comprehensible? Pedagogically effective?
• Entertaining (suspenseful, funny, etc.)?
Ex. Challenges (cont.)
• How does channel change answer?
• E.g. HCI research: cannot assume findings
for paper apply to computer screen
• How does length change answer?
• E.g. learning sciences: 300-word summary
vs. 3-page science argument for middle
school
• How do individual differences matter?
• E.g. cognitive impairments, affect
Conclusions
• Need some NLG research with
massively interdisciplinary view:
cognitive science, communication
studies, etc.
• Need some NLG research motivated by
search for answers to general
questions such as above
• Will STEC approach effectively kill the
above kind of NLG research?