
CSA3050: Natural Language Generation
• What is Natural Language Generation?
• When is NLG an Appropriate Technology?
• NLG System Architectures
December 2003
CSA3050: Natural Language
Generation
1
Acknowledgements & Resources
• Ehud Reiter and Robert Dale, Building Natural Language Generation Systems, Cambridge University Press, 2000.
• SIGGEN's resource page: www.dynamicmultimedia.com.au/siggen/
• Dale & Reiter's ANLP-97 Tutorial on Building Applied Natural Language Generation Systems
NLP = NLU + NLG
[Diagram: Text → Natural Language Understanding → Meaning → Natural Language Generation → Text]
What is NLG?
• NLG "is the process of deliberately constructing
a natural language text in order to meet
specified communicative goals" [McDonald
1992].
• Goal: design of computer software which
produces understandable NL utterances.
• Input: some underlying non-linguistic
representation of information
• Output: documents, reports, explanations, help
messages, and other kinds of texts
Why Use NLG?
• Important information is often stored on
computers in ways which are not
comprehensible to the end user.
• NLG systems can present this information
to users in an accessible way.
• NLG is useful when output is so variable that it is
difficult to capture by means of canned text.
Are NLG and NLU Mirror Images?
• Both Require Knowledge
– knowledge of language
– knowledge of the domain
• Can we use the same knowledge to drive
both NLG and NLU?
• Reversible grammars
Reversible Grammars are Possible
s --> np, vp.
np --> n.
vp --> v, np.
n --> [john].
n --> [mary].
v --> [loves].
v --> [hits].
Reversible Grammar - Output
1 ?- s([john,loves,mary],[]).
Yes
2 ?- s(X,[]).
X = [john, loves, john] ;
X = [john, loves, mary] ;
X = [john, hits, john] ;
X = [john, hits, mary] ;
X = [mary, loves, john] ;
X = [mary, loves, mary] ;
X = [mary, hits, john] ;
X = [mary, hits, mary] ;
No
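The same idea can be sketched outside Prolog. The following is a minimal Python illustration, not part of the original slides: the grammar is stored as plain data and one enumeration routine serves both directions. The names GRAMMAR, generate and recognise are assumptions made for this sketch, and recognition by exhaustive enumeration only works because this toy grammar is finite.

```python
from itertools import product

# Same toy grammar as the DCG above, written as plain data.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["n"]],
    "vp": [["v", "np"]],
    "n": [["john"], ["mary"]],
    "v": [["loves"], ["hits"]],
}

def generate(symbol):
    """Yield every word list derivable from `symbol` (the NLG direction)."""
    if symbol not in GRAMMAR:  # anything without a rule is a terminal word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]

def recognise(words, symbol="s"):
    """Return True if `words` is derivable from `symbol` (the NLU direction)."""
    return list(words) in list(generate(symbol))

sentences = list(generate("s"))
```

Here recognise(["john", "loves", "mary"]) plays the role of query 1 and generate("s") plays the role of query 2.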
But NLU and NLG Address Fundamentally Different Problems
• NLU
– Management of choices about interpretation.
– Handling ill-formed input.
• NLG
– Management of choices about realisation, given that you know what you want to say.
– Stylistically appropriate output.
– Creating understandable output.
Deciding what to say involves
consideration of ...
• what the content of an utterance should be;
• what information should be omitted;
• how to organise that content in a coherent
discourse;
• what tone or degree of formality should be
adopted;
• how the material should be broken down into
sentences or clauses;
• what syntactic constructions should be used;
• how entities should be described;
• word choice.
Examples of Choices
"This course is being taught by Mike Rosner. It is
an introduction to natural language generation".
• lecturer's name and course title
• style of name
• two sentences rather than one
• passive rather than active for first sentence
• "being taught" rather than "being given"
• pronoun "it" in the second sentence
Criteria of Understandability/Quality
1. Clear meaning, good grammar,
terminology and sentence structure.
2. Clear meaning but bad grammar, bad
terminology, or bad sentence structure.
3. Meaning graspable but ambiguities due
to bad terminology or bad sentence
structure
4. Meaning unclear but inferrable
5. Meaning absolutely unclear
Examples of
Understandability/Quality
1. The US unilaterally reduced China's
textile export quotas.
2. US cutted china export ration lonely.
3. A chinese ration US cut it down.
4. Cause states go quotas to reduced.
5. alone cut it up rations alone
When are NLG Techniques
Desirable?
• Necessary source data available in a
computationally tractable form.
• Much variation in output is required.
• Automation justified on the basis of
volume, speed requirements or
consistency requirements.
• Text is the right medium.
Alternatives to/Variations of
Natural Language Generation
• Alternatives
– Fixed Templates
– Templates with Variables
– Graphics.
– Manual NLG
• Variations
– Multi-Modal
– Dialogue
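To make the difference between templates with variables and NLG concrete, here is a small Python sketch. It is an illustration only: the function names and the sentence wording are assumptions, not taken from any existing system. The template fills slots and nothing else adapts, whereas the second function makes one linguistic choice (number agreement) from the data.

```python
def template_count(n, src, dest):
    """Template with variables: slots are filled, nothing else adapts."""
    return f"There are {n} trains daily from {src} to {dest}."

def generated_count(n, src, dest):
    """Tiny NLG-style alternative: number agreement is decided from the data."""
    if n == 0:
        return f"There are no trains from {src} to {dest}."
    verb, noun = ("is", "train") if n == 1 else ("are", "trains")
    return f"There {verb} {n} {noun} daily from {src} to {dest}."
```

With n = 1 the template yields "There are 1 trains daily ...", while the second function yields "There is 1 train daily ...". The more such choices the output requires, the weaker the case for templates.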
Choice of Text v. Graphics
• No hard and fast rules.
• Examination of existing conventions in a
given area of application is useful.
• Can depend on type of subject matter, e.g.
– Information about physical location often
better conveyed by graphics.
– Information about abstract concepts better
conveyed by text.
• Expertise and language abilities of user.
WIP: Knowledge Based
Presentation of Information
• WIP (Wahlster et al c.1990)
• Multimodal
• Presentation system that is able to generate a variety of
multimedia documents
• Input consisting of a formal description of the
communicative intent of a planned presentation.
• Generation process is controlled by a set of generation
parameters
– target group
– presentation objective
– resource limitation
– target language
Typical Pipelined Architecture
Text Planning → Sentence Planning → Linguistic Realization
Tasks and Architecture in NLG
Text Planning
1. Content determination
2. Discourse planning (≈ paragraphs)
Sentence Planning
3. Sentence aggregation
4. Lexicalisation
5. Referring expression generation
Linguistic Realization
6. Syntax + morphology
7. Orthographic realization
Intermediate Representations
Text Planning → Text Plan → Sentence Planning → Sentence Plans → Linguistic Realization
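The three stages and the representations passed between them can be sketched as three functions. This is a deliberately tiny Python illustration under stated assumptions: the TextPlan and SentencePlan shapes, the message tuples and the input dictionary are invented for the sketch, and real systems use far richer structures.

```python
from dataclasses import dataclass

@dataclass
class TextPlan:
    messages: list  # ordered (relation, arg1, arg2) tuples; an assumed shape

@dataclass
class SentencePlan:
    subject: str
    verb: str
    complement: str

def text_planning(data):
    """Content determination + discourse planning: pick and order messages."""
    return TextPlan(messages=[
        ("ID", "the next train", data["next_train"]),
        ("DEPARTURETIME", data["next_train"], data["departs"]),
    ])

def sentence_planning(plan):
    """Map each message to one sentence plan (no aggregation in this sketch)."""
    verbs = {"ID": "is", "DEPARTURETIME": "leaves at"}
    return [SentencePlan(subject=a, verb=verbs[kind], complement=b)
            for kind, a, b in plan.messages]

def linguistic_realization(sentence_plans):
    """Turn sentence plans into a string with capitals and full stops."""
    sentences = []
    for sp in sentence_plans:
        s = f"{sp.subject} {sp.verb} {sp.complement}"
        sentences.append(s[0].upper() + s[1:] + ".")
    return " ".join(sentences)

def generate(data):
    """Run the pipeline: each stage consumes the previous stage's output."""
    return linguistic_realization(sentence_planning(text_planning(data)))

text = generate({"next_train": "the Caledonian Express", "departs": "10am"})
```

Because each stage only sees the output of the one before it, a stage can be replaced without touching the others, which is the main attraction of the pipelined design.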
1. Content Determination
• The process of deciding what to say, given the
communicative goals etc.
• Construction of a set of messages from the
underlying data source
– Messages are aggregations of data that are
appropriate for linguistic expression.
– Each message may correspond to the
meaning of a word or a phrase.
– Messages are based on domain entities,
concepts, and relations.
Examples of Messages
• DEPARTURETIME(CALEXPRESS, 1000)
  The Caledonian Express leaves at 10am
• ID(NEXTTRAIN, CALEXPRESS)
  The next train is the Caledonian Express
• COUNT((TRAIN, SRC(ABERDEEN), DESTINATION(GLASGOW)), 20, PERDAY)
  There are 20 trains daily from Aberdeen to Glasgow
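As a rough illustration of content determination, the Python sketch below builds messages of this kind from a small timetable. Everything here is hypothetical: the Message class, the TIMETABLE rows (including the train names other than CALEXPRESS) and the determine_content function are invented for the example, so the count it produces is 2 rather than the 20 on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    relation: str  # e.g. "DEPARTURETIME"
    args: tuple    # domain entities and values

# Hypothetical timetable rows: (train, source, destination, departure time).
TIMETABLE = [
    ("CALEXPRESS", "ABERDEEN", "GLASGOW", 1000),
    ("HIGHLANDER", "ABERDEEN", "GLASGOW", 1130),
    ("CITYLINK", "ABERDEEN", "EDINBURGH", 1045),
]

def determine_content(timetable, src, dest, now):
    """Build messages about trains from src to dest leaving at or after `now`."""
    route = [row for row in timetable if row[1] == src and row[2] == dest]
    messages = [Message("COUNT", (src, dest, len(route), "PERDAY"))]
    later = sorted((r for r in route if r[3] >= now), key=lambda r: r[3])
    if later:
        train, _, _, time = later[0]
        messages.append(Message("ID", ("NEXTTRAIN", train)))
        messages.append(Message("DEPARTURETIME", (train, time)))
    return messages

messages = determine_content(TIMETABLE, "ABERDEEN", "GLASGOW", now=900)
```

Note that the messages are still non-linguistic: nothing here decides word order, verbs or pronouns.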
2. Discourse Planning
• A text is not just a random collection of
sentences
The Caledonian Express leaves at 10am.
The next train is the Caledonian Express.
There are 20 trains daily from Aberdeen to Glasgow
• Texts have an underlying structure in which the
parts are related together
• The structure can be expressed by means of a
text plan
A Text Plan
[Diagram: text plan tree. A Sequence node links COUNT(…) and NextTrainInformation; an Elaboration node links IDENTITY(…) and DEPARTURETIME(…)]
Text Resulting from Text Plan
There are 20 trains daily from
Aberdeen to Glasgow.
The next train is the Caledonian
Express.
It leaves Aberdeen at 10am.
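One way to see how a text plan fixes the order of the sentences is to represent it as a tree and walk it left to right. The Python sketch below assumes one reading of the plan (a Sequence node over COUNT and an Elaboration node, which in turn covers IDENTITY and DEPARTURETIME); the PlanNode class and linearise function are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    relation: str  # rhetorical relation, e.g. "Sequence"
    children: list = field(default_factory=list)  # PlanNode or message name

def linearise(node):
    """Depth-first, left-to-right walk giving the order of expression."""
    if isinstance(node, str):  # a leaf is just a message name
        return [node]
    ordered = []
    for child in node.children:
        ordered.extend(linearise(child))
    return ordered

# Assumed reading of the text plan for the train example.
text_plan = PlanNode("Sequence", [
    "COUNT",
    PlanNode("Elaboration", ["IDENTITY", "DEPARTURETIME"]),
])
order = linearise(text_plan)
```

The resulting order (COUNT, IDENTITY, DEPARTURETIME) matches the three sentences of the text above.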
3. Sentence Planning:
Aggregation
• A one-to-one mapping from messages to
sentences results in disfluent text
• Messages need to be combined to
produce larger and more complex
sentences
• The result is a sentence specification or
SENTENCE PLAN
An Example of Sentence
Aggregation
• Without aggregation:
– The next train is the Caledonian
Express.
It leaves Aberdeen at 10am.
• With aggregation:
– The next train, which leaves at
10am, is the Caledonian Express.
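A minimal Python sketch of this aggregation step follows. It is an illustration under assumptions: messages are simple (relation, arg1, arg2) tuples with ready-made noun phrases, and the single rule shown (fold the departure time into a relative clause) is only one of many possible aggregation rules.

```python
def one_per_message(messages):
    """Baseline: each message becomes its own sentence."""
    out = []
    for kind, _, value in messages:
        if kind == "ID":
            out.append(f"The next train is {value}.")
        elif kind == "DEPARTURETIME":
            out.append(f"It leaves at {value}.")
    return out

def aggregate(messages):
    """Merge an ID and a DEPARTURETIME message about the same train."""
    ident = next((m for m in messages if m[0] == "ID"), None)
    dep = next((m for m in messages if m[0] == "DEPARTURETIME"), None)
    if ident and dep and ident[2] == dep[1]:
        # Embed the departure time as a relative clause on the subject.
        return [f"The next train, which leaves at {dep[2]}, is {ident[2]}."]
    return one_per_message(messages)  # nothing to merge

messages = [
    ("ID", "NEXTTRAIN", "the Caledonian Express"),
    ("DEPARTURETIME", "the Caledonian Express", "10am"),
]
plain = one_per_message(messages)
merged = aggregate(messages)
```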
4. Lexicalisation
• Lexicalisation determines the particular words to
be used to express domain concepts and
relations
• In our example, should the DEPARTURETIME
relation be expressed using the verb leave or
depart?
• How do we express different nuances of
meaning?
• What words should be used in different
languages?
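A very simple way to picture lexicalisation is a lookup from domain relations to words, indexed by the factors that influence the choice. The Python sketch below is hypothetical: the LEXICON table, the choice of language and register as keys, and the assignment of "leaves" to a neutral register and "departs" to a formal one are all assumptions made for illustration.

```python
# Hypothetical lexicon: relation -> verb, keyed by (language, register).
LEXICON = {
    "DEPARTURETIME": {
        ("en", "neutral"): "leaves",
        ("en", "formal"): "departs",
        ("fr", "neutral"): "part",
    },
}

def lexicalise(relation, language="en", register="neutral"):
    """Pick the word for a domain relation; raises KeyError if none is listed."""
    return LEXICON[relation][(language, register)]

casual = lexicalise("DEPARTURETIME")
formal = lexicalise("DEPARTURETIME", register="formal")
```

Real lexicalisation also has to respect context, collocation and shades of meaning, which a flat table cannot capture.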
5. Referring Expression Generation
• Referring expression generation is concerned
with how we describe domain entities in such a
way that the hearer will know what we are
talking about.
• Choice between
– Proper names (type/degree of formality)
– Definite Descriptions
– Pronouns
• Major issue is avoiding ambiguity.
John hit Bill. He cried out.
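The ambiguity in the example can be avoided with a simple rule: use a pronoun only when no other recently mentioned entity could match it. The Python sketch below shows one such rule under strong assumptions (entities are plain names, gender is given in a dictionary, and only the previous sentence counts as context); the refer function is invented for illustration.

```python
def refer(entity, previous_mentions, gender):
    """Return a pronoun only when it cannot be confused with another entity.

    entity: name of the entity to refer to.
    previous_mentions: names mentioned in the previous sentence.
    gender: dict mapping each name to "m", "f" or "n".
    """
    pronouns = {"m": "he", "f": "she", "n": "it"}
    competitors = [e for e in previous_mentions
                   if e != entity and gender[e] == gender[entity]]
    if entity in previous_mentions and not competitors:
        return pronouns[gender[entity]]
    return entity  # fall back to the proper name

gender = {"John": "m", "Bill": "m", "Mary": "f"}
after_john_hit_bill = refer("Bill", ["John", "Bill"], gender)
after_john_hit_mary = refer("Mary", ["John", "Mary"], gender)
```

After "John hit Bill." the rule keeps the name "Bill", because "he" could also mean John; after "John hit Mary." it safely produces "she".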
6. Syntactic and Morphological Realization
• Morphology: rules of word formation
– walk + ed = walked
• Syntax: rules of sentence formation
– the subject goes before the verb
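Both kinds of rule can be sketched in a few lines of Python. This is a simplified illustration: past_tense covers only regular verbs that need no spelling change beyond a final "e" (so not "stop" or "cry", and no irregular verbs), and realise_clause hard-codes a single subject-verb-object pattern. Both function names are assumptions.

```python
def past_tense(verb):
    """Regular English past tense only; irregular verbs are out of scope."""
    if verb.endswith("e"):
        return verb + "d"   # love -> loved
    return verb + "ed"      # walk -> walked

def realise_clause(subject, verb, obj):
    """Syntax rule from the slide: the subject goes before the verb."""
    return " ".join([subject, past_tense(verb), obj])

clause = realise_clause("Mary", "love", "John")
```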
7. Orthographic Realization
• Orthographic realization is concerned with
case, punctuation, and typographic issues:
font size, column width …
• Sentences begin with an upper-case letter
and end in full stops
• choice of font
• other layout issues
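The case and punctuation rules can be sketched as a final clean-up step over a list of words. The Python function below is a minimal illustration, assuming the input is a list of tokens in which commas are separate items; the name orthographic_realise is invented, and font and layout choices are not modelled.

```python
def orthographic_realise(words):
    """Join words, capitalise the first letter, and end with a full stop."""
    if not words:
        return ""
    sentence = " ".join(words)
    # Attach commas to the preceding word instead of leaving a space first.
    sentence = sentence.replace(" ,", ",")
    return sentence[0].upper() + sentence[1:] + "."

tokens = ["the", "next", "train", ",", "which", "leaves", "at", "10am", ",",
          "is", "the", "Caledonian", "Express"]
sentence = orthographic_realise(tokens)
```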
Summary
• NLG is related to NLU but addresses different
problems.
• Quality/understandability is a major issue.
• NLG is an option when text is an appropriate
output medium, and when "mail-merge" style
character manipulation is insufficient for the
application at hand.
• Planning considerations enter into the
generation of texts.
• Text generation is a pipeline process involving
different representations.