When the cognitive scientist meets the engineer
Lecture 10
Natural Language Generation
a highly complex task
both for people and for machines
Slides borrowed from
Michael Zock
LIMSI-CNRS
Orsay, France
Some preliminary issues:
Warning
This is not a state-of-the-art talk. If you are interested
in one, the following could be a starting point:
Bateman & Zock (2003): Natural Language Generation. In R.
Mitkov (Ed.), Handbook of Computational Linguistics,
Oxford University Press, pp. 284-304
List of systems: http://www.fb10.unibremen.de/anglistik/langpro/NLG-table/NLG-tableroot.htm
Anything related to NLG: http://www.siggen.org/
Some preliminary issues
Background material
• Willem Levelt: Speaking: From Intention to Articulation, MIT Press, 1989
• E. Reiter & R. Dale: Building Natural Language Generation Systems (2000),
Cambridge University Press
Overview of this talk
Part 1: General problems
• knowledge and constraints, architecture, process, etc.
Part 2: Deep generation
• message planning
• message ordering (text plan, outline)
Part 3: Surface generation
• lexical choice (access and synthesis)
• computation of syntactic structure
Different ways to look at text generation
NLG FOR people: fully automated generation (text)
NLG WITH people: semi-automated, machine-mediated generation
(writer's workbench, foreign language learning)
NLG LIKE people: simulation of psychological processes
(connectionism, online processing, incremental generation)
What is NLG? - ask google
http://www.charabia.net/generation/index.php?voir=intro&mode
Fort méconnue du grand public, la
génération de textes demeure une
discipline sportive essentiellement
universitaire, pratiquée par d'obscurs chercheurs dans des laboratoires tristes et exigus. Cette discipline pousse ses malheureux
adeptes à des pratiques honteuses :
la génération par ordinateur interposé de textes longs et soporifiques
à partir d'une composition sémantique produite mécaniquement.
Hardly known by the great majority
of people, text generation remains a
sport basically practiced by people
from academia. Those engaged in
this activity usually work in sad and
narrow places. The discipline
induces strange kinds of behavior
like the generation of long and
boring texts via computers on the
basis of mechanically produced
semantic representations.
What is NLG?
In search for a definition
The focus and definition may depend on the domain
(psychology, linguistics, computer science)
Mapping problem:
translate meanings into linguistic form
Linguistically-mediated problem solving
Language as a search problem
What is NL-Generation? (I)
Generation as a mapping process
Input: concepts (C1, C2, C3)
Output: words (W1, W2, W3)
NLG viewed as a process of mapping
a conceptual structure (meaning) onto a linguistic form
Catch me if you can
Conceptualization: C1, C2, C3, C4
Expression: W1, W2, W3
We tend to think faster than we can find
the corresponding words and convert them into sounds
There is no one-to-one mapping
between linguistic structures and
conceptual structures
The same conceptual structure may map onto
different linguistic structures
(synonyms, paraphrases)
Possession
This car belongs to the president    (verb)
This is the car of the president     (preposition)
This is the president's car          (genitive)
This is his car.                     (possessive adjective)
The same linguistic structure may map onto
different conceptual structures
Linguistic resource: genitive
Peter's car is broken      (possession)
Peter's brother is sick    (family relationship)
Peter's leg hurts          (inalienable possession, part of)
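The two directions of this many-to-many mapping can be sketched as two lookup tables. This is only a toy illustration: the sentences and labels are the ones on the slides, while the names `REALIZATIONS`, `GENITIVE_READINGS`, `forms_for` and `readings_of_genitive` are hypothetical.

```python
# Toy tables (hypothetical names); the example sentences come from the slides.

# One conceptual relation, several linguistic forms.
REALIZATIONS = {
    "possession": {
        "verb": "This car belongs to the president",
        "preposition": "This is the car of the president",
        "genitive": "This is the president's car",
        "possessive adjective": "This is his car",
    },
}

# One linguistic form (the genitive), several conceptual relations.
GENITIVE_READINGS = {
    "Peter's car is broken": "possession",
    "Peter's brother is sick": "family relationship",
    "Peter's leg hurts": "inalienable possession (part of)",
}


def forms_for(concept):
    """Return the linguistic devices that can express a concept."""
    return sorted(REALIZATIONS.get(concept, {}))


def readings_of_genitive():
    """Return the conceptual relations the genitive can stand for."""
    return sorted(set(GENITIVE_READINGS.values()))


print(forms_for("possession"))
print(readings_of_genitive())
```

Neither table can be inverted into a function: the generator has to choose among forms, the analyser among readings.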
NLG as language-mediated problem solving
PROBLEM
WHY COMMUNICATE? (reason, motivation)
WHAT TO SAY? (content, message)
HOW TO SAY IT? (linguistic resources)
SOLUTION
SAY IT! (formulation)
A simple generation model
Nature of choices
pragmatic
conceptual
linguistic
Pragmatic choices
Languages are indirect means for achieving goals
• mediating devices
Different linguistic means serve different discourse purposes
• i.e. different forms are used in order to achieve different goals
Pragmatic choices: language as a resource
active vs. passive voice [topic, perspective]
main vs. subordinate clause [relative prominence]
Conceptual choices
Different meanings generally yield different forms
NUMBER: he sings vs. they sing
TENSE: he sings vs. he sang
Linguistic choices
The same meaning can be expressed by different
words or syntactic forms (synonyms, paraphrases)
GROWN-UP MALE PERSON: man, guy, chap
HELP: help, give a hand, assist
What is NL-Generation?
Tentative definition (III)
Generation as a search problem
Size of mental lexicon: approx. 30,000 words
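An abstract view
An example
Input: analysis
[Figure: one phonological form leads to several lexical candidates, each with its own meaning]
Phonological level --> Lexical level --> Semantic level
verre (glass): material, container
vers (towards, about, verse): direction, approximately, measure
vert (green): colour
vair (squirrel fur): fur
ver (worm): animal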
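Input: synthesis
[Figure: one meaning leads to several lexical candidates, each with its own sound form]
Semantic level --> Lexical level --> Phonological level
Semantic input: boisson + récipient (drink + container) + ...
Lexical candidates:
tasse (cup): with handle
verre (glass): open
biberon (baby bottle): with teat
bouteille (bottle): fête (party)
...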
Different search spaces
[Figure: the synthesis search space (semantic level --> lexical level --> phonological level; boisson + récipient --> tasse, verre, biberon, bouteille) shown next to the analysis search space (phonological level --> lexical level --> semantic level; verre, vers, vert, vair, ver)]
Fundamental problems
Analysis : ambiguity
Generation : choice
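The contrast between ambiguity and choice can be sketched with a toy lexicon searched in both directions. The words are taken from the slides; the phonetic strings, the simplified feature sets and the names `LEXICON`, `analyse` and `generate` are assumptions made for illustration only.

```python
# Toy lexicon: (word, pronunciation, semantic features).
# Pronunciations and feature sets are simplified assumptions.
LEXICON = [
    ("verre", "vɛʁ", {"matière", "contenant"}),
    ("vers", "vɛʁ", {"direction"}),
    ("vert", "vɛʁ", {"couleur"}),
    ("vair", "vɛʁ", {"fourrure"}),
    ("ver", "vɛʁ", {"animal"}),
    ("tasse", "tas", {"contenant", "anse"}),
    ("bouteille", "butɛj", {"contenant"}),
]


def analyse(sound):
    """Analysis: from a sound form to every word that matches it (ambiguity)."""
    return [word for word, phon, _ in LEXICON if phon == sound]


def generate(features):
    """Generation: from a meaning to every word that can express it (choice)."""
    return [word for word, _, feats in LEXICON if features <= feats]


print(analyse("vɛʁ"))           # five homophones compete
print(generate({"contenant"}))  # three containers compete
```

In both directions the search returns a set rather than a single answer; what differs is the level at which the search starts.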
Why bother about generation ? (1)
Different kinds of motivation
Theoretical
Practical
Industrial
Theoretical reasons: building and testing a theory
Testbed for a linguistic theory:
• coverage (over/undergeneration), correctness
Testbed for a psychological model:
• simulation of cognitive processes (on-line processing, language learning)
Practical reasons (industrial, full automation)
machine translation
text generation (business letters)
generation of summaries (stock market reports, weather forecasts, etc.)
help systems (audit trail, access to DB)
abstracting
Practical reasons (help systems, semi-automation)
Computer-assisted language learning (tools)
Writer's workbench (pre/postediting: correction of grammar, style, spelling, text organization)
The decomposition of the task: NLG architectures
GOAL
A two-stage model
Division of labor
Four components
Procedural know-how
Planning (determine the order of the different steps: textual organisation)
Searching (find the words; access)
Reasoning and inferencing ("see" possible links between ideas)
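Basic Memory Processes
[Figure: a stimulus (a rose) passing through three memory stores]
Sensory memory: 1 second
STM (short-term memory): less than 30 seconds
LTM (long-term memory): up to a lifetime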
Number of choices (space + time constraints)
We have to make a great number of choices
under severe space and time constraints
space constraint (limitation of STM)
time constraint (speed):
• speech is fast: 3-5 words / second
• average number of decisions / word = 4
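Taken together, these two figures give a rough decision rate. The short calculation below only multiplies the numbers quoted on the slide; the per-decision time budget additionally assumes that decisions are taken strictly one after another, which is a simplification and not a claim of the talk.

```python
WORDS_PER_SECOND = (3, 5)   # speech rate range quoted on the slide
DECISIONS_PER_WORD = 4      # average quoted on the slide


def decisions_per_second(words_per_second, decisions_per_word=DECISIONS_PER_WORD):
    """Number of choices a speaker makes per second at a given speech rate."""
    return words_per_second * decisions_per_word


low, high = (decisions_per_second(rate) for rate in WORDS_PER_SECOND)
print(f"{low} to {high} decisions per second")

# Time budget per decision, ASSUMING purely sequential decisions.
print(f"{1000 / high:.0f} to {1000 / low:.0f} ms per decision")
```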
Diversity of choices
Conceptual choices
Linguistic choices
Pragmatic choices
The necessary information for synthesis is scattered all over
GIVE
Subject: LISTENER (pronoun)
Direct Object: BOOK
Indirect Object: SPEAKER (pronoun)
How to express the notion of the speaker?
SPEAKER
French    English
je        I
me        me
moi       me
nous      we / us
What do the different forms depend upon?
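One way to make the question concrete is a small selection function. It is a deliberately simplified sketch: it assumes that only number, grammatical function and stress matter, and the function name and parameters are hypothetical. Real French pronoun choice depends on more factors than these.

```python
def speaker_pronoun(number, function, stressed=False):
    """Pick a French first-person form (simplified, illustrative rules).

    number:   "singular" or "plural"
    function: "subject" or "object"
    stressed: True for the stressed (disjunctive) form
    """
    if number == "plural":
        return "nous"      # same form for subject and object
    if stressed:
        return "moi"       # stressed singular form
    return "je" if function == "subject" else "me"


print(speaker_pronoun("singular", "subject"))
print(speaker_pronoun("singular", "object"))
print(speaker_pronoun("singular", "object", stressed=True))
print(speaker_pronoun("plural", "object"))
```

The point of the sketch is that a single concept (SPEAKER) cannot be lexicalized until decisions made elsewhere, such as syntactic function and number, are known.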
Conceptual structure: GIVE (Subj.: LISTENER, DO: BOOK, IO: SPEAKER)
Reference sentence: Tu me donnes le livre. / You give me the book.
Varying one feature at a time:
Person: Tu lui donnes le livre. / You give him/her the book.
Number: Tu nous donnes le livre. / You give us the book.
Tense: Tu m'as donné le livre. / You have given me the book.
Speech act: Donne-moi ce livre ! / Give me this book!
Polarity: Donne-le moi ! / Give it to me! vs. Ne me le donne pas ! / Don't give me this book!
MESSAGE
AIDER (present); Agent: PAUL; Object: MARIE
PRAGMATIC CHOICE
Paul = topic
Marie = given
aider = new
SYNTACTIC FUNCTIONS & VOICE
aider = active voice
Paul = subject
Marie = direct object
PARTS OF SPEECH
aider = verb
Paul = noun
Marie = pronoun
LEXICALIZATION
AIDER = aider
PAUL = Paul
MARIE = Marie
MORPHOLOGY
verb: 3rd person, singular, present -> aide
subject: noun -> Paul
direct object: pronoun -> la
WORD ORDER
SUBJECT (noun), DIRECT OBJECT (pronoun), VERB (verb): Paul la aide
PHONO-GRAPHEMIC ADJUSTMENTS
Paul l'aide.
Input
HELP (present); Agent: PAUL; Object: MARY
PRAGMATIC CHOICE
Paul = topic
Marie = given
aider = new
SYNT. FUNCT. & VOICE
voice = active
Paul = subject
Mary = direct object
PART OF SPEECH
HELP = verb
Paul = noun
Mary = pronoun
LEXICALIZATION
HELP = aider
PAUL = Paul
MARY = Marie
MORPHOLOGY
Verb: 3rd person, singular, present -> aide
Subject: noun -> Paul
Direct object: pronoun -> la
WORD ORDER
SUBJECT (noun), DIR. OBJECT (pronoun), VERB (verb)
PHONOGRAPH. SYNTH.
Output: Paul l'aide. (Paul helps her)
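The stages of this example can be chained in a small program. It is a minimal sketch that handles only this one sentence pattern: the verb rule covers regular -er verbs in the present tense, the pronoun is hard-coded as feminine "la" (Marie), and all names (`MESSAGE`, `LEXICON`, `conjugate`, `elide`, `generate`) are hypothetical rather than part of any system described in the talk.

```python
# Conceptual input (labels follow the slide).
MESSAGE = {"predicate": "HELP", "tense": "present", "agent": "PAUL", "object": "MARY"}
LEXICON = {"HELP": "aider", "PAUL": "Paul", "MARY": "Marie"}
VOWELS = "aeiouyéèê"


def conjugate(infinitive):
    """Morphology (toy rule): 3rd person singular present of a regular -er verb."""
    return infinitive[:-2] + "e"          # aider -> aide


def elide(words):
    """Phono-graphemic adjustment: le/la + vowel-initial word -> l'..."""
    out = []
    for word in words:
        if out and out[-1] in ("le", "la") and word[0].lower() in VOWELS:
            out[-1] = "l'" + word
        else:
            out.append(word)
    return out


def generate(message, given=frozenset({"MARY"})):
    if message["tense"] != "present":
        raise ValueError("this sketch only handles the present tense")
    # Pragmatic choice + syntactic functions: agent is topic -> active voice,
    # agent becomes subject, object becomes direct object.
    subject = LEXICON[message["agent"]]                 # lexicalization
    verb = conjugate(LEXICON[message["predicate"]])     # morphology
    if message["object"] in given:
        # Given referent -> pronoun; a clitic pronoun precedes the verb.
        words = [subject, "la", verb]                   # feminine assumed (Marie)
    else:
        words = [subject, verb, LEXICON[message["object"]]]
    return " ".join(elide(words)) + "."


print(generate(MESSAGE))                      # Paul l'aide.
print(generate(MESSAGE, given=frozenset()))   # Paul aide Marie.
```

Note how the pragmatic status of MARY (given or not) changes the part of speech, the word order and finally the spelling: one early choice propagates through every later stage.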
Consequences for languages, architecture & processing
languages are and need to be flexible
information does not become available in a strict order: it
may vary on every occasion
EVENT-TIME-PLACE vs. PLACE-EVENT-TIME, etc.
Consequences (interaction and accommodation)
Data: accommodation of the different data structures (interaction
between words and syntax) in the different modules (conceptual,
lexical, syntactic)
Process: feedback to higher components
Example illustrating the consequences (i.e. functional dependencies) of the choices
Conceptual input
HELP (present perfect)
Agent: PERSON-1 (singular)
Patient: PERSON-2 (singular)
Let's consider the consequences of the following two choices
Topicalisation: the concept to start the sentence with
Lexical choice: synonyms
Topicalize Agent
Consequences:
Agent --> Subject
Voice --> active
Patient --> Direct Object
Consequences of topicalisation
Topicalization: Topic = PERSON-1 (singular)
Active voice
Subject: PERSON-1 (singular)
HELP (present perfect)
Object: PERSON-2 (singular)
Topicalize Patient
Consequences:
Agent --> PP
Voice --> passive
Patient --> grammatical Subject
Consequences of topicalisation
Topicalization: Topic = PERSON-2 (singular)
Passive voice
Subject: PERSON-2 (singular)
HELP (present perfect)
Prepositional phrase: PERSON-1 (singular)
Summary of the consequences of the
topicalization choice at the top level
           Strategy 1             Strategy 2
Topic      agent                  patient
Agent      grammatical subject    prepositional phrase
Patient    direct object          grammatical subject
Voice      active                 passive
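The summary table can be read as a function from the topic choice to the remaining decisions. The sketch below encodes exactly the two strategies of the table; the English realizations for HELP in the present perfect and the names `assign_functions` and `realize` are illustrative assumptions.

```python
def assign_functions(topic):
    """Map the topicalization choice onto voice and syntactic functions."""
    if topic == "agent":        # Strategy 1
        return {"voice": "active",
                "agent": "grammatical subject",
                "patient": "direct object"}
    if topic == "patient":      # Strategy 2
        return {"voice": "passive",
                "agent": "prepositional phrase",
                "patient": "grammatical subject"}
    raise ValueError(f"unknown topic: {topic!r}")


def realize(topic, agent="PERSON-1", patient="PERSON-2"):
    """Toy English realization of HELP (present perfect) for both strategies."""
    if assign_functions(topic)["voice"] == "active":
        return f"{agent} has helped {patient}"
    return f"{patient} has been helped by {agent}"


print(realize("agent"))     # PERSON-1 has helped PERSON-2
print(realize("patient"))   # PERSON-2 has been helped by PERSON-1
```

A single choice made at the top (which concept comes first) fixes the voice and the syntactic function of both participants.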
Assumptions - Conclusion
No superexpert but a set of cooperative agents
competition - accommodation
no algorithmic processing but opportunistic planning
various orders of processing
various components need the same information
system is heterarchical rather than hierarchical