Action-Selection Biased by Pleasure-Regulated Simulation

Computational Aspects of
Emotion in Adaptive Behavior
Joost Broekens, Walter Kosters, Fons Verbeek
LIACS, Leiden University, The Netherlands.
Overview
• Emotion & Information Processing.
• Adaptive agents:
– reactive,
– cognitive,
– emotion-modulated cognitive agents.
• Experiment: Pleasure regulates
information processing.
• Future work.
Emotion: communication medium,
decision heuristic and modulator.
• Common emotions: fear, anger, happiness, sadness, surprise,
disgust.
• Short episode triggered by an (internal/external) event, composed of:
– subjective feelings,
– inclinations to act (action preparation, action tendency (Frijda)),
– facial expressions,
– cognitive evaluation, and
– physiological arousal (heartbeat, alertness).
• Emotion: communication medium.
– Communicate internal state (Biological & Sociological evidence: Darwin,
Ekman).
• Emotion: decision-heuristic relating events to goals, needs, desires,
beliefs of an agent.
– Result of evaluation of personal relevance, helps decision-making
(Neurological & cognitive evidence: Damasio, appraisal theory).
• Emotion: influences information processing.
– Neurocomputational & cognitive evidence: Doya; Frijda, Manstead
and Bem.
Emotion & Information Processing
• BiologyEmotion; internal drives, homeostasis, hardwired reactions
• CognitionEmotion; cognitive emotion elicitation:
– Emotions result from the interpretation of our world in relation to our
goals, needs, desires, beliefs, etc. (Appraisal Theory, Frijda, Lazarus,
Arnolds, etc.).
• Emotionbehavior; emotion influences adaptive behavior:
–
–
–
–
emotion as drive,
emotion as source of information,
emotion as modulator of cognitive processes.
Relates to different types of (views on aspects of) adaptive agents:
• reactive,
• cognitive,
• emotion-modulated cognitive agents.
Emotions and reactive agents
• Reactive agents:
– have predefined behaviors,
– learn new behavior based on instrumental conditioning, and
– select behaviors based on this learned model and based on
internal drives (motivations).
• Emotion influences behavior:
– can be such an internal drive, and
– can trigger typical behaviors (fight / flight).
• Computational models that study emotion within this
context (drive/motivation) (Avila-García and Cañamero,
2004; Cañamero, 1997; Velásquez, 1998).
Emotion and cognitive agents
• Cognitive agents are reactive agents plus:
– internally represented knowledge used in
planning and reasoning,
– an attention mechanism guiding perception and action,
– etc.
• Emotion influences behavior:
– is a source of (explicit) information used in reasoning
(knowledge), and
– can (implicitly) modulate information processing
(systemic influence).
• Computational models in which emotion is used
as information (e.g. Botelho and Coelho).
Thinking: Internal Simulation of
Behavior
• Internal simulation of behavior
– Covertly execute and evaluate potential interaction using
sensory-motor substrates (Hesslow, 2002; Damasio; Cotterill,
2001), but see also
– “interaction potentialities” (Bickhard), and
– “state anticipation” (Butz, Sigaud, Gérard, 2003).
– Existing mechanisms are the basis for simulation
– Evolutionary continuity!
• Our basis for information processing
Emotion modulates information
processing
• Emotion influences thinking and behavior at multiple levels of cognitive
complexity (Frijda, Manstead and Bem, 2000; Damasio, 1994; Davidson,
2000; Berridge, 2003; Rolls, 2000).
• Emotion is integrated at multiple levels of processing, and higher levels of
processing (conscious, reflective reasoning) have not always existed →
an evolutionary advantage to integrating emotion at lower levels can be
expected: levels close to reward systems and behavioral control.
– If thinking is internal simulation of behavior, these low-level integration
mechanisms should also teach us about the influence of emotion on higher-level
cognitive mechanisms, e.g., on attention.
• In this research we focus on the low-level influence of emotion on
information processing in simulated adaptive agents.
• We use emotion as a metalearning parameter (Doya, 2000).
• Emotion: pleasure and arousal (Russell, 2003).
Experiment: Can pleasure regulate
information processing such that this
provides an adaptive advantage for the
agent?
Pleasure regulates information
processing
[Architecture diagram: a stimulus from the ENVIRONMENT enters Perception; the resulting percept feeds the RL model, which drives Action-selection and sends an action back to the environment (reactive behavior). Reinforcement feeds an Emotion process that outputs pleasure; pleasure controls Interaction-selection over the model's predicted interactions, and the resulting simulated interaction and simulated reinforcement are fed back into the RL model (cognitive influence).]
Learning
• The agent learns to interact with the
environment through Reinforcement Learning
(instrumental conditioning).
– Agent’s actions are rewarded or punished.
– Learns value-state predictions of potential next
states.
– Uses these predictions to determine which
action to take next.
– Basics of the model are based on (Sutton and
Barto, 1998).
• Learns through continuous interaction.
• Learns based on perception-action pairs.
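As a rough illustration (not the authors' exact model), a tabular temporal-difference update in the style of Sutton and Barto (1998) over perception-action pairs might look like this; all names are illustrative placeholders:

```python
# Minimal sketch of the value learning described above, assuming a tabular
# Q-learning-style TD update (Sutton and Barto, 1998); not the authors'
# exact model. All names are illustrative placeholders.
from collections import defaultdict

Q = defaultdict(float)   # value predictions for (percept, action) pairs
alpha, gamma = 0.1, 0.9  # learning rate, discount factor

def td_update(s, a, r, s_next, actions):
    """Move Q(s, a) toward reward r plus the best discounted predicted value."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```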
Learning: reinforcement example
Reward: propagated back to the beginning of the interaction sequence, using a
mechanism that solves the temporal credit assignment problem (i.e., find the
actions responsible for the reward).
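One standard mechanism for this kind of credit assignment is an eligibility trace (TD(λ)). The sketch below extends the one above (reusing Q, alpha, gamma) and is an assumption, not necessarily the propagation mechanism of the authors' model:

```python
# Eligibility traces (TD(lambda)) as one standard way to propagate reward
# back along the path; extends the sketch above (Q, alpha, gamma). The
# authors' exact propagation mechanism may differ.
from collections import defaultdict

lam = 0.8               # trace decay: how far back credit flows
E = defaultdict(float)  # eligibility of each (percept, action) pair

def td_lambda_update(s, a, r, s_next, actions):
    delta = r + gamma * max(Q[(s_next, a2)] for a2 in actions) - Q[(s, a)]
    E[(s, a)] += 1.0                        # mark the pair just visited
    for key in list(E):
        Q[key] += alpha * delta * E[key]    # earlier pairs share the credit
        E[key] *= gamma * lam               # credit fades with temporal distance
```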
Action-Selection
[Same architecture diagram as before, with the RL model labeled "Distributed-state RL model".]
Action-Selection
• Value-state predictions are transformed
into action-values.
• Action-selection is based on these action
values.
– Choose an action from the set of action-value
pairs stochastically (e.g., using a Boltzmann
distribution), as sketched below.
• Action-selection is responsible for the
exploration vs. exploitation trade-off.
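A minimal sketch of Boltzmann (softmax) action-selection over action values; the function name and the example values are illustrative:

```python
# Sketch of Boltzmann (softmax) action-selection over action values.
# Higher temperature -> more exploration; lower -> more exploitation.
import math
import random

def boltzmann_select(action_values, temperature=1.0):
    """action_values: dict mapping action -> value; returns one sampled action."""
    weights = {a: math.exp(v / temperature) for a, v in action_values.items()}
    pick = random.uniform(0.0, sum(weights.values()))
    for action, w in weights.items():
        pick -= w
        if pick <= 0.0:
            return action
    return action  # guard against floating-point rounding

# Example: "up" is most likely, but worse actions keep some probability.
boltzmann_select({"up": 0.2, "down": -0.5, "right": -1.0, "left": -1.0})
```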
Our agent’s cognitive part (based on
internal simulation of behavior)
[Same architecture diagram as before, with the RL model labeled "Distributed-state RL model".]
Simulation: action-selection bias
At every step, before action-selection: select a subset of predicted
interactions from the reinforcement-learning model → feed these back into the
RL model.
1. Interaction-selection: select a subset of predicted interactions.
2. Simulate-and-bias-predicted-benefit: feed back to model as if a real
interaction.
3. Action-selection: select the next action using the action-selection
mechanism explained earlier, based on the now-biased action values.
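Put together, the three steps might look like the following sketch. It reuses the earlier Q, td_update, and boltzmann_select sketches; model_predict is a hypothetical helper, and this is not the authors' exact algorithm:

```python
# Schematic version of the three-step loop above, reusing the earlier
# sketches (Q, td_update, boltzmann_select). model_predict is a hypothetical
# helper returning the model's predicted next state and reward.
def simulate_and_act(s, actions, threshold):
    # 1. Interaction-selection: rank predicted interactions by value; a high
    #    threshold keeps only the few best, a low threshold keeps many.
    ranked = sorted(actions, key=lambda a: Q[(s, a)], reverse=True)
    n_sim = max(1, round(len(ranked) * (1.0 - threshold)))
    subset = ranked[:n_sim]

    # 2. Simulate-and-bias: feed each covert interaction back into the model
    #    as if it really happened (here: one TD backup per simulated action).
    for a in subset:
        s_sim, r_sim = model_predict(s, a)
        td_update(s, a, r_sim, s_sim, actions)

    # 3. Action-selection over the now-biased action values.
    return boltzmann_select({a: Q[(s, a)] for a in actions})
```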
[Same architecture diagram as before, with the RL model labeled "Hierarchical-state RL model".]
Simulation: example
• Action list before simulation (!hypothetical example!):
– {up=0.2, down=-0.5, right=-1, left=-1}
• Action-selection would most likely have selected "up":
– with a Boltzmann distribution, "up" gets the highest probability.
• Simulate all interactions (the example environment contains a roadblock
with r = −0.5).
– Propagate back the predicted values by simulating interaction with the
environment.
– The effect is a one-step "value look-ahead".
• Action list after simulation:
– {up=0.1, down=0.5, right=-1, left=-1}
• Action-selection selects “down”.
• In this example, simulating all predicted interactions helps.
But: Simulating Everything is not
Always Best
• Even apart from the fact that simulating everything costs mental effort.
• Earlier experiments (Broekens, 2005) showed that
– simulation has a benefit, especially when many interactions are simulated.
This is not surprising (better heuristic). However,
– in some cases less simulation resulted in better learning.
→ There is a dynamic relation between the environment and the simulation
"strategy" (i.e., the simulation threshold: the percentage of all predicted
interactions to be simulated).
→ Emotion as metalearning to adapt the amount of internal simulation?
(Doya, 2002)
– Pleasure is an indication of the current performance of the agent (Clore
and Gasper, 2000). Also,
– high pleasure → top-down thinking, and
low pleasure → bottom-up thinking (Fiedler and Bless, 2000).
Pleasure Modulates Simulation
[Same architecture diagram as before, with the RL model labeled "Distributed-state RL model".]
Pleasure Modulates Simulation
• Many theories of emotion.
• We use core-affect (or activation-valence) theory of
emotion as basis.
– Two fundamental factors, pleasure and arousal (Russell, 2003).
– Pleasure relates to emotional valence, and
– arousal relates to action-readiness, or activity.
• In this study we model pleasure as simulation threshold.
– We use pleasure to dynamically adapt the number of interactions
that are simulated; it thus serves as a dynamic simulation
threshold.
– We study the indirect effect of emotion as a metalearning
parameter affecting information processing, which in turn
influences action-selection.
Pleasure Modulates Simulation
• Pleasure quantification: indication of current performance relative to
what the agent is used to.
– We capture this by the normalized difference between the short-term
average reinforcement signal and the long-term average reinforcement
signal:
• Continuous pleasure feedback (the only formula in the presentation!):

e_p = (r_star − (r_ltar − f_ltar)) / (2 · f_ltar)

where r_star is the short-term average reinforcement, r_ltar the long-term
average reinforcement, and f_ltar a normalization factor around r_ltar
(so e_p = 0.5 when the short-term average equals the long-term average).
– High pleasure, going well? Continue the strategy: goal-directed thinking.
• High e_p → high threshold: simulate only the predicted best interactions.
– Low pleasure? Look broader: pay more attention to all predicted
interactions.
• Low e_p → low threshold: simulate many interactions.
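A direct transcription of this feedback signal as code; the window sizes, the clipping to [0, 1], and the value of f_ltar are assumptions here:

```python
# Sketch of the pleasure signal e_p and its use as the simulation threshold.
# Window sizes, clipping, and f_ltar are illustrative assumptions.
from collections import deque

short_win = deque(maxlen=10)    # recent rewards -> r_star
long_win = deque(maxlen=100)    # longer history -> r_ltar
f_ltar = 1.0                    # normalization factor around r_ltar

def pleasure(r):
    """Update running averages with reward r; return e_p in [0, 1]."""
    short_win.append(r)
    long_win.append(r)
    r_star = sum(short_win) / len(short_win)  # short-term average reinforcement
    r_ltar = sum(long_win) / len(long_win)    # long-term average reinforcement
    e_p = (r_star - (r_ltar - f_ltar)) / (2 * f_ltar)
    return min(1.0, max(0.0, e_p))

# e_p then serves as the threshold in simulate_and_act: high pleasure ->
# simulate only the best predicted interactions, low pleasure -> many.
```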
[Same architecture diagram as before; the pleasure signal is now labeled "pleasure, e_p".]
Experimental setup
• To measure the adaptive effect of pleasure-modulated simulation: force the
agent to adapt to a new task.
– First the agent has 128 trials to learn task 1, then
– the environment is switched to a new task: 128 trials to learn task 2.
– Repeat for many different parameter settings (e.g., the windows of the
long- and short-term average reinforcement signals, the learning rate,
etc.).
• Pleasure predictions:
– Pleasure increases to a value near 1 (agent gets better at the task),
– then slowly converges down to 0.5 (agent gets used to the task).
– At the switch: pleasure drops (new task, drop in performance),
– then increases to a value near 1, and converges down to 0.5 (agent gets
used to the new task).
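The task-switch protocol, as a sketch; the agent and task interfaces are hypothetical:

```python
# Sketch of the task-switch protocol (hypothetical agent/task interfaces).
def run_experiment(agent, task1, task2, trials=128):
    pleasure_curve = []
    for task in (task1, task2):         # 128 trials on task 1, then switch
        for _ in range(trials):
            agent.run_trial(task)       # one trial of interaction + learning
            pleasure_curve.append(agent.e_p)  # record pleasure per trial
    return pleasure_curve  # predicted shape: rise toward 1, settle at 0.5,
                           # drop at the switch, rise again, settle at 0.5
```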
Results
• The performance of pleasure-modulated simulation is comparable with
simulating ALL / the best 50% of predicted interactions (static simulation
thresholds), but uses only 30% / 70% of the mental resources, respectively.
Results
• Some settings even show significantly better performance at lower
mental cost.
• The predicted pleasure curve was confirmed.
Some conclusions
• Can pleasure regulate information processing such that this
provides an adaptive advantage for the agent?
– Yes.
• Simple pleasure feedback can be used to determine how broadly an
agent should internally simulate potential behavior.
– The agent's performance is comparable while mental effort decreases.
– Since we introduce few new mechanisms for simulation →
the results are relevant to the understanding of the evolutionary plausibility
of the simulation hypothesis, as increased individual adaptation at lower
cost is an evolutionarily advantageous feature.
• Our results provide clues about a relation between the simulation
hypothesis and emotion theory.
Future work.
• Use emotion to modulate:
– action-selection distribution (Doya, 2002), and
– interaction-selection distribution (e.g., the temperature of the Boltzmann
distribution, the threshold of our action-selection mechanism).
• Interplay between covert interaction (simulation) and overt
interaction (action-selection).
– Simulate the best interaction, but choose an action stochastically; see
also (Gadanho, 2003):
→ gives extra "drive" to certain actions.
– The inverse? Seems rational too:
→ simulate bad actions for "mental (covert) exploration", choose the best
actions for "overt exploitation".
→ Early experiments do not (yet) show a clear benefit.
• Use the arousal factor as feedback.
• Could arousal modify the amount of energy available for information
processing, and thereby provide a bound on the amount of simulation?
• Arousal resulting from low-level evaluation of familiarity and suddenness
(e.g., Scherer).
Questions?