Motivated Learning for Machine Intelligence_ Nov


Motivated Learning based on Goal Creation
Janusz Starzyk
School of Electrical Engineering and Computer Science,
Ohio University, USA
www.ent.ohiou.edu/~starzyk
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 4 December 2009.
EE141
Outline
• Embodied Intelligence (EI)
• Embodiment of Mind
• How to Motivate a Machine
• Goal Creation Hierarchy
• GCS Experiment
• Motivated Learning
Design principles of intelligent systems
from Rolf Pfeifer, “Understanding Intelligence”, 1999
• Interaction with a complex environment
• Cheap design
• Ecological balance
• Redundancy principle
• Parallel, loosely coupled processes
• Asynchronous
• Sensory-motor coordination
• Value principle
[Figure: an agent surrounded by the design principles listed above. Drawing by Ciarán O’Leary, Dublin Institute of Technology]
Embodied Intelligence
• Definition: Embodied Intelligence (EI) is a mechanism that learns how to survive in a hostile environment
– Mechanism: biological, mechanical, or virtual agent with embodied sensors and actuators
– EI acts on the environment and perceives its actions
– Environment hostility is persistent and stimulates EI to act
– Hostility: direct aggression, pain, scarce resources, etc.
– EI learns, so it must have associative self-organizing memory
– Knowledge is acquired by EI
Embodiment of a Mind
• Embodiment is a part of the environment under control of the mind
• It contains the intelligence core and sensory-motor interfaces to interact with the environment
• It is necessary for the development of intelligence
• It is not necessarily constant
[Figure: the embodiment, containing the intelligence core, is linked to the environment through a sensors channel and an actuators channel]
Embodiment of Mind
• Changes in embodiment modify the brain’s self-determination
• The brain learns its own body’s dynamics
• Self-awareness is a result of identification with one’s own embodiment
• Embodiment can be extended by using tools and machines
• Successful operation is a function of correct perception of the environment and of one’s own embodiment
How to Motivate a Machine?
A fundamental question is what motivates an agent to do anything and, in particular, to enhance its own complexity.
What drives an agent to explore the environment and learn ways to interact with it effectively?
How to Motivate a Machine?
• Pfeifer claims that an agent’s motivation should emerge from the developmental process.
– He called this the “motivated complexity” principle.
– Chicken-and-egg problem? An agent must have a motivation to develop, while its motivation comes from development?
• Steels suggested equipping an agent with self-motivation.
– “Flow”, experienced when people perform their expert activity well, would motivate them to accomplish even more complex tasks.
– But what is the mechanism of “flow”?
• Oudeyer proposed an intrinsic motivation system.
– Motivation comes from a desire to minimize the prediction error.
– Similar to “artificial curiosity” presented by Schmidhuber.
How to Motivate a Machine?
• Although artificial curiosity helps to explore the environment, it leads to learning without a specific purpose.
– It may be compared to exploration in reinforcement learning.
• Exploration is needed in order to learn and to model the environment.
– But is exploration the only motivation we need to develop EI?
– Can we find a more efficient mechanism for learning?
• I suggest a simpler mechanism to motivate a machine.
How to Motivate a Machine?
• I suggest that it is the hostility of the environment, as stated in the definition of EI, that is the most effective motivational factor.
– It is the pain we receive that moves us.
– It is our intelligence, determined to reduce this pain, that motivates us to act, learn, and develop.
• Both are needed: hostility of the environment and intelligence that learns how to reduce the pain.
– Thus pain is good.
– Without pain we would not be motivated to develop.
Fig. englishteachermexico.wordpress.com/
Motivated Learning
• I suggest a goal-driven mechanism to motivate a machine to act, learn, and develop.
• A simple pain-based goal creation system.
• It uses externally defined pain signals that are associated with primitive pains.
• The machine is rewarded for minimizing the primitive pain signals.
• Definition: Motivated learning (ML) is learning based on the self-organizing system of goal creation in an embodied agent.
– The machine creates abstract goals based on the primitive pain signals.
– It receives internal rewards for satisfying its goals (both primitive and abstract).
– ML applies to EI working in a hostile environment.
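The reward rule in the Motivated Learning definition can be sketched in a few lines. This is a minimal illustration, not the author's implementation. It assumes that every pain is a number, that the reward for a step is the total drop in pain, and that hunger is the only primitive pain. The names `PRIMITIVE_PAINS`, `pain_reduction`, and `step_rewards` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Assumption for illustration: hunger is the only externally defined (primitive) pain.
PRIMITIVE_PAINS = {"hunger"}


def pain_reduction(before: Dict[str, float], after: Dict[str, float],
                   names: Iterable[str]) -> float:
    """Total decrease in the named pains; negative if those pains grew."""
    return sum(before[name] - after[name] for name in names)


def step_rewards(before: Dict[str, float],
                 after: Dict[str, float]) -> Tuple[float, float]:
    """Return (primitive reward, internal reward) for one step.

    The primitive reward counts only primitive pains; the internal reward
    counts every pain the agent tracks, primitive and abstract.
    """
    primitive_names = [name for name in PRIMITIVE_PAINS if name in before]
    primitive = pain_reduction(before, after, primitive_names)
    internal = pain_reduction(before, after, before.keys())
    return primitive, internal


# Eating lowers hunger but uses up food, so the abstract pain rises slightly.
before = {"hunger": 0.75, "lack of food": 0.5}
after = {"hunger": 0.25, "lack of food": 0.625}
primitive_reward, internal_reward = step_rewards(before, after)
```

In this example the primitive reward is larger than the internal reward, because the internal reward also accounts for the growth of the abstract pain.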
Pain-center and Goal Creation
• Simple mechanism
• Creates a hierarchy of values
• Leads to formulation of complex goals
• Reinforcement
– Pain increase
– Pain decrease
• Forces exploration
[Figure: pain detection/goal creation center. Labels: sensor activation, missing objects, stimulation, pain detection, dual pain memory, pain increase, pain decrease, need, expectation, inhibition, motor. Legend: pain detection/goal creation center, reinforcement neuro-transmitter, sensory neuron, motor neuron]
Abstract Goal Creation for ML
• The goal is to reduce the primitive pain level
• Abstract goals are created if they satisfy the primitive goals
• “Food” becomes a sensory input to the abstract pain center
[Figure: goal hierarchy with a sensory pathway (perception, sense) and a motor pathway (action, reaction). Primitive level: stomach, pain, dual pain. Level I: food (sensory) and eat (motor). Level II: refrigerator (sensory) and open (motor), with abstract pain (delayed memory of pain). Legend: association, inhibition, reinforcement, connection, planning, expectation]
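The food and refrigerator example can be sketched as a small data structure. This is an illustration under stated assumptions, not the GCS implementation: it assumes an abstract pain is created the first time a known remedy is blocked because the object it needs is missing. The class `GoalCreator` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class GoalCreator:
    """Toy goal-creation memory (hypothetical, for illustration only)."""
    # pain name -> (object that must be sensed, action that relieves the pain)
    remedies: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # abstract pain name -> the pain whose remedy was blocked
    abstract_pains: Dict[str, str] = field(default_factory=dict)

    def learn_remedy(self, pain: str, sensed_object: str, action: str) -> None:
        """Remember that `action` on `sensed_object` relieves `pain`."""
        self.remedies[pain] = (sensed_object, action)

    def act_on(self, pain: str, available: Set[str]) -> Optional[str]:
        """Return the relieving action, or create an abstract pain if blocked."""
        sensed_object, action = self.remedies[pain]
        if sensed_object in available:
            return action
        # The needed object is missing: its absence becomes a new, abstract pain.
        self.abstract_pains.setdefault(f"lack of {sensed_object}", pain)
        return None


creator = GoalCreator()
creator.learn_remedy("hunger", "food", "eat")
first_try = creator.act_on("hunger", available={"refrigerator"})
creator.learn_remedy("lack of food", "refrigerator", "open")
second_try = creator.act_on("lack of food", available={"refrigerator"})
```

Here the first attempt fails because no food is sensed, which creates the abstract pain "lack of food"; the second attempt resolves that abstract pain with the Level II action.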
Goal Creation Experiment
PAIR #  SENSORY  MOTOR     INCREASES          DECREASES
1       Food     Eat       sugar level        food supplies
8       Grocery  Buy       food supplies      money at hand
15      Bank     Withdraw  money at hand      spending limits
22      Office   Work      spending limits    job opportunities
29      School   Study     job opportunities  -
Sensory-motor pairs and their effect on the environment
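The sensory-motor pairs can be encoded as data to show how one depleted resource points to the next goal in the chain. This is a sketch under assumptions that are not in the slides: each action moves one unit, and an action fails when the resource it decreases is below that unit. The names `PAIRS`, `apply_pair`, and `replenishing_pair` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (sensory, motor, resource increased, resource decreased), copied from the table.
PAIRS: Dict[int, Tuple[str, str, str, Optional[str]]] = {
    1: ("Food", "Eat", "sugar level", "food supplies"),
    8: ("Grocery", "Buy", "food supplies", "money at hand"),
    15: ("Bank", "Withdraw", "money at hand", "spending limits"),
    22: ("Office", "Work", "spending limits", "job opportunities"),
    29: ("School", "Study", "job opportunities", None),
}


def apply_pair(levels: Dict[str, float], pair_id: int, amount: float = 1.0) -> bool:
    """Apply one sensory-motor pair; return False if its input resource is short."""
    _, _, increased, decreased = PAIRS[pair_id]
    if decreased is not None:
        if levels[decreased] < amount:
            return False  # assumed failure rule: not enough of the consumed resource
        levels[decreased] -= amount
    levels[increased] += amount
    return True


def replenishing_pair(resource: str) -> Optional[int]:
    """Find the pair whose action increases the given resource."""
    for pair_id, (_, _, increased, _) in PAIRS.items():
        if increased == resource:
            return pair_id
    return None


levels = {"sugar level": 0.0, "food supplies": 0.0, "money at hand": 1.0,
          "spending limits": 0.0, "job opportunities": 0.0}
ate_first = apply_pair(levels, 1)                # no food supplies yet
next_pair = replenishing_pair("food supplies")   # the pair that restocks food
bought = apply_pair(levels, next_pair)
ate_second = apply_pair(levels, 1)
```

Eating fails while food supplies are empty, so the lookup identifies the Grocery/Buy pair as the next goal; after buying, eating succeeds.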
Goal Creation Experiment in ML
[Figure: pain signals in GCS simulation. Panels: Primitive Hunger, Lack of Food, Empty Grocery. Axes: pain vs. discrete time (0 to 600)]
Goal Creation Experiment in ML
[Figure: goal scatter plot, action scatters in 5 GCS simulations. Axes: goal ID (0 to 40) vs. discrete time (0 to 600)]
Goal Creation Experiment in ML
[Figure: the average pain signals in 100 GCS simulations. Panels: Primitive Hunger, Lack of Food, Empty Grocery, Lack of Money, Lack of Job Opportunities. Axes: pain vs. discrete time (0 to 600)]
Compare RL (TDF) and ML (GCS)
Mean primitive pain Pp value as a function of the number of iterations:
– green line for TDF
– blue line for GCS
Primitive pain ratio with pain threshold 0.1
Compare RL (TDF) and ML (GCS)
• Comparison of execution time on a log-log scale
– TD-Falcon: green
– GCS: blue
• Combined efficiency of GCS is 1000 times better than TDF
Problem solved
Conclusion: embodied intelligence, with motivated learning based on goal creation, is an effective learning and decision-making system for dynamic environments.
Reinforcement Learning
• Single value function
• Measurable rewards
– Can be optimized
• Predictable
• Objectives set by designer
• Maximizes the reward
– Potentially unstable
• Learning effort increases with complexity
• Always active
Motivated Learning
• Multiple value functions
– One for each goal
• Internal rewards
– Cannot be optimized
• Unpredictable
• Sets its own objectives
• Solves minimax problem
– Always stable
• Learns better in complex environment than RL
• Acts when needed
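Two entries in the Motivated Learning column, "Solves minimax problem" and "Acts when needed", can be illustrated together. The sketch below assumes that the agent attends to its largest pain, so that acting on it minimizes the maximum pain, and that it stays idle while every pain is below a threshold. The function name `select_goal` and the default threshold are assumptions for illustration, not the GCS algorithm.

```python
from typing import Dict, Optional


def select_goal(pains: Dict[str, float], threshold: float = 0.1) -> Optional[str]:
    """Pick the dominant pain to work on, or None if no pain exceeds the threshold."""
    if not pains:
        return None
    # Attend to the worst pain first: reducing it lowers the maximum pain.
    name, level = max(pains.items(), key=lambda item: item[1])
    return name if level > threshold else None


idle = select_goal({"hunger": 0.05, "lack of money": 0.02})
busy = select_goal({"hunger": 0.3, "lack of money": 0.7})
```

With all pains below the threshold the agent selects no goal, in contrast to a learner that is always active.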
Sounds like science fiction
If you’re trying to look far ahead, and what you see seems like science fiction, it might be wrong. But if it doesn’t seem like science fiction, it’s definitely wrong.
From presentation by Foresight Institute
Questions?
Resources – Evolution of Electronics
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
By Gordon E. Moore
Clock Speed (doubles every 2.7 years)
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
Doubling (or Halving) times
Dynamic RAM Memory “Half Pitch” Feature Size   5.4 years
Dynamic RAM Memory (bits per dollar)           1.5 years
Average Transistor Price                       1.6 years
Microprocessor Cost per Transistor Cycle       1.1 years
Total Bits Shipped                             1.1 years
Processor Performance in MIPS                  1.8 years
Transistors in Intel Microprocessors           2.0 years
Microprocessor Clock Speed                     2.7 years
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
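A doubling time converts to a growth factor over any span as 2 raised to the span divided by the doubling time. The short sketch below applies that formula to three of the figures listed above; the function name `growth_factor` and the ten-year span are choices made for illustration.

```python
# Doubling times in years, taken from the list above.
DOUBLING_YEARS = {
    "Dynamic RAM Memory (bits per dollar)": 1.5,
    "Transistors in Intel Microprocessors": 2.0,
    "Microprocessor Clock Speed": 2.7,
}


def growth_factor(doubling_years: float, span_years: float) -> float:
    """Multiplicative growth over span_years for a quantity with the given doubling time."""
    return 2.0 ** (span_years / doubling_years)


# Growth of each quantity over one decade.
decade = {name: growth_factor(years, 10.0) for name, years in DOUBLING_YEARS.items()}
```

For example, a 2.0-year doubling time gives five doublings in ten years, a factor of 32, while the 2.7-year doubling time for clock speed gives a factor of roughly 13.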
From Ray Kurzweil, The Singularity Summit at Stanford, May 13, 2006
From Hans Moravec, Robot, 1999
Software or hardware?
Software
• Sequential
• Error prone
• Requires programming
• Low cost
• Well-developed programming methods
Hardware
• Concurrent
• Robust
• Requires design
• Significant cost
• Hardware prototypes hard to build
Future software/hardware capabilities
[Figure: number of neurons (10^4 to 10^11) vs. year (2005 to 2040). Curves: analog VLSI, hardware approach (FPGA), software simulation (PC based). The level of human brain complexity is marked]
Why should we care?
Source: SEMATECH
Design Productivity Gap → Low-Value Designs?
Percent of die area that must be occupied by memory to maintain SOC design productivity
[Figure: % Area Memory, % Area Reused Logic, and % Area New Logic (0% to 100%) for the years 1999, 2002, 2005, 2008, 2011, and 2014]
Source = Japanese system-LSI industry
Self-Organizing Learning Arrays (SOLAR)
* Self-organization
* Sparse and local interconnections
* Dynamically reconfigurable
* Online data-driven learning
Integrated circuits connect transistors into a system
- millions of transistors easily assembled
- first 50 years of microelectronic revolution
Self-organizing arrays connect processors into a system
- millions of processors easily assembled
- next 50 years of microelectronic revolution