Lecture Note - Seoul National University: Biointelligence Lab
Cognitive Learning and the Multimodal Memory Game:
Toward Human-Level Machine Learning
2008 IEEE World Congress on Computational Intelligence (WCCI 2008)
Cognitive Architectures: Towards Human-Level Intelligence Session
June 5, 2008, Hong Kong
Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Cognitive Science, Brain Science, and Bioinformatics Programs
Seoul National University
Seoul 151-744, Korea
[email protected]
http://bi.snu.ac.kr/
Talk Outline
Human-level machine learning is a prerequisite to achieving
human-level machine intelligence.
Differences of behaviors in humans and machines
What principles are underlying the cognitive learning and memory
in humans?
A proposal for three principles
What tasks are challenging enough to study human-level machine
learning?
A proposal for the multimodal memory game (MMG)
Some illustrative results
Linguistic memory
Language-vision translation
Future directions
Toward human-level machine intelligence
© 2008, SNU Biointelligence Lab, http://bi.snu.ac.kr/
Cognitive Learning
Humans and Machines
Humans are
creative,
compliant,
attentive to change,
resourceful, and
multipurpose
Humans are
imprecise,
sloppy,
distractible,
emotional, and
illogical
To achieve human-level intelligence, these
properties should be taken into account.
Toward Human-Level Intelligence
Human intelligence develops
situated in a multimodal
environment [Gibbs, 2005].
The human mind makes use of
multiple representations and
problem-solving strategies [Fuster,
2003].
The brain consists of functional
modules which are localized in
subcortical areas but work together
on the whole-brain scale [Grillner et
al., 2006].
Humans can integrate multiple
tasks into a coherent solution [Jones,
2004].
Humans are versatile and come up
with many new ideas and solutions
to a given problem [Minsky, 2006].
Learning and Memory as a Substrate for
Intelligence
It is our memory that enables us to value everything else
we possess. Lacking memory, we would have no ability
to be concerned about our hearts, achievements, loved
ones, and incomes. Our brain has an amazing capacity to
integrate the combined effects of our past experiences
together with our present experiences in creating our
thought and actions. This is all possible by the memory
and the memories are formed by the learning process.
McGaugh, J. L. Memory & Emotion: The Making of Lasting Memories, 2003.
Principles of Learning: Early Ideas
Aristotle: Three Laws of Association [Crowder, 1976]
Similarity
Contrast
Contiguity
James Mill (1773-1836): Strength Criteria of
Association
Permanence
Certainty
Spontaneity
“Mental Compounding”
John Stuart Mill (1806-1873)
“Mental Chemistry”
Principles of Learning: Modern Concepts
Types of learning:
Accretion, tuning,
restructuring (e.g.,
Rumelhart & Norman,
1976)
Encoding specificity
principle (Tulving, 1970’s)
Cellular and molecular
basis of learning and
memory (Kandel et al.,
1990’s)
Conceptual blend and
chemical scramble (e.g.,
Feldman, 2006)
Methods of Machine Learning
Symbolic Learning: Version Space Learning, Case-Based Learning
Neural Learning: Multilayer Perceptrons, Self-Organizing Maps, Support Vector Machines
Evolutionary Learning: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, Genetic Programming
Probabilistic Learning: Bayesian Networks, Helmholtz Machines, Latent Variable Models, Generative Topographic Mapping
Other Machine Learning Methods: Decision Trees, Reinforcement Learning, Boosting Algorithms, Kernel Methods, Independent Component Analysis
Three Fundamental Principles of Cognitive
Learning: Our Proposal
Continuity. Learning is a continuous,
lifelong process. “The experiences of
each immediately past moment are
memories that merge with current
momentary experiences to create the
impression of seamless continuity in
our lives” [McGaugh, 2003]
Glocality. “Perception is dependent
on context” and it is important to
maintain both global and local, i.e.
glocal, representations [Peterson and
Rhodes, 2003]
Compositionality. “The brain
activates existing metaphorical
structures to form a conceptual blend,
consisting of all the metaphors linked
together” [Feldman, 2006]
Research Platform for Cognitive
Learning
Toward Human-Level Machine Learning:
Multimodal Memory Game (MMG)
But, I'm getting married tomorrow
Well, maybe I am...
I keep thinking about you.
And I'm wondering if we made a mistake giving up so fast.
Are you thinking about me?
But if you are, call me tonight.
[Figure: Multimodal Memory Game. Labels: Image, Sound, Text, Hint, Machine Learner, Image-to-Text Generator, Text-to-Image Generator]
Text Generation Game (from Image)
[Figure: text generation game. Labels: Image, Sound, Text, Hint, I2T, T2I, Game Manager, Learning by Viewing, T]
Image Generation Game (from Text)
[Figure: image generation game. Labels: Image, Sound, Text, Hint, I2T, T2I, Game Manager, Learning by Viewing, I]
Some Experimental Results
Three Experiments
Sentence Generation
Learn: a linguistic recall memory from a sentence corpus
Given: a partial or corrupt sentence
Generate: a complete sentence
Image-to-Text Translation
Learn: an image-text joint model from an image-text pair corpus
Given: an image (scene)
Generate: a text (dialogue of the scene)
Text-to-Image Translation
Learn: an image-text joint model from an image-text pair corpus
Given: a text (dialogue)
Generate: an image (scene of the dialogue)
Experiment 1: Learning Linguistic Memory
Dataset: scripts from dramas
Friends
House
24
Grey's Anatomy
Gilmore Girls
Sex and the City
Training data: 289,468 sentences
Test data: 700 sentences with blanks
Vocabulary size: 34,219 words
Sentence Completion Results
Query (? marks a blank) -> Completion
? gonna ? upstairs ? ? a shower -> I'm gonna go upstairs and take a shower
We ? ? a lot ? gifts -> We don't have a lot of gifts
? have ? visit the ? room -> I have to visit the ladies' room
? ? don't need your ? -> if I don't need your help
? still ? believe ? did this -> I still can't believe you did this
? ? a dream about ? In ? -> I had a dream about you in Copenhagen
? ? ? decision -> to make a decision
What ? ? ? here -> What are you doing here
? appreciate it if ? call her by ? ? -> I appreciate it if you call her by the way
? you ? first ? of medical school -> Are you go first day of medical school
Would you ? to meet ? ? Tuesday ? -> Would you nice to meet you in Tuesday and
I'm standing ? the ? ? ? cafeteria -> I'm standing in the one of the cafeteria
Why ? you ? come ? down ? -> Why are you go come on down here
? think ? I ? met ? somewhere before -> I think but I am met him somewhere before
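The completion behaviour shown on this slide can be illustrated with a small sketch. It is not the hypernetwork implementation used in the experiments: it assumes that a text hyperedge of order k can be approximated by a run of k consecutive words, and the names `build_memory` and `complete` are hypothetical. Each blank is filled with the word that receives the most support from stored word runs that agree with the known words around it.

```python
from collections import Counter

def build_memory(sentences, order=3):
    """Count every run of `order` consecutive words (a stand-in for text hyperedges)."""
    memory = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - order + 1):
            memory[tuple(words[i:i + order])] += 1
    return memory

def complete(query, memory, order=3, blank="?"):
    """Fill blanks left to right with the word best supported by matching runs."""
    words = query.split()
    for pos, word in enumerate(words):
        if word != blank:
            continue
        votes = Counter()
        # Visit every window of length `order` that covers position `pos`.
        first = max(0, pos - order + 1)
        last = min(pos, len(words) - order)
        for start in range(first, last + 1):
            window = words[start:start + order]
            for edge, count in memory.items():
                # A stored run matches if it agrees on every known word.
                if all(w == blank or w == e for w, e in zip(window, edge)):
                    votes[edge[pos - start]] += count
        if votes:
            words[pos] = votes.most_common(1)[0][0]
    return " ".join(words)

memory = build_memory(["what are you doing here", "who are you", "I have to go"])
print(complete("what ? you doing here", memory))
```

A blank stays unfilled when no stored run matches, which mirrors the drop in completions when too many words in a window are missing.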
Experiments 2 & 3: Crossmodal Translation
Dataset: scenes and corresponding scripts from two dramas
Friends
Prison Break
Training data: 2,808 scenes and scripts
Scene (image) size: 80 x 60 = 4,800 binary pixels
Vocabulary size: 2,579 words
The order (k) of hyperedge
Text: Order 2~4
Image: Order 10~340
The method of creating hyperedges from training data
Text: Sequential sampling from a randomly selected position
Image: Random sampling in 4,800 pixel positions
Number of samples from an image-text pair
From 150 to 300
Example script lines:
Where am I giving birth
I know it's been really hard for you
So when you guys get in there
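The sampling scheme on this slide (consecutive words from a random start for text, random pixel positions for the image) can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, an image hyperedge is stored as (position, value) pairs, and each sample pairs one image part with one text part, which the slide does not spell out.

```python
import random

def sample_text_hyperedge(words, order, rng):
    """Take `order` consecutive words starting at a random position."""
    order = min(order, len(words))  # guard for very short lines
    start = rng.randrange(len(words) - order + 1)
    return tuple(words[start:start + order])

def sample_image_hyperedge(pixels, order, rng):
    """Pick `order` distinct random pixel positions and keep (position, value) pairs."""
    positions = sorted(rng.sample(range(len(pixels)), order))
    return tuple((p, pixels[p]) for p in positions)

def sample_pair_hyperedges(pixels, words, text_order, image_order, n_samples, seed=0):
    """Draw `n_samples` joint hyperedges from one image-text pair (assumed pairing)."""
    rng = random.Random(seed)
    return [
        (sample_image_hyperedge(pixels, image_order, rng),
         sample_text_hyperedge(words, text_order, rng))
        for _ in range(n_samples)
    ]

pixels = [0, 1] * 2400                      # 80 x 60 = 4,800 binary pixels (toy pattern)
words = "Where am I giving birth".split()
edges = sample_pair_hyperedges(pixels, words, text_order=3, image_order=10, n_samples=150)
print(len(edges), edges[0][1])
```

The orders (3 for text, 10 for the image) and the sample count (150) are taken from the low end of the ranges listed on the slide.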
Image-to-Text Translation Results
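Columns: Query (scene image, not reproduced) | Matching & Completion | Answer
1. Matching & Completion: "I don't know", "don't know what", "know what happened"
   Answer: I don't know what happened
2. Matching & Completion: "There's a", "a kitty in", …, "in my guitar case"
   Answer: There's a kitty in my guitar case
3. Matching & Completion: "Maybe there's something", "there's something I", …, "I get pregnant"
   Answer: Maybe there's something I can do to make sure I get pregnant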
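The fragments listed under Matching & Completion overlap word by word, which suggests that the answer is assembled by chaining matched fragments. The slide does not state the procedure, so the following is only an assumed illustration with a hypothetical helper `assemble`: each fragment is merged onto the sentence at the longest overlap between the sentence tail and the fragment head.

```python
def assemble(fragments):
    """Chain word fragments whose heads overlap the tail of the sentence so far."""
    sentence = list(fragments[0])
    for frag in fragments[1:]:
        frag = list(frag)
        # Look for the longest suffix of `sentence` that is a prefix of `frag`.
        for k in range(min(len(sentence), len(frag)), 0, -1):
            if sentence[-k:] == frag[:k]:
                sentence.extend(frag[k:])
                break
        else:
            sentence.extend(frag)  # no overlap found: append the fragment whole
    return " ".join(sentence)

fragments = [
    ("I", "don't", "know"),
    ("don't", "know", "what"),
    ("know", "what", "happened"),
]
print(assemble(fragments))  # I don't know what happened
```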
Text-to-Image Translation Results
Columns: Query (text) | Matching & Completion | Answer (scene image, not reproduced)
Queries:
I don't know what happened
Take a look at this
There's a kitty in my guitar case
Maybe there's something I can do to make sure I get pregnant
Hypernetwork Architecture for
Cognitive Learning
[Figure: hypernetwork learned from 4 data items over binary variables x1 to x15 with label y, built up over Rounds 1 to 3]
Data items (variables not listed are 0):
1: x1=1, x4=1, x10=1, x12=1; y=1
2: x2=1, x3=1, x9=1, x14=1; y=0
3: x3=1, x6=1, x8=1, x13=1; y=1
4: x8=1, x11=1, x15=1; y=1
Hyperedges shown in the learning step:
1: (x1, x4, x10; y=1), (x1, x4, x12; y=1), (x4, x10, x12; y=1)
2: (x2, x3, x9; y=0), (x2, x3, x14; y=0), (x3, x9, x14; y=0)
3: (x3, x6, x8; y=1), (x3, x6, x13; y=1), (x6, x8, x13; y=1)
4: (x8, x11, x15; y=0)
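How a library of labelled order-3 hyperedges can be built from the four data items and then used for classification is sketched below. Several choices here are assumptions rather than the method on the slide: the sketch enumerates every order-3 combination of the active variables instead of sampling them, it takes the labels from the data rows, and it classifies by a simple vote over matching hyperedges. The names `build_library` and `classify` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Active variables (value 1) and label y for each data item, as listed in the data rows.
DATA = [
    ({1, 4, 10, 12}, 1),
    ({2, 3, 9, 14}, 0),
    ({3, 6, 8, 13}, 1),
    ({8, 11, 15}, 1),
]

def build_library(data, order=3):
    """Store every order-`order` subset of the active variables with the item's label."""
    library = Counter()
    for active, label in data:
        for combo in combinations(sorted(active), order):
            library[(combo, label)] += 1
    return library

def classify(active, library):
    """Vote over hyperedges whose variables are all active in the query."""
    votes = Counter()
    for (combo, label), count in library.items():
        if set(combo) <= active:
            votes[label] += count
    return votes.most_common(1)[0][0] if votes else None

library = build_library(DATA)
print(len(library), classify({2, 3, 9}, library))
```

A query that activates none of the stored hyperedges returns `None`, which makes the sparse coverage of a small library visible.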
Hypernetwork of DNA Molecules
[Zhang, DNA-2006]
Hypernetwork as Chemical Associative
Memory
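The hypernetwork is defined as
\[ H = (X, S, W) \]
\[ X = (x_1, x_2, \ldots, x_I) \]
\[ S = \bigcup_i S_i, \quad S_i \subseteq X, \quad k = |S_i| \]
\[ W = (W^{(2)}, W^{(3)}, \ldots, W^{(K)}) \]
Training set:
\[ D = \{\mathbf{x}^{(n)}\}_{n=1}^{N} \]
The energy of the hypernetwork
\[ E(\mathbf{x}^{(n)}; W) = -\frac{1}{2} \sum_{i_1, i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} - \frac{1}{6} \sum_{i_1, i_2, i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} - \cdots \]
The probability distribution
\[ P(\mathbf{x}^{(n)} | W) = \frac{1}{Z(W)} \exp\left[ -E(\mathbf{x}^{(n)}; W) \right] \]
\[ = \frac{1}{Z(W)} \exp\left[ \frac{1}{2} \sum_{i_1, i_2} w^{(2)}_{i_1 i_2} x^{(n)}_{i_1} x^{(n)}_{i_2} + \frac{1}{6} \sum_{i_1, i_2, i_3} w^{(3)}_{i_1 i_2 i_3} x^{(n)}_{i_1} x^{(n)}_{i_2} x^{(n)}_{i_3} + \cdots \right] \]
\[ = \frac{1}{Z(W)} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} \right], \]
where the partition function is
\[ Z(W) = \sum_{\mathbf{x}^{(m)}} \exp\left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(m)}_{i_1} x^{(m)}_{i_2} \cdots x^{(m)}_{i_k} \right] \]
[Zhang, DNA-2006]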
For more details: Zhang, B.-T., IEEE Computational Intelligence
Magazine, August 2008 (in press)
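For a handful of binary variables, the energy, partition function, and probability defined on this slide can be evaluated by brute-force enumeration. The sketch below is an illustration, not the authors' code. It assumes that each unordered hyperedge carries a single weight, so the 1/c(k) factor and the sum over ordered index tuples collapse into one term per hyperedge; variables are indexed from 0, and the function names are hypothetical.

```python
from itertools import product
from math import exp, prod

def energy(x, weights):
    """E(x; W) = -(sum over hyperedges of w_e times the product of x_i for i in e)."""
    return -sum(w * prod(x[i] for i in edge) for edge, w in weights.items())

def partition(weights, n_vars):
    """Z(W): sum of exp(-E) over all 2**n_vars binary states (small n_vars only)."""
    return sum(exp(-energy(x, weights)) for x in product((0, 1), repeat=n_vars))

def probability(x, weights):
    """P(x | W) = exp(-E(x; W)) / Z(W)."""
    return exp(-energy(x, weights)) / partition(weights, len(x))

# Toy hypernetwork on three variables: one order-2 and one order-3 hyperedge.
weights = {(0, 1): 1.0, (0, 1, 2): 0.5}
print(energy((1, 1, 1), weights), probability((1, 1, 1), weights))
```

The enumeration grows as 2 to the power of the number of variables, so this only serves to check the definitions on toy cases.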
Image to Text (Recall Rate)
[Figure: recall rate (0 to 1) versus image order (10 to 340), with curves for Perfect Recall and Tolerant Recall]
Text to Image (Recall Rate)
[Figure: recall rate (0 to 1) versus text order (2 to 6)]
Toward Human-Level Intelligence
From Mind to Molecules and Back
Mind: ∞ memory
Brain: 10^11 cells
Cell: >10^3 molecules
Molecule
Paradigms for Computational Intelligence
               | Symbolism              | Connectionism              | Dynamicism         | Hyperinteractionism
Metaphor       | symbol system          | neural system              | dynamical system   | biomolecular system
Mechanism      | logical                | electrical                 | mechanical         | chemical
Description    | syntactic              | functional                 | behavioral         | relational
Representation | localist               | distributed                | continuous         | collective
Organization   | structural             | connectionist              | differential       | combinatorial
Adaptation     | substitution           | tuning                     | rate change        | self-assembly
Processing     | sequential             | parallel                   | dynamical          | massively parallel
Structure      | procedure              | network                    | equation           | hypergraph
Mathematics    | logic, formal language | linear algebra, statistics | geometry, calculus | graph theory, probabilistic logic
Space/time     | formal                 | spatial                    | temporal           | spatiotemporal
[Zhang, IEEE Comp. Intel. Mag., August 2008]
Summary and Conclusion
We argue that understanding and implementing the principles of cognitive
learning and memory is a prerequisite to achieving human-level
intelligence.
Suggested three principles as the most fundamental to cognitive learning.
Continuity, glocality, compositionality
Proposed the multimodal memory game (MMG) as a research platform for
studying the architectures and algorithms for cognitive learning.
Presented the hypernetwork model as a cognitive architecture for learning
in an MMG environment.
Showed some experimental results to illustrate the usefulness of the
platform.
Linguistic recall memory or sentence completion
Language-vision crossmodal translation tasks
Future work can extend the experimental setups in various dimensions, such
as corpus size, kinds of modality, and learning strategies.
Derivation of the Learning Rule
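\[ \frac{\partial}{\partial w^{(s)}_{i_1 i_2 \ldots i_s}} \ln P(\{\mathbf{x}^{(n)}\}_{n=1}^{N} | W) \]
\[ = \frac{\partial}{\partial w^{(s)}_{i_1 i_2 \ldots i_s}} \sum_{n=1}^{N} \left\{ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} - \ln Z(W) \right\} \]
\[ = \sum_{n=1}^{N} \left\{ \frac{\partial}{\partial w^{(s)}_{i_1 i_2 \ldots i_s}} \left[ \sum_{k=2}^{K} \frac{1}{c(k)} \sum_{i_1, i_2, \ldots, i_k} w^{(k)}_{i_1 i_2 \ldots i_k} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_k} \right] - \frac{\partial}{\partial w^{(s)}_{i_1 i_2 \ldots i_s}} \ln Z(W) \right\} \]
\[ = \sum_{n=1}^{N} \left\{ x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_s} - \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x}|W)} \right\} \]
\[ = N \left\{ \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{\mathrm{Data}} - \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x}|W)} \right\} \]
where
\[ \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{\mathrm{Data}} = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}_{i_1} x^{(n)}_{i_2} \cdots x^{(n)}_{i_s} \]
\[ \langle x_{i_1} x_{i_2} \cdots x_{i_s} \rangle_{P(\mathbf{x}|W)} = \sum_{\mathbf{x}} x_{i_1} x_{i_2} \cdots x_{i_s} P(\mathbf{x} | W) \]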
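The result of this derivation, that the gradient for one weight is N times the difference between the data average and the model expectation of the corresponding product of variables, can be checked numerically on a toy network. The sketch below is illustrative only. It assumes one weight per unordered hyperedge (so the 1/c(k) factor is absorbed), computes the model expectation by enumerating all binary states, and uses hypothetical function names.

```python
from itertools import product
from math import exp, log, prod

def score(x, weights):
    """Sum over hyperedges of w_e times the product of x_i for i in e (equals -E)."""
    return sum(w * prod(x[i] for i in e) for e, w in weights.items())

def log_likelihood(data, weights, n_vars):
    """ln P(data | W) with Z(W) computed by enumeration."""
    log_z = log(sum(exp(score(x, weights)) for x in product((0, 1), repeat=n_vars)))
    return sum(score(x, weights) - log_z for x in data)

def gradient(data, weights, n_vars, edge):
    """N * (<product of x_i>_Data - <product of x_i>_P(x|W)) for one hyperedge."""
    data_avg = sum(prod(x[i] for i in edge) for x in data) / len(data)
    states = list(product((0, 1), repeat=n_vars))
    unnorm = [exp(score(x, weights)) for x in states]
    z = sum(unnorm)
    model_avg = sum(u / z * prod(x[i] for i in edge) for x, u in zip(states, unnorm))
    return len(data) * (data_avg - model_avg)

data = [(1, 1, 0), (1, 1, 1), (0, 1, 0)]
weights = {(0, 1): 0.3, (0, 1, 2): -0.2}

# Compare the analytic gradient with a finite-difference estimate.
eps = 1e-6
bumped = dict(weights)
bumped[(0, 1)] += eps
numeric = (log_likelihood(data, bumped, 3) - log_likelihood(data, weights, 3)) / eps
print(gradient(data, weights, 3, (0, 1)), numeric)
```

The two printed values should agree closely, which is a quick sanity check on the sign convention of the energy.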
Completion of One Missing Word (1/3)
[Figure: completion of sentences with one missing word. Vertical axis 0 to 800; horizontal axis is the number of training sentences (40K to 290K); series for Order 2, Order 3, and Order 4]
The number of completions increases as the number of training
sentences grows.
Completion of Two Missing Words (2/3)
[Figure: completion of sentences with two missing words. Vertical axis 0 to 800; horizontal axis is the number of training sentences (40K to 290K); series for Order 2, Order 3, and Order 4]
The number of completions increases until the number of missing
words equals the order minus one (e.g., Orders 3 and 4).
Completion of Three Missing Words (3/3)
[Figure: completion of sentences with three missing words. Vertical axis 0 to 800; horizontal axis is the number of training sentences (40K to 290K); series for Order 2, Order 3, and Order 4]
The number of completions decreases rapidly if the number of
missing words becomes larger than the order (e.g., Orders 2 and 3).
Multimodal Memory Game as a Platform
for Cognitive Machine Learning
[Figure: multimodal memory game platform. Labels: Image, Sound, Text, I2T, T2I, Learning by Viewing]
Illustrative Results
Query | Completion | Classification
Original sentence: who are you
? are you | what are you | Friends
who ? you | who are you | Friends
who are ? | who are you | Friends
Original sentence: you need to wear it
? need to wear it | I need to wear it | 24
you ? to wear it | you want to wear it | 24
you need ? wear it | you need to wear it | 24
you need to ? it | you need to do it | House
you need to wear ? | you need to wear a | 24
Image to Text Translation
[Figure labels: Image, Text, Learning by Viewing, User, Text Corpus]
Script lines learned by viewing:
- Where am I giving birth
- You guys really don't know anything
- So when you guys get in there
- I know it's been really hard for you
- …
Question: Where ? I giving ?
Answer: Where am I giving birth
Text to Image Translation
[Figure labels: Image, Text, Learning by Viewing, User, Image Corpus]
Script lines learned by viewing:
- Where am I giving birth
- You guys really don't know anything
- So when you guys get in there
- I know it's been really hard for you
- …
Question: You've been there
Answer: (scene image, not reproduced)