23 MNS Model 2 - University of Southern California


Michael Arbib: CS564 - Brain Theory and Artificial Intelligence
University of Southern California, Fall 2001
Lecture 23: MNS Model 2
Michael Arbib
and
Erhan Oztop
The Mirror Neuron System for Grasping:
Visual Processing for the MNS model
The Virtual Arm
The Core Mirror Neuron Circuit
Michael Arbib CS564 - Brain Theory and Artificial Intelligence, USC, Fall 2001. Lecture 23: MNS Model 2
Visual Input Processing
[Figure: schematic of the MNS model. Labeled modules: Object recognition (IT); Object features (cIPS); Object affordance extraction (FARS-AIP; AIP); Hand shape recognition & hand motion detection (STS); Hand-object spatial relation analysis (7a); Object affordance-hand state association (7b); Action recognition (mirror neurons; F5mirror); Motor program (FARS-F5; F5canonical); Motor program (FARS-F4; F4); Motor execution (M1); Object location (FARS-VIP; MIP/LIP/VIP). Labeled links: Integrate temporal association; Mirror feedback.]
Reminder: Hand State components
For most components we need to know the (3D) configuration of the hand.
What does it take for (monkey) MNS to work?
Visual Input:
Recognition of hand (h) and object (o) and their temporal and spatial relation (r).
The set of all valid h × o × r combinations that make up a grasp is very large (actually infinite). The system cannot memorize all such combinations and raise the mirror response flag whenever a match occurs.
The visual information must therefore be mapped to a lower-dimensional space that is simpler to handle.
This space has to capture grasp type discrimination and the temporal and spatial relation of the hand and object:
• The aperture(s) between the fingers and the position of the thumb with respect to the palm are key to defining the hand configuration relevant for grasp recognition.
• The disparity between the aperture axes and the object grasp axis, and the distance of the hand to the object, are key to defining the relation of the hand to the object.
Hand State
We capture information about the hand and its relation to the target object in the Hand State, a 7-dimensional trajectory H(t) with the following components:
H(t) = (d(t), v(t), a(t), o1(t), o2(t), o3(t), o4(t)) where
d(t): distance to target at time t
v(t): tangential velocity of the wrist
a(t): aperture of the virtual fingers involved in grasping at time t
o1(t): angle between the object axis and the (index finger tip – thumb tip) vector
o2(t): angle between the object axis and the (index finger knuckle – thumb tip) vector
o3(t), o4(t): the two angles defining how close the thumb is to the hand, as measured relative to the side of the hand and to the inner surface of the palm.
Note that the whole history of H(t) during a grasp is required to represent the grasp.
Key task: To determine whether the motion and preshape of a moving hand may be extrapolated to culminate in a grasp appropriate to one of the affordances of the observed object.
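A minimal sketch of how one sample of the first five hand-state components could be computed from 3D marker positions. It assumes that d(t) is measured from the wrist, that v(t) is approximated by a finite difference of wrist positions, and that the aperture is the index tip to thumb tip distance. All function and parameter names are hypothetical, and o3, o4 are omitted because they need a palm reference frame that the slide does not specify.

```python
import math

def sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return tuple(a - b for a, b in zip(p, q))

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

def angle(u, v):
    """Angle in radians between two vectors (clamped for numerical safety)."""
    cos_uv = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_uv)))

def hand_state_sample(wrist, prev_wrist, dt, index_tip, index_knuckle,
                      thumb_tip, target, object_axis):
    """Return (d, v, a, o1, o2) for one time step (assumed definitions)."""
    d = norm(sub(target, wrist))                  # distance to target
    v = norm(sub(wrist, prev_wrist)) / dt         # wrist speed (finite difference)
    a = norm(sub(index_tip, thumb_tip))           # aperture
    o1 = angle(object_axis, sub(index_tip, thumb_tip))
    o2 = angle(object_axis, sub(index_knuckle, thumb_tip))
    return d, v, a, o1, o2

# Made-up marker positions (metres) for a hand approaching along the x axis.
sample = hand_state_sample(
    wrist=(0.0, 0.0, 0.0), prev_wrist=(-0.01, 0.0, 0.0), dt=0.1,
    index_tip=(0.1, 0.02, 0.0), index_knuckle=(0.05, 0.02, 0.0),
    thumb_tip=(0.1, -0.02, 0.0), target=(0.3, 0.0, 0.0),
    object_axis=(0.0, 1.0, 0.0))
```

Calling this once per video frame would give the sampled trajectory H(t) whose whole history represents the grasp.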
Visual Processing for the MNS model
How much should we attempt to solve?
• Even though computers are getting more powerful, the vision problem in its general form is an unsolved problem in engineering.
• There exist gesture recognition systems for human-computer interaction and sign language interpretation.
• Our vision system must at least recognize:
1) The hand and its configuration
2) Object features

Simplifying the problem



We simplify the problem of recognizing the hand and its configuration by using colored patches on the articulation points of the hand.
If we can extract the patch positions reliably, then we can try to extract some of the features that make up the hand state by estimating the 3D pose of the hand from its 2D pose.
Thus we have 2 steps:
1. Extract the color marker positions
2. Estimate 3D pose
The Color-Coded Hand
• The vision task is simplified using colored tapes on the joints and articulation points.
• The first step of hand configuration analysis is to locate the color patches unambiguously (not easy!).
Use color segmentation. But we have to compensate for lighting, reflection, shading and wrinkling problems, which calls for robust color detection.
Robust Detection of the Colors – RGB space
• A color image in a computer is composed of a matrix of pixels, each a triplet (Red, Green, Blue) that defines the color of the pixel.
• We want to label a given pixel color as belonging to one of the color patches we used to mark the hand, or as not belonging to any class.
• A straightforward way to detect whether a given target color (R', G', B') matches the pixel color (R, G, B) is to threshold the squared distance
(R − R')² + (G − G')² + (B − B')²
This does not work well, because shading and different lighting conditions affect the R, G, B values a lot and our simple nearest-neighbor method fails. For example, an orange patch under shadow is very close to red in RGB space.
• But we can do better: train a neural network that can do the labeling for us.
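The squared-distance test described above can be sketched as follows. The marker colors, the threshold values, and the function names are illustrative assumptions, not values from the lecture; label 0 stands for "no marker".

```python
def squared_distance(pixel, target):
    """Squared Euclidean distance between two (R, G, B) triplets."""
    return sum((p - t) ** 2 for p, t in zip(pixel, target))

def classify_pixel(pixel, targets, threshold):
    """Label of the nearest target color within threshold, else 0."""
    best_label, best_dist = 0, threshold
    for label, target in targets.items():
        dist = squared_distance(pixel, target)
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical marker colors: 1 = red tape, 2 = orange tape.
TARGETS = {1: (255, 0, 0), 2: (255, 165, 0)}

bright_red = (250, 10, 5)
shadowed_orange = (128, 82, 0)   # orange at roughly half brightness

label_red = classify_pixel(bright_red, TARGETS, threshold=3600)
# A tight threshold rejects the shadowed patch; a loose one mislabels it.
label_tight = classify_pixel(shadowed_orange, TARGETS, threshold=3600)
label_loose = classify_pixel(shadowed_orange, TARGETS, threshold=25000)
```

With these made-up numbers the shadowed orange pixel is slightly closer to red than to orange in RGB space, which is the failure mode the slide describes.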
Robust Detection of Colors – the Color Expert
Create a training set from a test image by manually picking colors from the image and specifying their labels.
Create a NN (in our case a feed-forward network with one hidden layer) that accepts the R, G, B values as input and outputs the marker label, or 0 for a non-marker color.
Make sure that the network is not too "powerful", so that it generalizes rather than memorizing the training set.
Train it, then use it: to classify a pixel, apply its RGB values to the trained network and take the output as the marker that the pixel belongs to.
One then needs a segmentation system to aggregate the pixels into patches, each with a single color label.
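One simple way to aggregate labeled pixels into patches is connected-component grouping. The sketch below assumes the color expert has already produced a grid of marker labels (0 for non-marker) and uses 4-connectivity; the lecture does not say which segmentation method was used, so this is only an illustration, and `find_patches` is a hypothetical name.

```python
from collections import deque

def find_patches(labels):
    """Group equal-label neighbors; return (label, centroid, size) per patch."""
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0 or seen[r][c]:
                continue
            label = labels[r][c]
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:                       # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and labels[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            centroid = (sum(p[0] for p in pixels) / len(pixels),
                        sum(p[1] for p in pixels) / len(pixels))
            patches.append((label, centroid, len(pixels)))
    return patches

# Tiny label image: a 2x2 patch of marker 1 and a 2-pixel patch of marker 2.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 2],
        [0, 0, 0, 2]]
patches = find_patches(grid)
```

The patch centroids (row, column) can then serve as the marker positions passed on to feature extraction.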
Color Segmentation and Feature Extraction
Training phase: A color expert is generated by training a feed-forward network to approximate human perception of color.
Actual processing: The hand image is fed to an augmented segmentation system. The color decision during segmentation is made by consulting the color expert.
[Figure: preprocessing yields the Color Expert (network weights); the NN-augmented segmentation system outputs features.]
STS hand shape recognition
Step 1 of hand shape recognition: process the color-coded hand image and generate a set of features, namely the positions of the markers relative to the wrist.
[Figure: color-coded hand and feature extraction.]
Hand Model
A realistic drawing of the hand bones. The hand is modelled with 14 degrees of freedom, as illustrated.
STS hand shape recognition 2: 3D Hand Model Matching
Step 2: The feature vector is used to fit a 3D kinematics model of the hand by the model matching module. The resulting hand configuration is sent to the classification module.
The model matching algorithm minimizes the error between the extracted features and the model hand.
[Figure: result of feature extraction (feature vector), error minimization, grasp type classification.]
Reach & Grasp generation
[Figure: the MNS model schematic, with the same labeled modules as on the "Visual Input Processing" slide.]
Virtual Hand/Arm and Reach/Grasp Simulator
[Figure: a precision pinch, a power grasp, and a side grasp.]
Kinematics model of arm and hand
19 degrees of freedom (DOF): Shoulder (3), Elbow (1), Wrist (3), Fingers (4*2), Thumb (3)
Implementation Requirements
• Rendering: given the 3D positions of the links' start and end points, generate a 2D representation of the arm/hand (easy).
• Forward Kinematics: given the 19 joint angles, compute the position of each link (easy).
• Inverse Kinematics: given a desired position in space for a particular link, find the joint angles that achieve that position (semi-hard).
• Reach & Grasp execution: harder than simple inverse kinematics, since there are more constraints to be satisfied (e.g. multiple target positions to be achieved at the same time).

A 2D, 3DOF arm example
[Figure: planar arm with link lengths a, b, c, joint angles A, B, C, and end effector P(x,y).]
Forward kinematics: given joint angles A, B, C, compute the end effector position P:
X = a*cos(A) + b*cos(B) + c*cos(C)
Y = a*sin(A) + b*sin(B) + c*sin(C)
Inverse kinematics: given the position P(x,y), there are infinitely many joint angle triplets that achieve it.
[Figure: two circles of radius a and c; the segments connecting the circles all have equal length b.]
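The forward kinematics formula above translates directly into code. The sketch assumes, as the formula implies, that each angle A, B, C is measured from the x axis rather than relative to the previous link; the link lengths and angles in the example are made up.

```python
import math

def forward_kinematics(lengths, angles):
    """End effector position P of the planar 3-link arm.

    lengths = (a, b, c); angles = (A, B, C) in radians, each measured
    from the x axis (an assumption read off the slide's formula).
    """
    a, b, c = lengths
    A, B, C = angles
    x = a * math.cos(A) + b * math.cos(B) + c * math.cos(C)
    y = a * math.sin(A) + b * math.sin(B) + c * math.sin(C)
    return x, y

# Two different joint angle triplets that reach the same point,
# illustrating why the inverse problem has no unique answer.
p1 = forward_kinematics((1.0, 1.0, 1.0), (0.0, math.pi / 2, math.pi))
p2 = forward_kinematics((1.0, 1.0, 1.0), (math.pi / 2, 0.0, math.pi))
```

Both calls land (up to rounding) on the point (0, 1).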
A Simple Inverse Kinematics Solution
Consider just the arm.
The forward kinematics of the arm can be represented as a vector function that maps the joint angles of the arm to the wrist position:
(x,y,z) = F(s1,s2,s3,e), where s1, s2, s3 are the shoulder angles and e is the elbow angle.
We can formulate the inverse kinematics problem as an optimization problem: given the desired wrist position P' = (x',y',z'), we introduce the error function
J = || P' − F(s1,s2,s3,e) ||
Then we compute the gradient of J with respect to s1, s2, s3, e and follow the negative gradient to reach the minimum of J.
This method is called the Jacobian Transpose method, because the partial derivatives of F encountered in this process can be arranged into the transpose of a special derivative matrix called the Jacobian (of F).
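A minimal sketch of the Jacobian transpose idea. To stay short it uses a planar 3-link arm with absolute joint angles instead of the 4-angle 3D arm described above, and it descends on half the squared error, which has the same minimizer as the norm. The step size, tolerance, starting angles, and function names are all assumptions.

```python
import math

def forward(lengths, angles):
    """Planar arm end point; each angle is measured from the x axis."""
    x = sum(l * math.cos(t) for l, t in zip(lengths, angles))
    y = sum(l * math.sin(t) for l, t in zip(lengths, angles))
    return x, y

def solve_ik(lengths, angles, target, step=0.1, tol=1e-6, max_iter=5000):
    """Follow the negative gradient of 0.5 * ||target - forward(angles)||^2."""
    angles = list(angles)
    for _ in range(max_iter):
        x, y = forward(lengths, angles)
        ex, ey = target[0] - x, target[1] - y      # position error
        if math.hypot(ex, ey) < tol:
            break
        # Column i of the Jacobian is (-l_i sin t_i, l_i cos t_i); the
        # update adds step * (Jacobian transpose times error) to the angles.
        angles = [t + step * (-l * math.sin(t) * ex + l * math.cos(t) * ey)
                  for l, t in zip(lengths, angles)]
    return angles

lengths = (1.0, 1.0, 1.0)
solution = solve_ik(lengths, (0.3, 0.8, 1.2), target=(1.5, 1.0))
reached = forward(lengths, solution)
```

Because the arm is redundant, the angles found depend on the starting configuration; a different start generally yields a different, equally valid solution.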

Power grasp time series data
[Figure: plotted series are aperture (+), angle 1 (*), angle 2 (x), 1-axisdisp1, 1-axisdisp2, speed, and distance.]
Curve recognition
The general problem: associate N-dimensional space curves with object affordances.
A special case: the recognition of two- (or three-) dimensional trajectories in physical space.
Simplest solution: map temporal information into the spatial domain, then apply known pattern recognition techniques.
Problem with the simplest solution: the spatial representation may change drastically with the speed of the moving point.
Scaling can overcome the problem. However, the scaling must preserve the generalization ability of the pattern recognition engine.
Solution: fit a cubic spline to the sampled values, then normalize and resample from the spline curve.
Result: very good generalization, and better performance than using Fourier coefficients to recognize curves.
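The spline-then-resample step can be sketched for a single component of a trajectory. The lecture does not say which spline or normalization was used; this sketch assumes a natural cubic spline, resampling at equally spaced points over the whole duration, and min-max scaling of the values. Function names are hypothetical.

```python
import bisect

def natural_cubic_spline(ts, ys):
    """Return S(t), the natural cubic spline through the points (ts, ys)."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    m = [0.0] * (n + 1)            # second derivatives; zero at both ends
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):                  # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[-1] / diag[-1]              # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(t):
        i = min(max(bisect.bisect_right(ts, t) - 1, 0), n - 1)
        a, b = ts[i + 1] - t, t - ts[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return evaluate

def resample_curve(ts, ys, n_points):
    """Fit the spline, sample n_points evenly in time, scale values to [0, 1]."""
    spline = natural_cubic_spline(ts, ys)
    t0, t1 = ts[0], ts[-1]
    values = [spline(t0 + (t1 - t0) * k / (n_points - 1)) for k in range(n_points)]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

# The same shape traced at two speeds gives the same fixed-size vector.
fast = resample_curve([0, 1, 2, 3], [0, 1, 0, 1], 30)
slow = resample_curve([0, 2, 4, 6], [0, 1, 0, 1], 30)
```

For a multi-dimensional trajectory such as the hand state, the same procedure would be applied to each component, giving a fixed-length input for the recognition network regardless of how fast the movement was.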
Curve recognition
Curve recognition system demonstrated for hand-drawn numeral recognition (successful recognition examples for 2, 8 and 3).
Spatial resolution: 30
Network input size: 30
Hidden layer size: 15
Output size: 5
Training: back-propagation with momentum and adaptive learning rate
[Figure legend: sampled points; points used for spline interpolation; fitted spline.]
Core Mirror Circuit
[Figure: the MNS model schematic, with the same labeled modules as on the "Visual Input Processing" slide.]
Core Mirror Circuit
[Figure: core mirror circuit. Labels: Object affordance; Hand state; Association Neurons; Mirror Neurons (F5mirror); Mirror Neuron Output; Motor Program; Mirror Feedback.]
What is to be learned?
• Connections from hand state and object affordance are adapted so that the association neurons respond when the hand state is congruent with the object.
• Connections from the association neurons are adapted so that their integrated activity activates the mirror neurons for the appropriate grasp.
Connectivity pattern
[Figure: connectivity pattern among Object affordance (AIP), STS, 7a, 7b, F5mirror, Motor Program (F5canonical), and Mirror Feedback.]
A single grasp trajectory viewed from three
different angles
The wrist trajectory during the grasp is shown by square traces; the distance between any two consecutive trace marks is traveled in equal time intervals.
How the network classifies the action as a power grasp. Empty squares: power grasp output; filled squares: precision grasp; crosses: side grasp output.
Power and precision grasp resolution
[Figure: panels (a) and (b).]
“Spatial Perturbation” Experiment with trained
core mirror circuit
Figure A. A regular precision grasp (the hand spatially coincides with the target).
Figure B. The response of the network to the action in Figure A (precision grasp).
Figure C. The target object is displaced to create a 'fake' grasp.
Figure D. The response of the network to the action in Figure C. The activity of the precision mirror neuron is reduced.
In the graphs, the x axes represent normalized time (0 for the start of the grasp, 1 for contact with the object) and the y axes represent the cell firing rate.
“Kinematics Alteration” Experiment with the
trained core mirror circuit
Figure A. A regular precision grasp (the wrist has a bell-shaped velocity profile).
Figure B. The velocity profile is (almost) linear.
Figure C. Classification of the action in Figure A as a precision grasp.
Figure D. The activity vanished during the observation of the action in Figure B.
Axes: normalized speed against normalized time (A, B); firing rate against normalized time (C, D).
Note that the scales of graphs C and D are different.
Correspondence of MNS Model with Monkey Mirror Neuron System
Process: Grasp Execution
- Monkey functioning: Monkey A/B executes grasp
- Model engine: Grasp Simulator
- Model strategy: Inverse Kinematics
Process: Temporal to Spatial Conversion
- Monkey functioning: Monkey A's visual system processes the visual stimuli generated
- Model engine: Preprocessing
- Model strategy: Curve fitting, time and space scaling
Process: Action Recognition/Learning
- Monkey functioning: Monkey A's F5 mirror neurons respond to their preferred stimuli
- Model engine: Core Mirror Circuit
- Model strategy: Back-prop Net
Experimental Challenges
What are "poor" mirror neurons coding?
- temporal recognition codes
- transient response to actions which are not exactly the preferred stimuli
How can we relate different cells' responses to each other?
- Fix the condition and record from as many cells as possible under exactly the same condition.
Is it possible to record from mirror cells in monkeys of different age groups (i.e. infant to adult)?
Modeling Challenges
How can MNS be plugged into a learning-by-imitation system that is faithful to biological constraints (BG, Cerebellum, SMA, PFC, etc.)?
How does the brain handle temporal data? Transform the learning network into one that can work directly on temporal data, eliminating the preprocessing required before the input can be applied to the MNS core circuit.
Extend the actions to be recognized beyond simple grasps.
Model the complementary circuit: learning to grasp by trial and error.
And a lot more!