Active vision system for embodied intelligence
Active vision system for embodied intelligence based on retina sampling model and
hierarchical representation
Janusz A. Starzyk, Xinming Yu
Ohio University, Athens, OH
Building up memories from environment
INTRODUCTION
Fig. 1: Retina structure
The retina structure is fundamental to the human vision system,
which is much more efficient than any current
robotic vision system.
● Photoreceptors (CONE, ROD) are concentrated
around fovea, for the highest resolution on target
● The retina processes the sampled scene using ganglion
cells, and sends activation through the optic nerve and
LGN to the primary visual cortex (V1)
● The neurons in V1 fire in groups responding to
different visual features from the retina
The retina sampling model uses a prespecified
sampling density.
Fig. 2 shows the density distribution curves
of the cones in the retina.
● Correlation based sparse connections
are used to mimic the neuron connections
in V1
● Neurons which are locally correlated
connect to the same group of neurons in the
higher layer
● Winners and their neighbors fire and
have their weights adjusted together, for smooth
processing and increased robustness.
● Using the retina sampling, the vision system receives more useful
information.
● Table 1 compares the retina sampling model (human vision)
with uniform sampling (computer vision).
● The resolution (density of the sampling points) in the center part of the
retina sampling is much higher than that of uniform sampling.
Percentage of sampling points   Retina Sampling   Uniform Sampling
                                (Human Vision)    (Computer Vision)
Part inside blue                31%               4%
Part inside black               52%               14%
Part inside red                 63%               25%
Part inside green               78%               50%
Whole range                     100%              100%
Table 1: Comparison (human vs. computer vision)
Fig. 2: Cone densities in human retina [1]
When this artificial retina sampling is applied to a visual scene, the vision
system receives much more data from the object in focus, and still has
peripheral vision.
Fig. 5: An example of retina sampling
Original resolution: 900x900, resolution after sampling 60x60
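The resampling from 900x900 to 60x60 can be illustrated with a minimal sketch. The poster derives its sampling density from the cone density curves; the power-law warp used here, and the names `foveated_indices`, `retina_sample`, `central_share` and `power`, are assumptions made only for illustration.

```python
import numpy as np

def foveated_indices(src_size, out_size, power=2.0):
    """Map out_size sample positions onto src_size pixels, densest at the center.

    `power` is a hypothetical warping exponent (not from the poster):
    power=1 gives uniform sampling, larger values concentrate samples
    around the center (the "fovea").
    """
    # Normalized output coordinates in [-1, 1].
    u = np.linspace(-1.0, 1.0, out_size)
    # Odd power-law warp: small steps near 0, large steps near the edges.
    warped = np.sign(u) * np.abs(u) ** power
    # Convert back to pixel indices in [0, src_size - 1].
    return np.rint((warped + 1.0) / 2.0 * (src_size - 1)).astype(int)

def retina_sample(image, out_size, power=2.0):
    """Resample a square grayscale image on the foveated grid."""
    idx = foveated_indices(image.shape[0], out_size, power)
    return image[np.ix_(idx, idx)]

def central_share(idx, src_size, fraction=0.5):
    """Share of 2D sample points inside the central square whose side is
    `fraction` of the image side."""
    center = (src_size - 1) / 2.0
    half = fraction * (src_size - 1) / 2.0
    inside_1d = np.mean(np.abs(idx - center) <= half)
    return inside_1d ** 2  # the grid is separable in x and y

# Stand-in for a real 900x900 scene.
image = np.random.default_rng(0).random((900, 900))
sampled = retina_sample(image, 60)

fov = foveated_indices(900, 60, power=2.0)
uni = foveated_indices(900, 60, power=1.0)
share_foveated = central_share(fov, 900)
share_uniform = central_share(uni, 900)
print(sampled.shape, share_foveated, share_uniform)
```

With any exponent above 1, a larger share of the 3600 sampling points falls inside the central region than under uniform sampling, which is the qualitative effect summarized in Table 1.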
In the active vision system, we apply a connection mechanism based on the correlation
between input neurons’ activations and the activation of local winners.
● Correlation between input neurons’ activation
◘ Use real image data instead of noise
◘ Use images organized in time sequences to obtain feedback connections
for invariance building
◘ Process the input data from layer N-1 and calculate the correlations
◘ For each neuron, find out the best correlated set of neurons, and create
connections to those neurons
● Local winners are used to adjust the connection weights
◘ The local winners are activated (e.g. the green one in layer N)
◘ The weights of connections to neighbors of the local winner are adjusted
◘ The local winners help the neighbors to fire together (horizontal red arrows are excitatory)
◘ All groups of winner sets in layer N (local winners and their neighbors) are
used to activate layer N+1
◘ Use Oja’s learning rule [4] to adjust the weights of connections to the winners
Fig. 8: The excitation of local winner and its neighbors
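Oja's rule [4] updates a neuron's weights as w ← w + η·y·(x − y·w), where y = w·x. The following is a minimal single-neuron sketch, assuming synthetic 2D inputs and a hypothetical learning rate; it is not the poster's implementation.

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """One step of Oja's rule: w <- w + lr * y * (x - y * w), with y = w . x."""
    y = np.dot(w, x)
    return w + lr * y * (x - y * w)

rng = np.random.default_rng(0)
# Hypothetical inputs: most variance lies along the direction (1, 1) / sqrt(2).
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)
samples = (rng.normal(size=(5000, 1)) * 3.0) * direction \
    + rng.normal(size=(5000, 2)) * 0.3

# Start from a random unit-length weight vector.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for x in samples:
    w = oja_update(w, x, lr=0.005)

# |cosine| between the learned weights and the dominant input direction.
alignment = abs(np.dot(w, direction)) / np.linalg.norm(w)
print(alignment, np.linalg.norm(w))
```

The decay term −y²·w keeps the weight vector close to unit length, so the weights of winners and co-winners stay bounded without explicit renormalization.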
Retina sampling: Model of data Collection
EI Architecture
Hierarchical representation learning is based on external reinforcement for
primitive goals and on an internal goal creation system for abstract goals and
internal rewards.
[Fig. 11 diagram labels: Pain or Goal Creation, Competing goals, Act, Planning,
INPUT, OUTPUT, Task Environment, Simulation or Real-World System]
Fig. 11: The pathways through which the system is built up from interactions
with the unspecified environment
In learning, it is not easy to obtain
examples of desired behavior that are
both correct and representative of all the
situations in which the agent has to act.
Reinforcement learning (RL) is a good
choice for learning in an unspecified
external environment.
As shown in Fig. 12, the agent (A) receives data, which includes input (I)
and reward (R), from the environment (E), and sends an action (M)
back to the environment. With the aid of the reward, the agent learns how
to take the correct action to obtain the maximum reward.
Fig. 12: The reinforcement learning model (diagram labels: E, S, I, A, M, R)
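The agent-environment loop can be sketched in its simplest form, a stateless bandit in which the agent ignores the input (I) and learns only from the reward (R). The function `run_agent`, the reward probabilities and the epsilon-greedy policy are assumptions chosen for illustration, not the learning scheme used in the poster.

```python
import random

def run_agent(reward_prob, steps=5000, epsilon=0.1, seed=0):
    """Agent (A) repeatedly picks an action (M), gets a reward (R) from the
    environment (E), and updates its estimate of each action's reward."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_prob)  # estimated mean reward per action
    counts = [0] * len(reward_prob)    # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:     # explore: random action
            action = rng.randrange(len(values))
        else:                          # exploit: best action so far
            action = max(range(len(values)), key=values.__getitem__)
        # Environment: hypothetical Bernoulli reward for the chosen action.
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0
        # Incremental sample mean of the rewards seen for this action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values

values = run_agent([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print(values, best_action)
```

No example of the correct action is ever given to the agent; the preference for the most rewarding action emerges from the reward signal alone.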
The goal creation system provides a mechanism that organizes the learning of
intentional representations and associations between sensory and motor
pathways. When an agent realizes that a specific action resulted in a desirable
effect related to the current goal, it stores a representation of the perceived
object involved in such action and learns associations between the sensory
and motor pathways.
Correlation-based Connection
The retina, unlike a camera, does not
simply send a picture to the brain. The
retina spatially encodes (compresses)
the image to fit the limited capacity of
the optic nerve.
● In primary visual cortex (V1), neurons are activated by the stimuli from similar
groups of inputs.
● The connections built based on the correlation of the input reflect observed
relations in the real world. Fig. 6 shows the correlations based on real images.
● The photoreceptors are not evenly distributed in the retina. Most of them
are concentrated on or around the fovea.
● A 1D probability distribution curve is shown in Fig. 3.
Fig. 3: PDF of the photoreceptors (cones and rods) [2]
● The cortex receives distorted images,
which are sharper in the fovea area.
● The fovea is the reference point of gaze
shifting, and focuses on the most
interesting part of the scene.
Fig. 4 shows the sampling points for the
retina model, with higher density in the
center than on the periphery
Fig. 4: Sampling points for retina model
Fig. 6: Correlation of the input data
Fig. 7: Correlation-based connections with a remote but correlated area
● Linsker obtained useful features in the visual field with a fixed connectivity
model and noise input for self-organizing training [3].
The disadvantages of his fixed connectivity model are:
◘ It may not deliver connections to remote but correlated areas of the
visual field. Fig. 7 shows the existence of a remote but correlated area
◘ It may not result in useful features on higher levels
◘ Its local connectivity region is set arbitrarily
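In contrast to a fixed local connectivity region, connections can be chosen from measured correlations. The following is a minimal sketch, assuming layer N-1 activations recorded over many inputs; the function `correlation_connections`, the parameter `k` and the synthetic data are hypothetical.

```python
import numpy as np

def correlation_connections(activations, k):
    """For each neuron, return the indices of its k best-correlated neurons.

    activations: array of shape (num_samples, num_neurons) holding layer N-1
    activity over a set of inputs.
    """
    corr = np.corrcoef(activations, rowvar=False)  # (num_neurons, num_neurons)
    np.fill_diagonal(corr, -np.inf)                # exclude self-correlation
    # Sort each row by descending correlation and keep the first k indices.
    return np.argsort(-corr, axis=1)[:, :k]

rng = np.random.default_rng(0)
shared = rng.normal(size=500)
acts = rng.normal(size=(500, 6))
# Hypothetical data: neurons 0 and 5 are far apart but driven by a common signal.
acts[:, 0] += 3.0 * shared
acts[:, 5] += 3.0 * shared

conn = correlation_connections(acts, k=2)
print(conn)
```

Because the neighbors are selected by correlation rather than by distance, neuron 0 is linked to neuron 5 even though the two are not adjacent.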
Procedure of the weight adjustment:
Activate layer N-1 → find the strongest winners in layer N → excite the neighbors
as co-winners → adjust weights for all activated neurons → activate layer N+1
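The procedure can be sketched for a single layer as follows. This is an illustration under stated assumptions: the layer is treated as 1D so that "neighbors" are adjacent indices, the connections are dense, and `layer_step`, `num_winners`, `radius` and `lr` are hypothetical names and parameters.

```python
import numpy as np

def layer_step(x, W, num_winners=1, radius=1, lr=0.01):
    """One pass of the procedure for a 1D layer N.

    x: activation of layer N-1, shape (n_in,)
    W: weights from layer N-1 to layer N, shape (n_out, n_in)
    Returns the updated weights and the layer N output passed to layer N+1.
    """
    y = W @ x                                # activate layer N from layer N-1
    winners = np.argsort(-y)[:num_winners]   # strongest winners in layer N
    active = np.zeros(len(y), dtype=bool)
    for i in winners:                        # excite the neighbors as co-winners
        lo, hi = max(0, i - radius), min(len(y), i + radius + 1)
        active[lo:hi] = True
    W = W.copy()
    # Oja's rule for every activated neuron (winners and co-winners).
    ya = y[active]
    W[active] += lr * ya[:, None] * (x[None, :] - ya[:, None] * W[active])
    out = np.where(active, y, 0.0)           # only the winner set drives layer N+1
    return W, out

rng = np.random.default_rng(0)
W0 = rng.random((5, 4))
x0 = rng.random(4)
W1, out = layer_step(x0, W0)
print(out)
```

Updating the winner together with its neighbors is what lets adjacent neurons develop similar weights, which the poster associates with smooth processing and increased robustness.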
An active servo system, shown in Fig. 9, is being built with real-time video input
to demonstrate the active vision system for embodied intelligence.
Both the retina sampling model and the correlation-based
connections are used with the servo system.
◘ A webcam captures the visual data.
◘ The raw data is uniformly sampled (320x240 pixels). It is first processed by
the retina model and compressed to 40x30 with little data loss in the center.
◘ With the compressed data and the correlation-based sparse connections, the
active vision system processes the real-time input, finds the interesting object
and generates the object coordinates.
◘ The servo system receives the real-time coordinates and follows
the object with a laser pointer.
Fig. 9: Servo system
Fig. 10: Servo system working with the active vision system to follow the object in view
CONCLUSIONS
An active vision system for embodied intelligence, based on a retina sampling
model and hierarchical representation, has been developed.
The retina sampling model mimics the efficiency of the human vision system.
A hierarchical representation is built up with sparse connections, which are
locally generated from the neurons’ activity correlation.
Using the goal creation system learning scheme, the active vision system can
learn complex knowledge.
Goals evolve from simple ones through interaction with the environment.
Such organization of the learning process is conducive to the creation of
general intelligence, with self-organizing structure and dynamic goals.
BIBLIOGRAPHY
[1] Curcio C.A., Sloan K.R. Jr, Packer O., Hendrickson A.E., Kalina R.E., “Distribution
of cones in human and monkey retina: individual variability and radial asymmetry”, Science,
Vol. 236, pp. 579-582, 1987.
[2] Riedel G., Physiology of Human Cells, Available:
http://www.aberdeen.ac.uk/sms/ugradteaching/course.php?ID=10
[3] Linsker R., “From Basic Network Principles to Neural Architecture: Emergence of
Spatial-Opponent Cells”, Proc. National Academy of Sciences, Vol. 83, pp. 7508-7512, 1986.
[4] Oja E., “Simplified neuron model as a principal component analyzer”, Journal of Mathematical
Biology, Vol. 15, No. 3, pp. 267-273, 1982.