Cellular Neural Network Simulation and Modeling

Download Report

Transcript Cellular Neural Network Simulation and Modeling

Cellular Neural Network
Simulation and Modeling
Oroszi Balázs
2006.01.06.
Overview




Introduction: About the CNN in general
Basic characteristics of the CNN
Modeling and simulation of the CNN architecture
The functional model of the CNN architecture
– Handling special cases to increase performance



From theory to practice: Realization of the CNN
simulator
Summary
Demonstration
About the CNN in general
In 1988 papers from Leon O. Chua
introduced the concept of the Cellular Neural
Network. CNNs can be defined as “2D or
3D arrays of mainly locally connected
nonlinear dynamical systems called cells,
whose dynamics are functionally determined
by a small set of parameters which control
the cell interconnection strength” (Chua).
These parameters determine the connection
pattern, and are collected into the so-called
cloning templates, which, once determined,
define the processing of the whole structure.
Basic Characteristics of the CNN



The CNN can be defined as an M x N type array of identical cells
arranged in a rectangular grid. Each cell is locally connected to its 8
nearest surrounding neighbors.
Each cell is characterized by uij, yij and xij being the input, the output
and the state variable of the cell respectively.
The output is related to the state by the nonlinear equation:
yij = f(xij) = 0.5 (| xij + 1| – |xij – 1|)
Basic Characteristics of the CNN (2)

The state transition of neuron (i, j) is governed by the following
differential equation:
xi , j   xi , j 


 A(i, j, k , l ) y
C ( k ,l )S r ( i , j )
k ,l
(t ) 
 B(i, j, k , l )u
C ( k ,l )S r ( i , j )
k ,l
(t )  zi , j
Where C(i,j) represents the neuron at column i, row j, Sr(i,j) represents
the neurons in the radius r of the neuron C(i,j), and zi,j is the threshold
(bias) of the cell C(i,j).
The coefficients A(i, j, k, l) and B(i, j, k, l) are known as the cloning
templates. In general, they are nonlinear, time- and space variant
operators.
If they are considered linear, time- and space invariant, they can
simply be represented by matrices.
Modeling and simulation of the CNN architecture




Simulation plays an important role in the design of the CNN cloning
templates.
Therefore, it has to be fast enough to allow the design phase of various
templates be accomplished in reasonable time.
At the same time, the simulation has to be accurate enough, to reflect
the behavior of the analog circuitry correctly.
In practice, the simulation of the CNN involves a trade-off between
accuracy and computation time.
Modeling and simulation of the CNN architecture (2)




The true processing capabilities of CNNs for high-speed parallel
processing are only fully exploited by dedicated VLSI hardware
realizations.
Typical CNN chips may contain up to 200 transistors per pixel.
At the same time, industrial applications require large enough grid
sizes (around 100 x 100).
Thus, CNN chip designers must confront complexity levels larger than
106 transistors, most of them operating in analogue mode.
Modeling and simulation of the CNN architecture (3)




On the one hand, high-level simulation, which is focused on emulating
the functional behaviour, cannot reflect realistically the underlying
electronic circuitry. Their lack of detail makes them ill-suited for
reliable IC simulation.
On the other hand, the SPICE-type transistor-level simulators,
although very accurate, are barely capable of handling more than about
105 transistors and may take several days of CPU time for circuit
netlists containing about 106 transistors. Hence, these low-level tools
are ill-suited for simulating large CNN chips.
Therefore, it would be necessary to bridge the gap between these
approaches, which would give very accurate results in reasonable (but
not real-) time.
However, our main concern now is fast simulation, so in the rest of this
presentation we shall focus on the functional modeling of the CNN
architecture.
The functional model of the CNN architecture

The output of a CNN model simulation is the final state reached by the
network after evolving from an initial state under the influence of a
specific input and boundary conditions. The following block diagram
shows the state-transition and output of a single cell:
The functional model of the CNN architecture (2)

In the most general case, the final state of one cell can be described by
the following equation:
t
t
t0
t0
x(t )  x(t0 )   x ( )d  x(t0 )   f ( x( )) d


As a closed form for the solution of the above equation cannot be
given, it must be integrated numerically.
For the simulation of such equations on a digital computer, they must
be mapped into a discrete-time system that
– emulates the continuous-time behavior,
– has similar dynamics
– and converges to the same final state.

The error committed by this emulation depends on the choice of the
method of integration, i. e. the way in which the integral is calculated.
The functional model of the CNN architecture (3)
There is a wide variety of integration algorithms that can be used to
perform this task. However, only three of them are going to be considered
here. These methods are:

the explicit Euler’s formula:
t n1
 f ( x(t ))dt  t  f ( x(t ))
n
tn

the predictor-corrector algorithm:
t n1

f ( x(t )) dt 
tn

t
( 0)
 [ f ( x(tn ))  f ( x(tn 1 ) ( 0) )] where x(tn 1 )  x(tn )  t  f ( x(tn ))
2
and the fourth-order Runge-Kutta method:
t n1

tn
1
f ( x(t )) dt  (k1  2k 2  2k3  k 4 )
6
where
k1  t  f ( x(t n ))
1
k 2  t  f ( x(t n )  k1 )
2
1
k3  t  f ( x(t n )  k 2 )
2
1
k 4  t  f ( x(t n )  k3 )
2
The functional model of the CNN architecture (4)



The Euler method is the fastest, but gives the least accurate
convergence behaviour.
Runge-Kutta gives the best results, however, much slower. In this
case, four auxiliary components (k1-k4) are computed. These are
auxiliary values, which are then averaged. This makes it rather illsuited for applications, that prefer speed over accuracy.
If the main goal would be accuracy and robustness, undoubtedly
Runge-Kutta would be the method of choice. In our case, however, as
the primary target is a fast, working implementation of a CNN
simulator as an image processor, we shall choose the Euler method.
Handling special cases for increasing performamce

What can be considered a special case from a programming point of
view?
– special input
– special templates (A, B)



To gain significant speed improvements, the case of special templates
should be examined.
It is not uncommon within templates that extract local properties of
the image (like edge detectors) to use a fully zero A template.
I have discovered, that revisiting the state equation when the A
template is fully zero, significant improvements in speed can be
achieved.
Handling special cases for increasing performamce (2)

Given A = 0, the state equation takes the following form:
X   X  BU  Z

(BU + Z) is constant during the process. Let: BU + Z = C
Using Euler integration:

X 1  X 0  t  X 0  X 0  t ( X 0  C )  X 0 (1  t )  C  t
X 2  X1 (1  t )  C  t  [ X 0 (1  t )  C  t ](1  t )  C  t  X 0 (1  t ) 2  C  t (1  t )  C  t
X 3  X 2 (1  t )  C  t  [ X 0 (1  t ) 2  C  t (1  t )  C  t ](1  t )  C  t 
 X 0 (1  t )3  C  t (1  t ) 2  C  t (1  t )  C  t
X 4  X 3 (1  t )  C  t  [ X 0 (1  t )3  C  t (1  t ) 2  C  t (1  t )  C  t ](1  t )  C  t 
 X 0 (1  t ) 4  C  t (1  t )3  C  t (1  t ) 2  C  t (1  t )  C  t
...

The pattern can clearly be seen by now.
Handling special cases for increasing performamce (3)
X 4  X 0 (1  t ) 4  C  t (1  t )3  C  t (1  t ) 2  C  t (1  t )  C  t

In each new step X0(1-Δt)n gets multiplied by (1-Δt) so it’s power index
increases. The remaining part is a geometric series, so the general
equation of calculating the n-th state is:
n 1
X n  X 0 (1  t )   C t (1  t ) k
n
k 0

Using the general formula of calculating the sum of a geometric series:
qn 1
S n  a1
q 1
the sum of the above geometric series turns into:
(1  t ) n  1
(1  t ) n  1
C t (1  t )  C  t
 C  t
 C  [(1  t ) n  1]  C  [1  (1  t ) n ]

(1  t )  1
 t
k 0
n 1
k

So the state equation using this will be:
X n  X 0 (1  t ) n  C  [1  (1  t ) n ]  X 0 (1  t ) n  C  C (1  t ) n  ( X 0  C )(1  t ) n  C
X n  ( X 0  BU  Z )(1  t ) n  BU  Z
Handling special cases for increasing performance (4)
X n  ( X 0  BU  Z )(1  t ) n  BU  Z


This result is of utmost importance regarding speed, because:
– the number of iterations that need to be performed to get to the nth
state is reduced to 1
– thus we can get to the final state immediately, given the U input,
the B template and Z bias
As multiple iterations through an image causes lots of non-cacheable
memory accesses (which is very slow), this improvement in the special
case of A = 0 gives a huge boost in speed.
From theory to practice:
Realization of the CNN simulator

Environment used: Avisynth (http://www.avisynth.org)
– A powerful tool for video post-production.
– Special programming language, designed specifically for video
processing.
– It’s functions are implemented under-the-hood as C/C++ dynamic link
libraries (DLLs), which are called plugins.
– Plugins expose an interface (functions) towards the scripting language,
from which these functions can be called.
– The CNN simulator is realised as a plugin (DLL) for Avisynth, written in
C++.

Primary goal: speed
– but also make sure it behaves according to the state-equation
Summary

CNN simulation:
– functional modeling (mathematical calculation according
to the state-equation)
– circuit-level modeling

Implementation:
–
–
–
–
based on functional model
using Avisynth (http://www.avisynth.org)
written in C++ programming language
available in my web-space at
http://digitus.itk.ppke.hu/~oroba