Evolvable Hardware
John Mixter
Overview
Motivation
Artificial Neural Networks
Genetic Algorithms
Evolvable Hardware
Neurograph Networks
2
Motivation
The looming threat of Moore’s Law
Previous barriers have been technology based.
The barrier we are approaching now is physics based.
The unforgiving nature of Amdahl’s Law
Only a portion of an application can be made parallel.
We are not very good at thinking (programming) in parallel.
We have been following the von Neumann Model since the late 1940s.
For the most part our progress has been evolutionary (pipelines,
caches, etc.).
We need to explore new revolutionary ideas.
3
Artificial Neurons
An artificial neuron mimics the basic
function of a biological neuron.
The perceptron was one of the first models
of a neuron. Frank Rosenblatt came up with
the idea in 1957.
A perceptron generates an output signal
when the sum of its (input × weight) products is
greater than a threshold value:

y = Σ (i = 0 to N) xᵢwᵢ

output = 1 if y ≥ threshold, 0 otherwise

A perceptron is trained by adjusting the input weights.
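As a rough illustration (not from the slides), here is the perceptron and its training rule in Python; the OR training set, threshold, and learning rate are assumptions for the example:

```python
# Minimal perceptron sketch: fires when the weighted input sum
# reaches the threshold; training nudges the input weights.

def perceptron(inputs, weights, threshold=0.5):
    y = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y >= threshold else 0

def train_step(inputs, weights, desired, rate=0.1):
    error = desired - perceptron(inputs, weights)
    return [w + rate * error * x for x, w in zip(inputs, weights)]

# Teach a 2-input perceptron logical OR (illustrative data).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = [0.0, 0.0]
for _ in range(20):
    for inputs, desired in data:
        weights = train_step(inputs, weights, desired)
print([perceptron(x, weights) for x, _ in data])   # [0, 1, 1, 1]
```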
4
Artificial Neural Networks
Perceptrons are connected together in
parallel to form artificial neural
networks (ANNs).
ANNs are arranged in layers: Input,
Hidden and Output.
ANNs are trained by providing an input
and testing the output. If the output is
incorrect, new weight values are
calculated by an error function and the
training repeats.
The training is indirect; ANNs cannot
be programmed by hand.
5
ANNs are Awesome
Artificial Neural Networks are massively parallel.
One perceptron is useless; connect a bunch together and they become
powerful.
They are asynchronous; no clock is needed.
They are very good at prediction and pattern recognition.
6
Example: Branch Prediction
The branch address is used to select from a
table of perceptrons.
The history shift register is presented as inputs
to the perceptron, up to 128 inputs.
The prediction is calculated.
After the branch direction is determined, it is
compared against the predicted direction.
If the actual direction taken does not agree with
the prediction, the perceptron is trained.
Error = Desired Output – Actual
Correction = Learning Rate × Error
w0 += ( x0 × Correction )
Based on work done by Daniel A. Jiménez and Calvin Lin
University of Texas at Austin
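The scheme above, sketched in Python (table size, history length, and learning rate are assumed values; hardware versions use small saturating integer weights):

```python
HISTORY_LEN = 16        # assumed; the slide allows up to 128 inputs
TABLE_SIZE = 256        # assumed size of the perceptron table
LEARNING_RATE = 1.0

table = [[0.0] * (HISTORY_LEN + 1) for _ in range(TABLE_SIZE)]
history = [1] * HISTORY_LEN            # +1 = taken, -1 = not taken

def predict(branch_addr):
    """Select a perceptron by branch address and compute its output."""
    weights = table[branch_addr % TABLE_SIZE]
    inputs = [1] + history             # bias input plus the history register
    y = sum(x * w for x, w in zip(inputs, weights))
    return (1 if y >= 0 else -1), inputs, weights

def update(branch_addr, actual):
    """After the branch resolves, train on a mispredict (slide's rule)."""
    prediction, inputs, weights = predict(branch_addr)
    if prediction != actual:
        correction = LEARNING_RATE * (actual - prediction)  # rate x error
        for i, x in enumerate(inputs):
            weights[i] += x * correction                    # w_i += x_i * corr
    history.pop(0)
    history.append(actual)             # shift the actual direction in
```

Here +1/−1 encode taken/not-taken, and the slide's w0 update is applied to every weight.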
7
Did it work?
[Figure: two charts, "Integer Benchmark Averages" and "Floating Point Benchmark Averages", plotting direction accuracy (0.80 to 1.00) against hardware cost for the GAg, GAp, GShare, PAg, PAp, and Neural predictors.]
SimpleScalar running Spec2000 Benchmarks
8
ANNs and FPGAs
ANNs do not perform well in software.
WHY?
9
ANNs and FPGAs
ANNs do not perform well in software.
They need to run and play in parallel.
FPGAs seem like a good platform, lots of gates and reconfigurable.
Researchers have tried and failed to implement ANNs on FPGAs;
routing is a problem.
You need many perceptrons to make a good predictive network.
Larger FPGAs offer hope.
10
Genetic Algorithms
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation,
by Scott Hauck and André DeHon. Morgan Kaufmann Publishers, 2008. ISBN 9780123705228.
11
An Example
• Generate 100 viable chromosomes and add them to the gene pool.
• Decode the chromosomes (build equations).
• Evaluate the chromosomes:
  Fitness = 1 / |Correct Answer − Calculated Answer|
• Select the fittest chromosomes.
• Randomly mutate chromosomes.
• Randomly split/select genes for mating.
• Add the new generation to the gene pool.

Each 4-bit gene maps to a symbol:

Code:   0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101
Symbol:    0    1    2    3    4    5    6    7    8    9    +    -    *    /

Example decode:
0110 1010 0101 1100 0100 1101 0010 1010 0001  →  6 + 5 * 4 / 2 + 1

http://www.ai-junkie.com/ga/intro/gat1.html
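A compact Python sketch of this loop under the encoding above; the pool size, chromosome length, and the skip-invalid-gene decoding rule are illustrative assumptions, and expressions are evaluated left to right as in the linked tutorial:

```python
import random

# 4-bit gene -> symbol, per the table above (1110/1111 are non-coding).
SYMBOLS = {format(i, "04b"): s for i, s in enumerate("0123456789+-*/")}

def decode(chromosome):
    """Build a digit/operator/digit/... token list, skipping invalid genes."""
    tokens, want_digit = [], True
    for pos in range(0, len(chromosome), 4):
        s = SYMBOLS.get(chromosome[pos:pos + 4])
        if s is not None and s.isdigit() == want_digit:
            tokens.append(s)
            want_digit = not want_digit
    return tokens if not tokens or tokens[-1].isdigit() else tokens[:-1]

def fitness(chromosome, target):
    """Fitness = 1 / |correct - calculated| (infinite on an exact match)."""
    tokens = decode(chromosome)
    if not tokens:
        return 0.0
    value = float(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):   # left-to-right
        n = float(num)
        if op == "+":   value += n
        elif op == "-": value -= n
        elif op == "*": value *= n
        elif n:         value /= n                    # '/' skips divide-by-0
    return float("inf") if value == target else 1.0 / abs(target - value)

pool = ["".join(random.choice("01") for _ in range(36)) for _ in range(100)]
best = max(pool, key=lambda c: fitness(c, 42))
print(decode(best), fitness(best, 42))
```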
12
Does it work?
It finds a solution quickly.
But it may not work for other answer sets, so it needs to keep evolving.
Notice the chromosome – 010010110100100101110101110…
Looks a lot like an FPGA bitstream.
13
Evolvable Hardware
Evolvable hardware physically changes to solve a given problem.
It does this by dynamically reconfiguring its connections and functions.
FPGAs are an excellent platform for evolvable hardware research.
Bitstreams for partial reconfiguration are chromosomes in the gene pool.
An FPGA is initially configured as a 2D array of processing elements that
have a fixed number of predefined functions.
The generated chromosomes are fed directly into the FPGA as a
reconfiguration frame that determines the function and connections of each
processing element (PE).
After the reconfiguration has taken place, the array is evaluated and the
fittest chromosomes are selected, mutated and put into the gene pool.
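A skeleton of this generate-reconfigure-evaluate loop in Python; reconfigure_fpga and evaluate_array are hypothetical placeholders for the partial-reconfiguration and fitness-measurement hardware steps, and the pool size, frame length, and mutation rate are assumptions:

```python
import random

POOL_SIZE, FRAME_BITS, MUTATION_RATE = 32, 112, 0.02   # assumed parameters

def reconfigure_fpga(chromosome):
    """Placeholder: feed the chromosome in as a reconfiguration frame."""

def evaluate_array():
    """Placeholder: exercise the PE array; lower fitness is better."""
    return random.random()

def mutate(chromosome):
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if random.random() < MUTATION_RATE else b
                   for b in chromosome)

pool = ["".join(random.choice("01") for _ in range(FRAME_BITS))
        for _ in range(POOL_SIZE)]
for generation in range(100):
    scored = []
    for chromosome in pool:
        reconfigure_fpga(chromosome)       # the chromosome is the bitstream
        scored.append((evaluate_array(), chromosome))
    scored.sort()                          # smallest fitness first
    fittest = [c for _, c in scored[:POOL_SIZE // 2]]
    pool = fittest + [mutate(random.choice(fittest))
                      for _ in range(POOL_SIZE - len(fittest))]
```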
14
FPGA-Based Evolvable Hardware
15
A Case Study
In the paper “Towards Evolvable Systems Based on the Xilinx Zynq Platform,”
a case study was performed that demonstrated an evolving image filter
using a Zynq-7000.
• The filter consists of:
  • 9 Processing Elements (PEs)
  • 9 inputs, one for the pixel being filtered and 8 for its neighbors
  • 1 output, the filtered pixel
• A PE input can be connected to a filter input or to a PE output in the direction
of the filter inputs.
16
Chromosome Format
The PE gene is encoded as three 4-bit numbers:

Input A | Input B | Operation
 0000   |  0000   |   0000

The chromosome length would be
(3 numbers per PE × 3 × 3 PEs) + 1 = 28 numbers.
Each number is 4 bits, so the total would
be 4 × 28 = 112 bits.

Code | Operation    | Description
0000 | 255          | constant
0001 | x            | identity
0010 | 255 − x      | inversion
0011 | x ∨ y        | bitwise OR
0100 | ¬x ∨ y       | bitwise (NOT x) OR y
0101 | x ∧ y        | bitwise AND
0110 | ¬(x ∧ y)     | bitwise NAND
0111 | x ⊕ y        | bitwise XOR
1000 | x >> 1       | right shift by 1
1001 | x >> 2       | right shift by 2
1010 | swap(x, y)   | swap nibbles
1011 | x + y        | addition
1100 | x +s y       | addition with saturation
1101 | (x + y) >> 1 | average
1110 | max(x, y)    | maximum
1111 | min(x, y)    | minimum
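A PE's function set can be modeled in software roughly as below; treating the negated entries as 8-bit complements and the nibble swap as shown are assumptions about the paper's exact semantics:

```python
def pe_op(code, x, y):
    """Apply the 4-bit operation `code` to 8-bit inputs x and y."""
    ops = {
        0b0000: 255,                            # constant
        0b0001: x,                              # identity
        0b0010: 255 - x,                        # inversion
        0b0011: x | y,                          # bitwise OR
        0b0100: (255 - x) | y,                  # (NOT x) OR y
        0b0101: x & y,                          # bitwise AND
        0b0110: 255 - (x & y),                  # bitwise NAND
        0b0111: x ^ y,                          # bitwise XOR
        0b1000: x >> 1,                         # right shift by 1
        0b1001: x >> 2,                         # right shift by 2
        0b1010: ((x & 0x0F) << 4) | (y >> 4),   # swap nibbles (one reading)
        0b1011: (x + y) & 0xFF,                 # addition (wrapping)
        0b1100: min(x + y, 255),                # addition with saturation
        0b1101: (x + y) >> 1,                   # average
        0b1110: max(x, y),                      # maximum
        0b1111: min(x, y),                      # minimum
    }
    return ops[code]

print(pe_op(0b1100, 200, 100))   # 255: saturated add
```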
17
Filter Input
To evolve the filter, the inputs are the target pixel (141) and its eight neighbors.
18
Filter Evolution
The chromosome pool is filled.
The chromosomes are injected to configure the PEs.
The filter is given an input.
19
Filter Evolution
The output is calculated by the filter.
The results are evaluated by

fitness = Σ (i = 0 to c−1) Σ (j = 0 to r−1) |p(i, j) − p_original(i, j)|

where c and r are the image's column and row counts.
The most fit chromosomes are selected to create the next generation.
In this example, the fittest chromosomes are the ones with the smallest
fitness values, zero being perfect.
The process continues until a suitable filter is created.
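In software the same measure is a sum of absolute pixel differences; a minimal sketch over images stored as lists of rows:

```python
def filter_fitness(filtered, original):
    """Sum of absolute pixel differences; 0 means a perfect match."""
    return sum(abs(p - q)
               for f_row, o_row in zip(filtered, original)
               for p, q in zip(f_row, o_row))

# A 2 x 2 example with a total absolute error of 3.
print(filter_fitness([[140, 10], [7, 0]], [[141, 10], [5, 0]]))
```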
20
Paper Results
The experiments were focused on the time needed to evaluate a given
number of generations and did not address how well the filter worked.
Method | Mutations | Build/Eval per Individual (µs) | Build/Eval per Generation (µs) | Generations per Second (s⁻¹) | Acceleration
PS     |     7     | 225,285.3 | 901,141.1 |    1.1 |    1
i5     |     7     |  42,372.9 | 169,491.5 |    5.9 |    5
VRC    |     7     |     469.3 |    1877.2 |  532.7 |  484
DPR    |     1     |     206.2 |     824.8 | 1212.4 | 1102
DPR    |     2     |     247.2 |     988.8 | 1011.3 |  919
DPR    |     3     |     288.2 |    1152.8 |  867.5 |  789
DPR    |     4     |     329.2 |    1316.8 |  759.4 |  690
DPR    |     5     |     370.2 |    1480.8 |  675.3 |  614
DPR    |     6     |     411.2 |    1644.8 |  608.0 |  553
DPR    |     7     |     452.2 |    1808.8 |  552.9 |  503
DPR    |     8     |     493.2 |    1972.8 |  506.9 |  461
PS - Pure software running on the Zynq-7000's on-chip processor (~3 W)
i5 - Pure software running on an i5 @ 3.33 GHz (~80 W)
VRC - Virtual Reconfigurable Circuits, a workaround for the absence of partial reconfiguration in
early Virtex chips
DPR - Dynamic Partial Reconfiguration
21
Neurograph Networks
A neurograph network is a hybrid of an ANN and Evolvable Hardware.
They are structured as a high-level ANN and are trained in a similar
fashion to a perceptron.
The network structure and connections are evolved using the hardware
evolution techniques described earlier.
Small specialized networks are evolved and stored to be used by
themselves or combined into larger, more powerful networks.
The goal is to create massively parallel networks to predict,
recognize and solve problems.
22
Creating a Neurograph
23
5 x 5 Matrix Determinant
For a 5 x 5 matrix with entries a through y read row by row:
Determinant = a(g(m(sy - tx) - n(ry - tw) + o(rx - sw)) - h(l(sy - tx) - n(qy - tv) + o(qx - sv)) + i(l(ry - tw) - m(qy - tv) + o(qw - rv)) - j(l(rx - sw) - m(qx - sv) + n(qw - rv))) - b(f(m(sy - tx) - n(ry - tw) + o(rx - sw)) - h(k(sy - tx) - n(py - tu) + o(px - su)) + i(k(ry - tw) - m(py - tu) + o(pw - ru)) - j(k(rx - sw) - m(px - su) + n(pw - ru))) + c(f(l(sy - tx) - n(qy - tv) + o(qx - sv)) - g(k(sy - tx) - n(py - tu) + o(px - su)) + i(k(qy - tv) - l(py - tu) + o(pv - qu)) - j(k(qx - sv) - l(px - su) + n(pv - qu))) - d(f(l(ry - tw) - m(qy - tv) + o(qw - rv)) - g(k(ry - tw) - m(py - tu) + o(pw - ru)) + h(k(qy - tv) - l(py - tu) + o(pv - qu)) - j(k(qw - rv) - l(pw - ru) + m(pv - qu))) + e(f(l(rx - sw) - m(qx - sv) + n(qw - rv)) - g(k(rx - sw) - m(px - su) + n(pw - ru)) + h(k(qx - sv) - l(px - su) + n(pv - qu)) - i(k(qw - rv) - l(pw - ru) + m(pv - qu)))
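As a sanity check of the expansion (illustrative; assumes numpy is available), it can be compared against a library determinant for a random matrix whose entries are unpacked row by row into a through y:

```python
import numpy as np

M = np.random.default_rng(0).integers(-9, 10, (5, 5)).astype(float)
(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o,
 p, q, r, s, t, u, v, w, x, y) = M.flatten()

det = (a*(g*(m*(s*y-t*x) - n*(r*y-t*w) + o*(r*x-s*w))
        - h*(l*(s*y-t*x) - n*(q*y-t*v) + o*(q*x-s*v))
        + i*(l*(r*y-t*w) - m*(q*y-t*v) + o*(q*w-r*v))
        - j*(l*(r*x-s*w) - m*(q*x-s*v) + n*(q*w-r*v)))
     - b*(f*(m*(s*y-t*x) - n*(r*y-t*w) + o*(r*x-s*w))
        - h*(k*(s*y-t*x) - n*(p*y-t*u) + o*(p*x-s*u))
        + i*(k*(r*y-t*w) - m*(p*y-t*u) + o*(p*w-r*u))
        - j*(k*(r*x-s*w) - m*(p*x-s*u) + n*(p*w-r*u)))
     + c*(f*(l*(s*y-t*x) - n*(q*y-t*v) + o*(q*x-s*v))
        - g*(k*(s*y-t*x) - n*(p*y-t*u) + o*(p*x-s*u))
        + i*(k*(q*y-t*v) - l*(p*y-t*u) + o*(p*v-q*u))
        - j*(k*(q*x-s*v) - l*(p*x-s*u) + n*(p*v-q*u)))
     - d*(f*(l*(r*y-t*w) - m*(q*y-t*v) + o*(q*w-r*v))
        - g*(k*(r*y-t*w) - m*(p*y-t*u) + o*(p*w-r*u))
        + h*(k*(q*y-t*v) - l*(p*y-t*u) + o*(p*v-q*u))
        - j*(k*(q*w-r*v) - l*(p*w-r*u) + m*(p*v-q*u)))
     + e*(f*(l*(r*x-s*w) - m*(q*x-s*v) + n*(q*w-r*v))
        - g*(k*(r*x-s*w) - m*(p*x-s*u) + n*(p*w-r*u))
        + h*(k*(q*x-s*v) - l*(p*x-s*u) + n*(p*v-q*u))
        - i*(k*(q*w-r*v) - l*(p*w-r*u) + m*(p*v-q*u))))

assert np.isclose(det, np.linalg.det(M))   # the expansion matches
```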
24
[Figure: the determinant mapped onto a neurograph. The 25 inputs a through y feed 128 processing nodes (ne0 through ne127) arranged in alternating multiply (×), subtract (-), and add (+) layers that reduce to a single Output node.]
25
Project Goal
To implement key neurograph functions in software.
To determine the feasibility of implementing a neurograph network on an FPGA.
26