Thousands-to-millions_PaulFox

From tens to millions of neurons
How computer architecture can help
Paul Fox
Computer Architecture Group
What hinders the scaling of neural computation?
Neural Computation = Communication + Data Structures + Algorithms
But almost everybody ignores the first two!
What is Computer Architecture?
• Designing computer systems that are appropriate for their intended use
• Relevant design points for neural computation are:
• Memory hierarchy
• Type and number of processors
• Communication infrastructure
Just the things that existing approaches don’t consider!
Our approach
Bluehive system
• Vast communication and memory resources
• Reprogrammable hardware using FPGAs
Can explore different system designs and see what is most appropriate for neural computation
Organisation of data for spiking neural networks
[Diagram: three linked record types]
• Equation Parameters: V, U, A, B, C, D, Pointer
• Fan-out Tuples: Delay, Pointer
• Update Tuples: Neuron ID, Weight
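The three record types on this slide could be laid out in C roughly as follows. This is only a sketch: the field widths, the index-plus-count form of each "pointer", the ring buffer of delayed inputs, and all names (NeuronRecord, FanoutTuple, UpdateTuple, deliver_spike, MAX_DELAY, NUM_NEURONS) are assumptions made for illustration, not taken from the Bluehive design.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DELAY   16   /* assumed maximum delay, in time steps */
#define NUM_NEURONS 4    /* tiny network, for illustration only */

/* Update tuple: which neuron receives input, and how much. */
typedef struct {
    uint32_t neuron_id;
    float    weight;
} UpdateTuple;

/* Fan-out tuple: a delay plus a "pointer" to a group of update tuples.
 * The count field is an assumption; the slide shows only Delay and Pointer. */
typedef struct {
    uint32_t delay;
    uint32_t update_index;
    uint32_t update_count;
} FanoutTuple;

/* Equation parameters: neuron state (v, u), model parameters (a, b, c, d)
 * and a "pointer" to this neuron's fan-out tuples. */
typedef struct {
    float    v, u;
    float    a, b, c, d;
    uint32_t fanout_index;
    uint32_t fanout_count;   /* assumed, as above */
} NeuronRecord;

/* When neuron `src` spikes at time step `now`, follow its fan-out tuples
 * and add each weight to the target's input for the step at which the
 * spike will arrive. */
void deliver_spike(const NeuronRecord *src,
                   const FanoutTuple *fanout,
                   const UpdateTuple *updates,
                   uint32_t now,
                   float input[MAX_DELAY][NUM_NEURONS])
{
    for (uint32_t f = 0; f < src->fanout_count; f++) {
        const FanoutTuple *ft = &fanout[src->fanout_index + f];
        assert(ft->delay < MAX_DELAY);
        uint32_t slot = (now + ft->delay) % MAX_DELAY;
        for (uint32_t k = 0; k < ft->update_count; k++) {
            const UpdateTuple *ut = &updates[ft->update_index + k];
            assert(ut->neuron_id < NUM_NEURONS);
            input[slot][ut->neuron_id] += ut->weight;
        }
    }
}
```

Grouping update tuples by delay means that one spike touches a few contiguous runs of memory rather than scattered words, which matters when the memory bus is the bottleneck.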
First approach – Custom FPGA pipeline
[Diagram: pipeline blocks: Router Interface, Delay, Spike Auditor, Equation, Accumulator, Fanout, Spike Injector, Off-Chip Memory Interface]
Running 256k Neurons
First approach – Custom FPGA pipeline
• Real-time performance for at least 256k neurons over 4 boards
• Saturates memory bandwidth
• Plenty of FPGA area left, so could use a more complex neuron model
• But only if it doesn’t need more data
• But time-consuming, and not really usable by people who are not computer scientists
Can we use more area to make something that is easier to program but still attains performance approaching the custom pipeline?
Single scalar processor
[Diagram labels: DDR2 RAM (from 200MHz FPGA); data bus = 256 bits; block RAMs; data bus = any width; one processor]
One 32-bit transfer at a time
Multicore scalar processor
[Diagram labels: DDR2 RAM (from 200MHz FPGA); data bus = 256 bits; block RAMs; data bus = any width; several processors]
• Ruins spatial locality
• Inter-processor communication needed
Vector processor – many words at a time
[Diagram labels: DDR2 RAM (from 200MHz FPGA); data bus = 256 bits; block RAMs; data bus = any width; one vector processor]
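A 256-bit bus carries eight 32-bit words per transfer, so a scalar processor that moves one word at a time uses an eighth of it. The sketch below shows, in plain C, the loop shape that a vector processor could exploit: neuron fields are stored as separate contiguous arrays and processed in chunks of eight. It does not use the BlueVec interface, which is not shown on the slides; the names (NeuronArrays, izh_step_one, izh_step_chunked, LANES) are hypothetical, and the update is the standard Izhikevich model with a 1 ms forward-Euler step.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8   /* 256-bit bus / 32-bit words = 8 words per transfer */

/* Structure-of-arrays layout: each field is contiguous, so LANES
 * consecutive neurons can be fetched in one wide transfer. */
typedef struct {
    float *v, *u;                 /* state */
    const float *a, *b, *c, *d;   /* Izhikevich parameters */
} NeuronArrays;

/* One 1 ms forward-Euler step for neuron i. Returns 1 if it spiked. */
static int izh_step_one(NeuronArrays *p, size_t i, float i_in)
{
    float v = p->v[i], u = p->u[i];
    float nv = v + 0.04f * v * v + 5.0f * v + 140.0f - u + i_in;
    float nu = u + p->a[i] * (p->b[i] * v - u);
    int spiked = nv >= 30.0f;
    if (spiked) {          /* reset after a spike */
        nv = p->c[i];
        nu += p->d[i];
    }
    p->v[i] = nv;
    p->u[i] = nu;
    return spiked;
}

/* Update n neurons in chunks of LANES. On a vector processor the inner
 * loop would become a short sequence of vector instructions; here it is
 * ordinary scalar C. Returns the number of spikes. */
size_t izh_step_chunked(NeuronArrays *p, const float *i_in, size_t n,
                        unsigned char *spiked)
{
    assert(p && i_in && spiked);
    size_t count = 0;
    for (size_t base = 0; base < n; base += LANES) {
        size_t width = (n - base < LANES) ? (n - base) : LANES;
        for (size_t lane = 0; lane < width; lane++) {
            size_t i = base + lane;
            spiked[i] = (unsigned char)izh_step_one(p, i, i_in[i]);
            count += spiked[i];
        }
    }
    return count;
}
```

Every lane runs the same arithmetic on neighbouring words, so spatial locality is preserved and no inter-processor communication is needed, unlike the multicore arrangement.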
Productivity vs. Performance
[Plot: run time (s) against lines of code. Run-time ticks: 125, 2, 1. Lines-of-code ticks: 200, 500, 5k-10k. Points: Izhikevich.c (NIOS II); IzhikevichVec.c (dual-core NIOS II + BlueVec); Bluespec System Verilog, NeuronSimulator/*.bsv]
Vector version doesn’t have much more code than original code
Massive performance improvement
Example for LIF character recognition
                   LIF.c                  LIFVec.c
                   (324 lines of code)    (496 lines of code)
Stage              Time (ms)    %         Time (ms)    %
I-values           331.7        83        7.9          42
Gain/Bias          39.2         9         3.6          18
Neuron updates     26.8         6         5.8          30
Total              397.7                  18.9
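The three timed stages suggest a simulation step split into three loops. The sketch below is one plausible reading, not the contents of LIF.c: it assumes that "I-values" means a weighted sum of inputs per neuron, that "Gain/Bias" is a per-neuron affine scaling, and that the neuron update is a basic leaky integrate-and-fire step with threshold 1, reset to 0 and no refractory period. All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 1 ("I-values", as assumed here): weighted sum of the inputs for
 * each neuron. w is row-major, n_neurons x n_inputs. */
void compute_inputs(const float *w, const float *x, float *cur,
                    size_t n_neurons, size_t n_inputs)
{
    for (size_t i = 0; i < n_neurons; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n_inputs; j++)
            acc += w[i * n_inputs + j] * x[j];
        cur[i] = acc;
    }
}

/* Stage 2 ("Gain/Bias"): per-neuron scaling and offset of the input. */
void apply_gain_bias(float *cur, const float *gain, const float *bias,
                     size_t n)
{
    for (size_t i = 0; i < n; i++)
        cur[i] = gain[i] * cur[i] + bias[i];
}

/* Stage 3 ("Neuron updates"): leaky integrate-and-fire step.
 * Returns the number of neurons that spiked. */
size_t lif_update(float *v, const float *cur, unsigned char *spiked,
                  size_t n, float dt, float tau)
{
    assert(tau > 0.0f);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        v[i] += (dt / tau) * (cur[i] - v[i]);
        spiked[i] = v[i] >= 1.0f;
        if (spiked[i]) {
            v[i] = 0.0f;   /* reset after a spike */
            count++;
        }
    }
    return count;
}
```

Under these assumptions stage 1 does work proportional to neurons times inputs, while the other two are linear in the number of neurons. That is consistent with I-values taking 83% of the scalar run time in the table.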
LIF simulator on FPGA running a Nengo model
Conclusion
• When designing a neural computation system, you need to think about every part of the computation, not just the algorithm
• Some form of vector processor is likely to be most appropriate
Or write your model in NeuroML and let us do the hard work!
Questions?