Transcript Slide 1
Computing beyond a Million Processors
- bio-inspired massively-parallel architectures

Andrew Brown, University of Southampton ([email protected])
Steve Furber, The University of Manchester ([email protected])

SBF is supported by a Royal Society-Wolfson Research Merit Award

ACACES, 12 July 2009
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Multi-core CPUs
• High-end uniprocessors
– diminishing returns from complexity
– wire vs transistor delays
• Multi-core processors
– cut-and-paste
– simple way to deliver more MIPS
• Moore’s Law
– more transistors
– more cores
… but what about the software?
Multi-core CPUs
• General-purpose parallelization
– an unsolved problem
– the ‘Holy Grail’ of computer science for half a century?
– but imperative in the many-core world
• Once solved
– few complex cores, or many simple cores?
– simple cores win hands-down on power-efficiency!
Back to the future
• Imagine…
– a limitless supply of (free) processors
– load-balancing is irrelevant
– all that matters is:
• the energy used to perform a computation
• formulating the problem to avoid synchronisation
• abandoning determinism
• How might such systems work?
Bio-inspiration
• How can massively parallel computing resources accelerate our understanding of brain function?
• How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?
Outline
• Computer Architecture Perspective
• Building Brains
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Building brains
• Brains demonstrate
  – massive parallelism (10^11 neurons)
  – massive connectivity (10^15 synapses)
  – excellent power-efficiency
    • much better than today's microchips
  – low-performance components (~100 Hz)
  – low-speed communication (~ metres/sec)
  – adaptivity – tolerant of component failure
  – autonomous learning
Neurons
• Multiple inputs (dendrites)
• Single output (axon)
  – digital "spike"
  – fires at 10s to 100s of Hz
  – output connects to many targets
• Synapse at input/output connection
[Figure: membrane potential trace of a spiking neuron, mV (−80 to +40) vs. time (0–200 ms)]
(www.ship.edu/~cgboeree/theneuron.html)
Neurons
• A flexible biological control component
  – very simple animals have a handful
  – bees: 850,000
  – humans: 10^11
(photo courtesy of the Brain Mind Institute, EPFL)
Neurons
• Regular high-level structure
  – e.g. 6-level cortical microarchitecture
  – low-level vision, to language, etc.
• Random low-level structure
  – adapts over time
(faculty.washington.edu/rhevner/Miscellany.html)
Neural Computation
• To compute we need:
  – Processing
  – Communication
  – Storage
• Processing: abstract model
  – linear sum of weighted inputs: y = f(Σᵢ wᵢ xᵢ)
    [Figure: inputs x1–x4, weights w1–w4, output function f, output y]
    • ignores non-linear processes in dendrites
  – non-linear output function
  – learn by adjusting synaptic weights
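The abstract model above boils down to y = f(Σᵢ wᵢ xᵢ). Below is a minimal C sketch of that sum-and-squash step; the four inputs, the weight values and the logistic choice of f are illustrative assumptions, not anything specified in the slides.

```c
#include <math.h>
#include <stdio.h>

/* Abstract neuron: weighted sum of inputs passed through a non-linear
 * output function. The logistic squash is just one illustrative choice
 * of f. */
static double neuron_output(const double *x, const double *w, int n)
{
    double activation = 0.0;
    for (int i = 0; i < n; i++)
        activation += w[i] * x[i];          /* sum of weighted inputs */
    return 1.0 / (1.0 + exp(-activation));  /* non-linear output f    */
}

int main(void)
{
    double x[4] = {1.0, 0.0, 1.0, 1.0};     /* example inputs  x1..x4 */
    double w[4] = {0.5, -0.3, 0.8, 0.1};    /* example weights w1..w4 */
    printf("y = %f\n", neuron_output(x, w, 4));
    return 0;
}
```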
Processing
• Leaky integrate-and-fire model
  – inputs are a series of spikes: $x_i(t) = \sum_k \delta(t - t_i^k)$
  – total input is a weighted sum of the spikes: $I = \sum_i w_i x_i$
  – neuron activation is the input with a "leaky" decay: $\dot{A} = -A/\tau_A + I$
  – when activation exceeds threshold, output fires: if $A > A_\theta$, fire and set $A \to 0$
  – habituation, refractory period, …?
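A minimal C sketch of the leaky integrate-and-fire update reconstructed above, stepped with a simple Euler rule; the time constant, threshold, time step and toy input spike trains are illustrative assumptions.

```c
#include <stdio.h>

#define N_INPUTS 4

/* Leaky integrate-and-fire, Euler-stepped at dt:
 *   A += dt * (-A/tau + I);  if A > A_theta: fire, A = 0.
 * tau, A_theta, dt and the weights/inputs are illustrative values. */
int main(void)
{
    double w[N_INPUTS] = {0.8, 0.5, 0.3, 0.9};
    double A = 0.0;                 /* activation              */
    const double tau = 10.0;        /* leak time constant (ms) */
    const double A_theta = 1.0;     /* firing threshold        */
    const double dt = 1.0;          /* time step (ms)          */

    for (int t = 0; t < 100; t++) {
        double I = 0.0;
        for (int i = 0; i < N_INPUTS; i++) {
            int spiked = (t % (3 + i)) == 0;   /* toy input spike trains */
            I += w[i] * spiked;
        }
        A += dt * (-A / tau + I);   /* leaky integration */
        if (A > A_theta) {          /* threshold crossing: emit a spike */
            printf("spike at t = %d ms\n", t);
            A = 0.0;
        }
    }
    return 0;
}
```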
Processing
(www.izhikevich.com)
• Izhikevich model
  – two variables, one fast, one slow:
    $\dot{v} = 0.04v^2 + 5v + 140 - u + I$
    $\dot{u} = a(bv - u)$
  – neuron fires when $v > 30$; then: $v \leftarrow c$, $u \leftarrow u + d$
  – a, b, c & d select behaviour
[Figure: traces of v (−80 to +40) and u (−14 to +2) over 0–200 ms]
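A minimal C sketch of the Izhikevich update above, using a simple 1 ms Euler step; the parameters a, b, c, d are the published "regular spiking" set, and the constant input current is an illustrative assumption.

```c
#include <stdio.h>

/* Izhikevich model, simple 1 ms Euler step:
 *   v' = 0.04v^2 + 5v + 140 - u + I
 *   u' = a(bv - u)
 *   if v >= 30 mV: v <- c, u <- u + d
 * a, b, c, d are the published 'regular spiking' parameters;
 * the constant input current I is an illustrative choice. */
int main(void)
{
    const double a = 0.02, b = 0.2, c = -65.0, d = 8.0;
    double v = -65.0, u = b * v;    /* resting state          */
    const double I = 10.0;          /* constant input current */

    for (int t = 0; t < 200; t++) {
        v += 0.04 * v * v + 5.0 * v + 140.0 - u + I;
        u += a * (b * v - u);
        if (v >= 30.0) {            /* spike: reset v and bump u */
            printf("spike at t = %d ms\n", t);
            v = c;
            u += d;
        }
    }
    return 0;
}
```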
Communication
• Spikes
  – biological neurons communicate principally via 'spike' events
  – asynchronous
  – information is only:
    • which neuron fires, and
    • when it fires
[Figure: membrane potential trace of a spike, mV (−80 to +40) vs. time (0–200 ms)]
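Since a spike carries only "which neuron" and "when", it can be represented as a bare address event. The struct below is an illustrative address-event sketch, not the actual SpiNNaker packet format; the timestamp field exists here only for logging, since in a real-time system "when" is implicit in the arrival time.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative address-event representation of a spike: the event
 * itself identifies only the source neuron. This is a sketch, not
 * the SpiNNaker packet format. */
typedef struct {
    uint32_t neuron_id;   /* which neuron fired                 */
    uint32_t time_ms;     /* when it fired (for logging only)   */
} spike_event_t;

static void emit_spike(uint32_t neuron_id, uint32_t time_ms)
{
    spike_event_t ev = { neuron_id, time_ms };
    printf("neuron %u fired at %u ms\n", ev.neuron_id, ev.time_ms);
}

int main(void)
{
    emit_spike(42, 17);
    return 0;
}
```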
Storage
• Synaptic weights
– stable over long periods of time
• with diverse decay properties?
– adaptive, with diverse rules
• Hebbian, anti-Hebbian, LTP, LTD, ...
• Axon ‘delay lines’
• Neuron dynamics
– multiple time constants
• Dynamic network states
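As a concrete illustration of compact synaptic storage, a weight, an axonal delay and a target index can be packed into a single 32-bit word. The field widths below are assumptions made for this sketch, not the SpiNNaker memory layout.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative packing of one synapse into a 32-bit word:
 *   [31:16] weight (16-bit fixed point)
 *   [15:12] axonal delay in ms (4 bits)
 *   [11:0]  target neuron index (12 bits)
 * Field widths are assumptions for the sketch, not the SpiNNaker format. */
static uint32_t pack_synapse(uint16_t weight, uint8_t delay_ms, uint16_t target)
{
    return ((uint32_t)weight << 16) |
           ((uint32_t)(delay_ms & 0xF) << 12) |
           (uint32_t)(target & 0xFFF);
}

int main(void)
{
    uint32_t s = pack_synapse(0x0123, 3, 517);
    printf("weight=%u delay=%u target=%u\n",
           (unsigned)(s >> 16), (unsigned)((s >> 12) & 0xF),
           (unsigned)(s & 0xFFF));
    return 0;
}
```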
Outline
• Building Brains
• Computer Architecture Perspective
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
The Good News...
[Chart: transistors per Intel chip (millions, log scale) vs. year, 1970–2000 – 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium II, Pentium III, Pentium 4]
...and the Bad News
• Device variability
• Component failure
[Figure: Vout2 (V) vs. Vout1 (V), 0.0–1.0 V]
Atomic-scale devices
[Figure: atomistic device simulations – "the simulation paradigm now"]
• A 22 nm MOSFET – in production 2008
• A 4.2 nm MOSFET – in production 2023
A view from Intel
• The Good News:
  – we will have 100 billion transistor ICs
• The Bad News:
  – billions will fail in manufacture
    • unusable due to parameter variations
  – billions more will fail over the first year of operation
    • intermittent and permanent faults
(Shekhar Borkar, Intel Fellow)
A view from Intel
• Conclusions:
  – one-time production test will be out
  – burn-in to catch infant mortality will be impractical
  – test hardware will be an integral part of the design
  – dynamically self-test, detect errors, reconfigure, adapt, ...
(Shekhar Borkar, Intel Fellow)
Outline
• Building Brains
• Computer Architecture Perspective
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Design principles
• Virtualised topology
  – physical and logical connectivity are decoupled
• Bounded asynchrony
  – time models itself
• Energy frugality
  – processors are free
  – the real cost of computation is energy
Outline
• Building Brains
• Computer Architecture Perspective
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
SpiNNaker project
• Multi-core CPU node
  – 20 ARM968 processors
  – to model large-scale systems of spiking neurons
• Scalable up to systems with 10,000s of nodes
  – over a million processors
  – >10^8 MIPS total
• Power ~25 mW/neuron
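Rough arithmetic behind those headline figures, assuming ~5×10^4 nodes and ~200 MIPS per ARM968 core (both assumptions; the slide gives only orders of magnitude):

```latex
5\times10^{4}\ \text{nodes} \times 20\ \text{cores/node} = 10^{6}\ \text{cores},
\qquad
10^{6}\ \text{cores} \times 200\ \text{MIPS/core} = 2\times10^{8}\ \text{MIPS}
```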
SpiNNaker project
SpiNNaker project
• Fault-tolerant architecture for large-scale neural modelling
• A billion neurons in real time
• A step-function increase in the scale of neural computation
• Cost- and energy-efficient
SpiNNaker system
CMP node
ARM968 subsystem
GALS organization
• clocked IP blocks
• self-timed interconnect
• self-timed inter-chip links
Outline
• Building Brains
• Computer Architecture Perspective
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Circuit-level concurrency
• Delay-insensitive comms
  – 3-of-6 RTZ on chip
  – 2-of-7 NRZ off chip
• Deadlock resistance
  – Tx & Rx circuits have high deadlock immunity
  – Tx & Rx can be reset independently
    • each injects a token at reset
    • true transition detector filters surplus token
[Diagram: Tx–Rx link with data and ack signals; din/dout; 2-phase and 4-phase handshakes; ¬reset, ¬ack]
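A quick check of the symbol spaces behind those delay-insensitive codes (a sketch assuming only the standard binomial count): a 3-of-6 code has C(6,3) = 20 codewords and a 2-of-7 code has C(7,2) = 21, both comfortably enough for 16 data symbols plus control.

```c
#include <stdio.h>

/* Number of m-of-n codewords = C(n, m): a codeword is exactly a
 * choice of which m of the n wires carry an event. */
static unsigned long choose(unsigned n, unsigned m)
{
    unsigned long result = 1;
    for (unsigned i = 1; i <= m; i++)
        result = result * (n - m + i) / i;   /* running value stays integral */
    return result;
}

int main(void)
{
    printf("3-of-6 codewords: %lu\n", choose(6, 3));  /* 20 */
    printf("2-of-7 codewords: %lu\n", choose(7, 2));  /* 21 */
    return 0;
}
```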
System-level concurrency
• Breaking symmetry
  – any processor can be Monitor Processor
    • local 'election' on each chip, after self-test
  – all nodes are identical at start-up
    • addresses are computed relative to the node with the host connection (0,0)
  – system initialised using flood-fill
    • nearest-neighbour packet type
    • boot time (almost) independent of system scale
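One way the local election described above could work is sketched below: every core that passes self-test races to claim a shared flag, and the first claimant becomes Monitor Processor. The shared flag and the test-and-set primitive are hypothetical stand-ins for whatever mutual-exclusion hardware the chip provides; the sketch only illustrates the symmetry-breaking idea.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical shared per-chip flag; on real hardware this would be a
 * mutual-exclusion register or a locked read-modify-write. */
static volatile bool monitor_claimed = false;

static bool test_and_set(volatile bool *flag)
{
    /* Placeholder for an atomic primitive (e.g. a locked swap);
     * not actually atomic in this sequential sketch. */
    bool old = *flag;
    *flag = true;
    return old;
}

/* Each core runs this after self-test; exactly one claimant wins. */
static void elect_monitor(int core_id, bool self_test_passed)
{
    if (!self_test_passed)
        return;                               /* faulty cores never stand */
    if (!test_and_set(&monitor_claimed))
        printf("core %d is Monitor Processor\n", core_id);
    else
        printf("core %d is an application core\n", core_id);
}

int main(void)
{
    for (int core = 0; core < 20; core++)
        elect_monitor(core, core != 7 /* pretend core 7 failed self-test */);
    return 0;
}
```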
Application-level concurrency
• Event-driven real-time software
  – spike packet arrived
    • initiate DMA
  – DMA of synaptic data completed
    • process inputs
    • insert axonal delay
  – 1 ms Timer interrupt
[Diagram: core sleeps until an event – Packet Received interrupt → fetch_Synaptic_Data() (priority 1); DMA Completion interrupt → update_Stimulus() (priority 2); Millisecond Timer interrupt → update_Neurons() (priority 3); then goto_Sleep()]
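The event-driven structure in the diagram above can be sketched as three interrupt handlers around a sleep loop. The handler names follow the slide (fetch_Synaptic_Data, update_Stimulus, update_Neurons, goto_Sleep); everything else is an illustrative skeleton, not the actual SpiNNaker runtime code.

```c
/* Illustrative event-driven skeleton following the slide's diagram.
 * On SpiNNaker these would be real interrupt handlers; here they are
 * plain functions so the structure is visible. */

static void fetch_Synaptic_Data(void) { /* priority 1: start DMA of synaptic data */ }
static void update_Stimulus(void)     { /* priority 2: apply inputs, insert axonal delay */ }
static void update_Neurons(void)      { /* priority 3: advance neuron state by 1 ms */ }
static void goto_Sleep(void)          { /* wait for the next interrupt (e.g. ARM WFI) */ }

/* Packet-received interrupt: a spike has arrived. */
void packet_received_isr(void) { fetch_Synaptic_Data(); }

/* DMA-completion interrupt: synaptic data is now in local memory. */
void dma_done_isr(void)        { update_Stimulus(); }

/* 1 ms timer interrupt: step all neurons owned by this core. */
void timer_isr(void)           { update_Neurons(); }

int main(void)
{
    for (;;)
        goto_Sleep();   /* all real work happens in the handlers above */
}
```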
Application-level concurrency
• Cross-system delay << 1 ms
  – hardware routing
  – 'emergency' routing
    • failed links
    • congestion
  – if all else fails
    • drop packet
Biological concurrency
• Firing rate population codes
  – N neurons
  – diverse tuning
  – collective coding of a physical parameter
  – accuracy scales as √N
  – robust to neuron failure
[Figure: tuning curves – firing rate (0–500 Hz) vs. parameter (−1 to +1)]
(Neural Engineering, Eliasmith & Anderson 2003)
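A sketch of how such a firing-rate population code can be read out: estimate the encoded parameter as the rate-weighted average of each neuron's preferred value. The Gaussian tuning curves and the centre-of-mass decoder below are illustrative choices, not the specific scheme of Eliasmith & Anderson.

```c
#include <math.h>
#include <stdio.h>

#define N_NEURONS 64

/* Illustrative Gaussian tuning curve: each neuron fires fastest near
 * its preferred parameter value. */
static double rate(double preferred, double x)
{
    double d = x - preferred;
    return 100.0 * exp(-d * d / 0.02);   /* peak 100 Hz, width ~0.1 */
}

int main(void)
{
    const double x_true = 0.3;           /* parameter being encoded */
    double num = 0.0, den = 0.0;

    for (int i = 0; i < N_NEURONS; i++) {
        /* preferred values spread evenly over [-1, 1] */
        double pref = -1.0 + 2.0 * i / (N_NEURONS - 1);
        double r = rate(pref, x_true);   /* this neuron's firing rate   */
        num += r * pref;                 /* rate-weighted preferred value */
        den += r;
    }
    printf("decoded parameter: %f\n", num / den);   /* ~0.3 */
    return 0;
}
```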
Biological concurrency
• Single spike/neuron codes
  – choose N to fire from a population of M
  – order of firing may or may not matter

Number of codes:
                       Unordered N-of-M   Ordered N-of-M   M-bit binary
  formula              C(M,N)             M!/(M−N)!        2^M
  e.g. M=100, N=20     ~10^21             ~10^39           ~10^30
  e.g. M=1000, N=200   ~10^216            ~10^591          ~10^301
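The counts in the table follow from elementary combinatorics (only the standard formulas are assumed); for M = 100, N = 20 the three columns work out as:

```latex
\binom{100}{20} \approx 5\times10^{20}, \qquad
\frac{100!}{80!} \approx 1.3\times10^{39}, \qquad
2^{100} \approx 1.3\times10^{30}
```

which round to the ~10^21, ~10^39 and ~10^30 shown in the table.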
Outline
• Building Brains
• Computer Architecture Perspective
• Living with Failure
• Design Principles
• The SpiNNaker system
• Concurrency
• Conclusions
Software progress
• ARM SoC Designer SystemC model
• 4-chip x 2-CPU top-level Verilog model
• Running:
  • boot code
  • Izhikevich model
  • PDP2 codes
• Basic configuration flow
• …it all works!
Where might this lead?
• Robots
  – iCub EU project
  – open humanoid robot platform
  – mechanics, but no brain!
Conclusions
• Many-core processing is coming
  – soon we will have far more processors than we can program
• When (if?) we crack parallelism…
  – more small processors are better than fewer large processors
  – synchronization, coherent global memory and determinism are all impediments
• Biology suggests a way forward!
  – but we need new theories of biological concurrency
UoM SpiNNaker team