Asynchronous Circuits
Jordi Cortadella
Universitat Politècnica de Catalunya, Barcelona
Collège de France
May 14th, 2013
Goals
• Convince ourselves that:
– designing an asynchronous circuit is easy
– synchronous and asynchronous circuits are similar
– asynchronous circuits bring new advantages
• Not to discourage designers with exotic and sophisticated asynchronous schemes
Collège de France 2013
Asynchronous circuits
Clocking
• How to distribute the clock?
• How to determine the clock frequency?
• How to implement robust communications?
• How to reduce and manage energy?
Nvidia Kepler™ GK110: 28 nm, 7.1B transistors, 550 mm², 2688 CUDA cores
Base clock: 836 MHz, memory clock: 6 GHz
Synchronous circuits
Synchronous circuit
[Figure: combinational logic between two banks of flip-flops, clocked by a PLL]
Synchronous circuit
[Figure: PLL-clocked circuit with combinational logic (CL); path 1 is the launching path, path 2 is the capturing path]
Two competing paths:
• Launching path
• Capturing path
Launching path < Capturing path + Period
CLKtree + CL < CLKtree + Period
CL < Period (no clock skew)
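The inequality above can be checked numerically. This is a minimal sketch: the function name `meets_setup` and all delay values (in picoseconds) are illustrative assumptions, not figures from the talk.

```python
def meets_setup(launch_clk_tree, comb_logic, capture_clk_tree, period):
    """Return True if launching path < capturing path + period."""
    launching_path = launch_clk_tree + comb_logic
    capturing_path = capture_clk_tree
    return launching_path < capturing_path + period


# With no clock skew both clock-tree delays are equal, so the check
# reduces to CL < Period.
print(meets_setup(launch_clk_tree=300, comb_logic=900,
                  capture_clk_tree=300, period=1000))   # True
print(meets_setup(launch_clk_tree=300, comb_logic=1100,
                  capture_clk_tree=300, period=1000))   # False
```

Giving the launching clock branch a longer delay than the capturing one (skew) tightens the constraint in the same way as a longer CL.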
Source-synchronous
[Figure: pipeline driven by a CLK generator; the launching path goes through the logic and the capturing path goes through matched delays]
• No global clock required
• More tolerance to PVT variations
• Period > longest combinational path
• Good for acyclic pipelines
Source-synchronous with forks and joins
[Figure: CLK generator driving a pipeline with forks and joins; the join point is marked "?"]
How to synchronize incoming events?
C element (Muller 1959)
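[Figure: C element symbol with inputs A and B and output C]
A  B | C
0  0 | 0
0  1 | C (previous value held)
1  0 | C (previous value held)
1  1 | 1
[Figure: waveforms of A, B, and C]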
C element (Muller 1959)
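[Figure: C element built from a majority gate (MAJ) with inputs A and B and the output C fed back]
(many implementations exist)
A  B | C
0  0 | 0
0  1 | C (previous value held)
1  0 | C (previous value held)
1  1 | 1
[Figure: waveforms of A, B, and C]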
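The truth table can be modelled in a few lines. This is a behavioural sketch of the majority-gate implementation, assuming zero gate delay; the class name `CElement` is illustrative.

```python
class CElement:
    """Two-input Muller C element modelled as a majority gate with feedback."""

    def __init__(self, initial=0):
        self.c = initial

    def update(self, a, b):
        # MAJ(a, b, c): the output follows the inputs when they agree
        # and keeps its previous value when they differ.
        self.c = (a & b) | (a & self.c) | (b & self.c)
        return self.c


gate = CElement()
trace = [gate.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The output rises only after both inputs have risen and falls only after both have fallen, which is what makes the gate usable for synchronizing events.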
Multi-input C element
[Figure: tree of C elements combining the inputs a1 to a7 into a single output c]
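A tree of this kind can be sketched as follows. The class name `CTree` and the pairing order of the inputs are assumptions made for illustration; each node is a two-input C element with its own stored output.

```python
class CTree:
    """n-input C element built as a tree of two-input C elements."""

    def __init__(self, n_inputs):
        # One stored output bit per tree node; a tree over n leaves
        # has n - 1 two-input nodes.
        self.state = [0] * (n_inputs - 1)

    def update(self, inputs):
        level, node = list(inputs), 0
        while len(level) > 1:
            nxt = []
            for i in range(0, len(level) - 1, 2):
                a, b, c = level[i], level[i + 1], self.state[node]
                self.state[node] = (a & b) | (a & c) | (b & c)
                nxt.append(self.state[node])
                node += 1
            if len(level) % 2:        # an unpaired signal passes to the next level
                nxt.append(level[-1])
            level = nxt
        return level[0]


tree = CTree(7)
print(tree.update([1, 1, 1, 1, 1, 1, 0]))  # 0: one input is still low
print(tree.update([1, 1, 1, 1, 1, 1, 1]))  # 1: all inputs are high
print(tree.update([0, 1, 1, 1, 1, 1, 1]))  # 1: output held until all inputs fall
print(tree.update([0, 0, 0, 0, 0, 0, 0]))  # 0
```

The tree behaves like a single multi-input C element as long as the inputs follow the usual discipline: all of them rise before any of them falls, and vice versa.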
Completion detection
Completion detection
[Figure: CLK generator with a fixed delay in parallel with the logic]
The fixed delay must be longer than the
worst-case logic delay (plus variability)
Q: Could we detect that a computation has completed as soon as it finishes?
Delay-insensitive codes: Dual Rail
• Dual rail: every bit encoded with two signals
A.t  A.f | A
0    0   | Spacer (SP)
0    1   | 0
1    0   | 1
1    1   | Not used
[Figure: waveforms of A.t, A.f, and A; every valid value (0 or 1) is separated from the next one by a spacer (SP)]
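The encoding table translates directly into code. A minimal sketch, with each bit held as a `(A.t, A.f)` tuple; the names `encode`, `decode`, and `SPACER` are illustrative.

```python
SPACER = (0, 0)


def encode(bit):
    """Return the (A.t, A.f) pair for a Boolean value."""
    return (1, 0) if bit else (0, 1)


def decode(rails):
    """Return 0 or 1 for a valid code, None for the spacer."""
    t, f = rails
    if (t, f) == (1, 1):
        raise ValueError("code 11 is not used")
    if (t, f) == SPACER:
        return None
    return t


print(encode(1), encode(0))            # (1, 0) (0, 1)
print(decode((0, 1)), decode(SPACER))  # 0 None
```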
Dual-Rail AND gate
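A   B  | C
SP  SP | SP
0   -  | 0
-   0  | 0
SP  1  | SP
1   SP | SP
1   1  | 1
[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f, next to the single-rail AND symbol with inputs A, B and output C]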
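One gate-level realization that reproduces this table is C.t = A.t AND B.t, C.f = A.f OR B.f. The sketch below assumes that realization, with each signal held as a `(t, f)` tuple; `dr_and` is an illustrative name.

```python
def dr_and(a, b):
    """Dual-rail AND on (t, f) pairs: C.t = A.t AND B.t, C.f = A.f OR B.f."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


SP, ZERO, ONE = (0, 0), (0, 1), (1, 0)

print(dr_and(ZERO, SP))   # (0, 1): a single 0 input already decides the output
print(dr_and(SP, ONE))    # (0, 0): still the spacer, the result is not known yet
print(dr_and(ONE, ONE))   # (1, 0)
```

The first call corresponds to the "0 -" row of the table: the output becomes valid before the second input arrives.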
Dual-Rail Inverter
A  | Z
SP | SP
0  | 1
1  | 0
[Figure: dual-rail inverter with inputs A.t, A.f and outputs Z.t, Z.f; the two rails are swapped]
Dual-Rail AND/OR gate
[Figure: dual-rail implementations of the AND gate and the OR gate, each with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
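The duality between the AND and OR gates can be checked exhaustively. This sketch assumes the usual realizations (the inverter swaps the rails, AND and OR exchange the roles of the true and false rails); the function names are illustrative.

```python
def dr_inv(a):
    """Dual-rail inverter: swap the true and false rails."""
    t, f = a
    return (f, t)


def dr_and(a, b):
    """Dual-rail AND: C.t = A.t AND B.t, C.f = A.f OR B.f."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)


def dr_or(a, b):
    """Dual-rail OR: C.t = A.t OR B.t, C.f = A.f AND B.f."""
    (at, af), (bt, bf) = a, b
    return (at | bt, af & bf)


codes = [(0, 0), (0, 1), (1, 0)]   # spacer, 0, 1
# De Morgan holds rail by rail, including for the spacer.
print(all(dr_or(a, b) == dr_inv(dr_and(dr_inv(a), dr_inv(b)))
          for a in codes for b in codes))   # True
```

Because inversion costs no gates, an OR gate is just an AND gate with the rails of its inputs and output swapped.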
Dual rail: completion detection
[Figure: dual-rail logic block; every input and output bit alternates between the spacer (00) and a valid code (01 or 10)]
Dual rail: completion detection
[Figure: the outputs of the dual-rail logic feed a completion detection tree whose C element produces the signal done]
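A common way to build such a detector is one OR gate per dual-rail bit followed by a C element over all the bits. The sketch below assumes that scheme and collapses the tree into a single n-input C element; the names `c_element` and `done` are illustrative.

```python
def c_element(inputs, prev):
    """n-input C element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def done(word, prev):
    """Completion signal for a list of dual-rail bits given as (t, f) pairs."""
    valid = [t | f for t, f in word]   # per-bit OR detects a non-spacer code
    return c_element(valid, prev)


d = 0
for word in [[(0, 0), (0, 0)],    # all spacers
             [(1, 0), (0, 0)],    # partially valid
             [(1, 0), (0, 1)],    # all valid
             [(0, 0), (0, 1)],    # partially reset
             [(0, 0), (0, 0)]]:   # all spacers again
    d = done(word, d)
    print(d)                      # prints 0, 0, 1, 1, 0
```

The signal rises only when every bit is valid and falls only when every bit has returned to the spacer, so it marks both the end of the computation and the end of the reset phase.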
Dual rail: completion detection
[Figure: dual-rail netlist of AND, INV, and OR gates driven by a CLK generator]
Dual rail: completion detection
[Figure: the same netlist of AND, INV, and OR gates, now with two C elements]
Single rail data vs. dual rail
Some back-of-the-envelope estimations:
              Area   Delay   Static power   Dynamic power
Single rail   1      1       1              < 0.2
Dual rail     2      << 1    2              2
Dual rail:
• Good for speed
• Large area
• High power consumption
Handshaking
Handshaking
[Figure: CLK generator driving a circuit with an unknown delay]
Assume that the source module can provide data at any rate:
• When should the CLK generator send an event if the
internal delays of the circuit are unknown?
Solution: handshaking
Handshaking
[Figure: two modules connected by Data, Request, and Acknowledge wires]
• Request: "I have data"
• Acknowledge: "I want data"
Asynchronous elastic pipeline
[Figure: chain of four C elements; ReqIn and AckOut on the input side, ReqOut and AckIn on the output side]
• David Muller’s pipeline (late 50’s)
• Sutherland’s Micropipelines (Turing Award lecture, 1989)
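The control behaviour of such a pipeline can be sketched with the standard Muller structure, in which each stage is a C element fed by the previous stage and by the inverted output of the next stage. That structure, the zero-delay settling loop, and the names `c_element` and `settle` are assumptions of this sketch.

```python
def c_element(a, b, prev):
    """Two-input C element: follow the inputs when equal, otherwise hold."""
    return (a & b) | (a & prev) | (b & prev)


def settle(stages, req_in, ack_in):
    """Update every stage until no output changes; return the stage outputs."""
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_in if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i], changed = new, True
    return stages


s = settle([0, 0, 0, 0], req_in=1, ack_in=0)
print(s)                                  # [1, 1, 1, 1]: the event reaches ReqOut
s = settle(s, req_in=0, ack_in=0)
print(s)                                  # [0, 0, 0, 1]: last stage waits for AckIn
print(settle(s, req_in=0, ack_in=1))      # [0, 0, 0, 0]
```

In this model ReqOut is the output of the last stage and AckOut is the output of the first one; an event stays stored in the last stage until the receiver acknowledges it, which is the elastic behaviour the slide refers to.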
Multiple inputs and outputs
Channel-based communication
• A channel contains data and handshake wires
[Figure: a channel drawn as a single arrow and expanded into its Data, Req, and Ack wires]
Two-phase protocol
[Figure: waveforms of Req, Ack, and Data (Data 1, Data 2, Data 3); each Req edge followed by an Ack edge is one data transfer]
• Every edge is active
• It may require double-edge triggered flip-flops or pulse generators
Four-phase protocol
[Figure: waveforms of Req, Ack, and Data (Data 1, Data 2, Data 3); each data transfer takes a rising and a falling edge on both Req and Ack]
• Valid data on the active edge of Req
• Req/Ack must return to zero before the next transfer
• Different variations of the 4-phase protocol exist
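The difference between the two-phase and four-phase protocols shows up in the number of signal edges per transfer. The sketch below lists the (Req, Ack, Data) values after every edge. It models only one of the possible four-phase variants, in which data is treated as invalid during the return-to-zero phase; that choice and the function names are assumptions.

```python
def two_phase(items):
    """Trace of (req, ack, data) after every edge of a 2-phase protocol."""
    req = ack = 0
    trace = []
    for data in items:
        req ^= 1                      # any Req edge announces new data
        trace.append((req, ack, data))
        ack ^= 1                      # any Ack edge confirms reception
        trace.append((req, ack, data))
    return trace


def four_phase(items):
    """Trace of (req, ack, data) after every edge of a 4-phase protocol."""
    trace = []
    for data in items:
        trace.append((1, 0, data))    # Req+ : data is valid
        trace.append((1, 1, data))    # Ack+ : data received
        trace.append((0, 1, None))    # Req- : return to zero
        trace.append((0, 0, None))    # Ack- : ready for the next transfer
    return trace


items = ["Data 1", "Data 2", "Data 3"]
print(len(two_phase(items)))    # 6 edges: two per transfer
print(len(four_phase(items)))   # 12 edges: four per transfer
```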
How to memorize?
[Figure: two latches (L) around the combinational logic, with a delay and two C elements in the control path; the latch control signals are marked "?"]
2-phase or 4-phase?
How to memorize?
[Figure: 2-phase solution: two latches (L) around the combinational logic, controlled by C elements through a pulse generator, with a delay in the control path]
How to memorize?
[Figure: 4-phase solution: two latches (L) around the combinational logic, controlled by C elements, with a delay in the control path]
Performance analysis
Ring oscillators
[Figure: network of C elements forming several rings, with nodes numbered 1 to 7]
• Every ring requires an odd number of inverters
• The cycle period is determined by the slowest ring
• The cycle period is adapted to the operating conditions (temperature, voltage)
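The second and third bullets can be illustrated with a small calculation. This is a simplified sketch that takes the period of a ring to be the sum of its stage delays; the ring names, the delay values (in picoseconds), and the slowdown factor are hypothetical.

```python
def cycle_period(rings):
    """Cycle period of the network: the period of its slowest ring."""
    return max(sum(delays) for delays in rings.values())


# Hypothetical stage delays in picoseconds.
rings = {"ring_a": [300, 400, 300], "ring_b": [500, 500, 400]}
print(cycle_period(rings))   # 1400: ring_b is the slowest ring

# Under worse operating conditions every stage slows down (here by 2x),
# and the cycle period follows without any external clock adjustment.
slow = {name: [2 * d for d in delays] for name, delays in rings.items()}
print(cycle_period(slow))    # 2800
```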
Why asynchronous?
Modularity
• Time-independent functional composability
– Performance may be affected (but not functionality)
[Figure: module A connected through a channel (Data, Req, Ack) to module B, which can be replaced by B’]
Tracking variability
matched delay
Tracking variability
[Figure: delay at the best, typ, and worst corners]
Good correlation for:
• Process variability (systematic)
• Global voltage fluctuations
• Temperature
• Aging (partially)
Margins
[Figure: cycle-period budget for rigid and elastic clocks]
Rigid clocks:
Cycle period = gate and wire delays (typ) + margins for P, V, T, aging, skew, and PLL jitter
Elastic clocks:
Cycle period = gate and wire delays (typ) + reduced margins for P, V, T, aging, and skew
Margin reduction: speed-up / power savings
Clock elasticity
[Figure: timelines of successive cycles]
Rigid clock: every cycle period holds the computation time plus wasted time
Elastic clock: the cycle period follows the computation time
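The contrast between the two clocks can be put into numbers with an idealized model that ignores handshake overhead and safety margins. The per-cycle computation times below (in picoseconds) are hypothetical.

```python
def rigid_time(computation_times):
    """Rigid clock: every cycle lasts as long as the worst-case computation."""
    return len(computation_times) * max(computation_times)


def elastic_time(computation_times):
    """Elastic clock: every cycle lasts only as long as its own computation."""
    return sum(computation_times)


times = [700, 900, 600, 1000, 650]   # hypothetical per-cycle computation times
print(rigid_time(times))     # 5000
print(elastic_time(times))   # 3850
```

The difference between the two totals is the wasted time that a rigid clock adds to the cycles that finish early.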
Voltage scaling and power savings
3 ARM926 cores on the same die
[Figure: voltage scaling and power savings; annotated values: -14% and -24%]
Design Automation
Design automation paradigms
• Synthesis of asynchronous controllers
– Logic synthesis from Petri nets or asynchronous FSMs
• Syntax-directed translation
– Correct-by-construction composition of handshake components
• De-synchronization
– Automatic transformation from synchronous to asynchronous
Synthesis of asynchronous controllers
[Figure: controller with signals DSr, LDS, LDTACK, D, and DTACK, specified by a signal transition graph over the transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-]
Synthesis of asynchronous controllers
Example: Petrify
[Figure: circuit with signals D, DTACK, LDS, DSr, and LDTACK, next to the signal transition graph over the transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-]
Syntax-directed translation
P = (A || B) ; C
(A || B) ; C
Syntax-directed translation
P = (A || B) ; C
[Figure: syntax tree of P (seq at the root, with par over A and B, followed by C) mapped onto handshake components]
Syntax-directed translation
P = (A || B) ; C
[Figure: handshake circuit: a seq component activates a par component (which runs A and B) and then C]
Syntax-directed translation

P = (A ; B)
[Figure: a seq component activating A and then B]
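The idea of emitting one handshake component per operator can be sketched as a recursive walk over the syntax tree. The tuple representation of expressions and the component naming are assumptions of this sketch, not the format of any actual tool.

```python
def translate(expr, netlist=None):
    """Map a process expression to handshake components, one per operator.

    expr is either a string (a leaf process) or a tuple (op, left, right)
    with op in {"seq", "par"}. Returns (name of the top component, netlist).
    """
    if netlist is None:
        netlist = []
    if isinstance(expr, str):
        return expr, netlist
    op, left, right = expr
    left_port, _ = translate(left, netlist)
    right_port, _ = translate(right, netlist)
    name = f"{op}{len(netlist)}"
    netlist.append((name, op, left_port, right_port))
    return name, netlist


# P = (A || B) ; C
top, components = translate(("seq", ("par", "A", "B"), "C"))
print(top)          # seq1
print(components)   # [('par0', 'par', 'A', 'B'), ('seq1', 'seq', 'par0', 'C')]
```

Each tuple in the netlist names a component, its type, and the two sub-processes it activates, so the structure of the circuit mirrors the structure of the program.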
Syntax-directed translation
c := a + b
[Figure: handshake circuit in which the variables a and b feed an adder (+) whose result is written to c]
Syntax-directed translation
int = type [0..255]
& gcd: main proc (in? chan <<int,int>> & out! chan int)
begin x, y: var int
| forever do
    in?<<x,y>>
  ; do x <> y then
      if x < y then y:=y-x
      else x:=x-y
      fi
    od
  ; out!x
  od
end
[Figure: handshake circuit generated for gcd, built from components such as SEQ, do, MUX, DMX, the variables x and y with read (R) and write (W) ports, and the operators <>, <, and -]
Sources:
P.A. Beerel, R.O. Ozdag and M. Ferretti. A Designer’s Guide to Asynchronous VLSI, Cambridge University Press, 2010.
J. Kessels and A. Peeters. DESCALE: A Design Experiment for a Smart Card Application Consuming Low Energy, in Principles of Asynchronous Circuit Design, A Systems Perspective, Eds. J. Sparso and S. Furber, Kluwer Academic Publishers, 2001.
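For readers unfamiliar with the notation, the gcd process can be transliterated into Python. This is only a behavioural sketch: the channels are modelled as an input iterable and a generator, and the inputs are assumed to be positive (with a zero operand the subtraction loop would not terminate).

```python
def gcd_process(pairs):
    """Python transliteration of the gcd process: one output per input pair."""
    for x, y in pairs:          # forever do in?<<x,y>>
        while x != y:           # do x <> y then
            if x < y:
                y = y - x
            else:
                x = x - y
        yield x                 # out!x


print(list(gcd_process([(12, 18), (255, 85), (7, 7)])))  # [6, 85, 7]
```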
De-synchronization
• Strategy: replace the clock tree with local clocks and handshakes
• Combinational logic and latches are not modified
• More tolerance to variability
– Similar area, less power and/or more speed
• Cortadella, Kondratyev, Lavagno and Sotiriou. Desynchronization: Synthesis of asynchronous circuits from synchronous specifications. IEEE TCAD, Oct 2006.
Synchronous operation
[Figure: circuit driven by a CLK generator]
Transforming a synchronous circuit into asynchronous (automatically)
De-synchronization
Transforming a synchronous circuit into asynchronous (automatically)
Conclusions
• Asynchrony offers flexibility in time
– Modularity
– Dynamic adaptability
– Tolerance to variability
• Better optimization of power/performance
• Why isn’t it an important trend in circuit design?
– Lack of commercial EDA support (timing sign-off)
– Designers do not feel comfortable with “unpredictable” timing
– Other aspects: testing, verification, …
• De-synchronization might be a viable solution