Asynchronous Design Using Commercial HDL Synthesis Tools

Michiel Ligthart
Karl Fant
Ross Smith
Alexander Taubin
Alex Kondratyev
Outline
• Added value of NCL: simplification of design
• Canonical form of gates: the key to optimization
• NCL in the CAD flow: an example
• Validation of optimization
• Experimental results
• Conclusion and future work
Potential NCL Advantages
• Inherent to asynchronous design
– no clock system
– low EMI
– free stand-by mode, etc.
• Inherent to delay-insensitive (DI) design
– fits current and future deep-submicron (DSM) technology well
– easy design reuse
– plug-and-play SoC design
– easy portability among technologies
• Particular to NULL Convention Logic (NCL)
– ease of design (reduced time to market)
– standard HDL and commercial tools can be used to simulate and synthesize asynchronous circuits
Data Communication Based on DI Encoding
• DI protocol with a spacer (NULL):
– NULL propagation / NULL acknowledge
– DATA propagation / DATA acknowledge
Figure: two registers separated by combinational circuitry. DATA and NULL are distinguished by codeword; completion detection generates the request for DATA/NULL.
NCL: Pushing Two-phase Behavior Down to the Level of Each Gate
Figure: a logic gate shown first with no data present and then with complete data present at its inputs.
• The gate output acknowledges input changes.
• Simplest DI encoding: dual-rail [Sims’58]
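A minimal Python sketch of dual-rail encoding and completion detection, under stated assumptions: a bit is a (rail1, rail0) pair following the rail1/rail0 fields of the dual-rail package, (1, 1) is treated as illegal, and completion detection is written as a plain predicate rather than as a gate-level circuit. All helper names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# A dual-rail bit as (rail1, rail0), following the dual_rail_logic record
# fields. (0, 0) is NULL; (1, 1) is treated as illegal (assumption).
DualRail = Tuple[int, int]
NULL: DualRail = (0, 0)


def encode(bit: Optional[int]) -> DualRail:
    """Encode 0 or 1 as DATA on exactly one rail; None encodes NULL."""
    if bit is None:
        return NULL
    return (1, 0) if bit else (0, 1)


def decode(sig: DualRail) -> Optional[int]:
    """Return 1, 0, or None for NULL; reject the illegal code (1, 1)."""
    rail1, rail0 = sig
    if rail1 and rail0:
        raise ValueError("illegal dual-rail code (1, 1)")
    if rail1:
        return 1
    return 0 if rail0 else None


def data_complete(word: Sequence[DualRail]) -> bool:
    """A DATA wavefront has fully arrived: every bit has one rail high."""
    return all(r1 or r0 for r1, r0 in word)


def null_complete(word: Sequence[DualRail]) -> bool:
    """A NULL wavefront has fully arrived: every bit is back to (0, 0)."""
    return all(not (r1 or r0) for r1, r0 in word)


# The protocol alternates DATA and NULL wavefronts; a word that is
# partly DATA and partly NULL is still in transit.
word = [encode(b) for b in (1, 0, 1)]
partial = [encode(1), NULL, encode(0)]
print(data_complete(word), null_complete([NULL] * 3))  # True True
print(data_complete(partial), null_complete(partial))  # False False
```

Because every bit announces its own arrival, the receiver needs no timing assumption to know that a whole word (or a whole NULL spacer) is present.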
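General Implementation of Hysteresis Gates in CMOS
Dual-rail circuits under two-phase operation:
• A transition from NULL to DATA is monotonic; the set function S is positive unate (implemented by the n-tree).
• An input transition to NULL resets all gates to NULL (reset implemented by the p-tree).
Reset function: $R(x_1, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$; the gate resets only when $R = 0$, i.e., when all inputs are NULL.
Gate equation: $g = S + gR$
Figure: CMOS gate with a reset p-tree and a set n-tree, inputs x1 ... xn, and output g.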
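A behavioral Python sketch of the gate equation $g = S + gR$, assuming each input is a single rail (1 = DATA, 0 = NULL) and that one evaluation step is enough for the gate to settle. The function names and the choice of a C-element as the example set function are illustrative only.

```python
from typing import Callable, Sequence

SetFunction = Callable[[Sequence[int]], int]


def hold(inputs: Sequence[int]) -> int:
    """R(x1, ..., xn) = x1 + ... + xn: nonzero while any input rail is DATA."""
    return int(any(inputs))


def gate_next(set_fn: SetFunction, g: int, inputs: Sequence[int]) -> int:
    """One evaluation of g = S + g*R for a gate whose current output is g."""
    return int(bool(set_fn(inputs)) or (bool(g) and bool(hold(inputs))))


def c_element_set(x: Sequence[int]) -> int:
    """Example set function (chosen for illustration): th22, S = x1*x2."""
    return int(x[0] and x[1])


g = 0
trace = []
for inputs in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    g = gate_next(c_element_set, g, inputs)
    trace.append(g)
print(trace)  # [0, 1, 1, 0]: set on complete data, held, reset on all NULL
```

Only S changes from one gate type to another; the hold term depends on nothing but the inputs.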
Refined Implementation of NCL Hysteresis Gates in CMOS
Reset function: $R(x_1, \ldots, x_n) = x_1 + x_2 + \cdots + x_n$ depends only on the number of inputs.
• This canonical form of reset is the key to using synchronous optimization tools.
• The reset of each individual gate scales up to the whole network.
Gate equation: $g = S + gR$, with the set function S implemented by the n-tree.
Figure: gate with inputs x1 ... xn, output g, and a set n-tree.
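To illustrate how the canonical reset scales from one gate to a network, this Python sketch evaluates a small acyclic network of hysteresis gates in topological order. The netlist format and the two gates are hypothetical; the point is that every gate shares the same kind of hold term, so a partial NULL wavefront leaves outputs held, and once all primary inputs are NULL every gate returns to NULL.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical netlist format: gate name -> (input names, set function).
# Every gate uses the canonical hold term R = OR of its inputs, so only
# the set function distinguishes one gate type from another.
Netlist = Dict[str, Tuple[List[str], Callable[[Sequence[int]], int]]]


def settle(net: Netlist, topo_order: List[str], sig: Dict[str, int]) -> Dict[str, int]:
    """Evaluate g = S + g*R once per gate in topological order (acyclic net)."""
    for name in topo_order:
        inputs, set_fn = net[name]
        x = [sig[i] for i in inputs]
        sig[name] = int(bool(set_fn(x)) or (bool(sig[name]) and any(x)))
    return sig


net: Netlist = {
    "g1": (["a", "b"], lambda x: x[0] and x[1]),  # th22 (C-element)
    "g2": (["c", "g1"], lambda x: x[0] or x[1]),  # th12 (OR)
}
order = ["g1", "g2"]
sig = {"a": 0, "b": 0, "c": 0, "g1": 0, "g2": 0}

sig.update(a=1, b=1)  # DATA wavefront
settle(net, order, sig)
sig.update(a=0)  # partial NULL: gates hold their DATA
settle(net, order, sig)
held = (sig["g1"], sig["g2"])
sig.update(b=0)  # all inputs NULL
settle(net, order, sig)
print(held, (sig["g1"], sig["g2"]))  # (1, 1) (0, 0)
```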
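Family of Logic Gates
M-of-N threshold gates with hysteresis behavior
Figure: table of M-of-N threshold gates. The 1-of-N gates are OR gate equivalents and the N-of-N gates are Muller C-elements [Muller’62]; DIMS [Sparso’92] uses only these two kinds, so the rest of the family leaves room for optimization.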
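The table itself did not survive extraction. As a sketch, the set functions of the threshold-gate family can be generated from a threshold M, an input count N, and an optional weight on the first input. This naming (thMN, thMNWw) is an assumption that agrees with th33w2 having the set function a(b + c); the helper names are hypothetical. The loop checks the generated functions against the Boolean forms used in this presentation.

```python
from itertools import product
from typing import Callable, Sequence


def threshold_gate(m: int, n: int, w1: int = 1) -> Callable[[Sequence[int]], int]:
    """Set function of an M-of-N threshold gate (thMN, or thMNWw when the
    first input has weight w1): weighted count of DATA inputs >= M."""
    weights = [w1] + [1] * (n - 1)

    def set_fn(x: Sequence[int]) -> int:
        return int(sum(w for w, xi in zip(weights, x) if xi) >= m)

    return set_fn


th13 = threshold_gate(1, 3)       # 1-of-3: OR gate equivalent
th22 = threshold_gate(2, 2)       # 2-of-2: C-element
th23 = threshold_gate(2, 3)       # 2-of-3
th33w2 = threshold_gate(3, 3, 2)  # 3-of-3, input a weighted 2

for a, b, c in product((0, 1), repeat=3):
    assert th13((a, b, c)) == int(a or b or c)
    assert th23((a, b, c)) == int((a and b) or (a and c) or (b and c))
    assert th33w2((a, b, c)) == int(a and (b or c))
for a, b in product((0, 1), repeat=2):
    assert th22((a, b)) == int(a and b)
print("set functions match")
```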
Example: 2-of-3 Threshold Gate with Hysteresis
• The gate switches
– to DATA when M (here 2) inputs are DATA
– to NULL when all inputs are NULL
• It is possible to use “negative logic”, reversing the pull-up and pull-down networks.
$z = ab + ac + bc + z(a + b + c)$
Figure: gate symbol with inputs a, b, c and output z, and its transistor-level implementation.
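As a check on the equation, this Python sketch compares it with the switching rule stated in the bullets, over every combination of the previous output z and the inputs a, b, c. The function names are hypothetical.

```python
from itertools import product


def th23_equation(z: int, a: int, b: int, c: int) -> int:
    """Next output from z = ab + ac + bc + z(a + b + c)."""
    return int((a and b) or (a and c) or (b and c) or (z and (a or b or c)))


def th23_spec(z: int, a: int, b: int, c: int) -> int:
    """The prose rule: DATA when M = 2 inputs are DATA, NULL when all
    inputs are NULL, otherwise keep the previous output (hysteresis)."""
    count = a + b + c
    if count >= 2:
        return 1
    if count == 0:
        return 0
    return z


mismatches = [
    v for v in product((0, 1), repeat=4)
    if th23_equation(*v) != th23_spec(*v)
]
print(mismatches)  # []
```

The hold term z(a + b + c) is exactly what keeps the output at DATA while only one input remains DATA.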
RTL Design Flow: Combinational Optimization
• Combinational logic and registers are separated.
• The combinational process is the subject of synthesis and optimization, and the topic of this presentation.
• The sequential process (registration in the RTL code, with reset and the request for DATA/NULL) is replaced by NCL registration.
Two-Step Synthesis Flow (Using Synopsys' Design Compiler)
• Step 1. Translate the HDL into a “synchronous” netlist: the VHDL is synthesized with a generic library into an intermediate netlist.
• Step 2. Convert the intermediate netlist into an NCL netlist: the intermediate netlist, the dual-rail definition, and the NCL library are synthesized into an NCL netlist.
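Input to Step 1: RTL Description (Multiplexer Example)
• RTL description (MUX):

    -- Assumption: ncl_logic and its comparison with '1' come from a
    -- user package; the package name below is hypothetical.
    use work.ncl_pkg.all;

    entity test is
      port (a, b, s : in  ncl_logic;
            z       : out ncl_logic);
    end entity test;

    architecture rtl of test is
    begin
      process (a, b, s) is
      begin
        if s = '1' then
          z <= a;
        else
          z <= b;
        end if;
      end process;
    end architecture rtl;

Figure: multiplexer with data inputs a and b, select s, and output z.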
MUX Example: Output of Step 1 / Input to Step 2: Intermediate Netlist
Figure: the MUX built from two-input NAND gates: x = NAND(s, a), y = NAND(not s, b), z = NAND(x, y).
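A quick Python check that the three-NAND netlist implements the RTL MUX. Which NAND receives the inverted select is an assumption about the figure; in dual-rail logic that inversion is free because it only swaps the rails.

```python
from itertools import product


def nand(p: int, q: int) -> int:
    return 1 - (p & q)


def mux_nand(s: int, a: int, b: int) -> int:
    """Three two-input NANDs; the inverted select feeding y is assumed."""
    x = nand(s, a)
    y = nand(1 - s, b)
    return nand(x, y)


assert all(
    mux_nand(s, a, b) == (a if s else b)
    for s, a, b in product((0, 1), repeat=3)
)
print("NAND netlist matches: if s = '1' then z <= a else z <= b")
```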
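Dual-rail Package
• Define type {0, 1, N}:
    type dual_rail_logic is record
      rail1 : std_logic;
      rail0 : std_logic;
    end record;
• Overload operators:
– function “nand”: four th22 gates, one per combination of input rails (a.0/a.1 with b.0/b.1); a th13 merges the three combinations that give 1 into z.1, and the remaining one drives z.0.
– function “not”: swaps the rails (a.0 drives z.1, a.1 drives z.0); no gates are needed.
• th22 = two-input C-element
• th13 = three-input OR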
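A behavioral Python sketch of the overloaded operators, assuming the DIMS structure: one th22 per combination of input rails, with a th13 merging the three combinations for which NAND is 1. Class and function names are hypothetical; this models the behavior and is not the VHDL package itself.

```python
from typing import List, Tuple

DualRail = Tuple[int, int]  # (rail1, rail0), as in dual_rail_logic
NULL: DualRail = (0, 0)


def th22(x: int, y: int, z: int) -> int:
    """Two-input C-element: z = xy + z(x + y)."""
    return int((x and y) or (z and (x or y)))


def dr_not(a: DualRail) -> DualRail:
    """Overloaded "not": swap the rails; no gates needed."""
    return (a[1], a[0])


class DrNand:
    """Overloaded "nand" in DIMS style: one th22 per input minterm; a th13
    (three-input OR) collects the three minterms for which NAND = 1."""

    def __init__(self) -> None:
        self.c: List[int] = [0, 0, 0, 0]  # th22 states for ab = 00, 01, 10, 11

    def step(self, a: DualRail, b: DualRail) -> DualRail:
        (a1, a0), (b1, b0) = a, b
        pairs = [(a0, b0), (a0, b1), (a1, b0), (a1, b1)]
        self.c = [th22(x, y, z) for (x, y), z in zip(pairs, self.c)]
        z1 = int(self.c[0] or self.c[1] or self.c[2])  # th13
        z0 = self.c[3]
        return (z1, z0)


def enc(bit: int) -> DualRail:
    return (1, 0) if bit else (0, 1)


gate = DrNand()
print(gate.step(enc(1), NULL))    # (0, 0): waits for complete data
print(gate.step(enc(1), enc(1)))  # (0, 1): NAND(1, 1) = 0
print(gate.step(NULL, enc(1)))    # (0, 1): held until all inputs are NULL
print(gate.step(NULL, NULL))      # (0, 0)
```

Expanding the three NANDs of the MUX this way gives twelve th22 gates and three th13 gates, i.e., the twelve 2-input C-gates and three 3-input OR gates of the dual-rail MUX expansion.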
Optimizing with Design Compiler
• Dual-rail expansion
• The two phases (set and reset) are separated:
– the set phase ensures circuit functionality
– the reset phase is implied
• Optimizations are applied to the set phase.
Dual-rail Expansion of MUX
Figure: each NAND of the intermediate netlist (inputs s, a, b; internal nets x, y; output z) is replaced by a dual-rail (D-R) NAND operating on the rails s.t/s.f, a.t/a.f, b.t/b.f, x.t/x.f, y.t/y.f, and z.t/z.f.
Naive semi-static DIMS implementation: 114 transistors (reducible to 63 transistors by merging C-elements with OR gates), versus 14 for a synchronous circuit.
“Images”: Boolean Gates Implementing Set Functions
• NCL gates (hysteresis, sequential behavior) are projected for optimization onto Boolean gates, their “images” (combinational behavior), which are equivalent in the set phase:
– OR gate: $z = a + b$; image: $z = a + b$
– th22: $z = ab + z(a + b)$; image: $z = ab$
– th33w2: $z = a(b + c) + z(a + b + c)$; image: $z = a(b + c)$
• The equivalence holds from the initial state $z = a = b = c = 0$.
• For implementation, the images are mapped back to NCL gates.
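A Python sketch of why the projection is sound in the set phase: starting from the all-NULL state, the inputs are raised one at a time in every order, and the hysteresis gate is compared with its image after each step. The function names are hypothetical. The last line shows that outside the set phase (inputs returning to NULL) a gate and its image differ, which is why optimization is restricted to the set phase.

```python
from itertools import permutations
from typing import Callable, Sequence

Vec = Sequence[int]


def th22_next(z: int, x: Vec) -> int:
    a, b = x
    return int((a and b) or (z and (a or b)))  # z = ab + z(a + b)


def th22_image(x: Vec) -> int:
    a, b = x
    return int(a and b)  # image: z = ab


def th33w2_next(z: int, x: Vec) -> int:
    a, b, c = x
    return int((a and (b or c)) or (z and (a or b or c)))  # z = a(b+c) + z(a+b+c)


def th33w2_image(x: Vec) -> int:
    a, b, c = x
    return int(a and (b or c))  # image: z = a(b + c)


def matches_image_in_set_phase(next_fn: Callable[[int, Vec], int],
                               image_fn: Callable[[Vec], int], n: int) -> bool:
    """From z = 0 and all inputs 0, raise inputs one at a time in every
    order (a monotonic set phase); the gate must track its image at each step."""
    for order in permutations(range(n)):
        x, z = [0] * n, 0
        for i in order:
            x[i] = 1
            z = next_fn(z, x)
            if z != image_fn(x):
                return False
    return True


print(matches_image_in_set_phase(th22_next, th22_image, 2))      # True
print(matches_image_in_set_phase(th33w2_next, th33w2_image, 3))  # True
# Outside the set phase they differ: with z = 1 and one input back
# to NULL, the C-element holds 1 while its image already gives 0.
print(th22_next(1, [1, 0]), th22_image([1, 0]))  # 1 0
```

The argument behind the check: the set function is positive unate, so along rising inputs the image never falls, and the hold term can only keep a 1 that the image also keeps.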
Image of Dual-rail NAND Gate
Figure: the dual-rail (D-R) NAND built from four C-elements (inputs a.t/a.f, b.t/b.f; outputs out.t, out.f), and its image, in which each C-element is replaced by an AND gate.
C-element equation: $z = ab + z(a + b)$, initially $z = a = b = 0$.
In the set phase it behaves like an AND gate: $z = ab$.
Dual-rail Expansion for MUX
Figure: the three dual-rail NANDs of the MUX (rails s.t/s.f, a.t/a.f, b.t/b.f, x.t/x.f, y.t/y.f, z.t/z.f) expand into twelve 2-input C-gates and three 3-input OR gates.
Image Circuit of Dual-rail Expansion for MUX
Figure: the same circuit with each C-gate replaced by its AND image (rails s.t/s.f, a.t/a.f, b.t/b.f, x.t/x.f, y.t/y.f, z.t/z.f).
Optimized with Design Compiler
The MUX circuit passes technology-independent optimization and is mapped to “images” of gates from the NCL library.
Figure: optimized image circuit with inputs s.t, s.f, a.t, a.f, b.t, b.f and outputs z.t, z.f, built from A(B+C) (image of th33w2) and AB+CD (image of thXOR).
Technology Mapping with Design Compiler
NCL circuit: images are replaced by gates with hysteresis.
Figure: the mapped MUX (inputs s.t, s.f, a.t, a.f, b.t, b.f; outputs z.t, z.f) built from th22, th33w2, th24w2, and thXOR gates, with an inset showing the semi-static CMOS implementation of thXOR.
Result: 44 transistors, 30% fewer than the 63 transistors of the optimized DIMS implementation.
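Optimization Flow
Figure: a synchronous Boolean circuit is translated with the dual-rail package into an asynchronous DIMS circuit (a real object). Its dual-rail image, with gates mapped to images (a virtual object), is optimized by Design Compiler; technology mapping with Design Compiler then replaces the images by hysteresis gates, giving an optimized circuit that is DI-equivalent to the DIMS circuit.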
Validation of Optimization
The validity of the transformations (DI equivalence) rests on two properties:
• functional equivalence of the optimized and original circuits (under two-phase operation)
• preservation of DI properties in the optimized circuit
Both follow from the properties of prime and irredundant networks and of algebraic factorization [Brayton’90, Hachtel’92].
Validation of Optimization: Idea of the Proof
• Starting point: a prime and irredundant Boolean network, known to be 100% stuck-at testable [Scherz’72].
• Algebraic transformations maintain the set of test vectors for stuck-at faults [Hachtel’92].
• By induction in topological order, testability implies that each gate acknowledges its input changes (delay insensitivity).
• The same holds for tree-based technology mapping.
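The proof is not reproduced here. As a small illustration of its starting point, this Python sketch checks by brute force that the NAND-NAND MUX netlist (a prime and irredundant network for z = sa + s'b) has a test vector for every single stuck-at fault, including faults on the two fanout branches of s. Net names are hypothetical, and exhaustive simulation stands in for a real test-generation tool.

```python
from itertools import product
from typing import Optional, Tuple

Fault = Optional[Tuple[str, int]]  # (net name, stuck-at value) or None


def nand(p: int, q: int) -> int:
    return 1 - (p & q)


def mux(s: int, a: int, b: int, fault: Fault = None) -> int:
    """NAND-NAND MUX z = s*a + (not s)*b with an optional single stuck-at fault.
    The fanout branches of s ("s->x", "s->inv") are separate fault sites."""
    def net(name: str, value: int) -> int:
        return fault[1] if fault is not None and fault[0] == name else value

    s, a, b = net("s", s), net("a", a), net("b", b)
    s_x, s_inv = net("s->x", s), net("s->inv", s)
    sn = net("sn", 1 - s_inv)
    x = net("x", nand(s_x, a))
    y = net("y", nand(sn, b))
    return net("z", nand(x, y))


NETS = ["s", "s->x", "s->inv", "a", "b", "sn", "x", "y", "z"]
VECTORS = list(product((0, 1), repeat=3))

undetected = [
    (name, v) for name in NETS for v in (0, 1)
    if all(mux(*vec) == mux(*vec, fault=(name, v)) for vec in VECTORS)
]
print(undetected)  # []: every single stuck-at fault has a test vector
```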
Manual vs. Synthesized Designs
Figure: area (transistor count, 0 to 4500) of manual and synthesized versions of each example.
For bigger circuits the synthesized/manual ratio is better (22% improvement for the biggest example).
Synchronous vs. NCL Design
Figure: gate count (0 to 2500) and transistor count (0 to 35000) of synchronous (clocked) and NCL designs.
Penalty in transistors comes from:
• dual-rail implementation
• effective delay insensitivity
To reduce transistor count:
• use four-rail encoding
• improve architectural solutions, e.g., OR instead of MUX
• compromise delay insensitivity
Conclusions
• First methodology to use standard HDL and commercial tools both to simulate and to synthesize asynchronous circuits
• The methodology is formally validated
• The synthesis results are acceptable
Future Tasks
• Reduce area/power without losing delay insensitivity (e.g., four-rail design)
• Relax DI requirements to reduce area (e.g., using timing assumptions)
• Use peephole optimizations (e.g., merging gates used for registration with their input gates)
• Write DesignWare components to get better performance for arithmetic units (inferring hand-designed components)