Synthesis of Speed Independent Circuits based on
Download
Report
Transcript Synthesis of Speed Independent Circuits based on
Synthesis of Speed Independent
Circuits Based on Decomposition
Tomohiro Yoneda
Hiroomi Onda
National Institute of Informatics
Tokyo Institute of Technology
Tokyo Institute of Technology
Chris Myers
University of Utah
Background
High-level synthesis
plays an important role to push Async. design to
wide use
Major approach to high-level synthesis
Prepare basic cells that correspond to specification
language constructs
Translate specifications to basic cell networks
syntax-directedly with local optimizations
Very efficient
Global optimization may be difficult
2004/4/21 Async2004
2
Challenge
Our approach to high-level synthesis
Translate high-level spec to low-level spec (time
Petri nets)
Use timed logic synthesis technique
Global optimization can be possible by
logic optimization
timing information
Cost for synthesis is very high
2004/4/21 Async2004
3
How to reduce the cost
Translation technique to low-level spec
guarantees that low-level spec has CSC
by adding state variables sufficiently
Idea: [Yoneda,Myers 2003]
Developing Balsa Compiler
Efficient logic synthesis technique
decomposes low-level spec w.r.t. each output
synthesizes each sub-circuit from each sub-spec
Goal of this work
In this paper, speed independent circuit synthesis is discussed
2004/4/21 Async2004
4
Decomposition based synthesis
Input
STG
1 safe
output semi-modular
with CSC (Complete State Coding)
several more restrictions
Output
Reduced STG for each output
g-C or atomic-gate implementation is synthesizable
Feature
Only state graphs for reduced STGs are necessary
It is not necessary to explore the reachable states of the
original STGs
2004/4/21 Async2004
5
Key issue - input set determination
ack1
req1
csc
gC
req2
req1
ack1
csc
ack2
gC
ack1
req1
gC
csc
csc
req1
ack1
2004/4/21 Async2004
6
Key issue - input set determination
ack1
req1
csc
gC
req2
gC
ack1
gC
csc
csc
req1
req1
ack1
csc
Reduction
ack2
req1
ack1
2004/4/21 Async2004
7
Related works
Synthesizing each output separately
T.A. Chu, Synthesis of Self-Timed VLSI Circuits
from Graph-theoretic Specification, PhD thesis,
MIT,1987
No idea for input set determination
Input set determination is performed based on the state
graph of the original STG
R. Puri, J. Gu, A Modular Partitioning Approach for
Asynchronous Circuit Synthesis, IEEE TCAD, 1995
Input signals are kept, if hiding them does not increase the
number of CSC conflicts
W. Vogler, R. Wollowski, Decomposition in
Asynchronous Circuit Design, Tech Report, Univ.
Augsburg, 2002
STG reduction technique - net contraction - is formalized
No general idea for input set determination
2004/4/21 Async2004
8
Our approach
Step 1: Select possible trigger signals as
the initial input set
Step 2: Contract the original STG by
deleting signals except for the output
and those in the current input set
Step 3: If the reduced STG has CSC, done
Step 4: Otherwise, choose appropriate
signals and add them to the input set
Step 5: Goto Step 2
2004/4/21 Async2004
9
Contraction
Original STG
Reduced STG
Possible trigger signals
contraction
bisimilar translation
(i.e., by W. Vogler, R. Wollowski)
2004/4/21 Async2004
10
Issues to be discussed
If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
How can appropriate signals be chosen
without the state graph of the original
STG?
How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
11
Issues to be discussed
If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
How can appropriate signals be chosen
without the state graph of the original
STG?
How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
12
An example
ES(x+)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
ES(x-)
2004/4/21 Async2004
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
13
If a is deleted
CD(ES(x+))
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
CD(S): Extended set of S by deleting signals
2004/4/21 Async2004
14
Irrelevant input set
A set D of signals is an irrelevant input
set for an output x, if
D In Out – {x}
CD(ES(x+)) – UR = ES(x+)
CD(ES(x–)) – UR = ES(x–)
In: Input signal set of the original STG
Out: Output signal set of the original STG
UR: Unreachable state set of the original STG
2004/4/21 Async2004
15
If a is deleted
CD(ES(x+))
CD(ES(x+)) – UR ES(x+)
CD(ES(x–)) – UR ES(x–)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
0110
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x-
{a} is not an irrelevant
input set
0000
0000
b-
c0010
1000
c1010
a+/2
If a non-irrelevant input set is deleted,
the reduced STG has no CSC
2004/4/21 Async2004
16
If c is deleted
CD(ES(x+))
CD(ES(x+)) – UR = ES(x+)
CD(ES(x–)) – UR = ES(x–)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
0101
{c} is an irrelevant
input set
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
If an irrelevant input set (including no possible trigger signals)
is deleted, a correct circuit is obtained from the reduced STG
2004/4/21 Async2004
17
Theorem 1
For an STG G that has CSC and is output semimodular, if a reduced STG G' obtained from G
by deleting some signal set V (including no
possible trigger signals) has CSC, then a
correct circuit is obtained from G'
If V is not an irrelevant input set, G' must not have CSC
V must be an irrelevant input set
A correct circuit is obtained from G'
2004/4/21 Async2004
18
Issues to be discussed
If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
How can appropriate signals be chosen
without the state graph of the original
STG?
How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
19
Contraction with initial input set
Original STG
Reduced STG
Possible trigger signals
contraction
2004/4/21 Async2004
20
Checking CSC
Constructing state graph of the reduced STG
Reduced STG
00
a+/1
1R
CSC conflict
x+
11
a-/1
0F
x-
a-/2
00
a+/2
10
2004/4/21 Async2004
21
Guided Simulation
abstracted trace
original trace
State graph of the original STG
00
b+
0100
0000
a+/1
a+/1
c+
1R
0110
110R noninterface transition
a+/1 c+
x+
x+
111R
11
1101
x+
a-/1
c+
1111
interface transition
0F
a+/2 1000
a-/1
xc011F
0000
00
1010
xca+/2 This can be obtained by simulating the original
a+/2 STG
0110
0010
not requiring the state graph
b- of the original STG
10
2004/4/21 Async2004
22
Generating original trace
Original STG
noninterface
transitions
t1
t2
interface
transitions
b+
t3
abstracted trace: a+ b+
original trace: t2 t3 a+ b+
a+
2004/4/21 Async2004
23
Analysis of original trace
0100
c+
0110
a+/1
111R
x+
1111
b+
noninterface signal
interface signal
a+/2
a-/1
011F
x0110
0000
1000
0000
b-
c0010
Find a noninterface signal that
certainly changes odd times here
2004/4/21 Async2004
24
Analysis of original trace
b+
0100
c+
noninterface signal
0110
a+/1
111R
x+
1111
interface signal
a+/2
a-/1
011F
x0110
0000
1000
Add b to the input set
0000
b-
c0010
2004/4/21 Async2004
Resolve this CSC conflict
25
Analysis of original trace
0100
c+
0110
a+/1
111R
x+
1111
b+
noninterface signal
interface signal
a+/2
a-/1
011F
x0110
0000
1000
0000
b-
c0010
concurrent
But, c does not actually
c also seems to satisfy
this condition
Select a noninterface signal that
certainly changes odd times here
2004/4/21 Async2004
26
Formalization
original trace
(init)
we1
f0
last w
1.
2.
3.
4.
e1
CSC conf.
ws2
f1
we2
w is odd-confined by f1 :
first w
w changes odd times in f1
if w changes in f0, then we1 e1
e1 ws2
we2 e2
"" represents causality relation
obtained from structure of STG
last w
e2 interface signal
CSC conf.
2004/4/21 Async2004
27
Analysis of original trace
f0
b+
0100
c+
noninterface signal
0110
a+/1
111R
x+
1111
interface signal
a+/2
a-/1
011F
xf1
0110
0000
1000
0000
b-
c0010
b is odd-confined by f1
c is not odd-confined by f1
2004/4/21 Async2004
28
Theorem 2
f0
If w(and ui)satisfies the following
condition, adding w(and ui) resolves
the CSC conflict in f1
(sufficient condition)
w is odd-confined by f1
w does not changes in l
If w changes before the first
interface signal, for each odd i
ui is odd-confined by hi with causality
0 110R
w+
1 110R
1 110R
f1
w-
w+
1 110R
interface transition
relation shown in the figure
h1
1 1101
u+
1 1100
interface transition
1 1001
h3
l
For one CSC conflict, there exist
many candidate sets of signals
↓
• Analyze every CSC conflict
• Set up the covering problem
and solve it
2004/4/21 Async2004
29
Drawback
For an STG with conflicting transitions, backtracking
may be needed
Finding actually fired noninterface transitions is no longer
deterministic due to deleting conflicting transitions
If many conflicting transitions exist, backtracking
sometimes costs a lot
Approaches that seem practical are to
Keep all conflicting transitions even if they are not related to
backtracking, or
Manually specify some of necessary conflicting transitions
Our compiler from a high-level language can automatically specify
those conflicting transitions
2004/4/21 Async2004
30
Issues to be discussed
If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
How can appropriate signals be chosen
without the state graph of the original
STG?
How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
31
Experimental results
Experiments
Implementation of the proposed method in C
Pentium 2.8GHz, 4GB memory
Final logic synthesis tool: petrify -gc -eqn
Benchmarks
1. Instruction cache controller of TITAC2
generated from high-level spec by our compiler
large, but simple → input signal sets are small
compiler decisions are used for specifying conflicting transitions
2. Controllers of various filters
manually designed
medium, but complicated → input signal sets are large
all conflicting transitions are kept
3. Async Benchmarks
small and simple
all conflicting transitions are kept
2004/4/21 Async2004
32
Experimental results
CPU times, Memory usage
Benchmark1: significantly reduced
Benchmark2: reduced
Quality of synthesized circuits
Area (num. of transistors): almost no overhead
2004/4/21 Async2004
33
Experimental results
For Async Benchmarks
Quality : no overhead (exactly the same area)
Cost : advantageous only for largest specs
CPU times (sec)
2.5
2
1.5
Proposed
1
Petrify
0.5
0
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
2004/4/21 Async2004
34
Conclusion
New algorithm to find input signal sets for
decomposition based synthesis method
state graph of the original STG is not necessary
can handle larger circuits
Logic synthesis tool : NUTAS
Linux binary is downloadable from
http://research.nii.ac.jp/~yoneda
Future works
extend the algorithms to support timed circuit synthesis
finish the compiler development for high-level synthesis
and integrate both
2004/4/21 Async2004
35