Synthesis of Speed Independent Circuits based on

Download Report

Transcript Synthesis of Speed Independent Circuits based on

Synthesis of Speed Independent
Circuits Based on Decomposition
Tomohiro Yoneda
Hiroomi Onda
National Institute of Informatics
Tokyo Institute of Technology
Tokyo Institute of Technology
Chris Myers
University of Utah
Background
 High-level synthesis

plays an important role to push Async. design to
wide use
 Major approach to high-level synthesis
Prepare basic cells that correspond to specification
language constructs
 Translate specifications to basic cell networks
syntax-directedly with local optimizations
 Very efficient
 Global optimization may be difficult

2004/4/21 Async2004
2
Challenge
 Our approach to high-level synthesis
Translate high-level spec to low-level spec (time
Petri nets)
 Use timed logic synthesis technique
 Global optimization can be possible by



logic optimization
timing information
 Cost for synthesis is very high
2004/4/21 Async2004
3
How to reduce the cost
 Translation technique to low-level spec

guarantees that low-level spec has CSC


by adding state variables sufficiently
Idea: [Yoneda,Myers 2003]

Developing Balsa Compiler
 Efficient logic synthesis technique


decomposes low-level spec w.r.t. each output
synthesizes each sub-circuit from each sub-spec
Goal of this work
In this paper, speed independent circuit synthesis is discussed
2004/4/21 Async2004
4
Decomposition based synthesis
 Input

STG




1 safe
output semi-modular
with CSC (Complete State Coding)
several more restrictions
 Output

Reduced STG for each output

g-C or atomic-gate implementation is synthesizable
 Feature

Only state graphs for reduced STGs are necessary

It is not necessary to explore the reachable states of the
original STGs
2004/4/21 Async2004
5
Key issue - input set determination
ack1
req1
csc
gC
req2
req1
ack1
csc
ack2
gC
ack1
req1
gC
csc
csc
req1
ack1
2004/4/21 Async2004
6
Key issue - input set determination
ack1
req1
csc
gC
req2
gC
ack1
gC
csc
csc
req1
req1
ack1
csc
Reduction
ack2
req1
ack1
2004/4/21 Async2004
7
Related works
 Synthesizing each output separately


T.A. Chu, Synthesis of Self-Timed VLSI Circuits
from Graph-theoretic Specification, PhD thesis,
MIT,1987

No idea for input set determination

Input set determination is performed based on the state
graph of the original STG
R. Puri, J. Gu, A Modular Partitioning Approach for
Asynchronous Circuit Synthesis, IEEE TCAD, 1995
 Input signals are kept, if hiding them does not increase the
number of CSC conflicts

W. Vogler, R. Wollowski, Decomposition in
Asynchronous Circuit Design, Tech Report, Univ.
Augsburg, 2002


STG reduction technique - net contraction - is formalized
No general idea for input set determination
2004/4/21 Async2004
8
Our approach
Step 1: Select possible trigger signals as
the initial input set
Step 2: Contract the original STG by
deleting signals except for the output
and those in the current input set
Step 3: If the reduced STG has CSC, done
Step 4: Otherwise, choose appropriate
signals and add them to the input set
Step 5: Goto Step 2
2004/4/21 Async2004
9
Contraction
Original STG
Reduced STG
Possible trigger signals
contraction
bisimilar translation
(i.e., by W. Vogler, R. Wollowski)
2004/4/21 Async2004
10
Issues to be discussed
 If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
 How can appropriate signals be chosen
without the state graph of the original
STG?
 How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
11
Issues to be discussed
 If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
 How can appropriate signals be chosen
without the state graph of the original
STG?
 How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
12
An example
ES(x+)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
ES(x-)
2004/4/21 Async2004
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
13
If a is deleted
CD(ES(x+))
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
CD(S): Extended set of S by deleting signals
2004/4/21 Async2004
14
Irrelevant input set
 A set D of signals is an irrelevant input
set for an output x, if



D  In  Out – {x}
CD(ES(x+)) – UR = ES(x+)
CD(ES(x–)) – UR = ES(x–)
In: Input signal set of the original STG
 Out: Output signal set of the original STG
 UR: Unreachable state set of the original STG

2004/4/21 Async2004
15
If a is deleted
CD(ES(x+))
CD(ES(x+)) – UR  ES(x+)
CD(ES(x–)) – UR  ES(x–)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
0110
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x-
{a} is not an irrelevant
input set
0000
0000
b-
c0010
1000
c1010
a+/2
If a non-irrelevant input set is deleted,
the reduced STG has no CSC
2004/4/21 Async2004
16
If c is deleted
CD(ES(x+))
CD(ES(x+)) – UR = ES(x+)
CD(ES(x–)) – UR = ES(x–)
(a b c x)
b+
0100
a+/1
c+
0110
110R
a+/1 c+
x+
111R
1101
x+
c+
1111
0101
{c} is an irrelevant
input set
CD(ES(x-))
a-/2
a+/2
a-/1
011F
x0110
0000
0000
b-
c0010
1000
c1010
a+/2
If an irrelevant input set (including no possible trigger signals)
is deleted, a correct circuit is obtained from the reduced STG
2004/4/21 Async2004
17
Theorem 1
 For an STG G that has CSC and is output semimodular, if a reduced STG G' obtained from G
by deleting some signal set V (including no
possible trigger signals) has CSC, then a
correct circuit is obtained from G'
If V is not an irrelevant input set, G' must not have CSC
V must be an irrelevant input set
A correct circuit is obtained from G'
2004/4/21 Async2004
18
Issues to be discussed
 If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
 How can appropriate signals be chosen
without the state graph of the original
STG?
 How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
19
Contraction with initial input set
Original STG
Reduced STG
Possible trigger signals
contraction
2004/4/21 Async2004
20
Checking CSC

Constructing state graph of the reduced STG
Reduced STG
00
a+/1
1R
CSC conflict
x+
11
a-/1
0F
x-
a-/2
00
a+/2
10
2004/4/21 Async2004
21
Guided Simulation
abstracted trace
original trace
State graph of the original STG
00
b+
0100
0000
a+/1
a+/1
c+
1R
0110
110R noninterface transition
a+/1 c+
x+
x+
111R
11
1101
x+
a-/1
c+
1111
interface transition
0F
a+/2 1000
a-/1
xc011F
0000
00
1010
xca+/2 This can be obtained by simulating the original
a+/2 STG
0110
0010
not requiring the state graph
b- of the original STG
10
2004/4/21 Async2004
22
Generating original trace
Original STG
noninterface
transitions
t1
t2
interface
transitions
b+
t3
abstracted trace: a+ b+
original trace: t2 t3 a+ b+
a+
2004/4/21 Async2004
23
Analysis of original trace
0100
c+
0110
a+/1
111R
x+
1111
b+
noninterface signal
interface signal
a+/2
a-/1
011F
x0110
0000
1000
0000
b-
c0010
Find a noninterface signal that
certainly changes odd times here
2004/4/21 Async2004
24
Analysis of original trace
b+
0100
c+
noninterface signal
0110
a+/1
111R
x+
1111
interface signal
a+/2
a-/1
011F
x0110
0000
1000
Add b to the input set
0000
b-
c0010
2004/4/21 Async2004
Resolve this CSC conflict
25
Analysis of original trace
0100
c+
0110
a+/1
111R
x+
1111
b+
noninterface signal
interface signal
a+/2
a-/1
011F
x0110
0000
1000
0000
b-
c0010
concurrent
But, c does not actually
c also seems to satisfy
this condition
Select a noninterface signal that
certainly changes odd times here
2004/4/21 Async2004
26
Formalization
original trace
(init)
we1
f0
last w
1.
2.
3.
4.
e1
CSC conf.
ws2
f1
we2
w is odd-confined by f1 :
first w
w changes odd times in f1
if w changes in f0, then we1 e1
e1  ws2
we2  e2
"" represents causality relation
obtained from structure of STG
last w
e2 interface signal
CSC conf.
2004/4/21 Async2004
27
Analysis of original trace
f0
b+
0100
c+
noninterface signal
0110
a+/1
111R
x+
1111
interface signal
a+/2
a-/1
011F
xf1
0110
0000
1000
0000
b-
c0010
b is odd-confined by f1
c is not odd-confined by f1
2004/4/21 Async2004
28
Theorem 2
f0
If w(and ui)satisfies the following
condition, adding w(and ui) resolves
the CSC conflict in f1
(sufficient condition)
 w is odd-confined by f1
 w does not changes in l
 If w changes before the first
interface signal, for each odd i
ui is odd-confined by hi with causality
0 110R
w+
1 110R
1 110R
f1
w-
w+
1 110R
interface transition
relation shown in the figure
h1
1 1101
u+
1 1100
interface transition
1 1001
h3
l
For one CSC conflict, there exist
many candidate sets of signals
↓
• Analyze every CSC conflict
• Set up the covering problem
and solve it
2004/4/21 Async2004
29
Drawback
 For an STG with conflicting transitions, backtracking
may be needed

Finding actually fired noninterface transitions is no longer
deterministic due to deleting conflicting transitions
 If many conflicting transitions exist, backtracking
sometimes costs a lot
 Approaches that seem practical are to


Keep all conflicting transitions even if they are not related to
backtracking, or
Manually specify some of necessary conflicting transitions

Our compiler from a high-level language can automatically specify
those conflicting transitions
2004/4/21 Async2004
30
Issues to be discussed
 If the reduced STG has CSC, is a
correct speed independent circuit
synthesized from it?
 How can appropriate signals be chosen
without the state graph of the original
STG?
 How is the overhead (performance
degradation of the synthesized circuit)?
2004/4/21 Async2004
31
Experimental results
 Experiments



Implementation of the proposed method in C
Pentium 2.8GHz, 4GB memory
Final logic synthesis tool: petrify -gc -eqn
 Benchmarks
1. Instruction cache controller of TITAC2



generated from high-level spec by our compiler
large, but simple → input signal sets are small
compiler decisions are used for specifying conflicting transitions
2. Controllers of various filters



manually designed
medium, but complicated → input signal sets are large
all conflicting transitions are kept
3. Async Benchmarks


small and simple
all conflicting transitions are kept
2004/4/21 Async2004
32
Experimental results

CPU times, Memory usage



Benchmark1: significantly reduced
Benchmark2: reduced
Quality of synthesized circuits

Area (num. of transistors): almost no overhead
2004/4/21 Async2004
33
Experimental results
 For Async Benchmarks


Quality : no overhead (exactly the same area)
Cost : advantageous only for largest specs
CPU times (sec)
2.5
2
1.5
Proposed
1
Petrify
0.5
0
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
2004/4/21 Async2004
34
Conclusion
 New algorithm to find input signal sets for
decomposition based synthesis method

state graph of the original STG is not necessary

can handle larger circuits
 Logic synthesis tool : NUTAS

Linux binary is downloadable from
http://research.nii.ac.jp/~yoneda
 Future works


extend the algorithms to support timed circuit synthesis
finish the compiler development for high-level synthesis
and integrate both
2004/4/21 Async2004
35