Presentation slides. - Dept. of Electrical and Computer Engineering
Download
Report
Transcript Presentation slides. - Dept. of Electrical and Computer Engineering
Generalized Buffering of PTL
Logic Stages using Boolean
Division and Don’t Cares
Rajesh Garg
Sunil P. Khatri
Department of Electrical and Computer Engineering,
Texas A&M University,
College Station, TX 77843
1
Outline
Introduction
Objective
Previous
Work
CODCs and ACODCs
Generalized Buffering With CODCs
Results
Conclusions
2
Introduction
Pass Transistor Logic (PTL) typically used for specific
circuit implementations, like barrel shifters
No widely accepted PTL design methodology
There exists a direct mapping between an ROBDD node
and a PTL mux
f
f
v
v
fv
1
fv’
ROBDD Node
f
0
fv
MUX
fv’
v’
v
fv fv’
PTL based MUX
Hence ROBDDs can be used to perform PTL based
synthesis for general circuits
3
Introduction (contd)
Problems with direct mapping
Body Effect
Cannot connect more than 4-5 devices in series
Monolithic ROBDDs
Worst-case exponential size in number of the inputs (large)
Memory explosion can occur during ROBDD construction
Partitioned ROBDDs
Avoids memory explosion of monolithic ROBDDs
Output of each PTL structure needs to be buffered
Regenerate electrical drive capability after 4 or 5 levels
using a pair of inverters (avoid body effect problems)
4
Objective
New PTL Synthesis Approach
Use partitioned ROBDDs
Avoid memory explosion
Guarantee no more than 4-5 series devices
Use generalized buffering
Buffers can be complex logic gates in general (not simple
inverters/ buffers)
Use ACODCs or CODCs to improve the extraction of
generalized buffers
Simplifies the logic function of the PTL block
Boolean Division based formulation
Elegant, powerful formulation to extract generalized buffers
Augmented with CODC / ACODCs
Reduces total circuit delay and area
5
Previous Work
Buch et. al, “Logic synthesis for large pass transistor circuits”,
in Proceedings, IEEE/ACM ICCAD, Nov 1997, pp 663-670
Lai et. al, “BDD decomposition for mixed CMOS/PTL logic
circuit synthesis”, in Proceedings, IEEE ISCAS, May 2005, pp.
5649-5652
Yamashita et. al, “Pass-transistor/CMOS collaborated logic:
The best of both worlds”, in Digest of Technical Papers,
Symposium on VLSI Circuits, June 1997, pp. 31-32.
Garg et. al, “Generalized buffering of PTL logic stages using
Boolean division”, in Proceedings, IEEE ISCAS, May 2006.
6
CODCs
Observability Don’t Cares (ODCs)
ODC jk {x B n s.t.
z k ( x) | y j 0 z k ( x) | y j 1}
ODCs used to minimize the logic function of a node
Need to re-compute the ODCs of the other nodes
after optimization
Compatible Observability Don’t Cares
Can simultaneously change the function of all nodes
Subset of ODCs
full_simplify is used to compute CODCs in SIS
ROBDD based computation to compute CODCs
7
CODCs (contd.)
Memory intensive
Computation is possible only for small and medium
sized circuits
Approximate CODCs by Saluja et. al
30X faster than full CODCs
Requires 30X less memory than full CODCs
Literal count reduction is about 80% of that obtained by
full CODCs
Can compute ACODCs for arbitrarily large circuits
8
PTL with Generalized Buffering
Primary Output
Primary Inputs
Primary Output
Primary Inputs
9
Boolean Division
Definition 1: g is a Boolean divisor of f if h
and r exist such that
f = gh + r
where, gh ≠ Ø
Definition 2: g is a Boolean factor of f if, g is a
Boolean divisor of f, and in addition,
r = Ø , i.e. f = gh
Theorem: If fg ≠ Ø, then g is a Boolean divisor
of f.
10
ROBDD Division
Consider a (partitioned) ROBDD f of a node n in the network
Let d represent the CODCs of node n
Consider a library gate g ≡ G
Division of f by g can be represented by following equations:
f fg f g
f G ( f d x) f g
where x g
U G ( f d g ) f g // Upper bound of f
L fG f g
//Lower bound of f
Therefore, quotient f h f d g
remainder r f g
Finally,
f Gh r
11
Generalized Buffering with CODCs
Synthesize partitioned PTL blocks with
Maximum depth of 5
No more than 5 transistors in series
Optimize and decompose network
Using only 2-input gates and inverters
PTL structure will grow in predictable manner
Initially any ROBBD can have maximum 8 variables
If division fails, we can make one of the fanins a
ROBDD variable -- back-track
12
Algorithm: Generalized Buffering with CODCs
A = dfs_and_levelize_nodes(η*)
i =1
while i ≤ size(A) do
n = array_fetch(A,i)
f = ntbdd_node_to_bdd(n)
if bdd_depth(f) ≥ 5 then
for g ≡ G Gate Library do
d = compute_dc(n)
f = test_division(f,g,d,G)
end for
if bdd_depth(f) > 5 then
back-track
else
bdd_create_variable(n)
continue
end if
else
continue
end if
end while
//creates ROBDD of node n
13
test_division with CODCs
test_division(f,d,g,G) {
if fg ≠ 0 then
U ( fG dG f g d g )( g G )
L ( fG f g )( g G )
Z = bdd_between(L,U)
Z* = bdd_smooth(Z,gvars)
R = bdd_compose(Z*,G,g)
if f R f + d && bdd_depth(Z*) < bdd_depth(f) then
Q d ( g G )
d V Q
return(success, Z*)
end if
else
return fail
end if
14
}
back-track
n
a
n-1
n-2
d
Needs
back-track
c
n
b
Re-process b
Make ‘c’ a
variable
n-2
15
Experimental Setup
Implemented in SIS
Process Technology- 100nm BPTM
Gate Library
AND2, AND3, AND4
OR2, OR3, OR4
A set of benchmark circuits were synthesized
Compared with traditional method
Inverters for buffering
Similar to method reported by Buch. et. al
Also compared with generalized buffering without don’t
cares
16
Gate Delay and Area
Gate
MUX
INV
Buffer
AND2
AND3
AND4
OR2
OR3
OR4
Delay (ps)
18
10.26
20.5
30.20
37.76
47.39
38.70
46.08
68.28
Area (2)
0.08
0.08
0.16
0.28
0.44
0.64
0.36
0.68
1.12
17
Delay
Ckt
Traditional
Buffering (ps)
Without CODCs
With ACODCs
With CODCs
alu2
1927.26
0.53
0.55
0.54
alu4
3458.88
0.45
0.43
-
apex6
819.72
0.73
0.75
0.75
C432
2302.38
0.86
0.79
0.76
C880
1248.8
0.70
0.53
0.54
C1908
1336.14
0.74
0.67
0.68
C3540
2104.56
0.71
0.53
-
i8
940.68
0.69
0.62
0.60
C6288
5971.5
0.94
0.86
-
…
…
…
…
…
0.756
0.710
-
AVG
Generalized Buffering (ps)
18
Area
Ckt
Generalized Buffering (2)
Traditional
Buffering (2)
Without CODCs
With ACODCs
With CODCs
alu2
164.32
0.87
0.86
0.83
alu4
963.68
1.12
1.12
-
apex6
305.52
0.95
0.93
0.93
C432
87.12
1.02
0.93
1.10
C880
136.16
0.80
0.79
0.79
C1908
152.24
0.97
0.93
0.93
C3540
532.88
1.01
0.98
-
i8
520.48
0.75
0.72
0.70
C6288
1220.48
1.10
1.06
-
…
…
…
…
…
0.970
0.950
-
AVG
19
Multiplexers
Ckt
Traditional
Buffering
Without CODCs
With ACODCs
With CODCs
alu2
718
0.60
0.577
0.532
alu4
4188
0.704
0.689
-
apex6
1337
0.737
0.719
0.718
C432
404
0.995
0.795
0.834
C880
594
0.731
0.702
0.712
C1908
660
0.858
0.806
0.792
C3540
2362
0.732
0672
-
i8
2418
0.483
0.449
0.439
C6288
5393
0.917
0.834
-
…
…
…
…
…
0.770
0.729
-
AVG
Generalized Buffering
20
Run-time
Ckt
Generalized Buffering (s)
Without CODCs
With ACODCs
With CODCs
alu2
0.38
11.48
435.32
alu4
8.99
54.52
-
apex6
1.49
4.6
117.23
C432
0.46
13.82
236.82
C880
0.24
2.41
70.0
C1908
0.790
8.08
362.12
C3540
5.01
41.56
-
i8
1.63
28.73
2651.48
C6288
10.79
151.49
-
…
…
…
…
AVG
4.837
33.55
21
Conclusions
Generalized buffering with ACODCs results in delay
reduction by
Area reduction obtained by generalized buffering
with ACODCs is
29% over traditional buffering
5% over generalized buffering without don’t cares
5% compared to traditional buffering
2% compared to generalized buffering without don’t cares
Multiplexers also reduced by
27% compared to traditional buffering
4% compared to generalized buffering without don’t cares
22
Conclusions (contd)
A large number of divisions were obtained for each circuit
Little advantage of using CODCs over ACODCs
Delay reduction is less than 1%
Area increases by 1%
Run-time is 76X slower
Can synthesize arbitrary sized circuits using partitioned
ROBDDs and ACODC based division
23
Thank You!!
Questions?
24