Presentation slides. - Dept. of Electrical and Computer Engineering

Download Report

Transcript Presentation slides. - Dept. of Electrical and Computer Engineering

Generalized Buffering of PTL
Logic Stages using Boolean
Division and Don’t Cares
Rajesh Garg
Sunil P. Khatri
Department of Electrical and Computer Engineering,
Texas A&M University,
College Station, TX 77843
1
Outline
 Introduction
 Objective
 Previous
Work
 CODCs and ACODCs
 Generalized Buffering With CODCs
 Results
 Conclusions
2
Introduction



Pass Transistor Logic (PTL) typically used for specific
circuit implementations, like barrel shifters
No widely accepted PTL design methodology
There exists a direct mapping between an ROBDD node
and a PTL mux
f
f
v
v
fv
1
fv’
ROBDD Node

f
0
fv
MUX
fv’
v’
v
fv fv’
PTL based MUX
Hence ROBDDs can be used to perform PTL based
synthesis for general circuits
3
Introduction (contd)

Problems with direct mapping

Body Effect


Cannot connect more than 4-5 devices in series
Monolithic ROBDDs
Worst-case exponential size in number of the inputs (large)
 Memory explosion can occur during ROBDD construction


Partitioned ROBDDs



Avoids memory explosion of monolithic ROBDDs
Output of each PTL structure needs to be buffered
Regenerate electrical drive capability after 4 or 5 levels
using a pair of inverters (avoid body effect problems)
4
Objective

New PTL Synthesis Approach

Use partitioned ROBDDs
Avoid memory explosion
 Guarantee no more than 4-5 series devices


Use generalized buffering
Buffers can be complex logic gates in general (not simple
inverters/ buffers)
 Use ACODCs or CODCs to improve the extraction of
generalized buffers
 Simplifies the logic function of the PTL block


Boolean Division based formulation
Elegant, powerful formulation to extract generalized buffers
 Augmented with CODC / ACODCs


Reduces total circuit delay and area
5
Previous Work

Buch et. al, “Logic synthesis for large pass transistor circuits”,
in Proceedings, IEEE/ACM ICCAD, Nov 1997, pp 663-670

Lai et. al, “BDD decomposition for mixed CMOS/PTL logic
circuit synthesis”, in Proceedings, IEEE ISCAS, May 2005, pp.
5649-5652

Yamashita et. al, “Pass-transistor/CMOS collaborated logic:
The best of both worlds”, in Digest of Technical Papers,
Symposium on VLSI Circuits, June 1997, pp. 31-32.

Garg et. al, “Generalized buffering of PTL logic stages using
Boolean division”, in Proceedings, IEEE ISCAS, May 2006.
6
CODCs

Observability Don’t Cares (ODCs)
ODC jk  {x  B n s.t.



z k ( x) | y j 0  z k ( x) | y j 1}
ODCs used to minimize the logic function of a node
Need to re-compute the ODCs of the other nodes
after optimization
Compatible Observability Don’t Cares




Can simultaneously change the function of all nodes
Subset of ODCs
full_simplify is used to compute CODCs in SIS
ROBDD based computation to compute CODCs
7
CODCs (contd.)



Memory intensive
Computation is possible only for small and medium
sized circuits
Approximate CODCs by Saluja et. al




30X faster than full CODCs
Requires 30X less memory than full CODCs
Literal count reduction is about 80% of that obtained by
full CODCs
Can compute ACODCs for arbitrarily large circuits
8
PTL with Generalized Buffering
Primary Output
Primary Inputs
Primary Output
Primary Inputs
9
Boolean Division

Definition 1: g is a Boolean divisor of f if h
and r exist such that
f = gh + r
where, gh ≠ Ø
Definition 2: g is a Boolean factor of f if, g is a
Boolean divisor of f, and in addition,
r = Ø , i.e. f = gh
 Theorem: If fg ≠ Ø, then g is a Boolean divisor
of f.

10
ROBDD Division




Consider a (partitioned) ROBDD f of a node n in the network
Let d represent the CODCs of node n
Consider a library gate g ≡ G
Division of f by g can be represented by following equations:
f  fg  f g
f  G ( f  d  x)  f g
where x  g
U  G ( f  d  g )  f g // Upper bound of f
L  fG  f g
//Lower bound of f
Therefore, quotient f  h  f  d  g
remainder r  f g
Finally,
f  Gh  r
11
Generalized Buffering with CODCs

Synthesize partitioned PTL blocks with



Maximum depth of 5
No more than 5 transistors in series
Optimize and decompose network




Using only 2-input gates and inverters
PTL structure will grow in predictable manner
Initially any ROBBD can have maximum 8 variables
If division fails, we can make one of the fanins a
ROBDD variable -- back-track
12
Algorithm: Generalized Buffering with CODCs
A = dfs_and_levelize_nodes(η*)
i =1
while i ≤ size(A) do
n = array_fetch(A,i)
f = ntbdd_node_to_bdd(n)
if bdd_depth(f) ≥ 5 then
for g ≡ G Gate Library do
d = compute_dc(n)
f = test_division(f,g,d,G)
end for
if bdd_depth(f) > 5 then
back-track
else
bdd_create_variable(n)
continue
end if
else
continue
end if
end while
//creates ROBDD of node n
13
test_division with CODCs
test_division(f,d,g,G) {
if fg ≠ 0 then
U  ( fG  dG  f g  d g )( g G )
L  ( fG  f g )( g G )
Z = bdd_between(L,U)
Z* = bdd_smooth(Z,gvars)
R = bdd_compose(Z*,G,g)
if f  R  f + d && bdd_depth(Z*) < bdd_depth(f) then
Q  d ( g G )
d  V Q
return(success, Z*)
end if
else
return fail
end if
14
}
back-track
n
a
n-1
n-2
d
Needs
back-track
c
n
b
Re-process b
Make ‘c’ a
variable
n-2
15
Experimental Setup



Implemented in SIS
Process Technology- 100nm BPTM
Gate Library



AND2, AND3, AND4
OR2, OR3, OR4
A set of benchmark circuits were synthesized

Compared with traditional method



Inverters for buffering
Similar to method reported by Buch. et. al
Also compared with generalized buffering without don’t
cares
16
Gate Delay and Area
Gate
MUX
INV
Buffer
AND2
AND3
AND4
OR2
OR3
OR4
Delay (ps)
18
10.26
20.5
30.20
37.76
47.39
38.70
46.08
68.28
Area (2)
0.08
0.08
0.16
0.28
0.44
0.64
0.36
0.68
1.12
17
Delay
Ckt
Traditional
Buffering (ps)
Without CODCs
With ACODCs
With CODCs
alu2
1927.26
0.53
0.55
0.54
alu4
3458.88
0.45
0.43
-
apex6
819.72
0.73
0.75
0.75
C432
2302.38
0.86
0.79
0.76
C880
1248.8
0.70
0.53
0.54
C1908
1336.14
0.74
0.67
0.68
C3540
2104.56
0.71
0.53
-
i8
940.68
0.69
0.62
0.60
C6288
5971.5
0.94
0.86
-
…
…
…
…
…
0.756
0.710
-
AVG
Generalized Buffering (ps)
18
Area
Ckt
Generalized Buffering (2)
Traditional
Buffering (2)
Without CODCs
With ACODCs
With CODCs
alu2
164.32
0.87
0.86
0.83
alu4
963.68
1.12
1.12
-
apex6
305.52
0.95
0.93
0.93
C432
87.12
1.02
0.93
1.10
C880
136.16
0.80
0.79
0.79
C1908
152.24
0.97
0.93
0.93
C3540
532.88
1.01
0.98
-
i8
520.48
0.75
0.72
0.70
C6288
1220.48
1.10
1.06
-
…
…
…
…
…
0.970
0.950
-
AVG
19
Multiplexers
Ckt
Traditional
Buffering
Without CODCs
With ACODCs
With CODCs
alu2
718
0.60
0.577
0.532
alu4
4188
0.704
0.689
-
apex6
1337
0.737
0.719
0.718
C432
404
0.995
0.795
0.834
C880
594
0.731
0.702
0.712
C1908
660
0.858
0.806
0.792
C3540
2362
0.732
0672
-
i8
2418
0.483
0.449
0.439
C6288
5393
0.917
0.834
-
…
…
…
…
…
0.770
0.729
-
AVG
Generalized Buffering
20
Run-time
Ckt
Generalized Buffering (s)
Without CODCs
With ACODCs
With CODCs
alu2
0.38
11.48
435.32
alu4
8.99
54.52
-
apex6
1.49
4.6
117.23
C432
0.46
13.82
236.82
C880
0.24
2.41
70.0
C1908
0.790
8.08
362.12
C3540
5.01
41.56
-
i8
1.63
28.73
2651.48
C6288
10.79
151.49
-
…
…
…
…
AVG
4.837
33.55
21
Conclusions

Generalized buffering with ACODCs results in delay
reduction by



Area reduction obtained by generalized buffering
with ACODCs is



29% over traditional buffering
5% over generalized buffering without don’t cares
5% compared to traditional buffering
2% compared to generalized buffering without don’t cares
Multiplexers also reduced by


27% compared to traditional buffering
4% compared to generalized buffering without don’t cares
22
Conclusions (contd)


A large number of divisions were obtained for each circuit
Little advantage of using CODCs over ACODCs




Delay reduction is less than 1%
Area increases by 1%
Run-time is 76X slower
Can synthesize arbitrary sized circuits using partitioned
ROBDDs and ACODC based division
23
Thank You!!
Questions?
24