64 bit Kogge-Stone Adders in different logic styles – A study

Rob McNish
Satyanand Nalam
Objectives
To compare the speed and power dissipation of a 64-bit Kogge-Stone adder in three logic styles:
• Static CMOS
• Dynamic logic
• Static Output Prediction Logic (OPL)
To reduce leakage power dissipation in OPL circuits using MTCMOS techniques.
The Adder
• Technology used: 90 nm PTM
• Hierarchical design
• Inverting sub-blocks (dot and square) implement the nodes of the Kogge-Stone tree
• The basic sub-blocks are implemented in static, dynamic and OPL-static styles
• Only minimal changes to the tree netlist are needed to construct the three adders
• The schematic for a 16-bit adder is shown; it can be extended to a 64-bit adder.
Schematic for 16-bit adder
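The tree structure can be illustrated with a small behavioural model. The sketch below is an assumption-laden illustration, not the netlist used in this study: the function name `kogge_stone_add` is hypothetical, integers stand in for wire bundles, and the inverting polarity of the dot and square sub-blocks is ignored (only the non-inverted generate/propagate logic is modelled).

```python
def kogge_stone_add(a: int, b: int, width: int = 64, carry_in: int = 0):
    """Behavioural model of a Kogge-Stone adder (hypothetical helper).

    Bit i of each integer stands for wire i. Returns (sum, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-processing: per-bit propagate and generate signals.
    p = a ^ b
    g = a & b

    # Fold the carry-in into bit 0 so the prefix tree also covers it.
    group_g = g | (p & carry_in & 1)
    group_p = p

    # Prefix tree: at distance d, every bit combines its group with the
    # group that ends d positions below it (d = 1, 2, 4, ...).
    d = 1
    while d < width:
        group_g |= group_p & (group_g << d)
        group_p &= group_p << d
        group_g &= mask
        group_p &= mask
        d *= 2

    # Post-processing: the carry into bit i is the group generate of bits i-1..0.
    carries = ((group_g << 1) | (carry_in & 1)) & mask
    total = p ^ carries
    carry_out = (group_g >> (width - 1)) & 1
    return total, carry_out
```

For `width=64` the loop runs six times (d = 1, 2, 4, 8, 16, 32), which is the logarithmic depth that makes the Kogge-Stone tree attractive. As a usage example, `kogge_stone_add(0x8000000000000000, 0x8000000000000000)` returns `(0, 1)`: the sum wraps to zero and the carry-out is set.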
Output Prediction Logic (OPL)
• A technique that can be applied to different logic families to increase speed
• Retains the attributes of the underlying family (e.g., static, dynamic, pseudo-NMOS)
• Relies on the alternating nature of the logical output values along a critical path, i.e., for any critical path, the outputs of the gates along the path will be alternating zeros and ones.

OPL Concept
• OPL predicts that every gate output will be 1 after the transitions are completed
• Since all gates are inverting, the predictions will be correct half of the time => at least 2X speedup
• Problem: a 1 at the output of every inverting gate (and hence at every gate input) is not a stable state
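The "correct half of the time" argument can be checked with a toy model. The sketch below assumes the critical path behaves like a chain of ideal inverters; the helper names `settle_inverter_chain` and `count_correct_predictions` are hypothetical and are not taken from the OPL papers.

```python
from typing import List


def settle_inverter_chain(first_input: int, n_gates: int) -> List[int]:
    """Final (settled) output of each gate in a chain of ideal inverters."""
    outputs = []
    value = first_input
    for _ in range(n_gates):
        value = 1 - value  # every gate on the path is inverting
        outputs.append(value)
    return outputs


def count_correct_predictions(outputs: List[int], predicted: int = 1) -> int:
    """Number of gates whose settled output equals the OPL prediction."""
    return sum(1 for v in outputs if v == predicted)


if __name__ == "__main__":
    settled = settle_inverter_chain(first_input=1, n_gates=8)
    print(settled)                             # [0, 1, 0, 1, 0, 1, 0, 1]
    print(count_correct_predictions(settled))  # 4
```

Because the settled outputs alternate, predicting 1 everywhere is right for every second gate, so only half of the gates on the path have to make a transition.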

OPL Example
• Solution: tri-state each gate with a clock => a 1 at the input and a 1 at the output can coexist.
• Example shown: 3-input NOR gate in OPL-static, where the predicted value is a 1.
• CLK=0 => gate is tri-stated, with its output precharged to 1.
• CLK=1 => conventional CMOS gate
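The two clock phases of the example gate can be written as a purely logical model. This is a sketch under simplifying assumptions: `opl_nor3` is a hypothetical name, and timing, charge storage on the tri-stated node, and glitching are all ignored.

```python
def opl_nor3(clk: int, a: int, b: int, c: int) -> int:
    """Logical model of a 3-input NOR gate in OPL-static style (illustrative)."""
    if clk == 0:
        # Gate is tri-stated; the output holds its precharged (predicted) value.
        return 1
    # Clock high: the gate behaves as a conventional static CMOS NOR.
    return 0 if (a or b or c) else 1


if __name__ == "__main__":
    # With the clock low, a 1 at the inputs and a 1 at the output coexist.
    print(opl_nor3(0, 1, 1, 1))  # 1
    # With the clock high, the gate evaluates normally.
    print(opl_nor3(1, 1, 0, 0))  # 0
    print(opl_nor3(1, 0, 0, 0))  # 1
```

When all inputs are 0 the prediction is already correct and the output never has to move once the clock rises; only the other input combinations require a discharge.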
OPL clocking: Chain of OPL gates
OPL clocking
• Clock separation too small => heavy glitching, and the precharge value is lost
• Clock separation too large => minimal glitching, but the speedup achieved is limited by the clock, not by the data
• Optimal clock separation => limited glitching, and the circuit is not clock-blocked.
Delay Plot – Static CMOS
• The best-case delay for the carry tree is along the path for C0, as this path consists entirely of inverters.
• The best-case delay distribution for the static CMOS adder is shown. The mean was 144 ps.
• The input vectors (in hex) are A = 0000 0000 0000 0001, B = 0000 0000 0000 0000 -> 0000 0000 0000 0001
Delay Plot – Static CMOS
• The delay plot for a random case is shown.
• The input vectors (in hex) are A = 8000 0000 0000 0000, B = 0000 0000 0000 0000 -> 8000 0000 0000 0000
Delay Plot – Dynamic logic
• The delay plot for a random case is shown.
• The input vectors (in hex) are A = 8000 0000 0000 0000, B = 0000 0000 0000 0000 -> 8000 0000 0000 0000

Power dissipation

Power dissipation was measured for the three adders using Spectre.

Logic style      Power dissipation
Static CMOS      0.19 mW
Dynamic logic    14.7 mW
OPL              16.29 mW
The novelty: Using a high-VT footer to reduce leakage power in OPL gates
• A high-VT footer transistor was added to reduce leakage power in standby mode for the OPL adder.
• Footers were added to the basic sub-blocks.
• The high-VT transistor was modeled by applying a negative voltage to the bulk of the footer transistor.
Leakage power reduction
• Using the high-VT footer in the OPL adder reduced leakage power by more than 10x.

Configuration             Leakage power
Without high-VT footer    6.3 uW
With high-VT footer       0.54 uW
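As a quick arithmetic check of the reduction factor, the two leakage figures reported above can be compared directly. The helper name `leakage_reduction` is hypothetical; the only inputs are the two measured values from the slide.

```python
def leakage_reduction(without_footer_uw: float, with_footer_uw: float):
    """Return (reduction factor, percentage saved) for two leakage figures."""
    factor = without_footer_uw / with_footer_uw
    saved_percent = 100.0 * (1.0 - with_footer_uw / without_footer_uw)
    return factor, saved_percent


if __name__ == "__main__":
    factor, saved = leakage_reduction(6.3, 0.54)
    print(f"{factor:.1f}x reduction, {saved:.1f}% of leakage removed")
    # 11.7x reduction, 91.4% of leakage removed
```

The measured values therefore correspond to a factor of about 11.7, consistent with the rounded 10x figure.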