Low-Power Arithmetic Logic Circuits

An Extra-Regular, Compact, Low-Power Multiplier Design
Using Triple-Expansion Schemes and Borrow Parallel Counter Circuits
Rong Lin, SUNY at Geneseo
Ronald B. Alonzo, University of Rochester
ISCA-WCED, San Diego, CA, June 2003
The Focus of The Presentation:
A Complexity-Reduced Multiplier Design Approach
With superiority in layout compactness, small area, low power, and high performance, and with potential for self-testability.
Contents
• Background
• Overview of the building block circuits
• Overview of the intermediate block circuits
• Overview of the triple expanded multiplier architecture
• Experimental work
• Concluding remarks
1. Background
Traditional Approach
Stage 1: Generation of the large partial product bit matrix (usually with Booth recoding)
Stage 2: Reduction of the partial product matrix into two numbers (usually with binary CSA adders, (3,2) and (4,2) based)
Stage 3: Final addition (by a standard fast adder)
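The three traditional stages can be sketched in a few lines of Python. This is a behavioral illustration only: it uses plain AND-gate partial products rather than Booth recoding to stay short, and the function names (csa_3_2, reduce_to_two, multiply) are hypothetical, not taken from the presentation.

```python
def csa_3_2(a, b, c):
    """Bitwise (3,2) counter: three addends in, a sum word and a carry word out."""
    s = a ^ b ^ c                                # per-column sum bit
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-column majority, moved up one column
    return s, carry


def reduce_to_two(rows):
    """Stage 2: apply (3,2) counters until only two numbers remain."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(csa_3_2(a, b, c))            # 3 rows in, 2 rows out
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(x, y, width):
    # Stage 1: one shifted copy of x per set bit of y (non-Booth, for brevity).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    # Stage 2: carry-save reduction to two numbers.
    s, c = reduce_to_two(rows)
    # Stage 3: final addition (a fast carry-propagate adder in hardware).
    return s + c
```

Each (3,2) step preserves the total, since a + b + c equals the sum word plus the shifted carry word, so only Stage 3 needs a carry-propagate adder.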
Recently proposed designs:
Rectangular-styled Wallace tree [Ref. 2] (Itoh, et al. 2001): two groups of partial product bits
Limited switch dynamic logic [Ref. 1] (Montoye, et al. 2003): merging precharged dynamic logic into the input of every latch
Our Approach
Stage 1: Generation of many (81 for 54 x 54-b) small partial product bit matrices in parallel (non-Booth)
Stage 2: Reduction of the partial product matrices into two numbers with non-binary 4-b 1-hot encoded counters (called borrow parallel counters), which are larger than (3,2) and (4,2) binary counters
Stage 3: Final addition (by a standard fast adder)
Complexity is reduced significantly:
simple CMOS technology
Smaller area
minimal custom design
repeatable and modular
self-testable
low-power
2. The Circuits Of Building Blocks
The building block circuits: borrow parallel counters
The 5_1 borrow parallel counter
About the large parallel counter 5_1
Receiving 5 binary input bits, with 1 of them weighted 2
(called the borrow bit) and the others weighted 1.
Producing 2 output bits (plus 3 in-stage carry-in and carry-out bits),
so that the weighted sums of all input bits and all output bits are equal.
CMOS pass-transistor circuit processing 4-b 1-hot
encoded signals, each representing an integer of value
ranging 0 to 3.
(1) Low switching activity
(2) Fewer hot lines (data paths)
(3) Low transistor count (78; equivalent to 3.3 FA’s)
(4) A very compact layout due to good transistor
distribution and 4 identical paths processed in parallel
(binary logic does not have the advantages)
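A minimal behavioral sketch of the 4-b 1-hot encoding may make the low-switching claim concrete. It models only the encoding and a single "add one binary bit" step as a rotation of the hot line; it is not the 78-transistor 5_1 counter, and the helper names and the carry of weight 4 are assumptions made for this illustration.

```python
def to_one_hot(value):
    """Encode an integer 0..3 on four lines, exactly one of which is hot."""
    if not 0 <= value <= 3:
        raise ValueError("value must be in 0..3")
    return tuple(1 if i == value else 0 for i in range(4))


def from_one_hot(lines):
    """Decode four 1-hot lines back to an integer 0..3."""
    if len(lines) != 4 or sum(lines) != 1:
        raise ValueError("expected exactly one hot line out of four")
    return lines.index(1)


def add_bit(lines, bit):
    """Add one binary bit by rotating the hot line one position (mod 4).

    Returns the rotated lines and a carry (weight 4 in this model), which is
    set only when the hot line wraps around from position 3 to position 0.
    """
    rotated = tuple(lines[(i - bit) % 4] for i in range(4))
    carry = lines[3] & bit
    return rotated, carry


def toggled_lines(before, after):
    """Count the lines that change level between two 1-hot states."""
    return sum(x != y for x, y in zip(before, after))
```

In this model any change of value toggles exactly two of the four lines (one falls, one rises), whatever the old and new values are, and an unchanged value toggles none.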
The borrow bit (in red)
(1) Simplify the logic, reduce the number of transistors
(2) Reduce the number of pass transistors cascaded (no more than 4, including 1 within the input inverter)
(3) Rearrange and balance input bits for small multipliers
The embedded full adder adds two 4-b 1-hot encoded
bits (s0 at column j+2, s1 at column j+1) and 1 binary
bit (q at column j) directly: they have the same
weight!
No type conversion needed
3. The Circuits Of Intermediate Blocks
The 6 x 6-b borrow parallel multiplier
Ovals with the same color form an embedded FA (or an HA, or a single binary bit):
(3,2): 3 ovals
(2,2): 2 ovals
single bit: 1 oval
Input: two 6-b numbers; output: two numbers,
p10 - p0 and q10 - q5
(CSA-style output, because it serves as an intermediate block)
• An array of borrow parallel counters
(virtually eliminating all area needed for inter-counter connections)
The height of the block is very small
(important for triple expansion)
• Inheriting all advantages of borrow parallel counters
Delay = a single counter delay
Height = a single counter height
• Extra compact: virtually no inter-counter connections
Comparison of inter-block connections of 6 x 6 multipliers
Borrow parallel approach
Traditional approach
30% area reduction!
4. The Triple Expanded Multipliers
The partial product bit matrix trisect-decomposition and first-level multiplier triple expansion
Triple 6 x 6-b => 18 x 18-b multiplier
Second-level multiplier triple expansion
Triple 18 x 18-b => 54 x 54-b multiplier
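The arithmetic behind the two expansion levels can be sketched as follows. Each operand is split into three equal-width digits, giving nine sub-products per level, so a 54 x 54-b product uses 9 x 9 = 81 multipliers of 6 x 6-b. This is an arithmetic illustration with hypothetical names; plain integer addition stands in for the hardware summation of the sub-products, and `a * b` at the base stands in for one 6 x 6-b borrow parallel multiplier.

```python
def split3(x, digit_bits):
    """Split x into three digits of digit_bits each, least significant first."""
    mask = (1 << digit_bits) - 1
    return [(x >> (k * digit_bits)) & mask for k in range(3)]


def triple_expand(a, b, bits, counter):
    """Multiply two bits-wide numbers by recursive trisection down to 6 x 6-b.

    counter is a one-element list used to count the 6 x 6-b multipliers.
    """
    if bits == 6:
        counter[0] += 1
        return a * b                 # stand-in for one 6 x 6-b small multiplier
    digit_bits = bits // 3
    a_parts = split3(a, digit_bits)
    b_parts = split3(b, digit_bits)
    total = 0
    for i, a_i in enumerate(a_parts):
        for j, b_j in enumerate(b_parts):
            sub = triple_expand(a_i, b_j, digit_bits, counter)
            total += sub << ((i + j) * digit_bits)   # align the sub-product to its columns
    return total


count = [0]
product_54 = triple_expand(2**54 - 1, 12345678901234567, 54, count)
```

After the call, count[0] holds the number of 6 x 6-b multiplications performed, which matches the 81 small partial product bit matrices quoted for the 54 x 54-b case.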
54 x 54-b
The typical simulation data
The summary of multipliers
5. The Experimental Work: Layout And Tests
The 5_1 borrow parallel counter (with output buffers):
The 6 x 6 multiplier
- wiring at this level very simple
- Manhattan cell structure
The 4 x 4 multiplier with counters (4,2), (3,2), and (2,2)
- wiring very irregular
6. Concluding Remarks
Complexity-reduced multiplier design with new arithmetic
circuits and schemes, achieving low power and high performance
through a novel logic approach which includes:
(1) 4-b 1-hot data paths dominate (lower switching
activity in each logic stage)
(2) Fewer hot lines generated in the logic process (lower power and
leakage power)
(3) Lower transistor count
(4) Higher circuit regularity, lower layout complexity
(5) Lower complexity of component interconnection
(6) Utilizing borrow bits for simple circuits and high speed and, more
importantly, for reducing pass-transistor path length (no more than
4) and for rearranging and balancing the input bits to each column of
the small multipliers.
(7) Utilizing partial product bit matrix decomposition for component
repetition and full self-testability, achieving high observability
and controllability for component circuits (small multipliers are
exhaustively testable)
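To illustrate why small multipliers are exhaustively testable: a 6 x 6-b block has only 2^12 = 4096 input combinations, so every one can be applied and checked. The sketch below is a software analogy under stated assumptions; mul6_shift_add is a hypothetical unit under test, not the borrow parallel circuit, and exhaustive_test is an illustrative helper.

```python
from itertools import product


def mul6_shift_add(a, b):
    """Hypothetical unit under test: 6 x 6-b multiply by shift-and-add."""
    result = 0
    for i in range(6):
        if (b >> i) & 1:
            result += a << i
    return result


def exhaustive_test(unit, bits=6):
    """Apply every input pair to the unit and collect the mismatches."""
    vectors = 0
    failures = []
    for a, b in product(range(1 << bits), repeat=2):
        vectors += 1
        if unit(a, b) != a * b:
            failures.append((a, b))
    return vectors, failures


vectors, failures = exhaustive_test(mul6_shift_add)
```

By contrast, applying every input pair to a monolithic 54 x 54-b multiplier would take 2^108 vectors, which is why decomposition into repeated small blocks matters for testability.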