CSE 477. VLSI Systems Design

Download Report

Transcript CSE 477. VLSI Systems Design

CSE477
VLSI Digital Circuits
Fall 2003
Lecture 20: Adder Design
Mary Jane Irwin ( www.cse.psu.edu/~mji )
www.cse.psu.edu/~cg477
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE477 L20 Adder Design.1
Irwin&Vijay, PSU, 2003
Review: Basic Building Blocks

Datapath

Execution units
- Adder, multiplier, divider, shifter, etc.



Control


Finite state machines (PLA, ROM, random logic)
Interconnect


Register file and pipeline registers
Multiplexers, decoders
Switches, arbiters, buses
Memory

Caches (SRAMs), TLBs, DRAMs, buffers
CSE477 L20 Adder Design.2
Irwin&Vijay, PSU, 2003
The 1-bit Binary Adder
Cin
A
B
1-bit Full
Adder
(FA)
Cout
G=A&B
P=AB
K = !A & !B
S
A
B
Cin
Cout
S
carry status
0
0
0
0
0
kill
0
0
1
0
1
kill
0
1
0
0
1
propagate
0
1
1
1
0
propagate
1
0
0
0
1
propagate
1
0
1
1
0
propagate
1
1
0
1
0
generate
1
1
1
1
1
generate
S = A  B  Cin
= P  Cin
Cout = A&B | A&Cin | B&Cin
= G | P&Cin
(majority function)

How can we use it to build a 64-bit adder?

How can we modify it easily to build an adder/subtractor?

How can we make it better (faster, lower power, smaller)?
CSE477 L20 Adder Design.3
Irwin&Vijay, PSU, 2003
FA Gate Level Implementations

The way you learned to design in CSE271 and CSE471
A B Cin
A
t1
B Cin
t0
t2 t1
t0
t2
Cout
S
Cout
S
CSE477 L20 Adder Design.4
Irwin&Vijay, PSU, 2003
Review: XOR FA
Cin
A
S
B
Cout
16 transistors
CSE477 L20 Adder Design.5
Irwin&Vijay, PSU, 2003
Review: CPL FA
!B
!Cin
B
Cin
A
!S
!A
S
B
!B
Cin
A
!Cin
!Cout
B
Cin
!A
Cout
!B
!Cin
20+8 transistors, dual rail – beware of threshold drops
CSE477 L20 Adder Design.6
Irwin&Vijay, PSU, 2003
Review: Mirror Adder
24+4 transistors
A
8 B
8 B
8
4 B
4 Cin
4
kill
0-propagate
8
A 8
Cin
4
1-propagate
A
A
4 B
A 4
4 B
4
!Cout
2
generate
4
Cout = A&B | B&Cin | A&Cin
A
2 B
2 Cin 2
B
6
A
6
Cin 6
!S
Cin 3
A
3
B
3
SUM = A&B&Cin | COUT&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2 so the
optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2
inverter transistor gates (to form Cin for the nms bit adder) should
oversize the carry circuit. PMOS/NMOS ratio of 2.
CSE477 L20 Adder Design.8
Irwin&Vijay, PSU, 2003
Mirror Adder Features

The NMOS and PMOS chains are completely
symmetrical with a maximum of two series transistors in
the carry circuitry, guaranteeing identical rise and fall
transitions if the NMOS and PMOS devices are properly
sized.

When laying out the cell, the most critical issue is the
minimization of the capacitances at node !Cout (four
diffusion capacitances, two internal gate capacitances,
and two inverter gate capacitances). Shared diffusions
can reduce the stack node capacitances.

The transistors connected to Cin are placed closest to the
output.

Only the transistors in the carry stage have to be
optimized for optimal speed. All transistors in the sum
stage can be minimal size.
CSE477 L20 Adder Design.9
Irwin&Vijay, PSU, 2003
A 64-bit Adder/Subtractor
add/subt


Ripple Carry Adder (RCA)
built out of 64 FAs
Subtraction – complement
all subtrahend bits (xor
gates) and set the low
order carry-in
RCA

advantage: simple logic,
so small (low cost)

disadvantage: slow (O(N)
for N bits) and lots of
glitching (so lots of energy
consumption)
CSE477 L20 Adder Design.10
A0
1-bit
FA
C1
S0
A1
1-bit
FA
C2
S1
A2
1-bit
FA
C3
S2
B0
B1
B2
...

C0=Cin
C63
A63
B63
1-bit
FA
S63
C64=Cout
Irwin&Vijay, PSU, 2003
Ripple Carry Adder (RCA)
A3 B3
A2 B2
A1 B1
A0 B0
FA
FA
FA
FA
Cout=C4
S3
S2
S1
C0=Cin
S0
Tadder  TFA(A,BCout) + (N-2)TFA(CinCout) + TFA(CinS)
T = O(N) worst case delay
Real Goal: Make the fastest possible carry path
CSE477 L20 Adder Design.11
Irwin&Vijay, PSU, 2003
Inversion Property

Inverting all inputs to a FA results in inverted values for
all outputs
A
Cout
B
FA
A
Cin

Cout
S
B
FA
Cin
S
!S (A, B, Cin) = S(!A, !B, !Cin)
!Cout (A, B, Cin) = Cout (!A, !B, !Cin)
CSE477 L20 Adder Design.12
Irwin&Vijay, PSU, 2003
Exploiting the Inversion Property
A3 B3
A2 B2
A1 B1
A0 B0
FA’
FA’
FA’
FA’
S3
S2
S1
S0
Cout=C4
C0=Cin
inverted cell regular cell
Minimizes the critical path (the carry chain) by
eliminating inverters between the FAs (will need to
increase the transistor sizing on the carry chain portion of
the mirror adder).

Now need two “flavors” of FAs
CSE477 L20 Adder Design.13
Irwin&Vijay, PSU, 2003
Fast Carry Chain Design

The key to fast addition is a low latency carry network

What matters is whether in a given position a carry is




generated
propagated
annihilated (killed)
Gi = Ai & Bi
Pi = Ai  Bi (sometimes use Ai | Bi)
Ki = !Ai & !Bi
Giving a carry recurrence of
Ci+1 = Gi | Pi&Ci
C1 = G0
C 2 = G1
C 3 = G2
C 4 = G3
|
|
|
|
P0&C0
P1&G0 | P1&P0 &C0
P2&G1 | P2&P1&G0 | P2&P1&P0&C0
P3&G2 | P3&P2&G1 | P3&P2&P1&G0 | P3&P2&P1&P0&C0
CSE477 L20 Adder Design.15
Irwin&Vijay, PSU, 2003
Manchester Carry Chain (MCC)

Switches controlled by Gi and Pi
!Ci+1
!Ci
Gi
Pi
clk

Total delay of



time to form the switch control signals Gi and Pi
setup time for the switches
signal propagation delay through N switches in the worst case
CSE477 L20 Adder Design.16
Irwin&Vijay, PSU, 2003
4-bit Sliced MCC Adder
A3
B3
A2
B2
A1
B1
A0
B0
clk
&

G P
&

G P
&

G P
&

G P
!C4
!C0
!C3
!C1
!C2




S3
S2
S1
S0
CSE477 L20 Adder Design.17
Irwin&Vijay, PSU, 2003
8-bit MCC Adder
&
!C7


&

4-bit slice MCC
4-bit slice MCC


!C0
Its really hard to beat the speed of a well
designed MCC for word lengths of 8 bits or less !
CSE477 L20 Adder Design.18
Irwin&Vijay, PSU, 2003
Carry Skip Adders
(aka Carry Bypass Adders)
T = O(n)
A = O(n)
CSE477 L20 Adder Design.19
Irwin&Vijay, PSU, 2003
Carry Skip Adder
C4
A3 B3
A2 B2
A1 B1
A0 B0
FA
FA
FA
FA
C0
C4
S3
BP = P0&P1&P2&P3
S2
S1
S0
“Block Propagate”
If (P0 & P1 & P2 & P3 = 1) then C4 = C0 otherwise the
block itself kills or generates the carry internally
CSE477 L20 Adder Design.20
Irwin&Vijay, PSU, 2003
Carry-Skip Chain Implementation
block carry-out
carry-out
BP
block carry-in
P3
P2
P1
P0
!Cout
Cin
G3
G2
G1
G0
BP
CSE477 L20 Adder Design.21
Irwin&Vijay, PSU, 2003
16 bit, 4-bit Block Carry Skip Adder
bits 12 to 15
bits 8 to 11
bits 4 to 7
bits 0 to 3
Setup
Setup
Setup
Setup
Carry
Propagation
Carry
Propagation
Carry
Propagation
Carry
Propagation
Ci,0
Sum
Sum
Sum
Sum
Worst-case delay  carry from bit 0 to bit 15 = carry generated
in bit 0, ripples through bits 1, 2, and 3, skips the middle two
groups (B is the group size in bits), ripples in the last group from
bit 12 to bit 15
Tadd = tsetup + B tcarry + ((N/B) - 2) tskip +B tcarry + tsum
CSE477 L20 Adder Design.22
Irwin&Vijay, PSU, 2003
Optimal Skip Block Size and Add Time

Assuming one stage of ripple (tcarry) has the same delay
as one skip logic stage (tskip) and both are 1
TCSkA = 1 + B + (N/B-2) + B + 1
tsetup
ripple in
block 0
skips
ripple in
last block
tsum
= 2B + N/B

So the optimal block size, B, is
dTCSkA/dB = 0  (N/2) = Bopt

And the optimal time is
Optimal TCSkA = 4√(n/2) – 1 = 2√(2n) – 1
CSE477 L20 Adder Design.23
Irwin&Vijay, PSU, 2003
Carry Skip Adder Extensions

Variable block sizes

A carry that is generated in, or absorbed by, one of the inner
blocks travels a shorter distance through the skip blocks, so can
have bigger blocks for the inner carries without increasing the
overall delay
Cout

Cin
Multiple levels of skip logic
Cout
Cin
skip level 1
skip level 2
CSE477 L20 Adder Design.24
AND of the
first level skip
signals (BP’s)
Irwin&Vijay, PSU, 2003
RCA, Carry Skip Adder Comparison
70
60
50
40
RCA
CSkA
30
20
B=4
10
B=2
B=5
B=6
B=3
0
8 bits
CSE477 L20 Adder Design.25
16 bits
32 bits
48 bits
64 bits
Irwin&Vijay, PSU, 2003
Prefix Adders
T = O(log n)
A = O(n log n)
CSE477 L20 Adder Design.31
Irwin&Vijay, PSU, 2003
Parallel Prefix Adders (PPAs)

Define carry operator € on (G,P) signal pairs
(G’’,P’’) (G’,P’)
G’’
G’
€
where
G = G’’ | P’’&G’
P = P’’&P’
(G,P)

!G
P’’
€ is associative, i.e.,
[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]
€
€
€
CSE477 L20 Adder Design.32
€
Irwin&Vijay, PSU, 2003
PPA General Structure

Given P and G terms for each bit position, computing all
the carries is equal to finding all the prefixes in parallel
(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order

but note that it is not commutative
Pi, Gi logic (1 unit delay)

Measures to consider


Ci parallel prefix logic tree
(1 unit delay per level)




Si logic (1 unit delay)
CSE477 L20 Adder Design.33

number of € cells
tree cell depth (time)
tree cell area
cell fan-in and fan-out
max wiring length
wiring congestion
delay path variation (glitching)
Irwin&Vijay, PSU, 2003
Brent-Kung PPA
G15 G14 G13 G12
p15 p14 p13 P12
€
€
G11 G10 G9
p11 P10 p9
€
€
G8 G7
P8 P7
€
G6 G5
P6 P5
€
€
G4 G3
P4 P3
€
€
€
€
G2
p2
G1
P1
G0
P0
€
€
€
€
€
€
€
€
€
€
C16 C15 C14 C13 C12 C11
CSE477 L20 Adder Design.35
€
€
C10 C9
€
C8 C7
€
C6 C5
€
C4 C3
C2
C1
Irwin&Vijay, PSU, 2003
A Faster Yet PPA

Brent-Kung (BK) adder has the time bound of
TBK = 1 + (2log N – 2) + 1

There are even faster PPA approaches that are used in
most modern day machines for operands of 32 bits or
greater

Kogge-Stone (KS)

faster pp tree (logN for KS versus 2logN-2 for BK)

fan-out of carry cell € limited to two

takes more € cells (NlogN - N + 1 for KS versus
2N - 2 - logN for BK) and has more wiring
CSE477 L20 Adder Design.36
Irwin&Vijay, PSU, 2003
Kogge-Stone PPF Adder
G15 G14 G13 G12
P15 P14 P13 P12
G11 G10 G9
P11 P10 P9
G8
P8
G7
P7
G6
P6
G5
P5
G4
P4
G3
P3
G2 G1
P2 P1
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
€
C8
C7
C16 C15 C14 C13 C12 C11 C10 C9
CSE477 L20 Adder Design.38
C6
C5 C4
Tadd = tsetup + log2N t€ + tsum
C3
G0
P0 C
in
€
€
C2
C1
Irwin&Vijay, PSU, 2003
PPA Comparisons
Measure
BK PPA
N=64
KS PPA
N=64
# of € cells
tree depth
tree area
(WxH)
cell fan-in
2N - 2 - logN
2logN - 2
(N/2) * (2logN -2)
129
10
320
NlogN - N + 1
logN
N * logN
321
6
384
2
2
2
2
cell fan-out
max wire
length
wiring
density
glitching
logN
N/4
6
16
2
N/2
2
32
CSE477 L20 Adder Design.39
sparse
dense
high
low
Irwin&Vijay, PSU, 2003
More Adder Comparisons
70
60
50
RCA
CSkA
KS PPA
40
30
20
10
0
8 bits
CSE477 L20 Adder Design.40
16 bits
32 bits
48 bits
64 bits
Irwin&Vijay, PSU, 2003
Next Lecture and Reminders

Next lecture

Multiplier Design
- Reading assignment – Rabaey, et al, 11.4

Reminders





HW#4 due November 11th (not Nov 4th as on outline)
HW#5 will be optional (due November 20th)
Project final reports due December 4th
Final grading negotiations/correction (except for the final
exam) must be concluded by December 10th
Final exam scheduled
- Tuesday, December 16th from 10:10 to noon in 118 and 113
Thomas
CSE477 L20 Adder Design.41
Irwin&Vijay, PSU, 2003