Transcript Document

Computing Systems
Designing a basic ALU
[email protected]
1
Let’s start designing a processor !!!!
 Almost ready to move into chapter 5 and start building a
processor
 First, let’s review Boolean Logic and build the ALU we’ll
need (Material from Appendix B)
operation
a
32
ALU
result
32
b
32
2
Review: Boolean algebra and gates
Problem:




Consider a logic function with three
inputs: A, B, and C.
 Output D is true if at least one
input is true
 Output E is true if exactly two
inputs are true
 Output F is true only if all three
inputs are true
Show the truth table for these three
functions.
Show the Boolean equations for
these three functions.
Show an implementation
consisting of inverters, AND, and
OR gates.
Solution
A
0
0
0
0
1
1
1
1
B
0
0
1
1
0
0
1
1
C
0
1
0
1
0
1
0
1
D
0
1
1
1
1
1
1
1
E
0
0
0
1
0
1
1
0
D=A+B+C
E = A’.B.C + A.B’.C+ A.B.C’
F = A.B.C
F
0
0
0
0
0
0
0
1
3
one-bit adder
carry in
It takes three input bits and
generates two output bits
a
b
sum
carry out
Multiple bits can be cascaded
4
one-bit adder: Boolean algebra
ci = carry in, co = carry out = ci+1, s = sum
a
0
0
0
0
1
1
1
1
s
b
0
0
1
1
0
0
1
1
ci
0
1
0
1
0
1
0
1
co
0
0
0
1
0
1
1
1
s
0
1
1
0
1
0
0
1
= a’.b’.ci + a’.b.ci’+ a.b’.ci’ + a.b.ci = a xor b xor ci
ci+1 = a.b + (a+b).ci
when both a and b are 1 ci+1 is 1 no matter ci
5
Ripple adder
a0 b0
a1 b1
a2 b2
c1
c0
c2
s0
ai
0
0
0
0
1
1
1
1
s1
bi
0
0
1
1
0
0
1
1
an-1 bn-1
c3
cn-1
…
cn
s2
ci
0
1
0
1
0
1
0
1
ci+1
0
0
0
1
0
1
1
1
Sn-1
si
0
1
1
0
1
0
0
1
gi
0
0
0
0
0
0
1
1
pi
0
0
0
1
0
1
0
0
6
It is more convenient to use
pi* or pi** than pi
Ripple adder
ai
0
0
0
0
1
1
1
1
bi
0
0
1
1
0
0
1
1
ci
0
1
0
1
0
1
0
1
ci+1
0
0
0
1
0
1
1
1
si
0
1
1
0
1
0
0
1
gi
0
0
0
0
0
0
1
1
pi
0
0
0
1
0
1
0
0
and
gi = ai.bi
pi = ai’.bi.ci + ai.bi’.ci = ci.(ai xor bi)
ci+1 = gi + pi.ci = gi + ci.ci.(ai xor bi) = gi.ci.(ai xor bi) = gi + ci.pi*
where pi* = ai xor bi
or alternatively ci+1 = gi + pi**.ci with pi** = ai + bi
si = pi* xor ci
but:
si = pi xor ci
si = pi** xor ci
is not true !
is not true !
7
xor gates can be very fast if designed using pass transistors
Ripple adder timing
a0 b0
a1 b1
c1
c0
s0
a2 b2
c2
s1
an-1 bn-1
c3
s2
cn-1
…
cn
Sn-1
worst case:
tadder = (n-1) tcarry + tsum
where: tcarry = delay from ci to ci+1
tsum = delay from cn-1 to sn
ci+1 = gi + pi*.ci
si = pi* xor ci
Assuming the sum circuit is slower than
the carry otherwise simply: tadder = n tcarry
8
Problem: carry ripple adder is slow
 Is there more than one way to do addition ? YES !!!
 two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1
c2
c3
c4
=
=
=
=
b0c0
b1c1
b2c2
b3c3
+
+
+
+
a0c0
a1c1
a2c2
a3c3
+
+
+
+
a0b0
a1b1
a2b2
a3b3
c2 =
c3 =
c4 =
…
sum-of-product not feasible! Why?
we would need “infinite”hardware !!!
9
Carry-lookahead adder
 An approach in-between the two extremes (sum-of-products and ripple
adders)
 Motivation:
 If we didn't know the value of carry-in, what could we do?
 When would we always generate a carry?
gi=ai bi
 When would we propagate the carry?
pi=ai+bi
 How to get rid of the ripple?
 for each bit in an n-bit adder:
ci+1 = f(ai,bi,ci)= gi+ pici
 The dependency between ci+1 and ci can be eliminated by
expanding ci (i.e., compute ci for each stage in parallel instead
of waiting for the carry from the previous stage)
10
Carry lookahead adder
c1
c2
c3
c4
=
=
=
=
g0
g1
g2
g3
+
+
+
+
a0 b0
g0
c0
c0
c2 = g1+p1g0+p1p0c0
c3 = g2+p2g1+p2p1g0+p2p1p0c0
c4 = g3+p3g2+p3p2g1+p3p2p1g0+p3p2p1p0c0
a1 b1
p0
c0
p0c0
p1c1
p2c2
p3c3
g1
a2 b2
p1
g2
CLA UNIT
c2
c1
a3 b3
p2
g3
p3
c3
c4
c4
CLA 4-bit
ADDER
s0
s1
s2
s3
11
Building bigger adders
 Can’t build a 16 bit CLA adder
... (too big)
 Solution: use the CLA principle
recursively
 We could use ripple carry of
4-bit CLA adders
12
An ALU (arithmetic logic unit)
 Let’s build an ALU to support add, and,or instructions
 we'll just build a 1 bit ALU, and use 32 of them
operation
a
b
op a b res
result
 Possible Implementation (sum-of-products):
 Not easy to decide the “best” way to build something
 Don't want too many inputs to a single gate
 Don’t want to have to go through too many gates
 for our purposes, ease of comprehension is important
13
Building a 32-bit ALU
14
What about subtraction (a – b) ?
 Two's complement approach: just negate b and add.
 How do we negate b? invert b and add 1 through the cin
15
Adding the NOR instruction
 How do we get a nor b ?
 Can also choose to invert a
De Morgan
ab  ab  ab  ab
16
Tailoring the ALU to the MIPS
 Need to support the set-on-less-than instruction (slt)
 remember: slt is an arithmetic instruction
 produces a 1 if rs < rt and 0 otherwise
 use subtraction: (a-b) < 0 implies a < b
 Need to support test for equality (beq $t5, $t6, $t7)
 use subtraction: (a-b) = 0 implies a = b
17
Supporting slt and detecting overflow
Can we figure out the idea ?
Use this ALU for all other bits
Use this ALU for most significant bit
18
Supporting slt
19
Test for equality
Notice control lines:
0000
0001
0010
0110
0111
1100
=
=
=
=
=
=
and
or
add
subtract
slt
nor
Note: zero is a 1 when
the result is zero!
20
The ALU
21
Conclusion
 We can build an ALU to support the MIPS instruction set
 key idea: use multiplexor to select the output we want
 we can efficiently perform subtraction using two’s complement
 we can replicate a 1-bit ALU to produce a 32-bit ALU
 Important points about hardware
 all of the gates are always working
 the speed of a gate is affected by the number of inputs to the gate
 the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
 Our primary focus: comprehension, however,
 Clever changes to organization can improve performance
(similar to using better algorithms in software)
 We saw this in multiplication, and addition
22
Conclusion
 Real processors use more sophisticated techniques for
arithmetic
 Where performance is not critical, hardware description
languages allow designers to completely automate the
creation of hardware!
23