ALU for Computers (MIPS)
• design a fast ALU for the MIPS ISA
• requirements ?
– support the arithmetic/logic operations: add, addi, addiu,
sub, subu, and, or, andi, ori, xor, xori, slt, slti, sltu, sltiu
• design a multiplier
• design a divider
Review Digital Logic
Gates:
Combinational Logic
Review Digital Logic
PLA: AND array, OR array
Review Digital Logic
A D latch implemented with NOR gates.
A D flip-flop with a falling-edge trigger.
Review Digital Logic
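[Figure: D flip-flop with input D, output Q, and clock CLK, with a timing diagram of CLK, D, and Q]
The value of D is sampled on the positive clock edge.
Q outputs the sampled value for the rest of the cycle.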
Review: Edge-Triggering in Verilog
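The module below has two bugs. Where?
module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  always @ (CLK)
    Q <= D;
endmodule
Is this version correct?
module ff(D, Q, CLK);
  input D, CLK;
  output Q;
  reg Q;                  // fix 1: Q is assigned in an always block, so it must be a reg
  always @ (posedge CLK)  // fix 2: trigger on the rising edge only, not on every change of CLK
    Q <= D;
endmodule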
[Figure: traffic light controller with inputs CLK, Change, and Rst and outputs R (red), Y (yellow), and G (green)]
If Change == 1 on a positive CLK edge, the traffic light changes.
If Rst == 1 on a positive CLK edge, RYG = 100.
[Figure: state diagram. Rst == 1 forces RYG = 100; each Change == 1 moves RYG from 100 to 001, from 001 to 010, and from 010 back to 100]
[Figure: state diagram (Rst == 1: RYG = 100; Change == 1: 100 -> 001 -> 010 -> 100) with a timing diagram in which successive Change pulses step RYG through 100, 001, 010, 100]
“One-Hot Encoding”
[Figure: state diagram (Rst == 1: RYG = 100; Change == 1: 100 -> 001 -> 010 -> 100) implemented with three D flip-flops, one each for R, G, and Y]
[Figure: state diagram (Rst == 1: RYG = 100; Change == 1: 100 -> 001 -> 010 -> 100); Rst and Change feed the Next State Combinational Logic, which drives the D inputs of the R, G, and Y flip-flops]
State Elements: Traffic Light Controller
[Figure: three D flip-flops whose Q outputs are R, G, and Y]
wire next_R, next_Y, next_G;
output R, Y, G;
???
[Figure: D flip-flop with input D, output Q, and clock CLK]
The value of D is sampled on the positive clock edge.
Q outputs the sampled value for the rest of the cycle.
module ff(Q, D, CLK);
  input D, CLK;
  output Q;
  reg Q;
  always @ (posedge CLK)
    Q <= D;
endmodule
State Elements: Traffic Light Controller
[Figure: three D flip-flops whose Q outputs are R, G, and Y]
wire next_R, next_Y, next_G;
output R, Y, G;
ff ff_R(R, next_R, CLK);
ff ff_Y(Y, next_Y, CLK);
ff ff_G(G, next_G, CLK);
Next State Logic: Traffic Light Controller
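[Figure: Next State Combinational Logic with inputs Rst, Change, R, Y, G and outputs next_R, next_Y, next_G]
wire next_R, next_Y, next_G;
// One-hot next state for the sequence RYG = 100 -> 001 -> 010 -> 100:
// red follows yellow, yellow follows green, green follows red.
assign next_R = rst ? 1'b1 : (change ? Y : R);
assign next_Y = rst ? 1'b0 : (change ? G : Y);
assign next_G = rst ? 1'b0 : (change ? R : G);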
wire next_R, next_Y, next_G;
output R, Y, G;
// Next-state logic (one-hot): RYG = 100 -> 001 -> 010 -> 100 on Change.
assign next_R = rst ? 1'b1 : (change ? Y : R);
assign next_Y = rst ? 1'b0 : (change ? G : Y);
assign next_G = rst ? 1'b0 : (change ? R : G);
// State elements: one flip-flop per light.
ff ff_R(R, next_R, CLK);
ff ff_Y(Y, next_Y, CLK);
ff ff_G(G, next_G, CLK);
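The controller can be checked in software before drawing the logic diagram. The Python sketch below is not part of the slides: it models the three flip-flops as a tuple (R, Y, G) and follows the sequence from the state diagram (100 -> 001 -> 010 -> 100), which requires next_R to come from Y, next_Y from G, and next_G from R. The helper names next_state and run are hypothetical.

```python
def next_state(state, rst, change):
    """One positive clock edge of the one-hot traffic light controller.

    state is the tuple (R, Y, G) currently held in the three flip-flops.
    """
    r, y, g = state
    if rst:
        return (1, 0, 0)      # reset forces RYG = 100
    if change:
        # red -> green -> yellow -> red, as in the state diagram:
        # next_R = Y, next_Y = G, next_G = R
        return (y, g, r)
    return (r, y, g)          # no change requested: hold the state


def run(inputs, state=(0, 0, 0)):
    """Apply a list of (rst, change) pairs and return the visited RYG strings."""
    trace = []
    for rst, change in inputs:
        state = next_state(state, rst, change)
        trace.append("".join(str(bit) for bit in state))
    return trace
```

Calling run with one reset followed by three Change pulses walks through one full cycle of the light.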
Logic Diagram: Traffic Light Controller
[Figure: state diagram (Rst == 1: RYG = 100; Change == 1: 100 -> 001 -> 010 -> 100) next to its implementation: Next State Combinational Logic driving the D inputs of the R, G, and Y flip-flops]
ALU for MIPS ISA
• design a 1-bit ALU using an AND gate, an OR gate, a full
adder, and a mux
ALU for MIPS ISA
• design a 32-bit ALU
by cascading 32 1-bit ALUs
ALU for MIPS
• a 1-bit ALU performing AND, OR, addition and
subtraction
If we set Binvert = CarryIn = 1, the adder computes a + (NOT b) + 1,
which is a - b in two's complement.
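A bit-level model makes the Binvert/CarryIn trick concrete. The Python sketch below is an illustration, not the slide's circuit: the operation encoding (0 = AND, 1 = OR, 2 = ADD) and the names alu1 and alu32 are assumptions. It builds a 1-bit ALU from an AND, an OR, a full adder, and a mux, then ripples 32 of them with the CarryIn of bit 0 tied to Binvert.

```python
def alu1(a, b, binvert, carry_in, op):
    """1-bit ALU. op: 0 = AND, 1 = OR, 2 = ADD (assumed encoding).

    Returns (result, carry_out).
    """
    b = b ^ binvert                      # optionally invert b
    total = a + b + carry_in             # full adder
    sum_bit, carry_out = total & 1, total >> 1
    result = (a & b, a | b, sum_bit)[op]  # output mux
    return result, carry_out


def alu32(a, b, binvert, op, width=32):
    """Cascade `width` 1-bit ALUs; CarryIn of bit 0 is tied to Binvert."""
    carry, result = binvert, 0
    for i in range(width):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, binvert, carry, op)
        result |= bit << i
    return result
```

With binvert = 1 and op = 2 the same adder performs subtraction; a negative difference appears as its 32-bit two's-complement pattern.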
ALU for MIPS
• include a “less” input for set-on-less-than (slt)
ALU for MIPS
• design the ALU for the most significant bit (MSB)
• the most significant bit needs to do more work (detect
overflow; the MSB can also be used for slt)
• how to detect an overflow
overflow = CarryIn[MSB] xor CarryOut[MSB]
overflow = 1 ; means overflow
overflow = 0 ; means no overflow
• set-on-less-than
slt $1, $2, $3 ; if $2 < $3 then $1 = 1, else $1 = 0
; if the MSB of $2 - $3 is 1, then $1 = 1
; in 2's complement, the MSB of a negative number is 1
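The overflow rule and slt can be tried on 32-bit patterns. The Python sketch below is illustrative only; the names add_sub and slt are hypothetical. It computes overflow as the carry into the MSB XOR the carry out of the MSB. One assumption goes beyond the slide: the slide takes the sign bit of $2 - $3 directly, while this sketch also XORs it with the overflow flag so that the comparison stays correct when the subtraction itself overflows.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def add_sub(a, b, subtract=False):
    """Add or subtract two WIDTH-bit patterns (values 0..MASK).

    Returns (result, overflow), where
    overflow = carry into the MSB XOR carry out of the MSB.
    """
    if subtract:
        b = ~b & MASK                    # Binvert
    carry_in = 1 if subtract else 0      # CarryIn of bit 0
    low = MASK >> 1                      # every bit below the MSB
    carry_into_msb = ((a & low) + (b & low) + carry_in) >> (WIDTH - 1)
    total = a + b + carry_in
    carry_out_of_msb = total >> WIDTH
    return total & MASK, carry_into_msb ^ carry_out_of_msb


def slt(a, b):
    """Signed set-on-less-than on WIDTH-bit patterns: 1 if a < b, else 0."""
    diff, overflow = add_sub(a, b, subtract=True)
    sign = diff >> (WIDTH - 1)
    return sign ^ overflow               # sign alone is wrong when a - b overflows
```

For example, 0x7FFFFFFF + 1 sets the overflow flag, and slt(0x80000000, 1) still returns 1 even though the raw difference has a 0 sign bit.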
ALU for MIPS
• a 1-bit ALU for the MSB
Overflow = CarryIn XOR CarryOut
A 32-bit ALU
constructed from
32 1-bit ALUs
A 32-bit ALU
with zero detector
A Verilog behavioral definition of a MIPS ALU.
ALU for MIPS
• The critical path of a 32-bit ripple carry adder is 32 x the carry
propagation delay of one bit
• How to solve this problem?
– design trick: use more hardware
– design trick: look ahead, peek
– carry lookahead adder (CLA)
• CLA
a  b  cout  action
0  0  0     nothing happens
0  1  cin   propagate cin
1  0  cin   propagate cin
1  1  1     generate
propagate = a + b
generate = ab
ALU for MIPS
• CLA using 4-bit as an example
• two 4-bit numbers: a3a2a1a0, b3b2b1b0
• p0 = a0 + b0; g0 = a0b0 (likewise pi = ai + bi and gi = aibi for bits 1 to 3; + is OR)
c1 = g0 + p0c0
c2 = g1 + p1c1
c3 = g2 + p2c2
c4 = g3 + p3c3
• larger CLA adders can be constructed by cascading 4-bit CLA adders
• other adders: carry select adder, carry skip adder
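The lookahead idea is easier to see once the recurrences are expanded so that every carry depends only on the p, g signals and c0. The Python sketch below (the name cla4 is hypothetical) uses the slide's definitions, propagate = a OR b and generate = a AND b, and is meant as an illustration rather than a gate-accurate design.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead addition. Returns (sum, c4)."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # propagate = a + b (OR)
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # generate  = ab

    # c1 = g0 + p0c0, c2 = g1 + p1c1, ... with each c_i substituted away,
    # so no carry has to wait for the previous one.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        sum_bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1
        total |= sum_bit << i
    return total, c4
```

Because every carry is a two-level AND/OR expression, its delay does not grow with the bit position, at the cost of wider gates.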
Design Process
• Divide and Conquer
– using simple components
– glue simple components together
– work on the things you know how to do. The unknown
will become obvious as you make progress
• Successive Refinement
– multiplier design
– divider design
Multiplier
• paper and pencil method
multiplicand        0110
multiplier        x 1001
                    0110
                   0000
                  0000
                 0110
product          0110110
n bits x m bits = (m+n)-bit product
multiplier bit 0: place 0
multiplier bit 1: place a copy of the multiplicand
Multiply Hardware Version 1
32 bits x 32 bits, using a 64-bit multiplicand register, a 64-bit ALU, a 64-bit product register, and a 32-bit multiplier register
[Figure: multiplicand register (64 bits, shift left) and product register (64 bits, write) feeding the 64-bit ALU (ADD); multiplier register (shift right); control]
Control provides four control signals.
Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.
Multiply Algorithm Version 1
1. test multiplier0 (i.e., bit 0 of the multiplier)
1a. if multiplier0 = 1, add the multiplicand to the product and place the result in the product register
2. shift the multiplicand left 1 bit
3. shift the multiplier right 1 bit
4. 32nd repetition? if yes, done; if no, go to 1.
Multiply Algorithm Version 1 Example
0010 x 0101 = 0000 1010
iter.  step     multiplier  multiplicand  product
0      initial  0101        0000 0010     0000 0000
1      1.a      0101        0000 0010     0000 0010
       2        0101        0000 0100     0000 0010
       3        0010        0000 0100     0000 0010
2      2        0010        0000 1000     0000 0010
       3        0001        0000 1000     0000 0010
3      1.a      0001        0000 1000     0000 1010
       2        0001        0001 0000     0000 1010
       3        0000        0001 0000     0000 1010
4      2        0000        0010 0000     0000 1010
       3        0000        0010 0000     0000 1010
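The version 1 loop translates almost line for line into software. The Python sketch below is an illustration with a hypothetical function name; the parameter n stands for the operand width so that the 4-bit example can be reproduced, and the 2n-bit mask models the product and multiplicand registers.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Shift-add multiply, version 1 (unsigned n-bit operands).

    The multiplicand lives in a 2n-bit register that shifts left;
    the multiplier shifts right; the product register is 2n bits.
    """
    mask = (1 << (2 * n)) - 1
    product = 0
    for _ in range(n):
        if multiplier & 1:                          # step 1 / 1a
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask   # step 2
        multiplier >>= 1                            # step 3
    return product
```

Running it with n = 4 on 0010 and 0101 returns 0000 1010, the final row of the example.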
Multiplier Algorithm Version 1
• observations from version 1
• 1/2 of the bits in the multiplicand are always 0
• using a 64-bit adder is wasteful (for 32 bit x 32 bit)
• 0's are inserted into the multiplicand as it is shifted left; the least
significant bits of the product do not change once formed
• 3 steps per bit
• shift the product to the right instead of shifting the multiplicand to
the left? (by adding to the left half of the product register)
Multiply Hardware Version 2
32-bit multiplicand register, 32-bit ALU, 64-bit product register, 32-bit multiplier register
[Figure: multiplicand register (32 bits) and the left half of the product register feeding the 32-bit ALU (ADD); product register (two 32-bit halves, shift right, write); multiplier register (shift right); control]
Write into the left half of the product register.
Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.
Multiply Algorithm Version 2
1. test multiplier0 (i.e., bit 0 of the multiplier)
1a. if multiplier0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register
2. shift the product register right 1 bit
3. shift the multiplier register right 1 bit
4. 32nd repetition? if yes, done; if no, go to 1.
Multiply Algorithm Version 2 Example
iter.  step     multiplier  multiplicand  product
0      initial  0011        0010          0000 0000
1      1.a      0011        0010          0010 0000
       2        0011        0010          0001 0000
       3        0001        0010          0001 0000
2      1.a      0001        0010          0011 0000
       2        0001        0010          0001 1000
       3        0000        0010          0001 1000
3      2        0000        0010          0000 1100
       3        0000        0010          0000 1100
4      2        0000        0010          0000 0110
       3        0000        0010          0000 0110
Multiply Version 2
• Observations
– product reg. wastes space that exactly matches the size
of multiplier
– 3 steps per bit
– combine multiplier register and product register
Multiply Hardware Version 3
• 32-bit multiplicand register, 32-bit ALU, 64-bit product register; the multiplier register is part of the product register
[Figure: multiplicand register feeding the 32-bit ALU (ADD); the ALU result is written into the left half of the product (multiplier) register, which shifts right; control]
Multiply Algorithm Version 3
1. test product0 (the multiplier is in the right half of the product register)
1a. if product0 = 1, add the multiplicand to the left half of the product and place the result in the left half of the product register
2. shift the product register right 1 bit
3. 32nd repetition? if yes, done; if no, go to 1.
Multiply Algorithm Version 3 Example
1110 x 1011 = 1001 1010 (14 x 11 = 154)
iter.  step     multiplicand  product
0      initial  1110          0000 1011
1      1.a      1110          1110 1011
       2        1110          0111 0101
2      1.a      1110          1 0101 0101  (need to save the carry)
       2        1110          1010 1010
3      2        1110          0101 0101
4      1.a      1110          1 0011 0101  (need to save the carry)
       2        1110          1001 1010
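Version 3 is short enough to sketch in a few lines. In the Python illustration below (the function name is hypothetical), a single integer plays the role of the combined product/multiplier register. Python integers are unbounded, so the carry that the example says must be saved simply becomes an extra top bit until the next right shift.

```python
def multiply_v3(multiplicand, multiplier, n=32):
    """Shift-add multiply, version 3 (unsigned n-bit operands).

    The multiplier starts in the right half of the 2n-bit product
    register; additions go into the left half.
    """
    product = multiplier & ((1 << n) - 1)
    for _ in range(n):
        if product & 1:                     # step 1 / 1a: test product0
            product += multiplicand << n    # add into the left half (carry kept)
        product >>= 1                       # step 2: shift product right
    return product
```

With n = 4, multiply_v3(0b1110, 0b1011) passes through the same register values as the table, including the two 9-bit intermediate sums.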
Multiply Algorithm Version 3
• Observations
• 2 steps per bit: because the multiplier and the product share one
register, a single right shift per iteration replaces the two shifts
of version 1 and version 2
• MIPS registers Hi and Lo correspond to the left and right
halves of the product
• MIPS has the instruction multu
• How about signed numbers in multiplication ?
– method 1: keep the sign of both numbers and use the
magnitude for multiplication, after 32 repetitions, then
change the product to appropriate sign.
– method 2: Booth’s algorithm
– Booth’s algorithm is more elegant in signed number
multiplications
– Booth’s algorithm uses the same hardware as version 3
Booth’s Algorithm
• Motivation for Booth’s Algorithm is speed
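example: 2 x 6 = 0010 x 0110
normal approach: add the multiplicand once for each 1 in the multiplier 0110 (two additions)
Booth's approach: 0110 = 1000 - 0010, so subtract the multiplicand at bit 1 and add it at bit 3
Booth's approach: replace a string of 1s in the multiplier by two actions
action 1: at the beginning of a string of 1s, subtract the multiplicand
action 2: at the end of a string of 1s, add the multiplicand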
Booth’s Algorithm
[Figure: a run of 1s in the multiplier 011111111111111111110, labeled from left to right: end of run, middle of run, beginning of run]
current bit  bit to the right (previous bit)  explanation               action
1            0                                beginning of a run of 1s  subtract multiplicand from left half of product
1            1                                middle of a run of 1s     no arithmetic operation
0            1                                end of a run of 1s        add multiplicand to left half of product
0            0                                middle of a run of 0s     no arithmetic operation
Booth’s Algorithm Example
-2 x 7 = -14; in signed binary, 1110 x 0111 = 1111 0010
To begin with, we put the multiplier in the right half of the product register.
iteration  step         multiplicand  product    previous bit
0          initial      1110          0000 0111  0
1          sub.         1110          0010 0111  0
           shift right  1110          0001 0011  1
2          shift right  1110          0000 1001  1
3          shift right  1110          0000 0100  1
4          add          1110          1110 0100  1
           shift right  1110          1111 0010  0
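Booth's rule table maps directly onto a loop over (current bit, previous bit) pairs. The Python sketch below is an illustration with a hypothetical function name. It makes one simplifying assumption: the product register is a signed Python integer, so `>>` acts as an arithmetic right shift and the left half effectively has unlimited sign bits, where real hardware has a fixed-width register.

```python
def booth_multiply(multiplicand, multiplier, n=32):
    """Booth's algorithm for signed n-bit operands; returns a signed int.

    multiplicand and multiplier are ordinary Python ints in the
    range -2**(n-1) .. 2**(n-1) - 1.
    """
    product = multiplier & ((1 << n) - 1)   # multiplier bits in the right half
    previous = 0                            # the bit to the right of bit 0
    for _ in range(n):
        current = product & 1
        if current == 1 and previous == 0:      # beginning of a run of 1s
            product -= multiplicand << n        # subtract from the left half
        elif current == 0 and previous == 1:    # end of a run of 1s
            product += multiplicand << n        # add to the left half
        previous = current
        product >>= 1                           # arithmetic shift right
    return product
```

For -2 x 7 with n = 4 the register passes through 0010 0111, 0001 0011, 0000 1001, 0000 0100, 1110 0100, and ends at 1111 0010 (-14), matching the table.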
Divide Algorithm
Paper and pencil
[Figure: long division by hand, showing the divisor 1011, the bit string 1010101010, and labels for the quotient, the dividend, and the remainder (modulo)]
Divide Hardware Version 1
• 64-bit divisor register, 64-bit ALU, 32-bit quotient register, 64-bit remainder register
[Figure: divisor register (shift right) and remainder register (write) feeding the 64-bit ALU; quotient register (shift left); control]
put the dividend in the remainder register initially
Divide Algorithm Version 1
start: place the dividend in the remainder register
1. subtract the divisor from the remainder and place the result in the remainder
2. test the remainder
2a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1
2b. if remainder < 0, restore the original value by adding the divisor to the remainder and placing the sum in the remainder; shift the quotient left, setting the new least significant bit to 0
3. shift the divisor right 1 bit
4. n+1 repetitions? if yes, done; if no, go to 1.
Divide Algorithm Version 1 Example
0000 0111 / 0010 (7 / 2): quotient 0011, remainder 0001
iter.  step     quotient  divisor    remainder
0      initial  0000      0010 0000  0000 0111
1      1        0000      0010 0000  1110 0111
       2b       0000      0010 0000  0000 0111
       3        0000      0001 0000  0000 0111
2      1        0000      0001 0000  1111 0111
       2b       0000      0001 0000  0000 0111
       3        0000      0000 1000  0000 0111
3      1        0000      0000 1000  1111 1111
       2b       0000      0000 1000  0000 0111
       3        0000      0000 0100  0000 0111
4      1        0000      0000 0100  0000 0011
       2a       0001      0000 0100  0000 0011
       3        0001      0000 0010  0000 0011
5      1        0001      0000 0010  0000 0001
       2a       0011      0000 0010  0000 0001
       3        0011      0000 0001  0000 0001
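The restoring loop of version 1 can be written out directly. The Python sketch below is an illustration (the function name and the zero-divisor check are additions, not from the slides); it assumes unsigned n-bit operands and uses n + 1 iterations, as in the algorithm.

```python
def divide_v1(dividend, divisor, n=32):
    """Restoring division, version 1. Returns (quotient, remainder).

    Assumes unsigned n-bit operands and a nonzero divisor.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend          # start: dividend in the remainder register
    divisor <<= n                 # divisor starts in the left half
    quotient = 0
    for _ in range(n + 1):
        remainder -= divisor                  # step 1: subtract
        if remainder >= 0:                    # step 2a: quotient bit 1
            quotient = (quotient << 1) | 1
        else:                                 # step 2b: restore, quotient bit 0
            remainder += divisor
            quotient = quotient << 1
        divisor >>= 1                         # step 3: shift divisor right
    return quotient, remainder
```

For n = 4, divide_v1(7, 2) returns (3, 1), the last row of the example table.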
Divide Algorithm Version 1
Observations
– 1/2 bits in divisor always 0
– 1/2 of divisor is wasted
– 1/2 of 64-bit ALU is wasted
Possible improvement
– instead of shifting the divisor right, shift the remainder
left?
– the first step cannot produce a 1 in the quotient, so switch the order
to shift first and then subtract; this saves one
iteration
Divide Hardware Version 2
32-bit divisor register, 32-bit ALU, 32-bit quotient register, 64-bit remainder register
[Figure: divisor register feeding the 32-bit ALU; quotient register (shift left); remainder register (shift left); control]
Divide Algorithm Version 2
start: place the dividend in the remainder register
1. shift the remainder left 1 bit
2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder
3. test the remainder
3a. if remainder >= 0, shift the quotient left, setting the new rightmost bit to 1
3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; also shift the quotient left, setting the new least significant bit to 0
4. n repetitions? if yes, done; if no, go to 1.
Divide Algorithm Version 2 Example
iter.  step     quotient  divisor  remainder
0      initial  0000      0011     0000 1111
1      1        0000      0011     0001 1110
       2        0000      0011     1110 1110
       3b       0000      0011     0001 1110
2      1        0000      0011     0011 1100
       2        0000      0011     0000 1100
       3a       0001      0011     0000 1100
3      1        0001      0011     0001 1000
       2        0001      0011     1110 1000
       3b       0010      0011     0001 1000
4      1        0010      0011     0011 0000
       2        0010      0011     0000 0000
       3a       0101      0011     0000 0000
Divide Algorithm Version 2
• Observations
– 3 steps (shift remainder left, subtract, shift quotient left)
• Further improvement (version 3)
– eliminate the quotient register by combining it with the remainder register, which is shifted left
– the loop then contains only two steps, because shifting the remainder register shifts the remainder in the left half and the quotient in the right half at the same time
– a consequence of combining the two registers is that the remainder is shifted one time too many in the last iteration
– final correction step: shift back the remainder in the left half of the remainder register (i.e., shift only the left half right 1 bit)
Divide Hardware Version 3
32-bit divisor register, 32-bit ALU, 64-bit remainder register, 0-bit quotient register (the quotient bits shift into the remainder register as it shifts left)
[Figure: divisor register (32 bits) feeding the 32-bit ALU; remainder/quotient register (64 bits, shift left, write); control]
Divide Algorithm Version 3
start: place the dividend in the remainder register
1. shift the remainder left 1 bit
2. subtract the divisor from the left half of the remainder and place the result in the left half of the remainder
3. test the remainder
3a. if remainder >= 0, shift the remainder left, setting the new rightmost bit to 1
3b. if remainder < 0, restore the original value by adding the divisor to the left half of the remainder and placing the sum in the left half of the remainder; then shift the remainder left, setting the new least significant bit to 0
4. n repetitions? if yes, shift the left half of the remainder right 1 bit (correction step) and stop; if no, go to 2.
Divide Algorithm Version 3 Example
iter.  step     divisor  remainder
0      initial  0101     0000 1110
       1        0101     0001 1100
1      2        0101     1100 1100
       3b       0101     0011 1000
2      2        0101     1110 1000
       3b       0101     0111 0000
3      2        0101     0010 0000
       3a       0101     0100 0001
4      2        0101     1111 0001
       3b       0101     1000 0010
correction step: shift the left half of the remainder right 1 bit: 0100 0010
(left half 0100 = remainder, right half 0010 = quotient)
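The combined remainder/quotient register of version 3 can be modeled with one integer. The Python sketch below is an illustration under stated assumptions: the function name and the zero-divisor check are additions, and because Python integers are unbounded, a bit that the last left shift would push out of a real 2n-bit register is simply kept, so the correction step becomes a plain right shift.

```python
def divide_v3(dividend, divisor, n=32):
    """Restoring division, version 3. Returns (quotient, remainder).

    Assumes unsigned n-bit operands and a nonzero divisor.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    low_mask = (1 << n) - 1
    rem = dividend & low_mask        # start: dividend in the right half
    rem <<= 1                        # step 1: shift remainder left
    for _ in range(n):
        high = (rem >> n) - divisor  # step 2: subtract from the left half
        if high >= 0:                # step 3a: keep difference, shift in a 1
            rem = ((((high << n) | (rem & low_mask)) << 1) | 1)
        else:                        # step 3b: restore (keep old left half), shift in a 0
            rem <<= 1
    remainder = rem >> (n + 1)       # correction: left half shifted right 1 bit
    quotient = rem & low_mask        # quotient sits in the right half
    return quotient, remainder
```

With n = 4, divide_v3(14, 5) steps through 0001 1100, 0011 1000, 0111 0000, 0100 0001, 1000 0010 and returns (2, 4).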
Divide Algorithm Version 3
• Observations
– same hardware as multiply, need a 32-bit ALU to add and
subtract and a 64-bit register to shift left and right
– divide algorithm version 3 is called restoring division
algorithm for unsigned numbers
• Signed numbers divide
– simplest method
» remember signs of dividend and divisor, make
positive, and finally complement quotient and
remainder as necessary
» dividend and remainder must have the same sign
» quotient is negative if dividend sign and divisor sign
disagree
– SRT (named after three persons) method
» an efficient algorithm
Floating Point Numbers
• What can be represented in N bits ?
unsigned:        0 <---> 2^N - 1
2's complement:  -2^(N-1) <---> 2^(N-1) - 1
1's complement:  -2^(N-1) + 1 <---> 2^(N-1) - 1
BCD:             0 <---> 10^(N/4) - 1
How about
very small numbers, very large numbers
rationals, such as 2/3; irrationals, such as sqrt(2);
transcendentals, such as e, pi.
Floating Point Numbers
• Mantissa (aka Significand), Exponent (using radix of 10)
6.12 x 10^23
[labels: S = sign, E = exponent, M = mantissa]
IEEE standard F.P.: 1.M x 2^(E-127)
single precision: S (1 bit), E (8 bits), M (23 bits)
mantissa = sign + magnitude; magnitude is normalized with
hidden integer bit: 1.M
exponent = E - 127 (excess 127), 0 < E < 255
a FP number N = (-1)^S x 2^(E-127) x (1.M)
0 = 0 00000000 00000000000000000000000
-1.5 = 1 01111111 10000000000000000000000
Floating Point Numbers
• Single Precision FP numbers
-0.75 = __________________________________
-5.0 = ___________________________________
7 = ____________________________________
-0.75 = -0.11b = -1.1b x 2^(-1), so E = 126
-5.0 = -101.0b = -1.01b x 2^2, so E = 129
7 = 111b = 1.11b x 2^2, so E = 129
-0.75 = 1 01111110 10000.......0
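The three exercises can be checked against the machine's own single-precision encoding. The Python sketch below uses the standard struct module to pack a value as a 32-bit IEEE 754 float and print its S, E, and M fields; the function name float_fields is hypothetical.

```python
import struct


def float_fields(x):
    """Return the single-precision encoding of x as 'S EEEEEEEE MMM...M'."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent E (excess 127)
    mantissa = bits & 0x7FFFFF          # 23 fraction bits M
    return f"{sign} {exponent:08b} {mantissa:023b}"
```

For instance, float_fields(-0.75) gives 1 01111110 followed by 1 and twenty-two 0s, which agrees with the worked answer; -5.0 and 7.0 can be checked the same way.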
Floating Point Numbers
• Single precision FP number
What is the smallest normalized number in magnitude?
(1.0) x 2^(-126)
What is the largest number in magnitude?
(1.11111111111111111111111)binary x 2^127 = (2 - 2^(-23)) x 2^127
Floating Point Numbers
single precision FP numbers
Exponent  Significand  Object represented
0         0            0
0         nonzero      denormalized numbers
1 to 254  anything     floating point numbers
255       0            infinity
255       nonzero      NaN (Not A Number)
other topics in FP numbers
1. extra bits for rounding
2. guard bit, sticky bit
3. algorithms for FP numbers
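The table of special encodings amounts to a small decision procedure on the E and M fields. The Python sketch below (function names are hypothetical) classifies a 32-bit pattern and decodes normalized numbers with N = (-1)^S x 2^(E-127) x (1.M); it is an illustration and does not decode denormalized values.

```python
def classify(bits):
    """Classify a 32-bit single-precision pattern by its E and M fields."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"                  # exponent in 1..254


def decode_normalized(bits):
    """Value of a normalized pattern: (-1)^S x (1.M) x 2^(E-127)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if not 0 < exponent < 255:
        raise ValueError("not a normalized number")
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)
```

Decoding 0x00800000 and 0x7F7FFFFF gives the smallest normalized and the largest magnitudes, 2^(-126) and (2 - 2^(-23)) x 2^127.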
Floating Point Numbers
• Double precision
– 64 bits total
» 52-bit significand
» 11-bit exponent (excess 1023 bias)
– Number is: (-1)^S x (1.M) x 2^(E-1023)
Basic Addition Algorithm
• Steps for Y + X, assuming Y >= X
1. Align binary points (denormalize smaller number)
a. compute Diff = Exp(Y) - Exp(X); Exp = Exp(Y)
b. Sig(X) = Sig(X) >> Diff
2. Add the aligned components
Sig = Sig(X) + Sig(Y)
3. Normalize the sum
a. shift Sig right/left until the leading bit is 1, incrementing/decrementing Exp to match
b. check for overflow in Exp
c. round
d. repeat step 3 if the result is no longer normalized
Addition Example
• 4-bit significand
1.0110 x 2^3 + 1.1000 x 2^2
• align binary points (denormalize smaller number)
1.0110 x 2^3
0.1100 x 2^3
• Add the aligned components
10.0010 x 2^3
• Normalize the sum
1.0001 x 2^4
No overflow, no rounding
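The align, add, normalize sequence can be replayed with integer significands. The Python sketch below is a simplified illustration, not a full IEEE 754 adder: it handles two positive operands only, truncates the bits shifted out during alignment, and skips rounding and exponent-overflow checks. The name fp_add and the constant FRAC (the number of fraction bits, 4 as in the example) are assumptions.

```python
FRAC = 4  # fraction bits after the binary point, as in the example


def fp_add(sig_y, exp_y, sig_x, exp_x):
    """Add two positive values sig * 2**exp and return (sig, exp).

    Significands are integers with FRAC fraction bits, so 1.0110 is
    0b10110. The first operand must be the larger one (Y >= X).
    """
    diff = exp_y - exp_x              # step 1a: Diff = Exp(Y) - Exp(X)
    sig_x >>= diff                    # step 1b: align (shifted-out bits are dropped)
    sig, exp = sig_y + sig_x, exp_y   # step 2: add the aligned components
    while sig >= (2 << FRAC):         # step 3: sum is 10.xxxx or larger
        sig >>= 1                     #   shift right ...
        exp += 1                      #   ... and increment the exponent
    return sig, exp
```

fp_add(0b10110, 3, 0b11000, 2) returns (0b10001, 4), i.e. 1.0001 x 2^4.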
Another Addition Example
• 1.0001 x 2^3 - 1.1110 x 2^1
– 4-bit significand; extra bit needed for accuracy
1. Align binary point:
1.0001 x 2^3
- 0.01111 x 2^3
2. Subtract the aligned components
0.10011 x 2^3
3. Normalize
1.0011 x 2^2 = 4.75
Without the extra bit (aligned operand rounded to 0.1000 x 2^3), the result would be 0.1001 x 2^3 =
100.1b = 4.5, which is off by 0.25. This is too much!
Accuracy and Rounding
• Want arithmetic to be fully precise
– IEEE 754 keeps two extra digits on the right during
intermediate calculations (guard digit, round digit)
• Alignment step can cause data to be discarded (shifted
out on right)
2.56 x 10^0 + 2.34 x 10^2
  2.3400 x 10^2
+ 0.0256 x 10^2
  2.3656 x 10^2   (guard digit = 5, round digit = 6)
Answer = 2.37 x 10^2
Without using Guard and Round digits,
Answer would be 2.36 x 10^2
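The effect of the guard and round digits can be reproduced with integer arithmetic. The Python sketch below is an illustration with hypothetical names: significands are 3-digit integers (2.34 is 234), extra_digits is the number of digits kept to the right during alignment, and the final rounding is round-half-up, which is an assumption chosen for simplicity.

```python
def add_aligned(sig_big, sig_small, exp_diff, extra_digits):
    """Add two 3-digit decimal significands whose exponents differ by exp_diff.

    Returns the 3-digit significand of the sum, scaled to the larger exponent.
    """
    scale = 10 ** extra_digits
    # Align the smaller operand; digits beyond the extra ones are discarded.
    aligned = (sig_small * scale) // (10 ** exp_diff)
    total = sig_big * scale + aligned
    # Round back to 3 significant digits (round half up).
    return (total + scale // 2) // scale
```

add_aligned(234, 256, 2, extra_digits=2) returns 237 (2.37 x 10^2), while extra_digits=0 returns 236 (2.36 x 10^2).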