m - Oncourse

Download Report

Transcript m - Oncourse

Chapter 6-2
Multiplier


Multiplier
Next Lecture
 Divider
 Floating Point Numbers
Multiplication of Positive Numbers
using usual algorithm for multiplying integers



Algorithm applies to unsigned numbers and to positive numbers
Result of the product of two n-digit numbers can be
accommodated in 2n digits
Binary multiplication of positive operands can be implemented in
a purely combinational, two dimensional logic array
1101
1011
(13) Multiplicand M
(11) Multiplier Q
Partial Products
(143) Product P
2
Formal Representation
Z
=
··
X Y
M + N– 1

=
Zk 2
k=0
N – 1
i 
M – 1

= 
Xi 2   
 

 i=0
 j = 0

=

 Xi Yj 2
with
M –1
=
 Xi 2
i=0
N– 1
Y
=
 Y j2
j= 0
3



i =0 j= 0
X
j
Yj 2 
M – 1 N – 1




i
j
k
i + j



Multiplier Implementation
Multiplicand
Partial product 0
(PP0)
m3 0
m2 0
m1 0
m0
q0
0
PP1
q1
0
PP2
q2
0
PP3
q3
0
p7
p6
p5
p4
p0
p1
p2
PP4 = p7 , p6, ...p0 = Product
p3
Bit of incoming partial product PPi
mj
qi
Typical cell
Carry-out
FA
Carry-in
Bit of outgoing partial product PP(i+1)
4
Array Multiplier
m3
P7
5
m2
m1
m0
q 1 P0
m3
m2
m1
m0
HA
FA
FA
HA
m3
m2
m1
m0
FA
FA
FA
HA
m3
m2
m1
m0
FA
FA
FA
HA
P6
P5
P4
q3
P3
q2
P2
P1
q0
Ripple-Carry Array Multiplier

For the multiplication operation M  Q = P for 4-bit operands


M: m3m2m1m0
Q: q3q2q1q0

P: p7p6p5p4p3p2p1p0

0
m3q1
miqj = mi·qj
FA
m3q2
FA
m3q3
FA
p7
6
p6
p5
m2q0
m1q1
m1q0
m0q1
FA
FA
FA
m2q2
FA
m2q3
FA
m3q0
m2q1
m1q2
FA
p4
0
m0q2
FA
m1q3
m0q0
FA
0
m0q3
FA
p3
0
p2
p1
p0
The MxN Array Multiplier
Critical Path
FA
HA
FA
FA
FA
FA
HA
HA
Critical Path 1
Critical Path 2
FA
FA
FA
HA
Critical Path 1 & 2
Dmult=[(M-1)+(N-2)]Dcarry +(N-1)Dsum+1Dand
7
Multiplier Implementation







8
The main component in each cell is an adder circuitry
Each AND gate determines whether a multiplicand bit mj is
added to the incoming partial product bit, based on the value of
the multiplier bit qj
For each row i ( 0 ≤ i ≤ 3) where qi = 1, adds the multiplicand
appropriately shifted, to the incoming partial product, PPi, to
generate PPi+1
If qi = 0, PPi is passed vertically downward unchanged
PP0 is all 0s
PP4 is the desired product
The multiplicand is shifted left one position per row by the
diagonal signal path
Another Method of Multiplier Design




9
The previous algorithm may be impractical for large numbers
because it uses many gates
Multiplication can be performed using a mixture of
combinational array techniques and sequential techniques that
require less combinational logic
In early computers, because of the cost of logic gates, the
adder circuitry in the ALU was used to perform multiplication
sequentially
Called sequential circuit binary multiplier
Register A (initially 0)
Shift right
an - 1
C
a0
qn - 1
q0
Multiplier Q
Add/Noadd
control
n-bit
adder
Control
sequencer
MUX
0
0
mn - 1
m0
Multiplicand M
M
1 1 0 1
Initial configuration
0
C
0 0 0 0
A
1 0 1 1
Q
0
0
1 1 0 1
0 1 1 0
1 0 1 1
1 1 0 1
Add
Shift
First cycle
1
0
0 0 1 1
1 0 0 1
1 1 0 1
1 1 1 0
Add
Shift
Second cycle
0
0
1 0 0 1
0 1 0 0
1 1 1 0
1 1 1 1
No add
Shift
Third cycle
1
0
0 0 0 1
1 0 0 0
1 1 1 1
1 1 1 1
Add
Shift
Fourth cycle
Product
Sequential Circuit Binary Multiplier







11
This circuit performs multiplication by using a single adder n
times to implement the spatial addition performed by the n rows
of ripple carry adders
Registers A and Q combined hold PPi while multiplier bit qi
generates the signal Add/Noadd
Add/Noadd controls the addition of the multiplicand M to PPi to
generate PPi+1
The product is computed in n cycles
The partial product grows in length 1 bit per cycle from the
initial vector PP0 of n 0s in register A
The carry-out from the adder is stored in Flip-Flop C
At the start, the multiplier is loaded into register Q, the
multiplicand into register M, and C as well as A are cleared to 0
Sequential Circuit Binary Multiplier






12
At the end of each cycle, C, A and Q are shifted right by one bit
position to allow for the growth of the partial product as the
multiplier is shifted out of register Q
Because of this shifting, multiplier bit qi appears in the LSB
position of Q to generate the Add/Noadd signal at the correct
time, starting with q0 during the first cycle, q1 during the second
cycle, etc...
If the adder has a delay of 10 ns
The control setting and the shift operations take another 10ns
each
A hardwired multiply in a 32-bit word-length computer would
take about 640ns
Multiply instructions took much longer to execute than Add
instructions in early computers
Signed Operand Multiplication




13
Multiplication of signed operands generates a double length
product in the 2's complement number system
Consider the case of a positive multiplier and a negative
multiplicand
When we add a negative multiplicand to a partial product, we
must extend the sign bit value of the multiplicand to the left as
far as the product will extend
The previous hardware can be used for negative multiplicands
if it provides for sign extension of the partial products
Sign Extension of Negative Multiplicand

Sign extension is
shown in blue

14
1
0
0
1
1
0
1
0
1
1
1
1
1
1
1
1
1
0
0
1
1
1
1
1
1
0
0
1
1
0
0
0
0
0
0
0
0
1
1
1
0
0
1
1
0
0
0
0
0
0
1
1
0
1
1
1
0
0
0
1
( - 13)
( + 11)
( - 143)
Negative number must be the multiplicand and the positive
number is the multiplier
Booth Algorithm







15
A powerful algorithm for signed-number multiplication
 treats positive and negative numbers uniformly
So far, the number of additions equals the number of 1s in the
multiplier
Consider a multiplication in which the multiplier is positive and
has a single block of 1s (e.g., 00111102 = 3010)
To derive the product, we could add four appropriately shifted
versions of the multiplicand (i.e., for four 1s)
We can reduce the number of operations by regarding the
multiplier as the difference between two numbers, i.e., 3210-210
or 01000002-00000102
This suggests that the product can be generated by adding 25
times the multiplicand to the 2's complement of 21 times the
multiplicand
The sequence of required operations can be recoded as
0+1000-10
Booth Algorithm



16
-1 times the shifted multiplicand is selected when changing
multiplier from 0 to 1
+1 times the shifted multiplicand is selected when changing
multiplier from 1 to 0
The multiplier is scanned form right to left
Normal and Booth Multiplication Schemes
Normal
0
Booth
17
0
0
1
0
1
0
0
0
1 0 1 1 0
0 +1 +1 + 1+1
1
0
0
0
1
1
0
0
1
0
1
0
1
0
1
1
0
1
1
0
1
0
1
0
1
0
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
1
0
1
0
0
0
1
1
0
0 1
0 +1
0
0
1
0
1 0
0 -1
1
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
1
0
0
1
0
0
0
0
0
0
1
0
0
0
1
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
0
1
0
1
0
1
0
0
0
1
1
0
2's complement of
the multiplicand
Booth Recoding of a Multiplier
0
0
1
0
0 +1 -1 +1

18
1
1
0 - 1
0
0
1
1
1
0
1
0
0 +1
0
0 - 1 +1 - 1 + 1
1
1
0
0
0 - 1
0
0
When the least significant bit is 1 , assume an implied 0
lies to its right
Booth Multiplication with a Negative
Multiplier
0 1 1 0 1
 1 1 0 1 0
( + 13)
(- 6)
0 1 1 0 1
0 - 1 +1 - 1 0
0
1
0
1
0
0
1
0
1
0
0
1
0
1
0
0
1
0
0
0
0
1
1
0
0
0
0
1
1
0
0 0 0 0
0 1 1
0 1
1
1 1 1 0 1 1 0 0 1 0

19
( - 78)
Handles both positive and negative multipliers uniformly
Correctness of Booth Technique for
Negative Multipliers



Let the leftmost zero of a negative number, X, be at bit position k
X = 11…10xk-1….x0
The value of X is given by
V(X)= -2k+1 + xk-12k-1 +….+x020
-2k+1 =
X=


20
Example V(X)
11000 (-8)
11001 (-7)
= -23
= -23 + 1
For example, 1101102(-1010) is recoded as 0-1+10-10
-24+23-2 = -1010
Booth Multiplier Recoding Scheme
Multiplier
21
Version of multiplicand
selected by bit i
Bit i
Bit i -1
0
0
0M
0
1
+1M
1
0
1M
1
1
0M
Booth Recoded Multipliers
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
Worst-case
multiplier
+1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1 +1 - 1
1
1
0
0
0
1
0
1
1
0
1
1
1
1
0
0
Ordinary
multiplier
0 -1 0
0 +1 - 1 +1
0 - 1 +1
0
0
0 -1
0
0
0
0
0
0
1
1
1
1
1
0
0
0
0
1
1
1
0
0
0 +1
0
0
0
0 -1
0
0
0 +1
0
0 -1
Good
multiplier

22
Achieves some efficiency in the number of additions required
when the multiplier has a few large blocks of 1s
Fast Multiplication
Bit-pair recoding of multipliers
 Halves the maximum number of summands
 Derived from the booth algorithm
 (+1 -1) is equivalent to (0 +1)
 Because (+1 -1) is (+102 + -12) = +2M + -M = +M = (0 +1)
 Instead of adding +1×M at position i+1 to -1 times the
multiplicand M at a shift position i
 The same result can be obtained by adding +1×M at position i
 (+1 0) is equivalent to (0 +2)
 (-1 +1) is equivalent to (0 -1)
 The booth-recoded multiplier is examined two bits at a time,
starting from the right
23
Multiplier Bit-Pair Recoding
Sign extension
Implied 0 to right of LSB
1
1
0
Multiplier bit-pair
1
0
1
0
0
1 +1 1
0
0
1
2
0
Multiplier bit on the right
(a) Example of bit-pair recoding derived from Booth recoding
Multiplicand
selected at position
i
i +1
i
i 1
0
0
0
0 M
0
0
1
+ 1 M
0
1
0
+ 1 M
0
1
1
+ 2 M
1
0
0
 2 M
1
0
1
 1 M
1
1
0
 1 M
1
1
1
0 M
(b) Table of multiplicand selection decisions
24
Example
0 1 1 0 1 ( + 13)
 1 1 0 1 0 (- 6 )
0 1 1 0 1
0 - 1 +1 - 1 0
0
1
0
1
0
0
1
0
1
0
0
1
0
1
0
0
1
0
0
0
0
1
1
0
0
0
0
1
1
0
0 0 0 0
0 1 1
0 1
1
1 1 1 0 1 1 0 0 1 0 ( - 78)
Multiplication
Requiring only
n/2 Summands
0 1 1 0 1
0
-1
-2
1 1 1 1 1 0 0 1 1 0
1 1 1 1 0 0 1 1
0 0 0 0 0 0
1 1 1 0 1 1 0 0 1 0
Ripple-Carry Array Disadvantage






26
Multiplication requires many additions
Using Ripple-Carry Array is slow
Consider the addition of three n-bit numbers W, X, Y to produce
the sum Z
We can first add W to X to generate a number A
Then we can add A to Y to produce Z
This can be done by using two ripple carry adders
A Different Approach




27
Instead of adding W to X to produce A in the upper ripple carry
adder, let’s introduce the bits of Y into the inputs
This generates the vectors S and the saved carries C as the
outputs
In the second row, S and C are added in a a ripple carry adder to
produce Z
Carry save addition can speedup this process
Carry Save Array

For the multiplication operation M  Q = P for 4-bit operands

M: m3m2m1m0
Q: q3q2q1q0
P: p7p6p5p4p3p2p1p0


0
m3q0
m2 q1
m3q1
m3 q2
m2 q3
m3q3
FA
p7
28
p6
FA
m2 q0
m2 q2
FA
m1q3
FA
FA
FA
FA
FA
p5
p4
m1 q1
m1 q2
p3
m0 q2
FA
m0 q3
FA
m1 q0
m0 q1
FA
m0 q0
0
0
FA
0
p2
Q: Do you see any saving here?
p1
p0
Example Ripple-Carry vs. Carry-Save
29
Carry-Save Addition Approach
1
0
1
1
0
1
(45)
M
1
1
1
1
1
1
(63)
Q
1
0
1
1
0
1
A
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
0
1
1
0
0
0
X
1
30
B
C
D
E
F
1
0
0
1
1
(2,835)
Product
Complete
Example
0
1
0
1
1
0
1
M
x 1
1
1
1
1
1
Q
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
1
0
0
0
0
0
1
0
0
1
1
1
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
0
1
1
1
0
0
0
0
1
0
1
1
1
1
0
0
1
1
0
0
0
0
1
0
0
1
1
1
1
0
0
F
S
1
2
C
2
0
1
1
1
1
0
1
0
1
0
0
0
1
0
0
0
0
1
0
1
1
0
0
0
0
0
1
1
1
1
0
0
0
1
0
1
1
1
0
1
0
0
1
+ 0
1
0
1
0
1
0
0
0
0
0
0
0
1
S1
1
C
1
S2
0
0
1
E
0
1
1
C
D
0
1
S
1
0
1
0
B
C
1
1
A
S
1
3
C3
C2
0
0
1
S4
1
C
4
1
Product
Schematic Representation of C.S.A.
F
E
D
C
B
A
Level 1 CSA
C2
S2
C1
S1
Level 2 CSA
C2
C3
S3
Level 3 CSA
C4
S4
Final addition
+
Product
1.7log2k – 1.7 steps, where k is the number of summands
32
Example Ripple-Carry vs. Carry-Save










33
Carry-save addition transforms W, X and Y into S and C
Advantages: all bits of S and C are produced in a short fixed amount
of time after W, X, and Y are applied
Each row approximately takes one full-adder delay
Carry propagation takes place only in the last row
Carry lookahead adder could be used effectively to add the S and C
vectors because all bits of S and C are available in parallel
Consider the addition of many summands
We can group the summands in threes and perform the carry save
addition on each of these groups in parallel to generate S and C
Next, group all the S and C vectors into threes and perform carry
save addition on them
Continue this process until there are only two vectors remaining
These remaining vectors can be added in a ripple carry or a carry
lookahead adder to produce the sum