Slides - GMU Computer Science
Download
Report
Transcript Slides - GMU Computer Science
Advanced ALU
Lecture 5
CS365
RoadMap
Implementation of MIPS ALU
Signed
and unsigned numbers (Section 3.2)
Addition and subtraction (Section 3.3)
Constructing an arithmetic logic unit (Appendix
B.5, B.6)
Multiplication (Section 3.4)
This lecture
Division (Section 3.5)
Floating point (Section 3.6)
Advanced ALU
CS465 F08
2
D. Barbara
Unsigned Multiply
Paper and pencil example (unsigned):
Multiplicand
Multiplier
Product
1000
1001
1000
0000
0000
1000
01001000
m bits x n bits = m+n bit product (at most)
Binary makes it easy:
0 => place 0
1 => place a copy
( 0 x multiplicand)
( 1 x multiplicand)
4 versions of multiply hardware & algorithm:
Successive refinement
Advanced ALU
CS465 F08
3
D. Barbara
Unsigned Combinational Multiplier
0
A3
0
A2
0
A1
0
A0
B0
A3
A3
A2
A2
A1
A1
A0
B1
A0
B2
A3
P7
P6
A2
A1
P5
A0
P4
B3
P3
P2
P1
P0
Stage i accumulates A * 2i if Bi == 1
Advanced ALU
CS465 F08
4
D. Barbara
How Does It Work?
0
0
0
0
A3
A3
A3
A3
P7
P6
A2
P5
A2
A1
P4
A2
A1
0
A2
A1
0
A1
0
A0
A0
B1
A0
B2
A0
P3
B0
B3
P2
P1
P0
At each stage shift A left ( x 2)
Use next bit of B to determine whether to add in
shifted multiplicand
Accumulate 2n bit partial product at each stage
Advanced ALU
CS465 F08
5
D. Barbara
Unsigned Multiplier (Version 1)
Shift-add multiplier
64-bit
Multiplicand reg, 64-bit ALU, 64-bit
Product reg, 32-bit Multiplier reg
Shift Left
Multiplicand
64 bits
Multiplier
64-bit ALU
Product
Shift Right
32 bits
Write
Control
64 bits
Figure 3.5
Advanced ALU
CS465 F08
6
D. Barbara
Multiply Algorithm Version 1
Start
Figure 3.6
Multiplier0 = 1
1. Test
Multiplier0
Multiplier0 = 0
1a. Add multiplicand to product &
place the result in Product register
0010 x 0011
Product
0000 0000
0000 0010
0000 0110
0000 0110
0000 0110
Multiplier Multiplicand
0011
0000 0010
0001
0000 0100
0000
0000 1000
0000
0001 0000
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd
repetition?
Yes: 32 repetitions
Done
Advanced ALU
No: < 32 repetitions
CS465 F08
7
D. Barbara
Observations on Multiply Version 1
One clock cycle per step => ~100 clocks per
multiply
Half bits in multiplicand always 0 => 64-bit adder
is wasted
0 is inserted when shift the multiplicand left
=> least significant bits of product never changed
once formed
Instead of shifting multiplicand to left, shift
product to right?
Advanced ALU
CS465 F08
8
D. Barbara
How Does It Work?
0
A3
0
A2
0
A1
0
A0
B0
A3
A2
A1
A0
B1
A3
A2
A1
A0
B2
A3
P7
A2
A1
P6
A0
P5
B3
P4
P3
P2
P1
P0
Multiplicand stays still and product moves right
Advanced ALU
CS465 F08
9
D. Barbara
Unsigned Multiplier Version 2
Refined from the first version: 32-bit
Multiplicand reg, 32 -bit ALU, 64-bit
Product reg, 32-bit Multiplier reg
Multiplican
d
32 bits
Multiplier
32-bit ALU
Shift Right
32 bits
Shift Right
Product
64 bits
Advanced ALU
Control
Write
CS465 F08
10
D. Barbara
Multiply Algorithm Version 2
Start
Multiplier0 = 1
1:
2:
3:
1:
2:
3:
1:
2:
3:
1:
2:
3:
1. Test
Multiplier0
0010 x 0011
1a. Add multiplicand to the left half of product &
place the result in the left half of Product register
Product
Multiplier
0000 0000 0011
0010 0000 0011
0001 0000 0011
0001 0000 0001
0011 0000 0001
0001 1000 0001
0001 1000 0000
0001 1000 0000
0000 1100 0000
0000 1100 0000
0000 1100 0000
0000 0110 0000
0000 0110 0000
Multiplicand
0010
0010
0010
0010
0010
0010
0010
0010
0010
0010
0010
0010
0010
Advanced ALU
CS465 F08
Multiplier0 = 0
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd
repetition?
No: < 32 repetitions
Yes: 32 repetitions
Done
11
D. Barbara
Observations on Multiply Version 2
Useful portion of multiplier shrinks while
valid portion of product grows in the
process of multiplication
Product register wastes space that exactly
matches size of multiplier
=>Combine Multiplier register and Product
register?
Advanced ALU
CS465 F08
12
D. Barbara
MULTIPLY Hardware Version 3
32-bit Multiplicand reg, 32 -bit ALU, 64-bit
Product reg, (0-bit Multiplier reg)
Multiplican
d
32 bits
32-bit ALU
Shift Right
Product (Multiplier)
Write
64 bits
Advanced ALU
Control
CS465 F08
13
D. Barbara
Multiply Algorithm Version 3
Start
Product0 = 1
1. Test
Product0
Product0 = 0
0010 x 0011
1a. Add multiplicand to the left half of product &
place the result in the left half of Product register
Multiplicand
0010
0010
0010
0010
0010
Advanced ALU
Product
0000 0011
0010 0011
0001 0001
0011 0001
0001 1000
0000 1100
0000 0110
2. Shift the Product register right 1 bit.
32nd
repetition?
CS465 F08
No: < 32 repetitions
Yes: 32 repetitions
14
Done
D. Barbara
Multiply Version 3
Two steps per bit because Multiplier &
Product combined
MIPS multiplication (unsigned)
multu $t1, $t2
# t1 * t2
No destination register: product could be ~2 64;
need two special registers to hold it
Registers Hi and Lo are left and right half of
Product
To use the result:
mfhi $t3
mflo $t4
Advanced ALU
CS465 F08
15
D. Barbara
Signed Multiplication
What about signed multiplication?
Easiest
solution is to make both positive &
remember whether to complement product
when done
Apply definition of 2’s complement
Need to sign-extend partial products
Booth’s
algorithm is an elegant way to multiply
signed and unsigned numbers using the same
hardware and save cycles
Can handle multiple bits at a time
Advanced ALU
CS465 F08
16
D. Barbara
Motivation for Booth’s Algorithm
Example 2 x 6 = 0010 x 0110:
Can get the same
result in more than
one way:
6= – 2 + 8
0110 = – 0010 + 1000
0010
x
0110
+
0000
+
0010
+ 0010
+ 0000
00001100
shift (0 in multiplier)
add (1 in multiplier)
add (1 in multiplier)
shift (0 in multiplier)
Basic idea: replace a string of 1s with an initial subtract
on seeing a one and add after last one
0010 x (-2)
0010 x 8
Advanced ALU
0010
x
0110
0000
–
0010
0000
+ 0010
00001100
shift (0 in multiplier)
sub (first 1 in multiplier)
shift (mid string of 1s)
add (prior step had last 1)
CS465 F08
18
D. Barbara
Booth’s Algorithm
end of run
Current
bit
1
1
0
0
Bit to
right
0
1
1
0
middle of run
beginning of run
0 1 1 1 1 0
Explanation
Begins run of 1s
Middle of run of 1s
End of run of 1s
Middle of run of 0s
Example
Op
0001111000
0001111000
0001111000
0001111000
sub
none
add
none
Originally for speed when shift was faster than
add
Big advantage: it handles signed number
–1
Why it works?
+ 10000
01111
Advanced ALU
CS465 F08
19
D. Barbara
Booth’s Algorithm
1. Depending on the current and previous
bits, do one of the following:
00:Middle of a string of 0s, no arithmetic op
01:End of a string of 1s, so add multiplicand to
the left half of the product
10:Beginning of a string of 1s, so subtract
multiplicand from the left half of the product
11: Middle of a string of 1s, so no arithmetic op
2. As in the previous algorithm, shift the
Product register right (arithmetically) 1 bit
Advanced ALU
CS465 F08
20
D. Barbara
Booths Example (2 x 7)
step
Multiplicand Product
0. initial value 0010
1. P = P - m
2.
3.
4.
1110
0010
0010
0010
0010
0010
Advanced ALU
0000 0111 0
+1110
1110 0111 0
1111 0011 1
1111 1001 1
1111 1100 1
+0010
0001 1100 1
0000 1110 0
CS465 F08
operation
10 -> sub
shift P(sign ext)
11 -> nop, shift
11 -> nop, shift
01 -> add
shift
done
21
D. Barbara
Booths Example (2 x -3)
step
Multiplicand Product
0. initial value 0010
1. P = P - m
2.
1110
0010
0010
3.
0010
1110
0010
0010
4.
0010
Advanced ALU
0000 1101 0
+1110
1110 1101 0
1111 0110 1
+ 0010
0001 0110 1
0000 1011 0
+1110
1110 1011 0
1111 0101 1
1111 0101 1
1111 1010 1
CS465 F08
operation
10 -> sub
shift P(sign ext)
01 -> add
shift P
10 -> sub
shift
11 -> nop
shift
done
22
D. Barbara
Outline
Implementation of MIPS ALU
Signed
and unsigned numbers (Section 3.2)
Addition and subtraction (Section 3.3)
Constructing an arithmetic logic unit (Appendix
B.5, B.6)
Multiplication (Section 3.4)
Booth’s algorithm: CD: 3.23 In More Depth
Division
(Section 3.5)
Floating point (Section 3.6)
Advanced ALU
CS465 F08
23
D. Barbara
DIVIDE: Paper & Pencil
1001
Divisor 1000 1001010
–1000
10
101
1010
–1000
10
Remainder (or Modulo result)
See how big a number can be subtracted,
creating quotient bit on each step
Quotient
Dividend
Binary => 1 * divisor or 0 * divisor
Dividend = Quotient x Divisor + Remainder
3 versions of divide, successive refinement
Advanced ALU
CS465 F08
24
D. Barbara
DIVIDE Hardware Version 1
64-bit Divisor reg, 64-bit ALU, 64-bit
Remainder reg, 32-bit Quotient reg
Shift Right
Divisor
64 bits
Quotient
64-bit ALU
Remainder
Shift Left
32 bits
Write
Control
64 bits
Advanced ALU
CS465 F08
25
D. Barbara
Divide Algorithm Version 1
Start: Place Dividend in Remainder
Takes n+1 iterations for nbit Quotient & Remainder
Quot. Divisor
Rem.
0000 00100000 00000111
11100111
00000111
0000 00010000 00000111
11110111
00000111
0000 00001000 00000111
11111111
00000111
0000 00000100 00000111
00000011
0001
00000011
0001 00000010 00000011
00000001
0011
00000001
0011 00000001 00000001
Advanced ALU
1. Subtract the Divisor register from the
Remainder register, and place the result
in the Remainder register.
Remainder 0
2a. Shift the
Quotient register
to the left, setting
the new rightmost
bit to 1.
Test
Remainder
Remainder < 0
2b. Restore the original value by adding the
Divisor register to the Remainder register, &
place the sum in the Remainder register. Also
shift the Quotient register to the left, setting
the new least significant bit to 0.
3. Shift the Divisor register right 1 bit.
n+1
repetition?
No: < n+1 repetitions
Yes: n+1 repetitions (n = 4 here)
Done
CS465 F08
26
D. Barbara
Observations on Divide Version 1
1st step cannot produce a 1 in quotient bit
(otherwise quotient is too big for the register)
=> switch order to shift first and then subtract
=> save 1 iteration
1/2 bits in divisor always 0
=> 1/2 of 64-bit adder is wasted
=> 1/2 of divisor is wasted
Instead of shifting divisor to right, shift remainder
to left?
Advanced ALU
CS465 F08
27
D. Barbara
DIVIDE Hardware Version 2
32-bit Divisor reg, 32-bit ALU, 64-bit
Remainder reg, 32-bit Quotient reg
Divisor
32 bits
Quotient
32-bit ALU
Shift Left
32 bits
Shift Left
Remainder
64 bits
Advanced ALU
Control
Write
CS465 F08
28
D. Barbara
Divide Algorithm Version 2
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the
left half of the Remainder register, & place the
result in the left half of the Remainder register.
Remainder 0
3a. Shift the
Quotient register
to the left setting
the new rightmost
bit to 1.
Test
Remainder
Remainder < 0
3b. Restore the original value by adding the Divisor
register to the left half of the Remainder register,
&place the sum in the left half of the Remainder
register. Also shift the Quotient register to the left,
setting the new least significant bit to 0.
nth
repetition?
No: < n repetitions
Yes: n repetitions
Done
Advanced ALU
CS465 F08
29
D. Barbara
Observations on Divide Version 2
Remainder shrinks while Quotient grows
Eliminate Quotient register by combining
with Remainder register
Start
by shifting the Remainder left as before
Only shifting once for both Remainder(left half)
and Quotient(right half): 2 steps each iteration
Another consequence is that the remainder
will be shifted left one more time
Thus the final correction step must shift back only
the remainder in the left half of the register
Advanced ALU
CS465 F08
30
D. Barbara
DIVIDE HARDWARE Version 3
32-bit Divisor reg, 32 -bit ALU, 64-bit
Remainder reg, (0-bit Quotient reg)
Divisor
32 bits
32-bit ALU
“HI”
“LO”
Shift Left
Remainder (Quotient)
Write
64 bits
Advanced ALU
Control
CS465 F08
31
D. Barbara
Divide Algorithm Version 3
Step
0
1.1
1.2
1.3b
2.2
2.3b
3.2
3.3a
Remainder Div.
0000 0111 0010
0000 1110
1110 1110
0001 1100
1111 1100
0011 1000
0001 1000
0011 0001
3a. Shift the
Remainder register
to the left setting
the new rightmost
bit to 1.
4.2 0001 0001
4.3a 0010 0011
0001 0011
Advanced ALU
Start: Place Dividend in Remainder
1. Shift the Remainder register left 1 bit.
2. Subtract the Divisor register from the
left half of the Remainder register, & place the
result in the left half of the Remainder register.
Remainder ≥ 0
Test
Remainder
Remainder < 0
3b. Restore the original value by adding the Divisor
register to the left half of the Remainder register,
&place the sum in the left half of the Remainder
register. Also shift the Remainder register to the
left, setting the new least significant bit to 0.
nth
No: < n repetitions
repetition?
Yes: n repetitions (n = 4 here)
Done. Shift left half of Remainder right 1 bit
CS465 F08
32
D. Barbara
More Division Issues
Signed divides: simplest is to remember signs,
make positive, and complement quotient and
remainder if necessary
Satisfy Dividend =Quotient x Divisor + Remainder
Note: Dividend and Remainder must have same sign
Note: Quotient negated if Divisor sign & Dividend sign
disagree, e.g., –7 ÷ 2 = –3, remainder = –1
Quotient = 4 and remainder =1 is not correct
Possible for quotient to be too large
If divide 64-bit integer by 1, quotient is 64 bits (“called
saturation”)
MIPS ignores overflow, SW detection and processing
Advanced ALU
CS465 F08
33
D. Barbara
Recap
Same hardware for Divide as Multiply: just need
ALU to add or subtract, and 64-bit register to shift
left or shift right
Hi and Lo registers in MIPS combine to act as 64-bit
register for multiply and divide
div $s2, $s3
#Lo=$S2/$s3, Hi=$s2 mod $s3
Successive refinement to see final design
Booth’s algorithm to handle signed multiplies
Overflow
Both MIPS multiply and divide instructions ignore
overflow
Software must check for overflow scenarios
Advanced ALU
CS465 F08
34
D. Barbara
Outline
Implementation of MIPS ALU
Signed
and unsigned numbers (Section 3.2)
Addition and subtraction (Section 3.3)
Constructing an arithmetic logic unit (Appendix
B.5, B.6)
Multiplication (Section 3.4, CD: 3.23 In More
Depth)
Division (Section 3.5)
Floating point (Section 3.6)
Advanced ALU
CS465 F08
36
D. Barbara
Floating-Point: Motivation
What can be represented in N bits?
Unsigned
2’s Complement
1’s Complement
Binary Coded Decimal
0
to
- 2N-1 to
-2N-1+1to
0
to
2N-1
2N-1 - 1
2N-1-1
10N/4 –1
But, what about?
Very large numbers?
9,349,398,989,787,762,244,859,087,678
Very small number?
0.0000000000000000000000045691
Rational numbers
2/3
Irrational numbers
pi
Advanced ALU
CS465 F08
37
D. Barbara
Recall Scientific Notation: Binary
binary point
exponent
1.01 x 223
Significand(Mantissa)
Normalized form: no leading 0s, exactly one digit to left of
decimal/binary point
Alternatives to represent 1/1,000,000,000
radix (base)
Normalized:
1.0 x 10-9
Not normalized: 0.1 x 10-8, 10.0 x 10-10
Computer arithmetic that supports scientific notation
numbers is called floating point, because the binary point
is not fixed, as it is for integers
Advanced ALU
CS465 F08
38
D. Barbara
FP Representation
Normalized format: 1.xxxxxxxxxxtwo 2yyyytwo
Want to put it into multiple words: 32 bits for
single-precision and 64 bits for double-precision
A simple single-precision representation:
31 30
23 22
0
S Exponent
Fraction
1 bit 8 bits
23 bits
S represents sign
Exponent represents y’s
Fraction represents x’s
Represent numbers as small as 2.0 x 10-38 to
as large as 2.0 x 1038
Advanced ALU
CS465 F08
39
D. Barbara
Double Precision Representation
Next multiple of word size (64 bits)
31 30
20 19
S
Exponent
1 bit
Fraction
11 bits
0
20 bits
Fraction (cont’d)
32 bits
Double precision (vs. single precision)
Represent numbers almost as small as
2.0 x 10-308 to almost as large as 2.0 x 10308
But primary advantage is greater accuracy
due to larger fraction
Advanced ALU
CS465 F08
40
D. Barbara
IEEE 754 Standard (1/4)
Regarding single precision, DP similar
Sign bit:
1 means negative
0 means positive
Significand:
To pack more bits, leading 1 implicit for normalized
numbers
Significand = 1 + fraction: 1 + 23 bits single, 1 + 52
bits double
Always true: 0 < fraction < 1
(for normalized numbers)
Note: 0 has no leading 1, so reserve exponent
value 0 just for number 0
Advanced ALU
CS465 F08
41
D. Barbara
IEEE 754 Standard (2/4)
Exponent:
Need to represent positive and negative exponents
Also want to compare FP numbers as if they were
integers, to help in value comparisons
If use 2’s complement to represent?
e.g., 1.0 x 2-1 versus 1.0 x2+1 (1/2 versus 2)
1/2 0 1111 1111 000 0000 0000 0000 0000 0000
2 0 0000 0001 000 0000 0000 0000 0000 0000
If we use integer comparison for these two
words, we will conclude that 1/2 > 2!!!
Advanced ALU
CS465 F08
42
D. Barbara
Biased (Excess) Notation
Remapping: shift all values by subtracting a bias
from the unsigned representation
Biased 7
Bit-pattern biased-7
value
0000
0001
0010
0011
0100
0101
0110
0111
Advanced ALU
-7
-6
-5
-4
-3
-2
-1
0
Bit-pattern biased-7
value
1000
1
•All-zero
pattern is the
1001
2
smallest; all-one
1010
3
pattern is the
1011
4
largest
1100
5
•Now integer
1101
6
comparison can
be used directly
1110
7
to compare
1111
8
exponents
CS465 F08
43
D. Barbara
IEEE 754 Standard (3/4)
Use biased notation for exponents, where bias is
the number subtracted to get the real number
IEEE 754 uses bias of 127 for single precision:
subtract 127 from Exponent field to get actual value
for exponent
1023 is the bias for double precision
1/2 0 0111 1110 000 0000 0000 0000 0000 0000
2 0 1000 0000 000 0000 0000 0000 0000 0000
All-zero and all-one exponent patterns are
reserved for special numbers (Fig 3.15)
Advanced ALU
CS465 F08
44
D. Barbara
IEEE 754 Standard (4/4)
Summary (single precision):
31 30
23 22
S Exponent
1 bit
Fraction
8 bits
0
23 bits
(-1)S x (1.Fraction) x 2(Exponent-127)
Double precision identical, except longer
fraction and exponent segments with
exponent bias of 1023
Advanced ALU
CS465 F08
45
D. Barbara
Example: FP to Decimal
0 0110 1000 101 0101 0100 0011 0100 0010
Sign: 0 => positive
Exponent:
0110 1000two = 104ten
Bias adjustment: 104 - 127 = -23
Fraction:
1+2-1+2-3 +2-5 +2-7 +2-9 +2-14 +2-15 +2-17 +2-22
= 1.0 + 0.666115
Represents: 1.666115ten2-23 1.986 10-7
Advanced ALU
CS465 F08
46
D. Barbara
Example 1: Decimal to FP
Number = - 0.75
= - 0.11two 20 (scientific notation)
= - 1.1two 2-1 (normalized scientific
notation)
Sign: negative => 1
Exponent:
Bias adjustment: -1 +127 = 126
126ten = 0111 1110two
1 0111 1110 100 0000 0000 0000 0000 0000
Advanced ALU
CS465 F08
47
D. Barbara
Example 2: Decimal to FP
A more difficult case: representing 1/3?
= 0.33333…10 = 0.0101010101… 2 20
= 1.0101010101… 2 2-2
Sign: 0
Exponent = -2 + 127 = 12510=011111012
Fraction = 0101010101…
0 0111 1101 0101 0101 0101 0101 0101 010
Advanced ALU
CS465 F08
48
D. Barbara
Special Numbers
What have we defined so far? (single precision)
Exponent
0
0
1-254
255
255
Fraction
0
nonzero
anything
0
nonzero
Object
0
???
+/- floating-point
???
???
Range:
1.0 2-126 1.8 10-38
What if result too small? (>0, < 1.8x10-38 =>
Underflow!)
(2 – 2-23) 2127 3.4 1038
What if result too large? (> 3.4x1038 => Overflow!)
Advanced ALU
CS465 F08
49
D. Barbara
Denormalized Numbers
For single-precision f. p. (page 217)
0 is 00000000000000000000000000000000
Smallest positive normalized number is
1.0000 0000 0000 0000 0000 000 x 2-126
This is actually a pretty big gap...
Gradual underflow: allow a number to degrade in
significance until it becomes 0
By getting rid of implicit leading 1 in front of the
fraction... we can represent a smaller number
We denote a denormalized number by 0 exponent and
a non-zero fraction
The smallest denormalized number is
0.0000 0000 0000 0000 0000 001 x 2-126 or 1.0 x 2-149
For double-precision f. p.
Smallest denormalized number: 1.0 x 2-1074
Advanced ALU
CS465 F08
50
D. Barbara
Special Numbers
What have we defined so far? (single
precision)
Exponent
0
0
1-254
255
255
Advanced ALU
Fraction
0
nonzero
anything
0
nonzero
CS465 F08
Object
0
denorm
+/- floating-point
???
???
51
D. Barbara
Representation for +/- Infinity
In FP, divide by zero should produce +/infinity, not overflow
Why?
OK
to do further computations with infinity, e.g.,
X/0 > Y may be a valid comparison
IEEE 754 represents +/- infinity by all-one
exponent and all-zero fraction
S 1111 1111 0000 0000 0000 0000 0000 000
Advanced ALU
CS465 F08
52
D. Barbara
Representation for Not a Number
What do I get if I calculate sqrt(-4.0) or 0/0?
If infinity is not an error, these should not be either
They are called Not a Number (NaN)
IEEE 754: all-one Exponent (255), nonzero Fraction
Why is this useful?
A symbol for invalid operations: allow programmers to
postpone some tests and decisions to later time
They contaminate: op(NaN,X) = NaN
Advanced ALU
CS465 F08
53
D. Barbara
Special Numbers
What have we defined so far? (single-precision)
Exponent
0
0
1-254
255
255
Fraction
0
nonzero
anything
0
nonzero
Object
0
denom
+/- fl. pt. #
+/- infinity
NaN
Fig. 3.15
Advanced ALU
CS465 F08
54
D. Barbara
FP Operations Complexities
Operations are somewhat more complicated
In addition to overflow we can have “underflow”
Absolute value too small
Accuracy can be a big problem
IEEE 754 keeps two extra bits, guard and round
Four rounding modes
Special cases:
Positive divided by zero yields “infinity”
Zero divide by zero yields “not a number”
Implementing the standard can be tricky
Not using the standard can be even worse
See text for description of 80x86 and Pentium bug!
Advanced ALU
CS465 F08
55
D. Barbara
Basic Addition Algorithm
(1) We'll need to convert from fraction to significand
(2) We'll need to shift the smaller number so that we have
the same exponent:
num of shifts= Larger Number's exponent - Smaller Number's exponent
-- right shift smaller number that many positions
(3) Add or subtract the two numbers
(4) Normalize:
a. left shift result, decrement result exponent (e.g. 0.0001)
b. right shift result, increment result exponent (e.g. 101.1)
continue until MSB of data is 1 (will not be stored)
(5) Check for overflow or underflow
(6) If the result is a 0 significand, set the exponent to zero
by special step
Advanced ALU
CS465 F08
56
D. Barbara
Floating Point Addition Example
Example: Add 9.999 101 and 1.610 10-1 assuming 4 decimal digits
Align decimal point of number with smaller exponent
1.610 10-1 = 0.161 100 = 0.0161 101
Shift smaller number to right
Add significands
9.999
+ 0.016
10.015
SUM = 10.015 101
NOTE: One digit of precision lost during shifting
Shift sum to put it in normalized form 1.0015 102
Since significand only has 4 digits, we need to round the
sum
SUM = 1.002 102
NOTE: normalization maybe needed again after
rounding, e.g, rounding 9.9999 you get 10.000
Advanced ALU
CS465 F08
57
D. Barbara
Start
FP Addition
Algorithm: Fig 3.16
1. Compare the exponents of the two numbers.
Shift the smaller number to the right until its
exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and
incrementing the exponent or shifting left
and decrementing the exponent
Overflow or
underflow?
Yes
No
Exception
4. Round the significand to the appropriate
number of bits
No
Still normalized?
Yes
Done
Advanced ALU
CS465 F08
58
D. Barbara
Sign
Exponent
Significand
Sign
Exponent
Significand
Compare
exponents
Small ALU
Exponent
difference
0
1
0
Control
1
0
Shift smaller
number right
Shift right
Big ALU
0
1
0
Increment or
decrement
Advanced ALU
Exponent
Add
1
Shift left or right
Rounding hardware
Sign
1
Significand
CS465 F08
Normalize
Round
Fig 3.17
59
D. Barbara
Accurate Arithmetic
Round accurately requires HW to include extra
bits in the calculation
IEEE 754 standard specifies the use of 2 extra
bits on the right during intermediate calculations
– Guard bit and Round bit
Example: Add 2.56 100 and 2.34 102,
assuming 3 significant digits
Shift the smaller number
Without guard and round bits
2.56 100 = 0.0256 102(guard holds 5 and round holds 6)
2.34+ 0.02 = 2.36 102
With guard and round bits
2.34+ 0.0256 = 2.3656 102
ROUND 2.37102
Advanced ALU
CS465 F08
60
D. Barbara
Floating-Point Multiplication
Algorithm: Z = X * Y
Product exponent = X's exponent + Y's
exponent
Doubly-biased exponent must be corrected
Product exponent = sum of stored exponents – bias
Xe = 7
Ye = -3
Bias 8
Xe = 1111
Ye = 0101
10100
= 15
= 5
20
= 7+8
= -3 + 8
4+8+8
Multiply
the significands: Zm = Xm x Ym
Normalization, rounding and exception
checking (overflow and underflow)
Determine the sign bit
Advanced ALU
CS465 F08
61
D. Barbara
MIPS Floating-Point Instructions
Floating-point instructions
Computation: add.s, add.d, sub.s, sub.d, mul.s, mul.d,
div.s, div.d
Comparison: c.x.s, c.x.d (x=eq, neq, lt, le, gt, ge)
FP conditional branch: bc1t, bc1f
Floating-point data transfer: lwc1, swc1, ldc1, sdc1
32 floating-point registers: $f0, …, $f31
Double precision: by convention, even/odd pair
contain one DP FP number: $f0/$f1, $f2/$f3
See page 206 for details
Advanced ALU
CS465 F08
62
D. Barbara
Summary
Multiplication
Refined
hardware and algorithms
Booth’s algorithm
Division
Similar
hardware as multiplication
Floating point
Representing
numbers
Addition and multiplication
Precision is a big issue
Advanced ALU
CS465 F08
63
D. Barbara
Next Lecture
Topic:
Assessing
and understanding performance
Reading
Ch4
Patterson and Hennessy
Advanced ALU
CS465 F08
64
D. Barbara