Chapter 4

Transcript Chapter 4

ECE200 – Computer Organization
Chapter 4 – Arithmetic
for Computers
Homework 4

4.1, 4.2, 4.8, 4.9, 4.17, 4.23, 4.25, 4.36, 4.40,
4.42, 4.50, 4.53, 4.55
Outline for Chapter 4 lectures

Some arithmetic fundamentals

Division of hardware into integer and floating
point computation

Designing an integer Arithmetic Logic Unit (ALU)
for MIPS

Integer addition

Integer multiplication

Integer division

Floating point representation

Floating point computation

MIPS floating point registers

MIPS floating point instructions
Some arithmetic fundamentals

Unsigned numbers: values are always positive
 Example:

1000 10102 = 27+23+21=14610
Signed numbers: two’s complement notation
 Example:
1000 10102 = -27+23+21=-11810
 Leftmost bit called the sign bit


Positive numbers have a sign bit of 0, negative numbers a sign bit of 1
Sign extending a number: replicate the most
significant bit the number of times needed
 Example:
1000 10102 is the same as 1111 1111 1000 10102
Some arithmetic fundamentals

Negating a two’s complement number: invert
(NOT) all bits and add 1
 Example:

negative of 1000 1010 = 0111 0101 + 1 =
0111 0110 = 118
Logical operations
 AND,
basis

OR, NOR, XOR perform logic function on a bit-by-bit
Example: 1000 1010 AND 1101 0110 = 1000 0010
 Also
arithmetic/logical shift left/right
Basic hardware building blocks
Integer and floating point computation

Most general-purpose ISAs specify separate
integer and floating point register files
 Operand
representation formats differ
 Computation hardware differs

Result is a split of the execution core into
integer and floating point sections
P
C
instruction
memory
integer
register
file
flt pt
register
file
integer
ALU
integer
multiplier
flt pt
adder
flt pt
multiplier
data
memory
Recall the 604e microprocessor
integer register
file and
execution units
flt pt register file
and execution
units
Designing an integer ALU for MIPS

ALU = Arithmetic Logic Unit

Performs single cycle execution of simple
integer instructions

Supports add, subtract, logical, set less than,
and equality test for beq and bne
 Both
signed and unsigned versions of add, sub, and slt
ALU block diagram

Inputs
 a,b:
the data (operands) to be operated on
 ALU operation: the operation to be performed

Outputs
 Result:
the result of the operation
 Zero: indicates if the Result = 0 (for beq, bne)
 CarryOut: the carry out of an addition operation
 Overflow: indicates if an add or sub had an overflow (later)
Basic integer addition

Pencil-and-paper binary addition
(CarryIn)
a
b
(CarryOut) Sum

Full adder sum and carry equations for each bit
Sum  a  b  CarryIn  a  b  CarryIn  a  b  CarryIn  a  b  CarryIn  a  b  CarryIn
CarryOut  a  b  a  CarryIn  b  CarryIn
1-bit ALU bit-slice

Bit-slice design: create a building block of part

ALU bit-slice supporting addition, AND, OR
of the datapath (1 bit in our example) and
replicate it to form the entire datapath
 1-bit
AND, 1-bit OR, 1-bit full add of a and b always calculated
 Operation input (2 bits) determines which of these passes
through the MUX and appears at Result
 CarryIn is from previous bit-slice
 CarryOut goes to next bit-slice
Creating a 32-bit ALU from 1-bit bit-slices
What should we set this to when Operation=10 (add)?
(LSB)
(MSB)

Performs ripple carry addition
 Carry
follows serial path from bit 0 to bit 31 (slow)
Handling subtraction

Perform a+(-b)

Recall that to negate b we
 Invert
(NOT) all bits
 Add a 1


Set CarryIn input of LSB bitslice to 1
New bit-slice design
Implementing bne, beq

Need to detect if a=b

Detected by determining if a-b=0
 Perform
a+(-b)
 NOR all Result bits to detect Result=0
Implementing Set Less Than (slt)

Result=1 if a<b, otherwise Result=0

All bits except bit 0 are set to zero through a
new bit-slice input called Less that goes to a 4th
input on the bit-slice MUX
What do we input here?
Less
Set to zero for all but LSB
3
Implementing Set Less Than (slt)

Set bit 0 to one if the result of a-b is negative
and to zero otherwise
 a-b=negative
 Feed
Less
number implies that a<b
the adder output of bit 31 to the Less input of bit 0
3
Set
Only used for MSB
Full 32-bit ALU design
Full 32-bit ALU design

Bnegate controls CarryIn input to bit 0 and
Binvert input to all bit-slices
 Both
are 1 for subtract and 0 otherwise, so a single signal can
be used

NOR, XOR, shift operations would also be
included in a MIPS implementation
Overflow

Overflow occurs when the result from an

For signed addition, overflow occurs when
operation cannot be represented with the
number of available bits (32 in our ALU)
 Adding

Example: 01110000…+00010000…=1000000…
 Adding


two positive numbers gives a negative result
two negative numbers gives a positive result
Example: 10000000…+10000000…=0000000…
For signed subtraction, overflow occurs when
 Subtracting
a negative from a positive number gives a
negative result
 Subtracting a positive from a negative number gives a
positive result
Overflow

Overflow on unsigned arithmetic, which is
primarily used for manipulating addresses, is
ignored in many ISAs (including MIPS)

Overflow on signed arithmetic causes an
interrupt to deal with the problem (Chapter 5)

Overflow detection: XOR CarryIn of MSB with
CarryOut of MSB (problem 4.42)
Faster carry generation

Ripple carry addition is too slow for wide adders

Some alternative faster parallel schemes
 Carry
lookahead
 Carry skip
 Carry select

Cost is more hardware!
Carry lookahead addition

Two ways in which carry=1 from the ith bit-slice
 Generated

if both ai and bi are 1
gi  ai  bi
CarryIn (ci) of 1 is propagated if ai or bi are 1
 pi  ai  bi
A
Are these carries generated or propagated?
Carry lookahead addition

Two ways in which carry=1 from the ith bit-slice
 Generated

if both ai and bi are 1
gi  ai  bi
CarryIn (ci) of 1 is propagated if ai or bi are 1
 pi  ai  bi
A

The carry out, ci+1, is therefore
ci 1  gi  pi  ci

Using substitution, can get carries in parallel
c1  g 0  p 0  c0
c 2  g1  p1  g 0  p1  p 0  c0
c3  g 2  p 2  g1  p 2  p1  g 0  p 2  p1  p 0  c0
c 4  g 3  p3  g 2  p3  p 2  g1  p3  p 2  p1  g 0  p3  p 2  p1  p 0  c0
.
.
.
Carry lookahead addition

Drawbacks
 n+1
input OR and AND gates for nth
input
 Irregular structure with many long
wires

Solution: do two levels of
carry lookahead
 First
level generates Result (using
carry lookahead to generate the
internal carries) and propagate and
generate signals for a group of 4 bits
(Pi and Gi)
 Second
level generates carry out’s for
each group based on carry in and Pi
and Gi from previous group
Carry lookahead addition

Internal equations for group 0
gi  ai  bi
pi  ai  bi
ci 1  gi  pi  ci

Group equations for group 0
G 0  g 3  p3  g 2  p3  p 2  g1  p3  p 2  p1  g 0
P0  p3  p 2  p1  p0

C1 output of carry lookahead
unit
C1  G0  P0  c0
Carry lookahead addition

Internal equations for group 1
gi  ai  bi
pi  ai  bi
ci 1  gi  pi  ci

Group equations for group 1
G1  g 7  p7  g 6  p7  p6  g 5  p7  p6  p5  g 4
P1  p7  p 6  p5  p 4

C2 output of carry lookahead
unit
C2  G1  P1 G0  P1 P0  c0
Carry skip addition

CLA group generate (Gi) hardware is complex

Carry skip addition
 Generate
Gi’s by ripple carry of a and b inputs with cin’s = 0
(except c0)
 Generate
Pi’s as in carry lookahead
 For
each group, combine Gi, Pi, and cin as in carry lookahead
to form group carries
 Generate
carries
sums by ripple carry from a and b inputs and group
Carry skip addition

Operation
 Generate
Gi’s through ripple carry
Carry skip addition

Operation
 Generate
 In
Gi’s through ripple carry
parallel, generate Pi’s as in carry lookahead
Carry skip addition

Operation
 Group
carries to each block are generated or propagated
Carry skip addition

Operation
 Group
 Sums
carries to each block are generated or propagated
are generated in parallel from group carries
Carry select addition

For each group, do two additions in parallel
 One
with cin forced to 0
 One with cin forced to 1

Generate cin in parallel and use a MUX to select
the correct sum outputs

Example for 8 bits
MUXes
Carry select addition

A larger design
generated carry
propagated carry
 Why


different numbers of bits in each block?
Hint1: it’s to minimize the adder delay
Hint2: assume a k-input block has k time units of delay, and the ANDOR logic has 1 time unit of delay
Time and space comparison of adders
(worst)
(best)
(worst)

Differences only matter for large number of bits

Other factors
 Ripple
carry is the most regular structure (most amenable to
VLSI implementation, but any can be used in practice)
 Carry skip requires clearing cin’s at start of operation (such as
dynamic CMOS logic)
 Carry select requires driving many MUXes
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
00000
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
00000
000000
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
00000
000000
1000000
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
00000
(partial products)
000000
+1000000
1001000 (product)
Integer multiplication

Pencil and paper binary multiplication
1000 (multiplicand)
x 1001 (multiplier)
1000
00000
(partial products)
000000
+1000000
1001000 (product)

Key elements
 Examine
multiplier bits from right to left
 Shift multiplicand left one position each step
 Simplification: each step, add multiplicand to running product
total, but only if multiplier bit = 1
Integer multiplication

Initialize product register to 0
1000 (multiplicand)
1001 (multiplier)
00000000 (running product)
Integer multiplication

Multiplier bit = 1: add multiplicand to product
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00001000 (new running product)
Integer multiplication

Shift multiplicand left
10000
1001
00000000
+1000
00001000
(multiplicand)
(multiplier)
Integer multiplication

Multiplier bit = 0: do nothing
10000
1001
00000000
+1000
00001000
(multiplicand)
(multiplier)
Integer multiplication

Shift multiplicand left
100000
1001
00000000
+1000
00001000
(multiplicand)
(multiplier)
Integer multiplication

Multiplier bit = 0: do nothing
100000
1001
00000000
+1000
00001000
(multiplicand)
(multiplier)
Integer multiplication

Shift multiplicand left
1000000
1001
00000000
+1000
00001000
(multiplicand)
(multiplier)
Integer multiplication

Multiplier bit = 1: add multiplicand to product
1000000 (multiplicand)
1001 (multiplier)
00000000
+1000
00001000
+1000000
01001000 (product)
Integer multiplication

32-bit hardware implementation
LSB
 Multiplicand
loaded into right half of multiplicand register
 Product register initialized to all 0’s
 Repeat the following 32 times



If multiplier register LSB=1, add multiplicand to product
Shift multiplicand one bit left
Shift multiplier one bit right
Integer multiplication

Algorithm
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000 (running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
10000000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
01000000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
01000000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00100000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00100000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00010000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00010000
+1000
10010000 (new running product)
Integer multiplication

Drawback: half of 64-bit multiplicand register
are zeros
 Half

of 64 bit adder is adding zeros
Solution: shift product right instead of
multiplicand left
 Only
left half of product register added to multiplicand
1000 (multiplicand)
1001 (multiplier)
00000000
+1000
00010000
+1000
01001000 (product)
Integer multiplication

Hardware implementation
Integer multiplication

Final improvement: use right half of product
register for the multiplier
Integer multiplication

Final algorithm
Multiplication of signed numbers

Naïve approach
 Convert
to positive numbers
 Multiply
 Negate
product if multiplier and multiplicand signs differ
 Slow and extra hardware
Multiplication of signed numbers

Booth’s algorithm
 Invented


for speed
Shifting was faster than addition at the time
Objective: reduce the number of additions required
 Fortunately,
it works for signed numbers as well
 Basic
idea: the additions from a string of 1’s in the multiplier
can be converted to a single addition and a single subtraction
operation
 Example:
00111110 is equivalent to 01000000 – 00000010
requires additions for each of
these bit positions
requires an addition for this
bit position
and a subtraction for this bit
position
Booth’s algorithm

Starting from right to left, look at two adjacent
bits of the multiplier
 Place
a zero at the right of the LSB to start

If bits = 00, do nothing

If bits = 10, subtract the multiplicand from the
product
 Beginning

If bits = 11, do nothing
 Middle

of a string of 1’s
If bits = 01, add the multiplicand to the product
 End

of a string of 1’s
of a string of 1’s
Shift product register right one bit
Booth recoding

Example
0010 (multiplicand)
x 1101 (multiplier)
Booth recoding

Example
0010 (multiplicand)
00001101 0 (product+multiplier)
extra bit
position
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11101101 0
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00010110 1
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00001011 0
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00001011 0
+1110
11101011 0
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00001011 0
+1110
11110101 1
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00001011 0
+1110
11110101 1
Booth recoding

Example
0010 (multiplicand)
00001101 0
+1110
11110110 1
+0010
00001011 0
+1110
11111010 1 (product)
Integer division

Pencil and paper binary division
(divisor) 1000 01001000
(dividend)
Integer division

Pencil and paper binary division
1
(divisor) 1000 01001000
- 1000
0001
(dividend)
(partial remainder)
Integer division

Pencil and paper binary division
1
(divisor) 1000 01001000
- 1000
00010
(dividend)
Integer division

Pencil and paper binary division
10
(divisor) 1000 01001000
- 1000
00010
(dividend)
Integer division

Pencil and paper binary division
10
(divisor) 1000 01001000
- 1000
000100
(dividend)
Integer division

Pencil and paper binary division
100
(divisor) 1000 01001000
- 1000
000100
(dividend)
Integer division

Pencil and paper binary division
100
(divisor) 1000 01001000
- 1000
0001000
(dividend)
Integer division

Pencil and paper binary division
1001
(divisor) 1000 01001000
- 1000
0001000
- 0001000
0000000
(quotient)
(dividend)
(remainder)
Integer division

Pencil and paper binary division
1001
(divisor) 1000 01001000
- 1000
0001000
- 0001000
0000000

(quotient)
(dividend)
(remainder)
Steps in hardware
 Shift
the dividend left one position
 Subtract the divisor from the left half of the dividend
 If result positive, shift left a 1 into the quotient
 Else, shift left a 0 into the quotient, and repeat from the
beginning
 Once the result is positive, repeat the process for the partial
remainder
 Do n iterations where n is the size of the divisor
Integer division

Initial state
(divisor) 1000
01001000
(dividend)
0000
(quotient)
Integer division

Shift dividend left one position
(divisor) 1000
10010000
(dividend)
0000
(quotient)
Integer division

Subtract divisor from left half of dividend
(divisor) 1000
10010000
- 1000
00010000
(dividend)
(keep these bits)
0000
(quotient)
Integer division

Result positive, left shift a 1 into the quotient
(divisor) 1000
10010000
- 1000
00010000
(dividend)
0001
(quotient)
Integer division

Shift partial remainder left one position
(divisor) 1000
10010000
- 1000
00100000
(dividend)
0001
(quotient)
Integer division

Subtract divisor from left half of partial remainder
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
(dividend)
0001
(quotient)
Integer division

Result negative, left shift 0 into quotient
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
(dividend)
0010
(quotient)
Integer division

Restore original partial remainder (how?)
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
00100000
(dividend)
0010
(quotient)
Integer division

Shift partial remainder left one position
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
(dividend)
0010
(quotient)
Integer division

Subtract divisor from left half of partial remainder
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
- 1000
11000000
(dividend)
0010
(quotient)
Integer division

Result negative, left shift 0 into quotient
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
- 1000
11000000
(dividend)
0100
(quotient)
Integer division

Restore original partial remainder
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
- 1000
11000000
01000000
(dividend)
0100
(quotient)
Integer division

Shift partial remainder left one position
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
- 1000
11000000
10000000
(dividend)
0100
(quotient)
Integer division

Subtract divisor from left half of partial remainder
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
01000000
- 1000
11000000
10000000
- 1000
00000000
(dividend)
0100
(quotient)
Integer division

Result positive, left shift 1 into quotient
(divisor) 1000
10010000
- 1000
00100000
- 1000
11010000
(dividend)
01000000
- 1000
11000000
10000000
- 1000
00000000 (remainder)
1001
(quotient)
Integer division

Hardware implementation
What operations do
we do here?
Load dividend here initially
Integer and floating point revisited
instruction
memory
P
C
integer
register
file
HI
LO
flt pt
register
file
integer
ALU
data
memory
integer
multiplier
flt pt
adder
flt pt
multiplier

Integer ALU handles add, subtract, logical, set
less than, equality test, and effective address
calculations

Integer multiplier handles multiply and divide
 HI
and LO registers hold result of integer multiply and
divide
Floating point representation

Floating point (fp) numbers represent reals
 Example
reals: 5.6745, 1.23 x 10-19, 345.67 x 106
 Floats and doubles in C

Fp numbers are in signed magnitude
representation of the form (-1)S x M x BE where
is the sign bit (0=positive, 1=negative)
 M is the mantissa (also called the significand)
 B is the base (implied)
 E is the exponent
 Example: 22.34 x 10-4
S




S=0
M=22.34
B=10
E=-4
Floating point representation

Fp numbers are normalized in that M has only
one digit to the left of the “decimal point”
 Between
1.0 and 9.9999… in decimal
 Between 1.0 and 1.1111… in binary
 Simplifies fp arithmetic and comparisons
 Normalized: 5.6745 x 102, 1.23 x 10-19
 Not normalized: 345.67 x 106 , 22.34 x 10-4 , 0.123 x 10-45
 In binary format, normalized numbers are of the form
(-1)S x 1.M x BE

Leading 1 in 1.M is implied
Floating point representation tradeoffs

Representing a wide enough range of fp values
with enough precision (“decimal” places) given
limited bits
(-1)S x 1.M x BE
32 bits
S
E??
M??
 More
E bits increases the range
 More M bits increases the precision
 A larger B increases the range but decreases the precision
 The distance between consecutive fp numbers is not constant!
…
…
BE
BE+1
BE+2
Floating point representation tradeoffs

Allowing for fast arithmetic implementations
 Different
exponents requires lining up the significands; larger
base increases the probability of equal exponents

Handling very small and very large numbers
representable negative
numbers (S=1)
exponent
overflow
0
exponent
underflow
representable positive
numbers (S=0)
exponent
overflow
Sorting/comparing fp numbers

fp numbers can be treated as integers for sorting
and comparing purposes if E is placed to the left
(-1)S x 1.M x BE
S
E
bigger E is
bigger number

M
If E’s are same,
bigger M is
bigger number
Example
 3.67
x 106 > 6.34 x 10-4 > 1.23 x 10-4
Biased exponent notation

111…111 represents the most positive E and
000…000 represents the most negative E for
sorting/comparing purposes

To get correct signed value for E, need to
subtract a bias of 011…111

Biased fp numbers are of the form
(-1)S x 1.M x BE-bias

Example: assume 8 bits for E
 Bias
is 01111111 = 127
 Largest E represented by 11111111 which is
255 – 127 = 128
 Smallest E represented by 00000000 which is
0 – 127 = -127
IEEE 754 floating point standard

Created in 1985 in response to the wide range of
fp formats used by different companies
 Has
greatly improved portability of scientific applications

B=2

Single precision (sp) format (“float” in C)
S
E
M
1 bit 8 bits

23 bits
Double precision (dp) format (“double” in C)
S
1 bit
E
M
11 bits
52 bits
IEEE 754 floating point standard

Exponent bias is 127 for sp and 1023 for dp

Fp numbers are of the form (-1)S x 1.M x 2E-bias
1
in mantissa and base of 2 are implied
 Sp
form is
(-1)S x 1.M22 M21 …M0 x 2E-127
and value is
(-1)S x (1+(M22x2-1) +(M21x2-2)+…+(M0x2-23)) x 2E-127

Sp example
1 00000001
S
 Number
E
1000…000
M
is –1.1000…000 x 21-127=-1.5 x 2-126=1.763 x 10-38
IEEE 754 floating point standard

Denormalized numbers
 Allow
for representation of very small numbers
representable negative
numbers
exponent
overflow
representable positive
numbers
exponent
underflow
 Identified
 Format
0
by E=0 and a non-zero M
is (-1)S x 0.M x 2-bias-1
 Smallest
positive dp denormalized number is
0.00…01 x 2-1022 = 2-2074
smallest positive dp normalized number is 1.0 x 2-1022
 Hardware
software
support is complex, and so often handled by
exponent
overflow
Floating point addition

Make both exponents the same
 Find
the number with the smaller one
 Shift its mantissa to the right until the exponents match

Must include the implicit 1 (1.M)

Add the mantissas

Choose the largest exponent

Put the result in normalized form
 Shift
mantissa left or right until in form 1.M
 Adjust exponent accordingly

Handle overflow or underflow if necessary

Round

Renormalize if necessary if rounding produced
an unnormalized result
Floating point addition

Algorithm
Floating point addition example

Initial values
1 00000001
S
E
0 00000011
S
E
0000…01100
M
0100…00111
M
Floating point addition example

Identify smaller E and calculate E difference
1 00000001
S
E
0000…01100
M
difference = 2
0 00000011
S
E
0100…00111
M
Floating point addition example

Shift smaller M right by E difference
1 00000001
S
E
0 00000011
S
E
0100…00011
M
0100…00111
M
Floating point addition example

Add mantissas
1 00000001
S
E
0 00000011
S
E
0100…00011
M
0100…00111
M
-0.0100…00011 + 1.0100…00111 =
1.0000…00100
0
S
0000…00100
E
M
Floating point addition example

Choose larger exponent for result
1 00000001
S
E
0 00000011
S
E
0 00000011
S
E
0100…00011
M
0100…00111
M
0000…00100
M
Floating point addition example

Final answer (already normalized)
1 00000001
S
E
0 00000011
S
E
0 00000011
S
E
0100…00011
M
0100…00111
M
0000…00100
M
Floating point addition

Hardware design
determine
smaller
exponent
Floating point addition

Hardware design
shift mantissa
of smaller
number right
by exponent
difference
Floating point addition

Hardware design
add mantissas
Floating point addition

Hardware design
normalize result by
shifting mantissa of
result and adjusting
larger exponent
Floating point addition

Hardware design
round result
Floating point addition

Hardware design
renormalize if
necessary
Floating point multiply

Add the exponents and subtract the bias from
the sum
 Example:
(5+127) + (2+127) – 127 = 7+127

Multiply the mantissas

Put the result in normalized form
 Shift
mantissa left or right until in form 1.M
 Adjust exponent accordingly

Handle overflow or underflow if necessary

Round

Renormalize if necessary if rounding produced
an unnormalized result

Set S=0 if signs of both operands the same, S=1
otherwise
Floating point multiply

Algorithm
Floating point multiply example

Initial values
1 00000111
S
E
0 11100000
S
E
1000…00000
-1.5 x 27-127
M
1000…00000
M
1.5 x 2224-127
Floating point multiply example

Add exponents
1 00000111
S
E
0 11100000
S
E
1000…00000
-1.5 x 27-127
M
1000…00000
M
00000111 + 11100000 = 11100111 (231)
1.5 x 2224-127
Floating point multiply example

Subtract bias
1 00000111
S
E
0 11100000
S
E
1000…00000
-1.5 x 27-127
M
1000…00000
1.5 x 2224-127
M
11100111 – 01111111 = 11100111 + 10000001 = 01101000
(104)
01101000
S
E
M
Floating point multiply example

Multiply the mantissas
1 00000111
S
E
0 11100000
S
E
1000…00000
M
1000…00000
M
1.1000… x 1.1000… = 10.01000…
01101000
S
E
-1.5 x 27-127
M
1.5 x 2224-127
Floating point multiply example

Normalize by shifting 1.M right one position and
adding one to E
1 00000111
S
E
0 11100000
S
E
1000…00000
M
1000…00000
M
10.01000… => 1.001000…
01101001
S
E
-1.5 x 27-127
001000…
M
1.5 x 2224-127
Floating point multiply example

Set S=1 since signs are different
1 00000111
S
E
0 11100000
S
E
1 01101001
S
E
1000…00000
-1.5 x 27-127
M
1000…00000
1.5 x 2224-127
M
001000…
M
-1.125 x 2105-127
Rounding



Fp arithmetic operations may produce a result
with more digits than can be represented in 1.M
The result must be rounded to fit into the
available number of M positions
Tradeoff of hardware cost (keeping extra bits)
and speed versus accumulated rounding error
Rounding

Examples from decimal multiplication

Renormalization is required after rounding in c)
Rounding

Examples from binary multiplication (assuming
two bits for M)
1.01 x 1.01 = 1.1001
(1.25 x 1.25 = 1.5625)
1.11 x 1.01 = 10.0011
(1.75 x 1.25 = 2.1875)
Result has twice as many bits
1.10 x 1.01 = 1.111
May require renormalization
after rounding
(1.5 x 1.25 = 1.875)
Rounding

In binary, an extra bit of 1 is halfway in between
the two possible representations
1.001 (1.125) is halfway between 1.00 (1) and 1.01 (1.25)
1.101 (1.625) is halfway between 1.10 (1.5) and 1.11 (1.75)
IEEE 754 rounding modes

Truncate
 Remove
all digits beyond those supported
 1.00100 -> 1.00

Round up to the next value
 1.00100

-> 1.01
Round down to the previous value
 1.00100
-> 1.00
 Differs from Truncate for negative numbers

Round-to-nearest-even
 Rounds
to the even value (the one with an LSB of 0)
 1.00100 -> 1.00
 1.01100 -> 1.10
 Produces zero average bias
 Default mode
Implementing rounding

A product may have twice as many digits as the
multiplier and multiplicand
 1.11

x 1.01 = 10.0011
For round-to-nearest-even, we need to know
value to the right of the LSB (round bit)
 Whether any other digits to the right of the round digit are 1’s
 The

The sticky bit is the OR of these digits
LSB of final rounded result
1.00101
rounds to
1.01
Round bit Sticky bit = 0 OR 1 = 1
1.00100
rounds to
1.00
Implementing rounding

The product before normalization may have 2
digits to the left of the binary point

Product register format needs to be
bb.bbbb…

Two possible cases
1b.bbbb…
01.bbbb…
r sssss…
r sssss…
Need this as a result bit!
Implementing rounding


The guard bit (g) becomes part of the unrounded
result when the MSB = 0
g, r, and s suffice for rounding addition as well
MIPS floating point registers
31
floating point registers
f0
f1
0
.
.
.
31
control/status register
FCR31
0
implementation/revision register
31
0
FCR0
f30
f31

32 32-bit FPRs
 16
64-bit registers (32-bit register pairs) for dp floating point
 Software conventions for their usage (as with GPRs)

Control/status register
 Status

of compare operations, sets rounding mode, exceptions
Implementation/revision register
 Identifies
type of CPU and its revision number
MIPS floating point instruction overview

Operate on single and double precision operands

Computation
 Add,
sub, multiply, divide, sqrt, absolute value, negate
 Multiply-add, multiply-subtract


Added as part of MIPS-IV revision of ISA specification
Load and store
 Integer
register read for EA calculation
 Data to be loaded or stored in fp register file

Move between registers

Convert between different formats

Comparison instructions

Branch instructions
MIPS R10000 arithmetic units
EA calc
P
C
instruction
memory
integer
register
file
integer
ALU
integer
ALU +
multiplier
flt pt
adder
flt pt
register
file
flt pt
multiplier
flt pt
divider
flt pt
sq root
data
memory
MIPS R10000 arithmetic units

Integer ALU + shifter
 All

instructions take one cycle
Integer ALU + multiplier
 Booth’s
algorithm for multiplication (5-10 cycles)
 Non-restoring division (34-67 cycles)

Floating point adder
 Carry

propagate (2 cycles)
Floating point multiplier (3 cycles)
 Booth’s
algorithm

Floating point divider (12-19 cycles)

Floating point square root unit

Separate unit for EA calculations

Can start up to 5 instructions in 1 cycle
Questions?

Chapter 4

Transcript Chapter 4

Directory