Floating point

Floating Point
A number system corresponding to the decimal notation
$1.837 \times 10^{4}$
where 1.837 is the significand and 4 is the exponent.
Many corresponding binary formats exist, but one standard is in common use:
IEEE 754-1985 (IEC 559)
Computer Engineering FloatingPoint page 1
IEEE 754-1985

Number representations:
– Single precision (32 bits)
  sign: 1 bit, exponent: 8 bits, fraction: 23 bits
– Double precision (64 bits)
  sign: 1 bit, exponent: 11 bits, fraction: 52 bits
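As a rough illustration, the Python sketch below splits a value into these three fields for either width. It assumes Python's `struct` module packs `f`/`d` values in IEEE 754 binary32/binary64 format (which it does in the standard `>` byte-order mode); the names `FORMATS` and `fields` are made up for this example.

```python
import struct

# (struct float code, struct int code, exponent bits, fraction bits)
FORMATS = {
    "single": (">f", ">I", 8, 23),
    "double": (">d", ">Q", 11, 52),
}

def fields(x: float, fmt: str = "single") -> tuple[int, int, int]:
    """Return the (sign, exponent, fraction) bit fields of x."""
    float_code, int_code, exp_bits, frac_bits = FORMATS[fmt]
    bits = struct.unpack(int_code, struct.pack(float_code, x))[0]
    sign = bits >> (exp_bits + frac_bits)               # 1 bit
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

print(fields(-0.75))            # (1, 126, 4194304)
print(fields(-0.75, "double"))  # (1, 1022, 2251799813685248)
```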
Single Precision Format
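Layout: sign S (1 bit) | exponent E (8 bits) | mantissa M (23 bits)
Sign S: 0 = positive, 1 = negative
Exponent E: excess-127 binary integer
Mantissa M: normalized binary significand with hidden integer bit: 1.M
Excess 127: the actual exponent is $e = E - 127$
$N = (-1)^S \times (1.M)_2 \times 2^{e}$ (for normalized numbers, $1 \le E \le 254$)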
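A minimal sketch of this formula, evaluated exactly with Python's `fractions.Fraction` so that no rounding creeps in. The function name `decode_single` is hypothetical, and it deliberately rejects patterns that are not normalized numbers.

```python
import struct
from fractions import Fraction

def decode_single(bits: int) -> Fraction:
    """Evaluate N = (-1)^S * (1.M) * 2^(E - 127) for a 32-bit pattern."""
    S = bits >> 31
    E = (bits >> 23) & 0xFF
    M = bits & 0x7FFFFF
    if not 1 <= E <= 254:
        raise ValueError("formula applies to normalized numbers only")
    significand = 1 + Fraction(M, 1 << 23)  # 1.M: restore the hidden bit
    e = E - 127                             # remove the excess-127 bias
    return (-1) ** S * significand * Fraction(2) ** e

# Cross-check against the platform's own decoding of the same bits
bits = struct.unpack(">I", struct.pack(">f", 3.14))[0]
assert decode_single(bits) == Fraction(struct.unpack(">f", struct.pack(">I", bits))[0])
print(decode_single(0x3FC00000))  # 3/2
```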
Example 1
S = 1, E = 01111110, M = 10000000000000000000000
$e = E - 127 = 126 - 127 = -1$
$N = (-1)^1 \times (1.1)_2 \times 2^{-1}$
$N = -1 \times (0.11)_2$
$N = -1 \times (1 \times 2^{-1} + 1 \times 2^{-2})$
$N = -1 \times (0.5 + 0.25) = -0.75$
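The example can be confirmed by concatenating the three fields into one 32-bit word and letting Python's `struct` module decode it as an IEEE 754 single-precision value (a quick check, not part of the original material):

```python
import struct

# S = 1, E = 01111110, M = 1 followed by 22 zeros, concatenated into 32 bits
bits = int("1" + "01111110" + "1" + "0" * 22, 2)
value = struct.unpack(">f", struct.pack(">I", bits))[0]
print(hex(bits), value)  # 0xbf400000 -0.75
```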
Single Precision Range
The magnitude of normalized numbers that can be
represented is in the range:
$2^{-126} \times 1.0$
to
$2^{127} \times (2 - 2^{-23})$
which is approximately:
$1.2 \times 10^{-38}$
to
$3.4 \times 10^{38}$
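The two limits can be checked against the bit patterns of the smallest and largest normalized single-precision numbers, 0x00800000 (E = 1, M = 0) and 0x7F7FFFFF (E = 254, M all ones). A sketch using Python's `struct` module; `f32` is a hypothetical helper.

```python
import struct

def f32(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest = 2.0 ** -126 * 1.0             # E = 1, M = 0
largest = 2.0 ** 127 * (2 - 2.0 ** -23)  # E = 254, M = all ones
assert smallest == f32(0x00800000)
assert largest == f32(0x7F7FFFFF)
print(f"{smallest:.3g} {largest:.3g}")   # 1.18e-38 3.4e+38
```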
IEEE 754-1985

Fraction part:
23 / 52 bits;
$0 \le x < 1$

Significand:
1 + fraction part.
The leading “1” is not stored (the “hidden bit”).
This corresponds to about 7 and 16 decimal digits, respectively.

Exponent:
127 / 1023 is added to the exponent (“biased exponent”).
This corresponds to magnitudes of roughly $10^{-38}$ to $10^{38}$ / $10^{-308}$ to $10^{308}$.
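Both the bias and the decimal-digit figures follow from the field widths. A small sketch, assuming the usual rule bias = 2^(k-1) - 1 for a k-bit exponent field and estimating the decimal precision as (fraction bits + 1) · log10 2; `format_params` is a hypothetical helper.

```python
import math

def format_params(exp_bits: int, frac_bits: int) -> tuple[int, float]:
    """Return (bias, approximate decimal digits) for an IEEE-style format."""
    bias = 2 ** (exp_bits - 1) - 1            # 127 for 8 bits, 1023 for 11 bits
    digits = (frac_bits + 1) * math.log10(2)  # +1 for the hidden bit
    return bias, digits

for name, exp_bits, frac_bits in [("single", 8, 23), ("double", 11, 52)]:
    bias, digits = format_params(exp_bits, frac_bits)
    print(name, bias, round(digits, 2))
# single 127 7.22
# double 1023 15.95
```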
IEEE 754-1985

Special features:
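– Correct rounding of a “halfway” result (to the even number).
– Includes special values:
  NaN (Not a Number)
  +Infinity
  -Infinity
– Uses denormal numbers to represent numbers smaller in magnitude than the smallest normalized number $2^{e_{\min}}$ ($e_{\min} = -126$ / $-1022$).
– Rounds to nearest by default; three other rounding modes exist (toward +Infinity, toward -Infinity, toward zero).
– Sophisticated exception handling (invalid operation, division by zero, overflow, underflow, inexact).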
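Several of these features can be observed directly on Python floats, which are IEEE 754 doubles on mainstream platforms (an assumption about the platform, not a language guarantee). The sketch only exercises the default round-to-nearest mode.

```python
import math
import sys

inf, nan = float("inf"), float("nan")

# Special values: overflow gives infinity, invalid operations give NaN
assert 1e308 * 10 == inf
assert -inf < 0 < inf
assert math.isnan(inf - inf)
assert nan != nan                        # NaN is unequal even to itself

# Denormal numbers extend the range below the smallest normalized double
smallest_normal = sys.float_info.min     # 2**-1022
smallest_denormal = math.ldexp(1.0, -1074)
assert smallest_normal == math.ldexp(1.0, -1022)
assert 0 < smallest_denormal < smallest_normal

# Round to nearest, ties to even (spacing of doubles just above 1.0 is 2**-52)
half_ulp = math.ldexp(1.0, -53)
assert 1.0 + half_ulp == 1.0                             # tie: 1.0 is even
assert 1.0 + 3 * half_ulp == 1.0 + math.ldexp(1.0, -51)  # tie: rounds up to even
assert smallest_denormal / 2 == 0.0                      # tie between 0 and 2**-1074
```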
Add / Sub
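$(s_1 \times 2^{e_1}) \pm (s_2 \times 2^{e_2}) = (s_1 \pm s_2) \times 2^{e_3} = s_3 \times 2^{e_3}$, once both summands have been aligned to the common exponent $e_3$ (step 1).
– Each significand s has the form 1.M; the hidden bit is made explicit during the operation.
1: Shift the summands so they have the same exponent:
– e.g., if $e_2 < e_1$: shift $s_2$ right and increment $e_2$ until $e_1 = e_2$.
2: Add/subtract the significands using the sign bits of $s_1$ and $s_2$.
– Set the sign bit of the result accordingly.
3: Normalize the result (sign bit kept separate):
– shift $s_3$ left and decrement $e_3$ until MSB = 1;
– if the addition carried out past the hidden-bit position, shift $s_3$ right by one and increment $e_3$ instead.
4: Round $s_3$ correctly.
– More than 23 / 52 fraction bits are used internally for the addition.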
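The four steps can be sketched in Python on raw single-precision bit patterns. To keep it short, this sketch assumes two positive, normalized operands and an in-range result, so step 2 is a plain addition and step 3 only ever needs a right shift; it carries three extra bits (guard, round, sticky) for step 4. The helpers `f32_bits`, `bits_f32` and `add_positive_f32` are hypothetical names.

```python
import struct

def f32_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x (rounded to nearest)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def add_positive_f32(a: int, b: int) -> int:
    """Add two positive, normalized single-precision bit patterns.

    Sketch only: no signs, zeros, denormals, infinities, NaNs or overflow.
    """
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    # Restore the hidden bit (1.M) and append 3 extra bits: guard, round, sticky
    sa = ((a & 0x7FFFFF) | 0x800000) << 3
    sb = ((b & 0x7FFFFF) | 0x800000) << 3
    if ea < eb:                      # let the first operand have the larger exponent
        ea, eb, sa, sb = eb, ea, sb, sa
    # Step 1: shift sb right by the exponent difference; bits shifted out
    # are ORed into the lowest (sticky) bit
    shift = ea - eb
    lost = sb & ((1 << shift) - 1)
    sb = (sb >> shift) | (1 if lost else 0)
    # Step 2: add the aligned significands
    s, e = sa + sb, ea
    # Step 3: normalize; a carry past the hidden-bit position needs one right shift
    if s >= 1 << 27:
        s = (s >> 1) | (s & 1)       # keep the sticky information
        e += 1
    # Step 4: round to nearest, ties to even, using the three extra bits
    low, s = s & 0b111, s >> 3
    if low > 0b100 or (low == 0b100 and s & 1):
        s += 1
        if s == 1 << 24:             # rounding carried into a new bit
            s >>= 1
            e += 1
    return (e << 23) | (s & 0x7FFFFF)

print(bits_f32(add_positive_f32(f32_bits(1.5), f32_bits(2.25))))  # 3.75
```

Comparing `add_positive_f32` with the platform's own single-precision addition over many operand pairs is a simple way to test such a sketch; operands of opposite sign would additionally need the left-shift normalization of step 3.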
Multiplication
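$(s_1 \times 2^{e_1}) \times (s_2 \times 2^{e_2}) = (s_1 \times s_2) \times 2^{e_1 + e_2}$
so, multiply significands and add exponents.
Problem:
The significand is coded in sign-magnitude form; use unsigned multiplication and take care of the sign separately (the result is negative if exactly one operand is).
Round the 2n-bit significand product to an n-bit significand.
Normalize the result and compute the new exponent with respect to the bias: the sum E1 + E2 of two biased exponents contains the bias twice, so the new biased exponent is E1 + E2 - 127 (single precision).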
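A corresponding sketch for multiplication, again on single-precision bit patterns and assuming normalized operands with a normalized, in-range product. Instead of explicit guard, round and sticky bits, it compares the whole discarded part of the 48-bit product with one half, which yields the same round-to-nearest-even decision. All names are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

BIAS = 127

def mul_f32(a: int, b: int) -> int:
    """Multiply two normalized single-precision bit patterns.

    Sketch only: assumes a normalized result (no overflow, underflow,
    zeros, infinities or NaNs).
    """
    sign = (a >> 31) ^ (b >> 31)                 # sign handled separately
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (a & 0x7FFFFF) | 0x800000               # 24-bit unsigned significands 1.M
    mb = (b & 0x7FFFFF) | 0x800000
    p = ma * mb                                  # 2n = 48-bit product, value in [1, 4)
    e = ea + eb - BIAS                           # the sum counts the bias twice
    shift = 23
    if p >> 47:                                  # product >= 2: normalize
        shift += 1
        e += 1
    kept, rem = p >> shift, p & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round the 2n-bit product to n bits: nearest, ties to even
    if rem > half or (rem == half and kept & 1):
        kept += 1
        if kept == 1 << 24:                      # rounding carried into a new bit
            kept >>= 1
            e += 1
    return (sign << 31) | (e << 23) | (kept & 0x7FFFFF)

print(bits_f32(mul_f32(f32_bits(-0.75), f32_bits(3.0))))  # -2.25
```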
Accurate Arithmetic
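1. Multiply the two significands to get the 2n-bit product:
– But we have only 23 / 52 bits (24 / 53 including the hidden bit) to store the result!
The product is split into an upper half P and a lower half A:
P = x0 x1 x2 x3 x4 x5    A = g r s s s s
where g is the guard bit and r is the round bit; the remaining s bits are ORed together into one “sticky bit”, STICKY.
Case 1: x0 = 0, a left shift is needed; g moves into P and r remains the round bit:
P = x1 x2 x3 x4 x5 g    A = r STICKY 0…0
Case 2: x0 = 1, no shift; increment the exponent. g becomes the round bit, and the new sticky bit is STICKY OR r:
P = x0 x1 x2 x3 x4 x5    A = g (STICKY OR r) 0…0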
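The case analysis above can be sketched in Python with plain integers. This assumes both significands are n-bit integers with the leading (hidden) bit set, so the product has 2n bits and x0 is its top bit; `split_product` is a hypothetical name, and n = 6 mirrors the six-bit diagram.

```python
def split_product(p: int, n: int) -> tuple[int, int, int, int]:
    """Normalize the 2n-bit product p of two n-bit significands.

    Returns (P, r, STICKY, x0): the n kept bits, the round bit, the sticky
    bit, and x0 (1 means the exponent must be incremented).
    """
    x0 = (p >> (2 * n - 1)) & 1
    upper = p >> n                                 # P = x0 x1 ... x(n-1)
    g = (p >> (n - 1)) & 1                         # guard bit: top bit of A
    r = (p >> (n - 2)) & 1                         # round bit: next bit of A
    sticky = 1 if p & ((1 << (n - 2)) - 1) else 0  # OR of the remaining s bits
    if x0 == 0:
        # Case 1: shift P left, moving g in; r stays the round bit
        return (upper << 1) | g, r, sticky, 0
    # Case 2: no shift; g becomes the round bit, new STICKY = STICKY OR r
    return upper, g, sticky | r, 1

# n = 6 as in the diagram: two significands 1.11111
print(split_product(0b111111 * 0b111111, 6))  # (62, 0, 1, 1)
```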
Rounding
2: For both cases:
if r = 0, P is the correctly rounded product;
if r = 1 and STICKY = 1, P + 1 is the correctly rounded product;
if r = 1 and STICKY = 0 (the “halfway case”), then
  P is the correctly rounded product if the last bit of P (x5 in case 2, g in case 1) is 0,
  P + 1 is the correctly rounded product if that bit is 1.
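These three rules amount to round-to-nearest with ties to even. A sketch in Python, checked exhaustively for 6-bit significands against exact rounding with `fractions.Fraction` (whose `round()` rounds halfway cases to even); `round_product` and `normalized_parts` are hypothetical names.

```python
from fractions import Fraction

def round_product(P: int, r: int, sticky: int) -> int:
    """Round the kept bits P to nearest, ties to even, from r and STICKY."""
    if r == 0:
        return P          # less than half a unit discarded: keep P
    if sticky == 1:
        return P + 1      # more than half a unit discarded: round up
    return P + (P & 1)    # halfway case: round to the even neighbor

def normalized_parts(p: int, n: int) -> tuple[int, int, int, int]:
    """Return (P, r, STICKY, shift) for the 2n-bit product p."""
    shift = n if p >> (2 * n - 1) else n - 1   # case 2 (x0 = 1) or case 1
    P = p >> shift
    r = (p >> (shift - 1)) & 1
    sticky = 1 if p & ((1 << (shift - 1)) - 1) else 0
    return P, r, sticky, shift

# Exhaustive check for 6-bit significands against exact rounding
for ma in range(32, 64):
    for mb in range(32, 64):
        P, r, sticky, shift = normalized_parts(ma * mb, 6)
        assert round_product(P, r, sticky) == round(Fraction(ma * mb, 1 << shift))
```

If P + 1 reaches 2^n (P was all ones), the result must be renormalized by one more right shift and an exponent increment.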
Division
$(s_1 \times 2^{e_1}) / (s_2 \times 2^{e_2}) = (s_1 / s_2) \times 2^{e_1 - e_2}$
so, divide significands and subtract exponents
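Problem:
The significand is coded in sign-magnitude form; use unsigned division (different algorithms exist) and take care of the sign separately.
Round the (n + 2)-bit significand quotient (n bits plus guard and round bits) to an n-bit significand.
Normalize the quotient (it lies between 0.5 and 2) and compute the new exponent with respect to the bias: in E1 - E2 the bias cancels, so the new biased exponent is E1 - E2 + 127 (single precision).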
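A sketch of single-precision division along these lines, assuming normalized operands and a normalized, in-range result. It develops n + 2 quotient bits (guard and round) and, as an addition not stated above, treats a nonzero remainder as a sticky bit so that round-to-nearest-even comes out exact. Python's integer `divmod` stands in for whichever division algorithm the hardware uses; `div_f32` and the helpers are hypothetical names.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def div_f32(a: int, b: int) -> int:
    """Divide two normalized single-precision bit patterns (a / b).

    Sketch only: no zeros, denormals, infinities, NaNs, overflow or underflow.
    """
    sign = (a >> 31) ^ (b >> 31)               # sign handled separately
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (a & 0x7FFFFF) | 0x800000             # unsigned significands 1.M
    mb = (b & 0x7FFFFF) | 0x800000
    e = ea - eb + 127                          # the bias cancels, so add it back
    if ma < mb:                                # quotient below 1: normalize
        ma <<= 1
        e -= 1
    # n + 2 = 26 quotient bits: 24 significand bits, then guard and round
    q, rem = divmod(ma << 25, mb)
    guard, rnd = (q >> 1) & 1, q & 1
    sticky = 1 if rem else 0                   # nonzero remainder acts as sticky
    kept = q >> 2
    if guard and (rnd or sticky or kept & 1):  # round to nearest, ties to even
        kept += 1
        if kept == 1 << 24:
            kept >>= 1
            e += 1
    return (sign << 31) | (e << 23) | (kept & 0x7FFFFF)

print(bits_f32(div_f32(f32_bits(-0.75), f32_bits(0.5))))  # -1.5
```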