Floating point

Floating Point
Number system corresponding to decimal scientific notation, e.g.
1.837 * 10^4
where 1.837 is the significand and 4 is the exponent.
A great number of corresponding binary formats exist, but there is one common standard:
IEEE 754-1985 (IEC 559)
Computer Engineering FloatingPoint page 1
IEEE 754-1985

Number representations:
– Single precision (32 bits)
    sign: 1 bit
    exponent: 8 bits
    fraction: 23 bits
– Double precision (64 bits)
    sign: 1 bit
    exponent: 11 bits
    fraction: 52 bits

Single Precision Format
Field layout (widths in bits): S (1) | E (8) | F (23)
Sign S: 1 bit
Exponent E: excess-127 binary integer; the actual exponent is e = E - 127
Mantissa M (24 bits): normalized binary significand with hidden integer bit: 1.F
N = (-1)^S * (1.F [bit-string]) * 2^e
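
The field layout and the formula for N can be sketched in Python as follows. The function name decode_single and the test pattern are illustrative choices, not part of the slides, and the sketch only handles normalized numbers (0 < E < 255).

```python
import struct

def decode_single(bits: int) -> float:
    """Decode a 32-bit pattern as a normalized IEEE 754 single (0 < E < 255)."""
    s = (bits >> 31) & 0x1        # sign S, 1 bit
    E = (bits >> 23) & 0xFF       # biased exponent E, 8 bits
    F = bits & 0x7FFFFF           # fraction F, 23 bits
    if E == 0 or E == 255:
        raise ValueError("not a normalized number (zero, denormal, infinity or NaN)")
    e = E - 127                   # remove the excess-127 bias
    significand = 1 + F / 2**23   # restore the hidden integer bit: 1.F
    return (-1) ** s * significand * 2.0 ** e

# Cross-check against the machine's own interpretation of the same bits.
pattern = 0x40490FDB              # the single-precision value closest to pi
hardware = struct.unpack(">f", struct.pack(">I", pattern))[0]
assert decode_single(pattern) == hardware
```

The comparison with struct is only a sanity check; it relies on Python being able to hold every single-precision value exactly in its own double-precision float.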

Example 1
S = 1
E = 01111110
F = 10000000000000000000000
e = E - 127
e = 126 - 127 = -1
N = (-1)^1 * (1.1 [bit-string]) * 2^-1
N = -1 * 0.11 [bit-string]
N = -1 * (2^-1 * 1 + 2^-2 * 1)
N = -1 * (0.5*1 + 0.25*1) = -0.75
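
As a quick check of this example, the three fields can be packed into one 32-bit word and reinterpreted as a float. This is a small sketch using Python's standard struct module; the variable names are illustrative.

```python
import struct

S = 0b1
E = 0b01111110
F = 0b1 << 22                     # a 1 followed by 22 zeros (23 fraction bits)

bits = (S << 31) | (E << 23) | F  # assemble the word: S | E | F
value = struct.unpack(">f", struct.pack(">I", bits))[0]

# The same value from the formula N = (-1)^S * 1.F * 2^(E - 127).
by_hand = (-1) ** S * (1 + F / 2**23) * 2.0 ** (E - 127)

assert bits == 0xBF400000
assert value == by_hand == -0.75
```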

Single Precision Range
Magnitude of normalized numbers that can be represented is in the range:
2^-126 * (1.0)
to
2^127 * (2 - 2^-23)
which is approximately:
1.2 * 10^-38
to
3.4 * 10^38
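
The two limits can be reproduced numerically. The sketch below takes the smallest normalized value as the pattern E = 1, F = 0 and the largest finite value as E = 254 with F all ones, i.e. 2^127 * (2 - 2^-23); the helper name bits_to_single is an illustrative choice.

```python
import struct

def bits_to_single(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_normal = 2.0 ** -126 * 1.0
largest_normal = 2.0 ** 127 * (2 - 2.0 ** -23)

assert smallest_normal == bits_to_single(0x00800000)  # E = 1,   F = 0
assert largest_normal == bits_to_single(0x7F7FFFFF)   # E = 254, F all ones

print(f"{smallest_normal:.3e}")   # 1.175e-38
print(f"{largest_normal:.3e}")    # 3.403e+38
```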

IEEE 754-1985

Single Precision (32 bits)
Fraction part: 23 bits; 0 ≤ x < 1
Significand: 1 + fraction part.
“1” is not stored; “hidden bit”.
Corresponds to 7 decimal digits.
Exponent: 127 added to the exponent.
Corresponds to the range of roughly 10^-38 to 10^38

Double Precision (64 bits)
Fraction part: 52 bits; 0 ≤ x < 1
Significand: 1 + fraction part.
“1” is not stored; “hidden bit”.
Corresponds to 16 decimal digits.
Exponent: 1023 added to the exponent.
Corresponds to the range of roughly 10^-308 to 10^308
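
The decimal figures follow from converting bit counts with log10(2). The sketch below is a rough estimate only: the function name decimal_view is hypothetical, and it uses 2^bias as a stand-in for the order of magnitude of the largest and smallest normalized values.

```python
import math

def decimal_view(fraction_bits: int, bias: int):
    """Return (decimal digits of precision, decimal order of magnitude of 2**bias)."""
    significand_bits = fraction_bits + 1          # stored fraction + hidden bit
    digits = significand_bits * math.log10(2)     # bits -> decimal digits
    max_exp10 = bias * math.log10(2)              # 2**bias expressed as 10**x
    return digits, max_exp10

for name, fraction_bits, bias in (("single", 23, 127), ("double", 52, 1023)):
    digits, max_exp10 = decimal_view(fraction_bits, bias)
    print(f"{name}: about {digits:.2f} decimal digits, "
          f"magnitudes around 10^+/-{max_exp10:.0f}")
```

This gives about 7.22 digits for single precision and about 15.95 for double precision, which the slides round to 7 and 16.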

IEEE 754-1985

Special features:
– Correct rounding of “halfway” results (to the even number).
– Includes special values:
    NaN (Not a Number)
    Infinity
    -Infinity
– Uses denormal numbers to represent magnitudes below the smallest normalized number, 2^Emin (Emin = -126 for single precision).
– Rounds to nearest by default; three other rounding modes exist.
– Sophisticated exception handling.
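
Several of these features can be observed directly from Python, under the assumption that Python's float is an IEEE 754 double (which holds for CPython on common platforms). The specific test values are illustrative.

```python
import math

# Round to nearest, ties to even: 2**53 + 1 lies exactly halfway
# between the two neighbouring doubles 2**53 and 2**53 + 2.
assert float(2**53 + 1) == 2.0**53          # tie goes to the even significand
assert float(2**53 + 3) == 2.0**53 + 4      # here the even neighbour is above

# Special values.
inf = float("inf")
nan = float("nan")
assert inf > 1.7e308 and -inf < -1.7e308
assert nan != nan                            # NaN compares unequal to itself
assert math.isnan(inf - inf)                 # an invalid operation yields NaN

# Denormals fill the gap between zero and the smallest normalized double.
smallest_normal = 2.0 ** -1022
smallest_denormal = 2.0 ** -1074
assert 0.0 < smallest_denormal < smallest_normal
assert smallest_denormal / 2 == 0.0          # below that, the result is zero
```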

Add / Sub
(s1 * 2^e1) +/- (s2 * 2^e2) = (s1 +/- s2) * 2^e3 = s3 * 2^e3 (after the exponents have been aligned)
– s = 1.s, the hidden bit is used during the operation.
1: Shift summands so they have the same exponent:
– e.g., if e2 < e1: shift s2 right and increment e2 until e1 = e2
2: Add/Sub significands using the sign bits for s1 and s2.
– set sign bit accordingly for the result.
3: Normalize result (sign bit kept separate):
– shift s3 left and decrement e3 until MSB = 1; if the addition produced a carry, shift s3 right and increment e3 instead.
4: Round s3 correctly.
– more than 23 / 52 bits are used internally for the addition.
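
The four steps can be sketched in Python on integer significands. This is a simplified illustration, not a hardware-accurate model: the names fp_add, unpack and to_single are hypothetical, only nonzero normalized single-precision inputs are considered, exponent overflow and underflow are ignored, and instead of a few guard bits the sketch keeps every bit (Python integers are unbounded) by shifting the larger operand left rather than the smaller one right.

```python
import math
import struct

P = 24  # significand bits including the hidden bit (single precision)

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def unpack(x: float):
    """Split x into (sign, s, e) with x = (-1)**sign * s * 2**e and s a P-bit integer."""
    m, ex = math.frexp(abs(x))              # abs(x) = m * 2**ex, 0.5 <= m < 1
    return (1 if x < 0 else 0), int(m * 2**P), ex - P

def fp_add(x: float, y: float) -> float:
    (sx, mx, ex), (sy, my, ey) = unpack(x), unpack(y)
    # Step 1: align both significands to the common (smaller) exponent.
    e = min(ex, ey)
    mx <<= ex - e
    my <<= ey - e
    # Step 2: add or subtract according to the sign bits.
    total = (-mx if sx else mx) + (-my if sy else my)
    sign, s = (1 if total < 0 else 0), abs(total)
    if s == 0:
        return 0.0
    # Step 3: normalize so that s has exactly P bits.
    shift = s.bit_length() - P
    if shift <= 0:
        s <<= -shift                        # shift left, nothing to round
    else:
        # Step 4: shift right and round to nearest, ties to even.
        dropped = s & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        s >>= shift
        if dropped > half or (dropped == half and s & 1):
            s += 1
    return math.ldexp(-s if sign else s, e + shift)

a, b = to_single(0.1), to_single(0.2)
assert fp_add(a, b) == to_single(a + b)     # agrees with rounding the exact sum
assert fp_add(1.0, 2.0 ** -24) == 1.0       # a halfway case, rounded to even
```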

Multiplication
(s1 * 2^e1) * (s2 * 2^e2) = s1 * s2 * 2^(e1+e2)
so, multiply significands and add exponents.
Problem:
Significand is coded in sign & magnitude;
use unsigned multiplication and take care of the sign separately.
Round the 2n-bit significand product to an n-bit significand.
Normalize result, compute new exponent with respect to bias.
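
A minimal Python sketch of this procedure for single precision (n = 24) follows. The names fp_mul, unpack and to_single are hypothetical, only nonzero normalized inputs are handled, and overflow and underflow are ignored. The sketch works with unbiased exponents; with biased exponents the result exponent would be E1 + E2 - 127, so that the bias is counted only once.

```python
import math
import struct

P = 24  # significand bits including the hidden bit (single precision)

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def unpack(x: float):
    """Split x into (sign, s, e) with x = (-1)**sign * s * 2**e and s a P-bit integer."""
    m, ex = math.frexp(abs(x))              # abs(x) = m * 2**ex, 0.5 <= m < 1
    return (1 if x < 0 else 0), int(m * 2**P), ex - P

def fp_mul(x: float, y: float) -> float:
    (sx, mx, ex), (sy, my, ey) = unpack(x), unpack(y)
    sign = sx ^ sy                          # sign is handled separately
    prod = mx * my                          # unsigned product, 2n - 1 or 2n bits
    e = ex + ey                             # add the (unbiased) exponents
    shift = prod.bit_length() - P           # bits to drop to get back to n bits
    dropped = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    s = prod >> shift
    if dropped > half or (dropped == half and s & 1):
        s += 1                              # round to nearest, ties to even
    return math.ldexp(-s if sign else s, e + shift)

a, b = to_single(0.1), to_single(0.3)
assert fp_mul(a, b) == to_single(a * b)     # a * b is exact in double precision
```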

Division
(s1 * 2^e1) / (s2 * 2^e2) = (s1 / s2) * 2^(e1-e2)
so, divide significands and subtract exponents.
Problem:
Significand is coded in sign & magnitude; use unsigned division (different algorithms exist) and take care of the sign separately.
Round the (n + 2)-bit significand (n bits plus guard and round bits) to an n-bit significand.
Compute new exponent with respect to bias.
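
The division steps can be sketched in the same style for single precision (n = 24). The names fp_div, unpack and to_single are hypothetical, and the sketch covers only nonzero normalized inputs without overflow or underflow. Besides the guard and round bits it also uses the division remainder as a "sticky" indication that further nonzero bits exist; that detail is an addition of this sketch, needed to round correctly in all cases. With biased exponents the result exponent would be E1 - E2 + 127.

```python
import math
import struct

P = 24  # significand bits including the hidden bit (single precision)

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def unpack(x: float):
    """Split x into (sign, s, e) with x = (-1)**sign * s * 2**e and s a P-bit integer."""
    m, ex = math.frexp(abs(x))              # abs(x) = m * 2**ex, 0.5 <= m < 1
    return (1 if x < 0 else 0), int(m * 2**P), ex - P

def fp_div(x: float, y: float) -> float:
    (sx, mx, ex), (sy, my, ey) = unpack(x), unpack(y)
    sign = sx ^ sy                          # sign is handled separately
    # Pre-shift the dividend so the integer quotient has exactly P + 2 bits:
    # n significand bits followed by a guard bit and a round bit.
    k = P + 1 if mx >= my else P + 2
    q, rem = divmod(mx << k, my)            # unsigned integer division
    e = ex - ey - k                         # subtract the (unbiased) exponents
    s, low = q >> 2, q & 0b11               # low = guard and round bits
    sticky = rem != 0                       # nonzero bits beyond the round bit
    if low == 0b11 or (low == 0b10 and (sticky or s & 1)):
        s += 1                              # round to nearest, ties to even
    return math.ldexp(-s if sign else s, e + 2)

assert fp_div(1.0, 3.0) == to_single(1.0 / 3.0)
assert fp_div(-6.0, 1.5) == -4.0
```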