Floating Point Numbers
• Floating point is used to represent “real” numbers
  • 1.23233, 0.0003002, 3323443898.3325358903
  • Real means “not imaginary”
• Computer floating-point numbers are a subset of the real numbers
  • Limit on the largest/smallest number represented
    • Depends on number of bits used
  • Limit on the precision
    • 12345678901234567890 --> 12345678900000000000
• Floating-point numbers are approximate, while integers are exact representations
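The precision limit can be seen directly with a small sketch. It assumes Python's float is an IEEE double (53-bit significand), which holds on common platforms; the variable names are illustrative only.

```python
# Assumption: Python's float is an IEEE double with a 53-bit significand.
n = 12345678901234567890          # 20 significant decimal digits
as_float = float(n)               # nearest representable floating-point value
round_trip = int(as_float)        # the integer the float actually stores

print(n)                  # the exact integer
print(round_trip)         # differs in the low-order digits
print(round_trip == n)    # False: precision was lost
print(0.1 + 0.2 == 0.3)   # False: decimal fractions are approximated too
```

The integer type keeps every digit, while the float keeps only the leading ones.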
Seattle Pacific University
EE/CS/CPE 3760 - Computer Organization
Ch3d- 2
Scientific Notation
+ 34.383 x 10^2 = 3438.3
  (Sign: +, Significand: 34.383, Exponent: 2)
+ 3.4383 x 10^3 = 3438.3
  Normalized form: only one digit before the decimal point
+3.4383000E+03 = 3438.3
  Floating point notation
An 8-digit significand can only represent 8 significant digits
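As a small illustration, the normalized floating point notation above can be produced with Python's standard format specification. This is only a sketch; the variable names are made up for the example.

```python
value = 3438.3
# '+' forces the sign; '.7E' gives one digit before the point and seven after.
text = f"{value:+.7E}"
print(text)  # +3.4383000E+03

# Split the text back into sign, significand and exponent.
significand_text, exponent_text = text.split("E")
sign = significand_text[0]
significand = float(significand_text[1:])
exponent = int(exponent_text)
print(sign, significand, exponent)
```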
Binary Floating Point Numbers
+ 101.1101
= 1 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4
= 4 + 0 + 1 + 1/2 + 1/4 + 0 + 1/16
= 5.8125
+1.011101 E+2
Normalized so that the binary point immediately follows the leading digit
Note: First digit is always non-zero --> First digit is always one.
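The positional expansion and the normalization step can be sketched as follows. The helper names binary_to_float and normalize are hypothetical, and the sketch assumes a positive input.

```python
import math

def binary_to_float(text):
    """Evaluate a binary string such as '101.1101' by positional weights."""
    whole, _, frac = text.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(whole)):
        value += int(bit) * 2.0 ** i       # 2^0, 2^1, 2^2, ...
    for i, bit in enumerate(frac, start=1):
        value += int(bit) * 2.0 ** -i      # 2^-1, 2^-2, ...
    return value

def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2."""
    m, e = math.frexp(value)               # frexp gives 0.5 <= m < 1
    return m * 2, e - 1

x = binary_to_float("101.1101")
print(x)             # 5.8125
print(normalize(x))  # (1.453125, 2), i.e. 1.011101 x 2^2
```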
Converting Decimal Fractions to Binary
Multiply by a power of 2, convert to binary, divide by the same power of 2
Example: 13.387
1. Multiply by 2^20
   13.387 x 1048576 = 14037286.912
2. If a fraction remains, multiply by a larger power of 2 or truncate it
3. Convert integer portion to binary
   14037286_10 = 110101100011000100100110_2
4. Divide by 2^20 (shift radix point left 20)
   1101.01100011000100100110_2 (20 bits after the radix point)
This works with any power of 2! Use larger powers to get more bits.
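The four steps can be sketched in a few lines. The function name to_binary_fixed is hypothetical, and the sketch assumes a non-negative value and at least one fraction bit.

```python
def to_binary_fixed(value, frac_bits):
    """Non-negative value -> binary string with frac_bits bits after the point."""
    scaled = int(value * 2 ** frac_bits)          # steps 1-2: scale, truncate
    bits = bin(scaled)[2:].zfill(frac_bits + 1)   # step 3: integer to binary
    return bits[:-frac_bits] + "." + bits[-frac_bits:]  # step 4: place the point

print(to_binary_fixed(13.387, 20))  # 1101.01100011000100100110
print(to_binary_fixed(5.75, 2))     # 101.11
```

Passing a larger frac_bits gives more bits to the right of the point.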
IEEE Floating Point Format
Bit 31: Sign (1 bit). 0: Positive, 1: Negative
Bits 30-23: Exponent (8 bits). Biased by 127.
Bits 22-0: Significand (23 bits). Leading ‘1’ is implied, but not represented
Number = (-1)^S x (1 + Sig) x 2^(E-127)
• Allows representation of magnitudes from about 2^-126 to 2^+128 (roughly 10^±38); exponent fields 0 and 255 are reserved for special values
• Since the significand always starts with ‘1’, we don’t have to
represent it explicitly
• Significand is effectively 24 bits
• Zero is represented by Sign=Significand=Exp=0
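A minimal sketch of pulling the three fields out of a 32-bit pattern and applying the formula above. The function names are hypothetical, and the sketch handles only zero and normalized numbers.

```python
import struct

def decode_single(bits):
    """Split a 32-bit pattern into (sign, biased exponent, 23-bit significand field)."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def single_value(bits):
    """Number = (-1)^S x (1 + Sig) x 2^(E-127); zero and normal numbers only."""
    sign, exponent, fraction = decode_single(bits)
    if exponent == 0 and fraction == 0:
        return -0.0 if sign else 0.0
    return (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)

bits = 0x40B80000
print(decode_single(bits))  # (0, 129, 3670016)
print(single_value(bits))   # 5.75
# Cross-check against the machine's own reading of the same four bytes.
print(struct.unpack(">f", bits.to_bytes(4, "big"))[0])  # 5.75
```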
IEEE Double Precision Format
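Bit 63: Sign (1 bit)
Bits 62-52: Exponent (11 bits). Bias: 1023
Bits 51-0: Significand (52 bits): bits 51-32 (20 bits) in the first word, bits 31-0 (32 bits) in the second word
Number = (-1)^S x (1 + Sig) x 2^(E-1023)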
• Allows representation of magnitudes from about 2^-1022 to 2^+1024 (roughly 10^±308)
• Larger significand means more precision
• Takes two 32-bit registers to hold one number
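The double-precision layout can be inspected the same way. This sketch assumes Python's struct module packs a float as an IEEE double with the ">d" format; decode_double is a hypothetical helper.

```python
import struct

def decode_double(value):
    """Return (sign, biased exponent, 52-bit significand field, high word, low word)."""
    bits = int.from_bytes(struct.pack(">d", value), "big")
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    high_word = bits >> 32          # sign, exponent, top 20 significand bits
    low_word = bits & 0xFFFFFFFF    # low 32 significand bits
    return sign, exponent, fraction, high_word, low_word

sign, exponent, fraction, high, low = decode_double(5.75)
print(sign, exponent - 1023, hex(fraction))  # 0 2 0x7000000000000
print(hex(high), hex(low))                   # 0x40170000 0x0
```

The two words printed last correspond to the two 32-bit halves of the number.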
Conversion
Convert 5.75 to Single-Precision IEEE Floating Point
1. Convert 5.75_10 to binary ---> 101.11_2
2. Normalize ---> 1.0111 x 2^2 (Significand: 1.0111, Exponent: 2)
3. Sign = 0 (positive).
4. Add 127 (bias) to exponent. Exponent = 129_10 = 10000001_2
5. Express significand as 24 bits
Sig = 1.01110000000000000000000
6. Remove leading one from significand, leaving 23 bits
Sig = .01110000000000000000000
7. Put in proper bit fields
Number = 0 10000001 01110000000000000000000 = 0x40B80000
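The seven steps can be sketched as a small encoder. The name encode_single is hypothetical; the sketch assumes a nonzero value in the normalized range and truncates extra significand bits rather than rounding.

```python
import math
import struct

def encode_single(value):
    """Build the 32-bit pattern step by step (nonzero, normal range, truncating)."""
    sign = 1 if value < 0 else 0                     # step 3
    m, e = math.frexp(abs(value))                    # steps 1-2: 0.5 <= m < 1
    significand, exponent = m * 2, e - 1             # 1 <= significand < 2
    biased = exponent + 127                          # step 4
    fraction = int((significand - 1) * 2 ** 23)      # steps 5-6: drop leading one
    return (sign << 31) | (biased << 23) | fraction  # step 7

bits = encode_single(5.75)
print(f"0x{bits:08X}")  # 0x40B80000
# Cross-check with the standard library's own single-precision packing.
print(struct.pack(">f", 5.75).hex())  # 40b80000
```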
Adding Floating Point Numbers
1.2232E+3 + 4.211E+5
1. Normalize to higher exponent
a. Find the difference between exponents (= 2)
b. Shift smaller number right by that amount
1.2232E+3 == 0.012232E+5
2. Now that exponents are the same, add significands together
     4.211    E+5
   + 0.012232 E+5
   = 4.223232 E+5
Note: If carry out of MSD, re-normalize
     5.0 E+2
   + 7.0 E+2
   = 12.0 E+2 = 1.2 E+3
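The align-then-add procedure can be sketched with Python's decimal module so the decimal digits stay exact. The add_sci helper and its (significand, exponent) pair representation are assumptions made for this example; it handles non-negative operands only.

```python
from decimal import Decimal

def add_sci(a, b):
    """Add two (significand, exponent) pairs of non-negative decimal numbers."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    if exp_a < exp_b:                      # keep the larger exponent in a
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)    # step 1: shift smaller number right
    total, exp = sig_a + sig_b, exp_a      # step 2: add significands
    while total >= 10:                     # carry out of the MSD: re-normalize
        total, exp = total.scaleb(-1), exp + 1
    return total, exp

print(add_sci((Decimal("1.2232"), 3), (Decimal("4.211"), 5)))  # 4.223232 E+5
print(add_sci((Decimal("5.0"), 2), (Decimal("7.0"), 2)))       # 1.2 E+3
```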
Adding IEEE Floating Point Numbers
                 S E  Sig.
  0x45B8CD8D --> 0 8B 38CD8D = 5913.694_10
+ 0x46FC8672 --> 0 8D 7C8672 = 32323.22_10
1. Check for Sign=Exp=Significand=0 --> If so, treat as a special case
2. Put the ‘1’ back in bit 23 of significands
38CD8D = 011 1000 1100 1101 1000 1101
---> 1011 1000 1100 1101 1000 1101 = B8CD8D
7C8672 = 111 1100 1000 0110 0111 0010
---> 1111 1100 1000 0110 0111 0010 = FC8672
0 8B B8CD8D
+ 0 8D FC8672
Adding IEEE Floating Point Numbers
0 8B B8CD8D
+ 0 8D FC8672
3. Normalize to higher exponent:
a. Find difference in exponents: 8D - 8B = 2
b. Shift significand of number with smaller exponent right by the difference
B8CD8D = 1011 1000 1100 1101 1000 1101
right shift by 2 --> 0010 1110 0011 0011 0110 0011 = 2E3363
c. Set lower-valued exponent to higher one
0 8D 2E3363 (exponent-aligned form of 0 8B B8CD8D)
+ 0 8D FC8672
Adding IEEE Floating Point Numbers
0 8D 2E3363
+ 0 8D FC8672
4. Add significands: (note: carry produced one too many bits)
     0010 1110 0011 0011 0110 0011
   + 1111 1100 1000 0110 0111 0010
   1 0010 1010 1011 1001 1101 0101 = 12AB9D5
5. Since bit 24 is ‘1’, we must re-normalize by shifting significand right 1 and incrementing exponent by one.
   1 0010 1010 1011 1001 1101 0101 (the leading 1 is bit 24)
   SRL --> 1001 0101 0101 1100 1110 1010 = 955CEA (significand)
   exp: 8D --> 8E
6. Get rid of bit 23 in significand (for IEEE standard)
   1001 0101 0101 1100 1110 1010
   --> 001 0101 0101 1100 1110 1010 = 155CEA
Result is: 0 8E 155CEA or 0x47155CEA = 38236.91_10
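The whole addition walkthrough can be sketched on raw bit patterns. The function add_single_trunc is hypothetical. It assumes two positive, normalized operands, skips the zero check of step 1, and simply discards bits shifted out during alignment, as the slides do. Real hardware keeps extra guard bits and rounds, so its result can differ in the last bit.

```python
def add_single_trunc(a, b):
    """Add two positive, normal single-precision bit patterns, truncating lost bits."""
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    sig_a = (a & 0x7FFFFF) | 0x800000      # step 2: restore the implied 1 (bit 23)
    sig_b = (b & 0x7FFFFF) | 0x800000
    if exp_a < exp_b:                      # keep the larger exponent in a
        exp_a, exp_b, sig_a, sig_b = exp_b, exp_a, sig_b, sig_a
    sig_b >>= exp_a - exp_b                # step 3: align to the higher exponent
    total = sig_a + sig_b                  # step 4: add significands
    if total & 0x1000000:                  # step 5: carry into bit 24
        total >>= 1
        exp_a += 1
    return (exp_a << 23) | (total & 0x7FFFFF)  # step 6: drop bit 23, sign = 0

result = add_single_trunc(0x45B8CD8D, 0x46FC8672)
print(f"0x{result:08X}")  # 0x47155CEA
```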
Multiplying Floating Point Numbers
34.233 E +09 * 212.32 E +03
1. Add exponents: --> 9 + 3 = 12
2. Multiply significands --> 34.233 * 212.32 = 7268.35056
3. Result is 7268.35056 E +12
4. Normalize: 7.26835056 E +15
Note: Number of digits to right of decimal point in product = sum of the number of digits to right of decimal points in factors
5. Truncate extra digits... --> 7.26835 E +15
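The multiplication steps can be sketched with the decimal module. The helper mul_sci, its pair representation, and the digits parameter are assumptions for illustration; the sketch expects positive significands of at least 1.

```python
from decimal import Decimal, ROUND_DOWN

def mul_sci(a, b, digits=5):
    """Multiply two (significand, exponent) pairs; keep `digits` digits after the point."""
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    exp = exp_a + exp_b                      # step 1: add exponents
    sig = sig_a * sig_b                      # step 2: multiply significands
    while sig >= 10:                         # step 4: normalize
        sig, exp = sig.scaleb(-1), exp + 1
    # step 5: truncate (not round) the extra digits
    sig = sig.quantize(Decimal(1).scaleb(-digits), rounding=ROUND_DOWN)
    return sig, exp

print(mul_sci((Decimal("34.233"), 9), (Decimal("212.32"), 3)))  # 7.26835 E+15
```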
Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694_10
x 0 8D 7C8672 = 32323.22_10
1. Check for zero.
2. Add exponents. (Note: both have the bias of 127 already. Only want to bias once, so subtract 127 (7F).)
   8B = 0C+7F. 8D = 0E+7F.
   Sum: (0C+7F)+(0E+7F)-7F = (1A+7F) = 99
3. Put ‘1’ back onto bit 23, multiply significands.
   38CD8D --> B8CD8D
   7C8672 --> FC8672
   Multiplying two 24-bit numbers, each with 23 bits to the right of the binary point: result has 46 bits to the right of the point
   B8CD8D * FC8672 =
10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694_10
x 0 8D 7C8672 = 32323.22_10
10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
5. Re-normalize so one place to left of binary point.
1.011 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
(Add one to exponent) --> 99 + 1 = 9A
6. Remove extra bits so only 24 bits remain (truncate)
1.011 0110 0100 1011 0110 0100
7. Remove implied one (bit 23)
011 0110 0100 1011 0110 0100
Result is: 0 9A 364B64 or 0x4D364B64 = 191149632_10 (the exact decimal product is about 191149632.1747)
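The multiplication walkthrough can be sketched on raw bit patterns as well. The function mul_single_trunc is hypothetical. It assumes two positive, normalized operands, skips the zero check, ignores exponent overflow, and truncates the extra product bits as the slides do. Hardware that rounds can differ in the last bit.

```python
def mul_single_trunc(a, b):
    """Multiply two positive, normal single-precision bit patterns, truncating."""
    # Add exponents; both carry the bias, so remove one copy of 127.
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127
    sig_a = (a & 0x7FFFFF) | 0x800000    # put the implied 1 back on bit 23
    sig_b = (b & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b              # up to 48 bits, 46 right of the point
    if product & (1 << 47):              # two digits left of the binary point
        product >>= 1                    # re-normalize
        exp += 1
    significand = product >> 23          # keep only the top 24 bits
    return (exp << 23) | (significand & 0x7FFFFF)  # remove implied one, sign = 0

result = mul_single_trunc(0x45B8CD8D, 0x46FC8672)
print(f"0x{result:08X}")  # 0x4D364B64
```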