Transcript document

Floating Point Numbers
• Floating point is used to represent “real” numbers
  • 1.23233, 0.0003002, 3323443898.3325358903
  • Real means “not imaginary”
• Computer floating-point numbers are a subset of real numbers
  • Limit on the largest/smallest number represented
    • Depends on number of bits used
  • Limit on the precision
    • 12345678901234567890 --> 12345678900000000000
• Floating point numbers are approximate, while integers are represented exactly
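A minimal sketch of the precision limit, assuming Python (whose float is an IEEE double). The helper to_single is a hypothetical name introduced here; it round-trips a value through the 32-bit format using the standard struct module.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

big = 12345678901234567890               # exact as a Python integer
as_double = int(float(big))              # kept with a 53-bit significand
as_single = int(to_single(float(big)))   # kept with a 24-bit significand

print(big)
print(as_double)   # low-order digits differ from big
print(as_single)   # even more low-order digits differ
```

Neither floating-point value equals the original integer; the formats keep only the leading significant bits.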
Seattle Pacific University
EE/CS/CPE 3760 - Computer Organization
Ch3d- 2
Scientific Notation
+ 34.383 x 10^2 = 3438.3
Parts: Sign (+), Significand (34.383), Exponent (2)
+ 3.4383 x 10^3 = 3438.3
Normalized form: Only one digit before the decimal point
Floating point notation: +3.4383000E+03 = 3438.3
8 digit significand can only represent 8 significant digits
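As a quick illustration, Python's built-in format specification can print a number in this normalized notation. The helper name normalized is an assumption made for this sketch, not something from the slides.

```python
def normalized(x: float, sig_digits: int = 8) -> str:
    """Format x with one digit before the decimal point and sig_digits significant digits."""
    # sig_digits - 1 digits go after the decimal point.
    return format(x, f"+.{sig_digits - 1}E")

print(normalized(3438.3))     # +3.4383000E+03
print(normalized(34.383e2))   # same value, same normalized form
```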
Binary Floating Point Numbers
+ 101.1101
= 1 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4
= 4 + 0 + 1 + 1/2 + 1/4 + 0 + 1/16
= 5.8125
Normalized: +1.011101 E+2 (that is, 1.011101 x 2^2), so that the binary point immediately follows the leading digit
Note: First digit is always non-zero
--> In binary, the first digit is always one.
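The positional expansion above can be sketched in a few lines of Python. The function name binary_to_decimal is hypothetical, and the sketch assumes an unsigned string of 0s and 1s with at most one binary point.

```python
def binary_to_decimal(text: str) -> float:
    """Sum bit * 2^position for every bit on both sides of the binary point."""
    whole, _, frac = text.partition(".")
    value = 0.0
    for i, bit in enumerate(reversed(whole)):   # positions 0, 1, 2, ...
        value += int(bit) * 2 ** i
    for i, bit in enumerate(frac, start=1):     # positions -1, -2, ...
        value += int(bit) * 2 ** -i
    return value

print(binary_to_decimal("101.1101"))            # 5.8125
print(binary_to_decimal("1.011101") * 2 ** 2)   # normalized form, same value
```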
Converting Decimal Fractions to Binary
Multiply by a power of 2, convert to binary, divide by the same power of 2
Example: 13.387
1. Multiply by 2^20: 13.387 x 1048576 = 14037286.912
2. If fraction remains, multiply by a larger number or truncate it
3. Convert integer portion to binary
14037286 (base 10) = 110101100011000100100110 (base 2)
4. Divide by 2^20 (shift radix point left 20)
1101.01100011000100100110 (base 2), with 20 bits after the point
This works with any power of 2! Use larger powers to get more bits.
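The four steps can be sketched as follows. This is an illustration only: decimal_to_binary is a hypothetical helper, and it assumes a non-negative input and at least one fraction bit.

```python
def decimal_to_binary(x: float, frac_bits: int = 20) -> str:
    """Scale by 2^frac_bits, truncate, convert to binary, then place the radix point."""
    scaled = int(x * 2 ** frac_bits)                    # steps 1-2: multiply, truncate
    digits = format(scaled, "b").zfill(frac_bits + 1)   # step 3: integer to binary
    # Step 4: dividing by 2^frac_bits moves the radix point left frac_bits places.
    return digits[:-frac_bits] + "." + digits[-frac_bits:]

print(decimal_to_binary(13.387, 20))   # 1101.01100011000100100110
print(decimal_to_binary(5.75, 2))      # 101.11
```

A larger frac_bits gives more bits after the point, as the slide notes.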
IEEE Floating Point Format
Bit 31: Sign (1 bit). 0: Positive, 1: Negative
Bits 30-23: Exponent (8 bits). Biased by 127.
Bits 22-0: Significand (23 bits). Leading ‘1’ is implied, but not represented
Number = (-1)^S x (1 + Sig) x 2^(E-127)
• Allows representation of normalized numbers in the range of about 2^-126 to 2^+128 (roughly 10^±38)
• Since the significand always starts with ‘1’, we don’t have to
represent it explicitly
• Significand is effectively 24 bits
• Zero is represented by Sign=Significand=Exp=0
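A minimal sketch of pulling the three fields out of a 32-bit word and applying the formula. The name decode_single is an assumption for illustration. Like the slide, it treats only all-zero fields as a special case and ignores the other special encodings (denormals, infinities, NaN).

```python
def decode_single(word: int):
    """Split a 32-bit pattern into sign, exponent, significand and compute its value."""
    sign = (word >> 31) & 0x1        # bit 31
    exponent = (word >> 23) & 0xFF   # bits 30-23, still biased by 127
    sig = word & 0x7FFFFF            # bits 22-0, without the implied leading 1
    if exponent == 0 and sig == 0:
        return sign, exponent, sig, 0.0
    value = (-1) ** sign * (1 + sig / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent, sig, value

print(decode_single(0x3F800000))   # (0, 127, 0, 1.0)
print(decode_single(0xC0000000))   # (1, 128, 0, -2.0)
```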
IEEE Double Precision Format
Bit 63: Sign (1 bit)
Bits 62-52: Exponent (11 bits). Bias: 1023
Bits 51-32: Significand, upper 20 bits
Bits 31-0: Significand, lower 32 bits
Number = (-1)^S x (1 + Sig) x 2^(E-1023)
• Allows representation of normalized numbers in the range of about 2^-1022 to 2^+1024 (roughly 10^±308)
• Larger significand means more precision
• Takes two 32-bit registers to hold one number
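To see the double-precision layout, the sketch below unpacks a Python float (already an IEEE double) into its fields and into the two 32-bit halves that would occupy two registers. The name decode_double is hypothetical.

```python
import struct

def decode_double(x: float):
    """Return sign, biased exponent, 52-bit significand, and the two 32-bit words."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    sig = bits & ((1 << 52) - 1)          # 52 stored significand bits
    high_word = bits >> 32                # sign, exponent, upper 20 significand bits
    low_word = bits & 0xFFFFFFFF          # lower 32 significand bits
    return sign, exponent, sig, high_word, low_word

sign, exponent, sig, high, low = decode_double(5.75)
print(sign, exponent - 1023, hex(sig), hex(high), hex(low))
```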
Conversion
Convert 5.75 to Single-Precision IEEE Floating Point
1. Convert 5.75 (base 10) to Binary ---> 101.11 (base 2)
2. Normalize ---> 1.0111 x 2^2 (Significand: 1.0111, Exponent: 2)
3. Sign = 0 (positive).
4. Add 127 (bias) to exponent. Exponent = 129 (base 10) = 10000001 (base 2)
5. Express significand as 24 bits
Sig = 1.01110000000000000000000
6. Remove leading one from significand, leaving 23 bits
Sig = .01110000000000000000000
7. Put in proper bit fields
Number = 0 10000001 01110000000000000000000 = 0x40B80000
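The conversion steps can be sketched in Python. The function encode_single is a hypothetical helper: it truncates extra significand bits rather than rounding, and it assumes the input is zero or falls in the normalized single-precision range.

```python
import math

def encode_single(x: float) -> int:
    """Build the 32-bit pattern for x by following the steps above (truncating)."""
    if x == 0:
        return 0
    sign = 1 if x < 0 else 0
    mantissa, exp = math.frexp(abs(x))    # abs(x) = mantissa * 2^exp, 0.5 <= mantissa < 1
    significand = mantissa * 2            # normalize to the form 1.xxx
    biased = (exp - 1) + 127              # add the bias to the true exponent
    fraction = int((significand - 1) * 2 ** 23)   # drop the leading 1, keep 23 bits
    return (sign << 31) | (biased << 23) | fraction

print(hex(encode_single(5.75)))   # 0x40b80000
```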
Adding Floating Point Numbers
1.2232E+3 + 4.211E+5
1. Normalize to higher exponent
a. Find the difference between exponents (= 2)
b. Shift smaller number right by that amount
1.2232E+3 == 0.012232E+5
2. Now that exponents are the same, add significands together
  4.211    E+5
+ 0.012232 E+5
= 4.223232 E+5
Note: If carry out of MSD, re-normalize
  5.0 E+2
+ 7.0 E+2
= 12.0 E+2 = 1.2 E+3
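A small sketch of the same align-then-add procedure on decimal (significand, exponent) pairs, using Python's decimal module so the digits stay exact. The name add_sci is an assumption for illustration; the sketch does not handle sums that fall below 1.

```python
from decimal import Decimal

def add_sci(sig_a: Decimal, exp_a: int, sig_b: Decimal, exp_b: int):
    """Add two numbers given as significand and power-of-ten exponent."""
    if exp_a < exp_b:                       # make a the operand with the higher exponent
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = sig_b.scaleb(exp_b - exp_a)     # shift the smaller number right
    total, exp = sig_a + sig_b, exp_a
    while abs(total) >= 10:                 # carry out of the MSD: re-normalize
        total = total.scaleb(-1)
        exp += 1
    return total, exp

print(add_sci(Decimal("1.2232"), 3, Decimal("4.211"), 5))   # significand 4.223232, exponent 5
print(add_sci(Decimal("5.0"), 2, Decimal("7.0"), 2))        # significand 1.20, exponent 3
```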
Adding IEEE Floating Point Numbers
Fields: S (sign), E (exponent), Sig. (significand)
  0x45B8CD8D --> 0 8B 38CD8D = 5913.694 (base 10)
+ 0x46FC8672 --> 0 8D 7C8672 = 32323.22 (base 10)
1. Check for Sign=Exp=Significand=0 --> If so, treat as a special case
2. Put the ‘1’ back in bit 23 of significands
38CD8D = 011 1000 1100 1101 1000 1101
---> 1011 1000 1100 1101 1000 1101 = B8CD8D
7C8672 = 111 1100 1000 0110 0111 0010
---> 1111 1100 1000 0110 0111 0010 = FC8672
0 8B B8CD8D
+ 0 8D FC8672
Adding IEEE Floating Point Numbers
0 8B B8CD8D
+ 0 8D FC8672
3. Normalize to higher exponent:
a. Find difference in exponents: 8D - 8B = 2
b. Shift significand of number with smaller exponent right by the difference
B8CD8D = 1011 1000 1100 1101 1000 1101
right shift by 2 --> 0010 1110 0011 0011 0110 0011 = 2E3363
c. Set lower-valued exponent to higher one
0 8D 2E3363 (re-normalized form of 0 8B B8CD8D)
+ 0 8D FC8672
Adding IEEE Floating Point Numbers
0 8D 2E3363
+ 0 8D FC8672
4. Add significands: (note: carry produced one too many bits)
  0010 1110 0011 0011 0110 0011
+ 1111 1100 1000 0110 0111 0010
= 1 0010 1010 1011 1001 1101 0101 = 12AB9D5 (the leading 1 is bit 24)
5. Since bit 24 is ‘1’, we must re-normalize by shifting the significand right by 1 and incrementing the exponent by one.
1 0010 1010 1011 1001 1101 0101
SRL --> 1001 0101 0101 1100 1110 1010 = 955CEA (significand)
exp: 8D --> 8E
6. Get rid of bit 23 in significand (for IEEE standard)
1001 0101 0101 1100 1110 1010
--> 001 0101 0101 1100 1110 1010 = 155CEA
Result is: 0 8E 155CEA or 0x47155CEA = 38236.91 (base 10)
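The six addition steps can be sketched on raw bit patterns. The function add_single is a hypothetical helper, not a full IEEE adder: it assumes both operands are positive and normalized, and it truncates the bits shifted out instead of rounding.

```python
def add_single(a: int, b: int) -> int:
    """Add two positive single-precision bit patterns following steps 1-6."""
    if a == 0:                                  # step 1: zero is a special case
        return b
    if b == 0:
        return a
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    sig_a = (a & 0x7FFFFF) | 0x800000           # step 2: put the 1 back in bit 23
    sig_b = (b & 0x7FFFFF) | 0x800000
    if exp_a < exp_b:                           # step 3: align to the higher exponent
        exp_a, exp_b, sig_a, sig_b = exp_b, exp_a, sig_b, sig_a
    sig_b >>= exp_a - exp_b
    total, exp = sig_a + sig_b, exp_a           # step 4: add significands
    if total & 0x1000000:                       # step 5: bit 24 set, re-normalize
        total >>= 1
        exp += 1
    return (exp << 23) | (total & 0x7FFFFF)     # step 6: drop bit 23 and pack

print(hex(add_single(0x45B8CD8D, 0x46FC8672)))  # 0x47155cea
```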
Multiplying Floating Point Numbers
34.233 E +09 * 212.32 E +03
1. Add exponents: --> 9 + 3 = 12
2. Multiply significands --> 34.233 * 212.32 = 7268.35056
3. Result is 7268.35056 E +12
4. Normalize: 7.26835056 E +15
Note: Number of digits to right of decimal point in product = sum of the number of digits to right of decimal points in factors
5. Truncate extra digits... --> 7.26835 E +15
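A sketch of the same multiply procedure on decimal (significand, exponent) pairs, using Python's decimal module. The name mul_sci and the default of 6 kept digits are assumptions chosen to match the example; products below 1 are not handled.

```python
from decimal import Decimal, ROUND_DOWN

def mul_sci(sig_a: Decimal, exp_a: int, sig_b: Decimal, exp_b: int, digits: int = 6):
    """Multiply two numbers given as significand and power-of-ten exponent."""
    exp = exp_a + exp_b                      # step 1: add exponents
    product = sig_a * sig_b                  # step 2: multiply significands
    while abs(product) >= 10:                # step 4: normalize
        product = product.scaleb(-1)
        exp += 1
    step = Decimal(1).scaleb(-(digits - 1))  # 0.00001 when digits is 6
    return product.quantize(step, rounding=ROUND_DOWN), exp   # step 5: truncate

print(mul_sci(Decimal("34.233"), 9, Decimal("212.32"), 3))   # significand 7.26835, exponent 15
```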
Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694 (base 10)
x 0 8D 7C8672 = 32323.22 (base 10)
1. Check for zero.
2. Add exponents. (Note: both already include the bias of 127. We only want to bias once, so subtract 127 (7F).)
8B = 0C+7F. 8D = 0E+7F.
Sum: (0C+7F)+(0E+7F)-7F = (1A+7F) = 99
3. Put ‘1’ back onto bit 23, multiply significands.
38CD8D --> B8CD8D
7C8672 --> FC8672
Multiplying two 24-bit numbers, each with 23 bits to the right of the binary point: the result has 46 bits to the right of the point
B8CD8D * FC8672 =
10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
Multiplying IEEE Floating Point Numbers
  0 8B 38CD8D = 5913.694 (base 10)
x 0 8D 7C8672 = 32323.22 (base 10)
10.11 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
5. Re-normalize so one place to left of binary point.
1.011 0110 0100 1011 0110 0100 1010 1111 0101 0110 1100 1010
(Add one to exponent) --> 99 + 1 = 9A
6. Remove extra bits so only 24 bits remain (truncate)
1.011 0110 0100 1011 0110 0100
7. Remove implied one (bit 23)
011 0110 0100 1011 0110 0100
Result is: 0 9A 364B64, approximately 191149632.1747 (base 10)
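The multiplication steps can be sketched on raw bit patterns. The function mul_single is a hypothetical helper, not a full IEEE multiplier: it assumes normalized operands, truncates instead of rounding, and does not check for exponent overflow. The sign handling (XOR of the sign bits) is an addition not shown on the slides.

```python
def mul_single(a: int, b: int) -> int:
    """Multiply two single-precision bit patterns following the steps above."""
    if a == 0 or b == 0:                         # check for zero
        return 0
    sign = ((a >> 31) ^ (b >> 31)) & 1           # assumption: signs combine by XOR
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127   # keep the bias only once
    sig_a = (a & 0x7FFFFF) | 0x800000            # put the 1 back in bit 23
    sig_b = (b & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b                      # 48 bits, 46 right of the point
    if product & (1 << 47):                      # form 1x.xxx: re-normalize
        sig24 = product >> 24                    # keep the top 24 bits
        exp += 1
    else:                                        # form 01.xxx: already normalized
        sig24 = product >> 23
    return (sign << 31) | (exp << 23) | (sig24 & 0x7FFFFF)   # drop the implied 1

print(hex(mul_single(0x45B8CD8D, 0x46FC8672)))   # 0x4d364b64
```

The printed word packs the fields 0 9A 364B64 from the worked example.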