07FloatingPointx
Transcript 07FloatingPointx
Updated: 06/03/2010
FLOATING POINT REPRESENTATIONS
1998 Morgan Kaufmann Publishers
DECIMAL FLOATING POINT NUMBERS
1.5
102.450
Decimal fractions
¾ -> 0.75
1/100 -> 0.01
SCIENTIFIC NOTATION
How can we more compactly represent these values?
1,000,000,000
-> 1.0 x 10^9
0.000025
-> 2.5 x 10^-5
What are the two parts of a scientific notation value called?
Mantissa and Exponent
Normalize mantissa so it has one digit left of the decimal point
FLOATING POINT NUMBERS
Floating point numbers such as 7.519, -0.01, and 4.3 x 10^8
are represented using the IEEE 754 standard format
Floating point is represented using a mantissa and exponent
Example: 7.51 x 2^5
The mantissa is 7.51
The exponent is 5
---> Note: exponent is a power of 2
A set number of bits is assigned to represent the mantissa and exponent
32 bit single precision: sign bit (1 bit) | exponent (8 bits) | mantissa (23 bits)
64 bit double precision: sign bit (1 bit) | exponent (11 bits) | mantissa (52 bits)
ROUNDING
Not every floating point value can be represented exactly in
binary using a finite number of bits
Question: What are some examples?
1/3 = 0.3333…
PI = 3.141….
In these cases, we must round to the nearest number that can
be represented
If a number is halfway between two possible representable
values, then round to the one whose least-significant digit is
even
EXAMPLES OF ROUNDING
Round each of these numbers to two significant digits.
1.345 --> 1.3
Choose 1.3 since 1.345 is nearer to 1.3 than 1.4
78.953 --> 79
Choose 79 since it’s nearer than 78
12.5 --> 12
12.5 is halfway between 12 and 13
Choose 12 since its least significant digit is even
13.5 --> 14
13.5 is halfway between 13 and 14
Choose 14 since its least significant digit is even
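The round-half-to-even rule can be tried out with a minimal sketch like the one below. It uses Python's decimal module so the inputs are exact decimal values; the helper name round_sig is made up for this illustration and is not part of the original slides.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(text, digits):
    """Round a decimal string to `digits` significant digits, ties to even."""
    value = Decimal(text)
    if value == 0:
        return value
    # adjusted() is the power of ten of the most significant digit.
    quantum = Decimal(1).scaleb(value.adjusted() - digits + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)

# The four slide examples, rounded to two significant digits.
for text in ("1.345", "78.953", "12.5", "13.5"):
    print(text, "-->", round_sig(text, 2))
```

Only the two halfway cases (12.5 and 13.5) depend on the tie rule; the other two are ordinary round-to-nearest.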
FRACTIONAL BINARY NUMBERS
Fractional binary numbers use the familiar decimal place-value
representation, but with a base of 2 instead of 10
Example:
11.101b = 1x2^1 + 1x2^0 + 1x2^-1 + 0x2^-2 + 1x2^-3
= 2 + 1 + 1/2 + 0/4 + 1/8
= 3 + 0.5 + 0.0 + 0.125
= 3.625
EXERCISES
Convert this binary fraction into decimal
101.011
Answer: 5 + 0*1/2 + 1*1/4 + 1*1/8 = 5.375
Express the decimal value 6.5 as a binary fraction
Answer: 110.1
Express the decimal value 11.75 as a binary fraction
Answer: 1011.11
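The exercise answers can be checked with a short sketch. The two helper names are hypothetical, and the code assumes non-negative inputs and at most 23 fraction bits.

```python
def bin_frac_to_float(text):
    """Convert a binary fraction string such as '101.011' to a float."""
    whole, _, frac = text.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    for position, bit in enumerate(frac, start=1):
        value += int(bit) * 2.0 ** -position   # place values 1/2, 1/4, 1/8, ...
    return value

def float_to_bin_frac(value, max_bits=23):
    """Convert a non-negative float to a binary fraction string."""
    whole = int(value)
    frac = value - whole
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2              # shift the next fraction bit into the ones place
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return bin(whole)[2:] + ("." + "".join(bits) if bits else "")

print(bin_frac_to_float("101.011"))
print(float_to_bin_frac(6.5), float_to_bin_frac(11.75))
```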
NORMALIZED MANTISSA FOR SCIENTIFIC NOTATION
Scientific notation numbers express the mantissa with one
digit to the left of the decimal point.
Given your original number
Shift the decimal point left or right until one non-zero digit is to
the left of the decimal point
For each shift left increase the power of ten exponent by 1
For each shift right decrease power of ten exponent by 1
Examples:
102.5 x 10^4 = 1.025 x 10^6
7589 x 10^5 = 7.589 x 10^8
0.0045 x 10^0 = 4.5 x 10^-3
NORMALIZED BINARY MANTISSA
Binary Fraction    Normalized IEEE 754 Mantissa
11.110             1.1110 (shift binary point left 1 place)
1.01               1.01 (no shift)
101.01             1.0101 (shift binary point left 2)
0.001011           1.011 (shift binary point right 3)
Question: What do all of the normalized binary mantissas
have in common?
FRACTIONAL REPRESENTATION OF MANTISSA
What do all of the normalized binary mantissas have
in common?
The one bit to the left of the binary point is always a 1
So if we use 23 bits for the single-precision
mantissa, we can “save” a bit by not storing this
leading 1
So simply discard the lead 1 after normalizing the
binary mantissa
MANTISSA EXAMPLE
What is the binary representation of the mantissa in IEEE 754 for
6.25?
Solution:
6.25 = 1x2^2 + 1x2^1 + 0x2^0 + 0x2^-1 + 1x2^-2
= 110.01b
Shift the binary point as far to the left as possible until the bit
to the left of the binary point is 1
110.01b --> 1.1001b (Shift left by 2 places)
After this shift, the integer part of the mantissa is the assumed 1 bit,
which is not stored...this effectively gains one additional bit of precision
The mantissa field encodes only the bits to the right of the binary point
1001b
MANTISSA EXAMPLE CONTINUED...
What is the binary representation of the mantissa in IEEE 754 for
6.25?
Solution...:
Keeping only the bits to the right of the binary point...
1001b
Pad the 4 bits out to 23 bits for single precision
Append the extra 0 bits to the right, since this is a binary fraction
1001 0000 0000 0000 0000 000
(The imaginary binary point sits just to the left of the first bit)
UPDATING THE EXPONENT
6.25 = 110.01b has an implied exponent of 2^0
Following the IEEE 754 convention, shifting the binary point to the left
(in this case by 2 positions) has the effect of updating the exponent
110.01b --> 1.1001b (Following shift left of binary point by 2 positions)
For each left shift of the binary point, add 1 to the binary exponent
6.25 = 1.1001b x 2^2
UPDATING THE BINARY EXPONENT
Binary Fraction      Normalized Binary Exponents
1.01 x 2^0           1.01 x 2^0
11.110 x 2^0         1.1110 x 2^1
1101.01 x 2^0        1.10101 x 2^3
0.001011 x 2^0       1.011 x 2^-3
0.0000101 x 2^0      1.01 x 2^-5
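The rows of this table can be reproduced with a small sketch built on math.frexp from the Python standard library. frexp returns a mantissa in the range [0.5, 1), so the hypothetical helper below rescales it to the one-digit-left-of-the-point form used on the slides; it assumes a positive input.

```python
import math

def normalize(value):
    """Return (mantissa, exponent) with 1 <= mantissa < 2, for value > 0."""
    m, e = math.frexp(value)   # value == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1        # move one factor of 2 from exponent to mantissa

# Decimal values of the table rows:
# 1.01b, 11.110b, 1101.01b, 0.001011b, 0.0000101b
for value in (1.25, 3.75, 13.25, 0.171875, 0.0390625):
    mantissa, exponent = normalize(value)
    print(value, "=", mantissa, "x 2^", exponent)
```

The mantissa is printed in decimal, so 1.1110b appears as 1.875 and 1.10101b as 1.65625.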
REPRESENTING THE EXPONENT IN IEEE 754
The exponent is represented as a biased integer
For single precision add 127 to the value of the normalized
exponent (the power of 2, written as a base ten integer)
For double precision add 1023 to the value of the normalized
exponent
Example:
How would the values -45 and 123 be represented in the 8-bit
biased format for single precision?
Answer:
-45 + 127 = 82 = 01010010b
123 + 127 = 250 = 11111010b
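A minimal sketch of the biasing step is shown below. The function name and the range check are assumptions for illustration; the check rejects the all-0s and all-1s exponent fields, which are used for special values rather than normalized numbers.

```python
def biased_exponent(exponent, bias=127, width=8):
    """Return the biased exponent as a bit string of `width` bits."""
    encoded = exponent + bias
    # All-0s and all-1s fields are reserved, so exclude them here.
    if not 0 < encoded < (1 << width) - 1:
        raise ValueError("exponent outside the normalized range")
    return format(encoded, "0{}b".format(width))

print(biased_exponent(-45))   # single precision, bias 127
print(biased_exponent(123))
print(biased_exponent(0, bias=1023, width=11))   # double precision
```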
ENCODING THE BIASED BINARY EXPONENT
Binary Fraction      Normalized Exponents   Biased Exponent
1.01 x 2^0           1.01 x 2^0             0 + 127 = 127
11.110 x 2^0         1.1110 x 2^1           1 + 127 = 128
1101.01 x 2^0        1.10101 x 2^3          3 + 127 = 130
0.001011 x 2^0       1.011 x 2^-3           -3 + 127 = 124
0.0000101 x 2^0      1.01 x 2^-5            -5 + 127 = 122
Encode each biased exponent as an unsigned 8-bit number.
Encode each normalized (unbiased) exponent in 8-bit two’s complement.
Suppose you had to rapidly sort by exponents, which format
would be more efficient?
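One way to explore the sorting question is to compare the raw bit patterns as unsigned integers, which is what a simple integer comparator would do. The sketch below is only an illustration under that assumption; the variable names are made up.

```python
exponents = [0, 1, 3, -3, -5]             # exponents from the table above

biased = {e: e + 127 for e in exponents}  # unsigned biased encoding
twos = {e: e & 0xFF for e in exponents}   # 8-bit two's complement bit pattern

# Sort by comparing the stored bit patterns as plain unsigned integers.
by_biased = sorted(exponents, key=biased.get)
by_twos = sorted(exponents, key=twos.get)

print(by_biased)   # numeric order is preserved
print(by_twos)     # negative exponents sort after the positive ones
```

With the biased encoding, an unsigned comparison of the bit patterns already gives the numeric order; with two's complement, the comparison would have to treat the sign bit specially.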
FLOATING POINT EXAMPLE #1
Recall that 6.25 = 1.1001b x 2^2
Encode 6.25 as a 32-bit single precision binary number
Sign bit = 0
Mantissa = 1.1001 (encoding omits assumed lead 1)
Exponent = 2 + 127 = 129 = 10000001
Encode using 32-bit single precision binary format
0 10000001 10010000000000000000000
(Sign bit | Exponent | Mantissa)
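The hand encoding can be cross-checked against what the machine actually stores. The sketch below uses Python's struct module to pack a value as a big-endian IEEE 754 single and then splits the 32 bits into the three fields; the helper name float_to_fields is hypothetical.

```python
import struct

def float_to_fields(value):
    """Return (sign, exponent, mantissa) bit strings of a 32-bit single."""
    # Pack as a big-endian single, then reinterpret the 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = format(bits, "032b")
    return text[0], text[1:9], text[9:]

print(" ".join(float_to_fields(6.25)))
print(" ".join(float_to_fields(-6.25)))
```

Negating the value changes only the sign bit, which is what a signed-magnitude representation implies.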
FLOATING POINT EXAMPLE #2
What is the value of the single-precision floating-point number
represented by the following 32-bit binary encoding?
0 10000000 110 0000 0000 0000 0000 0000
Sign bit = 0
Encoded Exponent = 10000000 = 128
Encoded Mantissa = 110 0000 0000 0000 0000 0000
Subtract the added bias of 127 to reveal an exponent = 1
Mantissa = .110 0000 0000 0000 0000 0000
Mantissa = 1.11 (Replace the assumed 1 before the binary point)
Mantissa = 1.11 = 1x2^0 + 1x2^-1 + 1x2^-2 = 1.75
Value = 1.75 x 2^1 = 3.5
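The decoding steps can be sketched as a small function. This is an illustration that handles normalized numbers only (it ignores zero, NaN, and infinity), and the name decode_single is made up.

```python
def decode_single(sign, exponent, fraction):
    """Decode the three bit-string fields of a normalized 32-bit single."""
    actual_exponent = int(exponent, 2) - 127     # subtract the bias
    mantissa = 1 + int(fraction, 2) / 2 ** 23    # restore the assumed lead 1
    return (-1) ** int(sign) * mantissa * 2.0 ** actual_exponent

print(decode_single("0", "10000000", "110" + "0" * 20))
# The same steps applied to the pattern 1 01111010 100 0000 ... 0000
print(decode_single("1", "01111010", "1" + "0" * 22))
```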
FLOATING POINT EXAMPLE #3
-6.25 = -1.1001b x 2^2
Encode -6.25 as a 32-bit single precision binary number
Sign bit = 1 (Use signed magnitude for mantissa)
Mantissa = 1.1001 (encoding omits assumed lead 1)
Exponent = 2 + 127 = 129 = 10000001
Encode using 32-bit single precision binary format
1 10000001 10010000000000000000000
(Sign bit | Exponent | Mantissa)
EXERCISE
Exercise 2.18 (a) on page 42 of Computer
Architecture by N. Carter
What value is represented by this IEEE single
precision value?
1 01111010 100 0000 0000 0000 0000 0000
EXERCISE: SOLUTION
What value is represented by this IEEE single precision value?
1 01111010 100 0000 0000 0000 0000 0000
Sign bit = 1
Encoded Exponent = 01111010 = 122
Encoded Mantissa = 100 0000 0000 0000 0000 0000
Subtract added bias of 127 from encoded exponent
Actual exponent is -5
Mantissa = .100 0000 0000 0000 0000 0000 = .1
Mantissa = 1.1 (Add back the assumed 1 before the binary point)
Mantissa = -(1x2^0 + 1x2^-1) = -1.5
Value = -1.5 x 2^-5 = -1.5 x (1/32) = -0.046875
IEEE 754 SINGLE PRECISION RANGE
Smallest positive normalized number
1.00000000000000000000000 x 2^-126
Largest normalized number
1.11111111111111111111111 x 2^127
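These two limits can be evaluated numerically. The sketch below computes them from the formulas and compares them with the values obtained by decoding the corresponding bit patterns (hex 00800000 and 7f7fffff); the helper from_bits is a made-up name.

```python
import struct

# 1.0 x 2^-126 and 1.111...1b (23 ones) x 2^127, computed in double precision.
smallest_normal = 1.0 * 2.0 ** -126
largest_normal = (2.0 - 2.0 ** -23) * 2.0 ** 127

def from_bits(hex_text):
    """Decode 8 hex digits as a big-endian IEEE 754 single."""
    return struct.unpack(">f", bytes.fromhex(hex_text))[0]

print(smallest_normal, from_bits("00800000"))
print(largest_normal, from_bits("7f7fffff"))
```

The values come out at roughly 1.18 x 10^-38 and 3.40 x 10^38.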
REPRESENTING 1.0
REPRESENTING 0.0
The assumed 1 bit in the mantissa gains an
extra bit of precision
But zero cannot be represented as a normalized number, since a
mantissa field of 0 is interpreted as 1.0
The IEEE 754 standard specifies that zero is
represented using an exponent field of all 0s with a
mantissa field of all 0s.
NAN
NaN = Not a Number
Special value used to represent the result of an invalid operation,
such as 0/0, infinity - infinity, or the square root of a negative number
(Overflow and dividing a non-zero value by zero produce infinity rather than NaN)
NaN is represented by all 1’s in the exponent field and a
non-zero mantissa field
Arithmetic operations with a NaN operand result in NaN
Example: NaN + 4.5 = NaN
INFINITY
IEEE 754 represents infinity using all 1’s in the
exponent and a fraction field of 0.
The sign bit designates positive or negative
infinity
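The bit patterns of the special values (zero, NaN, infinity) can be inspected with a short sketch using the struct and math modules. The helper name fields is hypothetical; the exact mantissa bits of a NaN vary by platform, so the code only checks that they are non-zero.

```python
import math
import struct

def fields(value):
    """Return (sign, exponent, mantissa) bit strings of a 32-bit single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    text = format(bits, "032b")
    return text[0], text[1:9], text[9:]

for name, value in (("zero", 0.0), ("+infinity", math.inf),
                    ("-infinity", -math.inf), ("NaN", math.nan)):
    print(name, " ".join(fields(value)))

print(math.isnan(math.nan + 4.5))   # arithmetic with NaN gives NaN
```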
FLOATING POINT ADDITION (DECIMAL EXAMPLE)
Example: 9.999 x 10^1 + 1.610 x 10^-1
Step 1: Shift decimal point of smaller number to the left until its
updated exponent matches the exponent of the larger number
1.610 x 10^-1 --> 0.01610 x 10^1
Step 2: Add the mantissas (Assume only 4 significant digits)
  9.999 x 10^1
+ 0.016 x 10^1
 10.015 x 10^1
Step 3: Re-normalize to get one non-zero digit left of decimal point
10.015 x 10^1 --> 1.0015 x 10^2
Step 4: Round the mantissa to 4 significant digits
1.0015 x 10^2 --> 1.002 x 10^2
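The same sum can be tried with Python's decimal module, using a context limited to 4 significant digits. Note one difference from the hand method, stated here as a caveat: the decimal module computes the exact sum and rounds it once, rather than cutting off digits of the smaller operand before adding. For this example both approaches give the same result.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# A context that keeps 4 significant digits and rounds ties to even.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

total = ctx.add(Decimal("9.999E1"), Decimal("1.610E-1"))
print(total)   # the sum rounded to 4 significant digits
```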
FLOATING POINT ADDITION EXAMPLE
Use single-precision floating point to compute 0.25 + 1.5
0.25 (base 10) = (1/4) = 0.01 = 1.0 x 2^-2
1.5 (base 10) = 1 + (1/2) = 1.1 x 2^0
Shift binary point of smaller number to the left so
exponents match
1.0 x 2^-2 --> 0.01 x 2^0
Next, add the mantissas, both with exponent of 0
  0.01 x 2^0
+ 1.10 x 2^0
  1.11 x 2^0
Encode result using 32-bit single precision
Sign bit = 0
Mantissa = 11000000000000000000000 (23 bits)
Exponent = 0 + 127 = 127 = 01111111
The 32-bit single precision encoding is...
0 01111111 11000000000000000000000
FLOATING POINT ADDITION EXERCISE: SOLUTION
2.20 (b) Use single precision to compute 147.5 + 0.25
147.5 (base 10) = 128 + 16 + 2 + 1 + (1/2) = 10010011.1
Convert to normalized mantissa format
10010011.1 x 2^0 --> 1.00100111 x 2^7
Shifted binary point 7 places to the left
See Computer Architecture by N. Carter, page 43
0.25 (base 10) = (1/4) = 0.01
Convert to normalized mantissa format
0.01 x 2^0 --> 1.0 x 2^-2
Shift binary point 2 places to the right
  1.00100111 x 2^7
+ 1.0 x 2^-2
Shift binary point of smaller number to left to match exponent
(7) of the larger number
1.0 x 2^-2 --> 0.000000001 x 2^7
Shift binary point 9 places to the left to go from exponent of -2 to 7
Add the mantissas, both expressed with exponent 7
  1.001001110 x 2^7
+ 0.000000001 x 2^7
  1.001001111 x 2^7
Encode the result 1.001001111 x 2^7 in single precision
Sign bit = 0 since result is positive
Mantissa = 00100111100000000000000 (23 bits)
Exponent = 7 + 127 = 134 = 10000110
The 32-bit single precision encoding is...
0 10000110 00100111100000000000000
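The align, add, and encode steps can be sketched with integer arithmetic. This is a simplified illustration, not a full IEEE 754 adder: it assumes positive normalized operands, cuts off bits shifted out during alignment instead of rounding, and uses made-up function names.

```python
import math

FRAC_BITS = 23
BIAS = 127

def to_parts(value):
    """Split a positive float into (significand, exponent).
    The significand is an integer scaled by 2**23, so 1.0 is 1 << 23."""
    m, e = math.frexp(value)                 # 0.5 <= m < 1
    return int(m * 2 * (1 << FRAC_BITS)), e - 1

def fp_add(a, b):
    """Add two positive values given as (significand, exponent) pairs."""
    (a_sig, a_exp), (b_sig, b_exp) = sorted((a, b), key=lambda p: -p[1])
    b_sig >>= a_exp - b_exp                  # align smaller to larger exponent
    total, exponent = a_sig + b_sig, a_exp
    while total >= (2 << FRAC_BITS):         # renormalize if the sum reached 2.0
        total >>= 1
        exponent += 1
    return total, exponent

def encode(significand, exponent):
    """Format a positive result as 'sign exponent mantissa' bit fields."""
    fraction = significand - (1 << FRAC_BITS)   # drop the assumed lead 1
    return "0 {:08b} {:023b}".format(exponent + BIAS, fraction)

print(encode(*fp_add(to_parts(0.25), to_parts(1.5))))
print(encode(*fp_add(to_parts(147.5), to_parts(0.25))))
```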
ADDITION WITH NEGATIVE VALUES
If a value is negative, you must first convert the negative value into
two’s complement
Example:
-0.111
Convert to two’s complement by...
1.000 inverting all bits
+ 0.001 adding 1
1.001
Use the two’s complement version of the value when adding the
mantissas. Discard the carry overflow bit.
ADDITION WITH NEGATIVE VALUE(S)
1.000 x 2^0 (1.0 in base ten)
-1.000 x 2^-1 (-0.5 in base ten)
Move binary point of the smaller number so exponents match
1.000 x 2^0 (1.0 in base ten)
-0.100 x 2^0 (-0.5 in base ten)
Convert mantissa of -0.5 into two’s complement then add
  1.000 x 2^0 (1.0 in base ten)
+ 1.100 x 2^0 (-0.5 in base ten, in two’s complement)
 10.100 x 2^0 Two’s complement addition discards carry overflow bit
The sum is 0.100 x 2^0
Normalize to get sum of 1.000 x 2^-1 (0.5 base ten)
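The two's complement step can be sketched with 4-bit integers, treating a pattern such as 1.000 as the integer 0b1000 (one integer bit followed by three fraction bits). The width and the function names are assumptions chosen to match the slide examples.

```python
WIDTH = 4                       # 1 integer bit + 3 fraction bits, as in 1.000
MASK = (1 << WIDTH) - 1

def twos_complement(bits):
    """Invert all bits and add 1, keeping only WIDTH bits."""
    return (~bits + 1) & MASK

def add_discarding_carry(a, b):
    """Add two WIDTH-bit patterns and drop the carry out of the top bit."""
    return (a + b) & MASK

minus_half = twos_complement(0b0100)           # 0.100 is 0.5
print(format(minus_half, "04b"))
print(format(add_discarding_carry(0b1000, minus_half), "04b"))   # 1.0 + (-0.5)
print(format(twos_complement(0b0111), "04b"))  # the -0.111 example
```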
FLOATING POINT ADDITION (PAGE 282 OF H&P)
Example: Compute 0.5 + -0.4375 (base 10) using binary
arithmetic.
0.5 (base 10) = 0.1 x 2^0
Normalize to get 1 to left of the binary point
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
-0.4375 = -0.0111 = -((1/4) + (1/8) + (1/16))
Normalize to get 1 to the left of the binary point
-0.0111 --> -1.11 x 2^-2
FLOATING POINT ADDITION (PAGE 282 OF H&P)
Compute
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
+
-0.4375 = -1.11 x 2^-2
Step 1: Shift binary point of smaller number to the left until its
updated exponent matches the exponent of the larger number
-1.11 x 2^-2 --> -0.111 x 2^-1
Step 2: Add the mantissas
* Convert negative value to two’s complement then add
* Discard carry overflow bit
  1.000 (1.0 decimal)
- 0.111 (-0.875 decimal)
  0.001 (0.125 decimal)
Result: 0.001 x 2^-1
Step 3: Normalize to get 1 to left of binary point
0.001 x 2^-1 --> 1.0 x 2^-4
Exponent of -4 lies between 127 and -126 (range of single
precision exponents)...therefore no overflow or underflow
Express exponent in biased notation by adding 127
Encoded exponent = -4 + 127 = 123
Step 4: Round to 23 binary digits of mantissa precision
1.0 x 2^-4 (no rounding needed)
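Step 3 can be sketched with a small fixed-point mantissa, where 0.001 is stored as the integer 0b0001 (three fraction bits). The function name and the three-bit width are assumptions matching the worked example, not a hardware description.

```python
def normalize_fixed(mantissa, exponent, frac_bits=3):
    """Shift a positive fixed-point mantissa until its integer bit is 1."""
    if mantissa <= 0:
        raise ValueError("only positive mantissas are handled in this sketch")
    one = 1 << frac_bits
    while mantissa < one:          # leading zeros: shift left, lower exponent
        mantissa <<= 1
        exponent -= 1
    while mantissa >= 2 * one:     # too large: shift right, raise exponent
        mantissa >>= 1
        exponent += 1
    return mantissa, exponent

mantissa, exponent = normalize_fixed(0b0001, -1)   # 0.001 x 2^-1
assert -126 <= exponent <= 127                     # no overflow or underflow
print(format(mantissa, "04b"), exponent, exponent + 127)
```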
FLOATING POINT MULTIPLICATION
Multiply the mantissas and add the exponents
Result = (mantissa1 x mantissa2) x 2^(exp1 + exp2)
Example (in decimal)
(5 x 10^3) x (2 x 10^6) = 10 x 10^9
If the mantissa is >= 10 then shift the mantissa down 1
place (divide by 10) and increment the result exponent
Example (in decimal)
10 x 10^9 = 1 x 10^10
FLOATING POINT MULTIPLICATION
Since the IEEE 754 uses biased integers to represent the exponent, the
bias must be considered when adding the exponents
Add the two biased integer exponents, then subtract the bias value from
the result
Example: Add the biased exponents 150 and 45 (bias of 127)
Break down the exponents to see the bias values of 127
150 = (23 + 127)
45 = (-82 + 127)
Add the biased exponents: 150 + 45 = 195
Subtract the bias of 127: 150 + 45 – 127 = 68 result biased exponent
Check it: 68 – 127 = Actual exponent of -59 = 23 + -82
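The bias correction can be written as a one-line helper. A minimal sketch, with a made-up function name:

```python
BIAS = 127

def add_biased(e1, e2):
    """Add two biased exponents; one copy of the bias must be removed."""
    return e1 + e2 - BIAS

result = add_biased(150, 45)
print(result, result - BIAS)   # biased result and the actual exponent
```

Adding the two stored values counts the bias twice, which is why it is subtracted once.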
FLOATING POINT MULTIPLICATION: EXAMPLE
Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
32 (base 10) = 100000.0 x 2^0 (binary)
Convert to normalized binary mantissa format
100000.0 x 2^0 --> 1.0 x 2^5 (Shift binary point 5 places to left)
Exponent = 5 + 127 = 132
16 (base 10) = 10000.0 x 2^0 (binary)
Convert to normalized binary mantissa format
10000.0 x 2^0 --> 1.0 x 2^4 (Shift binary point 4 places to left)
Exponent = 4 + 127 = 131
See Computer Architecture by N. Carter, page 43
FLOATING POINT MULTIPLICATION: EXAMPLE
Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
1.0 x 2^5
Exponent = 5 + 127 = 132
1.0 x 2^4
Exponent = 4 + 127 = 131
Multiply mantissas: 1.0 x 1.0
   1.0
x  1.0
    00
+ 10
  1.00
Count number of bits right of binary point of operands (1 + 1 = 2)
Place binary point two places from the right of the product
Add +127 biased exponents: 132 + 131 – 127 = 136
Actual unbiased exponent = 136 – 127 = 9
Product = 1.0 x 2^9 = 512
Sign Bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 136 = 10001000
The encoded IEEE 754 single precision number is...
0 10001000 00000000000000000000000
FLOATING POINT MULTIPLICATION EXERCISE: SOLUTION
2.20 (c) Compute 0.125 x 8 using single-precision binary.
0.125 (base 10) = 0.001 x 2^0 = 1.0 x 2^-3 (Normalized binary mantissa)
8 (base 10) = 1000.0 x 2^0 = 1.0 x 2^3 (Normalized binary mantissa)
See Computer Architecture by N. Carter, page 43
1.0 x 2^-3 Biased exponent = -3 + 127 = 124
1.0 x 2^3 Biased exponent = 3 + 127 = 130
Multiply mantissas: 1.0 x 1.0 = 1.0 (binary)
Add biased exponents: 124 + 130 – 127 = 127
Actual exponent: 127 – 127 = 0
Sign Bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 127 = 01111111
0 01111111 00000000000000000000000 is the encoded binary number
FLOATING POINT MULTIPLICATION EXERCISE
Multiply 0.75 x 32 using IEEE 754 single-precision format
0.75 = 0.11 x 2^0
Normalized 1.1 x 2^-1
Biased exponent = -1 + 127 = 126
32 = 100000.0 x 2^0
Normalized 1.0 x 2^5
Biased exponent = 5 + 127 = 132
Multiply the mantissas
   1.1
x  1.0
    00
+ 11
  1.10
To place the binary point...
Count number of bits to right of binary points
of the two operands 1.1 and 1.0
Total of 2 places so place binary point
two places from the right in the product
Add the biased exponents:
126 + 132 – 127 = 131 (unbiased exponent is 4)
The product is already normalized
Encode the product using IEEE 32-bit format
Sign bit = 0
Exponent = 131 = 10000011
Mantissa = 10000000000000000000000
0 10000011 10000000000000000000000
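The multiplication steps can be sketched with integer significands scaled by 2^23 (so 1.0 is 1 << 23 and 1.1b is 0b11 << 22). This is a simplified illustration with made-up names: it handles positive normalized operands only and cuts off low product bits instead of rounding.

```python
BIAS = 127
FRAC_BITS = 23

def fp_multiply(sig1, bexp1, sig2, bexp2):
    """Multiply two values given as (scaled significand, biased exponent)."""
    product = (sig1 * sig2) >> FRAC_BITS   # rescale; low bits are cut off
    bexp = bexp1 + bexp2 - BIAS            # remove the doubled bias
    if product >= (2 << FRAC_BITS):        # renormalize if product reached 2.0
        product >>= 1
        bexp += 1
    return product, bexp

def encode(significand, bexp, sign=0):
    """Format the result as 'sign exponent mantissa' bit fields."""
    fraction = significand - (1 << FRAC_BITS)   # drop the assumed lead 1
    return "{} {:08b} {:023b}".format(sign, bexp, fraction)

ONE = 1 << FRAC_BITS
print(encode(*fp_multiply(0b11 << 22, 126, ONE, 132)))   # 0.75 x 32
print(encode(*fp_multiply(ONE, 132, ONE, 131)))          # 32 x 16
print(encode(*fp_multiply(ONE, 124, ONE, 130)))          # 0.125 x 8
```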
ROUNDING OF FLOATING POINT NUMBERS
Accurate rounding requires the hardware to use a few extra bits to
hold intermediate results
Use these extra bits to decide how to round when the final result is
stored in the 32-bit single precision or 64-bit double precision format
Implementations of IEEE 754 commonly use up to three additional bits called the
guard, round, and sticky bits to assist in accurate rounding
See pages 297-298 of Computer Organization and Design
ROUNDING OF FLOATING POINT NUMBERS
Compute this base ten addition rounding all
intermediate values to three significant digits
  2.56 x 10^0
+ 2.34 x 10^2
First shift the decimal point of the top number to align the
exponents
  0.02 x 10^2 Keeping only three digits loses information (0.0256 becomes 0.02)
+ 2.34 x 10^2
  2.36 x 10^2
ROUNDING OF FLOATING POINT NUMBERS
Compute this base ten addition using intermediate values that
keep an extra two digits
  2.56 x 10^0
+ 2.34 x 10^2
First shift the decimal point of the top number to align the exponents
  0.0256 x 10^2 Intermediate values use two extra digits
+ 2.3400 x 10^2
  2.3656 x 10^2
Use the extra two digits to round the result to three significant digits
2.37 x 10^2
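The effect of the extra digits can be reproduced with a short sketch using Python's decimal module. Both operands are written in units of 10^2, the smaller one is cut off after a chosen number of decimal places, and the sum is then rounded to three significant digits. The function name and its parameter are made up for this illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def add_aligned(small, large, kept_places):
    """Add two mantissas (both in units of 10^2), keeping `kept_places`
    decimal places of the shifted smaller operand before the addition."""
    quantum = Decimal(1).scaleb(-kept_places)
    aligned = small.quantize(quantum, rounding=ROUND_DOWN)   # cut off digits
    total = aligned + large
    # Final result keeps three significant digits (two decimal places here).
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

small = Decimal("0.0256")   # 2.56 x 10^0 written in units of 10^2
large = Decimal("2.34")     # 2.34 x 10^2

print(add_aligned(small, large, kept_places=2))   # no extra digits
print(add_aligned(small, large, kept_places=4))   # two extra digits
```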
JAVA APPLETS FOR IEEE FLOATING POINT
A Java applet that converts decimal numbers to IEEE single or
double precision encodings can be found at...
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html
This applet may be used to make up your own sample problems to
convert between decimal and IEEE format and to check the result
of other calculations in IEEE floating point format
Interactive floating point addition demo
http://timacmp.imag.fr/~guyot/Cours/Oparithm/english/Flottan.htm