07FloatingPointx


Updated: 06/03/2010
FLOATING POINT REPRESENTATIONS
1998 Morgan Kaufmann Publishers
DECIMAL FLOATING POINT NUMBERS
1.5
102.450


Decimal fractions
¾ -> 0.75

1/100 -> 0.01
SCIENTIFIC NOTATION


How can we more compactly represent these values?
1,000,000,000
-> 1.0 x 10^9

0.000025
-> 2.5 x 10^-5
What are the two parts of a scientific notation value called?
Mantissa and Exponent
Normalize the mantissa so it has exactly one non-zero digit to the left of the decimal point
FLOATING POINT NUMBERS



Floating point numbers such as 7.519, -0.01, and 4.3 x 10^8
are represented using the IEEE 754 standard format
Floating point is represented using a mantissa and
exponent
Example: 7.51 x 2^5



The mantissa is 7.51
The exponent is 5
Note: the exponent gives a power of 2, not a power of 10
A set number of bits is assigned to represent the sign,
exponent, and mantissa
32 bit single precision: sign bit (1 bit) | exponent (8 bits) | mantissa (23 bits)
64 bit double precision: sign bit (1 bit) | exponent (11 bits) | mantissa (52 bits)
ROUNDING

Not every real number can be represented exactly in
binary using a finite number of bits
Question: What are some examples?
1/3 = 0.3333…
PI = 3.141…

In these cases, the value must be rounded to the nearest number
that can be represented

If a number is halfway between two representable
values, round to the one whose least-significant digit is
even
EXAMPLES OF ROUNDING

Round each of these numbers to two significant digits.

1.345 --> 1.3
Choose 1.3 since 1.345 is nearer to 1.3 than to 1.4

78.953 --> 79
Choose 79 since it’s nearer than 78

12.5 --> 12
12.5 is halfway between 12 and 13
Choose 12 since its least significant digit is even

13.5 --> 14
13.5 is halfway between 13 and 14
Choose 14 since its least significant digit is even
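These examples can be checked with Python's decimal module, which supports round-half-to-even directly. This is a minimal sketch: round_sig is a hypothetical helper name, and inputs are passed as strings so the decimal digits are exact.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(value: str, digits: int) -> Decimal:
    """Round a decimal string to `digits` significant digits, ties to even."""
    d = Decimal(value)
    # adjusted() is the power of ten of the leading digit (0 for 1.345, 1 for 78.953)
    place = d.adjusted() - digits + 1
    return d.quantize(Decimal(1).scaleb(place), rounding=ROUND_HALF_EVEN)

for x in ["1.345", "78.953", "12.5", "13.5"]:
    print(x, "-->", round_sig(x, 2))
```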
FRACTIONAL BINARY NUMBERS

Fractional binary numbers use the familiar decimal place-value
representation, but with a base of 2 instead of 10

Example:
11.101b = 1x2^1 + 1x2^0 + 1x2^-1 + 0x2^-2 + 1x2^-3
= 2 + 1 + 1/2 + 0/4 + 1/8
= 2 + 1 + 0.5 + 0.0 + 0.125
= 3.625
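The place-value rule can be sketched in a few lines of Python, assuming the binary fraction is given as an unsigned string; binary_fraction_value is an illustrative name, and Fraction keeps the result exact.

```python
from fractions import Fraction

def binary_fraction_value(bits: str) -> Fraction:
    """Evaluate an unsigned binary fraction such as '11.101' exactly."""
    whole, _, frac = bits.partition(".")
    value = Fraction(int(whole, 2) if whole else 0)
    # Each bit right of the binary point is worth 2^-1, 2^-2, 2^-3, ...
    for i, bit in enumerate(frac, start=1):
        value += Fraction(int(bit), 2 ** i)
    return value

print(float(binary_fraction_value("11.101")))   # 3.625
print(float(binary_fraction_value("101.011")))  # 5.375
```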
EXERCISES

Convert this binary fraction into decimal
101.011
Answer: 5 + 0*1/2 + 1*1/4 + 1*1/8 = 5.375

Express the decimal value 6.5 as a binary fraction
Answer: 110.1

Express the decimal value 11.75 as a binary fraction
Answer: 1011.11
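The reverse direction can be sketched by repeatedly doubling the fractional part: each doubling pushes one bit across the binary point. This assumes a non-negative input; to_binary_fraction is an illustrative name, and the max_frac_bits cutoff (an assumption, defaulting to the 23 single-precision mantissa bits) truncates fractions such as 0.1 that never terminate in binary.

```python
from fractions import Fraction

def to_binary_fraction(value: Fraction, max_frac_bits: int = 23) -> str:
    """Write a non-negative value as a binary fraction, e.g. 6.5 -> '110.1'."""
    whole = value.numerator // value.denominator   # integer part
    frac = value - whole                           # fractional part, 0 <= frac < 1
    bits = ""
    # Double the fraction; the bit that crosses the binary point is the next digit
    while frac and len(bits) < max_frac_bits:
        frac *= 2
        if frac >= 1:
            bits += "1"
            frac -= 1
        else:
            bits += "0"
    return bin(whole)[2:] + ("." + bits if bits else "")

print(to_binary_fraction(Fraction("6.5")))    # 110.1
print(to_binary_fraction(Fraction("11.75")))  # 1011.11
```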
NORMALIZED MANTISSA FOR SCIENTIFIC NOTATION

Scientific notation numbers express the mantissa with one
non-zero digit to the left of the decimal point.





Given your original number
Shift the decimal point left or right until one non-zero digit is to
the left of the decimal point
For each shift left increase the power of ten exponent by 1
For each shift right decrease power of ten exponent by 1
Examples:
102.5 x 10^4 = 1.025 x 10^6
7589 x 10^5 = 7.589 x 10^8
0.0045 x 10^0 = 4.5 x 10^-3
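The shift-and-adjust rule can be sketched as two loops, assuming the mantissa is given as a decimal string; normalize_sci is an illustrative name. Decimal.scaleb moves the decimal point without any rounding.

```python
from decimal import Decimal

def normalize_sci(mantissa: str, exponent: int) -> tuple[Decimal, int]:
    """Shift the decimal point until one non-zero digit is left of it."""
    m = Decimal(mantissa)
    if m == 0:
        return m, 0                      # zero has no normalized form
    while abs(m) >= 10:                  # each shift left: exponent + 1
        m, exponent = m.scaleb(-1), exponent + 1
    while abs(m) < 1:                    # each shift right: exponent - 1
        m, exponent = m.scaleb(1), exponent - 1
    return m, exponent

print(normalize_sci("102.5", 4))   # (Decimal('1.025'), 6)
print(normalize_sci("7589", 5))    # (Decimal('7.589'), 8)
print(normalize_sci("0.0045", 0))  # (Decimal('4.5'), -3)
```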
NORMALIZED BINARY MANTISSA
Binary Fraction    Normalized IEEE 754 Mantissa
11.110             1.1110 (shift binary point left 1 place)
1.01               1.01 (no shift)
101.01             1.0101 (shift binary point left 2)
0.001011           1.011 (shift binary point right 3)
Question: What do all of the normalized binary mantissas
have in common?
FRACTIONAL REPRESENTATION OF MANTISSA

What do all of the normalized binary mantissas have
in common?
 The one bit to the left of the binary point is always
a 1

Since that bit is always 1, it does not need to be stored:
with 23 bits for the single-precision mantissa, we can “save”
a bit by not storing this leading 1

So simply discard the leading 1 after normalizing the
binary mantissa
MANTISSA EXAMPLE

What is the binary representation of the mantissa in IEEE 754 for
6.25?

Solution:
6.25 = 1x2^2 + 1x2^1 + 0x2^0 + 0x2^-1 + 1x2^-2
= 110.01b
Shift the binary point to the left until the only bit
to the left of the binary point is the leading 1
110.01b --> 1.1001b (Shift left by 2 places)
This shift places the assumed 1 bit in the integer part of the
mantissa; because that bit is not stored, the format effectively
gains one additional bit of precision
The stored mantissa encodes only the bits to the right of the binary point
1001b
MANTISSA EXAMPLE CONTINUED...

What is the binary representation of the mantissa in IEEE 754 for
6.25?

Solution (continued):
Keeping only the bits to the right of the binary point...
1001b
Pad the 4 bits with zeros on the right to fill the 23-bit
single-precision field (this is zero padding of a binary
fraction, not sign extension; it does not change the value)
1001 0000 0000 0000 0000 000
The implied binary point sits just to the left of the first bit
UPDATING THE EXPONENT

6.25 = 110.01b has an implied exponent of 2^0

Following the IEEE 754 convention, shifting the binary point
to the left (in this case by 2 positions) requires updating
the exponent

1.1001b (following a shift of the binary point 2 positions to the left)
Each shift of the binary point one place to the left adds 1 to the binary exponent

6.25 = 1.1001b x 2^2
UPDATING THE BINARY EXPONENT
Binary Fraction      Normalized Form
1.01 x 2^0           1.01 x 2^0
11.110 x 2^0         1.1110 x 2^1
1101.01 x 2^0        1.10101 x 2^3
0.001011 x 2^0       1.011 x 2^-3
0.0000101 x 2^0      1.01 x 2^-5
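Normalization can be sketched by locating the leading 1 bit: its position relative to the original binary point is the new exponent. This assumes a positive binary fraction given as a string; normalize_binary is an illustrative name, and bits after the leading 1 are kept exactly as written (so 11.110 becomes 1.1110, as in the table).

```python
def normalize_binary(bits: str) -> tuple[str, int]:
    """Normalize a positive binary fraction to 1.xxx form; return (mantissa, exponent)."""
    whole, _, frac = bits.partition(".")
    all_bits = whole + frac
    lead = all_bits.find("1")              # index of the leading 1 bit
    if lead < 0:
        raise ValueError("zero cannot be normalized")
    # The bit at index k has weight 2^(len(whole) - 1 - k), so that power of
    # two becomes the exponent once the leading 1 is left of the binary point
    exponent = len(whole) - 1 - lead
    rest = all_bits[lead + 1:]             # the bits that would be stored
    return "1." + (rest or "0"), exponent

for b in ["1.01", "11.110", "1101.01", "0.001011", "0.0000101"]:
    m, e = normalize_binary(b)
    print(f"{b} x 2^0 --> {m} x 2^{e}")
```

The part after "1." is what goes into the stored mantissa field, padded with zeros on the right to 23 bits.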
REPRESENTING THE EXPONENT IN IEEE 754

The exponent is represented as a biased integer

For single precision add 127 to the exponent of the
normalized value (written as an ordinary base ten integer)

For double precision add 1023 to the exponent of the
normalized value (written as an ordinary base ten integer)
REPRESENTING THE EXPONENT IN IEEE 754



The exponent is represented as a biased integer
For single precision add 127 to the value of the exponent
For double precision add 1023 to the value of the exponent
Example:
How would the values -45 and 123 be represented in the 8-bit
biased format for single precision?

Answer:
-45 + 127 = 82 = 01010010b
123 + 127 = 250 = 11111010b
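The biased encoding can be sketched as "add the bias, then write the result as an unsigned field". biased_exponent is an illustrative name; the range check assumes normalized numbers only, since the all-0s and all-1s exponent patterns are reserved for zero, infinity, and NaN.

```python
def biased_exponent(exponent: int, bias: int = 127, width: int = 8) -> str:
    """Return the biased exponent as an unsigned bit string (single precision by default)."""
    biased = exponent + bias
    # All 0s and all 1s are reserved for zero and for infinity/NaN
    if not 1 <= biased <= 2 ** width - 2:
        raise ValueError("exponent out of range for a normalized number")
    return format(biased, f"0{width}b")

print(biased_exponent(-45))   # 01010010  (82)
print(biased_exponent(123))   # 11111010  (250)
```

For double precision, pass bias=1023 and width=11.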

ENCODING THE BIASED BINARY EXPONENT
Binary Fraction      Normalized Exponent   Biased Exponent
1.01 x 2^0           1.01 x 2^0             0 + 127 = 127
11.110 x 2^0         1.1110 x 2^1           1 + 127 = 128
1101.01 x 2^0        1.10101 x 2^3          3 + 127 = 130
0.001011 x 2^0       1.011 x 2^-3          -3 + 127 = 124
0.0000101 x 2^0      1.01 x 2^-5           -5 + 127 = 122
Encode each biased exponent as an unsigned 8-bit number.
Encode each unbiased exponent in 8-bit two’s complement.
Suppose you had to rapidly sort by exponents: which format
would be more efficient?
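One way to explore the sorting question is to compare the raw bit patterns as unsigned integers, which is what a simple integer comparator would do. This sketch uses the five exponents from the table; the function names are illustrative.

```python
def biased(e: int) -> str:
    return format(e + 127, "08b")      # bias-127 unsigned encoding

def twos_complement(e: int) -> str:
    return format(e & 0xFF, "08b")     # 8-bit two's complement encoding

def sort_by_pattern(exponents, encode):
    """Sort exponents by treating their 8-bit patterns as unsigned integers."""
    return sorted(exponents, key=lambda e: int(encode(e), 2))

exponents = [0, 1, 3, -3, -5]
print(sort_by_pattern(exponents, biased))           # [-5, -3, 0, 1, 3]
print(sort_by_pattern(exponents, twos_complement))  # [0, 1, 3, -5, -3]
```

Only the biased patterns come out in numeric order, because two's complement puts a 1 in the top bit of every negative exponent.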
FLOATING POINT EXAMPLE #1

Recall that 6.25 = 1.1001b x 2^2

Encode 6.25 as a 32-bit single precision binary number

Sign bit = 0
Mantissa = 1.1001 (encoding omits assumed lead 1)
Exponent = 2 + 127 = 129 = 10000001



Encode using 32-bit single precision binary format
0 10000001 10010000000000000000000
(sign bit | exponent | mantissa)
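An encoding can be checked by asking Python's struct module for the 4-byte single-precision representation and splitting it into fields. This is a sketch; float32_fields is an illustrative name, and struct.pack rounds the input to the nearest single-precision value.

```python
import struct

def float32_fields(x: float) -> str:
    """Return 'sign exponent mantissa' bit fields of x in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret 4 bytes as uint32
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"

print(float32_fields(6.25))   # 0 10000001 10010000000000000000000
print(float32_fields(-6.25))  # 1 10000001 10010000000000000000000
```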
FLOATING POINT EXAMPLE #2

What is the value of the single-precision floating-point number
represented by the following 32-bit binary encoding?
0 10000000 110 0000 0000 0000 0000 0000
Sign bit = 0
Encoded Exponent = 10000000 = 128
Encoded Mantissa = 110 0000 0000 0000 0000 0000
Subtract the added bias of 127 to reveal an exponent = 1
Mantissa = . 110 0000 0000 0000 0000 0000
Mantissa = 1.11
(Restore the assumed 1 before the binary point)
Mantissa = 1.11 = 1x2^0 + 1x2^-1 + 1x2^-2 = 1.75
Value = 1.75 x 2^1 = 3.5
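The decoding steps (remove the bias, restore the assumed 1, apply the sign) can be sketched directly on the bit string. decode_float32 is an illustrative name; it assumes a normalized value and rejects the reserved all-0s and all-1s exponent patterns.

```python
from fractions import Fraction

def decode_float32(fields: str) -> Fraction:
    """Decode a normalized single-precision bit string, e.g. '0 10000000 110...'."""
    bits = fields.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    if bits[1:9] in ("00000000", "11111111"):
        raise ValueError("zero, infinity, NaN and denormals are not handled here")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2) - 127          # subtract the bias
    # Restore the assumed leading 1, then add each stored fraction bit
    mantissa = 1 + sum(Fraction(int(b), 2 ** i) for i, b in enumerate(bits[9:], start=1))
    return sign * mantissa * Fraction(2) ** exponent

print(float(decode_float32("0 10000000 110 0000 0000 0000 0000 0000")))  # 3.5
```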
FLOATING POINT EXAMPLE #3

-6.25 = -1.1001b x 2^2

Encode -6.25 as a 32-bit single precision binary number

Sign bit = 1 (Use signed magnitude for mantissa)
Mantissa = 1.1001 (encoding omits assumed lead 1)
Exponent = 2 + 127 = 129 = 10000001



Encode using 32-bit single precision binary format
1 10000001 10010000000000000000000
(sign bit | exponent | mantissa)
EXERCISE

Exercise 2.18 (a) on page 42 of Computer
Architecture by N. Carter

What value is represented by this IEEE single
precision value?
1 01111010 100 0000 0000 0000 0000 0000
EXERCISE: SOLUTION

What value is represented by this IEEE single precision value?
1 01111010 100 0000 0000 0000 0000 0000
Sign bit = 1
Encoded Exponent = 01111010 = 122
Encoded Mantissa = 100 0000 0000 0000 0000 0000
Subtract added bias of 127 from encoded exponent
Actual exponent is -5
Mantissa = . 100 0000 0000 0000 0000 0000 = .1
Mantissa = 1.1
(Add back the assumed 1 before the binary point)
Signed mantissa = -(1x2^0 + 1x2^-1) = -1.5
Value = -1.5 x 2^-5 = -1.5 x (1/32) = -0.046875
IEEE 754 SINGLE PRECISION RANGE

Smallest positive normalized number
1.00000000000000000000000 x 2^-126

Largest normalized number
1.11111111111111111111111 x 2^127
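Both limits can be checked by building their bit patterns and reinterpreting them as single-precision values with struct. A sketch: the smallest normalized value has exponent field 00000001 and an all-0s mantissa; the largest has exponent field 11111110 and an all-1s mantissa. float32_from_bits is an illustrative name.

```python
import struct

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest = float32_from_bits(1 << 23)                      # exponent field 1, mantissa 0
largest = float32_from_bits((254 << 23) | ((1 << 23) - 1))  # exponent field 254, mantissa all 1s

print(smallest == 2.0 ** -126)                   # True
print(largest == (2 - 2.0 ** -23) * 2.0 ** 127)  # True
print(f"{smallest:.3e} {largest:.3e}")           # 1.175e-38 3.403e+38
```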
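REPRESENTING 1.0
1.0 = 1.0 x 2^0
Sign bit = 0
Exponent = 0 + 127 = 127 = 01111111
Mantissa = 00000000000000000000000 (the assumed lead 1 is not stored)
0 01111111 00000000000000000000000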
REPRESENTING 0.0

The assumed 1 bit in the mantissa gains an
extra bit of precision

But zero cannot be represented with the normal encoding, since a
stored mantissa of 0 is interpreted as 1.0

The IEEE 754 standard specifies that zero is
represented using an exponent field of all 0s with a
mantissa field of all 0s.
NAN

NaN = Not a Number

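Special value used to represent the result of an invalid operation,
such as 0/0, infinity minus infinity, or the square root of a negative
number (overflow, and dividing a non-zero value by zero, produce infinity rather than NaN)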

NaN is represented by all 1’s in the exponent field and a
non-zero mantissa field

Any arithmetic operation using NaN results in NaN

Example: NaN + 4.5 = NaN
INFINITY

IEEE 754 represents infinity using all 1’s in the
exponent and a fraction field of 0.

The sign bit designates positive or negative
infinity
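The special encodings can be recognized from the exponent and mantissa fields alone. This is a sketch; classify_float32 is an illustrative name, and the "denormalized" case (exponent field 0 with a non-zero mantissa) is included only for completeness since these slides do not cover it. Plain Python raises ZeroDivisionError for 1.0 / 0.0, so the example uses math.inf and math.nan directly.

```python
import math
import struct

def classify_float32(x: float) -> str:
    """Classify a value by its IEEE 754 single-precision field patterns."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF       # 8-bit exponent field
    mantissa = bits & 0x7FFFFF           # 23-bit mantissa field
    sign = "-" if bits >> 31 else "+"
    if exponent == 0xFF:                 # all 1s: infinity or NaN
        return "NaN" if mantissa else sign + "infinity"
    if exponent == 0:                    # all 0s: zero (or a denormalized number)
        return sign + "zero" if mantissa == 0 else "denormalized"
    return "normalized"

print(classify_float32(0.0))           # +zero
print(classify_float32(-math.inf))     # -infinity
print(classify_float32(math.nan))      # NaN
print(math.isnan(math.nan + 4.5))      # True: NaN propagates
```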
FLOATING POINT ADDITION (DECIMAL EXAMPLE)

Example: 9.999 x 10^1 + 1.610 x 10^-1

Step 1: Shift decimal point of smaller number to the left until its
updated exponent matches the exponent of the larger number
1.610 x 10^-1 --> 0.01610 x 10^1

Step 2: Add the mantissas (Assume only 4 significant digits)
  9.999 x 10^1
+ 0.016 x 10^1
 10.015 x 10^1

Step 3: Re-normalize to get one non-zero digit left of decimal point
10.015 x 10^1 --> 1.0015 x 10^2

Step 4: Round the mantissa to 4 significant digits
1.0015 x 10^2 --> 1.002 x 10^2 (halfway case: round to the even digit)
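The four steps can be sketched with Python's decimal module, representing each operand as a (mantissa, exponent) pair. decimal_fp_add is an illustrative name. One difference from the slide: this sketch keeps the aligned operand exact (0.01610) and rounds only once at the end, while the slide drops digits in Step 2; both reach 1.002 x 10^2 here.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

def decimal_fp_add(m1, e1, m2, e2, digits=4):
    """Add m1 x 10^e1 + m2 x 10^e2, keeping a `digits`-digit mantissa."""
    if e1 < e2:                                  # make operand 1 the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2.scaleb(e2 - e1)                      # Step 1: align decimal points
    m, e = m1 + m2, e1                           # Step 2: add mantissas
    if m == 0:
        return m, 0
    while abs(m) >= 10:                          # Step 3: renormalize
        m, e = m.scaleb(-1), e + 1
    while abs(m) < 1:
        m, e = m.scaleb(1), e - 1
    m = Context(prec=digits, rounding=ROUND_HALF_EVEN).plus(m)  # Step 4: round
    if abs(m) >= 10:                             # rounding can carry, e.g. 9.9996 -> 10.00
        m, e = m.scaleb(-1), e + 1
    return m, e

print(decimal_fp_add(Decimal("9.999"), 1, Decimal("1.610"), -1))  # (Decimal('1.002'), 2)
```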
FLOATING POINT ADDITION EXAMPLE

Use single-precision floating point to compute 0.25 + 1.5

0.25 (base 10) = (1/4) = 0.01b = 1.0 x 2^-2
1.5 (base 10) = 1 + (1/2) = 1.1b x 2^0


Shift binary point of smaller number to the left so
exponents match
1.0 x 2^-2 --> 0.01 x 2^0
FLOATING POINT ADDITION EXAMPLE (CONTINUED)

Use single-precision floating point to compute 0.25 + 1.5

Next, add the mantissas, both with exponent of 0
  0.01 x 2^0
+ 1.10 x 2^0
  1.11 x 2^0
FLOATING POINT ADDITION EXAMPLE (CONTINUED)

Use single-precision floating point to compute 0.25 + 1.5
  0.01 x 2^0
+ 1.10 x 2^0
  1.11 x 2^0
Encode result using 32-bit single precision
Sign bit = 0
Mantissa = 11000000000000000000000 (23 bits)
Exponent = 0 + 127 = 127 = 01111111
The 32-bit single precision encoding is...
0 01111111 11000000000000000000000
FLOATING POINT ADDITION EXERCISE: SOLUTION

2.20 (b) Use single precision to compute 147.5 + 0.25
147.5 (base 10) = 128 + 16 + 2 + 1 + (1/2)
= 10010011.1b
Convert to normalized mantissa format
10010011.1 x 2^0 --> 1.00100111 x 2^7
Shifted binary point 7 places to the left
See Computer Architecture by N. Carter, page 43
FLOATING POINT ADDITION EXERCISE: SOLUTION

2.20 (b) Use single precision to compute 147.5 + 0.25
0.25 (base 10) = (1/4) = 0.01
Convert to normalized mantissa format
0.01 x 2^0 --> 1.0 x 2^-2
Shift binary point 2 places to the right
FLOATING POINT ADDITION EXERCISE: SOLUTION

2.20 (b) Use single precision to compute 147.5 + 0.25
  1.00100111 x 2^7
+ 1.0 x 2^-2
Shift binary point of smaller number to left to match exponent
(7) of the larger number
1.0 x 2^-2 --> 0.000000001 x 2^7
Shift binary point 9 places to the left to go from exponent of -2 to
7
FLOATING POINT ADDITION EXERCISE: SOLUTION

2.20 (b) Use single precision to compute 147.5 + 0.25
Add the mantissas, both expressed with exponent 7
  1.001001110 x 2^7
+ 0.000000001 x 2^7
  1.001001111 x 2^7
FLOATING POINT ADDITION EXERCISE: SOLUTION

2.20 (b) Use single precision to compute 147.5 + 0.25
Encode the result 1.001001111 x 2^7 in single precision
Sign bit = 0 since result is positive
Mantissa = 00100111100000000000000 (23 bits)
Exponent = 7 + 127 = 134 = 10000110
The 32-bit single precision encoding is...
0 10000110 00100111100000000000000
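Both addition results can be cross-checked by letting the machine do the arithmetic and printing the single-precision fields. This sketch adds in Python's double precision and then rounds to single with struct (for these operands the sum is exact either way); to_float32 and float32_fields are illustrative names.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def float32_fields(x: float) -> str:
    """Return 'sign exponent mantissa' bit fields of x in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"

for a, b in [(0.25, 1.5), (147.5, 0.25)]:
    total = to_float32(to_float32(a) + to_float32(b))
    print(f"{a} + {b} = {total}: {float32_fields(total)}")
```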
ADDITION WITH NEGATIVE VALUES

If a value is negative, you must first convert the negative value into
two’s complement

Example:
-0.111

Convert to two’s complement by...
  1.000   (inverting all bits)
+ 0.001   (adding 1)
  1.001
Use the two’s complement version of the value when adding the
mantissas. Discard the carry overflow bit.
ADDITION WITH NEGATIVE VALUE(S)
  1.000 x 2^0    (1.0 in base ten)
 -1.000 x 2^-1   (-0.5 in base ten)
Move binary point of the smaller number so exponents match
  1.000 x 2^0    (1.0 in base ten)
 -0.100 x 2^0    (-0.5 in base ten)

Convert mantissa of -0.5 into two’s complement then add
  1.000 x 2^0    (1.0 in base ten)
 +1.100 x 2^0    (-0.5 in base ten, in two’s complement)
 10.100 x 2^0    Two’s complement addition discards carry overflow bit

The sum is 0.100 x 2^0
Normalize the exponent to get sum of 1.000 x 2^-1 (0.5 base ten)
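The mantissa arithmetic above can be sketched with plain integers, assuming a fixed 4-bit x.xxx register (one bit left of the binary point, three to the right) to match the slides. Masking with MASK is what "discard the carry overflow bit" means. All names here are illustrative.

```python
FRAC_BITS = 3                  # bits right of the binary point (x.xxx)
WIDTH = FRAC_BITS + 1          # plus one bit left of the point
MASK = (1 << WIDTH) - 1        # keeps only WIDTH bits

def to_fixed(bits: str) -> int:
    """Read an x.xxx pattern as an unsigned integer, e.g. '1.000' -> 8."""
    return int(bits.replace(".", ""), 2)

def to_bits(value: int) -> str:
    s = format(value & MASK, f"0{WIDTH}b")
    return s[0] + "." + s[1:]

def twos_complement(bits: str) -> str:
    """Invert all bits and add 1, within WIDTH bits."""
    return to_bits(~to_fixed(bits) + 1)

def add_discard_carry(a: str, b: str) -> str:
    """Add two x.xxx patterns and drop any carry out of the top bit."""
    return to_bits(to_fixed(a) + to_fixed(b))

print(twos_complement("0.111"))                              # 1.001
print(add_discard_carry("1.000", twos_complement("0.100")))  # 0.100
```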
FLOATING POINT ADDITION (PAGE 282 OF H&P)

Example: Compute 0.5 + -0.4375 (base 10) using binary
arithmetic.
0.5 (base 10) = 0.1b x 2^0

Normalize to get 1 to left of the binary point
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
-0.4375 = -0.0111b = - ((1/4) + (1/8) + (1/16))

Normalize to get 1 to the left of the binary point
-0.0111 --> -1.11 x 2^-2
FLOATING POINT ADDITION (PAGE 282 OF H&P)

Compute
0.5 = 0.1 x 2^0 = 1.0 x 2^-1
+
-0.4375 = -1.11 x 2^-2

Step 1: Shift binary point of smaller number to the left until its
updated exponent matches the exponent of the larger number
-1.11 x 2^-2 --> -0.111 x 2^-1

Step 2: Add the mantissas
* Convert the negative value to two’s complement, then add
* Discard the carry overflow bit
  1.000   (1.0 decimal)
 -0.111   (-0.875 decimal)
  0.001   (0.125 decimal)
0.001 x 2^-1
FLOATING POINT ADDITION (PAGE 282 OF H&P)

Step 3: Normalize to get 1 to left of binary point
0.001 x 2^-1 --> 1.0 x 2^-4
Exponent of -4 lies between -126 and 127 (range of single
precision exponents)...therefore no overflow or underflow
Express exponent in biased notation by adding 127
Encoded exponent = -4 + 127 = 123

Step 4: Round to 23 binary digits of mantissa
precision
1.0 x 2^-4 (no rounding needed)
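Steps 1 to 3 can be sketched on exact values, with each operand given as (mantissa, exponent). As a deliberate simplification, this sketch uses signed Fraction mantissas instead of two's complement bit patterns, and it omits Step 4 (rounding to 23 bits); fp_add is an illustrative name.

```python
from fractions import Fraction

def fp_add(m1: Fraction, e1: int, m2: Fraction, e2: int) -> tuple[Fraction, int]:
    """Add m1 x 2^e1 + m2 x 2^e2 following Steps 1-3 (exact, no rounding)."""
    if e1 < e2:                          # make operand 1 the one with the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2 / 2 ** (e1 - e2)             # Step 1: shift smaller operand's binary point left
    m, e = m1 + m2, e1                   # Step 2: add the signed mantissas
    if m == 0:
        return m, 0
    while abs(m) >= 2:                   # Step 3: renormalize to 1.xxx
        m, e = m / 2, e + 1
    while abs(m) < 1:
        m, e = m * 2, e - 1
    return m, e

# 0.5 + -0.4375  ==  1.0 x 2^-1  +  -1.11b x 2^-2  (-1.11b is -7/4)
print(fp_add(Fraction(1), -1, Fraction(-7, 4), -2))   # (Fraction(1, 1), -4)
```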
FLOATING POINT MULTIPLICATION


Multiply the mantissas and add the exponents
Result = (mantissa1 x mantissa2) x 2^(exp1 + exp2)

Example (in decimal)
5 x 10^3 x 2 x 10^6 = 10 x 10^9

If the mantissa is >= 10 then shift the mantissa down 1
place (divide by 10) and increment the result exponent

Example (in decimal)
10 x 10^9 = 1 x 10^10
FLOATING POINT MULTIPLICATION

Since the IEEE 754 uses biased integers to represent the exponent, the
bias must be considered when adding the exponents

Add the two biased integer exponents, then subtract the bias value from
the result

Example: Add the exponents 150 and 45, both stored with a bias of +127

Break down the exponents to see the bias values of 127
150 = (23 + 127)
45 = (-82 + 127)


Add the biased exponents: 150 + 45 = 195
Subtract the bias of 127: 150 + 45 – 127 = 68 (the biased exponent of the result)

Check it: 68 – 127 = Actual exponent of -59 = 23 + -82
FLOATING POINT MULTIPLICATION: EXAMPLE

Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
32 (base 10) = 100000.0 x 2^0 (binary)
Convert to normalized binary mantissa format
100000.0 x 2^0 --> 1.0 x 2^5 (Shift binary point 5 places to left)
Exponent = 5 + 127 = 132
16 (base 10) = 10000.0 x 2^0 (binary)
Convert to normalized binary mantissa format
10000.0 x 2^0 --> 1.0 x 2^4 (Shift binary point 4 places to left)
Exponent = 4 + 127 = 131
See Computer Architecture by N. Carter, page 43
FLOATING POINT MULTIPLICATION: EXAMPLE

Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
1.0 x 2^5
Exponent = 5 + 127 = 132
1.0 x 2^4
Exponent = 4 + 127 = 131
Multiply mantissas: 1.0 x 1.0
   1.0
 x 1.0
 -----
    00
 +100
 -----
  1.00
Count the bits to the right of the binary point in both operands (1 + 1 = 2)
Place the binary point two places from the right of the product
Add +127 biased exponents: 132 + 131 – 127 = 136
Actual unbiased exponent = 136 – 127 = 9
Product = 1.0 x 2^9 = 512
FLOATING POINT MULTIPLICATION: EXAMPLE

Exercise 2.20 (a) Use IEEE single precision to compute 32 x 16.
1.0 x 2^5
Exponent = 5 + 127 = 132
1.0 x 2^4
Exponent = 4 + 127 = 131
Multiply mantissas: 1.0 x 1.0 = 1.0 (binary)
Add +127 biased exponents: 132 + 131 – 127 = 136
Actual unbiased exponent = 136 – 127 = 9
Product = 1.0 x 2^9 = 512
Sign Bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 136 = 10001000
The encoded IEEE 754 single precision number is...
0 10001000 00000000000000000000000
FLOATING POINT MULTIPLICATION EXERCISE: SOLUTION

2.20 (c) Compute 0.125 x 8 using single-precision binary.
0.125 (base 10) = 0.001 x 2^0 = 1.0 x 2^-3 (Normalized binary mantissa)
8 (base 10) = 1000.0 x 2^0 = 1.0 x 2^3 (Normalized binary mantissa)
See Computer Architecture by N. Carter, page 43
FLOATING POINT MULTIPLICATION EXERCISE: SOLUTION

2.20 (c) Compute 0.125 x 8 using single-precision binary.
1.0 x 2^-3 Biased exponent = -3 + 127 = 124
1.0 x 2^3 Biased exponent = 3 + 127 = 130
Multiply mantissas: 1.0 x 1.0 = 1.0 (binary)
Add biased exponents: 124 + 130 – 127 = 127
Actual exponent: 127 – 127 = 0
Sign Bit = 0
Mantissa = 1.00000000000000000000000
Exponent = 127 = 01111111
0 01111111 00000000000000000000000 is the
encoded binary number
FLOATING POINT MULTIPLICATION EXERCISE

Multiply 0.75 x 32 using IEEE 754 single-precision format
0.75 = 0.11 x 2^0
Normalized 1.1 x 2^-1
Biased exponent = -1 + 127 = 126
32 = 100000.0 x 2^0
Normalized 1.0 x 2^5
Biased exponent = 5 + 127 = 132
Multiply the mantissas
   1.1
 x 1.0
 -----
    00
 +110
 -----
  1.10
To place the binary point...
Count number of bits to right of binary points
of the two operands 1.1 and 1.0
Total of 2 places so place binary point
two places from the right in the product
FLOATING POINT MULTIPLICATION EXERCISE

Multiply 0.75 x 32 using IEEE 754 single-precision
format
Multiply the mantissas
   1.1
 x 1.0
 -----
    00
 +110
 -----
  1.10
Add the biased exponents:
126 + 132 – 127 = 131 (unbiased exponent is 4)
FLOATING POINT MULTIPLICATION EXERCISE

Multiply 0.75 x 32 using IEEE 754 single-precision format
Product of the mantissas
1.10
Add the biased exponents:
126 + 132 – 127 = 131 (unbiased exponent is 4)
The product is already normalized
Encode the product using IEEE 32-bit format
Sign bit = 0
Exponent = 131 = 10000011
Mantissa = 10000000000000000000000
0 10000011 10000000000000000000000
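The multiplication steps can be sketched end to end for positive, normalized operands given as (mantissa, biased exponent) pairs. fp_multiply and encode are illustrative names; mantissas are exact Fractions, and this sketch handles neither the sign bit nor rounding to 23 bits, so it only fits exactly representable products like the ones in these exercises.

```python
from fractions import Fraction

BIAS = 127

def fp_multiply(m1: Fraction, biased1: int, m2: Fraction, biased2: int) -> tuple[Fraction, int]:
    """Multiply two positive normalized values given as (mantissa, biased exponent)."""
    m = m1 * m2                           # multiply the mantissas
    biased = biased1 + biased2 - BIAS     # add biased exponents, remove the extra bias
    while m >= 2:                         # renormalize (a product of two 1.xxx values is < 4)
        m, biased = m / 2, biased + 1
    return m, biased

def encode(m: Fraction, biased: int) -> str:
    """Format a positive result as 'sign exponent mantissa' (no rounding)."""
    frac = m - 1                          # drop the assumed leading 1
    bits = ""
    for _ in range(23):                   # 23 stored mantissa bits
        frac *= 2
        if frac >= 1:
            bits += "1"
            frac -= 1
        else:
            bits += "0"
    return f"0 {biased:08b} {bits}"

print(encode(*fp_multiply(Fraction(1), 132, Fraction(1), 131)))     # 32 x 16
print(encode(*fp_multiply(Fraction(1), 124, Fraction(1), 130)))     # 0.125 x 8
print(encode(*fp_multiply(Fraction(3, 2), 126, Fraction(1), 132)))  # 0.75 x 32
```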
ROUNDING OF FLOATING POINT NUMBERS

Accurate rounding requires the hardware to use a few extra bits to
hold intermediate results

Use these extra bits to decide how to round when the final result is
stored in the 32-bit single precision or 64-bit double precision format

IEEE 754 hardware typically uses up to three additional bits, called the
guard, round, and sticky bits, to assist in accurate rounding

See pages 297-298 of Computer Organization and Design
ROUNDING OF FLOATING POINT NUMBERS

Compute this base ten addition rounding all
intermediate values to three significant digits
  2.56 x 10^0
+ 2.34 x 10^2
First shift the decimal point of the top number to align the
exponents
  0.02 x 10^2   Keeping only three digits drops the shifted-out digits (56) and loses information
+ 2.34 x 10^2
  2.36 x 10^2
ROUNDING OF FLOATING POINT NUMBERS

Compute this base ten addition using intermediate values that
keep an extra two digits
  2.56 x 10^0
+ 2.34 x 10^2
First shift the decimal point of the top number to align the exponents
  0.0256 x 10^2   Intermediate values use two extra digits
+ 2.3400 x 10^2
  2.3656 x 10^2
Use the two extra digits to round the result to three significant digits
2.37 x 10^2
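The two computations can be reproduced with Python's decimal module. This is a sketch: the first result drops the shifted-out digits (ROUND_DOWN, mimicking a register with no extra digits), the second keeps the two extra digits and rounds only the final sum to three significant digits with round-half-to-even. Variable names are illustrative.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

three_digits = Context(prec=3, rounding=ROUND_HALF_EVEN)
large = Decimal("2.34")                    # 2.34 x 10^2
aligned = Decimal("2.56").scaleb(-2)       # 2.56 x 10^0 shifted to 0.0256 x 10^2

# No extra digits: the shifted-out digits of 0.0256 are simply dropped
no_extra = three_digits.plus(aligned.quantize(Decimal("0.01"), rounding=ROUND_DOWN) + large)

# Two extra digits: keep 0.0256 exactly and round only the final sum
with_extra = three_digits.plus(aligned + large)

print(no_extra, with_extra)   # 2.36 2.37
```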
JAVA APPLETS FOR IEEE FLOATING POINT

A Java applet that converts decimal numbers to IEEE single or
double precision encodings can be found at...
http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html

This applet may be used to make up your own sample problems to
convert between decimal and IEEE format and to check the result
of other calculations in IEEE floating point format

Interactive floating point addition demo
http://timacmp.imag.fr/~guyot/Cours/Oparithm/english/Flottan.htm
