CS 147 Peter Budiono


Floating Point
Agenda
History
 Basic Terms
 General representation of floating point
 Constructing a simple floating point
representation
 Floating Point Arithmetic
 The IEEE-754 Floating-Point Standard
 Range, Precision, and Accuracy

History
The first floating-point representation
was used in the “V1” machine (1945).
It had a 7-bit exponent, a 16-bit
mantissa, and a sign bit.
 In 1954, IBM brought floating-point
representation to mainstream computing
with the IBM 704.
 In 1962, the UNIVAC 1100/2200 series
was introduced. It supported both
single-precision and double-precision formats.

Basic Terms
Scientific notation: A notation that renders
numbers with a single digit to the left of the
decimal point.
 Normalized: A number in floating-point notation
that has no leading 0s.
 Floating point: Computer arithmetic that represents
numbers in which the binary point is not fixed.
 Fraction: The value, between 0 and 1, placed in
the fraction field of the floating-point representation.
 Exponent: In the numerical representation system
of floating-point arithmetic, the value that is placed
in the exponent field.

General representation of
floating point
Constructing a simple floating
point representation
We will use 14-bit model: 1 sign bit, 5-bit
exponent, and 8-bit significand.
 For example, storing a decimal number
17 into this model.
 In decimal we can say, 17 = 0.17 x 10^2
 But, in order to construct a floating point
representation we have to convert it into
binary.

17 (decimal) = 10001 (binary)
 10001 = 0.10001 x 2^5
 Then, we can now construct its
representation

sign (1 bit) | exponent (5 bits) | significand (8 bits)
0 00101 10001000
sign field:
0 : positive value
1 : negative value
What if we want to store a negative exponent
value?
The previous example can’t handle this
problem, thus we could fix that by using
biased exponent.
 For example, if we want to store 0.25,
we will have 0.1 x 2^-1
 We can fix this by using excess-16
representation. So that we add 16 to the
negative exponent (-1 + 16 = 15).

0 01111 10000000
Another problem using this method
0 10101 10001000 = 17
0 10110 01000100 = 17
0 10111 00100010 = 17
0 11000 00010001 = 17
We don’t have a unique representation for
each number.
Remedy
This problem can be fixed by
normalization.
 Normalization is a convention that the
leftmost bit of the significand must
always be 1. So that we only have

0 01111 1000000
for decimal value 17.
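The whole construction so far can be sketched in a few lines of Python. This is only the 14-bit toy format from these slides (not a real standard), and `encode14` is a hypothetical helper name:

```python
# Toy 14-bit model from the slides: 1 sign bit, a 5-bit excess-16
# exponent, and an 8-bit significand normalized so its leftmost bit
# is 1 (i.e. value = 0.1xxxxxxx x 2^e). Nonzero inputs only.
def encode14(value):
    sign = 0 if value >= 0 else 1
    value = abs(value)
    exp = 0
    # Normalize: shift until 0.5 <= value < 1, tracking the exponent.
    while value >= 1:
        value /= 2
        exp += 1
    while value < 0.5:
        value *= 2
        exp -= 1
    sig = int(value * 256)  # top 8 fraction bits, truncated
    return f"{sign} {exp + 16:05b} {sig:08b}"

print(encode14(17))    # 0 10101 10001000
print(encode14(0.25))  # 0 01111 10000000
```

Because the significand is normalized before the bias is added, every nonzero value now gets exactly one bit pattern.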
Floating Point Arithmetic

Addition
  0 10010 11001000  = 0.11001000 x 2^2
+ 0 10000 10011010  = 0.10011010 x 2^0

Align the binary points, then add:
  11.001000
+  0.10011010
 = 11.10111010
Renormalize to 0.11101110 x 2^2 (truncating to 8 bits):
  0 10010 11101110
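These addition steps (align exponents, add significands, renormalize) can be sketched for the toy format. `add14` is a hypothetical helper; each operand is assumed to be a pair of the 5-bit biased exponent and the 8-bit significand, both as ints, positive values only:

```python
# Addition in the 14-bit toy format: shift both significands to a
# common (smaller) exponent, add them as integers, then renormalize
# so exactly 8 significant bits remain (low-order bits truncated).
def add14(a, b):
    (ea, sa), (eb, sb) = a, b
    emin = min(ea, eb)
    total = (sa << (ea - emin)) + (sb << (eb - emin))
    bits = total.bit_length()       # at least 8 for normalized inputs
    sig = total >> (bits - 8)       # keep the top 8 bits
    exp = bits + emin - 8           # value = 0.sig x 2^(exp - 16)
    return f"0 {exp:05b} {sig:08b}"

# 0.11001000 x 2^2  +  0.10011010 x 2^0:
print(add14((0b10010, 0b11001000), (0b10000, 0b10011010)))
# -> 0 10010 11101110
```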

Multiplication
  0 10010 11001000  = 0.11001000 x 2^2
x 0 10000 10011010  = 0.10011010 x 2^0

Multiply the significands and add the exponents:
0.11001000 x 0.10011010 = 0.0111100001010000
2^2 x 2^0 = 2^2
Renormalize to 0.11110000 x 2^1 (truncating to 8 bits):
  0 10001 11110000
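Multiplication can be sketched the same way (same assumed pair representation, hypothetical `mul14` name): multiply the 8-bit significands, add the exponents while removing one copy of the bias, then renormalize:

```python
# Multiplication in the 14-bit toy format. Each operand encodes
# (sig / 2^8) x 2^(e - 16); multiplying gives a product of up to
# 16 bits, which is renormalized back down to 8 significand bits.
def mul14(a, b):
    (ea, sa), (eb, sb) = a, b
    prod = sa * sb
    bits = prod.bit_length()    # at least 15 for normalized inputs
    sig = prod >> (bits - 8)    # keep the top 8 bits
    exp = bits + ea + eb - 32   # exponents added, re-biased by 16
    return f"0 {exp:05b} {sig:08b}"

# 0.11001000 x 2^2  *  0.10011010 x 2^0:
print(mul14((0b10010, 0b11001000), (0b10000, 0b10011010)))
# -> 0 10001 11110000
```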
Some other problems in floating
point arithmetic
Division by zero.
 Overflow, if the result is greater in
magnitude than the largest value the
format can represent.
 Underflow, if the result is smaller in
magnitude than the smallest nonzero
value the format can represent.

The IEEE-754 Floating-Point
Standard
This was first introduced in 1985.
 The standard defines two basic
formats: single precision and double
precision.

The standard defines:
 arithmetic formats: sets of binary and decimal
floating-point data, which consist of finite numbers
(including negative zero and subnormal numbers),
infinities, and special 'not a number' values
 interchange formats: encodings (bit strings) that may
be used to exchange floating-point data in an
efficient and compact form
 rounding algorithms: methods to be used for
rounding numbers during arithmetic and conversions
 operations: arithmetic and other operations on
arithmetic formats
 exception handling: indications of exceptional
conditions (such as division by zero, overflow,
and underflow)
Single Precision IEEE-754
sign (1 bit) | exponent (8 bits) | fraction (23 bits)
This representation uses an excess-127
exponent.
 It also assumes an implied 1 to the left
of the radix point; for example, 1.0 is
stored as 1.0 x 2^0, with exponent field
0 + 127 = 127.

Floating Point Number | Single Precision Representation
1.0   | 0 01111111 00000000000000000000000
0.5   | 0 01111110 00000000000000000000000
19.5  | 0 10000011 00111000000000000000000
-3.75 | 1 10000000 11100000000000000000000
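These encodings can be checked mechanically. The sketch below uses Python's standard `struct` module to round a value to the 4-byte IEEE-754 single-precision format and print its sign, exponent, and fraction fields (`single_bits` is a hypothetical helper name):

```python
import struct

# Pack a float into big-endian IEEE-754 single precision, then
# unpack the raw 32 bits and split them into the three fields.
def single_bits(x):
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{raw >> 31} {(raw >> 23) & 0xFF:08b} {raw & 0x7FFFFF:023b}"

print(single_bits(1.0))    # 0 01111111 00000000000000000000000
print(single_bits(0.5))    # 0 01111110 00000000000000000000000
print(single_bits(19.5))   # 0 10000011 00111000000000000000000
print(single_bits(-3.75))  # 1 10000000 11100000000000000000000
```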
Double Precision IEEE-754
sign (1 bit) | exponent (11 bits) | fraction (52 bits)
This representation uses an excess-1023
exponent.
 It assumes the same implied 1 to the left
of the radix point as single precision;
for example, 1.0 is stored with exponent
field 0 + 1023 = 1023.
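The same `struct` check works for double precision (1 + 11 + 52 bits, excess-1023); `double_bits` is again a hypothetical helper name:

```python
import struct

# Pack into big-endian IEEE-754 double precision and split the
# raw 64 bits into sign, 11-bit exponent, and 52-bit fraction.
def double_bits(x):
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"{raw >> 63} {(raw >> 52) & 0x7FF:011b} {raw & (2**52 - 1):052b}"

# 1.0 = 1.0 x 2^0, so the exponent field holds 0 + 1023 = 1023:
print(double_bits(1.0))  # 0 01111111111 followed by 52 zero fraction bits
```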

Range, Precision, and Accuracy

Range
In double precision, for example, the
number line divides into:
 negative overflow: below -1.0 x 10^308
 expressible negative numbers:
-1.0 x 10^308 to -1.0 x 10^-308
 negative underflow: between -1.0 x 10^-308 and 0
 0
 positive underflow: between 0 and 1.0 x 10^-308
 expressible positive numbers:
1.0 x 10^-308 to 1.0 x 10^308
 positive overflow: above 1.0 x 10^308
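Python's floats are IEEE-754 doubles, so the standard `sys.float_info` exposes this range directly, and arithmetic past its edges shows overflow and underflow in action:

```python
import sys

# The expressible double-precision range, roughly 10^-308 .. 10^308.
print(sys.float_info.max)          # about 1.8e308 (largest double)
print(sys.float_info.min)          # about 2.2e-308 (smallest normal)

print(sys.float_info.max * 2)      # overflow: result becomes inf
print(sys.float_info.min / 2**60)  # underflow: result becomes 0.0
```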

Accuracy
how close a number is to its true value;
for example, we can’t represent 0.1 exactly in
floating point, but we can still find a
representable number that is relatively close to 0.1
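The 0.1 example is easy to see from Python, whose floats are IEEE-754 doubles; the standard `decimal` module can display the exact value actually stored:

```python
from decimal import Decimal

# 0.1 has no finite binary expansion, so the stored double is only
# the nearest representable value; Decimal(0.1) shows it exactly.
print(Decimal(0.1))
# -> 0.1000000000000000055511151231257827021181583404541015625

# Each operand carries its own tiny error, so sums drift too:
print(0.1 + 0.2 == 0.3)  # False
```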

Precision
how much information we have about a value
and the amount of information used to
represent the value
for example, 1.666 has 4 decimal digits of
precision and 1.6660 has 5 decimal digits of
precision. Thus, the second number is more
precise than the first one.
Thank You
References
Wikipedia:
http://en.wikipedia.org/wiki/IEEE_754
 Books:
Computer Organization and Design
by Patterson, D. and Hennessy, J.
Computer Organization and Architecture
by Null, L. and Lobur, J.
