Floating Point Numbers
Dr. Mohsen NASRI
College of Computer and Information Sciences,
Majmaah University, Al Majmaah
Sign/Magnitude Representation
A signed number MUST have a sign (+/-). A method
is needed to represent the sign as part of the binary
representation.
In sign/magnitude (S/M) representation, the
leftmost bit of a binary code represents the sign
of the value:
0 for positive,
1 for negative.
The remaining bits represent the magnitude
(absolute value) of the number.
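A minimal Python sketch of the sign/magnitude scheme described above, assuming the code is held as a bit string of a chosen width n. The helper names sm_encode and sm_decode are hypothetical, used only for illustration.

```python
# Hypothetical helpers illustrating n-bit sign/magnitude codes.

def sm_encode(value: int, n: int) -> str:
    """Encode value as an n-bit sign/magnitude bit string (MSB = sign)."""
    magnitude = abs(value)
    if magnitude > 2 ** (n - 1) - 1:
        raise OverflowError(f"{value} does not fit in {n}-bit sign/magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{n - 1}b")  # remaining n-1 bits = magnitude


def sm_decode(bits: str) -> int:
    """Decode an n-bit sign/magnitude bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(sm_encode(5, 4))     # 0101
print(sm_encode(-5, 4))    # 1101
print(sm_decode("1000"))   # 0  (the "negative zero" code also decodes to 0)
```

Note that both 0000 and 1000 decode to zero, which is the duplicate zero that two’s complement removes.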
Signed Numbers with Two’s Complement
Re-order the negative numbers to eliminate one discontinuity.
Note: negative numbers still have 1 as the most significant bit (MSB).
[Figure: the sixteen 4-bit codes arranged in a circle; inner numbers = binary representation]
Code  Value    Code  Value
0000  +0       1000  -8
0001  +1       1001  -7
0010  +2       1010  -6
0011  +3       1011  -5
0100  +4       1100  -4
0101  +5       1101  -3
0110  +6       1110  -2
0111  +7       1111  -1
• Only one discontinuity now (between 0111 = +7 and 1000 = -8)
• Only one zero
• One extra negative number (-8 has no positive counterpart)
• Eight non-negative numbers (+0 to +7) and eight negative numbers (-1 to -8)
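The 4-bit table can be regenerated with a short Python sketch. It assumes codes are stored as plain non-negative integers 0 to 2^n - 1; tc_decode and tc_encode are hypothetical helper names. A code whose MSB is set stands for code - 2^n.

```python
def tc_decode(code: int, n: int = 4) -> int:
    """Interpret an n-bit code (0 .. 2**n - 1) as a two's complement value."""
    # If the MSB is set, the code represents code - 2**n.
    return code - (1 << n) if code & (1 << (n - 1)) else code


def tc_encode(value: int, n: int = 4) -> int:
    """Return the n-bit two's complement code for value by keeping the low n bits.

    No range check is done: values outside -2**(n-1) .. 2**(n-1)-1 simply wrap.
    """
    return value & ((1 << n) - 1)


for code in range(16):
    print(format(code, "04b"), tc_decode(code))   # e.g. the line "1101 -3"
```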
Two’s Complement Representation
What is the biggest reason two’s complement is used in most
systems today?
The binary codes can be added and subtracted
as if they were unsigned binary numbers,
without regard to the signs of the numbers
they actually represent.
For example, to add +4 and -3, we simply add
the corresponding binary codes, 0100 and
1101:
0100 (+4)
+1101 (-3)
0001 (+1)
NOTE: A carry out of the leftmost column has
been ignored.
The result, 0001, is the code for +1, which IS
the sum of +4 and -3.
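Ignoring the carry out of the leftmost column amounts to adding the codes as unsigned integers and keeping only the low n bits. A minimal sketch, assuming 4-bit codes held in Python ints; add_n_bits is a hypothetical name.

```python
def add_n_bits(a: int, b: int, n: int = 4) -> int:
    """Add two n-bit codes as unsigned numbers, discarding any carry out of the MSB."""
    return (a + b) & ((1 << n) - 1)   # the mask keeps only the low n bits


result = add_n_bits(0b0100, 0b1101)   # +4 plus -3
print(format(result, "04b"))          # 0001  (+1)
```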
Likewise, to subtract +7 from +3:
0011 (+3)
- 0111 (+7)
1100 (-4)
NOTE: A “phantom” 1 was borrowed from
beyond the leftmost position.
The result, 1100, is the code for -4, the result
of subtracting +7 from +3.
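The slide subtracts directly, borrowing a phantom 1. An equivalent route, swapped in here for illustration and not the method shown on the slide, is to add the two’s complement of the subtrahend (invert all bits, then add 1). A sketch assuming 4-bit codes; MASK4, negate and sub_4_bits are hypothetical names.

```python
MASK4 = 0b1111  # keeps only the low 4 bits


def negate(code: int) -> int:
    """Two's complement negation of a 4-bit code: invert all bits, then add 1."""
    return ((code ^ MASK4) + 1) & MASK4


def sub_4_bits(a: int, b: int) -> int:
    """Compute a - b on 4-bit codes by adding the two's complement of b."""
    return (a + negate(b)) & MASK4


print(format(negate(0b0111), "04b"))               # 1001  (-7)
print(format(sub_4_bits(0b0011, 0b0111), "04b"))   # 1100  (-4)
```

Either way the 4-bit result is the same code, 1100, because both methods compute the difference modulo 2^4.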
Summary: Benefits of Two’s Complement
• Addition and subtraction are simplified
in the two’s complement system.
• -0 has been eliminated and replaced by one
extra negative value, for which there is
no corresponding positive number.
Valid Ranges
For any integer data representation,
there is a LIMIT to the size of the numbers
that can be stored.
The limit depends on the number of bits
available for data storage.
Unsigned Integer Ranges
Range = 0 to $2^n - 1$
where n is the number of bits used to store
the unsigned integer.
Numbers with values GREATER than $2^n - 1$
would require more bits. If you try to store
too large a value without using more bits,
OVERFLOW will occur.
Example: On a system that stores
unsigned integers in 16-bit words:
Range = 0 to $2^{16} - 1$
= 0 to 65535
Therefore, you cannot store numbers
larger than 65535 in 16 bits.
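A quick check of the unsigned range formula, assuming Python’s standard struct module as a stand-in for a 16-bit storage word: format "<H" packs a 16-bit unsigned integer and raises struct.error when the value does not fit. unsigned_range is a hypothetical helper.

```python
import struct


def unsigned_range(n: int):
    """Smallest and largest value of an n-bit unsigned integer (hypothetical helper)."""
    return 0, 2 ** n - 1


print(unsigned_range(16))        # (0, 65535)

# "<H" = 16-bit unsigned integer, standard size, little-endian.
struct.pack("<H", 65535)         # largest value that fits: no error
try:
    struct.pack("<H", 65536)     # one past the limit
except struct.error as exc:
    print("overflow:", exc)
```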
Signed (Sign/Magnitude) Integer Ranges
Range = $-(2^{n-1} - 1)$ to $+(2^{n-1} - 1)$
where n is the number of bits used to store the
sign/magnitude integer.
Numbers with values GREATER than $+(2^{n-1} - 1)$
and values LESS than $-(2^{n-1} - 1)$ would
require more bits. If you try to store too
large/too small a value without using more bits,
OVERFLOW will occur.
Example: On a system that stores sign/magnitude
integers in 16-bit words:
Range = $-(2^{15} - 1)$ to $+(2^{15} - 1)$
= -32767 to +32767
Therefore, you cannot store numbers larger
than 32767 or smaller than -32767 in 16 bits.
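To confirm the sign/magnitude range, the sketch below decodes every n-bit code (MSB = sign, remaining bits = magnitude) and reports the extremes. It also shows that two codes (+0 and -0) map to the same value, which is why the range holds one value fewer than 2^n. sign_magnitude_values is a hypothetical name.

```python
def sign_magnitude_values(n: int):
    """Decode every n-bit code as sign/magnitude (hypothetical helper).

    The MSB is the sign; the remaining n-1 bits are the magnitude.
    """
    values = []
    for code in range(2 ** n):
        magnitude = code & (2 ** (n - 1) - 1)   # low n-1 bits
        negative = code >> (n - 1)              # the sign bit (0 or 1)
        values.append(-magnitude if negative else magnitude)
    return values


vals = sign_magnitude_values(16)
print(min(vals), max(vals))          # -32767 32767
print(len(vals) - len(set(vals)))    # 1  (+0 and -0 both decode to 0)
```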
Two’s Complement Ranges
Range = $-2^{n-1}$ to $+(2^{n-1} - 1)$
where n is the number of bits used to store the
two’s complement signed integer.
Numbers with values GREATER than $+(2^{n-1} - 1)$
and values LESS than $-2^{n-1}$ would require
more bits. If you try to store too large/too small
a value without using more bits, OVERFLOW will occur.
Example: On a system that stores two’s complement
integers in 16-bit words:
Range = $-2^{15}$ to $+(2^{15} - 1)$
= -32768 to +32767
Therefore, you cannot store numbers larger
than 32767 or smaller than -32768 in 16 bits.
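The two’s complement range can be checked the same way. The sketch assumes Python’s struct format "<h" (a 16-bit signed integer stored in two’s complement, little-endian) as the storage word; twos_complement_range is a hypothetical helper.

```python
import struct


def twos_complement_range(n: int):
    """Smallest and largest value of an n-bit two's complement integer (hypothetical helper)."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


print(twos_complement_range(16))          # (-32768, 32767)
print(twos_complement_range(5))           # (-16, 15)

# "<h" = 16-bit signed integer, two's complement, little-endian.
print(struct.pack("<h", -32768).hex())    # 0080  (code 1000 0000 0000 0000)
try:
    struct.pack("<h", 32768)              # one past the positive limit
except struct.error as exc:
    print("overflow:", exc)
```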
Using Ranges for Validity Checking
Once you know how small/large a value
can be stored in n bits, you can use this
knowledge to check whether your
answers are valid or cause overflow.
Overflow can only occur if you are
adding two positive numbers or two
negative numbers.
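The rule above translates into a simple check: overflow has occurred when both operands have the same sign bit but the result’s sign bit differs. A minimal sketch, assuming n-bit codes held as Python ints; add_checked is a hypothetical name. The usage lines reproduce Ex 1 and Ex 2.

```python
def add_checked(a: int, b: int, n: int):
    """Add two n-bit two's complement codes (hypothetical helper).

    Returns (result_code, overflowed). Overflow is flagged when both operands
    have the same sign bit but the result's sign bit differs.
    """
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    result = (a + b) & mask                       # carry out of the MSB is dropped
    same_sign_inputs = (a & sign) == (b & sign)
    overflowed = same_sign_inputs and (result & sign) != (a & sign)
    return result, overflowed


# Ex 1 and Ex 2, in 5 bits.
print(add_checked(0b11111, 0b11101, 5))   # (28, False) -> 11100 = -4, valid
print(add_checked(0b10111, 0b10101, 5))   # (12, True)  -> 01100, overflow
```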
Ex 1:
Given the following 2’s complement
addition in 5 bits, is the answer valid?
11111 (-1)
+11101 (-3)
11100 (-4)
Range = -16 to +15
-4 is within the range: VALID
Ex 2:
Given the following 2’s complement
addition in 5 bits, is the answer valid?
10111 (-9)
+10101 (-11)
01100 (reads as +12; the true sum is -20)
Range = -16 to +15
-20 is outside the range (two negatives gave a positive result): INVALID (overflow)
Floating Point Numbers
Now you've seen unsigned and signed
integers. In real life we also need to be able to
represent numbers with fractional parts (like 12.5 and 45.39).
These are called floating point numbers.
You will learn the IEEE 32-bit floating
point representation.
In the decimal system, a decimal point
(radix point) separates the whole
number part from the fractional part.
Examples:
37.25 (whole = 37, fraction = 25/100)
123.567
10.12345678
For example, 37.25 can be analyzed as:
Place value   Name         Digit
$10^1$        Tens         3
$10^0$        Units        7
$10^{-1}$     Tenths       2
$10^{-2}$     Hundredths   5
37.25 = (3 x 10) + (7 x 1) + (2 x 1/10) + (5 x 1/100)
Binary Equivalence
The binary equivalent of a floating point number
can be determined by computing the binary
representation for each part separately.
1) For the whole part:
Use the subtraction or division method
previously learned.
2) For the fractional part:
Use the subtraction or multiplication
method (to be shown next).
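For the whole part, the division method repeatedly divides by 2 and reads the remainders in reverse order. A sketch in Python (whole_to_binary is a hypothetical name); in practice Python’s built-in format(n, "b") gives the same digits.

```python
def whole_to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)
        bits.append(str(remainder))   # remainders come out least significant first
    return "".join(reversed(bits))


print(whole_to_binary(37))   # 100101
```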
Fractional Part – Multiplication Method
In the binary representation of a floating point
number the column values will be as follows:
… $2^5$  $2^4$  $2^3$  $2^2$  $2^1$  $2^0$ . $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$ …
… 32  16  8  4  2  1 . 1/2  1/4  1/8  1/16 …
… 32  16  8  4  2  1 . .5  .25  .125  .0625 …
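The multiplication method itself is not spelled out on this slide; the sketch below uses the usual procedure, stated here as an assumption: double the fraction, record the digit that lands in the units column (0 or 1), drop it, and repeat. fractions.Fraction keeps the arithmetic exact, and max_bits (a hypothetical parameter) stops fractions such as 0.1 that never terminate in binary.

```python
from fractions import Fraction


def fraction_to_binary_mult(frac: Fraction, max_bits: int = 23) -> str:
    """Binary digits of a fraction 0 <= frac < 1 by repeated multiplication by 2.

    Each doubling moves the next binary digit into the units column;
    that digit (0 or 1) is recorded and removed before the next step.
    """
    bits = []
    while frac and len(bits) < max_bits:
        frac *= 2
        bit = int(frac)            # 1 if frac >= 1, else 0
        bits.append(str(bit))
        frac -= bit
    return "".join(bits)


print(fraction_to_binary_mult(Fraction(1, 4)))      # 01
print(fraction_to_binary_mult(Fraction("0.625")))   # 101
```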
Fractional Part – Subtraction Method
Start with the column values again, as follows:
… $2^0$ . $2^{-1}$  $2^{-2}$  $2^{-3}$  $2^{-4}$  $2^{-5}$  $2^{-6}$ …
… 1 . 1/2  1/4  1/8  1/16  1/32  1/64 …
… 1 . .5  .25  .125  .0625  .03125  .015625 …
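The subtraction method walks the column values 1/2, 1/4, 1/8, … from left to right: if the current column value fits into what remains, write 1 and subtract it, otherwise write 0. A sketch with exact Fraction arithmetic; the function name and the max_bits limit are hypothetical.

```python
from fractions import Fraction


def fraction_to_binary_sub(frac: Fraction, max_bits: int = 23) -> str:
    """Binary digits of 0 <= frac < 1 by subtracting column values 1/2, 1/4, 1/8, ..."""
    bits = []
    column = Fraction(1, 2)
    while frac and len(bits) < max_bits:
        if frac >= column:        # this column value fits: write 1 and subtract it
            bits.append("1")
            frac -= column
        else:                     # it does not fit: write 0
            bits.append("0")
        column /= 2               # move one column to the right
    return "".join(bits)


print(fraction_to_binary_sub(Fraction("0.625")))   # 101  (.5 + .125)
```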
Problem storing binary form
We have no way to store the radix point!
A standards committee (IEEE) came up with a way
to store floating point numbers (numbers that have
a radix point): the IEEE 754 standard.
IEEE 754 Single Precision Floating Point Format
Representation (32 bits):
S (1 bit) | E (8 bits) | M (23 bits)
•S is one bit representing the sign of the number
•E is an 8-bit biased integer representing the exponent
•M is the 23-bit mantissa (fraction) field: the bits after the implicit leading 1 of a normalized number
The true value represented is:
$(-1)^S \times (1.M)_2 \times 2^{e}$
• S = sign bit
• e = E – bias
• Bias = 127
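A sketch that evaluates the single-precision formula, assuming a normalized number whose M field holds the 23 bits after the implicit leading 1 (so the mantissa is 1 + M/2^23). The function name is hypothetical; zero, subnormals, infinities and NaNs (E = 0 or E = 255) are deliberately rejected rather than handled.

```python
def ieee754_single_value(s: int, e: int, m: int) -> float:
    """Value of a normalized IEEE 754 single-precision number from its fields.

    s: sign bit (0 or 1); e: 8-bit biased exponent; m: 23-bit fraction field.
    Hypothetical helper for illustration only.
    """
    if not 1 <= e <= 254:
        # E = 0 (zero/subnormals) and E = 255 (infinity/NaN) follow other rules.
        raise ValueError("only normalized numbers (1 <= E <= 254) are handled")
    bias = 127
    mantissa = 1 + m / 2 ** 23           # implicit leading 1 plus the fraction bits
    return (-1) ** s * mantissa * 2.0 ** (e - bias)


print(ieee754_single_value(0, 0b10000100, 0b00101010000000000000000))  # 37.25
```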
The first (leftmost) field of our floating
point representation will STILL be the
sign bit:
0 for a positive number,
1 for a negative number.
S, E and M all represent fields within a representation. Each is just
a bunch of bits. (M is also called F, the fraction.)
S is the sign bit
• $(-1)^S$: $(-1)^0 = +1$ and $(-1)^1 = -1$
• Just a single sign bit, as in sign/magnitude
E is the exponent field
• The E field is a biased-127 representation.
• True exponent is (E – bias)
Every binary number, except the one
corresponding to the number zero, can be
normalized by choosing the exponent so that the
radix point falls to the right of the leftmost 1 bit.
$37.25_{10} = 100101.01_2 = 1.0010101 \times 2^5$
S = 0 | E = 10000100 (5 + 127 = 132) | M = 00101010000000000000000
$7.625_{10} = 111.101_2 = 1.11101 \times 2^2$
S = 0 | E = 10000001 (2 + 127 = 129) | M = 11101000000000000000000
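The encodings of 37.25 and 7.625 can be checked with Python’s struct module: packing with format ">f" rounds the value to single precision and yields its 4 raw bytes, which can be reinterpreted as an unsigned 32-bit integer and sliced into the three fields. single_fields is a hypothetical helper name.

```python
import struct


def single_fields(x: float):
    """Split x (rounded to single precision) into its S, E and M bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    return format(s, "01b"), format(e, "08b"), format(m, "023b")


print(single_fields(37.25))   # ('0', '10000100', '00101010000000000000000')
print(single_fields(7.625))   # ('0', '10000001', '11101000000000000000000')
```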
IEEE 754 Double Precision Floating Point Format
Representation (64 bits):
S (1 bit) | E (11 bits) | M (52 bits)
•S is one bit representing the sign of the number
•E is an 11-bit biased integer representing the exponent
•M is the 52-bit mantissa (fraction) field
The true value represented is:
$(-1)^S \times (1.M)_2 \times 2^{e}$
• S = sign bit
• e = E – bias
• Bias = 1023
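Python floats are IEEE 754 double-precision values on common platforms, so the same struct-based approach, using ">d" and 64-bit slicing, exposes the 1/11/52-bit layout. double_fields is a hypothetical name.

```python
import struct


def double_fields(x: float):
    """Split x into its IEEE 754 double-precision S (1), E (11) and M (52) bit strings."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))   # raw 64-bit pattern
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    m = bits & ((1 << 52) - 1)
    return format(s, "01b"), format(e, "011b"), format(m, "052b")


s, e, m = double_fields(37.25)
print(s, e, int(e, 2) - 1023)   # 0 10000000100 5  (true exponent = E - 1023)
```

The fraction bits of 37.25 are the same 0010101 as in single precision; only the exponent width and bias change.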