Floating Point Presentation


Data Representation
Floating Point
13/04/2015
1
Learning Objectives:
Demonstrate an understanding of floating
point representation of a real binary
number.
Normalise a real binary number
Discuss the trade-off between accuracy
and range when representing numbers.
Binary Range
Limited by the number of bits used to
represent a number.
More bits means a wider range.
But even using 4 bytes (32 bits) to
represent an unsigned whole number means that
4,294,967,295 (2^32 - 1) is the largest number which
can be held.
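As a quick check of that limit, here is a minimal Python sketch of the range for a given number of bits. The function names `unsigned_range` and `signed_range` are made up for this illustration.

```python
def unsigned_range(bits):
    """Return (smallest, largest) unsigned integer that fits in `bits` bits."""
    return 0, 2 ** bits - 1


def signed_range(bits):
    """Return (smallest, largest) two's complement integer for `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for bits in (8, 16, 32):
    print(bits, unsigned_range(bits), signed_range(bits))
```

Each extra bit doubles the number of values available, which is why more bits means a wider range.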
Fixed Point Binary
A number with a decimal point is known
(strangely!) as a real number as opposed to an
integer which is a whole number.
We can extend the binary system to represent
real numbers by reserving some bits for the real
or fractional part.
Place value:  8   4   2   1   .   ½   ¼   1/8   1/16
Bit:          0   1   1   0   .   1   1   0     0
6.75 = 0110.1100
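The place values can be added up in code. This is a small sketch only; the function name `fixed_point_to_denary` is invented for the example and it handles unsigned values only.

```python
def fixed_point_to_denary(bits):
    """Convert an unsigned fixed point binary string such as '0110.1100'."""
    whole, _, fraction = bits.partition(".")
    value = int(whole, 2) if whole else 0
    # Bits after the point are worth 1/2, 1/4, 1/8, ... in turn.
    for position, bit in enumerate(fraction, start=1):
        if bit == "1":
            value += 2 ** -position
    return value


print(fixed_point_to_denary("0110.1100"))  # 6.75
```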
Fixed Point Binary Range
The range is now even more limited as
now some bits are reserved for the real /
fractional part and so are no longer being
used to hold higher numbers!
The range is even more limited if we also
wish to represent negative numbers, as
the first (most significant) bit will need to be a sign bit.
Fixed Point Binary - Decimal
Converter
Try using it to ‘play’ with fixed point binary.
Fixed Point Binary Precision
Also the fractional part can only hold 4 places;
any places after the first 4 will be either rounded
or truncated, so precision will be lost.
This might at first appear to be accurate enough for
most purposes.
However, each binary digit after the point is
worth half of the previous one, not 1/10 of it as in
decimal values.

Example shown on next slide.
Fixed Point Binary Precision
110.1 = 6.5
110.11 = 6.75
We have missed out 6.51 to 6.74!
This means accuracy is poor.
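To see the gaps, the sketch below rounds a few denary values to the nearest number that a given count of fractional bits can hold. The helper name `nearest_fixed_point` is an assumption for this example, and it uses Python's built-in `round`.

```python
def nearest_fixed_point(value, fraction_bits):
    """Round `value` to the nearest multiple of 2**-fraction_bits."""
    step = 2 ** -fraction_bits
    return round(value / step) * step


# With 2 fractional bits the only values near 6.6 are 6.5 and 6.75.
for target in (6.5, 6.6, 6.7, 6.75):
    print(target, "->", nearest_fixed_point(target, 2))
```

With 4 fractional bits the step shrinks to 1/16, so 6.6 would be stored as 6.625: closer, but still not exact.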
Floating Point (Fractional Real
Numbers)
This increases the possible range of stored real numbers
but not the accuracy (which is only achieved by using more bits).
e.g. standard form (also referred to as "scientific notation"):
1,400,000,000,000 (decimal) = 1.4 * 10^12
1.4 = mantissa, 12 = exponent
A number is therefore held in two parts:

Mantissa
Some websites also state that this is known more correctly as the
Significand and that the Mantissa is the number before the decimal
point. But Cambridge exams have always used the term Mantissa
for the “whole digit string” e.g. 14 in the example above (at least up to
now).

Exponent
The number could therefore be represented as 14 12, if it was understood
that the first part is the mantissa and the second part is the exponent.
Mantissa & Exponent – 1 byte each
Most exam questions appear to use 8 bits for the
mantissa and 8 bits for the exponent.
Try ‘playing’ with the Floating Point Binary Decimal Converter.
Also use it whenever you need to for the rest of
this presentation.
Mantissa
Represents the magnitude of the number
and is the fractional part of the
representation.

Place value of the MSB is –1 and the other bits
are ½, ¼, 1/8 and so on.
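The place values just described can be summed directly. This is an illustrative sketch; the function name `mantissa_value` is not from the slides.

```python
def mantissa_value(bits):
    """Sum the place values of a two's complement fractional mantissa."""
    # The first bit is worth -1; the rest are worth 1/2, 1/4, 1/8, ...
    weights = [-1.0] + [2.0 ** -i for i in range(1, len(bits))]
    return sum(w for w, b in zip(weights, bits) if b == "1")


print(mantissa_value("01101000"))  # 0.8125
print(mantissa_value("10100000"))  # -0.75
```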
Exponent
Represents the power of 2 by which the
mantissa must be multiplied to give the
original value.
Positive Mantissa
&
Positive Exponent
Denary -> Floating Point Binary
6.5
6.5 = 6 ½ = 110.1000 (fixed point binary)
= 0.1101 * 2^3
= 0.1101 * 2^11 (exponent in binary)
Add 0’s to the right of the mantissa and to the left of the
exponent digits (after the sign bit).
= 0 1101000 (mantissa, sign bit first)
  0 0000011 (exponent, sign bit first)
Try this independently first.
Using an 8 bit byte for the
mantissa and another 8 bit byte
for the exponent show 1.75 as a
2 byte, floating point number in
two’s complement form.
1.75
1.75 = 1 + ½ + ¼
= 1.11 (binary – fixed point)
= 0.111 * 2^1
= 0.111 * 2^00000001
= 01110000 (mantissa) 00000001 (exponent)
Positive Mantissa
&
Positive Exponent
Floating Point Binary -> Denary
01101000 00000011
00000011 = 3
0.1101000 * 2^3 = 110.1
(Assumed binary point between sign bit and 2nd bit.)
= 6.5
Try this independently first.
Using 8 bits for the mantissa, 8 bits for
the exponent and storing the mantissa
and the exponent in two’s complement
form.
Give the denary number which would
have 01011000 00000011 as its
binary, floating point representation.
01011000 00000011
00000011 = 3
0.1011000 * 2^3 = 101.1000
(Assumed binary point between sign bit and 2nd bit.)
= 5.5
Positive Mantissa
&
Negative Exponent
Denary -> Floating Point Binary
0.125
0.125 = 1/8
= 0.001 (binary – fixed point)
= 0.1 * 2^-2
-2 = - 00000010
= 1 1111110 (two’s complement)
= 01000000 11111110
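The two's complement step for the exponent can be checked with a short sketch. The function name `to_twos_complement` is hypothetical, and masking is just one convenient way to get the bit pattern.

```python
def to_twos_complement(value, bits=8):
    """Return `value` as a two's complement bit string of width `bits`."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    # Masking keeps only the lowest `bits` bits of the (possibly negative) value.
    return format(value & ((1 << bits) - 1), "0{}b".format(bits))


print(to_twos_complement(-2))  # 11111110
print(to_twos_complement(3))   # 00000011
```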
Try this independently first.
Using an 8 bit byte for the
mantissa and another 8 bit byte
for the exponent show 0.375 as
a 2 byte, floating point number
in two’s complement form.
0.375
0.375 = ¼ + 1/8
= 0.011 (binary – fixed point)
= 0.11 * 2^-1
-1 = - 00000001
= 1 1111111 (two’s complement)
= 0 1100000 (mantissa) 1 1111111 (exponent)
Positive Mantissa
&
Negative Exponent
Floating Point Binary -> Denary
01000000 11111110
11111110 is negative, so undo two’s complement:
- 00000010 = -2
0.1000000 * 2^-2 = 0.001 (binary – fixed point)
= 1/8
= 0.125
Try this independently first.
Using 8 bits for the mantissa, 8 bits for
the exponent and storing the mantissa
and the exponent in two’s complement
form.
Give the denary number which would
have 01100000 11111111 as its
binary, floating point representation.
01100000 11111111
11111111 (undo two’s complement)
= - 00000001
= -1 (decimal)
0.1100000 * 2^-1 = 0.01100000
= ¼ + 1/8
= 0.25 + 0.125
= 0.375
Negative Mantissa
&
Positive Exponent
Denary -> Floating Point Binary
- 1.5
1.5 = 1.1 (binary)
- 1.1 = - 0.11 * 2^1
= 1.01 * 2^00000001 (mantissa converted to two’s complement)
= 1 0100000 (mantissa) 0 0000001 (exponent)
Try this independently first.
Using an 8 bit byte for the
mantissa and another 8 bit byte
for the exponent show -1.25 as
a 2 byte, floating point number
in two’s complement form.
- 1.25
- 1.25 = - (1 + ¼)
= - 1.01 (binary – fixed point)
= - 0.101 * 2^1
= 1.011 * 2^00000001 (mantissa converted to two’s complement)
= 1 0110000 (mantissa) 0 0000001 (exponent)
Negative Mantissa
&
Positive Exponent
Floating Point Binary -> Denary
11101000 00000011
00000011 = 3
1.1101000 * 2^3 = - 0.0011000 * 2^3 (undo two’s complement of the mantissa)
= - 0001.1
= - 1.5
• You may notice that, as shown previously, -1.5 can also be
shown as 1 0100000 0 0000001.
• This is because 11101000 00000011 is not normalised,
which is something we will look at later.
Try this independently first.
Using 8 bits for the mantissa, 8 bits for
the exponent and storing the mantissa
and the exponent in two’s complement
form.
Give the denary number which would
have 10111010 00000011 as its
binary, floating point representation.
10111010 00000011
0 0000011 = 3
1.0111010 * 2^3 = - 0.1000110 * 2^3 (undo two’s complement of the mantissa)
= - 0100.0110
= - (4 + ¼ + 1/8)
= - 4.375
Negative Mantissa
&
Negative Exponent
Denary -> Floating Point Binary
- 0.125
- 0.125 = - 1/8
= - 0.001 (binary – fixed point)
= - 0.1 * 2^-2 = - 0.1 * 2^-00000010
= 1.1000000 * 2^11111110 (both parts in two’s complement)
= 1 1000000 (mantissa) 11111110 (exponent)
Try this independently first.
Using an 8 bit byte for the
mantissa and another 8 bit byte
for the exponent show -0.25 as
a 2 byte, floating point number
in two’s complement form.
- 0.25
- 0.25 = - ¼
= - 0.01 (binary – fixed point)
= - 0.1 * 2^-1
= - 0.1000000 * 2^-00000001
= 1.1000000 * 2^11111111 (both parts in two’s complement)
= 11000000 (mantissa) 11111111 (exponent)
Negative Mantissa
&
Negative Exponent
Floating Point Binary -> Denary
10000000 11111101
11111101 (undo two’s complement)
= - 00000011
= -3
1.0000000 * 2^-3 = - 1.0000000 * 2^-3
= - 0.001
= - 1/8
= - 0.125
Note that the mantissa looks the same in two’s complement form
as in non-two’s-complement form, because its last (and only) 1
is at the beginning.
Try this independently first.
Using 8 bits for the mantissa, 8 bits for
the exponent and storing the mantissa
and the exponent in two’s complement
form.
Give the denary number which would
have 10000000 11111110 as its
binary, floating point representation.
10000000 11111110
11111110 (undo two’s complement)
= - 00000010
= -2
1.0000000 * 2^-2 = - 1.0000000 * 2^-2
= - 0.01
= - 0.25
• You may notice that, as shown previously, -0.25
can also be shown as 1 1000000 1 1111111.
• This is because 1 1000000 1 1111111 is not normalised, whereas
10000000 11111110 is; normalisation is something we will look at next.
Denary -> Floating Point
1. Convert fractional part of denary number to fractions.
2. Convert to fixed point binary (keep – sign if exists).
3. Move binary point to left hand side of first 1 and count
how many places and note direction needed.
4. 0.number * 2^no. of places needed in step 3 (denary).
5. 0.number * 2^no. of places needed in step 3 (binary).
• If moved right then use – sign and then convert to two’s
complement.
6. 1st binary number: remove binary point (keeping 1st
0) and add any necessary 0’s to right (to make 8 bits).
• Convert to two’s complement if negative and remove – sign.
This is the Mantissa.
7. Add any necessary 0’s to left of 2nd binary number
(to make 8 bits, including sign bit). This
is the Exponent.
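The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive converter: the function name `encode` is invented, `math.frexp` is used as a shortcut for "move the binary point to the left of the first 1", and any bits beyond the 7 after the point are simply truncated.

```python
import math


def encode(value):
    """Return (mantissa, exponent) as 8-bit two's complement strings."""
    if value == 0:
        raise ValueError("zero has no first 1 to move the point next to")
    # Steps 2-5: frexp gives |value| = fraction * 2**exponent with
    # 0.5 <= fraction < 1, i.e. the point sits just left of the first 1.
    fraction, exponent = math.frexp(abs(value))
    if not -128 <= exponent <= 127:
        raise ValueError("exponent does not fit in 8 bits")
    # Step 6: keep 7 bits after the point (anything further is truncated),
    # then negate if the original number was negative.
    mantissa = int(fraction * 128)
    if value < 0:
        mantissa = -mantissa
    # Masking with 0xFF yields the 8-bit two's complement pattern.
    return format(mantissa & 0xFF, "08b"), format(exponent & 0xFF, "08b")


print(encode(6.5))    # ('01101000', '00000011')
print(encode(-1.5))   # ('10100000', '00000001')
print(encode(-0.25))  # ('11000000', '11111111')
```

Like the hand method, this can give a negative mantissa whose first two bits are the same (as for -0.25), which is a valid pattern for the number but not the only one.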
Floating Point -> Denary
1. Convert exponent to denary.
2. If sign bit = 1 then flip to convert from two’s
complement.
3. Mantissa * 2^exponent (denary).
• Convert mantissa from two’s complement if sign bit =
1 and insert for our benefit a – sign.
• Insert assumed binary point after the sign bit.
4. Move the binary point the exponent number of
places (> +, < -).
5. Convert to denary as fixed point binary.
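The same conversion can be sketched in Python. Instead of physically moving the binary point, this sketch reads each byte as a two's complement integer and does the arithmetic directly, which gives the same result; the names `decode` and `signed` are made up for the example.

```python
def decode(mantissa_bits, exponent_bits):
    """Return the denary value of a mantissa/exponent pair of bit strings."""

    def signed(bits):
        # Two's complement: a leading 1 means subtract 2**width.
        value = int(bits, 2)
        return value - (1 << len(bits)) if bits[0] == "1" else value

    exponent = signed(exponent_bits)
    # The binary point sits after the sign bit, so divide by 2**(width - 1).
    mantissa = signed(mantissa_bits) / (1 << (len(mantissa_bits) - 1))
    return mantissa * 2.0 ** exponent


print(decode("01101000", "00000011"))  # 6.5
print(decode("10111010", "00000011"))  # -4.375
print(decode("10000000", "11111101"))  # -0.125
```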
Decimal Normalisation
34,568,000 = 3456.8 x 10^4
= 0.34568 x 10^8
= 3.4568 x 10^7
The last way is more efficient and is the
typical “correct” way to use scientific
notation.
This form is called the normalised form.
Floating Point Binary Normalisation
In binary the normalised form is used to
maximise efficiency and to have only one
way to represent a number.
The mantissa is said to be normalised if
the first two bits are different.
• For positive numbers, the first bit is always 0
and the second is always 1.
• For negative numbers, the first bit is always 1
and the second is always 0.
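The rule is easy to test in code. A minimal sketch, with `is_normalised` as an invented name:

```python
def is_normalised(mantissa_bits):
    """A mantissa is normalised when its first two bits differ."""
    return mantissa_bits[0] != mantissa_bits[1]


print(is_normalised("01101000"))  # True  (positive, starts 01)
print(is_normalised("10100000"))  # True  (negative, starts 10)
print(is_normalised("11101000"))  # False (negative, starts 11)
```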
Normalising Floating Point
Numbers
1. Convert the exponent to denary.
2. Shift the mantissa (not the sign bit) as
many places to left as necessary to
achieve a leading 1 (if positive i.e. sign bit =
0) or a leading 0 (if negative i.e. sign bit = 1).
3. Subtract the number of places that were
necessary from the exponent and
convert back to binary.
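The three steps above can be sketched in Python. This is one possible reading of the method, with `normalise` as a hypothetical function name; it works on bit strings so that the shifting is visible.

```python
def normalise(mantissa_bits, exponent_bits):
    """Return the normalised (mantissa, exponent) pair as bit strings."""
    if "1" not in mantissa_bits:
        raise ValueError("zero cannot be normalised")
    sign, rest = mantissa_bits[0], mantissa_bits[1:]
    # Step 1: convert the exponent to denary (two's complement).
    width = len(exponent_bits)
    exponent = int(exponent_bits, 2)
    if exponent_bits[0] == "1":
        exponent -= 1 << width
    # Step 2: shift left (not the sign bit) until the next bit differs from it.
    shifts = 0
    while rest[0] == sign:
        rest = rest[1:] + "0"
        shifts += 1
    # Step 3: subtract the number of shifts and convert back to binary.
    exponent -= shifts
    if exponent < -(1 << (width - 1)):
        raise ValueError("exponent no longer fits in the available bits")
    mask = (1 << width) - 1
    return sign + rest, format(exponent & mask, "0{}b".format(width))


print(normalise("00001101", "00000010"))  # ('01101000', '11111111')
print(normalise("11111001", "00000011"))  # ('10010000', '11111111')
```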
0 0001101 00000010
1. The exponent 00000010 = 2
2. The mantissa 0 0001101 has to be shifted (3x)
left to achieve a leading 1 (not including the sign bit)
i.e. 0 1101000
3. So exponent should be 2 – 3 = -1
= - 00000001
= 1 1111111
So normalised: 01101000 11111111
1 1111001 00000011
1. 00000011 = 3
2. 1 1111001 has to be shifted (4x) left to achieve
a leading 0 (not including the sign bit).
3. So exponent should be 3 - 4 = -1
= - 00000001
= 1 1111111
So normalised: 10010000 11111111
Try this independently first.
Normalise these floating point
binary numbers.
11101000 00000011
11000000 11111111
11101000 00000011
1. 00000011 = 3
2. 11101000 has to be shifted (2x) left to achieve
a leading 0 (not including the sign bit) to make
10100000.
3. So exponent should be 3 – 2 = 1
= 00000001
So normalised: 1 0100000 0 0000001
11000000 11111111
1. 11111111 = - 00000001
= -1 (denary)
2. 11000000 has to be shifted (1x) left to make the
2nd bit 0 and achieve 10000000.
3. So exponent should be -1 – 1 = -2
= - 00000010
= 11111110
So normalised: 10000000 11111110
If you are asked to give the floating
point binary form of a decimal and
to make sure it is normalised,
then convert as practised and
normalise if necessary.
Numbers are held in floating point form with one byte for the
mantissa (fraction) and one byte for the exponent
(characteristic). All values are held in two’s complement form
and the mantissa is normalised.
Using this format, write down the binary floating point values
and the denary values of
(i) the largest magnitude, positive number;
(ii) the smallest magnitude, positive number;
(iii) the largest magnitude, negative number;
(iv) the smallest magnitude, negative number.
(The denary values may be left as a product of a power of 2).
Floating Point Binary - Decimal
converter
Either use your own understanding or the Floating
Point Binary - Decimal Converter to help
you do the last slide independently first.
The largest magnitude, positive number that can
be held in a floating point system using 8 bits for
the mantissa and 8 bits for the exponent.
0 1111111 * 2^(0 1111111) = 127/128 * 2^127
The smallest magnitude, positive number that can
be held in a floating point system using 8 bits for
the mantissa and 8 bits for the exponent.
0 1000000 * 2^(1 0000000) = 0.5 * 2^-128
The largest magnitude, negative number that can
be held in a floating point system using 8 bits for
the mantissa and 8 bits for the exponent.
1 0000000 * 2^(0 1111111) = - 1 * 2^127
The smallest magnitude, negative number that
can be held in a floating point system using 8 bits
for the mantissa and 8 bits for the exponent.
1 0111111 * 2^(1 0000000) = - 65/128 * 2^-128
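The four extreme values can be confirmed by brute force, since there are only 256 possible mantissa patterns. This is an illustrative sketch; the helper names `signed` and `mantissa` and the `extremes` dictionary are assumptions for the example.

```python
from fractions import Fraction


def signed(bits):
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def mantissa(bits):
    """Value of an 8-bit mantissa with the point after the sign bit."""
    return Fraction(signed(bits), 128)


# All 8-bit mantissas whose first two bits differ, i.e. the normalised ones.
patterns = [format(m, "08b") for m in range(256)]
normalised = [p for p in patterns if p[0] != p[1]]
positive = [p for p in normalised if p[0] == "0"]
negative = [p for p in normalised if p[0] == "1"]

# Largest magnitudes pair with exponent +127, smallest with exponent -128.
extremes = {
    "largest positive": (max(positive, key=mantissa), "01111111"),
    "smallest positive": (min(positive, key=mantissa), "10000000"),
    "largest negative": (min(negative, key=mantissa), "01111111"),
    "smallest negative": (max(negative, key=mantissa), "10000000"),
}
for name, (m, e) in extremes.items():
    print(name, m, e, "=", mantissa(m), "* 2^", signed(e))
```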
Improving Accuracy of Binary
Floating Point Numbers
If we want to improve accuracy we must use
more bits for the mantissa by reducing the
number of bits for the exponent.
• More digits can then be represented after the binary
point.
However, the range would be decreased as the
exponent could not be as large as before.
• So the power of two which the mantissa is multiplied
by is decreased.
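The trade-off can be put into numbers with a short sketch. It assumes a fixed total of 16 bits split between the two parts, and the function name `tradeoff` is invented for the illustration.

```python
from fractions import Fraction


def tradeoff(mantissa_bits, total_bits=16):
    """Return (smallest mantissa step, largest exponent) for a given split."""
    exponent_bits = total_bits - mantissa_bits
    # One bit of the mantissa is the sign; the rest follow the binary point.
    mantissa_step = Fraction(1, 2 ** (mantissa_bits - 1))
    # Two's complement exponent: the largest value is 2**(bits - 1) - 1.
    max_exponent = 2 ** (exponent_bits - 1) - 1
    return mantissa_step, max_exponent


for m in (8, 10, 12):
    step, biggest = tradeoff(m)
    print(m, "mantissa bits: step", step, "largest exponent", biggest)
```

Moving two bits from the exponent to the mantissa makes the step four times finer (1/128 to 1/512) but cuts the largest exponent from 127 to 31.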
Representing Zero
Using the Floating Point Binary - Decimal
converter:


• Try representing 0 as a non-normalised binary
floating point number.
• Now try representing 0 as a normalised
floating point number.
Can you? Why?
Representing Zero
A normalised value must have the first two
bits of the mantissa different.
Therefore one must be a 1 which must
represent either -1 or + ½ , but not zero.
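This can be confirmed by listing every normalised 8-bit mantissa and checking that none of them is zero. A minimal sketch; the variable name `normalised_values` is made up.

```python
# Every 8-bit pattern whose top two bits differ, read as a two's
# complement fraction with the binary point after the sign bit.
normalised_values = [
    (m - 256 if m >= 128 else m) / 128
    for m in range(256)
    if (m >> 7) != ((m >> 6) & 1)
]

print(0 in normalised_values)                  # False
print(min(abs(v) for v in normalised_values))  # 0.5
```

Since the mantissa can never be zero and multiplying by a power of two cannot produce zero, no normalised pattern represents 0.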
Floating Point Binary
You may now be thinking ‘If the range is so large
why don’t we use floating point binary
representation for all numbers (including
integers)?’
However, it is more complicated to perform
arithmetic on floating point numbers than
integers and so they are slower to work with.
Because of this, floating point representation is
only used for real (fractional) numbers or for
integers outside the range of about -2 billion to +2
billion (which is the limit for 4 byte signed integer
representation).
Plenary
Give the denary number which would have
01000000 00000000 as its binary, floating
point representation in this computer.
Plenary
½ or 0.5
Plenary
Show
• 10½
• -10½
as 2 byte, normalised, floating point
numbers.
Plenary
10½ = 01010100 00000100
-10½ = 10101100 00000100
Plenary
Explain the effect on the
• range
• accuracy
of the numbers that can be stored if the
number of bits in the exponent is reduced.
Plenary
Range is decreased because power of
two which the mantissa is multiplying by
is decreased.
Accuracy is increased because more
digits are represented after the binary
point.