Signed integer representation


Lecture 2
Data Representation in Computer Systems
Lecture Duration: 2 Hours
Lecture Overview
 Introduction
 Positional Numbering System
 Decimal to binary conversion
 Signed integer representation
 Floating-point representation
Prepared by Dr. Hassan SALTI - 2012
Introduction
Some Notions – A Reminder (1/2)
 Bit: The most basic unit of information in a digital computer (on/off, 0/1 state)
 Byte: A set of 8 bits
 Word: Two or more adjacent bytes that are manipulated collectively
 Word size: The size of a word in bits depends on the computer organization (16, 32, 64 bits, …)
 Nibble (or nybble): A set of 4 bits. A byte is usually divided into two nibbles: a low-order nibble and a high-order nibble
Introduction
Some Notions – A Reminder (2/2)
 Example: a 16-bit word
  Bits:     0110 0111 1000 1101
  Bytes:    0110 0111 is the high-order byte; 1000 1101 is the low-order byte
  Nibbles:  in the high-order byte, 0110 is the high-order nibble and 0111 the low-order nibble; in the low-order byte, 1000 is the high-order nibble and 1101 the low-order nibble
  The leftmost bit (0) is the Most Significant Bit (MSB); the rightmost bit (1) is the Least Significant Bit (LSB)
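To make the figure concrete, here is a small Python sketch, assuming the 16-bit word is held in an ordinary integer; `split_word` is an illustrative helper, not part of any library. Shifts (`>>`) move the wanted bits to the right and masks (`& 0xFF`, `& 0xF`) keep only those bits.

```python
def split_word(word: int) -> dict:
    """Split a 16-bit word into its MSB, LSB, bytes and nibbles (illustrative helper)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    high_byte = (word >> 8) & 0xFF   # leftmost 8 bits
    low_byte = word & 0xFF           # rightmost 8 bits
    return {
        "msb": (word >> 15) & 1,     # leftmost bit
        "lsb": word & 1,             # rightmost bit
        "high_byte": high_byte,
        "low_byte": low_byte,
        # high-order then low-order nibble of each byte, from left to right
        "nibbles": [high_byte >> 4, high_byte & 0xF, low_byte >> 4, low_byte & 0xF],
    }

parts = split_word(0b0110011110001101)
print(format(parts["high_byte"], "08b"), format(parts["low_byte"], "08b"))  # 01100111 10001101
print([format(n, "04b") for n in parts["nibbles"]])  # ['0110', '0111', '1000', '1101']
```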
Positional Numbering System
Positional Numbering System (1/3)
 Any numeric value is represented through
increasing powers of a radix (or base)
 The set of valid numerals (digits) is equal in size
to the radix of that system
 The smallest numeral is 0 and the largest is one less than the radix
 Example:
• In the decimal system (base 10)
- The radix is 10
- The number of valid numerals is 10 (equal to the radix)
- The set of valid numerals is: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Positional Numbering System
Positional Numbering System (2/3)
 The most important radices (bases) in computer
science are:
• Binary
- Radix 2 or base 2
- Numerals: {0 , 1}
• Octal
- Radix 8 or Base 8
- Numerals: {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7}
• Hexadecimal
- Radix 16 or base 16
- Numerals: {0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , A , B , C , D , E , F}
Positional Numbering System
Positional Numbering System (3/3)
 Any numeric value is represented through
increasing powers of a radix (or base)
 Examples
• $243.51_{10} = 2\times10^{2} + 4\times10^{1} + 3\times10^{0} + 5\times10^{-1} + 1\times10^{-2}$
• $212_{3} = 2\times3^{2} + 1\times3^{1} + 2\times3^{0} = 23_{10}$
• $10110.01_{2} = 1\times2^{4} + 0\times2^{3} + 1\times2^{2} + 1\times2^{1} + 0\times2^{0} + 0\times2^{-1} + 1\times2^{-2} = 22.25_{10}$
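The positional expansion can be checked mechanically. This Python sketch (the names `to_decimal` and `digit_value` are hypothetical) assumes digits 0-9 and A-F and uses `fractions.Fraction`, so fractional positions are evaluated exactly instead of with floating-point rounding.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def digit_value(digit: str, radix: int) -> int:
    d = DIGITS.index(digit)  # raises ValueError for characters outside 0-9, A-F
    if d >= radix:
        raise ValueError(f"digit {digit!r} is not valid in base {radix}")
    return d

def to_decimal(numeral: str, radix: int) -> Fraction:
    """Evaluate a numeral such as "10110.01" as the sum of digit x radix**position."""
    integer_part, _, fraction_part = numeral.upper().partition(".")
    value = Fraction(0)
    # integer digits have positions 0, 1, 2, ... counted leftwards from the radix point
    for position, digit in enumerate(reversed(integer_part)):
        value += digit_value(digit, radix) * Fraction(radix) ** position
    # fractional digits have positions -1, -2, ... counted rightwards
    for position, digit in enumerate(fraction_part, start=1):
        value += digit_value(digit, radix) * Fraction(radix) ** -position
    return value

print(to_decimal("212", 3))              # 23
print(float(to_decimal("10110.01", 2)))  # 22.25
```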
Lecture Overview
 Introduction
 Positional Numbering System
 Decimal to binary conversion
• Converting Unsigned Whole Numbers
• Converting fractions
• Converting between Power-of-Two Radices
 Signed integer representation
 Floating-point representation
Decimal to binary conversion
Some numbers to remember (1/1)
 Keep in mind the following tables or how to
obtain them!
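The tables themselves did not survive in this transcript. Assuming they are the usual reference tables for this topic (powers of two, and the values 0 to 15 in binary, octal and hexadecimal), a short Python sketch can regenerate them; the helper names are hypothetical.

```python
def powers_of_two(count: int = 11) -> list[int]:
    """2**0 .. 2**(count - 1)."""
    return [2 ** n for n in range(count)]

def conversion_table(limit: int = 16) -> list[tuple[int, str, str, str]]:
    """Rows of (decimal, 4-bit binary, octal, hexadecimal) for 0 .. limit - 1."""
    return [(v, format(v, "04b"), format(v, "o"), format(v, "X")) for v in range(limit)]

print(powers_of_two())  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
for row in conversion_table():
    print(*row)         # e.g. the row for 13 prints as: 13 1101 15 D
```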
Decimal to binary conversion
Converting Unsigned Whole Numbers (1/6)
 A real number can take any value (e.g., 10323.7643, -16813.5322703)
 Whole number: No fractional part (e.g., 10, 1231, 3543, …, -12, -12334, …)
 Unsigned number: Only non-negative numbers (e.g., 102313.43234, 1231.56234, 12357, …)
 Unsigned whole numbers: No fractional part and only non-negative values
Decimal to binary conversion
Converting Unsigned Whole Numbers (2/6)
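 Convert the decimal number $113_{10}$ to binary: $113_{10} = ?_{2}$
 Method 1: Repeated subtraction (subtract the largest power of 2 that still fits; each power subtracted gives a 1 bit, each power skipped gives a 0 bit)
  113 - 64 ($2^{6}$) = 49
  49 - 32 ($2^{5}$) = 17
  17 - 16 ($2^{4}$) = 1
  1 - 1 ($2^{0}$) = 0
 $2^{6}$, $2^{5}$, $2^{4}$ and $2^{0}$ were subtracted ($2^{3}$, $2^{2}$ and $2^{1}$ were not), so $113_{10} = 1110001_{2}$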
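A minimal Python sketch of the repeated-subtraction method, assuming a non-negative input (the function name is illustrative). It first finds the largest power of 2 that fits, then walks down the powers, writing 1 when a power is subtracted and 0 when it is skipped.

```python
def to_binary_by_subtraction(n: int) -> str:
    """Unsigned decimal -> binary by repeatedly subtracting the largest power of 2 that fits."""
    if n < 0:
        raise ValueError("unsigned values only")
    if n == 0:
        return "0"
    power = 1
    while power * 2 <= n:      # find the largest power of 2 that is <= n
        power *= 2
    bits = []
    while power >= 1:
        if power <= n:         # this power fits: write a 1 and subtract it
            bits.append("1")
            n -= power
        else:                  # this power does not fit: write a 0
            bits.append("0")
        power //= 2
    return "".join(bits)

print(to_binary_by_subtraction(113))  # 1110001
```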
Decimal to binary conversion
Converting Unsigned Whole Numbers (3/6)
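 Method 2: Division-remainder (divide repeatedly by 2 and keep the remainders)
  113 / 2 = 56, remainder 1 (LSB)
  56 / 2 = 28, remainder 0
  28 / 2 = 14, remainder 0
  14 / 2 = 7, remainder 0
  7 / 2 = 3, remainder 1
  3 / 2 = 1, remainder 1
  1 / 2 = 0, remainder 1 (MSB)
 Reading the remainders from the last one (MSB) to the first one (LSB): $113_{10} = 1110001_{2}$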
Decimal to binary conversion
Converting Unsigned Whole Numbers (4/6)
 A binary number with N bits can represent $2^{N}$ unsigned integers, from 0 to $2^{N} - 1$
 Example:
• With N = 4 bits, we can represent $2^{4} = 16$ unsigned integers, from 0 to $2^{4} - 1 = 16 - 1 = 15$
• The number 16 CANNOT be represented with only 4 bits!
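The range rule can be sketched in a few lines of Python (hypothetical helper names); `int.bit_length()` is a built-in method that returns how many bits a non-negative integer needs.

```python
def unsigned_range(n_bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned integers that fit in n_bits bits: 0 .. 2**N - 1."""
    return 0, 2 ** n_bits - 1

def fits_unsigned(value: int, n_bits: int) -> bool:
    low, high = unsigned_range(n_bits)
    return low <= value <= high

print(unsigned_range(4))     # (0, 15)
print(fits_unsigned(16, 4))  # False
print((16).bit_length())     # 5: 16 = 10000 needs 5 bits
```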
Decimal to binary conversion
Converting Unsigned Whole Numbers (5/6)
 The subtraction method is cumbersome.
 The subtraction method requires a familiarity
with the powers of the radix being used.
 The division-remainder method is faster and
easier than the repeated subtraction method.
 The division-remainder method can be used
to convert from decimal to any other base
system (not only to base 2).
Decimal to binary conversion
Converting Unsigned Whole Numbers (6/6)
 Example: Convert $104_{10}$ to base 3 using the division-remainder method.
  104 / 3 = 34, remainder 2 (least significant digit)
  34 / 3 = 11, remainder 1
  11 / 3 = 3, remainder 2
  3 / 3 = 1, remainder 0
  1 / 3 = 0, remainder 1 (most significant digit)
 $104_{10} = 10212_{3}$
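The division-remainder method for any target base up to 16 might look like the following Python sketch; `from_decimal` is an illustrative name. The built-in `divmod` returns the quotient and the remainder of one division step.

```python
DIGITS = "0123456789ABCDEF"

def from_decimal(n: int, radix: int) -> str:
    """Convert a non-negative integer to base `radix` (2..16) by division-remainder."""
    if n < 0 or not 2 <= radix <= 16:
        raise ValueError("need n >= 0 and 2 <= radix <= 16")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, radix)          # the quotient feeds the next division
        remainders.append(DIGITS[r])     # the first remainder is the least significant digit
    return "".join(reversed(remainders))  # read from the last remainder to the first

print(from_decimal(113, 2))  # 1110001
print(from_decimal(104, 3))  # 10212
```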
Decimal to binary conversion
Converting fractions (1/5)
 Fractions in a decimal system can be
converted/approximated to fractions in any other radix
system
 Radix points separate the integer part of a number from its
fractional part
 Examples of numbers with a fractional part (the integer part is to the left of the radix point, the fractional part to the right):
• Base 10: 2390167.1208
• Base 3: 2012.11022
• Base 2: 1011110.111011
 The “radix point” is called a “decimal point” in a decimal system, a “binary point” in a binary system, and so on.
Decimal to binary conversion
Converting fractions (2/5)
 To convert a fraction from decimal to any other base, we repeatedly multiply the fractional part by the destination radix; the integer part of each product is the next digit
 Example: Convert $0.4304_{10}$ to base 5.
  0.4304 x 5 = 2.1520 → the integer part is 2
  0.1520 x 5 = 0.7600 → the integer part is 0
  0.7600 x 5 = 3.8000 → the integer part is 3
  0.8000 x 5 = 4.0000 → the integer part is 4; the fractional part is zero, so we are done
 $0.4304_{10} = 0.2034_{5}$
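The repeated-multiplication method can be sketched in Python as follows, assuming the input is a pure fraction held exactly as a `fractions.Fraction` (the function name and the `max_digits` cut-off are illustrative). The cut-off models the fixed number of digits kept when an approximation is needed.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def fraction_to_radix(x: Fraction, radix: int, max_digits: int) -> str:
    """Convert a pure fraction 0 <= x < 1 to base `radix` by repeated multiplication.
    Stops when the fractional part becomes 0, or after max_digits digits (truncation)."""
    if not 0 <= x < 1:
        raise ValueError("x must satisfy 0 <= x < 1")
    digits = []
    while x != 0 and len(digits) < max_digits:
        x *= radix
        integer_part = int(x)            # the integer part of the product is the next digit
        digits.append(DIGITS[integer_part])
        x -= integer_part                # carry on with the fractional part only
    return "0." + ("".join(digits) or "0")

print(fraction_to_radix(Fraction("0.4304"), 5, 10))  # 0.2034
print(fraction_to_radix(Fraction("0.34375"), 2, 4))  # 0.0101 (stopped after 4 bits)
```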
Decimal to binary conversion
Converting fractions (3/5)
 Some fractions are indeterminate in a given base
• They contain repeating strings of digits to the right of the radix point
• Example: $(2/3)_{10} = (0.666…)_{10}$
 A fraction that is indeterminate in one base can be determinate in another base (and vice versa).
• Example: $(2/3)_{10} = 0.2_{3} = (0.666…)_{10}$
- 2/3 is indeterminate in base 10 but determinate in base 3.
 When a fraction is indeterminate, an approximation is needed
• We fix the number of digits to the right of the radix point
 Approximation is also needed because computing resources are limited (for example, the limited size of the processor’s registers)
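Whether a fraction is determinate in a given base can be decided without expanding it: a fraction p/q in lowest terms has a finite expansion in base r exactly when every prime factor of q also divides r. A hedged Python sketch of that test (the name `terminates` is hypothetical):

```python
from fractions import Fraction
from math import gcd

def terminates(x: Fraction, radix: int) -> bool:
    """True if x has a finite expansion in base `radix`.

    With x = p/q in lowest terms, the expansion is finite exactly when every
    prime factor of q also divides the radix."""
    q = x.denominator               # Fraction always stores lowest terms
    g = gcd(q, radix)
    while g > 1:                    # remove every factor that q shares with the radix
        while q % g == 0:
            q //= g
        g = gcd(q, radix)
    return q == 1

print(terminates(Fraction(2, 3), 10))  # False: 0.666... in base 10
print(terminates(Fraction(2, 3), 3))   # True: 0.2 in base 3
print(terminates(Fraction(1, 10), 2))  # False
```

For instance, 1/10 terminates in base 10 but not in base 2, which is why 0.1 cannot be stored exactly in a binary register of any fixed size.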
Decimal to binary conversion
Converting fractions (4/5)
 Example: Convert $0.34375_{10}$ to binary with 4 bits to the right of the binary point.
  0.34375 x 2 = 0.68750 → bit 0
  0.68750 x 2 = 1.37500 → bit 1
  0.37500 x 2 = 0.75000 → bit 0
  0.75000 x 2 = 1.50000 → bit 1 (this is our fourth bit, so we stop here)
 $0.34375_{10} \approx 0.0101_{2}$
Decimal to binary conversion
Converting fractions (5/5)
 Convert 26.78125 to binary: $26.78125_{10} = ?_{2}$
 Using the methods just described on the integer part and on the fractional part separately, we have:
$26_{10} = 11010_{2}$ and $0.78125_{10} = 0.11001_{2}$
So $26.78125_{10} = 11010.11001_{2}$
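The two halves can be combined in one Python sketch, assuming the input is given as a non-negative decimal string. It relies on the built-in `bin()` for the integer part and repeats the multiplication by 2 for the fractional part; the function name and the 16-bit cut-off are illustrative.

```python
from fractions import Fraction

def decimal_to_binary(text: str, max_fraction_bits: int = 16) -> str:
    """Convert a non-negative decimal string such as "26.78125" to binary.
    Integer part: Python's bin(); fractional part: repeated multiplication by 2."""
    value = Fraction(text)               # exact, no floating-point rounding
    if value < 0:
        raise ValueError("non-negative values only")
    integer_part = int(value)
    fraction = value - integer_part
    bits = []
    while fraction != 0 and len(bits) < max_fraction_bits:
        fraction *= 2
        bit = int(fraction)              # 0 or 1
        bits.append(str(bit))
        fraction -= bit
    result = bin(integer_part)[2:]       # bin(26) == "0b11010"
    return (result + "." + "".join(bits)) if bits else result

print(decimal_to_binary("26.78125"))  # 11010.11001
```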
Decimal to binary conversion
Going back to positional numbering system (1/1)
 Any unsigned whole or fractional number can be converted to decimal by using the “Positional Numbering System” described previously
 Examples:
 $0.0101_{2} = 0\times2^{-1} + 1\times2^{-2} + 0\times2^{-3} + 1\times2^{-4} = 0 + 0.25 + 0 + 0.0625 = 0.3125_{10}$
 $134.2034_{5} = 1\times5^{2} + 3\times5^{1} + 4\times5^{0} + 2\times5^{-1} + 0\times5^{-2} + 3\times5^{-3} + 4\times5^{-4} = 44.4304_{10}$
Decimal to binary conversion
Converting between Power-of-Two Radices (1/4)
 To convert between two bases that are both different from base 10, it is easier to pass through base 10.
• Example: $3121_{4} = ?_{3}$
• First step: $3121_{4} = 3\times4^{3} + 1\times4^{2} + 2\times4^{1} + 1\times4^{0} = 217_{10}$
• Second step, using the division-remainder method: $217_{10} = 22001_{3}$
• So $3121_{4} = 22001_{3}$
 Working between bases that are powers of two is much easier.
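A Python sketch of the two-step conversion through base 10, assuming unsigned whole numerals. The built-in `int(text, base)` performs the positional expansion of the first step, and a division-remainder loop performs the second; `convert_base` is a hypothetical name.

```python
DIGITS = "0123456789ABCDEF"

def convert_base(numeral: str, from_radix: int, to_radix: int) -> str:
    """Convert an unsigned whole numeral between bases by passing through an integer value."""
    value = int(numeral, from_radix)   # step 1: positional expansion (int() accepts bases 2-36)
    if value == 0:
        return "0"
    digits = []
    while value > 0:                   # step 2: division-remainder in the target base
        value, remainder = divmod(value, to_radix)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

print(convert_base("3121", 4, 3))  # 22001
```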
Decimal to binary conversion
Converting between Power-of-Two Radices (2/4)
 The most common power-of-two radices are binary (base 2), octal (base $2^{3}$ = base 8) and hexadecimal (base $2^{4}$ = base 16).
 Each octal digit is equivalent to a group of 3 binary digits, called an octet¹
 Each hexadecimal digit is equivalent to a group of 4 binary digits, called a hextet
 We convert from binary to octal and from binary to hexadecimal simply by grouping bits
¹ In most of the literature, the term “octet” describes a set of 8 bits.
Decimal to binary conversion
Converting between Power-of-Two Radices (3/4)
 Example: Convert $10110010011101_{2}$ to octal
• Make groups of 3 bits (from right to left):
- 10 110 010 011 101
• Add zero(s) on the left to complete the last octet:
- 010 110 010 011 101
• Convert each octet to its corresponding octal digit:
- 010 → 2, 110 → 6, 010 → 2, 011 → 3, 101 → 5
• Finally: $10110010011101_{2} = 26235_{8}$
Decimal to binary conversion
Converting between Power-of-Two Radices (4/4)
 Example: Convert $10110010011101_{2}$ to hexadecimal
• Make groups of 4 bits (from right to left):
- 10 1100 1001 1101
• Add zero(s) on the left to complete the last hextet:
- 0010 1100 1001 1101
• Convert each hextet to its corresponding hexadecimal digit:
- 0010 → 2, 1100 → C, 1001 → 9, 1101 → D
• Finally: $10110010011101_{2} = 2C9D_{16}$
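Grouping bits is easy to automate. In this Python sketch (the name `group_bits` is illustrative), the bit string is padded on the left to a multiple of the group size, split into groups, and each group is looked up as one digit.

```python
DIGITS = "0123456789ABCDEF"

def group_bits(bits: str, group_size: int) -> str:
    """Binary -> base 2**group_size (3 for octal, 4 for hexadecimal) by grouping bits."""
    padding = (-len(bits)) % group_size      # zeros needed to complete the leftmost group
    bits = "0" * padding + bits
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return "".join(DIGITS[int(group, 2)] for group in groups)

print(group_bits("10110010011101", 3))  # 26235
print(group_bits("10110010011101", 4))  # 2C9D
```

The results can be cross-checked against Python's own `format(int(bits, 2), "o")` and `format(int(bits, 2), "X")`.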
Lecture Overview
 Introduction
 Positional Numbering System
 Decimal to binary conversion
 Signed integer representation
• Signed Magnitude
• Complement system
 Floating-point representation
Signed integer representation
Signed integer representation
 An integer is a whole number
 Signed integers are the set of positive and
negative whole numbers
 How should we encode and deal with the
actual sign of the number?
 Two concepts are used
• Signed Magnitude concept
• Complement concept
Signed integer representation
Signed Magnitude (1/13)
 Signed magnitude is the most intuitive
method
 The MSB (Most Significant Bit) of a binary
number is kept as the “sign” of the number
• MSB = 1: negative number
• MSB = 0: positive number
 The remaining bits represent the magnitude
(or absolute value) of the numeric value
Signed integer representation
Signed Magnitude (2/13)
 Example: In an 8-bit signed-magnitude system, give the decimal representation of the following numbers:
• 00000001?
- The MSB is 0: the number is positive
- The remaining 7 bits are: $0000001_{2} = 1_{10}$
- The decimal number is +1
• 10000001?
- The MSB is 1: the number is negative
- The remaining 7 bits are: $0000001_{2} = 1_{10}$
- The decimal number is -1
Signed integer representation
Signed Magnitude (3/13)
 Example: In an 8-bit signed-magnitude system, give the decimal representation of the following numbers:
• 10001001?
- The MSB is 1: the number is negative
- The remaining 7 bits are: $0001001_{2} = 9_{10}$
- The decimal number is -9
• 01000001?
- The MSB is 0: the number is positive
- The remaining 7 bits are: $1000001_{2} = 65_{10}$
- The decimal number is +65
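Decoding and encoding in signed magnitude can be sketched in Python on bit strings (the helper names are hypothetical); the first character is the sign bit and the rest is the magnitude.

```python
def sm_decode(bits: str) -> int:
    """Signed magnitude: the MSB is the sign (1 = negative), the other bits are the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

def sm_encode(value: int, n_bits: int) -> str:
    """Encode value with 1 sign bit followed by n_bits - 1 magnitude bits."""
    largest = 2 ** (n_bits - 1) - 1
    if abs(value) > largest:
        raise OverflowError(f"{value} does not fit in {n_bits}-bit signed magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")

for pattern in ["00000001", "10000001", "10001001", "01000001"]:
    print(pattern, sm_decode(pattern))  # 1, -1, -9 and 65
```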
Signed integer representation
Signed Magnitude (4/13)
 In an N-bit signed-magnitude system:
• 1 bit is used for the sign of the number
• N-1 bits are used for the magnitude of the number
• The largest integer is $2^{N-1} - 1$
• The smallest integer is $-(2^{N-1} - 1)$
 Example: in an 8-bit signed-magnitude system
• The largest integer is $01111111_{2} = 2^{7} - 1 = 127_{10}$
• The smallest integer is $11111111_{2} = -(2^{7} - 1) = -127_{10}$
Signed integer representation
Signed Magnitude (5/13)
 Computers should be able to carry out
mathematical operations
 Signed-magnitude arithmetic is carried out using
essentially the same methods as humans
• At first we look at the signs of the two operands
• We arrange the operands in a certain way based on
their signs
• We perform the calculation without regard to the
signs
• Finally, we supply the sign as appropriate
Signed integer representation
Signed Magnitude (6/13)
 Adding operands that have the same sign
 Example: Add $01001111_{2}$ to $00100011_{2}$ using signed-magnitude arithmetic.
          S   Magnitude
Carries:        1111
          0   1001111   (79)
+         0   0100011   (35)
=         0   1110010   (114)
 We find $01001111_{2} + 00100011_{2} = 01110010_{2}$ in signed-magnitude representation.
Signed integer representation
Signed Magnitude (7/13)
 Overflow condition
• In the last example, adding the leftmost (seventh) magnitude bits produces no carry
• If there is a carry, we have an overflow condition: the carry is discarded, resulting in an incorrect sum.
 Example: Add $01000001_{2}$ to $01100001_{2}$ using signed-magnitude arithmetic
Signed integer representation
Signed Magnitude (8/13)
          S   Magnitude
Carries:     1     1
          0   1000001   (65)
+         0   1100001   (97)
=         0   0100010   (34)
 The addition overflows: the last carry, out of the leftmost magnitude bit, is discarded
 The sum’s result is incorrect (34 instead of 162)
Signed integer representation
Signed Magnitude (9/13)
 Signed-magnitude subtraction is carried out in a manner similar to pencil-and-paper decimal arithmetic
 Example 1: Subtract $01001111_{2}$ (79) from $01100011_{2}$ (99) using signed-magnitude arithmetic.
          S   Magnitude
Borrows:       0112
          0   1100011   (99)
-         0   1001111   (79)
=         0   0010100   (20)
 We find $01100011_{2} - 01001111_{2} = 00010100_{2}$ in signed-magnitude representation.
Signed integer representation
Signed Magnitude (10/13)
 Example 2: Subtract $01100011_{2}$ (99) from $01001111_{2}$ (79) using signed-magnitude arithmetic.
• Here the subtrahend, 01100011, is larger than the minuend, 01001111.
• From the result obtained in Example 1, we know that the difference of these two magnitudes is $0010100_{2}$.
• Because the subtrahend is larger than the minuend, all that we need to do is change the sign of the difference.
• So we find $01001111_{2} - 01100011_{2} = 10010100_{2}$ in signed-magnitude representation
Signed integer representation
Signed Magnitude (11/13)
 Example 3: Add $10010011_{2}$ (-19) to $00001101_{2}$ (+13) using signed-magnitude arithmetic.
• The result is negative (the operand with the larger magnitude, 19, is negative)
• We subtract 13 from 19
• The result of the binary subtraction is: $10000110_{2}$ (-6)
 Example 4: Subtract $10011000_{2}$ (-24) from $10101011_{2}$ (-43) using signed-magnitude arithmetic.
• This is equivalent to adding -43 and +24
• The result is negative
• We subtract 24 from 43
• The result of the binary subtraction is: $10010011_{2}$ (-19)
Signed integer representation
Signed Magnitude (12/13)
 General rules when operands have different
signs
• Determine which operand has the larger
magnitude
• The sign of the result is the same as the sign of
the operand with the larger magnitude
• The magnitude is obtained by subtracting (not adding) the smaller magnitude from the larger one
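The rules above (same signs: add the magnitudes; different signs: subtract the smaller magnitude from the larger and take the sign of the larger) can be sketched in Python. This is an illustration on bit strings, not a description of real hardware; `sm_add` and `sm_sub` are hypothetical names.

```python
def sm_add(a: str, b: str) -> tuple[str, bool]:
    """Add two n-bit signed-magnitude numbers (bit strings) the pencil-and-paper way.
    Returns (result, overflow); on overflow the carry out of the magnitude is discarded."""
    n = len(a)
    limit = 2 ** (n - 1)                 # magnitudes must stay below 2**(n-1)
    sign_a, mag_a = a[0], int(a[1:], 2)
    sign_b, mag_b = b[0], int(b[1:], 2)
    if sign_a == sign_b:
        # same signs: add the magnitudes and keep the common sign
        total = mag_a + mag_b
        overflow = total >= limit
        sign, mag = sign_a, total % limit   # drop the carry out of the leftmost magnitude bit
    else:
        # different signs: subtract the smaller magnitude from the larger one;
        # the result takes the sign of the operand with the larger magnitude
        overflow = False
        if mag_a >= mag_b:
            sign, mag = sign_a, mag_a - mag_b
        else:
            sign, mag = sign_b, mag_b - mag_a
    return sign + format(mag, f"0{n - 1}b"), overflow

def sm_sub(a: str, b: str) -> tuple[str, bool]:
    """a - b is computed as a + (-b): flip the sign bit of b, then add."""
    flipped = ("0" if b[0] == "1" else "1") + b[1:]
    return sm_add(a, flipped)

print(sm_add("01001111", "00100011"))  # ('01110010', False): 79 + 35 = 114
print(sm_add("01000001", "01100001"))  # ('00100010', True): overflow, 34 instead of 162
```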
Signed integer representation
Signed Magnitude (13/13)
 Problems related to signed magnitude
• Too many decisions to make (which magnitude is larger? borrows? which sign?).
• The number 0 has two representations: 00000000 (+0) and 10000000 (-0).
• Complicated method
• Expensive circuits
Signed integer representation
Complement system (1/19)
 In a complement system, only negative numbers are complemented; positive numbers keep their ordinary binary representation
 With a complement system, subtraction is converted into addition
 Advantages of complement systems
• Simpler computer arithmetic
• No need to process sign bits separately
• The sign of a number is easily checked by looking at its high-order bit (MSB).
Signed integer representation
Complement system (2/19)
 In base 10, “Casting out 9s” was used to subtract numbers
 Let’s say we want to find 167 - 52
• First, 999 - 52 is calculated: 999 - 52 = 947
• 947 is then added to 167, and the last carry (the leftmost digit of the sum) is removed and added back to the rest of the sum:
Carries:   111
            167
          + 947
           1114
  167 + 947 = 1114 → 114 + 1 = 115, so 167 - 52 = 115
Signed integer representation
Complement system (3/19)
 The last method uses a “diminished radix complement”
 Working in base r (radix), the diminished radix is r - 1
 Example: base 10, r = 10
• The diminished radix is r - 1 = 10 - 1 = 9
• We say that a negative number is converted to its 9’s complement
• For example, $-2468_{10}$ is converted to its nine’s complement as follows: $9999 - 2468 = 7531_{C9}$
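The nine's-complement subtraction can be sketched in Python for a fixed number of decimal digits, assuming 0 <= b < a (the function names are illustrative). The carry that falls off the left is added back to the remaining digits.

```python
def nines_complement(n: int, digits: int) -> int:
    """(10**digits - 1) - n: every digit d of n becomes 9 - d."""
    return (10 ** digits - 1) - n

def subtract_with_nines_complement(a: int, b: int, digits: int) -> int:
    """a - b for 0 <= b < a < 10**digits, computed as a + (9's complement of b)."""
    total = a + nines_complement(b, digits)
    carry, rest = divmod(total, 10 ** digits)  # carry = the digit that falls off the left
    return rest + carry                        # add the last carry back to the sum

print(nines_complement(52, 3))                     # 947
print(subtract_with_nines_complement(167, 52, 3))  # 115
print(nines_complement(2468, 4))                   # 7531
```

If a == b, the sum is 999…9, which is the nine's-complement form of zero: like one's complement in binary, this system has two zeros, which is why the sketch requires b < a.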
Signed integer representation
Complement system (4/19)
 In a binary system r = 2
• The diminished radix is r - 1 = 1
• We say that we work in one’s complement (C1)
• To convert a negative number to its one’s complement, its magnitude is subtracted from all ones
• A positive number is directly converted to its binary representation
• Example:
- The one’s complement of $0101_{2}$ is $1111_{2} - 0101_{2} = 1010_{C1}$
- This is nothing more than switching all of the 1s with 0s and vice versa!
Signed integer representation
Complement system (5/19)
 Example: Express $23_{10}$ and $-9_{10}$ in 8-bit binary one’s complement form.
 $23_{10} = +(00010111_{2}) = 00010111_{C1}$
 $-9_{10} = -(00001001_{2}) = 11110110_{C1}$
Signed integer representation
Complement system (6/19)
 In one’s complement, subtraction is converted into addition
• Example: $23_{10} - 9_{10} = 23_{10} + (-9_{10})$
 Example: Add $23_{10}$ to $-9_{10}$ using 8-bit binary one’s complement arithmetic.
Carries:  11110110
           00010111   (23)
+          11110110   (-9)
=         100001101
 In one’s complement, the last carry (out of the MSB) is added back to the sum: 00001101 + 1 = 00001110
 The result is $00001110_{C1} = +(00001110_{2}) = 14_{10}$
Signed integer representation
Complement system (7/19)
 Example: Add $9_{10}$ to $-23_{10}$ using 8-bit binary one’s complement arithmetic.
 $-23_{10} = -(00010111_{2}) = 11101000_{C1}$
 $9_{10} = +(00001001_{2}) = 00001001_{C1}$
 $9_{10} + (-23_{10}) = 00001001_{C1} + 11101000_{C1}$
Carries:  00001000
           00001001   (9)
+          11101000   (-23)
=          11110001   (-14)
 There is no carry out of the MSB, so nothing is added back
 Result: $11110001_{C1} = -(00001110_{2}) = -14_{10}$
Signed integer representation
Complement system (8/19)
 In one’s complement, we still have two representations for zero: 00000000 and 11111111
 Computer engineers stopped using one’s complement long ago
 A more efficient representation for binary numbers is two’s complement
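A Python sketch of n-bit one's complement on bit strings, assuming values within range (the helper names are hypothetical). Following the examples above, a carry out of the MSB is added back to the sum (an end-around carry); overflow is not checked.

```python
def invert(bits: str) -> str:
    """Switch every 1 with 0 and vice versa."""
    return "".join("1" if b == "0" else "0" for b in bits)

def ones_encode(value: int, n_bits: int) -> str:
    """n-bit one's complement: positives in plain binary, negatives with every bit inverted."""
    largest = 2 ** (n_bits - 1) - 1
    if abs(value) > largest:
        raise OverflowError(f"{value} does not fit in {n_bits}-bit one's complement")
    bits = format(abs(value), f"0{n_bits}b")
    return invert(bits) if value < 0 else bits

def ones_decode(bits: str) -> int:
    return -int(invert(bits), 2) if bits[0] == "1" else int(bits, 2)

def ones_add(a: str, b: str) -> str:
    """Add two n-bit one's complement numbers; a carry out of the MSB is added back
    at the LSB (end-around carry). Overflow is not checked in this sketch."""
    n = len(a)
    total = int(a, 2) + int(b, 2)
    if total >= 2 ** n:              # there was a carry out of the MSB
        total = total - 2 ** n + 1   # remove it and add it to the sum
    return format(total, f"0{n}b")

print(ones_add(ones_encode(23, 8), ones_encode(-9, 8)))  # 00001110 (14)
print(ones_add(ones_encode(9, 8), ones_encode(-23, 8)))  # 11110001 (-14)
print(ones_decode("00000000"), ones_decode("11111111"))  # 0 0: two patterns for zero
```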
Signed integer representation
Complement system (9/19)
 Two’s complement is an example of a radix complement
 In a radix complement, there is no need to subtract one from the radix r: a d-digit number N is complemented as $r^{d} - N$
 Example: base 10, r = 10
• We say that a negative number is converted to its 10’s complement
• For example, $-2468_{10}$ is converted to its ten’s complement as follows: $10000 - 2468 = 7532_{C10}$
Signed integer representation
Complement system (10/19)
 In a binary system r = 2
• The radix is r = 2
• We say that we work in two’s complement (C2)
• Let d be the number of digits (bits)
• To convert a negative number -N to its two’s complement, N is subtracted from $r^{d} = 2^{d}$: $-N_{10} = (2^{d} - N)_{C2}$
• A positive number is directly converted to its binary representation
Signed integer representation
Complement system (11/19)
 Example:
• In a 4-bit system: d = 4
• All negative numbers are converted by being subtracted from $2^{d} = 2^{4} = 16_{10} = 10000_{2}$
• The two’s complement of $0011_{2}$ is $10000_{2} - 0011_{2} = 1101_{C2}$
• This is nothing more than the one’s complement incremented by 1 ($1100_{C1} + 1 = 1101_{C2}$)!
Signed integer representation
Complement system (12/19)
 Example: Express $23_{10}$, $-23_{10}$, and $-9_{10}$ in 8-bit binary two’s complement form.
• $23_{10} = +(00010111_{2}) = 00010111_{C2}$
• $-23_{10} = -(00010111_{2})$: $11101000_{C1} + 1 = 11101001_{C2}$
• $-9_{10} = -(00001001_{2})$: $11110110_{C1} + 1 = 11110111_{C2}$
Signed integer representation
Complement system (13/19)
 Unlike C1 arithmetic, in C2 the last carry (out of the MSB) is discarded
 Example 1: Add $9_{10}$ to $-23_{10}$ using two’s complement arithmetic.
Carries:  00001001
           00001001   (9)
+          11101001   (-23)
=          11110010   (-14)
 The result is $11110010_{C2} = -(00001110_{2}) = -14_{10}$
Signed integer representation
Complement system (14/19)
 Note how a negative binary number in C2 is converted to decimal
• First, all 0s and 1s in the C2 number are switched: 11110010 → 00001101
• Then 1 is added to the result: 00001101 + 1 = 00001110
• So $11110010_{C2} = -(00001110_{2}) = -14_{10}$
Signed integer representation
Complement system (15/19)
 Example 2: Find the sum of $23_{10}$ and $-9_{10}$ in binary using two’s complement arithmetic.
 $23_{10} = +(00010111_{2}) = 00010111_{C2}$
 $-9_{10} = -(00001001_{2}) = 11110111_{C2}$
 $23_{10} + (-9_{10}) = 00010111_{C2} + 11110111_{C2}$
Carries:  11110111
           00010111   (23)
+          11110111   (-9)
=         100001110
 The last carry (out of the MSB) is discarded
 Result: $00001110_{C2} = +(00001110_{2}) = 14_{10}$
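Two's complement encoding, decoding and addition can be sketched in Python on bit strings (hypothetical helper names). Python's `%` operator already maps a negative -N to 2**n - N, and taking the sum modulo 2**n discards the carry out of the MSB.

```python
def twos_encode(value: int, n_bits: int) -> str:
    """n-bit two's complement: a negative value -N is stored as 2**n_bits - N."""
    if not -(2 ** (n_bits - 1)) <= value <= 2 ** (n_bits - 1) - 1:
        raise OverflowError(f"{value} does not fit in {n_bits}-bit two's complement")
    return format(value % 2 ** n_bits, f"0{n_bits}b")  # -N % 2**n == 2**n - N

def twos_decode(bits: str) -> int:
    """A set MSB means the pattern stands for (its unsigned value) - 2**n."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value

def twos_add(a: str, b: str) -> str:
    """Add two n-bit two's complement numbers; the carry out of the MSB is discarded."""
    n = len(a)
    return format((int(a, 2) + int(b, 2)) % 2 ** n, f"0{n}b")

print(twos_add(twos_encode(9, 8), twos_encode(-23, 8)))  # 11110010
print(twos_add(twos_encode(23, 8), twos_encode(-9, 8)))  # 00001110
print(twos_decode("11110010"))                           # -14
```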
Signed integer representation
Complement system (16/19)
 Advantages of two’s complement
• It is the most popular choice for representing signed numbers
• The algorithm for adding and subtracting is quite easy
• It has a single representation for 0 (all bits 0)
• It is self-inverting: taking the two’s complement twice gives back the original number
• It is easily extended to larger numbers of bits (by copying the sign bit to the left, i.e., sign extension).
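Two of these advantages are easy to see in a short Python sketch (the helper names are illustrative): negation by inverting and adding 1 undoes itself, and a number is widened by copying its sign bit to the left.

```python
def negate(bits: str) -> str:
    """Two's complement negation: invert every bit, then add 1 (modulo 2**n)."""
    n = len(bits)
    inverted = int(bits, 2) ^ (2 ** n - 1)   # XOR with all ones flips every bit
    return format((inverted + 1) % 2 ** n, f"0{n}b")

def sign_extend(bits: str, new_width: int) -> str:
    """Widen a two's complement number by copying its sign bit (MSB) to the left."""
    return bits[0] * (new_width - len(bits)) + bits

print(negate("00001110"))          # 11110010  (14 -> -14)
print(negate(negate("00001110")))  # 00001110  (self-inverting)
print(sign_extend("1101", 8))      # 11111101  (-3 stays -3)
```

The one exception to self-inversion is the most negative value: with 8 bits, negating 10000000 (-128) gives 10000000 again, because +128 does not fit in 8 bits.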
Signed integer representation
Complement system (17/19)
 Drawback
• The asymmetry in the range of values that can be represented with N bits.
• Examples:
- With signed magnitude, 4 bits allow us to represent the values -7 ($1111_{2}$) through +7 ($0111_{2}$).
- Using two’s complement, we can represent the values -8 ($1000_{C2}$) through +7 ($0111_{C2}$)
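The two ranges can be compared with a couple of hypothetical Python helpers:

```python
def signed_magnitude_range(n_bits: int) -> tuple[int, int]:
    """-(2**(N-1) - 1) .. 2**(N-1) - 1: one bit pattern is spent on -0."""
    largest = 2 ** (n_bits - 1) - 1
    return -largest, largest

def twos_complement_range(n_bits: int) -> tuple[int, int]:
    """-2**(N-1) .. 2**(N-1) - 1: one more negative value than positive ones."""
    return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1

for n in (4, 8):
    print(n, signed_magnitude_range(n), twos_complement_range(n))
# 4 (-7, 7) (-8, 7)
# 8 (-127, 127) (-128, 127)
```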
Signed integer representation
Complement system (18/19)
 Overflow in complement systems (C1 and C2)
• An overflow occurs if two positive numbers are added and the result is negative,
• or if two negative numbers are added and the result is positive.
• Overflow cannot occur when a positive and a negative number are added together.
Signed integer representation
Complement system (19/19)
 To detect overflow
• Check the last two carries (the carry into the MSB and the carry out of the MSB)
- If they are different: there is an overflow
- If they are equal: there is no overflow
 Example 1: Find the sum of $126_{10}$ and $8_{10}$ in binary using two’s complement arithmetic.
Carries:  01111000
           01111110   (126)
+          00001000   (8)
=          10000110
 The result is $10000110_{C2} = -(01111010_{2}) = -122_{10}$ instead of 134!
 Note that the last two carries are different (1 into the MSB, 0 out of it): the addition overflows
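The carry rule can be sketched as a bit-by-bit ripple adder in Python, assuming two bit strings of equal length (the name `add_with_overflow` is illustrative). It records the carry into the MSB and the carry out of it, and compares them.

```python
def add_with_overflow(a: str, b: str) -> tuple[str, bool]:
    """Ripple-add two n-bit two's complement bit strings from the LSB to the MSB.
    Overflow is reported when the carry into the MSB differs from the carry out of it."""
    n = len(a)
    carry = 0
    carry_into_msb = 0
    result = []
    for i in range(n - 1, -1, -1):     # index n-1 is the LSB, index 0 is the MSB
        if i == 0:
            carry_into_msb = carry
        s = int(a[i]) + int(b[i]) + carry
        result.append(str(s % 2))
        carry = s // 2
    carry_out_of_msb = carry           # this carry is discarded in C2
    return "".join(reversed(result)), carry_into_msb != carry_out_of_msb

print(add_with_overflow("01111110", "00001000"))  # ('10000110', True): 126 + 8 overflows
print(add_with_overflow("00010111", "11110111"))  # ('00001110', False): 23 + (-9) = 14
```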
Lecture Overview
 Introduction
 Positional Numbering System
 Decimal to binary conversion
 Signed integer representation
 Floating-point representation
• A simple model
• Floating-point arithmetic
• Floating-point errors
Floating-point representation
Floating-point representation (1/1)
 A computer is expected to solve a wide range of problems
 Very large numbers, fractional numbers and complicated mathematical operations may be involved
 Floating-point representation is an optimized solution that gives a good ratio between the biggest representable number and the word size
 Computers use a form of scientific notation for floating-point representation
 Numbers written in scientific notation have three components: a sign, a mantissa (significand) and an exponent
 Scientific notation in base 10: $+0.579 \times 10^{7}$
 Scientific notation in base 2: $+0.101101 \times 2^{3}$
Floating-point representation
A simple model (1/8)
 In digital computers, floating-point numbers
consist of three parts:
• A sign bit,
• an exponent part, representing the exponent on a power of 2,
• a fractional part called the significand, which is another word for the mantissa.
Floating-point representation
A simple model (2/8)
 More bits used for the exponent increase the range of numbers
 More bits used for the significand increase the precision
 For simplicity, throughout this course we will use a simplified 14-bit model
• Sign bit: 1 bit
• Exponent: 5 bits
• Significand: 8 bits
Floating-point representation
A simple model (3/8)
 Exercise 1: Represent the number 17 in the 14-bit floating-point representation
• $17 = 17.0 \times 10^{0} = 1.7 \times 10^{1} = 0.17 \times 10^{2}$
• Analogously, in binary:
• $17_{10} = 10001_{2} \times 2^{0} = 1000.1_{2} \times 2^{1} = 100.01_{2} \times 2^{2} = 10.001_{2} \times 2^{3} = 1.0001_{2} \times 2^{4} = 0.10001_{2} \times 2^{5} = 0.010001_{2} \times 2^{6} = 0.0010001_{2} \times 2^{7} = \dots$
• By convention (normalization), we stop when the MSB of the significand, just after the binary point, is “1”: $0.10001_{2} \times 2^{5}$
• The exponent is $5_{10} = 00101_{2}$
• The significand is $10001_{2} \rightarrow 10001000_{2}$ (padded with zeros on the right to 8 bits)
• So: 0 00101 10001000 (sign | exponent | significand)
Floating-point representation
A simple model (4/8)
 The last floating-point representation is not suitable for negative exponents
• Example:
- The number $0.25 = 0.01_{2} = 0.1_{2} \times 2^{-1}$
- How do we represent the negative exponent -1?!
 To solve such problems we use an excess-16 bias
• 16 is added to every exponent, negative or positive
• We say that the real exponent is replaced by a biased exponent
• All exponents are thus converted to non-negative biased exponents
Floating-point representation
A simple model (5/8)
 With an excess-16 bias
• Biased exponent values less than 16 indicate negative exponents
• Biased exponent values greater than 16 indicate positive exponents (16 itself indicates an exponent of 0)
• Exponents of all zeros or all ones are typically reserved for special numbers (such as zero or infinity).
Floating-point representation
A simple model (6/8)
 Example 1: Represent the number 17 in the 14-bit floating-point form with excess-16 bias
• The number is positive: the sign bit is “0”
• $17_{10} = 0.10001_{2} \times 2^{5}$
• The exponent is $5_{10} \rightarrow (5 + 16)_{10} = 21_{10} = 10101_{2}$
• The significand is $10001_{2} \rightarrow 10001000_{2}$
• So 17 in floating-point form with excess-16 bias is:
  0 10101 10001000 (sign | exponent | significand)
Floating-point representation
A simple model (7/8)
 Example 2: Represent the number $0.25_{10}$ in the 14-bit floating-point form with excess-16 bias.
• The number is positive: the sign bit is “0”
• $0.25 = 0.01_{2} \times 2^{0} = 0.1_{2} \times 2^{-1}$
• The exponent is $-1_{10} \rightarrow (-1 + 16)_{10} = 15_{10} = 01111_{2}$
• The significand is $1_{2} \rightarrow 10000000_{2}$
• So 0.25 in floating-point form with excess-16 bias is:
  0 01111 10000000 (sign | exponent | significand)
Floating-point representation
A simple model (8/8)
 Example 3: Express $-0.03125_{10}$ in normalized floating-point form with excess-16 bias.
• The number is negative: the sign bit is “1”
• $0.03125_{10} = 0.00001_{2} = 0.00001_{2} \times 2^{0} = 0.0001_{2} \times 2^{-1} = \dots = 0.1_{2} \times 2^{-4}$
• The exponent is $-4_{10} \rightarrow (-4 + 16)_{10} = 12_{10} = 01100_{2}$
• The significand is $1_{2} \rightarrow 10000000_{2}$
• So -0.03125 in floating-point form with excess-16 bias is:
  1 01100 10000000 (sign | exponent | significand)
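The three examples follow one recipe: normalize to 0.1xxx… x 2^e, add the bias, keep 8 significand bits. A hedged Python sketch of that recipe for the lecture's 14-bit model (`encode_14bit` is a hypothetical name). Assumptions: zero is rejected, extra significand bits are truncated, and, to match the range quoted for this model, the all-zeros and all-ones exponents are not reserved here.

```python
from fractions import Fraction

BIAS = 16            # excess-16 bias from the slides
EXPONENT_BITS = 5
SIGNIFICAND_BITS = 8

def encode_14bit(x: Fraction) -> str:
    """Encode x as sign | biased exponent | significand in the lecture's 14-bit model.
    The significand is normalized to 0.1xxxxxxx and extra bits are truncated."""
    if x == 0:
        raise ValueError("zero uses a reserved special pattern, not handled here")
    sign = "1" if x < 0 else "0"
    x = abs(x)
    exponent = 0
    while x >= 1:                   # move the binary point to the left
        x /= 2
        exponent += 1
    while x < Fraction(1, 2):       # move the binary point to the right
        x *= 2
        exponent -= 1
    biased = exponent + BIAS
    if not 0 <= biased < 2 ** EXPONENT_BITS:
        raise OverflowError(f"exponent {exponent} does not fit in excess-{BIAS}")
    significand = int(x * 2 ** SIGNIFICAND_BITS)   # first 8 bits after the binary point
    return sign + format(biased, f"0{EXPONENT_BITS}b") + format(significand, f"0{SIGNIFICAND_BITS}b")

print(encode_14bit(Fraction(17)))          # 01010110001000
print(encode_14bit(Fraction(1, 4)))        # 00111110000000
print(encode_14bit(Fraction("-0.03125")))  # 10110010000000
```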
Floating-point representation
Floating point arithmetic (1/2)
 To add/subtract two numbers in floating-point form
• Both numbers must have the same exponent, so:
1. If the exponents are different, we change one of the numbers so that both of them are expressed with the same power of the base
2. We add the binary numbers
3. We represent the result in normalized floating-point form
Floating-point representation
Floating point arithmetic (2/2)
 Example: Add the following binary numbers as represented in a normalized 14-bit format with an excess-16 bias.
   0 10010 11001000   (biased exponent $18_{10}$ → real exponent $2_{10}$)
+  0 10000 10011010   (biased exponent $16_{10}$ → real exponent $0_{10}$)
 The first number is $0.11001000_{2} \times 2^{2} = 11.001000_{2} \times 2^{0}$
 The second number is $0.10011010_{2} \times 2^{0}$
 Now $0.10011010_{2} + 11.001000_{2}$:
   0.10011010
+ 11.00100000
= 11.10111010
 The result is $11.10111010_{2} \times 2^{0} = 0.1110111010_{2} \times 2^{2}$
 Only the first 8 bits of the significand (11101110) can be kept; the last two bits are lost
 In floating-point form with excess-16 bias:
   0 10010 11101110   (sign | exponent | significand)
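The alignment-then-normalization procedure can be sketched in Python for the lecture's 14-bit model. Assumptions: both operands are positive, each is worth significand / 2**8 x 2**exponent, and extra significand bits are truncated as in the example above; `add_14bit` is a hypothetical name.

```python
BIAS = 16

def add_14bit(a: str, b: str) -> str:
    """Add two POSITIVE numbers in the 14-bit excess-16 model
    (1 sign bit | 5-bit biased exponent | 8-bit significand)."""
    if a[0] == "1" or b[0] == "1":
        raise ValueError("this sketch only handles positive operands")
    ea, ma = int(a[1:6], 2) - BIAS, int(a[6:], 2)
    eb, mb = int(b[1:6], 2) - BIAS, int(b[6:], 2)
    # 1. express both numbers with the same (smaller) exponent by shifting the other left
    e = min(ea, eb)
    total = (ma << (ea - e)) + (mb << (eb - e))   # sum = total / 2**8 * 2**e
    # 2./3. normalize: total has L bits, so sum = (total / 2**L) * 2**(e - 8 + L)
    length = total.bit_length()
    exponent = e - 8 + length
    if length >= 8:
        significand = total >> (length - 8)       # keep the first 8 bits, truncate the rest
    else:
        significand = total << (8 - length)
    biased = exponent + BIAS
    if not 0 <= biased <= 31:
        raise OverflowError("exponent out of range")
    return "0" + format(biased, "05b") + format(significand, "08b")

print(add_14bit("01001011001000", "01000010011010"))  # 01001011101110
```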
Floating-point representation
Floating Point Errors (1/2)
 Computers are finite systems
 When dealing with floating-point form, we are
modeling the infinite system of real numbers in a finite
system of integers
 What we have, in truth, is an approximation of the real
number system
 The more bits we use, the better the approximation
 However, there is always some element of error
 Such errors can propagate through a lengthy
calculation, causing substantial loss of precision
Floating-point representation
Floating Point Errors (2/2)
 Example:
• In our simple model:
- We are limited to values between $-0.11111111_{2} \times 2^{15}$ and $+0.11111111_{2} \times 2^{15}$.
- We cannot store $2^{-19}$ or $2^{128}$; they simply don’t fit.
- Also, 128.5 cannot be stored exactly, even though it is well within our range:
→ $128.5_{10} = 10000000.1_{2} = 0.100000001_{2} \times 2^{8}$
→ The significand needs more than 8 bits!
→ In practice we store only the first 8 bits: 10000000
→ We actually store 128 and not 128.5, with an absolute error of 0.5
→ The relative error is $\frac{128.5 - 128}{128.5} \approx 0.00389 = 0.39\%$
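The truncation error can be reproduced with a short Python sketch, assuming the same normalization (0.1xxx… x 2^e) and an 8-bit significand. The exponent range is not checked, and `stored_value` is an illustrative name.

```python
from fractions import Fraction

def stored_value(x: Fraction, significand_bits: int = 8) -> Fraction:
    """Value kept when x > 0 is normalized to 0.1xxx... x 2**e and its significand truncated.
    The exponent range is not checked in this sketch."""
    if x <= 0:
        raise ValueError("positive values only")
    exponent = 0
    while x >= 1:
        x /= 2
        exponent += 1
    while x < Fraction(1, 2):
        x *= 2
        exponent -= 1
    kept = int(x * 2 ** significand_bits)   # drop every bit after the first 8
    return Fraction(kept, 2 ** significand_bits) * Fraction(2) ** exponent

x = Fraction("128.5")
stored = stored_value(x)
absolute_error = x - stored
relative_error = absolute_error / x
print(stored, absolute_error, round(float(relative_error), 5))  # 128 1/2 0.00389
```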
End of lecture 2
Try to solve all exercises related to lecture 2