Chapter 8 BCD ---L2 - Department of Computer Science

Download Report

Transcript Chapter 8 BCD ---L2 - Department of Computer Science

Chapter 8 Problems
Prof. Sin-Min Lee
Department of Mathematics and
Computer Science
Addition And Subtraction
• For both the non-negative and two’s complement
notations, addition and subtraction are fairly
straightforward.
• However, when the result cannot be represented as
an 8-bit value, a problem arises. For the nonnegative notation, consider the addition 255+1,
1111 1111 + 0000 0001. Straight binary addition
yields the result 1 0000 0000, a 9-bit value, which
cannot be stored in an 8-bit register.
Overflow
• Arithmetic overflow: The extra bit generates a
carry out from the parallel adder. In non-negative
notation, this carry bit can set an overflow flag,
signaling the rest of the system that an overflow
has occurred, and that the result generated is not
entirely correct. The rest of the system can either
fix the result of handle the error appropriately.
Overflow
• In two’s complement notation an overflow can
occur at either end of the numeric range. At the
positive end, adding 127+1, 0111 1111 + 0000
0001, yields a result of 1 0000 0000. However, in
two’s complement notation, this is -128, not the
desired value of +128. In this notation, the key to
recognizing overflow is to check not only the
carry out, but also the carry in to the most
significant bit of the result. If two carries are
equal, then there is no overflow.
Overflow
• Overflow only occurs when two numbers with the
same sign are added. Adding two numbers with
different signs always produces a valid result.
Overflow generation in unsigned
two’s complement addition
•
•
•
•
126
+1
01111110
00000001
01111111
00
•
•
•
•
127
+1
01111111
00000001
10000000
01
• -127
• +(-1)
•
•
10000001
11111111
10000000
11
•
•
•
•
-128
+(-1)
10000000
11111111
01111111
10
Whether
there is an overflow or not depends on
the interpretation (signed or unsigned number).
each partial product is the
product of one bit of the
multiplier times the
multiplicand. There are as
many partial products as
there are bits in the
multiplier, and there are as
many bits in each partial
product as there are bits in
the multiplicand.
In looking for an algorithmic statement of this
approach to binary multiplication, it was found
that a group of Russian peasants used precisely
this method to multiply decimal numbers, and as
a result, the binary multiplication algorithm given
here is commonly known as the Russian Peasant
Algorithm:
The best of these fast multiplication
algorithms was developed by HP for the
HP PA RISC architecture. In developing
the first generation of this architecture,
the HP designers concluded that a
hardware multiply instruction was not
justified, because most multiplies are
multiplies by constants and can be
replaced by add-and-shift instructions (as
on the Hawk) and because the rare
multiply of one variable by another could
be done quickly enough in software.
The HP PA RISC multiply algorithm is based on the following
notions:
1. First, do the multiply in base 16, so each step involves
multiplying the multiplicand by the least significant 4 bits of
the multiplier and then adding the partial product to the result.
2. To multiply the multiplicand by one hex digit of the multiplier,
use an efficient case/select control structure to select one of 16
different blocks of code for multiplying by a constant. Each of
these blocks will be only a few instructions long because of the
use of add-and-shift instructions.
3. To avoid the cost of loop control instructions, note that the
fixed word size implies that each multiply can be done in a
fixed number of iterations (4 hex digits in a 16 bit multiplier, or
8 hex digits in a 32 bit multiplier). So, just write out the body of
the loop 4 or 8 times in series and eliminate any need for loop
counters.
What are BCD Numbers?
• Binary Coded Decimal numbers are actually
binary numbers that are "coded" to represent
decimal numbers
• Coded numbers are "not real numbering systems".
In fact, they are just what they say they are, "codes
that represent actual numbers". Although the
actual numbers can be mathematically
manipulated, codes follow no such rules.
Example 1
Example 2.
BCD (365)_10 -------------> 0011 | 0110 | 0101
Addition
• Addition is analogous to decimal addition
with normal binary addition taking place
from right to left. For example,
Where the result of any addition exceeds 9(1001) then six (0110) must be
added to the sum to account for the six invalid BCD codes that are
available with a 4-bit number. This is illustrated in the example below
• When one adds two BCD digits,
• if the binary sum is less than 1010, the
corresponding BCD sum digit is correct.
• if the binary sum is greater than or equal to
1010, add (0110)2 to the corresponding
BCD sum digit and produce a carry.
8.4.1 Pipelining
1. Data enters a stage of the pipeline, and it
will through go through different stages
of arithmetic operation till the final
computation is completed.
Note: Each stage only perform its specific
function.
2. Pipeline improves the performance by
overlapping computation; that is, each stage
operate on different data simultaneously.
Compare process time between
Pipeline and Non-Pipeline
Consider this code:
For i = 1 to 100 Do {A[i] (B[i] * C[i]) + D[i]}
(assume multiplication & Addition require 10ns)
Non-pipeline: 10ns * 2 * 100 = 2000ns
10ns Stage 1: B[1] * C[1]
10ns Stage 2: Add above
value and D[1] to
A[1]
Stage 1: B[2] * C[2]
10ns Stage 2: Calculate
A[2]
Stage 1: B [3] * C[3]
An Example on Arithmetic Pipeline
• Consider the following snippet code:
FOR I = 1 to 100 DO {A[I](B[I] . C[I]) + D[I]}
•
•
Assume that each operation, multiplication and addition,
requires 10 ns to complete. A non-pipelined uniprocessor
take 20 ns to calculate A[I], and 2000 ns to execute the
code.
A pipeline unit could break this computation into two
stages as shown in the next slide.
A two-stage pipeline to implement the
program loop
B[i]
*
Latch
C[i]
+
CLK
Latch
D[i]
Latch
CLK
Stage 1 (*)
Stage 2 (+)
Example: A two-stage pipeline to implement the program
loop
During the first 10 ns, the
first stage calculates
B[1].C[1]. In the next 10
ns, stage 2 adds this
value to D[1]and stores
the result in A[1]. At the
same time, stage 1
multiplies B[2] and C[2].
During the following 10
ns, stage 1 forms
B[3].C[3] and stage 2
calculate the final value
A[2]. Instead of 2000 ns
which a non-pipelined
uniprocessor would take,
this pipeline executes the
code in 1010 ns.
Speedup for Pipeline
A pipeline’s speedup Sn is the time needed to
process n pieces of data using non-pipeline
(T1), divided by the time needed to process
same data using k-stage pipeline (Tk).
Speedup of pipeline can expressed as:
Sn =
nT1
(k + n -1) Tk
Continue:
Apply Speedup expression using previous
T1, is the time to
nexample:
S100 =
100 * 20ns
(2 + 100 - 1) * 10ns
= 1.98
to process n pieces
data using nonpipeline (* and +).
Tk
k = Two stages (* and +)
Speed up
• The most popular metrics used to measure the performance of a
pipeline are throughput (the number of results generated per time unit)
and speedup.
• A pipeline’s speedup, Sn, is the amount of time needed to process n
pieces of data using a non-pipelined arithmetic unit, divided by the
time need to process the same data using a k-stage pipeline.
Sn = nT1 / (k + n - 1) x Tk
Calculating Speedup
Sn = nT1 / (k + n -1) x Tk
T1 : the time required to calculate using non-pipeline
Tk : is the the clock period of the k-stage pipeline
The pipeline unit requires k time units, each of duration Tk, to move the
first piece of data to the pipeline. Using the last example, the speedup
can be calculated as followed:
S100 = 100x20ns / (2 + 100 - 1) x 10ns
= 1.98
As n approaches infinity, n / (k + n - 1) approaches 1, so S = T1 / Tk
The maximum speedup occurs when each stage has the same delay.
8.4.2 Lookup Tables
1. Theoretically, any combinatorial circuit can
be implemented by a ROM configured as a
lookup table.
2. The ROM is programmed with data such
that the correct values are output for any
possible input values.
3. A 4x1 ROM is like a two-input AND gate.
The input of the AND gate serves as the
address input for ROM, and the output of
ROM corresponds to the AND gate output.
Lookup ROM equivalent
to AND gate
x
A1
4x1
ROM D0
y
x
y
Address
x&y
A0
AND
x&y
0
0
1
1
0
1
0
1
Data
0
0
0
1
All possible And gate
inputs are stored in ROM