HO-04 ch 3 F09


EGRE 426
Fall 09
Chapter Three
Arithmetic
• What's up ahead:
– Implementing the Architecture
[Figure: ALU with 32-bit inputs a and b, an operation control input, and a 32-bit result]

Numbers
• Bits are just bits (no inherent meaning)
– conventions define relationship between bits and numbers
• Binary numbers (base 2)
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...
decimal: 0...2^n - 1
• Of course it gets more complicated:
numbers are finite (overflow)
fractions and real numbers
negative numbers
(e.g., no MIPS subi instruction; addi can add a negative number)
• How do we represent negative numbers?
i.e., which bit patterns will represent which numbers?

Possible Representations
• Sign Magnitude:    • One's Complement:    • Two's Complement:
  000 = +0             000 = +0               000 = +0
  001 = +1             001 = +1               001 = +1
  010 = +2             010 = +2               010 = +2
  011 = +3             011 = +3               011 = +3
  100 = -0             100 = -3               100 = -4
  101 = -1             101 = -2               101 = -3
  110 = -2             110 = -1               110 = -2
  111 = -3             111 = -0               111 = -1
Issues: balance, number of zeros, ease of operations
Which one is best? Why?

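The three representations can be compared by decoding the same bit patterns under each convention. The following is a minimal Python sketch; the function names are illustrative and not from the slides. Note that Python integers have no negative zero, so the patterns the slide lists as -0 decode to 0.

```python
def sign_magnitude(bits: int, n: int) -> int:
    """Top bit is the sign; the remaining n-1 bits are the magnitude."""
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if bits >> (n - 1) else magnitude


def ones_complement(bits: int, n: int) -> int:
    """A negative value is stored as the bitwise inverse of its magnitude."""
    if bits >> (n - 1):
        return -(~bits & ((1 << n) - 1))
    return bits


def twos_complement(bits: int, n: int) -> int:
    """The top bit carries weight -2**(n-1)."""
    return bits - (1 << n) if bits >> (n - 1) else bits


def table(n: int = 3):
    """One row per pattern: (pattern, sign magnitude, one's, two's complement)."""
    return [
        (format(bits, f"0{n}b"),
         sign_magnitude(bits, n),
         ones_complement(bits, n),
         twos_complement(bits, n))
        for bits in range(1 << n)
    ]


for row in table():
    print(*row)
```

Only two's complement gives every pattern a distinct value, which is one of the "number of zeros" issues raised above.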
MIPS
• 32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = + 1_ten
0000 0000 0000 0000 0000 0000 0000 0010_two = + 2_ten
...
0111 1111 1111 1111 1111 1111 1111 1110_two = + 2,147,483,646_ten
0111 1111 1111 1111 1111 1111 1111 1111_two = + 2,147,483,647_ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000_two = – 2,147,483,648_ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001_two = – 2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0010_two = – 2,147,483,646_ten
...
1111 1111 1111 1111 1111 1111 1111 1101_two = – 3_ten
1111 1111 1111 1111 1111 1111 1111 1110_two = – 2_ten
1111 1111 1111 1111 1111 1111 1111 1111_two = – 1_ten

Two's Complement Operations
• Negating a two's complement number: invert all bits and add 1
– Easier rule: Start at the least significant bit. Copy through the first “1”. Then invert each remaining bit.
• Example: 0010101100 -> 1101010100
– remember: “negate” and “invert” are quite different!
• Converting n bit numbers into numbers with more than n bits:
– MIPS 16 bit immediate gets converted to 32 bits for arithmetic
– copy the most significant bit (the sign bit) into the other bits
0010 -> 0000 0010
1010 -> 1111 1010
– "sign extension" (lbu vs. lb) Load byte
lbu rt,ofst(rs) # rt ← 0^24 || MB(rs + ofst), i.e. rt ← MB(rs + ofst) zero-extended
lb rt,ofst(rs) # rt ← (bit 7 of MB(rs + ofst))^24 || MB(rs + ofst), i.e. rt ← MB(rs + ofst) sign-extended
(x^24 means the bit x repeated 24 times; || is concatenation)

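The negation rule and the two kinds of extension can be checked on the slide's own examples. This is a minimal Python sketch that works on unsigned bit patterns; the helper names are illustrative only.

```python
def negate(bits: int, n: int) -> int:
    """Two's complement negation: invert all n bits, then add 1 (mod 2**n)."""
    mask = (1 << n) - 1
    return ((bits ^ mask) + 1) & mask


def sign_extend(bits: int, n: int, m: int) -> int:
    """Widen an n-bit pattern to m bits by copying the sign bit (as lb does)."""
    if (bits >> (n - 1)) & 1:
        bits |= ((1 << (m - n)) - 1) << n   # fill the new upper bits with 1s
    return bits


def zero_extend(bits: int, n: int) -> int:
    """Widen an n-bit pattern by filling the upper bits with 0s (as lbu does)."""
    return bits & ((1 << n) - 1)


print(format(negate(0b0010101100, 10), "010b"))    # the slide's negation example
print(format(sign_extend(0b1010, 4, 8), "08b"))    # 1010 widened to 8 bits
print(hex(sign_extend(0xFA, 8, 32)), hex(zero_extend(0xFA, 8)))
```

The last line contrasts what lb and lbu would leave in a register after loading the byte 0xFA.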
Addition & Subtraction
• Just like in grade school (carry/borrow 1s)
  0111 ( 7)      0111 ( 7)      0110 ( 6)
+ 0110 ( 6)    - 0110 (-6)    - 0101 (-5)
• Two's complement operations easy
– subtraction using addition of negative numbers
  0111 ( 7)
+ 1010 (-6)
• Overflow (result too large for finite computer word):
– e.g., adding two n-bit numbers does not yield an n-bit number
  0111 ( 7)
+ 0001 ( 1)
  1000 (-8)
note that the term overflow is somewhat misleading; it does not mean a carry “overflowed”

Detecting Overflow
• No overflow when adding a positive and a negative number
• No overflow when signs are the same for subtraction
• Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a negative
– or, adding two negatives gives a positive
– or, subtract a negative from a positive and get a negative
– or, subtract a positive from a negative and get a positive
• Consider the operations A + B, and A – B
– Can overflow occur if B is 0 ?
– Can overflow occur if A is 0 ?

Two’s Complement Arithmetic
Representation of 4 bit words in two’s complement.
Decimal value   Binary   Two’s complement
-8              -1000    1000
-7              -0111    1001
-6              -0110    1010
-5              -0101    1011
-4              -0100    1100
-3              -0011    1101
-2              -0010    1110
-1              -0001    1111
 0               0000    0000
 1               0001    0001
 2               0010    0010
 3               0011    0011
 4               0100    0100
 5               0101    0101
 6               0110    0110
 7               0111    0111

When does overflow occur? Let S be the sign bit (bit 3) of the result, let Co be the carry out of the sign bit, and let SA and SB be the sign bits of the two numbers added. Then from the previous table we can construct a Karnaugh map for overflow.
Thus, overflow is given by $Ov = S_A S_B \overline{S} + \overline{S_A}\,\overline{S_B}\, S$. Using Co and Cin (the carry into the sign bit), by inspection $Ov = C_{in} \oplus C_o$.

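Overflow on addition can be described two ways: by the signs (both operands share a sign and the result's sign differs) or by the carries (the carry into the sign bit differs from the carry out of it). The Python sketch below computes both for n-bit patterns so they can be compared. The function name and return shape are assumptions made for illustration.

```python
def add_with_overflow(a: int, b: int, n: int = 4):
    """Add two n-bit patterns.

    Returns (sum bits, overflow from the sign rule, overflow from the carry rule).
    """
    mask = (1 << n) - 1
    low_mask = mask >> 1                                    # bits below the sign bit
    c_in = ((a & low_mask) + (b & low_mask)) >> (n - 1)     # carry into the sign bit
    total = (a & mask) + (b & mask)
    c_out = total >> n                                      # carry out of the sign bit
    s = (total >> (n - 1)) & 1                              # sign of the result
    sa = (a >> (n - 1)) & 1
    sb = (b >> (n - 1)) & 1
    ov_signs = (sa & sb & (s ^ 1)) | ((sa ^ 1) & (sb ^ 1) & s)
    ov_carry = c_in ^ c_out
    return total & mask, ov_signs, ov_carry


print(add_with_overflow(0b0111, 0b0001))   # 7 + 1 in 4 bits
print(add_with_overflow(0b0111, 0b1010))   # 7 + (-6) in 4 bits
```

Running it over all 4-bit operand pairs is a quick way to confirm that the two rules always agree.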
Effects of Overflow
• An exception (interrupt) occurs
– Control jumps to predefined address for exception
– Interrupted address is saved for possible resumption
• Details based on software system / language
– example: flight control vs. homework assignment
• Don't always want to detect overflow
– new MIPS instructions: addu, addiu, subu
note: addiu ignores overflow but still sign-extends its immediate!
note: sltu, sltiu for unsigned comparisons (sltiu still sign-extends its immediate before comparing)

Review: Boolean Algebra & Gates
See Appendix B
• Problem: Consider a logic function with three inputs: A, B, and C.
Output D is true if at least one input is true
Output E is true if exactly two inputs are true
Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.

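One possible answer to the review problem can be checked by enumerating the truth table. The sketch below is in Python and uses sum-of-products forms built only from NOT, AND, and OR. These particular equations are one valid choice, not necessarily the ones intended in Appendix B.

```python
from itertools import product


def logic_functions(a: int, b: int, c: int):
    """Return (D, E, F) for inputs A, B, C using only NOT, AND, and OR."""
    na, nb, nc = 1 - a, 1 - b, 1 - c                       # inverters
    d = a | b | c                                          # at least one input true
    e = (a & b & nc) | (a & nb & c) | (na & b & c)         # exactly two inputs true
    f = a & b & c                                          # all three inputs true
    return d, e, f


truth_table = [
    (a, b, c) + logic_functions(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
]

print("A B C | D E F")
for a, b, c, d, e, f in truth_table:
    print(a, b, c, "|", d, e, f)
```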
Review: The Multiplexor
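• Selects one of the inputs to be the output, based on a control input
[Figure: multiplexor with data inputs A (selected when S = 0) and B (selected when S = 1), control input S, and output C]
note: we call this a 2-input mux even though it has 3 inputs!
• Let's build our ALU using a MUX:
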
An ALU (arithmetic logic unit)
• Let's build an ALU to support the and and or instructions
– we'll just build a 1 bit ALU, and use 32 of them
[Figure: 1-bit ALU with inputs a and b, an operation control input, and a result output]
• Possible Implementation (sum-of-products):

Different Implementations
• Not easy to decide the “best” way to build something
– Don't want too many inputs to a single gate
– Don't want to have to go through too many gates
– for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
cout = a b + a cin + b cin
sum = a xor b xor cin
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?

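The two adder equations (sum = a xor b xor cin, cout = a b + a cin + b cin) are enough to build a multi-bit adder by chaining carries. The Python sketch below models that chain in software. The names `full_adder` and `ripple_add` are illustrative, and the loop stands in for 32 copies of the same hardware cell.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit adder; returns (sum, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_add(a: int, b: int, n: int = 32, cin: int = 0):
    """Chain n full adders, feeding each carry out into the next carry in.

    Returns (n-bit result, final carry out).
    """
    result = 0
    carry = cin
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add(7, 6, n=4))
print(ripple_add(0xFFFFFFFF, 1))
```

Bit i cannot produce its sum until the carry from bit i-1 is known, which is the ripple delay discussed later in the chapter.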
Building a 32 bit ALU
[Figure: 32-bit ALU built from 32 1-bit ALUs (ALU0 to ALU31). Each ALUi takes ai, bi, a CarryIn, and the shared Operation lines and produces Resulti; the CarryOut of each ALU feeds the CarryIn of the next.]

What about subtraction (a – b) ?
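• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution:
[Figure: 1-bit ALU in which a Binvert control selects either b or its complement as the second operand; inputs a, CarryIn, and Operation as before; outputs Result and CarryOut]
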
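Negating b means inverting its bits and adding 1, so a - b can be computed by the adder as a + (not b) + 1. The Python sketch below assumes the usual arrangement, in which Binvert selects the inverted b and the adder's CarryIn supplies the "+ 1". The function name is illustrative.

```python
def add_sub(a: int, b: int, binvert: int, n: int = 32) -> int:
    """Compute a + b (binvert = 0) or a - b (binvert = 1) on n-bit patterns."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    if binvert:
        b = ~b & mask          # the mux selects the inverted b
    carry_in = binvert         # assumption: the same control bit supplies the "+ 1"
    return (a + b + carry_in) & mask


print(format(add_sub(7, 6, 1, n=4), "04b"))   # 7 - 6
print(format(add_sub(6, 7, 1, n=4), "04b"))   # 6 - 7, i.e. -1 in two's complement
```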
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, Label)
– use subtraction: (a-b) = 0 implies a = b

Supporting slt
• Can we figure out the idea?
If a < b then a – b < 0, so bit 31 of (a – b) is 1 and Set = 1; otherwise Set = 0.
[Figure a: 1-bit ALU with inputs a, b, Binvert, CarryIn, Operation, and Less (multiplexor input 3); outputs Result and CarryOut]
[Figure b: 1-bit ALU for the most significant bit, with the same inputs plus a Set output and overflow detection producing Overflow]
[Figure: 32-bit ALU built from ALU0 to ALU31 with shared Binvert, CarryIn, and Operation lines. The Less inputs of ALU1 to ALU31 are 0; ALU31 produces Set and Overflow, and Set drives the Less input of ALU0.]

Test for equality
• Notice control lines:
000 = and
001 = or
010 = add
110 = subtract
111 = slt
• Note: zero is a 1 when the result is zero!
[Figure: 32-bit ALU as before, with Binvert and CarryIn combined into a single Bnegate line and a Zero output computed from Result0 to Result31; ALU31 also produces Set and Overflow]

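The five control codes can be read as a Bnegate bit followed by a two-bit Operation select. The Python sketch below models the ALU's behavior under that reading. It is a functional model rather than a gate-level one, the `alu` name and string-coded control are assumptions, and slt here uses only the sign bit of a - b, so it ignores the overflow case.

```python
def alu(a: int, b: int, control: str, n: int = 32):
    """Return (result, zero) for control codes 000, 001, 010, 110, 111."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    bnegate, operation = int(control[0]), control[1:]
    if bnegate:
        b = ~b & mask                      # invert b ...
    total = (a + b + bnegate) & mask       # ... and add 1 through the carry in
    if operation == "00":
        result = a & b
    elif operation == "01":
        result = a | b
    elif operation == "10":
        result = total                     # add (010) or subtract (110)
    else:
        result = (total >> (n - 1)) & 1    # slt: sign bit of a - b goes to bit 0
    zero = int(result == 0)                # Zero is 1 when the result is zero
    return result, zero


print(alu(5, 3, "110", n=4))   # subtract
print(alu(3, 5, "111", n=4))   # slt
print(alu(5, 5, "110", n=4))   # equality test: Zero is set
```

The last call shows how beq can reuse subtraction: the operands are equal exactly when Zero comes back as 1.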
Conclusion
• We can build an ALU to support the MIPS instruction set
– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however,
– clever changes to organization can improve performance (similar to using better algorithms in software)
– we’ll look at two examples for addition and multiplication

Lab 3 – Build modified one bit ALU in VHDL
[Figure a: modified 1-bit ALU with inputs a, b, Binvert, CarryIn, Operation, and Less (multiplexor input 3) and outputs Result, CarryOut, g, and p]
You should fully document your code, and in the lab report briefly explain the decisions you made. Include in your lab report simulation waveforms to verify the ALU1 operation, and before 3:00 p.m. on 9/21/09 email to [email protected] your source code for the lab. Use “EGRE426 Lab 3-your name” as the subject line. The correct simulation waveform is shown below. Turn in your lab report at the beginning of class on 9/22/09.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Testbench entity: no ports
ENTITY TB IS
END ENTITY TB;

ARCHITECTURE TEST OF TB IS
  SIGNAL a, b, cin, cout, g, p: std_logic := '0';
  SIGNAL ainv, binv, less, r_and, r_or, r_add, r_less: std_logic := '0';
BEGIN
  -- Free-running stimulus: each input toggles at half the rate of the one before
  less <= not less after 10 ns;
  cin <= not cin after 20 ns;
  a <= not a after 40 ns;
  b <= not b after 80 ns;
  binv <= not binv after 160 ns;
  -- Toggle ainv on each falling edge of binv
  ainv <= not ainv when (binv'event and binv = '0');
  -- One alu1 instance per operation code (alu1 is assumed to be compiled into library work)
  alu1_and: entity work.alu1 PORT MAP (a,b,ainv,binv,less,"00",cin,r_and);
  alu1_or: entity work.alu1 PORT MAP (a,b,ainv,binv,less,"01",cin,r_or);
  alu1_add: entity work.alu1 PORT MAP (a,b,ainv,binv,less,"10",cin,r_add, cout,p,g);
  alu1_less: entity work.alu1 PORT MAP (a,b,ainv,binv,less,"11",cin,r_less, cout,p,g);
  process begin
    wait on less, cin, a, b, binv; -- Wait for input to change
    wait for 9 ns;                 -- Let settle
    assert r_and = (a and b) report "AND error.";
    assert r_or = (a or b) report "OR error.";
    assert r_add = ((a xor b) xor cin) report "ADD error.";
    assert (r_less = less) report "LESS error";
  end process;
END ARCHITECTURE TEST;

Problem: ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
$c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$
$c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$
$c_3 = b_2 c_2 + a_2 c_2 + a_2 b_2$
$c_4 = b_3 c_3 + a_3 c_3 + a_3 b_3$
Substituting the earlier carries: $c_2 = \ldots$, $c_3 = \ldots$, $c_4 = \ldots$
Not feasible! Why?

Carry-lookahead adder See B-39
• An approach in-between our two extremes
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? $g_i = a_i b_i$
– When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?
$c_1 = g_0 + p_0 c_0$
$c_2 = g_1 + p_1 c_1$
$c_3 = g_2 + p_2 c_2$
$c_4 = g_3 + p_3 c_3$
Feasible! Why?

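With generate g_i = a_i b_i and propagate p_i = a_i + b_i, each carry c_(i+1) = g_i + p_i c_i can be expanded until it depends only on the g's, the p's, and c0. The Python sketch below writes out that expansion for a 4-bit adder. The function name is illustrative.

```python
def cla_carries(a: int, b: int, c0: int = 0):
    """Return [c1, c2, c3, c4] for 4-bit a and b.

    Every carry is a flat sum of products of g, p, and c0, so none of them
    waits on another carry.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


print(cla_carries(0b0111, 0b0001))
print(cla_carries(0b1111, 0b0000, c0=1))
```

The widest term grows by one input per bit position, which hints at why the same flat expansion is not used directly for all 32 bits.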
Use principle to build bigger adders
[Figure: 16-bit adder built from four 4-bit ALUs (ALU0 to ALU3) producing Result0–3, Result4–7, Result8–11, and Result12–15. Each ALU sends its group signals Pi and Gi to a carry-lookahead unit, which supplies C1, C2, and C3 as the CarryIn of the next ALU and C4 as CarryOut.]
• Can’t build a 16 bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
$P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0$
$P_1 = p_7 \cdot p_6 \cdot p_5 \cdot p_4$
$P_2 = p_{11} \cdot p_{10} \cdot p_9 \cdot p_8$
$P_3 = p_{15} \cdot p_{14} \cdot p_{13} \cdot p_{12}$
$G_0 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
$G_1 = g_7 + p_7 g_6 + p_7 p_6 g_5 + p_7 p_6 p_5 g_4$
$G_2 = g_{11} + p_{11} g_{10} + p_{11} p_{10} g_9 + p_{11} p_{10} p_9 g_8$
$G_3 = g_{15} + p_{15} g_{14} + p_{15} p_{14} g_{13} + p_{15} p_{14} p_{13} g_{12}$
$C_1 = G_0 + P_0 c_0$
$C_2 = G_1 + P_1 G_0 + P_1 P_0 c_0$
$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0$
$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0$
See p B44-B45

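The second level of lookahead treats each 4-bit group like a single bit. A group propagates a carry when all four of its bits propagate (P = p3 p2 p1 p0), and it generates one when some bit generates and every higher bit in the group propagates (G = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0). The Python sketch below computes the four block carries of a 16-bit adder that way. The function names and the loop-based form of the sum of products are illustrative choices.

```python
def group_pg(a: int, b: int):
    """Return (P, G) for one 4-bit group."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return big_p, big_g


def block_carries(a: int, b: int, c0: int = 0):
    """Return [C1, C2, C3, C4]: carries out of bits 3, 7, 11, and 15."""
    pg = [group_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    carries = []
    for k in range(4):
        carry = 0
        for j in range(k, -1, -1):
            term = pg[j][1]                 # G_j ...
            for m in range(j + 1, k + 1):
                term &= pg[m][0]            # ... passed along by P_(j+1) .. P_k
            carry |= term
        term = c0                           # c0 passed along by P_0 .. P_k
        for m in range(k + 1):
            term &= pg[m][0]
        carries.append(carry | term)
    return carries


print(block_carries(0xFFFF, 0x0001))
print(block_carries(0x00F0, 0x0010))
```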
Multiplication
• More complicated than addition
– accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on the grade school algorithm
      0010   (multiplicand)
    x 1011   (multiplier)
      ----
      0010
     0010
    0000
   0010
   -------
   0010110
• Negative numbers: convert and multiply
– there are better techniques, we won’t look at them

Multiplication: Implementation
      0010   (multiplicand)
    x 1011   (multiplier; after each right shift: 1011, 0101, 0010, 0001)
      ----
      0010
     0010
    0000
   0010
   -------
   0010110
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   Multiplier0 = 0: go directly to step 2
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions (return to step 1)
   Yes: 32 repetitions (Done)
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and a control test]
Two 64 bit registers, one 32 bit reg, ALU is 64 bits wide.

Second Version
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   Multiplier0 = 0: go directly to step 2
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions (return to step 1)
   Yes: 32 repetitions (Done)
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (shift right), a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test]
One 64 bit reg, two 32 bit reg, 32 bit ALU.

Final Version
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   Product0 = 0: go directly to step 2
2. Shift the Product register right 1 bit
32nd repetition?
   No: < 32 repetitions (return to step 1)
   Yes: 32 repetitions (Done)
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test]
One 32 bit reg, one 64 bit reg, 32 bit ALU

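The final version keeps the multiplier in the right half of the Product register, so testing Product0 and shifting the product right handles both jobs at once. The Python sketch below follows those steps for unsigned operands. It models the register as a Python integer and lets the carry out of the add sit one bit above the register until the shift brings it back in, which is an assumption about how the hardware treats that carry.

```python
def multiply(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Unsigned n-bit by n-bit multiply using one 2n-bit Product register."""
    mask = (1 << n) - 1
    multiplicand &= mask
    product = multiplier & mask              # right half = multiplier, left half = 0
    for _ in range(n):                       # n repetitions
        if product & 1:                      # 1. test Product0
            product += multiplicand << n     # 1a. add to the left half
        product >>= 1                        # 2. shift the Product register right 1 bit
    return product


print(format(multiply(0b0010, 0b1011, n=4), "08b"))   # the slide's 0010 x 1011
print(multiply(0xFFFFFFFF, 0xFFFFFFFF) == 0xFFFFFFFF ** 2)
```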
Faster Multiplication: See Booth's algorithm for a faster serial approach. For a fully parallel approach consider:
[Figure: parallel multiplier array (not recoverable from the transcript)]
Worst case delay for S5 = 7 full adders.
Carry Save Adders are a faster approach.
Worst case delay for S5 = 6 FA’s, faster using one CLA.
Other techniques exist.

Floating Point (a brief look)
• We need a way to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., $3.15576 \times 10^9$
• Representation:
– sign, exponent, significand: $(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$
– more bits for significand gives more accuracy
– more bits for exponent increases range
• IEEE 754 floating point standard:
– single precision: 8 bit exponent, 23 bit significand
– double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating-point standard
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier
– all 0s is smallest exponent, all 1s is largest
– bias of 127 for single precision and 1023 for double precision
– summary: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
• Example:
– decimal: $-.75 = -3/4 = -3/2^2$
– binary: $-.11_{two} = -1.1_{two} \times 2^{-1}$
– floating point: exponent = 126 = 01111110
– IEEE single precision: 1,01111110,10000000000000000000000
• Example:
1,10000001,01000000000000000000000 = ?
Negative, exponent of 129 – 127 = +2,
Significand = $1.01_{two}$
$-1.01_{two} \times 2^2 = -101.0_{two} = -5.0$

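Both worked examples can be checked by decoding the 32-bit patterns with the formula (-1)^sign x (1 + significand) x 2^(exponent - bias). The Python sketch below does that by hand and then cross-checks against the platform's own interpretation through the standard `struct` module. It handles normalized numbers only; zeros, denormals, infinities, and NaNs are deliberately left out. The function names are illustrative.

```python
import struct


def decode_single(bits: int) -> float:
    """Decode a normalized IEEE 754 single-precision bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = 1 + fraction / 2**23       # the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


def decode_with_struct(bits: int) -> float:
    """Cross-check: reinterpret the same 32 bits as a float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Sign 1, exponent 01111110, fraction 100...0 (the -0.75 example)
print(decode_single(0xBF400000), decode_with_struct(0xBF400000))
# Sign 1, exponent 10000001, fraction 010...0 (the second example)
print(decode_single(0xC0A00000), decode_with_struct(0xC0A00000))
```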
Floating point addition
Start
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent
Overflow or underflow?
   Yes: Exception
   No: continue to step 4
4. Round the significand to the appropriate number of bits
Still normalized?
   No: return to step 3
   Yes: Done
[Figure: floating-point adder datapath with the Sign, Exponent, and Fraction fields of the two operands, a small ALU computing the exponent difference, multiplexors, a right shifter, a big ALU, exponent increment or decrement, a left or right shifter, rounding hardware, control, and the Sign, Exponent, and Fraction of the result]

Floating Point Complexities
• Operations are somewhat more complicated (see text)
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– four rounding modes
– positive divided by zero yields “infinity”
– zero divided by zero yields “not a number”
– other complexities
• Implementing the standard can be tricky
• Not using the standard can be even worse
– see text for description of 80x86 and Pentium bug!

Chapter Three Summary
• Computer arithmetic is constrained by limited precision
• Bit patterns have no inherent meaning but standards do exist
– two’s complement
– IEEE 754 floating point
• Computer instructions determine “meaning” of the bit patterns
• Performance and accuracy are important so there are many complexities in real machines (i.e., algorithms and implementation).
• We are ready to move on (and implement the processor)