Lec4-alu - ECE Users Pages - Georgia Institute of Technology
ECE3055
Computer Architecture and
Operating Systems
Lecture 4 Numbers and ALU
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
1
Arithmetic
Where we've been:
Performance (seconds, cycles, instructions)
Abstractions:
Instruction Set Architecture
Assembly Language and Machine Language
What's up ahead:
Implementing the Architecture
[Figure: ALU block with two 32-bit inputs, a and b, an operation control input, and a 32-bit result output]
2
Numbers
Bits are just bits (no inherent meaning) — conventions
define relationship between bits and numbers
Binary numbers (base 2)
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001...
decimal: 0 ... $2^n - 1$
Of course it gets more complicated:
numbers are finite (overflow)
fractions and real numbers
negative numbers (e.g., no MIPS subi instruction; addi can
add a negative number)
How do we represent negative numbers? i.e., which
bit patterns will represent which numbers?
3
Possible Representations
Sign Magnitude:
000 = +0
001 = +1
010 = +2
011 = +3
100 = -0
101 = -1
110 = -2
111 = -3
One's Complement
000 = +0
001 = +1
010 = +2
011 = +3
100 = -3
101 = -2
110 = -1
111 = -0
Two's Complement
000 = +0
001 = +1
010 = +2
011 = +3
100 = -4
101 = -3
110 = -2
111 = -1
Issues: balance, number of zeros, ease of operations
Which one is best? Why?
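One way to compare the three schemes is to decode the same bit pattern under each of them. The sketch below is illustrative only; the function name `decode` and its dictionary keys are assumptions made for this example, not part of the lecture.

```python
def decode(bits: str) -> dict:
    """Interpret a bit string under the three signed representations."""
    n = len(bits)
    u = int(bits, 2)                       # unsigned value of the pattern
    sign = u >> (n - 1)                    # most significant bit
    magnitude = u & ((1 << (n - 1)) - 1)   # remaining n-1 bits
    return {
        # sign bit plus magnitude; 100 decodes to "negative zero", i.e. 0
        "sign_magnitude": -magnitude if sign else magnitude,
        # negative values are the bitwise inverse of the positive value
        "ones_complement": u - ((1 << n) - 1) if sign else u,
        # negative values are offset by 2^n
        "twos_complement": u - (1 << n) if sign else u,
    }

for pattern in ("000", "001", "010", "011", "100", "101", "110", "111"):
    print(pattern, decode(pattern))
```

Sign magnitude and one's complement each spend two patterns on zero, while two's complement has a single zero and one extra negative value.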
4
MIPS
32 bit signed numbers:
0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
1111 1111 1111 1111 1111 1111 1111 1111two = -1ten
5
Two's Complement Operations
Negating a two's complement number: invert all bits
and add 1
remember: “negate” and “invert” are quite different!
Converting n bit numbers into numbers with more than
n bits:
MIPS 16 bit immediate gets converted to 32 bits for
arithmetic
copy the most significant bit (the sign bit) into the other bits
0010 -> 0000 0010
1010 -> 1111 1010
"sign extension" (lbu vs. lb)
6
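The two operations on the preceding slide (negation by "invert all bits and add 1", and sign extension) can be sketched as follows. The function names and the `bits` parameters are hypothetical; values are held as unsigned bit patterns.

```python
def negate(x: int, bits: int = 32) -> int:
    """Two's complement negation: invert all bits, then add 1."""
    mask = (1 << bits) - 1
    inverted = x ^ mask            # "invert" flips every bit
    return (inverted + 1) & mask   # "negate" = invert, then add 1

def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide value into the new upper bits."""
    x &= (1 << from_bits) - 1
    if x >> (from_bits - 1):       # sign bit set: fill the upper bits with 1s
        x |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return x

print(format(negate(0b0010, 4), "04b"))          # pattern for -2 in 4 bits
print(format(sign_extend(0b0010, 4, 8), "08b"))
print(format(sign_extend(0b1010, 4, 8), "08b"))
```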
Addition & Subtraction
Just like in grade school (carry/borrow 1s, assume
unsigned)
  0111      0111      0110
+ 0110    - 0110    - 0101
Two's complement operations easy
subtraction using addition of negative numbers
0111
+ 1010
Overflow (result too large for finite computer word):
e.g., adding two n-bit numbers does not yield an n-bit
number
0111
+ 0001
1000
note that overflow term is somewhat misleading,
it does not mean a carry “overflowed”
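To see why overflow is not the same thing as a carry out, the 4-bit examples above can be replayed in a short sketch. The helper names `add_n` and `to_signed` are assumptions for illustration.

```python
def add_n(a: int, b: int, bits: int = 4):
    """Add two bit patterns; return the n-bit result and the carry out."""
    mask = (1 << bits) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> bits

def to_signed(x: int, bits: int = 4) -> int:
    """Reinterpret an n-bit pattern as a two's complement value."""
    return x - (1 << bits) if x >> (bits - 1) else x

# 7 - 6 computed as 7 + (-6): there is a carry out, but the result is correct
result, carry = add_n(0b0111, 0b1010)
print(format(result, "04b"), carry, to_signed(result))

# 7 + 1: no carry out, yet the signed result is wrong (overflow)
result, carry = add_n(0b0111, 0b0001)
print(format(result, "04b"), carry, to_signed(result))
```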
7
Detecting Overflow
No overflow when adding a positive and a negative
number
No overflow when signs are the same for subtraction
Overflow occurs when the value affects the sign:
overflow when adding two positives yields a negative
or, adding two negatives gives a positive
or, subtract a negative from a positive and get a negative
or, subtract a positive from a negative and get a positive
Consider the operations A + B, and A – B
Can overflow occur if B is 0 ?
Can overflow occur if A is 0 ?
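The sign rules above, and the two closing questions, can be checked exhaustively for a small word size. This is a sketch under the assumption of 8-bit two's complement values; all names here are hypothetical.

```python
BITS = 8
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1   # -128 .. 127

def wrap(x: int) -> int:
    """Keep the low BITS bits and reinterpret them as two's complement."""
    return ((x - LO) % (1 << BITS)) + LO

def add_overflows(a: int, b: int) -> bool:
    # same-sign operands whose sum has the opposite sign
    s = wrap(a + b)
    return (a < 0) == (b < 0) and (s < 0) != (a < 0)

def sub_overflows(a: int, b: int) -> bool:
    # different-sign operands whose difference has the opposite sign to a
    d = wrap(a - b)
    return (a < 0) != (b < 0) and (d < 0) != (a < 0)

values = range(LO, HI + 1)
overflow_when_b_is_zero = any(add_overflows(a, 0) or sub_overflows(a, 0) for a in values)
add_overflow_when_a_is_zero = any(add_overflows(0, b) for b in values)
sub_overflow_when_a_is_zero = [b for b in values if sub_overflows(0, b)]

print(overflow_when_b_is_zero, add_overflow_when_a_is_zero, sub_overflow_when_a_is_zero)
```

In this model, the only case with a zero operand that overflows is 0 - minint, because +128 has no 8-bit representation.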
8
Overflow Detection
Examine the MSB bit

Cn-1  An-1  Bn-1 | Sn-1  Cn  OF
  0     0     0  |   0    0   0
  0     0     1  |   1    0   0
  0     1     0  |   1    0   0
  0     1     1  |   0    1   1
  1     0     0  |   1    0   1
  1     0     1  |   0    1   0
  1     1     0  |   0    1   0
  1     1     1  |   1    1   0

(Cn, the carry out of the MSB, is discarded.)

Bottom line (P: positive; N: negative):
N+N=N and P+P=P are fine
P+N or N+P always fall into the range
E.g. -128+P cannot be smaller than -128 or bigger than 127
Problem lies in N+N = P and P+P = N
9
Overflow Detection
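(Truth table as on the previous slide: columns Cn-1, An-1, Bn-1, Sn-1, Cn, OF.)
$OF = \overline{C_{n-1}}\, A_{n-1} B_{n-1} + C_{n-1}\, \overline{A_{n-1}}\, \overline{B_{n-1}}$
or
$OF = C_{n-1} \oplus C_n$
[Figure: n-bit adder/subtractor; the carry into the MSB (Cn-1) and the carry out (Cn, otherwise discarded) feed an XOR gate whose output signals overflow/underflow]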
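The carry-based rule (overflow equals the carry into the MSB XOR the carry out of the MSB) can be sketched in software as below. The function name and the default width are assumptions; the operands are unsigned bit patterns.

```python
def add_with_overflow(a: int, b: int, bits: int = 8):
    """Return (n-bit sum, OF) where OF = C(n-1) XOR C(n)."""
    mask = (1 << bits) - 1
    low_mask = mask >> 1                   # all bits below the MSB
    a &= mask
    b &= mask
    # carry into the MSB: carry out of the addition of the lower n-1 bits
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (bits - 1)
    total = a + b
    carry_out = total >> bits              # carry out of the MSB (discarded)
    return total & mask, carry_into_msb ^ carry_out

print(add_with_overflow(0b0111, 0b0001, bits=4))   # P + P = N
print(add_with_overflow(0b0111, 0b1010, bits=4))   # P + N, carry out only
print(add_with_overflow(0b1000, 0b1000, bits=4))   # N + N = P
```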
10
Effects of Overflow
An exception (interrupt) occurs
Control jumps to predefined address for exception
Interrupted address is saved for possible resumption
Details based on software system / language
example: flight control vs. homework assignment
Don't always want to detect overflow
— new MIPS instructions: addu, addiu, subu
note: addiu still sign-extends!
same as addi except for no overflow exception
note: sltu, sltiu for unsigned comparisons
11
Review: Boolean Algebra & Gates
Problem: Consider a logic function with three inputs:
A, B, and C.
Output D is true if at least one input is true
Output E is true if exactly two inputs are true
Output F is true only if all three inputs are true
Show the truth table for these three functions.
Show the Boolean equations for these three functions.
Show an implementation consisting of inverters, AND,
and OR gates.
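As a starting point for the exercise, the truth table can be generated mechanically. The sketch below writes E in sum-of-products form using only NOT, AND, and OR; the function name `logic` is an assumption.

```python
from itertools import product

def logic(a: int, b: int, c: int):
    """Return (D, E, F) for inputs A, B, C given as 0 or 1."""
    d = a or b or c                                   # at least one input true
    e = ((a and b and not c) or (a and not b and c)
         or (not a and b and c))                      # exactly two inputs true
    f = a and b and c                                 # all three inputs true
    return int(d), int(e), int(f)

print("A B C | D E F")
for a, b, c in product((0, 1), repeat=3):
    d, e, f = logic(a, b, c)
    print(a, b, c, "|", d, e, f)
```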
12
ALU (arithmetic logic unit)
Let's build an ALU to support the “and” and
“or” instructions
we'll just build a 1 bit ALU, and use 32 of them (bitslice)
[Figure: 1-bit ALU block with inputs a, b, and operation, and output result; truth table with columns op, a, b, res]
Possible Implementation (sum-of-products):
13
AND and OR ALU
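[Figure: 1-bit ALU for AND and OR. Inputs A and B feed an AND gate and an OR gate; a mux controlled by Operation selects the AND output (input 0) or the OR output (input 1) as Result]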
14
Review: The Multiplexer
Selects one of the inputs to be the output, based on a
control input
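note: we call this a 2-input mux
even though it has 3 inputs!
[Figure: 2-input mux with data inputs A (selected when S = 0) and B (selected when S = 1), select input S, and output C]
Let's build our ALU using a MUX: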
15
Different Implementations
Not easy to decide the “best” way to build something
Don't want too many inputs to a single gate
Don’t want to have to go through too many gates
for our purposes, ease of comprehension is important
Let's look at a 1-bit ALU for addition:
$c_{out} = a b + a\,c_{in} + b\,c_{in}$
$sum = a \oplus b \oplus c_{in}$
[Figure: 1-bit full adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
How could we build a 1-bit ALU for add, and, and or?
How could we build a 32-bit ALU?
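One possible answer to both questions, sketched in software: a 1-bit slice computes AND, OR, and the full-adder sum at the same time, a mux picks one of them, and 32 slices are chained through their carries. The names `alu1`, `alu32`, and the operation codes are assumptions for this illustration.

```python
AND, OR, ADD = 0, 1, 2   # hypothetical mux select values

def alu1(a: int, b: int, carry_in: int, op: int):
    """1-bit ALU slice: all three functions are computed, the mux selects one."""
    total = a ^ b ^ carry_in                              # sum = a xor b xor cin
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, total)[op]                    # the multiplexer
    return result, carry_out

def alu32(a: int, b: int, op: int, bits: int = 32):
    """Chain 1-bit slices; each CarryOut feeds the next slice's CarryIn."""
    result, carry = 0, 0
    for i in range(bits):
        r, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= r << i
    return result, carry

print(alu32(5, 3, ADD))
print(alu32(0b1100, 0b1010, AND)[0], alu32(0b1100, 0b1010, OR)[0])
```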
16
Building a 32-bit ALU
[Figure: 32-bit ALU built from 32 1-bit ALUs (ALU0 to ALU31). Each slice takes ai, bi, and the shared Operation lines and produces Resulti; the CarryOut of each slice feeds the CarryIn of the next]
17
Subtraction (a – b) ?
Two's complement approach: just negate b and add, since a - b = a + (-b).
How do we negate?
A clever solution: invert every bit of b (Binvert = 1) and set the CarryIn of ALU0 to 1, which supplies the "+1" of the two's complement negation.
[Figure: 32-bit ALU (ALU0 to ALU31) in which each slice has a mux, controlled by Binvert, that selects b or its inverse; a single sub signal drives both Binvert and the CarryIn of ALU0]
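The subtraction trick can be sketched as one function in which a single `sub` flag plays the role of both Binvert and the CarryIn of ALU0. The function name and signature are assumptions.

```python
def add_sub(a: int, b: int, sub: int, bits: int = 32) -> int:
    """Compute a + b (sub = 0) or a - b (sub = 1) on n-bit patterns."""
    mask = (1 << bits) - 1
    b_in = (b ^ mask) if sub else b              # Binvert: select b or NOT b
    total = (a & mask) + (b_in & mask) + sub     # CarryIn of bit 0 is sub
    return total & mask                          # carry out is discarded

print(format(add_sub(0b0111, 0b0110, 1, bits=4), "04b"))   # 7 - 6
print(hex(add_sub(5, 7, 1)))                               # 5 - 7 as a 32-bit pattern
```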
18
Tailoring the ALU to the MIPS
Need to support the set-on-less-than instruction (slt)
remember: slt is an arithmetic instruction
produces a 1 if rs < rt and 0 otherwise
use subtraction: (a-b) < 0 implies a < b
Need to support test for equality (beq $t5, $t6, Label)
use subtraction: (a-b) = 0 implies a = b
19
Supporting slt
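[Figure: two versions of the 1-bit ALU slice with slt support. Both add a Less input as mux input 3 (AND = 0, OR = 1, add = 2). The slice for the most significant bit additionally outputs Set (the adder output) and contains overflow detection logic that drives an Overflow output]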
20
What is Result31 when (a-b) < 0?
[Figure: 32-bit ALU with slt support. Binvert, CarryIn, and Operation are shared by all slices; the Less inputs of ALU1 to ALU31 are tied to 0; the Set output of ALU31 is fed back to the Less input of ALU0; ALU31 also produces Overflow]
21
Test for equality
Notice control lines:
000 = and
001 = or
010 = add
110 = subtract
111 = slt
Note: Zero is a 1 when the result is zero!
[Figure: complete 32-bit ALU. Bnegate and Operation are shared by ALU0 to ALU31; the Less inputs of ALU1 to ALU31 are tied to 0 and the Set output of ALU31 feeds the Less input of ALU0; Result0 to Result31 are combined to produce Zero; ALU31 also produces Overflow]
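The control-line encoding above can be exercised with a small behavioral model. This is a sketch, not the gate-level design: `mips_alu` is a hypothetical name, the top control bit is taken as Bnegate and the low two bits as Operation, and slt here ignores overflow, simply using the sign bit of a - b.

```python
def mips_alu(a: int, b: int, control: int, bits: int = 32):
    """Return (result, zero) for control = Bnegate (1 bit) + Operation (2 bits)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    b_in = (b ^ mask) if bnegate else b         # Binvert
    adder = (a + b_in + bnegate) & mask         # CarryIn of ALU0 = Bnegate
    if operation == 0:
        result = a & b_in                       # 000 = and
    elif operation == 1:
        result = a | b_in                       # 001 = or
    elif operation == 2:
        result = adder                          # 010 = add, 110 = subtract
    else:
        result = adder >> (bits - 1)            # 111 = slt: Set -> Less of ALU0
    zero = int(result == 0)                     # Zero is 1 when the result is zero
    return result, zero

print(mips_alu(6, 3, 0b010))   # add
print(mips_alu(9, 9, 0b110))   # subtract, used by beq: zero flag set
print(mips_alu(5, 7, 0b111))   # slt: 5 < 7
```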
22
Conclusion
We can build an ALU to support the MIPS instruction set
key idea: use multiplexer to select the output we want
we can efficiently perform subtraction using two’s complement
we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware
all of the gates are always working
the speed of a gate is affected by the number of inputs to the gate
the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
Our primary focus: comprehension, however,
Clever changes to organization can improve performance
(similar to using better algorithms in software)
we’ll look at two examples for addition and multiplication
23
4-bit Ripple Adder
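[Figure: a full adder with inputs A, B, and Cin and outputs S and Cout]
[Figure: 4-bit ripple adder built from four full adders. Inputs A3..A0, B3..B0, and Cin; internal carries carry1, carry2, carry3; outputs S3..S0 and Carry]
Critical Path = $D_{XOR} + 4 (D_{AND} + D_{OR})$ for 4-bit ripple adder (9 gate levels)
For an N-bit ripple adder
Critical Path Delay ~ $2(N-1) + 3 = 2N + 1$ gate delays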
24
Problem: Ripple carry adder is slow
Is a 32-bit ALU as fast as a 1-bit ALU?
Is there more than one way to do addition?
Can you see the ripple? How could you get rid of it?
An approach in-between our two extremes
Motivation:
If we didn't know the value of carry-in, what could we do?
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i = a_i b_i + c_i (a_i + b_i)$
When would we always generate a carry? $g_i = a_i b_i$
When would we propagate the carry? $p_i = a_i + b_i$
Did we get rid of the ripple?
25
Carry Lookahead
$C_{i+1} = A_i B_i + C_i (A_i + B_i)$
$g_i = A_i B_i$ (generate)
$p_i = A_i + B_i$ (propagate)
$C_{i+1} = g_i + p_i C_i$
$C_1 = g_0 + p_0 C_0$
$C_2 = g_1 + p_1 C_1 = g_1 + p_1 g_0 + p_1 p_0 C_0$
$C_3 = g_2 + p_2 C_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_0$
$C_4 = g_3 + p_3 C_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_0$
Note that all the carries depend only on the inputs A, B, and $C_0$
26
4-bit Carry Lookahead Adder
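[Figure: 4-bit carry lookahead adder. Each bit position i forms $g_i$ and $p_i$ from $a_i$ and $b_i$; the lookahead logic forms $C_1$ to $C_4$ from the g and p signals and $C_0$; each sum bit $S_i$ is formed from $a_i$, $b_i$, and $C_i$]
$C_1 = g_0 + p_0 C_0$
$C_2 = g_1 + p_1 C_1 = g_1 + p_1 g_0 + p_1 p_0 C_0$
$C_3 = g_2 + p_2 C_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_0$
$C_4 = g_3 + p_3 C_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_0$
Only 3 gate delays for each carry $C_i$
= $D_{AND} + 2 D_{OR}$
4 gate delays for each sum $S_i$
= $D_{AND} + 2 D_{OR} + D_{XOR}$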
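The lookahead equations can be checked against ordinary addition with a short sketch. It evaluates each carry directly from the g, p, and C0 terms, so no carry depends on another carry; the function name `cla4` is an assumption.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry lookahead adder; returns (4-bit sum, C4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]       # generate: a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]     # propagate: a_i OR b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        # sum bit: a_i xor b_i xor C_i
        total |= (((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i
    return total, c4

print(cla4(0b0111, 0b0110))
```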
27
Hierarchically build bigger adders
Can't build a 16 bit adder this way... (too big)
Could use ripple carry of 4-bit CLA adders
Better: use the CLA principle again!
[Figure: 16-bit adder built from four 4-bit ALUs (ALU0 to ALU3) producing Result0-3, Result4-7, Result8-11, and Result12-15. Each block sends group propagate and generate signals (P0/G0 to P3/G3) to a carry-lookahead unit, which returns the block carries C1, C2, C3, and C4; C4 is CarryOut]
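Applying the CLA principle a second time can be sketched as follows: each 4-bit block reports a group propagate P and group generate G, and the block carries are formed from those. The names `group_pg` and `add16` are hypothetical, and the loop that forms the block carries stands in for the flattened two-level logic a hardware lookahead unit would use.

```python
def group_pg(a4: int, b4: int):
    """Group propagate and generate for one 4-bit block."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    big_p = p[3] & p[2] & p[1] & p[0]
    big_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
    return big_p, big_g

def add16(a: int, b: int, c0: int = 0):
    """16-bit adder from four 4-bit blocks plus a carry-lookahead unit."""
    blocks = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    carries = [c0]
    for a4, b4 in blocks:
        big_p, big_g = group_pg(a4, b4)
        carries.append(big_g | (big_p & carries[-1]))     # C(k+1) = Gk + Pk Ck
    result = 0
    for k, (a4, b4) in enumerate(blocks):
        result |= ((a4 + b4 + carries[k]) & 0xF) << (4 * k)
    return result, carries[4]

print(add16(0x0FFF, 0x0001))
```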
28
Multiplication
More complicated than addition
accomplished via shifting and addition
More time and more area
Let's look at 3 versions based on grade school
algorithm
    0010   (multiplicand)
  x 1011   (multiplier)
Negative numbers: convert and multiply
there are better techniques (see my ECE2030 lecture13 PPT
slide)
29
Multiplication: Implementation
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   Multiplier0 = 0: skip step 1a
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions (go back to step 1)
   Yes: 32 repetitions
Done
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and a control test block]
30
Second Version
Start
1. Test Multiplier0
   Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   Multiplier0 = 0: skip step 1a
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   No: < 32 repetitions (go back to step 1)
   Yes: 32 repetitions
Done
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (shift right), a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test block]
31
Final Version
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   Product0 = 0: skip step 1a
2. Shift the Product register right 1 bit
32nd repetition?
   No: < 32 repetitions (go back to step 1)
   Yes: 32 repetitions
Done
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and a control test block]
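The final version can be sketched in a few lines for unsigned operands. The sketch assumes the multiplier is loaded into the right half of the Product register before the loop starts (which is what testing Product0 implies), and it relies on Python integers to hold the carry out of the 32-bit add that hardware would keep in an extra bit. The function name is hypothetical.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Shift-and-add multiply using a single 2*bits-wide Product register."""
    mask = (1 << bits) - 1
    product = multiplier & mask          # assumption: multiplier starts in the right half
    for _ in range(bits):                # one repetition per multiplier bit
        if product & 1:                  # 1. test Product0
            # 1a. add the multiplicand to the left half of the product
            product += (multiplicand & mask) << bits
        product >>= 1                    # 2. shift the Product register right 1 bit
    return product

print(multiply_unsigned(0b0010, 0b1011, bits=4))   # the 2 x 11 example
```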
32
Floating Point (a brief look)
We need a way to represent
numbers with fractions, e.g., 3.1416
very small numbers, e.g., .000000001
very large numbers, e.g., $3.15576 \times 10^9$
Representation:
sign, exponent, significand:
$(-1)^{sign} \times significand \times 2^{exponent}$
more bits for significand gives more accuracy
more bits for exponent increases range
IEEE 754 floating point standard:
single precision: 8 bit exponent, 23 bit significand
double precision: 11 bit exponent, 52 bit significand
Significand = (1 + fraction); the leading 1 is implicit
33
IEEE 754 Standard Floating-point
Representation
Single Precision (32-bit)
bit 31: S (sign), 1 bit
bits 30-23: exponent, 8 bits
bits 22-0: significand, 23 bits
$(-1)^{sign} \times (1 + fraction) \times 2^{exponent - 127}$
Double Precision (64-bit)
bit 63: S (sign), 1 bit
bits 62-52: exponent, 11 bits
bits 51-32: significand, 20 bits
bits 31-0: significand (continued), 32 bits
$(-1)^{sign} \times (1 + fraction) \times 2^{exponent - 1023}$
34
IEEE 754 floating-point standard
Leading “1” bit of significand is implicit
Exponent is “biased” to make sorting easier
all 0s is smallest exponent all 1s is largest
bias of 127 for single precision and 1023 for double precision
summary: $(-1)^{sign} \times (1 + fraction) \times 2^{exponent - bias}$
(a.k.a. a normalized number – because of the 1 for scientific
notation)
Example:
decimal: $-0.75 = -3/4 = -3/2^2$
binary: $-0.11_2 = -1.1_2 \times 2^{-1}$
floating point: exponent = 126 = 01111110
IEEE single precision: 10111111010000000000000000000000
35
IEEE 754 Standard Example
(single precision)
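$(-1)^{sign} \times (1 + Fraction) \times 2^{(exp - 127)}$
$(-1)^{sign} \times (1 + s_1 \cdot 2^{-1} + s_2 \cdot 2^{-2} + s_3 \cdot 2^{-3} + \ldots + s_{23} \cdot 2^{-23}) \times 2^{(exp - 127)}$
$(-0.75)_{10} = (????????)_{16}$
$(-0.75)_{10} = (BF400000)_{16}$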
36
IEEE 754 Standard Example
(single precision)
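$(-1)^{sign} \times (1 + Fraction) \times 2^{(exp - 127)}$
$(-1)^{sign} \times (1 + s_1 \cdot 2^{-1} + s_2 \cdot 2^{-2} + s_3 \cdot 2^{-3} + \ldots + s_{23} \cdot 2^{-23}) \times 2^{(exp - 127)}$
$(23.15625)_{10} = (????????)_{16}$
$(23.15625)_{10} = (41B94000)_{16}$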
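Both worked examples can be reproduced with Python's standard `struct` module, which packs a value as an IEEE 754 single. The helper names below are assumptions; `decode_normalized` applies the formula above and is only valid for normalized numbers.

```python
import struct

def single_bits(x: float) -> int:
    """Return the 32-bit pattern of x stored as an IEEE 754 single."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fields(bits: int):
    """Split a 32-bit pattern into (sign, exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def decode_normalized(bits: int) -> float:
    """(-1)^sign x (1 + fraction) x 2^(exp - 127), normalized numbers only."""
    sign, exp, frac = fields(bits)
    return (-1) ** sign * (1 + frac / 2**23) * 2.0 ** (exp - 127)

for value in (-0.75, 23.15625):
    bits = single_bits(value)
    print(value, f"{bits:08X}", fields(bits), decode_normalized(bits))
```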
37
IEEE 754 Standard Encoding
Single Precision        Double Precision        Object Represented
Exponent  Fraction      Exponent  Fraction
0         0             0         0             0 (zero)
0         Non-zero      0         Non-zero      ± Denormalized number
1-254     Anything      1-2046    Anything      ± Floating-point number
255       0             2047      0             ± Infinity
255       Non-zero      2047      Non-zero      NaN (Not a Number)

• NaN: (infinity - infinity), or 0/0
• Denormalized number = $(-1)^{sign} \times 0.f \times 2^{1 - bias}$
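The single-precision columns of the encoding table translate directly into a classifier over the exponent and fraction fields. The function name `classify` and its return strings are assumptions for illustration.

```python
def classify(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by exponent and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"                  # exponent 1 to 254, any fraction

for pattern in (0x00000000, 0x00000001, 0xBF400000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:08X}", classify(pattern))
```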
38
Precision Issues
Cannot represent all possible real numbers; there are infinitely many
Must sacrifice precision when representing FP
numbers in some cases
Precision lost when integer portion is too large
Precision lost when fraction portion is too small
Example
How to represent $2^{24}$ and $2^{24} + 1$ ?
Both = 4B800000 in single precision
How to represent $2^{-127}$ ? (use a denormalized number: $0.1_2 \times 2^{-126}$)
How about $2^{-150}$ ? (use a denormalized number? What is the
smallest denormalized number?)
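These cases can be explored with the standard `struct` module, which rounds a Python float to single precision when packing. A minimal sketch; the helper name `single_hex` is an assumption.

```python
import struct

def single_hex(x: float) -> str:
    """Hex encoding of x after rounding to IEEE 754 single precision."""
    return struct.pack(">f", x).hex().upper()

# 2^24 and 2^24 + 1 round to the same single-precision pattern
print(single_hex(2.0**24), single_hex(2.0**24 + 1))

# 2^-127 needs a denormalized encoding: exponent field 0, fraction 0.1 (binary)
print(single_hex(2.0**-127))

# 2^-149 is the smallest positive denormalized number (fraction = 1)
print(single_hex(2.0**-149))
```

Since the smallest positive denormalized value is 2^-149, 2^-150 cannot be represented exactly in single precision.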
39
Floating Point Complexities
Operations are somewhat more complicated (see text)
In addition to overflow we can have “underflow”
Accuracy can be a big problem
IEEE 754 keeps two extra bits, guard and round
four rounding modes
positive divided by zero yields “infinity”
zero divided by zero yields “not a number”
other complexities
Implementing the standard can be tricky
Not using the standard can be even worse
see text for description of 80x86 and Pentium bug!
40