Lec4-alu - ECE Users Pages - Georgia Institute of Technology


ECE3055
Computer Architecture and
Operating Systems
Lecture 4 Numbers and ALU
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
H.-H. S. Lee
Arithmetic
• Where we've been:
  - Performance (seconds, cycles, instructions)
  - Abstractions:
    - Instruction Set Architecture
    - Assembly Language and Machine Language
• What's up ahead:
  - Implementing the Architecture
[Figure: an ALU with two 32-bit inputs a and b, an operation control input, and a 32-bit result]

Numbers
• Bits are just bits (no inherent meaning); conventions define the relationship between bits and numbers
• Binary numbers (base 2)
  0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
  decimal: $0 \ldots 2^n - 1$
• Of course it gets more complicated:
  - numbers are finite (overflow)
  - fractions and real numbers
  - negative numbers (e.g., there is no MIPS subi instruction; addi can add a negative number)
• How do we represent negative numbers? That is, which bit patterns will represent which numbers?

Possible Representations
Bits   Sign Magnitude   One's Complement   Two's Complement
000    +0               +0                 +0
001    +1               +1                 +1
010    +2               +2                 +2
011    +3               +3                 +3
100    -0               -3                 -4
101    -1               -2                 -3
110    -2               -1                 -2
111    -3               -0                 -1
• Issues: balance, number of zeros, ease of operations
• Which one is best? Why?

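To make the table concrete, here is a small Python sketch that decodes an n-bit pattern under each of the three conventions. The function names are illustrative, not from the lecture. Python integers have no negative zero, so the "-0" patterns of sign magnitude and one's complement both come out as 0.

```python
def sign_magnitude(bits: str) -> int:
    """MSB is the sign; the remaining bits are the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def ones_complement(bits: str) -> int:
    """A negative value is stored as the bitwise inverse of its magnitude."""
    value = int(bits, 2)
    return value - (2 ** len(bits) - 1) if bits[0] == "1" else value


def twos_complement(bits: str) -> int:
    """The MSB carries weight -2^(n-1); the other bits keep their usual weights."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value


# Print the three interpretations of every 3-bit pattern.
for v in range(8):
    b = format(v, "03b")
    print(b, sign_magnitude(b), ones_complement(b), twos_complement(b))
```

Only two's complement gives a single zero and lets the same adder handle signed and unsigned values, which is one way to answer "Which one is best?".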
MIPS
• 32-bit signed numbers:
  0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
  0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
  0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
  ...
  0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
  0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten   (maxint)
  1000 0000 0000 0000 0000 0000 0000 0000two = –2,147,483,648ten   (minint)
  1000 0000 0000 0000 0000 0000 0000 0001two = –2,147,483,647ten
  1000 0000 0000 0000 0000 0000 0000 0010two = –2,147,483,646ten
  ...
  1111 1111 1111 1111 1111 1111 1111 1101two = –3ten
  1111 1111 1111 1111 1111 1111 1111 1110two = –2ten
  1111 1111 1111 1111 1111 1111 1111 1111two = –1ten

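A minimal Python sketch of how a 32-bit pattern is read as a MIPS signed integer: bit 31 has weight $-2^{31}$ instead of $+2^{31}$. The helper name `to_signed32` is hypothetical.

```python
def to_signed32(pattern: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    if not 0 <= pattern < 2**32:
        raise ValueError("pattern must fit in 32 bits")
    # Bit 31 set means negative: subtracting 2^32 turns its weight into -2^31.
    return pattern - 2**32 if pattern & 0x8000_0000 else pattern


MAXINT = to_signed32(0x7FFF_FFFF)  # 0111 1111 ... 1111two
MININT = to_signed32(0x8000_0000)  # 1000 0000 ... 0000two

print(MAXINT, MININT)            # 2147483647 -2147483648
print(to_signed32(0xFFFF_FFFD))  # -3
```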
Two's Complement Operations
• Negating a two's complement number: invert all bits and add 1
  - remember: “negate” and “invert” are quite different!
• Converting n-bit numbers into numbers with more than n bits:
  - the MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - copy the most significant bit (the sign bit) into the other bits:
    0010 -> 0000 0010
    1010 -> 1111 1010
  - this is “sign extension” (lb sign-extends the loaded byte; lbu fills the upper bits with zeros)

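A minimal Python sketch of both operations on this slide, with bit patterns held in ordinary Python ints. The helper names are hypothetical; `sign_extend` and `zero_extend` only mimic what lb and lbu do to a loaded byte, they do not model the loads themselves.

```python
def negate(x: int, n: int) -> int:
    """Two's complement negation of an n-bit pattern: invert all bits, then add 1 (mod 2^n)."""
    mask = (1 << n) - 1
    return ((x ^ mask) + 1) & mask


def sign_extend(x: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit into every new upper bit (like lb)."""
    if (x >> (from_bits - 1)) & 1:
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        x |= upper
    return x


def zero_extend(x: int, from_bits: int) -> int:
    """Fill the new upper bits with 0 (like lbu)."""
    return x & ((1 << from_bits) - 1)


print(f"{sign_extend(0b0010, 4, 8):08b}")  # 00000010
print(f"{sign_extend(0b1010, 4, 8):08b}")  # 11111010
print(f"{negate(0b0010, 4):04b}")          # 1110, i.e. -2
```

Note that `negate(0b1000, 4)` returns `0b1000` again: the most negative n-bit value has no positive counterpart.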
Addition & Subtraction
• Just like in grade school (carry/borrow 1s, assume unsigned):
    0111      0111      0110
  + 0110    - 0110    - 0101
  ------    ------    ------
    1101      0001      0001
• Two's complement operations are easy
  - subtraction uses addition of negative numbers:
      0111
    + 1010
    ------
      0001   (the carry out of the MSB is discarded)
• Overflow (result too large for a finite computer word):
  - e.g., adding two n-bit numbers may not yield a result that fits in n bits:
      0111
    + 0001
    ------
      1000   (7 + 1 = 8, but 1000 is -8 in 4-bit two's complement)
  - note that the term “overflow” is somewhat misleading: it does not mean a carry “overflowed”

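The 4-bit examples above can be replayed with a short Python sketch. It assumes N = 4 bits and returns the carry out separately so you can see that it is discarded; `add` and `sub` are illustrative names.

```python
N = 4
MASK = (1 << N) - 1


def add(a: int, b: int):
    """n-bit addition: returns (n-bit sum, carry out of the MSB)."""
    total = a + b
    return total & MASK, total >> N


def sub(a: int, b: int):
    """a - b computed as a + (NOT b) + 1, i.e. adding the two's complement of b."""
    total = a + (b ^ MASK) + 1
    return total & MASK, total >> N


print(add(0b0111, 0b0110))  # (13, 0): 0111 + 0110 = 1101
print(sub(0b0111, 0b0110))  # (1, 1): 0001, carry out discarded
print(add(0b0111, 0b0001))  # (8, 0): 1000, i.e. -8 as a signed value
```

The last line shows why "overflow" is not about the carry: the signed result is wrong even though the carry out of the MSB is 0.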
Detecting Overflow
• No overflow when adding a positive and a negative number
• No overflow when signs are the same for subtraction
• Overflow occurs when the value affects the sign:
  - overflow when adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive and get a negative
  - or, subtract a positive from a negative and get a positive
• Consider the operations A + B and A – B
  - Can overflow occur if B is 0?
  - Can overflow occur if A is 0?

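The four sign rules can be checked exhaustively for a small word size. This Python sketch assumes 4-bit operands; the function names are hypothetical, and the loop at the end compares each rule against exact integer arithmetic for every pair of operands.

```python
N = 4
MASK = (1 << N) - 1
SIGN = 1 << (N - 1)


def to_signed(x: int) -> int:
    """Read an n-bit pattern as two's complement."""
    return x - (1 << N) if x & SIGN else x


def add_overflows(a: int, b: int) -> bool:
    """A + B overflows iff A and B have the same sign and the sum's sign differs."""
    s = (a + b) & MASK
    return (a & SIGN) == (b & SIGN) and (s & SIGN) != (a & SIGN)


def sub_overflows(a: int, b: int) -> bool:
    """A - B overflows iff A and B differ in sign and the result's sign differs from A's."""
    d = (a - b) & MASK
    return (a & SIGN) != (b & SIGN) and (d & SIGN) != (a & SIGN)


# Cross-check the sign rules against exact arithmetic for every 4-bit pair.
for a in range(1 << N):
    for b in range(1 << N):
        assert add_overflows(a, b) == (not -SIGN <= to_signed(a) + to_signed(b) < SIGN)
        assert sub_overflows(a, b) == (not -SIGN <= to_signed(a) - to_signed(b) < SIGN)
```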
Overflow Detection
(Cn-1 = carry into the MSB; An-1, Bn-1, Sn-1 = MSBs of the operands and the sum; Cn = carry out of the MSB)
Cn-1  An-1  Bn-1 | Sn-1  Cn  OF
 0     0     0   |  0    0   0
 0     0     1   |  1    0   0
 0     1     0   |  1    0   0
 0     1     1   |  0    1   1
 1     0     0   |  1    0   1
 1     0     1   |  0    1   0
 1     1     0   |  0    1   0
 1     1     1   |  1    1   0
• Examine the MSB; the carry out Cn is discarded
• Bottom line (P: positive; N: negative):
  - N+N=N and P+P=P are fine
  - P+N or N+P always falls into the range (e.g., in 8 bits, -128+P cannot be smaller than -128 or bigger than 127)
  - The problem lies in N+N=P and P+P=N

Overflow Detection
• From the truth table:
  $OF = \overline{C_{n-1}}\,A_{n-1}\,B_{n-1} + C_{n-1}\,\overline{A_{n-1}}\,\overline{B_{n-1}}$
  or, equivalently,
  $OF = C_{n-1} \oplus C_n$
[Figure: n-bit adder/subtractor whose carry into the MSB (Cn-1) and carry out of the MSB (Cn) are XORed to produce the Overflow/Underflow signal; Cn itself is discarded]

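A short Python sketch that regenerates the MSB truth table and confirms that the sum-of-products form of OF and the XOR form $C_{n-1} \oplus C_n$ agree on every row. The function names are illustrative.

```python
from itertools import product


def msb_full_adder(c_in: int, a: int, b: int):
    """Full adder on the MSB: returns (sum bit S_{n-1}, carry out C_n)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def overflow_table():
    """Rows (C_{n-1}, A_{n-1}, B_{n-1}, S_{n-1}, C_n, OF) in the slide's order."""
    rows = []
    for c, a, b in product((0, 1), repeat=3):
        s, cn = msb_full_adder(c, a, b)
        of_xor = c ^ cn                                        # OF = C_{n-1} XOR C_n
        of_sop = ((1 - c) & a & b) | (c & (1 - a) & (1 - b))   # sum-of-products form
        assert of_xor == of_sop, "the two OF formulas should agree"
        rows.append((c, a, b, s, cn, of_xor))
    return rows


for row in overflow_table():
    print(*row)
```

The XOR form is the one usually built in hardware, since both carries already exist inside the MSB slice.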
Effects of Overflow
• An exception (interrupt) occurs
  - Control jumps to a predefined address for the exception
  - The interrupted address is saved for possible resumption
• Details depend on the software system / language
  - example: flight control vs. homework assignment
• We don't always want to detect overflow
  - new MIPS instructions: addu, addiu, subu
  - addiu is the same as addi except that it raises no overflow exception; note that addiu still sign-extends its immediate!
  - note: sltu, sltiu for unsigned comparisons

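The difference between the trapping and non-trapping instructions can be sketched in Python, modeling registers as 32-bit unsigned ints. The function names mirror the MIPS mnemonics, but this is a behavioral sketch, not an emulator; `OverflowTrap` is a hypothetical stand-in for the hardware exception.

```python
MASK32 = 0xFFFF_FFFF


class OverflowTrap(Exception):
    """Hypothetical stand-in for the overflow exception raised by add/addi/sub."""


def to_signed32(x: int) -> int:
    return x - (1 << 32) if x & 0x8000_0000 else x


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to 32 bits."""
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def addu(rs: int, rt: int) -> int:
    """Wraps around modulo 2^32 and never traps."""
    return (rs + rt) & MASK32


def add(rs: int, rt: int) -> int:
    """Produces the same bits as addu, but traps on signed overflow."""
    exact = to_signed32(rs) + to_signed32(rt)
    if not -(1 << 31) <= exact < (1 << 31):
        raise OverflowTrap(f"signed overflow: {exact}")
    return addu(rs, rt)


def addiu(rs: int, imm16: int) -> int:
    """Like addi without the trap: the immediate is still sign-extended."""
    return addu(rs, sign_extend16(imm16))


print(hex(addu(0x7FFF_FFFF, 1)))  # 0x80000000 (wraps silently)
print(addiu(5, 0xFFFF))           # 4, because 0xFFFF is sign-extended to -1
```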
Review: Boolean Algebra & Gates
• Problem: Consider a logic function with three inputs: A, B, and C.
  Output D is true if at least one input is true
  Output E is true if exactly two inputs are true
  Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.

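One possible answer to the exercise, sketched in Python and checked against the definitions: $D = A + B + C$, $E = AB\overline{C} + A\overline{B}C + \overline{A}BC$, $F = ABC$. This is one sum-of-products solution among several equivalent ones; `inv`, `D`, `E`, and `F` are illustrative names, with `inv` playing the role of an inverter.

```python
from itertools import product


def inv(x: int) -> int:
    """Inverter on a 0/1 value."""
    return 1 - x


def D(a: int, b: int, c: int) -> int:
    return a | b | c  # at least one input true


def E(a: int, b: int, c: int) -> int:
    # exactly two inputs true: one AND term for each choice of the false input
    return (a & b & inv(c)) | (a & inv(b) & c) | (inv(a) & b & c)


def F(a: int, b: int, c: int) -> int:
    return a & b & c  # all three inputs true


print("A B C | D E F")
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "|", D(a, b, c), E(a, b, c), F(a, b, c))
```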
ALU (arithmetic logic unit)
• Let's build an ALU to support the “and” and “or” instructions
  - we'll just build a 1-bit ALU, and use 32 of them (bit-slice)
[Figure: 1-bit ALU with inputs a and b, an operation select, and a result output]
• Possible implementation (sum-of-products):

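AND and OR ALU
[Figure: A and B feed both an AND gate and an OR gate; a multiplexer controlled by Operation passes the AND output (input 0) or the OR output (input 1) to Result]
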
Review: The Multiplexer
• Selects one of the inputs to be the output, based on a control input
[Figure: multiplexer with select input S, data input A on the 0 side, data input B on the 1 side, and output C]
  note: we call this a 2-input mux even though it has 3 inputs!
• Let's build our ALU using a MUX:

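A minimal Python sketch of the AND/OR ALU built from a 2-input mux, then replicated 32 times as a bit-slice. It assumes operation 0 selects AND and 1 selects OR, as in the figure; all function names are hypothetical.

```python
def mux2(a: int, b: int, s: int) -> int:
    """2-input multiplexer in sum-of-products form: output a when s == 0, b when s == 1."""
    return ((1 - s) & a) | (s & b)


def alu1_and_or(a: int, b: int, operation: int) -> int:
    """1-bit ALU: AND drives mux input 0, OR drives mux input 1."""
    return mux2(a & b, a | b, operation)


def alu32_and_or(a: int, b: int, operation: int) -> int:
    """Bit-slice: apply the same 1-bit ALU to each of the 32 bit positions."""
    result = 0
    for i in range(32):
        bit = alu1_and_or((a >> i) & 1, (b >> i) & 1, operation)
        result |= bit << i
    return result


print(f"{alu32_and_or(0b1100, 0b1010, 0):04b}")  # 1000 (and)
print(f"{alu32_and_or(0b1100, 0b1010, 1):04b}")  # 1110 (or)
```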
Different Implementations
• It is not easy to decide the “best” way to build something
  - Don't want too many inputs to a single gate
  - Don't want to have to go through too many gates
  - for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
  $c_{out} = ab + a\,c_{in} + b\,c_{in}$
  $sum = a \oplus b \oplus c_{in}$
[Figure: 1-bit full adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?

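The two full-adder equations can be checked exhaustively in a few lines of Python: for every input combination, the outputs should encode the number of 1 inputs, i.e. $2 \cdot c_{out} + sum = a + b + c_{in}$. `full_adder` is an illustrative name.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """1-bit adder from the slide's equations; returns (sum, cout)."""
    cout = (a & b) | (a & cin) | (b & cin)  # cout = ab + a*cin + b*cin
    s = a ^ b ^ cin                         # sum = a XOR b XOR cin
    return s, cout


# The two outputs together count the 1 inputs: 2*cout + sum == a + b + cin.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```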
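Building a 32-bit ALU
[Figure: left, a 1-bit ALU in which Operation drives a mux that selects AND (input 0), OR (input 1), or the full-adder sum (input 2) as Result, with CarryIn and CarryOut; right, a 32-bit ALU built from 32 such slices, ALU0 to ALU31, where slice i takes ai and bi, produces Result i, and passes its CarryOut to the CarryIn of the next slice]
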
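A minimal Python sketch of the figure: a 1-bit slice with a 3-input mux (assumed encoding: 0 = AND, 1 = OR, 2 = add), chained 32 times so each CarryOut ripples into the next CarryIn. The names `alu1` and `alu32` are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def alu1(a: int, b: int, carry_in: int, operation: int):
    """1-bit ALU slice. operation: 0 = AND, 1 = OR, 2 = add. Returns (result, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s)[operation]  # the 3-input mux
    return result, carry_out


def alu32(a: int, b: int, operation: int, carry_in: int = 0):
    """Chain ALU0 .. ALU31: each CarryOut feeds the next slice's CarryIn."""
    result, carry = 0, carry_in
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry  # carry is the CarryOut of ALU31


print(alu32(5, 7, 2))       # (12, 0)
print(alu32(MASK32, 1, 2))  # (0, 1): wraps, with CarryOut = 1
```

As in hardware, every slice computes all three results and the mux discards the ones not selected.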
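Subtraction (a – b) ?
• Two's complement approach: just negate b and add, since a - b = a + (-b)
• How do we negate? Invert all bits of b and add 1
• A clever solution:
  - give each 1-bit ALU a Binvert control that selects either b or its inverse as the adder input
  - for subtraction, set Binvert = 1 and set the CarryIn of ALU0 to 1, which supplies the "+1"
[Figure: left, the 1-bit ALU with a Binvert-controlled mux in front of input b; right, the 32-bit ALU, in which a sub control sets Binvert for every slice and the CarryIn of ALU0]
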
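Extending the ripple sketch with Binvert shows the trick: inverting b in every slice and forcing CarryIn of ALU0 to 1 computes a + NOT b + 1 = a - b with no extra adder. This Python sketch assumes the same operation encoding as before (0 = AND, 1 = OR, 2 = add); `sub` is a hypothetical name for the combined control.

```python
MASK32 = 0xFFFF_FFFF


def alu1(a: int, b: int, carry_in: int, binvert: int, operation: int):
    """1-bit slice with a Binvert mux in front of b. operation: 0 = AND, 1 = OR, 2 = add."""
    b = b ^ binvert                      # Binvert = 1 selects NOT b
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return (a & b, a | b, s)[operation], carry_out


def alu32(a: int, b: int, operation: int, sub: int = 0) -> int:
    """'sub' drives Binvert of every slice and the CarryIn of ALU0, so a + NOT b + 1 = a - b."""
    result, carry = 0, sub
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, sub, operation)
        result |= bit << i
    return result


print(alu32(7, 5, 2, sub=1))       # 2
print(hex(alu32(5, 7, 2, sub=1)))  # 0xfffffffe, i.e. -2
```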
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b (as long as the subtraction does not overflow)
• Need to support a test for equality (beq $t5, $t6, label)
  - use subtraction: (a - b) = 0 implies a = b

Supporting slt
[Figure: left, the 1-bit ALU extended with Binvert and a fourth mux input, Less (input 3); right, the ALU for the most significant bit, which also outputs Set (its adder sum bit) and contains overflow detection logic that produces Overflow]

What is Result31 when (a - b) < 0?
[Figure: 32-bit ALU for slt. Binvert and Operation go to every slice and CarryIn to ALU0; the Less inputs of ALU1 to ALU31 are tied to 0; the Set output of ALU31 feeds back to the Less input of ALU0; ALU31 also outputs Overflow]

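A behavioral Python sketch of slt as wired in the figure: the subtraction runs through 32 slices with Binvert = 1 and CarryIn of ALU0 = 1, Set is the sum bit of ALU31, and that bit becomes Result0 while every other Less input is 0. Taking Set from the sign bit alone gives the wrong answer when a - b overflows; the optional correction here (XOR with the overflow bit) is a common refinement that the slide does not show. Function names are hypothetical.

```python
def subtract_with_carries(a: int, b: int):
    """a - b through 32 slices; returns (difference, carry into bit 31, carry out of bit 31)."""
    result, carry, carry_into_msb = 0, 1, 0   # CarryIn of ALU0 = 1
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ 1               # Binvert = 1
        if i == 31:
            carry_into_msb = carry
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry_into_msb, carry


def slt(a: int, b: int, correct_overflow: bool = False) -> int:
    diff, c_in31, c_out31 = subtract_with_carries(a, b)
    set_bit = diff >> 31                      # Set = sum output of ALU31
    if correct_overflow:
        set_bit ^= c_in31 ^ c_out31           # flip the sign when a - b overflowed
    return set_bit                            # Set drives Less of ALU0; Result1..31 are 0


print(slt(3, 5), slt(5, 3))                 # 1 0
print(slt(0x7FFF_FFFF, 0xFFFF_FFFF))        # 1: wrong, maxint is not < -1
print(slt(0x7FFF_FFFF, 0xFFFF_FFFF, True))  # 0: correct once overflow is accounted for
```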
Test for equality
• Notice the control lines:
  000 = and
  001 = or
  010 = add
  110 = subtract
  111 = slt
• Note: Zero is 1 when the result is zero!
[Figure: 32-bit ALU in which a single Bnegate line drives the Binvert input of every slice and the CarryIn of ALU0, Operation selects the mux input, Result0 to Result31 are ORed together and inverted to produce Zero, and ALU31 provides Set (to the Less input of ALU0) and Overflow]

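A word-level Python model of the finished ALU, assuming the 3-bit control is Bnegate followed by the 2-bit Operation, which is consistent with the table above (110 = subtract means Bnegate = 1 with the add path). It is a behavioral sketch rather than a bit-slice simulation, and Set is taken from the sign bit without overflow correction, as on the slide.

```python
MASK32 = 0xFFFF_FFFF


def alu32(a: int, b: int, control: int):
    """Word-level model of the ALU; returns (result, zero).

    control = Bnegate (bit 2) followed by the 2-bit Operation (bits 1-0):
    000 and, 001 or, 010 add, 110 subtract, 111 slt.
    """
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    b_in = b ^ MASK32 if bnegate else b  # Binvert in every slice
    total = a + b_in + bnegate           # Bnegate is also the CarryIn of ALU0
    sum_bits = total & MASK32
    set_bit = sum_bits >> 31             # Set from ALU31 (no overflow correction)
    result = (a & b_in, a | b_in, sum_bits, set_bit)[operation]
    zero = int(result == 0)              # Zero = NOR of all result bits
    return result, zero


# beq $t5, $t6, label can use subtract (110) and branch when zero == 1.
print(alu32(6, 6, 0b110))  # (0, 1)
print(alu32(3, 6, 0b111))  # (1, 0)
```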
Conclusion
• We can build an ALU to support the MIPS instruction set
  - key idea: use a multiplexer to select the output we want
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  - all of the gates are always working
  - the speed of a gate is affected by the number of inputs to the gate
  - the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)
  - we'll look at two examples, for addition and multiplication

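4-bit Ripple Adder
[Figure: a full adder (inputs A, B, Cin; outputs S, Cout) and a 4-bit ripple adder built from four of them, with inputs A3..A0 and B3..B0, a carry-in, internal carries carry1 to carry3, and sum outputs S3..S0]
• Critical path = $D_{XOR} + 4(D_{AND} + D_{OR})$ for the 4-bit ripple adder (9 gate levels)
• For an N-bit ripple adder: critical path delay $\approx 2(N-1) + 3 = 2N + 1$ gate delays
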
Problem: Ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  - Can you see the ripple? How could you get rid of it?
  - An approach in between our two extremes
• Motivation:
  - If we didn't know the value of carry-in, what could we do?
  - $c_{i+1} = a_i b_i + a_i c_i + b_i c_i = a_i b_i + c_i(a_i + b_i)$
  - When would we always generate a carry? $g_i = a_i b_i$
  - When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?

Carry Lookahead
$$
\begin{aligned}
C_{i+1} &= A_i B_i + C_i (A_i + B_i) \\
g_i &= A_i B_i \quad \text{(generate)} \\
p_i &= A_i + B_i \quad \text{(propagate)} \\
C_{i+1} &= g_i + p_i C_i \\[4pt]
C_1 &= g_0 + p_0 C_0 \\
C_2 &= g_1 + p_1 C_1 = g_1 + p_1 g_0 + p_1 p_0 C_0 \\
C_3 &= g_2 + p_2 C_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_0 \\
C_4 &= g_3 + p_3 C_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_0
\end{aligned}
$$
Note that every carry depends only on the inputs A and B and the initial carry C0.

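A minimal Python sketch of the 4-bit lookahead equations, checked against the one-bit-at-a-time recurrence $C_{i+1} = g_i + p_i C_i$. The function names are illustrative; the point is that `cla_carries` computes each carry from g, p, and C0 alone, with no dependence on the previous carry.

```python
def cla_carries(a: int, b: int, c0: int):
    """4-bit carry lookahead: C1..C4 computed directly from g_i, p_i, and C0."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # generate: g_i = A_i B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: p_i = A_i + B_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]


def ripple_carries(a: int, b: int, c0: int):
    """Reference: C_{i+1} = g_i + p_i C_i, evaluated one bit at a time."""
    carries = [c0]
    for i in range(4):
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, carries[-1]
        carries.append((ai & bi) | (ci & (ai | bi)))
    return carries


def cla_add4(a: int, b: int, c0: int = 0):
    """4-bit sum S_i = A_i XOR B_i XOR C_i using the lookahead carries; returns (sum, C4)."""
    c = cla_carries(a, b, c0)
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return s, c[4]
```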
4-bit Carry Lookahead Adder
[Figure: 4-bit carry lookahead adder. Each bit i forms g_i and p_i from a_i and b_i and produces S_i from a_i, b_i, and C_i; the lookahead logic computes C1 to C4 from the g's, the p's, and C0:]
  $C_1 = g_0 + p_0 C_0$
  $C_2 = g_1 + p_1 C_1 = g_1 + p_1 g_0 + p_1 p_0 C_0$
  $C_3 = g_2 + p_2 C_2 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_0$
  $C_4 = g_3 + p_3 C_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_0$
• Only 3 gate delays for each carry $C_i$: $D_{AND} + 2D_{OR}$
• 4 gate delays for each sum $S_i$: $D_{AND} + 2D_{OR} + D_{XOR}$

Hierarchically build bigger adders
[Figure: a 16-bit adder built from four 4-bit CLA adders, ALU0 to ALU3 (bits 0-3, 4-7, 8-11, 12-15), each producing a group propagate P0 to P3 and a group generate G0 to G3; a second-level carry-lookahead unit uses them and CarryIn to compute C1 to C3 (the carry-ins of ALU1 to ALU3) and C4 (the CarryOut)]
• Can't build a 16-bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!

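A Python sketch of the two-level scheme, assuming the usual group signals $P = p_3 p_2 p_1 p_0$ and $G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ for each 4-bit block; the second-level unit then applies the same lookahead equations to P and G. Inside each block the sum is computed arithmetically for brevity, since the point here is the second level. Function names are hypothetical.

```python
def group_pg(a4: int, b4: int):
    """Group propagate P and generate G for one 4-bit block."""
    g = [((a4 >> i) & (b4 >> i)) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G


def cla16(a: int, b: int, c0: int = 0):
    """16-bit adder: four 4-bit blocks plus a second-level lookahead unit for C1..C4."""
    blocks = [((a >> 4 * j) & 0xF, (b >> 4 * j) & 0xF) for j in range(4)]
    P, G = zip(*(group_pg(x, y) for x, y in blocks))
    C1 = G[0] | (P[0] & c0)
    C2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    C3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    C4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    result = 0
    for j, ((x, y), cin) in enumerate(zip(blocks, (c0, C1, C2, C3))):
        # Each block's internal sum is modeled arithmetically here.
        result |= ((x + y + cin) & 0xF) << (4 * j)
    return result, C4
```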
Multiplication
• More complicated than addition
  - accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on the grade school algorithm
      0010   (multiplicand)
    x 1011   (multiplier)
• Negative numbers: convert and multiply
  - there are better techniques (see my ECE2030 lecture13 PPT slides)

Multiplication: Implementation
Start
1. Test Multiplier0
   - Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   - Multiplier0 = 0: skip step 1a
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done
[Figure: datapath with a 64-bit Multiplicand register (shift left), a 32-bit Multiplier register (shift right), a 64-bit ALU, a 64-bit Product register (write), and control test logic]

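The first-version flowchart translates almost line for line into Python. This sketch assumes unsigned 32-bit operands (per "convert and multiply" for negatives) and uses masks to mimic the register widths; `multiply_v1` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def multiply_v1(multiplicand: int, multiplier: int) -> int:
    """First version: 64-bit Multiplicand shifts left, 32-bit Multiplier shifts right."""
    mcand = multiplicand & 0xFFFF_FFFF  # starts in the right half of a 64-bit register
    mplier = multiplier & 0xFFFF_FFFF
    product = 0                         # 64-bit Product register
    for _ in range(32):                 # 32 repetitions
        if mplier & 1:                  # 1. test Multiplier0
            product = (product + mcand) & MASK64  # 1a. add multiplicand to product
        mcand = (mcand << 1) & MASK64   # 2. shift Multiplicand left 1 bit
        mplier >>= 1                    # 3. shift Multiplier right 1 bit
    return product


print(multiply_v1(0b0010, 0b1011))  # 22, i.e. 0b10110
```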
Second Version
Start
1. Test Multiplier0
   - Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   - Multiplier0 = 0: skip step 1a
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit Multiplier register (shift right), a 32-bit ALU, a 64-bit Product register (shift right, write), and control test logic]

Final Version
Start
1. Test Product0
   - Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
   - Product0 = 0: skip step 1a
2. Shift the Product register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done
[Figure: datapath with a 32-bit Multiplicand register, a 32-bit ALU, a 64-bit Product register (shift right, write), and control test logic]

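A Python sketch of the final version. It assumes, as in the textbook form of this design, that the multiplier is loaded into the right half of the Product register, which is why the flowchart tests Product0 and no separate Multiplier register is needed. The carry out of the 32-bit add is kept as a temporary 65th bit so the right shift can move it into bit 63. `multiply_final` is a hypothetical name.

```python
MASK32 = 0xFFFF_FFFF


def multiply_final(multiplicand: int, multiplier: int) -> int:
    """Final version: the multiplier starts in the right half of the 64-bit Product register."""
    mcand = multiplicand & MASK32
    product = multiplier & MASK32        # left half 0, right half = multiplier
    for _ in range(32):                  # 32 repetitions
        if product & 1:                  # 1. test Product0
            # 1a. the 32-bit ALU adds into the left half; its carry out is kept as a 65th bit
            left = (product >> 32) + mcand
            product = (left << 32) | (product & MASK32)
        product >>= 1                    # 2. shift Product right 1 bit (carry enters bit 63)
    return product


print(multiply_final(0b0010, 0b1011))  # 22
```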
Floating Point (a brief look)
• We need a way to represent
  - numbers with fractions, e.g., 3.1416
  - very small numbers, e.g., .000000001
  - very large numbers, e.g., $3.15576 \times 10^9$
• Representation:
  - sign, exponent, significand: $(-1)^{sign} \times significand \times 2^{exponent}$
  - more bits for the significand give more accuracy
  - more bits for the exponent increase the range
• IEEE 754 floating point standard:
  - single precision: 8-bit exponent, 23-bit significand
  - double precision: 11-bit exponent, 52-bit significand
• Significand = (1 + fraction); the leading 1 is implicit

IEEE 754 Standard Floating-point Representation
Single Precision (32-bit)
  bit 31      bits 30-23          bits 22-0
  S (1 bit)   exponent (8 bits)   significand (23 bits)
  $(-1)^{sign} \times (1 + fraction) \times 2^{exponent - 127}$
Double Precision (64-bit)
  bit 63      bits 62-52          bits 51-32
  S (1 bit)   exponent (11 bits)  significand (20 bits)
  second 32-bit word: significand (continued, 32 bits)
  $(-1)^{sign} \times (1 + fraction) \times 2^{exponent - 1023}$

IEEE 754 floating-point standard
• Leading “1” bit of the significand is implicit
• Exponent is “biased” to make sorting easier
  - all 0s is the smallest exponent, all 1s the largest
  - bias of 127 for single precision and 1023 for double precision
• Summary: $(-1)^{sign} \times (1 + fraction) \times 2^{exponent - bias}$
  (a.k.a. a normalized number, because of the leading 1 as in scientific notation)
• Example:
  - decimal: $-.75 = -3/4 = -3/2^2$
  - binary: $-.11 = -1.1 \times 2^{-1}$
  - floating point: exponent = 126 = 01111110
  - IEEE single precision: 1 01111110 10000000000000000000000 (sign | exponent | fraction)

IEEE 754 Standard Example (single precision)
$(-1)^{sign} \times (1 + Fraction) \times 2^{(exp - 127)}$
$= (-1)^{sign} \times (1 + s_1 \cdot 2^{-1} + s_2 \cdot 2^{-2} + s_3 \cdot 2^{-3} + \cdots + s_{23} \cdot 2^{-23}) \times 2^{(exp - 127)}$
$(-0.75)_{10} = (\text{BF400000})_{16}$

IEEE 754 Standard Example (single precision)
$(-1)^{sign} \times (1 + Fraction) \times 2^{(exp - 127)}$
$= (-1)^{sign} \times (1 + s_1 \cdot 2^{-1} + s_2 \cdot 2^{-2} + s_3 \cdot 2^{-3} + \cdots + s_{23} \cdot 2^{-23}) \times 2^{(exp - 127)}$
$(23.15625)_{10} = (\text{41B94000})_{16}$

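Both worked examples can be checked with Python's standard `struct` module, which packs a float into the IEEE 754 single-precision format (`"f"`) so the raw bits can be read back as an unsigned int (`"I"`). The helper names are hypothetical; `decode` applies the slide's formula and is only valid for normalized numbers.

```python
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x after rounding to IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def fields(bits: int):
    """Split a single-precision pattern into (sign, biased exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7F_FFFF


def decode(sign: int, exponent: int, fraction: int) -> float:
    """(-1)^sign x (1 + fraction / 2^23) x 2^(exponent - 127), for normalized numbers."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


for x in (-0.75, 23.15625):
    b = float32_bits(x)
    print(f"{x} -> {b:08X}", fields(b), decode(*fields(b)))
```

Running it prints BF400000 for -0.75 (sign 1, exponent 126) and 41B94000 for 23.15625 (sign 0, exponent 131), and `decode` recovers both values exactly.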
IEEE 754 Standard Encoding
  Single Precision        Double Precision
  Exponent   Fraction     Exponent   Fraction     Object Represented
  0          0            0          0            0 (zero)
  0          Non-zero     0          Non-zero     ± Denormalized number
  1-254      Anything     1-2046     Anything     ± Floating-point number
  255        0            2047       0            ± Infinity
  255        Non-zero     2047       Non-zero     NaN (Not a Number)
• NaN: (infinity - infinity), or 0/0
• Denormalized number = $(-1)^{sign} \times 0.f \times 2^{1 - bias}$

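The encoding table can be read as a small decision procedure on the exponent and fraction fields. This Python sketch applies it to single precision via the standard `struct` module; `classify32` and `denormal_value` are hypothetical names, and `denormal_value` implements the denormalized formula above with bias = 127.

```python
import struct


def classify32(x: float) -> str:
    """Classify a value by the exponent/fraction fields of its single-precision encoding."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7F_FFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"


def denormal_value(sign: int, fraction: int) -> float:
    """(-1)^sign x 0.f x 2^(1 - bias), with bias = 127 for single precision."""
    return (-1) ** sign * (fraction / 2**23) * 2.0 ** (1 - 127)


for x in (0.0, -0.0, 1e-40, 1.0, float("inf"), float("nan")):
    print(x, classify32(x))
```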
Precision Issues
• Cannot represent all real numbers: there are infinitely many of them, but only finitely many bit patterns
• Must sacrifice precision when representing some numbers in FP
  - precision is lost when the integer portion is too large
  - precision is lost when the fraction portion is too small
• Example
  - How to represent $2^{24}$ and $2^{24} + 1$?
    Both = 4B800000 in single precision
  - How to represent $2^{-127}$? (use a denormalized number: $0.1_{two} \times 2^{-126}$)
  - How about $2^{-150}$? (use a denormalized number? What is the smallest number a denormalized number can represent?)

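These examples can be reproduced by round-tripping values through single precision with Python's standard `struct` module. The helper name is hypothetical, and the results assume the default IEEE rounding mode (round to nearest).

```python
import struct


def float32_hex(x: float) -> str:
    """Single-precision bit pattern of x as 8 hex digits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits:08X}"


print(float32_hex(2.0**24))      # 4B800000
print(float32_hex(2.0**24 + 1))  # 4B800000: 2^24 + 1 needs 24 fraction bits, so it rounds to 2^24
print(float32_hex(2.0**-127))    # 00400000: denormal with fraction 100...0two
print(float32_hex(2.0**-149))    # 00000001: the smallest positive denormal, 2^-23 x 2^-126
```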
Floating Point Complexities
• Operations are somewhat more complicated (see text)
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes
  - a positive number divided by zero yields “infinity”
  - zero divided by zero yields “not a number”
  - other complexities
• Implementing the standard can be tricky
• Not using the standard can be even worse
  - see text for a description of the 80x86 and the Pentium bug!