Principles of Computer Architecture Dr. Mike Frank


Physical Limits of Computing
Dr. Mike Frank
CIS 6930, Sec. #3753X
Spring 2002
Lecture #23
Adiabatic Electronics & CMOS
Mon., Mar. 11
Administrivia & Overview
• Don’t forget to keep up with homework!
– We are 7 out of 14 weeks into the course.
• You should have earned ~50 points by now.
• Course outline:
– Part I&II, Background, Fundamental Limits - done
– Part III, Future of Semiconductor Technology - done
– Part IV, Potential Future Computing Technologies - done
– Part V, Classical Reversible Computing
• Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
(----------------------- Spring Break ------------------------)
• Adiabatic electronics & CMOS logic families - TODAY
• Limits of adiabatics: Leakage and clock/power supplies - Wed. 3/13
• RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
• RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
• (plus ~7 more lectures…)
– Part VI, Quantum Computing
– Part VII, Cosmological Limits, Wrap-Up
Adiabatic electronics &
CMOS implementations
Conventional Gates are Irreversible
• Logic gate behavior (on receiving new input):
– Many-to-one transformation of local state!
– Required to dissipate ≥ k_B T ln 2 per erased bit, by Landauer's principle
– Incurs ½CV² dissipation in 2 out of 4 cases.
Example: Static CMOS inverter
[Figure: static CMOS inverter with input "in" and output "out".]
Transformation of local state:
Just before transition:
in out
0  0
0  1
1  0
1  1
After transition:
in out
0  1
1  0
Exact formula:
$E_{\mathrm{diss}} = f\,\bigl[\,1 + f\,\bigl(e^{-1/f} - 1\bigr)\bigr]\,CV^2$
for frequency reduction $f \equiv RC/t$
Adiabatic Rules for Transistors
• Rule 1: Never turn on a transistor if it has a
nonzero voltage across it!
– I.e., between its source & drain terminals.
– Why: This erases info. & causes ½CV² dissipation.
• Rule 2: Never apply a nonzero voltage across a
transistor even during any on↔off transition!
– Why: When partially turned on, the transistor has
relatively low R, gets high P=V²/R dissipation.
– Corollary: Never turn off a transistor if it has a
nonzero current going through it!
• Why: As R gradually increases, the V=IR voltage drop
will build, and then rule 2 will be violated.
Adiabatic Rules continued
• Transistor Rule 3: Never suddenly change the
voltage applied across any on transistor.
– Why: So transition will be more reversible;
dissipation will approach CV²(RC/t), not ½CV².
Adiabatic rules for other components:
• Diodes: Don’t use them at all!
– There is always a built-in voltage drop across them!
• Resistors: Avoid moderate network resistances.
– e.g. stay away from range >10 kΩ and <1 MΩ
• Capacitors: Minimize, reliability permitting.
– Note: Dissipation scales with C²!
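The C² scaling can be read off from the slow-ramp limit quoted under Rule 3. This is a one-line sketch that assumes R, V and the transition time t are held fixed while C varies:

```latex
E_{\mathrm{diss}} \;\approx\; CV^2 \,\frac{RC}{t} \;=\; \frac{R\,C^2\,V^2}{t}
\qquad (t \gg RC)
```

Under that assumption, halving the load capacitance cuts the adiabatic dissipation by a factor of four, whereas the conventional ½CV² loss would only halve.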
Transistor Rules Summarized
Legal transitions in green. (For n- or p-FETs.)
Dissipative states and transitions in red.
[Figure: transistor state-transition diagram; each state gives the voltage levels (high/low) on the transistor's terminals and whether the transistor is on or off.]

Transformation of local state:
Just before transition:
in out
0  ½
1  ½
After transition:
in out
0  1
1  0
Input-Barrier, Clocked-Bias Retractile
• Cycle of operation:
– Inputs raise or lower barriers
• Do logic w. series/parallel barriers
– Clock applies bias force which changes state, or not
* Must reset output prior to input.
* Combinational logic only!
Examples: Hall’s logic, SCRL gates, rod logic interlocks
[Figure: state diagram with axes "Input barrier height" and "Clocked force applied", showing the states 0, N, and 1.]
Retractile Logic w. SCRL gates
• Simple combinational logic of any depth N:
– Requires N timing phases
– Non-pipelined
– No sequential reuse of HW (even worse)
• Sequential logic is required!
[Figure: timing diagram of the retractile cascade; horizontal axis is time.]
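The ordering constraint behind a retractile cascade can be sketched as follows: stages are asserted in order 1..N and then retracted in the reverse order N..1, so every stage changes only while the stage feeding it is held valid and the stage after it is idle. The function names and the event representation are hypothetical, chosen only for illustration.

```python
def retractile_schedule(n_stages: int) -> list[tuple[int, str]]:
    """Events for one retractile cycle: assert 1..N, then retract N..1."""
    asserts = [(stage, "assert") for stage in range(1, n_stages + 1)]
    retracts = [(stage, "retract") for stage in range(n_stages, 0, -1)]
    return asserts + retracts


def is_retractile(events: list[tuple[int, str]]) -> bool:
    """Check the ordering constraint on a sequence of (stage, action) events."""
    active: set[int] = set()
    for stage, action in events:
        # The stage's input (the previous stage's output) must be held valid.
        # Stage 1 is assumed to be fed by an external input that stays valid.
        if stage > 1 and (stage - 1) not in active:
            return False
        # The next stage must not currently depend on this stage's output.
        if (stage + 1) in active:
            return False
        if action == "assert":
            active.add(stage)
        else:
            active.discard(stage)
    # A complete cycle leaves every stage retracted.
    return not active


N = 4
schedule = retractile_schedule(N)
print(schedule)
print("valid retractile order:", is_retractile(schedule))
print("events per cycle:", len(schedule), "for", N, "stages")
```

Each stage performs one useful transition pair per 2N events in this sketch, which illustrates why a non-pipelined cascade uses its hardware poorly as N grows.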
Sequential Retractile Logic
• Approach #1 (Hall ‘92):
– After every N stages, invoke an irreversible latch
• stores the output of the last stage
– Then, retract all the stages,
– and begin a new cycle
• Problems:
– Reduces dissipation by at most a factor of N
– Also reduces HW efficiency by order N!
• In worst case, compared to a pipelined, sequential circuit
• Approach #2 (Knight & Younis, ‘93):
– The “store output” stage can also be reversible!
– Gives fully-adiabatic, sequential, pipelined circuits!
• N can be as small as 1 or 2 & still have arbitrarily high Q
Simple Reversible CMOS Latch
• Uses a standard CMOS transmission gate
• Sequence of operation:
(1) input initially matches latch contents (output)
(2) input changes → output changes
(3) latch closes
(4) input removed
[Figure: transmission-gate latch connecting in to out, with control signals labeled P.]
Before input:
in out
a  a
Input arrived:
in out
a  a
b  b
Input removed:
in out
a  a
a  b
Resetting a Reversible Latch
• Can reversibly unlatch data as follows:
(exactly the reverse of the latching process)
– (1) Data value d stored on memory node M.
– (2) Present an exact copy of d on input.
– (3) Open the latch (connecting input to M).
• No dissipation since voltage levels match
– (4) Retract the copy of d from the input.
• Retracts copy stored in latch also.
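The latching and unlatching sequences can be traced with a toy model. This is a minimal sketch, assuming an idealized T-gate that either connects the input to the memory node or isolates it, and assuming every input change is a gradual ramp so that only Rule 1 needs checking. The class and function names are illustrative, not from the lecture.

```python
class TGateLatch:
    """Idealized transmission-gate latch with logic levels a, b in {0, 1}."""

    def __init__(self, level: int) -> None:
        self.inp = level          # level driven onto the input
        self.out = level          # level held on the memory node
        self.conducting = True    # T-gate on: memory node follows the input
        self.violations = 0       # number of Rule 1 violations so far

    def drive_input(self, level: int) -> None:
        """Ramp the input to a new level (assumed gradual)."""
        self.inp = level
        if self.conducting:
            self.out = level

    def close(self) -> None:
        """Turn the T-gate off, isolating the memory node."""
        self.conducting = False

    def open(self) -> None:
        """Turn the T-gate on; dissipative if the two sides differ."""
        if self.inp != self.out:
            self.violations += 1  # nonzero voltage across a transistor turning on
            self.out = self.inp
        self.conducting = True


def latch_then_unlatch(a: int, b: int):
    """Latch value b over initial value a, then reversibly unlatch it."""
    latch = TGateLatch(a)         # (1) input matches latch contents
    latch.drive_input(b)          # (2) input changes, output follows
    latch.close()                 # (3) latch closes
    latch.drive_input(a)          # (4) input removed
    stored = (latch.inp, latch.out)
    latch.drive_input(b)          # present an exact copy of the stored value
    latch.open()                  # open the latch: levels match, no loss
    latch.drive_input(a)          # retract the copy, and the stored value with it
    return stored, (latch.inp, latch.out), latch.violations


def latch_then_force_open(a: int, b: int) -> int:
    """Latch b, then open the latch without presenting a copy first."""
    latch = TGateLatch(a)
    latch.drive_input(b)
    latch.close()
    latch.drive_input(a)
    latch.open()
    return latch.violations


print(latch_then_unlatch(0, 1))
print(latch_then_force_open(0, 1))
```

The second function shows the contrast: opening the latch without a matching copy on the input counts as a Rule 1 violation whenever the stored value differs from the input level.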
Input-Bias Clocked-Barrier Logic
• Cycle of operation:
– Data input applies bias
• Add forces to do logic
– Clock signal raises barrier
– Data input bias removed
* Can amplify/restore input signal in clocking step.
* Can reset latch reversibly given copy of contents.
Examples: Adiabatic QDCA, SCRL latch, rod logic latch, PQ logic, buckled logic
[Figure: state diagram of the latching cycle. From the neutral state N, input “0” or input “1” biases the state toward 0 or 1, the clock raises the barrier, and the input is retracted, leaving the stored 0 or 1.]
SCRL 6-tick clock cycle
Initial state: All gates off, all nodes neutral.
Tick #1: Input goes valid, forward T-gate opens.
Tick #2: Forward gate charges, output goes valid. (Tick #1 of subsequent gate.)
Tick #3: Forward T-gate closes, reverse gate charges.
Tick #4: Reverse T-gate opens, forward gate discharges.
Tick #5: Reverse gate discharges, input goes neutral.
Tick #6: Reverse T-gate closes, output goes neutral. Ready for next input!
[Figure: SCRL gate schematic with in and out nodes, animated over the six ticks.]
24 ticks/cycle in this version (includes 2-level retractile stages).
Some Interesting Questions
• About pipelined, sequential, fully-adiabatic
CMOS logic:
– Q: Does it require these intermediate voltage levels?
• A: No, you can get by with only 2 different levels.
– Q: What is the minimum number of externally
provided timing signals you can get away with?
• A: 4 (12 if split levels are used)
– Q: Can the order-N different timing signals needed
for long retractile cascades be internally generated
within an adiabatic circuit?
• A: Yes, but not statically, unless N² hardware is used
– where N is the number of stages per full sequential cycle
• We now demonstrate these answers.
2LAL: 2-level Adiabatic Logic
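• Use simplified T-gate symbol:
[Figure: simplified T-gate symbol.]
• Basic buffer element:
– cross-coupled T-gates
[Figure: buffer element with in and out nodes, clocked by φ1 and φ2.]
• Only 4 timing signals φ1..φ4, 4 ticks per cycle:
– φi rises during tick i
– φi falls during tick ((i+1) mod 4)+1
[Figure: waveforms of the four timing signals over ticks 1 2 3 4.]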
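The rise and fall rules for the four timing signals can be tabulated directly. This is a minimal sketch, assuming ticks and signals are both numbered 1..4, that a signal stays high between its rising and falling ticks, and that it is low otherwise. The function name is illustrative.

```python
def phase_state(i: int, tick: int) -> str:
    """State of timing signal i (1..4) during a given tick (1..4)."""
    rise_tick = i                      # signal i rises during tick i
    fall_tick = ((i + 1) % 4) + 1      # and falls during tick ((i+1) mod 4)+1
    hold_tick = (i % 4) + 1            # the tick between rise and fall
    if tick == rise_tick:
        return "rising"
    if tick == fall_tick:
        return "falling"
    if tick == hold_tick:
        return "high"
    return "low"


# Print one row per timing signal, one column per tick.
print("signal  " + "  ".join(f"tick {t:<4}" for t in range(1, 5)))
for i in range(1, 5):
    row = "  ".join(f"{phase_state(i, t):<9}" for t in range(1, 5))
    print(f"{i:<6}  {row}")
```

Under these assumptions each tick has exactly one signal rising, one high, one falling and one low, so the four signals are copies of a single trapezoidal waveform offset by one tick each.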
2LAL Cycle of Operation
in1
in
21
in0
20
out1
11
in=0
11
10
21
out0
out=0
10
Shift Register Structure
• 1-tick delay per logic stage:
2
3
4
1
in
out
1
2
3
4
• Logic pulse timing & propagation:
1 2 3 4 ...
in
in
1 2 3 4 ...
More complex logic functions
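• Non-inverting Boolean functions:
[Figure: T-gate networks with inputs A and B computing non-inverting functions such as AB.]
• For inverting functions, must use quad-rail logic encoding:
[Figure: rails A0 and A1 shown for the cases A=0 and A=1.]
• Zero-transistor “inverters.”
– To invert, just swap the rails!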
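The rail-swapping "inverter" can be illustrated with a small encoding sketch. The interpretation of the four rails below is an assumption: rail A0 is taken to be active when A=0, rail A1 when A=1, and each is accompanied by a complementary rail so that both halves of a T-gate can be driven. All names here are hypothetical.

```python
from typing import NamedTuple


class QuadRail(NamedTuple):
    """Assumed quad-rail encoding of one logical bit."""
    a0: int        # active (1) when the logical value is 0
    a0_bar: int    # complement of a0
    a1: int        # active (1) when the logical value is 1
    a1_bar: int    # complement of a1


def encode(bit: int) -> QuadRail:
    """Encode a logical bit onto four rails."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    a0, a1 = 1 - bit, bit
    return QuadRail(a0, 1 - a0, a1, 1 - a1)


def decode(signal: QuadRail) -> int:
    """Recover the logical bit; exactly one of a0 and a1 must be active."""
    if signal.a0 == signal.a1:
        raise ValueError("invalid encoding: a0 and a1 must differ")
    return signal.a1


def invert(signal: QuadRail) -> QuadRail:
    """Logical NOT by swapping the A0 rail pair with the A1 rail pair."""
    return QuadRail(signal.a1, signal.a1_bar, signal.a0, signal.a0_bar)


for bit in (0, 1):
    print(bit, "->", decode(invert(encode(bit))))
```

No gate is involved in `invert`: it only relabels wires, which is why the inversion costs zero transistors in this encoding.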
Hardware Efficiency issues
• Hardware efficiency: How many logic
operations per unit hardware per unit time?
• Hardware spacetime complexity: How much
hardware for how much time per logic op?
• We’re interested in minimizing:
(# of transistors) × (# of ticks) / (gate cycle)
• SCRL inverter, w. return path:
– (8 transistors) × (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
– (16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
– (23 transistors) × (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
– (48 transistors) × (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL
minimizes # of rails, and # ticks/cycle, it does
not minimize overall spacetime complexity.
• The question of whether 6-tick SCRL
minimizes per-op spacetime complexity among
pipelined adiabatic CMOS logics is still open.
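The spacetime figures quoted above can be reproduced with a few lines of arithmetic. The transistor and tick counts are taken from the slides; the dictionary layout and function name are illustrative.

```python
def transistor_ticks(transistors: int, ticks_per_cycle: int) -> int:
    """Spacetime cost per gate cycle: (# of transistors) x (# of ticks)."""
    return transistors * ticks_per_cycle


# (transistor count, ticks per cycle) as quoted in the lecture.
designs = {
    "SCRL inverter, w. return path": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND, w. all inverters": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}

costs = {name: transistor_ticks(*counts) for name, counts in designs.items()}
for name, cost in costs.items():
    print(f"{name:<40} {cost:>4} transistor-ticks")
```

In both pairings the SCRL gate comes out cheaper in transistor-ticks even though 2LAL uses fewer ticks per cycle, which matches the comparison drawn in the slides.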