Transcript PhysLimL24

Physical Limits of Computing
Dr. Mike Frank
CIS 6930, Sec. #3753X
Spring 2002
Lecture #24
Adiabatic CMOS cont.
Wed., Mar. 13
Administrivia & Overview
• Don’t forget to keep up with homework!
– We are 8 out of 14 weeks into the course.
• You should have earned ~57 points by now.
• Course outline:
– Part I&II, Background, Fundamental Limits - done
– Part III, Future of Semiconductor Technology - done
– Part IV, Potential Future Computing Technologies - done
– Part V, Classical Reversible Computing
• Fundamentals of Adiabatic Processes & logic - last Wed. & Fri.
(----------------------- Spring Break ------------------------)
• Adiabatic electronics & CMOS logic families - Mon. & TODAY
• Limits of adiabatics: Leakage and clock/power supplies - TODAY
• RevComp theory I: Emulating Irreversible Machines - Fri. 3/15
• RevComp theory II: Bounds on Space-Time Overheads - Mon. 3/18
• (plus ~7 more lectures…)
– Part VI, Quantum Computing
– Part VII, Cosmological Limits, Wrap-Up
Adiabatic computing in CMOS
Monday: Adiabatic switching, split-level retractile & pipelined logic.
Today: 2-Level Adiabatic Logic, general adiabatic logic
Some Timing Terminology
For sequential adiabatic circuits:
• Tick: Time for a single ramp transition
– adiabatic speed fraction f times the RC gate delay.
• Phase: Latency for a data value to propagate
forward by 1 pipeline stage.
• Cycle: Minimum period for all timing
information to return to its initial state.
• Diadic: Two retractile levels per gate
– permits inverting or non-inverting logic.
• Monadic: only 1 level
• Dual rail: Two wires per logic value
– permits universal logic with monadic gates
Some Figures of Demerit
• Some quantities we may wish to minimize:
– Ticks/phase:
• proportional to logic propagation latency
– Ticks/cycle:
• reciprocal to rate of data throughput
– Transistor-ticks/cycle:
• reciprocal to HW cost-efficiency
– Number of required clock/power input signals:
• supplying these may be a significant component of
system cost
– Number of distinct voltage levels required:
• may affect reliability/power tradeoff
Some Interesting Questions
• About pipelined, sequential, fully-adiabatic
CMOS logic:
– Q: Does it require an intermediate voltage level?
• A: No, you can get by with only 2 different levels.
– Q: What is the minimum number of externally
provided timing signals you can get away with?
• A: 4 (12 if split levels are used)
– Q: Can the order-N different timing signals needed
for long retractile cascades be internally generated
within an adiabatic circuit?
• A: Yes, but not statically, unless N^2 hardware is used
– where N is the number of stages per full sequential cycle
• We now demonstrate these answers.
Some Timing Examples
See next slide for some detailed timing diagrams.
• N-level retractile cascades:
– 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
• 3-phase fully-static diadic SCRL
– 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
• 2-phase fully-static monadic SCRL
– 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
• 2-phase fully-static diadic SCRL
– 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
• 6 tick/cycle dynamic SCRL detailed previously:
– 1 tick/phase × 6 phases/cycle = 6 ticks/cycle
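As a quick check of the arithmetic above, the sketch below multiplies ticks/phase by phases/cycle for each scheme. The dictionary names are labels chosen here, and N = 4 for the retractile cascade is an arbitrary example value, not a figure from the lecture.

```python
# Ticks per cycle = (ticks per phase) x (phases per cycle), per the list above.
N = 4  # assumed example depth for the retractile cascade (not from the lecture)

# scheme label -> (ticks per phase, phases per cycle)
schemes = {
    "N-level retractile cascade": (2 * N, 1),
    "3-phase static diadic SCRL": (8, 3),
    "2-phase static monadic SCRL": (5, 2),
    "2-phase static diadic SCRL": (6, 2),
    "6-phase dynamic SCRL": (1, 6),
}

ticks_per_cycle = {
    name: ticks_per_phase * phases_per_cycle
    for name, (ticks_per_phase, phases_per_cycle) in schemes.items()
}

for name, total in ticks_per_cycle.items():
    print(f"{name}: {total} ticks/cycle")
```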
Some SCRL timing diagrams
2LAL: 2-level Adiabatic Logic
• Dual-rail T-gate symbol:
[Figure: dual-rail T-gate symbol; labels: P, A, B.]
• Basic buffer element:
– cross-coupled T-gates
[Figure: buffer element schematic; labels: in, out, P.]
• Only 4 different timing signals, 4 ticks per cycle:
– φi rises during tick i, falls during tick (i+2) mod 4
[Figure: timing diagram of the four signals 0-3 over tick # 0 1 2 3.]
• 1 tick/phase × 4 phases/cycle = 4 ticks/cycle!
– Optimizes latency & throughput per gate.
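The timing rule for the four 2LAL signals can be tabulated with a small sketch. It assumes (an inference from the rule, not something stated on the slide) that a signal holds high between its rising and falling ticks and holds low otherwise; the name clock_state is hypothetical.

```python
STATES = ("rising", "high", "falling", "low")

def clock_state(i, t):
    """State of timing signal i during tick t of the 4-tick cycle.

    Signal i rises during tick i and falls during tick (i + 2) mod 4.
    Holding high or low in the other two ticks is an assumed inference.
    """
    return STATES[(t - i) % 4]

# One row per timing signal, one column per tick.
for i in range(4):
    print(f"signal {i}:", [clock_state(i, t) for t in range(4)])
```

Under this model exactly one signal is rising and one is falling in every tick, which is what lets each stage hand its value on once per tick.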
2LAL Cycle of Operation
[Figure: 2LAL cycle of operation by tick number (0-3); labels: in, in0, in1, out0, out1, in=0, out=0.]
Input-Barrier, Clocked-Bias Latching
(1) Input conditionally lowers barrier (logic w. series/parallel barriers).
(2) Clock applies bias force; conditional bit flip.
(3) Input removed, raising barrier & locking in state change.
(4) Clock bias can retract.
2LAL is an example of this.
[Figure: latching sequence; labels: 0, 1, N, Input pulse, Pulse ends.]
Shift Register Structure
• 1-tick delay per logic stage:
[Figure: chain of buffer stages from in to out; stage labels 1-4.]
• Logic pulse timing & propagation:
[Figure: pulse timing diagram for in over ticks 1 2 3 4 ...]
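The 1-tick-per-stage behavior can be mimicked with a toy model. This is a simplification rather than a circuit simulation: it tracks only the tick in which a pulse begins at each node, and the function name and arguments are hypothetical.

```python
def propagate(input_ticks, n_stages, n_ticks):
    """Return timeline[t][k]: True if a pulse starts at node k in tick t.

    Node 0 is the shift register input; node k (k >= 1) is the output of
    stage k, so a pulse entering in tick t reaches node k in tick t + k.
    """
    nodes = [False] * (n_stages + 1)
    timeline = []
    for t in range(n_ticks):
        # Each stage output takes over what its input node held one tick earlier.
        nodes = [t in input_ticks] + nodes[:-1]
        timeline.append(nodes)
    return timeline

timeline = propagate(input_ticks={0}, n_stages=4, n_ticks=6)
for t, row in enumerate(timeline):
    print(t, "".join("#" if pulse else "." for pulse in row))
```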
More complex logic functions
• Non-inverting Boolean functions:
[Figure: two gate networks with inputs A and B and outputs labeled AB.]
• For inverting functions, must use quad-rail logic encoding:
[Figure: rails A0 and A1, shown for A=0 and A=1.]
• Zero-transistor “inverters.”
– To invert, just swap the rails!
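The rail-swap inversion can be illustrated with a toy encoding. The sketch assumes a logic value A is carried as a pair (A0, A1) in which exactly one rail is active, and it leaves out the complementary control rail that each of A0 and A1 would also need in a quad-rail scheme; the function names are hypothetical.

```python
def encode(bit):
    """Return (A0, A1): rail A0 is active for bit 0, rail A1 for bit 1."""
    return (1, 0) if bit == 0 else (0, 1)

def decode(rails):
    """Recover the bit; exactly one of the two rails must be active."""
    a0, a1 = rails
    if a0 == a1:
        raise ValueError("exactly one rail must be active")
    return a1

def invert(rails):
    """Logical NOT by relabeling wires: swap the two rails, no gates needed."""
    a0, a1 = rails
    return (a1, a0)

for bit in (0, 1):
    print(bit, "->", decode(invert(encode(bit))))
```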
Hardware Efficiency issues
• Hardware efficiency: How many logic
operations per unit hardware per unit time?
• Hardware spacetime complexity: How much
hardware for how much time per logic op?
• We’re interested in minimizing:
(# of transistors) × (# of ticks) / (gate cycle)
• SCRL inverter, w. return path:
– (8 transistors) × (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
– (16 transistors) × (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
– (23 transistors) × (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
– (48 transistors) × (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL
minimizes # of rails, and # ticks/cycle, it does
not minimize overall spacetime complexity.
– The question of whether 6-tick SCRL really
minimizes per-op spacetime complexity among
pipelined fully-adiabatic CMOS logics is still open.
• An opportunity for you to make a contribution!
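The SCRL vs. 2LAL comparison can be reproduced with a few lines. The transistor and tick counts are the ones quoted in the lecture; the dictionary layout and the ratio printout are illustrative additions.

```python
# design label -> (transistors, ticks per gate cycle), as quoted in the lecture
designs = {
    "SCRL inverter (with return path)": (8, 6),
    "Quad-rail 2LAL buffer stage": (16, 4),
    "SCRL reversible NAND (all inverters)": (23, 6),
    "Quad-rail 2LAL AND": (48, 4),
}

# Spacetime cost per gate cycle, in transistor-ticks.
cost = {name: transistors * ticks for name, (transistors, ticks) in designs.items()}

for name, value in cost.items():
    print(f"{name}: {value} transistor-ticks")

# How much more spacetime 2LAL uses than SCRL for the comparable element.
buffer_ratio = cost["Quad-rail 2LAL buffer stage"] / cost["SCRL inverter (with return path)"]
gate_ratio = cost["Quad-rail 2LAL AND"] / cost["SCRL reversible NAND (all inverters)"]
print(f"2LAL/SCRL, buffer vs. inverter: {buffer_ratio:.2f}")
print(f"2LAL/SCRL, AND vs. NAND: {gate_ratio:.2f}")
```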
Minimizing Power-Clock Signals
• How many external clock signals required?
– N-level-deep retractile cascade logic:
• 2N waveforms × 1 phase = 2N signals
– 6 tick/cycle, 6-phase dynamic SCRL:
• 6 waveforms × 6 phases = 36 signals
– 24 tick/cycle, 3-phase static SCRL:
• 12 waveforms × 3 phases = 36 signals
– 4 tick/cycle, 2LAL:
• 1 waveform × 4 phases = 4 signals!
• It turns out that 12 signals are sufficient to implement any combination of 2-level or 3-level logics (including retractile) on-chip!
How to Do It
• Circular 2LAL shifter; pulse-gated clocks
[Figure: circular 2LAL shifter gating clocks P0-P3, with in and out; timing diagram of signals 0-3 over tick # 0 1 2 3.]
12-rail system: pros & cons
• Pros:
– Completely solves adiabatic timing design problem
– Enables mixtures of retractile, SCRL, and other
logic styles on 1 chip
– Enables simple fully-adiabatic SRAM & DRAM
• Cons:
– Timing signals are dynamic
– Known fully-static alternatives use order N^2 gates and signals for N-tick-long cycles
– N can be large in a chip that includes deep retractile networks
– Energy waste in driving the source/drain junction
capacitances of all the T-gates even when timing
pulse isn’t present (SOI reduces these parasitics)
Fully-Adiabatic DRAM cell
• 6T, 6 lines/row, 1 line/column (in/out together)
• Read cycle:
– Initially: φ lines neutral, out neutral, R off
– R for desired row turns on
– φ for desired row splits, driving out column
– R turns off, out is read
– φ merges, out is reset
• Write cycle:
– First, do read cycle.
– in is set to out
– W turns on
– in changed to new value...
Fully-Adiabatic SRAM
• 10-T, 10 lines/row, 1 line/column
• Operation similar to DRAM, except:
• Read-out: T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
• Write: T2 off; N2 retracts; N1 retracts, copy of M presented on input; T1 on; in changes; T1 off, N1 asserts; N2 asserts; T2 on
[Figure: SRAM cell schematic; labels: in, T1, N1, M, T2, N2, T3, out.]