Principles of Computer Architecture Dr. Mike Frank

Download Report

Transcript Principles of Computer Architecture Dr. Mike Frank

Principles of Adiabatic Processes
Adiabatic Processes - overview
• Adiabatic steps in the reversible Carnot cycle
• Evolution of the meaning of “adiabatic”
• Time-proportional reversibility (TPR) of quasiadiabatic processes
• Adiabatic theorem of quantum mechanics
• Adiabatic transitions of a two-state system
• Logic & memory in irreversible and adiabatic
processes.
The Carnot Cycle
• In 1822-24, Sadi Carnot analyzed the efficiency
of an ideal heat engine all of whose steps were
reversible, and furthermore proved that:
– Any reversible engine (regardless of details) would
have the same efficiency (THTL)/TH.
– No engine could have greater efficiency than a
reversible engine w/o producing work from nothing
– Temperature itself could be defined on a
thermodynamic scale based on heat recoverable by a
reversible engine operating between TH and TL
Steps of Carnot Cycle
P
• Isothermal expansion at TH
• Adiabatic (without flow of
heat) expansion THTL
• Isothermal compression at TL
• Adiabatic compression TLTH
Reservoir
Isothermal
Adiabatic
Reservoir
TH
TL
V
Isothermal
Reservoir
Reservoir
Adiabatic
Carnot Cycle Terminology
• Adiabatic (Latin): literally “Without flow of heat”
– I.e., no entropy enters or leaves the system
• Isothermal: “At the same temperature”
– Temperature of system remains constant as entropy enters or
leaves.
• Both kinds of steps, in the case of the Carnot cycle, are
examples of isentropic processes
– “at the same entropy”
– I.e., no (known) information is transformed into entropy in
either process
• But, the usage of the word “adiabatic” in applied
physics has mutated to essentially mean isentropic.
Old and New “Adiabatic”
• Consider a closed system where you just
lose track of its detailed evolution:
– It’s adiabatic (no net heat flow),
– But it’s not “adiabatic” (not isentropic)
• Consider a box containing some heat,
flying ballistically out of the system:
“The System”
– It’s not adiabatic, (no heat flow)
• because heat is “flowing” out of the system
– But it’s “adiabatic” (no entropy is generated)
Box o’ Heat
Justifying the Modern Usage
• In an adiabatic process following a desired
trajectory through configuration space,
– No heat flows in or out of the subsystem consisting of
those particular degrees of freedom whose variation
carries out the motion along the desired trajectory.
• E.g., the computational degrees of freedom in a
computational process.
– No heat flow  no entropy flow
• Heat is just energy whose configuration info. is entropy
– No entropy flow  no sustained entropy generation
• Since bounded systems have a maximum entropy
Quasi-Adiabatic Processes
• Complete adiabaticity means absolutely zero rate
of entropy generation
– Requires infinite degree of isolation of system from
uncontrolled external environment!
•  Impossible to completely achieve in practice.
• Real processes are only adiabatic to the extent
that their entropy generation approaches zero.
– Term “quasi-adiabatic” emphasizes imperfection
• Asymptotically adiabatic designs conceptually
approach 0 in the limit of variation of specified
technology design parameter(s)
– E.g., low device frequency, large device size
Quasi-Adiabatic Processes
• No real process is completely adiabatic
– Because some outside system may always have enough
energy to interact with & disturb your system’s
evolution - e.g., cosmic ray, asteroid
•  Evolution of system state is never perfectly known
– Unless you know the exact quantum state of the whole universe
– Entropy of your system always increases.
• Unless it is already at a maximum (at equilibrium)
– Can’t really be at complete equilibrium with its surroundings
» unless whole universe is at utterly stable “heat death” state.
• Systems at equilibrium are sometimes called “static.”
• Non-equilibrium, quasi-adiabatic processes are
sometimes also called quasi-static
– Changing, but near a local equilibrium otherwise
Quantifying Adiabaticity
• An appropriate metric for quantifying the degree
of adiabaticity of any process is just to use the
quality factor Q of that process.
– Q isn’t just for oscillatory processes any more
• Q is generally the ratio Etrans / Ediss between the:
– Energy Etrans involved in carrying out a process
• transitioning between states along a trajectory
– Amount Ediss of energy dissipated during the process.
• Normally also matches the following ratios:
– Physical information content / entropy generated
– Quantum computation rate / decoherence rate
– Decoherence time / quantum-transition time
Degree of Reversibility
• The degree of reversibility (a.k.a. reversibility,
a.k.a. thermodynamic efficiency) of any quasiadiabatic process is defined as the ratio of:
– the total free energy at the start of the process
–  by the total energy spent in the process
• Or, equivalently:
– the known, accessible information at the start
–  by the amount that is converted to entropy
• This same quantity is referred to as the (percycle) “quality factor” Q for any resonant
element (e.g., LC oscillator) in EE.
E free (0)
Espent (t )
K0
S t
The “Adiabatic Principle”
• Claim: Any ideal quasi-adiabatic process
performed over time t has a thermodynamic
efficiency that is proportional to t,
– in the limit as t0.
• We call processes that realize this idealization
time-proportionally reversible (TPR) processes.
• Note that the total energy spent (Espent), and
the total entropy generated (S), are both
inversely proportional to t in any TPR process.
– The slower the process, the more energy-efficient.
Proving the Adiabatic Principle
(See RevComp memo #M14)
• Assume free energy is in generalized kinetic
energy of motion Ek of system through its
configuration space.
Ek = ½mv2  v2 = (/t)2  t2 for m,  const.
• Assume that every tf time, on average (mean free
time), a constant fraction f of Ek is thermalized
(turned into heat)
• Whole process thermalizes energy f(t/tf)Ek  tt2 =
t1. Constant in front is ½ fm2/tf : 2, where
=½fm/tf is the effective viscosity.
Example: Electrical Resistance
• We know Pspent=I2R=(Q/t)2R,
or Espent =Pt = Q2R/t.  Note scaling with 1/t
– Charge transfer through a resistor obeys the
adiabatic principle!
• Why is this so, microscopically?
– In most situations, conduction electrons have a large
Fermi velocity or thermal velocity relative to drift
velocity.
•  Scatter off of lattice-atom cross-sections with a mean
free time tf that is fairly independent of drift velocity
– Each scattering event thermalizes the electron’s drift
kinetic energy - a frac. f of current’s total Ek
• Therefore assumptions in prev. proof apply!
The Adiabatic Theorem
• A result in basic quantum theory
– Proved in many quantum mechanics textbooks
• Paraphrased: A system initially in its ground state (or
more generally, its nth energy eigenstate) will, after
subjecting it to a sufficiently slow change of applied
forces, remain in the corresponding state, with high
probability.
– Result has been recently shown to be very general.
• Amount of leakage out of desired state is
proportional to speed of transition, at low speeds.
– Quantum systems obey the adiabatic principle!
Some Loss-Inducing Interactions
For ordinary voltage-coded electronics:
• Interactions whose dissipation scales with speed:
– Parasitic EM emission from dynamic (C,L) reactances
– Scattering of ballistic electrons from lattice
imperfections, causing Ohmic resistance
• Interactions having different scaling laws:
– Interference from outside EM sources
– Thermally-activated leakage of electrons over potential
energy barriers
– Quantum tunneling of electrons through narrow barriers
(sub-Fermi wavelength)
Focus of most
of the work on
– Losses due to intentional treatment of known
adiabatics to
physical information as entropy (bit erasure)
date
Some Ways to Reduce Losses
• EM interference / emission: Add shielding, use high-Q
MEMS/NEMS oscillators
• Scattering/resistance: Ballistic FETs, superconductors
• Thermal leakage: avoid low VT and/or high temps
• Tunneling: thick tunnel barriers, high-κ dielectrics,
conductors w. low Fermi-level/high electron affinity,
vacuum-gap barriers?
• Intentional bit erasure: reduce voltages, use mostlyreversible adiabatic logic designs
Adiabatic Circuits &
Reversible Computing
Myths, Controversies &
Misconceptions
Some Claims Against Reversible Computing
Eventual Resolution of Claim
John von Neumann, 1949 – Offhandedly remarks during a lecture that computing
requires kT ln 2 dissipation per “elementary act of decision” (bit-operation).
No proof provided. Twelve years later, Rolf Landauer of IBM tries valiantly to
prove it, but succeeds only for logically irreversible operations.
Rolf Landauer, 1961 – Proposes that the logically irreversible operations that can be
seen to necessarily cause dissipation are irreducible.
Landauer’s argument for irreducibility of logically irreversible operations was
conclusively refuted by Bennett’s 1973 paper (partially presaged by Lecerf).
Bennett’s 1973 construction is criticized for using too much memory.
Bennett devises a more space-efficient version of the algorithm in 1989.
Bennett’s models criticized by various parties for depending on random Brownian
motion, and not making steady forward progress.
Fredkin and Toffoli at MIT, 1980, provide ballistic “billiard ball” model of
reversible computing that makes steady progress.
Various parties including Zurek note that Fredkin’s original classical-mechanical
billiard-ball model is chaotically unstable.
Zurek, 1984, shows that quantum models can avoid the chaotic instabilities.
(Though there are workable classical ways to fix the problem also.)
Various parties propose that classical reversible logic principles won’t work at the
nanoscale, for unspecified or vaguely-stated reasons.
Drexler, 1980’s, designs various mechanical nanoscale reversible logics and
carefully analyzes their energy dissipation.
Carver Mead, CalTech, 1980 – Attempts to show that the kT bound is unavoidable
in electronic devices, via a collection of counter-examples.
No general proof provided. Later he asked Feynman about the issue; in 1985
Feynman provided a quantum-mechanical model of reversible computing.
Various parties point out that Feynman’s model of reversible computing only
supports serial computation.
Margolus at MIT, 1990, demonstrates a parallel quantum model of reversible
computing—but only with 1 dimension of parallelism.
People question whether the various theoretical models can be validated with a
working electronic implementation.
Seitz and colleagues at CalTech, 1985, demonstrate working energy recovery
circuits using adiabatic switching principles.
Seitz, 1985—Has some working circuits, but unsure if arbitrary logic is possible.
Koller & Athas, Hall, and Merkle (1992) separately devise general reversible
combinational logics.
Koller & Athas, 1992 – Conjecture reversible sequential feedback logic impossible.
Younis & Knight @MIT do reversible sequential, pipelineable circuits in 1993-94.
Some computer architects (including anonymous ISCA reviewers) wonder whether
the constraint of reversible logic leads to unreasonable design convolutions.
Vieri, Frank and coworkers at MIT, 1995-99, refute these qualms by demonstrating
straightforward designs for fully-reversible and scalable gate arrays,
microprocessors, and instruction sets.
Some computer science theorists suggest that the algorithmic overheads of
reversible computing might outweigh their practical benefits.
Frank, 1997-2003, publishes a variety of rigorous theoretical analysis refuting these
claims for the most general classes of applications.
Various parties point out that high-quality power supplies for adiabatic circuits seem
difficult to build electronically.
Frank, 2000, suggests microscale/nanoscale electromechanical resonators for highquality energy recovery with desired waveform shape and frequency.
Frank, 2002—Briefly wonders if synchronization of parallel reversible computation
in 3 dimensions (not covered by Margolus) might not be possible.
Later that year, Frank devises a simple mechanical model showing that parallel
reversible systems can indeed be synchronized locally in 3 dimensions.
Adiabatic Circuits and
Reversible Computing
Commonly Encountered Myths,
Fallacies, and Pitfalls
(in the Hennessy-Patterson tradition)
Myths about Adiabatic Circuits
& Reversible Computing
• “Someone proved that
computing with <<kT
free-energy loss per bitoperation is impossible.”
• “Physics isn’t reversible.”
• “An energy-efficient
adiabatic clock/power
supply is impossible to
build.”
• “True adiabaticity doesn’t
require reversible logic.”
• “Sequential logic can’t be
done adiabatically.”
• “Adiabatic circuits require
many clock/power rails
and/or voltage levels.”
• “Adiabatic design is
necessarily difficult.”
Fallacies about Adiabatic Circuits
and Reversible Computing
• “Since speed scales with
energy dissipation in
adiabatic circuits, they
aren’t good for highperformance
computing.”
• “If I tried and failed to
invent an efficient
adiabatic logic, it must
be impossible.”
• “The algorithmic
overheads of reversible
computing mean it can
never be cost-effective.”
• “Since leakage gets
worse in nanoscale
devices, adiabatics is
doomed.”
Pitfalls in Adiabatic Circuits and
Reversible Computing
• Assuming that the best
• Using diodes in the
reversible and
charge-return path.
irreversible algorithms
• Forgetting to obey one of
are similar.
the transistor rules.
• Failing to optimize the
• Using traditional models
degree of reversibility of
of computational
a design.
complexity.
• Ignoring charge leakage
• Restricting oneself to an
in low-power/adiabatic
design.
asymptotically inefficient
design style.
Adiabatic/Reversible Computing
Basic Models and Concepts
Bistable Potential-Energy Wells
• Consider any system having an adjustable,
bistable potential energy surface (PES) in its
configuration space. (Landauer ’61)
• The two stable states form a natural bit.
– One state represents 0, the other 1.
• Consider now the P.E. well having
two adjustable parameters:
– (1) Height of the potential energy barrier
relative to the well bottom
– (2) Relative height of the left and right
states in the well (bias)
0
1
Possible Parameter Settings
• We will distinguish six qualitatively
different settings of the well parameters, as
follows…
Barrier
Height
Direction of Bias Force
One Mechanical Implementation
State
knob
Rightward
bias
spring
Barrier
wedge
spring
Barrier up
Barrier down
Leftward
bias
Possible Adiabatic Transitions
• Catalog of all the possible transitions in
these wells, adiabatic & not...(Ignoring superposition states.)
1
leak
0
0
0
1
1
leak
Barrier
Height
0
N
Direction of Bias Force
1
“1”
states
“0”
states
Ordinary Irreversible Logics
• Principle of operation: Lower a barrier, or not,
based on input. Series/parallel combinations of
barriers do logic. Major
1
dissipation in at least one of
the possible transitions.
Input
changes,
barrier
lowered
0
0
• Amplifies input signals.
Example: Ordinary CMOS logics
Output
irreversibly
changed to 0
Ordinary Irreversible Memory
• Lower a barrier, dissipating stored information.
Apply an input bias. Raise the barrier to latch
the new information
Retract
1
into place. Remove input
input
bias.
Dissipation
Retract
input
0
Example:
ordinary
DRAM
Barrier
up
0
(3)
Input
“0”
0
(2)
here can be
made as low
as kT ln 2
(1)
N
Barrier
up
Input
“1”
(2)
1
1
Input-Bias Clocked-Barrier Logic
• Cycle of operation:
– (1) Data input applies bias
• Add forces to do logic
Can amplify/restore input signal
in the barrier-raising step.
– (2) Clock signal raises barrier
– (3) Data input bias removed
(3)
1
1
(4)
Can reset latch
reversibly (4)
given copy of
contents.
(3)
0
0
(2) (4)
(4)
(4)
Examples: Adiabatic
QDCA, SCRL latch, Rod
logic latch, PQ logic,
Buckled logic
(2)
(1)
0
(4)
N
(1)
(4)
1
Input-Barrier, Clocked-Bias Retractile
• Cycle of operation:
– (1) Inputs raise or lower barriers
• Do logic w. series/parallel barriers
• Barrier signal amplified.
• Must reset output prior to
changing input.
• Combinational logic only!
– Clock applies bias force, which changes state, or not
0
0
0
(1) Input barrier height
Examples:
Hall’s logic,
SCRL gates,
Rod logic interlocks
0
N
(2) Clocked force applied 
1
Input-Barrier, Clocked-Bias Latching
● Cycle of operation:
1. Input conditionally lowers barrier
•
Do logic w. series/parallel barriers
2. Clock applies bias force; conditional bit flip
3. Input removed, raising the barrier &
(4)
locking in the state-change
(4)
4. Clock
0
0
bias can 0
(2)
(2)
retract
(1)
Examples: Mike’s
4-cycle 2-level adiabatic
CMOS logic (2LAL)
(2)
0
N
(2)
1
(3)
1
Full
Classical-Mechanical
Model
Claim: The following components
Sleeve
are sufficient for a complete,
scalable, parallel, pipelinable,
linear-time, stable, classical
(a)
reversible computing system:
(a) Ballistically rotating flywheel
driving linear motion.
(b) Scalable mesh to synchronize
local flywheel phases in 3-D.
(b)
(c) Sinusoidal to flat-topped
waveform shape converter.
(d) Non-amplifying signal inverter
(NOT gate).
(d)
(e) Non-amplifying OR/AND gate.
(f) Signal amplifier/latch.
Primary drawback: Slow propagation
speed of mechanical (phonon) signals.
(c)
(f)
(e)
cf. Drexler ‘92
Adiabatic electronics &
CMOS implementations
Conventional Gates are Irreversible
• Logic gate behavior (on receiving new input):
– Many-to-one transformation of local state!
– Required to dissipate bT, by Landauer principle
– Incurs ½CV2 dissipation in 2 out of 4 cases.
Example:
Static CMOS Inverter:
in
out
Transformation of local state:
Just before
transition:
in out
0 0
0 1
1 0
1 1
After
transition:
in out
0
1
1
0
Exact formula:



Ediss  f 1  f e 1/ f  1  CV 2
for frequency reduction
f : RC/t
Common Mistakes to Avoid
In Adiabatic Design
Common Mistakes to Avoid:
• Don’t use diodes in charge-return path!
– The built-in voltage drop kills adiabaticity.
• Don’t disobey adiabatic transistor rules by either:
– Turning on transistor with voltage across it
– Turning off transistor with current thru it!
• This one is often neglected!
• Use mostly-reversible logic!
– Optimize degree of reversibility for application
• Don’t over-constrain the design family!
– Asymptotically efficient circuits should be possible
Adiabatic Rules for Transistors
• Rule 1: Never turn on a transistor if it has a nonzero voltage
across it!
– I.e., between its source & drain terminals.
– Why: This erases info. & causes ½CV2 disspation.
• Rule 2: Never apply a nonzero voltage across a transistor even
during any onoff transition!
– Why: When partially turned on, the transistor has relatively
low R, gets high P=V2/R dissipation.
– Corollary: Never turn off a transistor if it has a nonzero
current going through it!
• Why: As R gradually increases, the V=IR voltage drop
will build, and then rule 2 will be violated.
Adiabatic Rules, continued…
• Transistor Rule 3: Never suddenly change the voltage
applied across any on transistor.
– Why: So transition will be more reversible; dissipation
will approach CV2(RC/t), not ½CV2.
Adiabatic rules for other components:
• Diodes: Don’t use them at all!
– There is always a built-in voltage drop across them!
• Resistors: Avoid moderate network resistances, if poss.
– e.g. stay away from range >10 k and <1 M
• Capacitors: Minimize, reliability permitting.
– Note: Dissipation scales with C2!
Transistor Rules Summarized
Legal adiabatic transitions in green. (For n- or p-FETs.)
Dissipative states and transitions in red.
high
high
off
high
low
low
on
low
high
high
off
low
off
low
off
high
low
on
high
low
on
low
on
high
SCRL: Split-level Charge
Recovery Logic
The First Pipelined Fully-Adiabatic
CMOS Logic
(Younis & Knight, MIT, ’94)

Transformation of local state:
Just before
transition:
After
transition:
in out
0 ½
1 ½
in out
0 1
1 0
Input-Barrier, Clocked-Bias Retractile
* Must reset output
prior to input.
* Combinational logic
only!
• Cycle of operation:
– Inputs raise or lower barriers
• Do logic w. series/parallel barriers
– Clock applies bias force which changes state, or not
Examples:
Hall’s logic,
SCRL gates,
Rod logic interlocks
0
0
0
Input barrier height
0
N
Clocked force applied 
1
Retractile Logic w. SCRL gates
• Simple combinational logic of any depth N:
– Requires N timing phases
– Non-pipelined
– No sequential reuse of
HW (even worse)
• We need
sequential
logic!
Time 
Sequential Retractile Logic
• Approach #1 (Hall ‘92):
– After every N stages, invoke an irreversible latch
• stores the output of the last stage
– Then, retract all the stages,
– and begin a new cycle
• Problems:
– Reduces dissipation by at most a factor of N
– Also reduces HW efficiency by order N!
• In worst case, compared to a pipelined, sequential circuit
• Approach #2 (Knight & Younis, ‘93):
– The “store output” stage can also be reversible!
– Gives fully-adiabatic, sequential, pipelined circuits!
• N can be as small 1 or 2 & still have arbitrarily high Q
Simple Reversible CMOS Latch
• Uses a standard CMOS transmission gate
• Sequence of operation:
(1) input initially matches latch contents (output)
(2) input changesoutput changes (3) latch closes
(4) input removed
P
in
out
P
Before
input:
in out
a a
Input
arrived:
in out
a a
b b
Input
removed:
in out
a a
a b
Resetting a Reversible Latch
• Can reversibly unlatch data as follows:
(exactly the reverse of the latching process)
– (1) Data value d stored on memory node M.
– (2) Present an exact copy of d on input.
– (3) Open the latch (connecting input to M).
• No dissipation since voltage levels match
– (4) Retract the copy of d from the input.
• Retracts copy stored in latch also.
Input-Bias Clocked-Barrier Logic
• Cycle of operation:
Can amplify/restore input signal
in clocking step.
– Data input applies bias
• Add forces to do logic
– Clock signal raises barrier
– Data input bias removed
Can reset latch
reversibly given
copy of contents.
Retract
input
0
Examples: Adiabatic
QDCA, SCRL latch, Rod
logic latch, PQ logic,
Buckled logic
1
Clock
barrier
up
0
Clock up
Input
“0”
0
1
Retract
input
N
Input
“1”
1
SCRL 6-tick clock cycle
Initial state: All gates off, all nodes neutral.
in
out
SCRL 6-tick clock cycle
Tick #1: Input goes valid, forward T-gate opens.
in
out
SCRL 6-tick clock cycle
Tick #2: Forward gate charges, output goes valid.
(Tick #1 of subsequent gate.)
in
out
SCRL 6-tick clock cycle
Tick #3: Forward T-gate closes, reverse gate charges.
in
out
SCRL 6-tick clock cycle
Tick #4: Reverse T-gate opens, forward gate discharges.
in
out
SCRL 6-tick clock cycle
Tick #5: Reverse gate discharges, input goes neutral.
in
out
SCRL 6-tick clock cycle
Tick #6: Reverse T-gate closes, output goes neutral.
Ready for next input!
in
out
24 ticks/cycle
in this versionincludes 2-level
retractile stages
Some Timing Terminology
For sequential adiabatic circuits:
• 1 Tick: Time for a single ramp transition
– adiabatic speed fraction f times the RC gate delay.
• 1 Phase: Latency for a data value to propagate
forward by 1 pipeline stage.
• 1 Cycle: Minimum period for all timing
information to return back to its initial state.
• Diadic: Two retractile levels per gate
– permits inverting or non-inverting logic.
• Dual rail: Two wires per logic value
– permits universal logic with monodic gates
Monadic:
only 1 level
Some Figures of Demerit
• Some quantities we may wish to minimize:
– Ticks/phase:
• proportional to logic propagation latency
– Ticks/cycle:
• reciprocal to rate of data throughput
– Transistor-ticks/cycle:
• reciprocal to HW cost-efficiency
– Number of required clock/power input signals:
• supplying these may be a significant component of
system cost
– Number of distinct voltage levels required:
• may affect reliability/power tradeoff
Some Interesting Questions
• About pipelined, sequential, fully-adiabatic
CMOS logic:
– Q: Does it require an intermediate voltage level?
• A: No, you can get by with only 2 different levels.
– Q: What is the minimum number of externally
provided timing signals you can get away with?
• A: 4 (12 if split levels are used)
– Q: Can the order-N different timing signals needed
for long retractile cascades be internally generated
within an adiabatic circuit?
• A: Yes, but not statically, unless N2 hardware is used
– where N is the number of stages per full sequential cycle
• We now demonstrate these answers.
Some Timing Examples
See next slide for some detailed timing diagrams.
• N-level retractile cascades:
– 2N ticks/phase × 1 phase/cycle = 2N ticks/cycle
• 3-phase fully-static diadic SCRL
– 8 ticks/phase × 3 phases/cycle = 24 ticks/cycle
• 2-phase fully-static monadic SCRL
– 5 ticks/phase × 2 phases/cycle = 10 ticks/cycle
• 2-phase fully-static diadic SCRL
– 6 ticks/phase × 2 phases/cycle = 12 ticks/cycle
• 6 tick/cycle dynamic SCRL detailed previously:
– 1 tick/phase × 6 phases/cycle = 6 ticks/cycle
Some SCRL timing diagrams
Reversible / Adiabatic Chips
Designed @ MIT, 1996-1999
By the author and other then-students in the MIT Reversible Computing group,
under AI/LCS lab members Tom Knight and Norm Margolus.
2LAL: 2-Level Adiabatic Logic
A Novel Alternative to SCRL
2LAL: 2-level Adiabatic Logic
(Implementable using ordinary CMOS transistors)
P
simplified T-gate symbol:
• Use
• Basic buffer element:
– cross-coupled T-gates
• Only 4 timing signals,
4 ticks per cycle:
:
1
in
0
– i rises during tick i
– i falls during tick (i+2) mod 4
out
P
0
1
2
3
Tick #
0 1 2 3
P
2LAL Cycle of Operation
Tick #0
Tick #1
in1
in
Tick #2
11
in0
Tick #3
10
out1
01
in=0
01
00
11
out0
out=0
00
2LAL Shift Register Structure
• 1-tick delay per logic stage:
1
2
3
0
in
out
0
1
2
3
• Logic pulse timing & propagation:
0 1 2 3 ...
in
in
0 1 2 3 ...
More complex logic functions
• Non-inverting Boolean functions:


A
B
A
A
B
AB
AB
• For inverting functions, must use quad-rail
logic encoding:
– To invert, just
swap the rails!
• Zero-transistor
“inverters.”
A=0
A0
A0
A1
A1
A=1
Hardware Efficiency issues
• Hardware efficiency: How many logic
operations per unit hardware per unit time?
• Hardware spacetime complexity: How much
hardware for how much time per logic op?
• We’re interested in minimizing:
(# of transistors) × (# of ticks) / (gate cycle)
• SCRL inverter, w. return path:
– (8 transistors)  (6 ticks) = 48 transistor-ticks
• Quad-rail 2LAL buffer stage:
– (16 transistors)  (4 ticks) = 64 transistor-ticks
More SCRL vs. 2LAL
• SCRL reversible NAND, w. all inverters:
– (23 transistors)  (6 ticks) = 138 T-ticks
• Quad-rail 2LAL AND:
– (48 transistors)  (4 ticks) = 192 T-ticks
• Result of comparison: Although 2LAL
minimizes # of rails, and # ticks/cycle, it does
not minimize overall spacetime complexity.
• The question of whether 6-tick SCRL
minimizes per-op spacetime complexity among
pipelined adiabatic CMOS logics is still open.
Minimizing Power-Clock Signals
• How many external clock signals required?
– N-level-deep retractile cascade logic:
• 2N waveforms × 1 phase = 2N signals
– 6 tick/cycle, 6-phase dynamic SCRL:
• 6 waveforms × 6 phases = 36 signals
– 24 tick/cycle, 3-phase static SCRL:
• 12 waveforms × 3 phases = 36 signals
– 4 tick/cycle, 2LAL:
• 1 waveform × 4 phases = 4 signals!
• It turns out that 12 signals are sufficient to
implement any combination of 2-level or 3level logics (including retractile) on-chip!
How to Do It
• Circular 2LAL shifter; pulse-gated clocks
P1
0
P2
P3
P0
in
out
P0
P1
2
2
P2
2
P3
P0
P1
P2
P3
0
1
2
3
Tick #
0 1 2 3
•
12-rail
system:
pros
&
cons
Pros:
– Completely solves adiabatic timing design problem
– Enables mixtures of retractile, SCRL, and other logic
styles on 1 chip
– Enables simple fully-adiabatic SRAM & DRAM
• Cons:
– Timing signals are dynamic
– Known fully-static alternatives use order N2 gates and
signals for N-tick-long cycles
• N can be large in a chip that includes deep retractile networks
– Energy waste in driving the source/drain junction
capacitances of all the T-gates even when timing pulse
isn’t present
• SOI reduces these parasitics
GCAL: General CMOS Adiabatic Logic
• A general CMOS adiabatic design methodology
– Currently under development at UF
• Combines best features of SCRL, 2LAL, and retractile logics:
– Permits designs attaining asymptotically optimal cost-efficiency
• For any combination of time, space, spacetime, energy costs
– Arbitrarily high degree of reversibility
– Permits using minimal 2-level and 3-level adiabatic gates
– Requires only 4 externally supplied clock/power signals for 2-level logic
• And only 12 total for mixed 2-level + 3-level logic
– Supports mixtures of fully-pipelined and retractile logic.
– Supports quiescent dynamic/static latches & RAM cells
• Tools currently under development:
– A new HDL specialized for describing adiabatic designs
– Digital circuit simulator with adiabaticity checker
– Adiabatic logic synthesis tool, with automatic legacy design converter
GCAL DRAM/SRAM cells
• GCAL DRAM cell
– 4 transistors
– 4 word lines/row
– 2 bit lines/col (or 1)
• GCAL SRAM cell
– 8 transistors
– 6 word lines/row
– 2 bit lines/col (or 1)
DRAM Cell Write Cycle
1. All nodes initially ½.
•
T-gate initially closed (off).
2. Transmission gate opens.
•
Internal node is connected to
bit-line (at matching voltage).
3. Bit line transitions to 0 or 1.
•
Pulls internal node to matching level.
4. Transmission gate closes.
•
Internal node latched to new level.
5. Bit line transitions back to ½.
•
Prepares for a new cycle.
Use the reverse sequence of operations to unwrite.
DRAM Cell Read Cycle
1. All external nodes initially ½.
•
•
T-gate initially off.
Internal node contains data.
2. Inverter rails split.
•
Bit line set to (inverted) data.
3. T-gate at end of column latches bit-line data.
4. Inverter rails merge.
•
Bit line restored to ½ level.
Can use the reverse sequence of operations to
unread copy of data available at end of column.
Fully-Adiabatic DRAM cell
• 6T, 6 lines/row, 1 line/column (in/out together)
• Read cycle:
–
–
–
–
–
Initially:  lines neutral, out neutral, R off
R for desired row turns on
 for desired row splits, driving out column
R turns off, out is read
 merges, out is reset
• Write cycle:
–
–
–
–
First, do read cycle.
in is set to out
W turns on
in changed to new value...
Fully-Adiabatic SRAM
• 10-T, 10 lines/row, 1 line/column
• Operation similar to DRAM, except:
• Read-out:
T2 off; N2 retracts; T3 on; N2 asserts; T2 on, T3 off
• Write:
T2 off; N2 retracts; N1 retracts, copy of M presented
on input; T1 on; in
changes; T1 off, N1
N1
N2
asserts; N2 asserts; T2 on
T1
in
M T2
T3
out
Limits of Adiabatics
Structured Systems
• A structured system is defined as a system
about whose state we have some knowledge.
– Some of its physical information is known.
–  Its entropy is not at a maximum (by defn.).
–  It is not at equilibrium (by defn.).
• For states with a given energy E,
– we say the system’s energy is distributed among
those states, in proportion to their probability.
The system’s
energy is
“in here”
States w.
prob. > 0
All states
of the abstract
system having
energy E
Desired Trajectories
Time
• Any structured system
we build to serve
some purpose
ConfigDesired trajectories
has some
uration
desired
trajectory, or set of
trajectories, through its configuration space that
we would ideally like it to follow at all times.
– Think of any given state as having a specific
“desirability” at any given time.
Energy Losses
• Energy dissipation can be viewed as a departure
of part of the system’s energy away from the
system’s desired trajectory.
Time
6
• E.g., 1 of 10 electrons
leaks out of a
ConfigDRAM cell =
system’s energy has uration
departed from desired
Energy that has
trajectory (all 106 stay)
departed from desired
by a small amount
trajectories
Limits of Adiabatics I:
Friction
Generalized Friction
• Any force leading to departure from desired
trajectory that obeys the adiabatic principle:
– I.e., force strength (& total energy loss) is
proportional to velocity along trajectory at low
velocities
• Examples:
–
–
–
–
–
Ordinary sliding friction
Fluid viscosity
Electrical resistance
Forces causing electromagnetic radiative losses
Forces causing losses in inelastic collisions
Ways to Quantify Friction
Normal friction measures referring to length,
mass, etc. may not apply to all processes.
For a given mechanism executing a specified
process (i.e., following a specified desired
trajectory or -ies) over a time t:
• Energy coefficient: cE = Elost·t = Elost/q
– Energy dissipated from traj. per unit of “quickness”
• Note quickness q = 1/t has units like Hz
• Entropy coefficient: cS = Smade·t = Smade/q
– New entropy generated per unit of quickness
• Note that cE = cS·T at temperature T.
What matters!
Energy Coefficient in Electronics
• For charging capacitive load C by voltage V
through effective resistance R:
cE = Elostt = (CV2RC/t)t = C2V2R
• If the resistances are voltage-controlled
switches with gain factor k controlled by the
same voltage V, then effective R  1/kV
cE = C2V/k
• In constant-field-scaled CMOS, k  1/dox  ,
C  , and V  , so
cE  3/ = 4; Elost = cE/t  4/ = 3
(like CV2 energy)
Degree of Reversibility of CMOS
• What is the Q of a min-size CMOS transistor?
Q = Efree/Ediss
= Efree/(cE/t)
= ½CV2/(C2V2R/t)
= ½(t/RC)
=½s
(s = slowdown factor)
• Note: Using transistors wider than minimumsize (larger C, smaller R) wouldn’t change RC
or Q, and would increase overall dissipation by
increasing cE.
Lower Bounds on Friction?
• No general (technology-independent) lower
bounds on friction coefficients for interesting
types of processes (e.g. computation) are
currently known.
• Clever engineering may eventually reduce the
friction in desired processes to values as small
as is desired.
• Some ways:
– Reduce number of moving parts (or particles)
– Isolate “moving parts” of system from unwanted
interactions w. environment
Entropy coefficients of some
reversible logic gate operations
From Frank, “Ultimate theoretical models of
nanocomputers” (Nanotechnology, 1998):
• SCRL, circa 1997:
~1 b/Hz
• Optimistic reversible CMOS: ~10 b/kHz
• Merkle’s “quantum FET:”
~1.2 b/GHz
• Nanomechanical rod logic:
~.07 b/GHz
• Superconducting PQ gate:
~25 b/THz
• Helical logic:
~.01 b/THz
How low can you go? We don’t really know!
Is Adiabatic Limit Achievable?
• Even if there is some lower bound on cS, it
seems we can have Smade 0 as t  .
• What factors may prevent this?
– Any lower bound >0 on the number of irreversible
bit-operations performed. (Each has Smade  1.)
• Fortunately, the lower bound can always be made 0.
– Any lower bound on the rate of energy leakage,
even when system is completely stopped.
– Any upper bound on the Q of the clocking &
synchronization system.
• The system dissipates Efree/Q on every cycle.
• No technology-independent upper bounds on Q known
Some Synonyms
• Leakage of energy or (equivalently) probability
mass out of a desired configuration or
trajectory.
• Occurrence of errors in the desired analog or
digital state of a system. (Motion away from
desired states.)
• Decay of structure of a structured system. (The
state departs from desired state.)
Leakage = Error = Decay
Perfect Mechanisms?
• If a structured system is perfectly closed,
– I.e. non-interacting with other systems, at all!
– And if its internal interactions are perfectly known,
• Then, and only then, is its (von Neumann)
entropy going to be a constant.
• Otherwise, its entropy will continuously
increase as we lose track of its state.
– In this case, no mechanism is perfect, in that some
of its energy (i.e. some probability mass) is always
leaking away from the desired trajector(y/ies) at
some nonzero base rate, even when the rate of
system’s progress along its trajectory is zero.
Leakage Limits
• Claim: No real, structured system can have
absolutely zero rate of energy leakage out of its
desired trajectories, even if not moving.
• However: No general,
Time
technology-independent
lower bound on
Configleakage rates
uration
is known (other
than zero.)
Energy that has
• Engineering advances might
leaked from desired
make leakage as small as desired.
configuration
Quantifying Leakage
• For a given structured system:
• Leakage power:
Pleak = dEleak / dt
• Spontaneous entropy generation rate:
•
Sleak = dSleak / dt
•
• Again, note Pleak = Sleak · T at temperature T.
Ways to Decrease Leakage
• Have high potential-energy barriers
– slows down thermally excited leakage exponentially
• Have thick potential-energy barriers
– slows down quantum tunneling exponentially
• Example: Older generations of CMOS!
• Mechanical (clockwork) systems have high
potential energy barriers, for their size:
– Decay may require atoms to diffuse out of tightlybonded spots.
– Mechanisms that avoid making/breaking contacts
(e.g. buckled logic) avoid losses due to stiction.
Limits of Adiabatics II:
Leakage
Minimum Losses w. Leakage
topt 
Pleak
Sleak

cE
cS
Etot = Eadia + Eleak
Eleak = Pleak·tr
 2 Pleak cE
 2T Sleak cS
Eadia = cE / tr
Minimum Loss Derivation
d Etot d( Eadia  Eleak ) d(cE / t r ) d( Pleak  t r )



d tr
d tr
d tr
d tr
 cE / t  Pleak
2
r
 0 (when Etot is minimum)
cE  Pleak  t
2
r
t  cE / Pleak
2
r
t r  cE / Pleak
Emin  cE / t r  Pleak  t r
 cE Pleak / cE  Pleak  cE / Pleak
 Pleak cE  Pleak cE
 2 Pleak cE
Leakage in CMOS
• In a given technology with constant-field
scaling, leakage becomes worse at smaller
scales because:
– Energy barriers between states are lower
• Higher rates of thermally-induced leakage, at given T
• Higher rates of quantum tunneling (temp.-independent)
– Energy barriers between states are narrower
• Higher rates of quantum tunneling
– These effects get worse exponentially with 1/length
(doubly-exponentially with time)
• Need alt. technologies w. high energy barriers!
Future Techs. w. Low Leakage?
• How can we achieve low entropy coefficients in
minimum-scale (atomic-scale) devices?
– Need high energy barriers:
• Can achieve using atomic (not just electronic)
interactions:
– E.g. mechanical logics (rod logic, buckling logic)
– If strong bonds (e.g. C-C) are used in structure, rates of
unwanted bond breakage can made be very low.
– Rate for an atom passing through another one (e.g. knobs in rod
logic) is extremely low due to
» height of barrier: strength of Coulombic & fermionic
repulsion between electrons, &
» width of barrier: large number of particles involved
• Other possibilities?
Minimum Dissipation with Variable V
Emin  2 Plk cE
2

1
2
Plk  I lkV

kV 3e V / T 2C 2V / k

4 V / T
2 C V e
2
2 V / 2T
 2CV e
• Notice that this function of V
approaches 0 exponentially as V→∞,
– This is even true if we scale CV!
• Thus, there is no lower limit to the energy
dissipation of adiabatic field-effect circuits!
– The key is to make devices larger, not smaller!
• Device sizes need grow only logarithmically.
3 V / T
 kV e
1
2
I lk  I max e
V / T
 12 kV 2 e V / T
cE  C 2V 2 R
 2C 2V / k
R  V / I max
 2 / kV
I max  12 kV 2
Maximum Q factor in terms of V
• The maximum logic Q factor is the maximum
ratio between the energy involved in carrying
out a logical transition, and the energy
dissipated by the circuit during the transition.
– We just calculated the minimum energy dissipated.
• Thus, Qmax = Einvolved/Ediss,min
= ½CV2/(2CV2exp(−V/2φT))
= (1/4)exp(V/2φT)
– Note that the maximum logic Q-factor goes up
exponentially with the logic-swing voltage V.
Minimum energy & Roff/Ron ratio
• (A simpler version of earlier derivation.) Note that:
cE = C2V2Ron
and if the dominant leakage mode is source/drain, then:
Pleak = V2/Roff
• So putting the two together:
cEPleak = C2V4(Ron/Roff)
Emin = 2(cEPleak)1/2 = 2CV2(Ron/Roff)1/2
• So we can rederive the maximum logic Q as follows:
Qmax = ½CV2 / (2CV2(Ron/Roff)1/2)
= ¼(Roff/Ron)1/2
= ¼(Imax/Ileak)1/2
= ¼(ron/off)1/2
Limits of Adiabatics III:
Clock/Power Supplies
See transparencies.
Timing in Adiabatic Systems
When multiple adiabatic devices interact, the relative timing must be
precise, in order to ensure that adiabatic rules are met.
• There are two basic approaches to timing:
– Global (a.k.a. clocked, a.k.a. synchronous) timing:
• This is the approach in nearly all conventional irreversible CPUs.
• Also is the basis for all practical adiabatic and quantum computing
mechanisms that have been proposed to date.
– Local (a.k.a. self-timed, a.k.a. asynchronous) timing:
• Implemented in a few commercial irreversible chips.
• Feynman ‘86 showed that a self-timed serial reversible computation
was implementable in QM, in principle.
• Margolus ‘90 extended this to a 2-D model with 1-D of parallelism. Can it still work in a full 3-D, fully-quantum-mechanical?
– Indications from considering classical-mechanical 3D meshes of coupled
oscillators is “yes.”
Global Timing
• Examples of adiabatic systems designed on the
basis of global, synchronous timing:
–
–
–
–
–
Adiabatic CMOS with external power/clock rails
Superconducting parametric quantron (Likharev)
Adiabatic Quantum-Dot Cellular Automaton (Lent)
Adiabatic mechanical logics (Merkle, Drexler)
All proposed quantum computers
• A potential problem: Synchronous timing may
not scale well to large machine sizes.
– Work by Janzig & others raises issues of possible
limits on timing systems due to quantum uncertainty.
• Issue is still unresolved.
Clock/Power Supply Desiderata
• Here are some requirements for a good adiabatic timing
signal / power supply for driving voltage-coded logic:
– Generate a trapezoidal voltage waveform with very flat
high/low regions.
• Needed to avoid current through transistors when turning them off
• The flatness of the signal limits the maximum Q factor of the logic.
• Waveform during the high↔low transitions should ideally be linear,
– But this does not affect the maximum logic Q, only the energy coefficient.
» So long as ramp slope scales down everywhere with transition time.
– Operate resonantly with the logic circuit, with a high Q factor.
• The power supply’s Q will limit the overall system Q
– If possible, scale Q  t (cycle time)
• Required to be considered an adiabatic mechanism.
• May conflict w. inductor scaling laws!
• At the least, Q should still be high at leakage-limited speed
– Have a reasonable cost, compared to the logic it powers.
– Be scalable to large meshes of mutually synchronized devices.
Supply concepts in my research
• Superpose several sinusoidal signals from
phase-synchronized oscillators at harmonics of
fundamental frequency
– Weight these frequency components as per Fourier
transform of desired waveform
• Create relatively high-L integrated inductors via
vertical, helical metal coils
– Only thin oxide layers between turns
• Use mechanically oscillating, capacitive MEMS
structures in vacuo as high-Q (~10k) oscillator
– Use geometry to get desired wave shape directly
Early supply concepts
• Inductors & switches.
– See transparency.
• Stepwise charging.
– See transparency.
Newer Supply Concepts
• Transmission-line-based adiabatic resonators.
– See transparency.
• MEMS-based resonant power supply
– See next couple of slides
• Ideal adiabatic supplies - Can they exist?
– Idealized mechanical model: See transparency.
– But, there may be quantum limits to
reusability/scalability of global timing signals.
• This is a very fundamental issue!
MEMS/NEMS Resonators
A Novel Clock/Power Supply
Technology for Adiabatic Circuits
MEMS/NEMS Resonators
• State of the art of technology demonstrated in lab:
– Frequencies up to the 100s of MHz, even GHz
– Q’s >10,000 in vacuum, several thousand even in air!
• Rapidly becoming
technology of choice
for commercial RF
filters, etc., in
communications
SoC (Systems-ona-Chip) e.g. for
cellphones.
U. Mich., poly, f=156 MHz, Q=9,400
34 µm
A MEMS Supply Concept
• Energy stored
mechanically.
• Variable coupling
strength → custom
wave shape.
• Can reduce losses
through balancing,
filtering.
Limits of Adiabatics:
Summary
Some final remarks.
Summary of Limiting Factors
When considering adiabaticizing a system, ask:
• What fraction of system power is in logic? fL
– Vs. Displays, transmitters, propulsion.
• What fraction of logic is done adiabatically? fa
– Can be all, but w. cost-efficiency overheads.
• How large is the Ion/Ioff ratio of switches?
– Affects leakage & minimum adiabatic energy.
• What is the Qsup of the resonant power supply?
• What is the relative cost of energy / logic? r$
– E.g. decreasing power cost by r$ by increasing
HW cost by  r$ will not help.
“Power premium”
Minimizing cost/performance
•
•
•
•
•
$P = Cost of power in original system
$H = Cost of logic HW in original system
$P = r$$H; $H = $P/r$
For cost-efficiency inverse to energy savings:
$tot,min = $Pr$−1/2 + $Hr$1/2
= 2 $Pr$−1/2
• $tot,orig = $P + $H = (1+r$)$H = ((1+r$)/r$) $P
• $tot,orig/$tot,min = ½(1+r$)r$−1/2
 ½r$1/2 for large r$
Summary of adiabatic limits
• Cost-effective adiabatic energy savings factor:
Sa = Econv / Eadia in cost-effective adiabatic system
• Some rough upper bounds on Sa:
Sa  ~ 1/(1fL)
Sa  ~ 1/(1fa)
Sa  ~ ¼(ron/off)1/2
Sa  Qsup
Sa  ~ r$1/2 (worse for non-ideal apps)
• But, this ignores benefits from adiabatics of
denser packing & smaller communications
delays in parallel algorithms. (More later.)