
Reversible Computing Theory I:
Reversible Logic Models
Reversible Logic Models
• It is useful to have a logic-level (Boolean) model
of adiabatic circuits.
• Can model all logic using maximally pipelined
logic elements, which consume their inputs.
– A.k.a., “input consuming” gates.
– Warning: In such models, memory requires
recirculation! Thus, this is not necessarily more
energy-efficient in practice, for all problems, than
retractile (non-input-consuming) approaches!
• There is a need for more flexible logic models.
• If inputs are consumed, then the input-output
logic function must be invertible.
Input-consuming inverter:

  [figure: inverter symbol; the arrow on the input indicates that the
  input data is consumed by the element. An alternate (symmetric)
  symbol is also shown.]

E.g., SCRL implementation:

  Before:  in out        After:  in out
            0  -                  -  1
            1  -                  -  0

Invertible!
An Irreversible Consuming Gate
• Input-consuming NAND gate:

  Before:  A B out       After:  A B out
           0 0  -                - -  1
           0 1  -                - -  0
           1 0  -
           1 1  -

  4 possible inputs, 2 possible outputs. At least 2 of the 4 possible
  input cases must lead to dissipation!
• Because it’s irreversible, it has no
implementation in SCRL (or any fully adiabatic
style) as a stand-alone, pipelined logic element!
NAND w. 1 input copied?
• Still not invertible:

  [figure: A feeds a delay buffer producing A′; A and B also feed a
  NAND gate producing out]

  Before:  A B A′ out    After:  A B A′ out
           0 0  -  -             - -  0  1
           0 1  -  -             - -  0  1
           1 0  -  -             - -  1  1
           1 1  -  -             - -  1  0

• At least 1 of the 2 transitions to the A′=0, out=1
final state must involve energy dissipation of
order k_B·T. How much, exactly? See exercise.
NAND w. 2 inputs copied?
• Finally, invertible!

  [figure: A and B feed delay buffers producing A′ and B′; A and B
  also feed a NAND gate producing out]

  Before:  A B A′ B′ out    After:  A B A′ B′ out
           0 0  -  -  -             - -  0  0  1
           0 1  -  -  -             - -  0  1  1
           1 0  -  -  -             - -  1  0  1
           1 1  -  -  -             - -  1  1  0

• Any function can be made invertible by simply
preserving copies of all inputs in extra outputs,
as in the sketch below.
  – Note: Not all output combinations here are legal!
• Note there are more outputs than inputs.
  – We call this an expanding operation.
  – But, copied inputs can be shared by many gates.
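A minimal Python sketch (mine, not from the original slides) of this
expanding embedding for NAND: preserving both inputs makes the bit map
invertible, though not every output triple is reachable.

    def nand_expanding(a: int, b: int) -> tuple[int, int, int]:
        """Input-consuming NAND with both inputs preserved:
        (A, B) -> (A', B', out)."""
        return a, b, 1 - (a & b)

    # Invertibility: the 4 possible inputs map to 4 distinct outputs.
    images = {nand_expanding(a, b): (a, b) for a in (0, 1) for b in (0, 1)}
    assert len(images) == 4

    # Expansion: 2 bits in, 3 bits out, so some output combinations
    # are illegal, e.g. (1, 1, 1) can never occur.
    assert (1, 1, 1) not in images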
SCRL Pipelined NAND

  [figure: SCRL transistor-level schematic computing out = ¬(A·B),
  with a 5-transistor (“5T”) stage; inverters restore A and B]

  Inverters are only needed to restore A, B; they can be shared with
  other gates that take A, B as inputs.
• Including inverters: 23 transistors
• Not including inverters: 7 transistors
Non-Expanding Gates
• Controlled-NOT (CNOT) or input-consuming XOR:

  A′ = A
  C  = A ⊕ B

  A B | A′ C
  0 0 |  0 0
  0 1 |  0 1
  1 0 |  1 1
  1 1 |  1 0

  Can implement w. a dyadic gate in SCRL.
• Not universal for classical reversible computing.
(Even together w. all other 1- & 2-output rev. gates.)
• However, if we add 1-input, 1-output quantum
gates, the resulting gate set is universal!
• More on quantum computing in a couple of weeks.
Toffoli Gate (CCNOT)

  A′ = A
  B′ = B
  C′ = C ⊕ AB

  [figure: three-wire gate symbol, inputs A, B, C; outputs A′, B′, C′]

  A B C | A′ B′ C′
  0 0 0 |  0  0  0
  0 0 1 |  0  0  1
  0 1 0 |  0  1  0
  0 1 1 |  0  1  1
  1 0 0 |  1  0  0
  1 0 1 |  1  0  1
  1 1 0 |  1  1  1
  1 1 1 |  1  1  0

• Subsumes AND, NAND, XOR, NOT, FAN-OUT, …
• Note that this gate is its own inverse.
• Our first universal reversible gate! (See the sketch below.)
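A quick executable check (mine, not from the slides) of these claims:
the gate is self-inverse, and presetting the target bit recovers NAND
and FAN-OUT.

    def toffoli(a: int, b: int, c: int) -> tuple[int, int, int]:
        """CCNOT: (A, B, C) -> (A, B, C xor (A and B))."""
        return a, b, c ^ (a & b)

    triples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

    # Self-inverse: applying the gate twice is the identity.
    assert all(toffoli(*toffoli(*t)) == t for t in triples)

    # NAND: preset C = 1 and read the third output.
    assert all(toffoli(a, b, 1)[2] == 1 - (a & b)
               for a in (0, 1) for b in (0, 1))

    # FAN-OUT (copy): B = 1, C = 0 gives C' = A.
    assert all(toffoli(a, 1, 0)[2] == a for a in (0, 1))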
Fredkin Gate
• The first universal reversible logic gate to be
discovered. (Ed Fredkin, mid-’70s)

  [figure: three-wire gate symbol, inputs A, B, C; outputs A′, B′, C′]

  A B C | A′ B′ C′
  0 0 0 |  0  0  0
  0 0 1 |  0  0  1
  0 1 0 |  0  1  0
  0 1 1 |  0  1  1
  1 0 0 |  1  0  0
  1 0 1 |  1  1  0
  1 1 0 |  1  0  1
  1 1 1 |  1  1  1

• B and C are swapped if A=1, else passed unchanged.
• Is also conservative: it conserves 1s and 0s.
  – Thus in theory it requires no separate power input,
    even if 1 and 0 have different energy levels!
    (See the sketch below.)
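Another small Python check (mine, not from the slides): the controlled
swap is its own inverse, conserves the number of 1s, and yields AND
when the C input is preset to 0.

    def fredkin(a: int, b: int, c: int) -> tuple[int, int, int]:
        """Controlled swap: B and C are exchanged iff A = 1."""
        return (a, c, b) if a == 1 else (a, b, c)

    triples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

    # Conservative: the multiset of bit values is unchanged...
    assert all(sorted(fredkin(*t)) == sorted(t) for t in triples)
    # ...and the gate is its own inverse.
    assert all(fredkin(*fredkin(*t)) == t for t in triples)

    # AND from Fredkin: with C preset to 0, the C' output is A AND B.
    assert all(fredkin(a, b, 0)[2] == (a & b)
               for a in (0, 1) for b in (0, 1))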
Reversible Computing Theory II:
Emulating Irreversible Machines
Motivation for this study
• We want to know how to carry out any
arbitrary computation in a way that is
reversible to an arbitrarily high degree.
– Up to limits set by leakage, power supply, etc.
• We want to do this as efficiently as possible:
– Using as few “device ticks” as possible (spacetime)
• Minimizes HW cost, & leakage losses
– Using as few adiabatic transitions as possible (ops)
• Minimizes frictional losses
• But, a desired computation may be originally
specified in terms of irreversible primitives.
General-Case vs. Special-Case
• We’d like to know two kinds of things:
– For arbitrary general-purpose computations,
• How to automatically emulate them in a fairly efficient
reversible way,
– w/o needing new intelligent/creative design work in each case?
• Topic of today’s lecture
– For various specific computations of interest,
• What are the most efficient reversible algorithms?
– Or at least, the most efficient that we can find?
• Note: These may not necessarily look anything like the
most efficient irreversible algorithms!
– More on this point later
The Landauer embedding
• The obvious embedding of irreversible ops into
“expanding” reversible ones leads to a linear
increase in space through time. (Landauer ’61)
  – Or, an increase in the width of an input-consuming
    circuit, as the sketch below illustrates.

  [figure: input enters a circuit of “expanding” operations (e.g.,
  AND); as circuit depth (time) grows, the width grows too, with
  “garbage” bits accumulating alongside the desired output]
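To make the linear growth concrete, here is a toy Python illustration
(mine): chaining expanding AND gates leaves two garbage bits behind
per gate, so the live state widens linearly with depth.

    def and_expanding(a: int, b: int) -> tuple[int, int, int]:
        """Expanding AND: (A, B) -> (A', B', A AND B), inputs preserved."""
        return a, b, a & b

    def iterated_and(x: int, constants: list[int]) -> tuple[int, list[int]]:
        """AND a running bit with each constant, keeping all garbage."""
        garbage = []
        for c in constants:
            a_copy, c_copy, x = and_expanding(x, c)
            garbage += [a_copy, c_copy]    # 2 garbage bits per gate
        return x, garbage

    result, garbage = iterated_and(1, [1, 1, 0, 1])
    assert result == 0
    assert len(garbage) == 2 * 4   # width grows linearly with depth T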
Lecerf Reversal
• Lecerf (’63) was interested in the group-theory
question of whether an iterated permutation of
items would eventually return to the initial item.
  – Proved undecidable by reducing Turing’s halting
    problem to this question, w. a reversible TM.
    • The reversible TM reverses direction instead of halting.
    • It returns to its initial state iff the irreversible TM would halt.
• Only problem with this: No useful output data!

  [figure: Input → f → (Desired output + Garbage) → f⁻¹ → Copy of
  Input; the reversal uncomputes the output before it can be used]
The Bennett Trick
• Bennett (’73) pointed out that you could simply
fan-out (reversibly copy) the desired output
before reversing, as sketched below.
  – Note: O(T) storage is still temporarily needed!

  [figure: Input → f → (Desired output + Garbage) → copy the output →
  f⁻¹ → (Copy of Input + Desired output)]
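A minimal Python sketch of the compute/copy/uncompute pattern
(hypothetical helper names, not from the slides): each forward step is
recorded so it can be undone after the output is copied out.

    def bennett(steps, x):
        """Run invertible `steps` forward, copy the result, then
        uncompute. Each step carries an `.inverse` attribute."""
        history = []                  # O(T) temporary storage
        for step in steps:
            x = step(x)
            history.append(step)
        output_copy = x               # reversible fan-out of the output
        for step in reversed(history):
            x = step.inverse(x)       # uncompute: clears all garbage
        return x, output_copy         # (restored input, copied output)

    def make_step(k):                 # toy invertible step: add k
        s = lambda v: v + k
        s.inverse = lambda v: v - k
        return s

    inp, out = bennett([make_step(3), make_step(5)], 10)
    assert (inp, out) == (10, 18)     # input restored, output retained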
Improving Spacetime Efficiency
• Bennett ’73 transforms a computation taking
spacetime S·T into one taking Θ(S·T²) spacetime
in the worst case.
  – Can we do better?
• Bennett ’89: Described a technique that takes spacetime
      Θ(S · T^(log₂ 3) · log T) = Θ(S · T^1.59 · log T)
  – Actually, one can generalize slightly and arrange for the
    exponent on T to be 1+ε, where ε→0 (very slowly).
• Lange, McKenzie, Tapp ’97: Space Θ(S) is
even possible, if you use time Θ(exp(Θ(S))).
  – Not any more spacetime-efficient than Bennett.
Reversible “Climbing Game”
• Suppose a guy armed with a hammer,
N spikes, & a rope is trying to climb a
cliff, while obeying the following rules.
– Question: How high can he climb?
• Rules:
– Standing on the ground or on a spike, he can
insert & remove a spike 1 meter higher up.
– He can raise & lower himself between
spikes & the ground using his rope.
• He can’t insert or remove a spike while
dangling from a higher spike!
– Maybe not enough leverage/stability?
Analogy w. Emulation Problem
• Height on cliff represents:
– How many steps of progress have
we made through the irreversible
computation?
• Number of spikes represents:
– Available memory of reversible machine.
• Spike in cliff at height H represents:
– Using a unit of memory to record the
state of the irreversible machine after H steps.
• Adding/removing a spike at height H+1
when there is a spike at height H represents:
  – Computing/uncomputing the
    state at H+1 steps given the state at H.
Let’s Climb!
0. Standing on ground.
1. Insert spike @ 1.
2. Insert spike @ 2.
3. Remove spike @ 1.
4. Insert spike @ 3.
5. Insert spike @ 4.
6. Remove spike @ 3.
7. Insert spike @ 1.
8. Remove spike @ 2.
9. Remove spike @ 1.
10. Can use remaining
3 spikes to climb up
another 4 if desired!
How high can we climb?
• Using only N spikes, and the strategy illustrated
(reconstructed in code below), we can climb to height 2^N − 1 (wow!)
• Li & Vitányi: (Theorem) This is the optimal
strategy for this game.
• Open question:
  – Are there more efficient general reversibilization
    techniques that are not based on this game model?
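A Python reconstruction (mine) of the recursive strategy:
plant(n, base, moves) leaves one spike at base + 2^(n−1) while freeing
the other n − 1, and recursing from the planted spike reaches 2^n − 1
in total. For n = 3 it reproduces exactly the nine moves of the
“Let’s Climb!” walkthrough above.

    def plant(n, base, moves):
        """Leave one spike at base + 2**(n-1); free all other spikes."""
        if n == 1:
            moves.append(("insert", base + 1))
            return base + 1
        mid = plant(n - 1, base, moves)   # spike halfway up
        top = plant(n - 1, mid, moves)    # spike at top, standing on mid
        unplant(n - 1, base, moves)       # time-reverse the first climb
        return top

    def unplant(n, base, moves):
        """Exact time-reverse of plant(n, base, ...)."""
        if n == 1:
            moves.append(("remove", base + 1))
            return
        plant(n - 1, base, moves)
        unplant(n - 1, base + 2 ** (n - 2), moves)
        unplant(n - 1, base, moves)

    moves = []
    assert plant(3, 0, moves) == 4        # spike planted at height 2**2
    assert moves == [("insert", 1), ("insert", 2), ("remove", 1),
                     ("insert", 3), ("insert", 4), ("remove", 3),
                     ("insert", 1), ("remove", 2), ("remove", 1)]

    def max_height(n):
        """Total reachable height: 2**(n-1) + max_height(n-1) = 2**n - 1."""
        return 0 if n == 0 else 2 ** (n - 1) + max_height(n - 1)

    assert max_height(3) == 7             # = 2**3 - 1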
“Pebble Game” Representation

  [figure: triangle-representation pebble-game diagrams of the Bennett
  algorithm, for (k=2, n=3) and (k=3, n=2)]
Analysis of Bennett Algorithm
• n = # of recursive levels of the algorithm (n+1 spikes)
• k = # of lower-level iterations to go forward 1
higher-level step
• T_r = # of reversible lowest-level steps executed
      = 2(2k−1)^n
• T_i = # of irreversible steps emulated
      = k^n
• So, n = log_k T_i, and so (verified numerically below)
      T_r = 2(2k−1)^(log T_i / log k)
          = 2·e^(log(2k−1)·log(T_i)/log k)
          = 2·T_i^(log(2k−1)/log k)
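A quick numerical check (mine) of the identity above: T_r = 2(2k−1)^n
equals 2·T_i^(log(2k−1)/log k) when T_i = k^n.

    from math import isclose, log

    for k in (2, 3, 4):
        for n in (1, 2, 5, 10):
            Ti = k ** n                    # irreversible steps emulated
            Tr = 2 * (2 * k - 1) ** n      # reversible steps executed
            assert isclose(Tr, 2 * Ti ** (log(2 * k - 1) / log(k)),
                           rel_tol=1e-9)

    # For k = 2 the exponent is log 3 / log 2 = 1.585, matching the
    # T^1.59 spacetime growth quoted for Bennett '89 earlier.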
Linear-Space Emulation
(Lange, McKenzie, Tapp ‘97)
  [figure: the LMT traversal tree of machine configurations]

Unfortunately, the tree may have 2^S nodes!
Can we do better?
• Bennett ‘73 takes order-T time, LMT ‘97 takes
order-S space.
– Can some technique achieve both, simultaneously?
• Theorem: (Frank & Ammer ’97) The problem
of iterating a black-box function cannot generally be done
in both time O(T) and space O(S) on a reversible machine.
– Proof really does cover all possible algorithms!
– The paper also proves loose lower bounds on the
extra space required by a linear-time simulation.
• Results might also be extended to the problem
of iterating a cryptographic one-way function.
– It’s not yet clear if this can be made to work.
One-Way Functions
• …are invertible functions f such that f is easy to
compute (e.g., takes polynomial time) but f⁻¹ is
hard to compute (e.g., takes exponential time).
• A simple example (sketched below):
  – Consider: f(p,q) = p·q with p, q prime.
    • Multiplication of integers is easy.
    • Factoring is hard (except using quantum computers).
  – The “one-way-ness” of this function is essential to
    the security of the RSA public-key cryptosystem.
• No function has yet been proven to be one-way.
  – However, certain kinds of one-way functions are
    known to exist if P ≠ NP.
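A toy Python illustration (mine; real RSA moduli are hundreds of
digits): the forward direction is a single multiplication, while the
naive inverse does ~√n trial divisions, exponential in the bit-length
of n.

    def f(p: int, q: int) -> int:
        return p * q                   # easy direction: one multiplication

    def invert_f(n: int) -> tuple[int, int]:
        """Recover (p, q) by brute force: ~sqrt(n) trial divisions."""
        d = 3
        while d * d <= n:
            if n % d == 0:
                return d, n // d
            d += 2                     # n = p*q for odd primes p, q
        raise ValueError("no factor found")

    n = f(1009, 9973)                  # two small primes
    assert invert_f(n) == (1009, 9973) # already far slower than f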
Elements of Frank-Ammer Proof
• Consider a chain of bit-strings (size S each) that
is incompressible by a certain compressor.
  – Such a chain is easily proven to exist. (See next slide.)
  – The machine’s job is to follow this chain from one node to the
    next by using a black-box function.
• The compressor can run a reversible machine
backwards, to reconstruct earlier nodes in the
chain from later machine configurations.
  – If the reversible machine only uses order-S space in
    its configurations, then the chain is compressible!
    • This contradicts the choice of an incompressible chain; QED.
Existence of Incompressibles
• A decompressor or description system s: {0,1}*
maps any bit-string description d to the
described string x.
  – Notation f: D means a unary operator on D, i.e., f: D → D.
• x is compressible in s iff ∃d: s(d) = x with |d| < |x|.
  – Notation |b| means the length of bit-string b in bits.
• Theorem: Every decompressor has an
incompressible input of any given length ℓ.
  – Proof: There are 2^ℓ length-ℓ bit-strings, but only
        ∑_{i=0}^{ℓ−1} 2^i = 2^ℓ − 1
    shorter descriptions. ∎
Cost-Efficiency Analysis
Cost Efficiency
Cost Measures in Computing
Generalized Amdahl’s Law
Cost-Efficiency
• Cost-efficiency of anything is %$ = $min/$,
– The fraction of actual cost $ that really needed to be
spent to get the thing, using the best poss. method.
– Measures the relative number of instances of the
thing that can be accomplished per unit cost,
• compared to the maximum number possible
– Inversely proportional to cost $.
– Maximizing %$ means minimizing $.
• Regardless of what $min actually is.
• In computing, the “thing” is a computational
task that we wish to carry out.
Components of Cost
• The cost $ of a computation may generally be a
sum of terms for many different components:
  – Time-proportional (or related) costs:
    • Cost to the user of having to wait for results
      – E.g., missing deadlines, incurring penalties.
      – May increase nonlinearly with time, for long times.
  – Spacetime-proportional (or related) costs:
    • Cost of the raw physical spacetime occupied by the computation.
      – Cost to rent the space.
    • Cost of hardware (amortized over its lifetime):
      – Cost of raw mass-energy, particles, atoms.
      – Cost of materials, parts.
      – Cost of assembly.
    • Cost of SW licenses.
    • Cost of parts/labor for operation & maintenance.
More cost components
• Continued...
– Area-time proportional (or related) costs:
• Cost to rent a portion of an enclosing convex hull for getting
things in & out of the system
– Energy, heat, information, people, materials, entropy.
• Some examples incurring area-time proportional costs:
– Chip area, power level, cooling capacity, I/O bandwidth, desktop
footprint, floor space, real estate, planetary surface
• Note that area-time costs also scale with the maximum
number of items that can be sent/received.
– Energy expenditure proportional (or related) costs:
• Cost of raw free energy expenditure (entropy generation).
• Cost of energy-delivery system. (Amortized.)
• Cost of cooling system. (Amortized.)
General Cost Measures
• The most comprehensive cost measure includes
terms for all of these potential kinds of costs:
      $comprehensive = $Time + $SpaceTime + $AreaTime + $FreeEnergy
• $Time is a non-decreasing function f(t_start→end).
  – Simple model: $Time ∝ t_start→end
• $FreeEnergy is most generally
      $FreeEnergy = ∫[t_start → t_end] Ṡ(t) · $S(t) dt
  – Simple model: $FreeEnergy ∝ S_generated
• $SpaceTime and $AreaTime are most generally:
      $SpaceTime = ∫[t_start → t_end] $Space(t) dt
      $AreaTime  = ∫[t_start → t_end] $Area(t) dt
  – Simple model (sketched below):
    • $SpaceTime ∝ Space × Time
      (scales w. the max # of ops that could be done)
    • $AreaTime ∝ Area × Time
      (scales w. the max # of items that could be I/O’d)
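A minimal Python sketch of the simple-model cost terms (the
coefficient names a_* are mine; all units arbitrary):

    def comprehensive_cost(time, space, area, entropy_generated,
                           a_time=1.0, a_st=1.0, a_at=1.0, a_e=1.0):
        """$comp = $Time + $SpaceTime + $AreaTime + $FreeEnergy
        (simple model)."""
        return (a_time * time              # cost of waiting for results
                + a_st * space * time      # rent on occupied spacetime
                + a_at * area * time       # rent on enclosing hull area
                + a_e * entropy_generated) # free-energy cost of entropy

    # Doubling run time doubles every time-proportional term:
    assert comprehensive_cost(2, 1, 1, 0) == 2 * comprehensive_cost(1, 1, 1, 0)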
Generalized Amdahl’s Law
• Given any cost that is a sum of components,
$tot = $1 + … + $n,
– There are diminishing proportional returns to be
gained from reducing any single cost component (or
subset of components) to much less than the sum of
the remaining components.
• ∴ Design-optimization effort should concentrate
on those cost components that dominate total
cost for the application of interest.
– At a “design equilibrium,” all cost components will
be roughly equal (unless externally driven)
Reversible vs. Irreversible
• Want to compare their cost-efficiency under
various cost measures:
  – Time
  – Entropy
  – Area-time
  – Spacetime
  (Or, for some applications, one quantity might be minimized
  while another one (space, time, area) is constrained by some
  hard limit.)
• Note that space (volume, mass, etc.) by itself as
a cost measure is only significant if either:
  – (a) The computer isn’t reusable, & so the cost to
    build it dominates operating costs, or
  – (b) I/O latency ∝ V^(1/3) affects other costs.
Time Cost Comparison
• For computations with unlimited power/cooling
capacity, and no communication requirements:
  – Reversible is worse than irreversible by a factor of
    ~s > 1 (the adiabatic slowdown factor), times maybe a
    small constant depending on the logic style used:
        $r,Time ≈ $i,Time · s
Time Cost Comparison, cont.
• For parallelizable, power-limited applications:
  – With nonzero leakage:
        $r,Time ≈ $i,Time / R_on/off^g
    • Worst-case computations: g ≈ 0.4
    • Best-case computations: g = 0.5
• For parallelizable, area-limited, entropy-flux-limited,
best-case applications:
  – With leakage → 0:
        $r,Time ≈ $i,Time / d^(1/2)
  – where d is the system’s physical diameter.
    • (see transparency)
Time cost comparison, cont.
• For entropy-flux-limited, parallel, heavily
communication-limited, best-case applications:
  – With leakage approaching 0: (details later)
        $r,Time ≈ $i,Time^(3/4)
  – where $i,Time scales up with the space requirement V as
        $i,Time ∝ V^(2/9)
  – so the reversible speedup scales with the 1/18
    power of system size.
    • Not super-impressive!
Bennett ’89 alg. is not optimal

  [figure: pebble-game triangles for (k=2, n=3) and (k=3, n=2), with
  large gaps between them]

Just look at all the spacetime it wastes!!!
Parallel “Frank02” algorithm
• We can simply scrunch the triangles closer
together to eliminate the wasted spacetime!
• The resulting algorithm is linear-time for all n and k,
and dominates Ben89 for time, spacetime, &
energy!

  [figure: emulated time vs. real time, with tightly packed triangles
  for (k=2, n=3), (k=3, n=2), and (k=4, n=1)]
Setup for Analysis
• For the energy-dominated limit,
  – let cost “$” equal energy.
    • c$ = energy coefficient; r$ = r$(min) = leakage power
    • $i = energy dissipation per irreversible state-change
• Let the on/off ratio
      R_on/off = r$(max) / r$(min) = P_max / P_min.
• Note that
      c$ ≈ $i · t_min = $i · ($i / r$(max)),   so   r$(max) ≈ $i² / c$.
• So
      R_on/off ≈ $i² / (c$ · r$(min)) = $i² / (c$ · r$).
Time Taken
• There are n levels of recursion.
• Each multiplies the width of the base of the
triangle by k.
• Lowest-level triangles take time c·t_op.
• Total time is thus c·t_op·k^n.

  [figure: a (k=4, n=1) triangle whose base is 4 sub-units wide]
Number of Adiabatic Ops
• Each triangle contains k + (k − 1) = 2k − 1
immediate sub-triangles.
• There are n levels of recursion.
• Thus the number of adiabatic ops is c·(2k − 1)^n.

  [figure: a (k=3, n=2) diagram containing 5² = 25 little triangles
  (adiabatic operations)]
Spacetime Usage
• Each triangle includes the spacetime usage of
all 2k − 1 of its sub-triangles,
• Plus
      ∑_{i=1}^{k−2} i = (k − 2)(k − 1)/2 = (k² − 3k + 2)/2
  additional spacetime units, each consisting of 1 storage unit
  (one state of the irreversible machine being stored) for time
  t_op·k^(n−1).

  [figure: a (k=5, n=1) triangle with 1+2+3 stored-state units, each
  held for time t_op·k^(n−1)]

• Resulting recurrence relation:
      ST(k, 0) = 1 (or c)
      ST(k, n) = (2k−1)·ST(k, n−1) + (k² − 3k + 2)·k^(n−1)/2
Reversible Cost
• Adiabatic cost plus spacetime cost:
      $r = $adia + $ST = (2k−1)^n · c$/t + ST(k,n) · r$ · t
• Minimizing over t gives:
      $r = 2[(2k−1)^n · ST(k,n) · c$ · r$]^(1/2)
• But, in the energy-dominated limit,
      c$ · r$ ≈ $i² / R_on/off,
• So:
      $r = 2$i · [(2k−1)^n · ST(k,n) / R_on/off]^(1/2)
Tot. Cost, Orig. Cost, Advantage
• The total cost, $i for the irreversible operation
performed at the end of the algorithm plus the reversible
cost, is:
      $tot = $i · {1 + 2[(2k−1)^n · ST(k,n) / R_on/off]^(1/2)}
• The original irreversible machine performing k^n ops
would use cost $orig = $i · k^n, so
• the advantage ratio between reversible &
irreversible cost is:
      R$(i/r)(k, n) = $orig / $tot
                    = k^n / (1 + 2[(2k−1)^n · ST(k,n) / R_on/off]^(1/2))
Optimization Algorithm
• For any given value of R_on/off,
• scan the possible values of n (up to some limit);
• for each of those, scan the possible values of k,
• until the maximum R$(i/r) for that n is found
  – (the function only has a single local maximum);
• and return the max R$(i/r) over all n tried. (Sketched below.)
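A Python sketch (mine) of this scan, using the ST recurrence and
advantage ratio from the preceding slides; the function names and the
example on/off ratio of 10⁹ are illustrative.

    from math import sqrt

    def ST(k: int, n: int) -> float:
        """Spacetime recurrence from the 'Spacetime Usage' slide."""
        if n == 0:
            return 1.0
        return (2*k - 1) * ST(k, n - 1) + (k*k - 3*k + 2) * k**(n - 1) / 2

    def R(k: int, n: int, Ron_off: float) -> float:
        """Advantage ratio R$(i/r)(k, n)."""
        return k**n / (1 + 2 * sqrt((2*k - 1)**n * ST(k, n) / Ron_off))

    def best_advantage(Ron_off: float, n_max: int = 25, k_max: int = 500):
        best = (0.0, 2, 0)             # (ratio, k, n)
        for n in range(1, n_max + 1):
            prev = 0.0
            for k in range(2, k_max + 1):
                r = R(k, n, Ron_off)
                if r < prev:           # single local maximum in k
                    break
                prev = r
                best = max(best, (r, k, n))
        return best                    # (max ratio, best k, best n)

    print(best_advantage(1e9))         # e.g., R_on/off = 10**9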
Cost-Efficiency Gains, Modified Ben89

  [chart: “Advantage in Arbitrary Computation” vs. “On/Off Ratio of
  Individual Devices” (log-log, on/off ratios from 0.1 to 10¹²),
  plotting the cost advantage (“out”), hardware overhead (“hw”), and
  the optimal k and n (right axis, 0–70); power-law fits
  y ≈ 0.3905·x^0.3896 (advantage) and y ≈ 1.741·x^0.6198 (overhead)]
Asymptotic Scaling
• The potential energy-savings factor scales as
      R$(i/r) ∝ R_on/off^(~0.4),
• while the spacetime overhead grows only as
      R$(i/r)^(~0.45), or R_on/off^(~0.18).
• E.g., with an R_on/off of 10⁹, you can do worst-case
computation in an adiabatic circuit with:
  – an energy savings of up to a factor of 1,200×!
  – But this point is 700,000× less hardware-efficient,
    if the Frank02 algorithm is used for the emulation.
Various Cost Measures
• Entropy – advantage as per the previous analysis
• Area times time – scales w. entropy generated
• Performance, given an area constraint:
  – In the leakage-free limit, advantage proportional to d^(1/2)
  – With leakage, what’s the max advantage? (See hw)
• NOW:
  – Are there any performance/cost advantages from
    adiabatics even when there is no cost or constraint
    for entropy or for area?
  – YES, for flux-limited computations that require
    communications. Let’s see why…
Perf. scaling w. # of devices
• If the alg. is not limited by communications needs:
  – Use irreversible processors spread in a 2-D layer.
  – Remove entropy along the perpendicular dimension.
  – No entropy-removal rate limits,
    • so no speed advantage from reversibility.
• If the alg. requires only local communication,
latency ≈ cyc. time, in an N_D × N_D × N_D mesh:
  – The leak-free reversible machine’s perf. scales better!
  – Irreversible: t_cyc = Θ(N_D^(1/3))
  – Reversible: t_cyc = Θ(N_D^(1/4))… an N_D^(1/12)× speedup!
    • To boost the reversibility speedup by 10×, one must consider
      ~10³⁶-CPU machines (1.7 trillion moles of CPUs!)
      – 1.7 trillion moles of H atoms weighs 1.7 million metric tons!
        » A ~100-m-tall hill of H-atom-sized CPUs!
Lower bound on irreversible time
• Simulate N_proc = N_D³ cells for N_steps ≫ N_D steps.
• Consider a sequence of N_D update steps.
• The final cell value depends on N_D⁴ ops in time T.
• All ops must occur within radius r = cT of the cell.
• Surface area A ∝ T², so an op rate R_op ∝ T² is sustainable.
• N_ops ≈ R_op·T ∝ T³ needs to be at least N_D⁴.
• ⇒ T must be Ω(N_D^(4/3)) to do all N_D steps.
• The average time per step must be Ω(N_D^(1/3)).
• Any irreversible machine (of any technology or
architecture) must obey this bound!
Irreversible 3-D Mesh
Reversible 3-D Mesh
Non-local Communication
• Best computational task for reversibility:
  – Each processor must exchange messages with
    another that is N_D^(1/2) nodes away on each cycle.
    • Unsure what real-world problem demands this pattern!
  – In this case, the reversible speedup scales with the number
    of CPUs to “only” the 1/18th power.
    • To boost the reversibility speedup by 10×, “only” need 10¹⁸
      (or 1.7 micromoles) of CPUs.
    • If each were a 1-nm cluster of 100 C atoms, this is only 2 mg
      of mass, volume 1 mm³.
• Current VLSI: need a cost level of ~$25B before you see a speedup.
Ballistic Machines
• In the limit as c_S → 0, the asymptotic benefit for
3-D meshes goes as N_D^(1/3), or N_proc^(1/9).
  – Only need a billion devices to multiply the reversible
    speedup by 10×.
    • With 1-nm³ devices, a cube 1 µm on a side (bacteria-sized)
      would do it!
• Does Drexler’s rod logic have low enough c_S
and small enough size to attain this
prediction…?
  – (Need to check.)
Minimizing volume via folding
• Allows the previous solutions to be packed in min. volume.
• Volume scales proportionally to mass.
• No change in speed or entropy flux.
Cooling Technologies
Irreversible Max Perf. Per Area
Reversible Entropy Coeffs.
Rev. vs. Irrev. Comparisons
Sizes of Winning Rev. Machines
Some Analytical Challenges
• Combine Frank ‘02 emulation algorithm,
• Analysis of its energy and space efficiency as a
function of n and k,
• And plug it into the analysis for the 3-D
meshes, to see…
• What are the optimal speedups for arbitrary
mesh computations on rev. machines, as a
function of:
– Ron/off, device volume, entropy flux limit, machine
size.
– And, does perf./hw improve, and if so, how much?
Reversible Processor
Architecture
Why reversible architectures?
• What about automatic emulation algorithms?
– E.g.: Ben73, Ben89, LMT, Frank02.
• Transform an irreversible alg. to an equiv. rev’ble one.
– But, these do not yield the most cost-efficient
reversible algorithms for all problems!
• E.g., log R_E(i/r) / log R_on/off may be only 0.4 rather than 0.5.
• Finding the best reversible algorithm requires a creative
algorithm discovery process!
• An optimally cost-efficient general-purpose
architecture must allow the programmer to
specify a custom reversible algorithm that is
specific to his problem.
Reversibility Affects All Levels
• As R_on/off increases & the cost of device manufacture
declines (while the cost of energy stays high),
  – maximizing overall cost-efficiency requires an
    increasingly large fraction of all bit-ops to be done
    adiabatically.
• Maximizing the efficiency of the resulting
algorithms, in turn, requires reversibility in the
programming model (unless a perfect emulator is found):
  – Logic design
  – Functional units
  – Instruction set architectures
  – Programming languages
  – High-level algorithms
  (with an increasing requirement for the degree of reversibility
  as one moves down this list)
All Known Reversible Architectures
• Ed Barton (MIT class project, 1978)
– Conservative logic, w. garbage stack
• Andrew Ressler (MIT bachelor’s thesis, 1979; MIT master’s thesis, 1981)
– Like Barton’s, but more detailed. Paired branches.
• Henry Baker (1992)
– Reversible pointer automaton machine instructions.
• J. Storrs “JoSH” Hall (1994)
– Retractile-cascade-based PDP-10-like architecture.
• Carlin Vieri (MIT master’s thesis, 1995)
– Early Pendulum ISA, irrev. impl., full VHDL detail.
• Frank & Rixner (MIT class project, 1996)
– Tick: VLSI schematics & layout of Pendulum subset, w. paired branches
• Frank & Love (MIT class project, 1996)
– FlatTop: Adiabatic VLSI impl. of programmable reversible gate array
• Vieri (MIT Ph.D. thesis, 1999)
– Fully adiabatic VLSI implementation of Pendulum w. paired branches
Reversible Programmable Gate-Array Architectures
(as of May ‘99)
Photo of packaged FlatTop chip
A Bouncing BBMCA “Ball”
A BBMCA Fredkin Gate
Reversible von Neumann
Architectures
Reversible Architecture Issues
• Instruction-Set Architecture (ISA) Issues:
  – How to define irrev. ops (AND, etc.) reversibly?
  – How to do jumps/branches reversibly?
  – What kind of memory interface to have?
  – What about I/O?
  – How to permit efficient reversible algorithms?
  – Should the hardware guarantee reversibility?
• Microarchitectural issues:
  – Register file interface
  – Reversible ALU operations
  – Shared buses
  – Program counter control
The Trivial Cases
• Many typical instructions are already reversible:
  – NOT a
    • Set register a to its bitwise logical complement: a := ~a
  – NEG a
    • Set a to its two’s-complement negation: a := −a, or a := ~a + 1
  – INC a
    • Increment a by 1 (modulo 2ⁿ, for n-bit registers).
  – ADD a b
    • Add register b into register a (a := (a + b) mod 2ⁿ).
  – XOR a b
    • Exclusive-or b into a (a := a ⊕ b).
  – ROL a b
    • Rotate the bits in register a left by the # of positions
      given by b.
The Nontrivial Cases
• Other common instructions are not reversible…
– CLR a
• Clear register a to 0.
– LD a b
• Load register a from addr. pointed to by b.
– LDI a 3
• Load immediate value 3 into register a.
– AND a b
• Set a to the bitwise AND of a and b
– JMP a
• Jump to the instruction pointed to by a.
– SLL a b
• Shift the bits in a left by b bits, filling with 0’s on right.
Irreversible Data Operations
• How to do expanding ops reversibly?
– E.g., AND a b - Prior value of a is lost.
• Approach #1: “Garbage Stack” approach.
– Based on Landauer’s embedding.
– Push all data that would otherwise be destroyed
onto a special “garbage” stack hidden from pgmr.
– Can unwind computation when finished to recover
stack space. (Lecerf ‘63/Bennett ‘73 approach)
• Problems: Large garbage stack memory needed.
– Limits computation length.
– Leaves programmer no opportunity to choose a
more efficient reversible algorithm!
Illustrating Garbage Stack
• Let “←” mean a reversible move, “⇐” a reversible copy,
and “⇒” a reversible uncopy.

  AND a b   is implemented by the following steps
  (modeled in the sketch below):
    1. t ← a
    2. a ⇐ t & b
    3. GSM[GSP++] ← t

  [figure: Garbage Stack Memory (GSM) holding 230, 46, 17 at
  addresses 0–2, with empty slots above; the Garbage Stack Pointer
  (GSP) = 3 points at the first empty slot]
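A small Python model (mine, with hypothetical register names) of the
garbage-stack approach: AND saves the clobbered value on a hidden
stack, and UNAND unwinds it to recover the space.

    class Machine:
        def __init__(self):
            self.reg = {}
            self.gsm = []                  # Garbage Stack Memory (hidden)

        def AND(self, a: str, b: str):
            t = self.reg[a]                # 1. t <- a        (move)
            self.reg[a] = t & self.reg[b]  # 2. a <= t & b    (expanding op)
            self.gsm.append(t)             # 3. GSM[GSP++] <- t

        def UNAND(self, a: str, b: str):
            t = self.gsm.pop()             # exact reverse of the 3 steps
            assert self.reg[a] == t & self.reg[b], "not reversible here"
            self.reg[a] = t

    m = Machine()
    m.reg.update(a=0b1101, b=0b1011)
    m.AND("a", "b")
    assert m.reg["a"] == 0b1001
    m.UNAND("a", "b")                      # unwind: recovers a & stack space
    assert m.reg["a"] == 0b1101 and not m.gsm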
Programmer-Controlled Garbage
• Put extra data in a programmer-manipulable location.
  – What if the destination location isn’t empty?
    • Signal an error, or
    • use an op that does something reversible anyway.
• Provide undo operations to accomplish
“unexpanding” inverses of the expanding ops.
• 1st method: Errors on non-empty destination:
  – AND A B C    – If (A = 0) A ⇐ B & C, else error
  – UNAND A B C  – If (A = B & C) A ⇒ B & C, else error
• 2nd method: Use always-reversible store ops (sketched below):
  – ANDX A B C   – A ← A ⊕ (B & C)   (self-undoing!)
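A one-line Python check (mine) that the ANDX update is its own
inverse:

    def ANDX(a: int, b: int, c: int) -> int:
        """Always-reversible AND-store: A := A xor (B and C)."""
        return a ^ (b & c)

    # Self-undoing: applying ANDX twice with the same B, C restores A.
    for a in range(8):
        for b in range(8):
            for c in range(8):
                assert ANDX(ANDX(a, b, c), b, c) == a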
Pendulum - packaged die photo