Transcript ppt

CS252
Graduate Computer Architecture
Lecture 28
Esoteric Computer Architecture
DNA Computing & Quantum Computing
Prof John D. Kubiatowicz
http://www.cs.berkeley.edu/~kubitron/cs252
DNA Computing
• Can we use DNA to do massive
computations?
– Organisms do it
– DNA has very high information
density:
» 4 different base pairs:
• Adenine/Thymine
• Guanine/Cytosine
» Always paired on opposite
strands  Energetically
favorable
– Active operations:
» Copy: Split strands of DNA
apart in solution, gain 2 copies
» Concatenate: eg:
GTAATCCT will combine
XXXXXCATT with AGGAYYYYY
» Polymerase Chain Reaction (PCR):
amplifies region of molecule
between two marker molecules
5/11/2009
cs252-S09, Lecture 28
©1999 Access Excellence
@ the National Health Museum
2
DNA Computing and Hamiltonian Path
• Given a set of cities and costs
between them (possibly directed
paths):
– Find shortest path to all cities
– Simpler: find single path that
visits all cities
• DNA Computing example is latter version:
– Every city represented by unique 20 base-pair strand
– Every path between cities represented by complementary pairs:
10 pairs from source city, 10 pairs from destination
– Shorter example: AAGT for city 1, TTCG for city 2
Path 1->2: CAAA
Will build: AAGTTTCG
..CAAA..
– Dump “city molecules” and “path molecules” into testtube.
Select and amplify paths of right length. Analyze for result.
– Been done for 6 cities! (Adleman, ~1998!)
5/11/2009
cs252-S09, Lecture 28
3
Even more promising uses of DNA
• Self-assembly of components
DNA
Active Region
Bonding
– DNA serves as substrate
– Attach active elements in middle of components.
– Final step – metal deposited over DNA
Active
• Other interesting structures could be
builtRegion
5/11/2009
cs252-S09, Lecture 28
4
Use Quantum Mechanics to Compute?
• Weird but useful properties of quantum mechanics:
– Quantization: Only certain values or orbits are good
» Remember orbitals from chemistry???
– Superposition: Schizophrenic physical elements don’t quite know
whether they are one thing or another
• All existing digital abstractions try to eliminate QM
– Transistors/Gates designed with classical behavior
– Binary abstraction: a “1” is a “1” and a “0” is a “0”
• Quantum Computing:
Use of Quantization and Superposition to compute.
• Interesting results:
– Shor’s algorithm: factors in polynomial time!
– Grover’s algorithm: Finds items in unsorted database in time
proportional to square-root of n.
– Materials simulation: exponential classically, linear-time QM
5/11/2009
cs252-S09, Lecture 28
5
Quantization: Use of “Spin”
North
Representation:
|0> or |1>
Spin ½ particle:
(Proton/Electron)
South
• Particles like Protons have an intrinsic “Spin”
when defined with respect to an external
magnetic field
• Quantum effect gives “1” and “0”:
– Either spin is “UP” or “DOWN” nothing between
5/11/2009
cs252-S09, Lecture 28
6
Kane Proposal II
(First one didn’t quite work)
Single Spin
Control Gates
Inter-bit
Control Gates
Phosphorus
Impurity Atoms
• Bits Represented by combination of proton/electron spin
• Operations performed by manipulating control gates
– Complex sequences of pulses perform NMR-like operations
• Temperature < 1° Kelvin!
5/11/2009
cs252-S09, Lecture 28
7
Now add Superposition!
• The bit can be in a combination of “1” and “0”:
– Written as: = C0|0> + C1|1>
– The C’s are complex numbers!
– Important Constraint: |C0|2 + |C1|2 =1
• If measure bit to see what looks like,
– With probability |C0|2 we will find |0> (say “UP”)
– With probability |C1|2 we will find |1> (say “DOWN”)
• Is this a real effect? Options:
– This is just statistical – given a large number of protons, a fraction
of them (|C0|2 ) are “UP” and the rest are down.
– This is a real effect, and the proton is really both things until you
try to look at it
• Reality: second choice!
– There are experiments to prove it!
5/11/2009
cs252-S09, Lecture 28
8
A register can have many values!
• Implications of superposition:
– An n-bit register can have 2n values simultaneously!
– 3-bit example:
= C000|000>+ C001|001>+ C010|010>+ C011|011>+
C100|100>+ C101|101>+ C110|110>+ C111|111>
• Probabilities of measuring all bits are set by
coefficients:
– So, prob of getting |000> is |C000|2, etc.
– Suppose we measure only one bit (first):
» We get “0” with probability: P0=|C000|2+ |C001|2+ |C010|2+ |C011|2
Result: =
(C000|000>+ C001|001>+ C010|010>+ C011|011>)
» We get “1” with probability: P1=|C100|2+ |C101|2+ |C110|2+ |C111|2
Result: =
(C100|100>+ C101|101>+ C110|110>+ C111|111>)
• Problem: Don’t want environment to measure
before ready!
– Solution: Quantum Error Correction Codes!
5/11/2009
cs252-S09, Lecture 28
9
Spooky action at a distance
• Consider the following simple 2-bit state:
= C00|00>+ C11|11>
– Called an “EPR” pair for “Einstein, Podolsky, Rosen”
• Now, separate the two bits:
Light-Years?
• If we measure one of them, it instantaneously sets other one!
– Einstein called this a “spooky action at a distance”
– In particular, if we measure a |0> at one side, we get a |0> at the other
(and vice versa)
• Teleportation
– Can “pre-transport” an EPR pair (say bits X and Y)
– Later to transport bit A from one side to the other we:
» Perform operation between A and X, yielding two classical bits
» Send the two bits to the other side
» Use the two bits to operate on Y
» Poof! State of bit A appears in place of Y
5/11/2009
cs252-S09, Lecture 28
10
Model:
Operations on coefficients + measurements
Input
Complex
State
Unitary
Transformations
Measure
Output
Classical
Answer
• Basic Computing Paradigm:
– Input is a register with superposition of many values
» Possibly all 2n inputs equally probable!
– Unitary transformations compute on coefficients
» Must maintain probability property (sum of squares = 1)
» Looks like doing computation on all 2n inputs simultaneously!
– Output is one result attained by measurement
• If do this poorly, just like probabilistic computation:
– If 2n inputs equally probable, may be 2n outputs equally probable.
– After measure, like picked random input to classical function!
– All interesting results have some form of “fourier transform” computation being
done in unitary transformation
5/11/2009
cs252-S09, Lecture 28
11
•
Security of Factoring
•
Easy
Easy
Hard
Easy
Easy
Easy
Easy
5/11/2009
The Security of RSA Public-key cryptosystems
depends on the difficult of factoring a number N=pq
(product of two primes)
–
–
Classical computer: sub-exponential time factoring
Quantum computer: polynomial time factoring
Shor’s Factoring Algorithm (for a quantum computer)
1) Choose random x : 2  x  N-1.
2) If gcd(x,N)  1, Bingo!
3) Find smallest integer r : xr  1 (mod N)
4) If r is odd, GOTO 1
5) If r is even, a  x r/2 (mod N)  (a-1)(a+1) = kN
6) If a = N-1 GOTO 1
7) ELSE gcd(a ± 1,N) is a non trivial factor of N.
cs252-S09, Lecture 28
12
Finding r with xr
k
\
\
k/ 1 /

r 1

r 1 w  0 y
Quantum
Fourier
Transform
(
w0
0
k\
/
x/
)
w\
x/
w\
\
w ry/ x /
1
r r
k
\
k
• Finally: Perform measurement


 1 (mod N)
k
r
– Find out r with high probability
– Get |y>|aw’> where y is of form k/r and w’ is related
5/11/2009
cs252-S09, Lecture 28
13
ION Trap Quantum Computer:
Promising technology
Top
CrossSectional
View
• IONS of Be+ trapped in
oscillating quadrature field
– Internal electronic modes of IONS
used for quantum bits
– MEMs technology
– Target? 50,000 ions
– ROOM Temperature!
• Ions moved to interaction regions
– Ions interactions with one another
moderated by lasers
5/11/2009
Top View
Proposal: NIST Group
cs252-S09, Lecture 28
14
Ion Trap Quantum Computer
• Ballistic Movement
- Apply pulse sequences to electrodes
- Electrostatic forces move ion
Q1
Two-Qubit Gate
• Major Components
- Data = an ion
- Gate = a location
- Intersections similar, but more
complicated pulse sequences
Q2
5/11/2009
cs252-S09, Lecture 28
15
Ballistic Movement Network
One-Qubit
Gate
Q2
One-Qubit
Gate
Two-Qubit
Gate
R
R
Q3
R
R
Two-Qubit
Gate
Q4 Q5
Memory Cell
Q1
Interconnection
Network
Memory Cell
• Problem: Noise accumulation!
5/11/2009
cs252-S09, Lecture 28
16
Noise Accumulation from Movement
1.0E-02
Qubit Error
1.0E-03
1.0E-4 inital
1.0E-5 inital
1.0E-6 inital
1.0E-7 inital
1.0E-8 inital
1.0E-04
1.0E-05
1.0E-06
1.0E-07
error
error
error
error
error
1.0E-08
0
16
32
48
64
Distance Moved in Gates
• Noise may increase error by factor of 100
5/11/2009
cs252-S09, Lecture 28
17
Movement Option 2: Teleportation
Source Location
2. Local
Ops
D
3. Transmit two
classical bits
Entanglement
E1
E2
Target Location
4. Local Ops
D?
D
1. Generate EPR pair
• Goal: Transfer the state, not the data ion
• Problem: EPR pairs become noisy
• Teleportation Benefits
- Error Correction of data (arbitrary state): ~100 ms
Purification of EPR pair (known state): ~120 µs
- Pre-distribution of EPR pairs
5/11/2009
cs252-S09, Lecture 28
18
EPR Pair Distribution Network
One-Qubit
Gate
Q2
One-Qubit
Gate
Two-Qubit
Gate
R EPRR Pair
R
Generators
R
Q3
Two-Qubit
Gate
Q4 Q5
Memory Cell
Q1
5/11/2009
Interconnection
Network
cs252-S09, Lecture 28
Memory Cell
19
Setting Up a Teleportation Link
• Purification = Amplification of EPR pair link
- Two EPR pairs  One “purer” pair, one junk pair
- Chance of failure
• Need to send multiple pairs
STRONGER
EPR Qubits EntanglementEPR Qubits
P
G
Recycled Qubits
P
Recycled Qubits
For Data Teleportation
5/11/2009
cs252-S09, Lecture 28
20
Chained Teleportation
Teleportation
T
P
G
T
G
Teleportation
T
G
T
G
T
G
T
• Adjacent T nodes linked for teleportation
P
• Positive Features
- T node linking not on critical path
- Pre-purification (Link Amplification): part of link setup
5/11/2009
cs252-S09, Lecture 28
21
Quantum Network Architecture
T
G
T
G
G
P
Gate
T
G
P
Gate
T
G
G
P
Gate
G
P
Gate
G
T
G
T
G
T
P
Gate
P
Gate
T
P
Gate
P
Gate
• Grid of T nodes , linked by G nodes
• Packet-switched network
- Dimension-order routing
• Each qubit has associated message
5/11/2009
cs252-S09, Lecture 28
22
Classical Control
• Quantum Datapath Layer
- T Nodes and G Nodes (P Nodes and Gates not shown)
• Classical Control Layers
- Messages Associated with Qubits
- Teleportation and Purification Bits
5/11/2009
cs252-S09, Lecture 28
23
Running a Quantum Circuit
• Simple gates (transversal)
• More complex gates (non-transversal)
– Exist in any universal set
• Quantum Error Correction (QEC)
– 10-8 to 10-6 error rates from gates, movement and idleness
– Data must be encoded and periodically error corrected
• Ancilla (helper) qubits
– Necessary for complex gates and for QEC
– Computation with ancilla qubits > 90% of quantum program
Q0
Q1
T
time
5/11/2009
H
QEC
T
QEC
QEC
C
X
QEC
QEC
QEC
QEC
QEC
T
T
QEC
QEC
H
QEC
QEC
Serial Circuit Latency
cs252-S09, Lecture 28
24
Running a Quantum Circuit
• Ancilla qubits are independent of data
– Preparation may be pulled offline
– Ancilla qubits should be ready just in time to avoid noise from
idleness
Q0
H
Q1
T
QEC
C
X
QEC
QEC
T
QEC
QEC
H
QEC
Parallel Circuit Latency
5/11/2009
cs252-S09, Lecture 28
25
Running at The Speed of Data
• Ideally, execution time determined solely by data
Operations
involving
data qubits
hardware
time
Ancilla encoding
5/11/2009
cs252-S09, Lecture 28
26
Limited Ancilla Bandwidth
Execution Time of a
32-Bit QCLA (μs)
900000
800000
700000
600000
500000
400000
300000
200000
100000
0
1
10
100
1000
Encoded Ancilla Bandwidth Available (Ancillae per ms)
• 32-bit Quantum Carry-Lookahead Adder
– Varying rate at which encoded zero ancillae are provided for QEC
– Conclusion: design architecture with “ancilla factories”
5/11/2009
cs252-S09, Lecture 28
27
Ancilla Factory Design I
• “In-place” ancilla preparation
QEC
0 Prep
?
Verify
QEC
Bit
Correct
Cat Prep
0 Prep
?
Verify
Phase
Correct
Cat Prep
0 Prep
?
Verify
Cat Prep
Ancilla Generation Circuit
Encoded Ancilla
Verification Qubits
• Ancilla factory consists of many of these
–
–
5/11/2009
Encoded ancilla prepared
in many places, then
moved to output port
Movement is costly!
cs252-S09, Lecture 28
In-place
Prep
In-place
Prep
In-place
Prep
In-place
Prep
28
Idealized Qalypso Architecture
• Dense data region
– Data qubits only
– Local communication
• Shared Ancilla Factories
–
–
–
–
Distributed to data as needed
Fully multiplexed to all data
Output ports ( ): close to data
Input ports (
): may be far from
data, since recycled qubits have
irrelevant state
• Goals
– Design ancilla factories
– Answer Question: How much hardware is needed for ancilla
generation to run at the speed of data?
5/11/2009
cs252-S09, Lecture 28
29
Discussion of ISCA 2009 paper
• “A Fault Tolerant, Area Efficient Architecture for
Shor’s Factoring Algorithm”
– Mark Whitney, Nemanja Isailovic, Yatish Patel, and John Kubiatowicz
– ISCA 2009
• How to compare layouts? (what is good)?
– Probabilistic circuits  need metric that includes probability of failure
– ADCR = Probabilistic version of area-delay product
ADCR 
Area  Latency single
Psuccess
– Lower is better
• What to optimize?
– Many different “datapath organizations”
– Far too much error correction
5/11/2009
cs252-S09, Lecture 28
30
Datapath Organizations
• QLA: “Quantum Logic Array”
– Every compute region has space for 2 bits and ancilla generation
for 2 bits (to correct after every operation
• CQLA: “Compressed Quantum Logic Array”
– Same compute regions as QLA, but ability to have less ancilla
generation/bit for memory (idle bits less prone to error)
• Qalypso
– Matching ancilla generation to needs
5/11/2009
cs252-S09, Lecture 28
31
Error Correction Optimization
• Selectively error correction placement
– Standard techniques: correct after every error
– Instead – only correct bits that are particularly “dirty”
• Error correction modeled after retiming optimization
– Only place correction when approximate “EDist” parameter reaches threshold
– Then, perform full mapping
– Choose EDist by optimizing ADCR
• Very successful at reducing area/latency
• Even improves probability of success in some cases!
– Why? Because error correction involves operations which can introduce error
5/11/2009
cs252-S09, Lecture 28
32
Shor’s Factoring Circuit
• Most of time spent in modular exponentiation
– QFT is much smaller fraction of time
– Easiest way to build modular exponentiation: with adders
» Build multiplier instead? Not studied yet – would be very large
• Paper result: can factor in 7659 mm2
– Previous result was 0.9 m2
5/11/2009
cs252-S09, Lecture 28
33
Conclusion
• Computing can be done in a variety of ways
– Normal silicon gates not required
• DNA Computing
– Limited use “demonstration of concept”
– Form of massive parallelism
– Interesting consequences: self assembly
• Quantum Computing:
– Computing using superposition and quantization
– Ion Traps: a particularly promising technology
• CAD Tools for Quantum Computing
– Can actually optimize circuits – just like classical case
– ADCR = probabilistic version of Area-Delay product
5/11/2009
cs252-S09, Lecture 28
34