Devil’s Advocate View:
CMOL, FPNI, nanoPLA….
André DeHon
Benjamin Gojman, Nikil Mehta
During the canonization process of the Roman Catholic Church, the Promoter of the Faith (Latin
Promotor Fidei), popularly known as the Devil's Advocate (Latin advocatus diaboli), was a canon lawyer
appointed by the Church to argue against the canonization of the candidate. It was his job to take a skeptical
view of the candidate's character, to look for holes in the evidence, to argue that any miracles attributed to
the candidate were fraudulent, etc. -- Wikipedia
DeHon 2008
Case
• Molecules are not miraculous.
• Miracle of high density is exaggerated.
• Miracle of low energy is a sleight of hand.
• Curse of variation falls on all who would dare reach the atomic scale.
Two Ideas
Benefits follow from two hypotheses:
1. Can fabricate parallel wires denser than arbitrary topology
2. Can place resistance-varying switch with quasi-non-volatile state in space of dense wire crossing
• Hysteretic switching
• No extra area to program
Valid Prospects?
“Let’s build regular architectures around resistive switches!”
Inquisition
• What problem does CMOL/FPNI solve?
• Is this the bottleneck to scaling?
Problem Solved?
• What problem do these technology hypotheses address?
– Density
– (Economical) density
ASIC Mgates/cm2 (ITRS 2007 Execsum Table 1i; assume 4TR/gate): 140, 180, 220, 280, 360, 450, 711
[Other table values surviving in the transcript, positions not preserved: 11, 2800]
Unpack Assumptions
• Previous table appears to assume
– 100,000 F^2 per “gate” in FPGA case
• 250,000 F^2 per 4-LUT ÷ 2.5 gates per 4-LUT
• Plausible, conservative
– 64 FCMOS^2 per “gate” in CMOL case
• assuming each buffer is a gate and buffer is 64 F^2
– This assumption is stated in FPGA2006 paper.
– Optimistically small. …plausibly within factor of 2.
• Ignores that most of these buffers will act as route-through (provide no gates).
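The arithmetic behind these assumptions can be checked with a minimal sketch that converts an area per gate, expressed in F^2, into Mgates/cm2. The function name and the choice of F = 45nm as the evaluation point are illustrative assumptions, not taken from the slides.

```python
def mgates_per_cm2(f2_per_gate: float, feature_nm: float) -> float:
    """Gate density in Mgates/cm2 for a gate that occupies f2_per_gate * F^2."""
    feature_cm = feature_nm * 1e-7               # 1 nm = 1e-7 cm
    gate_area_cm2 = f2_per_gate * feature_cm ** 2
    return 1.0 / gate_area_cm2 / 1e6

# FPGA case: 250,000 F^2 per 4-LUT, 2.5 gates per 4-LUT
fpga_f2_per_gate = 250_000 / 2.5                 # 100,000 F^2 per gate
# CMOL case: one 64 F^2 buffer counted as one gate
cmol_f2_per_gate = 64

for name, f2 in [("FPGA", fpga_f2_per_gate), ("CMOL", cmol_f2_per_gate)]:
    print(f"{name}: {mgates_per_cm2(f2, 45):.3g} Mgates/cm2 at F=45nm")
```

At F = 45nm this gives roughly 0.5 Mgates/cm2 for the FPGA case and roughly 770 Mgates/cm2 for the CMOL case, before any discount for buffers that only act as route-through.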
Right Problem?
• Is logic density of gates the bottleneck in scaling?
– Economical logic density?
– Density of programmable gates?
What is the Scaling Bottleneck?
• Density?
• Delay?
• Power Density?
• Reliability?
• Test and handling economics?
Methodology: Benchmark-Level Quantification
• For following, map Toronto 20 benchmarks
– 20 Largest MCNC benchmarks
– Order of 10K gates each
• (so think small cores)
• Composite density/performance/energy
– Includes overheads, route-through, fanout…
Density: Mapped Logic
Strukov and Likharev FPGA2006
• Only about 1 in 4 “gates” used as logic
– 775/4 ≈ 190 → comparable to ASIC gate density
Density: Mapped Logic
• PDC benchmark – 2 cases:
– Conservative
• (defective wires, stochastic assembly, lithographic support overhead)
– Optimistic Extreme (ideal, no litho overhead)
FCMOS (nm)              50    45    40    36    32    30    28    26    24    22
Fnano (nm)              20    18    16    14    12    10     6     4   3.5     3
Density (Mgates/cm2):
Conservative             7     9    11    14    19    27    65   130   160   210
Extreme                 51    63    80   100   140   200   570  1300  1700  2300
CMOS FPGA              0.4   0.5   0.6   0.8     1   1.1   1.3   1.5   1.7   2.1
CMOL revised           160   190   250   300   375   425   500   575   675   800
CMOS ASIC: 140, 180, 220, 280, 360, 450, 710 (seven values; column positions not preserved in the transcript)
How much density from nanowires?
• Look at Fcmos=Fnano=22nm (Fcmos/Fnano largest)
– 42 Mgates/cm2
• 20× better than CMOS FPGA
• 5--20× worse than Fnano=3nm
FCMOS (nm)              50    45    40    36    32    30    28    26    24    22
Fnano (nm)              20    18    16    14    12    10     6     4   3.5     3
Density (Mgates/cm2):
Conservative             7     9    11    14    19    27    65   130   160   210
Extreme                 51    63    80   100   140   200   570  1300  1700  2300
CMOS FPGA              0.4   0.5   0.6   0.8     1   1.1   1.3   1.5   1.7   2.1
CMOL revised           160   190   250   300   375   425   500   575   675   800
CMOS ASIC: 140, 180, 220, 280, 360, 450, 710 (seven values; column positions not preserved in the transcript)
Delay
• Challenge has been to turn capacity (area) into performance
– Linear scaling considered excellent
• Something which is 10× denser
– Better be less than 10× slower
• E.g. we expect 10 cores running at 100MHz to run slower than 1 core running at 1GHz
• If give up too much delay, no benefit.
Obtaining Performance
• Highly Pipelined nanoPLA designs
– Conservative (demonstrated tech.)
• Ronxpoint=100KΩ, ρSi=10^-3 Ω-cm, ρNiSi=10^-5 Ω-cm
• Only NiSi non-active areas
FCMOS (nm)              50    45    40    36    32    30    28    26    24    22
Fnano (nm)              20    18    16    14    12    10     6     4   3.5     3
Delay (ns)            1.83  1.70  1.57  1.45  1.32  1.20  0.99  0.90  0.86  0.82
Density (Mgates/cm2):
Conservative             7     9    11    14    19    27    65   130   160   210
Extreme                 51    63    80   100   140   200   570  1300  1700  2300
CMOS FPGA              0.4   0.5   0.6   0.8     1   1.1   1.3   1.5   1.7   2.1
CMOL revised           160   190   250   300   375   425   500   575   675   800
CMOS ASIC: 140, 180, 220, 280, 360, 450, 710 (seven values; column positions not preserved in the transcript)
Pipe delay stages = 452
Likharev only claims about 1GHz (unpipelined).
(Nanoarch2007)
“What-If” Extensions
FCMOS (nm)                            50    45    40    36    32    30    28    26    24    22
Fnano (nm)                            20    18    16    14    12    10     6     4   3.5     3
Conservative (ns)                   1.83  1.70  1.57  1.45  1.32  1.20  0.99  0.90  0.86  0.82
Defect Free (ns)                    1.07  0.99  0.91  0.83  0.75  0.67  0.53  0.46  0.44  0.42
Perfect Restore (ns)                0.68  0.62  0.57  0.52  0.47  0.42  0.33  0.30  0.29  0.27
Defect Free Perfect Restore (ns)    0.37  0.34  0.31  0.28  0.25  0.22  0.17  0.15  0.14  0.13
Copper 2 CMOS Buf. No Litho (ns)    0.81  0.72  0.63  0.55  0.46  0.37  0.20  0.12  0.11  0.10
Extreme (ns)                        0.28  0.25  0.22  0.19  0.16  0.13  0.07  0.04  0.04  0.03
Power Density
• Clock rates stopped scaling due to power density
• We can already fabricate more transistors than we can afford to activate.
– Looking at gate capacitance alone (45nm)
• (highly optimistic, no wire)
• 6×10^-17 J/Tr/op (Vdd=1V)
• ×700MTr/cm2
• ×10GHz
• = 420W/cm2 (3000W/cm2 at 22nm, Vdd=0.7V)
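The 45nm estimate is a straight product of energy per operation, transistor density and clock rate. A minimal sketch of that arithmetic follows; the function and variable names are illustrative, and it assumes, as the slide's product implies, that every transistor switches on every cycle.

```python
def power_density_w_per_cm2(energy_j_per_tr_op: float,
                            tr_per_cm2: float,
                            clock_hz: float) -> float:
    """Power density if every transistor performs one operation per cycle."""
    return energy_j_per_tr_op * tr_per_cm2 * clock_hz

# 45nm figures from the slide: gate capacitance only, no wire, Vdd = 1 V
p_45nm = power_density_w_per_cm2(6e-17, 700e6, 10e9)
print(f"{p_45nm:.0f} W/cm2")
```

The product comes to about 420 W/cm2. The 22nm figure on the slide depends on inputs the slide does not list, so it is not reproduced here.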
Power Density: Quantitative
• What if we run them at full speed?
FCMOS (nm)              50    45    40    36    32    30    28    26    24    22
Fnano (nm)              20    18    16    14    12    10     6     4   3.5     3
Conservative
(ns)                  1.83  1.70  1.57  1.45  1.32  1.20  0.99  0.90  0.86  0.82
(Mgates/cm2)             7     9    11    14    19    27    65   127   162   215
Vdd=0.7V (W/cm2)        12    14    17    20    24    29    46    60    68    78
Extreme
(ns)                  0.28  0.25  0.22  0.19  0.16  0.13  0.07  0.04  0.04  0.03
(Mgates/cm2)            51    63    80   104   142   205   569  1280  1672  2276
Vdd=0.7V (W/cm2)       237   287   354   453   599   856  2428  5490  6816  8717
Vdd=0.3V (W/cm2)        44    53    65    83   110   157   446  1008  1252  1601
CMOL dodge here is assuming Vdd=0.3V.
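The Vdd=0.3V row can be recovered from the Vdd=0.7V row if switching energy scales with the square of the supply voltage at fixed density and clock rate. That scaling rule is an assumption here, not something the slide states, and the names below are illustrative.

```python
def scale_power(p_w_per_cm2: float, v_from: float, v_to: float) -> float:
    """Rescale dynamic power assuming energy per operation goes as Vdd^2."""
    return p_w_per_cm2 * (v_to / v_from) ** 2

# Extreme case, Vdd = 0.7 V row of the table (W/cm2)
extreme_w_at_0v7 = [237, 287, 354, 453, 599, 856, 2428, 5490, 6816, 8717]

extreme_w_at_0v3 = [round(scale_power(p, 0.7, 0.3)) for p in extreme_w_at_0v7]
print(extreme_w_at_0v3)
```

Under that assumption the rounded results match the table's Vdd=0.3V row, which suggests the lower-voltage figures are a pure V^2 rescaling with no change in delay.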
Power Density: Quantitative
• What can we use at 100W/cm2?
FCMOS (nm)              50    45    40    36    32    30    28    26    24    22
Fnano (nm)              20    18    16    14    12    10     6     4   3.5     3
Extreme
(ns)                  0.28  0.25  0.22  0.19  0.16  0.13  0.07  0.04  0.04  0.03
(Mgates/cm2)            51    63    80   104   142   205   569  1280  1672  2276
0.7V (W/cm2)           237   287   354   453   599   856  2428  5490  6816  8717
Limited to 100W/cm2
0.7V (W/cm2)           100   100   100   100   100   100   100   100   100   100
Factor                 2.4   2.9   3.5   4.5   6.0   8.6  24.3  54.9  68.2  87.2
(Mgates/cm2)            22    22    23    23    24    24    23    23    25    26
(ns)                   0.7   0.7   0.8   0.9   1.0   1.1   1.7   2.3   2.6   2.9
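The table shows two ways to fit the Extreme case into a 100W/cm2 budget: activate fewer gates at full speed, or run all gates at a slower clock. A minimal sketch, reading Factor as power divided by budget (an interpretation inferred from the numbers, with illustrative names):

```python
BUDGET_W_PER_CM2 = 100.0

# Extreme-case rows from the table (Vdd = 0.7 V)
delay_ns = [0.28, 0.25, 0.22, 0.19, 0.16, 0.13, 0.07, 0.04, 0.04, 0.03]
density_mgates = [51, 63, 80, 104, 142, 205, 569, 1280, 1672, 2276]
power_w = [237, 287, 354, 453, 599, 856, 2428, 5490, 6816, 8717]

# How far over budget each design point is
factors = [p / BUDGET_W_PER_CM2 for p in power_w]
# Option 1: keep the clock, use only the affordable fraction of gates
usable_density = [d / f for d, f in zip(density_mgates, factors)]
# Option 2: keep all gates, stretch the cycle time by the same factor
slowed_delay_ns = [t * f for t, f in zip(delay_ns, factors)]

for f, d, t in zip(factors, usable_density, slowed_delay_ns):
    print(f"factor {f:5.1f}  usable {d:5.1f} Mgates/cm2  or cycle {t:4.1f} ns")
```

The usable density stays near 22 to 26 Mgates/cm2 across the whole roadmap, which is the point of the slide: under a fixed power budget the extra density cannot be activated. The slowed cycle times differ slightly from the table's last row because the delays used here are already rounded.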
Energy per Gate Evaluation
(CMOL)
Fcmos                           50    45    40    36    32    30    28    26    24    22
Fnano                           20    18    16    14    12    10     6     4   3.5     3
Cwire (fF)                    0.32  0.28  0.24  0.22  0.19  0.20  0.26  0.32  0.31  0.30
Egate(0.3v) fJ                0.17  0.15  0.13  0.12  0.10  0.11  0.14  0.18  0.17  0.16
Egate(0.3v)/kTln(2) / 1000    59.5  52.3  45.3  40.9  36.6  37.3  49.8  61.0  58.5  56.4
40,000-60,000 kTln(2) per gate at T=300K
Cg,total (FO4)≈0.18fF 22nm CMOS W=2Fcmos
Vdd=0.65 → 13,000 kTln(2) for T=300K
Vdd=0.3 → 2,800 kTln(2)
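The kTln(2) figures can be checked with a short sketch. It assumes the CMOS reference energy is (1/2)·C·Vdd^2, which is not stated on the slide but reproduces its 13,000 and 2,800 figures; the names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
T_KELVIN = 300.0
KT_LN2 = K_BOLTZMANN * T_KELVIN * math.log(2)   # about 2.87e-21 J

def in_kt_ln2(energy_j: float) -> float:
    """Express an energy as a multiple of kT*ln(2) at 300 K."""
    return energy_j / KT_LN2

def half_cv2(c_farad: float, vdd: float) -> float:
    """Assumed switching energy model: E = C * Vdd^2 / 2."""
    return 0.5 * c_farad * vdd ** 2

cmos_at_0v65 = in_kt_ln2(half_cv2(0.18e-15, 0.65))   # 22nm CMOS FO4 load
cmos_at_0v3 = in_kt_ln2(half_cv2(0.18e-15, 0.3))
cmol_first_column = in_kt_ln2(0.17e-15)              # Egate = 0.17 fJ from the table

print(round(cmos_at_0v65), round(cmos_at_0v3), round(cmol_first_column))
```

Under these assumptions the CMOS reference comes out near 13,000 and 2,800 kTln(2), and the first CMOL column near 59,000, so at the same 0.3V supply the CMOL gate energy is roughly 20× the CMOS FO4 figure.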
Reliability:
Can we lower the voltage?
• Lower voltage
+ Lower energy/op
– Less headroom for Vt variation
• More leakage, lower performance
• More bad parts → compensate with sparing
? Subthreshold Operation
• Trade energy for performance
– Fewer electrons defining state
• Higher susceptibility to transient upset
– Thermal, shot, ionizing particles.
Upset Rates
• Lower Voltage to achieve 100W/cm2
– Assume (10% activity)
– V=176mV (1GHz, 22nm, 3Ggates/cm2)
• 1cm2 FIT Rates
– Thermal 10^-6233 [calc. based on Kish PhysLetA 2002]
– Shot 10^-700 [calc. based on Kish FNL2004]
• Increase in upset rate V=700mV to 176mV
– Ionizing Particle upsets increase 20-100×
• [calc. based on Cohen IEDM1999, Degalahal ISQED2004]
– Lack information for absolute grounding.
Suggestions for better sources for reliability calculations appreciated.
Variation and Yield
• Are voltages plausible given variation?
• ASIC: optimistic bound
– Require devices have 0<Vth<Vdd
– Vth(1-kσ)>0 and Vth(1+kσ)<Vdd
– Say Vth=Vdd/2
– For 3Ggates → k≈6-7 → σ≤14%
• With ability to avoid gates
– Let valid range be +/-1σ → 68% of devices
– Good buffer 46% of time → density impact ~ 2
– Tolerates much larger variation
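The numbers on this slide follow from a normal model of Vth variation. The sketch below makes three assumptions that the slide does not spell out: Vth is normally distributed, σ is expressed as a fraction of the nominal Vth, and a buffer contains two devices that must both be in range. Function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def sigma_multiple_for(n_devices: float) -> float:
    """k such that about one device in n_devices falls outside +/- k sigma."""
    tail = 1.0 / (2.0 * n_devices)          # split the allowance over both tails
    return -STD_NORMAL.inv_cdf(tail)

# ASIC bound: every one of 3G devices must satisfy 0 < Vth < Vdd.
k = sigma_multiple_for(3e9)
# With Vth = Vdd/2, Vth(1 + k*sigma) < Vdd reduces to k*sigma < 1.
max_sigma_fraction = 1.0 / k

# With the ability to avoid bad gates: accept only devices within +/- 1 sigma.
device_yield = STD_NORMAL.cdf(1.0) - STD_NORMAL.cdf(-1.0)
buffer_yield = device_yield ** 2            # assumes two devices per buffer

print(f"k = {k:.2f}, sigma <= {max_sigma_fraction:.1%}")
print(f"device yield {device_yield:.0%}, buffer yield {buffer_yield:.0%}")
```

This gives k between 6 and 7, a σ bound of roughly 1/k (the slide's 14% corresponds to k=7), a device yield of about 68% and a buffer yield of about 46%, hence the density impact of about 2.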
Testing and Handling
• Highly defective
• nanoPLA/CMOL/FPNI exploit component-specific mapping to tolerate
• Demands painful paradigm shift
• Assume can run mapping in 4 hrs on 250W workstation
– 1KWhr/chip x $0.15/1KWhr = $0.15
– (2000 Wafers/day x 675 dies/wafer) / 6 = 225,000 Workstations
» But those live at customer site…
• not to mention handling ….
Penn IC Group have ideas to address.
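The mapping-cost estimate is simple arithmetic and can be laid out as a sketch. The divisor of 6 is read here as the number of 4-hour mapping runs one workstation completes per day, which is an interpretation of the slide; variable names are illustrative.

```python
MAPPING_HOURS = 4
WORKSTATION_WATTS = 250
PRICE_PER_KWH = 0.15            # dollars
WAFERS_PER_DAY = 2000
DIES_PER_WAFER = 675

# Energy and electricity cost to map one chip
energy_kwh_per_chip = MAPPING_HOURS * WORKSTATION_WATTS / 1000
cost_per_chip = energy_kwh_per_chip * PRICE_PER_KWH

# Workstations needed to keep up with the fab's daily output
dies_per_day = WAFERS_PER_DAY * DIES_PER_WAFER
chips_per_workstation_per_day = 24 / MAPPING_HOURS
workstations_needed = dies_per_day / chips_per_workstation_per_day

print(f"{energy_kwh_per_chip} kWh/chip, ${cost_per_chip:.2f}/chip")
print(f"{workstations_needed:,.0f} workstations")
```

The electricity cost per chip is small; the 225,000 workstations needed to keep pace with one fab line are the real burden, unless mapping happens at the customer site as the slide suggests.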
Bottleneck Conclusion
• Work in an E-D-A-Reliability trade space
• Density is not the clear limiter
• Big hope is to trade this density to
address other problems
– Power density
– Energy
– Variation
– Reliability
Additional Assumptions By Style
• CMOL
– Pins above metallization
• FPNI
– Nanoscale alignment of lithographic contacts
• Not just parallel lines
• Kuekes says litho rotated (7/12)
• nanoPLA
– Relatively reliable assembly of large number of NWs
– Reasonably controlled production of doped (coded) NWs
Inquisition Report
• If believed could achieve roadmap
– CMOS ASICs provide density @ higher performance
• If need fine-grained programmability
– Variation
– Economics force few unique platforms
• …benefit from inexpensive programmability
– 100-400× density benefit
– Plausible performance (as far as energy allows)
• Maybe 1GHz instead of 10GHz (1/10th the speed)
– Reduce energy through sparing/repair to contain variation
• Will cost post-fabrication handling
Summing Up
• Molecules are not miraculous.
• Miracle of high density is exaggerated.
– Non-existent compared to ASIC
– Closer to 2 orders of magnitude than 3 for FPGA
• Miracle of low energy is a sleight of hand.
– Comes with a curse on reliability.
• Curse of variation falls on all who would dare reach the atomic scale.
– …grace of repair may be all that saves us
– Not unique to CMOL
• Small switches may help.
References
• nanoPLA articles: http://www.seas.upenn.edu/~andre/sublithographic.html
• Likharev, “Hybrid CMOS/Nanoelectronic Circuits (CMOL, FPNI, etc.)”, White Paper for ITRS ERD Working Group 2008
• Strukov and Likharev, FPGA 2006
• Likharev and Strukov, Nanoarch 2007
Backup/Support Slides
FPNI -- Interface
CMOL Interface Pins
Simple Nanowire-Based PLA
NOR-NOR = AND-OR PLA Logic
DeHon&Wilson FPGA 2004
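The NOR-NOR = AND-OR identity holds up to inversions at the inputs and the output, by De Morgan's laws. The sketch below checks this exhaustively for a two-term, two-input-per-term plane; the sizes and function names are illustrative and do not describe the nanoPLA itself.

```python
from itertools import product

def nor(*inputs: bool) -> bool:
    """NOR gate: true only when every input is false."""
    return not any(inputs)

def nor_nor(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Two NOR product terms feeding one NOR output."""
    return nor(nor(a, b), nor(c, d))

def and_or(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Conventional AND-OR (sum of products) form."""
    return (a and b) or (c and d)

# NOR-NOR on x equals the complement of AND-OR on the complemented inputs.
equivalent = all(
    nor_nor(a, b, c, d) == (not and_or(not a, not b, not c, not d))
    for a, b, c, d in product([False, True], repeat=4)
)
print(equivalent)
```

Since a PLA can supply both polarities of each input and invert at the output, the two forms compute the same class of functions.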
Interconnected nanoPLA Tile
DeHon JETC 2005
Langmuir-Blodgett (LB) transfer
• Can transfer tight-packed, aligned SiNWs onto surface
– Maybe grow sacrificial outer radius, close pack, and etch away to control spacing
Transfer aligned NWs to patterned substrate
+ Transfer second layer at right angle
Whang, Nano Letters 2003 v3n7p951