Transcript Slide 1

Circuit Design with Alternative
Energy-Efficient Devices
Elad Alon
Collaborators: Hei Kam, Fred Chen (MIT),
Tsu-Jae King-Liu, Vladimir Stojanovic (MIT),
Dejan Markovic (UCLA), Mark Horowitz (Stanford)
Dept. of EECS, UC Berkeley
CMOS is Scaling, Power Can Not
[Figure: processor power (W), log scale from 0.1 to 1000, vs. year, 1970 to 2010, for processors from the 4004, 8008, 8080, 8086, 8088, 80286, 386DX, 486DX, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, Itanium and Itanium II to the Core 2; predictions (ca. 2000) compared with reality (Core 2). Source: S. Borkar, Intel]
Supply and Threshold Voltages
[Figure: drain current Id vs. gate voltage Vg, annotated “Scaling Vth, Vdd”. Source: Ed Nowak, IBM]
• kT/q doesn’t scale, so lowering Vth increases leakage
• Fixed Vth, Vdd ⇒ power density doesn’t scale well
Alternative Devices to the Rescue?
• Many new devices with S^-1 < 60mV/dec proposed
• But many of these are slow (low Ion)
– And/or have other “weird” characteristics
[Figure: drain current Id vs. gate voltage Vg for a new device with slope S^-1]
• Can these devices reduce energy? If so, at what performance?
– Need to look at the circuits
Outline
• Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions
Processor Power Breakdown
• Most components track performance vs.
energy curves of logic
• Control, Datapath, Clock
• Use proxy circuit to examine tradeoffs
Proxy Circuit for Static Logic
[Figure: chain of Ld logic stages between Vdd and 0V, from Input to Output]
Switching activity factor = α, gate capacitance per stage = C
• t_delay = Ld·C·Vdd / (2·Ion)
• E_dyn + E_leak = α·Ld·C·Vdd^2 + Ld·Ioff·Vdd·t_delay
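The two proxy-circuit expressions can be evaluated directly. The following is a minimal sketch, not the authors' tool: the function names are hypothetical, Ld = 30 and α = 0.01 are borrowed from the later TFET comparison slide, and the values of C, Vdd, Ion and Ioff are assumed purely for illustration.

```python
def proxy_delay(ld, c, vdd, ion):
    """Delay of the Ld-stage chain: t_delay = Ld*C*Vdd / (2*Ion)."""
    return ld * c * vdd / (2.0 * ion)


def proxy_energy(ld, c, vdd, ion, ioff, alpha):
    """Return (E_dyn, E_leak) per cycle for the Ld-stage chain."""
    t_delay = proxy_delay(ld, c, vdd, ion)
    e_dyn = alpha * ld * c * vdd ** 2      # switched-capacitance energy
    e_leak = ld * ioff * vdd * t_delay     # leakage integrated over one cycle
    return e_dyn, e_leak


# Illustrative numbers only (assumed, not taken from the slides' data).
params = dict(ld=30, c=1e-15, vdd=1.0, ion=1e-4, ioff=1e-8, alpha=0.01)
e_dyn, e_leak = proxy_energy(**params)
print(f"t_delay = {proxy_delay(30, 1e-15, 1.0, 1e-4):.3e} s")
print(f"E_dyn = {e_dyn:.3e} J, E_leak = {e_leak:.3e} J")
```

Sweeping Vdd and the device threshold through such a model is one way to trace an energy vs. performance curve for a given device.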
Simple Optimization Rule
• Optimal Ion/Ioff ∝ Ld/α
– Derived in CMOS (Nose and Sakurai)
– But holds for nearly all switching devices
• Pleak/Pdyn ~constant
– ~30-50% across wide range of parameters
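One way to see why the rule takes this form: substituting the proxy circuit's delay into its leakage energy gives E_leak/E_dyn = Ld/(2·α·Ion/Ioff), so holding that ratio roughly constant forces Ion/Ioff to scale with Ld/α. The sketch below only encodes that algebra; the function names and the target ratio of 0.5 are assumptions chosen for illustration.

```python
def leak_to_dyn_ratio(ld, alpha, ion_over_ioff):
    """E_leak / E_dyn = Ld / (2 * alpha * (Ion/Ioff)) for the proxy chain."""
    return ld / (2.0 * alpha * ion_over_ioff)


def required_ion_over_ioff(ld, alpha, target_ratio):
    """Ion/Ioff needed to hold E_leak / E_dyn at target_ratio."""
    return ld / (2.0 * alpha * target_ratio)


# Ld = 30 and alpha = 0.01, with an assumed target leakage/dynamic ratio of 0.5.
ratio_needed = required_ion_over_ioff(ld=30, alpha=0.01, target_ratio=0.5)
print(f"required Ion/Ioff ~ {ratio_needed:.0f}")
```

With these inputs the required Ion/Ioff is about 3000; doubling the logic depth or halving the activity factor doubles it.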
Using the Rule to Compare
[Figures: drain current Id vs. gate voltage Vg for the MOSFET and the “new device”, with Vddx marked; energy vs. performance for the MOSFET and the “new device”]
• Match Ioff by adjusting “VT”
• New device wins if:
Ion,new(Vdd) > Ion,MOS(Vdd)
What Else Matters: Variability
[Figures: relative leakage contribution (0 to 2.5) vs. Vth (0.2 to 0.5); logic circuit with inputs A0 to A5 and outputs O1, O2]
• Leakage:
– E(Ioff) vs. E(Vth)
• Delay:
– Finite Ld
– Cycle time set by
worst-case
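The leakage bullet can be read as follows: because Ioff depends exponentially on Vth, the average leakage under Vth variation exceeds the leakage at the average Vth. The sketch below illustrates this under two assumptions that are not stated on the slide: Vth is Gaussian, and Ioff is proportional to 10^(-Vth/S) with swing S. The function names and numbers are hypothetical.

```python
import math
import random


def mean_leakage_factor(sigma_vth, swing):
    """E[Ioff] / Ioff(E[Vth]) for Gaussian Vth and Ioff ~ 10^(-Vth/swing).

    Closed form is the mean of a lognormal: exp(0.5 * (ln(10) * sigma / S)^2).
    """
    k = math.log(10.0) / swing
    return math.exp(0.5 * (k * sigma_vth) ** 2)


def monte_carlo_factor(sigma_vth, swing, n=200_000, seed=0):
    """Same factor estimated by sampling Vth deviations around the mean."""
    rng = random.Random(seed)
    k = math.log(10.0) / swing
    total = sum(math.exp(-k * rng.gauss(0.0, sigma_vth)) for _ in range(n))
    return total / n


# Assumed values: 30 mV sigma on Vth, 100 mV/dec swing.
print(f"closed form: {mean_leakage_factor(0.030, 0.100):.3f}")
print(f"monte carlo: {monte_carlo_factor(0.030, 0.100):.3f}")
```

Under these assumed values the average leakage comes out roughly 27% above the nominal-Vth leakage, and the penalty grows quickly as the swing gets steeper or the spread gets wider.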
What Else Matters: Wires & Area
[Figure: chain of logic stages between Vdd and 0V, from Input to Output, with wire capacitance Cw at each stage]
• Devices don’t drive just other devices
• Need to look at extrinsic cap (wires) too
– Especially if device has area overhead
Parallelism
[Figure: energy vs. performance for the MOSFET and the “new device”]
Serial: Perf. ∝ f
Parallel: Perf. ∝ 2f, E/op ~const
• If available, parallelism allows slower devices
– Extends energy benefit to higher performance
Minimum Energy
[Figures: drain current Id vs. gate voltage Vg with slope Seff^-1; normalized energy/cycle (0.5 to 2.0) vs. Vdd (0.1 to 0.3 V), annotated “Lower Seff”]
• At low performance or high parallelism:
– Lowest Vdd for required Ion/Ioff wins
• Vdd,min ∝ Seff, Emin ∝ Seff^2
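A short sketch of where these scalings can come from, assuming the device current is exponential in gate voltage over the whole 0 to Vdd swing, so that Ion/Ioff ≈ 10^(Vdd/Seff). That model, the function names, and the example numbers are assumptions for illustration, not taken from the slides.

```python
import math


def vdd_min(seff, ion_over_ioff):
    """Lowest Vdd (V) giving the required Ion/Ioff, for swing seff in V/dec."""
    return seff * math.log10(ion_over_ioff)


def emin_relative(seff, seff_ref):
    """Emin relative to a reference device; energy ~ C*Vdd^2 and Vdd,min ~ Seff."""
    return (seff / seff_ref) ** 2


# Assumed requirement Ion/Ioff = 3000; compare 60 mV/dec with 30 mV/dec.
for s in (0.060, 0.030):
    print(f"Seff = {s * 1e3:.0f} mV/dec -> Vdd,min = {vdd_min(s, 3000.0):.3f} V, "
          f"Emin = {emin_relative(s, 0.060):.2f}x the 60 mV/dec value")
```

Under this model, halving Seff halves Vdd,min and cuts the minimum energy to one quarter.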
Example: Tunneling FET
[Figure: device cross-section with P source, gate, and N drain]
Ion ≈ A(Vgs+VT)exp[-B/(Vgs+VT)] [1]
[1] J. Chen et al., IEEE Electron Device Lett., vol. EDL-8, no. 11, pp. 515–517, Nov. 1987.
[Figure: drain current Id (A/mm), log scale from 1.E-15 to 1.E-05, vs. gate voltage Vg (0 to 1 V)]
• Band-to-band tunneling device
– Steep transition (<60mV/dec) at low current
– Low Ion(<~100μA)
• Assume work function can be tuned
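The Ion expression on this slide implies a swing that is steepest at low current. Differentiating ln(Ion) with respect to x = Vgs + VT gives a local swing of ln(10)·x²/(x + B) volts per decade, which shrinks as x approaches zero. The sketch below evaluates both quantities; the function names and the values of A and B are hypothetical placeholders, not fitted device parameters.

```python
import math


def tfet_ion(vgs, a, b, vt=0.0):
    """Ion ~ A*(Vgs+VT)*exp(-B/(Vgs+VT)); returns 0 when Vgs+VT <= 0."""
    x = vgs + vt
    if x <= 0.0:
        return 0.0
    return a * x * math.exp(-b / x)


def tfet_swing(vgs, b, vt=0.0):
    """Local swing dVg/dlog10(I) in V/dec: ln(10) * x^2 / (x + B)."""
    x = vgs + vt
    return math.log(10.0) * x * x / (x + b)


# Placeholder parameters (assumed): A = 1e-4, B = 2 V.
for vgs in (0.1, 0.2, 0.5):
    print(f"Vgs = {vgs:.1f} V: I = {tfet_ion(vgs, 1e-4, 2.0):.3e}, "
          f"swing = {tfet_swing(vgs, 2.0) * 1e3:.1f} mV/dec")
```

With these placeholder values the swing is below 60 mV/dec at small Vgs + VT and degrades as the current rises, which matches the qualitative behavior listed in the bullets.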
Energy-Performance Tradeoff
[Figure: energy (J), 1.E-17 to 1.E-14, vs. performance (GHz), 1.E-02 to 1.E+00, for TFET and MOSFET; 30 stages, α=0.01]
• Competitive with subthreshold CMOS
• TFETs promising below ~100MHz
Outline
• Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions
Nano-Electro-Mechanical Relay
[Figure: conductance vs. gate voltage Vg [V], with Gon, Vrl, and Vpi marked]
• Based on mechanically making and
breaking contact
– No leakage, perfectly abrupt transition
• Reliability is the key challenge
Circuit Design with Relays
[Figure: CMOS and relay implementations of a logic circuit with inputs A0 to A5 and outputs O1, O2]
• CMOS delay set by electrical time constant
– Distribute logical/electrical effort over many stages
• Relay: mechanical delay (~10ns) >> electrical τ (~1ps)
– Implement logic as a single complex gate
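A rough way to quantify the single-complex-gate argument: if every relay in a series stack switches in parallel, the stack pays one mechanical delay plus the electrical settling of the stack, whereas a cascade of gates pays one mechanical delay per stage. The sketch below uses the slide's ~10ns and ~1ps figures and assumes an Elmore-style n(n+1)/2 electrical delay for the series stack; that delay model and the function names are assumptions, not the authors' analysis.

```python
T_MECH = 10e-9   # mechanical switching delay (~10 ns, from the slide)
TAU_ELEC = 1e-12  # electrical time constant per relay (~1 ps, from the slide)


def cascaded_delay(n_stages):
    """n gates in sequence: each stage waits for the previous relay to move."""
    return n_stages * (T_MECH + TAU_ELEC)


def single_gate_delay(n_series):
    """One complex gate: all relays move at once, then the stack settles.

    Electrical settling uses an assumed Elmore-style n*(n+1)/2 * tau estimate.
    """
    return T_MECH + 0.5 * n_series * (n_series + 1) * TAU_ELEC


n = 30
print(f"cascaded:    {cascaded_delay(n) * 1e9:.2f} ns")
print(f"single gate: {single_gate_delay(n) * 1e9:.2f} ns")
```

Even with the pessimistic quadratic estimate, the electrical term for a stack of 30 stays well under one nanosecond, so the mechanical delay dominates and the single-gate version is close to 30 times faster than the cascade.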
Relay Energy-Perf. Tradeoff
• No leakage
• Stack of 30 series relays
– Vdd,min set only by functionality (surface force)
[Figure: energy (J) vs. performance (GHz) for TFET, MOSFET, and relay]
• How about real logic circuits?
Relay-Based Adder
• Manchester carry chain
• Ripple carry
– Cascade full adder cells
• N-bit adder still 1 mechanical delay
[Figure: relay full-adder cell with inputs A, B, Cin and outputs Sum, Cout]
Adder Energy-Delay
• Compare vs. optimal
CMOS adder
• ~10-40x slower
– Low Rcont not critical
• ~10-100x lower E/op
– Lower Cg
– Fewer devices, all minimum size
– Lower Vdd,min
Parallelism and Area
• If parallelism available, can trade area for
throughput
• Competing with sub-threshold CMOS
– Area-overhead bounded
Power Breakdown Revisited
[Figures: two pie charts over Control + Datapath, Memory, Clock, and I/O. “Typical Power Breakdown for Embedded Processor”: slices of 45%, 25%, 25%, 5%. “Power Breakdown: core implemented in relays”: slices of 77%, 15%, 5%, 3%]
• Better logic ⇒ “uncore” power dominant
• Need to analyze (and leverage) devices for
entire system…
– Relay DRAM or NVM (not SRAM)?
– Relay ADC/DACs?
Outline
• Simple Energy-Performance Analysis
• Circuit Design with Relays
• Conclusions
Summary
• New devices need circuit level analysis
• Ion/Ioff set by logic depth, activity factor
• Don’t forget about variability, wires
• Tailor circuit style to the device
• If available, parallelism may allow
slower (low Ion) devices
• Don’t forget about the rest of the system
Good News/Bad News
• Parallelism still available in CMOS
• But eventually limited by Emin
• Opportunity for new devices…
• At least in sub-100MHz applications
[Figure annotations: “Today: Parallelism lowers E/op”; “Future: Parallelism doesn’t help”]
Acknowledgements
• Berkeley Wireless Research Center
• NSF
• DARPA
• FCRP