Keeping Hot Chips Cool
Circuits R-US
Ruchir Puri, Leon Stok, Subhrajit Bhattacharya
IBM T.J. Watson Research Center
Yorktown Heights, NY
So, What's Going On?
[Figure: power density (W/cm^2, log scale, 1E-8 to 1E+4) versus gate length (microns, 0.01 to 10). Curves: Active Power and Subthreshold Power, with a shrinking margin between them.]
• At the 65nm node, static power is equal to active power
• Clock distribution accounts for half of active power
Why Can't We Keep Scaling Vt?
[Figure: Device Leakage vs Delay. Leakage (nA/um, log scale, 1 to 1000) and delay (ps, 0 to 40) versus threshold voltage (0 to 0.4).]
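The trade-off behind the Device Leakage vs Delay plot can be illustrated with two textbook first-order models. This is a minimal sketch and not the data behind the slide: the subthreshold swing (85 mV/decade), the alpha-power-law exponent (1.3), the 1.0 V supply and the 0.3 V reference threshold are all assumed values, and the function names are hypothetical.

```python
# Illustrative first-order models; every constant here is an assumption,
# not a value taken from the slide.
S_MV_PER_DECADE = 85.0   # assumed subthreshold swing
ALPHA = 1.3              # assumed alpha-power-law exponent
VDD = 1.0                # assumed supply voltage (V)


def relative_leakage(vt, vt_ref=0.3):
    """Subthreshold leakage relative to vt_ref: 10x per swing of Vt reduction."""
    return 10 ** ((vt_ref - vt) * 1000.0 / S_MV_PER_DECADE)


def relative_delay(vt, vt_ref=0.3):
    """Gate delay relative to vt_ref, using delay ~ Vdd / (Vdd - Vt)^alpha."""
    return ((VDD - vt_ref) / (VDD - vt)) ** ALPHA


for vt in (0.4, 0.3, 0.2, 0.1):
    print(f"Vt={vt:.1f} V  leakage x{relative_leakage(vt):8.1f}  "
          f"delay x{relative_delay(vt):.2f}")
```

Under these assumptions, each 0.1 V reduction in Vt buys a modest delay improvement while multiplying leakage by more than an order of magnitude, which is the asymmetry the plot conveys.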
Low Power Opportunities
[Figure: Power4 Timing Histogram. Late-mode timing checks (thousands, 0 to 200) versus timing slack (psec, -40 to 280), with 5%, 10%, 15% and 20% markers and the annotation "Exploiting positive slacks".]
• Most power reduction techniques exploit this positive slack.
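One common way to exploit positive slack is to move non-critical gates to a slower, lower-leakage high-Vt cell. The sketch below is a deliberately simplified illustration of that idea, not the flow used on Power4: the gate list, the delay penalties and the `assign_high_vt` helper are hypothetical, and each gate's slack is treated as independent.

```python
def assign_high_vt(gates, guard_band_ps=0.0):
    """Return names of gates whose slack absorbs the high-Vt delay penalty.

    gates: iterable of (name, slack_ps, penalty_ps) tuples (hypothetical data).
    Simplification: each gate's slack is treated as independent. A real flow
    would rerun timing after every swap because gates share paths.
    """
    swapped = []
    for name, slack_ps, penalty_ps in gates:
        # Swap only if the remaining slack still meets the guard band.
        if slack_ps - penalty_ps >= guard_band_ps:
            swapped.append(name)
    return swapped


gates = [
    ("g1", 120.0, 15.0),  # large positive slack
    ("g2", 10.0, 15.0),   # slack smaller than the penalty
    ("g3", -5.0, 12.0),   # already failing timing
    ("g4", 40.0, 20.0),
]
print(assign_high_vt(gates))
print(assign_high_vt(gates, guard_band_ps=25.0))
```

The guard band leaves some slack unspent so that later design changes do not push the swapped gates into negative slack.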
Low Power Levers
• Structural Techniques
    - Voltage islands
    - Multi-threshold devices
    - Multi-oxide devices
    - Minimize capacitance by custom design
    - Power-efficient circuits
    - Parallelism in micro-architecture
• Dynamic Techniques
    - Clock gating
    - Power gating
    - Variable frequency
    - Variable voltage supply
    - Variable device threshold
Outline
• Voltage Islands (Active Power)
• Clock & Latch Optimization (Clock Power)
• Power Gating (Leakage Power)
Minimizing Active Power: Coarse-Grained Voltage Islands
• Trade off power for delay by running functional blocks at different voltages
• Can use a mix of low and high Vt to balance performance and leakage
• Switch off inactive blocks to reduce leakage power
[Figure: voltage-island block diagram. Labels: Vdd0, Vdd1, Vdd2, SWITCH, High VT, LOGIC, IP 1, IP 2, Power Management Unit.]
E.g.: Telecom ASIC with 1.0/1.2 V islands saved 16% active power and 50% standby power.
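The voltage-island saving follows from the quadratic dependence of switching power on supply voltage. As a rough sketch using the standard relation P = a * C * Vdd^2 * f, the code below estimates the active-power saving when part of the switched capacitance moves to the lower supply. The function names and the fraction of capacitance placed on the low-voltage island are assumptions for illustration; the slide does not state how the telecom ASIC was partitioned.

```python
def dynamic_power(activity, cap_farads, vdd, freq_hz):
    """Switching power P = a * C * Vdd^2 * f, in watts."""
    return activity * cap_farads * vdd ** 2 * freq_hz


def island_saving(fraction_on_low, vdd_high, vdd_low):
    """Fractional active-power saving at a fixed frequency.

    fraction_on_low is the share of switched capacitance (weighted by
    activity) that moves from vdd_high to vdd_low. It is an assumed input.
    """
    return fraction_on_low * (1.0 - (vdd_low / vdd_high) ** 2)


# Whole design moved from 1.2 V to 1.0 V, then only half of it.
print(f"{island_saving(1.0, 1.2, 1.0):.1%}")
print(f"{island_saving(0.5, 1.2, 1.0):.1%}")
```

Lowering the supply also slows the affected logic, so in practice only blocks with timing margin are candidates for the low-voltage island.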
Fine-Grained Voltage Islands
[Figure: PowerPC 405 core with a secondary power drop. Vddl = 1.2V, Vddh = 1.5V.]
• No timing degradation and no area increase for the core!
Minimizing Clock Power: Local Clock Buffer - Latch Clustering
• Clocks consume a large amount of power in high-performance designs
    - A large portion of that power goes to the last stage of the clock tree
• Minimize the capacitive loading on local clock buffers by clustering latches around them
    - Tradeoff between latch placement flexibility and clock power savings
    - The reduction in clock skew between capturing and launching latches compensates for the loss in latch placement flexibility
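The clustering trade-off can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions, not the placement algorithm used by the authors: clock wire capacitance is approximated by the total Manhattan distance from a local clock buffer (LCB) to its latches (star routing), and placement flexibility is modeled by a hypothetical `max_move` budget that limits how far each latch may be pulled toward the LCB.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def clock_wire_length(latches, lcb):
    """Star-route estimate: sum of LCB-to-latch Manhattan distances."""
    return sum(manhattan(latch, lcb) for latch in latches)


def cluster_toward(latches, lcb, max_move):
    """Move each latch toward the LCB by at most max_move Manhattan units."""
    moved = []
    for x, y in latches:
        dx, dy = lcb[0] - x, lcb[1] - y
        # Spend the movement budget on x first, then on y.
        step_x = max(-max_move, min(max_move, dx))
        remaining = max_move - abs(step_x)
        step_y = max(-remaining, min(remaining, dy))
        moved.append((x + step_x, y + step_y))
    return moved


latches = [(0, 0), (10, 0), (0, 8), (6, 6)]   # hypothetical placement
lcb = (5, 4)                                  # hypothetical LCB location

before = clock_wire_length(latches, lcb)
after = clock_wire_length(cluster_toward(latches, lcb, max_move=4), lcb)
print(before, after)
```

A larger `max_move` shortens the clock wiring further but moves latches away from the positions the data paths would prefer, which is the flexibility cost named above.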
Clock Power Savings
[Figure: % capacitance savings (0 to 70), wire and total, per clock net for nets c1_0 to c1_12 and c2_0 to c2_12.]
• Reduces total capacitance on the local clock buffer by 25%
• Direct savings in clock power in the Random Control Logic
Minimizing Leakage Power: Power Supply Gating
[Figure: Logic Block with a Footer Switch controlled by SLEEP.]
• Leakage power is now more than switching power
    - Limits the performance of microprocessors
• Power gating is one of the most effective ways of minimizing leakage power
    - Cut off power to inactive units/components
    - Dynamic/workload-based power gating
    - Reduces both gate and sub-threshold leakage
    - 20x to 2000x reduction in leakage with little or no cycle time penalty
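Power Gating Concept: Performance on Demand
[Figure: two configurations of a chip with processors P1 to P4 and L2, one with dedicated units off and one with dedicated units on.]
• Dedicated units off: more power available to scalar units, for higher SPEC performance
• Dedicated units on: dedicated units available, for higher application performance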
Normal Operation Mode
[Figure: CORE between VDDL and VGND, with the footer between VGND and GNDL. Footer IDS vs VDS curves for VGS = VDD and VGS = 0 V, marking IDS,MAX, IACTIVE and VDS,LINEAR.]
To limit performance degradation, the voltage drop across the SLEEP transistor should be minimized to reduce active leakage current. This requires sizing up the footer device.
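The sizing requirement in normal operation can be sketched with a simple IR-drop estimate. This is a minimal illustration, assuming the footer stays in its linear region (as the VDS,LINEAR marker suggests) so that its on-resistance scales inversely with width. The resistance-per-width figure, the active current and the function names are hypothetical placeholders, not measured values.

```python
def virtual_ground_drop_v(i_active_a, r_on_ohm_um, width_um):
    """Voltage on the virtual ground: I_active * R_on, with R_on = r / W."""
    return i_active_a * r_on_ohm_um / width_um


def footer_width_um(i_active_a, r_on_ohm_um, max_drop_v):
    """Minimum footer width that keeps the drop within max_drop_v."""
    return i_active_a * r_on_ohm_um / max_drop_v


# Hypothetical numbers: 2 mA active current, 1000 ohm*um on-resistance,
# 10 mV allowed on the virtual ground.
width = footer_width_um(0.002, 1000.0, 0.010)
print(f"required width: {width:.0f} um")
print(f"drop at half that width: "
      f"{virtual_ground_drop_v(0.002, 1000.0, width / 2) * 1000:.0f} mV")
```

Halving the footer width doubles the drop on the virtual ground, which is why the active-mode requirement pushes toward a larger footer.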
Sleep Mode
[Figure: CORE between VDDL and VGND, with the footer between VGND and GNDL. Footer IDS vs VDS curves for VGS = VDD and VGS = 0 V, marking IDS,MAX.]
During sleep mode, all of the internal capacitive nodes and the VGND node are charged up to near VDD. This requires sizing down the footer device to reduce standby leakage.
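Wake-Up Mode
[Figure: CORE between VDDL and VGND, with the footer between VGND and GNDL. Footer IDS vs VDS curves for VGS = VDD and VGS = 0 V, marking IDS,MAX, ITURN_ON and Rs.]
When the SLEEP transistor is turned on, the maximum instantaneous current can flow. This requires sizing up the footer device.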
Sleep / Wake / Run State Control
[Figure: state diagram with states off, wake and run. Labels: enter sleep state, exit sleep state, enable fence, fence deassert, assert disable, assert discharge, charge, wake/run.]
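The control sequence can be sketched as a small state machine. The exact transitions in the slide's diagram are not recoverable from this transcript, so this is only an assumed ordering based on the conventional practice of fencing (isolating) a block's outputs before its power is cut and releasing the fence only after the virtual ground has discharged. The class, its fields and the `wake_cycles` parameter are hypothetical.

```python
from enum import Enum


class State(Enum):
    RUN = "run"
    OFF = "off"
    WAKE = "wake"


class PowerGateController:
    """Toy sleep/wake/run sequencer; the ordering is an assumption."""

    def __init__(self, wake_cycles=1):
        self.state = State.RUN
        self.fence = False       # True isolates the block's outputs
        self.footer_on = True    # True connects virtual ground to ground
        self.wake_cycles = wake_cycles
        self._countdown = 0

    def step(self, sleep_request):
        """Advance one cycle and return the new state."""
        if self.state is State.RUN and sleep_request:
            # Enter sleep: fence the outputs, then turn the footer off.
            self.fence = True
            self.footer_on = False
            self.state = State.OFF
        elif self.state is State.OFF and not sleep_request:
            # Exit sleep: turn the footer on and let virtual ground discharge.
            self.footer_on = True
            self._countdown = self.wake_cycles
            self.state = State.WAKE
        elif self.state is State.WAKE:
            self._countdown -= 1
            if self._countdown <= 0:
                # Discharge finished: drop the fence and resume.
                self.fence = False
                self.state = State.RUN
        return self.state


ctrl = PowerGateController(wake_cycles=2)
trace = [ctrl.step(req).value for req in (True, True, False, False, False)]
print(trace)
```

Keeping the fence asserted through the wake state prevents partially powered logic from driving invalid values into neighboring blocks.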
[Figure: Power Supply Current (leakage), gxpsi_channel_mac. Current (mA, 0 to 50) versus time (nsec, 0 to 10), with phases labeled run (idle), charge cycles, sleep, and discharge cycle (wake).]
Footer Selection and Sizing
[Figure: Power Gate Area vs. Frequency and Leakage Reduction. Frequency loss (%) and leakage (% of reference leakage, Reg. Vth) versus footer gate width (um, 0 to 350). Leakage reduction labels: 15.5x, 20x, 25x, 33x, 50x, 100x. Annotations: "10x-20x Leakage Reduction" and "< 1% Frequency Loss".]
Power vs Performance Tradeoff
[Figure: 130nm hardware measurements.]
• ~8% performance degradation due to the sleep transistor with 1% area overhead
• Target specification: 250MHz at 0.9V to 500MHz at 1.4V
• 1% footer size is used for a 2-stage pipelined 40-bit ALU
Sleep Transistor Sizing and Performance
[Figure: 130nm hardware measurements. Annotations: "Less Than 2% Performance Degradation" and "More Than 8% Performance Degradation".]
Leakage Power Reduction
[Figure: 130nm hardware measurements.]
• Leakage suppression using VDD scaling: ~8.4x
• Leakage suppression using power gating structure with 1% area overhead: ~2000x
Physical Design: External Footer Switch
[Figure: Macro/Core layout. Labels: Global Grid (GND), Virtual Grid (VGND), M1 metal, M2 metal, Footer Switch Location.]
Physical Design: Internal Footer Switch
[Figure: GND, VDD and VGND rails on M1 metal and M2 metal, with Footer Locations marked.]
• Internal fine-grained power gating is more efficient in addressing:
    - Electro-Migration and Current Delivery
Ground Redistribution
[Figure: cross-section of the ground path. Labels: Global ground, Virtual ground, M3, V2, M2, V1, M1, Contact, Logic Device, Footer Cell.]
The 'real' chip-level ground distribution is M4 and above. It is unchanged by power gating.
This part of the redistribution is electrically similar to an unmodified distribution.
Physical Design: Footer Insertion
[Figure: layout without footers and with footers, showing footer rows.]
Power Gating in High-Performance
[Figure: layout comparing Non-gated Logic and Gated Logic.]
• Gated and non-gated logic have identical width
• 5% total area overhead for power gating
• 20X leakage reduction
• <1% performance degradation
Power Gating: Footer Area Overhead
[Figure: % WC area overhead (0 to 14) per macro (1 to 15), Custom and RLM, at 10mV virtual ground. Annotated values: 10.4% and 5.7%.]
Conclusions
• Power is the limiting factor in traditional CMOS scaling and must be dealt with aggressively
    - Controlling leakage is crucial for future scaling
• Power gating and voltage islands are effective techniques to minimize leakage and active power
• Special consideration must be given to clock distribution in high-performance designs to minimize clock power
• To keep hot chips cool, a holistic power minimization approach across the whole design stack is required, which must include:
    - Device-level techniques
    - Circuit-level techniques
    - System-level power management