Interconnect power


Modeling and Optimization of VLSI Interconnect
049031
Lecture 6: Interconnect power
Avinoam Kolodny
Konstantin Moiseev
Outline
 Interconnect power modeling
• Definition
• Activity factor (AF) and signal probability (SP), and the relations between them
• Cross-coupling power; Miller Coupling Factor (MCF) for timing and power
• Relation between MCF and AF
• AF and SP generation
 Interconnect power breakdown
• Interconnect length distribution
• Local and global interconnects and their power
• Clock power
• Interconnect power as a share of total power
 Interconnect power prediction
• Interconnect length prediction
o Rent’s rule
o Donath’s model
• Fanout prediction
1 Google search = ?
 Same energy as an 11-watt light bulb running for 1 hr
 Emits 7 g of CO2
 There are 0.4B Google searches daily
Adapted from Muhammad Abozaed, Intel
So why is power important?
 Mobile – battery life
 Reliability – power density
 User experience – skin temperature
 Servers – cooling costs, environmental heating
[Figure: processor power (Watts, log scale from 0.1 to 1000) vs. year (1970 to 2020) for the 8080, 8086, 386, Pentium® and Pentium® 4 processors, with the projection "1000's of Watts?"]
Electrical Energy
 Energy is defined as the ability to do work
 Electrical energy is energy stored in an electric field or
transported by an electric current
 Electrical energy can be:
• Dissipated as heat by an electric current flowing through a resistor
• Stored in a capacitor
• Transformed to magnetic field energy
 The work performed by a current I flowing through a section with voltage difference V during time T is:
$$E = \int_0^T I(t)\,V(t)\,dt$$
Power
 Power is work performed per unit time
 Measured in Watts
 In VLSI, power is usually either consumed or dissipated:
• Consumed from the source
• Dissipated by resistors (converted to heat)
 The average power dissipated by a current I flowing across a voltage difference V during time T is:
$$P = \frac{1}{T}\int_0^T I(t)\,V(t)\,dt$$
Power dissipation sources
 Power dissipation:
• Dynamic power
• Static power
• Short-circuit power
Energy dissipation in RC circuit
 First stage: charging the capacitor
• Capacitor current:
$$I_C = C\,\frac{dV_C}{dt}$$
• Energy stored in the capacitor:
$$E_C = \int_0^T V_C(t)\,I(t)\,dt = C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = C\int_0^{V_{DD}} V_C\,dV_C = \frac{C V_{DD}^2}{2}$$
• Energy drawn from the source:
$$E_S = \int_0^T V_{DD}\,I(t)\,dt = C V_{DD}\int_0^{V_{DD}} dV_C = C V_{DD}^2$$
Assumption: $V_C(t = T) = V_{DD}$
• Energy dissipated by the resistor (converted to heat):
$$E_R = E_S - E_C = \frac{C V_{DD}^2}{2}$$
[Figure: source VDD charging capacitor C (voltage V_C) through resistor R (voltage V_R) with current I]
Energy dissipation in RC circuit
 Second stage: discharging the capacitor
• Capacitor current:
$$I_C = C\,\frac{dV_C}{dt}$$
• Energy freed by the capacitor (the discharge current is $I = -I_C$):
$$E_C = \int_0^T V_C(t)\,I(t)\,dt = -C\int_0^T V_C(t)\,\frac{dV_C}{dt}\,dt = -C\int_{V_{DD}}^{0} V_C\,dV_C = \frac{C V_{DD}^2}{2}$$
• Energy drawn from the source: $E_S = 0$
Assumption: $V_C(t = T) = 0$
• Energy dissipated by the resistor (converted to heat):
$$E_R = E_C = \frac{C V_{DD}^2}{2}$$
[Figure: capacitor C (voltage V_C) discharging through resistor R (voltage V_R) with current I]
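The charging result can be checked numerically. The sketch below is only an illustration, not part of the lecture: it assumes an ideal step source and the standard exponential charging current, and the function name, the 1 pF load and the 1 V supply are hypothetical. It integrates the source and resistor energies and shows that the resistor dissipates about half of what the source delivers, whatever the value of R.

```python
import math

def rc_charge_energies(r_ohm, c_farad, vdd, n_steps=20_000, n_tau=20.0):
    """Numerically integrate one charging transient of a series RC circuit.

    Returns (e_source, e_capacitor, e_resistor) in joules.
    """
    tau = r_ohm * c_farad
    dt = n_tau * tau / n_steps
    e_source = 0.0
    e_resistor = 0.0
    for step in range(n_steps):
        t = (step + 0.5) * dt                    # midpoint rule
        i = (vdd / r_ohm) * math.exp(-t / tau)   # charging current I(t)
        e_source += vdd * i * dt                 # integral of VDD * I(t) dt
        e_resistor += i * i * r_ohm * dt         # integral of I(t)^2 * R dt
    e_capacitor = e_source - e_resistor          # the remainder is stored in C
    return e_source, e_capacitor, e_resistor

C, VDD = 1e-12, 1.0                              # hypothetical 1 pF load, 1 V supply
for r in (100.0, 10_000.0):
    e_s, e_c, e_r = rc_charge_energies(r, C, VDD)
    print(f"R = {r:>7.0f} ohm: E_S = {e_s:.3e} J, E_C = {e_c:.3e} J, E_R = {e_r:.3e} J")
```

Changing R only changes how long the transient lasts; the split between stored and dissipated energy stays the same.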
Dynamic power dissipation in VLSI
 For two capacitor switches (one charge and one discharge), the energy dissipated is $C V_{DD}^2$
 For two switches of the signal during time T (the clock period), the average power dissipation is
$$P = \frac{C V_{DD}^2}{T} = C V_{DD}^2 f$$
 If the signal switches $2\alpha$ times on average during time T, then the average power dissipation is
$$P = \alpha\,C V_{DD}^2 f$$
 $\alpha$ is called the activity factor
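As a quick illustration of the formula, a minimal sketch follows. The function name and the numeric values (100 fF, 1 V, 1 GHz) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def dynamic_power(alpha, c_farad, vdd, f_hz):
    """Average dynamic power P = alpha * C * VDD^2 * f, in watts."""
    return alpha * c_farad * vdd ** 2 * f_hz

# Hypothetical net: 100 fF total capacitance, 1.0 V supply, 1 GHz clock.
p_clock = dynamic_power(alpha=1.0, c_farad=100e-15, vdd=1.0, f_hz=1e9)
p_data = dynamic_power(alpha=0.1, c_farad=100e-15, vdd=1.0, f_hz=1e9)
print(f"clock-like net: {p_clock * 1e6:.1f} uW, data net: {p_data * 1e6:.1f} uW")
```

With the same capacitance, the net with the activity factor of a clock dissipates ten times more than the data net with an activity factor of 0.1.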
Dynamic power contributors
 Dynamic power dissipation:
$$P = \alpha\,C V_{DD}^2 f$$
 The capacitance is contributed by three elements:
• Self-capacitance and cross-coupling capacitance
$$C = C_{lower} + C_{upper} + C_{side,1} + C_{side,2}$$
$$P = \alpha\,(C_{area+fringe} + C_{coupling})\,V_{DD}^2 f$$
[Figure: cross-section of a wire on Layer 2 between Layer 1 and Layer 3, showing the area and fringe capacitance (C_lower, C_upper) and the coupling capacitance (C_side) to the two side neighbors]
Coupling capacitance calculation
 The coupling capacitance value depends on the neighbor wires:
• For quiet neighbors (tied to VDD or ground):
$$C_{side} = \varepsilon\,\frac{L\,T}{S}$$
• For switching neighbors, the capacitance depends on the switching direction:
o Power calculation by the equivalent circuit method
o Power calculation by application of Miller’s theorem
[Figure: two parallel wires of length L and thickness T separated by spacing S]
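The quiet-neighbor expression is the parallel-plate estimate, which can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, the default relative permittivity of 3.9 (silicon dioxide) is an assumption rather than a value from the slides, and fringing fields are ignored.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def side_capacitance(length_m, thickness_m, spacing_m, eps_r=3.9):
    """Parallel-plate estimate C_side = eps * L * T / S, in farads."""
    return eps_r * EPS0 * length_m * thickness_m / spacing_m

# Hypothetical 1 mm wire, 0.2 um thick, at two different spacings.
c_tight = side_capacitance(1e-3, 0.2e-6, 0.2e-6)
c_spaced = side_capacitance(1e-3, 0.2e-6, 0.4e-6)
print(f"S = 0.2 um: {c_tight * 1e15:.1f} fF, S = 0.4 um: {c_spaced * 1e15:.1f} fF")
```

Doubling the spacing halves the estimate, which is why spacing is a natural knob for reducing coupling power.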
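Equivalent circuit method
 Equivalent circuit for two coupled lines
 Simplest case: one wire is switched from 0 to VDD; the neighbor is quiet and tied to ground; R1 = R2
 Energy dissipated by each resistor (wire) in this case is
$$E = \frac{C_c V_{DD}^2}{4}$$
 Total energy dissipated is
$$E = \frac{C_c V_{DD}^2}{2}$$
[Figure: sources V1 and V2 driving resistors R1 and R2, which are joined by the coupling capacitor Cc; in the simplest case the first source is VDD and the second is ground]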
Equivalent circuit method
 For all cases of one quiet wire and one switched wire, the same results as in the previous slide are obtained
 Second case: both wires are switched simultaneously from 0 to VDD
• The current through the resistors is
$$I = \frac{V_{C_c}}{R_1 + R_2} = 0$$
($V_{C_c}$ is the voltage on the capacitor)
• No power dissipation in this case!
[Figure: the coupled-line circuit (R1, Cc, R2) before the transition, and after it with both sources at VDD]
Equivalent circuit method
 Third case: both wires are switched simultaneously in opposite directions
 Current in the circuit:
$$I = \frac{V_{DD} - V_{C_c}}{R_1 + R_2}$$
($V_{C_c}$ is the capacitor voltage)
 Energy consumed from the second source is zero (the voltage of the source is zero)
 Energy consumed from the first source:
$$E = 2 C_c V_{DD}^2$$
 No energy change of the capacitor
 This means that all the energy is dissipated by the resistors
 Each resistor dissipates $C_c V_{DD}^2$, for a total of $2 C_c V_{DD}^2$
[Figure: the coupled-line circuit (R1, Cc, R2) before the transition, with VDD on the second wire, and after it, with VDD on the first wire]
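Miller’s theorem
 Z is an impedance connected between nodes $V_x$ and $V_y$; it is replaced by $Z_1$ from $V_x$ to ground and $Z_2$ from $V_y$ to ground:
$$Z_1 = \frac{Z}{1 - A_V}, \qquad Z_2 = \frac{Z}{1 - \frac{1}{A_V}}, \qquad A_V = \frac{V_y}{V_x}$$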
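The split can be written as a small helper. This is a sketch under assumptions: the function name is hypothetical, the impedance is treated as a plain real number, and the limiting cases A_V = 1 and A_V = infinity are handled explicitly rather than left to floating-point division.

```python
import math

def miller_impedances(z, a_v):
    """Split a floating impedance z between nodes x and y into grounded Z1 and Z2.

    a_v is the voltage ratio Vy / Vx; math.inf stands for a quiet x node.
    """
    if a_v == 1:                  # both nodes move together: no current through z
        return math.inf, math.inf
    if math.isinf(a_v):           # only the y node moves
        return 0.0, z
    z1 = z / (1 - a_v)
    z2 = 0.0 if a_v == 0 else z / (1 - 1 / a_v)
    return z1, z2

for a_v in (0, 1, math.inf, -1):
    print(a_v, miller_impedances(1.0, a_v))
```

For a coupling capacitor, halving the impedance (the A_V = -1 case) corresponds to each wire seeing twice the coupling capacitance to ground.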
Usage of Miller’s theorem for coupling capacitance and power calculations
 $A_V = 0$ (only $V_x$ switches from 0 to VDD): $Z_1 = Z$, $Z_2 = 0$; $P_{total} = \frac{C_C V_{DD}^2}{2}$
 $A_V = 1$ (both wires switch from 0 to VDD): $Z_1 = Z_2 = \infty$ (disconnected); $P_{total} = 0$
 $A_V = \infty$ (only $V_y$ switches from 0 to VDD): $Z_1 = 0$, $Z_2 = Z$; $P_{total} = \frac{C_C V_{DD}^2}{2}$
 $A_V = -1$ (the wires switch in opposite directions): $Z_1 = Z_2 = \frac{Z}{2}$, so each wire sees $2C_C$ to ground; $P_{total} = 2 C_C V_{DD}^2$
Observations
 Miller’s theorem gives the same total power dissipation as the equivalent circuit method; however, its results for the power dissipation of each individual wire are inaccurate
 The total power dissipation calculated by both methods is as follows:
• For a one-wire switch:
$$P_{total} = \frac{C_C V_{DD}^2}{2}$$
• For a simultaneous switch in the same direction, there is no power dissipation
• For a simultaneous switch in opposite directions:
$$P_{total} = 2 C_C V_{DD}^2$$
Miller factor for power
 The Miller factor accounts for the change in the effective coupling capacitance due to switching
 The nominal coupling capacitance is multiplied by the Miller Coupling Factor (MCF) to obtain the effective capacitance:
• For one-wire switching, MCF = 1:
$$P = \frac{(C_{area+fringe} + C_{coupling})\,V_{DD}^2}{2}$$
• For switching in the same direction, MCF = 0:
$$P = \frac{C_{area+fringe}\,V_{DD}^2}{2}$$
• For switching in opposite directions, MCF = 4:
$$P = \frac{C_{area+fringe}\,V_{DD}^2}{2} + 2\,C_{coupling} V_{DD}^2$$
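The three cases collapse into one expression once the coupling capacitance is scaled by the MCF. The sketch below is illustrative only: the dictionary keys, the function name and the capacitance values are hypothetical, and the returned quantity is the per-event dissipation that the slide denotes P.

```python
MCF_FOR_POWER = {
    "one_wire": 1,              # only one of the two wires switches
    "same_direction": 0,        # both wires switch the same way
    "opposite_directions": 4,   # both wires switch in opposite ways
}

def switching_dissipation(c_area_fringe, c_coupling, vdd, case):
    """Dissipation for one switching event: (C_af + MCF * C_c) * VDD^2 / 2."""
    mcf = MCF_FOR_POWER[case]
    return (c_area_fringe + mcf * c_coupling) * vdd ** 2 / 2

# Hypothetical wire: 20 fF area and fringe capacitance, 10 fF coupling, 1 V supply.
for case in MCF_FOR_POWER:
    print(case, switching_dissipation(20e-15, 10e-15, 1.0, case))
```

With MCF = 4 the coupling term becomes $2\,C_{coupling} V_{DD}^2$, matching the opposite-direction result of the equivalent circuit method.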
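Recall: MCF for delay
 An impedance Z between $V_x$ and $V_y$, with $k = \frac{V_y}{V_x}$, is replaced by $Z_x$ and $Z_y$ to ground:
$$Z_x = \frac{Z}{1 - k}, \qquad Z_y = \frac{kZ}{k - 1}$$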
Activity factor
 Activity Factor (AF), a.k.a. toggle rate, is the average fraction of cycles in which a signal changes from 0 to 1 or from 1 to 0, as compared to the clock signal:
$$AF = \frac{\#\,\text{signal toggles in } N \text{ cycles}}{2N}$$
• The clock toggles twice a cycle, so its AF = 1
• A combinational logic data signal normally has a maximum AF of 0.5
• A domino signal can have AF = 1
 Is it possible for a signal to have AF > 1?
• Yes, because of glitches
[Figure: waveforms of clk, data and out for a clocked combinational path, and of clk, d1, d2 and out for a domino gate]
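The definition can be applied directly to a sampled waveform. The sketch below is only an illustration: the function name and the sampling convention (a list of logic values taken at a fixed number of samples per clock cycle) are assumptions, not something the slides specify.

```python
def activity_factor(samples, samples_per_cycle=2):
    """AF = (# toggles in N cycles) / (2N) for a uniformly sampled logic waveform."""
    toggles = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    n_cycles = (len(samples) - 1) / samples_per_cycle
    return toggles / (2 * n_cycles)

clk = [0, 1, 0, 1, 0, 1, 0, 1, 0]     # toggles twice per cycle over 4 cycles
data = [0, 0, 1, 1, 0, 0, 1, 1, 0]    # toggles once per cycle over 4 cycles
print(activity_factor(clk), activity_factor(data))   # 1.0 0.5
```

A waveform with glitches would need more samples per cycle to capture the extra toggles, which is how AF can exceed 1.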
Signal probability
 Signal probability (SP) is the average fraction of cycles in which a signal has the logic value “1”
[Figure: example waveforms: CLK with SP = 0.5, a signal with SP = 1, and a signal with SP ≈ 1]
Relation between MCF and AF
 Assume two neighboring uncorrelated signals make $N_1$ and $N_2$ transitions during N clock cycles
 It can be shown that the number of simultaneous transitions of the signals is negligible, no more than 4
 Therefore, the energy dissipated by the cross capacitance between the signals is
$$E = \frac{1}{2}(N_1 + N_2)\,C_x V_{dd}^2$$
 The power dissipated during N cycles is:
$$P = \frac{E}{N\,t_{cycle}} = \frac{(N_1 + N_2)\,C_x V_{dd}^2}{2N\,t_{cycle}} = \left(\frac{N_1}{2N} + \frac{N_2}{2N}\right) C_x V_{dd}^2 f = (\alpha_1 + \alpha_2)\,C_x V_{dd}^2 f$$
 For the same reason, it is usually assumed that MCF = 1 for uncorrelated signals
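The two forms of the power expression can be compared numerically. This is a minimal sketch with hypothetical function names and made-up transition counts; it assumes, as the slide does, that simultaneous transitions are negligible, so every transition contributes $C_x V_{dd}^2 / 2$.

```python
def coupling_power_from_counts(n1, n2, n_cycles, c_x, vdd, f_hz):
    """P = E / (N * t_cycle) with E = (N1 + N2) * Cx * Vdd^2 / 2."""
    energy = 0.5 * (n1 + n2) * c_x * vdd ** 2
    t_cycle = 1.0 / f_hz
    return energy / (n_cycles * t_cycle)

def coupling_power_from_af(alpha1, alpha2, c_x, vdd, f_hz):
    """P = (alpha1 + alpha2) * Cx * Vdd^2 * f."""
    return (alpha1 + alpha2) * c_x * vdd ** 2 * f_hz

# Hypothetical pair of nets: 100 and 60 transitions in 1000 cycles, 50 fF coupling.
N, N1, N2 = 1000, 100, 60
p_counts = coupling_power_from_counts(N1, N2, N, 50e-15, 1.0, 1e9)
p_af = coupling_power_from_af(N1 / (2 * N), N2 / (2 * N), 50e-15, 1.0, 1e9)
print(p_counts, p_af)
```

Both calls give the same power, since $\alpha_1 = N_1 / 2N$ and $\alpha_2 = N_2 / 2N$.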
Activity Factors Generation
1. Power test vectors generation (worst case for high power, unit stressing)
2. RTL full-chip simulation (results in the blocks’ primary inputs: activity, probability)
3. Monte-Carlo based block inputs generation (based on the RTL statistics)
4. Transistor-level simulation, per block (unit delay, tuning for glitches)
5. Per-node activity factor
Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003
Interconnect power breakdown: case study
Case study
 Low-power, state-of-the-art μ-processor
 Dynamic switching power analysis
 Interconnect attributes:
• Length
• Capacitance
• Fan Out (FO)
• Hierarchy data
• Net type
• Activity factors (AF)
• Miscellaneous
Power Estimation accuracy
[Figure: simulated activity density compared with an IREM measurement]
Source: “Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization, and Validation”, ITJ 2003
Interconnect Length Distribution
[Figure: number of nets (log scale) vs. net length [um] (log scale, 1 to 100000) for Pentium® 0.5 um, Pentium® MMX 0.35 um, Pentium® Pro 0.5 um, Pentium® II 0.35 um, Pentium® II 0.25 um, Pentium® III 0.18 um and a low-power processor 0.13 um]
Source: Shekhar Y. Borkar, CRL - Intel
Interconnect Length Distribution
• Log-log scale
• Global clock not included
• Exponential decrease with length
[Figure: Nets vs. Net Length. Number of nets vs. length [um] (1 to 100000), with Local, Global and Total curves]
Total Dynamic Power
 Total dynamic power:
• Local nets = 66%
• Global nets = 34%
 Global clock not included
[Figure: Total Power vs. Net Length. Normalized dynamic power (0 to 100) vs. length [um] (1 to 100000), with Local, Global and Total interconnect curves and two peaks (Peak 1 and Peak 2). Annotations on the plot: Nets: 75k, Cap: 20 [nF], FO: 20, AF: 0.055; and Nets: 390k, Cap: 10 [nF], FO: 2, AF: 0.0485]
Local and Global Interconnect
 Local and global IC are different in:
• Number-by-length breakdown
• IC breakdown (cap and power)
• Fan out
• Metal usage
 AF is similar
[Figure: Local Power breakdown vs. Net Length. Share of power [uW] (0% to 100%) split into IC, Diff and Gate vs. length [um] (4.16 to 83850)]
[Figure: Global Power breakdown vs. Net Length. Share of power [uW] (0% to 100%) split into IC, Diff and Gate vs. length [um] (4.16 to 83850)]
Power Breakdown by Net Types
Global clock included
Net type         Interconnect power      Total power
                 (interconnect only)     (gate, diffusion and interconnect)
Global signals   34%                     21%
Local signals    27%                     37%
Global clock     19%                     13%
Local clock      20%                     29%
Interconnect Length Prediction
 Technology projections: ITRS
 Interconnect length predictions:
• ITRS model: 1/3 of the routing space
• Davis model:
o Rent’s rule based
o Predicts the number of nets as a function of the number of gates and complexity factors
• Models calibrated based on the case study
Future of Interconnect Power
[Figure: Dynamic power breakdown (0% to 100%) into gate (% G POW), diffusion (% D POW) and interconnect (% IC POW) power vs. technology generation [μm]: 0.15, 0.13, 0.1, 0.09, 0.08, 0.07, 0.065, 0.045, 0.032, 0.022. Source: ITRS 2001 Edition, adapted data]
Interconnect power grows to 65%-80% within 5 years! (using optimistic interconnect scaling)
Interconnect Power Prediction
 The number of nets vs. unit length: modified Davis model
[Figure: Interconnect length projection. Number of nets (normalized, log scale) vs. length [um] (1 to 100000), measured data against the model, with the upper local bound and the lower global bound marked]
 The average dynamic power breakdown
[Figure: Dynamic power breakdown (0% to 100%) into gate, diffusion and interconnect power for local, intermediate and global nets]
Interconnect Power Model
 Multiplying the number of interconnects by the power breakdowns gives:
[Figure: Projected dynamic power vs. net length. Power (normalized, 0 to 6) vs. length [μm] (1 to 100000), measured power against the projection]
The power model matches the processor power distribution!
Experiment - Power-Aware Router
 Routing experiment optimizing a processor’s blocks:
• Local nodes (clock and signals) consume 66% of dynamic power
• 10% of nets consume 90% of power
• Minimum spanning trees can save over 20% of interconnect power
• Routing with spacing can save up to 40% of interconnect power
[Figure: a small block’s local clock network]
Power-Aware Router Flow
1. Power grid routing
2. Clock tree routing, with spacing
3. Routing of the top n% power-consuming signal nets
4. Global and detailed routing of the un-routed nets (timing and congestion driven)
5. All nets routed?
   No: power-aware rip-up and re-route (rip-up: not high-power nets)
   Yes: finish, followed by downsizing
Side notes in the diagram: "Clock tree: high FO, long lines, very active"; "Avoiding congestion"
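The step that picks the top n% power-consuming nets can be sketched as a simple ranking. This is not the router used in the experiment: the net records, field names and function name are hypothetical, and the ranking assumes all nets share the same supply voltage and clock frequency, so that power is proportional to the product of activity factor and capacitance.

```python
def top_power_nets(nets, fraction):
    """Return the names of the top `fraction` of nets, ranked by alpha * C."""
    ranked = sorted(nets, key=lambda net: net["alpha"] * net["cap"], reverse=True)
    count = max(1, round(fraction * len(ranked)))   # always keep at least one net
    return [net["name"] for net in ranked[:count]]

# Hypothetical nets: activity factor and total capacitance in farads.
nets = [
    {"name": "clk", "alpha": 1.0, "cap": 50e-15},
    {"name": "a", "alpha": 0.05, "cap": 10e-15},
    {"name": "b", "alpha": 0.1, "cap": 30e-15},
    {"name": "c", "alpha": 0.02, "cap": 5e-15},
]
print(top_power_nets(nets, 0.5))   # ['clk', 'b']
```

Nets selected this way would be routed first, while they can still get short paths and extra spacing; the remaining nets go through the regular timing and congestion driven flow.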
Results - Power Saving
[Figure: Dynamic power saving (0% to 60%) for Blocks A to E and their average, split into router saving and driver downsizing saving]
Average saving results: 14.3% for ASIC blocks (1)
(1) Estimated based on clock interconnect power
Backup
Rent’s Rule
 Empirical rule: number of terminals versus number of gates
 Published by: B. S. Landman and R. L. Russo. On a pin versus block relationship for partitions of logic graphs. IEEE Trans. on Comput., vol. C-20, pp. 1469-1479, 1971.
Taken from Krishna Saraswat in SLIP 2000
Rent’s parameters
Rent’s rule: $T = k N^r$
T = # of I/O terminals (pins)
N = # of gates
k = avg. I/Os per gate
r = Rent’s exponent; can be 0 < r < 1, but commonly (simple) 0.5 < r < 0.75 (complex)
[Figure: a block of N gates with T terminals]
Rent’s Rule Example
Let’s assume Rent’s parameters: r = 0.79 and k = 2.
For a single gate: N = 1
$$T = k \cdot N^r = 2 \cdot 1^{0.79} = 2$$
For a block of four gates: N = 4
$$T = k \cdot N^r = 2 \cdot 4^{0.79} \approx 6$$
Fan out is implied by Rent.
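The two evaluations above can be reproduced with a one-line function. This is a minimal sketch; the function name is hypothetical and the default parameters are simply the values assumed in the example.

```python
def rent_terminals(n_gates, k=2.0, r=0.79):
    """Rent's rule: T = k * N^r terminals for a block of N gates."""
    return k * n_gates ** r

for n in (1, 4, 16, 64):
    print(f"N = {n:>2}: T = {rent_terminals(n):.2f}")
```

Because r < 1, a block of four gates needs fewer terminals than four separate gates would (about 6 instead of 8): some nets stay internal to the block.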
Is Rent’s rule a coincidence?
Random circuits do not obey Rent.
Rent’s parameters are correlated with place and route algorithms.
P. Verplaetse, J. Dambre, D. Stroobandt, J. Van Campenhout. On Partitioning vs. Placement Rent Properties. In Proc. of Intl. Workshop on System-Level Interconnect Prediction, March 2001.
Self-similarity within circuits – obeys Rent.
Assumption: the complexity of the interconnection topology is equal at all levels.
Conclusion: Rent’s rule is a result of the design and synthesis.
Donath’s Hierarchical Placement Model
1. Partition the circuit
4 equal sized modules, with a minimal cut.
2. Partition the Manhattan grid
4 equal sized modules, with a minimal cut.
3. Map the modules to the grid
Arbitrary mapping.
4. Repeat recursively
Until each block is assigned to one cell.
Result – Rent’s parameters
W. E. Donath. Placement and Average Interconnection Lengths of Computer Logic.
IEEE Trans. on Circuits & Syst., vol. CAS-26, pp. 272-277, 1979.
Donath’s length estimation model
For the i-th level:
There are $4^i$ blocks
For each block there are $k\left(\frac{N}{4^i}\right)^r$ terminals
Assuming two-terminal nets, each block has $\frac{k}{2}\left(\frac{N}{4^i}\right)^r$ nets
The nets of level i-1 must be subtracted.
Nets for level i:
$$n_i = 4^i\,\frac{k}{2}\left(\frac{N}{4^i}\right)^r - 4^{i-1}\,\frac{k}{2}\left(\frac{N}{4^{i-1}}\right)^r = 4^i\,\frac{k}{2}\left(\frac{N}{4^i}\right)^r\left(1 - 4^{r-1}\right)$$
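The subtraction and the closed form for $n_i$ can be cross-checked numerically. The sketch below is an illustration with hypothetical function names and parameter values (N = 4^5 gates, k = 2, r = 0.7).

```python
def nets_at_level(i, n_gates, k, r):
    """n_i: nets counted at level i minus the nets already counted at level i - 1."""
    def nets_up_to(level):
        blocks = 4 ** level
        return blocks * (k / 2) * (n_gates / blocks) ** r
    return nets_up_to(i) - nets_up_to(i - 1)

def nets_at_level_closed_form(i, n_gates, k, r):
    """Closed form: 4^i * (k/2) * (N / 4^i)^r * (1 - 4^(r-1))."""
    blocks = 4 ** i
    return blocks * (k / 2) * (n_gates / blocks) ** r * (1 - 4 ** (r - 1))

N, K, R = 4 ** 5, 2.0, 0.7   # hypothetical circuit size and Rent parameters
for level in range(1, 6):
    print(level, nets_at_level(level, N, K, R), nets_at_level_closed_form(level, N, K, R))
```

The two columns agree at every level, and the net count grows toward the deepest level, where the blocks are smallest and the nets shortest.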
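Average interconnection length
The wires can be of two types, A (adjacent blocks) and D (diagonal blocks). For blocks of side $\lambda$:
$$L_A = \frac{1}{\lambda^4}\sum_{i_A=1}^{\lambda}\sum_{j_A=1}^{\lambda}\sum_{i_B=1}^{\lambda}\sum_{j_B=1}^{\lambda}\left(\lambda + i_A - i_B + |j_B - j_A|\right) = \frac{4}{3}\lambda - \frac{1}{3\lambda}$$
$$L_D = \frac{1}{\lambda^4}\sum_{i_A=1}^{\lambda}\sum_{j_A=1}^{\lambda}\sum_{i_B=1}^{\lambda}\sum_{j_B=1}^{\lambda}\left(2\lambda + i_A + j_A - i_B - j_B\right) = 2\lambda$$
The average:
$$r_i = \frac{14}{9}\lambda - \frac{2}{9\lambda}$$
Overall:
$$R = \frac{\sum_{i=1}^{I} n_i\,r_i}{\sum_{i=1}^{I} n_i}$$
which equals
$$R = \frac{2}{9}\left(7\,\frac{N^{r-0.5} - 1}{4^{r-0.5} - 1} - \frac{1 - N^{r-1.5}}{1 - 4^{r-1.5}}\right)\frac{1 - 4^{r-1}}{1 - N^{r-1}}$$
Taken from a SLIP 2001 tutorial by Dirk Stroobandt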
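The closed form for R can be checked against the level-by-level weighted average. This sketch rests on assumptions that the slide does not spell out: N = 4^I gates, a block side of 2^(I-i) grid units at level i, and r different from 0.5 (where the closed form divides by zero). The function names are hypothetical.

```python
def donath_average_length(n_gates, r):
    """Closed-form average length R, in grid units, for r != 0.5."""
    first = 7 * (n_gates ** (r - 0.5) - 1) / (4 ** (r - 0.5) - 1)
    second = (1 - n_gates ** (r - 1.5)) / (1 - 4 ** (r - 1.5))
    scale = (1 - 4 ** (r - 1)) / (1 - n_gates ** (r - 1))
    return (2 / 9) * (first - second) * scale

def donath_average_length_by_levels(levels, r):
    """The same average computed as sum(n_i * r_i) / sum(n_i) over levels 1..I."""
    weighted = total = 0.0
    for i in range(1, levels + 1):
        side = 2 ** (levels - i)                  # assumed block side at level i
        n_i = 4 ** (i * (1 - r))                  # n_i up to a factor common to all levels
        r_i = (14 / 9) * side - 2 / (9 * side)    # average length of a level-i net
        weighted += n_i * r_i
        total += n_i
    return weighted / total

for r in (0.3, 0.7):
    print(r, donath_average_length(4 ** 6, r), donath_average_length_by_levels(6, r))
```

The factor common to all $n_i$ cancels in the ratio, so only the level dependence $4^{i(1-r)}$ is needed.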
Results Donath
Scaling of the average length L as a function of the number of logic blocks N:
$$L \propto \begin{cases} N^{\,r-0.5} & (r > 0.5) \\ \log(N) & (r = 0.5) \\ f(r) & (r < 0.5) \end{cases}$$
[Figure: average length L (0 to 30) vs. N (1 to 10^7, log scale) for r = 0.7, r = 0.5 and r = 0.3]
Similar to measurements on placed designs.
Taken from a SLIP 1999 tutorial by Dirk Stroobandt
Donath’s Model - overview
 Provides the average net length based on the circuit’s size and Rent parameters.
 Can provide a rough net length distribution.
 Obvious limitations:
• Uniform distribution.
• Partitioning algorithm.
• Two-terminal nets only.
• Assumes perfect similarity.