Transcript PPT - Unife

The Interconnect Delay Bottleneck
Interconnect delay
Relative delay is growing even for optimized interconnects.
The literature says that wire delay does not scale with technology. Is it true?
Experimental Setup
The link performance was explored with an experimental setup consisting of a 2-switch test architecture, tuning the following physical synthesis parameters:
• Link length: from 1.5mm to 10mm
• Pipeline stages: up to 9 inserted
• Channel width: from 250um down to 10um, constrained by means of non-routable obstructions
• Target frequency for synthesis: from 250MHz to 1GHz
• Two technology libraries:
  - Low-Power Low-Vth 65nm
  - Low-Power Std-Vth 45nm
  (chosen so that buffers have almost the same delay, which isolates the net impact of wire parasitics)
[Figure: Switch 0 and Switch 1 connected by a link of variable length]
Link Performance (1)
[Figure: target and achieved frequency (MHz, axis from 0 to 1000) for the 65nm and 45nm libraries at link lengths of 1.5mm, 3mm, 5mm, 8mm and 10mm between SW0 and SW1]
The performance of the link degrades as the inter-switch spacing increases.
In 65nm, even a loose target of 250MHz is not achieved for 8mm links, while 1GHz is barely achievable at 1.5mm.
In 45nm, the synthesis tool does not match the 65nm performance even for the shorter links.
Link Buffer Distribution
• Place&route in 45nm required a much higher number of buffer cells with high driving strength.
• The physical properties of on-chip interconnects in 45nm are responsible for the performance degradation!
Link Performance (2)
Let us now use a more modern synthesis flow based on placement-aware logic synthesis (hereafter named "topographical synthesis").
[Figure: target and achieved frequency (MHz, axis from 0 to 1000) for 65nm, 45nm and 45nm with topographical synthesis at link lengths of 1.5mm, 3mm, 5mm, 8mm and 10mm between SW0 and SW1]
A relevant performance speedup is achievable with the topographical approach:
• The 45nm library outperforms the 65nm library for long links and aggressive speeds
• Awareness of back-end information becomes a must in 45nm and beyond
Link Pipelining
The number of pipeline stages required to meet the target speed of 1GHz on the link was determined with incremental place&route steps:

Library   1.5mm   3mm   8mm   10mm
45nm      1       2     7     9
65nm      1       2     7     8

=> Pipeline stages are inserted manually so as to break the link into segments of equal length.
Interestingly, the trend is the same for both the 45nm and 65nm libraries!
[Figure: link between Switch 0 and Switch 1]
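As a rough illustration of what the stage counts imply, the sketch below computes the length of each wire segment for every table entry. It assumes that N pipeline stages split the link into N + 1 equal segments; the slides only say the segments have equal length, so that count and the helper name `segment_length_mm` are assumptions.

```python
# Pipeline stages needed to reach 1 GHz, copied from the table on this slide.
STAGES = {
    "45nm": {1.5: 1, 3.0: 2, 8.0: 7, 10.0: 9},
    "65nm": {1.5: 1, 3.0: 2, 8.0: 7, 10.0: 8},
}


def segment_length_mm(link_length_mm, stages):
    """Segment length, ASSUMING N stages cut the link into N + 1 equal parts."""
    return link_length_mm / (stages + 1)


for library, row in STAGES.items():
    for length_mm, stages in row.items():
        seg = segment_length_mm(length_mm, stages)
        print(f"{library} {length_mm:>4} mm: {stages} stages -> {seg:.2f} mm per segment")
```

Under this assumption, each segment has to be traversed within one clock period (1 ns at 1 GHz), which is why longer links need more stages.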
Gate delay
Let S be the scaling factor (S=0.7).
Device delay:
$$ \tau = \frac{C_L \, V_{SWING}}{I} $$
where C_L is the load capacitance, V_SWING is the voltage swing of interest and I is the drive current of the device.
Ideal scaling of MOS transistors:
• Shrinking of geometries
• Power and delay reduction
• Constant power density
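A minimal numeric sketch of the device delay, tau = C_L * V_SWING / I, under ideal scaling. It assumes constant-field scaling, where load capacitance, voltage swing and drive current all shrink by S; the device values are hypothetical and chosen only for illustration.

```python
def gate_delay(c_load, v_swing, i_drive):
    """Device delay tau = C_L * V_SWING / I (farads, volts, amperes -> seconds)."""
    return c_load * v_swing / i_drive


S = 0.7  # scaling factor used in the slides

# Hypothetical device values, for illustration only.
c_load, v_swing, i_drive = 2e-15, 1.2, 100e-6

tau = gate_delay(c_load, v_swing, i_drive)
# ASSUMPTION: constant-field scaling, so C_L, V_SWING and I all shrink by S.
tau_scaled = gate_delay(S * c_load, S * v_swing, S * i_drive)

power = v_swing * i_drive
power_scaled = (S * v_swing) * (S * i_drive)  # power per device shrinks by S^2
area_ratio = S * S                            # device area shrinks by S^2 too

delay_ratio = tau_scaled / tau
power_density_ratio = (power_scaled / power) / area_ratio

print(f"delay ratio: {delay_ratio:.2f}")
print(f"power density ratio: {power_density_ratio:.2f}")
```

With these assumptions the delay shrinks by the factor S while the power density is unchanged, which matches the "power and delay reduction" and "constant power density" points.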
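Smaller interconnect yields larger delays due to the decreasing cross-sectional area.
Very high level model, which neglects sidewall coupling and fringing capacitances:
[Figure: wire of length L, width W and height H, separated from the substrate by a dielectric of thickness t_di]
$$ C_{int} = \varepsilon_{di} \frac{W L}{t_{di}} $$
$$ R_{int} = \rho \frac{L}{H W} = R_{\square} \frac{L}{W} $$
where \varepsilon_{di} is the dielectric permittivity, \rho is the wire resistivity and R_{\square} = \rho / H is the sheet resistance.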
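To make the simplified model concrete, the sketch below evaluates R_int = rho * L / (H * W) and C_int = eps_di * W * L / t_di for one wire. The geometry is hypothetical; the resistivity is roughly that of bulk copper and the dielectric is taken as SiO2 (relative permittivity 3.9), both of which are assumptions and not values from the slides.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_resistance(rho, length, height, width):
    """R_int = rho * L / (H * W), in ohms (SI units)."""
    return rho * length / (height * width)


def wire_capacitance(eps_di, width, length, t_di):
    """C_int = eps_di * W * L / t_di, in farads (parallel-plate term only)."""
    return eps_di * width * length / t_di


# Hypothetical wire: 1 mm long, 0.5 um wide and thick, over 0.5 um of oxide.
rho = 1.7e-8          # ohm*m, ASSUMED (approximately bulk copper)
eps_di = 3.9 * EPS0   # F/m, ASSUMED (SiO2)
length, width, height, t_di = 1e-3, 0.5e-6, 0.5e-6, 0.5e-6

r_int = wire_resistance(rho, length, height, width)
c_int = wire_capacitance(eps_di, width, length, t_di)

print(f"R_int = {r_int:.1f} ohm")
print(f"C_int = {c_int * 1e15:.1f} fF")
print(f"R_int * C_int = {r_int * c_int * 1e12:.2f} ps")
```

Because sidewall coupling and fringing are neglected, the capacitance computed this way is a lower bound for a real wire.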
There are two interconnect scaling scenarios:
1. Local interconnects (10-500 um at 0.18 um)
(length scale set by the size of a gate)
2. Global interconnects
(length scale set by functional unit size and chip edge)
Interconnect scaling
Let S be the scaling factor (S=0.7):
Ideal scaling: horizontal and vertical dimensions are equally scaled to preserve packing density.
[Slide graphic: scaling relations for local interconnect dimensions and electrical parameters, annotated as follows]
• To preserve packing density (two dimensions)
• For process integration (two dimensions)
• Driven by gate shrinking
• Bad degradation!
• Tolerable
• RC stays constant in spite of the scaling trend
• Reliability problems
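The "RC stays constant" observation can be checked against the simple model R = rho * L / (H * W) and C = eps_di * W * L / t_di. The sketch below scales every dimension of a local wire, including its length, by S and reports the resulting ratios; the function name `rc_ratios` is hypothetical.

```python
def rc_ratios(s_width, s_height, s_tdi, s_length):
    """Ratios (R, C, RC) after scaling, for R ~ L/(H*W) and C ~ W*L/t_di."""
    r_ratio = s_length / (s_height * s_width)
    c_ratio = s_width * s_length / s_tdi
    return r_ratio, c_ratio, r_ratio * c_ratio


S = 0.7  # scaling factor used in the slides

# Ideal scaling of a local wire: every dimension shrinks by S.
r_ratio, c_ratio, rc_ratio = rc_ratios(S, S, S, S)

print(f"R ratio:  {r_ratio:.3f}")   # resistance grows as 1/S
print(f"C ratio:  {c_ratio:.3f}")   # capacitance shrinks as S
print(f"RC ratio: {rc_ratio:.3f}")  # the two effects cancel
```

In this model the wire delay does not improve at all, while the gate delay shrinks by S, so the wire's relative weight grows with each generation.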
Interconnect scaling
Improvement by means of:
Quasi-ideal scaling: wires are scaled more in the horizontal than in the vertical direction, so that:
• RC delay tracks S more closely!
• packing density is preserved
[Slide graphic: scaling relations for dimensions and electrical parameters, annotated as follows]
• To preserve packing density
• To reduce resistance
• To keep capacitance limited
• Better than ideal scaling
• Should scale slightly, but sidewall capacitance accounted for
• Tracks S closer
• Better
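A hedged sketch of quasi-ideal scaling in the same simple model (R ~ L/(H*W), C ~ W*L/t_di). Horizontal dimensions and length shrink by S, while the vertical dimensions (H and t_di) shrink by S raised to `vertical_exponent`. The default exponent of 0.5 is an assumption, since the transcript does not preserve the exact factor, and the model still ignores sidewall capacitance.

```python
def quasi_ideal_ratios(s, vertical_exponent=0.5):
    """Scale W and L by s, H and t_di by s**vertical_exponent; return (R, C, RC)."""
    horizontal = s
    vertical = s ** vertical_exponent  # ASSUMPTION: exponent not given in the source
    r_ratio = horizontal / (vertical * horizontal)  # L / (H * W)
    c_ratio = horizontal * horizontal / vertical    # W * L / t_di
    return r_ratio, c_ratio, r_ratio * c_ratio


S = 0.7

quasi = quasi_ideal_ratios(S)                          # vertical dims shrink by sqrt(S)
ideal = quasi_ideal_ratios(S, vertical_exponent=1.0)   # reduces to ideal scaling

for name, (r, c, rc) in (("ideal", ideal), ("quasi-ideal", quasi)):
    print(f"{name:>11}: R x{r:.3f}  C x{c:.3f}  RC x{rc:.3f}")
```

With the assumed exponent, resistance degrades less than under ideal scaling and the RC product follows S, which is consistent with the "tracks S closer" annotation.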
Interconnect scaling
Ideal scaling: horizontal and vertical dimensions are equally scaled to preserve packing density.
• Wire length increases with die size
• Degradation not tolerable
Interconnect scaling
Improvement by means of:
Constant dimension scaling: by maintaining wide and thick wires at the higher metal levels, RC delay can be controlled, but routing density is penalized!
• Interconnect size unaffected
• Thanks to constant cross-section area
• Thanks to constant width and ILD
• Just the impact of increased wirelength. Much better! But still, this is reverse scaling!
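For global wires, the two options can be compared with the same simple model (R ~ L/(H*W), C ~ W*L/t_di). In the sketch below the wire length grows by a die-size factor `S_C`, whose value is hypothetical; the cross-section either shrinks by S (ideal scaling) or stays fixed (constant dimension scaling).

```python
def global_wire_ratios(s_cross_section, s_length):
    """Ratios (R, C, RC) when W, H and t_di scale by s_cross_section and L by s_length."""
    r_ratio = s_length / (s_cross_section * s_cross_section)    # L / (H * W)
    c_ratio = s_cross_section * s_length / s_cross_section      # W * L / t_di
    return r_ratio, c_ratio, r_ratio * c_ratio


S = 0.7     # scaling factor used in the slides
S_C = 1.1   # HYPOTHETICAL growth of the die edge per generation

ideal = global_wire_ratios(S, S_C)        # cross-section shrinks, wire gets longer
constant = global_wire_ratios(1.0, S_C)   # cross-section kept, wire gets longer

for name, (r, c, rc) in (("ideal", ideal), ("constant dimension", constant)):
    print(f"{name:>18}: R x{r:.3f}  C x{c:.3f}  RC x{rc:.3f}")
```

With constant dimensions the RC product grows only with the square of the length factor, whereas ideal scaling adds a further 1/S^2 penalty from the shrinking cross-section.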
[Figure: 0.13 um Cu interconnect stack]
A cross-layer concern
The physical-layer tricks documented above are complemented by other techniques to tackle the interconnect delay bottleneck:
• Migration to new bus architectures
• Link pipelining
• Placement-aware logic synthesis