High-Speed and Low-Power
On-Chip Global Link Using
Continuous-Time Linear Equalizer
Yulei Zhang (1), James F. Buckwalter (1), and Chung-Kuan Cheng (2)
(1) Dept. of ECE, (2) Dept. of CSE, UC San Diego, La Jolla, CA
19th Conference on Electrical Performance of Electronic Packaging and Systems
Oct 25, 2010 Austin, USA
Outline
Introduction
Equalized On-Chip Global Link
Overall structure
Basic working principle
Driver Design for On-Chip Transmission-Line
Guideline for tapered CML driver
Driver design example
Continuous-Time Linear Equalizer (CTLE) Design
CTLE modeling
CTLE design example
Driver-Receiver Co-Design for Low Energy per Bit
Methodology
Overall link design example
Conclusion
Research Motivation
Global interconnect planning becomes a challenge in
ultra-deep sub-micron (UDSM) processes
Performance gap between global wires and logic gates
Conventional buffer insertion adds significant power
overhead
Uninterrupted wire configurations are used to tackle
the on-chip global communication issues
On-chip T-lines to reduce interconnect power
Equalization to improve the bandwidth
State-of-the-art [Kim2009]
2Gb/s/um, <1pJ/b signaling over a 10mm global wire in 90nm
Our Contributions
Contributions
Build up a novel equalized on-chip T-line structure for
global communication
Tapered CML driver + CTLE receiver
Accurate small-signal modeling of the CTLE receiver to
improve the optimization quality
A design methodology for driver-wire-receiver co-optimization to reduce the total energy per bit
Results of our design
20Gbps signaling over 10mm, 2.2um-pitch on-chip T-line
11ps/mm latency and 0.2pJ/b energy per bit in 45nm
Equalized On-Chip Global Link
Overall structure
Tapered current-mode logic (CML) drivers
Terminated differential on-chip T-line
Continuous-time linear equalizer (CTLE) receiver
Sense-amplifier based latch
Basic Working Principle
Tapered CML Driver
Provide low-swing differential signals to drive the T-line
Tapered factor u, number of stages N, fan-out X, final stage
current ISS, driver resistance RS
T-line
Differential wire w/ P/G shielding
Geometries (width, pitch) and termination resistance RT
CTLE Receiver
Recover signal and improve eye-quality
Load resistance RL, source degeneration resistance RD and
capacitance CD, over-drive voltage Vod
Sense-amplifier based latch
Synchronize and convert signal back to digital level
Tapered CML Driver Design
Need to design:
1) Output resistance RS
2) Tail current ISS
3) Size of transistors W
Design guideline [Tsuchiya2006, Heydari2004]
Begin from the final stage
For a given VSW, output resistance RS is
optimized with RT to increase eye-opening
Output swing constraint
Transistor size
Tapered factor u = 2.7 for delay reduction
Number of stages
Each previous stage is designed backward
by scaling with the factor u
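The backward scaling described above can be sketched as follows. This is only an illustration: the stage-count rule N = round(ln X / ln u) is the usual tapered-buffer relation and is assumed here, as is the choice to keep the swing I*R constant from stage to stage. The function name `taper_stages` and the example values (4 mA, 20 um, fan-out of 20) are hypothetical; only u = 2.7 comes from the slide.

```python
import math

def taper_stages(i_ss_final_ma, r_final_ohm, w_final_um, u=2.7, fanout=20.0):
    """Size a tapered CML chain backward from the final stage.

    Assumptions (not from the slides): N = round(ln(fanout) / ln(u)), and each
    earlier stage has current and width divided by u and load resistance
    multiplied by u, so the swing I*R is the same in every stage.
    """
    n_stages = max(1, round(math.log(fanout) / math.log(u)))
    stages = []
    for k in range(n_stages):          # k = 0 is the final (largest) stage
        scale = u ** k
        stages.append({
            "i_ss_ma": i_ss_final_ma / scale,
            "r_ohm": r_final_ohm * scale,
            "w_um": w_final_um / scale,
        })
    stages.reverse()                   # list the first (smallest) stage first
    return stages

# Hypothetical final stage: 4 mA tail current, 47 ohm load, 20 um devices.
for idx, st in enumerate(taper_stages(4.0, 47.0, 20.0), start=1):
    print(f"stage {idx}: ISS = {st['i_ss_ma']:.2f} mA, "
          f"R = {st['r_ohm']:.0f} ohm, W = {st['w_um']:.1f} um")
```

Scaling the resistance up as the current goes down keeps every stage at the same output swing, which is one common way to keep CML stages in their intended operating region.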
CML Driver Study w/ Loaded T-line
Assume 45nm 1P11M CMOS
T-line built on M9 with M1 as reference
T = 1.2um, H = 3.5um (fixed)
Optimize W and S for eye-opening
Figure: change of the eye-opening with width for fixed 2um pitch
Figure: change of the eye-opening with pitch for equal width/spacing
CML Driver Design Example
Experimental observations
Optimal eye happens when width = spacing
Eye-opening improves with larger pitch
Design methodology
Choose the minimum pitch that satisfies the wire-end eye-opening requirement
Design example
Accurate CTLE Modeling
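Design variables: RL, RD, CD, Vod (size)
Device parameters gm, rds, IBias and W/L are expressed in terms of IBias, Vod, RL, Vdd and Vic, with Vod-dependent fitting coefficients such as K = K(Vod)
$C_{Spara} = 1.5\,\mathrm{fF}/\mu\mathrm{m} \times W$, $C_{Dpara} = 1.5\,\mathrm{fF}/\mu\mathrm{m} \times W$
$C_D = C_{Dex} + C_{Spara}$, $C_L = C_{Lex} + C_{Dpara}$
[Hanumolu2005]
Small-signal circuit to derive H(s): vin drives the gate G; gm*vgs and rds sit between drain D and source S; RL and CL load the drain (vout); RD and CD degenerate the source
$$H(s) = \mathrm{Gain}_{DC}\,\frac{1 + s R_D C_D}{1 + a s + b s^2}$$
$$\mathrm{Gain}_{DC} = \frac{g_m r_{ds} R_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$
$$a = \frac{r_{ds}(R_L C_L + R_D C_D) + (g_m r_{ds} + 1) R_D R_L C_L + R_L R_D C_D}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$
$$b = \frac{r_{ds} R_D C_D R_L C_L}{(g_m r_{ds} + 1) R_D + r_{ds} + R_L}$$
$$z = \frac{1}{R_D C_D}, \quad p_1 \approx \frac{1}{a}, \quad p_2 \approx \frac{a}{b}$$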
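To make the model concrete, the sketch below evaluates H(s) = GainDC (1 + s RD CD) / (1 + a s + b s^2) numerically. RL, RD and CD are taken from the Methodology A column of the performance summary; gm, rds and CL do not appear in the slides and are assumed values chosen only for illustration. The function names are hypothetical.

```python
import math

def ctle_coeffs(gm, rds, RL, RD, CL, CD):
    """Return (GainDC, a, b) for H(s) = GainDC*(1 + s*RD*CD)/(1 + a*s + b*s^2)."""
    den = (gm * rds + 1.0) * RD + rds + RL
    gain_dc = gm * rds * RL / den
    a = (rds * (RL * CL + RD * CD)
         + (gm * rds + 1.0) * RD * RL * CL
         + RL * RD * CD) / den
    b = rds * RD * CD * RL * CL / den
    return gain_dc, a, b

def ctle_response(f_hz, gm, rds, RL, RD, CL, CD):
    """Complex value of H(s) at s = j*2*pi*f."""
    gain_dc, a, b = ctle_coeffs(gm, rds, RL, RD, CL, CD)
    s = 2j * math.pi * f_hz
    return gain_dc * (1.0 + s * RD * CD) / (1.0 + a * s + b * s * s)

# RL, RD, CD: Methodology A values. gm, rds, CL: assumed for illustration.
params = dict(gm=10e-3, rds=2e3, RL=440.0, RD=110.0, CL=20e-15, CD=680e-15)
gain_dc, a, b = ctle_coeffs(**params)

two_pi = 2.0 * math.pi
print(f"DC gain      : {gain_dc:.3f}")
print(f"zero         : {1.0 / (params['RD'] * params['CD']) / two_pi / 1e9:.2f} GHz")
print(f"p1 ~ 1/a     : {1.0 / a / two_pi / 1e9:.2f} GHz")
print(f"p2 ~ a/b     : {a / b / two_pi / 1e9:.2f} GHz")
print(f"|H| at 10 GHz: {abs(ctle_response(10e9, **params)):.3f}")
```

With these assumed device values the zero sits below the first pole, so the gain at 10 GHz (the Nyquist frequency of a 20Gbps stream) exceeds the DC gain, which is the peaking behavior the equalizer relies on.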
CTLE Modeling Validation
Test case: 10mm, 16mV eye at wire-end
Blue line: simple model that ignores rds and parasitics
Red line: model that considers rds only
Black line: the proposed accurate model
CTLE Design Example
Observations of CTLE study
Eye-opening improves with relaxed power constraints but tends
to saturate
Design example
Based on the pre-optimized CML driver + T-line design
Eye-opening improved by 4X after CTLE
Driver-Receiver Co-Design
Methodology
Optimize driver, wire and receiver together by setting Veye/Power as
the cost function
Choose pre-designed CML/T-line/CTLE as initial solution
Optimization Flow
Driver-to-receiver step-response generation based on SPICE
simulation and CTLE modeling
Eye-opening estimation based on step-response
SQP-based non-linear optimization
Variables: [ISS, RT, RL, RD, CD, Vod]
Performance Comparison
Option A: Driver/receiver independent design
Option B: Low-power driver/receiver co-design
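The step-response-based eye estimation in the flow above can be illustrated with the standard worst-case (peak-distortion) bound: form the single-bit pulse response from the step response, then subtract the sum of the ISI magnitudes from the main cursor. The slides do not spell out the estimator, so this is one plausible form rather than the authors' exact method. The first-order RC step response, its 30ps time constant and the function names are assumptions; in the actual flow the step response would come from SPICE and the CTLE model.

```python
import math

def rc_step(t, amp=1.0, tau=30e-12):
    """Assumed first-order step response (stand-in for the SPICE/CTLE result)."""
    return amp * (1.0 - math.exp(-t / tau)) if t > 0.0 else 0.0

def pulse_response(step, t, bit_time):
    """Response to a single one-bit pulse: step(t) - step(t - T)."""
    return step(t) - step(t - bit_time)

def worst_case_eye(step, bit_time, t_sample, n_pre=2, n_post=20):
    """Main cursor minus the sum of |ISI| at bit-spaced sampling instants."""
    cursor = pulse_response(step, t_sample, bit_time)
    isi = sum(
        abs(pulse_response(step, t_sample + k * bit_time, bit_time))
        for k in range(-n_pre, n_post + 1)
        if k != 0
    )
    return cursor - isi

bit_time = 50e-12                      # 20Gbps bit period
eye = worst_case_eye(rc_step, bit_time, t_sample=bit_time)
print(f"worst-case eye: {eye:.4f} of full swing")
```

Because the estimate needs only one step response per candidate design, it is cheap enough to sit inside an optimization loop, unlike a full transient eye-diagram simulation.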
Low Energy-per-Bit Optimization Flow
Inputs: pre-designed CML driver, pre-designed CTLE receiver
Driver-Receiver Co-Design Initial Solution
Change variables [ISS, RT, RL, RD, CD, Vod]
Co-Design Cost Function Estimation
SPICE-generated T-line step response
Receiver step-response using CTLE modeling
Step-response based eye estimation
Cost function: Veye/Power
Internal SQP (Sequential Quadratic Programming) routine to generate best solution
Output: best set of design variables in terms of overall energy-per-bit
Simulated Eye Diagrams
Methodology A: driver/receiver separate design
Methodology B: driver/receiver co-design for low-power
Summary of Performance Comparison
Parameter                  Methodology A                        Methodology B
                           (driver/receiver separate design)    (driver/receiver co-design for low power)
RS (ohm)                   47                                   148
RT (ohm)                   94                                   1100
RL (ohm)                   440                                  890
RD (ohm)                   110                                  1430
CD (fF)                    680                                  150
Vod (mV)                   60                                   58
Eye-opening at CTLE (mV)   91                                   113
Power consumption (mW)     8.1                                  3.8
Note: the driver/receiver co-design methodology uses much larger
driver and termination resistances to reduce power, which closes the eye
at the driver output and wire-end. The final eye is recovered by fully utilizing the CTLE.
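As a quick check on the headline numbers, the power figures in the table convert to energy per bit by dividing by the data rate, and the per-millimeter latency scales with wire length. The sketch below assumes both methodologies run at the 20Gbps rate quoted in the results; the function names are illustrative.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: 1 mW / 1 Gbps = 1e-3 W / 1e9 b/s = 1 pJ/b."""
    return power_mw / data_rate_gbps

def link_latency_ps(latency_ps_per_mm, length_mm):
    """Total wire latency in ps for a given per-mm latency and length."""
    return latency_ps_per_mm * length_mm

print(f"Methodology A: {energy_per_bit_pj(8.1, 20.0):.3f} pJ/b")
print(f"Methodology B: {energy_per_bit_pj(3.8, 20.0):.3f} pJ/b")
print(f"Latency over 10mm: {link_latency_ps(11.0, 10.0):.0f} ps")
```

The Methodology B result of 0.19 pJ/b matches the roughly 0.2pJ/b figure reported for the design, and the co-design roughly halves the energy per bit relative to Methodology A.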
Conclusion
We propose a novel equalized on-chip global link
using a CML driver and a CTLE receiver
Accurate CTLE modeling achieves
<10% correlation error and improves eye-opening
optimization quality
Our design achieves
20Gbps signaling over 10mm, 2.2um-pitch on-chip T-line
11ps/mm latency and 0.2pJ/b energy
Thank You!
Q&A