Transcript Lecture #1

ECE 679: Digital Systems Engineering
Patrick Chiang
Office Hours: 1-2PM Mon-Thurs
GLSN 100
Class Introductions
• Who am I
• Who are you
Class Basics
• Class basics
– 4 homeworks (20%) (groups of 2)
– Midterm (40%)
– Final project (40%)
• 4-page IEEE report
• 10 minute presentation (groups of 2)
• Guest lecture (Dr. Frank O’Mahony)
– Intel Research Labs (May 4th)
– Intel Field Trip (June 7th) TBD
• Presentations of 1-2 best project reports
• Homework
Class Homework
– Skim Dally/Poulton “Digital Systems Engineering”
• Chapter 3
– Skim Overview Paper:
http://mos.stanford.edu/papers/mh_micro_98.pdf
– Includes running Stat Eye
• Oregon State Matlab (eecs.oregonstate.edu/it)
• www.stateye.org
– Problem Set #1
• rlc files -- ~pchiang/hspice (rlc_spice_deck; rlc.rlc)
• Spice models -- ~pchiang/hspice/process_files/
– 130nm to 22nm
– Simulator lang = spice
• Spectre models –
DEFINE gpdk090 /nfs/guille/analog/c/cdsmgr/process/gpdk090_v3.8/libs.cdb/gpdk090
What does this mean for analog designers?
• Ever build an ADC?
– Ever wonder what to do with the digital bits?
[Figure: ADC with analog input, Fs = 600MHz; output of 8-16 bits @ 100MHz, 200MHz, 400MHz goes to a vector analyzer.]
• Why does this clock rate not increase?
• What really is this output doing? Where is it going?
Brief Summary
• Introduction to the area
– Why serial links are important
– What are the current technology trends and limitations
4Gb/s Low Power, Area Efficient Serial Links
• Interconnection between different chips
• Transmitter Equalization
• Receiver Offset Cancellation
[Figure: 2000 0.25um testchip and 2001 0.25um testchip. System diagram labels: Memory, IBM Processor, CPU, High-speed I/Os, From/to other subsystems (e.g. backplane), Transmitter Output, Receiver Input, Router Backplane (1m, FR4).]
Ming-Ju E. Lee, William J. Dally, John W. Poulton, Patrick Chiang, Stephen F. Greenwood. An 84-mW 4Gb/s Clock and Data Recovery Circuit for Serial Link Applications. VLSI Circuits Symposium, Kyoto, Japan, June 2001, pp. 149-152.
Ming-Ju E. Lee, William Dally, Patrick Chiang. Low-Power Area-Efficient High-Speed I/O Circuit Techniques. IEEE Journal of Solid-State Circuits, November 2000, Vol. 35, No. 11, pp. 1591-1599.
Scaling Serial Links: From 4Gb/s -> 20Gb/s
• Thesis: Develop 20Gb/s Serial Link
– Area: 500um x 500um
– Power: 200mW/link
• 1 bit time = 1FO4
• Timing uncertainty becomes KEY issue
[Figure: eye diagrams (v vs. t). 4Gb/s eye diagram: 250ps bit time. 20Gb/s eye diagram: 50ps bit time.]
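The 250ps and 50ps bit times on this slide follow directly from the data rates. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
def bit_time_ps(data_rate_gbps: float) -> float:
    """Return the duration of one bit (one unit interval) in picoseconds."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    # 1 Gb/s corresponds to a bit time of 1 ns = 1000 ps.
    return 1000.0 / data_rate_gbps


for rate in (4.0, 20.0):
    print(f"{rate:g} Gb/s -> {bit_time_ps(rate):g} ps per bit")
```

Going from 4Gb/s to 20Gb/s shrinks the timing budget by 5x, which is why timing uncertainty becomes the key issue.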
Transmitter Block Diagram
No post-PLL clock buffers
Test Chip
[Figure: test chip layout, 700um x 1.1mm. Block labels: Test Interface, 10GHz, PRBS Check, PLL, Phase Interpolators, DLL, TX, Transmitter, Clock, Muxing, RX, Test Structures, Recovery, PRBS Gen.]
• UMC 1.2V, 0.13um CMOS (single Vt)
• Die size 700um x 1.15mm
• 50 Ohm Pad Termination using Wafer Probes
PLL Measurements
Open-loop VCO phase noise @ 1MHz    -97dBc/Hz
10GHz jitter (RMS)                  0.97ps
10GHz jitter (pk-pk)                8.0ps
PLL power                           38.6mW
VCO power                           6mW
Tuning range                        1.14-1.31
[Figure: panels (a)-(c) with titles Power Spectrum, Open Loop VCO, Q=10 Jitter, Q=5 Jitter. A formula annotation in R, I_CP, and K_VCO is not recoverable.]
• Jitter limited by 1.25GHz input reference clock
– HP 8133A input clock (1.2ps RMS, 8.9ps pk-pk)
Eye Diagram
• Jitter: 2.2ps RMS, 15.6ps pk-pk
• Data Rate = 19.2Gb/s
• Voltage ripple caused by lack of current source at differential pair tail node
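To put the measured jitter in context, it helps to express it as a fraction of the bit time (unit interval, UI). A small sketch using the numbers on this slide; the helper name is an assumption for illustration:

```python
def jitter_in_ui(jitter_ps: float, data_rate_gbps: float) -> float:
    """Express a jitter value (in ps) as a fraction of one unit interval."""
    ui_ps = 1000.0 / data_rate_gbps  # one bit time in picoseconds
    return jitter_ps / ui_ps


DATA_RATE_GBPS = 19.2  # data rate quoted on the slide
rms_ui = jitter_in_ui(2.2, DATA_RATE_GBPS)    # 2.2ps RMS
pkpk_ui = jitter_in_ui(15.6, DATA_RATE_GBPS)  # 15.6ps pk-pk

print(f"UI = {1000.0 / DATA_RATE_GBPS:.2f} ps")
print(f"RMS jitter   = {rms_ui:.3f} UI")
print(f"pk-pk jitter = {pkpk_ui:.3f} UI")
```

At 19.2Gb/s the peak-to-peak jitter already takes up roughly 30% of the bit time.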
High Speed Transmitter Comparisons
                      P. Chiang      J. Kim          U. Singh        D. Shaeffer
                      VLSI 2004      ISSCC 2005      VLSI 2005       ISSCC 2003
Data Rate             20Gb/s         40Gb/s          34Gb/s          40Gb/s
Power                 165mW          2.7W            1.335W          4.9W
Area                  0.2275mm^2     9.18mm^2        4.16mm^2        8.25mm^2
Jitter (RMS, pk-pk)   2.37ps, 15ps   1.53ps, 8.11ps  1.44ps, 9.44ps  880fs, 5.1ps
Technology            0.13um CMOS    0.13um CMOS     0.18um CMOS     0.09um SiGe
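One way to compare these transmitters is energy per bit, i.e. power divided by data rate. The sketch below uses the power and data-rate figures from the comparison; the pairing of values with authors assumes the columns appear in the order listed (Chiang, Kim, Singh, Shaeffer), and the helper name is illustrative.

```python
def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """mW divided by Gb/s is numerically equal to pJ/bit."""
    return power_mw / data_rate_gbps


# (power in mW, data rate in Gb/s); column pairing is an assumption.
designs = {
    "P. Chiang, VLSI 2004": (165.0, 20.0),
    "J. Kim, ISSCC 2005": (2700.0, 40.0),
    "U. Singh, VLSI 2005": (1335.0, 34.0),
    "D. Shaeffer, ISSCC 2003": (4900.0, 40.0),
}

efficiency = {name: energy_per_bit_pj(p, r) for name, (p, r) in designs.items()}

for name, pj in efficiency.items():
    print(f"{name}: {pj:.2f} pJ/bit")
```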
A 250mW Full-Rate 10Gb/s Transceiver Core in 90nm CMOS using a Tri-State Binary PD with 100ps Gated Digital Output
T. Masuda, et al., ISSCC 2007.
A full-rate 10Gb/s transceiver core employing a tri-state binary PD
with 100ps gated digital output is implemented in a 90nm CMOS
process. Direct drive from the VCO is utilized to eliminate the
10GHz clock buffer current. The RX exhibits a recovered jitter
of 906fs(rms) and an input sensitivity of 5.9mV. The TX generates
a jitter of 5mUI(rms). The chip consumes 250mW.
Conventional Serial Link Receivers
• Conventional architectures also use multi-phase PLL
– Static Phase Offset
– Power Supply Sensitivity
[Figure: receiver block diagram. 20Gb/s input data passes through a pre-amp to outputs D[0]-D[3]; a multiphase PLL supplies clocks ck[0]-ck[3].]
2nd Generation Transmitter
[Figure: transmitter block diagram. Block labels: Charge Pump, Varactor Control, 10GHz Oscillator (10GHz CLK, 10GHz CLKB), 10GHz->5GHz Divider, 5GHz->2.5GHz Divider, 2:1 Divider, Phase Comparator @ 1.25GHz, Off Chip, Low-High Buffers, 50ps Delay, Data Retiming, PRBS/BER Checker, Equalizing Path, Main Path. Signal labels: 8 phases @ 5GHz, 4 phases @ 5GHz, 2.5GHz, 1.25GHz. 2:1 MUX stages combine 5Gb/s streams into 10Gb/s streams, with a 20Gb/s output.]
• 2-Tap Equalizer implemented to compensate for channel losses
– Achieve 50ps analog delay with CML buffers
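A minimal sketch of what a 2-tap transmit equalizer does to a bit stream. The slides do not give the tap equation, so the form y[n] = x[n] - α·x[n-1] used here is an assumption (a common de-emphasis form), as are the function name and the zero starting condition. The one-bit delay corresponds to the 50ps analog delay at 20Gb/s; α = 0.37 is the value quoted on the results slides, and treating it as the tap weight is also an assumption.

```python
from typing import List, Sequence


def two_tap_equalize(symbols: Sequence[float], alpha: float) -> List[float]:
    """Apply y[n] = x[n] - alpha * x[n-1], with x[-1] taken as 0.

    The main tap passes the current symbol; the second tap subtracts a
    scaled copy of the previous symbol (one bit time earlier).
    """
    out = []
    previous = 0.0  # assumed idle level before the first symbol
    for x in symbols:
        out.append(x - alpha * previous)
        previous = x
    return out


# Bits mapped to +1/-1 levels: 1 1 0 0 1
tx_symbols = [1.0, 1.0, -1.0, -1.0, 1.0]
equalized = two_tap_equalize(tx_symbols, alpha=0.37)
print(equalized)
```

Transitions are emphasized (larger magnitude) and repeated bits are attenuated, which pre-compensates for a channel that attenuates high frequencies more than low ones.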
Fabrication: Test Chip
• ST Microelectronics 0.13um test chip
– 307mW / transceiver
– 0.46mm^2
– 20mV input sensitivity
[Figure: 2006 0.13um test chip layout with Transmitter and Receiver blocks; dimension labels 500um, 600um, 350um, 450um.]
[Figure: clock path schematics. Labels include: 10GHz VCO, 10GHz->5GHz Divider (low swing), 2.5GHz->1.25GHz, CML Divider (CLK, CLKb), Low-High Converter [L-H] stages (In0/Out0, In1/Out1, ... 8) from low swing to digital swing, Normal-Sized Inverters, From Charge Pump, To Phase Comparator. Device annotations: WP / WN = 4/1, 20.48u/0.26u, 40.96u/0.13u. Remaining device labels are not recoverable.]
Results
• All results single-ended
• 20Gb/s, ideal channel: 80mV, 43ps
• 20Gb/s, -6.5dB @ 10GHz: 33mV, 37ps
Results (cont’d)
• 20Gb/s, ideal channel, with α=0.37: 72mV, 36.4ps
• 20Gb/s, -6.5dB @ 10GHz, with α=0.37: 62mV, 35ps
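The channel here is described by its loss in dB at 10GHz. As a quick reference, this sketch converts that loss into a linear voltage ratio (the helper name is illustrative; it says nothing about how the measured eye values above were obtained):

```python
def db_to_voltage_ratio(loss_db: float) -> float:
    """Convert a gain/loss in dB to a linear voltage (amplitude) ratio."""
    return 10.0 ** (loss_db / 20.0)


ratio = db_to_voltage_ratio(-6.5)
print(f"-6.5 dB -> amplitude ratio of {ratio:.3f}")
```

So a 10GHz tone arrives at a little under half of its transmitted amplitude on this channel.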
Rationale for Multi-cores
• Next-generation computing: multi-core processing
– i.e. multiple, parallel DSPs (i.e. MACs)
• Why can't we achieve higher clock frequencies?
– Wire delays don’t scale like transistors
– Power increases exponentially (when pushing process technology)
– Timing margins degraded by
• Variability
• Power supply noise
• Digital crosstalk
• NOTE: More independent threads require more memory bandwidth
[Figure: Intel, 80 Cores, ISSCC 2007]
Research: Explore Parallel Serial Links
Serial Links also exhibit the same characteristics
– Channel losses get worse
– Power consumption increases significantly with bandwidth
– Timing precision limited by:
• Static Phase Offset (process variation)
• Power-supply Induced Jitter
• Interchannel Crosstalk
Serial links also need to push for high amounts of parallelism
– How is this different from conventional link design?
• Channel equalization becomes more difficult
– Adjacent channel crosstalk
– Difficult channel estimation problem
(power, flexibility, data-rate, equalizer design, channel, distance)
• Amortize Clock Power for Multiple Links
– Distributed resonant clocking of analog/mixed-signal front-ends
Problem of IO
• 2500 pins / 2 ≈ 1200 differential pairs
• Assume 10Gb/s per link: 1200 x 10Gb/s = 12Tb/s bandwidth
• At 100mW per link (10mW per Gb/s): 1200 x 100mW = 120W
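The I/O budget on this slide can be checked with a few lines of arithmetic. This sketch assumes the 100mW figure is per 10Gb/s link, which is the reading under which the 120W total follows, and it uses the slide's rounded count of 1200 differential pairs. The function and parameter names are illustrative.

```python
from typing import Tuple


def io_budget(diff_pairs: int,
              rate_gbps_per_link: float,
              power_mw_per_link: float) -> Tuple[float, float]:
    """Return (aggregate bandwidth in Tb/s, total I/O power in W)."""
    bandwidth_tbps = diff_pairs * rate_gbps_per_link / 1000.0
    power_w = diff_pairs * power_mw_per_link / 1000.0
    return bandwidth_tbps, power_w


# Slide numbers: ~1200 pairs, 10Gb/s per link, 100mW per link (assumed reading).
bandwidth, power = io_budget(1200, 10.0, 100.0)
print(f"{bandwidth:g} Tb/s aggregate, {power:g} W for I/O")

# Using the exact pin count instead (2500 / 2 = 1250 pairs).
exact_bandwidth, exact_power = io_budget(2500 // 2, 10.0, 100.0)
print(f"{exact_bandwidth:g} Tb/s aggregate, {exact_power:g} W for I/O")
```

Either way the I/O alone lands above 100W, which is the motivation for lowering the power per link.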
Stateye Playing
• Fun with Stat-Eye
– 5Gb/s -> 10Gb/s
– Worse Channels
– Worse timing jitter
• Homework examples
Next Time
• Telegrapher’s Equation
– Reflection coefficients
• Channel Models
– Skin Effect
– Dielectric constant
– Vias