Technion – Israel Institute of Technology
Electrical Engineering Department – VLSI Lab
High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link
R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar
March 12, 2007
ASYNC07
Presentation Outline
• Why Serial Link?
• Fast Asynchronous Serial Link
• Transmitter, Fast LEDR Encoder
• Receiver, Fast Toggle Circuit
• Channel, Current Mode Async Signaling
• Performance
• Summary
Serial Link Employment Benefits
• Why Serial Link?
• Less interconnect area
• Less routing congestion
• Less coupling
• Less power (depends on range)
• The relative improvement grows with technology scaling. The example on the right refers to:
• Single gate delay serial link
• Fully-shielded parallel link with 8 gate delay clock cycle
• Equal bit-rate
• Word width N=8
[Figure: link length [mm] versus technology node [nm], showing the regions in which the serial link or the parallel link dissipates less power and requires less area]
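The equal bit-rate condition of this example is simple arithmetic: a parallel link that moves N=8 bits every 8 gate delays and a serial link that moves one bit per gate delay both deliver one bit per gate delay. A minimal Python sketch of that check follows; the function names are hypothetical, and the 15 ps gate delay is an assumption borrowed from the 65nm figure quoted later in the deck.

```python
def serial_bits_per_gate_delay(cycle_gate_delays=1):
    """Bit-serial link: one bit is sent every data cycle."""
    return 1 / cycle_gate_delays


def parallel_bits_per_gate_delay(word_width=8, clock_gate_delays=8):
    """Parallel link: word_width bits are sent every clock cycle."""
    return word_width / clock_gate_delays


# Assumption: one gate delay is 15 ps (the 65nm data cycle quoted in the deck).
GATE_DELAY_S = 15e-12

serial = serial_bits_per_gate_delay()
parallel = parallel_bits_per_gate_delay()
serial_gbps = serial / GATE_DELAY_S / 1e9

print(f"serial:   {serial} bit per gate delay")
print(f"parallel: {parallel} bit per gate delay")
print(f"absolute rate at 15 ps per gate delay: {serial_gbps:.1f} Gbps")
```

Both links carry the same traffic, so the comparison in the figure isolates area and power rather than throughput.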
Serial Link Applications
• P2P long-range interconnect
• Long range NoC links
• Pin-limited on-chip module interfaces
• Chips are presently pin-limited, and this limitation will migrate inside the chip
• Cross-bar
• Simpler routing and less congestion
• Communications inside many-core CMPs
Serial Link – Top Structure
[Block diagram: Sender, Synchronizer, Serializer & LEDR Encoder, Bit-Serial Channel (P, S), DeSerializer & LEDR Decoder, Synchronizer, Receiver; with a Word Ack signal]
• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
• Acknowledge per word instead of per bit
• Wave-pipelining over channel
• Differential encoding (DS-DE, IEEE1355-95)
• Low-latency synchronizers
Encoding – Two Phase NRZ LEDR
• Two Phase Non-Return-to-Zero Level Encoded Dual Rail
• “delta” encoding (one transition per bit)
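[Waveform figure: uncoded bits (B) 0 0 1 1 0 0 0 0 1 0 with the corresponding phase bit (P) and state bit (S) waveforms]
$$P(i) = \begin{cases} \overline{B(i)}, & i \text{ odd} \\ B(i), & i \text{ even} \end{cases} \qquad S(i) = B(i) \quad \forall i$$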
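The "one transition per bit" property of LEDR can be checked with a short Python sketch. It assumes the usual LEDR convention in which the state bit S carries the data value and the phase bit P equals the data value on even bit indices and its complement on odd ones; the function names and the choice of 0 as the first bit index are illustrative assumptions.

```python
def ledr_encode(bits, first_index=0):
    """Encode a bit sequence into LEDR phase (P) and state (S) rails.

    Assumed convention: S(i) = B(i); P(i) = B(i) for even i and
    NOT B(i) for odd i.
    """
    p, s = [], []
    for i, b in enumerate(bits, start=first_index):
        s.append(b)
        p.append(b ^ (i & 1))  # complement the data bit on odd indices
    return p, s


def ledr_decode(p, s):
    """Under the assumed convention the data is simply the state rail."""
    return list(s)


def rails_toggled_per_bit(p, s):
    """Number of rails that change between each pair of successive bits."""
    return [(p[i] ^ p[i - 1]) + (s[i] ^ s[i - 1]) for i in range(1, len(p))]


bits = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # uncoded sequence from the slide
p, s = ledr_encode(bits)
print("P:", p)
print("S:", s)
print("rails toggled per bit:", rails_toggled_per_bit(p, s))
```

Exactly one rail toggles for every new bit, which is what lets the receiver detect bit arrival from transitions alone instead of sampling with a clock.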
Transmitter – Fast SR Approach
[Block diagram: Transition Generator (T0, T90, OT0, OT90); Parallel Load Interface (Uncoded Data, Load Enable); Shift-Register, SR (Beven, Bodd); LEDR Encoder (P, S)]
• Targeted Speed: One gate delay between bits
Fast Asynchronous Shift Register
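[Circuit diagram: shift-register stages labelled XL[W-2], …, XL[1], XL[0], with input T, output OUT, one input marked Not Connected, and a Parallel Load Interface for Data(W-1), Data(W-2), …, Data1, Data0]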
Wave-pipelined Control Characteristics
• The highest speed (the single gate-delay cycle) relates to the pole of the Bode diagram
• This operating point results in signal degradation along the inverter chain
[Plots: cut-off frequency [GHz] versus inverter chain length N, with the single gate delay rate marked; voltage swing gain [dB] versus frequency [Hz] for swings from 0.5 of full swing to full swing; bias gain [dB] versus frequency [Hz] for DVb = 0 to 0.3]
Splitter Architecture
• The shift-register is partitioned into M shift-registers
• ×M slower operation in each shift-register
• Signal is no longer degraded
• Single gate-delay operation is localized to the output (input) stage only
[Block diagram: Transmitter with PARALLEL LOAD into a shift-register for odd bits and a shift-register for even bits, followed by Merge; Receiver with Split feeding a shift-register for odd bits and a shift-register for even bits, followed by PARALLEL READ]
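The data movement of the splitter architecture can be sketched in a few lines of Python. This is only a behavioural illustration, assuming the M lanes are interleaved round-robin (even and odd bits for M=2) and that bits are indexed from 0; `split` and `merge` are hypothetical names, not circuit blocks from the deck.

```python
def split(bits, m=2):
    """Distribute a serial bit stream over m lanes, round-robin.

    Lane k receives bits k, k+m, k+2m, ... so each lane runs m times
    slower than the serial stream.
    """
    return [bits[k::m] for k in range(m)]


def merge(lanes):
    """Interleave the lanes back into a single serial bit stream."""
    m = len(lanes)
    total = sum(len(lane) for lane in lanes)
    return [lanes[i % m][i // m] for i in range(total)]


word = [1, 0, 0, 1, 1, 1, 0, 0]
even_lane, odd_lane = split(word, m=2)
print("even bits:", even_lane)
print("odd bits: ", odd_lane)
print("merged:   ", merge([even_lane, odd_lane]))
```

Only `merge` (transmitter output) and `split` (receiver input) touch the full-rate stream; each lane shifts at 1/M of the serial bit rate.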
Transmitter Splitter Architecture
[Block diagram labels: SRODD, SREVEN, BODD, BEVEN, XLODD[N/2], XLEVEN[N/2], MergeODD, MergeEVEN, C, C90, XOR, LEDR ENCODER with outputs P and S]
Transmitter – SPICE Simulation (65nm node)
[SPICE waveforms: C0, C90, C, BEVEN, BODD, P and S, with annotated intervals of 60 ps and 30 ps; BEVEN carries 1 0 0 0 1 0 1 0; BODD carries a start bit (1), then 0 0 1 0 1 0 1, then a dummy bit]
Receiver
Receiver Splitter Architecture
[Block diagram labels: SPLIT, TOGGLE (T, A, B), C, S, XLODD[1], XLEVEN[1], SRODD, SREVEN, SODD, SEVEN]
Toggle Circuit
• A straightforward implementation (fundamental asynchronous state machine) is too slow: it supports only a cycle of about 1.5 gate delays
• Novel toggle:
• Single gate delay operation support
• Internal and output latches
[Circuit diagram: toggle circuit with terminals T, A and B]
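The deck shows the toggle only as a circuit diagram. As a behavioural sketch, assuming the conventional asynchronous toggle element that steers successive input transitions alternately to its two outputs, the function can be modelled as below. Treating T as the input and A and B as the outputs is an assumption read off the diagram labels, and the class is hypothetical, not the novel circuit itself.

```python
class Toggle:
    """Behavioural model of a two-phase toggle element (assumed behaviour)."""

    def __init__(self):
        self.a = 0            # output A level
        self.b = 0            # output B level
        self._next_is_a = True

    def transition(self):
        """Apply one transition on input T and return the (A, B) levels."""
        if self._next_is_a:
            self.a ^= 1       # first, third, ... transitions go to A
        else:
            self.b ^= 1       # second, fourth, ... transitions go to B
        self._next_is_a = not self._next_is_a
        return self.a, self.b


toggle = Toggle()
levels = [toggle.transition() for _ in range(4)]
print(levels)
```

Each output changes at half the rate of the input, which fits the receiver's split of the incoming stream into odd and even shift-registers.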
Channel
• Four transmission lines (DS-DE)
• High metal layers utilization
• Metals 5-8 of 65nm process
• RLC modeled
• Careful layout
• Small crosstalk
• Small relative variations
LEDR Interconnect Layout
[Layout figure: P and S wire tracks]
Differential Channel Driver and Receiver
• Current mode differential low-swing signaling
• Currents in opposite directions
• Controllable current return path
[Circuit diagram: Driver and Receiver connected by the differential P/S lines; labels include resistors R and SA at the receiver]
Channel Characteristic Impedance
• Z depends on F
• Voltage changes with F
• Fast changes → voltage drifts
• The drifts bound the operating speed
[Plot: Z versus F]
$$Z_0 = \sqrt{\frac{R + j\omega L}{j\omega C}}$$
$$R = R_{DC} + R_S (1 + j)\sqrt{\omega}$$
Based on data from BPTM. Drawn for constant R, L, C.
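The frequency dependence of Z can be illustrated numerically. The sketch below assumes the lossy-line expression with zero shunt conductance, Z0 = sqrt((R + jωL) / (jωC)), with constant per-unit-length R, L and C as stated for the plot. The numeric wire parameters are purely illustrative assumptions, not values from the deck or from BPTM.

```python
import cmath
import math


def characteristic_impedance(f_hz, r_per_m, l_per_m, c_per_m):
    """Z0 of a lossy line with constant R, L, C and no shunt conductance."""
    w = 2 * math.pi * f_hz
    return cmath.sqrt((r_per_m + 1j * w * l_per_m) / (1j * w * c_per_m))


# Illustrative (assumed) wire parameters per metre of line:
R = 50e3     # 50 ohm/mm
L = 0.5e-6   # 0.5 nH/mm
C = 0.2e-9   # 0.2 pF/mm

for f in (1e8, 1e9, 1e10, 1e11, 1e12):
    z = characteristic_impedance(f, R, L, C)
    print(f"{f:8.0e} Hz  |Z0| = {abs(z):7.1f} ohm")

print(f"high-frequency limit sqrt(L/C) = {math.sqrt(L / C):.1f} ohm")
```

At low frequencies the resistive term dominates and |Z0| is large; at high frequencies it settles toward sqrt(L/C). A data stream whose transition rate varies therefore sees a varying impedance, which is the source of the voltage drift described above.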
Channel Driver with Adaptive Control
• Compensates for Z changes
• Turned on for low frequencies
[Circuit diagram: channel driver from IN to OUT with Adaptive Control and Inertial Delay blocks]
Adaptive Control – Simulation Example
• SPICE simulation setup:
• 65nm technology, 4mm range, 67Gbps data rate
• RLC modeled channel (using Raphael-like three-dimensional field solver)
• Adaptive control is turned on only for low frequencies
[SPICE waveforms: data and adaptive control currents; a low-frequency interval turns the adaptive control on]
Channel Receiver Amplifier
[Circuit diagram: receiver amplifier with labels IN, OUT, R and B]
Performance
• SPICE simulations show correct operation at the target data cycle of 15ps (65nm technology node)
• Power for a 67Gbps, 4mm, 16-bit word link under 100% utilization:
• Total power: 150mW
• Channel differential pair: 18mW
• Leakage power: 4mW (due to the use of low-VT transistors)
[Pie chart: power breakdown among TX-SR, RX-SR and Channel Diff Pair]
• Power reduction
• Deeper split (×M power reduction)
• Circuit optimizations
• Circuit shut down during idle states
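The headline figures can be cross-checked with a few lines of arithmetic. The Python sketch below derives the bit rate from the 15ps data cycle, the time to send one 16-bit word, and an energy-per-bit figure. The energy per bit is a derived quantity that the deck does not state, and the variable names are illustrative.

```python
DATA_CYCLE_S = 15e-12       # data cycle quoted for the 65nm node
WORD_BITS = 16              # word width of the evaluated link
TOTAL_POWER_W = 150e-3      # total power at 100% utilization
QUOTED_RATE_BPS = 67e9      # data rate quoted in the deck

bit_rate_bps = 1 / DATA_CYCLE_S
word_time_s = WORD_BITS * DATA_CYCLE_S
energy_per_bit_j = TOTAL_POWER_W / QUOTED_RATE_BPS  # derived, not from the deck

print(f"bit rate:       {bit_rate_bps / 1e9:.1f} Gbps")
print(f"word time:      {word_time_s * 1e12:.0f} ps")
print(f"energy per bit: {energy_per_bit_j * 1e12:.2f} pJ")
```

The 15ps cycle corresponds to about 66.7 Gbps, which the deck rounds to 67Gbps.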
In-Die Variations
• Splitter architecture
• High-speed operation localized to input and output stages
• High-speed components design and verification
• Monte-Carlo simulations (>5)
• 26 PVT Corners
• Iterative design with legging and sizing for sensitive transistors
• Asynchronous structure
• Supports any slow down
• Minimal time separation between successive bits must be provided!
Summary
• High speed Serial Link requires special circuits:
• Fast serializers and de-serializers
• Wave-pipelined control
• Splitter architecture:
• Long word transmission
• Power reduction
• On-the-fly LEDR encoding
• Adaptive control for fast asynchronous signals handling
• Low crosstalk interconnect layout
• Single FO4 inverter delay data cycle support (15ps on 65nm process, 67 Gbps)
• The Serial Link is preferred over the Parallel Link thanks to:
• Reduced Interconnect and Active area
• Easier routing, less coupling
• Reduced power for long on-chip interconnects
The End
• Thank you