A Novel Clock Distribution Network with Dynamic Deskew


A Novel Clock Distribution
and Dynamic De-skewing
Methodology
Arjun Kapoor – University of Colorado at Boulder
Nikhil Jayakumar – Texas A&M University, College Station
Sunil P. Khatri – Texas A&M University, College Station
Introduction




Clock distribution is critical in ICs.
In typical ICs, the clock is distributed to several sites
on the IC from one central clock signal.
The requirement is to minimize skew between these
sites.
One of the available networks – H-tree


Zero skew without considering process variations.
With diminishing feature size and increasing die
size, intra-die variations lead to increased skew
across a die.
Previous Approaches –
Hierarchical H-tree De-skew


Phase detectors located on
the domain boundaries of
each leg of the H-tree.
Possible worst case skew
between 2 neighboring
leaves can be as high as
(2n+1)D where,
D = guardband of the
phase detector
n = number of levels
- “A Design for Digital Dynamic Clock Deskew”, Dike et al.
Previous Approaches –
Mesh Deskew



Phase detectors used
between each pair of leaf
nodes of the H-tree.
Clock skew between
neighboring leaves is now
= D (guardband of phase
detector).
Clock skew across the die is
still high: up to mD between any 2 leaf
nodes, where m = number
of phase detectors
between the 2 leaf nodes.
- “A Design for Digital Dynamic Clock Deskew”, Dike et al.
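The two skew bounds quoted for the previous approaches can be compared numerically. The sketch below is illustrative only: the function names and the 10 ps guardband are assumptions, not values from the cited work.

```python
def hierarchical_worst_case_skew(levels, guardband_ps):
    """Worst-case skew between two neighboring leaves: (2n+1)D."""
    return (2 * levels + 1) * guardband_ps


def mesh_skew(detectors_between, guardband_ps):
    """Skew between two leaves separated by m phase detectors: mD."""
    return detectors_between * guardband_ps


D_PS = 10.0  # assumed phase detector guardband, for illustration only

print(hierarchical_worst_case_skew(6, D_PS))  # 6-level tree, neighboring leaves
print(mesh_skew(1, D_PS))                     # mesh, neighboring leaves
print(mesh_skew(7, D_PS))                     # mesh, leaves 7 detectors apart
```

In both schemes the bound scales with the guardband D, which is the dependence the single-detector approach is meant to remove.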
Our Approach




Clock signal is returned
from leaf nodes.
Single phase detector at
center of tree.
All returned clock signals
are compared with the
same delayed reference
signal.
De-skewing can be done at
boot-up time or dynamically
during free cycles.
Our Approach

Use a modified buffered H-tree.

Have buffers at each level.
Not typically done due to process variation in buffers.

Wire width sizing reversed.
Typical H-tree – width decreases with level.
Our H-tree – width increases with level to make
sure buffer at each level sees same load.

We utilize clock shield wires and one phase
detector.
Network Topology
• Clock assumed to be routed on metal 6.
• Typical H-tree requires clock wire and 2 shield
wires on either side.
• We use an additional return wire of same width
as clock wire.
The H-Tree
• Each section of the H-tree has tri-stateable
inverters in both the
forward and return clock
networks.
• Forward network –
always ON.
• Return network – only
sections on path to be
deskewed turned ON.
Wire Widths
         Traditional H-tree     Our clock tree
Level    Length    Width        Length    Width
1        5000      50           5000      1.5
2        5000      20           5000      1.5
3        2500      6            2500      3
4        2500      3            2500      3
5        1250      1.5          1250      6
6        1250      1.5          1250      6
Sizes (in microns) derived for a 20mm x 20mm die.
Targeted clock frequency: 1 GHz.
• Traditional H-tree: Wire widths larger at center,
narrower near leaf nodes – necessary to ensure
clean signals at leaf nodes.
• Our H-tree: Wire widths larger near leaf nodes and
narrower at center – to ensure each buffer sees
same load.
Deskewing Operation
We use only one phase detector, unlike
previous deskewing methods.
Clock signal returned from each node is
compared with a single reference signal.
Single phase detector at chip center.

Largest skew (after deskewing) between any
2 nodes is not a function of the phase
detector – phase detector
accuracy/guardband unimportant.

Required delay achieved using tune-able
capacitor bank.
Deskewing Operation

Deskewing performed at slower clock rate.
Slower clock required for phase detector to
work.
Minimize cross-talk.
When clock signal returns on return path, forward
path should be stable.
Ensure that half the time period of the clock >
round trip delay of the clock signal.

Return path is grounded (acts as shield)
during non-deskew mode.
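The timing rule above (half the clock period must exceed the round-trip delay) sets an upper bound on the deskew clock frequency. A minimal sketch of that check follows; the 4 ns round-trip delay is an assumed figure for illustration and does not come from the slides.

```python
def max_deskew_frequency_hz(round_trip_delay_s):
    """Upper bound on the deskew clock so that T/2 > round-trip delay."""
    if round_trip_delay_s <= 0:
        raise ValueError("round-trip delay must be positive")
    return 1.0 / (2.0 * round_trip_delay_s)


def forward_path_stable(freq_hz, round_trip_delay_s):
    """True if half the clock period exceeds the round-trip delay."""
    return 0.5 / freq_hz > round_trip_delay_s


ROUND_TRIP_S = 4e-9  # assumed round-trip delay, illustration only

print(max_deskew_frequency_hz(ROUND_TRIP_S))    # bound in Hz
print(forward_path_stable(100e6, ROUND_TRIP_S)) # slow deskew clock
print(forward_path_stable(1e9, ROUND_TRIP_S))   # full-speed clock
```

Under this assumed delay, a 100 MHz deskew clock meets the rule while the 1 GHz operating clock does not, which is consistent with deskewing at a slower rate.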
Tune-able Bank at Leaf Nodes



Capacitors are binary
weighted to facilitate
precise control of delay.
Resistor added to
increase the incremental
delay per capacitor.
Value of resistor chosen
such that slew rate of
last segment is not
appreciably changed and
incremental delay is as
desired.
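To illustrate binary weighting: with capacitors sized 1, 2, 4, ... units, a 7-bit code selects any total from 0 to 127 units in unit steps. The sketch below assumes a hypothetical unit capacitance and a first-order model in which added delay is proportional to the switched-in capacitance; neither is taken from the slides.

```python
def bank_capacitance(code, unit_cap_ff, bits=7):
    """Total capacitance switched in by a binary-weighted bank.

    Bit i of `code` enables a capacitor of size unit_cap_ff * 2**i.
    """
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for this bank")
    total = 0.0
    for i in range(bits):
        if (code >> i) & 1:
            total += unit_cap_ff * (1 << i)
    return total


def added_delay_ps(code, delay_per_unit_ps, bits=7):
    """Assumed first-order model: delay grows linearly with the code."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range for this bank")
    return code * delay_per_unit_ps


print(bank_capacitance(0b0000101, unit_cap_ff=1.0))  # bits 0 and 2 enabled
print(bank_capacitance(0b1111111, unit_cap_ff=1.0))  # all seven enabled
print(added_delay_ps(0b0000101, delay_per_unit_ps=3.0))
```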
The Phase detector
• Condition LAG: O is low at T1 and
high at T2 -> A lags B, phase
detector not tripped.
• Phase detector said to be tripped
when condition LAG does not hold.
• Delay is incrementally increased till
the LAG condition FAILS to hold
(phase detector trips).
• Guardband of phase detector is
hence unimportant
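The search described above can be sketched as a simple loop: step the capacitor code up until the phase detector trips. The model below is an assumption made for illustration: arrival times are integers in ps, the detector is "not tripped" while the returned clock arrives before the reference, and each code step adds a fixed delay. The `detector_offset_ps` parameter stands in for detector inaccuracy.

```python
def deskew_leaf(base_arrival_ps, reference_ps, step_ps, bits=7):
    """Return (code, tuned arrival) at the first code that trips the detector."""
    for code in range(1 << bits):
        arrival = base_arrival_ps + code * step_ps
        lag_holds = arrival < reference_ps  # assumed model of "not tripped"
        if not lag_holds:
            return code, arrival
    raise ValueError("bank range too small to reach the reference")


def tune_all(leaf_arrivals_ps, reference_ps, step_ps):
    """Tune every leaf against the same reference; return codes and spread."""
    results = [deskew_leaf(a, reference_ps, step_ps) for a in leaf_arrivals_ps]
    codes = [code for code, _ in results]
    tuned = [arrival for _, arrival in results]
    return codes, max(tuned) - min(tuned)


LEAVES_PS = [900, 950, 1015]  # assumed untuned arrival times
STEP_PS = 3                   # assumed delay added per code step

for detector_offset_ps in (0, 20):
    codes, spread = tune_all(LEAVES_PS, 1100 + detector_offset_ps, STEP_PS)
    print(detector_offset_ps, codes, spread)
```

In this model the residual spread stays below one delay step whether or not the detector threshold is offset, because every leaf is compared against the same threshold.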
Communicating with Tune-able
Banks and Tri-stateable Buffers.





Use a 2-wire serial communication scheme.
Use shift registers at each tune-able bank and tri-stateable buffer.
At most 6 bits required to address each tri-stateable
node of a 6-level H-tree network.
7 bits required for a 7-bit capacitor bank.
First assert reset signal (derived from the signal
wires), then send a 6-bit address (to address the
correct capacitance bank and return path). Next send 7-bit data (capacitance value).
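One address-plus-data transfer can be pictured as a 13-bit serial frame. The sketch below is a hypothetical encoding: the slides give the field widths (6-bit address, 7-bit data) but not the bit order, so MSB-first transmission is an assumption, as is the helper name `build_frame`.

```python
def to_bits(value, width):
    """Return `value` as a list of bits, most significant bit first (assumed order)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in reversed(range(width))]


def build_frame(address, cap_code, addr_bits=6, data_bits=7):
    """Bits sent after the reset: address field followed by capacitance value."""
    return to_bits(address, addr_bits) + to_bits(cap_code, data_bits)


frame = build_frame(address=0b000101, cap_code=0b1000001)
print(frame)
print(len(frame))  # address bits + data bits
```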
Addressing Mechanism
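[Figure: addressing in a 3-level H-tree. Each level contributes one address bit (up, right = 1; down, left = 0). Level-1 branches are labeled 0 and 1, level-2 branches 00, 01, 10 and 11, and the leaf nodes 000 through 111.]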
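The addressing rule (up or right = 1, down or left = 0, one bit per level) can be sketched as a small lookup. The example path and the function name `path_to_address` are assumptions for illustration.

```python
from itertools import product

# One bit per H-tree level, following the rule stated on the slide.
DIRECTION_BIT = {"up": 1, "right": 1, "down": 0, "left": 0}


def path_to_address(path):
    """Convert a root-to-node list of branch directions into an address string."""
    return "".join(str(DIRECTION_BIT[step]) for step in path)


# Hypothetical path from the tree center to one leaf of a 3-level tree.
print(path_to_address(["right", "up", "left"]))

# All leaf addresses of a 3-level tree: one bit per level, eight leaves.
leaves = ["".join(str(b) for b in bits) for bits in product((0, 1), repeat=3)]
print(leaves)
```

A prefix of a leaf address is the address of the branch at that level, so the same bit string identifies every section on the path to the leaf.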
m-bit Decoder to Address the
Tristate-able Buffers




Shift registers serially shift in ‘m’ bits of the address (m is the
level in the H-tree at which the tri-state buffer is located).
Clocking is stopped by the last flip-flop.
Combinational logic checks if the m bits in the shift register match the
address of the tri-state buffer.
HIT signal is generated if all m bits are in and the address is a match.
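A behavioral sketch of the m-bit decoder follows. It is a software model under stated assumptions, not the circuit itself: the class name, the reset method, and the idea that every decoder sees the same serial address stream are illustrative choices.

```python
class TristateDecoder:
    """Model of a level-m decoder: captures the first m bits, then stops."""

    def __init__(self, own_address):
        self.own_address = list(own_address)  # m bits, m = level in the tree
        self.shift_register = []

    def reset(self):
        self.shift_register = []

    def clock_in(self, bit):
        # Clocking stops once m bits are in (the role of the last flip-flop).
        if len(self.shift_register) < len(self.own_address):
            self.shift_register.append(bit)

    @property
    def hit(self):
        full = len(self.shift_register) == len(self.own_address)
        return full and self.shift_register == self.own_address


def route(address_bits, decoders):
    """Send one address to all decoders and report which ones raise HIT."""
    hits = {}
    for name, decoder in decoders.items():
        decoder.reset()
        for bit in address_bits:
            decoder.clock_in(bit)
        hits[name] = decoder.hit
    return hits


decoders = {
    "level2_10": TristateDecoder([1, 0]),
    "level2_11": TristateDecoder([1, 1]),
    "level3_101": TristateDecoder([1, 0, 1]),
}
print(route([1, 0, 1, 1, 0, 0], decoders))
```

In this model, decoders whose address is a prefix of the transmitted leaf address raise HIT, which matches turning on only the return-path sections that lie on the path being deskewed.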
7-bit Decoder for Selecting
Capacitance Value



Data shifted in serially (similar to the scheme used to address the tristate buffers).
HIT signal from the decoder of the last tristate-able buffer produces a
reset pulse
Clocking stopped by last Flip-flop (let go again only when the next HIT
signal arrives).
Overall Operation of the Serial
Communication Scheme
Follow the sequence:
serial reset – transmit address – transmit data.
Each such sequence requires 13 clock cycles (6 address bits + 7 data bits).
Each leaf node requires at most 2^7 = 128 such sequences (for a 7-bit
capacitor bank).
With deskew done at 100 MHz, a 6-level H-tree
(64 leaf nodes) would be deskewed in about
1 ms.
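The deskew-time estimate can be checked with a short calculation. It takes the per-leaf worst case as 2^7 = 128 sequences for a 7-bit capacitor bank (one per code), with 13 cycles per sequence, 64 leaf nodes, and a 100 MHz deskew clock; the function name is an assumption.

```python
def worst_case_deskew_time_s(cycles_per_sequence, bank_bits, leaf_nodes, clock_hz):
    """Worst-case time to step every leaf through every capacitor code."""
    sequences_per_leaf = 2 ** bank_bits
    total_cycles = cycles_per_sequence * sequences_per_leaf * leaf_nodes
    return total_cycles / clock_hz


t = worst_case_deskew_time_s(
    cycles_per_sequence=13, bank_bits=7, leaf_nodes=64, clock_hz=100e6
)
print(f"{t * 1e3:.2f} ms")  # 13 * 128 * 64 cycles at 100 MHz
```

This gives 106,496 cycles, which is a little over 1 ms at 100 MHz.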

Experimental Results

Simulated process variations (tox, µ, leff, VT)
- values as suggested by:
“Characterization and modelling of clock skew with
process variations”, Zarkesh-Ha et al.
Initial skew: 115 ps.
After dynamic de-skew, skew reduced to 3 ps.
Experimental Results (continued)


Compared against traditional (non-buffered) H-tree with no deskew mechanism (operating at
1 GHz).
7.9% lower power in our network


Many small buffers used.
Wire loads involved are smaller (improvement would be
higher for higher frequencies).
Category                   Orig. Area     Our Area       Ovh.
Wiring                     1.635 x 10^6   2.21 x 10^6    34.86%
Central Ck Driver          480            –
Regenerators               18432          18432
TS inverters               –              4408
TS controllers             –              307
Capacitance controllers    –              410
Capacitors                 –              4880
                                                         24.56%
Conclusions
We have a novel clock distribution network
with dynamic de-skewing capability.
We can de-skew nodes that are skewed
by 300 ps down to 3 ps.
We do this with a 7.9% power reduction
and 34% area overhead when compared
to a traditional H-tree.

Thank you.