Timing Issues II


Digital Integrated Circuits
A Design Perspective
Jan M. Rabaey
Anantha Chandrakasan
Borivoje Nikolić
Timing Issues
Revised from Digital Integrated Circuits, © Jan M. Rabaey et al.
EE141 © Digital Integrated Circuits, 2nd edition
Case Study: DEC Alpha 21164
The earlier DEC Alpha 21064 used a single global clock driver operating at 200 MHz, with clock skew < 200 ps.
21164 clock frequency: 300 MHz, 9.3 million transistors
Total clock load: 3.75 nF
Power in clock distribution network: 20 W (out of 50 W)
Uses Two Level Clock Distribution:
• Single 6-stage driver at center of chip
• Secondary buffers drive left and right side
clock grid in Metal3 and Metal4
Total driver size: 58 cm!
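As a rough cross-check of these figures, the dynamic power needed to switch the clock load alone can be estimated with P = C · V² · f. The sketch below assumes a 3.3 V supply, which the slide does not state, and the function name is illustrative. The estimate covers only the 3.75 nF load, so it comes out below the 20 W quoted for the whole distribution network.

```python
def clock_load_power(c_load_f, vdd_v, f_clk_hz):
    """Dynamic power (W) to charge and discharge a capacitive load once per cycle."""
    return c_load_f * vdd_v ** 2 * f_clk_hz

# Slide values: 3.75 nF clock load at 300 MHz.
# VDD = 3.3 V is an assumption, not a number taken from the slide.
p_clock = clock_load_power(3.75e-9, 3.3, 300e6)
t_cycle_ns = 1e9 / 300e6

print(f"clock load power ~ {p_clock:.2f} W, cycle time ~ {t_cycle_ns:.2f} ns")
```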
21164 Clocking
Clock waveform: tcycle = 3.3 ns, trise = 0.35 ns, tskew = 150 ps
[Figure: location of clock driver on die, showing the pre-driver and final drivers]
2 phase single wire clock, distributed globally
2 distributed driver channels
Reduced RC delay/skew
Improved thermal distribution
3.75 nF clock load
58 cm final driver width
Local inverters for matching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
Clock Drivers
Clock Skew in Alpha Processor
The resulting RC network was extracted at every grid point and simulated.
Zero skew at the driver; maximum skew 90 ps.
EV6 (Alpha 21264) Clocking
600 MHz – 0.35 micron CMOS
Global clock waveform: tcycle = 1.67 ns, trise = 0.35 ns, tskew = 50 ps
2 phase, with multiple conditional buffered clocks
• 2.8 nF clock load
• 40 cm final driver width
[Figure label: PLL]
Local clocks can be gated “off” to save power
• Reduced load/skew
• Reduced thermal issues
Multiple clocks complicate race checking
21264 Clocking
EV6 Clock Results
[Figure: GCLK skew map, scale 5 to 50 ps]
[Figure: GCLK rise-time map, scale 300 to 345 ps (20% to 80% extrapolated to 0% to 100%)]
EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
[Figure labels: SYSCLK, PLL, three DLLs, and the clock domains GCLK (CPU Core), NCLK (Mem Ctrl), L2L_CLK (L2 Cache), L2R_CLK (L2 Cache)]
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks
PLL Block Diagram
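[Block diagram: the reference clock and the local clock feed the phase detector, whose Up/Down outputs drive the charge pump; the loop filter output vcont controls the VCO, which produces the system clock; a divide-by-N block in the feedback path produces the local clock]
• A PLL is an analog circuit and is sensitive to all sources of noise (it may need to be guarded by careful layout).
• When locked, the system clock runs at N times the reference clock frequency, ideally with zero skew relative to the reference clock.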
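The locking behaviour can be illustrated with a small phase-domain simulation. This is a linearized sketch, not the charge-pump circuit of the slide: the phase detector, charge pump, and loop filter are lumped into a proportional-integral term, and every parameter value (reference frequency, N, VCO gain, filter gains, time step) is a hypothetical choice made only so that the loop is well damped.

```python
def simulate_pll(f_ref=100e6, n=6, f_free=500e6, k_vco=1e9,
                 kp=0.377, ki=5.92e6, dt=1e-9, steps=2000):
    """Phase-domain PLL model; phase error is measured in reference cycles."""
    err = 0.0      # phase error between reference and divided VCO clock
    integ = 0.0    # integral of the phase error (loop filter state)
    f_vco = f_free
    for _ in range(steps):
        v_ctrl = kp * err + ki * integ       # lumped loop filter output (vcont)
        f_vco = f_free + k_vco * v_ctrl      # VCO frequency set by vcont
        f_local = f_vco / n                  # divide-by-N feedback (local clock)
        integ += err * dt
        err += (f_ref - f_local) * dt        # phase error accumulates the frequency gap
    return f_vco, err

f_sys, phase_err = simulate_pll()
print(f"system clock ~ {f_sys / 1e6:.1f} MHz, residual phase error {phase_err:.2e} cycles")
```

With these assumed values the VCO starts at its free-running 500 MHz and settles at N × f_ref = 600 MHz with the phase error driven to zero.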
Clock Generation using DLLs
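[Figure: Delay-Locked Loop (delay line based): the phase detector compares fREF with fO and drives U/D into the charge pump and filter, which set the delay of the delay line (DL) between fREF and fO]
A DLL aligns the clocks by adjusting the delay (phase).
[Figure: Phase-Locked Loop (VCO based): the phase detector (PD) compares fREF with fO divided by N and drives U/D into the charge pump (CP) and filter, which control the VCO that generates fO]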
DLL-Based Clock Distribution
[Figure: GLOBAL CLK is distributed to several local loops; each loop consists of a VCDL, a phase detector, and a CP/LF block, and drives a digital circuit]
The output clock of a DLL has the same frequency as the input clock and, ideally, zero skew.
This kind of configuration has become popular in today’s high-performance microprocessors.
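A minimal sketch of how such a loop removes skew: the VCDL delay is nudged every cycle until the total delay (VCDL plus local distribution delay) equals one clock period, so the local clock edge lines up with the global one. The first-order update, the gain, and all delay values are assumptions for illustration, and the model ignores the limited tuning range of a real delay line.

```python
def lock_dll(period_ps, tree_delay_ps, vcdl_init_ps=100.0, gain=0.25, cycles=200):
    """Adjust the VCDL delay until VCDL + distribution delay equals one period."""
    vcdl = vcdl_init_ps
    for _ in range(cycles):
        total = vcdl + tree_delay_ps
        phase_error = period_ps - total   # > 0: local edge arrives early
        vcdl += gain * phase_error        # charge pump / loop filter, lumped
    return vcdl

period = 1667.0       # ps, roughly a 600 MHz clock
tree_delay = 400.0    # ps, hypothetical local distribution delay
vcdl_delay = lock_dll(period, tree_delay)
residual_skew = (vcdl_delay + tree_delay) - period

print(f"VCDL delay {vcdl_delay:.1f} ps, residual skew {residual_skew:.3f} ps")
```

Because the loop only adjusts delay, the output frequency is always that of the input clock, which is the property stated on the slide.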
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol
Synchronous Pipelined Datapath
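[Figure: In → R1 → Logic Block #1 → R2 → Logic Block #2 → R3 → Logic Block #3 → R4; all registers (D, Q) are clocked by CLK; delays tpd,reg, tpd1, tpd2, tpd3]
Possible problems:
• Clock skew and jitter
• Significant noise, since all clocked elements switch on the same clock transitions
• Logic delay is not the same at each stage, so the slowest stage sets the clock period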
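The last point can be made concrete with a short calculation. The sketch assumes the usual constraint that the clock period must cover the register delay plus the slowest logic block, with an optional skew margin; setup time is left out for brevity, and the delay values are made up.

```python
def min_clock_period(t_reg, stage_delays, t_skew=0.0):
    """Smallest clock period: register delay + slowest stage + skew margin."""
    return t_reg + max(stage_delays) + t_skew

stage_delays = [1.2, 2.0, 1.5]   # tpd1, tpd2, tpd3 in ns (hypothetical)
period = min_clock_period(t_reg=0.3, stage_delays=stage_delays, t_skew=0.1)

# Time each stage sits idle per cycle because the slowest stage sets the pace.
slack = [max(stage_delays) - d for d in stage_delays]

print(f"minimum period {period:.2f} ns, idle time per stage {slack}")
```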
Self-Timed Pipelined Datapath
[Figure: In → R1 → F1 → R2 → F2 → R3 → F3 → Out; each register Ri has a handshake (HS) block that exchanges Req/Ack with its neighbours, issues Start to logic block Fi (delay tpFi), and receives Done from it]
Properties of Self-Timed Datapath
• In contrast to the global approach, timing signals are generated locally, which avoids skew and jitter problems.
• Separating the physical and logical ordering functions can result in a potential increase in performance: completed data does not have to wait for the arrival of the next clock edge before proceeding to the subsequent processing stages.
• Blocks can be shut down to save power.
• Self-timed circuits are, by nature, robust against variations in manufacturing and operating conditions, whereas synchronous designs are limited by their performance at the extremes of the operating conditions.
Completion Signal Generation
Using Redundant Signal Encoding
Ratioed logic with Improved Loads
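[Figure: Differential Cascode Voltage Switch Logic (DCVSL): load transistors M1 and M2 connected to VDD, complementary outputs Out and Out, and pull-down networks PDN1 and PDN2 to VSS driven by inputs A, B and their complements]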
Completion Signal in DCVSL
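[Figure: precharged DCVSL gate: Start controls the precharge, two complementary pull-down networks (PDN) driven by In1, In2 and their complements produce B0 and B1, and Done is generated from B0 and B1]
When Start = 0, B0 and B1 are precharged low, so Done = 0.
When Start = 1, one of B0 and B1 goes high, so Done = 1.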
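The two rules above amount to an OR of the two rails. The sketch below models that, plus a decoder for the dual-rail pair; which rail stands for logic 0 and which for logic 1 is an assumption made here for illustration.

```python
def dcvsl_done(b0, b1):
    """Done is asserted as soon as either rail leaves its precharged-low state."""
    return int(b0 or b1)

def decode(b0, b1):
    """Map a dual-rail pair to a value (rail assignment is an assumption)."""
    if (b0, b1) == (0, 0):
        return None            # precharged: no valid data yet
    if (b0, b1) == (1, 0):
        return 0
    if (b0, b1) == (0, 1):
        return 1
    raise ValueError("both rails high is not a legal dual-rail code")

for rails in [(0, 0), (1, 0), (0, 1)]:
    print(rails, "done =", dcvsl_done(*rails), "value =", decode(*rails))
```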
Self-Timed Adder
[Figure: (a) Differential carry generation: precharged carry chain controlled by Start, with propagate (P0 to P3), generate (G0 to G3), and kill (K0 to K3) signals producing the carries C0 to C4 and their complements; (b) Completion signal: Done derived from Start and the carry pairs C0 to C4 (annotation: "Order?")]
This safely assumes that sum generation, which depends on the arrival of the carry signal, is faster than generation of the Done signal.
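The benefit of a completion signal in this adder is that the delay depends on the operands. The sketch below is a behavioural model, not the transistor circuit: it assumes each stage costs one unit of delay, that a generate or kill resolves the next carry immediately, and that a propagate must wait for the incoming carry. Done is taken to rise when the last carry pair has resolved.

```python
def carry_completion(a, b, n_bits=4, c0=0):
    """Return the carries C0..Cn and the unit-delay time at which all are resolved."""
    carry, t_resolved, t_done = c0, 0, 0
    carries = [c0]
    for i in range(n_bits):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:            # generate: carry-out is 1 regardless of carry-in
            carry, t_resolved = 1, 1
        elif not (ai or bi):     # kill: carry-out is 0 regardless of carry-in
            carry, t_resolved = 0, 1
        else:                    # propagate: must wait for the incoming carry
            t_resolved += 1
        carries.append(carry)
        t_done = max(t_done, t_resolved)
    return carries, t_done

print(carry_completion(0b1111, 0b0001))   # long ripple
print(carry_completion(0b0101, 0b0101))   # every stage generates or kills
```

Under these assumptions the first call needs four unit delays while the second completes after one, which is the data dependence a fixed worst-case clock cannot exploit.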
Completion Signal Using Current Sensing
• The redundant-coding approach tracks the actual signal statistics, but at the expense of power dissipation, since every single gate must make a transition for each new input.
• Another way is the critical-path replica approach, in which a replica of the critical path is configured as a delay element.
• The critical-path replica should track the critical path of the logic without any glitching transitions.
• When the output of the delay line goes high/low, the logic computation is complete.
• The advantage of this approach is that the logic can be implemented in a complementary logic style.
Completion Signal Generation
[Figure: LOGIC NETWORK from In to Out, with a parallel DELAY MODULE from Start to Done]
Using Delay Element (e.g. in memories)
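Completion Signal Using Current Sensing
[Figure: an input register (Inputs, Start) drives static CMOS logic (Output); a current sensor on the GNDsense node and a min delay generator produce signals A and B, from which Done is generated. Timing diagram labels: Start, A, B, Done, Output valid, tdelay, toverlap, tMDG, tpd-NOR]
Current sensor: placed in series with the logic, it monitors the current flowing through the logic. Its output is 0 when the logic is idle and 1 when the logic is switching.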
Hand-Shaking Protocol
• Besides the generation of completion signals, a self-timed approach also requires a hand-shaking protocol that orders the circuit events and avoids races.
• The four events (data change, request, data acceptance, acknowledge) proceed in a cyclic order. Note that successive cycles may take different amounts of time.
• A two-phase protocol: the active cycle of the sender and the active cycle of the receiver.
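The cyclic ordering of the four events can be sketched as a small simulation. It assumes transition signalling, where every change of Req or Ack (rising or falling) counts as an event; the function name and the trace format are illustrative only.

```python
def two_phase_transfer(items):
    """Move items from sender to receiver using transition (2-phase) signalling."""
    req = ack = 0
    bus = None
    received, trace = [], []
    for item in items:
        # Sender's active cycle: data change, then a request event.
        assert req == ack            # previous transfer has been acknowledged
        bus = item
        req ^= 1
        trace.append(("req", req))
        # Receiver's active cycle: data acceptance, then an acknowledge event.
        assert req != ack            # a request is pending
        received.append(bus)
        ack ^= 1
        trace.append(("ack", ack))
    return received, trace

data, events = two_phase_transfer(["a", "b"])
print(data)
print(events)
```

Each transfer costs exactly two signal transitions, and the asserts document the ordering the protocol relies on: the sender only changes data when Req equals Ack, and the receiver only reads when they differ.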
Hand-Shaking Protocol
Two Phase Handshake, active cycle of the sender and receiver
Implementation
• Correct operation requires a strict ordering of signaling events.
• The Muller C-element performs an AND operation on events. It is a basic building block in self-timed circuit design.
Event Logic – The Muller-C Element
[Figure: (a) Schematic symbol of the C-element with inputs A, B and output F; (b) Truth table]
A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1
[Figure: implementations of the C-element: (a) Logic (S-R latch with output Q), (b) Majority function, (c) Dynamic]
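The truth table translates directly into a one-line behavioural model: the output follows the inputs when they agree and holds its previous value otherwise. The function name is illustrative.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev

# Reproduce the truth table, using the string "Fn" to stand for the held value.
table = [(a, b, c_element(a, b, "Fn")) for a in (0, 1) for b in (0, 1)]

# The output changes only after an event has occurred on both inputs.
f, outputs = 0, []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    f = c_element(a, b, f)
    outputs.append(f)

print(table)
print(outputs)
```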
2-Phase Handshake Protocol
Initially, Req, Ack, and Data ready are all 0. Data ready then goes to 1, so Req goes to 1. Next, Data acceptance goes to 1, so Ack goes to 1. The data is sent, Data ready returns to 0, and finally Req returns to 0. This completes one cycle.
Example: Self-timed FIFO
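[Figure: registers R1, R2, R3 between In and Out, each enabled (En) by one of a chain of three C-elements; handshake signals Reqi/Acki at the input and Reqo/Acko at the output; Done]
All 1s or all 0s -> pipeline empty
Alternating 1s and 0s -> pipeline full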
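The empty and full patterns can be reproduced with a behavioural model of the control chain. The wiring is an assumption, since the figure is not recoverable from the text: each C-element is taken to see the previous stage's output and the inverse of the next stage's output, as in a micropipeline-style chain, and the network is simply iterated until it stops changing.

```python
def c_element(a, b, prev):
    """Muller C-element: output follows the inputs when they agree, else holds."""
    return a if a == b else prev

def settle(state, req_in, ack_out):
    """Re-evaluate the C-element chain until no output changes."""
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            left = req_in if i == 0 else state[i - 1]
            right = ack_out if i == len(state) - 1 else state[i + 1]
            new = c_element(left, 1 - right, state[i])
            if new != state[i]:
                state[i], changed = new, True
    return state

state, req_in, ack_out = [0, 0, 0], 0, 0
for _ in range(3):                  # sender pushes three tokens, nobody reads
    req_in ^= 1
    state = settle(state, req_in, ack_out)
full_state = list(state)

for _ in range(3):                  # consumer acknowledges the three tokens
    ack_out ^= 1
    state = settle(state, req_in, ack_out)
empty_state = list(state)

print("full:", full_state, "empty:", empty_state)
```

After three unread tokens the C-element outputs alternate, and once all three have been acknowledged they are equal again, matching the two rules above.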
4-Phase Handshake Protocol
Also known as RTZ (Return to Zero)
Slower, but unambiguous
send data only when H-L
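The cost of returning to zero shows up as four signal events per transfer instead of two. The sketch below lists them in order; the point at which the receiver samples the data (here, after Req rises) is an assumption, and the names are illustrative.

```python
def four_phase_transfer(items):
    """Move items using a return-to-zero (4-phase) Req/Ack handshake."""
    req = ack = 0
    received, trace = [], []
    for item in items:
        assert (req, ack) == (0, 0)      # both wires idle before each transfer
        bus = item
        req = 1
        trace.append("Req+")             # sender: data valid
        received.append(bus)             # receiver samples (assumed sampling point)
        ack = 1
        trace.append("Ack+")             # receiver: data accepted
        req = 0
        trace.append("Req-")             # sender: return to zero
        ack = 0
        trace.append("Ack-")             # receiver: return to zero
    return received, trace

data4, events4 = four_phase_transfer(["x"])
print(data4, events4)
```

Because both wires always start a transfer at 0, the meaning of each level is unambiguous, at the price of the two extra transitions.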
4-Phase Handshake Protocol
Implementation using Muller-C elements
Self-Resetting Logic
[Figure: three precharged logic blocks (L1, L2, L3) in series, each with its own completion-detection circuit]
Pipelined version of self-timed logic. L1 is precharged once L2 has received the data and no longer needs the output of L1.
[Figure: precharged gate with inputs A, B, C, internal node int, and output out]
Post-charge logic: it is possible to precharge a block based on the completion of its own output, but the inputs should be reset.
Clock-Delayed Domino
[Figure: dynamic gate clocked by CLK1, with input D1, pulldown network, and output Q1 (also D2); CLK2 (to next stage) is generated from CLK1 through a delay path]
Used in the IBM 1 GHz processor.
No global clock: the clock for the next stage is derived from that of the previous stage.
The two inverters and the transmission gate (TG) emulate the worst-case delay of the dynamic logic (this can be controlled by sizing).
It can provide both inverting (remove Q1) and non-inverting logic.
Asynchronous-Synchronous Interface
[Figure: an asynchronous system (signal fin) feeds a synchronous system (clock fCLK) through a synchronization block]
Consider a computer. Its internal operation is synchronized by a global clock, but the keyboard input is asynchronous.
A synchronous system needs to sample the asynchronous input at regular intervals and check its value. If the sampling rate is high enough, no transitions will be missed.
However, the signal may be in transition when it is sampled. This can result in an undefined state, which must be resolved before the value is interpreted further.
Synchronizers and Arbiters

• Problem: the circuit has to make a decision in limited time; which decision it makes is not important.
• Synchronizer: a circuit that implements this decision-making at the interface is called a synchronizer.
• Caveat: building a perfect synchronizer that always delivers a legal answer is impossible!
• Solution: we can decrease the error probability at the expense of delay.
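The trade-off between error probability and delay is commonly quantified as a mean time between failures, MTBF = e^(t_wait/τ) / (T_w · f_clk · f_data). This is the widely used textbook form, not an expression taken from the slide, and every parameter value below is hypothetical.

```python
import math

def synchronizer_mtbf(t_wait, tau, t_window, f_clk, f_data):
    """Mean time between synchronization failures, in seconds."""
    return math.exp(t_wait / tau) / (t_window * f_clk * f_data)

# Hypothetical device and system parameters.
params = dict(tau=50e-12, t_window=20e-12, f_clk=500e6, f_data=10e6)

mtbf_1ns = synchronizer_mtbf(t_wait=1e-9, **params)
mtbf_2ns = synchronizer_mtbf(t_wait=2e-9, **params)

print(f"wait 1 ns: MTBF ~ {mtbf_1ns:.3g} s")
print(f"wait 2 ns: MTBF ~ {mtbf_2ns:.3g} s")
```

Waiting one extra nanosecond multiplies the MTBF by e^(1 ns / τ), which is why a modest added delay buys a very large reduction in error probability.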
A Simple Synchronizer Based on a Latch
[Figure: latch with input D, internal node int, output Q, and cross-coupled inverters I1 and I2, clocked by CLK]
Suppose that the latch samples the input on the rising edge of the clock.
The setup and hold times may be violated, in which case the output may be undefined.
Eventually, however, the output settles to 0 or 1 (the two stable states of the cross-coupled inverters).
Synchronizer: Output Trajectories
[Figure: output trajectories, Vout (0.0 to 2.0) versus time (0 to 300 ps)]
Single-pole model for a flip-flop
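In a single-pole model the latch voltage moves away from the metastable point exponentially, v(t) = V_MS + (v(0) − V_MS) · e^(t/τ). The equation is the usual form of this model rather than something legible on the slide, and the voltages and time constant below are illustrative. The model grows without bound, so it only describes the trajectory until the output reaches a valid logic level.

```python
import math

def latch_voltage(t, v0, v_ms, tau):
    """Single-pole model: exponential divergence from the metastable voltage."""
    return v_ms + (v0 - v_ms) * math.exp(t / tau)

def resolve_time(v0, v_ms, v_valid, tau):
    """Time for the deviation from v_ms to grow to |v_valid - v_ms| (v0 != v_ms)."""
    return tau * math.log(abs(v_valid - v_ms) / abs(v0 - v_ms))

V_MS, V_VALID, TAU = 1.0, 1.8, 20e-12     # hypothetical values

t_far = resolve_time(1.01, V_MS, V_VALID, TAU)      # starts 10 mV from V_MS
t_near = resolve_time(1.0001, V_MS, V_VALID, TAU)   # starts 0.1 mV from V_MS

print(f"10 mV offset resolves in {t_far * 1e12:.1f} ps")
print(f"0.1 mV offset resolves in {t_near * 1e12:.1f} ps")
```

The closer the sampled value lies to the metastable point, the longer the output takes to resolve, which is the spread visible in the trajectories.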
Positive Feedback: Bi-Stability
[Figure: cross-coupled inverters with Vi2 = Vo1 and Vi1 = Vo2, and their overlaid voltage transfer characteristics intersecting at points A, B, and C]
When the gain of the inverter in the transition region is larger than 1, A and B are the only stable operating points; C is metastable.
Arbiters
[Figure: (a) Schematic symbol: arbiter with inputs Req1, Req2 and outputs Ack1, Ack2; (b) Implementation with internal nodes A and B; (c) Timing diagram of Req1, Req2, A, B (metastable interval, VT gap), Ack1, and Ack2 versus t]
When both requests are submitted, A and B are temporarily metastable and are then driven to mutually exclusive states.
PLL-Based Clock Synchronization
[Figure: a crystal oscillator (fcrystal < 200 MHz) supplies the reference clock to Chip 1 and Chip 2; the on-chip PLLs (with divider and clock buffer) generate fsystem = N × fcrystal for each digital system; the two chips exchange Data]
A crystal oscillator generates an accurate signal at less than 200 MHz. For gigahertz clocks, a PLL is needed.
A PLL can also be used to synchronize the two clocks.