UNIT-III SEQUENTIAL LOGIC CIRCUITS
Static Latches and Registers
The Bistability Principle:
• Static memories use positive feedback to create a bistable circuit: a circuit having two stable states that represent 0 and 1.
[Figure: two cascaded inverters]
[Figure: voltage transfer characteristics (VTC) of the two cascaded inverters]
• The resulting circuit has only three possible operation
points (A, B, and C), as demonstrated on the combined
VTC.
• Under the condition that the gain of the inverter in the transition region is larger than 1, only A and B are stable operation points, and C is a metastable operation point.
• A bistable circuit has two stable states.
• In the absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value.
• A trigger pulse must be applied to change the state of the circuit.
• Another common name for a bistable circuit is flip-flop.
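The bistability argument can be checked numerically. The following is a minimal sketch under stated assumptions. The inverter VTC is a hypothetical tanh curve, not a transistor-level model. VDD is normalized to 1 V, and the switching threshold sits at VDD/2. The parameter `gain_param` sets the slope at the threshold to −gain_param·VDD/2. The sketch iterates the loop of two cascaded inverters. With a gain above 1, a small perturbation drives the node to A or B, and only the exact midpoint C stays put. With a gain below 1, a single operating point remains.

```python
import math

VDD = 1.0  # normalized supply voltage (assumption)


def inverter(v_in, gain_param):
    """Hypothetical smooth inverter VTC: output VDD/2 at input VDD/2,
    slope -gain_param * VDD / 2 at that switching threshold."""
    return 0.5 * VDD * (1.0 - math.tanh(gain_param * (v_in - 0.5 * VDD)))


def settle(v_start, gain_param=4.0, steps=200):
    """Pass a node voltage around the two-inverter loop `steps` times."""
    v = v_start
    for _ in range(steps):
        v = inverter(inverter(v, gain_param), gain_param)
    return v


# gain_param = 4 gives a small-signal gain of 2 (> 1) at the threshold:
# starting points just above or below C move to the stable points B or A.
print(settle(0.51), settle(0.49), settle(0.5))
# gain_param = 1 gives a gain of 0.5 (< 1): every start point collapses to VDD/2.
print(settle(0.9, gain_param=1.0))
```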
SR Flip-Flops
• The cross-coupled inverter pair provides an approach
to store a binary variable in a stable way.
• However, extra circuitry must be added to enable
control of the memory states.
[Figure: NOR-based SR flip-flop]
• When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value.
• If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state.
• Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0.
• The characteristic table is the truth table of the gate
and lists the output states as functions of all possible
input conditions.
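A small behavioral model of the NOR-based SR flip-flop can reproduce the characteristic table. The sketch iterates the two cross-coupled gate equations, Q = NOR(R, Q̄) and Q̄ = NOR(S, Q), until they stop changing. The function names are hypothetical, and gate timing is ignored.

```python
def nor(a, b):
    """2-input NOR on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q, q_bar, max_iters=10):
    """Settle the cross-coupled NOR pair for inputs S, R from state (q, q_bar)."""
    for _ in range(max_iters):
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break  # stable: the feedback loop reproduces its own state
        q, q_bar = new_q, new_q_bar
    return q, q_bar


# Walk through set, hold, reset, hold, starting from Q = 0.
state = (0, 1)
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    state = sr_latch(s, r, *state)
    print(f"S={s} R={r} -> Q={state[0]} Q_bar={state[1]}")
```

With S = R = 1, this model drives both outputs to 0. For that reason, the combination is normally excluded for a NOR-based SR flip-flop.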
• Most systems operate in a synchronous fashion with transition events referenced to a clock.
• One possible realization of a clocked SR flip-flop is a level-sensitive positive latch.
• It consists of a cross-coupled inverter pair, plus 4 extra transistors to drive the flip-flop from one state to another and to provide clocked operation.
• The combination of transistors M4, M7, and M8 forms a ratioed inverter.
• In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2.
• Once this is achieved, the positive feedback causes the flip-flop to invert states.
• The presented flip-flop does not consume any static
power.
Multiplexer Based Latches
• Multiplexer-based latches can provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.
• For a negative latch, when the clock signal is low, the
input 0 of the multiplexer is selected, and the D input is
passed to the output.
• When the clock signal is high, the input 1 of the
multiplexer, which connects to the output of the latch, is
selected.
• The feedback holds the output stable while the clock
signal is high.
• Similarly in the positive latch, the D input is selected
when clock is high, and the output is held (using
feedback) when clock is low.
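The multiplexer view translates directly into a behavioral model. In this sketch, the class name is hypothetical, and the mux chooses between D and the latch's own output according to the clock level and the latch polarity.

```python
class MuxLatch:
    """Behavioral multiplexer-based latch (hypothetical class name).

    positive=True:  D selected while clk == 1, output fed back while clk == 0.
    positive=False: D selected while clk == 0, output fed back while clk == 1.
    """

    def __init__(self, positive=True, q=0):
        self.positive = positive
        self.q = q

    def update(self, clk, d):
        transparent = (clk == 1) if self.positive else (clk == 0)
        # The mux passes D when transparent, otherwise the fed-back output.
        self.q = d if transparent else self.q
        return self.q


pos = MuxLatch(positive=True)
print([pos.update(clk, d) for clk, d in [(1, 1), (0, 0), (0, 1), (1, 0)]])
```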
[Figure: Transistor-level implementation of a positive latch built using transmission gates]
• When CLK is high, the bottom transmission gate is on and the latch is transparent; that is, the D input is copied to the Q output.
• During this phase, the feedback loop is open since the
top transmission gate is off.
Master-Slave Based Edge-Triggered Register
• The most common approach for constructing an edge-triggered register is to use a master-slave configuration.
• The register consists of cascading a negative latch
(master stage) with a positive latch (slave stage).
• On the low phase of the clock, the master stage is
transparent and the D input is passed to the master
stage output, QM.
• During this period, the slave stage is in the hold mode,
keeping its previous value using feedback.
• On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling.
• During the high phase of the clock, the slave stage
samples the output of the master stage (QM), while the
master stage remains in a hold mode.
• When the clock is low (CLK̄ = 1), T1 is on and T2 is off, and the D input is sampled onto node QM.
• When the clock goes high, the master stage stops sampling the input
and goes into a hold mode.
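Cascading the two latch types gives the edge-triggered behavior described above. In the following behavioral sketch, the class name is hypothetical, the latches are ideal, and setup/hold timing is ignored. The master is a negative latch producing QM, and the slave is a positive latch producing Q. Q changes only after a rising clock edge, even when D changes during the high phase.

```python
class MasterSlaveRegister:
    """Negative latch (master, output QM) followed by positive latch (slave, output Q)."""

    def __init__(self):
        self.qm = 0  # master stage output QM
        self.q = 0   # slave stage output Q

    def update(self, clk, d):
        if clk == 0:
            self.qm = d        # master transparent, slave holds its value
        else:
            self.q = self.qm   # master holds, slave passes QM to Q
        return self.q


reg = MasterSlaveRegister()
clk_wave = [0, 1, 1, 0, 0, 1, 0, 1]
d_wave = [1, 0, 0, 0, 1, 1, 0, 0]
q_wave = [reg.update(c, d) for c, d in zip(clk_wave, d_wave)]
print(q_wave)
```

At the second time step, D has already fallen to 0 while the clock is high. Q still shows the 1 that the master captured before the rising edge.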
Low-Voltage Static Latches
• The scaling of supply voltages is critical for low power
operation.
• Unfortunately, certain latch structures don’t function at
reduced supply voltages.
• Scaling to low supply voltages hence requires the use of
reduced threshold devices.
• When the registers are constantly accessed, the
leakage energy is typically insignificant compared to the
switching power.
• However, with the use of conditional clocks, it is
possible that registers are idle for extended periods and
the leakage energy expended by registers can be quite
significant.
• Many solutions are being explored to address the
problem of high leakage during idle periods.
Dynamic Latches and Registers
• Storage in a static sequential circuit relies on the concept
that a cross-coupled inverter pair produces a bistable
element and can thus be used to memorize binary
values.
• The major disadvantage of the static gate, however, is its complexity.
• The principle of dynamic storage is identical to the one used in dynamic logic: charge stored on a capacitor can be used to represent a logic signal.
• The absence of charge denotes a 0, while its presence
stands for a stored 1.
Dynamic Transmission-Gate Based Edge-Triggered Registers
• When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.
• During this period, the slave stage is in a hold mode,
with node 2 in a high-impedance (floating) state.
• On the rising edge of clock, the transmission gate T2
turns on, and the value sampled on node 1 right before
the rising edge propagates to the output Q (note that
node 1 is stable during the high phase of the clock since
the first transmission gate is turned off).
• Node 2 now stores the inverted version of node 1.
• This implementation of an edge-triggered register is very efficient, as it requires only 8 transistors.
C²MOS Dynamic Register: A Clock Skew Insensitive Approach
[Figure: The C²MOS register]
• CLK = 0 (CLK̄ = 1):
– The first tri-state driver is turned on, and the master stage acts as an inverter sampling the inverted version of D on the internal node X.
– The master stage is in the evaluation mode.
– Meanwhile, the slave section is in a high-impedance mode, or in a hold mode.
• The roles are reversed when CLK = 1.
True Single-Phase Clocked Register (TSPCR)
• In the two-phase clocking schemes described above, care must be taken in routing the two clock signals to ensure that overlap is minimized.
• The True Single-Phase Clocked Register (TSPCR) uses a single clock (without an inverse clock).
• For the positive latch, when CLK is high, the latch is in the transparent mode and corresponds to two cascaded inverters; the latch is non-inverting, and propagates the input to the output.
• When CLK = 0, both inverters are disabled, and the latch is in hold mode.
• Only the pull-up networks are still active, while the pull-down circuits are deactivated.
• A register can be constructed by cascading positive and negative latches.
• The main advantage is the use of a single clock phase.
• The disadvantage is the slight increase in the number of transistors: 12 transistors are required.
• TSPC offers an additional advantage: the possibility of embedding logic functionality into the latches.
• This reduces the delay overhead associated with the latches.
• When CLK = 0, the input inverter is sampling the inverted D
input on node X.
• The second (dynamic) inverter is in the precharge mode.
• The third inverter is in the hold mode.
Pulse Registers
• A fundamentally different approach for constructing a
register uses pulse signals.
• The idea is to construct a short pulse around the rising
(or falling) edge of the clock.
• This pulse acts as the clock input to a latch, sampling the input only in a short window.
• Race conditions are thus avoided by keeping the opening time (i.e., the transparent period) of the latch very short.
• The combination of the glitch-generation circuitry and the latch results in a positive edge-triggered register.
• This in turn activates MN, pulling X and eventually CLKG
low.
• The length of the pulse is controlled by the delay of the
AND gate and the two inverters.
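A generic discrete-time model illustrates how a short pulse is built around the rising edge. This sketch rests on assumptions and is not the exact circuit of the figure. CLKG is formed as CLK AND a delayed, inverted copy of CLK. The hypothetical parameter `delay`, measured in time steps, lumps together the delay of the inverting path. The resulting pulse width equals that delay, so gate delays set the pulse length.

```python
def pulse_clock(clk, delay):
    """Return CLKG[t] = CLK[t] AND NOT CLK[t - delay] (discrete time steps)."""
    clkg = []
    for t, c in enumerate(clk):
        # Assume CLK held its initial value before t = 0.
        delayed = clk[t - delay] if t >= delay else clk[0]
        clkg.append(c & (1 - delayed))
    return clkg


clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 8
clkg = pulse_clock(clk, delay=2)
print("".join(str(b) for b in clkg))
```

A longer delay widens the pulse and thus lengthens the latch's transparent window.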
• The advantage of the approach is the reduced clock load
and the small number of transistors required.
• The glitch-generation circuitry can be amortized over
multiple register bits.
• The disadvantage is a substantial increase in verification complexity.
• This has prevented widespread use.
Sense-Amplifier Based Registers
• A sense-amplifier structure can also be used to implement an edge-triggered register.
• Sense amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings.
• There are many techniques to construct these amplifiers, with the use of feedback (e.g., cross-coupled inverters) being one common approach.
[Figure: Positive edge-triggered register based on a sense amplifier]
• The circuit uses a precharged front-end amplifier that
samples the differential input signal on the rising edge of
the clock signal.
• The outputs of the front end are fed into a NAND cross-coupled SR FF that holds the data and guarantees that the differential outputs switch only once per clock cycle.
• The differential inputs in this implementation don’t have
to have rail-to-rail swing and hence this register can be
used as a receiver for a reduced swing differential bus.
Pipelining: An approach to optimize sequential circuits
• Pipelining is a popular design technique often used to
accelerate the operation of the datapaths in digital
processors.
• The goal of the presented circuit is to compute log(|a + b|), where both a and b represent streams of numbers; that is, the computation must be performed on a large set of input values.
• The minimal clock period $T_{min}$ necessary to ensure correct evaluation is given as:
$$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$$
• Here, $t_{c-q}$ and $t_{su}$ are the propagation delay and the setup time of the register, respectively.
• The term $t_{pd,logic}$ stands for the worst-case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions.
• Pipelining is a technique to improve the resource
utilization, and increase the functional throughput.
• The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit.
• The combinational circuit block has been partitioned into three sections, each of which has a smaller propagation delay than the original function.
• This effectively reduces the value of the minimum allowable clock period:
$$T_{min,pipe} = t_{c-q} + \max(t_{pd,add},\ t_{pd,abs},\ t_{pd,log}) + t_{su}$$
• Suppose that all logic blocks have approximately the same propagation delay, and that the register overhead is small with respect to the logic delays.
• The pipelined network outperforms the original circuit by a factor of three under these assumptions, or $T_{min,pipe} = T_{min}/3$.
• The increased performance comes at the relatively small
cost of two additional registers, and an increased
latency.
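The two clock-period expressions can be compared with concrete numbers. The delays in this sketch are hypothetical and not taken from the text. The three logic blocks are assumed to have equal delay, and the register overhead is assumed small. The sketch also assumes that a result appears three clock cycles after its inputs are registered. Under these assumptions, the speed-up approaches three, while the latency grows.

```python
def t_min(t_cq, t_su, stage_delays):
    """Minimum clock period: register overhead plus the slowest combinational stage."""
    return t_cq + max(stage_delays) + t_su


# Hypothetical delays in ns (not taken from the text).
t_add, t_abs, t_log = 3.0, 3.0, 3.0
t_cq, t_su = 0.1, 0.1

original = t_min(t_cq, t_su, [t_add + t_abs + t_log])  # one unpartitioned block
pipelined = t_min(t_cq, t_su, [t_add, t_abs, t_log])   # three pipeline stages
latency = 3 * pipelined                                 # result needs three cycles

print(f"T_min = {original:.2f} ns, T_min,pipe = {pipelined:.2f} ns")
print(f"speed-up = {original / pipelined:.2f}, latency = {latency:.2f} ns")
```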
Latch- vs. Register-Based Pipelines
• Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers.
• The pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge-triggered registers.
• That is, logic is introduced between the master and slave latches of a master-slave system.
• Latch-based systems give significantly more flexibility in implementing a pipelined system, and often offer higher performance.
[Figure: Operation of a two-phase pipelined circuit using dynamic registers]
NORA-CMOS: A Logic Style for Pipelined Structures
• This topology has one important property: a C²MOS-based pipelined circuit is race-free as long as all the logic functions F (implemented using static logic) between the latches are noninverting.
• The only way a signal can race from stage to stage under this condition is when the logic function F is inverting, for example when F is replaced by a single, static CMOS inverter.
• Logic and latch are clocked in such a way that both are
simultaneously in either evaluation, or hold (precharge)
mode.
• A block that is in evaluation during CLK = 1 is called a CLK-module, while the inverse is called a CLK̄-module.
• A NORA datapath consists of a chain of alternating CLK and CLK̄ modules.
• While one class of modules is precharging with its output
latch in hold mode, preserving the previous output value,
the other class is evaluating.
Memory architecture
Semiconductor Memory Classification
• Read-Write Memory
– Random Access: SRAM, DRAM
– Non-Random Access: FIFO, LIFO, Shift Register, CAM
• Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH
• Read-Only Memory: Mask-Programmed, Programmable (PROM)
Memory Timing: Definitions
Memory Architecture: Decoders
[Figure: left, intuitive architecture for an N × M memory: N words (Word 0 to Word N−1) of M storage cells each, selected by N select signals S0 to S(N−1), with M-bit input-output; right, the same array with a decoder driven by address bits A0 to A(K−1)]
• Intuitive architecture for N × M memory: too many select signals (N words == N select signals).
• A decoder reduces the number of select signals: K = log2 N.
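A logic-level decoder sketch shows how K address bits replace N = 2^K select signals. Each word line is modeled as the AND of K literals, A_i or its complement. The function name is hypothetical, and the gate-level realization (NAND/NOR trees, buffering) is abstracted away.

```python
def decode(address_bits):
    """K-to-2^K decoder. address_bits[i] is A_i (A_0 first).

    Word line w is the AND of K literals: A_i if bit i of w is 1, otherwise NOT A_i.
    """
    k = len(address_bits)
    word_lines = []
    for w in range(2 ** k):
        literals = [a if (w >> i) & 1 else 1 - a for i, a in enumerate(address_bits)]
        word_lines.append(int(all(literals)))
    return word_lines


# K = 3 address bits select one of N = 8 words (A0 = 1, A1 = 0, A2 = 1 selects word 5).
wl = decode([1, 0, 1])
print(wl)
```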
Content-Addressable Memory
[Figure: CAM block diagram: a 2^9-word × 64-bit CAM array with I/O buffers for 64-bit data, comparand and mask registers, commands, control logic (R/W, 9-bit address), an address decoder, 2^9 validity bits, and a priority encoder]
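The CAM search can be described behaviorally. The comparand is compared with every valid word at once, masked bit positions are ignored, and the priority encoder reports one matching address. The sketch below makes two assumptions not stated in the figure: a 1 in the mask means "compare this bit", and the priority encoder favors the lowest address. The loop stands in for the parallel hardware comparison.

```python
def cam_search(words, valid, comparand, mask):
    """Return the address of the highest-priority match, or None.

    words[i] : stored word at address i (as an int)
    valid[i] : validity bit of entry i
    mask     : 1 bits are compared, 0 bits are don't-care (assumed convention)
    """
    match_lines = [v and ((w ^ comparand) & mask) == 0 for w, v in zip(words, valid)]
    # Priority encoder (assumed to favor the lowest address).
    for address, hit in enumerate(match_lines):
        if hit:
            return address
    return None


words = [0xAB, 0x12, 0xAB, 0xCD]
valid = [False, True, True, True]
print(cam_search(words, valid, comparand=0xAB, mask=0xFF))  # entry 0 is invalid
print(cam_search(words, valid, comparand=0x0D, mask=0x0F))  # compare low nibble only
```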
Memory Timing: Approaches
• DRAM timing: multiplexed addressing
• SRAM timing: self-timed
Read-Only Memory Cells
[Figure: ROM cells storing a 1 and a 0 (word line WL, bit line BL, VDD, GND) in three styles: diode ROM, MOS ROM 1, and MOS ROM 2]
MOS OR ROM
[Figure: 4 × 4 MOS OR ROM: word lines WL[0] to WL[3], bit lines BL[0] to BL[3], cell transistors connected to VDD, and pull-down loads biased by V_bias]
MOS NOR ROM
[Figure: 4 × 4 MOS NOR ROM: pull-up devices from VDD on bit lines BL[0] to BL[3], word lines WL[0] to WL[3], cell transistors connected to GND]
MOS NAND ROM
[Figure: 4 × 4 MOS NAND ROM: pull-up devices from VDD on bit lines BL[0] to BL[3], with word lines WL[0] to WL[3] gating series-connected cell transistors]
All word lines are high by default, with the exception of the selected row.
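The read behavior of the NOR and NAND ROMs can be captured in a few lines of logic, under stated assumptions. `cells[r][c]` is True where a transistor is present at row r, column c. In the NOR ROM, only the selected word line is high, and a present transistor discharges the pulled-up bit line. In the NAND ROM, all word lines are high except the selected one, as stated above. Positions without a transistor are assumed shorted, and the bit line discharges only when the whole series chain conducts. Function names are hypothetical.

```python
def nor_rom_read(cells, row):
    """Bit line c = NOT OR_r(WL[r] AND cells[r][c]); only WL[row] is high."""
    n_rows, n_cols = len(cells), len(cells[0])
    wl = [r == row for r in range(n_rows)]
    return [int(not any(wl[r] and cells[r][c] for r in range(n_rows))) for c in range(n_cols)]


def nand_rom_read(cells, row):
    """Bit line c is pulled low only if every position in the column's series chain conducts."""
    n_rows, n_cols = len(cells), len(cells[0])
    wl = [r != row for r in range(n_rows)]  # all word lines high except the selected row
    # A position conducts if it has no transistor (shorted) or its transistor's gate is high.
    return [int(not all((not cells[r][c]) or wl[r] for r in range(n_rows))) for c in range(n_cols)]


cells = [[True, False, True, False],
         [False, True, False, True]]
print(nor_rom_read(cells, 0), nand_rom_read(cells, 0))
```

With the same transistor placement, the two organizations read out complementary values in this model.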
Equivalent Transient Model for MOS NOR ROM
[Figure: model for NOR ROM: distributed word line (r_word, c_word) and bit line BL pulled up to VDD and loaded by C_bit]
• Word line parasitics
– Wire capacitance and gate capacitance
– Wire resistance (polysilicon)
• Bit line parasitics
– Resistance not dominant (metal)
– Drain and gate-drain capacitance
Equivalent Transient Model for MOS NAND ROM
[Figure: model for NAND ROM: distributed word line (r_word, c_word) and bit line BL pulled up to VDD, with distributed r_bit, c_bit and load C_L]
• Word line parasitics
– Similar to NOR ROM
• Bit line parasitics
– Resistance of cascaded transistors dominates
– Drain/source and complete gate capacitance
Non-Volatile Memories
The Floating-Gate Transistor (FAMOS)
[Figure: device cross-section (gate above a floating gate, each separated by oxide of thickness t_ox; n+ source and drain in a p substrate) and schematic symbol (G, S, D)]
Floating-Gate Transistor Programming
[Figure: (a) avalanche injection under high programming voltages (20 V); (b) removing the programming voltage leaves charge trapped on the floating gate; (c) programming results in a higher V_T]
Flash EEPROM
[Figure: flash cell cross-section: control gate above a floating gate, thin tunneling oxide, n+ source and n+ drain in a p-substrate, with the programming and erasure paths marked]
Many other options …
Basic Operations in a NOR Flash Memory: Erase
Basic Operations in a NOR Flash Memory: Write
Basic Operations in a NOR Flash Memory: Read
NAND Flash Memory
[Figure: NAND flash memory layout showing the unit cell, word line (poly), and source line (diffusion layer). Courtesy Toshiba]
Read-Write Memories (RAM)
• STATIC (SRAM)
– Data stored as long as supply is applied
– Large (6 transistors/cell)
– Fast
– Differential
• DYNAMIC (DRAM)
– Periodic refresh required
– Small (1-3 transistors/cell)
– Slower
– Single ended
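6-Transistor CMOS SRAM Cell
[Figure: 6T SRAM cell: cross-coupled inverters (M1 to M4) storing Q and Q̄, with access transistors M5 and M6 gated by WL and connected to BL and BL̄]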
CMOS SRAM Analysis (Read)
[Figure: read operation with Q = 0 and Q̄ = 1: both bit lines precharged to VDD and loaded by C_bit; transistors M1, M4, M5, M6 shown]
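CMOS SRAM Analysis (Write)
[Figure: write operation: complementary values (1 and 0) driven onto BL and BL̄ force the cell nodes Q and Q̄ to new values; transistors M1, M4, M5, M6 shown]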
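3-Transistor DRAM Cell
[Figure: 3T DRAM cell with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1 to M3, and storage capacitance C_S on node X; waveforms of WWL, RWL, X, BL1, and BL2 show the level VDD − VT and the bit-line swing ΔV]
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a “1” = V_WWL − V_Tn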
1-Transistor DRAM Cell
Write: C_S is charged or discharged by asserting WL and BL.
Read: Charge redistribution takes place between the bit line and the storage capacitance:
$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE})\frac{C_S}{C_S + C_{BL}}$$
Voltage swing is small; typically around 250 mV.
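The charge-redistribution formula can be evaluated for one set of hypothetical values. The sketch assumes a 2.5 V supply, a bit line precharged to VDD/2, C_S = 30 fF, and C_BL = 120 fF. These numbers are illustrative and do not come from the text. With these values, the model gives a swing of ±250 mV around the precharge level.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Delta V = (V_BIT - V_PRE) * C_S / (C_S + C_BL) after charge redistribution."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


# Hypothetical values: 2.5 V supply, bit line precharged to VDD/2,
# C_S = 30 fF, C_BL = 120 fF (C_BL / C_S = 4).
VDD = 2.5
V_PRE = VDD / 2
C_S, C_BL = 30e-15, 120e-15

dv_one = bitline_swing(VDD, V_PRE, C_S, C_BL)   # cell storing a "1" at full VDD
dv_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)  # cell storing a "0"
print(f"{dv_one * 1e3:.0f} mV, {dv_zero * 1e3:.0f} mV")
```

A larger C_BL/C_S ratio, or a stored "1" that has lost a threshold voltage, shrinks ΔV further in this model.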
DRAM Cell Observations
• 1T DRAM requires a sense amplifier for each bit line, due to
charge redistribution read-out.
• DRAM memory cells are single ended in contrast to SRAM
cells.
• The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
• Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
• When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.
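Static CAM Memory Cell
[Figure: static CAM cell (transistors M1 to M9) storing S and S̄, with an internal node int driving a wired-NOR match line; in the array, the cells of a word share a word line and a Match line, and the cells of a column share the Bit and Bit̄ lines]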
CAM in Cache Memory
[Figure: cache built from a CAM array and an SRAM array, with an address decoder, input drivers, sense amps/input drivers, and hit logic; signals Address, Tag, Hit, R/W, and Data]
Row Decoders
• Collection of 2^M complex logic gates
• Organized in regular and dense fashion
(N)AND Decoder
NOR Decoder
Hierarchical Decoders
Multi-stage implementation improves performance
[Figure: NAND decoder using 2-input pre-decoders: one pre-decoder generates the four combinations of A1 A0 and another those of A3 A2; each word line (WL 0, WL 1, …) is driven by a gate combining one output of each pre-decoder]
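The pre-decoding idea can be sketched at the logic level. The sketch assumes four address bits and two 2-input pre-decoders, as in the figure. Each of the 16 word lines then needs only a 2-input gate, taking one output from each pre-decoder, instead of a 4-input gate. The function names are hypothetical, and the NAND-plus-inverter realization is abstracted as AND.

```python
def predecode2(a_lo, a_hi):
    """2-input pre-decoder: one-hot over the four values of (a_hi, a_lo)."""
    return [int(2 * a_hi + a_lo == m) for m in range(4)]


def hierarchical_decode(a):
    """4-to-16 decoder from pre-decoders on (A1, A0) and (A3, A2).

    Word line 4*j + i = (pre-decoded A1A0 line i) AND (pre-decoded A3A2 line j).
    """
    p_low = predecode2(a[0], a[1])
    p_high = predecode2(a[2], a[3])
    return [p_low[i] & p_high[j] for j in range(4) for i in range(4)]


# Check against the address value for all 16 inputs.
for addr in range(16):
    bits = [(addr >> b) & 1 for b in range(4)]
    wl = hierarchical_decode(bits)
    assert sum(wl) == 1 and wl.index(1) == addr
print("all 16 word lines decoded correctly")
```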
Dynamic Decoders
[Figure: 2-input NOR decoder and 2-input NAND decoder with precharge devices clocked by φ, generating word lines WL 0 to WL 3 from A0, A1 and their complements]
4-input pass-transistor based column decoder
[Figure: bit lines BL0 to BL3 connect to data line D through pass transistors controlled by S0 to S3, which a 2-input NOR decoder generates from A0 and A1]
• Advantages: speed (t_pd does not add to overall memory access time)
• Only one extra transistor in signal path
• Disadvantage: large transistor count
4-to-1 tree-based column decoder
[Figure: binary tree of pass transistors controlled by A0, Ā0, A1, Ā1 connecting one of BL0 to BL3 to data line D]
• Number of devices drastically reduced
• Delay increases quadratically with # of sections; prohibitive for large decoders
• Solutions: buffers, progressive sizing, combination of tree and pass-transistor approaches
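An Elmore-delay estimate shows the quadratic growth of delay in a pass-transistor tree. The sketch assumes identical sections, each a lumped resistance R (one pass transistor's on-resistance) followed by a node capacitance C. The Elmore delay of N such sections is RC·N(N+1)/2, which the loop below computes term by term. The R and C values are hypothetical.

```python
def elmore_chain_delay(n_sections, r_ohm, c_farad):
    """Elmore delay of N identical RC sections driven from one end.

    Node i (1-based) charges through i series resistances, so it contributes i*R*C.
    """
    return sum(i * r_ohm * c_farad for i in range(1, n_sections + 1))


R, C = 10e3, 5e-15  # hypothetical: 10 kOhm per pass transistor, 5 fF per node
for n in (1, 2, 4, 8):
    print(n, f"{elmore_chain_delay(n, R, C) * 1e12:.0f} ps")
```

Doubling the number of series sections roughly quadruples the delay. This growth is what buffers and progressive sizing are meant to break up.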
Decoder for circular shift register
[Figure: decoder built as a circular shift register: a chain of dynamic stages clocked by φ and φ̄, with reset R, each stage driving one word line WL 0, WL 1, WL 2, …]
Sense Amplifiers
$$t_p = \frac{C \cdot \Delta V}{I_{av}}$$
• C is large and I_av is small, so make ΔV as small as possible.
• Idea: use a sense amplifier (s.a.): a small transition at its input is amplified into a full-swing output.
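Evaluating t_p = C·ΔV / I_av with hypothetical values shows why sensing a small swing pays off. The sketch assumes a 1 pF bit line and a 100 µA average cell current, neither taken from the text. It compares waiting for a full 2.5 V swing with letting a sense amplifier resolve 100 mV. In this model, the delay scales linearly with ΔV.

```python
def propagation_delay(c, delta_v, i_av):
    """t_p = C * dV / I_av: time for an average current I_av to move C by dV."""
    return c * delta_v / i_av


C_BL = 1e-12   # hypothetical bit-line capacitance: 1 pF
I_AV = 100e-6  # hypothetical average cell current: 100 uA

full_swing = propagation_delay(C_BL, 2.5, I_AV)  # wait for a 2.5 V swing
sensed = propagation_delay(C_BL, 0.1, I_AV)      # sense amp resolves 100 mV
print(f"{full_swing * 1e9:.1f} ns vs {sensed * 1e9:.1f} ns")
```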
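Differential Sense Amplifier
[Figure: differential pair M1, M2 driven by bit and bit̄, load transistors M3, M4 from VDD, tail transistor M5 enabled by SE; output Out, internal node y]
Directly applicable to SRAMs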
Differential Sensing: SRAM
[Figure: (a) SRAM sensing scheme: bit lines BL and BL̄ with precharge (PC) and equalization (EQ) devices, SRAM cell i on word line WL i, and a differential sense amplifier enabled by SE producing Output; (b) two-stage differential amplifier (device sizes x, 2x, y, 2y)]
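Latch-Based Sense Amplifier (DRAM)
[Figure: cross-coupled inverter pair between BL and BL̄, equalized by EQ and enabled by SE and SĒ]
• Initialized in its meta-stable point with EQ
• Once an adequate voltage gap is created, the sense amp is enabled with SE
• Positive feedback quickly forces the output to a stable operating point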
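A linearized model estimates how quickly positive feedback resolves a small initial difference. The model goes beyond what the slide states. It assumes that near the metastable point the latch acts as a single-pole system, so the difference grows as ΔV(t) = ΔV0·e^(t/τ), with τ on the order of C/g_m. The time to reach a target swing is then τ·ln(V_target/ΔV0). The τ and voltage values below are hypothetical.

```python
import math


def regeneration_time(delta_v0, v_target, tau):
    """Time for dV(t) = dV0 * exp(t / tau) to grow from dV0 to v_target."""
    return tau * math.log(v_target / delta_v0)


TAU = 50e-12  # hypothetical regeneration time constant: 50 ps
for dv0 in (0.2, 0.1, 0.05):
    t = regeneration_time(dv0, 1.0, TAU)
    print(f"dV0 = {dv0 * 1e3:.0f} mV -> {t * 1e12:.0f} ps")
```

In this model, halving the initial difference adds only τ·ln 2 to the resolution time. A zero difference, at the exact metastable point, never resolves.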
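Sources of Power Dissipation in Memories
[Figure: power-dissipation sources in a memory chip of n rows and m columns between VDD and VSS: m selected cells (m·i_act), m(n − 1) non-selected cells (m(n − 1)·i_hld), row decoder (n·C_DE·V_INT·f), column decoder (m·C_DE·V_INT·f), periphery (C_PT·V_INT·f), and DC current I_DCP]
$$I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$$
$$I_{DD} = m\,i_{act} + m(n-1)\,i_{hld} + (n+m)\,C_{DE} V_{INT} f + C_{PT} V_{INT} f + I_{DCP}$$
From [Itoh00]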
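The expanded I_DD expression can be evaluated directly once its parameters are known. The function below restates that sum term by term. The symbols follow the figure: m columns, n rows, i_act and i_hld per cell, C_DE per decoder output, C_PT for the periphery, V_INT the internal supply, f the operating frequency, and I_DCP the DC current. The example numbers are hypothetical normalized values, chosen only to make the arithmetic easy to check.

```python
def memory_supply_current(m, n, i_act, i_hld, c_de, c_pt, v_int, f, i_dcp):
    """I_DD = m*i_act + m*(n-1)*i_hld + (n+m)*C_DE*V_INT*f + C_PT*V_INT*f + I_DCP."""
    array = m * i_act + m * (n - 1) * i_hld  # selected + non-selected cells
    decoders = (n + m) * c_de * v_int * f    # row (n) and column (m) decoder outputs
    periphery = c_pt * v_int * f
    return array + decoders + periphery + i_dcp


terms = dict(m=2, n=3, i_act=1.0, i_hld=0.5, c_de=1.0, c_pt=2.0, v_int=1.0, f=1.0, i_dcp=0.25)
print(memory_supply_current(**terms))
```

In a large array, the m(n − 1)·i_hld term scales with the number of idle cells, so even a small per-cell hold current matters.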
Suppressing Leakage in SRAM
[Figure: two techniques applied to SRAM cells. Inserting extra resistance: sleep-controlled transistors between VDD and the internal supply V_DD,int and between the cells and V_SS,int. Reducing the supply voltage: sleep-controlled switches connect V_DD,int either to VDD or to a lower supply V_DDL.]
Clocking
• Synchronous systems use a clock to keep operations in
sequence
– Distinguish this from previous or next
– Determine speed at which machine operates
• Clock must be distributed to all the sequencing elements
– Flip-flops and latches
• Also distribute clock to other elements
– Domino circuits and memories
Clock Distribution
• On a small chip, the clock distribution network is just a
wire
– And possibly an inverter for clkb
• On practical chips, the RC delay of the wire resistance
and gate load is very long
– Variations in this delay cause clock to get to different
elements at different times
– This is called clock skew
• Most chips use repeaters to buffer the clock and
equalize the delay
– Reduces but doesn’t eliminate skew
Review: Skew Impact
[Figure: flip-flops F1 and F2 separated by combinational logic (CL); timing diagrams of clk, Q1 and D2 for the max-delay case (t_pcq, t_pd, t_setup, t_skew) and for the min-delay case (t_ccq, t_cd, t_hold, t_skew)]
$$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$
• Ideally full cycle is available for work
• Skew adds sequencing overhead
• Increases hold time too
• Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
• Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
• Tolerate clock skew
– Choose circuit structures insensitive to skew
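The two flip-flop constraints can be turned into slack checks. In this sketch, negative slack means a violation, the function names are hypothetical, and the timing numbers (in ps) are illustrative. The same path meets both constraints without skew and violates both with 40 ps of skew. The skew eats into the cycle and also raises the hold requirement.

```python
def setup_slack(t_c, t_pd, t_pcq, t_setup, t_skew):
    """Max-delay check: t_pd <= T_c - (t_pcq + t_setup + t_skew). Negative = violation."""
    return t_c - (t_pcq + t_setup + t_skew) - t_pd


def hold_slack(t_cd, t_hold, t_ccq, t_skew):
    """Min-delay check: t_cd >= t_hold - t_ccq + t_skew. Negative = violation."""
    return t_cd - (t_hold - t_ccq + t_skew)


# Hypothetical numbers in ps.
for skew in (0, 40):
    print(skew,
          setup_slack(t_c=1000, t_pd=900, t_pcq=50, t_setup=30, t_skew=skew),
          hold_slack(t_cd=20, t_hold=30, t_ccq=35, t_skew=skew))
```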
Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
– Data launches at latest rising edge of clock
– Must set up before earliest next rising edge of clock
– Overhead would shrink if we can soften edge
• Latches tolerate moderate amounts of skew
– Data can arrive anytime latch is transparent
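Skew: Latches
2-Phase Latches
[Figure: latches L1, L2, L3 clocked by φ1, φ2, φ1, with combinational logic 1 (Q1 to D2) and combinational logic 2 (Q2 to D3)]
$$t_{pd} \le T_c - \underbrace{2t_{pdq}}_{\text{sequencing overhead}}$$
$$t_{cd1},\ t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$$
$$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$$
Pulsed Latches
$$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$$
$$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$$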
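Plugging numbers into the max-delay bounds shows why latches tolerate moderate skew. The sketch evaluates the logic-delay budget t_pd for flip-flops, 2-phase latches, and pulsed latches. The timing values, in ps, are hypothetical. The sketch assumes the skew stays small enough that the latch hold and time-borrowing constraints are still met; the functions do not check this.

```python
def max_logic_flip_flop(t_c, t_pcq, t_setup, t_skew):
    return t_c - (t_pcq + t_setup + t_skew)


def max_logic_two_phase(t_c, t_pdq):
    # Skew does not appear in this max-delay bound (it limits hold and borrowing instead).
    return t_c - 2 * t_pdq


def max_logic_pulsed(t_c, t_pdq, t_pcq, t_setup, t_pw, t_skew):
    return t_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)


# Hypothetical values in ps.
T_C, T_PCQ, T_PDQ, T_SETUP, T_PW = 1000, 50, 60, 30, 100
for skew in (0, 80):
    print(skew,
          max_logic_flip_flop(T_C, T_PCQ, T_SETUP, skew),
          max_logic_two_phase(T_C, T_PDQ),
          max_logic_pulsed(T_C, T_PDQ, T_PCQ, T_SETUP, T_PW, skew))
```

With these numbers, the flip-flop budget shrinks by the full 80 ps of skew. The latch-based budgets do not change.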
Dynamic Circuit Review
• Static circuits are slow because fat pMOS transistors load the inputs
• Dynamic gates use precharge to remove pMOS transistors from the inputs
– Precharge: φ = 0, output forced high
– Evaluate: φ = 1, output may pull low
[Figure: a gate with inputs A, B, C, D and output Y in static CMOS and in dynamic form clocked by φ]
Domino Circuits
• Dynamic inputs must monotonically rise during
evaluation
– Place inverting stage between each dynamic gate
– Dynamic / static pair called domino gate
• Domino gates can be safely cascaded
[Figure: domino AND: dynamic NAND (inputs A, B, clock φ, internal node W) followed by a static inverter (output X)]
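A switch-level caricature of the domino AND illustrates why dynamic inputs must rise monotonically. The model simplifies heavily: function names are hypothetical, there is one sample per time step, and charge sharing and keepers are ignored. The dynamic NAND node W is precharged to 1. Once its pull-down network conducts during evaluation, W stays discharged until the next precharge. The static inverter then gives X = NOT W.

```python
def domino_and(a_samples, b_samples):
    """Evaluate-phase trace of a domino AND: dynamic NAND node W followed by an inverter."""
    w = 1  # precharged high during the precharge phase
    x_trace = []
    for a, b in zip(a_samples, b_samples):
        if a and b:
            w = 0  # pull-down conducts; W cannot be restored until the next precharge
        x_trace.append(1 - w)  # static inverter output X
    return x_trace


# Monotonically rising input: the final output matches A AND B.
print(domino_and([0, 1, 1], [1, 1, 1]))
# Input starts high and then falls: W was already discharged, so X stays 1 although A AND B = 0.
print(domino_and([1, 1, 0], [1, 1, 1]))
```

Each domino output X can only rise during evaluation. Feeding it into the next dynamic gate therefore satisfies the monotonicity requirement, which is what makes the cascade safe.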
Clock Skew
• Skew increases sequencing overhead
– Traditional domino has hard edges
– Evaluate at latest rising edge
– Setup at latch by earliest falling edge
[Figure: traditional domino pipeline: each half-cycle contains alternating dynamic and static stages, with a latch at every half-cycle boundary]
$$t_{pd} \le T_c - \underbrace{(2t_{setup} + 2t_{skew})}_{\text{sequencing overhead}}$$
Time Borrowing
• Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic between
half cycles
• Traditional domino sequencing overhead is about 25% of
cycle time in fast systems!
[Figure: traditional domino timing with latches at the half-cycle boundaries, where t_setup and t_skew are lost at each boundary]
Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase
boundaries.
– Second phase evaluates using results of first
[Figure: overlapping clocks φ1 and φ2; dynamic/static stages clocked by φ1 feed stages clocked by φ2 with no latch at the phase boundary (signals a, b, c, d)]
Full Keeper
• After second phase evaluates, first phase precharges
• Input to second phase falls
– Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
– Need full keeper to hold it either high or low
[Figure: dynamic gate clocked by φ, with weak full keeper transistors on node X (labels H, X)]
Time Borrowing
• Overlap can be used to
– Tolerate clock skew
– Permit time borrowing
• No sequencing overhead
[Figure: chains of static/dynamic stages in phase 1 (clocked by φ1) and phase 2 (clocked by φ2) with overlapping clocks; the timing diagram marks t_overlap, t_borrow, and t_skew]
$$t_{pd} \le T_c$$
Multiple Phases
• With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
[Figure: four overlapping clock phases φ1 to φ4; phases 1 to 4 each contain static/dynamic domino stages clocked by the corresponding phase]
Clock Generation
[Figure: generation of phases φ1 to φ4 from clk and an enable signal en]
Timing issues
• Set up and hold time:
• Every flip-flop has restrictive time regions around the active clock edge in which the input should not change.
• We call them restrictive because, if the input changes in these regions, the output may not be the expected one.
• It may be derived from either the old input, the new input, or even a value in between the two.
• The setup time is the interval before the clock where the
data must be held stable.
• The hold time is the interval after the clock where the
data must be held stable.
• Hold time can be negative, which means the data can
change slightly before the clock edge and still be
properly captured.
• Most present-day flip-flops have zero or negative hold time.
• To avoid setup time violations:
– Optimize the combinational logic between the flip-flops for minimum delay.
– Redesign the flip-flops to get a smaller setup time.
– Tweak the launch flip-flop to have a better slew at its clock pin; this makes the launch flip-flop faster, thereby helping to fix setup violations.
– Play with clock skew (useful skew).
• To avoid hold time violations:
– Add delays (using buffers).
– Add lock-up latches (in cases where the hold time requirement is very large, basically to avoid data slip).