Recap: Lecture 4
Logic Implementation Styles:
Static CMOS logic
Dynamic logic, or “domino” logic
Transmission gates, or “pass-transistor” logic
Static CMOS logic
Advantages:
output always strongly driven
pull-up and pull-down networks are fully-complementary;
exactly one of them is “on” always
good immunity from noise and leakage
both inverting and non-inverting functions implementable
each gate is inverting
cascade two gates together to get non-inverting logic
Disadvantages:
slow/big PMOS devices needed (in addition to NMOS)
greater chip area
higher power consumption
slower switching speed
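The complementary pull-up/pull-down property and the "cascade two gates" rule can be illustrated with a small switch-level sketch. It is only a behavioral model of a 2-input NAND, chosen here as an example; the function names are hypothetical and not from the lecture.

```python
from itertools import product

def nand2_networks(a: int, b: int) -> tuple:
    """Return (pull_up_on, pull_down_on) for a 2-input static CMOS NAND."""
    # Pull-down network: two NMOS devices in series, conducts only if a = b = 1.
    pull_down_on = (a == 1) and (b == 1)
    # Pull-up network: two PMOS devices in parallel, conducts if either input is 0.
    pull_up_on = (a == 0) or (b == 0)
    return pull_up_on, pull_down_on

def nand2(a: int, b: int) -> int:
    pull_up_on, pull_down_on = nand2_networks(a, b)
    # Fully complementary: exactly one network is on, so the output is always driven.
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0

def and2(a: int, b: int) -> int:
    # Each gate is inverting; cascading two gates gives the non-inverting function.
    n = nand2(a, b)
    return nand2(n, n)  # NAND with tied inputs acts as an inverter

# Maps (a, b) to (NAND output, AND output) for all four input combinations.
truth_table = {(a, b): (nand2(a, b), and2(a, b)) for a, b in product((0, 1), repeat=2)}
```

The internal assertion in `nand2` checks the "exactly one of them is on" property for every input combination.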
Dynamic Logic, or “domino”
Key idea:
only use NMOS’s to compute function
use a single PMOS to reset
Advantages:
significantly fewer transistors → smaller chip area
higher speed, lower power
less “loading” on wires (drive fewer transistors)
for async: no storage elements needed
Disadvantages:
need extra control input to precharge
logic is typically non-inverting only
more vulnerable to noise and leakage effects
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
precharge (=reset): output reset to ‘0’
evaluate: output computed; it either stays ‘0’ or switches to ‘1’
[Figure: dynamic gate. The control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate produces the data output.]
PC = 0 (asserted): precharge
PC = 1 (de-asserted): evaluate
Pull-up and pull-down must never both be simultaneously active:
ensure that data inputs are reset while gate is precharging
or, add a “footer” device
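The two phases can be sketched as a behavioral model. This is a minimal sketch, assuming a 2-input AND function, an output inverter (so the output resets to 0), and a footer device that blocks the pull-down network during precharge; the class name `DominoAnd2` is hypothetical.

```python
class DominoAnd2:
    """Behavioral model of a footed 2-input domino AND gate (illustrative only)."""

    def __init__(self) -> None:
        self.node = 1  # internal dynamic node, assumed precharged high at start

    @property
    def out(self) -> int:
        # Assumed output inverter: output is 0 after precharge.
        return 1 - self.node

    def step(self, pc: int, a: int, b: int) -> int:
        if pc == 0:
            # Precharge (PC asserted): the single PMOS recharges the node.
            # The footer keeps the pull-down off, so inputs cannot fight the PMOS.
            self.node = 1
        elif a == 1 and b == 1:
            # Evaluate: the NMOS pull-down network discharges the node.
            self.node = 0
        # Otherwise the node holds its charge. Once discharged it cannot
        # recover until the next precharge, so the output can only go 0 -> 1.
        return self.out
```

For example, after `g = DominoAnd2()`, the call `g.step(pc=1, a=1, b=1)` switches the output to 1, and it stays at 1 even if the inputs are later withdrawn, until `g.step(pc=0, a=0, b=0)` resets it.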
Transmission Gates
Key Idea:
transistors used in a different configuration
when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
very efficient for implementing switches and multiplexors
Disadvantage:
not very useful for logic functions
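As an illustration of why transmission gates suit multiplexors, here is a minimal sketch of a 2-to-1 mux built from two ideal transmission gates. `None` is used here to stand for a disconnected (high-impedance) output; that convention and the function names are assumptions made for the example.

```python
def transmission_gate(enable: bool, value: int):
    """Ideal transmission gate: passes its input when on, disconnects when off."""
    return value if enable else None  # None models a high-impedance output

def mux2(select: int, d0: int, d1: int) -> int:
    # Two gates share one output wire and are enabled by complementary selects.
    outputs = [transmission_gate(select == 0, d0), transmission_gate(select == 1, d1)]
    driven = [v for v in outputs if v is not None]
    # Exactly one gate connects its input to the shared output.
    assert len(driven) == 1
    return driven[0]
```

Here `mux2(0, d0, d1)` passes `d0` and `mux2(1, d0, d1)` passes `d1`; no connection to Vdd or Gnd is modeled.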
Lecture 5:
A Classic Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
Structure
Operation
Performance
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) from Data in to Data out. Each stage has a data Processing Block and a Completion Detector, with ack signals between stages.]
Implemented using “dynamic logic”
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: PS0 stage. A Processing Block built from a pull-down network with a “keeper”, with PC and data inputs and data outputs, plus a Completion Detector and an ack signal.]
Dual-Rail Completion Detector
Combines dual-rail signals
Indicates when all bits are valid (or reset)
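C-element:
if all inputs=1, output 1
if all inputs=0, output 0
else, maintain output value
OR together 2 rails per bit
Merge results using “C-element”
[Figure: the two rails of each bit (bit0, bit1, ..., bitn) feed an OR gate; the OR outputs feed a C-element (C) that produces Done.]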
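The detector described above can be sketched behaviorally as follows. This is a minimal model, assuming each bit is given as a pair of rails and the C-element starts at 0; the names `CElement` and `completion_detector` are hypothetical.

```python
class CElement:
    """C-element: output follows the inputs when they all agree, else holds."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, inputs) -> int:
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        # Mixed inputs: maintain the previous output value.
        return self.out

def completion_detector(c: CElement, bits) -> int:
    """bits is a list of (rail_a, rail_b) pairs, one pair per dual-rail bit."""
    # OR together the 2 rails of each bit: 1 means that bit is valid, 0 means reset.
    per_bit_valid = [rail_a | rail_b for rail_a, rail_b in bits]
    # Merge the per-bit results with the C-element to produce Done.
    return c.update(per_bit_valid)
```

With a fresh `c = CElement()`, Done stays 0 while only some bits are valid, rises to 1 once every bit is valid, and returns to 0 only after every bit has been reset.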
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
accept new data: after next stage is emptied
[Figure: stages N, N+1 and N+2, with events numbered 1 to 6 labeled “evaluates” (three times), “indicates ‘done’” (twice) and “precharges” (once).]
Complete cycle: 6 events
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
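The PRECHARGE and EVALUATE rules can be exercised with a token-level sketch. It is a strong simplification and not a circuit model: every event takes one unit step, all stages decide from the same old state, completion-detector delay is ignored, and the source and sink are idealized. `None` stands for a precharged (empty) stage; the function names are hypothetical.

```python
def ps0_step(stages, source, sink, sink_ready=True):
    """Advance all stages by one unit step, deciding from the old state."""
    n = len(stages)
    new = list(stages)
    for i in range(n):
        is_last = (i == n - 1)
        if stages[i] is not None:
            # PRECHARGE N: when N+1 has completed evaluation (copied the data).
            if not is_last:
                if stages[i + 1] is not None:
                    new[i] = None
            elif sink_ready:
                sink.append(stages[i])  # idealized sink copies and acks at once
                new[i] = None
        else:
            # EVALUATE N: when N+1 has completed precharging and input data is valid.
            next_empty = is_last or stages[i + 1] is None
            if next_empty:
                if i == 0:
                    if source:
                        new[i] = source.pop(0)
                elif stages[i - 1] is not None:
                    new[i] = stages[i - 1]
    return new

def run_ps0(n_stages, tokens, sink_ready=True, max_steps=1000):
    """Run until no stage changes; return (final stage contents, sink contents)."""
    stages = [None] * n_stages
    source, sink = list(tokens), []
    for _ in range(max_steps):
        nxt = ps0_step(stages, source, sink, sink_ready)
        if nxt == stages:
            break
        stages = nxt
    return stages, sink
```

Under these assumptions, `run_ps0(3, [1, 2, 3])` delivers the tokens to the sink in order. With a stalled sink, `run_ps0(4, [1, 2, 3], sink_ready=False)` settles with tokens in only every other stage, each pair separated by an empty stage.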
PS0 Performance
[Figure: events 1 to 6 of the PS0 cycle.]
Cycle Time = 3 T_EVAL + T_PRECH + 2 T_DETECT
T_EVAL: Evaluation Time
T_PRECH: Precharge Time
T_DETECT: Completion Detection Time
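A short worked example of the cycle-time formula follows. The delay values are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from the lecture or from any fabricated chip.

```python
def ps0_cycle_time(t_eval, t_prech, t_detect):
    """PS0 cycle time: 3 evaluations + 1 precharge + 2 completion detections."""
    return 3 * t_eval + t_prech + 2 * t_detect

# Hypothetical delays in picoseconds (assumed values, for illustration only).
cycle_ps = ps0_cycle_time(t_eval=200, t_prech=150, t_detect=250)

# Throughput is the reciprocal of the cycle time: data items per nanosecond.
items_per_ns = 1000 / cycle_ps
```

With these assumed numbers the cycle is 3·200 + 150 + 2·250 = 1250 ps. Completion detection contributes 500 ps of that, which shows how much the handshake overhead adds on top of the logic delays.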
Summary: PS0 Pipelining
Datapaths are latch-free:
dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
stage deletes data: only after next stage has copied it
stage accepts new data: only if next stage is empty
distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
completion detector directly controls previous stage
+: chip area savings
+: low control overhead
Drawbacks of PS0 Pipelining
1. Poor throughput:
long cycle time: 6 events per cycle
data “tokens” are forced far apart in time
2. Limited storage capacity:
at most 50% of stages can hold distinct tokens
data tokens must be separated by at least one spacer
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
each stage gets its own clock
successive clocks are slightly skewed
essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
latches are simple, level-sensitive
consecutive stages receive complementary clock signals
[Figure: latches driven by complementary clocks Ck and Ck’]
Comparison … (contd.)
Cycle Times?