Transcript: PPT slides

Recap: Lecture 4
Logic Implementation Styles:
 Static CMOS logic
 Dynamic logic, or “domino” logic
 Transmission gates, or “pass-transistor” logic
Static CMOS logic
Advantages:
 output always strongly driven
 pull-up and pull-down networks are fully complementary: exactly one of them is “on” at all times
 good immunity from noise and leakage
 both inverting and non-inverting functions implementable
 each gate is inverting
 cascade two gates together to get non-inverting logic
Disadvantages:
 slow/big PMOS devices needed (in addition to NMOS)
 greater chip area
 higher power consumption
 slower switching speed
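The complementary pull-up/pull-down structure can be illustrated with a minimal switch-level sketch. It assumes a 2-input NAND as the example gate and treats transistors as ideal Boolean switches, so timing and electrical effects are ignored. The function names are hypothetical.

```python
from itertools import product


def nand2_networks(a: int, b: int) -> tuple[bool, bool]:
    """Return (pull_up_on, pull_down_on) for a 2-input static CMOS NAND.

    Pull-down: two NMOS in series (each conducts when its gate is 1).
    Pull-up: two PMOS in parallel (each conducts when its gate is 0).
    """
    pull_down_on = a == 1 and b == 1
    pull_up_on = a == 0 or b == 0
    return pull_up_on, pull_down_on


def nand2(a: int, b: int) -> int:
    pull_up_on, pull_down_on = nand2_networks(a, b)
    # Fully complementary: exactly one network conducts, so the output is always driven.
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0


def inverter(x: int) -> int:
    return 1 - x


def and2(a: int, b: int) -> int:
    # Each static CMOS gate is inverting; cascade NAND + inverter to get AND.
    return inverter(nand2(a, b))


for a, b in product((0, 1), repeat=2):
    print(a, b, nand2(a, b), and2(a, b))
```

The AND gate needs two stages here, which reflects the slide's point that non-inverting functions cost a second gate.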
Dynamic Logic, or “domino”
Key idea:
 use only NMOS transistors to compute the function
 use a single PMOS to reset
Advantages:
 significantly fewer transistors → smaller chip area
 higher speed, lower power
 less “loading” on wires (drive fewer transistors)
 for async: no storage elements needed
Disadvantages:
 need extra control input to precharge
 logic is typically non-inverting only
 more vulnerable to noise and leakage effects
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
 precharge (=reset): output reset to ‘0’
 evaluate: output computed → either stays ‘0’, or switches to ‘1’
[Figure: dynamic gate. The control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate produces the data output.]
PC = 0 (asserted) → precharge
PC = 1 (de-asserted) → evaluate
Pull-up and pull-down must never both be simultaneously active:
 ensure that data inputs are reset while gate is precharging
 or, add a “footer” device (an NMOS transistor in series with the pull-down network, switched off by PC during precharge)
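The two-phase behavior can be sketched as a small behavioral model. The sketch assumes ideal Boolean switches, a pull-down network given as a Boolean function of the inputs, and an optional footer. `DominoGate` and its parameters are hypothetical names used only for this illustration.

```python
from typing import Callable, Sequence


class DominoGate:
    """Behavioral model of a domino gate: precharge resets the output to 0,
    evaluate lets it rise to 1 when the NMOS pull-down network conducts."""

    def __init__(self, pull_down: Callable[[Sequence[int]], bool], footed: bool = True):
        self.pull_down = pull_down  # True when the pull-down network conducts
        self.footed = footed        # footer NMOS (gated by PC) in series with it
        self.out = 0

    def step(self, pc: int, inputs: Sequence[int]) -> int:
        conducts = self.pull_down(inputs)
        if pc == 0:  # PC asserted: precharge
            if conducts and not self.footed:
                # Pull-up and pull-down both on: the hazard the slide warns about.
                raise RuntimeError("data inputs were not reset during precharge")
            self.out = 0
        elif conducts:  # PC de-asserted: evaluate, output switches to 1
            self.out = 1
        # Otherwise the output holds its value (0 stays 0; 1 stays 1 until
        # the next precharge, since a discharged dynamic node cannot recover).
        return self.out


gate = DominoGate(lambda x: all(x))  # two NMOS in series: non-inverting AND
print(gate.step(0, [0, 0]))  # precharge -> 0
print(gate.step(1, [1, 0]))  # evaluate, network off -> stays 0
print(gate.step(1, [1, 1]))  # evaluate, network on -> switches to 1
print(gate.step(1, [0, 0]))  # still 1 until the next precharge
```

The example also shows why domino logic is typically non-inverting. The output can only rise during evaluation, and it rises only when the NMOS network conducts.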
Transmission Gates
Key Idea:
 transistors used in a different configuration
 when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
 very efficient for implementing switches and multiplexors
Disadvantage:
 not very useful for logic functions
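A transmission-gate 2:1 multiplexer can be sketched at the switch level. Each transmission gate is abstracted as one ideal switch; in CMOS it is an NMOS/PMOS pair with complementary controls. `None` stands for a floating (undriven) output, and all names are hypothetical.

```python
from typing import Optional


def transmission_gate(enable: int, data: Optional[int]) -> Optional[int]:
    """When switched on, connect the output to the input; otherwise float (None)."""
    return data if enable else None


def wire(*drivers: Optional[int]) -> Optional[int]:
    """Join several gate outputs on one node; at most one may be driving it."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        raise RuntimeError("contention: more than one gate drives the node")
    return active[0] if active else None


def mux2(sel: int, d0: int, d1: int) -> Optional[int]:
    # One transmission gate passes d0 when sel = 0, the other passes d1 when sel = 1.
    return wire(transmission_gate(1 - sel, d0), transmission_gate(sel, d1))


for sel in (0, 1):
    print(sel, mux2(sel, d0=0, d1=1))  # output equals the selected input
```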
Lecture 5:
A Classic Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
 Structure
 Operation
 Performance
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
 successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
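[Figure: three-stage PS0 pipeline (Stage 1, Stage 2, Stage 3) from Data in to Data out; each stage is a processing block followed by a completion detector, which sends ack back to the previous stage.]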
Implemented using “dynamic logic”
PS0 Pipeline Stage
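A PS0 stage consists of dynamic gates and a completion detector:
[Figure: PS0 stage. The processing block is built from dynamic gates (PC input, pull-down network, “keeper”) that map the data inputs to the data outputs; a completion detector on the outputs generates ack.]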
Dual-Rail Completion Detector
 Combines dual-rail signals
 Indicates when all bits are valid (or reset)
C-element:
if all inputs = 1, output → 1
if all inputs = 0, output → 0
else, maintain output value
[Figure: the two rails of each bit (bit0, bit1, ..., bitn) feed an OR gate; the OR outputs feed a C-element whose output is Done.]
 OR together 2 rails per bit
 Merge results using “C-element”
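The detector's behavior can be sketched in Python. The sketch assumes a dual-rail code in which (0, 0) is the spacer (reset) value and exactly one rail is high for a valid bit. `CElement` and `completion_detector` are hypothetical names.

```python
from typing import Sequence, Tuple


class CElement:
    """Muller C-element: output follows the inputs when they all agree, else holds."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, inputs: Sequence[int]) -> int:
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        # otherwise: maintain the previous output value
        return self.out


def completion_detector(c: CElement, bits: Sequence[Tuple[int, int]]) -> int:
    """bits: one (rail0, rail1) pair per bit; (0, 0) is the spacer (reset) code."""
    per_bit = [r0 | r1 for r0, r1 in bits]  # OR together the 2 rails of each bit
    return c.update(per_bit)                # Done = 1: all valid; Done = 0: all reset


c = CElement()
print(completion_detector(c, [(0, 0), (0, 0), (0, 0)]))  # 0: all bits reset
print(completion_detector(c, [(1, 0), (0, 0), (0, 1)]))  # 0: middle bit not yet valid
print(completion_detector(c, [(1, 0), (0, 1), (0, 1)]))  # 1: all bits valid
print(completion_detector(c, [(0, 0), (0, 1), (0, 0)]))  # 1: partly reset, holds
print(completion_detector(c, [(0, 0), (0, 0), (0, 0)]))  # 0: all bits reset
```

The hold behavior is what lets Done change only once per phase. A mix of valid and reset bits never produces a spurious transition.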
PS0 Protocol
 PRECHARGE N: when N+1 completes evaluation
 delete data: after next stage has copied it
 EVALUATE N: when N+1 completes precharging
 accept new data: after next stage is emptied
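[Figure: timeline of one complete cycle of stage N across stages N, N+1 and N+2]
Evaluate (3 events):
 1. N evaluates
 2. N+1 evaluates
 3. N+2 evaluates
Precharge (3 events):
 4. N+2 indicates “done”
 5. N+1 precharges
 6. N+1 indicates “done”
Complete cycle: 6 events, after which N can evaluate another data item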
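The two protocol rules can be exercised with a small event-driven simulation. This is a behavioral sketch, and several of its assumptions do not come from the slides. Each stage is one lumped dynamic gate whose bits all switch together, and the completion detector adds a fixed delay. The source always has new data, and the last stage drives an ideal sink that mirrors its own completion detector. `simulate_ps0` and its parameters are hypothetical.

```python
import heapq
from itertools import count


def simulate_ps0(n_stages, t_eval, t_prech, t_detect, t_end):
    """Event-driven sketch of a PS0 pipeline.

    Returns, for each stage, the list of times at which it finished evaluating.
    """
    data = [None] * n_stages   # token id held by each stage (None = precharged)
    done = [0] * n_stages      # completion-detector output of each stage
    busy = [False] * n_stages  # an evaluate or precharge is in progress
    last = [0] * n_stages      # id of the last token each stage evaluated
    evals = [[] for _ in range(n_stages)]
    queue, seq = [], count()

    def schedule(t, kind, i, value=None):
        heapq.heappush(queue, (t, next(seq), kind, i, value))

    def try_start(t):
        for i in range(n_stages):
            if busy[i]:
                continue
            # Stage i's PC is driven by stage i+1's completion detector.
            # Assumption: the last stage feeds an ideal sink that mirrors
            # the last stage's own detector.
            next_done = done[i + 1] if i + 1 < n_stages else done[i]
            if next_done == 1 and data[i] is not None:
                # PRECHARGE N: N+1 has completed evaluation.
                busy[i] = True
                schedule(t + t_prech, "prech_end", i)
            elif next_done == 0 and data[i] is None:
                # EVALUATE N: N+1 has completed precharging; also needs new input.
                if i == 0:
                    tok = last[0] + 1  # assumption: the source always has new data
                elif data[i - 1] is not None and data[i - 1] != last[i]:
                    tok = data[i - 1]
                else:
                    continue
                busy[i], last[i] = True, tok
                schedule(t + t_eval, "eval_end", i, tok)

    try_start(0)
    while queue:
        t, _, kind, i, value = heapq.heappop(queue)
        if t > t_end:
            break
        if kind == "eval_end":
            data[i], busy[i] = value, False
            evals[i].append(t)
            schedule(t + t_detect, "detect", i, 1)
        elif kind == "prech_end":
            data[i], busy[i] = None, False
            schedule(t + t_detect, "detect", i, 0)
        else:  # "detect": the completion detector's output settles
            done[i] = value
        try_start(t)
    return evals


evals = simulate_ps0(n_stages=8, t_eval=3, t_prech=2, t_detect=1, t_end=400)
mid = evals[4]
print([b - a for a, b in zip(mid, mid[1:])][-5:])  # steady-state gaps
```

With these made-up delays, the gaps between successive evaluations of a middle stage settle at 13 = 3·3 + 2 + 2·1. That value matches three evaluations, one precharge and two detections on the 6-event cycle.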
PS0 Performance
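[Figure: the 6-event PS0 cycle, events 1 to 6, annotated with their delays]
Cycle Time = $3\,T_{\text{EVAL}} + T_{\text{PRECH}} + 2\,T_{\text{DETECT}}$
 $T_{\text{EVAL}}$: Evaluation Time
 $T_{\text{PRECH}}$: Precharge Time
 $T_{\text{DETECT}}$: Completion Detection Time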
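A worked instance of the cycle-time formula follows. The delays are made up purely for illustration and are not measurements from any chip: $T_{\text{EVAL}} = 3$, $T_{\text{PRECH}} = 2$, $T_{\text{DETECT}} = 1$, in arbitrary time units.

```latex
\begin{aligned}
T_{\text{cycle}} &= 3\,T_{\text{EVAL}} + T_{\text{PRECH}} + 2\,T_{\text{DETECT}} \\
                 &= 3 \cdot 3 + 2 + 2 \cdot 1 = 13 \text{ units} \\
\text{throughput} &= 1 / T_{\text{cycle}} = 1/13 \text{ data items per unit}
\end{aligned}
```

In this example, evaluation accounts for 9 of the 13 units because three evaluations lie on the cycle's critical path.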
Summary: PS0 Pipelining
Datapaths are latch-free:
 dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
 stage deletes data: only after next stage has copied it
 stage accepts new data: only if next stage is empty
 distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
 completion detector directly controls previous stage
+: chip area savings
+: low control overhead
Drawbacks of PS0 Pipelining
1. Poor throughput:
 long cycle time: 6 events per cycle
 data “tokens” are forced far apart in time
2. Limited storage capacity:
 at most 50% of stages can hold distinct tokens
 data tokens must be separated by at least one spacer
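The capacity limit follows directly from the spacer rule, which a brute-force count can confirm. The sketch models only that rule: a snapshot of a linear pipeline is legal if no two tokens sit in adjacent stages. The function names are hypothetical.

```python
from itertools import product


def is_legal(occupancy):
    """occupancy: one 'T' (distinct data token) or 'S' (spacer) per stage.
    Assumed rule from the slide: two tokens may never sit in adjacent stages."""
    return all(not (a == "T" and b == "T") for a, b in zip(occupancy, occupancy[1:]))


def max_tokens(n_stages):
    """Largest number of tokens any legal snapshot of n_stages can hold."""
    return max(occ.count("T") for occ in product("TS", repeat=n_stages) if is_legal(occ))


for n in range(1, 9):
    print(n, max_tokens(n))
```

The count comes out to ceil(n/2). That is exactly half the stages for even n, and one extra for odd n because both end stages can hold a token, so long pipelines approach the 50% limit.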
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
 each stage gets its own clock
 successive clocks are slightly skewed
 essentially, a clocked simulation of asynchronous handshaking!
 need multiple clock phases!
2. Use a single clock, but insert latches between stages
 latches are simple, level-sensitive
 consecutive stages receive complementary clock signals
[Figure: latches between stages, clocked alternately by Ck and Ck’]
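Option 2 can be sketched as a half-cycle simulation. The sketch assumes latch i is transparent when the half-cycle index has the same parity as i, so consecutive latches see complementary clocks (Ck, Ck’) and are never transparent together. Each stage's logic is lumped into a hypothetical `stage_fn` applied as data enters a latch, and one new input word is presented per full clock cycle.

```python
def latch_pipeline(inputs, n_latches, half_cycles, stage_fn=lambda x: x):
    """Single-clock pipeline with level-sensitive latches between stages.

    Returns the latch contents after each half-cycle (None = no data).
    """
    latches = [None] * n_latches
    source = list(inputs)
    trace = []
    for h in range(half_cycles):
        phase = h % 2  # phase 0: Ck high, phase 1: Ck' high
        # A new input word is presented once per full clock cycle.
        word = source[h // 2] if h // 2 < len(source) else None
        for i in range(n_latches):
            if i % 2 == phase:  # transparent this half-cycle
                # The upstream latch is on the opposite phase, so it is opaque
                # and its value is stable: data cannot race through two latches.
                upstream = word if i == 0 else latches[i - 1]
                latches[i] = None if upstream is None else stage_fn(upstream)
        trace.append(list(latches))
    return trace


for row in latch_pipeline([1, 2, 3], n_latches=4, half_cycles=8):
    print(row)
```

Each data item advances one latch per half-cycle. A pair of complementary-clocked latches therefore moves one item per full clock cycle.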
Comparison … (contd.)
Cycle Times?