Logic Families and Williams’ Pipeline
Clockless Computing
Montek Singh
Thu, Sep 6, 2007
Review: Logic Gate Families
A classic asynchronous pipeline by Williams
1
Review: Logic Gate Families
Static CMOS logic (“standard”)
Transmission gates, or “pass-transistor” logic
Dynamic logic, or “domino” logic
2
Static CMOS logic: Summary
Advantages:
output always strongly driven
pull-up and pull-down networks are fully complementary;
in steady state, exactly one of them is “on”
good immunity from noise and leakage
both inverting and non-inverting functions implementable
each gate is inverting
cascade two gates together to get non-inverting logic
Disadvantages:
slow/big PMOS devices needed (in addition to NMOS)
greater chip area
higher power consumption
slower switching speed
3
Complementary CMOS
Complementary CMOS logic gates
– nMOS pull-down network
– pMOS pull-up network
– a.k.a. static CMOS
OPTIONAL MATERIAL
[Figure: the inputs drive a pMOS pull-up network and an nMOS pull-down network, which share the output node]
Output for each combination of network states:
                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)
Credit: David Harris, Harvey Mudd College
4
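The four output cases of a CMOS gate listed above can be captured in a few lines. This is a minimal Python sketch, assuming each network is reduced to a single on/off flag; the function name `cmos_output` is hypothetical. In a properly designed complementary gate only the "1" and "0" cases ever occur.

```python
def cmos_output(pullup_on: bool, pulldown_on: bool) -> str:
    """Resolve the output node of a CMOS gate from the state of its two networks."""
    if pullup_on and pulldown_on:
        return "X"  # crowbar: both networks conduct, shorting Vdd to Gnd
    if pullup_on:
        return "1"  # only the pull-up conducts: output driven to Vdd
    if pulldown_on:
        return "0"  # only the pull-down conducts: output driven to Gnd
    return "Z"      # neither conducts: output floats


if __name__ == "__main__":
    for pd in (False, True):
        for pu in (False, True):
            print(f"pull-down {'ON ' if pd else 'OFF'}, pull-up {'ON ' if pu else 'OFF'}"
                  f" -> {cmos_output(pu, pd)}")
```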
Series and Parallel
nMOS: 1 = ON
pMOS: 0 = ON
Series: both must be ON
Parallel: either can be ON
OPTIONAL MATERIAL
[Figure: two transistors with gates g1 and g2 connected between terminals a and b, in four arrangements (a) to (d)]
Conduction between a and b for each input (g1 g2):
                        00    01    10    11
(a) nMOS in series      OFF   OFF   OFF   ON
(b) pMOS in series      ON    OFF   OFF   OFF
(c) nMOS in parallel    OFF   ON    ON    ON
(d) pMOS in parallel    ON    ON    ON    OFF
Credit: David Harris, Harvey Mudd College
5
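The series/parallel rules above follow mechanically from "nMOS: 1 = ON, pMOS: 0 = ON". A minimal Python sketch that regenerates the four conduction patterns; all function names are hypothetical, and each transistor is treated as an ideal switch.

```python
from itertools import product


def nmos_on(gate: int) -> bool:
    return gate == 1   # nMOS conducts when its gate is 1


def pmos_on(gate: int) -> bool:
    return gate == 0   # pMOS conducts when its gate is 0


def series(*switches: bool) -> bool:
    return all(switches)   # a path exists only if every device conducts


def parallel(*switches: bool) -> bool:
    return any(switches)   # a path exists if at least one device conducts


def conduction_table():
    arrangements = {
        "(a) nMOS in series": lambda g1, g2: series(nmos_on(g1), nmos_on(g2)),
        "(b) pMOS in series": lambda g1, g2: series(pmos_on(g1), pmos_on(g2)),
        "(c) nMOS in parallel": lambda g1, g2: parallel(nmos_on(g1), nmos_on(g2)),
        "(d) pMOS in parallel": lambda g1, g2: parallel(pmos_on(g1), pmos_on(g2)),
    }
    # Inputs are enumerated in the order 00, 01, 10, 11.
    return {name: ["ON" if f(g1, g2) else "OFF" for g1, g2 in product((0, 1), repeat=2)]
            for name, f in arrangements.items()}


if __name__ == "__main__":
    print(f"{'(g1 g2)':22s}00   01   10   11")
    for name, cells in conduction_table().items():
        print(f"{name:22s}" + "".join(f"{c:5s}" for c in cells))
```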
CMOS Gate Design
Activity:
– Sketch a 4-input CMOS NOR gate
[Figure: 4-input NOR gate symbol with inputs A, B, C, D and output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
6
CMOS Gate Design
Activity:
– Sketch a 4-input CMOS NAND gate
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
7
Conduction Complement
Complementary CMOS gates always produce 0 or 1
Ex: NAND gate
– Series nMOS: Y=0 when both inputs are 1
– Thus Y=1 when either input is 0
– Requires parallel pMOS
[Figure: 2-input NAND gate with inputs A, B and output Y]
Rule of Conduction Complements
– Pull-up network is complement of pull-down
– Parallel -> series, series -> parallel
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
8
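The rule of conduction complements can be checked exhaustively. The sketch below, in Python, assumes a hypothetical encoding where a network is either an input name (one transistor) or a tuple `("S", ...)` / `("P", ...)` for series / parallel children; `dual`, `conducts`, and `truth_table` are illustrative names. It derives the pMOS pull-up from the nMOS pull-down by swapping series and parallel, then confirms that exactly one network conducts for every input.

```python
from itertools import product

# A switch network is either an input name (a single transistor) or a tuple
# ("S", child, ...) for children in series / ("P", child, ...) for children in parallel.


def dual(net):
    """Rule of conduction complements: swap series and parallel at every level."""
    if isinstance(net, str):
        return net
    kind, *children = net
    return ("P" if kind == "S" else "S", *(dual(c) for c in children))


def conducts(net, inputs, on_value):
    """True if the network connects its two ends; on_value is 1 for nMOS, 0 for pMOS."""
    if isinstance(net, str):
        return inputs[net] == on_value
    kind, *children = net
    paths = [conducts(c, inputs, on_value) for c in children]
    return all(paths) if kind == "S" else any(paths)


def truth_table(pulldown, names):
    """Evaluate a complementary gate whose pMOS pull-up is the dual of `pulldown`."""
    table = {}
    for bits in product((0, 1), repeat=len(names)):
        inputs = dict(zip(names, bits))
        up = conducts(dual(pulldown), inputs, on_value=0)
        down = conducts(pulldown, inputs, on_value=1)
        if up == down:  # would be Z (float) or X (crowbar)
            raise ValueError(f"networks not complementary for {inputs}")
        table[bits] = 1 if up else 0
    return table


if __name__ == "__main__":
    nand = ("S", "A", "B")   # series nMOS pull-down
    print(dual(nand))        # ('P', 'A', 'B'): parallel pMOS pull-up
    for bits, y in truth_table(nand, ["A", "B"]).items():
        print(bits, "->", y)
```

The same functions cover the 4-input NOR activity: its pull-down is `("P", "A", "B", "C", "D")`, and `dual` turns it into four pMOS devices in series.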
Compound Gates
Compound gates can do any inverting function
Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure, panels (a) to (f): construction of the AOI22 gate from nMOS and pMOS groups on inputs A, B, C, D, ending with the complete gate and its symbol with output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
9
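For the AOI22 example, the pull-down is (A series B) in parallel with (C series D), and conduction complements give (A parallel B) in series with (C parallel D) for the pull-up. A minimal Python sketch, using a hypothetical nested-tuple encoding of the two networks, verifies all 16 input combinations against $\overline{A \cdot B + C \cdot D}$ and counts the devices.

```python
from itertools import product

# Hypothetical encoding: an input name is one transistor; ("S", ...) = series, ("P", ...) = parallel.
PULLDOWN = ("P", ("S", "A", "B"), ("S", "C", "D"))  # nMOS: AB + CD pulls Y low
PULLUP = ("S", ("P", "A", "B"), ("P", "C", "D"))    # pMOS: series/parallel swapped


def conducts(net, inputs, on_value):
    """True if the network has a conducting path; on_value is 1 for nMOS, 0 for pMOS."""
    if isinstance(net, str):
        return inputs[net] == on_value
    kind, *children = net
    paths = [conducts(c, inputs, on_value) for c in children]
    return all(paths) if kind == "S" else any(paths)


def count_transistors(net):
    return 1 if isinstance(net, str) else sum(count_transistors(c) for c in net[1:])


def aoi22(a, b, c, d):
    inputs = {"A": a, "B": b, "C": c, "D": d}
    up = conducts(PULLUP, inputs, on_value=0)
    down = conducts(PULLDOWN, inputs, on_value=1)
    if up == down:
        raise ValueError("networks are not complementary")
    return 1 if up else 0


if __name__ == "__main__":
    for a, b, c, d in product((0, 1), repeat=4):
        assert aoi22(a, b, c, d) == int(not ((a and b) or (c and d)))
    print("AOI22 matches NOT(A*B + C*D) for all 16 inputs")
    print("transistors:", count_transistors(PULLDOWN) + count_transistors(PULLUP))  # 8
```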
Transmission (“Pass”) Gates
Key Idea:
transistors used in a different configuration
when switched on: instead of connecting the output to Vdd or Gnd, they connect the output to the input
Advantage:
very efficient for implementing switches and multiplexers
Disadvantage:
signal degradation unless both NFET and PFET pass gates are used in a complementary configuration
10
Pass Transistors
Transistors can be used as switches
[Figure: nMOS and pMOS transistors drawn as switches, each with gate g, source s, and drain d]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
11
Pass Transistors
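Transistors can be used as switches
[Figure: nMOS and pMOS pass transistors, each with gate g between terminals s and d]
nMOS: g = 0 -> OFF; g = 1 -> ON
  with g = 1: input 0 -> output strong 0; input 1 -> output degraded 1
pMOS: g = 1 -> OFF; g = 0 -> ON
  with g = 0: input 0 -> output degraded 0; input 1 -> output strong 1
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College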
12
Transmission Gates
Single pass transistors produce degraded outputs
– pMOS good only for transmitting “1”
– nMOS good only for transmitting “0”
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
13
Transmission Gates
Single pass transistors produce degraded outputs
Complementary transmission gates pass both 0 and 1 well
[Figure: transmission gate between terminals a and b, controlled by g and its complement gb]
g = 0, gb = 1: OFF (a and b disconnected)
g = 1, gb = 0: ON; input 0 -> output strong 0; input 1 -> output strong 1
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
14
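Why single pass transistors degrade one logic level, and why the complementary pair does not, can be illustrated with a first-order threshold model. This Python sketch assumes hypothetical values VDD = 1000 mV and a threshold magnitude VT = 300 mV for both device types, and ignores body effect and leakage: an ON nMOS cannot pull its output above VDD - VT, and an ON pMOS cannot pull it below VT.

```python
VDD = 1000  # supply voltage in mV (hypothetical value)
VT = 300    # threshold-voltage magnitude in mV, same for nMOS and pMOS (hypothetical)


def nmos_pass(v_in, g):
    """nMOS pass transistor: ON when g = 1; its output cannot rise above VDD - VT."""
    return min(v_in, VDD - VT) if g == 1 else None  # None: switch open, output not driven


def pmos_pass(v_in, g):
    """pMOS pass transistor: ON when g = 0; its output cannot fall below VT."""
    return max(v_in, VT) if g == 0 else None


def transmission_gate(v_in, g):
    """nMOS (gate g) in parallel with pMOS (gate gb = 1 - g)."""
    n, p = nmos_pass(v_in, g), pmos_pass(v_in, 1 - g)
    if n is None:
        return None  # g = 0, gb = 1: both devices off
    # Whichever device can follow the input more closely sets the output level.
    return min((n, p), key=lambda v: abs(v - v_in))


def describe(v_in, v_out):
    if v_out is None:
        return "not driven"
    return ("strong " if v_out == v_in else "degraded ") + ("1" if v_in == VDD else "0")


if __name__ == "__main__":
    for v_in in (0, VDD):
        print(f"input {v_in:4d} mV | nMOS: {describe(v_in, nmos_pass(v_in, 1))}"
              f" | pMOS: {describe(v_in, pmos_pass(v_in, 0))}"
              f" | TG: {describe(v_in, transmission_gate(v_in, 1))}")
```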
Multiplexers
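2:1 multiplexer chooses between two inputs
S   D1   D0  | Y
0   X    0   | 0
0   X    1   | 1
1   0    X   | 0
1   1    X   | 1
(X = don't care)
[Figure: mux symbol with data inputs D0 (selected when S = 0) and D1 (selected when S = 1), select input S, output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College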
15
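The four-row mux truth table, with its don't-care entries, can be checked against the behavioral definition "Y = D1 if S else D0". A minimal Python sketch; `MUX_TABLE`, `mux2`, and `check_table` are hypothetical names. It confirms that every one of the 8 input combinations matches exactly one row and that the row agrees with `mux2`.

```python
from itertools import product

# Truth table as (S, D1, D0) -> Y, where "X" means "don't care".
MUX_TABLE = [
    ((0, "X", 0), 0),
    ((0, "X", 1), 1),
    ((1, 0, "X"), 0),
    ((1, 1, "X"), 1),
]


def mux2(s, d1, d0):
    return d1 if s else d0


def matches(pattern, values):
    return all(p == "X" or p == v for p, v in zip(pattern, values))


def check_table():
    for s, d1, d0 in product((0, 1), repeat=3):
        rows = [y for pattern, y in MUX_TABLE if matches(pattern, (s, d1, d0))]
        # Exactly one row must cover this input, and it must agree with mux2.
        if rows != [mux2(s, d1, d0)]:
            return False
    return True


if __name__ == "__main__":
    print(check_table())  # True
```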
Transmission Gate Mux
Nonrestoring mux uses two transmission gates
– Only 4 transistors
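[Figure: D0 and D1 each connect to Y through a transmission gate; the two gates are controlled by S and its complement, so exactly one of them passes its input]
OPTIONAL MATERIAL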
Credit: David Harris, Harvey Mudd College
16
Gate-Level Mux Design
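$Y = S \cdot D_1 + \overline{S} \cdot D_0$ (too many transistors)
How many transistors are needed? 20
OPTIONAL MATERIAL
[Figure: gate-level mux built from two AND gates, one OR gate, and an inverter on S; each AND/OR gate is a 4-transistor NAND/NOR plus a 2-transistor inverter, giving 3 × (4 + 2) + 2 = 20]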
Credit: David Harris, Harvey Mudd College
17
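The "20 versus 4" comparison is simple arithmetic over standard static CMOS cell sizes (2 transistors per input for NAND/NOR, 2 for an inverter, AND/OR built as NAND/NOR plus an inverter). A short Python sketch of the count, with hypothetical constant names:

```python
INV = 2              # static CMOS inverter
NAND2 = NOR2 = 4     # 2-input NAND / NOR: 2 nMOS + 2 pMOS
AND2 = NAND2 + INV   # AND = NAND followed by an inverter: 6
OR2 = NOR2 + INV     # OR = NOR followed by an inverter: 6

# Y = S*D1 + S'*D0: two ANDs, one OR, and one inverter to produce S'.
gate_level_mux = 2 * AND2 + OR2 + INV
# Transmission-gate mux: two transmission gates of 2 transistors each.
tg_mux = 2 * 2

print(gate_level_mux, tg_mux)  # 20 4
```

The 4-transistor figure assumes the complement of S is already available; otherwise one more inverter (2 transistors) is needed. The transmission-gate version is also non-restoring, so its output is only as strong as its data inputs.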
Dynamic Logic, or “domino”
Key idea:
use only NMOS transistors to compute the function
use a single PMOS transistor to reset (precharge)
Advantages:
significantly fewer transistors, hence smaller chip area
higher speed, lower power
less “loading” on wires (drive fewer transistors)
for async: no storage elements needed
Disadvantages:
need extra control input to precharge
logic is typically non-inverting only
more vulnerable to noise and leakage effects
18
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
precharge (= reset): output reset to ‘0’
evaluate: output computed; it either stays ‘0’ or switches to ‘1’
[Figure: the control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; both connect to the data output]
PC = 0 (asserted): precharge
PC = 1 (de-asserted): evaluate
Pull-up and pull-down must never be active simultaneously:
ensure that data inputs are reset while the gate is precharging,
or add a “footer” device
19
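The two-phase behavior of a dynamic gate can be captured as a small state machine. This Python sketch is behavioral only (no electrical detail); the class `DominoGate` and its `footer` flag are hypothetical. It shows the key property of domino logic: during evaluation the output can rise from 0 to 1 but never falls back, and only a precharge resets it.

```python
class DominoGate:
    """Behavioral sketch of a dynamic ("domino") gate with precharge and evaluate phases.

    func: the (non-inverting) function the pull-down network computes.
    footer: if False, data inputs must already be reset during precharge.
    """

    def __init__(self, func, footer=False):
        self.func = func
        self.footer = footer
        self.out = 0

    def step(self, pc, inputs):
        if pc == 0:  # PC asserted: precharge (reset) the output to 0
            if not self.footer and any(inputs):
                raise ValueError("data inputs must be reset while precharging (or add a footer)")
            self.out = 0
        elif self.func(*inputs):  # PC de-asserted: evaluate
            self.out = 1  # the output may rise from 0 to 1 ...
        # ... but never falls back during evaluation; only a precharge resets it
        return self.out


if __name__ == "__main__":
    g = DominoGate(lambda a, b: a and b)  # a dynamic AND gate
    print(g.step(0, (0, 0)))  # precharge: 0
    print(g.step(1, (1, 0)))  # evaluate, AND false: stays 0
    print(g.step(1, (1, 1)))  # evaluate, AND true: rises to 1
    print(g.step(1, (0, 1)))  # input drops, but output holds 1 until next precharge
```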
Outline: Several Pipeline Styles
Classic static logic pipeline: Sutherland
Recent static logic pipeline: MOUSETRAP
Classic dynamic logic pipeline: Williams/Horowitz’ PS0
20
A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
Structure
Operation
Performance
21
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
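[Figure: a three-stage pipeline (Stage 1, Stage 2, Stage 3) from Data in to Data out; each stage contains a Processing Block and a Completion Detector, whose ack goes back to the previous stage]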
Implemented using “dynamic logic”
22
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: a Processing Block of dynamic gates (precharge input PC, pull-down network, “keeper”) maps data inputs to data outputs; a Completion Detector on the data outputs generates ack]
23
Dual-Rail Completion Detector
Combines dual-rail signals
Indicates when all bits are valid (or reset)
OR together 2 rails per bit
Merge results using “C-element”
C-element:
if all inputs = 1, output 1
if all inputs = 0, output 0
else, maintain output value
[Figure: the two rails of each of bit0, bit1, ..., bitn feed an OR gate; all OR outputs feed a C-element whose output is Done]
24
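The completion detector above (an OR per dual-rail bit, merged by a C-element) can be modeled behaviorally. This Python sketch assumes the common dual-rail encoding (1, 0) = 1, (0, 1) = 0, (0, 0) = reset/spacer; `CElement` and `completion_detector` are hypothetical names. Done rises only once every bit is valid and falls only once every bit is reset.

```python
class CElement:
    """Muller C-element: output changes only when all inputs agree."""

    def __init__(self):
        self.out = 0

    def update(self, *inputs):
        if all(inputs):
            self.out = 1   # all inputs 1: output 1
        elif not any(inputs):
            self.out = 0   # all inputs 0: output 0
        return self.out    # otherwise: hold the previous value


def completion_detector(c_element, bits):
    """bits: dual-rail pairs (t, f); assumes (0, 0) = reset/spacer, (1, 0) = 1, (0, 1) = 0."""
    return c_element.update(*[t | f for t, f in bits])  # OR together the 2 rails per bit


if __name__ == "__main__":
    done = CElement()
    sequence = [
        [(0, 0), (0, 0), (0, 0)],  # all bits reset
        [(1, 0), (0, 0), (0, 1)],  # some bits valid
        [(1, 0), (0, 1), (0, 1)],  # all bits valid
        [(0, 0), (0, 1), (0, 1)],  # some bits reset
        [(0, 0), (0, 0), (0, 0)],  # all bits reset again
    ]
    print([completion_detector(done, word) for word in sequence])  # [0, 0, 1, 1, 0]
```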
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
accept new data: after next stage is emptied
[Figure: stages N, N+1, N+2 with the numbered events of one complete cycle]
1. N evaluates
2. N+1 evaluates
3. N+2 evaluates
4. N+2 indicates “done”
5. N+1 precharges
6. N+1 indicates “done” (precharge complete), so N can evaluate again
Complete cycle: 6 events
Evaluate: 3 events
Precharge: another 3 events
25
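The two protocol rules (evaluate when the next stage is precharged, precharge once the next stage has copied the data) can be exercised in a small simulation. This Python sketch is a simplification: all stages apply the rules to the previous snapshot in lock step, whereas real PS0 is asynchronous; `simulate_ps0` is a hypothetical name and tokens are assumed distinct. It lets you watch tokens flow while staying separated by spacers.

```python
def simulate_ps0(num_stages, tokens, max_steps=200):
    """Lock-step sketch of the PS0 protocol for a linear pipeline.

    stages[i] is None when stage i is precharged (holding a spacer), else the token
    it has evaluated. Tokens must be distinct. Every stage applies the rules to the
    previous snapshot at once; real PS0 is asynchronous, so this is a simplification.
    """
    stages = [None] * num_stages
    pending = list(tokens)   # tokens waiting at the pipeline input
    delivered = []           # tokens copied out of the last stage by the environment
    history = [list(stages)]
    for _ in range(max_steps):
        if len(delivered) == len(tokens):
            break
        new = list(stages)
        for i in range(num_stages):
            last = i == num_stages - 1
            if stages[i] is None:
                # EVALUATE i: new data at the input, and stage i+1 is precharged (empty).
                data = pending[0] if (i == 0 and pending) else (stages[i - 1] if i > 0 else None)
                if data is not None and (last or stages[i + 1] is None):
                    new[i] = data
                    if i == 0:
                        pending.pop(0)
            else:
                # PRECHARGE i: stage i+1 (or the output environment) has copied the token.
                copied = (delivered[-1:] == [stages[i]]) if last else (stages[i + 1] == stages[i])
                if copied:
                    new[i] = None
        if stages[-1] is not None and delivered[-1:] != [stages[-1]]:
            delivered.append(stages[-1])  # environment copies the last stage's output
        stages = new
        history.append(list(stages))
    return delivered, history


if __name__ == "__main__":
    out, hist = simulate_ps0(4, ["a", "b", "c"])
    for snapshot in hist:
        print(["." if s is None else s for s in snapshot])
    print("delivered:", out)
```

In the printed snapshots, two different tokens never occupy adjacent stages: a stage only accepts new data after its successor has been emptied.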
PS0 Performance
[Figure: the six events of one cycle, numbered 1 to 6]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
$T_{EVAL}$: Evaluation Time
$T_{PRECH}$: Precharge Time
$T_{DETECT}$: Completion Detection Time
26
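The cycle-time formula is just the sum of the delays along the six-event cycle: three evaluations, one precharge, and two completion detections. A Python sketch with purely hypothetical delays (in picoseconds) shows that summing the events reproduces the closed form; the event labels follow the protocol description and should be read as illustrative.

```python
# Hypothetical per-event delays in picoseconds (illustrative values only).
T_EVAL, T_PRECH, T_DETECT = 100, 80, 50

# The six events of one PS0 cycle and the delay each contributes.
CYCLE_EVENTS = [
    ("1: N evaluates", T_EVAL),
    ("2: N+1 evaluates", T_EVAL),
    ("3: N+2 evaluates", T_EVAL),
    ("4: N+2 detects completion of evaluation", T_DETECT),
    ("5: N+1 precharges", T_PRECH),
    ("6: N+1 detects completion of precharge", T_DETECT),
]


def ps0_cycle_time(t_eval, t_prech, t_detect):
    return 3 * t_eval + t_prech + 2 * t_detect


total = sum(delay for _, delay in CYCLE_EVENTS)
print(total, ps0_cycle_time(T_EVAL, T_PRECH, T_DETECT))  # 480 480
```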
Summary: PS0 Pipelining
Datapaths are latch-free:
dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
stage deletes data: only after next stage has copied it
stage accepts new data: only if next stage is empty
distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
completion detector directly controls previous stage
+: chip area savings
+: low control overhead
27
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
each stage gets its own clock
successive clocks are slightly skewed
essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
latches are simple, level-sensitive
consecutive stages receive complementary clock signals
[Figure: latches between stages, clocked alternately by Ck and Ck’]
28
Drawbacks of PS0 Pipelining
1. Poor throughput:
long cycle time: 6 events per cycle
data “tokens” are forced far apart in time
2. Limited storage capacity:
at most 50% of stages can hold distinct tokens
data tokens must be separated by at least one spacer
My research goals have been to address both issues,
while still maintaining very low latency
29
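The 50% storage limit follows directly from the spacer rule. The brute-force Python sketch below (hypothetical function name) counts only that constraint: it enumerates every occupancy pattern of a linear pipeline, keeps those where no two distinct tokens sit in adjacent stages, and reports the maximum.

```python
from itertools import product


def max_distinct_tokens(num_stages: int) -> int:
    """Largest number of distinct tokens in a linear pipeline if every pair of
    tokens must be separated by at least one spacer (brute force over all patterns)."""
    best = 0
    for occ in product((0, 1), repeat=num_stages):  # 1 = stage holds a distinct token
        if all(not (a and b) for a, b in zip(occ, occ[1:])):
            best = max(best, sum(occ))
    return best


for n in (2, 4, 6, 8):
    print(n, max_distinct_tokens(n))  # exactly n/2 for an even number of stages
```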
Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent
in Williams’ PS0 style
Assume all gate and wire delays can be arbitrary
For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal
clocked dynamic pipeline (slide #28)
30