Logic Families and Williams' Pipeline


Clockless Computing
Montek Singh
Thu, Sep 6, 2007
 Review: Logic Gate Families
 A classic asynchronous pipeline by Williams
1
Review: Logic Gate Families
 Static CMOS logic (“standard”)
 Transmission gates, or “pass-transistor” logic
 Dynamic logic, or “domino” logic
2
Static CMOS logic: Summary
Advantages:
 output always strongly driven
 pull-up and pull-down networks are fully complementary: exactly one of them is always “on”
 good immunity from noise and leakage
 both inverting and non-inverting functions implementable
 each gate is inverting
 cascade two gates together to get non-inverting logic
Disadvantages:
 slow/big PMOS devices needed (in addition to NMOS)
 greater chip area
 higher power consumption
 slower switching speed
3
Complementary CMOS
 Complementary CMOS logic gates (a.k.a. static CMOS)
– nMOS pull-down network
– pMOS pull-up network
 Output as a function of which network is ON:
                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)
[Figure: the inputs drive a pMOS pull-up network and an nMOS pull-down network, which share one output node]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
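The output table above can be written as a small lookup. This is only an illustrative sketch; the function name `resolve_output` is hypothetical and not from the slides.

```python
def resolve_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Return the state of the output node for a given pair of network states."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: both networks conduct, Vdd fights Gnd
    if pull_up_on:
        return "1"  # output tied to Vdd through the pMOS network
    if pull_down_on:
        return "0"  # output tied to Gnd through the nMOS network
    return "Z"      # neither network conducts: output floats


for pull_down_on in (False, True):
    for pull_up_on in (False, True):
        print(f"pull-up ON={pull_up_on!s:5}  pull-down ON={pull_down_on!s:5}  "
              f"output={resolve_output(pull_up_on, pull_down_on)}")
```

A properly designed static CMOS gate only ever reaches the "1" and "0" entries, because its two networks are complementary.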
4
Series and Parallel




 nMOS: 1 = ON
 pMOS: 0 = ON
 Series: both must be ON
 Parallel: either can be ON
[Figure: panels (a)-(d) showing series and parallel transistor pairs between terminals a and b, with the ON/OFF state of each pair for every combination of gate inputs g1 and g2]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
5
CMOS Gate Design
 Activity:
– Sketch a 4-input CMOS NOR gate
[Figure: 4-input NOR gate schematic with inputs A, B, C, D and output Y]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
6
CMOS Gate Design
 Activity:
– Sketch a 4-input CMOS NAND gate
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
7
Conduction Complement
 Complementary CMOS gates always produce 0 or 1
 Ex: NAND gate
– Series nMOS: Y=0 when both inputs are 1
– Thus Y=1 when either input is 0
– Requires parallel pMOS
[Figure: 2-input NAND gate schematic with inputs A, B and output Y]
 Rule of Conduction Complements
– Pull-up network is complement of pull-down
– Parallel -> series, series -> parallel
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
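The rule of conduction complements can be checked mechanically. The sketch below is an illustration, not the slides' own method: it assumes a network is described as a nested tuple such as `("series", "A", "B")`, and the helper names `dual`, `conducts`, and `gate_output` are hypothetical.

```python
from itertools import product


def dual(net):
    """Swap series and parallel throughout a network description."""
    if isinstance(net, str):          # a single transistor, named by its gate input
        return net
    kind, *parts = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped, *[dual(p) for p in parts])


def conducts(net, inputs, on_level):
    """True if the network conducts; on_level is 1 for nMOS, 0 for pMOS."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *parts = net
    states = [conducts(p, inputs, on_level) for p in parts]
    return all(states) if kind == "series" else any(states)


def gate_output(pull_down, inputs):
    """Output of the gate whose pull-up is the conduction complement of pull_down."""
    pull_up = dual(pull_down)
    down = conducts(pull_down, inputs, 1)   # nMOS: 1 = ON
    up = conducts(pull_up, inputs, 0)       # pMOS: 0 = ON
    assert up != down, "exactly one network should conduct"
    return 1 if up else 0


nand2 = ("series", "A", "B")                # series nMOS, hence parallel pMOS
for a, b in product((0, 1), repeat=2):
    print(a, b, gate_output(nand2, {"A": a, "B": b}))
```

The same helper accepts `("parallel", "A", "B", "C", "D")` for the 4-input NOR from the earlier activity.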
8
Compound Gates
 Compound gates can do any inverting function
 Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure: panels (a)-(f) building up the AOI22 pull-down and pull-up networks from inputs A, B, C, D to output Y]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
9
Transmission (“Pass”) Gates
Key Idea:
 transistors used in a different configuration
 when switched on: instead of connecting output to Vdd or Gnd, they connect output to the input
Advantage:
 very efficient for implementing switches and multiplexers
Disadvantage:
 signal degradation unless both NFET and PFET pass gates are used in a complementary configuration
10
Pass Transistors
 Transistors can be used as switches
[Figure: nMOS and pMOS pass-transistor symbols with gate g, source s, and drain d]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
11
Pass Transistors
 Transistors can be used as switches
Device   Gate     State   Input   Output
nMOS     g = 0    OFF     -       (s and d disconnected)
nMOS     g = 1    ON      0       strong 0
nMOS     g = 1    ON      1       degraded 1
pMOS     g = 1    OFF     -       (s and d disconnected)
pMOS     g = 0    ON      0       degraded 0
pMOS     g = 0    ON      1       strong 1
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
12
Transmission Gates
 Single pass transistors produce degraded outputs
– pMOS good only for transmitting “1”
– nMOS good only for transmitting “0”
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
13
Transmission Gates
 Single pass transistors produce degraded outputs
 Complementary transmission gates pass both 0 and 1 well
– g = 0, gb = 1: a and b are disconnected
– g = 1, gb = 0: a is connected to b; input 0 gives a strong 0, input 1 gives a strong 1
[Figure: transmission-gate circuit and symbols with terminals a, b and gate controls g, gb]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
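The signal levels described for pass transistors and transmission gates can be captured in a small behavioral model. This is a sketch under simplifying assumptions: levels are plain strings, `None` stands for a disconnected switch, and all function names are hypothetical.

```python
def nmos_pass(g, value):
    """nMOS pass transistor: ON when g = 1, passes 0 well and 1 poorly."""
    if g != 1:
        return None
    return "strong 0" if value == 0 else "degraded 1"


def pmos_pass(g, value):
    """pMOS pass transistor: ON when g = 0, passes 1 well and 0 poorly."""
    if g != 0:
        return None
    return "strong 1" if value == 1 else "degraded 0"


def transmission_gate(g, gb, value):
    """nMOS (gate g) in parallel with pMOS (gate gb); the better level wins."""
    levels = [lvl for lvl in (nmos_pass(g, value), pmos_pass(gb, value)) if lvl]
    if not levels:
        return None                      # both devices OFF
    strong = [lvl for lvl in levels if lvl.startswith("strong")]
    return strong[0] if strong else levels[0]


for value in (0, 1):
    print("nMOS alone:", nmos_pass(1, value),
          "| pMOS alone:", pmos_pass(0, value),
          "| transmission gate:", transmission_gate(1, 0, value))
```

With complementary controls (g = 1, gb = 0) one of the two devices always passes the value strongly, which is why the pair avoids the degradation of a single pass transistor.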
14
Multiplexers
 2:1 multiplexer chooses between two inputs
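S   D1  D0  Y
0   X   0   0
0   X   1   1
1   0   X   0
1   1   X   1
[Figure: 2:1 multiplexer symbol with select S, data input D0 on the “0” port, data input D1 on the “1” port, and output Y]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College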
15
Transmission Gate Mux
 Nonrestoring mux uses two transmission gates
– Only 4 transistors
[Figure: two transmission gates, controlled by S and its complement, connecting D0 and D1 to Y]
OPTIONAL
MATERIAL
Credit: David Harris, Harvey Mudd College
16
Gate-Level Mux Design
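 $Y = S \cdot D_1 + \overline{S} \cdot D_0$ (too many transistors)
 How many transistors are needed? 20
[Figure: gate-level mux schematics from D1, S, D0 to Y, annotated with per-gate transistor counts 4, 2, 4, 2, 4, 2, 2 (total 20)]
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College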
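The count of 20 can be reproduced with a little arithmetic. The sketch below assumes each AND or OR is built as a static CMOS NAND or NOR (4 transistors) followed by an inverter (2 transistors), plus one inverter for the complement of S; that decomposition is an assumption chosen to match the per-gate counts in the figure.

```python
from itertools import product


def mux_gate_level(s, d1, d0):
    """Y = S*D1 + (not S)*D0, using only AND, OR, and NOT on 0/1 values."""
    not_s = 1 - s
    return (s & d1) | (not_s & d0)


# Check the gate-level expression against the mux behavior for all inputs.
for s, d1, d0 in product((0, 1), repeat=3):
    assert mux_gate_level(s, d1, d0) == (d1 if s else d0)

# Transistor counts for 2-input static CMOS gates.
TRANSISTORS = {"NAND2": 4, "NOR2": 4, "INV": 2}
AND2 = TRANSISTORS["NAND2"] + TRANSISTORS["INV"]   # NAND followed by inverter
OR2 = TRANSISTORS["NOR2"] + TRANSISTORS["INV"]     # NOR followed by inverter

gate_level_count = 2 * AND2 + OR2 + TRANSISTORS["INV"]
print("gate-level mux transistors:", gate_level_count)  # 20
```

For comparison, the transmission-gate mux on the previous slide needs only 4 transistors.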
17
Dynamic Logic, or “domino”
Key idea:
 use only NMOS transistors to compute the function
 use a single PMOS to reset
Advantages:
 significantly fewer transistors → smaller chip area
 higher speed, lower power
 less “loading” on wires (drive fewer transistors)
 for async: no storage elements needed
Disadvantages:
 need extra control input to precharge
 logic is typically non-inverting only
 more vulnerable to noise and leakage effects
18
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
 precharge (= reset): output reset to ‘0’
 evaluate: output computed → either stays ‘0’, or switches to ‘1’
[Figure: dynamic gate; the control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; both networks drive the data output]
PC = 0 (asserted) → precharge
PC = 1 (de-asserted) → evaluate
Pull-up and pull-down must never both be simultaneously active:
 ensure that data inputs are reset while gate is precharging
 or, add a “footer” device
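The two-phase behavior can be illustrated with a small behavioral model. This is a sketch, not a circuit-accurate simulation: the class name `DominoGate` is hypothetical, the pull-down network is an arbitrary Python function, and data inputs are simply ignored during precharge, which corresponds to the “footer” option above.

```python
class DominoGate:
    """Behavioral model of a dynamic gate with an active-low precharge input PC."""

    def __init__(self, pull_down):
        self.pull_down = pull_down   # function of the data inputs -> truthy if conducting
        self.output = 0              # start in the reset state

    def step(self, pc, *data):
        if pc == 0:
            # Precharge (PC asserted): output is reset to 0.
            self.output = 0
        elif self.pull_down(*data):
            # Evaluate: once the pull-down conducts, the output switches to 1
            # and holds that value until the next precharge.
            self.output = 1
        return self.output


and_gate = DominoGate(lambda a, b: a and b)
print(and_gate.step(0, 0, 0))  # precharge
print(and_gate.step(1, 1, 1))  # evaluate: pull-down conducts
print(and_gate.step(1, 0, 0))  # inputs removed, output holds
print(and_gate.step(0, 0, 0))  # precharge again
```

The third step shows why the slides describe dynamic gates as needing no separate storage for async use: the evaluated value persists until the gate is explicitly reset.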
19
Outline: Several Pipeline Styles
 Classic static logic pipeline: Sutherland
 Recent static logic pipeline: MOUSETRAP
 Classic dynamic logic pipeline: Williams/Horowitz’s PS0
20
A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
 Structure
 Operation
 Performance
21
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
 successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) from Data in to Data out; each stage contains a Processing Block and a Completion Detector, with an ack signal between stages]
Implemented using “dynamic logic”
22
PS0 Pipeline Stage
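A PS0 stage consists of dynamic gates and a completion detector:
[Figure: PS0 stage; the Processing Block is a dynamic gate with precharge input PC, a pull-down network on the data inputs, and a “keeper” on the data outputs; a Completion Detector sits on the data outputs; signals labeled ack and PC]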
23
Dual-Rail Completion Detector
 Combines dual-rail signals
 Indicates when all bits are valid (or reset)
 OR together 2 rails per bit
 Merge results using “C-element”
C-element:
– if all inputs = 1, output → 1
– if all inputs = 0, output → 0
– else, maintain output value
[Figure: bit0 through bitn each feed an OR gate; the OR outputs feed a C-element (C) that produces Done]
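The detector structure described here (one OR per bit, merged by a C-element) can be sketched behaviorally. The class names and the representation of each dual-rail bit as a `(rail0, rail1)` tuple are assumptions made for illustration.

```python
class CElement:
    """Output follows the inputs only when they all agree; otherwise it holds."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        # Mixed inputs: keep the previous output value.
        return self.output


class CompletionDetector:
    """Dual-rail completion detector: OR the two rails per bit, then merge."""

    def __init__(self):
        self.c = CElement()

    def done(self, bits):
        # A bit is valid when exactly one rail is high and reset when both are low;
        # OR-ing the rails therefore gives 1 for valid and 0 for reset.
        valid = [rail0 | rail1 for rail0, rail1 in bits]
        return self.c.update(valid)


cd = CompletionDetector()
print(cd.done([(0, 0), (0, 0)]))  # all bits reset
print(cd.done([(1, 0), (0, 0)]))  # only one bit valid so far: Done holds
print(cd.done([(1, 0), (0, 1)]))  # all bits valid
print(cd.done([(0, 0), (0, 1)]))  # partially reset: Done holds
print(cd.done([(0, 0), (0, 0)]))  # all bits reset again
```

The holding behavior is what lets Done signal both events cleanly: it rises only after the last bit becomes valid and falls only after the last bit has been reset.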
24
PS0 Protocol
 PRECHARGE N: when N+1 completes evaluation
 delete data: after next stage has copied it
 EVALUATE N: when N+1 completes precharging
 accept new data: after next stage is emptied
[Figure: stages N, N+1, N+2 annotated with the six numbered events of one cycle (stages evaluate, indicate “done”, and precharge)]
Complete cycle: 6 events
 Evaluate: 3 events
 Precharge: another 3 events
25
PS0 Performance
[Figure: pipeline stages annotated with the six numbered events of one cycle]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
$T_{EVAL}$ = Evaluation Time
$T_{PRECH}$ = Precharge Time
$T_{DETECT}$ = Completion Detection Time
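The cycle-time expression is easy to evaluate numerically. In the sketch below, the function name and the delay values are hypothetical, and the list of six events is one reading of the protocol that is consistent with the formula (three evaluations, two completion detections, one precharge), not a quotation from the slides.

```python
def ps0_cycle_time(t_eval, t_prech, t_detect):
    """Cycle time of a PS0 stage: 3*T_EVAL + T_PRECH + 2*T_DETECT."""
    return 3 * t_eval + t_prech + 2 * t_detect


def ps0_cycle_events(t_eval, t_prech, t_detect):
    """Assumed sequence of six events from one evaluation of stage N to the next."""
    return [
        ("N evaluates", t_eval),
        ("N+1 evaluates", t_eval),
        ("N+2 evaluates", t_eval),
        ("N+2 completion detector indicates done", t_detect),
        ("N+1 precharges", t_prech),
        ("N+1 completion detector indicates done", t_detect),
    ]


# Hypothetical delays in arbitrary time units.
t_eval, t_prech, t_detect = 2, 3, 1
events = ps0_cycle_events(t_eval, t_prech, t_detect)
for name, delay in events:
    print(f"{name}: {delay}")
print("cycle time:", ps0_cycle_time(t_eval, t_prech, t_detect))
```

Summing the per-event delays gives the same value as the closed-form expression, which makes the cost of the long six-event cycle easy to see for any choice of delays.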
26
Summary: PS0 Pipelining
Datapaths are latch-free:
 dynamic gates themselves provide implicit latches
+: chip area savings
+: extremely low latency
Data items kept separate by control
 stage deletes data: only after next stage has copied it
 stage accepts new data: only if next stage is empty
 distinct data items always separated by “spacers”
Control is extremely simple: each controller = single wire
 completion detector directly controls previous stage
+: chip area savings
+: low control overhead
27
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
 each stage gets its own clock
 successive clocks are slightly skewed
 essentially, clocked simulation of asynchronous handshaking!
– need multiple clock phases!
2. Use a single clock, but insert latches between stages
 latches are simple, level-sensitive
 consecutive stages receive complementary clock signals
[Figure: latch between stages, clocked by Ck and Ck’]

28
Drawbacks of PS0 Pipelining
1. Poor throughput:
 long cycle time: 6 events per cycle
 data “tokens” are forced far apart in time
2. Limited storage capacity:
 max only 50% of stages can hold distinct tokens
 data tokens must be separated by at least one spacer
My research goals have been: address both issues
 still maintain very low latency
29
Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent in Williams’ PS0 style
 Assume all gate and wire delays can be arbitrary
 For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)
30