Spongepaint - Massachusetts Institute of Technology


Physical Effects: Delay
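[Figure: RC delay model with driver resistance RP, wire resistance RW, drain capacitance Cd, wire capacitance split as CW/2 at each end, and gate load Cg]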
6.375 Complex Digital Systems
Krste Asanovic
March 5, 2007
6.375 Standard Cell Design Flow
[Figure: tool flow diagram. Elements shown: Bluespec SystemVerilog source, Bluespec Compiler, Blueview, Verilog 95 RTL, C, Bluespec C sim (cycle accurate), Verilog sim, VCD output, Debussy visualization, RTL synthesis, gates. Legend: files, Bluespec tools, 3rd party tools]
How do RTL choices affect
resulting physical design?
6.375 Spring 2006 • L04 CMOS Transistors, Gates, and Wires • 2
Measuring Chip “Quality”
Most important metrics for a chip design:
• Area
– Size affects manufacturing and packaging costs
• Performance
– Does chip meet market performance goals?
• Power
– Peak power affects packaging cost (current supply,
heat removal)
– Energy usage affects battery life
Iron Law of Performance
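$$\text{Performance} = \frac{\text{Operations}}{\text{Clock Cycle}} \times \frac{\text{Clock Cycles}}{\text{Second}}$$
• Operations per clock cycle: concurrency in RTL design
• Clock cycles per second: clock frequency of physical design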
These are not independent
parameters!
Clock frequency set by delay of circuit
components in critical path
Basic CMOS Components
Gates
Transistors
Wires
output
input0
input1
Metal Oxide-Semiconductor
Field-Effect (MOSFET) Transistor
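[Figure: MOSFET cross-section showing the gate, source diffusion, drain diffusion, and bulk, with vertical field Ev and horizontal field Eh, and a marker where inversion happens]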
INVERSION:
A sufficiently strong vertical field
will attract enough electrons to
the surface to create a conducting
n-type channel between the
source and drain.
CONDUCTION:
If a channel exists, a
horizontal field will cause a
drift current from the drain to
the source.
Key qualitative characteristics of
MOSFET transistors affecting delay
[Figure: transistor with Width and Length marked, next to its RC model with Vin, Vout, Cgate, Cdrain, and Reff]
• Increase Width (W) → Increase current → Decrease Reff
• Increase Length (L) → Decrease current → Increase Reff
• Cgate proportional to (W x L) and Cdrain proportional to W
CMOS Transistors, Gates, and Wires
Gates
Transistors
Wires
output
input0
input1
The most basic CMOS gate
is an inverter
Let’s make the following assumptions
[Figure: inverter with input Vin and output Vout; PMOS labeled WP/LP with size 2α, NMOS labeled WN/LN with size 1α]
1. All transistors are minimum length
2. All gates should have equal rise/fall
times. Since PMOS are ~twice as slow
as NMOS they must be twice as wide
to have the same effective resistance
3. Normalize all transistor widths to
minimum width NMOS
The most basic CMOS gate
is an inverter
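[Figure: inverter schematic with the PMOS (WP/LP, 2α) between VDD and Vout and the NMOS (WN/LN, 1α) between Vout and GND, driven by Vin, next to the logic symbol with input A and output Y]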
A simple RC model for the
inverter can provide significant insight
[Figure: inverter and its equivalent RC switch model with Vin, Vout, Cg, Cd, and a pullup and pulldown Reff]
Reff = Reff,N = Reff,P
Cg = Cg,N + Cg,P
Cd = Cd,N + Cd,P
A simple RC model for the
inverter can provide significant insight
[Figure: the same RC switch model with a load capacitance CL added at Vout]
A simple RC model for the
inverter can provide significant insight
[Figure: RC switch model with Vin = “0”: the pullup Reff connects Vout to VDD and charges Cd and CL]
Charge RC Time Constant (TPLH) = Reff x ( Cd + CL )
The most basic CMOS gate
is an inverter
[Figure: RC switch model with Vin = “1”: the pulldown Reff connects Vout to GND and discharges Cd and CL]
Discharge RC Time Constant (TPHL) = Reff x ( Cd + CL )
Larger gates are faster since they
decrease Reff (but they also increase Cd!)
Process gen = 0.25μm
Supply voltage = 5V
Min width NMOS = 0.5μm

Param          Value   Units
Cd,N/μm        1.42    fF/μm
Cd,P/μm        2.40    fF/μm
Cg,N/μm        1.55    fF/μm
Cg,P/μm        1.48    fF/μm
Reff,N x μm    4.93    kΩ·μm
Reff,P x μm    10.83   kΩ·μm

[Figure: inverter with relative widths 2 (PMOS) and 1 (NMOS) driving an inverter with widths 2 and 1]
Cd = (0.5x1.42) + (1x2.40) = 3.11 fF
CL = (0.5x1.55) + (1x1.48) = 2.26 fF
Cd+CL = 5.37 fF
TPLH = 2.2 x (10.83/1) x 5.37 = 128ps
TPHL = 2.2 x (4.93/0.5) x 5.37 = 116ps

Double size of driver
[Figure: inverter with relative widths 4 (PMOS) and 2 (NMOS) driving an inverter with widths 2 and 1]
Ignores the fact that previous gate now must drive a bigger gate capacitance!
Cd = (1x1.42) + (2x2.40) = 6.22 fF
CL = (0.5x1.55) + (1x1.48) = 2.26 fF
Cd+CL = 8.48 fF
TPLH = 2.2 x (10.83/2) x 8.48 = 101ps
TPHL = 2.2 x (4.93/1) x 8.48 = 92.0ps
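The arithmetic on this slide can be replayed with a short script. This is a minimal sketch, assuming the per-micron values from the parameter table and the slide's 2.2 x Reff x (Cd + CL) delay estimate; the function name `inverter_delays` and its argument names are illustrative and not from the lecture.

```python
# Per-micron parameters copied from the slide's table (0.25 um process).
CD_N, CD_P = 1.42, 2.40        # drain capacitance, fF per um of width
CG_N, CG_P = 1.55, 1.48        # gate capacitance, fF per um of width
REFF_N, REFF_P = 4.93, 10.83   # effective resistance, kOhm * um


def inverter_delays(wn_drv, wp_drv, wn_load, wp_load):
    """Return (t_plh, t_phl) in ps for a driver inverter loaded by one inverter.

    All widths are in um. kOhm * fF = ps, so no unit conversion is needed.
    """
    cd = wn_drv * CD_N + wp_drv * CD_P       # driver's own drain capacitance
    cl = wn_load * CG_N + wp_load * CG_P     # gate capacitance of the load
    t_plh = 2.2 * (REFF_P / wp_drv) * (cd + cl)   # PMOS pulls the output high
    t_phl = 2.2 * (REFF_N / wn_drv) * (cd + cl)   # NMOS pulls the output low
    return t_plh, t_phl


base = inverter_delays(0.5, 1.0, 0.5, 1.0)      # minimum-size driver
doubled = inverter_delays(1.0, 2.0, 0.5, 1.0)   # driver twice as wide, same load
print("base    TPLH=%.1f ps  TPHL=%.1f ps" % base)
print("doubled TPLH=%.1f ps  TPHL=%.1f ps" % doubled)
```

Doubling the driver halves Reff, but the delay falls by less than half because the driver's own Cd doubles while CL stays fixed.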
More complicated gates use more
transistors in pullup/pulldown networks
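[Figure: Input 0 through Input N drive both networks; the pullup network sits between VDD and VOUT and the pulldown network between VOUT and GND]
Pullup network, connects output to VDD, contains only PMOS
Pulldown network, connects output to GND, contains only NMOS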
For every set of input logic values, either pullup or pulldown
network makes connection to VDD or GND
– If both connected, power rails would be shorted together
– If neither connected, output would float (tristate logic)
Series and parallel MOSFET networks
provide natural duals of each other
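PMOS networks:
• Single transistor (A): conducts if A=0
• Parallel (A, B): conducts if A=0 OR B=0
• Series (A, B): conducts if A=0 AND B=0
NMOS networks:
• Single transistor (A): conducts if A=1
• Series (A, B): conducts if A=1 AND B=1
• Parallel (A, B): conducts if A=1 OR B=1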
NAND and NOR gates illustrate the dual
nature of the pullup/pulldown networks
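[Figure: NAND gate with inputs A, B and output NOT (A.B), built from a parallel PMOS pullup and a series NMOS pulldown; NOR gate with inputs A, B and output NOT (A+B), built from a series PMOS pullup and a parallel NMOS pulldown]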
A methodical approach to build more
complex gates
• Goal is to create a logic function $f(x_1, x_2, \ldots)$
– We can only implement inverting logic with one CMOS stage
• Implement pulldown network
– Write $PD = \overline{f}(x_1, x_2, \ldots)$
– Use parallel NMOS for OR of inputs
– Use series NMOS for AND of inputs
• Implement pullup network
– Write pullup network $PU = f(x_1, x_2, \ldots) = g(\overline{x_1}, \overline{x_2}, \ldots)$
– Use parallel PMOS for OR of complemented inputs
– Use series PMOS for AND of complemented inputs
Example of complex gate
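[Figure: complex gate with inputs A, B, C and output NOT ((A+B).C)]
$$f = \overline{(A + B) \cdot C}$$
$$PD = (A + B) \cdot C$$
$$PU = \overline{(A + B) \cdot C} = \overline{(A + B)} + \overline{C} = (\overline{A} \cdot \overline{B}) + \overline{C}$$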
• Should we map every function into a single complex gate?
• What gates should we put into a standard cell library?
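The pullup and pulldown expressions for this example can be checked exhaustively. The sketch below is an illustration rather than part of the lecture: it models each network as a Boolean function (series = AND, parallel = OR, PMOS conducting on a 0 input) and confirms that exactly one network conducts for every input, so the output is never shorted or floating. The function names are hypothetical.

```python
from itertools import product


def pulldown(a, b, c):
    # NMOS network: A in parallel with B (OR), in series with C (AND).
    return (a or b) and c


def pullup(a, b, c):
    # PMOS network: A in series with B (AND of complements), in parallel with C.
    return ((not a) and (not b)) or (not c)


def gate_output(a, b, c):
    """Return 1 if only the pullup conducts, 0 if only the pulldown conducts."""
    pu, pd = pullup(a, b, c), pulldown(a, b, c)
    if pu == pd:
        # Both on would short the rails; both off would leave the output floating.
        raise ValueError("networks are not complementary for %r" % ((a, b, c),))
    return 1 if pu else 0


for a, b, c in product([False, True], repeat=3):
    expected = int(not ((a or b) and c))   # f = NOT((A+B).C)
    assert gate_output(a, b, c) == expected
    print(int(a), int(b), int(c), "->", gate_output(a, b, c))
```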
Examples illustrating unit-less delay (d)
of gates with equal drive strength (Reff)
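[Figure: inverter, NAND, and NOR gates sized for equal drive strength, each annotated with transistor widths and its load]
• Inverter: delay = 2.67
• NAND: delay = 3.67
• NOR: delay = 3.67
Less parasitic drain capacitance (Cd) loading output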
Examples illustrating unit-less delay (d)
of gates with similar area
[Figure: inverter, NAND, and NOR gates sized for similar area, each annotated with transistor widths and its load]
• Inverter: delay = 2.11
• NAND: delay = 4.67
• NOR: delay = 5.33
PMOS worse than NMOS, series path is limiter
Which gate topology and transistor sizing
is optimal?
Given a logic function, there are many possible
logic gate topologies and transistor sizings.
1. What is the optimal transistor sizing?
2. What is the optimal number of logic
stages?
Optimal sizing and delays for example
topologies
[Figure: three candidate gate topologies A, B, and C, each annotated with its optimal transistor sizes]

Topology   G      N   P   DOPT                  H=1     H=12
A          2.96   4   7   4(2.96H)^(1/4) + 7    12.25   16.77
B          3.33   2   6   2(3.33H)^(1/2) + 6    9.65    18.64
C          3.33   2   9   2(3.33H)^(1/2) + 9    12.65   21.64

DOPT: optimal delay for output loading H
[ For more explanation of how these numbers were
derived, see Logical Effort link on website ]
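The DOPT column follows the pattern N(GH)^(1/N) + P, which can be evaluated for any load. The following is a small sketch that assumes only the G, N, and P values from the table; the names `optimal_delay`, `TOPOLOGIES`, and `best_topology` are illustrative.

```python
def optimal_delay(g, n, p, h):
    """D_opt = N * (G * H)**(1/N) + P, in the slide's unit-less delay."""
    return n * (g * h) ** (1.0 / n) + p


# (G, N, P) for each topology, copied from the table.
TOPOLOGIES = {"A": (2.96, 4, 7), "B": (3.33, 2, 6), "C": (3.33, 2, 9)}


def best_topology(h):
    """Name of the topology with the smallest optimal delay at load H."""
    return min(TOPOLOGIES, key=lambda name: optimal_delay(*TOPOLOGIES[name], h))


for h in (1, 12):
    delays = {name: round(optimal_delay(*gnp, h), 2)
              for name, gnp in TOPOLOGIES.items()}
    print("H=%d" % h, delays, "best:", best_topology(h))
```

The two-stage topology B is fastest for a light load, while the four-stage topology A overtakes it once the load grows, because its effort term grows as the fourth root of H rather than the square root.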
CMOS Transistors, Gates, and Wires
Gates
Transistors
Wires
output
input0
input1
Wires are an old problem
[Photos: Cray-1 (1976) and Cray-1 wiring; Cray-3 (1993) and Cray-3 wiring]
Modern interconnect stacks
have six to nine or more metal layers
[Figure: cross-section of the interconnect stack showing Metal 1 through Metal 6 with Via 1-2 and Via 5-6 labeled. © IBM]
IBM CMOS7 process
• 6 layers of copper wiring
• 1 layer of tungsten local interconnect
Wire resistance is a function
of height, width, and length
[Figure: wire segment with Height, Length, and Width marked]
$$\text{resistance} = \frac{\text{length} \times \text{resistivity}}{\text{height} \times \text{width}}$$
bulk aluminum   2.8x10^-8 Ω-m
bulk copper     1.7x10^-8 Ω-m
bulk silver     1.6x10^-8 Ω-m
• Height (Thickness) fixed in given manufacturing process
• Resistances quoted as Ω/square
• TSMC 0.18µm 6 Aluminum metal layers
– M1-5 0.08 Ω/square (0.5 µm x 1mm wire = 160 Ω)
– M6 0.03 Ω/square (0.5 µm x 1mm wire = 60 Ω)
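Because height is fixed by the process, resistance reduces to sheet resistance times the number of squares (length divided by width). The sketch below reproduces the two TSMC figures quoted above; the function names and the 0.5 µm copper thickness in the last line are assumptions chosen for illustration only.

```python
def sheet_resistance(resistivity_ohm_m, height_m):
    """Ohms per square for a layer of the given resistivity and thickness."""
    return resistivity_ohm_m / height_m


def wire_resistance(sheet_ohms_per_sq, length_um, width_um):
    """Wire resistance in ohms: sheet resistance times number of squares."""
    squares = length_um / width_um
    return sheet_ohms_per_sq * squares


# The slide's examples: a 0.5 um wide, 1 mm long wire is 2000 squares.
print(wire_resistance(0.08, 1000.0, 0.5))   # M1-5
print(wire_resistance(0.03, 1000.0, 0.5))   # M6

# Hypothetical layer: bulk copper resistivity at an assumed 0.5 um thickness.
print(sheet_resistance(1.7e-8, 0.5e-6))
```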
Wire capacitance is relative to the
substrate and to neighboring wires
[Figure: cross-section of two wiring layers with labels H1, H2, W1, W2, S1, D1, and D12]
• Capacitance depends on geometry of surrounding wires and relative permittivity (εr) of insulating dielectric
– silicon dioxide (SiO2): εr = 3.9
– silicon fluoride (SiF4): εr = 3.1
– SiLK™ polymer: εr = 2.6
Capacitive coupling to
neighbors is becoming a
serious problem!
Wire capacitance is relative to the
substrate and to neighboring wires

[Figure: the same cross-section with labels H1, H2, W1, W2, S1, D1, and D12, and the dielectric regions marked ε1, ε2, and ε12]
• Capacitance depends on geometry of surrounding wires and relative permittivity (εr) of insulating dielectric
– silicon dioxide (SiO2): εr = 3.9
– silicon fluoride (SiF4): εr = 3.1
– SiLK™ polymer: εr = 2.6
• Can have different materials between wires and between layers,
and also different materials on higher layers
This IBM experimental 130nm process
includes two metals and two dielectrics
[Figure: cross-section micrograph of the metal stack with an Al layer labeled. Credit: E. Barth, IBM Microelectronics]
Distributed RC wire model gives accurate
results but is computationally expensive
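[Figure: Rdriver feeding an N-segment RC ladder with series resistances R1, R2, ..., RN and shunt capacitances C1, C2, ..., CN, terminated by Cload]
Use Penfield-Rubinstein equation to find delay
$$\text{Delay} = \sum_{i=1}^{N} \left( \sum_{j=1}^{i} R_j \right) C_i$$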
How does the delay scale with longer wires?
– Wire delay increases quadratically
– Edge rate also degrades quadratically
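The quadratic scaling falls out of the double sum directly: each capacitor is charged through all of the resistance upstream of it. The sketch below evaluates that sum for a uniform ladder. The per-micron values are assumptions derived from the Metal 2 parameters used later in the lecture (0.07 Ω/sq at 0.25 µm width, 0.016 fF/µm² area plus 0.084 fF/µm fringe capacitance), and the driver and load are left out so that only the wire is measured.

```python
def ladder_delay(resistances, capacitances):
    """Sum over nodes i of (total resistance upstream of node i) * C_i."""
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # running sum of R_1 .. R_i
        delay += upstream_r * c
    return delay


def wire_delay_fs(length_um, r_per_um=0.28, c_per_um=0.088):
    """Delay in fs (ohm * fF) of a wire cut into 1 um segments."""
    n = int(length_um)
    return ladder_delay([r_per_um] * n, [c_per_um] * n)


for length in (1000, 2000, 4000):
    print("%5d um: %8.1f ps" % (length, wire_delay_fs(length) / 1000.0))
```

Each doubling of the length roughly quadruples the result, since both the total resistance and the total capacitance double.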
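Lumped π model can provide a
quick reasonable approximation
[Figure: Rdriver in series with Rw, with Cw/2 on each side of Rw and Cload at the far end]
$$\text{Delay} = R_{driver} \frac{C_w}{2} + \left( R_{driver} + R_w \right) \left( \frac{C_w}{2} + C_{load} \right)$$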
Rw is lumped resistance of the wire
Cw is lumped capacitance
Partition half of Cw at each end
Estimate the rise time of node A
using an RC delay model
Process gen = 0.25μm
Supply voltage = 5V
Min width NMOS = 0.5μm

Param          Value   Units
Cd,N / μm      1.42    fF/μm
Cd,P / μm      2.40    fF/μm
Cg,N / μm      1.55    fF/μm
Cg,P / μm      1.48    fF/μm
CA,M2 / μm2    0.016   fF/μm2
CL,M2 / μm     0.084   fF/μm
Reff,N x μm    4.93    kΩ·μm
Reff,P x μm    10.83   kΩ·μm
RM2 / sq       0.07    Ω/sq

[Figure: inverter with relative widths 16 (PMOS) and 8 (NMOS) driving node A through a Metal 2 wire (250µm x 0.250µm) into an inverter with widths 2 and 1; equivalent circuit with RP, RW, Cd, CW/2 at each end of the wire, and Cg]
Cg = ( 0.5 x 1.55 ) + ( 1 x 1.48 ) = 2.26 fF
Cd = ( 4 x 1.42 ) + ( 8 x 2.40 ) = 24.88 fF
Rp = 10.83/8 = 1.35 kΩ
Rw = ( 250 / 0.25 ) x 0.07 = 70 Ω
Cw = (( 250 x 0.25 ) x 0.016 ) + ( 250 x 0.084 ) = 22.00 fF
TPLH = 2.2 x ( 1350 x (22.00/2 + 24.88)
+ (1350 + 70) x (22.00/2 + 2.26) ) = 2.2 x 67.3ps ≈ 148ps
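This estimate can be replayed as a script. The sketch below is one reading of the slide under stated assumptions: it uses the table's CA,M2 = 0.016 fF/µm² for the area term, places the driver's Cd at the near end of the wire alongside Cw/2, and applies the same 2.2 factor as the TPLH expressions on the earlier inverter slides. The function name `pi_model_rc` is illustrative.

```python
def pi_model_rc(r_driver, r_wire, c_wire, c_near, c_load):
    """RC sum for the lumped pi model, with extra capacitance at the near end.

    Resistances in ohms and capacitances in fF give a result in fs.
    """
    near = r_driver * (c_wire / 2.0 + c_near)             # charged through driver only
    far = (r_driver + r_wire) * (c_wire / 2.0 + c_load)   # charged through driver and wire
    return near + far


# Values taken from the parameter table; widths in um.
c_g = 0.5 * 1.55 + 1.0 * 1.48                 # load inverter gate capacitance, fF
c_d = 4.0 * 1.42 + 8.0 * 2.40                 # driver drain capacitance, fF
r_p = 10.83e3 / 8.0                           # PMOS pullup resistance, ohms
r_w = (250.0 / 0.25) * 0.07                   # 1000 squares of Metal 2, ohms
c_w = (250.0 * 0.25) * 0.016 + 250.0 * 0.084  # area plus fringe capacitance, fF

rc_fs = pi_model_rc(r_p, r_w, c_w, c_d, c_g)
t_rise_ps = 2.2 * rc_fs / 1000.0
print("RC sum = %.1f ps, rise time estimate = %.1f ps" % (rc_fs / 1000.0, t_rise_ps))
```

The wire resistance (70 Ω) is small next to the driver (about 1.35 kΩ), so for this short wire most of the delay comes from the driver charging the combined capacitance.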
Estimate the rise time of node A
using an RC delay model
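Process gen = 0.25μm
Supply voltage = 5V
Min width NMOS = 0.5μm

Param          Value   Units
Cd,N / μm      1.42    fF/μm
Cd,P / μm      2.40    fF/μm
Cg,N / μm      1.55    fF/μm
Cg,P / μm      1.48    fF/μm
CA,M2 / μm2    0.016   fF/μm2
CL,M2 / μm     0.084   fF/μm
Reff,N x μm    4.93    kΩ·μm
Reff,P x μm    10.83   kΩ·μm
RM2 / sq       0.07    Ω/sq

[Figure: inverter with relative widths 16 (PMOS) and 8 (NMOS) driving node A through a Metal 2 wire (250µm x 0.250µm) into an inverter with widths 2 and 1]
How should we buffer up this signal?
Should we have a few big stages or
many small stages?
[Figure: two candidate buffer chains, one with three stages (widths 2, 8, 16 over 1, 2, 8) and one with five stages (widths 2, 6, 10, 14, 16 over 1, 3, 5, 7, 8)]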
How many stages of inverters
required if want to drive large load?
[Figure: chain of inverters between Cin and Cout]
A good rule-of-thumb is to
target a stage effort around four
[Figure: chain of inverters between Cin and Cout]
Minimum delay when:
– Stage effort = logical effort x electrical effort ≈ 3.4-3.8
– Some derivations use e = 2.718.. – this ignores parasitics
– Broad optimum, stage efforts of 2.4-6.0 within 15-20% of minimum
Fan-out-of-four (FO4) is convenient design size (~5t
FO4 delay: Delay of
inverter driving four
copies of itself
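The stage-count question can be explored numerically. The sketch below assumes the usual logical-effort delay model for an inverter chain, where each stage costs its stage effort plus a parasitic delay, with the inverter's logical effort taken as 1 and its parasitic delay as 1 (both assumptions, not values stated on the slide). The function names are illustrative.

```python
def chain_delay(total_effort, n_stages, p_inv=1.0):
    """Unit-less delay of n equal-effort inverter stages driving Cout/Cin."""
    stage_effort = total_effort ** (1.0 / n_stages)
    return n_stages * (stage_effort + p_inv)


def best_stage_count(total_effort, p_inv=1.0, max_stages=20):
    """Stage count with the smallest chain delay, searched by brute force."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(total_effort, n, p_inv))


for h in (16.0, 64.0, 1000.0):
    n = best_stage_count(h)
    print("Cout/Cin=%6.0f  stages=%d  stage effort=%.2f  delay=%.2f"
          % (h, n, h ** (1.0 / n), chain_delay(h, n)))
```

For Cout/Cin = 64 the search picks three stages with a stage effort of four, and neighbouring stage counts are only slightly slower, which matches the broad optimum described above.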
Large RC makes long wires slow
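[Figure: Rdriver feeding an N-segment RC ladder with series resistances R1, R2, ..., RN and shunt capacitances C1, C2, ..., CN, terminated by Cload]
$$\text{Delay} = \sum_{i=1}^{N} \left( \sum_{j=1}^{i} R_j \right) C_i$$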
Wire delay increases quadratically
Adding repeaters gives linear growth in
delay
[Figure: the RC ladder from Rdriver to Cload with segments R1, C1 through RN, CN separated by repeaters]
$$\text{Delay} = \sum_{i=1}^{N} R_i C_i$$
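A rough comparison shows why repeaters turn quadratic growth into linear growth. This is a simplified sketch, not the lecture's model: it treats an unrepeated wire as a distributed RC line with delay 0.5·r·c·L², charges each repeater a fixed delay, and ignores the repeater's own drive resistance and input capacitance. The per-micron r and c are derived from the Metal 2 parameters used earlier, while the 2 mm segment length and 50 ps repeater delay are hypothetical.

```python
def unrepeated_delay_fs(length_um, r_per_um, c_per_um):
    """Distributed RC delay of a bare wire, in fs (ohm * fF)."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


def repeated_delay_fs(length_um, segment_um, r_per_um, c_per_um, t_repeater_fs):
    """Delay of the same wire cut into equal segments, one repeater per segment."""
    n_segments = max(1, round(length_um / segment_um))
    seg_len = length_um / n_segments
    per_segment = unrepeated_delay_fs(seg_len, r_per_um, c_per_um) + t_repeater_fs
    return n_segments * per_segment


R, C = 0.28, 0.088          # ohm/um and fF/um, derived from the Metal 2 table
T_REP = 50000.0             # hypothetical repeater delay, fs
for length in (10000.0, 20000.0):
    bare = unrepeated_delay_fs(length, R, C) / 1000.0
    rep = repeated_delay_fs(length, 2000.0, R, C, T_REP) / 1000.0
    print("%6.0f um: bare %.0f ps, repeated %.0f ps" % (length, bare, rep))
```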
Several issues with repeater insertion
• Repeater must connect to transistor layers
• Blocks other routes with vias that connect down
• Requires space on active layers for buffer transistors
• Repeaters often grouped in preallocated repeater boxes spread around chip, and thus repeater location might not give ideal spacing
Wire delay in standard-cell flow
• Front-end tools include approximate wire-load
models
– Usually statistical in nature, quite inaccurate
– Helps synthesis tool with technology mapping
• Back-end tools include better wire-load models
– After trial placement can use Manhattan distance
– Tool will automatically insert repeaters where
necessary
– Note: Tools cannot add extra pipeline stages if wires
are too long -> cycle time will suffer if you have global
combinational logic paths
Wire Delay: Impact on RTL
• Need extra pipeline stages for wire delay
– Pentium-4 included stages just for driving signals
– Requires very early physical prototyping
– RTL changes if communication latency changes!
• Use latency-insensitive methodology to avoid
reworking RTL design at late stage
– Create macroblocks with FIFO interfaces
– Use rules that don’t depend on number of cycles
to propagate data through FIFOs
– Can change effective latency through FIFO after
physical layout without changing RTL design in
macroblocks
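The latency-insensitive idea can be illustrated with a toy simulation. The sketch below is written in Python rather than Bluespec and is only an analogy: a hypothetical `DelayFifo` class models a FIFO whose latency is set after the fact, and the consumer acts only when data is actually available instead of counting cycles. Changing the latency changes when results arrive, not what they are.

```python
from collections import deque


class DelayFifo:
    """FIFO whose entries become visible `latency` cycles after enqueue."""

    def __init__(self, latency):
        self.latency = latency
        self.items = deque()          # (ready_cycle, value) pairs

    def enq(self, cycle, value):
        self.items.append((cycle + self.latency, value))

    def can_deq(self, cycle):
        return bool(self.items) and self.items[0][0] <= cycle

    def deq(self):
        return self.items.popleft()[1]


def run(latency, inputs):
    """Producer doubles each input; consumer collects whatever has arrived."""
    fifo, out, cycle = DelayFifo(latency), [], 0
    pending = list(inputs)
    while len(out) < len(inputs):
        if fifo.can_deq(cycle):       # consumer fires only when data is ready
            out.append(fifo.deq())
        if pending:                   # producer enqueues one item per cycle
            fifo.enq(cycle, 2 * pending.pop(0))
        cycle += 1
    return out, cycle


print(run(1, [1, 2, 3]))   # short wire between macroblocks
print(run(5, [1, 2, 3]))   # extra pipeline stages added after layout
```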
[Figure: Pentium-4 pipeline, stages numbered 1 to 20: TC Next IP, TC Fetch, Drive, Alloc, Rename, Queue, Schedule 1, Schedule 2, Schedule 3, Dispatch 1, Dispatch 2, Register File 1, Register File 2, Execute, Flags, Branch Check, Drive]
In deep submicron technologies many
predicted an interconnect doomsday
National Technology Roadmap for Semiconductors, SIA, 1997
Is there really an
interconnect doomsday looming?
Dimension   Scaling Impact   Effect on Resistance   Effect on Capacitance
Length      Decreases        Decreases              Decreases
Width       Decreases        Increases              Decreases
Height      ~ Constant       --                     --

Local wire delay tracks improvement in gate delay
R. Ho, K. Mai, M. Horowitz, Proc. of the IEEE, Apr 2001
Is there really an
interconnect doomsday looming?
Dimension   Scaling Impact   Effect on Resistance   Effect on Capacitance
Length      ~ Constant       --                     --
Width       Decreases        Increases              Decreases
Height      ~ Constant       --                     --

R. Ho, K. Mai, M. Horowitz, Proc. of the IEEE, Apr 2001
Global wire delay increases relative to gate delay!
No doomsday, just one more physical
design issue to carefully manage
National Technology Roadmap for Semiconductors, SIA, 2005