A Comprehensive Look at System Level Modeling

Ken Rose, Bibiche Geuskens,
Ramon Mangaser, Christopher Mark
Center for Integrated Electronics and Electronics Manufacturing
Department of Electrical, Computer and Systems Engineering
Rensselaer Polytechnic Institute
Troy, NY 12180-3590
[email protected]
518.276.2981
SLIP Workshop, April 2001
Ken Rose
RIPE
Rensselaer Interconnect Performance Estimator
RIPE 3.0 models are described in
‘Modeling Microprocessor Performance’
by B. Geuskens and K. Rose, Kluwer, 1998.
It is available for use on line at
http://latte.cie.rpi.edu/ripe.html
RIPE was developed with partial support from IBM and SRC.
Co-Authors:
• Bibiche Geuskens (RIPE 1.0, 2.0, 3.0)
PhD. June 1997
Intel Corporation, Hillsboro, Oregon
• Ramon Mangaser (RIPE 3.1, 4.0, 4.1)
PhD. Nov. 1999
Sun Microsystems, Chelmsford, Massachusetts
• Christopher Mark (RIPE 4.2)
PhD. Sep. 2000
Intel Corporation, Hillsboro, Oregon
RIPE Genesis:
• H.B. Bakoglu
‘Circuits, Interconnections, and Packaging for VLSI’
Addison-Wesley, 1990.
SUSPENS model coded in RIPE 1.0
• G. A. Sai-Halasz
Proc. IEEE, 83/1, p. 20, 1995.
Basis for RIPE 2.0
RIPE 3.0 Inputs and Outputs
[Figure: RIPE block diagram]
Inputs: System Description, Device/Technology Description, Interconnect Description
Output groups: System/Area, Interconnect, Device
Outputs: Wireability (wiring density); Performance (interconnect RC delay, cycle time); Power Dissipation; Capacitance; Resistance; Crosstalk; Electromigration; Yield; Reliability
RIPE 3.0 Sample Benchmark
(DEC Alpha 21164)
RIPE INPUTS
System Parameters:
Chip Area [cm²]: 2.99
Number of Transistors [M]: 9.3
SRAM [KBytes]: 112
Signal I/O: 294
(Logic Depth: 14, 15)
Technology Parameters:
Feature Size [µm]: 0.5
Number of Wire Levels: 4
Power Supply [V]: 3.3
Interconnect Parameters (one value per wire level):
Pitch [µm]: 1.125, 1.125, 3.0, 3.0
Rint [Ω/cm]: 1440, 1440, 178, 178
Cint [pF/cm]: 2.0, 2.0, 2.0, 2.0
Data: W.J. Bowhill et al., Dig. Tech. Journal; ISSCC 1996
Cycle Time Estimation Model (Ch. 7)
50% delay per stage:
$T_{delay}^{50\%} = 0.693\,[R_{dr}(C_{dr} + C_L) + R_W\, l\, C_L + R_{dr}\, C_W\, l] + 0.377\, R_W C_W\, l^2$
Sai-Halasz (1995)
Sakurai (1993)
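The per-stage delay expression on this slide can be sketched as a small function. This is a minimal illustration, assuming R_W and C_W are per-unit-length wire values, R_dr and C_dr are the driver output resistance and capacitance, and C_L is the load capacitance; the function and argument names are hypothetical and not taken from RIPE.

```python
def stage_delay_50(r_dr, c_dr, c_load, r_w, c_w, length):
    """50% delay of one stage: driver, distributed RC wire, lumped load.

    r_w and c_w are per unit length; length uses the same length unit.
    """
    # Lumped terms: driver charging its own output and the load,
    # wire resistance charging the load, driver charging the wire.
    lumped = (r_dr * (c_dr + c_load)
              + r_w * length * c_load
              + r_dr * c_w * length)
    # Distributed RC term of the wire itself.
    distributed = r_w * c_w * length ** 2
    return 0.693 * lumped + 0.377 * distributed
```

With zero wire length the expression reduces to the familiar 0.693 R_dr (C_dr + C_L) lumped gate delay, which is a quick sanity check on the coefficients.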
RC Interconnect Parameters (Ch. 3)
Interconnect Resistance (3.1)
$R = \rho_{eff}\, l_{int} / (A\, w_{int}^2)$
A = Aspect Ratio
Interconnect Capacitance (3.2)
$C = 2(C_V + C_L) \approx 2\,\varepsilon_{eff}\,\varepsilon_0\, l_{int}\, w_{int}\,(1/T_{ILD} + A/S_{wire})$
[Figure: wire cross-section showing C_V and C_L]
T_ILD = Thickness of Interlevel Dielectric
S_wire = Spacing between wires
Yang (1998)
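The two formulas on this slide can be evaluated per unit length as follows. This is a sketch under stated assumptions: lengths are in cm, resistivity in Ω·cm, and the helper names are hypothetical rather than RIPE's own.

```python
EPS0_F_PER_CM = 8.854e-14  # vacuum permittivity in F/cm

def wire_resistance_per_cm(rho_eff, width, aspect_ratio):
    """R / l_int = rho_eff / (A * w_int^2), in ohm/cm for cm inputs."""
    return rho_eff / (aspect_ratio * width ** 2)

def wire_capacitance_per_cm(eps_eff, width, aspect_ratio, t_ild, spacing):
    """C / l_int ~ 2 * eps_eff * eps0 * w_int * (1/T_ILD + A/S_wire), in F/cm."""
    return 2.0 * eps_eff * EPS0_F_PER_CM * width * (
        1.0 / t_ild + aspect_ratio / spacing)
```

For a wire with width, spacing and dielectric thickness all equal and A = 1, the capacitance reduces to 4 ε_eff ε_0 per unit length, independent of the actual dimension, which is why capacitance per cm changes little from level to level while resistance per cm scales as 1/w².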
Transistor Count and Area Models (Ch. 4)
Processor Logic, Memory, and I/O Buffers are treated separately
Alpha 21164      Transistors   Area
Total            9.3 M         299 mm²
Memory           6.7 M         101 mm²
I/O              --            17 mm²
Random Logic     2.6 M         181 mm²
[Figure: # Transistors → # Gates → Logic Area, using the Average Logic Gate Size]
Logic Wireability (Ch. 5)
R(N_g, p) = average interconnect length in gate pitches
Based on Rent's rule for the number of pins, $N_p = K_p N_g^{\,p}$
$l_w$ = long wire length = $2\sqrt{A_{logic}}$
$N_w$ = number of long wires = $[f_g/(f_g+1)]\, N_{p,total}$
where $N_{p,total}$ is the total number of pins for functional
blocks and $f_g$ is the average logic gate fanout.
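The wireability quantities above are simple closed forms. The sketch below evaluates them; the function names are hypothetical, and the Rent parameters passed in are whatever the user supplies, since the slide gives no numeric values for K_p or p.

```python
import math

def rent_pins(n_gates, k_p, p):
    """Rent's rule: N_p = K_p * N_g**p."""
    return k_p * n_gates ** p

def long_wire_length(logic_area):
    """l_w = 2 * sqrt(A_logic); result is in the linear unit of the area."""
    return 2.0 * math.sqrt(logic_area)

def number_of_long_wires(total_pins, fanout):
    """N_w = f_g / (f_g + 1) * N_p,total."""
    return fanout / (fanout + 1.0) * total_pins
```

For the Alpha 21164 random-logic area of 181 mm² quoted in the talk, `long_wire_length(181.0)` gives roughly 26.9 mm.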
Device Parameters (Ch. 6)
We need values for the transistor resistances and
capacitances, Rdr and Cdr. These have been superseded in
RIPE 4.0.
Cycle Time Estimation Model (Ch. 7)
Tcycle = (fld – 1) Tgavg + 2Tginv + time_of_flight
where fld is the logic depth
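As a minimal sketch, the RIPE 3.0 cycle-time expression can be written as a one-line function. The argument names are hypothetical; T_gavg is taken to be the average gate (plus local wire) delay and T_ginv an inverter delay, as the symbols on the slide suggest.

```python
def cycle_time_ripe3(logic_depth, t_gate_avg, t_gate_inv, time_of_flight):
    """Tcycle = (f_ld - 1) * T_gavg + 2 * T_ginv + time_of_flight."""
    return (logic_depth - 1) * t_gate_avg + 2.0 * t_gate_inv + time_of_flight

def clock_frequency(cycle_time):
    """Clock frequency implied by a cycle time."""
    return 1.0 / cycle_time
```

Because the logic depth multiplies the average gate delay, a one-level change in depth (14 versus 15 for the Alpha benchmark inputs) shifts the estimated clock frequency by several percent.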
Power Dissipation (Ch. 8)
$P_{tot} = f_d\, C_{tot}\, V_{dd}\, V_{swing}\, f_c + I_{sc} V_{dd} + I_{leak} V_{dd} \approx \left(\sum_i f_{d,i}\, C_{sw,i}\right) V_{dd}^2\, f_c$
where f_d is the activity factor.
Contributions to the sum:
1. random logic: f_d C_sw,rl
2. clock distribution: f_d,clk C_sw,clk
3. memory: f_d C_sw,mem
4. interconnections: f_d C_sw,int
5. off-chip drivers: f_d C_sw,dr
For the Alpha 21164, f_d,clk = 0.75 and f_d = 0.15, based on
published details.
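The approximate (switching-only) form of the power expression sums activity-weighted capacitances over the five contributors. The sketch below assumes f_c is the clock frequency; the dictionary layout and the capacitance values in the usage example are purely illustrative and are not RIPE data.

```python
def dynamic_power(components, vdd, f_clk):
    """P ~ (sum_i f_d,i * C_sw,i) * Vdd^2 * f_c.

    components maps a label to an (activity_factor, switched_capacitance) pair.
    """
    weighted_cap = sum(fd * c_sw for fd, c_sw in components.values())
    return weighted_cap * vdd ** 2 * f_clk

# Hypothetical capacitances (farads); only the activity factors 0.75 and
# 0.15 come from the slide.
example = {
    "clock distribution": (0.75, 1.0e-9),
    "random logic": (0.15, 2.0e-9),
}
example_power = dynamic_power(example, vdd=3.3, f_clk=300e6)
```

The clock network's high activity factor explains why it accounts for such a large share of total power in the Alpha benchmark.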
RIPE 3.0 Sample Benchmark
(DEC Alpha 21164)
                             RIPE Results   Actual     RIPE Results
                             (Al/SiO2)                 (Cu/SiO2)
Memory Transistors:          6.73 M         7.2 M      6.73 M
Area memory:                 1.01 cm²       1.02 cm²   1.01 cm²
Pad ring area:               0.16 cm²       0.17 cm²   0.16 cm²
Clock frequency:             291 MHz        300 MHz    373 MHz
Power Dissipation:           52 W           50 W       66 W
Power clock distribution:    21 W           20 W       27 W
RIPE 3.0 Benchmark Results
Processor                          Chip Parameters           Actual   RIPE
Alpha 21164 (0.5 µm CMOS)          Clock frequency (MHz)     300      290
                                   Power dissipation (W)     50       52
                                   Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)            Clock frequency (MHz)     150      152
                                   Power dissipation (W)     15-20    19
                                   Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)   Clock frequency (MHz)     150      150
                                   Power dissipation (W)     18       18
                                   Number of metal levels    4        4
RIPE Simulation Modes: RIPE 3.0 to RIPE 4.0
[Figure: RIPE simulation modes]
RIPE 3.0, Performance Estimator (-n and -d modes): Wiring Strategy → Clock Frequency, Power, Wireability
RIPE 4.0, Wiring Allocator (-aw mode): Clock Frequency → Wiring Strategy
Intel Wiring Distribution Model
$d(\#Nets)/dL_{nets} \approx B\, L_{nets}^{\,b}$, b = -1.65
$\#Nets \approx A\,(\#Transistors)$, $A \approx 0.25$
S. Yang, MRS Symposium on Advanced Interconnects, April 1998.
$\#Nets = [B/(b+1)]\,[L_{max}^{\,b+1} - L_{min}^{\,b+1}]$
$Demand = [B/(b+2)]\,[L_{max}^{\,b+2} - L_{min}^{\,b+2}]$
We have taken
$L_{max} = 2\sqrt{Logic\_Area}$
and solve the above equations for B and $L_{min}$.
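One way to solve the two integral equations for B and Lmin is to note that their ratio, Demand/#Nets, is the average net length and depends only on Lmin. The sketch below finds Lmin by bisection and then back-substitutes for B. It is an illustration under those assumptions, not RIPE's actual solver, and the function name is hypothetical.

```python
def solve_wiring_distribution(n_nets, demand, l_max, b=-1.65):
    """Return (B, l_min) that reproduce a net count and a wiring demand.

    #Nets  = B/(b+1) * (l_max**(b+1) - l_min**(b+1))
    Demand = B/(b+2) * (l_max**(b+2) - l_min**(b+2))
    """
    target = demand / n_nets  # average net length
    if not 0.0 < target < l_max:
        raise ValueError("average net length must lie between 0 and l_max")

    def mean_length(l_min):
        num = (l_max ** (b + 2) - l_min ** (b + 2)) / (b + 2)
        den = (l_max ** (b + 1) - l_min ** (b + 1)) / (b + 1)
        return num / den

    # The mean length grows monotonically with l_min, so bisect
    # (geometrically, because l_min can span many decades).
    lo, hi = l_max * 1e-12, l_max * (1.0 - 1e-9)
    for _ in range(200):
        mid = (lo * hi) ** 0.5
        if mean_length(mid) < target:
            lo = mid
        else:
            hi = mid
    l_min = (lo * hi) ** 0.5
    big_b = n_nets * (b + 1) / (l_max ** (b + 1) - l_min ** (b + 1))
    return big_b, l_min
```

In use, `n_nets` would come from the 0.25 × (#Transistors) estimate and `l_max` from 2·sqrt(Logic_Area), both stated on the slide; `demand` is the total wire length that must be routed.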
Algorithm for RIPE 4.0 Cycle-Time Based Wiring Allocation
1. Set the input clock frequency and logic depth.
2. Use RIPE’s critical path model to estimate total average
delay, including gate and wire delay.
3. Determine the maximum allowable long wire delay by
subtracting the total average delay from the target cycle
time.
4. Allocate wires using this maximum total long wire delay
as a constraint, but allowing a maximum number of
repeaters.
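Steps 1 to 3 of the algorithm amount to a delay budget. The sketch below assumes the RIPE 4.0 cycle-time form Tcycle = fld·Tavg + Tlong + time_of_flight, so the long-wire budget is what remains of the target cycle time; the function name and the error handling are hypothetical.

```python
def long_wire_delay_budget(f_clk, logic_depth, t_avg, time_of_flight=0.0):
    """Maximum allowable long-wire delay for a target clock frequency.

    budget = 1/f_clk - logic_depth * t_avg - time_of_flight
    """
    budget = 1.0 / f_clk - logic_depth * t_avg - time_of_flight
    if budget <= 0.0:
        raise ValueError("average-path delay already exceeds the cycle time")
    return budget
```

Step 4 would then assign wires to levels (and insert repeaters) so that no long wire exceeds this budget; that allocation loop is not shown here.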
Modifying the Cycle-Time Model for RIPE 4.0
$T_{cycle} = f_{ld}\, T_{avg} + T_{long} + time\_of\_flight$
$T_{avg} = 0.377\, r_{int} c_{int} l_{int}^2 + 0.693\,\{R_{gout}(C_{gout} + f_g C_{gin}) + R_{gout}[(f_g+1)/2]\, c_{int} l_{int} + r_{int}[(f_g+1)/2]\, l_{int} C_{gin}\}$
$T_{long} = 0.377\, r_{int} c_{int} l_{long}^2 + 0.693\,[R'_{gout}(C'_{gout} + C'_{gin}) + R'_{gout}\, c_{int} l_{long} + r_{int} l_{long} C'_{gin}]$
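The modified cycle-time model can be sketched directly from the three expressions above. Argument names are hypothetical; primed quantities on the slide (the long-wire driver and receiver) appear here as `r_drv`, `c_drv_out` and `c_rcv_in`.

```python
def t_avg(r_int, c_int, l_int, r_gout, c_gout, c_gin, fanout):
    """Average logic-stage delay with fanout f_g on an average-length wire."""
    branch = (fanout + 1) / 2.0
    wire = 0.377 * r_int * c_int * l_int ** 2
    gate = 0.693 * (r_gout * (c_gout + fanout * c_gin)
                    + r_gout * branch * c_int * l_int
                    + r_int * branch * l_int * c_gin)
    return wire + gate

def t_long(r_int, c_int, l_long, r_drv, c_drv_out, c_rcv_in):
    """Delay of one long wire with its own driver and a single receiver."""
    wire = 0.377 * r_int * c_int * l_long ** 2
    gate = 0.693 * (r_drv * (c_drv_out + c_rcv_in)
                    + r_drv * c_int * l_long
                    + r_int * l_long * c_rcv_in)
    return wire + gate

def cycle_time_ripe4(logic_depth, t_avg_s, t_long_s, time_of_flight):
    """Tcycle = f_ld * T_avg + T_long + time_of_flight."""
    return logic_depth * t_avg_s + t_long_s + time_of_flight
```

Keeping T_long as a separate term is what lets the wiring allocator treat the long-wire delay as a constraint rather than folding it into an average.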
RIPE 4.0 Benchmark Results
Processor                          Chip Parameters           Actual   RIPE
Alpha 21164 (0.5 µm CMOS)          Clock frequency (MHz)     300      278
                                   Power dissipation (W)     50       57
                                   Number of metal levels    4        4
Pentium (0.6 µm BiCMOS)            Clock frequency (MHz)     100      113
                                   Power dissipation (W)     15-20    19
                                   Number of metal levels    4        4
PowerPC 604 (0.5 µm static CMOS)   Clock frequency (MHz)     133      134
                                   Power dissipation (W)     18       20
                                   Number of metal levels    4        4
Katmai Wiring Strategy Calculated by RIPE 4.0
Level   Pitch        rint     cint      Lmax    Repeaters   Level Wiring   Total Wiring
        [x0.64 µm]   [Ω/cm]   [pF/cm]   [mm]    for Lmax    Efficiency     Efficiency
1       1.0          3451     2.37      0.006   0           0.02           0.02
2-3     1.45         891      2.61      4.4     0           0.30           0.18
4       2.5          365      2.40      12.3    2           0.50           0.23
5       4.0          158      2.34      20.5    3           0.52           0.25
RIPE Inclusions
• BEOL Yield
• Signal Integrity
• Electromigration
• Cache Memory Performance
• Repeater Insertion
• Interconnect Inductance
• Accurate MOSFET Models
BEOL Yield in RIPE
• Critical Area
• Cube law distribution of defect sizes
• Poisson distribution of faults
$Y_{total} = e^{-\lambda_{open}}\, e^{-\lambda_{short}}$
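The Poisson yield expression is easy to check numerically. The sketch below treats the open and short fault counts as expected (mean) values and multiplies the two exponentials; the function name is hypothetical.

```python
import math

def poisson_yield(*fault_counts):
    """Y = exp(-lambda_1) * exp(-lambda_2) * ... = exp(-sum of lambdas)."""
    return math.exp(-sum(fault_counts))
```

Applied to the total fault counts quoted for Katmai and Katmai2 elsewhere in the talk, `poisson_yield(0.105)` is about 0.90 and `poisson_yield(0.464)` is about 0.63, matching the 90% and 63% Poisson yields reported there.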
Katmai (250 nm Pentium III) Transition to
180nm Technology
Katmai Shrink (Katmai-180)
• number of transistors: 9.5M
• chip size: 1.23 → 0.62 cm²
• clock frequency: 600 → 850 MHz
• metal layers: 5 → 6 (4 wiring domains)
Katmai Shrink and Doubling (Katmai2)
• number of transistors: 19M
• chip size: 1.24 cm²
• clock frequency: 850 MHz
• metal layers: 10 (9 wiring domains)
Contributions of Different Metal Levels to Random
Defect Yields for Katmai and Katmai2
Contribution of each metal level to total faults (%):
                   M1    M2     M3     M4     M5    M6    M7    M8    M9    M10   Total Faults   Poisson Yield
Katmai (250 nm)    3.9   34.8   34.8   18.8   7.7   --    --    --    --    --    0.105          90%
Katmai2 (180 nm)   2.7   34.3   34.3   13.4   6.0   3.8   2.4   1.6   1.1   0.4   0.464          63%
Signal Integrity Limits
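Sakurai (1993)
$V_p \approx \frac{1}{2}\, V_{dd}\, \frac{C_{cint}}{C_{pint} + C_{cint}}$
fraction of victim wire parallel to attacker $\approx \dfrac{2\,(V_p/V_{dd})\,(c_{pint}/c_{cint})}{1 - 2\,V_p/V_{dd}}$
[Figure: victim and attacker wires with coupling capacitance C_cint and capacitance C_pint]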
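The crosstalk limit can be explored with two small functions. This sketch assumes the Sakurai-style peak noise Vp ≈ ½·Vdd·Cc/(Cp + Cc), and further assumes that only the coupling capacitance scales with the fraction of the victim wire that runs parallel to the attacker; both the names and that scaling assumption are illustrative.

```python
def peak_noise(vdd, c_p, c_c, parallel_fraction=1.0):
    """Peak coupled noise on the victim for a given parallel-run fraction."""
    coupled = parallel_fraction * c_c
    return 0.5 * vdd * coupled / (c_p + coupled)

def parallel_fraction_for_noise(v_p, vdd, c_p, c_c):
    """Fraction of the victim wire parallel to the attacker that yields v_p.

    x = 2 (Vp/Vdd) (c_p/c_c) / (1 - 2 Vp/Vdd); requires Vp < Vdd/2.
    """
    ratio = v_p / vdd
    if not 0.0 <= ratio < 0.5:
        raise ValueError("v_p must be below vdd / 2")
    return 2.0 * ratio * c_p / (c_c * (1.0 - 2.0 * ratio))
```

The second function is the first one solved for the fraction, so feeding the output of `peak_noise` back into it recovers the original fraction.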
Vp Comparison between SPICE, Sakurai Model, and the
Modified HP Model for Deschutes (250 nm Pentium II)
Metal     Line Length   SPICE   Sakurai   % Error   Modified HP   % Error
Levels    (mm)          (mV)    (mV)                (mV)
M2-M3     0.01          1.3     643       Big       1.25          4
M2-M3     6             403     643       60        436           8
M4        6             300     578       93        314           5
M4        10            369     578       57        399           8
M5        12            321     532       66        335           4
M5        21            382     532       39        411           8
Cache Memory Performance
We assume that the cycle time is defined by the logic
subsystem. Calculated cache access times greater than this
cycle time will be flagged and reported by RIPE. RIPE will
then assume that the cache requires multiple clock cycles
for proper operation.
RIPE 4.1 implements the model of Wada et al. (1992) IEEE
JSSC, 27, p. 1147. It can be linked to the more accurate
CACTI model of Wilton and Jouppi (1996) IEEE JSSC, 31,
p. 677.
Inductance in RIPE 4.2
• RIPE has good estimates of wire capacitance (per unit length)
[Geuskens and Rose, 98, Mangaser (Ph.D. Thesis), 99]
• Estimate wire inductance from wire capacitance
– Assume homogeneous medium and TEM mode propagation
• Inductance analysis performed in two steps
– Identification of wiring levels with significant inductance effects
  · Incorporate Ismail's formulas for an inductance figure of merit (FOM) to
    define upper and lower bounds for wire lengths that are susceptible to
    inductance effects on each wiring level
  · Use constant RC values to estimate rise times needed in FOM
– Optimization of inductance-susceptible levels
  · Revert to wire pitch from the last, previous wiring level without
    inductance effects
  · Given long-wire delay constraint, use Ismail's RLC-based formulas to
    determine maximum wire length (per level)
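Under the homogeneous-medium, TEM assumption stated above, inductance and capacitance per unit length satisfy L·C = μ0·ε0·εr, so inductance follows directly from capacitance. The sketch below shows only that conversion (SI units, hypothetical function names); it does not implement the figure-of-merit bounds or the RLC delay formulas.

```python
import math

C0 = 2.998e8  # speed of light in vacuum, m/s

def inductance_per_m(cap_per_m, eps_r):
    """L' = eps_r / (c0^2 * C') for TEM propagation in a uniform dielectric."""
    return eps_r / (C0 ** 2 * cap_per_m)

def propagation_velocity(ind_per_m, cap_per_m):
    """v = 1 / sqrt(L' * C'); equals c0 / sqrt(eps_r) under the same assumption."""
    return 1.0 / math.sqrt(ind_per_m * cap_per_m)
```

For example, a wire capacitance of 2 pF/cm (2e-10 F/m) in SiO2 (εr ≈ 3.9) corresponds to roughly 2.2 nH/cm.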
RIPE 4.2 wire level projections using Cu/low-K(=2)
• Using ITRS'99 scaling trends
Technology Node (nm)   180   130   100   70    50    35
RC, 0 Repeaters        7     12    24    40    >50   >50
RC, ≤ 5 Repeaters      6     11    22    35    >50   >50
RLC, 0 Repeaters       5     11    17    29    >50   >50
RLC, ≤ 5 Repeaters     6     10    18    25    42    >50
* 40%/60% Logic area to memory area ratio assumed
• Using RPI and Bohr scaling trends with ITRS'99 clock frequencies
Technology Node (nm)   180   130   100   70    50    35
RC, 0 Repeaters        6     7     8     10    12    18
RC, ≤ 5 Repeaters      6     6     7     8     10    15
RLC, 0 Repeaters       6     7     8     9     10    12
RLC, ≤ 5 Repeaters     6     6     6     7     7     10
* 20%/80% Logic area to memory area ratio assumed
• ITRS'99 scaling trends for MOSFETs, chip size and transistor counts
are overly aggressive!
A Constant RC Input-Signal-Transition-Inherent
(CRISTI) gate delay model:
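Constant RC model of an inverter chain
[Figure: three-inverter chain between Vdd and ground, with pull-up/pull-down resistances Rpu1/Rpd1, Rpu2/Rpd2, Rpu3/Rpd3 and node capacitances Cnode1, Cnode2, Cnode3]
For Inverter 2:
$t_{pdr} = 0.7\, R_{pu2}\, C_{node2} + b_r\, R_{pd1}\, C_{node1}$
$t_{pdf} = 0.7\, R_{pd2}\, C_{node2} + b_f\, R_{pu1}\, C_{node1}$
$t_{pdav} = R_{dr}\, C_{node}\,[0.7 + b] = K\, R_{dr}\, C_{node}$
(assuming $b_r \approx b_f \approx b$)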
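The CRISTI expressions for inverter 2 can be sketched as below. The extracted slide lost its operators, so the plus sign between the stage's own 0.7·RC term and the input-transition term b·RC of the preceding stage is an assumption, as are the function names.

```python
def cristi_inverter_delays(r_pu2, r_pd2, c_node2, r_pu1, r_pd1, c_node1, b):
    """Rising and falling propagation delays of inverter 2 in a chain.

    The output of inverter 2 rises while the output of inverter 1 falls,
    so the rising delay pairs Rpu2 with Rpd1 (and vice versa).
    """
    t_rise = 0.7 * r_pu2 * c_node2 + b * r_pd1 * c_node1
    t_fall = 0.7 * r_pd2 * c_node2 + b * r_pu1 * c_node1
    return t_rise, t_fall

def cristi_average_delay(r_dr, c_node, b):
    """t_pdav = (0.7 + b) * Rdr * Cnode = K * Rdr * Cnode."""
    return (0.7 + b) * r_dr * c_node
```

When all stages share the same R and C, the rising, falling and average delays coincide, which is the case the single constant K describes.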
Previous approaches to estimating constant RC values
• Resistance
Gate Length (µm)
Rpu (kΩ)
This work
Step input gate delay/Load capacitance [Weste et al.]
Equation (1) [Menezes et al., Qian et al.]
1/(maximum drain conductance) [Sakurai]
1/(minimum drain conductance) [Sakurai]
Slope-chord technique at Vgs=Vds=Vdd [Watt et al.]
Equations (2) [Rabaey]
1.2
0.9
0.6
12.1
8.0
6.7
4.1
667
15.7
8.9
11.7
8.2
6.1
3.9
909
16.7
8.7
9.5
6.7
5.5
2.9
167
13.2
6.7
7.7
6.6
4.0
1.8
233
12.8
5.6
13.0
9.1
5.6
2.5
667
18.1
7.9
11.4
8.0
5.1
2.2
317
16.1
7.1
Rpd (kΩ)
This work
Step input gate delay/Load capacitance [Weste et al.]
Equation (1) [Menezes et al., Qian et al.]
1/(maximum drain conductance) [Sakurai]
1/(minimum drain conductance) [Sakurai]
Slope-chord technique at Vgs=Vds=Vdd [Watt et al.]
Equations (2) [Rabaey]
t t
Rdr  90 50
Ceff ln 5
SLIP Workshop, April 2001
(1)
,
PMOS
PMOS

1  Vds 
 Vds 
Rpu  




2  Ids Vout0  Ids VoutVdd / 2 
(2)
Ken Rose
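Equation (1) extracts an effective driver resistance from two points on an output transition, R_dr = (t90 − t50)/(C_eff·ln 5). For an ideal single-pole RC response, t50 = RC·ln 2 and t90 = RC·ln 10, so the difference is exactly RC·ln 5. The sketch below uses hypothetical names and assumes t50 and t90 are measured from the same time origin.

```python
import math

def effective_resistance(t90, t50, c_eff):
    """R_dr = (t90 - t50) / (C_eff * ln 5)."""
    return (t90 - t50) / (c_eff * math.log(5.0))

# Self-check on an ideal RC response with illustrative values.
r_true, c_true = 10e3, 10e-15
t50_ideal = r_true * c_true * math.log(2.0)
t90_ideal = r_true * c_true * math.log(10.0)
r_est = effective_resistance(t90_ideal, t50_ideal, c_true)
```

Using the 50% to 90% interval rather than the 0% to 50% interval makes the estimate insensitive to the exact start time of the input transition.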
Two general methods of determining constant
RC values
• Method 1
- Given a full set of SPICE parameters, determine R and C
from SPICE simulations of inverter chains
- Use actual gates, not step or ramp inputs, to drive inverters
under investigation → better characterization of RC values
- Use a constant RC input-signal-transition-inherent gate
delay model for inverters
• Method 2
- Given limited MOSFET information, determine R and C
from the “CV/I” metric
- Use this method to project RC values for deep sub-micron
CMOS technologies
C-IRSIM
• CRISTI model for inverters was extended to multitransistor (>2) logic gates
– 3-input NAND gates used initially
• Focus placed on transistors in series stacks
– Relative topological position and relative turn-on order
• These combined features determine the appropriate R and
C value for each transistor in a series stack
– Ignoring these features leads to significant errors in delay
estimation relative to SPICE
• Elmore delay terms included with bRC term to account for
distributed RC effects in complex gates
• CRISTI incorporated into IRSIM → C-IRSIM
C-IRSIM simulation examples
• 1056-transistor, 6-bit DADDA multiplier circuit in 0.18 µm
technology
Bit Pattern   Slowest settling product   AIM-Spice   C-IRSIM   IRSIM
Set #         bit -> final value         (ns)        (ns)      (ns)
1             P4 -> 1                    2.65        3.75      3.47
2             P8 -> 0                    4.76        4.32      6.22
3             P8 -> 0                    6.80        7.33      8.21
4             P7 -> 0                    4.75        3.94      3.56
5             P6 -> 0                    4.49        5.57      5.70
Total multiplication time                23.45       24.91     27.16
% error relative to AIM-Spice            --          +6.2      +15.8
C-IRSIM's % improvement over IRSIM: 60.6
Total execution time (s)                 405         1         1
Significance of good device models
• Selected cycle-time components from RIPE 4.2
Technology Node (nm)                  180     130     100     70      50      35
Target cycle time (ns)                0.83    0.62    0.50    0.40    0.33    0.28
ITRS '99
  Ctrn (fF)                           0.270   0.131   0.097   0.058   0.044   0.028
  Reff-trn (kΩ)                       20.5    27.3    28.3    30.3    28.2    40.8
  Total logic-stage delay (ns)        0.55    0.38    0.30    0.20    0.14    0.14
  Total logic-stage delay fraction
  of target cycle time                0.66    0.61    0.60    0.50    0.42    0.50
  Total long-wire delay (ns)          0.28    0.24    0.20    0.20    0.19    0.14
RPI/Bohr (ITRS '99 clock frequencies)
  Ctrn (fF)                           0.152   0.071   0.046   0.025   0.014   0.007
  Reff-trn (kΩ)                       27.8    34.8    41.1    51.4    65.6    88.8
  Total logic-stage delay (ns)        0.50    0.33    0.27    0.21    0.18    0.15
  Total logic-stage delay fraction
  of target cycle time                0.60    0.53    0.54    0.53    0.54    0.54
  Total long-wire delay (ns)          0.33    0.29    0.23    0.19    0.15    0.13
• Fraction of cycle time consumed by total logic delay can be relatively large
(0.5-0.66) → devices cannot be neglected altogether
• Small change in device delay → potentially big change in total wiring levels
Conclusions
• Reasonable estimates can be made of microprocessor
performance on the basis of limited information.
• Models should be robust with a limited number of
arbitrary fitting parameters.
• Interconnect limitations constrain design and
manufacture.
RIPE 4.0 Sample Benchmark
Intel’s Deschutes (Pentium II) processor
RIPE INPUTS
System Parameters
Circuit Area (mm2): 1.31
Number of Transistors (M): 7.5
SRAM cells (µm²): 10.26
SRAM (Kbytes): 32
Signal I/O: 242
Technology Parameters
Technology Generation (µm): 0.25
LGATE (µm): 0.18
Num. of wire levels: 5 (Aluminum)
Core Supply (V): 1.8
Wire Parameters
Pitch (µm): 0.64, 0.93, 0.93, 1.60, 2.56
rint (Ω/cm): 3451, 891, 891, 365, 158
cint (pF/cm): 2.4, 2.6, 2.6, 2.4, 2.3
                          RIPE RESULTS   ACTUAL
Clock Frequency (MHz)     459            450
Power Dissipation (W)     18.7           18.9
Wiring strategy results from RIPE 4.1 for a 100nm,
Cu/low-K(=2) technology using RPI/Bohr/ITRS’99 scaling
• No inductance analysis
• Repeaters chosen to maximize chip wireability
Level #   Pitch     Repeaters    Rw         Lmax    ew       total_ew
          (xPmin)   (for Lmax)   (ohm/cm)   (um)
-------   -------   ----------   --------   -----   ------   --------
1         1.00      0            11000      2       0.0136   0.0136
2         2.00      0            2750       2774    0.3000   0.1568
3         2.00      0            2750       2774    0.3000   0.1568
4         3.00      2            1222       5938    0.2999   0.1772
5         5.00      1            440        8641    0.2999   0.1869
6         7.00      1            224        10992   0.2999   0.1929
7         8.00      3            172        13360   0.2999   0.1977
8         10.00     3            110        15477   0.3000   0.2012
9         13.00     1            65         17246   0.3000   0.2038
10        15.00     2            49         18880   0.3000   0.2059
11        17.00     2            38         20403   0.2999   0.2077
12        20.00     1            28         21759   0.2999   0.2091
13        23.00     1            21         22984   0.3000   0.2104
14        26.00     1            16         24105   0.2999   0.2114
15        29.00     1            13         25139   0.3000   0.2124
16        32.00     1            11         26800   0.2997   0.2132
Wiring strategy results from RIPE 4.2 for a 100nm,
Cu/low-K(=2) technology using RPI/Bohr/ITRS’99 scaling
• Inductance analysis performed
• Repeaters again chosen to maximize chip wireability
– Compromise between maximizing chip wireability and minimizing RLC delay
Level #   Pitch     Repeaters    Rw         Lmax    ew       total_ew
          (xPmin)   (for Lmax)   (ohm/cm)   (um)
-------   -------   ----------   --------   -----   ------   --------
1         1.00      0            11000      2       0.0136   0.0136
2         2.00      0            2750       2774    0.3000   0.1568
3         2.00      0            2750       2774    0.3000   0.1568
4         3.00      2            1222       5938    0.2999   0.1772
5         5.00      1            440        8641    0.2999   0.1869
6         7.00      1            224        10992   0.2999   0.1929
7         7.00      3            224        16863   0.3000   0.2033
8         7.00      3            224        16863   0.3000   0.2033
9         8.00      1            172        19964   0.2999   0.2072
10        10.00     2            110        25657   0.3000   0.2128
11        10.00     2            110        25657   0.3000   0.2128
12        32.00     1            11         26800   0.1372   0.2121
• Wire inductance reduces the effect of wire resistance
→ Smaller wire pitches but longer wire lengths
→ Reduction in total number of wire levels