Impact of Parameter Variations on
Multi-core chips
E. Humenay, D. Tarjan, K. Skadron
© 2004, Kevin Skadron
Department of Computer Science
University of Virginia
Motivation
• Process variations are projected to severely impact the yield of high-performance semiconductors
• Multi-core architectures have become the prevailing trend for high-performance chips
• Understanding how process variations interact with chip multiprocessors (CMPs) is therefore required

Variation Types
• PVT Variations
  - Process
  - Voltage
  - Temperature
This work primarily focuses on process variations

Process Variations
• Process (P) variations stem from a variety of sources
  • Within-Die (WID)
  • Die-to-Die (D2D)
  • Wafer-to-Wafer (W2W)
  • Core-to-Core (C2C)

WID Variations
• WID variations can be further sub-divided
  • Systematic (WIDsys)
  • Random (WIDrand)
• Threshold voltage, Vth, and effective channel length, Leff, are the 2 parameters most susceptible to random variations
• Systematic variations cause parameter values to be spatially correlated
  • Can be modeled as deterministic or random
• WID variations cause C2C variations

Drain Induced Barrier Lowering (DIBL)
• Ideally, Vth and Leff values are independent of each other
• The DIBL effect introduces a dependency:
  $V_{th} = V_{th0} - V_{DD}\, e^{-\alpha_{DIBL} L_{eff}}$
• DIBL creates an exponential dependency between Leff and sub-threshold leakage

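The DIBL dependency can be explored numerically. The following is a minimal sketch rather than the authors' model: it assumes the relation Vth = Vth0 - VDD * exp(-alpha_DIBL * Leff), takes Leff in nm and the DIBL coefficient in 1/nm, and approximates sub-threshold leakage with the textbook proportionality I ~ exp(-Vth / (n * vT)). The constants `VTH0`, `VDD`, `N_SWING`, and `V_THERMAL`, as well as the Leff values in the usage loop, are illustrative assumptions.

```python
import math

# Illustrative assumptions (not taken from the slides).
VTH0 = 0.30        # long-channel threshold voltage, volts
VDD = 1.0          # supply voltage, volts
N_SWING = 1.5      # sub-threshold swing factor
V_THERMAL = 0.026  # thermal voltage kT/q near room temperature, volts


def vth(leff_nm, alpha_dibl):
    """Threshold voltage with DIBL: Vth = Vth0 - VDD * exp(-alpha * Leff)."""
    return VTH0 - VDD * math.exp(-alpha_dibl * leff_nm)


def relative_leakage(leff_nm, alpha_dibl, leff_ref_nm):
    """Leakage relative to a reference Leff, using I ~ exp(-Vth / (n * vT))."""
    delta_vth = vth(leff_nm, alpha_dibl) - vth(leff_ref_nm, alpha_dibl)
    return math.exp(-delta_vth / (N_SWING * V_THERMAL))


if __name__ == "__main__":
    # Compare a short-channel device against a longer reference device.
    for alpha in (0.15, 0.14, 0.13):
        print(alpha, relative_leakage(25.0, alpha, leff_ref_nm=28.0))
```

Under these assumptions a shorter Leff lowers Vth, and because Vth sits inside a second exponential, the leakage ratio grows quickly as Leff shrinks.
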
Modeling Methodology
• To estimate the impact of process variations on delay, a critical path (CP) model is necessary
• Prior CP models vary the inputs to an RC delay equation in Monte-Carlo analyses
• That simplicity comes at the expense of accuracy

CP Modeling: Prior Work
• Fmax GCP model (Bowman, JSSC '02)
  – Ncp ~ Number of critical paths
  – Lcp ~ Number of gates in critical path (Logic Depth)
  [Figure labels: Ncp, Lcp]
• Marculescu DAC '05
  • Ncp ~ stage's device count

Importance of Ncp
• As Ncp increases, mean delay increases and delay variation decreases
[Figure: histograms of Normalized Delay (x-axis, 0.956 to 1.066) against Count/Samples (y-axis, 0 to 0.04) for Ncp = 1, 2, 4, 16, 128]

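The Ncp trend can be reproduced with a small Monte-Carlo experiment. This is a sketch under simplifying assumptions, not the Fmax GCP model itself: each critical path delay is drawn independently from a normal distribution with mean 1.0, the stage delay is the slowest of the Ncp paths, and `sigma` is an assumed value.

```python
import random
import statistics


def stage_delay_samples(ncp, trials=5000, sigma=0.02, seed=0):
    """Sample the stage delay as the slowest of ncp independent critical paths.

    Path delays are normalized to a mean of 1.0; sigma is an assumed
    per-path standard deviation, not a value from the slides.
    """
    rng = random.Random(seed)
    return [max(rng.gauss(1.0, sigma) for _ in range(ncp)) for _ in range(trials)]


if __name__ == "__main__":
    for ncp in (1, 2, 4, 16, 128):
        samples = stage_delay_samples(ncp)
        print(ncp, statistics.mean(samples), statistics.stdev(samples))
```

Taking the maximum over more paths shifts the distribution toward slower delays while narrowing it, which is the behavior the slide describes.
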
Modified CP Model
• Goal: More accurately describe each functional unit's delay distribution in order to determine which functional units will affect the final frequency distribution
• Improvements
  • Considering wire delay when determining Lcp
  • Better Ncp assignments
  • Importance of Weff: $\sigma_{V_{th}} \sim 1 / \sqrt{W_{eff} L_{eff}}$

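The role of Weff can be illustrated with the commonly cited inverse-square-root area scaling of random threshold-voltage mismatch. The sketch below assumes sigma(Vth) ~ 1 / sqrt(Weff * Leff) and reports the spread relative to a minimum-sized device; the function name and the scale factors are hypothetical.

```python
import math


def sigma_vth_ratio(width_scale, length_scale=1.0):
    """Ratio of sigma(Vth) to its minimum-size value, assuming sigma ~ 1/sqrt(W*L).

    width_scale and length_scale are multiples of the minimum width and length.
    """
    return 1.0 / math.sqrt(width_scale * length_scale)


if __name__ == "__main__":
    # A device five times wider than minimum versus a minimum-width device.
    print(sigma_vth_ratio(5.0), sigma_vth_ratio(1.0))
```

Under this assumption, widening a device by a factor k shrinks its random Vth spread by sqrt(k), so wide devices are less sensitive to random variation than minimum-sized ones.
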
Modified CP Model
• Categorize each stage as being either SRAM or combinational logic
• SRAM
  • L1s
  • TLBs
  • Register File
  • Rename Map
  • Issue Queue
• Logic
  • Execution Units
  • Decode Stage
  • Issue Select

Type    Ncp   Lcp   Weff
SRAM    Hi    Lo    Lo/Hi
LOGIC   Lo    Hi    Hi

SRAM model
• A modified version of CACTI 4.0 is used to estimate the fraction of access time susceptible to device variations
• Ncp ~ number of read ports
• Weff is dependent on unit type
  • L1 caches are assumed to be optimized for area (minimum-sized Weff)
  • Time-critical SRAM units have larger widths (assumed 5x larger than minimum)
• Only variation in SRAM access time is considered

Combinational Logic Model
• Logic model is based on a Sklansky adder
• Delay is modeled with the Horowitz delay equation
• Critical path is the carry circuitry
• Weff is chosen to alleviate fan-out delay
[Figure: prefix cells of the adder, with inputs Gi:k, Pi:k, Gk-1:j, Pk-1:j and outputs Gi:j, Pi:j]

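The carry circuitry of a Sklansky adder can be sketched in software to show where the critical path comes from. The function below is an illustrative bit-level model, not the authors' delay model: it uses the standard prefix operator G_i:j = G_i:k + P_i:k * G_k-1:j and P_i:j = P_i:k * P_k-1:j, assumes a carry-in of 0, and the name `sklansky_add` is hypothetical.

```python
def sklansky_add(a, b, width=64):
    """Add two unsigned integers with a Sklansky parallel-prefix carry tree.

    Returns (sum modulo 2**width, carry_out). Bit 0 is the least significant bit.
    """
    # Bitwise generate and propagate signals.
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    group_g, group_p = g[:], p[:]

    level = 0
    while (1 << level) < width:
        for i in range(width):
            if (i >> level) & 1:
                # k-1 is the top bit of the lower block merged at this level.
                k_minus_1 = ((i >> level) << level) - 1
                # Prefix operator applied to (G_i:k, P_i:k) and (G_k-1:j, P_k-1:j).
                group_g[i] = group_g[i] | (group_p[i] & group_g[k_minus_1])
                group_p[i] = group_p[i] & group_p[k_minus_1]
        level += 1

    # The carry into bit i is the group generate of bits i-1..0.
    total = 0
    for i in range(width):
        carry_in = group_g[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, group_g[width - 1]


if __name__ == "__main__":
    print(sklansky_add(123456789, 987654321))
```

For a 64-bit adder the loop runs six prefix levels, and the node at the top of each lower block drives a fan-out that doubles at every level, which is why device sizing for fan-out matters on this path.
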
WIDrand: SRAM delay
• Because of its large Ncp, the L1 is likely to be the slowest SRAM unit
• Nominal frequency is 3GHz
[Figure: histograms of % Frequency Slowdown Due to Random Process Variations (x-axis, 2.28 to 6.96) against Count/Samples (y-axis, 0 to 0.09) for the 64KB L1, 120 Entry RF, and 8KB TLB]

WIDrand: SRAM vs. Logic
• L1 will also be slower than logic
[Figure: histograms of % Frequency Slowdown Due to Random Process Variations (x-axis, -3.3 to 6.96) against Count/Samples (y-axis, 0 to 0.09) for the 64b Adder Critical Path and the 64KB L1 Cache]

WIDsys Pattern
• WIDsys model is derived from actual measurements (Friedberg ISQED'05)
[Figure: Leff map over a 14mm x 14mm die, with a scale from 25 to 28 and regions labeled "Fast, High-leakage" and "Slow, Low-leakage"; a POWER4-like core scaled to 45nm is overlaid]

Impact of WIDsys on Delay
• WIDsys can cause core-to-core frequency to differ by as much as 5%
• A large Lcp value causes combinational logic units to be more affected by WIDsys variation
[Figure: % Frequency Slowdown (y-axis, 0 to 12) against % WID Systematic Variation in Leff (x-axis, 0 to 12) for the 64KB L1 and Logic]

Random Leakage Variation
• WIDrand will not have an impact on leakage at the architectural level since total leakage is an aggregate sum
[Figure: histograms of Normalized Aggregate Leakage (x-axis, 0.5 to 14.8) against Count/Samples (y-axis, 0 to 0.05) for Number of Transistors = 1, 2, 4]

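The averaging argument can be checked with a short Monte-Carlo sketch. It is an illustration under stated assumptions, not the authors' experiment: each transistor's leakage is taken as exp(-dVth / (n * vT)) with an independent, normally distributed Vth offset, and `SIGMA_VTH`, `N_SWING`, and `V_THERMAL` are assumed values.

```python
import math
import random
import statistics

# Illustrative assumptions (not taken from the slides).
N_SWING = 1.5      # sub-threshold swing factor
V_THERMAL = 0.026  # thermal voltage, volts
SIGMA_VTH = 0.03   # random Vth standard deviation, volts


def aggregate_leakage_samples(num_transistors, trials=1000, seed=0):
    """Sample the average leakage per transistor over a group of transistors."""
    rng = random.Random(seed)
    scale = N_SWING * V_THERMAL
    samples = []
    for _ in range(trials):
        total = sum(
            math.exp(-rng.gauss(0.0, SIGMA_VTH) / scale)
            for _ in range(num_transistors)
        )
        samples.append(total / num_transistors)
    return samples


def coefficient_of_variation(samples):
    """Standard deviation divided by mean, a scale-free measure of spread."""
    return statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for n in (1, 2, 4, 64, 256):
        print(n, coefficient_of_variation(aggregate_leakage_samples(n)))
```

The relative spread of the aggregate shrinks as more transistors are summed, so at the scale of a functional unit or core the random component largely averages out.
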
C2C Leakage Variation
• Figure shows core leakage when considering all possible core locations on a die
• 3 different magnitudes of DIBL are considered
  • BSIM suggests .15 (best-case)
[Figure: histograms of # of Core Positions on Chip (y-axis, 0 to 120) against Normalized Core Leakage (x-axis, 1 to 1.45) for DIBL = 0.15, 0.14, 0.13]

Conclusions
• L1 caches will determine the WID mean frequency. Variations in other units will not directly affect the frequency distribution
• Considering wire delay in the CP model causes device variations to have less of an impact on the frequency distribution
• WID variations do not result in significant C2C frequency differences
• At 45nm, C2C sub-threshold leakage variation may be as much as 45%