Techniques to Mitigate the Effects of
Congenital Faults in Processors
Smruti R. Sarangi
Process Variation
Corner rounding, edge shortening
(courtesy IBM Microelectronics)
Semiconductor
Fabrication facility
(courtesy tabalcoaching.com)
Photolithography Unit
(Courtesy Upenn)
Basic Lithographic Process
• The source of light is typically an argon-fluoride (ArF) laser
• The light passes through an array of lenses to reach the silicon substrate
• The resolution limit is given by:
R = k1λ / NA
NA = n sin θ
• To decrease the resolution limit R (print finer features) we need to:
  - Decrease the wavelength λ
  - Increase the refractive index n
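As a rough numerical illustration of R = k1λ / NA with NA = n sin θ, the sketch below compares exposure through air and through water for a 193 nm ArF source. The k1 factor, the half-angle θ and the two refractive indices are assumptions chosen for illustration; none of them come from the talk.

```python
import math

def numerical_aperture(n: float, theta_deg: float) -> float:
    """NA = n * sin(theta), with theta the half-angle of the light cone."""
    return n * math.sin(math.radians(theta_deg))

def resolution_limit_nm(k1: float, wavelength_nm: float, na: float) -> float:
    """R = k1 * lambda / NA; a smaller R means finer printable features."""
    return k1 * wavelength_nm / na

# Assumed values, for illustration only (not taken from the slides).
K1 = 0.4               # process-dependent factor
WAVELENGTH_NM = 193.0  # ArF laser wavelength
THETA_DEG = 70.0       # half-angle of the lens

dry = resolution_limit_nm(K1, WAVELENGTH_NM, numerical_aperture(1.00, THETA_DEG))
wet = resolution_limit_nm(K1, WAVELENGTH_NM, numerical_aperture(1.44, THETA_DEG))
print(f"air   (n = 1.00): R = {dry:.1f} nm")
print(f"water (n = 1.44): R = {wet:.1f} nm")
```

With everything else fixed, raising n from 1.00 to 1.44 shrinks R by the same factor, which is the second lever listed on the slide.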
Parameter Variation
• P: Process
  - Threshold Voltage – Vt
  - Transistor Length – Leff
• V: Supply Voltage
• T: Temperature
Why is Variation a Problem?
Unpredictability of Vt, Leff and T implies:
→ Lower chip frequency and higher leakage
(courtesy Shekhar Borkar, Intel)
Implications on Design Decisions
• Static timing analysis not possible
  - Overly conservative designs
  - Chips too slow
  - Performance of a generation lost
• Possible solution
  - Clock the chip at an unsafe frequency
  - Tolerate resulting timing errors
  - Reduce timing errors
    - Architectural techniques
    - Circuit techniques
Overview
Model for Process Variation
Model for Timing Errors due to
Process Variation
Techniques to
Tolerate Timing Errors
Techniques to
Reduce Timing Errors
Dynamic Optimization
Process Variation
• Systematic Variation
  - Lens aberrations
  - Mask deformities
  - Thickness variation in CMP
  - Photo-lithographic effects
• Random Variation
  - Variable dopant density
  - Line edge roughness
Modeling Systematic Variation
Break into a million cells (1000 × 1000 grid)
[Figure: variation map]
Systematic and Random Variation
• Distribution of systematic components
  - Normal distribution
  - Normal distribution + spatial correlation → multivariate normal distribution
• Superimpose random variation on top of systematic
Overview
Model for Process Variation
Model for Timing Errors due to
Process Variation
Techniques to
Tolerate Timing Errors
ISQED ‘07
Techniques to
Reduce Timing Errors
Dynamic Optimization
Timing Errors
P(E) = 1 – cdf(tclk)
[Figure: distribution of path delays in a pipe stage, without variation and with variation; the timing-error region is marked]
Model for Timing Errors
Basic assumptions
• A structure consists of many critical paths
• The critical path depends on the input
• critical path delay > clock period ⇒ timing error
• clock period = delay of the longest critical path at
  - maximum temperature
  - no variation
• All pipeline stages are tightly designed ⇒ 0 slack
Paths in a Pipeline Stage
[Figure: pdf of path delays t, with the clock period 1/f marked; paths slower than 1/f cause timing errors]
pdf(t) → cdf(t)
Error rate: PE(t) = 1 – cdf(t)
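To make PE(t) = 1 – cdf(t) concrete, the sketch below evaluates the error rate of one stage at several clock frequencies. Modeling the path delays as a single normal distribution, and the mean and sigma used here, are assumptions for illustration only.

```python
from statistics import NormalDist

def timing_error_rate(freq_ghz: float, delays: NormalDist) -> float:
    """PE = 1 - cdf(t_clk), where t_clk = 1/f is the clock period in ns."""
    t_clk_ns = 1.0 / freq_ghz
    return 1.0 - delays.cdf(t_clk_ns)

# Hypothetical stage: path delays ~ Normal(mean = 0.20 ns, sigma = 0.01 ns).
stage_delays = NormalDist(mu=0.20, sigma=0.01)

for f in (4.0, 4.5, 5.0, 5.5):
    print(f"f = {f:.1f} GHz -> PE = {timing_error_rate(f, stage_delays):.3e}")
```

Raising f shortens the clock period, so a larger tail of the delay distribution falls beyond it and PE grows.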
Basic Kinds of Structures
Logic
• Heterogeneous critical paths
• ALUs, comparators, sense-amps
Memory
• Homogeneous critical paths
• SRAMs, CAMs
Mixed
• x% memory and (100-x)% logic
• Used to model renamer, wakeup/select
Logic
Critical Path
• 35% Wiring: Elmore Delay Model
• 65% Gates: Alpha Power Law
$T_g \propto \dfrac{L_{eff}\, V_{DD}}{\mu(T)\,(V_{DD} - V_{th})^{\alpha}}$
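The alpha-power law says gate delay grows with Leff and with Vth, and shrinks with carrier mobility μ(T). The sketch below computes the gate delay of a transistor relative to a nominal one. The exponent α = 1.3, the nominal Vth of 0.25 V and the unit mobility are assumptions for illustration; only Vdd = 1 V appears elsewhere in the talk.

```python
def gate_delay(leff: float, vdd: float, vth: float,
               mobility: float, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay, up to a constant factor."""
    if vdd <= vth:
        raise ValueError("vdd must exceed vth")
    return leff * vdd / (mobility * (vdd - vth) ** alpha)

def relative_gate_delay(leff: float, vth: float,
                        nominal_leff: float = 1.0, nominal_vth: float = 0.25,
                        vdd: float = 1.0, mobility: float = 1.0) -> float:
    """Delay of a varied gate divided by the delay of the nominal gate."""
    varied = gate_delay(leff, vdd, vth, mobility)
    nominal = gate_delay(nominal_leff, vdd, nominal_vth, mobility)
    return varied / nominal

# Hypothetical variation: +5% Leff and +20 mV Vth relative to nominal.
print(f"relative delay = {relative_gate_delay(1.05, 0.27):.3f}")
```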
Logic Delay
dwire + dgate = 1
$D_{varlogic} = (d_{wire} + [\text{relative gate delay}] \cdot d_{gate}) \cdot D_{logic} + d_{gate} \cdot D_{extra}$
• Dlogic: distribution of path delays with no variation
• Dvarlogic: distribution of path delays with variation
• Relative gate delay: due to systematic variation in P, V, T
• Dextra: delay due to variation in the random and syst. component within a stage
• Obtain Dlogic using a timing analysis tool
Memory Delay
Memory cell → delay dist. of a cell
• Use Kirchhoff's equations
• Long channel trans. equations
• Multi-variable Taylor expansion
(extends the analysis done by Roy et al., IEEE TCAD ‘05)
Memory line → max. distribution
Delayline = max(Delaycell)
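A minimal sketch of the max step: if the cells of a line are treated as independent and identically distributed (a simplifying assumption made here, which ignores the spatially correlated systematic component), the cdf of the line delay is the cell cdf raised to the number of cells. The cell mean, sigma and clock period below are hypothetical.

```python
from statistics import NormalDist

def line_cdf(t: float, cell: NormalDist, bits: int) -> float:
    """cdf of max(Delay_cell) over `bits` independent, identical cells."""
    return cell.cdf(t) ** bits

def line_error_rate(t_clk: float, cell: NormalDist, bits: int) -> float:
    """Probability that the slowest cell of the line misses the clock period."""
    return 1.0 - line_cdf(t_clk, cell, bits)

# Hypothetical per-cell access time: Normal(mean = 0.15 ns, sigma = 0.01 ns).
cell = NormalDist(mu=0.15, sigma=0.01)

for bits in (1, 32, 128):
    print(f"{bits:3d} bits -> P(E) = {line_error_rate(0.18, cell, bits):.4f}")
```

The wider the line, the more likely that at least one cell is slow, so the error rate of the line rises with the number of bits.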
Combined Error Model
We have the delay distributions – cdf(t) – for memory and logic with variation
For each structure
• per access, P(E) = 1 – cdf(t)
• P(E) per inst. = ρ · P(E), ρ = accesses/inst.
• Combined error rate per instruction
P(E)total = Σ ρ · P(E)
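The combination step is a weighted sum: each structure's per-access error probability is scaled by how often an instruction accesses it. The structure names and numbers in this sketch are hypothetical.

```python
def combined_error_rate(structures: dict) -> float:
    """Sum of (accesses per instruction) * (per-access error probability)."""
    return sum(rho * p_err for rho, p_err in structures.values())

# Hypothetical structures: name -> (accesses/inst., P(E) per access).
structures = {
    "issue_queue": (1.0, 1e-6),
    "alu":         (0.5, 4e-6),
    "l1_cache":    (0.3, 1e-5),
}

total = combined_error_rate(structures)
print(f"P(E) per instruction = {total:.2e}")
```

A structure with a high per-access error probability can still contribute little if it is rarely accessed, and vice versa.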
Validation – Logic
S. Das et al. ‘05
Overview
Model for Process Variation
Model for Timing Errors due to
Process Variation
Techniques to
Tolerate Timing Errors
Techniques to
Reduce Timing Errors
Dynamic Optimization
Variation Aware Timing Speculation (VATS)
• Processor core: clocked at an unsafe frequency
• Checker: error free
  - Lower freq
  - Safe design
[Figure: multicore chip in which each processor core is paired with a checker; labels: Diva checker, L0 cache, L1 cache, Razor latches]
Other VATS Checkers
TIMERRTOL – Uht et al.
Razor – Dan Ernst et al., MICRO 2003
X-Checker – X. Vera et al., SELSE 2006
X-Pipe – X. Vera et al., ASGI 2006
Sato and Arita, COSLP 2003
Overview
Model for Process Variation
Model for Timing Errors due to
Process Variation
Techniques to
Tolerate Timing Errors
Submitted to
ISCA ‘07
Techniques to
Reduce Timing Errors
Dynamic Optimization
Basic Mechanisms – Shift and Tilt
[Figure: two plots of error rate (PE) versus frequency f, each showing the curve before and after; one illustrates Shift, the other Tilt]
Architectural Mechanisms
Resizable issue queue (Albonesi et al.)
• switch pass trans. off
• smaller queue
• shifts the error rate curve
[Figure: SRAM/CAM array segments connected by pass transistors, with sense amps; original and new error rate curves]
Gate Sizing
Transistor Width – W
Delay ∝ A + B/W
Power ∝ W
Make faster paths slower to save power
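A small sketch of the sizing tradeoff, taking Delay = A + B/W and Power = C·W literally. The constants A, B, C and the clock period are hypothetical and in arbitrary units; the point is only that a path with slack can be narrowed until it just meets the clock period.

```python
def path_delay(width: float, a: float, b: float) -> float:
    """Delay = A + B / W."""
    return a + b / width

def path_power(width: float, c: float = 1.0) -> float:
    """Power = C * W."""
    return c * width

def min_width_for_deadline(t_clk: float, a: float, b: float) -> float:
    """Smallest W such that A + B / W <= t_clk."""
    if t_clk <= a:
        raise ValueError("no width can meet this clock period")
    return b / (t_clk - a)

# Hypothetical fast path: at W = 1.0 its delay is 0.6 against t_clk = 1.0.
A, B, T_CLK = 0.2, 0.4, 1.0
w_new = min_width_for_deadline(T_CLK, A, B)
print(f"W: 1.00 -> {w_new:.2f}")
print(f"delay: {path_delay(1.0, A, B):.2f} -> {path_delay(w_new, A, B):.2f}")
print(f"power: {path_power(1.0):.2f} -> {path_power(w_new):.2f}")
```

Downsizing removes the slack of the path, so after sizing more paths sit close to the clock period.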
[Figure: path delay distribution, original and after gate sizing]
Optimization: Replicate ALUs
[Figure: difference in error rate between the two ALUs]
• Tradeoff is power vs. errors
• Idea: switch between the two ALUs
  - Use the gate-sized ALU when it is not timing critical, and the original ALU otherwise
Fine Grain ABB and ASV
• Adaptive Body Bias (ABB) – Vbb
  - Vbb ↑: Delay ↓, Leakage ↑
  - Vbb ↓: Delay ↑, Leakage ↓
• Adaptive Supply Voltage (ASV) – Vdd
  - Vdd ↑: Delay ↓, Leakage ↑, Dynamic power ↑
[Figure: error rate (PE) versus frequency f; multicore chip in which each core varies supply voltage (ASV) and body voltage (ABB)]
Overview
Model for Process Variation
Model for Timing Errors due to
Process Variation
Techniques to
Tolerate Timing Errors
Techniques to
Reduce Timing Errors
Dynamic Optimization
Dynamic Behavior
Temperature
Activity Factors
Formulate an Optimization Problem
[Figure: optimization block with inputs, constraints, outputs and goals]
Constraints
• Temperature – At all points T < TMAX
• Power – Total core power < PMAX
• Error – Total errors < ErrMAX
Goal – Maximize performance
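The constrained problem can be sketched as "discard every operating point that violates a limit, then take the fastest survivor". The candidate points and the TMAX and ErrMAX values below are hypothetical; this brute-force filter is only an illustration of the formulation, not the controller described in the talk.

```python
from typing import Iterable, NamedTuple, Optional

class OperatingPoint(NamedTuple):
    label: str
    performance: float      # relative performance, higher is better
    max_temp_c: float       # hottest point on the core
    power_w: float          # total core power
    errors_per_inst: float  # total timing error rate

def best_feasible(points: Iterable[OperatingPoint], t_max: float,
                  p_max: float, err_max: float) -> Optional[OperatingPoint]:
    """Return the highest-performance point that satisfies all three limits."""
    feasible = [p for p in points
                if p.max_temp_c < t_max
                and p.power_w < p_max
                and p.errors_per_inst < err_max]
    return max(feasible, key=lambda p: p.performance, default=None)

# Hypothetical candidates and limits.
candidates = [
    OperatingPoint("slow",   1.00, 70.0, 22.0, 1e-9),
    OperatingPoint("medium", 1.15, 80.0, 28.0, 1e-6),
    OperatingPoint("fast",   1.30, 95.0, 33.0, 1e-3),
]
choice = best_feasible(candidates, t_max=85.0, p_max=30.0, err_max=1e-4)
print(choice.label if choice else "no feasible point")
```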
Outputs
Outputs: 1 + 30 + 1 + 1 = 33
• f (1 output)
• 15 ABB/ASV regions → 30 values of (Vdd, Vbb)
• ALU (1 output)
• Issue queue size (1 output)
• 33 outputs
• f, Vdd, Vbb can take many values
• Very large state space
Smruti R. Sarangi
Dimensionality Reduction
Find the max. frequency that each stage can support
Find the slowest stage
This is the core frequency
Minimize power in the rest of the units
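The four steps above can be sketched as follows. The per-stage maximum frequencies and the (fmax, power) options of a unit are hypothetical numbers; how those values are obtained is outside this sketch.

```python
def core_frequency(stage_fmax: dict) -> tuple:
    """Return (slowest stage, core frequency): the minimum of the stage maxima."""
    slowest = min(stage_fmax, key=stage_fmax.get)
    return slowest, stage_fmax[slowest]

def cheapest_setting(settings: list, f_core: float) -> tuple:
    """Among (fmax, power) options of one unit, pick the lowest-power
    option that can still run at the core frequency."""
    usable = [s for s in settings if s[0] >= f_core]
    return min(usable, key=lambda s: s[1])

# Hypothetical max frequency (GHz) that each of 7 stages can support.
stage_fmax = {1: 5.2, 2: 4.8, 3: 5.6, 4: 4.4, 5: 5.0, 6: 5.3, 7: 4.9}
slowest, f_core = core_frequency(stage_fmax)

# Hypothetical (fmax GHz, power W) options for stage 3, which has slack.
stage3_options = [(5.6, 3.0), (5.0, 2.4), (4.5, 2.0), (4.0, 1.7)]
print(f"slowest stage = {slowest}, core frequency = {f_core} GHz")
print(f"stage 3 setting = {cheapest_setting(stage3_options, f_core)}")
```

Fixing the core frequency first turns one large joint search into a small independent search per unit.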
[Figure: minimum and maximum frequency of each stage (stages 1–7), with the core frequency marked]
Inputs
Inputs: α, TH, Vt0, Rth, Kleak
• α: activity factor (accesses/cycle); changes every phase
• TH: heat sink temperature; changes every heat sink cycle
• Vt0, Rth (thermal resistance), Kleak (constant in leakage eqn.): fixed forever
Optimization Overview
[Figure: for each of the 15 regions, a Freq. Algorithm maps the inputs to f(1) … f(15); fcore = min of these; for each region, a Power Algorithm maps the inputs and fcore to (Vdd, Vbb)]
Fuzzy Logic Based Algorithm
Inputs → Algorithm (Freq/Power): Fuzzy Logic based vs. Exhaustive Search
Fuzzy Logic
+ Very fast computation times
+ Incorporates detailed models
- Slight inaccuracy
Exhaustive Search
- Computationally expensive
- Requires detailed models
+ Accurate Results
Final Picture
[Figure: Fuzzy SubController1 … SubController15 map the inputs to f(1) … f(15); fcore = min of these; a second set of Fuzzy SubController1 … SubController15 maps the inputs and fcore to (Vdd, Vbb) per region]
Timeline
Heat Sink Cycle ≈ 2-3 secs
Phase ≈ 120 ms
[Figure: timeline within a phase. After a new phase is detected: measure IPC and αi, run the fuzzy controller algorithm, bring to the chosen working point (intervals labeled 20 μs, 6 μs, 10 μs). Retuning cycles follow: test configuration for 2 ms, 1 step of 0.5 μs, repeated until STOP]
Results
Evaluation Framework
Processor Modeled
• Athlon 64 floorplan
• 3-wide processor
• 12 stage pipeline
• 45 nm, Vdd = 1 V, 6 GHz
• 4-core, private L2 cache
Sherwood phase detector (ISCA ’03)
10 SpecInt and 10 SpecFp benchmarks, 1 billion insts.
Variation Modeling
• PVT maps for 100 dies
Fuzzy controller
• 10,000 training examples
• 25 rules
Terminology
Baseline      Proc. with variation effects
TS            Baseline + DIVA checker
TS+FU         TS + FU replication
TS+Queue      TS + issue-queue resizing
TS+ABB+ASV    TS + both circuit level techniques
TS+Dyn        TS + dynamic optimization
TS+All        TS+FU+Queue+ABB+ASV+dyn
NoVar         Without any variation effects
Error Plots
[Figure: two error-rate plots, each marking ErrMAX and the maximum perf. point: TS only, and ALL = TS + ABB + ASV]
Execution Point
[Figure: Log (Timing Error Rate) versus Frequency, with Power as a further axis; curves of constant power, constant errors and constant freq.]
Frequency
[Figure: frequency for the Static, Fuzzy and Oracle configurations; annotated values 23% and 49%]
• Frequency increase: 10–49%
• 50% of the gains are due to dynamic opts.
Performance
[Figure: performance results; annotated values 19% and 34%; label: Static]
• We can nullify the effects of variation and even achieve a speedup
• The performance loss due to fuzzy logic is minimal
Conclusion
• Do not design processors for worst case
• Need to tolerate variation-induced errors
• Contributions
  - Model for timing errors
  - New framework for tradeoffs in P, f and P(E)
  - High dimensional dynamic adaptation
  - Eval. of arch. techniques to tolerate/mitigate P(E)
• 10-49% increase in frequency
• 7-34% increase in performance
Conclusion II
CADRE (DSN’06)
• Arch. support to make a board level computer cycle-accurate deterministic
Phoenix (MICRO’06 & Top Picks’07)
• Arch. support to detect and patch processor design bugs
BACKUP
Algorithm
Inputs: f, Vdd, Vbb
[Figure: flow chart. Pdyn and Pleak, together with Rth and TH, give the temperature T; Vt gives the delay, which feeds the error model. Checks: verify T < TMAX, verify Err < ErrMAX. Result: find fmax]
Memory Delay
$T_{mem} \propto 1 / I_{cell}$
[Figure: memory cell with wordline WL, supply VDD, bitlines BL and BR, and nodes X and Y; Icell is the cell current]
• Solve for Icell using long channel eqns.
• Icell = f(VtX, VtY, LX, LY)
• VtX, VtY, LX and LY are Gaussian variables
• μvtx, μvty, μlx, μly are the systematic components
• σvtx, σvty, σlx, σly are the random components
Memory Delay - II
• Find a distribution for Tmem
  - Tmem is a function of four Gaussian variables
  - Model Tmem as a normal distribution
  - Find the μ and σ for Tmem using multi-variable Taylor expansion
• This is the access time dist. for 1 bit
  - A typical entry has 32-128 bits
  - Find the max distribution of 32-128 normal variables
• Error probability = 1 – cdf(tmem)
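Fuzzy Low Level
Inputs $x_j$; rule $i$ has parameters $\mu_{ij}$, $\sigma_{ij}$ and output $y_i$
$W_{ij} = \exp\left[-\left(\dfrac{x_j - \mu_{ij}}{\sigma_{ij}}\right)^2\right]$
$W_i = \prod_j W_{ij}$
Final Output $= \dfrac{\sum_i W_i\, y_i}{\sum_i W_i}$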
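Reading the slide as Wij = exp[-((xj - μij)/σij)²], Wi = Πj Wij and Final Output = Σi Wi·yi / Σi Wi, the inference step can be sketched as below. The two rules and the input vectors are hypothetical; a trained controller would supply its own μ, σ and y values.

```python
import math

def rule_weight(x, mu, sigma) -> float:
    """W_i = product over inputs j of exp(-((x_j - mu_ij) / sigma_ij) ** 2)."""
    return math.prod(math.exp(-((xj - m) / s) ** 2)
                     for xj, m, s in zip(x, mu, sigma))

def fuzzy_output(x, rules) -> float:
    """Weighted average of rule outputs: sum(W_i * y_i) / sum(W_i)."""
    weights = [rule_weight(x, mu, sigma) for mu, sigma, _ in rules]
    total = sum(weights)
    return sum(w * y for w, (_, _, y) in zip(weights, rules)) / total

# Hypothetical rules: (mu vector, sigma vector, output y).
rules = [
    ([0.0, 0.0], [1.0, 1.0], 1.0),
    ([2.0, 2.0], [1.0, 1.0], 3.0),
]

print(fuzzy_output([1.0, 1.0], rules))  # input equidistant from both rules
print(fuzzy_output([0.0, 0.0], rules))  # input at the center of the first rule
```

An input close to a rule's μ vector gives that rule a weight near 1, so the output is pulled toward that rule's y.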
Recovery Penalty
Validation – Memory
Power
[Figure: power results, with the max power limit marked]
• Proc. with no variation – 25 W, PMAX = 30 W