Techniques to Mitigate the Effects of
Congenital Faults in Processors
Smruti R. Sarangi
Process Variation
Corner rounding, edge shortening
(courtesy IBM Microelectronics)
Semiconductor
Fabrication facility
(courtesy tabalcoaching.com)
Photolithography Unit
(Courtesy Upenn)
Basic Lithographic Process
The source of light is typically an argon-fluoride laser
The light passes through an array of lenses to reach the
silicon substrate
The resolution limit is given by:
R = k1λ / NA
NA = n sin θ
To decrease the resolution limit R (resolve smaller features) we need to:
Decrease the wavelength
Increase the refractive index
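As a rough worked example of the resolution formula on this slide, the sketch below evaluates R = k1·λ/NA for a 193 nm argon-fluoride source, once in air and once with water immersion. The values of k1, sin θ and the refractive indices are illustrative assumptions, not numbers from the talk.

```python
def resolution_nm(k1, wavelength_nm, n, sin_theta):
    """Resolution limit R = k1 * lambda / NA, with NA = n * sin(theta)."""
    numerical_aperture = n * sin_theta
    return k1 * wavelength_nm / numerical_aperture

# Assumed illustrative values (not from the slides).
K1 = 0.4            # process-dependent factor
WAVELENGTH = 193.0  # nm, argon-fluoride excimer laser
SIN_THETA = 0.93

dry = resolution_nm(K1, WAVELENGTH, n=1.0, sin_theta=SIN_THETA)         # air
immersion = resolution_nm(K1, WAVELENGTH, n=1.44, sin_theta=SIN_THETA)  # water

print(f"dry: {dry:.1f} nm, immersion: {immersion:.1f} nm")
```

Raising the refractive index raises NA, so the same wavelength resolves a smaller feature.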
Parameter Variation
P: Process (Threshold Voltage – Vt, Transistor Length – Leff)
V: Supply Voltage
T: Temperature
Why is Variation a Problem?
Unpredictability of Vt, Leff and T implies:
Lower chip frequency and higher leakage
courtesy Shekhar Borkar, Intel
Implications on Design Decisions
Static timing analysis not possible
Overly conservative designs
Chips too slow
Performance of a generation lost
Possible solution
Clock the chip at an unsafe frequency
Tolerate resulting timing errors
Reduce timing errors
Architectural techniques
Circuit techniques
Overview
Model for Process Variation
Model for Timing Errors due to Process Variation
Techniques to Tolerate Timing Errors
Techniques to Reduce Timing Errors
Dynamic Optimization
Process Variation
Systematic Variation: lens aberrations, mask deformities, thickness variation in CMP, photo-lithographic effects
Random Variation: variable dopant density, line edge roughness
Modeling Systematic Variation
Break the chip into a million cells (1000 × 1000 grid)
[Figure: Variation Map]
Systematic and Random Variation
Distribution of systematic components: Normal distribution
Normal Distribution + Spatial Correlation ⇒ Multi-variate Normal Distribution
Superimpose random variation on top of systematic
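A minimal sketch of how such a variation map could be generated, on a small grid rather than a million cells: the systematic component is drawn from a multivariate normal distribution with a distance-dependent correlation, and an independent random component is added to every cell. The exponential correlation function, the grid size and the sigma values are assumptions made for illustration; the talk does not specify them.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def variation_map(side, sigma_sys, sigma_rand, corr_range, rng):
    """Return a side x side map of a parameter's relative deviation."""
    cells = [(x, y) for y in range(side) for x in range(side)]
    # Assumed correlation: decays exponentially with the distance between cells.
    cov = [[sigma_sys ** 2 * math.exp(-math.dist(p, q) / corr_range)
            for q in cells] for p in cells]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cells]
    # Correlated systematic component: L * z.
    systematic = [sum(low[i][k] * z[k] for k in range(i + 1))
                  for i in range(len(cells))]
    # Superimpose an independent random component on every cell.
    total = [s + rng.gauss(0.0, sigma_rand) for s in systematic]
    return [total[r * side:(r + 1) * side] for r in range(side)]

rng = random.Random(0)
vmap = variation_map(side=6, sigma_sys=0.05, sigma_rand=0.03, corr_range=4.0, rng=rng)
```

A full-chip map would need a sparser or hierarchical method, since the dense covariance matrix grows with the square of the number of cells.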
Overview
Model for Process Variation
Model for Timing Errors due to Process Variation
Techniques to Tolerate Timing Errors
ISQED '07
Techniques to Reduce Timing Errors
Dynamic Optimization
Timing Errors
P(E) = 1 – cdf(tclk)
[Figure: distribution of path delays in a pipe stage, with no variation and with variation; with variation, paths longer than tclk cause timing errors]
Model for Timing Errors
Basic assumptions
A structure consists of many critical paths
The critical path depends on the input
critical path delay > clock period ⇒ timing error
With no variation, clock period = delay of the longest critical path at maximum temperature
All pipeline stages are tightly designed ⇒ 0 slack
Paths in a Pipeline Stage
[Figure: pdf(t) and cdf(t) of the path delays t in a stage; paths with delay beyond the clock period 1/f cause timing errors]
Error rate: PE(t) = 1 – cdf(t)
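The error-rate relation can be illustrated with a short sketch. It assumes, purely for illustration, that the path delays of a stage follow a normal distribution with the mean and sigma given below, normalized so that the no-variation clock period is 1.0; the talk derives the real distributions from its logic and memory models.

```python
from statistics import NormalDist

def error_rate(path_delays, t_clk):
    """P_E = 1 - cdf(t_clk): probability that an exercised path exceeds the clock period."""
    return 1.0 - path_delays.cdf(t_clk)

# Assumed distribution of path delays with variation (normalized units).
delays = NormalDist(mu=0.90, sigma=0.05)

for f_rel in (1.00, 1.05, 1.10):
    t_clk = 1.0 / f_rel  # a higher frequency means a shorter clock period
    print(f"f = {f_rel:.2f}: P_E = {error_rate(delays, t_clk):.4f}")
```

Raising the frequency moves the clock period into the tail of the distribution, so the error rate grows quickly.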
Basic Kinds of Structures
Logic: heterogeneous critical paths (ALUs, comparators, sense-amps)
Memory: homogeneous critical paths (SRAMs, CAMs)
Mixed: x% memory and (100-x)% logic; used to model renamer, wakeup/select
Logic
Critical Path
35% Wiring: Elmore Delay Model
65% Gates: Alpha Power Law
$T_g \propto \frac{L_{eff}\, V_{DD}}{\mu(T)\,(V_{DD} - V_{th})^{\alpha}}$
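To make the alpha-power law concrete, the sketch below computes the gate delay of a device relative to a nominal one, using the proportionality of gate delay to Leff·VDD / (μ(T)·(VDD − Vth)^α). The nominal operating point, the value of α, and the mobility model μ(T) ∝ T^(−1.5) are assumptions chosen for illustration.

```python
def relative_gate_delay(leff, vdd, vth, temp_k,
                        leff0=1.0, vdd0=1.0, vth0=0.25, temp0_k=358.0,
                        alpha=1.3, mobility_exp=1.5):
    """Gate delay relative to a nominal device under the alpha-power law.

    T_g is proportional to Leff * Vdd / (mu(T) * (Vdd - Vth)^alpha).
    Assumed here: mu(T) is proportional to T^(-mobility_exp).
    """
    def delay(l, v, vt, t):
        mobility = t ** (-mobility_exp)
        return l * v / (mobility * (v - vt) ** alpha)

    return delay(leff, vdd, vth, temp_k) / delay(leff0, vdd0, vth0, temp0_k)

# A device with a higher threshold voltage is slower than nominal,
# one with a shorter effective length is faster.
slow = relative_gate_delay(leff=1.00, vdd=1.0, vth=0.28, temp_k=358.0)
fast = relative_gate_delay(leff=0.95, vdd=1.0, vth=0.25, temp_k=358.0)
print(f"high-Vth device: {slow:.3f}x, short-Leff device: {fast:.3f}x")
```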
Logic Delay
Distribution of path delays with no variation: $D_{logic}$
Obtain $D_{logic}$ using a timing analysis tool
$d_{wire} + d_{gate} = 1$
Distribution of path delays with variation:
$D_{varlogic} = (d_{wire} + \eta \cdot d_{gate}) \cdot D_{logic} + d_{gate} \cdot D_{extra}$
$\eta$: relative gate delay due to systematic variation in P, V, T
$D_{extra}$: delay due to variation in the random and syst. component within a stage
Memory Delay
Memory Cell ⇒ Delay dist.
Use Kirchhoff's equations
Long channel trans. equations
Multi-variable Taylor expansion
(extends the analysis done by Roy et al., IEEE TCAD '05)
Memory Line ⇒ max. distribution
Delay_line = max(Delay_cell)
Combined Error Model
We have the delay distributions – cdf(t) – for memory and logic with variation
For each structure
per access, P(E) = 1 – cdf(t)
P(E) per inst. = ρ × P(E), where ρ = accesses/inst.
Combined error rate per instruction
$P(E)_{total} = \sum_i \rho_i \, P(E)_i$
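The combination step can be sketched as follows: each structure contributes its per-access error probability weighted by its accesses per instruction, and the contributions are summed. The two structures, their access rates and their (normal) delay distributions are hypothetical values used only to show the arithmetic.

```python
from statistics import NormalDist

def combined_error_rate(structures, t_clk):
    """Sum over structures of (accesses per instruction) * (1 - cdf(t_clk))."""
    return sum(rho * (1.0 - dist.cdf(t_clk)) for rho, dist in structures)

# Hypothetical structures: (accesses per instruction, delay distribution with variation).
structures = [
    (0.5, NormalDist(mu=0.90, sigma=0.05)),  # e.g. an ALU (logic)
    (0.3, NormalDist(mu=0.85, sigma=0.04)),  # e.g. a cache (memory)
]

pe_total = combined_error_rate(structures, t_clk=1.0)
print(f"errors per instruction: {pe_total:.5f}")
```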
Validation – Logic
S. Das et al. ‘05
Overview
Model for Process Variation
Model for Timing Errors due to Process Variation
Techniques to Tolerate Timing Errors
Techniques to Reduce Timing Errors
Dynamic Optimization
Variation Aware Timing Speculation (VATS)
Multicore chip: the processor core runs at an unsafe frequency
Checker is error free: lower freq, safe design
[Figure: multicore chip with processor core, Diva checker, L0 cache, Razor latches and L1 cache]
Other VATS Checkers
TIMERRTOL – Uht et al.
Razor – Dan Ernst et al., MICRO 2003
X-Checker – X. Vera et al., SELSE 2006
X-Pipe – X. Vera et al., ASGI 2006
Sato and Arita, COSLP 2003
Overview
Model for Process Variation
Model for Timing Errors due to Process Variation
Techniques to Tolerate Timing Errors
Submitted to ISCA '07
Techniques to Reduce Timing Errors
Dynamic Optimization
Basic Mechanisms – Shift and Tilt
[Figure: two plots of Error Rate (PE) vs. frequency f, each showing the curve before and after; one plot illustrates Tilt, the other Shift]
Architectural Mechanisms
Resizable issue queue (Albonesi et al.)
switch pass trans. off ⇒ smaller queue ⇒ shifts the error rate curve
[Figure: original and new error rate curves; SRAM/CAM array segments separated by pass transistors, with sense amps]
Gate Sizing
Transistor Width – W
Delay ∝ A + B/W
Power ∝ W
Make faster paths slower to save power
[Figure: original path delay dist. vs. path delay dist. after gate sizing]
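A small sketch of the gate sizing tradeoff, using the delay model A + B/W and power proportional to W from this slide. The constants A and B, the target delay and the minimum width are hypothetical; the function `downsize_for_power` is an illustrative helper, not a tool from the talk.

```python
def delay(width, a, b):
    """Path delay model from the slide: A + B / W (arbitrary units)."""
    return a + b / width

def downsize_for_power(a, b, t_target, w_min):
    """Smallest width whose delay still meets t_target; power scales with W."""
    if t_target <= a:
        raise ValueError("target delay is below the width-independent term A")
    return max(w_min, b / (t_target - a))

# Hypothetical fast path: delay 0.7 at the original width, allowed to slow to 0.9.
A, B = 0.4, 0.3
w_orig = 1.0
w_new = downsize_for_power(A, B, t_target=0.9, w_min=0.2)
power_saving = 1.0 - w_new / w_orig

print(f"new width: {w_new:.2f}, delay: {delay(w_new, A, B):.2f}, "
      f"power saving: {power_saving:.0%}")
```

Slowing the fast paths toward the clock period saves power but packs more paths near the critical delay, which is why the error rate rises more steeply once the frequency is pushed.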
Optimization: Replicate ALUs
Difference in Error Rate
Tradeoff is power vs errors
Idea: switch between the two ALUs
Use the gate-sized ALU when the operation is not timing critical, and the original ALU when it is
Fine Grain ABB and ASV
Adaptive Body Bias (ABB) – Vbb
Vbb ↑ (forward bias): Delay ↓, Leakage ↑
Vbb ↓ (reverse bias): Delay ↑, Leakage ↓
Adaptive Supply Voltage (ASV) – Vdd
Vdd ↑: Delay ↓, Leakage ↑, Dynamic power ↑
[Figure: Error Rate (PE) vs. frequency f; multicore chip in which the Supply Voltage (ASV) and Body Voltage (ABB) are varied within each core]
Overview
Model for Process Variation
Model for Timing Errors due to Process Variation
Techniques to Tolerate Timing Errors
Techniques to Reduce Timing Errors
Dynamic Optimization
Dynamic Behavior
Temperature
Activity Factors
Formulate an Optimization Problem
[Diagram: Optimization block that takes Input, Constraints and Goals and produces Output]
Constraints
Temperature – At all points T < TMAX
Power – Total core power < PMAX
Error – Total errors < ErrMAX
Goal – Maximize performance
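The problem statement can be sketched as a filter-then-maximize step over candidate configurations. The `Candidate` fields and all numbers below are hypothetical; in the talk, temperature, power and error rate come from detailed models rather than a fixed table.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    performance: float  # arbitrary units, higher is better
    max_temp: float     # hottest point on the core, K
    power: float        # total core power, W
    errors: float       # timing errors per instruction

def best_feasible(candidates, t_max, p_max, err_max):
    """Highest-performance candidate that satisfies all three constraints, or None."""
    feasible = [c for c in candidates
                if c.max_temp < t_max and c.power < p_max and c.errors < err_max]
    return max(feasible, key=lambda c: c.performance, default=None)

# Hypothetical candidates.
candidates = [
    Candidate("conservative", 1.00, 350.0, 22.0, 0.0),
    Candidate("aggressive", 1.30, 372.0, 33.0, 1e-3),  # violates every constraint
    Candidate("tuned", 1.22, 365.0, 29.0, 1e-5),
]

best = best_feasible(candidates, t_max=368.0, p_max=30.0, err_max=1e-4)
print(best.name if best else "no feasible configuration")
```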
Outputs
Outputs: 1 + 30 + 1 + 1 = 33
f: 1 output
15 ABB/ASV regions: 30 values of (Vdd, Vbb)
ALU: 1 output
Issue queue size: 1 output
f, Vdd, Vbb can take many values ⇒ very large state space
Dimensionality Reduction
Find the max. frequency that each stage can support
Find the slowest stage
This is the core frequency
Minimize power in the rest of the units
[Figure: minimum and max. frequency of each of stages 1 to 7; the core frequency is set by the slowest stage]
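The two steps of this reduction can be sketched as below: first take the minimum of the per-stage maximum frequencies as the core frequency, then, for every other stage, pick the cheapest setting that still reaches that frequency. The per-stage frequencies (in GHz) and the (power, max frequency) settings are hypothetical numbers.

```python
def core_frequency(stage_fmax):
    """The slowest stage sets the core frequency."""
    slowest = min(stage_fmax, key=stage_fmax.get)
    return slowest, stage_fmax[slowest]

def cheapest_setting(settings, f_core):
    """Among (power, fmax) settings of one stage, lowest power that still reaches f_core."""
    usable = [s for s in settings if s[1] >= f_core]
    return min(usable, key=lambda s: s[0])

# Hypothetical max. frequency (GHz) that each of 7 stages can support.
stage_fmax = {1: 5.4, 2: 4.9, 3: 5.8, 4: 4.6, 5: 5.1, 6: 5.0, 7: 5.6}
slowest_stage, f_core = core_frequency(stage_fmax)

# Hypothetical (power in W, max frequency in GHz) settings for stage 3.
stage3_settings = [(2.0, 4.4), (2.6, 4.7), (3.1, 5.2), (3.9, 5.8)]
chosen = cheapest_setting(stage3_settings, f_core)

print(f"core frequency {f_core} GHz (stage {slowest_stage}); stage 3 uses {chosen}")
```

Fixing the frequency first turns one large joint search into a set of small, independent per-region power minimizations.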
Inputs
Inputs: α, TH, Vt0, Rth, Kleak
α: activity factor (accesses/cycle); updated every phase
TH: heat sink temperature; updated every heat sink cycle
Vt0, Rth (thermal resistance), Kleak (constant in leakage eqn.): fixed forever
Optimization Overview
[Diagram: for each of the 15 regions, a Freq. Algorithm maps the inputs to f(1) … f(15); fcore = min of these; for each region, a Power Algorithm takes the inputs and fcore and produces (Vdd, Vbb)]
Fuzzy Logic Based Algorithm
Two ways to implement the (Freq/Power) algorithm on the inputs:
Fuzzy logic based algorithm
+ Very fast computation times
+ Incorporates detailed models
- Slight inaccuracy
Exhaustive search
- Computationally expensive
- Requires detailed models
+ Accurate Results
Final Picture
[Diagram: Fuzzy SubController1 … SubController15 map the inputs to f(1) … f(15); fcore = min of these; a second set of Fuzzy SubController1 … SubController15 takes the inputs and fcore and produces (Vdd, Vbb) for each region]
Timeline
Heat Sink Cycle: 2-3 secs
Phase: 120 ms
New phase detected ⇒ measure IPC and αi, run Fuzzy Controller Algorithm, bring to chosen working point
Retuning cycles (1 step each): test configuration, until STOP
[Timeline figure; interval labels: 20 µs, 6 µs, 10 µs, 2 ms, 0.5 µs, 2 ms]
Results
Evaluation Framework
Processor Modeled
Athlon 64 floorplan
3-wide processor
12 stage pipeline
45 nm, Vdd = 1 V, 6 GHz
Sherwood phase detector (ISCA ’03)
10 SpecInt and 10 SpecFp benchmarks, 1 billion insts.
4-core, private L2 cache
Variation Modeling
PVT maps for 100 dies
Fuzzy controller
10,000 training examples
25 rules
Terminology
Baseline: Proc. with variation effects
TS: Baseline + DIVA checker
TS+FU: TS + FU replication
TS+Queue: TS + issue-queue resizing
TS+ABB+ASV: Both circuit level techniques
TS+Dyn: TS + dynamic optimization
TS+All: TS+FU+Queue+ABB+ASV+dyn
NoVar: Without any variation effects
Error Plots
[Figure: two error-rate plots, "TS only" and "ALL = TS + ABB + ASV", each marking ErrMAX and the maximum perf. point]
Execution Point
[Figure: Log (Timing Error Rate) vs. Frequency, with curves of constant power, constant frequency and constant errors]
Frequency
[Figure: frequency increase for Static, Fuzzy and Oracle adaptation; labeled values 23% and 49%]
Frequency increase: 10–49%
50% of the gains are due to dynamic opts.
Performance
[Figure: performance increase, including the Static configuration; labeled values 19% and 34%]
We can nullify the effects of variation and even speed up
The performance loss due to fuzzy logic is minimal
Conclusion
Do not design processors for worst case
Need to tolerate variation induced errors
Contributions
Model for timing errors
New framework for tradeoffs in P, f and P(E)
High dimensional dynamic adaptation
Eval. of arch. techniques to tolerate/mitigate P(E)
10-49% increase in frequency
7-34% increase in performance
Conclusion II
CADRE (DSN’06)
Arch. support to make a board level computer cycle-accurate deterministic
Phoenix (MICRO’06 & Top Picks’07)
Arch. support to detect and patch processor design bugs
BACKUP
Algorithm
[Flow diagram: from the inputs (f, Vdd, Vbb; activity factor, Rth, TH; Pleak0, Vt0) compute Pdyn, Pleak, T and Vt; verify T < TMAX; compute Delay and use the Error Model to verify Err < ErrMAX; find fmax]
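Because leakage power depends on temperature and temperature depends on total power, the temperature check in such an algorithm typically needs a small fixed-point iteration. The sketch below shows one way to do it. The thermal relation T = TH + Rth × (Pdyn + Pleak) and the exponential leakage model are assumptions made for illustration, as are all numeric values; the talk does not give its equations on this slide.

```python
import math

def steady_state(p_dyn, t_heatsink, r_th, p_leak0, t_ref, beta,
                 tol=1e-9, max_iter=100):
    """Iterate temperature and leakage until they agree.

    Assumed models (not from the slides):
      T      = t_heatsink + r_th * (p_dyn + p_leak)
      p_leak = p_leak0 * exp(beta * (T - t_ref))
    """
    temp = t_heatsink
    for _ in range(max_iter):
        p_leak = p_leak0 * math.exp(beta * (temp - t_ref))
        new_temp = t_heatsink + r_th * (p_dyn + p_leak)
        if abs(new_temp - temp) < tol:
            return new_temp, p_leak
        temp = new_temp
    raise RuntimeError("temperature did not converge (possible thermal runaway)")

# Hypothetical operating point.
T_MAX = 368.0  # K
temp, p_leak = steady_state(p_dyn=15.0, t_heatsink=320.0, r_th=0.5,
                            p_leak0=5.0, t_ref=330.0, beta=0.01)
print(f"T = {temp:.2f} K, Pleak = {p_leak:.2f} W, within limit: {temp < T_MAX}")
```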
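Memory Delay
[Figure: memory cell with wordline WL, bitlines BL and BR, supply VDD, transistors X and Y, and cell current Icell]
$T_{mem} \propto 1 / I_{cell}$
Solve for Icell using long channel eqns.
Icell = f(VtX, VtY, LX, LY)
VtX, VtY, LX and LY are gaussian variables
The means of VtX, VtY, LX, LY are the systematic components
The standard deviations of VtX, VtY, LX, LY are the random components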
Memory Delay - II
Find a distribution for Tmem
Tmem is a function of four gaussian variables
Model Tmem as a normal distribution
Find the μ and σ for Tmem using a multi-variable Taylor expansion
This is the access time dist. for 1 bit
A typical entry has 32-128 bits
Find the max distribution of 32-128 normal variables
Error probability = 1 – cdf(tmem)
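The last two steps can be sketched as follows. The sketch assumes that the per-bit access times are independent and identically distributed, in which case the cdf of the maximum is the per-bit cdf raised to the number of bits. That independence assumption and the per-bit distribution below are illustrative simplifications, not the talk's exact procedure.

```python
from statistics import NormalDist

def line_error_probability(cell, bits, t_clk):
    """P(E) = 1 - cdf_line(t_clk), where the line delay is the max over its cells.

    Assumes independent, identically distributed per-bit access times,
    so cdf_line(t) = cdf_cell(t) ** bits.
    """
    return 1.0 - cell.cdf(t_clk) ** bits

# Assumed access time distribution for 1 bit (normalized units).
cell = NormalDist(mu=0.80, sigma=0.05)

for bits in (1, 32, 64, 128):
    pe = line_error_probability(cell, bits, t_clk=1.0)
    print(f"{bits:3d} bits: P(E) = {pe:.2e}")
```

A wider entry is slower in the statistical sense: the more cells an access waits for, the more likely one of them falls in the slow tail.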
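Fuzzy Low Level
[Diagram: inputs $x_j$ feed rules $i$, each with parameters $\mu_{ij}$, $\sigma_{ij}$ and output $y_i$]
$W_{ij} = \exp\left[-\left(\frac{x_j - \mu_{ij}}{\sigma_{ij}}\right)^2\right]$
$W_i = \prod_j W_{ij}$
Final Output: $y = \frac{\sum_i W_i \, y_i}{\sum_i W_i}$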
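A minimal sketch of this kind of fuzzy rule evaluation: every rule has a Gaussian membership function per input, the rule weight is the product of its memberships, and the final output is the weight-averaged rule output. The two rules and the test inputs are hypothetical; a trained controller would hold the learned μ, σ and y values of its rules.

```python
import math

def fuzzy_output(x, rules):
    """Weighted average of rule outputs.

    rules: list of (mu, sigma, y), where mu and sigma have one entry per input.
    W_ij = exp(-((x_j - mu_ij) / sigma_ij)^2), W_i = product over j of W_ij.
    """
    num = den = 0.0
    for mu, sigma, y in rules:
        w = 1.0
        for xj, m, s in zip(x, mu, sigma):
            w *= math.exp(-((xj - m) / s) ** 2)
        num += w * y
        den += w
    return num / den

# Two hypothetical rules over two inputs.
rules = [
    ([0.0, 0.0], [1.0, 1.0], 1.0),
    ([2.0, 2.0], [1.0, 1.0], 3.0),
]

midpoint = fuzzy_output([1.0, 1.0], rules)   # equally close to both rules
near_first = fuzzy_output([0.0, 0.0], rules)  # dominated by the first rule
print(midpoint, near_first)
```

Evaluating a few dozen rules like this takes only a handful of multiplications and exponentials per input, which is what makes the fuzzy controller fast compared with an exhaustive search.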
Recovery Penalty
Validation – Memory
Power
Max Power Limit
Proc. with no variation – 25 W, PMAX = 30 W