Transcript Slide

1
VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects
Sarangi et al.
Prateeksha Satyamoorthy
CS 8501
2
Parameter Variation
• Deviation of process, voltage and temperature values from specifications
• Technology scaling beyond 90 nm => higher levels of device parameter variation => design problem shifts from deterministic to probabilistic
• Key process parameters: threshold voltage (Vth) and effective channel length (Leff)
▫ Determine transistor and gate speeds
• Vth variation impacts:
▫ Frequency and leakage power
• Variation => some sections of the chip are slower than others => circuits in those sections suffer timing errors
• Variation erodes the benefits of scaling to a new technology generation
3
Cross-section of a MOSFET
• The clock cycle of a chip is determined by the delay of its longest path, usually referred to as the critical path
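To make the critical-path definition concrete, the Python sketch below models a small combinational circuit as a directed acyclic graph of gates and finds its longest-delay path by processing gates in topological order. The gate names and delays are hypothetical; the point is only that the clock period must be at least the returned delay.

```python
from graphlib import TopologicalSorter


def critical_path(delays, fanin):
    """Return (delay, path) for the longest path through a gate DAG.

    delays: gate -> propagation delay (arbitrary time units)
    fanin:  gate -> iterable of gates that drive it
    """
    arrival = {}  # latest signal arrival time at each gate's output
    prev = {}     # predecessor on the longest path into each gate
    graph = {g: list(fanin.get(g, ())) for g in delays}
    for g in TopologicalSorter(graph).static_order():
        preds = graph.get(g, [])
        if preds:
            best = max(preds, key=lambda p: arrival[p])
            arrival[g] = arrival[best] + delays[g]
            prev[g] = best
        else:
            arrival[g] = delays[g]  # primary input gate
            prev[g] = None
    # The critical path ends at the gate with the latest arrival time.
    node = max(arrival, key=arrival.get)
    total = arrival[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]


delays = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 1.0}  # hypothetical gate delays
fanin = {"c": ["a", "b"], "d": ["c", "a"]}        # c is driven by a and b, etc.
print(critical_path(delays, fanin))  # (4.5, ['b', 'c', 'd'])
```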
4
Impact of Vth and Leff
Source: www.ocw.mit.edu
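As a rough numeric illustration of how Vth and Leff set gate speed, the sketch below uses the alpha-power law, T_g ∝ Leff·Vdd / (μ·(Vdd − Vth)^α), with α = 1.3 as in the VATS model. The nominal values (Vdd = 1.0 V, Vth = 0.25 V, Leff normalized to 1) are assumptions chosen for illustration, and the carrier mobility μ is held fixed so that it cancels in the ratio.

```python
def relative_gate_delay(vth, leff, vdd=1.0, vth0=0.25, leff0=1.0, alpha=1.3):
    """Gate delay relative to a nominal gate under the alpha-power law.

    T_g is proportional to Leff * Vdd / (mu * (Vdd - Vth)**alpha). Mobility mu
    is assumed constant, so it cancels when dividing by the nominal delay.
    Nominal values (vth0, leff0, vdd) are illustrative assumptions.
    """
    if vth >= vdd:
        raise ValueError("Vth must be below Vdd for the gate to switch")
    nominal = leff0 * vdd / (vdd - vth0) ** alpha
    varied = leff * vdd / (vdd - vth) ** alpha
    return varied / nominal


# A 10% higher Vth and a 5% longer channel both slow the gate down.
print(relative_gate_delay(vth=0.275, leff=1.0))
print(relative_gate_delay(vth=0.25, leff=1.05))
```

A higher Vth shrinks the gate overdrive (Vdd − Vth), while a longer Leff scales delay linearly, which is why both parameters dominate the timing picture.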
5
Impact of process variation on processor frequency
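A Monte Carlo sketch of why variation lowers processor frequency: the chip must be clocked for its slowest critical path, so as the number of critical paths grows, the expected maximum delay rises and the achievable frequency falls. Path delays here are independent Gaussians around a nominal delay of 1.0, a simplifying assumption (VARIUS makes the systematic component spatially correlated), and σ = 0.05 is an arbitrary illustrative value.

```python
import random
import statistics


def chip_fmax(n_paths, sigma, trials=500, seed=1):
    """Monte Carlo estimate of mean normalized chip frequency.

    Each trial is one chip with n_paths critical paths of nominal delay 1.0.
    Variation adds independent Gaussian noise with std dev sigma (illustrative
    assumption). The slowest path sets the clock, so f = 1 / max(delay).
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(trials):
        slowest = max(rng.gauss(1.0, sigma) for _ in range(n_paths))
        freqs.append(1.0 / slowest)
    return statistics.mean(freqs)


for n in (1, 10, 100, 1000):
    print(n, round(chip_fmax(n, sigma=0.05), 3))
```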
6
Varius
• To study how parameter variation affects timing errors in high-performance processors
▫ A novel model for process variation
- Within-die (WID) parameter variation
▫ A novel model for timing errors
7
Process Variation Model
• Systematic variation
▫ Exhibits spatial correlation
▫ Assumptions:
- position independence, isotropy
▫ Spherical model
- initially linear, then tapers off to zero
- range (φ): no correlation at or beyond this distance
8
They finally assume φ = 0.5
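The spherical model is commonly written as ρ(r) = 1 − 3r/(2φ) + r³/(2φ³) for r ≤ φ, and ρ(r) = 0 for r > φ, where r is the distance between two points and φ is the range. The Python sketch below evaluates it with φ = 0.5; it assumes r is expressed in the same normalized units as φ (for example, as a fraction of the chip's width).

```python
def spherical_correlation(r, phi=0.5):
    """Correlation between two points a distance r apart (spherical model).

    Falls roughly linearly for small r, flattens out, and reaches exactly
    zero at the range phi; points farther apart than phi are uncorrelated.
    """
    if r < 0:
        raise ValueError("distance must be non-negative")
    if r >= phi:
        return 0.0
    x = r / phi
    return 1.0 - 1.5 * x + 0.5 * x ** 3


for r in (0.0, 0.1, 0.25, 0.4, 0.5, 0.8):
    print(f"r = {r:.2f}  rho = {spherical_correlation(r):.3f}")
```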
9
• Random variation
▫ Occurs at the level of individual transistors
▫ Assumption: the random components of Vth and Leff are normally distributed with zero mean and uncorrelated
• Final σ and
▫ Total WID variation is normally distributed, so $\sigma_{total} = \left(\sigma_{sys}^2 + \sigma_{ran}^2\right)^{1/2}$
Assumptions
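Because the systematic and random components are independent and both normal, their variances add, so σ_total = sqrt(σ_sys² + σ_ran²). A quick check in Python, comparing the analytic value with a Monte Carlo sum of the two components; the σ values are illustrative, not the ones chosen in the paper.

```python
import math
import random
import statistics


def total_sigma(sigma_sys, sigma_ran):
    """Std dev of the sum of two independent zero-mean normal components."""
    return math.sqrt(sigma_sys ** 2 + sigma_ran ** 2)


# Monte Carlo check: add independent systematic and random samples.
rng = random.Random(0)
sigma_sys, sigma_ran = 0.03, 0.04  # illustrative values only
samples = [rng.gauss(0.0, sigma_sys) + rng.gauss(0.0, sigma_ran)
           for _ in range(100_000)]
print(total_sigma(sigma_sys, sigma_ran))  # analytic
print(statistics.pstdev(samples))         # empirical, close to the analytic value
```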
10
VATS - Model for variation-induced timing errors in processor pipelines
Pdf – probability density function
All paths that have become longer than 1 (the clock period, in normalized units) generate errors
PE – probability of error
Cdf – cumulative distribution function
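In the figure on this slide, the part of the path-delay pdf that lies beyond 1 is the error probability, so PE = 1 − cdf(1). The sketch below checks this two ways: by counting sampled paths longer than the clock period and by evaluating the normal CDF directly. The normal path-delay distribution and its parameters (mean 0.9, σ = 0.05, in units of the clock period) are illustrative assumptions.

```python
import random
from statistics import NormalDist


def estimate_pe(path_delays, t_clk=1.0):
    """Fraction of exercised paths whose delay exceeds the clock period."""
    return sum(d > t_clk for d in path_delays) / len(path_delays)


rng = random.Random(42)
mu, sigma = 0.9, 0.05  # illustrative path-delay distribution (normalized)
delays = [rng.gauss(mu, sigma) for _ in range(200_000)]
print(estimate_pe(delays))                   # Monte Carlo estimate
print(1.0 - NormalDist(mu, sigma).cdf(1.0))  # analytic: 1 - cdf(1)
```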
11
Timing errors in logic
$D_{varlogic}$ distribution
$\mathrm{cdf}_{varlogic}$
Error rate
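One way to turn D_varlogic into an error rate vs. frequency curve (the quantity computed per pipeline stage during validation) is to sweep the clock period. A minimal sketch, assuming D_varlogic is normal with made-up parameters, delays normalized to the nominal clock period, and frequency expressed relative to nominal so that t_clk = 1/f:

```python
from statistics import NormalDist


def logic_error_rate_curve(mu, sigma, freqs):
    """Per-access error probability of a logic stage at each relative frequency.

    Hypothetical setup: D_varlogic ~ Normal(mu, sigma) with delays normalized
    to the nominal clock period, and t_clk = 1 / f for relative frequency f.
    """
    d_varlogic = NormalDist(mu, sigma)
    return [(f, 1.0 - d_varlogic.cdf(1.0 / f)) for f in freqs]


for f, pe in logic_error_rate_curve(mu=0.85, sigma=0.05,
                                    freqs=[1.0, 1.05, 1.1, 1.15, 1.2]):
    print(f"f = {f:.2f} x nominal  P_E = {pe:.2e}")
```

The curve stays near zero until the clock period approaches the upper tail of D_varlogic, then rises steeply, which is the characteristic shape of error rate vs. frequency.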
12
Timing errors in SRAM Memory
More errors as paths fail
First path fails
Distribution
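Errors start once the first path fails because the relevant quantity is the maximum access time over a whole SRAM line rather than a single cell. A minimal sketch, assuming each cell's access time is an independent normal variable (this ignores the spatially correlated systematic component), so a line of n cells meets the clock with probability F(t_clk)^n. All parameter values are illustrative.

```python
from statistics import NormalDist


def line_error_probability(t_clk, mu, sigma, cells_per_line):
    """Probability that the slowest cell in an SRAM line misses t_clk.

    Assumes independent, normally distributed cell access times, so the
    line's maximum access time has CDF F(t)**n and the line fails with
    probability 1 - F(t)**n.
    """
    f = NormalDist(mu, sigma).cdf(t_clk)
    return 1.0 - f ** cells_per_line


# Wider lines fail more often: there are more chances for one slow cell.
for n in (1, 64, 512):
    print(n, line_error_probability(1.0, mu=0.8, sigma=0.05, cells_per_line=n))
```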
13
Validation
180 nm process
14
Validation
1. Generate Vth and Leff variation map
2. Apply the timing error model to get the error rate vs. frequency for each pipeline stage
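Step 1 can be sketched as follows, assuming a square chip normalized to unit side length and sampled on an n × n grid. The systematic component is a Gaussian field whose correlation follows the spherical model with φ = 0.5, drawn by multiplying a Cholesky factor of the correlation matrix by independent normals; the random component is added as independent per-point noise. This is a generic way to sample a correlated Gaussian field, not necessarily the tool used in the paper, and the function names and σ values are hypothetical.

```python
import numpy as np


def spherical_correlation(r, phi):
    """Spherical correlation model, vectorized over an array of distances."""
    x = np.minimum(r / phi, 1.0)  # distances beyond phi give zero correlation
    return 1.0 - 1.5 * x + 0.5 * x ** 3


def variation_map(n, sigma_sys, sigma_ran, phi=0.5, seed=0):
    """Sample an n x n grid of relative Vth deviations (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    coords = np.linspace(0.0, 1.0, n)  # chip normalized to [0, 1] x [0, 1]
    xs, ys = np.meshgrid(coords, coords)
    pts = np.column_stack([xs.ravel(), ys.ravel()])
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    corr = spherical_correlation(dist, phi)
    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(corr + 1e-9 * np.eye(len(pts)))
    systematic = sigma_sys * (chol @ rng.standard_normal(len(pts)))
    random_part = sigma_ran * rng.standard_normal(len(pts))
    return (systematic + random_part).reshape(n, n)


vth_map = variation_map(n=20, sigma_sys=0.05, sigma_ran=0.05)
print(vth_map.shape, vth_map.std())
```

The dense Cholesky factorization scales cubically with the number of grid points, so this approach suits coarse maps; finer grids would call for a more specialized field generator.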
15
How Varius is used
• Variation-Aware Dynamic Voltage/Frequency Scaling - Herbert et al.: Vth and Leff values are generated and used to determine the maximum frequency and subthreshold leakage of each core across Vdd and temperature. Variability-aware schemes maintain a significant power/throughput improvement over variability-unaware ones, of up to 9.9%
• Maestro: Orchestrating Lifetime Reliability in Chip Multiprocessors - Feng et al.: showed that, for CMPs without damage profiles, temperature sensors and performance counters are inadequate in environments with significant process variation, so they propose low-level damage sensors
• EVAL: Utilizing Processors with Variation-Induced Timing Errors - Sarangi et al.: design for values closer to nominal, and provide some transistor budget to tolerate the unavoidable variation-induced errors
• Facelift: Hiding and Slowing Down Aging in Multicores - Tiwari et al.: determine how variation impacts the delay of each gate on each critical path. The slowest of the critical paths in a processor determines the processor frequency.
16
Related Work, Contribution
• Delay of an inverter from Vth and Leff
• Mukhopadhyay et al. proposed models for timing errors in SRAM memory due to random Vth variation. The VATS model extends their model of access-time errors by
▫ including systematic variation effects,
▫ considering variation in Leff,
▫ modeling the maximum access time of a line of SRAM rather than a single cell, and
▫ using the alpha-power model with α = 1.3
• Memik et al. modeled errors in SRAM memory due to cross-talk noise as they overclock circuits. They use high degrees of overclocking: twice the nominal frequency and more. In the regime we consider (less than 25% overclocking), such cross-talk errors are negligible. For technologies with very small feature sizes, however, the situation may change.
• Ernst et al. and Karl et al. measured the error rates of a multiplier and an SRAM circuit, respectively, by reducing the voltage below safe limits to save power. They plot curves of error rate versus voltage. In this paper, we outline a procedure to extract the distribution of path delays from these curves, and validate parts of our model by comparing it against their curves.
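The extraction procedure itself is not spelled out on the slide. One plausible sketch, assuming the measured voltage axis has already been mapped to an equivalent normalized path delay t (for example, through the alpha-power gate-delay model), treats the error rate as the fraction of paths slower than t, i.e. 1 − cdf(t), and recovers the path-delay pdf by numerical differentiation. The curve below is generated from a normal distribution purely to exercise the function; it is not measured data.

```python
from statistics import NormalDist


def pdf_from_error_curve(t, error_rate):
    """Estimate a path-delay pdf from an error-rate curve.

    Assumes error_rate[i] is the fraction of paths slower than t[i]
    (error_rate = 1 - cdf), with t sorted in ascending order. Since
    pdf(t) = -d(error_rate)/dt, central differences give the estimate at
    every interior sample point.
    """
    if len(t) != len(error_rate) or len(t) < 3:
        raise ValueError("need at least three matching samples")
    return [
        (t[i], -(error_rate[i + 1] - error_rate[i - 1]) / (t[i + 1] - t[i - 1]))
        for i in range(1, len(t) - 1)
    ]


# Synthetic error-rate curve (illustrative only, not measured data).
true_delays = NormalDist(mu=1.0, sigma=0.1)
ts = [0.6 + 0.01 * i for i in range(81)]
curve = [1.0 - true_delays.cdf(x) for x in ts]
for x, p in pdf_from_error_curve(ts, curve)[::10]:
    print(f"t = {x:.2f}  estimated pdf = {p:.3f}  true pdf = {true_delays.pdf(x):.3f}")
```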