Process Variation Mitigation in Pipelines
Microprocessor Design in the Face
of Process Variations
Csaba Andras Moritz
Electrical & Computer Engineering
University of Massachusetts, Amherst
Csaba Andras Moritz © 2007
Outline
Introduction
Impact of Process Variations
A Process Variation Resilient Pipeline
A Process Variation Resilient Adaptive Cache Architecture
Results
Conclusion
Csaba Andras Moritz - Software Systems & Architecture Lab, Electrical & Computer Engineering; © 2007-2016
Introduction
As technology scales, feature sizes shrink, which requires an increasingly sophisticated fabrication process.
Process variations increase as feature sizes shrink because it is difficult to fabricate small structures consistently across a die or a wafer.
These variations cause mismatches between identical structures.
With respect to circuits, this translates into a deviation of device and interconnect parameters from their mean values.
[Figure: Device and interconnect variation trends for different technology generations. Trends since 2007 are even worse.]
Source: Sani Nassif et al., “Models of Process Variations in Device and Interconnect”, IEEE Press, 2000.
Introduction
Two main sources of process variation:
Physical factors (intrinsic variation)
Environmental factors (dynamic variation)
The physical factors are permanent and result from limitations in the fabrication process:
Effective Channel Length (Geometric Variations): imperfections in photolithography (mask, lens, photo system deviations)
Threshold Voltage (Electrical Parameter Variation): variation in device geometry, random dopant fluctuations, changes in oxide thickness
The environmental factors depend on the operation of the circuit and include variations in temperature, power supply, and switching activity.
The performance and power consumption of integrated circuits can be greatly affected.
Pipeline design
10-20 gate delays typically, fewer in very high clock-speed designs
Let us review variation with a NAND chain
15 NAND gates and NAND2
[Figure: pipeline stage modeled as a chain of 15 NAND gates driving a load Cload, together with a NAND2 schematic. Inputs: A = “1”, B = “0”→“1”, C = “1”→“0”; the other gate inputs are held at “1”. Body bias terminals: VBP and VBN.]
Assumptions
The designs we show target a 32-nm technology process, where leakage and process variation started to become significant.
The problem is exacerbated in later nodes.
In the nominal case we assume that process variations have no impact on the pipeline stage.
In the worst case we assume that each transistor takes the parameter values that result in the maximum delay or power consumption.
A body bias is a voltage applied between the source or drain of a transistor and its substrate, effectively changing the transistor’s Vth. Depending on the polarity of the voltage applied, Vth increases or decreases. If it increases, the transistor becomes less leaky and slower (reverse body bias); if it decreases, the transistor becomes leakier and faster (forward body bias).
Table 1 shows the parameter values for the different cases. Figure 3 and Table 2 show the delay of the pipeline stage at different body bias voltages. Figure 4 and Table 3 show the average power consumption of the pipeline stage at different body bias voltages.
Device parameter variations
Leff (effective channel length), Vdd (supply voltage), and Vth (threshold voltage: Vthn, Vthp)
Table 1. Parameter values for different cases
Case         Leff        Vdd      Vthn     Vthp
Nominal      25.32 nm    0.90V    0.20V    -0.21V
Best-case    20.26 nm    0.96V    0.18V    -0.19V
Worst-case   30.38 nm    0.84V    0.22V    -0.23V
Delay of Pipeline Stage
Table 2. Delay of the pipeline stage.
Body bias           Case         VBN      VBP     Delay
Nominal Body Bias   Nominal      0V       0.9V    1.363 ns
Nominal Body Bias   Best-case    0V       0.9V    0.646 ns
Nominal Body Bias   Worst-case   0V       0.9V    3.811 ns
Forward Body Bias   Nominal      0.5V     0.4V    1.271 ns
Forward Body Bias   Best-case    0.5V     0.4V    0.631 ns
Forward Body Bias   Worst-case   0.5V     0.4V    3.389 ns
Reverse Body Bias   Nominal      -0.5V    1.4V    1.608 ns
Reverse Body Bias   Best-case    -0.5V    1.4V    0.696 ns
Reverse Body Bias   Worst-case   -0.5V    1.4V    4.731 ns
“A 28nm SOI test chip by STMicroelectronics ARM Cortex-A57 used forward body bias to increase the peak frequency to 3GHz. By operating at a reduced nominal 0.5V supply and FBB, the circuitry ran at a much slower 300MHz but with reduced power consumption.”
Source: ST Microelectronics
Delay of Pipeline Stage
[Figure 3: delay of the pipeline stage (ns, 0 to 5) versus body bias voltage (forward, nominal, reverse) for the nominal, best, and worst cases.]
Power of Pipeline Stage
Table 3. Average power of the pipeline stage.
Body bias           Case         VBN      VBP     Average Power
Nominal Body Bias   Best-case    0V       0.9V    7.843 μW
Nominal Body Bias   Nominal      0V       0.9V    22.45 μW
Nominal Body Bias   Worst-case   0V       0.9V    219.4 μW
Forward Body Bias   Best-case    0.5V     0.4V    13.00 μW
Forward Body Bias   Nominal      0.5V     0.4V    30.32 μW
Forward Body Bias   Worst-case   0.5V     0.4V    294.5 μW
Reverse Body Bias   Best-case    -0.5V    1.4V    7.772 μW
Reverse Body Bias   Nominal      -0.5V    1.4V    19.68 μW
Reverse Body Bias   Worst-case   -0.5V    1.4V    178.7 μW
Average Power with BB
[Figure 4: average power of the pipeline stage (μW, 0 to 350) versus body bias voltage (forward, nominal, reverse) for the nominal, best, and worst cases.]
Effect of BB on delay and power
Table 4. Effect of Body Bias Technique.
Case                VBN      VBP      Delay (ns)   Average Power (μW)
Forward Body Bias   0.85V    -0.6V    1.087        677.0
Forward Body Bias   0.65V    -0.1V    1.271        410.9
Forward Body Bias   0.50V    0.4V     1.275        30.32
Nominal             0V       0.9V     1.363        22.45
Reverse Body Bias   -0.5V    1.4V     1.608        19.68
Reverse Body Bias   -1.0V    1.9V     1.941        17.59
Reverse Body Bias   -1.5V    2.3V     2.346        16.94
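One way to read Table 4 is to multiply each delay by the corresponding average power, which gives a rough energy figure per stage evaluation (1 μW × 1 ns = 1 fJ). The Python sketch below does this for the rows of the table. The metric and the names TABLE4, energy_fj, and summarize are illustrative assumptions and are not part of the original analysis.

```python
# Rows of Table 4: (VBN in V, VBP in V, delay in ns, average power in uW).
TABLE4 = [
    (0.85, -0.6, 1.087, 677.0),
    (0.65, -0.1, 1.271, 410.9),
    (0.50, 0.4, 1.275, 30.32),
    (0.0, 0.9, 1.363, 22.45),
    (-0.5, 1.4, 1.608, 19.68),
    (-1.0, 1.9, 1.941, 17.59),
    (-1.5, 2.3, 2.346, 16.94),
]


def energy_fj(delay_ns, power_uw):
    """Power-delay product; 1 uW * 1 ns equals 1 fJ."""
    return delay_ns * power_uw


def summarize(rows=TABLE4):
    """Return (VBN, VBP, delay, power, energy) tuples for each row."""
    return [(vbn, vbp, d, p, energy_fj(d, p)) for vbn, vbp, d, p in rows]


if __name__ == "__main__":
    for vbn, vbp, d, p, e in summarize():
        print(f"VBN={vbn:+.2f} V  VBP={vbp:+.2f} V  "
              f"delay={d:.3f} ns  power={p:.2f} uW  energy={e:.1f} fJ")
```

On the table's numbers, the strongest forward bias cuts delay by about 20% relative to the nominal bias while drawing roughly 30 times the power.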
Delay Distribution
Nominal Body Bias, VBN=0V, VBP=0.9V
[Figure: probability density function (0 to 0.18) of pipeline delay (ns), with the delay axis running from 1.14 to 1.7.]
All parameters summary
Table 5. Effect of all parameters on pipeline delay
Maximum (ns)   Minimum (ns)   Mean (ns)   Sigma
1.703          1.214          1.389       0.056
Power Distribution
Nominal Body Bias, VBN=0V, VBP=0.9V
[Figure: probability density function (0 to 0.16) of pipeline average power (uW), with the nominal value marked.]
Summary power consumption
Table 6. Effect of all parameters on pipeline power consumption.
Maximum (uW)   Minimum (uW)   Mean (uW)   Sigma
29.65          19.51          24.05       1.168
Razor Latches
Latch concept: sample the output of a stage at two different times
Compare the two samples
If they are not equal, resample the inter-stage latch and delay the pipeline by one cycle
Implications?
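The double-sampling idea can be sketched in a few lines of Python. This is a behavioral toy, not a circuit model: the class RazorLatch, the helper make_stage, and the shadow-latch delay are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RazorLatch:
    """Main latch samples at the clock edge; shadow latch samples later."""
    main: int = 0
    shadow: int = 0

    def capture(self, stage_output, clock_period, shadow_delay):
        """Sample stage_output(t) at two times.

        Returns True when the two samples differ (a timing error).
        """
        self.main = stage_output(clock_period)
        self.shadow = stage_output(clock_period + shadow_delay)
        return self.main != self.shadow

    def recover(self):
        """Resample: overwrite the main value with the shadow value."""
        self.main = self.shadow
        return self.main


def make_stage(old_value, new_value, settle_time):
    """Stage output shows old_value until settle_time, then new_value."""
    return lambda t: new_value if t >= settle_time else old_value


if __name__ == "__main__":
    latch = RazorLatch()
    slow_stage = make_stage(old_value=0, new_value=1, settle_time=1.5)
    # Clock period of 1.363 ns (nominal delay in Table 2); the 0.3 ns
    # shadow delay is an assumed value.
    if latch.capture(slow_stage, clock_period=1.363, shadow_delay=0.3):
        print("timing error, recovered value:", latch.recover())
```

The sketch only catches stages that settle before the shadow latch samples; a stage slower than that would produce two matching but wrong samples, which is one of the implications the slide asks about.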
Recovery Technique 1: Global Clock Gating
If any stage detects a timing problem:
Stall the entire pipeline for one clock cycle.
Use this additional clock cycle to recompute using the correct shadow-latch values.
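The cost of global clock gating can be estimated with a simple cycle count: every cycle in which some stage flags a mismatch adds one stall cycle for the whole pipeline. The Python sketch below assumes an ideal in-order pipeline with no other hazards; the function name and its parameters are hypothetical.

```python
def run_with_global_gating(num_instructions, num_stages, error_cycles):
    """Count cycles for an in-order pipeline with global clock gating.

    error_cycles: cycle numbers (counted as if there were no stalls) in
    which at least one stage flags a mismatch. Each such cycle costs one
    extra cycle, because every stage is stalled while the flagged stage
    recomputes from its shadow-latch value.
    """
    base = num_instructions + num_stages - 1  # ideal pipelined latency
    stalls = [c for c in set(error_cycles) if 0 <= c < base]
    return base + len(stalls)


if __name__ == "__main__":
    ideal = run_with_global_gating(100, 5, [])
    with_errors = run_with_global_gating(100, 5, [3, 10, 42])
    print(ideal, with_errors)
```

Because the stall is global, the penalty is one cycle per erroneous cycle no matter how many stages failed in that cycle.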
Recovery Technique 2: Counterflow Pipelining
When a mismatch (between regular and shadow latch contents) is detected:
Assert a bubble signal, to specify that the erring pipeline slot is now to be considered a bubble.
In the subsequent cycle, inject the shadow latch value into the next stage, allowing the errant operation to continue with the correct values.
Trigger a flush train, traveling backwards from the errant stage, flushing operations at each stage it visits.
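The three recovery actions can be laid out as a timeline. The Python sketch below lists which stage does what in each cycle after a mismatch. It assumes that the flush train moves back one stage per cycle and first reaches the stage before the errant one a cycle after detection; that timing, the stage numbering, and the function name are illustrative assumptions.

```python
def counterflow_events(error_stage, num_stages):
    """List (cycle, stage, action) events after a mismatch at error_stage.

    Stages are numbered 0..num_stages-1 and cycle 0 is the detection cycle.
    Assumes the flush train moves back one stage per cycle.
    """
    if not 0 <= error_stage < num_stages:
        raise ValueError("error_stage out of range")
    events = [(0, error_stage, "assert bubble for errant slot")]
    if error_stage + 1 < num_stages:
        # Subsequent cycle: the next stage receives the shadow-latch value.
        events.append((1, error_stage + 1, "inject shadow-latch value"))
    # The flush train visits error_stage-1, error_stage-2, ..., 0.
    for hop, stage in enumerate(range(error_stage - 1, -1, -1), start=1):
        events.append((hop, stage, "flush"))
    return sorted(events)


if __name__ == "__main__":
    for cycle, stage, action in counterflow_events(error_stage=2, num_stages=5):
        print(f"cycle {cycle}: stage {stage}: {action}")
```

Unlike global clock gating, nothing here stalls the whole pipeline at once; the cost is the work flushed from the earlier stages.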
Process Variation Impact on Memory Systems
Process variations are expected to become significant in the smaller-geometry transistors commonly used in memories.
Process variations in caches affect the performance of circuits such as:
Sense amplifiers, which require identical device characteristics
SRAM cells, which require near-minimum-sized cell stability for large arrays in embedded, low-power applications
The delay of the address decoders suffers from process variations, which can leave less time for accessing the SRAM cells.
The question is whether there is a significant overall delay variation that will drive a change in memory architecture design.
Motivation
To account for the worst-case scenario we might need to increase the cache access time by 2 to 3 cycles in a conventional design.
[Figure: IPC (0 to 3) of SPEC2000 benchmarks (bzip, mcf, gcc, vpr, ammp00, art, equake) with 1-cycle, 2-cycle, and 3-cycle cache access.]
Application performance could be impacted by as much as 30-40%!
These results suggest that process variations must be taken into consideration.
New types of circuits and architectures?
Introduction
There are several ideas that could be exploited in a memory system:
Reduce performance by operating at a lower clock frequency (conservative approach)
Increase cache access latency assuming worst-case delay (conservative approach)
Use a variable-delay cache architecture (adaptive approach)
Cache Organization Overview
The focus of this presentation is on CAM-based caches.
Virtual address fields: Tag = bits 31-9, Bank = bits 8-5, Word = bits 4-2, Byte = bits 1-0.
[Figure: cache with 16 banks. Each cache bank has CAM tags whose matchlines select one of 32 SRAM data lines of 8 words each; a MUX selects the data output.]
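The address layout of the cache organization slide (tag in bits 31-9, bank in bits 8-5, word in bits 4-2, byte in bits 1-0, as read from the figure labels) can be expressed as a small Python helper. The function name split_address is hypothetical.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, bank, word, byte) fields."""
    addr &= 0xFFFFFFFF
    byte = addr & 0x3          # bits 1-0: byte within a 4-byte word
    word = (addr >> 2) & 0x7   # bits 4-2: one of 8 words per line
    bank = (addr >> 5) & 0xF   # bits 8-5: one of 16 banks
    tag = addr >> 9            # bits 31-9: compared against the CAM tags
    return tag, bank, word, byte


if __name__ == "__main__":
    tag, bank, word, byte = split_address(0x1234)
    print(f"tag={tag} bank={bank} word={word} byte={byte}")
```

With 4 bank bits and no separate set index, the remaining 23 bits all go to the tag, which is why each bank searches its lines associatively through the CAM.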
Critical Path of CAM-tag Cache
[Figure: critical path of the CAM-tag cache, from the bank index through the global decoder, the tag array (input tag bits on the search bitlines XORed with the stored tag bits, matchline, row sense amplifier with reference), the local wordline (LWL), the data array (SRAM cell, bitlines BL and BLB), the column MUX selected by the decoded word bits, and the column sense amplifier to the data bus.]
Experiment Setup
Cadence tools were used to design the circuits at layout level, and HSPICE simulation was used to evaluate the performance.
All the circuits were designed using 32-nm CMOS technology and simulated with a supply voltage of 0.9V.
Configuration of our 16 KB Low Power Cache
Cache Component      Power Techniques
Bank Decoder         4-input static NOR gates
Tag Array            10-transistor CAM cell
Data Array           6T SRAM cell
Cache line           Wordline gating
Line decoder         Two-level decoding: 1st level 3-input DNAND gate, 2nd level 2-input NOR gate
Tag & Data Arrays    Cache subbanking (16 banks)
Bank size            1KB
Sense Amplifiers     Alpha latch and shared sense amps
Worst-Case Conditions
Effective Channel Length variation:
Imperfections in photolithography (mask, lens, photo system deviations)
A 40% variation in Leff is expected within a die [Sani Nassif, IEEE Press 2000].
[Figure: delay (ns, 0 to 1.6) versus Leff (20.25 nm to 30.38 nm) for the global decoder, CAM cells, wordline gating, column sense amps, tristate I/O, row sense amps, SRAM cells, and total cache delay.]
Worst-Case Conditions
Effective Channel Length variation:
A small variation in the Leff value changes the leakage power by as much as 60X from the nominal value.
[Figure: total power (mW, 0 to 3), split into leakage power and dynamic power, versus Leff (20.25 nm to 30.38 nm).]
Worst-Case Conditions
Threshold Voltage Variation:
Accurate control of Vth is very important for many performance and power optimizations and for correct execution.
[Figure: delay (ns, 0 to 1.8) versus Vth (0.17 V to 0.23 V) for the global decoder, CAM cells, wordline gating, column sense amps, tristate I/O, row sense amps, SRAM cells, and total cache delay.]
Worst-Case Conditions
Threshold Voltage Variation:
The impact on leakage power could be as much as 40X.
[Figure: total power (mW, 0 to 2.5), split into leakage power and dynamic power, versus Vth (0.17 V to 0.23 V).]
Worst-Case Conditions
Power Supply Variation:
One of the most important environmental factors that cause variations in operating conditions is the supply voltage.
Voltage variations are due to non-uniform power-supply distribution, switching activity, and IR drop.
A total variation of 15% in Vdd was considered, with a nominal value of 0.9V.
Vdd (V)   Delay (ns)   Power (W)
0.83      0.746        0.183
0.86      0.717        0.187
0.90      0.667        0.191
0.93      0.634        0.213
0.97      0.601        0.266
Expected Conditions
To accurately predict the cache critical path delay distribution at the circuit level, cache delay variability can be studied through Monte-Carlo HSPICE circuit simulations.
Monte-Carlo simulations verify model predictions over a wide range of process and design conditions and provide an estimate of the expected behavior.
We assume parameter variations to be normally distributed with mean and sigma values derived from PTM and ITRS sources.
Parameter values and σ variations
Technology   Device   Leff                Vth
32nm         NMOS     25.32 nm (± 20%)    0.2V (± 7.5%)
32nm         PMOS     25.32 nm (± 20%)    -0.2V (± 7.5%)
Vdd: 0.9V (± 7.5%)
Temperature: 75 °C
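The sampling step of such a Monte-Carlo study can be sketched in Python. The sketch draws normally distributed Leff, Vth, and Vdd values around the 32nm NMOS means listed above and pushes them through an alpha-power-law style delay proxy. The proxy is a stand-in for circuit simulation and is not the HSPICE flow used for the results in this deck. Reading the ± percentages as 3-sigma bounds, the exponent alpha = 1.3, and all function names are assumptions made for illustration.

```python
import random

NOMINAL = {"leff_nm": 25.32, "vth": 0.20, "vdd": 0.90}
SPREAD = {"leff_nm": 0.20, "vth": 0.075, "vdd": 0.075}  # fractional variation


def sample_params(rng, sigmas_in_spread=3.0):
    """Draw one normally distributed parameter set.

    Assumption: the quoted +/- percentage is read as a 3-sigma bound.
    """
    return {
        name: rng.gauss(mean, abs(mean) * SPREAD[name] / sigmas_in_spread)
        for name, mean in NOMINAL.items()
    }


def delay_proxy(p, alpha=1.3):
    """Alpha-power-law style delay proxy, normalized to 1.0 at nominal.

    A stand-in for circuit simulation, not a calibrated model.
    """
    def raw(q):
        return q["leff_nm"] * q["vdd"] / (q["vdd"] - q["vth"]) ** alpha
    return raw(p) / raw(NOMINAL)


def monte_carlo(n=10000, seed=1):
    """Return (mean, sigma) of the normalized delay over n samples."""
    rng = random.Random(seed)
    delays = [delay_proxy(sample_params(rng)) for _ in range(n)]
    mean = sum(delays) / n
    sigma = (sum((d - mean) ** 2 for d in delays) / (n - 1)) ** 0.5
    return mean, sigma


if __name__ == "__main__":
    mean, sigma = monte_carlo()
    print(f"normalized delay: mean={mean:.3f}, sigma={sigma:.3f}")
```

Fixing the seed keeps runs repeatable; replacing delay_proxy with a call into a real simulator would leave the sampling loop unchanged.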
Expected Conditions
The distribution of delay of a cache critical path was determined by performing Monte-Carlo sampling at different supply voltages, threshold voltages, and transistor lengths.
[Figure: probability density (0 to 0.4) of cache delay (ns) for variation in Leff, Vth, and Vdd individually and combined, with the nominal delay marked.]
Under the expected conditions a large fraction of accesses would still be close to the nominal value.
Architectural Techniques
How do we design a memory system in the face of process variations and help mitigate the negative impact on performance?
We can select a cache design using worst-case assumptions: ALL VARIATIONS and ALL COMPONENTS on the critical path.
Alternatively, we can design circuits and architectures that work adaptively depending on the actual delay:
Process variation resilient design
Resilience against delays in different parts of the cache
Proposed Adaptive Cache Architecture
Two phases of operation: classification and execution
[Figure: pipeline stages F, D, EX, MEM, WB. The MEM stage sends the address to the cache (CAM tag and data array) and receives the data. An adaptive controller uses the delay storage, which is filled by the classifier in test mode.]
Classification Phase
During the classification phase:
The cache is equipped with a built-in self-test (BIST) circuit to detect speed differences due to process variation.
Each cache line is tested using BIST when the test mode signal is on.
Each block is classified as medium, slow, or failure.
[Figure: data array with row address, column MUX, and sense amplifiers producing data out. In test mode, the BIST unit takes the operating conditions and writes speed information into the delay storage.]
Execution Phase
During the execution phase:
The speed information held in the delay storage is used to control the sense amplifiers during regular operation of the circuit.
[Figure: data array with row address and delay storage; the controller uses the stored speed information and the column address to drive the column MUX and sense amplifiers, which produce data out.]
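The two phases can be sketched as a table that is built once and then consulted on every access. In the Python sketch below, each line's measured delay is converted into a cycle count during classification, and that count is looked up during execution. The 1 ns cycle time, the three-cycle limit before a line is treated as a failure, and all function names are assumptions for illustration; the actual BIST and controller are hardware.

```python
import math


def classify_line(measured_delay_ns, cycle_time_ns, max_cycles=3):
    """Return the number of cycles a line needs, or None for a failure."""
    cycles = max(1, math.ceil(measured_delay_ns / cycle_time_ns))
    return cycles if cycles <= max_cycles else None


def build_delay_storage(line_delays_ns, cycle_time_ns):
    """Classification phase: one entry per cache line (test mode on)."""
    return [classify_line(d, cycle_time_ns) for d in line_delays_ns]


def access_latency(delay_storage, line_index):
    """Execution phase: look up how many cycles to wait before sensing."""
    cycles = delay_storage[line_index]
    if cycles is None:
        raise ValueError(f"line {line_index} failed classification")
    return cycles


if __name__ == "__main__":
    storage = build_delay_storage([0.95, 1.4, 2.6, 3.5], cycle_time_ns=1.0)
    print(storage)
    print(access_latency(storage, 1))
```

Keeping one small entry per line is what makes the delay storage cheap compared with the cache itself.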
Experimental Setup
The adaptive cache architecture is implemented in SimpleScalar.
We have conducted simulations of SPEC2000 benchmarks using the adaptive approach.
The adaptive cache is configured based on the delay distribution determined by the Monte-Carlo simulation.
SimpleScalar parameters for CPU
Parameter                       Value
Instruction Window              RUU=16; LSQ=8
Fetch, dispatch, commit width   4
Integer ALU/multi-div           4/1
FP ALU/multi-div                4/1
Number of Banks                 16 banks
L1 D-cache Size                 16KB, 32-way set-assoc, 32B blocks
L1 I-cache Size                 16KB, 32-way set-assoc, 32B blocks
L2 Unified Cache Size           128KB, 64-way, 64B blocks, 8-cycle
Memory Latency                  100 cycles
Memory ports                    2
TLB Size                        128-entry, fully assoc., 30 cycles miss penalty
Branch Predictor                Comb. of bimodal and 2-level gshare; bimodal size 2048; level 1 1024-entry, history 10; level 2 4096-entry (global)
Branch Target Buffer            512-entry, 4-way associative
Return-address-stack            8-entry
Performance Speedup
Baseline: 3-cycle D-cache with worst-case delay, 16KB total size, 16 banks, each 32-way. Out-of-order, 4-way issue.
Adaptive caching scheme: 75% of cache line accesses take 1 cycle, 24% take 2 cycles, and 1% take 3 cycles.
The results show that performance improves by 9% to 31%!
[Figure: IPC (0 to 3) of SPEC2000 benchmarks (equake, mcf, vpr, crafty, bzip, parser, gcc, ammp, mesa) for the conservative and adaptive caches.]
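The latency mix of the adaptive scheme translates into a mean access latency by a weighted average. The Python sketch below works it out, assuming accesses are spread evenly over cache lines, which is a simplification; the function name is hypothetical.

```python
def mean_access_cycles(fractions):
    """fractions maps access latency in cycles to the share of accesses."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(cycles * share for cycles, share in fractions.items())


# Shares from the adaptive caching scheme and the 3-cycle baseline.
adaptive = mean_access_cycles({1: 0.75, 2: 0.24, 3: 0.01})
conservative = mean_access_cycles({3: 1.0})

if __name__ == "__main__":
    print(f"adaptive: {adaptive:.2f} cycles, conservative: {conservative:.2f} cycles")
```

Under that assumption the mean D-cache latency falls from 3 cycles to about 1.26 cycles. The IPC gain is smaller than this ratio because cache latency is only one contributor to execution time.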
Sensitivity to Issue Width
Speedup values are normalized with respect to the worst-case delay of 3 cycles.
As we can see, the 8-way issue design benefits more from the adaptive cache architecture than the 4-way issue design.
[Figure: speedup (0 to 1.6) of SPEC2000 benchmarks (equake, mcf, vpr, crafty, bzip, parser, gcc, ammp, mesa) for four-issue and eight-issue designs.]
Hardware Required
Hardware required:
BIST circuit
Delay storage
Control circuitry
We have evaluated the hardware needed for the adaptive cache using the Synopsys Design Compiler tool.
Circuit                                      Delay      Power       Area
BIST, delay storage, and control circuitry   0 ns       0.55 mW     0.0048 mm^2
Cache                                        0.95 ns    27.67 mW    0.54 mm^2
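The relative cost of the added circuitry follows directly from the two rows of the hardware table. The short Python sketch below computes it; the dictionary and function names are hypothetical.

```python
# Values from the hardware table: added circuitry versus the cache itself.
ADDED = {"power_mw": 0.55, "area_mm2": 0.0048}
CACHE = {"power_mw": 27.67, "area_mm2": 0.54}


def overhead_percent(added, base):
    """Return the added cost as a percentage of the base, per metric."""
    return {key: 100.0 * added[key] / base[key] for key in added}


if __name__ == "__main__":
    for key, pct in overhead_percent(ADDED, CACHE).items():
        print(f"{key}: {pct:.2f}% of the cache")
```

On these figures the BIST, delay storage, and control circuitry add about 2% to the cache power and under 1% to its area.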
Power Issues
Leakage Power Variation
[Figure: probability density function (0 to 0.35) of cache leakage (W, 0 to 0.11) for 10%, 20%, 30%, and 40% variation in Leff.]
Leakage (contd.)
[Figure: probability density function (0 to 0.45) of cache leakage (W) for 5%, 10%, and 15% variation in Vth.]
Leakage (contd.)
[Figure: probability density function (0 to 0.6) of cache leakage (W) for 5%, 10%, and 15% variation in Vdd.]
Leakage Enhanced Cells
In the inactive state, when the cell is not being written to or read from, most of the leakage power is dissipated by the transistors that are off and have a voltage differential across their drain and source. If the cell is storing a “0”, transistors T1, N1 and P2 dissipate leakage power. A simple technique for reducing leakage power would be to replace all transistors with high-Vth ones, but this would lengthen the bitline discharge time and degrade cell read performance significantly. In our design we instead applied the same high Vth to all the NMOS transistors (an asymmetric cell design). By changing the Vth we change the performance and power tradeoff.
[Figure: SRAM cell schematic with bitlines BL and BLB, wordline WL, transistors P1, P2, T1, T2, N1, and N2, and internal nodes VL = ‘0’ and VR = ‘1’.]
Tradeoffs between performance and power: what is visible at the application level?
Distribution of cache delay and leakage power for different high-Vth schemes.
Results obtained by Monte Carlo simulations with adaptive cache for various scenarios.
Scheme         Vth (V)   Delay (ns)   Mean Leakage (W)   1 cycle   2 cycles   3 cycles
Conventional   0.23      2.34         0.190              0%        0%         100%
A1             0.20      0.952        0.467              75%       24%        1%
A2             0.25      0.972        0.182              68%       30%        2%
A3             0.27      1.091        0.116              56%       40%        4%
A4             0.30      1.122        0.076              45%       50%        5%
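To see the tradeoff in the table at a glance, each scheme can be reduced to two numbers: the expected access latency in cycles and the mean leakage relative to the conventional design. The Python sketch below does that with the table's values. Weighting the cycle shares uniformly and the names SCHEMES, expected_cycles, and compare are illustrative assumptions.

```python
# Scheme: (high Vth in V, mean leakage in W, shares of 1/2/3-cycle accesses)
SCHEMES = {
    "Conventional": (0.23, 0.190, (0.00, 0.00, 1.00)),
    "A1": (0.20, 0.467, (0.75, 0.24, 0.01)),
    "A2": (0.25, 0.182, (0.68, 0.30, 0.02)),
    "A3": (0.27, 0.116, (0.56, 0.40, 0.04)),
    "A4": (0.30, 0.076, (0.45, 0.50, 0.05)),
}


def expected_cycles(shares):
    """Weighted mean latency for shares of 1-, 2-, and 3-cycle accesses."""
    return sum(n * share for n, share in enumerate(shares, start=1))


def compare(schemes=SCHEMES, baseline="Conventional"):
    """Map scheme name to (expected cycles, leakage relative to baseline)."""
    base_leak = schemes[baseline][1]
    return {
        name: (expected_cycles(shares), leak / base_leak)
        for name, (_vth, leak, shares) in schemes.items()
    }


if __name__ == "__main__":
    for name, (cycles, rel_leak) in compare().items():
        print(f"{name:12s} cycles={cycles:.2f}  leakage={rel_leak:.2f}x")
```

Read this way, raising the high Vth from 0.20 V (A1) to 0.30 V (A4) moves the expected latency from about 1.26 to about 1.60 cycles while cutting mean leakage from roughly 2.5 times the conventional value to 0.4 times.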