Roy1005 - Purdue College of Engineering

Self-Repairing and Self-Calibration: A
Design/Test Strategy for Nano-scale
CMOS
Kaushik Roy
S. Mukhopadhyay, H. Mahmoodi, A. Raychowdhury,
Chris Kim, S. Ghosh, K. Kang
School of Electrical and Computer Engineering,
Purdue University, West Lafayette, IN
Process Variation in Nano-scale Transistors
[Figure: Delay and leakage spread at 130nm: normalized frequency (0.9 to 1.4) vs. normalized leakage Isb (1 to 5), showing a 30% frequency spread and a 5X leakage spread. Source: Intel]
[Figure: Mean number of dopant atoms (10 to 10000) vs. technology node (1000nm down to 32nm). Source: Intel]
Inter and Intra-die Variations

Random dopant fluctuation
Device parameters are no longer deterministic
Process Variations: Failures, Test, Self-Calibration, and Self-Repair
• Memories (Caches and Register Files)
– Failure analysis
– Updated March Test for process induced failures
– Process/Defect tolerant caches
– Self-Repairing SRAM’s
– Leakage and delay sensors for self-repair
• Logic
– Failure analysis in pipelines and robust pipeline design
– Delay sensor to measure critical path delays and integrated
test generation for robust segment delay coverage
• Updated scan chain logic
– FLH and FLS (first level hold and scan flip-flops) for low-power (dynamic and leakage) and efficient delay testing
• Temperature control to prevent thermal runaway during burn-in
SRAM Memory Cell Failure
Process variation:
Device mismatch → Cell failure
Primary source:
Random dopant fluctuation
→ Vth mismatch
[Figure: 6T SRAM cell with bitlines BL and BR, pull-up transistors PL and PR, access transistors AXL and AXR, and pull-down transistors NL and NR (threshold voltages Vth1 to Vth6), storing VL = 1 and VR = 0; waveforms of VL and VR annotated "Flipping", "No writing" and "TACCESS > TMAX"]
Results in the following types of failures:
AF: Access time failure
RF: Read failure
WF: Write failure
HF: Hold failure
Mechanisms of Parametric Failures
[Figure: four voltage vs. time panels showing WL, BL, BR, VL and VR: Read Failure, Access Failure (bitline differential ΔMIN), Write Failure, and Hold Failure (supply at VDDH)]
Read Failure (RF)
[Figure: 6T SRAM cell during read (WL = VDD, VL = '1', VR = '0') with subthreshold leakage (Isub), gate leakage (Igd) and junction leakage (Ijn) paths, and the distribution of VREAD]
\[ P_{RF} = P\left[ V_{READ} > V_{TRIPRD} \right] \]
Solve for VREAD:
\[ I_{dsatAXR} = I_{dlinNR} \]
Solve for VTRIPRD:
\[ I_{dsatNL}(V_{TRIPRD}, V_{TRIPRD}, gnd) = I_{dsatPL}(V_{TRIPRD}, V_{TRIPRD}, V_{DD}) \]
\[ P_{RF} = P\left[ Z \equiv (V_{READ} - V_{TRIPRD}) > 0 \right] = 1 - F_Z(0) \]
where \( \mu_Z = \mu_{V_{READ}} - \mu_{V_{TRIPRD}} \) and \( \sigma_Z^2 = \sigma_{V_{READ}}^2 + \sigma_{V_{TRIPRD}}^2 \)
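A minimal sketch of the last step of the read-failure estimate, assuming VREAD and VTRIPRD are independent normal variables whose means and standard deviations are already known. The function names and the numeric inputs are hypothetical, chosen only for illustration.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, evaluated with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def read_failure_probability(mu_vread, sigma_vread, mu_vtrip, sigma_vtrip):
    """P_RF = P[Z > 0] = 1 - F_Z(0) with Z = VREAD - VTRIPRD.

    Assumes VREAD and VTRIPRD are independent and normal, so Z is normal
    with mean mu_vread - mu_vtrip and variance sigma_vread^2 + sigma_vtrip^2.
    """
    mu_z = mu_vread - mu_vtrip
    sigma_z = math.sqrt(sigma_vread ** 2 + sigma_vtrip ** 2)
    return 1.0 - normal_cdf(0.0, mu_z, sigma_z)


if __name__ == "__main__":
    # Hypothetical values in volts, not taken from the slides.
    print(read_failure_probability(0.2, 0.03, 0.4, 0.04))
```

The same 1 - F(threshold) pattern applies to the write, hold and access failure probabilities once the corresponding distribution has been estimated.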
Write Failure (WF)
[Figure: 6T SRAM cell during write (WL = VDD for duration TWL, VL = '1', VR = '0') with subthreshold leakage (Isub), gate leakage (Igd) and junction leakage (Ijn) paths]
\[ P_{WF} = P\left[ T_{WRITE} > T_{WL} \right] \]
\[ T_{WRITE} = \begin{cases} \int_{V_{DD}}^{V_{TRIP}} \dfrac{C_L(V_L)\, dV_L}{I_{in(L)}(V_L) - I_{out(L)}(V_L)} ; & \text{if } (V_{WR} < V_{TRIPWR}) \\ \infty ; & \text{if } (V_{WR} \ge V_{TRIPWR}) \end{cases} \]
\( I_{in(L)} \) = current into L = \( I_{dsPL} \), \( I_{out(L)} \) = current out of L = \( I_{dsAXL} \)
\[ P_{WF} = \int_{t_{WR} = T_{WL}}^{\infty} f_{WR}(t_{WR})\, d(t_{WR}) = 1 - F_{WR}(T_{WL}) \]
Hold Failure (HF)
[Figure: variation and distribution of VDDHmin relative to VHOLD]
\[ P_{HF} = P\left[ V_{DDHmin} > V_{HOLD} \right] \]
Solve for VDDHmin:
\[ V_L(V_{DDHmin}, Vt_{PL}, Vt_{NL}) = V_{TRIP}(V_{DDHmin}, Vt_{PR}, Vt_{NR}) \]
\[ P_{HF} = \int_{V_{HOLD}}^{\infty} f_{V_{DDHmin}}(v_{DDHmin})\, d(v_{DDHmin}) = 1 - F_{V_{DDHmin}}(V_{HOLD}) \]
Access Time Failure (AF)
[Figure: variation and distribution of TACCESS with variation in δVt, relative to TMAX]
\[ P_{AF} = P(T_{ACCESS} > T_{MAX}) \]
\[ T_{ACCESS} = \frac{C_{BR} C_{BL} \Delta_{MIN}}{C_{BL} I_{BR} - C_{BR} I_{BL}} = \frac{C_B \Delta_{MIN}}{I_{dsatAXR} - \sum_{i=1,..,N} I_{subAXL(i)}} \]
\[ P_{AF} = \int_{t_{ACCESS} = T_{MAX}}^{\infty} f_{T_{ACCESS}}(t_{ACCESS})\, d(t_{ACCESS}) = 1 - F_{T_{ACCESS}}(T_{MAX}) \]
Basic Modeling Approach
• Estimation of the mean and the variance of a function of several independent normal random variables
\[ T_{ACCESS} = f(V_{t1}, V_{t2}, ..., V_{t6}) = f(\mu_{Vt1}, \mu_{Vt2}, ..., \mu_{Vt6}) + \sum_{i=1,..,6} \left( \partial f / \partial V_{ti} \right) (V_{ti} - \mu_{Vti}) + .... \]
– Expand 'f' in Taylor series with respect to Vt1,…,Vt6 around their means and consider up to 2nd order terms.
– Estimate the mean and the variance as:
\[ Mean(T_{ACCESS}) = f(\mu_{Vt1}, ..., \mu_{Vt6}) + \frac{1}{2} \sum_{i=1,...,6} \left. \frac{\partial^2 f}{\partial V_{ti}^2} \right|_{\mu_{Vti}} \sigma_{Vti}^2 \]
\[ Variance(T_{ACCESS}) = \sum_{i=1,...,6} \left( \left. \frac{\partial f}{\partial V_{ti}} \right|_{\mu_{Vti}} \right)^2 \sigma_{Vti}^2 \]
• Estimation of the probability distribution function of Y
– Assume a normal pdf with the estimated mean and variance.
• The Vt of each transistor is assumed to be an independent normal random variable, with \( \sigma_{Vt} \propto 1 / \sqrt{LW} \)
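The Taylor-series estimate above can be sketched numerically. This is an illustration under stated assumptions, not the authors' implementation: derivatives are taken by central finite differences, `f` is any user-supplied function of the threshold voltages, and the name `taylor_moments` and the step size `h` are hypothetical.

```python
def taylor_moments(f, means, sigmas, h=1e-3):
    """Estimate the mean and variance of f(X) for independent normal X.

    Mean uses the second-order term: f(mu) + 1/2 * sum(d2f/dxi2 * sigma_i^2).
    Variance uses the first-order term: sum((df/dxi)^2 * sigma_i^2).
    Derivatives are approximated by central differences around the means.
    """
    base = f(means)
    mean = base
    variance = 0.0
    for i, (m, s) in enumerate(zip(means, sigmas)):
        up = list(means)
        down = list(means)
        up[i] = m + h
        down[i] = m - h
        f_up, f_down = f(up), f(down)
        first = (f_up - f_down) / (2.0 * h)
        second = (f_up - 2.0 * base + f_down) / (h * h)
        mean += 0.5 * second * s * s
        variance += first * first * s * s
    return mean, variance


if __name__ == "__main__":
    # Toy function standing in for TACCESS = f(Vt1, ..., Vt6); hypothetical.
    toy = lambda v: v[0] ** 2 + 2.0 * v[1]
    print(taylor_moments(toy, [3.0, 1.0], [0.5, 0.1]))
```

The returned pair can then be plugged into a normal pdf, as the slide assumes, to evaluate a failure probability.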

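Estimation of Overall SRAM Cell Failure Probability (PF)
\[ P_F = P[Fail] = P[A_F \cup R_F \cup W_F \cup H_F] \]
[Figure: Venn diagram of the universe U split into PASS (probability 1-PF) and Fail (probability PF), where Fail is the union of AF, RF, WF and HF]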
Estimation of Memory Failure Probability (PMEM)
• PCOL: Probability that any of the cells in a column fails
• PMEM: Probability that more than NRC (# of redundant columns) columns fail
\[ P_{COL} = 1 - (1 - P_F)^{N} \quad \text{and} \quad P_{MEM} = \sum_{i = N_{RC}+1}^{N_{COL}} \binom{N_{COL}}{i} P_{COL}^{\,i} (1 - P_{COL})^{N_{COL} - i} \]
where N is the number of cells in a column.
[Figure: PF → PCOL → PMEM, with redundant columns]
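A short sketch of the cell-to-column-to-memory chain, assuming cell failures are independent. The function names and the array dimensions used in the example call are hypothetical.

```python
from math import comb


def column_failure_probability(p_cell, cells_per_column):
    """P_COL: probability that at least one cell in a column fails."""
    return 1.0 - (1.0 - p_cell) ** cells_per_column


def memory_failure_probability(p_col, n_col, n_redundant):
    """P_MEM: probability that more than n_redundant of n_col columns fail.

    Binomial tail, summed from n_redundant + 1 to n_col.
    """
    return sum(
        comb(n_col, i) * p_col ** i * (1.0 - p_col) ** (n_col - i)
        for i in range(n_redundant + 1, n_col + 1)
    )


if __name__ == "__main__":
    # Hypothetical array: 256 cells per column, 512 columns, 8 spare columns.
    p_col = column_failure_probability(1e-5, 256)
    print(p_col, memory_failure_probability(p_col, 512, 8))
```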
Estimation of Yield
Repeat NINTER times:
1. Select inter-die parameters (L, W, Vt)
2. Find PF
3. Find PCOL
4. Find PMEM
Then find Yield:
\[ Yield = 1 - \frac{\sum_{INTER} P_{MEM}(L_{INTER}, W_{INTER}, Vt_{INTER})}{N_{INTER}} \]
NINTER: the total number of inter-die Monte Carlo simulations (i.e. total number of chips)
Snapshot: Transistor Sizing and Yield
[Figure: yield (0% to 100%) vs. WA / WN (0.4 to 1), with annotated yields of 27%, 65%, 76% and 87%]
The proposed failure analysis helps SRAM designers achieve maximum memory yield
Parametric Failures in SRAM Cell
[Figure: 6T SRAM cell (WL, BL, BR, AXL, AXR, PL, PR, NL, NR) with high-Vt and low-Vt transistors marked]
[Figure: fault statistics: chip count (0 to 350) vs. number of faulty cells (NFaulty-Cells, 0 to 1049); σVt ≈ 30mV, using BPTM 45nm technology; Yield ≈ 33%]
Parametric Failures => Yield degradation
• Transistor mismatch =>
Parametric failures
– Read failure
– Write failure
– Access failure
– Hold failure
SRAM Failure Mechanisms and Logic Fault Models
[Figure: flow from process variations through circuit level deviations and physical failure mechanisms to logic fault models]
Process Variations: Random dopant fluctuations; W, L, Tox variations
Circuit Level Deviations: Instability in SRAM cells; Vt mismatch in sense-amplifiers; Delay variations in address decoders
Physical Failure Mechanisms: Hold failure; Access failure; Write failure; Flipping read failure; Sense-amp functional failure
Logic Fault Models: Low supply data retention fault; Data retention fault; Random read fault; Incorrect read fault; Transition fault; Read destructive fault; Deceptive read destructive fault
• Deceptive read destructive faults are overlooked in conventional test
sequences
• Hold failures not detectable in conventional test sequences
Efficient Testing of SRAM*
Failures due to Process Variations
Challenges
• Fault coverage of existing test sequences
• Test time of existing test sequences
Contributions
• Optimized test sequences to improve fault coverage
• Novel DFT circuit to reduce test time
* IEEE VLSI Test Symposium, May 2005
Test: Optimized March Test Sequence
1. Optimized March C-: (W0) (R0 W1) (R1 W0) (R0 W1) (HOLD) (R1 R1 W0) (HOLD) (R0 R0)
+ Good fault coverage
- Test time increases
2. March Q: (W0) (HOLD) (R0 W0 W1 R1) (HOLD) (R1 W1 W0 R0) (R0)
+ Reduces the test time
+ Covers all the fault models induced by process variations
- Not able to detect Transition Coupling Fault and Address Decoder Fault
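As a rough cross-check of test time, the number of read and write operations applied to each cell can be counted directly from a march sequence written as parenthesized elements. The helper below is a hypothetical sketch: it ignores address order and treats HOLD as costing no cell operations, which is an assumption.

```python
import re


def march_test_length(sequence):
    """Count read/write operations per cell, i.e. the k in a kN test time.

    Each march element is a parenthesized group such as "(R0 W1)".
    HOLD elements contain no R/W operations and therefore add nothing.
    """
    elements = re.findall(r"\(([^)]*)\)", sequence)
    return sum(len(re.findall(r"\b[RW][01]\b", element)) for element in elements)


OPT_MARCH_C_MINUS = "(W0) (R0 W1) (R1 W0) (R0 W1) (HOLD) (R1 R1 W0) (HOLD) (R0 R0)"
MARCH_Q = "(W0) (HOLD) (R0 W0 W1 R1) (HOLD) (R1 W1 W0 R0) (R0)"

if __name__ == "__main__":
    print(march_test_length(OPT_MARCH_C_MINUS), march_test_length(MARCH_Q))
```

Under these assumptions the two sequences come out at 12 and 10 operations per cell.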
March test sequence comparison
(Conventional test sequences: March C-, March B, March SR. Proposed sequences: Opt. March C-, March Q.)

Logic Fault Model                | March C- | March B | March SR | Opt. March C- | March Q
Address Decoder Fault            | +        | +       | -        | +             | -
Data Retention Fault             | -        | -       | +        | +             | +
Low Supply Data Retention Fault  | -        | -       | -        | +             | +
Stuck-at Fault                   | +        | +       | +        | +             | +
Transition Fault                 | +        | +       | +        | +             | +
Random Read Fault                | +-       | +-      | +-       | +-            | +-
Read Destructive Fault           | +        | +       | +        | +             | +
Deceptive Read Destructive Fault | -        | -       | +        | +             | +
Incorrect Read Fault             | +        | +       | +        | +             | +
State Coupling Fault             | +        | -       | +        | +             | +
Disturb Coupling Fault           | +        | -       | +        | +             | +
Incorrect Read Coupling Fault    | +        | -       | +        | +             | +
Read Destructive Coupling Fault  | +        | -       | +        | +             | +
Transition Coupling Fault        | +        | +-      | +        | +             | +-
Test Time                        | 10N      | 17N     | 14N      | 12N           | 10N
Double Sensing: A Novel DFT Circuit to Reduce Test Time
• In order to reduce test time, double sensing is used to detect Deceptive Read Destructive Fault in one cycle
• The content of the cell flips after the WL is activated.
• No time is left to detect the flipping, since the WL is disabled soon after that.
• The WL can be extended so as for the flipping to show itself on the bitlines.
• However, the WL can NOT be extended too long for test, since in normal mode the memory should still operate at the designed frequency.
• Another parallel sense-amp is fired at the time the WL is deactivated, and the flipping is sensed.
• Disturbance is generated during test to accelerate the development of the bitline differential voltage.
• Already discharged bitline is pulled up.
• Then the required WL extension time is minimized.
Parametric Failures and Yield
[Figure: failure probability (PF, PCOL and PMEM, with redundant columns) vs. std. dev. of intra-die variation; intra-die variation => cell, column and memory failure (hold, write, read and access failures)]
[Figure: inter-die variation: SRAM die with a global shift in process parameter]
\[ P_{MEM} = f(\text{Inter-die Variation}) \]
\[ Yield = \frac{\#\ \text{Non-Faulty Die}}{\#\ \text{Total Die}} \]
What is the effect of inter-die variation on parametric failures?
Inter-die Variation and Cell Failure
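[Figure: cell failure probability vs. inter-die Vt shift from LVT through nominal Vt to HVT: read (RF) and hold (HF) failures are high at the LVT side, access (AF) and write (WF) failures are high at the HVT side]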
• Inter-die shift in process parameter amplifies the failure
due to intra-die variations.
Inter-die Variation and Memory Failure
[Figure: cell, column and memory failure probability vs. inter-die Vt shift: Region A (LVT, high RF/HF), Region B (nominal Vt, low failures), Region C (HVT, high AF/WF)]
• Memory failure probabilities are high when the inter-die (global) shift in process parameters is high.
How can we improve yield
considering both inter-die and
intra-die variations?
Adaptive Repairing of SRAM Array
[Figure: memory failure probability vs. inter-die Vt shift: Region A (LVT corner, PMEM ≈ 1, read & hold failures dominate; reduce RF & HF), Region B (nominal Vt, PMEM ≈ 0), Region C (HVT corner, PMEM ≈ 1, access & write failures dominate; reduce AF & WF)]
• Reduce the dominant failures at different inter-die
corners to increase width of low failure region.
How can we reduce the
dominant failures at different
inter-die corners?
Body Bias and Parametric Failures
[Figure: SRAM cell (WL, VDD, BL, BR, GND) with body bias VBB, and overall cell failure probability with its write (WF), read (RF), access (AF) and hold (HF) components vs. body bias]
• Proper body bias can reduce parametric failures
–Forward bias reduces Access & Write failures
–Reverse bias reduces Read & Hold failures
Adaptive Repair using Body Bias
[Figure: memory failure probability vs. inter-die Vt shift: Region A (LVT corner, read & hold failures dominate; apply RBB), Region B (nominal Vt, PMEM ≈ 0; ZBB), Region C (HVT corner, access & write failures dominate; apply FBB)]
• Reduce the dominant failures at different inter-die
corners to increase width of low failure region.
Self-Repair Technique in SRAM
SRAM Array
Pre-Silicon Design of Circuit and Architecture
Post-Silicon Adaptive Repair
Separation of inter-die process corners – Vt Binning
Adaptive Repair using Body Bias
Self-Repairing SRAM
Enhanced Yield
How can we identify the inter-die Vt corner under a large random intra-die variation?
• Monitor circuit parameters, e.g. delay and leakage
– Effect of inter-die variation can be masked by that of intra-die variation
• Adding a large number of random variables reduces the effect of intra-die variation
\[ Y = \sum_{i=1}^{N} X_i \]
\[ \mu_Y = \sum_{i=1}^{N} \mu_{X_i} = N \mu_X \quad \& \quad \sigma_Y^2 = \sum_{i=1}^{N} \sigma_{X_i}^2 = N \sigma_X^2 \quad \Rightarrow \quad \frac{\sigma_Y}{\mu_Y} = \frac{1}{\sqrt{N}} \frac{\sigma_X}{\mu_X} \]
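The 1/√N reduction in relative spread can be checked with a small Monte Carlo experiment. This is only a sketch: the per-cell quantity is modeled as an independent normal variable (real cell leakage is not normally distributed), and the function names, sample counts and parameter values are hypothetical.

```python
import math
import random


def relative_spread_of_sum(mu_x, sigma_x, n):
    """Analytical sigma_Y / mu_Y for the sum of n independent identical terms."""
    return (sigma_x / mu_x) / math.sqrt(n)


def simulate_relative_spread(mu_x, sigma_x, n, trials=2000, seed=0):
    """Monte Carlo estimate of sigma_Y / mu_Y for Y = X_1 + ... + X_n."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(mu_x, sigma_x) for _ in range(n)) for _ in range(trials)]
    mean = sum(sums) / trials
    variance = sum((s - mean) ** 2 for s in sums) / (trials - 1)
    return math.sqrt(variance) / mean


if __name__ == "__main__":
    # A 30% per-element spread shrinks to about 3% when 100 elements are summed.
    print(relative_spread_of_sum(1.0, 0.3, 100))
    print(simulate_relative_spread(1.0, 0.3, 100))
```

This is why monitoring a quantity summed over a whole array exposes the inter-die shift even when the per-cell intra-die spread is large.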
Vt Binning by Leakage Monitoring
[Figure: SRAM cell (WL, VDD, BL, BR, GND) and leakage for inter-die Vt shifts of -100mV (low Vt), 0mV (nominal Vt) and 100mV (high Vt)]
Self-Repair using Leakage Monitoring
[Figure: self-repairing SRAM: on-chip leakage monitor (with bypass switch and calibrate signal) between VDD and the SRAM array, comparator comparing the monitor output VOUT against VREF1 and VREF2, and body-bias generator driving the SRAM array body bias]
• On-chip monitoring of leakage of entire array
• Body-bias is generated based on leakage monitor output
• Leakage monitor is bypassed in normal operating mode
[Figure: current monitor circuit output VOUT for LVT, nominal Vt and HVT dies relative to VREF1 and VREF2]
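The binning decision can be sketched as a two-threshold comparison. Two assumptions are made here that the slides do not state explicitly: that a higher monitor output VOUT corresponds to higher array leakage (low-Vt die), and that VREF1 is the upper threshold. The function name and the voltages in the example are hypothetical.

```python
def select_body_bias(v_out, v_ref1, v_ref2):
    """Pick a body-bias mode from the leakage monitor output.

    Assumption: v_out rises with array leakage, and v_ref1 > v_ref2.
    High leakage (low-Vt corner) -> reverse body bias (RBB).
    Low leakage (high-Vt corner) -> forward body bias (FBB).
    Otherwise (nominal Vt)       -> zero body bias (ZBB).
    """
    if v_ref1 <= v_ref2:
        raise ValueError("expected v_ref1 > v_ref2")
    if v_out > v_ref1:
        return "RBB"
    if v_out < v_ref2:
        return "FBB"
    return "ZBB"


if __name__ == "__main__":
    for v in (0.9, 0.6, 0.3):  # hypothetical monitor outputs in volts
        print(v, select_body_bias(v, v_ref1=0.8, v_ref2=0.4))
```

If the real monitor has the opposite polarity, the two comparisons simply swap.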
Yield Enhancement using Self-Repair
265KB Self-Repairing
SRAM
265KB SRAM with
No Body-Bias
• Self-Repairing SRAM using body-bias can significantly
improve design yield.
Self-Repairing SRAM: Die Photo
[Die photo: 64KB high-Vt SRAM, 64KB low-Vt SRAM, self-repair circuit and bypass switch]
Schematic and measurement of self-repair mechanisms
[Figure: schematic of the self-repairing SRAM: online leakage monitor (VDD, calibrate signal, Vout), comparator with references VREF1 and VREF2, and body-bias generator driving Vbody of the SRAM array]
[Figure: body-bias generation logic: VOUT is compared with VREF1 and VREF2 (VREF1 > VREF2); flip-flops (FF) and the calibrate signal produce the RBB and FBB signals that select the RBB, ZBB or FBB body-bias voltage (MUX switches are designed with level converter for negative bias, not shown here)]
[Figure: measured waveform of body voltage (zero bias to forward bias) for forward bias with on-chip FBB generator]
[Figure: measured waveform of body voltage (zero bias to reverse bias) for reverse bias with on-chip RBB generator]
Design and Measurement of Current Sensor
[Figure: current sensor circuit schematic (VDD, BIAS and CALIBRATE inputs, output to SRAM array; transistor W/L sizes from 0.83/0.36 to 200.0/0.12)]
[Figure: sensor output with temperature and Vt change (simulation, 0.13µm)]
[Figure: effect of intra-die variation on sensor output (simulation, 0.13µm)]
[Figure: measured current sensor output attached to the 64KB LVT array]
Failure Probability in SRAM Memory Cells
[Figure: probability of failure (10^-4 to 1.0, log scale) vs. σVth (10 to 70 mV) for PAF, PRF, PWF and PFault; Monte Carlo simulation using BPTM 45nm tech.]
• Intrinsic fluctuation of Vth due to random dopant effect
• PFault = PAF ∪ PRF ∪ PWF
• In 45nm technology σVth ≈ 30mV → PFault > 1.0×10^-3
Large number of faulty cells in nano-scale SRAM under process variation
Fault Statistics in 32K Cache
[Figure: fault statistics: chip count (Nchip, 0 to 350) vs. NFaulty-Cells (0 to 524); σVt ≈ 30mV, using BPTM 45nm technology; Monte Carlo simulation of 1000 chips; conventional yield ≈ 33.4%]
• NFaulty-Cells = PFault × NCells (total number of cells in a cache)
• Conventional 32K cache results in only 33.4% yield
Need process/fault-tolerant mechanisms to improve the yield in memory
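A back-of-the-envelope sketch of the NFaulty-Cells relation. It assumes the 32K cache holds 32 KByte of data cells (tag cells are ignored) and uses PFault = 1.0×10^-3 as a representative value; both the function name and that simplification are assumptions.

```python
def expected_faulty_cells(p_fault, n_cells):
    """N_Faulty-Cells = P_Fault x N_Cells."""
    return p_fault * n_cells


# Assumption: count only the data array of a 32 KByte cache, 8 cells per byte.
DATA_CELLS_32KB = 32 * 1024 * 8

if __name__ == "__main__":
    print(expected_faulty_cells(1.0e-3, DATA_CELLS_32KB))
```

With these inputs the expected count is about 262 faulty cells per chip.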
Basic Cache Architecture
• Index = Row Address + Column Address
• Multiple cache blocks are stored in a single row
• Minimize delay, area, routing complexity
• Column MUX selects one block
BPTM 45nm technology, 32KByte direct mapped cache

# of Blocks in a Row | 1 Block | 2 Blocks | 4 Blocks
Decoder Delay        | 0.086ns | 0.085ns  | 0.084ns
Wordline Delay       | 0.069ns | 0.075ns  | 0.128ns
Bitline to Q Delay   | 0.452ns | 0.355ns  | 0.313ns
Total Delay          | 0.608ns | 0.515ns  | 0.525ns
Energy               | 0.166nJ | 0.181nJ  | 0.195nJ
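The delay and energy figures for the three organizations can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the slide; the dictionary and function names are hypothetical.

```python
# Values per organization (blocks per row), copied from the slide.
DELAYS_NS = {
    1: {"decoder": 0.086, "wordline": 0.069, "bitline_to_q": 0.452},
    2: {"decoder": 0.085, "wordline": 0.075, "bitline_to_q": 0.355},
    4: {"decoder": 0.084, "wordline": 0.128, "bitline_to_q": 0.313},
}
TOTAL_DELAY_NS = {1: 0.608, 2: 0.515, 4: 0.525}
ENERGY_NJ = {1: 0.166, 2: 0.181, 4: 0.195}


def component_sum(blocks_per_row):
    """Sum of decoder, wordline and bitline-to-Q delays for one organization."""
    return sum(DELAYS_NS[blocks_per_row].values())


def relative_overhead(value, baseline):
    """Fractional increase of value over baseline."""
    return value / baseline - 1.0


if __name__ == "__main__":
    for blocks in (1, 2, 4):
        print(blocks, round(component_sum(blocks), 3), TOTAL_DELAY_NS[blocks])
    print(relative_overhead(TOTAL_DELAY_NS[4], TOTAL_DELAY_NS[2]))
    print(relative_overhead(ENERGY_NJ[4], ENERGY_NJ[2]))
```

Relative to 2 blocks per row, the 4-block organization comes out roughly 2% slower and between 7% and 8% higher in energy.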
• For 32K cache, best # of cache blocks/row = 2
• We choose 4 blocks in a row for our design
• Results in higher yield: 16.25% increase
• 2% cache access penalty
• 7% energy overhead

Fault-Tolerant Cache Architecture
[Figure: fault-tolerant cache: address split into Tag (17 bits), Index (10 bits: 8-bit row address and 2-bit column address) and offset (5 bits); row decoder selecting one of 256 rows, each holding four 256b data blocks and four 17b tag blocks; column MUXes for data and tag, sense amps, and tag comparison producing Hit/Miss; BIST, configurator, Config Storage and controller driving the column decoder. Faulty memory locations are identified in test mode and the configuration is used in operating condition.]
• BIST detects the faulty blocks
• Config Storage stores the fault information
The idea is to resize the cache to avoid faulty blocks during regular operation

Resizing the Cache
[Figure: cache with 4 blocks in a row (column addresses "00", "01", "10", "11") and one faulty block: Config Storage is accessed in parallel with the cache using the row address and feeds the fault information to the controller, which alters the column address sent to the column decoder and column MUX (e.g. "00" "01" "10" "11" becomes "01" "01" "10" "11")]
Force the column MUX to select a non-faulty block in the same row if the accessed block is faulty
Handle large number of faults without significantly reducing the cache size
Mapping Issue
More than one INDEX is mapped to the same block:
[Figure: address "one" ("T R 00 Off") points to FAULTY location "R 00" and is mapped by the controller to location "R 01", which is also the location of address "two" ("T R 01 Off"). STORE D1 "one" writes TAG T and DATA D1; LOAD "two" then finds that the tag matches but reads wrong data.]
Include column address bits into TAG bits:
[Figure: the address fields TAG | INDEX | Off become New TAG (TAG plus column address) | INDEX | Off. STORE D1 "one" now writes TAG T00 and DATA D1; for LOAD "two" the tag does not match, giving a cache miss.]
Resizing is transparent to processor → same memory address
Config Storage
[Figure: Config Storage (1Kbit) with row decoder, column MUX and sense amplifier, labeled 16 rows, 16 blocks and 4 bit outputs, addressed by the row address part of the index (labeled 9 bits) and accessed in parallel with the 32KByte cache (4 blocks per row, 32Byte block size); each access returns 4 bits of fault information about the 4 blocks stored in a single cache row]
• One bit of fault information per cache block
• Bits are determined by BIST at the time of testing
• Accessed using row address part of INDEX
• Provides the fault information of all the blocks in a cache row to controller

Controller
Column address selection based on fault location

Faulty Blocks in Accessed Row | Fault Information by Config Storage | Forced Column Address for Accessed Column Address
                              |                                     | 00 | 01 | 10 | 11
None                          | 0000                                | 00 | 01 | 10 | 11
3rd Block                     | 0010                                | 00 | 01 | 00 | 11
2nd & 3rd Block               | 0110                                | 00 | 00 | 11 | 11
1st, 2nd & 3rd Block          | 1110                                | 11 | 11 | 11 | 11
All four Blocks               | 1111                                | NA | NA | NA | NA

Based on the 4 bits read from Config Storage, the controller alters the column address
One block in a row is faulty
Selects the first available non-faulty block
e.g. 3rd block → 1st block
Two blocks in a row are faulty
Selects two non-faulty blocks respectively
e.g. 2nd block → 1st block, 3rd block → 4th block
Three blocks in a row are faulty
All the blocks are mapped to the non-faulty block, e.g. 4th block
With at least one non-faulty block in each row, this architecture can correct any number of faults
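One possible remapping rule that reproduces every row of the controller table is sketched below. The general rule (the k-th faulty block is redirected to the k-th non-faulty block, wrapping around) is an assumption inferred from the listed examples, not a description of the actual controller logic, and the function name is hypothetical.

```python
def forced_column_address(fault_bits, accessed):
    """Return the column address the controller forces, or None if no block works.

    fault_bits: string of '0'/'1', leftmost character = 1st block (address 00).
    accessed:   requested column address as an integer (0..3 for 4 blocks).
    Assumed rule: the k-th faulty block maps to the k-th non-faulty block,
    wrapping around when there are fewer non-faulty than faulty blocks.
    """
    faulty = [i for i, bit in enumerate(fault_bits) if bit == "1"]
    healthy = [i for i, bit in enumerate(fault_bits) if bit == "0"]
    if not healthy:
        return None  # all blocks in the row are faulty ("NA" in the table)
    if accessed not in faulty:
        return accessed
    return healthy[faulty.index(accessed) % len(healthy)]


if __name__ == "__main__":
    for bits in ("0000", "0010", "0110", "1110", "1111"):
        row = [forced_column_address(bits, a) for a in range(4)]
        print(bits, ["NA" if r is None else format(r, "02b") for r in row])
```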
Energy, Performance, and Area Overhead
of Config Storage and Controller
BPTM 45nm technology, 32KByte Cache, 1Kbit Config Storage

Energy and Performance | 32KB Cache | Config Storage & Controller
Delay (ns)             | 0.45       | 0.22
Area overhead          | NA         | 0.5%
Energy overhead        | NA         | 1.8%

• Controller changes the column address before data reaches the column MUX
• Does not affect the cache access time
• Negligible energy and area overhead (excluding BIST)
Results: Pop (ECC, Redundancy and Proposed Scheme)
• Pop: Probability that a chip with NFaulty-Cells faulty cells can be made operational
• Faults are randomly distributed across chip
• Yield is defined as:
\[ Y_{chip} = \frac{1}{N_{Tot}} \sum_{N_{Faulty\_Cell}} P_{op}(N_{Faulty\_Cell}) \cdot N_{chip}(N_{Faulty\_Cell}) \]
• Each scheme adds some extra storage space
• Pop includes the probability of having faults in these blocks
• To consider area, yield is redefined as:
\[ Y_{chip}^{eff} = Y_{chip} \cdot \frac{A_{chip\_without\_any\_scheme}}{A_{chip\_with\_fault\_tolerant\_scheme}} \]
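The two yield definitions translate directly into code. The sketch below uses a made-up two-bin histogram and a made-up Pop function purely to exercise the formulas; none of the numbers come from the slides, and the function names are hypothetical.

```python
def chip_yield(p_op, chip_histogram):
    """Y_chip = (1 / N_Tot) * sum over n of P_op(n) * N_chip(n).

    p_op:           function mapping a faulty-cell count to P_op.
    chip_histogram: dict mapping faulty-cell count -> number of chips.
    """
    total_chips = sum(chip_histogram.values())
    saved = sum(p_op(n) * count for n, count in chip_histogram.items())
    return saved / total_chips


def effective_yield(y_chip, area_without_scheme, area_with_scheme):
    """Y_eff = Y_chip * A_without / A_with, penalizing added area."""
    return y_chip * area_without_scheme / area_with_scheme


if __name__ == "__main__":
    # Hypothetical data: half the chips are fault free, half have 100 faults.
    histogram = {0: 50, 100: 50}
    toy_p_op = lambda n: 1.0 if n == 0 else 0.5
    y = chip_yield(toy_p_op, histogram)
    print(y, effective_yield(y, 1.0, 1.25))
```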
Results: Pop
[Figure: Pop (0.0 to 1.0) vs. NFaulty-Cells (0 to 524) for Redundancy (R = 32), ECC (1 bit) and the proposed architecture; at 105 faulty cells: proposed 65%, ECC 6%]
% of the chips with 105 faulty cells which can be saved by:
• Proposed scheme ~ 65% (high fault tolerant capability)
• ECC ~ 6%
• Redundancy ~ 0%

Results: Pop
[Figure: Pop (0.0 to 1.0) vs. NFaulty-Cells (0 to 524) when adding redundant rows (r) in config storage, for r = 0, 1, 2, 3, 4]
• Pop improves with redundant rows in config storage
• r = 2 is optimum for 32K cache with 1Kbit config storage
Results: Pop
[Figure: Pop (0.0 to 1.0) vs. NFaulty-Cells (0 to 524) for the proposed architecture with redundancy: R=0 r=2, R=8 r=2, R=16 r=2]
• Adding redundant rows (R) in cache in the proposed scheme improves the Pop further (optimum is R = 8 for 32K cache)
Effective Yield of 32K Cache
[Figure: % yield vs. redundant rows in Config Storage (r = 0 to 4) for the proposed architecture, compared with the conventional yield (33%) without any redundancy; optimum r = 2]
[Figure: % yield vs. redundant rows in cache (R = 0 to 32) for ECC, redundancy and the proposed architecture with r = 2]
• ECC + Redundancy yield ~ 77%
• Proposed architecture + Redundancy yield ~ 93% (with 2 blocks in a cache row yield ~ 80%)
Fault Tolerant Capability
[Figure: chip count (Nchip, 0 to 350) vs. NFaulty-Cells (0 to 524): fault statistics, chips saved by the proposed architecture + redundancy (R=8, r=2), and chips saved by ECC + redundancy (R=8); annotations: "More saved chips compared to ECC" and "ECC fails to save any chips"]
• Proposed architecture can handle a larger number of faulty cells than ECC, as high as 419 faulty cells
• Saves more chips than ECC for a given NFaulty-Cells
Process Tolerance: Fault Statistics in 64K Cache
[Figure: fault statistics: chip count (Nchip, 0 to 350) vs. NFaulty-Cells (0 to 1049); σVt ≈ 30mV, using BPTM 45nm technology; conventional yield ≈ 33.4%]
• NFaulty-Cells = PFault × NCells (total number of cells in a cache)
• Conventional 64K cache results in only 33.4% yield
Need process/fault-tolerant mechanisms to improve the yield in memory
Process-Tolerant Cache Architecture
[Figure: process-tolerant cache: address split into Tag (16 bits), Index (11 bits: 9-bit row address and 2-bit column address) and offset (5 bits); row decoder selecting one of 512 rows, each holding four 256b data blocks and four 16b tag blocks; column MUXes for data and tag, sense amps, and tag comparison producing Hit/Miss; BIST, configurator, Config Storage and controller driving the column decoder. Faulty memory locations are identified in test mode and the configuration is used in operating condition.]
• BIST detects the faulty blocks
• Config Storage stores the fault information
Resize the cache to avoid faulty blocks during regular operation
Fault Tolerant Capability
[Figure: chip count (Nchip, 0 to 350) vs. NFaulty-Cells (0 to 1049): fault statistics, chips saved by the proposed architecture + redundancy (R=8, r=3), and chips saved by ECC + redundancy (R=16); annotations: "More saved chips compared to ECC" and "ECC fails to save any chips"]
• Proposed architecture can handle a larger number of faulty cells than ECC, as high as 890 faulty cells with marginal performance loss
CPU Performance Loss
[Figure: % CPU performance loss (0.0 to 2.5) vs. NFaulty-Cells (0 to 839) for a 64K cache, averaged over SPEC 2000 benchmarks]
• Increase in miss rate due to downsizing of cache
• Average CPU performance loss over all SPEC 2000 benchmarks for a cache with 890 faulty cells is ~ 2%

Register File: Self-Calibration using
Leakage Sensing
C. Kim, R. Krishnamurthy, & K. Roy
Process Compensating Dynamic Circuit Technology
[Figure: conventional static keeper on local bitlines LBL0 and LBL1 (clk, node N0, read selects RS0 to RS7, data inputs D0 to D7)]
• Keeper upsizing degrades average performance
Process Compensating Dynamic Circuit Technology
[Figure: 3-bit programmable keeper, with b[2:0] selecting keeper widths W, 2W and 4W, on local bitlines LBL0 and LBL1 (clk, node N0, read selects RS0 to RS7, data inputs D0 to D7)]
C. Kim et al., VLSI Circuits Symp. '03
• Opportunistic speedup via keeper downsizing
Robustness Squeeze
[Figure: number of dies (0 to 250) vs. normalized DC robustness (0.7 to 1.2) for the conventional keeper and this work, with the noise floor and the saved dies marked]
• 5X reduction in robustness failing dies
Delay Squeeze
[Figure: number of dies (0 to 300) vs. normalized delay (0.8 to 1.2) for conventional (μ = 1.00) and this work (PCD, μ = 0.90), where μ is the average delay]
• 10% opportunistic speedup
Self-Contained Process Compensation
[Figure: manufacturing flow (Fab, Wafer test, Assembly, Package test, Burn in, Customer) with process detection by leakage measurement using the on-die leakage sensor, and programming of the PCD using fuses]
On-Die Leakage Sensor For Measuring Process Variation
[Figure: layout (83μm × 73μm) with comparators, VBIAS generator, NMOS device, current mirrors, current reference and test interface]
C. Kim et al., VLSI Circuits Symp. '04
• High leakage sensing gain
• Compact analog design sharing bias generators

Leakage Current Sensing Circuits
[Figure: prior leakage sensing circuits: T. Kuroda et al., JSSC, Nov. 1996 (VBIAS, VSEN, comparator output d0) and M. Griffin et al., JSSC, Nov. 1998 (IREF, VSEN compared with VDD/2, output d0)]
• Susceptible to P/N skew and supply fluctuation
• Large area due to multiple analog bias circuits
• Limited leakage sensing gain
Single Channel Leakage Sensing Circuit
[Figure: M1 (saturation) biased by VBIAS and IREF, M2 (subthreshold), and sense node VSEN compared with VREF to produce d0]
• Basic principle: Drain induced barrier lowering
• Low sensitivity to P/N skew and supply fluctuation
PV Insensitive Current Reference (IREF)
[Figure: Vt generation circuit and subtraction circuit producing IREF, with transistor W/L sizes from 1/1.6 to 18/0.4]
S. Narendra et al., VLSI Circuits Symp. 2001
• Sub-1V process, voltage compensated MOS current generation concept
• Reference voltage, external resistor not required
• Scalable, low cost, flexible solution
PV Insensitive Bias Voltage (VBIAS)
[Figure: VBIAS generator driven by IREF, with transistor W/L sizes 8/0.4, 1/0.4, 96/0.2 and 2/0.2]
\[ V_{BIAS} = \frac{kT}{q} \log\left( \frac{W_1}{W_2} \right) \quad (W_1 = 96\,\mu m,\ W_2 = 2\,\mu m) \]
E. Vittoz et al., JSSC, June 1979
• PTAT containing no resistive dividers
• Based on weak inversion MOS characteristics
• Desired output voltage achieved via sizing
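For a feel of the magnitude, the VBIAS expression can be evaluated numerically. The sketch assumes that "log" is the natural logarithm, that the ideal kT/q relation holds with no subthreshold slope factor, and that the temperature is the 80˚C quoted elsewhere in the slides; the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant k
ELEMENTARY_CHARGE_C = 1.602176634e-19  # electron charge q


def ptat_bias_voltage(w1, w2, temperature_c):
    """V_BIAS = (kT/q) * ln(W1 / W2), with T converted to kelvin.

    Assumes natural log and an ideal thermal voltage (no slope factor).
    """
    thermal_voltage = BOLTZMANN_J_PER_K * (temperature_c + 273.15) / ELEMENTARY_CHARGE_C
    return thermal_voltage * math.log(w1 / w2)


if __name__ == "__main__":
    # W1 = 96 um and W2 = 2 um from the slide; 80 C is an assumed temperature.
    print(ptat_bias_voltage(96.0, 2.0, 80.0))
```

Under these assumptions the bias comes out near 0.12 V and scales linearly with absolute temperature, as expected of a PTAT source.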
Comparator
[Figure: comparator with (+) and (-) inputs built from the subtraction circuit, biased by IREF, with transistor W/L sizes 4/0.4 and 8/0.4]
• 2-stage differential amplifier
• Already designed IREF is used for bias current
PV Sensitivity of Designed IREF, VBIAS
[Figure: normalized IREF and VBIAS (1.2V, 90nm CMOS, 80˚C) vs. VDD (1.1V, 1.2V, 1.3V) and vs. process skew (slow, typical, fast); all plotted values lie between 0.99 and 1.01]
• IREF variation < 4%, VBIAS variation < 2%
• Under realistic process skews, ±100mV supply voltage fluctuations
Proposed Leakage Current Sensing
[Figure: M1 (saturation) biased by VBIAS and IREF, M2 (sub-threshold), and sense node VSEN compared with VREF to produce d0]
[Figure: Ids vs. VSEN (0 to 1.2V) of PMOS M1 and of NMOS M2 at fast, typical and slow corners; 1.2V, 90nm CMOS, 80˚C]
Superimposed I-V Curves
[Figure: superimposed Ids vs. VSEN (0 to 1.2V) curves at slow, typical and fast corners, ∆VSEN = 0.93V; 1.2V, 90nm CMOS, 80˚C]
• 1.9-10.2X higher VSEN swing than prior art
• Process-voltage insensitive design
6-Channel Leakage Sensor Test Chip
[Figure: IREF and VBIAS generators shared by six channels with PMOS mirror widths WP, 2WP, 3WP, 4WP, 6WP and 9WP over NMOS devices of width WN; sense voltages VSEN1 to VSEN6 are compared with VREF, passed through a bubble rejection circuit (V1 to V6) and encoded by gates with inputs (V1, V2, V3), (V1, V4, V5) and (V2, V4, V6) into OUT[2], OUT[1] and OUT[0]]
• Incremental mirroring ratio for multi-bit resolution leakage sensing
• Shared bias generators → compact design
• Process-voltage insensitive IREF, VBIAS gen.
Multi-Bit Resolution Leakage Sensing
[Figure: VSEN1 to VSEN6 and VREF (0 to 1.2V) vs. process skew (fast, typical, slow); 1.2V, 90nm CMOS, 80˚C]
• Leakage level determined by comparing VSEN1 through VSEN6 with VREF
• 6-channel leakage sensor gives 7 level resolution
Example: Operation at Fast Process Corner
[Figure: 6-channel sensor at the fast corner: four of the six comparator outputs are 1 and two are 0; after the bubble rejection circuit the encoder gives OUT[2] = 1, OUT[1] = 0, OUT[0] = 1]
• Fast corner: output code '101'
Example: Operation at Typical Process Corner
[Figure: 6-channel sensor at the typical corner: one of the six comparator outputs is 1 and five are 0; after the bubble rejection circuit the encoder gives OUT[2] = 0, OUT[1] = 1, OUT[0] = 0]
• Typical corner: output code '010'
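One reading that is consistent with both worked examples and with the 7-level resolution is that the 3-bit output code equals the number of comparators that trip, plus one. The sketch below encodes that reading; it is an assumption, not the actual gate-level encoder or bubble rejection logic, and the function name is hypothetical.

```python
def leakage_code(comparator_outputs):
    """Map six comparator outputs to a 3-bit code string, '001' to '111'.

    Assumed encoding: code = (number of comparators reading 1) + 1, which
    gives 7 levels for 6 channels. Counting ones, rather than locating the
    thermometer-code transition, also tolerates an isolated bubble.
    """
    if len(comparator_outputs) != 6:
        raise ValueError("expected six comparator outputs")
    level = sum(1 for bit in comparator_outputs if bit) + 1
    return format(level, "03b")


if __name__ == "__main__":
    print(leakage_code([1, 1, 1, 1, 0, 0]))  # four comparators tripped
    print(leakage_code([1, 0, 0, 0, 0, 0]))  # one comparator tripped
```

With four comparators tripped the sketch returns '101' and with one tripped it returns '010', matching the fast and typical corner examples.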
On-Die Leakage Sensor Test Chip
[Die photo: current reference, NMOS devices, current mirrors, comparators, VBIAS gen. and test interface]

Technology        | 90nm dual Vt CMOS
VDD               | 1.2V
Resolution        | 7 levels
Power consumption | 0.66 mW @ 80˚C
Dimensions        | 83 × 73 μm²

Leakage Binning Results
[Figure: histogram of output codes from leakage sensor: 001, 010, 011, 100, 101, 110, 111]
Conclusion
• Statistical failure analysis helps enhance yield
• Post-silicon tuning/calibration is becoming promising for Si nano systems
• Built-in leakage/delay sensors provide information on intra-die process variations