Built-in Self Repair - BTU Cottbus


Brandenburgische Technische Universität Cottbus
Lehrstuhl Technische Informatik - Computer Engineering
Transient and Permanent Faults in
Nanoelectronic ICs: Compensation and Repair
Problems, Solutions, Limitations
H. T. Vierhaus
BTU Cottbus
Computer Engineering
Outline
1. Introduction: Nanostructure Problems
2. Transient Faults
3. Repair of Permanent Faults
4. Bus Structures and NoCs
5. Diagnostic Test
6. A Lot of Things to do ...
1. Introduction
A bunch of new problems from nanostructures ...
Nanoelectronic Problems
Lithography:
The wavelength used to „map“ structural information from masks to wafers is larger than the minimum structural features, by a factor of up to more than 4 (193 nm versus 90 / 65 / 45 nm).
Consequence: layouts must be adapted to correct mapping faults.
Parameter variations:
The number of doping atoms in MOS transistor channels becomes so small that statistical variations of doping densities have an impact on device parameters such as threshold voltages.
Doping Fluctuations in MOS Transistors
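[Figure: cross-section of an MOS transistor (poly-Si gate, n-doped source and drain in a p-substrate) with individual doping atoms in the channel region]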
Density and distribution of doping atoms
cause shifts in transistor threshold voltages!
Nanostructure Problems
Individual device characteristics such as Vth are more dependent
on statistical variations of underlying physical features such
as doping profiles.
A significant share of basic devices will be „out of specs“ and must be replaced by backup elements after production to improve yield.
As smaller features mean higher stress (field strength, current density), early failures „in the field“ also become more likely and must be compensated.
Recognizing and compensating transient errors „in time“ is becoming a must, e.g. because charged particles can discharge circuit nodes.
Key Technologies
Fault tolerant computing
Is required to handle intermittent and transient fault effects, e.g. induced by
radiation.
An old technology that is already heavily used in everyday computing (e.g. memory interfaces with ECC check and correction).
Can handle only a limited number of permanent faults!
Built-in self test (BIST) and self-repair (BISR)
Is required to handle permanent faults by self-repair using redundant elements.
State-of-the-art for memories, not for logic.
Can handle multiple faults (sequentially) until the resource of redundancy
is exhausted.
Algorithms that are fully or partially „fault hard“
Most DSP algorithms show an inherent „stability“ and work even under
fault conditions with reduced precision. The effect can be „HW-enhanced“.
System-on-a-Chip (SoC)
[Figure: SoC with DSPs and a RISC core, each with local memory, on a global bus; a bus coupler and a local bus with functional units FU 1 to FU 3]
SoCs are heterogeneous systems that require test & repair strategies for:
- logic (also in processors)
- memory blocks
- interconnects
- analog and D/A components
Fault Tolerant Computing
[Figure: fault detection and compensation methods applied after a fault event, ranging from universal to very specific]
- Software-based fault detection & compensation: works only for transient faults!
- Specific HW logic & RT-level detection & compensation: typically works for transient and permanent faults!
- Transistor- and switch-level compensation: typically works for specific types of transient faults only!
2. Transient Fault Effects
Storage Nodes and Particles
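[Figure: charge Q (fC, log scale from 1 to 100) on storage nodes versus technology node (0.35, 0.25, 0.18, 0.09 µm), compared with the charge generated by alpha particles]
A 1 MeV alpha particle generates 42 fC of charge!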
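The 42 fC figure can be checked with a back-of-the-envelope estimate. The sketch below assumes that the full particle energy is converted into electron-hole pairs at about 3.6 eV per pair, a common textbook value for silicon; the helper name is hypothetical. It yields roughly 44 to 45 fC, the same order as the value quoted above; the exact number depends on the assumed pair-creation energy and on how much of the charge is actually collected by a node.

```python
# Rough estimate of the charge freed by an alpha particle in silicon.
# Assumption: all particle energy creates electron-hole pairs at ~3.6 eV each.
ELEMENTARY_CHARGE_C = 1.602e-19   # charge of one electron in coulombs
PAIR_ENERGY_EV = 3.6              # assumed energy per electron-hole pair in Si


def generated_charge_fc(particle_energy_ev: float) -> float:
    """Return the generated charge in femtocoulombs (fC)."""
    pairs = particle_energy_ev / PAIR_ENERGY_EV
    return pairs * ELEMENTARY_CHARGE_C * 1e15


print(f"1 MeV alpha particle: {generated_charge_fc(1e6):.1f} fC")
```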
Contribution to Soft-Error Rates
Static combinational logic: 11 %
Sequential elements (FFs, latches): 49 %
Unprotected SRAM: 40 %
Source: S. Mitra, N. Seifert, M. Zhang, Q. Shi, K. S. Kim,
„Robust System Design with Built-In Soft Error Resilience“
IEEE Computer, Vol. 38, No.2, Febr. 2005, pp. 43-52
Spikes and Clock Rates in Logic
[Figure: a particle-induced pulse of 100 ps relative to the clock period: at a low clock rate, charge/state restoration is possible before the next clock edge; at a high clock rate, it is impossible]
The fault probability in digital logic is approximately proportional to the clock frequency!
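A common first-order way to see why this holds is a latching-window model: a pulse arriving at a random time is captured only if it overlaps the flip-flop's setup/hold window around a clock edge, and there is one such window per clock period. The sketch below assumes a uniformly distributed arrival time and ignores logical and electrical masking; the function name and the setup and hold values are illustrative assumptions, not figures from the slide.

```python
def capture_probability(pulse_ps: float, setup_ps: float, hold_ps: float,
                        clock_ghz: float) -> float:
    """Chance that a transient pulse, arriving at a uniformly random time,
    overlaps the latching window of a flip-flop (timing masking only)."""
    period_ps = 1000.0 / clock_ghz
    window_ps = pulse_ps + setup_ps + hold_ps
    return min(1.0, window_ps / period_ps)


# 100 ps pulse, assumed 20 ps setup time and 10 ps hold time
for f_ghz in (0.5, 1.0, 2.0):
    print(f"{f_ghz} GHz: {capture_probability(100, 20, 10, f_ghz):.3f}")
```

Doubling the clock frequency doubles the capture probability until the window covers the whole clock period, which is the linear relationship stated above.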
Logic Structures and Fault Events
[Figure: combinational logic between input FFs and output FFs, hit by particle radiation]
Flip-flops need fault tolerance / fault hardening first; logic close to the outputs comes next.
Muller-C-Element
Fault-Tolerant Latch Design
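[Figure: input in feeds Latch 1 (output outl1) and Latch 2 (output outl2); a Muller C-element combines outl1 and outl2 into out, which drives the load CL. Timing: while clock is high, outl1 = outl2 = in and out = in; afterwards, outl1 and outl2 are latched.]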
Fault Handling
Muller C-element:
If both inputs are equal: out = outl1 = outl2.
If the inputs are not equal: out keeps its previous value.
Under local fault conditions on the latch outputs (one of the 2 latches wrong), the C-element preserves the output value from the „charge“ phase of the latches.
Essentially 3 latches!
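The behavior of the C-element and of the two-latch arrangement can be sketched at the logic level. The model below is a simplification that assumes one capture phase followed by one hold phase, in which a particle may flip the outputs of the latches named in the call; the class and function names are hypothetical. A single upset is masked, while an upset of both latches is not.

```python
class MullerC:
    """Two-input Muller C-element: the output follows the inputs when they
    are equal and keeps its previous value when they differ."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def latch_pair_output(stored_bit: int, upset_latch1: bool, upset_latch2: bool) -> int:
    """Return out at the end of the hold phase of the dual-latch design."""
    c = MullerC()
    c.update(stored_bit, stored_bit)        # clock high: outl1 = outl2 = in, out = in
    outl1 = stored_bit ^ int(upset_latch1)  # clock low: the latches hold, and a
    outl2 = stored_bit ^ int(upset_latch2)  # particle hit may flip one or both
    return c.update(outl1, outl2)


for bit in (0, 1):
    print(bit, latch_pair_output(bit, True, False), latch_pair_output(bit, True, True))
```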
Intel's Scan Path Element
[Figure: scan-path element with system latches clocked by CLK (phases PH1, PH2; data D, output Q) and scan latches LA and LB clocked by SCA and SCB (scan in SI, scan out SO), plus capture and update control]
Intel's Scan Path Element plus Fault Compensation
[Figure: the scan-path element extended by a C-element, a keeper latch at the output Q and an additional Test control signal]
TMR-Latch / Flip-Flop
[Figure: input in is stored in FF1, FF2 and FF3 (common clock); an XOR of FF2 and FF3 produces cout, which controls a MUX]
Out = L1out if cout = 1
Out = L2out if cout = 0
Works with latches or flip-flops.
Can compensate static or dynamic faults in latches / FFs!
FF1 is untestable (active redundancy).
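Reading the figure as an XOR that compares FF2 and FF3 and a MUX that selects FF1 only when they disagree (an interpretation of the diagram, with L1out taken as the FF1 output), the selection logic is exactly a 2-out-of-3 majority vote. The exhaustive check below confirms this and also shows why FF1 is untestable: while FF2 and FF3 agree, a wrong value in FF1 never reaches the output. Function names are illustrative.

```python
from itertools import product


def mux_voter(ff1: int, ff2: int, ff3: int) -> int:
    """TMR element as read from the figure: cout = FF2 XOR FF3 drives the MUX."""
    cout = ff2 ^ ff3
    return ff1 if cout == 1 else ff2   # cout = 1: FF1 output, cout = 0: FF2 output


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


# The MUX/XOR structure equals a majority voter for all 8 input combinations.
assert all(mux_voter(*bits) == majority(*bits) for bits in product((0, 1), repeat=3))

# With FF2 and FF3 fault-free (equal), a flipped FF1 is invisible at the output.
for value in (0, 1):
    assert mux_voter(1 - value, value, value) == value
print("MUX/XOR voter matches majority; FF1 errors are masked while FF2 == FF3")
```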
TMR-Scan-Element
[Table: values of the control signals (SC, ic TMR) and of ff1, ff2 and out in functional mode, scan-in, static scan test and dynamic scan test with two patterns (t1, t2)]
TMR Scan-Element
Fault tolerant in functional mode
Fault tolerant in scan-mode
Optional support of test strategies that require a specific
sequence of 2 input bits!
Fault tolerant Latches and FFs
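Latch with C-element [9]: 20 transistors; control signals: 0; fault tolerant in functional mode (dynamic): yes; fault tolerant in scan mode (dynamic): -; static FF faults tolerated: no; 2-pattern scan test: -
TMR latch cell [9]: 24 transistors; control signals: 0; functional mode: yes; scan mode: -; static FF faults: yes; 2-pattern scan test: -
Scan-path cell: 34 transistors; control signals: 4 (2 clocks); functional mode: no; scan mode: no; static FF faults: no; 2-pattern scan test: no
Scan-path cell + C-element [9]: 48 transistors; control signals: 5 (2 clocks); functional mode: yes; scan mode: yes; static FF faults: no; 2-pattern scan test: no
TMR scan-path element: 66 transistors; control signals: 2 (1 clock); functional mode: yes; scan mode: yes; static FF faults: yes; 2-pattern scan test: yes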
Fault Compensation in Combinational Logic
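[Figure: combinational logic between input FFs and output FFs under particle radiation; each logic output and a delayed copy of it (delay element D) feed a Muller C-element (MC) in front of the output FFs]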
Fault Compensation in Combinational Logic
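[Figure: waveforms V(t) of a fault-free signal, the signal with a glitch and its delayed copy. Where signal and delayed copy agree, the Muller C-element (MC) captures the value; during the glitch they disagree and the MC holds (no capture). The delay reduces the time left to capture before the latch closes.]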
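The delay-plus-C-element filter can be illustrated with a discrete-time sketch: the C-element changes its output only when the signal and its copy delayed by `delay` time steps agree, so a glitch shorter than the delay is suppressed, while a genuine transition passes with a latency of `delay` steps, which is the time taken away from the capture window. Time steps, signal shapes and the function name are illustrative assumptions.

```python
def c_element_filter(signal: list[int], delay: int) -> list[int]:
    """Feed a signal and its copy delayed by `delay` steps into a C-element."""
    delayed = [signal[0]] * delay + signal[:len(signal) - delay]
    state, out = signal[0], []
    for now, late in zip(signal, delayed):
        if now == late:          # inputs agree: the C-element captures
            state = now
        out.append(state)        # inputs differ: the C-element holds
    return out


glitch = [0] * 10 + [1] * 3 + [0] * 10       # 3-step glitch on a 0 signal
print(max(c_element_filter(glitch, delay=5)))   # delay longer than glitch
print(max(c_element_filter(glitch, delay=2)))   # delay too short: glitch passes
```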
3. Repair of Permanent Faults
Compensation of transient faults is not enough.
Some techniques for transient fault compensation can handle permanent faults, too, but not in the long run and not when additional transient faults occur!
Memory Test & Repair
[Figure: memory array with lines selected by the line address, read/write lines on the columns and a spare column]
Memory Test & Repair (2)
[Figure: the same memory array with spare column, controlled by a memory BIST controller]
... is already state-of-the-art!
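At the address level, column repair amounts to a small remapping table filled by the memory BIST. The sketch below is a hypothetical model (class and method names are illustrative, and spare columns are assumed to be numbered after the regular ones); it also shows that repair works only until the resource of redundancy is exhausted.

```python
class ColumnRepairMap:
    """Redirect columns reported faulty by memory BIST to spare columns."""

    def __init__(self, num_columns: int, num_spares: int) -> None:
        # Assumption: spare columns follow the regular ones physically.
        self.free_spares = list(range(num_columns, num_columns + num_spares))
        self.remap: dict[int, int] = {}

    def mark_faulty(self, column: int) -> bool:
        """Allocate a spare for `column`; return False if none is left."""
        if column in self.remap:
            return True
        if not self.free_spares:
            return False                  # redundancy exhausted: repair fails
        self.remap[column] = self.free_spares.pop(0)
        return True

    def physical_column(self, column: int) -> int:
        return self.remap.get(column, column)


repair = ColumnRepairMap(num_columns=8, num_spares=2)
for faulty in (3, 5, 6):
    print(faulty, repair.mark_faulty(faulty), repair.physical_column(faulty))
```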
Logic Self Repair
[Figure: repair procedure overhead and functioning elements lost versus the size of replaced blocks (granularity)]
Granularity of Replacement
[Figure: expected fault density (1 out of 10^0 to 10^6 transistors) versus granularity of replacement in transistors, from transistor and gate through FPGA macro block and cores to CPU; regions: block-level replacement (e.g. FPGAs), hardly explored (logic), core replacement (e.g. CPU)]
Levels of Repair
Transistors - Switch Level
Replace transistors or transistor groups
Losses by reconfiguration (switched-off „good“ devices):
Potentially small (20 to 50 %) for transistor faults
Overhead for test and diagnosis: Very high
Gate Level
Replace gates or logic cells
Losses by reconfiguration:
Medium (60 to 90 %) for single transistor faults
Overhead for test and diagnosis: Medium to high
Macro-Block Level
Replace functional macros (ALU, FPU, CPU)
Losses by reconfiguration: High, 99 % or more
Overhead for test and diagnosis: Low
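The loss figures follow from simple counting if one assumes a single faulty transistor per replaced unit: all other transistors of the unit are switched off although they are good. The sketch below uses that assumption, ignores the switches and wiring added for reconfiguration, and uses illustrative unit sizes; the function name is hypothetical.

```python
def reconfiguration_loss(unit_transistors: int, faulty: int = 1) -> float:
    """Fraction of a replaced unit's transistors that are good but switched off."""
    return (unit_transistors - faulty) / unit_transistors


for name, size in [("transistor pair", 2), ("4-transistor gate", 4),
                   ("10-transistor gate", 10), ("macro block", 10_000)]:
    print(f"{name:20s} {reconfiguration_loss(size):.2%}")
```

Gates of 4 to 10 transistors give 75 to 90 %, and a macro block of about 10^4 transistors gives more than 99 %, in line with the orders of magnitude listed above.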
Replacement in Regular Structures
(e.g. for DSP)
Parallel Backup Transistors
[Figure: basic two-input gate (in1, in2, out between VDD and GND) and the same gate with redundant transistors connected in parallel]
Redundancy by „Active“ Parallel Transistors
Active redundancy is not testable. Therefore there is no way to
monitor the status of „available“ redundancy in a logic circuit.
Parallel transistors cannot compensate a fault of the „stuck-on“
type (transistor always conducting).
Faulty „backup“-transistors may produce additional
faults that cannot be corrected!
Adding redundancy is not enough,
fault isolation is a real problem!
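A switch-level sketch makes these limitations concrete. It models only the nMOS pull-down network of a two-input NAND gate in which every transistor has one parallel backup; the pull-up network is assumed fault-free and complementary, and the names and fault encoding are assumptions. A stuck-off transistor is compensated, a stuck-on transistor is not, and a fault in a backup transistor is invisible, which is why active redundancy cannot be monitored.

```python
from itertools import product


def conducts(gate_input: int, fault: str) -> bool:
    """Switch model of an nMOS transistor with an optional fault."""
    if fault == "stuck_off":
        return False
    if fault == "stuck_on":
        return True
    return gate_input == 1                   # fault-free: on when the gate is high


def nand2_with_backups(in1: int, in2: int, faults: dict) -> int:
    """faults maps (input position, copy) to a fault; copy 0 = original, 1 = backup."""
    stage1 = any(conducts(in1, faults.get((0, copy), "ok")) for copy in (0, 1))
    stage2 = any(conducts(in2, faults.get((1, copy), "ok")) for copy in (0, 1))
    return 0 if (stage1 and stage2) else 1   # conducting series pull-down: out low


def behaves_as_nand(faults: dict) -> bool:
    return all(nand2_with_backups(a, b, faults) == 1 - (a & b)
               for a, b in product((0, 1), repeat=2))


print(behaves_as_nand({(0, 0): "stuck_off"}))   # compensated by the backup
print(behaves_as_nand({(0, 0): "stuck_on"}))    # not compensated
print(behaves_as_nand({(0, 1): "stuck_off"}))   # faulty backup goes unnoticed
```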
Configuration and Fault Isolation
[Figure: gate with backup transistors and configuration switches (controls Ap on the VDD side, An on the GND side) that isolate a transistor with a stuck-on fault]
The Gate-Short-Problem
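[Figure: a driver feeding Load 1 and Load 2, with a gate short in one of the loads]
Shorts between an input gate and GND affect the whole net driving that input (the driver and all its loads) and make local redundancy useless!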
Gate Turn-off
[Figure: gate with backup transistors, configuration switches (Ap, An) and input shut-off switches controlled by gate_control]
Schematic Layout with VDD/GND Switches
[Figure panels: gate with parallel redundancy; gate with parallel redundancy and fault isolation]
Transistor-Level Overhead
Redundancy: overhead (cells only, estimates) / stuck-off coverage / stuck-on coverage / gate-short coverage / control lines
Parallel transistors: 30-40 % / yes / no / no / none
VDD / GND switches: 60-80 % / yes / yes / no / one wire
Separate gate poly lines: 100-150 % / yes / yes / yes / multiple wires
Duplicate Standard Cells
[Figure: duplicated standard cells Gate 1 and Gate 2 (inputs in1, in2, output out) with separate supplies VDD1 and VDD2, selected by a VDD switch under switch control]
Again: Fault Isolation
[Figure: the same duplicated cells, showing a gate input short and an output short to VDD / GND]
Administrated Duplicate Cells
[Figure: Gate 1 and Gate 2 with individual power switches (VDD1, VDD2) and GND switches (GND1, GND2), controlled by signals Act 1 and Act 2; a gate short is indicated]
Features
Use „normal“ cell designs
Four states of operation:
Config. 1: Gate 1 active, Gate 2 isolated
Config. 2: Gate 2 active, Gate 1 isolated
Config. 3: Both Gates active operating in parallel
Config. 4: Both Gates isolated from VDD / GND
Operations like „high / low power“ possible.
Cells can be put to temporary „sleep“ for stress relief.
Permanent repair functions.
The active cell's output is connected only to the „floating“ output of the other cell.
If twin tubs are used and the cell-internal tubs are also disconnected, a gate-input / GND short in the isolated cell cannot take effect.
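The four configurations can be modelled by the wired output of two cells, where an isolated (unpowered) cell leaves its output floating. The sketch assumes a simple encoding of the Act 1 / Act 2 control as a pair of power-enable flags; the actual control encoding of the figure is not reproduced, and all names are hypothetical. It shows that isolating a faulty cell repairs the output, while operating both cells in parallel causes contention if one of them is faulty.

```python
from enum import Enum


class Config(Enum):
    """(gate 1 powered, gate 2 powered): an assumed encoding of Act 1 / Act 2."""
    GATE1_ONLY = (True, False)
    GATE2_ONLY = (False, True)
    BOTH_PARALLEL = (True, True)
    BOTH_ISOLATED = (False, False)


def wired_output(config: Config, gate1_value: int, gate2_value: int):
    """Resolve the shared output node: 'Z' = floating, 'X' = contention."""
    drivers = [value for value, powered in zip((gate1_value, gate2_value), config.value)
               if powered]
    if not drivers:
        return "Z"
    if all(value == drivers[0] for value in drivers):
        return drivers[0]
    return "X"


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


# Gate 1 has a stuck-at-1 output fault; Gate 2 is fault-free.
print(wired_output(Config.BOTH_PARALLEL, 1, nand(1, 1)))  # contention
print(wired_output(Config.GATE2_ONLY, 1, nand(1, 1)))     # repaired
```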
Bistable Switching Cell
[Figure: Gate 1 and Gate 2 with power switches set by a bistable switching cell (Act), plus output separation]
Cell Duplication and Power Switch
Possible for all types of cells (also flip-flops).
Granularity of partitioning for replacements (single gates,
blocks) can be selected upon demand.
Can favorably be combined with dynamic circuit optimization.
Good coverage potential for transistor faults.
Significant overhead (above 100 %), but most likely below
Triple Modular Redundancy (TMR).
Redundancy may become exhausted and requires a further level
of redundancy!
Gate - Replacement
[Figure: row of standard cells (gates) with a gate fault; insertion of a backup cell as replacement]
Regular Logic Wiring
[Figure: regular logic wiring: logic cells with drive gates, each linked to the next cell, and a backup cell that can be fed into the chain]
Faults on Irregular Interconnects
[Figure: routing tree from signal source S to sinks C; a single fault (line break) cuts off part of the tree]
Redundant Wiring
[Figure: routing tree with loops: an extra wire keeps all sinks C connected to the signal source S despite a single fault (line break); plus double vias]
Problem: classic delay calculation works well on trees only!
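Whether a net survives any single line break can be checked by removing each wire segment in turn and testing that all sinks stay reachable from the source. The sketch below uses a hypothetical net (source S, branch point a, sinks C1 to C3): the plain tree fails the check, while the version with two extra wires forming loops passes it. It checks connectivity only, not the delay calculation problem mentioned above.

```python
from collections import defaultdict, deque


def reachable(edges, source):
    """Nodes reachable from `source` over undirected wire segments."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def tolerates_single_break(edges, source, sinks):
    """True if every sink stays connected after any one segment breaks."""
    for broken in edges:
        remaining = [edge for edge in edges if edge != broken]
        if not set(sinks) <= reachable(remaining, source):
            return False
    return True


tree = [("S", "a"), ("a", "C1"), ("a", "C2"), ("S", "C3")]
looped = tree + [("C1", "C2"), ("C2", "C3")]     # extra wires create loops
sinks = ["C1", "C2", "C3"]
print(tolerates_single_break(tree, "S", sinks))     # False
print(tolerates_single_break(looped, "S", sinks))   # True
```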
4. Bus Structures and
„Networks on Chip“ (NoCs)
Technology forecasts predict that nano-wires may become
the most vulnerable and unreliable circuit elements ...
Buses versus NoCs
[Figure: irregular bus structure (SoC) with five bus masters versus regular network structure (NoC) of nine NoC nodes]
Faults on Bus Structures
[Figure: bus masters BM 1 to BM 6 on a shared bus; a local defect affects the total network]
Bus Fault Conditions
Technology forecasts predict a reliability problem with
interconnects (nano-wires) in nano-technologies.
A single permanent fault on a bus may affect the bus
as a whole.
Fault detection and compensation by methods developed
for transient faults (Hamming code, ECC-checks) can handle
static faults, but are relatively expensive.
Capabilities of handling transient faults on top of permanent
faults are limited.
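To make the cost and the limits concrete, the sketch below uses the textbook Hamming(7,4) single-error-correcting code, chosen here only as an example since the slide names no specific code: 3 check lines per 4 data lines (75 % wire overhead). Any single faulty line is corrected, but a permanent fault plus one additional transient error gives two errors, which this code miscorrects. Function names are illustrative.

```python
from itertools import combinations, product


def encode(data):
    """Four data bits -> codeword; list index i holds bit position i + 1.
    Check bits sit at positions 1, 2 and 4, data at positions 3, 5, 6, 7."""
    d3, d5, d6, d7 = data
    return [d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d3, d5 ^ d6 ^ d7, d5, d6, d7]


def decode(word):
    """Correct up to one flipped bit and return the four data bits."""
    word = list(word)
    syndrome = 0
    for position in range(1, 8):
        if word[position - 1]:
            syndrome ^= position          # XOR of all positions holding a 1
    if syndrome:
        word[syndrome - 1] ^= 1           # flip the bit the syndrome points to
    return [word[2], word[4], word[5], word[6]]


def flip(word, *positions):
    return [bit ^ (i in positions) for i, bit in enumerate(word)]


all_data = [list(d) for d in product((0, 1), repeat=4)]
single_ok = all(decode(flip(encode(d), i)) == d for d in all_data for i in range(7))
double_ok = any(decode(flip(encode(d), i, j)) == d
                for d in all_data for i, j in combinations(range(7), 2))
print("all single errors corrected:", single_ok)
print("any double error corrected:", double_ok)
```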
Bus Segmentation
[Figure: bus masters BM 1 to BM 6 on a bus divided into segments by segment couplers (SC)]
Structure the bus into segments that can be repaired
individually!
The Switching Problem
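[Figure: n signal lines plus backup lines (n + k lines in total), switched at both ends]
Examples: n = 8 / 16 / 32 lines; k: 1 / 1 / 2; p: 1 / 1 / 2; switches: 16 / 32 / 128; control states: 9 / 33 / 65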
Faults and Repair Actions
1. Line break: a section of a line is interrupted
use spare wire!
2. Line short to GND: a section of a line is connected to GND
use spare wire!
3. Dynamic coupling between adjacent lines:
a. Re-allocate lines in bundle
b. Insert grounded line for decoupling
4. Bridge between lines:
a. Feed both lines with the same signal
b. Make one line „floating“
Single Line Replacement
[Figure: signal lines s0 ... s(k-1) and backup lines b0, b1, b2; a faulty signal line is replaced by a backup line]
Overhead:
2k switches and (k+1) logic states for 1 backup line;
2pk switches and p(k+1) logic states for p backup lines.
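The overhead formula above is easy to tabulate. The small helper below (its name is hypothetical) evaluates 2pk switches and p(k+1) logic states for k signal lines and p backup lines; for k = 8 and p = 1 it gives the 16 switches and 9 states listed for an 8-line bus on the switching-problem slide.

```python
def replacement_overhead(k: int, p: int = 1) -> tuple[int, int]:
    """Switches and logic states for k signal lines with p backup lines,
    following 2pk switches and p(k + 1) logic states."""
    switches = 2 * p * k
    states = p * (k + 1)
    return switches, states


for k, p in [(8, 1), (16, 1), (32, 2)]:
    switches, states = replacement_overhead(k, p)
    print(f"k={k:2d}, p={p}: {switches:3d} switches, {states:2d} logic states")
```

Both quantities grow linearly with the bus width and with the number of backup lines.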
Inserting Lines for Decoupling
[Figure: signal lines s0 ... s(k-1) and backup lines b0, b1, b2; a backup line is inserted between two lines with a coupling fault]
Inserting multiple lines for de-coupling requires multiple shifts of lines, multiple switches and multiple states!
Repair Mechanisms
Buses with „extra“ backup lines that need specific configuration
for repair generate high cost in terms of switches and
administration due to many „logic states“ of the bus section.
Such repair schemes are not suited to re-organize neighborhood
relations on buses for de-coupling of lines.
Try to cover all relevant fault conditions by a small set of
states using permutation of lines!
Reconfiguration for De-Coupling
[Figure: between segment couplers (SC), lines s0 ... s5 are reconfigured by swapping lines i and k]
2-way switches may be used!
Such reconfiguration can help to minimize dynamic coupling faults!
Characteristics of 6 / 8 Wire Bundles
Given a bundle of 6 or 8 bus lines:
Are there any permutations that create all-new neighbors
for every single line in order to eliminate coupling faults?
New-neighborhood permutations (NNP):
NNP6 (6 lines): 0-2, 1-4, 2-0, 3-5, 4-1, 5-3
NNP81 (8 lines): 0-2, 1-6, 2-0, 3-5, 4-7, 5-3, 6-1, 7-4
NNP82 (8 lines): 0-3, 1-5, 2-7, 3-0, 4-6, 5-1, 6-4, 7-2
NNP83 (8 lines): 0-5, 1-7, 2-4, 3-6, 4-2, 5-0, 6-3, 7-1
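The new-neighborhood property can be checked mechanically. The sketch below reads a pair i-j as „line i moves to position j“ (an assumption; since all listed permutations are their own inverses, the opposite reading gives the same result) and tests that no two lines that were adjacent before remain adjacent. Names are illustrative.

```python
def physical_order(perm):
    """perm[i] = new position of line i; return the lines in physical order."""
    order = [0] * len(perm)
    for line, position in enumerate(perm):
        order[position] = line
    return order


def adjacent_pairs(order):
    return {frozenset(pair) for pair in zip(order, order[1:])}


def has_all_new_neighbours(perm):
    before = adjacent_pairs(list(range(len(perm))))
    after = adjacent_pairs(physical_order(perm))
    return before.isdisjoint(after)


NNP6 = [2, 4, 0, 5, 1, 3]
NNP81 = [2, 6, 0, 5, 7, 3, 1, 4]
NNP82 = [3, 5, 7, 0, 6, 1, 4, 2]
NNP83 = [5, 7, 4, 6, 2, 0, 3, 1]
for name, perm in [("NNP6", NNP6), ("NNP81", NNP81), ("NNP82", NNP82), ("NNP83", NNP83)]:
    print(name, has_all_new_neighbours(perm))
```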
6 Wires: Permutations and Replacement
[Figure: for each of the six input wires, the mapping through the 1st, 2nd and 3rd switching columns (permutations PW2, PW3 and NNP), the lines that can replace it with two switching columns, and the lines selected for backup]
Administration:
4 logic states for 2 switching columns and 2 extra wires;
6 logic states for 3 switching columns and 1 extra wire.
Selection of Permutations
All single faults must be repairable by selecting
a minimum set of permutations.
Those lines that can act as replacement for most of the
others are selected for „backup lines“.
By permutation, also non-faulty functional lines are
re-arranged.
No permutation used for repair must map a functional
line to a faulty line.
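These rules can be checked with a small coverage search: for every wire that may fail, look for a configuration in which no functional signal is routed over that wire. The configurations in the example are hypothetical (3 signals on a 4-wire bundle), not the NNP/PW sets of the slides; each configuration lists the physical wire used by each signal, and the function name is illustrative.

```python
def repair_table(configurations, num_wires):
    """Map each possibly faulty wire to the index of the first configuration
    that keeps all functional signals off it (None if there is none)."""
    for config in configurations:
        if len(set(config)) != len(config):
            raise ValueError("two signals mapped onto one wire")
    table = {}
    for faulty in range(num_wires):
        table[faulty] = next(
            (i for i, config in enumerate(configurations) if faulty not in config),
            None,
        )
    return table


# Hypothetical configurations: config[s] = physical wire carrying signal s.
configs = [[0, 1, 2], [1, 2, 3], [0, 2, 3], [0, 1, 3]]
print(repair_table(configs, 4))        # every wire has a repair configuration
print(repair_table(configs[:2], 4))    # wires 1 and 2 are not repairable
```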
Permutations for 8-Wire-Bundles
New-neighborhood permutations:
NNP1: 0-2, 1-6, 2-0, 3-5, 4-7, 5-3, 6-1, 7-4
NNP2: 0-3, 1-5, 2-7, 3-0, 4-6, 5-1, 6-4, 7-2
NNP3: 0-5, 1-7, 2-4, 3-6, 4-2, 5-0, 6-3, 7-1
Pair-wise symmetrical permutations:
PW1: 0-1, 1-0, 2-3, 3-2, 4-5, 5-4, 6-7, 7-6
PW2: 0-6, 1-3, 2-4, 3-1, 4-2, 5-7, 6-0, 7-5
PW3: 0-4, 1-7, 2-5, 3-6, 4-0, 5-2, 6-3, 7-1
8 Wires: Permutations and Replacement
[Figure: permutations for an 8-wire bundle with the selected backup wires marked]
2 lines selected for backup!
8 Wires: Permutations and Replacement
[Figure: permutations for an 8-wire bundle with the selected backup wires marked]
4 lines selected for backup!
Overhead / Coverage for 6-Line-Bundle
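0 spare lines / 12 switches: single line faults -; dynamic coupling faults +; double line faults -
1 spare line / 36 switches: single line faults +; dynamic coupling faults +; double line faults -
2 spare lines / 24 switches: single line faults +; dynamic coupling faults +; double line faults 50 %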
Overhead / Coverage for 8-Line-Bundle
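Spare lines (out of 8) / switches:
0 spare lines / 16 switches: single line faults -; dynamic coupling faults +; double line faults -
1 spare line / 48 switches: single line faults +; dynamic coupling faults +; double line faults -
2 spare lines / 32 switches: single line faults +; dynamic coupling faults ++; double line faults 20 %
3 spare lines / 32 switches: single line faults +; dynamic coupling faults ++; double line faults 30 %
4 spare lines / 32 switches: single line faults +; dynamic coupling faults ++; double line faults 100 %
Note: The number of switches is reduced by a factor of 2 if full 2-way switches with 2 inputs / 2 outputs are used!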
Results
Bus segments can favorably be organized into bundles
of 8 lines for reconfiguration. Wider bundles require even
more columns of switches.
In a bundle of 8 lines, all single faults can be repaired either by one backup line and 3 columns of switches or by two backup lines and 2 columns of switches (with 6 or 4 logic states, respectively).
Two columns with 4 states also allow for two alternative
modes of changing neighborhood relations for de-coupling.
It also covers a fraction of double-line faults.
A full coverage of double-line-faults requires 4 backup lines
and 2 columns of switches or 2 backup lines and 4 columns.
Administration Scheme
[Figure: administration scheme: lines 0 to 7 pass through switch columns between segment couplers (SC) and leave as lines 0' to 7'; at each end, config logic decodes config bits into switch controls C1 and C2, with matching between both ends]
Processor-Based Bus Test
[Figure: bus masters and a test processor on the data lines; a bus reflector at the far end, with reflector select, invert control and clock signals from the test processor]
Test and Fault Diagnosis
[Figure: bus masters (BM) on bus segments connected by segment couplers (SC); the test processor maintains a segment status list]
Upcoming:
Test Procedure & Fault Management
The test processor can „reset“ the control of bus sections.
The test processor runs a diagnostic test to identify faulty lines.
In case of faults, a „trial and error“ test identifies the faulty line segment(s).
The test processor keeps a „fault list“ for redundancy management & supervision.
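A diagnostic bus test of this kind can be sketched with walking-one and walking-zero patterns: the test processor sends each pattern, reads it back (e.g. via the bus reflector) and records every line on which the returned bit differs. The bus model below is hypothetical (a stuck-at fault or a wired-AND bridge between two lines) and ignores the reflector's invert control. It identifies which lines are faulty, not where on a line the defect sits, which is what the segment-wise trial-and-error test is for.

```python
def transfer(word: int, width: int, stuck=None, and_bridge=None) -> int:
    """Word received after crossing a bus with an optional fault.
    stuck = (line, value); and_bridge = (line_a, line_b) shorted as wired-AND."""
    bits = [(word >> i) & 1 for i in range(width)]
    if and_bridge is not None:
        a, b = and_bridge
        bits[a] = bits[b] = bits[a] & bits[b]
    if stuck is not None:
        line, value = stuck
        bits[line] = value
    return sum(bit << i for i, bit in enumerate(bits))


def diagnose(width: int, **fault) -> set[int]:
    """Apply walking-one and walking-zero patterns and collect mismatching lines."""
    mask = (1 << width) - 1
    suspects = set()
    for i in range(width):
        for pattern in (1 << i, mask ^ (1 << i)):    # walking one, walking zero
            difference = pattern ^ transfer(pattern, width, **fault)
            suspects |= {j for j in range(width) if (difference >> j) & 1}
    return suspects


print(diagnose(8))                       # fault-free bus
print(diagnose(8, stuck=(3, 0)))         # line 3 stuck-at-0
print(diagnose(8, and_bridge=(1, 2)))    # bridge between lines 1 and 2
```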
Summary
A simple scheme of re-arranging bus sections for repair of
permanent faults.
Simple control scheme based on few logic states.
Modular approach based on bundles of lines is scalable to
cover wider buses. Should work well with NoCs.
Compatibility with regular schemes for bus test based on a
dedicated test processor device.
The number and the electrical effect of switches in complex
bus systems may still cause problems.
5. Diagnostic Tests
Fault diagnosis by diagnostic (self-) test is possibly the
real bottleneck in logic BISR!
Fault Diagnosis
Memory cells are easy to diagnose when faults affect single cells. BIST is possible.
Diagnostic tests of buses that have to discover a single faulty line are straightforward. They can easily find which wires are affected, but not where on the wire the fault is.
Detecting a faulty gate or even a faulty transistor in a logic block is a much more challenging problem. Diagnosis must be compatible with the methods of test response compaction used in scan testing.
Intelligent encoding of test responses is required, as done by U. Potsdam and Infineon!
Combinational Logic Fault Diagnosis
[Figure: combinational logic between input FFs and output FFs]
Faults can occur within specific gates, on interconnects, or in a „distributed“ manner. Identifying a specific faulty gate or line by logic testing is difficult at best and sometimes close to impossible.
Logic Test
[Figure: combinational logic with (pseudo-)inputs receiving an input vector and (pseudo-)outputs delivering an output vector]
Scan Path Technology
[Figure: scan path: the flip-flops at the (pseudo-)inputs and (pseudo-)outputs of the combinational logic are chained; input vectors are shifted in via scan-in, output vectors are shifted out via scan-out]
Scan-based Logic Test
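[Figure: compacted / encoded test information is expanded by a de-compactor into the scan chains of the combinational logic (CL); a test response compactor with diagnosis coding compresses the responses]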
Fault Diagnosis on Compacted Output Data
[Figure: scan input generator (de-compactor) driven by the scan clock; d-value storage (d0 to d6) with AND gates; MISR clocked at k × scan clock; reference MISR and compare unit]
*patented, U. Potsdam and Infineon Technologies AG
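For reference, the core of such a compaction scheme is a multiple-input signature register (MISR): each clock it shifts with linear feedback and XORs a parallel response word into its state, and the final signature is compared with a fault-free reference. The sketch below is a generic MISR, not the patented diagnosis scheme; width, feedback taps, response data and names are assumptions. Because the tap set includes the most significant bit, the state update is invertible, so any single-bit error in the responses always changes the signature (multiple errors can still alias).

```python
def misr_signature(words, width=8, taps=0b1011_1000, seed=0):
    """Compact `width`-bit response words into a signature.
    Each clock: shift left, insert the parity of the tapped bits as the new
    LSB, then XOR the parallel input word into the register."""
    mask = (1 << width) - 1
    state = seed
    for word in words:
        feedback = bin(state & taps).count("1") & 1
        state = ((state << 1) & mask) | feedback
        state ^= word & mask
    return state


responses = [0x3C, 0xA5, 0x00, 0xFF, 0x5A, 0x81, 0x7E, 0x12]
reference = misr_signature(responses)           # fault-free reference signature

# Flip every single response bit in turn and compare with the reference.
detected = 0
for index in range(len(responses)):
    for bit in range(8):
        faulty = list(responses)
        faulty[index] ^= 1 << bit
        detected += misr_signature(faulty) != reference
print(f"single-bit errors detected: {detected} of {len(responses) * 8}")
```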
6. A Lot of Work to Do
Logic fault diagnosis
Efficient logic self repair
Redundancy supervision and management
Resource management under fault conditions
Repair functions for interconnects
Overall system-level fault management