Transcript 0 - IEAv

SEE Mitigation Strategies for
Digital Circuit Design Applicable to
ASIC and FPGAs
Prof. Fernanda Lima Kastensmidt, Ph.D.
Instituto de Informatica
Universidade Federal do Rio Grande do Sul
Porto Alegre – RS – Brazil
Motivation

A large set of electronic devices used in avionics, space and
ground-level applications can be upset by ionizing particles.
General System
[Figure: a general system built from FPGA, ASIC, memory, processors and analog electronics]
Hardened components: high reliability, $$$$$$$$$$$$$$
COTS components: low reliability, $$$
Motivation

Solution I:
high reliability
Hardened
components
$$$$$$$$$$$$$$

If that is too expensive, the solution may be to design your own
hardened device!
– Which fault tolerance techniques should be used?
– How much fault tolerance is enough?
 It is necessary to qualify your hardened design.
Motivation

Solution II:
low reliability
COTS
components
$$$

It is necessary to qualify the device to assess its robustness
for the application!
– Is it possible to apply some fault tolerance technique?
  • Software level
  • Component replication level
Types of SEE
Single event phenomena can be classified into three
effects (in order of permanency):



• Single event upset and single event transient (soft error)
• Single event latchup (soft or hard error)
• Single event burnout (hard failure)
Hard errors or Single Event Latchup (SEL) are due to
shorts between ground and power, and cause
permanent functional damage.
Collected Charge
Depending on the circuit, transistor size and charge energy,
current pulses with different amplitudes, durations and shapes will appear.
Charge Collection Mechanism
[Figure: struck node with the currents I_ON, I_C and I_P]
$I_C(t) = I_{CRITICAL}(t) = I_P(t) - I_{ON}(t)$
Soft error occurs when $Q_{collected} > Q_{critical}$
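The condition above can be illustrated numerically. The sketch below is only an illustration: it assumes a double-exponential shape for the particle-induced current (a modelling choice not stated on the slide), integrates it to obtain the collected charge, and compares the result with a critical charge. The function names and all numeric values are hypothetical.

```python
import math

def pulse_current(t, q_total, tau_rise, tau_fall):
    """Double-exponential current pulse (assumed shape) whose integral is q_total."""
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))

def collected_charge(q_total, tau_rise, tau_fall, t_end, steps=20000):
    """Integrate the current over [0, t_end] with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    for i in range(steps):
        i0 = pulse_current(i * dt, q_total, tau_rise, tau_fall)
        i1 = pulse_current((i + 1) * dt, q_total, tau_rise, tau_fall)
        total += 0.5 * (i0 + i1) * dt
    return total

def soft_error(q_collected, q_critical):
    """Soft error condition from the slide: Qcollected > Qcritical."""
    return q_collected > q_critical

# Illustrative numbers only (not taken from the slides).
q = collected_charge(q_total=100e-15, tau_rise=10e-12, tau_fall=200e-12, t_end=5e-9)
print(soft_error(q, q_critical=50e-15))   # node with a low critical charge
print(soft_error(q, q_critical=150e-15))  # node with a high critical charge
```

The same deposited charge upsets the first node and not the second, which is why a node's critical charge is used later in the deck to rank sensitive nodes.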
Fault Tolerance
[Figure: ionization (+/- charges) in a circuit leading to a FAILURE at the output]
Fault Masking: any technique that prevents faults from
introducing errors to the output (failure)
Fault Tolerance
[Figure: ionization → FAULT (transient current injected or extracted from the junction) → FAULT EFFECT (transient voltage pulse at the capacitor node) → ERROR (clk) → FAILURE, with fault latency and error latency between the stages; shielding and sensors (detection) are marked along the chain]
Fault Masking (hardening by design):
Hardware and time redundancy
Hardened memory cells
Error-correction codes
Self-checking mechanisms with recovery
Fault Tolerance
[Figure: the same fault chain and fault masking techniques as on the previous slide]
Number of faults overcomes the mitigation technique
Redundant spare components
Outline

Radiation Effects on Digital ICs

Radiation Hardening by Design: Strategies for ASICs

Radiation Effects on FPGAs

Radiation Hardening by Design: Strategies for FPGAs

Final Remarks
Single Event Effects (SEEs)
Transient Effect
• Single Event Upset (SEU): bit-flip in a sequential logic element
• Digital Single Event Transient (DSET): transient voltage pulse in the combinational logic
[Figure: combinational logic between two blocks of sequential logic, annotated with logic values]
SEU in Sequential Logic
[Figure: SRAM cell with N and P transistors and word line (WL); ionization at an OFF transistor changes the stored value (0→1): BIT-FLIP]
Hardened Memories
Approach 1: use decoupling resistors to slow the cell's
regenerative feedback response, avoiding the bit-flip
[Rockett, R., IEEE TNS, 1992]
Hardened Memory
Approach 2: add transistors to create an appropriate
feedback devoted to restoring the corrupted data.
IBM Memory Cell [Rockett cell, 88]
HIT Memory Cell [Velazco, 92]
[Figures: transistor-level schematics of the two cells]
Hardened Memories
The principle is to store the data in two different locations
within the cell in such a way that the corrupted part can be
restored.
Whitaker/Liu Memory Cell [Liu, 92]
DICE Memory Cell [Calin, 96]
[Figures: transistor-level schematics of the two cells]
Dual Interlocked storage Cell (DICE)
[Figure sequence: DICE cell with outputs Qa and Qb; a strike at an OFF transistor flips one internal node (0→1)]
The original value is restored
Challenges in Sequential Logic
MULTIPLE BIT UPSETS
• Particle incidence angle
• Transistor dimensions
• Voltage supply
• Memory array density
[Figure: a strike affecting a single memory cell vs. multiple memory cells]
Charge Sharing (NMOS transistor)
[Figure: charge sharing snapshots at T = 0, 50 ps, 100 ps, 250 ps, 800 ps and 2 ns]
[Reed, et al., New Electronic Technologies Insertion into Flight Programs Workshop, 2007]
Limitations of Hardened Memory
Multiple nodes collecting charge are able to upset
hardened memory cells.
Solutions:
• Shallow Trench Isolation (STI) structures
• Suitable transistor placement and routing
• Hardened memory cells combined with hardware redundancy
Triple Modular Redundancy
[Figure: combinational logic feeding triplicated sequential logic; one copy is upset (X) and the MAJ voter output is OK]
inputs MAJ
000 0
001 0
010 0
011 1
100 0
101 1
110 1
111 1
Each master-slave flip-flop can be composed of:
• standard latches: robust to charge collected by multiple nodes in the same latch
• hardened latches: also robust to charge collected by multiple nodes across latches of different domains
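The MAJ truth table above can be reproduced with a few lines of code. This is a minimal sketch; the function names are illustrative and not part of the original slides.

```python
def majority(a, b, c):
    """2-out-of-3 majority vote on single bits."""
    return (a & b) | (a & c) | (b & c)

def truth_table():
    """Return (inputs, MAJ) pairs in the same order as the slide: 000 to 111."""
    return [(f"{a}{b}{c}", majority(a, b, c))
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]

for inputs, maj in truth_table():
    print(inputs, maj)
```

A single upset copy never changes the voted value, which is the property the slide marks as OK.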
Triple Modular Redundancy
[Figure: an upset (X) in the MAJ voter placed between the sequential logic and the combinational logic]
inputs MAJ
000 0 (X: 1)
001 0
010 0
011 1
100 0
101 1
110 1
111 1 (X: 0)
The voter's output can show a transient wrong value that
may be captured by the next memory cell.
Triple Modular Redundancy
Triple MAJ voter
[Figure: sequential logic followed by three MAJ voters feeding the combinational logic; all outputs OK]
Current strength:
• Increases current drive, helping to keep the node at
its original value.
Triple Modular Redundancy
[Figure: TMR with a triple MAJ voter, with upsets (X) reaching all three redundant paths]
Catastrophic effect: the system votes three wrong
values out of three and the result is assumed to be
correct.
SET in Combinational Logic
Each node has an associated:
• Capacitance
• Resistance
• Critical charge (QCRIT)
[Figure: current vs. time at the struck node, with QDrift and Qdiffusion components of the charge Qi]
SET pulse: Amplitude x Width
SET in Combinational Logic
Not all SETs are captured by a memory cell.
They can be:
• Logically masked
• Electrically masked
• Latch-window masked
[Figure sequence over gates e0, e1, e2, a3 and flip-flop Q:
 logically masked: the SET is blocked by the logic values on the path;
 electrically masked: the SET is attenuated to a negligible pulse;
 latch-window masked: the SET does not coincide with the clk edge]
Electrical Masking
Heavy Ion Radiation Results: 180nm CMOS
Pulse too narrow!!!
[Bruguier, G., et al., IEEE TNS, 1996]
SET vs. Frequency
Radiation Results:
DSET for 180nm vs. Freq
Freq.
clk
[Benedetto et al, IEEE TNS, 2004]
Challenges in Combinational Logic
Transient Width (TW) may vary from a few hundred
picoseconds to a few nanoseconds according to the LET.
[Figure: critical transient width (ps) vs. process technology (nm), with TW compared against the clk period at 500 MHz, 1 GHz, 2.5 GHz and 5 GHz]
[Dodd, P., IEEE TNS 2004]
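To see why a fixed transient width becomes more dangerous as the clock gets faster, a first-order estimate can help. The sketch below assumes that a SET is captured whenever it overlaps a clock edge and that its arrival time is uniformly distributed over the clock period, ignoring setup and hold times. This simplification is an assumption made for illustration, not a model taken from the slides.

```python
def capture_probability(tw_s, f_clk_hz):
    """First-order latch-window estimate: fraction of the clock period covered by the SET."""
    return min(1.0, tw_s * f_clk_hz)

# Illustrative 200 ps transient at the clock frequencies named on the slide.
for f_clk in (500e6, 1e9, 2.5e9, 5e9):
    print(f"{f_clk / 1e9:.1f} GHz: {capture_probability(200e-12, f_clk):.2f}")
```

Under this assumption the capture probability grows linearly with frequency until the transient is as wide as the clock period.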
SET vs. SEU Error Rate
Challenges in Combinational Logic
Depending on the fan-out of the logic topology, a single SET may
give rise to multiple SETs.
[Figure: logic with inputs a0 to a5; a SET (X) on a shared node reaches both outputs y0 and y1 and the flip-flops Q0 and Q1]
Identifying the most sensitive nodes
Fault injection performed by electrical (SPICE) and logic
simulations can identify the most sensitive nodes:
• Lower critical charge (QCRIT)
• Lower SET logical masking probability
[Figure: gate network with inputs A to F and output Z, with the most sensitive nodes marked]
Transistor Resizing
[Zhou et al., IRPS 2004]
[Cazeaux et al., IOLTS 2005]
[Dhillon et al., IEEE Transaction on ISVLSI 2006]
[Figure: gate network with inputs A to F and output Z; QCRITICAL is marked at the most sensitive nodes]
Gate Replication
[Lisboa, C., et al., SBCCI 2005]
[Nieuwland et al., IOLTS 2006]
[Figure: gate network with inputs A to F and output Z; gates are replicated at the most sensitive nodes]
Current strength:
• Increases current drive, helping to keep the node at
its original value.
Temporal Filtering


Votes the SET out by time redundancy.
The time redundancy is implemented by delays on the
clock lines or at the latch/flip-flop inputs.
[Figure: combinational logic sampled by sequential logic at clk, clk+ΔT and clk+2.ΔT, or through delays ΔT and 2.ΔT at the data inputs, followed by a triple or single MAJ voter]
[Nicolaidis, VTS 1999], [Anghel et al., DATE 2000]
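The voting-in-time idea can be sketched as a small simulation. The code below is a simplified model: the SET is an ideal rectangular inversion of the correct value, times are integers in picoseconds, and all names and numbers are illustrative assumptions rather than values from the cited papers.

```python
def majority(a, b, c):
    """2-out-of-3 majority vote on single bits."""
    return (a & b) | (a & c) | (b & c)

def node_value(t_ps, correct, set_start_ps, set_width_ps):
    """Value seen at the logic output: inverted while the SET is present."""
    in_pulse = set_start_ps <= t_ps < set_start_ps + set_width_ps
    return correct ^ 1 if in_pulse else correct

def temporal_vote(t_clk_ps, delta_t_ps, correct, set_start_ps, set_width_ps):
    """Sample at clk, clk+dT and clk+2.dT, then vote."""
    samples = [node_value(t_clk_ps + k * delta_t_ps, correct, set_start_ps, set_width_ps)
               for k in range(3)]
    return majority(*samples)

# A 300 ps SET starting 100 ps before the clock edge at 1000 ps.
print(temporal_vote(1000, 400, 1, 900, 300))  # delay larger than the SET width
print(temporal_vote(1000, 100, 1, 900, 300))  # delay smaller than the SET width
```

With a delay longer than the transient, at most one sample is corrupted and the vote is correct; with a shorter delay, two samples can fall inside the same transient and the vote fails.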
Full time redundancy
[Figure: combinational logic with a SET of width TW, sampled by flip-flops ffp0, ffp1 and ffp2 at clk, clk+ΔT and clk+2.ΔT, followed by a triple or single MAJ voter; timing overhead: MAJ + comb delays]
The ΔT is directly proportional to the
SET Transient Width (TW)
[Nicolaidis, VTS 1999]
[Anghel et al., DATE 2000]
Full time redundancy
[Figure: the same scheme for a SET of width 2.TW, sampled at clk, clk+2.ΔT and clk+4.ΔT within the clk period (T); timing overhead: MAJ + comb delays]
Temporal Latching to Trigger SETs
Error cross-section decreases with the increase of ΔT
[Benedetto et al., IEEE TNS 2004]
Triple Sample Memory Robust to
Multiple Bit Upsets and SET
[MAVIS, IRPS 2002]
[Figure sequence: memory cell schematic (nodes A to D) sampling the combinational logic output with shifted clocks; an upset (X) coming from the combinational logic or from charge collected by multiple nodes is voted out (OK)]
Full Triple Modular Redundancy
(TMR) with self-recovery
[Figure sequence: three redundant domains, each with combinational logic, a register (D0, D1, D2 with enables E0, E1, E2 on clk0, clk1, clk2) and its own voter (TRV0, TRV1, TRV2) fed by TR0, TR1 and TR2; an upset (X) in one domain is voted out (OK). At the output pads, the three voter outputs drive one output pad through a wired voter.]
How much mitigation is enough?
• Circuits are becoming more and more complex.
• Hardware and time redundancy techniques can provide
a certain level of protection against:
– Single Event Upsets (SEU)
– Single Event Transients (SET)
– Multiple bit or node upsets
• Problem: in some cases multiple faults can overcome
the mitigation techniques, provoking a system failure.
Multiple Faults in the Full TMR
[Figure: full TMR with upsets (X) in two redundant domains; the voters output a WRONG VALUE]
How much mitigation is enough?

How is it possible to know that the mitigation technique is
working properly for a certain Soft Error Rate (SER)?

It is necessary to have a mechanism to inform the system
when the number of multiple faults has passed a certain
level.

Built-in Self Test (BIST) mechanism:
– sensors working as watchdogs
– each time an ionization occurs, the system is informed
How about sensors working as
watchdogs?
Full TMR with sensors
[Figure: full TMR in which each combinational logic block, register (D0, D1, D2) and voter (TRV0, TRV1, TRV2) is monitored by sensors]
If sensors detect:
• One upset per time: Technique is working!
• Two or more upsets in distinct redundant modules per time: Technique is not working!
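The decision rule on these slides can be written down directly. The sketch below assumes that the system collects one flag per redundant domain for each observation window; the dictionary layout and the function name are hypothetical.

```python
def tmr_status(sensor_hits):
    """sensor_hits maps a redundant domain name to True if its sensors fired in this window."""
    domains_hit = sum(1 for hit in sensor_hits.values() if hit)
    if domains_hit == 0:
        return "no upset"
    if domains_hit == 1:
        return "technique is working"
    return "technique is not working"

print(tmr_status({"tr0": False, "tr1": True, "tr2": False}))
print(tmr_status({"tr0": True, "tr1": True, "tr2": False}))
```

A system could count the second outcome over time to decide when the mitigation is no longer sufficient for the observed soft error rate.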
Bulk Built-in Current Sensors

• During normal operation, the current in the bulk is
approximately zero.
• When an energetic particle generates an ionization, it
creates a current that flows between the struck node and Vdd
or gnd.
• The bulk-BICS senses the current generated by ionization
at the bulk terminal.
[Henes Neto et al. IEEE MICRO, 2006]
Bulk-BICS
Bulk Built-in Current Sensors
[Figure: BICS-P (transistors p1 to p6, RST) and BICS-N (transistors n1 to n6, nRST) attached to the circuit design through Vdd' and Gnd'; an ionization flips the BICS latch (01→10)]
Trade-offs

There is always some penalty to be paid when protecting
circuits against upsets.

Each technique may present a combination of:
– area overhead,
– performance penalty,
– power dissipation increase.

The challenge is to select the most cost-effective
techniques for the target circuit application.
CASE-STUDY: Adder
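Detection
• SET
• SEU
Duplication with Comparison (DWC): two ADDERs whose outputs are compared (=)
Recomputing with Shifted Operands: the ADDER computes S = A + B and then 2.S = 2.A + 2.B on the shifted (<<) operands; the second result is shifted back (>>) and compared (=)
Bulk-BICS: ADDER monitored by a Bulk-BICS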
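Recomputing with shifted operands can be illustrated with a toy fault model. The sketch below assumes a permanent stuck-at-0 fault on one output bit of the adder. This fault model, the function names and the operand values are illustrative assumptions, not the experiment behind the slides.

```python
def faulty_adder(a, b, width, stuck_at_zero_bit=None):
    """Adder of the given width; optionally one output bit is stuck at 0."""
    total = (a + b) & ((1 << width) - 1)
    if stuck_at_zero_bit is not None:
        total &= ~(1 << stuck_at_zero_bit)
    return total

def reso_check(a, b, operand_width, stuck_at_zero_bit=None):
    """Compute S = A + B, recompute with shifted operands, and compare."""
    adder_width = operand_width + 2  # room for the carry and the 1-bit shift
    s = faulty_adder(a, b, adder_width, stuck_at_zero_bit)
    s_shifted = faulty_adder(a << 1, b << 1, adder_width, stuck_at_zero_bit)
    error_detected = (s_shifted >> 1) != s
    return s, error_detected

print(reso_check(5, 6, 4))                       # fault-free adder
print(reso_check(5, 6, 4, stuck_at_zero_bit=1))  # same faulty bit hits different result bits
```

Because the shifted computation uses different bit positions of the same adder, the fault corrupts the two results differently and the comparison flags it. A fault that happens not to change either result is not flagged, so this is detection only, at the cost of a second computation.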
CASE-STUDY: Adder
SEU correction
Error-Correction Code (Hamming): ADDER with encoder (enc) and decoder (dec) blocks around the registers
Hardened Flip-flops: ADDER with hardened registers
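As an illustration of the enc/dec blocks, the sketch below implements a Hamming(7,4) code that corrects any single bit-flip in a stored word. The code size is an assumption chosen for brevity; the slides do not state which Hamming code was used in the case study.

```python
def hamming74_encode(data):
    """Encode 4 data bits into 7 bits, with parity bits at positions 1, 2 and 4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1                      # emulate an SEU in one stored bit
print(hamming74_decode(stored) == word)
```

The code protects stored bits only. A transient inside the adder itself is encoded as if it were valid data, which is consistent with the slide listing this option under SEU correction rather than SET correction.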
CASE-STUDY: Adder
SEU and SET correction
TMR with single voter: three ADDERs and one voter
TMR with triple voter: three ADDERs and three voters
CASE-STUDY: Adder
SEU and SET correction
Time redundancy with TMR in the registers
[Figure: a single ADDER whose output is sampled directly and through delays T and 2.T, with voters around the triplicated registers]
AREA vs. PERFORMANCE
[Bar chart: area and performance of each adder version, axis from 0 to 3000]
SEU and SET correction: Time Redundancy + TMR registers, TMR triple voter, TMR single voter
SEU correction: ECC Hamming, hardened memory
SEU and SET detection: Recomputation with Shifted Operands, bulk-BICS, DWC
Not protected
Chart annotations: more than 200%, less than 50%, less than 50%
How about Qualifying for SEE?
• Testing by fault injection:
– Model the SEU and SET effect at:
  SPICE level
  Logic level or RTL level
• Testing in a laser facility
• Testing at ground-level facilities
– (in front of a beam of protons, heavy ions, neutrons)
• Testing in space (actual environment)
[Arrows alongside the list: cost, accuracy]
When testing in a Ground Level
facility for SEE:
Static Testing:
– No application is running during the test.
– The register files are read during or after the test
to check for SEU and/or SET and compared to a golden file.
– Tests on memories, microprocessors, ASICs in general
Dynamic Testing:
– Applications are running during the test.
– Outputs are analyzed and compared to a golden design.
– SEU and SET can be checked during the test.
– Tests on memories, microprocessors, ASICs in general, analog
circuits, etc.
General System
processors
FPGA
memory
ASIC
Analog logic
Outline

Radiation Effects on Digital ICs

Radiation Hardening by Design: Strategies for ASICs

Radiation Effects on FPGAs

Radiation Hardening by Design: Strategies for FPGAs

Final Remarks
Field-Programmable Gate Arrays

An array of logic blocks and interconnections
customizable by programmable switches.
 High logic density
 Customizable by the end user to realize different
designs
Configurable logic blocks
(CLBs)
interconnections
Switches for customization
Programmable Technologies
Programmable switches can be based on:
Antifuse (antifuse-based FPGAs):
– an electrically programmable switch forms a low-resistance path between
two metal layers
– One-time configurable
SRAM (SRAM-based FPGAs):
– the state of a static latch controls pass transistors or multiplexers connected to
pre-defined metal layers
– Re-configurable
Flash (Flash-based FPGAs):
– a floating gate controls the switches
– Re-configurable
Antifuse-based FPGAs

• Non-volatile: they hold the configured content even when
not connected to the power supply.
• They can be programmed just once.
• FPGA products for space:
– ACTEL
– AEROFLEX (based on QuickLogic)
ACTEL: RTAX-S device
[Figure: RTAX-S floorplan: an array of Super Clusters (SC) with RAM blocks, RAMC, RD, HD and CT modules; each Super Cluster contains C, C, R cells with TX, RX and B modules]
[Actel, RTAX-S RadTolerant FPGAs 2007]
ACTEL: RTAX-S device
[Figure: C-CELL (signals D0 to D3, DB, A0, A1, B0, B1, FCI, CFN, Y, FCO), susceptible to SET, and R-CELL, robust to SEU; a transient (X) in the C-CELL leads to an ERROR]
[Actel, RTAX-S RadTolerant FPGAs 2007]
Effects of Frequency Response
Circuit: shift register with 8 levels of C-cells between R-cells
Error cross-section increases when frequency increases.
[Figure: number of errors (# ERROR) plot and clk edge timing diagram]
[Berg, M. et al., IEEE TNS 2006]
RadHard Eclipse FPGA from Aeroflex
ERROR
X
hardened flip-flops
Robust to SEU
ViaLink connections
Antifuse FPGAs: summary
• Customized routing is not sensitive to SEU
• Flip-flops are not sensitive to SEU
– Actel and Aeroflex each provide a solution where all
flip-flops are hardened.
• Logic is susceptible to DSETs
– The user may protect the logic by using high-level
mitigation techniques in the VHDL/Verilog
description of the design (TMR, duplication and
others)
SRAM-based FPGAs

• Volatile: they lose their configuration contents when
not connected to the power supply.
• They can be reprogrammed as many times as necessary at
the work site.
• They are programmed by loading a bitstream.
• FPGA products for space:
– XILINX
– ATMEL
– HONEYWELL
SRAM-based FPGAs
Basic board must be composed of:
[Figure: board with oscillator (Osc.), power supply (core & IO), programming interface, EEPROM (FPGA loader & memory), the FPGA and its IO interface]
The original design bitstream must be stored in a memory outside the FPGA.
Memory size needed: the bitstream may range from Kbytes to several Mbytes.
Reconfigurability
Can offer benefits for space and remote applications by:
• saving space in the system: the same circuitry can be
used with different configurations at different stages of a
mission, reducing weight and power requirements.
• allowing in-orbit design changes, reducing the mission
cost by correcting errors.
If part of an FPGA fails, the circuitry can be reprogrammed
to make use of the remaining functional portions of the chip.
FPGA Design Flow
Hardware Description Language
Synthesis optimizations
Logic mapping
Placement
Routing
configuration bitstream
… 101001110100000111…
Technology Scaling in Xilinx FPGAs
Nanometer technologies
Embedded Hard microprocessor
Embedded memories (BRAM)
SRAM-based FPGA Architecture
[Figure: Xilinx FPGA built from configurable logic blocks (CLBs) made of slices, BRAM and the general routing matrix (GRM); a lookup table (LUT) with inputs A, B, C, D stores the truth table of the Boolean function F(A,B,C,D)]
SEU in SRAM-based FPGAs: CLB slice
[Figure: CLB slice with a LUT (inputs I1 to I4) and a flip-flop; the LUT contents and the routing are defined by configuration memory bits]
Upset in the flip-flop: transient effect (corrected at the next flip-flop load)
Upset in the LUT or routing configuration memory bits: persistent effect (corrected by scrubbing)
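The persistent effect of an upset in a LUT can be illustrated by treating the LUT as a 16-entry truth table. In the sketch below the bit pattern is the one shown in the figure, while the addressing order (I1 as the most significant address bit) and the function names are assumptions made for illustration.

```python
def lut_eval(config_bits, i1, i2, i3, i4):
    """Read the LUT output: the four inputs select one configuration bit."""
    address = (i1 << 3) | (i2 << 2) | (i3 << 1) | i4
    return int(config_bits[address])

def flip_bit(config_bits, address):
    """Emulate an SEU in one configuration memory bit."""
    flipped = "1" if config_bits[address] == "0" else "0"
    return config_bits[:address] + flipped + config_bits[address + 1:]

golden = "0001011100010111"
upset = flip_bit(golden, 3)

print(lut_eval(golden, 0, 0, 1, 1))  # intended function
print(lut_eval(upset, 0, 0, 1, 1))   # wrong for this input until the bit is rewritten
```

Unlike a flipped flip-flop, the corrupted LUT keeps producing the wrong value every time that input combination occurs, which is why it has to be repaired by rewriting the configuration.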
SET in SRAM-based FPGAs : CLB slice
[Figure: the same CLB slice (LUT with inputs I1 to I4, routing, configuration memory bits) with a SET (X)]
SET may be captured by the flip-flop.
General Routing Matrix (GRM)
[Figure: array of CLBs connected through direct connections, fast connect, double lines, hex lines and long lines]
SEU in SRAM-based FPGAs: Routing
[Figure: in direct connections and hex connections, an SEU in a routing configuration bit (0→1 or 1→0) creates an open or a short]
Other sensitive structures
Single-Event Functional Interrupts (SEFI):
Power-on Reset (POR)
• Low probability of occurrence
• Signature: DONE pin transitions low, I/Os become tri-stated, no user functionality available
• Solution: reconfigure device
SelectMAP and JTAG controllers
• Low probability of occurrence
• Signature: loss of communication, read access to
configuration memory returns a constant value
• Solution: reconfigure device
Input and Output Blocks (IOB)
Digital Clock Manager (DCM)
Power-PC Hard IP
Multi-Gigabit Transceivers (MGT)
SEE Characterization – Heavy Ion:
Static Testing in Virtex4
BRAMs present higher error cross-section
compared to CLBs
Error cross-section of POR in Virtex4
has improved compared to Virtex-II.
[George, et al. IEEE Radiation Effects Data Workshop, 2006]
Scrubbing
[Flow: Hardware Description Language (TMR by hand) → ISE tool: synthesis optimizations, logic mapping, placement, routing → configuration bitstream (… 101001110100000111…) → output; scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification) are applied to the configured device]
Scrubbing: continuous configuration
• No application interruption
• It does not correct upsets in:
- Embedded Memory (BRAM)
- CLB flip-flops
[Figure: SRAM-based FPGA (DATA[7:0], INIT, DONE, CS, WR, CCLK, I/O) connected to two XQR18V04 PROMs, BOOT and SCRUB, holding the original bitstream, with a SCRUB controller and an oscillator (OSC)]
Configuration Scrubbing Example:
to correct persistent effect faults
[Figure: two copies of the configuration bits, organized in columns; one frame differs between them (10101010101 vs. 10101000101), marked as a configuration upset (x) ahead of the scrub column]
Configuration Scrubbing Example:
to correct persistent effect faults
• Scrubbing rate is important to reduce the probability of multiple upsets.
• Scrubbing can be performed:
– from outside the FPGA by another FPGA controller
– from inside the FPGA: Hardware Internal Configuration Access Port (HWICAP)
[Figure: the scrub column passes over the configuration upset, which is repaired]
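The scrubbing pass can be sketched as a loop over configuration frames. The frame values below are borrowed from the figure, and treating the regular pattern as the golden copy is an assumption. The function is a hypothetical model that compares before writing, whereas a real scrubber may simply rewrite every frame.

```python
GOLDEN = [
    "00000001010", "10101010100", "10101010010", "10101010101", "01010100101",
    "11111111101", "11100000000", "11101010101", "10101010101", "00101000010",
]

def scrub(config_memory, golden):
    """Rewrite every frame that differs from the golden copy; return the repaired indices."""
    repaired = []
    for index, frame in enumerate(golden):
        if config_memory[index] != frame:
            config_memory[index] = frame
            repaired.append(index)
    return repaired

config = list(GOLDEN)
config[3] = "10101000101"       # configuration upset in one frame
print(scrub(config, GOLDEN))    # frames repaired in this pass
print(config == GOLDEN)
```

The time between two passes over the same frame bounds how long a persistent fault can stay active, which is why the slide stresses the scrubbing rate.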
Mitigation Techniques
[Flow: the same design flow, from the Hardware Description Language (TMR by hand) through the ISE tool steps to the configuration bitstream, with scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification)]
X-TMR
Full TMR in:
• Combinational logic
• Sequential logic
• Input/output pads
Why do we need full TMR?
To guarantee the correct output in the presence of persistent
effect errors, which are corrected only by loading the correct
bitstream.
[Figure: FPGA with INPUT package pins, three redundant logic domains (tr0, tr1, tr2), TMR flip-flops, a TMR output voter and the OUTPUT package pin]
[Figure: the same X-TMR FPGA, highlighting the TMR flip-flop]
The recovery path is mandatory to correct the state of the flip-flops,
especially in FSMs.
TMR flip-flop: three flip-flops (tr0, tr1, tr2 on clk0, clk1, clk2), each followed by a MAJ voter
R0 R1 R2 MAJ
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
LUT: 00010111_00010111
[Figure: the same X-TMR FPGA, highlighting the TMR output voter]
TMR output voter: each redundant output (R0, R1, R2) reaches the package pin through its own 3-state buffer (3-state_0, 3-state_1, 3-state_2) controlled by an O_voter
0: it allows the data to pass to the output pad.
1: it blocks the data
R0 R1 R2 O_voter (for R0)
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  1
1  0  1  0
1  1  0  0
1  1  1  0
LUT: 00011000_00011000
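The O_voter table corresponds to a minority check: a domain is disconnected from the pad when it disagrees with both other domains. The sketch below reproduces the LUT contents for the R0 voter and models the shared pad; the function names and the pad model are illustrative assumptions.

```python
def minority(own, other_a, other_b):
    """Return 1 (block the 3-state buffer) when 'own' disagrees with both other domains."""
    return int(own != other_a and own != other_b)

def lut_string():
    """Truth table of the R0 voter, in the slide's order R0 R1 R2 = 000 to 111."""
    return "".join(str(minority(r0, r1, r2))
                   for r0 in (0, 1) for r1 in (0, 1) for r2 in (0, 1))

def pad_value(r0, r1, r2):
    """Value on the shared pad: only the domains that are not blocked drive it."""
    controls = ((r0, minority(r0, r1, r2)),
                (r1, minority(r1, r0, r2)),
                (r2, minority(r2, r0, r1)))
    drivers = {value for value, blocked in controls if not blocked}
    assert len(drivers) == 1  # the remaining drivers always agree
    return drivers.pop()

print(lut_string())
print(pad_value(1, 0, 1))
```

Because the domain in the minority is switched off rather than outvoted by a single gate, there is no single voter whose upset could corrupt the output pin.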
Evaluating TMR I/O pads
[Swift et al, IEEE TNS 2004]
Inputs at 66 MHz
Evaluating TMR I/O pads
[Swift et al., IEEE TNS 2004]
Heavy Ion
Evaluating Multiple Bit Upsets
Heavy ion radiation static test:
Virtex Family
Virtex II Family
220nm CMOS
130nm CMOS
[Quinn, et al., IEEE TNS, 2005]
Domain Crossing Events
Bit-flips in the routing can generate short-circuit connections
among different blocks of the TMR (tr0, tr1 and tr2).
[Figure: TMR design (INPUT package pins, TMR register with voters and refresh, redundant logic tr0, tr1, tr2, TMR output majority voter, OUTPUT package pin) with bit-flip a (X) inside tr0; tr1 and tr2 are OK]
Bit-flip a affects only the redundant logic tr0;
consequently, the majority voter chooses the correct
result (two out of three outputs).
Domain Crossing Events
[Figure: the same TMR design with bit-flip b (X) connecting two redundant domains; only one domain remains OK]
Bit-flip b affects two redundant logic parts;
consequently, the majority voter will not choose the
correct result (two out of three outputs are wrong).
Solution to Reduce
Domain Crossing Events
Voter insertion:
A barrier of voters can reduce the probability of a bit-flip in
the routing causing a short-circuit connection between two
or more redundant blocks.
[Kastensmidt, et al., DATE 2005]
[Figure: the TMR design split into logic partitions by TMR majority voters; bit-flip b (X) stays inside one partition and the values after the voters are OK]
TMR BRAM (Embedded memory)
• Upsets in BRAMs are not corrected by scrubbing.
• TMR with refreshing must be used to mitigate upsets.
• Need to use dual-port BRAMs.
• Mechanism to refresh the memory contents:
– Counter
– Voters
[Figure: three BRAMs, one upset (X) and two OK]
Verifying the Mitigated Design
[Flow: the same design flow, ending in output checking, with scrubbing (full or partial reconfiguration) and fault injection (fault tolerance verification)]
Flash-based: Actel
ProASIC3
Flash-based FPGA: CLB tile
Summary
Antifuse FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- protect against SET (SEU is protected by the vendor)
SRAM FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- Scrubbing to clean persistent faults
- protect against SET and SEU
- New FPGAs protected by the vendor are coming out!
Flash FPGAs:
- Fault tolerance techniques applied in VHDL/Verilog
- protect against SEU and SET
- Flash transistor sensitivity to SEE is low, still under
investigation
Outline

Radiation Effects on Digital ICs

Radiation Hardening by Design: Strategies for ASICs

Radiation Effects on FPGAs

Radiation Hardening by Design: Strategies for FPGAs

Final Remarks
Final Remarks

• Mitigation techniques for ASICs and FPGAs must take
into account SEUs and SETs, considering single and
multiple effects.
• ASICs: integrated systems fabricated in nanometer
technologies should have mitigation techniques at
different levels to ensure robustness:
– Charge dissipation (transistor resizing, capacitors,
resistors)
– Sensors (bulk-BICS)
– Hardware and time redundancy
– Error-correction codes (ECCs)
– Self-checking and recomputation
Final Remarks



• FPGAs: new FPGA generations bring more flexibility and
design capabilities, but also more challenges for reliable design.
• The design can always be protected by high-level
techniques (VHDL, Verilog) such as TMR.
• In order to reduce the cost of TMR, solutions at the FPGA
architectural level are needed in:
– CLB logic:
  Combinational blocks
  Sequential blocks
  Programmable switches
– Routing programmable switches
… to mitigate against SEU and SET!
Conferences
NSREC –
IEEE Nuclear and Space Radiation Effects Conference
www.nsrec.com
RADECS
European Conference on Radiation Effects on Components
and Systems
www.radecs.org
2011- RADECS in Sevilla, SPAIN
Schools
SERESSA
First: 2006 - Manaus - Brazil
Second: 2007 - Sevilla - Spain
Third: 2008 - Buenos Aires - Argentina
Fourth: 2009 - Florida, USA
2010 - France
2011 - Brazil
Takasaki, Japan
December 2-4 th, 2009
TECHNICAL PROGRAM
9
10
AM
11
Registration
Welcome
Robert Ecoffet (CNES)
& Pascal Fouillat (IMS)
Environments & Anomalies
Daniel Loveless (Vanderbilt Univ.)
Basics
Michel Pignol (CNES)
System hardening
TBD (ONERA)
Radiation testing
Dale McMorrow (NRL)
& Vincent Pouget (IMS)
Laser testing
TBD (JAXA)
TBD
Massimo Violante (Polito)
Software hardening
Sarah Armstrong (NSWC)
Basics
Ron Schrimpf (Vanderbilt Univ & ISDE)
Single event effects
Fernanda Lima-Kastensmidt (UFRGS)
SEU & SET in FPGA
Tour of JAEA Radiation testing facilities
Raoul Velazco (TIMA)
Experiments & Rate prediction
Guy Berger (UCL)
& Paul Peronnard (TIMA)
Remote Heavy Ion testing
TBD (JAEA)
TBD
12
1
2
3
PM
Hugh Barnaby (ASU)
Total dose effects
TBD (LIMMS)
MEMS in space applications
4
5
Philippe Adell (JPL)
Rad effects
Power systems
Vincent Pouget (IMS)
Remote laser testing
Conclusions
Bob Walters (NRL)
Radiation effects in solar cells