SEE Validation of SEU Mitigation Methods for FPGAs
Carl Carmichael (1), Sana Rezgui (1), Gary Swift (2), Jeff George (3), & Larry Edmonds (2)
(1) Xilinx Corporation, San Jose CA
(2) Jet Propulsion Laboratory, Pasadena CA
(3) Aerospace Corporation, Albuquerque NM
"This work was carried out in part by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and
Space Administration."
"Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or
imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology."
P201-L/MAPLD2005
XTMR SEE Testing
• Experiments were devised to focus TMR mitigation on
major architectural elements of the Virtex-II FPGA.
– Sequential State-Machines were created with Registers,
Multipliers, and Memories
• Configurable Logic Block
– Combinatorial Logic, Sequential Logic, Arithmetic, Multiplexing.
– Design implementation is an array of counters.
• Multipliers
– Dedicated 18 x 18 bit multiply function blocks.
– Design implementation is an array of Multiply-and-Accumulate functions.
• Block Memories
– Synchronous Dual Port 18k bit RAM blocks.
– First design is a large memory block rewritten externally.
– Second design is implemented as an array of ROMs initialized to
incrementing values with internal EDAC.
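The counter design above can be pictured with a small behavioral model. The sketch below is an illustration only, not the XTMR netlist: it assumes three redundant copies of a counter register and a bitwise majority voter in the feedback path, and the names `majority`, `TmrCounter`, and `inject_upset` are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant words."""
    return (a & b) | (a & c) | (b & c)


class TmrCounter:
    """Behavioral model of one triplicated counter (assumed structure)."""

    def __init__(self, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.copies = [0, 0, 0]  # one register copy per redundant module

    def value(self) -> int:
        """Voted output of the three copies."""
        return majority(*self.copies)

    def step(self) -> int:
        """Advance one clock: every copy reloads from the voted value + 1."""
        nxt = (self.value() + 1) & self.mask
        self.copies = [nxt, nxt, nxt]
        return nxt

    def inject_upset(self, copy_index: int, bit: int) -> None:
        """Flip one bit in one copy to mimic a single event upset."""
        self.copies[copy_index] ^= 1 << bit


if __name__ == "__main__":
    counter = TmrCounter(width=4)
    for _ in range(3):
        counter.step()
    counter.inject_upset(copy_index=1, bit=2)  # single upset is outvoted
    print(counter.value(), counter.copies)
```

A single upset is masked by the vote and cleared on the next clock. Upsets hitting the same bit in two copies before the next vote defeat the voter, which is the failure mode the beam tests are designed to provoke.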
Plot Definitions
• Predicted SEFI cross-section
– Static and Dynamic SEE Characterization of the Virtex-II FPGA revealed several Single Event Functional
Interrupt Modes: POR (2.5E-06), SMAP (1.72E-06), IOB (4.2E-06)
– These combined cross-sections represent the minimum functional error cross-section for a single Virtex-II
(XQR2V6000) device on orbit.
• Worst Case Orbital Upset Rate
– The CREME96 calculation of the worst case orbital upset rate for an XQR2V6000 is 7,740 bit-errors/day (9E-02
bit-errors/sec) in a GEO orbit at 36,000 km during the worst day of an Anomalously Large Solar Flare,
accounting for both heavy ions and protons. In a 40 MeV Kr beam the same upset rate is achieved with
a flux of 1.25E-01 p/cm2/s. The equivalent upset rates for all other orbits and solar
conditions therefore lie to the LEFT of this line.
• Single Event Functional Interrupts
– This is the average cross-section of the SEFIs observed while collecting the data represented in the plot.
This cross-section is not flux dependent. Variations from the predicted value reflect the limited statistical
significance of the total fluence accumulated during each test.
• Functional Errors
– Data plot of the observed events in which the Device Under Test returned an incorrect result. The cross-section is
the number of error events divided by the total fluence at the specified flux. TMR denotes that the
DUT design was fully mitigated with XTMR and scrubbing. The Unmitigated results were obtained with a
functionally identical design without XTMR; scrubbing was still used for the unmitigated test.
• Extrapolation
– A derived function describing the mitigation failure rate as a function of upset rate. Extending
the function to worst case orbital upset rates predicts functional error cross-sections lower than the SEFI
cross-sections.
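The arithmetic behind these definitions can be checked with a short sketch. It assumes the three SEFI values are in cm2/device (the slide gives no unit, so this is inferred from the Sigma axis of the plots) and that the combined value is their plain sum. The function names are hypothetical.

```python
# SEFI mode cross-sections quoted above; unit assumed to be cm2/device.
SEFI_MODES = {"POR": 2.5e-06, "SMAP": 1.72e-06, "IOB": 4.2e-06}

SECONDS_PER_DAY = 86400.0


def predicted_sefi_cross_section(modes=SEFI_MODES):
    """Combined SEFI cross-section, assuming the modes simply add."""
    return sum(modes.values())


def per_day_to_per_second(rate_per_day):
    """Convert an upset rate from events/day to events/second."""
    return rate_per_day / SECONDS_PER_DAY


def cross_section(error_events, fluence):
    """Cross-section (cm2) = observed error events / fluence (particles/cm2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return error_events / fluence


if __name__ == "__main__":
    print(f"combined SEFI sigma: {predicted_sefi_cross_section():.3e} cm2/device")
    print(f"7,740 bit-errors/day = {per_day_to_per_second(7740):.3e} bit-errors/sec")
```

Under these assumptions the combined SEFI cross-section is about 8.4E-06 cm2/device, and 7,740 bit-errors/day converts to roughly 9E-02 bit-errors/sec, matching the figure quoted above.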
PLOT 1
XQR2V6000 Mitigation Error Statistics
(CLB/IOB Logic: State-Machines)
[Plot: Sigma (cm2/device), 1.00E-07 to 1.00E-02, versus Beam Flux (particles/cm2/s), 1.E-02 to 1.E+04. Second horizontal axis: Configuration Bit Errors per Scrub Cycle, 3.5E-02 to 3.5E+03. Beam: 40 MeV Kr, LET = 22.3 MeV·cm2/mg.]
Series: Unmitigated Functional Errors; TMR Functional Errors; Extrapolation (square root function); Single Event Functional Interrupts (SEFIs); Worst Case Orbital Upset Rate (9E-2 Upsets/Sec); Predicted SEFI Cross-Section.
Annotations:
• 36,000km GEO Orbit, Worst Day Solar Flare: 8,000 bit-errors/day. All other orbits.
• SEFIs drive error rate for all designs and all orbits.
• Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
PLOT 2
XQR2V6000 Mitigation Error Statistics
(Dedicated Multipliers: Multiply-and-Accumulate)
[Plot: Sigma (cm2/device), 1.00E-07 to 1.00E-02, versus Beam Flux (particles/cm2/s), 1.E-02 to 1.E+05. Second horizontal axis: Configuration Bit Errors per Scrub Cycle, starting at 3.5E-02. Beam: 40 MeV Kr, LET = 22.3 MeV·cm2/mg.]
Series: Unmitigated Functional Errors; TMR Functional Errors; Extrapolation (square root function); Single Event Functional Interrupts (SEFIs); Worst Case Orbital Upset Rate (9E-2 Upsets/Sec); Predicted SEFI Cross-Section.
Annotations:
• 36,000km GEO Orbit, Worst Day Solar Flare: 8,000 bit-errors/day. All other orbits.
• SEFIs drive error rate for all designs and all orbits.
• Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
PLOT 3
XQR2V6000 Mitigation Error Statistics
(Block Memory: Read/Write)
[Plot: Sigma (cm2), 1.00E-12 to 1.00E+02, versus Beam Flux (particles/cm2/s), 1.E-02 to 1.E+05. Second horizontal axis: Configuration Bit Errors per Scrub Cycle, starting at 3.5E-02. Beam: 40 MeV Kr, LET = 22.3 MeV·cm2/mg.]
Series: TMR Functional Errors; Extrapolation (square root function); Single Event Functional Interrupts (SEFIs); Worst Case Orbital Upset Rate (9E-2 Upsets/Sec); Predicted SEFI Cross-Section.
Annotations:
• 36,000km GEO Orbit, Worst Day Solar Flare: 8,000 bit-errors/day. All other orbits.
• SEFIs drive error rate for all designs and all orbits.
• Mitigation errors on orbit are always less than SEFI errors by orders of magnitude.
Improved SEE Test Methodology for Mitigation
• The functional error rate of a mitigated system is expected to depend on the upset rate in a
physically predictable way. The expected relationship gives the increasing probability of upsetting
a combination of bits that causes a mitigated (TMR) system to fail, as a function of bit upset rate:
$$R = \frac{1}{T_C} - \frac{1}{T_C} \prod_{i=1}^{M} \left[ 3\exp(-2 N_i r T_C) - 2\exp(-3 N_i r T_C) \right]$$
– R = Mitigation Error Rate
– M = Number of groups of relevant bits
– N_B = Average number of relevant bits per group (N_i in the equation is the value for group i)
– T_C = Scrub Time
– r = Upset Rate of relevant bits
• Testing at extremely high fluxes, varied over several orders of magnitude, can therefore
reveal this functional relationship between mitigation error rate and bit upset rate.
• This function can then be extrapolated to make predictions at the much lower
upset rates of earth orbits.
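A minimal sketch of how the relationship above could be evaluated numerically follows. It assumes the reading R = (1/T_C)(1 - product over groups of [3exp(-2 N_i r T_C) - 2exp(-3 N_i r T_C)]), and rewrites each bracket as 1 - q^2(3 - 2q) with q = 1 - exp(-N_i r T_C). The rewrite is algebraically identical and avoids cancellation at small r. The function name is hypothetical; the example parameters are those listed on the "Probability Function Fit for Counter Data" slide.

```python
import math


def mitigation_error_rate(r, bits_per_group, scrub_time):
    """Mitigation error rate R (errors/second) for TMR with scrubbing.

    r              -- upset rate of relevant bits (upsets per bit-second)
    bits_per_group -- iterable of N_i, one entry per group
    scrub_time     -- scrub cycle time T_C in seconds
    """
    log_survival = 0.0
    for n_i in bits_per_group:
        # q: probability that one block of N_i bits is upset within T_C.
        q = -math.expm1(-n_i * r * scrub_time)
        # Probability that two or three blocks of this group are upset.
        p_fail = q * q * (3.0 - 2.0 * q)
        if p_fail >= 1.0:
            return 1.0 / scrub_time  # saturated: an error in every cycle
        log_survival += math.log1p(-p_fail)
    # 1 - prod(1 - p_fail), computed stably from the summed logarithms.
    return -math.expm1(log_survival) / scrub_time


if __name__ == "__main__":
    groups = [200] * 9224  # M = 9224 groups, N_i = 200 bits each
    for r in (1e-7, 1e-6, 1e-5, 1e-4, 1e-3):
        rate = mitigation_error_rate(r, groups, scrub_time=0.266)
        print(f"r = {r:.0e}  R = {rate:.3e} errors/s")
```

At high upset rates the result saturates at 1/T_C, one error per scrub cycle, which is why the curve flattens once the flux overwhelms the mitigation.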
Mitigation System Topology
[Diagram: three redundant modules, each divided into M blocks. Group i consists of the three blocks (i,1), (i,2), (i,3), one per module, each holding N_i bits.]

           Module 1      Module 2      Module 3      Bits per block
Group 1    Block (1,1)   Block (1,2)   Block (1,3)   N1
Group 2    Block (2,1)   Block (2,2)   Block (2,3)   N2
...        ...           ...           ...           ...
Group M    Block (M,1)   Block (M,2)   Block (M,3)   NM
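The topology lends itself to a quick Monte Carlo cross-check. The sketch below rests on assumptions not stated on the slide: upsets arrive as a Poisson process, scrubbing clears every upset at the end of each cycle, and the system fails in a cycle when two or more blocks of the same group are upset. All names are hypothetical.

```python
import math
import random


def block_upset_probability(n_bits, r, scrub_time):
    """Probability that a block of n_bits has at least one upset in T_C."""
    return -math.expm1(-n_bits * r * scrub_time)


def analytic_cycle_failure(bits_per_group, r, scrub_time):
    """Per-cycle failure probability from the 2-of-3 voting argument."""
    survive = 1.0
    for n_i in bits_per_group:
        q = block_upset_probability(n_i, r, scrub_time)
        survive *= 1.0 - q * q * (3.0 - 2.0 * q)
    return 1.0 - survive


def simulate_cycle_failure(bits_per_group, r, scrub_time, cycles, seed=0):
    """Fraction of simulated scrub cycles in which any group is defeated."""
    rng = random.Random(seed)
    probs = [block_upset_probability(n, r, scrub_time) for n in bits_per_group]
    failures = 0
    for _ in range(cycles):
        for q in probs:
            # Draw the three blocks (one per module) of this group.
            upset_blocks = sum(rng.random() < q for _ in range(3))
            if upset_blocks >= 2:
                failures += 1
                break  # one defeated group fails the whole cycle
    return failures / cycles


if __name__ == "__main__":
    groups = [10] * 5  # toy example: M = 5 groups of 10 bits per block
    print(analytic_cycle_failure(groups, r=0.03, scrub_time=1.0))
    print(simulate_cycle_failure(groups, r=0.03, scrub_time=1.0, cycles=20000))
```

Dividing the per-cycle failure probability by the scrub time gives an error rate in errors/second. The toy parameters are deliberately far from the real device so that failures are frequent enough to count in a short run.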
Probability Function Fit for Counter Data
[Plot: R (system errors/second), 0.0001 to 10, versus r (bit errors/bit-second), 1e-7 to 1e-3.]
Fit parameters:
• M=9224
• Ni=200 (same number of bits in each block)
• Sigma per bit = 2.1E-8 cm2
• TC=0.266 sec
Series: Counters data; fit using exact equation; small-r form extrapolated.
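The "small-r form" is not written out on the slide. One plausible reading is the leading term of the exact expression: for N_i r T_C much less than 1, each bracket 3exp(-2x) - 2exp(-3x) expands to 1 - 3x^2, so with equal N_i the rate becomes R ≈ 3 M N^2 r^2 T_C. The sketch below evaluates that form with the fit parameters above. It assumes the per-bit upset rate is r = (Sigma per bit) x (beam flux), and it takes the flux of 1.25E-01 p/cm2/s quoted earlier as the worst case orbital equivalent. The function names are hypothetical and the output is illustrative arithmetic, not a result reported by the authors.

```python
def upset_rate_per_bit(flux, sigma_per_bit):
    """Per-bit upset rate r (upsets/bit-second), assuming r = sigma * flux."""
    return flux * sigma_per_bit


def small_r_error_rate(r, n_groups, bits_per_block, scrub_time):
    """Leading-order mitigation error rate, valid for N * r * T_C << 1.

    Assumes every group has the same number of bits per block.
    """
    return 3.0 * n_groups * bits_per_block ** 2 * r ** 2 * scrub_time


if __name__ == "__main__":
    M, N, SIGMA_BIT, T_C = 9224, 200, 2.1e-8, 0.266  # fit parameters above
    orbit_flux = 1.25e-1  # p/cm2/s, worst case orbital equivalent (assumed)

    r = upset_rate_per_bit(orbit_flux, SIGMA_BIT)
    rate = small_r_error_rate(r, M, N, T_C)
    print(f"r = {r:.3e} upsets/bit-s, R = {rate:.3e} errors/s")
```

Because R scales with r squared in this regime, each decade of reduction in flux lowers the predicted mitigation error rate by two decades.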
Conclusions
• The efficiency and accuracy of validating mitigation
techniques are greatly improved by demonstrating the
upset rate dependency of the mitigation method, which is done by
testing at flux rates that overwhelm the mitigation.
• The static SEFI cross-section is the dominant factor in
calculating orbital error rates for any Virtex-II design
mitigated with full XTMR and scrubbing.
• Additional Work
– Self-Scrubbing BlockRAMs
– Self-Scrubbing FPGA Configuration
– Soft-core processors (e.g. MicroBlaze)