Complex Upset Mitigation Applied to a
Re-Configurable Embedded Processor
EEL 6935
Lu Hao
Wenqian Wu
Outline
• Issues of SRAM-based FPGA used for space application
• Upset mitigation solutions
• Resource usage and performance analysis
• Summary
System on Programmable Chip
• A soft-core processor implemented in an SRAM-based FPGA is very attractive to spacecraft designers: a complete computer system can be created on a single FPGA chip.
MicroBlaze core
• MicroBlaze is a soft processor core designed for Xilinx FPGAs.
• Many aspects of the MicroBlaze can be user configured: cache size, pipeline depth (3-stage or 5-stage), embedded peripherals, memory management unit, and bus interfaces.
[Figure: MicroBlaze core with its on-chip peripheral bus and local memory bus]
Space application issues
• Radiation environment
  In space, high-energy ionizing particles exist as part of the natural background. In addition, there are solar particle events and high-energy protons trapped in the Earth's magnetosphere (the Van Allen radiation belts). This radiation poses a potential threat to electronic devices.
• Single Event Upset (SEU)
  An SEU is a change of state caused by ions or electromagnetic radiation striking a sensitive node in a microelectronic device, such as a microprocessor, semiconductor memory, or power transistor. The state change results from the free charge created by ionization in or close to an important node of a logic element (e.g. a memory "bit").
• FPGAs are susceptible to SEUs in:
  – data and instructions stored in block memory
  – configuration bits stored in distributed RAM
• Upset mitigation is one of the key issues in SRAM-based FPGA design for space applications
Proposed upset mitigation
• To ensure reliable space applications based on SRAM FPGAs, the author investigates three levels of upset mitigation:
– Functional-block design triplication
– Continuous external configuration scrubbing
– Independent internal BRAM scrubbing (also triplicated)
Tool, device and environment
• Tools:
  Xilinx TMR tool: makes it easy to trade off maximum radiation-effect immunity against area, pinout, and board layout considerations.
• Device:
  Xilinx Virtex-II XQR2V6000 FPGA
• Program running on the MicroBlaze:
  Integer-based FFT
• Test environment:
  Crocker Nuclear Laboratory at the University of California, Davis, using a 63.3 MeV proton beam.
• Test board:
  Two FPGAs: one is the device under test (DUT), the other is the service FPGA
DUT and Service FPGA
• The service FPGA performs two functions:
  1) configuration readback of the DUT, and scrubbing of the DUT when a readback error is found
  2) control and monitoring of the functional operation of the MicroBlaze running the FFT program
• The program (FFT) is stored in internal BRAM each time the DUT is configured.
• Data is sent to the DUT's internal BRAM by the service FPGA.
• The results of the FFT program are returned to the service FPGA and compared with the expected results.
[Figure: test setup with the service FPGA and the DUT (uBlaze, BRAM)]
Upset Mitigation
• Mitigation solution
1. Functional-block design triplication
2. Continuous external configuration scrubbing
3. Independent internal BRAM scrubbing (also triplicated)
TMR
• Triple Module Redundancy
  Three modules perform the same task, and the voter passes only the majority value to the output.
TMR
If any one of the three modules fails, the other two mask and correct the fault. If the voter fails, the complete system fails. The voter is therefore a critical component, and in a good TMR system it should be much more reliable than the other components.
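The majority vote described above can be sketched in a few lines of Python. This is only an illustration of bitwise 2-of-3 voting; the function name `majority_vote` and the 16-bit example values are assumptions and do not come from the slides or from the Xilinx TMR tool.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit agrees with at least two inputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 16-bit output shared by the three redundant modules.
good = 0b1010_1100_0011_0101

# One module suffers a simulated upset: a single flipped bit.
upset = good ^ (1 << 7)

# The two healthy modules outvote the upset one.
voted = majority_vote(good, upset, good)
print(voted == good)
```

A single faulty module is masked regardless of which bit is flipped, but two modules failing on the same bit would outvote the healthy one, which is why the scheme is paired with scrubbing.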
Xilinx TMR
External Configuration Scrubbing
• Configuration scrubbing is the process of rewriting the configuration memory of an FPGA to correct any errors that may have accumulated since the device was last configured.
• The service FPGA detects readback errors and scrubs the configuration by reloading the bitstream to correct upsets.
• Transparent process: normal device operation continues concurrently and without interruption.
• Configuration scrubbing frequency: 16 MHz, i.e. 4 scrub cycles per second
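The readback-then-scrub behaviour can be illustrated with a small software model. This is a sketch under stated assumptions: the configuration memory is modelled as a Python list of frame words, and `readback_has_error`, `scrub`, and `scrub_cycle` are hypothetical names. The real service FPGA works through the DUT's configuration interface, which is not modelled here.

```python
from typing import List


def readback_has_error(config: List[int], golden: List[int]) -> bool:
    """Readback check: compare the live configuration with the golden bitstream."""
    return any(live != ref for live, ref in zip(config, golden))


def scrub(config: List[int], golden: List[int]) -> None:
    """Scrub: reload the golden bitstream over the live configuration, in place."""
    config[:] = golden


def scrub_cycle(config: List[int], golden: List[int]) -> bool:
    """One cycle: read back, and scrub only if a mismatch was found.

    Returns True when an error was detected and corrected.
    """
    if readback_has_error(config, golden):
        scrub(config, golden)
        return True
    return False


# Hypothetical 8-word configuration with one simulated upset.
golden = [0xA5] * 8
config = list(golden)
config[3] ^= 0x10

had_error = scrub_cycle(config, golden)
print(had_error, config == golden)
```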
Independent internal BRAM scrubbing
BRAM Triplication
Port A: used by the MicroBlaze processor
Port B: connected to a counter; used for error detection and correction
BRAM Triplication
• TMR counter
  – Allows continuous refreshing of the BRAM contents
  – Cycles through the memory addresses by incrementing the BRAM address on the second port
  – When the first port of the BRAM is not in use, it rewrites the BRAM content at the current address with the voted value from the associated voter (TRV16)
• BRAM
  – Conventional BRAM
• Associated voter (TRV16)
  – Compares the three values read from the same address of the three BRAMs, selects the majority, and writes it back to the corresponding address
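The interaction between the counter, the three BRAM copies, and the voter can be sketched as a small simulation. This is an illustrative model only: the class `BramScrubber`, its `tick` method, and the `port_a_busy` flag are assumptions, and the counter is modelled as a single integer rather than a triplicated one.

```python
from typing import List


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three words."""
    return (a & b) | (a & c) | (b & c)


class BramScrubber:
    """Three BRAM copies refreshed through a second port driven by a counter."""

    def __init__(self, contents: List[int]) -> None:
        self.brams = [list(contents) for _ in range(3)]
        self.addr = 0  # counter value driving the second port

    def tick(self, port_a_busy: bool = False) -> None:
        # Rewrite the voted value only when the processor port is idle.
        if not port_a_busy:
            voted = vote(*(bram[self.addr] for bram in self.brams))
            for bram in self.brams:
                bram[self.addr] = voted
        # The counter always advances and wraps around the address space.
        self.addr = (self.addr + 1) % len(self.brams[0])


# Hypothetical 4-word memory image with one upset in the second copy.
image = [0x1234, 0xBEEF, 0x0F0F, 0xAAAA]
scrubber = BramScrubber(image)
scrubber.brams[1][2] ^= 0x0100

# One full pass over the address space repairs the corrupted word.
for _ in range(len(image)):
    scrubber.tick()
print(all(bram == image for bram in scrubber.brams))
```

In this model an upset stays in place until the counter next reaches its address with the processor port idle, so the time to repair depends on the memory depth and on how busy the first port is.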
Testing
• Two mitigated versions of the MicroBlaze design architecture were implemented and tested:
  – with the BRAM scrubber
  – without the BRAM scrubber
• Error types:
  – Type 1 errors: FFT outputs were wrong.
    • Type 1a: Corrected after a configuration scrub cycle
    • Type 1b: Not corrected after a scrub cycle, even after a reset of the DUT design
  – Type 2 errors: Nonresponsiveness of the DUT, requiring a reset and synchronization
    • Type 2a: Corrected by scrubbing, and hence referred to as a recovering reset
    • Type 2b: Not corrected by scrubbing, and referred to as a runaway reset. This is the error type we focus on.
      – A runaway reset is an uncorrected error condition that causes the functional monitor to continually attempt to reset the MicroBlaze processor each time the watchdog timer set for the handshaking between the two FPGAs reaches its limit value.
  – Type 3 errors: Occurrence of an exception or interrupt detection.
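The error taxonomy above can be written down as a small labelling function. This is a hypothetical sketch: the function `classify_event` and its boolean inputs are assumptions about what a functional monitor might observe, and the slides do not describe how the service FPGA actually records these events. An event may carry more than one label, since an exception can accompany a reset.

```python
from typing import List


def classify_event(output_wrong: bool,
                   unresponsive: bool,
                   fixed_by_scrub: bool,
                   exception_seen: bool = False) -> List[str]:
    """Return the error-type labels that apply to one observed event."""
    labels = []
    if output_wrong:
        # Type 1: wrong FFT output; 1a if a scrub cycle corrected it.
        labels.append("1a" if fixed_by_scrub else "1b")
    if unresponsive:
        # Type 2: DUT stopped responding; 2a recovering, 2b runaway reset.
        labels.append("2a" if fixed_by_scrub else "2b")
    if exception_seen:
        # Type 3: an exception or interrupt was detected.
        labels.append("3")
    return labels


print(classify_event(output_wrong=True, unresponsive=False, fixed_by_scrub=True))
print(classify_event(output_wrong=False, unresponsive=True,
                     fixed_by_scrub=False, exception_seen=True))
```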
[Test results: design without the BRAM scrubber; design with the BRAM scrubber]
Is BRAM code corruption the main cause of runaway resets?
Standalone test
• To check whether BRAM code corruption is a likely cause of these runaway resets, the BRAM mitigation design was implemented in standalone mode and tested under proton beams at similar fluxes and at the same facility.
Runaway Resets Caused by BRAM Corruption
• At the lower flux (1.70×10⁸), at least 17% (1.21×10⁻¹¹ / 6.82×10⁻¹¹) of the runaway resets are due to errors in the BRAM code, while at the higher flux (1.70×10⁹), 23% of them are caused by code corruption.
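The 17% figure can be checked directly from the two quoted values, read here as 1.21×10⁻¹¹ and 6.82×10⁻¹¹. The slide does not label these quantities, so the variable names below are assumptions: the first is taken to be the value for runaway resets traced to BRAM code corruption in the standalone test, and the second the value for all runaway resets.

```python
# Values quoted on the slide; their units and exact meaning are not stated there.
bram_code_value = 1.21e-11      # assumed: runaway resets due to BRAM code errors
all_runaway_value = 6.82e-11    # assumed: all runaway resets

fraction = bram_code_value / all_runaway_value
print(f"{fraction:.1%}")  # prints 17.7%
```

The ratio is about 17.7%, consistent with the "at least 17%" stated for the lower flux.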
Exceptions Caused by BRAM Runaway Resets
• Design 1: An average of 64% of the unrecovered resets (due to BRAM code corruption) were detected by exceptions (64% at flux 1 and 80% at flux 2).
• Design 2: Exceptions were observed only after an increase of two orders of magnitude in the flux (1.70×10⁹), and only 25% of the runaway resets were detected.
• Not all illegal states are detected by the exception mechanism.
  – At the lower flux (1.70×10⁸), although seven resets were observed, no exceptions were detected.
• The MicroBlaze was optimized to fit in Xilinx FPGAs, and its exception circuitry was designed to detect only major illegal operations.
Conclusion
• Issues of SRAM-based FPGAs used for space applications
  – Single Event Upsets (SEUs) can be caused by the radiation environment
  – A fault-tolerant system is therefore needed
• Complete upset mitigation solution implemented on a Xilinx Virtex-II FPGA
  – Continuous external configuration scrubbing
  – Functional-block design triplication
  – Independent internal BRAM scrubbing (also triplicated)
• Testing results
  – BRAM code corruption is a significant cause of runaway resets (at least 17% to 23% of them in the standalone tests)
Thanks!
Questions?