Transcript Slide 1

SEFI Mitigation Technique for COTS Microprocessors – Proton Testing Demonstration
D. Czajkowski, P. Samudrala, D. Strobel, and M. Pagey
Space Micro Inc.,
Suite 400
San Diego, CA 92121
Czajkowski
1
MAPLD 2005/139
Overview
➢ Single Event Functional Interrupts (SEFI) introduction
➢ H-Core chip for SEFI mitigation in microprocessors
➢ Tests/Results for Pentium P-III processor
➢ Tests/Results for TI-DSP Processor
➢ Tests/Results for BSP-15 Processor
➢ Summary
Single Event Functional Interrupts
➢ EIA/JEDEC Standard No. 57 (1996): “The loss of functionality of the
device that does not require cycling of the device's power to restore
operability, unlike SEL, and does not result in permanent damage as in SEB”
➢ SEFIs observed in various complex integrated circuits: EEPROMs,
DRAMs, ADC/DACs, and Microprocessors.
➢ Most common solution for SEFIs is to power cycle the system:
“Even single bit-flips can create circuit-level effects that cause a string of
errors, or even result in a ‘lock-up’ condition that requires removal of
power and subsequent re-initialization to resume proper operation”
SEFIs in Microprocessors
➢ SEFI Characteristics
✔ Processor Hangs Suddenly
➢ Probable causes of “Hangs”
✔ Illegal branching
✔ Upsets in program counter of the CPU
✔ Jumps to undefined/test states
➢ Approx. rates: 1 per 100 days for the SOI PowerPC and 1 per 10 days for the
CMOS version
➢ Current solution is to power cycle the system
➢ Results in unnecessary delays and data loss
H-Core Technique
➢ H-Core
✔ Combination of Software and Hardware
✔ Monitors CPU Functionality
✔ Stores rollback information
✔ Detects and indicates SEFI occurrences
✔ Revives CPU from SEFI events
➢ H-Core
✔ Sends CPU alive messages
✔ Saves periodic rollback information
✔ Reads SEFI indicator from H-Core chip, and
✔ Recovers running processes after SEFI events
[Block diagram labels: CPU, Bus Controller, Memory, Ethernet, SCSI HBA, H-Core Chip]
The H-Core Chip
➢ Manufactured using rad-hard components
➢ Usable with any processor
➢ Provides min. 8 interrupt signals
➢ Uses MOSFET driver for power cycle
➢ Provides variable levels and pulse widths of interrupts
➢ Contains programmable CPU check timer
➢ Sets SEFI status signal for SEFI recovery software
➢ Provides external reset control
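The interplay between the "CPU alive" messages, the programmable CPU check timer, and the SEFI status signal can be pictured with a small model. This is only a sketch under assumptions: the class name CheckTimer, its methods, and the timeout value are hypothetical and are not taken from the H-Core design. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass


@dataclass
class CheckTimer:
    """Toy model of a programmable CPU check timer (hypothetical, not the H-Core design)."""

    timeout_s: float            # programmable interval allowed between alive messages
    last_alive_s: float = 0.0   # time of the most recent alive message
    sefi_status: bool = False   # latched flag that recovery software would read

    def alive(self, now_s: float) -> None:
        """Record a 'CPU alive' message sent by the software side."""
        self.last_alive_s = now_s

    def poll(self, now_s: float) -> bool:
        """Latch the SEFI status flag if the CPU has been silent longer than the timeout."""
        if now_s - self.last_alive_s > self.timeout_s:
            self.sefi_status = True
        return self.sefi_status

    def clear(self, now_s: float) -> None:
        """Called by recovery software once the event has been handled."""
        self.sefi_status = False
        self.last_alive_s = now_s


timer = CheckTimer(timeout_s=2.0)
timer.alive(now_s=1.0)
print(timer.poll(now_s=2.5))  # silent for 1.5 s, within the timeout: False
print(timer.poll(now_s=4.0))  # silent for 3.0 s, flag latches: True
```

The flag is latched rather than recomputed so that a late alive message cannot hide an event before the recovery software has seen it; whether the real chip behaves this way is not stated on the slides.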
Radiation Tests
➢ Radiation tests on three different processors were performed at Crocker
Nuclear Laboratory at UC Davis
➢ Processors used
✔ Pentium P-III
✔ TI TMS320C6713 DSP
✔ Equator BSP-15 DSP
➢ Each processor was irradiated to induce a SEFI
➢ H-Core circuit was then used to mitigate the induced SEFIs
➢ The following slides discuss the tests
Irradiation Test Procedure
➢ Verify test loop results without radiation
➢ Start irradiation and monitor test loop results
➢ Stop irradiation when incorrect/no test loop results received
➢ Assert H-Core signals in sequence to revive the processor
➢ CPU assumed to be fully recovered when it responds to a signal
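The last two steps of the procedure (assert signals in sequence, stop once the CPU responds) amount to a simple escalation loop. The sketch below is an illustration only: the function revive, the callbacks, and the simulated CPU are hypothetical, and the signal order shown is an assumption rather than the order used in the tests.

```python
from typing import Callable, List, Optional, Sequence


def revive(
    signals: Sequence[str],
    assert_signal: Callable[[str], None],
    cpu_responds: Callable[[], bool],
) -> Optional[str]:
    """Assert signals in order; return the first one after which the CPU responds."""
    for name in signals:
        assert_signal(name)
        if cpu_responds():
            return name
    return None  # no signal in the list revived the CPU


# Simulated hung CPU that only wakes up on RESET# (made-up behaviour for the demo).
asserted: List[str] = []
state = {"alive": False}


def fake_assert(name: str) -> None:
    asserted.append(name)
    if name == "RESET#":
        state["alive"] = True


winner = revive(["IRQ5", "NMI", "RESET#"], fake_assert, lambda: state["alive"])
print(winner, asserted)  # RESET# ['IRQ5', 'NMI', 'RESET#']
```

Returning the name of the signal that worked makes it easy to log which signal was effective for each SEFI, which is the kind of per-signal result the later slides report.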
Radiation test of Intel PIII
➢ SEFI test performed by irradiating Pentium P-III processor
➢ H-Core circuit was used to recover the processor from the SEFI
➢ Test performed at UC Davis, California
➢ Used a test program containing an infinite loop of arithmetic operations
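A test program of this kind can be pictured as a known-answer loop: the same arithmetic is repeated and each pass is compared against a reference computed beforehand, so a wrong or missing report signals trouble. The sketch below is an assumption about the general shape only; the workload, the function names, and the bounded iteration count are hypothetical and do not reproduce the actual test program.

```python
import math
from typing import Iterator, Tuple


def workload(n: int) -> float:
    """Fixed arithmetic workload (assumed for illustration)."""
    return sum(math.sqrt(k) * k for k in range(1, n + 1))


def arithmetic_loop(n: int, iterations: int) -> Iterator[Tuple[int, bool]]:
    """Repeat the workload and report whether each pass matches the reference."""
    reference = workload(n)  # computed once, before irradiation would start
    for i in range(iterations):  # the real loop is infinite; bounded here for the demo
        yield i, workload(n) == reference


for i, ok in arithmetic_loop(n=1000, iterations=3):
    print(f"pass {i}: {'OK' if ok else 'MISMATCH'}")
```

On healthy hardware every pass prints OK, because the computation is deterministic; a monitor would treat a MISMATCH line, or the absence of any line, as the cue to stop irradiation.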
Intel PIII Experimental Setup
➢ VSBC-8d Test Computer, software/hardware include:
✔ SEFI board and test loop
✔ Diagnostic self-tests
✔ Hardware watchdog
✔ Linux software watchdog
➢ Monitor Computer:
✔ Controls test loops,
✔ Collects test results, and
✔ Sends H-Core signals
[Diagram links: RS-232 Serial Console, Ethernet, H-Core Signals]
Test and Monitor Software
➢ VSBC-8d runs Linux OS
➢ Test loops:
✔ Mathematical functions test
✔ CPU timer test
✔ Network communication test
✔ IDE controller test
➢ Monitor software:
✔ Serial console and telnet
✔ Socket communication with test loops
✔ Data logging during irradiation
✔ H-Core signal generation after SEFI
[Screenshot: monitor software, with output and H-Core signal controls]
H-Core Signals for Intel P-III
➢ BINIT#: Bus state machine reset
➢ INIT#: Resets integer registers
➢ LINT0: General purpose interrupt signal
➢ IRQ5: Hardware interrupt through PCI bus
➢ LINT1: Non-maskable interrupt (NMI)
➢ RESET#: Intel PIII hardware reset signal
➢ Software, hardware, and APIC watchdogs
H-Core Signal Success Rate
➢ SEFI Occurrences and Recovery
✔ 21 SEFIs detected during experiment
✔ 21 SEFIs recovered using H-Core signals
✔ IRQ5, NMI, and RESET# most effective signals
✔ Presence of software, hardware, and APIC watchdogs aids recovery
TMS320C6713 Radiation Test
March 2004
➢ SEFI test performed by irradiating Texas Instruments' TMS320C6713
DSP
➢ A custom board called SEFI Switch was built
➢ SEFI Switch with controlling software (running on monitor computer)
was used as an H-Core chip for SEFI mitigation
➢ Test performed at UC Davis, California
➢ Used a test program containing an infinite loop of arithmetic operations
TI-DSP Experimental Setup
➢ TMS320C6713 DSP:
✔ Development Board
✔ Runs TTMR Test Loops
➢ Monitor Computer:
✔ Windows 2000 PC
✔ Monitors and controls the TI-DSP board
✔ Communicates using USB-JTAG link
[Diagram link: USB Communication]
Monitoring TI-DSP Execution
➢ CodeComposer allows remote monitoring of processor
➢ All processor registers can be observed during irradiation
➢ Test loop results are transmitted back to Monitor computer
➢ If the processor hangs, interrupts and reset are asserted
➢ List of interrupts for TI processor
✔ HD4
✔ INT7
✔ INT6
✔ INT5
✔ INT4
✔ NMI
Signatures of SEFI
[Screenshot callout: program counter at unexpected memory location]
➢ Typical SEFI signatures observed during radiation experiment:
✔ Jumps to arbitrary memory locations containing random data
✔ Execution of valid instructions that are not part of current program
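Both signatures come down to the program counter sitting outside the code that is supposed to be running, so one way to picture a check is a range test on the observed PC. This is a sketch under assumptions: the helper pc_in_program and the memory map are hypothetical, and the addresses are made up rather than taken from the TMS320C6713 memory layout.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]  # half-open address range [start, end)


def pc_in_program(pc: int, code_ranges: Sequence[Range]) -> bool:
    """True if the program counter lies inside one of the known code ranges."""
    return any(start <= pc < end for start, end in code_ranges)


# Hypothetical memory map: the test program occupies 0x1000 up to 0x1FFF.
TEST_PROGRAM = [(0x1000, 0x2000)]

print(pc_in_program(0x1A40, TEST_PROGRAM))  # inside the test program: True
print(pc_in_program(0x8F00, TEST_PROGRAM))  # outside it, a possible SEFI signature: False
```

A check like this flags a jump into random data and a jump into valid but unrelated code in the same way, since it looks only at where the PC is, not at what the instruction is.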
TI-DSP Radiation Test : Results
➢ A total of 9 SEFIs were observed
➢ Interrupts and reset were able to recover the processor in 7 cases
➢ Observed a success rate of 77.8% (7 of 9) in mitigating SEFIs
➢ Reset was the most effective signal
BSP-15 Radiation Test
August 2004
➢ Processor: Equator BSP-15 DSP processor
➢ Estimating the performance of the H-Core circuit by irradiating the BSP-15
processor at different fluxes of radiation
➢ Test Facility: UC Davis, California
➢ Input Program: a set of arithmetic operations running in an infinite loop
BSP-15 Experimental Setup
[Block diagram labels: Monitor Computer, Test Board, SEFI Switch, RS-232 (Serial Port), Ethernet (Ethernet Port), H-Core Signals]
BSP-15 SEFI Test
➢ SEFI Switch and the controlling program running on the Monitor
computer act as an H-Core chip
➢ Processor irradiated until it hangs
➢ The interrupts INTD, INTB, and INTA and the RESET signal are asserted
after a SEFI is detected
➢ Customized interrupt service routines are called when an interrupt is
asserted
BSP-15 Test Results
➢ The processor encountered 26 SEFIs during the test
➢ The interrupts were asserted in increasing order of severity
➢ Interrupts had a success rate of 11.5% in mitigating the SEFIs
➢ However, reset had a 100% success rate
➢ In all cases the processor was revived without power cycling the
board
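The slide gives the interrupt success rate only as a percentage. The short calculation below works backward from the two reported figures (26 SEFIs, 11.5%) to the whole-number count they imply. The count itself is an inference, not a number stated in the presentation, and the function and variable names are hypothetical.

```python
def success_rate(recovered: int, total: int) -> float:
    """Percentage of SEFIs recovered."""
    return 100.0 * recovered / total


TOTAL_SEFIS = 26               # reported on the slide
REPORTED_INTERRUPT_PCT = 11.5  # reported on the slide

# Which whole-number recovery counts round to the reported percentage?
candidates = [
    k for k in range(TOTAL_SEFIS + 1)
    if round(success_rate(k, TOTAL_SEFIS), 1) == REPORTED_INTERRUPT_PCT
]
print(candidates)  # [3]

by_interrupt = candidates[0]
print(f"recovered by interrupts: {by_interrupt}")          # 3
print(f"left for RESET: {TOTAL_SEFIS - by_interrupt}")      # 23
```

Only one count fits, so under this reading roughly 3 of the 26 events were cleared by interrupts alone and the remaining 23 needed RESET.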
Conclusions
✔ Anatomy of SEFI illustrated using proton irradiation of Pentium P-III,
TI-DSP, and Equator BSP-15 microprocessors
✔ Demonstrated the effectiveness of H-Core circuit in SEFI mitigation
✔ Processors recovered from all detected SEFIs using H-Core signals
without requiring power cycle
✔ Tests indicate that H-Core has 100% success rate in mitigating SEFIs
without powering down the board