Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection
Joel Seely
Technical Marketing Manager
Military & Aerospace Business Unit
Single Event Upset (SEU) Overview for SRAM-Based FPGAs
Definitions
SEU: Single Event Upset
Unwanted Change in State of a Latch or a Memory Cell
SER: Soft Error Rate
SEU Rate
SEFI: Single Event Functional Interrupt
Functional Failure by SEU
Not All SEUs are SEFIs
Generally Takes 5-10 SEUs to Cause SEFI
Copyright © 2004 Altera Corporation
Circuit Components of SRAM-Based FPGAs
I/O Registers & I/O Configuration
No Issue, Very Robust Registers, < 1 FIT
Logic Registers (LEs)
No Issues, Very Robust Registers, < Hard Error Rate
User Memory
Typically On-Chip Memories Are “By 9” for Parity Checking
IP Available for ECC
Configuration RAM (CRAM) for LUTs & Routing
Area of Focus
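The "by 9" user-memory organization mentioned above (8 data bits plus 1 parity bit) can be illustrated with a minimal sketch. Even parity and the placement of the parity bit in bit 8 are assumptions made for illustration; the slides do not say which convention the device uses.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") & 1

def store(byte: int) -> int:
    """Pack 8 data bits plus 1 parity bit into a 9-bit word (parity in bit 8, assumed)."""
    return (parity_bit(byte) << 8) | (byte & 0xFF)

def check(word: int) -> bool:
    """True if the 9-bit word still has an even number of set bits."""
    return bin(word & 0x1FF).count("1") % 2 == 0

word = store(0b1011_0010)
upset = word ^ (1 << 3)          # simulate a single-bit upset in data bit 3
print(check(word), check(upset))
```

Parity flags any odd number of flipped bits in a word but cannot locate or correct them, and it misses double-bit upsets, which is one reason ECC IP is offered as well.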
Upset of a CRAM Cell
[Figure: 6-transistor cell schematic (Vcc, Vss, Add, Data In, Data Out, Clear) with voltage-versus-time traces, and a plot of noise current (µA, 0 to 200) versus time (ps, 0 to 200) for 10 fC collected charge]
SEU Induced Failure Rate*
Device    LE Count    SEU Rate (FIT)    SEFI Rate (FIT)    MTBF** (Years)
EP1C6     6K          250               60                 1,900
EP1C20    20K         730               180                634
EP1S25    26K         1950              400                285
EP1S80    79K         6000              1200               95
* Data at Sea Level
**MTBF: Mean Time Between Functional Interrupt
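The MTBF column appears to follow directly from the SEFI rate, given that one FIT is one failure per 10^9 device-hours. The sketch below reproduces that arithmetic; the function name and the 8,760-hour year are illustrative choices, not something stated on the slide.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored

def mtbf_years(fit: float) -> float:
    """Mean time between events, in years, for a rate given in FIT
    (failures per 10**9 device-hours)."""
    return 1e9 / fit / HOURS_PER_YEAR

# SEFI rates (FIT) from the table above
sefi_fit = {"EP1C6": 60, "EP1C20": 180, "EP1S25": 400, "EP1S80": 1200}
for device, fit in sefi_fit.items():
    print(f"{device}: {mtbf_years(fit):,.0f} years")
```

The results land within rounding of the table's MTBF values, which suggests the table was computed from the SEFI rate rather than the raw SEU rate.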
Number of CRAM Bit Upsets for Each Occurrence of Functional Upset
[Figure: two cumulative probability plots, "Altera EP1S25 Neutron SER - WNR data" and "Altera EP1S25 Alpha SER". Horizontal axis: # of CRAM bit upsets for each event of functional upset (0 to 50). Vertical axis: standard deviation (-3 to 3) with percentile labels (0.5% to 99.5%). Marked medians: ~6 and 5]
Addressing System-Level Issues
SER Improvements/Mitigation
Chip Design Enhancements
New Materials & Process Enhancements
Larger CRAM Structure
Increase in Capacitance on Critical Node
Smaller Process => Smaller Die => Lower SEU Probability
Built-In Error Detection/Correction Circuitry
SER Per SRAM Bit Trend
[Figure: SER per SRAM Mbit (FITs, axis marks at 100 and 1,000) versus process technology and year, from 0.5 µm (1995) to 0.13 µm (2002), with a projection for 90 nm]
System Level Improvements
Mitigation
ECC for User Memory
Use Detection/Correction Feature
Triple Modular Redundancy (TMR)
To Achieve Lower Error Rate & Less Downtime
Migrate to Structured ASIC
Soft Error Detection Methods
Configuration RAM Readout
Read-Out Full Bitstream
Compare with Stored Bitstream
Can Determine where in Configuration Error Occurred
Caveat: Security Issues with Reading Out Bitstream
[Diagram: a microprocessor or CPLD reads the CRAM contents out of the FPGA and compares them with stored CRAM data: same or different?]
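The readout-and-compare method above can be sketched as a bitwise comparison that also reports where the mismatch sits, which is the advantage this method has over a checksum. The function name and the tiny example bitstream are hypothetical; a real readback would come from the device's configuration interface.

```python
def find_upsets(stored: bytes, readback: bytes) -> list[int]:
    """Return the bit offsets at which readback differs from the stored bitstream."""
    if len(stored) != len(readback):
        raise ValueError("bitstreams must be the same length")
    upsets = []
    for byte_index, (a, b) in enumerate(zip(stored, readback)):
        diff = a ^ b                      # set bits mark mismatches
        for bit in range(8):
            if diff & (1 << bit):
                upsets.append(byte_index * 8 + bit)
    return upsets

golden = bytes([0x5A, 0xC3, 0x00, 0xFF])  # stand-in for the stored bitstream
corrupted = bytearray(golden)
corrupted[1] ^= 0x10                      # simulate one flipped CRAM bit
print(find_upsets(golden, bytes(corrupted)))
```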
Soft Error Detection Methods
On-Chip SEU Detection
Dedicated Comparison Circuitry
e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM
Detection Circuitry Running Continuously
Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency
Error Signal Available Internally or Externally
Caveat: Cannot Determine Where in Configuration Error Occurred
[Diagram: inside the FPGA, a stored value is compared (=) with a computed value; the result goes to the core]
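The dependence of detection time on CRAM size and clock frequency can be made concrete with a rough estimate. This is a sketch under simple assumptions: the checker consumes a fixed number of bits per clock cycle and a full pass is needed before an error is reported. The parameter values are hypothetical and not taken from any datasheet.

```python
def scan_time_s(cram_bits: int, clock_hz: float, bits_per_cycle: int = 1) -> float:
    """Time for one full pass over configuration RAM, in seconds."""
    return cram_bits / (bits_per_cycle * clock_hz)

# Hypothetical device: 8 Mbit of CRAM, 50 MHz check clock, 8 bits per cycle
t = scan_time_s(cram_bits=8_000_000, clock_hz=50e6, bits_per_cycle=8)
print(f"{t * 1e3:.1f} ms per pass")
```

Under these assumptions the worst-case detection latency is about one full pass, so halving the clock frequency or doubling the CRAM size doubles it.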
On-Chip Detection Example
Dedicated CRC Circuit
Configuration RAM Verification Capability
32-Bit Cyclic Redundancy Code Check
Verified Against Internally Stored Value
Runs in the Background Without Impacting Device Performance
Close to Real-Time Detection
Variable Clock Frequency
Depends on Number of CRAM Bits
Multi-Event Detection
Up to 3-Bit for 32-Bit CRC
Result Output to Either Core or Pin
Use with Either Internal or External Hardware for Error Correction
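The stored-versus-computed CRC check described above can be modeled in software. The sketch uses Python's `zlib.crc32` as a stand-in; the polynomial actually used by the on-chip circuit is not given in the slides, so treat the choice of CRC-32 variant as an assumption. The 64-byte buffer is a toy substitute for configuration RAM.

```python
import zlib

def crc_ok(cram: bytes, stored_crc: int) -> bool:
    """True if the CRC computed over cram matches the stored value."""
    return zlib.crc32(cram) == stored_crc

def flip(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)

cram = bytes(range(64))          # toy configuration image
stored_crc = zlib.crc32(cram)    # value computed at configuration time

# Try every possible single-bit upset and collect any that go undetected
missed = [b for b in range(len(cram) * 8) if crc_ok(flip(cram, b), stored_crc)]
print(crc_ok(cram, stored_crc), len(missed))
```

As the caveat on the earlier slide notes, a mismatch only says that something changed; it does not identify which bit flipped.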
Correction Methods
FPGA Detection, System-Level Correction
Lower Total Cost
Downtime Is Limited & Manageable
Used in Non-Critical Applications
Triple Modular Redundancy
Two Flavors
All On-Chip in FPGA
Separate Chips & Voter
Correction Can Be Real-Time
Used in Critical Applications
Single System Detection & Correction
Step One: Detect the Soft Error
75% of Reported Errors Are “Don’t Care” Errors
Step Two: Alert the System
Step Three: Fix the Error
In Some Cases, Re-Program the FPGA
In Some Cases, Reboot the Sub-System
In Some Cases, Reboot the System
Need to Focus on System “Downtime”
Each System Has Unique Requirements
Re-Programming FPGA Takes < 250 ms
Rebooting Time Varies & Can Be Fast “by Design”
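To put the downtime focus in perspective, the expected recovery time per year can be estimated from an SEU rate and a per-event recovery time. This is a rough sketch that assumes every detected upset triggers a reprogram, since on-chip CRC detection cannot tell a "don't care" upset from a harmful one. The 1950 FIT figure is the sea-level EP1S25 SEU rate from the earlier table, and 0.25 s is the stated upper bound on reprogramming time.

```python
HOURS_PER_YEAR = 24 * 365  # leap years ignored

def downtime_s_per_year(seu_fit: float, recover_s: float) -> float:
    """Expected seconds of recovery per year if each detected SEU costs recover_s."""
    events_per_year = seu_fit * 1e-9 * HOURS_PER_YEAR
    return events_per_year * recover_s

downtime = downtime_s_per_year(seu_fit=1950, recover_s=0.25)
print(f"{downtime * 1e3:.2f} ms of reprogramming per device-year")
```

Sub-system or full-system reboots would replace `recover_s` with a much larger, system-specific value, which is why each system needs its own analysis.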
TMR Method 1
[Diagram: three FPGAs (Hardware 1, Hardware 2, Hardware 3) feeding a voting FPGA or CPLD]
Identical Hardware in FPGAs
Use Voter Implemented in FPGA or CPLD
Utilize Either Hardware Output or CRC Error Pin
Voter Also Used to Signal Reconfiguration on Difference or Error
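The voter's two jobs, producing a majority output and signaling reconfiguration when the copies disagree, can be sketched as follows. This is a behavioral model in software, assuming the three outputs are sampled as integers; in a real design the voter would be logic in the FPGA or CPLD, and the names here are hypothetical.

```python
def vote(a: int, b: int, c: int) -> tuple[int, bool]:
    """Bitwise 2-of-3 majority vote.

    Returns (voted_value, mismatch), where mismatch is True if any copy
    disagrees and could be used to request reconfiguration."""
    voted = (a & b) | (a & c) | (b & c)
    mismatch = not (a == b == c)
    return voted, mismatch

good = 0b1010_1100
faulty = good ^ 0b0000_0100      # one copy has a corrupted output bit
value, reconfigure = vote(good, faulty, good)
print(bin(value), reconfigure)
```

The voted output stays correct with one faulty copy, while the mismatch flag lets the system repair that copy before a second upset occurs.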
TMR Method 2
Multiple Instantiations of Hardware in Single FPGA
For Low-Rate SEUs
SEU Events May Occur Much More Frequently than Functional Error (De-Rating)
Voter Signals Reconfiguration of FPGA
FPGA Must Be Reconfigured
[Diagram: Hardware 1, Hardware 2, and Hardware 3 feeding a voting circuit, all inside one FPGA]
De-Rating Methodology
Only a Fraction of Configuration Bits Are Actually Programmed
e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care”
Only About 20% of Routing Is Used
Depends on Utilization & Application
Some Un-Programmed Bits Still Matter
Flipping Could Change Function of the Device
Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
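The de-rating arithmetic above can be worked through in a short sketch. It reproduces the 75% LUT figure and applies the 1/8 to 1/3 range to a raw SEU rate; the function names are illustrative, and the 1950 FIT input is the EP1S25 SEU rate from the failure-rate table.

```python
from fractions import Fraction

def lut_dont_care_fraction(lut_inputs: int, used_inputs: int) -> Fraction:
    """Fraction of LUT configuration bits that cannot affect the output
    when only used_inputs of lut_inputs are connected."""
    return 1 - Fraction(2 ** used_inputs, 2 ** lut_inputs)

def derated_fit(seu_fit: float, factor: Fraction) -> float:
    """Functional-upset rate implied by a raw SEU rate and a de-rating factor."""
    return float(seu_fit * factor)

print(lut_dont_care_fraction(4, 2))   # share of a 4-input LUT left unused by a 2-input function
low = derated_fit(1950, Fraction(1, 8))
high = derated_fit(1950, Fraction(1, 3))
print(low, high)
```

The SEFI rate listed for the EP1S25 in the earlier table (400 FIT) falls inside the range this gives, which is consistent with the de-rating factors quoted here.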
Structured ASIC: Ultimate SEU Protection
[Diagram: FPGA migrated to Structured ASIC: PLD architecture with ASIC routing]
No Configuration Memory = Estimated SER Is Below Hard Failure Rate for the Device
Summary
SEU Is a Well-Understood Phenomenon
Many Chip Level Enhancements Mitigate SEUs
Process
Design
Manufacturing Techniques
Easy Detection of SEU Events is Key
After Detection, Other Methods Must Be Employed to Deal with the Event
Critical Nature of Application Determines Level of SEU Response
Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM