Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection

Joel Seely
Technical Marketing Manager
Military & Aerospace Business Unit

Single Event Upset (SEU) Overview for SRAM-Based FPGAs

Definitions

SEU: Single Event Upset
  - Unwanted Change in State of a Latch or a Memory Cell

SER: Soft Error Rate
  - SEU Rate

SEFI: Single Event Functional Interrupt
  - Functional Failure by SEU
  - Not All SEUs are SEFIs
  - Generally Takes 5-10 SEUs to Cause SEFI
Copyright © 2004 Altera Corporation

Circuit Components of SRAM-Based FPGAs

I/O Registers & I/O Configuration
  - No Issue, Very Robust Registers, < 1 FIT

Logic Registers (LEs)
  - No Issues, Very Robust Registers, < Hard Error Rate

User Memory
  - Typically On-Chip Memories are “By 9” for Parity Checking
  - IP Available for ECC

Configuration RAM (CRAM) for LUTs & Routing
  - Area of Focus
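
The "by 9" organization of user memory means each byte can carry a ninth bit for parity. A minimal Python sketch of that idea follows. It assumes even parity and places the parity bit in bit 8; the slides state neither convention, and the function names are illustrative only.

```python
def parity_bit(byte: int) -> int:
    """Return the even-parity bit for an 8-bit value (1 if the count of 1s is odd)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def encode_x9(byte: int) -> int:
    """Pack 8 data bits plus 1 parity bit into a 9-bit word (parity in bit 8, an assumption)."""
    return (parity_bit(byte) << 8) | byte


def check_x9(word: int) -> bool:
    """True if the 9-bit word holds an even number of 1s, i.e. no single-bit upset is visible."""
    return bin(word & 0x1FF).count("1") % 2 == 0


word = encode_x9(0b1011_0010)
upset = word ^ (1 << 3)  # simulate an SEU flipping one stored bit
print(check_x9(word), check_x9(upset))  # prints: True False
```

Parity of this kind only detects an odd number of flipped bits and cannot say which bit changed, which is why ECC is listed separately for correction.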

Upset of a CRAM Cell
[Figure: 6-transistor cell (Vcc, Vss, Add, Data In, Data Out, Clear) with voltage-versus-time waveforms, and noise current for 10 fC collected charge: current (µA, 0 to 200) versus time (ps, 0 to 200).]

SEU Induced Failure Rate*

Device    LE Count    SEU Rate (FIT)    SEFI Rate (FIT)    MTBF** (Years)
EP1C6     6K          250               60                 1,900
EP1C20    20K         730               180                634
EP1S25    26K         1950              400                285
EP1S80    79K         6000              1200               95

* Data at Sea Level
** MTBF: Mean Time Between Functional Interrupt
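
The MTBF figures follow from the SEFI rates by the usual FIT definition (1 FIT = 1 failure per 10^9 device-hours). A short Python sketch of the conversion is given here as a cross-check; the 8,760-hour year and the function name are assumptions, not something the slides state.

```python
HOURS_PER_YEAR = 8760  # 365-day year; leap days ignored (assumption)
FIT_HOURS = 1e9        # 1 FIT = 1 failure per 10^9 device-hours


def mtbf_years(fit: float) -> float:
    """Convert a failure rate in FIT to a mean time between failures in years."""
    return FIT_HOURS / fit / HOURS_PER_YEAR


# SEFI rates (FIT) taken from the failure-rate table
sefi_fit = {"EP1C6": 60, "EP1C20": 180, "EP1S25": 400, "EP1S80": 1200}
for device, fit in sefi_fit.items():
    print(f"{device}: {mtbf_years(fit):,.0f} years")
```

Under these assumptions the results agree with the quoted MTBF values to within rounding (the EP1C6 figure comes out near 1,903 years against the quoted 1,900).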

Number of CRAM Bit Upsets for Each Occurrence of Functional Upset
[Figure: Two normal-probability plots, "Altera EP1S25 Neutron SER - WNR data" and "Altera EP1S25 Alpha SER". Vertical axis: standard deviation (-3 to 3) with cumulative percentage (0.5% to 99.5%). Horizontal axis: number of CRAM bit upsets for each event of functional upset (0 to 50). Medians of approximately 6 and 5 are marked.]

Addressing System-Level Issues

SER Improvements/Mitigation

Chip Design Enhancements
  - New Materials & Process Enhancements
  - Larger CRAM Structure
  - Increase in Capacitance on Critical Node
  - Smaller Process => Smaller Die => Lower SEU Probability
  - Built-In Error Detection/Correction Circuitry

SER Per SRAM Bit Trend
[Figure: SER per SRAM Mbit (axis marks at 100 FIT and 1,000 FIT) versus process technology and year, from 0.5 µm (1995) through 0.13 µm (2002) to a 90 nm projection.]

System Level Improvements
Mitigation
ECC for User Memory
  - Use Detection/Correction Feature
  - Triple Module Redundancy (TMR)
  - To Achieve Lower Error Rate & Less Downtime
Migrate to Structured ASIC

Soft Error Detection Methods

Configuration RAM Readout
  - Read-Out Full Bitstream
  - Compare with Stored Bitstream
  - Can Determine where in Configuration Error Occurred
Caveat: Security Issues with Reading Out Bitstream
[Figure: A microprocessor or CPLD reads the CRAM data out of the FPGA and compares it with the stored copy: same or different?]
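
The readout-and-compare method can be sketched in a few lines of Python. This is an illustration only: the function name, the in-memory `bytes` representation of the bitstream, and the LSB-first bit numbering are all assumptions, and a real microprocessor or CPLD would obtain the readback through the device's configuration interface.

```python
def find_upsets(golden: bytes, readback: bytes) -> list[int]:
    """Return the bit positions where the readback differs from the stored bitstream."""
    if len(golden) != len(readback):
        raise ValueError("bitstreams differ in length")
    flipped = []
    for byte_index, (g, r) in enumerate(zip(golden, readback)):
        diff = g ^ r  # 1s mark the bits that changed
        for bit in range(8):
            if diff & (1 << bit):
                flipped.append(byte_index * 8 + bit)  # LSB-first numbering (assumption)
    return flipped


golden = bytes([0x3C, 0xA5, 0x00, 0xFF])  # stand-in for the stored bitstream
readback = bytearray(golden)
readback[2] ^= 0x10                        # simulate one upset: bit 4 of byte 2
print(find_upsets(golden, bytes(readback)))  # prints: [20]
```

Because the comparison is bit by bit, it reports where in the configuration the error occurred, which is the advantage this method has over a checksum.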

Soft Error Detection Methods

On-Chip SEU Detection
  - Dedicated Comparison Circuitry
      - e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM
  - Detection Circuitry Running Continuously
  - Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency
  - Error Signal Available Internally or Externally
Caveat: Cannot Determine Where in Configuration Error Occurred
[Figure: Inside the FPGA, the stored value is compared (=) with the computed value and the result is routed to the core.]
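
The stored-versus-computed CRC comparison can be modeled in software as follows. This is a sketch under stated assumptions: it uses the CRC-32 from Python's `zlib` module, whereas the polynomial used by the on-chip engine is not given in the slides, and the CRAM contents here are dummy data.

```python
import zlib


def crc_of(cram: bytes) -> int:
    """Compute a 32-bit CRC over the configuration data (zlib's CRC-32, an assumption)."""
    return zlib.crc32(cram) & 0xFFFFFFFF


cram = bytearray(b"\x3c\xa5\x00\xff" * 64)  # dummy configuration contents
stored_crc = crc_of(bytes(cram))            # value saved at configuration time

cram[100] ^= 0x01                           # simulate a single-bit upset
crc_error = crc_of(bytes(cram)) != stored_crc
print(crc_error)  # prints: True
```

The result is a single pass/fail flag, which matches the caveat above: the comparison shows that something changed but not where.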

On-Chip Detection Example

Dedicated CRC Circuit
  - Configuration RAM Verification Capability
      - 32-Bit Cyclic Redundancy Code Check
      - Verified Against Internally Stored Value
      - Runs in the Background Without Impacting Device Performance
  - Close to Real-Time Detection
      - Variable Clock Frequency
      - Depends on Number of CRAM Bits
  - Multi-Event Detection
      - Up to 3-Bit for 32-Bit CRC
  - Result Output to Either Core or Pin
      - Use with Either Internal or External Hardware for Error Correction

Correction Methods

FPGA Detection, System-Level Correction
  - Lower Total Cost
  - Downtime Is Limited & Manageable
  - Used in Non-Critical Applications

Triple Module Redundancy
  - Two Flavors
      - All On-Chip in FPGA
      - Separate Chips & Voter
  - Correction Can Be Real-Time
  - Used in Critical Applications

Single System Detection & Correction

Step One: Detect the Soft Error
  - 75% of Reported Errors Are “Don’t Care” Errors
Step Two: Alert the System
Step Three: Fix the Error
  - In Some Cases, Re-Program the FPGA
  - In Some Cases, Reboot the Sub-System
  - In Some Cases, Reboot the System

Need to Focus on System “Downtime”
  - Each System Has Unique Requirements
  - Re-Programming FPGA Takes < 250 ms
  - Rebooting Time Varies & Can Be Fast “by Design”
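
To put the downtime point in numbers, the following Python sketch estimates expected downtime per year from a functional-interrupt rate and a recovery time. It uses the EP1S80 SEFI rate (1200 FIT) and the 250 ms reprogramming bound from the slides; it assumes every functional interrupt is fixed by reprogramming alone and ignores detection latency and any sub-system or system reboot, so it is a lower-bound illustration rather than a system model.

```python
HOURS_PER_YEAR = 8760  # 365-day year (assumption)


def expected_downtime_s_per_year(sefi_fit: float, recovery_s: float) -> float:
    """Expected seconds of downtime per year for a given SEFI rate (FIT) and recovery time."""
    events_per_year = sefi_fit * 1e-9 * HOURS_PER_YEAR  # FIT = failures per 10^9 hours
    return events_per_year * recovery_s


# EP1S80 at sea level, recovery by reprogramming the FPGA (< 250 ms)
downtime = expected_downtime_s_per_year(sefi_fit=1200, recovery_s=0.25)
print(f"{downtime * 1000:.2f} ms per year")
```

Under these assumptions the expected downtime is a few milliseconds per year; substituting a reboot time for `recovery_s` shows how strongly the result depends on the recovery path chosen.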

TMR Method 1
[Figure: Three FPGAs (Hardware 1, Hardware 2, Hardware 3) feed an FPGA or CPLD that performs the voting.]

Identical Hardware in FPGAs
Use Voter Implemented in FPGA or CPLD
Utilize Either Hardware Output or CRC Error Pin
Voter Also Used to Signal Reconfiguration on Difference or Error
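
The voter's two jobs, picking the majority output and flagging a module for reconfiguration, can be modeled as below. This is a behavioral Python sketch with hypothetical function names; an actual voter would be logic in the FPGA or CPLD and might also take each module's CRC error pin as an input.

```python
from typing import Optional


def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_module(a: int, b: int, c: int) -> Optional[int]:
    """Return the index (0..2) of the first module whose output differs from the vote, or None."""
    voted = vote(a, b, c)
    for index, output in enumerate((a, b, c)):
        if output != voted:
            return index  # candidate for reconfiguration
    return None


outputs = (0b1010, 0b1010, 0b0010)    # module 2 has suffered an upset
print(vote(*outputs))                 # prints: 10
print(disagreeing_module(*outputs))   # prints: 2
```

With a single faulty module the voted value stays correct while the faulty unit is reconfigured, which is what allows correction in real time.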

TMR Method 2
[Figure: A single FPGA contains Hardware 1, Hardware 2, Hardware 3 and a voting circuit.]

Multiple Instantiations of Hardware in Single FPGA
For Low-Rate SEUs
SEU Events May Occur Much More Frequently than Functional Error (De-Rating)
Voter Signals Reconfiguration of FPGA
FPGA Must be Reconfigured

De-Rating Methodology

Only a Fraction of Configuration Bits Are Actually Programmed
  - e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care”
  - Only About 20% of Routing Is Used
  - Depends on Utilization & Application

Some Un-Programmed Bits Still Matter
  - Flipping Could Change Function of the Device

Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
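
The 75% figure and the 1/8 to 1/3 range can be checked with a little arithmetic, sketched here in Python. The LUT calculation assumes the unused inputs are held at a fixed level so that only 2^2 of the 2^4 entries are ever addressed; the function names are illustrative, and the 6000 FIT input is the EP1S80 SEU rate quoted earlier in the deck.

```python
from fractions import Fraction


def lut_dont_care_fraction(lut_inputs: int, used_inputs: int) -> Fraction:
    """Fraction of LUT configuration bits that are never addressed."""
    total_bits = 2 ** lut_inputs     # one configuration bit per input combination
    needed_bits = 2 ** used_inputs   # entries reachable through the used inputs
    return Fraction(total_bits - needed_bits, total_bits)


def derated_fit(seu_fit: int, factor: Fraction) -> Fraction:
    """Scale a raw SEU rate by the fraction of bits that matter."""
    return seu_fit * factor


print(lut_dont_care_fraction(4, 2))  # prints: 3/4
print(derated_fit(6000, Fraction(1, 8)), derated_fit(6000, Fraction(1, 3)))  # prints: 750 2000
```

The 1200 FIT SEFI rate quoted for the EP1S80 falls inside the 750 to 2000 FIT band that this range predicts.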

Structured ASIC: Ultimate SEU Protection
[Figure: FPGA compared with a structured ASIC (PLD architecture with ASIC routing).]
No Configuration Memory = Estimated SER is below Hard Failure Rate for the Device

Summary

SEU is a Well Understood Phenomenon
Many Chip Level Enhancements Mitigate SEUs
  - Process
  - Design
  - Manufacturing Techniques
Easy Detection of SEU Events is Key
After Detection, Other Methods Must be Employed to Deal with the Event
  - Critical Nature of Application Determines Level of SEU Response
Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM