Reliability - IQSoft Software Consultants

Reliability
Introduction
Introduction to Reliability
Historical Perspective
Current Devices
Trends
The Bathtub Curve (1)
[Figure: bathtub curve. Failure rate versus time, with three regions: infant mortality, useful life (constant failure rate), and wear out.]
The Bathtub Curve (2)
What is the "bathtub" curve?
In the 1950s, a group known as AGREE (Advisory Group for the Reliability of Electronic
Equipment) discovered that the failure rate of electronic equipment had a pattern similar to the death
rate of people in a closed system. Specifically, they noted that the failure rate of electronic
components and systems follows the classical “bathtub” curve. This curve has three distinctive
phases:
1. An “infant mortality” early life phase characterized by a decreasing failure rate (Phase 1). Failure
occurrence during this period is not random in time but rather the result of substandard components
with gross defects and the lack of adequate controls in the manufacturing process. Parts fail at a high
but decreasing rate.
2. A “useful life” period where electronics have a relatively constant failure rate caused by randomly
occurring defects and stresses (Phase 2). This corresponds to a normal wear and tear period where
failures are caused by unexpected and sudden overstress conditions. Most reliability analyses
pertaining to electronic systems are concerned with lowering the failure frequency (i.e., the constant
failure rate shown in the figure) during this period.
3. A “wear out” period where the failure rate increases due to critical parts wearing out (Phase 3). As
they wear out, it takes less stress to cause failure and the overall system failure rate increases;
accordingly, failures do not occur randomly in time.
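The three phases can be illustrated numerically. The sketch below is not from the original slides: it assumes each phase is represented by a Weibull hazard function (shape below 1 for infant mortality, exactly 1 for the constant useful-life rate, above 1 for wear out) and sums them. The shape and scale values are arbitrary, chosen only to make the curve visible.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_hours):
    """Sum of three illustrative hazards; all parameters are assumptions."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=1e5)    # decreasing rate
    useful = weibull_hazard(t_hours, shape=1.0, scale=1e6)    # constant rate
    wear_out = weibull_hazard(t_hours, shape=5.0, scale=1e5)  # increasing rate
    return infant + useful + wear_out


if __name__ == "__main__":
    for t in (10, 1e2, 1e3, 1e4, 1e5, 2e5):
        print(f"t = {t:>9.0f} h   failure rate = {bathtub_hazard(t):.3e} per hour")
```

With these assumed parameters the printed rate falls at first, bottoms out, and then rises again, which is the bathtub shape described above.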
Introduction to Reliability
• Failure in time (FIT)
Failures per 10^9 hours
( ~ 10^4 hours/year )
• Acceleration Factors
– Temperature
– Voltage
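A few unit conversions make the FIT definition concrete. The helper names below are illustrative, and the MTBF conversion assumes a constant failure rate (the useful-life region of the bathtub curve).

```python
HOURS_PER_FIT_BASE = 1e9  # one FIT is one failure per 10^9 device-hours


def fit_to_mtbf_hours(fit):
    """Mean time between failures, assuming a constant failure rate."""
    return HOURS_PER_FIT_BASE / fit


def percent_per_1000h_to_fit(percent):
    """Convert a failure rate in % per 1,000 hours to FIT."""
    failures_per_hour = (percent / 100.0) / 1000.0
    return failures_per_hour * HOURS_PER_FIT_BASE


def expected_failures(fit, devices, hours):
    """Expected failure count for a population at a constant FIT rate."""
    return fit / HOURS_PER_FIT_BASE * devices * hours


if __name__ == "__main__":
    print(fit_to_mtbf_hours(100))            # MTBF in hours for a 100 FIT part
    print(percent_per_1000h_to_fit(0.001))   # 0.001% per 1,000 hours, in FIT
    print(expected_failures(10, 1000, 1e4))  # 1,000 parts, about one year each
```

For example, 0.001% per 1,000 hours works out to about 10 FIT, and 1,000 parts at 10 FIT operated for roughly one year (10^4 hours) give about 0.1 expected failures.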
Introduction to Reliability (cont'd)
Most failure mechanisms can be modeled using the
Arrhenius equation.
$$ ttf = C \cdot e^{E_A / kT} $$
ttf - time to failure (hours)
C - constant (hours)
EA - activation energy (eV)
k - Boltzmann's constant (8.617 x 10^-5 eV/K)
T - temperature (K)
Introduction to Reliability (cont'd)
Acceleration Factors
$$ \mathrm{A.F.} = \frac{ttf_L}{ttf_H} $$
A.F. = acceleration factor
ttfL = time to failure, system junction temp (hours)
ttfH = time to failure, test junction temp (hours)
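Substituting the Arrhenius equation into the ratio ttfL / ttfH cancels the constant C and leaves A.F. = exp[(EA / k) · (1/T_L − 1/T_H)], with T_L the system junction temperature and T_H the test junction temperature, both in kelvin. A minimal sketch of that calculation follows; the function name and the example temperatures are assumptions chosen for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def arrhenius_acceleration_factor(ea_ev, system_temp_c, test_temp_c):
    """A.F. = ttfL / ttfH for junction temperatures given in degrees Celsius."""
    t_low = system_temp_c + 273.15   # system (use) junction temperature, K
    t_high = test_temp_c + 273.15    # test (stress) junction temperature, K
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low - 1.0 / t_high))


if __name__ == "__main__":
    # Assumed example: EA = 0.7 eV, 55 C in the system, 125 C on test.
    af = arrhenius_acceleration_factor(0.7, 55.0, 125.0)
    print(f"A.F. = {af:.1f}")
```

Under these assumptions the factor comes out in the high 70s, so one hour on test at 125 °C represents several tens of hours at 55 °C for a 0.7 eV mechanism.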
Introduction to Reliability (cont'd)
Activation Energies
Failure Mechanism                                EA (eV)
Oxide/dielectric defects                         0.3
Chemical, galvanic, or electrolytic corrosion    0.3
Silicon defects                                  0.3
Electromigration                                 0.5 to 0.7
Unknown                                          0.7
Broken bonds                                     0.7
Lifted die                                       0.7
Surface related contamination induced shifts     1.0
Lifted bonds (Au-Al interface)                   1.0
Charge injection                                 1.3
Note: Different sources give different values; the values above are examples only.
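Because the activation energy sits in an exponent, the choice of EA dominates the resulting acceleration factor. The sketch below tabulates the Arrhenius factor for the example activation energies listed above; the 55 °C system and 125 °C test junction temperatures are assumptions picked for illustration, not values from the slides.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K

# Example activation energies (eV) from the table above.
EXAMPLE_EA = [0.3, 0.5, 0.7, 1.0, 1.3]


def acceleration_factor(ea_ev, system_temp_k, test_temp_k):
    """Arrhenius acceleration factor between two junction temperatures (K)."""
    return math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / system_temp_k - 1.0 / test_temp_k)
    )


if __name__ == "__main__":
    system_k = 55.0 + 273.15   # assumed system junction temperature
    test_k = 125.0 + 273.15    # assumed test junction temperature
    for ea in EXAMPLE_EA:
        print(f"EA = {ea:.1f} eV   A.F. = {acceleration_factor(ea, system_k, test_k):,.1f}")
```

The spread between the 0.3 eV and 1.3 eV rows covers several orders of magnitude, which is why the caveat about differing source values matters when converting test hours into equivalent system hours.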
Acceleration Factor - Voltage
Oxides and Dielectrics
• Large acceleration factors from increase in
electric field strength
A.F. = 10 • γ / (MV/cm)
$$ \gamma = 0.4 \cdot e^{0.07 / kT} $$
k - Boltzmann's constant (8.617 x 10^-5 eV/K)
T - temperature (K)
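The temperature-dependent term 0.4 · e^(0.07/kT) in the expression above can be evaluated directly. The sketch below does only that; it makes no assumption about how the term combines with the electric field, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # eV/K


def field_acceleration_term(temp_c):
    """Evaluate 0.4 * exp(0.07 / kT) at a temperature given in degrees Celsius."""
    kt_ev = BOLTZMANN_EV_PER_K * (temp_c + 273.15)
    return 0.4 * math.exp(0.07 / kt_ev)


if __name__ == "__main__":
    for temp_c in (25.0, 55.0, 125.0):
        print(f"T = {temp_c:5.1f} C   term = {field_acceleration_term(temp_c):.2f}")
```

The term is about 6.1 at 25 °C and shrinks as temperature rises, so by this expression the field dependence is strongest at low temperature.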
Acceleration Factor: Voltage
Median-time-to-fail of unprogrammed antifuse vs. 1/V for
different failure criteria with positive stress voltage on top
electrode and Ta = 25 °C.
Device and Computer Reliability
1960's Hi-Rel Application
• Apollo Guidance Computer
– Failure rate of IC gates:
< 0.001% / 1,000 hours ( < 10 FITS )
– Field Mean-Time-To-Failure
~ 13,000 hours
• One gate type used with large effort on
screening, failure analysis, and
implementation.
Device Reliability: 1971
Reliability Level of Parts and Practices    Representative MTBF (hr)
Commercial                                  500
Military                                    2,000
High Reliability                            10,000 (10^4 hours)
MIL-M-38510 Devices (1976)
Circuit Type    Description                          FITS
5400            Quad, 2-input NAND                   60
5482            2-bit, full adder                    44
5483            4-bit, full adder                    112
5474            Dual, D, edge-triggered flip-flop    72
54S174          Hex, D, edge-triggered flip-flop     152
54163           4-bit synchronous counter            120
4049A           Inverting hex buffer                 52
4013A           Dual, D, edge-triggered flip-flop    104
4020A           14-stage, ripple carry counter       344
10502           Triple NOR (ECL)                     80
HYPROM512       512-bit PROM                         280
Harris CICD Devices (1987)
Circuit Types
HS-6504 - 4k x 1 RAM
HS-6514 - 1k x 4 RAM
HS-3374RH - Level Converter
HS-54C138RH - Decoder
HS-80C85RH - 8-bit CPU
HS-8155/56 - 256 x 8 RAM
HS-82C08RH - Bus Transceiver
HS-82C12RH - I/O Port
HS-8355RH - 2k x 8 ROM
Package Types
Flat Packs (hermetic brazed and glass/ceramic seals)
LCC
DIP
FITS @ 55°C, Failure Rate @ 60% U.C.L.
43.0
UTMC and Quicklogic
• FPGA
– < 10 FITS (planned)
– Quicklogic reports 12 FIT, 60% UCL
• UT22VP10
– UTER Technology, 0 failures, 0.3 [double check]
• Antifuse PROM
– 64K: 19 FIT, 60% UCL
– 256K: 76 FIT, 60% UCL
Xilinx FPGAs
• XC40xxXL
– Static: 9 FIT, 60% UCL
– Dynamic: 29 FIT, 60% UCL
• XCVxxx
– Static: 34 FIT, 60% UCL
– Dynamic: 443 FIT, 60% UCL
Actel FPGAs
Technology    FITS    # Failures    Device-Hours
2.0/1.2       33      2             9.4 x 10^7
1.0           9.0     6             6.1 x 10^8
0.8           10.9    1             1.9 x 10^8
0.6           4.9     0             1.9 x 10^8
0.45          12.6    0             7.3 x 10^7
0.35          19.3    0             4.8 x 10^7
RTSX 0.6      33.7    0             2.7 x 10^7
0.25          88.9    0             1.0 x 10^7
0.22          78.6    0             1.2 x 10^7
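The slides quote FIT rates at a 60% upper confidence limit (UCL) but do not say how the limit was computed. One common approach, assumed here, treats failures as a Poisson process and finds the largest expected failure count still consistent with the observed count at the chosen confidence (equivalent to the chi-square method with 2(r + 1) degrees of freedom). The sketch below uses only the standard library and solves for that limit by bisection; the function names are hypothetical.

```python
import math


def poisson_cdf(r, mean):
    """P(X <= r) for X ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, r + 1):
        term *= mean / k
        total += term
    return total


def expected_failures_ucl(failures, confidence=0.60):
    """Upper limit m on expected failures: solves P(X <= failures; m) = 1 - confidence."""
    target = 1.0 - confidence
    low, high = 0.0, 1.0
    while poisson_cdf(failures, high) > target:  # expand until the root is bracketed
        high *= 2.0
    for _ in range(200):                         # bisection; the CDF decreases with the mean
        mid = 0.5 * (low + high)
        if poisson_cdf(failures, mid) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def fit_ucl(failures, device_hours, confidence=0.60):
    """FIT rate at the given upper confidence limit."""
    return expected_failures_ucl(failures, confidence) / device_hours * 1e9


if __name__ == "__main__":
    print(f"{fit_ucl(0, 1.0e7):.1f} FIT")  # 0 failures in 1.0 x 10^7 device-hours
    print(f"{fit_ucl(2, 9.4e7):.1f} FIT")  # 2 failures in 9.4 x 10^7 device-hours
```

Under this assumption the two cases give roughly 92 and 33 FIT, close to but not identical with the tabulated 88.9 and 33, so the manufacturer's exact method may differ in detail. Note that with zero failures the result depends only on accumulated device-hours, which is why the newest technologies with the fewest hours show the highest FIT values.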
RAMTRON FRAMs
Technology         FITS       # Failures    # Devices    Hours    Device-Hours
1608 (64K)         128 [1]    1             100          10^3     10^5
4k & 16K Serial    37         15 [2]        4257         10^3     4.3 x 10^6
Note: Applied stress, HTOL, 125ºC, Dynamic, VCC=5.5V.
[1] The one failure occurred in less than 48 hours. The manufacturer feels that this was an infant mortality failure.
[2] 12 failures detected at 168 hours, 3 failures at 500 hours, and no failures detected after that point.
Actel FIT Rate Trends
Skylab Lessons Learned
58. Lesson: New Electronic Components
Avoid the use of new electronic techniques and components in
critical subsystems unless their use is absolutely mandatory.
Background:
New electronic components (resistors, diodes, transistors,
switches, etc.) are developed each year. Most push the state-of-the-art and contain new fabrication processes. Designers of
systems are eager to use them since they each have advantages
over more conventional components. However, being new, they
are untried and generally have unknown characteristics and
idiosyncrasies. Let some other program discover the problems.
Do not use components which have not been previously used in a
similar application if it can be avoided, even at the expense of
size and weight.
Reliability - Summary
• Covered device reliability basics
• Design reliability is another set of topics
– Advanced Design: Designing for Reliability
– Fundamental Logic Design: Clocking, Timing
Analysis, and Design Verification
– Fundamental Logic Design: VHDL for High-Reliability Applications - Coding and Synthesis
– Fundamental Logic Design: Verification of HDL-Based Logic Designs for High-Reliability
Applications