Soft Error Rate Determination for
Nanometer CMOS VLSI Circuits
Master’s Defense
Fan Wang
Thesis Advisor: Dr. Vishwani D. Agrawal
Thesis Committee: Dr. Fa Foster Dai and Dr. Victor P. Nelson
Department of Electrical and Computer Engineering
Auburn University, AL 36849 USA
March 12, 2008
Fan's MS Defense
Outline
• Background
• Problem Statement
• Contributions
  - Proposed soft error model
  - Proposed soft error propagation through logic
  - Experimental results
  - Discussion of results
• Conclusion
Motivation for This Work
• With the continued downscaling of CMOS technologies, device
reliability has become a major bottleneck.
• The sensitivity of electronic systems can potentially become a
major cause of soft (non-permanent) failures.
• Determining the soft error rate of logic circuits is a complex
problem. No existing analysis method comprehensively considers
all the factors that influence the soft error rate.
Background
Certain behaviors in state-of-the-art electronic
circuits are caused by random factors.
A single event upset (SEU) is a non-permanent, or
transient, error.
Definition from NASA Thesaurus:
“Single Event Upset (SEU): Radiation-induced errors in
microelectronic circuits caused when charged particles
[also, high energy particles] (usually from the radiation
belts or from cosmic rays) lose energy by ionizing the
medium through which they pass, leaving behind a wake of
electron-hole pairs”.
What Is a Soft Error?
• A “fault” is the cause of errors. Faults can be permanent
(hardware faults) or non-permanent.
• A non-permanent fault is a non-destructive fault and falls
into one of two categories:
  - Transient faults, caused by environmental conditions like
    temperature, humidity, pressure, voltage, power supply,
    vibrations, fluctuations, electromagnetic interference,
    ground loops, cosmic rays and alpha particles.
  - Intermittent faults, caused by non-environmental conditions
    like loose connections, aging components, critical timing,
    interconnect coupling, resistive or capacitive variations and
    noise in the system.
• An error caused by a non-permanent fault is a “soft error”.
• With advances in manufacturing, soft errors caused by
cosmic rays and alpha particles remain the dominant
causes of failures in electronic systems.
Soft Error Rate (SER) in Specific Applications
• Figures of merit:
  1. Failures In Time (FIT): number of failures per 10^9
     device hours
  2. Mean Time To Failure (MTTF): 1 year MTTF =
     10^9/(24×365) FIT = 114,155 FIT
• SER of contemporary commercial chips is controlled
to within 100 to 1,000 FIT
• Most hard failure mechanisms produce error rates on
the order of 1 to 100 FIT
• Programmable logic SER is almost 100 times larger
than that of combinational logic
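The FIT and MTTF figures of merit are reciprocal views of the same rate. A minimal sketch of the conversion, assuming MTTF is expressed in hours and a 365-day year as on the slide (the function names are illustrative, not from the thesis):

```python
HOURS_PER_YEAR = 24 * 365  # same non-leap-year convention as the slide


def mttf_hours_to_fit(mttf_hours: float) -> float:
    """FIT is failures per 10^9 device hours, so FIT = 10^9 / MTTF."""
    return 1e9 / mttf_hours


def fit_to_mttf_hours(fit: float) -> float:
    """Inverse conversion: MTTF in hours for a given FIT value."""
    return 1e9 / fit


# A one-year MTTF expressed in FIT
one_year_fit = mttf_hours_to_fit(HOURS_PER_YEAR)
print(round(one_year_fit))  # 114155
```

The same function applies to any MTTF given in hours.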
Soft Error Rate (SER) for SRAM-Based FPGA
• Effects of smaller design rules and lower supply voltages
Radiation chamber measurement of SER at an altitude of 10 km
at 60°N (Sweden):
FPGA (Xilinx)   XC4010E         XC4010XL
Process         0.60μ           0.35μ
Vcc             5 V             3.3 V
1 SEU every     1×10^6 hours    2.8×10^5 hours
Projecting through 3 design rule shrinks and 2 voltage
reductions, we get ≈ 1 SEU every 28.2 hours.
M. Ohlsson, P. Dyreklev, K. Johansson and P. Alfke, “Neutron Single Event Upsets in
SRAM-Based FPGAs,” Proc. IEEE Nuclear & Space Radiation Effects Conference, 1998.
C. E. Stroud, “FPGA Architectures and Operation for Tolerating SEUs,” VLSI Design &
Test Seminar, Auburn University, January 31, 2007.
Reliability Requirements
Commodity flash memory reliability requirements*
Year                      2007         2010         2013         2016
Density (megabit)         1024         2048         4096         8192
Maximum data rate (MHz)   166          200          250          300
MTTF (hours)              4020         4654         5388         6237
FIT**                     2.487×10^5   2.149×10^5   1.856×10^5   1.603×10^5
* From the 2002 International Technology Roadmap for Semiconductors (ITRS).
** FIT = 10^9/MTTF
Single Event Transient (SET)
• An SET is caused by the generation of charge when a high-energy
particle passes through a sensitive node.
• Each SET has its own characteristics, such as polarity,
waveform, amplitude and duration, depending on particle
impact location, particle energy, device technology, device
supply voltage and output load.
• An “off” transistor struck by a heavy ion with high enough
LET* in the junction area is most sensitive to SEU.
• Specifically, the channel region of an off-NMOS transistor and
the drain region of an off-PMOS transistor are sensitive
regions.
*Linear Energy Transfer (LET) is a measure of the energy
transferred to the device per unit length as an ionizing particle
travels through material. Unit: MeV-cm^2/mg.
Measured Environmental Data
• Typical ground-level total neutron flux: 56.5 m^-2 s^-1.
  J. F. Ziegler, “Terrestrial cosmic rays,” IBM Journal of Research and Development,
  vol. 40, no. 1, pp. 19-39, 1996.
• Particle energy distribution at ground level:
“For both 0.5μm and 0.35μm CMOS technology at ground level, the
largest population has an LET of 20 MeV-cm^2/mg or less. Particles
with energy greater than 30 MeV-cm^2/mg are exceedingly rare.”
  K. J. Hass and J. W. Gambles, “Single Event Transients in Deep Submicron CMOS,”
  Proc. 42nd Midwest Symposium on Circuits and Systems, vol. 1, 1999.
[Figure: probability density versus linear energy transfer (LET) in MeV-cm^2/mg, LET axis from 0 to 30]
Details of SET Generation
(a) Along the path it traverses, the particle produces a dense radial distribution of
electron-hole pairs.
(b) Outside the depletion region the non-equilibrium charge distribution induces
a temporary funnel-shaped potential distortion along the trajectory of the
event (drift component).
(c) Funnel collapses, diffusion component then dominates the collection
process until all excess carriers have been collected, recombined, or
diffused away from the junction area.
(d) Current vs. Time to illustrate the charge collection and SET generation.
SET in CMOS Inverter
*For example, in ami12 technology, when the output load capacitance is
100 fF and the cumulative collected charge is 0.65 pC, the amplitude of
the voltage pulse is 0.65 pC/100 fF = (0.65×10^-12 C)/(100×10^-15 F) = 6.5 V.
Original Contributions of This Research
Problem Statement
Given background environment data:
• Neutron flux
• Background LET distribution
*These two factors are location dependent.
Given circuit characteristics:
• Technology
• Circuit netlist
• Circuit node sensitive region data
*These three factors depend on the circuit.
Estimate the neutron-induced soft error rate in standard FIT
units.
Proposed Soft Error Model
• A single event effect exists as a single event transient (SET).
• An SET has its own characteristics, such as polarity,
waveform, amplitude and duration.
• Environmental neutrons come from cascaded interactions
when galactic cosmic rays traverse the earth’s atmosphere.
Occurrence rate
Error Occurrence Rate
Environmental neutron flux is N/cm^2-s, where N is the
number of particles.
Each neutron carries a different energy when it
interacts with silicon.
Not all particles with enough energy will cause an error.
There is some probability P per hit for a given particle
energy.
For a circuit node with sensitive region A (cm^2) and a given particle
energy, the soft error probability per hit is P. If the neutron flux is
N/cm^2-s, then the soft error occurrence rate at this node is
(A × P × N)/s
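As a rough worked example of the A × P × N product, the sketch below plugs in the values assumed later in the experimental setup (10 µm^2 sensitive region per node, 10^-4 probability per hit, 56.5 m^-2 s^-1 flux) and converts the per-second rate to FIT. The helper names and the unit-conversion constants are illustrative choices, not part of the thesis.

```python
UM2_TO_CM2 = 1e-8        # 1 um^2 = 1e-8 cm^2
PER_M2_TO_PER_CM2 = 1e-4  # 1 m^-2 = 1e-4 cm^-2


def occurrence_rate_per_second(area_cm2: float, prob_per_hit: float,
                               flux_per_cm2_s: float) -> float:
    """Soft error occurrence rate at one node: A x P x N (errors per second)."""
    return area_cm2 * prob_per_hit * flux_per_cm2_s


def per_second_to_fit(rate_per_s: float) -> float:
    """Convert errors/second to FIT (errors per 10^9 device hours)."""
    return rate_per_s * 3600.0 * 1e9


# Values taken from the experimental assumptions stated in the deck
area = 10.0 * UM2_TO_CM2         # 10 um^2 sensitive region
prob = 1e-4                      # probability of an error per particle hit
flux = 56.5 * PER_M2_TO_PER_CM2  # 56.5 m^-2 s^-1 expressed per cm^2

rate = occurrence_rate_per_second(area, prob, flux)
print(f"{rate:.3e} errors/s, {per_second_to_fit(rate):.4f} FIT")
```

This is the raw per-node rate before any electrical or logic masking is applied.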
Single Event Transient (SET)
• At a circuit node, a soft error occurs as a transient signal whose
width depends on the energy of the striking neutron.
• The transient width determines whether it can propagate through
logic gates. Transient pulse width is the interval between the Vdd/2
points.
• The LET probability density function determines the transient
width density statistics.
Typical charge collection depth L is 2 μm for bulk silicon.
An ionizing particle with an LET of 1 MeV-cm^2/mg deposits about 10.8 fC of charge along each
micron of its track. τα is the collection time constant and τβ is the ion-track establishment time
constant. Typical values for τα and τβ are 1.64×10^-10 s and 5×10^-11 s, respectively.
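The current equation from this slide did not survive in the transcript. The sketch below therefore assumes the double-exponential current pulse that is commonly used with these two time constants, I(t) = Q/(τα − τβ) · (exp(−t/τα) − exp(−t/τβ)), with Q = 10.8 fC × L × LET. The function names are hypothetical, and the pulse shape is an assumption rather than a confirmed reproduction of the slide.

```python
import math

CHARGE_PER_UM_PER_LET = 10.8e-15  # C per micron of track per MeV-cm^2/mg
COLLECTION_DEPTH_UM = 2.0         # typical collection depth L for bulk silicon
TAU_ALPHA = 1.64e-10              # collection time constant, seconds
TAU_BETA = 5e-11                  # ion-track establishment time constant, seconds


def collected_charge(let: float) -> float:
    """Total collected charge Q (coulombs) for a given LET in MeV-cm^2/mg."""
    return CHARGE_PER_UM_PER_LET * COLLECTION_DEPTH_UM * let


def set_current(t: float, let: float) -> float:
    """Assumed double-exponential SET current (amperes) at time t (seconds)."""
    q = collected_charge(let)
    return q / (TAU_ALPHA - TAU_BETA) * (
        math.exp(-t / TAU_ALPHA) - math.exp(-t / TAU_BETA)
    )


def integrate_current(let: float, t_end: float = 5e-9, steps: int = 5000) -> float:
    """Trapezoidal integral of the current; should recover roughly Q."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        total += 0.5 * (set_current(k * dt, let) + set_current((k + 1) * dt, let)) * dt
    return total


q20 = collected_charge(20.0)  # LET of 20 MeV-cm^2/mg
print(f"Q = {q20:.3e} C, integral of I(t) = {integrate_current(20.0):.3e} C")
```

The normalization by (τα − τβ) is what makes the area under the pulse equal the collected charge, which the numerical integral confirms.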
Summarizing
We model the soft error with two parameters:
Occurrence rate
Single event transient width
Next, we propose a propagation
algorithm for the modeled soft error
transient pulses.
Pulse Width Probability Density Propagation
X, Y are random variables
X: input pulse width, Y: output pulse width
f_X(x): probability density function of X
f_Y(y): probability density function of Y
Given function g: Y = g(X)
Propagation function through a sensitized gate:
g: Y = g{p: W/L, n: W/L, Cload, technology}
Assume: g is differentiable and an increasing function of X, so g' and g^{-1}
exist. Then,
$$\int_{x}^{x+\Delta x} f_X(s)\,ds = \int_{y}^{y+\Delta y} f_Y(t)\,dt \;\Rightarrow\; f_X(x)\,\Delta x \approx f_Y(y)\,\Delta y$$
i.e.,
$$f_Y(y) = \lim_{\Delta x \to 0} f_X(x)\,\frac{\Delta x}{\Delta y} = \frac{f_X(x)}{g'(x)}$$
Propagation Rule
• Din: input pulse width
• Dout: output pulse width
• τp: gate input-to-output delay
We use a “3-interval piecewise linear” propagation
model to approximate the non-linear function g.
The three intervals are:
1) Non-propagation, if Din ≤ τp.
2) Propagation with attenuation, if τp < Din < 2τp.
3) Propagation with no attenuation, if Din ≥ 2τp.
[Figure: Dout = Y plotted against Din = X, with breakpoints at τp and 2τp]
Determination of Model Parameter
• We simulated a CMOS inverter using HSPICE.
• This CMOS inverter is in TSMC035 technology, with
NMOS W/L ratio = 0.6µ/0.24µ and PMOS W/L ratio =
1.08µ/0.24µ.
• The proposed 3-interval piecewise linear equation is
approximated as
$$D_{out} = \begin{cases} 0 & \text{if } D_{in} \le 36.0\ \text{ps} \\ \dfrac{72.0}{36.0}\,(D_{in} - 36.0) & \text{if } 36.0\ \text{ps} < D_{in} < 72.0\ \text{ps} \\ D_{in} & \text{if } D_{in} \ge 72.0\ \text{ps} \end{cases}$$
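The three-interval rule and the density relation f_Y(y) = f_X(x)/g'(x) can be combined in a few lines. This is a minimal sketch, assuming the fitted inverter delay τp = 36.0 ps and the 2τp breakpoint from the propagation rule; the function names are illustrative and not taken from the thesis.

```python
from typing import Callable

TAU_P_PS = 36.0  # fitted gate delay for the simulated inverter, in ps


def propagate_width(d_in: float, tau_p: float = TAU_P_PS) -> float:
    """Output pulse width for an input pulse width (both in ps)."""
    if d_in <= tau_p:
        return 0.0                    # interval 1: pulse is filtered out
    if d_in < 2.0 * tau_p:
        return 2.0 * (d_in - tau_p)   # interval 2: attenuated, slope 72/36 = 2
    return d_in                       # interval 3: propagates unchanged


def output_density(f_x: Callable[[float], float], y: float,
                   tau_p: float = TAU_P_PS) -> float:
    """f_Y(y) for y > 0, using f_Y(y) = f_X(x) / g'(x) with x = g^{-1}(y)."""
    if y <= 0.0:
        raise ValueError("density is only defined here for y > 0")
    if y < 2.0 * tau_p:
        x, slope = tau_p + y / 2.0, 2.0  # invert the attenuating segment
    else:
        x, slope = y, 1.0                # identity segment
    return f_x(x) / slope


print(propagate_width(30.0), propagate_width(54.0), propagate_width(100.0))
# 0.0 36.0 100.0
```

Pulses that fall in interval 1 collapse to zero width, so they do not appear in the density for y > 0; that missing probability mass is what electrical masking removes.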
Pulse Width Density Propagation Through a
CMOS Inverter
Validating Propagation Model Using HSPICE
Simulation
Simulation of a CMOS inverter in TSMC035 technology with load capacitance 10fF
Logic SEU Occurrence Rate Propagation
• Because all pulse widths are greater than or equal to 0,
we have:
$$\int_{0}^{\infty} f_Y(y)\,dy = \int_{0}^{\infty} f_X(x)\,dx = 1$$
• In the f_X(x) to f_Y(y) conversion, a fraction of the pulses
is filtered out or attenuated due to electrical masking.
We define the electrical masking ratio (EMR) as:
$$EMR = \frac{\int_{y>0} f_Y(y)\,dy}{\int_{x>0} f_X(x)\,dx}$$
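Under the three-interval propagation model, an output pulse has nonzero width exactly when the input width exceeds τp, so the EMR of a single gate reduces to P(X > τp) / P(X > 0). The sketch below evaluates that ratio for a normally distributed input width. The parameters (mean 150, standard deviation 50, τp = 36, all taken to be in ps) come from elsewhere in the deck, and treating them in the same units is an assumption; the function names are illustrative.

```python
import math


def normal_tail(threshold: float, mean: float, std: float) -> float:
    """P(X > threshold) for X ~ Normal(mean, std)."""
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))


def electrical_masking_ratio(tau_p: float, mean: float, std: float) -> float:
    """EMR for one gate: share of positive-width input pulses that survive."""
    return normal_tail(tau_p, mean, std) / normal_tail(0.0, mean, std)


emr = electrical_masking_ratio(tau_p=36.0, mean=150.0, std=50.0)
print(f"EMR for one inverter stage: {emr:.4f}")
```

For a chain of gates the width density changes after every stage, so this closed form only covers the first gate; later stages need the propagated density.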
Soft error occurrence rate calculation for generic gate
$$P_{SEU} = P_{SEU}(1) \times \underbrace{EMR_j}_{\text{electrical masking}} \times \underbrace{\prod_{i \ge 2} \left[P_{\text{noncontrolling}}(i)\right]}_{\text{logic masking}}$$
Experimental Results for ISCAS85 Circuits
Assume the probability of an SEU per particle hit is 10^-4.
Assume the SET width density per circuit node follows a
normal distribution with mean µ = 150 and standard
deviation σ = 50 for the ground-level environment.
At ground level, the total neutron flux is 56.5 m^-2 s^-1.
Circuits are in TSMC035 technology and the sensitive region
per node is 10 µm^2.
For a circuit with n primary outputs and m nodes, we
calculate the SER as:
$$SER = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m} SER_{i\_caused\_by\_j}\right)$$
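The SER formula is an average over nodes followed by an average over primary outputs. A minimal sketch, assuming the per-output, per-node contributions have already been computed and stored as a nested list (the data layout and the numbers in the usage line are hypothetical):

```python
from typing import Sequence


def average_ser(ser_by_output: Sequence[Sequence[float]]) -> float:
    """Average SER over n primary outputs and m nodes.

    ser_by_output[i][j] is the SER seen at primary output i that is
    caused by a transient originating at node j.
    """
    n = len(ser_by_output)
    per_output = [sum(row) / len(row) for row in ser_by_output]  # 1/m * sum over j
    return sum(per_output) / n                                   # 1/n * sum over i


# Hypothetical 2-output, 2-node example (made-up values)
print(average_ser([[1.0, 3.0], [2.0, 2.0]]))  # 2.0
```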
SER Results on Workstation Sun Fire 280R
Circuit   #PIs   #POs   #Gates   CPU s   FIT/gate/output
C17       5      2      6        0.01    0.3679
C432      36     7      160      0.04    1.0563
C499      41     32     202      0.14    0.2188
C880      60     26     383      0.08    0.3882
C1908     33     25     880      1.14    0.7427
C2670     233    140    1193     0.77    0.2882
C5315     178    123    2307     2.78    0.5572
C7552     207    108    3512     10.82   0.6652
SER Results for Inverter Chains
Circuit   #PIs   #POs   #Gates   CPU (s)   FIT/gate
Inv2      1      1      2        0.00      0.2819
Inv5      1      1      5        0.00      0.5388
Inv10     1      1      10       0.00      0.9654
Inv20     1      1      20       0.00      1.1819
Inv50     1      1      50       0.00      4.3780
Inv100    1      1      100      0.04      8.6473
Methods Comparison
Factors considered:
Method                 LET     Re-conv.  Sensitive  Occurrence  Vectors  Location  Circuit  SET
                       Spec.   Fanout    region     rate        applied  altitude  Tech.    degrad.
Our work               Yes     No        Yes        Yes         No       Yes       Yes      Yes
Rao et al. [1]         Yes     No        No         No          Yes      Yes       Yes      Yes
Rajaraman et al. [2]   No      No        No         No          Yes      No        No       Yes
Asadi-Tahoori [3]      No      No        No         Yes         No       No        No       No
Zhang-Shanbhag [4]     Yes     No        Yes        Yes         Yes      Yes       Yes      No
Rejimon-Bhanja [5]     No      No        No         Yes         Yes      No        No       No
Experimental Results Comparison
                                  Our approach        Rao et al. [1]        Rajaraman et al. [2]
Circuit   #PI   #PO   #Gates      CPU s   FIT         CPU s   FIT           CPU min.   Error Prob.
C432      36    7     160         0.04    1.18×10^3   <0.01   1.75×10^-5    108        0.0725
C499      41    32    202         0.14    1.41×10^3   0.01    6.26×10^-5    216        0.0041
C880      60    26    383         0.08    3.86×10^3   0.01    6.07×10^-5    102        0.0188
C1908     33    25    880         1.14    1.63×10^4   0.01    7.50×10^-5    1073       0.0011

                      Our approach     Rao et al. [1]     Rajaraman et al. [2]
Computing Platform    Sun Fire 280R    Pentium 2.4 GHz    Sun Fire v210
Circuit Technology    TSMC035          Std. 0.13 µm       70nm BPTM*
Altitude              Ground           Ground             N/A
*BPTM: Berkeley Predictive Technology Model
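The total FIT values listed for our approach appear to be consistent with scaling the FIT/gate/output figures from the Sun Fire 280R results by the gate count and the number of primary outputs. The check below is a sketch of that arithmetic only; the scaling rule is inferred from the two tables, not stated in the deck.

```python
# (circuit, #POs, #gates, FIT/gate/output, total FIT reported in the comparison)
ROWS = [
    ("C432", 7, 160, 1.0563, 1.18e3),
    ("C499", 32, 202, 0.2188, 1.41e3),
    ("C880", 26, 383, 0.3882, 3.86e3),
    ("C1908", 25, 880, 0.7427, 1.63e4),
]


def total_fit(fit_per_gate_output: float, gates: int, outputs: int) -> float:
    """Inferred scaling: total FIT = FIT/gate/output x gates x outputs."""
    return fit_per_gate_output * gates * outputs


for name, pos, gates, fit_go, reported in ROWS:
    estimate = total_fit(fit_go, gates, pos)
    rel_err = abs(estimate - reported) / reported
    print(f"{name}: estimated {estimate:.0f} FIT, reported {reported:.0f} FIT, "
          f"relative difference {rel_err:.2%}")
```

All four circuits agree with the reported totals to within rounding of the three-significant-figure table entries.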
More Result Comparison
Devices                               SER* (FIT/Mbit)
Measured data:
  0.13µ SRAMs [6]                     10,000 to 100,000
  SRAMs, 0.25μ and below [7]          10,000 to 100,000
  1 Gbit memory in 0.25µ [8]          4,200
Logic circuit SER estimation (ground level):
  Our work                            1,000 to 10,000
  Rao et al. [1]                      1×10^-5 to 8×10^-5
* The altitude is not mentioned for these data.
Discussion
• We take neutron energy to be the key factor in inducing
SEUs. In real cases, there can also be secondary particles
generated through interaction with neutrons.
• Estimating sensitive regions in silicon is a hard task. Also, the
polarity of the SET should be taken into account.
• Because typical error rates at the earth's surface are very small,
their measurement is time consuming and can produce large
discrepancies. This motivates the use of analytical methods.
For example, a circuit may experience 1 SEU in 6 months (4320
hours), which equals about 231,481 FIT. It is also likely that the
circuit has 0 SEUs in these 6 months, so the measured SER is 0 FIT.
Discussion Continued
• Fan-out stems should be considered. Two situations
can arise:
  - When an SET goes through a large fan-out, the large load
    capacitance can eliminate the SET, or
  - If it is not canceled at the fan-out node, it will go through
    multiple fan-out paths and increase the SER.
• More field tests for logic circuits are highly recommended.
• None of these SER approaches considers the effects of process
variation on SER.
Conclusion
• SER in logic and memory chips will continue to
increase as devices become more sensitive to soft
errors at sea level.
• By modeling soft errors with two parameters, the
occurrence rate and the single event transient pulse width
density, we are able to effectively account for the
electrical masking of the circuit.
• Our approach considers more factors and thus gives
a more realistic soft error rate estimate.
Publications related to this work
• F. Wang and V. D. Agrawal, “Single Event Upset: An Embedded
Tutorial,” in Proc. 21st IEEE International Conference on VLSI Design,
January 2008, pp. 429-434.
• F. Wang and V. D. Agrawal, “Soft Error Rate Determination for
Nanometer CMOS VLSI Circuits,” in Proc. 40th IEEE Southeastern
Symposium on System Theory, March 16-18, 2008, Paper TA1.
• F. Wang and V. D. Agrawal, “Probabilistic Soft Error Rate Estimation
from Statistical SEU Parameters,” in Proc. 17th IEEE North Atlantic Test
Workshop, May 2008.
Unpublished work:
• F. Wang and V. D. Agrawal, “Soft Error Considerations for Computer
Web Servers”.
References
[1] R. R. Rao, K. Chopra, D. Blaauw, and D. Sylvester, “An Efficient Static Algorithm for
Computing the Soft Error Rates of Combinational Circuits,” Proc. Design, Automation
and Test in Europe Conf., 2006, pp. 164-169.
[2] R. Rajaraman, J. S. Kim, N. Vijaykrishnan, Y. Xie, and M. J. Irwin, “SEAT-LA: A Soft
Error Analysis Tool for Combinational Logic,” Proc. 19th International Conference on
VLSI Design, 2006, pp. 499-502.
[3] G. Asadi and M. B. Tahoori, “An Accurate SER Estimation Method Based on
Propagation Probability,” Proc. Design, Automation and Test in Europe Conf., 2005,
pp. 306-307.
[4] M. Zhang and N. R. Shanbhag, “A Soft Error Rate Analysis (SERA) Methodology,” Proc.
IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2004,
pp. 111-118.
[5] T. Rejimon and S. Bhanja, “An Accurate Probabilistic Model for Error Detection,” Proc.
18th International Conference on VLSI Design, 2005, pp. 717-722.
[6] J. Graham, “Soft errors a problem as SRAM geometries shrink,”
http://www.ebnews.com/story/OEG20020128S0079, EBN, 28 Jan 2002.
[7] W. Leung, F.-C. Hsu, and M. E. Jones, “The ideal SoC memory: 1T-SRAM™,”
Proc. 13th Annual IEEE International ASIC/SOC Conference, 2000, pp. 32-36.
[8] “Soft Errors in Electronic Memory - A White Paper,” Technical report, Tezzaron
Semiconductor, 2004.
Thank You . . .