Dealing with
Multiple Simultaneous Faults
in Future Technologies
Carlos A. L. Lisbôa
Luigi Carro
Erik Schüler
SRC TechCon 2005
Portland, Oregon, USA
Embedded Systems Laboratory
Informatics Institute
Federal University of Rio Grande do Sul
Porto Alegre – RS – Brazil
Why Multiple Simultaneous Faults ?
• Future technologies (2010 and beyond)
• very small transistors and fewer electrons to form the
channel (SETs)
• transient pulses due to radiation attack will last longer
than the propagation delays of gates
• devices will be more sensitive to the effects of
electromagnetic noise, neutrons and alpha particles
Carlos A. L. Lisbôa
SRC TechCon 2005 - October, 26, 2005 - Paper # 20.4
Single Event Upset Origin
[Figure: bit patterns shown on the slide: 1 0 1 0 0 0 0 1; 01011110; 11011110]
Why Should One Study Multiple Faults ?
Change in paradigm:
Gates will behave statistically,
producing correct outputs only a
fraction of the time.
How to Deal with Multiple Faults ?
• New paradigm: multiple simultaneous faults
• new fault tolerance techniques will be required (TMR
will no longer provide enough protection)
• How to deal with this problem ?
• new materials and manufacturing technologies must be developed
OR
• new design approaches must be taken (our bet !)
Research Approaches
• Use of stochastic operators
• Use of bit stream operators
• Ensuring voter reliability to use n-MR
while dealing with multiple
simultaneous faults
• Next steps: 2005 - 2007 time frame
Research Evolution
[Diagram labels: Bit Stream Operators; Stochastic Operators; Analog Voter; "Small footprint and fast"; "Tolerant to multiple faults in n-MR solutions"; "OK for some DSP applications"]
Using Stochastic Operators
• SEU induced transient errors are of random nature
• Stochastic operators rely on randomness to produce
approximate results
• The injection of random faults in the input signals
processed by stochastic operators did not impact
the precision of the results
% errors in 1,000 additions

Adder                     % errors
Conventional              0.0000
Stochastic, 0 faults      0.1412
Stochastic, 2 faults      0.2580
Stochastic, 4 faults      0.1768
Stochastic, 8 faults      0.2196
• Several application areas (DSP) can deal with
approximate values and still produce acceptable
results (outputs)
Using Stochastic Operators
• Benefit: reduced area of the operators
[Figure: stochastic adder circuit (signals labelled S1, S2, S3 and Sum) and stochastic multiplier circuit, each annotated with example bit streams]
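The slides do not show how the stochastic circuits work internally, so the following is only a minimal sketch of the textbook form of stochastic computing, not the authors' circuits. It assumes a value in [0, 1] is encoded as the fraction of 1s in a random bit stream, that multiplication is a bitwise AND, and that scaled addition is a 2-to-1 multiplexer. All function names are hypothetical.

```python
import random

def to_stream(p, n, rng):
    """Encode a value p in [0, 1] as n random bits, each 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode a stream: the fraction of 1s estimates the encoded value."""
    return sum(stream) / len(stream)

def multiply(a, b):
    """AND gate: for independent streams, P(a AND b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]

def scaled_add(a, b, select):
    """2-to-1 multiplexer: with a select stream of value 0.5, the output is (a + b) / 2."""
    return [x if s else y for x, y, s in zip(a, b, select)]

def inject_faults(stream, k, rng):
    """Flip k distinct, randomly chosen bits (a simple transient-fault model)."""
    out = list(stream)
    for i in rng.sample(range(len(out)), k):
        out[i] ^= 1
    return out

rng = random.Random(2005)
n = 10_000
a = to_stream(0.5, n, rng)
b = to_stream(0.4, n, rng)
sel = to_stream(0.5, n, rng)

product = value(multiply(a, b))            # approximately 0.5 * 0.4
half_sum = value(scaled_add(a, b, sel))    # approximately (0.5 + 0.4) / 2
faulty = value(scaled_add(inject_faults(a, 8, rng), b, sel))
print(product, half_sum, faulty)
```

In this model, eight flipped input bits can change at most eight output bits, so the faulty result differs from the fault-free one by at most 8 / n. This is one way to see why a few random faults barely move an already approximate result.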
Using Stochastic Operators
How does it work ?
Come and see the posters !
No free drinks, but the answer
to this question is granted !
Using Bit Stream Operators
• Computation principles similar to those of the stochastic
adder and multiplier
• Operators can produce bit streams which represent the
exact results of the operation
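[Figure: multiplication F1 x F2 of two 3-bit operands (bits F1 2 .. F1 0 and F2 2 .. F2 0). The partial products are laid out along a 49-bit stream b48 .. b0:
b48 .. b33: F2 2 .F1 2
b32 .. b17: F2 1 .F1 2, F2 2 .F1 1
b16 .. b5: F2 0 .F1 2, F2 1 .F1 1, F2 2 .F1 0
b4 .. b1: F2 0 .F1 1, F2 1 .F1 0
b0: F2 0 .F1 0]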
Proposed Multiplication Algorithm - bit stream product
(the count of 1’s in the stream is equal to the product value)
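The segment lengths in the figure (16, 16, 12, 4 and 1 bits) match the binary weights of the partial products of two 3-bit operands. A minimal sketch under that reading follows: each partial-product bit is copied as many times as its weight, so the number of 1s in the stream equals the product. The function name and the bit ordering are assumptions made for illustration. Only the count of 1s is meant to match the slide.

```python
def bit_stream_product(f1, f2, width=3):
    """Return a stream whose number of 1s equals f1 * f2.

    Each partial product (bit i of f2 AND bit j of f1) is copied
    2**(i + j) times, so it contributes its binary weight to the count.
    """
    stream = []
    for i in range(width):
        for j in range(width):
            partial = ((f2 >> i) & 1) & ((f1 >> j) & 1)
            stream.extend([partial] * (1 << (i + j)))
    return stream

s = bit_stream_product(5, 6)
print(len(s), sum(s))  # 49 30
```

For 3-bit operands the stream is always 49 bits long, which is also the largest possible product (7 x 7).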
• Redundancy is added to the bit streams in order to withstand
multiple bit flips
[Figure: each bit of the product stream (b48, b47, ..., b0) is repeated 8 times, followed by the constant bits 1 1 1 1 0 0 0, which add 4 to the count]
total count of 1’s = 8 * product + 4
Adding robustness to the bit stream through redundancy
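The redundancy scheme can be sketched as follows. The encoding (each bit repeated 8 times, then the tail 1 1 1 1 0 0 0) is taken from the slide. The decoding rule, integer division of the count of 1s by 8, is an assumption that the slide does not state. Function names are hypothetical.

```python
import random

TAIL = [1, 1, 1, 1, 0, 0, 0]  # constant bits from the slide: they add 4 to the count

def add_redundancy(stream, copies=8):
    """Repeat every bit `copies` times, then append the constant tail."""
    out = []
    for bit in stream:
        out.extend([bit] * copies)
    return out + TAIL

def decode(redundant, copies=8):
    """Recover the product from the count of 1s (assumed rule: integer division)."""
    return sum(redundant) // copies

def flip_bits(stream, k, rng):
    """Flip k distinct, randomly chosen bits."""
    out = list(stream)
    for i in rng.sample(range(len(out)), k):
        out[i] ^= 1
    return out

product_stream = [1] * 30 + [0] * 19   # any 49-bit stream with thirty 1s (product = 30)
protected = add_redundancy(product_stream)
damaged = flip_bits(protected, 3, random.Random(20))
print(sum(protected), decode(protected), decode(damaged))  # 244 30 30
```

Under this decoding rule the +4 offset centres the count inside its interval of 8, so any three bit flips leave the count between 8 * product + 1 and 8 * product + 7 and the decoded value is unchanged.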
• Conversion of bit streams to binary coded values is delayed as
much as possible, and conversion circuits must use TMR or n-MR
for protection against faults
• Issues to be further investigated: size of bit streams and
area of the conversion circuits
Using Bit Stream Operators
How does it work ?
Come and see the posters !
No free food, but some more info
on this subject will be provided !
What is Wrong with TMR ?
• TMR protects only against single faults in one
of the modules
Module 1: correct output
Module 2: correct output
Module 3: correct output
VOTER: correct output
Module 1: correct output
Module 2: wrong output
Module 3: correct output
VOTER: correct output
What is Wrong with TMR ?
• TMR does not protect against double faults in
different modules
Module 1: wrong output
Module 2: correct output
Module 3: wrong output
VOTER: wrong output
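The three cases above can be reproduced with a small behavioural model. This is a sketch that assumes single-bit module outputs and an ideal, fault-free majority voter. The function name is hypothetical.

```python
def majority_vote(outputs):
    """Ideal majority voter for single-bit outputs from an odd number of modules."""
    return int(2 * sum(outputs) > len(outputs))

# The correct output is assumed to be 1 in all three cases.
print(majority_vote([1, 1, 1]))  # 1: no faults
print(majority_vote([1, 0, 1]))  # 1: the single fault in module 2 is masked
print(majority_vote([0, 1, 0]))  # 0: faults in modules 1 and 3 outvote module 2
```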
What is Wrong with TMR ?
• When a single fault occurs in the voter circuit,
the voter output may be wrong
Module 1: correct output
Module 2: correct output
Module 3: correct output
VOTER: correct output ?
Making TMR (n-MR) more reliable
• Known solutions imply
• area, performance and / or power penalties
• deadlock: how to protect the output generator ?
• Proposed solution:
• use TMR to cope with single faults in the modules
• replace the digital voter by an analog voter that
• uses a comparator to generate the output
• can tolerate some noise while still producing the
correct result
The Analog Voter
Minimum Area Comparator
Injection of faults
in the comparator (*)
(*) using CMOS 0.35µm
Electrical Simulation: Multiple Faults
(SPICE and CMOS 0.35 µm)
Dealing with Multiple Simultaneous
Faults: n-MR
The Analog Voter with 5 Inputs (for 5-MR)
Simulations with injection of
2 simultaneous faults also succeeded
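The voter schematic is not reproduced in this transcript, so the following is only a hypothetical behavioural model, not the authors' circuit. It assumes the voter averages the module output voltages and that a comparator checks the average against half the supply voltage. The supply value, noise values and function name are all illustrative assumptions.

```python
VDD = 3.3  # assumed supply voltage, for illustration only

def analog_vote(voltages, threshold=VDD / 2):
    """Hypothetical model: average the inputs, then compare against a threshold."""
    mean = sum(voltages) / len(voltages)
    return 1 if mean > threshold else 0

good = [VDD] * 5
two_faults = [0.0, VDD, 0.0, VDD, VDD]   # two of five modules driven to the wrong level
noise = [0.2, -0.3, 0.1, -0.2, 0.3]      # arbitrary disturbance on each input
noisy = [v + d for v, d in zip(two_faults, noise)]

print(analog_vote(good), analog_vote(two_faults), analog_vote(noisy))  # 1 1 1
```

In this model, two faulty inputs out of five still leave the average above the threshold, and moderate noise on the inputs does not change the comparator decision.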
The Analog Voter ... Oops !
The Analog Voter
Let’s see the posters !
Future Work - Short Term (2005-2006)
• use of signal redundancy with other number
representation forms, such as Sigma-Delta
• use of the analog voter as an efficient way to
implement robust n-MR circuits
• investigate the application of statistical
methods and neural networks to the design
of fault tolerant circuits with minimum
redundancy
Future Work - Long Term (2006-2007)
• use of logic properties to develop signal
redundancy with low cost
• apply the developed techniques to actual
processors with DSP and VLIW architectures
• discuss the architectural impact of new
technologies together with fault tolerance
Research Evolution
[Timeline diagram spanning 2005, 2006 and 2007. Labels: Sigma-Delta; Bit Stream Operators; Stochastic Operators; Logic Properties; Analog Voter; DSP / VLIW; "previous work (2004-2005)"]
Thank You !
Questions ?
Looking forward to answering them
at the poster booth!
(# 20.4)
No free anything, but a nice chat about
these matters will be a pleasure !
Contact: [email protected]