Fault-Tolerance in VHDL Description: Transient
CASE
(EMI-Tolerant Embedded System)
Part 4
[email protected]
1
Summary
1. Technology Trends Impact & Failure Types
Induced by Conducted-EMI on ICs
2. Design for Electromagnetic Immunity (DEMI)
Fault Avoidance
3. Experimental Evaluation
4. Final Considerations
1. Technology Trends Impact & Failure Types
Induced by Conducted-EMI on ICs
Dependence on electronics is widespread and increasing …
Embedded portable electronics, exposed to increasingly hostile electromagnetic (EMI) noise:
Computers:
- Distributed systems (grids)
- Servers integrating real-time voice/image/data
- …
Vehicles:
- ABS
- Air bag
- EFI
- Onboard computers
- …
Hand-held devices:
- Smart phones
- Web browsers
- Digital MP3 audio players
- Handheld computers
- Messaging applications
- GPS
- Speech
- …
Building automation:
- Sensors
- Actuators
- Gateways integrating wireless LAN / AC power line nets / Internet for building automation
- …
Fig. 1. Technology trends impact on ICs.
Assurance that the EM environment will not cause application upsets is a fundamental requirement for accepting systems as fit for purpose.
The following are high-speed effects on ICs that you can “see” in your design:
1) The PCB or MCM works only at low frequencies.
2) The PCB or MCM works only within a narrow frequency range.
3) When you change vendor parts, the design works less well or not at all.
4) Temperature changes make a big difference in your design.
5) The design is peculiar to the type of connectors and parts that you use.
6) Small changes in power supply voltages can make a big difference.
7) Touching the board or bringing a hand close to it can affect performance (capacitive coupling).
8) Adding bypass capacitors can cause significant changes in performance.
9) The board radiates a lot and is sensitive to EMI.
10) Things work well alone, but when connected to the system or to other components they work poorly or not at all.
Fig. 2. Diagnosing problems in high-speed circuits.
Fig. 3. Output PAD signal distortion due to interference (500 MHz) superimposed on the input square wave.
Fig. 4. Output PAD signals distorted by RF interference (800 MHz) superimposed on the input square wave, compared with the signal measured without interference.
In such a noisy environment, conducted RF interference induces two types of failures on ICs:
a) Static failures: occur when conducted RF interference is superimposed on a high or low logic level and the signal at the IC input port goes outside the high or low noise margin. In this case, errors at the IC’s output ports come from failures in the IC input ports.
b) Dynamic failures: occur when conducted RF interference added to the IC input logic signal varies the propagation delay of the input port, thus changing the settling time and hold time of the logic gates. In this case, errors observed at the IC’s output ports come from failures in internal sub-circuits.
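As a rough illustration of the two failure types, the sketch below models each one with a simple threshold check. Everything in it is an assumption made for illustration: the names `InputPort`, `static_failure` and `dynamic_failure`, the threshold voltages, and the use of a setup-time budget as the timing criterion do not come from the slides, and real noise margins and timing budgets depend on the IC technology.

```python
from dataclasses import dataclass


@dataclass
class InputPort:
    """Hypothetical input-port thresholds (illustrative values only)."""
    v_il_max: float  # highest voltage still read as logic low (V)
    v_ih_min: float  # lowest voltage still read as logic high (V)


def static_failure(port: InputPort, level_v: float, is_high: bool,
                   rf_peak_v: float) -> bool:
    """True if RF noise superimposed on a steady level leaves the noise margin."""
    if is_high:
        # Worst case for a high level: the interference pulls the signal down.
        return level_v - rf_peak_v < port.v_ih_min
    # Worst case for a low level: the interference pushes the signal up.
    return level_v + rf_peak_v > port.v_il_max


def dynamic_failure(nominal_delay_ns: float, delay_shift_ns: float,
                    clock_period_ns: float, setup_ns: float) -> bool:
    """True if an RF-induced delay shift makes data arrive too late to be captured."""
    arrival_ns = nominal_delay_ns + delay_shift_ns
    return arrival_ns > clock_period_ns - setup_ns


port = InputPort(v_il_max=0.8, v_ih_min=2.0)
print(static_failure(port, level_v=3.0, is_high=True, rf_peak_v=0.5))
print(static_failure(port, level_v=3.0, is_high=True, rf_peak_v=1.2))
print(dynamic_failure(6.0, 0.5, clock_period_ns=10.0, setup_ns=2.0))
print(dynamic_failure(6.0, 2.5, clock_period_ns=10.0, setup_ns=2.0))
```

The static check looks only at the voltage at the input port, while the dynamic check looks only at timing, which mirrors the distinction between input-port failures and internal sub-circuit failures.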
2. Design for Electromagnetic Immunity (DEMI)
Fault Avoidance
Embedded portable electronics:
- computers
- vehicles
- hand-held devices
- building automation
Design for Electromagnetic Immunity (DEMI)
Design methods for DEMI:
- HW-based fault avoidance
- SW-based fault detection
Common practice: board level
Becoming more usual: IC level
More recently …: application level
Design methods for DEMI at the IC level have become mandatory, in particular for low-power IC-based applications.
Examples …
- Reduce the dynamic switching currents: block decoupling capacitors, improved (weak) pad-driver design.
- Optimize the distribution of switching currents over time: clock concepts with intentional non-zero skew.
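To see why intentional clock skew spreads switching currents over time, the following sketch sums idealized current pulses from several blocks and compares the peak supply current with and without skew. The triangular pulse shape, the pulse width, the peak value and the function name `peak_current` are all hypothetical modelling choices, not data from the slides.

```python
def peak_current(skews_ns, pulse_width_ns=1.0, pulse_peak_ma=10.0, step_ns=0.05):
    """Peak of the summed supply current when each block switches at its own skew.

    Each block is modelled as drawing one triangular current pulse that starts
    at its clock-arrival time (an illustrative model, not a circuit simulation).
    """
    def pulse(t):
        # Triangle rising to pulse_peak_ma at mid-width, zero outside the pulse.
        if t < 0.0 or t > pulse_width_ns:
            return 0.0
        half = pulse_width_ns / 2.0
        return pulse_peak_ma * (1.0 - abs(t - half) / half)

    end_ns = max(skews_ns) + pulse_width_ns
    steps = int(round(end_ns / step_ns)) + 1
    return max(
        sum(pulse(i * step_ns - skew) for skew in skews_ns)
        for i in range(steps)
    )


zero_skew = peak_current([0.0, 0.0, 0.0, 0.0])
staggered = peak_current([0.0, 0.5, 1.0, 1.5])
print(f"zero skew peak: {zero_skew:.1f} mA, staggered peak: {staggered:.1f} mA")
```

In this toy model the total charge drawn is the same in both cases; only the peak of the summed current, and with it the sharpness of the current edge, changes.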
DEMI at the board level is also widely employed:
A) Board-Level Layout Analysis:
A.1) Grounding:
Two basic types: single-point and multipoint.
A.2) Isolation and Partitioning (Moating):
Isolation and partitioning refer to the physical separation of components, circuits, and power planes from other functional devices, areas, and subsystems.
An isolated area is an island on the board, similar to a castle with a moat. Only the traces required for operation or interconnection can enter this separate area.
Two methods exist to connect traces, power, and ground planes to the island:
Method 1: uses isolation transformers or optical isolators and common-mode data line filters to cross the moat.
Method 2: uses a bridge across the moat. In this case, isolation is also used to separate high-frequency-bandwidth components from lower-bandwidth circuits.
Other Techniques …
B) Shielding
C) Decoupling Capacitors
D) Watch-Dog Timer (board and on-chip levels)
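The watchdog idea in item D can be sketched as a small software model: the application must restart a counter periodically, and if it stops doing so (for example because a transient fault derailed its control flow) the counter expires and forces a reset. The class name `WatchdogTimer`, the tick-based timing and the timeout value are hypothetical; an on-chip watchdog is a hardware peripheral configured through device registers, which this sketch does not attempt to reproduce.

```python
class WatchdogTimer:
    """Minimal software model of a watchdog timer (illustrative only)."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.counter = 0
        self.resets = 0

    def kick(self) -> None:
        """Called by healthy application code to restart the countdown."""
        self.counter = 0

    def tick(self) -> bool:
        """Advance one tick; return True if the watchdog fired a reset."""
        self.counter += 1
        if self.counter >= self.timeout_ticks:
            self.resets += 1
            self.counter = 0
            return True
        return False


def first_reset_tick(ticks: int, hang_at: int, timeout_ticks: int = 5) -> int:
    """Simulate an application that stops kicking the watchdog at `hang_at`.

    Returns the tick at which the watchdog first fires, or -1 if it never does.
    """
    wdt = WatchdogTimer(timeout_ticks)
    first = -1
    for t in range(ticks):
        if t < hang_at:
            wdt.kick()  # normal operation: the main loop services the watchdog
        if wdt.tick() and first < 0:
            first = t
    return first


print(first_reset_tick(ticks=20, hang_at=8))    # application hangs at tick 8
print(first_reset_tick(ticks=20, hang_at=100))  # application never hangs
```

A watchdog of this kind only notices faults that stop the application from servicing it; a fault that corrupts data while the main loop keeps running goes unnoticed.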
In summary:
There is a long list of possible design and fabrication solutions that can be used to enhance the EM immunity of SoCs.
However, none of them guarantees a perfect solution!
Additionally, they are very cost-sensitive (system performance, power consumption, and implementation complexity).
For instance …
Weak drivers make the IC more sensitive to noise and may expose it to delay faults, since transistors become slower at higher temperatures.
Zero clock skew is naturally supported by today’s design tools, but clock smearing (clock distribution with intentional skew for the sake of reduced RF emission) must be done manually, since today’s tools do not directly support non-zero clock skew design.
Board-level layout analysis: a burdensome task from the design point of view; it is sometimes impossible to meet all the requirements (e.g. block placement and routing).
Shielding, decoupling capacitors: shielding increases weight and volume (not acceptable for embedded applications); decoupling capacitors are effective filters only under limited voltage swings.
WDT: more robust than the monitored logic, but only within certain limits.
3. Experimental Evaluation
Goal: analyze the (joint) use of HW-based fault-avoidance and SW-based fault-detection techniques in the presence of EMI
Device: Texas Instruments MSP430 microcontroller
Workload: Bubble Sort; Matrix Multiplication
Test Setup:
- Noise (voltage dips) injected into the Vcc power supply line of the device, in compliance with the international standard IEC 61000-4-29.
- GigaHertz Transverse Electromagnetic (GTEM) cell, to expose the device to different EM fields.
Experiment 1: Conducted EMI
Fig. 19. IEC 61000-4-29-compliant EMI generator and test setup used for electromagnetic-induced noise injection and analysis. Labeled parts: output display, programming keyboard, noisy power lines to the DUT, start-stop command.
Fig. 21. Oscilloscope screenshot of voltage dips injected into the microcontroller Vcc pin: negative pulse of -30% (-0.9 V) with a width of 30 ms (nominal Vcc = 3.0 V).
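The dip parameters in Fig. 21 can be turned into a sampled waveform with a few lines of arithmetic. The sketch below uses the nominal voltage, dip depth and dip width given in the caption; the dip start time, the total record length, the 1 kHz sample rate and the function name `dip_waveform` are assumptions chosen only to make the example concrete.

```python
def dip_waveform(v_nominal, dip_fraction, dip_start_s, dip_width_s,
                 duration_s, sample_rate_hz):
    """Return (time_s, voltage_v) samples of a rectangular supply-voltage dip."""
    v_dip = v_nominal * (1.0 - dip_fraction)
    n = round(duration_s * sample_rate_hz)
    # Work in whole sample indices to avoid floating-point edge effects.
    start_i = round(dip_start_s * sample_rate_hz)
    stop_i = start_i + round(dip_width_s * sample_rate_hz)
    return [
        (i / sample_rate_hz, v_dip if start_i <= i < stop_i else v_nominal)
        for i in range(n)
    ]


# Caption values: Vcc = 3.0 V, -30% dip, 30 ms wide. Other values are assumed.
wave = dip_waveform(3.0, 0.30, dip_start_s=0.010, dip_width_s=0.030,
                    duration_s=0.060, sample_rate_hz=1000)
low = [v for _, v in wave if v < 3.0]
print(f"dip level: {min(low):.2f} V, samples in dip: {len(low)}")
```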
Table 3. Fault-detection summary for the 238-databyte Image Processing Program.
Duration (s)   Observed errors   Detected by SW (%)   Detected by HW, WDT (%)   Not detected (%)   SW-detected: % Data   SW-detected: % Control
0.01           36                63.89                27.79                     8.32               82.61                 17.39
0.03           25                48                   24                        28                 100                   0
0.1            1                 100                  0                         0                  100                   0
0.3            0                 0                    0                         0                  0                     0
1              0                 0                    0                         0                  0                     0
Fig. 6. “Data” versus “Control” error detection summary: share of data errors and control-flow errors observed for each voltage dip duration. (238-databyte Image Processing Program)
Fig. 7. Fault-detection capability summary for the SW-based techniques: % detected and % not detected for each voltage dip duration, including data and control-flow errors. (238-byte Image Processing Program)
Fig. 8. SW-based fault detection summary, including data and control-flow errors: 70.63% detected, 29.37% not detected. (238-byte Image Processing Program)
[Data-dominant]
Table 4. Fault-detection summary for the 238-byte Bubble Sort Program.
Duration (s)   Observed errors   Detected by SW (%)   Detected by HW, WDT (%)   Not detected (%)   SW-detected: % Data   SW-detected: % Control
0.01           85                83.05                16.95                     0                  39.8                  60.2
0.03           70                82.86                4.285                     12.855             32.76                 67.24
0.1            53                86.79                1.88                      11.32              21.74                 78.26
0.3            39                87.18                0                         12.82              20.59                 79.41
1              16                81.25                6.25                      12.5               15.38                 84.62
Fig. 9. “Data” versus “Control” error detection summary: share of data errors and control-flow errors observed for each voltage dip duration. (238-byte Bubble Sort Program)
Fig. 10. Fault-detection capability summary for the SW-based techniques: % detected and % not detected for each voltage dip duration, including data and control-flow errors. (238-byte Bubble Sort Program)
Fig. 11. SW-based fault detection summary, including data and control-flow errors: 84.23% detected, 15.78% not detected. (238-byte Bubble Sort Program)
[Control-dominant]
Experiment 2: Irradiated EMI
Fig. 23. Test setup for EMI-based fault injection: (a), (b) and (c) general scheme and equipment at INTI; (d) SUT (Texas Instruments MSP430F149 microcontroller) and power meter inside the GTEM cell. Blocks shown in scheme (a): RF signal generator, power amplifier, GTEM cell, power meter, SUT, personal computer (host), JTAG, and on-line “SUT/Host” communication (serial port, optical fiber).
Table 1. Test summary for the workload Bubble Sort:
(a) HW-based fault detection supported by a remote personal computer operating as WDT;
(b) SW-based fault detection technique implemented with data- and control-flow fault detection.
Test parameters: modulation frequency: 1 GHz; carrier frequency: 1 kHz; measured EM field: 70 V/m.
                                                             (a)          (b)
Number of workload runs                                      70           75
Number of runs yielding erroneous outputs                    12           36
Number of runs with errors detected                          7 (58.3%)    18 (50%)
Number of runs terminated by system crash                    1            3
Number of runs with errors not detected^                     5 (41.7%)    18 (50%)
Average number of erroneous memory words per erroneous run   58.8         84.5
Table 2. Test summary for the workload Matrix Multiplication:
(a) HW-based fault detection supported by a remote personal computer operating as WDT;
(b) SW-based fault detection technique implemented with data-flow fault detection;
(c) SW-based fault detection technique implemented with control-flow fault detection.
Test parameters: modulation frequency: 100 MHz; carrier frequency: 1 kHz; measured EM field: 100 V/m.
                                                             (a)                      (b)                       (c)
Number of workload runs                                      167                      173                       174
Number of runs yielding erroneous outputs                    147                      94                        169
Number of runs with errors detected                          70 (72.2%)               75 (79.8%)                92 (84.4%)
Number of runs terminated by system crash                    4                        7                         14
Number of runs with errors not detected^                     77 (27.8%)               19 (20.2%)                77 (15.6%)
Average number of erroneous memory words per erroneous run   3.3*, 6.6**, 471.5***    20.8*, 9.4**, 295.8***    27.5*, 6.6**, 124.5***
Fig. 24. Number of erroneous memory words as a function of the Modulation Frequency (MF) and the EM Incident Field (IF). Results for the workload Bubble Sort. Test points, MF (GHz) / IF (V/m): A = 0.1/90; B = 0.15/100; C = 1/70; D = 0.1/100. Data labels shown on the chart: 99, 66, 34 and 58.
Fig. 25. Normalized Number of Processor Failures [NNPF = (# of runs yielding erroneous outputs)/(# of runs)] as a function of the EM field incident on the SUT (V/m). Axes: EM field incident on the SUT from 0 to 100 V/m; NNPF from 0 to 1.
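The NNPF definition in the Fig. 25 caption is a plain ratio, so it can be evaluated directly from the run counts reported in Tables 1 and 2. The sketch below does only that; the function name `nnpf` and the dictionary labels are illustrative, and the results are recomputed from the table counts rather than read off the plotted curve.

```python
def nnpf(erroneous_runs: int, total_runs: int) -> float:
    """Normalized Number of Processor Failures = erroneous runs / total runs."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return erroneous_runs / total_runs


# (runs yielding erroneous outputs, workload runs) as listed in Tables 1 and 2.
campaigns = {
    "Bubble Sort (a)": (12, 70),
    "Bubble Sort (b)": (36, 75),
    "Matrix Multiplication (a)": (147, 167),
    "Matrix Multiplication (b)": (94, 173),
    "Matrix Multiplication (c)": (169, 174),
}

for name, (bad, total) in campaigns.items():
    print(f"{name}: NNPF = {nnpf(bad, total):.3f}")
```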
4. Final Considerations
A system architecture co-implemented in HW + SW detects transient faults in control flow and application data. Main characteristics of the architecture:
- SW-embedded structures at the application-code level yield good fault detection under EMI.
- Part of the SW-embedded structures can be migrated into HW by implementing a watchdog (internal/external) that monitors the application processor. The joint use of SW + HW fault detection further improves fault coverage in EMI-exposed environments at reasonable cost.
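The slides do not spell out which SW-embedded structures were used, so the sketch below shows two generic techniques of this kind purely as an assumption: variable duplication for data faults, and a predecessor check for control-flow faults. The class names `DuplicatedVar`, `ControlFlowChecker` and `FaultDetected`, and the block labels, are hypothetical.

```python
class FaultDetected(Exception):
    """Raised when a consistency check fails."""


class DuplicatedVar:
    """Data-fault detection: keep two copies and compare them on every read."""

    def __init__(self, value):
        self.primary = value
        self.shadow = value

    def write(self, value):
        self.primary = value
        self.shadow = value

    def read(self):
        if self.primary != self.shadow:
            raise FaultDetected("data copies disagree")
        return self.primary


class ControlFlowChecker:
    """Control-flow detection: each block checks that a legal predecessor ran."""

    def __init__(self, legal_predecessors, entry):
        self.legal = legal_predecessors  # block name -> set of allowed predecessors
        self.current = entry

    def enter(self, block):
        if self.current not in self.legal[block]:
            raise FaultDetected(f"illegal jump {self.current} -> {block}")
        self.current = block


# Data fault: flip one bit in a single copy, as a transient upset might.
x = DuplicatedVar(5)
x.primary ^= 0b100
try:
    x.read()
except FaultDetected as err:
    print("detected:", err)

# Control-flow fault: execution jumps from block A straight to block C.
flow = ControlFlowChecker({"B": {"A"}, "C": {"B"}}, entry="A")
try:
    flow.enter("C")
except FaultDetected as err:
    print("detected:", err)
```

Checks like these cost extra memory and execution time, and they cannot fire if the processor stops executing altogether, which is the case a watchdog covers.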