Exploiting HW+SW Partitioning
for Reliable Embedded Systems
Part 2
[email protected]
Summary
1. Introduction: targeting the problem
2. The Possible Solution
2.1. SW-Based Fault Detection Mechanisms
2.2. Migrating SW-Based Fault Detection Mechanisms into HW
3. Experimental Evaluation
4. Final Considerations
1. Introduction: targeting the problem
The growing number of computer-based critical applications raises the question of which techniques can guarantee a sufficient degree of reliability while keeping design and manufacturing costs reasonable.
1. Introduction: targeting the problem
Techniques commonly used (on-chip and system level): stand-alone solutions
Fault-Tolerance Techniques (HW, SW, Time or Info domains):
• Duplication/Voter, TMR
• Layout-Driven Fault Avoidance
• Watch-Dogs
• EDAC
• Re-computation
• Consistency Checks
• Capability Checks
1. Introduction: targeting the problem
Techniques commonly used (on-chip and system level): stand-alone solutions
Fault-Tolerance Techniques (HW, SW, Time or Info domains):
• Duplication/Voter, TMR
• Layout-Driven Fault Avoidance
• Watch-Dog Timer
• EDAC
• Re-computation
• Consistency Checks
• Capability Checks
Impacts design: performance, weight, size/volume, power consumption, reliability.
1. Introduction: targeting the problem
HW Techniques:
Disadvantages:
High area overhead
High development/fab cost
SW Techniques:
Disadvantages:
Significant performance degradation
Memory overhead
2. The Possible Solution
Development of a hybrid methodology (HW+SW redundancies) able to perform runtime detection of errors in μprocessor-based SoCs may have very good cost-benefit returns.
2. The Possible Solution
Returns:
• Minimization of area overhead and fab/development costs (benefits of SW-based redundancy techniques)
• Improvement of performance and minimization of memory overhead (benefits of HW-based redundancy techniques)
In summary:
• Minimize fab cost and performance degradation while improving reliability
Target faults:
• Control flow errors
• Data handling errors
2. The Possible Solution
Hybrid methodology (HW+SW redundancies) explores:
• I-IP Core Architecture
• Software-Based Techniques
2. The Possible Solution
HW+SW SoC FT Architecture:
[Figure: SoC block diagram. The µP IP, the Memory IP (stores a hardened program), the Custom IP, the I/O port and the WDT / I-IP are attached to the bus. The I-IP observes the information flow traveling on the bus, computes control-flow signatures at run time, stores them together with data read from memory, and drives the mismatch signal.]
2. The Possible Solution
SW-Based Fault Detection Mechanisms
Faults Affecting Data:
Cerberus (Matteo et al.)
Faults Affecting Control:
ECCA (Matteo et al.)
CFCSS (McCluskey et al.)
ECI (Miremadi et al.)
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Data:
Cerberus (Matteo et al.)
Original Code:
a = b;
Modified Code:
a0 = b0;
a1 = b1;
if (b0 != b1)
    error();

Original Code:
a = b + c;
Modified Code:
a0 = b0 + c0;
a1 = b1 + c1;
if ((b0 != b1) || (c0 != c1))
    error();

Code modification for errors affecting data.
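The duplication rule above can be tried out in a small C sketch. The names `hardened_add` and `error_seen` are hypothetical and are not part of Cerberus; `error()` here only records that a replica mismatch was seen, which is an assumption about what a real handler would do.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the slides' error() handler:
   it only records that a replica mismatch was observed. */
static bool error_seen = false;
static void error(void) { error_seen = true; }

/* Hardened form of "a = b + c": each variable has two replicas,
   and the replicas of every operand that was read are compared. */
static void hardened_add(int b0, int b1, int c0, int c1, int *a0, int *a1)
{
    *a0 = b0 + c0;
    *a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1))
        error();
}
```

With matching replicas, for example `hardened_add(2, 2, 3, 3, &a0, &a1)`, the flag stays clear. Flipping one bit in a single replica, for example passing `2 ^ 4` as `b1`, makes the comparison fail and sets the flag.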
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Data:
Cerberus (Matteo et al.)
Original Code:
res = search(a);
…
int search(int p)
{
    int q;
    …
    q = p + 1;
    …
    return(1);
}

Modified Code:
search(a0, a1, &res0, &res1);
…
void search(int p0, int p1, int *r0, int *r1)
{
    int q0, q1;
    …
    q0 = p0 + 1;
    q1 = p1 + 1;
    if (p0 != p1)
        error();
    …
    *r0 = 1;
    *r1 = 1;
    return;
}

Code transformation for errors affecting procedure parameters.
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
ECCA (Enhanced Control-Flow Checking using Assertions) (Matteo et al.)
Original Code:
/* Basic Block beginning */
…
/* Basic Block end */

Modified Code:
/* Basic Block beginning #371 */
ecf = 371;
…
if (ecf != 371)
    error();
/* Basic Block end */

Example: detection of errors that cause illegal (not allowed) branches.
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
ECCA (Enhanced Control-Flow Checking using Assertions) (Matteo et al.)
Original Code:
if (condition)
{
    /* Block A */
    …
}
else
{
    /* Block B */
    …
}

Modified Code:
if (condition)
{
    /* Block A */
    if (!condition)
        error();
    …
}
else
{
    /* Block B */
    if (condition)
        error();
    …
}

Code transformation for a test statement
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
ECCA (Enhanced Control-Flow Checking using Assertions) (Matteo et al.)
In summary
To harden a given program, this approach introduces the following assertions into each basic block vj:
• Test assertion: checks the signature on entry to basic block vj, verifying that the previously executed block vi belongs to pred(vj).
• Set assertion: updates the signature, setting it to the value Bj associated with vj.
Bj = (Bi XOR M1) XOR M2
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
ECCA (Enhanced Control-Flow Checking using Assertions) (Matteo et al.)
while (k1 < DIM)
{
    if (B != M1 && B != M2)
        error();                  /* Error detected */
    A1 = matrixA1[i1][k1];
    B1 = matrixB1[k1][j1];
    C1 += A1 * B1;
    matrixC1[i1][j1] = C1;
    k1++;
    Bj = (Bi ^ M1) ^ M2;
}
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
Principle: Modification of a Basic Block
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
Basically, the approach consists of six steps:
1) Divide the program into basic blocks. A basic block is a minimal set of ordered instructions whose execution begins at the first instruction and terminates at the last instruction. There is no branching instruction in a basic block except possibly the last one. A basic block terminates at either an instruction branching to another basic block or an instruction receiving transfer of control flow (CF) from two or more places in the program. Notation: (a) V = {vi: i = 1, 2, …, n}: set of vertices denoting basic blocks. (b) E: set of edges denoting possible CF between basic blocks.
2) Construct a graph for the program according to the instruction flow (each node represents a basic block). Note that a program can be represented by a program graph, P, whose edges bri,j are not necessarily explicit branch instructions; they also represent fall-through execution paths, jumps, subroutine calls, and returns. Fig. 2.5 is an example. Notation: P: Program Graph {V, E}.
3) Arbitrarily assign a signature to each node (compilation time).
4) Compute the signature difference between the source and the destination blocks.
5) Compute the new signature for each node (execution time).
6) Compare both signatures.
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
General Form
f = f(G, di) = G XOR di
G2 = f(G1, d2) = G1 XOR d2 = s1 XOR (s1 XOR s2) = s2
G4 = f(G1, d4) = G1 XOR d4 = G1 XOR (s3 XOR s4) = s1 XOR s3 XOR s4 ≠ s4
[Figures: sequence of instructions and its graph; illegal branch.]
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
Detection of an illegal branch: a numerical example
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
Node v1 and node v3 have the same signatures: Branch Fan-in Nodes
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
Node v1 and node v3 have different signatures: Adjusting Signature D
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 CFCSS (McCluskey et al.)
G5 = f(G1, d5, D1) = G1 XOR d5 XOR D1 = s1 XOR (s1 XOR s5) XOR "000" = s5
G5 = f(G3, d5, D3) = G3 XOR d5 XOR D3 = s3 XOR (s1 XOR s5) XOR (s1 XOR s3) = s5
Node v1 and node v3 have different signatures: Adjusting Signature D
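The signature arithmetic above can be checked with a short C sketch. It is only an illustration of the XOR update rule with the adjusting signature D, not the original CFCSS implementation: the type and function names (`cfcss_block`, `cfcss_state`, `cfcss_enter`, `cfcss_set_D`) and the 8-bit signature width are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-block constants fixed at compile time. */
typedef struct {
    uint8_t s;      /* signature assigned to this block                 */
    uint8_t d;      /* signature difference: s(base predecessor) ^ s    */
    bool    fan_in; /* true if the block has more than one predecessor  */
} cfcss_block;

/* Run-time state: G is the global signature, D the adjusting signature. */
typedef struct {
    uint8_t G;
    uint8_t D;
} cfcss_state;

/* Executed on entry to block b: G = G XOR d (XOR D for fan-in blocks),
   then compare with s. Returns true when the transfer into b is legal. */
static bool cfcss_enter(cfcss_state *st, const cfcss_block *b)
{
    st->G ^= b->d;
    if (b->fan_in)
        st->G ^= st->D;
    return st->G == b->s;
}

/* Executed in a block that precedes a fan-in block:
   D = s(this block) XOR s(base predecessor), i.e. 0 for the base itself. */
static void cfcss_set_D(cfcss_state *st, uint8_t s_self, uint8_t s_base)
{
    st->D = (uint8_t)(s_self ^ s_base);
}
```

With example signatures s1 = 1, s3 = 3, s4 = 4 and s5 = 5 (arbitrary values chosen for illustration), entering v5 from v1 or from v3 yields G = s5 in both cases, while an illegal branch from v1 into v4 (whose only predecessor is v3) yields G = s1 XOR s3 XOR s4, which differs from s4 and is therefore detected.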
2. The Possible Solution
SW-Based Fault Detection Mechanisms
 Faults Affecting Control:
 ECI (Miremadi et al.)

Insertion of trap instructions in the program area, in the data
area, and in the unused area of the memory.

The ECIs are inserted in the main memory locations that are
not used by the CPU during normal execution. Thus, the
execution of an ECI is a indication that a control flow error has
occurred.

The task of an ECI is to initiate a recovery process.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
• The WDT / I-IP works in symbiosis with the processor, which is not modified.
• The WDT / I-IP continuously monitors the execution-flow information traveling on the bus and uses it to test and update signatures.
• If a mismatch is detected, the WDT / I-IP asserts a mismatch signal.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Piece of code for control-flow fault detection (ECCA partitioning):
while (k1 < DIM)
{
    IIPtest( BB1 );
    IIPtest( BB2 );
    A1 = matrixA1[i1][k1];
    B1 = matrixB1[k1][j1];
    C1 += A1 * B1;
    matrixC1[i1][j1] = C1;
    k1++;
    IIPset( BB2 );
}
The IIPtest() calls replace the software test assertion (the "if( != M1 && != M2 ) //Error detected" statement of the pure-SW version), and IIPset() replaces the software set assertion ("j =(i ^M1)^M2;").
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
WDT / I-IP Architecture:
• Three modules: bus interface logic, consistency check logic, CAM memory
[Figure: WDT / I-IP block diagram. The bus interface logic detects signatures passing on the bus (adx, data); the CAM memory stores flow signatures; the consistency check logic compares flow signatures and drives the mismatch signal.]
Migrating SW-Based Fault Detection Mechanism into HW
WDT / I-IP Architecture:
[Figure: WDT / I-IP module interconnect. Signals shown for each module:]
• Module 1, Bus Interface Logic: Clk, Reset, Instruction_in, Ram_data_in, Ram_address_in
• Module 2, CAM Memory: Clk, Reset, Data_memory_in, Data_memory_out, Adr_memory_in, Adr_memory_out, Ctrl_rw_in, Ctrl_rw_out
• Module 3, Consistency Check Logic: Clk, Reset, En_compare_out, Data_1_out, Data_2_out, Mismatch Signal
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Consider now that the µprocessor-based SoC runs under an Operating System …
The application code accounts for only a fraction of the total execution time during system operation!
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
• Critical applications need operating systems (OS) that guarantee correct and safe behavior despite the occurrence of errors.
• Faults can affect OS calls as well as the OS kernel: how does the system react to invalid or corrupted values handled by the kernel?
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
[Figure: HW-SW Partitioning for Fault-Detection in Complex Systems. Blocks shown inside the SoC: Application, Driver, Memory (Operating System), µProcessor with Status Register, WDT / I-IP with Error Indication output, Address + Data Bus, Memory (Application Code + Data).]
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
[Figure: HW-SW Partitioning for Fault-Detection in Complex Systems, annotated. SW part: Application, Driver and Memory (Operating System; e.g. µCLinux, µCOS-II), Memory (Application Code + Data). HW part: WDT / I-IP (programmable logic) with Error Indication output. Com channel: Driver and Status Register. µProcessor examples: DragonBall, ARM, Pentium, 8086, 68K. All blocks share the Address + Data Bus inside the SoC.]
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
[Figure: MC68VZ328 Block Diagram. Blocks shown: FLX68000 static CPU, 68000 internal bus, 8/16-bit 68000 bus interface, CGM & power control, real-time clock, in-circuit emulation, interrupt controller, memory controller, bootstrap mode, special function pins (CPU space), 16-bit timers (2), 16-bit PWM 2, 8-bit PWM 1, LCD controller, SPI 1, SPI 2, UART 1 (IrDA 1.0), UART 2 (IrDA 1.0), GPIO ports.]
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
Special Function Pins (CPU Space): FC2, FC1, FC0 (68000 die)

Function Code Output
FC2   FC1   FC0     Processor Cycle Type
 0     0     0      Undefined, reserved
 0     0     1      User Data
 0     1     0      User Program
 0     1     1      Undefined, reserved
 1     0     0      Undefined, reserved
 1     0     1      Supervisor Data
 1     1     0      Supervisor Program
 1     1     1      CPU space (interrupt acknowledge)
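A monitor such as the I-IP could classify each bus cycle from these three pins. The following C sketch decodes the table above; the enum and function names are hypothetical, and packing the pins into one integer (bit 2 = FC2, bit 1 = FC1, bit 0 = FC0) is an assumption made for illustration.

```c
/* Processor cycle types from the 68000 function-code table. */
typedef enum {
    CYCLE_RESERVED,
    CYCLE_USER_DATA,
    CYCLE_USER_PROGRAM,
    CYCLE_SUPERVISOR_DATA,
    CYCLE_SUPERVISOR_PROGRAM,
    CYCLE_CPU_SPACE
} cycle_type;

/* fc packs the pins as bit 2 = FC2, bit 1 = FC1, bit 0 = FC0 (assumed). */
static cycle_type decode_fc(unsigned fc)
{
    switch (fc & 0x7u) {
    case 1:  return CYCLE_USER_DATA;
    case 2:  return CYCLE_USER_PROGRAM;
    case 5:  return CYCLE_SUPERVISOR_DATA;
    case 6:  return CYCLE_SUPERVISOR_PROGRAM;
    case 7:  return CYCLE_CPU_SPACE;
    default: return CYCLE_RESERVED;   /* codes 0, 3 and 4 */
    }
}

/* True for supervisor data or supervisor program cycles. */
static int is_supervisor_cycle(unsigned fc)
{
    cycle_type t = decode_fc(fc);
    return t == CYCLE_SUPERVISOR_DATA || t == CYCLE_SUPERVISOR_PROGRAM;
}
```

Distinguishing supervisor from user cycles in this way is what lets a bus monitor tell kernel activity apart from application activity without modifying the processor.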
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
A16 - A19 Pins (68010 – 68030 dies)
On these dies, FC2 = FC1 = FC0 = 1 also indicates CPU-space operations other than interrupt acknowledge cycles (e.g. co-processor communications).
The different CPU spaces are then indicated on pins A16 - A19, which must be properly decoded.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
Interrupt Control Pins: IPL2, IPL1, IPL0 (68000 die)

IPL2  IPL1  IPL0    Interrupt Priority Level
 0     0     0      Lowest priority
 0     0     1      |
 0     1     0      |
 0     1     1      |
 1     0     0      |
 1     0     1      |
 1     1     0      |
 1     1     1      Highest priority
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
Event-Ticking Pins (ETPs): PM0, PM1 (Pentium die)
The Event-Ticking Pins are associated with Model Specific Registers (MSRs) to monitor:
• number of cache memory misses,
• number of committed instructions,
• number of interrupts executed,
• number of taken branches,
• ...
Model Specific Registers (MSRs): counters CTR0 and CTR1, programmed through the Control and Event Select Register (CESR)
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
Instructions used to program counters CTR0 and CTR1 through the Control and Event Select Register (CESR):
• WRMSR
• RDMSR
The RDMSR instruction may be executed at all CPLs (Current Privilege Levels), but the WRMSR instruction may only be executed at CPL0.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Status Information
Event-Ticking Pins (ETPs): d_i, s_u (DragonBall core)
• d_i: "0" = data; "1" = instruction; "z" = undefined.
• s_u: "0" = supervisor mode; "1" = user mode; "z" = undefined.
These pins were added to the processor core to serve as the interface with the I-IP (watch-dog).
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
• The error detection coverage of the OS has been measured, and the critical OS data structures that need hardening have been identified, in order to improve the final robustness of the µCOS-II operating system.
Juan Pardo, 2004
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
µC/OS-II Operating System
• It was selected because it has been widely used for many years, particularly in embedded applications.
  First version (µC/OS): 1992
• Application areas: industrial robots, motor control, medical instruments, etc.
• It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.
• All Modified Condition Decision Coverage (MCDC) code in µC/OS-II has been removed, improving code quality for RTCA / EUROCAE DO-178B Level A-certified environments for avionics applications.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
µC/OS-II: Characteristics
• Portable: µC/OS-II is written in highly portable ANSI C, with target microprocessor-specific code written in assembly language.
• ROMable: it was designed for embedded applications. With the proper tool chain (i.e., C compiler, assembler, and linker/locator), µC/OS-II can be embedded as part of a product.
• Scalable: only the services needed by the application have to be included, which reduces the amount of memory (both RAM and ROM) required. Scalability is achieved through conditional compilation (full version: 8 KB).
• Preemptive: µC/OS-II is a fully preemptive real-time kernel, so it always runs the highest-priority task that is ready.
• Multitasking: µC/OS-II can manage up to 64 tasks. The current version reserves 8 of these tasks for system use, leaving up to 56 tasks for the application. Each task has a unique priority assigned to it, which means that µC/OS-II cannot do round-robin scheduling.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
µC/OS-II: Characteristics
• Deterministic: the execution time of all µC/OS-II functions and services is deterministic, so it is always known how long a function or service will take to execute. Furthermore, the execution time of µC/OS-II services does not depend on the number of tasks running in the application.
• Task Stacks: each task requires its own stack. µC/OS-II allows each task to have a different stack size, which reduces the amount of RAM needed by the application.
• Services: system services such as mailboxes, queues, semaphores, fixed-size memory partitions, time-related functions, etc.
• Interrupt Management: interrupts can suspend the execution of a task. If a higher-priority task is awakened as a result of the interrupt, the highest-priority task will run as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.
• Robust and Reliable: µC/OS-II is based on µC/OS, which has been used in hundreds of commercial applications since 1992.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Workload Design
Characteristics:
• Worst-case application: maximum use of system calls.
• System calls: synchronization, semaphores, memory, queues, messages, task handling, timing management, etc.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Workload Design
• The system workload runs continuously and consists of a series of tasks executing the application.
• Addition of consistency checks: consistency checks are added to the application code and to the kernel to detect faults and invalid values at the kernel calls, in order to improve system robustness.
• The WDT / I-IP is the monitor.
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Workload Design
// 1. Define necessary configuration constants for uC/OS-II
#define OS_MAX_EVENTS 2
#define OS_MAX_TASKS 20
#define OS_MAX_QS 0
#define OS_Q_EN 0
#define OS_MBOX_EN 0
#define OS_TICKS_PER_SEC 32
// 2. Define necessary stack configuration constants
#define STACK_CNT_512 1 // initial program stack
#define STACK_CNT_1K OS_MAX_TASKS // task stacks
// 3. This ensures that the above definitions are used
#use "ucos2.lib"
void RandomNumberTask(void *pdata);
// Declare semaphore global so all tasks have access
OS_EVENT* RandomSem;
void main(){
int i;
// Initialize OS internals
OSInit();
// Initializing tasks
for(i = 0; i < OS_MAX_TASKS; i++){
// Create each of the system tasks
OSTaskCreate(RandomNumberTask, NULL, 1024, i);
}
// semaphore to control access to random number generator
RandomSem = OSSemCreate(1);
// 4. Set number of system ticks per second
OSSetTicksPerSec(OS_TICKS_PER_SEC);
// Starting tasks: begin multi-tasking
OSStart();
}
void RandomNumberTask(void *pdata)
{
// Declare as auto to ensure reentrancy.
auto OS_TCB data;
auto INT8U err;
auto INT16U RNum;
OSTaskQuery(OS_PRIO_SELF, &data);
while(1)
{
// Rand is not reentrant, so access must be controlled
// via a semaphore.
OSSemPend(RandomSem, 0, &err);
RNum = (int)(rand() * 100);
OSSemPost(RandomSem);
printf("Task%02d's random #: %d\n",data.OSTCBPrio,RNum);
// Wait 3 seconds in order to view output from each task.
OSTimeDlySec(3);
}
}
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Workload Design
Set an indication for the instant when the processor enters supervisor mode (“OS_ENTER_CRITICAL”) and when it leaves this mode (“OS_EXIT_CRITICAL”).
The signaling is done by writing to a specific memory address.
OS_ENTER_CRITICAL
/*Code implemented for GNU-GAS*/
asm ("movea.l #0x0100, %a0\n\t"   /* load the address 0x0100 into a0 */
     "move.b  #11, (%a0)");       /* write the byte 11 to that address */
…
OS_EXIT_CRITICAL
/*Code implemented for GNU-GAS*/
asm ("movea.l #0x0100, %a0\n\t"   /* load the address 0x0100 into a0 */
     "move.b  #00, (%a0)");       /* write the byte 00 to that address */
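The same signaling can be sketched in portable C, which makes the mechanism easier to test on a host. This is an illustrative assumption, not the code used in the experiments: the function name `signal_mode` and the enum constants are hypothetical, and the monitored address is passed as a parameter instead of being fixed at 0x0100.

```c
#include <stdint.h>

/* Marker values taken from the slide's assembly: 11 on entry, 0 on exit. */
enum { MODE_ENTER = 11, MODE_EXIT = 0 };

/* Writes a marker byte to the monitored location. On the target this would
   be the fixed address watched by the WDT / I-IP; here it is a parameter so
   the sketch can run on a host. The volatile qualifier keeps the compiler
   from optimizing the store away, so the write really appears on the bus. */
static void signal_mode(volatile uint8_t *monitored, uint8_t marker)
{
    *monitored = marker;
}
```

On the target, a call such as `signal_mode((volatile uint8_t *)0x0100, MODE_ENTER)` would correspond to the first assembly fragment, assuming address 0x0100 is mapped to a location the monitor can observe.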
2. The Possible Solution
Migrating SW-Based Fault Detection Mechanism into HW
Workload Design
Consistency Check
System calls performed by Pend and Post on semaphores, mailboxes and queues
/*************************************************************
*                     PEND ON SEMAPHORE
*************************************************************/
UBYTE OSSemPend(OS_SEM *psem, UWORD timeout)
{
    UBYTE x, y, bitx, bity;

    OS_ENTER_CRITICAL();
    /* Consistency check */
    /*Code implemented for GNU-GAS*/
    asm ("movea.l #0x0100, %a0\n\t"   /* load the address 0x0100 into a0 */
         "move.b  #4, (%a0)");        /* write the byte 4 to that address */
    /*End*/
    if (psem->OSSemCnt-- > 0) {
        OS_EXIT_CRITICAL();
        return (OS_NO_ERR);
    } else {
        OSTCBCur->OSTCBStat |= OS_STAT_SEM;
        OSTCBCur->OSTCBDly   = timeout;
        y    = OSTCBCur->OSTCBPrio >> 3;
        x    = OSTCBCur->OSTCBPrio & 0x07;
        bity = OSMapTbl[y];
        bitx = OSMapTbl[x];
        if ((OSRdyTbl[y] &= ~bitx) == 0)
            OSRdyGrp &= ~bity;
        psem->OSSemTbl[y] |= bitx;
        psem->OSSemGrp    |= bity;
        OS_EXIT_CRITICAL();
        OSSched();
        OS_ENTER_CRITICAL();
        /* Consistency check (second marker on the slide) */
        if (OSTCBCur->OSTCBStat & OS_STAT_SEM) {
            if ((psem->OSSemTbl[y] &= ~bitx) == 0) {
                psem->OSSemGrp &= ~bity;
            }
            OSTCBCur->OSTCBStat = OS_STAT_RDY;
            OS_EXIT_CRITICAL();
            return (OS_TIMEOUT);
        } else {
            OS_EXIT_CRITICAL();
            return (OS_NO_ERR);
        }
    }
}
3. Experimental Evaluation
• An Intel 8051-based SoC was inspected.
• PANDORA I-IP: VHDL (~1500 lines).
Matteo Sonza Reorda, 2002-05
Fault Tolerant Systems Group
Politecnico di Torino
3. Experimental Evaluation
• Fault detection capabilities evaluated via HW-based
fault injection experiments (FPGA environment).
• Four benchmarks considered:
– Matrix multiplication, Elliptical Filter,
FIR Filter and Viterbi Algorithm.
3. Experimental Evaluation
Detection capabilities:
• Transient faults (30,000 bit-flips)
• Number of wrong answers evaluated (faults that escape detection).

Program    Plain [%]    Pandora [%]    ECCA [%]    CFCSS [%]
           Orig. SW     IP (HW+SW)     SW Sol.     SW Sol.
Matrix       9.78         0.18           0.99        4.88
Ellipf      20.83         0              2.38       14.29
FIR          5.64         0              2.12        4.49
Viterbi     21.06         4.89           6.33       17.48
3. Experimental Evaluation
Memory overhead:
• Additional code required to implement the hybrid technique.

Prog.      Plain [byte]   Pandora [byte]   ECCA [byte]   CFCSS [byte]
           Orig. SW       IP (HW+SW)       SW Sol.       SW Sol.
Matrix       223            385              902           456
Ellipf       303            361              640           347
FIR          194            364              701           320
Viterbi      436            707            1,115           725
3. Experimental Evaluation
Execution time overhead:

Prog.      Plain [cycle]   Pandora [cycle]   ECCA [cycle]   CFCSS [cycle]
           Orig. SW        IP (HW+SW)        SW Sol.        SW Sol.
Matrix       31,211          41,462           102,356         43,791
Ellipf       16,268          17,815            25,635         17,611
FIR          43,434          71,994           153,458         57,357
Viterbi     286,364         328,150           349,111        314,244
3. Experimental Evaluation
Area overhead:
• PANDORA size: 992 gates
• 8051 size: 30,480 gates
• PANDORA introduces about 3.2% of area overhead
Area overhead is expected to decrease when processor size increases.
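The overhead figures can be reproduced with a one-line formula. The helper below is a hypothetical illustration (the name `overhead_percent` is not from the slides); it expresses the added cost as a percentage of the baseline.

```c
/* Overhead in percent: added cost relative to the baseline.
   'extra' and 'base' must use the same unit (gates, bytes, cycles). */
static double overhead_percent(double extra, double base)
{
    return 100.0 * extra / base;
}
```

With the gate counts above, `overhead_percent(992.0, 30480.0)` gives roughly 3.25%, in line with the quoted figure of about 3.2%. Because the I-IP size is fixed, the same 992 gates divided by a larger processor give a smaller percentage, which is the reason the overhead is expected to shrink as processor size grows. The same helper applies to the memory figures: for Viterbi, the Pandora version adds 707 - 436 = 271 bytes, or about 62% over the plain program.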
4. Final Considerations
Development of a hybrid methodology (HW+SW redundancies) able to perform runtime detection of errors in μprocessor-based SoCs may have very good cost-benefit returns.
4. Final Considerations
Returns:
• Minimization of area overhead and fab/development costs (benefits of SW-based redundancy techniques)
• Improvement of performance and minimization of memory overhead (benefits of HW-based redundancy techniques)
In summary:
• Minimize fab cost and performance degradation while improving reliability
Target faults:
• Control flow errors
• Data handling errors
4. Final Considerations
A hybrid methodology (HW+SW
redundancies) explores:
• I-IP Core Architecture
• Software-Based Techniques
4. Final Considerations
• System architecture co-implemented in HW+SW to detect faults in control flow and application data. The main characteristics of this architecture:
  – SW-embedded structures at the application code level.
  – Partial migration of the SW-embedded structures into HW: a specific I-IP monitors the application processor, acting as a “watch-dog”.
  – Communication channel between the HW and SW entities: a driver embedded in the OS kernel and specific signals used for communication between the I-IP and the application processor.