Vienna University of Technology
Safety-Critical Computer Systems - Open Questions and Approaches
Institut für Computertechnik (ICT)
Institute of Computer Technology
Andreas Gerstinger
Institute of Computer Technology
February 16, 2007
Agenda
Safety-Critical Systems
Project Partners
Three research topics
Safety Engineering
Diversity
Software Metrics
Conclusion and Outlook
Safety-Critical Systems
Safety-Critical Systems
A safety-critical computer system is a computer system whose failure may cause injury or death to human beings, or harm to the environment.
Examples:
Aircraft control system (fly-by-wire,...)
Nuclear power station control system
Control systems in cars (anti-lock brakes,...)
Health systems (heart pacemakers,...)
Railway control systems
Communication systems
Wireless Sensor Networks Applications?
SYSARI Project
SYSARI = SYstem SAfety Research in Industry
Goal of the project: to conduct and promote research in system safety engineering and in safety-critical system design and development
Close cooperation between ICT and Industry
One "shared" Employee (me)
Students conducting practical Diploma Theses
PhD Theses
What is Safety?
“The avoidance of death, injury or poor health to customers, employees, contractors and the general public; also avoidance of damage to property and the environment”
Safety is also defined as "freedom from unacceptable risk of harm"
A basic concept in System Safety Engineering is the avoidance of "hazards"
Safety is NOT an absolute quantity!
Safety vs. Security
These two concepts are often mixed up. In German, there is just one term ("Sicherheit") for both!
Security = protection of the system against attacks
Safety = the system doesn't cause harm
SILs and Dangerous Failure Probability
Safety Integrity Level: probability of a dangerous failure per hour (high demand mode of operation)
SIL 4: 10^-9 ≤ P < 10^-8
SIL 3: 10^-8 ≤ P < 10^-7
SIL 2: 10^-7 ≤ P < 10^-6
SIL 1: 10^-6 ≤ P < 10^-5
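These bands translate directly into a lookup. A minimal Python sketch (the function name `sil_for_pfh` is illustrative, not from the slides):

```python
# Map a probability of dangerous failure per hour (high demand mode of
# operation, per the bands above) to its Safety Integrity Level.
def sil_for_pfh(p):
    """Return the SIL (1-4) for a dangerous-failure probability per hour,
    or None if p lies outside the SIL bands."""
    bands = [
        (1e-9, 1e-8, 4),   # SIL 4: 10^-9 <= P < 10^-8
        (1e-8, 1e-7, 3),   # SIL 3: 10^-8 <= P < 10^-7
        (1e-7, 1e-6, 2),   # SIL 2: 10^-7 <= P < 10^-6
        (1e-6, 1e-5, 1),   # SIL 1: 10^-6 <= P < 10^-5
    ]
    for low, high, sil in bands:
        if low <= p < high:
            return sil
    return None

print(sil_for_pfh(5e-8))  # lies in the 10^-8 <= P < 10^-7 band -> prints 3
```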
Project Partners
Project Partner:
Austrian high-tech company
World leader in air traffic control communication systems
700 employees, company based in Vienna, customers all over the world
http://www.frequentis.com
Frequentis Voice Communication System
Enables communication between aircraft and controller
Communication link must never fail!
Requirements:
Safety
High Availability and Reliability
Fault Tolerance
Other domains:
railway
ambulance, police, fire brigade,...
maritime
Safety Integrity Level 2
Project Partner:
French company
68000 employees worldwide
Mission-critical information systems
25000 researchers
Nobel Prize in Physics 2007 awarded to Albert Fert, scientific director of the Thales research lab
http://www.thalesgroup.com
Railway Signalling Systems
Signalling and Switching
Axle Counters
Applications for ETCS
An incorrect output may lead to an incorrect signal, causing a major accident!
Safety Integrity Level 4 (highest)
(Old) Interlocking Systems
Mechanical / electromechanical systems
Signal Box / Interlocking Tower
Electric system with some electronics
Modern Signal Box / Interlocking Tower
Lots of electronics and computer systems
Safety Engineering
What is a Hazard?
Hazard:
a physical condition of the platform that threatens the safety of personnel or the platform, i.e. one that can lead to an accident
a condition of the platform that, unless mitigated, can develop into an accident through a sequence of normal events and actions
"an accident waiting to happen"
Examples
oil spilled on staircase
failed train detection system at an automatic railway level crossing
loss of thrust control on a jet engine
loss of communication
distorted communication
undetectably incorrect output
Hazard Severity Level (Example)
Category (Id): Definition
CATASTROPHIC (I): A hazard which may cause death, system loss, or severe property or environmental damage.
CRITICAL (II): A hazard which may cause severe injury, or major system, property or environmental damage.
MARGINAL (III): A hazard which may cause marginal injury, or marginal system, property or environmental damage.
NEGLIGIBLE (IV): A hazard which does not cause injury, or system, property or environmental damage.
Hazard Probability Level (Example)
Level (probability per hour; occurrences per year): Definition
Frequent (P ≥ 10^-3; more than 10): may occur several times a month
Probable (10^-3 > P ≥ 10^-4; 1 to 10): likely to occur once a year
Occasional (10^-4 > P ≥ 10^-5; 10^-1 to 1): likely to occur in the life of the system
Remote (10^-5 > P ≥ 10^-6; 10^-2 to 10^-1): unlikely but possible to occur in the life of the system
Improbable (10^-6 > P ≥ 10^-7; 10^-3 to 10^-2): very unlikely to occur
Incredible (P < 10^-7; less than 10^-3): extremely unlikely, if not inconceivable, to occur
Risk Classification Scheme (Example)
Hazard Probability x Hazard Severity (CATASTROPHIC / CRITICAL / MARGINAL / NEGLIGIBLE):
Frequent:    A  A  A  B
Probable:    A  A  B  C
Occasional:  A  B  C  C
Remote:      B  C  C  D
Improbable:  C  C  D  D
Incredible:  C  D  D  D
Risk Class Definition (Example)
Risk Class A: Intolerable
Risk Class B: Undesirable; shall only be accepted when risk reduction is impracticable
Risk Class C: Tolerable with the endorsement of the authority
Risk Class D: Tolerable with the endorsement of the normal project reviews
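The risk matrix and the class definitions combine into a simple two-step lookup. A minimal Python sketch (names are illustrative; the matrix is transcribed from the example scheme above):

```python
# Risk classification lookup for the example scheme:
# rows = hazard probability levels, columns = hazard severity categories.
SEVERITIES = ("CATASTROPHIC", "CRITICAL", "MARGINAL", "NEGLIGIBLE")

RISK_MATRIX = {
    "Frequent":   ("A", "A", "A", "B"),
    "Probable":   ("A", "A", "B", "C"),
    "Occasional": ("A", "B", "C", "C"),
    "Remote":     ("B", "C", "C", "D"),
    "Improbable": ("C", "C", "D", "D"),
    "Incredible": ("C", "D", "D", "D"),
}

INTERPRETATION = {
    "A": "Intolerable",
    "B": "Undesirable; accepted only when risk reduction is impracticable",
    "C": "Tolerable with the endorsement of the authority",
    "D": "Tolerable with the endorsement of normal project reviews",
}

def risk_class(probability, severity):
    """Look up the risk class for a probability level and severity category."""
    return RISK_MATRIX[probability][SEVERITIES.index(severity)]

print(risk_class("Occasional", "CATASTROPHIC"))  # prints A (Intolerable)
```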
Risk Acceptability
Having identified the level of risk for the product, we must determine how acceptable and tolerable that risk is:
Regulator / Customer
Society
Operators
Decision criteria for risk acceptance / rejection
Absolute vs. relative risk (compare with previous, background)
Risk-cost trade-offs
Risk-benefit of technological options
Risk Tolerability
A hazard's severity and probability determine its risk. The risk is checked against the risk criteria: if it is tolerable, it is accepted; if not, risk reduction measures are applied and the risk is reassessed.
Diversity
Diversity
Goal: Fault Tolerance/Detection
Diversity is "a means of achieving all or part of the specified requirements in more than one independent and dissimilar manner."
Can tolerate/detect a wide range of faults
"The most certain and effectual check upon errors which arise in the process of computation, is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods." (Dionysius Lardner, 1834)
Layers of Diversity
Layer of abstraction (artefact): diversity example
Concept of Operation (e.g. specifications): two different paradigms, such as rule-based and functional
Design (e.g. design descriptions): n-version design
Implementation (e.g. source code): n-version coding
Realisation (e.g. object code): diverse compilers
HW (CPU, memory, ...): diverse CPUs
Examples for Diversity
Specification Diversity
Design Diversity
Data Diversity
Time Diversity
Hardware Diversity
Compiler Diversity
Automated Systematic Diversity
Testing Diversity
Diverse Safety Arguments
…
Some faults to be targeted:
programming bugs,
specification faults, compiler
faults, CPU faults, random
hardware faults (e.g. bit flips),
security attacks,...
Compiler Diversity
Use of two diverse compilers to compile one common source code.

Common source code (example):
...
Module A
{
    int i;
    int end;
    get(end);
    for i = 1 to end
        result = func(i, result);
        POS[i] = result;
    next
}
...

Compiler A output:        Compiler B output:
...                       ...
move $4, A                add ($66533), A
jmp $54256                ret
add ($5436), B            move $4, C
...                       ...

Diverse compilers (different manufacturer, different version, different compiler options) yield diverse object code (?)
Compiler Diversity: Issues
Targeted Faults:
Systematic compiler faults
Some Heisenbugs
Some systematic and permanent hardware faults (if
executed on one board)
Issues:
To some degree possible with one compiler and
different compile options (optimization on/off,…)
If compilers from different manufacturers are used, their independence must be ensured
Systematic Automatic Diversity
Artificial introduction of diversity to tolerate HW faults
(Automatic) transformation of program P to a semantically equivalent program P' which uses the HW differently
e.g. different memory areas, different registers, different comparisons,...
if A = B then  →  if A - B = 0 then
A or B  →  not (not A and not B)
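As a toy illustration (helper names are ours, and this is not an actual diversifying transformer), the two rewrites above can be expressed and self-checked in Python:

```python
# Two semantically equivalent "diversified" variants of the same predicates,
# mirroring the rewrites above: A = B  ->  A - B = 0, and
# A or B  ->  not (not A and not B)  (De Morgan's law).

def equal_primary(a, b):
    return a == b

def equal_diverse(a, b):
    # Uses subtraction instead of direct comparison, so a systematic fault
    # in the comparison hardware/logic is exercised differently.
    return (a - b) == 0

def or_primary(p, q):
    return p or q

def or_diverse(p, q):
    # De Morgan's law: p or q  ==  not (not p and not q)
    return not (not p and not q)

# Self-check: both variants must agree on every input.
for a in range(-3, 4):
    for b in range(-3, 4):
        assert equal_primary(a, b) == equal_diverse(a, b)
for p in (False, True):
    for q in (False, True):
        assert or_primary(p, q) == bool(or_diverse(p, q))
```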
Systematic Automatic Diversity
What can be "diversified":
memory usage
execution sequence
statement structures
array references
data coding
register usage
addressing modes
pointers
mathematical and logic rules
Systematic Automatic Diversity: Issues
Targeted Faults:
Systematic hardware faults
Permanent random hardware faults
Issues:
Can be performed on source code or assembler level
If performed on source code level, it must be
ensured that compiler does not "cancel out" diversity
(Software) Fault injection experiments showed an
improvement of a factor ~100 regarding HW faults
Example: Diverse Calculation of Position
Position P can be calculated from both speedometer and accelerometer readings: one channel determines the position from the speedometer, the other from the accelerometer, yielding PA and PB.

Voter A: if PA = PB then send PA, else RaiseException → PositionA
Voter B: if PA - PB = 0 then send PB, else RaiseException → PositionB

The voter can also be implemented diversely; PositionA and PositionB could be transmitted in different formats.
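The two diverse voters can be sketched directly in Python (function names and the exception type are illustrative):

```python
# Minimal sketch of the two diversely implemented voters.
# Voter A compares the positions directly; Voter B compares their
# difference to zero, exercising the comparison differently.

class VoterMismatch(Exception):
    """Raised when the two position channels disagree."""

def voter_a(pa, pb):
    if pa == pb:
        return pa                      # send PositionA
    raise VoterMismatch(f"channels disagree: {pa} != {pb}")

def voter_b(pa, pb):
    if pa - pb == 0:
        return pb                      # send PositionB
    raise VoterMismatch(f"channels disagree: {pa} != {pb}")
```

In a real system the comparison would use a tolerance band rather than exact equality, since two diverse computations of a physical quantity rarely agree bit-for-bit.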
Open Issues
How can diversity be used most efficiently?
Can diversity be introduced automatically?
Which faults are detected/tolerated to which
extent?
How can the quality of the diversity be measured?
Can diversity be also used to detect security
intrusions?
Software Metrics
Software Metrics for Safety-Critical Systems
Problems:
Which metrics should safety-critical software fulfill?
Which coding rules are good and useful?
What are the desired ranges for metrics?
Which metrics influence maintainability?
if P then
    if Q then
        S1
    else
        S2
    if R then
        S3
    else
        S4
else
    S5

Sx: (block) statements
P, Q, R: (boolean) predicates
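For structured code like the snippet above, McCabe's cyclomatic complexity is the number of decision predicates plus one (here: P, Q, R, so CC = 4). A crude illustrative counter, assuming at most one decision keyword per line:

```python
# Crude cyclomatic complexity estimate for structured pseudocode:
# count lines starting with a branching keyword, then add 1.
def cyclomatic_complexity(source):
    decision_keywords = ("if ", "elif ", "for ", "while ", "case ")
    count = sum(line.strip().startswith(k)
                for line in source.splitlines()
                for k in decision_keywords)
    return count + 1

snippet = """
if P then
  if Q then
    S1
  else
    S2
  if R then
    S3
  else
    S4
else
  S5
"""
print(cyclomatic_complexity(snippet))  # 3 predicates (P, Q, R) -> prints 4
```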
Some RAW Metrics...
Metric              P1 (C#)  P2 (C#)  P3 (Java)  P4 (Java)  P5 (Java)  P6 (C++)  Firefox (C/C++)
Functions           1321     11383    1344       2997       1383       3863      102630
Classes             101      2170     119        413        225        455       8979
LOCs                34731    287279   21098      48650      23567      95289     2640688
eLOCs               25077    204737   16775      40182      19624      74774     2187030
LOC/Function        26.29    25.24    15.70      16.23      17.04      24.67     25.73
LOC/Class           343.87   132.39   177.29     117.80     104.74     209.43    294.10
eLOC/Function       18.98    17.99    12.48      13.41      14.19      19.36     21.31
eLOC/Class          248.29   94.35    140.97     97.29      87.22      164.34    243.57
Max CC              135      213      58         281        43         222       751
Avg CC              3.36     2.62     2.83       3.23       2.67       2.87      4.28
CC >10              51       323      60         162        51         154       8802
CC >50              4        13       2          4          0          9         478
CC >10 [%]          3.86     2.84     4.46       5.41       3.69       3.99      8.58
CC >50 [%]          0.30     0.11     0.15       0.13       0.00       0.23      0.47
Notices/KLOC        50.24    57.33    118.12     143.70     100.90     68.10     112.84
SevereNotices/KLOC  4.26     6.02     18.06      22.32      14.68      15.26     34.48
Outline of Method
1. Create a questionnaire with relevant questions regarding software quality and get answers from expert developers for various software packages they work with
2. Automatically measure potentially interesting metrics of the software packages
3. Correlate the questionnaire responses with the measured metrics to find out which metric correlates with which property
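Step 3 amounts to computing a correlation coefficient between the survey scores and each measured metric. A minimal sketch using Pearson's r, with made-up numbers (not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative only: perceived quality scores (1 = best, 5 = worst) for six
# packages vs. a measured metric such as average cyclomatic complexity.
quality_scores = [1.5, 2.0, 2.5, 3.0, 4.0, 4.5]
avg_cc         = [2.6, 2.8, 2.7, 3.2, 3.4, 4.3]
print(round(pearson_r(quality_scores, avg_cc), 2))  # prints 0.92
```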
Graph 3: Code Clarity vs. Return Points
[Scatter plot: average number of return points (1.0 to 1.8) against code clarity rating (1 = best, 5 = worst).]
Graph 4: Internal Quality vs. CC
[Scatter plot: average cyclomatic complexity (0 to 6) against general internal quality rating (1 = best, 5 = worst).]
Summary of Results
Strongest correlation with perceived internal quality:
Comment density
Control Flow Anomalies
No correlation with perceived internal quality:
Cyclomatic Complexity
Average Method Size
Average File Size
...
Conclusion and Outlook
Further Related Topics
Agile Methods in Safety Critical Development
Hazard Analysis Methods
Safety Standards
Safety of Operating Systems
COTS Components for Safety-Critical Systems
Safety Aspects of Modern Programming Languages (Java, C#.NET)
Fault Detection, Correction and Tolerance
Safety and Security Harmonisation
Linux in Safety-Critical Environments
Online Tests to detect hardware faults
Conclusion
Many open issues in this field...
All research activities in the SYSARI project are practically motivated
The number of safety-critical systems is increasing
International standards play a vital role (e.g. IEC 61508)
Contact:
Andreas Gerstinger: [email protected]