trailblazer - Academia Sinica (transcript)

new frontiers in formal software verification

Gerard J. Holzmann
[email protected]
23 spacecraft and 10 science instruments
Exploring the Solar System
[figure: overview of JPL missions, annotated with "model checking" and "static analysis"; spacecraft: JASON 1, JASON 2, KEPLER, DAWN, WISE, CLOUDSAT, AQUARIUS, MRO, JUNO, VOYAGER 1, VOYAGER 2, CASSINI, EPOXI/DEEP IMPACT, STARDUST, ACRIMSAT, MARS ODYSSEY, GRAIL, SPITZER, GRACE, OPPORTUNITY, MSL, GALEX; instruments: Earth Science (ASTER, MISR, TES, MLS, AIRS), Planetary (MIRO, Diviner, MARSIS), Astrophysics (Herschel, Planck)]
formal software verification
• after ~30 years of development, it is still rarely used on industrial software
  – primary reasons:
    • it is (perceived to be) too difficult
    • it takes too long (months to years)
• even in safety-critical applications, software verification is often restricted to the verification of models of software, instead of the software itself
• goal:
  – make software verification as simple as testing and as fast as compilation

verification of the PathStar switch (flashback to 1999)
• a commercial data/phone switch designed in Bell Labs research (for Lucent Technologies)
• newly written code for the core call processing engine
• the first commercial call processing code that was formally verified
  – 20 versions of the code were verified with model checking during development (1998-2000) with a fully automated procedure
software structure
[diagram: the PathStar code (C); the call processing layer (~30 KLOC, ~10% of the code) sits on top of the control kernel and implements basic call processing plus a long list of features (call waiting, call forwarding, three-way calling, etc., etc.)]
traditional hurdles:
• feature interaction
• feature breakage
• concurrency problems
• race conditions
• deadlock scenarios
• non-compliance with legal requirements, etc.
complex feature precedence relations
detecting undesired feature interaction is a serious problem
the verification was automated in 5 steps
[diagram of the 5-step pipeline: (1) the PathStar call processing code is converted, (2) abstracted via an abstraction map, (3) placed in a context, and (4) combined with the feature requirements; (5) Spin then checks the result and reports property violations as message sequence charts (MSCs) in bug reports]
1. code conversion

    ...
    @dial:
        switch(op) {
        default:                /* unexpected input */
            goto error;
        case Crdtmf:            /* digit collector ready */
            x->drv->progress(x, Tdial);
            time = MSEC(16000); /* set timer */
    @:
        switch(op) {
        default:                /* unexpected input */
            goto error;
        case Crconn:
            goto B@1b;
        case Cronhook:          /* caller hangs up */
            x->drv->disconnect(x);
    @:
        if(op != Crconn && op != Crdis)
            goto Aidle;
    ...etc...

implied state machine: [diagram: state dial moves on Crdtmf to dial1; dial1 moves on Crconn onward and on Cronhook to dial2; unexpected input (else) leads to error]
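the same control structure can be written directly as a small Promela model. the sketch below is illustrative only: the channel name evt, the env driver process, and the terminal states are inventions for this example, not the extracted PathStar model.

    mtype = { Crdtmf, Crconn, Cronhook, Crdis };
    chan evt = [0] of { mtype };

    /* driver: offers arbitrary events to the state machine */
    active proctype env()
    {
    end:
        do
        :: evt!Crdtmf
        :: evt!Crconn
        :: evt!Cronhook
        :: evt!Crdis
        od
    }

    /* transition structure implied by the @dial fragment above */
    active proctype dial_fragment()
    {
        mtype op;
    dial:
        evt?op;
        if
        :: op == Crdtmf -> goto dial1   /* digit collector ready */
        :: else         -> goto error   /* unexpected input */
        fi;
    dial1:
        evt?op;
        if
        :: op == Crconn   -> goto done  /* call proceeds (B@1b) */
        :: op == Cronhook -> goto dial2 /* caller hangs up */
        :: else           -> goto error
        fi;
    dial2:
        evt?op;
        if
        :: op != Crconn && op != Crdis -> goto done  /* Aidle in the C code */
        :: else -> skip                              /* falls through */
        fi;
    error:
    done:
        skip   /* continuations not modeled in this sketch */
    }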
2. defining abstractions
• to verify the code we convert it into an automaton: a labeled transition system
  – the labels (transitions) are the basic statements from C
• each statement is converted via an abstraction, encoded as a lookup table that supports three possible conversions:
  – relevant: keep (~60%)
  – partially relevant: map/abstract (~10%)
  – irrelevant to the requirements: hide (~30%)
• the generated transition system is then checked against the formal requirements (in linear temporal logic) with the model checker
  – the program and the negated requirements are converted into automata, and the model checker computes their intersection
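as a toy illustration of the three dispositions applied to the @dial fragment (the channel name progress and everything else here is hypothetical, not the actual lookup-table syntax):

    mtype = { Tdial };
    chan progress = [1] of { mtype };  /* hypothetical: abstracts the driver interface */

    active proctype abstracted_dial()
    {
        /* map:  x->drv->progress(x, Tdial) becomes a message event */
        progress!Tdial;
        /* hide: time = MSEC(16000) is irrelevant and becomes a no-op */
        skip
        /* keep: relevant control flow (the switch/goto structure)
         *       is carried over unchanged */
    }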
3. defining the context
[diagram: the C code, the abstraction map, and an environment model feed the model extraction step, producing a Spin model; the model is checked against a database of feature requirements, and violations flow back through bug reporting]
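the environment model closes the system so that the extracted model has something to interact with. a minimal sketch, assuming a single subscriber line and invented channel/event names:

    mtype = { offhook, digit, onhook };
    chan line = [0] of { mtype };

    /* environment model: a subscriber that may do anything at any time */
    active proctype subscriber()
    {
        do
        :: line!offhook
        :: line!digit
        :: line!onhook
        od
    }

    /* stub standing in for the extracted call processing model */
    active proctype call_processing()
    {
        mtype e;
        do
        :: line?e
        od
    }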
4. defining feature requirements
a sample property:
  "always when the subscriber goes offhook, a dialtone is generated"
failure to satisfy this requirement:
  eventually (<>) the subscriber goes offhook, and (/\) thereafter (X) no dialtone is generated until (U) the next onhook.
in Linear Temporal Logic this is written:
  <> (offhook /\ X (!dialtone U onhook))
the formula is mechanically converted into an automaton (a Spin never claim).
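for concreteness, here is a hand-written sketch of such an automaton as a Spin never claim, together with a toy scenario that matches the failure. the boolean propositions are stand-ins for conditions on the extracted model; and since the formula uses the X operator it is not stutter-invariant, so pan should be compiled with -DNOREDUCE to disable partial order reduction.

    bool offhook, dialtone, onhook;

    /* toy scenario: the subscriber goes offhook and later onhook,
     * but no dialtone is ever generated */
    active proctype toy()
    {
        offhook = true;
        offhook = false;
        onhook  = true
    }

    /* failure automaton for  <> (offhook && X (!dialtone U onhook));
     * reaching the end of the claim means the failure was matched */
    never {
    T0_init:
        do
        :: offhook -> goto T0_S1    /* the subscriber goes offhook here */
        :: true    -> goto T0_init  /* ... or not yet: keep waiting */
        od;
    T0_S1:
        do
        :: onhook    -> break       /* onhook reached with no dialtone: match */
        :: !dialtone -> goto T0_S1  /* still no dialtone: keep watching */
        od
    }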
5. verification
[diagram: the LTL requirement is logically negated and, together with the C code, the abstraction map, and the environment model, fed to the model extractor; the model checker then produces a bug report, e.g. "no dialtone generated"]
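in current versions of Spin the same check can be phrased positively with an inline ltl block; Spin then performs the negation and conversion of step 4 mechanically. a sketch, reusing the toy propositions above (V is Spin's release operator, the dual of U; -DNOREDUCE again because of X):

    bool offhook, dialtone, onhook;

    /* positive form of the requirement: whenever the subscriber goes
     * offhook, no onhook may follow before a dialtone is generated */
    ltl dialtone_req { [] (offhook -> X (dialtone V !onhook)) }

    active proctype toy()
    {
        offhook = true;   /* goes offhook...            */
        offhook = false;  /* ...but no dialtone follows */
        onhook  = true    /* dialtone_req is violated   */
    }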
hardware support (1999)
[figure: the 1999 verification network, driven by client/server sockets code and scripts]
iterative search refinement
[figure: search snapshots at t=5 min., t=15 min., and t=40 min.]
each verification task is run multiple times, with increasing accuracy
performance (1999): “bugs per minute”
[chart: percent of bugs reported (cumulative count in parentheses) vs. minutes since start of the check; y-axis marks 25% (15 bugs), 50% (30), 75% (50), 100% (60); x-axis ticks at 10, 20, 30, 40 minutes]
that was 1999, can we do better now?
1999:
• 16 networked computers running the Plan 9 operating system
  – 500 MHz clock speed (~8 GHz equiv.)
  – 16 x 128 MB of RAM (~2 GB equiv.)
2012:
• 32-core off-the-shelf system running standard Ubuntu Linux (~$4K USD)
  – 2.5 GHz clock speed (~80 GHz equiv.)
  – 64 GB of shared RAM
difference: approx. 10x faster, and 32x more RAM
does this change the usefulness of the approach?
performance in 2012: “bugs per second”
[chart: number of bugs found vs. seconds since start of the check (ticks at 1 second, 10 seconds, 10 min); the 2012 run (32-core PC, 64 GB RAM, 2.5 GHz per core, Ubuntu Linux) yields 11 bug reports after 1 second, while the 1999 run (16 PCs, 128 MB per PC, 500 MHz, Plan 9 OS) needed on the order of 10 minutes]
side-by-side comparison
1999 (16-CPU networked system):
• 25% of all bugs reported in 180 seconds (15 bugs)
• 50% of all bugs reported in 480 seconds (30 bugs)
2012 (1 desktop PC):
• 15% of all bugs reported in 1 second (11 bugs)
• 50% of all bugs reported in 7 seconds (38 bugs)
generalization: swarm verification
• goal: leverage the availability of large numbers of CPUs and/or CPU cores
  – if an application is too large to verify exhaustively, we can define a swarm of verifiers that each tries to check a randomly different part of the code
    • using different hash polynomials in bitstate hashing, and different numbers of polynomials
    • using different search algorithms and/or search orders
  – use iterative search refinement to dramatically speed up error reporting (“bugs per second”)
swarm search
spin front-end
http://spinroot.com/swarm/

    $ swarm -F config.lib -c6 > script
    swarm: 456 runs, avg time per cpu 3599.2 sec
    $ sh ./script

the user specifies:
1. # cpus to use
2. mem / cpu
3. maximum time

swarm configuration file:

    # range
    k       1  4        # min and max nr of hash functions

    # limits
    depth   10000       # max search depth
    cpus    128         # nr available cpus
    memory  64MB        # max memory to be used; recognizes MB,GB
    time    1h          # max time to be used; h=hr, m=min, s=sec
    vector  512         # bytes per state, used for estimates
    speed   250000      # states per second processed
    file    model.pml   # the spin model

    # compilation options (each line defines a search mode)
    -DBITSTATE                      # standard dfs
    -DBITSTATE -DREVERSE            # reversed process ordering
    -DBITSTATE -DT_REVERSE          # reversed transition ordering
    -DBITSTATE -DP_RAND             # randomized process ordering
    -DBITSTATE -DT_RAND             # randomized transition ordering
    -DBITSTATE -DP_RAND -DT_RAND    # both

    # runtime options
    -c1 -x -n
many small jobs do the work of one large job – but much faster and in less memory
[chart: states reached vs. number of processes (log scale) for the DEOS O/S model; the swarm search scales linearly and reaches 100% coverage with a swarm of 100 verifiers using 0.06% of the RAM each (8 MB), compared to a single exhaustive run (13 GB)]
thank you!

tools mentioned:
• Spin:  http://spinroot.com (1989)
  o parallel depth-first search added 2007
  o parallel breadth-first search added 2012
• Modex: http://spinroot.com/modex (1999)
• Swarm: http://spinroot.com/swarm (2008)