Formal Methods and Software Verification

Download Report

Transcript Formal Methods and Software Verification

Software Correctness Proofs
CIS 376
Bruce R. Maxim
UM-Dearborn
1
Formal Analysis
• Refers to tool-based methods used to explore,
debug, and verify formal specifications
• Methods
– Theorem proving
– Proof checking
– Model checking
– Animation and simulation
2
Formal Proof - part 1
• Use deductive reasoning
• Proofs are based on a formal system that includes
– set of primitives
• finite strings from a fixed alphabet
– set of axioms
• specifying the rules of behavior for the primitives
– set of inference rules
• allow deduction of additional true statements (known a
theorems) within the system
3
Formal Proof - part 2
• Deductive system
– axioms and inference rules for a formal system
• Theory
– axioms and derived theorems in a formal system
• Proof of theorem
– sequence of statement transformations that adheres to
the system’s inference rules
• s1, s2, s3, … , sn |- T
– theorem T is provable following the sequence si
4
Formal System Properties
• Consistent
– not possible to derive a statement and its contradiction
form the same set of initial statements
• Complete
– every true statement is provable
• Decidable
– there is an algorithm for determining whether any legal
statement is true
• Note: consistency must be present, completeness
and decidability would be nice
5
Proof Construction
• Forward argument (deductive calculus)
– starting with axioms and proven results the inference
rules are used to prove the desired consequent
• Backward argument (test calculus)
– starting with the desired result and applying the
inference rules to derive a known result, axiom, or
theorem
6
Program Verification
• Similar to writing a mathematical proof
• You must present a valid argument that is
believable to the reader
• The argument must demonstrate using
evidence that the algorithm is correct
• Algorithm is correct if code correctly
transforms initial state to final state
7
State of Computation
• Most programming algorithms are based on
the notion of transforming the inputs to
outputs
• The state of computation may be defined by
examining the contents of key variables
before and after the execution of each
statement
8
Assertions
• Assertions are facts about the state of the program
variables
• It is wasteful to spend your time looking at
variables that are not effected by a particular
statement
• Default assertion
– any variable not mentioned in the assertion for a
statement do not affect the state of computation
9
Use of Assertions
• Pre-condition
– assertion describing the state of computation
before statement is executed
• Post condition
– assertion describing the state of computation
after a statement is executed
• Careful use of assertions as program
comments can help control side effects
10
Code Verification Example
• Let’s assume we have an array with a search
operation that corresponds to the English
specifications below
– The procedure searches and array A of length N for a
value X
– If X is found than the value of Y is set to the array
index where X was found on exit
– If no array element in A matches X then Y will be set to
zero on exit
11
Example - Formal Specifications
• Pre-condition
–N>0
• Post condition
– {X = A[Y] and (1 <= Y <= N)} or
– {(Y = 0) and (k : (1 <= k <= N)  A[k] <> X)}
• Proof would need to show that algorithm
transformed state of computation from precondition to the post condition
12
13
14
Simple Algorithm
• Model {P} A {Q}
– P = pre-condition
– A = Algorithm
– Q = post condition
• Sum algorithm
{pre: x = x0 and y = y0}
z=x+y
{post: z = x0 + y0}
15
Sequence Algorithm
• Model
if
{P} A1 {Q1} and {Q1} A2 {Q}
then {P} A1 ; A2 {Q} is true
• Swap algorithm
{pre: x = x0 and y = y0}
temp = x
x=y
y = temp
{post: temp = x0 and x = y0 and y = x0}
16
Intermediate Assertions
• Swap algorithm
{pre: x = x0 and y = y0}
temp = x
{temp = x0 and x = x0 and y = y0}
x=y
{temp = x0 and x = y0 and y = y0}
y = temp
{post: temp = x0 and x = y0 and y = x0}
17
Conditional Statements
• Absolute value
{pre: x = x0}
if x < 0 then
y=-x
else
y=x
{post: y = | x0 |}
18
Intermediate Assertions
if x < 0 then
{x = x0 and x0< 0}
y=-x
{y = | x0 |}
else
{x = x0 and x0>= 0}
y=x
{y = | x0 |}
19
Understanding the While Loop
• Every well designed while loop must
– make progress to ensure eventual termination
– must maintain the loop invariant to ensure that it is valid
at each loop exit point
//invariant holds here
while (condition) do
//invariant holds here
make_progress
restore_invariant
//invariant and not(condition) hold here
20
Loop Invariant
• Type of assertion that describes the variables
which remain unchanged during the execution of a
loop
• In general the stopping condition should remain
unchanged during the execution of the loop
• Some people show the loop invariant as a
statement which becomes false when loop
execution is complete
21
Loop Invariant Example
k = 0;
//invariant: A[1], … , A[k] equal 0
//assert: N >= 0
while k < N do
k = k +1;
//make progress
A[k] = 0;
//restore invariant
//assert: A[1], … , A[k] equal 0 and k >= N
Note: say k=N will require an induction argument
22
While Loop Example
• Algorithm
while y <> 0 do
z=z+x
y=y–1
ans = z
• What does it do?
23
While Loop Assertions
{pre: x = x0 and z = z0 and y = y0 and y0 >= 0}
while y <> 0 do
begin
{0 < n <= y0 and z’ = z0 + n * x0}
z=z+x
y=y–1
end
{y = 0 and z’ = z0 + y0 * x0}
ans = z
{post: ans = z0 + y0 * x0}
24
Proof -1
• If y = 0 loop does not execute and no
variables change so
z = z + 0 * x = ans
• If we assume that for n = k if program
begins loop with y = k it will exit with
ans = z + k * x
25
Proof - 2
• We must prove that when program begins
loop with y = k + 1 it will exit loop with
ans = z + (k + 1) * x
• Suppose y = k + 1 at top of loop and the
body of the loop executes one time
x=x‘=x
y = y’ = (k + 1) – 1 = k
z = z’ = z + x
26
Proof - 3
• Since we are at the top of the loop with y = k, we
can use our induction hypothesis to get
ans = z’ + k * x’
• Substituting we get
ans = (z + x) + k * x
= z + (x + k * x)
= z + (1 + k) * x
= z + (k + 1) * x
27
Second While Loop Example
Prod = 0
I = 0
{X and N are initialized and Prod = I * X and I <= N}
while (I < N) do
begin
{Prod = I * X and I < N}
Prod = Prod + X
{Prod = (I + 1) * X and I < N}
I = I + 1
{Prod = I * X and I <= N}
end
{loop exited with Prod = I * X and I >= N}
{Prod = N * X}
28
Proof - 1
• Check I = 0
Verify that invariant is true
Prod = 0
Prod = 0 * X = 0
If N = 0 condition is false and loop terminates
If N > 0 condition is true and loop execution
continues
29
Proof - 2
• Assume correctness of the invariant at the top
of the loop when I = K
Prod = K * X
If K < N condition is true and the loop execution
continues
If K = N condition is false and loop terminates
So post condition is true
• So we are assuming that
X0 + X0 + X0 + … + X0 + X0 = K * X0 = Prod
k times
30
Proof - 3
• If we are at the top of the loop and execution
continues the body adds X0 to both sides
• Using our induction hypothesis we get
(X0 + X0 + X0 + … + X0 + X0) + X0 = K * X0 + X0
k times
• Which gives us
X0 + X0 + X0 + … + X0 + X0 + X0 = K * X0 + X0
k + 1 times
• Which can be written as
X0 + X0 + X0 + … + X0 + X0 + X0 = (K + 1) *X0 = Prod
31
Proof - 4
• Since assuming the invariant is true for K
we showed that it held for K + 1
• If the loop condition is true execution
continues
• otherwise is halts with
I=N+1
Prod = X0 * N
32
Counting Loop Example
• This loop stores the sum of the first I array
elements in position C[I]
{pre: max >= M >= 1 and C initialized}
for I = 1 to M
C[I] = C[I] + C[I – 1]
{post: I = M + 1 and
for each J = 1 to I – 1 : C’[J] = C[0] + … + C[J]}
33
Proof - 1
• Show for the case M = 1 that the program
will exit with
C[1] = C[1] + C[0] and C’[0] = C[0]
• If M = 1 loop executes with I = 1 and
C’[1] = C[0] + C[1]
• Loop is satisfied and result is
I = 2 and C[1] = C[1] + C[0] and C[0] = C[0]
• This gives us our intended result
34
Proof - 2
• For our induction hypothesis we assume
if loop reaches top with M = K the loop exits
C’[K] = C[0] + … + C[K]
…
C’[1] = C[0] + C[1]
C’[0] = C[0]
35
Proof - 3
• We need to show that if loop reaches the top
with M = K + 1 the loop ends with
C’[K + 1] = C[0] + … + C[K + 1]
C’[K] = C[0] + … + C[K]
…
C’[1] = C[0] + C[1]
C’[0] = C[0]
36
Proof - 4
• C’[K + 1] = C[K + 1] + C[(K + 1) - 1]
• C’[K + 1] = C[K + 1] + C[K]
• By our induction hypothesis C[K] now
contains C’[K] so
C’[K + 1] = C[K + 1] + C’[K]
C’[K + 1] = C[K + 1] + C[K] + … + C[0]
37
Proof - 5
• Loop condition is satisfied and
I = (K + 1) + 1 = K + 2
• The array should now contain
C’[K + 1] = C[0] + … + C[K + 1]
C’[K] = C[0] + … + C[K]
…
C’[1] = C[0] + C[1]
C’[0] = C[0]
38
These proof examples were posted on
the World Wide Web by
Ken Abernathy
Furman University
39
Proof Technique Sequent Calculus
• One type of deductive calculus
• A sequent is written  |- , which means /\ 
implies \/ , where  is a (possibly empty) list of
formulas {A1, …, An} and  is a (possibly empty)
list of formulas {B1, …, Bn}
– the formulas in  are called the antecedents
– the formulas in  are called the consequents
• To restate,  |-  means
A1 /\ … /\ An implies B1 \/ … \/ Bn
40
Sequent Calculus
• A sequent calculus proof is a tree of sequents whose root
is a sequent of the form |- T where T is the formula to be
proved and the antecedent is empty
• The proof tree is then generated by applying inference
rules of the form:
1 |- 1 … n |- n
 |- 
Rule N
• Intuitively, this rule replaces a leaf node in the proof tree
of form  |-  with the n new leaves specified in the rule.
If n is zero, that branch of the proof tree terminates.
41
Sequent Calculus - Example Rule 1
• The Propositional Axiom (Prop_Axiom) is one of the rules of
inference in sequent calculus. It has the following form form:
, A |- (A, 
Prop_Axiom
• Intuitively, this rule indicates that a proof branch is complete
when the sequent above is derived. Note that the consequent
means the following:
 /\ A implies A \/ 
which is obviously true.
42
Sequent Calculus - Example Rule 2
• The Rule for Conjunction on the Right (And_Right) is
another of the rules of inference in the sequent calculus. It
has the following form:
 |- A,  |- B, 
 |- (A /\ B, 
And_Right
• This rule is typical of many sequent calculus inference
rules which divide, but simplify, a branch of the proof tree.
Note that the consequent is replaced by two simpler
formulas which will be easier for a mechanized theorem
prover to deal with.
43
Sequent Calculus - Example Rule 3
• The Rule for Conjunction on the Left (And_Left) is another
of the rules of inference in the sequent calculus. It has the
following simple (non-branching) form:
A, B,  |- 
(A /\ B, |- 
And_Left
• This rule is typical of several sequent calculus inference
rules which simply restate the “obvious,” thereby providing
a form easier for a mechanized theorem prover to deal with.
44
Sequent Calculus - Example Rule 4
• The Rule for Implication on the Left (Implies_Left)
is another of the rules of inference in the sequent
calculus. It has the following form:
 |- A, B,  |- 
(A => B, |- 
Implies_Left
• Similar to the And_Right rule, this rule again splits
the proof into two cases, each of which will be
easier for the mechanical prover to deal with.
45
Sequent Calculus - Example Rule 5
• The Rule for Implication on the Right (Implies_Right)
is another of the rules of inference in the sequent
calculus. It has the following form:
A |- B, 
|- (A => B, 
Implies_Right
• This rule does not branch, but provides a form easier
for a mechanized theorem prover to deal with.
46
Sequent Calculus Proof Example
Theorem 1: (P => (Q => R)) =>
((P /\ Q) => R)
We begin the proof by forming the requisite sequent:
Antecedents:
none
Consequents:
Formula 1: (P => (Q => R)) =>
((P /\ Q) => R)
47
Proof Example - Step 1
As our first step we apply the rule Implies_Right. This rule
will decompose the entire formula. Remember there is an
implied “implies” in the sequent. In other words this sequent
could be written
|- (P => (Q => R)) => ((P /\ Q) => R).
Antecedents:
Formula 1: P => (Q => R)
Consequents:
Formula 1: (P /\ Q) => R
48
Proof Example Step 2
A second application of the rule Implies_Right will decompose
the formula below the line in a similar way. Remember that
rules applying to the “left” part of the sequent work on formulas
above the bar; rules applying to the “right” part of the sequent
work below the bar.
Antecedents:
Formula 1: P => (Q => R)
Formula 2: P /\ Q
Consequents:
Formula 1: R
49
Proof Example Step 3
We next apply the rule And_Left -- this rule will modify
(rewrite) Formula 2 above the line. Remember that all
formulas above the line are connected by AND’s; formulas
below the line are connected by OR’s.
Antecedents:
Formula 1: P => (Q => R)
Formula 2: P
Formula 3: Q
Consequents:
Formula 1: R
50
Proof Example Step 4
We next apply the rule Implies_Left -- this rule will modify
Formula 1 above the line. Remember that Implies_Left splits
the proof tree into two branches. We show and deal with
Case 1 first, then move to Case 2 later.
Case 1:
Antecedents:
Formula 1: Q => R
Formula 2: P
Formula 3: Q
Consequents:
Formula 1: R
51
Proof Example Step 5
To modify Formula 1 above the line, we next apply the rule
Implies_Left again. Again this splits the proof tree into two
branches. We show and deal with Case 1.1 first, then move to
Case 1.2 later.
Case 1.1:
Antecedents:
Formula 1: R
Formula 2: P
Formula 3: Q
Case 1.1 will now
yield to an
application of
Prop_Axiom
Consequents:
Formula 1: R
52
Proof Example Steps 5 & 6
As noted, an application of Prop_Axiom (Step 5) completes
Case 1.1. We now move to Case 1.2. This is the second case
resulting from the application of Implies_Left on the Case 1
sequent. Another application of Prop_Axiom (Step 6) completes
Case 1.2 (and in turn Case 1 itself).
Case 1.2:
Antecedents:
Formula 1: P
Formula 2: Q
Consequents:
Formula 1: Q
Formula 2: R
Case 1.2 will also
yield to an
application of
Prop_Axiom
53
Proof Example Step 7
Having completed the proof for Case 1, we now move to
Case 2. Recall that this is the second case resulting from
our first application of Implies_Left. Another application of
Prop_Axiom (Step 7) completes Case 2 (and in turn the
entire proof).
Case 2:
Antecedents:
Formula 1: P
Formula 2: Q
Consequents:
Formula 1: P
Formula 2: R
Case 2 will also
yield to an
application of
Prop_Axiom
54
Mechanical Theorem Provers
• Many mechanical theorem provers require human
interaction
• Users typically are required to choose the rule of
inference to be applied at each step
• The theorem prover may be able to discover (by
heuristic search) some rules to apply on its own
• The user needs to translate the statements to some
normal form prior to beginning
55
Proving Algorithm Correctness
• Suppose you have a software component that
accepts as input an array T of size N
• As output the component produces an T` which
contains the elements of T arranged in ascending
order
• How would we convert the code to its logical
counterpart and prove its correctness?
56
Bubble Sort Algorithm
lab1:
lab2:
T` = T
more = true
i=0
if (more ~= true) then go to end
not(more) = true
//** assertion needed
i=i+1
if i >= N then go to lab1
if T`(i) < T`(i+1) then go to lab2
//* assertion needed
exchange T`(i) with T`(i+1)
more = true
go to lab2
57
Step 1
• Write assertions to describe the components input
and output conditions
• input assertion
A1: (T is an array) & (T is of size N)
• output assertion
Aend: (T` is an array) &
( i if i < N then (T`(i) <= T`(i+1)) &
( i if i < N then j(T`(i) = T(j)) &
(T` is of size N)
58
Step 2
• Draw a flow diagram to represent the logical flow
through the component
• Indicate points where data transformations will
occur and write assertions
• Ex. Assuming a bubble sort is used two assertions
might be
– [(not(more) = true)) & (i < N) & T`(i) > T`(i+1))] 
[T`(i) is exchanged with T`(i+1)] //*
– [(not(more) = true)) & (I >= N)]  [T`(i) sorted] //**
59
Step 3
• From the assertions a we generate a series of
theorems to be proved
• If your first assertion is A1 and the first
transformation point has assertion A2 associated
with it the theorem to be proved is
A1  A2
• Once the list of theorems is established each must
be proved individually (order does not matter)
60
Steps 4 and 5
• We need to locate each loop in the flow diagram
and write an if-then assertion for each loop
condition
• To prove correctness, each logic path beginning
with A1 and ending with Aend Following each of
these paths allows us to demonstrate that the code
shows that the truth of the input condition will
lead to the truth of the output condition
61
Steps 6 & 7
• After identifying each logic paths the truth of each
path is proved rigorously (showing the the input
assertion implies the output assertion according to
the logic transformations found on that path)
• Finally you need to prove the program terminates
(which may mean an induction argument if loops
are involved)
62
Cost of Correctness Proofs - part 1
• Advantages
– You can discover algorithmic faults in the code
– Gives you a formal understanding of the logical
structures of the program
– Regular use of proofs forces you to be more precise in
specifying data, data structures and algorithmic rules
• Disadvantages
– Code is often smaller size than its proof
– It may take less effort to create code than to prove its
correctness
63
Cost of Correctness Proofs - part 2
• Disadvantages (continued)
– Large programs require complex diagrams and contain
many transformations to prove
– Nonnumeric algorithms are hard to represent logically
– Parallel processing is hard to represent
– Complex data structures require complex
transformations
– Mathematical proofs have occasionally been found to
be incorrect after years of use
64
Symbolic Execution
• Involves simulated execution of the program code using
symbols rather than data variables
• The test program is viewed as having an input state defined
by the input data and preconditions
• As each line of code is executed the program statement is
checked for state changes
• Each logical path in the program corresponds to an ordered
sequence of state changes
• The final state of each path must be a proper output state
• A program is correct if each input state generates the
appropriate output state
65
Symbolic Execution Example
• Consider the following lines of code
a = b + c;
if (a > d)
call_task1( );
else
call_task2( );
66
Symbolic Execution Steps
• A symbolic execution tool would decide that (a>d) can be
true or false, without worrying about the values assigned to
a and d
• This gives us two states
– (a > d) is false
– (a > d) is true
• All data values are presumed to fall into one of the
equivalence classes defined by the two states (so only a
small number of test cases need be considered in a proof)
• This technique has many of the same costs and
disadvantages of using logical correctness proofs
67
Structural Induction
• Induction can be used to show loop termination
and correctness of list processing algorithms
• To show the f(list) is true for every list you must
prove that
– f(list) is true for an empty list //the base case
– whenever f(list`) is true, so is f(x :: list) in other words
adding an element to the list preserves truth no matter
how the list is
//induction step
– f([x1, … ,xn]) is true after n steps
68
Generalizing Induction
• Suppose you want to show that
– k addlen(k. list) = k + nlength(list)
• Give the following definitions
nlength([ ])
nlength(x :: xs)
=
=
0
1 + nlength(xs)
addlen(k, [ ])
addlen(k, x::xs)
=
=
k
addlen(k + 1, xs)
69
Induction Correctness Proof
• k addlen(k. list) = k + nlength(list)
• Base case:
addlen(k, [ ])
= k
= k+0
= k + nlength([ ])
• Induction step - assume that
addlen(k, x::list`) = addlen(k + 1, list`)
= k + 1 + nlength(list`) //IHOP
= k + nlength(x :: list`) // must prove
70