Using First-Order Theorem Provers in Data Structure

Download Report

Transcript Using First-Order Theorem Provers in Data Structure

Using First-Order Theorem Provers
in Data Structure Verification
Charles Bouillaguet
Viktor Kuncak
Martin Rinard
Ecole Normale Supérieure,
Cachan, France
MIT CSAIL
Implementing Data Structures is Hard




Often small, but complex code
Lots of pointers
Unbounded, dynamic allocation
Complex shape invariants
 Dag
 Properties involving arithmetic (ordering…)
 Need strong invariants to guarantee
correctness
 e.g. lookup in ordered tree needs sortedness
How to obtain reliable data structure
implementations?
 Approach
 Prove that the program is correct
 For all program executions (sound)
 Verified properties:
 Data structure operations do not crash
 Data structure invariants are preserved
 Data structure content is correctly updated
Infrastructure
 Jahob system for verifying data structure
implementation
 Kuncak, Wies, Zee, Rinard, Nguyen,
Bouillaguet, Schmitt, Marnette, Bugrara
 Analyzed programs: subset of Java
 Specification : subset of Isabelle’s language
Summary of Verified Data Structures
 Implementations of relations





Add a binding
Remove all bindings for a given key
Test key membership
Retrieve data bound to a key
Test emptiness
 Verified implementations
 Linked list
 Ordered tree
 Hash table
An Example : Ordered Trees
 Implementation of a finite map
 Operations
key
value
right
left
 insert
 lookup
 remove
 Representation invariants:
 tree shaped (acyclicity, unique
parent)
 ordering constraints
Sample code
public static FuncTree update(int k, Object v, FuncTree t)
{
FuncTree new_left, new_right; Object new_data; int new_key;
if (t==null) {
new_data = v; new_key = k;
new_left = null; new_right = null;
} else {
if (k < t.key) {
new_left = update(k, v, t.left); new_right = t.right;
new_key = t.key; new_data = t.data;
} else if (t.key < k) { … } else {
new_data = v; new_key = k;
new_left = t.left; new_right = t.right;
}}
FuncTree r = new FuncTree();
r.left = new_left; r.right = new_right;
r.data = new_data; r.key = new_key;
return r;
}
Sample code
public static FuncTree update(int k, Object v, FuncTree t)
/*: requires "v ~= null”
ensures "result..content = t..content - {(x,y). x=k} + {(k,v)} */
{
FuncTree new_left, new_right; Object new_data; int new_key;
if (t==null) {
new_data = v; new_key = k;
new_left = null; new_right = null;
} else {
if (k < t.key) {
new_left = update(k, v, t.left); new_right = t.right;
new_key = t.key; new_data = t.data;
no null dereferences
} else if (t.key < k) { … } else {
new_data = v; new_key = k;
new_left = t.left; new_right = t.right;
}}
FuncTree r = new FuncTree();
postcondition
holds
and
r.left = new_left;
r.right
= new_right;
invariants
preserved
r.data
= new_data;
r.key = new_key;
//: "r..content" := "t..content - {(x,y). x=k} + {(k,v)}";
return r;
}
3 lines spec
30 lines code
Ordered tree: interface
public ghost specvar content :: "(int * obj) set" = "{}";
public static FuncTree empty_set()
ensures "result..content = {}"
public static FuncTree add(int k, Object v, FuncTree t)
requires "v ~= null & (ALL y. (k,y) ~: t..content)”
ensures "result..content = t..content Un {(k,v)}”
public static FuncTree update(int k, Object v, FuncTree t)
requires "v ~= null”
ensures "result..content = t..content - {(x,y). x=k} + {(k,v)}”
public static Object lookup(int k, FuncTree t)
ensures "((k, result) : t..content)
| (result = null & (ALL v. (k,v) ~: t..content))”
public static FuncTree remove(int k, FuncTree t)
ensures "result..content = t..content - {(x,y). x=k}”
Representation Invariants
public final class FuncTree{
private int key;
abstract set-valued field
private Object data;
private FuncTree left, right;
/*: public ghost specvar content :: "(int * obj) set";
tuples
invariant ("content definition") "this ~= null -->implicit universal
quantification
content = {(key, data)} Un left..content Un
right..content"over this
invariant ("nullequality
implies empty") "this = null --> content = {}"
between sets
invariant ("left children are smaller")
"ALL k v. (k,v) : left..content --> k < key”
invariant ("right children are bigger")
"ALL k v. (k,v) : right..content --> k > key"
*/
explicit quantification
arithmetic
How could these properties be verified?
Standard Approach
eauto ; intros . intuition ; subst .
apply Extensionality_Ensembles. unfold Same_set.
unfold Included. unfold In. unfold In in H1.
intuition. destruct H0.
destruct (eq_nat_dec x1 ArraySet_size).
subst. rewrite arraywrite_match in H0 ; auto.
intuition. subst. apply Union_intror.
auto with sets. assert (x1 < ArraySet_size). omega.
clear n. apply Union_introl.
rewrite arraywrite_not_same_i in H0.
unfold In. exists x1. intuition.omega.
 Transform program into a logic formula
 Using weakest precondition
inversion H0 ; subst ; clear H0. unfold In in H3.
 The program is correct iff the
formula
valid
destruct
H3. exists x1.is
intuition.
low efficiency
 1 line per grad student-minute
 parallelization looks non-trivial

rewrite arraywrite_not_same_i.
intuition ; omega. omega.
exists ArraySet_size. intuition. inversion H3.
subst. rewrite arraywrite_match ;
trivial.
 Prove the formula
 Very difficult formulas: interactively (Coq, Isabelle)
 Decidable classes: automated (MONA, CVCL, Omega)
 This talk: difficult formulas in automated way :)
Formulas in Jahob
 Very expressive specification language
 Higher-Order features
 How to prove formulas automatically?
 Convert them to something simpler
 Decidable classes
 First-Order Logic
Automated reasoning in Jahob
Why FOL?
 Existing theorem provers:
 SPASS, E, Vampire, Theo, Prover9, …
 continuously improving (yearly competition)
 Effective on formulas with short proofs
 Handle nicely formulas with quantifiers
HOL  FOL
 Ideas :




avoid axiomatizing rich theories
Translate what can naturally be expressed in FOL
soundly approximate the rest
Sound, incomplete approach
 Full details in long version of the paper




(x,y) ∈ z.content ⇢ Content(x,y,z)
w.f := y ⇢(x=y ⋀ w=v) ⋁ (x ≠ y ⋀ w=f(y) )
λx.E = λx.F⇢∀x. E=F
…
Arithmetic
 Numbers are uninterpreted constants in FOL
 Provers do not know that 1+1=2 !
 Still need to reason about arithmetic…
 Our Solution
 Provide partial, incomplete axiomatization
• Still cannot deduce 1+1=2 !
• comparison between constants in formula
 Satisfactory results in practice
• ordering of elements in tree
• array bound checks
Observation

Most formulas are easy to prove


ie in no measurable time
have very short proofs (in # of resolution step)

Problem often concentrated in a small number
that take very long to prove

We applied two existing techniques to make
them easier
1. Eliminating type/sort information
2. Filtering unnecessary assumptions
Sort Information
 Specification language has sorts
 Integers
 Objects
 Boolean
 Translate to unsorted FOL
∀(x : Obj). P(x)
⇣
∀x. Obj(x) ⇒P(x)
Sort Information
 Encoding sort information
 bigger formulas
 longer proofs
 Formulas become harder to prove
 Temptation to omit sort information…
Effect on hard formulas
 Formulas that take more than 1s to prove, from
the Tree implementation (SPASS)
Benchmark
Time (s)
with
Tree.remove
Tree.remove_max
w/o
Proof length
with
w/o
Generated
clauses
with
w/o
5.3
1.1
799
155
18 376
9 425
3.6
0.3
1 781
309
19 601
1 917
9.8
4.9
1 781
174
33 868
27 108
8.1
0.5
1 611
301
31 892
3 922
8.1
4.7
1 773
371
37 244
28 170
7.9
0.3
1 391
308
41 354
3 394
+∞
0.22
78.9
6.8
2 655
1 159
177 755
19 527
34.8
0.8
4 062
597
115 713
5 305
97
1 075
Omitting Sorts (cont’d)
 Great speed-up (more than x10 sometimes) !
 However:
∀ (x y:S). x = y
∃ (x y:T). x ≠ y
 Satisfiable with sorts (S={a}, T={b,c})…
 Unsatisfiable without!
 Omitting sort guards breaks soundness!!!
 Possible workaround: type-check generated proof
 When it is possible to skip type-checking ?
Omitting Sorts Result
We proved the following
Theorem. Suppose that
i. Sorts are pair-wise disjoint (no sub-sorting)
ii. Sorts have the same cardinality
Then omitting sort guards is
sound and complete
This justify this useful optimization
Assumption Filtering
 Provers get confused by too many assumptions
 Lots of useless assumptions
 Hardest shown benchmark needs 12 out of 56
 Big benchmark: on average 33% necessary
 Assumption filtering
 Try to eliminate irrelevant assumptions automatically
 Give a score to assumption based on relevance
Experimental results
Benchmark
lines of
lines of
# of
code specification methods
verif.
time
Sets as
imperative list
60
24
9
18s
Relation as
functional Linked
list
76
26
9
12s
186
38
10
178s
69
53
10
119s
Relation as
functional
Ordered trees
Relation as hash
table (using f.list)
Verification effort
 Decreased as we improved the system
 functional list was easy
 a few days for trees
 two hours for simple hash table
 FOL : Currently most usable method for
these kind of data structures
Related work
 Interactive Provers – Isabelle, Coq, HOL, PVS, ACL2
 First-Order ATP
 Vampire – Voronkov [04]
 SPASS – Weidenbach [01]
 E – Shultz [IJCAR04]
 Program Checking
 ESC/Java2 – Kiniry, Chalin, Hurlin
 Krakatoa – Marche, Paulin-Mohring, Urbain [03]
 Spec# – Barnett, DeLine, Jacobs, Fähndrich, Leino, Schulte, Venter
[05]
 Hob system: verify set implementations (we verify relations)
 Shape analysis
 PALE - Møller and Schwartzbach [PLDI01]
 TVLA - Sagiv, Reps, and Wilheim [TOPLAS02]
 Roles - Kuncak, Lam, and Rinard [POPL02]
Multiple Provers - Screenshot
Conclusion
 Jahob verification system
 Automation by translation HOLFOL
 omitting sorts theorem gives speedup
 filtering automates selection of assumptions
 Promising experimental results
 strong properties: correct implementation
• Do not crash
• operations correctly update the content, clarifies behavior in
case of duplicate keys, …
• representation invariants preserved (ordering, treeness, each
element is in appropriate bucket)
 relatively fast
 verification effort much smaller than using interactive
provers
Thank you
Formal Methods are the Future of computer Science.
Always have been…
Always will be.
Questions ?
Converting to GCL
 Conditionnal statement: easy
 [| if cond then tbranch else fbranch |] =
(Assume cond; [| tbranch|] )
 (Assume !cond; [| fbranch|] )
 Procedure calls:
 Could inline (potentially exponential blowup)
 Desugaring (modularity) :
• [| r = CALL m(x, y, z) |] =
Assert (m’s precondition);
Havoc r;
Havoc {vars modified by m} ;
Assume (m’s postcondition)
Converting to GCL (cont’d)
 Loops: invariant required
 [| while /*: invariant */ (condition) {lbody} |] =
assert invariant;
havoc vars(lbody);
assume invariant;
((assume condition;
[| lbody |];
assert invariant;
assume false)
 (assume !condition))
invariant hold initially
no assumptions on variables
except that invariant hold
condition hold
invariant is preserved
no need to verify anything more
or condition do not hold
and execution continues
Verification condition for remove
((((fieldRead Pair_data null) = null) & ((fieldRead FuncTree_data null) = null) & ((fieldRead FuncTree_left null) = null) & ((fieldRead
FuncTree_right null) = null) & (ALL (xObj::obj). (xObj : Object)) & ((Pair Int FuncTree) = {null}) & ((Array Int FuncTree) = {null}) & ((Array Int
Pair) = {null}) & (null : Object_alloc) & (pointsto Pair Pair_data Object) & (pointsto FuncTree FuncTree_data Object) & (pointsto FuncTree
FuncTree_left FuncTree) & (pointsto FuncTree FuncTree_right FuncTree) & comment ''unalloc_lonely'' (ALL (x::obj). ((x ~: Object_alloc) -->
((ALL (y::obj). ((fieldRead Pair_data y) ~= x)) & (ALL (y::obj). ((fieldRead FuncTree_data y) ~= x)) & (ALL (y::obj). ((fieldRead FuncTree_left y)
~= x)) & (ALL (y::obj). ((fieldRead FuncTree_right y) ~= x)) & ((fieldRead Pair_data x) = null) & ((fieldRead FuncTree_data x) = null) &
((fieldRead FuncTree_left x) = null) & ((fieldRead FuncTree_right x) = null)))) & comment ''ProcedurePrecondition'' (True & comment
''FuncTree_PrivateInv content definition'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead
(FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) = (({((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)), (fieldRead (FuncTree_data
:: (obj => obj)) (this :: obj)))} Un (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_left :: (obj => obj)) (this :: obj))))
Un (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_right :: (obj => obj)) (this :: obj))))))) & comment
''FuncTree_PrivateInv null implies empty'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree) & ((this :: obj) = null)) --> ((fieldRead
(FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) = {}))) & comment ''FuncTree_PrivateInv no null data'' (ALL (this::obj). (((this :
Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead (FuncTree_data :: (obj => obj)) (this :: obj)) ~= null))) & comment
''FuncTree_PrivateInv left children are smaller'' (ALL (this::obj). (((this : Object_alloc) & (this : FuncTree)) --> (ALL k. (ALL v. (((k, v) : (fieldRead
(FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_left :: (obj => obj)) (this :: obj)))) --> (intless k (fieldRead (FuncTree_key ::
(obj => int)) (this :: obj)))))))) & comment ''FuncTree_PrivateInv right children are bigger'' (ALL (this::obj). (((this : Object_alloc) & (this :
FuncTree)) --> (ALL k. (ALL v. (((k, v) : (fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (fieldRead (FuncTree_right :: (obj => obj)) (this
:: obj)))) --> ((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)) < k))))))) & comment ''t_type'' (((t :: obj) : (FuncTree :: obj set)) & ((t :: obj) :
(Object_alloc :: obj set)))) --> ((comment ''TrueBranch'' (((t :: obj) = null) :: bool) --> (comment ''ProcedureEndPostcondition'' ((((fieldRead
(FuncTree_content :: (obj => ((int * obj)) set)) (null :: obj)) = ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (t :: obj)) - {p. (EX x y. ((p
= (x, y)) & (x = (k :: int))))})) & (ALL (framedObj::obj). (((framedObj : Object_alloc) & (framedObj : FuncTree)) --> ((fieldRead FuncTree_content
framedObj) = (fieldRead FuncTree_content framedObj))))) & comment ''FuncTree_PrivateInv content definition'' (ALL (this::obj). (((this :
Object_alloc) & (this : FuncTree) & ((this :: obj) ~= null)) --> ((fieldRead (FuncTree_content :: (obj => ((int * obj)) set)) (this :: obj)) =
(({((fieldRead (FuncTree_key :: (obj => int)) (this :: obj)), (fieldRead (FuncTree_data :: (obj => obj)) (this :: obj)))} Un (fieldRead
(FuncTree_content :: (obj => …
And 200 more kilobytes…
Infeasible to prove directly
Splitting heuristic
 Verification condition is big conjunction
 conjunctions in postcondition
 proving each invariant
 proving each branch in program
 Solution: split VC into individual conjuncts
 Prove each conjunct separately
 Each conjunct has form
H1 /\ … /\ Hn  Gi
Tree.Remove has 230 such conjuncts
 How do we prove them?
Detupling (cont’d)
 Complete rules:
Handling of Fields (cont’d)
 We dealt with field updates
 New function expressed in terms of old one
 Base case: field variables
 Natural encoding in FOL using functions:
x = y.f ! x = f(y)
Future work
 Verify more examples
 balanced trees
 fancy priority queues (binomial, Fibonacci, …)
 hash table with dynamic resizing
 hash function
 verify clients of data structures
 Improve assumption filtering
 take rarity of symbols into account
 check for occurring polarity
…