Abstract Interpretation - Dipartimento di Informatica

Download Report

Transcript Abstract Interpretation - Dipartimento di Informatica

Abstraction and Approximation
via Abstract Interpretation:
a systematic approach to program analysis
and verification
Giorgio Levi
Dipartimento di Informatica, Università di Pisa
[email protected]
http://www.di.unipi.it/~levi.html
1
Abstraction and approximation

two relevant concepts in several areas of
computer science (and engineering)
– to reason about complex systems
– to make reasoning computationally feasible
2
Abstract Interpretation
(Cousot & Cousot, POPL 77 & 79)

a 20-years old technique to systematically
handle abstraction and approximation
– born to describe (and prove correct) static
analyses (for imperative programs)
– popular mainly in declarative paradigms
– viewed today as a general technique to reason
about semantics at different levels of abstraction
– successfully applied to distributed and mobile
systems and to model checking
– recently applied to program verification
3
Abstract Interpretation,
Semantics, Analysis Algorithms

how abstract interpretation is often used in
static program analysis
– a semantics
– an analysis algorithm developed by ad-hoc
techniques
– the A.I. Theory (definition of an abstract domain) is
used to prove that the algorithm is correct, i.e.,
that its results are an approximation of the
property to be analyzed
4
Abstract Interpretation,
Semantics, Analysis Algorithms

the abstract interpretation I like
– a semantics
– an abstract domain designed to model the
property to be analyzed
– the A.I. Theory is used to systematically derive the
abstract semantics
– the analysis algorithm is exactly the computation
of the abstract semantics and is correct by
construction
5
Abstract Interpretation
Theory in 4 Steps




concrete and abstract domain
the Galois insertion
abstract operations
from the concrete to the abstract semantics
6
Concrete and Abstract Domains



two complete partial orders
– the partial orders reflect precision
smaller is better
concrete domain (C, )
– C has the structure of a powerset
abstract domain (A, )
– each abstract value is a description of “a set
of” concrete values
7
The Sign Abstract Domain


concrete domain
(P(Z), )
– sets of integers
abstract domain
(Sign, )
top
0-
0+
0
+
bot
8
Galois insertions
(C, ), (A, )
: A  C (concretization)
: C  A (abstraction)

C
 ,  monotonic
xC. x  ((x))
yA. ((y)) = y
 ,  mutually determine each
other
A

9
The sign example
sign (y) = glb of
sign (x)
–
–
–
–
–
–
–
– bot , if y= 
– - , if y  {y|y<0}
, if x= bot
{y|y>0}, if x= +
{y|y0}, if x= 0+
{0}, if x= 0
{y|y0}, if x= 0{y|y<0}, if x= Z, if x= top
– 0- , if y  {y|y0}
– 0 , if y = {0}
– 0+ , if y  {y|y  0}
– + , if y  {y|y>0}
– top , if y  Z
top
0-
0+
0
+
bot
10
Abstract Operations


the concrete semantic evaluation function is defined
in terms of primitive semantic operations fi on C
for each fi we need to provide a corresponding fi
defined on A
 f i
must be locally correct, i.e. x1,..,xn C.
fi (x1,..,xn)  (fi ((x1),..,(xn)))


the optimal (most precise) abstract operator is
fi (y1,..,yn) = (fi ((y1),.., (yn)))
the operator is complete (precise) if x1,..,xn C.
(fi (x1,..,xn)) = fi ((x1),.., (xn)))
11
Times Sign
bot
bot
00
0+
+
top
bot
bot
bot
bot
bot
bot
bot
-
0-
0
bot bot bot
+
0+ 0
0+ 0+ 0
0 0
0
0- 00
00
top top
0
0+
+
bot bot
00- 00
0
0+ 0+
0+ +
top top
top
bot
top
top
0
top
top
top
12
Plus Sign
bot
00
0+
+
top
bot
-
bot
bot
bot
bot
bot
bot
bot
bot
0top
top
top
0-
0
bot bot
00000
top
0+
top
0+
top top
0+
bot
top
top
0+
0+
+
top
+
top
bot
top
top
0+
+
+
top
bot
top
top
top
top
top
top
13
The Sign example




Times and Plus are the usual operations lifted to
P(Z)
both Timessign and Plussign are optimal (hence
correct)
Timessign is also complete (no approximation)
Plussign is necessarily incomplete
sign(Times({2},{-3})) =
Timessign(sign({2}),sign({-3}))
sign(Plus({2},{-3})) 
Plussign(sign({2}),sign({-3}))
14
The Abstract Semantics

F = concrete semantic evaluation function
– if we start from a standard semantic definition, the lifting to the
powerset (collecting semantics) is simply a conceptual operation


lfp F = concrete semantics
F = abstract semantic evaluation function
– obtained by replacing in F every concrete semantic operation by
a corresponding (locally correct) abstract operation

lfp F = abstract semantics
 global
correctness
 (lfp F)  lfp F
– the abstract semantics is less precise than the abstraction of the
concrete semantics
15
Where does the approximation come
from?


incomplete abstract operations
more execution paths in the abstract control flow
– the abstract state has not enough information to make
deterministic choices
– conditionals, pattern matching, etc.
 the
set of resulting abstract states is turned into a
single abstract state, by performing an abstract
lub operation
16
Approximation in
abstract Sign computations



concrete state [x={3}]
if x>2 then y:=3 else y:=-5;
concrete state [x={3}, y={3}]
top
0-
0+
0
bot
+
 abstract
state [x=+]
 if x>2 then y:=3 else y:=-5;
– the abstract guard “can be
both true and false”
– both paths need to be
abstractly evaluated
– the two resulting abstract
states are merged by
performing a lub in Sign
 abstract
state [x=+,y=top]
17
 (lfp F)  lfp F
why computing lfp F?


lfp F cannot be computed
in finitely many steps
–  steps are in general
required
lfp F can be computed in
finitely many steps, if the
abstract domain is finite or
at least noetherian
– no infinite increasing
chains

static analysis 1
– noetherian abstract domain
– termination, approximation

static analysis 2
– non-noetherian domain
– termination via widening
– further approximation

comparative semantics
– non-noetherian domain
– abstraction without
approximation (completeness)
 (lfp F) = lfp F
18
Static Analysis




abstract domain and
Galois connection to
model the property
(possibly optimal) correct
abstract operations
F
the analysis is the
computation of lfp F


if the abstract domain is nonnoetherian, or
if the complexity of lfp F is too
high
– use a widening operator
– which effectively computes
an (upper) approximation of
lfp F
 one example later
19
Comparative Semantics
 (lfp F) = lfp F



none of the two fixpoints is
finitely computable
useful to reason about
different semantics and to
systematically derive more
abstract semantics
– choice of the most adequate
reference semantics for
analysis and verification

F
is less expensive than F in
computing the observable
property modeled by 
hierarchy of transition systems
semantics (P. Cousot, MFPS 97)
– trace, big-step operational,
denotational, relational, predicate
transformer, axiomatic, etc.

systematic reconstruction of
several fixpoint (TP-like) semantics
for (positive) logic programs
(Comini, Levi & Meo, Info. & Comp. 00)
– applied in Pisa also to finite failure
& infinite computations, CLP, CCP,
Prolog, -Prolog, sequent calculi
– no junk
20
Polymorphic type inference
in ML-like functional languages


the ad-hoc solution
–
– Milner’s algorithm, specified
by a set of inference rules


an elegant, well-understood,
universally accepted
semantic formalization
the systematic derivation via
abstract interpretation
– provides a better insight
– shows how to improve
precision
inference rules mimic the concrete
semantics
–

in the structure of the semantic
evaluation function
in the semantic domains (environment)
semantics to well-typed programs
only introduces approximation
– if true then 2 else false

the most general polymorphic type
for recursive functions is not
computable
–
–
the inferred type may not be the most
general
some type-correct programs cannot be
typed
21
Polymorphic type inference
via Abstract Interpretation


abstract values = pairs of
– a term (with variables)
 type expression
– a constraint (on variables)
 set of term equalities in
solved form

partial order (on terms only)
– top is “no type”
– bottom is “any type”

an optimal abstract operation
– +((t1,c1),(t2,c2)) =
(int, c1c2 {t1=int ,t2=int})
abstracting functional values
– the concrete semantics
E x.e r = v. E e (bind r x v)
– the abstract value
let v1 = newvar() in
let (v2,c2) = E e (bind r x (v1,{}))
in (v1c2 -> v2,c2)
– t1  t2, if t2 is an instance of t1

the domain is non-noetherian
– there exist infinite increasing
chains
22
Recursion and Widening



the abstraction of recursive
functions is similar to the one
of regular functions, but
– a fixpoint computation is
required
– the first approximation of
the abstract value of the
function is bottom
since the abstract domain is
non-noetherian the fixpoint
computation may diverge
the solution in Milner’s algorithm
– take the results of the first two
iterations and compute their lub
(most general common instantiation,
computed through unification)

– if the lub is top (unification fails),
the program is not typable (type
error)
this is exactly a widening operator,
which returns a (correct) upper
approximation of the lfp
(Furiesi, Master Thesis Pisa. 99)
23
How to improve precision

straightforward!
– perform at most k iterations of the fixpoint computation
– if we reach a fixpoint, it is the most general type
– otherwise, we apply Milner’s widening to the last two results




one example (due to Cousot)
CaML
–

we succeed in typing more functions
we get more precise types
# let rec f f1 g n x = if n=0 then (g x) else
(((((f f1)(fun x -> (fun h -> (g(h x)))))(n - 1))(x))(f1));;
This expression has type ('a -> 'a) -> 'b but is here used with type 'b
our answer (the fixpoint is reached in 3 iterations)
–
val f : ('a -> 'a) -> ('a -> 'b) -> int -> 'a -> 'b = <fun>
24
Abstract Interpretation
vs. Type Systems

Patrick Cousot has reconstructed a hierarchy of type systems for ML-like
languages by using abstract interpretation

(Cousot, POPL 97)
type systems have been proposed to cope with other static analyses
(strictness, various properties related to security)



type systems need to be proved correct wrt a semantics
abstract semantics are systematically derived from the semantics and
are correct by construction
two related open interesting problems
–
–
comparison of the two approaches from the viewpoint of expressive power and analysis precision (and
complexity)
definition of methods to automatically translate formalizations from one approach to the other
25
Static Analysis of
Logic Programs

abstract Interpretation is very popular in logic languages
– the computational model has several opportunities for optimization, based on analysis
results
– it is (relatively) easy to define, because the standard semantics is collecting and the
concrete domain (sets of substitutions) is quite simple

several important properties (groundness, freeness, sharing, depth(k))

for some properties (i.e., groundness and sharing) a lot of
different abstract domains
– techniques to compare the relative precision of abstract domains
– important results on techniques for the systematic design of abstract
domains, which can probably be applied to other paradigms as well

abstract compilation in CLP (Giacobazzi, Debray & Levi, JLP 95)
– the program is transformed by syntactically replacing concrete constraints by
abstract constraints
– the abstract computation is a standard CLP computation on a different
constraint system
26
Groundness in
Logic Programs




CLP version
concrete domain
– (P(Eqns), ), sets of sets of term equations in solved form
concrete semantics
– the CLP version of the s-semantics (answer constraints)
3 abstract domains
– G: the property of being ground
– DEF: functional groundness dependencies
– POS: DEF + some disjunctive information
 lattices shown in the 2-variables case
27
An example

the program
p(X,Y) :- X=a.
p(X,Y) :- Y=b.
q(X,Y) :- X=Y.
r(X,Y) :- p(X,Y),q(X,Y).


the concrete semantics
p(X,Y) -> {{X=a},{Y=b}}
q(X,Y) -> {{X=Y}}
r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
in the concrete semantics of r
– both the arguments are
bound to ground terms (in all
the answer constraints)
28
The domain G
G (v) =
true
X
–
Y
, if v= bot
– {e  Eqns | X is bound to a ground term in e },
if v= X
X is always ground
– Eqns, if v= true no groundness information
X&Y
bot

the program
p(X,Y) :- X=a.
p(X,Y) :- Y=b.
q(X,Y) :- X=Y.
r(X,Y) :- p(X,Y),q(X,Y).


the concrete semantics
p(X,Y) -> {{X=a},{Y=b}}

q(X,Y) -> {{X=Y}}
r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
the abstract program
p(X,Y) :- lubG (X,Y).
q(X,Y) :- true.
r(X,Y) :- glbG (p(X,Y),q(X,Y)).

the abstraction of the concrete
semantics
p(X,Y) -> true
q(X,Y) -> true
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> true
q(X,Y) -> true
r(X,Y) -> true
29
The domain Def
Def (v) =
true
YX
– {e  Eqns | X = Y  e},
if v= X  Y
X is ground if and only if Y is ground
– {e  Eqns | X = t  e and Y occurs in t},
if v= X  Y
if X is ground then Y is ground
XY
XY
X
Y
–
X&Y

bot
the program
p(X,Y) :- X=a.
p(X,Y) :- Y=b.
q(X,Y) :- X=Y.
r(X,Y) :- p(X,Y),q(X,Y).


…..
the concrete semantics
p(X,Y) -> {{X=a},{Y=b}}

q(X,Y) -> {{X=Y}}
r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
the abstract program
p(X,Y) :- lubDef (X,Y).
q(X,Y) :- X  Y.
r(X,Y) :- glbDef (p(X,Y),q(X,Y)).

the abstraction of the concrete
semantics
p(X,Y) -> true
q(X,Y) -> X  Y
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> true
q(X,Y) -> X Y
r(X,Y) -> X Y
30
The domain Pos
pos (v) =
true
XY
YX
XY
X
– {e Eqns | either X or Y is bound to a ground
term in e }, if v= X  Y
either X or Y is
XY
Y
–
ground
….
X&Y

bot
the program
p(X,Y) :- X=a.
p(X,Y) :- Y=b.
q(X,Y) :- X=Y.
r(X,Y) :- p(X,Y),q(X,Y).


the concrete semantics
p(X,Y) -> {{X=a},{Y=b}}

q(X,Y) -> {{X=Y}}
r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
the abstract program
p(X,Y) :- lubpos (X,Y).
q(X,Y) :- X  Y.
r(X,Y) :- glbpos (p(X,Y),q(X,Y)).

the abstraction of the concrete
semantics
p(X,Y) -> X  Y
q(X,Y) -> X  Y
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> X  Y
q(X,Y) -> X Y
r(X,Y) -> X & Y
31
Program Verification by
Abstract Interpretation

F = concrete semantic evaluation function
– concrete enough to observe the property



– if F ( S )  S
– then S is a prefixpoint of F
– hence
the property is modeled by an abstract domain
(lfp F) 
(A, ) and a Galois insertion  ,
F = abstract semantic evaluation function
S = specification of the property, i.e., abstraction of
the intended concrete semantics

partial correctness: (lfp F)  S

sufficient partial correctness condition: F ( S )  S
lfp F
 S
(Comini, Levi, Meo & Vitiello, JLP 99)
32
Analysis and Verification



F = concrete semantic evaluation function
F = abstract semantic evaluation function
analysis: compute lfp F
– we need to compute a fixpoint
– noetherian domain or widening

S = specification of the property

verification: prove F ( S )  S
– no fixpoint computation and no
need for noetherian domains
– finite representation of the
specification
– decidability of 
33
Completeness of the
proof method


assume the program to be partially correct wrt
the specification S, i.e., (lfp F)  S
then there exists another specification T ,
stronger than S, such that the sufficient
condition F ( T )  T holds

we have shown that the proof
method is complete if and only if the
abstraction is complete (precise)
(Levi & Volpe, PLILP 98)
34
Proof methods and the
reference semantics

one can be interested in establishing
different kinds of properties
– of the final state
– of the relation between initial and final state
– of the relation between specific pairs of
intermediate states, e.g., procedure calls
– ….

there exist different corresponding proof
methods



all the proof methods are instances
of F ( S )  S for different choices of
the concrete semantic evaluation
function F
F can be derived by abstract
interpretation (comparative
semantics) from the most concrete
semantics, i.e., a trace semantics
first step of abstraction = choice of
the “right” semantics
in (positive) logic programming, all the known verification methods
have been reconstructed (Levi & Volpe, PLILP 98)
35
Making F ( S )  S effective

extensional specifications

– typical analysis properties described by
noetherian abstract domains
– properties such as polimorphic types which
lead to finite abstract semantics, even with
non-noetherian domains

intensional specifications, specified by
means of assertions

assertions are abstract domains
– a formula describes the set of all
the concrete states which
“satisfy” it (concretization)
– if the specification language is
closed under conjunction, it is
easy to define the abstraction
function
we can derive an abstract function
F, which computes on the domain of
assertions and instantiate the
verification condition
(Comini, Gori & Levi, MFCSIT 00)
the relation  on the domain of assertions must be decidable
an open problem: completeness of the abstract semantics
associated to a specific language of assertions
36
Specification Languages

decidable specification languages have been proposed for functional
programming and logic programming
– one example: a powerful language which allows one to express several properties of
logic programs, including types, freeness and groundness (Volpe, SCP 00)

experiments using Horn Clause Logic as specification language
(Comini, Gori & Levi, AGP 00)
– it is not decidable
– most of the verification conditions can be proved without using a theorem
prover
 simple logic program transformation techniques, which can be
partially supported by an automatic tool
37
Systematic abstract
domain design


once we have the abstract domain, the design of the abstract semantics
is systematic
abstract interpretation theory provides results which can be exploited to
make the design of abstract domains (more) systematic
– to compare and combine domains
– to refine domains so as to improve their precision

reduced product (of domains A and B)
– allows one to analyze (together) the
properties modeled by A and B
– often delivers better results than the
separate analyses
 because of domain interaction
lifting to the powerset (and
disjunctive completion )
– roughly speaking, transform A
into P(A)
– better precision
 no loss of information in
computing lub’s
38
Operations on
Abstract Domains

several useful operators on abstract domains (refinements)
– a survey in (File’, Giacobazzi & Ranzato, ACM Comput. Surv. 96)

linear completion
(Giacobazzi, Ranzato & Scozzari, SAS 98)
– functional dependencies modeled by
linear implication

reconstruction of all the known
domains for groundness analysis
(Scozzari, SAS 97)
– DEF = G -> G
– POS = DEF -> DEF
– POS = POS -> POS
 optimality of POS
successfully applied to other domains
for logic programs
– types (Levi & Spoto, PLILP 98)
– sharing and freeness
(Levi & Spoto, PEPM 00)
open problems
– do the same refinements apply to
other programming paradigms?
– can refinements be extended to
domains of assertions and to type
systems?
39
Abstract Interpretation

a mathematically simple and solid foundation for
– comparative semantics
– static analysis
– verification

a methodology for the systematic derivation of
– abstract domains from the property
 complexity issues?
 quantitative analyses?
– abstract semantics from the concrete semantics and the abstract
domain
40