No Slide Title - University of Pennsylvania

Download Report

Transcript No Slide Title - University of Pennsylvania

Logics, Automata, and Algorithms for
Analysis of Structured Programs
Rajeev Alur
University of Pennsylvania
Marktoberdorf Summer School, August 2006
Software Model Checking
Program
Specification
Research challenges
 Search algorithms
 Abstraction
Abstractor
Debugger
No/bug
Model
Verifier
 Static analysis
 Refinement
 Expressive specs
Counter-example
Yes/proof
Applications
 Device drivers, OS code
 Network protocols
 Concurrent data types
Tools: SLAM, Blast, CBMC, F-SOFT
Verification Example
Does this code
obey the
locking spec?
do {
KeAcquireSpinLock();
Rel
nPacketsOld = nPackets;
Acq
Unlocked
Locked
Rel
Acq
Error
Specification
if(request){
request = request->Next;
KeReleaseSpinLock();
nPackets++;
}
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
Classical Model Checking
 Both model M and specification S define regular languages
M as a generator of all possible behaviors
S as an acceptor of “good” behaviors (verification is language
inclusion of M in S) or as an acceptor of “bad” behaviors (verification
is checking emptiness of intersection of M and S)
 Typical specifications (using automata or temporal logic)
Safety: Lock and unlock operations alternate
Liveness: Every request has an eventual response
Branching: Initial state is always reachable
 Robust foundations
Finite automata / regular languages
Buchi automata / omega-regular languages
Tree automata / parity games / regular tree languages
Checking Structured Programs
 Control-flow requires stack, so model M defines a context-free
language
 Algorithms exist for checking regular specifications against
context-free models
Emptiness of pushdown automata is solvable
Product of a regular language and a context-free language is
context-free
 But, checking context-free spec against a context-free model is
undecidable!
Context-free languages are not closed under intersection
Inclusion as well as emptiness of intersection undecidable
 Existing software model checkers: pushdown models (Boolean
programs) and regular specifications
Are Context-free Specs Interesting?
 Classical Hoare-style pre/post conditions
If p holds when procedure A is invoked, q holds upon return
Total correctness: every invocation of A terminates
Integral part of emerging standard JML
 Stack inspection properties (security/access control)
If setuuid bit is being set, root must be in call stack
 Interprocedural data-flow analysis
 All these need matching of calls with returns, or finding unmatched calls
Recall: Language of words over [, ] such that brackets are well matched is
not regular, but context-free
Checking Context-free Specs
 Many tools exist for checking specific properties
Security research on stack inspection properties
Annotating programs with asserts and local variables
Inter-procedural data-flow analysis algorithms
 What’s common to checkable properties?
Both model M and spec S have their own stacks, but the two stacks
are synchronized
 As a generator, program should expose the matching structure
of calls and returns
Solution: Nested words and theory of
regular languages over nested words
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Program Abstraction
int x, y;
if x>0 {
…………
y:=x+1
……….}
else {
…………
y:=x+1
……….}
Predicate Abstraction
bx: x>0; by : y>0
bool bx, by;
if bx {
…………
by:=true
……….}
else {
…………
by:={true,false}
……….}
Modeling Structured Programs
Program
main() {
bool y;
…
x = P(y);
…
z = P(x);
…
}
bool P(u: bool) {
…
return Q(u);
}
bool Q(w: bool) {
if …
else return P(~w)
}
Recursive State Machine (RSM)/ Pushdown automaton
A1
A2
A2
A2
A3
A3
Entry-node
A3
Box (superstate)
A1
Exit-node
Recursive State Machine A
 M: Set of module names, m0: main module
 Each module m has a state machine Am
 Set Nm of nodes
 Set Bm of boxes (calls to other modules)
 Entry nodes: Em (input parameters)
 Exit nodes: Xm (return values)
 Labeling Ym : Boxes Bm -> Module names M
 Calls of m : (box b, entry node e of Ym(b))
 Returns of m: (box b, exit node x of Ym(b))
 Edges d: (source: node/return; destination: node/call)
Transition System of RSM
Vertices Vm of a module: Nodes, calls, returns of m
A state of A: (sequence of boxes, vertex)
Set of states B*V is infinite
Initial state: (e,e), where e is entry-node of m0
Transitions:
Local transitions: if (u,v) is an edge, then (b1..bn,u) has a
transition to (b1…bn,v)
Call transitions : for a call vertex (b,e), (b1…bn,(b,e)) has a
transition to (b1…bnb,e)
Return transitions: for a return vertex (b,x), (b1…bnb,x)
has a transition to (b1…bn, (b,x))
Variations of Definitions
We can label nodes (and boxes) with propositions,
and/or edges with alphabet symbols, as needed
Boolean programs can be easily mapped to RSMs
Expressiveness same as the classical model of
pushdown automata
Claim: RSMs are well-suited to study algorithms,
logics, automata etc for program analysis
Pushdown automata have no modular structure
No clutter of variables, parameter passing mechanisms
etc of real programs
Complexity parameters (number of entries/exits) clear
Hierarchical State Machine
A1
A2
A2
A3
A3
Node
Entry node
A3
A3
Box (superstate)
Exit node
Hierarchical State Machines
 An RSM A is an HSM, if there is a total ordering of the
modules such that if a box b of a module m is mapped to
madule m’ then m’ appears after m
 The transition system of an HSM is finite, but can be
exponential in size
 Analysis algorithms for RSMs are typically of same
complexity as those of HSMs
 Typical code may not use recursion, so can be abstracted to
HSMs, but what follows does not really depend on this
Specifications of programs should not depend on the bound on the
depth of hierarchy
For algorithms/specifications we do not want to flatten the
hierarchy for succinctness reasons
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Reachability
 Given an RSM A, a node v is reachable, if (b1…bn,v) is
reachable in underlying transition system, for some
context b1..bn
 Goal: Given A, compute all reachable nodes
Key idea: There are |M| interrelated searches: if exit x is
reachable from entry e of module m, then add summary
edge from call (b,e) to return (b,x), for every box b mapped
to module m
Same-level Reachability Algorithm
Let R be the smallest subset of UmEm x Vm
satisfying, for each module m, each entry e of m,
R(e,e)
For each edge u to v of m, if R(e,u) then R(e,v)
For each box b, for each call vertex (b,f) and return
vertex (b,x), if R(e,(b,f)) and R(f,x) then R(e,(b,x))
e
u
f
f
v
b
x
x
Claim: For each m, for each entry node e and
vertex u of module m, R(e,u) holds iff (e,u) is
reachable from (e,e) in underlying transition system
Reachability Algorithm
Phase 1: Compute the set R capturing same level
reachability information (evaluation can be viewed
as AND-OR reachability)
Define a graph over vertices of A as follows:
Retain all edges of A
Add summary edges from (b,e) to (b,x) if R(e,x) holds
Add call edges: from call (b,e) to entry e
A node u is reachable in A iff u is reachable in this
graph from entry node of mo
Complexity Analysis
Let a be the number of vertices of A, b be the
number of edges, and let r be max number of entry
nodes per module
Phase 1 AND-OR graph has ar vertices and br+ar2
edges, and hence, predicates e(.) can be evaluated
in time linear in these
Graph of phase 2 has a vertices and b+ar edges,
and can be searched in time linear in these
Thm: Given an RSM A, the set of reachable nodes
of A can be computed in time O(|A|r2)
Mixing Forward and Backward Search
 If a module m has a single exit x, then it’s beneficial to
search it backwards: compute the set of vertices from
which x is reachable
Base case: x(x)
Or case: x(v) -> x(u); if there is an edge from u to v
And case: x((b,y)) & e(y) -> x((b,e)); if e is entry; y an exit
 In general, in phase 1, for a module with less exits and
entries, compute relations x(.) instead of e(.)
 Thm: Given an RSM A, the set of reachable nodes of A can
be computed in time O(|A|r2), where r = maxm min
{|Em|,|Xm|}
 Thm: Given a set F of nodes of m0, checking whether a node
in F is reachable is PTIME-complete (for HSMs with r=1)
Implementations
On-the-fly reachability
Is some node in given target set F reachable?
Modified Depth-first-search that has early termination
and explores a module only when needed
Tabled logic programming
Reachability in RSMs is evaluation of Datalog programs
Efficient implementations (e.g. XSB) exist
Symbolic Search
In presence of global and local variables, at every control
point u, compute the set R(I,V) where I contains global
and input vars and V contains global and local vars
Represent the sets R(I,V) using OBDDs with
appropriately modified search algorithm
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Nested Words
Nested word:
Linear sequence + well-nested edges
Positions labeled with symbols in S
a1
a2
a3
a4
a5
a6
a7
a8
a9 a10
a11 a12
Positions classified as:
Call positions: both linear and hierarchical successors
Return positions: both linear and hierarchical predecessors
Internal positions: otherwise
Each position has at most one nested edge
Program Executions as Nested Words
Program
bool P() {
local int x,y;
…
x = 3; 1
if Q x = y ;4
…
}
bool Q () {
local int x;
…
x = 1; 2
3
return (x==0);
}
An execution
as a word
An execution
as a nested word
s
1
s
w
1
s
2
w
3
r
s
4
w
s
w
s
Summary
edges
from calls
to returns
Symbols:
w : write x
r : read x
s : other
s
4
w
s
2
w
3
r
Model for Linear Hierarchical Data
 Nested words: both linear and hierarchical structure is made
explicit. This seems natural in many applications
Executions of structured program
RNA: primary backbone is linear, secondary bonds are well-nested
XML documents: matching of open/close tags
 Words: only linear structure is explicit
Pushdown automata add/discover hierarchical structure
Parantheses languages: implicit nesting edges
 Ordered Trees: only hierarchical structure is explicit
Ordering of siblings imparts explicit partial order
Linear order is implicit, and can be recovered by infix traversal
RNA as a Nested Word
Primary structure: Linear sequence of nucleotides (A, C, G, U)
Secondary structure: Hydrogen bonds between complementary
nucleotides (A-U, G-C, G-U)
U
G
G
A
A
C
A
C
G
C
G
U
U
C
G
C
In literature, this is modeled as trees.
Algorithmic question: Find similarity between RNAs using edit
distances
Linguistic Annotated Data
VP
NP
NP
NP
I
V
saw
Det
the
Adj
old
PP
N
man
Prep
with
Det
a
N
dog
N
today
Linguistic data stored as annotated sentences (eg. Penn Treebank)
Nested words, possibly with labels on edges
Sample query: Find nouns that follow a verb which is a child of a
verb phrase
Existing query languages: XPath, XQuery, LPath (BCDLZ)
Nested Word Automata (NWA)
q0 a q1
1
a2
q29
a9
q8
q9=dr(q8,q29,a9)
q47
q7
q2 a q 3 a
a7
a8
3
4
q4
q5
(q2,q29)=dc(q1,a2)
a5
a6 q6=di(q5,a6)
States Q, initial state q0, final states F
Starts in initial state, reads the word from left to right
labeling edges with states, where states on the outgoing edges
are determined from states of incoming edges
Transition function:
 dc : Q x S -> Q x Q (for call positions)
 di : Q x S -> Q (for internal positions)
 dr : Q x Q x S -> Q (for return positions)
Nested word is accepted if the run ends in a final state
Regular Languages of Nested Words
 A set of nested words is regular if there is a finite-state NWA that
accepts it
 Nondeterministic automata over nested words
Transition function: dc: QxS->2QxQ, di :Q x S -> 2Q, dr:Q x Q x S -> 2Q
Can be determinized
 Graph automata over nested words defined using tiling systems
 Appealing theoretical properties
Effectively closed under various operations (union, intersection, complement,
concatenation, projection, Kleene-* …)
Decidable decision problems: membership, language inclusion, language
equivalence …
Alternate characterization: MSO, syntactic congruences
Software Analysis
 A program P with stack-based control is modeled by a set L of nested
words it generates
Choice of S depends on the intended application
Summary edges exposing call/return structure are added (exposure can
depend on what needs to be checked)
If P has finite data (e.g. pushdown automata, Boolean programs, recursive
state machines) then L is regular
 Specification S given as a regular language of nested words
 Verification: Does every behavior in L satisfy S ?
Runtime monitoring: Check if current execution is accepted by S (compiled
as a deterministic automaton)
Model checking: Check if L is contained in S, decidable when P has finite
data
RSMs as Nested Word Automata
 Given an RSM A, we can transform it into a nondeterministic
finite-state NWA A’ as follows
States of A’ = Vertices of A
Initial state of A’ = Entry node of main module mo of A
All states are final
Local transitions of A’ = Edges of A
Call transitions of A’ = From call (b,e), push b, and goto e
Return transitions of A’ = From exit x, pop b, and goto return (b,x)
 Choose alphabet S and add labels to transitions as needed
 L(A’) models all behaviors of A as nested words
Writing Program Specifications
Intuition: Keeping track of context is easy; just skip using a
summary edge
Finite-state properties of paths, where a path can be a local path,
a global path, or a mixture
Sample regular properties:
If p holds at a call, q should hold at matching return
If x is being written, procedure P must be in call stack
Within a procedure, an unlock must follow a lock
All properties specifiable in standard temporal logics (LTL)
Local Regularity
Let L be a regular language,
Local(L): every local path is in L (skip summary edges)
E.g. L: every write (w) is followed by a read (r)
Given a DFA A for L, construct NWA B for Local(L)
States Q, initial state q0, final states F, same as A
di(q,a) = d(q,a)
dc(q,a) = (q0, d(q,a))
dr(q,q’,a) = d(q’,a) if q is in F
Determinization
q->q
q’->q’…
q->u
q’->v…
q->u’
q’->v’…
u’->u’’
v’->v’’…
q->w
q->w’
q’->w’’…
u’->w
u’->w’
v’->w’’…
Goal: Given a nondeterministic automaton A with states Q, construct an
equivalent deterministic automaton B
Intuition: Maintain a set of “summaries” (pairs of states)
State-space of B: 2QxQ
Initially, state contains q->q, for each q
At call, if state u splits into (u’,u’’), summary q->u splits into (q->u’,u’->u’’)
At return, summaries q->u’ and u’->w join to give q->u
A state of B is final if it contains q->q’, where q is initial and q’ is final
Closure Properties
The class of regular languages of nested words is effectively closed
under many operations
Intersection: Take product of automata (key: nesting given by input)
Union: Use nondeterminism
Complementation: Complement final states of deterministic NWA
Projection: Use nondeterminism
Concatenation/Kleene*: Guess the split (as in case of word automata)
Reverse (reversal of a nested word reverses nested edges also)
Decision Problems
 Membership: Is a given nested word w accepted by NWA A?
Solvable in polynomial time
If A is fixed, then in time O(|w|) and space O(nesting depth of w)
 Emptiness: Given NWA A, is its language empty?
Solvable in time O(|A|3): view A as a pushdown automaton
 Universality, Language inclusion, Language equivalence:
Solvable in polynomial-time for deterministic automata
For nondeterministic automata, use determinization and
complementation; causes exponential blow-up, Exptime-complete
problems
MSO-based Characterization
 Monadic Second Order Logic of Nested Words
First order variables: x,y,z; Set variables: X,Y,Z…
Atomic formulas: a(x), X(x), x=y, x < y, x -> y
Logical connectives and quantifiers
 Sample formula:
For all x,y. ( (a(x) and x -> y) implies b(y))
Every call labeled a is matched by a return labeled b
 Thm: A language L of nested words is regular iff it is definable
by an MSO sentence
Robust characterization of regularity as in case of languages of
words and languages of trees
MSO-NWA Equivalence (Proof sketch)
 From deterministic NWA to MSO
Unary predicates and ql and qh for each state q of A
Formula says that these predicates encode a run of A
consistent with its transition function (qh is used to encode
state-labels on nesting edges)
dr requirement can be encoded using nesting-edge predicate ->
 Only existential-second-order prefix suffices
 From MSO to nondeterministic NWA
NWA can check base predicates x=y, x < y, x -> y
Use closure properties: union, complement, and projection
Congruence Based Characterization
Context C: A nested word and a linear edge
Substitution I(C,w): Insert nested word w in a context C
Congruence: Given a language L of nested words, w ~L w’ if for every
context C, I(C,w) is in L iff I(C,w’) is in L
Thm: A language L of nested words is regular iff the congruence ~L
is of finite index.
Relating to Word Languages
a1
a2
a3
a4
a5
a6
a7
a8
a9 a10
a11 a12
a1
a2
a3
a4
a5
a6
a7
a8
a9 a10
a11 a12
Words labeled with a typed alphabet (visibly pushdown words)
Symbols partitioned into calls, returns, and internals
Two views are basically the same giving similar results
Visibly Pushdown Automata
Pushdown automaton that must push while reading a call, must pop
while reading a return, and not update stack on internals
Height of stack determined by input word read so far
Visibly Pushdown Languages
A robust subclass of deterministic context-free languages
VPLs vs DCFLs
Fix S. For each partitioning of S into Sc, Si, Sr, we get a
corresponding class of visibly pushdown languages
Each class is closed under Boolean operations
Decidable equivalence, inclusion problems etc
DCFL
Regular
Dyck
Are these VPLs?
L1 = {an bn | n > 0}, L2 = {bn an | n > 0}, L3 = words with same # of a’s & b’s
Instead of static typing of symbols, one can use dynamic types
determined by an automaton to get more VPL classes[Caucal’06]
Relating to Tree Languages
A binary tree is hiding in a nested word
At calls, left subtree encodes what happens in the called
procedure, and right subtree gives what happens after return
Why not use tree encoding and tree automata ?
Notion of regularity is same in both views
Nesting is encoded, but linear structure is lost
Deterministic tree automata are not expressive
No notion of reading input from left to right
XML literature has lots of attempts to address this deficiency:
Tree walking automata…
Summary Table
Word
Automata
Pushdown
Automata
Tree
Automata
NWA
Union
yes
yes
yes
yes
Intersection
yes
no
yes
yes
Complement
yes
no
yes
yes
Det= Nondet
yes
no
no
yes
Emptiness
Nlogspace
Ptime
Ptime
Ptime
Inclusion
(Nondet)
Pspace
Undec
Exptime
Exptime
Related Work
 Restricted context-free languages
Parantheses languages, Dyck languages
Input-driven languages
 Logical characterization of context-free languages (LST’94)
 Connection between pushdown automata and tree automata
Set of parse trees of a CFG is a regular tree language
Pushdown automata for query processing in XML
 Algorithms for pushdown automata compute summaries
Context-free reachability
Inter-procedural data-flow analysis
 Game semantics for programming languages (Abramsky et al)
 Model checking of pushdown automata
LTL, CTL, m-calculus, pushdown games
LTL with regular valuations of stack contents
CaRet (LTL with calls and returns)
Research Directions
 Visible Pushdown Languages (AM, STOC’04)
Extends to w-regular languages of infinite words
 VPL triggered research
Games (LMS, FSTTCS’04)
Congruences and minimization (AKMV ICALP’05, KMV Concur’06)
Third-order Algol with iteration (MW FoSSaCS’05)
Dynamic logic with recursive programs (LS FoSSaCS’06)
Synchronization of pushdown automata (Caucal DLT’06)
(Bi)Simulation and other equivalences (Srba STACS’06)
Regularity and counter automata (BLS CSL’06)
 Linear-time Temporal Logics
CaRet (Logic of calls and returns) (AEM TACAS’04)
Caution: Not studied in the nested word framework
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Fair Cycle Detection
 Given a set F of nodes, is there an infinite path in the
underlying transition system that contains infinitely many
states with node-component in F?
 Relevant information about a box: for entry e and exit x, is
x reachable from e along a path containing a node in F?
 Complexity same as reachability (Ptime-complete, and in
time O(|A|r2)
Keeping Track of Visits to F
 Let R* be the smallest subset of UmEm x Vm satisfying, for
each module m, each entry e of m,
If R(e,u) and u is in F then R*(e,u)
For each edge u to v of m, if R*(e,u) then R*(e,v)
For each box b, for each call vertex (b,f) and return vertex (b,x),
if R*(e,(b,f)) and R(f,x) then R*(e,(b,x))
if R(e,(b,f)) and R*(f,x) then R*(e,(b,x))
e
u
f
f
v
b
x
x
 Claim: For each m, for each entry node e and vertex u of module
m, R*(e,u) holds iff (e,u) is reachable from (e,e) in underlying
transition system along a path visiting some state with node
component in F
Fair Cycle Detection Algorithm
 Phase 1: Compute the set R
 Phase 2: Compute the set R*
 Define a graph over vertices of A as follows:
Retain all edges of A
Add summary edges from (b,e) to (b,x) if R(e,x) holds, and add this
edge to F if R*(e,x) holds
Add call edges: from call vertex (b,e) to entry e
e
u
f
f
v
x
x
 A has infinite fair path iff the above graph has a cycle that
is reachable from entry-node of m0 and contains either a
node or an edge in F
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Linear-time Propositional Temporal Logic
Q ::-
p | not Q | Q or Q’ | Next Q |
Always Q | Eventually Q | Q Until Q’
Interpreted over (infinite) words.
Models of an LTL formula is a w-regular language.
Useful for stating sequencing properties:
 If req happens, then req holds until it is granted:
Always ( req → (req Until grant) )
 An exception is never raised:
Always ( not Exception )
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
A
B
C
A
………….
Global successor used by LTL
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
A
B
C
D
………….
Global successor used by LTL
Local successor:
Jump from calls to returns along nesting edge
Otherwise global successor at the same level
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
A
B
C
A
………….
Global successor used by LTL
Local successor:
Jump from calls to returns along nesting edge
Otherwise global successor at the same level
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
Local path
A
B
C
A
………….
Global successor used by LTL
Local successor:
Jump from calls to returns along nesting edge
Otherwise global successor at the same level
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
A
B
C
A
………….
Global successor used by LTL
Local successor:
Caller modality:
Jump from calls to returns along nesting edge
Otherwise global successor at the same level
Jump to the caller of the current module
Defined for every position except top-level ones
CARET
CARET: A temporal logic for Calls and Returns
Expresses w-regular properties of infinite nested words
A
B
C
A
Caller path gives the
stack content!
………….
Global successor used by LTL
Abstract successor:
Caller modality:
Jump from calls to returns along nesting edge
Otherwise global successor at the same level
Jump to the caller of the current module
Defined for every position except top-level ones
CARET Definition
Syntax:
Q ::- p | not Q
Next Q
Eventually Q
|
Q or Q’
|
| Always Q |
| Q Until Q’
Local-Next Q
Local-Eventually Q
|
|
Local-Always Q
Q Local-Until Q’
Caller Q
CallerPath-Eventually Q
| CallerPath-Always Q
| Q CallerPath-Until Q’
 Local- and Caller- versions of all temporal operators
 All these operators can be nested
Expressing properties in Caret
Pre-post conditions:
If
P holds when A is called, then Q must hold when
the call returns
Always ( (P and call-to-A)
P
Local-Next Q )
Q
A
Integrating Manna/Pnueli-style reasoning for reactive computations
with Hoare-style reasoning for structured programs
Expressing properties in Caret
If A is called with low priority, then it cannot
access the file
Always ( call-to-A and low-priority
Local-Always ( not access-file ) )
A
low-priority
A
high-priority
access-file
Expressing properties in Caret
Stack inspection properties
If a variable x is accessed, then A must be on
the call stack
Always ( access-to-x
CallerPath-Eventually call-to-A )
A
access-to-x
Model checking CARET
 Given:

A recursive state machine/
nested word automaton M
A CARET formula Q
Model-checking:
Do all infinite nested words of M satisfy
specification Q?
CARET can be model-checked in time that is
polynomial in M and exponential in Q.
|M|3 . 2O(|Q|)
Complexity class same as that for LTL !
Generalization of Vardi-Wolper construction
Model-checking CARET: Intuition

Given M and formula Q,
 Build a product nested word automaton that accepts
words exhibited by M that satisfy (not Q)
 M does not satisfy Q if this product has a fair cycle
 Construction builds on the classical tableaux for LTL
 (State of M, set of subformulas of Q)
Local-Next Q1
s
Q1
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Nested Trees
Tree edges + Nesting edges
Unranked (arity not fixed)
Unordered
Infinite
Given pushdown automaton/ RSM) A, model it by a nested tree TA
Each path models an execution as a nested word
Branching-time model checking: Specification is a language
of nested trees, verification is membership
Tree Automata Definitions
Transition function of a tree automaton d : Q x S -> D
D depends on type of automaton and type of trees
 Nondeterministic over binary trees: D is a set of pairs; A choice (u,v)
means send u to left child and v to right child
 Nondeterministic over ordered trees: D is a regular language over Q:
the sequence of states sent along children must be in D
 Nondeterministic over unordered unranked trees: D is a set of terms in
2Q x Q; A choice ({q1,q2}, q3) means that send q1 to one child, q2 to a
different child, and q3 to all remaining children
 Alternating over unordered unranked trees: D contains formulas that
positive Boolean combination of terms of the form <q>, [q]; A formula (<q1>
or <q2>) and [q3] means send q3 to all children, and either q1 or q2 to one of
them
Nondeterministic Nested Tree Automata
 Finitely many states Q, initial states
 Run of the automaton: Labeling of edges with states consistent with
initial set and transition function
 Local transitions: di(q,a) is a set of terms in 2Q x Q
Call transitions: dc(q,a) is a set of terms in 2QxQ x Q x Q;
({(q1,q2)},q3,q4) means send q1 to one child, q2 along corresponding
nesting edges, q3 to remaining children, and q4 along all remaining
nesting edges
Return transitions: dr(q,q’,a) is set of terms in 2QxQ, here q is the
state along tree edge, and q’ is the state along nesting edge
 Acceptance condition: Final states, Buchi, Parity (NPNTA)
Properties of NPNTAs
 Thm: Closed under union and projection.
 Thm: Closed under intersection.
Proof idea: Finite-state; just take product.
 Thm: Not closed under complement.
 Thm: Emptiness checkable in EXPTIME.
Proof idea: Special case of emptiness of pushdown tree
automata.
Extension: alternation.
 Thm: Model-checking on pushdown
systems
in EXPTIME.
Extra
expressive
power,
Proof idea: The stack of the input pushdown system is
unlike in the case of tree
synchronized with the implicit stack of the NPNTA, so a product
automata
construction works.
 Thm: Universality undecidable.
Alternating Nested Tree Automata
 Transition Terms (TT): Positive Boolean combination of atomic terms
of the form <q> (send q to some child), [q] (send q to all children)
 CTT: Positive boolean combination of terms of the form
<q,q’> (send q to some child and q’ to all corresponding nesting edges)
[q,q’] (q on all tree edges, q’ on all nesting edges)
Transition function has call, return and internal components:
di : Q x S -> TT, dc : Q x S -> CTT, dr : Q x Q x S -> TT
 Run of the automaton: game between the automaton and an adversary.
 Winning condition: Parity
 Tree accepted iff automaton has a winning strategy
Properties of APNTAs
 Thm: Closed under union, intersection.
 Thm: Closed under complement.
Proof idea: Parity games are determined, and designing the
complement game is easy.
 Thm: Not closed under projection.
 Thm: Can express some non-context-free tree languages.
 Theorem: Model-checking EXPTIME-complete.
Proof idea: Stack of the input pushdown system is synchronized
with the implicit stack of the NTA, so the problem can be reduced
to a pushdown game, solvable in EXPTIME. Next…
Logics on nested trees
 Thm: Emptiness, universality undecidable.
Course Outline
Model: Recursive/Hierarchical State Machines
Reachability Algorithms
Theory of Nested Words
Fair Cycle Detection Algorithms
Linear Temporal Logic CaRet
Theory of Nested Trees
Fixpoint Logic NT-mu
Logics for Trees
mu-calculus
Canonical temporal logic
Fixpoints over sets of states
Suitable for symbolic implementation
Equivalent to bisimulation-closed alternating tree automata
Decidable model-checking on pushdown systems
LTL
CTL
Mu-Calculus
Assembly language of temporal logics
Formulas  Sets of nodes
Least and greatest fixpoints of f
<call>f, <ret>f, <loc>f : there is an edge to call/ret/local node satisfying f
Fixpoints in mu-calculus
Model-checking mu-calculus on pushdown systems is decidable. But…
Reachability in mu-calculus:
Formula describes a terminating symbolic computation for finitestate systems.
Application: mu-calculus is the “assembly language” in temporal
logic model-checkers like NuSMV.
What about pushdown models (interprocedural analysis)?
Algorithms use “summarization”, and not captured by mu-calculus
Summary Subtrees
Summary
call
s
s
local
v
v
ret
u
u
ret
p
Matching returns
of s = {u,v}
Nesting edges let us chop a nested tree into
subtrees that summarize contexts. We could jump
across contexts if we could reason about
concatenation.
Logic of Subtrees
s
<>f
Mu-calculus formulas can be interpreted
at subtrees rather than nodes
u
Formula  set of subtrees
f
Modalities argue about full subtrees
rooted at children
Why not a fixpoint calculus where:
Formulas  sets of summary trees
and modalities for concatenation?
Proposal: NT-mu.
Operations on Summaries
Formulas sets of summaries
s
call
s
local f
ret
local
u
f
Colored Summary Trees
Number of “leaves” is unbounded
Solution: assign leaves k colors
Colors are defined by formulas
(obligations upon return)
Within f, we use the propositions
R1, R2, … Rk to refer to the
colors of return leaves
Mu-calculus vs NT-mu
mu-calculus: fixpoints over full subtrees
NT-mu: fixpoints over summary trees
Semantics of NT-mu
 k-colored summary tree specified by (s,U1, … Uk), where s is a
tree node, and each Ui is a subset of matching returns of s
 Meaning of each formula f of NT-mu is a set of summaries
(s, U1, … Uk) |= p if label of s satisfies p
Meaning of Boolean connectives is standard
(s, U1, … Uk) |= <loc> f if s has an internal-child t s.t. (t,U1, … Uk) |= f
(s, U1, … Uk) |= <ret>Ri if s has a return-child t s.t. t is in Ui
(s, U1, … Uk) |= <call>f(g1,…gm) if s has a call-child t s.t. (t,V1… Vm) |=f
where Vj contains all matching returns w of t s.t. (w,U1, … Uk) |= gj
Formulas define monotonic functions from summary sets to summary
sets; fixpoint semantics is standard
 A nested tree T with root r satisfies f if ( r ) |= f
Examples
 There exists a return colored 1: summaries (s,U) s.t. U is nonempty
f : m X. ( <ret> R1 or <loc> X or <call> X {X} )
 p is reachable : EF p
m X. ( p or <loc> X or <call> X {} or <call> f {X})
 Local reachability: p is reachable within the same procedural
context
m X. (p or <loc> X or <call> f {X}
Specifying Requirements
 Branching-time properties that mix local and global paths
 Inter-procedural data-flow analysis
Set of program points where expression e is very busy (along every
path e is used before a variable in e gets modified)
If e contains local variables, this is not definable in mu-calculus
 Stack inspection, access control, stack overflow
 Pre-post conditions (universal as well as branching)
Model Checking
 Given an RSM A and NT-mu formula f, does the nested tree
TA satisfy f ?
 Consider a point a in a component with exits u and v
A sample state of A is of the form s.a, where s is a stack of boxes
State at any matching return of s.a is either s.u or s.v
 Claim 1: NT-mu is a tree logic, so even though s.a may appear
at multiple places in TA, it satisfies the same formulas
 Claim 2: NT-mu formulas are evaluated over summary trees
(cannot access nodes beyond matching returns), satisfaction
of formula at s.a does not depend on the context s
Bisimulation Closure
A summary (s,U1,…Uk) is bismulationclosed if two matching returns w and w’
are bisimilar, then w in Ui iff w’ in Ui
u
a
v
Claim: During fixpoint evaluation, it
suffices to consider only bisimulationclosed summaries
s.a
Closing each color under bisimulation
does not change the truth of formulas
Return nodes corresponding to the
same exit are bisimilar
Corollory: Bisimulation-closed
summaries have finite representation
(colors for each exit)
s.u
s.v
s.u
s.v
Model Checking
 Model checking procedure:
Consider RSM-summaries of the form (s,U1,..Uk), where s is a vertex in a
component, and Ui is a subset of exit points
Finitely many RSM summaries
Evaluate NT-mu formula using standard fixpoint computation
 Model checking RSMs wrt NT-mu is Exptime-complete
Same complexity as CTL or mu-calculus model checking
 Recall reachability in NT-mu
f: m X. ( <ret> R1 or <loc> X or <call> X {X} )
EF p : m X. ( p or <loc> X or <call> X {} or <call> f {X} )
Local-reach: m X. (p or <loc> X or <call> f {X})
 Evaluation of these over RSM-summaries is the standard way of
solving reachability
Evaluating f corresponds to pre-computing summaries
Global/local reachability are computationally similar
Expressiveness
Thm: NT-mu and APNTA are equally expressive
Corollary: NT-mu can capture every property that the mucalculus can.
Corollary: CARET (a linear temporal logic of calls and
returns, AEM’04) is contained in NT-mu.
Corollary: Satisfiability of NT-mu is undecidable.
NT-mu can express pushdown games
Thm: Expressiveness increases with the number of colors
From NT-mu to APNTA (Proof sketch)
 Given an NT-mu formula f, construct equivalent APNTA A
 States of A are subformulas of f
 Simplify(f,a), where a is an assignment to atomic props
Unroll any top-level fixpoint of f
Replace each top-level proposition by its T/F value according to a
Simplify(f,a) is a positive Boolean comb of terms like <>g and []g
 di(f,a) = Simplify(f,a)
 dc(f{g1,…gk},a) = (Simplify(f,a), (g1,…gk))
Evaluate f at call node and send (g1,..gk) along nesting edge
 dr(Ri, (g1,…gk),a) = Simplify(gi,a)
Retrive i-th return obligation from nesting edge, and evaluate it
 Fixpoints handled using parity condition
From APNTA to NT-mu (Proof sketch)
 Given alternating NTA A with Q = {1..n}, accepting by final
state, construct a set of least fixpoint equations
 Number of colors (return parameters): n
 For each pair of states, a variable Xij
 Intended meaning: A summary (s,
U1,…Un) is in Xij iff A has a strategy
starting at s in state i, with state j
along all nested edges to return, to
end up in a matching return s’ in Uk
in state k
 Write equations among Xij variables
so that the lfp captures the
intended meaning
Game starts
here in state i
s
in color k
if game get
here in state k
state j
MSO Logic for Nested Trees
Monadic Second Order Logic of Nested Trees
First order variables: x,y,z; Set variables: X,Y,Z…
Atomic formulas: a(x), X(x), x=y, x ->y, x -> y
Logical connectives and quantifiers
Thm: Model-checking even the bisimulation-closed fragment
of MSO is undecidable.
Thm: More expressive than NPNTAs.
Thm: Can encode a property not expressible by APNTAs.
Conjecture: Expressiveness of MSO and APNTAs
incomparable.
Recap
 Basic analysis technique for hierarchical/recursive
programs: and-or reachability
 Executions of programs should expose hierarchical
structure
 Nested words: Model for linear+hierarchical data
 Nested word automata for defining regular languages of
nested words
 New Opportunities
More expressive specifications for software analysis tools (model
checking, testing, theorem proving, monitoring…)
Query processing for XML data
Using NWA instead of pushdown automata as well as tree automata
(over finite ordered trees) can lead to simplification of
constructions and/or new insights
Nested tree automata: new frontier of decidability
Papers
 Analysis of recursive state machines (with Etessami,
Yannakakis; CAV’01)
 Adding nesting structure to words (with Madhusudan;
DLT’06; precursor: Visibly pushdown languages, STOC’04)
 Temporal logic of calls and returns (with Etessami and
Madhusudan; TACAS’04)
 Fixpoint logic for local and global flows (with Chaudhuri
and Madhusudan, POPL’06)
 Languages of nested trees (with Chaudhuri and
Madhusudan; CAV’06)
Available on my webpage
Research in Software Model Checking
 Convergence of program analysis and model checking
 Efficient ways for property-guided abstraction
 Symbolic computation of function summaries using SAT
solvers and/or OBDDs
 Avoid abstraction; encode state-space directly (after
slicing and range analysis)
 Automated compositional reasoning using computational
learning of interface assumptions
 How to deal with features of C ?
 What are appropriate application domains?