슬라이드 1

Download Report

Transcript 슬라이드 1

Shape & Alias Analyses
Jaehwang Kim and Jaeho Shin
Programming Research Laboratory
Seoul National University
2005-08-08
Overview
•
•
•
•
•
•
•
K-limited Graph
Graph Type
Shape Type
Memory Type
Shape Grammar
May-Alias Analysis with Access Paths
Shape Analysis via 3-valued Logic
k-limited Graph
[Jones&Muchnick79]
• Parts of AST
• Motivation
– Optimization
• Storage allocation
• Garbage collection
– Memory error detection
Store = Set of Graphs
Semantics of assignments
X
Z
hd
tl hd
a
Y
hd
b
tl
tl
c
 2 Store
node: Var -> Store -> Node
AS[[ X := atom]] 
=
add a new leaf node labeled a;
move X to the new node
AS[[ X := Y ]] 
= move X to (node Y )
AS[[ X := cons(Y,Z)]]  =
make a new node n and move X to label n;
add hd, tl edge to (node Y ) and
(node Z ) respectively;
Share = Set of k-limited
Graphs
X
Z
?c
W ?
?s
Y
k-limited graph
• All nodes are accessible by k or less steps
from at least one variable
• Each node may be labeled by one or more
variables
• unknown nodes labeled ?
– node represents a set of nodes whose
internal structure is not known
– ?s: node may contains sharings
– ?c: node may contains cycles
• Edges from unknown nodes are represented
by --->
Finite variables, finite size, finite selectors
=> Share is finite!
AS[[assign]] 2 Share -> Share
[[s]] D = [  2 D clean(next[[s]] )
–
next[[s]]:
•
•
–
creates a node or edges
moves a variable to other nodes
clean(): make graphs k-limited
1.
U() = subgraph inaccessible from any variables by
path of length k-1 or less
2. remove inaccessible nodes
3. partition U() into SCCs, coalesce each SCC unknown
node labeled ?c
4. partition U() into CC, coalesce each CC unknown node
labeled ?c, ?s, or ?
Example:
clean for 2-limited graph
X
Y
Y
X
remove garbage
?s
?s
Z
X
?c
Z
coalesce cycles
Y
X
Y
coalesce CC
?s
?c
Z
?s
Z
Graph Types
[Klarlund&Schwartzbach93]
Recursive data type is useful for treeshaped value
L ! nonempty(head:Int, tail:L)
| empty()
1
2
3
Recursive data type can’t represent graph shaped value!
Routing expression
Graph type = Recursive data type + Routing expression
– T ! v(…, a:T[R], …)
– R = {a, ", "a, ^, $}*
Examples
– List with head
H ! (first:L, last:L[ first ( tail)*$ "])
L ! (head:Int, tail:L)
| ()
– Cyclic lilst
C ! (next:C)
| (next:C["*^])
Routing Expression (cont)
• Well-formedness
– Every routing expression always defines a
unique destination
– Checked by the monadic second-order logic
• Evaluation
– Linear-time evaluation of routing
expression by finite automata
– Runtime evaluation cost is major weakness
of graph type
Shape Type
[Fradet&Metayer97]
Context free graph grammar H = <NT,T,PR,O>
NT
: ranked non-terminal
T
: ranked terminal
PR
: productions l ! r s.t. l2 NT, r 2 2Term
Term
: built from NT, T and set of variables
Terminal : T and set of variables
O
: origin or derivation
Shape(H) = {M| M !*{O} and M 2 Terminal}
X + ( r) !* X + ( l)
iff
l!r 2 PR and (Var( r) – Var( l)) Å Var(X) = ;
 : variable substitution
Example
Doubly -> p x, pred x x, L x
Lx
-> next x y, pred y x, L y
| next x x
pred
next
a1
p
next
a2
pred
next
a3
pred
Shape-C
Transformer P = (C => A)
shape int cir {
pt x, L x x;
L x y = L x z, L z y;
L x y = next x y;
};
…
cir s = [|=> pt x;
next x x;
$x=1; |];
…
s:[| pt x; next x y; =>
pt x; next x z; next z y;
$z=i; |]
If Check(C,A,PR,O), then
for all X and ,
X + ( C) !* {O} => X + ( A) !* {O}
pt x; next x z; next z y
pt x; L x z; L z y
pt x; L x y
Memory Type[LeYaYi03]
Goal: Separating reusable heap cells from
others
fun insert i l =
case l of
[]
=> i::[]
| h::t =>
if i<h then
i::l
else
let z = insert i t in
h::z
fun insert b i l =
case l of
[]
=> i::[]
| h::t =>
if i<h then
i::l
else
let z = insert b i t in
free l when b;
h::z
Multiset Formula
A term for an abstract multiset of
locations: L 2 Locations ! {0,1,1}
Memory Type
• For trees,
root
left
right
• For functions which have tree!tree type,
input
new result usage
Estimating Memory Usage
Memory-Usage Analysis
fun copyleft a =
A.left A.right
case a of
Leaf
=> Leaf
X(1)
| Node(al,ar) =>
let r = copyleft al
(1)
in
(2)
Node(r,ar)
copyleft:
R
A
X(2)
result usage
Result:
Usage:
A.root
Constructing Dynamic Flags
1/2
Constructing Dynamic Flags
2/2
Find condition
to free “
preservation constraints:
” under two
Final copyleft
fun copyleft [b,bns] a =
case a of
Leaf
=> Leaf
| Node(al,ar) =>
let r = copyleft [b^bns,bns] al
in (free a when b;
Node(r,ar))
b:
whether it is safe to
free the input tree
excluding the result
bns: whether the input
tree has no shared
sub-parts
Shape Grammar[LeYaYi05]
Language
Abstract Domain & Semantics
Correctness:
is provable by the separation logic
Normalization
rmjunk
x
fold
y
a
nil
x
a
nil
y
unify
a
x
b

d
c
b
e
b
f
nil
b

x
c

a

 ::= <,>
simplyfy
a
x
x
b

a’
nil
b
a
x
c

 ::= R() [ R(b)
 ::= R(b) [ {nil}
 ::= nil | <b,b> | <,>
b ::= <,>
{b/}
 ::= <nil,nil>
=
 ::= nil | <b,b>
b ::= <b,b> | <nil,nil> {b /nil} [ {b ::= nil}
=  ::= nil | <b,b> {/b} = { ::= nil | <,>}
b ::= <b,b> | nil
Cyclic Structure
Non-terminals have parameters
Ex. Doubly linked list
dll ::= nil | <arg, dll(self)>
c(a)
c
x
a
c ::= <arg>
c
b
x
a
b
May-alias Analysis for
Pointers: Beyond klimiting
A. Deutsch
PLDI’94
Symbolic Access Path
• String of the form e1.e2. … .en, ei is either:
– a selector: name of a variable or structure field,
dereference operator
– an expression of form Bk where B is the basis of
some recursive type
• e.g.
– X(tl)ihd
– T{left,right}jkey
• Note: xf for a shorthand of (*x).f
Symbolic Alias Pairs
• (<f1, f2>, K)
– Symbolic access paths f1 and f2 are aliases
where equations K of numeric domain
satisfy
– e.g.
(<X(tl)k1hd, Y(tl)k2hd>, {k1=k2})
Program Semantics
• Every operation a statement applies to
an access path is also applied to all
aliases
• e.g.
But This Approach
• Might be effective for finding aliases
• But not appropriate for memory leak
detection
• Since unreachable/leaking memory cells
(i.e. cells that have no access paths)
cannot be expressed, hence ignored
Shape Analysis
via 3-valued Logic
M. Sagiv, T. Reps, R. Wilhelm
POPL’99
Representing Stores
• Using predicates
– x(u): Is u pointed by x?
– n(u,u’): Is u’ pointed by field n of u?
– sm(u): Is u summarizing multiple cells?
3-valued Logic
• Definite/indefinite values
– x(u) = 1 means x points u
– x(u) = 0 means x does not point u
– x(u) = ½ means x may or may not point u
• First order formulae with transitive
closure are used
½
• Unbiased analysis
is possible
1
0
3-valued Structures
• 3-valued structure <U, i> ∈ 3STRUCT[P]
– P: Some given predicates (vocabularies)
– U: universe of cells (individuals)
– i: mapping 3-value for predicates over cells
e.g. i(x)(u) = ½ is the value for x(u)
• Ordered by existence of embedding
– f embeds S=<US,iS> in S’=<US’,iS’> if
• iS(p)(u1, …, uk) ≤ iS’(p)(u1, …, uk)
• (|{u | f(u) = u’}| > 1) ≤ iS’(sm)(u’)
Abstraction
• Renaming cells to their canonical names
– partitions cells by their properties
– bounds the size of a structure finite
u1
y
…
x
un
un+
1
t_embedc
x
u{x},{y}
y
u{y},{x}
Abstract Semantics
• Semantics of each statement
1/2
– is given by predicate update formulae
which defines the new value of each
predicate after running the statement
• e.g. st: x = t->n
Abstract Semantics
2/2
• 3-STRUCT[P]  3-STRUCT[P]
transformer associated with st
– Especially, x = malloc() changes the universe
Shape Analysis Algorithm
• Computes the least fixpoint of set of
structures
– For each node v in control flow graph G
Precision Improvements
• Decompose transformer for st into 3
steps
1. Focus
2. Transform (the original transformer)
3. Coerce
Focus
• For given formulae, Focus returns set of
structures that each makes the
formulae evaluate to a definite value
• Abstract interpretation continues
separately with each case
• Improves precision
Coerce
• Compatible structures
– Remove impossible structures
– Recover indefinite values to definite
– e.g. single variable cannot point two cells
x(u) = 1 and x(u’) = ½ then x(u’) = 0
• Improves precision
Partially Isomorphism
Heap Abstraction
• R. Manevich et al., SAS2004
• Merges structures with same universes
• Hence, reduces the size of sets used in
fixpoint computation
• Improves analysis speed 3~100 times
• Reduces memory usage up to 1/200
What we can learn
• Sharing is not a simple problem
• We might need relation rather than
function for points-to information
• …