Transcript 슬라이드 1
Shape & Alias Analyses
Jaehwang Kim and Jaeho Shin
Programming Research Laboratory
Seoul National University
2005-08-08
Overview
•
•
•
•
•
•
•
K-limited Graph
Graph Type
Shape Type
Memory Type
Shape Grammar
May-Alias Analysis with Access Paths
Shape Analysis via 3-valued Logic
k-limited Graph
[Jones&Muchnick79]
• Parts of AST
• Motivation
– Optimization
• Storage allocation
• Garbage collection
– Memory error detection
Store = Set of Graphs
Semantics of assignments
X
Z
hd
tl hd
a
Y
hd
b
tl
tl
c
2 Store
node: Var -> Store -> Node
AS[[ X := atom]]
=
add a new leaf node labeled a;
move X to the new node
AS[[ X := Y ]]
= move X to (node Y )
AS[[ X := cons(Y,Z)]] =
make a new node n and move X to label n;
add hd, tl edge to (node Y ) and
(node Z ) respectively;
Share = Set of k-limited
Graphs
X
Z
?c
W ?
?s
Y
k-limited graph
• All nodes are accessible by k or less steps
from at least one variable
• Each node may be labeled by one or more
variables
• unknown nodes labeled ?
– node represents a set of nodes whose
internal structure is not known
– ?s: node may contains sharings
– ?c: node may contains cycles
• Edges from unknown nodes are represented
by --->
Finite variables, finite size, finite selectors
=> Share is finite!
AS[[assign]] 2 Share -> Share
[[s]] D = [ 2 D clean(next[[s]] )
–
next[[s]]:
•
•
–
creates a node or edges
moves a variable to other nodes
clean(): make graphs k-limited
1.
U() = subgraph inaccessible from any variables by
path of length k-1 or less
2. remove inaccessible nodes
3. partition U() into SCCs, coalesce each SCC unknown
node labeled ?c
4. partition U() into CC, coalesce each CC unknown node
labeled ?c, ?s, or ?
Example:
clean for 2-limited graph
X
Y
Y
X
remove garbage
?s
?s
Z
X
?c
Z
coalesce cycles
Y
X
Y
coalesce CC
?s
?c
Z
?s
Z
Graph Types
[Klarlund&Schwartzbach93]
Recursive data type is useful for treeshaped value
L ! nonempty(head:Int, tail:L)
| empty()
1
2
3
Recursive data type can’t represent graph shaped value!
Routing expression
Graph type = Recursive data type + Routing expression
– T ! v(…, a:T[R], …)
– R = {a, ", "a, ^, $}*
Examples
– List with head
H ! (first:L, last:L[ first ( tail)*$ "])
L ! (head:Int, tail:L)
| ()
– Cyclic lilst
C ! (next:C)
| (next:C["*^])
Routing Expression (cont)
• Well-formedness
– Every routing expression always defines a
unique destination
– Checked by the monadic second-order logic
• Evaluation
– Linear-time evaluation of routing
expression by finite automata
– Runtime evaluation cost is major weakness
of graph type
Shape Type
[Fradet&Metayer97]
Context free graph grammar H = <NT,T,PR,O>
NT
: ranked non-terminal
T
: ranked terminal
PR
: productions l ! r s.t. l2 NT, r 2 2Term
Term
: built from NT, T and set of variables
Terminal : T and set of variables
O
: origin or derivation
Shape(H) = {M| M !*{O} and M 2 Terminal}
X + ( r) !* X + ( l)
iff
l!r 2 PR and (Var( r) – Var( l)) Å Var(X) = ;
: variable substitution
Example
Doubly -> p x, pred x x, L x
Lx
-> next x y, pred y x, L y
| next x x
pred
next
a1
p
next
a2
pred
next
a3
pred
Shape-C
Transformer P = (C => A)
shape int cir {
pt x, L x x;
L x y = L x z, L z y;
L x y = next x y;
};
…
cir s = [|=> pt x;
next x x;
$x=1; |];
…
s:[| pt x; next x y; =>
pt x; next x z; next z y;
$z=i; |]
If Check(C,A,PR,O), then
for all X and ,
X + ( C) !* {O} => X + ( A) !* {O}
pt x; next x z; next z y
pt x; L x z; L z y
pt x; L x y
Memory Type[LeYaYi03]
Goal: Separating reusable heap cells from
others
fun insert i l =
case l of
[]
=> i::[]
| h::t =>
if i<h then
i::l
else
let z = insert i t in
h::z
fun insert b i l =
case l of
[]
=> i::[]
| h::t =>
if i<h then
i::l
else
let z = insert b i t in
free l when b;
h::z
Multiset Formula
A term for an abstract multiset of
locations: L 2 Locations ! {0,1,1}
Memory Type
• For trees,
root
left
right
• For functions which have tree!tree type,
input
new result usage
Estimating Memory Usage
Memory-Usage Analysis
fun copyleft a =
A.left A.right
case a of
Leaf
=> Leaf
X(1)
| Node(al,ar) =>
let r = copyleft al
(1)
in
(2)
Node(r,ar)
copyleft:
R
A
X(2)
result usage
Result:
Usage:
A.root
Constructing Dynamic Flags
1/2
Constructing Dynamic Flags
2/2
Find condition
to free “
preservation constraints:
” under two
Final copyleft
fun copyleft [b,bns] a =
case a of
Leaf
=> Leaf
| Node(al,ar) =>
let r = copyleft [b^bns,bns] al
in (free a when b;
Node(r,ar))
b:
whether it is safe to
free the input tree
excluding the result
bns: whether the input
tree has no shared
sub-parts
Shape Grammar[LeYaYi05]
Language
Abstract Domain & Semantics
Correctness:
is provable by the separation logic
Normalization
rmjunk
x
fold
y
a
nil
x
a
nil
y
unify
a
x
b
d
c
b
e
b
f
nil
b
x
c
a
::= <,>
simplyfy
a
x
x
b
a’
nil
b
a
x
c
::= R() [ R(b)
::= R(b) [ {nil}
::= nil | <b,b> | <,>
b ::= <,>
{b/}
::= <nil,nil>
=
::= nil | <b,b>
b ::= <b,b> | <nil,nil> {b /nil} [ {b ::= nil}
= ::= nil | <b,b> {/b} = { ::= nil | <,>}
b ::= <b,b> | nil
Cyclic Structure
Non-terminals have parameters
Ex. Doubly linked list
dll ::= nil | <arg, dll(self)>
c(a)
c
x
a
c ::= <arg>
c
b
x
a
b
May-alias Analysis for
Pointers: Beyond klimiting
A. Deutsch
PLDI’94
Symbolic Access Path
• String of the form e1.e2. … .en, ei is either:
– a selector: name of a variable or structure field,
dereference operator
– an expression of form Bk where B is the basis of
some recursive type
• e.g.
– X(tl)ihd
– T{left,right}jkey
• Note: xf for a shorthand of (*x).f
Symbolic Alias Pairs
• (<f1, f2>, K)
– Symbolic access paths f1 and f2 are aliases
where equations K of numeric domain
satisfy
– e.g.
(<X(tl)k1hd, Y(tl)k2hd>, {k1=k2})
Program Semantics
• Every operation a statement applies to
an access path is also applied to all
aliases
• e.g.
But This Approach
• Might be effective for finding aliases
• But not appropriate for memory leak
detection
• Since unreachable/leaking memory cells
(i.e. cells that have no access paths)
cannot be expressed, hence ignored
Shape Analysis
via 3-valued Logic
M. Sagiv, T. Reps, R. Wilhelm
POPL’99
Representing Stores
• Using predicates
– x(u): Is u pointed by x?
– n(u,u’): Is u’ pointed by field n of u?
– sm(u): Is u summarizing multiple cells?
3-valued Logic
• Definite/indefinite values
– x(u) = 1 means x points u
– x(u) = 0 means x does not point u
– x(u) = ½ means x may or may not point u
• First order formulae with transitive
closure are used
½
• Unbiased analysis
is possible
1
0
3-valued Structures
• 3-valued structure <U, i> ∈ 3STRUCT[P]
– P: Some given predicates (vocabularies)
– U: universe of cells (individuals)
– i: mapping 3-value for predicates over cells
e.g. i(x)(u) = ½ is the value for x(u)
• Ordered by existence of embedding
– f embeds S=<US,iS> in S’=<US’,iS’> if
• iS(p)(u1, …, uk) ≤ iS’(p)(u1, …, uk)
• (|{u | f(u) = u’}| > 1) ≤ iS’(sm)(u’)
Abstraction
• Renaming cells to their canonical names
– partitions cells by their properties
– bounds the size of a structure finite
u1
y
…
x
un
un+
1
t_embedc
x
u{x},{y}
y
u{y},{x}
Abstract Semantics
• Semantics of each statement
1/2
– is given by predicate update formulae
which defines the new value of each
predicate after running the statement
• e.g. st: x = t->n
Abstract Semantics
2/2
• 3-STRUCT[P] 3-STRUCT[P]
transformer associated with st
– Especially, x = malloc() changes the universe
Shape Analysis Algorithm
• Computes the least fixpoint of set of
structures
– For each node v in control flow graph G
Precision Improvements
• Decompose transformer for st into 3
steps
1. Focus
2. Transform (the original transformer)
3. Coerce
Focus
• For given formulae, Focus returns set of
structures that each makes the
formulae evaluate to a definite value
• Abstract interpretation continues
separately with each case
• Improves precision
Coerce
• Compatible structures
– Remove impossible structures
– Recover indefinite values to definite
– e.g. single variable cannot point two cells
x(u) = 1 and x(u’) = ½ then x(u’) = 0
• Improves precision
Partially Isomorphism
Heap Abstraction
• R. Manevich et al., SAS2004
• Merges structures with same universes
• Hence, reduces the size of sets used in
fixpoint computation
• Improves analysis speed 3~100 times
• Reduces memory usage up to 1/200
What we can learn
• Sharing is not a simple problem
• We might need relation rather than
function for points-to information
• …