Garbage collection

Download Report

Transcript Garbage collection

Garbage Collection
Compiler
Baojian Hua
[email protected]
Heap Allocation

Unlike stack-allocated values, heapallocated value don’t obey a stackbased discipline
// C
struct A{…};
struct A *p =
malloc(sizeof(*p));
// Java
class A {…}
A p = new A ();
// F#
type t
= A of int
| B of char
let x = A 5
Reclaiming heap storage

If we do NOT reclaim heap storage after it is
used, then



memory leakings
eventually the programs will (needlessly) exhaust
memory
Solutions:

explicit deallocations (e.g., “free”)


manually, highly error-prone, sources of (nearly) all evils
garbage collection

automatic, transparent, popular today
Garbage
class Tree {int data;
Tree left;
Tree right}
void main ()
{
Tree x, y;
x
y
x
x
}
=
=
=
=
new
new
new
new
Tree
Tree
Tree
Tree
x
y
();
();
();
();
Garbage
class Tree {int data;
Tree left;
Tree right}
void main ()
{
Tree x, y;
x
y
x
x
}
=
=
=
=
new
new
new
new
Tree
Tree
Tree
Tree
x
y
();
();
();
();
Garbage

Two kinds of garbage:

syntactic garbage:


semantics garbage:


heap values that are NOT reachable from any declared
variables (roots)
heap-allocated value is reachable from some root, but
this value won’t be used any more in the future
Theory shows that the latter is not decidable,
so current research concentrate on the first
kind
Reference Counting
Reference counting

Basic idea:




add an extra header (the “reference count”)
to every heap-allocated data structure
when first allocated: count = 1
whenever a new reference to the data
structure is established: count++
whenever a reference disappears: count-
if count==0, reclaim it
e.g.
class Tree {…}
Tree x = new Tree ();
1/ 2
…
null
Tree y = x;
Tree z = new Tree ();
x.right = z;
x = new Tree ();
1/ 2
…
null
e.g.
class Tree {…}
Tree x = new Tree ();
1/ 2
…
null
Tree y = x;
Tree z = new Tree ();
x.right = z;
1/ 2
…
null
x = new Tree ();
1
…
null
Moral

RC is simple, and is used in many other areas:


e.g., hard disks or page tables, etc.
But RC suffers from several serious problems:

space usage:


inefficient:



one word per object? (depends on what?)
increment and decrement on every object assignment
tightly-coupled with applications (mutator)
ineffective

for circular data structures
Problem: tight-coupling
class Tree {…}
Tree x = new Tree ();
x.rc = 1;
Bad effect on mutator:
1. code size increases;
Tree y = x;
x.rc++;
Tree z = new Tree ();
z.rc = 1;
x.right = z;
z.rc++;
x.rc--;
x = new Tree ();
x.rc = 1;
2. code slows down (up
to 50%);
3. GC correctness
depends on mutator
implementation; and
4. specific GC.
Problem: Circular DS

RC doesn’t work!
1
1
…
…
1
…
Mark and Sweep
Mark and Sweep

Basic idea:

heap records form graphs!


add an extra header (the “mark bit”) to every
heap-allocated record



with roots in stacks, globals and registers
marked: bit == 1; unmarked: bit==0
mark: starting from roots, traverse the graph,
mark reached nodes
sweep: scan all nodes, sweep unmarked ones
Basic Idea
root
The basic idea is to do
a traversal (BFS or
DFS) of the graph, and
encode traversal
information in the node
itself
It’s a bad software
engineering practice to
encode such
information directly in
data structures, but it’s
OK here, why?
Finding roots?

When a collection starts, which stack
slots and registers contain heappointers?


Not easy
Essentially, finding roots involves
deciding whether or not a machine
word is a pointer

need help from compilers (if any)
Many approaches

Mask every word value:

e.g., a word’s least significant bit == 0 means a
pointer, ==1 means an integer


Bit-vector masks:



so must change the representation of integers
compiler generates a bit-string at every GC site,
for registers and stack slots
you do this in lab4
Allocate roots (live across function calls) in
separate stacks, but not registers

even simpler
Traversal

How to traversal the pointer graph?


depth-first or breadth-first search
But must do with care:

the traversal itself takes space


e.g., the DFS stack is of size O(N)
some strategies:

e.g., explicit stack or pointer reversal, read
Tiger book
Explicit Stack
// x is an unmarked root
stack = [];
DFS (x)
push(stack, x);
while (stack not empty)
y = pop (stack);
mark (y);
for(pointer field y.f in y and y.f not marked)
push (stack, y.f)
// Key observation: explicit stack is tight than
// a call-stack!
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Pointer reversal example
root
Copying Collection
Copying collection

Basic idea:

divide the heap into two sub-heaps: from
and to



initially, allocations happen in from
when from is exhausted, traverse the from
space, and copy all marked nodes to to
when done, flip, change the role of from
and to
from
roots
to
E.g.
A
E
roots
C
B
D
E.g.
A
A
E
roots
C
B
D
C
D
Pointer Forwarding

During copying, objects in “to” space
may have wrong pointers


why?
Solution: keep a forwarding pointer in each
object in “from” space



update this when copying really happens
look up this when forwarding finished
Copying is just a BFS or DFS

e.g., Cheney’s algorithm
E.g.
start limit
roots
A
A
C
E
B
D
E.g.
limitlimit
startstart
roots
A
A
C
E
B
D
C
Moral

Simple data structures:



Fast allocation


compaction comes free after copying!
Space overhead


allocation == a pointer bumping
No fragmentation


linear allocation
only forwarding pointers in headers
half space wasted!
Slow, if ratio of live objects is high

why?
Generational Collection
Generational GC

Two observations by Hewitt in 1987:



most heap-allocated objects die young
younger objects usually point to older ones,
but not vice verse
So put objects in several sub-heaps
according to their generations

younger generations are collected more
frequently
Details




Divide the heap into 2 or more sub-heaps
Allocation happens in the 1st heap
1st heap are collected most frequently, and
others are less frequently
if an object survives long enough in the ith
heap, then promoted it to the (i+1)th heap
1st
2nd
3rd
Number of Generations

Design choice in different GCs



some has a up-to 7 generations
some has a dynamic changing numbers
Most have 2 generations:



the younger one called nursery
the older one called tenured
nursery is relatively small (<1M)

the GC on it is fast!
How to collect sub-heap?


No standard criterion
A popular choice is that copying collect
the nursery, and mark-sweep the
tenured (thus all)
1st
2nd
3rd
Two issues

How to find roots?

normal roots + pointers from older
generation to younger ones


must detect these pointers (conservatively)
When and how to promote objects?

after objects live long enough, they should
be promoted from nursery to tenured
Issue I: Roots

One must remember the pointers from
old to young

Keep a store list, which contains all heap
addresses that have been modified


that means every write to heap should be
stored into the store list
this requires a write barrier
Write Barrier Example
// initial assignment:
x.f = y;
// and y is in nursery, x
// is in tenured
// compiler should
// generate code:
x.f = y;
remember (x.f);
f
young
old
How to implement write
barrier?

Several widely-used techniques:

Remembered list


Remembered set


mark each object
Card marking



store addresses in a list
mark cards, not objects
coarse version of remembered set
Page marking

with OS support
Issue II: Promotion

Must record in each object the number of
collections it has survived


promote it, when N>threshold
But it’s hard to tune the threshold


must leave enough space in nursery
must keep few garbage in tenured


whole collection is slow!
A typical choice is 80%, but should be
dynamic
Moral

Fast and short-pause



for the nursery space is small
Locality is good
Widely used (most popular one)
Incremental GC
Concurrent GC

Basic idea:

further reduce the GC pause, by allowing
the mutator (the user program) to continue
its work, before the collector finishes



the collector may stop (incremental)
or work concurrently
so the collection time is amortized
Baker’s Algorithm
start limit
roots
A
A
C
E
B
D
F
Read barrier

Baker’s algorithm requires extra work before
every read of a heap pointer


so one maintain the invariant that mutator always
points into “to” space
This may be more expensive than write
barrier


read is more frequent than write?
and the overall execution time of mutator
decreases when allocation is high
Conservative GC
Conservative GC

GC C/C++ is more difficult



language designed with no GC in head
Every value is an integer (pointer)
Basic idea:


be more “conservative” in guessing roots
and pointers, and don’t try to do an
accurate trace of the pointer graph
maybe more leakings
Boehm-Weiser GC

Basic idea:

Put the heap in very high address




so few integers in programs are mis-traced
assume all values that could be aligned are
heap addresses
instrument malloc() to keep track of what
has been allocated
then mark-and-sweep

copying collection?