Transcript 080708
Java Garbage Collection,
Byte Code
James Atlas
August 7, 2008
Review
• Java Security
JVM
Cryptography
August 7, 2008
James Atlas - CISC370
2
Schedule
• Today
Java Garbage Collection
Java Bytecode
• Tuesday
Review
• Thursday
Final (5-7PM)
August 7, 2008
James Atlas - CISC370
3
Java Garbage Collection
August 7, 2008
James Atlas - CISC370
4
Garbage Collection
• What is garbage and how can we deal with it?
• Garbage collection schemes
Reference Counting
Mark and Sweep
Stop and Copy
• A comparison
August 7, 2008
James Atlas - CISC370
5
How Objects are Created in Java
• An object is created in Java by invoking the new()
operator.
• Calling the new() operator, the JVM will do the
following:
allocate memory;
assign fields their default values;
run the constructor;
a reference is returned.
August 7, 2008
James Atlas - CISC370
6
How Java Reclaims Objects Memory
• Java does not provide the programmer any means
to destroy objects explicitly
• The advantages are
No dangling reference problem in Java
Easier programming
No memory leak problem
August 7, 2008
James Atlas - CISC370
7
What is Garbage?
• Garbage: unreferenced objects
john
Object
Student john= new Student();
Student jeff= new Student();
john=jeff;
• john Object becomes garbage,
john
jeff
because it is an unreferenced
Object
jeff Object
August 7, 2008
James Atlas - CISC370
8
What is Garbage Collection?
• What is Garbage Collection?
• Finding garbage and reclaiming memory
allocated to it.
john
Object
• When is the Garbage Collection process
invoked?
When the total memory allocated to a Java
program exceeds some threshold.
john
jeff
• Is a running program affected by garbage
collection?
Yes, depending on the algorithm. Typically the
program suspends during garbage collection.
August 7, 2008
James Atlas - CISC370
jeff Object
9
Strategies for Handling Garbage
• Modern Societies produce an excessive amount of
waste?
• What is the solution?
Reduce
Reuse
Recycle
• The same Applies to Java!!!
August 7, 2008
James Atlas - CISC370
10
Reduce Garbage
• A Java program that does not create any objects
does not create garbage.
• Objects used until the end of the program do not
become garbage.
• Reducing the number of objects used will reduce
the amount of garbage generated.
August 7, 2008
James Atlas - CISC370
11
Reuse Garbage
• Reuse objects instead of generating new ones.
for (int i=0;i<1000000; ++i) {
SomeClass obj= new SomeClass(i);
System.out.println(obj);
}
• This program generates one million objects and prints them out.
SomeClass obj= new SomeClass();
for (int i=0;i< 1000000; ++i)
{
obj.setInt(i);
System.out.println(onj);
}
• Using only one object and implementing the setInt() method, we dramatically
reduce the garbage generated.
August 7, 2008
James Atlas - CISC370
12
Recycle Garbage
• Don't leave unused objects for the garbage collector.
Put them instead in a container to be searched when an object is
needed.
• Advantage: reduces garbage generation.
• Disadvantage: puts more overhead on the programmer.
• Can anyone think of what design pattern this represents?
August 7, 2008
James Atlas - CISC370
13
Garbage Collection Strategies
• Reference Counting
• Mark and Sweep
• Stop and Copy
August 7, 2008
James Atlas - CISC370
14
Reference Counting Garbage
Collection
• Main Idea: Add a reference count field for every object.
• This Field is updated when the number of references to an
object changes.
• Example
• Object p= new Integer(57);
• Object q = p;
August 7, 2008
p
57
refCount = 2
q
James Atlas - CISC370
15
Reference Counting (cont'd)
• The update of reference field when we have a reference
assignment ( i.e p=q) can be implemented as follows
if (p!=q)
{
if (p!=null)
--p.refCount;
p=q;
if (p!=null)
++p.refCount;
}
Example:
Object p = new Integer(57);
Object q= new Integer(99);
p=q
p
57
refCount = 0
q
99
August 7, 2008
James Atlas - CISC370
refCount = 2
16
Reference Counting (cont'd)
• What in case of indirect references?
• We can still use reference counting, provided
we consider all references to an object
including references from other objects.
• Object p = new Association(new Integer(57),
new Integer(99));
August 7, 2008
James Atlas - CISC370
17
Reference Counting (cont'd)
• When does reference counting fail?
• When head is assigned to null, first object reference count
becomes 1 and not zero
• Reference counting will fail whenever the data structure
contains a cycle of references
head
ListElements
next
next
refCount = 1
August 7, 2008
ListElements
refCount = 1
James Atlas - CISC370
ListElements
next
refCount = 1
18
Reference Counting (cont'd)
• Advantages and Disadvantages
+ Garbage is easily identified.
+ Garbage can be collected incrementally.
- Every object should have a reference count field.
Overhead for updating reference count fields.
It fails in the case of cyclic references.
It does not de-fragment the heap
August 7, 2008
James Atlas - CISC370
19
Mark-and-Sweep Garbage Collection
• It is the first garbage collection algorithm that is able to
reclaim garbage even for cyclic data structures.
• Mark and sweep algorithm consists of two phases:
mark phase
sweep phase
for each root variable r
mark(r);
sweep();
August 7, 2008
James Atlas - CISC370
20
Mark and Sweep (cont'd)
void sweep(){
for each Object p in the heap
{
if (p.marked)
p.marked=false;
else
heap.release(p);
}
}
program
August 7, 2008
James Atlas - CISC370
21
Mark and Sweep (cont'd)
• Advantages
It correctly identifies and collects garbage even
in the presence of reference cycles.
No overhead in manipulating references.
• Disadvantages
The program suspends while garbage collecting.
It does not De-Fragment the heap.
August 7, 2008
James Atlas - CISC370
22
Stop-and-Copy Garbage Collection
• This algorithm collects garbage and defragments the heap.
• The heap is divided into two regions: active and inactive.
• When the memory in the active region is exhausted, the
•
•
program is suspended and :
Live objects are copied to the inactive region contiguously
The active and in active regions reverse their roles
• The Algorithm
for each root variable r
r=copy(r,inactiveHeap);
swap (activeHeap,inactiveHeap);
August 7, 2008
James Atlas - CISC370
23
Stop-and-Copy Garbage Collection
(cont'd)
head
Object copy(Object p, Heap destination)
{
if (p==null)
return null;
if (p.forward==null)
{
q=destination.newInstance(p.class);
p.forward= q;
for each field f in p
{
if (f is primitive type)
q.f=p.f;
else
q.f= copy(p.f, destination);
}
q.forward = null;
}
return p.forward;
}
inactive
A’
active
null
B’
null
C’
null
August 7, 2008
James Atlas - CISC370
24
Stop-and-Copy Garbage Collection
(cont'd)
• Advantages
It works for cyclic data structures
It Defragments the heap.
• Disadvantages
All objects are copied when the garbage collector is invoked – it
does not work incrementally.
It requires twice as much memory as the program actually uses.
August 7, 2008
James Atlas - CISC370
25
Java Garbage Collector History
• 1.0-1.3
mark-and-sweep
• 1.3
three memory spaces
1. Permanent space: used for JVM class and method
objects
Old object space: used for objects that have been around
a while
New (young) object space: used for newly created
objects
2.
3.
•
Also broken into Eden, Survivor1 and Survivor2
allowed different techniques for each space
August 7, 2008
James Atlas - CISC370
26
Java Garbage Collector History (cont’)
• 1.3 techniques
Copy-compaction: used for new object space.
Mark-compact: used in old object space.
• Similar to mark and sweep, mark-compact marks all unreachable
objects; in the second phase, the unreachable objects compact.
Incremental garbage collection (optional)
• Incremental GC creates a new middle section in the heap, which
divides into multiple trains. Garbage is reclaimed from each train
one at a time. This provides fewer, more frequent pauses for
garbage collection, but it can decrease overall application
performance.
• still all “stop-the-world” techniques
August 7, 2008
James Atlas - CISC370
27
Java Garbage Collector History (cont’)
• 1.4.1 introduced parallel GC algorithms
Young
Old
Stop the world
Multithreaded Concurrent
Copying
X
X
*Parallel
copying
X
X
X
*Parallel
scavenging
X
X
X
Incremental
1 (see note below)
X
Mark-compact
X
X
*Concurrent
X
2 (see note below)
X
Note 1: Subdivides the new generation to create an additional middle generation
Note 2: Uses stop-the-world approach for two of its six phases
August 7, 2008
James Atlas - CISC370
28
JVM Internals
• The architecture
JVM is an abstract concept
Sun just specified the interface
implementation details depend on specific product (SUN
JDK, IBM JDK, Blackdown)
• Java bytecode, the internal language
independent from CPU-type (bytecode)
Stackoriented, object-oriented, type-safe
August 7, 2008
James Atlas - CISC370
29
Runtime view on a JVM
Runtime Data storage
Class
loader
Method
Area (Classes)
Heap (Objects)
PC registers
Stack Frames
Native method
stacks
August 7, 2008
JVM
runtime
Native
methods
James Atlas - CISC370
30
Runtime data
• Frame:
• Saves runtime state of execution threads,
therefore holds information for method execution
(program counter)
• All frames of a thread are managed in a stack
frame
August 7, 2008
James Atlas - CISC370
31
Runtime data
• Method area
Runtime information of the class file
Type information
Constant Pool
Method information
Field information
Class static fields
Reference to the classloader of the class
Reference to reflection anchor (Class)
August 7, 2008
James Atlas - CISC370
32
The Constant Pool
• The "constant pool" is a heterogenous array of
data. Each entry in the constant pool can be one of
the following:
string , class or interface name , reference to a field or
method , numeric value , constant String value
• No other part of the class file makes specific
references to strings, classes, fields, or methods.
All references for constants, names of methods,
and fields are via lookup into the constant pool.
August 7, 2008
James Atlas - CISC370
33
The Class File Structure
H
E
A
D
E
R
CONSTANTPOOL
FIELDS
METHODS
I ACCESS A
N
T
FLAGS
T
T
E
R
(Final,
R
I
Native,
F Private, B
A Protected, U
C
T
...)
E
E
S
S
• You can use a classdumper like javap -c or DumpClass to
analyze these inner details
August 7, 2008
James Atlas - CISC370
34
Javap -c -verbose example
• Our Chicken.java class
August 7, 2008
James Atlas - CISC370
35
The Class File Format
• Java class files are brought into the JVM via the
•
•
classloader
The class file is basically just a plain byte array,
following the rules of the byte code verifier.
All 16-bit and 32-bit quantities are formed by
reading in two or four 8-bit bytes, respectively, and
joining them together in big-endian format.
August 7, 2008
James Atlas - CISC370
36
Methods and Fields
• The type of a field or method is indicated by a string
•
•
called its signature.
Fields may have an additional attribute giving the
field's initial value.
Methods have an additional CODE attribute giving
the java bytecode for executing that method.
August 7, 2008
James Atlas - CISC370
37
The CODE Attribute
• maximum stack space
• maximum number of local variables
• The actual bytecode for executing the
method.
• A table of exception handlers,
start and end offset into the bytecodes,
an exception type, and
the offset of a handler for the exception
August 7, 2008
James Atlas - CISC370
38
Bytecode Basics
August 7, 2008
James Atlas - CISC370
39
The JVM types
• JVM-Types and their prefixes
Byte
Short
Integer
ints!)
Long
Character
Single float
double float
References
b
s
i (java booleans are mapped to jvm
l
c
f
d
a to Classes, Interfaces, Arrays
• These
Prefixes used in opcodes (iadd, astore,...)
August 7, 2008
James Atlas - CISC370
40
The JVM Instruction Mnemonics
•
•
•
•
•
•
•
•
•
Shuffling (pop, swap, dup, ...)
Calculating (iadd, isub, imul, idiv, ineg,...)
Conversion (d2i, i2b, d2f, i2z,...)
Local storage operation (iload, istore,...)
Array Operation (arraylength, newarray,...)
Object management (get/putfield, invokevirtual,
new)
Push operation (aconst_null, iconst_m1,....)
Control flow (nop, goto, jsr, ret, tableswitch,...)
Threading (monitorenter, monitorexit,...)
August 7, 2008
James Atlas - CISC370
41
Bytecode
• Java Bytecode (JBC) are followed by zero or more
•
•
•
•
bytes of additional operand information.
Table lookup instructions (tableswitch,
lookupswitch) have a flexible length
The wide operation extension allows the base
operations to use „large“ operands
No self-modifying code
No branching to arbitrary locations, only to
beginning of instructions limited to scope of current
method (enforced by verifier!)
August 7, 2008
James Atlas - CISC370
42
Bytecode (Reverse) Engineering
August 7, 2008
James Atlas - CISC370
43
Bytecode Engineering tools
•
Obfuscators
Remove/Manipulate all information that can be used for
reverse engineering
•
Native compilers
„Real“ compile of java bytecodes to native instructions
(x86/sparc)
•
Build your own bytecode
Programmatic Generation
Manipulate classfiles with an API
August 7, 2008
James Atlas - CISC370
44
Obfuscators
Techniques used
• Identifier Name Mangling
•
•
•
The JVM does not need useful names for Methods and
Fields
They can be renamed to single letter identifiers
Constant Pool Name Mangling
•
•
Decrypts constant pool entries on runtime
Control flow obfuscation
•
•
Insertion of phantom variables, stack scrambling
And by relying on their default values inserting ghost
branch instructions, which never execute
August 7, 2008
James Atlas - CISC370
45
Obfuscators
Problems with Obfuscation
• Constant value Mangling implies overhead
processing in extra method call of an
„deobfuscatename“ method in each retrieval from
constant pool
• Dynamic class loading may become broken as
classes get new names and reflection calls like
class.forName(„Account“) will fail because class
„Account“ now known as by it‘s obfuscated name
„b16“!
• And: Obfuscation breaks patterns that can be
recognized by JIT-engines for optimization
August 7, 2008
James Atlas - CISC370
46
Protecting the Source Code:
Native Compilers
• Convert Java bytecode to C
• Generate executable via normal c-build
fast execution
Additional decompilation effort needed
Long turnaround times
Even for small java programs you get monster
size executable files (67mb source for Viva.java)
from some commercial products
Transformed program may than be vulnerable to
buffer overflows and off-by-ones
August 7, 2008
James Atlas - CISC370
48
Bytecode Reverse Engineering
• Decompilation
Get Source code from class files
• Graphical Analysis
Rebuild the logical control flow
• Disassembly
Get symbolic bytecode from class files
August 7, 2008
James Atlas - CISC370
49
JAD
• Java Decompiler
Free for personal use
JADClipse plugin for Eclipse - allows you to
browse .class files
August 7, 2008
James Atlas - CISC370
50