Java Class File Format

Java .class File Format
陳正佳
Java Virtual Machine
• the cornerstone of Sun's Java programming language.
• a component of the Java technology responsible for
A. Java's cross-platform delivery,
B. the small size of its compiled code,
C. Java's ability to protect users from malicious programs.
• The JVM knows nothing of the Java programming language, only of a particular file format, the class file format.
Class file format
A class file contains
» 1. Java Virtual Machine instructions (bytecodes)
» 2. a symbol table (the constant pool)
» 3. other ancillary information.

javap options
» 1. -c : prints the bytecode of each method
» 2. -s : prints internal type signatures
» 3. -verbose : prints the stack size and the number of local variables and args of each method.
Class file format
• Each class file contains one Java type, either a class or an interface.
• A class file consists of a stream of 8-bit bytes.
• All 16-bit, 32-bit, and 64-bit quantities are constructed by reading in two, four, and eight consecutive 8-bit bytes, respectively.
Class File Format
• Multibyte data items are always stored in big-endian order, where the high bytes come first.
• u1, u2, and u4 represent an unsigned one-, two-, or four-byte quantity, respectively.
• These types may be read by methods such as readUnsignedByte, readUnsignedShort, and readInt of the interface java.io.DataInput.
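As a minimal sketch of the big-endian rule, the following reads a u2 and then a u4 from an in-memory byte array through java.io.DataInputStream (which implements DataInput). The class name BigEndianDemo and the method readU2ThenU4 are made up for this illustration. Because readInt returns a signed int, the sketch masks the result to recover the unsigned u4 value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class BigEndianDemo {
    /** Reads one u2 followed by one u4 and returns both as non-negative longs. */
    static long[] readU2ThenU4(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long u2 = in.readUnsignedShort();      // two bytes, high byte first
            long u4 = in.readInt() & 0xFFFFFFFFL;  // readInt is signed, so mask to get the unsigned value
            return new long[] { u2, u4 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { 0x01, 0x02, (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };
        long[] v = readU2ThenU4(data);
        System.out.printf("u2=0x%04X u4=0x%08X%n", v[0], v[1]); // prints u2=0x0102 u4=0xCAFEBABE
    }
}
```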
class SumI {
    public static void main(String[] args) {
        int count = 10;
        int sum = 0;
        for (int index = 1; index < count; index++)
            sum = sum + index;
        System.out.println("Sum=" + sum);
    } // method main
}
Class File Structure
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Magic
magic :u4
The magic item supplies the magic
number identifying the class file format;
it has the value 0xCAFEBABE.
Version
minor_version :u2, major_version :u2
The values of the minor_version and major_version items are the minor and major version numbers of the class file format, as written by the compiler that produced this class file. A JVM uses them to decide whether it can load the file.
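The first eight bytes of a class file (magic, minor_version, major_version) can be checked with a few DataInput calls. The sketch below is illustrative only: the class ClassHeaderDemo and its method readVersion are invented names, and the sample bytes are hand-built rather than taken from a real compiler run.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class ClassHeaderDemo {
    /** Returns {minor_version, major_version}; throws if the magic number is wrong. */
    static int[] readVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();                 // u4 magic
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("bad magic: 0x" + Integer.toHexString(magic));
            }
            int minor = in.readUnsignedShort();       // u2 minor_version comes first
            int major = in.readUnsignedShort();       // u2 major_version follows
            return new int[] { minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor_version = 0, major_version = 52
        byte[] header = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52 };
        int[] v = readVersion(header);
        System.out.println("major=" + v[1] + " minor=" + v[0]); // prints major=52 minor=0
    }
}
```

To try this on a real file, the bytes could come from java.nio.file.Files.readAllBytes applied to a compiled class such as SumI.class.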
Constant Pool
constant_pool_count :u2
» must be greater than zero.
» indicates the number of entries in the constant_pool table of the class file,
» where the constant_pool entry at index zero is included in the count but is not present in the constant_pool table of the class file.
» i.e., if count = 13, the entries are pool[1] … pool[12].
Constant Pool
constant_pool[]
» a table of variable-length structures
» representing various
– numeric literals
– string constants,
– class/interface/type names,
– field reference,
– method reference and
– other constants that are referred to within the
ClassFile structure and its substructures.
Constant Pool
constant_pool[0]
» reserved for internal use by a JVM implementation. That entry is not present in the class file.
The first entry in the class file is constant_pool[1].
Each constant_pool[i] is a variable-length structure whose format is indicated by its first "tag" byte.
Access Flags
access_flags :u2
» a mask of modifiers used with class and
interface declarations.
Access Flags
The access_flags modifiers

Flag Name       Value    Meaning                                        Used By
ACC_PUBLIC      0x0001   Is public.                                     Class, interface
ACC_FINAL       0x0010   Is final.                                      Class
ACC_SUPER       0x0020   Superclass method handling; =1 for new JVM.    Class, interface
ACC_INTERFACE   0x0200   Is an interface.                               Interface
ACC_ABSTRACT    0x0400   Is abstract.                                   Class, interface
Access Flags (continued)
The access_flags modifiers

Flag Name        Value    Meaning
ACC_SYNTHETIC    0x1000   Is synthetic; not present in the source code.
ACC_ANNOTATION   0x2000   Is an annotation type.
ACC_ENUM         0x4000   Is an enum type.
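Since access_flags is a bit mask, decoding it is a matter of testing each flag bit in turn. The following is a small sketch using the flag values listed in the tables above; the class ClassAccessFlags and its decode method are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class ClassAccessFlags {
    // Flag values and names exactly as listed in the access_flags tables.
    private static final int[] VALUES = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    /** Returns the names of all flags whose bit is set in the u2 access_flags value. */
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < VALUES.length; i++) {
            if ((accessFlags & VALUES[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
        System.out.println(decode(0x0601)); // prints [ACC_PUBLIC, ACC_INTERFACE, ACC_ABSTRACT]
    }
}
```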
this_class
this_class :u2
» a valid index into the constant_pool table.
» The entry at that index must be a CONSTANT_Class_info structure representing the class or interface defined by this class file.
super_class
super_class
» For a class, the value of the super_class
item either must be zero or must be a valid
index into the constant_pool table.
» If the value of the super_class item is
nonzero, the constant_pool entry at that
index must be a CONSTANT_Class_info
structure representing the superclass of the
class defined by this class file.
super_class
super_class
» Neither the superclass nor any of its
superclasses may be a final class.
» If the value of super_class is zero, then this
class file must represent the class
java.lang.Object,
» the only class or interface without a
superclass.
interfaces
interfaces_count
» the number of direct superinterfaces of this class or
interface type.
interfaces[]
» Each value in the interfaces array must be a valid
index into the constant_pool table.
» The constant_pool entry at each value of interfaces[i], where 0 ≤ i < interfaces_count, must be a CONSTANT_Class_info structure representing an interface that is a direct superinterface of this class or interface type.
Fields
fields_count
» gives the number of field_info structures in the fields
table.
fields[]
» Each entry is a variable-length field_info structure giving a complete description of a field in the class or interface type.
» includes only those fields that are declared by this class or interface.
» does not include fields inherited from superclasses or superinterfaces.
Methods
methods_count
» gives the number of method_info structures in the
methods table.
methods[]
» Each entry is a variable-length method_info structure giving a complete description of, and the Java Virtual Machine code for, a method in the class or interface.
» The method_info structures represent all methods declared by this class or interface type: instance methods, class (static) methods, and constructor methods.
Methods
methods[]
» includes only those methods explicitly declared by this class.
» In the original format, the only method with code that an interface has is <clinit>, the interface initialization method.
» Constructor methods share the common name <init>.
» does not include items representing methods that are inherited from superclasses or superinterfaces.
Class Attributes
attributes_count
» the number of attributes in the attributes
table of this class
attributes[]
» Each value of the attributes table must be a
variable-length attribute structure.
» A ClassFile structure can have any number
of attributes associated with it.
Internal form of fully qualified names
Replace all dots with /.
Ex:
a.b.C ==> a/b/C
Descriptor
• A descriptor is a string representing the type of a field or method.
• Descriptors are represented in the class file format using UTF-8 strings.
• Grammar:
FieldType ::= BaseType | ObjectType | ArrayType
Field Descriptors
BaseType ::= B | C | D | F | I | J | S | Z
» B for byte; Z for boolean
» C D F I J S for char, double, float, int, long and short, respectively.
ObjectType ::= Lclassname; (classname in internal form)
ArrayType ::= [ComponentType
Ex:
» int[][] ==> [[I
» Object[] ==> [Ljava/lang/Object;
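The field descriptor grammar can be followed almost literally in code. Below is a minimal sketch that derives the descriptor of a java.lang.Class by recursing on array component types; FieldDescriptors and its method of are hypothetical names for this illustration. (Recent JDKs also offer Class.descriptorString() for the same purpose.)

```java
// Hypothetical helper class, not part of any library.
final class FieldDescriptors {
    /** Builds the field descriptor of the given type by following the FieldType grammar. */
    static String of(Class<?> type) {
        if (type.isArray()) {
            return "[" + of(type.getComponentType());      // ArrayType ::= [ComponentType
        }
        if (type.isPrimitive()) {                          // BaseType
            if (type == byte.class) return "B";
            if (type == char.class) return "C";
            if (type == double.class) return "D";
            if (type == float.class) return "F";
            if (type == int.class) return "I";
            if (type == long.class) return "J";
            if (type == short.class) return "S";
            if (type == boolean.class) return "Z";
            throw new IllegalArgumentException("void is not a field type");
        }
        // ObjectType ::= Lclassname; with dots replaced by slashes (internal form)
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(of(int[][].class));   // prints [[I
        System.out.println(of(Object[].class));  // prints [Ljava/lang/Object;
    }
}
```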
Field and method descriptors
FieldDescriptor ::= FieldType
ComponentType ::= FieldType
MethodDescriptor ::= ( ParameterDescriptor* ) ReturnDescriptor
ParameterDescriptor ::= FieldDescriptor
ReturnDescriptor ::= FieldDescriptor | V
» V for a void method
Ex:
1. Object mymethod(int i, double d, Thread t)
==> (IDLjava/lang/Thread;)Ljava/lang/Object;
2. void com.Clazz.m(int i) ==> (I)V
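The two examples above can be reproduced with the standard java.lang.invoke.MethodType class, whose toMethodDescriptorString method emits the class file form of a method descriptor. The wrapper class MethodDescriptorDemo and its describe method are hypothetical names used only for this sketch.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper class, not part of any library.
final class MethodDescriptorDemo {
    /** Returns the method descriptor for the given return type and parameter types. */
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Object mymethod(int i, double d, Thread t)
        System.out.println(describe(Object.class, int.class, double.class, Thread.class));
        // prints (IDLjava/lang/Thread;)Ljava/lang/Object;

        // void m(int i)
        System.out.println(describe(void.class, int.class)); // prints (I)V
    }
}
```

Note that neither the method name nor the declaring class appears in the descriptor; both are stored elsewhere in the constant pool.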
The Constant Pool
• JVM instructions do not rely on the runtime layout of classes, interfaces, class instances, or arrays.
• Instead, instructions refer to symbolic information in the constant_pool table.
• All constant_pool table entries have the following general format:
cp_info {
    u1 tag;
    u1 info[];
}
Constant pool Tags

Constant Type                  Value
CONSTANT_Utf8                  1
CONSTANT_Integer               3
CONSTANT_Float                 4
CONSTANT_Long                  5
CONSTANT_Double                6
CONSTANT_Class                 7
CONSTANT_String                8
CONSTANT_Fieldref              9
CONSTANT_Methodref             10
CONSTANT_InterfaceMethodref    11
CONSTANT_NameAndType           12
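Because every entry starts with a tag byte that fixes the size of what follows, a reader can walk the constant pool without understanding each entry. The sketch below records the tag at each index and skips the payload. It handles only the tags listed above (newer class files define more), and it relies on one rule from the JVM specification: a CONSTANT_Long or CONSTANT_Double entry occupies two consecutive pool indices. The class ConstantPoolScanner and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class ConstantPoolScanner {
    /**
     * Returns tags[i] = tag of constant_pool[i]. Index 0 and the unused
     * slot after a long/double entry are left as 0.
     */
    static int[] scanTags(byte[] poolBytes, int constantPoolCount) {
        int[] tags = new int[constantPoolCount];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(poolBytes))) {
            for (int i = 1; i < constantPoolCount; i++) {
                int tag = in.readUnsignedByte();
                tags[i] = tag;
                switch (tag) {
                    case 1:  skip(in, in.readUnsignedShort()); break; // Utf8: u2 length + bytes
                    case 3: case 4:  skip(in, 4); break;              // Integer, Float
                    case 5: case 6:  skip(in, 8); i++; break;         // Long, Double take two indices
                    case 7: case 8:  skip(in, 2); break;              // Class, String
                    case 9: case 10: case 11: case 12:
                        skip(in, 4); break;                           // refs and NameAndType: two u2 indices
                    default:
                        throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tags;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // fails with EOFException if the data is truncated
    }

    public static void main(String[] args) {
        // Pool with count 5: Utf8 "Hi" at 1, Long at 2 (index 3 unused), Class at 4.
        byte[] pool = { 1, 0, 2, 0x48, 0x69, 5, 0, 0, 0, 0, 0, 0, 0, 42, 7, 0, 1 };
        System.out.println(java.util.Arrays.toString(scanTags(pool, 5))); // prints [0, 1, 5, 0, 7]
    }
}
```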
The CONSTANT_Utf8_info Structure
• Used to represent constant string values.
• UTF-8 encoding (a modified form in which the null character takes two bytes):
1. \u0001 ~ \u007f (xxx xxxx)
==> 1 byte: 0xxx xxxx
2. \u0000 and \u0080 ~ \u07ff (xxx xxxx xxxx)
==> 2 bytes: 110x xxxx 10xx xxxx
\u0000 ==> 1100 0000 1000 0000
3. \u0800 ~ \uffff (xxxx xxxx xxxx xxxx)
==> 3 bytes: 1110 xxxx 10xx xxxx 10xx xxxx
Format
CONSTANT_Utf8_info {
    u1 tag;            // = 1
    u2 length;
    u1 bytes[length];
}
Notes:
• tag = 1.
• length : number of bytes in the encoded utf8 string.
• bytes[] : contains the bytes of the string.
• No byte may have the value (byte)0 or lie in the range (byte)0xf0-(byte)0xff.
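The length-plus-bytes layout of CONSTANT_Utf8_info (everything after the tag) matches what java.io.DataOutputStream.writeUTF produces: a two-byte length followed by the string in modified UTF-8. The sketch below uses that method to show the two-byte encoding of the null character. ModifiedUtf8Demo and its methods are hypothetical names for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class ModifiedUtf8Demo {
    /** Returns the u2 length followed by the modified UTF-8 bytes of s. */
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex values. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' takes one byte; the null character takes the two bytes C0 80.
        System.out.println(toHex(encode("A\0"))); // prints 00 03 41 C0 80
    }
}
```

The encoded form never contains a zero byte, which is the point of the first restriction in the notes above.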
The CONSTANT_String_info Structure
CONSTANT_String_info {
    u1 tag;            // = 8
    u2 string_index;
}
string_index :
» a valid index into the constant_pool table; the entry at that index must be a CONSTANT_Utf8_info structure.
CONSTANT_Integer_info and CONSTANT_Float_info
CONSTANT_Integer_info {
    u1 tag;    // = 3
    u4 bytes;
}
CONSTANT_Float_info {
    u1 tag;    // = 4
    u4 bytes;
}
CONSTANT_Long_info and CONSTANT_Double_info
CONSTANT_Long_info {
    u1 tag;           // = 5
    u4 high_bytes;    // together: eight bytes, high half first
    u4 low_bytes;
}
CONSTANT_Double_info {
    u1 tag;           // = 6
    u4 high_bytes;
    u4 low_bytes;
}
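The raw bytes of these numeric entries are bit patterns, not text. As a rough sketch: a float constant is recovered from its four bytes with Float.intBitsToFloat, and the JVM specification stores 64-bit constants as two u4 halves (high half first), which are reassembled before calling Double.longBitsToDouble. NumericConstantDemo and its method names are invented for this example.

```java
// Hypothetical helper class, not part of any library.
final class NumericConstantDemo {
    /** Interprets the u4 bytes of a CONSTANT_Float_info entry as an IEEE 754 float. */
    static float toFloat(int bytes) {
        return Float.intBitsToFloat(bytes);
    }

    /** Joins the two u4 halves of a CONSTANT_Long_info entry, high half first. */
    static long toLong(int highBytes, int lowBytes) {
        // Mask the low half so its sign bit does not smear into the high half.
        return ((long) highBytes << 32) | (lowBytes & 0xFFFFFFFFL);
    }

    /** Interprets the two u4 halves of a CONSTANT_Double_info entry as an IEEE 754 double. */
    static double toDouble(int highBytes, int lowBytes) {
        return Double.longBitsToDouble(toLong(highBytes, lowBytes));
    }

    public static void main(String[] args) {
        System.out.println(toFloat(0x3F800000));     // prints 1.0
        System.out.println(toDouble(0x3FF00000, 0)); // prints 1.0
        System.out.println(toLong(1, 0));            // prints 4294967296
    }
}
```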
CONSTANT_Class_info
Used to represent a class or an interface:
CONSTANT_Class_info {
    u1 tag;           // = 7
    u2 name_index;
}
• name_index is an index into a utf-8 entry encoding a fully qualified class name in internal form.
• Array types are also represented this way, using their descriptors as names: [[I, [Lcom/Cl;, etc.
CONSTANT_NameAndType_info
Used to represent a field or method, without indicating which class or interface type it belongs to:
CONSTANT_NameAndType_info {
    u1 tag;                 // = 12
    u2 name_index;          // index into utf8 string of a simple name
    u2 descriptor_index;    // index into utf8 string of a field/method descriptor
}
CONSTANT_Fieldref_info, CONSTANT_Methodref_info and CONSTANT_InterfaceMethodref_info
CONSTANT_Fieldref_info {
    u1 tag;                    // = 9
    u2 class_index;            // index into a class_info entry
    u2 name_and_type_index;
}
CONSTANT_Methodref_info {
    u1 tag;                    // = 10
    u2 class_index;
    u2 name_and_type_index;
}
CONSTANT_InterfaceMethodref_info {
    u1 tag;                    // = 11
    u2 class_index;
    u2 name_and_type_index;
}
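Resolving one of these reference entries to readable text means following three levels of indices: the ref entry points at a Class entry and a NameAndType entry, which in turn point at Utf8 entries. The sketch below uses a deliberately simplified in-memory pool model, which is an assumption of this example: Utf8 entries are Strings and every other entry is an int[] holding the tag followed by its indices. SymbolicRefDemo, samplePool and describeRef are hypothetical names.

```java
// Hypothetical helper class, not part of any library.
final class SymbolicRefDemo {
    /** Builds a tiny hand-written pool describing PrintStream.println(String). */
    static Object[] samplePool() {
        return new Object[] {
            null,                       // 0: never present in a class file
            "java/io/PrintStream",      // 1: Utf8
            new int[] { 7, 1 },         // 2: Class, name_index = 1
            "println",                  // 3: Utf8
            "(Ljava/lang/String;)V",    // 4: Utf8
            new int[] { 12, 3, 4 },     // 5: NameAndType, name_index = 3, descriptor_index = 4
            new int[] { 10, 2, 5 },     // 6: Methodref, class_index = 2, name_and_type_index = 5
        };
    }

    /** Renders a Fieldref/Methodref/InterfaceMethodref entry as class.name:descriptor. */
    static String describeRef(Object[] pool, int index) {
        int[] ref = (int[]) pool[index];          // {tag, class_index, name_and_type_index}
        int[] cls = (int[]) pool[ref[1]];         // {7, name_index}
        int[] nat = (int[]) pool[ref[2]];         // {12, name_index, descriptor_index}
        return pool[cls[1]] + "." + pool[nat[1]] + ":" + pool[nat[2]];
    }

    public static void main(String[] args) {
        System.out.println(describeRef(samplePool(), 6));
        // prints java/io/PrintStream.println:(Ljava/lang/String;)V
    }
}
```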
Fields
• Each field is described by a field_info structure.
• No two fields in one class file may have the same name and descriptor.
• Format:
field_info {
    u2 access_flags;
    u2 name_index;          // index to utf8 entry for the name
    u2 descriptor_index;    // index to utf8 entry for the field type
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Methods
• Each method, including each instance initialization method (<init>) and the class or interface initialization method (<clinit>), is described by a method_info structure.
• Format:
method_info {
    u2 access_flags;
    u2 name_index;          // index into utf8 entry for the name
    u2 descriptor_index;    // index into utf8 entry for the method descriptor
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Attributes
• Attributes are used in the ClassFile, field_info, method_info, and Code_attribute structures of the class file format.
• General Format:
attribute_info {
    u2 attribute_name_index;    // into utf8
    u4 attribute_length;        // excluding the initial 6 bytes
    u1 info[attribute_length];
}
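Because every attribute carries its own length, a reader can step over attributes it does not understand. The following sketch reads the six-byte header of each attribute and skips its info[] body. AttributeSkipper and readHeaders are hypothetical names, and the sketch simply rejects lengths that do not fit in a non-negative int.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of any library.
final class AttributeSkipper {
    /** Returns one {attribute_name_index, attribute_length} pair per attribute. */
    static int[][] readHeaders(byte[] bytes, int attributesCount) {
        int[][] headers = new int[attributesCount][];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < attributesCount; i++) {
                int nameIndex = in.readUnsignedShort();   // u2 attribute_name_index
                int length = in.readInt();                // u4 attribute_length (body only)
                if (length < 0) {
                    throw new IllegalArgumentException("length above 2^31-1 not handled in this sketch");
                }
                in.readFully(new byte[length]);           // skip info[attribute_length]
                headers[i] = new int[] { nameIndex, length };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }

    public static void main(String[] args) {
        // Two attributes: name index 5 with a 2-byte body, then name index 9 with an empty body.
        byte[] data = { 0, 5, 0, 0, 0, 2, 0, 7,   0, 9, 0, 0, 0, 0 };
        System.out.println(java.util.Arrays.deepToString(readHeaders(data, 2))); // prints [[5, 2], [9, 0]]
    }
}
```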

Types of attributes
class attributes:
» Synthetic, Deprecated, SourceFile
» InnerClasses
field attributes:
» ConstantValue, Deprecated
method attributes:
» Code, Deprecated
» Exceptions
attributes of the Code attribute:
» LineNumberTable, LocalVariableTable
ConstantValue Attribute
ConstantValue_attribute {
    u2 attribute_name_index;    // index to utf8: "ConstantValue"
    u4 attribute_length;        // = 2
    u2 constantvalue_index;     // into a primitive entry (Integer_info, …)
                                // or a String_info entry
}
Code Attribute
• a variable-length attribute used in the attributes table of method_info structures.
• A Code attribute contains
» the Java virtual machine instructions and
» auxiliary information
for a single method, instance initialization method, or class or interface initialization method.
• Every JVM implementation must recognize Code attributes.
• A native or abstract method has no Code attribute in its method_info structure.
» Otherwise, the method_info structure has exactly one Code attribute.
Code_attribute {
    u2 attribute_name_index;    // into utf8: "Code"
    u4 attribute_length;        // excludes the initial 6 bytes
    u2 max_stack;               // max # of stack slots needed
    u2 max_locals;              // max # of local variables needed
                                // Note: double and long count as 2.
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;              // exclusive
        u2 handler_pc;
        u2 catch_type;          // index to a cp entry of type CONSTANT_Class_info;
                                // zero means the handler is called for all exceptions
                                // (used to implement the finally clause)
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
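The exception table is searched by program counter: a row applies when start_pc ≤ pc < end_pc. The sketch below shows only that range check; a real JVM additionally tests whether the thrown exception is an instance of the class named by catch_type (or accepts any exception when catch_type is zero), which is left out here. ExceptionTableDemo and findHandler are hypothetical names, and each table row is modeled as an int[] of {start_pc, end_pc, handler_pc, catch_type}.

```java
// Hypothetical helper class, not part of any library.
final class ExceptionTableDemo {
    /**
     * Returns the handler_pc of the first row whose [start_pc, end_pc) range
     * contains pc, or -1 if no row covers it. catch_type is ignored in this sketch.
     */
    static int findHandler(int[][] exceptionTable, int pc) {
        for (int[] row : exceptionTable) {
            int startPc = row[0];
            int endPc = row[1];          // exclusive bound
            if (pc >= startPc && pc < endPc) {
                return row[2];           // handler_pc
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[][] table = { { 0, 10, 20, 0 } };       // covers pc 0..9, handler at 20, catches everything
        System.out.println(findHandler(table, 9));  // prints 20
        System.out.println(findHandler(table, 10)); // prints -1, because end_pc is exclusive
    }
}
```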
The (uncaught checked) Exceptions Attribute
• appears in the attributes table of a method_info structure.
• indicates which checked exceptions a method may throw.
• At most one Exceptions attribute may appear in each method_info structure.
• Format:
Exceptions_attribute {
    u2 attribute_name_index;    // into utf8: "Exceptions"
    u4 attribute_length;
    u2 number_of_exceptions;
    u2 exception_index_table[number_of_exceptions];
                                // each entry is an index to a cp entry of type class_info
}
The InnerClasses Attribute
• a variable-length attribute in the attributes table of the ClassFile structure.
• records any inner class/interface referred to (or declared) in this class/interface.
» If the constant pool of a class or interface refers to any class or interface that is not a member of a package, its ClassFile structure must have exactly one InnerClasses attribute in its attributes table.
» If a class has members that are classes or interfaces, its constant_pool table (and hence its InnerClasses attribute) must refer to each such member, even if that member is not otherwise mentioned by the class.
Format:
InnerClasses_attribute {
    u2 attribute_name_index;          // to utf8: "InnerClasses"
    u4 attribute_length;
    u2 number_of_classes;
    {   u2 inner_class_info_index;    // into class_info, for each inner class C in cp
        u2 outer_class_info_index;    // into class_info of the containing class of C
        u2 inner_name_index;          // into utf8 for the simple name (no / and no $); anonymous => zero
        u2 inner_class_access_flags;
    } classes[number_of_classes];
}
The Synthetic and Deprecated Attributes
A class member that does not appear in the source code must be marked using a Synthetic attribute.
Synthetic_attribute {
    u2 attribute_name_index;    // utf8("Synthetic")
    u4 attribute_length;        // = 0
}
The Deprecated attribute has the following format:
Deprecated_attribute {
    u2 attribute_name_index;    // utf8("Deprecated")
    u4 attribute_length;        // = 0
}
The SourceFile Attribute
SourceFile_attribute {
    u2 attribute_name_index;    // into utf8("SourceFile")
    u4 attribute_length;        // = 2
    u2 sourcefile_index;        // into utf8 entry
}
LineNumberTable Attribute
LineNumberTable_attribute {
    u2 attribute_name_index;    // utf8("LineNumberTable")
    u4 attribute_length;
    u2 line_number_table_length;
    {   u2 start_pc;
        u2 line_number;
    } line_number_table[line_number_table_length];
}
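In the JVM specification, each start_pc marks the code offset at which the code for a new source line begins, so the line for a given pc is the entry with the greatest start_pc not exceeding it. Below is a minimal lookup sketch that does not assume the table is sorted. LineNumberLookup and lineFor are hypothetical names, and each entry is modeled as an int[] of {start_pc, line_number}.

```java
// Hypothetical helper class, not part of any library.
final class LineNumberLookup {
    /**
     * Returns the source line for the given code offset, or -1 when no
     * entry starts at or before pc. Entries may appear in any order.
     */
    static int lineFor(int[][] lineNumberTable, int pc) {
        int bestStart = -1;
        int line = -1;
        for (int[] entry : lineNumberTable) {
            int startPc = entry[0];
            if (startPc <= pc && startPc > bestStart) {
                bestStart = startPc;     // closest start_pc seen so far
                line = entry[1];
            }
        }
        return line;
    }

    public static void main(String[] args) {
        int[][] table = { { 0, 3 }, { 3, 4 }, { 5, 5 } };
        System.out.println(lineFor(table, 4)); // prints 4
        System.out.println(lineFor(table, 7)); // prints 5
    }
}
```

This is the kind of lookup a stack trace printer or debugger would perform to turn a bytecode offset into a source line.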
The LocalVariableTable Attribute
LocalVariableTable_attribute {
    u2 attribute_name_index;    // utf8("LocalVariableTable")
    u4 attribute_length;
    u2 local_variable_table_length;
    {   u2 start_pc;
        u2 length;              // the variable must have a value at code
                                // offsets in [start_pc, start_pc + length]
        u2 name_index;          // utf8 for the variable name
        u2 descriptor_index;    // utf8 for the type
        u2 index;               // local variable index
    } local_variable_table[local_variable_table_length];
}