A class file is a stream of 8-bit bytes. 16-bit, 32-bit and 64-bit quantities are constructed by reading 2, 4 or 8 consecutive bytes. These bytes are stored in big-endian order, i.e. the high-order byte comes first. Intel microprocessors store multi-byte values in little-endian order, so a program running on such a machine cannot copy these bytes straight into an integer; it has to assemble each value byte by byte, or swap the bytes after reading them.
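As a minimal sketch of this byte-by-byte assembly, the two helper functions below build 16-bit and 32-bit values from bytes stored in big-endian order. The names read_u2 and read_u4 are our own, chosen only for illustration. Because they use shifts instead of copying memory, they should give the same result on little-endian and big-endian machines.

```cpp
#include <cstdint>

// Assemble a 16-bit value from two consecutive bytes, high-order byte first.
std::uint16_t read_u2(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assemble a 32-bit value from four consecutive bytes, high-order byte first.
std::uint32_t read_u4(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

For example, the two bytes 00 12 passed to read_u2 give 18, whatever the byte order of the machine running the program.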
The structure of a class file is given below.
class file
{
magic - 4 bytes.
minor version - 2 bytes.
major version - 2 bytes.
constant pool count - 2 bytes.
cp_info constant pool [constant pool count]
access flags - 2 bytes.
this class - 2 bytes.
super class - 2 bytes.
interface count - 2 bytes.
interfaces[interface count]
field count - 2 bytes.
field_info fields[field count]
method count - 2 bytes.
method_info methods[method count]
attribute count - 2 bytes.
attribute_info attributes[attribute count]
}
The best way to understand the class file format is by understanding the actual bits and bytes.
Here we have a most difficult Java program. Let's call the source file "zzz.java". As usual we have to compile this file with a Java compiler (e.g. Sun's javac). After successful compilation we get a ".class" file. The class file is named after the class it contains, not after the source file, so it will be named "aaa.class". The "zzz.java" file is not important as it is just a text file, but the "aaa.class" file is important as it contains the actual bytes.
class aaa
{
    public void abc(int i)
    {
        int j;
        j=2;
    }
}
What is this aaa.class file and what does it contain?
Intel and Motorola microprocessors each have their own machine language, i.e. a particular code may mean something for an Intel microprocessor and something else for a Motorola microprocessor. So, to make Java machine independent, a hypothetical microprocessor was invented which has its own machine language. This hypothetical microprocessor is called the JAVA VIRTUAL MACHINE or the JVM. The JVM knows nothing about the Java programming language; it only understands the class file format. Every instruction of the JVM begins with an opcode that is one byte large, and these instructions are called BYTECODES. Some instructions are followed by operand bytes. The "aaa.class" file contains these bytecodes, which are executed by the JVM in the same way whether it runs on an Intel or a Motorola microprocessor. As an opcode is one byte large (8 bits), there can be at most 256 different instructions. So, all that Java stands on is these 256 possible instructions and something more which will be discussed later.
So for the time being let's assume that with this vast knowledge of class files we can proceed further to understand the actual format of the bytecodes generated in the class file by the Java compiler.
First just try "edit aaa.class" from the DOS prompt and see the contents of the class file. You will notice that the class file is binary data. The editor shows it as a jumble of mostly unreadable characters, from which no one can figure out the format of the instructions (bytecodes).
To make this simpler we will write a small ".cpp" program which reads the class file byte by byte and displays each byte in decimal, hexadecimal and character format. Let us call this program "bytes.cpp".
(NOTE : We recommend you download (in the zip format) the entire project.)
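#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Write one byte to the output file as decimal, hex and character. */
void abc(FILE *out, int ch)
{
    /* Bytes that are not printable are shown as '.' so that control
       characters such as newline do not break the one-byte-per-line layout. */
    fprintf(out, "%d..%x..%c\n", ch, ch, isprint(ch) ? ch : '.');
}

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        printf("Usage: bytes <class file>\n");
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    /* "w" starts z.txt afresh on every run instead of appending to it. */
    FILE *out = fopen("z.txt", "w");
    if (fp == NULL || out == NULL)
    {
        printf("Error opening file\n");
        return 1;
    }
    int ch;
    while ((ch = fgetc(fp)) != EOF)
        abc(out, ch);
    printf("Over\n");
    fclose(out);
    fclose(fp);
    return 0;
}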
Now compile bytes.cpp and run the resulting program with aaa.class as its argument. Note that this program works under DOS, where a file extension can have at most 3 characters, and therefore the extension of the class file has to be shortened from 5 characters to 3 characters, i.e. from aaa.class to aaa.cla. As specified in the program, we will have to look for the output in the file "z.txt". So, edit z.txt and go through the actual bytes.
The output will be in the form:
dec hex char
202..ca..(character)
The first value is the byte as a decimal number, the second is the same byte as a hexadecimal number and the third is the byte shown as a character. Going through the bits and bytes of z.txt, one can figure out how the Java program is converted into bits and bytes in a class file.
Now let us see the actual bytes of aaa.class.
The first four bytes are
HEX DEC
CA 202
FE 254
BA 186
BE 190
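These four bytes are called the magic number; read together they form the value 0xCAFEBABE. The magic number identifies the file as a class file. If these bytes are changed, the JVM (whether it runs on its own or inside Netscape, Internet Explorer or any other browser) can no longer recognise the file as a .class file.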
The next 4 bytes give the version number of the class file format, which depends on the Java compiler that produced the file.
HEX DEC
00 00
03 03
00 00
2d 45
00 03 is the minor version number and 00 2d (45 in decimal) is the major version number. Thus the version number of the class file is 45.3. Note that the minor version comes first in the file, and that each of the two values is stored in big-endian format.
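The first eight bytes can be decoded in a few lines. The sketch below is only an illustration: the structure ClassHeader and the function parse_header are hypothetical names of our own, and the function assumes the bytes have already been read into memory.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical holder for the first 8 bytes of a class file.
struct ClassHeader {
    std::uint32_t magic;
    std::uint16_t minor_version;
    std::uint16_t major_version;
};

// Fills `out` from the first 8 bytes of `buf`.
// Returns false if fewer than 8 bytes are available or the magic number is wrong.
bool parse_header(const unsigned char* buf, std::size_t len, ClassHeader& out) {
    if (len < 8) return false;
    out.magic = (static_cast<std::uint32_t>(buf[0]) << 24) |
                (static_cast<std::uint32_t>(buf[1]) << 16) |
                (static_cast<std::uint32_t>(buf[2]) << 8)  |
                 static_cast<std::uint32_t>(buf[3]);
    // The minor version is stored before the major version.
    out.minor_version = static_cast<std::uint16_t>((buf[4] << 8) | buf[5]);
    out.major_version = static_cast<std::uint16_t>((buf[6] << 8) | buf[7]);
    return out.magic == 0xCAFEBABEu;
}
```

Given the bytes CA FE BA BE 00 03 00 2D, the function reports a minor version of 3 and a major version of 45.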
The next 2 bytes give the constant pool count.
HEX DEC
00 00
12 18
This means the constant pool count is 18. The constant pool is an array of variable-sized structures with indexes 0 to 17, but index 0 is reserved for the JVM and has no structure in the file. Thus these bytes are followed by 17 variable-length structures, numbered 1 to 17, which give information about the string constants, class names, field names and other constants that are referred to within the class file structure and its substructures.
After this the constant pool begins. The constant pool has the following
general format.
cp_info
{
tag;
info[];
}
The constant pool is an array of cp_info structures. Each structure begins with a one-byte tag, and the value of the tag determines the size and layout of the info[] array that follows it.
The following table lists the tags that are used in .class files.
Constant type                  Value of tag
CONSTANT_Class                 7
CONSTANT_Fieldref              9
CONSTANT_Methodref             10
CONSTANT_InterfaceMethodref    11
CONSTANT_String                8
CONSTANT_Integer               3
CONSTANT_Float                 4
CONSTANT_Long                  5
CONSTANT_Double                6
CONSTANT_NameAndType           12
CONSTANT_Utf8                  1
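Since the tag decides how many bytes an entry occupies, a reader has to know the size for each tag in order to step from one entry to the next. The sketch below shows one way to do this; the function name cp_entry_size is our own invention, and the caller is assumed to have made sure that the bytes of the entry are actually present in the buffer.

```cpp
#include <cstddef>

// Tag values from the table above.
enum ConstantTag {
    CONSTANT_Utf8 = 1, CONSTANT_Integer = 3, CONSTANT_Float = 4,
    CONSTANT_Long = 5, CONSTANT_Double = 6, CONSTANT_Class = 7,
    CONSTANT_String = 8, CONSTANT_Fieldref = 9, CONSTANT_Methodref = 10,
    CONSTANT_InterfaceMethodref = 11, CONSTANT_NameAndType = 12
};

// Returns the size in bytes of the entry starting at p, tag byte included,
// or 0 if the tag is not one of those listed in the table.
std::size_t cp_entry_size(const unsigned char* p) {
    switch (p[0]) {
        case CONSTANT_Utf8:       // tag + 2-byte length + that many bytes
            return 3 + ((static_cast<std::size_t>(p[1]) << 8) | p[2]);
        case CONSTANT_Integer:
        case CONSTANT_Float:      // tag + 4 bytes
            return 5;
        case CONSTANT_Long:
        case CONSTANT_Double:     // tag + 8 bytes (these also use up two pool indexes)
            return 9;
        case CONSTANT_Class:
        case CONSTANT_String:     // tag + one 2-byte index
            return 3;
        case CONSTANT_Fieldref:
        case CONSTANT_Methodref:
        case CONSTANT_InterfaceMethodref:
        case CONSTANT_NameAndType: // tag + two 2-byte indexes
            return 5;
        default:
            return 0;
    }
}
```

Starting at the first byte after the constant pool count and adding the returned size each time, a reader can walk across all the entries of the pool.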
Coming back to where we left off, the next byte after the constant pool count is 7, which means that the tag of the first structure is 7 (CONSTANT_Class). The next 2 bytes then give an index into the constant pool, and the entry at that index holds the name of the class. Here the next two bytes are 00 0f (15 in decimal), which means that the 15th structure holds the class name, i.e. aaa. Every structure that follows starts in the same way with a tag byte, which tells us what kind of information the structure holds: a class name, a method reference, a string such as the name of the Java source file, and so on.
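Thus when we get tag 7 we will consider the structure as
CONSTANT_Class_info
{
tag;
name_index;
}
Here name_index is the index of the constant pool entry, a CONSTANT_Utf8_info structure, in which the name of the class is stored. The second structure also starts with tag 7; its name_index is 17, the entry that holds the name java/lang/Object.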
The third structure starts with tag 10, which indicates the type
CONSTANT_Methodref. The next 4 bytes belong to it. The first 2 bytes are the
class_index, the number of the constant pool entry where we find the
CONSTANT_Class_info structure of the class that the method belongs to. In
our case the class_index bytes are 00 02; read in big-endian order they give
the number 2, so the CONSTANT_Class_info structure is entry 2. The next
2 bytes are 00 04, which is the name_and_type_index. Entry 4 is a
CONSTANT_NameAndType_info structure, which holds the name and the
descriptor of the method.
When we get the tag 10 we will consider the following structure.
CONSTANT_Methodref_info
{
tag;
class_index;
name_and_type_index;
}
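The five bytes of such an entry can be decoded as in the sketch below. The structure MethodrefInfo and the function parse_methodref are hypothetical names used only for this illustration, and the caller is assumed to supply at least five bytes.

```cpp
#include <cstdint>

// In-memory form of a CONSTANT_Methodref_info entry (illustrative only).
struct MethodrefInfo {
    std::uint8_t  tag;
    std::uint16_t class_index;
    std::uint16_t name_and_type_index;
};

// Decodes the 5 bytes starting at p. Returns false if the tag is not 10.
bool parse_methodref(const unsigned char* p, MethodrefInfo& out) {
    if (p[0] != 10) return false;
    out.tag = p[0];
    // Both indexes are 2-byte big-endian values.
    out.class_index = static_cast<std::uint16_t>((p[1] << 8) | p[2]);
    out.name_and_type_index = static_cast<std::uint16_t>((p[3] << 8) | p[4]);
    return true;
}
```

For the bytes 0A 00 02 00 04 of our third entry this gives a class_index of 2 and a name_and_type_index of 4.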
The fourth structure starts with the tag 12, which indicates the type
CONSTANT_NameAndType. This structure gives information about a method
without indicating which class it belongs to. Of the next 4 bytes, the
first 2 are the name index, which points to the constant pool entry where
a CONSTANT_Utf8_info structure holds the method name. In our case these
bytes are 00 07, and at entry 7 we get the method name <init>, the name
that a constructor has inside a class file.
The next 2 bytes give the descriptor index. It points to the constant pool
entry where a CONSTANT_Utf8_info structure holds the descriptor. In our
case these two bytes are 00 05, and at entry 5 we get the descriptor
(signature) "()V". This stands for a method that takes no arguments and
returns void.
When we get the tag 12 we will consider the entry as a
CONSTANT_NameAndType_info structure, which consists of the tag, the name
index and the descriptor index.
After that, from the 5th to the 17th structure, the tag we get is 1. It
indicates the type CONSTANT_Utf8. The next 2 bytes give the length of the
string that follows. Take the 5th structure: the two bytes after the tag 1
are 00 03, which gives a string length of 3. After those bytes the string
itself follows, and it is "()V". When we get the tag 1 we will consider the
entry as a CONSTANT_Utf8_info structure, which consists of the tag, the
2-byte length and then that many bytes of string data.
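A CONSTANT_Utf8 entry can be turned into a string as sketched below. The function name parse_utf8 and its avail parameter (the number of bytes left in the buffer) are our own assumptions. The sketch copies the bytes as they are, which is enough for plain ASCII names like the ones in our class file; the class file format uses a modified form of UTF-8, so other characters would need real decoding.

```cpp
#include <cstddef>
#include <string>

// Decodes the CONSTANT_Utf8 entry starting at p into `out`.
// `avail` is the number of bytes that may be read from p.
// Returns false if the tag is not 1 or the entry is cut short.
bool parse_utf8(const unsigned char* p, std::size_t avail, std::string& out) {
    if (avail < 3 || p[0] != 1) return false;
    // 2-byte big-endian length of the string data.
    std::size_t length = (static_cast<std::size_t>(p[1]) << 8) | p[2];
    if (avail < 3 + length) return false;
    // Raw copy: correct for ASCII, not a full modified UTF-8 decoder.
    out.assign(reinterpret_cast<const char*>(p + 3), length);
    return true;
}
```

The bytes 01 00 03 28 29 56 of our fifth entry decode to the string "()V".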
Besides these, depending on the code in the Java file we may get other tags.
Depending on those tags we have to follow a particular structure. For more
information refer to "The Java Virtual Machine Specification" by Tim Lindholm
and Frank Yellin. At present we have not considered the other tags and their
structures.
After the constant pool, two bytes give the access flags. They tell us how
the class or interface was declared. The lowest bit (0x0001) is set when
the class is declared public.
In our case these bytes are 00 20. The last byte, 0x20 (32 in decimal), can
be written in binary as 0010 0000. The lowest bit is not set, which
indicates that this is not a public class. If we write "public class aaa",
then the lowest bit will be set, indicating a public class. The bit that is
set here, 0x0020, is the ACC_SUPER flag, which affects how calls to super
class methods are handled. The other bits tell us whether the class is
final (0x0010), an interface (0x0200) or abstract (0x0400).
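The flag bits can be tested one by one with a bitwise AND. The sketch below covers only the class-level flags mentioned above; the function name describe_class_flags is our own, and the words it returns are chosen purely for display.

```cpp
#include <cstdint>
#include <string>

// Class access flag bits, as defined by the class file format.
const std::uint16_t ACC_PUBLIC    = 0x0001;
const std::uint16_t ACC_FINAL     = 0x0010;
const std::uint16_t ACC_SUPER     = 0x0020;
const std::uint16_t ACC_INTERFACE = 0x0200;
const std::uint16_t ACC_ABSTRACT  = 0x0400;

// Returns the names of the flags that are set, separated by spaces.
std::string describe_class_flags(std::uint16_t flags) {
    std::string s;
    auto add = [&s](const char* name) {
        if (!s.empty()) s += ' ';
        s += name;
    };
    if (flags & ACC_PUBLIC)    add("public");
    if (flags & ACC_FINAL)     add("final");
    if (flags & ACC_SUPER)     add("super");
    if (flags & ACC_INTERFACE) add("interface");
    if (flags & ACC_ABSTRACT)  add("abstract");
    return s;
}
```

For our value 0x0020 the result is "super"; a class declared public would give 0x0021 and the result "public super".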
The access flags are followed by this class (2 bytes). The value is an
index into the constant pool, where a CONSTANT_Class_info structure is
stored. In our case these bytes are 00 01.
Entry no. 1 in the constant pool consists of the bytes 07 00 0f, and the
corresponding entry no. 15 (0f in hex) consists of the bytes
01 00 03 61 61 61.
The value of this class is 1, which tells us to go to entry no. 1 in the
constant pool. In that entry, the tag 7 tells us to read the following two
bytes and then go to that particular constant pool entry. Here they give
the number 15. Entry 15 is a CONSTANT_Utf8_info structure holding the
3-character string "aaa". This shows that the name of this class is aaa.
The next two bytes indicate the super class. The bytes are 00 02.
Entry no. 2 in the constant pool consists of the bytes 07 00 11, and the
corresponding entry no. 17 (11 in hex) is a CONSTANT_Utf8_info structure
holding the 16-character string "java/lang/Object".
Thus the class is derived from java/lang/Object. Every class in Java is
ultimately derived from java/lang/Object. If we write the code as
"class aaa extends Applet", then the super class comes out to be
java/applet/Applet.
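Finding a class name therefore takes two hops: from the class index to a CONSTANT_Class entry, and from its name index to a CONSTANT_Utf8 entry. The sketch below assumes that the constant pool has already been decoded into a vector whose element 0 is unused; the structure CpEntry and the function class_name are hypothetical and much simpler than a real pool would be.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified, already-decoded constant pool entry (illustrative only).
struct CpEntry {
    std::uint8_t  tag;         // 7 = CONSTANT_Class, 1 = CONSTANT_Utf8
    std::uint16_t name_index;  // meaningful when tag == 7
    std::string   text;        // meaningful when tag == 1
};

// Follows class_index -> CONSTANT_Class -> CONSTANT_Utf8 and returns the name.
// Returns an empty string if an index is out of range or a tag is unexpected.
std::string class_name(const std::vector<CpEntry>& pool, std::uint16_t class_index) {
    if (class_index == 0 || class_index >= pool.size()) return "";
    const CpEntry& cls = pool[class_index];
    if (cls.tag != 7) return "";
    if (cls.name_index == 0 || cls.name_index >= pool.size()) return "";
    const CpEntry& name = pool[cls.name_index];
    return name.tag == 1 ? name.text : "";
}
```

With our pool, index 1 leads through entry 15 to "aaa", and index 2 leads through entry 17 to "java/lang/Object".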
The next two bytes give the interface count, i.e. how many interfaces the
class implements. In our case the bytes are 00 00, so the class implements
no interfaces.
The next two bytes give the number of fields, called the field count.
The bytes are 00 00, because in the present '.java' file we haven't
declared any fields.
The following two bytes give the method count. The bytes are 00 02, so
there are two methods in our class file. But if we look carefully at our
Java program, only one method is present, named abc(). This is because the
Java compiler adds a constructor to a class that does not declare one.
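The bytes following the method count make up the method_info structures.
In our case there are 2 methods. Therefore there are two method_info
structures. Each one consists of the access flags, the name index, the
descriptor index and the attribute count (2 bytes each), followed by the
attributes themselves.
The first two bytes are the access flags. Their value is 00 01, indicating
that the method is a public method.
The next two bytes are the name index. The bytes are 00 10, i.e. 16.
The corresponding entry no. 16 in the constant pool consists of the bytes
01 00 03 61 62 63, the 3-character string "abc".
So the name of the first method is 'abc'.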
After this we have the descriptor index (2 bytes). The bytes are 00 06.
The corresponding entry no. 06 in the constant pool consists of the bytes
01 00 04 28 49 29 56, the 4-character string "(I)V".
This is the signature of the method 'abc'. In (I)V, the 'I' within the
brackets indicates that one integer argument is passed to the method, and
the V after the closing bracket is the return type, void.
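Descriptors such as "(I)V" follow a small grammar: one letter per primitive type, 'L' followed by a class name and ';' for a class, and '[' in front of an array's element type. The sketch below turns a method descriptor into a readable form. The function names read_type and describe are our own, and the checking is deliberately lenient (for instance, it does not reject V as an argument type).

```cpp
#include <cstddef>
#include <string>

// Reads one type starting at s[pos] and advances pos past it.
// Returns an empty string if the input is malformed.
std::string read_type(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) return "";
    char c = s[pos++];
    switch (c) {
        case 'B': return "byte";
        case 'C': return "char";
        case 'D': return "double";
        case 'F': return "float";
        case 'I': return "int";
        case 'J': return "long";
        case 'S': return "short";
        case 'Z': return "boolean";
        case 'V': return "void";
        case 'L': {  // class type: L<name>;
            std::size_t end = s.find(';', pos);
            if (end == std::string::npos) return "";
            std::string name = s.substr(pos, end - pos);
            pos = end + 1;
            return name;
        }
        case '[': {  // array: '[' followed by the element type
            std::string elem = read_type(s, pos);
            return elem.empty() ? "" : elem + "[]";
        }
        default: return "";
    }
}

// Renders a method descriptor, e.g. "(I)V" becomes "void (int)".
// Returns an empty string if the descriptor is malformed.
std::string describe(const std::string& d) {
    if (d.empty() || d[0] != '(') return "";
    std::size_t pos = 1;
    std::string params;
    while (pos < d.size() && d[pos] != ')') {
        std::string t = read_type(d, pos);
        if (t.empty()) return "";
        if (!params.empty()) params += ", ";
        params += t;
    }
    if (pos >= d.size()) return "";  // closing bracket missing
    ++pos;                           // skip ')'
    std::string ret = read_type(d, pos);
    if (ret.empty() || pos != d.size()) return "";
    return ret + " (" + params + ")";
}
```

Here describe("(I)V") gives "void (int)" for our method abc, and describe("()V") gives "void ()" for the constructor.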
Now we have the attribute count (2 bytes). The bytes are 00 01.
Attributes describe the properties of the method.
The count is followed by the attribute name index (2 bytes). The bytes are
00 08. The corresponding entry no. 08 in the constant pool consists of the
bytes 01 00 04 43 6f 64 65, the 4-character string "Code".
This means that the attribute of the method is Code. In our class file each
method has only one attribute, and it is Code.
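The Code attribute consists of the attribute name index, the attribute
length, max stack, max locals, the code length, the code[] array, the
exception table length, the exception table, the attribute count and the
attributes.
We have already discussed the attribute name index (2 bytes).
The attribute length (4 bytes) gives the length of the rest of the
attribute, not counting the name index and the length itself. The bytes are
00 00 00 1f. Thus the attribute length is 31.
Next comes max stack (2 bytes). The bytes are 00 01.
Max stack is the maximum number of words on the operand stack at any point
during execution of the method.
After that, 2 bytes give max locals, the number of local variables used by
this method, including its arguments. The bytes are 00 03.
The following 4 bytes give the code length. The bytes are 00 00 00 03.
After that the code[] array follows, which consists of as many bytes as the
code length. After the code come 2 bytes for the exception table length,
which in our case is zero.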
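The attribute length of 31 can be checked by adding up the parts: 2 (max stack) + 2 (max locals) + 4 (code length) + 3 (code) + 2 (exception table length) + 2 (attribute count) + 16 (the nested LineNumberTable attribute) = 31. The sketch below reads the fixed parts of a Code attribute while checking that it never steps outside the attribute length. The structure CodeAttr and the functions u2, u4 and parse_code_attr are hypothetical names; the body pointer is assumed to point just after the 4-byte attribute length.

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-size parts of a Code attribute (illustrative only).
struct CodeAttr {
    std::uint16_t max_stack;
    std::uint16_t max_locals;
    std::uint32_t code_length;
    std::uint16_t exception_table_length;
    std::uint16_t attributes_count;
};

// Big-endian readers for 2-byte and 4-byte values.
static std::uint16_t u2(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
static std::uint32_t u4(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// `body` points just past attribute_length; `attr_len` is that length.
// Returns false if the fields do not fit inside attr_len bytes.
bool parse_code_attr(const unsigned char* body, std::uint32_t attr_len, CodeAttr& out) {
    if (attr_len < 12) return false;       // 2 + 2 + 4 + 2 + 2 with no code at all
    out.max_stack   = u2(body);
    out.max_locals  = u2(body + 2);
    out.code_length = u4(body + 4);
    std::uint64_t pos = 8 + static_cast<std::uint64_t>(out.code_length);  // skip code[]
    if (pos + 2 > attr_len) return false;
    out.exception_table_length = u2(body + pos);
    pos += 2 + 8ull * out.exception_table_length;  // each table entry is 4 x 2 bytes
    if (pos + 2 > attr_len) return false;
    out.attributes_count = u2(body + pos);
    return true;
}
```

The bytes of the code[] array are skipped here; interpreting them as instructions is a separate job.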
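The next 2 bytes are the attribute count of the Code attribute. The bytes
are 00 01.
After this, 2 bytes of name index follow. The bytes are 00 0b, i.e. 11.
Entry no. 11 in the constant pool holds the 15-character string
"LineNumberTable".
Thus the attribute inside the Code attribute is LineNumberTable.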
The LineNumberTable attribute consists of the attribute name index, the
attribute length, the line number table length and the line number table,
each entry of which holds a start pc and a line number.
The attribute name index (2 bytes) points to the constant pool entry where
a CONSTANT_Utf8_info structure is stored. This structure represents the
string "LineNumberTable".
It is followed by the attribute length (4 bytes), which is equal to 10 in
our case.
The line number table length (2 bytes) gives the number of entries in the
line number table array. From the actual bytes it comes out to be 2, which
accounts for the attribute length: 2 + 2 x 4 = 10.
In the first entry start pc (2 bytes) = 0 and line number (2 bytes) = 6.
In the second entry start pc (2 bytes) = 2 and line number (2 bytes) = 3.
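The table can be read entry by entry as in the sketch below. The structure LineEntry and the function parse_line_numbers are our own names, and the body pointer is assumed to point just after the 4-byte attribute length. The function also checks that the attribute length agrees with the number of entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One row of the line number table (illustrative only).
struct LineEntry {
    std::uint16_t start_pc;
    std::uint16_t line_number;
};

// `body` points just past attribute_length; `attr_len` is that length.
// Returns false if attr_len does not equal 2 + 4 * (number of entries).
bool parse_line_numbers(const unsigned char* body, std::uint32_t attr_len,
                        std::vector<LineEntry>& out) {
    if (attr_len < 2) return false;
    std::size_t count = (static_cast<std::size_t>(body[0]) << 8) | body[1];
    if (attr_len != 2 + 4 * count) return false;
    out.clear();
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned char* e = body + 2 + 4 * i;  // start of entry i
        LineEntry entry;
        entry.start_pc    = static_cast<std::uint16_t>((e[0] << 8) | e[1]);
        entry.line_number = static_cast<std::uint16_t>((e[2] << 8) | e[3]);
        out.push_back(entry);
    }
    return true;
}
```

For the 10 bytes 00 02 00 00 00 06 00 02 00 03 this gives the two entries (0, 6) and (2, 3) described above.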
Then the 2nd method starts, where we get the method name <init>, i.e. the
constructor.
Its Code attribute is again followed by a LineNumberTable attribute.
Different attributes appear in a class file depending on the Java code.
Among them are the Exceptions attribute, the LocalVariableTable attribute,
the LineNumberTable attribute, the ConstantValue attribute and the Code
attribute.
After the 2 methods, the attribute count of the class file itself (2 bytes)
follows. In our case it is 1.
Then the SourceFile attribute structure follows, which consists of the
attribute name index, the attribute length and the source file index.
In our case the attribute name index bytes are 00 0d, i.e. 13.
Entry no. 13 in the constant pool holds the 10-character string
"SourceFile".
Thus we can find out what the attribute is.
The next 4 bytes give the attribute length. In our case it is 2.
This is followed by the source file index (2 bytes). The bytes are 00 0e,
i.e. 14.
Entry no. 14 in the constant pool holds the 6-character string "a.java".
Thus the source file index points to the name of the ".java" source file,
i.e. a.java.
This is how the Java compiler stores the bytes of a class file in a particular format.
We are grateful to Mr. Vijay Mukhi and Ms. Sonal Kotecha for their active support,
co-operation and guidance. We thank them for supervising the project and
helping us solve the complicated issues of the project.
Vijay Mukhi's Computer
Institute
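The structures and byte listings referred to in the text are given below, in
the order in which the text refers to them.
Structure for tag 12:
CONSTANT_NameAndType_info
{
tag;
name_index;
descriptor_index;
}
Structure for tag 1:
CONSTANT_Utf8_info
{
tag;
length;
bytes[length];
}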
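Access flags of the class:
HEX DEC
00 00
20 32
This class:
HEX DEC
00 00
01 01
Constant pool entry no. 1:
HEX DEC
07 07
00 00
0f 15
Constant pool entry no. 15:
HEX DEC CHAR
01 01
00 00
03 03
61 97 a
61 97 a
61 97 a
Super class:
HEX DEC
00 00
02 02
Constant pool entry no. 2:
HEX DEC
07 07
00 00
11 17
Constant pool entry no. 17 (tag, 2-byte length, then the 16 characters):
HEX DEC CHAR
01 01
00 00
10 16
java/lang/Object
Interface count:
HEX DEC
00 00
00 00
Field count:
HEX DEC
00 00
00 00
Method count:
HEX DEC
00 00
02 02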
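The method_info structure:
method_info
{
access flags;
name index;
descriptor index;
attribute count;
attribute_info attributes[attribute count];
}
Name index of the first method:
HEX DEC
00 00
10 16
Constant pool entry no. 16:
HEX DEC CHAR
01 01
00 00
03 03
61 97 a
62 98 b
63 99 c
Descriptor index of the first method:
HEX DEC
00 00
06 06
Constant pool entry no. 06:
HEX DEC CHAR
01 01
00 00
04 04
28 40 (
49 73 I
29 41 )
56 86 V
Attribute count of the first method:
HEX DEC
00 00
01 01
Attribute name index:
HEX DEC
00 00
08 08
Constant pool entry no. 08:
HEX DEC CHAR
01 01
00 00
04 04
43 67 C
6f 111 o
64 100 d
65 101 e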
The Code attribute structure:
code_attribute
{
attribute name index;
attribute length;
max stack;
max locals;
code length;
code[code length];
exception table length;
{
start pc;
end pc;
handler pc;
catch type;
} exception table[exception table length];
attribute count;
attribute_info attributes[attribute count];
}
Attribute length:
HEX DEC
00 00
00 00
00 00
1f 31
Max stack:
HEX DEC
00 00
01 01
Max locals:
HEX DEC
00 00
03 03
Code length:
HEX DEC
00 00
00 00
00 00
03 03
Attribute count of the Code attribute:
HEX DEC
00 00
01 01
Attribute name index:
HEX DEC
00 00
0b 11
Constant pool entry no. 11 (tag, 2-byte length, then the 15 characters):
HEX DEC CHAR
01 01
00 00
0f 15
LineNumberTable
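The LineNumberTable attribute structure:
LineNumberTable_attribute
{
attribute name index;
attribute length;
line number table length;
{
start pc;
line number;
} line number table[line number table length];
}
Method name of the second method: <init>
The SourceFile attribute structure:
SourceFile_attribute
{
attribute name index;
attribute length;
source file index;
}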
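Attribute name index of the class attribute:
HEX DEC
00 00
0d 13
Constant pool entry no. 13 (tag, 2-byte length, then the 10 characters):
HEX DEC CHAR
01 01
00 00
0a 10
SourceFile
Attribute length:
HEX DEC
00 00
00 00
00 00
02 02
Source file index:
HEX DEC
00 00
0e 14
Constant pool entry no. 14 (tag, 2-byte length, then the 6 characters):
HEX DEC CHAR
01 01
00 00
06 06
a.java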
This project & tutorial is a joint effort of
VMCI, B-13, Everest Building, Tardeo, Mumbai 400 034, India
Tel : 91-22-496 4335 /6/7/8/9
Fax : 91-22-307 28 59
e-mail : vmukhi@giasbm01.vsnl.net.in
http://www.vijaymukhi.com