The .class File


The deeper you delve into the innards of a particular technology, the better you'll understand it. Computer technologies are like the open expanse of the sea; it all looks the same on the surface, but when you dive into the deep, you discover wonders you'd never have dreamt existed.

The Java programming language has no pointers, so it's very difficult for programmers to find out what's happening under the hood. The .class file comes tumbling across the wires and executes on your machine, but it's difficult to figure out how. An in-depth look at the .class file will clear up a lot of your doubts. For example, in C/C++ when we call printf() and link statically, we know that the linker copies the code for that function into our .exe file, but that's not what happens when we say System.out.println in Java. You can only discover what happens instead by reading the .class file.

If you understand the format of the .class file, you'll find it easier to understand how Microsoft turned Java into a COM Object. You'll also be able to better appreciate our explanation of JavaBeans and other related technology from Sun.

If you really want to call yourself a Java programmer, then you have to read this!

zzz.java

class aaa
{
public void abc(int i)
{
}
}

Here is a simple .java file which really does nothing of any significance. When compiled using javac, it turns into a .class file called aaa.class. This process will be quite familiar to anyone who's ever programmed using Java. Once the .class file is ready, all you have to do is place the file on a Web server and place the applet tag in the appropriate HTML file.

Whenever someone downloads your HTML page, his browser, if it is Java-enabled, will read through your page and, on finding the <applet> tag, download the .class file. Once the file has been fully downloaded, the browser will run the applet. If you placed this very file on your Web server, though, you'd have one very disappointed visitor: aaa doesn't extend Applet, and it does nothing!

What we wanted to clarify here was that it is a file called the .class file which actually comes across the Internet and is executed on your machine. The program that actually executes the .class file is called the Java Virtual Machine. This is the program that interprets the bytes in the .class file and carries out the instructions detailed in them.

Now Intel has its own machine language instruction set, which is used on its line of x86 processors. No one really programs in machine language any more, and very few people dabble in assembler, which is just a teeny weeny improvement on machine language. Machine language is made up of the raw bytes which actually constitute every program. Even if you write a program in C/C++, eventually (through the labors of the compiler) the C/C++ program is converted into a machine language program. If, for example, an x86 processor running 16-bit code sees the bytes B8 04 00, it knows that it's supposed to put the number 4 into the AX register. This, however, only applies to an Intel machine. A Motorola processor will react differently because it follows a different instruction set.

We can, if we want, build this functionality into a piece of software instead of a chip. The main difference is that the instruction set embedded in hardware will outperform the software implementation. So I can take the Intel instruction set, build it into software instead of hardware, and have Intel-based programs work on any machine I want. Such a program is called an emulator or a Virtual Machine.

What the developers of Java did was invent a hypothetical microprocessor. Each instruction for this processor starts with a one-byte opcode, sometimes followed by a few operand bytes. The original Java team did this because if they'd stuck to an established instruction set, moving to other processors would be a problem; better to start with a clean slate. Since the opcodes are a byte large, they're called Byte Codes. So a class file contains byte codes, and these byte codes are interpreted and acted upon by the JVM, the Java Virtual Machine (or Engine).

The JVM is built into browsers like Netscape and Internet Explorer so they know how to run and execute .class files. The Java instruction set has also been built into microprocessors, so you can buy Java chips in the market. These will probably be used in Sun's Network Computer.

The actual bytes

aaa.class

Hex                         Comments
ca fe ba be                 Magic word
00 03                       Version No.: minor version 3
00 2d                       Version No.: major version 45
00 12                       Constant Pool count: 18
07 00 0e                    Entry 1: tag 7, 14
07 00 10                    Entry 2: tag 7, 16
0a 00 02 00 04              Entry 3: tag 10, 2, 4
0c 00 07 00 05              Entry 4: tag 12, 7, 5
01 00 03 28 29 56           Entry 5: tag 1, length 3, "()V"
01 00 04 28 49 29 56        Entry 6: tag 1, length 4, "(I)V"
01 00 06 3c 69 6e 69 74     Entry 7: tag 1, length 6, "<init>"
3e
01 00 04 43 6f 64 65        Entry 8: tag 1, length 4, "Code"
01 00 0d 43 6f 6e 73 74     Entry 9: tag 1, length 13, "ConstantValue"
61 6e 74 56 61 6c 75 65
01 00 0a 45 78 63 65 70     Entry 10: tag 1, length 10, "Exceptions"
74 69 6f 6e 73
01 00 0f 4c 69 6e 65 4e     Entry 11: tag 1, length 15, "LineNumberTable"
75 6d 62 65 72 54 61 62
6c 65
01 00 0e 4c 6f 63 61 6c     Entry 12: tag 1, length 14, "LocalVariables"
56 61 72 69 61 62 6c 65
73
01 00 0a 53 6f 75 72 63     Entry 13: tag 1, length 10, "SourceFile"
65 46 69 6c 65
01 00 03 61 61 61           Entry 14: tag 1, length 3, "aaa"
01 00 03 61 62 63           Entry 15: tag 1, length 3, "abc"
01 00 10 6a 61 76 61 2f     Entry 16: tag 1, length 16, "java/lang/Object"
6c 61 6e 67 2f 4f 62 6a
65 63 74
01 00 08 7a 7a 7a 2e 6a     Entry 17: tag 1, length 8, "zzz.java"
61 76 61
00 20                       flags: 0x0020 (ACC_SUPER; the class is not public)
00 01                       this: entry 1
00 02                       super: entry 2
00 00                       interfaces: 0
00 00                       fields: 0
00 02                       methods: 2
00 01                       flags: public
00 0f                       name of the fn: entry 15, abc
00 06                       signature: entry 6, (I)V
00 01                       no. of attributes: 1
00 08                       name: entry 8, Code
00 00 00 19                 len of the attribute: 25
00 00                       stack: 0
00 02                       local variables: 2
00 00 00 01                 len of the code: 1
b1                          Actual Code: 177 (return)
00 00                       exceptions: 0
00 01                       no. of attributes: 1
00 0b                       name: entry 11, Line Number Table
00 00 00 06                 Len: 6
00 01                       No. of members: 1
00 00 00 03                 Line number of the function: start pc 0, line 3
00 00                       flags: none
00 07                       name: entry 7, <init>
00 05                       signature: entry 5, ()V
00 01                       no. of attributes: 1
00 08                       name: entry 8, Code
00 00 00 1d                 length: 29
00 01                       Stack space: 1
00 01                       Local Variable: 1
00 00 00 05                 len of the code: 5
2a b7 00 03 b1              Actual Code: 42, 183, 0, 3, 177
00 00                       exceptions: 0
00 01                       No. of attributes: 1
00 0b                       name: entry 11, Line Number Table
00 00 00 06                 Len: 6
00 01                       No. of members: 1
00 00 00 01                 Line Number: start pc 0, line 1
00 01                       No. of Attributes: 1
00 0d                       name: entry 13, SourceFile
00 00 00 02                 Length: 2
00 11                       entry 17, zzz.java

We'll examine our earlier program, aaa.class, byte by byte and try to understand exactly what the Java Virtual Machine is up to and what the .class file format is. Once you understand this, stuff like JavaBeans becomes much easier to handle. This may seem a daunting task, but the file's only 278 bytes long, so it won't drag on forever!

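The .class file format is laid down in Sun's Java Virtual Machine Specification, so the structure we describe here holds true for every Java compiler. The exact bytes can differ from one compiler (or compiler version) to another: the order of the Constant Pool, which optional attributes get written and the version number may all change. The structure, however, stays the same.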

To follow along, first write this program to display the bytes in decimal, hex and ASCII, or use Pctools, Nu or some other utility.

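#include <stdio.h>
#include <ctype.h>

int main(int argc, char *argv[])
{
    /* File to dump: aaa.class unless another name is given.
       (Under DOS the short name may be aaa~1.cla.) */
    const char *name = (argc > 1) ? argv[1] : "aaa.class";
    FILE *fp = fopen(name, "rb");
    int i;
    if (fp == NULL) { perror(name); return 1; }
    /* One byte per line: decimal, hex, then the character ('.' if unprintable). */
    while ((i = fgetc(fp)) != EOF)
        printf("%d \t %x \t %c \n", i, i, isprint(i) ? i : '.');
    fclose(fp);
    return 0;
}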

Stepping through the bytes...

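Throughout this tutorial, u1 stands for a single unsigned byte, u2 for two bytes (which we'll loosely call an int, as a 16-bit DOS C compiler would) and u4 for four bytes (a long). Multi-byte values are always stored big-endian, with the most significant byte first, so the two bytes 00 12 mean 0x0012, or 18.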

Let's jump right in and start explaining the bytes which make up the .class file. The very first four bytes of the .class file are:-

hex  dec
ca   202
fe   254
ba   186
be   190

CAFE BABE... Get it? Lucky for James Gosling he doesn't live in India because if he said this over here, the local 'Mahila Mandal' -- A militant form of Women's Liberation whose members roam the streets armed with large rolled up newspapers, looking for unescorted young males -- would have his head!!

This bit of trivia can come in handy sometimes. Some Java sites (and employers!) demand that you know the meaning of these bytes, or they assume that you really don't know much about Java.

0xCA 0xFE 0xBA 0xBE are the first four bytes of any Java .class file, and they're collectively called the magic number. They were added to the file format so that a .class file could be instantly recognized by its first few bytes. If these bytes are tampered with, no JVM will recognize the file as a class file, and it will refuse to load it.
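If you'd like to check for the magic number in your own C code, here is a minimal sketch. It assumes the file has already been read into a byte buffer; the function name has_class_magic is our own and not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf begins with the class-file magic number 0xCAFEBABE.
   Like every multi-byte value in a class file it is big-endian, so the
   bytes appear in the order CA FE BA BE. */
int has_class_magic(const uint8_t *buf, size_t len)
{
    uint32_t magic;

    if (len < 4)
        return 0;                       /* too short to hold the magic */
    magic = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
            ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    return magic == 0xCAFEBABEu;
}
```

A loader would call this before anything else and give up on the file if it returns 0; for aaa.class it returns 1.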

The next 4 bytes are as follows

hex  dec
00   0
03   3
00   0
2d   45

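These are the version number of the file. The minor version comes first, so 0x0003 (3) is the minor version number and 0x002D (45) is the major version number. Written the usual way, major first, this .class file has version 45.3, the version written by the early JDK compilers (JDK 1.0.2 and 1.1, for instance). A JVM refuses to load a file whose major version is newer than the ones it supports.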
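Because the minor version comes before the major one, it's easy to print them the wrong way round. This sketch (function names are our own) reads both u2 values big-endian from a buffer holding the start of the file and formats them major first.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Reads a big-endian u2: most significant byte first. */
static unsigned read_u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* buf holds at least the first 8 bytes of a class file: the magic number,
   then the minor version, then the major version. Writes "major.minor". */
void class_version(const uint8_t *buf, char *out, size_t outlen)
{
    unsigned minor = read_u2(buf + 4);
    unsigned major = read_u2(buf + 6);

    snprintf(out, outlen, "%u.%u", major, minor);
}
```

For the first eight bytes of aaa.class, CA FE BA BE 00 03 00 2D, class_version() writes 45.3.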

Immediately following this is an int with the value :-

hex  dec
00   0
12   18

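This is the Constant Pool count, 0x12 (18). The count is one more than the number of entries that follow: the entries are numbered from 1 to 17, and entry number 0 is never used, so there are actually only 17 of them. The length of the individual entries isn't stored here either, and entries differ in size, so the only way to find where one ends is to read its tag and work it out.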

The name of this huge array is the Constant Pool. The Constant Pool stores constant values, like function names and strings, which are referred to repeatedly. Rather than put the whole string in the code every time you wish to refer to it, Java uses an index, the entry's number in the Constant Pool. We'll see its usefulness as we proceed.

The values stored in the Constant Pool are:-
Hex                         Comments
07 00 0e                    Entry 1: tag 7, 14
07 00 10                    Entry 2: tag 7, 16
0a 00 02 00 04              Entry 3: tag 10, 2, 4
0c 00 07 00 05              Entry 4: tag 12, 7, 5
01 00 03 28 29 56           Entry 5: tag 1, length 3, "()V"
01 00 04 28 49 29 56        Entry 6: tag 1, length 4, "(I)V"
01 00 06 3c 69 6e 69 74     Entry 7: tag 1, length 6, "<init>"
3e
01 00 04 43 6f 64 65        Entry 8: tag 1, length 4, "Code"
01 00 0d 43 6f 6e 73 74     Entry 9: tag 1, length 13, "ConstantValue"
61 6e 74 56 61 6c 75 65
01 00 0a 45 78 63 65 70     Entry 10: tag 1, length 10, "Exceptions"
74 69 6f 6e 73
01 00 0f 4c 69 6e 65 4e     Entry 11: tag 1, length 15, "LineNumberTable"
75 6d 62 65 72 54 61 62
6c 65
01 00 0e 4c 6f 63 61 6c     Entry 12: tag 1, length 14, "LocalVariables"
56 61 72 69 61 62 6c 65
73
01 00 0a 53 6f 75 72 63     Entry 13: tag 1, length 10, "SourceFile"
65 46 69 6c 65
01 00 03 61 61 61           Entry 14: tag 1, length 3, "aaa"
01 00 03 61 62 63           Entry 15: tag 1, length 3, "abc"
01 00 10 6a 61 76 61 2f     Entry 16: tag 1, length 16, "java/lang/Object"
6c 61 6e 67 2f 4f 62 6a
65 63 74
01 00 08 7a 7a 7a 2e 6a     Entry 17: tag 1, length 8, "zzz.java"
61 76 61

The first byte of each entry is the tag, a number which tells us what kind of data follows. A tag of 1 tells us that the data to follow is a string. After the tag comes the length of the string, which takes two bytes, and after that comes the string itself. The other entries start with a tag too, but their size is fixed by the tag, so no length is stored: a 7 means that one int (an index into the Constant Pool) follows, while 10 and 12 mean two ints come next, and so on.

Constant Pool Tags

Constant Type                  Value
CONSTANT_Class                 7
CONSTANT_Methodref             10
CONSTANT_Utf8 (octet string)   1
CONSTANT_NameAndType           12

CONSTANT_Class
{
	u1  tag (7)
	u2  name_index  ( 0 14)
}

CONSTANT_Methodref
{
	u1  tag (10)
	u2  class_index  ( 0 2)
	u2  name_and_type_index  (0 4)
}

CONSTANT_Utf8
{
	u1  tag (1)
	u2  len  ( 0 3)
	u1 [] bytes ( ( ) V )
}

CONSTANT_NameAndType
{
	u1  tag (12)
	u2  name_index  ( 0 7)
	u2  descriptor_index  ( 0 5)
}
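Since the count doesn't give entry sizes, a reader has to walk the pool entry by entry, working out each size from its tag. The sketch below handles only the four tags that appear in aaa.class and doesn't check buffer bounds; a real reader would also need the other tags (Fieldref, String, Integer and so on), and would have to remember that Long and Double entries take up two slots each. The function names are our own.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Size in bytes of the constant-pool entry at p, tag included, for the four
   tags found in aaa.class. Returns 0 for any other tag (not handled here). */
size_t cp_entry_size(const uint8_t *p)
{
    switch (p[0]) {
    case 1:  return 3 + u2(p + 1);  /* Utf8: tag, u2 length, then the bytes */
    case 7:  return 3;              /* Class: tag, u2 name_index */
    case 10:                        /* Methodref: tag, u2 class, u2 name-and-type */
    case 12: return 5;              /* NameAndType: tag, u2 name, u2 descriptor */
    default: return 0;
    }
}

/* Walks the count-1 entries starting at p (count is the constant_pool_count
   from the file). Entries are numbered from 1, so offsets needs room for
   count elements; offsets[i] receives the position of entry i relative to p.
   Returns the bytes consumed, or 0 if an unknown tag is met. */
size_t walk_constant_pool(const uint8_t *p, unsigned count, size_t offsets[])
{
    size_t off = 0;
    unsigned i;

    for (i = 1; i < count; i++) {
        size_t n = cp_entry_size(p + off);
        if (n == 0)
            return 0;
        offsets[i] = off;
        off += n;
    }
    return off;
}
```

walk_constant_pool() fills in an offsets table, so any later index, such as the 14 in entry 1, can be turned straight into a position in the buffer. Run over the whole pool of aaa.class with a count of 18, it should consume 164 bytes, leaving the flags field at offset 174 in the file.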

The first section of the .class file is now over.

If you remember, our file looks a bit like this:-

class aaa
{
public void abc(int i)
{
}
}

We'll be getting into the meat of the matter now.

Notice that we haven't said public class aaa.

Now come two bytes whose first byte is 0x00 and second byte is 0x20. This is the access flags field, and it tells us about the class's access level and other properties.

hex  dec
00   0
20   32

If we were to write out the bits which make up the int, we'd get

0000000000100000

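In other words, only one bit in the two bytes is turned on: 0x0020, the ACC_SUPER flag. Compilers set this flag to tell the JVM how to treat calls to superclass methods made with the invokespecial instruction; it says nothing about access. Access is given by the rightmost bit, 0x0001 (ACC_PUBLIC), which is 1 for a public class. Since this bit is 0, aaa has default (package) access and is visible only to classes in the same package. It isn't private, because a top-level class can never be private. All the other flags are zero.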
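Decoding the flags is just a matter of testing bits. This sketch covers the class-level flags defined in the JVM specification; describe_class_flags is a hypothetical helper of our own, not a library function.

```c
#include <string.h>

/* Class access flags, as defined in the JVM specification. */
#define ACC_PUBLIC    0x0001
#define ACC_FINAL     0x0010
#define ACC_SUPER     0x0020
#define ACC_INTERFACE 0x0200
#define ACC_ABSTRACT  0x0400

/* Writes the names of the flags set in `flags`, separated by spaces, into
   out (outlen must be at least 1). Unknown bits are ignored. */
void describe_class_flags(unsigned flags, char *out, size_t outlen)
{
    static const struct { unsigned bit; const char *name; } table[] = {
        { ACC_PUBLIC, "public" },   { ACC_FINAL, "final" },
        { ACC_SUPER, "super" },     { ACC_INTERFACE, "interface" },
        { ACC_ABSTRACT, "abstract" },
    };
    size_t i;

    out[0] = '\0';
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (!(flags & table[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, table[i].name, outlen - strlen(out) - 1);
    }
}
```

For aaa.class's 0x0020 it writes super; had we declared public class aaa, the field would be 0x0021 and the result public super.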

Next we have two more bytes :-

hex  dec
00   0
01   1

This int is this_class, which identifies the class defined in this file. It has nothing to do with pointers: like nearly everything else in the file, it's an index into the Constant Pool. Here the number is 1, which means we're supposed to go to the first array in the Constant Pool and refer to its value.

hex  dec
07   7
00   0
0e   14

hex  dec  Char  No.
01   1          14
00   0
03   3
61   97   a
61   97   a
61   97   a

Entry 1 starts with a 7, the tag for CONSTANT_Class. This means that the number following it is an index to a value in the Constant Pool: we're supposed to jump to the 14th location in the Constant Pool. There we find the string 'aaa', which is the name of our class. So this_class leads, by way of entry 1, to where the name of our class is stored.
hex  dec
00   0
02   2

After this_class we have another int, super_class. A superclass is the same as a base class in C++. So when we say 'class aaa extends Applet', Applet is our superclass. The value here is 2, so we go to the second entry in the Constant Pool. That's another CONSTANT_Class entry, and it holds the number 16, which tells us to jump to the 16th array in the Pool. That's a string, and it's pretty large too: java/lang/Object, the class every other class is derived from. Had we said 'class aaa extends Applet', the string java/applet/Applet would have come here. Note that class names inside a .class file use slashes where Java source uses dots.

hex  dec
07   7
00   0
10   16

hex  dec  Char  No.
01   1          16
00   0
10   16
6a   106  j
61   97   a
76   118  v
61   97   a
2f   47   /
6c   108  l
61   97   a
6e   110  n
67   103  g
2f   47   /
4f   79   O
62   98   b
6a   106  j
65   101  e
63   99   c
74   116  t
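Following this_class or super_class by hand takes two hops: first to the CONSTANT_Class entry, then to the CONSTANT_Utf8 entry it names. The sketch below assumes you already have an offsets table giving each entry's position in the pool, indexed by entry number (for instance one built by walking the pool); the function names are our own. It copies names byte for byte, which is only right for plain ASCII, since the pool actually uses a modified form of UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Copies Utf8 entry idx into out as a C string. offs[idx] is the entry's
   position in pool. Returns 0, or -1 if the entry isn't a Utf8 (tag 1) or
   doesn't fit. Bytes are copied as-is, which is right for ASCII names. */
int cp_utf8(const uint8_t *pool, const size_t offs[], unsigned idx,
            char *out, size_t outlen)
{
    const uint8_t *e = pool + offs[idx];
    size_t len;

    if (e[0] != 1)
        return -1;
    len = u2(e + 1);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, e + 3, len);
    out[len] = '\0';
    return 0;
}

/* Resolves a CONSTANT_Class index, such as this_class or super_class, to the
   class name stored in the Utf8 entry that its name_index points at. */
int cp_class_name(const uint8_t *pool, const size_t offs[], unsigned idx,
                  char *out, size_t outlen)
{
    const uint8_t *e = pool + offs[idx];

    if (e[0] != 7)
        return -1;
    return cp_utf8(pool, offs, u2(e + 1), out, outlen);
}
```

Called with this_class (1) on aaa.class, cp_class_name() would produce aaa, and with super_class (2) it would produce java/lang/Object.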

Java's Reflection API gives a running program much of the same information: a class's name, its superclass, its functions and their parameters. Here we're digging that information out of the raw bytes ourselves. If you were to implement this in C/C++, you'd have written your own class file reader. In fact, that's exactly what some students from our institute did. Jump over to their code later.

The next two bytes tell you the number of interfaces.

hex  dec
00   0
00   0

Both of these bytes are zero, since aaa doesn't implement any interfaces.

Now come two bytes for the fields. A field is a variable declared in the class itself, outside any function. Both bytes are zero since we don't have any.

hex  dec
00   0
00   0

The next two bytes hold the number of methods, or functions, present in your class. Even though you've only written one, the value here is two, because the second one is a default constructor that the compiler supplies free of charge to any class that doesn't declare one.

hex  dec
00   0
02   2

Next come the access flags of the method, i.e. whether it is public or private. We said abc() was public, so this time the flags hold 0x0001, the public bit.

hex  dec
00   0
01   1

Now come two bytes for the name index

hex  dec
00   0
0f   15

Since the value contained within the int is 0x0F, or 15, we go to the 15th array in the Constant Pool. There we have a string which holds the name of our function, abc.
hex  dec  Char  No.
01   1          15
00   0
03   3
61   97   a
62   98   b
63   99   c

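Now come two bytes for the signature. In both C++ and Java, a function is identified by its name together with its parameter types; C++ compilers mangle the two into a single name, while a .class file stores them separately, the name in one Constant Pool entry and the signature in another. The value here is 6, so we're pointed to the 6th array in the Pool. There we find the string (I)V, which means our function takes an int and returns void.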

hex  dec
00   0
06   6

hex  dec  Char  No.
01   1          6
00   0
04   4
28   40   (
49   73   I
29   41   )
56   86   V
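A signature (the JVM specification calls it a descriptor) is easy to take apart: the parameter types sit between the parentheses and the return type follows. Base types are single letters (I for int, V for void and so on), an object type runs from L to a semicolon, and [ marks an array. This sketch, with a function name of our own, counts the parameters and picks out the first character of the return type.

```c
/* Counts the parameters in a method descriptor such as "(I)V" or
   "(Ljava/lang/String;[I)V" and stores the first character of the return
   type in *ret ('V' void, 'I' int, 'L' object, '[' array, ...).
   Returns the number of parameters, or -1 if the descriptor is malformed. */
int parse_method_descriptor(const char *d, char *ret)
{
    int count = 0;

    if (*d++ != '(')
        return -1;
    while (*d != ')') {
        while (*d == '[')               /* array dimensions prefix the type */
            d++;
        if (*d == '\0')
            return -1;
        if (*d == 'L') {                /* object type: runs up to the ';' */
            while (*d != ';' && *d != '\0')
                d++;
            if (*d == '\0')
                return -1;
        }
        d++;                            /* step past the base type or ';' */
        count++;
    }
    if (d[1] == '\0')
        return -1;                      /* no return type */
    *ret = d[1];
    return count;
}
```

For abc's (I)V it returns one parameter and the return type V; for the constructor's ()V it returns zero parameters.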

Now we have two bytes for the attributes of the function. These two bytes tell us the number of attributes, i.e. 1 in this case.

hex  dec
00   0
01   1

Now come two bytes which point to the attribute name in the Constant Pool.

hex  dec
00   0
08   8

Since the number here is 0x08, we go to the 8th array in the Constant Pool.
hex  dec  Char  No.
01   1          8
00   0
04   4
43   67   C
6f   111  o
64   100  d
65   101  e

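So our function has one attribute called Code. This is natural, since Code is where a function's byte codes live. Take almost any .class file you want and the functions listed there will have the Code attribute; the exceptions are abstract and native methods, which have no body in the .class file and so no Code attribute.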

The next four bytes detail the length of the attribute called Code, which is 0x19 (25) bytes.

hex  dec
00   0
00   0
00   0
19   25

Now come the contents of the Code attribute itself.

hex            dec   Meaning
00 00          0     stack
00 02          2     local variables
00 00 00 01    1     length of the code
b1             177   code
00 00          0     exceptions
00 01          1     attributes
00 0b          11    Line Number Table
00 00 00 06    6     length of the attribute
00 01          1     no. of members
00 00 00 03    0, 3  line number of the function (start pc, line)

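The first two bytes hold the amount of stack space, the deepest the operand stack ever gets while the function runs. Since abc() doesn't push anything on the stack, the value is zero. The next int holds the number of local variable slots, and it's 2, not zero: an instance method always receives this in slot 0, and our parameter i sits in slot 1.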

Now comes the length of our code, and this takes up a whole long (4 bytes). The value held in those four bytes is 1, because our code is only one instruction long. Then comes the single Byte Code, 177 (return), which means 'return no value'. It's there because abc() is declared void and its body is empty, so the compiler adds an implicit return at the end.

Next we have two bytes for the length of the exception table. Each entry in this table describes a try/catch handler: the range of byte codes it protects, where the handler's code starts and which exception class it catches. abc() has no try blocks, so the value here is zero.

The next two bytes hold 0x01, the number of attributes that the Code attribute itself contains. The two bytes after that point to the 11th array in the Constant Pool, which is the Line Number Table shown below.

hex  dec  Char  No.
01   1          11
00   0
0f   15
4c   76   L
69   105  i
6e   110  n
65   101  e
4e   78   N
75   117  u
6d   109  m
62   98   b
65   101  e
72   114  r
54   84   T
61   97   a
62   98   b
6c   108  l
65   101  e

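The next four bytes hold the number 6, which is the length of the attribute. The next 0x01 tells us how many entries the line number table holds, i.e. 1. The entry that comes next gives the line number of the function abc(). The JVM uses this table to show line numbers in stack traces when an exception is thrown, and debuggers use it to show which line is executing. The line number bytes (there are four of them) are not one big long number. They're actually two ints. The first is start_pc, the offset of a byte code within the function's code; the second is the line in the source file that the byte codes from that offset onward came from. So when the value is

hex  dec  Meaning
00   0
00   0    start_pc
00   0
03   3    Line Number

it means that the code starting at offset 0 comes from line three, but if it were

hex  dec  Meaning
00   0
64   100  start_pc
00   0
03   3    Line Number

it would mean that the byte code at offset 100 is where line three begins.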
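Here is how a debugger or stack-trace printer might use the table: find the entry with the largest start_pc that is not past the byte code offset it cares about. This is a sketch with a function name of our own; it expects a pointer to the table's u2 entry count.

```c
#include <stdint.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* table points at a LineNumberTable's u2 entry count; each entry that follows
   is a u2 start_pc and a u2 line_number. Returns the source line for the
   instruction at bytecode offset pc: the line of the entry with the largest
   start_pc not greater than pc. Returns -1 if no entry covers pc. */
int line_for_pc(const uint8_t *table, unsigned pc)
{
    unsigned n = u2(table), i;
    unsigned best_pc = 0;
    int line = -1;

    for (i = 0; i < n; i++) {
        const uint8_t *e = table + 2 + 4 * i;
        unsigned start_pc = u2(e);

        if (start_pc <= pc && (line < 0 || start_pc >= best_pc)) {
            best_pc = start_pc;
            line = (int)u2(e + 2);
        }
    }
    return line;
}
```

For abc's table (one entry: start_pc 0, line 3), line_for_pc() gives 3 for offset 0, where the function's only instruction, return, lives.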

This is the structure every function follows.

Member              Data type   Bytes (decimal)
Flags               u2          00 01
Name                u2          00 15
Signature           u2          00 06
No. of attributes   u2          00 01

First comes the flags field, then the pointer to the name in the Constant Pool, then the signature and finally the number of attributes for the function.

Then comes the attribute structure. Every attribute starts with a u2 name and a u4 length; what follows depends on the attribute. We've got only one attribute, Code, so the structure appears only once.

Member                Data type   Bytes (decimal)
Name of attribute     u2          00 08
Len of attribute      u4          00 00 00 25
Stack                 u2          00 00
Local Variables       u2          00 02
Len of Code           u4          00 00 00 01
Actual Code           u1          177
Exceptions            u2          00 00
No. of Attributes     u2          00 01
Details of Attribute

Details of Attribute
Member                Data type   Bytes (decimal)
Name of Attribute     u2          00 11
Length of Attribute   u4          00 00 00 06
No. of members        u2          00 01
start_pc              u2          00 00
Line number           u2          00 03
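The fixed-size start of the Code attribute can be read straight off the buffer. This sketch (the struct and function names are our own) expects a pointer to the attribute's u2 name index and does no bounds checking.

```c
#include <stdint.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

static unsigned long u4(const uint8_t *p)
{
    return ((unsigned long)u2(p) << 16) | u2(p + 2);
}

/* The fixed-size fields at the start of a Code attribute. */
struct code_header {
    unsigned      name_index;   /* Constant Pool entry naming the attribute */
    unsigned long attr_length;  /* bytes that follow the length field */
    unsigned      max_stack;
    unsigned      max_locals;
    unsigned long code_length;
    const uint8_t *code;        /* points into the caller's buffer */
};

/* p points at the attribute's u2 name index. No bounds checking is done. */
void parse_code_header(const uint8_t *p, struct code_header *h)
{
    h->name_index  = u2(p);
    h->attr_length = u4(p + 2);
    h->max_stack   = u2(p + 6);
    h->max_locals  = u2(p + 8);
    h->code_length = u4(p + 10);
    h->code        = p + 14;
}
```

For the constructor's Code attribute in aaa.class it reports a length of 29, a stack of 1, one local variable and 5 bytes of code starting with 2a. The exception table length then follows at h.code + h.code_length.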

We've actually got two functions in our code, the function abc and the constructor. The constructor follows exactly the same format as any other function. Its bytes are described below.

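Now for some reason or another, James Gosling decided to call Java constructors <init>, angle brackets and all, rather than giving them the name of the class. Since <init> isn't a legal Java identifier, it can never clash with an ordinary function name.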

Every function starts with a flags field, which in this case is 0: the default constructor gets the same access as its class, and aaa isn't public:-

hex  dec
00   0
00   0

Now comes the pointer to the name of the function stored in the Constant Pool.

hex  dec
00   0
07   7

If we jump over to the Pool, we can see the name.

hex  dec  Char  No.
01   1          7
00   0
06   6
3c   60   <
69   105  i
6e   110  n
69   105  i
74   116  t
3e   62   >

As I've just said, a constructor in Java is known as <init> and that's exactly what we see in the Constant Pool.

Now we have the signature of the constructor.

hex  dec
00   0
05   5

This means we're supposed to skip to the Pool and retrieve the value of the 5th array, which is shown below.

hex  dec  Char  No.
01   1          5
00   0
03   3
28   40   (
29   41   )
56   86   V

Now comes the number of attributes. Our constructor's got only one, i.e. Code.

hex  dec
00   0
01   1

hex  dec
00   0
08   8

hex  dec  Char  No.
01   1          8
00   0
04   4
43   67   C
6f   111  o
64   100  d
65   101  e

After the name of the attribute is the total length of the information, which in our case is 29.

hex  dec
00   0
00   0
00   0
1d   29

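The stack space is set to 1, because aload_0 pushes one value, the this reference, onto the operand stack before invokespecial uses it.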

hex  dec
00   0
01   1

These bytes hold the number of local variables, 01, because the constructor uses exactly one local variable slot: slot 0, which holds this.

hex  dec
00   0
01   1

These bytes hold the length of the code.

hex  dec
00   0
00   0
00   0
05   5

Now comes the actual code:-

hex  dec  Meaning
2a   42   load object reference from local variable 0 (aload_0)
b7   183  invoke non-virtual method (invokespecial)
00   0    index into the Constant Pool, high byte
03   3    index into the Constant Pool, low byte
b1   177  return void (return)

These five bytes are Java Byte Codes. If we refer to the manual, we'll see that 42 (aload_0) means 'load object reference from local variable 0', and in a constructor local variable 0 holds this. 183 (invokespecial, originally named invokenonvirtual) calls the function whose address in the Pool is given in the next two bytes. Those two bytes point to the third array in the Constant Pool. The last byte, 177 (return), means return void.
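A disassembler is just a loop over the code bytes that looks up each opcode and knows how many operand bytes follow it. This toy version, with names of our own, understands only the three opcodes that appear in aaa.class, using the mnemonics from the JVM specification; a real one needs a table covering the whole instruction set.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends "pc: text\n" to the NUL-terminated string in out, truncating
   if out (of size outlen) is full. */
static void emit(char *out, size_t outlen, unsigned pc, const char *text)
{
    size_t used = strlen(out);

    snprintf(out + used, outlen - used, "%u: %s\n", pc, text);
}

/* Disassembles code[0..len) into out, one instruction per line. Only the
   three opcodes used in aaa.class are known; anything else stops the listing. */
void disassemble(const uint8_t *code, unsigned len, char *out, size_t outlen)
{
    char text[32];
    unsigned pc = 0;

    out[0] = '\0';
    while (pc < len) {
        switch (code[pc]) {
        case 0x2A:                      /* aload_0: push local variable 0 (this) */
            emit(out, outlen, pc, "aload_0");
            pc += 1;
            break;
        case 0xB7:                      /* invokespecial, u2 Constant Pool index */
            if (pc + 2 >= len)
                return;                 /* operand bytes missing */
            snprintf(text, sizeof text, "invokespecial #%u",
                     ((unsigned)code[pc + 1] << 8) | code[pc + 2]);
            emit(out, outlen, pc, text);
            pc += 3;
            break;
        case 0xB1:                      /* return: return void from a method */
            emit(out, outlen, pc, "return");
            pc += 1;
            break;
        default:
            return;                     /* opcode not covered by this sketch */
        }
    }
}
```

Fed the constructor's five bytes, 2a b7 00 03 b1, it lists aload_0, invokespecial #3 and return at offsets 0, 1 and 4.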

If you really want to discover what these values stand for, check out the Project our students worked on.

hex  dec  No.
0a   10   3
00   0
02   2
00   0
04   4

The third entry starts with a 10, the tag for CONSTANT_Methodref, a reference to a method (function). The first int after the tag points to the class the method belongs to, and the second int points to a CONSTANT_NameAndType entry, which in turn gives the method's name and its signature.

The first int that follows refers to the second array in the Pool which is shown below.

hex  dec  No.
07   7    2
00   0
10   16

The 16 there tells us to go to the 16th array i.e. java/lang/Object

hex  dec  Char  No.
01   1          16
00   0
10   16
6a   106  j
61   97   a
76   118  v
61   97   a
2f   47   /
6c   108  l
61   97   a
6e   110  n
67   103  g
2f   47   /
4f   79   O
62   98   b
6a   106  j
65   101  e
63   99   c
74   116  t

So the first half of the Methodref (tag 10) tells us which class the function belongs to. The next int says 00 04, which means go to the 4th array in the Pool.

hex  dec  No.
0c   12   4
00   0
07   7
00   0
05   5

There the first number is 12 which stands for the Constant Name and Type. After the 12 we have two more ints. The first int is 7 which means go to the 7th array in the Pool.

hex  dec  Char  No.
01   1          7
00   0
06   6
3c   60   <
69   105  i
6e   110  n
69   105  i
74   116  t
3e   62   >
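So there we see the name of the function being called, <init>, and together with the class from entry 2 that makes it java/lang/Object's constructor. Every constructor must call a constructor of its superclass before doing anything else, so the default constructor the compiler generated for aaa simply calls Object's <init> and returns.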

Let's now check out the other int, 5.

hex  dec  Char  No.
01   1          5
00   0
03   3
28   40   (
29   41   )
56   86   V

Entry 5 turns out to be the signature, ()V: no parameters and a void return.

So by following this digital trail we've figured out quite a bit about the function our constructor calls, everything from its name to the parameters it's called with.
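The chain we just followed, from the Methodref to its Class and NameAndType entries and on to three Utf8 strings, can be automated. This sketch assumes an offsets table indexed by entry number, like the one a pool walker would build; the function names are our own, and names are copied byte for byte, which suits ASCII names only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Appends len bytes of src to the NUL-terminated string in out, truncating
   to fit a buffer of outlen bytes. */
static void append(char *out, size_t outlen, const void *src, size_t len)
{
    size_t used = strlen(out);

    if (len > outlen - used - 1)
        len = outlen - used - 1;
    memcpy(out + used, src, len);
    out[used + len] = '\0';
}

/* Appends Utf8 entry idx (copied byte for byte). Returns -1 if not tag 1. */
static int append_utf8(char *out, size_t outlen, const uint8_t *pool,
                       const size_t offs[], unsigned idx)
{
    const uint8_t *e = pool + offs[idx];

    if (e[0] != 1)
        return -1;
    append(out, outlen, e + 3, u2(e + 1));
    return 0;
}

/* Follows a Methodref (tag 10) to its Class (tag 7) and NameAndType (tag 12)
   entries and writes "class.name descriptor" into out. offs[i] is the
   position of entry i in pool. Returns 0, or -1 on an unexpected tag. */
int describe_methodref(const uint8_t *pool, const size_t offs[], unsigned idx,
                       char *out, size_t outlen)
{
    const uint8_t *m = pool + offs[idx];
    const uint8_t *cls, *nt;

    out[0] = '\0';
    if (m[0] != 10)
        return -1;
    cls = pool + offs[u2(m + 1)];       /* class_index */
    nt  = pool + offs[u2(m + 3)];       /* name_and_type_index */
    if (cls[0] != 7 || nt[0] != 12)
        return -1;
    if (append_utf8(out, outlen, pool, offs, u2(cls + 1)) != 0)
        return -1;
    append(out, outlen, ".", 1);
    if (append_utf8(out, outlen, pool, offs, u2(nt + 1)) != 0)
        return -1;
    append(out, outlen, " ", 1);
    return append_utf8(out, outlen, pool, offs, u2(nt + 3));
}
```

Given entry 3 of aaa.class, describe_methodref() produces java/lang/Object.<init> ()V.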

After the Byte Codes comes the length of the exception table. The constructor has no try/catch handlers, so the next two bytes are set to zero.

hex  dec
00   0
00   0

Now comes the number of attributes that the Code attribute has: 1 attribute.

hex  dec
00   0
01   1

After the number of attributes, we have the name of the attribute. So we scoot over to the 11th array in the Constant Pool to check that out.

hex  dec
00   0
0b   11

Here it is again, the Line Number Table.

hex  dec  Char  No.
01   1          11
00   0
0f   15
4c   76   L
69   105  i
6e   110  n
65   101  e
4e   78   N
75   117  u
6d   109  m
62   98   b
65   101  e
72   114  r
54   84   T
61   97   a
62   98   b
6c   108  l
65   101  e

After that we have the length of the attribute. In this case it's:-

hex  dec
00   0
00   0
00   0
06   6

Right after that we have the number of entries in the line number table.

hex  dec
00   0
01   1

Then comes the entry itself, start_pc followed by the line number:

hex  dec
00   0
00   0
00   0
01   1

So we're told that the constructor's code, from offset 0 onward, comes from line number 1 in the source code, the line that says class aaa.

Now comes the very last section of all.

hex  dec
00   0
01   1

That's right, it's attributes again, but this time they belong to the class as a whole, and there's one of them. The attribute starts with its name, which is a pointer to the name in the Constant Pool.

hex  dec
00   0
0d   13

hex  dec  Char  No.
01   1          13
00   0
0a   10
53   83   S
6f   111  o
75   117  u
72   114  r
63   99   c
65   101  e
46   70   F
69   105  i
6c   108  l
65   101  e
So the last attribute is the Source file.

Now comes the length of the attribute, which is 2.

hex  dec
00   0
00   0
00   0
02   2
And finally there's the value of the attribute, which points to location 17 in the Pool.

hex  dec
00   0
11   17

hex  dec  Char  No.
01   1          17
00   0
08   8
7a   122  z
7a   122  z
7a   122  z
2e   46   .
6a   106  j
61   97   a
76   118  v
61   97   a

At position 17 we find the words zzz.java, the name of the source file.
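Every attribute we've met, whether Code, LineNumberTable or SourceFile, starts with the same u2 name index and u4 length. That lets a reader skip any attribute it doesn't understand, and the JVM specification requires implementations to silently ignore attributes they don't recognize. Here is a sketch of such a skipper, with a function name of our own:

```c
#include <stddef.h>
#include <stdint.h>

static unsigned u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

static uint32_t u4(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* p points at an attributes table: a u2 count, then that many attributes,
   each a u2 name index, a u4 length and `length` bytes of data. Returns the
   total number of bytes the table occupies, or 0 if it would run past the
   `avail` bytes available. The attributes' contents are not examined. */
size_t skip_attributes(const uint8_t *p, size_t avail)
{
    size_t off = 2;
    unsigned count, i;

    if (avail < 2)
        return 0;
    count = u2(p);
    for (i = 0; i < count; i++) {
        uint32_t len;

        if (avail - off < 6)
            return 0;                   /* header doesn't fit */
        len = u4(p + off + 2);          /* skip the name index, read the length */
        if (len > avail - off - 6)
            return 0;                   /* data doesn't fit */
        off += 6 + (size_t)len;
    }
    return off;
}
```

For the class-level attributes at the end of aaa.class (00 01 00 0d 00 00 00 02 00 11) it returns 10, landing exactly at the end of the file.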

That's it! You now know enough to write your own .class file reader in C/C++, one that digs out much of what the Reflection API reports. As a final revision, run your eyes over the following table.

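The structure of the .class file

ClassFile
{
	u4  magic  (0xCAFEBABE)
	u2  minor_version
	u2  major_version
	u2  constant_pool_count
	cp_info  constant_pool [constant_pool_count - 1]
	u2  access_flags
	u2  this_class
	u2  super_class
	u2  interfaces_count
	u2  interfaces [interfaces_count]
	u2  fields_count
	field_info  fields [fields_count]
	u2  methods_count
	method_info  methods [methods_count]
	u2  attributes_count
	attribute_info  attributes [attributes_count]
}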


The above tutorial is a joint effort of

Mr. Vijay Mukhi
Ms. Sonal Kotecha
Mr. Arsalan Zaidi




Vijay Mukhi's Computer Institute
VMCI, B-13, Everest Building, Tardeo, Mumbai 400 034, India
Tel : 91-22-496 4335 /6/7/8/9     Fax : 91-22-307 28 59
e-mail : vmukhi@giasbm01.vsnl.net.in
http://www.vijaymukhi.com