The deeper you delve into the innards of a particular technology, the better you'll understand it. Computer technologies are like the open expanse of the Sea; it all looks the same on the surface, but when you dive below into the deep, you discover wonders you'd never have dreamt existed.
The Java programming language has no pointers, so it's very difficult for programmers to find out what's happening under the hood. The .class file comes tumbling across the wires and executes on your machine, but it's difficult to figure out how. An in-depth discussion of the .class file will help clear up a lot of your doubts. For example, in C/C++ when we use printf(), we know that the linker adds the code for that function into our .exe file, but that's not what happens when we say System.out.println in Java. You can only discover the details by reading the .class file.
If you understand the format of the .class file, you'll find it easier to understand how Microsoft turned Java into a COM Object. You'll also be able to better appreciate our explanation of JavaBeans and other related technology from Sun.
If you really want to call yourself a Java programmer, then you have to read this!
zzz.java
```java
class aaa
{
    public void abc(int i)
    {
    }
}
```
Here is a simple .java file that really does nothing of any significance. When compiled using javac, it turns into a .class file called aaa.class. The output file is named after the class, not after the source file. This process will be quite familiar to anyone who's ever programmed in Java. For an applet, once the .class file is ready, all you have to do is place the file on a Web server and put an applet tag in the appropriate HTML file.
Whenever someone downloads your HTML page, their browser, if it is Java-enabled, reads through the page. When it finds the <applet> tag, it downloads the .class file, and once the file has been fully downloaded, the browser runs the applet. If you put this particular aaa.class on a Web server, though, you're going to have one very disappointed visitor. aaa doesn't extend Applet, so it isn't an applet at all!
The point we wanted to make here is that it's the .class file that actually comes across the Internet and executes on your machine. The program that executes it is called the Java Virtual Machine. This program interprets the bytes in the .class file and carries out the instructions they describe.
Now, Intel has its own machine language instruction set, which is used on its line of x86 processors. Hardly anyone programs in machine language any more, and very few people dabble in assembler, which is just a teeny weeny improvement on machine language. Machine language is made up of the raw bytes that actually constitute every program. Even if you write a program in C/C++, the compiler eventually converts it into a machine language program. If, for example, an x86 processor sees the bytes B0 04, it knows that it's supposed to take the number 04 and put it into the AL register (the low byte of AX). This, however, only applies to an Intel machine. A Motorola processor will react differently because it follows a different instruction set.
We can, if we want, build this functionality into a piece of software instead of a chip. The main difference is that the instruction set embedded in hardware will run much faster than the software implementation. So I can take the Intel instruction set, implement it in software instead of hardware, and have Intel-based programs work on any machine I want. Such a program is called an emulator or a virtual machine.
What the developers of Java did was invent a hypothetical microprocessor. Each instruction for this processor starts with a one-byte opcode, which may be followed by operands. The original Java team chose a new instruction set instead of an established one because tying Java to one processor's instructions would make it awkward to run on others. It was better to start with a clean slate. Since each opcode is a single byte, the instructions are called byte codes. So a class file is made up of byte codes. The JVM, the Java Virtual Machine (sometimes called the Java engine), interprets these byte codes and acts on them.
The JVM is built into browsers like Netscape and Internet Explorer so they know how to run and execute .class files. The Java instruction set has also been built into microprocessors, so you can buy Java chips in the market. These will probably be used in Sun's Network Computer.
The actual bytes
aaa.class
Dec | Hex | Char | Comments |
---|---|---|---|
202 | ca | ||
254 | fe | | |
186 | ba | | |
190 | be | | Magic number |
0 | 0 | ||
3 | 3 | ||
0 | 0 | ||
45 | 2d | - | Version: minor 3, major 45 |
0 | 0 | ||
18 | 12 | | Constant Pool count (entries 1 to 17) |
7 | 7 | | Entry 1: CONSTANT_Class |
0 | 0 | ||
14 | e | ||
7 | 7 | | Entry 2: CONSTANT_Class |
0 | 0 | ||
16 | 10 | ||
10 | a | | Entry 3: CONSTANT_Methodref |
0 | 0 | ||
2 | 2 | ||
0 | 0 | ||
4 | 4 | ||
12 | c | | Entry 4: CONSTANT_NameAndType |
0 | 0 | ||
7 | 7 | ||
0 | 0 | ||
5 | 5 | ||
1 | 1 | | Entry 5: CONSTANT_Utf8 |
0 | 0 | ||
3 | 3 | ||
40 | 28 | ( | |
41 | 29 | ) | |
86 | 56 | V | |
1 | 1 | | Entry 6: CONSTANT_Utf8 |
0 | 0 | ||
4 | 4 | ||
40 | 28 | ( | |
73 | 49 | I | |
41 | 29 | ) | |
86 | 56 | V | |
1 | 1 | | Entry 7: CONSTANT_Utf8 |
0 | 0 | ||
6 | 6 | ||
60 | 3c | < | |
105 | 69 | i | |
110 | 6e | n | |
105 | 69 | i | |
116 | 74 | t | |
62 | 3e | > | |
1 | 1 | | Entry 8: CONSTANT_Utf8 |
0 | 0 | ||
4 | 4 | ||
67 | 43 | C | |
111 | 6f | o | |
100 | 64 | d | |
101 | 65 | e | |
1 | 1 | | Entry 9: CONSTANT_Utf8 |
0 | 0 | ||
13 | d | ||
67 | 43 | C | |
111 | 6f | o | |
110 | 6e | n | |
115 | 73 | s | |
116 | 74 | t | |
97 | 61 | a | |
110 | 6e | n | |
116 | 74 | t | |
86 | 56 | V | |
97 | 61 | a | |
108 | 6c | l | |
117 | 75 | u | |
101 | 65 | e | |
1 | 1 | | Entry 10: CONSTANT_Utf8 |
0 | 0 | ||
10 | a | ||
69 | 45 | E | |
120 | 78 | x | |
99 | 63 | c | |
101 | 65 | e | |
112 | 70 | p | |
116 | 74 | t | |
105 | 69 | i | |
111 | 6f | o | |
110 | 6e | n | |
115 | 73 | s | |
1 | 1 | | Entry 11: CONSTANT_Utf8 |
0 | 0 | ||
15 | f | ||
76 | 4c | L | |
105 | 69 | i | |
110 | 6e | n | |
101 | 65 | e | |
78 | 4e | N | |
117 | 75 | u | |
109 | 6d | m | |
98 | 62 | b | |
101 | 65 | e | |
114 | 72 | r | |
84 | 54 | T | |
97 | 61 | a | |
98 | 62 | b | |
108 | 6c | l | |
101 | 65 | e | |
1 | 1 | | Entry 12: CONSTANT_Utf8 |
0 | 0 | ||
14 | e | ||
76 | 4c | L | |
111 | 6f | o | |
99 | 63 | c | |
97 | 61 | a | |
108 | 6c | l | |
86 | 56 | V | |
97 | 61 | a | |
114 | 72 | r | |
105 | 69 | i | |
97 | 61 | a | |
98 | 62 | b | |
108 | 6c | l | |
101 | 65 | e | |
115 | 73 | s | |
1 | 1 | | Entry 13: CONSTANT_Utf8 |
0 | 0 | ||
10 | a | ||
83 | 53 | S | |
111 | 6f | o | |
117 | 75 | u | |
114 | 72 | r | |
99 | 63 | c | |
101 | 65 | e | |
70 | 46 | F | |
105 | 69 | i | |
108 | 6c | l | |
101 | 65 | e | |
1 | 1 | | Entry 14: CONSTANT_Utf8 |
0 | 0 | ||
3 | 3 | ||
97 | 61 | a | |
97 | 61 | a | |
97 | 61 | a | |
1 | 1 | | Entry 15: CONSTANT_Utf8 |
0 | 0 | ||
3 | 3 | ||
97 | 61 | a | |
98 | 62 | b | |
99 | 63 | c | |
1 | 1 | | Entry 16: CONSTANT_Utf8 |
0 | 0 | ||
16 | 10 | ||
106 | 6a | j | |
97 | 61 | a | |
118 | 76 | v | |
97 | 61 | a | |
47 | 2f | / | |
108 | 6c | l | |
97 | 61 | a | |
110 | 6e | n | |
103 | 67 | g | |
47 | 2f | / | |
79 | 4f | O | |
98 | 62 | b | |
106 | 6a | j | |
101 | 65 | e | |
99 | 63 | c | |
116 | 74 | t | |
1 | 1 | | Entry 17: CONSTANT_Utf8 |
0 | 0 | ||
8 | 8 | ||
122 | 7a | z | |
122 | 7a | z | |
122 | 7a | z | |
46 | 2e | . | |
106 | 6a | j | |
97 | 61 | a | |
118 | 76 | v | |
97 | 61 | a | |
0 | 0 | ||
32 | 20 | | Access flags (0x0020 = ACC_SUPER) |
0 | 0 | ||
1 | 1 | | this class: entry 1 |
0 | 0 | ||
2 | 2 | | super class: entry 2 |
0 | 0 | ||
0 | 0 | | No. of interfaces |
0 | 0 | ||
0 | 0 | | No. of fields |
0 | 0 | ||
2 | 2 | | No. of methods |
0 | 0 | ||
1 | 1 | | Flags: public |
0 | 0 | ||
15 | f | | Name of the fn: entry 15, abc |
0 | 0 | ||
6 | 6 | | Signature: entry 6, (I)V |
0 | 0 | ||
1 | 1 | | No. of attributes: 1 |
0 | 0 | ||
8 | 8 | | Name: entry 8, Code |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
25 | 19 | | Len of the attribute |
0 | 0 | ||
0 | 0 | | Stack space |
0 | 0 | ||
2 | 2 | | Local variables |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
1 | 1 | | Len of the code |
177 | b1 | | Actual code: return |
0 | 0 | ||
0 | 0 | | Exception table length |
0 | 0 | ||
1 | 1 | | No. of attributes |
0 | 0 | ||
11 | b | | Name: entry 11, LineNumberTable |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
6 | 6 | | Len |
0 | 0 | ||
1 | 1 | | No. of entries |
0 | 0 | ||
0 | 0 | | start_pc |
0 | 0 | ||
3 | 3 | | Line number |
0 | 0 | ||
0 | 0 | | Flags (constructor) |
0 | 0 | ||
7 | 7 | | Name: entry 7, <init> |
0 | 0 | ||
5 | 5 | | Signature: entry 5, ()V |
0 | 0 | ||
1 | 1 | | No. of attributes |
0 | 0 | ||
8 | 8 | | Name: entry 8, Code |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
29 | 1d | | Length |
0 | 0 | ||
1 | 1 | | Stack space |
0 | 0 | ||
1 | 1 | | Local variables |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
5 | 5 | | Len of the code |
42 | 2a | * | Actual code: aload_0 |
183 | b7 | | invokespecial |
0 | 0 | ||
3 | 3 | | Constant Pool entry 3 |
177 | b1 | | return |
0 | 0 | ||
0 | 0 | | Exception table length |
0 | 0 | ||
1 | 1 | | No. of attributes |
0 | 0 | ||
11 | b | | Name: entry 11, LineNumberTable |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
6 | 6 | | Len |
0 | 0 | ||
1 | 1 | | No. of entries |
0 | 0 | ||
0 | 0 | | start_pc |
0 | 0 | ||
1 | 1 | | Line number |
0 | 0 | ||
1 | 1 | | No. of class attributes |
0 | 0 | ||
13 | d | | Name: entry 13, SourceFile |
0 | 0 | ||
0 | 0 | ||
0 | 0 | ||
2 | 2 | | Length |
0 | 0 | ||
17 | 11 | | Source file: entry 17, zzz.java |
We'll examine our earlier program, aaa.class, byte by byte and try to understand exactly what the Java Virtual Machine is up to and what the .class file format looks like. Once you understand this, stuff like JavaBeans becomes much easier to handle. This may seem a daunting task, but the file is only 278 bytes long, so it won't drag on forever!
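The .class file format is an industry standard defined in Sun's Java Virtual Machine Specification, so the structure we describe here holds for the output of every Java compiler. The exact bytes can still differ from compiler to compiler, though. The order of the Constant Pool entries, the version number and optional attributes such as the line number table are each compiler's choice.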
First, write this program to display the bytes in decimal, hex and ASCII, or use PC Tools, Norton Utilities or some other utility.
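```c
#include <stdio.h>
#include <ctype.h>

/* Print every byte of a file as decimal, hex and (if printable) ASCII.
   Pass the file name on the command line; it defaults to aaa.class
   (on DOS 8.3 file systems the file may appear as aaa~1.cla). */
int main(int argc, char *argv[])
{
    const char *name = (argc > 1) ? argv[1] : "aaa.class";
    FILE *fp;
    int i;

    fp = fopen(name, "rb");
    if (fp == NULL) {
        perror(name);
        return 1;
    }
    while ((i = fgetc(fp)) != EOF)
        printf("%d \t %x \t %c \n", i, i, isprint(i) ? i : ' ');
    fclose(fp);
    return 0;
}
```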
Stepping through the bytes...
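Throughout this tutorial, u1 stands for a single byte, u2 for two bytes and u4 for four bytes, all unsigned. Multi-byte values are stored big-endian, most significant byte first, so the two bytes 00 12 mean 0x0012, or 18. When we say "int" we mean a u2, and a "long" is a u4, as on 16-bit C compilers. In Java itself, an int is four bytes and a long eight.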
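Since every multi-byte value in the file is big-endian, the first thing any parser needs is a pair of helpers to read a u2 and a u4. Here's a minimal sketch in C, the language of our dump program. read_u2 and read_u4 are our own illustrative names, not functions from any library. The sketch assumes the bytes are already in memory.

```c
#include <stdint.h>

/* Read a u2: two bytes, most significant first. */
uint16_t read_u2(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a u4: four bytes, most significant first. */
uint32_t read_u4(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Applied to the bytes 00 12, read_u2 gives 18, the Constant Pool count we'll meet shortly.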
Let's jump right in and start explaining the bytes that make up the .class file. The very first four bytes of the .class file are:-
hex | dec |
---|---|
ca | 202 |
fe | 254 |
ba | 186 |
be | 190 |
CAFE BABE... Get it? Lucky for James Gosling he doesn't live in India because if he said this over here, the local 'Mahila Mandal' -- A militant form of Women's Liberation whose members roam the streets armed with large rolled up newspapers, looking for unescorted young males -- would have his head!!
This bit of trivia can come in handy sometimes. Some Java sites (and employers!) expect you to know the meaning of these bytes. If you don't, they assume you really don't know much about Java.
0xCA 0xFE 0xBA 0xBE are the first four bytes of every Java .class file, and they're collectively called the magic number. They were added to the file format so that a .class file can be recognized instantly by its first few bytes. If these bytes are tampered with, no JVM will accept the file as a class file, and the program will be useless.
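A loader's very first check can be sketched in a few lines of C. The sketch assumes the file has already been read into a buffer. has_class_magic is our own hypothetical name.

```c
#include <stddef.h>

/* Returns 1 if buf starts with the class-file magic number 0xCAFEBABE, else 0. */
int has_class_magic(const unsigned char *buf, size_t len)
{
    return len >= 4 &&
           buf[0] == 0xCA && buf[1] == 0xFE &&
           buf[2] == 0xBA && buf[3] == 0xBE;
}
```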
The next 4 bytes are as follows
hex | dec |
---|---|
0 | 00 |
3 | 3 |
0 | 0 |
2d | 45 |
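These bytes are the version number of the file. The first two bytes, 0x0003 (3), are the minor version number, and the next two, 0x002D (45), are the major version number. So this is a version 45.3 class file. Major version 45 is the one produced by the JDK 1.0 and 1.1 compilers. A JVM checks this number before loading a class and refuses files from a newer major version than it supports.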
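Decoding the version is just a matter of reading the two u2 values in the right order: minor first, then major. Here's a sketch in C, assuming `file` points at the start of the class file. class_version is an illustrative name of ours.

```c
/* Bytes 4-5 hold minor_version and bytes 6-7 hold major_version,
   both big-endian u2 values. */
void class_version(const unsigned char *file, unsigned *major, unsigned *minor)
{
    *minor = ((unsigned)file[4] << 8) | file[5];
    *major = ((unsigned)file[6] << 8) | file[7];
}
```

For the header of aaa.class it reports major 45, minor 3.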
Immediately following this is an int with the value :-
hex | dec |
---|---|
0 | 0 |
12 | 18 |
This is the Constant Pool count, 0x12 (18). The entries that follow are numbered from 1, and the count is always one more than the highest entry number. So although we're told 18, there are only 17 entries, numbered 1 to 17. Entry 0 is reserved and never appears in the file. No size is stored for the pool or for individual entries, because each entry's length is implied by its first byte, as we'll see below.
The name of this huge array is the Constant Pool. It stores constants such as class names, function names, signatures and strings that the rest of the file refers to, often more than once. Rather than repeat a whole string every time it needs it, the rest of the file stores a two-byte index to the value in the Constant Pool. We'll see how useful this is as we proceed.
The values stored in the Constant Pool are:-
Dec | Hex | Char | No |
---|---|---|---|
7 | 7 | | 1 |
0 | 0 | ||
14 | e | ||
7 | 7 | | 2 |
0 | 0 | ||
16 | 10 | ||
10 | a | | 3 |
0 | 0 | ||
2 | 2 | ||
0 | 0 | ||
4 | 4 | ||
12 | c | | 4 |
0 | 0 | ||
7 | 7 | ||
0 | 0 | ||
5 | 5 | ||
1 | 1 | | 5 |
0 | 0 | ||
3 | 3 | ||
40 | 28 | ( | |
41 | 29 | ) | |
86 | 56 | V | |
1 | 1 | | 6 |
0 | 0 | ||
4 | 4 | ||
40 | 28 | ( | |
73 | 49 | I | |
41 | 29 | ) | |
86 | 56 | V | |
1 | 1 | | 7 |
0 | 0 | ||
6 | 6 | ||
60 | 3c | < | |
105 | 69 | i | |
110 | 6e | n | |
105 | 69 | i | |
116 | 74 | t | |
62 | 3e | > | |
1 | 1 | | 8 |
0 | 0 | ||
4 | 4 | ||
67 | 43 | C | |
111 | 6f | o | |
100 | 64 | d | |
101 | 65 | e | |
1 | 1 | | 9 |
0 | 0 | ||
13 | d | ||
67 | 43 | C | |
111 | 6f | o | |
110 | 6e | n | |
115 | 73 | s | |
116 | 74 | t | |
97 | 61 | a | |
110 | 6e | n | |
116 | 74 | t | |
86 | 56 | V | |
97 | 61 | a | |
108 | 6c | l | |
117 | 75 | u | |
101 | 65 | e | |
1 | 1 | | 10 |
0 | 0 | ||
10 | a | ||
69 | 45 | E | |
120 | 78 | x | |
99 | 63 | c | |
101 | 65 | e | |
112 | 70 | p | |
116 | 74 | t | |
105 | 69 | i | |
111 | 6f | o | |
110 | 6e | n | |
115 | 73 | s | |
1 | 1 | | 11 |
0 | 0 | ||
15 | f | ||
76 | 4c | L | |
105 | 69 | i | |
110 | 6e | n | |
101 | 65 | e | |
78 | 4e | N | |
117 | 75 | u | |
109 | 6d | m | |
98 | 62 | b | |
101 | 65 | e | |
114 | 72 | r | |
84 | 54 | T | |
97 | 61 | a | |
98 | 62 | b | |
108 | 6c | l | |
101 | 65 | e | |
1 | 1 | | 12 |
0 | 0 | ||
14 | e | ||
76 | 4c | L | |
111 | 6f | o | |
99 | 63 | c | |
97 | 61 | a | |
108 | 6c | l | |
86 | 56 | V | |
97 | 61 | a | |
114 | 72 | r | |
105 | 69 | i | |
97 | 61 | a | |
98 | 62 | b | |
108 | 6c | l | |
101 | 65 | e | |
115 | 73 | s | |
1 | 1 | | 13 |
0 | 0 | ||
10 | a | ||
83 | 53 | S | |
111 | 6f | o | |
117 | 75 | u | |
114 | 72 | r | |
99 | 63 | c | |
101 | 65 | e | |
70 | 46 | F | |
105 | 69 | i | |
108 | 6c | l | |
101 | 65 | e | |
1 | 1 | | 14 |
0 | 0 | ||
3 | 3 | ||
97 | 61 | a | |
97 | 61 | a | |
97 | 61 | a | |
1 | 1 | | 15 |
0 | 0 | ||
3 | 3 | ||
97 | 61 | a | |
98 | 62 | b | |
99 | 63 | c | |
1 | 1 | | 16 |
0 | 0 | ||
16 | 10 | ||
106 | 6a | j | |
97 | 61 | a | |
118 | 76 | v | |
97 | 61 | a | |
47 | 2f | / | |
108 | 6c | l | |
97 | 61 | a | |
110 | 6e | n | |
103 | 67 | g | |
47 | 2f | / | |
79 | 4f | O | |
98 | 62 | b | |
106 | 6a | j | |
101 | 65 | e | |
99 | 63 | c | |
116 | 74 | t | |
1 | 1 | | 17 |
0 | 0 | ||
8 | 8 | ||
122 | 7a | z | |
122 | 7a | z | |
122 | 7a | z | |
46 | 2e | . | |
106 | 6a | j | |
97 | 61 | a | |
118 | 76 | v | |
97 | 61 | a |
The first byte of each entry is the tag, a number that tells us what kind of constant follows and therefore how many bytes it occupies. A tag of 1 (CONSTANT_Utf8) tells us that the data to follow is a string. The length of the string comes next in two bytes, followed by the string itself. Only strings carry a length, though. Every other tag is followed by a fixed number of bytes. A 7 (CONSTANT_Class) is followed by one int, the index of the entry holding the class name. A 10 (CONSTANT_Methodref) or a 12 (CONSTANT_NameAndType) is followed by two ints, both indexes into the pool.
Constant Pool Tags
Constant Type | Value |
---|---|
CONSTANT_Class | 7 |
CONSTANT_Methodref | 10 |
CONSTANT_Utf8(Octet string) | 1 |
CONSTANT_NameAndType | 12 |
```
CONSTANT_Class {
    u1 tag;                   /* 7 */
    u2 name_index;            /* 0 14 */
}
CONSTANT_Methodref {
    u1 tag;                   /* 10 */
    u2 class_index;           /* 0 2 */
    u2 name_and_type_index;   /* 0 4 */
}
CONSTANT_Utf8 {
    u1 tag;                   /* 1 */
    u2 len;                   /* 0 3 */
    u1 bytes[len];            /* ( ) V */
}
CONSTANT_NameAndType {
    u1 tag;                   /* 12 */
    u2 name_index;            /* 0 7 */
    u2 descriptor_index;      /* 0 5 */
}
```
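Because each entry's size follows from its tag, a parser can walk the whole pool without any table of lengths. The C sketch below does that and records where each entry starts, so later indexes can be looked up directly. It's our own illustration: cp_entry_size and cp_walk are hypothetical names, and the sketch assumes a well-formed file rather than checking bounds. Besides the four tags used in aaa.class, it handles the other tags of the original format (String, Integer, Float, Long, Double, Fieldref, InterfaceMethodref). Newer class files add further tags, which this sketch rejects.

```c
#include <stddef.h>

/* Size in bytes (tag included) of the constant-pool entry starting at p,
   or 0 for a tag this sketch doesn't know. */
size_t cp_entry_size(const unsigned char *p)
{
    switch (p[0]) {
    case 1:                                   /* Utf8: u2 length + bytes */
        return 3 + (((size_t)p[1] << 8) | p[2]);
    case 3: case 4:                           /* Integer, Float: u4 */
        return 5;
    case 5: case 6:                           /* Long, Double: 8 bytes */
        return 9;
    case 7: case 8:                           /* Class, String: one u2 index */
        return 3;
    case 9: case 10: case 11: case 12:        /* Field/Method/InterfaceMethodref, NameAndType */
        return 5;
    default:
        return 0;
    }
}

/* Walk the pool that starts just after constant_pool_count.
   offsets must have room for count elements; offsets[i] receives the byte
   offset of entry i (offsets[0] is unused).
   Returns the total size of the pool in bytes, or 0 on an unknown tag. */
size_t cp_walk(const unsigned char *pool, unsigned count, size_t *offsets)
{
    size_t pos = 0;
    unsigned i;

    for (i = 1; i < count; i++) {
        size_t n = cp_entry_size(pool + pos);
        if (n == 0)
            return 0;                         /* unknown tag: give up */
        offsets[i] = pos;
        if (pool[pos] == 5 || pool[pos] == 6)
            i++;                              /* Long/Double also use the next index */
        pos += n;
    }
    return pos;
}
```

For aaa.class, calling cp_walk(file + 10, 18, offsets) with an 18-element offsets array should return 164. That puts the access flags at byte 10 + 164 = 174.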
The first section of the .class file is now over.
If you'll remember, our file looks a bit like this:-
```java
class aaa
{
    public void abc(int i)
    {
    }
}
```
We'll be getting into the meat of the matter now.
Notice that we haven't said public class aaa.
Now come two bytes whose first byte is a 0x00 and second byte is 0x20. This is the flags field and it yields important information about the file.
hex | dec |
---|---|
00 | 00 |
20 | 32 |
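If we were to write out the bits that make up the int, we'd get
`0000 0000 0010 0000`
In other words, only one bit in the two bytes is turned on: 0x0020, which the JVM specification calls ACC_SUPER. The compiler sets it to tell the JVM to use the newer rules for invokespecial, the instruction that calls superclass methods. It says nothing about who may use the class. The rightmost bit, 0x0001 (ACC_PUBLIC), is 1 only for a public class. Since that bit is 0 here, aaa has the default, package-level access, which means only classes in the same package can use it. (A top-level class can't be private.) The other flags, such as ACC_FINAL (0x0010) and ACC_ABSTRACT (0x0400), are zero.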
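Turning the flags field into names takes one bit test per flag. Here's a sketch in C. It lists only the class flags of the original format; later versions add more, such as ACC_ENUM. describe_class_flags is our own name.

```c
#include <string.h>

/* Class access flags from the JVM specification (original format only). */
static const struct { unsigned bit; const char *name; } class_flags[] = {
    { 0x0001, "ACC_PUBLIC" },
    { 0x0010, "ACC_FINAL" },
    { 0x0020, "ACC_SUPER" },
    { 0x0200, "ACC_INTERFACE" },
    { 0x0400, "ACC_ABSTRACT" },
};

/* Write the names of the flags set in `flags`, space-separated, into out
   (truncating if out is too small). */
void describe_class_flags(unsigned flags, char *out, size_t outsz)
{
    size_t i;

    out[0] = '\0';
    for (i = 0; i < sizeof class_flags / sizeof class_flags[0]; i++) {
        if (flags & class_flags[i].bit) {
            if (out[0] != '\0')
                strncat(out, " ", outsz - strlen(out) - 1);
            strncat(out, class_flags[i].name, outsz - strlen(out) - 1);
        }
    }
}
```

For aaa.class, the flags 0x0020 come out as ACC_SUPER. A public class would show ACC_PUBLIC ACC_SUPER (0x0021).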
Next we have two more bytes :-
hex | dec |
---|---|
00 | 00 |
01 | 01 |
This int is this_class, and it identifies the class that this file defines. (It has nothing to do with the this reference you use inside methods.) Here the number is 1, so we go to entry 1 in the Constant Pool and read its value.
hex | dec |
---|---|
7 | 7 |
0 | 0 |
e | 14 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 14 |
0 | 0 | ||
3 | 3 | ||
61 | 97 | a | |
61 | 97 | a | |
61 | 97 | a |
Entry 1 starts with a 7, CONSTANT_Class. This means that the int following it is an index to another value in the Constant Pool, so we're supposed to jump to entry 14. There we find the string aaa, which is the name of our class. So this_class leads us, through entry 1, to where the name of our class is stored.
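Following this_class to a name is a two-step lookup: a Class entry holds an index, and that index leads to a Utf8 entry. The C sketch below assumes you already know where each entry starts, with offsets[i] found by walking the pool entry by entry. cp_utf8 and cp_class_name are our own illustrative names. The strings in the pool are not NUL-terminated, so the length comes back separately.

```c
#include <stddef.h>

/* Return a pointer to the bytes of Utf8 entry `index` and store its length
   in *len, or return NULL if that entry isn't a CONSTANT_Utf8 (tag 1). */
const unsigned char *cp_utf8(const unsigned char *pool, const size_t *offsets,
                             unsigned index, unsigned *len)
{
    const unsigned char *e = pool + offsets[index];
    if (e[0] != 1)
        return NULL;
    *len = ((unsigned)e[1] << 8) | e[2];
    return e + 3;
}

/* Follow a CONSTANT_Class entry (tag 7) to the Utf8 string naming the class. */
const unsigned char *cp_class_name(const unsigned char *pool, const size_t *offsets,
                                   unsigned index, unsigned *len)
{
    const unsigned char *e = pool + offsets[index];
    if (e[0] != 7)
        return NULL;
    return cp_utf8(pool, offsets, ((unsigned)e[1] << 8) | e[2], len);
}
```

In aaa.class, cp_class_name for entry 1 would lead through entry 14 to the three bytes aaa.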
hex | dec |
---|---|
0 | 0 |
2 | 2 |
After this_class comes another int, super_class. A superclass is the same as a base class in C++: when we say 'class aaa extends Applet', Applet is our superclass. The value here is 2, so we go to entry 2 in the Constant Pool. That's a CONSTANT_Class entry holding the number 16, which tells us to jump on to entry 16. There we find a pretty long string: java/lang/Object, the class every other class is derived from. Since we didn't say extends, Object is our superclass. Had we said 'class aaa extends Applet', the string java/applet/Applet would have appeared here instead.
hex | dec |
---|---|
7 | 7 |
0 | 0 |
10 | 16 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 16 |
0 | 0 | ||
10 | 16 | ||
6a | 106 | j | |
61 | 97 | a | |
76 | 118 | v | |
61 | 97 | a | |
2f | 47 | / | |
6c | 108 | l | |
61 | 97 | a | |
6e | 110 | n | |
67 | 103 | g | |
2f | 47 | / | |
4f | 79 | O | |
62 | 98 | b | |
6a | 106 | j | |
65 | 101 | e | |
63 | 99 | c | |
74 | 116 | t |
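The JDK's javap tool does much of what we're doing here. It disassembles a .class file, and javap -c prints the byte codes of every method. Java's Reflection API answers similar questions, such as a class's name, superclass and methods. It does so for classes already loaded into a running JVM, though, rather than by reading the file. If you implemented the steps in this tutorial in C/C++, you'd have written a disassembler all by yourself. In fact, that's exactly what one of the students at our institute did, so jump over to their code later.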
The next two bytes tell you the number of interfaces.
hex | dec |
---|---|
0 | 0 |
0 | 0 |
Both bytes are zero since aaa doesn't implement any interfaces.
Now come two bytes for the fields. A field is a variable declared at class level, also called a member variable. Local variables inside functions don't count. Both bytes are zero since aaa has no fields.
hex | dec |
---|---|
0 | 0 |
0 | 0 |
The next two bytes hold the number of methods, or functions, in the class. Even though we've only written one, the value is two. Since we didn't write a constructor, the compiler added a default one for us.
hex | dec |
---|---|
00 | 00 |
02 | 02 |
Next come the method's access flags, which record whether it is public, private and so on. We declared abc public, so this time the flags are 0x0001, the public bit.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
Now come two bytes for the name index
hex | dec |
---|---|
00 | 00 |
0f | 15 |
Since the value contained within the int is 0x0F, or 15, we go to entry 15 in the Constant Pool. That entry holds the name of our function, abc.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 15 |
0 | 0 | ||
3 | 3 | ||
61 | 97 | a | |
62 | 98 | b | |
63 | 99 | c |
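Now come two bytes for the signature, which the JVM specification calls the descriptor. A Java method is identified by its name together with its signature, much as a C++ compiler encodes a function's parameter types into its name. The value 6 points us to entry 6 in the Pool, which holds the string (I)V. The parameter types go inside the parentheses (I for int), and the return type comes after them (V for void). So our function takes an int and returns void.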
hex | dec |
---|---|
00 | 00 |
06 | 06 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 6 |
0 | 0 | ||
4 | 4 | ||
28 | 40 | ( | |
49 | 73 | I | |
29 | 41 | ) | |
56 | 86 | V |
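Signatures are compact but easy to decode mechanically. Each primitive type is a single letter, a class is written L...; and an array starts with [, with the return type after the closing parenthesis. The C sketch below turns a signature into something closer to Java source. describe_method and its helpers are our own hypothetical names. The sketch assumes a well-formed descriptor and doesn't reject oddities such as a V among the parameters.

```c
#include <string.h>

/* Append n chars of s to the NUL-terminated out, truncating to fit. */
static void put(char *out, size_t outsz, const char *s, size_t n)
{
    size_t used = strlen(out);
    size_t room = outsz - used - 1;
    if (n > room)
        n = room;
    memcpy(out + used, s, n);
    out[used + n] = '\0';
}

/* Append the Java spelling of the type at *d and advance *d past it.
   Returns 0 on success, -1 on an unknown type letter. */
static int append_type(const char **d, char *out, size_t outsz)
{
    int dims = 0;
    const char *name = NULL;

    while (**d == '[') { dims++; (*d)++; }        /* array dimensions */
    switch (**d) {
    case 'B': name = "byte"; break;
    case 'C': name = "char"; break;
    case 'D': name = "double"; break;
    case 'F': name = "float"; break;
    case 'I': name = "int"; break;
    case 'J': name = "long"; break;
    case 'S': name = "short"; break;
    case 'Z': name = "boolean"; break;
    case 'V': name = "void"; break;
    case 'L': {                                   /* class type: Lpkg/Name; */
        const char *start = *d + 1;
        const char *end = strchr(start, ';');
        const char *c;
        if (end == NULL)
            return -1;
        for (c = start; c < end; c++)             /* '/' in the file, '.' in Java source */
            put(out, outsz, *c == '/' ? "." : c, 1);
        *d = end + 1;
        break;
    }
    default:
        return -1;
    }
    if (name != NULL) {
        put(out, outsz, name, strlen(name));
        (*d)++;
    }
    while (dims-- > 0)
        put(out, outsz, "[]", 2);
    return 0;
}

/* Turn a method descriptor such as "(I)V" into "void (int)".
   Returns 0 on success, -1 on a malformed descriptor. */
int describe_method(const char *desc, char *out, size_t outsz)
{
    char params[256] = "";
    const char *d = desc;
    int first = 1;

    if (*d++ != '(')
        return -1;
    while (*d != ')') {
        if (*d == '\0')
            return -1;
        if (!first)
            put(params, sizeof params, ", ", 2);
        if (append_type(&d, params, sizeof params) != 0)
            return -1;
        first = 0;
    }
    d++;                                          /* skip ')' */
    out[0] = '\0';
    if (append_type(&d, out, outsz) != 0)         /* return type */
        return -1;
    put(out, outsz, " (", 2);
    put(out, outsz, params, strlen(params));
    put(out, outsz, ")", 1);
    return 0;
}
```

It turns (I)V into void (int), and the constructor's ()V into void ().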
Now we have two bytes for the attributes of the function. These two bytes tell us the number of attributes, i.e. 1 in this case.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
Now come two bytes which point to the attribute name in the Constant Pool.
hex | dec |
---|---|
00 | 00 |
08 | 08 |
Since the number here is 0x08, we go to entry 8 in the Constant Pool.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 8 |
0 | 0 | ||
4 | 4 | ||
43 | 67 | C | |
6f | 111 | o | |
64 | 100 | d | |
65 | 101 | e |
So our function has one attribute, called Code, which holds its byte codes. Every method with a body carries a Code attribute. Only abstract and native methods, whose bodies aren't in the class file, go without one.
The next four bytes detail the length of the attribute called Code, which is 0x19 (25) bytes.
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
19 | 25 |
Now comes the body of the Code attribute itself.
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | stack |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
2 | 2 | local variables |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | |
0 | 0 | |
1 | 1 | length of the code |
hex | dec | Meaning |
---|---|---|
b1 | 177 | code |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | exception table length |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
1 | 1 | Attributes |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
b | 11 | Line Number Table |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | |
0 | 0 | |
6 | 6 | length of the attribute |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
1 | 1 | No. of members |
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | start_pc (bytecode offset 0) |
0 | 0 | |
3 | 3 | line number |
The first two bytes hold max_stack, the largest amount of space the method needs on the operand stack. Since abc doesn't push anything onto the stack, the value is zero. The next int, max_locals, holds the number of local variable slots the method uses. It is 2, not zero. In an instance method, slot 0 always holds this (the object the method was called on), and slot 1 holds our parameter i.
Now comes the length of the code, which takes up a whole long (4 bytes). It is measured in bytes, and its value is 1 because our code is a single one-byte instruction. Then comes that byte, 177 (0xB1), the return instruction, which returns from a method without a value. The compiler adds it because abc is declared void and its body is empty, so execution simply runs off the end.
Next come two bytes giving the length of the exception table, the list of try/catch handlers covering parts of the method's code. The exceptions a method declares with throws are recorded elsewhere, in a separate attribute called Exceptions. abc has no try/catch blocks, so the value here is zero.
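The fixed fields at the start of a Code attribute can be pulled out in one go. Here's a sketch in C, assuming `body` points just past the attribute's name index and u4 length. struct code_info and parse_code are our own illustrative names, with field names borrowed from the JVM specification.

```c
#include <stdint.h>
#include <stddef.h>

struct code_info {
    uint16_t max_stack;
    uint16_t max_locals;
    uint32_t code_length;
    const unsigned char *code;          /* points into the caller's buffer */
    uint16_t exception_table_length;
};

static uint16_t u2(const unsigned char *p) { return (uint16_t)((p[0] << 8) | p[1]); }

static uint32_t u4(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse a Code attribute body of len bytes.
   Returns 0 on success, -1 if the body is too short. */
int parse_code(const unsigned char *body, size_t len, struct code_info *ci)
{
    if (len < 8)
        return -1;
    ci->max_stack   = u2(body);
    ci->max_locals  = u2(body + 2);
    ci->code_length = u4(body + 4);
    if (len < 8 + (size_t)ci->code_length + 2)
        return -1;
    ci->code = body + 8;
    ci->exception_table_length = u2(body + 8 + ci->code_length);
    return 0;
}
```

Fed the 25 bytes of abc's Code attribute from the dump, it gives max_stack 0, max_locals 2, a one-byte code array containing B1 and an empty exception table.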
The next two bytes, 0x0001, tell us that the Code attribute has one attribute of its own. The two bytes after that, 0x000B, point to entry 11 in the Constant Pool, which is the Line Number Table shown below.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 11 |
0 | 0 | ||
f | 15 | ||
4c | 76 | L | |
69 | 105 | i | |
6e | 110 | n | |
65 | 101 | e | |
4e | 78 | N | |
75 | 117 | u | |
6d | 109 | m | |
62 | 98 | b | |
65 | 101 | e | |
72 | 114 | r | |
54 | 84 | T | |
61 | 97 | a | |
62 | 98 | b | |
6c | 108 | l | |
65 | 101 | e |
The next four bytes hold the number 6, which is the length of the attribute. The next int, 0x0001, tells us the number of entries in the table, i.e. 1. The entry that comes next maps the code of abc() back to a line in the source file. The JVM and debuggers use this table to report line numbers, for instance in the stack trace printed when an exception is thrown. The four bytes of an entry are not one big long number. They're actually two ints. The first, start_pc, is an offset into the method's byte codes. The second is the source line that the code starting at that offset was compiled from. So when the value is
hex | dec | Meaning |
---|---|---|
0 | 0 | |
0 | 0 | start_pc |
0 | 0 | |
3 | 3 | Line Number |
it means that the code from offset 0 onwards (here, the whole of abc) comes from line three. If it were
hex | dec | Meaning |
---|---|---|
0 | 0 | |
64 | 100 | start_pc |
0 | 0 | |
3 | 3 | Line Number |
it would mean that the instructions starting at bytecode offset 100 were compiled from line 3.
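Looking up the line for a given bytecode offset means finding the entry with the largest start_pc that doesn't exceed it. Here's a C sketch, assuming `table` points at the entry count just after the attribute length. line_for_pc is our own name, and the "largest start_pc" rule is a reasonable lookup convention rather than something this tutorial spells out.

```c
/* table: u2 entry count, then count pairs of (u2 start_pc, u2 line_number).
   Returns the source line for bytecode offset pc, or -1 if no entry covers it. */
int line_for_pc(const unsigned char *table, unsigned pc)
{
    unsigned count = ((unsigned)table[0] << 8) | table[1];
    unsigned i, best_pc = 0;
    int line = -1;

    for (i = 0; i < count; i++) {
        const unsigned char *e = table + 2 + 4 * i;
        unsigned start = ((unsigned)e[0] << 8) | e[1];
        unsigned ln    = ((unsigned)e[2] << 8) | e[3];
        if (start <= pc && (line < 0 || start >= best_pc)) {
            best_pc = start;
            line = (int)ln;
        }
    }
    return line;
}
```

For abc's table (00 01 00 00 00 03), offset 0 maps to line 3. A second entry starting at offset 100 would take over from that offset onwards.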
This is the structure every function follows.
Member | data type | Bytes |
---|---|---|
Flags | u2 | 00 01 |
Name | u2 | 00 0F |
Signature | u2 | 00 06 |
No. of attributes | u2 | 00 01 |
First comes the flags field, then the pointer to the name in the Constant Pool, then the signature and finally the number of attributes for the function.
Then comes the attribute structure, used for each and every attribute. We've got only one attribute so the structure appears only once.
Member | data type | Bytes |
---|---|---|
Name of attribute | u2 | 00 08 |
Len of attribute | u4 | 00 00 00 19 |
Stack | u2 | 00 00 |
Local Variables | u2 | 00 02 |
Len of Code | u4 | 00 00 00 01 |
Actual Code | u1 | B1 |
Exception table length | u2 | 00 00 |
No. of Attributes | u2 | 00 01 |
Details of Attribute | 12 bytes | see below |
Details of Attribute
Member | data type | Bytes |
---|---|---|
Name of Attribute | u2 | 00 0B |
Length of Attribute | u4 | 00 00 00 06 |
No. of entries | u2 | 00 01 |
start_pc | u2 | 00 00 |
Line number | u2 | 00 03 |
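Because every attribute carries its own u4 length, a parser can step over a whole method without understanding Code or LineNumberTable at all. The C sketch below reads the four fixed fields of a method and skips its attributes that way. read_method and struct method_info are our own hypothetical names, and no bounds checking is done. Fields are stored in the same layout, so the same function would step over them too.

```c
#include <stddef.h>

struct method_info {
    unsigned access_flags, name_index, descriptor_index, attributes_count;
};

static unsigned u2(const unsigned char *p) { return ((unsigned)p[0] << 8) | p[1]; }

static unsigned long u4(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Read one method starting at p and return a pointer just past it. */
const unsigned char *read_method(const unsigned char *p, struct method_info *m)
{
    unsigned i;

    m->access_flags     = u2(p);
    m->name_index       = u2(p + 2);
    m->descriptor_index = u2(p + 4);
    m->attributes_count = u2(p + 6);
    p += 8;
    for (i = 0; i < m->attributes_count; i++)
        p += 6 + u4(p + 2);      /* u2 name index + u4 length + body */
    return p;
}
```

Given the 39 bytes of abc (8 fixed bytes, then 6 + 25 for its Code attribute), it returns a pointer exactly 39 bytes further on, where <init> begins.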
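We've actually got two functions in our code: the function abc and the constructor. The constructor follows exactly the same format as any other function. The format for its bytes is given below.
Member | data type |
---|---|
Flags | u2 |
Name | u2 |
Signature | u2 |
No. of attributes | u2 |
Name of attribute | u2 |
Len of attribute | u4 |
Stack | u2 |
Local Variables | u2 |
Len of Code | u4 |
Actual Code | u1 × Len of Code |
Exception table length | u2 |
No. of Attributes | u2 |
Name of Attribute | u2 |
Length of Attribute | u4 |
Details of Attribute | Length of Attribute bytes |
Now, James Gosling decided to call Java constructors init in angle brackets, <init>, rather than a plain old 'constructor'. Angle brackets aren't allowed in Java identifiers, so the name can never clash with a method you write yourself.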
Every function starts with a flags field which in this case is :-
hex | dec |
---|---|
00 | 00 |
00 | 00 |
Now comes the pointer to the name of the function stored in the Constant Pool.
hex | dec |
---|---|
00 | 00 |
07 | 07 |
If we jump over to the Pool, we can see the name.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 7 |
0 | 0 | ||
6 | 6 | ||
3c | 60 | < | |
69 | 105 | i | |
6e | 110 | n | |
69 | 105 | i | |
74 | 116 | t | |
3e | 62 | > |
As I've just said, a constructor in Java is known as <init> and that's exactly what we see in the Constant Pool.
Now we have the signature of the constructor.
hex | dec |
---|---|
00 | 00 |
05 | 05 |
This means we're supposed to skip to the Pool and retrieve the value of entry 5, which is shown below.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 5 |
0 | 0 | ||
3 | 3 | ||
28 | 40 | ( | |
29 | 41 | ) | |
56 | 86 | V |
Now comes the number of attributes. Our constructor's got only one, i.e. Code.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
hex | dec |
---|---|
00 | 00 |
08 | 08 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 8 |
0 | 0 | ||
4 | 4 | ||
43 | 67 | C | |
6f | 111 | o | |
64 | 100 | d | |
65 | 101 | e |
After the name of the attribute is the total length of the information, which in our case is 29.
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
1d | 29 |
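The Stack space (max_stack) is set to 1, because aload_0 pushes one value, the this reference, onto the operand stack before invokespecial uses it.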
hex | dec |
---|---|
00 | 00 |
01 | 01 |
These bytes (max_locals) are set to 01 because <init> uses one local variable slot: slot 0, which holds this. The constructor takes no parameters, so it needs no more.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
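For the parameters, the value of max_locals follows a simple rule. An instance method needs one slot for this, then one slot per parameter, with two for each long or double. Here's a C sketch of that count, where param_slots is our own name. It only gives the minimum, since a method with local variables of its own needs more slots.

```c
/* Count the local-variable slots a method's parameters occupy, given its
   descriptor. Instance methods (is_static == 0) also need slot 0 for this;
   long (J) and double (D) take two slots each, arrays and objects one.
   Returns -1 for a malformed descriptor. */
int param_slots(const char *desc, int is_static)
{
    int slots = is_static ? 0 : 1;

    if (*desc++ != '(')
        return -1;
    while (*desc != ')') {
        switch (*desc) {
        case 'J': case 'D':
            slots += 2; desc++; break;
        case 'B': case 'C': case 'F': case 'I': case 'S': case 'Z':
            slots += 1; desc++; break;
        case 'L':                                   /* skip to the ';' */
            while (*desc != ';') {
                if (*desc == '\0') return -1;
                desc++;
            }
            slots += 1; desc++; break;
        case '[':                                   /* any array is one reference */
            while (*desc == '[') desc++;
            if (*desc == 'L') {
                while (*desc != ';') {
                    if (*desc == '\0') return -1;
                    desc++;
                }
            } else if (*desc == '\0') {
                return -1;
            }
            slots += 1; desc++; break;
        default:
            return -1;
        }
    }
    return slots;
}
```

For abc's (I)V it gives 2, and for the constructor's ()V it gives 1, the values in the dump.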
These bytes hold the length of the code.
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
05 | 05 |
Now comes the actual code:-
hex | dec | Meaning |
---|---|---|
2a | 42 | aload_0: load object reference from local variable 0 |
b7 | 183 | invokespecial: invoke non-virtual method |
0 | 0 | Constant Pool index (high byte) |
3 | 3 | Constant Pool index (low byte) |
b1 | 177 | return: return void |
These five bytes are Java byte codes. If we refer to the manual, the JVM specification, we'll see the following:
- 42 (aload_0) means 'load object reference from local variable 0', which puts this on the stack.
- 183 (invokespecial) calls a method whose Constant Pool index is given in the next two bytes. This is the instruction used to call constructors. The next two bytes, 00 03, point to entry 3 in the Constant Pool.
- The last byte, 177 (return), returns void.
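A disassembler for these five bytes needs to know only three opcodes. The C sketch below is deliberately tiny and gives up on anything else. It handles just these three:
- aload_0 (0x2A)
- invokespecial (0xB7, followed by a u2 pool index)
- return (0xB1)

disassemble is our own illustrative name.

```c
#include <stdio.h>
#include <string.h>

/* Write one "offset: mnemonic" line per instruction into out.
   Returns 0 on success, -1 on an opcode this sketch doesn't know
   or a truncated instruction. */
int disassemble(const unsigned char *code, size_t len, char *out, size_t outsz)
{
    size_t pc = 0;

    out[0] = '\0';
    while (pc < len) {
        char line[64];
        size_t used;

        switch (code[pc]) {
        case 0x2A:                                   /* 42 */
            snprintf(line, sizeof line, "%u: aload_0\n", (unsigned)pc);
            pc += 1;
            break;
        case 0xB7:                                   /* 183, then a u2 pool index */
            if (pc + 2 >= len)
                return -1;
            snprintf(line, sizeof line, "%u: invokespecial #%u\n", (unsigned)pc,
                     ((unsigned)code[pc + 1] << 8) | code[pc + 2]);
            pc += 3;
            break;
        case 0xB1:                                   /* 177 */
            snprintf(line, sizeof line, "%u: return\n", (unsigned)pc);
            pc += 1;
            break;
        default:
            return -1;
        }
        used = strlen(out);
        strncat(out, line, outsz - used - 1);
    }
    return 0;
}
```

For the constructor's code it prints three lines: 0: aload_0, 1: invokespecial #3 and 4: return. The numbers in front are byte offsets, which is why return sits at 4 rather than 2.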
If you really want to discover what these values stand for, check out the Project our students worked on.
hex | Dec | No. |
---|---|---|
a | 10 | 3 |
0 | 0 | |
2 | 2 | |
0 | 0 | |
4 | 4 |
Entry 3 starts with a 10, CONSTANT_Methodref, a reference to a method (function). The first int after the tag, class_index, points to the class the method belongs to. The second int, name_and_type_index, points to a CONSTANT_NameAndType entry that holds the method's name and its signature.
The first int refers to entry 2 in the Pool, which is shown below.
hex | Dec | No. |
---|---|---|
7 | 7 | 2 |
0 | 0 | |
10 | 16 |
The 16 there tells us to go to entry 16, i.e. java/lang/Object.
hex | Dec | Char | No. |
---|---|---|---|
1 | 1 | | 16 |
0 | 0 | ||
10 | 16 | ||
6a | 106 | j | |
61 | 97 | a | |
76 | 118 | v | |
61 | 97 | a | |
2f | 47 | / | |
6c | 108 | l | |
61 | 97 | a | |
6e | 110 | n | |
67 | 103 | g | |
2f | 47 | / | |
4f | 79 | O | |
62 | 98 | b | |
6a | 106 | j | |
65 | 101 | e | |
63 | 99 | c | |
74 | 116 | t |
So the first half of the Methodref tells us which class the method belongs to. The next int says 00 04, which means go to entry 4 in the Pool.
hex | Dec | No. |
---|---|---|
c | 12 | 4 |
0 | 0 | |
7 | 7 | |
0 | 0 | |
5 | 5 |
There the first number is 12, which stands for CONSTANT_NameAndType. After the 12 we have two more ints. The first int is 7, which means go to entry 7 in the Pool.
hex | Dec | Char | No. |
---|---|---|---|
1 | 1 | | 7 |
0 | 0 | ||
6 | 6 | ||
3c | 60 | < | |
69 | 105 | i | |
6e | 110 | n | |
69 | 105 | i | |
74 | 116 | t | |
3e | 62 | > |
Let's now check out the other int, 5.
hex | Dec | Char | No. |
---|---|---|---|
1 | 1 | | 5 |
0 | 0 | ||
3 | 3 | ||
28 | 40 | ( | |
29 | 41 | ) | |
56 | 86 | V |
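Entry 5, ()V, turns out to be the signature: no parameters, returning void.
So by following this digital trail we've decoded the whole instruction: invokespecial #3 calls java/lang/Object.<init>()V, the constructor of our superclass. Every constructor begins by calling its superclass's constructor. Since we never wrote a constructor, the compiler generated this one for us, call and all.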
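The whole digital trail fits in one short C function. It runs from the Methodref to its Class and NameAndType entries and on to three Utf8 strings. The sketch assumes the pool offsets are already known (offsets[i] is where entry i starts) and that the strings are plain ASCII. describe_methodref is our own hypothetical name, and the Class.name:descriptor output format is just our choice.

```c
#include <stdio.h>
#include <string.h>

static unsigned u2(const unsigned char *p) { return ((unsigned)p[0] << 8) | p[1]; }

/* Resolve CONSTANT_Methodref entry `index` to "Class.name:descriptor".
   Returns 0 on success, -1 if any entry on the way has the wrong tag. */
int describe_methodref(const unsigned char *pool, const size_t *off, unsigned index,
                       char *out, size_t outsz)
{
    const unsigned char *mref = pool + off[index];
    const unsigned char *cls, *nat, *cname, *name, *desc;

    if (mref[0] != 10)                        /* CONSTANT_Methodref */
        return -1;
    cls = pool + off[u2(mref + 1)];           /* class_index -> CONSTANT_Class */
    nat = pool + off[u2(mref + 3)];           /* name_and_type_index -> NameAndType */
    if (cls[0] != 7 || nat[0] != 12)
        return -1;
    cname = pool + off[u2(cls + 1)];          /* Utf8: class name */
    name  = pool + off[u2(nat + 1)];          /* Utf8: method name */
    desc  = pool + off[u2(nat + 3)];          /* Utf8: signature */
    if (cname[0] != 1 || name[0] != 1 || desc[0] != 1)
        return -1;
    /* %.*s prints exactly `length` bytes, so no NUL terminator is needed. */
    snprintf(out, outsz, "%.*s.%.*s:%.*s",
             (int)u2(cname + 1), (const char *)(cname + 3),
             (int)u2(name + 1),  (const char *)(name + 3),
             (int)u2(desc + 1),  (const char *)(desc + 3));
    return 0;
}
```

For entry 3 of aaa.class it produces java/lang/Object.<init>:()V.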
After the byte codes comes the length of the exception table. The constructor has no try/catch handlers, so the next bytes are set to zero.
hex | dec |
---|---|
00 | 00 |
00 | 00 |
Now comes the number of attributes the function has: 1 attribute.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
After the number of attributes, we have the name of the attribute. So we scoot over to entry 11 in the Constant Pool to check that out.
hex | dec |
---|---|
00 | 00 |
0b | 11 |
Here it is again, the Line Number Table.
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 11 |
0 | 0 | ||
f | 15 | ||
4c | 76 | L | |
69 | 105 | i | |
6e | 110 | n | |
65 | 101 | e | |
4e | 78 | N | |
75 | 117 | u | |
6d | 109 | m | |
62 | 98 | b | |
65 | 101 | e | |
72 | 114 | r | |
54 | 84 | T | |
61 | 97 | a | |
62 | 98 | b | |
6c | 108 | l | |
65 | 101 | e |
After that we have the length of the attribute. In this case it's:-
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
06 | 06 |
Right after that we have the number of entries in the line number table.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
Then comes the entry itself: start_pc (00 00) followed by the line number (00 01).
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
01 | 01 |
So we're told that the constructor's code, from bytecode offset 0 onwards, comes from line number 1 in the source code.
Now comes the very last section of all: the number of attributes belonging to the class itself.
hex | dec |
---|---|
00 | 00 |
01 | 01 |
That's right, it's attributes again. The attribute starts with the name of the attribute which is a pointer to the name in the Constant Pool.
hex | dec |
---|---|
00 | 00 |
0d | 13 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 13 |
0 | 0 | ||
a | 10 | ||
53 | 83 | S | |
6f | 111 | o | |
75 | 117 | u | |
72 | 114 | r | |
63 | 99 | c | |
65 | 101 | e | |
46 | 70 | F | |
69 | 105 | i | |
6c | 108 | l | |
65 | 101 | e |
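Now comes the length of the attribute, which is 2, because the body of a SourceFile attribute is a single int: an index into the Constant Pool. Here that index is 0x11 (17).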
hex | dec |
---|---|
00 | 00 |
00 | 00 |
00 | 00 |
02 | 02 |
hex | dec |
---|---|
00 | 00 |
11 | 17 |
hex | dec | Char | No. |
---|---|---|---|
1 | 1 | | 17 |
0 | 0 | ||
8 | 8 | ||
7a | 122 | z | |
7a | 122 | z | |
7a | 122 | z | |
2e | 46 | . | |
6a | 106 | j | |
61 | 97 | a | |
76 | 118 | v | |
61 | 97 | a |
At position 17 we find the words zzz.java, the name of the source file.
That's it! You've now read every byte of aaa.class, which is exactly what a class-file disassembler such as the JDK's javap does. Turn these steps into a program and you can pick apart any .class file with ease. As a final revision, run your eyes over the following table.
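The structure of the .class file
Member | data type | Bytes in aaa.class |
---|---|---|
Magic number | u4 | CA FE BA BE |
Minor version | u2 | 00 03 |
Major version | u2 | 00 2D |
Constant Pool count | u2 | 00 12 |
Constant Pool | count - 1 entries | entries 1 to 17 |
Access flags | u2 | 00 20 |
this class | u2 | 00 01 |
super class | u2 | 00 02 |
No. of interfaces | u2 | 00 00 |
Interfaces | u2 each | none |
No. of fields | u2 | 00 00 |
Fields | field structures | none |
No. of methods | u2 | 00 02 |
Methods | method structures | abc and <init> |
No. of attributes | u2 | 00 01 |
Attributes | attribute structures | SourceFile |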
Mr. Vijay Mukhi
Ms. Sonal Kotecha
Mr. Arsalan Zaidi
Vijay Mukhi's Computer Institute
VMCI, B-13, Everest Building, Tardeo, Mumbai 400 034, India
Tel : 91-22-496 4335 /6/7/8/9
Fax : 91-22-307 28 59
e-mail : vmukhi@giasbm01.vsnl.net.in
http://www.vijaymukhi.com