DISASSEMBLER

                               
 Let's understand what a Disassembler is.

 It converts the byte codes of a class file, i.e. the machine code of the
JVM, back into equivalent source code, i.e. the Java program. The
Disassembler is not a separate program. It is a few functions added to our
main program, which reads the class file. So this is the right time to
understand what these functions do.


NOTE: This Disassembler has its own limitations. It only displays
variables & their initializations, arithmetic operations on these variables,
strings, for loops, if statements, nested if & for statements, and methods
with their signatures. But you can very well handle other opcodes in the
methods and convert them into Java code. This is our attempt to help people
understand how a Java Disassembler can be written.

 By now all the information in the class file is already stored in the
structures. We will now arrange this information so that we get our
desired Java source code. In source.cpp, we have seven functions which carry
out the disassembling: methsign(), v(), cview(), method(), arrange(),
rtype() & catstr(). Look at the last three lines in main(). There is a for
statement with a call to cview() inside it. cview() is called once for each
method present in the class file, as meth_c denotes the number of methods
in the class file.

 Let's look at cview(), as it is the first function which gets called from
main(). The function takes one argument, a pointer to the structure
"meth", which has the information about the methods, i.e. method signatures,
return types etc. The source code of a particular method is also in the
form of byte codes, and c_len signifies the code length. It is a member of
the structure cinfo, which has the information about the code in that
particular method. The code itself is stored in the array 'c'.

 First we have to start by displaying the return type of the method, the
name of the method and its arguments, if any. This is done by the function
methsign(). It takes three arguments: the name of the method, its signature
and an array to receive the result. The question which arises here is: when
we have the name and signature of the methods in the structure 'cinfo', why
can't we display them directly? The problem is that the information is in a
raw format, i.e. the method name and the signature are in different
variables. Moreover the signature is in the form "()V", where the return
type is specified last, but we want the output to look like
"return type  method name ( arguments )". Hence we have the function
methsign(), which arranges and concatenates the strings and gives the
desired result. The arguments mi.name and mi.sign are actually ints, and as
usual we have to use get() to extract the actual strings from the constant
pool. The arrangement is then done by several strcpy() and strcat() calls
and the required output is stored in 'ss'. 'ss' is actually a reference to
proto in cview(), and the next line, i.e. the printf(), prints the desired
output, e.g. void abc ( int x, int y ).

 Hey, what is the function rtype() doing in methsign()? If you look
carefully at the output of bytes.cpp you will find that the return type in
the method signature (in the constant pool) is just one character, i.e.
void is 'V', int is 'I' etc. So to convert this return type character into
its equivalent word we have the function rtype(). It is passed the character
and a variable which gives the number of parameters passed to the method as
arguments. This function returns the actual return type of the method, i.e.
void, int, double etc.
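
The two steps above, expanding a type character and assembling the
prototype, can be sketched as follows. This is only an illustration and not
the code from source.cpp: the names typeName() and prototype() are
hypothetical, std::string is used instead of strcpy()/strcat(), the
parameter names x0, x1, ... are invented because the signature carries
none, and only single-character primitive types are handled.

```cpp
#include <string>

// Hypothetical stand-in for rtype(): expands one signature character
// into the Java keyword it represents. Only primitive types and void
// are handled; anything else is returned as "?".
std::string typeName(char c) {
    switch (c) {
        case 'V': return "void";
        case 'I': return "int";
        case 'D': return "double";
        case 'F': return "float";
        case 'J': return "long";
        case 'Z': return "boolean";
        case 'B': return "byte";
        case 'C': return "char";
        case 'S': return "short";
        default:  return "?";
    }
}

// Hypothetical stand-in for methsign(): turns a method name and a
// signature such as "(II)V" into "void abc(int x0, int x1)".
// Assumption: every parameter is a single-character primitive type.
std::string prototype(const std::string& name, const std::string& desc) {
    std::string::size_type close = desc.find(')');
    if (desc.empty() || desc[0] != '(' || close == std::string::npos ||
        close + 1 >= desc.size()) {
        return "";  // not a well-formed method signature
    }
    // The return type is the character right after the closing bracket.
    std::string out = typeName(desc[close + 1]) + " " + name + "(";
    for (std::string::size_type i = 1; i < close; ++i) {
        if (i > 1) out += ", ";
        out += typeName(desc[i]) + " x" + std::to_string(i - 1);
    }
    return out + ")";
}
```

With these definitions, prototype("abc", "(II)V") gives
"void abc(int x0, int x1)" and prototype("run", "()V") gives "void run()".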

 Now we have to display the actual code. The big problem is that the code
is in byte form. Each JVM instruction is identified by a one-byte opcode,
i.e. a number between 0 and 255: 177 means return void, 16 means push a
one-byte signed integer onto the stack, and so on. So these instructions,
in the form of byte codes, have to be decoded to obtain their equivalent
Java code. This decoding is done by the function method().

 The method() function takes 5 parameters: a pointer to the array which
stores all the codes, the length of the code, a pointer to an array of
variable numbers, a pointer to a pointer to char which stores the actual
strings representing the code, & the reference variable 'ii' which stores
the number of lines of code. The most crucial part of method() is the
switch statement. In the switch statement, every instruction has its own
case. For example, 3, 4, 5, 6, 7 and 8 (the instructions that push the int
constants 0 to 5) belong to the same category and hence share the same
program lines.
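
The shape of such a switch can be sketched as below. This is a minimal
illustration and not the switch from source.cpp: the function name
decode(), the std::vector result and the output text are assumptions made
for the example, and only the opcodes mentioned above are handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decodes a few JVM opcodes into readable text, one entry per instruction.
// Opcodes that are not handled are reported as unknown; since their
// operand bytes are not skipped, the output after them may be wrong.
std::vector<std::string> decode(const unsigned char* code, std::size_t len) {
    std::vector<std::string> lines;
    std::size_t pc = 0;
    while (pc < len) {
        int op = code[pc++];
        switch (op) {
            case 3: case 4: case 5: case 6: case 7: case 8:
                // Same category: the constant 0..5 is implied by the opcode.
                lines.push_back("push " + std::to_string(op - 3));
                break;
            case 16: {
                // One signed operand byte follows the opcode.
                if (pc >= len) {
                    lines.push_back("truncated");
                    return lines;
                }
                int value = static_cast<signed char>(code[pc++]);
                lines.push_back("push " + std::to_string(value));
                break;
            }
            case 177:
                lines.push_back("return");
                break;
            default:
                lines.push_back("unknown opcode " + std::to_string(op));
                break;
        }
    }
    return lines;
}
```

For the four bytes 6, 16, 0xF6, 177 this yields the entries "push 3",
"push -10" and "return".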

NOTE: Not all instructions have been decoded in the program. But one
can easily add cases as and when required, just by knowing what a particular
instruction does. So write a Java program, get the bytes by executing
bytes.cpp and figure out how the code is stored in byte form.

If you go through the method() function carefully you will find that within
the various cases the functions catstr(), v() and arrange() are called
several times. So let us look at these functions and figure out what they
do in the program.

 First we start with a simple function, v(). As we know, instructions
belonging to the same category share the same case, but an instruction in
that case may refer to an object variable or an integer variable. So to
display the appropriate variable name, i.e. "ii" for an integer variable
and "oo" for an object, we have the function v(), which takes two
parameters: a character, "i" or "o", and the number of the variable.
Depending upon the character it returns a string with the appropriate name.
That's it, and now for the second simple function, i.e. catstr().
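
A minimal sketch of the naming idea behind v() follows. The function name
varName() and the use of std::string are assumptions for this example; the
real v() may build its result differently.

```cpp
#include <string>

// Hypothetical stand-in for v(): builds a variable name from a kind
// character ('i' for integer, anything else is treated as an object)
// and the number of the variable.
std::string varName(char kind, int number) {
    std::string prefix = (kind == 'i') ? "ii" : "oo";
    return prefix + std::to_string(number);
}
```

Here varName('i', 1) gives "ii1" and varName('o', 2) gives "oo2".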

Here we are actually reading each instruction (byte code) and displaying its
equivalent Java code as text. So if we want to display int ii1=0; we have
to take each string & display it one after another, i.e. first display 'int',
then 'ii1', the operator '=' & '0'. This is too complicated. Instead we can
have a function which takes all these strings as parameters, concatenates
them appropriately and stores the result as a single string. This is what
catstr() does. It takes any number of arguments (strings), concatenates
them, returns the length of the resulting string and stores the string
through the pointer given to it.
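
A variable-argument helper of this kind can be sketched as below. The
details are assumptions, not the signature used in source.cpp: this version
takes the buffer size as an extra parameter, expects the argument list to
end with a null const char* sentinel, and stops at the first piece that
would not fit.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstring>

// Sketch of a catstr()-style helper. Concatenates the C strings that
// follow `cap` into `dest` and returns the resulting length.
// Assumption: the list ends with a null `const char*` sentinel.
// Concatenation stops at the first piece that would overflow `dest`.
std::size_t catstr(char* dest, std::size_t cap, ...) {
    if (cap == 0) return 0;
    dest[0] = '\0';
    std::size_t used = 0;
    va_list args;
    va_start(args, cap);
    for (;;) {
        const char* piece = va_arg(args, const char*);
        if (piece == nullptr) break;          // sentinel reached
        std::size_t n = std::strlen(piece);
        if (used + n >= cap) break;           // keep room for the terminator
        std::memcpy(dest + used, piece, n + 1);
        used += n;
    }
    va_end(args);
    return used;
}
```

With a 32-byte buffer buf, the call
catstr(buf, sizeof buf, "int ", "ii1", "=", "0", ";",
static_cast<const char*>(nullptr)) stores "int ii1=0;" in buf and
returns 10.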

Now let's look at an interesting function called arrange().
            
Suppose we have an arithmetic statement in our Java program, let's say
c = a + b. The problem is that the compiler emits the bytes in postfix
order, the order produced by Dijkstra's algorithm, which stores the above
expression as a b +. If we have d = a + b - c then the bytes come out as
a b + c - and so on. So we have to convert a b + back into a + b, and
hence we have the function arrange().

This function takes a pointer to the actual bytes as an argument. Each byte
is examined to decide whether it is a variable (number) or an operator. As
soon as an operator is found, the bytes are rearranged to get the normal
output a + b. Now suppose we have the expression a + b - c * d + e. Because
* binds tighter than + and -, Dijkstra's algorithm stores it in postfix
order as a b + c d * - e +.

 The first operator from the left, i.e. +, is traced and placed between the
two operands before it, so the expression becomes a+b c d * - e +. The next
operator, *, is placed between c and d, giving a+b c*d - e +. After that
the operator '-' is taken; the two operands before it are now (a+b) and
(c*d), so '-' is placed between them and the expression becomes
a+b-c*d e +. Finally the last '+' is placed before e, giving a+b-c*d+e.
This is how the arrange function extracts the actual arithmetic expression.
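
The rearrangement can be sketched with a small stack, as below. This is an
illustration of the general postfix-to-infix technique and not the
arrange() from source.cpp: the name toInfix() is hypothetical, and it works
on space-separated text tokens, whereas the real function works on the
bytes of the method.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Rebuilds an infix expression from space-separated postfix tokens.
// Each operator pops two operands and pushes "left op right".
// No parentheses are added, so the result is only faithful when the
// original expression needed none (e.g. "a b + c -").
std::string toInfix(const std::string& postfix) {
    std::vector<std::string> stack;
    std::istringstream in(postfix);
    std::string tok;
    while (in >> tok) {
        bool isOp = tok == "+" || tok == "-" || tok == "*" || tok == "/";
        if (!isOp) {
            stack.push_back(tok);   // operand: wait for its operator
            continue;
        }
        if (stack.size() < 2) return "";   // malformed input
        std::string right = stack.back();
        stack.pop_back();
        std::string left = stack.back();
        stack.pop_back();
        stack.push_back(left + " " + tok + " " + right);
    }
    return stack.size() == 1 ? stack.back() : "";
}
```

For example, toInfix("a b +") gives "a + b" and toInfix("a b + c -")
gives "a + b - c".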

This is what the Disassembler is all about. Though it is incomplete, one can
always write more complicated Java programs, compile them, get the actual
byte codes and then add more functions to the Disassembler program
accordingly to make this Disassembler more complete.

NOTE: This was just an attempt to write a Disassembler. By the same token,
anyone who knows the actual bits & bytes generated by the Java compiler can
write a Java compiler.
Vijay Mukhi's Computer Institute
VMCI, B-13, Everest Building, Tardeo, Mumbai 400 034, India
Tel : 91-22-496 4335 /6/7/8/9 Fax : 91-22-307 28 59
e-mail : vmukhi@giasbm01.vsnl.net.in
http://www.vijaymukhi.com