The IL Disassembler

 

This book explains the internal workings of a disassembler. The programs given in the book produce output similar to that of ildasm, the disassembler written by Microsoft. The only difference is that the source code of ildasm is not available. Our main objective in this book is to write a series of programs that build up an understanding of the disassembler in the simplest possible form. The final program has been tested against 5000 .Net files.

 

Without further discussion, let's start with the disassembler right away. At every step, the output produced by our program will be compared with that of ildasm. This verifies the results and keeps us on the right track.

 

a.cs

public class zzz

{

public static void Main()

{

}

}

 

>ildasm /all /out:a.txt a.exe

 

Program a.cs is the smallest C# program which on compiling gives the smallest .Net executable, a.exe. If you fail to understand the above C# program or have forgotten how to compile a C# program, we request you to stop reading this book now. This book assumes that you know nothing about a disassembler but you must have a basic understanding of the C# programming language.

 

Once the executable is created, proceed further to write the first program in the series of the disassembler.

 

Program1.cs

using System;

using System.IO;

public class zzz

{

int [] datadirectoryrva;

int [] datadirectorysize;

int subsystem;

int stackreserve ;

int stackcommit;

int datad;

int sectiona;

int filea;

int entrypoint;

int ImageBase;

FileStream mfilestream ;

BinaryReader mbinaryreader ;

long sectionoffset;

short sections ;

string filename;

int [] SVirtualAddress ;

int [] SSizeOfRawData;

int [] SPointerToRawData ;

public static void Main (string [] args)

{

try

{

zzz a = new zzz();

a.abc(args);

}

catch ( Exception e)

{

Console.WriteLine(e.ToString());

}

}

public void abc(string [] args)

{

ReadPEStructures(args);

DisplayPEStructures();

}

public void ReadPEStructures(string [] args)

{

filename = args[0];

mfilestream  = new FileStream(filename ,FileMode.Open);

mbinaryreader = new BinaryReader (mfilestream);

mfilestream.Seek(60, SeekOrigin.Begin);

int startofpeheader = mbinaryreader.ReadInt32();

mfilestream.Seek(startofpeheader, SeekOrigin.Begin);

byte sig1,sig2,sig3,sig4;

sig1 = mbinaryreader.ReadByte();

sig2 = mbinaryreader.ReadByte();

sig3 = mbinaryreader.ReadByte();

sig4 = mbinaryreader.ReadByte();

// First structure: the COFF file header that follows the PE signature

short machine = mbinaryreader.ReadInt16();

sections = mbinaryreader.ReadInt16();

int time = mbinaryreader.ReadInt32();

int pointer = mbinaryreader.ReadInt32();

int symbols = mbinaryreader.ReadInt32();

int headersize= mbinaryreader.ReadInt16();

int characteristics = mbinaryreader.ReadInt16();

sectionoffset = mfilestream.Position + headersize;

// Second structure: the image optional header

int magic = mbinaryreader.ReadInt16();

int major = mbinaryreader.ReadByte();

int minor = mbinaryreader.ReadByte();

int sizeofcode = mbinaryreader.ReadInt32();

int sizeofdata = mbinaryreader.ReadInt32();

int sizeofudata = mbinaryreader.ReadInt32();

entrypoint = mbinaryreader.ReadInt32();

int baseofcode = mbinaryreader.ReadInt32();

int baseofdata = mbinaryreader.ReadInt32();

ImageBase = mbinaryreader.ReadInt32();

sectiona= mbinaryreader.ReadInt32();

filea = mbinaryreader.ReadInt32();

int majoros = mbinaryreader.ReadInt16();

int minoros = mbinaryreader.ReadInt16();

int majorimage = mbinaryreader.ReadInt16();

int minorimage = mbinaryreader.ReadInt16();

int majorsubsystem= mbinaryreader.ReadInt16();

int minorsubsystem = mbinaryreader.ReadInt16();

int verison = mbinaryreader.ReadInt32();

int imagesize = mbinaryreader.ReadInt32();

int sizeofheaders= mbinaryreader.ReadInt32();

int checksum = mbinaryreader.ReadInt32();

subsystem = mbinaryreader.ReadInt16();

int dllflags = mbinaryreader.ReadInt16();

stackreserve = mbinaryreader.ReadInt32();

stackcommit = mbinaryreader.ReadInt32();

int heapreserve = mbinaryreader.ReadInt32();

int heapcommit = mbinaryreader.ReadInt32();

int loader = mbinaryreader.ReadInt32();

datad = mbinaryreader.ReadInt32();

datadirectoryrva = new int[16];

datadirectorysize = new int[16];

for ( int i = 0 ; i <=15 ; i++)

{

datadirectoryrva[i] = mbinaryreader.ReadInt32();

datadirectorysize[i] = mbinaryreader.ReadInt32();

}

if ( datadirectorysize[14] == 0)

throw new System.Exception("Not a valid CLR file");

mfilestream.Position = sectionoffset ;

SVirtualAddress = new int[sections ];

SSizeOfRawData = new int[sections ];

SPointerToRawData = new int[sections ];

for ( int i = 0 ; i < sections ; i++)

{

mbinaryreader.ReadBytes(12); // skip the 8-byte name and the 4-byte virtual size

SVirtualAddress[i] = mbinaryreader.ReadInt32();

SSizeOfRawData[i] = mbinaryreader.ReadInt32();

SPointerToRawData[i] = mbinaryreader.ReadInt32();

mbinaryreader.ReadBytes(16); // skip the remaining 16 bytes of the 40-byte section header

}

}

public void DisplayPEStructures()

{

Console.WriteLine();

Console.WriteLine("//  Microsoft (R) .NET Framework IL Disassembler.  Version 1.0.3328.4");

Console.WriteLine("//  Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.");

Console.WriteLine();

Console.WriteLine("// PE Header:");

Console.WriteLine("// Subsystem:                      {0}",subsystem.ToString("x8"));

Console.WriteLine("// Native entry point address:     {0}",entrypoint.ToString("x8"));

Console.WriteLine("// Image base:                     {0}",ImageBase.ToString("x8"));

Console.WriteLine("// Section alignment:              {0}",sectiona.ToString("x8"));

Console.WriteLine("// File alignment:                 {0}",filea.ToString("x8"));

Console.WriteLine("// Stack reserve size:             {0}",stackreserve.ToString("x8"));

Console.WriteLine("// Stack commit size:              {0}",stackcommit.ToString("x8"));

Console.WriteLine("// Directories:                    {0}",datad.ToString("x8"));

DisplayDataDirectory(datadirectoryrva[0] , datadirectorysize[0] , "Export Directory");

DisplayDataDirectory(datadirectoryrva[1] , datadirectorysize[1] , "Import Directory");

DisplayDataDirectory(datadirectoryrva[2] , datadirectorysize[2] , "Resource Directory");

DisplayDataDirectory(datadirectoryrva[3] , datadirectorysize[3] , "Exception Directory");

DisplayDataDirectory(datadirectoryrva[4] , datadirectorysize[4] , "Security Directory");

DisplayDataDirectory(datadirectoryrva[5] , datadirectorysize[5] , "Base Relocation Table");

DisplayDataDirectory(datadirectoryrva[6] , datadirectorysize[6] , "Debug Directory");

DisplayDataDirectory(datadirectoryrva[7] , datadirectorysize[7] , "Architecture Specific");

DisplayDataDirectory(datadirectoryrva[8] , datadirectorysize[8] , "Global Pointer");

DisplayDataDirectory(datadirectoryrva[9] , datadirectorysize[9] , "TLS Directory");

DisplayDataDirectory(datadirectoryrva[10] , datadirectorysize[10] , "Load Config Directory");

DisplayDataDirectory(datadirectoryrva[11] , datadirectorysize[11] , "Bound Import Directory");

DisplayDataDirectory(datadirectoryrva[12] , datadirectorysize[12] , "Import Address Table");

DisplayDataDirectory(datadirectoryrva[13] , datadirectorysize[13] , "Delay Load IAT");

DisplayDataDirectory(datadirectoryrva[14] , datadirectorysize[14] , "CLR Header");

Console.WriteLine();

}

public void DisplayDataDirectory(int rva, int size , string ss)

{

string sfinal =  "";

sfinal = String.Format("// {0:x}" , rva);

sfinal = sfinal.PadRight(12);

sfinal = sfinal + String.Format("[{0:x}" , size);

sfinal = sfinal.PadRight(21);

sfinal = sfinal + String.Format("] address [size] of {0}:" , ss);

if (ss == "CLR Header")

sfinal = sfinal.PadRight(67);

else

sfinal = sfinal.PadRight(68);

Console.WriteLine(sfinal);

}

}

 

On compiling the above program, program1.exe is generated. Now run the executable as

 

>Program1 a.exe

 

This command gives the following output.

 

Output

//  Microsoft (R) .NET Framework IL Disassembler.  Version 1.0.3328.4

//  Copyright (C) Microsoft Corporation 1998-2001. All rights reserved.

 

// PE Header:

// Subsystem:                      00000003

// Native entry point address:     0000227e

// Image base:                     00400000

// Section alignment:              00002000

// File alignment:                 00000200

// Stack reserve size:             00100000

// Stack commit size:              00001000

// Directories:                    00000010

// 0        [0       ] address [size] of Export Directory:         

// 2228     [53      ] address [size] of Import Directory:         

// 4000     [318     ] address [size] of Resource Directory:       

// 0        [0       ] address [size] of Exception Directory:       

// 0        [0       ] address [size] of Security Directory:       

// 6000     [c       ] address [size] of Base Relocation Table:    

// 0        [0       ] address [size] of Debug Directory:          

// 0        [0       ] address [size] of Architecture Specific:    

// 0        [0       ] address [size] of Global Pointer:           

// 0        [0       ] address [size] of TLS Directory:            

// 0        [0       ] address [size] of Load Config Directory:    

// 0        [0       ] address [size] of Bound Import Directory:   

// 2000     [8       ] address [size] of Import Address Table:     

// 0        [0       ] address [size] of Delay Load IAT:           

// 2008     [48      ] address [size] of CLR Header:              

 

Since time immemorial, the first function to be called is Main. In this function, an instance of class zzz is first created and then a non-static function abc is called on it. The only reason for placing the bulk of our code in the abc function is that the Main function is static: it cannot access instance variables until an instance of its class has been created.

 

We promise that this is the first and the last time in this book that we will use names like zzz and a. Henceforth we will stick to long, meaningful names for variables and objects. Another simple rule that we have adhered to is that if a variable is to be used by more than one function, it is made an instance variable, which we loosely call a global. C# has no true globals, whereas C++ allows them. Therefore at times the names may sound legally wrong but they are morally right.

 

The abc function is given an array of strings that holds the arguments passed to the program. In our case, it is the name of the .Net executable that is to be disassembled. While writing code, there is always the possibility of making errors. A dialog box pops up each time an unhandled exception is encountered, which at times gets extremely irritating. For this reason, the code in Main is enclosed within a try-catch block that simply displays the exception.

 

Now to understand the functioning of abc, which first calls ReadPEStructures.

The array element args[0] contains the name of the file to be disassembled, which is saved in an instance variable, filename.

 

The .Net world has a million classes to handle files, of which we have presently used only two. The first one is the FileStream class. The constructor used here takes two parameters, the filename and a value of the enum FileMode. The enum specifies how the file should be opened: its six values (CreateNew, Create, Open, OpenOrCreate, Truncate and Append) decide whether the file is to be opened, created, overwritten or appended to. In the good old days of C, numbers or strings were used for discrete values; the modern world of today prefers enums instead. If you honestly ask us, we would prefer the old days anytime, but we all have to move ahead with time, embrace the new and forget the old ways.

 

Since the file is to be opened, the value Open of the enum is used. An exception is thrown if the file does not exist. The handle to the file is stored in an instance variable suitably named mfilestream. The only problem with the FileStream class is that, beyond opening the file, it does very little for us. It has a few rudimentary functions that read a byte or an array of bytes from the file. However, they are of little use to us since our interest lies in reading a short or an int or a string from the file. Therefore another class, BinaryReader, which permits reading primitive types like shorts, ints and longs from the file, is used. The constructor of this class takes the mfilestream object. It is the BinaryReader class, and not the FileStream class, that will be used to read from the file.
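
As a minimal sketch of the two classes working together, the helper below opens a file, wraps the stream in a BinaryReader and returns the first two bytes as characters. The names FileOpenSketch and ReadFirstTwoBytes are our own, chosen for illustration, and are not part of the disassembler. The using statements and the FileAccess.Read argument are also additions that the listing above does not use; they close the file automatically and allow read-only files to be opened.

```csharp
using System;
using System.IO;

public static class FileOpenSketch
{
    // Hypothetical helper: returns the first two bytes of a file as characters.
    public static string ReadFirstTwoBytes(string path)
    {
        // FileMode.Open throws FileNotFoundException if the file does not exist.
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            byte first = reader.ReadByte();
            byte second = reader.ReadByte();
            return "" + (char)first + (char)second;
        }
    }
}
```

For any executable in the PE format, a call such as FileOpenSketch.ReadFirstTwoBytes("a.exe") should return the letters MZ, for reasons explained next.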

 

The file format used by Windows executables is called the PE or Portable Executable file format. Before Windows evolved to become the big daddy of operating systems, the earlier king of the hill was DOS. Every DOS executable file started with the two bytes M and Z. This is how the DOS operating system would recognize an executable file. The advent of Windows did not change people's habits overnight, and many did not appreciate the difference between the two operating systems. Very often a Windows program was executed in the DOS environment.

 

DOS, being a primitive operating system, checks the first two bytes and, on not seeing the magic bytes M and Z, displays the confusing message ‘Bad Command or File Name’. To avoid this confusion, the makers of the PE file format made a conscious decision and mandated that every PE file would start with a valid DOS header. This header is followed by a small DOS program that prints a meaningful error message if the file is executed in the DOS environment. The DOS box of Windows is a simulation of the original DOS.

 

The location of the actual PE header is stored at byte 60 of the file. This location holds an int, so the four bytes starting at offset 60 are taken together and give the offset of the start of the PE header. This offset is not a fixed value, as different compilers choose different error messages for the DOS program and thus change its length. Using the Seek method of the FileStream class, the file pointer is positioned at byte 60 of the file. The second parameter of the Seek function is an enum, SeekOrigin, that takes three values. These values decide whether the number specified in the first parameter is an offset from the beginning of the file, from the end of the file or from the current position of the file pointer.

 

The file pointer is an imaginary construct that points to the next byte to be read. The offset is stored in a variable startofpeheader and its value normally is 128. As mentioned earlier, this value can vary depending upon the compiler used. The Seek method is used again to jump to the start of the PE header. The ReadByte method of the BinaryReader class is then called four times to read the signature, one byte at a time. The magic number for a PE header is the characters P and E followed by two zero bytes, i.e. ‘PE\0\0’.
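
Our program reads the four signature bytes but never checks them. A hedged sketch of how the lookup and a check could be combined is given below. The class PeSignatureSketch and the method FindPeHeader are hypothetical names of our own, and throwing InvalidDataException is simply one possible way of reporting a bad file.

```csharp
using System;
using System.IO;

public static class PeSignatureSketch
{
    // Hypothetical helper: returns the file offset of the PE signature,
    // or throws if the four bytes found there are not 'P', 'E', 0, 0.
    public static int FindPeHeader(Stream stream)
    {
        BinaryReader reader = new BinaryReader(stream);

        // Byte 60 of the DOS header holds the 4-byte offset of the PE header.
        stream.Seek(60, SeekOrigin.Begin);
        int startOfPeHeader = reader.ReadInt32();

        // Jump to that offset and read the four signature bytes.
        stream.Seek(startOfPeHeader, SeekOrigin.Begin);
        byte[] sig = reader.ReadBytes(4);
        if (sig.Length != 4 || sig[0] != (byte)'P' || sig[1] != (byte)'E'
            || sig[2] != 0 || sig[3] != 0)
        {
            throw new InvalidDataException("PE signature not found");
        }
        return startOfPeHeader;
    }
}
```

The method works on any seekable Stream, so it can be handed the FileStream of a real file or a MemoryStream holding a few test bytes.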

 

This magic number is followed by a structure called the standard COFF header. COFF is the Common Object File Format. The first two bytes, or short, give the machine or, better still, the CPU type that this executable or image file can run on. An executable can run either on the specified machine or on a system that emulates it. The PE specification, available on the Microsoft site, lists all possible values that the fields of the various structures can have, hence we will not irk you with these details.

 

In our case, the value read is 0x14c, which stands for an Intel 32-bit machine. This value is not displayed in the output for the simple reason that ildasm does not display it and we have decided to follow the ildasm program to the T. The value is stored in a local variable called machine, not in an instance variable. The method ReadInt16 of the BinaryReader class is used to read a short, or two bytes, from the file. Using the BinaryReader class thus saves us the hassle of reading individual bytes and then multiplying and adding them to build the number.
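
To see what ReadInt16 spares us, here is a small sketch of the same work done by hand. The bytes of a short are stored least significant byte first, so 0x14c sits in the file as 0x4c followed by 0x01. The class and method names are ours, for illustration only.

```csharp
public static class LittleEndianSketch
{
    // Combines two bytes, least significant first, the way ReadInt16 does.
    // Shifting the high byte left by 8 is the same as multiplying it by 256.
    public static short ToInt16(byte low, byte high)
    {
        return (short)(low | (high << 8));
    }
}
```

Calling LittleEndianSketch.ToInt16(0x4c, 0x01) gives 0x14c, the machine value seen above.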

 

The second field is the number of sections in the PE file. A PE file contains different types of entities like code, data, resources etc. Each entity or section is stored in a different part of the PE file, and structures called section headers keep track of them all. The value read for our file is three. Later on, the section details will have to be saved in arrays, and hence the variable sections is an instance variable. This is followed by the date-time stamp, which records when the file was created. The method ReadInt32 is used to extract this 4-byte value.

 

This is followed by a 4-byte entity that is a pointer or offset to the symbol table. The next int is the number of symbols available. The value of the pointer to the symbol table is zero, which means that there is no symbol table. Symbol tables are present only in obj or object files. In the good old days compilers created an obj file and linkers created an exe file from obj files. In the .Net world obj files are obsolete and hence these two ints are always zero.

 

The next field, a short, gives the size of the header that follows the first header, called the image optional header. This header is never seen in obj files and its size can vary, but so far it has been a constant 224 bytes.

 

Then comes a field called characteristics, which specifies the attributes of the file. The value received is 0x10e.

 

Bit diagram

 

Individual bits in a byte carry different pieces of information. The value 0xe or 14 has the bit pattern 1110, wherein the 2nd, 3rd and 4th bits, counting the least significant bit as the 1st, are on.

 

Bit Diagram

 

This signifies that the file is a valid executable (bit 2), that COFF line numbers are not present in the file or have been stripped off (bit 3) and that the symbol table entries are also absent (bit 4). The remaining value of 0x100 signifies that the machine running the executable is based on a 32-bit architecture. The characteristics field, which is the last member of the structure, is not displayed by the ildasm utility.
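
The bit tests described above can be sketched with a flags enum. The enum and its member names below are our own invention; only the numeric values come from the PE specification. The value 0x2000, which marks a DLL, is included for completeness.

```csharp
using System;

[Flags]
public enum CoffCharacteristics
{
    ExecutableImage      = 0x0002,  // the file is a valid executable
    LineNumbersStripped  = 0x0004,  // COFF line numbers are absent
    LocalSymbolsStripped = 0x0008,  // symbol table entries are absent
    Machine32Bit         = 0x0100,  // 32-bit architecture
    Dll                  = 0x2000   // the file is a DLL
}

public static class CharacteristicsSketch
{
    // Returns true if the given flag is on in the characteristics value.
    public static bool IsSet(int characteristics, CoffCharacteristics flag)
    {
        return (characteristics & (int)flag) != 0;
    }
}
```

For the value 0x10e read from a.exe, IsSet reports the first four flags as on and the Dll flag as off.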

 

The section table begins immediately after the image optional header, i.e. at the start of the optional header plus the size of the optional header. The variable sectionoffset stores this value, so that it can be used to jump to the section table as and when required.
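
The arithmetic behind sectionoffset can be spelt out as a small sketch. The class and constant names are ours. The first header is 20 bytes long (2 + 2 + 4 + 4 + 4 + 2 + 2) and is preceded by the 4-byte signature.

```csharp
public static class SectionTableSketch
{
    public const int SignatureSize = 4;    // 'P', 'E', 0, 0
    public const int CoffHeaderSize = 20;  // machine up to characteristics

    // Offset in the file at which the first section header starts.
    public static long SectionTableOffset(long startOfPeHeader, int sizeOfOptionalHeader)
    {
        return startOfPeHeader + SignatureSize + CoffHeaderSize + sizeOfOptionalHeader;
    }
}
```

With the usual values of 128 for the start of the PE header and 224 for the optional header, the section table starts at byte 376 of the file.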

 

The first field of the optional header is a short, which represents the magic number. This can take one of two values: 0x10b if the header follows the PE32 format, which presently is the case, or 0x20b when the header is of the PE32+ format. The latter value is seen when files use 64-bit addresses.

 

In the optional header, the information is divided into three distinct parts. The first 28 bytes are the standard fields, the next 68 bytes apply to the Windows operating system only and the final 128 bytes hold the sixteen data directories, which adds up to the 224 bytes mentioned earlier. The second and third fields of the standard part are the major and minor linker version numbers, which presently have a value of 6 and 0. This is followed by the size of the code block in the exe file. The sizes of the initialized and uninitialized data follow next.

 

The displayed value of 0x227e is for the next field, called entrypoint. This value is relative to the image base, the address at which the program is loaded in memory. In our case, since the file is an exe file, the instruction at this address is the first one that gets executed by the Operating System. In the case of a device driver, it is the address of the initialization function. A DLL does not have to have an entry point and thus may have a value of 0.

 

The base of code and base of data fields are similar to the entrypoint field: they give the start of the code and data areas when loaded in memory, both relative to the image base. The ImageBase field is the preferred address at which the Operating System loads the exe file.

 

Similar to our likes and dislikes, the OS prefers a value of 0x00400000 as the address for executables; for a DLL it is 0x10000000 and for Windows CE it is 0x00010000. These starting addresses can be changed by supplying an option to the linker, and the value must be a multiple of 64 K. Nevertheless, experimenting with different values is not advisable.

 

The next value of 0x2000, or 8192 in decimal, is the section alignment. This value signifies that even when a section has a size of 100 bytes, the OS will allocate a minimum of 8192 bytes for it in memory. The rest of the bytes in the memory area allocated remain unused. The section alignment is normally a multiple of the page size of the machine and is used for purposes of efficiency. Similar to the section alignment is the file alignment field, which applies to the file stored on disk. The file alignment is displayed as 0x200 or 512 bytes, which implies that each section takes up at least 512 bytes on disk, where 512 bytes traditionally make up one sector.
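
Rounding a size up to an alignment is a one-line calculation. The sketch below is ours and is not taken from ildasm; it assumes a non-negative size and a positive alignment.

```csharp
public static class AlignmentSketch
{
    // Rounds size up to the next multiple of alignment.
    public static int AlignUp(int size, int alignment)
    {
        return ((size + alignment - 1) / alignment) * alignment;
    }
}
```

A section of 100 bytes thus occupies AlignUp(100, 0x2000), i.e. 8192 bytes, in memory and AlignUp(100, 0x200), i.e. 512 bytes, on disk.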

 

The next fields are the major and minor version numbers of the Operating System, the image and the subsystem. The next field, read into the variable verison, is reserved. The following field is the size of the image in memory including all the headers, followed by a field that stores only the size of all the headers, including the section headers. The next field, called checksum, helps the Operating System detect whether the file has been damaged or tampered with before it is loaded into memory.

 

The next field, subsystem, which is displayed by ildasm, informs the Operating System of the subsystem that the exe file requires. A value of 3 in our case means a console subsystem, therefore no Graphical User Interface please; whereas a value of 2 would mean a Graphical User Interface subsystem. The field dllflags applies to DLLs, as the name signifies.

 

Following the field dllflags are two fields that deal with the stack. The stack is an area in memory which is used to pass parameters to functions and to create local variables. The stack memory is reused at the end of a function call and hence it is short-term memory, whereas the heap area is for long duration. The first field, stackreserve, is the total amount of memory set aside for the stack of the application; the value seen is 0x100000 bytes. The second field, stackcommit, is the amount of memory actually allocated to the stack at the start; the value seen is 0x1000 bytes. Thus initially the stack commit is allocated and once this gets used, one page at a time is allocated dynamically, till the stack reserve is used up. The two fields after the stack fields deal with the heap area in memory and are not displayed. The documentation is pretty candid that the loader field is obsolete.

 

The last field of the optional header gives the number of data directories that follow. So far only a value of 16 has been seen. Let's now understand the concept of a data directory.

 

A data directory is nothing but two fields, the first field is a location or what is technically called an RVA (Relative Virtual Address) that gives information as to where some data starts in memory.  The second field is size in bytes of the entity. These are stored back to back.

 

Two arrays of size 16 and data type int are created to store the RVAs and sizes of each data directory entry. If the data directory entry at index 14, the fifteenth entry, has a size of zero, then the executable file was not created by a .Net compiler. In such a case, there is no reason to continue further, so the program is made to throw an exception and quit gracefully. The reason will be explained a couple of paragraphs down the road.
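
The reading of the pairs and the check for the CLR header can be separated into two small helpers, sketched below. The names DataDirectorySketch, Read and HasClrHeader are hypothetical. Unlike our program, the sketch takes the number of entries as a parameter, on the assumption that a file could declare fewer than 16.

```csharp
using System;
using System.IO;

public static class DataDirectorySketch
{
    public const int ClrHeaderIndex = 14;  // the fifteenth entry

    // Reads count (RVA, size) pairs that are stored back to back.
    public static void Read(BinaryReader reader, int count, out int[] rva, out int[] size)
    {
        rva = new int[count];
        size = new int[count];
        for (int i = 0; i < count; i++)
        {
            rva[i] = reader.ReadInt32();
            size[i] = reader.ReadInt32();
        }
    }

    // A .Net executable has a CLR header entry with a non-zero size.
    public static bool HasClrHeader(int[] size)
    {
        return size.Length > ClrHeaderIndex && size[ClrHeaderIndex] != 0;
    }
}
```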

 

The section headers start immediately after the data directories. However, we take no chances and set the Position property of the FileStream class to the value saved in sectionoffset. The Position property is read/write: it not only gives the current position of the imaginary file pointer but also sets it to a new position if need be.

 

The Seek method could have been used again, like before, to jump to a part of the file, but as variety is the spice of life, we set the Position property instead. The world of computer programming lets us skin a cat multiple ways. Only three of the fields of the section headers are important to us, so we create three arrays of ints to store them.

 

The first of these fields is the virtual address or RVA of the section in memory (we remember our promise to explain it); this is followed by the size of the section on disk and finally the location on disk where the section is stored. The size of a section header is 40 bytes. The three fields of our interest start 12 bytes from the start of the header, so using the ReadBytes function, the first 12 bytes are skipped. Then the next three fields are read into the array variables. Since the remaining 16 bytes too have no significance for us, they are skipped as well. We could have used the Seek function to jump over the 28 bytes that we are not interested in. Then again, we decided to use the method that is easiest to explain to you. The data directories and the section headers are now saved in arrays.
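
For comparison, the Seek-based alternative mentioned above might look like the sketch below. The class SectionHeaderSketch and its method are our own names, and the three values are returned in a plain int array only to keep the example short.

```csharp
using System.IO;

public static class SectionHeaderSketch
{
    // Reads VirtualAddress, SizeOfRawData and PointerToRawData from one
    // 40-byte section header, seeking over the fields that are not needed.
    public static int[] ReadThreeFields(BinaryReader reader)
    {
        Stream stream = reader.BaseStream;
        stream.Seek(12, SeekOrigin.Current);   // skip name (8) and virtual size (4)
        int virtualAddress = reader.ReadInt32();
        int sizeOfRawData = reader.ReadInt32();
        int pointerToRawData = reader.ReadInt32();
        stream.Seek(16, SeekOrigin.Current);   // skip the last 16 bytes
        return new int[] { virtualAddress, sizeOfRawData, pointerToRawData };
    }
}
```

After each call the file pointer rests at the start of the next section header, so the method can be called once per section in a loop.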

 

The next function, DisplayPEStructures, finally displays these values on the console. The only stumbling block here is that the output should match that of ildasm and, just to remind you, ildasm displays its output in a formatted manner. What we have is the Shared Source code, which comes with the source code of a disassembler but not the actual code of ildasm. That code, when executed, does not produce output similar to that of ildasm. Thus we had no choice but to spend a lot of time figuring out how many spaces need to be placed at different points in the line.

 

A byte by byte comparison with the output generated by the original ildasm program can surely indicate our follies. Thus we decided to take this approach as otherwise there is no other way of knowing whether the code we have written works or not.  To pursue it further, we wrote our own file compare program to check whether the output generated by our disassembler and that of ildasm is the same, however you have an option of choosing any file compare program to suit your needs.
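
Our own compare program is not listed here, but the heart of any such program is a loop like the hedged sketch below. The names CompareSketch and FirstDifference are ours.

```csharp
using System;

public static class CompareSketch
{
    // Returns the index of the first byte that differs,
    // or -1 if the two arrays are identical.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i])
                return i;
        }
        // One array may simply be longer than the other.
        return a.Length == b.Length ? -1 : common;
    }
}
```

The two arrays could be filled with File.ReadAllBytes, for example from a.txt written by ildasm and from a second file, say b.txt, holding the redirected output of our program; the name b.txt is only an assumption.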

 

After displaying a new line, the version number of the disassembler is displayed. In our case the version is 1.0.3328.4; yours could be different, so please make the appropriate changes. Then the values of 8 variables, viz. subsystem, entrypoint, ImageBase, sectiona, filea, the two stack variables and the number of data directories, are displayed.

 

The spaces after each label have been entered manually for alignment purposes. Numeric variables are by default displayed in decimal, so the ToString function of the int type is called with a format string. There are a myriad of formats that can be put to use. The small x selects the hexadecimal numbering system with the alpha characters displayed in small letters and not caps. The number 8 sets the minimum number of digits to eight and fills up the left with zeroes.
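
A tiny sketch shows the effect of the format string. The wrapper class HexFormatSketch is ours; ToString with the x8 format is the same call that our program makes.

```csharp
using System;

public static class HexFormatSketch
{
    // Formats a number as eight lowercase hexadecimal digits, zero-filled.
    public static string Hex8(int value)
    {
        return value.ToString("x8");
    }
}
```

Hex8(3) gives 00000003 and Hex8(0x227e) gives 0000227e, matching the subsystem and entry point lines of the output. A capital X in the format string would give 0000227E instead.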

 

The first fifteen of the sixteen data directories are displayed using a function DisplayDataDirectory. This function takes the rva and size of the element in the array along with a string that gives the name of the data directory. The prime objective of this function is to format the output and display it in a certain manner.

 

The string sfinal does not have to be initialized to an empty string. However, we do so out of habit, since C# does not permit using an uninitialized local variable on the right-hand side of the equal-to sign or as a return value.

 

Thereafter, using the static Format function from the String class, the rva of the data directory is formatted. The curly braces mark a format item, the same syntax that the WriteLine function uses, and the 0 is the placeholder for the first parameter. The colon following it introduces the format specifier. The small x is for hexadecimal output.

 

The open square bracket [ must start at the thirteenth column, and hence the PadRight function is used to pad the string with spaces to a total width of 12 characters. Then, using the Format function, the size of the data directory is appended and the string is padded again, this time to a total width of 21 characters, to synchronize with the ildasm output. Thereafter, the name of the data directory is appended. The entire line is thus built up in the string sfinal and then given to the WriteLine function to display it in one go. Now for some quirks. For some reason the last data directory is not displayed; the second last is the CLR header.

 

For this data directory, ildasm pads the line with spaces to a total width of 67 characters, whereas for the others the line is padded to a total width of 68 characters. For this purpose, an if statement that checks the name of the data directory is introduced, which decides on the width to which the string is padded before writing it out. To verify that every byte displayed matches the output displayed by ildasm, we had to cater to every space seen also. Thus we had no choice but to spend lots of time getting the spaces right. Now that the first program is over, its output can be compared with that of ildasm to check that it matches to a T.
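
The same line can also be built in a single Format call by giving each format item a width. This is only a sketch of an alternative and not how our program does it; the class DirectoryLineSketch is a name of our own. A negative width left-justifies the value: 9 characters for the rva bring the bracket to the thirteenth column, and 8 characters for the size bring the closing bracket to the twenty-second.

```csharp
using System;

public static class DirectoryLineSketch
{
    // Builds one data directory line in the layout of DisplayDataDirectory.
    public static string Format(int rva, int size, string name)
    {
        string line = String.Format(
            "// {0,-9:x}[{1,-8:x}] address [size] of {2}:", rva, size, name);
        // The CLR header line is one character shorter than the others.
        int width = (name == "CLR Header") ? 67 : 68;
        return line.PadRight(width);
    }
}
```

For example, Format(0x2228, 0x53, "Import Directory") should reproduce the Import Directory line of the output shown earlier, trailing spaces included.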

 

Even though the .Net documentation very clearly specifies that the MS-DOS stub should be exactly 128 bytes long, not all .Net compilers follow the documentation. This documentation also specifies the values that most fields must have.

 

In the standard PE header the Machine field must always be 0x14c. The Date Time field is the number of seconds since 00:00:00 on 1st Jan 1970, and the Pointer to Symbol table and Number of Symbols must always be 0. The final field, Characteristics, has the bits 0x2, 0x4, 0x8 and 0x100 set and the rest 0. The bit 0x2000 is set for a dll and cleared for an exe file.
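
Turning the Date Time field into a readable date takes only a couple of lines, as in the sketch below. The class TimeStampSketch is a name of our own, and we assume that the count of seconds is in UTC. Our program reads the field with ReadInt32 into the variable time, so the sketch takes a signed int and treats it as unsigned.

```csharp
using System;

public static class TimeStampSketch
{
    // Converts the Date Time field into a DateTime in UTC.
    public static DateTime ToUtc(int time)
    {
        DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        // The field is unsigned in the file, hence the cast to uint.
        return epoch.AddSeconds((uint)time);
    }
}
```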

 

The standard fields of the optional header are set as follows. The Magic number is 0x10b. The Major and Minor linker version numbers are 6 and 0. The Code and Data sizes have the same meanings as explained earlier. The entry point RVA must point to the bytes 0xff 0x25 followed by a RVA of 0x4000000, or be 0 for a DLL. The section that it falls in must have the attributes execute and read. The Base of Code is 0x00400000 and 0 for a DLL and the Base of Data is the data section.

 

Every exe file has a starting memory location that contains the first executable instruction which is called the entry point. Windows 98 for example does not understand a native .Net executable and hence it is called a non-CLI platform. The words CLI will be repeated a trillion times and its full form is Common Language Infrastructure.

 

For an exe file, the first function to be called is _CorExeMain and for a dll it is _CorDllMain, the code of which resides in the library mscoree.dll. It is this function that understands a .Net executable, and we believe that in future it will reside in the operating system. It is this function that understands concepts like IL and metadata, which we will explain in due course.

 

The Windows-specific fields have the following values. The image base, as mentioned earlier, is 0x400000; the section and file alignment are 0x2000 and 0x200 respectively. The OS Major version is 4 and Minor version is 0. The User Major and Minor versions are 0. The Sub-System Major version is 4 and Minor version 0. The Reserved field is always 0. The Image Size is the size in bytes of the image including all headers and padding, and it has to be a multiple of the Section Alignment. The Header Size is the combined size of the DOS header, the PE header and the optional PE header together with the section headers. This also includes padding and must be a multiple of the File Alignment value. The Checksum and DLL flags must be zero and the Subsystem can take a value of 2 or 3 only. The Stack reserve has a value of 1 MB and the stack commit is 4 K. The heap Reserve and Commit have the same values also. The Loader flags are 0 and the Number of Data Directories is 16.

 

Most of the data directories have both an RVA and a size of 0. These are the Export, Resource, Exception, Certificate, Debug, Copyright, Global Ptr, TLS Table, Load Config, Bound Import and Delay Import tables and the last one, which is reserved. The four directories that may have some size are the Import, Base Relocation, IAT and finally the CLI Header. In practice the Resource directory can also be non-zero, as the output above shows.

 

The section headers immediately follow the optional header, since there is no entry in the PE headers that points to the section headers. The name of the section is what each section header starts with and it is 8 bytes long. There is no terminating null when the length of the section name is exactly 8 characters. Normally section names start with a dot; for example, the section containing code is called .text and that containing data is called .data. The second field is called Virtual Size. It stores the size of the section when the section is loaded in memory. The fourth field is the SizeOfRawData. If the Virtual Size is greater than the SizeOfRawData, the rest of the section is zero padded.
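
Since an 8-character name carries no terminating null, the name cannot be read as an ordinary null-terminated string. One way of decoding it is sketched below; the class SectionNameSketch is ours, and we assume that the name holds plain ASCII characters.

```csharp
using System.Text;

public static class SectionNameSketch
{
    // Decodes the 8-byte name field, dropping any trailing zero bytes.
    public static string Decode(byte[] eightBytes)
    {
        return Encoding.ASCII.GetString(eightBytes, 0, 8).TrimEnd('\0');
    }
}
```

The 8 bytes themselves can be fetched with ReadBytes(8) at the start of each section header.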

 

The third field, VirtualAddress, is an RVA and thus relative to the image base. It determines where the section is loaded in memory. The size of Raw Data is the fourth field and it is the size of the initialized data on disk, thus a multiple of the file alignment. As this field is rounded up to the file alignment while the Virtual Size is not rounded, it can be greater than the Virtual Size field. If the section contains only uninitialized data then the value stored in this field is 0. The PointerToRawData field is the file offset of the first page of the section within the PE file and thus is a multiple of the File Alignment.

 

The next field is the Pointer to Relocations, which is the rva of the relocation section or .reloc. The Pointer to Line Numbers that follows is zero and the Number of Relocations is the actual count of relocations. The second last field is the Number of Line Numbers, which is obviously zero. Finally there is the characteristics field, which holds a combination of six possible attributes of the section. These attributes state whether the section carries executable code, initialized data or uninitialized data, and whether it can be executed, read or written.

 

To stress test our disassembler, we have looked at other languages al