The Java virus confirmed that my craft is still in working order. This entry in the series is a bit more technical than the introduction: you will need some proficiency with Java bytecode and the class file format. The format is documented by Sun Microsystems in the Java Virtual Machine Specification, available as an online book, in the chapter “The class File Format”. It proved helpful to me throughout the process.
To write my computer virus, I had to tackle a few major issues:
First, the virus would need to retrieve a copy of itself if it intended to achieve infection. However, since direct access to the memory of the virtual machine is prohibited, I would not be able to obtain the code of the virus from memory. Moreover, there was no specification for the encoding of the viral data in memory, as this detail is left to each implementer of the virtual machine. Direct memory access was therefore out of the question. To reach the goal, I would need to resort to Class.getResourceAsStream and ClassLoader.getResourceAsStream, available since JDK 1.1, and be thankful that they can load class files.
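As a minimal sketch of that API in isolation, reading the raw bytes of a class file through Class.getResourceAsStream might look like the following. The names ClassBytes and read are made up for illustration and are not taken from my code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassBytes {
    /** Returns the bytes of the class file that defines the given class. */
    static byte[] read(Class<?> type) throws IOException {
        // A leading '/' makes the resource name absolute instead of package-relative.
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

The returned array starts with the class file magic number 0xCAFEBABE, which is an easy sanity check that the stream really delivered a class file.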
A second issue to address would be the discovery of class files to infect. I could try to locate hard disks and recursively traverse their directory structure, or I could resort to the classpath. The classpath is a list of locations from which the virtual machine loads classes. Quite a few classes are available on the classpath at all times, but the contents of the classpath itself are not directly available. Thankfully, since JDK 1.1, classes can be grouped into Java Archive (JAR) files, which most of the time contain a manifest file with a well-known path, /META-INF/MANIFEST.MF. This fact provided a powerful discovery tool: finding the URLs of all manifests in turn reveals most of the classpath entries. Once the classpath entries are known, it is easy to locate and infect classes within JAR files or filesystem directories, which makes for a fairly large population.
A third issue to address was the infection of a class file itself. This would be the largest problem and would require me to write a tool. In order to infect, the virus had to place its code inside the class and ensure its execution, without disturbing access to the various parts of the class file. In Java, classes are code and data repositories that can be accessed by other classes at a fairly fine granularity. Contrast this with DOS EXE files or ELF executables, which are aimed at executing code, not sharing it. (Things have changed a bit with embedded resources in Windows Portable Executables, and not all ELF files were agnostic of resource sharing, but in Java this feature is all over the place.)
The first problem with infection was copying the collection of constants needed by the virus. Constants are accessed by index, and I would not be able to use the same indices for my constants as I copied them from host to host. It became clear that I would need to relocate the virus code so that it could access its constants wherever they ended up in the host’s constant pool. I would need to locate all references into the constant pool and change them for each infection. These references are two-byte indexes most of the time, with one small exception.
The small exception would prove a showstopper for this first approach, as it implied that a major task had to be undertaken before I could proceed. The problem is that some constant pool indexes can be represented in one byte, and references from within the code to these indexes can be made by a special instruction. This had a profound implication, because relocation should be able to move the constant at index 1 to index 1000 (valid constant pool indexes start at 1). A bytecode instruction that uses a one-byte index makes it impossible to write 1000 in place of 1: 1000 just doesn’t fit in one byte.
Those familiar with the bytecode instructions will realize I’m referring to the two forms of the load constant instruction, LDC and LDC_W. To refer to a constant in the pool, you can use LDC_W and supply the two-byte index of the constant, but if the index is below 256, you can use the shorter LDC instead, which is followed by a single-byte index. The two are functionally equivalent, and you can write a perfectly valid class file that uses only the wide version, but the compiler will use LDC as often as possible in order to reduce the size of the file.
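To make the difference concrete, here is a small sketch of how the two forms are encoded. The opcode values 0x12 (LDC) and 0x13 (LDC_W) come from the VM specification; the class LdcForms, the method encode and the forceWide flag are hypothetical names used only for this illustration.

```java
final class LdcForms {
    static final int LDC = 0x12;   // opcode followed by a one-byte pool index
    static final int LDC_W = 0x13; // opcode followed by a two-byte pool index

    /**
     * Encodes a load-constant instruction for the given constant pool index.
     * With forceWide set, the wide form is used even when the index would fit in one byte.
     */
    static byte[] encode(int poolIndex, boolean forceWide) {
        if (poolIndex < 1 || poolIndex > 0xFFFF) {
            throw new IllegalArgumentException("invalid constant pool index: " + poolIndex);
        }
        if (!forceWide && poolIndex <= 0xFF) {
            return new byte[] { (byte) LDC, (byte) poolIndex };
        }
        // The index is stored big-endian: high byte first, then low byte.
        return new byte[] { (byte) LDC_W, (byte) (poolIndex >>> 8), (byte) poolIndex };
    }
}
```

For index 5 the short form is the two bytes 12 05 and the wide form is the three bytes 13 00 05. For index 1000 only the wide form exists: 13 03 E8.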
Transforming these one-byte-index instructions into two-byte-index instructions at infection time would be a terrible burden for the virus. Widening an instruction by one byte shifts every instruction after it, so the virus would also have to relocate references within the code of a method. Not a good idea. Instead, I decided to guarantee that the virus contains only the wide version, LDC_W, of the load constant instruction. I would need a class file reverse engineering tool that reads the file and replaces all occurrences of LDC with LDC_W, so I wrote one. Apache BCEL might have done the trick, although it would have been much harder to control the use of LDC_W. Behind the scenes, BCEL will automatically attempt to apply the same optimization, so you would either have to use a modified version of it, or prepend dummy constants to your class file so that all useful constants get pushed above the 255 threshold. I decided to do without any of these complications. My tool, though simpler, serves its purpose (and more, as you’ll see). I will make the tool available on my site and will post a link to it soon.
Once the tool was written, the workflow was simple: compile the virus, load the virus class file, use the tool to convert all constant references to the wide form, and run the modified virus class, which no longer had to worry about whether relocated index values would fit.
In order to relocate, I would need intimate knowledge of the structure of the code. I would obviously need to be able to read entire methods, and therefore I would need to know the length of each instruction and which instructions refer to the constant pool. I would also need to process exception tables, as they too contain references into the constant pool. And lastly, I would need to relocate the constant pool itself, because some constants refer to other constants within the pool. (Take CONSTANT_Fieldref_info and its two cousins, for example.) Not too easy, but not too difficult either.
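To illustrate the last point, constants referring to other constants, here is a rough sketch of a read-only pass over a constant pool that records which entries point at which other entries. The tag numbers and entry layouts follow the class file format as I understand it from the specification (including tags added in later Java versions); the class PoolRefs and the method scan are hypothetical names, and this is a sketch for reading the format, not my actual tool.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class PoolRefs {
    /** For each pool slot, returns the pool indexes that entry refers to (empty if none). */
    static List<int[]> scan(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort();             // minor version
        in.readUnsignedShort();             // major version
        int count = in.readUnsignedShort(); // valid indexes are 1 .. count - 1
        List<int[]> refs = new ArrayList<>();
        for (int i = 0; i < count; i++) refs.add(new int[0]);
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  // Utf8: u2 length, then that many bytes
                    in.skipBytes(in.readUnsignedShort()); break;
                case 3: case 4:  // Integer, Float
                    in.skipBytes(4); break;
                case 5: case 6:  // Long, Double occupy two pool slots
                    in.skipBytes(8); i++; break;
                case 7: case 8: case 16: case 19: case 20:
                    // Class, String, MethodType, Module, Package: one index
                    refs.set(i, new int[] { in.readUnsignedShort() }); break;
                case 9: case 10: case 11: case 12:
                    // Fieldref, Methodref, InterfaceMethodref, NameAndType: two indexes
                    refs.set(i, new int[] { in.readUnsignedShort(), in.readUnsignedShort() }); break;
                case 15: // MethodHandle: u1 kind, then one index
                    in.readUnsignedByte();
                    refs.set(i, new int[] { in.readUnsignedShort() }); break;
                case 17: case 18: // Dynamic, InvokeDynamic: bootstrap slot, then one index
                    in.readUnsignedShort();
                    refs.set(i, new int[] { in.readUnsignedShort() }); break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return refs;
    }
}
```

Every index collected this way is one that would have to change if the entry it points to moved to a different slot, which is why the pool cannot simply be copied byte for byte.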
One special aspect of relocation is the reference to the “this class” constant. This reference should not be relocated, but replaced upfront with references to the “this class” constant of the host. This complication is necessary to maintain semantics when the virus runs in the context of a new host. It is the new host’s class that the virus is part of, hence whatever the virus originally had to do with its class, it will have to do with its new class.
Once relocation is implemented, infection becomes a bit clearer:
- Read the host constant pool
- Read the host fields
- Read the host methods
- Write the host constant pool
- Write the viral constant pool, relocating constant pool references
- Write the host fields
- Write the viral fields, relocating constant pool references
- Write the host methods
- Write the viral methods, relocating constant pool references
The structure of the class file is such that the host is still functional after applying these changes.
I will continue with the rundown of the problems faced and the solutions I implemented in the next article of this series.