In a previous article, I started telling you about viruses and high level programming languages. The concept behind the first high level programming language computer virus I wrote is so simple that, like they used to tell us in school about many graph properties, you almost feel you could have discovered it yourself. But before talking about it, a note to our impatient readers: if you are looking for free source code rather than trying to understand the concept, then you are missing a great deal. An intelligent person will always challenge themselves to solve problems rather than take solutions for granted. You will find that the pointers in this article are sufficient for you to implement your own virus, albeit with a bit of work. But even those readers who don’t plan to write a virus per se might want to read ahead, simply to be able to conclude that they know, after all, how to write a virus.
All parasitic infectors revolve around locating the viral code and applying it to more and more programs. We need the code of the virus to realize infection. Ideally, one will retrieve the code from memory. As an alternative, one can access the file that started the program, which also contains all the code. For the Pascal virus, I chose the latter: accessing the code of the virus involves reading a command-line argument that supplies the path of the file that started the current program. Because the virus is running, the currently executing program must be infected, so unless the path has been tampered with (which was possible in MS-DOS), its starting file contains the code of the virus. We can then read the virus into memory very easily, by reading the first N bytes of the file, N being known beforehand or easy to calculate. How and why this works will soon become clear.
Now let’s go about locating hosts. This is standard filesystem access code. Find directories, find files within directories, recursively traverse directories. It should be easy enough for a high-school-level student.
For each host, the virus should check for previous infections. Markers provide a mechanism for achieving this objective. A marker is like saying I have a tattoo on my leg which reads “I love you, Josephine” and I look into the host for a copy of my leg. If I see “I love you, Josephine” written someplace, I may have found a copy of my leg. Basically, during infection, the virus “signs” its victim in a manner that does not alter the executable, so that it can recognize the file the next time it tries to infect it. Of course, the marker method is not 100% bulletproof. However, it is generally considered better to miss a few files because of a false marker than to keep infecting the same files indefinitely.
Some antivirus techniques consist of attacking the marker mechanism by injecting fake markers into programs. If and when a virus that uses the marker executes on the computer, perhaps through an infected external disk, it will not infect executables on the computer. The technique is better known as vaccination, but it is brittle and not currently used.
Instead of markers, I devised a rather generic way of finding with fair accuracy whether a host is infected by a virus. The virus checks whether the host is constructed in a way consistent with the infection process. This insulated the development process from issues involving multiple variations of the virus competing to infect. The virus was smart enough to identify the presence of the virus or any variation thereof, because the infection method was identical across variations. And during development, variations were in abundance.
The infection step itself builds on a very simple fact: You may append any kind of data to an executable file, without corrupting it. It’ll work. I can write “Hail Mary!” or “Enter Sandman” or “Mike was here” — it’ll still work. That’s just the way it is. [The hand edit will not work if the editor is not binary aware, and corrupts your file when saving, which was the case with Turbo Pascal’s editor.] I believe I realized this simple fact due to clear understanding of the loading mechanism of the MS-DOS executable by the operating system. It’s all in the specifications!
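The reason appended bytes are harmless can be read straight off the MZ header: the loader only brings in as many bytes as the header declares, so anything past that point is never looked at. The following is a minimal sketch in Python (not the language of the original work) that computes the header-declared size and reports how many trailing bytes a file carries. The function names `mz_declared_size` and `trailing_bytes` are hypothetical, chosen for illustration, and the sketch assumes a plain MS-DOS MZ executable.

```python
import struct


def mz_declared_size(data: bytes) -> int:
    """Return how many bytes of the file the MZ header claims as the program.

    The 16-bit word at offset 2 holds the number of bytes used in the last
    512-byte page, and the word at offset 4 holds the total page count.
    """
    if len(data) < 6 or data[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    last_page_bytes, pages = struct.unpack_from("<HH", data, 2)
    if pages == 0:
        raise ValueError("header declares zero pages")
    if last_page_bytes == 0:
        # A value of 0 means the last page is completely full.
        return pages * 512
    return (pages - 1) * 512 + last_page_bytes


def trailing_bytes(data: bytes) -> int:
    """Return the number of bytes that lie beyond the header-declared size."""
    return max(0, len(data) - mz_declared_size(data))


if __name__ == "__main__":
    # Synthetic 612-byte "executable": 2 pages, 100 bytes used in the last one.
    header = b"MZ" + struct.pack("<HH", 100, 2)
    image = header + bytes(612 - len(header))
    print(trailing_bytes(image))                     # nothing appended
    print(trailing_bytes(image + b"Mike was here"))  # length of the note
```

Tools that inspect executables use the same arithmetic to flag data sitting after the declared image, which is also why unexpected trailing data is a useful thing for a defender to look at.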
This simple fact is the basis of the infection. If you’re tempted to think ahead and figure we’re going to append the virus and then patch the EXE header, think again. There’s a cunning change of perspective that will lead us to a much simpler infection procedure. The virus can write itself in the place of the host and become the executable. Then the host itself can be appended to the virus! As the implementation proved, it worked like a charm. There were small complications, given that the virus had to be smart enough to infect programs of various sizes, but nothing out of the ordinary.
First, writing a large host all over again could be costly. Instead of appending the host to the virus, we would do the following, less expensive, albeit more complicated, operation.
- Read the first N bytes from the host, N being the length of the virus.
- Write the bytes at the end of the host, appending to file.
- Write the virus at the beginning, overwriting the first N bytes of the host, which are now present at the end of the file anyway.
Second, we had to be able to manage hosts that are smaller than the virus, for which the append step needs to be handled with some care. Basically, if the host is shorter than N bytes, we would read the entire host, then write it starting at the Nth byte in the file. This ensures that the virus does not overwrite any host data when writing a copy of itself.
All the babble about appending and writing was meant to allow execution of the host, by keeping all information about the host intact. Had we overwritten part of the host without first copying it someplace, we would have lost information about the host and therefore we would have been unable to execute it. That would be terrible, for obvious reasons. Most importantly, it would alert the user to the presence of the virus, as the host is no longer available! “Ahem, all of a sudden Norton Commander stopped working. I smell a rat.” If they’re lucky, they’d reinstall the system and continue. In the unlucky scenario, the user will actually lose the program and perhaps lose important data. This approach would not allow the virus to spread quietly.
So just how do we do it? Looking back at the infect step, notice how the complete host information existed in the infected file, albeit turned and twisted. And it did not matter, as long as that data at the end of the virus was accessible. I could have encrypted it, or scrambled it even further, and it did not matter. What really mattered is that there was a way to reconstruct the original host and execute it. For this step, the virus reasoned as follows:
Because I (the virus) am currently running, it must be the case that the user has asked for an infected program to execute. Now that I have quickly infected some more files on this system, I will find the file containing this program — and this is simple, as the file is the same file we found in the Locate step — and I know for a fact that this file has been infected as outlined in the Infect step. I am going to read the last N bytes of this file (where N is the length of the virus) and write them to a temporary file. I will then read all the bytes after the virus but before these last N bytes and append them to the temporary file. After these steps, the temporary file contains the original, non-infected version of the host. All that remains now is to execute the program contained in this file using the standard system function “execute”!
Again, there is a small variation when the host is smaller than N bytes. As a bonus, you get a clean host by the time the host executes. If the host has built-in self checks that validate the integrity of the executable file, then the checks will fail to find the virus!
As a small variation, I could de-infect the host instead of creating a potentially expensive copy, which would have the added benefit of having the host run from its original file.
Voilà, that was all. Nice and simple. Later on, I applied the idea to a UNIX ELF infector, as the same simple observations apply to these executable files. They are fairly universal (AFAIK, with class loader permissions you could even hijack JAR files this way). I even applied the idea without a virus to hijack the password changing program of the high school’s operating system. The hijacker program would run the original program, which had been appended to it, but would give me a shell if I provided specific command-line arguments.
I later presented the ELF infector, written in C, to a bug tracking list called BUGTRAQ. The list moderator rejected the message, stating that “the list does not deal with viruses”, only to see a different person, months later, posting a virus based on the same concept! I strongly believe that my virus idea was being used, possibly by a friend of the moderator. Last but not least, the net was subsequently invaded by Pascal viruses employing variations of the same infection methodology. Happily, nowadays the internet is powerful enough that plagiarism is more difficult to achieve.