Article written by David Sánchez Lavado
This post explains how to analyze the malicious code used in current Exploit Kits.
There are many ways to analyze this type of code, and you can find tools that do most of the job automatically. However, as researchers who like to understand how things work, we are going to analyze it with no other tools than a text editor and a Web browser.
IMPORTANT: I recommend that you perform this type of analysis on a virtual machine on its own isolated network in a laboratory dedicated exclusively to this type of research to avoid unwanted infection.
The final code is normally divided into two parts. The first one aims at detecting the Web browser version and the plug-ins installed on the victim’s computer (like Adobe Reader, Apple Quicktime or the Java virtual machine). The second part selects the vulnerability to exploit according to the information gathered in the first part.
The image below is a screenshot of the malicious code to be analyzed in this article.
As you can see, the code is made up of several HTML objects. However, if you look closer you can actually identify different things in these objects: First: The value of the id attribute for each of these objects has the format “<number>+CytobimusubekUda”, where “<number>” is a number from 0 to 1230 in consecutive order. Second: The value of each object is an apparently meaningless string of characters of approximately the same length, and the word Add repeated several times inside it.
All this seems to indicate that the id attribute is used as an index (look at the consecutive numbers) in a cycle to parse all HTML objects and deobfuscate their contents to create a new code layer. Let’s start analyzing the code.
FORMATTING THE CODE
You could also do this manually, line by line, but you risk making a mistake and it will take you too long. For example, the malicious code that we will analyze here contains almost 600 lines of script code and HTML code.
Malzilla is an excellent utility to analyze malicious code automatically. However, in this article we intend to analyze this malware strain manually.
- String search-and-replace: This will help you avoid mistakes when replacing the names of functions and variables.
- Windows Tabs: This is optional. Tabs will let you work very quickly when analyzing the code of various files.
FINDING THE ‘START’ FUNCTION
The first steps to take with every function are the following:
- Simplify the code to analyze
- Rename the functions and variables for the code to be easier to understand.
The screenshot below shows that code between lines 81 and 89 (both included). You can also see that the HazakeduhaQurenepenus() function (85) is the first one to run (the previous three don’t perform any important actions). Therefore, this is the first function that you must analyze.
SIMPLIFYING THE CODE AND MAKING IT EASIER TO UNDERSTAND
VERY IMPORTANT: When modifying the code, don’t change the final result that would be returned by the original code.
As previously said, start with the HazakeduhaQurenepenus() function. This function looks like this:
In the code above, the factor to resolve is the PypiwIgo() function that has the following code:
You may have noticed that I have changed the name of several variables and of the analyzed function itself to func_decrypt_01. This may seem a little bit bold, but after having analyzed many functions like this you become capable of recognizing certain code structures at a glance.
Your next objective is to resolve the value to be returned by the function in the buffer variable. To do that, you must separate the function from the original code and run it independently. Prior to that, you must make sure that the function to analyze will not need any external values or any other piece of data calculated by any other function of the assigned code in any global variable. Otherwise, you will have to first calculate that value and then replace it in the code to isolate. This is very important as otherwise you will probably not be able to run the code separately: the Web browser will show an error when loading the page and it will not be possible to run the code or it simply won’t behave in the same way as if it had been run with the entire malicious code.
Let’s see this with an example in the code we are analyzing. The following instruction refers to an external value in the DasuRokyduconiwidy HTML object.
string_01 = document.getElementById(“DasuRokyduconiwidy”).innerHTML;
The resulting value is assigned to the string_01 variable. Since this variable is used inside the code, you must resolve its value. Otherwise, if the variable was only used to confuse the user, you could eliminate it from the code.
The first thing you need to do is find the DasuRokyduconiwidy object. Once you find it, assign its value to the string_01 variable in the script code that you have created, and replace the return buffer instruction with a TEXTAREA object that will show the content of the buffer variable once the new code is run in the Web browser.
The screenshot below shows the simplified code and how the “return buffer” instruction has been replaced with a textarea object created at runtime.
Once you have the code, open it with the Web browser to see the function result.
Now that you have “resolved” the different parts of the function code, make the code more readable. This involves resolving the names of all the functions in brackets from inside out.
|Original function||Resolved function|
|[func_substr()](1736/56,585/65)||[func_substr()][31,9] → .substr(31,9)|
The result is the following code:
As you resolve more and more functions you will be able to discover the actions to be taken by the rest of them simply by taking a glance at their code. This is because you’ll have already resolved many unknown values. This will help you analyze other functions more quickly and eliminate obfuscation layers more easily.
Finally, let’s analyze the MivoJaqugutec() function:
At first glance, the first thing that you can identify in the code is a cycle that runs through all of the HTML objects, storing their values and concatenating them in the PofUhicehofudilysuwe variable returned by the function once the cycle ends. Well, with everything you have learnt so far you probably know what to do. Separate the function from the original code, resolve the unknown values and rename its variables for the code to be easier to understand. Your objective should be to determine the value of the PofUhicehofudilysuwe variable in the return instruction.
Once you run the code on the Web browser you’ll get the following result:
These 2 functions normally indicate the execution of a new obfuscation layer. Have you reached your final objective yet? Is this the final layer responsible for triggering the vulnerability? Well, let’s see what it contains.
ACCESSING THE FINAL CODE
The last 2 lines of code include the payload variable, which refers to an encoded, 55,496-character-long unicode string. After running its content with the eval( unescape(payload) ) instruction you’ll get to the last layer in the malicious code.
In this last part of the article we will only analyze the generic parts often found in malicious codes.
The following two screenshots show a series of instructions that are often used both in legitimate and malicious code, although with very different purposes. Whereas they are used in legitimate code for design purposes, in malicious code they are used to obtain information about the victim’s environment and exploit the most appropriate vulnerability.
As you can see in the two screenshots above, the programmer has used the userAgent method of the navigator object to identify the Web browser used by the victim. In the case of Internet Explorer they check to see if the version is lower than 6.
They also try to identify if there are any plug-ins installed on the browser.
In this code the programmer has decided to create an object identified by the CLSID CA8A9780-280D-11CF-A24D-444553540000 in the Pdf1 variable. Although the name of the variable gives a hint as to what object the programmer wants to create, let’s make sure. Use the regedit.exe tool to find the CLSID key in the Windows registry.
Our suppositions were true: The CLSID key refers to the Adobe Acrobat/Reader ActiveX control. The programmer has created this object to find out if the victim has Adobe Acrobat or Adobe Reader installed (and what version they are using), and select the malicious PDF file that can exploit one of the vulnerabilities in the detected version.
They use the GetVersions() method to find out the version of the Adobe program installed on the victim’s computer, as seen in the first instruction in the code below:
The last part of the code is used to select the most appropriate PDF file to exploit the vulnerability. If the value of the lv variable is greater than or equal to 800 (which possibly identifies version 8), the code will call the fghjdfgxbz function passing the string “d0456d.pdf” as a parameter. Otherwise, it will pass the “07dd5d.pdf” string as a parameter. The fghjdfgxbz function simply creates an IFRAME object at runtime that points to the value passed as the parameter. As a result, the Web browser will open a malicious PDF file designed to exploit an unpatched security vulnerability.