Friday, February 21, 2014

Analyzing Malicious Javascript

Once you have downloaded the malicious web pages with a tool such as wget or curl, you can analyze those pages for malicious JavaScript.

The main purpose of this post is to understand, analyze, and deobfuscate obfuscated malicious JavaScript.

Fig. 1

 As shown in Fig. 1, a malicious script may include a set of deobfuscating subroutines, implemented and visible as plain JavaScript. The original scripts are concealed, either obfuscated or encrypted, as data in one or more variables. Once the deobfuscating subroutines have decoded the original scripts, they pass control to them. This is typically accomplished by calling eval(), document.write(), or appendChild() on a DOM node.
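The structure described above can be sketched with a harmless stand-in. This is a minimal illustration, not taken from a real sample: the names concealed and decode, the shift-by-one encoding, and the payload "6 * 7" are all assumptions chosen for the example.

```javascript
// Benign stand-in for the concealed script: the text "6 * 7" with every
// character code shifted up by one (hypothetical encoding).
var concealed = "7!+!8";

// Deobfuscating subroutine, visible as plain JavaScript.
function decode(data) {
  var out = "";
  for (var i = 0; i < data.length; i++) {
    out += String.fromCharCode(data.charCodeAt(i) - 1);
  }
  return out;
}

// The decoder recovers the original script text ...
var decoded = decode(concealed);

// ... and passes control to it. This eval() call is the point an analyst
// wants to intercept, because its argument is the decoded script.
var result = eval(decoded);
```

Real decoders are far more convoluted, but the shape is usually the same: data in a variable, a decoding routine, and a final call that executes the result.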

From the attacker's perspective, the harder the decoder script is to analyze, the more effective the obfuscation technique. Attackers make the analysis harder by using the following techniques:

  1. Use DOM members such as document.write and document.getElementById together with arguments.callee to make rhino-debugger useless.
  2. Use eval to evaluate both the decryption key and the malicious script.
  3. Use a browser attribute as the key. For example: key = navigator.userAgent.toLowerCase()
  4. Use the page location as the key. For example: decryption_key = document.location.href
  5. Use browser properties to generate the decryption key. For example: xyz=document.lastModified; sdfxdf=new Date(xyz).toString(); fsdfdedd=sdfxdf.split(--); key=fsdfdedd[2]
  6. Use encoding schemes inside the decoder script. For example, A can be written as \101 (octal) = \x41 = %41 = %u0041 = \u0041
  7. Exploit the browser's parsing behavior, for example 8-bit ASCII encoding. An 8-bit encoded decoder cannot be read by a human analyst, but the browser decodes and executes the file correctly.
  8. There are many more ways to make the decoder script very hard to analyze.
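To make the encoding technique from the list concrete, the short check below shows several spellings that all produce the same character. It is only a sketch: the variable names are assumptions, and unescape() is the legacy function that decodes %XX and %uXXXX sequences.

```javascript
// Five spellings of the same character "A".
var spellings = [
  "\x41",                  // hexadecimal escape
  "\u0041",                // Unicode escape
  unescape("%41"),         // percent encoding
  unescape("%u0041"),      // non-standard %u encoding
  String.fromCharCode(65)  // decimal character code
];

// Confirm that every spelling decodes to the same one-character string.
var all_equal_A = true;
for (var i = 0; i < spellings.length; i++) {
  if (spellings[i] !== "A") {
    all_equal_A = false;
  }
}
```

A decoder that mixes these forms looks like noise to a human reader, yet the interpreter treats every one of them as the plain character.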


Important point to remember:
 We need to focus on how the decoder passes control to the decoded script. Three such methods, found in the decoder function, are eval(), document.write(), and appendChild(). The arguments passed to eval() and document.write() contain the decoded data, which the browser can execute. Just as a packed executable can be unpacked with a debugger such as OllyDbg, we can use rhino-debugger to unpack the malicious JavaScript and read the contents of the variable by setting a breakpoint on the call to eval() or document.write().

But rhino-debugger has limitations: a breakpoint can only be set on a whole line, not in the middle of one. You can insert line breaks into the decoding JavaScript to create breakpoint locations, but sometimes that is not possible because the decoder uses arguments.callee or a checksum as an anti-analysis technique.


If the decoder function uses arguments.callee and does not contain the \n character, you can still analyze the obfuscated JavaScript code using spidermonkey or rhino.
The steps to decode using spidermonkey or rhino are:
1. Define eval=print and document.write in a separate file.
2. Load that file together with the malicious JavaScript in spidermonkey or rhino.

1. You can define eval=print and document.write as follows:
eval=function(input_string){print(input_string);};
document={write:print};
Save the above lines in a file with any name, for example rule_set.js.

2. $ rhino -f rule_set.js -f malicious_javascript.js > out.txt

Note:
You could simply add eval=print; to the malicious JavaScript file itself. The reason I ask you to define it in a separate file is that if the decoder uses arguments.callee, adding eval=print; changes the length of the file, and the decoder will then no longer decode the malicious script properly.

Important point to remember:
Various anti-analysis techniques can be used to make this analysis useless.
 1. The attacker can use DOM objects in the script decoder. rhino and spidermonkey are JavaScript interpreters, not browsers, so they do not provide DOM objects.
For example, the attacker can call document.getElementById() inside the decoder function, and rhino or spidermonkey will throw an error when it reaches that call. To decode the malicious JavaScript, you need to either replace the DOM references with the content they would return or define the objects in the rule_set.js file as mentioned above.

2. The attacker can use values such as document.location.href or navigator.userAgent to decode the string. In the document.location.href case, you have to find out the exact value of document.location.href and define it in the rule_set.js file as follows (after the document definition shown earlier):
location={href:"http://www.mysite.com/page"};
window={location:location};
document.location=location;

3. The attacker can use eval in the script decoder to evaluate the decoding key. If we define eval=print, the key is printed instead of evaluated, so the decoder cannot use it. In that case we need to read through the decoding function to decode the malicious script properly.

4. The attacker can use various encoding schemes in the script decoder to make analysis more difficult.

5. There are more such techniques, which I will discuss in the next part.

Note:
Other tools, such as Jsunpack-n, can make this analysis easier.
