Automatically Detecting Evasive Malware

Malware evolves continuously. Over the years, simple viruses have become polymorphic, autonomous self-replicating code has started connecting to a master host to form botnets, and JavaScript has been used to launch increasingly sophisticated attacks against browsers. This last attack vector has grown especially popular as drive-by-download exploits have become commoditized and are now routinely used to compromise hundreds of thousands of computers.

One of the major challenges in detecting malicious JavaScript is the dynamic nature of the language itself.  Data in JavaScript can be turned into code by calling the eval() function on a string. This string can be heavily obfuscated in order to prevent signature-based systems from detecting an exploit. Therefore, the only way to reliably detect the attack is to execute the JavaScript code and observe its behavior.
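To make the limitation of signature matching concrete, here is a minimal Python sketch. The signature string, the payload, and the helper names are all made up for illustration, and the decoder handles only one trivial encoding (String.fromCharCode), so it is not a substitute for actually executing the script.

```python
import re

# Hypothetical signature a static scanner might look for.
SIGNATURE = "ActiveXObject"

# Made-up payload, used only to build an obfuscated sample.
payload = 'new ActiveXObject("Example.Control")'


def char_codes(text):
    """Encode text as the comma-separated character codes JavaScript would use."""
    return ",".join(str(ord(c)) for c in text)


# The JavaScript an attacker might serve: the payload only exists as numbers
# until eval() turns the decoded string into code.
obfuscated = "eval(String.fromCharCode(" + char_codes(payload) + "))"


def signature_match(source):
    """Naive signature-based check: plain substring search."""
    return SIGNATURE in source


def decode_from_char_code(source):
    """Recover the string a single String.fromCharCode(...) call would build."""
    match = re.search(r"String\.fromCharCode\(([\d,]+)\)", source)
    if match is None:
        return ""
    return "".join(chr(int(n)) for n in match.group(1).split(","))


print(signature_match(obfuscated))                         # False
print(signature_match(decode_from_char_code(obfuscated)))  # True
```

The signature only becomes visible once the string is decoded. Real obfuscation is typically layered and custom, so a static decoder like this one quickly falls behind, which is why running the code is the more reliable option.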

Executing code and observing its behavior is the job of sandboxing technologies, often called honeyclients. These tools load the web page to be analyzed, execute the associated JavaScript code, and observe the resulting actions, looking for evidence of an attack in progress. Honeyclients are effective at detecting web-based malware, but they are not perfect, and cyber-criminals are catching up fast. The bad guys have taken note of how these systems detect web-based attacks and are using sophisticated techniques to evade detection. The goal of an evasion attack is to devise an exploit that works reliably when launched against a real victim but fails to expose its nefarious intent when executed in a sandbox or honeyclient.

These highly evasive attacks are often “evolutionary” with respect to the initial exploits. That is, the evasive attacks are variations of attacks that were once successful and then started losing effectiveness because honeyclients were detecting them. The exploit writer therefore starts to “tweak” the exploit, adding evasive features until the JavaScript is, once again, undetected by existing solutions. So what can be said about these “evolved” attacks?

Quite a bit, according to recent research in this field which, for the first time, provides techniques for the automated detection of evasive web-based malware. This research was published in 2013 in the Proceedings of the USENIX Security Symposium, one of the top venues for the dissemination of highly innovative scientific results. The paper is titled “Revolver: An Automated Approach to the Detection of Evasive Web-based Malware” and was authored by our group, composed of researchers from the University of California in Santa Barbara and Lastline, Inc.

[Figure: High-level overview of Revolver]

A high-level overview of Revolver is presented in the figure above. Revolver analyzes JavaScript code that has been executed by a honeyclient (that is, after it has been de-obfuscated) and extracts an abstract representation of the code structure (in tech-speak this is called an Abstract Syntax Tree, or AST). The ASTs (i.e., the code fragments) are marked as benign or malicious according to the current capability of the detection system (which is called an oracle). This means that some malicious code might be marked as benign because it successfully evaded the system.
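The value of an AST-level representation can be sketched in a few lines of Python. This is only an analogy: Python's built-in ast module parses Python, not JavaScript, and is used here as a stand-in because it ships with the language. The helper name node_types and the sample fragments are made up for illustration.

```python
import ast


def node_types(source):
    """Return the AST node type names of `source`, dropping identifiers and constant values."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


original = "x = decode(a) + 1"
renamed = "payload = unpack(blob) + 42"   # same structure, different names and constant
different = "if a:\n    decode(a)"        # different structure

print(node_types(original) == node_types(renamed))    # True
print(node_types(original) == node_types(different))  # False
```

Renaming variables or changing constants leaves the sequence of node types untouched, so two fragments can be recognized as structurally equivalent even when their text differs.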

The various ASTs that are collected are then clustered, that is, they are grouped together according to their functionality. This first step substantially reduces the number of items to be analyzed. In the second step, the ASTs within a cluster are compared to each other. If two code fragments are similar but have different classifications (i.e., one is malicious and one is benign), then the difference between the two code fragments is computed and analyzed. If the code that has been added to a fragment caused an evolution from being considered malicious to being considered benign, then this is a case of evasive behavior, and the evasion technique is automatically identified. The evasive code fragment can then be brought to the attention of a human analyst, so that the evasion can be mitigated. If, instead, the code that was added caused a benign code fragment to be considered malicious, this might represent an injection attack, in which cyber-criminals embed malicious functionality in popular benign JavaScript components such as jQuery, in order to confuse existing filters.
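The comparison step can be sketched as follows. This is a simplified Python illustration, not Revolver's actual algorithm: difflib's SequenceMatcher stands in for the similarity measure, the 0.7 threshold is arbitrary, the longer fragment is assumed to be the one with added code, and the Fragment class, function names, and node sequences are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Fragment:
    name: str
    nodes: list      # AST node type names, in traversal order
    malicious: bool  # verdict from the oracle (may be wrong for evasive code)


def _matcher(a, b):
    return SequenceMatcher(None, a.nodes, b.nodes, autojunk=False)


def similarity(a, b):
    """Similarity of two node sequences, between 0.0 and 1.0."""
    return _matcher(a, b).ratio()


def added_nodes(base, variant):
    """Node types that appear in `variant` but have no counterpart in `base`."""
    added = []
    for tag, _i1, _i2, j1, j2 in _matcher(base, variant).get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(variant.nodes[j1:j2])
    return added


def classify_pair(a, b, threshold=0.7):
    """Label a pair of fragments; the threshold is an arbitrary assumption."""
    if similarity(a, b) < threshold:
        return "unrelated"
    if a.malicious == b.malicious:
        return "same verdict"
    malicious, benign = (a, b) if a.malicious else (b, a)
    # Assumption: the longer fragment is the one that had code added to it.
    if len(benign.nodes) > len(malicious.nodes):
        return "possible evasion"
    if len(malicious.nodes) > len(benign.nodes):
        return "possible injection"
    return "verdict change"


exploit_nodes = ["VariableDeclaration", "CallExpression", "AssignmentExpression",
                 "ForStatement", "BinaryExpression", "CallExpression",
                 "NewExpression", "CallExpression"]
exploit = Fragment("known exploit", exploit_nodes, malicious=True)
# Same exploit with a made-up environment check prepended; the oracle misses it.
evasive = Fragment("new variant",
                   ["TryStatement", "NewExpression", "CatchClause"] + exploit_nodes,
                   malicious=False)

print(classify_pair(exploit, evasive))  # possible evasion
print(added_nodes(exploit, evasive))    # ['TryStatement', 'NewExpression', 'CatchClause']
```

The nodes returned by added_nodes are what an analyst would look at first, since they correspond to the code that changed the oracle's verdict.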

In either case, the Revolver system is able to leverage machine learning in order to identify cases in which malware evolution created variants that are not detected anymore or to identify injections in benign components. This is a very first step towards a new set of techniques that will focus on detecting evasive activity, in addition to openly malicious activity. It is a necessary new step in the fight against sophisticated malware, which is becoming more aware of sandboxes and other analysis systems.

The details of this research effort are described in the technical paper, which is available here:
http://www.lastline.com/papers/revolver.pdf

The system is available to malware analysts. Please contact revolver@lastline.com for further information.

The authors of the paper are:
Alexander Kapravelos, PhD Student at UCSB
Yan Shoshitaishvili, PhD Student at UCSB
Marco Cova, Head of Lastline Europe and Professor at University of Birmingham 
Christopher Kruegel, Chief Scientist at Lastline and Professor at UCSB
Giovanni Vigna, CTO at Lastline and Professor at UCSB

For further information about this research work, please contact me at vigna@lastline.com.

Giovanni Vigna

Giovanni Vigna is one of the founders and CTO of Lastline as well as a Professor in the Department of Computer Science at the University of California in Santa Barbara. His current research interests include malware analysis, web security, vulnerability assessment, and mobile phone security. He also edited a book on Security and Mobile Agents and authored one on Intrusion Correlation. He has been the Program Chair of the International Symposium on Recent Advances in Intrusion Detection (RAID 2003), of the ISOC Symposium on Network and Distributed Systems Security (NDSS 2009), and of the IEEE Symposium on Security and Privacy in 2011. He is known for organizing and running an inter-university Capture The Flag hacking contest, called iCTF, that every year involves dozens of institutions around the world. Giovanni Vigna received his M.S. with honors and Ph.D. from Politecnico di Milano, Italy, in 1994 and 1998, respectively. He is a member of IEEE and ACM.