A Pipeline for Scalable Analysis Capability
An area where we spend quite some effort here at Lastline is scaling up our malware analysis capabilities, that is, our ability to analyze (potentially) malicious artifacts such as binaries, documents, and web pages. This is a very important area: it affects not only our internal/backend operations, but also the data that our users see on their network, and the quality of that data.
We all know the challenges: we want to achieve great accuracy and performance while keeping costs down. Accuracy is about having good detection rates, that is, correctly classifying the artifacts we analyze. Performance matters both in terms of throughput (the number of artifacts we inspect overall) and latency (the time one has to wait to learn whether a specific artifact is malicious or benign). The third factor is basic economics: analyzing any given sample costs money, both to cover the CPU, disk, and network usage required to perform the analysis and to cover the research costs of developing, improving, and maintaining a given analysis system.
Finding a solution to this problem requires exploring a big search space (with many different options at every corner) or, more likely, multiple search spaces, since one may try to optimize the detection capabilities for binaries, documents, mobile apps, web pages, etc. individually, and each of these may require, or benefit from, specialized techniques or analysis engines.
As we were going through this sort of exercise for all the malware domains we care about, we found a process that lets us tackle these challenges effectively and scale up our analysis capabilities to levels we are comfortable with. In short, we break up the full analysis process, from the collection of samples to the evaluation of analysis results, into a series of steps, and we develop components for each of these steps. The result is a processing pipeline, which we apply, with small variations, to each artifact type we deal with. In the rest of the post, we give more details about the pipeline we use specifically for processing web pages to detect drive-by-download attacks.
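To make this structure concrete, here is a minimal Python sketch of a pipeline assembled from independent stages. It is not our implementation: the `Artifact` class, `run_pipeline`, and the two toy stages are hypothetical names chosen for illustration. In this sketch, collection supplies the input iterable, and each later step (filtering, in-depth analysis, evaluation) is a stage that either returns the artifact, possibly annotated, or returns `None` to drop it, so early stages can save work for later ones.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Artifact:
    """Hypothetical container for one item flowing through the pipeline."""
    kind: str                      # e.g. "web_page", "binary", "document"
    content: str
    notes: List[str] = field(default_factory=list)


# A stage returns the (possibly annotated) artifact, or None to drop it.
Stage = Callable[[Artifact], Optional[Artifact]]


def run_pipeline(artifacts: Iterable[Artifact], stages: List[Stage]) -> List[Artifact]:
    """Push each artifact through the stages in order; stop as soon as a stage drops it."""
    survivors: List[Artifact] = []
    for artifact in artifacts:
        current: Optional[Artifact] = artifact
        for stage in stages:
            current = stage(current)
            if current is None:
                break              # dropped: later (more expensive) stages never see it
        if current is not None:
            survivors.append(current)
    return survivors


# Toy stages for illustration only; real stages would wrap filters, a sandbox, etc.
def drop_empty(artifact: Artifact) -> Optional[Artifact]:
    return None if not artifact.content.strip() else artifact


def record_analysis(artifact: Artifact) -> Optional[Artifact]:
    artifact.notes.append("analyzed")
    return artifact
```

Because the stages share only the artifact interface, adapting the pipeline to another artifact type means changing the list of stages, not the driver loop.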
The first step in the analysis process is to collect artifacts to analyze. There are many sources of artifacts: for example, one can collect and inspect the artifacts observed at customer locations (provided the appropriate permissions are in place) or use industry-wide feeds, either commercial or open source.
Of course, we also like to obtain artifacts on our own, to ensure that we have good coverage and an up-to-date view of current threats. In the domain of malicious web pages, the standard way of doing this is crawling: one picks some reasonable seeds to get an initial set of web pages and then follows the links extracted from the visited pages. This works, but it has the disadvantage of visiting a lot of benign web pages and only a few malicious ones. While this is expected (luckily, there are far more benign pages out there than malicious ones), we would, of course, like to increase the number of malicious web pages we discover.
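The basic crawling loop can be sketched as a breadth-first traversal from a set of seed URLs. In this hypothetical version, `extract_links` is a caller-supplied function (a real crawler would fetch each page over HTTP and parse its links) and `max_pages` caps the crawl budget; both are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Set


def crawl(seeds: Iterable[str],
          extract_links: Callable[[str], List[str]],
          max_pages: int = 1000) -> List[str]:
    """Breadth-first crawl: start from the seeds, follow extracted links, visit each URL once."""
    seen: Set[str] = set()
    frontier: Deque[str] = deque()
    for url in seeds:                  # de-duplicate the seeds themselves
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited: List[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in extract_links(url):
            if link not in seen:       # never enqueue the same URL twice
                seen.add(link)
                frontier.append(link)
    return visited
```

For example, on a toy link graph `{"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}`, crawling from `"a"` visits `a, b, c, d, e` in that order. Nothing in this loop favors malicious pages, which is exactly the limitation described above.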
We have improved on this situation by coming up with better ways to seed the crawling process. In particular, we have a number of “gadgets,” methods to search for web pages that are “similar” to pages that we found in the past to be malicious: the assumption here is that these pages are also more likely to be malicious than those that we find by random crawling. For all the nitty-gritty details, see this academic paper.
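The gadgets themselves are described in the paper, and the sketch below does not reproduce them. It only illustrates the underlying idea of steering the crawl toward pages that resemble known-malicious ones, using a swapped-in similarity measure: Jaccard similarity over lower-cased word tokens. The function names and the `top_k` parameter are hypothetical.

```python
import heapq
import re
from typing import Dict, Iterable, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a page (a deliberately crude feature set)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates: Dict[str, str],
                    known_malicious: Iterable[str],
                    top_k: int) -> List[Tuple[float, str]]:
    """Return the top_k (score, url) pairs whose content best matches any known-malicious page."""
    malicious_sets = [tokens(page) for page in known_malicious]
    scored = []
    for url, content in candidates.items():
        page_tokens = tokens(content)
        # Score a candidate by its closest known-malicious page.
        score = max((jaccard(page_tokens, m) for m in malicious_sets), default=0.0)
        scored.append((score, url))
    return heapq.nlargest(top_k, scored)
```

The highest-ranked URLs could then be fed to the crawler as seeds, instead of or in addition to generic ones.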
If the artifact collection works as expected, we end up with far more artifacts than we can reasonably analyze in depth (e.g., using a sandbox). This is not as bad as it sounds: the vast majority of artifacts that are collected will actually be found to be benign and can be safely discarded. In fact, we want to discard as many of the benign samples as early as possible in the processing pipeline: there’s no point in spending resources to analyze them.
The challenge here, of course, is identifying benign samples (e.g., benign web pages) quickly, without performing a full analysis. To address this, we introduced a number of filters that statically inspect web pages and determine, using various techniques (e.g., lightweight machine learning), whether they are likely to be benign; pages deemed benign are discarded without further ado. Filters are designed to return a response in a few milliseconds, as opposed to the several tens of seconds that a regular analysis takes. Ideally, filters would have perfect detection, but that is rarely the case in practice: in general, we strive to keep their false positives down (at this stage, a false positive produces no actual alert, only extra work for our in-depth analyzers) and to avoid false negatives as much as possible (these would lead to actual missed detections).
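To illustrate the kind of lightweight static check such a filter might perform, here is a hand-weighted logistic scorer over a few crude HTML/JavaScript features. The features, weights, bias, and the `threshold` default are all hypothetical; a real filter would learn its parameters from labeled data and use a much richer feature set.

```python
import math
import re
from typing import Dict

# Hypothetical features and weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "eval_calls": 1.5,
    "unescape_calls": 1.2,
    "document_write_calls": 0.8,
    "iframes": 0.9,
    "long_strings": 1.0,
}
BIAS = -3.0
# Quoted strings of 200+ characters: a crude proxy for packed or encoded payloads.
LONG_STRING = re.compile("[\"'][^\"']{200,}[\"']")


def extract_features(html: str) -> Dict[str, float]:
    lowered = html.lower()
    return {
        "eval_calls": float(lowered.count("eval(")),
        "unescape_calls": float(lowered.count("unescape(")),
        "document_write_calls": float(lowered.count("document.write(")),
        "iframes": float(lowered.count("<iframe")),
        "long_strings": float(len(LONG_STRING.findall(html))),
    }


def suspicion_score(html: str) -> float:
    """Logistic score in (0, 1); higher means the page more likely needs in-depth analysis."""
    features = extract_features(html)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def keep_for_analysis(html: str, threshold: float = 0.1) -> bool:
    """Discard only pages that score as clearly benign; everything else goes to the sandbox."""
    return suspicion_score(html) >= threshold
```

The low threshold reflects the asymmetry described above: forwarding a benign page costs some sandbox time, whereas discarding a malicious page is a missed detection.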
Similarly, filters can be used to identify, early on, malicious artifacts that are similar to artifacts analyzed in the past and found to be malicious. These can also be discarded (or, more likely, given a lower analysis priority), since the outcome of their analysis can be predicted with high confidence. In these cases, filters typically rely on clustering techniques: incoming artifacts that cluster sufficiently "close" to known malware samples (e.g., polymorphic variants of an existing sample) are de-prioritized.
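Here is a minimal sketch of this de-prioritization, assuming samples are available as raw bytes. Instead of full clustering, it performs a simple nearest-neighbor check: it compares a sample's set of byte 4-grams against each known-malicious sample using Jaccard similarity. The n-gram size, the 0.8 threshold, and the priority labels are hypothetical choices. A production system would typically use an index or locality-sensitive hashing rather than a linear scan.

```python
from typing import Iterable, Set


def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Set of overlapping n-byte substrings; small edits leave most of them unchanged."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: Set[bytes], b: Set[bytes]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def analysis_priority(sample: bytes,
                      known_malicious: Iterable[bytes],
                      threshold: float = 0.8) -> str:
    """Return "low" if the sample is a near-duplicate of a known-malicious sample, else "normal"."""
    grams = byte_ngrams(sample)
    for known in known_malicious:
        if similarity(grams, byte_ngrams(known)) >= threshold:
            # Verdict is predictable with high confidence: analyze later, if at all.
            return "low"
    return "normal"
```

A variant that differs from a known sample in a single trailing byte shares almost all of its 4-grams and is de-prioritized. An unrelated sample keeps normal priority.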
The in-depth analysis step is where we perform the full analysis of all artifacts that have made it to this stage of the pipeline. This is where we visit URLs, execute binaries, and open documents, and then inspect the resulting behavior of the system. If you are interested in the details of one of our analysis systems, we have discussed the design of our binary analysis environment here.
The last step in the pipeline consists of evaluating the results produced by the previous steps and, in particular, assessing the presence of false positives and false negatives (e.g., those resulting from evasion attempts). This step is important both for quality control and for providing feedback to improve the tools and techniques used in the earlier phases of the pipeline.
False positives are typically easier to detect: they produce all sorts of noise, such as spikes in the number of detections. False negatives are more challenging, because there is nothing obviously wrong to see. To identify them, we use a number of techniques. For example, we selectively re-run the analysis of certain artifacts on systems that use different analysis environments (e.g., an emulation-based sandbox and a physical machine), on the premise that an analysis error or evasion technique that works against one system is unlikely to work against the other as well. We can also actively check for evasion attempts: for example, in Revolver we introduced a technique to automatically identify evasions in web pages by finding pairs of pages that are similar but have been classified differently (one malicious and the other benign). In some cases, the different classification outcome can be attributed to a successful evasion technique. Once we identify such a technique, we can handle and bypass it by fixing the appropriate component in the pipeline.
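Both false-negative checks can be sketched in a few lines. The first function flags artifacts whose verdicts differ between two analysis environments. The second looks for pairs of similar pages with different verdicts, in the spirit of the Revolver approach. The token-set Jaccard similarity, the 0.7 threshold, and all names are hypothetical stand-ins, not the similarity measure used in Revolver. The all-pairs comparison is only practical for small inputs.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def verdict_disagreements(env_a: Dict[str, str], env_b: Dict[str, str]) -> List[str]:
    """IDs analyzed in both environments whose verdicts differ (candidates for manual review)."""
    common = env_a.keys() & env_b.keys()
    return sorted(aid for aid in common if env_a[aid] != env_b[aid])


def token_set(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_but_differently_classified(
        pages: Dict[str, Tuple[str, str]],
        threshold: float = 0.7) -> List[Tuple[str, str]]:
    """pages maps page ID -> (content, verdict); return ID pairs that look alike but got different verdicts."""
    tokenized = {pid: (token_set(content), verdict)
                 for pid, (content, verdict) in pages.items()}
    suspicious: List[Tuple[str, str]] = []
    for id_a, id_b in combinations(sorted(tokenized), 2):
        tokens_a, verdict_a = tokenized[id_a]
        tokens_b, verdict_b = tokenized[id_b]
        if verdict_a != verdict_b and jaccard(tokens_a, tokens_b) >= threshold:
            suspicious.append((id_a, id_b))
    return suspicious
```

A page that closely resembles a malicious one but was judged benign (for instance, the same script with an extra early `return`) is flagged for a closer look. Whether the difference is a real evasion still has to be established separately.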
Is there a role in this pipeline for manual analysis? There is, and we are lucky to have a terrific team of reversers and malware analysts: they get to review the hard cases (evasive malware that uses techniques that we cannot isolate automatically) and to propose the fixes needed to allow us to handle such cases automatically.