
Exclusive Anti-Malware Technology

The Technology: Hunt, Crack, Track, Block

  • Hunt: Discover the malicious infrastructure components within the Internet (the Malscape™)
  • Crack: Analyze millions of web pages and binaries daily
  • Track: Automatically track how the Malscape evolves
  • Block: Protect your network

Lastline's innovative technology delivers a more extensive and accurate list of malicious sites through high-definition analysis of malware (both binary and web-based) and significantly broader Internet coverage than available from any other source. That means fewer false positives and fewer threats overlooked.

There are three major components in our cloud backend infrastructure that we leverage to produce high quality threat intelligence: LLWEB, LLAMA, and LLCHECK. An overview of these three components can be seen in Figure 2.

In a nutshell, LLWEB is responsible for finding exploits and drive-by download sites. LLAMA analyzes malware to detect the command and control (C&C) locations to which the malware connects. LLCHECK processes the output from LLAMA and LLWEB and ensures that the locations found by both tools are indeed malicious. This requires checks to determine whether threats are still active and checks to remove false positives.

LLWEB – Detecting Exploit and Drive-By Download Sites

LLWEB crawls the Internet and identifies new threats as they emerge. To this end, LLWEB visits web pages with an instrumented, simulated browser and records events that occur during the interpretation of HTML elements as well as the execution of JavaScript code, Flash components, and other embedded objects, such as Java applets. The events are extracted and then analyzed using anomaly detection techniques that identify malicious behavior. Anomaly detection techniques – in contrast to signature-based approaches – are able to detect novel, previously unseen attacks (called 0-day, or zero-day attacks). LLWEB’s unique browser emulation technology exposes malicious behavior more effectively than competing technologies, and provides more comprehensive, detailed information about the actions performed by web-based malware.

Given finite resources, a key challenge for LLWEB is to find as many malicious sites as possible in the shortest amount of time. We address this problem using a combination of three approaches: First, we minimize the time that LLWEB requires for each page. This increases the total number of pages analyzed in a given time. We minimize the time to discard legitimate pages by using a fast, static pre-filter that employs nearly one hundred features for each page, the script code it contains, and its URL. The pre-filter leverages a classifier that is trained using machine learning to make efficient and precise decisions about the maliciousness of a web page.
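The pre-filtering idea can be illustrated with a minimal sketch. It is not LLWEB's actual classifier: the four features, the weights, the bias, and the threshold below are hypothetical stand-ins for the roughly one hundred learned features mentioned above, and a simple logistic score is assumed as the model.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical weights and bias; a real pre-filter would learn these from labeled pages.
WEIGHTS = {
    "eval_calls": 0.9,
    "hidden_iframes": 1.4,
    "long_strings": 0.7,
    "url_is_ip": 1.1,
}
BIAS = -2.0


def extract_features(url: str, html: str) -> dict:
    """Compute a few cheap static features from the URL and the raw HTML."""
    host = urlparse(url).hostname or ""
    return {
        # Calls to eval() are common in obfuscated JavaScript.
        "eval_calls": len(re.findall(r"\beval\s*\(", html)),
        # Crude check for iframes declared with a zero width or height.
        "hidden_iframes": len(
            re.findall(r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0", html, re.I)
        ),
        # Very long string literals often carry encoded payloads.
        "long_strings": len(re.findall(r"[\"'][^\"']{200,}[\"']", html)),
        # The host is a raw IPv4 address rather than a domain name.
        "url_is_ip": 1 if re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", host) else 0,
    }


def suspicion_score(features: dict) -> float:
    """Logistic score in (0, 1); higher means more suspicious."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def needs_dynamic_analysis(url: str, html: str, threshold: float = 0.5) -> bool:
    """Pages scoring below the threshold are discarded without emulation."""
    return suspicion_score(extract_features(url, html)) >= threshold
```

In this sketch, only pages for which needs_dynamic_analysis returns True would be handed to the slower, instrumented browser; everything else is dropped after the cheap static pass.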

The second approach is to improve the coverage of LLWEB by increasing the “toxicity” of the URL feed, that is, the fraction of malicious pages in the pool of pages to be analyzed. To do this, we use two primary techniques. First, we seed our web crawler with URLs that rank high in the results of search engine queries for popular keywords. Attackers have discovered the power of poisoning search engine results with malicious URLs (a process called blackhat search engine optimization, or blackhat SEO). LLWEB continuously queries popular search engines for trending topics and targets these results.

The second technique relies on the fact that attackers frequently run large-scale campaigns in which they replicate what is essentially the same attack on many pages. When LLWEB finds a few malicious pages that appear similar, it uses this set of pages as a seed to look for other pages on the web. To this end, LLWEB leverages multiple search engines to query for pages that exhibit properties that have been found to be characteristic of a set of malicious seed pages.

The third approach to increase the coverage of LLWEB is to scale up the number of analysis engines in the cloud. Since analysis runs are independent, the process is easily parallelized.
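Because each analysis run is independent, fanning the work out is straightforward. The following sketch uses a local thread pool purely for illustration; analyze_page is a hypothetical placeholder for a full analysis run, and a cloud deployment would presumably distribute the same map step across machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_page(url: str) -> tuple:
    """Hypothetical stand-in for one independent analysis run."""
    verdict = "suspicious" if "exploit" in url else "clean"
    return url, verdict


def analyze_all(urls, workers: int = 4) -> dict:
    """Analyze URLs concurrently; no run shares state with another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_page, urls))
```

Raising the workers value (or adding machines) increases throughput without changing analyze_page, which is the property the paragraph above relies on.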

LLAMA – Detecting Command And Control Sites

LLAMA is an advanced, automated malware analysis engine. Lastline receives malware samples from a number of different sources, including honeypots, spam feeds, user submissions, and LLWEB-detected drive-by download exploits. LLAMA analyzes the malware programs, understands their behavior on the host, and exposes the malicious command and control infrastructure to which they connect. The analysis engine uses processor-level emulation to perform fine-grained (instruction-level), dynamic analysis.

Most existing analysis environments only record system call level information that is too coarse-grained for accurate analysis. LLAMA’s high-resolution analysis extracts in-depth execution traces that provide a fidelity that exceeds the capability of competing approaches. LLAMA collects traces of individual processor instructions and memory accesses, and records information at the system call level. Additionally, the engine collects all network traffic that a program produces and performs dynamic data flow (or taint) analysis. Taint analysis tracks how a malware program uses pieces of information read from the file system or received from the network. From this, LLAMA can determine what data is sent over the network, even when the information is obfuscated or encrypted. Taint analysis provides powerful and detailed insights into the behavior of a malicious program not otherwise easily available.
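To make the taint-analysis idea concrete, here is a toy sketch at the level of byte strings. LLAMA works at the processor-instruction level; this example only illustrates the principle that labels travel with the data through transformations. All names (Tainted, source, xor, concat, labels_sent) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    data: bytes
    labels: tuple  # one frozenset of source labels per byte of data


def source(data: bytes, label: str) -> Tainted:
    """Mark every byte as originating from the named source."""
    return Tainted(data, tuple(frozenset([label]) for _ in data))


def untainted(data: bytes) -> Tainted:
    """Constant data that carries no taint labels."""
    return Tainted(data, tuple(frozenset() for _ in data))


def xor(value: Tainted, key: int) -> Tainted:
    """Obfuscate the bytes; the labels follow the data unchanged."""
    return Tainted(bytes(b ^ key for b in value.data), value.labels)


def concat(a: Tainted, b: Tainted) -> Tainted:
    """Join two values, keeping the per-byte labels aligned."""
    return Tainted(a.data + b.data, a.labels + b.labels)


def labels_sent(payload: Tainted) -> set:
    """Union of all source labels present in an outgoing payload."""
    found = set()
    for byte_labels in payload.labels:
        found |= byte_labels
    return found


hostname = source(b"WIN-PC01", "hostname")
payload = concat(untainted(b"id="), xor(hostname, 0x5A))
```

Here the plaintext hostname no longer appears in payload.data, yet labels_sent(payload) still reports that the payload derives from the "hostname" source. This is why label tracking can reveal what is being sent even when the content itself is obfuscated.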

Malware frequently connects to legitimate endpoints, for example, to determine network connectivity, to send spam, to launch denial-of-service attacks, or to commit click fraud. The fact that a malware program contacts a server therefore does not indicate that this server is malicious. Techniques that attempt to identify command and control sites simply by looking at the destination of network connections invariably produce many false positives. One of LLAMA’s strengths is its ability to determine which remote network endpoints are used for command and control traffic and which endpoints are legitimate based on its detailed analysis.

LLAMA uses a combination of three approaches to effectively identify command and control (C&C) sites. First, the system uses taint analysis to determine which information is sent to which endpoint. Malware typically sends C&C sites information that uniquely identifies the infected machine, as well as sensitive data collected from the host. Using the results of taint analysis and machine learning, LLAMA builds behavioral models that automatically classify the nature of connections based on (a) the data that the malware program sends and (b) the actions the malware program takes based on the data received. For example, data received from C&C sites (commands) are typically used to trigger malicious actions.

The second approach to classify connections is based on content analysis of network traffic. In particular, LLAMA implements automated signature generation algorithms that examine all network traffic to find snippets of content that are characteristic of suspicious connections. These content snippets can then be applied to additional traffic, and matches are classified as C&C traffic.
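One simple way to derive a content snippet is to look for a byte sequence shared by several suspicious connections. The sketch below is an assumption for illustration, not LLAMA's algorithm: it greedily narrows a longest common substring across samples using Python's difflib, and the function names, sample requests, and minimum length are hypothetical.

```python
from difflib import SequenceMatcher


def common_snippet(samples, min_len: int = 8):
    """Greedily narrow down a byte snippet shared by all samples.

    Pairwise narrowing always yields a substring common to every sample,
    but not necessarily the longest one.
    """
    if not samples:
        return None
    snippet = samples[0]
    for sample in samples[1:]:
        matcher = SequenceMatcher(None, snippet, sample, autojunk=False)
        m = matcher.find_longest_match(0, len(snippet), 0, len(sample))
        snippet = snippet[m.a:m.a + m.size]
    # Very short snippets would match too much benign traffic.
    return snippet if len(snippet) >= min_len else None


def matches(signature: bytes, traffic: bytes) -> bool:
    """Apply a generated snippet to additional traffic."""
    return signature in traffic


suspicious = [
    b"GET /gate.php?bot=a91f&v=2 HTTP/1.1",
    b"GET /gate.php?bot=77c0&v=2 HTTP/1.1",
    b"GET /gate.php?bot=0be4&v=2 HTTP/1.1",
]
signature = common_snippet(suspicious)
```

The min_len cutoff reflects a design trade-off: a longer required snippet reduces accidental matches on benign traffic, at the cost of missing variants that share only a short marker.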

The third approach uses reputation information about the domains and IPs that a malware program contacts. For example, when an unknown domain maps to an IP address that was previously found to host C&C sites, it is considered to be suspicious. Moreover, certain dynamic DNS and hosting domains are frequently abused, and hence, their subdomains are likely malicious as well.
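The two reputation signals described above can be sketched as simple lookups. The data sets, the flag names, and the example addresses (taken from documentation-reserved ranges) are hypothetical; how real reputation data is sourced and weighted is not specified here.

```python
# Hypothetical reputation data.
KNOWN_CC_IPS = {"203.0.113.10"}
ABUSED_PARENT_DOMAINS = {"dyn.example"}


def reputation_flags(domain: str, resolved_ips) -> list:
    """Return the reasons, if any, why a contacted domain looks suspicious."""
    flags = []
    # Signal 1: the domain maps to an IP that previously hosted C&C sites.
    if KNOWN_CC_IPS.intersection(resolved_ips):
        flags.append("ip-hosted-cc-before")
    # Signal 2: the domain is a subdomain of a frequently abused parent.
    # Starting at index 1 means the parent domain itself is not flagged.
    labels = domain.lower().rstrip(".").split(".")
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in ABUSED_PARENT_DOMAINS:
            flags.append("abused-parent-domain")
            break
    return flags
```

For example, reputation_flags("x7.dyn.example", ["198.51.100.4"]) reports the abused parent domain, while an unrelated domain resolving to an unlisted address yields an empty list.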

LLCHECK – Ensuring Correctness of Detection Results

LLWEB and LLAMA analyze millions of web pages and binary samples every day. The threat intelligence produced by LLWEB and LLAMA is further processed and cross-correlated to identify the IP addresses and domain names associated with malware activity. LLCHECK analyzes these “bad neighborhoods” to remove potential false positives and stale entries, which are commonplace in blacklists.

To remove stale entries, LLCHECK periodically reexamines all entries that have previously been found to be malicious. For these checks, LLCHECK pretends to be an infected host and replays previously seen connections to the malicious servers. To prevent attackers from fingerprinting these “liveness” checks, LLCHECK anonymizes the source of the connections.
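A periodic recheck of this kind might be organized as follows. This is a sketch under assumptions: the probe callable stands in for replaying a previously seen connection, and the rule of dropping an entry after three consecutive failed checks is a hypothetical policy, not one stated above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    endpoint: str
    failed_checks: int = 0  # consecutive liveness checks that failed


def recheck(entries, probe, max_failures: int = 3) -> list:
    """Re-probe every entry and keep those not yet considered stale.

    probe(endpoint) is a hypothetical callable that returns True when the
    server still answers a replayed connection.
    """
    live = []
    for entry in entries:
        if probe(entry.endpoint):
            entry.failed_checks = 0
        else:
            entry.failed_checks += 1
        if entry.failed_checks < max_failures:
            live.append(entry)
    return live
```

Requiring several consecutive failures before removal is one way to avoid dropping a server that is only temporarily unreachable.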

To remove false positives, LLCHECK leverages reputation-based analysis. In particular, the engine ensures that popular sites that receive a lot of legitimate traffic are never blocked. Moreover, LLCHECK uses heuristics that leverage passive DNS information to ensure that “mixed” servers (that host both malicious and benign domains) are properly handled.
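The two false-positive filters can be sketched as set operations. The popularity list, the passive DNS mapping, and the policy for “mixed” servers (block the malicious domains but not the shared IP address) are assumptions made for illustration; the text above does not specify how LLCHECK handles them internally.

```python
# Hypothetical list of popular sites that must never be blocked.
POPULAR_DOMAINS = {"example.com"}


def blockable_entries(malicious_domains: set, malicious_ips: set, passive_dns: dict):
    """Filter candidate entries before they are published.

    passive_dns maps an IP address to the set of domains observed
    resolving to it (hypothetical data shape).
    """
    # Popular sites are removed by exact match in this sketch.
    domains = {d for d in malicious_domains if d not in POPULAR_DOMAINS}

    ips = set()
    for ip in malicious_ips:
        hosted = passive_dns.get(ip, set())
        benign = hosted - malicious_domains
        # Block by IP only if no benign domain is known to share the server.
        # An IP with no passive DNS records is treated as not shared.
        if not benign:
            ips.add(ip)
    return domains, ips
```

Under this policy, a server that also hosts legitimate sites stays reachable by IP, while its malicious domains remain blockable by name.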

The resulting threat intelligence produced by LLCHECK is pushed regularly throughout each day to the network sensors deployed in our customers’ networks. The sensors analyze network traffic that is associated with the establishment of network connections (i.e., SYN packets and DNS requests) and prevent connections to the components of the malware infrastructure, creating a last line of defense against malware infections.