Image Recognition to Spot Elusive Phishing Attacks

Phishing has been a profitable attack vector for criminals for more than 20 years, and the reasons for its success are diverse. This blog post gives an overview of the different types of phishing and the techniques used in phishing attempts, and explains how to detect them using Lastline’s recent innovation in image recognition.

What is Phishing?

Phishing is a social engineering technique used by criminals to trick users into providing their login credentials or other sensitive information (e.g., credit card data and social security numbers).

The attack typically starts with an email that lures the user into visiting a phishing website. The email is crafted to look legitimate so that the user will click the link it contains. Similarly, the website imitates the look and feel of a real website to increase the chances that the victim will enter sensitive information there.

Phishing is not limited to emails. It can also be achieved by contacting the victim through text messages, phone calls or social media.

In this blog post, I will focus mostly on the last part of the email-based attack pipeline, namely phishing URLs and websites, and not on the email itself.

Types of Phishing

Phishing attacks are categorized into different types depending on who the victim and who the target are. Most commonly, phishing attacks are divided into the following categories:

  • Bulk Phishing: This is the simplest and most widely used phishing strategy. The attacker sends minimally personalized emails to as many users as possible to increase the chances of success. Targets are typically services that are popular among Internet users, such as online banking, online retailers or cloud solutions.
  • Spear Phishing: This type of attack is typically a little more sophisticated than bulk phishing. The attacker gives emails a more personal touch, such as using the victims' real names, potentially increasing the success rate of an attempt.
  • Whaling: This attack is specifically designed to lure high-profile victims into entering their sensitive information. The attacker does a significant amount of work to craft an email that looks legitimate and sometimes even registers domain names that look similar to that of the target.

Phishing URLs

It is not possible to accurately detect phishing from a URL alone. Although certain properties are quite prevalent among phishing URLs, they cannot be used in isolation for detection because too many benign websites would be identified as phishing websites.

One common technique among phishing actors is to register domains or create subdomains that look similar to a target domain. If the user does not take a close look at the URL in the email before clicking on it, this might trick the victim into considering the URL trustworthy.
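
To make the lookalike idea concrete, here is a minimal sketch of how such domains could be spotted. It is not Lastline's method: the function names, the edit-distance threshold and the example hosts are assumptions chosen for illustration, and a real system would also need a public suffix list and homoglyph handling. As noted above, a hit is only a weak signal and should not be used in isolation.

```python
from urllib.parse import urlsplit


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def looks_like(url: str, target_domain: str, max_distance: int = 2) -> bool:
    """Return True if the URL's host imitates target_domain (hypothetical heuristic)."""
    host = (urlsplit(url).hostname or "").lower()
    target = target_domain.lower()
    if host == target or host.endswith("." + target):
        return False  # the real domain or one of its own subdomains

    # Case 1: the target appears as subdomain labels of another domain,
    # e.g. paypal.com.example.net
    if ("." + target + ".") in ("." + host):
        return True

    # Case 2: some label suffix of the host is a near miss of the target,
    # e.g. paypa1.com
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if 0 < edit_distance(candidate, target) <= max_distance:
            return True
    return False
```

For example, looks_like("http://paypa1.com/login", "paypal.com") and looks_like("http://paypal.com.example.net/", "paypal.com") both flag the URL, while the genuine https://www.paypal.com/signin is not flagged.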

Phishing URLs and websites are often generated through so-called phishing kits. These kits allow an attacker to easily create phishing campaigns, but it is possible to fingerprint them by looking at certain parts of the URL. Some kits exploit vulnerabilities in widely used content management systems to inject phishing websites. These phishing pages often end up in paths on the server that are usually not used to serve content, e.g., private administration directories and image directories. Additionally, the file name of the phishing website can be matched to campaign-related patterns.

The following URL is an example of a URL pointing to an injected phishing website on a server running the WordPress Content Management System:

http://example.com/wordpress/wp-admin/includes/onlinebanking/account/validation/chase.com/home/jc0nta=/

The wp-admin directory usually points to the administration interface of a WordPress site. Seeing a URL like this in an email should of course raise concerns if you are not managing a WordPress site yourself. The rest of the path also looks suspicious, as it includes onlinebanking and chase.com. It is no surprise that this URL points to a phishing page for Chase Bank:

[Screenshot of the Chase Bank phishing login page]
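
The path-based signals discussed above can be sketched in a few lines of Python. The directory and keyword lists below are assumptions picked to match the example URL, not a real fingerprint set, and the function name is hypothetical. In line with the caveat at the start of this section, the output is a list of weak hints, not a verdict.

```python
from urllib.parse import urlsplit

# Assumed examples of directories that normally do not serve public pages.
NON_CONTENT_DIRS = {"wp-admin", "wp-includes", "images", "img"}
# Assumed examples of brand or banking terms that are odd inside a path.
BRAND_HINTS = {"chase.com", "paypal.com", "onlinebanking"}


def suspicious_path_signals(url: str) -> list:
    """Collect human-readable hints about why a URL path looks injected."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s.lower() for s in parts.path.split("/") if s]

    signals = []
    for seg in segments:
        if seg in NON_CONTENT_DIRS:
            signals.append(f"non-content directory: {seg}")
        # A brand term in the path only matters if the host is not that brand.
        if seg in BRAND_HINTS and seg not in host:
            signals.append(f"brand or banking term in path: {seg}")
    return signals


example = ("http://example.com/wordpress/wp-admin/includes/"
           "onlinebanking/account/validation/chase.com/home/jc0nta=/")
for hint in suspicious_path_signals(example):
    print(hint)
```

Run against the example URL, this reports the wp-admin directory plus the onlinebanking and chase.com path segments, which are the same three oddities pointed out in the text.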

Detecting Phishing Websites with Image Similarity

So why is it that companies still struggle to reliably detect phishing attacks even though they have been around for so long?

Compared to other threats, phishing attacks do not necessarily show any malicious behavior. The criminal can send a phishing mail from a legitimate, highly reputable mail server with a regular mail account. The mail itself might contain nothing suspicious, just some text that looks similar to a variety of benign mails that users see in their mailboxes. The phishing URL inside the text can point to a domain with a high reputation, and the website itself usually contains no malicious code. A phishing attack ultimately tries to exploit weaknesses in human behavior, which is why technical solutions have a hard time detecting this sort of attack and why it is important to train users and raise awareness of phishing attacks.

As for the technical solutions, many phishing detection systems simply blacklist URLs and try to keep this blacklist up to date with data from public and private phishing feeds. This approach has several weaknesses.

First of all, it is only possible to detect phishing that is already known. New phishing URLs can only be detected after an update of the feeds, even though the phishing pages may not have changed much in their behavior or visual appearance.

Many phishing campaigns use hijacked domains and servers to serve the phishing website. These hosts often get cleaned up after a few hours or days by the hosting provider or the owner of the host. This results in a very short lifetime for a phishing URL and, therefore, causes many blacklist entries to be out of date.

These hijacked servers and domains are also the reason why it is not possible to blacklist by domain; only the full URL or patterns of the URL path will work. Otherwise, the blacklist would block legitimate websites that are hosted under the same domain.
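
The difference between blocking a domain and blocking a URL or path pattern can be shown with a small sketch. The class name, the normalization rules and the sample entries are assumptions made for illustration; real feeds and products use their own formats.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path, without a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


class UrlBlacklist:
    """Blocks exact URLs and path prefixes, never an entire domain."""

    def __init__(self, exact=(), path_prefixes=()):
        self.exact = {normalize(u) for u in exact}
        self.prefixes = [normalize(u) for u in path_prefixes]

    def is_blocked(self, url: str) -> bool:
        key = normalize(url)
        if key in self.exact:
            return True
        # Match the prefix itself or anything below it in the path tree.
        return any(key == p or key.startswith(p + "/") for p in self.prefixes)


blacklist = UrlBlacklist(
    path_prefixes=["http://example.com/wordpress/wp-admin/includes/onlinebanking/"]
)
print(blacklist.is_blocked(
    "http://example.com/wordpress/wp-admin/includes/onlinebanking/account/validation/"))
print(blacklist.is_blocked("http://example.com/wordpress/2019/hello-world/"))
```

The injected phishing path is blocked, while a regular blog post on the same hijacked host stays reachable.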

Phishing URL blacklists are still valuable and provide a quick way to be protected from well-known campaigns. To better cope with the short lifetime of phishing URLs and to counterbalance the weaknesses of pure blacklists, Lastline has developed more sophisticated phishing detection.

Instead of trying to understand a phishing attack based on technical details in the source code of the website or its URL, we look at the phishing attack as the victim sees it. We analyze the visual appearance of a website and detect if a page tries to imitate a legitimate one.

[Screenshots: PayPal login pages]

Three different phishing websites for PayPal. Although every single attempt looks different, Lastline’s phishing detection is able to detect the similarities between them and the legitimate target page.

Every day, Lastline’s Network Detection and Response platform uses Network Traffic Analysis to analyze hundreds of thousands of URLs. During analysis, it collects screenshots of all websites, normalizes them and compresses them in a way that still allows similar images to be compared to each other. This way, we are able to build a global knowledge base of what both phishing and legitimate websites look like.
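
The post does not say which normalization and compression scheme is used, so the following is only a stand-in to show the general idea: an average hash, a simple perceptual hash that shrinks a grayscale screenshot to a tiny grid and keeps one bit per cell. All function names are hypothetical, and the image is assumed to be a list of rows of 0-255 values that is at least 8x8 pixels.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image to size x size by averaging blocks.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Pack one bit per cell: 1 if the cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 "screenshots": a page, a slightly recolored copy of it,
# and a page with a different layout.
page = [[0 if x < 16 else 255 for x in range(32)] for _ in range(32)]
recolored = [[20 if x < 16 else 235 for x in range(32)] for _ in range(32)]
other = [[0 if y < 16 else 255 for _ in range(32)] for y in range(32)]

print(hamming(average_hash(page), average_hash(recolored)))
print(hamming(average_hash(page), average_hash(other)))
```

The recolored copy hashes to the same 64 bits as the original, while the page with a different layout ends up far away. A small Hamming distance between two hashes is what "similar" means in this sketch.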

Using the knowledge base, we cluster websites that appear to be similar and analyze the cluster reputation based on the reputation of each website in it. This way we can see if certain low-reputation domains are trying to imitate high-reputation ones. Similarity-based clusters also increase the efficiency of phishing URL blacklists: if a blacklisted URL appears in the same cluster as low-reputation, unknown URLs, we can tag the whole cluster as phishing.
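
A rough sketch of this clustering step might look as follows. It is a simplified illustration under stated assumptions, not the production logic: pages are represented by an integer fingerprint (for instance a perceptual hash), reputation is a made-up score between 0 and 1, and the thresholds, the Page class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    fingerprint: int          # e.g. a perceptual hash of the screenshot
    reputation: float         # assumed score in [0, 1]
    blacklisted: bool = False


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def cluster_pages(pages, max_distance=6):
    """Group pages whose fingerprints are close, using union-find."""
    parent = list(range(len(pages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if hamming(pages[i].fingerprint, pages[j].fingerprint) <= max_distance:
                parent[find(i)] = find(j)

    clusters = {}
    for i, page in enumerate(pages):
        clusters.setdefault(find(i), []).append(page)
    return list(clusters.values())


def suspected_phishing(cluster, low=0.3, high=0.8):
    """URLs to tag: blacklisted pages, plus low-reputation pages that share
    a cluster with a blacklisted or a high-reputation page."""
    flagged = {p.url for p in cluster if p.blacklisted}
    has_anchor = bool(flagged) or any(p.reputation >= high for p in cluster)
    if has_anchor:
        flagged |= {p.url for p in cluster if p.reputation < low}
    return flagged


pages = [
    Page("https://bank.example/login", 0x0F0F, 0.95),
    Page("http://a.test/x", 0x0F0E, 0.10),
    Page("http://b.test/y", 0x0F1F, 0.05, blacklisted=True),
    Page("https://news.example/", 0xF0F00000, 0.90),
]
for cluster in cluster_pages(pages):
    print(sorted(suspected_phishing(cluster)))
```

In this toy data set, the two low-reputation pages that look like the bank login are tagged, while the bank itself and the unrelated news site are left alone. The pairwise comparison is quadratic and only suitable for small examples; at the scale described above, an index for nearest-neighbor search would be needed.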

With this image recognition technique, Lastline is able to detect phishing campaigns independently of a URL and can quickly adapt to new phishing campaigns to provide the best protection for our customers.

Tobias Jarmuzek

Tobias Jarmuzek is a software engineer for Lastline’s anti-malware group, focusing on the detection of web threats. Before his more than four years at Lastline, Tobias worked as a research assistant at the SecLab at the University of California, Santa Barbara, and as the chair of IT-Security at RWTH Aachen University, where he graduated with a master’s degree.